There are three kinds of JPEG artifacts. The first and (arguably) most important one is described in Yuval's answer: to first order, the problem is that under high compression all the high-frequency information is discarded, and the only information remaining is the lowest-frequency component, which is the average color of each 8x8 square. When you take the inverse transform but leave out the high-frequency information, the result is a square of a single color. A decoder can't apply low-pass filtering to remove the blocky effect, because that would introduce significant error in images that were encoded at a high-quality setting.
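To make the "square of a single color" concrete, here is a minimal NumPy sketch. It assumes an orthonormal 8x8 DCT-II and simply zeroes every coefficient except the DC term, which is a crude stand-in for very coarse quantization rather than what a real encoder does. The helper name `dct_matrix` and the random test block are my own, for illustration only.

```python
import numpy as np

def dct_matrix(n=8):
    """Orthonormal DCT-II matrix C, so that coeffs = C @ block @ C.T."""
    k = np.arange(n).reshape(-1, 1)   # frequency index (rows)
    x = np.arange(n).reshape(1, -1)   # sample index (columns)
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * x + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)        # DC row has its own normalization
    return c

C = dct_matrix(8)

# An arbitrary 8x8 block of pixel values (illustrative data, not a real image).
rng = np.random.default_rng(0)
block = rng.integers(0, 256, size=(8, 8)).astype(float)

coeffs = C @ block @ C.T              # forward 2-D DCT

# Throw away everything except the lowest-frequency (DC) coefficient.
dc_only = np.zeros_like(coeffs)
dc_only[0, 0] = coeffs[0, 0]

reconstructed = C.T @ dc_only @ C     # inverse 2-D DCT

# Every pixel of the result equals the mean of the original block.
print(np.allclose(reconstructed, block.mean()))
```

Each block collapses to its own average, so neighbouring blocks with different averages meet at a visible edge.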
The second artifact is exactly the Gibbs phenomenon. Fine detail in the image will get "halos" if highly compressed. For example, here is a 64x64 image with a single dot at pixel 35,35:

Here's the same image saved by GIMP as a JPEG with low quality (25%):

There is a halo around the pixel within the 8x8 block that contains it. To make it easier to see, here is an image of that block scaled up by a factor of 8:

This effect is called ringing. (In audio processing it really produces a ringing noise.) This occurs because high levels of compression are chopping out the highest frequencies, which leaves ripples at the highest remaining frequency. (I.e., it's the Gibbs phenomenon.)
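The ripples can be reproduced without a JPEG encoder. The sketch below is a simplification under stated assumptions: it takes a single 8x8 block with one bright pixel at position (3, 3) (pixel 35,35 falls at offset 3,3 inside its block), applies an orthonormal DCT-II, and keeps only the coefficients with both frequency indices below 4. Real JPEG quantizes coefficients rather than truncating them at a hard cutoff, and the cutoff of 4 and the helper name `dct_matrix` are my own choices.

```python
import numpy as np

def dct_matrix(n=8):
    """Orthonormal DCT-II matrix C, so that coeffs = C @ block @ C.T."""
    k = np.arange(n).reshape(-1, 1)   # frequency index (rows)
    x = np.arange(n).reshape(1, -1)   # sample index (columns)
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * x + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)        # DC row has its own normalization
    return c

C = dct_matrix(8)

# One block, black except for a single bright dot.
block = np.zeros((8, 8))
block[3, 3] = 255.0

coeffs = C @ block @ C.T              # forward 2-D DCT

# Hard low-pass: keep the 4x4 lowest-frequency coefficients (assumed cutoff).
cutoff = 4
truncated = np.zeros_like(coeffs)
truncated[:cutoff, :cutoff] = coeffs[:cutoff, :cutoff]

ringing = C.T @ truncated @ C         # inverse 2-D DCT

np.set_printoptions(precision=1, suppress=True)
print(ringing)                        # smeared peak plus positive/negative ripples
```

The reconstructed peak is much lower than 255, and the rest of the block alternates between overshoot and undershoot, which is the halo seen in the scaled-up image.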
The third artifact is the inverse of the Gibbs phenomenon. The Gibbs phenomenon arises because multiplication by a brick-wall filter in the frequency domain is equivalent to convolution with a sinc function in the time (for images, spatial) domain. The inverse problem is that using a box-car filter for downsampling in the time domain corresponds to a sinc function in the frequency domain, and thus a large amount of high-frequency noise gets aliased into the resulting downsampled image. This is essentially what JPEG is doing when it breaks the image into 8x8 blocks. The lowest-frequency component of each 8x8 block is the average (box-car) of the pixel values in that block, so it actually contains significant high-frequency aliasing. For example, consider the following image. It is 64x64 with a horizontal stripe every third pixel, so it has a significant amount of high-frequency information:

Now if I encode it as a JPEG at very low quality, I get this:

The high frequency information from the original image is not being completely filtered by the box-car averaging, and thus gets aliased into a lower-frequency error in the image.
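The aliasing can be checked with plain block averaging, which is what the lowest-frequency component of each block amounts to. This is a sketch under my own assumptions about the test image: 64x64, black, with a white stripe on every row whose index is a multiple of 3. It ignores everything else a JPEG encoder does.

```python
import numpy as np

# Assumed test image: a white horizontal stripe on every third row.
img = np.zeros((64, 64))
img[::3, :] = 255.0

# Box-car average over each 8x8 block.
# After the reshape the axes are (block_row, y_in_block, block_col, x_in_block).
block_means = img.reshape(8, 8, 8, 8).mean(axis=(1, 3))

# Number of stripe rows that landed in each block, going down the image.
stripes_per_block = block_means[:, 0] * 8 / 255
print(stripes_per_block)
```

Because 8 is not a multiple of 3, some blocks contain three stripe rows and some only two, so the block averages follow a slow 3, 3, 2 pattern down the image. That slow variation is not in the original at all: it is the stripe frequency aliased to a lower frequency by the box-car.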