Page 4 of 4 - 02/27/07
Where do we go from here?
Two other compression techniques deserve to be mentioned, although they are not as common as DCT-based algorithms.
First, there is fractal compression, which works by looking for similar shapes (at different scales and angles) within an image. Despite much promise and the excitement always caused by the word "fractal", the fact is fractal compression has failed to deliver. Although very efficient in theory, real-world fractal compressors tend to produce worse results than JPEG, except at extremely high compression levels (where JPEG images look terrible and fractal-compressed images look slightly less terrible). In addition to that, fractal compression is slow and requires a great deal of human intervention. It is sometimes called "the graduate student algorithm", because the only way to get good compression is to lock a graduate student in a room with a computer until he manages to tweak the fractal compressor for the image you are trying to encode. The poor real-world performance of fractal compressors might be due to the fact that the process is covered by several patents, which limits the number of people and companies allowed to use and improve this technology.
The second method worth mentioning is wavelet-based compression. This encodes a signal as a series of scaled and translated copies of a finite (or fast-decaying) waveform. This concept actually has some similarities with the notion of a fractal, but it is different from fractal compression, since it does not look for similarity in the image itself. It also has some similarities with DCT, but is a considerably more complex process, and therefore slower to compute. Wavelet compression is used by the JPEG-2000 format (file extension ".JP2" or ".J2C"), which was meant to replace the original JPEG format. However, most web browsers don't have built-in support for JPEG-2000, and the format does not have support for EXIF data (used by digital cameras to store information about each photo directly in the JPEG file), so it hasn't quite caught on. Typically, JPEG-2000 files are 20% to 50% smaller than JPEG images, for the same visual quality. They also tend to have less obvious compression artifacts at very high compression levels (they look blurry instead of blocky). Support for JPEG-2000 can be added to browsers and image editing applications by installing the appropriate plug-ins.
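To make the idea of "scaled and translated copies of a waveform" a little more concrete, here is a minimal sketch using the Haar wavelet, the simplest wavelet there is. This is an illustration only: JPEG-2000 uses more elaborate wavelet filters, and the function names `haar_step` and `haar_inverse` are made up for this example. One step splits the signal into pairwise averages (a half-resolution copy) and pairwise differences (the detail needed to undo the averaging).

```python
def haar_step(signal):
    """One level of an unnormalised Haar transform on an even-length list."""
    pairs = list(zip(signal[0::2], signal[1::2]))
    averages = [(a + b) / 2 for a, b in pairs]  # coarse, half-resolution signal
    details = [(a - b) / 2 for a, b in pairs]   # what the averages threw away
    return averages, details


def haar_inverse(averages, details):
    """Rebuild the original samples from averages and details."""
    signal = []
    for avg, det in zip(averages, details):
        signal.extend([avg + det, avg - det])
    return signal


signal = [2, 4, 8, 12, 14, 18, 20, 24]
averages, details = haar_step(signal)
restored = haar_inverse(averages, details)
```

In smooth areas the detail values are small, so a lossy encoder can store them coarsely or drop them. Repeating the step on the averages gives the multiple scales mentioned above.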
The following figure shows a comparison between JPEG and JPEG-2000 at different file sizes.
Figure 19. JPEG vs. JPEG-2000
The JPEG encoder used was unable to produce a file smaller than 24 kB. The 12 kB JPEG-2000 file actually has slightly more detail than the 24 kB JPEG, with far less offensive artifacts. Even at 6 kB (despite having less detail) the JPEG-2000 version has a more "natural" look. Also worth noting is that the 48 kB JPEG-2000 file, despite appearing to be "visually lossless" (i.e., no noticeable artifacts), does not contain all the detail of the original (compare the three stray hairs near the top of the ear). Still, for a compression ratio of about 20:1 (10:1 compared to PNG), I suppose we can live with some hair loss.
All the images above are reproduced at a 2:1 ratio (with no interpolation) to make the compression artifacts easier to see. The final image file was saved in PNG format (lossless), so that no other compression artifacts were introduced.
Lossless lossy compression
I mentioned that some of the techniques used in lossy compression can also be used to improve lossless compression. Naturally, I don't mean just those very lucky and very rare cases where the lossy compression algorithm manages to describe the source data perfectly.
A common way to evaluate the differences between a (lossily) compressed version of an image and its original is to subtract the pixel values of one from the other and display the result as a new image (known as the "delta"). In that image, darker areas indicate smaller differences, brighter areas indicate bigger differences. The following figure shows the difference between the original image and a JPEG-compressed version at a 25:1 ratio.
Figure 20. Lossy compression + Delta = Original
The contrast of the delta image has been increased and all images are enlarged to 200% to improve visibility. Also, the delta image shown here is unsigned, since computer monitors can't display negative numbers (the actual data would contain both positive and negative numbers). Notice how the differences are bigger near edges or areas with small detail (which correspond to higher frequencies), and smaller in low-contrast areas.
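As a rough sketch of how such a delta image could be computed, the snippet below treats an image as a list of rows of greyscale values. The function name `delta_image` and the `gain` parameter (the contrast boost) are assumptions made for this illustration, not part of any particular tool.

```python
def delta_image(original, decoded, gain=1):
    """Return the signed difference and a displayable (unsigned) version of it."""
    signed = [
        [o - d for o, d in zip(row_o, row_d)]
        for row_o, row_d in zip(original, decoded)
    ]
    # Monitors cannot show negative values, so take the magnitude,
    # boost the contrast and clip to the 0..255 range.
    display = [[min(255, abs(v) * gain) for v in row] for row in signed]
    return signed, display


original = [[10, 12, 200], [11, 13, 198]]
decoded = [[10, 14, 190], [12, 13, 204]]
signed, display = delta_image(original, decoded, gain=8)
```

Only the signed version can be used to rebuild the original; the displayable version has lost the sign and is meant for inspection.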
Let's use our example from the previous sections (the lossy "LCS" gradient), and see how the difference data is calculated and stored:
| Original data (11 bytes): | [2] [4] [8] [12] [14] [18] [19] [24] [25] [28] [30] |
| Compressed data (4 bytes): | LCS [2] [11] [3] |
| Uncompressed data (11 bytes): | [2] [5] [8] [11] [14] [17] [20] [23] [26] [29] [32] |
| Difference (11 bytes): | [0] [-1] [0] [1] [0] [1] [-1] [1] [-1] [-1] [-2] |
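The table above can be reproduced with a few lines of code. This sketch assumes the three values after the "LCS" marker are read as start value, number of values and step, which is how the uncompressed row appears to be generated; `lcs_decode` is a hypothetical helper name.

```python
def lcs_decode(start, count, step):
    """Expand a (start, count, step) triple into a linear sequence."""
    return [start + i * step for i in range(count)]


original = [2, 4, 8, 12, 14, 18, 19, 24, 25, 28, 30]
approximation = lcs_decode(2, 11, 3)

# Difference ("error data") = original minus the lossy approximation.
difference = [o - a for o, a in zip(original, approximation)]

# Adding the difference back to the approximation restores the source exactly.
rebuilt = [a + d for a, d in zip(approximation, difference)]
```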
Using only the compressed data and the difference from the original (sometimes called "error data"), we could now rebuild the source data exactly (losslessly), simply by adding the two. Of course, as you probably noticed, adding the size of the compressed data and the difference data, we actually end up with 15 bytes, which is more than the original, so this isn't exactly a good idea. But you probably also noticed that the difference data doesn't vary much. In fact, here it varies only across 4 values (-2, -1, 0 and 1), and since our maximum error margin was defined as "3" during the compression, we know that the difference value is never going to be lower than -3 or higher than 3. In other words, the difference value will always be one of the following seven: -3, -2, -1, 0, 1, 2 or 3. These can be encoded using three bits, as follows:
011 = 3
010 = 2
001 = 1
000 = 0
101 = -1
110 = -2
111 = -3
This means we don't actually need 11 bytes to store the difference data for 11 pixels; we only need 33 bits, which is barely over 4 bytes. Let's say we store it in 5 bytes, so we don't have to deal with "hanging bits".
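Here is one way the packing could be done, following the 3-bit table above (the top bit is the sign, the other two bits are the magnitude). The helper names `encode3`, `pack` and `unpack` are made up for this sketch, and the bit order within each byte is an arbitrary choice.

```python
def encode3(value):
    """Map a value in -3..3 to its 3-bit sign-magnitude code."""
    if not -3 <= value <= 3:
        raise ValueError("difference outside the -3..3 error margin")
    return (0b100 if value < 0 else 0) | abs(value)


def pack(values):
    """Concatenate the 3-bit codes and pad with zeros to a whole number of bytes."""
    bits = "".join(format(encode3(v), "03b") for v in values)
    bits += "0" * (-len(bits) % 8)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


def unpack(data, count):
    """Read back `count` 3-bit codes from the packed bytes."""
    bits = "".join(format(byte, "08b") for byte in data)
    values = []
    for i in range(count):
        code = int(bits[3 * i:3 * i + 3], 2)
        magnitude = code & 0b011
        values.append(-magnitude if code & 0b100 else magnitude)
    return values


difference = [0, -1, 0, 1, 0, 1, -1, 1, -1, -1, -2]
packed = pack(difference)
restored = unpack(packed, len(difference))
```

Note that the decoder needs to know how many values to read (11 here), since the padding bits at the end would otherwise look like extra zeros.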
This means that, with the 4 bytes of compressed data plus 5 bytes of difference data (a total of 9 bytes), we can rebuild the 11 bytes of source data exactly. This might not seem all that impressive, but remember that this is data that did not follow any "neat" rule and did not have any repeated sequences, so RLE and LZ would not have been able to compress it at all. Also, remember that we are looking at an extremely small amount of data (just 11 pixels). In a real image (typically over 500 thousand pixels), using a more advanced algorithm, the lossy compression stage would be a lot more efficient.
On top of this, we could apply one or more lossless compression algorithms (like RLE, LZ or entropy coding) to the difference data, which would bring its size down further (since it can only have seven different values, repeated values and repeated patterns are very common, making it quite compressible).
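To get a feel for how much entropy coding might help, we can estimate the Shannon entropy of the difference data from the example. This is only a rough indication: it assumes each value is coded independently, ignores the cost of storing the code table, and 11 samples are far too few to be representative. The function name is an assumption for this sketch.

```python
from collections import Counter
from math import log2


def entropy_bits_per_symbol(values):
    """Shannon entropy of the value distribution, in bits per value."""
    counts = Counter(values)
    total = len(values)
    return -sum(c / total * log2(c / total) for c in counts.values())


difference = [0, -1, 0, 1, 0, 1, -1, 1, -1, -1, -2]
bits_per_value = entropy_bits_per_symbol(difference)
estimated_bits = bits_per_value * len(difference)
```

For this data the estimate comes out below 2 bits per value, compared to the 3 bits per value of the fixed-length code.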
Several lossless media file formats, such as PNG (image) and FLAC (audio), use a similar technique (prediction plus error coding) to achieve better compression than generic (redundancy-based) lossless algorithms. That is a special case of the lossy + delta approach that works one byte (or one sample) at a time, instead of using lossy compression on the entire file and calculating the delta at the end.
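A minimal sketch of prediction plus error coding, using the simplest possible predictor: "the next sample will be the same as the previous one". This is a simplification; PNG chooses among several predictors (filters) and works modulo 256 on bytes, and FLAC uses more sophisticated linear predictors. The function names are hypothetical.

```python
def predict_encode(samples):
    """Store each sample as its error relative to the previous sample."""
    residuals = []
    previous = 0  # the first sample is predicted as 0
    for sample in samples:
        residuals.append(sample - previous)
        previous = sample
    return residuals


def predict_decode(residuals):
    """Undo predict_encode by accumulating the errors."""
    samples = []
    previous = 0
    for residual in residuals:
        previous += residual
        samples.append(previous)
    return samples


original = [2, 4, 8, 12, 14, 18, 19, 24, 25, 28, 30]
residuals = predict_encode(original)
restored = predict_decode(residuals)
```

The residuals cover a much smaller range than the samples themselves, which is what makes the following lossless stage more effective.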
NOTE: If you're thinking that this sounds similar to the way ADPCM works, you are right. The difference is that ADPCM formats use a fixed (and reduced) number of bits to store the error, which can make the source data unrecoverable if the difference between the predicted value and the real value is greater than that number of bits can represent. To ensure that the original data can always be recovered, the difference must be stored with a number of bits that allows it to correct any error (this means that in extreme cases it must be as big as the sample itself).
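The effect described in the note can be demonstrated with a toy predictor whose error is clamped to a fixed range. This is not real ADPCM (which also adapts its step size); it is a simplified sketch, and `clamped_dpcm` and `max_error` are names invented for the example.

```python
def clamped_dpcm(samples, max_error=3):
    """Decode what a previous-sample predictor would yield if the stored
    error were limited to the range -max_error..max_error."""
    decoded = []
    previous = 0
    for sample in samples:
        error = sample - previous
        stored = max(-max_error, min(max_error, error))  # fixed-size error field
        previous += stored
        decoded.append(previous)
    return decoded


samples = [0, 2, 10, 11]
lossy = clamped_dpcm(samples, max_error=3)       # the jump from 2 to 10 does not fit
lossless = clamped_dpcm(samples, max_error=255)  # error field as big as the sample
```

With the small error field the decoder falls behind after the sudden jump and never reproduces the source exactly; with an error field large enough for any difference, the output matches the input.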
WavPack (a relatively unknown but very versatile open-source audio compression format developed by David Bryant) allows the user to save two separate files: one containing the lossily-compressed audio (similar to an MP3 file), and the other containing the difference information that can be applied to the former to rebuild the exact original data. This makes it possible, for example, to transmit several clips of lossily-compressed audio, listen to them, pick the ones that are going to be used, and then transmit the "correction" (difference) files only for those clips. Another possibility would be to keep one's music collection in this dual file format at home (for listening on a high-end sound system), copying only the lossily-compressed files to a portable music player, to save space.
Conclusion
And so we come to the end of the second part of this series. I hope you found it interesting and accessible.
I deliberately left out a lot of technical details, and used (fictitious) simplified examples to explain some concepts, because, as algorithms get more complex and as file formats start to use multiple algorithms, the amount of space required to describe them accurately grows exponentially. Unless you are a programmer, you probably don't need to worry about those details, but if you are curious, you'll be able to find a lot of information on the internet.
The next article will describe how some of these techniques can be applied to video and how other (video-specific) techniques are used by modern encoders. It will also include a list of links to web sites with more information about compression and media file formats.
Sound frequency sensitivity chart and DCT pattern table copied from public domain sources.
Article text & all other images © Rui del-Negro, 2007
Rui del-Negro works in Portugal as an animator, editor, designer, consultant and occasional pastry chef. After studying systems and software engineering in college, he got caught in a downward spiral of low-level programming and at one point was doing several hundred lines of assembly every day. He managed to recover by clinging on to his original digital love: graphics. Old habits die hard, though, and he still tends to launch into long-winded tirades about how programmers these days can't even code a decent alpha matte handler. You can contact him through his website (http://dvd-hq.info).