Technology: Page (1) of 4 - 02/27/07 Email this story to a friend. email article Print this page (Article printing at MyDmn.com).print page facebook
Data Compression Basics - Part 2 Lossy Compression By Rui del-Negro

Foreword

This is the second in a series of three articles about data compression, with particular focus on media (images, audio and video). It does not require any knowledge of information theory, mathematics or programming. I have tried to use language accessible to anyone without sacrificing the correctness and thoroughness of the information provided.

When some issues are considered more complex (or less relevant), they will be placed inside a note with a yellow background. You don't need to read the yellow notes to understand the fundamental principles being described, but you might want to check them out anyway, especially if you noticed a potential problem with some special case of the processes described in the previous paragraph(s).

Binary digit (bit) will sometimes be abbreviated as "b", and bit octet (byte) will sometimes be abbreviated as "B". Those abbreviations might be prefixed by standard SI magnitude symbols like "k" (kilo), "M" (mega) or "G" (giga). So "kb" stands for "kilobit", "kB" stands for "kilobyte", "Mb" stands for "megabit", and so on. As a general rule, the factor used is 1024, not 1000 (i.e., 1 kB = 1024 bytes, 1 MB = 1024 kB, etc.).

Some words in this article may have a dotted red undeline. Those are sponsored links, and their contents is not directly related to this article.

 

Introduction

In part 1 of this article of this article we looked at two types of lossless data compression algorithms (RLE and dictionary encoders), as well as at some of the techniques that can be used to improve their performance (reordering of the source data, entropy coding, etc.).

In part 2 we're going to look at some of the techniques used in lossy compression (where the data obtained after decompression is not necessarily identical to the original), and how some of those techniques can be adapted to improve lossless compression.

NOTE

Most of the concepts and techniques described in this article are quite complex, and their underlying algorithms would take several pages of software code and / or mathematical formulas to explain fully. For this reason, I have opted for a more abstract approach, explaining how the technique works, rather than how it's implemented by specific programs or file formats. So if this article says "file format X uses this technique", that does not mean it uses only that technique, or that it does not refine that technique far beyond what is described here.

Data versus information

In the first article I mentioned that it was important to distinguish data from information. Although the term "information" can be used in a mathematical sense, in practical terms we can think of "data" as what the computer manipulates and "information" as what that data means to us. Completely different chunks of data can have virtually identical meanings, and identical chunks of data can mean different things depending on the software used to interpret or display them. For example:

$ 1,000,000.00

Gives us almost exactly the same information as:

One million dollars.

And this sequence of bytes:

[65][66][67]

Will render as:

  • "ABC" in a text file
  • A sequence of three dark gray pixels in an 8-bit grayscale image file
  • A single dark gray pixel with a slight blue cast in a 24-bit RGB file
  • ...etc.

Or rather, what matters isn't the file itself, it's the software used to load, interpret and display it. Generally, computer programs and operating systems will look at the file name (especially the extension) or the file headers (the data at the beginning of the file) to determine how it should be interpreted and displayed. That is why changing the file extension can make a file (apparently) impossible to open, although none of the data in it has actually been changed.

But this is an article about data (and information) compression, not cryptology, so what really matters to us is the first situation: how can we change the representation of a particular piece of information (an image, a song, etc.) so that it takes up as little space as possible?

Compression as description

We tend to think of compressed files as somehow containing the original data. Basic lossless compression techniques do indeed preserve part of the source data, changing only the representation of values or segments that are repeated. But, in practical terms, "compression" is a generic term for a change in the representation of information that takes up less space, and from which the original information can be recovered. This new representation might in fact not contain any of the original data.

For example, if our source data consisted of this:

2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34 ...etc. (up to 1000)

We could represent it instead as the following sentence:

A sequence of the first five hundred positive even numbers.

That might not be a highly optimized representation, but it is shorter than the original, it doesn't contain any of the original data in a literal form, and it's easy to "decompress" (assuming our "decompressor" understands the sentence, of course).

Basic RLE and LZ algorithms would be unable to compress the data in that example, because there is no redundancy. But if we create a compressor that looks for that kind of pattern (a linear progression of values) and stores its "description" using some optimized encoding, and if we create a decompressor that can recreate the original data from that description, we might be able to store that information in a much smaller file. For example, the sequence above could also be described as "a sequence of 500 numbers starting with 2 and increasing by 2". And this could be encoded (for example) like this:

IS [2] [500] [2]

Where data copied from the source is shown in blue, compression counters are shown in red and control codes are shown in magenta. In the first part of the article we used a simple marker to toggle between compressed and uncompressed blocks. But if we want our compressor to be able to deal with different types of data patterns (encoding each in a different way), we need to use multiple codes (which I will here identify with a group of letters but, in an actual compressed files, could generally be stored in a single byte.

Upon receiving the data above, our decoder would look at it and say "Aha, an 'IS' code. That means I need to read the next three integers, and then output a sequence starting with the first of those integers, with a total length equal to the second integer, and a step equal to the third integer." And it would recreate our original data exactly (in other words, this is still lossless compression).

This kind of technique, where the compressor looks for, interprets, and encodes patterns of data variation, and not just repeated data, has the potential to achieve much better compression than "simple" compressors (like RLE and LZ), that treat all data the same way. Most real-world LZ-style compressors use some of these techniques, in addition to their "main" compression algorithm.

In a way this concept is similar to one of the first optimizations we looked at (in the section about the RLE algorithm), where we changed the order of the RGB data, sorting it by channel, because we knew that the data in 8-bits-per-channel RGB files tends to follow certain patterns. Applied to other types of data (ex., audio or text files), that reordering wouldn't have had any benefits. Likewise, the example above (treating linear progressions of integers as a special case) is only useful in files where that kind of pattern occurs. In all other files, it will just make our encoder slower (because it's going to waste a lot of time looking for patterns that aren't there).

So, where do these kinds of smooth progression occur? You guessed it: in audio and video files.

But, while computer-generated files might have perfect progressions (ex., a linear gradient between two colors, or a perfect sine wave in an audio file), files created from "natural" sources, such as photographs, films, video or audio recordings, generally will not. This can be either because the source itself doesn't follow a "neat" rule such as a linear or exponential progression, or because noise was introduced during the acquisition process (ex., video grain).

This greatly reduces the efficiency of our data compression algorithms, and forces us to look more closely at the distinction between data and information.

How much information do you need?

The following image shows a sequence of 18 pixels (enlarged) with increasing brightness values:

Grayscale gradient
Figure 1. - An 18-pixel linear gradient (enlarged)

Using an 8-bit (0-255) scale, each pixel's brightness differs from the previous one by 15. In other words, their grayscale values are:

[0] [15] [30] [45] [60] [75] [90] [105] [120] [135] [150] [165] [180] [195] [210] [225] [240] [255]

This kind of progression could be compressed easily using the technique described above:

LCS [0] [18] [15]

Where "LCS" represents our code for "linear character sequences". Remember that "character" is a generic name for a digital value with 1 byte, like the ones used in an 8-bit grayscale image. We won't be discussing text compression in this article, so whenever you see a value described as a "character", that means simply that it's stored as a single byte.

This brings the size down from 18 bytes to just 4 (assuming we use byte values for the three parameters, and also a single byte to define the code).

Now, let's say we change the value of the 10th pixel from 135 to 136. Our encoding technique is now unable to describe the complete sequence, but we can still store our data in three "blocks":

LCS [0] [9] [15] S [1] [136] LCS [150] [8][15]

Where "S" represents our code for "simple" or "straightforward" data (followed by the length of the simple data block, which in this case is just one (byte), and then by the value itself, which in this case is 136. The beginning and end of the gradient are encoded as "linear character sequences", as above.

This (11 bytes) is still shorter than the original uncompressed data, but not nearly as good as when the progression was perfect. And if we "jiggle" a few more values slightly, getting this:

[0] [13] [31] [44] [60] [76] [92] [106] [120] [135] [149] [165] [181] [195] [210] [225] [239] [255]

Our encoding technique becomes completely useless... or does it?

Here is our original sequence of pixels:

Grayscale gradient
Figure 1b. - An 18-pixel linear gradient (enlarged)

And here is our "jiggled" sequence of pixels:

Irregular gradient
Figure 2. - An 18-pixel irregular gradient (enlarged)

Can you tell the difference? Let's put them right next to each other:

Two gradients
Figure 3. - Comparing the two gradients

Now can you tell the difference? If you have a good monitor and look very closely, you might indeed notice a difference between the two (especially on the 7th pixel). Would you still notice it if the pixels were shown in their actual size? Probably not, but, even if you did, that is not the real question. The real question is this: do you feel that the two pixel sequences (the perfectly linear one and the "jiggled" one) supply you with different information? If one was replaced with the other, in the middle of an image, do you think that would make any difference to your perception and interpretation of the image?

 

Page: 1 2 3 4 Next Page


Related Sites: Digital Producer ,   IBC News ,   BN - Broadcast Newsroom ,   Digital Post Production ,   Digital Pro Sound ,   Presentation Master ,   Oceania ,   IBN - IT Business Net
Related Newsletter: DMN Newsletter ,   Waveform Newsletter ,   Timeline Newsletter ,   Pixels Newsletter ,   Loud Newsletter ,   KNews Newsletter ,   Digital Media Net ,   DMNForums ,   IBN - IT Weekly Newsletter
To Comment on This Article, Click HERE

Most Recent Reader Comments:
  • Data Compression Basics - Part 2 by DMN Editorial at Feb. 28, 2007 9:07 pm gmt (Rec'd 5)

    Click Here To Read All Posts
    Must be Registered to Respond (Free Registration!!!, CLICK HERE)

  • DMO TEXT LINKS
    (Click here to place a textlink on this site)

    AV-HS450 16+ Input HD/SD Switcher w/ dual screen MultiViewer
    This 16+ input switcher is the best value in its class.
    Its exclusive dual screen MultiViewer lets you view up to 20 windows on just two displays.
    CLICK HERE!!!

    Vegas Pro 8 + Free Vegas Seminar Series
    Master classes for cutting-edge video production
    A $100 value free with purchase
    CLICK HERE!!!

    Keep your camera rolling with Sony Mobile Storage for XDCAM EX Camcorders.
    Offload SxS memory cards onto 240GB HDD removable cartridges and accelerate your workflow on the go.
    Click Here!!!


    hd camcorder
    VIXIA HF200 SD/SDHC HD Camcorder (15x Opt, 300x Dig, 2.7" LCD)
    By Canon, start from $ 519.00
    With 2 Reviews.
    VIXIA HF20 32GB Flash Drive HD Camcorder (15x Opt, 300x Dig, 2.7" LCD)
    By Canon, start from $ 569.00
    VIXIA HF S10 32GB Flash Drive HD Camcorder (10x Opt, 200x Dig, 2.7" LCD)
    By Canon, start from $ 789.00
    With 1 Reviews.
    Handycam HDR-XR500V 120GB Hard Drive HD Camcorder (12x Opt, 150x Dig, 3.2" LCD)
    By Sony, start from $ 799.00
    With 2 Reviews.
    VIXIA HF S100 SDHC Flash Memory HD Camcorder (10x Opt, 200x Dig, 2.7" LCD)
    By Canon, start from $ 715.00
    With 2 Reviews.
    UltraHD 8GB Flash Drive HD Camcorder - Black (2x Dig, 2" LCD)
    By Flip Video, start from $ 129.99
    With 1 Reviews.
    HOT THREADS on DMN Forums
    Content-type: text/html  Rss  Add to Google Reader or
Homepage    Add to My AOL  Add to Excite MIX  Subscribe in
NewsGator Online 
    Real-Time - what users are saying - Right Now!
        • Re: Reverse Mode in Vegas 9 • gspencer
    @ Copyright, 2010 Digital Media Online, All Rights Reserved