Compression Primer for PNG Review

Version 1.2      10 Mar 96
Updated: Minor corrections to e-mail addresses and URLs
Version 1.2.1    24 Jul 99
Version 1.2.2    03 Aug 00

Vincent Sabio
U.S. Army Research Laboratory (formerly)
Return Path, Inc. (currently)

Copyright 1996, 1999, 2000 Vincent Sabio

--------------------------------------------------------

This document can be retrieved as follows:

   via FTP:
   via HTTP:

There is also an accompanying review of the Portable Network Graphics (PNG) format:

   via FTP:
   via HTTP:

--------------------------------------------------------

As promised, I have put together a "primer" on compression. Hopefully, it will be clear to most of you. It delves into some moderately technical areas, but I've tried to either keep the math simple or eliminate it altogether (wherever possible).

Please note that this is intended to be a primer for the layman -- it is NOT a rigorous treatment of the technical aspects of compression! Therefore, please do not send me messages like, "that assumption can only be made if the noise process is statistically white, so the signal values are diagonalized in the matrix decomposition, and all the off-diagonal values are identically zero." First of all, you'll be speaking about a topic that isn't even discussed in this primer, plus you'll be boring me. I have tried to present a very high-level introduction to the topic, and have hopefully geared it to the non-technical reader.

"Okay, so what's the purpose of this thing?"

The purpose of this primer is to give you some background in compression theory, without delving into all the information theory required to prove the concepts. This background should enable you to comprehend some of the more technical aspects of the forthcoming PNG review. It will also allow me to present a straightforward review of the PNG spec--and provide JPEG & GIF comparisons--without having to digress into the topics covered here (i.e., I can simply assume that all of this information is known when I write the PNG review).

"What if I don't understand some of it?"

Don't panic. Most of the PNG review will probably be in nice, easy-to-understand terminology. I've packed most of the really technical information in here, which frees me to just discuss the specification when I get around to writing the review.

"What if I don't understand ANY of it?"

Well, then you'll probably understand less than half of the PNG review when it comes out. Not a cause for alarm--if none of it makes any sense to you, just press the DELETE key. :-)

"I really don't care about all this technical stuff--can you just give me a sneak preview?"

Sure. PNG is a remarkably well-developed spec that is targeted toward the WWW (although other applications are certainly not out of the question). It is a lossless compression scheme that incorporates gamma correction, an optional alpha channel with full-range transparency (unlike GIF's on/off transparency), two-dimensional interlacing (*very* nice, BTW), some very good (and very expandable) filtering schemes, and a public-domain compression engine that has been around for a while (and has thus proven itself). The scheme is designed to be flexible for individual purposes, provides for a limited form of forward compatibility (Very impressive, BTW.
Years down the road, this will mean that an application using an "older format" PNG engine can *possibly* open and display a "newer format" PNG file--and will be able to evaluate for itself, on the fly, whether it will be able to do so correctly), and even provides for embedded text strings. How does that sound?

"Why are you so long-winded?"

Beats me. It's certainly not because I've got the *time* to be long-winded ... it just happens that way.

"Okay, when can we get started on the compression primer?"

Right now.

------------------------------------------------------------

... I assume that what you are asking is, "can we design a compression algorithm that can do some extra crunching and get the file extra small, so it transfers faster?" The answer is a resounding, unqualified "Well, yes and no -- maybe. Depends on the file (and the compression algorithm)."

And herein lies the key to file compression: not all data are created equal. Unlike encryption, where one can simply process the heck out of the file, there are real limiting factors in file compression. All of this is neatly bundled in some grad-level courses on information theory, but the lay version is thus (with apologies to the information theorists for the grossly oversimplified treatment):

1. Information content: No matter what, the resultant file must always contain enough information to decompress it. The decompression algorithm also contains information--and this will decrease the amount of information that must be contained within the compressed file to expand it.

2. Correlatedness of data: If the data in the file are highly "correlated," then the file can *potentially* be highly compressed. As the data become less correlated, you begin to lose compressibility. And the "correlatedness" of the data in a file is partly a function of the compression algorithm being used.

Take an extreme example: Suppose we have a compression program that uses what we'll call a "contiguous-string" compression algorithm--i.e., it looks for contiguous strings of 1s and 0s. Next, suppose we have a 100-byte file containing all zeros; we'll call this "Case A":

   Case A:  0000000000000000 ...

Our contiguous-string compression algorithm can compress this file to a single command instructing the expander to insert 100 zeros in place of the command. Clearly, we have achieved very high compression in this example.

Now, if we insert a "1" somewhere around the middle of the file, then we have to break the single command into three commands--one that states "insert 42 zeros" (e.g.), one that states "insert a one," and finally one that says "insert 57 zeros"--where 42 + 1 + 57 = 100. Inserting the "1" has decorrelated the data (however slightly), and results in a larger compressed file.

Continuing in this manner (i.e., adding 1s to the file) will generally continue to increase the resultant compressed-file size, until we reach the worst-case situation--when the file to be compressed contains only alternating 1s and 0s; we'll call this "Case B":

   Case B:  1010101010101010 ...

Now, using our contiguous-string algorithm, the compressed file will be larger than the original. (Insert "!" here.)

Suppose, instead, that we select an alternate compression algorithm--one that looks for *alternating* 1s and 0s. Now, this "alternating-string" algorithm will compress the "Case B" file to a single command that instructs the expander to insert 50 occurrences of "10" (not "ten," but "one-zero").
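To make the toy example concrete, here is a minimal sketch (in Python) of the two schemes described above. The function names are my own and purely illustrative--neither is a real compressor:

    # Toy illustration only -- not real compressors.
    def contiguous_string_compress(bits):
        """Encode runs of identical symbols as (symbol, run_length) commands."""
        commands = []
        run_sym, run_len = bits[0], 1
        for b in bits[1:]:
            if b == run_sym:
                run_len += 1
            else:
                commands.append((run_sym, run_len))
                run_sym, run_len = b, 1
        commands.append((run_sym, run_len))
        return commands

    def alternating_string_compress(bits):
        """Encode runs of the alternating pattern "10" as ("10", count) commands."""
        commands, i = [], 0
        while i < len(bits):
            count = 0
            while i + 1 < len(bits) and bits[i] == "1" and bits[i + 1] == "0":
                count += 1
                i += 2
            if count:
                commands.append(("10", count))
            else:
                commands.append((bits[i], 1))   # lone symbol that breaks the pattern
                i += 1
        return commands

    case_a = "0" * 100    # Case A: all zeros
    case_b = "10" * 50    # Case B: alternating ones and zeros

    print(contiguous_string_compress(case_a))    # 1 command:    [('0', 100)]
    print(contiguous_string_compress(case_b))    # 100 commands: one per symbol
    print(alternating_string_compress(case_b))   # 1 command:    [('10', 50)]

Either way, the "compressed" output is just a list of commands for the expander; the point is how many commands each algorithm needs for each file.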
We have once again achieved very high compression--in a situation where the previous compression algorithm resulted in a larger file than the original. But if we were to apply the alternating-string compression algorithm to the "Case A" file, it would perform quite poorly (in fact, the "compressed" file would be larger than the original--the same result we had for the "Case B" file with the contiguous-string compressor).

Basically, we have seen a simple example of why different compression algorithms are best suited to specific data sets--i.e., "correlatedness" is defined in terms of both the data in the file AND the algorithm being used to compress the data.

3. Linear decompositions and lossy compression: A "lossy" compression scheme--such as JPEG--attempts to trade image quality for disk space (or communications channel bandwidth). One approach is to try to determine, in a sense, which are the "major" contributors to the image and which are the "minor" contributors. Then the minor contributors can presumably be removed without substantially affecting the quality of the image.

Well, that sounds nice, but how do we determine what qualifies as a "major contributor" or a "minor contributor"? This gets a little technical if you're not familiar with Fourier series and power spectral densities, but I'll give it a try ...

We're all familiar with sound, and with the difference between a high-frequency sound (in lay terms, a "high-pitched" sound) and a low-frequency ("low-pitched") one. Music is generally a compilation of many different frequencies over the range of human hearing (20 Hz to 20 kHz). A graphic equalizer sorts these frequency components into different frequency "bins"--where each bin represents a range of frequencies (say, 20-100 Hz, 100-500 Hz, 500-2500 Hz, 2500-10000 Hz, 10-20 kHz). In signal processing, we refer to the "full range"--in this case, 20 Hz to 20 kHz--as a "band" or a "full band," and the divisions are referred to as "subbands."

Now, if we had a spectrum analyzer (this is becoming one really nice stereo system ...), we could look at the display and see the amount of "power" (being delivered to the speakers) within each subband. Suppose, as the music plays, we notice there is a specific subband that stays at a very low power level throughout the song. Theoretically, we could eliminate this subband from the music altogether, without suffering any substantial degradation in the sound quality--possibly, the difference would not even be noticeable. (There are audiophiles wincing right now.)

This is one of the methods used by compressors such as JPEG. A Fourier transform--on which JPEG is based (and for those of you who are about to say, "No, it's based on a DCT!", note that the discrete cosine transform is a modification of the Fourier transform)--is a "mathematical" spectrum analyzer: it splits the signal--in this case, an image--into many different subbands. Now, the operator (that's you) is generally asked how much quality he's willing to sacrifice for compression, and this sets an adaptive threshold in the compression algorithm--any subbands with more power than the threshold are contributing substantial power to the image, and are kept. Anything below the threshold is eliminated.
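Purely to illustrate the "keep the strong subbands, drop the weak ones" idea--and NOT as a description of how JPEG actually works (JPEG uses block-wise DCTs, quantization tables, and entropy coding)--here is a minimal Python sketch using NumPy; the function names and the threshold choice are my own:

    # Illustration of transform-domain thresholding, not of JPEG itself.
    import numpy as np

    def threshold_compress(signal, keep_fraction=0.05):
        """Keep only the strongest Fourier coefficients; zero out the rest."""
        coeffs = np.fft.rfft(signal)
        power = np.abs(coeffs) ** 2
        # "Adaptive threshold": keep the top keep_fraction of subbands by power.
        cutoff = np.quantile(power, 1.0 - keep_fraction)
        return np.where(power >= cutoff, coeffs, 0)

    def expand(kept_coeffs, length):
        """Rebuild an approximation of the signal from the surviving coefficients."""
        return np.fft.irfft(kept_coeffs, n=length)

    # A signal dominated by two strong low-frequency components plus weak noise.
    t = np.linspace(0.0, 1.0, 512, endpoint=False)
    signal = (np.sin(2 * np.pi * 5 * t)
              + 0.5 * np.sin(2 * np.pi * 12 * t)
              + 0.01 * np.random.randn(512))

    kept = threshold_compress(signal)
    approx = expand(kept, len(signal))
    print("coefficients kept:", np.count_nonzero(kept), "of", len(kept))
    print("max reconstruction error:", np.max(np.abs(signal - approx)))

Only a handful of coefficients survive, yet the reconstruction error stays small--exactly the "minor contributors can be discarded" trade that the operator's quality setting controls.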
There are several shortcomings of the Fourier transform (FT) family (including the DCT) where their application to compression is concerned; we'll briefly discuss two of them here. Note that these shortcomings are inherent in JPEG compression, which relies upon these transforms. Note, also, that I am referring to the short-time Fourier transform (STFT) and its derivatives; the first shortcoming does not hold for the infinite Fourier transform--but then, I've never seen an infinitely large image, either. :-) And, finally, a technical note: the thing that we generally refer to as the Fourier transform (FT) is really the Fourier series (FS); I will use the more common "Fourier transform" nomenclature.

i. The STFT and DCT are not orthogonal decompositions, even though many people (including many engineers!) believe that they are. So, what does this mean to you? Well, "orthogonal" = "lossless"--so even "lossless JPEG" isn't truly lossless! It will, in fact, introduce artifacts into the image. The severity of the artifacts is generally minimal, and is a function of the image bandwidth (i.e., the size of the "full band"). For those of you who are familiar with the FT and the FFT, note that the FFT is a time-localized FT, where the localization has been accomplished by square-windowing the FT basis functions. This windowing process leads to the introduction of Gibbs-phenomenon noise (recall that a finite number of sine waves cannot accurately reproduce a square wave). In any event, this is really a technicality--the noise introduced is typically very, very small, and can be below the threshold at which it can even begin toggling bits--so "lossless JPEG" can still be considered lossless, for all intents and purposes.

ii. The FT and derivative transforms are not "constant Q" across subbands--i.e., the ratio of their bandwidths to their center frequencies is not constant, but changes from subband to subband. In particular, the *bandwidth* of each subband is fixed across all subbands for a particular transform. The problem this introduces is one of measuring the power spectral density (PSD) accurately for the purposes of compression--namely, the higher subbands (smaller Q) are relatively "narrow," and will thus have artificially smaller PSD values; vice versa for the lower subbands. When analyzing the PSD for compression purposes, it's relatively easy to impose a non-linear threshold-weighting function, but this is still not ideal.

Note that both shortcomings discussed here can be overcome with the wavelet transform. Just figured I'd mention that; I won't digress any further by launching into a discussion of the WT.

4. Dynamic range: "Dynamic range" is commonly referred to as "bit depth." Simply put, a greyscale file in which each pixel is represented by 24 bits has substantially greater dynamic range than a greyscale file in which each pixel is represented by 8 bits.

4a. Excess dynamic range

Suppose we have pixels that are represented as 3-byte unsigned integer data--i.e., 24 bits per pixel. And let's say that the image pixels only "use" 16 out of each of those 24 bits (for now, we'll take the simplest case, and assume that the high-order 8 bits are always zero). Then we can eliminate 8 bits from each 24-bit pixel--resulting in an immediate compression of 33%.
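A minimal Python sketch of 4a (assuming, as above, that the high-order byte of every 24-bit sample really is always zero; the function names are mine, purely for illustration):

    # Illustration of 4a: if the top 8 bits of every 24-bit pixel are always
    # zero, they carry no information and can simply be dropped.
    def pack_24_to_16(pixels):
        """pixels: 24-bit unsigned ints whose high-order byte is always zero."""
        out = bytearray()
        for p in pixels:
            assert p < (1 << 16), "high-order byte is not zero -- cannot drop it"
            out += p.to_bytes(2, "big")               # keep only the low 16 bits
        return bytes(out)

    def unpack_16_to_24(packed):
        """Restore the 24-bit values (the dropped high byte comes back as zero)."""
        return [int.from_bytes(packed[i:i + 2], "big")
                for i in range(0, len(packed), 2)]

    pixels = [513, 70, 4095, 255]                     # all fit in 16 bits
    packed = pack_24_to_16(pixels)
    assert unpack_16_to_24(packed) == pixels          # no information lost
    # 2 bytes per pixel instead of 3: the 33% reduction described above.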
4b. Frequency-domain filtering

Now, suppose the PSD of the image pixels looks like this (view with a mono-spaced font, like Courier):

   |
   |       **
   |      *  *
   |     *    *
   |    *      *
   |   *        *
   |  *          *
   | *            *
   |*              *
   |*               *
   --------------------------|----------------------------|---
                             f1                           f1
                             --
                             2

... where f1 is the bandwidth of the image, and f1/2 is the half-band point.

Given such a distribution (which is not uncommon in imagery, among other applications), we can half-band filter the image data, effectively splitting the information into its high-band and low-band components. Note, now, that the upper half of the band has much lower overall power--and thus requires less dynamic range to represent it--than the lower half. So, we can compress the data in the upper half of the band into smaller words (smaller, that is, than the word size required to represent the full-band data), in a manner similar to that discussed in 4a. This can also result in substantial savings, with no loss of information content.

4c. Time-domain filtering

(Most of the claims made in this section should be preceded by the caveat, "Statistically speaking ....")

Because any given pixel is often well correlated with its neighbors, we can use the neighboring pixel values to "predict" the value of the pixel in question. If we simply take the difference between the predicted value and the actual value, we decorrelate the pixel from its neighbors. But because the correlation was initially quite high, the "residue" is generally pretty small.

For example, suppose the pixel values in a small neighborhood of 8-bit greyscale data are as follows:

   100  120  110
   130  105  100
   120  100  130

Suppose, now, that we wish to "estimate" the center pixel by simply averaging the surrounding pixels and rounding to the nearest integer:

   predicted = round((100 + 120 + 110 + 130 + 100 + 120 + 100 + 130) / 8) = 114

Taking the difference between the predicted and actual values:

   filtered = 114 - 105 = 9

This is just one example, but suppose that the data in the image are correlated enough that this scheme yields a maximum filtered-pixel magnitude of (say) 31. Then every filtered pixel in the image can be represented by 5 bits (5 = log2(32)) instead of 8 bits (setting aside the sign of the residue, for simplicity)--again, a substantial savings. If our assumption of correlated data holds over the entire image, we can reduce the dynamic range of the majority of the image (say, all but the first scan line in the image, or all but the first and last lines, or all but the outermost pixels, etc.) by this technique.

------------------------------------------------------------

This has been a very brief discussion of some of the basic concepts involved in compression. Certainly, those of you who are very familiar with compression would have a lot to add to this--and a lot of qualifications to make--much of which has been left out in the course of this treatment. My goal here was to get the concepts across, not to provide a rigorous treatment of the subject matter--which would surely have lost the majority of the audience. For the rest of you, I hope this was reasonably clear.

I will follow up sometime soon with the review of the PNG spec*. Meanwhile, if you have any questions about the content of this primer, please feel free to drop me a note. And, if you would, please let me know if this was useful and (or?) informative to you, or if it was too technical (or too elementary), etc.

Thanks!

- Vince Sabio
  vince-png@vjs.org

*UPDATE 24 Jul 99: The PNG Review can be found here:

   via FTP:
   via HTTP: