Compression Primer for PNG Review

Version 1.2      10 Mar 96
Updated: Minor corrections to e-mail addresses and URLs
Version 1.2.1    24 Jul 99
Version 1.2.2    03 Aug 00

Vincent Sabio
U.S. Army Research Laboratory (formerly)
Return Path, Inc. (currently)

Copyright 1996, 1999, 2000 Vincent Sabio

--------------------------------------------------------

This document can be retrieved as follows:

   via FTP:
   via HTTP:

There is also an accompanying review of the Portable Network Graphics (PNG) format:

   via FTP:
   via HTTP:

--------------------------------------------------------

As promised, I have put together a "primer" on compression. Hopefully, it will be clear to most of you. It delves into some moderately technical areas, but I've tried to either keep the math simple or eliminate it altogether (wherever possible).

Please note that this is intended to be a primer for the layman -- it is NOT a rigorous treatment of the technical aspects of compression! Therefore, please do not send me messages like, "that assumption can only be made if the noise process is statistically white, so the signal values are diagonalized in the matrix decomposition, and all the off-diagonal values are identically zero." First of all, you'll be speaking about a topic that isn't even discussed in this primer, plus you'll be boring me. I have tried to present a very high-level introduction to the topic, and have hopefully geared it to the non-technical reader.

"Okay, so what's the purpose of this thing?"

The purpose of this primer is to give you some background in compression theory, without delving into all the information theory required to prove the concepts. This background should enable you to comprehend some of the more technical aspects of the forthcoming PNG review. It will also allow me to present a straightforward review of the PNG spec--and provide JPEG & GIF comparisons--without having to digress into the topics covered here (i.e., I can simply assume that all of this information is known when I write the PNG review).

"What if I don't understand some of it?"

Don't panic. Most of the PNG review will probably be in nice, easy-to-understand terminology. I've packed most of the really technical information in here, which frees me to just discuss the specification when I get around to writing the review.

"What if I don't understand ANY of it?"

Well, then you'll probably understand less than half of the PNG review when it comes out. Not a cause for alarm--if none of it makes any sense to you, just press the DELETE key. :-)

"I really don't care about all this technical stuff--can you just give me a sneak preview?"

Sure. PNG is a remarkably well-developed spec that is targeted toward the WWW (although other applications are certainly not out of the question). It is a lossless compression scheme that incorporates gamma correction, an optional alpha channel with full-range transparency (unlike GIF's on/off transparency), two-dimensional interlacing (*very* nice, BTW), some very good (and very expandable) filtering schemes, and a public-domain compression engine that has been around for a while (and has thus proven itself). The scheme is designed to be flexible for individual purposes, provides for a limited form of forward compatibility (Very impressive, BTW.
Years down the road, this will mean that an application using an "older format" PNG engine can *possibly* open and display a "newer format" PNG file--and will be able to evaluate for itself, on the fly, whether it will be able to do so correctly), and even provides for embedded text strings. How does that sound?

"Why are you so long-winded?"

Beats me. It's certainly not because I've got the *time* to be long-winded ... it just happens that way.

"Okay, when can we get started on the compression primer?"

Right now.

------------------------------------------------------------

... I assume that what you are asking is, "can we design a compression algorithm that can do some extra crunching and get the file extra small, so it transfers faster?" The answer is a resounding, unqualified "Well, yes and no -- maybe. Depends on the file (and the compression algorithm)."

And herein lies the key to file compression: not all data are created equal. Unlike encryption, where one can simply process the heck out of the file, there are real limiting factors in file compression. All of this is neatly bundled in some grad-level courses on information theory, but the lay version is thus (with apologies to the information theorists for the grossly oversimplified treatment):

1. Information content: No matter what, the resultant file must always contain enough information to decompress it. The decompression algorithm also contains information--and this will decrease the amount of information that must be contained within the compressed file to expand it.

2. Correlatedness of data: If the data in the file are highly "correlated," then the file can *potentially* be highly compressed. As the data become less correlated, you begin to lose compressibility. And the "correlatedness" of the data in a file is partly a function of the compression algorithm being used.

Take an extreme example: Suppose we have a compression program that uses what we'll call a "contiguous-string" compression algorithm--i.e., it looks for contiguous strings of 1s and 0s. Next, suppose we have a 100-byte file containing all zeros; we'll call this "Case A":

   Case A:  0000000000000000 ...

Our contiguous-string compression algorithm can compress this file to a single command instructing the expander to insert 100 zeros in place of the command. Clearly, we have achieved very high compression in this example.

Now, if we insert a "1" somewhere around the middle of the file, then we have to break the single command into three commands--one that states "insert 42 zeros" (e.g.), one that states "insert a one," and finally one that says "insert 57 zeros"--where 42 + 1 + 57 = 100. Inserting the "1" has decorrelated the data (however slightly), and results in a larger compressed file.

Continuing in this manner (i.e., adding 1s to the file) will generally continue to increase the resultant compressed-file size, until we reach the worst-case situation--when the file to be compressed contains only alternating 1s and 0s; we'll call this "Case B":

   Case B:  1010101010101010 ...

Now, using our contiguous-string algorithm, the compressed file will be larger than the original. (Insert "!" here.)

Suppose, instead, that we select an alternate compression algorithm--one that looks for *alternating* 1s and 0s. Now, this "alternating-string" algorithm will compress the "Case B" file to a single command that instructs the expander to insert 50 occurrences of "10" (not "ten," but "one-zero").
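To make the toy example concrete, here is a minimal sketch (in Python) of the two schemes described above. The function names are my own and purely illustrative--neither is a real compressor:

    # Toy illustration only -- not real compressors.
    def contiguous_string_compress(bits):
        """Encode runs of identical symbols as (symbol, run_length) commands."""
        commands = []
        run_sym, run_len = bits[0], 1
        for b in bits[1:]:
            if b == run_sym:
                run_len += 1
            else:
                commands.append((run_sym, run_len))
                run_sym, run_len = b, 1
        commands.append((run_sym, run_len))
        return commands

    def alternating_string_compress(bits):
        """Encode runs of the alternating pattern "10" as ("10", count) commands."""
        commands, i = [], 0
        while i < len(bits):
            count = 0
            while i + 1 < len(bits) and bits[i] == "1" and bits[i + 1] == "0":
                count += 1
                i += 2
            if count:
                commands.append(("10", count))
            else:
                commands.append((bits[i], 1))   # lone symbol that breaks the pattern
                i += 1
        return commands

    case_a = "0" * 100    # Case A: all zeros
    case_b = "10" * 50    # Case B: alternating ones and zeros

    print(contiguous_string_compress(case_a))    # 1 command:    [('0', 100)]
    print(contiguous_string_compress(case_b))    # 100 commands: one per symbol
    print(alternating_string_compress(case_b))   # 1 command:    [('10', 50)]

Either way, the "compressed" output is just a list of commands for the expander; the point is how many commands each algorithm needs for each file.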
We have once again achieved very high compression--in a situation where the previous compression algorithm resulted in a larger file than the original. But if we were to apply the alternating-string compression algorithm to the "Case A" file, it would perform quite poorly (in fact, the "compressed" file would be larger than the original--the same result we had for the "Case B" file with the contiguous-string compressor).

Basically, we have seen a simple example of why different compression algorithms are best suited to specific data sets--i.e., "correlatedness" is defined in terms of both the data in the file AND the algorithm being used to compress the data.

3. Linear decompositions and lossy compression: A "lossy" compression scheme--such as JPEG--attempts to trade image quality for disk space (or communications channel bandwidth). One approach is to try to determine, in a sense, which are the "major" contributors to the image and which are the "minor" contributors. Then the minor contributors can presumably be removed without substantially affecting the quality of the image.

Well, that sounds nice, but how do we determine what qualifies as a "major contributor" or a "minor contributor"? This gets a little technical if you're not familiar with Fourier series and power spectral densities, but I'll give it a try ...

We're all familiar with sound, and with the difference between a high-frequency sound (in lay terms, a "high-pitched" sound) and a low-frequency ("low-pitched") one. Music is generally a compilation of many different frequencies over the range of human hearing (20 Hz to 20 kHz). A graphic equalizer sorts these frequency components into different frequency "bins"--where each bin represents a range of frequencies (say, 20-100 Hz, 100-500 Hz, 500-2500 Hz, 2500-10000 Hz, 10-20 kHz). In signal processing, we refer to the "full range"--in this case, 20 Hz to 20 kHz--as a "band" or a "full band," and the divisions are referred to as "subbands."

Now, if we had a spectrum analyzer (this is becoming one really nice stereo system ...), we could look at the display and see the amount of "power" (being delivered to the speakers) within each subband. Suppose, as the music plays, we notice there is a specific subband that stays at a very low power level throughout the song. Theoretically, we could eliminate this subband from the music altogether, without suffering any substantial degradation in the sound quality--possibly, the difference would not even be noticeable. (There are audiophiles wincing right now.)

This is one of the methods used by compressors such as JPEG. A Fourier transform--on which JPEG is based (and for those of you who are about to say, "No, it's based on a DCT!", note that the discrete cosine transform is a modification of the Fourier transform)--is a "mathematical" spectrum analyzer: it splits the signal--in this case, an image--into many different subbands. Now, the operator (that's you) is generally asked how much quality he's willing to sacrifice for compression, and this sets an adaptive threshold in the compression algorithm--any subbands with more power than the threshold are contributing substantial power to the image, and are kept. Anything below the threshold is eliminated.
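Purely to illustrate the "keep the strong subbands, drop the weak ones" idea--and NOT as a description of how JPEG actually works (JPEG uses block-wise DCTs, quantization tables, and entropy coding)--here is a minimal Python sketch using NumPy; the function names and the threshold choice are my own:

    # Illustration of transform-domain thresholding, not of JPEG itself.
    import numpy as np

    def threshold_compress(signal, keep_fraction=0.05):
        """Keep only the strongest Fourier coefficients; zero out the rest."""
        coeffs = np.fft.rfft(signal)
        power = np.abs(coeffs) ** 2
        # "Adaptive threshold": keep the top keep_fraction of subbands by power.
        cutoff = np.quantile(power, 1.0 - keep_fraction)
        return np.where(power >= cutoff, coeffs, 0)

    def expand(kept_coeffs, length):
        """Rebuild an approximation of the signal from the surviving coefficients."""
        return np.fft.irfft(kept_coeffs, n=length)

    # A signal dominated by two strong low-frequency components plus weak noise.
    t = np.linspace(0.0, 1.0, 512, endpoint=False)
    signal = (np.sin(2 * np.pi * 5 * t)
              + 0.5 * np.sin(2 * np.pi * 12 * t)
              + 0.01 * np.random.randn(512))

    kept = threshold_compress(signal)
    approx = expand(kept, len(signal))
    print("coefficients kept:", np.count_nonzero(kept), "of", len(kept))
    print("max reconstruction error:", np.max(np.abs(signal - approx)))

Only a handful of coefficients survive, yet the reconstruction error stays small--exactly the "minor contributors can be discarded" trade that the operator's quality setting controls.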
There are several shortcomings of the Fourier transform (FT) family (including the DCT) where their application to compression is concerned; we'll briefly discuss two of them here. Note that these shortcomings are inherent in JPEG compression, which relies upon these transforms. Note, also, that I am referring to the short-time Fourier transform (STFT) and its derivatives; the first shortcoming does not hold for the infinite Fourier transform--but then, I've never seen an infinitely large image, either. :-) And, finally, a technical note: the thing that we generally refer to as the Fourier transform (FT) is really the Fourier series (FS); I will use the more common "Fourier transform" nomenclature.

i. The STFT and DCT are not orthogonal decompositions, even though many people (including many engineers!) believe that they are. So, what does this mean to you? Well, "orthogonal" = "lossless"--so even "lossless JPEG" isn't truly lossless! It will, in fact, introduce artifacts into the image. The severity of the artifacts is generally minimal, and is a function of the image bandwidth (i.e., the size of the "full band"). For those of you who are familiar with the FT and the FFT, note that the FFT is a time-localized FT, where the localization has been accomplished by square-windowing the FT basis functions. This windowing process leads to the introduction of Gibbs-phenomenon noise (recall that a finite number of sine waves cannot accurately reproduce a square wave). In any event, this is really a technicality--the noise introduced is typically very, very small, and can be below the threshold at which it can even begin toggling bits--so "lossless JPEG" can still be considered lossless, for all intents and purposes.

ii. The FT and derivative transforms are not "constant Q" across subbands--i.e., the ratio of their bandwidths to their center frequencies is not constant, but changes from subband to subband. In particular, the *bandwidth* of each subband is fixed across all subbands for a particular transform. The problem this introduces is one of measuring the power spectral density (PSD) accurately for the purposes of compression--namely, the higher subbands (smaller Q) are relatively "narrow," and will thus have artificially smaller PSD values; vice versa for the lower subbands. When analyzing the PSD for compression purposes, it's relatively easy to impose a non-linear threshold-weighting function, but this is still not ideal.

Note that both shortcomings discussed here can be overcome with the wavelet transform. Just figured I'd mention that; I won't digress any further by launching into a discussion of the WT.

4. Dynamic range: "Dynamic range" is commonly referred to as "bit depth." Simply put, a greyscale file in which each pixel is represented by 24 bits has substantially greater dynamic range than a greyscale file in which each pixel is represented by 8 bits.

4a. Excess dynamic range

Suppose we have pixels that are represented as 3-byte unsigned integer data--i.e., 24 bits per pixel. And let's say that the image pixels only "use" 16 out of each of those 24 bits (for now, we'll take the simplest case, and assume that the high-order 8 bits are always zero). Then we can eliminate 8 bits from each 24-bit pixel--resulting in an immediate compression of 33%.
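A minimal Python sketch of 4a (assuming, as above, that the high-order byte of every 24-bit sample really is always zero; the function names are mine, purely for illustration):

    # Illustration of 4a: if the top 8 bits of every 24-bit pixel are always
    # zero, they carry no information and can simply be dropped.
    def pack_24_to_16(pixels):
        """pixels: 24-bit unsigned ints whose high-order byte is always zero."""
        out = bytearray()
        for p in pixels:
            assert p < (1 << 16), "high-order byte is not zero -- cannot drop it"
            out += p.to_bytes(2, "big")               # keep only the low 16 bits
        return bytes(out)

    def unpack_16_to_24(packed):
        """Restore the 24-bit values (the dropped high byte comes back as zero)."""
        return [int.from_bytes(packed[i:i + 2], "big")
                for i in range(0, len(packed), 2)]

    pixels = [513, 70, 4095, 255]                     # all fit in 16 bits
    packed = pack_24_to_16(pixels)
    assert unpack_16_to_24(packed) == pixels          # no information lost
    # 2 bytes per pixel instead of 3: the 33% reduction described above.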
4b. Frequency-domain filtering

Now, suppose the PSD of the image pixels looks like this (view with a mono-spaced font, like Courier):

   |
   |       **
   |      *  *
   |     *    *
   |    *      *
   |   *        *
   |  *          *
   | *            *
   |*              *
   |*               *
   --------------------------|----------------------------|---
                             f1                           f1
                             --
                             2

... where f1 is the bandwidth of the image, and f1/2 is the half-band point.

Given such a distribution (which is not uncommon in imagery, among other applications), we can half-band filter the image data, effectively splitting the information into its high-band and low-band components. Note, now, that the upper half of the band has much lower overall power--and thus requires less dynamic range to represent it--than the lower half. So, we can compress the data in the upper half of the band into smaller words (smaller, that is, than the word size required to represent the full-band data), in a manner similar to that discussed in 4a. This can also result in substantial savings, with no loss of information content.

4c. Time-domain filtering

(Most of the claims made in this section should be preceded by the caveat, "Statistically speaking ....")

Because any given pixel is often well correlated with its neighbors, we can use the neighboring pixel values to "predict" the value of the pixel in question. If we simply take the difference between the predicted value and the actual value, we decorrelate the pixel from its neighbors. But because the correlation was initially quite high, the "residue" is generally pretty small.

For example, suppose the pixel values in a small neighborhood of 8-bit greyscale data are as follows:

   100  120  110
   130  105  100
   120  100  130

Suppose, now, that we wish to "estimate" the center pixel by simply averaging the surrounding pixels and rounding to the nearest integer:

   predicted = round((100 + 120 + 110 + 130 + 100 + 120 + 100 + 130) / 8) = 114

Taking the difference between the predicted and actual values:

   filtered = 114 - 105 = 9

This is just one example, but suppose that the data in the image are correlated enough that this scheme yields a maximum filtered-pixel magnitude of (say) 31. Then every filtered pixel in the image can be represented by 5 bits (5 = log2(32)) instead of 8 bits (setting aside the sign of the residue, for simplicity)--again, a substantial savings. If our assumption of correlated data holds over the entire image, we can reduce the dynamic range of the majority of the image (say, all but the first scan line in the image, or all but the first and last lines, or all but the outermost pixels, etc.) by this technique.

------------------------------------------------------------

This has been a very brief discussion of some of the basic concepts involved in compression. Certainly, those of you who are very familiar with compression would have a lot to add to this--and a lot of qualifications to make--much of which has been left out in the course of this treatment. My goal here was to get the concepts across, not to provide a rigorous treatment of the subject matter--which would surely have lost the majority of the audience. For the rest of you, I hope this was reasonably clear.

I will follow up sometime soon with the review of the PNG spec*. Meanwhile, if you have any questions about the content of this primer, please feel free to drop me a note. And, if you would, please let me know if this was useful and (or?) informative to you, or if it was too technical (or too elementary), etc.

Thanks!

- Vince Sabio
  vince-png@vjs.org

*UPDATE 24 Jul 99: The PNG Review can be found here:

   via FTP:
   via HTTP: