|
Compression

In fact, it's too many bits for most purposes. While it's not too wasteful if you want an hour of high quality sound on a CD, it is kind of unwieldy if we need to download it or send it over the Internet, or store a bunch of it on our home hard drive. In other words, if we're ripping off commercial music illegally and unethically from the Internet, we want it to be at least convenient. Even though high quality sound data isn't anywhere near as large as image or video data, it's still too big. What can we do to reduce the data explosion?

The goal is to store the most information in the smallest amount of space, without compromising the quality of the signal (or at worst, compromising it as little as possible). Compression techniques and research are not limited to digital sound. Data compression plays an essential part in the storage and transmission of all types of digital information, from word processing documents to digital photographs to full screen, full motion videos. As the amount of information in a medium increases (it takes much more information to represent a second of digital video than it does to represent a page of a script), so does the importance of data compression. For example, say we're composing great new music for an interactive game, and the hoggish game designer has left us only a little bit of room to try to fit all our sounds! We nd 2 cmprs!

Three Types of Data Compression

There are a number of classic approaches to data compression. The first, and most straightforward, is to try to figure out what's redundant in a signal, leave it out, and put it back in when needed later. Something that is redundant could be as simple as something we already know. For example, if we store the following messages:
It's pretty clear that leaving out the vowels makes the phrases shorter while keeping them unambiguous and easy to reconstruct. Other phrases may not be as clear, and may need a vowel or two. However, the intended messages are clear only because, in these particular cases, we already know what they say, and we're simply storing something to jog our memory! That's not too common.

Now say we need to store an arbitrary series of colors:
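One simple way to remove redundancy from a series like this is run-length encoding: store each repeated value once, together with a count, and expand it again when needed. The sketch below is only an illustration of that idea. The color list and the function names are invented for the example, and run-length encoding is assumed here as the technique.

```python
from itertools import groupby

def run_length_encode(items):
    """Collapse each run of identical items into a (value, count) pair."""
    return [(value, len(list(group))) for value, group in groupby(items)]

def run_length_decode(pairs):
    """Expand (value, count) pairs back into the original series."""
    out = []
    for value, count in pairs:
        out.extend([value] * count)
    return out

# Hypothetical data: a series of colors with some repetition.
colors = ["blue", "blue", "blue", "red", "red", "green"]
packed = run_length_encode(colors)

# Nothing is lost: decoding gives back exactly what we started with.
assert run_length_decode(packed) == colors
```

A series with no repeated neighbors gains nothing from this scheme (every count is 1), which is one reason redundancy-based methods work better on some kinds of data than on others.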
Perceptual Encoding

A second approach to data compression is similar. It also tries to get rid of data that does not "buy us much," but this time, we measure the value of a piece of data in terms of how much it contributes to our overall perception of the sound. Here's a visual analogy: if we want to compress a picture for people or creatures who are colorblind, then instead of needing to represent all colors, we could just send black and white pictures, which, as you can well imagine, would require less information than a full color picture. However, now we are attempting to represent data based on our perception of it. Notice here that we're not just shortening a list of numbers; we're trying to compress all the relevant data into a kind of summary of what's most important (to the receiver). The tricky part of this is that in order to understand what's important, we need to analyse the sound into its component features, something that we didn't have to worry about when simply shortening lists of numbers.

In other words, to restate the same conceptual paradigm in more precise, i.e., more specific terms, using different phonemic pseudo-semantic glyphmorphs, interactive iso-species information exchanges would tend, as the number of such phenomena increased, toward higher degrees of semiological unit deployment.
MP3 is the current standard for data compression of sound on the web. But don't take this too seriously: these compression standards change frequently as people invent newer and better methods. Perceptually-based sound compression algorithms usually work in this way: there's a lot of numerical information that is not perceptually significant, and we try to get rid of that and just keep what's important.

µ-law ("mu-law") encoding is a simple, common and important perception-based compression technique for sound data. It's an older technique, but far easier to explain here than a more sophisticated algorithm like MP3, so we'll go into it in a bit of detail. Understanding it is a useful step towards understanding compression in general.

µ-law uses the principle that our ears are far more sensitive to amplitude changes at low amplitudes than at high ones. That is, if things are soft, we tend to notice a change in amplitude more easily than we do between very loud, nearly equally loud sounds. µ-law compression takes advantage of this phenomenon by mapping 16-bit values onto an 8-bit µ-law table like the one below. Notice how the range of numbers is divided logarithmically rather than linearly, giving more precision at lower amplitudes. In other words, loud sounds are just loud sounds...

To encode a µ-law sample, we start with a 16-bit sample value, say 330. We then find the entry in the table that is closest to our sample value. In this case, it would be 324, which is the 28th entry in our table (starting with entry 0), so we store 28 as our µ-law sample value. Later, when we want to decode the µ-law sample, we simply read 28 as an index into the table, and output the value stored there: 324.

You might be thinking, "Wait a minute! Our original sample value was 330, but now we have a value of 324. What good is that?" While it's true that we lose some accuracy when we encode µ-law samples, we still get much better sound quality than if we had just used regular 8-bit samples.
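The table lookup described above can be sketched in a few lines of Python. This is a minimal illustration, not the table used in the text: `build_table` is a hypothetical helper that generates 256 signed, logarithmically spaced values from the standard µ-law expansion curve with µ = 255, so its entries and indices will not match the 330 → 324 → 28 example.

```python
import bisect

MU = 255        # standard mu-law constant (an assumption for this sketch)
PEAK = 32767    # largest magnitude of a signed 16-bit sample

def build_table(size=256):
    """Hypothetical decode table: `size` values spaced logarithmically."""
    table = []
    for i in range(size):
        y = 2.0 * i / (size - 1) - 1.0            # evenly spaced in [-1, 1]
        magnitude = ((1 + MU) ** abs(y) - 1) / MU  # mu-law expansion curve
        sign = 1 if y >= 0 else -1
        table.append(sign * round(magnitude * PEAK))
    return table

def encode(sample, table):
    """Return the index of the table entry closest to `sample`."""
    pos = bisect.bisect_left(table, sample)
    if pos == 0:
        return 0
    if pos == len(table):
        return len(table) - 1
    before, after = table[pos - 1], table[pos]
    return pos if after - sample < sample - before else pos - 1

def decode(code, table):
    """Decoding is just a table lookup."""
    return table[code]

TABLE = build_table()
code = encode(330, TABLE)          # an 8-bit index, 0..255
approx = decode(code, TABLE)       # close to 330, but not necessarily equal
```

Because the table is sorted, a binary search (`bisect`) finds the nearest entry quickly; a plain linear scan over all 256 entries would give the same result.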
Here's why: in the low amplitude range of the µ-law table our encoded values are only going to be off by a small margin, since the entries are close together. For example, if our sample value is 3 and it's mapped to 0, we're only off by 3. But since we're dealing with 16-bit samples, which have a total range of 65,536 values, being off by 3 isn't so bad. As amplitude increases we can miss the mark by much greater amounts (since the entries get farther and farther apart), but that's okay too: the whole point of µ-law encoding is to exploit the fact that at higher amplitudes our ears are not very sensitive to amplitude changes. Using that fact, µ-law compression offers near-16-bit sound quality in an 8-bit storage format!

Prediction Algorithms

A third type of compression technique involves attempting to predict what a signal is going to do (usually in the frequency domain, not in the time domain), and only storing the difference between the prediction and the actual value. When a prediction algorithm is well tuned for the data on which it's used, it's usually possible to stay pretty close to the actual values. That means that the difference between your prediction and the real value is very small, and can be stored with just a few bits.

Let's say you have a sample value range of 0 to 65,535 (a 16-bit range, in all positive integers) and you invent a magical prediction algorithm that is never more than 127 units above or below the actual value. You now need only 8 bits (a signed range of -128 to 127) to store the difference between your predicted value and the actual value. You might even keep a running average of the actual differences between sample values, and use that adaptively as the range of numbers you need to represent at any given time. Pretty neat stuff. In actual practice, coming up with such a good prediction algorithm is tricky, and what we've presented here is an extremely simplified account of how prediction-based compression techniques really work.
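To make the idea of "storing only the difference" concrete, here is a deliberately naive sketch. It assumes the simplest possible predictor, "the next sample equals the previous one," and it works in the time domain rather than the frequency domain, purely to keep the example short. The function names and sample values are made up for illustration.

```python
def encode_residuals(samples):
    """Store each sample as its difference from the predicted value."""
    residuals = []
    prediction = 0                  # nothing known before the first sample
    for sample in samples:
        residuals.append(sample - prediction)
        prediction = sample         # naive predictor: "same as last time"
    return residuals

def decode_residuals(residuals):
    """Rebuild the samples by running the same predictor and adding back
    each stored difference."""
    samples = []
    prediction = 0
    for residual in residuals:
        sample = prediction + residual
        samples.append(sample)
        prediction = sample
    return samples

# Hypothetical slowly varying signal.
signal = [1000, 1010, 1025, 1030, 1020]
residuals = encode_residuals(signal)

# The decoder recovers the signal exactly.
assert decode_residuals(residuals) == signal
```

After the first value, every stored difference in this example is small enough to fit in far fewer bits than the samples themselves. A real codec would use a much better predictor, and would need some rule for the occasional difference that does not fit in the reduced range.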
The Pros and Cons of Compression Techniques

Each of the techniques we've talked about above has advantages and disadvantages. Some are time-consuming to compute but accurate; some are simple to compute (and understand) but less powerful. Each tends to be most effective on certain kinds of data. Because of this, many actual compression implementations are adaptive: they employ some variable combination of all three techniques, based on the data to be encoded.

A good example of a currently widespread adaptive compression technique is the MPEG (Moving Picture Experts Group) standard now used on the Internet for the transmission of both sound and video data. MPEG audio (whose most widely used form, MPEG-1 Audio Layer III, is known as MP3) is now the standard for high quality sound on the Internet, and is rapidly becoming an audio standard for general use. A description of how MPEG audio really works is well beyond the scope of this book, but it might be an interesting exercise for the reader to investigate further. |