
Sampling Theory

So now we know that we need to sample a continuous waveform to represent it digitally. We also know that the faster we sample it, the better. But this is a little vague. How often do we need to sample a waveform in order to record a good representation of it?

The answer to this question is given by the Nyquist Sampling Theorem, which states that to represent a signal, the sampling rate (or sampling frequency, which is not to be confused with the frequency content of the sound, as it frequently is!) needs to be at least twice the highest frequency contained in the signal.

For example, look back at our time-frequency picture from Section 2.2. It looks like it only contains frequencies up to 8000 Hz. If this were the case, then we would need to sample the sound at a rate of 16000 Hz (16kHz) in order to reproduce the sound. That is, we would need to take sound bites (bytes?!) 16000 times a second.
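The arithmetic is simple enough to write down as a tiny Python sketch. The function name minimum_sample_rate is our own, made up for illustration; it just doubles the highest frequency we care about.

```python
def minimum_sample_rate(highest_frequency_hz):
    """Lowest sampling rate (in Hz) that satisfies the Nyquist theorem."""
    return 2 * highest_frequency_hz


# The sound from the time-frequency picture, assuming it tops out at 8000 Hz.
print(minimum_sample_rate(8000))   # 16000
```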

In the next chapter, when we talk about representing sounds in the frequency domain (as a combination of various amounts of frequency components, which change over time) rather than in the time domain (as a numerical list of sample values of amplitudes), we’ll learn a lot more about the ramifications of the Nyquist theorem for digital sound. But for our current purposes, it's a good idea to remember that since the human ear only responds to sounds up to about 20,000 Hz, we need to sample sounds at least 40,000 times a second, or at a rate of 40,000 Hz, to represent these sounds. You may be wondering why we even need to represent sonic frequencies that high (when the piano, for instance, only goes up to the 4,000 Hz, or 4kHz, range). The answer is timbral, and in particular spectral. Remember that we saw in Section 1.4 that those higher frequencies fill out the descriptive sonic information.

A Free Sample: A Tonsorial Tale from our Dan

Just to review: we measure frequency in cycles per second (CPS), or Hertz (Hz). The frequency range of human hearing is usually given as 20Hz to 20,000Hz (abbreviated as 20kHz), meaning that we can hear sounds in that range. Knowing that, if we decide that the highest frequency we’re interested in is 20kHz, then according to the Nyquist Theorem, we need a sampling rate of at least twice that frequency, or 40kHz.

Figure .x

Undersampling: What happens if we sample too slowly for the frequencies we're trying to represent?

We take samples (black dots) of a sinewave (in blue) at a certain interval (the sample rate). If the sinewave is changing too quickly (its frequency is too high) then we can't grab enough information to reconstruct the waveform from our samples. The result is that the high frequency waveform masquerades as a lower frequency waveform (how sneaky!), or that the higher frequency is aliased to a lower frequency.

An applet which demonstrates band limited and non-band limited waveforms.

Band limited waveforms are those in which the synthesis method itself does not allow higher harmonics, or frequencies, than the sampling rate allows. It's kind of like putting a governor on your car which doesn't allow you to go past the speed limit. This technique can be useful in a lot of applications where one has absolutely no interest in the wonderful joy of listening to things like aliasing, foldover, and unwanted distortion (we'll talk about those below).
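Here is one way the "governor" idea might look in Python: a sawtooth built by adding up sine harmonics, stopping before any harmonic reaches half the sampling rate. This is only a sketch of one band limited method (additive synthesis), not necessarily what the applet does; the function names and the 1/k harmonic amplitudes are our own choices for illustration.

```python
import math


def highest_harmonic(f0, sample_rate):
    """Largest harmonic number k for which k * f0 stays below half the sampling rate."""
    limit = sample_rate / 2
    k = 0
    while (k + 1) * f0 < limit:
        k += 1
    return k


def bandlimited_sawtooth(f0, sample_rate, num_samples):
    """Sawtooth made from sine harmonics with amplitudes 1/k, none above the limit."""
    top = highest_harmonic(f0, sample_rate)
    samples = []
    for n in range(num_samples):
        t = n / sample_rate
        # Sum only the harmonics the sampling rate can represent.
        value = sum(math.sin(2 * math.pi * k * f0 * t) / k
                    for k in range(1, top + 1))
        samples.append(value)
    return samples


print(highest_harmonic(440, 44100))   # 50, since 50 * 440 = 22000 Hz
wave = bandlimited_sawtooth(440, 44100, 100)
```

A non-band limited version would keep adding harmonics no matter how high they went, and everything above the limit would alias.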



Soundfile .x

This sound file was sampled at the standard 44100 samples per second. This allows frequencies as high as around 22kHz, which is above the upper limit of our hearing. In other words, it's "good enough."

Soundfile .x

This sound file demonstrates undersampling of the same sound source as Soundfile .x. In this example, the file was sampled at 1024 samples per second. Note that the sound is "muddy" at a sampling rate of 1024: that rate does not allow us any frequencies above about 500 Hz, which is sort of like sticking a large canvas bag over your head, and putting your fingers in your ears, while listening.

Figure .x

Picture of an undersampled waveform. This sound was sampled 512 times per second, which was way too slow.

Figure .x

This is the same soundfile as above, but now sampled 44100 times per second (44.1kHz). Much better...


Aliasing

The most common standard sampling rate for digital audio (the one used for CDs) is 44.1kHz, giving us a Nyquist Frequency (defined as half the sampling rate) of 22.05kHz. If we use a lower sampling rate, for example 20kHz, we can’t represent a sound whose frequency is above 10kHz. In fact, if we try, we’ll get artifacts in the signal, usually undesirable ones, called foldover or aliasing.

In other words: if the sinewave is changing too quickly, we will get the same set of samples that we would have obtained had we been taking samples from a sinewave of lower frequency! As we said before, the higher frequency contributions now act as impostors of lower frequency information, so there are extra, unanticipated, new low frequency contributions to the sound. Sometimes we can use this in cool, interesting, and funkadelic ways, and other times it just messes up the original sound (like when you are trying to make a beautifully faithful reproduction of an exquisite sound, or when the NSA is listening to your phone; not that we're paranoid or anything, but we think they are right now, or maybe not, so we better be a little careful, not that we have anything to hide, except for that one little thing a few years ago).

So in a sense, these impostors are aliases for the low frequencies, and we say that the result of our undersampling is an aliased waveform at a lower frequency.
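We can check the "same set of samples" claim numerically. The sketch below uses numbers we picked purely for illustration: a sampling rate of 1000 samples per second, a 100 Hz sinewave, and an 1100 Hz sinewave (far too high for that rate). The helper name sample_sine is ours.

```python
import math


def sample_sine(freq_hz, sample_rate, num_samples):
    """Return num_samples samples of a unit-amplitude sinewave at freq_hz."""
    return [math.sin(2 * math.pi * freq_hz * n / sample_rate)
            for n in range(num_samples)]


sample_rate = 1000                            # so half the rate is 500 Hz
low = sample_sine(100, sample_rate, 20)       # safely below 500 Hz
high = sample_sine(1100, sample_rate, 20)     # well above 500 Hz

# Compare the two lists sample by sample.
max_difference = max(abs(a - b) for a, b in zip(low, high))
print(max_difference)   # only floating-point rounding error remains
```

Once the samples are taken, nothing in the list of numbers tells us which of the two sinewaves we started with.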


Figure .x

Foldover aliasing: this picture shows what happens when we sweep a sinewave up past the Nyquist frequency. It's a picture in the frequency domain (which we haven't talked about much yet), so what you're seeing is the amplitude of specific component frequencies over time. The x axis is frequency, the z axis is amplitude, and the y axis is time (read back to front).

As the sinewave (which starts at 0 Hz and ends at 44100 Hz over 10 seconds) sweeps up into frequencies above the Nyquist frequency of 22050 Hz, an aliased wave is reflected back below the Nyquist frequency. The soundfile can be heard below.

Soundfile .x

Chirping: A 10 second sound file sweeping a sine wave from 0 Hz to 44,100 Hz. Notice that the sound seems to disappear after it reaches the Nyquist frequency of 22050 Hz, but then it wraps around as aliased sound back into the audible domain.
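The wrap-around can be predicted with a little arithmetic. As a rough sketch (the function name alias_frequency is our own), the frequency we actually hear is the original frequency "folded" back around half the sampling rate:

```python
def alias_frequency(freq_hz, sample_rate):
    """Frequency (in Hz) that a sinewave at freq_hz appears as after sampling."""
    folded = freq_hz % sample_rate        # the pattern repeats every sample_rate Hz
    if folded > sample_rate / 2:          # above half the rate: reflect back down
        folded = sample_rate - folded
    return folded


# Follow the sweep from 0 Hz to 44100 Hz at a 44100 Hz sampling rate.
for f in (0, 11025, 22050, 33075, 44100):
    print(f, "->", alias_frequency(f, 44100))
```

The printed frequencies climb to 22050 Hz and then fall back to 0 Hz, which is the reflection heard in the sound file.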

Anti-aliasing Filters

Fortunately it’s fairly easy to avoid aliasing — we simply make sure that the signal we’re recording doesn’t contain any frequencies above the Nyquist Frequency. To accomplish this task, we use an anti-aliasing filter on the signal. Audio filtering is a technique that allows us to selectively keep or throw out certain frequencies in a sound — just as light filters (like ones you might use on a camera) only allow certain frequencies of light (colors) to pass. For now, just remember that a filter lets us color a sound by changing its frequency content. We'll talk a lot more about filters in this book.

An anti-aliasing filter is a lowpass filter. It's called a lowpass filter because it only allows frequencies below a certain cutoff frequency to pass. Anything above the cutoff frequency gets thrown away. By setting the cutoff frequency of the lowpass filter to the Nyquist frequency, we can throw out the offending frequencies, those high enough to cause aliasing, while retaining all of the lower frequencies that we want to record.

Figure .x

An anti-aliasing lowpass filter. Only the frequencies within the passband, which stops at the Nyquist frequency (and "rolls off" after that), are allowed to pass. This diagram is typical of the way we draw what is called the frequency response of a filter. It shows the amplitude that will come out of the filter in response to different frequencies (of the same amplitude).
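To get a feel for a frequency response, here is a small digital lowpass filter sketched in Python. It is only a stand-in: the filter in front of a real converter is typically hardware, and the design used here (a windowed-sinc filter with a Hamming window), the 5000 Hz cutoff, the 101 coefficients, and the function names are all assumptions we made for illustration.

```python
import math


def lowpass_kernel(cutoff_hz, sample_rate, num_taps=101):
    """Windowed-sinc lowpass filter coefficients (Hamming window)."""
    fc = cutoff_hz / sample_rate              # cutoff as a fraction of the sampling rate
    mid = (num_taps - 1) / 2
    kernel = []
    for n in range(num_taps):
        x = n - mid
        # Ideal lowpass impulse response, with the x == 0 case handled separately.
        ideal = 2 * fc if x == 0 else math.sin(2 * math.pi * fc * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        kernel.append(ideal * window)
    total = sum(kernel)
    return [k / total for k in kernel]        # scale so that 0 Hz passes with gain 1


def gain_at(kernel, freq_hz, sample_rate):
    """Amplitude that comes out of the filter for a unit sinewave at freq_hz."""
    w = 2 * math.pi * freq_hz / sample_rate
    re = sum(h * math.cos(w * n) for n, h in enumerate(kernel))
    im = sum(h * math.sin(w * n) for n, h in enumerate(kernel))
    return math.hypot(re, im)


kernel = lowpass_kernel(5000, 44100)
for f in (1000, 4000, 6000, 10000):
    print(f, "Hz gain:", round(gain_at(kernel, f, 44100), 4))
```

Frequencies well below the cutoff come out at nearly full amplitude and frequencies well above it come out close to zero, with a gradual roll-off in between rather than a perfectly sharp edge.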

Anti-aliasing filters can be analogized to coffee filters. The desired components (frequencies or liquid coffee) are preserved, while the filter (coffee filter or anti-aliasing filter) catches all the undesirable components (the coffee grounds and the frequencies that the system cannot handle).

Perfect anti-aliasing filters cannot be constructed, so we almost always get some aliasing error in an ADC->DAC conversion.

Anti-aliasing filters are a standard component in all digital sound recording, so aliasing is not usually a serious concern to the average user or computer musician (it is, however, a serious concern for audio designers). But, because many of the sounds in computer music are not recorded, but created digitally inside the computer itself, it’s important to fully understand aliasing and the Nyquist Theorem. There’s nothing to stop us from using a computer to create sounds with frequencies well above the Nyquist frequency. And while the computer has no problem dealing with such sounds as data, as soon as we mere humans want to actually hear them (as opposed to just conceptualizing or imagining them), we need to deal with the physical realities of aliasing, the Nyquist Theorem, and the analog to digital conversion process. Of course, it’s also possible to exploit these physical limitations in creative ways . . .

