If you have a home recording studio, you have probably seen this term. In digital recording systems, the sample rate defines how many times per second the analog signal sent by a microphone or instrument is sampled. The higher the number, the more samples of the analog signal are collected per second. However, the highest number is not always the best option.
I just want to know which sample rate to use!
Summing up, to choose a sample rate, you must consider:
- Your computer processing power;
- The medium your music is destined for. For example, CDs and online music platforms use a rate of 44.1 kHz, whilst videos use 48 kHz. So, if you choose any other rate, your music will, at some point, be converted into one of these rates;
- Whether you have suitable devices or software for a future sample rate conversion;
- Magroove’s suggestion: only work at 44.1 kHz if the material already arrives at that rate. Otherwise, whether it is music or video, work at 48 kHz.
What is sample rate?
First of all: what is a “sample”?
A sample is a small part of something; in this case, of an audio signal.
Sounds are vibrations that propagate through a physical medium, for example, air. As you speak, the vibrations of your vocal cords produce waves that travel through the air. These vibrations occur in cycles, and the number of cycles per second is called frequency. In physics, the unit of measurement for frequency is the hertz (Hz); therefore, the frequency of a sound is measured in hertz. The higher the frequency, the higher the pitch. Human hearing ranges from 20 Hz to 20 kHz. This means that the lowest sound a human can hear has a frequency of 20 Hz, whilst the highest has around 20,000 Hz (20 kHz).
You might have seen the graphic representation of sound as a wave. So, pay attention to the graph below – it represents, vertically, the intensity or volume of the sound, while horizontally, the propagation of the sound wave into space.
To represent the sound wave, the computer relies on small samples that must contain all the data needed to reproduce the sound. Imagine a singer with a microphone capturing their voice. The sound of their voice is acoustic energy and makes the air vibrate. The microphone turns this vibration into an electric signal, and a cable carries it to the audio interface. At the input of the audio interface there is an AD (analog-to-digital) converter. It digitizes the electric signal; that is, it encodes it in a binary language of 0s and 1s. The reverse happens at the interface output: a DA (digital-to-analog) converter turns the binary code back into an electric signal, which becomes sound again.
What is the sample rate for?
The sample rate defines how many samples of the analog signal are collected in a period of 1 second. Do you remember that, in physics, the hertz is the unit of measurement for frequency? The sample rate also uses hertz, since it represents how many samples are collected in one second. Now, think of the samples as pictures of the sound: the more pictures you take, the better you can represent the sound in each thousandth of a second.
So when we use a sample rate of 44.1 kHz, we collect 44,100 samples in just one second. It seems like a lot, doesn’t it? Well, actually, it isn’t. The computer needs to describe, in its binary language, a sound wave that is, in nature, continuous.
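To make the “pictures of the sound” idea concrete, here is a minimal Python sketch that samples a pure tone at 44.1 kHz. The function name `sample_sine` and the 1 kHz test tone are assumptions chosen for illustration; a real converter measures an incoming voltage rather than computing a formula.

```python
import math

def sample_sine(freq_hz, sample_rate_hz, duration_s, amplitude=1.0):
    """Return evenly spaced samples of a sine wave (hypothetical helper)."""
    n_samples = round(sample_rate_hz * duration_s)
    return [
        amplitude * math.sin(2 * math.pi * freq_hz * n / sample_rate_hz)
        for n in range(n_samples)
    ]

# One second of a 1 kHz tone sampled at 44.1 kHz.
samples = sample_sine(freq_hz=1000, sample_rate_hz=44_100, duration_s=1.0)
print(len(samples))          # 44100 samples for one second of audio

# Time between two consecutive samples, in microseconds (roughly 22.7).
sample_interval_us = 1 / 44_100 * 1e6
print(sample_interval_us)
```

Each value in `samples` is one “picture”: the level of the wave at one instant, taken roughly every 22.7 microseconds.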
According to Nyquist’s Theorem, for an accurate digital representation of a sound wave, the sample rate must be at least twice the highest frequency to be recorded. As the highest sound a human can hear has a frequency of 20 kHz, the sample rate must be at least 40 kHz to digitize this frequency.
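The rule is simple arithmetic, sketched below in Python. The function names are assumptions made up for this example, not part of any audio library.

```python
def min_sample_rate(highest_freq_hz):
    """Lowest sample rate able to represent the given frequency (Nyquist)."""
    return 2 * highest_freq_hz

def nyquist_frequency(sample_rate_hz):
    """Highest frequency a given sample rate can represent."""
    return sample_rate_hz / 2

print(min_sample_rate(20_000))      # 40000
print(nyquist_frequency(44_100))    # 22050.0
print(nyquist_frequency(48_000))    # 24000.0
```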
So, if no human can hear anything higher than 20 kHz, why bother with a sample rate above 40 kHz? And why is the minimum standard 44.1 kHz instead of 40 kHz?
According to Nyquist’s Theorem, if the sample rate is 44.1 kHz or 48 kHz, the highest frequency captured in a digital recording will be 22.05 kHz or 24 kHz, respectively. However, besides defining a maximum frequency, the chosen sample rate has a side effect: all the frequencies beyond the established limit are either not distinguished or wrongly interpreted as lower frequencies. This is called the “Aliasing Effect”, or “Foldover”.
The Aliasing Effect changes the sound: the signal rebuilt from the samples can sound completely different from the original.
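The foldover can be computed directly. The sketch below is a minimal illustration, assuming a pure tone and no anti-aliasing filter at all; `aliased_frequency` is a hypothetical helper name.

```python
def aliased_frequency(freq_hz, sample_rate_hz):
    """Frequency at which a pure tone shows up after unfiltered sampling."""
    nyquist = sample_rate_hz / 2
    folded = freq_hz % sample_rate_hz
    # Anything above the Nyquist frequency is mirrored back below it.
    return sample_rate_hz - folded if folded > nyquist else folded

print(aliased_frequency(10_000, 44_100))   # 10000 (below the limit, unchanged)
print(aliased_frequency(30_000, 44_100))   # 14100 (folded into the audible range)
print(aliased_frequency(25_000, 48_000))   # 23000
```

In this example, a 30 kHz tone that nobody could hear would be recorded at 44.1 kHz as a clearly audible 14.1 kHz tone.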
To avoid the distortion caused by these high frequencies, sound cards usually come with an anti-aliasing filter at the signal input, before the signal is converted to digital.
However, for technical reasons, it is impossible to build an anti-aliasing filter that cuts off abruptly right above the human hearing range. Instead, the filter’s cutoff forms a curve that gradually attenuates the high frequencies. This curve is called the slope. Within the slope, the filter neither fully rejects nor fully passes the frequencies. Because of that, the slope of the anti-aliasing filter must sit above 20 kHz; otherwise it will cause losses in the sound humans can hear.
Usually, 44.1 kHz is enough to provide a safe zone in which the aliasing frequencies and the anti-aliasing filter slope will not affect anything within the human audible range, but that depends on the quality of the filter. The problem is that we rarely get full information about the quality of the anti-aliasing filters in the sound cards available on the market. That’s why many people prefer to use a higher sample rate, such as 88.2 kHz, to make sure that the aliasing effect, or even the anti-aliasing filter, doesn’t interfere with the content of the frequencies around 20 kHz.
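A rough way to compare the “safe zone” at each rate is to measure the room between the top of human hearing and the Nyquist frequency. The sketch below assumes the filter slope has to fit entirely in that gap; the 20 kHz limit comes from the text, while the function name is made up for illustration.

```python
HEARING_LIMIT_HZ = 20_000

def filter_transition_band(sample_rate_hz, passband_edge_hz=HEARING_LIMIT_HZ):
    """Width in Hz between the hearing limit and the Nyquist frequency."""
    return sample_rate_hz / 2 - passband_edge_hz

for rate in (44_100, 48_000, 88_200, 96_000):
    print(rate, filter_transition_band(rate))
# 44100 2050.0
# 48000 4000.0
# 88200 24100.0
# 96000 28000.0
```

At 44.1 kHz the filter has only about 2 kHz to go from “pass everything” to “block everything”, which is why its quality matters so much at that rate.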
What are jitter and dither?
Jitter is clock error (clock distortion). This “clock” determines how the sampling process is spaced out over time. The clock can vary and deviate from the expected timing, so that the pictures of the sound are taken slightly earlier or later than the programmed rhythm (the sample rate). This problem is called jitter.
Jitter can have different causes, such as changes in electrical voltage and noise in the audio signal. Clock errors damage the reading of the sound wave and can even cause changes in timbre and frequency. Jitter can occur not only in analog-to-digital conversion but also the other way around, from digital to analog.
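To get a feel for what a mistimed sample does, the sketch below compares the value a sine wave should have at the ideal instant with the value actually read slightly late. It is only an illustration: the 1 nanosecond timing error and the function name are assumptions, and real jitter varies randomly from sample to sample.

```python
import math

def jitter_error(freq_hz, t_s, timing_error_s, amplitude=1.0):
    """Difference between the sample read late and the ideal sample."""
    ideal = amplitude * math.sin(2 * math.pi * freq_hz * t_s)
    actual = amplitude * math.sin(2 * math.pi * freq_hz * (t_s + timing_error_s))
    return actual - ideal

# Same 1 ns timing error, measured where the wave changes fastest (t = 0).
err_1k = jitter_error(1_000, 0.0, 1e-9)
err_10k = jitter_error(10_000, 0.0, 1e-9)
print(err_1k, err_10k)   # the 10 kHz error is about ten times larger
```

The same timing error hurts more at high frequencies, because the wave changes faster between one instant and the next.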
Dither is low-level background noise applied whilst exporting the audio, typically when the bit depth is reduced. It masks digitization errors, specifically the rounding (quantization) errors that appear when sample values are stored with fewer bits. Dither noise is sometimes referred to as the noise floor, even though that isn’t the correct technical term for it. This noise is friendlier to human hearing than the distortion those errors would otherwise cause, so we add it to the recording before finalizing. If it is done properly, most people won’t even notice the dither.
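As a rough illustration of how dither works, the sketch below rounds a sample to 16 bits with and without a small amount of random noise added first. It assumes triangular (TPDF) dither of about one quantization step, which is one common choice rather than the only one; `quantize` is a hypothetical helper, not a DAW function.

```python
import random

def quantize(sample, bits=16, dither=False, rng=random):
    """Round a sample in [-1.0, 1.0] to a signed integer of the given depth."""
    max_level = 2 ** (bits - 1) - 1
    scaled = sample * max_level
    if dither:
        # TPDF dither: the difference of two uniform values, within +/- 1 step.
        scaled += rng.random() - rng.random()
    return max(-max_level - 1, min(max_level, round(scaled)))

# A very quiet signal, only 0.3 of one 16-bit step.
quiet = 0.3 / 32767
print(quantize(quiet))                      # 0: the signal is lost entirely

rng = random.Random(0)                      # fixed seed so runs are repeatable
dithered = [quantize(quiet, dither=True, rng=rng) for _ in range(20_000)]
print(sum(dithered) / len(dithered))        # close to 0.3 on average
```

Without dither the quiet signal simply rounds to zero. With dither, each individual sample is noisier, but on average the original level survives inside the noise.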
In audio, latency is the time between the signal entering the system and the moment it is heard. In other words, it is the delay between when a sound is played and when it is heard. Digital audio introduces latency through the AD and DA conversions, and these delays are directly related to the buffer size. The buffer is a temporary memory where the sound samples are queued. Captured sound, once converted to digital, goes through the buffer, which must be big enough to hold the samples whilst the processor performs other tasks.
The bigger the buffer, the bigger the latency.
Reducing the buffer reduces the latency, but it also increases the processing load, because the buffer has to be refilled more often. If the buffer is too small, the CPU will struggle to perform other tasks at the same time, causing interruptions in the sound stream.
Besides decreasing the buffer, another way to lower the latency is to increase the sample rate. This seems contradictory, since higher sample rates need more processing capacity. But a buffer of the same size in samples is filled and emptied faster at a higher sample rate, so if your system can handle it, the latency goes down.
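The relationship between buffer size, sample rate and latency can be sketched with one division. This estimate only covers the time one buffer takes to fill, assuming the buffer size is given in samples; converters and drivers add their own delay on top. The function name is an assumption for this example.

```python
def buffer_latency_ms(buffer_size_samples, sample_rate_hz):
    """Time in milliseconds needed to fill one buffer."""
    return buffer_size_samples / sample_rate_hz * 1000

for size, rate in ((1024, 44_100), (256, 44_100), (256, 96_000)):
    print(size, rate, round(buffer_latency_ms(size, rate), 2))
# 1024 44100 23.22
# 256 44100 5.8
# 256 96000 2.67
```

A smaller buffer or a higher sample rate both shorten the wait, at the cost of more work for the CPU.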
When you choose to work with high sample rates, your files are larger, so you will need more disk space to store the project.
If you often collaborate or work over the Internet, keep in mind that the larger the project, the longer it takes to upload and download.
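To estimate how much space a recording takes, the sketch below multiplies sample rate, bytes per sample, channels and duration. It assumes uncompressed audio (as in a WAV file, ignoring the small file header); the function name is hypothetical.

```python
def pcm_size_bytes(sample_rate_hz, bit_depth, channels, duration_s):
    """Approximate size of uncompressed audio, in bytes."""
    return sample_rate_hz * (bit_depth // 8) * channels * duration_s

one_minute_cd = pcm_size_bytes(44_100, 16, 2, 60)    # stereo, CD quality
one_minute_hi = pcm_size_bytes(96_000, 24, 2, 60)    # stereo, 96 kHz / 24 bits

print(one_minute_cd / 1_000_000)   # 10.584 (MB per minute)
print(one_minute_hi / 1_000_000)   # 34.56 (MB per minute)
```

Per track and per minute, the 96 kHz / 24-bit version is more than three times larger, and a project usually has many tracks.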
Setting the sample rate in your DAW (Digital Audio Workstation)
DAWs commonly offer different sample rate options, normally varying between 44.1 kHz and 192 kHz. It is always good to check if your audio interface supports this setting before actually setting up the sample rate on your DAW.
Nowadays, common interfaces on the market, such as those from M-Audio, PreSonus, Steinberg and Focusrite, usually support rates from 44.1 kHz to 192 kHz without problems. Don’t forget to read the interface’s manual before buying! It is also worth looking at the frequency range and the frequency response chart of your microphone. These parameters show the sensitivity of the microphone at each frequency and the extent of its recording range. So is it worth recording at a sample rate of 192 kHz to capture extremely high frequencies when your microphone only captures up to 20 kHz?
After all, which sample rate should I use?
44.1 kHz versus 48 kHz
CDs use 44.1 kHz, and MP3s usually do too. Philips and Sony established this standard in the early 1980s. But why is the number so odd? According to music technology specialist Mitch Gallagher, in the early days of digital audio recording the standard was 48 kHz. However, the manufacturers established a different standard for the products offered to the public. The idea was to make piracy harder: mathematically, it is hard to convert a rate of 48 kHz to 44.1 kHz. A curiosity: in the audiovisual realm, the sample rate of 48 kHz was set as the standard right from the beginning, and there is an interesting reason for that. The rate of 48,000 samples per second is a multiple of the 24 frames per second used in movies.
Therefore, if the song is made for a music video, the sample rate should be 48 kHz or one of its multiples. Conventional DVDs typically use 48 kHz, and DVD-A discs (DVD-Audio, which are different from common DVDs) commonly use 96 kHz (two times 48). In 2018, the company Tidal started offering CDs with MQA (Master Quality Authenticated), which works with a sample rate of 96 kHz.
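The arithmetic behind both points, the match with 24 frames per second and the awkward conversion between 48 kHz and 44.1 kHz, can be checked with exact fractions. This is just a sketch; `samples_per_frame` is a name made up for this example.

```python
from fractions import Fraction

FILM_FPS = 24

def samples_per_frame(sample_rate_hz, fps=FILM_FPS):
    """Exact number of audio samples that fit in one video frame."""
    return Fraction(sample_rate_hz, fps)

print(samples_per_frame(48_000))    # 2000: a whole number of samples per frame
print(samples_per_frame(44_100))    # 3675/2: 1837.5 samples, not a whole number

# Ratio between the two rates, reduced to its simplest form.
print(Fraction(44_100, 48_000))     # 147/160
```

Because the ratio only reduces to 147/160, converting between the two rates means rebuilding 147 new samples out of every 160 original ones, instead of, say, simply dropping every other sample.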
Why does 44.1 kHz still survive?
It is possible that the 44.1 kHz standard will become obsolete soon. After all, in the era of streaming and processing power, what is the point of keeping it?
Here are some possible good explanations:
- Tradition: CDs are in 44.1 kHz;
- Inertia: audio interfaces are pre-set to 44.1 kHz, and so are onboard sound cards;
- Cost versus benefit: more processing power and disk space are needed to record at a sample rate higher than 44.1 kHz, and the changes are so subtle that an ordinary listener can’t hear the difference;
- Internet: extremely high-speed Internet is not a reality all over the world, but streaming and loading must be fast enough in these places – smaller files help a lot;
- Low-quality equipment: in-ear phones, earbuds, notebook speakers, etc. It is already hard to hear the difference between sample rates on high-quality equipment, let alone on conventional equipment, where the differences drop to almost zero;
- Technical issues: a lot of playback equipment can only handle 44.1 kHz, even today.
Are sample rates of 88.2 kHz or 96 kHz worth it?
There is no consensus on whether recording beyond conventional rates is worthwhile. Some people say that extremely high frequencies, beyond the limit of human hearing, affect what we hear. Others say they are able to “feel” these frequencies. And some claim that “the higher the sample rate, the better the audio quality” is just a sales pitch.
Either way, it’s important to keep in mind that any sample rate conversion, even from a higher to a lower rate, costs some sound quality. Conversion algorithms are imperfect and can change the timbre.
- Try to run your project at a sample rate that won’t need converting later;
- If you are having CDs pressed at a factory and you recorded above 44.1 kHz, let the company do the conversion. The same goes for bit depth (if you recorded at 24 bits, for example);
- If you are syncing the audio to a video, record at 48 kHz;
- Never, ever convert recorded audio to another sample rate in the middle of your project. If you started at a different sample rate than you wanted, stick with it, finish the project, and remember to change it next time;
- Only convert the sample rate after you finish and bounce the project. If you are having a CD pressed at a factory or specialized company, export in the recorded format and let them do their job.
- The sample rate is the frequency at which the analog signal is sampled for digitization;
- The choice of a sample rate is directly related to the media you are going to work with;
- The aliasing effect is the wrong interpretation of a frequency caused by a sample rate that is too low. To avoid it, sound cards usually include an anti-aliasing filter;
- Jitter is a clock error in the sampling process;
- Dither is background noise used to mask digitization errors when finishing recordings;
- Latency is the delay between the sound source and what you hear. The higher the sample rate, the lower the latency, but more processing power is needed;
- The higher the sample rate, the more disk space it’ll use;
- Most audio interfaces and DAWs usually work with sample rates between 44.1 kHz and 192 kHz.