When home recording first became mainstream…
It happened for one simple reason:
The analog gear of decades past was slowly, but surely, being replaced…
By a new generation of audio interfaces and other digital gear that was cheaper and easier to use than ever before.
And that trend has continued since.
Today…digital audio is the standard in nearly all studios, both pro and amateur.
Yet surprisingly few people really understand what it’s all about.
So for today’s post, what I have for you is a comprehensive introduction to the basics of Digital Audio for Music Recording.
These are the 9 topics we will cover:
While digital audio is the standard in music nowadays…
It wasn’t always that way.
Originally, musical information existed only as sound waves in the air.
Then as technology advanced, people discovered ways of converting it to other formats, including:
But ultimately, with the rise of computers, digital audio became the dominant format for music recording, because it allowed songs to be easily copied and shared at virtually no cost.
And the device that makes it all possible is…the digital converter.
To understand how they work, up next…
In the recording studio, digital converters exist in 2 forms:
To convert audio into binary code, they take tens of thousands of snapshots (samples) per second to build an “approximate” picture of the analog waveform.
The picture is not exact, because in the moments between samples, the converter must essentially guess what’s going on.
As you can see in the above diagram where:
The results aren’t perfect, but they’re good enough to produce excellent sound quality.
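The snapshot process described above can be sketched in a few lines of Python (a toy illustration only, not real converter code — `sample_sine` is a made-up helper name):

```python
import math

def sample_sine(freq_hz, sample_rate_hz, num_samples):
    """Take evenly spaced 'snapshots' of a sine wave,
    the same way an A/D converter samples an analog signal."""
    return [math.sin(2 * math.pi * freq_hz * n / sample_rate_hz)
            for n in range(num_samples)]

# A 1 kHz tone sampled at 44.1 kHz gets roughly 44 snapshots per cycle,
# which is plenty to reconstruct the waveform accurately.
snapshots = sample_sine(1000, 44100, 10)
```

Everything between those snapshots is what the converter has to "guess" at during playback.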
Exactly how excellent depends mostly upon…
Take a look at this picture:
As you can see…
By taking more snapshots per second, higher sample rates:
And the end result is, of course…better sound quality.
Now let’s talk specific numbers:
Common sample rates in pro audio include:
The 44.1 kHz minimum is due to a mathematical principle known as…
To accurately record digital audio, converters must capture the full spectrum of human hearing, from 20 Hz – 20 kHz.
According to the Nyquist-Shannon Sampling Theorem…
Capturing a specific frequency requires at least 2 samples for each cycle…to measure both the upper and lower points on the waveform.
That means recording frequencies of up to 20 kHz requires a sample rate of 40 kHz or more…which is why CD audio lies just above that, at 44.1 kHz.
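The Nyquist requirement is simple enough to express directly (a trivial sketch; the function name is mine):

```python
def min_sample_rate(max_freq_hz):
    """Nyquist-Shannon: you need at least 2 samples per cycle
    of the highest frequency you want to capture."""
    return 2 * max_freq_hz

# Full range of human hearing tops out at 20 kHz:
print(min_sample_rate(20000))  # 40000 -- which is why CDs sit just above, at 44100
```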
While high sample rates DO produce better sound quality…the benefits aren’t free.
The costs include:
So there’s always a trade-off. Pro studios can more easily support the highest sample rates because they use better gear.
For home studios though, most people find that a default setting of 48 kHz works best.
To understand bit depth, let’s first discuss bits.
Short for binary digit, a bit is a single unit of binary code, valued at either a 1 or 0.
The more bits used, the more combinations are possible. For example…
As you can see in the diagram below, 4 bits yields a total of 16 combinations.
When used to encode information, each of these numbers is assigned a specific value.
By increasing the bits, the number of possible values grows exponentially.
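That exponential growth is easy to verify yourself (a one-line sketch):

```python
def combinations(bits):
    """Each additional bit doubles the number of possible binary values."""
    return 2 ** bits

print(combinations(4))   # 16
print(combinations(16))  # 65536
print(combinations(24))  # 16777216 (~17 million)
```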
With bit depth in digital audio, each value is assigned a specific amplitude on the audio waveform.
The greater the bit depth, the more volume increments exist between loud and soft…and the greater the dynamic range of the recording.
A good rule of thumb to remember is: For every extra “bit”, dynamic range increases by 6dB.
Ultimately what this means is…more bit depth equals less noise…
Because by adding this extra headroom, the useful signal (on the loud end of the spectrum) can be recorded higher above the noise floor (on the soft end of the spectrum).
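The 6 dB-per-bit rule of thumb works out like this (a sketch using the slightly more precise 6.02 dB figure; `dynamic_range_db` is an illustrative helper, not a standard function):

```python
def dynamic_range_db(bits):
    """Rule of thumb: each bit adds ~6 dB of dynamic range.
    (The precise figure is 20*log10(2), roughly 6.02 dB per bit.)"""
    return 6.02 * bits

print(round(dynamic_range_db(16)))  # ~96 dB  (CD audio)
print(round(dynamic_range_db(24)))  # ~144 dB
```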
It sounds impressive, that a 24 bit recording yields almost 17 million possible values, right?
Yet that’s still far less than the infinite number of possible values that exist in an analog signal.
So with almost every sample, the actual value lies somewhere in between two possible values. The converter’s solution is to simply round it off, or “quantize” it, to the nearest value.
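That round-to-nearest-value behavior can be modeled in a few lines (a simplified sketch assuming samples normalized to the ±1.0 range; real converters work on integer codes):

```python
def quantize(sample, bits):
    """Round a sample in [-1.0, 1.0] to the nearest of 2**bits levels.
    The rounding error introduced here is quantization error."""
    levels = 2 ** (bits - 1)  # signed audio: half the levels per polarity
    return round(sample * levels) / levels

x = 0.123456789
print(quantize(x, 4))   # coarse grid: large rounding error
print(quantize(x, 16))  # fine grid: error shrinks to a tiny fraction
```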
The resulting distortion, known as quantization error, happens at 2 phases of the recording process:
With mastering, the sample rate/bit depth of the final track is often reduced upon conversion to its final digital format (CD, mp3, etc.).
When this happens, some information gets deleted and “re-quantized” resulting in further distortion of the sound.
To deal with this problem, there’s a handy solution known as…
When reducing a 24 bit file down to a 16 bit file, dither is used to essentially mask a large portion of the resulting distortion…
By adding a low-level of “random noise” to the audio signal.
Since the concept is hard to visualize with audio, the popular analogy used to explain it is dithering with images.
Here’s how it works:
When a color photo is converted to black and white, mathematical guesswork is done to determine whether each colored pixel should be “quantized” to a black pixel, or a white pixel…
…Just like how guesswork is done to quantize digital audio samples.
As you can see in the figure below, the “before” picture looks pretty crappy, doesn’t it?
But with dither…
And by adding this “random noise” to the image, the “after” picture looks much better. With audio dithering, the concept is very similar.
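The image analogy translates directly into a toy sketch (assuming 1-bit “black or white” quantization and uniform random noise — a simplification of the shaped dither used on real audio):

```python
import random

def quantize_1bit(x):
    """Force a brightness value in [0, 1] to pure black (0) or white (1)."""
    return 1 if x >= 0.5 else 0

def quantize_with_dither(x, rng):
    """Add low-level random noise BEFORE rounding, so grey pixels land on
    black or white in proportion to their original brightness."""
    return 1 if x + rng.uniform(-0.5, 0.5) >= 0.5 else 0

rng = random.Random(0)
grey = 0.3  # a 30%-bright pixel
plain = [quantize_1bit(grey) for _ in range(10000)]            # always black
dithered = [quantize_with_dither(grey, rng) for _ in range(10000)]
# Averaged over many pixels, dither preserves the original brightness:
print(sum(plain) / 10000, sum(dithered) / 10000)
```

Without dither, every 30%-grey pixel snaps to black and the detail is gone; with dither, about 30% of them come out white, so the average shade survives — exactly the distortion-masking effect described above.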
The ONE BIG FLAW with digital studios today is the amount of time delay (latency) that accumulates in the signal chain, especially with DAWs.
With all the calculations that occur, it takes anywhere from a few milliseconds to a few DOZEN milliseconds for the audio signal to exit the system.
In a typical digital signal chain, there are 4 stages that add to the total delay time:
A/D and D/A conversion are the 2 smallest offenders, contributing less than 5 ms of total delay.
Your DAW buffer, and certain plugins (including “look-ahead” compressors and virtual instruments), can add up to 20, 30, 40 ms or more.
To keep it at a minimum:
As you’ll notice, buffer sizes are measured in samples, NOT milliseconds. To convert between the two:
For example: 1024 samples ÷ 44.1 kHz = 23 ms
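That conversion can be sketched as (`buffer_latency_ms` is just an illustrative name):

```python
def buffer_latency_ms(buffer_samples, sample_rate_hz):
    """Convert a DAW buffer size in samples to milliseconds of delay."""
    return buffer_samples / sample_rate_hz * 1000

print(round(buffer_latency_ms(1024, 44100)))  # ~23 ms
print(round(buffer_latency_ms(256, 44100)))   # ~6 ms
```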
If you hate doing math, here’s an easier way to remember it at 44.1 kHz:
In MOST cases, these steps should bring the latency down to a manageable level…
But sometimes, if your gear is either too old or too cheap, it may NOT.
In that case…
Many budget interfaces have a “mix” or “blend” knob, which allows you to combine the session playback, with “live signal” being recorded.
By splitting your live mic/guitar signal and sending half to the computer to be recorded, and half directly to your studio headphones, you avoid latency by side-stepping the signal chain entirely.
The downside to this technique is…you hear the live signal completely dry, with zero effects.
Hopefully though, since computers keep getting faster, this won’t be an issue in the near future.
Whenever two or more devices exchange digital information in real-time…
Their internal clocks must be synced so the samples stay aligned…
Preventing those annoying clicks and pops in the audio that otherwise occur.
To sync them, one device functions as the “master”, and the rest as “slaves”.
In simple home studios, the audio interface clock usually leads by default.
In pro studios, which require premium digital conversion and complex signal routings…
A special stand-alone device known as a digital master clock (aka word clock) can be used instead. As many owners claim, the sound benefits of these high-end clocks can be far less subtle than you might imagine.
In today’s world, compressed audio files are the norm in digital audio.
Because with the limited storage space of iPods, smartphones, and internet streaming, all files must be as small as possible.
Using a method of “lossy data compression”, mp3, AAC, and other similar formats can shrink audio files down to 1/10th their original size.
The encoding process works using a principle of human hearing known as “auditory masking”…
Which makes it possible to delete tons of musical information, while still maintaining acceptable levels of sound quality to most listeners.
Experienced audio engineers might hear a difference, but average consumers will not.
Exactly how much information gets deleted, depends on the bitrate of the file.
With higher bitrates, less information is removed, and more detail is preserved.
For example, with mp3:
To find the ideal format and bitrate for YOUR music, always double-check the recommendations of its destination (iTunes, YouTube, SoundCloud, etc.)
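The bitrate-to-size relationship is easy to estimate yourself (a rough sketch that ignores metadata and container overhead, so real files run slightly larger):

```python
def mp3_size_mb(bitrate_kbps, duration_sec):
    """Estimate compressed file size: bitrate (kilobits per second)
    times duration, converted from bits to megabytes."""
    return bitrate_kbps * 1000 * duration_sec / 8 / 1_000_000

# A hypothetical 4-minute song at two common mp3 bitrates:
print(round(mp3_size_mb(128, 240), 1))  # ~3.8 MB
print(round(mp3_size_mb(320, 240), 1))  # ~9.6 MB
```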