Digital Music Quality

I recently overheard a discussion about old-fashioned vinyl records versus compact discs (CDs) or, to put it more fancily, analogue versus digital music sources. I don’t want to delve into that topic, though. But the proponent of digital music, a smart and technically savvy person, said that digital recordings on CDs, or in an equivalent digital sound file such as WAV, are perfectly accurate.

But are they?

Mr Shannon

Yes, as long as the recording and playback adhere to Mr Shannon’s sampling theorem: the sampling frequency must be more than double the highest frequency that occurs in the analogue signal. Sampling is the process of digitising an analogue signal: the amplitude value is recorded as a number at equidistant points in time, at the sampling rate.1 If we have more than two such numbers per period of the highest frequency, we can reconstruct the analogue signal.
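
For reference, the sampling condition and the reconstruction it promises can be written compactly. This is the standard textbook form (the Whittaker-Shannon interpolation formula), not something specific to CDs. The symbols are notation introduced here: $f_s$ is the sampling frequency, $T = 1/f_s$ the sampling interval, $f_{\max}$ the highest frequency in the analogue signal, and $x[n]$ the recorded numbers.

```latex
f_s > 2 f_{\max}, \qquad
x(t) = \sum_{n=-\infty}^{\infty} x[n] \,
       \operatorname{sinc}\!\left(\frac{t - nT}{T}\right), \qquad
\operatorname{sinc}(u) = \frac{\sin(\pi u)}{\pi u}
```

Note that the sum runs over all samples, past and future, so the formula describes an ideal reconstruction rather than a circuit you could build.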

What if we have analogue frequencies above half the sampling frequency? To ensure a proper sampling process, such frequencies must be cut off by a low-pass filter before they enter the digitiser, the analogue-to-digital converter (ADC). Otherwise, we get aliasing, where the mirrored spectra that occur at integer multiples of the sampling frequency “bleed” into the audible spectrum, and we get distortions. It’s pure physics, and no fancy technology can help with that.
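
To make the aliasing concrete, here is a minimal sketch in Python. It assumes a hypothetical, unfiltered 25 kHz tone fed into a 44.1 kHz sampler; the function names and the choice of tone are mine, for illustration only. It computes where the tone lands in the sampled spectrum, and checks that the samples of the 25 kHz tone are indistinguishable from those of its lower-frequency impostor.

```python
import math


def alias_frequency(f_hz, fs_hz):
    """Frequency in 0..fs/2 at which a tone of f_hz appears after sampling at fs_hz."""
    folded = f_hz % fs_hz  # the spectrum repeats at integer multiples of fs
    return folded if folded <= fs_hz / 2 else fs_hz - folded


def sampled_cosine(f_hz, fs_hz, count):
    """First `count` samples of a unit-amplitude cosine at f_hz, sampled at fs_hz."""
    return [math.cos(2 * math.pi * f_hz * n / fs_hz) for n in range(count)]


FS_CD = 44_100       # CD sampling rate
F_TOO_HIGH = 25_000  # hypothetical tone above half the sampling rate

f_alias = alias_frequency(F_TOO_HIGH, FS_CD)
original = sampled_cosine(F_TOO_HIGH, FS_CD, 50)
impostor = sampled_cosine(f_alias, FS_CD, 50)
largest_difference = max(abs(a - b) for a, b in zip(original, impostor))

print(f_alias)             # 19100
print(largest_difference)  # tiny, floating-point rounding only
```

Once sampled, the two tones are the same list of numbers, so no processing after the ADC can tell them apart. That is why the filter has to sit in front of the digitiser.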

The audible frequency spectrum that a young and healthy person can hear is about 20 Hz to 20 kHz (it gets worse with age). The sampling rate of a CD is 44.1 kHz. So half the sampling rate — 22.05 kHz — should be fine, to represent that frequency range, right? Just use a low-pass filter that cuts out all frequencies above 20 kHz, since we cannot hear these frequencies anyway.

Constructing such a steep low-pass filter is a formidable technical challenge. But let’s assume we have an ideal filter, since this is about physics, not technology. There is one aspect that all these observations and discussions about audible and recorded frequencies do not consider: the dynamic response of the whole system in the time domain.
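
A rough back-of-the-envelope calculation shows why the filter is such a challenge. The sketch below only uses the 20 kHz and 44.1 kHz figures from above. The 96 dB attenuation target is my assumption (roughly the dynamic range of 16-bit audio), as are the variable names; a real design would set its own target.

```python
import math

F_PASS_HZ = 20_000           # upper edge of the audible band
F_STOP_HZ = 44_100 / 2       # half the CD sampling rate
ASSUMED_ATTENUATION_DB = 96  # assumption: about the dynamic range of 16 bits

# Room the filter has to go from "pass everything" to "block everything".
transition_width_hz = F_STOP_HZ - F_PASS_HZ
transition_octaves = math.log2(F_STOP_HZ / F_PASS_HZ)

# Average slope needed across that narrow band.
required_slope_db_per_octave = ASSUMED_ATTENUATION_DB / transition_octaves

print(f"transition band: {transition_width_hz:.0f} Hz "
      f"({transition_octaves:.3f} octaves)")
print(f"required slope:  {required_slope_db_per_octave:.0f} dB per octave")
```

For comparison, a simple first-order RC low-pass rolls off at about 6 dB per octave, so under this assumption the filter would have to be roughly a hundred times steeper.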

This has always bugged me. I am not an audio engineer, and I have not created any filter or amplifier for many, many years, be it analogue, digital, or mixed, so I am not up to date with the latest technologies and techniques. But there is basic physics, such as the fact that no macro-world object can travel faster than the speed of light, whatever the technology.

My engineer’s intuition, which is what remains of actual know-how when you get older, is that the digital data on a CD cannot actually represent the original music if it contains sudden changes in amplitude. I do not mean an increase in overall loudness, like a crescendo, but amplitude spikes created by, say, percussion such as a cymbal, or the strings of an orchestra: amplitude increases of certain frequencies within the overall music that happen within milliseconds (transients).

Mr Laplace

As Mr Laplace has shown, the frequency domain and the time domain are inextricably linked. If you know the frequency response of a system, you can calculate the corresponding time response using the so-called (unsurprisingly) Laplace transform, or vice versa. It’s one of the most basic pieces of math any electronics engineer learns early in their education. Some system characteristics are more easily inspected in the frequency domain, some in the time domain. You do this transform all the time, back and forth.
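
As a reminder of what that looks like on paper, here is the one-sided Laplace transform together with the simplest example I can think of, a first-order low-pass. The choice of example and the symbols ($\tau$ for the time constant, $H$ for the transfer function, $h$ for the impulse response, $f_c$ for the cutoff frequency) are mine, for illustration only.

```latex
F(s) = \int_{0}^{\infty} f(t)\, e^{-st}\, dt, \qquad s = \sigma + j\omega

H(s) = \frac{1}{1 + s\tau}
\quad\Longleftrightarrow\quad
h(t) = \frac{1}{\tau}\, e^{-t/\tau}, \quad t \ge 0,
\qquad f_c = \frac{1}{2\pi\tau}
```

Setting $s = j\omega$ in $H(s)$ gives the frequency response, while $h(t)$ is the same system seen in the time domain. Change $\tau$, and both change together.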

If we limit the frequency range of a system, in order to properly digitise an analogue signal according to Shannon, as outlined above, we also change its time, or dynamic, response. And we humans are very sensitive to timing shifts in what we hear, down to the millisecond range. The Berlin Philharmonic Orchestra sounds so much better (in reality, not via a recording) than some average orchestra because they have the timing down cold. It’s a striking experience.
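
The link between bandwidth and time response is easy to see for a simple system. The sketch below assumes a first-order low-pass, whose step response is 1 - exp(-t/tau); that is a deliberate simplification and not the steep filter a real recording chain would use. The function name is hypothetical. It computes the time the output needs to get from 10% to 90% of a step.

```python
import math


def rise_time_10_90(cutoff_hz):
    """10%-90% rise time in seconds of a first-order low-pass (assumed model)."""
    tau = 1 / (2 * math.pi * cutoff_hz)  # time constant for the given cutoff
    t10 = -tau * math.log(1 - 0.1)       # step response reaches 10%
    t90 = -tau * math.log(1 - 0.9)       # step response reaches 90%
    return t90 - t10


for cutoff_hz in (20_000, 40_000, 80_000):
    microseconds = rise_time_10_90(cutoff_hz) * 1e6
    print(f"{cutoff_hz:>6} Hz cutoff -> rise time {microseconds:.2f} us")
```

For this model the rise time is about 0.35 divided by the cutoff frequency, so halving the bandwidth doubles the time the system needs to follow a sudden change.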

Time Domain Accuracy

As the Laplace transform shows, limiting the frequency bandwidth of a system limits the slew rate, i.e. how fast the system can faithfully respond to a rapid, transient change in the input signal amplitude.2 This is because even a simple sine wave in the audible spectrum at, say, 8 kHz produces higher frequencies if we change its amplitude quickly. So if the low-pass filter used for digital recording cuts enough of these higher frequencies, the recording will simply not contain a true representation of the original signal. There will be timing issues. Think slew rate: the onset of the transient signal will show a delay.
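
Here is a small numerical sketch of the claim that a quickly changing 8 kHz tone contains higher frequencies. Everything besides the 8 kHz tone and the 22.05 kHz limit is an assumption of mine: the 384 kHz simulation rate, the 2.5 ms window, the abrupt switch-on halfway through, and the function names. It compares a steady tone with a suddenly starting one and reports the fraction of signal energy above half the CD sampling rate, using a plain discrete Fourier transform.

```python
import cmath
import math

FS_SIM = 384_000  # assumed simulation rate, far above the audio band
N = 960           # 2.5 ms window, so DFT bins are 400 Hz apart
F_TONE = 8_000    # tone frequency from the text
F_CUT = 22_050    # half the CD sampling rate


def tone(start_index):
    """8 kHz sine that is silent before start_index and then starts abruptly."""
    return [math.sin(2 * math.pi * F_TONE * n / FS_SIM) if n >= start_index else 0.0
            for n in range(N)]


def fraction_above_cutoff(x):
    """Share of the signal energy in DFT bins above F_CUT (via Parseval)."""
    total = sum(v * v for v in x)
    k_max = int(F_CUT * N / FS_SIM)  # highest bin at or below the cutoff
    in_band = 0.0
    for k in range(k_max + 1):
        x_k = sum(v * cmath.exp(-2j * math.pi * k * n / N) for n, v in enumerate(x))
        weight = 1 if k == 0 else 2  # real signal: bins k and N-k carry equal energy
        in_band += weight * abs(x_k) ** 2
    return 1.0 - (in_band / N) / total


steady_fraction = fraction_above_cutoff(tone(0))      # tone fills the window
gated_fraction = fraction_above_cutoff(tone(N // 2))  # tone starts abruptly

print(f"steady tone:  {steady_fraction:.2e}")
print(f"sudden onset: {gated_fraction:.2e}")
```

The steady tone has essentially nothing above the cutoff, while the suddenly starting tone has a clearly nonzero share there. That share is what an ideal low-pass filter in front of a 44.1 kHz sampler would remove. How audible that is is a separate question this sketch does not answer.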

The other way around, this basically means that we can perceive frequencies higher than 20 kHz in the time domain, i.e. in the precision with which all the frequencies in an audio signal are aligned to each other. Alas, the technical spec of a digital recording and playback device will usually only say how it responds to continuous sine wave signals at different frequencies. And with continuous sine waves, all is fine. But such specs don’t tell the full story.

Note that Shannon’s sampling theorem is still correct and accurate: to catch those higher frequencies produced by the rapid amplitude change, we simply have to sample at a higher frequency. Which is what certain recording studios actually do, e.g. at 96 kHz or even 192 kHz.
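
For orientation, this is what the different sampling rates mean in numbers. The sketch simply applies "half the sampling rate" and "one over the sampling rate" to the three rates mentioned; the helper names are mine.

```python
RATES_HZ = (44_100, 96_000, 192_000)  # CD rate and the two studio rates above


def nyquist_hz(fs_hz):
    """Highest frequency the sampling theorem allows at this rate."""
    return fs_hz / 2


def sample_interval_us(fs_hz):
    """Time between two consecutive samples, in microseconds."""
    return 1e6 / fs_hz


for fs_hz in RATES_HZ:
    print(f"{fs_hz:>7} Hz: up to {nyquist_hz(fs_hz) / 1000:.2f} kHz, "
          f"one sample every {sample_interval_us(fs_hz):.2f} us")
```

At 192 kHz the representable bandwidth reaches 96 kHz, more than four times what a CD can hold.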

The final CD data, though, will still be limited to its 44.1 kHz sampling rate, and thus playback rate. Hence, in general, it simply cannot contain an accurate representation of the original analogue signal, due to physics, independent of all the fancy recording and audio processing technology used. Specific recordings can of course be accurate, if the original signal contained no higher frequencies for the low-pass filter to cut off in the first place. Think of some dreamy, mellow music without percussion.

The Playback Chain

The question is whether we listeners will even perceive these inaccuracies in the real world, given all the devices involved in playback: from the digital source to the digital-to-analogue conversion to the amplification to the transducers in loudspeakers, earphones and headphones. All of these elements introduce their own limitations and imperfections.

But that’s a different discussion. I mean, people use Bluetooth earbuds, with all the compression artefacts involved, for crying out loud. And the sources are MP3 or AAC files, with their lossy compression. Not even CDs. So the timing inaccuracies of the digital sound source are probably not their main obstacle to getting good-quality sound.

Somewhere in storage I have DNM pre-amps and power-amps, which I had combined with a pair of Rehdeko loudspeakers. The electronics of the amps is designed for timing accuracy, and it makes all the difference. The sound is amazing. Of course they cannot make up for any deficiencies in the (digital) sources, and obviously they cannot work around the “Laplace physics”, but at least the analogue components minimise the timing issues. And it shows.

Apple recommends using wired headphones or earphones for their highest-quality music files (Apple Digital Masters, see below).

Lost Forever

Bottom line: digital sources of CD quality in general cannot represent the original analogue music signal. The timing accuracy is lost forever. No “spatial audio” or other trickery can bring it back. The only remedy, Mr Shannon says, would be to record at higher sampling frequencies in the first place, and to make these recordings available to us listeners at those rates. Mr Laplace would concur.

Apple Digital Masters could be a step in this direction. Maybe there’s hope.

But old, existing recordings are what they are: what is lost cannot be “engineered in” again.3

The Proof

Here’s an article that confirms the above.

They modelled transient signals, and then also actually measured high-end audiophile equipment, to find exactly the timing inaccuracies that Mr Laplace predicts.

Astonishingly, the article does not even mention the Laplace transform.

Better Masters

Here’s a PDF document about Apple Digital Masters. It stuns me that a naive statement like this can be found:

The highest frequency audible to humans is around 20kHz; therefore a sampling rate of over 40kHz is required to accurately capture the audible range of frequencies. Compact discs’ 44.1kHz rate is adequate for this need.

Or this one:

Even so, many experts feel that using higher sample rates during production provides better-quality audio and a superior listening experience in the end product. For this reason, higher sample rates of 48, 88.2, 96 and even 192 kHz have become standard.

There’s nothing to “feel” here. There’s the physics, and it can be measured. Read the article.

Mr Laplace would spin in his grave.

  1. Linear pulse code modulation (LPCM) for CDs.

  2. In practice, there are other factors, for example if the system can produce sufficient power (voltage and current) to drive its load.

  3. Or maybe it could, to a degree, by careful spectrum analysis, and a machine learning system that uses clever heuristics, to fake a higher sampling rate? No idea. But we could never be sure it’s accurate, since we don’t have the original to compare with. And we would need the appropriate playback system.