After my initial “enthusiastic success” with the improved music experience, I have purchased more music files, and compared. And compared. There are quite big differences regarding the experienced audio. So I had to dig deeper. Me being me, I guess. I wanted to better understand what caused the audible differences.
There are tons of articles, discussion forums, and videos on the interwebs about everything “audiophile”. Alas, most forums are, well, typical Internet forums, with lots of opinions, but little real information, that is, information underpinned by traceable references to foundational knowledge, by well-described and thus repeatable tests based on hypotheses, etc.
Nullius in verba. I had to go back – or deeper – into the basics and build an understanding from there. I looked for peer-reviewed papers, or articles with references to such papers. I found a video series on the basic maths that actually tackles the fundamentals properly, developing the foundational knowledge step by step. And so on.
As an aside, all this sent me way back down memory lane, and caused some regrets. See, I have a Master of Science (MSc) degree with a specialisation in Signal Processing and Biomedical Engineering,1 but life had sent me down a different professional path. So I again realised what I once knew, and have mostly forgotten. Ayo. Se lavi-la.
Anyway, there turned out to be quite a few rabbit holes to go down into.
I’ll split this story into three segments:
- this post about the basics
- practical audio reconstruction, or simply, you know, gear (added 2023-09-04)
- audio experience (coming soon)
Obviously, the third part will be the one where it all comes together and I actually try to explain the experienced differences. Stay tuned. :)
Human Hearing and Audio
Starting with the basics means getting a better understanding of human hearing. The Human Auditory System and Audio gives a good introduction. There’s also Audibility of temporal smearing and time misalignment of acoustic signals and Temporal resolution of hearing probed by bandwidth restriction by the same author.
The main takeaways from these papers (and others) for this context are:
- our hearing is phenomenal, with an incredible dynamic range that is better than any audio equipment can provide;2
- we can distinguish musical events that are about 5 µs (microseconds) apart, which corresponds to frequency components that are well above 20 kHz;
- this ability to distinguish temporal events does not substantially diminish with age, that is, even if we lose the ability to hear high frequency continuous signals when getting older, we can still perceive the timing pretty well;
- we experience non-linearities in our hearing, which can “colour” how we perceive sounds in the real world, and thus influence how we’re used to hearing the “real sound”, i.e. not via playback of recordings;
- apart from the sound transmitted via air pressure waves into our ears, we also perceive sound via skull bone conduction;
- the temporal loudness onset and offset of a sound are much more defining than its spectral composition. In an experiment, various wind instruments were recorded and then played with the beginnings and ends of the notes marginally clipped off, so the spectra hardly changed; however, professional musicians had difficulty recognising their everyday familiar instruments. If you have ever programmed a synthesiser (the electronic instrument), you know that the loudness envelope, traditionally defined by the four parameters attack, decay, sustain, and release, is crucial for the sound produced;3
- the main musical pitch range is actually quite limited: the standard 88-key piano goes from A0 = 27.5 Hz to only C8 = 4,186 Hz; even “old ears” easily perceive that highest tone well enough to identify the base pitch, which means that the remainder of the frequency range above is used for the harmonics of the instruments’ tone (the spectrum), and the timing, which is more important for recognising and appreciating the instruments’ sound.
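As a quick sanity check of two numbers in the list above, here is a minimal Python sketch. It assumes twelve-tone equal temperament with A4 = 440 Hz and the usual key numbering (key 1 = A0, key 49 = A4, key 88 = C8); the function name `piano_key_frequency` is just mine, for illustration. The last lines do some naive arithmetic on the 5 µs figure: taking the reciprocal is a crude way to get a feel for the scale, not a statement about what any sampled system can or cannot resolve.

```python
def piano_key_frequency(key, a4_hz=440.0):
    """Frequency in Hz of a piano key (1..88), assuming equal temperament.

    Assumption: key 49 is A4, tuned to a4_hz; each key is one semitone,
    i.e. a factor of 2 ** (1 / 12), away from its neighbour.
    """
    return a4_hz * 2 ** ((key - 49) / 12)


lowest = piano_key_frequency(1)    # A0
highest = piano_key_frequency(88)  # C8
print(f"A0 = {lowest:.1f} Hz, C8 = {highest:.0f} Hz")

# Naive reciprocal of the ~5 microsecond timing figure, for scale only.
timing_s = 5e-6
print(f"1 / 5 us = {1 / timing_s / 1000:.0f} kHz")

# For comparison: spacing between two samples at the CD rate of 44.1 kHz.
cd_sample_spacing_s = 1 / 44_100
print(f"sample spacing at 44.1 kHz = {cd_sample_spacing_s * 1e6:.1f} us")
```

The piano range indeed ends a long way below 20 kHz, while the timing figure sits on a very different scale.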
Maths and Signal Processing
For the maths involved, and its application for signal processing, there’s a nice series of videos. The nice thing about maths – not a lot of opinions, merely facts. :)
Everyone and their dog in audiophile circles has heard about Shannon’s sampling theorem,4 and its basic requirement to use a sampling frequency of more than double the highest frequency in the original analogue signal to be able to reconstruct the signal. As nicely described in High-Resolution Audio: How True is your Playback?, this, however, is not the full story, even though many (mis)understand it that way. So many videos on YouTube with “you fools, haha, don’t buy high resolution music, CD quality can reproduce 20 Hz to 20 kHz perfectly, and you don’t hear beyond that”. Yes, that frequency band is correct – for continuous sine signals.
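For that steady-state case, the reconstruction can be sketched in a few lines of plain Python using the Whittaker-Shannon interpolation formula, x(t) = Σ x[n] · sinc(fs · t − n). This is only a toy under my own assumptions: a 15 kHz sine, the CD rate of 44.1 kHz, and a sum truncated to 2,000 samples on either side, so the result is close to the true value but not exact. The names `sinc` and `reconstruct` are mine.

```python
import math


def sinc(x):
    """Normalised sinc: sin(pi x) / (pi x), with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)


def reconstruct(samples, fs, t, first_index=0):
    """Whittaker-Shannon interpolation, truncated to the given samples.

    samples[0] is taken to be the sample with index first_index,
    i.e. the value of the signal at time first_index / fs.
    """
    return sum(s * sinc(t * fs - (first_index + n))
               for n, s in enumerate(samples))


FS = 44_100   # sampling rate in Hz
F0 = 15_000   # steady sine, below FS / 2
HALF = 2_000  # samples kept on each side of t = 0 (truncation, an assumption)

samples = [math.sin(2 * math.pi * F0 * n / FS) for n in range(-HALF, HALF + 1)]

# Pick an instant that falls between two sampling points.
t = 0.37 / FS
approx = reconstruct(samples, FS, t, first_index=-HALF)
exact = math.sin(2 * math.pi * F0 * t)
print(f"reconstructed: {approx:.4f}, exact: {exact:.4f}")
```

The value between the samples is recovered, up to the small error from cutting the infinite sum short.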
Transient level changes of pure sine signals well below half the sampling frequency can produce higher frequency components. Check out the article again, and how a simple 8 kHz transient sine signal results in a spectrum reaching beyond 20 kHz. And how the same transient 8 kHz signal gets distorted upon reconstruction at 44.1 kHz. And that’s just maths, not yet even measuring results of real equipment with its engineering tradeoffs and technological limitations.
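To get a feel for this without any real equipment, here is a small sketch in plain Python (a slow direct DFT, no libraries). It is not the analysis from the article, just my own toy setup with assumed parameters: a 192 kHz sample rate, a 2 ms window, and an 8 kHz sine that either runs continuously or is switched on for eight cycles only. It compares what share of the signal energy ends up above 20 kHz.

```python
import cmath
import math


def dft_power(x):
    """Return |X[k]|^2 for every bin, using a direct (slow) DFT."""
    n = len(x)
    return [
        abs(sum(x[i] * cmath.exp(-2j * math.pi * k * i / n) for i in range(n))) ** 2
        for k in range(n)
    ]


def energy_share_above(x, fs, cutoff_hz):
    """Share of the total signal energy at frequencies above cutoff_hz."""
    n = len(x)
    power = dft_power(x)
    # Bins k and n - k belong to the same physical frequency.
    high = sum(p for k, p in enumerate(power)
               if min(k, n - k) * fs / n > cutoff_hz)
    return high / sum(power)


FS = 192_000  # assumed sample rate, high enough to see beyond 20 kHz
F0 = 8_000    # tone frequency, 24 samples per cycle at this rate
N = 384       # 2 ms window

# Continuous sine: exactly 16 full cycles fit into the window.
steady = [math.sin(2 * math.pi * F0 * i / FS) for i in range(N)]

# Transient: the same sine, but only 8 cycles in the middle, silence around.
burst = [s if 96 <= i < 288 else 0.0 for i, s in enumerate(steady)]

steady_share = energy_share_above(steady, FS, 20_000)
burst_share = energy_share_above(burst, FS, 20_000)
print(f"steady sine: {steady_share:.2e} of the energy above 20 kHz")
print(f"tone burst:  {burst_share:.2e} of the energy above 20 kHz")
```

The continuous sine has essentially nothing above 20 kHz (only numerical noise), while the switched tone has a small but clearly non-zero share up there, even though the tone itself is only 8 kHz.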
Transient changes are crucial for the timbre of instruments – keyword envelopes –, and they can be perceived even by ageing ears and brains. Relatively low audible transient tones will produce frequencies that are well above the often cited 20 kHz hearing range, and we can “hear” these via the timing. Hence, focusing on continuous signals to analyse systems both mathematically and technically is only half the story.
The findings from crawling through these rabbit holes – hi there, Alice – tell me that I am probably not only being tricked by my perception.
“Audiophiles” often make outlandish – or so they appear – claims regarding audible differences between different cables and whatnot. And some “high end” music equipment manufacturers do as well. There’s good money to be made with a good story. Frequently, the claims cannot be confirmed by measurements. But then again, as we have seen above, there’s more to audio perception than overly simplistic and narrow measurements can tell. I cannot know how you perceive your music. I know my perception, and that it can change situationally, which is not surprising, given the nature of the extended mind. So if a new cable improves your listening experience, who am I to tell you that this is technically unlikely? The new cable, and the connected expectations, may just prime your priors, but your experience is better. Isn’t that what really counts? So I come down to “if it sounds better to you, go for it”.
Of course I am as vulnerable to this effect as the next gal, or guy. In this sense, I try to explore this topic in order to better understand my experience, and find actual reasons for differences, going beyond the too often simplistic mathematical and technical arguments.
On my journey I have come across arguments that I immediately dismissed at first, but then, upon further thinking about them, changed my mind in the direction of “yes, at least conceptually or technically possible”. An example? Let’s assume you have a wired Ethernet network, over which you transfer digital music data from a server to a digital-to-analogue converter (DAC). How can switches or routers in the network influence the quality of the reconstruction of the music in the DAC, and thus the music experience (assuming perfect digital data transmission without packet loss)? Think! Not sure I would hear the difference, but it’s at least feasible.
Biomedical Engineering requires a good understanding of human anatomy and physiology. I had always enjoyed the corresponding lectures, as well as the labs and exercises in the hospital. Measuring nerve conduction speed. Using ultrasonic doppler to measure blood flows in veins. Laser doppler for blood flow in the skin. Experiments in vivo. We were the experimenters as well as the guinea pigs. When I had my surgery a few years back, I badgered the poor anaesthetist about all his equipment and the procedures while he prepared me to go under. He didn’t mind, I guess he rarely has patients asking about his job in that situation. And he knew I’d shut up as soon as he pushed the syringe plunger. I once spent a whole day in an operating theatre, observing procedures. And attended the testing of a surgical robot in a lab on bodies. I’ll spare you the pictures of that experience. When I needed a crown for a broken tooth recently, I had to ask the dentist about the production process for a crown. He also seemed happy to describe it, probably only a few people care about such details of their craft. As regards the time frame, to give you a hint: my studies were when magnetic resonance imaging (MRI) was novel, and the first machines had just been installed in clinics. I am not even sure they used superconducting magnets back then. In the processing field, the Motorola 68xxx microprocessors were among the latest and most powerful chips, running at 4 to 24 MHz. Yes, megahertz. Switched capacitor circuits were new as well. ↩︎
Similar to our vision. The best cameras today do not even come close to matching the dynamic range of our visual perception. ↩︎
I have programmed Waldorf (wave table), Yamaha (frequency modulation), as well as Korg and Roland (traditional oscillators) synthesisers. Envelopes can also be used to shape the filter response over time. And the envelope parameters can be made dynamically changeable while playing the synthesiser, e.g. by how hard you hit a key (called velocity in MIDI parlance), or by pressing harder on the keys while the sound is already playing (MIDI: aftertouch). ↩︎
The theorem was actually previously described by E. T. Whittaker (1915). Of course Shannon gave him credit in his own papers. We do have the Whittaker-Shannon Interpolation Formula for reconstructing continuous signals. ↩︎