Written By: Ran Levi
This time, we’re interviewing Prof. Karlheinz Brandenburg about the invention of mp3 and the true story behind the important innovation.
This article is a transcript of a podcast.
MP3 is an algorithm: a series of computations that transform a raw audio recording into a stream of bits that represent the same audio but take much less space. It is a relatively recent invention, and the man responsible for it more than any other is Prof. Karlheinz Brandenburg. I reached out to him in 2011.
“My name is Karlheinz Brandenburg. I’m nowadays professor at Ilmenau University and at the same time the director of the Fraunhofer Institute for Technology. Most people who know my name know it because I’ve been involved in the research in audio coding and standardization leading to MP3.”
A Rejected Patent
Don’t let Karlheinz’s modesty fool you. His research, and his direct involvement in the standardization of mp3 later on, are the main reasons for mp3’s success.
“[MP3’s invention] goes back to the time I did my master’s thesis at the University of Erlangen, and my thesis advisor [Dieter Seitzer] had done a patent application. [He believed] ISDN networks should be used to transmit music. The patent examiner, in his first answer, wrote – ‘According to the State of the Art, music needs much higher bitrate than available on ISDN, thus this is not possible – and impossible ideas can not be patented.’ At that time, he was correct. So Seitzer looked for a student who could look into this and find out what is possible. I had just finished my Master’s work and was looking for some possibility to do my Ph.D., so I thought – OK! this is a nice idea. It probably will not work – but hopefully, there will be enough results to get the Ph.D.”
The patent clerk who examined Dieter Seitzer’s proposal rejected it for a very good reason.
One minute of an average song in CD quality contains about 10 megabytes of information: that’s like a shelf full of thick paperback books. Sending this much ‘raw’ information through the internet is rather trivial today – but it was far from trivial back in the 1980s, with low-bandwidth modems. But telephone technology has been around for more than a century, so transmitting voice and music over electrical wires must be possible, right?
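As a rough check on the size of uncompressed CD audio, the arithmetic can be sketched in a few lines of Python. The CD parameters are the standard ones (44,100 samples per second, 16 bits per sample, two channels). The ISDN figure is an assumption: the interview does not say which ISDN configuration Seitzer had in mind, so a single 64 kbit/s channel is used here purely for illustration.

```python
# Standard CD audio parameters.
SAMPLE_RATE_HZ = 44_100   # samples per second, per channel
BYTES_PER_SAMPLE = 2      # 16-bit samples
CHANNELS = 2              # stereo

# Assumption for illustration: one 64 kbit/s ISDN channel.
ISDN_CHANNEL_KBIT = 64


def cd_bytes(seconds: float) -> int:
    """Raw (uncompressed) size in bytes of `seconds` of CD-quality audio."""
    return int(SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS * seconds)


bytes_per_second = cd_bytes(1)
bytes_per_minute = cd_bytes(60)
kbit_per_second = bytes_per_second * 8 / 1000

# How many times smaller the stream must become to fit the assumed channel.
reduction_needed = kbit_per_second / ISDN_CHANNEL_KBIT

print(f"{bytes_per_second:,} bytes per second")
print(f"{bytes_per_minute:,} bytes per minute")
print(f"{kbit_per_second:.1f} kbit/s raw bitrate")
print(f"about {reduction_needed:.0f}x reduction needed")
```

Under these assumptions, a minute of CD audio comes to roughly 10.6 million bytes, and the raw stream is about twenty times too large for the assumed channel, which is why the patent examiner's objection was reasonable at the time.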
Yes, but Prof. Seitzer didn’t want to just transmit music over electrical wires: he also wanted to preserve its original quality. Music played over a telephone never retains its original quality: it sounds muffled and muddy. Telephone systems have a relatively low bandwidth – that is, they can’t transmit huge amounts of data. To be able to transmit music or voice over telephone lines, special filters remove some of the frequencies from the audio before it is sent, reducing the amount of information it contains. The overall quality of the audio is reduced – but not enough to prevent speech from being understood. It’s like removing the color from a movie: it might not be as enjoyable in black and white as in color, but it is still a movie nevertheless.
Karlheinz Brandenburg’s goal was to find a way to transmit audio – without the penalty of reduced quality. It also had to be done in real time: that is, the listener should be able to listen to the audio while it was being transmitted, much like in a telephone call. This meant that Karlheinz had to find a way to compress the audio significantly – yet retain its quality. It’s like trying to bake a cake that will be sweet and tasty but has the caloric content of a stalk of celery.
The route Karlheinz decided to take was that of Psychoacoustics. The roots of Psychoacoustics go back to 1894 when a physicist named Alfred Marshall Mayer reported an interesting discovery. Mayer noticed that when listening to two tones of similar frequencies – one tone will tend to overpower, or ‘mask’, the other tone even though they are of separate frequencies, and so should be processed separately by the auditory system.
For example, when a 1000 Hertz tone is played simultaneously with a weaker 1100 Hertz tone – the weaker tone will be masked out and will not be heard. The higher frequency sound will be overpowered by the lower one.
Masking is only one example of how what we hear depends not only on what is being played to us – but also on our ear structure and the way our brain processes the received information. The science that explores sound perception is Psychoacoustics. Researchers discovered, even as early as the 1970s, that they could use the weaknesses and limitations of the human auditory processing system to better compress audio without affecting its overall perceived quality. The key is to filter out the sounds we can’t hear: weak sounds next to loud sounds, or weak sounds at very high or very low frequencies. To return to the earlier cake analogy: if we remove ingredients that contribute calories but don’t contribute to the cake’s tastiness, it’s a win-win. We are saving calories without losing flavor… In audio, the end result is a compressed form of audio with no perceptible loss of sound quality.
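The idea of discarding sounds we cannot hear can be illustrated with a toy sketch. This is not the mp3 psychoacoustic model, which is far more elaborate. The `Tone` type, the `keep_audible` function and every threshold below are hypothetical values chosen only to reproduce the 1000 Hertz and 1100 Hertz example in miniature.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tone:
    freq_hz: float
    level_db: float


# Hypothetical, illustrative parameters (not taken from any standard).
MASK_RANGE_HZ = 200.0              # how close a louder tone must be to mask
MASK_DROP_DB = 15.0                # masked if at least this much quieter
HEARING_FLOOR_DB = 0.0             # toy absolute hearing threshold
AUDIBLE_BAND_HZ = (20.0, 20_000.0)


def is_audible(tone: Tone, all_tones: list[Tone]) -> bool:
    """Return False if the toy model says the listener cannot hear `tone`."""
    low, high = AUDIBLE_BAND_HZ
    if not (low <= tone.freq_hz <= high):
        return False  # outside the range of human hearing
    if tone.level_db < HEARING_FLOOR_DB:
        return False  # too quiet to hear at all
    for other in all_tones:
        if other is tone:
            continue
        close = abs(other.freq_hz - tone.freq_hz) <= MASK_RANGE_HZ
        if close and other.level_db - tone.level_db >= MASK_DROP_DB:
            return False  # masked by a much louder neighbour
    return True


def keep_audible(tones: list[Tone]) -> list[Tone]:
    """Keep only the tones the toy model considers audible."""
    return [t for t in tones if is_audible(t, tones)]


tones = [
    Tone(1000.0, 70.0),    # loud tone
    Tone(1100.0, 40.0),    # weak neighbour, masked by the 1000 Hz tone
    Tone(5000.0, 40.0),    # equally weak but far away in frequency, kept
    Tone(25_000.0, 60.0),  # above the audible band, dropped
]
kept = keep_audible(tones)
print([t.freq_hz for t in kept])  # [1000.0, 5000.0]
```

Only the tones that survive the filter would need to be stored, which is where the saving comes from. A real encoder works on short blocks of the signal and spends fewer bits on masked components rather than simply deleting them.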
A Man of Two Worlds
This was the theory. In practice, in the 1970s there was neither the theoretical knowledge nor the technology for analyzing a piece of music and identifying all the various frequencies and volume levels it contained. This is where Karlheinz Brandenburg stepped in. He was both a mathematician and an electronics engineer, so he could approach the problem from both perspectives. In his doctoral thesis, he laid out the mathematical foundations for the efficient analysis of audio, and later, he and his colleagues at the German Fraunhofer institute implemented his idea in practice, using specially designed computer chips.
Suzanne Vega’s Challenging Voice
Now – what do you think is the most difficult and challenging music to compress? I would have guessed a classical piece, maybe one played by a large symphonic orchestra. But that’s wrong. Karlheinz’s most challenging piece of music was almost the opposite: the a cappella version of Suzanne Vega’s Tom’s Diner. This song has no musical instrumentation at all: only Suzanne Vega singing into the microphone. It is actually this simplicity that paradoxically turned the a cappella version of Tom’s Diner into such a hard nut to crack.
It turns out that our auditory system is most sensitive to sounds that fall in the frequency range between 1000 and 5000 Hertz – which is, not surprisingly, the frequency range of human speech. Sounds in that range are easily distinguishable, compared to sounds of lower or higher frequency – and it is in that range that our brain can pick up even the smallest distortion or noise. Suzanne Vega’s clear voice strained the compression algorithm to its limits and exposed any weaknesses it had.
“[Ran] What makes Suzanne Vega’s voice so difficult to encode?
[Prof. Brandenburg] This is a cappella music. To understand this, you need to know that our ears are specially trained to get the little details of the human speech. Speech is something where it is easier for us to hear differences in musical signals, and Suzanne Vega in this a cappella recording stands in the middle [of the room]. You can hear a little bit of the room, so it’s not exactly the same on the left and right [channels], but it’s a very clean and clear voice. In terms of frequencies, it’s very rich from low to high frequencies that we can hear. That, together, makes it difficult [to encode].”
It took many more months of hard work to improve the compression algorithm so it could handle Suzanne’s voice and some other challenging sounds – like the sound of castanets, for example. Finally, in 1992, all hurdles were cleared and the research was complete.
MP3 is Rejected
It’s important to note that at that point, Karlheinz didn’t think that the new compression algorithm would be useful for music consumers: people were using CD players, which did a very decent job. Karlheinz expected the algorithm to be useful in the film industry, where it would allow the distribution of movies on DVDs along with high-quality audio. Unfortunately, the standards committee of the Moving Picture Experts Group (or MPEG for short) chose a different compression algorithm for movie audio: an algorithm that wasn’t as effective as the one suggested by Karlheinz but was easier to implement with existing technology.
This was, without a doubt, a bitter disappointment for Karlheinz. After more than ten years of intensive research, he now had a revolutionary algorithm that compressed audio tenfold without affecting its quality – but nobody wanted it. The new algorithm did eventually win the approval of the MPEG committee and was even assigned an official name – MPEG-1 Audio Layer 3 – but except for a handful of software companies, the industry all but ignored it. Two of Karlheinz’s colleagues at the Fraunhofer institute remember displaying their innovation at an event organized by the Bavarian government – but nine out of every ten visitors skipped their booth and were drawn to a much more attractive innovation displayed nearby: a Bavarian beer cooler…
The Hacker who Cracked MP3
It was roughly at that time that the World Wide Web took hold with the general public. After some initial discussions, Karlheinz and his peers in the Fraunhofer institute decided to give the internet a shot: they would offer the new compression algorithm for sale as a way to compress music files stored on a personal computer. The licensing scheme was such that creating an mp3 file would cost money, but listening to one would not. This scheme turned out to be a smart one, since it allowed mp3 players like Winamp to be distributed free of charge and made mp3 popular among internet users. Software companies that wished to include mp3 creation in their products purchased the rights to use the algorithm from the Fraunhofer institute. But then, in 1997, when things were finally beginning to fall into place for mp3 – disaster struck.
“Some student from Australia, using a Credit Card number stolen from Taiwan, bought some software […] and discovered that we used a well-documented internal interface by Microsoft. He wrote a different user interface, packaged it with our encoder core, put it on an FTP server in the US and wrote a README file saying: ‘This is Freeware, thanks to Fraunhofer!’. We were not at all happy about that.”
The Australian student who hacked the software and released the mp3 algorithm to the world destroyed the Fraunhofer institute’s business model. The compression algorithm was no longer a secret, and the academic institute had neither the will nor the resources to pursue each and every pirate who used it illegally. Fraunhofer had no choice but to dramatically lower the asking price for the mp3 license.
Ironically, Fraunhofer’s loss is why mp3 is so wildly successful today. Pirated versions of the algorithm allowed users to freely convert their music collections to mp3 format, and file-sharing networks like Napster and Kazaa took off almost immediately. In the end, the Fraunhofer Institute still makes a good profit from licensing the mp3 algorithm to legitimate software companies. Even Karlheinz, it seems, is happy with how things turned out.
“In Germany, we have a very nice law. If inventions are patented, then even if the inventor works for some company – they have to get some share of the royalty income. In the case of MP3, this gets a lot of money every year for Fraunhofer – and some of this money goes to the inventors. I think we have a group of 30-40 people in Fraunhofer alone who get some share of this royalty income. For me, for my style of living, I’m rich.”
The MP3 revolution
MP3 technology transformed the music industry. CD and record stores all but disappeared, and music is now sold on the web by the likes of Apple and Amazon or is streamed with Pandora and Spotify. The record labels lost much of their former power, now that artists can distribute their music as mp3 files directly to their listeners via services like Soundcloud.
And of course, it allowed content creators such as ourselves to record and distribute podcasts without the need for powerful transmitters and tall antennas. Podcasting owes its existence to mp3, and some say it may someday replace FM radio. Will mp3 kill FM radio in the same way it killed the music stores? Only time will tell.