Rumkraft — Ableton Certified Musikskole København

Welcome to the "Digital Audio Quality" series. In this first part, we dive into the MP3 algorithm – not as a technical manual, but as a fascinating story about how science learned to listen the same way our ears do.

Introduction: A revolution in disguise

In 1987, a group of researchers at the Fraunhofer Institute in Erlangen, Germany, were working on a seemingly impossible problem: How do you make audio files smaller without them sounding worse?

Their answer became MP3 – a format that would change the music industry forever. But the interesting part isn't the format itself. It's how it works. Because the MP3 algorithm isn't actually built on how sound works technically. It's built on how you hear.

Psychoacoustics: The science of listening

Psychoacoustics is the study of how humans perceive sound. It's not about sound waves – it's about the brain. And the brain is far from a perfect sound recorder.

Think about your vision: You can't see ultraviolet light, even though it exists. Similarly, your ear has limitations. You can typically hear frequencies between approximately 20 Hz and 20,000 Hz – and even that range narrows with age.

But the real magic lies in what we don't hear, even though it's technically there. And that's exactly what the MP3 algorithm exploits.

Auditory masking: The sound that disappears

The most fascinating principle in MP3 compression is auditory masking. According to research on auditory masking, the phenomenon occurs when a loud sound makes quieter sounds that are close to it in frequency or in time impossible for you to hear.

Simultaneous Masking

Imagine you're sitting in a café. Someone whispers to you while a truck drives past. You can't hear the whisper – it's "masked" by the louder sound.

The MP3 encoder analyzes the sound frame by frame (1,152 samples at a time, which is about 26 milliseconds at 44.1 kHz) and identifies which frequencies are so weak compared to others that you wouldn't be able to hear them anyway. Then it stores that data with far fewer bits, or drops it entirely.
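The frame length mentioned above is simple arithmetic. The sketch below assumes the MPEG-1 Layer III frame size of 1,152 samples; the function name is only illustrative.

```python
# Assumption: MPEG-1 Layer III packs 1,152 samples into each frame.
SAMPLES_PER_FRAME = 1152


def frame_duration_ms(sample_rate_hz: int,
                      samples_per_frame: int = SAMPLES_PER_FRAME) -> float:
    """Return how long one frame lasts, in milliseconds."""
    return 1000.0 * samples_per_frame / sample_rate_hz


if __name__ == "__main__":
    for rate in (32000, 44100, 48000):
        print(f"{rate} Hz: {frame_duration_ms(rate):.2f} ms per frame")
```

At 44.1 kHz this comes out at roughly 26 ms, which is where the figure in the text comes from. Higher sample rates make each frame shorter.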

Temporal Masking

Even more fascinating: Your ear also has a "warm-up" and "cool-down" time. Just before a loud sound (pre-masking) and just after (post-masking), your ear is temporarily less sensitive to quiet sounds.

The algorithm exploits this by removing audio data in these short windows – typically 5-20 milliseconds (about 220-880 samples at 44.1 kHz) – because you wouldn't perceive them anyway.

Critical bands: The ear's frequency filters

Your inner ear is organized into so-called critical bands. Think of them as a series of overlapping filters, each covering a specific frequency range.

These bands are not equally wide – they are narrower at low frequencies and wider at high frequencies. This means your ear separates two tones that are a given number of hertz apart more easily in the bass than in the treble.
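To get a feel for how much the bands widen, here is a small sketch using a well-known approximation of critical bandwidth attributed to Zwicker and Terhardt. It is not taken from this article's sources, and real MP3 encoders rely on their own psychoacoustic tables, so treat the numbers as a rough illustration only.

```python
def critical_bandwidth_hz(freq_hz: float) -> float:
    """Approximate critical bandwidth (Hz) around a center frequency.

    Assumption: Zwicker & Terhardt style approximation,
    CB = 25 + 75 * (1 + 1.4 * f_kHz^2) ^ 0.69
    """
    f_khz = freq_hz / 1000.0
    return 25.0 + 75.0 * (1.0 + 1.4 * f_khz ** 2) ** 0.69


if __name__ == "__main__":
    for freq in (100, 1000, 5000, 10000):
        print(f"{freq:>5} Hz -> band is roughly {critical_bandwidth_hz(freq):.0f} Hz wide")
```

Under this approximation, a band around 100 Hz is roughly 100 Hz wide, while a band around 10 kHz spans more than 2 kHz.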

The MP3 algorithm divides the audio into 32 subbands and analyzes each band separately. If two tones fall within the same critical band, and one is much stronger, the weaker tone will typically be masked.
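The 32 subbands can be sketched with a little arithmetic. This example assumes the subbands are equally wide and together cover everything up to half the sample rate; the helper names are hypothetical.

```python
NUM_SUBBANDS = 32


def subband_width_hz(sample_rate_hz: float) -> float:
    """Width of one subband when 0..sample_rate/2 is split into 32 equal parts."""
    return (sample_rate_hz / 2) / NUM_SUBBANDS


def subband_index(freq_hz: float, sample_rate_hz: float = 44100) -> int:
    """Return which of the 32 subbands (0-31) a frequency falls into."""
    width = subband_width_hz(sample_rate_hz)
    return min(int(freq_hz // width), NUM_SUBBANDS - 1)


if __name__ == "__main__":
    print(f"Subband width at 44.1 kHz: {subband_width_hz(44100):.1f} Hz")
    for freq in (100, 440, 1000, 10000):
        print(f"{freq} Hz falls into subband {subband_index(freq)}")
```

At 44.1 kHz each subband is about 689 Hz wide, so the lowest subband alone covers several of the ear's narrow low-frequency critical bands. That is one reason a finer frequency analysis is layered on top of the filterbank.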

This is where the real magic happens: Instead of saving all audio data, MP3 only saves what actually makes a difference to your perception.

The history: From research to revolution

The MP3 format (officially MPEG Audio Layer III) was standardized in 1993, but the journey began much earlier. As described in Davis Pan's foundational IEEE paper from 1995, the work built on decades of research in psychoacoustics.

Karlheinz Brandenburg, often called the "father of MP3," says they tested the algorithm on Suzanne Vega's "Tom's Diner" countless times. The song was chosen because its a cappella version was particularly vulnerable to artifacts – if MP3 could handle it, it could handle almost anything.

According to Rassol Raissi's technical review, MP3 uses four main components:

Polyphase filterbank: Divides the signal into 32 subbands
MDCT: Further frequency division for precision
Psychoacoustic model: Decides what can be removed
Huffman coding: Further compresses the remaining data
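The last step in the list, Huffman coding, gives frequent values short codes and rare values long ones. The sketch below builds a code from scratch to show the principle. Note the assumption: MP3 itself picks from predefined Huffman tables in the standard rather than building a tree per file, and the sample values here are made up.

```python
import heapq
from collections import Counter


def huffman_code_lengths(symbols):
    """Return {symbol: code length in bits} for a Huffman code over `symbols`."""
    counts = Counter(symbols)
    if len(counts) == 1:
        return {next(iter(counts)): 1}
    # Heap entries: (weight, unique tie-breaker, {symbol: depth so far}).
    heap = [(w, i, {s: 0}) for i, (s, w) in enumerate(counts.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, d1 = heapq.heappop(heap)
        w2, _, d2 = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {s: depth + 1 for s, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


if __name__ == "__main__":
    # Hypothetical quantized values: mostly zeros, a few larger numbers.
    values = [0] * 8 + [1] * 4 + [-1] * 2 + [2] + [-2]
    lengths = huffman_code_lengths(values)
    total_bits = sum(lengths[v] for v in values)
    print(lengths)
    print(f"{total_bits} bits with Huffman vs {3 * len(values)} bits at a fixed 3 bits each")
```

Because quantized audio data contains many small values and few large ones, this kind of coding shrinks it further without losing anything.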

Bitrate: Quality vs. size

You've probably heard about bitrates: 128 kbps, 320 kbps. But what do they actually mean?

Bitrate is how many bits per second are used to describe the sound. A CD uses approximately 1,411 kbps. A 128 kbps MP3 uses less than 10% of that – and yet they can sound surprisingly similar.
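The CD figure can be worked out directly. This sketch assumes the standard CD audio parameters (44.1 kHz, 16 bits, two channels), which the article does not spell out.

```python
def pcm_bitrate_kbps(sample_rate_hz: int = 44100,
                     bit_depth: int = 16,
                     channels: int = 2) -> float:
    """Bitrate of uncompressed PCM audio in kilobits per second."""
    return sample_rate_hz * bit_depth * channels / 1000


if __name__ == "__main__":
    cd_kbps = pcm_bitrate_kbps()  # 44,100 * 16 * 2 = 1,411,200 bits per second
    for mp3_kbps in (128, 192, 320):
        share = 100 * mp3_kbps / cd_kbps
        print(f"{mp3_kbps} kbps is {share:.1f}% of CD's {cd_kbps:.1f} kbps")
```

A 128 kbps MP3 lands at about 9% of the CD bitrate, in line with the "less than 10%" above.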

At lower bitrates, the algorithm becomes more "aggressive." It removes more data, which can lead to audible artifacts – a metallic "sizzling" sound, especially on hi-hats and vocal sibilance.

At 320 kbps, there's almost always enough data for the compression to remain transparent to most listeners under normal conditions.
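To see what these bitrates mean for file size, here is a rough estimate for a constant-bitrate file. It ignores metadata and container overhead, and the four-minute track is just an example.

```python
def file_size_mb(bitrate_kbps: float, duration_s: float) -> float:
    """Approximate size in megabytes (1 MB = 1,000,000 bytes) at a constant bitrate."""
    bits = bitrate_kbps * 1000 * duration_s
    return bits / 8 / 1_000_000


if __name__ == "__main__":
    track_seconds = 4 * 60  # a hypothetical four-minute track
    for label, kbps in (("MP3 128", 128), ("MP3 320", 320), ("CD audio", 1411.2)):
        print(f"{label}: about {file_size_mb(kbps, track_seconds):.1f} MB")
```

The same track that takes around 42 MB as CD audio fits in under 4 MB at 128 kbps and under 10 MB at 320 kbps.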

The limitations: What MP3 can't do

MP3 is impressive, but not perfect. The format has fundamental limitations:

1. Generation loss

Every time you re-encode an MP3, new artifacts are added. It's like photocopying a photocopy – quality degrades. This is critical for producers: always work in lossless formats and export to MP3 as the very last step.

2. Pre-echo

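Sharp transients (like the attack of a drum hit) can "bleed" into the audio just before them. The encoder processes audio in blocks, and the noise it introduces is spread across the whole block, including the quiet part ahead of the hit. This is called pre-echo and can be heard as a faint "warning" before a hit.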
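A toy illustration of how pre-echo arises: take a block of silence with a single click near the end, transform it to the frequency domain, round the coefficients coarsely, and transform back. The sketch uses a plain DCT and a fixed rounding step as stand-ins (assumptions on our part); a real MP3 encoder uses an MDCT, a psychoacoustic model, and shorter blocks around transients to limit exactly this effect.

```python
import math


def dct(x):
    """Orthonormal DCT-II of a list of samples."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n)
                               for i in range(n)))
    return out


def idct(coeffs):
    """Inverse of the orthonormal DCT-II above."""
    n = len(coeffs)
    out = []
    for i in range(n):
        total = 0.0
        for k in range(n):
            scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
            total += scale * coeffs[k] * math.cos(math.pi * (i + 0.5) * k / n)
        out.append(total)
    return out


BLOCK = 32    # toy block length in samples
HIT_AT = 24   # position of the click inside the block
STEP = 0.1    # coarse quantization step (hypothetical)

signal = [0.0] * BLOCK
signal[HIT_AT] = 1.0  # silence, then a single sharp click

quantized = [STEP * round(c / STEP) for c in dct(signal)]
decoded = idct(quantized)

# Everything before the click was pure silence in the original.
pre_echo_energy = sum(v * v for v in decoded[:HIT_AT])

if __name__ == "__main__":
    print(f"Energy before the click after decoding: {pre_echo_energy:.6f}")
```

The original block is completely silent before the click, but the decoded block is not: the rounding error ends up spread over the whole block, including the samples ahead of the hit.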

3. Stereo coupling

At low bitrates, MP3 can use "joint stereo," which combines information from left and right channels. This saves space but can affect the stereo field.
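One common form of joint stereo is mid/side coding, where the encoder stores the sum and the difference of the two channels instead of left and right. The sketch below shows the idea with a simple divide-by-two scaling, which is an assumption for readability; the exact scaling in the MP3 standard differs, and the sample values are made up.

```python
def to_mid_side(left, right):
    """Convert left/right samples to mid (average) and side (half-difference)."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side


def from_mid_side(mid, side):
    """Rebuild left/right samples from mid and side."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right


if __name__ == "__main__":
    # Hypothetical samples: the channels are similar, as in most mixes.
    left = [0.5, 0.25, -0.75]
    right = [0.5, 0.125, -0.5]
    mid, side = to_mid_side(left, right)
    print("mid: ", mid)
    print("side:", side)
    print("rebuilt:", from_mid_side(mid, side))
```

When left and right are similar, the side signal stays close to zero and needs few bits. The conversion itself loses nothing; the stereo field is only affected once the side signal is quantized too coarsely.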

What does this mean for you?

As a producer, you should always work in lossless (WAV or FLAC) and only export to MP3 for distribution. Remember that sample rate and bit depth are separate from MP3 compression – we cover that in the article about bit depth.

As a DJ, 320 kbps MP3 is often acceptable for clubs, but for large sound systems, we recommend a lossless format such as FLAC or WAV. Read more in the next article about lossy vs. lossless.

As a listener, you're free to choose based on situation. Streaming on the go? MP3 is fine. Home system with good speakers? Consider lossless.

Conclusion: Respect for the algorithm

The MP3 algorithm is a masterpiece of interdisciplinary research: psychology, acoustics, mathematics, and signal processing. It teaches us something important about perception:

Reality is not what exists – it's what we experience.

By understanding how we hear, researchers were able to create a revolution in how we share music. Today we may use other formats, but the principles live on.

In the next part of the series, we look at the difference between lossy and lossless formats – and when it actually matters.

Digital Audio Quality – Complete Series

Part 1: The MP3 Algorithm: How Your Ear Really Hears (this article)
Part 2: Lossy vs. Lossless: A Guide for DJs and Producers
Part 3: Bit Depth and Dynamic Range
Part 4: The Best Microphone is the One You Have on You
Part 5: Hi-Res Audio: When It Actually Matters

📚 Scientific Sources

Pan, D. (1995). "A Tutorial on MPEG/Audio Compression" – IEEE Multimedia
Raissi, R. (2002). "The Theory Behind MP3" – MP3-Tech.org
Wikipedia: Auditory Masking – Overview of masking phenomena
Fraunhofer IIS: MP3 History – Original developer

About the author

Ras 'Kata' Kjærbo

Ras Kjærbo is an Ableton Certified Trainer and one of the driving forces behind Rumkraft. He teaches Ableton Live and music production, and is passionate about sharing his knowledge on everything from sound design to live performance techniques.

The MP3 Algorithm: How Your Ear Really Hears (Part 1)