
    The MP3 Algorithm: How Your Ear Really Hears (Part 1)

    Ras 'Kata' Kjærbo · January 10, 2026 · 18 min read

    Welcome to the "Digital Audio Quality" series. In this first part, we dive into the MP3 algorithm – not as a technical manual, but as a fascinating story about how science learned to listen the same way our ears do.

    Introduction: A revolution in disguise

    In 1987, a group of researchers at the Fraunhofer Institute in Erlangen, Germany, were working on a seemingly impossible problem: How do you make audio files smaller without them sounding worse?

    Their answer became MP3 – a format that would change the music industry forever. But the interesting part isn't the format itself; it's how it works. The MP3 algorithm isn't built primarily on the physics of sound. It's built on how you hear.

    Psychoacoustics: The science of listening

    Psychoacoustics is the study of how humans perceive sound. It's not about sound waves – it's about the brain. And the brain is far from a perfect sound recorder.

    Think about your vision: You can't see ultraviolet light, even though it exists. Similarly, your ear has limitations. You can typically hear frequencies between approximately 20 Hz and 20,000 Hz – and even that range narrows with age.
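
    A hard 20 Hz to 20 kHz range is a simplification: sensitivity changes continuously with frequency. Psychoacoustic models usually describe this as a "threshold in quiet", the quietest level you can hear at each frequency. The sketch below uses Terhardt's widely cited approximation of that curve; treat it as an assumption for illustration, since the article doesn't say which curve any particular encoder uses.

```python
import math

def threshold_in_quiet_db(freq_hz: float) -> float:
    """Approximate absolute threshold of hearing in dB SPL (Terhardt's formula).

    Only meaningful inside the audible range; the formula works in kHz.
    """
    f = freq_hz / 1000.0
    return (3.64 * f ** -0.8
            - 6.5 * math.exp(-0.6 * (f - 3.3) ** 2)
            + 1e-3 * f ** 4)

if __name__ == "__main__":
    for freq in (50, 100, 1000, 3300, 10000, 16000):
        print(f"{freq:>6} Hz: {threshold_in_quiet_db(freq):6.1f} dB SPL")
```

    The curve dips to its lowest point around 3 to 4 kHz, where the ear is most sensitive, and rises steeply toward both ends of the range. A component below this curve is inaudible even in complete silence.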

    But the real magic lies in what we don't hear, even though it's technically there. And that's exactly what the MP3 algorithm exploits.

    Auditory masking: The sound that disappears

    The most fascinating principle in MP3 compression is auditory masking. According to research on auditory masking, the phenomenon occurs when a louder sound makes a quieter one inaudible, especially when the two are close together in frequency or in time.

    Simultaneous Masking

    Imagine you're sitting in a café. Someone whispers to you while a truck drives past. You can't hear the whisper – it's "masked" by the louder sound.

    The MP3 algorithm analyzes the sound frame by frame (1,152 samples per frame, about 26 milliseconds at 44.1 kHz) and identifies which frequency components are so weak compared to louder neighbors that you wouldn't be able to hear them anyway. It then spends few or no bits on them: they are either quantized so coarsely that the added noise stays below the masking threshold, or dropped entirely.
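
    To make simultaneous masking concrete, here is a toy model. It places frequencies on the Bark (critical-band) scale using the Zwicker and Terhardt approximation and lets a single masker's influence fall off linearly per Bark, more steeply below the masker than above it. The slope and offset values are illustrative assumptions, not the parameters of the psychoacoustic model in the MP3 standard, and a real encoder combines many maskers with the threshold in quiet.

```python
import math

# Illustrative, assumed parameters (not taken from the MP3 standard):
LOWER_SLOPE_DB_PER_BARK = 25.0  # masking falls off steeply toward lower frequencies
UPPER_SLOPE_DB_PER_BARK = 10.0  # and more gently toward higher frequencies
MASKING_OFFSET_DB = 10.0        # threshold peak sits this far below the masker level

def hz_to_bark(freq_hz: float) -> float:
    """Zwicker & Terhardt approximation of the critical-band (Bark) scale."""
    return 13.0 * math.atan(0.00076 * freq_hz) + 3.5 * math.atan((freq_hz / 7500.0) ** 2)

def masking_threshold_db(masker_hz: float, masker_db: float, probe_hz: float) -> float:
    """Level below which a probe tone is assumed hidden by a single masker."""
    dz = hz_to_bark(probe_hz) - hz_to_bark(masker_hz)
    slope = UPPER_SLOPE_DB_PER_BARK if dz >= 0 else LOWER_SLOPE_DB_PER_BARK
    return masker_db - MASKING_OFFSET_DB - slope * abs(dz)

def is_masked(masker_hz: float, masker_db: float, probe_hz: float, probe_db: float) -> bool:
    """True if the probe is quieter than the masker's estimated threshold."""
    return probe_db < masking_threshold_db(masker_hz, masker_db, probe_hz)

print(is_masked(1000, 80, 1100, 40))  # True: close in frequency, much quieter
print(is_masked(1000, 80, 4000, 40))  # False: several Bark away
```

    With an 80 dB masker at 1 kHz, a 40 dB tone at 1.1 kHz lands well under the estimated threshold, while the same tone at 4 kHz is far enough away on the Bark scale to stay audible.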

    Temporal Masking

    Even more fascinating: Your ear also has a "warm-up" and "cool-down" time. Just before a loud sound (pre-masking) and just after (post-masking), your ear is temporarily less sensitive to quiet sounds.

    Encoders exploit this: coding noise that falls inside these short windows goes unnoticed, so the audio there doesn't need to be stored as precisely. Pre-masking is brief, typically 5-20 milliseconds (about 220-880 samples at 44.1 kHz), while post-masking lasts considerably longer.
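
    The millisecond figures translate directly into sample counts. The sketch below converts durations to samples at 44.1 kHz and marks the stretches just before a loud sound starts and just after it stops. The function names are hypothetical, and the 20 ms pre-masking and 100 ms post-masking values in the example are illustrative assumptions (post-masking is generally taken to last longer than pre-masking). It only illustrates the timing; it is not how an encoder decides what to code.

```python
def ms_to_samples(ms: float, sample_rate: int = 44100) -> int:
    """Convert a duration in milliseconds to whole samples (rounded down)."""
    return int(ms * sample_rate / 1000)

def masking_windows(onset: int, offset: int, pre_ms: float, post_ms: float,
                    sample_rate: int = 44100) -> tuple[range, range]:
    """Sample ranges just before a loud sound starts (pre-masking) and just
    after it stops (post-masking). Window lengths are caller-supplied assumptions."""
    pre = range(max(0, onset - ms_to_samples(pre_ms, sample_rate)), onset)
    post = range(offset, offset + ms_to_samples(post_ms, sample_rate))
    return pre, post

print(ms_to_samples(5), ms_to_samples(20))  # 220 882

# A loud sound from 1.0 s to 2.0 s at 44.1 kHz
pre, post = masking_windows(onset=44100, offset=88200, pre_ms=20, post_ms=100)
print(pre, post)  # range(43218, 44100) range(88200, 92610)
```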

    Critical bands: The ear's frequency filters

    Your inner ear is organized into so-called critical bands. Think of them as a series of overlapping filters, each covering a specific frequency range.

    These bands are not equal – they are narrower at low frequencies and wider at high frequencies. This means you're better at distinguishing between two tones in the bass than in the treble.
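
    One common approximation of how wide a critical band is around a given center frequency is Zwicker and Terhardt's formula. The sketch below is an illustration rather than what any specific encoder uses; it shows the bands growing from roughly 100 Hz wide in the bass to more than 2 kHz wide at 10 kHz.

```python
def critical_bandwidth_hz(center_hz: float) -> float:
    """Approximate critical bandwidth around a center frequency
    (Zwicker & Terhardt); the formula works in kHz."""
    f_khz = center_hz / 1000.0
    return 25.0 + 75.0 * (1.0 + 1.4 * f_khz ** 2) ** 0.69

for center in (100, 1000, 4000, 10000):
    print(f"{center:>6} Hz: band about {critical_bandwidth_hz(center):7.1f} Hz wide")
```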

    The MP3 encoder splits the audio into 32 equally wide subbands and then subdivides each one further, while its psychoacoustic model evaluates masking in terms of critical bands. If two tones fall within the same critical band, and one is much stronger, the weaker tone will typically be masked.
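
    The 32 subbands are equally wide, so they cannot line up with critical bands, which are narrow in the bass. A quick calculation at 44.1 kHz, sketched below with hypothetical helper names, shows that the lowest subband alone covers 0 to about 689 Hz, so 100 Hz and 600 Hz land in the same subband even though they are several critical bands apart. Splitting each subband into 18 lines (the MP3 long-block case) brings the spacing down to about 38 Hz.

```python
SUBBANDS = 32           # polyphase filterbank outputs
LINES_PER_SUBBAND = 18  # MDCT lines per subband for long blocks (32 x 18 = 576)

def subband_width_hz(sample_rate: int = 44100) -> float:
    """Each subband covers an equal slice of 0..Nyquist."""
    return (sample_rate / 2) / SUBBANDS

def subband_index(freq_hz: float, sample_rate: int = 44100) -> int:
    """Which equal-width subband a frequency lands in (0-based)."""
    return min(int(freq_hz // subband_width_hz(sample_rate)), SUBBANDS - 1)

def line_width_hz(sample_rate: int = 44100) -> float:
    """Frequency spacing once each subband is split into 18 lines."""
    return (sample_rate / 2) / (SUBBANDS * LINES_PER_SUBBAND)

print(subband_width_hz())                      # 689.0625
print(subband_index(100), subband_index(600))  # 0 0
print(line_width_hz())                         # 38.28125
```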

    This is where the real magic happens: Instead of saving all audio data, MP3 only saves what actually makes a difference to your perception.

    The history: From research to revolution

    The MP3 format (officially MPEG-1 Audio Layer III) was standardized in 1993, but the journey began much earlier. As described in Davis Pan's foundational IEEE paper from 1995, the work built on decades of research in psychoacoustics.

    Karlheinz Brandenburg, often called the "father of MP3," says they tested the algorithm on Suzanne Vega's "Tom's Diner" countless times. The song was chosen because its a cappella version was particularly vulnerable to artifacts – if MP3 could handle it, it could handle almost anything.

    According to Rassol Raissi's technical review, MP3 uses four main components:

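    • Polyphase filterbank: Divides the signal into 32 equally wide subbands
    • MDCT (modified discrete cosine transform): Splits each subband into up to 18 finer frequency lines
    • Psychoacoustic model: Estimates how much quantization noise each band can hide, and therefore what can be removed
    • Huffman coding: Losslessly compresses the remaining quantized data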
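
    The MDCT is the least intuitive of these steps, so here is a minimal sketch. It computes a windowed MDCT directly from its definition: blocks of 2N samples overlapping by half, with N coefficients per block (N = 18, the number of lines per subband for MP3 long blocks). Overlap-adding the inverse transforms then reconstructs the input exactly, because the aliasing from neighboring blocks cancels. Assumptions: the sine window and the scaling are choices made for this sketch, and it runs on a raw test signal. A real MP3 encoder applies the MDCT to each of the 32 subband signals, switches window shapes around transients, and adds steps such as aliasing reduction, none of which appear here.

```python
import numpy as np

N = 18  # coefficients per block; blocks are 2N = 36 samples long

def sine_window(n_half: int) -> np.ndarray:
    """Sine window of length 2N; satisfies w[n]**2 + w[n + N]**2 == 1."""
    n = np.arange(2 * n_half)
    return np.sin(np.pi / (2 * n_half) * (n + 0.5))

def mdct_matrix(n_half: int) -> np.ndarray:
    """(N, 2N) matrix of MDCT cosine basis functions."""
    n = np.arange(2 * n_half)
    k = np.arange(n_half)
    return np.cos(np.pi / n_half * (n[None, :] + 0.5 + n_half / 2) * (k[:, None] + 0.5))

def analyze(x: np.ndarray, n_half: int = N) -> np.ndarray:
    """Windowed MDCT with 50% overlap; returns (num_blocks, N) coefficients."""
    w, basis = sine_window(n_half), mdct_matrix(n_half)
    tail = (-len(x)) % n_half  # pad so the total length is a multiple of N
    padded = np.concatenate([np.zeros(n_half), x, np.zeros(tail + n_half)])
    starts = range(0, len(padded) - n_half, n_half)
    return np.array([basis @ (w * padded[s:s + 2 * n_half]) for s in starts])

def synthesize(coeffs: np.ndarray, length: int, n_half: int = N) -> np.ndarray:
    """Inverse MDCT plus overlap-add; aliasing between neighboring blocks cancels."""
    w, basis = sine_window(n_half), mdct_matrix(n_half)
    out = np.zeros((len(coeffs) + 1) * n_half)
    for i, block in enumerate(coeffs):
        # 2/N scaling makes the windowed overlap-add reconstruct the input exactly
        out[i * n_half:i * n_half + 2 * n_half] += w * (basis.T @ block) * (2.0 / n_half)
    return out[n_half:n_half + length]

rng = np.random.default_rng(0)
x = rng.standard_normal(1152)  # one MP3 frame's worth of samples
coeffs = analyze(x)
print(coeffs.shape)                                # (65, 18)
print(np.allclose(synthesize(coeffs, len(x)), x))  # True
```

    Compression happens between these two functions: the encoder quantizes `coeffs` according to the psychoacoustic model before Huffman coding them, so the reconstruction is no longer exact.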

    Bitrate: Quality vs. size

    You've probably heard about bitrates: 128 kbps, 320 kbps. But what do they actually mean?

    Bitrate is how many bits per second are used to describe the sound. A CD uses about 1,411 kbps (44,100 samples per second × 16 bits × 2 channels). A 128 kbps MP3 uses less than 10% of that – and yet the two can sound surprisingly similar.
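
    These figures are easy to check. The sketch below computes the CD bitrate from its sample rate, bit depth, and channel count, along with the resulting compression ratios. It also estimates the size of one MPEG-1 Layer III frame using the commonly cited formula 144 × bitrate / sample rate (1,152 samples per frame divided by 8 bits per byte), ignoring the optional padding byte by default. The helper names are hypothetical.

```python
def pcm_bitrate_kbps(sample_rate: int = 44100, bit_depth: int = 16, channels: int = 2) -> float:
    """Uncompressed PCM bitrate in kbps, e.g. CD audio."""
    return sample_rate * bit_depth * channels / 1000

def mp3_frame_bytes(bitrate_bps: int, sample_rate: int = 44100, padding: int = 0) -> int:
    """Bytes per MPEG-1 Layer III frame; 144 = 1,152 samples / 8 bits per byte."""
    return 144 * bitrate_bps // sample_rate + padding

cd = pcm_bitrate_kbps()
for kbps in (128, 320):
    print(f"{kbps} kbps = {kbps / cd:.1%} of CD ({cd / kbps:.1f}:1), "
          f"{mp3_frame_bytes(kbps * 1000)} bytes per frame")
```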

    At lower bitrates, the algorithm becomes more "aggressive." It removes more data, which can lead to audible artifacts – a metallic "sizzling" sound, especially on hi-hats and vocal sibilance.
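
    "More aggressive" largely means coarser quantization: with fewer bits available, values are rounded to fewer levels, and the rounding error becomes noise. The sketch below uses a plain uniform quantizer on a 440 Hz test tone to show the signal-to-noise ratio dropping as the step size grows. This is a simplification: MP3 actually quantizes MDCT coefficients with a non-uniform quantizer and per-band scale factors, and the step values here are arbitrary.

```python
import numpy as np

def quantize(values: np.ndarray, step: float) -> np.ndarray:
    """Uniform quantizer: a larger step means fewer levels, so fewer bits."""
    return np.round(values / step) * step

def snr_db(original: np.ndarray, approx: np.ndarray) -> float:
    """Signal-to-noise ratio of an approximation, in dB."""
    noise = original - approx
    return float(10 * np.log10(np.sum(original ** 2) / np.sum(noise ** 2)))

t = np.arange(44100) / 44100
tone = np.sin(2 * np.pi * 440 * t)  # one second of a 440 Hz sine
for step in (0.01, 0.1, 0.5):
    print(f"step {step}: SNR {snr_db(tone, quantize(tone, step)):.1f} dB")
```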

    At 320 kbps, there's almost always enough data for the compression to remain transparent to most listeners under normal conditions.

    The limitations: What MP3 can't do

    MP3 is impressive, but not perfect. The format has fundamental limitations:

    1. Generation loss

    Every time you re-encode an MP3, new artifacts are added. It's like photocopying a photocopy – quality degrades. This is critical for producers: always work in lossless formats and export to MP3 as the very last step.

    2. Pre-echo

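    Because quantization noise is spread across an entire transform block, a sharp transient (like the attack of a drum hit) can smear noise into the moment just before it. This is called pre-echo and can be heard as a faint "warning" before a hit. MP3 reduces it by switching to shorter blocks around transients.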
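
    Pre-echo happens because error introduced in the frequency domain spreads over the whole block once it is transformed back to time. The sketch below is a simplified stand-in for MP3's MDCT stage (a plain FFT on non-overlapping blocks, an assumption made for brevity): a single click is coded with long and short blocks, and the code measures how far before the click the error reaches. The block sizes and quantization step are illustrative.

```python
import numpy as np

def blockwise_code(x: np.ndarray, block: int, step: float) -> np.ndarray:
    """Transform each non-overlapping block with a real FFT, round the
    coefficients to a coarse grid, and transform back."""
    y = np.zeros_like(x)
    for start in range(0, len(x), block):
        seg = x[start:start + block]
        spec = np.fft.rfft(seg)
        spec_q = (np.round(spec.real / step) + 1j * np.round(spec.imag / step)) * step
        y[start:start + block] = np.fft.irfft(spec_q, n=len(seg))
    return y

def pre_echo_samples(x: np.ndarray, y: np.ndarray, attack: int, tol: float = 1e-9) -> int:
    """How many samples before the attack the coding error first shows up."""
    err = np.abs(y[:attack] - x[:attack])
    hits = np.nonzero(err > tol)[0]
    return 0 if hits.size == 0 else int(attack - hits[0])

x = np.zeros(1152)
attack = 1000
x[attack] = 1.0  # a single sharp click
for block in (576, 192):
    spread = pre_echo_samples(x, blockwise_code(x, block, step=0.25), attack)
    print(f"block {block}: error starts {spread} samples before the click")
```

    Blocks that contain only silence come back exactly silent, so the error can never start earlier than the block holding the click. With 192-sample blocks that limits the pre-echo to the 40 samples between the block start and the click, while the 576-sample block lets it reach much further ahead.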

    3. Stereo coupling

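    MP3 can use "joint stereo," which codes the left and right channels together. Mid/side stereo stores their sum and difference, which saves space when the channels are similar. Intensity stereo, used mainly at low bitrates, discards part of the stereo information in the higher frequencies, which can affect the stereo field.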
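
    Mid/side coding is the easiest form of joint stereo to picture: instead of left and right, the encoder can code their sum (mid) and difference (side). The transform itself is reversible; the saving comes from the side signal being small when the channels are similar, so it needs few bits. The sketch below uses energy-preserving 1/sqrt(2) scaling and hypothetical function names; it does not cover intensity stereo.

```python
import numpy as np

SCALE = 1 / np.sqrt(2)  # keeps total energy unchanged

def to_mid_side(left: np.ndarray, right: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Sum (mid) and difference (side) signals."""
    return (left + right) * SCALE, (left - right) * SCALE

def from_mid_side(mid: np.ndarray, side: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Exact inverse of to_mid_side."""
    return (mid + side) * SCALE, (mid - side) * SCALE

t = np.arange(4410) / 44100
left = np.sin(2 * np.pi * 220 * t)
right = 0.95 * left  # a nearly centered source
mid, side = to_mid_side(left, right)
print(np.sum(side ** 2) / np.sum(mid ** 2))  # tiny: little to encode in the side channel
l2, r2 = from_mid_side(mid, side)
print(np.allclose(l2, left), np.allclose(r2, right))  # True True
```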

    What does this mean for you?

    As a producer, you should always work in a lossless format (WAV or FLAC) and only export to MP3 for distribution. Remember that sample rate and bit depth are separate from MP3 compression – we cover that in the article about bit depth.

    As a DJ, you'll often find 320 kbps MP3 acceptable for clubs, but for large sound systems we recommend lossless FLAC or WAV. Read more in the next article about lossy vs. lossless.

    As a listener, you're free to choose based on situation. Streaming on the go? MP3 is fine. Home system with good speakers? Consider lossless.

    Conclusion: Respect for the algorithm

    The MP3 algorithm is a masterpiece of interdisciplinary research: psychology, acoustics, mathematics, and signal processing. It teaches us something important about perception:

    Reality is not what exists – it's what we experience.

    By understanding how we hear, researchers were able to create a revolution in how we share music. Today we may use other formats, but the principles live on.

    In the next part of the series, we look at the difference between lossy and lossless formats – and when it actually matters.




    About the author

    Ras 'Kata' Kjærbo

    Ras Kjærbo is an Ableton Certified Trainer and one of the driving forces behind Rumkraft. He teaches Ableton Live and music production, and is passionate about sharing his knowledge on everything from sound design to live performance techniques.
