Research 01 — Audio Reactivity Across Genres — Why Your Thresholds Break

Audio reactivity is simple in principle: capture audio, measure something, drive a visual parameter with it. The complexity arrives when you try to make it work across more than one genre — or more than one song.

The amplitude problem

Most reactive systems begin with amplitude — a running RMS (root mean square) average or a peak follower. Both work reliably when the audio source is consistent. Calibrate a threshold for one production style and the system responds well.

The problem is that music is not consistent. A well-mastered jazz recording sits at around -18 LUFS (Loudness Units relative to Full Scale). A modern EDM track sits at -7 or -8 LUFS. That gap of roughly 10 dB is about ten times the signal power and about three times the RMS amplitude, and it is commonly perceived as roughly twice as loud. A threshold calibrated for one will either never trigger or immediately saturate when presented with the other.
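To put numbers on that gap, here is a small worked example. It assumes the loudness difference carries over directly to the RMS value your analysis sees, which is only approximately true because LUFS is a frequency-weighted measure. The `jazz_rms` and `threshold` values are hypothetical, chosen only for illustration.

```python
def db_to_power_ratio(db: float) -> float:
    """Convert a level difference in dB to a power ratio."""
    return 10.0 ** (db / 10.0)


def db_to_amplitude_ratio(db: float) -> float:
    """Convert a level difference in dB to an amplitude (RMS) ratio."""
    return 10.0 ** (db / 20.0)


jazz_lufs = -18.0  # figure quoted in the text
edm_lufs = -8.0    # figure quoted in the text
gap_db = edm_lufs - jazz_lufs  # 10 LU; a change of 1 LU corresponds to 1 dB

power_ratio = db_to_power_ratio(gap_db)          # ten times the power
amplitude_ratio = db_to_amplitude_ratio(gap_db)  # a little over three times the RMS

# Hypothetical calibration: the jazz track idles at 0.10 RMS and the
# threshold sits above that, so only its accents trigger the visual.
jazz_rms = 0.10
threshold = 0.15
edm_rms = jazz_rms * amplitude_ratio  # typical EDM level under the same gain

print(f"gap {gap_db:.0f} dB: power x{power_ratio:.1f}, amplitude x{amplitude_ratio:.2f}")
print("jazz idle level above threshold:", jazz_rms > threshold)
print("EDM idle level above threshold: ", edm_rms > threshold)
```

With these numbers the EDM track's resting level already exceeds the threshold, so the trigger stays on permanently and the visual stops reacting.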

This is not an edge case. It is the normal condition if your system needs to respond to more than a curated playlist.

Why normalization does not fix it

The obvious correction is to normalize the incoming signal — track the loudest value seen so far, express all future values as a fraction of that maximum. This works well for genres with consistent dynamics. It breaks almost immediately with classical music.

Classical recordings routinely have a dynamic range of 20dB or more. An opening passage might run quietly for minutes, then arrive at a fortissimo that is dramatically louder. A normalizer that has adapted to the quiet opening has set its ceiling far too low, so the quiet passage already reads near full scale. When the fortissimo arrives, the output is already saturated: the normalizer cannot distinguish a forte from a fortissimo, a build from a drop. Once the ceiling has jumped to the fortissimo level it never comes back down, so every later quiet passage reads close to zero.

The inverse problem occurs with heavily compressed genres. Electronic music and commercial pop are often mastered within a 3–6dB dynamic range (the result of decades of loudness competition). The normalizer quickly locks onto this narrow range, and every value lands in the top portion of the normalized output, leaving almost no room to differentiate a sustained note from a transient hit.
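Both failures are easy to reproduce with a few lines. The sketch below is a generic running-peak normalizer, not any particular TouchDesigner operator, and the level values are made up to stand in for a quiet passage, a fortissimo, and a heavily compressed track.

```python
class RunningPeakNormalizer:
    """Divide each level by the loudest level seen so far."""

    def __init__(self, floor: float = 1e-6):
        self.peak = floor  # small floor avoids division by zero

    def process(self, level: float) -> float:
        self.peak = max(self.peak, level)
        return level / self.peak


# Classical-style material: quiet opening, one fortissimo, quiet again.
classical = RunningPeakNormalizer()
quiet_opening = classical.process(0.05)  # ceiling adapts to the quiet level
fortissimo = classical.process(0.50)     # ten times louder, same output
quiet_again = classical.process(0.05)    # ceiling is stuck at the fortissimo

# Compressed material: levels only move between 0.5 and 1.0 (about 6 dB).
compressed = RunningPeakNormalizer()
compressed_out = [compressed.process(v) for v in (1.0, 0.5, 0.8, 0.5, 1.0)]

print("quiet opening:", quiet_opening)
print("fortissimo:   ", fortissimo)
print("quiet again:  ", round(quiet_again, 3))
print("compressed output range:", min(compressed_out), "to", max(compressed_out))
```

The quiet opening and the fortissimo produce the same output, the later quiet passage collapses towards zero, and the compressed track never uses the bottom half of the range.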

How different frequencies dominate without EQ

Even at identical perceived loudness, different genres produce fundamentally different amplitude readings from a wideband measurement.

A hip-hop or bass-heavy electronic track concentrates most of its energy below 200Hz. A wideband RMS measurement on this material is dominated by that low end. The snare body around 200Hz and the hi-hats at 8–12kHz are present but contribute almost nothing to the overall amplitude value. The same measurement on a string quartet, which has almost no energy below 100Hz, can read noticeably quieter even if both tracks were mastered to the same integrated loudness, because loudness metering de-emphasizes low frequencies while a plain RMS measurement does not.

Without any frequency separation, your visual parameters driven by amplitude become proxies for bass content. They are not measuring energy, dynamics, or excitement in any perceptual sense. They are measuring which frequency range dominates the genre.
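A rough illustration of how little the high end moves a wideband reading: the sketch below mixes two synthetic sine tones standing in for a bass line and a hi-hat. The frequencies and amplitudes are assumptions picked for the example, not measurements of real tracks.

```python
import math

SAMPLE_RATE = 44100


def sine(freq_hz: float, amplitude: float, seconds: float = 0.5) -> list:
    """Generate a sine tone as a list of float samples."""
    count = int(SAMPLE_RATE * seconds)
    step = 2.0 * math.pi * freq_hz / SAMPLE_RATE
    return [amplitude * math.sin(step * i) for i in range(count)]


def rms(samples: list) -> float:
    """Root mean square of a block of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


bass = sine(60.0, 0.8)    # stand-in for a loud bass line
hats = sine(9000.0, 0.1)  # stand-in for hi-hats, 18 dB lower in amplitude
mix = [b + h for b, h in zip(bass, hats)]

rms_bass_only = rms(bass)
rms_mix = rms(mix)

# Fraction of the wideband reading that disappears if the hats are removed.
hat_contribution = (rms_mix - rms_bass_only) / rms_mix

print(f"bass only: {rms_bass_only:.4f}")
print(f"full mix:  {rms_mix:.4f}")
print(f"hi-hat share of the reading: {hat_contribution:.2%}")
```

Removing the hi-hat tone changes the wideband RMS by less than one percent, so a parameter driven by that value is effectively driven by the bass alone.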

This is compounded by how audio interfaces and analysis tools respond. Many affordable interfaces and software analysers roll off or attenuate sub-bass. A system calibrated on consumer gear measuring a bass-heavy track will behave differently on the same track analysed through a flat professional interface.

What actually works

These are the approaches I have found most reliable in TouchDesigner, ordered roughly by effort required:

Multi-band analysis. Split the signal into at least three frequency ranges before measuring amplitude: low end (20–200Hz), midrange (200Hz–4kHz), highs (4–20kHz). Drive separate visual parameters from each band independently. This distributes the frequency dominance problem across parameters rather than compressing it into a single value. For example, bass can control scale or size, mids can drive colour or density, and highs can drive speed or noise. The system remains responsive across genres because each band has its own independent range.
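As a sketch of the band split, the example below uses two one-pole low-pass filters at 200Hz and 4kHz and takes differences between them. This is a deliberately crude crossover with gentle slopes, chosen because it fits in a few lines of plain Python; inside TouchDesigner you would more likely do the filtering with audio operators. The function names and test tones are hypothetical.

```python
import math

SAMPLE_RATE = 44100


def one_pole_lowpass(samples: list, cutoff_hz: float) -> list:
    """Very gentle (6 dB per octave) low-pass filter."""
    a = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / SAMPLE_RATE)
    y = 0.0
    out = []
    for x in samples:
        y += a * (x - y)
        out.append(y)
    return out


def rms(samples: list) -> float:
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def band_levels(samples: list, low_cut: float = 200.0, high_cut: float = 4000.0) -> dict:
    """Return one RMS level per band: low, mid, high."""
    below_low = one_pole_lowpass(samples, low_cut)
    below_high = one_pole_lowpass(samples, high_cut)
    mid = [h - l for h, l in zip(below_high, below_low)]  # between the two cutoffs
    high = [x - h for x, h in zip(samples, below_high)]   # what the 4 kHz filter removed
    return {"low": rms(below_low), "mid": rms(mid), "high": rms(high)}


def sine(freq_hz: float, amplitude: float, seconds: float = 0.25) -> list:
    count = int(SAMPLE_RATE * seconds)
    step = 2.0 * math.pi * freq_hz / SAMPLE_RATE
    return [amplitude * math.sin(step * i) for i in range(count)]


kick_levels = band_levels(sine(60.0, 0.8))   # bass-like test tone
hat_levels = band_levels(sine(9000.0, 0.8))  # hi-hat-like test tone

print("60 Hz tone: ", {k: round(v, 3) for k, v in kick_levels.items()})
print("9 kHz tone: ", {k: round(v, 3) for k, v in hat_levels.items()})
```

Each band value can then be mapped to its own visual parameter, so a bass-heavy track no longer decides the behaviour of everything at once. Because the filter slopes are shallow, some energy leaks between bands; steeper filters would separate them more cleanly.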

Short-window RMS with tuned attack and release. The analysis window length determines what the RMS measurement reflects. A 30ms window catches transients but is noisy; a 500ms window is smooth but misses fast hits. For general use, 80–120ms is a reasonable starting point. The Lag CHOP in TouchDesigner gives you explicit control over attack and release separately — fast attack (5–15ms) to catch transients as they arrive, slower release (150–400ms) to hold the value long enough to drive a parameter visibly. This pair of settings matters more than most of the analysis parameters.
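The attack and release behaviour can be sketched as a one-pole envelope follower with two time constants. This is a generic illustration of the idea, not a description of how the Lag CHOP is implemented, and the `update_rate_hz` parameter (how often a new level value arrives) is an assumption of the example.

```python
import math


class EnvelopeFollower:
    """Smooth a level signal with separate attack and release times."""

    def __init__(self, attack_ms: float, release_ms: float, update_rate_hz: float):
        # Per-update smoothing coefficients derived from the time constants.
        self.attack_coeff = math.exp(-1.0 / (attack_ms * 0.001 * update_rate_hz))
        self.release_coeff = math.exp(-1.0 / (release_ms * 0.001 * update_rate_hz))
        self.value = 0.0

    def process(self, level: float) -> float:
        # Rising input uses the fast attack, falling input the slow release.
        coeff = self.attack_coeff if level > self.value else self.release_coeff
        self.value = coeff * self.value + (1.0 - coeff) * level
        return self.value


# Hypothetical settings inside the ranges suggested above: 10 ms attack,
# 300 ms release, level values arriving 1000 times per second.
follower = EnvelopeFollower(attack_ms=10.0, release_ms=300.0, update_rate_hz=1000.0)

# A 30 ms hit followed by 100 ms of silence.
for _ in range(30):
    after_hit = follower.process(1.0)
for _ in range(100):
    after_silence = follower.process(0.0)

print(f"after 30 ms hit:       {after_hit:.3f}")
print(f"100 ms after the hit:  {after_silence:.3f}")
```

The value climbs to most of the hit's level within 30 ms and is still well above half of it 100 ms later, which is what keeps a short transient visible on screen.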

Percentile-based normalization over a rolling window. Rather than normalizing to the running peak, which gets dominated by outliers and never recovers, track the 90th or 95th percentile of values over a rolling 30–60 second window. This keeps the signal within a usable range across genre shifts without flattening the dynamics. The implementation in TouchDesigner is straightforward: feed the analysis value through a Trail CHOP, use a Sort CHOP to order the recorded values, and read the sample at the percentile position.
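Outside of CHOPs, the same idea can be sketched in plain Python. The class below is a minimal illustration under stated assumptions: the class name, the nearest-rank percentile lookup, and the clamp to 1.0 are choices made for the example, and the window is counted in updates rather than seconds (3600 updates would be 60 seconds at 60 updates per second).

```python
from collections import deque


class PercentileNormalizer:
    """Normalize levels against a rolling percentile instead of the peak."""

    def __init__(self, window_size: int, percentile: float = 95.0, floor: float = 1e-6):
        self.history = deque(maxlen=window_size)  # oldest values drop off automatically
        self.percentile = percentile
        self.floor = floor  # avoids division by zero during silence

    def process(self, level: float) -> float:
        self.history.append(level)
        # Re-sorting every update is simple but not the cheapest approach.
        ordered = sorted(self.history)
        index = min(len(ordered) - 1, int(len(ordered) * self.percentile / 100.0))
        reference = max(ordered[index], self.floor)
        return min(level / reference, 1.0)  # clamp outliers above the reference


norm = PercentileNormalizer(window_size=100, percentile=95.0)

# Quiet track with a single loud outlier in the middle.
for level in [0.1] * 50 + [1.0] + [0.1] * 49:
    quiet_with_spike = norm.process(level)

# Loud track, then back to a quiet one.
for _ in range(100):
    loud_track = norm.process(0.8)
for _ in range(100):
    quiet_after_loud = norm.process(0.1)

print("quiet track despite spike:", quiet_with_spike)
print("loud track:               ", loud_track)
print("quiet track after loud:   ", quiet_after_loud)
```

A single spike does not capture the reference level, and once the loud track has aged out of the window the quiet track uses the full range again, which a running peak cannot do.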

A manual gain offset. The least elegant solution, but the most reliable in live contexts: expose a single gain knob and set it by ear in the first few seconds of a new track. Ten seconds of listening tells you more about the incoming signal characteristics than any algorithm running blind. In a performance setting, having a human in the loop to adjust for genre changes is not a failure of the system — it is an acknowledgement that music is varied and the person operating the system knows that better than the code does.

In practice the most robust reactive systems combine all four. Multi-band analysis distributes the problem; short-window RMS with attack/release catches transients cleanly; percentile normalization keeps the range stable across genres; a manual offset handles edge cases. Each layer addresses a different failure mode. Together they produce a system that responds to what the music is doing rather than just to which frequency range happens to be dominant.