Real-time spectral analysis
Harmonic decomposition, spectral analysis, and interactive audio intelligence. See sound the way physics does.
Add two or more audio files to compare their spectral fingerprints side by side.
Every sound is a pressure wave. What makes a violin sound different from a trumpet is the hidden mathematical structure inside that wave. Soniform decomposes sound in real time using the Fourier Transform and displays that structure across seven measurements and four visualizations.
Each metric in the bottom bar is recomputed every animation frame from your audio. Together they form a spectral fingerprint of the sound.
The lowest, strongest pitch frequency. What a musician calls the note.
Nearest musical note name (A, C-sharp, etc.) plus cents sharp or flat.
Centre of mass of the spectrum. High value means bright. Low value means warm.
Overall loudness of the signal in decibels.
How many times the waveform crosses zero per frame. A proxy for tonality versus noise.
Score from 0 to 100 measuring how evenly energy spreads across the overtone series.
Frequency below which 85% of the spectrum energy lives.
Use the sidebar to explore any metric in depth. If audio is loaded, each section shows your live reading and explains what it means for the sound you are hearing right now.
When a string or air column vibrates, its base rate of oscillation is the fundamental frequency (f0), the pitch your ear identifies.
Soniform detects f0 using autocorrelation: it slides a copy of the waveform across itself and finds the shortest time lag at which the two copies match best. That lag equals one period of the wave, and the fundamental is the sample rate divided by the lag in samples.
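The autocorrelation idea above can be sketched in a few lines of TypeScript. This is a minimal illustration, not Soniform's actual code: the function name estimateF0, the 50 to 2000 Hz search range, and the 0.9 acceptance threshold are all assumptions chosen for the example.

```typescript
// Illustrative autocorrelation pitch estimator (hypothetical helper).
// Returns the estimated fundamental in Hz, or null when no pitch is found.
function estimateF0(
  frame: Float32Array,
  sampleRate: number,
  minHz = 50,   // assumed lower bound of the pitch search
  maxHz = 2000  // assumed upper bound of the pitch search
): number | null {
  const n = frame.length;
  const minLag = Math.max(1, Math.floor(sampleRate / maxHz));
  const maxLag = Math.min(Math.floor(sampleRate / minHz), n - 2);
  if (maxLag - minLag < 2) return null;

  // Score each candidate lag by how well the frame matches a shifted copy
  // of itself, normalised by the number of overlapping samples.
  const scores = new Float64Array(maxLag + 1);
  let best = 0;
  for (let lag = minLag; lag <= maxLag; lag++) {
    let sum = 0;
    for (let i = 0; i + lag < n; i++) sum += frame[i] * frame[i + lag];
    scores[lag] = sum / (n - lag);
    if (scores[lag] > best) best = scores[lag];
  }
  if (best < 1e-8) return null; // silence or no positive correlation

  // Take the shortest lag that is a local peak and close to the best score.
  // This avoids locking onto two or three periods instead of one.
  for (let lag = minLag + 1; lag < maxLag; lag++) {
    const s = scores[lag];
    if (s >= 0.9 * best && s > scores[lag - 1] && s >= scores[lag + 1]) {
      return sampleRate / lag;
    }
  }
  return null;
}
```

Choosing the shortest qualifying lag rather than the global maximum matters because every whole multiple of the period also matches well, and picking one of those would report a pitch one or more octaves too low.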
The dominant pitch frequency detected in real time.
Middle C on a piano is 261.6 Hz. The fundamental of human speech typically sits between 85 and 255 Hz. A piccolo can reach 4 kHz. A high number means high pitch; a low number means bass.
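Western music divides the octave into 12 equal semitones. Soniform maps the detected fundamental to the nearest note using the equal-temperament relation n = 12 × log2(f0 / 440), where n is the distance in semitones from A4, rounded to the nearest whole number.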
440 Hz is A4, the international tuning reference. Cents are hundredths of a semitone. A reading of 0 cents means perfectly in tune. A reading of 50 cents means halfway between two notes.
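The note mapping described above can be sketched as follows. The function name frequencyToNote and the use of MIDI note numbers for the octave are assumptions made for this example; only the 12-semitone octave, the 440 Hz reference, and the definition of cents come from the text.

```typescript
// Note names within one octave, starting at C (sharps only, for brevity).
const NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"];

// Hypothetical helper: map a frequency to the nearest equal-tempered note.
function frequencyToNote(
  f0: number,
  a4 = 440 // tuning reference in Hz
): { name: string; octave: number; cents: number } {
  // Signed distance from A4 in (fractional) semitones.
  const semitonesFromA4 = 12 * Math.log2(f0 / a4);
  const nearest = Math.round(semitonesFromA4);
  // Cents are hundredths of a semitone; positive means sharp, negative flat.
  const cents = Math.round((semitonesFromA4 - nearest) * 100);
  // MIDI convention (an assumption here): A4 is note 69, C4 is note 60.
  const midi = 69 + nearest;
  const name = NOTE_NAMES[((midi % 12) + 12) % 12];
  const octave = Math.floor(midi / 12) - 1;
  return { name, octave, cents };
}
```

For instance, 445 Hz maps to A4 at about 20 cents sharp, and 261.63 Hz maps to C4 at 0 cents.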
Includes octave number and cents deviation from perfect tuning.
Singers and instrumentalists target 0 cents. Vibrato oscillates plus or minus 20 to 50 cents rhythmically.
Imagine the frequency spectrum as a see-saw. Each frequency bin has a weight equal to its amplitude. The centroid is where that see-saw balances, the centre of mass of the spectrum.
It is the single best predictor of perceived brightness. High centroid means tinny, bright, metallic. Low centroid means warm, dark, bass-heavy.
Correlates directly with how bright or dark the sound is perceived.
A soft cello might show 600 Hz. A cymbal crash might show 8000 Hz.
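The see-saw picture corresponds to an amplitude-weighted average of frequency. A minimal sketch, assuming the input is a linear magnitude spectrum with bin k centred at k × sampleRate / fftSize (the function name is hypothetical):

```typescript
// Hypothetical helper: spectral centroid in Hz of a linear magnitude spectrum.
function spectralCentroid(
  magnitudes: Float32Array,
  sampleRate: number,
  fftSize: number
): number {
  let weighted = 0; // sum of frequency × magnitude
  let total = 0;    // sum of magnitudes
  for (let k = 0; k < magnitudes.length; k++) {
    const hz = (k * sampleRate) / fftSize;
    weighted += hz * magnitudes[k];
    total += magnitudes[k];
  }
  // An empty (silent) spectrum has no balance point; report 0 by convention.
  return total > 0 ? weighted / total : 0;
}
```

Two equally strong peaks balance exactly halfway between them, while making the upper peak louder pulls the centroid upward.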
Root Mean Square (RMS) measures the effective amplitude of the signal, which reflects its average power. It tracks perceived loudness far better than peak level.
Levels are given in decibels relative to digital full scale. Silence is near -90 dB. A quiet room is around -60 dB. A speaking voice sits around -30 to -15 dB. Loud music approaches 0 dB. Above 0 dB indicates clipping.
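As a rough sketch of the calculation, assuming samples are floats in the range -1 to 1 and that readings are clamped at a -90 dB floor (the function name and the floor parameter are assumptions for this example):

```typescript
// Hypothetical helper: RMS level of one frame in dB relative to full scale.
function rmsDb(frame: Float32Array, floorDb = -90): number {
  if (frame.length === 0) return floorDb;
  let sumSquares = 0;
  for (let i = 0; i < frame.length; i++) sumSquares += frame[i] * frame[i];
  const rms = Math.sqrt(sumSquares / frame.length);
  if (rms === 0) return floorDb; // log10(0) is -Infinity, so clamp silence
  return Math.max(floorDb, 20 * Math.log10(rms));
}
```

A full-scale sine wave has an RMS of about 0.707, so it reads close to -3 dB rather than 0 dB.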
Loud is near 0 dB. Silence is near -90 dB.
Each time the waveform crosses the zero axis counts as one zero crossing. The zero-crossing rate (ZCR) counts how many occur per analysis frame.
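A pure 440 Hz sine wave crosses zero roughly 41 times per 2048-sample frame at a 44.1 kHz sample rate: two crossings per cycle, 440 cycles per second, and about 46 ms per frame. Noisy broadband sounds cross hundreds of times. Clean tonal sounds cross far fewer times.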
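Counting crossings amounts to counting sign changes between neighbouring samples. A minimal sketch (the function name is hypothetical, and treating an exact zero as non-negative is a choice made for this example):

```typescript
// Hypothetical helper: number of zero crossings in one frame.
function countZeroCrossings(frame: Float32Array): number {
  let crossings = 0;
  for (let i = 1; i < frame.length; i++) {
    const previousNonNegative = frame[i - 1] >= 0;
    const currentNonNegative = frame[i] >= 0;
    // A crossing occurs whenever the sign flips between adjacent samples.
    if (previousNonNegative !== currentNonNegative) crossings++;
  }
  return crossings;
}
```

For a steady tone the count is approximately 2 × frequency × frame length / sample rate, which is where the figure of about 41 for 440 Hz comes from.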
Low means tonal or pitched. High means noisy or percussive.
A pitched sound distributes its energy across a harmonic series, the fundamental and its integer multiples. How that energy is spread tells us about timbre.
This is the Shannon entropy of the overtone distribution, normalised to a 0 to 100 scale. A score of 0 means all energy is in one harmonic (a pure sine wave). A score of 100 means energy is spread equally across all harmonics.
Flute is around 20 to 35. Guitar is around 50 to 70. Full orchestra is 80 and above.
This is Soniform's original metric. Simple flutes concentrate energy at the fundamental. Violins and brass spread it across many overtones producing richer scores.
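One plausible reading of the score can be sketched as below. The exact formulation is Soniform's own and is not given in the text, so this is only an assumption: the harmonic amplitudes are treated as a probability distribution, the entropy is taken in bits, and it is divided by the maximum possible entropy for that number of harmonics.

```typescript
// Hypothetical sketch of a normalised-entropy complexity score (0 to 100).
// `harmonicAmplitudes` holds the amplitude of harmonics 1, 2, 3, ...
function harmonicComplexity(harmonicAmplitudes: number[]): number {
  const count = harmonicAmplitudes.length;
  let total = 0;
  for (const a of harmonicAmplitudes) if (a > 0) total += a;
  if (count < 2 || total <= 0) return 0;

  // Shannon entropy of the normalised amplitudes, in bits.
  let entropy = 0;
  for (const a of harmonicAmplitudes) {
    if (a > 0) {
      const p = a / total;
      entropy -= p * Math.log2(p);
    }
  }
  // log2(count) is the entropy of a perfectly even spread.
  return (100 * entropy) / Math.log2(count);
}
```

Under these assumptions, four harmonics with amplitudes [1, 0, 0, 0] score 0, [1, 1, 0, 0] score 50, and [1, 1, 1, 1] score 100.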
Rolloff is the frequency below which 85% of total spectral energy lives. Think of it as a brightness threshold.
High rolloff means significant high-frequency content such as cymbals or consonants. Low rolloff means bass or kick drum is dominant.
Bass-heavy sounds have low rolloff. Bright or noisy sounds have high rolloff.
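The rolloff point can be found by accumulating energy from the lowest bin upward until 85% of the total is reached. A minimal sketch, assuming a linear magnitude spectrum whose squared values represent energy (the function name and parameters are hypothetical):

```typescript
// Hypothetical helper: frequency in Hz below which `fraction` of the energy lies.
function spectralRolloff(
  magnitudes: Float32Array,
  sampleRate: number,
  fftSize: number,
  fraction = 0.85
): number {
  let totalEnergy = 0;
  for (let k = 0; k < magnitudes.length; k++) totalEnergy += magnitudes[k] * magnitudes[k];
  if (totalEnergy === 0) return 0; // silent frame

  let cumulative = 0;
  for (let k = 0; k < magnitudes.length; k++) {
    cumulative += magnitudes[k] * magnitudes[k];
    // The first bin where the running total reaches the threshold is the rolloff.
    if (cumulative >= fraction * totalEnergy) return (k * sampleRate) / fftSize;
  }
  return ((magnitudes.length - 1) * sampleRate) / fftSize;
}
```

If a strong low peak already holds more than 85% of the energy, the rolloff sits at that peak, however much faint high-frequency content lies above it.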
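The Fast Fourier Transform (FFT) takes a snapshot of N time-domain samples and decomposes it into sine-wave components at evenly spaced frequency bins, each sampleRate / N wide, from 0 Hz up to the Nyquist limit (half the sample rate). The display shows the audible range from 20 Hz upward.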
Soniform plots magnitudes on a logarithmic frequency axis so octaves appear equally spaced, matching how human hearing works.
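A logarithmic axis places a frequency according to the logarithm of its ratio to the lowest displayed frequency. The sketch below assumes the 20 Hz to 22 kHz display range; the function name and the pixel-width parameter are hypothetical.

```typescript
// Hypothetical helper: horizontal position of a frequency on a log axis.
function frequencyToX(
  hz: number,
  width: number,   // width of the plot in pixels
  minHz = 20,      // left edge of the axis
  maxHz = 22000    // right edge of the axis
): number {
  // Clamp so out-of-range values land on the nearest edge.
  const clamped = Math.min(Math.max(hz, minHz), maxHz);
  return (width * Math.log(clamped / minHz)) / Math.log(maxHz / minHz);
}
```

Because position depends only on frequency ratios, the distance from 100 Hz to 200 Hz equals the distance from 200 Hz to 400 Hz: every octave gets the same width.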
Larger FFT means more frequency bins and sharper spectral detail, but each frame covers more time so fast transients blur. Smaller FFT means faster response but coarser resolution. 2048 is the default balance.
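The trade-off can be made concrete with a little arithmetic. This sketch uses a hypothetical helper and takes the sample rate as a parameter, since that depends on the audio device:

```typescript
// Hypothetical helper: frequency and time resolution for a given FFT size.
function fftResolution(
  fftSize: number,
  sampleRate: number
): { binWidthHz: number; frameDurationMs: number } {
  return {
    // Spacing between adjacent frequency bins.
    binWidthHz: sampleRate / fftSize,
    // Span of audio covered by one frame.
    frameDurationMs: (1000 * fftSize) / sampleRate,
  };
}
```

At 48 kHz, a 2048-point FFT gives bins about 23.4 Hz wide and frames about 42.7 ms long. Going to 8192 narrows the bins to about 5.9 Hz but stretches each frame to about 171 ms, while 512 gives 93.75 Hz bins and frames of about 10.7 ms.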
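The FFT assumes the frame repeats periodically. When the frame does not hold a whole number of cycles, the jump between its end and its start smears energy into neighbouring bins, an effect called spectral leakage. Windowing tapers the frame toward zero at both edges to suppress it. Hann is a good general default. Blackman reduces leakage further at the cost of slightly wider peaks. Rectangular (no taper) gives the narrowest peaks but the most leakage.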
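The three windows can be generated from their standard textbook formulas. This is a sketch using the symmetric form of each window; the function and type names are hypothetical, and Soniform's own implementation may differ in detail.

```typescript
type WindowKind = "rectangular" | "hann" | "blackman";

// Hypothetical helper: build a window of the given kind and length.
function makeWindow(kind: WindowKind, length: number): Float32Array {
  const win = new Float32Array(length);
  const denom = Math.max(1, length - 1);
  for (let n = 0; n < length; n++) {
    const x = (2 * Math.PI * n) / denom;
    if (kind === "hann") {
      win[n] = 0.5 - 0.5 * Math.cos(x);
    } else if (kind === "blackman") {
      win[n] = 0.42 - 0.5 * Math.cos(x) + 0.08 * Math.cos(2 * x);
    } else {
      win[n] = 1; // rectangular: no tapering at all
    }
  }
  return win;
}

// Multiply a frame by a window, sample by sample, before the FFT.
function applyWindow(frame: Float32Array, win: Float32Array): Float32Array {
  const out = new Float32Array(frame.length);
  for (let i = 0; i < frame.length; i++) out[i] = frame[i] * win[i];
  return out;
}
```

Both tapered windows start and end at zero and peak at 1 in the middle of the frame, which is what removes the hard edge.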
A spectrogram is a rolling history of FFT snapshots: time runs left to right, frequency bottom to top, and colour encodes intensity. Dark navy means silence. Bright cyan or white means strong energy.
Horizontal stripes are sustained tones. Vertical smears are transients such as claps or drum hits. Diagonal streaks indicate pitch glides, and wavy lines indicate vibrato. Speech shows shifting formant bands separated by silence.
Adjust the history slider (1 to 8 seconds) in the sidebar. Short windows reveal rapid events; long windows show how the sound evolves over time.
A vibrating string or tube produces not just f0 but a series of overtones at exact integer multiples, the harmonic series.
Soniform plots this as a radial node graph. The central cyan node is f0. Surrounding violet nodes are overtones. Node size represents amplitude.
Two instruments playing the same note have the same node positions but very different sizes. A flute concentrates energy at n=1 and n=2. A violin spreads to n=6 and beyond. That difference is what makes them sound distinct.
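Node sizes could be derived by sampling the magnitude spectrum at each integer multiple of f0. The following is a sketch under assumptions, not Soniform's method: the function name, the default of eight harmonics, and the one-bin search tolerance around each expected position are all made up for illustration.

```typescript
// Hypothetical helper: amplitude of harmonics 1..count read from a spectrum.
function harmonicAmplitudes(
  magnitudes: Float32Array,
  f0: number,
  sampleRate: number,
  fftSize: number,
  count = 8
): number[] {
  const amplitudes: number[] = [];
  if (!(f0 > 0)) return amplitudes; // no pitch detected
  for (let n = 1; n <= count; n++) {
    // Bin closest to the n-th harmonic at n × f0.
    const bin = Math.round((n * f0 * fftSize) / sampleRate);
    if (bin >= magnitudes.length) break; // beyond the Nyquist limit
    // Look one bin either side to tolerate slight mistuning.
    const low = Math.max(0, bin - 1);
    const high = Math.min(magnitudes.length - 1, bin + 1);
    let peak = 0;
    for (let k = low; k <= high; k++) peak = Math.max(peak, magnitudes[k]);
    amplitudes.push(peak);
  }
  return amplitudes;
}
```

The resulting array is indexed by harmonic number minus one, so two instruments on the same note produce arrays of the same length with different values.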
A real-time audio analysis platform that makes the hidden mathematics of sound visible. Here is everything you need to get started.
The raw air pressure over time. Wide swings mean loud sound. A flat line means silence. Shape reveals rhythm, attack, and sustain.
The frequency fingerprint at this instant. Logarithmic x-axis from 20 Hz to 22 kHz, decibels on the y-axis. Each peak is a frequency component present in the sound.
A rolling history of the spectrum. Time runs left to right, frequency runs bottom to top, colour encodes loudness. Notes appear as horizontal stripes. Transients appear as vertical smears.
The overtone series as a radial node diagram. The central node is the fundamental. Surrounding nodes are integer multiples. Node size encodes amplitude, making timbre visible.
The main tool. Load audio or enable your microphone to explore all four visualizations and seven metrics in real time. Adjust FFT settings, windowing, and gain from the sidebar.
Add multiple audio files and see their spectra, waveforms, and metrics side by side. Ideal for comparing instruments, singers, or different versions of a mix.
Deep explanations of every metric and visualization. When audio is active, Learn shows your live readings so you understand what each number means for your specific sound.
This page. A guide to everything in Soniform including how to use it, what you are seeing, what the numbers mean, and what makes this platform different.
Larger means more frequency detail but slower time response. 2048 is the recommended balance. Use 8192 for fine pitch analysis. Use 512 for tracking fast transients.
How much the spectrum is averaged frame to frame. High smoothing gives a calmer, easier-to-read display. Low smoothing responds instantly to rapid changes.
Hann is the best general-purpose choice. Blackman reduces spectral leakage further. Rectangular gives maximum resolution but causes leakage artifacts.
Amplifies or attenuates the signal before analysis. Use positive gain for quiet recordings. Use negative gain if the signal is clipping.
Annotates the strongest frequency peaks in the spectrum with their frequency in Hz or kHz. Toggle off for a cleaner view.
Overlays dashed vertical lines at harmonic series positions in the spectrum. Helps you see whether peaks align with expected overtones.
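The two overlay options above, peak labels and harmonic guides, can be sketched as simple helpers. All names here are hypothetical, and the peak-picking rule (local maxima ranked by magnitude) and the label format are assumptions made for this example.

```typescript
// Format a frequency as "440 Hz" or "1.50 kHz".
function formatFrequency(hz: number): string {
  return hz >= 1000 ? `${(hz / 1000).toFixed(2)} kHz` : `${Math.round(hz)} Hz`;
}

// Find the strongest local maxima in a magnitude spectrum and label them.
function strongestPeaks(
  magnitudes: Float32Array,
  sampleRate: number,
  fftSize: number,
  maxPeaks = 5
): { hz: number; label: string }[] {
  const candidates: { bin: number; magnitude: number }[] = [];
  for (let k = 1; k < magnitudes.length - 1; k++) {
    // A local maximum is higher than its left neighbour and not lower than its right.
    if (magnitudes[k] > magnitudes[k - 1] && magnitudes[k] >= magnitudes[k + 1]) {
      candidates.push({ bin: k, magnitude: magnitudes[k] });
    }
  }
  candidates.sort((a, b) => b.magnitude - a.magnitude);
  return candidates.slice(0, maxPeaks).map((p) => {
    const hz = (p.bin * sampleRate) / fftSize;
    return { hz, label: formatFrequency(hz) };
  });
}

// Frequencies at which harmonic guide lines would be drawn, up to maxHz.
function harmonicGuides(f0: number, maxHz: number): number[] {
  const guides: number[] = [];
  if (!(f0 > 0)) return guides; // nothing to draw without a detected pitch
  for (let n = 1; n * f0 <= maxHz; n++) guides.push(n * f0);
  return guides;
}
```

When a labelled peak sits on a guide line, that peak is a harmonic of the detected fundamental; peaks between the lines are inharmonic content or noise.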
Pro tip: Try playing a single sustained note on any instrument into your microphone, then open Learn and select Harmonic Map. You will see exactly which overtones your instrument produces and how the Harmonic Complexity Score reflects its timbre in real time.
Every frame, roughly 60 times per second, Soniform collects a block of audio samples from the Web Audio API, applies a windowing function to reduce edge artifacts, then runs the Fast Fourier Transform. The FFT output feeds all four visualizations simultaneously.
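The window-then-transform step can be illustrated without any browser APIs. The sketch below is an assumption-laden stand-in: it uses a naive discrete Fourier transform, which gives the same result as an FFT but in O(N²) time, with a Hann window applied first. The function name is hypothetical.

```typescript
// Hypothetical helper: Hann-windowed magnitude spectrum of one frame.
// Uses a naive DFT in place of a real FFT; suitable only for small frames.
function analyseFrame(frame: Float32Array): Float32Array {
  const n = frame.length;
  const half = Math.floor(n / 2);

  // Step 1: apply a Hann window to reduce edge artifacts.
  const windowed = new Float64Array(n);
  for (let i = 0; i < n; i++) {
    windowed[i] = frame[i] * (0.5 - 0.5 * Math.cos((2 * Math.PI * i) / n));
  }

  // Step 2: correlate with a cosine and sine at each bin frequency.
  const magnitudes = new Float32Array(half + 1);
  for (let k = 0; k <= half; k++) {
    let re = 0;
    let im = 0;
    for (let i = 0; i < n; i++) {
      const angle = (2 * Math.PI * k * i) / n;
      re += windowed[i] * Math.cos(angle);
      im -= windowed[i] * Math.sin(angle);
    }
    magnitudes[k] = Math.sqrt(re * re + im * im) / n;
  }
  return magnitudes; // bins 0 (DC) through n/2 (Nyquist)
}
```

A sine wave that completes exactly eight cycles in a 64-sample frame produces its largest magnitude in bin 8, with the window spreading a smaller amount into bins 7 and 9.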
Pitch detection uses autocorrelation, finding the dominant period in the signal without relying on the FFT. The Harmonic Complexity Score is the Shannon entropy of the overtone amplitude distribution, normalised to a 0 to 100 scale. All other metrics, including centroid, rolloff, ZCR, and RMS, are standard DSP calculations performed on the raw waveform or the frequency data.
Everything runs entirely in the browser. No server, no upload, no latency beyond your hardware. Your audio never leaves your device.
Academic context: The techniques here, including the short-time Fourier transform (STFT), spectral moments, harmonic analysis, and autocorrelation pitch detection, are foundational to Music Information Retrieval (MIR), speech processing, acoustic engineering, and computational musicology. The Harmonic Complexity Score is Soniform's own formulation, applying information-theoretic entropy to the overtone domain.