AI Stem Separation (Demucs)

Separates a mixed audio file into individual stems (vocals, drums, bass, other instruments) using Meta’s Demucs neural network. Useful for analyzing individual elements of a mixed file, or preparing stems for masking analysis.

Requires the phantom-audio[separation] extra. Install with: pip install "phantom-audio[separation]" (the quotes keep shells such as zsh from treating the square brackets as a glob pattern).

Parameters

Parameter	Type	Default	Description
file_path	string	required	Path to mixed audio file
output_dir	string	required	Directory to write separated stems
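As a rough illustration of the two required parameters, the sketch below checks that the mixed file exists, creates the output directory, and lists the stem paths a caller might expect afterwards. The helper name plan_separation and the <output_dir>/<stem>.wav naming are assumptions based on the example output in this section, not a documented Phantom API.

```python
from pathlib import Path

# Stem names as listed in this section; the file naming is an assumption.
STEM_NAMES = ("vocals", "drums", "bass", "other")


def plan_separation(file_path: str, output_dir: str) -> dict[str, Path]:
    """Validate both required parameters and return the expected stem paths.

    Hypothetical helper: it does not run any separation itself.
    """
    source = Path(file_path)
    if not source.is_file():
        raise FileNotFoundError(f"Mixed audio file not found: {source}")

    target = Path(output_dir)
    # Create the output directory (and any parents) if it does not exist yet.
    target.mkdir(parents=True, exist_ok=True)

    return {name: target / f"{name}.wav" for name in STEM_NAMES}
```

Calling plan_separation("full-mix.wav", "./stems") on an existing file would return a mapping from each stem name to its expected path under ./stems.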

Example Output

$ separate_stems full-mix.wav

Stem Separation: full-mix.wav
Model: Demucs v4 (htdemucs)
Duration: 3:42

Processing… done (18.4s)

Output files:
  ./stems/vocals.wav (24-bit WAV)
  ./stems/drums.wav (24-bit WAV)
  ./stems/bass.wav (24-bit WAV)
  ./stems/other.wav (24-bit WAV)

Quality estimate:
  Vocals: high (clear separation)
  Drums: high (clean transients)
  Bass: medium (some bleed from kick)
  Other: medium (residual content)
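For orientation, the report above names the htdemucs model and 24-bit WAV output. A comparable run with the Demucs package itself might look like the sketch below. How Phantom actually invokes Demucs is not documented here, so treat this as an assumption. The helper names build_demucs_args and run_demucs are hypothetical; the flags (-n, -o, --int24, --two-stems) and the demucs.separate.main entry point come from Demucs v4.

```python
def build_demucs_args(file_path, output_dir, only_stem=None):
    """Build an argument list for the Demucs v4 command-line interface.

    Hypothetical helper; it only assembles arguments and runs nothing.
    """
    # -n selects the model, --int24 writes 24-bit WAV, -o sets the output folder.
    args = ["-n", "htdemucs", "--int24", "-o", output_dir]
    if only_stem is not None:
        # --two-stems splits into the named stem plus everything else.
        args += ["--two-stems", only_stem]
    args.append(file_path)
    return args


def run_demucs(args):
    """Run Demucs in-process; requires the demucs package to be installed."""
    import demucs.separate  # imported lazily so this sketch loads without Demucs

    demucs.separate.main(args)
```

Note that plain Demucs writes into <output_dir>/<model>/<track name>/ rather than directly into the output directory, so the flat ./stems/ layout shown above presumably reflects an extra step on Phantom's side.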

What the Numbers Mean

Quality estimate — Phantom’s confidence in the separation quality for each stem. “High” means clean isolation. “Medium” means some bleed from other sources. Bleed is more common between bass/kick and in dense arrangements.
Processing time — Stem separation uses neural networks and is CPU-intensive. Expect 5-20 seconds per minute of audio depending on your machine.
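To turn the 5-20 seconds per minute guideline into a concrete range, the arithmetic can be sketched as follows. The function names are illustrative only, and the rates are the ones quoted above rather than measured values.

```python
def parse_duration(text: str) -> int:
    """Convert an "M:SS" duration string, as shown in the report, to seconds."""
    minutes, seconds = text.split(":")
    return int(minutes) * 60 + int(seconds)


def estimate_processing_seconds(duration_s: float,
                                low_rate: float = 5.0,
                                high_rate: float = 20.0) -> tuple[float, float]:
    """Return (fastest, slowest) expected processing time in seconds.

    Rates are seconds of processing per minute of audio.
    """
    minutes = duration_s / 60.0
    return (minutes * low_rate, minutes * high_rate)


fastest, slowest = estimate_processing_seconds(parse_duration("3:42"))
```

For the 3:42 track in the example output this gives roughly 18.5 to 74 seconds, so the reported 18.4 s sits at the fast end of the range.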

Example Prompts

Full separation

Separate my mix into stems — I want to analyze each element individually

Vocals only

Extract just the vocals from song.wav so I can analyze them

Analysis pipeline

Separate this reference track into stems, then run masking analysis between my vocals and the reference vocals

Related Tools

analyze_masking — Compare separated stems for frequency overlap
multi_stem_masking — Analyze all separated stems at once
batch_diagnostic — Run diagnostics on all separated stems

Pro tip

Stem separation quality drops when the source has been heavily dynamics-compressed or limited, because the model has less transient information to work with. For best results, use the highest-quality source available: a lossless file (WAV or FLAC) taken before any mastering processing.
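One rough way to judge how heavily a source has been limited is its crest factor, the peak-to-RMS ratio in decibels. The sketch below is a generic signal measurement, not something Phantom is documented to compute, and it deliberately sets no threshold: lower values simply suggest less remaining transient headroom. The function name is illustrative.

```python
import math


def crest_factor_db(samples):
    """Return the peak-to-RMS ratio of a sample sequence in decibels."""
    if not samples:
        raise ValueError("no samples given")
    peak = max(abs(s) for s in samples)
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    if rms == 0.0:
        raise ValueError("signal is silent; crest factor is undefined")
    return 20.0 * math.log10(peak / rms)
```

A pure sine wave measures about 3 dB and a full-scale square wave 0 dB; comparing a mastered file against its pre-master version on this scale gives a quick sense of which one keeps more transient detail.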
