---
title: "Install Demucs Locally: Free AI Stem Separation Setup Guide"
date: "2026-01-11"
lastUpdated: "2026-01-24"
author: "StemSplit Team"
tags: ["Demucs", "AI", "machine learning", "stem separation", "tutorial", "Meta AI", "htdemucs", "deep learning"]
excerpt: "Step-by-step guide to install Demucs on your computer for free stem separation. Extract vocals, drums, and bass locally with GPU acceleration. No cloud needed."
abstract: "Demucs is the AI model powering most professional stem separation tools today — including StemSplit's vocal remover. This guide covers everything from installation to architecture to training custom models, written for both curious musicians and ML engineers."
locale: "en"
canonical: "https://stemsplit.io/blog/demucs-local-setup-guide"
source: "stemsplit.io"
---

> **Source:** https://stemsplit.io/blog/demucs-local-setup-guide  
> Originally published by [StemSplit](https://stemsplit.io). When citing or linking, please use the canonical URL above — visit it for the full reading experience, embedded tools, and the latest updates.

Demucs is the AI model powering most professional stem separation tools today — including [StemSplit's vocal remover](/vocal-remover). This guide covers everything from installation to architecture to training custom models, written for both curious musicians and ML engineers.

**TL;DR**: Demucs is a hybrid transformer model by Meta AI that separates audio into vocals, drums, bass, and other instruments. Install with `pip install -U demucs`, run with `demucs your_song.mp3`, and get [studio-quality stems](/free-trial) in minutes. For best results, use the `htdemucs_ft` model with GPU acceleration.

---

## What Is Demucs?

Demucs (Deep Extractor for Music Sources) is an open-source AI model developed by [Meta AI Research](https://ai.meta.com/research/) for music source separation. It takes a mixed audio track and outputs isolated stems — typically vocals, drums, bass, and "other" (everything else).

What makes Demucs significant:

- **State-of-the-art quality**: Achieves an [SDR (Signal-to-Distortion Ratio)](https://en.wikipedia.org/wiki/Signal-to-distortion_ratio) of 9.20 dB on the MUSDB18-HQ benchmark, the best published result when the HTDemucs paper appeared
- **Waveform-based processing**: Works directly on raw audio, not just spectrograms, preserving phase information
- **Open source**: [MIT licensed](https://opensource.org/licenses/MIT), free for commercial and personal use
- **Battle-tested**: Powers most professional stem separation services
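
To make the SDR figure concrete, here is a minimal sketch of the simplified definition: ten times the base-10 logarithm of signal energy over error energy. The function name `sdr_db` and the toy signals are made up for illustration. Published MUSDB18 scores are computed with the `museval` toolkit, which evaluates short windows and aggregates them, so its numbers will not match this simplified formula exactly.

```python
import math

def sdr_db(reference, estimate):
    """Simplified signal-to-distortion ratio in decibels."""
    if len(reference) != len(estimate):
        raise ValueError("reference and estimate must have the same length")
    signal_energy = sum(r * r for r in reference)
    error_energy = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    if error_energy == 0:
        return math.inf   # perfect reconstruction
    if signal_energy == 0:
        return -math.inf  # silent reference, non-silent estimate
    return 10 * math.log10(signal_energy / error_energy)

reference = [1.0, 0.0, -1.0, 0.0]
estimate = [0.9, 0.0, -0.9, 0.0]  # every sample is 10% too quiet
print(f"{sdr_db(reference, estimate):.1f} dB")
```

Because the scale is logarithmic, each additional 10 dB means the residual error carries ten times less energy relative to the signal.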

The latest version, Hybrid Transformer Demucs (HTDemucs), represents the fourth major iteration and combines the best of both time-domain and frequency-domain processing.

---

## The Evolution: v1 → v4

Understanding Demucs's evolution helps explain why it works so well.

![Demucs Evolution Timeline - From v1 to v4 showing SDR improvements](/images/blog/demucs-evolution-timeline.svg)

### Demucs v1 (2019)

The original Demucs introduced a [U-Net architecture](https://en.wikipedia.org/wiki/U-Net) operating directly on waveforms — a departure from spectrogram-only methods. Key innovations:

- Gated Linear Units (GLUs) for activation
- Bidirectional LSTM between encoder and decoder
- Skip connections from encoder to decoder layers

```
Architecture: Pure waveform U-Net with BiLSTM
SDR: ~6.3 dB on MUSDB18
Innovation: First competitive waveform-only model
```

### Demucs v2 (2020)

Improved depth and training:

- Deeper encoder/decoder (6 layers → 7 layers)
- Better weight initialization
- Data augmentation improvements

```
SDR: ~6.8 dB on MUSDB18
Innovation: Proved waveform models could compete with spectrogram methods
```

### Demucs v3 / Hybrid Demucs (2021)

The breakthrough: combining spectrogram and waveform processing:

- Dual U-Net architecture (one for time domain, one for frequency domain)
- Shared representations between branches
- Cross-domain fusion at the bottleneck

```
SDR: ~7.5 dB on MUSDB18
Innovation: Best of both worlds — spectrogram precision + waveform phase
```

### Demucs v4 / HTDemucs (2022-2023)

The current state-of-the-art, adding Transformers:

- A cross-domain Transformer encoder placed between the encoder and decoder stacks
- Cross-attention between temporal and spectral branches
- Self-attention for long-range dependencies

```
SDR: 9.20 dB on MUSDB18-HQ
Innovation: Transformers capture long-range musical structure
```

---

## Architecture Deep Dive

For ML practitioners: here's how HTDemucs actually works.

### High-Level Structure

HTDemucs uses a **dual-path architecture** with two parallel U-Net branches that share information:

![HTDemucs Architecture - Dual-path model with temporal and spectral branches](/images/blog/htdemucs-architecture.svg)

### Temporal Branch (Waveform Processing)

The temporal branch processes raw audio samples:

1. **Encoder**: Stack of strided 1D convolutions that progressively downsample the audio
2. **Bottleneck**: BiLSTM + Transformer self-attention
3. **Decoder**: Transposed convolutions that upsample back to original resolution
4. **Skip connections**: U-Net style connections from encoder to decoder

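```python
# Simplified encoder layer structure
import torch.nn as nn

class TemporalEncoderLayer(nn.Module):
    def __init__(self, in_channels, out_channels, kernel_size=8, stride=4):
        super().__init__()
        # Strided convolution downsamples the time axis by roughly `stride`
        self.conv = nn.Conv1d(in_channels, out_channels, kernel_size, stride)
        self.norm = nn.GroupNorm(1, out_channels)
        self.glu = nn.GLU(dim=1)  # Gated Linear Unit: one half of the channels gates the other

    def forward(self, x):
        # x has shape (batch, in_channels, time)
        x = self.conv(x)
        x = self.norm(x)
        x = self.glu(x)  # Output is out_channels // 2 (out_channels must be even)
        return x
```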
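
To see how quickly strided convolutions shrink the time axis, the sketch below applies the standard output-length formula for an unpadded 1D convolution with the `kernel_size=8` and `stride=4` values used above. The helper names and the depth of 4 are assumptions for illustration; the real model also pads its input, so exact lengths will differ.

```python
def conv1d_output_length(length, kernel_size=8, stride=4):
    """Length of the time axis after one unpadded strided 1D convolution."""
    return (length - kernel_size) // stride + 1

def encoder_lengths(num_samples, depth=4):
    """Time-axis length at the input and after each encoder layer."""
    lengths = [num_samples]
    for _ in range(depth):
        lengths.append(conv1d_output_length(lengths[-1]))
    return lengths

# One second of 44.1 kHz audio passing through four layers
print(encoder_lengths(44_100))
```

Each layer cuts the sequence to about a quarter of its length, which is what lets the bottleneck look at long stretches of audio at a manageable cost.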

### Spectral Branch (Spectrogram Processing)

The spectral branch processes the Short-Time Fourier Transform (STFT) of the audio:

1. **STFT computation**: Converts waveform to complex spectrogram
2. **2D Convolutions**: Process frequency × time representations
3. **Transformer layers**: Self-attention in frequency and time dimensions
4. **Inverse STFT**: Convert back to waveform

Key parameters:
- STFT window: 4096 samples
- Hop length: 1024 samples
- Frequency bins: 2049 (for 44.1kHz audio)
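
The parameters above determine the size of the spectrogram the spectral branch sees. This sketch derives the numbers with plain arithmetic; it assumes a one-sided, centered STFT (the convention used by `torch.stft` defaults), and the helper name `stft_shape` is hypothetical. Demucs applies its own padding, so the frame count in the real model may differ slightly.

```python
SAMPLE_RATE = 44_100
N_FFT = 4096
HOP = 1024

def stft_shape(num_samples, n_fft=N_FFT, hop=HOP):
    """Return (frequency_bins, frames) for a centered, one-sided STFT."""
    freq_bins = n_fft // 2 + 1       # 4096-point FFT -> 2049 non-negative frequencies
    frames = 1 + num_samples // hop  # one frame per hop, plus the first centered frame
    return freq_bins, frames

bins, frames = stft_shape(4 * 60 * SAMPLE_RATE)  # a 4-minute track
print(f"{bins} bins x {frames} frames")
print(f"bin width: {SAMPLE_RATE / N_FFT:.2f} Hz, hop: {1000 * HOP / SAMPLE_RATE:.1f} ms")
```

The bin count depends only on the window size; the sample rate sets how many hertz each bin covers.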

### Cross-Domain Fusion

The two branches exchange information through cross-attention, with each one querying the other's features:

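```python
# Cross-attention between branches (simplified)
import torch.nn as nn

class CrossDomainAttention(nn.Module):
    def __init__(self, embed_dim, num_heads=8):
        super().__init__()
        # Inputs are (batch, sequence, embed_dim); both branches must share embed_dim
        self.temporal_cross_attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        self.spectral_cross_attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)

    def forward(self, temporal_features, spectral_features):
        # Temporal attends to spectral (the second return value holds attention weights)
        temporal_out, _ = self.temporal_cross_attn(
            query=temporal_features,
            key=spectral_features,
            value=spectral_features
        )

        # Spectral attends to temporal
        spectral_out, _ = self.spectral_cross_attn(
            query=spectral_features,
            key=temporal_features,
            value=temporal_features
        )

        return temporal_out, spectral_out
```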

### Why This Architecture Works

1. **Phase preservation**: Waveform branch maintains exact phase relationships — critical for clean separation
2. **Frequency precision**: Spectral branch excels at separating instruments with distinct frequency profiles
3. **Long-range dependencies**: Transformers model musical structure (verse-chorus patterns, repeated motifs)
4. **Multi-scale features**: U-Net captures both fine detail and global context

---

## Available Models Compared

Demucs offers several pretrained models. Here's how they compare:

![Demucs Model Comparison - Quality vs Speed vs VRAM](/images/blog/demucs-models-comparison.svg)

| Model | Stems | SDR (vocals) | SDR (avg) | Speed | VRAM | Best For |
|-------|-------|--------------|-----------|-------|------|----------|
| `htdemucs` | 4 | 8.99 dB | 7.66 dB | Fast | ~4GB | General use, good balance |
| `htdemucs_ft` | 4 | **9.20 dB** | **7.93 dB** | Slow | ~6GB | **Best quality** |
| `htdemucs_6s` | 6 | 8.83 dB | N/A | Medium | ~5GB | Guitar/piano separation |
| `mdx` | 4 | 8.5 dB | 7.2 dB | Fast | ~3GB | Lower VRAM systems |
| `mdx_extra` | 4 | 8.7 dB | 7.4 dB | Medium | ~4GB | Better than mdx |
| `mdx_q` | 4 | 8.3 dB | 7.0 dB | Fastest | ~2GB | Quick previews |

### Model Details

**htdemucs (default)**
- The standard Hybrid Transformer model
- Good quality/speed tradeoff
- Trained on internal Meta dataset + MUSDB18-HQ

**htdemucs_ft (fine-tuned)**
- Same architecture, with each source fine-tuned separately
- Highest quality available
- Recommended for final production work

**htdemucs_6s (6-stem)**
- Separates into: vocals, drums, bass, guitar, piano, other
- Useful when you need guitar or piano isolated
- Slightly lower quality per-stem due to harder task

**mdx / mdx_extra**
- Models from the MDX 2021 competition
- Use "bag of models" ensemble approach
- Lower VRAM requirements
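
If you want to turn the comparison table into a quick decision, a small lookup like the one below can help. It is only a sketch: the figures are copied from the table above (approximate VRAM, vocal SDR), and the `pick_model` helper is hypothetical, not part of Demucs.

```python
# (model name, stems, vocal SDR in dB, approximate VRAM in GB), from the table above
MODELS = [
    ("htdemucs", 4, 8.99, 4),
    ("htdemucs_ft", 4, 9.20, 6),
    ("htdemucs_6s", 6, 8.83, 5),
    ("mdx", 4, 8.5, 3),
    ("mdx_extra", 4, 8.7, 4),
    ("mdx_q", 4, 8.3, 2),
]

def pick_model(vram_gb, stems=4):
    """Return the highest-scoring model that fits in the given VRAM, or None."""
    candidates = [m for m in MODELS if m[1] == stems and m[3] <= vram_gb]
    if not candidates:
        return None  # nothing fits: fall back to CPU processing
    return max(candidates, key=lambda m: m[2])[0]

print(pick_model(8))           # plenty of VRAM
print(pick_model(3))           # small GPU
print(pick_model(6, stems=6))  # guitar/piano separation
```

Pass the result to the `-n` flag, for example `demucs -n htdemucs_ft song.mp3`.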

---

## System Requirements

### Minimum Requirements

| Component | Minimum | Recommended |
|-----------|---------|-------------|
| CPU | Any modern x86_64 | 4+ cores |
| RAM | 8 GB | 16 GB |
| GPU | None (CPU works) | NVIDIA 4GB+ VRAM |
| Storage | 2 GB | 5 GB (for models) |
| Python | 3.8+ | 3.10+ |

### Processing Time Estimates

![Demucs Processing Time by Hardware - Performance comparison across different systems](/images/blog/demucs-processing-times.svg)

For a 4-minute stereo track at 44.1kHz:

| Hardware | htdemucs | htdemucs_ft |
|----------|----------|-------------|
| NVIDIA RTX 4090 | ~30 sec | ~60 sec |
| NVIDIA RTX 3080 | ~45 sec | ~90 sec |
| NVIDIA RTX 3060 | ~90 sec | ~180 sec |
| Apple M1 Pro | ~120 sec | ~240 sec |
| Intel i7 (CPU) | ~8 min | ~15 min |
| Intel i5 (CPU) | ~15 min | ~25 min |
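
For tracks that are not four minutes long, a rough estimate can be scaled from the table. The sketch below assumes processing time grows linearly with track length, which is a simplification (model loading and disk I/O add fixed overhead). The `estimate_seconds` helper is hypothetical and the baseline values are the estimates from the table above.

```python
# Seconds to separate a 4-minute track, taken from the table above
BASELINE_SECONDS = {
    ("RTX 4090", "htdemucs"): 30, ("RTX 4090", "htdemucs_ft"): 60,
    ("RTX 3080", "htdemucs"): 45, ("RTX 3080", "htdemucs_ft"): 90,
    ("RTX 3060", "htdemucs"): 90, ("RTX 3060", "htdemucs_ft"): 180,
    ("M1 Pro", "htdemucs"): 120, ("M1 Pro", "htdemucs_ft"): 240,
    ("i7 CPU", "htdemucs"): 8 * 60, ("i7 CPU", "htdemucs_ft"): 15 * 60,
    ("i5 CPU", "htdemucs"): 15 * 60, ("i5 CPU", "htdemucs_ft"): 25 * 60,
}

def estimate_seconds(hardware, model, track_minutes):
    """Scale the 4-minute baseline linearly to another track length."""
    return BASELINE_SECONDS[(hardware, model)] * track_minutes / 4

# A 100-song batch of 3.5-minute tracks on CPU
total = 100 * estimate_seconds("i7 CPU", "htdemucs", 3.5)
print(f"about {total / 3600:.1f} hours")
```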

### GPU VRAM Usage

VRAM requirements depend on audio length and model:

![VRAM Usage by Model and Audio Length - GPU memory requirements for different Demucs models](/images/blog/demucs-vram-usage.svg)

If you run out of VRAM, use the `--segment` flag to process in smaller chunks.

---

## Installation Guide

**Which installation method is right for you?**

![Installation Method Decision Tree - Choose the best setup for your needs](/images/blog/demucs-installation-flowchart.svg)

### Option 1: pip (Simplest)

For most users who just want to separate tracks:

```bash
# Create a virtual environment (recommended)
python3 -m venv demucs_env
source demucs_env/bin/activate  # Windows: demucs_env\Scripts\activate

# Install Demucs
pip install -U demucs

# Verify installation
demucs --help
```

You should see:

```
usage: demucs [-h] [-s SHIFTS] [--overlap OVERLAP] [-d DEVICE]
              [--two-stems STEM] [-n NAME] [-v] ...

positional arguments:
  tracks                Path to tracks

optional arguments:
  -h, --help            show this help message and exit
  ...
```
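
If the help text does not appear, a quick check from Python can show which piece is missing. This is a minimal sketch using only the standard library; the `check_environment` name is made up, and it only looks for the `demucs` package, the `demucs` command, and `ffmpeg` on your PATH.

```python
import importlib.util
import shutil

def check_environment():
    """Report whether the pieces Demucs needs are visible from this interpreter."""
    return {
        "demucs package": importlib.util.find_spec("demucs") is not None,
        "demucs command": shutil.which("demucs") is not None,
        "ffmpeg command": shutil.which("ffmpeg") is not None,
    }

for name, found in check_environment().items():
    print(f"{name}: {'found' if found else 'MISSING'}")
```

A missing package with a working virtual environment usually means the environment was not activated in the current shell.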

### Option 2: Conda (Recommended for GPU)

For GPU acceleration and ML development:

```bash
# Clone the repository
git clone https://github.com/facebookresearch/demucs
cd demucs

# Create environment (choose one)
conda env update -f environment-cuda.yml  # For NVIDIA GPU
conda env update -f environment-cpu.yml   # For CPU only

# Activate environment
conda activate demucs

# Install in development mode
pip install -e .

# Verify GPU is detected (PyTorch: https://pytorch.org/)
python -c "import torch; print(f'CUDA available: {torch.cuda.is_available()}')"
```

Expected output with GPU:

```
CUDA available: True
```

### Option 3: Docker (Cleanest Isolation)

For reproducible environments:

```dockerfile
# Dockerfile
FROM pytorch/pytorch:2.1.0-cuda12.1-cudnn8-runtime

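# FFmpeg is needed to decode MP3 and other compressed inputs
RUN apt-get update && apt-get install -y --no-install-recommends ffmpeg \
    && rm -rf /var/lib/apt/lists/*

RUN pip install -U demucs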

WORKDIR /audio
ENTRYPOINT ["demucs"]
```

Build and run:

```bash
docker build -t demucs .
docker run --gpus all -v "$(pwd)":/audio demucs song.mp3
```

### Platform-Specific Notes

#### macOS (Intel)

```bash
# Install FFmpeg (required)
# Download from: https://ffmpeg.org/download.html
brew install ffmpeg

# Install SoundTouch (optional, for data augmentation)
brew install sound-touch

pip install -U demucs
```

#### macOS (Apple Silicon M1/M2/M3)

```bash
# FFmpeg (required for audio processing)
# Official site: https://ffmpeg.org/
brew install ffmpeg

# Install with MPS support (Metal Performance Shaders)
pip install -U demucs

# Verify MPS is available
python -c "import torch; print(f'MPS available: {torch.backends.mps.is_available()}')"
```

Use the `--device mps` flag (or `-d mps`) when running Demucs.

#### Windows

```bash
# Using Anaconda Prompt:
conda install -c conda-forge ffmpeg
pip install -U demucs soundfile

# Prevent CUDA memory issues
set PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128
```

#### Linux (Ubuntu/Debian)

```bash
# System dependencies
# FFmpeg: https://ffmpeg.org/
sudo apt-get update
sudo apt-get install ffmpeg libsndfile1

# Install Demucs
pip install -U demucs

# Optional: Prevent CUDA memory caching issues
export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128
```

---

## Basic Usage

### Separating a Track

The simplest command:

```bash
demucs song.mp3
```

Output structure:

![Demucs output folder structure showing separated stems](/images/blog/demucs-output-structure.svg)
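
By default Demucs writes each stem to `separated/<model>/<track name>/`. The sketch below builds the expected paths for the default four-stem model so a script can find the results; the `expected_stems` helper is hypothetical and simply mirrors that folder layout.

```python
from pathlib import PurePosixPath

def expected_stems(track, model="htdemucs", out_dir="separated",
                   stems=("drums", "bass", "other", "vocals"), ext="wav"):
    """Paths where Demucs is expected to write the stems for one input track."""
    track_name = PurePosixPath(track).stem  # "song.mp3" -> "song"
    base = PurePosixPath(out_dir) / model / track_name
    return [str(base / f"{stem}.{ext}") for stem in stems]

for path in expected_stems("song.mp3"):
    print(path)
```

With `-o ./my_stems` or `-n htdemucs_ft`, pass the matching `out_dir` and `model` values.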

### Common Use Cases

**Extract just vocals (karaoke creation):**

```bash
demucs --two-stems vocals song.mp3
```

Output: `vocals.wav` and `no_vocals.wav` (instrumental)

**Extract just instrumental:**

```bash
demucs --two-stems vocals song.mp3
# Then use the no_vocals.wav file
```

**Process multiple files:**

```bash
demucs song1.mp3 song2.mp3 song3.mp3
```

**Output as MP3 instead of WAV:**

```bash
demucs --mp3 --mp3-bitrate 320 song.mp3
```

**Use highest quality model:**

```bash
demucs -n htdemucs_ft song.mp3
```

**Specify output directory:**

```bash
demucs -o ./my_stems song.mp3
```

---

## Advanced Command-Line Options

The most commonly used flags, grouped by purpose:

![Demucs Command-Line Quick Reference - All flags and options explained](/images/blog/demucs-cli-cheatsheet.svg)

### Model Selection

```bash
# Use specific model
demucs -n htdemucs_ft song.mp3     # Best quality
demucs -n htdemucs_6s song.mp3     # 6-stem output
demucs -n mdx_q song.mp3           # Fastest/smallest
```

### Device Control

```bash
# Force CPU processing
demucs -d cpu song.mp3

# Use specific GPU
demucs -d cuda:0 song.mp3          # First GPU
demucs -d cuda:1 song.mp3          # Second GPU

# Apple Silicon
demucs -d mps song.mp3
```
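
If a script has to run on different machines, the device can be chosen at runtime. This is a minimal sketch, assuming PyTorch may or may not be installed; `pick_device` is a hypothetical helper that falls back to the CPU when nothing better is detected.

```python
def pick_device():
    """Return "cuda", "mps", or "cpu" depending on what PyTorch can see."""
    try:
        import torch
    except ImportError:
        return "cpu"  # no PyTorch in this environment
    if torch.cuda.is_available():
        return "cuda"
    mps = getattr(torch.backends, "mps", None)  # absent on older PyTorch builds
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"

print(pick_device())
```

The returned string can be passed straight to the `-d` flag.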

### Quality vs Memory Tradeoffs

```bash
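# Segment length (seconds) - lower = less VRAM, potentially worse quality
# Transformer models (htdemucs*) reject segments longer than their training length (about 7.8 s)
demucs --segment 5 song.mp3                 # For very low VRAM
demucs -n mdx_extra --segment 40 song.mp3   # Longer segments are only accepted by non-Transformer models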

# Overlap between segments (0-0.99)
demucs --overlap 0.25 song.mp3     # Default

# Shifts - increases quality by ~0.2 SDR, but slower
demucs --shifts 2 song.mp3         # Process twice with time shifts
demucs --shifts 5 song.mp3         # More shifts = better quality, slower
```
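
To picture what `--segment` and `--overlap` do together, the sketch below lists the windows a track would be cut into: each window starts `segment * (1 - overlap)` seconds after the previous one. It is a simplified illustration with a hypothetical `segment_windows` helper and round numbers chosen for readability; the real implementation works in samples and cross-fades the overlapping regions.

```python
def segment_windows(audio_seconds, segment, overlap=0.25):
    """Return (start, end) times of the overlapping windows covering a track."""
    if segment <= 0:
        raise ValueError("segment must be positive")
    if not 0 <= overlap < 1:
        raise ValueError("overlap must be in [0, 1)")
    stride = segment * (1 - overlap)  # distance between window starts
    windows = []
    start = 0.0
    while start < audio_seconds:
        windows.append((start, min(start + segment, audio_seconds)))
        start += stride
    return windows

for start, end in segment_windows(30, segment=10, overlap=0.25):
    print(f"{start:5.1f}s -> {end:5.1f}s")
```

Shorter segments mean more windows and less memory per window, while higher overlap means more windows for the same track and therefore more compute.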

### Output Format

```bash
# WAV options
demucs --int24 song.mp3            # 24-bit WAV output
demucs --float32 song.mp3          # 32-bit float WAV

# MP3 options
demucs --mp3 song.mp3              # Default bitrate
demucs --mp3 --mp3-bitrate 320 song.mp3  # High quality
demucs --mp3 --mp3-preset 2 song.mp3     # Best quality preset
demucs --mp3 --mp3-preset 7 song.mp3     # Fastest encoding

# Clipping prevention
demucs --clip-mode rescale song.mp3      # Rescale to prevent clipping (default)
demucs --clip-mode clamp song.mp3        # Hard limit
demucs --clip-mode none song.mp3         # No protection
```

### Parallel Processing

```bash
# Number of parallel jobs (increases memory usage)
demucs -j 4 song.mp3               # Use 4 cores
demucs -j 8 song1.mp3 song2.mp3    # Process multiple files in parallel
```

### Complete Example

Maximum quality, GPU accelerated:

```bash
demucs \
  -n htdemucs_ft \
  -d cuda:0 \
  --shifts 2 \
  --overlap 0.25 \
  --float32 \
  --clip-mode rescale \
  -o ./output \
  song.mp3
```
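
The same invocation can be assembled from Python when you want to script it. This sketch only builds and prints the argument list; `build_demucs_command` is a hypothetical helper, and the flags are the ones from the example above.

```python
import shlex

def build_demucs_command(track, model="htdemucs_ft", device="cuda:0",
                         shifts=2, overlap=0.25, out_dir="./output"):
    """Build the argument list for the maximum-quality example above."""
    return [
        "demucs",
        "-n", model,
        "-d", device,
        "--shifts", str(shifts),
        "--overlap", str(overlap),
        "--float32",
        "--clip-mode", "rescale",
        "-o", out_dir,
        track,
    ]

command = build_demucs_command("song.mp3")
print(shlex.join(command))
# To run it: import subprocess; subprocess.run(command, check=True)
```

Passing a list to `subprocess.run` avoids shell quoting problems with file names that contain spaces.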

---

## Python API Integration

For integrating Demucs into your applications:

### Basic Programmatic Usage

```python
import demucs.separate

# Using argument list (like CLI)
demucs.separate.main([
    "--mp3",
    "--two-stems", "vocals",
    "-n", "htdemucs",
    "song.mp3"
])
```

### Using the Separator Class

The `demucs.api` module may not be included in every PyPI release. If `from demucs.api import Separator` fails, install Demucs from the GitHub repository instead (see Option 2 above).

```python
from demucs.api import Separator
import torch

# Initialize separator
separator = Separator(
    model="htdemucs_ft",
    segment=None,  # keep the model's own segment length; Transformer models reject longer values
    shifts=2,
    device="cuda" if torch.cuda.is_available() else "cpu",
    progress=True
)

# Load and separate
origin, separated = separator.separate_audio_file("song.mp3")

# `separated` is a dict with tensor values:
# separated["vocals"] -> torch.Tensor
# separated["drums"] -> torch.Tensor
# separated["bass"] -> torch.Tensor
# separated["other"] -> torch.Tensor

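# Save individual stems
from pathlib import Path
from demucs.api import save_audio

Path("output").mkdir(exist_ok=True)  # make sure the output folder exists before saving
for stem_name, stem_tensor in separated.items():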
    save_audio(
        stem_tensor,
        f"output/{stem_name}.wav",
        samplerate=separator.samplerate,
        clip="rescale"
    )
```

### Direct Model Access

For ML practitioners who want more control (requires [PyTorch](https://pytorch.org/)):

```python
from demucs import pretrained
from demucs.apply import apply_model
import torch
import torchaudio

# Load model
model = pretrained.get_model("htdemucs_ft")
model.eval()

# Move to GPU if available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

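# Load audio
waveform, sample_rate = torchaudio.load("song.mp3")

# Resample if the file does not match the model's sample rate
if sample_rate != model.samplerate:
    waveform = torchaudio.functional.resample(waveform, sample_rate, model.samplerate)

# Ensure stereo
if waveform.shape[0] == 1:
    waveform = waveform.repeat(2, 1)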

# Add batch dimension and move to device
mix = waveform.unsqueeze(0).to(device)

# Apply model
with torch.no_grad():
    sources = apply_model(
        model,
        mix,
        shifts=2,
        split=True,
        overlap=0.25,
        progress=True,
        device=device
    )

# sources shape: (batch, num_sources, channels, samples)
# sources[0, 0] = drums
# sources[0, 1] = bass
# sources[0, 2] = other
# sources[0, 3] = vocals

# Get source names
source_names = model.sources  # ['drums', 'bass', 'other', 'vocals']
```
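
Once the stems are labelled, an instrumental is just the sum of everything except the vocals. The sketch below shows the idea with plain Python lists standing in for audio samples, so it runs without PyTorch; `mix_stems` is a hypothetical helper. With real tensors you would add the corresponding `sources[0, i]` slices instead.

```python
def mix_stems(stems, exclude=()):
    """Sum every stem not listed in `exclude`, sample by sample."""
    kept = [samples for name, samples in stems.items() if name not in exclude]
    if not kept:
        raise ValueError("no stems left to mix")
    return [sum(values) for values in zip(*kept)]

# Toy two-sample "stems", keyed by the names in model.sources
stems = {
    "drums": [1.0, 2.0],
    "bass": [0.5, 0.5],
    "other": [0.0, 1.0],
    "vocals": [9.0, 9.0],
}
instrumental = mix_stems(stems, exclude={"vocals"})
print(instrumental)
```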

### Callback for Progress Tracking

```python
from demucs.api import Separator

def progress_callback(info):
    """Called during separation with progress info."""
    state = info.get("state", "")
    if state == "start":
        print(f"Processing segment at offset {info['segment_offset']}")
    elif state == "end":
        progress = info['segment_offset'] / info['audio_length'] * 100
        print(f"Progress: {progress:.1f}%")

separator = Separator(
    model="htdemucs",
    callback=progress_callback
)

origin, separated = separator.separate_audio_file("song.mp3")
```

---

## Training Custom Models

For researchers and advanced users who want to train on custom data.

### Prerequisites

1. Clone the full repository:

```bash
git clone https://github.com/facebookresearch/demucs
cd demucs
conda env update -f environment-cuda.yml
conda activate demucs
pip install -e .
```

2. Install [Dora](https://github.com/facebookresearch/dora) (Meta's experiment manager):

```bash
pip install dora-search
```

### Dataset Preparation

Demucs is typically trained on [MUSDB18-HQ](https://zenodo.org/record/3338373), which contains:
- 150 full-length songs (100 train, 50 test)
- Separate stems for each song
- 44.1kHz stereo WAV files

Download and set path:

```bash
# Download MUSDB18-HQ from Zenodo
# Set environment variable

```

### Training a Model

Basic training command:

```bash
# Train using Dora
dora run -d solver=htdemucs dset=musdb_hq

# With specific configuration
dora run -d solver=htdemucs dset=musdb_hq \
    model.depth=6 \
    model.channels=48 \
    optim.lr=3e-4 \
    optim.epochs=360
```

Training parameters explained:

| Parameter | Description | Default |
|-----------|-------------|---------|
| `model.depth` | Encoder/decoder depth | 6 |
| `model.channels` | Base channel count | 48 |
| `model.growth` | Channel growth factor | 2 |
| `optim.lr` | Learning rate | 3e-4 |
| `optim.epochs` | Training epochs | 360 |
| `optim.batch_size` | Batch size | 4 |
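
The depth, channel, and growth settings interact: each layer multiplies the channel count of the previous one by the growth factor. The sketch below works that out for the defaults listed in the table; `channels_per_layer` is a hypothetical helper for illustration, not a Demucs function.

```python
def channels_per_layer(depth=6, channels=48, growth=2):
    """Channel count of each encoder layer, starting from the base count."""
    return [channels * growth ** layer for layer in range(depth)]

print(channels_per_layer())
print(channels_per_layer(depth=4))
```

Because the width grows geometrically, adding depth increases the size of the innermost layers much faster than raising the base channel count does.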

### Fine-Tuning an Existing Model

To fine-tune on custom data:

1. Prepare your dataset in MUSDB format
2. Run fine-tuning:

```bash
dora run -d -f 81de367c continue_from=81de367c dset=your_dataset variant=finetune
```

### Evaluation

Evaluate model on test set:

```bash
dora run -d solver=htdemucs dset=musdb_hq evaluate=true
```

Output includes SDR (Signal-to-Distortion Ratio) per source:

```
Source      | SDR (dB)
------------|--------
vocals      | 8.99
drums       | 8.72
bass        | 7.84
other       | 5.09
------------|--------
average     | 7.66
```

---

## Troubleshooting Common Issues

![Demucs Troubleshooting Guide - Quick solutions to common problems](/images/blog/demucs-troubleshooting.svg)

### CUDA Out of Memory

**Error:**
```
RuntimeError: CUDA out of memory. Tried to allocate X MiB
```

**Solutions:**

```bash
# Use smaller segments (Transformer models accept at most about 7.8 seconds)
demucs --segment 5 song.mp3

# Use CPU instead
demucs -d cpu song.mp3

# Use a lighter model
demucs -n mdx_q song.mp3

# Set PyTorch memory config (Windows)
set PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128

# Or on Linux/Mac
export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128
```

### Model Download Issues

**Error:**
```
HTTPError: 404 Client Error: Not Found
```

**Solutions:**

```bash
# Clear cache and retry
rm -rf ~/.cache/torch/hub/checkpoints/
demucs song.mp3

# Manual download
# Models are stored at: https://dl.fbaipublicfiles.com/demucs/
```

### FFmpeg Not Found

**Error:**
```
FileNotFoundError: [Errno 2] No such file or directory: 'ffmpeg'
```

**Solutions:**

FFmpeg is required for audio format conversion. Download from the [official FFmpeg website](https://ffmpeg.org/download.html) or install via:

```bash
# macOS
brew install ffmpeg

# Ubuntu/Debian
sudo apt-get install ffmpeg

# Windows (via conda)
conda install -c conda-forge ffmpeg
```

### Module Not Found

**Error:**
```
ModuleNotFoundError: No module named 'demucs'
```

**Solutions:**

```bash
# Ensure virtual environment is activated
source demucs_env/bin/activate  # or conda activate demucs

# Reinstall
pip install -U demucs
```

### Poor Separation Quality

**Symptoms:** Artifacts, bleeding between stems, muddy output

**Solutions:**

1. Use higher quality source files:
   - Lossless (WAV, FLAC) > High-bitrate MP3 (320kbps) > Low-bitrate MP3
   
2. Use better model:
```bash
demucs -n htdemucs_ft song.mp3
```

3. Increase shifts (at cost of speed):
```bash
demucs --shifts 5 song.mp3
```

4. Check source isn't already heavily processed (heavy limiting/compression hurts separation)

---

**Don't want to troubleshoot Python environments and GPU drivers?** Our [online vocal remover](/vocal-remover) runs optimized Demucs in the cloud — preview 30 seconds free, no setup required.

---

## When DIY Makes Sense

Let's be honest about when running Demucs locally makes sense:

![DIY vs Cloud Service Comparison - Make the right choice for your workflow](/images/blog/demucs-diy-vs-cloud.svg)

| Scenario | DIY Demucs | Cloud Service (StemSplit) |
|----------|------------|---------------------------|
| **Processing volume** | High volume (100+ songs) | Occasional use |
| **Hardware** | You have a good GPU | CPU only or no GPU |
| **Technical skill** | Comfortable with Python/CLI | Prefer GUI |
| **Privacy requirements** | Need to keep audio local | Cloud is acceptable |
| **Budget** | Have time, not money | Have money, not time |
| **Customization** | Need to fine-tune models | Standard separation is fine |
| **Preview before paying** | Not available | Free 30-sec preview |

### Cost Comparison

**DIY Demucs:**
- Hardware: $0 (existing) to $800+ (GPU upgrade)
- Electricity: ~$0.01-0.05 per song
- Time: Setup (1-4 hours) + processing time
- Maintenance: Updates, troubleshooting

**StemSplit:**
- No setup
- Pay per use (credits never expire)
- Free preview before committing
- Always using latest models

### The Real Talk

If you:
- Process stems professionally and regularly
- Have ML experience and want to customize
- Need to process thousands of files
- Have privacy requirements for unreleased music

→ **Set up Demucs locally.**

If you:
- Need stems occasionally
- Don't want to manage Python environments
- Want to preview quality before committing
- Value convenience over cost optimization

→ **Use a service like StemSplit.**

---

## FAQ

### Is Demucs free?

Yes. Demucs is open source under the MIT license, free for personal and commercial use. The models are also freely available.

### Can I use Demucs commercially?

Yes. The MIT license permits commercial use without restrictions. You can use separated stems in commercial releases, build products on top of Demucs, etc.

### What's the difference between Demucs and Spleeter?

| Aspect | Demucs | Spleeter |
|--------|--------|----------|
| Developer | Meta AI | Deezer |
| Architecture | Hybrid Transformer | Simple U-Net |
| Quality (SDR) | ~9.2 dB | ~5.9 dB |
| Processing | Waveform + Spectrogram | Spectrogram only |
| Speed | Slower | Faster |
| Released | 2019 (v1), 2023 (v4) | 2019 |

Demucs produces significantly higher quality but requires more compute.

### Do I need a GPU?

No, but it helps significantly. CPU processing works but is 5-10x slower. A modern NVIDIA GPU with 4GB+ VRAM is recommended for reasonable processing times.

### How long does processing take?

Depends on hardware and model:
- GPU (RTX 3080): ~45 seconds for a 4-minute song
- CPU (modern i7): ~8-15 minutes for a 4-minute song

### What audio formats does Demucs support?

Input: MP3, WAV, FLAC, OGG, M4A, and anything FFmpeg can decode.
Output: WAV (default), MP3 (with the `--mp3` flag).

### Why do my stems have artifacts?

Common causes:
1. Low-quality source file (use 320kbps+ or lossless)
2. Heavily compressed/limited master
3. Using lighter model (try htdemucs_ft)
4. Complex, dense arrangement with overlapping frequencies

### Can Demucs separate more than 4 stems?

Yes. Use `htdemucs_6s` for 6-stem separation:
- Vocals
- Drums
- Bass
- Guitar
- Piano
- Other

### How do I update Demucs?

```bash
pip install -U demucs
```

### Where are models downloaded to?

Models are cached in:
- Linux/Mac: `~/.cache/torch/hub/checkpoints/`
- Windows: `C:\Users\<username>\.cache\torch\hub\checkpoints\`
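
To see which models are already downloaded and how much space they take, you can list that folder from Python. This is a minimal sketch assuming the default cache location shown above; `cached_checkpoints` is a hypothetical helper and returns an empty list if nothing has been downloaded yet.

```python
from pathlib import Path

def cached_checkpoints(cache_dir=None):
    """Return (file name, size in bytes) for each file in the checkpoint cache."""
    if cache_dir is None:
        cache_dir = Path.home() / ".cache" / "torch" / "hub" / "checkpoints"
    cache_dir = Path(cache_dir)
    if not cache_dir.is_dir():
        return []  # nothing downloaded yet
    return sorted((p.name, p.stat().st_size) for p in cache_dir.iterdir() if p.is_file())

for name, size in cached_checkpoints():
    print(f"{size / 1e6:8.1f} MB  {name}")
```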

---

## Conclusion

Demucs represents the cutting edge of AI-powered music source separation. Whether you're a producer isolating samples, a researcher pushing the boundaries of audio ML, or just someone who wants to create a karaoke track, understanding how this technology works gives you more control over your results.

For most users, the easiest path is using a service that handles the infrastructure. For power users and ML practitioners, running Demucs locally offers maximum control and customization.

---

## Ready to Try Stem Separation?

You've seen how the technology works. Now experience it.

**Option 1: Run it yourself** — Follow this guide to set up Demucs locally.

**Option 2: Skip the setup** — [StemSplit](/stem-splitter) runs Demucs htdemucs_ft in the cloud. Upload your song, preview 30 seconds free, and download studio-quality stems. No Python required.

[Try StemSplit Free →](/vocal-remover)

---

## Further Reading

- [Demucs GitHub Repository](https://github.com/facebookresearch/demucs)
- [Hybrid Transformers for Music Source Separation (Paper)](https://arxiv.org/abs/2211.08553)
- [MUSDB18 Benchmark Dataset](https://sigsep.github.io/datasets/musdb.html)
- [Music Demixing Challenge (MDX)](https://www.aicrowd.com/challenges/music-demixing-challenge-ismir-2021)
