What AI model does StemSplit use?

StemSplit uses HTDemucs, a hybrid transformer model developed by Meta Research and presented at ICASSP 2023. It is among the strongest openly available models for music source separation. We export the model weights to ONNX format and run inference server-side.

Is StemSplit open source?

The underlying HTDemucs model is open source (MIT license, developed by Meta Research). The StemSplit web platform and API are not open source, but the ONNX model weights we export are published publicly on HuggingFace. The Python package (demucs-onnx) and CLI tool are open source on GitHub.

Who is behind StemSplit?

StemSplit is an indie product built and maintained by a solo developer. There's no company behind it in the traditional sense: no investors, no team. It's a bootstrapped tool built to solve a real problem and run sustainably.

How accurate is the separation?

Accuracy depends on the track and stem type. Vocals and drums are generally the cleanest; bass and 'other' are harder. On our published benchmark, HTDemucs achieves SDR scores of 8–10 dB for vocals on typical tracks, which places it among the best publicly available models for this task. You can review the full benchmark dataset on HuggingFace.

What audio formats are supported?

StemSplit accepts MP3, WAV, FLAC, AAC, OGG, and M4A input files. Stems are delivered as WAV files (PCM 16-bit, 44.1 kHz) for maximum compatibility with DAWs and audio editors.
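As a rough illustration, a downloaded stem can be checked against the format stated above with Python's standard `wave` module. The helper names `describe_wav` and `matches_stated_format` are hypothetical and not part of any StemSplit package.

```python
import wave


def describe_wav(path):
    """Return the basic PCM parameters of a WAV file."""
    with wave.open(path, "rb") as wav:
        return {
            "channels": wav.getnchannels(),
            "bit_depth": wav.getsampwidth() * 8,  # sample width is reported in bytes
            "sample_rate_hz": wav.getframerate(),
            "duration_s": wav.getnframes() / wav.getframerate(),
        }


def matches_stated_format(info):
    """True if the file is 16-bit PCM at 44.1 kHz, as described above."""
    return info["bit_depth"] == 16 and info["sample_rate_hz"] == 44100
```

Calling `describe_wav` on a stem file and passing the result to `matches_stated_format` is a quick sanity check before importing the file into a DAW.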

Is there an API?

Yes. The full REST API is available to all users, including free-tier users. There are also integrations for Python, CLI, n8n, Zapier, and an MCP server for AI agents. See the developer docs for details.

Do credits expire?

Never. Credits you purchase are yours indefinitely. This is a deliberate product decision: we don't want to pressure you into using the tool on a schedule.

About Us

About StemSplit

Making professional audio separation accessible to everyone.

Our Mission

We believe everyone should have access to professional audio tools. Whether you're a hobbyist producer, a karaoke enthusiast, or a content creator, StemSplit gives you studio-quality results without the studio budget. We're building in public, and your suggestions directly shape our roadmap.

Who We Serve

Music producers and DJs
Karaoke enthusiasts
Content creators and YouTubers
Podcasters and audio editors
Music educators and students
Developers building audio apps

Under the Hood

Built on Serious Technology

StemSplit uses HTDemucs, a hybrid transformer model developed by Meta Research and presented at ICASSP 2023. It's the same model researchers use, made accessible through a clean web interface.

HTDemucs Model

HTDemucs is a hybrid transformer/convolutional model that works in both the waveform and spectrogram domains simultaneously. This dual-domain design, combined with transformer layers that attend across both domains, is what gives it its accuracy advantage over older models like Demucs v3 or Spleeter.

ONNX Export

We export the model to ONNX format — an open standard for machine learning interoperability. This means the model runs without PyTorch, making it lighter, faster, and portable. We've published the ONNX weights on HuggingFace for anyone to use.

Up to 6 Stems

Standard mode separates audio into vocals, drums, bass, and other. The 6-stem model additionally separates guitar and piano. Fine-tuned variants (htdemucs_ft) are available for single-stem extraction with higher accuracy.

No GPU Required

Processing happens entirely in the cloud. You upload a file, we run inference server-side, and you download the separated stems. No software to install, no GPU needed on your end.

The Story

Who Built This

StemSplit was built by a solo developer who kept running into the same frustration: existing vocal removal tools were either buried behind monthly subscriptions you'd forget to cancel, or they used credits that expired before you needed them again.

The goal was simple — build the tool that should already exist. One where you pay a fair price for what you actually use, your balance never expires, and the output quality is good enough that you don't need to try three different services.

It's an indie product. There's no VC money, no growth team, no sales calls. Just a tool that works, priced honestly, maintained by one person who uses it too. The roadmap is driven by users — if you have a suggestion, it genuinely gets read.

Built in public

Development happens openly. Packages are published on GitHub, PyPI, npm, and HuggingFace. The benchmark methodology is public. When things break, they get fixed and the fix is visible.

GitHub: StemSplit org
HuggingFace: models & datasets
PyPI: stemsplit-python
npm: n8n-nodes-stemsplit

Ecosystem

More Than a Website

StemSplit is also a developer platform. The same separation engine that powers the web app is available through multiple distribution channels.

REST API

Full programmatic access to stem separation. Submit jobs, poll status, download results. Used by developers building their own tools on top of StemSplit.
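The submit, poll, download flow described above could be sketched roughly as follows. This is not the StemSplit API: the status strings and the `fetch_status` callable are hypothetical stand-ins, and the actual endpoints and response fields are documented in the developer docs.

```python
import time


def wait_for_job(fetch_status, interval_s=2.0, timeout_s=600.0, sleep=time.sleep):
    """Poll a job until it finishes, fails, or the timeout elapses.

    fetch_status is a caller-supplied function that returns a status string.
    The values "completed" and "failed" are assumptions for illustration.
    """
    waited = 0.0
    while True:
        status = fetch_status()
        if status == "completed":
            return True
        if status == "failed":
            raise RuntimeError("separation job failed")
        if waited >= timeout_s:
            raise TimeoutError("job did not finish in time")
        sleep(interval_s)  # back off before asking again
        waited += interval_s
```

In a real client, `fetch_status` would wrap an HTTP request for the job's status, and the download step would run once `wait_for_job` returns. Passing the sleep function as a parameter keeps the loop easy to test without real delays.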

Python Package

stemsplit-python on PyPI. Run separation locally or call the API from Python scripts and data pipelines.

CLI Tool

stemsplit via Homebrew. Separate audio files from the terminal with a single command. Useful for batch processing and shell scripts.

n8n Node

n8n-nodes-stemsplit on npm. Drag-and-drop stem separation inside n8n automation workflows — no code required.

MCP Server

stemsplit-mcp on npm. Expose stem separation as a tool that AI agents (Claude, GPT, etc.) can call directly via the Model Context Protocol.

Zapier Integration

Connect StemSplit to thousands of apps via Zapier. Trigger separations from Google Drive uploads, new emails, form submissions, and more.

Research & Benchmarks

Open Methodology

We built and published a benchmark dataset on HuggingFace that evaluates multiple HTDemucs model variants across real music tracks. The evaluation uses Signal-to-Distortion Ratio (SDR) — the standard academic metric for source separation quality — alongside listening tests across different musical genres and stem types.

View the benchmark dataset on HuggingFace

Evaluation metric

What is SDR?

SDR (Signal-to-Distortion Ratio) measures how cleanly a model separates a target source from the mix, in decibels. Higher is better. It's the same metric used in the SiSEC Music Separation benchmark, the standard academic evaluation for this problem.
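In its simplest form, SDR compares the energy of the reference stem with the energy of the error between reference and estimate: SDR = 10 · log10(Σ s² / Σ (s − ŝ)²). The sketch below implements only this basic definition on plain Python lists. Benchmark toolkits typically use a more elaborate variant, so its numbers may not match published scores exactly; the function name `sdr_db` is illustrative.

```python
import math


def sdr_db(reference, estimate):
    """Basic signal-to-distortion ratio in decibels (higher is better)."""
    if len(reference) != len(estimate):
        raise ValueError("signals must have the same length")
    signal_energy = sum(s * s for s in reference)
    error_energy = sum((s - e) ** 2 for s, e in zip(reference, estimate))
    if error_energy == 0:
        return math.inf  # perfect reconstruction
    if signal_energy == 0:
        return -math.inf  # silent reference, non-silent estimate
    return 10.0 * math.log10(signal_energy / error_energy)
```

Under this definition, a score of about 9 dB means the error energy is roughly one-eighth of the reference energy, since 10^0.9 ≈ 7.9.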

Typical SDR scores (HTDemucs)

Vocals: ~9 dB
Drums: ~8.5 dB
Bass: ~8 dB
Other: ~6 dB

Source: StemSplit benchmark dataset on HuggingFace. Higher SDR = cleaner separation.

Pricing Philosophy

Why We Charge the Way We Do

Most audio tools use subscriptions because subscriptions are good for the business — they generate predictable revenue even from users who barely log in. That's not how we want to operate.

StemSplit uses a pay-as-you-go credit model. You buy minutes of processing time and they never expire. If you separate one song a month, you pay for one song. If you batch-process an album, you pay for that. There's no plan to upgrade to, no features gated behind a higher tier.

The free tier gives every new user 5 minutes to try the product for real — not a watermarked preview or a 30-second clip. Five full minutes of actual output.

How it works

Credits never expire — ever
No subscription required
5 free minutes on signup, no card needed
Same quality at every price point
Full API access included

View pricing

What We Believe

Simplicity First

Complex technology should be simple to use. Upload, process, download: that's it.

Transparent Pricing

No subscriptions, no expiring credits. Pay for what you use, when you use it.

Building in Public

We're building in public with our community. Your suggestions and feedback aren't just heard; they are our roadmap.

Ready to try it?

Find out why thousands of users choose StemSplit.

Start for Free
Contact Us