What AI model does StemSplit use?

StemSplit uses HTDemucs, a hybrid transformer model developed by Meta Research and presented at ICASSP 2023 (the preprint appeared in late 2022). It's among the strongest publicly available models for music source separation. We export the model weights to ONNX format and run inference server-side.

Is StemSplit open source?

The underlying HTDemucs model is open source (MIT license, developed by Meta Research). The StemSplit web platform and API are not open source, but the ONNX model weights we export are published publicly on HuggingFace. The Python package (demucs-onnx) and CLI tool are open source on GitHub.

StemSplit is an indie product built and maintained by a solo developer. There's no company behind it in the traditional sense — no investors, no team. It's a bootstrapped tool built to solve a real problem, run sustainably.

How accurate is the separation?

Accuracy depends on the track and stem type. Vocals and drums are generally the cleanest; bass and 'other' are harder. On our published benchmark, HTDemucs achieves SDR scores of 8–10 dB for vocals on typical tracks, which places it among the strongest publicly available models for this task. You can review the full benchmark dataset on HuggingFace.

What audio formats are supported?

StemSplit accepts MP3, WAV, FLAC, AAC, OGG, and M4A input files. Stems are delivered as WAV files (PCM 16-bit, 44.1 kHz) for maximum compatibility with DAWs and audio editors.
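
If you want to confirm that a downloaded stem matches the format described above before importing it into a DAW, a minimal Python sketch using only the standard library might look like this. The function names are illustrative and not part of any StemSplit package.

```python
import wave


def wav_format(path):
    """Return (channels, bit_depth, sample_rate) for a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        # getsampwidth() is in bytes per sample, so multiply by 8 for bits.
        return wav.getnchannels(), wav.getsampwidth() * 8, wav.getframerate()


def is_expected_stem_format(path, bit_depth=16, sample_rate=44100):
    """True when the file is 16-bit PCM at 44.1 kHz (the defaults here)."""
    _, depth, rate = wav_format(path)
    return depth == bit_depth and rate == sample_rate
```

Calling `is_expected_stem_format("vocals.wav")` on a stem (the file name is an assumption) returns a boolean you can use as a sanity check in a batch script.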

Is there an API?

Yes. The full REST API is available to all users, including free-tier users. There are also integrations for Python, CLI, n8n, Zapier, and an MCP server for AI agents. See the developer docs for details.

Do credits expire?

Never. Credits you purchase are yours indefinitely. This is a deliberate product decision: we don't want to pressure you into using the tool on a schedule.

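About Us

About StemSplit

Making professional audio separation accessible to everyone.

Our Mission

We believe everyone should have access to professional audio tools. Whether you're a home producer, a karaoke enthusiast, or a content creator, StemSplit delivers studio-quality results without a studio budget. We build in public, and your suggestions directly shape the roadmap.

Who It's For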

Music producers and DJs
Karaoke enthusiasts
Content creators and YouTubers
Podcasters and audio editors
Music educators and students
Developers building audio apps

Under the Hood

Built on Serious Technology

StemSplit uses HTDemucs, a hybrid transformer model developed by Meta Research and presented at ICASSP 2023. It's the same model researchers use, made accessible through a clean web interface.

HTDemucs Model

HTDemucs is a hybrid transformer/convolutional model that works in both the waveform and spectrogram domains simultaneously. Hybrid Demucs (v3) introduced this dual-domain design; HTDemucs adds transformer layers that attend across both domains. That combination gives it an accuracy advantage over spectrogram-only CNN models like Spleeter.

ONNX Export

We export the model to ONNX format — an open standard for machine learning interoperability. This means the model runs without PyTorch, making it lighter, faster, and portable. We've published the ONNX weights on HuggingFace for anyone to use.
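
To illustrate what running without PyTorch can look like, here is a hedged Python sketch that opens an ONNX file with the third-party `onnxruntime` package and lists its input and output tensors. The file name `htdemucs.onnx` in the usage note is an assumption, and the tensor names and shapes of the published weights are not documented here, which is why the sketch inspects them rather than hard-coding them.

```python
from typing import Dict, List, Sequence


def pick_providers(available: Sequence[str]) -> List[str]:
    """Prefer the CUDA provider when present; always fall back to CPU."""
    preferred = ["CUDAExecutionProvider", "CPUExecutionProvider"]
    chosen = [name for name in preferred if name in available]
    return chosen or ["CPUExecutionProvider"]


def describe_model(model_path: str) -> Dict[str, list]:
    """Open an ONNX file and report its input and output names and shapes."""
    # Imported lazily so the helper above works without onnxruntime installed.
    import onnxruntime as ort  # third-party: pip install onnxruntime

    session = ort.InferenceSession(
        model_path,
        providers=pick_providers(ort.get_available_providers()),
    )
    return {
        "inputs": [(t.name, t.shape) for t in session.get_inputs()],
        "outputs": [(t.name, t.shape) for t in session.get_outputs()],
    }
```

For example, `describe_model("htdemucs.onnx")` would show which tensors a local copy of the weights expects, which is a sensible first step before writing any inference code around them.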

Up to 6 Stems

Standard mode separates audio into vocals, drums, bass, and other. The 6-stem model additionally separates guitar and piano. Fine-tuned variants (htdemucs_ft) are available for single-stem extraction with higher accuracy.
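
One common use of the four standard stems is building an instrumental (karaoke) track by summing everything except the vocals. The sketch below shows one way to do that with Python's standard library, assuming the stems are 16-bit PCM WAV files of equal length. The file names in the usage note are assumptions, and this is an illustration rather than how StemSplit itself mixes audio.

```python
import array
import sys
import wave


def read_pcm16(path):
    """Read a 16-bit PCM WAV file and return (params, samples)."""
    with wave.open(path, "rb") as wav:
        params = wav.getparams()
        if params.sampwidth != 2:
            raise ValueError(f"{path}: expected 16-bit PCM")
        samples = array.array("h")
        samples.frombytes(wav.readframes(params.nframes))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    return params, samples


def mix_stems(stem_paths, out_path):
    """Sum stems sample by sample, clipping the result to the 16-bit range."""
    params, first = read_pcm16(stem_paths[0])
    total = list(first)
    for path in stem_paths[1:]:
        other, samples = read_pcm16(path)
        same_layout = (other.nchannels, other.framerate) == (params.nchannels, params.framerate)
        if not same_layout or len(samples) != len(total):
            raise ValueError(f"{path}: stem does not match the first stem")
        for i, value in enumerate(samples):
            total[i] += value
    mixed = array.array("h", (max(-32768, min(32767, v)) for v in total))
    if sys.byteorder == "big":
        mixed.byteswap()
    with wave.open(out_path, "wb") as out:
        out.setnchannels(params.nchannels)
        out.setsampwidth(2)
        out.setframerate(params.framerate)
        out.writeframes(mixed.tobytes())
```

A call such as `mix_stems(["drums.wav", "bass.wav", "other.wav"], "instrumental.wav")` would write the vocal-free mix. Hard clipping is the simplest choice; a real workflow might scale the sum instead to avoid distortion on loud passages.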

No GPU Required

Processing happens entirely in the cloud. You upload a file, we run inference server-side, and you download the separated stems. No software to install, no GPU needed on your end.

The Story

Who Built This

StemSplit was built by a solo developer who kept running into the same frustration: existing vocal removal tools were either buried behind monthly subscriptions you'd forget to cancel, or they used credits that expired before you needed them again.

The goal was simple — build the tool that should already exist. One where you pay a fair price for what you actually use, your balance never expires, and the output quality is good enough that you don't need to try three different services.

It's an indie product. There's no VC money, no growth team, no sales calls. Just a tool that works, priced honestly, maintained by one person who uses it too. The roadmap is driven by users — if you have a suggestion, it genuinely gets read.

Built in public

Development happens openly. Packages are published on GitHub, PyPI, npm, and HuggingFace. The benchmark methodology is public. When things break, they get fixed and the fix is visible.

GitHub: StemSplit org
HuggingFace: models & datasets
PyPI: stemsplit-python
npm: n8n-nodes-stemsplit

Ecosystem

More Than a Website

StemSplit is also a developer platform. The same separation engine that powers the web app is available through multiple distribution channels.

REST API

Full programmatic access to stem separation. Submit jobs, poll status, download results. Used by developers building their own tools on top of StemSplit.
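
The submit, poll, download flow described above can be sketched in Python as a small polling helper with exponential backoff. The actual endpoints and response fields are covered in the developer docs, not here, so the `status` key and its `completed` and `failed` values are hypothetical, and `fetch_status` stands in for whatever HTTP call your client makes.

```python
import time
from typing import Callable, Dict


def wait_for_job(
    fetch_status: Callable[[], Dict],
    *,
    interval: float = 2.0,
    max_interval: float = 30.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Dict:
    """Poll fetch_status() until the job completes, fails, or times out."""
    deadline = clock() + timeout
    delay = interval
    while True:
        job = fetch_status()
        state = job.get("status")  # hypothetical field name
        if state == "completed":   # hypothetical terminal state
            return job
        if state == "failed":      # hypothetical terminal state
            raise RuntimeError(f"job failed: {job.get('error', 'unknown error')}")
        if clock() + delay > deadline:
            raise TimeoutError("job did not finish before the timeout")
        sleep(delay)
        # Double the wait after each poll, capped at max_interval.
        delay = min(delay * 2, max_interval)
```

Passing the status call in as a function keeps the helper independent of any HTTP library and makes it easy to test with canned responses.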

Python Package

stemsplit-python on PyPI. Run separation locally or call the API from Python scripts and data pipelines.

CLI Tool

stemsplit via Homebrew. Separate audio files from the terminal with a single command. Useful for batch processing and shell scripts.

n8n Node

n8n-nodes-stemsplit on npm. Drag-and-drop stem separation inside n8n automation workflows — no code required.

MCP Server

stemsplit-mcp on npm. Expose stem separation as a tool that AI agents (Claude, GPT, etc.) can call directly via the Model Context Protocol.

Zapier Integration

Connect StemSplit to thousands of apps via Zapier. Trigger separations from Google Drive uploads, new emails, form submissions, and more.

Research & Benchmarks

Open Methodology

We built and published a benchmark dataset on HuggingFace that evaluates multiple HTDemucs model variants across real music tracks. The evaluation uses Signal-to-Distortion Ratio (SDR) — the standard academic metric for source separation quality — alongside listening tests across different musical genres and stem types.

View the benchmark dataset on HuggingFace

Evaluation metric

What is SDR?

SDR (Signal-to-Distortion Ratio) measures how cleanly a model separates a target source from the mix, in decibels. Higher is better. It's the same metric used in the SiSEC Music Separation benchmark, the standard academic evaluation for this problem.
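
In its simplest form, SDR compares the energy of the reference stem with the energy of the error between the reference and the separated estimate: SDR = 10 · log10(‖s‖² / ‖s − ŝ‖²). The Python sketch below implements that basic definition for illustration. Benchmark toolkits often use more elaborate variants, and this is not necessarily the exact computation behind the scores on this page.

```python
import math
from typing import Sequence


def sdr_db(reference: Sequence[float], estimate: Sequence[float]) -> float:
    """Signal-to-Distortion Ratio in dB for two equal-length sample sequences."""
    if len(reference) != len(estimate):
        raise ValueError("reference and estimate must have the same length")
    signal_energy = sum(r * r for r in reference)
    error_energy = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    if signal_energy == 0:
        raise ValueError("reference is silent, so SDR is undefined")
    if error_energy == 0:
        return math.inf  # a perfect estimate has no distortion
    return 10 * math.log10(signal_energy / error_energy)
```

With this definition, an estimate whose error carries one hundredth of the reference's energy scores 20 dB, and every 10 dB step corresponds to a tenfold reduction in error energy.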

Typical SDR scores (HTDemucs)

Vocals: ~9 dB
Drums: ~8.5 dB
Bass: ~8 dB
Other: ~6 dB

Source: StemSplit benchmark dataset on HuggingFace. Higher SDR = cleaner separation.

Pricing Philosophy

Why We Charge the Way We Do

Most audio tools use subscriptions because subscriptions are good for the business — they generate predictable revenue even from users who barely log in. That's not how we want to operate.

StemSplit uses a pay-as-you-go credit model. You buy minutes of processing time and they never expire. If you separate one song a month, you pay for one song. If you batch-process an album, you pay for that. There's no plan to upgrade to, no features gated behind a higher tier.

The free tier gives every new user 5 minutes to try the product for real — not a watermarked preview or a 30-second clip. Five full minutes of actual output.

How it works

Credits never expire — ever
No subscription required
5 free minutes on signup, no card needed
Same quality at every price point
Full API access included

View pricing

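Our Values

Simplicity First

Complex technology should be easy to use. Upload, process, download: that's all there is to it.

Transparent Pricing

No subscriptions, no expiring credits. Pay only for what you use, when you use it.

Built in Public

We build in public alongside our community. Your suggestions and feedback aren't just taken into account; they are our roadmap.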

Common Questions

Ready to try it now?

See why thousands of users choose StemSplit.

Get started for free
Contact us