stemsplit-mcp: Audio Separator, Vocal Remover & Stem Separator MCP Server for Claude Desktop, Cursor, Cline, Windsurf & Zed (2026)

TL;DR. stemsplit-mcp is the official Model Context Protocol server for StemSplit. Install it once with npx, point your AI assistant at it, and you can run vocal removal, karaoke generation, instrumental extraction, full 4-stem or 6-stem separation, and background noise removal (voice cleaning) directly from chat, on local audio files, YouTube URLs, or SoundCloud URLs. Works in Claude Desktop, Cursor, Cline, Windsurf, Zed, and any other MCP-compatible client. MIT-licensed, open source on GitHub, zero infrastructure to manage.

The shape of an AI workflow changes when stem separation stops being "an API I curl from a script" and becomes "a thing my chat can do." This post is the how and the why.

What just shipped

stemsplit-mcp v0.3.0 on npm. One command to use it:

npx -y stemsplit-mcp

It speaks the Model Context Protocol — the open standard Anthropic introduced in late 2024 that lets AI assistants talk to external tools through a uniform JSON-RPC interface. MCP is now supported by Claude Desktop, Cursor, Cline, Windsurf, Zed, OpenDevin, Goose, and a growing list of clients.

What it exposes:

Tool	What it does
`separate_stems`	Local file or direct audio URL → vocals, instrumental, 4-stem, or 6-stem split. Polls until done and writes to disk.
`separate_youtube`	YouTube URL → vocals + instrumental, fetched and processed server-side.
`separate_soundcloud`	Public SoundCloud URL → vocals + instrumental, fetched server-side (max 15 min).
`clean_voice`	Remove background noise, hum, hiss, and echo using DeepFilterNet — great for podcasts and voice recordings.
`get_job` / `list_jobs`	Job status and history.
`get_youtube_job` / `list_youtube_jobs`	Same, scoped to YouTube.
`get_soundcloud_job` / `list_soundcloud_jobs`	Same, scoped to SoundCloud.
`get_denoise_job` / `list_denoise_jobs`	Status and history for voice cleaner jobs.
`get_balance`	Remaining credit balance in seconds and minutes.
`download_stems`	Re-fetch fresh presigned URLs for any completed job.

Plus 5 resources (live balance, recent jobs, job detail, YouTube job detail, SoundCloud job detail) and 6 prompts (karaoke maker, vocal isolator, six-stem sampler pack, YouTube instrumental, SoundCloud instrumental, voice cleaner). Full reference in the GitHub README.

The headline use case: drop a path or a YouTube URL into chat, get back the stems. No HTTP plumbing, no polling code, no temporary files to clean up.

Why MCP is the right shape for audio

If you've integrated a stem separation API before — ours or anyone's — you know the dance:

POST /jobs            → returns job_id
GET  /jobs/:id        → poll every 5s until status=COMPLETED
GET  presigned_url    → download each stem to disk

Three endpoints, one polling loop, expiry on the URLs, retry logic for transient 5xx. Every team that integrates audio rewrites the same code. MCP collapses it.
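For reference, the loop that every integration ends up rewriting looks roughly like the sketch below. It is not the server's actual code: fetchStatus stands in for the GET /jobs/:id call, and the status names other than COMPLETED, the function names, and the 10-minute timeout are assumptions made for illustration.

```typescript
// Hypothetical job shape. Only COMPLETED appears in the flow above;
// the other status names are assumptions.
type JobStatus = "PENDING" | "PROCESSING" | "COMPLETED" | "FAILED";

interface Job {
  id: string;
  status: JobStatus;
}

// A job is terminal once there is no point in polling again.
function isTerminal(status: JobStatus): boolean {
  return status === "COMPLETED" || status === "FAILED";
}

const sleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Polls until the job reaches a terminal state or the deadline passes.
// fetchStatus is injected so the loop does not depend on any HTTP client.
async function pollUntilDone(
  jobId: string,
  fetchStatus: (id: string) => Promise<Job>,
  intervalMs = 5000,
  timeoutMs = 600000,
): Promise<Job> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const job = await fetchStatus(jobId);
    if (isTerminal(job.status)) return job;
    if (Date.now() + intervalMs > deadline) {
      throw new Error(`Job ${jobId} did not finish within ${timeoutMs} ms`);
    }
    await sleep(intervalMs);
  }
}
```

Even this version leaves out URL expiry and 5xx retries, which is the point: the boring parts add up quickly.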

The trick: audio files don't travel through the chat context. They're way too big. Instead, MCP exchanges references — file paths and URLs — while the actual bytes stream between your machine, the StemSplit API, and Cloudflare R2. The LLM sees:

"The vocals stem is at ~/Downloads/stemsplit/job_abc123/vocals.mp3."

and can chain that into the next tool (transcribe, normalize, upload to a DAW project, whatever). It never has to read the 30 MB MP3 itself.
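To make the "references, not bytes" idea concrete, here is a minimal sketch of a tool result that carries only file paths. The content array of text items follows the MCP tool-result shape; the stemResult helper and its input map are hypothetical, not taken from the stemsplit-mcp source.

```typescript
// A text-only MCP tool result: a content array of text items.
interface TextToolResult {
  content: { type: "text"; text: string }[];
}

// Hypothetical helper: describes where each stem was written instead of
// embedding any audio data in the response.
function stemResult(stems: Record<string, string>): TextToolResult {
  const lines = Object.keys(stems).map(
    (name) => `The ${name} stem is at ${stems[name]}.`,
  );
  return { content: [{ type: "text", text: lines.join("\n") }] };
}

const result = stemResult({
  vocals: "~/Downloads/stemsplit/job_abc123/vocals.mp3",
});
console.log(result.content[0].text);
```

The payload stays a few dozen bytes no matter how large the audio file is, which is what keeps the context window free for the rest of the conversation.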

This is also why local LLMs and small-context models work with this MCP server. The protocol is engineered for exactly this kind of "do the heavy lifting on your machine, hand back a reference" pattern.

Install in under 2 minutes

You need three things:

Node.js 20+ (node --version to check)
A StemSplit API key — generate one at stemsplit.io/app/settings/api
An MCP-compatible client — see the per-client configs below

You don't actually need to install the npm package: most clients launch it via npx -y stemsplit-mcp, and npx caches the package on first run. If you prefer a global install, run npm install -g stemsplit-mcp.

Claude Desktop

Edit ~/Library/Application Support/Claude/claude_desktop_config.json (macOS) or %APPDATA%\Claude\claude_desktop_config.json (Windows):

{
  "mcpServers": {
    "stemsplit": {
      "command": "npx",
      "args": ["-y", "stemsplit-mcp"],
      "env": {
        "STEMSPLIT_API_KEY": "sk_live_..."
      }
    }
  }
}

Restart Claude. The MCP indicator at the bottom of the window will show "stemsplit" with a green dot when it's ready.

Cursor

Add to ~/.cursor/mcp.json (or use Settings → MCP):

{
  "mcpServers": {
    "stemsplit": {
      "command": "npx",
      "args": ["-y", "stemsplit-mcp"],
      "env": { "STEMSPLIT_API_KEY": "sk_live_..." }
    }
  }
}

Cline (VS Code), Windsurf, Zed

Same shape — command, args, env. Full per-client snippets in the docs guide.

What it actually looks like

Once configured, the workflow is just talking. A few examples we use ourselves:

Remove vocals from a local file

"Make a karaoke version of /Users/me/Music/song.mp3."

The LLM picks separate_stems with outputType=BOTH, the server uploads the file, polls until done, downloads vocals + instrumental to ~/Downloads/stemsplit/<job-id>/, and tells the assistant the local paths. The assistant points you at the instrumental.

YouTube to acapella in one prompt

"Get me clean vocals from https://youtu.be/dQw4w9WgXcQ."

Picks separate_youtube. StemSplit fetches the video server-side, runs the separation, and returns the vocals stem path. No yt-dlp, no rate-limit dance.

Six-stem split for sampling

"Split ~/Music/funk-bass.wav into all 6 stems at the best quality."

Picks separate_stems with outputType=SIX_STEMS and quality=BEST. You get drums, bass, vocals, other, piano, and guitar, each as a separate file.

Pre-flight check

"How many minutes do I have left?"

Picks get_balance. Useful before kicking off a long job.

What's inside the box

A few things we did that matter once you start relying on it:

Automatic retry with exponential backoff and jitter

A single 502 during a 10-minute polling loop used to fail the entire job. Now:

GET requests retry up to 4 times on network errors, 5xx, and 429 (honoring Retry-After).
POST to job-creation endpoints is conservative — only network errors that prove the server never saw the request trigger a retry, so we never accidentally double-charge.
R2 uploads retry up to 3 times; the file is re-opened as a fresh stream each attempt (web ReadableStreams can't be replayed).
R2 downloads retry on 5xx but never on 403 — that means the presigned URL has expired and the right move is to re-fetch a fresh one via get_job.

Every retry logs to stderr so you can see it working when something goes wrong.
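The GET policy above can be sketched as follows. This is an illustration of exponential backoff with full jitter, not the code in the repo: the 500 ms base, the 8 s cap, the function names, and the convention of using status 0 for a network error are all assumptions.

```typescript
// GET responses worth retrying: 429 and any 5xx.
function isRetryableStatus(status: number): boolean {
  return status === 429 || (status >= 500 && status <= 599);
}

// Full-jitter backoff: a random delay in [0, min(cap, base * 2^attempt)).
// A Retry-After value (in seconds) takes precedence when present.
// baseMs and capMs are assumed values, not the server's real tuning.
function backoffDelayMs(
  attempt: number,
  retryAfterSeconds?: number,
  random: () => number = Math.random,
  baseMs = 500,
  capMs = 8000,
): number {
  if (retryAfterSeconds !== undefined) return retryAfterSeconds * 1000;
  const ceiling = Math.min(capMs, baseMs * Math.pow(2, attempt));
  return Math.floor(random() * ceiling);
}

interface Attempt<T> {
  status: number; // 0 means "network error" in this sketch
  retryAfterSeconds?: number;
  body?: T;
}

// Retries a GET-like operation up to maxRetries times.
async function getWithRetry<T>(
  attemptOnce: () => Promise<Attempt<T>>,
  maxRetries = 4,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    // A thrown error is treated as a network failure.
    const res = await attemptOnce().catch((): Attempt<T> => ({ status: 0 }));
    if (res.status >= 200 && res.status < 300) return res.body as T;
    const retryable = res.status === 0 || isRetryableStatus(res.status);
    if (!retryable || attempt >= maxRetries) {
      throw new Error(`GET failed (status ${res.status}) after ${attempt + 1} attempt(s)`);
    }
    const delay = backoffDelayMs(attempt, res.retryAfterSeconds);
    console.error(`retry ${attempt + 1}/${maxRetries} in ${delay} ms`); // stderr
    await new Promise<void>((resolve) => setTimeout(resolve, delay));
  }
}
```

Note that isRetryableStatus(403) is false, which matches the download rule: an expired presigned URL is never fixed by trying the same URL again. The conservative POST rule would need a stricter predicate than this one, since a 5xx on job creation does not prove the server never saw the request.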

Absolute path validation

Relative paths like song.mp3 used to silently resolve against the MCP server's working directory — which for Claude Desktop and Cursor is usually a system root the LLM has no way to know about. We now reject relative paths up-front with a message that tells the LLM exactly what to do: "Pass an absolute path like /Users/you/Music/song.mp3 or a tilde path like ~/Music/song.mp3. If you don't know it, ask the user."
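A check along these lines can be sketched with Node's built-in path and os modules. The function name resolveInputPath and the exact wording of the thrown error are assumptions; only the behavior (expand a leading tilde, reject anything relative) comes from the description above.

```typescript
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical helper: expands a leading "~" and rejects relative paths
// instead of resolving them against the server's working directory.
function resolveInputPath(
  input: string,
  homeDir: string = os.homedir(),
): string {
  if (input === "~" || input.startsWith("~/")) {
    return path.join(homeDir, input.slice(1));
  }
  if (!path.isAbsolute(input)) {
    throw new Error(
      `Relative path "${input}" rejected. ` +
        "Pass an absolute path like /Users/you/Music/song.mp3 " +
        "or a tilde path like ~/Music/song.mp3. " +
        "If you don't know it, ask the user.",
    );
  }
  return path.normalize(input);
}
```

Failing fast here is the design choice that matters: the error text is written for the LLM, so its next move is to ask the user for a full path rather than guess a directory.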

Structured error responses

Every error includes a machine-readable code (INSUFFICIENT_CREDITS, RATE_LIMIT_EXCEEDED, FILE_TOO_LARGE, UNSUPPORTED_FORMAT, etc.) plus a human-readable message and hints for common cases. The LLM doesn't have to parse English to figure out what went wrong.
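As a rough illustration, an error of this kind could be modeled like the sketch below. The four codes are the ones named above; the field names, the JSON-in-text encoding, and both helper functions are assumptions rather than the server's actual schema. The isError flag and the content array are part of the MCP tool-result shape.

```typescript
// The four codes named above. The real list is longer ("etc."),
// so this union is deliberately incomplete.
type ErrorCode =
  | "INSUFFICIENT_CREDITS"
  | "RATE_LIMIT_EXCEEDED"
  | "FILE_TOO_LARGE"
  | "UNSUPPORTED_FORMAT";

// Field names are assumptions, not the server's actual schema.
interface StructuredError {
  code: ErrorCode;
  message: string;
  hint?: string;
}

// An MCP tool result flagged as an error.
interface ToolErrorResult {
  isError: true;
  content: { type: "text"; text: string }[];
}

// Hypothetical helper: serializes the error so the code survives as data.
function toToolError(err: StructuredError): ToolErrorResult {
  return {
    isError: true,
    content: [{ type: "text", text: JSON.stringify(err) }],
  };
}

// A consumer can branch on the code without parsing the message.
function isRecoverableByWaiting(code: ErrorCode): boolean {
  return code === "RATE_LIMIT_EXCEEDED";
}
```

With a closed set of codes, the TypeScript compiler can also flag any branch that forgets to handle a newly added code.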

Progress notifications

Each poll is forwarded to the client as an MCP progress notification. Long YouTube jobs show "10% → 35% → 70% → 100%" in Claude Desktop's progress UI instead of looking frozen.
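For readers who have not seen one, an MCP progress notification is a small JSON-RPC message. The sketch below builds one from a poll result; the notifications/progress method and its progressToken, progress, and total fields come from the MCP specification, while the toProgress helper and the idea that a poll response exposes a percentage are assumptions.

```typescript
// Shape of an MCP progress notification (method and params per the MCP spec).
interface ProgressNotification {
  method: "notifications/progress";
  params: {
    progressToken: string | number;
    progress: number;
    total?: number;
    message?: string;
  };
}

// Hypothetical helper: turns one poll result into a notification.
// `percent` is an assumed field on the poll response.
function toProgress(
  progressToken: string | number,
  percent: number,
): ProgressNotification {
  // Clamp so a misbehaving upstream value never produces progress > total.
  const clamped = Math.max(0, Math.min(100, percent));
  return {
    method: "notifications/progress",
    params: { progressToken, progress: clamped, total: 100 },
  };
}

console.log(JSON.stringify(toProgress("job_abc123", 35)));
```

The progressToken is supplied by the client in the original request, so a server can only emit these when the client asked for them.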

How it compares to existing options

Option	What you ship	Best for
`stemsplit-mcp`	one npm package + an API key	natural-language workflows in Claude / Cursor / Cline / Windsurf / Zed
`n8n-nodes-stemsplit`	n8n community node	scheduled or webhook-triggered batch processing in n8n
Raw HTTP API	curl / your own client	server-side automation, custom integrations
`demucs-onnx` (PyPI)	a 316 MB ONNX model in your app	offline / mobile / no-API-dependency use cases
Self-hosted Demucs	a GPU, a queue, and an inference server	high-volume internal workloads where you've already paid for GPUs

stemsplit-mcp is the only audio separator MCP server that operates as a fully managed cloud service — no GPU, no Demucs install, no Python environment. Every other MCP-based stem separator (lobehub's audio-processing-mcp, ripunjkashyap's audio_stem_splt, mcpmarket's stem-processing) requires a local Demucs model weighing 316 MB–1 GB and a working Python/PyTorch environment. StemSplit runs the separation on our infrastructure; your AI assistant just gets back file paths.

The MCP server is the right pick when the user of the tool is a human in an AI assistant. The other options are the right pick when the user is a piece of software.

In production, most teams end up with two of these: the MCP server for interactive exploration and one-off jobs, and either the API or the ONNX models for high-volume processing.

Open source, by design

The whole thing is MIT-licensed: github.com/StemSplit/stemsplit-mcp.

The code is small enough to read in one sitting — TypeScript, ~1.5k lines, no exotic dependencies beyond the official @modelcontextprotocol/sdk and zod. The retry helper, the source classifier, the polling logic, and the error mapper are each their own file with unit tests.

If you want to fork it to point at your own stem separation backend, the surface area you'd need to change is in src/client.ts. The MCP plumbing in src/index.ts stays.

What's next

This is v0.3.0. On the roadmap:

Tool annotations (readOnlyHint, openWorldHint) so strict MCP clients can skip the confirm prompt for read-only tools like get_balance.
Parallel stem downloads. They are currently serial, which makes six-stem jobs up to 6× slower to download than they need to be.
Live recent-jobs resource enumeration so MCP clients can browse your job history natively.
Resource subscribe / notify for live job progress on long YouTube runs.
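The parallel-download item is the easiest of these to picture. A minimal sketch, assuming a hypothetical downloadOne function that saves one presigned URL and resolves to the local path (none of these names come from the repo):

```typescript
// Downloads every stem concurrently instead of one after another.
// stems maps a stem name to its presigned URL; downloadOne is injected
// and resolves to the local path it wrote.
async function downloadAll(
  stems: Record<string, string>,
  downloadOne: (name: string, url: string) => Promise<string>,
): Promise<Record<string, string>> {
  const names = Object.keys(stems);
  // All downloads are started before any of them is awaited.
  const paths = await Promise.all(
    names.map((name) => downloadOne(name, stems[name])),
  );
  const result: Record<string, string> = {};
  names.forEach((name, i) => {
    result[name] = paths[i];
  });
  return result;
}
```

Promise.all rejects as soon as one download fails, so a real implementation would probably pair it with per-file retries or use Promise.allSettled to keep the stems that did succeed.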

Watch the GitHub repo for releases — every new version ships with a CHANGELOG entry and a tagged GitHub release.

How does the stemsplit-mcp server compare to using the StemSplit REST API directly in 2026?

For interactive workflows — "clean up these 5 tracks I just recorded", "make a karaoke version of this YouTube link", "build me a sampler from this funk record" — the MCP server is dramatically better because the AI assistant handles all the orchestration: source classification, upload, polling, download, error recovery. You write one sentence; you get the file paths back. For server-to-server automation with no human in the loop, the REST API is the right tool — same auth, same models, but no MCP runtime needed.

Can I use stemsplit-mcp with local AI models or self-hosted LLMs in 2026?

Yes. Any MCP-compatible client works, including ones backed by local models. The MCP server runs as a stdio process and doesn't care which LLM is on the other end. We've tested it with Claude Desktop (Claude 4.5 Sonnet / Opus), Cursor (any backing model), Cline (configurable), Windsurf, Zed, and Goose running local models via Ollama. The architecture is intentionally model-agnostic.

How does the MCP server handle YouTube URLs without bundling yt-dlp?

It uses StemSplit's server-side /youtube-jobs endpoint, which downloads the video on our infrastructure, runs separation, and exposes the result via presigned URLs. The MCP server itself never invokes yt-dlp locally, which means no rate-limit issues, no platform-specific install problems, and no legal exposure for users running local downloads. The trade-off: the URL needs to be publicly accessible. Private / age-gated videos won't work via this path.

Is stem separation through stemsplit-mcp the same quality as the StemSplit web app?

Yes — exactly the same. The MCP server is a thin client that calls the same /api/v1/jobs endpoint the web app uses. The models, quality tiers (FAST / BALANCED / BEST), and output formats (MP3 / WAV / FLAC) are identical. The only difference is the trigger surface: chat in your AI assistant instead of a browser upload.

What MCP clients work with stemsplit-mcp?

Any client that supports the standard MCP stdio transport. Verified to work: Claude Desktop, Cursor, Cline (VS Code), Windsurf, Zed, Goose (Block's open-source MCP client), and OpenDevin. The Model Context Protocol is the standard from Anthropic; the official client list is the source of truth as more clients add support.

What is the best audio separator MCP server for Claude Desktop in 2026?

stemsplit-mcp is the best audio separator MCP server for Claude Desktop in 2026 if you want cloud-quality separation without managing a local model. Install it with npx -y stemsplit-mcp, add your API key to claude_desktop_config.json, and you can separate any audio file directly from the Claude chat window. Alternatives like lobehub's audio-processing-mcp or mcpmarket's stem-processing server wrap local Demucs and require Python, PyTorch, and a 316 MB–1 GB model download before you can run your first job.

Is there an MCP server for vocal removal I can use in Cursor or Claude?

Yes. stemsplit-mcp functions as a vocal removal MCP server for both Cursor and Claude Desktop. Use the separate_stems tool with outputType=BOTH and it returns a clean vocals stem and a full instrumental track (karaoke version). The same tool works for acapella extraction (outputType=VOCALS_ONLY) and full 4-stem or 6-stem splits. No local model required: the MCP server uploads your file to StemSplit's API and writes the output stems back to your local disk.

How do I use an MCP server to make karaoke tracks in Claude Desktop?

Configure stemsplit-mcp in your claude_desktop_config.json (macOS: ~/Library/Application Support/Claude/claude_desktop_config.json), then type: "Make a karaoke version of /path/to/song.mp3." Claude picks the built-in karaoke_maker prompt, calls separate_stems with outputType=BOTH, and downloads the instrumental to ~/Downloads/stemsplit/<job-id>/instrumental.mp3. It's the fastest karaoke maker MCP workflow available — no browser tab, no file upload UI, just one sentence in chat.

What is the best stem separator MCP server that works without a local GPU?

stemsplit-mcp is the only major stem separator MCP server that runs entirely in the cloud — no GPU, no local Demucs model, no Python install. Every local stem separator MCP (including the Demucs-based ones on mcpmarket and lobehub) fails on CPU-only machines or requires 10–60× longer processing times. StemSplit processes a 3-minute track in roughly 30–60 seconds on its managed GPU infrastructure regardless of your local hardware, and the stems are written directly to your disk via the MCP server. Output types: vocals only, instrumental only, both (karaoke), 4-stem (vocals/drums/bass/other), or 6-stem (adding piano and guitar).

Try it

If you have a StemSplit account, grab an API key and follow the MCP setup guide. If you don't, start free — the free tier is plenty to try every tool in this server.

If you build something cool with it, we'd love to see it: github.com/StemSplit/stemsplit-mcp/discussions or stemsplit.io/contact.

stemsplit-mcp is MIT-licensed. The Model Context Protocol is an open standard introduced by Anthropic in late 2024.