Automatic BPM and Key Detection: How It Works (2025)
Most DJs and producers spend hours manually tagging BPM and key in their music libraries. What if every track came with that metadata automatically — accurate, consistent, and ready to use?
TL;DR: StemSplit now automatically detects BPM (tempo) and musical key for every processed song using librosa — the industry-standard Python library for audio analysis. This data appears on job detail pages and is available via our API and RapidAPI endpoints. BPM detection analyzes 60 seconds of audio, while key detection uses 120 seconds with chroma features and key-profile correlation.
What Is BPM and Key Detection?
BPM (Beats Per Minute) tells you the tempo of a track — how fast the beat is. Essential for DJs who need to match tempos between songs and producers who want to know the exact speed of a track.
Musical Key identifies the harmonic center of a song — like "C major" or "A minor." Critical for harmonic mixing, where DJs transition between songs in compatible keys for smoother blends.
Together, BPM and key metadata transform how you organize and work with music. No more guessing, no more manual entry.
How StemSplit Detects BPM and Key
We built this feature using librosa — the same Python library used by Spotify, YouTube Music, and major music production software. Here's why it's the right choice and how it works.
Why librosa?
Industry Standard: librosa is the de facto standard for music information retrieval in Python. It's used by:
- Spotify for audio analysis
- YouTube Music for content identification
- Research institutions for music information retrieval
- Professional audio software for tempo/key detection
Proven Accuracy: The algorithms in librosa are based on decades of research in music information retrieval. They're battle-tested on millions of songs and refined through academic research.
Open Source & Maintained: Unlike proprietary solutions, librosa is open source, actively maintained, and transparent about its methods. You can verify exactly how detection works.
BPM Detection Process
Our BPM detection analyzes 60 seconds of audio — the sweet spot between accuracy and speed.
How it works:
- Onset Detection — Identifies the start of musical events (beats, notes, transients)
- Tempo Estimation — Analyzes the timing between onsets to find the underlying tempo
- Beat Tracking — Refines the tempo estimate by tracking the actual beat pattern
The result: A precise BPM value rounded to one decimal place (e.g., 128.3 BPM).
Why 60 seconds? Research shows that 60 seconds captures enough musical content for reliable tempo detection. Shorter samples (<20 seconds) can be inaccurate, especially with tempo changes. Longer samples (>60 seconds) provide diminishing returns — the extra time doesn't significantly improve accuracy.
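To make this concrete, here's a minimal Python sketch of this kind of tempo estimation with librosa. The file name and the 60-second window are assumptions for the example, not a description of StemSplit's exact pipeline:

import librosa
import numpy as np

# Load roughly 60 seconds of audio (the analysis window described above).
y, sr = librosa.load("track.mp3", duration=60.0)

# One call runs onset detection, tempo estimation, and beat tracking.
tempo, _ = librosa.beat.beat_track(y=y, sr=sr)

print("Estimated tempo (BPM):", np.round(tempo, 1))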
Key Detection Process
Our key detection analyzes 120 seconds of audio for maximum accuracy.
How it works:
- Chroma Feature Extraction — Analyzes the pitch class profile (which notes are present)
- Temporal Averaging — Averages chroma features across time for stability
- Key Profile Correlation — Compares the chroma profile to major and minor key templates (Krumhansl-Schmuckler profiles)
- Mode Detection — Determines whether the key is major or minor based on correlation strength
The result: A key signature like "C", "Am", "F#", or "Dm".
Why 120 seconds? Key detection needs more audio than BPM because harmonic content can vary throughout a song. 120 seconds ensures we capture the overall harmonic character, not just a single section. This is especially important for songs with key changes or complex harmonic progressions.
Why Krumhansl-Schmuckler Profiles? These are the most widely validated key profiles in music psychology research. They're based on how human listeners perceive key relationships — making our detection align with how DJs and producers actually hear music.
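For the curious, the first two stages can be sketched in a few lines of librosa; the file name and the 120-second window are assumptions for the example, and the profile-correlation step is sketched in the technical deep dive further down:

import librosa
import numpy as np

# Load roughly 120 seconds of audio (the analysis window described above).
y, sr = librosa.load("track.mp3", duration=120.0)

# Stage 1: chroma features, one bin per pitch class (C, C#, ..., B).
chroma = librosa.feature.chroma_cqt(y=y, sr=sr)

# Stage 2: average over time to get a single 12-dimensional pitch-class profile.
chroma_mean = chroma.mean(axis=1)
print(np.round(chroma_mean, 3))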
Where You'll See BPM and Key
On Job Detail Pages
Every completed job now shows BPM and key prominently at the top of the page — right after the title and duration. They appear in styled badges that make the information impossible to miss.
Displayed for:
- Uploaded stem separation jobs
- YouTube jobs
- SoundCloud jobs
In the API Response
BPM and key are included in the audioMetadata field for all job types.
API Endpoints:
- GET /api/v1/jobs/{id} — Returns audioMetadata.bpm and audioMetadata.key
- GET /api/v1/youtube-jobs/{id} — Returns audioMetadata.bpm and audioMetadata.key
- GET /api/v1/soundcloud-jobs/{id} — Returns audioMetadata.bpm and audioMetadata.key
RapidAPI Endpoints:
- GET /rapidapi/v1/jobs/{id} — Returns audioMetadata.bpm and audioMetadata.key
- GET /rapidapi/v1/youtube-jobs/{id} — Returns audioMetadata.bpm and audioMetadata.key
- GET /rapidapi/v1/soundcloud-jobs/{id} — Returns audioMetadata.bpm and audioMetadata.key
Example API Response
{
  "id": "clxxx123...",
  "status": "COMPLETED",
  "audioMetadata": {
    "bpm": 128.3,
    "key": "Am",
    "waveformPeaks": {
      "vocals": [0.2, 0.5, 0.8, ...],
      "instrumental": [0.3, 0.6, 0.7, ...]
    }
  },
  "outputs": {
    "vocals": {
      "url": "https://storage.example.com/vocals.mp3",
      "expiresAt": "2025-01-15T13:00:00Z"
    }
  }
}
For complete API documentation, see our Developer Reference.
Building an app that needs BPM and key data? Our API makes it easy to access this metadata programmatically. Check out our developer documentation to get started.
Technical Deep Dive: The Detection Algorithms
BPM Detection Algorithm
librosa uses a multi-stage approach:
Stage 1: Onset Detection
- Analyzes the audio signal for sudden changes in energy
- Identifies the start of musical events (drums, notes, transients)
- Creates an onset envelope — a representation of when musical events occur
Stage 2: Tempo Estimation
- Analyzes the spacing between onsets
- Uses autocorrelation to find repeating patterns
- Identifies the most likely tempo candidates
Stage 3: Beat Tracking
- Refines the tempo estimate by tracking actual beats
- Handles tempo variations and changes
- Outputs a precise BPM value
Why this works: Unlike simple peak detection, this approach understands musical structure. It doesn't just find loud sounds — it finds the underlying rhythmic pattern that defines the tempo.
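A rough librosa sketch of these stages, with the intermediate onset envelope exposed (illustrative only; the file name and window length are assumptions):

import librosa
import numpy as np

y, sr = librosa.load("track.mp3", duration=60.0)

# Stage 1: onset envelope, the per-frame strength of new musical events.
onset_env = librosa.onset.onset_strength(y=y, sr=sr)

# Stages 2-3: tempo estimation and beat tracking from that envelope.
tempo, beat_frames = librosa.beat.beat_track(onset_envelope=onset_env, sr=sr)
beat_times = librosa.frames_to_time(beat_frames, sr=sr)

print("Tempo (BPM):", np.round(tempo, 1))
print("First beats (s):", np.round(beat_times[:4], 2))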
Key Detection Algorithm
Our key detection uses chroma-based analysis:
Stage 1: Chroma Feature Extraction
- Converts audio to chroma features — a 12-dimensional representation
- Each dimension represents one of the 12 pitch classes (C, C#, D, D#, E, F, F#, G, G#, A, A#, B)
- Shows which pitch classes are present and how strongly
Stage 2: Temporal Averaging
- Averages chroma features across the entire analyzed segment
- Creates a stable representation of the song's harmonic content
- Reduces the impact of momentary harmonic variations
Stage 3: Key Profile Correlation
- Compares the averaged chroma profile to 24 key templates (12 major + 12 minor)
- Uses Krumhansl-Schmuckler key profiles — validated through music psychology research
- Calculates correlation coefficients for each possible key
Stage 4: Mode Selection
- Selects the key with the highest correlation
- Determines major vs. minor by comparing major and minor correlations
- Outputs the final key (e.g., "C" for C major, "Am" for A minor)
Why this works: Chroma features capture the harmonic "fingerprint" of a song. By comparing this fingerprint to known key profiles, we can identify the tonal center — the same way human listeners do.
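Continuing the chroma sketch from earlier, stages 3 and 4 might be approximated as follows. The profile values are the commonly cited Krumhansl-Kessler ratings, and the code is an illustrative sketch rather than StemSplit's exact implementation:

import numpy as np

# Krumhansl-Schmuckler key profiles (Krumhansl & Kessler probe-tone ratings),
# indexed from the tonic: positions C, C#, ..., B for a C-rooted key.
MAJOR_PROFILE = np.array([6.35, 2.23, 3.48, 2.33, 4.38, 4.09,
                          2.52, 5.19, 2.39, 3.66, 2.29, 2.88])
MINOR_PROFILE = np.array([6.33, 2.68, 3.52, 5.38, 2.60, 3.53,
                          2.54, 4.75, 3.98, 2.69, 3.34, 3.17])
PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def estimate_key(chroma_mean):
    """Stages 3-4: correlate an averaged chroma vector against all 24 key templates."""
    best_key, best_corr = None, -np.inf
    for tonic in range(12):
        for profile, suffix in ((MAJOR_PROFILE, ""), (MINOR_PROFILE, "m")):
            # Rotate the profile so its tonic lines up with this pitch class.
            template = np.roll(profile, tonic)
            corr = np.corrcoef(chroma_mean, template)[0, 1]
            if corr > best_corr:
                best_key, best_corr = PITCH_CLASSES[tonic] + suffix, corr
    return best_key

# `chroma_mean` comes from the chroma sketch in the key detection section above.
# print(estimate_key(chroma_mean))  # e.g. "Am"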
Accuracy and Limitations
BPM Detection Accuracy
What works well:
- Clear, consistent tempos
- Well-produced commercial releases
- Songs with prominent rhythmic elements
Challenges:
- Songs with tempo changes (rubato, accelerando)
- Very slow or very fast tempos (outside the 60-200 BPM range)
- Ambient or rhythmically ambiguous music
Typical accuracy: Within ±1 BPM for most commercial music.
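One practical note: tempos outside that comfortable range often come back as double- or half-time readings. If your application expects a particular range, you can fold values back into it with a small helper like the hypothetical one below (not part of StemSplit's pipeline):

def normalize_bpm(bpm, low=70.0, high=180.0):
    """Fold double-/half-time readings into a working range (illustrative only)."""
    while bpm < low:
        bpm *= 2
    while bpm > high:
        bpm /= 2
    return round(bpm, 1)

print(normalize_bpm(256.6))  # -> 128.3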
Key Detection Accuracy
What works well:
- Songs with clear tonal centers
- Standard major/minor keys
- Well-produced commercial releases
Challenges:
- Modal music (Dorian, Mixolydian, etc.) — may detect relative major/minor
- Songs with frequent key changes
- Atonal or highly chromatic music
- Very short songs (<30 seconds)
Typical accuracy: 85-95% correct key identification for standard pop/rock/electronic music.
Why Not 100% Accuracy?
Music is complex. A song might:
- Start in one key and modulate to another
- Use modal scales that don't fit major/minor templates
- Have ambiguous harmonic content
Our detection provides the primary key — the tonal center that dominates most of the song. For songs with key changes, it identifies the most prominent key.
Use Cases for BPM and Key Data
For DJs
Harmonic Mixing: Match keys between songs for smooth, musical transitions. Songs in compatible keys (like C major and A minor) blend naturally.
Tempo Matching: Know the exact BPM before mixing. No more guessing or manually tapping tempo.
Library Organization: Sort and filter your collection by BPM and key. Build playlists that flow musically.
For Producers
Remix Planning: Know the original key and tempo before starting a remix. Maintain harmonic compatibility or plan key changes intentionally.
Sample Matching: Find samples that match your project's key and tempo automatically.
Reference Tracks: Quickly identify the key and tempo of reference tracks for your own productions.
For Developers
Music Apps: Build apps that organize music by BPM and key automatically.
DJ Software Integration: Use our API to populate BPM/key fields in DJ software automatically.
Music Analysis Tools: Create tools that analyze music libraries and suggest compatible tracks.
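As a hypothetical illustration of these developer use cases, the sketch below filters a local track list for mixable candidates using the bpm and key values from audioMetadata. The tolerance, the relative-key rule, and the track dictionaries are assumptions for the example, not part of the StemSplit API:

PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def relative_key(key):
    """Return the relative major/minor of a detected key, e.g. 'C' <-> 'Am'."""
    if key.endswith("m"):  # minor: relative major is three semitones up
        return PITCH_CLASSES[(PITCH_CLASSES.index(key[:-1]) + 3) % 12]
    return PITCH_CLASSES[(PITCH_CLASSES.index(key) - 3) % 12] + "m"

def compatible(track, candidate, bpm_tolerance=3.0):
    """True if two tracks are close in tempo and share the same or relative key."""
    a, b = track["audioMetadata"], candidate["audioMetadata"]
    close_tempo = abs(a["bpm"] - b["bpm"]) <= bpm_tolerance
    related_key = b["key"] in (a["key"], relative_key(a["key"]))
    return close_tempo and related_key

library = [
    {"title": "Track A", "audioMetadata": {"bpm": 128.3, "key": "Am"}},
    {"title": "Track B", "audioMetadata": {"bpm": 127.1, "key": "C"}},
]
matches = [t for t in library[1:] if compatible(library[0], t)]
print([t["title"] for t in matches])  # -> ['Track B']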
API Integration Examples
JavaScript/TypeScript
// Get job with BPM and key
const response = await fetch(`https://api.stemsplit.io/v1/jobs/${jobId}`, {
  headers: {
    'Authorization': 'Bearer YOUR_API_KEY'
  }
});
const job = await response.json();
if (job.audioMetadata) {
  console.log(`BPM: ${job.audioMetadata.bpm}`);
  console.log(`Key: ${job.audioMetadata.key}`);
}
Python
import requests
response = requests.get(
    f'https://api.stemsplit.io/v1/jobs/{job_id}',
    headers={'Authorization': f'Bearer {api_key}'}
)
job = response.json()
if job.get('audioMetadata'):
    print(f"BPM: {job['audioMetadata']['bpm']}")
    print(f"Key: {job['audioMetadata']['key']}")
RapidAPI
curl --request GET \
--url 'https://stemsplit-api.p.rapidapi.com/v1/jobs/{jobId}' \
--header 'X-RapidAPI-Key: YOUR_RAPIDAPI_KEY' \
--header 'X-RapidAPI-Host: stemsplit-api.p.rapidapi.com'
For complete API documentation with all endpoints and examples, see our Developer Reference.
Controlling BPM and Key Detection
For Uploaded Files
When uploading a file for stem separation, you can choose to enable or disable audio analysis:
- Enabled (default): BPM and key are detected automatically
- Disabled: Faster processing, no BPM/key detection
This option appears in the upload interface. For most users, we recommend leaving it enabled — the analysis adds only 2-3 seconds to processing time.
For YouTube and SoundCloud Jobs
BPM and key detection are always enabled for YouTube and SoundCloud jobs. Since these jobs already include audio analysis for metadata extraction, BPM and key detection adds minimal overhead.
FAQ
How accurate is the BPM detection?
For most commercial music with consistent tempos, BPM detection is accurate within ±1 BPM. Songs with tempo changes or ambiguous rhythms may have less accurate results.
How accurate is the key detection?
Key detection achieves 85-95% accuracy for standard pop, rock, and electronic music. Modal music or songs with frequent key changes may be less accurate.
Can I disable BPM and key detection?
Yes — for uploaded files only. Use the "Enable Audio Analysis" toggle in the upload interface. YouTube and SoundCloud jobs always include BPM and key detection.
What if a song changes key or tempo?
The detection identifies the primary key and tempo — the ones that dominate most of the song. For songs with changes, it reports the most prominent values.
Is this data available via API?
Yes. BPM and key are included in the audioMetadata field for all job types. See our Developer Reference for complete API documentation.
What library does StemSplit use for detection?
We use librosa — the industry-standard Python library for music information retrieval. It's the same library used by Spotify, YouTube Music, and major audio software.
Why librosa instead of other libraries?
librosa is:
- Industry standard (used by major platforms)
- Open source and transparent
- Based on validated research
- Actively maintained
- Proven accurate on millions of songs
Can I use this data commercially?
Yes. BPM and key metadata detected by StemSplit can be used in your applications, DJ software, or music analysis tools. The data is provided as-is — you're responsible for how you use it.
How long does detection take?
BPM and key detection adds 2-3 seconds to processing time. This happens automatically during stem separation, so there's no additional wait.
The Bottom Line
Automatic BPM and key detection transforms how you work with music. No more manual tagging, no more guessing. Every song processed through StemSplit comes with accurate tempo and key metadata — ready to use in your DJ sets, productions, or applications.
Whether you're mixing tracks, planning remixes, or building music apps, having BPM and key data automatically available saves time and opens up new creative possibilities.
Start Using BPM and Key Detection
Every song you process through StemSplit now includes automatic BPM and key detection.
- ✅ Powered by librosa — industry-standard accuracy
- ✅ Available via API and RapidAPI
- ✅ Displayed prominently on job pages
- ✅ Works for uploads, YouTube, and SoundCloud
Try Stem Separation with BPM/Key Detection →