Local-first AI co-producer for Ableton Live.
prompt2midi analyzes a reference track, turns the musical evidence into producer-readable structure, generates editable MIDI, creates an original inspired loop package, and prepares a Suno-ready prompt. The core path is designed to run locally: JUCE is the Ableton-facing client, Node owns the job API, and Python owns audio analysis and generation helpers.
The product goal is not to copy songs. It is a pre-production tool for making new material from reference traits: tempo, key, groove, drum feel, bass movement, arrangement energy, and production language.
prompt2midi is open source. Producers, artists, engineers, researchers, and tool builders are welcome to contribute.
There are two main ways to use it:
- DAW inspiration starter: generate editable ideas, MIDI, arrangement notes, and prompts that a producer can continue shaping in Ableton, Logic, FL Studio, Bitwig, or any other DAW.
- Pre-SUNO tool: turn an inspired idea/reference into a cleaner prompt, structure guide, and optional proxy package that an artist can finish in SUNO or continue developing locally.
The repo already contains a working local vertical slice:
- JUCE plugin UI for WAV/MP3 selection, prompt entry, job polling, result display, and prompt copy.
- Local Node backend on 127.0.0.1:47321.
- Python WAV analysis with BPM, key, energy, loudness, spectral features, genre/groove hints, chords, structure, stems, transcription, composition, and optional audio generation.
- Deterministic inspired-loop generator that writes bass.mid, drums.mid, chords.mid, melody.mid, full_loop.mid, summary.json, and prompt.txt.
- Optional model paths for Basic Pitch, Demucs, CLAP genre detection, Gemini Suno prompt generation, ACE-Step, and MusicGen.
This is still an MVP/research codebase. Some outputs are already useful in production, but model transcription, stem splitting, and MIDI mapping are still weak in places and should be treated as editable evidence, not finished arrangements. The next major production-readiness work is better stem separation, source-aware MIDI mapping, and full JUCE AU/VST integration.
- develop is the default integration branch for daily work.
- Feature branches start from develop and merge back into develop.
- main is release-only. When develop is ready to ship, merge develop into main and create a version tag.
- Do not open normal feature PRs directly into main.
Producers often know what they like about a reference record but cannot quickly turn that into reusable production material. prompt2midi closes that gap:
- Drop in a reference track.
- Extract musical facts and evidence.
- Generate original MIDI parts that match the useful traits, not the exact song.
- Generate a clear AI-music prompt for SUNO or similar tools.
- Optionally render a local reference-inspired sample before uploading anything elsewhere.
- Finish the idea either inside a DAW or inside SUNO.
When a generation service rejects direct artist/song prompts, the correct workflow is not to bypass the filter. prompt2midi uses the reference to identify neutral production traits, then creates new musical assets and a prompt that avoids artist imitation, copied hooks, lyrics, and vocal likeness.
Example framing:
- Avoid: "make a Michael Jackson song" or "copy the bassline from this record".
- Prefer: "1980s pop-funk feel, tight dance groove, bright chord stabs, syncopated bass movement, crisp drums, original melody, no copied lyrics, no vocal imitation".
The tool is designed to reduce copying risk by creating original material and by describing musical traits instead of requesting a clone. It does not guarantee legal clearance, does not replace rights review, and should not be used to bypass copyright or platform policies.
prompt2midi is a local-first desktop production system. The plugin is only the DAW-facing client; the local backend owns job orchestration; Python owns audio intelligence and generation helpers; optional model services improve output quality without becoming required for the core workflow.
flowchart TD
Producer["Producer in Ableton Live"] --> Plugin["JUCE plugin UI<br/>file/prompt input, progress, result display"]
Plugin -->|POST /analyze| Node["Local Node backend<br/>127.0.0.1:47321"]
Node --> Jobs["Job store + progress events<br/>queued/running/succeeded/failed"]
Node --> Decode["Input validation + FFmpeg MP3 decode<br/>WAV passed to Python"]
Decode --> Python["Python analysis package<br/>analysis/analyze.py"]
Python --> Core["Core facts<br/>BPM, key, loudness, energy, spectral features"]
Python --> Deep["Optional deeper analysis<br/>chords, drums, structure, stems, transcription"]
Python --> Compose["Original composition package<br/>bass/drums/chords/melody/full_loop MIDI"]
Python --> Arrange["Arrangement Lock / full-track proxy<br/>maps, reports, guide MIDI, proxy audio"]
Compose --> Exports["Local exports<br/>MIDI, summary.json, prompt.txt"]
Arrange --> Exports
Python --> Node
Node --> Prompt["Prompt layer<br/>deterministic local prompt + optional Gemini SUNO prompt"]
Node --> Result["Aggregated result JSON<br/>analysis, warnings, assets, prompts, paths"]
Result --> Plugin
Plugin --> DAW["Producer actions<br/>audition, copy prompt, import MIDI, package for SUNO"]
ACE["Optional local ACE-Step API<br/>127.0.0.1:8001"] -. audio candidates .-> Arrange
Models["Optional local engines<br/>Basic Pitch, Demucs, CLAP, MusicGen, All-In-One Docker"] -. evidence .-> Deep
Gemini["Optional cloud Gemini<br/>GEMINI_API_KEY"] -. SUNO prompt .-> Prompt
Ableton / JUCE plugin
|
| POST /analyze
v
Local Node backend
|
| validates input, decodes MP3, creates job, calls Python
v
Python analysis engine
|
| returns structured JSON, MIDI paths, composition package, optional audio sample
v
Node aggregation
|
| deterministic producer prompt, optional Gemini Suno prompt
v
JUCE result display
- The producer selects a WAV/MP3 reference and/or enters a direction in the plugin.
- The JUCE client posts the request to the localhost Node backend and keeps the audio thread pass-through.
- Node validates local paths, decodes MP3 to WAV when needed, creates a job, and publishes progress events.
- Python analyzes the WAV, writes structured JSON, MIDI evidence, composition assets, and optional arrangement/proxy artifacts.
- Node aggregates the Python result with producer-facing prompt text and optional Gemini SUNO text.
- The plugin polls status/result and displays confidence-aware output paths, warnings, and copy/export actions.
| Layer | Files | Responsibility |
|---|---|---|
| JUCE plugin | Source/PluginEditor.*, Source/PluginProcessor.*, Source/LocalApiClient.h | UI only: choose/drop reference, send local job, poll status, show results. Audio processing stays pass-through. |
| Node backend | backend/server.js, backend/lib/* | Local API, job state, input validation, MP3 decode, Python invocation, prompt aggregation, error normalization. |
| Python analysis | analysis/analyze.py, analysis/core/*, analysis/detectors/* | Extract structured musical facts from WAV audio. Optional libraries improve results, but fallback paths keep the core running. |
| MIDI/transcription | analysis/midi/* | Write MIDI, run Basic Pitch, run Demucs, expose provenance and limitations for every MIDI asset. |
| Composition | analysis/composition/* | Generate a new original loop package from analysis hints. This is the main product output. |
| Prompting | backend/lib/promptGenerator.js, backend/lib/geminiPromptGenerator.js | Turn structured facts into producer-facing copy and Suno prompts. Gemini is optional. |
| Audio generation | analysis/generation/* | Optional local sample generation using ACE-Step first, then AudioCraft/MusicGen fallback paths. |
| Tooling scripts | scripts/pipelines/*, scripts/setup/*, scripts/packaging/*, scripts/services/*, scripts/dev/* | Local CLI runners, setup commands, package builders, service launchers, and developer refresh tools. |
| Requirements/docs | requirements/*, docs/pipelines/*, docs/backend/*, docs/qa/* | Optional engine dependency pins and topic-grouped operational docs. |
The plugin accepts:
- A local .wav/.wave/.mp3 reference file.
- A text direction.
- Or prompt-only mode when no audio file is supplied.
The plugin posts JSON to the local backend:
{
"audioPath": "/absolute/path/to/reference.mp3",
"prompt": "same groove, change the bass notes a little, replace the main stab"
}
backend/server.js exposes:
- GET /health
- POST /analyze
- GET /status?id=<job_id>
- GET /result?id=<job_id>
Node creates a job immediately so the plugin remains responsive. It validates absolute audio paths, rejects unsupported formats, enforces a size limit, and decodes MP3 input through ffmpeg into tmp/jobs/<job_id>/decoded-input.wav.
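The checks described above can be sketched in a few lines. This is a Python illustration of the same rules, not the Node implementation: the function names and the 200 MB limit are assumptions made for the example, while the extension set, the PROMPT2MIDI_FFMPEG override, and the decoded-input.wav target come from this document.

```python
import os
from pathlib import Path

SUPPORTED_EXTENSIONS = {".wav", ".wave", ".mp3"}
MAX_BYTES = 200 * 1024 * 1024  # ASSUMPTION: the real limit lives in the Node backend


def validate_audio_path(audio_path: str, max_bytes: int = MAX_BYTES) -> Path:
    """Return the path if it passes the checks, otherwise raise ValueError."""
    path = Path(audio_path)
    if not path.is_absolute():
        raise ValueError("audioPath must be absolute")
    if path.suffix.lower() not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"unsupported format: {path.suffix or '(none)'}")
    if not path.is_file():
        raise ValueError("audioPath does not point to a file")
    if path.stat().st_size > max_bytes:
        raise ValueError("file exceeds the size limit")
    return path


def build_decode_command(source: Path, job_dir: Path, env=os.environ) -> list[str]:
    """Build the ffmpeg argv that decodes an MP3 into decoded-input.wav."""
    ffmpeg = env.get("PROMPT2MIDI_FFMPEG", "ffmpeg")
    target = job_dir / "decoded-input.wav"
    # -y overwrites an existing output file, -i names the input file.
    return [ffmpeg, "-y", "-i", str(source), str(target)]
```

Building the argument list separately from running it keeps the decode step easy to test without ffmpeg installed.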
Node then starts python -m analysis.analyze as a child process and records pipeline events so the UI can show progress.
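A script can drive the same endpoints the plugin uses. The sketch below is a minimal stdlib client under stated assumptions: the response field names `id` and `status` are guesses, and omitting `audioPath` for prompt-only mode is also assumed. The URLs and the queued/running/succeeded/failed states come from this document.

```python
import json
import time
import urllib.parse
import urllib.request

BASE_URL = "http://127.0.0.1:47321"


def http_json(method: str, url: str, payload=None) -> dict:
    """Send a JSON request to the local backend and decode the JSON reply."""
    data = None if payload is None else json.dumps(payload).encode("utf-8")
    request = urllib.request.Request(
        url, data=data, method=method,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


def run_job(audio_path, prompt, send=http_json, poll_seconds=1.0, max_polls=600):
    """Submit a job, poll until it finishes, and return the result JSON."""
    body = {"prompt": prompt}
    if audio_path:  # ASSUMPTION: prompt-only mode simply omits audioPath
        body["audioPath"] = audio_path
    created = send("POST", f"{BASE_URL}/analyze", body)
    job_id = created["id"]  # ASSUMPTION: name of the job id field
    query = urllib.parse.urlencode({"id": job_id})
    for _ in range(max_polls):
        status = send("GET", f"{BASE_URL}/status?{query}")
        state = status.get("status")  # ASSUMPTION: name of the state field
        if state == "succeeded":
            return send("GET", f"{BASE_URL}/result?{query}")
        if state == "failed":
            raise RuntimeError(f"job {job_id} failed: {status}")
        time.sleep(poll_seconds)  # queued or running: wait and ask again
    raise TimeoutError(f"job {job_id} did not finish in time")
```

The `send` parameter exists so the polling logic can be exercised with a fake transport when the backend is not running.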
The Python engine reads PCM WAV and returns structured JSON. The dependency-free base path extracts:
- duration, sample rate, channel count
- energy curve
- loudness
- zero-crossing rate and peak amplitude
- approximate BPM
- approximate key
- warnings when confidence is low
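To illustrate what a dependency-free path can measure, here is a minimal sketch that uses only the Python standard library. It is not the code in analysis/analyze.py: the function name, the returned keys, and the 16-bit-only restriction are assumptions made for the example, and BPM and key estimation are left out.

```python
import io
import math
import struct
import wave


def basic_wav_features(wav_file) -> dict:
    """Read 16-bit PCM WAV data (path or file object) and measure basic facts."""
    with wave.open(wav_file, "rb") as reader:
        channels = reader.getnchannels()
        sample_rate = reader.getframerate()
        if reader.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        raw = reader.readframes(reader.getnframes())

    samples = struct.unpack(f"<{len(raw) // 2}h", raw)
    # Mix down to mono by averaging the channels of each frame, scaled to [-1, 1).
    mono = [
        sum(samples[i:i + channels]) / channels / 32768.0
        for i in range(0, len(samples), channels)
    ]
    peak = max((abs(s) for s in mono), default=0.0)
    rms = math.sqrt(sum(s * s for s in mono) / len(mono)) if mono else 0.0
    # A crossing is counted whenever consecutive samples differ in sign.
    crossings = sum(1 for a, b in zip(mono, mono[1:]) if (a < 0) != (b < 0))
    return {
        "duration_seconds": len(mono) / sample_rate,
        "sample_rate": sample_rate,
        "channels": channels,
        "peak_amplitude": peak,
        "rms": rms,
        "zero_crossing_rate": crossings / max(len(mono) - 1, 1),
    }
```

Pure-Python loops like these are slow on full-length tracks, which is one reason optional numeric libraries are worth enabling when they are available.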
Optional librosa/scipy paths improve:
- BPM estimation
- key estimation
- chord progression detection
- arrangement/section analysis
- groove descriptors
Optional CLAP genre detection uses laion/larger_clap_music through transformers when available.
Every MIDI file is labeled by source and confidence. This area is intentionally conservative: stem splitting and MIDI mapping exist, but they are not yet production-grade. They are useful for evidence, sketching, and direction, but the next feature work should improve source separation, note assignment, timing cleanup, and DAW-ready mapping.
| Asset | How it is made | Meaning |
|---|---|---|
| reference-sketch.mid | Deterministic pattern from estimated BPM/key | Generated sketch, not transcription. |
| model-transcription.mid | Basic Pitch on full mix | Model transcription candidate; needs ear correction. |
| source-bass-transcription.mid | Demucs bass stem + Basic Pitch | Stem-aware bass candidate; still may contain bleed. |
| source-drum-groove.mid | Demucs drums stem + onset detection | Quantized drum groove estimate. |
| model-bass-transcription.mid | Pitch-filtered Basic Pitch notes from full mix | Fallback bass candidate, not source-separated. |
| bass-transcription.mid | Monophonic low-frequency tracking | Legacy heuristic fallback. |
Recommended evidence exports are copied to:
tmp/jobs/<job_id>/exports/
The code intentionally distinguishes generated MIDI from transcription evidence. This matters because a file should be described as source-aware only when it was actually derived from a separated stem, not from the full mix or a heuristic.
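One way to keep those labels attached to the files is a small provenance record per asset. The sketch below is illustrative only: the `MidiProvenance` class, the `source_aware` flag, and the `describe` helper are hypothetical names, while the filenames and descriptions are taken from the asset table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MidiProvenance:
    filename: str
    method: str
    meaning: str
    source_aware: bool  # True only when a separated stem was the input


EVIDENCE_ASSETS = {
    item.filename: item
    for item in [
        MidiProvenance("reference-sketch.mid", "Deterministic pattern from estimated BPM/key",
                       "Generated sketch, not transcription.", False),
        MidiProvenance("model-transcription.mid", "Basic Pitch on full mix",
                       "Model transcription candidate; needs ear correction.", False),
        MidiProvenance("source-bass-transcription.mid", "Demucs bass stem + Basic Pitch",
                       "Stem-aware bass candidate; still may contain bleed.", True),
        MidiProvenance("source-drum-groove.mid", "Demucs drums stem + onset detection",
                       "Quantized drum groove estimate.", True),
        MidiProvenance("model-bass-transcription.mid", "Pitch-filtered Basic Pitch notes from full mix",
                       "Fallback bass candidate, not source-separated.", False),
        MidiProvenance("bass-transcription.mid", "Monophonic low-frequency tracking",
                       "Legacy heuristic fallback.", False),
    ]
}


def describe(filename: str) -> str:
    """Return a producer-facing label, or a cautious default for unknown files."""
    item = EVIDENCE_ASSETS.get(filename)
    if item is None:
        return f"{filename}: unlabeled; treat as unverified evidence"
    tag = "source-aware" if item.source_aware else "not source-aware"
    return f"{filename}: {item.meaning} ({tag}; made by: {item.method})"
```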
Current limitations:
- Demucs-style stem splitting can bleed bass, drums, vocals, and harmonic material into each other.
- Full-mix model transcription often produces extra notes and wrong instrument ownership.
- Bass, drum, chord, and melody mappings still need stronger source-aware cleanup before they should be considered arrangement-ready.
- All extracted MIDI should be auditioned and edited in Ableton before being used as final material.
Next work:
- Improve stem-aware bass, drum, chord, and melody extraction.
- Improve mapping from analysis evidence into separate DAW tracks.
- Tighten quantization, note filtering, register selection, and confidence labels.
- Complete JUCE integration for real AU/VST plugin workflows, including more polished import/export behavior.
analysis/reference/reference_groove.py fingerprints the reference for:
- kick accents
- bass accents
- hat/percussion accents
- swing
- low-end weight
- bass note tendencies
- club energy
analysis/reference/reference_transform.py converts the user's direction into controls such as:
- preserve groove similarity
- keep bass rhythm but vary notes
- replace a stab/timbre role
- keep kick and hat feel while using new samples
This is the bridge between "I like this song" and "make a new production with similar traits."
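As a rough picture of how a free-text direction could become controls, the sketch below matches simple phrases clause by clause. It is not the logic in reference_transform.py: the control names and trigger patterns are hypothetical and chosen only to cover the example directions used in this document.

```python
import re

# Hypothetical control names and trigger patterns, for illustration only.
RULES = {
    "preserve_groove": r"\b(same|keep|preserve)\b.*\bgroove\b",
    "vary_bass_notes": r"\b(change|vary)\b.*\bbass\b",
    "replace_stab": r"\b(replace|swap)\b.*\bstab\b",
    "new_drum_samples": r"\bnew\b.*\b(samples|sounds)\b",
}


def parse_direction(text: str) -> dict:
    """Split the direction into clauses and flag every control that matches."""
    clauses = [c.strip().lower() for c in re.split(r"[,.;]", text) if c.strip()]
    return {
        name: any(re.search(pattern, clause) for clause in clauses)
        for name, pattern in RULES.items()
    }
```

Matching per clause keeps "keep the groove" from accidentally combining with a later mention of bass or stab in the same sentence.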
analysis/composition/composition.py generates the main product output:
tmp/jobs/<job_id>/exports/
midi/
bass.mid
drums.mid
chords.mid
melody.mid
full_loop.mid
summary.json
prompt.txt
The generator is deterministic in structure but randomized in musical choices. It uses BPM, key, detected chords, drum pattern evidence, genre/style hints, and user direction to choose one of several composition modes:
- house
- techno
- synth wave
- hip hop
- ambient
full_loop.mid is MIDI format type 1 so a DAW can import separate tracks.
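For readers unfamiliar with the format, a type 1 file is a header chunk followed by one track chunk per part. The sketch below writes such a file with the standard library only. It is a minimal illustration and not the project's writer: the helper names and the 480 ticks-per-beat default are assumptions, and tempo and track-name events are omitted.

```python
import struct


def vlq(value: int) -> bytes:
    """Encode a non-negative integer as a MIDI variable-length quantity."""
    out = [value & 0x7F]
    value >>= 7
    while value:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    return bytes(reversed(out))


def track_chunk(notes, channel=0, velocity=96) -> bytes:
    """Build one MTrk chunk from (start_tick, duration_ticks, pitch) tuples."""
    events = []
    for start, duration, pitch in notes:
        events.append((start, 0x90 | channel, pitch, velocity))      # note on
        events.append((start + duration, 0x80 | channel, pitch, 0))  # note off
    # Sort by tick; at equal ticks note-offs (0x8n) come before note-ons (0x9n).
    events.sort(key=lambda event: (event[0], event[1]))
    body = bytearray()
    now = 0
    for tick, status, pitch, vel in events:
        body += vlq(tick - now) + bytes([status, pitch, vel])
        now = tick
    body += b"\x00\xff\x2f\x00"  # end-of-track meta event
    return b"MTrk" + struct.pack(">I", len(body)) + bytes(body)


def type1_file(tracks, ticks_per_beat=480) -> bytes:
    """Assemble a format 1 Standard MIDI File from per-part note lists."""
    header = b"MThd" + struct.pack(">IHHH", 6, 1, len(tracks), ticks_per_beat)
    return header + b"".join(track_chunk(notes) for notes in tracks)
```

For example, `type1_file([[(0, 480, 36)], [(0, 240, 60), (240, 240, 64)]])` yields a two-track file, which is what lets a DAW place each part on its own track on import.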
There are two prompt paths:
- Python stub prompt from analysis/composition/composition.py, always local.
- Optional Gemini prompt from backend/lib/geminiPromptGenerator.js when GEMINI_API_KEY is present.
The Gemini path uses gemini-2.0-flash by default and writes a single Suno paragraph from structured analysis and composition data. If Gemini is disabled, missing, times out, or fails, the job still succeeds with the local stub prompt.
The prompt contract ends with a protective instruction:
Instrumental, no vocals. Inspired by the reference groove and production style, not a cover and not a copy.
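The fallback behaviour can be pictured as a small wrapper around an optional cloud call. This is a Python sketch of the idea, not the Node code in geminiPromptGenerator.js: `choose_suno_prompt` and the `cloud_generator` callable are hypothetical, and appending the closing line when it is missing is an assumption about how the contract is enforced.

```python
CLOSING_LINE = (
    "Instrumental, no vocals. Inspired by the reference groove and "
    "production style, not a cover and not a copy."
)


def choose_suno_prompt(stub_prompt: str, cloud_generator=None):
    """Return (prompt, warnings); the local stub is always the fallback."""
    warnings = []
    prompt = stub_prompt
    if cloud_generator is not None:
        try:
            candidate = cloud_generator()
            if candidate and candidate.strip():
                prompt = candidate.strip()
            else:
                warnings.append("cloud prompt was empty; using local stub")
        except Exception as error:  # timeouts and API errors degrade to a warning
            warnings.append(f"cloud prompt failed ({error}); using local stub")
    # Make sure the protective closing line is present exactly once at the end.
    if not prompt.rstrip().endswith(CLOSING_LINE):
        prompt = f"{prompt.rstrip()} {CLOSING_LINE}"
    return prompt, warnings
```

Returning warnings alongside the prompt lets the job succeed while still telling the producer which path produced the text.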
analysis/generation/audio_generation.py can prepare a 30-second local sample before the user uploads anything to Suno.
Provider order:
- ACE-Step local API, enabled by PROMPT2MIDI_ENABLE_ACE_STEP=1.
- AudioCraft MusicGen, enabled by PROMPT2MIDI_ENABLE_AUDIOCRAFT=1.
- Transformers MusicGen Melody, enabled by PROMPT2MIDI_ENABLE_MUSICGEN=1.
ACE-Step uses a local API at 127.0.0.1:8001 by default and model settings around:
- acestep-v15-turbo
- acestep-5Hz-lm-0.6B
- MLX backend by default on macOS
The sample is scored for duration, loudness, pulse consistency, clipping, harshness, and reference groove similarity. The score helps pick the best candidate, but listening is still required.
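The provider order and enable flags can be sketched as an ordered list that is filtered by the environment and tried in turn. This is an illustration rather than the code in audio_generation.py: the provider keys, the `renderers` mapping, and the function names are hypothetical, and candidate scoring is left out.

```python
import os

# Order and flag names come from this document; the provider keys are illustrative.
PROVIDER_FLAGS = [
    ("ace-step", "PROMPT2MIDI_ENABLE_ACE_STEP"),
    ("audiocraft-musicgen", "PROMPT2MIDI_ENABLE_AUDIOCRAFT"),
    ("transformers-musicgen-melody", "PROMPT2MIDI_ENABLE_MUSICGEN"),
]


def enabled_providers(env=os.environ) -> list:
    """Return the providers whose flag is set to 1, in priority order."""
    return [name for name, flag in PROVIDER_FLAGS if env.get(flag) == "1"]


def generate_sample(renderers: dict, env=os.environ):
    """Try each enabled provider in order; return (provider, sample, warnings)."""
    warnings = []
    for name in enabled_providers(env):
        render = renderers.get(name)
        if render is None:
            warnings.append(f"{name}: no renderer available")
            continue
        try:
            return name, render(), warnings
        except Exception as error:  # a failed engine becomes a warning, not a job failure
            warnings.append(f"{name}: {error}")
    return None, None, warnings
```

When no provider is enabled or all of them fail, the function returns no sample plus the collected warnings, so the rest of the job can still complete.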
| Engine | Required? | Purpose | Setup |
|---|---|---|---|
| Python stdlib WAV analyzer | Yes | Base BPM/key/energy/loudness/spectral analysis | Built in |
| ffmpeg | For MP3 | Decode MP3 to WAV and export reference sections | Install separately or set PROMPT2MIDI_FFMPEG |
| librosa / scipy | Optional | Better BPM/key, chords, structure, drums, groove | Python environment |
| CLAP laion/larger_clap_music | Optional | Zero-shot genre tags | Python ML deps |
| Basic Pitch | Optional | Model MIDI transcription | npm run setup:transcription |
| Demucs htdemucs | Optional | Bass/drum/other/vocal stems | npm run setup:stems |
| Gemini gemini-2.0-flash | Optional cloud | Higher quality Suno prompt | GEMINI_API_KEY=... |
| ACE-Step 1.5 | Optional local service | Reference-guided audio samples | npm run setup:ace-step, then npm run ace-step:start |
| AudioCraft MusicGen | Optional local | Fallback sample generation | npm run setup:musicgen |
| Transformers MusicGen Melody | Optional local | Legacy fallback sample generation | Set PROMPT2MIDI_ENABLE_MUSICGEN=1 with deps installed |
prompt2midi is built around reference-inspired transformation, not cloning.
It should:
- Analyze traits instead of copying a recording.
- Generate new MIDI instead of exporting copyrighted melodies as final output.
- Use neutral production language instead of artist-name prompting.
- Avoid vocals, lyrics, artist likeness, copied hooks, and exact bass/melody sequences.
- Keep every extracted/transcribed artifact labeled as evidence, not guaranteed clearance.
It should not:
- Promise that any output is free of legal issues.
- Claim to bypass Suno copyright filters.
- Recreate a protected song, master recording, vocal likeness, lyric, or signature hook.
- Tell users that a generated sample is automatically safe to upload commercially.
Use references you own, created, licensed, or are otherwise allowed to analyze. Treat the generated Suno prompt and local sample as a safer creative starting point, not legal advice.
This is an open-source project and contributions are welcome. Useful areas include:
- stronger stem separation and source-aware MIDI mapping
- better AU/VST/JUCE host integration
- Ableton, Logic, and other DAW workflow testing
- prompt packaging for SUNO and other music tools
- audio-analysis fixtures and regression tests
- documentation, examples, setup scripts, and UX polish
Branch from develop, keep changes local-first, and label extracted MIDI honestly when confidence is limited.
Install Node dependencies:
npm install
Start the backend:
npm start
Run the developer loop with backend logs:
npm run dev:refresh
Install optional engines:
npm run setup:transcription
npm run setup:stems
npm run setup:ace-step
npm run setup:musicgen
Start ACE-Step when using local sample generation:
npm run ace-step:start
Useful environment flags:
PROMPT2MIDI_DISABLE_MODEL=1
PROMPT2MIDI_DISABLE_STEMS=1
PROMPT2MIDI_DISABLE_SUNO=1
PROMPT2MIDI_DISABLE_LIBROSA=1
PROMPT2MIDI_DISABLE_GENRE=1
PROMPT2MIDI_DISABLE_CHORDS=1
PROMPT2MIDI_DISABLE_STRUCTURE=1
PROMPT2MIDI_DISABLE_DRUMS=1
PROMPT2MIDI_ENABLE_ACE_STEP=1
PROMPT2MIDI_ENABLE_AUDIOCRAFT=1
PROMPT2MIDI_ENABLE_MUSICGEN=1
GEMINI_API_KEY=...
Submit a job:
curl -s -X POST http://127.0.0.1:47321/analyze \
-H 'Content-Type: application/json' \
-d '{"audioPath":"/absolute/path/to/reference.wav","prompt":"keep the groove, vary the bass notes, replace the stab sound"}'
Poll status:
curl -s 'http://127.0.0.1:47321/status?id=<job_id>'
Fetch result:
curl -s 'http://127.0.0.1:47321/result?id=<job_id>'
Run Python tests:
python3 -m unittest analysis.tests.test_feature_extraction
python3 -m unittest analysis.tests.test_composition
Run Node tests:
npm test
Compile-check Python:
python3 -m compileall analysis
- Keep the core workflow local-first.
- Do not run long jobs, subprocesses, network calls, or file-heavy analysis in JUCE processBlock.
- Node owns orchestration and aggregation.
- Python returns structured analysis JSON and file paths.
- Generated MIDI is product output; extracted MIDI is evidence.
- Optional engines must degrade to warnings, not hard job failure.
- Producer-facing copy must describe confidence and limitations honestly.