Generate multi-track audio from chord progressions. Describe how you want it to sound in plain English — AI translates your words into 26 musical parameters.
```bash
curl -X POST http://gamma.omegaai.dev/api/generate \
  -H "Content-Type: application/json" \
  -d '{"chords": "Dm7 G7 Cmaj7 Am7", "description": "smoky jazz bar, walking bass, sax solo"}' \
  --output song.mp3
```
A minimal request needs only `chords`:

```bash
curl -X POST http://gamma.omegaai.dev/api/generate \
  -H "Content-Type: application/json" \
  -d '{"chords": "C G Am F"}' \
  --output song.mp3
```
Returns an MP3 file. Generation typically takes 3–15 seconds depending on song length.
Request body (JSON):
| Field | Type | Description |
|---|---|---|
| chords | string, required | Chord progression. Space-separated symbols; newlines for sections. e.g. `"Dm7 G7 Cmaj7 Am7"` |
| description | string, optional | Natural-language description of the desired sound. GPT-4.1-mini translates it into variation parameters. e.g. `"smoky jazz bar, walking bass"` |
| style | string, optional | Force a specific style, overriding GPT/auto-detect. One of the 13 styles below. |
| bpm | integer, optional | Force a specific tempo, overriding GPT/auto-detect. |
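For scripted use, here is a minimal Python client sketch (stdlib only; the helper names `build_payload` and `generate` are illustrative, not part of the API):

```python
import json
import urllib.request

API_URL = "http://gamma.omegaai.dev/api/generate"

def build_payload(chords, description=None, style=None, bpm=None):
    """Build the JSON request body; only `chords` is required."""
    if not chords or not chords.strip():
        raise ValueError("chords is required")
    payload = {"chords": chords}
    if description:
        payload["description"] = description
    if style:
        payload["style"] = style
    if bpm:
        payload["bpm"] = int(bpm)
    return payload

def generate(chords, out_path="song.mp3", **kwargs):
    """POST the payload and save the returned MP3 (requires a running server)."""
    body = json.dumps(build_payload(chords, **kwargs)).encode()
    req = urllib.request.Request(
        API_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp, open(out_path, "wb") as f:
        f.write(resp.read())
        return resp.headers.get("X-Style"), resp.headers.get("X-BPM")
```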
Response:

- `Content-Type: audio/mpeg` — MP3 binary
- `X-Style` — style used (e.g. `jazz`)
- `X-BPM` — tempo used
- `X-Generation-Time` — how long generation took
- `X-Metadata` — full JSON metadata including GPT params

The health check returns `{"status": "ok"}`, and an auto-generated OpenAPI/Swagger UI is available for testing the API interactively.
Standard chord symbols are supported. Separate chords with spaces. Use newlines (or \n in JSON) to denote sections.
| Type | Examples |
|---|---|
| Major | C, F, Bb |
| Minor | Am, Dm, F#m |
| Seventh | G7, Cmaj7, Dm7, Am7b5 |
| Extended | Cmaj9, Dm11, G13 |
| Suspended | Csus4, Dsus2, C7sus4 |
| Altered | G7#9, C7b9, Db7#11 |
| Diminished | Bdim, Cdim7 |
| Augmented | Caug, C+ |
| Slash chords | C/E, Am/G, Dm7/C |
| Power chords | C5, G5 |
| Add chords | Cadd9, Fmadd9 |
```json
{"chords": "Am F C G\nAm F C G\nDm Am F E7"}
```
13 built-in styles, each with distinct instruments, drum patterns, and character. If you provide a description, GPT picks the best style automatically. Otherwise, the system auto-detects from chord content.
| Status | Cause | Detail |
|---|---|---|
| 400 | No valid chords in input | Check chord symbols for typos |
| 400 | Invalid style name | Response lists all valid styles |
| 500 | Rendering failure | Server-side issue with FluidSynth/FFmpeg |
| 504 | Generation timeout | Took longer than 120 seconds |
If the OpenAI call fails (rate limit, network error), the API does not return an error. It falls back to auto-detecting the style from chord content and applying a random variation. You always get music back.
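In outline, that fallback could look like this sketch (both function bodies are hypothetical stand-ins for the real OpenAI call and detector):

```python
import random

def gpt_params(description):
    """Stand-in for the real OpenAI call; may raise on rate limit/network error."""
    raise ConnectionError("rate limited")  # simulate an OpenAI failure

def auto_detect_params(chords):
    """Stand-in for chord-content style detection plus a random variation."""
    style = "jazz" if any("7" in c for c in chords.split()) else "pop"
    return {"style": style, "swing": random.uniform(0.0, 0.66)}

def choose_params(chords, description=None):
    """Prefer GPT when a description is given, but never fail the request."""
    if description:
        try:
            return gpt_params(description)
        except Exception:
            pass  # fall through: the caller always gets music back
    return auto_detect_params(chords)
```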
When you provide a description, it is sent to OpenAI's GPT-4.1-mini with a carefully engineered system prompt that encodes the available styles and the full set of variation parameters.
GPT returns a JSON object selecting a style, BPM, and whichever variation parameters it deems relevant. The API then validates and clamps every value to its legal range before use — GPT never gets unchecked access to the generator.
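The validate-and-clamp step can be sketched like this (the ranges shown are a subset copied from the variation-parameter table; the helper name is illustrative):

```python
# Legal ranges for a few variation parameters (subset of the full table).
RANGES = {
    "swing": (0.0, 0.66),
    "humanize": (0.0, 1.0),
    "drum_intensity": (0.0, 1.5),
    "groove_offset": (-0.03, 0.03),
}

def clamp_params(gpt_json):
    """Clamp every GPT-proposed value into its legal range; drop unknown keys."""
    out = {}
    for key, value in gpt_json.items():
        if key not in RANGES:
            continue  # unknown keys never reach the generator
        lo, hi = RANGES[key]
        out[key] = min(max(float(value), lo), hi)
    return out
```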
If no description is provided, the system uses auto-detection: it analyzes the harmonic content of your chords (ratio of minor chords, jazz extensions, sus chords, power chords, etc.) and picks the most appropriate style with weighted randomization. A random variation with musical coherence (one of 8 "personality" archetypes like tight, loose, intimate, energetic, swung, cinematic, minimal, lush) is applied so each generation sounds unique.
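A sketch of how such weighted detection might work (the styles, ratios, and weights here are illustrative, not the real heuristics):

```python
import random

def detect_style(chords):
    """Score styles from harmonic content, then pick one with weighted randomization."""
    symbols = chords.split()
    n = len(symbols) or 1
    # Crude harmonic-content ratios: minor chords, jazz extensions, power chords.
    minor_ratio = sum("m" in s and "maj" not in s for s in symbols) / n
    jazz_ratio = sum(any(x in s for x in ("7", "9", "11", "13")) for s in symbols) / n
    power_ratio = sum(s.endswith("5") for s in symbols) / n
    weights = {
        "jazz": 1.0 + 4.0 * jazz_ratio,
        "rock": 1.0 + 4.0 * power_ratio,
        "ballad": 1.0 + 2.0 * minor_ratio,
        "pop": 1.0,
    }
    styles = list(weights)
    return random.choices(styles, weights=[weights[s] for s in styles])[0]
```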
The parser converts chord symbols into MIDI note numbers. It handles slash chords: `Am/G` puts G in the bass register while the Am voicing sits in the mid register. Each chord becomes a bass note (placed in the 28–47 MIDI range) and a set of voicing notes (48–84 range), which are then distributed across tracks according to the style.
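A heavily simplified sketch of the idea (interval tables abbreviated; the real parser handles far more symbols):

```python
NOTE_TO_PC = {"C": 0, "C#": 1, "Db": 1, "D": 2, "Eb": 3, "E": 4, "F": 5,
              "F#": 6, "G": 7, "Ab": 8, "A": 9, "Bb": 10, "B": 11}
QUALITY_INTERVALS = {"": (0, 4, 7), "m": (0, 3, 7), "7": (0, 4, 7, 10),
                     "maj7": (0, 4, 7, 11), "m7": (0, 3, 7, 10)}

def parse_chord(symbol):
    """Return (bass_midi, voicing_midis) for a chord symbol, incl. slash chords."""
    main, _, slash = symbol.partition("/")
    root = main[:2] if len(main) > 1 and main[1] in "#b" else main[:1]
    quality = main[len(root):]
    intervals = QUALITY_INTERVALS.get(quality, (0, 4, 7))
    bass_pc = NOTE_TO_PC[slash] if slash else NOTE_TO_PC[root]
    bass = 28 + bass_pc                                   # bass register (28-47)
    voicing = [48 + NOTE_TO_PC[root] + i for i in intervals]  # mid register (48-84)
    return bass, voicing
```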
The core of the system. For each chord in the progression, up to 7 simultaneous tracks are generated:
| Track | Channel | Role |
|---|---|---|
| Comp | 0 | Primary chording instrument (guitar, piano, synth pad). Plays chord voicings with style-specific patterns: arpeggios, strums, block chords, rhythm stabs, power chugs. |
| Bass | 1 | Bass line. Patterns include root-only, root-fifth, walking bass, octave jumps, synth pulse, reggae. Activity controlled by bass_activity. |
| Harmony | 2 | Secondary harmony — adds color. Patterns include thirds above, sixths below, octave doubling, high voicings, sparse fifths. |
| Pad | 3 | Sustained atmospheric layer (strings, warm pads, choir). Swells and sustains underneath everything. |
| Solo | 4 | Lead instrument playing improvised melodies. Uses scale-aware note selection (maps each chord to its appropriate mode) with configurable complexity. |
| Fill | 5 | Counter-melody fills between chords. Arpeggio runs, descending lines, trills, chord stabs. |
| Drums | 9 | Full GM drum kit. Style-specific patterns (jazz ride, rock kick-snare, electronic hi-hats, Latin congas, etc.) plus auxiliary percussion. |
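A per-chord generation loop along these lines (the event format and patterns are illustrative, and only three of the seven tracks are shown):

```python
TRACK_CHANNELS = {"comp": 0, "bass": 1, "harmony": 2, "pad": 3,
                  "solo": 4, "fill": 5, "drums": 9}

def generate_chord_events(bass_note, voicing, beats=4):
    """Emit (channel, midi_note, start_beat, duration) events for one chord."""
    events = []
    # Bass: root on every beat (the simplest "root-only" pattern).
    for beat in range(beats):
        events.append((TRACK_CHANNELS["bass"], bass_note, beat, 1.0))
    # Comp: block chord held for the whole bar.
    for note in voicing:
        events.append((TRACK_CHANNELS["comp"], note, 0, float(beats)))
    # Pad: sustained voicing an octave up, underneath everything.
    for note in voicing:
        events.append((TRACK_CHANNELS["pad"], note + 12, 0, float(beats)))
    return events
```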
These parameters control the musical character at a fine-grained level. GPT sets them from your description, or they're randomized coherently.
| Parameter | Range | What it does |
|---|---|---|
| swing | 0.0 – 0.66 | Offbeat delay. 0 = straight, 0.33 = light swing, 0.66 = heavy triplet swing. |
| articulation | 0.3 – 1.5 | Note duration multiplier. Low = staccato (short, punchy), high = legato (sustained, flowing). |
| register | -1, 0, 1 | Octave shift for the comping instrument. -1 = low and warm, +1 = high and bright. |
| drum_intensity | 0.0 – 1.5 | Drum velocity/presence. 0 = silent, 0.3 = ghost notes only, 1.0 = normal, 1.5 = heavy hitting. |
| dynamics_shape | 6 options | Volume envelope over the whole piece: flat, arc (build-climax-resolve), crescendo, decrescendo, wave (two waves), swell (quick build, sustain). |
| groove_offset | -0.03 – 0.03 | Timing push/pull. Negative = ahead of the beat (driving), positive = behind (laid-back). |
| humanize | 0.0 – 1.0 | Random timing and velocity variation. 0 = machine-perfect, 0.5 = natural feel, 1.0 = loose/drunk. |
| fill_freq | 0, 4, 8, 16 | Drum fill frequency. 0 = only at section breaks, 4 = every 4 chords, etc. |
| voicing_spread | -1, 0, 1 | Chord voicing width. -1 = tight/close, 0 = normal, +1 = open/spread voicings. |
| bass_octave | -1, 0, 1 | Bass register. -1 = sub bass (rumble), 0 = normal, +1 = higher (lighter). |
| tempo_drift | 0.0 – 0.05 | Rubato amount. Subtle BPM fluctuations per chord for an expressive, human feel. |
| note_overlap | -0.15 – 0.15 | Negative = gaps between notes (choppy), positive = notes bleed into each other (sustain pedal feel). |
| ghost_notes | 0.0 – 1.0 | Extra quiet notes between main beats. Adds rhythmic texture and groove. |
| accent_pattern | 4 options | Which beats get velocity boosts: downbeat (1), backbeat (2&4), all (even), syncopated (offbeats). |
| comp_density | 0.3 – 1.5 | How busy the comping instrument is. Low = sparse, high = fills every beat with hits. |
| harmony_amount | 0.0 – 1.5 | Volume of the secondary harmony track. 0 = silent, 1.5 = prominent. |
| pad_amount | 0.0 – 1.5 | Sustained pad track level. Higher = lush atmospheric wash. |
| bass_activity | 0.3 – 1.5 | Bass complexity. 0.5 = root notes only, 1.0 = normal pattern, 1.5 = busy walking lines. |
| crash_freq | 0.0 – 0.3 | Extra crash cymbal probability on each chord change. |
| chord_rhythm | 4 options | How much chord durations vary: uniform (all equal), normal, varied, dramatic (big contrasts). |
| merge_repeats | bool | If true, consecutive identical chords merge into one long sustained chord. |
| hold_endings | bool | If true, the last chord of each section is held longer for a natural phrase ending. |
| solo_amount | 0.0 – 1.0 | How much the solo instrument plays. 0 = silent, 0.5 = occasional phrases, 1.0 = constant lead. |
| solo_complexity | 0.0 – 1.0 | Solo note selection. Low = simple chord tones, mid = scale runs, high = chromatic passing tones. |
| aux_perc | 0.0 – 1.0 | Auxiliary percussion presence (shakers, tambourines, bongos, congas). Style-dependent instruments. |
| fill_melody | 0.0 – 1.0 | Melodic fill frequency between chords. Counter-melodies and transitions. |
| style_morph | 0.0 – 1.0 | Genre changes at section boundaries. 0 = consistent style throughout, 1.0 = new genre every section. Prefers contrasting families (e.g. jazz → rock, ambient → electronic). |
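To make the timing parameters concrete, here is an illustrative sketch of how `swing`, `groove_offset`, and `humanize` could shift note onsets (not the actual implementation):

```python
import random

def apply_timing(onset_beats, swing=0.0, groove_offset=0.0, humanize=0.0):
    """Shift a note onset (in beats) by swing, groove push/pull, and random jitter."""
    shifted = onset_beats + groove_offset
    # Swing delays only the offbeat eighth notes (the ".5" positions).
    if abs(onset_beats % 1.0 - 0.5) < 1e-9:
        shifted += swing * 0.5  # swing=0.66 gives a heavy triplet feel
    # Humanize adds small random timing variation around the grid.
    shifted += random.uniform(-0.02, 0.02) * humanize
    return shifted
```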
The generated multi-track MIDI file is rendered to audio in two stages: the MIDI is synthesized to a WAV file, which is then encoded to MP3.
On Railway, rendering uses FluidSynth exclusively. Locally, the system can also use sfizz with higher-quality SFZ instrument libraries for per-track rendering with individual reverb and EQ, but these libraries are too large (4.3GB) for cloud deployment.
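The FluidSynth/FFmpeg path can be approximated with two commands; this sketch only assembles the argument lists (the paths and the 192k bitrate are placeholders, not the server's actual settings):

```python
def render_commands(midi_path, soundfont, wav_path, mp3_path):
    """Two-stage render: FluidSynth synthesizes MIDI to WAV, FFmpeg encodes to MP3."""
    synth = ["fluidsynth", "-ni", soundfont, midi_path,
             "-F", wav_path, "-r", "44100"]
    encode = ["ffmpeg", "-y", "-i", wav_path,
              "-codec:a", "libmp3lame", "-b:a", "192k", mp3_path]
    return synth, encode
```

Each list can be executed with `subprocess.run(cmd, check=True)` when the binaries are installed.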
- Generation runs in `asyncio.run_in_executor`, so the event loop stays responsive. A semaphore limits concurrency to 2 generations to prevent memory exhaustion.
- Intermediate files are written to `/tmp`. After the MP3 streams back, a `BackgroundTask` cleans up all intermediate files.
- A health check is served at `/health`.

The AI understands musical vocabulary and vibes. Here are some effective descriptions:
| Description | What GPT does |
|---|---|
| smoky jazz bar, walking bass | Jazz style, swing ~0.4, high bass_activity, tenor sax solo |
| lo-fi bedroom chill | High humanize, tempo drift, ghost notes, low drum intensity |
| epic cinematic crescendo | Strings/ambient, crescendo dynamics, high pad + harmony, rubato |
| tight funk groove | Syncopated accents, ghost notes, precise timing, pushed groove |
| dreamy ethereal ambient | Ambient style, note overlap, high pads, sparse drums, high humanize |
| aggressive heavy rock | Rock style, high drum intensity + comp density, pushed groove |
| intimate acoustic fingerpicking | Acoustic style, sparse density, low drums, legato articulation |