Gamma Live

Generate multi-track audio from chord progressions. Describe how you want it to sound in plain English — AI translates your words into 26 musical parameters.


Quick Start

With a description
```bash
curl -X POST http://gamma.omegaai.dev/api/generate \
  -H "Content-Type: application/json" \
  -d '{"chords": "Dm7 G7 Cmaj7 Am7", "description": "smoky jazz bar, walking bass, sax solo"}' \
  --output song.mp3
```
Just chords (auto-detects style)
```bash
curl -X POST http://gamma.omegaai.dev/api/generate \
  -H "Content-Type: application/json" \
  -d '{"chords": "C G Am F"}' \
  --output song.mp3
```

Returns an MP3 file. Generation typically takes 3–15 seconds depending on song length.
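The same request works from any HTTP client. Here is a minimal Python sketch using only the standard library; the helper names (`build_request`, `generate`) are illustrative, not an official SDK:

```python
import json
import urllib.request

API_URL = "http://gamma.omegaai.dev/api/generate"

def build_request(chords, description=None, style=None, bpm=None):
    """Encode the JSON body for POST /api/generate, omitting unset fields."""
    body = {"chords": chords}
    if description is not None:
        body["description"] = description
    if style is not None:
        body["style"] = style
    if bpm is not None:
        body["bpm"] = bpm
    return json.dumps(body).encode("utf-8")

def generate(chords, out_path="song.mp3", **kwargs):
    """POST the progression and write the returned MP3 to disk."""
    req = urllib.request.Request(
        API_URL,
        data=build_request(chords, **kwargs),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        # The service reports its decisions in response headers
        print("style:", resp.headers.get("X-Style"), "bpm:", resp.headers.get("X-BPM"))
        with open(out_path, "wb") as f:
            f.write(resp.read())
```

For example, `generate("Dm7 G7 Cmaj7 Am7", description="smoky jazz bar")` saves `song.mp3` and prints the style and BPM the server chose.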

API Endpoints

POST /api/generate Generate audio from chords

Request body (JSON):

| Field | Type | Description |
|---|---|---|
| `chords` | string, required | Chord progression. Space-separated symbols, newlines for sections. e.g. `"Dm7 G7 Cmaj7 Am7"` |
| `description` | string, optional | Natural-language description of the desired sound. GPT-4.1-mini translates this into variation parameters. e.g. `"smoky jazz bar, walking bass"` |
| `style` | string, optional | Force a specific style, overriding GPT/auto-detect. One of the 13 styles below. |
| `bpm` | integer, optional | Force a specific tempo, overriding GPT/auto-detect. |

Response:

  • Content-Type: audio/mpeg — MP3 binary
  • X-Style — style used (e.g. jazz)
  • X-BPM — tempo used
  • X-Generation-Time — how long it took
  • X-Metadata — full JSON metadata including GPT params
GET /health Health check

Returns `{"status": "ok"}`

GET /docs Interactive API docs

Auto-generated OpenAPI/Swagger UI for testing the API interactively.

Chord Format

Standard chord symbols are supported. Separate chords with spaces. Use newlines (or \n in JSON) to denote sections.

| Type | Examples |
|---|---|
| Major | C, F, Bb |
| Minor | Am, Dm, F#m |
| Seventh | G7, Cmaj7, Dm7, Am7b5 |
| Extended | Cmaj9, Dm11, G13 |
| Suspended | Csus4, Dsus2, G7sus4 |
| Altered | G7#9, C7b9, Db7#11 |
| Diminished | Bdim, Cdim7 |
| Augmented | Caug, C+ |
| Slash chords | C/E, Am/G, Dm7/C |
| Power chords | C5, G5 |
| Add chords | Cadd9, Fmadd9 |
Multi-section example:

```json
{"chords": "Am F C G\nAm F C G\nDm Am F E7"}
```
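As a rough illustration of how such symbols decompose into pitches, here is a toy parser covering a handful of qualities. This is a sketch, not the service's actual grammar; extensions and alterations are omitted:

```python
# Map note letters to pitch classes (C = 0 ... B = 11)
NOTE_OFFSETS = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}

# Semitone intervals above the root for a few common qualities
QUALITY_INTERVALS = {
    "":     (0, 4, 7),       # major triad
    "m":    (0, 3, 7),       # minor triad
    "7":    (0, 4, 7, 10),   # dominant seventh
    "maj7": (0, 4, 7, 11),
    "m7":   (0, 3, 7, 10),
    "dim":  (0, 3, 6),
    "aug":  (0, 4, 8),
    "sus4": (0, 5, 7),
    "5":    (0, 7),          # power chord
}

def parse_chord(symbol):
    """Return (root pitch class 0-11, interval tuple) for a chord symbol."""
    root = NOTE_OFFSETS[symbol[0]]
    rest = symbol[1:]
    if rest.startswith("#"):
        root, rest = (root + 1) % 12, rest[1:]
    elif rest.startswith("b"):
        root, rest = (root - 1) % 12, rest[1:]
    # For slash chords, the quality is everything before the "/"
    quality = rest.split("/")[0]
    return root, QUALITY_INTERVALS[quality]
```

For example, `parse_chord("Dm7")` yields pitch class 2 with intervals `(0, 3, 7, 10)`.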

Styles

13 built-in styles, each with distinct instruments, drum patterns, and character. If you provide a description, GPT picks the best style automatically. Otherwise, the system auto-detects from chord content.

| Style | BPM | Character |
|---|---|---|
| acoustic | 92–118 | Fingerpicked nylon guitar, warm, intimate |
| folk | 95–122 | Steel guitar, shakers, light and airy |
| ballad | 62–88 | Piano-led, emotional, tenor sax solos |
| piano_ballad | 68–92 | Bright piano arpeggios, celesta fills |
| pop | 105–132 | Rhodes, upbeat, brass stabs, finger bass |
| rock | 118–148 | Power chords, overdrive, heavy drums |
| clean_rock | 108–135 | Clean electric guitar, crunchy but clear |
| jazz | 80–140 | Jazz guitar, walking bass, vibraphone, swing |
| rnb | 70–100 | Rhodes, warm pads, finger bass, laid-back |
| electronic | 118–140 | Synth pads, synth bass, 808-style drums |
| ambient | 60–85 | Halo pads, fretless bass, ethereal, sparse |
| latin | 100–130 | Nylon guitar, trumpet, heavy percussion |
| strings | 55–78 | String ensemble, cello bass, violin solo |
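When no description or style is given, auto-detection inspects the chords themselves. The real heuristic isn't published; a crude sketch of the idea (counting the harmonic signals mentioned above) might look like:

```python
def detect_style(chords):
    """Toy auto-detection: count harmonic signals and pick a style.
    Illustrative only; the service uses weighted randomization over more signals."""
    symbols = chords.split()
    n = len(symbols)
    jazzy = sum(any(x in c for x in ("maj7", "m7", "9", "11", "13")) for c in symbols)
    minor = sum("m" in c and "maj" not in c for c in symbols)
    power = sum(c.endswith("5") for c in symbols)  # rough: also matches b5 chords
    if power / n > 0.5:
        return "rock"
    if jazzy / n > 0.5:
        return "jazz"
    if minor / n > 0.5:
        return "ballad"
    return "pop"
```

A real implementation would weight several candidates and randomize among close matches, per the weighted randomization described above.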

Error Handling

| Status | Cause | Detail |
|---|---|---|
| 400 | No valid chords in input | Check chord symbols for typos |
| 400 | Invalid style name | Response lists all valid styles |
| 500 | Rendering failure | Server-side issue with FluidSynth/FFmpeg |
| 504 | Generation timeout | Took longer than 120 seconds |

If the OpenAI call fails (rate limit, network error), the API does not return an error. It falls back to auto-detecting the style from chord content and applying a random variation. You always get music back.
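That fallback behavior can be pictured as a simple wrapper. The function names below are hypothetical stand-ins for the service's internals:

```python
def choose_params(chords, description, interpret_with_gpt, auto_detect, random_variation):
    """Prefer GPT interpretation; on any failure, fall back so the
    request still produces music (sketch of the documented behavior)."""
    if description:
        try:
            return interpret_with_gpt(description)
        except Exception:
            pass  # rate limit, network error, bad JSON: fall through
    params = auto_detect(chords)
    params.update(random_variation())
    return params
```

The key property is that the `except` branch never re-raises: a GPT outage degrades to auto-detection rather than a 5xx response.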


How It Works

```
Request                      AI Layer                      Audio Pipeline

POST /api/generate    ──>    GPT-4.1-mini           ──>    Chord Parser
chords + description         description ──> params              │
                                                           MIDI Generator
                                                                  │
                                                      FluidSynth (MIDI ──> WAV)
                                                                  │
                                                        FFmpeg (WAV ──> MP3)
                                                                  │
                                                            MP3 Response
```

1. Description Interpretation (GPT-4.1-mini)

When you provide a description, it's sent to OpenAI's GPT-4.1-mini with a carefully engineered system prompt. The prompt encodes the available styles with their tempo ranges, plus the variation parameters and their legal ranges, so the model can map your words onto concrete settings.

GPT returns a JSON object selecting a style, BPM, and whichever variation parameters it deems relevant. The API then validates and clamps every value to its legal range before use — GPT never gets unchecked access to the generator.

If no description is provided, the system uses auto-detection: it analyzes the harmonic content of your chords (ratio of minor chords, jazz extensions, sus chords, power chords, etc.) and picks the most appropriate style with weighted randomization. A random variation with musical coherence (one of 8 "personality" archetypes like tight, loose, intimate, energetic, swung, cinematic, minimal, lush) is applied so each generation sounds unique.
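The validate-and-clamp step might look like the following sketch, with a few ranges borrowed from the parameter table in section 4. The helper itself is hypothetical:

```python
# Legal ranges for a few continuous parameters (from the table in section 4)
NUMERIC_RANGES = {
    "swing": (0.0, 0.66),
    "humanize": (0.0, 1.0),
    "drum_intensity": (0.0, 1.5),
    "groove_offset": (-0.03, 0.03),
}
# Discrete parameters only accept listed values
CHOICES = {"register": (-1, 0, 1), "fill_freq": (0, 4, 8, 16)}

def clamp_params(gpt_params):
    """Keep only known parameters, clamped/validated to their legal ranges."""
    safe = {}
    for key, value in gpt_params.items():
        if key in NUMERIC_RANGES:
            lo, hi = NUMERIC_RANGES[key]
            safe[key] = min(max(float(value), lo), hi)
        elif key in CHOICES and value in CHOICES[key]:
            safe[key] = value
        # unknown keys and illegal enum values are silently dropped
    return safe
```

This is what "GPT never gets unchecked access to the generator" means in practice: model output is treated as untrusted input.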

2. Chord Parsing

The parser converts chord symbols into MIDI note numbers. It handles all of the chord types listed under Chord Format above, from plain triads through sevenths, extensions, and alterations to slash, power, and add chords.

Each chord becomes a bass note (placed in the 28–47 MIDI range) and a set of voicing notes (48–84 range), which are then distributed across tracks according to the style.
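A sketch of that register placement, using the 28–47 and 48–84 MIDI ranges quoted above (helper names are hypothetical):

```python
def place_bass(pitch_class):
    """Lowest MIDI note of the given pitch class within the 28-47 bass range."""
    note = pitch_class
    while note < 28:
        note += 12
    return note

def place_voicing(pitch_class, intervals):
    """Chord tones stacked from the first root at or above MIDI 48, capped at 84."""
    root = pitch_class
    while root < 48:
        root += 12
    return [min(root + i, 84) for i in intervals]
```

So a C major chord (pitch class 0) gets bass note 36 (C2) and voicing notes starting at 48 (C3).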

3. Multi-Track MIDI Generation

The core of the system. For each chord in the progression, up to 7 simultaneous tracks are generated:

| Track | Channel | Role |
|---|---|---|
| Comp | 0 | Primary chording instrument (guitar, piano, synth pad). Plays chord voicings with style-specific patterns: arpeggios, strums, block chords, rhythm stabs, power chugs. |
| Bass | 1 | Bass line. Patterns include root-only, root-fifth, walking bass, octave jumps, synth pulse, reggae. Activity controlled by `bass_activity`. |
| Harmony | 2 | Secondary harmony that adds color. Patterns include thirds above, sixths below, octave doubling, high voicings, sparse fifths. |
| Pad | 3 | Sustained atmospheric layer (strings, warm pads, choir). Swells and sustains underneath everything. |
| Solo | 4 | Lead instrument playing improvised melodies. Uses scale-aware note selection (maps each chord to its appropriate mode) with configurable complexity. |
| Fill | 5 | Counter-melody fills between chords. Arpeggio runs, descending lines, trills, chord stabs. |
| Drums | 9 | Full GM drum kit. Style-specific patterns (jazz ride, rock kick-snare, electronic hi-hats, Latin congas, etc.) plus auxiliary percussion. |
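A minimal sketch of how note events might be tagged with those channels (illustrative only; the generator's internals aren't published, but channel 9 is the General MIDI percussion channel):

```python
TRACKS = {"comp": 0, "bass": 1, "harmony": 2, "pad": 3, "solo": 4, "fill": 5, "drums": 9}

def render_chord(bass_note, voicing, start, beats=4):
    """Emit one bar of events for a single chord across the core tracks."""
    events = [{"ch": TRACKS["bass"], "pitch": bass_note, "start": start, "dur": beats}]
    for pitch in voicing:  # simple block chord on the comp track
        events.append({"ch": TRACKS["comp"], "pitch": pitch, "start": start, "dur": beats})
    # GM note 36 = acoustic bass drum, on the percussion channel
    events.append({"ch": TRACKS["drums"], "pitch": 36, "start": start, "dur": 0.25})
    return events
```

The real generator layers style-specific patterns per track; this just shows the channel bookkeeping.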

4. The 26 Variation Parameters

These parameters control the musical character at a fine-grained level. GPT sets them from your description, or they're randomized coherently.

| Parameter | Range | What it does |
|---|---|---|
| `swing` | 0.0–0.66 | Offbeat delay. 0 = straight, 0.33 = light swing, 0.66 = heavy triplet swing. |
| `articulation` | 0.3–1.5 | Note duration multiplier. Low = staccato (short, punchy), high = legato (sustained, flowing). |
| `register` | -1, 0, 1 | Octave shift for the comping instrument. -1 = low and warm, +1 = high and bright. |
| `drum_intensity` | 0.0–1.5 | Drum velocity/presence. 0 = silent, 0.3 = ghost notes only, 1.0 = normal, 1.5 = heavy hitting. |
| `dynamics_shape` | 6 options | Volume envelope over the whole piece: flat, arc (build-climax-resolve), crescendo, decrescendo, wave (two waves), swell (quick build, sustain). |
| `groove_offset` | -0.03–0.03 | Timing push/pull. Negative = ahead of the beat (driving), positive = behind (laid-back). |
| `humanize` | 0.0–1.0 | Random timing and velocity variation. 0 = machine-perfect, 0.5 = natural feel, 1.0 = loose/drunk. |
| `fill_freq` | 0, 4, 8, 16 | Drum fill frequency. 0 = only at section breaks, 4 = every 4 chords, etc. |
| `voicing_spread` | -1, 0, 1 | Chord voicing width. -1 = tight/close, 0 = normal, +1 = open/spread voicings. |
| `bass_octave` | -1, 0, 1 | Bass register. -1 = sub bass (rumble), 0 = normal, +1 = higher (lighter). |
| `tempo_drift` | 0.0–0.05 | Rubato amount. Subtle BPM fluctuations per chord for an expressive, human feel. |
| `note_overlap` | -0.15–0.15 | Negative = gaps between notes (choppy), positive = notes bleed into each other (sustain-pedal feel). |
| `ghost_notes` | 0.0–1.0 | Extra quiet notes between main beats. Adds rhythmic texture and groove. |
| `accent_pattern` | 4 options | Which beats get velocity boosts: downbeat (1), backbeat (2&4), all (even), syncopated (offbeats). |
| `comp_density` | 0.3–1.5 | How busy the comping instrument is. Low = sparse, high = fills every beat with hits. |
| `harmony_amount` | 0.0–1.5 | Volume of the secondary harmony track. 0 = silent, 1.5 = prominent. |
| `pad_amount` | 0.0–1.5 | Sustained pad track level. Higher = lush atmospheric wash. |
| `bass_activity` | 0.3–1.5 | Bass complexity. 0.5 = root notes only, 1.0 = normal pattern, 1.5 = busy walking lines. |
| `crash_freq` | 0.0–0.3 | Extra crash cymbal probability on each chord change. |
| `chord_rhythm` | 4 options | How much chord durations vary: uniform (all equal), normal, varied, dramatic (big contrasts). |
| `merge_repeats` | bool | If true, consecutive identical chords merge into one long sustained chord. |
| `hold_endings` | bool | If true, the last chord of each section is held longer for a natural phrase ending. |
| `solo_amount` | 0.0–1.0 | How much the solo instrument plays. 0 = silent, 0.5 = occasional phrases, 1.0 = constant lead. |
| `solo_complexity` | 0.0–1.0 | Solo note selection. Low = simple chord tones, mid = scale runs, high = chromatic passing tones. |
| `aux_perc` | 0.0–1.0 | Auxiliary percussion presence (shakers, tambourines, bongos, congas). Style-dependent instruments. |
| `fill_melody` | 0.0–1.0 | Melodic fill frequency between chords. Counter-melodies and transitions. |
| `style_morph` | 0.0–1.0 | Genre changes at section boundaries. 0 = consistent style throughout, 1.0 = new genre every section. Prefers contrasting families (e.g. jazz → rock, ambient → electronic). |
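To make a couple of these concrete, here is an illustrative sketch of how `swing` and `humanize` could shape note timing. The actual generator's math isn't published; note that at `swing=0.66` an offbeat eighth lands near the 2/3 triplet position:

```python
import random

def apply_feel(events, swing=0.0, humanize=0.0, rng=None):
    """Delay offbeat eighths by `swing` and jitter all onsets by `humanize`.
    Starts are in beats; a hypothetical sketch, not the service's algorithm."""
    rng = rng or random.Random(0)
    out = []
    for ev in events:
        start = ev["start"]
        if round(start * 2) % 2 == 1:      # note sits on an offbeat eighth
            start += swing * 0.25          # 0.66 * 0.25 ~= triplet delay
        start += rng.uniform(-0.02, 0.02) * humanize  # timing jitter
        out.append({**ev, "start": start})
    return out
```

With `humanize=0` the output is deterministic, which is what `humanize: 0 = machine-perfect` means in the table above.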

5. Audio Rendering

The generated multi-track MIDI file is rendered to audio in two stages: FluidSynth synthesizes the MIDI to a WAV file, then FFmpeg encodes the WAV to MP3.

On Railway, rendering uses FluidSynth exclusively. Locally, the system can also use sfizz with higher-quality SFZ instrument libraries for per-track rendering with individual reverb and EQ, but these libraries are too large (4.3GB) for cloud deployment.
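The two-stage render maps naturally onto two subprocess calls. The exact flags and soundfont path below are assumptions for illustration (FluidSynth's `-F` renders to a file instead of playing live; FFmpeg handles the MP3 encode):

```python
import subprocess

def build_render_commands(midi_path, soundfont="FluidR3_GM.sf2", out="song.mp3"):
    """Two-stage render pipeline: FluidSynth MIDI->WAV, FFmpeg WAV->MP3.
    Soundfont and bitrate are illustrative, not the service's actual config."""
    wav = midi_path.rsplit(".", 1)[0] + ".wav"
    return [
        ["fluidsynth", "-ni", "-F", wav, soundfont, midi_path],  # -F: render to file
        ["ffmpeg", "-y", "-i", wav, "-b:a", "192k", out],        # encode WAV to MP3
    ]

def render(midi_path, **kwargs):
    for cmd in build_render_commands(midi_path, **kwargs):
        subprocess.run(cmd, check=True)
```

Splitting command construction from execution keeps the pipeline easy to log and test without invoking the binaries.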

6. Architecture

7. Description Tips

The AI understands musical vocabulary and vibes. Here are some effective descriptions:

| Description | What GPT does |
|---|---|
| smoky jazz bar, walking bass | Jazz style, swing ~0.4, high `bass_activity`, tenor sax solo |
| lo-fi bedroom chill | High `humanize`, tempo drift, ghost notes, low drum intensity |
| epic cinematic crescendo | Strings/ambient, crescendo dynamics, high pad + harmony, rubato |
| tight funk groove | Syncopated accents, ghost notes, precise timing, pushed groove |
| dreamy ethereal ambient | Ambient style, note overlap, high pads, sparse drums, high humanize |
| aggressive heavy rock | Rock style, high drum intensity + comp density, pushed groove |
| intimate acoustic fingerpicking | Acoustic style, sparse density, low drums, legato articulation |