Generate multi-track audio from chord progressions. Describe how you want it to sound in plain English — AI translates your words into 26 musical parameters.
```bash
curl -X POST http://gamma.omegaai.dev/api/generate \
  -H "Content-Type: application/json" \
  -d '{"chords": "Dm7 G7 Cmaj7 Am7", "description": "smoky jazz bar, walking bass, sax solo"}' \
  --output song.mp3
```
A minimal request needs only `chords`:

```bash
curl -X POST http://gamma.omegaai.dev/api/generate \
  -H "Content-Type: application/json" \
  -d '{"chords": "C G Am F"}' \
  --output song.mp3
```
Returns an MP3 file. Generation typically takes 3–15 seconds depending on song length.
Request body (JSON):
| Field | Type | Description |
|---|---|---|
| chords | string, required | Chord progression. Space-separated symbols; newlines for sections. e.g. `"Dm7 G7 Cmaj7 Am7"` |
| description | string, optional | Natural-language description of the desired sound. GPT-4.1-mini translates it into variation parameters. e.g. `"smoky jazz bar, walking bass"` |
| style | string, optional | Force a specific style, overriding GPT/auto-detect. One of the 13 styles below. |
| bpm | integer, optional | Force a specific tempo, overriding GPT/auto-detect. |
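For scripted use, here is a minimal Python client sketch (stdlib only; the helper names `build_payload` and `generate` are illustrative, not part of the API):

```python
import json
import urllib.request

API_URL = "http://gamma.omegaai.dev/api/generate"

def build_payload(chords, description=None, style=None, bpm=None):
    """Build the JSON request body; only `chords` is required."""
    if not chords or not chords.strip():
        raise ValueError("chords is required")
    payload = {"chords": chords}
    if description:
        payload["description"] = description
    if style:
        payload["style"] = style
    if bpm:
        payload["bpm"] = int(bpm)
    return payload

def generate(chords, out_path="song.mp3", **kwargs):
    """POST the payload and save the returned MP3 (requires a running server)."""
    body = json.dumps(build_payload(chords, **kwargs)).encode()
    req = urllib.request.Request(
        API_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp, open(out_path, "wb") as f:
        f.write(resp.read())
        return resp.headers.get("X-Style"), resp.headers.get("X-BPM")
```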
Response:

- `Content-Type: audio/mpeg` — MP3 binary
- `X-Style` — style used (e.g. `jazz`)
- `X-BPM` — tempo used
- `X-Generation-Time` — how long generation took
- `X-Metadata` — full JSON metadata including GPT params

The health check returns `{"status": "ok"}`, and an auto-generated OpenAPI/Swagger UI is available for testing the API interactively.
Standard chord symbols are supported. Separate chords with spaces. Use newlines (or \n in JSON) to denote sections.
| Type | Examples |
|---|---|
| Major | C, F, Bb |
| Minor | Am, Dm, F#m |
| Seventh | G7, Cmaj7, Dm7, Am7b5 |
| Extended | Cmaj9, Dm11, G13 |
| Suspended | Csus4, Dsus2, C7sus4 |
| Altered | G7#9, C7b9, Db7#11 |
| Diminished | Bdim, Cdim7 |
| Augmented | Caug, C+ |
| Slash chords | C/E, Am/G, Dm7/C |
| Power chords | C5, G5 |
| Add chords | Cadd9, Fmadd9 |
```json
{"chords": "Am F C G\nAm F C G\nDm Am F E7"}
```
13 built-in styles, each with distinct instruments, drum patterns, and character. If you provide a description, GPT picks the best style automatically. Otherwise, the system auto-detects from chord content.
| Status | Cause | Detail |
|---|---|---|
| 400 | No valid chords in input | Check chord symbols for typos |
| 400 | Invalid style name | Response lists all valid styles |
| 500 | Rendering failure | Server-side issue with FluidSynth/FFmpeg |
| 504 | Generation timeout | Took longer than 120 seconds |
If the OpenAI call fails (rate limit, network error), the API does not return an error. It falls back to auto-detecting the style from chord content and applying a random variation. You always get music back.
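In outline, that fallback could look like this sketch (both function bodies are hypothetical stand-ins for the real OpenAI call and detector):

```python
import random

def gpt_params(description):
    """Stand-in for the real OpenAI call; may raise on rate limit/network error."""
    raise ConnectionError("rate limited")  # simulate an OpenAI failure

def auto_detect_params(chords):
    """Stand-in for chord-content style detection plus a random variation."""
    style = "jazz" if any("7" in c for c in chords.split()) else "pop"
    return {"style": style, "swing": random.uniform(0.0, 0.66)}

def choose_params(chords, description=None):
    """Prefer GPT when a description is given, but never fail the request."""
    if description:
        try:
            return gpt_params(description)
        except Exception:
            pass  # fall through: the caller always gets music back
    return auto_detect_params(chords)
```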
When you provide a description, it is sent to OpenAI's GPT-4.1-mini with a carefully engineered system prompt that encodes the available styles and the full set of variation parameters.
GPT returns a JSON object selecting a style, BPM, and whichever variation parameters it deems relevant. The API then validates and clamps every value to its legal range before use — GPT never gets unchecked access to the generator.
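The validate-and-clamp step can be sketched like this (the ranges shown are a subset copied from the variation-parameter table; the helper name is illustrative):

```python
# Legal ranges for a few variation parameters (subset of the full table).
RANGES = {
    "swing": (0.0, 0.66),
    "humanize": (0.0, 1.0),
    "drum_intensity": (0.0, 1.5),
    "groove_offset": (-0.03, 0.03),
}

def clamp_params(gpt_json):
    """Clamp every GPT-proposed value into its legal range; drop unknown keys."""
    out = {}
    for key, value in gpt_json.items():
        if key not in RANGES:
            continue  # unknown keys never reach the generator
        lo, hi = RANGES[key]
        out[key] = min(max(float(value), lo), hi)
    return out
```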
If no description is provided, the system uses auto-detection: it analyzes the harmonic content of your chords (ratio of minor chords, jazz extensions, sus chords, power chords, etc.) and picks the most appropriate style with weighted randomization. A random variation with musical coherence (one of 8 "personality" archetypes like tight, loose, intimate, energetic, swung, cinematic, minimal, lush) is applied so each generation sounds unique.
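A sketch of how such weighted detection might work (the styles, ratios, and weights here are illustrative, not the real heuristics):

```python
import random

def detect_style(chords):
    """Score styles from harmonic content, then pick one with weighted randomization."""
    symbols = chords.split()
    n = len(symbols) or 1
    # Crude harmonic-content ratios: minor chords, jazz extensions, power chords.
    minor_ratio = sum("m" in s and "maj" not in s for s in symbols) / n
    jazz_ratio = sum(any(x in s for x in ("7", "9", "11", "13")) for s in symbols) / n
    power_ratio = sum(s.endswith("5") for s in symbols) / n
    weights = {
        "jazz": 1.0 + 4.0 * jazz_ratio,
        "rock": 1.0 + 4.0 * power_ratio,
        "ballad": 1.0 + 2.0 * minor_ratio,
        "pop": 1.0,
    }
    styles = list(weights)
    return random.choices(styles, weights=[weights[s] for s in styles])[0]
```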
The parser converts chord symbols into MIDI note numbers. It handles slash chords: `Am/G` puts G in the bass register while the Am voicing sits in the mid register. Each chord becomes a bass note (placed in the 28–47 MIDI range) and a set of voicing notes (48–84 range), which are then distributed across tracks according to the style.
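A heavily simplified sketch of the idea (interval tables abbreviated; the real parser handles far more symbols):

```python
NOTE_TO_PC = {"C": 0, "C#": 1, "Db": 1, "D": 2, "Eb": 3, "E": 4, "F": 5,
              "F#": 6, "G": 7, "Ab": 8, "A": 9, "Bb": 10, "B": 11}
QUALITY_INTERVALS = {"": (0, 4, 7), "m": (0, 3, 7), "7": (0, 4, 7, 10),
                     "maj7": (0, 4, 7, 11), "m7": (0, 3, 7, 10)}

def parse_chord(symbol):
    """Return (bass_midi, voicing_midis) for a chord symbol, incl. slash chords."""
    main, _, slash = symbol.partition("/")
    root = main[:2] if len(main) > 1 and main[1] in "#b" else main[:1]
    quality = main[len(root):]
    intervals = QUALITY_INTERVALS.get(quality, (0, 4, 7))
    bass_pc = NOTE_TO_PC[slash] if slash else NOTE_TO_PC[root]
    bass = 28 + bass_pc                                   # bass register (28-47)
    voicing = [48 + NOTE_TO_PC[root] + i for i in intervals]  # mid register (48-84)
    return bass, voicing
```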
The core of the system. For each chord in the progression, up to 7 simultaneous tracks are generated:
| Track | Channel | Role |
|---|---|---|
| Comp | 0 | Primary chording instrument (guitar, piano, synth pad). Plays chord voicings with style-specific patterns: arpeggios, strums, block chords, rhythm stabs, power chugs. |
| Bass | 1 | Bass line. Patterns include root-only, root-fifth, walking bass, octave jumps, synth pulse, reggae. Activity controlled by bass_activity. |
| Harmony | 2 | Secondary harmony — adds color. Patterns include thirds above, sixths below, octave doubling, high voicings, sparse fifths. |
| Pad | 3 | Sustained atmospheric layer (strings, warm pads, choir). Swells and sustains underneath everything. |
| Solo | 4 | Lead instrument playing improvised melodies. Uses scale-aware note selection (maps each chord to its appropriate mode) with configurable complexity. |
| Fill | 5 | Counter-melody fills between chords. Arpeggio runs, descending lines, trills, chord stabs. |
| Drums | 9 | Full GM drum kit. Style-specific patterns (jazz ride, rock kick-snare, electronic hi-hats, Latin congas, etc.) plus auxiliary percussion. |
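A per-chord generation loop along these lines (the event format and patterns are illustrative, and only three of the seven tracks are shown):

```python
TRACK_CHANNELS = {"comp": 0, "bass": 1, "harmony": 2, "pad": 3,
                  "solo": 4, "fill": 5, "drums": 9}

def generate_chord_events(bass_note, voicing, beats=4):
    """Emit (channel, midi_note, start_beat, duration) events for one chord."""
    events = []
    # Bass: root on every beat (the simplest "root-only" pattern).
    for beat in range(beats):
        events.append((TRACK_CHANNELS["bass"], bass_note, beat, 1.0))
    # Comp: block chord held for the whole bar.
    for note in voicing:
        events.append((TRACK_CHANNELS["comp"], note, 0, float(beats)))
    # Pad: sustained voicing an octave up, underneath everything.
    for note in voicing:
        events.append((TRACK_CHANNELS["pad"], note + 12, 0, float(beats)))
    return events
```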
These parameters control the musical character at a fine-grained level. GPT sets them from your description, or they're randomized coherently.
| Parameter | Range | What it does |
|---|---|---|
| swing | 0.0 – 0.66 | Offbeat delay. 0 = straight, 0.33 = light swing, 0.66 = heavy triplet swing. |
| articulation | 0.3 – 1.5 | Note duration multiplier. Low = staccato (short, punchy), high = legato (sustained, flowing). |
| register | -1, 0, 1 | Octave shift for the comping instrument. -1 = low and warm, +1 = high and bright. |
| drum_intensity | 0.0 – 1.5 | Drum velocity/presence. 0 = silent, 0.3 = ghost notes only, 1.0 = normal, 1.5 = heavy hitting. |
| dynamics_shape | 6 options | Volume envelope over the whole piece: flat, arc (build-climax-resolve), crescendo, decrescendo, wave (two waves), swell (quick build, sustain). |
| groove_offset | -0.03 – 0.03 | Timing push/pull. Negative = ahead of the beat (driving), positive = behind (laid-back). |
| humanize | 0.0 – 1.0 | Random timing and velocity variation. 0 = machine-perfect, 0.5 = natural feel, 1.0 = loose/drunk. |
| fill_freq | 0, 4, 8, 16 | Drum fill frequency. 0 = only at section breaks, 4 = every 4 chords, etc. |
| voicing_spread | -1, 0, 1 | Chord voicing width. -1 = tight/close, 0 = normal, +1 = open/spread voicings. |
| bass_octave | -1, 0, 1 | Bass register. -1 = sub bass (rumble), 0 = normal, +1 = higher (lighter). |
| tempo_drift | 0.0 – 0.05 | Rubato amount. Subtle BPM fluctuations per chord for an expressive, human feel. |
| note_overlap | -0.15 – 0.15 | Negative = gaps between notes (choppy), positive = notes bleed into each other (sustain pedal feel). |
| ghost_notes | 0.0 – 1.0 | Extra quiet notes between main beats. Adds rhythmic texture and groove. |
| accent_pattern | 4 options | Which beats get velocity boosts: downbeat (1), backbeat (2&4), all (even), syncopated (offbeats). |
| comp_density | 0.3 – 1.5 | How busy the comping instrument is. Low = sparse, high = fills every beat with hits. |
| harmony_amount | 0.0 – 1.5 | Volume of the secondary harmony track. 0 = silent, 1.5 = prominent. |
| pad_amount | 0.0 – 1.5 | Sustained pad track level. Higher = lush atmospheric wash. |
| bass_activity | 0.3 – 1.5 | Bass complexity. 0.5 = root notes only, 1.0 = normal pattern, 1.5 = busy walking lines. |
| crash_freq | 0.0 – 0.3 | Extra crash cymbal probability on each chord change. |
| chord_rhythm | 4 options | How much chord durations vary: uniform (all equal), normal, varied, dramatic (big contrasts). |
| merge_repeats | bool | If true, consecutive identical chords merge into one long sustained chord. |
| hold_endings | bool | If true, the last chord of each section is held longer for a natural phrase ending. |
| solo_amount | 0.0 – 1.0 | How much the solo instrument plays. 0 = silent, 0.5 = occasional phrases, 1.0 = constant lead. |
| solo_complexity | 0.0 – 1.0 | Solo note selection. Low = simple chord tones, mid = scale runs, high = chromatic passing tones. |
| aux_perc | 0.0 – 1.0 | Auxiliary percussion presence (shakers, tambourines, bongos, congas). Style-dependent instruments. |
| fill_melody | 0.0 – 1.0 | Melodic fill frequency between chords. Counter-melodies and transitions. |
| style_morph | 0.0 – 1.0 | Genre changes at section boundaries. 0 = consistent style throughout, 1.0 = new genre every section. Prefers contrasting families (e.g. jazz → rock, ambient → electronic). |
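To make the timing parameters concrete, here is an illustrative sketch of how `swing`, `groove_offset`, and `humanize` could shift note onsets (not the actual implementation):

```python
import random

def apply_timing(onset_beats, swing=0.0, groove_offset=0.0, humanize=0.0):
    """Shift a note onset (in beats) by swing, groove push/pull, and random jitter."""
    shifted = onset_beats + groove_offset
    # Swing delays only the offbeat eighth notes (the ".5" positions).
    if abs(onset_beats % 1.0 - 0.5) < 1e-9:
        shifted += swing * 0.5  # swing=0.66 gives a heavy triplet feel
    # Humanize adds small random timing variation around the grid.
    shifted += random.uniform(-0.02, 0.02) * humanize
    return shifted
```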
The generated multi-track MIDI file is rendered to audio in two stages: the MIDI is synthesized to a WAV file, which is then encoded to MP3.
On Railway, rendering uses FluidSynth exclusively. Locally, the system can also use sfizz with higher-quality SFZ instrument libraries for per-track rendering with individual reverb and EQ, but these libraries are too large (4.3GB) for cloud deployment.
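The FluidSynth/FFmpeg path can be approximated with two commands; this sketch only assembles the argument lists (the paths and the 192k bitrate are placeholders, not the server's actual settings):

```python
def render_commands(midi_path, soundfont, wav_path, mp3_path):
    """Two-stage render: FluidSynth synthesizes MIDI to WAV, FFmpeg encodes to MP3."""
    synth = ["fluidsynth", "-ni", soundfont, midi_path,
             "-F", wav_path, "-r", "44100"]
    encode = ["ffmpeg", "-y", "-i", wav_path,
              "-codec:a", "libmp3lame", "-b:a", "192k", mp3_path]
    return synth, encode
```

Each list can be executed with `subprocess.run(cmd, check=True)` when the binaries are installed.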
- Generation runs in `asyncio.run_in_executor`, so the event loop stays responsive. A semaphore limits concurrency to 2 generations to prevent memory exhaustion.
- Intermediate files are written to `/tmp`. After the MP3 streams back, a `BackgroundTask` cleans up all intermediate files.
- A health check is served at `/health`.

The AI understands musical vocabulary and vibes. Here are some effective descriptions:
| Description | What GPT does |
|---|---|
| smoky jazz bar, walking bass | Jazz style, swing ~0.4, high bass_activity, tenor sax solo |
| lo-fi bedroom chill | High humanize, tempo drift, ghost notes, low drum intensity |
| epic cinematic crescendo | Strings/ambient, crescendo dynamics, high pad + harmony, rubato |
| tight funk groove | Syncopated accents, ghost notes, precise timing, pushed groove |
| dreamy ethereal ambient | Ambient style, note overlap, high pads, sparse drums, high humanize |
| aggressive heavy rock | Rock style, high drum intensity + comp density, pushed groove |
| intimate acoustic fingerpicking | Acoustic style, sparse density, low drums, legato articulation |