Stable Audio 3.0: AI That Makes Six-Minute Songs

Stability AI, the team behind Stable Diffusion, just rolled out a new family of audio models called Stable Audio 3.0. And the headline claim? The top-tier model can churn out professional-grade music that runs more than six minutes long.

That's a real jump. Honestly, it changes what people can do with AI-generated audio in a single pass.

What's Actually Inside the Stable Audio 3.0 Lineup

The release isn't just one model. It's four, each built for a different kind of job:

Small SFX (459M parameters) — focused on sound effects
Small (459M parameters) — geared toward on-device generation
Medium (1.4B parameters) — full music compositions
Large (2.7B parameters) — the heavyweight, also for full compositions

The two small models work well for on-device sound and music generation up to two minutes long. Useful if you're building something lightweight, or just want generation to run locally without leaning on the cloud.
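
To make the lineup concrete, here is a minimal Python sketch that encodes the four models as plain data and filters them by clip length. The parameter counts, length limits, and open-weights status are the figures reported in this article (the 6 minute 20 second limit and the open-weights split are covered in the sections that follow). The `AudioModel` class and the `candidates` function are hypothetical names made up for illustration; they are not part of any Stability AI SDK.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AudioModel:
    name: str
    params_millions: int
    max_seconds: int     # longest clip the article attributes to the model
    open_weights: bool   # True if the article says the weights are open
    on_device: bool      # True if the article pitches it for on-device use


# Figures as reported in this article; not read from any official API.
LINEUP = [
    AudioModel("Small SFX", 459, 120, True, True),
    AudioModel("Small", 459, 120, True, True),
    AudioModel("Medium", 1400, 380, True, False),
    AudioModel("Large", 2700, 380, False, False),
]


def candidates(seconds, on_device_only=False, open_weights_only=False):
    """Return names of models whose reported limits cover the request."""
    return [
        m.name
        for m in LINEUP
        if seconds <= m.max_seconds
        and (m.on_device or not on_device_only)
        and (m.open_weights or not open_weights_only)
    ]


print(candidates(300))                           # ['Medium', 'Large']
print(candidates(90, on_device_only=True))       # ['Small SFX', 'Small']
print(candidates(380, open_weights_only=True))   # ['Medium']
```

A five-minute track only fits the medium and large models, and if you also need open weights, the medium model is the only option on this list.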

The Six-Minute Mark Is a Big Deal

Here's where things get interesting. Both the medium and large models can generate full compositions up to 6 minutes and 20 seconds long while keeping musical structure and melodic tone intact. That's more than double the length Stable Audio 2.0 could manage when it launched in 2024.

Maintaining melodic tone across that kind of runtime is the tricky part. Plenty of generation models can string sounds together. Holding a song's identity from start to finish? That's a different ask.

Open Weights for Three of the Four Models

Stability AI is making the small SFX, small, and medium models available with open weights. So anyone can grab them, run them, modify them. Back in 2024, the company shipped Stable Audio Open, which only handled music generation up to 47 seconds. The new lineup blows past that in pretty much every direction.
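
For a sense of scale, the length figures can be checked with a few lines of arithmetic. This is a rough sketch: the 6 minute 20 second and 47 second limits come from this article, while the three-minute limit for Stable Audio 2.0 is an assumption based on how that release was widely reported, since the article does not state it.

```python
def to_seconds(minutes, seconds=0):
    """Convert a minutes/seconds pair to total seconds."""
    return minutes * 60 + seconds


new_max = to_seconds(6, 20)          # Stable Audio 3.0 medium/large, per the article
open_2024_max = 47                   # Stable Audio Open, per the article
sa2_max_assumed = to_seconds(3)      # ASSUMPTION: Stable Audio 2.0 limit, not stated here

print(new_max)                                  # 380
print(round(new_max / open_2024_max, 1))        # 8.1
print(round(new_max / sa2_max_assumed, 2))      # 2.11
```

Under those numbers, the new ceiling is roughly eight times the 2024 open model's limit and a little over twice the assumed Stable Audio 2.0 limit.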

The large model, though, plays by different rules. It's available only as a paid service, through the API or self-hosting. And if your company pulls in more than $1 million in revenue, you'll need an enterprise license to use it.
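
The access rules described above boil down to a small decision function. The sketch below is only an illustration of the tiers as this article reports them; `access_terms` is a hypothetical helper, the article ties the $1 million threshold only to the large model, and it does not say over what period revenue is measured. The actual license terms may differ.

```python
# Models the article lists as open-weights releases.
OPEN_WEIGHT_MODELS = {"small sfx", "small", "medium"}

# Revenue threshold quoted in the article (period not specified there).
ENTERPRISE_REVENUE_USD = 1_000_000


def access_terms(model, company_revenue_usd):
    """Summarize how a model can be used, per the article's description."""
    key = model.strip().lower()
    if key in OPEN_WEIGHT_MODELS:
        return "open weights"
    if key == "large":
        # "More than $1 million" is read as a strict comparison.
        if company_revenue_usd > ENTERPRISE_REVENUE_USD:
            return "paid service + enterprise license"
        return "paid service"
    raise ValueError(f"unknown model: {model!r}")


print(access_terms("Medium", 5_000_000))   # open weights
print(access_terms("Large", 250_000))      # paid service
print(access_terms("Large", 2_000_000))    # paid service + enterprise license
```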

That tiered approach makes sense, kind of. Keep the experimental, hobbyist, and small-team uses free and open. Charge the bigger players who can afford it.

Why Licensed Training Data Matters Right Now

Music generation is getting crowded fast. Google has been pushing Lyria, ElevenLabs rolled out an AI-powered music generation app, and they're not the only ones in the space.

But here's the thing: the legal side of all this is messy. Suno and Udio are both fighting ongoing court battles. Training-data licensing and partnerships with music labels look like they'll be central to whether any of these services survive long-term.

Stability AI's Label Deals

Last year, Stability AI signed agreements with Warner Music Group and Universal Music Group to build models and music creation tools together. The company says its new audio models are trained entirely on licensed data. That's a meaningful claim in a market where lawsuits are still playing out.

A New Push for Professional Musicians

Stability AI is also building a suite of products aimed specifically at professional musicians. The company hasn't shared the features yet, so we're working with limited details here.

What we do know: Ethan Kaplan, formerly chief digital officer at Universal Audio and Fender, is joining to lead the professional music offering. That's a serious hire, someone with real audio industry experience.

The Music Exec Hiring Spree

Stability isn't the only AI company chasing credibility through executive hires. Earlier this year, Suno brought on Jeremy Sirota, the former Merlin CEO, as chief commercial officer. ElevenLabs picked up Derek Cournoyer from indie music publisher Kobalt to lead strategy for its music business.

You can see the pattern. AI companies want people who actually understand how the music industry works — licensing, labels, distribution, the whole machinery. It's not enough to ship a clever model anymore.