AI tools
January 18, 2024

AudioCraft is Midjourney for Audio

AudioCraft’s AI audio models can generate environmental sounds and music tracks from text descriptions.

John Paul Ada

In the immortal words of David Bowie, “Tomorrow belongs to those who can hear it coming.” What if the “it” the Starman was referring to was the future of audio modeling? Buckle up, audiophiles and tech enthusiasts: we’re about to explore the brave new world of AI-generated music with the unveiling of AudioCraft.

Picture a world where a classically trained pianist can pen new symphonies without caressing a single ivory key. Imagine indie game developers sculpting their virtual worlds with authentic soundscapes on a budget that makes shoestrings look like yacht moorings. Think about the mom-and-pop shop around the corner augmenting their latest Instagram post with a custom-made soundtrack by simply tapping on their smartphone.

Feeling a little like Marty McFly in Back to the Future? Hold on to your hoverboards, because these are the endless possibilities that AudioCraft brings to life.

Meet the AudioCraft Family

AudioCraft isn’t a lone ranger in the Wild West of AI technology. It’s more like the Avengers of audio modeling, a family of models with unique capabilities that work in harmony. Let’s introduce MusicGen, AudioGen, and EnCodec, the power trio behind AudioCraft.

MusicGen, trained on Meta-owned and specifically licensed music, and AudioGen, trained on public sound effects, are the maestros generating music and sound effects from text-based user inputs. They’re like an AI Beethoven taking your sonnets and converting them into symphonies.

EnCodec, meanwhile, is the neural audio codec of the trio; its improved decoder is the secret weapon behind high-quality music generation with fewer artifacts. It’s like the Gandalf to our Middle Earth, magically enhancing the world of audio modeling.

AudioCraft: A Leap Forward for Researchers and Practitioners

With AudioCraft, we’re placing the infinity stones of generative AI audio models into the hands of researchers and practitioners. The goal? To push the boundaries of current technology and unlock new realms in music generation and sound effects.

Historically, audio has been the Cinderella of the generative AI ball, often overshadowed by other forms of media. This is primarily due to the Herculean task of generating high-fidelity audio, which requires modeling complex signals and patterns at varying scales.

Remember trying to beat the Water Temple in The Legend of Zelda: Ocarina of Time? Yeah, generating music can be as complex and challenging as that. Previous attempts often relied on symbolic representations like MIDI or piano rolls. These efforts were like trying to learn the Force from a manual, often failing to capture the expressive nuances and stylistic elements found in music.

Breaking the Sound Barrier with the AudioCraft Models

The AudioCraft models are our Luke Skywalkers, taking on the Darth Vader of challenges in audio generation. By leveraging self-supervised audio representation learning and a hierarchy of models, they generate music that Han Solo would proudly blast on the Millennium Falcon.
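To make that mechanism concrete, here is a deliberately tiny, purely illustrative sketch of the idea: a model that autoregressively predicts discrete audio tokens, conditioned on a text prompt. Everything in it (the `NEXT` transition table, the prompt-length “conditioning”) is invented for illustration; the real models are transformer language models predicting EnCodec tokens.

```python
# Toy sketch of autoregressive generation over discrete audio tokens, the
# core idea behind MusicGen/AudioGen (which predict EnCodec tokens rather
# than raw samples). The "model" below is a hypothetical deterministic
# bigram table, not anything AudioCraft actually learns.

NEXT = {0: 1, 1: 2, 2: 3, 3: 0}  # invented next-token rule

def generate_tokens(prompt, length):
    """Condition on a text prompt (here: just its length, standing in for
    a real text encoder) and autoregressively emit audio token IDs."""
    token = len(prompt) % 4          # fake conditioning signal
    out = [token]
    for _ in range(length - 1):
        token = NEXT[token]          # predict the next token from the last
        out.append(token)
    return out

print(generate_tokens("warm arpeggio", 6))  # → [1, 2, 3, 0, 1, 2]
```

In the real system, the resulting token sequence is handed to EnCodec’s decoder, which turns it back into a waveform.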

AudioCraft’s innovative design extends beyond just generating music and sound effects. It’s also a game-changer in audio compression. It’s the Swiss Army knife for those who want to build better sound generators, compression algorithms, or music generators. It’s a one-stop codebase that users can enhance and build upon like Minecraft players crafting their virtual worlds.

Generating SFX from text prompts.
Generating music from text prompts.

Creating Custom Audio Experiences

The generation of audio from raw signals is like trying to solve a Rubik’s Cube blindfolded: the sequences involved are long and complex. To solve this, AudioCraft employs the EnCodec neural audio codec, an audio whisperer that learns discrete audio tokens from the raw signal. This gives us a new “vocabulary” for music samples.
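A toy sketch can show what “discrete audio tokens” means in practice. The codebook values below are invented for illustration; the real EnCodec learns its codebooks end-to-end and stacks several of them as residual vector quantization:

```python
# Toy illustration of codec-style tokenization: quantize frames of a raw
# signal to the nearest entry of a small "codebook", producing discrete
# token IDs. Codebook values here are made up, not learned.

CODEBOOK = [-0.75, -0.25, 0.0, 0.25, 0.75]  # hypothetical learned code values

def tokenize(signal, frame_size=4):
    """Quantize each frame's mean amplitude to the nearest codebook ID."""
    tokens = []
    for i in range(0, len(signal), frame_size):
        frame = signal[i:i + frame_size]
        mean = sum(frame) / len(frame)
        tokens.append(min(range(len(CODEBOOK)),
                          key=lambda k: abs(CODEBOOK[k] - mean)))
    return tokens

def detokenize(tokens, frame_size=4):
    """Rough reconstruction: expand each token back to its code value."""
    return [CODEBOOK[t] for t in tokens for _ in range(frame_size)]

signal = [0.1, 0.2, 0.3, 0.4, -0.8, -0.7, -0.6, -0.9]
print(tokenize(signal))  # → [3, 0]: eight samples became two token IDs
```

A generator that predicts token IDs instead of raw samples works over sequences hundreds of times shorter, which is exactly the point of the new “vocabulary”.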

This ability to generate environmental sounds and music tracks from plain text descriptions is what sets AudioCraft apart like an Ewok at a Wookiee convention. It’s a powerful tool for creating immersive audio experiences.

Spreading the Sound with Open-Source

Sharing is caring. Maybe that’s why Meta open-sourced AudioCraft. It’s a call to arms for the research community to come together like the Fellowship of the Ring and explore the possibilities that AudioCraft offers.

The future is full of possibilities. For musicians, AudioCraft can provide inspiration, kickstart brainstorming sessions, and transform the early stages of prototyping. It’s a tool for everyone, from AAA developers building worlds for the metaverse, to indie musicians working on their next album, to small business owners keen on enhancing their creative assets.

The Future of AudioCraft

The open-sourcing of AudioCraft is a giant leap forward in the development of advanced human-computer interaction models. With its simple approach to generating robust, coherent, and high-quality audio samples, we’re as excited as a kid opening a Wonka bar to see what the community will create with it.

In conclusion, AudioCraft isn’t just a development in the field of audio modeling; it’s a revolution. With AudioCraft, we can’t wait to hear what the future sounds like.