NVIDIA Unveils Fugatto: A Revolutionary 2.5B Parameter AI Audio Generator

Published December 15, 2024

In a groundbreaking leap for audio AI, NVIDIA has introduced Fugatto, a 2.5-billion-parameter AI audio generator designed to redefine how sound is created and transformed. Developed by a team of generative AI researchers, Fugatto can produce and manipulate music, voices, and sounds from simple text and audio prompts. Heralded as a “Swiss Army knife for sound,” the model pushes the boundaries of AI-powered creativity, enabling users to generate sounds never heard before.

The All-in-One Solution for Audio AI

While many AI models specialize in isolated tasks like composing songs or modulating voices, Fugatto’s flexibility sets it apart. From crafting music snippets based on textual descriptions to adding or removing instruments from existing tracks, it handles multiple audio generation and transformation tasks in a single model. That versatility unlocks potential in multiple fields:

Music Production: Artists can prototype song ideas, experiment with styles, and fine-tune tracks.
Advertising: Agencies can tailor voiceovers with different accents and emotional tones for localized campaigns.
Education: Language learning tools could replicate a learner’s chosen voice, from a family member to a fictional character.
Gaming: Developers can dynamically modify or create audio assets based on in-game actions.

Emergent Capabilities in Generative AI Research

Fugatto leverages emergent properties—unexpected abilities arising from its diverse training—allowing users to combine free-form instructions into complex, layered outputs. For instance, it can produce speech in a French accent infused with sadness or blend auditory elements like thunderstorms transitioning into birdsong. With fine-tuning, Fugatto can perform tasks it wasn’t explicitly trained on, such as generating high-quality singing voices from text prompts.

Fugatto’s ComposableART feature enables real-time instruction blending, giving creators nuanced control over attributes like accent intensity or tonal shifts. The model’s temporal interpolation feature further allows users to shape how sound evolves over time, such as crafting a thunderstorm crescendo that transitions into a serene dawn chorus.
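
To make the idea of instruction blending and temporal interpolation more concrete, here is a minimal Python sketch. It is not NVIDIA's API and does not reflect how Fugatto is implemented, which this article does not describe. It simply assumes that each instruction can be given a numeric weight, and that those weights can be interpolated linearly over time. The function names blend_schedule and blend_vectors, the instruction labels, and the toy two-number vectors are all hypothetical.

```python
from typing import Dict, List


def lerp(a: float, b: float, t: float) -> float:
    """Linear interpolation between a and b for t in [0, 1]."""
    return a + (b - a) * t


def blend_schedule(
    start: Dict[str, float], end: Dict[str, float], steps: int
) -> List[Dict[str, float]]:
    """Hypothetical helper: per-step instruction weights moving from start to end.

    Instructions missing from one side are treated as having weight 0.0 there.
    """
    if steps < 2:
        raise ValueError("steps must be at least 2")
    names = sorted(set(start) | set(end))
    schedule = []
    for i in range(steps):
        t = i / (steps - 1)  # 0.0 at the first step, 1.0 at the last
        schedule.append(
            {n: lerp(start.get(n, 0.0), end.get(n, 0.0), t) for n in names}
        )
    return schedule


def blend_vectors(
    vectors: Dict[str, List[float]], weights: Dict[str, float]
) -> List[float]:
    """Hypothetical helper: weighted sum of equal-length vectors, one per instruction."""
    length = len(next(iter(vectors.values())))
    mixed = [0.0] * length
    for name, vec in vectors.items():
        if len(vec) != length:
            raise ValueError("all vectors must have the same length")
        w = weights.get(name, 0.0)
        for j, value in enumerate(vec):
            mixed[j] += w * value
    return mixed


if __name__ == "__main__":
    # Toy stand-ins for whatever internal representation a real model would use.
    toy_vectors = {"thunderstorm": [1.0, 0.0], "birdsong": [0.0, 1.0]}
    for weights in blend_schedule({"thunderstorm": 1.0}, {"birdsong": 1.0}, 5):
        print(weights, blend_vectors(toy_vectors, weights))
```

In this toy version, the thunderstorm weight falls from 1.0 to 0.0 while the birdsong weight rises from 0.0 to 1.0, which mirrors the storm-to-dawn-chorus example above. Holding the weights fixed at intermediate values instead would correspond to dialing an attribute such as accent intensity up or down.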

“In my tests,” said Rohan Badlani, an AI researcher who helped design the model, “Fugatto often made me feel like an artist.”

Fugatto’s creation was a monumental undertaking. Its 2.5 billion parameters were trained on NVIDIA DGX systems using 32 H100 Tensor Core GPUs. The development team—a global collaboration spanning Brazil, India, China, and beyond—spent over a year curating millions of diverse audio samples and uncovering new relationships in data.

