Meta announces AudioCraft, an AI tool that converts simple text into audio and music

Meta has released a new open-source AI tool called AudioCraft. According to the company, the tool is designed to let both professional musicians and everyday users generate audio and music from simple text prompts.

AudioCraft consists of three models: MusicGen, AudioGen, and EnCodec. MusicGen generates music from text inputs and was trained on Meta-owned and specifically licensed music. AudioGen, by contrast, was trained on public sound effects and generates audio from text inputs. Meta has also released an improved version of the EnCodec decoder, which allows higher-quality music generation with fewer unwanted artefacts.

Uses of the new AudioCraft tool

Meta is making its pre-trained AudioGen models available, which will allow users to generate environmental sounds and sound effects such as dogs barking, cars honking, or footsteps on a wooden floor. Meta is also releasing all of the AudioCraft model weights and code. Music composition, sound effect creation, compression algorithms, and audio generation are just a few of the many uses for the new tool.

By releasing these models publicly, Meta hopes to let researchers and practitioners train their own models on their own datasets.

Meta says that generative AI has made huge strides in images, video, and text, but that audio has not seen the same degree of progress. AudioCraft addresses this gap by providing a more accessible and user-friendly platform for generating high-quality audio.

In its official blog, Meta explains that generating realistic, high-fidelity audio is especially difficult because it involves modelling complex signals and patterns at varying scales. Music, which is composed of both local and long-range patterns, presents a particular challenge for audio generation.

According to Meta, AudioCraft can produce high-quality audio that stays consistent over long durations. The company also says it has simplified the design of generative models for audio, making it easier for users to experiment with the existing models.