April 11, 2024

LTI's Watanabe Helps Expand Music Generation With Adobe

The LTI's Shinji Watanabe was part of a group of SCS researchers on the team behind one of Adobe's newest ventures, the generative AI music creation and editing tool called Project Music GenAI Control.

By Marylee Williams

Shih-Lun Wu is an avid classical piano and viola player, but he learned viola because all the violin seats in his school orchestra had been taken. Now a student in Carnegie Mellon University's School of Computer Science, Wu uses generative AI and machine learning to make music creation more accessible and engaging for people of all abilities.

Wu, who is pursuing a master's degree in language technologies, was part of the research team behind one of Adobe's newest ventures, the generative AI music creation and editing tool called Project Music GenAI Control. Wu co-developed an aspect of the project, Music ControlNet, with SCS faculty members Chris Donahue of the Computer Science Department and Shinji Watanabe of the Language Technologies Institute. The CMU researchers collaborated with Nicholas J. Bryan, a senior research scientist and head of the Music AI research group at Adobe Research.

Music ControlNet gives users command over aspects of generated audio like melody, rhythm and dynamics. Users can play, compose or draw what they want these elements — or any combination of them — to look like. The user then combines that information with a text prompt, such as "happy jazz" or "sad country," that they send to a generative AI model to produce audio. Music ControlNet can also co-create with the user, improvising on a partial melody or other inputs.
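
As a rough illustration of what a time-varying control such as dynamics might look like as data, the Python sketch below turns a waveform into one loudness value per short frame. It is only a sketch under stated assumptions: the function name `dynamics_curve`, the frame and hop sizes, the decibel floor and the synthetic fade-in tone are all hypothetical choices made for this example, not details of Music ControlNet or of Adobe's tool.

```python
import math


def dynamics_curve(samples, frame_size=1024, hop=512, floor_db=-80.0):
    """Return one loudness value in dB per frame of the input samples.

    Hypothetical helper for illustration only; the parameters are assumptions,
    not values taken from Music ControlNet.
    """
    curve = []
    last_start = max(len(samples) - frame_size, 0)
    for start in range(0, last_start + 1, hop):
        frame = samples[start:start + frame_size]
        if not frame:
            break  # empty input: nothing to measure
        # Root-mean-square energy of the frame, converted to decibels.
        rms = math.sqrt(sum(x * x for x in frame) / len(frame))
        db = 20.0 * math.log10(rms) if rms > 0 else floor_db
        curve.append(max(db, floor_db))
    return curve


# Hypothetical input: a one-second 440 Hz tone that fades in linearly.
sample_rate = 16000
tone = [
    (n / sample_rate) * math.sin(2 * math.pi * 440 * n / sample_rate)
    for n in range(sample_rate)
]
curve = dynamics_curve(tone)

# The curve rises over time, mirroring the fade-in.
print(len(curve), round(curve[0], 1), round(curve[-1], 1))
```

A curve like this is the kind of thing a user could, in principle, sketch by hand instead of measuring from a recording: a line that rises where the music should swell and falls where it should quiet down.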

Wu and colleagues adapted previous research into pixel-level controllable image generation to create Music ControlNet. The melody, rhythm or dynamic references and text inputs are transformed into an image-like representation of the music, which is then converted to audio. This work was accepted to IEEE/ACM Transactions on Audio, Speech, and Language Processing, a journal covering audio, speech and music research.

"This is just the beginning," Wu said. "Ultimately, with more effective and comprehensive controls, we hope to accelerate the creative workflow for music professionals, and we can make music creation easier for the general public."

Adobe's Project Music GenAI Control also involved researchers at the University of California, San Diego. Learn more about the project on Adobe's website.