Exploring ICML 2024: Latest Advances in AI and Audio Research

2024.08.30 by Kaya Chung

Introduction

 

Hello again, it’s Kaya back with more updates!
As a researcher focused on audio AI at Gaudio Lab, I often get the chance to attend various conferences. You might remember my recent post on ICASSP 2024 & Gaudio Night.

 

This time, I’m excited to share my experience attending ICML (the International Conference on Machine Learning) 2024, which was held in beautiful Vienna, Austria.

 

ICML is one of the most prominent AI conferences in the world, along with ICLR and NeurIPS, and Gaudio Lab makes it a point to attend every year. It’s an incredible opportunity for researchers like me to connect with others in the field and discover the latest breakthroughs.

 

Unlike my last post, which focused on the event atmosphere, this one will dive into the innovative ideas and insights I gained from ICML 2024!

 

The ICML 2024 venue in Vienna, Austria. Wien~

 

 

 

Highlights from ICML 2024

 

Spotlighted Papers

 

Let’s start with a few standout research papers I encountered. The first one that caught my eye was "Discrete Diffusion Modeling by Estimating the Ratios of the Data Distribution." This study introduces a novel approach to applying diffusion models to discrete data such as text. Its method, called SEDD, is reported to outperform earlier language diffusion approaches and to be competitive with autoregressive language models, even surpassing GPT-2. This research significantly broadens the applicability of diffusion models and has made a notable impact in the AI community.

 

 

Lou, Aaron, Chenlin Meng, and Stefano Ermon. "Discrete Diffusion Language Modeling by Estimating the Ratios of the Data Distribution."

 

 

 

Another intriguing study was titled "Debating with More Persuasive LLMs Leads to More Truthful Answers." This paper presents experimental evidence that AI models can enhance their accuracy through debates—a concept that once seemed confined to the realm of imagination. This research could play a crucial role in addressing issues with misinformation in models like ChatGPT.

 

 
 

Khan, Akbir, et al. "Debating with More Persuasive LLMs Leads to More Truthful Answers."

 

 

 

 

The Introduction of “Position” Papers

 

A new category of papers made its debut at this year’s ICML: Position papers. These papers don’t necessarily propose new models or ideas; instead, they challenge current academic norms and provoke deep reflection.

 

One particularly compelling paper was titled "Position: Measure Dataset Diversity, Don’t Just Claim It." This study argues that simply asserting dataset diversity isn’t enough. The authors analyzed 135 image and text datasets to offer a fresh perspective on the topic, reminding us as AI researchers to deeply consider the fairness and inclusiveness of datasets.

 

 

 

Trends in Audio Research

 

ICML 2024 also featured a number of exciting papers on audio AI.

 

The current trend in this field is a focus on more sophisticated generative models, which is evident across areas from music generation to general-purpose audio synthesis.

 

One of the most impressive studies was "DITTO: Diffusion Inference-Time T-Optimization for Music Generation." This paper introduces a technique that allows for precise control over the intensity, melody, and structure of generated music, marking a significant step forward in music AI. The future of music generation looks incredibly promising with such advancements.

 

 

 

Novack, Zachary, et al. "DITTO: Diffusion Inference-Time T-Optimization for Music Generation."

 

 

 

Additionally, Video-to-Audio Generation has emerged as a hot topic. With tools like OpenAI’s “Sora” pushing the boundaries of video generation, the creation of matching audio for these videos has become a critical research area. Google proposed a model called "VideoPoet" that generates both video and audio simultaneously, while Adobe introduced "Masked Generative Video-to-Audio Transformers with Enhanced Synchronicity," focusing on syncing sound effects with video actions.

 

 

 

 

 

These developments naturally reminded me of Gaudio Lab’s own sound effects generation model, FALL-E. It’s a technology that even caught the attention of Microsoft’s CEO, Satya Nadella, at CES. A few months ago, we demonstrated how FALL-E could generate sound effects synchronized with Sora videos. Seeing these trends at ICML reinforced my pride in how quickly Gaudio Lab picks up on where the industry is heading and reaffirmed our team’s research direction. 😎

 

 

 

Conclusion

 

In this post, I’ve shared some of the diverse trends and fascinating research from ICML 2024.
I was so engaged in exploring all the new studies that I found myself running around the conference hall to keep up with everything! 🏃‍♀️

 
 

One particularly fun experience was a poster session where researchers were given questions about large language models (LLMs) and then encouraged to debate, turning the session into a lively and spontaneous event.

 

 

Attending ICML 2024 and absorbing all the new trends and research has left me feeling inspired and ready to return to Korea to continue developing smarter AI models. I’m excited to apply the knowledge and inspiration I gained from the conference.

 

Hopefully, at the next conference, we’ll see Gaudio Lab’s research featured as a spotlighted paper! 💪 Until then, this wraps up my ICML 2024 recap. 😁

 

 
