
Introducing Gaudio Sing, the Next-Generation Karaoke System

2024.07.22 by Kirk Lee

 

Ever Heard of a Karaoke System Built by a Global Audio AI Tech Company?

 

After capturing worldwide attention with our generative AI 'Fall-E', we at Gaudio Lab are now on the brink of officially launching Gaudio Sing, our cutting-edge karaoke system.

You might be wondering, "Karaoke?"

Well, the concept behind Gaudio Sing is actually quite relatable—it all started with a simple desire that many of us have had: singing duets with our favorite artists or belting out tunes to original backing tracks.

 

At the heart of our innovation lie our core AI technologies. GSEP (Gaudio Source SEParation) enables real-time vocal removal, while GTS (Gaudio Text Sync) automatically synchronizes lyrics—an AI capability already adopted by music streaming services worldwide. Together, these technologies create a karaoke feature that lets you sing along to original tracks as if by magic.
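To give a feel for what vocal removal means in code: long before AI separation, a classic (and crude) baseline was stereo center-cancellation, which attenuates vocals mixed equally into both channels by subtracting one channel from the other. This is only an illustrative sketch—GSEP's AI-based separation works very differently and far more robustly. The function name and sample format here are our own illustration, not Gaudio's API.

```python
def remove_center_vocals(left, right):
    """Classic stereo center-cancellation baseline.

    Vocals panned dead-center appear identically in both channels,
    so (left - right) cancels them while keeping side-panned
    instruments. A crude pre-AI trick -- GSEP's learned separation
    is a different, far more capable approach.
    `left` and `right` are lists of float samples in [-1.0, 1.0].
    """
    return [0.5 * (l - r) for l, r in zip(left, right)]
```

Notice that any instrument also panned to the center is lost along with the vocals, which is exactly the limitation that learned source separation overcomes.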

 

But that's not all: our expertise in signal processing and AI powers features like sound filters (reverb, echo), drum/beat separation for pitch and tempo adjustment, and detailed scoring. What may seem like a simple karaoke setup is, in fact, a sophisticated blend of state-of-the-art audio technologies.
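As a taste of the signal processing involved, an echo effect can be sketched as a simple feedforward delay: mix a delayed, attenuated copy of the signal back into itself. This is a minimal textbook sketch under assumed parameters, not Gaudio Sing's actual filter implementation.

```python
def add_echo(samples, sample_rate, delay_s=0.25, decay=0.4):
    """Feedforward echo: out[n] = in[n] + decay * in[n - delay].

    `samples` is a list of float samples in [-1.0, 1.0];
    `delay_s` is the echo delay in seconds, `decay` the echo level.
    Illustrative only -- real karaoke reverb uses many such delay
    lines plus filtering.
    """
    delay = int(delay_s * sample_rate)
    out = list(samples)
    for i in range(delay, len(samples)):
        out[i] += decay * samples[i - delay]
    # Clamp to avoid clipping after the mix.
    return [max(-1.0, min(1.0, s)) for s in out]
```

A full reverb chains many such delays with feedback and filtering; this single tap is the building block.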

 

Gaudio Sing embodies our commitment to revolutionizing audio experiences, perfectly aligning with our mission to deliver the pinnacle of sound quality to users worldwide.

 


 

 

"Revolutionizing Karaoke ‘Spaces’ with Innovative Technology"

 

Traditional karaoke venues face growing competition from karaoke apps and software. These run on everything from smartphones to smart TVs, yet they still haven't become widely popular. A comfortable 'space' to sing is crucial for users, but existing apps often overlook the importance of a fresh user experience and a comfortable singing environment.

 

We believe in the importance of karaoke spaces and the experiences within them. Providing an exceptional sound experience in a karaoke space is paramount. Our goal isn't merely to launch new karaoke software but to revolutionize the existing karaoke experience with Gaudio Lab's technology.

 

With Gaudio Sing, users in a karaoke room can select songs from their smartphones and enjoy features like automatically generated personalized playlists. We also offer playful elements that let users compare their singing skills against others nationwide, enabling both competition and collaboration. And because our system is implemented purely in software, without traditional accompaniment machines, we can use augmented reality (AR) to deliver an even more immersive karaoke room experience.

 

We aim to transform traditional karaoke rooms into cultural spaces where families, friends, and individuals can relax and enjoy themselves. This experience, which can't be replicated by mobile apps or smart TVs, is unique to physical spaces.

 

 

 

Starting in the Birthplace of Karaoke: Japan

 

Karaoke culture originated in Japan, but it has evolved differently in Korea. In Japan, it's common to see long lines forming outside popular karaoke facilities early in the morning, a stark contrast to Korea's karaoke rooms, which typically operate only at night. Japanese karaoke isn't just about singing; it's a "multi-room" concept where people gather to sing, practice instruments alone, watch idol performances, or even wait for friends as an alternative to a café.

 

With Gaudio Lab's core technology, GSA (Gaudio Spatial Audio), users can enjoy optimal sound effects that make them feel like they're in a concert hall. Additionally, GSEP allows for the separation of vocals and various instruments, enabling users to practice guitar solos to original tracks in karaoke rooms.
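To illustrate the simplest ingredient of spatial placement: constant-power panning positions a sound between two speakers while keeping its perceived loudness constant. This toy sketch is our own illustration; GSA's room and concert-hall modeling goes far beyond simple panning.

```python
import math

def pan_stereo(samples, angle):
    """Constant-power stereo panning.

    `angle` in [-1.0, 1.0]: -1 is hard left, 0 center, 1 hard right.
    Gains satisfy gain_l**2 + gain_r**2 == 1, so total power stays
    constant as the source moves. A toy illustration only -- GSA's
    spatial rendering models full rooms, not just left/right balance.
    """
    theta = (angle + 1.0) * math.pi / 4.0  # map [-1, 1] -> [0, pi/2]
    gain_l, gain_r = math.cos(theta), math.sin(theta)
    left = [s * gain_l for s in samples]
    right = [s * gain_r for s in samples]
    return left, right
```

The constant-power constraint is why a sound swept across the stereo field doesn't dip in loudness at the center, a property any spatial renderer must preserve.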

 

The COVID-19 pandemic hit Japan's karaoke market hard, but it remains a sizable $2.8 billion industry and is quickly recovering to pre-pandemic levels. The market is dominated by two hardware manufacturers and over ten karaoke chain operators who use their hardware, a closed structure that has slowed digitalization. This market size, the positive image of karaoke culture, and the outdated user experience caused by the monopolistic structure together present a unique opportunity for us. Our collaboration with a key player in Japan's karaoke industry has affirmed that Japan is the perfect place to bring our vision for Gaudio Sing to life.

 

In Japan, karaoke goes beyond mere entertainment; it's a crucial cultural element that fosters social bonds. People use karaoke to relieve stress and enjoy time with friends and family. Gaudio Lab aims to leverage this positive image of karaoke and introduce innovative technology to carve out a new market within Japan. Success in Japan will also positively impact our expansion into other countries.

 

 

 

In Conclusion: Music - The Universal Language

 

Music has a powerful ability to unite people across languages and cultures. There's a reason the phrase "Music is the universal language" exists: music is beloved the world over.

At the heart of this culture lies karaoke. We believe that Gaudio Sing, by offering this universal joy in a new way, will bring positive changes not only to Japan but also to our home country of Korea.

 

Thank you for joining us on this journey into the future of karaoke.

Stay tuned as we revolutionize the way we sing, connect, and celebrate through the power of music.

 
