Gaudio Lab's Continuous Challenge towards Clear Voice

Gaudio Lab's Continuous Challenge towards Clear Voice - Just Voice Lite Release

2024.03.08 ・ by Howard Jo

I Want to Hear Your Voice Better

Bam! Bang! Tat tat tat tat I’m out, I need another ??? ??? anybody ??? ???

Hi, this is Howard, the product manager for Just Voice Lite.

Imagine a scene in a war movie, filled with the sound of bombs exploding.

The characters on the screen are having a conversation, but it's almost impossible to hear them over the bombs, gunfire, and loud background music. In such situations, we want to hear the actors' voices more clearly. If the sound effects and background music added by the sound director to enhance the atmosphere end up drowning out the voices of the main actors, we might miss out on crucial parts of the story.

At this point, what would happen if you increased the audio volume to hear the dialogue better?

The overall volume might become so loud that it feels like your eardrums are about to burst. Particularly if you're wearing earphones or headphones, the volume you've set might already be too loud to raise further for the sake of dialogue clarity. Conversely, lowering the volume might render the dialogue almost inaudible, creating an ironic situation where you can't lower it either. In the end, these situations force us to rely on subtitles even when watching movies in our native language.

Eardrum Torture

This isn't what I wanted...

Wouldn't it be great if there were a beautiful technology that could selectively enhance the voices of actors in such situations, making them clearer?

Or better yet, if the content were crafted with better dialogue clarity from the outset, such dilemmas wouldn't arise. The perspectives of artists striving to convey the essence of their content and audiences eager for better dialogue clarity are always bound to differ.

Is it just movies, though?

We often find ourselves straining to hear the speaker's voice in poorly recorded concert performances, travel YouTubers' videos recorded in noisy environments, videos of bike or car club rides, and so on. What if you are on a scenic beach with waves crashing in, or at an outdoor coffee shop, and you're strumming a guitar and singing to your girlfriend or boyfriend over a video call?

When noise or clutter makes it difficult to hear the voices we care about,

and when we want to hear the other person's voice in that space a little better,

we seek solutions for a better listening experience.

Just Voice, the voice enhancement application, emerges to quench this thirst. 🤓

Real-time Processing Using On-device AI

Let's get a little technical.

So how on earth do we solve this problem?

Some might suggest, "What if there were a voice volume-up button on the remote control that could amplify only the vocal elements?" ~~You're kidding!~~

It’s not as easy as it sounds. Technically speaking, such an action would need to be processed within milliseconds (thousandths of a second). That's because it needs to be done in real-time while you're watching the video. In other words, it means that the voices should be promptly separated and processed to ensure clear audibility while playing, and of course, the video and audio should be perfectly synchronized when the output is sent back to you.

But what if Gaudio Lab's researchers step in?

After nearly two years of extensive research, Gaudio Lab has successfully developed this challenging technology.

Utilizing GSEP (Gaudio Source Separation), the world's leading voice separation technology, together with on-device AI technology, we've created Low Delay GSEP, an engine that enhances voice clarity in real time. (To be more precise, it takes less than 30 ms to process.) This technology eliminates surrounding noise and accentuates the desired voice, making it easier to hear. Of course, achieving real-time processing comes with a slight, very slight, performance trade-off compared to the non-real-time voice separation technology GSEP.
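To get a feel for what a 30 ms budget means, here is a small sketch that converts a latency budget into a number of audio samples. The sample rates and the function name `samples_within_budget` are illustrative assumptions, not details of Low Delay GSEP.

```python
def samples_within_budget(sample_rate_hz: int, budget_ms: float) -> int:
    """Return how many audio samples fit inside a latency budget."""
    return int(sample_rate_hz * budget_ms / 1000)


# Commonly used sample rates (assumed; the article does not say which one the engine uses).
for rate in (16_000, 44_100, 48_000):
    print(f"{rate} Hz -> {samples_within_budget(rate, 30)} samples in 30 ms")
```

At 48 kHz, for example, 30 ms corresponds to 1,440 samples per channel, so everything from separation to output has to happen within a window of about that size.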

With this technology, you can hear the desired voice more clearly without being disturbed by environmental noise or background music in all the video content you consume. We believe it will provide a better listening experience for those who want to enjoy content according to their preferences.

Experience Just Voice Lite for macOS

Just Voice Lite

Try Just Voice Lite!

The first in the Just Voice app series, Just Voice Lite for macOS, is now available.

If you're a macOS user, you can enhance voice clarity in all sound environments, whether it's video conferencing, watching movies, or listening to music, through the Just Voice Lite app. Designed to amplify voices while leaving surrounding sounds intact, Just Voice Lite lets viewers fully enjoy the content without compromising the sound effects the creators intended.

How much does it cost?

Just Voice Lite is available for free.

To put it bluntly, this app simply separates voices from content and slightly increases voice volume. But this technology has infinite potential.
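As a rough illustration of "separate the voice, then raise it slightly", the sketch below remixes two tracks that have already been separated. It assumes the separation itself happened elsewhere (that is the hard part the engine handles). The function names, the 6 dB default, and the simple hard clipping are all assumptions for illustration, not Just Voice Lite's actual processing.

```python
def db_to_gain(db: float) -> float:
    """Convert a decibel change to a linear amplitude factor."""
    return 10 ** (db / 20)


def boost_voice(voice, background, boost_db=6.0):
    """Remix separated tracks, raising only the voice by boost_db.

    voice and background are equal-length lists of samples in [-1.0, 1.0].
    """
    if len(voice) != len(background):
        raise ValueError("tracks must have the same length")
    gain = db_to_gain(boost_db)
    mixed = [gain * v + b for v, b in zip(voice, background)]
    # Hard-clip to the valid range; a real engine would more likely use a limiter.
    return [max(-1.0, min(1.0, s)) for s in mixed]


# Toy samples standing in for real audio.
voice = [0.10, -0.20, 0.05]
background = [0.30, 0.30, -0.40]
print(boost_voice(voice, background))
```

Because the background is passed through unchanged, the sound effects and music stay as the creators mixed them; only the voice level moves.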

What if we could modify the separated voice in real-time? What if we could apply real-time pitch shift or de-reverberate the voice recorded in echoing spaces? Or, by separating the voice and applying Gaudio Lab's spatial audio technology (GSA, Gaudio Spatial Audio) to the surrounding noises, we could enhance the audio spatial perception as desired.

Instead of passively consuming content created by artists, imagine being able to change elements of the content according to the audience's real-time demands. If you put a value on that freedom, how much are you willing to pay for it? The continuous endeavor to support the audience's free consumption of content is Gaudio Lab's direction towards Metaverse audio.

“We Also Want to Process it On-device!”

Do you need Just Voice SDK?

If you're a developer, there's good news. We have also prepared the Just Voice SDK. If you need an audio engine for hearing aid software, video conferencing systems, AI Contact Centers (AICC), or language learning applications, don't hesitate to reach out.

Oh! Of course, while the Just Voice Lite app is developed for the purpose of amplifying voices, the Just Voice SDK can also completely eliminate surrounding noise with its noise reduction (de-noise) capabilities. How you use it is entirely up to you.

Oh, by the way, you still haven't tried Just Voice Lite?

If you're a macOS user, give it a try anytime.

Experience the future of audio that will make the voices you want to hear clearer! Try it now!

🔗Go to Mac App Store


Gaudio Studio – Your favorite songs, as you’ve never heard them before

Last year, MusicRadar reviewed five of the most popular stem separation software tools available today, and Gaudio Studio emerged as the winner of the battle against Serato Sample, Acon Digital, DeepRemix and FL Studio. To the judge’s surprise, the champion was the only free one. We are honored that Gaudio Studio’s first overseas coverage was such a flattering review, complete with valuable feedback, and believe it is as good a time as any for its official introduction by Gaudio Lab.

Gaudio Studio is our web-based AI sound source separation service, powered by cutting-edge audio AI models. Currently in beta, Gaudio Studio offers two features that are fun and easy for anyone to use and still perform frighteningly well:

Instrument Separation – A stem separation tool that can isolate vocals and instruments from any music you want

Noraebang – An instant karaoke maker with vocal separation and lyrics synchronization capabilities, titled after the Korean word for karaoke

Audio Stem Separation

Before we go on, what is stem separation? While sound source separation refers to the general practice of eliminating or extracting desired sounds from the original audio, stem separation refers to the more specific task of isolating the sounds of individual tracks, or ‘stems’, from a mix. The conventional problem definition in the modern music industry is the separation of four stems, namely the vocals, bass, drums and other instruments, chosen for their ubiquity and distinct character.

Traditionally, stem separation relied on signal processing techniques using manually crafted features and was mostly limited to simple audio scenarios. However, recent advancements in artificial intelligence have opened up the possibility of stem separation for more complex tracks with many instruments and diverse tones.
Given enough training data, deep learning models can be trained to distinguish the intricate patterns of different instruments autonomously and adaptively. But even with deep learning, designing a well-performing stem separation model is no walk in the park, and many AI-based programs available today still produce results mixed with artifacts and distortions. This is especially so for mixes with multiple instruments masking each other in terms of timbre and loudness. In fact, it is often a challenging task even for humans with an untrained ear, let alone for AI.

GSEP and Instrument Separation

Gaudio Studio’s Instrument Separation provides one of the world’s most reliable – if not the most reliable – stem separation services out there, as tested and approved by our users each and every day. Simple to use, the current version supports isolation of up to 6 instruments for the music of your choice, including electric guitar and piano on top of the aforementioned four-stem system. Any unselected or undefined stems are grouped into the Other Instruments stem. After the instruments are chosen, the separation request is placed in a queue, and the processed results then become available for playback and download.

At the core of the technology is Gaudio Lab’s AI separation model GSEP, short for Gaudio source SEParation, which boasts state-of-the-art performance that has outshone its competitors since its release in 2021. Developed with utmost attention to sound quality, GSEP delivers clean and natural separation results that are often indistinguishable from stand-alone studio recordings. Compared with other AI separation solutions, common sound quality issues such as over-suppression (muffled sounds) and loudness inconsistency (fluctuations) are rarely heard.
Of course, readers are welcome to listen for themselves, either by trying it out with their own examples or by checking out some of the comparisons already made by other users, like this one.

Sure, GSEP sounds good (no pun intended). But it has also surpassed many other stem separation models under objective criteria, having reached an SDR (Signal-to-Distortion Ratio) of 10 dB for vocals and 16 dB for accompaniments in a 2021 external evaluation. Here, SDR is a key metric commonly used for audio separation. It measures the amount of undesirable distortion in the result in comparison to the ideally separated signal. For reference, every 10 dB increase in SDR means that the distortion in the result carries ten times less power relative to the signal. While this in itself implies that GSEP’s record is an impressive feat, it also means that GSEP scores even higher than the latest version of Meta’s Demucs.

Behind GSEP’s exceptional quality lies Gaudio Lab’s sincerity and passion for audio in general. Not only are our AI team members also audio enthusiasts, but they create a special synergy with our Audio team, which has a strong background in audio signal processing, in applying deep learning to the domain of sound. Together they decide what kind of psychoacoustic considerations, additional databases and model architecture would lead to more versatile and reliable audio separation. GSEP is continually refined by our developers, with ongoing training aimed not only at a higher SDR but also at genuinely superior sound quality, ensuring that the results meet the highest standards at the perceptual level.

GTS and Noraebang

GSEP’s clean vocal-accompaniment separation capabilities naturally led to the idea of a karaoke backing track generator. Together with an automatic lyrics synchronization technology, the idea was soon developed and implemented as Gaudio Studio’s Noraebang.
With it, all you need to do is upload the music of your choice along with its lyrics, and the rest of the karaoke experience is set up by the AI engine. The web interface of Noraebang displays the synchronized lyrics highlighted word by word in precise timing with the music playback, delivering a karaoke experience accessible from any device.

Working in tandem with GSEP under the hood of Noraebang is Gaudio Lab's GTS – Gaudio Text-Synchronization – a robust tool for aligning speech audio with corresponding text. While the challenge of first identifying vocals within complex musical structures is rendered trivial by GSEP’s sound separation capabilities, GTS handles the remaining problem of correlating the speech information with the natural language text and generating time stamps between them.

GTS is an adaptable AI model designed to be robust across different rhythmic styles, tempos and vocal nuances. Part of its adaptability comes from its indifference to the specific language of the text: it is not trained to recognize the sounds of individual languages, but rather the sounds of phonemes that match the International Phonetic Alphabet (IPA). Simply put, all GTS needs in order to learn a new language is its pronunciation scheme, in the form of a dictionary of words tagged with their IPA symbols, which is well documented for most common languages.

GTS achieves highly consistent results independent of the song’s genre or artist, without compromising speed or quality. Processing long text and audio sequences is costly in both computation and time. GTS’s model deals with this problem by adopting a hierarchical structure in which alignment predictions are first made at sentence level, then recursively at word level. This allows an inference time of under 5 seconds to synchronize an entire song, and an impressive accuracy of around 99%, regardless of the song’s length and complexity.
Using Gaudio Studio Beta

So, you can use Instrument Separation and Noraebang to create and share isolated tracks on a whim, and even instant karaoke versions of your favorite songs. Of course, no worries even if the music of your choice is instrumental only – GSEP is trained on individual stem types and faithfully works on those requested by the user.

Another reason why Gaudio Studio is so useful is that you can use its services wherever you want, however you want. It supports audio inputs from lossless to compressed formats (including flac, wav, mp3 and m4a), as well as video files and video URLs, without the need for conversions or downloads. Since Gaudio Studio is accessible from both PC and mobile devices, it is as easy to use for casual mobile users who want to try out a few songs for fun as it is for more serious hobbyists and musicians who want to process batches of high-quality samples on their desktops.

Despite all that, Gaudio Studio is still in beta and there are a few limitations. While GSEP and GTS are without a doubt frontrunners in their fields, there is much room for improvement in corner cases and functionality. Our developers will not settle for anything short of perfect and are constantly investigating and logging points of improvement and tweaks. Users may also feel that they currently have to wait a bit too long for their requests to be processed, and may wish to download the results in a higher quality format than mp3. We want to assure fans and supporters that future updates are under way and that they can look forward to added stem options, higher performance and better utility.

Try for yourself.

At Gaudio Lab, we love to hear how the users of Gaudio Studio apply stem separation in so many diverse ways, from simplifying transcription tasks by separating individual instruments to crafting personalized backing tracks for practice sessions, and even extracting unique samples for homage in new compositions.
Now and then, we are pleasantly surprised when we come across use cases that we could not have imagined. What would you do with Gaudio Studio’s AI sound separation technology? Try it out for yourself! We are eager to find out.
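For readers who want to make the SDR figures quoted in this article more concrete, here is a minimal sketch of the basic definition: 10 · log10(signal energy / error energy). Published evaluations often use more elaborate variants of the metric, so treat this as an illustration of the decibel scale rather than the exact procedure behind the numbers above. The function name `sdr_db` and the toy signals are assumptions.

```python
import math


def sdr_db(reference, estimate):
    """Basic signal-to-distortion ratio in dB: 10*log10(signal energy / error energy)."""
    signal_energy = sum(r * r for r in reference)
    error_energy = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    if error_energy == 0:
        return math.inf  # perfect reconstruction
    if signal_energy == 0:
        return -math.inf  # silent reference, but the estimate is not silent
    return 10 * math.log10(signal_energy / error_energy)


reference = [1.0, -1.0, 1.0, -1.0]
# An estimate whose error carries one tenth of the reference's energy ...
estimate_a = [r * (1 - math.sqrt(0.1)) for r in reference]
# ... and one whose error carries one hundredth of it.
estimate_b = [r * 0.9 for r in reference]

print(round(sdr_db(reference, estimate_a), 2))
print(round(sdr_db(reference, estimate_b), 2))
```

The two toy estimates come out at roughly 10 dB and 20 dB, which shows the "ten times less distortion per 10 dB" relationship in terms of error energy.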

2024.02.15

GAUDIO STUDIO Sound Separation Tips - A Sound Engineer's Guide 🐝

Hello, this is Bright, a sound engineer from Gaudio Lab! These days, many fields are utilizing AI to increase productivity, and I'm sure you've come across AI tools at one point or another. But have you ever wondered how a sound engineer at an audio AI company utilizes AI?

I remember back when I was young, I had to struggle through Google to practice MR (instrumental backing track) production and mixing. I recall the difficulty of separating MRs myself or downloading multi-tracks shared as learning materials. The processes were cumbersome, and even after all that effort, the quality of the finished work was not very good. 😭

But now, in the era of AI, all that hardship has become a thing of the past! Especially with the commercialization of AI technologies for separating audio sources, many tasks in the audio industry have become much simpler. As a sound engineer, I think it's a great era in which we can fully focus on creativity.

Today, I'd like to introduce various tips for GAUDIO STUDIO, one of the tools I use the most. It boasts top-notch performance among the various AI audio separation services, and by following along slowly, you too can become a top-notch sound engineer like me 😎

🍯 Tip 1 - Create an MR

Step 1 - Separating the vocals

How do I remove vocals from the music in GAUDIO STUDIO? This is one of the most common questions I get asked, and I believe many people use GAUDIO STUDIO primarily to produce MRs for events like karaoke, celebrations, and more.

Only 'Vocals' and 'Other Instruments' selected

All instruments selected

In GAUDIO STUDIO, you can separate the sound source by selecting the instruments you want (vocals, drums, bass, electric guitar, piano, and other instruments). So if you select vocals only and separate them, you can create an MR, right? The AI will take care of the rest, making it easy to create MRs with just a few simple clicks!

Step 2 - Key up / down

How do I customize the key of my MR?
If you don't already have a music editing program, I recommend Audacity - it's free, has tons of hidden features, and I used it a lot during my student days. Now that you're all set up, let's try it step by step!

First, click [File] → [Import] → [Audio] at the top to import the sound source, then double-click on the loaded file to select it entirely. Then, go to [Effect] → [Pitch & Tempo] → [Change Pitch] to adjust the key and that's it! You can also fine-tune it, so play around with it a few times to get it to the pitch you want.

For those who have followed along well so far, do you feel something strange? Or do you want to create a high-quality MR that stands out from the rest? There's one thing we often overlook. Drums don't have a key! Because of this, if you change the key with the drum track included, the pitch of the drumbeat also changes, affecting the overall quality. 😎 Now, here's a trick: try adjusting the key of the rest of the instruments, this time without the drum track, and then put them back together with the drums. That weird dissonance should be gone!

Step 3 - Put it to use

So what more can I do with this? After separating the MR and adjusting the keys, you can create content like this. Do you get the idea? You can create duets with singers who have different vocal keys!

If you further process the separated voices with a voice conversion AI model, you can also create AI cover content, which is trending these days. Of course, the better the quality of the separated voices, the better the trained results, which is why I've heard that many people use GAUDIO STUDIO a lot. 👀 Aren't you curious about your favorite singer singing songs by other artists? 😎 There are endless possibilities for using GAUDIO STUDIO like this.

🍯 Tip 2 - Adjusting a specific track in an already recorded song

This time, let me show you an example of how you can use GAUDIO STUDIO in situations you might encounter in your daily life.
Situation 1 - You've just finished a really great ensemble, but the drums are just too loud!

In such cases, if you separate only the drum track and turn down its volume, you'll be able to bring out the other instruments. Similarly, reducing excessively thumping beats in concert footage can highlight the artist's voice more. Even recordings that seemed impossible to separate, or footage in which adjusting specific sounds seemed impossible, can now be nicely mixed and uploaded!

Situation 2 - I filmed a vlog in a cafe and it was recorded with copyrighted music!

If you've ever filmed an outdoor vlog for YouTube and the music from a store was recorded along with it, it could be detected as a copyrighted element, which could limit your monetization. Until now, you've probably just turned down the volume or raised your voice, and if that didn't work, you might have ended up deleting all the sound and recording narration separately. 😎 Now, you don't need to do that anymore. Just separate your voice and cleanly remove the unwanted music. With just GAUDIO STUDIO, you no longer have to suffer from unexpected copyright issues!

How did you find the endless applications of AI music separation that I introduced? I'm often amazed at how tasks that were difficult or required tremendous effort in the past are now so easily accomplished. Why not use the magic of GAUDIO STUDIO to create and enjoy your own unique content? GAUDIO STUDIO will continue to evolve until, in the not-too-distant future, all track stems will be neatly separated when you just insert a stereo file. We look forward to your continued interest and enjoyment!
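For the key change in Tip 1, it can help to know what a semitone shift does numerically. In equal temperament, each semitone multiplies frequency by the twelfth root of two. The sketch below converts a semitone shift into a frequency ratio and a percent change. The function names are assumptions for illustration and are not part of Audacity or GAUDIO STUDIO.

```python
def semitone_ratio(semitones: float) -> float:
    """Frequency ratio for a shift of the given number of semitones (equal temperament)."""
    return 2 ** (semitones / 12)


def percent_change(semitones: float) -> float:
    """The same shift expressed as a percent change in frequency."""
    return (semitone_ratio(semitones) - 1) * 100


# Two semitones down, two semitones up, and a full octave up.
for shift in (-2, 2, 12):
    print(shift, round(semitone_ratio(shift), 4), round(percent_change(shift), 2))
```

A shift of 12 semitones doubles every frequency, which is one octave. This is also why the drum trick matters: the shift scales every frequency in the track it is applied to, drums included.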

2024.03.26