My Score Is... Introducing the MUSHRA Listening Test

2024.04.11 ・ by Ted Lee

Hello, I'm Ted. I've been with Gaudio Lab since the very beginning, tackling a myriad of tasks.

Recently, we conducted listening tests to assess the performance of the technology we've developed. I thought an easy-to-understand explanation of this kind of listening test would be useful, so I've decided to jot down a few notes.

If you've visited a hospital or watched a medical drama, you may have heard this question: "On a scale from 0 (no pain at all) to 10 (the most intense pain imaginable), how would you rate your pain right now?" While writing this article, I learned that this is called an NRS (Numeric Rating Scale). Because pain is subjective, the NRS turns it into a simple number that is easy to understand, which helps with pain management and treatment. It may feel odd to discuss medical terminology on an audio blog. 🙂

Can Sound Be Quantified? - The MUSHRA Listening Test

So, what about sound?

When there are two sounds, how can we evaluate which one is better?

In the audio field, there have been numerous attempts to develop technology that can evaluate sound objectively, without human listeners. Unfortunately, such technology has not yet been perfected. In other words, we have not reached the point where a machine can analyze a sound and declare, "This sound scores an 80, human.🤖"

Instead, methodologies that involve listening to and evaluating sounds have been widely used for some time. For example, there are MUSHRA (Multiple Stimuli with Hidden Reference and Anchor), ABX, and MOS (Mean Opinion Score), among others. Today, I'd like to introduce the MUSHRA evaluation method, which is particularly tailored to assess the subtle differences between high-quality audio samples.

MUSHRA is standardized by the International Telecommunication Union (ITU) as Recommendation ITU-R BS.1534. It is primarily used to evaluate high-quality audio technologies and systems, and it is especially useful when the differences between audio samples are subtle. The core principle of a MUSHRA evaluation is to present several test samples at the same time and ask participants to compare them, rating each on a scale from 0 to 100.

MUSHRA Test

The samples provided include:

Hidden Reference: A high-quality version of the original audio track, used as the highest benchmark for participants to compare other samples against. Participants are unaware that this sample serves as the reference.
Anchor: Typically, a lower-quality audio sample that acts as the lower benchmark for evaluation. This helps participants have a clearer understanding of the rating scale.
Test Samples: Samples generated through various audio systems that are being evaluated.

The Hidden Reference is considered the "correct answer," or 100 points, and the Anchor is set as a low benchmark, roughly equivalent to 20 points. Test Samples are then evaluated on a scale from 0 to 100.

To draw a parallel with the NRS, the Hidden Reference is like "the most intense pain imaginable" and the Anchor is like "no pain at all": each marks one end of the scale. However, on the NRS "no pain" sits at 0, whereas in MUSHRA the Anchor is not placed at 0, because a Test Sample may perform even worse than the Anchor. Another difference is that "the most intense pain" varies from person to person, while the Hidden Reference is the same sound for everyone. This makes MUSHRA more objective.

Moreover, MUSHRA includes a post-screening rule. This rule filters out evaluators who rated randomly, misunderstood the instructions, or could not reliably tell the conditions apart. It's quite a systematic approach, isn't it?
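To make the post-screening idea concrete, here is a minimal Python sketch of one such rule. It assumes the criterion commonly cited from ITU-R BS.1534-3: exclude any evaluator who rates the Hidden Reference below 90 on more than 15% of the Test Items. The data layout and the function name `screen_evaluators` are hypothetical. The standard also describes further checks that are not shown here.

```python
def screen_evaluators(hidden_ref_scores, threshold=90.0, max_fraction=0.15):
    """Return the set of evaluators who pass the Hidden Reference check.

    hidden_ref_scores maps an evaluator ID to {test_item: score}, the score
    that evaluator gave the Hidden Reference on each item (hypothetical layout).
    """
    kept = set()
    for evaluator, per_item in hidden_ref_scores.items():
        if not per_item:
            continue  # no ratings at all: cannot be judged, so not kept
        # Count items where the Hidden Reference was scored below the threshold.
        misses = sum(1 for score in per_item.values() if score < threshold)
        # Keep the evaluator only if such misses are at most 15% of their items.
        if misses / len(per_item) <= max_fraction:
            kept.add(evaluator)
    return kept


ratings = {
    "A": {"item1": 100, "item2": 95, "item3": 100, "item4": 98},
    "B": {"item1": 60, "item2": 100, "item3": 70, "item4": 100},
}
print(screen_evaluators(ratings))  # B missed the Hidden Reference on 2 of 4 items
```

With 16 Test Items, as in the experiment described below, this rule tolerates at most two under-rated Hidden References per evaluator.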

We Tried the MUSHRA Listening Test Ourselves

Understanding this might still be challenging, so let me illustrate with an example from a subjective performance evaluation of the Just Voice SDK, conducted by Gaudio Lab in January.

1) MUSHRA Test Design

The Just Voice SDK is designed for mobile, PC, and embedded systems, and it removes noise in real time. We compared its performance with that of Krisp, a noise cancellation technology integrated into Discord. The comparison covered two main aspects: how effectively noise is removed and how clear the voice remains. Both aspects were assessed using the MUSHRA method.

The Hidden Reference was recorded with various smartphones in a quiet studio, simulating a typical scenario such as a video conference. To create the Test Samples, we first mixed noise into the Hidden Reference at an SNR of 5 dB. We then removed that noise with the Just Voice SDK and with the Krisp SDK, respectively, for comparison.
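Mixing noise "at an SNR of 5 dB" means scaling the noise so that speech power divided by noise power equals 10^(5/10), which is about 3.16. The sketch below shows this in plain Python. It assumes equal-length mono signals stored as lists of floats, and the helper names are hypothetical. The post does not say how the SNR was measured (for example, over the whole clip or only over active speech).

```python
import math


def power(signal):
    """Mean power (average squared amplitude) of a list of samples."""
    return sum(x * x for x in signal) / len(signal)


def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so the speech-to-noise power ratio equals snr_db, then add it.

    Assumes speech and noise are equal-length lists of float samples at the
    same sample rate (hypothetical helper for illustration).
    """
    if len(speech) != len(noise):
        raise ValueError("speech and noise must have the same length")
    p_speech = power(speech)
    p_noise = power(noise)
    # Choose a gain so that gain^2 * p_noise == p_speech / 10^(snr_db / 10).
    gain = math.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, noise)]
```

Scaling the noise rather than the speech keeps the voice level identical across all conditions, so only the amount of noise changes.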

The Anchor is where it gets interesting. Because the two aspects being evaluated differ, each needed its own Anchor. For the noise removal evaluation, the Anchor was the noisy signal (mixed at an SNR of 5 dB) before noise removal. For the voice clarity evaluation, the Anchor was the Hidden Reference passed through a 3.5 kHz low-pass filter, which leaves only the lower frequency bands. This is a common approach in voice quality evaluation.
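A low-pass Anchor like this can be produced with any standard low-pass filter, and ITU-R BS.1534-3 also uses a 3.5 kHz low-pass version of the reference as its low-range anchor. The post does not specify which filter was used. The following is therefore only a sketch, assuming a 101-tap windowed-sinc FIR filter with a Hamming window, written in plain Python for illustration. In practice, a DSP library would normally be used.

```python
import math


def lowpass_fir(cutoff_hz, sample_rate, num_taps=101):
    """Design a windowed-sinc low-pass FIR filter (Hamming window)."""
    if num_taps % 2 == 0:
        raise ValueError("num_taps must be odd for a symmetric filter")
    fc = cutoff_hz / sample_rate  # normalized cutoff in cycles per sample
    mid = (num_taps - 1) // 2
    taps = []
    for n in range(num_taps):
        k = n - mid
        # Ideal low-pass impulse response: sin(2*pi*fc*k) / (pi*k), 2*fc at k = 0.
        ideal = 2 * fc if k == 0 else math.sin(2 * math.pi * fc * k) / (math.pi * k)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]  # normalize to unity gain at DC


def apply_fir(signal, taps):
    """Convolve `signal` with `taps`, compensating the filter delay so output aligns with input."""
    mid = (len(taps) - 1) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, t in enumerate(taps):
            idx = i + mid - j
            if 0 <= idx < len(signal):
                acc += t * signal[idx]
        out.append(acc)
    return out


def make_lowpass_anchor(reference, sample_rate, cutoff_hz=3500.0):
    """Hypothetical helper: keep only content below cutoff_hz in the reference."""
    return apply_fir(reference, lowpass_fir(cutoff_hz, sample_rate))
```

Because the filter is symmetric and its delay is compensated, the Anchor stays time-aligned with the Hidden Reference. This matters when listeners switch between the two during playback.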

2) MUSHRA Test Procedure

The evaluation was carried out with webMUSHRA, a web-based listening test tool, using the UI setup shown below. The Reference button plays the Hidden Reference. Cond. 1 to 4 play the Hidden Reference, the Anchor, and the two Test Samples (Just Voice SDK and Krisp) in random order. Evaluators listen to and compare Cond. 1 to 4. They try to identify the Hidden Reference and give it 100 points, and to identify the Anchor and give it a relatively low score of around 20 points. They then score the remaining two Conditions relative to the Reference and the Anchor.

WebMushra test page
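The test tool handles this shuffling itself. Purely as an illustration of why Cond. 1 to 4 are reshuffled, here is a sketch that maps the stimuli of one trial to the slot labels in random order, so the Hidden Reference cannot be found by its position. The function and stimulus names are hypothetical.

```python
import random


def assign_conditions(stimulus_names, rng):
    """Map 'Cond. 1'..'Cond. N' to the given stimuli in a random order."""
    shuffled = list(stimulus_names)  # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    return {f"Cond. {i}": name for i, name in enumerate(shuffled, start=1)}


stimuli = ["hidden_reference", "anchor", "just_voice", "krisp"]
print(assign_conditions(stimuli, random.Random()))  # a new order each trial
```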

When an evaluation includes multiple Test Items, the score each evaluator assigned to each Condition is recorded in a CSV file, as shown in the image below.

Mushra test result csv
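To get from the raw CSV to per-Condition averages, the scores only need to be grouped by Condition. The sketch below assumes a simplified, hypothetical CSV layout with `evaluator`, `item`, `condition`, and `score` columns. The actual webMUSHRA export uses its own column names, so the field names would need to be adapted.

```python
import csv
import io
from collections import defaultdict
from statistics import mean


def scores_by_condition(csv_text):
    """Group scores by condition (assumed columns: evaluator,item,condition,score)."""
    grouped = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        grouped[row["condition"]].append(float(row["score"]))
    return dict(grouped)


def mean_by_condition(csv_text):
    """Average score per condition across all evaluators and items."""
    return {cond: mean(vals) for cond, vals in scores_by_condition(csv_text).items()}


sample = (
    "evaluator,item,condition,score\n"
    "e1,office,anchor,20\n"
    "e2,office,anchor,25\n"
    "e1,office,just_voice,85\n"
)
print(mean_by_condition(sample))
```

In a real analysis, the post-screening step would be applied before averaging, so excluded evaluators' rows are dropped first.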

How Did the Results Turn Out?

1) Interpreting MUSHRA Test Results

Once all evaluators have completed their assessments, the post-screening rule is applied to exclude unreliable results. The average score for each Condition is then plotted with its 95% confidence interval for comparison. A 95% confidence interval describes the uncertainty in the average. If the same test were repeated many times with new evaluators, about 95% of the intervals computed this way would contain the true mean score. It does not mean that 95% of the individual scores fall within the range.
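The average and its 95% confidence interval can be computed with Python's standard library. This minimal sketch uses the normal approximation (z ≈ 1.96). With 66 evaluators, the Student-t quantile (about 2.0) gives nearly the same interval, but for small panels the t quantile is the more appropriate choice. The function name is hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev


def mean_with_ci(scores, confidence=0.95):
    """Return (mean, lower, upper) for the mean of `scores`.

    Uses the normal approximation z * s / sqrt(n); swap in a Student-t
    quantile for small numbers of evaluators.
    """
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two scores")
    m = mean(scores)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    half_width = z * stdev(scores) / math.sqrt(n)
    return m, m - half_width, m + half_width


print(mean_with_ci([80, 90, 100, 70, 60]))
```

The half-width scales with 1/√n, so quadrupling the number of evaluators roughly halves the interval. This is why larger panels produce narrower intervals.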

Below are the noise removal results from our experiment. The grey markers represent the averages, while the blue and orange markers indicate the upper and lower bounds of the 95% confidence intervals, respectively. If two confidence intervals do not overlap, the difference between those Conditions is statistically significant, meaning they can be told apart. (The converse does not hold: overlapping intervals do not by themselves rule out a significant difference.) Also, the more evaluators there are, the narrower the confidence intervals become.
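The non-overlap check itself is a simple comparison of interval bounds. The sketch below takes intervals as (lower, upper) tuples, such as the bounds from a confidence-interval calculation. The function name is hypothetical.

```python
def intervals_overlap(a, b):
    """True if closed intervals a=(lower, upper) and b=(lower, upper) share a point."""
    return a[0] <= b[1] and b[0] <= a[1]


# Non-overlapping 95% intervals are taken as a significant difference;
# overlapping ones are inconclusive and call for a direct test.
print(intervals_overlap((70.0, 75.0), (80.0, 85.0)))
```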

2) Noise Removal Evaluation Results

With 66 participants, this experiment was large in scale, so the confidence intervals are quite narrow. Comparing the benchmark (Krisp) with Just Voice, the confidence intervals do not overlap and the averages differ by 12.5 points. A margin this large clearly indicates a distinguishable performance difference between the two technologies.

Just Voice MUSHRA noise reduction test

When analyzing listening test results in detail, it's important to examine the outcome for each Test Item. Just Voice showed significantly better noise removal than the Benchmark (Krisp) in 7 of the 16 Test Items at the 95% confidence level (indicated in green).

Interestingly, in 3 Test Items (14p-03_office, 15p-02_hallway, s20p-04_office), Just Voice's average score was higher than the Hidden Reference's (indicated in blue and orange). We attribute this to the References themselves. They were recorded on smartphones to simulate real-world environments, so they contained some noise. Just Voice removed that noise without distorting the voice and scored higher than the Reference. In other words, its noise removal was at least indistinguishable from the Reference.

Remarkably, for the 14p-03_office item, Just Voice not only scored higher than the Reference but did so by a statistically significant margin at the 95% confidence level (indicated in orange). It was effectively judged better than the Reference itself.👍

Just Voice MUSHRA noise reduction test result

3) Voice Clarity Evaluation Results

For those curious about the voice clarity experiment results, I've attached them below. Using the method described earlier, you can interpret the results directly.😉

Just Voice MUSHRA voice clarity test

Just Voice MUSHRA voice clarity test result

Concluding Thoughts

Today, we took a look at MUSHRA, a subjective audio quality evaluation method used to compare the performance of high-quality audio systems. Evaluating subjective audio quality takes considerable thought and effort, from choosing the Hidden Reference and Anchor to making sure the experiment runs smoothly.

Personally, I'm looking forward to the day when AI technology advances to the point where it can say, "This sound scores a 95, human. 🤖" with high precision.

If you're interested in learning more about the MUSHRA methodology or other subjective audio quality evaluation methods like ABX or MOS, please leave a comment. I'd be happy to write more on this topic.🙂


GAUDIO STUDIO Sound Separation Tips - A Sound Engineer's Guide 🐝

Hello, this is Bright, a sound engineer at Gaudio Lab! These days, many fields are using AI to boost productivity, and I'm sure you've come across AI tools at one point or another. But have you ever wondered how a sound engineer at an audio AI company uses AI?

When I was young, I had to struggle through Google searches to practice MR (instrumental backing track) production and mixing. I remember how hard it was to separate MRs myself or to download multitracks shared as learning materials. The process was cumbersome, and even after all that effort, the results were not very good. 😭 But in the AI era, that hardship is a thing of the past! Now that AI source separation is commercially available, many tasks in the audio industry have become much simpler. As a sound engineer, I think it's a great era in which we can focus fully on creativity.

Today, I'd like to share some tips for GAUDIO STUDIO, one of the tools I use most. It boasts top-notch performance among AI audio separation services, and if you follow along step by step, you too can become a top-notch sound engineer like me😎

🍯 Tip 1 - Create an MR

Step 1 - Separating the vocals

"How do I remove the vocals from a song in GAUDIO STUDIO?" This is one of the most common questions I get. I believe many people use GAUDIO STUDIO mainly to make MRs for karaoke, celebrations, and other events.

Only 'Vocals' and 'Other Instruments' selected
All instruments selected

In GAUDIO STUDIO, you can separate a sound source by selecting the instruments you want (vocals, drums, bass, electric guitar, piano, and other instruments). So if you select only the vocals and separate them, what's left is an MR, right? The AI takes care of the rest, so you can create an MR with just a few clicks!

Step 2 - Key up / down

How do I change the key of my MR?
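For reference, in twelve-tone equal temperament, changing the key by n semitones multiplies every frequency by 2^(n/12). So +12 semitones doubles the pitch (one octave up). The short Python sketch below computes this ratio. Following the drum trick described below, it leaves the drum stem at a ratio of 1.0. The stem names are hypothetical, and Audacity performs the actual pitch shifting.

```python
def semitone_ratio(semitones):
    """Frequency ratio for a shift of `semitones` in 12-tone equal temperament."""
    return 2 ** (semitones / 12)


def stem_pitch_ratios(stems, semitones):
    """Pitch ratio to apply to each separated stem; drums are left unshifted."""
    return {stem: 1.0 if stem == "drums" else semitone_ratio(semitones) for stem in stems}


print(440 * semitone_ratio(2))  # A4 raised by two semitones, about 493.88 Hz
print(stem_pitch_ratios(["vocals", "bass", "drums", "piano"], -2))
```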
If you don't already have a music editing program, I recommend Audacity. It's free, has tons of hidden features, and I used it a lot during my student days. Now that you're set up, let's go step by step! First, click [File] → [Import] → [Audio] at the top to import the sound source, then double-click the loaded track to select all of it. Next, go to [Effect] → [Pitch & Tempo] → [Change Pitch] to adjust the key, and that's it! You can also fine-tune the shift, so experiment a few times until you get the pitch you want.

For those who have followed along so far, does something sound a little off? Or do you want to make a higher-quality MR than everyone else's? There's one thing we often overlook: drums don't have a pitch in the usual sense! So if you change the key with the drum track included, the drum sounds get pitch-shifted too, which hurts the overall quality. 😎 Here's the trick: change the key of the other instruments with the drum track excluded, then recombine them with the original drums. That odd dissonance should be gone!

Step 3 - Put it to use

So what else can you do with this? After separating the MR and adjusting the key, you can create content like this. Get the idea? You can create duets between singers who sing in different keys! If you further process the separated vocals with a Voice Conversion AI model, you can also create AI cover content, which is trending these days. Of course, the better the separated vocals, the better the trained results, which is why I've heard that many people use GAUDIO STUDIO for this. 👀 Aren't you curious to hear your favorite singer perform other artists' songs? 😎 There are endless ways to use GAUDIO STUDIO like this.

🍯 Tip 2 - Adjusting a specific track in an already recorded song

This time, let me show you how you can use GAUDIO STUDIO in situations you might encounter in daily life.
Situation 1 - You've just finished a really great ensemble recording, but the drums are too loud!

In this case, separate just the drum track and lower its volume to bring out the other instruments. Similarly, toning down overly thumping beats in concert footage can make the artist's voice stand out more. Some recordings once seemed impossible to separate, and in some footage it seemed impossible to adjust a single sound. Both can now be cleanly remixed and uploaded!

Situation 2 - I filmed a vlog in a cafe, and copyrighted music was recorded too!

Say you filmed an outdoor vlog for YouTube and a store's music was picked up in the background. It may be detected as copyrighted material, which could limit your monetization. Until now, you probably just turned down the volume or spoke louder. If that didn't work, you may have ended up deleting the audio entirely and recording narration separately. 😎 Now you don't need to do that. Just separate your voice and cleanly remove the unwanted music. With GAUDIO STUDIO, you no longer have to suffer from unexpected copyright issues!

How did you like these applications of AI music separation? I'm often amazed at how tasks that were difficult or took tremendous effort in the past are now so easy. Why not use the magic of GAUDIO STUDIO to create and enjoy your own unique content? GAUDIO STUDIO will keep evolving until, in the not-too-distant future, simply inserting a stereo file neatly separates every track stem. We look forward to your continued interest and enjoyment!

2024.03.26

Behind the Scenes: Gaudio Lab's B2C App Debut, Just Voice Lite

🎙️ Interviewer’s note

Hi there! I'm Harry, a marketing intern at Gaudio Lab. 😊 We've just released our first B2C app, Just Voice Lite, and our marketing team took a behind-the-scenes look at the team behind it. We interviewed four people:

Howard, our PO, who came up with the idea for our first B2C service
Joey, a developer with 8 years of experience
Jack, who juggles audio SDK and app development
Steven, the team's trusted app developer

People involved in the development of Just Voice Lite

Q. What led Gaudio Lab, which had been developing B2B audio solutions, to start developing B2C services?

Howard (PO): I didn't listen to the company. 🙂 I suggested B2C as soon as I joined. With B2B, you have to wait for customers to come to you, like waiting for fruit to fall from a tree, right? I thought we could scale up if we broadened the product to target general users.

Q. What were you hoping to accomplish with your first B2C app?

Howard (PO): We weren't sure we could make a lot of money with it, so the idea was more like, 'Let's start light and free for now.'

Joey (Dev): Just Voice Lite was more an app to promote Gaudio Lab's technology than an app for revenue. The idea was to showcase our technology in a B2C product to attract B2B customers.

Jack (Dev): I think we can now put many of our SDKs into the app, such as spatial audio, EQ, and loudness normalization, and that will become possible as the app grows.

Q. Were there any additional considerations in developing a B2C service compared to a B2B service?

Howard (PO): When you sell an SDK (Software Development Kit) to an enterprise, you can explain anything that's a little hard to use in the user manual. Convincing a regular user is a different story. If there's any hassle or inconvenience, they'll just delete the app. The moment they have to click one more time or change how they do things, they'll stop using it.

Joey (Dev): Since the app relies on a virtual driver, we thought a lot about how to let users reach the product's core features smoothly, without any obstacles.

(Left) Previously developed app (Right) Newly released app

Q. Has the development intention changed significantly from your existing apps?

Howard (PO): I wanted to create an app that solves different pain points from existing noise reduction apps. So we changed the main target from office workers who often join video conferences to fans of artists. I suggested a slight change of shape, from an app that removes noise to an app that boosts voices, so we could target the content streaming market. For example, when watching a concert video, you can remove the surrounding noise and focus on the artist's voice. When watching a movie, you can make an actor's voice clearer if it's hard to hear.

Joey (Dev): The same technology can be used differently depending on who's using it, and Howard was really good at spotting that.

Jack (Dev): I liked this perspective. Just Voice Lite's core feature, GSEP (Gaudio Source Separation), was originally presented as 'Denoise.' But now, if you look at the app, it's 'Speech Enhancement.' It was impressive how that made it look like a product built on different technology.

Joey (Dev): Yeah, I think Howard did a good job with that. If you explain audio technology to general users, they won't see why they need it. So we flexibly pitched the product for B2C by saying, "It makes artists' voices sound better."

Howard (PO): I wrote about this on the blog too. Say Joey is on Google Meet right now, singing and playing guitar. If you apply noise reduction, you can't hear the guitar anymore. When fans watch an artist's content, it's not just the voice that matters but also the background music, so we thought, "Why not boost the voice instead?"

Q. What technologies are behind Just Voice Lite?

Jack (Dev): Just Voice Lite uses an AI technology called GSEP that removes noise. It's packaged as an SDK, and it's the noise removal technology that was rated most effective in listening evaluations. This noise separation algorithm also runs in real time. When we brought it to the macOS app, we put a lot of effort into making the experience seamless. Joey also worked with us to keep video in sync when using Bluetooth on desktop, so you can watch content without interruption. In summary, I think Just Voice Lite's strengths are the technology behind the algorithm, the performance of the SDK, and our know-how in making the app seamless.

The app finally passes review, and Henney hits the publish button.

Q. It took a while to get through the App Store review, didn't it?

Joey (Dev): Companies releasing their first app usually face strict reviews from Apple. That's what people in the industry say. When I asked friends at other companies, they said that passing on the 10th attempt is considered good for a first app. Some even said they were rejected up to 30 times... 🙂

Harry (Marketer): How long does it usually take to get a review response?

Howard (PO): Mobile reviews vary, but I think the desktop review came back within a day.

Steven (Dev): Because of the time difference with the US, we would submit, go to bed, and wake up to a rejection. 🙂

Q. What were the reasons for the rejections?

Steven (Dev): The first rejection said that we shouldn't force users to install the driver.

Joey (Dev): The guidance was not to force driver installation, phrased as 'Do not expose driver installation on the main page view.' Different team members interpreted this differently... We kept testing while resubmitting. Fortunately, Howard explained our case well in the comments to the reviewer, so we were allowed to keep driver installation on the main page. He put it into words well. 🙂

Howard (PO): You should never go against their wishes.

Steven (Dev): After that, they caught other small details: 'The user manual explanation is lacking,' 'It would have been nice to add marketing information,' 'The export function should be completed within 15 minutes.' Sometimes they even rejected us with, 'Why didn't you fix what I told you to fix?'... But Howard explained things well to the reviewers, and we moved on. There was a lot of back-and-forth with the reviewers.

Q. How did you feel when your app passed the review?

Steven (Dev): We were in the middle of a meeting thinking, "What if we get rejected again?" and then it just happened.

Howard (PO): I screamed because I was so happy that it passed.

Steven (Dev): I was like, 'Let's wrap up the meeting quickly.'

Howard (PO): It seemed intentional, so that we'd feel even happier later. 🙂 It's always nice when they say no at first and then approve it later.

Q. Did the feedback you received during the app registration process actually help?

Howard (PO): I think it was a good process. From Apple's point of view, they filter weird apps out of the store because those can damage the ecosystem. From our perspective, it wasn't bad either, because thanks to it we added file handling features and got QA done.

🎙️ We also interviewed PO Wan about the SDK!

Wan (PO): Hello, I'm Wan. Since this year, I've been the Product Owner for the SDK product line. Just Voice SDK was released earlier this year and is being pushed as the SDK squad's flagship product.

Q. Can you give us a brief introduction to the Just Voice Lite SDK?

Wan (PO): It's AI-based, but it runs on-device, on phones or laptops, rather than on a server. It also processes audio in real time with very little delay. Our head of research always emphasizes, 'Faster than the blink of an eye.' Numerically, it's 3/100 of a second (30 ms), so when you actually listen, you won't notice any delay at all. In short, it's a solution that suppresses noise and keeps voices crisp in any noisy environment.

Q. In what scenarios can the Just Voice Lite SDK help?

Wan (PO): Calls and video conferencing are the most basic scenarios you can think of. It's also used by companies that provide radio solutions for noisy industrial sites. Call centers are another example: customers often call from noisy environments, right? With Just Voice in such scenarios, voices come through clearly.

Q. What are the biggest advantages of the Just Voice Lite SDK?

Wan (PO): Many SDK products claim that integration is very easy, but when we actually worked with customers this year, we found that this one really is easy. We have prepared a thorough guide document, so I think you can try it in about 30 minutes. A trial version is available on our website, so you can apply it directly to your own environment, app, or device. The SDK can be installed on any laptop or phone and applied to applications running on them. We have also prepared a version for low-spec devices such as wireless earphones, which have much less processing power than smartphones, so we can now say 'it runs on most devices.' And since Gaudio Lab has many audio experts, we can quickly consult them to provide what you need for your situation. 🙂

Just Voice Lite on the App Store!

Q. Finally, how did you feel about developing Gaudio Lab's first B2C app?

Howard (PO): I think it was a good attempt. We actually used the app alone for many B2B demos, saying, 'Try Denoise like this.' So I think that alone is meaningful, and more B2C customers will come. Please promote it a lot. 🙂

Joey (Dev): It was a good attempt, but I don't think Just Voice Lite is the kind of app people install and say, 'This looks fun.' What I want to create is a product anyone can use regardless of age or gender. If such ideas come up, I'd like to build an interesting product in the next project.

Jack (Dev): Howard has been thinking about adding various sound effects to Just Voice. As Joey mentioned, I think that's one of the things that would make it more interesting. I don't know when it will happen, but I hope that day comes soon.

Steven (Dev): Now that we've gotten Gaudio Lab's first B2C app through, we have the know-how. I think there will be less trial and error next time.

🎙️ In conclusion…

Through the interview, we got a glimpse of the difficulties the team faced while developing the first B2C app and how they overcame them. It was a valuable opportunity for me to experience the entire app development process indirectly. Sincere thanks to the app team for taking part in the interview. 🙂 Using this first B2C app as a stepping stone, Gaudio Lab plans to gradually launch various B2C services. Please show your interest in Just Voice Lite and the new services to come!

2024.04.30