
Get ahead of the curve with Apple's Vision Pro and spatial audio! [Part2]

2023.09.21 by Dewey Yoon


 

 

Did you enjoy reading the previous post?

Martin and Ben's story continues in the next post, where we learn a little more about the immersive experience that Apple's Vision Pro will bring.

 

Part 2 consists of the following overview:

 

   ▪︎The Significance of the Plausibility

   ▪︎Spatial Technology in Earbuds for Augmented Reality

   ▪︎Spatial Audio in Vision Pro

   ▪︎Audio Technologies in Vision Pro

   ▪︎Outro: Spatial Computing meets Spatial Audio



Please click here[link] to read the next part and discover more insights and perspectives from these audio enthusiasts! (The link will take you to Martin's page!)

 

 

Get ahead of the curve with Apple's Vision Pro and spatial audio! [Part1]

Intro: Get ahead of the curve with Apple's Vision Pro and spatial audio!

Do you ever wish for a more complete sound experience? Have you heard of something called spatial audio and wondered what the hype is all about? If so, this blog post can answer your questions! Follow this conversation between two industry pioneers, Ben Chon from Gaudio Lab and Martin Rieger, who have been hard at work pushing the boundaries of audio technology. We are currently witnessing an exciting new era for applications specifically designed to utilize spatial audio. You will learn how these developments promise to bring us much closer to true 360° immersive sound experiences, the kind that make us feel truly present in our digital domain!

With the recent announcement of Apple's Vision Pro, it's easy to get swept up in the hype. As we all eagerly awaited Apple's next big thing, the "immersive" experience promises seem to be more than just marketing talk. One of the most exciting features of the Vision Pro is the introduction of spatial audio, which has been hailed as a game-changer for the world of VR and AR.

It's clear that the arrival of Apple's Vision Pro is going to be a massive step forward for immersive technology, and Apple's stock price has already reached new highs.

Prepare yourself for a journey into the world of 3D audio; buckle up, because this is where it gets really interesting.

Spatial Audio spreading across industries

Martin: Hi, I'm Martin Rieger, an enthusiastic 3D audio expert. I specialize in immersive audio productions and am proud to be one of the few experts working full-time in recording, post-production, and consulting. My studio, VRTonung, and my dedicated blog showcase the incredible possibilities of spatial computing technology when it is used with creativity.

As an active member of the Audio Engineering Society, I attended AES Dublin years ago, where I had the pleasure of meeting Ben from Gaudio Lab.
Since then, we have stayed in touch, discussing the latest developments in spatial audio technologies. So we thought it'd be fun to come back together and discuss the spatial computing revolution.

360-degree videos are underrated

I am truly passionate about my field and always strive to create something unique that wouldn't have been possible in stereo. While I appreciate films and music in formats like Dolby Atmos or Sony 360 Reality Audio, I don't think they add much value to spatial technology. So I'm also a big critic of the trend of making everything spatial audio for the sake of it. But more on where I see the true potential of spatial audio later.

My personal favorites are still 360 videos, which I believe are quite underrated and have mostly not been utilized to their full potential. In these immersive videos, sound plays a crucial role in guiding users' attention through head tracking. Additionally, combining the sound with head-locked stereo elements such as narrators or background music enhances the immersive experience and contributes to a captivating 360 soundtrack. By incorporating multichannel recordings like Ambisonics, we unlock what I'd consider a truly immersive audio experience.

Elements for a truly immersive audio experience

This is where I appreciated the workflow of Gaudio Works, which offers all three of these elements:

   ▪︎an object-based approach to pan mono sounds in space,
   ▪︎an ambisonics bed for recordings and reverbs, and
   ▪︎a head-locked stereo track.

This non-spatial track, which plays independently of where people are looking, is missing in popular formats such as AC-4, making them useless for podcasts or radio dramas in my opinion.

With Gaudio Lab's Works, I created the mix for a German premium furniture client, Janua, whose 360° VR experience takes showcasing furniture to an entirely new level!
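To make the three elements above concrete, here is a minimal sketch (in Python, with hypothetical names; this is not Gaudio Works' actual processing) of how a head rotation affects each element differently: objects and the ambisonics bed counter-rotate with the head, while the head-locked track passes through untouched.

```python
import math

def mix_immersive(objects, ambi_bed_wxyz, head_locked, head_yaw):
    """Sketch of combining the three mix elements under head rotation.

    objects:       list of (sample, azimuth_rad) mono objects
    ambi_bed_wxyz: first-order ambisonics bed sample (W, X, Y, Z)
    head_locked:   stereo sample (L, R) that must ignore head rotation
    head_yaw:      listener head yaw in radians (positive = turning left)
    """
    # Objects: counter-rotate each azimuth so sources stay world-fixed.
    rotated_objects = [(s, az - head_yaw) for s, az in objects]

    # Ambisonics bed: a yaw rotation mixes the X and Y components
    # (one common convention; W and Z are unaffected by yaw).
    w, x, y, z = ambi_bed_wxyz
    c, s = math.cos(head_yaw), math.sin(head_yaw)
    rotated_bed = (w, c * x + s * y, c * y - s * x, z)

    # Head-locked track: passed through untouched. This is the element
    # Martin notes is missing from formats like AC-4.
    return rotated_objects, rotated_bed, head_locked
```

For example, with the head turned 90° to the left, a front-panned object moves to the listener's right and the bed rotates with it, while a narrator on the head-locked track stays put.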
The VR project gives us an inside look at the stories behind the creations, giving viewers an emotional connection to the furniture.

“I really liked mixing with Gaudio Works, since it just felt like the developers put a lot of thought into the software. I’m not aware of any other spatializer plugin for DAWs that allows rotating the equirectangular view or offers timbre correction. I could even mix remotely on my laptop and transfer the result easily to the Oculus Go that we were using back then.”

Ben: Hi, I'm Ben from Gaudio Lab. At Gaudio Lab, I've led research and development efforts for our spatial audio product families, including Works, an immersive audio post-production tool for 360 videos; Craft, a 6-DoF (Degrees of Freedom) immersive audio game engine plugin for Unity and Unreal; and GSA (Gaudio Spatial Audio), a head-tracking spatial audio rendering SDK for TWS earbuds.

Gaudio Works, an immersive audio post-production tool tailored for enhancing 360 videos

I'm thrilled that you've enjoyed using Gaudio Works for mixing. We designed and developed Works based on three key principles.

First, it's crucial to support a wide range of technologies so that sound engineers can create the most exceptional sound experiences available in 360 videos. Through collaborations with mainstream Hollywood studios as well as small studios dedicated to 360 videos, we've learned about production-side needs. Supporting object, channel, ambisonics, and non-diegetic signals is one of the valuable lessons we've learned.

Second, the tool should provide binaural rendering that sounds as natural as possible or, at the very least, offer a means to control the balance between immersiveness and naturalness. While numerous binaural rendering technologies exist, we often hear artifacts from binaural filtering, especially when the filter has a long reverberation time.
We've strived to design the binaural filter so that timbral changes are as minimal as possible, and we have also incorporated a feature called "binaural strength." This empowers sound engineers to control the filter intensity for each sound element.

Third, ease of use is essential. Works adopted spherical coordinates (r, θ, ϕ) instead of rectangular coordinates (x, y, z) and provided both equirectangular and head-mounted display (HMD) views. Many pioneers in the VR scene have embraced Gaudio Works, leading Gaudio to win "Innovative VR Company of the Year" at the AMD VR Awards in 2017.

The Winner of the VR Awards 2017, Gaudio Lab

What even is spatial audio? AR “head tracking”

Martin: Spatial audio is a fascinating technology that has the power to completely transform our listening experiences. However, it's frustrating when discussions about spatial audio only focus on its role in films and music, as if these were the only areas where it is relevant, with formats such as Dolby Atmos.

The truth is, spatial audio has the potential to revolutionize many areas of our lives, particularly in the realm of spatial computing, where it can push the boundaries of what's possible. Understanding how spatial audio needs to be approached differently in 0DoF, 3DoF, and 6DoF is crucial for anyone interested in this exciting technology. DoF stands for degrees of freedom, and here is a little overview of how surround sound can be used in each case:

   ▪︎0DoF: movies, music, and podcast applications. You are not supposed to move your head; you just face the front, which is where most of the sound comes from.
   ▪︎3DoF: 360 videos and AR headphones. Head movement is enabled, and the soundfield around you adapts in real time. Directional audio filters help you localize sound.
   ▪︎6DoF: extended reality (virtual reality, augmented reality) and spatial computing. You are in a 3D digital world and can even move toward a sound.
This is similar to gaming applications.

So let's dive in and explore the future of spatial audio with multiple degrees of freedom, beyond the confines of traditional media.

Immersive audio isn't necessarily "immersive"

As we navigate the world of audio, we often hear chatter about the elusive and wondrous "immersive audio" experience. While it certainly has its place, we must not forget that stereo and even mono can be just as impactful when accompanied by engaging content. The key is to find the sweet spot where sound and visuals align in a perfect union. This is where spatial audio comes in for spatial computing. When executed correctly, it can transport us to another world entirely, regardless of the number of channels involved.

Many game developers believe that incorporating audio into their game is as simple as checking off a 3D audio box and calling it a day. However, truly immersive audio requires careful consideration and utilization of spatial audio technology. By neglecting this crucial aspect of sound design, developers miss out on the impact that audio can have on a player's overall gaming experience. Questions to ask about spatial audio support include:

   ▪︎Did you match the frequency curves of sounds taken from an SFX library before implementing them in your game engine?
   ▪︎Did you add context to the sound effect, such as a surround background or reverberation, to make your audio object blend into the virtual world?
   ▪︎Did you sort the layers of your soundtrack, like voice, music, and sound effects, and distinguish between diegetic and non-diegetic elements?

Head tracking is great when done right

For audiophiles like me, the experience of immersive soundscapes is nothing short of a thrill ride. With platforms like YouTube 360 or Facebook 360, it's easy to achieve that through Ambisonics audio. But anything below second-order ambisonics may disappoint.
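As a toy illustration of why low ambisonic orders can feel diffuse (a simplified in-phase beam model, not any platform's actual decoder), the "beam" a decoder can form toward a source narrows as the order grows:

```python
import math

def beam_gain(off_axis_rad, order):
    """Relative gain of a source `off_axis_rad` away from the decode
    beam's look direction, for a toy in-phase beam of a given
    ambisonic order: ((1 + cos(angle)) / 2) ** order.
    Higher order -> narrower beam -> sharper localization.
    """
    return ((1.0 + math.cos(off_axis_rad)) / 2.0) ** order

# A source 90 degrees off-axis still leaks at half gain in first order,
# but only at quarter gain in second order:
first_order = beam_gain(math.pi / 2, 1)    # 0.5
second_order = beam_gain(math.pi / 2, 2)   # 0.25
```

The broader the leakage, the more smeared a source's perceived position, which is why first-order material on social platforms can sound washed out.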
The spatial resolution can simply be too diffuse on such social media platforms. So if you are like me, you're yearning for a more object-based approach, like the one already common in the Apple Music app for spatial audio.

That is why I can't help but wish for something more advanced, something that can capture the nuances of sounds more faithfully. If you've been following the latest advancements in technology, you might have heard of the integration of head tracking into AR headphones, namely AirPods. As a professional in the field of audio and video production, I've adapted my knowledge of 360 videos to this new form of AR audio. It's an exciting time for the industry, with Apple leading the charge and Android catching up quickly with products like the Galaxy Buds2 Pro or Pixel Buds Pro.

Why Dolby Atmos tracks are a danger for immersive podcasts

Despite all the hype surrounding Dolby Atmos technology, there is one major flaw that could be a real danger for immersive podcasts: the lack of head-locked audio. The format is designed to be listened to facing forward, just like a cinema screen or TV, leaving no room for head tracking.

This means that any audio mixed in this format will ultimately lack a crucial element for creating truly spatial audio: non-diegetic audio such as background music or, most importantly, a narrator voice-over. It's a pity that so many podcasts, and even Audible, are jumping on the Dolby Atmos bandwagon without considering the limitations of the technology, especially since it touts itself as being future-proof. This is exactly why I like Apple's spatial audio approach with RealityKit Pro and the three elements of an immersive soundtrack listed above.

The story will be continued in the next post.
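As a footnote to the DoF overview above, the three models differ simply in how much of the listener's pose the renderer consumes. Here is a minimal 2D sketch (hypothetical helper, not any vendor's API): 0DoF ignores pose, 3DoF applies only head rotation, and 6DoF applies rotation and translation.

```python
import math

def listener_relative(source_xy, listener_xy, listener_yaw, dof):
    """Position of a world-anchored source relative to the listener,
    under each degrees-of-freedom model.

    dof = 0: pose ignored entirely (movies, music, podcasts)
    dof = 3: head rotation tracked, position fixed (360 video, AR earbuds)
    dof = 6: rotation and translation tracked (VR/AR, spatial computing)
    """
    sx, sy = source_xy
    if dof >= 6:  # walking toward/away from a sound changes its distance
        lx, ly = listener_xy
        sx, sy = sx - lx, sy - ly
    if dof >= 3:  # turning the head counter-rotates the scene
        c, s = math.cos(-listener_yaw), math.sin(-listener_yaw)
        sx, sy = c * sx - s * sy, s * sx + c * sy
    return (sx, sy)
```

In 0DoF the source never moves no matter what the listener does; in 3DoF it swings around the head; in 6DoF it can also get closer or farther as the listener walks.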

2023.09.21
Patently Gaudio #1 - Phase-matched Binaural Rendering

The Importance of Uncompromised Sound Quality: Gaudio Lab’s Solution to Sound Quality Distortion in Spatial Audio

At Gaudio Lab, we tackle the world’s audio challenges with software products that incorporate innovative audio source technology and make people’s lives better. The term ‘source technology’ implies that Gaudio Lab is a hub of originality. One way we share our innovations with the world is through patents. Many of our core technologies are protected this way, and we hold a substantial number of patents, especially considering our company’s size and years in operation. However, we may sometimes strategically decide not to patent certain technologies, despite their originality.

A patent is a system granting inventors exclusive rights for a certain period (typically 20 years) in exchange for making their technological advancements public. After this period, the technology becomes openly available, making it easy for competitors to imitate. The responsibility to prove any unauthorized use or infringement of the technology falls mainly on the patent holder. In cases where proving infringement is challenging, it might be more advantageous for us not to disclose the technology. This approach may seem at odds with the advancement of humanity, but for a company focused on profit, freely sharing extensive research and development is a tough decision. Nonetheless, we at Gaudio Lab have filed over 100 patents since our founding, a noteworthy achievement even for a technology-driven company.

You can find a list of our publicly disclosed patents here: https://www.gaudiolab.com/company/patents They can also be explored in detail through a simple Google search. However, for those not in the field, understanding these patent documents can be quite difficult. Therefore, we have decided to make our patents more approachable.
We plan to explain them in simpler terms, focusing on (1) the problems we aimed to solve, (2) the main ideas of our inventions, and (3) the benefits these inventions bring.

After careful consideration, Gaudio Lab decided that the first topic to discuss from our extensive portfolio of over 100 patents would be Spatial Audio, a field where we consider ourselves to be the original experts. More specifically, we focused on the crucial technology of binaural rendering in headphones (earbuds). Despite narrowing down our focus, there were still over 50 patents to choose from. It’s important to note that all our patents are significant (if they weren’t, we wouldn’t have invested the time and expense in filing them). Each patent is highly valued, much like a cherished child, filled with the inventors’ hard work and dedication. While this series will eventually cover all of our patents, the first episode is always critical.

The first patent we chose to discuss is US 10,609,504 B2 (Audio signal processing method and apparatus for binaural rendering using phase response characteristics). To make understanding patents easier, let’s start with how to read this patent’s unique number. ‘US’ indicates that it is a United States patent; other countries use two-letter codes like KR for Korea, CN for China, and JP for Japan. The ‘B2’ shows that this patent has been ‘granted’, meaning it has been reviewed and recognized by the United States Patent and Trademark Office. Patents that are still under review and not yet granted are marked with codes like A1/A2. The number ‘10,609,504’ is the serial number assigned by the US Patent Office, suggesting it’s approximately the 10,609,504th patent since the patent system was established in the United States. Back in Edison’s era, patents were numbered in the tens to hundreds of thousands. The text in the brackets is the title of the invention.
Sometimes this title clearly describes the invention, while at other times it can be vague or less helpful, and this is often intentional. The title does not limit the rights of the patent, allowing some freedom in its wording.

This patent has also been filed and registered in KR, CN, and JP, with the following equivalent patents:

   ▪︎KR 10-2149214 (Audio signal processing method and apparatus for binaural rendering using phase response characteristics)
   ▪︎CN 110035376B (使用相位响应特征来双耳渲染的音频信号处理方法和装置)
   ▪︎JP 6790052 B2 (位相応答特性を利用するバイノーラルレンダリングのためのオーディオ信号処理方法及び装置)

The full texts of these patents are available online. For the US patent, visit: https://patents.google.com/patent/US10609504B2/en?oq=US10609504B2

These patents are collectively known as family patents. As patent laws vary by country, securing global protection for an invention requires separate applications and registrations with each country’s patent office. The recognition of a technology and the extent of rights it receives can differ from one country to another, and even among examiners within a patent office. As a result, the same technology might have a different scope of rights in various countries, and in some instances it might not get registered at all.

Now, let’s look more closely at what the patent entails.

[The Problem Addressed by the Invention]

Binaural rendering, a method for producing spatial sound through headphones, involves applying a filter known as an HRTF (Head-Related Transfer Function) to the audio signal. For a detailed overview of binaural rendering technology, please see this link. However, it is often necessary to apply not just one but several filters simultaneously. An HRTF is a filter linked to a specific point in space. For example, if the chirp of a sparrow corresponds to one point, the roar of an elephant might be associated with a broader area, requiring multiple HRTFs to accurately depict the sound.
Similarly, if a sound reflects off a wall and reaches our ears, different HRTFs are needed for the direct and reflected sounds, as they arrive from different directions. Furthermore, if the impact of an HRTF is too strong (meaning a significant change in sound quality) and needs to be softened, the softening process also requires a filter, again leading to the use of multiple filters.

In real-world applications of binaural rendering, there are many situations where applying multiple filters to a single sound source becomes necessary. This is not exclusive to binaural rendering; in general, adding effects to audio often involves the complex use of multiple filters.

What issue arises when multiple filters are superimposed? Different time delays for each filter, as the input audio signal passes through, can unintentionally distort sound quality. This is known as the 'comb filter' effect. The term comes from the pattern that appears on the frequency spectrum, resembling the teeth of a comb, where certain frequencies are significantly amplified and others notably reduced, altering the original sound dramatically. It typically occurs when signals pass through two filters with disparate delays.

Fig. 1: Example of the comb filter effect (image source: http://www.sengpielaudio.com/calculator-combfilter.htm)

Fig. 2: An example of the comb filter effect. Periodic distortions that mimic the appearance of a comb’s teeth emerge in the frequency response (referenced from Fig. 24 of patent US 10,609,504 B2)

It would seem that if we could just align the delays of the multiple filters we plan to use in parallel, there would be no issue. And that is a valid point. However, there is added complexity when it comes to HRTFs. An HRTF is a filter obtained by measuring sounds played from various directions using microphones placed in human ears or in a mannequin.
Despite precise measurements, slight variations in delay between filters are unavoidable due to measurement errors that can arise from many factors. These variations cause slight differences in delay at each frequency of the filter response, leading to an unintentional comb filter effect when multiple HRTFs are used in parallel, which can degrade the audio quality.

This patent was developed to overcome this specific problem.

[The Fundamental Concept of the Invention (Subject Matter)]

Identifying the problem clearly can sometimes lead to a surprisingly straightforward solution, and that is the case for the problem this invention addresses. Since the central issue is the varying delays across each frequency of each HRTF filter, the core idea, or subject matter, of this invention is to “align them uniformly.” Up to this point we have used the term delay for clarity, but in signal processing terminology this becomes a value known as phase on the frequency axis. Making this phase response linear means setting a consistent delay, and by ensuring this fixed delay is the same for every filter, we can eliminate the comb filter distortion. Below is a graphic representation showing the phase response of the original HRTF before and after this linearization process.

Fig. 3: Fig. 4 from U.S. Patent US 10,609,504 B2, showing the phase response of the original HRTF alongside the linearized phase response

Integrating the concept of linearization into HRTFs involves a significant issue. HRTFs consist of filter pairs that represent the acoustic paths from the sound’s origin to both ears. The spatial effect of the sound depends on the relative relationship between these filter pairs, which includes their phase responses. If we linearized each filter within a pair without considering this relationship, we would alter their interaural phase difference (IPD), a key element of spatial perception.
Such an alteration could lead to a loss of the spatial effect that is central to HRTF functionality. To address this, the invention proposes linearizing only one filter of each pair and adjusting the phase of the other to maintain the IPD.

The filters in an HRTF pair are distinguished as the ipsilateral HRTF, for the nearer ear, and the contralateral HRTF, for the farther ear. Since the ipsilateral HRTF captures more energy (sound is perceived as louder by the nearer ear), the method linearizes the phase response of the ipsilateral HRTF.

[Impact of the Invention]

This method allows any number of HRTFs to be layered without comb filter distortion. It significantly reduces one of the most prevalent challenges in spatial audio production: preserving the integrity of the original sound quality.

Spatial audio is a technology that simulates sounds as if they were occurring in a real space, making it essential for applications like gaming, films, virtual/augmented reality, and spatial computing. Imagine the illusion that, while sitting in your room, you feel like you are at a Taylor Swift concert in Carnegie Hall; this is often described as the “being there” experience. However, applying HRTFs inherently alters the original sound, as it involves a transformative filter, leading to the distortion we’ve discussed. Thus, the quality of a spatial audio technology hinges on its ability to minimize such distortion while delivering a lifelike auditory experience. The technology from this invention is expected to become increasingly important in this respect.

We have made an effort to thoroughly explain a core component of Gaudio Lab’s spatial audio technology. We hope you have gained some understanding of our commitment to crafting high-quality spatial audio experiences.
Gaudio Lab is excited to continue showcasing our dedication to exceptional sound experiences through our patents, and we invite you to stay engaged with our future updates.
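The comb filter problem described above, and the effect of matching the delays, can be seen numerically with a minimal sketch (`summed_magnitude` is a hypothetical helper, not code from the patent). Summing two unit-gain paths whose delays differ by 1 ms produces a null at 500 Hz; matching (linearizing) the delays restores a flat response.

```python
import cmath
import math

def summed_magnitude(freq_hz, delays_s):
    """|H(f)| when unit-gain filters with the given pure delays are summed.

    A pure delay d contributes e^{-j 2 pi f d} at frequency f; summing
    two paths with different delays yields the comb-filter ripple.
    """
    return abs(sum(cmath.exp(-2j * math.pi * freq_hz * d) for d in delays_s))

# Two HRTF-like paths whose delays differ by 1 ms. At 500 Hz the phase
# difference is pi, so the paths cancel: the first comb null.
mismatched = summed_magnitude(500.0, [0.0, 0.001])

# With the delays linearized to the same value, the sum is flat (gain 2)
# at every frequency; no comb filtering remains.
aligned = summed_magnitude(500.0, [0.001, 0.001])
```

Sweeping `freq_hz` with mismatched delays traces out the comb-shaped magnitude response shown in Fig. 2, with nulls at odd multiples of 500 Hz.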

2023.12.07