Sound matters: The role of audio quality in video conferencing
What we know about hybrid work is this: the challenges evolve as the digital workplace does. And for workers who are constantly video conferencing, we know that meeting fatigue is at an all-time high.
But there’s more to that fatigue than being on camera. Like the metaphorical iceberg whose tip belies a massive structure beneath, video conferencing fatigue is really just one part of the overall fatigue we feel as hybrid and remote workers.
What makes up this iceberg of anxiety? For a lot of people, it’s all about sound.
- What is sound quality?
- Why is sound quality important for video conferencing?
- What is the best sound quality for video conferencing?
- The role of hardware on sound quality
- What devices are best for sound quality and video conferencing?
- What software features improve sound in video calls?
- Tips for how to improve video conferencing sound quality
What is sound quality?
Can you remember a time when sound placed you completely under its spell? A long drive on a quiet highway with the stereo full blast. Face to face with the roar of the ocean on a cold winter’s beachhead. That live show in a cramped bar where the sound was at 11/10, but nobody cared. The moment bellowing horns announced a certain movie franchise’s epic intro text, scrolling down through a galaxy far, far away.
We experience sound subjectively, in ways that can soothe and, sometimes, irritate. One person’s relaxing ambient music is another person’s veto on the work playlist. Some of us need a background podcast for mundane tasks, others can’t follow the pod if we’re deep into focus work.
But all these examples assume one thing: crystal clear, seamless audio. And in the digital age, where sound compression and streaming are incredibly sophisticated, this audio experience is no luxury—it’s a baseline for what sound should be.
So, what distinguishes this lush sound experience we all expect, if not crave? Let’s take a moment to cover some basics.
Sound travels in waves, caused by vibrations which vary in frequency. These frequencies are measured in Hertz (Hz), and we perceive them in terms of pitch. Human speech can range from 80 Hz to 14 kHz (kilohertz). Our ears hear pitches ranging from 20 Hz to 20 kHz. Lower frequencies mean lower pitches, like an internal combustion engine firing up, a funky bass line, or a baritone singer. Higher frequency waves mean higher-pitched sound: think of fork tines tapping glass or a whistled tune.
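To make "frequency" a little more concrete, here is a minimal Python sketch. The function names are ours, made up purely for illustration. It builds one second of a pure tone and counts how many times the wave completes a cycle, which is exactly what Hertz measures: cycles per second.

```python
import math

def pure_tone(frequency_hz, duration_s=1.0, sample_rate_hz=48_000):
    """Return samples of a sine wave at the given frequency."""
    count = int(duration_s * sample_rate_hz)
    return [
        math.sin(2 * math.pi * frequency_hz * i / sample_rate_hz)
        for i in range(count)
    ]

def count_cycles(samples):
    """Count upward zero crossings, a rough tally of completed cycles."""
    return sum(1 for prev, cur in zip(samples, samples[1:]) if prev < 0 <= cur)

bass_note = pure_tone(110)    # a low pitch
whistle = pure_tone(2_000)    # a much higher pitch
print(count_cycles(bass_note), count_cycles(whistle))
```

The counts should land within one cycle of the frequencies requested, because the first cycle starts at zero rather than crossing it.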
However, when we transmit sound, things get a little more complicated. It starts with a radio transmitter, which generates an electrical signal containing audio information. Next, an antenna radiates the signal as radio waves, which carry it to a radio receiver. After that, the receiver extracts the information and sends it to a device (speaker, display screen, etc.).
In the early 20th century, these components were separate and massive, connecting city skyscrapers with distant neighborhoods—a miraculous feat that brought the world together by providing a shared experience.
Fast forward to today? Every smartphone contains this technology. Sound is broken down, digitized, and transmitted in real time, across the planet, from any device that connects to the internet.
In terms of video conferencing, most participant audio is now transmitted through VoIP (Voice over Internet Protocol). Basically, your audio is sent over the internet rather than through a cellular network. That means audio quality in a VoIP video conference depends mostly on each participant’s internet speed, whereas a traditional mobile phone call depends on things like cell tower proximity.
Sound quality depends on many variables, but these 4 aspects are key:
- Sample rate. The number of digital samples taken per second from the original analog audio. Typically, a higher sample rate means higher quality audio, expressed in kHz (we often see 8 or 16 kHz for standard telephony, and 44.1 kHz for streaming audio).
- Bitrate. This refers to the amount of data used to represent each second of digital audio. Bitrate is measured in kilobits per second (kbps). Like sample rate, a higher bitrate often indicates better audio quality.
- Audio codecs. Algorithms that compress and decompress digital audio. For decades, the G.711 narrowband codec (feel free to read that in C-3PO’s voice) was the standard for telephony. But we’ve now entered an era where HD codecs like G.722 (same) and others meet wideband standards and provide higher quality audio.
- Bandwidth. At the end of the day, your bandwidth is maybe the most critical piece for audio quality in VoIP calls and video meetings. Most platforms will default to a narrowband audio codec if your upload speed is low. With a faster internet speed, wideband and full band codecs are available, which provide HD audio.
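A little arithmetic shows how sample rate and bitrate relate. The Python sketch below is ours and rests on two standard signal-processing facts the list above doesn't spell out: a sample rate can only capture frequencies up to half its value (the Nyquist limit), and uncompressed audio needs sample rate × bits per sample × channels bits every second. The function names are illustrative, not part of any product.

```python
def nyquist_limit_hz(sample_rate_hz):
    """Highest frequency a sample rate can represent: half the rate."""
    return sample_rate_hz / 2

def pcm_bitrate_kbps(sample_rate_hz, bits_per_sample, channels=1):
    """Bitrate of uncompressed audio in kilobits per second."""
    return sample_rate_hz * bits_per_sample * channels / 1000

print(nyquist_limit_hz(8_000))          # 4000.0
print(nyquist_limit_hz(16_000))         # 8000.0
print(pcm_bitrate_kbps(8_000, 8))       # 64.0, the rate G.711 runs at
print(pcm_bitrate_kbps(44_100, 16, 2))  # 1411.2, CD-style stereo
```

Doubling the sample rate doubles both the frequency range you can capture and the raw data you have to move, which is why codecs and bandwidth end up on the same list.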
Take a moment to consider how a video conference can complicate these variables. Even a handful of participants (some using a cell network via their smartphone, some using laptops or other devices, all with varying internet speeds and providers) adds layers of potential audio problems.
Why is sound quality important for video conferencing?
We’re nearing two years since COVID-19 suddenly changed our world and the paradigm of work. It’s important to look at how the global shift to hybrid and remote work has affected workers, given how often we speak to and collaborate with one another over video.
As more research and analysis comes out, we see that exhaustion with video conferencing is on the rise. We know that nearly half of workers report feelings of isolation when working remotely, and that 61% stated video meeting fatigue has increased. Perhaps even more concerning: 90% of survey respondents experience collaboration issues when working from home.
When it comes to sound, the potential problems are easy to identify:
- Stretched bandwidth can make your audio quality suffer. Think of that instant anxiety when colleagues say you’re cutting out!
- Reverberant sound can also bring meetings to a halt and make the audio experience grating for everyone.
- Crosstalk presents a challenge for those who are more reluctant to speak, a glaring issue for companies focused on creating inclusive experiences.
- Constant, unaddressed background noise can stop a speaker in their tracks, distract the listener, and completely undermine the meeting.
Over time, these issues can snowball into larger-scale, long-term anxieties about virtual collaboration. The more we experience problems with audio performance, the less we’ll want to collaborate.
That’s really the key here, and something we take for granted. Sound is a primal, core aspect of our daily experience, whether we’re collaborating or simply perceiving the world. Research shows that sound in certain contexts can be a great reliever of stress. Conversely, studies have also revealed that sound can cause anxiety and even depression.
In The Design of Everyday Things, Don Norman notes the dual nature of sound in the context of product design, specifically as a signifier for users:
“Sound is tricky. It can annoy and distract as easily as it can aid. One of the virtues of sounds is that they can be detected even when attention is applied elsewhere. But this virtue is also a deficit, for sounds are often intrusive.”
So, how do we get started in overcoming audio anxiety, and what exactly goes into better sound quality when we video conference?
What is the best sound quality for video conferencing?
As we’ve noted in this piece, bandwidth, compression, and codecs are crucial to audio quality. So, let’s dive a bit deeper into the difference between wideband (HD) and narrowband audio.
Narrowband audio uses an Adaptive Multi-Rate (AMR) speech codec. Essentially, AMR codecs utilize a limited frequency range of sound when compressing and transmitting over a live stream (200 Hz to 3.4 kHz). The AMR codec also features a variable bitrate that changes based on bandwidth (about 5-12 kbps). In cases where your sound quality is poor, it’s likely due to associated issues with low bandwidth—the AMR codec is moving to a lower bitrate to accommodate.
As high-speed internet becomes more accessible, higher quality sound has taken center stage: wideband audio, a high-definition format designed specifically for VoIP.
Wideband uses Adaptive Multi-Rate Wideband (AMR-WB) speech codecs that provide a wider frequency range (50 Hz – 7 kHz). This means higher and lower pitched sounds are picked up and transmitted, providing a much richer, more robust sound quality.
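Here is a small Python sketch of what those two frequency ranges mean in practice. The band edges come from the figures above. Everything else, including the `choose_band` helper and its 100 kbps cutoff, is a hypothetical stand-in of ours and not how any particular platform decides.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AudioBand:
    name: str
    low_hz: float
    high_hz: float

    def carries(self, frequency_hz):
        """True if a sound at this frequency falls inside the band."""
        return self.low_hz <= frequency_hz <= self.high_hz

NARROWBAND = AudioBand("narrowband", 200, 3_400)
WIDEBAND = AudioBand("wideband", 50, 7_000)

# Hypothetical cutoff, chosen only to make the example concrete.
WIDEBAND_MIN_UPLINK_KBPS = 100

def choose_band(uplink_kbps):
    """Fall back to narrowband when the upload speed is low."""
    return WIDEBAND if uplink_kbps >= WIDEBAND_MIN_UPLINK_KBPS else NARROWBAND

for frequency_hz in (100, 1_000, 5_000):
    print(frequency_hz, NARROWBAND.carries(frequency_hz), WIDEBAND.carries(frequency_hz))
print(choose_band(40).name, choose_band(500).name)
```

The deep part of a low voice around 100 Hz and the crisp consonant sounds around 5 kHz both fall outside narrowband but inside wideband, which goes a long way toward explaining why HD audio sounds fuller.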
As we mentioned earlier, lower internet speeds tend to make video conferencing solutions default to a narrowband audio codec. Faster speeds open up the ability to use wideband (HD) codecs. But that sounds a bit undemocratic, right? Should internet speed really dictate inclusivity to that degree?
This is why Webex utilizes Opus (a more versatile, scalable audio codec) to maintain an inclusive audio experience for every participant.
Opus can provide great audio quality even at lower bitrates. But it can also flex its muscles for wideband and full band audio, the latter covering the full range of sound humans can perceive (20 Hz – 20 kHz).
We should stop to spotlight how the flexibility of Opus recently helped solve a people-centric collaboration challenge.
Opus’ ability to provide crystal clear audio across the sound frequency spectrum makes features like Webex music mode possible. In this audio mode, sound is optimized for music instead of human speech, preserving the original sound much more clearly.
The Indianapolis Children’s Choir (ICC) was sidelined and unable to practice together for months due to the pandemic. They decided to use music mode to ramp up choir practice. And they also provided feedback to help Webex improve the feature even more.
Check out this video to see how music mode empowered the ICC to return and pursue their passion in the face of unprecedented obstacles:
With so many moving, overlapping pieces affecting the sound of your video conference, it’s important to consider other potential challenges.
Let’s explore how hardware can transform how you hear, and what your colleagues hear, during video conferencing.
The role of hardware on sound quality
On a very basic level, the acoustic signal your mic picks up is kind of everything. This is the first touchpoint before digitization, compression, and decompression. A simple computer mic, an external mic, a device with a microphone array—these can all affect specific audio issues like reverberant or tinny sound.
Depending on your own workstyle and workspace, different types of devices can dramatically optimize your audio experience, both what you hear and how you’re heard. I spoke with our Acoustic Engineer, Patrick Achtelik, about Webex hardware and the advanced audio technology that homes in on the speaker’s voice while simultaneously reducing unwanted noise.
“Beamforming essentially uses several microphones that are omni-directional,” Patrick explained. “The microphone itself picks up sound equally from all directions.”
However, as you place more omni-directional mics together, you can make them more directive. As a result, the mics are more effective for more frequencies of sound. As Patrick notes:
“To get directivity over a larger frequency range, you need more microphones. In the Desk Pro, for example, on the left of the bezel there are 6 beamforming microphones spread at different distances, but not spaced equally. This allows the microphones to work at different frequencies and in different frequency bands.”
This alignment also means that sounds above and below the device aren’t picked up, while sounds in front of the microphone array—like your voice—are focused on and optimized.
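The textbook version of this idea is called delay-and-sum beamforming, and a toy Python sketch of it follows. To be clear about the assumptions: this is not the Desk Pro's algorithm, the four evenly delayed mics and the `simulate_array` helper are invented for the demo, and a single 1 kHz tone stands in for a voice.

```python
import math

SAMPLE_RATE_HZ = 16_000

def tone(frequency_hz, count):
    return [
        math.sin(2 * math.pi * frequency_hz * i / SAMPLE_RATE_HZ)
        for i in range(count)
    ]

def simulate_array(source, arrival_delays):
    """Hypothetical helper: each mic hears the source delayed by whole samples."""
    return [
        [0.0] * delay + source[: len(source) - delay]
        for delay in arrival_delays
    ]

def delay_and_sum(channels, steering_delays):
    """Undo the expected per-mic delays, then average the aligned channels."""
    usable = min(len(ch) - delay for ch, delay in zip(channels, steering_delays))
    return [
        sum(ch[i + delay] for ch, delay in zip(channels, steering_delays))
        / len(channels)
        for i in range(usable)
    ]

def rms(signal):
    return math.sqrt(sum(x * x for x in signal) / len(signal))

voice = tone(1_000, 1_600)
in_front = simulate_array(voice, [0, 0, 0, 0])    # reaches all 4 mics together
off_axis = simulate_array(voice, [0, 4, 8, 12])   # reaches each mic a bit later

facing_forward = [0, 0, 0, 0]
print(rms(delay_and_sum(in_front, facing_forward)))  # arrives at full level
print(rms(delay_and_sum(off_axis, facing_forward)))  # mostly cancels out
```

Sound from straight ahead lines up across the mics and adds together, while the off-axis copy arrives out of step and largely cancels. The cancellation here is tuned to one frequency, which hints at why Patrick's team spaces the mics unevenly to cover more of the range.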
But that’s only one piece of the puzzle. Patrick described an important marriage between software and hardware that greatly impacts how speakers and listeners avoid pitfalls like echo, which can sometimes feel out of our control:
“Acoustic echo cancellation (AEC) needs to work perfectly well for Webex full duplex to function. The microphone on one person’s side picks up sound from the loudspeaker as well. Without AEC, you would hear yourself echoing back.”
AEC functionality is key to video conferencing. Full duplex, a technology that allows multiple speakers to talk at once, is something we expect to work all the time. But many platforms lack features that account for echo and reverberation, and they get tripped up. Patrick made it clear how much distance matters:
“Hearing an echo can start with distortion from the loudspeaker. If you turn up the volume on tiny laptop speakers, they distort quite quickly. Physical distance can reduce the amount of sound going from speaker to mic, but it can also place the mic closer to the user. Which makes your voice clearer!”
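For the curious, one classic way to build an echo canceller is an adaptive filter that learns how the loudspeaker's sound shows up in the mic, then subtracts its best guess. The Python sketch below uses a normalized least-mean-squares (NLMS) filter. It is our illustration and not the Webex implementation: the room response is invented, random noise stands in for the far-end audio, and it ignores the hard case where both people talk at once.

```python
import random

def nlms_echo_canceller(far_end, mic, taps=8, step=0.5, eps=1e-8):
    """Subtract an adaptively estimated echo of far_end from the mic signal."""
    weights = [0.0] * taps
    history = [0.0] * taps  # most recent far-end samples, newest first
    cleaned = []
    for x, d in zip(far_end, mic):
        history = [x] + history[:-1]
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        error = d - echo_estimate  # what is left once the guess is removed
        norm = sum(h * h for h in history) + eps
        weights = [w + step * error * h / norm for w, h in zip(weights, history)]
        cleaned.append(error)
    return cleaned

def simulate_echo(far_end, room_response):
    """Hypothetical room: the mic hears delayed, quieter copies of the speaker."""
    return [
        sum(gain * far_end[i - k] for k, gain in enumerate(room_response) if i - k >= 0)
        for i in range(len(far_end))
    ]

def energy(signal):
    return sum(x * x for x in signal)

random.seed(0)
far_end = [random.uniform(-1, 1) for _ in range(2_000)]
mic = simulate_echo(far_end, [0.0, 0.6, 0.3, 0.1])
residual = nlms_echo_canceller(far_end, mic)

print(energy(mic[-500:]))       # echo energy reaching the mic
print(energy(residual[-500:]))  # what remains after cancellation
```

Once the filter has had a moment to adapt, almost none of the echo should remain. Loudspeaker distortion, the problem Patrick mentions, is hard for this kind of filter because a distorted echo is no longer a simple scaled and delayed copy of the original.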
Take a moment to see the relationship between microphones and loudspeakers in Patrick’s Focus on Sound vlog here:
What devices are best for sound quality and video conferencing?
For remote and hybrid workers, upgrading the headset can be a great first step to audio enhancement. Why? Patrick explained it like this:
“Built-in laptop microphones are relatively far from the user and close to the laptop’s speakers. For the most part, your voice may have a faraway quality and the AEC may struggle because of proximity to the speakers. A headset breaks the acoustic connection between loudspeaker and mic, as the sound on the headphones doesn’t reach the headset mic.”
The Cisco 730 headset, a Red Dot winner, helps crystallize video conference sound. The design is boomless for a more natural speaking experience (no more mic boom in front of your mouth). It features beamforming technology, with 4 mics arrayed in the headset to form a sort of audio bubble that focuses on your voice. This headset can move from adaptive noise cancellation, which automatically adjusts to noisy environments, to ambient mode, so you can hear conversations in a shared workspace if you’re feeling a more collaborative vibe.
The newest Cisco headset, designed in partnership with industry leader Bang & Olufsen, provides even more audio features. The microphones (6) are thoughtfully placed in a geometric design to better isolate your voice while utilizing advanced algorithms to cancel out background noise.
Even a simple move from laptop mic to a headset will transform the way you experience meetings. But when you’re considering an overhaul to your video conferencing experience, collaboration devices like the new Webex Desk Mini may be the answer. Replete with the intelligent microphone array technology and focused sound pickup we’ve discussed, this device also provides HD video and allows you to co-create in real time with digital whiteboarding.
In essence, we can think of hardware as the engine of our audio experience, the motor that drives what we hear and how we’re heard. If that’s the case, we can think of software as the fuel that ignites the engine and powers its performance.
What software features improve sound in video calls?
We’ve all come to expect it, and many of us dread it: the pall of background noise. Since the world moved to hybrid work, it’s become one of the toughest challenges workers have had to grapple with.
But it shouldn’t be surprising that background noise causes stress. The video conference setting is a microcosm of a pain point felt around the world. Research shows that noise annoyance in general is real and harmful. And it’s crucial to understand that anxiety is caused specifically by unwanted sound. When a dog barks at a package delivery. When a child interrupts as you’re listening to important project details, or a blender or vacuum cleaner starts up just as you’re ready to chime in.
We want a focused work experience in an environment that’s often anything but. To get that experience, we need technology to combat the audio challenges that are sometimes out of our control. And technology is up to the task.
In 2020, Cisco acquired BabbleLabs, a leader in noise removal software. Using AI and machine learning, they’ve enhanced noise removal in the Webex tool and brought a stunning, game-changing technology to the forefront.
Machine learning encapsulates many granular, complex processes. Countless hours of training data are used to allow machine learning algorithms to differentiate human speech from other sounds. When deployed, specific noises are identified and removed before they’re transmitted and heard. This also takes plenty of human ingenuity, specifically deducing which noises are most likely to interrupt and distract people working from home.
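To appreciate why machine learning is needed, it helps to look at the much simpler tool that came before it: a noise gate. The Python sketch below is ours and involves no machine learning at all. It chops audio into short frames and silences any frame quieter than a threshold. The frame size, threshold, and test tones are arbitrary choices for the demo.

```python
import math

def frame_rms(frame):
    return math.sqrt(sum(x * x for x in frame) / len(frame))

def noise_gate(samples, frame_size=160, threshold=0.05):
    """Silence every frame whose RMS level falls below the threshold."""
    gated = []
    for start in range(0, len(samples), frame_size):
        frame = samples[start:start + frame_size]
        keep = frame_rms(frame) >= threshold
        gated.extend(frame if keep else [0.0] * len(frame))
    return gated

def tone(amplitude, count=160, frequency_hz=200, sample_rate_hz=16_000):
    return [
        amplitude * math.sin(2 * math.pi * frequency_hz * i / sample_rate_hz)
        for i in range(count)
    ]

hum = tone(0.01)     # faint background noise
speech = tone(0.5)   # stand-in for someone talking
cleaned = noise_gate(hum + speech)

print(max(abs(x) for x in cleaned[:160]))  # hum frame is silenced
print(cleaned[160:] == speech)             # loud frame passes untouched
```

A gate only knows loud from quiet. A barking dog is loud, so it sails straight through, and that gap is what trained models that recognize speech are meant to close.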
To gain a better understanding of how AI has transformed the Webex experience from an audio perspective, I spoke with another expert: Keith Griffin, Distinguished Engineer for AI and Machine Learning in our office of CTO.
“When you’re asking people to repeat themselves or if you’re in a noisy environment, you feel bad.” This was a source of anxiety in years prior to the pandemic, Keith explained. “In the past, people wouldn’t join because they weren’t confident in the environment they were in.”
But the features deployed by Webex have sought to address these challenges, which have been intensified by the pandemic and the shift to hybrid work. One prime example is the optimize for my voice feature, which incorporates the fundamental concept of distance that Patrick emphasized.
“I’m amazed to see what our Machine Learning/AI teams have been able to achieve,” Keith said. “Not just with noise removal but in how they are evolving the technology to solve other use cases such as optimize for my voice. What optimize for my voice does is determine the active speaker based on a number of parameters. It picks up the primary speaker and any other human voice that’s detected is just filtered out.”
When it comes to noise removal, Keith has noted some of the more intriguing specifics of machine learning. To cover as many bases as possible, Webex software can identify and remove sounds like keyboard tapping, sirens (the software is capable of recognizing siren sounds from different countries), garden machinery, and dogs barking. In fact, our original noise detector design could identify over 100 different breeds of dog by their distinctive bark.
As Keith described, noise removal is about more than just removing background noise. It’s so effective that it allows for more inclusive, flexible collaboration. And that means teams engage with confidence during the meeting experience.
“My site leadership meeting for Cisco Galway has up to 14 people at different times. Today, there were 12 on the call. Three were in the car coming back from dropping their kids to school. Four of the team members were walking their dogs.”
This anecdote is *super* important in terms of how we think about audio quality, audio anxiety, video conferencing fatigue, and how each affects team collaboration. Keith said:
“There are types of meetings where people should be able to carry on in their everyday life and feel confident joining a meeting no matter the environment. From the car or on a walk, there may be dogs barking and cars going by, but they know all we’re going to hear is their voice. It’s exactly what helps with hybrid work and the quality of audio.”
The amount of work put in to deploy Webex Audio Intelligence—which encompasses noise removal, optimize for my voice, and more—has been massive. The results? To date, Webex has removed 16 billion minutes of background noise from our users’ video conferences.
These innovations in the world of audio translate to real, tangible benefits for workers and organizations. It’s why Aragon Research has again identified Webex as a leader in video conferencing software.
Now that we’ve explored what makes for better sound quality—leading-edge hardware, advanced software, and powerful AI—it’s time to give you some actionable tips to improve your audio experience.
Tips for how to improve video conferencing sound quality
Take stock of the spaces where you most frequently take video meetings. How likely are unwanted noises to interrupt at a given time of day? How do you typically sound to your colleagues? Webex makes it easy to test your mic beforehand.
Dip your toe into the basics of room acoustics so you can be more comfortable solving everyday sound issues. Our friend Patrick can get you started:
Don’t use your computer microphone if you don’t have to! Whether you prefer basic headphones, an external mic or the superior quality of a Cisco headset, moving away from your computer mic is the quickest way to alleviate audio anxiety and improve sound quality.
When you have time for focused work, use the positive nature of sound. Turn your concentration playlist up and put your Cisco headset on. Or turn music mode on in your Webex meeting if you and your team want to do some focus work together in real time.
Test out noise removal with your teammates to get a sense of what they can’t hear. Often, we’ll hear our dog barking and apologize. Our colleagues will say, “For what?” As you understand just how much noise is removed, you’ll start to be less distracted because you’ll know that nobody is hearing it but you.
Help your team conquer audio anxiety and meeting fatigue by exploring Webex audio and the products that make it superior.
Dec 2, 2022 — Lorrissa Horton
Dec 2, 2022 — Emily Brooks