With COVID-19 driving work-from-home scenarios now and hybrid work environments into the foreseeable future, organizations are more dependent than ever on video conferencing with excellent audio quality to foster team collaboration.
However, even 145 years after the invention of the telephone, we face hurdles with speech quality and understanding amid disruptive background noise. Using conference calling and video conferencing technology only makes the problem worse, as we now contend with noise from multiple participants in multiple environments. If you have 12 people with barking dogs, yelling kids and loud traffic, you can bet it will inhibit effective team collaboration.
Fortunately, recent developments in deep learning neural network algorithms have dramatically improved the ability to enhance speech by separating it from noise in communications. But when it comes to integrating speech enhancement solutions into video conferencing technology, not all algorithms are created equal.
Audio engineers have made slow but steady progress over the decades to develop noise reduction algorithms focused on fixing the easiest part of the noise problem. They reduced certain background noises like the low, steady hum from fans and air conditioners – and ate away at the speech signal in the process.
It wasn’t until the rise of deep learning methods over the past few years that speech scientists gained a potent new tool to handle noise in live speech. Today, speech clarity and noise reduction are not just incrementally better than with prior methods; they are dramatically better.
Neural networks offer a powerful new methodology for scrubbing, deconstructing, manipulating, resynthesizing and analyzing real-time speech streams. A small number of organizations are harnessing this power to offer credible noise reduction and speech enhancement in video conferencing technology applications. But most only focus on obvious impairments like background noise.
There are other speech challenges that reduce comprehension and increase cognitive load, or audio overstimulation. These include reverberation, network latency that leads to discontinuities in speech streams, and bandwidth compression issues that remove higher sibilant sounds like “ess” and “zee.” Fixing these corrupted sections of speech is crucial in achieving effective team collaboration over video conferencing technology.
Noise reduction and a whole lot more are now part of Webex Meetings starting with the October update. With the acquisition of BabbleLabs, Cisco collaboration solutions have integrated best-in-class deep learning and speech science software that addresses the full range of video conferencing technology audio challenges. Our algorithms go beyond simply eliminating the noise that lets a participant’s dog have a say in the conference call. They produce a clear speech experience with less feedback that improves people’s ability to listen, understand and perceive nuances.
Webex speech enhancement, also available with the October update of Webex Meetings, provides users an unparalleled comprehension rate with up to 40 dB of noise reduction, which corresponds to cutting noise power by a factor of up to 10,000. And it does so using minimal compute power. Our low-latency solution has a tiny footprint designed to run on phones, laptops and room devices, as well as in the cloud environment.
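The 10,000-fold figure and the 40 dB figure are two ways of stating the same ratio, assuming the decibel value refers to noise power rather than amplitude. The short Python sketch below shows the standard conversion; the function names are illustrative only and are not part of any Webex or BabbleLabs API.

```python
import math


def db_to_power_ratio(db: float) -> float:
    """Convert a decibel value to a power ratio: ratio = 10^(dB / 10)."""
    return 10.0 ** (db / 10.0)


def power_ratio_to_db(ratio: float) -> float:
    """Convert a power ratio back to decibels: dB = 10 * log10(ratio)."""
    return 10.0 * math.log10(ratio)


def db_to_amplitude_ratio(db: float) -> float:
    """Convert a decibel value to an amplitude ratio: ratio = 10^(dB / 20)."""
    return 10.0 ** (db / 20.0)


# Assumption: the quoted 40 dB describes a reduction in noise power.
print(db_to_power_ratio(40.0))        # factor by which noise power is reduced
print(power_ratio_to_db(10_000.0))    # the same reduction expressed in dB
print(db_to_amplitude_ratio(40.0))    # the equivalent amplitude ratio
```

Under this assumption, 40 dB equals a 10,000-fold reduction in noise power, or a 100-fold reduction in noise amplitude.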
While we believe we have the best speech enhancement technology on the market, we won’t stop there. We will continue to use deep learning advancements to enhance our speech technology with the goal of helping organizations achieve better team collaboration through their video conferencing technology.
Sign up for a Webex free trial and enable a better team collaboration and video conferencing technology experience today.
About the author
Cisco VP, Engineering, Voice Technology
Chris is a Silicon Valley entrepreneur and technologist known for his groundbreaking work developing RISC microprocessors, domain-specific architectures and deep learning-based speech software.
Before joining Cisco, he was the cofounder and CEO of speech science technology company BabbleLabs, which was acquired by Cisco in 2020. Prior to that, he founded the processor licensing company, Tensilica, and led it as CEO before it was sold to Cadence in 2013.
Chris holds more than 40 US and international patents, an MSEE and PhD in electrical engineering from Stanford and a BA in physics from Harvard. He is an IEEE Fellow.