State-of-the-art speech enhancement and noise reduction technology
At the heart of video conferencing is one goal – hearing and understanding. Without clear voice, conferencing is almost useless and team collaboration kaput. Never has this been more evident than during COVID-19 when people work from anywhere and everywhere.
That’s why Cisco Webex has invested in state-of-the-art speech enhancement and noise reduction technology. With the latest AI-based software built into the Webex app, Webex Meetings, and Webex Desk devices like Webex Desk Pro, users have at their fingertips the world’s best available sound experience with every remote collaboration.
Our speech system is built for enterprise-grade team collaboration and conferencing to provide a consistent, effective speech experience for all users everywhere. It is designed to address complex system challenges and to maintain the speech integrity that produces audio excellence.
Benefits of Webex speech enhancement technology
With Webex speech enhancement technology, enterprises can count on:
Technology that Overcomes System Complexities
- Runs efficiently across the breadth of users’ edge devices: Webex clients are available across Windows, macOS, iOS and Android as apps and in browsers. That includes the thousands of different laptop and phone configurations released in the past decade, reflecting an enormous range of CPU variants, OS releases and user system setups.
- Addresses a range of connections: There are video+audio connections made through the client app, but others may include dial-in voice connections through a legacy bridge, non-native video+audio packaged as SIP (VoIP) connections, and connections from specialized conferencing devices. Each connection type may support a different subset of features. Our technology provides the adaptation and transcoding required for seamless connections for all.
- Adapts to varied bandwidth and latency: The conferencing system must adapt to changing network conditions, potentially changing audio and video resolutions, encoding rates, and error correction and redundancy strategies on the fly on a user-by-user basis. And every bandwidth and latency adaptation tactic must be tested against the range of possible network conditions and machine configurations.
- Maintains robust security: Security is essential to enterprise team collaboration. This means error- and attack-proof protection of participant audio and video streams, shared content, user information, recordings and transcripts on the devices, in the network and in the infrastructure. It doesn’t happen by itself – security capabilities must be architected, maintained and tested continuously.
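To make the per-user bandwidth adaptation described above more concrete, here is a minimal Python sketch. It is an illustration only, not Webex's actual logic: the names `NetworkStats`, `AudioSettings` and `choose_audio_settings`, the bitrate ladder, the 25% audio budget and the 2% loss threshold are all hypothetical assumptions.

```python
from dataclasses import dataclass

# Hypothetical audio bitrate ladder, highest quality first (kbps).
BITRATE_LADDER_KBPS = (64, 32, 24, 16)


@dataclass
class NetworkStats:
    bandwidth_kbps: float  # estimated bandwidth available to this user
    loss_rate: float       # fraction of packets lost, 0.0 to 1.0


@dataclass
class AudioSettings:
    bitrate_kbps: int
    redundancy: bool       # whether to send redundant (error-correcting) data


def choose_audio_settings(stats: NetworkStats) -> AudioSettings:
    """Pick an audio bitrate and redundancy mode for one user's connection."""
    # Assumption: turn on redundancy once loss exceeds 2%.
    use_redundancy = stats.loss_rate > 0.02
    # Assumption: audio may use at most a quarter of the estimated bandwidth.
    budget = stats.bandwidth_kbps * 0.25
    if use_redundancy:
        # Assumption: redundancy roughly doubles the bits sent.
        budget /= 2
    for rate in BITRATE_LADDER_KBPS:
        if rate <= budget:
            return AudioSettings(rate, use_redundancy)
    # Nothing fits: fall back to the lowest rung of the ladder.
    return AudioSettings(BITRATE_LADDER_KBPS[-1], use_redundancy)


print(choose_audio_settings(NetworkStats(bandwidth_kbps=1000, loss_rate=0.0)))
print(choose_audio_settings(NetworkStats(bandwidth_kbps=200, loss_rate=0.05)))
```

A real system would re-evaluate a decision like this continuously as network estimates change, and would coordinate it with video settings.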
Audio Excellence that Maintains Speech Integrity
- Captures audio: Reverberation, background noise and imperfect microphone setups are inevitable. The shift to remote work makes this problem even more serious, as the audio capture setting at home is generally less controllable than in offices. Moreover, as conference hours per day grow, users are resistant to living in headphones and earbuds – they prefer the physical comfort of laptop and other remote microphones and speakers. Multiple microphone capture and advanced acoustic echo cancellation are essential innovations to capture as much audio information as possible and separate local talkers from far-end talkers coming from the loudspeakers.
- Enhances audio: Once the audio scene is fully captured, audio enhancement plays a central role. Cisco Webex has set a new standard for noise removal, dereverberation and multi-microphone beamforming technologies for conferencing, virtually eliminating the annoyance and impaired comprehension of difficult source environments.
- Encodes and decodes audio: Capturing and cleaning speech at the source is not enough. We also need to deliver it to conference participants globally. This means encoding it to a modest bit-stream and delivering it to the conference connection infrastructure and to recipient devices where it is decoded and presented. Work-from-home settings have made typical network conditions much worse, with overloaded home Wi-Fi, limited uplink bandwidth on DSL and cable home access, and unpredictable delays through global Internet service providers.
- Renders audio: Finally, the decoded audio must be presented to listeners at the receiving end. In the simplest case, this is just playback of the decoded audio. But there’s more to it. First, we must render not just one stream but mix and optimize all the different sources into a comfortable conference audio stream. Second, the cumulative impairments from data transmissions, including codec imperfections and data loss, create additional opportunities for advanced signal processing to help the listener. Many of the receiving systems also support multiple speakers, so intelligent audio rendering of the received speech can further enhance comprehension.
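The mixing step in rendering can be pictured with a small Python sketch. This is a simplified illustration under stated assumptions, not the Webex renderer: the function name `mix_streams` is hypothetical, samples are assumed to be floats in the range -1.0 to 1.0, and loudness is handled with a single whole-block gain rather than the smoother per-source processing a production mixer would likely use.

```python
def mix_streams(streams, ceiling=1.0):
    """Sum equal-length audio streams and scale down if the mix would clip.

    `streams` is a list of lists of float samples in [-1.0, 1.0].
    """
    if not streams:
        return []
    length = len(streams[0])
    if any(len(s) != length for s in streams):
        raise ValueError("all streams must have the same length")

    # Add the sources together sample by sample.
    mixed = [sum(samples) for samples in zip(*streams)]

    # If the loudest sample exceeds the ceiling, apply one gain to the
    # whole block so the peak lands exactly on the ceiling.
    peak = max((abs(x) for x in mixed), default=0.0)
    if peak > ceiling:
        gain = ceiling / peak
        mixed = [x * gain for x in mixed]
    return mixed


talker_a = [0.5, -0.5, 0.25]
talker_b = [0.25, 0.25, 0.25]
print(mix_streams([talker_a, talker_b]))
```

Scaling the whole block keeps the relative balance between talkers intact, at the cost of lowering everyone's level when one source is loud.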
Major parts involved in creating enterprise speech technology
Looking at the diagram gives a sense of the challenge in designing, building and testing speech technology for the enterprise.
The transmit and receive functions are largely separate, because the transmit side concerns itself exclusively with the local talker’s speech, and the receive side with the remote talkers’ speech. The only significant interaction is in acoustic echo cancellation for speakerphone configurations, where the remote talkers’ voices coming from the loudspeakers must be subtracted out of the stream captured from the local microphones to prevent echoes and feedback loops.
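The idea of subtracting the loudspeaker signal from the microphone signal can be sketched with a textbook normalized least-mean-squares (NLMS) adaptive filter. This is a generic illustration, not a description of the Webex echo canceller: the function name `nlms_echo_cancel`, the filter length, the step size and the simulated echo path are all assumptions chosen for the example.

```python
import random


def nlms_echo_cancel(far_end, mic, taps=8, step=0.5, eps=1e-8):
    """Subtract an adaptive estimate of the far-end echo from the mic signal.

    `far_end` is what the loudspeaker plays; `mic` is what the microphone
    captures. Returns the microphone signal with the estimated echo removed.
    """
    weights = [0.0] * taps   # current estimate of the echo path
    history = [0.0] * taps   # recent far-end samples, newest first
    output = []
    for x, d in zip(far_end, mic):
        history = [x] + history[:-1]
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        error = d - echo_estimate  # residual after echo removal
        # NLMS update: step size normalized by recent far-end energy.
        energy = sum(h * h for h in history) + eps
        weights = [w + step * error * h / energy
                   for w, h in zip(weights, history)]
        output.append(error)
    return output


def signal_energy(samples):
    return sum(v * v for v in samples)


# Simulated scenario (assumption): the echo is the far-end signal delayed
# by two samples at 60% amplitude, and nobody is talking locally.
rng = random.Random(0)
far_end = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
echo = [0.0, 0.0] + [0.6 * s for s in far_end[:-2]]
residual = nlms_echo_cancel(far_end, echo)

# Ratio of residual energy to echo energy over the last 200 samples.
print(signal_energy(residual[-200:]) / signal_energy(echo[-200:]))
```

Real rooms have much longer and time-varying echo paths, and a practical canceller must also cope with the local talker speaking at the same time as the far end, which this sketch does not address.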
Modern systems accommodate virtually unlimited numbers of connections in order to support large conference calls, lectures and events. The sheer scale of the system is daunting – Cisco is the biggest video conferencing supplier, with particular strength in medium and large organizations [Source: Synergy Research Inc.]. As global team collaboration and work from home become ever more deeply ingrained in our permanent work patterns, the role of conferencing will only grow. As it does, Webex will continue to meet the rising demand for scalable, AI-based speech enhancement solutions and expand on its uses.
In conjunction with high-quality video conferencing, we are developing the ability to understand not just the words but the intonation, emotion and body language. We are developing high-quality transcriptions and natural language assistants to support meeting productivity and provide seamless language translation on the fly, while pushing down media latency to make meeting interactions more natural. We will continue to find innovative ways to use AI speech enhancement technology to deepen understanding and enhance collaboration for people working anywhere.
Learn how to take advantage of audio excellence while using Webex products.
Sign up for a free Webex trial and experience better team collaboration and video conferencing today.
About the author
Cisco VP, Engineering, Voice Technology
Chris is a Silicon Valley entrepreneur and technologist known for his groundbreaking work developing RISC microprocessors, domain-specific architectures and deep learning-based speech software.
Before joining Cisco, he was the cofounder and CEO of speech science technology company BabbleLabs, which was acquired by Cisco in 2020. Prior to that, he founded the processor licensing company, Tensilica, and led it as CEO before it was sold to Cadence in 2013.
Chris holds more than 40 US and international patents, an MSEE and PhD in electrical engineering from Stanford and a BA in physics from Harvard. He is an IEEE Fellow.