Cisco AI researchers publish a novel crowdsourced speech intelligibility test framework at ICASSP 2024

Cisco is committed to continuously advancing state-of-the-art speech enhancement AI technology, such as Background Noise Removal, Optimize for My Voice, and HD Voice. These innovations are designed to benefit a wide variety of users across a diverse range of languages, accents, devices, and environments. Ensuring these features perform well for everyone involves large-scale testing to confirm they are effective, useful, and aligned with Cisco’s Responsible AI Principles. To tackle the challenges of conducting such tests without incurring high costs, the Collaboration AI Engineering team has developed an approach to perform repeatable and cost-efficient crowdsourced multilingual speech intelligibility assessments.

Our researchers are presenting details of the methodology at the upcoming International Conference on Acoustics, Speech, & Signal Processing (ICASSP) 2024 in Seoul, Korea, next week. Additionally, Cisco is committed to contributing to the wider research community by making the software and dataset used in this testing method publicly available.

The Challenges with Traditional Speech Intelligibility Tests

Traditional speech intelligibility tests involve expert listeners under carefully controlled laboratory conditions, adhering to established standards and recommendations from institutions such as the International Telecommunication Union (ITU) and the American National Standards Institute (ANSI). As a result, they are costly and time-consuming to run.

In practice, researchers may have limited access to laboratory tests, as time and cost make them impractical for frequent testing. This is especially true during AI model training, where several instances of a model are evaluated concurrently. Running laboratory tests for each version of the AI model would be prohibitively slow and costly, limiting the researchers’ ability to uncover issues at an early stage of AI development. Issues that go undetected can, in turn, hurt user experience and fairness.

Advancing State-of-the-Art Speech AI Technology Through Crowdsourcing

To address this, Cisco AI researchers designed a method for easy-to-run, multi-language, cost-effective, and scalable evaluation of speech intelligibility based on crowdsourcing. Since crowdsourcing allows for larger participant numbers, the statistical noise can be averaged out and the results, particularly relative rankings of different conditions, are highly reliable and reproducible.
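The effect of participant numbers on statistical noise can be illustrated with a short sketch. It is not the analysis used in the paper. It assumes each response is an independent correct/incorrect outcome and uses the normal approximation to a binomial proportion. The function names and the example score of 0.8 are hypothetical.

```python
import math


def standard_error(p: float, n: int) -> float:
    """Standard error of a proportion p estimated from n independent responses."""
    return math.sqrt(p * (1.0 - p) / n)


def ci_half_width(p: float, n: int, z: float = 1.96) -> float:
    """Half-width of an approximate 95% confidence interval (normal approximation)."""
    return z * standard_error(p, n)


if __name__ == "__main__":
    # Hypothetical intelligibility score: 80% of words identified correctly.
    for n in (10, 100, 1000):
        print(f"n={n:5d}  95% CI half-width = {ci_half_width(0.8, n):.3f}")
```

Under these assumptions the interval narrows in proportion to the square root of the number of responses, so a hundredfold increase in responses shrinks the uncertainty roughly tenfold.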

AI researchers design a speech intelligibility test and deploy the task online to crowd workers across the globe. Crowd workers can take the test from the comfort of their own homes, and the time spent on a task is significantly less than in a laboratory testing session. The results are then collected via an online platform and can be easily accessed. While professional test laboratories may take days or weeks to deliver results, crowdsourced results are available a few hours after releasing the task and at a fraction of the cost.

Although there are clear advantages in running crowdsourcing-based testing, there are several challenges as well. For example, the crowd consists of everyday (rather than expert) listeners and their listening environment is not easily controlled. We propose strategies to mitigate these challenges, including screening questions, listening-in-noise qualification tests, worker selection based on criteria such as language proficiency, inclusion of catch trials, and motivational rewards for accurate responses.
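One of these strategies, catch trials, can be sketched in a few lines. This is an illustrative assumption about how such filtering might work, not the released Cisco software: the `Response` record, the condition labels, and the 0.8 accuracy threshold are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Response:
    worker_id: str
    condition: str   # hypothetical labels, e.g. "noisy" or "enhanced"
    is_catch: bool   # True for catch trials with a known, easy answer
    correct: bool


def reliable_workers(responses, min_catch_accuracy=0.8):
    """Return workers whose catch-trial accuracy meets the (assumed) threshold.

    Workers who answered no catch trials are excluded, since their
    attentiveness cannot be verified.
    """
    hits, total = defaultdict(int), defaultdict(int)
    for r in responses:
        if r.is_catch:
            total[r.worker_id] += 1
            hits[r.worker_id] += int(r.correct)
    return {w for w in total if hits[w] / total[w] >= min_catch_accuracy}


def intelligibility_by_condition(responses, workers):
    """Fraction of correct non-catch responses per condition, reliable workers only."""
    hits, total = defaultdict(int), defaultdict(int)
    for r in responses:
        if not r.is_catch and r.worker_id in workers:
            total[r.condition] += 1
            hits[r.condition] += int(r.correct)
    return {c: hits[c] / total[c] for c in total}


# Toy data: worker w2 fails the catch trial and is filtered out.
responses = [
    Response("w1", "catch", True, True),
    Response("w1", "noisy", False, True),
    Response("w1", "noisy", False, False),
    Response("w1", "enhanced", False, True),
    Response("w2", "catch", True, False),
    Response("w2", "noisy", False, False),
]
kept = reliable_workers(responses)
scores = intelligibility_by_condition(responses, kept)
```

Filtering before aggregation keeps inattentive responses from diluting the per-condition scores. In a real deployment the threshold would need to be chosen with the test design in mind.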

The proposed method is not meant to replace laboratory testing in all cases, but to provide AI researchers with a viable option to perform frequent and rapid testing. For example, AI researchers may deploy crowdsourced testing during AI model training, and then leverage laboratory-based testing for characterization of the models selected for productization.

Availability of the Framework

The details of the proposed methodology will be presented by the Cisco AI Research Team at the International Conference on Acoustics, Speech, & Signal Processing (ICASSP) 2024 to be held in Seoul, Korea, from April 14th – 19th, 2024 [1]. The preprint of the article is available here.

Furthermore, Cisco is contributing to the research community by open-sourcing the test preparation and analysis software and by making the multilingual speech data publicly available. We hope this will help lower the barrier to wider exploration in this domain and drive the adoption of crowdsourced assessment of speech intelligibility across both academia and industry. The publicly released materials can be accessed through the Cisco GitHub repository.

Learn More – Join Us at ICASSP April 14th-19th

If you are attending ICASSP 2024 in Seoul from April 14th – 19th, 2024, the Cisco AI Research team looks forward to meeting you.

About the authors:

Laura Lechler loves working at the interface between AI-driven speech technology and qualitative linguistic research, driving the implementation of a crowdsourcing-based subjective test framework for speech AI products at Cisco Webex.

Kamil Wojcicki is a Principal Engineer with Webex Collaboration, leading the research and development efforts in next-generation audio AI technologies. With a focus on delivering and advancing critical audio features like Webex’s HD Voice, Neural Codec, and Noise Reduction, Kamil is dedicated to significantly elevating the Webex audio experience for customers.

Ferdinando Olivieri is a Product Manager for Webex AI Audio Innovations, with the objective of advancing audio AI technologies to significantly enhance the Webex audio experience for customers. He is currently focusing on using generative AI for speech coding and for improving the quality of narrowband audio.

Resources:


[1] L. Lechler and K. Wojcicki, “Crowdsourced Multilingual Speech Intelligibility Testing,” ICASSP 2024 – IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Seoul, Korea, 2024, pp. 1441-1445, doi: 10.1109/ICASSP48485.2024.10447869. https://ieeexplore.ieee.org/document/10447869

Published by
Ferdinando Olivieri
