
Cisco AI researchers publish a novel crowdsourced speech intelligibility test framework at ICASSP 2024

Cisco is committed to continuously advancing state-of-the-art speech enhancement AI technology, such as Background Noise Removal, Optimize for My Voice, and HD Voice. These innovations are designed to benefit a wide variety of users across a diverse range of languages, accents, devices, and environments. Ensuring these features perform well for everyone involves large-scale testing to confirm they are effective, useful, and aligned with Cisco’s Responsible AI Principles. To tackle the challenges of conducting such tests without incurring high costs, the Collaboration AI Engineering team has developed an approach to perform repeatable and cost-efficient crowdsourced multilingual speech intelligibility assessments.

Our researchers are presenting details of the methodology at the upcoming International Conference on Acoustics, Speech, & Signal Processing (ICASSP) 2024 in Seoul, Korea, next week. Additionally, Cisco is committed to contributing to the wider research community by making the software and dataset used in this testing method publicly available.

The Challenges with Traditional Speech Intelligibility Tests

Traditional speech intelligibility tests involve expert listeners in carefully controlled laboratory conditions, following established standards and recommendations from institutions such as the International Telecommunication Union (ITU) and the American National Standards Institute (ANSI). As a result, they are costly and time-consuming to run.

In practice, researchers may have limited access to laboratory tests, because their time and cost make them impractical for frequent testing. This is especially true during AI model training, where several instances of a model are evaluated concurrently. Running laboratory tests for each version of the AI model would be prohibitively slow and costly, limiting the researchers’ ability to uncover issues at an early stage of AI development, which could ultimately hurt user experience and fairness.

Advancing State-of-the-Art Speech AI Technology Through Crowdsourcing

To address this, Cisco AI researchers designed a method for easy-to-run, multi-language, cost-effective, and scalable evaluation of speech intelligibility based on crowdsourcing. Since crowdsourcing allows for larger participant numbers, the statistical noise can be averaged out and the results, particularly relative rankings of different conditions, are highly reliable and reproducible.
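To illustrate why larger participant numbers help, the sketch below summarizes hypothetical per-listener intelligibility scores (for example, the fraction of words each listener identified correctly) as a mean, a standard error, and a normal-approximation 95% confidence interval, then ranks conditions by their mean. The scoring scheme, the function names `summarize` and `rank_conditions`, and the 1.96 quantile are illustrative assumptions, not the analysis used in the paper.

```python
import math
from statistics import mean, stdev

# Normal-approximation quantile for a 95% confidence interval (an assumption;
# a t-quantile would be more appropriate for very small listener pools).
Z_95 = 1.96


def summarize(scores):
    """Return (mean, standard error, 95% CI half-width) of per-listener scores."""
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two listener scores")
    m = mean(scores)
    se = stdev(scores) / math.sqrt(n)  # sample standard deviation / sqrt(N)
    return m, se, Z_95 * se


def rank_conditions(results):
    """Rank conditions (e.g. hypothetical model variants) by mean score, best first.

    `results` maps a condition name to a list of per-listener scores.
    """
    summaries = {name: summarize(s) for name, s in results.items()}
    return sorted(summaries.items(), key=lambda kv: kv[1][0], reverse=True)


if __name__ == "__main__":
    small = [0.8, 0.9, 0.7, 1.0]  # 4 hypothetical listeners
    large = small * 4             # 16 listeners with the same spread of scores
    print("N=4: ", summarize(small))
    print("N=16:", summarize(large))
```

Because the standard error scales as the sample standard deviation divided by √N, quadrupling the number of listeners with a similar spread of scores roughly halves the width of the interval, which is what makes relative rankings between conditions more stable.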

In this approach, AI researchers design a speech intelligibility test and deploy the task online to crowd workers across the globe. Crowd workers can take the test from the comfort of their own homes, and the time spent on a task is significantly shorter than a laboratory testing session. The results are then collected via an online platform and can be easily accessed. While professional test laboratories may take days or weeks to deliver results, crowdsourced results are available a few hours after the task is released, at a fraction of the cost.

Although crowdsourcing-based testing has clear advantages, it also poses several challenges. For example, the crowd consists of everyday (rather than expert) listeners, and their listening environment is not easily controlled. We propose strategies to mitigate these challenges, including screening questions, listening-in-noise qualification tests, worker selection based on criteria such as language proficiency, catch trials, and motivational rewards for accurate responses.
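As a rough sketch of how catch trials and qualification results might be combined when cleaning crowdsourced data, the snippet below keeps only workers who passed a qualification step and answered their catch trials correctly, then drops the catch trials themselves before analysis. The `Response` record, the `qualified` set, and the default `min_catch_accuracy` of 1.0 are hypothetical choices for illustration; the screening criteria described in the paper may differ.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Response:
    """One answer from one crowd worker (hypothetical record layout)."""
    worker_id: str
    trial_id: str
    is_catch: bool  # catch trial: an easy stimulus with a known correct answer
    correct: bool


def screen_workers(responses, qualified, min_catch_accuracy=1.0):
    """Return the set of worker IDs whose responses should be kept.

    A worker is kept only if they are in `qualified` (e.g. passed hypothetical
    screening questions and a listening-in-noise qualification test) and
    answered at least `min_catch_accuracy` of their catch trials correctly.
    Workers who saw no catch trials are dropped, since their attentiveness
    cannot be checked.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for r in responses:
        if r.is_catch:
            totals[r.worker_id] += 1
            hits[r.worker_id] += int(r.correct)
    return {
        w for w, n in totals.items()
        if w in qualified and hits[w] / n >= min_catch_accuracy
    }


def usable_responses(responses, qualified, min_catch_accuracy=1.0):
    """Keep only non-catch responses from workers who passed screening."""
    kept = screen_workers(responses, qualified, min_catch_accuracy)
    return [r for r in responses if r.worker_id in kept and not r.is_catch]
```

Filtering at the worker level rather than per response reflects the idea behind catch trials: a failed catch trial suggests that the worker's other answers may also be unreliable, for example because of inattention or an unsuitable listening environment.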

The proposed method is not meant to replace laboratory testing in all cases, but to provide AI researchers with a viable option to perform frequent and rapid testing. For example, AI researchers may deploy crowdsourced testing during AI model training, and then leverage laboratory-based testing for characterization of the models selected for productization.

Availability of The Framework

The details of the proposed methodology will be presented by the Cisco AI Research Team at the International Conference on Acoustics, Speech, & Signal Processing (ICASSP) 2024, to be held in Seoul, Korea, from April 14th – 19th, 2024 [1]. A preprint of the article is also available.

Furthermore, Cisco is contributing to the research community by open-sourcing the test preparation and analysis software and by making the multilingual speech data publicly available. We hope this will help lower the barrier to wider exploration in this domain and drive the adoption of crowdsourced assessment of speech intelligibility across both academia and industry. The publicly released materials can be accessed through the Cisco GitHub repository.

Learn More – Join Us at ICASSP April 14th-19th

For those attending ICASSP 2024 in Seoul from April 14th – 19th, 2024, the Cisco AI Research team is looking forward to meeting you.

About the authors:

Laura Lechler loves working at the interface between AI-driven speech technology and qualitative linguistic research, driving the implementation of a crowdsourcing-based subjective test framework for speech AI products at Cisco Webex.

Kamil Wojcicki is a Principal Engineer with Webex Collaboration, leading the research and development efforts in next-generation audio AI technologies. With a focus on delivering and advancing critical audio features like Webex’s HD Voice, Neural Codec, and Noise Reduction, Kamil is dedicated to significantly elevating the Webex audio experience for customers.

Ferdinando Olivieri is a Product Manager for Webex AI Audio Innovations, with the objective of advancing audio AI technologies to significantly enhance the Webex audio experience for customers. He is currently focusing on using generative AI for speech coding and for improving the quality of narrowband audio.

Resources:

[1] L. Lechler and K. Wojcicki, “Crowdsourced Multilingual Speech Intelligibility Testing,” ICASSP 2024 – IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Seoul, Korea, 2024, pp. 1441-1445, doi: 10.1109/ICASSP48485.2024.10447869. https://ieeexplore.ieee.org/document/10447869

Published by
Ferdinando Olivieri