Cisco is committed to continuously advancing state-of-the-art speech enhancement AI technology, such as Background Noise Removal, Optimize for My Voice, and HD Voice. These innovations are designed to benefit a wide variety of users across a diverse range of languages, accents, devices, and environments. Ensuring these features perform well for everyone involves large-scale testing to confirm they are effective, useful, and aligned with Cisco’s Responsible AI Principles. To tackle the challenges of conducting such tests without incurring high costs, the Collaboration AI Engineering team has developed an approach to perform repeatable and cost-efficient crowdsourced multilingual speech intelligibility assessments.
Our researchers are presenting details of the methodology at the upcoming International Conference on Acoustics, Speech, and Signal Processing (ICASSP) 2024 in Seoul, Korea, next week. Additionally, Cisco is committed to contributing to the wider research community by making the software and dataset used in this testing method publicly available.
The Challenges with Traditional Speech Intelligibility Tests
Traditional speech intelligibility tests involve expert listeners under carefully controlled laboratory conditions, adhering to standards and recommendations established by institutions such as the International Telecommunication Union (ITU) and the American National Standards Institute (ANSI). As a result, they are costly and time-consuming to run.
In practice, researchers may have limited access to laboratory tests, as time and cost make them impractical for frequent testing. This is especially true during AI model training, where several instances of a model are evaluated concurrently. Running laboratory testing for each version of the AI model would be prohibitively slow and costly, limiting the researchers’ ability to uncover issues at an earlier stage of AI development. This, in turn, could hurt user experience and fairness.
Advancing State-of-the-Art Speech AI Technology Through Crowdsourcing
To address this, Cisco AI researchers designed a method for easy-to-run, multi-language, cost-effective, and scalable evaluation of speech intelligibility based on crowdsourcing. Since crowdsourcing allows for larger numbers of participants, statistical noise can be averaged out, and the results, particularly the relative rankings of different conditions, are highly reliable and reproducible.
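To illustrate why larger participant pools make condition rankings stable, here is a minimal sketch of aggregating crowd ratings with bootstrap confidence intervals. The scores below are invented for illustration, not data from the study, and the two condition names are our own placeholders:

```python
import random
import statistics

def bootstrap_ci(scores, n_boot=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval for the mean score."""
    rng = random.Random(seed)
    means = sorted(
        statistics.mean(rng.choices(scores, k=len(scores)))
        for _ in range(n_boot)
    )
    lo = means[int(n_boot * alpha / 2)]
    hi = means[int(n_boot * (1 - alpha / 2)) - 1]
    return statistics.mean(scores), (lo, hi)

# Invented per-worker intelligibility scores (fraction of keywords heard
# correctly) for two hypothetical processing conditions. Individual
# listeners are noisy, but with many workers the means separate cleanly.
rng = random.Random(42)
enhanced = [min(1.0, max(0.0, rng.gauss(0.85, 0.10))) for _ in range(200)]
baseline = [min(1.0, max(0.0, rng.gauss(0.70, 0.10))) for _ in range(200)]

for name, scores in [("enhanced", enhanced), ("baseline", baseline)]:
    mean, (lo, hi) = bootstrap_ci(scores)
    print(f"{name}: mean={mean:.3f}, 95% CI=({lo:.3f}, {hi:.3f})")
```

With a few hundred workers per condition, the confidence intervals no longer overlap, so the relative ranking of conditions is reproducible even though any single listener's rating is noisy.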
AI researchers design a speech intelligibility test and deploy the task online to crowd workers across the globe. Crowd workers can take the test from the comfort of their own homes, and the time spent on a task is significantly less than in a laboratory testing session. The results are then collected via an online platform and can be easily accessed. While professional test laboratories may take days or weeks to deliver results, crowdsourced results are available within a few hours of releasing the task, and at a fraction of the cost.
Although crowdsourcing-based testing has clear advantages, it also poses several challenges. For example, the crowd consists of everyday (rather than expert) listeners, and their listening environments are not easily controlled. We propose strategies to mitigate these challenges, including screening questions, listening-in-noise qualification tests, worker selection based on criteria such as language proficiency, inclusion of catch trials, and motivational rewards for accurate responses.
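As an illustration of the catch-trial idea, the sketch below screens out workers who fail known-answer items before aggregating per-condition intelligibility. The record format, function name, and 80% accuracy threshold are our own assumptions for illustration, not the paper's implementation:

```python
from collections import defaultdict

def screen_and_score(responses, min_catch_accuracy=0.8):
    """Score each condition as the fraction of correct responses,
    keeping only workers who pass enough catch trials.

    Each response is a dict with keys:
      'worker'    -- worker identifier
      'condition' -- audio processing condition under test
      'correct'   -- bool, whether the response was correct
      'is_catch'  -- bool, whether this item had a known answer
    """
    # Tally catch-trial performance per worker: [passed, total].
    catch_stats = defaultdict(lambda: [0, 0])
    for r in responses:
        if r["is_catch"]:
            catch_stats[r["worker"]][0] += r["correct"]
            catch_stats[r["worker"]][1] += 1
    passed = {
        w for w, (ok, total) in catch_stats.items()
        if total and ok / total >= min_catch_accuracy
    }
    # Aggregate only non-catch responses from screened-in workers.
    per_condition = defaultdict(lambda: [0, 0])
    for r in responses:
        if r["is_catch"] or r["worker"] not in passed:
            continue
        per_condition[r["condition"]][0] += r["correct"]
        per_condition[r["condition"]][1] += 1
    return {c: ok / total for c, (ok, total) in per_condition.items()}

# Tiny usage example with two hypothetical workers:
responses = [
    {"worker": "w1", "condition": "denoised", "correct": True,  "is_catch": False},
    {"worker": "w1", "condition": "catch",    "correct": True,  "is_catch": True},
    {"worker": "w2", "condition": "denoised", "correct": False, "is_catch": False},
    {"worker": "w2", "condition": "catch",    "correct": False, "is_catch": True},
]
print(screen_and_score(responses))  # → {'denoised': 1.0}; w2 fails the catch trial and is excluded
```

In practice this kind of filter would sit alongside the other mitigations (qualification tests, language-proficiency selection, rewards) rather than being the sole quality gate.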
The proposed method is not meant to replace laboratory testing in all cases, but to provide AI researchers with a viable option to perform frequent and rapid testing. For example, AI researchers may deploy crowdsourced testing during AI model training, and then leverage laboratory-based testing for characterization of the models selected for productization.
Availability of The Framework
The details of the proposed methodology will be presented by the Cisco AI Research Team at the International Conference on Acoustics, Speech, and Signal Processing (ICASSP) 2024, to be held in Seoul, Korea, from April 14th – 19th, 2024 [1]. The preprint of the article is available here.
Furthermore, Cisco is contributing to the research community by open-sourcing the test preparation and analysis software and by making the multilingual speech data publicly available. We hope this will help reduce the barrier to wider exploration in this domain and drive the adoption of crowdsourced assessment of speech intelligibility across both academia and industry. The publicly released materials can be accessed through the Cisco GitHub repository.
Learn More – Join Us at ICASSP April 14th-19th
For those attending ICASSP 2024 in Seoul from April 14th – 19th, 2024, the Cisco AI Research team is looking forward to meeting you.
About the authors:
Laura Lechler loves working at the interface between AI-driven speech technology and qualitative linguistic research, driving the implementation of a crowdsourcing-based subjective test framework for speech AI products at Cisco Webex.
Kamil Wojcicki is a Principal Engineer with Webex Collaboration, leading the research and development efforts in next-generation audio AI technologies. With a focus on delivering and advancing critical audio features like Webex’s HD Voice, Neural Codec, and Noise Reduction, Kamil is dedicated to significantly elevating the Webex audio experience for customers.
Ferdinando Olivieri is a Product Manager for Webex AI Audio Innovations, with the objective of advancing audio AI technologies to significantly enhance the Webex audio experience for customers. He is currently focusing on using generative AI for speech coding and for improving the quality of narrowband audio.
Resources:
- Webex AI Codec: Delivering Next-level Audio Experiences with AI/ML
- Webex AI Codec whitepaper
- Responsible AI
More from the Webex blog:
- Sound matters: The role of audio quality in video conferencing
- HD Voice: crystal clear audio powered by AI
- Real-Time Media | The Fundamentals of Audio
[1] L. Lechler and K. Wojcicki, “Crowdsourced Multilingual Speech Intelligibility Testing,” ICASSP 2024 – IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Seoul, Korea, 2024, pp. 1441-1445, doi: 10.1109/ICASSP48485.2024.10447869. https://ieeexplore.ieee.org/document/10447869