Since we launched Webex Assistant in 2020, the most common question we have received from our customers is: “Is it accurate?” And I get it; customers want to make sure that if they opt in to use the Webex AI (Artificial Intelligence) automated transcription engine, it will deliver on its promise to keep an accurate record of the meeting, let meeting attendees focus on the conversation instead of typing meeting notes, and make meetings more inclusive through accessibility features. There are many examples where artificial intelligence over-promises and under-delivers, so for business-critical tasks, Webex has taken great strides to keep a relentless focus on accuracy.
As the world moves into a hybrid work model, features like closed captioning, transcription, and capturing action items have become more important than ever in driving equal and inclusive meeting experiences, regardless of what language users speak, what accessibility needs they might have or whether they choose to skip a meeting to juggle their busy lives and lean on Webex Assistant to provide a recap. Our goal is to leverage AI and Machine Learning to make every meeting experience better for everyone.
Building state-of-the-art AI transcription engines is one way of achieving that goal.
Given the investment Webex has made in building out robust, end-to-end labeling, training, and machine learning pipelines, we are proud to use this foundation to roll out an English transcription engine that has industry-leading accuracy for the Webex meeting experience when compared to some of the best-in-class speech recognition engines on the market. In an effort to expand the reach of our technology to cover more than 98% of Webex customers worldwide, we will be rolling out Spanish, French, and German ASRs (Automatic Speech Recognition engines) built entirely in-house, which will be offered free to all Webex Assistant users in H1 of this year.
When we think of an accurate transcription of a conversation, we often assume that if a human transcriber listens to the audio file, the transcript will be an exact record of what was said. However, to put things in perspective, human error rate has been measured on popular datasets such as “CallHome”, and the best result reported so far is a 6.8% error rate, meaning that in a transcript of 100 words, approximately 7 would be transcribed inaccurately by a human. It is also worth mentioning that “CallHome” is a dataset that consists of unscripted 30-minute telephone conversations between native speakers of English. [1] The error rate is expected to be higher for a dataset with speakers of different English accents.
What is even more interesting is that inter-transcriber disagreement, as measured by the Linguistic Data Consortium (LDC), ranges between 4.1% and 9.6% depending on whether the transcription is careful multiple transcription or quick transcription [2]. This means that if you give an identical audio file to two humans, they will still not produce an identical record of what was said, even in perfect environmental conditions.
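To make these percentages concrete, here is a minimal Python sketch of how a word error rate is commonly computed: the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and a second transcript, divided by the number of words in the reference. This is an illustration only, not the Webex scoring pipeline; the function name word_error_rate, the lowercase-and-split normalization, and the sample sentences are assumptions made for this example.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / reference word count."""
    # Assumed normalization: lowercase and split on whitespace only.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical sentences: one substitution ("notes" -> "note") and one
# deletion ("whole") against a 10-word reference.
reference = "please send the meeting notes to the whole team today"
hypothesis = "please send the meeting note to the team today"
print(f"{word_error_rate(reference, hypothesis):.0%}")  # prints 20%
```

The same function can compare two human transcripts of one recording, treating either as the reference, which is one simple way to think about the transcriber disagreement figures quoted above.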
Our goal as we continue to improve Webex transcription is not only to be on par with human transcription but to surpass it and achieve best-in-class accuracy for every language we ship, across different accents, genders, and acoustic environments.
So, to answer the question “Is it accurate?”, it’s critical to outline the different dimensions of accuracy in automatic speech recognition:
Not quite. However, it is a marathon, not a sprint. We believe that by continuing to train on domain-specific data while striving to mitigate bias and maintaining our customers’ data privacy and security, our in-house AI transcription engine for Webex will match, if not beat, the human word error rate.
If you would like to experience it for yourself, sign up for a free trial today.
Citations
1 Training data is collected only under strict privacy and confidentiality terms, from users who opt in to share their data to help improve the quality of the product.
[1] G. Saon, G. Kurata, T. Sercu, K. Audhkhasi, S. Thomas, D. Dimitriadis, X. Cui, B. Ramabhadran, M. Picheny, L.-L. Lim, B. Roomi, and P. Hall, “English conversational telephone speech recognition by humans and machines”, arXiv:1703.02136, Mar. 2017.
[2] M. L. Glenn, S. Strassel, H. Lee, K. Maeda, R. Zakhary, and X. Li, “Transcription methods for consistency, volume and efficiency”, in LREC, 2010.
[3] Rev, “What is WER? What Does Word Error Rate Mean?”, https://www.rev.com/blog/resources/what-is-wer-what-does-word-error-rate-mean