Posts by Category

What’s new in Webex: November 2020

Welcome to what’s new in Webex! This article highlights new Webex features and capabilities that we have recently introduced, as well as a preview of new features coming in November and later this fall.

MEETINGS

WEBEX TEAM COLLABORATION

CALLING

DEVICES

INTEGRATIONS AND INDUSTRY SOLUTIONS

CONTROL HUB, SECURITY AND EDGE SERVICES

MEETINGS

Remove Background Noise from Meetings and Events – Windows and Mac

Having drywall put up in another room or a dog barking to go outside? Filter out those noisy background disruptions before (or during) a call so that others only hear your voice. Just turn on the Remove background noise setting and you’re all set. Find this feature at Settings > Remove background noise.

Work your way: Introducing the Dark Theme

Reduce eye strain by choosing a theme that works best with your lighting conditions. Webex is expanding the dark and light theme options already available for team collaboration to meetings. Now you can choose whether to view the meeting window in the standard Light Theme or the new Dark Theme, and you can select your preferred option from the View menu.


Automatic in-meeting mute warning

Whoops, don’t share your great idea while on mute.  We’ve been making a lot of enhancements to mute and unmute functionality in Webex to remove some of the common friction points in meetings. Soon, automatic notifications in Webex Meetings and Webex Events will let you know if you’re trying to speak while on mute. If you start to talk while your microphone is muted, Webex will automatically show you a notification so that you can unmute yourself to speak.


WEBEX TEAM COLLABORATION

We’re adding more intelligence to enhance team collaboration. Use bots to automate your workflows and get proactive notifications.

Microsoft Outlook Bot

We’re making it easy to transition from traditional email to team collaboration. Don’t miss an important email while you’re working in Webex. Now you can get alerts within your team spaces for email from your clients, your boss, or key projects. Set up notification rules for important subjects and people using an adaptive card right in a Webex space, with no need to create separate rules in Outlook.


Confluence Cloud Bot—Windows, Mac, iPhone, iPad, and Android

Get notified of important updates to Pages and Blogs in Confluence within your team workspace.   We’re making it easy to keep on top of changes to documented processes, release plans, or product requirement pages from Confluence.   Now you can quickly get updates for pages recently updated, watched, and even saved for later.

Leave a comment to a change on a page directly from any Webex space

Birthday Bot—Windows, Mac, iPhone, iPad, and Android

Celebrate your teammates and get in a party mood.  We’ve created a super easy way to keep track of and celebrate birthdays within your team with a new Birthday Bot! Once available, you’ll find it on the Webex App Hub.

CALLING

Webex Calling

Remove Background Noise

Continue communicating without noisy background disruptions. Filter out those noisy background disruptions during calls using the Webex app so that others only hear your voice.

You will soon be able to select a setting to remove background noise. Just as in the Webex Meetings experience, when the Remove Background Noise setting is selected, the unified Webex app will filter out leaf blowers, vacuums, and other background noise while enhancing your voice. This feature will be available in November.

Call queue availability

Business agents who are members of a Call Queue can set their availability to receive calls, making call routing more effective. This makes working from home easier for end users who don’t have a physical desk phone. Users can set their availability from the call settings page in the Webex desktop or mobile client.


Update your contacts list from call history

We’re making it easy to add or edit your contacts, not only from Chat, but also from your call history.  Now you can right-click an entry in your Call History and have several convenient options at your fingertips. You can call the person back, add the person to your Contacts list, send a message, send an email, or even start a meeting.

DEVICES

Webex continues to add features that enable common app and device experiences to support more hybrid work scenarios.

Remove Background Noise

Continue communicating without noisy background disruptions. Later this month, Webex RoomOS devices (Desk Pro, Board and Room Series) will support advanced AI to remove background noise and enhance human voice.

Mute Controls

A new expanded roster view and updated mute controls create a more consistent experience across the app and devices in meetings. Hosts and co-hosts now have more control over audio in Webex Meetings directly from their Webex device. New features include the ability to mute and unmute everyone, enable “Mute on entry” (all participants are automatically muted when they join), and control whether participants can unmute themselves or only hosts can unmute them.

 

INTEGRATIONS AND INDUSTRY SOLUTIONS

Expanded Webex Expert on Demand features

Webex Expert on Demand lets frontline workers collaborate hands-free with subject matter experts anywhere, making them more effective. The Webex Expert on Demand integration with RealWear HMT-1 continues to add features. With the Expert on Demand 1.7 release in November, frontline workers will get expanded group call features to show active calls or initiate a group call, the ability to search for and select a space to place or join calls, advanced camera controls to adjust contrast and exposure and take HD-resolution photos, enhanced shared-content controls to zoom in on content, simplified login via a companion app with a QR code, and expanded language support with the addition of Korean, Thai, German, French, Polish, and Russian.


Public Sector Options:  Webex Legislate

Webex Legislate is a purpose-built, complete collaboration solution that enables legislatures to vote and conduct hearings virtually, just as they would in person. Webex Legislate is built and secured on the global Webex platform for high availability and reliability for institutions around the globe, and is customizable to meet the needs of federal, state, and local organizations.

CONTROL HUB, SECURITY AND EDGE SERVICES

If you are an admin, Webex Control Hub brings management, monitoring, troubleshooting, analytics, and edge and hybrid services for all your collaboration resources into a single pane of glass.

Support for Azure Conditional Access to Microsoft OneDrive and SharePoint Online – Windows, Mac

If your organization uses Azure Conditional Access, you can now extend it to users signing into Microsoft OneDrive and SharePoint Online content management from within Webex Teams on Windows and Mac devices. On Windows, the Proof of Possession API is used to verify the device so users can seamlessly authenticate and use the Microsoft enterprise content management (ECM) functionality.


Expanded VDI Support

Webex offers best-in-class VDI support for collaboration. Webex supports full-featured collaboration and advanced features across meetings, calling, team collaboration, and content sharing. Media is optimized to support 720p video and eliminate content hairpinning. Additionally, expanded experiences such as virtual backgrounds, video breakout rooms, and Webex Meeting Assistant are supported in VDI clients, so users get richer collaboration even in virtual desktop environments. Meetings app platform support will be expanded in November to include Citrix with Windows, Ubuntu, and eLux thin client OS, and VMware with Windows thin client OS.


Space Classification

Admins can now define Webex space labels based on data governance policies and require all users to classify the team collaboration spaces they create as Public, Confidential, Highly Confidential, or Secret. This prompts user awareness of, and compliance with, data classification policies where sensitive and confidential content may be shared. External access to spaces can also be restricted via DLP controls, in line with your organization’s enterprise data classification policies.


Webex Edge Connect – PacketFabric option

Expanded private connectivity options for hybrid Webex deployments now include PacketFabric. The Webex Edge Connect service supports a dedicated private connection to the Cisco Webex Cloud; this option offers added reliability and security for organizations that prefer not to use over-the-top (OTT) connectivity at the edge. With hundreds of global locations, the PacketFabric option for Webex Edge Connect lets organizations interconnect to Cisco Webex via their own physical network port and managed virtual connections, using a flexible subscription model to adjust bandwidth as needed.

New API options

Webex Control Hub now provides easier access to data via APIs for integrations, including automated access to quality data, aggregated trending data, and detailed reports. Admins can now pull data from Control Hub via APIs to integrate it with other dashboards or use it for further analysis.
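As a rough sketch of what pulling Control Hub data programmatically might look like (the endpoint path and response shape here are assumptions for illustration only; consult developer.webex.com for the documented Reports and Analytics APIs):

```python
import os
import requests

# Hypothetical sketch of listing Control Hub reports via the Webex REST API.
# The endpoint and response fields are assumptions for illustration only.
TOKEN = os.environ["WEBEX_ADMIN_TOKEN"]   # admin access token
BASE = "https://webexapis.com/v1"

resp = requests.get(
    f"{BASE}/reports",                    # assumed reports listing endpoint
    headers={"Authorization": f"Bearer {TOKEN}"},
    timeout=30,
)
resp.raise_for_status()

for report in resp.json().get("items", []):
    print(report)                         # field names vary by report type
```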

For more information on all of these features and upcoming updates to Webex, please visit the What’s New articles for Webex Services.

Learn More

Explore the new Webex Legislate experience

Webex for Government 

Webex Legislate [Website]

Webex Innovation Keeps Governments Running Effectively and Securely From Anywhere

Still Need Help?

Join a Webex online meeting

Learn more about Webex, join one of our upcoming training sessions

Explore daily product demonstrations

Future of Government: Hybrid Workspace for Legislature [Live Class]

Sign up for Webex

Visit our home page or contact us directly for assistance.

Read more
Are you happy?

Chelsea Miller & Maryam Tabatabaeian –  A data-driven framework for analyzing user satisfaction with Cisco Webex Assistant

Voice assistants are the user interfaces of our future. Voice user interfaces (VUIs) allow us to communicate in one of the most natural ways possible—talking, just as we would with any of our friends or colleagues. While VUIs once felt like a novelty and weren’t very robust (we can all remember the days when the response to just about any command was “Do you want me to search the web?”), today they’ve integrated into our routines, helping us check the weather, play music, and get directions. They’re even in our workplaces.

As the popularity of voice assistants grows, so do users’ expectations for seamless, human-like interactions. How do we evaluate if the assistants we build are satisfying our users? What does it mean for a user to be “happy,” and how do you even measure happiness?

On the MindMeld team, we’ve been investigating how to evaluate users’ experiences when talking with Webex Assistant, the first conversational assistant for the enterprise space. We answer the question “are users happy?” by developing a quantitative framework to address a historically qualitative user experience problem. We hope this investigation sparks interest in how we can think about measuring user experience online and in real-time.

Why is evaluating user satisfaction so difficult?

Evaluating user experience with artificial intelligence (AI) products is a difficult problem for a few reasons: AI systems are probabilistic in nature, natural language is variable, and user satisfaction is subjective. We expect the system’s output to be variable since AI is non-deterministic; deciding if the result of the system is expected variation or an error is one layer of difficulty. Not only can the output of the system vary, but the input from the user varies, too. People speak in different ways, using different words and sentence structures, so the system must be robust to understanding lots of natural language variation. Finally, how users are feeling is hard to understand, especially when you can’t ask them, and even harder to quantify. What satisfies a user can differ from individual to individual, and one user’s feelings can change based on the context and their expectations.

Previous research focused on understanding users’ experiences falls into two main categories: accuracy-based approaches and user studies & surveys. In accuracy-based approaches, accuracy is used as a proxy for satisfaction. If the system has high accuracy, the user is likely satisfied. However, these approaches often rely on single utterances and can miss the overall experience of a conversation. For example, if a user says “next page,” and the system identifies the query correctly and moves to the next page, this would be a success! But if the user says “next page” several times, this might indicate that they haven’t been able to find what they were looking for. That potential user frustration wouldn’t be captured when just looking at the single utterance accuracies. User surveys & studies provide a great window into user behavior, but conducting longitudinal user studies is costly in terms of resourcing participants and time spent on qualitative & quantitative data analysis. This approach is much harder, if not impossible, to use at scale. User studies don’t use real-time user data and take place in artificial lab settings, which might differ from real users’ experiences.
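As a small illustration of why conversation-level context matters, here is a hypothetical heuristic that flags repeated identical queries, the kind of signal a per-utterance accuracy metric would miss:

```python
from collections import Counter

def repeated_query_signal(utterances: list[str], threshold: int = 3) -> bool:
    """Flag a conversation in which any single query is repeated `threshold` or
    more times; a per-utterance accuracy metric would miss this pattern."""
    counts = Counter(u.strip().lower() for u in utterances)
    return any(count >= threshold for count in counts.values())

print(repeated_query_signal(["next page", "next page", "next page"]))  # True
print(repeated_query_signal(["next page", "previous page"]))           # False
```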

Our framework

We try to take the best of each of these approaches, while addressing some of their shortcomings. We want to create a system that captures real users’ judgements, focuses on the larger user experience at the level of the conversation, and uses real-time data, so our approach can be scalable and automatic.

At a high-level, our framework:

1. Captures interactions at the level of the conversation
2. Automatically classifies conversations based on their conversational goal and conversational end state
3. Automatically assigns conversations a satisfaction label

Capturing conversations

The first challenge we tackle is how to capture users’ interactions with Webex Assistant. This is especially important for conversational AI, where analysis can happen at many different levels of an interaction. We choose to focus on conversations. We define a conversation as an interaction initiated by a user or the assistant, which can have single or multiple turns, and contains one conversational goal.

To capture each conversation, we introduce a common ID to thread that conversation from beginning to end. We log an event, called the “trigger,” each time a conversation is initiated. The trigger event includes the conversation’s unique ID and the goal of that conversation. For us, conversational goals most closely map to use cases, like calling a person or joining a meeting. Any queries the user says that move them towards the completion of the goal of the use case count as part of that conversation.

The image below shows an example of what we consider a conversation. Here, we’ll take a look at the “call person” use case. The conversational goal of the “call person” use case is to, ideally, successfully call a person.

We capture all the turns taken between the user & the assistant. When the conversation ends, we log the final state with the same ID as the trigger event. Our common ID allows us to follow the course of the conversation as it unfolded in real-time. In our example, we would log all these queries as part of one conversation with the conversational goal of “Call Person.”
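To make the threading concrete, here is a minimal sketch of how trigger, turn, and end-state events could be logged under one conversation ID; the event schema and function names are hypothetical, not the actual Webex Assistant instrumentation:

```python
import time
import uuid

def log_event(event: dict) -> None:
    """Placeholder sink; in practice events would flow to a telemetry pipeline."""
    print(event)

def start_conversation(goal: str) -> str:
    """Log the 'trigger' event and return the ID used to thread later turns."""
    conversation_id = str(uuid.uuid4())
    log_event({"type": "trigger", "conversation_id": conversation_id,
               "goal": goal, "timestamp": time.time()})
    return conversation_id

def log_turn(conversation_id: str, speaker: str, utterance: str) -> None:
    """Log one turn (user or assistant) under the same conversation ID."""
    log_event({"type": "turn", "conversation_id": conversation_id,
               "speaker": speaker, "utterance": utterance,
               "timestamp": time.time()})

def end_conversation(conversation_id: str, end_state: str) -> None:
    """Log the final state with the same ID as the trigger event."""
    log_event({"type": "end", "conversation_id": conversation_id,
               "end_state": end_state, "timestamp": time.time()})

# Example: a "call person" conversation threaded from trigger to end state.
cid = start_conversation("call_person")
log_turn(cid, "user", "call Chelsea")
log_turn(cid, "assistant", "Did you mean Chelsea Miller?")
log_turn(cid, "user", "yes")
end_conversation(cid, "fulfilled")
```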

Conversational end states

Retrospective analysis of historical data uncovered patterns in how users’ conversations with Webex Assistant end. After manual analysis, we decided on four categories to capture conversational end states. Here are the categories we use to automatically classify conversations by their end state:

Fulfilled: the assistant successfully fulfills the user’s goal
Error: the assistant fails to fulfill the user’s goal
Exited: the user abandons the conversation or cancels it
Modified: the user decides to change part of the request or restart the conversation

Here are examples from the “call person” use case:

Satisfaction labels

Now that the conversation has been captured from beginning to end and it contains a label for the conversational end state, the next step is to automatically assign a satisfaction label. The goal of the satisfaction label is to capture how users might feel after having a conversation. We wanted these labels to be user-friendly: high-level enough to understand at a glance, but granular enough to capture meaningful distinctions between users’ experiences. We use the following satisfaction labels:

Happy: the user’s goal was successfully fulfilled
Sad: the user’s goal was not met
Friction: the user’s expectations are not met
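As a rough sketch of how a satisfaction label could be assigned automatically from the end state, the mapping below is an assumption based on the label descriptions above, not our exact production rules:

```python
# Hypothetical mapping from conversational end state to a satisfaction label,
# based on the label descriptions above; real rules may also weigh turn counts,
# retries, or other conversation-level signals.
END_STATE_TO_SATISFACTION = {
    "fulfilled": "happy",    # goal successfully completed
    "error": "sad",          # assistant failed to fulfill the goal
    "exited": "friction",    # user abandoned or cancelled the conversation
    "modified": "friction",  # user had to change or restart the request
}

def satisfaction_label(end_state: str) -> str:
    return END_STATE_TO_SATISFACTION.get(end_state.lower(), "friction")

print(satisfaction_label("Fulfilled"))  # -> happy
```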

Again, here are some examples from the “call person” use case:

Results of our framework

To recap our framework, we’ve threaded utterances to create conversations, automatically classified those conversations by their end state, and then automatically assigned a satisfaction label to understand users’ experiences.
Now, we’ll take a look at the results of this framework and what it helps us do. We’ll consider data from a full week of user interactions with Webex Assistant for our three most popular use cases: “call person,” “join a meeting,” and “join a Personal Room” (data not representative of real Webex Assistant performance).
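As a rough illustration of the kind of roll-up behind such a view, here is a small pandas sketch over made-up conversation records (not real Webex Assistant data):

```python
import pandas as pd

# Made-up week of labeled conversations (not real Webex Assistant data).
conversations = pd.DataFrame({
    "use_case":     ["call_person", "call_person", "join_meeting",
                     "join_meeting", "join_personal_room", "call_person"],
    "satisfaction": ["happy", "friction", "happy", "sad", "happy", "happy"],
})

# Share of each satisfaction label per use case.
summary = (conversations
           .groupby(["use_case", "satisfaction"])
           .size()
           .unstack(fill_value=0))
print(summary.div(summary.sum(axis=1), axis=0).round(2))
```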

 

At a glance, we can see how users feel when using Webex Assistant features—which are driving user happiness, and which might be causing the most difficulty. If “call person” shows a spike in “friction,” for example, we could investigate just those conversations. We might find that all the “friction” conversations happened on devices in Thailand, where users natively speak Thai, not English. We might hypothesize that the assistant had difficulty understanding Thai names.

Knowing how well each feature is performing in real-time allows our product team to track the real-time satisfaction of each feature and quickly identify & investigate issues. Depicted in a live dashboard, these valuable insights help us ask the right questions and directly impact the product roadmap.

Verifying our framework

We felt confident that these labels captured user satisfaction since we based them on user data, but we didn’t stop there. To be sure that the predictions we make about users’ experiences actually capture how users feel, we asked human annotators to label the data using the same satisfaction labels that our system uses: “happy,” “sad,” and “friction.” Annotators were instructed to put themselves in the shoes of the user and ask themselves, “how would I feel after this interaction?”

 

There was significant agreement between the human-labeled and system-labeled data (75% agreement, κ = 0.66). This comparison gives us confidence that our algorithm captures a realistic picture of user satisfaction. We feel confident that our framework successfully predicts user satisfaction that’s consistent with what real humans are feeling.
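For reference, this kind of agreement check can be reproduced with scikit-learn’s Cohen’s kappa; the labels below are illustrative only, not our annotation data:

```python
from sklearn.metrics import cohen_kappa_score

# Illustrative labels only; the real comparison used annotated Webex Assistant data.
human_labels  = ["happy", "sad", "friction", "happy", "happy", "friction"]
system_labels = ["happy", "sad", "happy",    "happy", "sad",   "friction"]

agreement = sum(h == s for h, s in zip(human_labels, system_labels)) / len(human_labels)
kappa = cohen_kappa_score(human_labels, system_labels)
print(f"raw agreement = {agreement:.2f}, kappa = {kappa:.2f}")
```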

Takeaways

With this approach, we’re able to quickly get a snapshot of what users experience when using different Webex Assistant features. We’ve taken something as subjective as users’ happiness and broken it down into quantitative labels that can be automatically applied based on the conversation, end state, and conversational goal. We offer this as an attempt to think about how we can quantify user experience and shift the mindset of understanding users’ happiness to include quantitative methods.

About the authors

Chelsea Miller is a Data Analyst on the Webex Intelligence team at Cisco Systems. She holds a Master’s in Linguistics from UCSC, where she conducted research investigating how humans process language in real-time. Post-grad, her interest in how we understand each other, specifically how machines do (or don’t!), led her to work on conversational assistants. As a Data Analyst, she tackles problems like conversational analysis, how to securely & representatively collect quality language data, and voice interface usability. 

Maryam Tabatabaeian is a Data Scientist working with the Webex Intelligence team at Cisco Systems. Before joining Cisco, she finished her Ph.D. in Cognitive Science, doing research on how humans make decisions. Her passion for studying human behavior, along with knowledge in data science, made research on voice assistants a fascinating area to her. She now designs metrics and data models for evaluating user interactions with voice assistants. 

Click here to learn more about the offerings from Webex and to sign up for a free account. 

Read more
What’s new in Webex: October 2020

New features and capabilities you can use to be productive anywhere and enable safe return to the office

Welcome to what’s new in Webex! This article offers a summary of the new features and capabilities we have recently introduced, as well as a preview of new features coming in October and later this fall.

MEETINGS

We’re introducing an all-new Webex preview and in-meeting user experience, making it much more intuitive and easier to use, allowing you to be more productive anywhere you work.

Simplified Preview

The larger, optimized Audio/Video Preview window makes it easier to look your best and find the settings you need before joining a meeting. A new larger video screen lets you see exactly how you will appear in the meeting. Select a virtual background from one of our 9 preloaded options, use a blurred background, or add your own customized picture.


Simplified in-Meeting Controls

Meeting controls are now clearly labeled and easy to locate at the bottom of your meeting window, which means you’ll always have them in sight, without covering up shared content or video.

Clearly labeled buttons and menus mean fewer menus and clicks. Audio settings and camera settings can now be conveniently accessed in the menus at the right of their respective mute and camera buttons. Consolidated panel controls (like Participants, Chat, and Q&A) can be found towards the right, where the panel opens.


And when you are sharing content, we’ve also made the experience simple and seamless.  Now you can float the participant panel, placing controls where you want them.

Throughout the month of October, we’re continuing to add more features to make your experience with Webex Meetings even more powerful.

More work-from-home options with Webex

Webex helps users work from anywhere on any device. Soon Facebook Portal users will be able to leverage their Webex Meetings app directly from their device! The experience will match what currently exists in the Android app, with audio, video, and content share enabled with the Facebook Portal device.


Users of Amazon Echo Show or Google Nest Hub smart display devices will be able to use Webex Meetings on those devices as well. Amazon Echo Show devices will be able to display meeting and recording lists, as well as play back recordings. Google Nest Hub devices will be able to display meeting and recording lists, play back recordings, and schedule meetings. For more details on using Webex Meetings on Amazon Echo Show and Google Nest Hub devices, click here.

Manage classes more easily in Webex Education Connector

We’re expanding capabilities for instructors with Webex Education Connector to include a new recordings library, attendance management, and the ability to edit recurring class sessions.

Webex Education Connector is making it easier for instructors to record their class sessions beforehand and share those recordings to any class they teach in current or upcoming semesters. Additionally, instructors can view, rename, and delete their own recordings. Instructors can take attendance in each class, indicate excused absences, and assign a grade for students based on their attendance. Instructors can also edit their recurring class sessions. Learn more here.

WEBEX TEAMS COLLABORATION

Global App Header

We’re adding a new header in the app that gives you a quick and easy way to get to your common actions like create a space, add a contact, edit your status, and know your device connectivity without needing to go into any specific tab. See Webex Teams | New App Header

Voice Clips iPhone, iPad, and Android

No time for a phone call? Working remotely or on the fly and no time to write a message? No problem, now you can send voice clips on Webex Teams for mobile.

Meeting and calling experiences — with virtual backgrounds

Love the virtual background or blur features? Use virtual or blurred background in calls and meetings, no matter where you work. And now you can personalize your background by uploading up to 3 images of your own… just make sure your administrator has enabled this feature for you.

Simplify complex tasks and interactions with one-click access to embedded apps

Bring important websites and web apps into the conversation. In your Webex Teams spaces, you can add a website URL as a tab, and everyone in the space has access to the website with just one click. A new tab format simplifies the layout and accommodates up to 10 tabs per team space. Embed a website, a BI dashboard, or a cloud document to keep work flowing quickly. This feature will be added to Webex Teams later this month.

CALLING

Webex Calling

Multiline support within teams — Windows and Mac

Are you looking for an easier way to manage multiple phone lines? Now you can use Webex Teams to manage work group environments such as boss/admin, support group, or contact center all within one app. With Webex Teams you can support up to 8 phone lines and leverage advanced calling features on each line such as call forward, transfer, hunt group, shared lines, and voicemail. You can also assign different ringtones to each line, making it easier for you to know when calls are coming into certain lines. And your administrator can turn on presence for shared lines so that line status is displayed.

Virtual extensions

Virtual extensions allow organizations to include non-Webex Calling locations in their dial plan. Administrators can assign virtual extensions to users that aren’t on their network. Dialed Virtual Extensions are translated to routable numbers and then sent to the dialing user’s PSTN connection for outbound handling.

Unified CM

UCM Cloud Migration Assistant

For Cisco calling customers looking to take their Unified Communications Manager systems to the cloud, we now have a streamlined path to get you there, with the UCM Cloud Migration Assistant.

The Migration Assistant can automatically extract users, calling features, and system settings from your on-premises UCM and replicate them in UCM Cloud. It’s powerful enough to let you customize your migration data and optimize for cloud deployment. With the Migration Assistant, you have the flexibility to schedule your migration on a site-by-site basis and perform synthetic call testing to validate that all the call routes and features are working correctly.


DEVICES

Webex is creating more intelligent solutions for the hybrid work environment. See how we’re helping organizations make returning to the office safe and productive.

New integrated sensors provide insights 

We’re integrating more sensors into Webex Room devices to monitor environmental conditions and help IT and facilities provide a safer, more comfortable working environment as they reconfigure office spaces for the return to work. Expanded sensors not only count the number of people in a room to support social distancing, but will soon also identify environmental factors such as temperature, humidity, air quality, ambient light, noise, and room acoustics. Analysis from these intelligent sensors monitors comfort and helps improve workplace satisfaction.

New options for intelligent workplace and smart room booking

As organizations return to the office, it’s time for a more intelligent way of optimizing shared spaces. Introducing the Cisco Webex® Room Navigator, an intuitive touch panel that offers instant connections to video conferences, room controls, content sharing, room booking, and even digital signage when not in use. Designed to work inside and outside the meeting room, it provides intelligent, safe room booking for users and deep data for IT and facilities managers. See it in action:

Expanded mobility options for deskless workers

Some of your most important workers are constantly on the move. Without time to get to a desk, important calls could be missed. The new Cisco IP DECT 6823 VoIP phone is a calling solution that gives deskless workers the range, mobility, and security features they need to stay connected. Stay connected to your organization, regardless of where you are: in the field, in the warehouse, at the nursing station, on the shop floor, or in the hotel lobby. Whether it’s a quick text for information, a call for help, or even a conference call, the DECT series offers simple all-day use. Learn more here.

INTEGRATIONS

Expanded Webex Expert on Demand features

We’re working hard to make frontline workers even more effective. With the Webex Expert on Demand 1.6 release with integrated RealWear headset, frontline workers and remote experts can annotate images for even more powerful troubleshooting and real-time collaboration. The hands-free experience is improved with remote controls to zoom camera, control volume, or use a ‘flashlight’ to better see what matters.


CONTROL HUB & SECURITY

Workspace metrics in Webex Control Hub

The workspaces tab in Webex Control Hub gives IT much-needed visibility into conference room devices, utilization, and more. Now we’re expanding these insights to include visualized data and trending: insight into average room occupancy, rooms booked but not used, environmental conditions for productive work, and more. These insights offer actionable information to optimize real estate utilization and proactively address issues that may impact your next meeting.


Introducing the Organization Health Assessment Tool for IT

Let Webex Control Hub be your coach: take advantage of best practices, reduce turnaround times for setup, understand usage and adoption challenges, pinpoint experience issues with your deployment, and more. This online tool provides configuration insights to set up Webex, with easy-to-understand action items and interactive walkthroughs.


For more information on all of these features and upcoming updates to Webex, please visit the What’s New articles for Webex Services.

Learn More

Webex AI Innovations Enable Your Team to Safely Return to the Office and Be Productive

Explore the New Webex In-Meeting Experience (Webex 40.9)

Still Need Help?

Join a Webex online meeting

Learn more about Webex, join one of our upcoming training sessions

Explore daily product demonstrations

Sign up for Webex

Visit our home page or contact us directly for assistance.

Read more
10 lessons that helped scale Webex during a global crisis

Panos Kozanian – As every IT department around the globe executed business continuity plans, Webex’s criticality to customers and its utilization soared.

Webex is part of the business continuity plan of over 95% of all Fortune 500 companies. COVID-19 changed our world forever, and working from home (WFH) became a necessity as sheltering in place became the new norm overnight. As every IT department around the globe executed their business continuity plans, Webex’s criticality to our customers and its utilization soared.

The regional concurrent peak attendee counts determine the amount of capacity that needs to be provisioned.

I’ll be sharing the story and some of the lessons learned as we scaled Webex to meet the new demand both on the technical and the process side.

Here’s a TL;DR of the lessons learned:

  1. Observe the world
  2. Have a worst-case scenario plan
  3. Have burst capacity plans to alternate clouds
  4. Have flexibility on resource intensive workflows
  5. Have a common unit for scale
  6. Decentralize decision making but keep a centralized Control Tower
  7. Proactively communicate with your customers
  8. Shield the scaling team
  9. Leverage your service providers early
  10. Promote and be an example of self-care

Before I dive into the story and the lessons learned, I want to recognize two things:

First, at Webex, we are most grateful for the people who are helping on the front lines without the luxury of being protected by the OSI layers of networking.

Second, we pride ourselves on being an extension of our customers’ IT departments. We recognize that the story of scaling Webex is a portion of the journey our customers went through. Our IT counterparts in each of our customers’ organizations deserve an incredible amount of recognition for effectively executing on some of the most challenging business continuity plans we will face in our lifetime. We at Webex are grateful to have been in a position to assist our customers, schools, hospitals, and governments around the globe in this unprecedented world event.

Early February | Tremors

Webex runs the largest global network and data center backbone dedicated to real-time collaboration. In our 24/7 Network Operation Center (NOC) we observe global events regularly: typhoons, landslides, earthquakes, and internet route disruptions and congestions are all events that we observe and react to regularly.

Observe the World

On February 3rd our network monitoring alerts were triggered by a drastic increase in traffic from our multi-national customers in China that connect via our global network along with our China-based local customer that connects through a dedicated point of presence in China – physically separate from the rest of Webex. We eventually handled a network increase of 22x over our January baselines for that region, due to the shelter-in-place orders of our customers’ employees in China who were connecting to our Webex services.

Network traffic throughput over time coming from China into the Webex global backbone.

At this point, our NOC’s assessment was that we had at the very least an epidemic and possibly a pandemic that could affect the broader APAC regions and we started to mobilize compute and network capacity increases in the region.

Late February | Scenario Planning

The penultimate week of February our Site Reliability Engineers (SREs) were observing an unexpected increase in all regions in our Year-over-Year comparison graphs. This is when we knew we had a global event coming our way and put a team together dedicated to scenario planning for different possible outcomes based on data from recent epidemics and pandemics.

Have a worst-case scenario plan

Webex is part of the business continuity plan of over 95% of all Fortune 500 companies. We therefore had three scenarios planned out by our teams: one that estimated a 130% increase in peak utilization if the pandemic was fairly well contained, another with a 150% increase in case of a massive spread, and a third with a 200% increase as the worst case we could imagine at the time. In retrospect, we were underestimating tremendously, misled by how recent epidemics and pandemics (the 2009 swine flu) had played out. Despite it being an underestimate, we’re glad we started executing early on scenario 3, the worst-case scenario.

Have a common unit for scale

Capacity increases are a common task at Webex, but the scale at which this one would come our way required coordination across a very wide range of teams that were each making multiple optimizations a day and looking at capacity through different lenses: CPU utilization for compute, Gbps for network, TB for storage, QPS for databases, etc… Each of these was converted to a common metric that was relatable to our engineers and our customers: “Peak Attendees”. Conversions to Peak Attendees were used to quickly identify where bottlenecks might show up in our aggregated models across our global data center footprint.
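As a toy illustration of the idea (the per-attendee costs below are invented numbers, not Webex’s real conversion factors):

```python
# Toy illustration of converting per-resource capacity into a common
# "Peak Attendees" unit. The per-attendee costs are invented numbers;
# real values would come from measured production profiles.
PER_ATTENDEE_COST = {
    "cpu_cores": 0.02,       # CPU cores per concurrent attendee
    "network_gbps": 0.0005,  # Gbps per concurrent attendee
    "storage_tb": 0.0001,    # TB per concurrent attendee (recordings, etc.)
    "db_qps": 0.5,           # database queries/sec per concurrent attendee
}

def peak_attendees_supported(available: dict) -> int:
    """The bottleneck resource determines how many attendees a region supports."""
    return int(min(available[r] / cost for r, cost in PER_ATTENDEE_COST.items()))

region = {"cpu_cores": 20000, "network_gbps": 40, "storage_tb": 500, "db_qps": 300000}
print(peak_attendees_supported(region))  # network is the bottleneck: 80,000 attendees
```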

Have flexibility on resource-intensive workflows

Scenario 3’s mitigations required an aggressive timeline. Webex, in this scenario, could temporarily run out of capacity in a given region. The plan included leveraging our global footprint, accelerated through our backbone, to deliver services to specific regions while other regions were sleeping.

Armed with plans for these three potential scenarios, we started executing on the mitigations of Scenario 3 – a 200% global increase – thinking that we were executing against what seemed like a worst-case scenario.

Early March | State of Emergency

By March 2nd, we were deploying all capacity we had on hand across the globe and augmenting our backbones for a 200% increase. We also started provisioning burst capacity in public clouds to rapidly extend our global backbone.

To assist the world in transitioning to shelter in place, we also reduced the restrictions on our free offering (which is isolated from our paying customers’ services) for enterprises, schools, hospitals, and governments: we removed all time restrictions, allowed up to 100 participants per meeting, and even provided phone dial-ins globally.

Proactively communicate with your customers

In early March, we started reaching out proactively to customers letting them know what we were seeing and that we had our entire company’s backing to support them. Our account teams backed by our executives offered assistance to our customers in transitioning to an all work from home employee base with documentation, tutorials, and training. These early proactive communications were the first of many to come, where our account teams worked closely with our customers to assist in the massive work from home transitions.

Have burst capacity plans to alternate clouds

Webex runs our own cloud and global backbone: 16 data centers connected by a backbone that handles 1.5Tbps at peak. Our backbone further connects 5 different geographies directly to public cloud providers. In early March, we extended our services quickly into different public clouds in order to reserve capacity. This served us incredibly well as different cloud providers were themselves scaling to meet the demand coming their way.

By March 9th, most companies with flexible WFH policies had all of their employees work from home (including all of us Webex employees). We could also see in our capacity utilization that our scenario 3 was not going to be sufficient. We established a state of emergency which included getting resources across all of Cisco to assist so we could support our customers. Our own CEO, Chuck Robbins, joined our state of emergency bridge and gave us a single focus to tackle: “Do everything you need to do for our customer’s business continuity. You have the entire company behind you.”

Leverage your service providers early

Cisco is the number one manufacturer of networking equipment. We produce our own servers and have some of the best relationships in the industry with service providers. In the first week of March, we knew it was time to get all of the help we could from across the company and our partners to support the business continuity plans of our customers. We were on a war footing to provide further capacity globally to Webex, led by our own CAP managers, who coordinated 24/7 bridges with our vendors.

Decentralize decision making but keep a centralized Control Tower

Our COVID-19 scale-up was a unique type of incident management: its duration was nearly 100 days (from low to peak utilization), its scale was unprecedented (growing a mature business to 400%) and its impact broad (all parts of our Webex business were involved and needed coordination). Below is the process diagram that shows how we ran the efforts 24/7 for 100 days leveraging Webex Meetings itself for all the coordination.

Unified Incident Commander

Shield the scaling teams

The Change Commanders in the graph above were given full autonomy to do what they needed to sustain their area. They were also intentionally separated out of incidents and escalations by the Unified Incident Commander who was acting as a control tower for each of the scale and incident activities. This allowed the scaling team to stay focused on scaling the service while Incident Commanders were handling specific hot spots.

To give you a sense of what a 24h period on Webex looks like, you can see the graphs below. The only time of any reprieve on regional all-time high load increases was the 4h period between 22:00 and 02:00 UTC – when the sun is setting on the Americas and about to rise on APAC.

24h of load on Webex across regions

Late March | Education & Governments

By Monday, March 23rd nearly all companies, governments, and schools were sheltering in place. At this point, we had all of our processes and engagements in place, constrained only by the time-to-delivery of hardware. We could see countries around the globe lighting up as they used Webex for their business continuity plans. We also noticed a new wave of education customers around the globe who have their own discernible access patterns.

The unique access pattern of our education customers includes higher utilization of video joins, more utilization of recordings, meetings more concentrated in time, and high geographic density of participants. We quickly optimized network paths and expanded further into hyperlocal public clouds for these new education customers.

Early April | Getting ready for a second wave

The first week of April was the first time in 60 days where we saw less than double-digit growth week over week. In a twist of fate, security-aware governments and enterprises around the world noticed the security flaws with some of our competitors and some started a new migration to Webex. This meant we needed to prepare for a second wave of growth.

We had our new scaling machinery well-oiled by this point and leveraged the lessons learned from early March to accelerate our readiness for customers migrating to our secure platform. This included further scaling of our backbone, compute, storage, and public cloud extensions.

Late April | Second wave

The second half of April saw another increase of over 25% of our user base driven by a migration over from competitors to Webex. This increase was seamlessly achieved with our compute, storage, network, database, applications, and media scale-up teams able to scale to the new demand. Improved stability during this second wave is reflected in a decrease in customer-impacting incidents as shown by the graph below.

Customer impacting incidents over time

The Webex process for ensuring service stability coped well, but with enormous growth and high rate of change there was some level of service disruption, predominantly in the critical month of March. In comparison, users of other comparable services experienced substantially more – up to 5x – outages throughout the Mar-May period and beyond.

May | The new normal and summer

The month of May landed us at 400% of our February baseline. This became our new plateau before a seasonal slowdown driven by summer holidays and our education customers holding fewer meetings.

Promote and be an example of self-care

Under the Cisco-wide “Day for Me” program, we were given a day off on May 22nd which was for many of us on the Webex team the first day off in 100 straight days of work. It gave us the breather we needed, and it was a great way to celebrate our ability to handle the second wave of scale up gracefully.

Thank you

Our scale-up efforts were not as smooth in the early days of the shelter-in-place orders as we would have liked them to be. We hope, through the lessons shared above, that we can help our peer Site Reliability Engineers, cloud engineers, SaaS developers, and IT organizations learn from the processes and tools that helped us achieve what was arguably the biggest scale-up effort most of us could have ever anticipated.

We also recognize that the story of scaling Webex is only a portion of the journey our customers went through and our IT counterparts in each of our customers’ organizations deserve an incredible amount of recognition for effectively executing on some of the most challenging business continuity plans we will face in our lifetime.

Finally, we at Webex are grateful to have been in a position to assist our customers, schools, hospitals, and governments around the globe in this unprecedented world event.

About the author

Panos Kozanian is a Director of Engineering responsible for the Webex Platform Organization. The Webex Platform Organization is responsible for all infrastructure assets: data centers, compute, storage and network and the PaaS layer that powers all Webex services, as well as collaboration services such as Common Identity, Control Hub and the Analytics Platform. Additionally, the Webex Platform Organization is responsible for reliability engineering, ensuring that Webex continues to be delivered with high availability and world class performance. Prior to this role, Panos led the Webex Teams Platform, establishing a modern DevOps & SRE culture supporting thousands of micro service instances and 1000+ developers. Throughout his career, he has held a number of leadership roles, including forming and leading Cisco’s Business Incubation lab, managing Cisco’s Digital Signage team, and leading Cisco’s Video Portal efforts. Panos joined Cisco in 2003 starting his career working on business incubations and executive demos. Panos earned a Bachelor of Science in computer engineering from Santa Clara University.

Click here to learn more about the offerings from Webex and to sign up for a free account. 

Read more
Best practices for acing a virtual interview

Video conferencing for interviews

Life lately hasn’t exactly felt like “business as usual” for job seekers and businesses looking to hire. Many companies had to shift to remote workforces overnight, closing their physical offices and spaces.

This situation creates obvious challenges for the interview process. Yet many companies have forged on with hiring plans, particularly those businesses experiencing demand spikes.

Employers have increasingly utilized virtual interview settings but now rely on them entirely. And it’s worth considering that video conferencing interviews could very well become the norm in a post-pandemic world.

With that in mind, let’s look at some of the best tips and strategies for acing your virtual interview and making the impression that you set out to make.

  1. Dress for the job, not the couch

A common virtual meeting hack that many remote workers utilize is wearing sweatpants or shorts with a business professional top. With many stuck indoors all day, it can seem pointless to dress up if you can get away with this new trick — but you’re putting yourself at risk if you do so for an interview.

If you shift out of frame or get up without remembering you have pajamas on, it would reflect very poorly on your character. Such slip-ups have been documented countless times on social media amid COVID-19.

Consider that you’ll be interviewing for an hour, two at most. Make the effort to choose a professional outfit, as it shows your interviewers that you’re serious and motivated about this opportunity.

  2. Check your internet connection and hardware beforehand

Technical difficulties often derail meetings — yet they can often be resolved more quickly or avoided altogether by checking technology and the internet connection ahead of time. Imagine the problems you’d encounter if your internet cut out in the middle of an interview.

To ensure that everything is working properly, give everything a once-over:

  • Check that you have a stable connection to the Wi-Fi. Maybe keep an Ethernet cord and adapter as a backup.
  • Test your laptop or desktop camera and microphone. Make sure there’s no echo and the picture is high resolution.
  • Become familiar with the video conferencing tool, especially if you haven’t used it before. Know all the features and commands.

Taking these precautionary steps can avoid an embarrassing situation down the line that may negatively impact your interview.

  3. Prepare like you would for any other interview

It might seem like the rules are a little relaxed for virtual interviews (as the dress-up hack employed by many suggests). However, that could not be further from the truth. While seasoned employees might have more informal meetings, you should treat every virtual interview as if you were going in person.

This is not the time to cut corners. Put in the time and effort to research the business, maybe even look at the LinkedIn profiles of the interviewers you’re scheduled to meet with.

Also, take the time to prepare how you’ll convey your skills, knowledge, and experience. The virtual format can feel foreign to many, so consider practicing in front of a mirror. It’s not a cheesy trick at all, and doing so may help you feel more comfortable and confident in your ability to articulate your strengths, your interest, and why you’d be a good fit.

  4. Lean on body language

It can be difficult to gauge whether you’re getting your point across in a virtual meeting. One way to add emphasis or otherwise convey your message is to leverage your body language. Facial expressions can show that you’re listening intently or speaking with excitement. If you use your hands a lot when you speak in person, don’t hold back in the virtual environment; it gives interviewers a clue about your personality and character.

The other side of this consideration is to keep an eye on any bad body language. Never, ever slouch when you’re on a virtual interview, for example. If you’re surprised or puzzled by a question, try not to let that show through.

  5. Get away from distractions

You should set up your interviewing space as far away from distractions as possible. Yet that can be a challenge when family members are roaming about or your significant other is also working at home and on a conference call.

Make sure that you position yourself away from televisions and other screens that could catch your eye or otherwise cause a distraction. Also, it’s worth talking to whoever you live with about your schedule. If they know you’re in an interview at a specific time, they can take steps to minimize noise and avoid disruptions, like barging into your space to ask a question.

  6. Follow up with a video message

It’s best practice to follow up after every interview. One way to differentiate yourself as a job seeker is to record a quick video message thanking the interviewers for the time and reiterating your personal value proposition and interest in the position. If you use this tactic, make sure to follow all the steps to ensure you’re recording a high-quality video that will deliver the impact you desire.

These strategies and tips can help you nail your next virtual job interview. And if you’re looking for a video conferencing tool to help power your job search effort, consider using Cisco Webex.

Try Cisco Webex free, today!

Learn More

Best practices for hosting virtual interviews

6 winning strategies for a video conference interview

Working smarter: An interview experience from a (super fun!) virtual team

Still Need Help?

What would you like to do?

Join a Webex online meeting.

Learn more about web meetings and video conferencing.

Sign up for Webex.

Visit our home page or contact us directly for assistance.

Read more
It’s back to school time for you, and Webex

Here’s how we’re making the virtual and hybrid school year work better for you

September is a special time of year for kids. Summer comes to an end, and the excitement of summer activities is exchanged for the eager anticipation of starting a new school year with friends and classmates. This year, things certainly feel a bit different. While your kids may not be going back to the physical classroom, that doesn’t mean the excitement needs to evaporate!

Not being in a physical classroom doesn’t take away the need to buy new school supplies, the fresh smell of opening a textbook, or the twinkle of knowledge sparkling in your child’s eye. We have a solution to help make the transition a little easier.

This year, Webex is helping millions of students experience school virtually, and we’re very excited to announce a bunch of new updates and features that will make the education experience even better. Webex is already known as the most secure, most reliable video conferencing platform and now, these additional features are propelling us to the lead as the most effective platform for virtual education!

We’re excited to show you a number of new features that make it even easier and more effective to use Webex at school with our hybrid environments for the workplace and the classroom:

Virtual classroom doors can be locked

Say good-bye to stranger-danger. Teachers can make sure no unwanted visitors crash their classrooms by allowing only signed-in students and guests to attend.

Breakout rooms for better learning

Break away from the larger class to launch smaller group sessions. This can help students feel more connected, and allow more students to vocalize and share their opinions or tackle group work.


Bring order to the classroom

Managing students in a classroom can be difficult, so maintaining the classroom environment over video might seem impossible. The introduction of more teacher-oriented controls enables a better in-class experience.

Live class transcripts

The Webex Assistant can transcribe the class lecture and provide the post-class write-up, allowing students the chance to read through any missed information while providing further opportunity to let the learnings sink in.

Classroom connections

With Messaging integrated into Webex, classmates and teachers can share ideas and engage on any topic being taught in class. Learning goes beyond just what the teacher is teaching.

Messaging integrated into Webex Classroom (Preview)

Read more about our announcements and get an in-depth view into the myriad of ways that Webex can create the best virtual and hybrid classroom experience.

Learn more hybrid learning tips from teachers, for teachers

Haven’t tried Webex? Try it for free today.

Additional Resources

Learn more hybrid learning tips from teachers, for teachers

The Future of Education

Webex Integration Partners join Cisco in offers for education

Experience the new Webex for Education – Simple and secure out-of-the-box

Blackboard Learn and Webex join forces to expand the reach of education

Education Resources

What is distance learning?

Welcome to virtual learning

Cisco Webex Education Connector

Cisco Education Home Page

Still Need Help?

What would you like to do?

Join a Webex online meeting.

Learn more about web meetings and video conferencing.

Sign up for Webex.

Visit our home page or contact us directly for assistance.

Read more
Human face and mouth and sound waves - 3D illustration
Applied natural language processing— Using AI to build real products

Arushi Raghuvanshi – Hear an overview of key concepts for leveraging NLP in production applications like voice assistants, question answering, search, topic summarization, and more.

This is a companion blog post to my talk at the Women in Big Data event on August 20, 2020. The talk slides are available here

There are different challenges between academic or theoretical NLP and practical or applied NLP. There are quite a few online resources on the academic side, including published papers, lessons on AI and NLP theory and fundamentals, blog posts breaking down the latest and greatest models, etc. There is less information online about using all of this in practice, in real-world, customer-facing applications, which is what I’ll cover here. I will outline some key differences between academia and industry, introduce core NLP concepts that are applicable to a variety of applied use cases, go through best practices and tools for collecting data and building an application, and discuss how to securely improve deployed models over time.

Academia vs industry

The first key difference between academia and industry is data. The data available to build a production application may be very limited in quantity compared to standard research datasets. For example, SQuAD, a popular question-answering dataset, has over 100,000 questions, but developers may only have a couple hundred representative question-answer pairs to build a production system. Production data may also be noisier or have different characteristics than standard data sets. For example, it could contain a lot of domain-specific terms like product names. Because of this, pre-trained or out of the box models might not work well.

The second difference is the need for reliability and interpretability in user-facing applications. There has been a trend towards deep learning models that perform well on large amounts of data, but they may pick up on unintended or undesirable data trends because real-world data naturally has bias. For example, many companies have shown up in the news for accidentally building sexist or racially biased models. When building models in practice, it’s important to think about bias and design models that are easy to evaluate, consistent, and only rely on intended features.

Next, academic papers tend to focus on a single, well-defined component. While these components may work well individually, they often break down as part of a larger pipeline. For example, to build a voice assistant you may need a pipeline of speech recognition, natural language understanding, and question answering components. If the speech recognition is off, it makes it more difficult to understand what the user is asking, and even more difficult to answer their question or complete the task.

speech recognition icon and sound waves

In academia, accuracy tends to be the main metric that researchers are trying to improve on, but in practice, developers also care about usability and scalability. In production, device constraints, inference time, interaction design, and other factors play a role in the overall success of an application. Often, these factors that contribute to usability are more important than minor accuracy improvements, requiring a different approach to model optimization.

Finally, security is an important factor in real-world applications. AI is a data-driven field. More data leads to better models, but developers must be careful about keeping this data secure and not violating customer trust. There have been many recent news articles about data breaches and a general sentiment of users feeling like companies are spying on them or not respecting their privacy.

These are some of the fundamental differences between AI theory and practice. Next, I’ll share some best practices and tools to solve NLP problems for production systems.

Applied NLP overview

The solution to many of the problems outlined above is to break down complex problems into a series of models that can be evaluated well. Instead of training a single, deep, end-to-end, black box system, train multiple, simpler, more well-defined models. Since each of these models is solving a simpler problem, they require less data to achieve high accuracy and less compute power to train. It is also easier to evaluate each of these subcomponents quickly and thoroughly, which makes it easier to efficiently deploy fixes to issues like picking up on unintended data trends.

With that in mind, we’ve found that most practical applications of NLP can be modeled as the following four general categories, and more complex problems can be handled with a pipeline of these models:

Text classification

For all NLP models, the input is a piece of text. With text classification, given a query or piece of text, the model outputs a single label.

One example application of this type is sentiment analysis. Given some text, the model can output a label of positive, negative, or neutral. Another example is topic classification. Consider an application with pre-defined topics of weather, news, and sports. Given a user query, the model can output the appropriate label. Third, extractive summarization or highlight extraction can be modeled as a text classification problem. For each sentence in the text, the model can output a binary label of whether that sentence is a highlight (included in the summary) or not.

example of text classification using inputs and outputs
Examples of sentiment analysis, domain classification, and highlight extraction.

Some models that can be used for text classification are logistic regression, support vector machines, random forest, decision trees, and neural networks (of which there are many network architectures available).

Some features that can be used to train these models include:

  • N-grams, which are sequences of n words or tokens in the order they appear
  • Bag of words, which is a count of all the words in the text (without paying attention to the order)
  • Word shape or orthographic features that consider if there are capitalizations, punctuation, numerics, etc.
  • Length of the text
  • Gazetteers, which are indexes or dictionaries containing domain-specific vocabulary and their frequencies – the feature is whether words or phrases in the input text appear in the domain gazetteer
  • For NN models, the input can be a character, word, or sentence level embedding or vector representation

While it’s good to be aware of these models and features, there are many libraries, toolkits, and frameworks with implementations of these models and feature extractors. Most of the work you’ll do for AI in practice will be framing the problem and collecting data. The model itself will often be a few lines calling a library. When starting out, it’s more important to focus on collecting the right data and framing the problem than getting caught up in the details of the model implementations.
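
To make that concrete, here is a minimal text classification sketch using scikit-learn; the toy dataset, labels, and model choice are illustrative (not from the original talk), but TF-IDF n-grams fed into logistic regression is often a strong baseline for domain or sentiment classification:

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    # Tiny illustrative dataset for domain classification (weather vs. sports)
    texts = ["will it rain tomorrow", "what's the forecast for today",
             "who won the game last night", "show me the latest score"]
    labels = ["weather", "weather", "sports", "sports"]

    # TF-IDF unigrams/bigrams + logistic regression: "a few lines calling a library"
    model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
    model.fit(texts, labels)
    print(model.predict(["will it snow tomorrow"]))  # expected: ['weather']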

Sequence labeling

The next model category is sequence labeling. Now, given a piece of text as input, the model will output a label for every word or token in the text. One application of this is entity recognition, which is extracting key words or phrases and their labels. Another application is part of speech tagging.

Entity recognition with IOB tagging input and output
Example of entity recognition with IOB tagging

Models that can be used for sequence labeling include maximum entropy markov models (MEMM), conditional random fields (CRF), long short-term memory networks (LSTM), and more complex recurrent neural network architectures (bi-LSTM, bi-LSTM + CRF, etc.).

Good features to use are the same as for the text classification model described above.
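
As a rough illustration (not code from the talk), here is what a small CRF-based entity recognizer with IOB labels might look like using the sklearn-crfsuite package; the feature function and toy sentences are made up for the example:

    import sklearn_crfsuite

    def token_features(sent, i):
        # Simple word-shape and context features for the token at position i
        word = sent[i]
        return {
            "word.lower": word.lower(),
            "word.istitle": str(word.istitle()),
            "prev.lower": sent[i - 1].lower() if i > 0 else "<s>",
            "next.lower": sent[i + 1].lower() if i < len(sent) - 1 else "</s>",
        }

    sents = [["call", "Sheryl", "Lee", "now"], ["message", "Kiran", "please"]]
    tags = [["O", "B-person_name", "I-person_name", "O"], ["O", "B-person_name", "O"]]

    X = [[token_features(s, i) for i in range(len(s))] for s in sents]
    crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=50)
    crf.fit(X, tags)

    test = ["call", "Heather", "now"]
    print(crf.predict([[token_features(test, i) for i in range(len(test))]]))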

Sequence to sequence

When given some text input, sequence to sequence models output a sequence of tokens of arbitrary length. Some applications for this are machine translation or natural language generation. MT requires a lot of data to get right, so in practice it is generally better to use a pre-trained model or one available through an API. NLG generally doesn’t reliably work well enough to be used in production. In practice, developers usually use rules-based or templated responses instead. Given that most developers aren’t training these models from scratch, I won’t go into architecture details on this type here.

Information retrieval

The last category is information retrieval. IR is the problem of retrieving a document from an index or database based on a search term. Some applications of this are Question Answering, Search, and Entity Resolution. For example, say someone wants to know which artist played the song Bohemian Rhapsody, and you have an index that contains songs and artist names. You can search that index with the song title Bohemian Rhapsody to get the document with the artist field populated as Queen.

Example of structured question answering for conversational interfaces.

Note that this is more complicated than a simple database lookup because it incorporates fuzzy matching. Some relevant features that can be used to get optimal rankings include:

  • Exact matching
  • Matching on normalized text
  • N-grams for phrase matching
  • Character n-grams for partial word matching and misspellings
  • Deep embedding based semantic matching, leveraging models such as BERT, GloVe, or sentence transformers
  • Phonetic matching, which can directly use phonetic signals from the speech recognition model, or generate phonemes from the transcribed text using models such as double metaphone or grapheme to phoneme
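
As a hedged sketch of the song lookup above, here is roughly how a fuzzy match against an Elasticsearch index might look with recent versions of the Python client (the index name and field names are assumptions for the example):

    from elasticsearch import Elasticsearch

    es = Elasticsearch("http://localhost:9200")

    # Fuzzy matching tolerates misspellings like "bohemian rapsody"
    results = es.search(
        index="songs",
        query={"match": {"title": {"query": "bohemian rapsody", "fuzziness": "AUTO"}}},
    )
    for hit in results["hits"]["hits"]:
        print(hit["_score"], hit["_source"].get("artist"))  # e.g. Queen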

Note that there are some areas of NLP that I didn’t cover. I didn’t touch on unsupervised models at all. But the majority of practical NLP applications can be modeled as one of these four categories, or for more complex problems, a combination of them.

Example application

To make this more concrete, let’s walk through an example application that uses the concepts we’ve discussed so far. More specifically, I’ll be giving you an example of building a food ordering conversational interface with the MindMeld platform. This is a complex problem, so it involves a pipeline of multiple models shown here:

Natural Language Processor and knowledge base

Let’s consider the example query “I’d like a hummus wrap and two chicken kebabs.”

  • The Domain Classifier is a text classification model that assigns an incoming query into one of a set of pre-defined buckets or domains. The given query would be labeled as the food ordering domain.
  • Intent Classifiers are also text classification models that predict which of the domain’s intents is expressed in the request. In this case, an intent classifier could label the query as the build order intent.
  • Entity Recognizers discern and label entities — the words and phrases that must be identified to understand and fulfill requests — with sequence labeling models. For our example query, this would extract hummus wrap and chicken kebabs as dish entities and two as a number entity.
  • Entity Role Classifiers add another level of labeling a role when knowing an entity’s type is not enough to interpret it correctly. These are also text classification models. The number entity two can be further classified as the quantity role (to differentiate it from a size role, e.g. 16 drinks vs a 16 ounce drink).
  • An Entity Resolver maps each identified entity to a canonical value using Information Retrieval. For example, hummus wrap can be mapped to the closest canonical item of Veggie Hummus Wrap, ID:‘B01CUUBRZY’.
  • The Language Parser finds relationships between the extracted entities and groups them into a meaningful hierarchy using weighted rules. In this case, two and chicken kebabs would be grouped together.
  • The Question Answerer supports the creation of a knowledge base, which encompasses all of the important world knowledge for a given application use case. The question answerer then leverages the knowledge base to find answers, validate questions, and suggest alternatives in response to user queries. This is an Information Retrieval model. Since the user has not specified a restaurant name, the question answerer can be used to find restaurants that carry the requested dishes.
  • The Dialogue Manager analyzes each incoming request and assigns it to a dialogue state handler, which then executes the required logic and returns a response. This is a rule-based system. In this case, it would use a template to construct a response like “I found veggie hummus wrap and two chicken kebabs available at Med Wraps and Palmyra. Where would you like to order from?”
  • Finally, the Application Manager orchestrates the query workflow — in essence, directing the progress of the query between and within components.

MindMeld implements all of these models for you with some reasonable defaults. Once you’ve added your data, you can simply run the following in the command line to train all of these models and start testing them:

mindmeld blueprint
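
The command-line step looks roughly like the following (a hedged sketch based on the MindMeld CLI; see the MindMeld documentation for the exact commands for your blueprint):

    # Download the food ordering blueprint, train its NLP models, and chat with it
    mindmeld blueprint food_ordering
    python -m food_ordering build
    python -m food_ordering converse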

If you would like to experiment further with one of the models (an intent classifier, for example), you can do so with the following syntax in Python:

mindmeld components
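
The Python step is roughly along these lines (a hedged sketch; the 'ordering' domain name is an assumption based on the food ordering example, and the exact API is documented in the MindMeld user guide):

    from mindmeld.components.nlp import NaturalLanguageProcessor

    nlp = NaturalLanguageProcessor(app_path="food_ordering")
    nlp.build()  # trains every model in the pipeline with reasonable defaults

    # Pull out one classifier to experiment with, e.g. the intent classifier
    # for the (assumed) 'ordering' domain, and retrain it with custom settings
    ic = nlp.domains["ordering"].intent_classifier
    ic.fit(model_settings={"classifier_type": "logreg"})
    print(ic.predict("i'd like a hummus wrap and two chicken kebabs"))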

To download the code and try it out yourself you can make a copy of this Google colab notebook and follow the commands. More information is available in the MindMeld documentation.

Now that you understand some fundamental NLP concepts and how to frame an NLP problem, the next step is to collect data.

Data collection

Before jumping into data collection, it’s always a good idea to check if there are any pre-trained models you can use. Hugging Face is a popular platform that has implementations of many state of the art models. CoreNLP, spaCy, and NLTK are platforms that have implementations of many NLP fundamentals, such as named entity recognition, part of speech tagging, etc. And you can always do a simple Google search to look for additional models. Even if these pre-trained models don’t perfectly fit your use case, they can still be useful for fine tuning or as features.

Example of using pre-trained sentence transformers found via Hugging Face
Example of using pre-trained Named Entity Recognition from spaCy
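
For instance, here is a minimal sketch of the two examples captured above, using off-the-shelf models from the sentence-transformers and spaCy libraries (the model names are common defaults, not a recommendation from the talk):

    import spacy
    from sentence_transformers import SentenceTransformer, util

    # Pre-trained sentence embeddings for semantic similarity
    encoder = SentenceTransformer("all-MiniLM-L6-v2")
    emb = encoder.encode(["join my personal meeting room", "start my pmr"])
    print(util.cos_sim(emb[0], emb[1]))

    # Pre-trained named entity recognition
    nlp = spacy.load("en_core_web_sm")
    doc = nlp("Kiran Prakash joined the meeting from San Jose.")
    print([(ent.text, ent.label_) for ent in doc.ents])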

If you are training a model, first check to see if there are any existing datasets available. There are a lot of open-source datasets that can be used as a starting point. There may also be data within your organization that you want to use. Or you might be able to scrape or compile data from a website or publicly available API.

While it’s good to check for existing models and data, don’t hesitate to build a new dataset if one doesn’t already exist that accurately represents your use case. Representative data is essential to building a high-quality application. Crowdsourcing tools can be useful for generating initial data.

Example platforms for crowdsourcing data collection.

When leveraging crowdsourcing tools, it’s important to define your task well. If the task description is too specific, you will get lots of very similar looking data, but if it’s too general, a lot of the data may be irrelevant or not useful. To strike the right balance, iterate. Work in small batches, see how the results look, and update your task description accordingly.

Some data collection platforms help match you with workers who are trained in your specific use case, which is really useful if you want clean, consistent data. For cases where you want more variation or generally want to see how the public responds to certain prompts, it may be better to go with tools that anyone can contribute to. You can also do things like target specific geographic areas to get a variation in slang and regional language that people might use.

Whatever approach you take, consider implementing validation checks to automatically discard any excessively noisy or irrelevant data. You can target workers with better ratings to help reduce noise, but even then, you should implement some automated validation like checking length, removing whitespaces, and making sure at least some words appear in the relevant language dictionary.

In addition to collecting the text itself, remember that we want to collect labels for our models. It’s incredibly important for these labels to be clean, because without clean data our models can’t learn. If you use crowdsourcing tools or data teams for this, you should give contributors some training and evaluation before they start labeling. You can have multiple people label the same queries, and only accept labels with a certain level of agreement. Once you have an initial model, you can help speed up labeling time by using model predictions to bootstrap labels. This transforms the label generation task into a verification task, which is generally faster and easier.

Finally, if you don’t have any other resources, you can create and label your data yourself, in house. This can be a great way to bootstrap an initial model. It gets you to think more closely about the data you are trying to collect, and you can add data over time from user logs or other sources as resources become available.

Toolkits and frameworks

Once you’ve framed your problem and collected data, the next step is to train your model. Scikit-learn is a popular toolkit for classic models that we talked about like logistic regression, support vector machines, and random forest.

linear regression

For neural networks, you can use libraries like PyTorch or TensorFlow. Here’s a great tutorial on using a PyTorch LSTM for part of speech tagging, and here’s one for TensorFlow.

Some more NLP specific toolkits are CoreNLP, NLTK, spaCy, and Hugging Face. I mentioned these toolkits before in the context of pre-trained models, but they are also very useful as feature extractors. These toolkits can be used to generate features from text, like n-grams and bag of words. These feature vectors can then be fed into models implemented via, for example, scikit-learn.

generating ngrams
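
For example, here is a hedged sketch of generating n-gram and bag-of-words features with NLTK and scikit-learn (toy sentences, illustrative only):

    from nltk import ngrams
    from sklearn.feature_extraction.text import CountVectorizer

    # Word bigrams, in the order they appear
    tokens = "please join the daily standup meeting".split()
    print(list(ngrams(tokens, 2)))

    # Bag-of-words / n-gram count vectors that can be fed into a scikit-learn model
    vectorizer = CountVectorizer(ngram_range=(1, 2))
    X = vectorizer.fit_transform(["join the meeting", "start the meeting now"])
    print(vectorizer.get_feature_names_out())
    print(X.toarray())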

For more complex problems involving multiple NLP components, namely conversational interfaces, you can use a variety of platforms including MindMeld, Dialogflow, Amazon Lex, Wit.ai, Rasa, and Microsoft LUIS. These platforms have a lot of preset defaults for feature extractors and models and have the whole pipeline set up, so all you have to do is provide your data and implement any custom logic. Even if you’re not building a full conversational interface, these platforms can be really useful for their subcomponents, like question answering or custom entity extraction.

Finally, there are tools on the infrastructure side that can be particularly useful for AI. Elasticsearch is useful because it is not only a database, but also a full-text search engine with a lot of IR capabilities built in. AWS, Google Compute Engine, and other similar platforms are great for cloud compute to train heavier models efficiently. Kubernetes is a platform for easy deployment and scaling of your systems. And DVC is a tool for data versioning, so that if you have multiple people training models, they can stay synchronized on the data they are using.

Improving models in a secure way

The key to intelligent ML systems is to improve them over time. All of the leaders in the AI space have become so by leveraging usage and behavior data from real users to continually improve their models. As an organization, it is essential to do this in a secure way.

The most important thing to start with is communication. It is important to clearly communicate if any user data will be stored, how long it will be stored for, who will be able to access it, and what it will be used for. Even if you are abiding by data policies, if users are unaware of these agreements, it may come across as ‘spying.’ This communication can be done at onboarding, with user agreements, through an FAQ section of a website, via a published white paper, or any other accessible location.

In order to define these data policies, some things to think about include what data needs to be stored to improve your system. Can you store only some extracted trends or metadata, or do you need to keep the full raw logs? You should only store what is absolutely necessary to add value to the end user and always remove any extra sensitive or personally identifiable information. Think about how long this data will be stored. Will it be deleted after a set amount of time, say one year, or is it crucial to store it indefinitely until the user requests it to be deleted? Who will be able to access the data? If the data is never read or inspected by humans, people may be more comfortable with their data being used. If that is not possible, it is good to make the data available only to a small team of analysts who have a high level of data security training. Finally, what will the data be used for? If it provides value to the end user, they are more likely to allow you to use it. When possible, it is beneficial to provide useful reports to end users or customers and measurable accuracy improvements on models.

Once you’ve defined a data policy, you need to build a secure data pipeline that can enforce this policy.

example data pipeline
Example data pipeline. User queries and model outputs are stored in a secure temporary cache until they can be processed and saved in a more permanent data store with relevant access permissions.

For example, you need to keep track of information like which user each piece of data came from, so you can delete it if they ask for it to be removed. The platform needs to be able to enforce permissions, so only authorized individuals are able to access data stores. You can also build models to remove sensitive information. For example, if you don’t need to store person names and those exist in your data, you can use an entity recognition model to recognize those person names and replace them with a generic token.
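
As an illustration of that last point, here is a small sketch (using a pre-trained spaCy model rather than a custom recognizer) that replaces detected person names with a generic token before data is persisted:

    import spacy

    nlp = spacy.load("en_core_web_sm")

    def redact_person_names(text: str) -> str:
        doc = nlp(text)
        # Replace PERSON spans right-to-left so earlier character offsets stay valid
        for ent in reversed(doc.ents):
            if ent.label_ == "PERSON":
                text = text[:ent.start_char] + "<PERSON>" + text[ent.end_char:]
        return text

    print(redact_person_names("Schedule a call between Kiran Prakash and Sheryl Lee."))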

Once you have data, an efficient way to improve models is with Active Learning. In production, raw data is cheap, but labeling data is not. We can use model uncertainty to select which queries to label first to improve models quickly.

model performance vs queries added
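
A minimal uncertainty-sampling sketch shows the idea (toy data and a plain scikit-learn classifier standing in for the production models): score the unlabeled log queries and send the least confident ones to labelers first.

    import numpy as np
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    labeled = ["join the meeting", "start the meeting", "end the meeting", "leave the meeting"]
    intents = ["join_meeting", "join_meeting", "end_meeting", "end_meeting"]
    unlabeled = ["shine the meeting", "hop on the call", "wrap it up", "shut it down"]  # raw log queries

    clf = make_pipeline(TfidfVectorizer(), LogisticRegression()).fit(labeled, intents)

    probs = clf.predict_proba(unlabeled)
    uncertainty = 1.0 - probs.max(axis=1)     # low top-class confidence = high uncertainty
    ranked = np.argsort(uncertainty)[::-1]    # most uncertain first
    print([unlabeled[i] for i in ranked])     # label these queries first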

To help do active learning on a regular basis, you can build out a semi-automated pipeline that selects logs from the data store, bootstraps annotations, which can be verified by a human labeler, and checks to see if the accuracy increases with the new data. If it does, the new model can be deployed, and if not, the data can be sent to the developer team for further inspection and model experimentation. In addition to increasing the training set with this pipeline, it’s good to add to the test set. For the test set, it’s better to randomly select queries to get an accurate distribution of user behavior.

You can further speed up this pipeline by using auto labeling. Tools like Snorkel enable labeling data automatically, with an algorithm or model, rather than manually with a human labeler. The auto labeling system can abstain from labeling queries for which there is low confidence. These can be sent to human labelers or ignored. Either way, it allows for some model improvement without a human-in-the-loop, which is beneficial for security reasons and time or resource constraints.

About the author

Arushi Raghuvanshi is a Senior Machine Learning Engineer at Cisco through the acquisition of MindMeld, where she builds production level conversational interfaces. She has developed instrumental components of the core Natural Language Processing platform, drives the effort on active learning to improve models in production, and is leading new initiatives such as speaker identification. Prior to MindMeld, Arushi earned her Master’s degree in Computer Science with an Artificial Intelligence specialization from Stanford University. She also holds a Bachelor’s degree from Stanford in Computer Science with a secondary degree in Electrical Engineering. Her prior industry experience includes time working at Microsoft, Intel, Jaunt VR, and founding a startup backed by Pear Ventures and Lightspeed Ventures. Arushi has publications in leading conferences including EMNLP, IEEE WCCI, and IEEE ISMVL.

Click here to learn more about the offerings from Webex and to sign up for a free account. 

Read more
media concept smart speaker
Robust NLP for voice assistants

Karthik Raghunathan – How to understand your users despite your Automatic Speech Recognition (ASR)’s bad hearing.

This is a companion blog post to my talk at the L3-AI conference on June 18th, 2020. The talk slides are available here. The talk recording is here.

NLP machines
Image credits: Bryce Durbin / TechCrunch

The MindMeld Conversational AI Platform has been used by developers to build text-based chatbots as well as voice assistants. While text-based chatbots certainly have their place and utility in today’s world, voice interfaces are a lot more intuitive and natural when they work well.

It’s been encouraging to see the general population become more comfortable with voice assistants in recent years. An early 2020 survey by Voicebot found that more than a third of US households now have a voice-enabled smart speaker.

Map of the United States from Voicebot showing the graph of US households having voice-enabled smart speaker

Another survey found that 35% of the US population are regular voice assistant users.

A graph of US voice assistant users and penetration

These numbers are expected to grow even faster in this era as users start preferring touch-free interfaces. This presents a great opportunity for developers of voice user interfaces everywhere. However, anyone who’s worked on one of these systems knows that it’s no easy feat to build a production-quality voice assistant that delights users.

Several active research areas in natural language processing explore more complex and deeper neural network architectures for conversational natural language understanding, natural language generation, and dialog state tracking. But all of that great work can still get undermined by the simple fact that voice assistants often suffer from bad hearing. In real life, even simple voice commands get easily misunderstood because the assistant didn’t hear you clearly.

Alexa Voice Fails

In more technical terms, this means that the accuracy of your Automatic Speech Recognition (ASR) system has a huge impact on the overall quality of your voice assistant. This ends up being the Achilles’ Heel for most voice assistants, and if you want to see a significant improvement in user experience, focusing your efforts here will give you the most bang for your buck.

Challenges with speech recognition

Modern voice assistants are built using a complex pipeline of AI technology. At a high level, three steps are common to all voice user interfaces:

 

Source: Nvidia
  1. First, we use Automatic Speech Recognition to convert the user’s speech to text. Since building your own ASR requires prohibitively high amounts of data and resources, it’s common for developers to use an off-the-shelf cloud service like Google Cloud Speech-to-Text, Azure Speech to Text, or Amazon Transcribe.
  2. We then use Natural Language Processing to understand the transcribed text, take any appropriate actions, and formulate a text response. This can be accomplished with a platform like MindMeld that encompasses functionality for natural language understanding, dialogue management, and natural language generation.
  3. Lastly, we use Text To Speech to synthesize human-like speech for the generated text response to be “spoken” back to the user. This is commonly done using cloud services like Google Cloud Text-to-Speech, Azure Text-to-Speech, or Amazon Polly.

Since ASR is the first component in this pipeline, errors introduced at this step cascade to downstream components, causing them to make errors as well. You can use all the transformers you want in your NLP system, but if the input is garbage, you’ll still get garbage out.

In the last five years, there have been many headlines like these which may lead one to believe that ASR is an already solved problem:

Microsoft reaches ‘human parity’ with new speech recognition system

The system’s word error rate is reported to be 5.9 percent, which Microsoft says is “about equal” to professional transcriptionists asked to work on speech taken from the same Switchboard corpus of conversations.

www.theverge.com

Google’s speech recognition technology now has a 4.9% word error rate

Google CEO Sundar Pichai today announced that the company’s speech recognition technology has now achieved a 4.9 percent word error rate.

venturebeat.com

While we’ve undoubtedly made large strides in speech recognition accuracy over the last decade, it’s far from being a solved problem in the real world. In many of our production applications, we see word error rates (the metric by which ASR quality is measured) to be far higher than the ~5% numbers reported on well-studied academic datasets. Off-the-shelf ASR services like those from Microsoft, Google, or Amazon still make many mistakes on proper nouns and domain-specific terminology. When deployed in the real world, these errors are further exacerbated when dealing with users with diverse accents or non-ideal acoustic environments.

Examples of ASR mistranscriptions in Webex Assistant

Below are a few examples of ASR mistranscriptions we’ve seen in Webex Assistant, our MindMeld-powered voice assistant for enterprise collaboration.

ASR mistranscriptions

As you can see, the ASR often confuses proper nouns with common English words (e.g., Prakash’s vs. precautious or Mahojwal vs. my jaw). On other occasions, it mistakes one named entity for another (e.g., Kiran vs. Corrine or Didi vs. Stevie). There are also cases where it fuses named entities with surrounding words (e.g., Merriweather instead of me with Heather). Any of these mistakes would lead the assistant to take an unsatisfactory action since the primary entity of interest has been lost in the ASR output.

Clearly, we need to overcome these kinds of errors to understand the user correctly. But before we look at potential solutions, it’s worth emphasizing two things.

First, we’ll assume that the ASR we’re using is an off-the-shelf black box system that we can’t modify and have to use as is. This is a reasonable assumption because most popular cloud ASR services provide very little room for customization. However, we will assume that the ASR provides a ranked list of alternate hypotheses and not just its most confident transcript. This is something that all major cloud ASR services can do today.

major cloud ASR services
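
For example, with Google Cloud Speech-to-Text (one of the services named above), requesting alternates is a one-line configuration change. This is a hedged sketch, and the audio file name is just a placeholder:

    from google.cloud import speech

    client = speech.SpeechClient()
    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=16000,
        language_code="en-US",
        max_alternatives=5,  # ask for a ranked n-best list, not just the top transcript
    )
    with open("query.wav", "rb") as f:
        audio = speech.RecognitionAudio(content=f.read())

    response = client.recognize(config=config, audio=audio)
    for alt in response.results[0].alternatives:
        print(alt.confidence, alt.transcript)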

Note that the techniques covered below will be useful even if you had the luxury of using your own highly customized domain-specific ASR models. That’s because no ASR is ever going to be perfect, and having robustness mechanisms built into your NLP pipeline is always a good idea. The assumption about an off-the-shelf black box ASR is more to restrict the scope of the discussion here to the most common scenario that developers find themselves in.

Second, when talking about the NLP stack for a voice assistant, different implementations might involve different steps as part of the full pipeline. In this post, we’ll only focus on the three main steps common to all modern conversational AI platforms: intent classification, entity recognition, and entity resolution.

Next, we’ll look at three different techniques we’ve used in MindMeld applications to make our NLP pipeline more resilient to ASR errors.

1. ASR n-best reranking

The first technique, called n-best rescoring or reranking, applies application-specific domain knowledge to bias and possibly correct the ASR output.

While this description doesn’t do justice to all the complexities of a modern ASR system, at a conceptual level, it’s still useful to think of an ASR as having three separate stages:

Automatic Speech Recognition

First, the feature extractor extracts some useful audio features from the input speech signal. The acoustic model then maps those extracted features to phonemes representing the distinct sounds in the language. Finally, the language model takes that sequence of phonemes and transforms it into a sequence of words, thereby forming a full sentence. Like other probabilistic systems, ASR systems can output not just their best guess but also an n-best list of ranked alternate hypotheses.

The language model (LM) has a huge impact on how the audio finally gets transcribed. The LM is essentially a statistical model that predicts the most likely word to follow a given sequence of words. Conversely, it can also be used to score any arbitrary sequence of words and provide a probability measure for that word sequence.

The key thing to note here is that the LM used by an off-the-shelf cloud ASR service is a generic domain-agnostic model that may work well for web searches or general dictation tasks, but may not be best suited for recognizing the kind of language your users might use when conversing with your assistant. This is why these ASR systems often mistranscribe a domain-specific named entity as some other popular term on the web, or simply as a common English phrase. Unfortunately, in most cases, we cannot change or customize the LM used by a black-box ASR service. Therefore, we train our own separate domain-aware LM and use it to pick the best candidate from the different hypotheses in the ASR’s n-best list.

To train our in-domain language model, we need a large corpus of sentences that reflects the kinds of things our users would say to our voice assistant. Luckily, we should already have a dataset of this kind that we use to train our intent and entity detection models in our NLP pipeline. That same data (with some augmentation, if needed) can be repurposed for training the LM. There are many free and open-source language modeling toolkits available, and depending on your corpus size, you can either pick a traditional n-gram-based model or a neural net-based one. In our experience, n-gram LMs trained using the KenLM or SRILM toolkits worked well in practice.

Once we have a trained in-domain LM, we can use it to rescore and rerank the ASR n-best list such that candidates with language patterns similar to those found in our training data are ranked higher. The post-reranking top candidate is treated as the corrected ASR output and used for further downstream processing by our NLP pipeline.
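
Here is a hedged sketch of that rescoring step with a KenLM model; the ARPA file name, the length normalization, and the third hypothesis are assumptions, while the first two hypotheses come from the example discussed below:

    import kenlm

    # An n-gram LM trained on in-domain user queries (file name is illustrative)
    lm = kenlm.Model("webex_assistant_queries.arpa")

    # Illustrative ASR n-best list based on the example in the next paragraph
    asr_nbest = [
        "trying marijuana's emr",
        "join maria joana's pmr",
        "join maria joanna is pm are",
    ]

    def lm_score(sentence):
        # Length-normalized log-probability so longer hypotheses aren't unfairly penalized
        return lm.score(sentence, bos=True, eos=True) / max(len(sentence.split()), 1)

    corrected = max(asr_nbest, key=lm_score)
    print(corrected)  # ideally "join maria joana's pmr"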

NLP Pipeline

The above figure shows this technique in action in Webex Assistant. The original ASR output was trying marijuana’s emr, but after n-best reranking, the corrected output is join maria joana’s pmr, which seems more likely as something a user would say to our voice assistant. The ASR’s LM would have preferred a different top hypothesis originally because trying marijuana is a very popular n-gram on the web, and EMR, which stands for “electronic medical record,” is a more popular term in general than PMR (“personal meeting room”), which only makes sense in an online meeting scenario. But our in-domain LM can pick the right candidate because it would assign higher probabilities to words like join, PMR, and possibly even Maria Joana if we had that name in our training data.

The advantage of this approach is that it isn’t directed at improving any one specific downstream task, but the entire NLP pipeline can benefit from getting to deal with a much cleaner input. This would help with improved accuracy for intent and entity classification as well as entity resolution.

natural language processor

The disadvantage is that this approach introduces one other new model to your overall pipeline that you now have to optimize and maintain in production. There’s also a small latency cost to introducing this additional processing step between your ASR and NLP. Even if you can make all those logistics work, there’s still a limitation to this approach that it cannot make any novel corrections but only choose from the n-best hypotheses provided by the ASR. So there’s a good chance that you’ll need other robustness mechanisms further down the NLP pipeline.

2. Training NLP models with noisy data

The next technique is a really simple one. NLP models are usually trained using clean data, i.e., user query examples that do not have any errors. The idea behind this technique is to spice up our labeled data with some noise so that the training data more closely resembles what the NLP models will encounter at run time. We do this by augmenting our training datasets with queries that contain commonly observed ASR errors.

Training data for intent and entity models augmented with queries containing common ASR errors (in blue)

Intent Classification

Let’s again take the example of Webex Assistant. The intent classification training data for our assistant might have query examples like join meeting, join the meeting, start the meeting, and other similar expressions labeled as the join_meeting intent. Now, if the production application logs show that join the meeting often gets mistranscribed as shine the meeting, or start the meeting often gets confused as shark the meeting, we label those erroneous transcripts as join_meeting as well and add them to our intent classification training data.

We follow a similar approach with our entity recognition model, where we add mistranscriptions like cool tim turtle or video call with dennis toy to our training data and mark the misrecognized entity text (tim turtle, dennis toy, etc.) with the person_name entity label.

If executed correctly, this approach works out really well in practice and improves the real-world accuracy of both the intent classification and entity recognition models. One could argue that you shouldn’t pollute your training data this way, and your model should learn to generalize without resorting to these kinds of tricks. There’s some merit to that argument. You should definitely start with just clean data and experiment with different features and models to see how far you can get. For example, using character-level features like character n-grams or embeddings can make your intent classifier more robust to minor errors like join vs. joint, and a well-trained entity recognizer should be able to recognize benny would as a name (in call benny would now) by relying on the surrounding context words even if the word would is mistranscribed. But there will always be ASR errors that our NLP models won’t be able to handle, and data augmentation of this kind is an effective way to help the model learn better.

Of course, you need to be careful not to go overboard with this approach. If you were to throw in every single way in which an ASR mistranscribes your user queries, that would probably confuse the model more than it would help it. So what we do is only add examples with ASR errors that are really common in our logs. We also only include near-misses where the transcription is slightly off, and don’t include cases where the ASR output has been garbled beyond recognition. Lastly, you need to ensure that you don’t provide conflicting evidence to your NLP models in this process. For instance, the ASR may sometimes misrecognize start the meeting as stop the meeting, but you shouldn’t label stop the meeting as an example for the join_meeting intent. That would introduce a confusion between the join_meeting intent and the end_meeting intent where that example should rightfully belong.

This technique was mainly about improving our intent and entity detection models. But we’ll now turn our focus to entity resolution.

3. ASR-robust entity resolution

Entity resolution, or entity linking, is the task of mapping a detected entity in the user query to a canonical entry in a knowledge base.

entity resolution

In the above example, the person name entity sheryl is resolved to a concrete entity Sheryl Lee who’s a specific employee in the company directory. It’s this resolution step that allows us to correctly fulfill the user’s intent because we now know the right employee to initiate the video call with.

Entity resolution is often modeled as an information retrieval problem. For instance, you can create a knowledge base by using a full-text search engine like Elasticsearch to index all the canonical entities relevant to your application. Then at runtime, you can execute a search query against this knowledge base with the detected entity text and get back a ranked list of matching results.

search accuracy

To improve the search accuracy, and thereby the entity resolution accuracy, there are several features we can experiment with.

Fuzzy Matching

We can encourage partial or fuzzy matching by using features like normalized tokens, character n-grams, word n-grams, and edge n-grams. We can also do simple semantic matching by using a mapping of domain-specific entity synonyms or aliases. Textual similarity features like these are useful for any kind of conversational application regardless of the input modality. But next, we’ll specifically look at additional features that make the entity resolver for a voice assistant more robust to ASR errors.

Phonetic similarity

First, we introduce phonetic similarity because textual similarity alone isn’t enough to deal with ASR errors. For example, when Kiran Prakash’s gets mistranscribed as Corrine precautious, relying purely on text similarity might not help us make the correct match because, at a textual level, these phrases are pretty far apart from each other. But since they sound similar, they should be fairly close in the phonetic space.

One way to encode text into a phonetic representation is by using the double metaphone algorithm. It’s a rule-based algorithm that maps a given word to a phonetic code such that similar sounding words have similar encodings. For words with multiple pronunciations, it provides a primary and a secondary code encoding the two most popular ways to pronounce the word. For example, the name Smith has the double metaphone codes SM0 and XMT, whereas the name Schmidt is represented by the codes XMT and SMT. The similar representations indicate that these two names are phonetically very close.

A more recent approach is to use a machine-learned grapheme-to-phoneme model that generates a sequence of phonemes for a given piece of text. Using this method, Smith is represented by the phoneme sequence S M IH1 TH, whereas Schmidt is represented as SH M IH1 T. Similar sounding words have similar phoneme sequences, and the detailed representations also make it easier to compute the phonetic similarity between words at a more granular level.

In our experiments, we found that these two methods often complement each other. Hence, we use phonetic features derived from both to improve our search.
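
For reference, both representations are easy to compute with open-source packages; this hedged sketch uses the Metaphone and g2p_en libraries from PyPI (other implementations exist), and the expected outputs match the codes quoted above:

    from metaphone import doublemetaphone
    from g2p_en import G2p

    # Rule-based double metaphone codes: similar-sounding names get similar codes
    print(doublemetaphone("Smith"))    # ('SM0', 'XMT')
    print(doublemetaphone("Schmidt"))  # ('XMT', 'SMT')

    # Learned grapheme-to-phoneme model: phoneme sequences for finer-grained comparison
    g2p = G2p()
    print(g2p("Smith"))    # ['S', 'M', 'IH1', 'TH']
    print(g2p("Schmidt"))  # ['SH', 'M', 'IH1', 'T']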

Leveraging the ASR n-best list

One other technique that helps us significantly improve our search recall is leveraging the entire n-best list of hypotheses from the ASR, rather than just its top transcript. We run entity recognition on all the hypotheses and send all of the detected entities in our search query to the knowledge base.

Leveraging the ASR n-best list

On many occasions, the correct entity might even be present a little deeper in the n-best list, like in the above example where the correct name Sheetal was part of the ASR’s third-best guess. Even when that is not the case, pooling the various text and phonetic features across all the hypotheses has the effect of upweighting features which have more consistent evidence throughout the n-best list and downweighting outliers, thereby resulting in a much better overall match.

User-based personalization

The last thing we’ll discuss is using personalization features to improve entity resolution. User-based personalization is something that search engines use to better cater their search results to each user. Similar techniques can help us resolve entities more accurately by leveraging prior information about the user, such as which entities a particular user is more likely to talk about. This is useful for any kind of conversational application, but can especially have a huge impact for voice assistants where there is a larger potential for confusion due to similar-sounding words and ASR errors.

Personalization features tend to be application-specific and depend on the use case at hand. For example, for Webex Assistant, a major use case is being able to call other people in your company. Assuming that in general, you are more likely to call someone you are more familiar with, we can devise a personalization score, which is essentially a measure of a user’s familiarity with others in the company. In other words, for every user, we compute a familiarity score between that user and everyone else in the company directory. This familiarity score considers factors like how far the two people are in the company’s organizational hierarchy and how frequently they interact with each other via calls or online meetings.

familiarity score
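
As a purely illustrative sketch (not the actual Webex Assistant formula), a familiarity score combining those two signals might look like this:

    def familiarity_score(org_distance: int, recent_interactions: int,
                          w_org: float = 0.5, w_interact: float = 0.5) -> float:
        """Toy familiarity measure: being closer in the org chart and having more
        recent calls/meetings both raise the score (weights are illustrative)."""
        org_component = 1.0 / (1 + org_distance)                   # decays with org-chart distance
        interact_component = min(recent_interactions, 20) / 20.0   # capped interaction frequency
        return w_org * org_component + w_interact * interact_component

    # A close teammate you met with 8 times recently vs. a distant, never-contacted employee
    print(familiarity_score(org_distance=1, recent_interactions=8))   # higher
    print(familiarity_score(org_distance=6, recent_interactions=0))   # lower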

We can then leverage this additional personalization score during ranking to help us disambiguate among similar-sounding names in the ASR hypotheses, and pick the right one.

This was just one example for a specific use case, but you can envision similar personalization features for different applications. For a food ordering assistant, you could have a list of restaurants or dishes that a particular user has favorited or ordered a lot recently. For a music discovery app, you can use a list of artists and albums that a particular user likes and listens to more often. And so on.

ASR robustness features in MindMeld

You can employ one or all of the above techniques when building a MindMeld-powered voice assistant:

  • We don’t have native support for building in-domain language models and using them for reranking n-best ASR hypotheses. But you can try this on your own by leveraging the LM toolkits mentioned above and including the reranking as a preprocessing step before calling the MindMeld NLP pipeline. However, we would recommend starting with the other two techniques first since those can be achieved to an extent within MindMeld itself. Furthermore, they may reduce the need for having a separate n-best reranking step at the beginning.
  • Training the NLP models with noisy data merely involves adding query examples with ASR errors to your training data files and then using MindMeld to build your NLP models as usual. Just heed the warnings about not adding too much noise or confusability to your models.
  • There’s some out-of-the-box support for ASR-robust entity resolution in MindMeld, as described in our user guide. You can improve upon this by implementing personalized ranking techniques that are tailored to your specific application. For more details, read our 2019 EMNLP paper on entity resolution for noisy ASR transcripts.

It’s worth emphasizing that anyone who aspires to build a production-quality voice assistant must invest heavily in making their NLP models robust to ASR errors. This can often be the difference between an unusable product and one with a good user experience. MindMeld-powered assistants are extensively used in enterprise environments where tolerance for misunderstanding of voice commands is far lower than in a consumer setting. Robustness to ASR errors is always top-of-mind for us, and we’ll continue to share updates as we make more progress on this front.

About the author

Karthik Raghunathan is the Director of Machine Learning for Webex Intelligence, which is the team responsible for building machine learning-driven intelligent experiences across all of Cisco’s collaboration products. Karthik used to be the Director of Research at MindMeld, a leading AI company that powered conversational interfaces for some of the world’s largest retailers, media companies, government agencies, and automotive manufacturers. MindMeld was acquired by Cisco in May 2017. Karthik has more than 10 years of combined experience working at reputed academic and industry research labs on the problems of speech, natural language processing, and information retrieval. Prior to joining MindMeld, he was a Senior Scientist in the Microsoft AI & Research Group, where he worked on conversational interfaces such as the Cortana digital assistant and voice search on Bing and Xbox. Karthik holds an MS in Computer Science with Distinction in Research in Natural Language Processing from Stanford University. He was co-advised by professors Daniel Jurafsky and Christopher Manning, and his graduate research focused on the problems of Coreference Resolution, Spoken Dialogue Systems, and Statistical Machine Translation. Karthik is a co-inventor on two US patents and has publications in leading AI conferences such as EMNLP, SIGIR, and AAAI.

Click here to learn more about the offerings from Webex and to sign up for a free account. 

Read more
Financial technology concept. FinTech. Foreign exchange.
Building a banking assistant with MindMeld

Abhi Sidhu & Ritvik Shrivastava – In this post, we’ll take a look at our newest blueprint — a conversational assistant for common personal banking use cases.

 


MindMeld provides example applications for common conversational use cases, called MindMeld blueprints, which come with a pre-configured application structure and a pre-built set of code samples and datasets. A blueprint allows you to quickly build and test a fully working conversational app without writing code or collecting training data. If desired, you can then treat the blueprint app as a baseline for improvement and customization by adding data and logic specific to your business or application needs.

In this post, we’ll take a look at our newest blueprint — a conversational assistant for common personal banking use cases.

FinTech Rank
Image Credit: FinTechRanking

Motivation & considerations

Before diving into the details of development, let’s talk about some key contributing factors behind the idea of this app.

Why banking?

With the growing popularity of FinTech, major financial institutions are looking at smarter ways to deliver their services to clients, and conversational IVR, or virtual assistants, is one of the most prominent targets.

The MindMeld platform, widely used for developing robust assistant applications, is well suited to this task. That is the motivation for our new Banking Assistant blueprint: a virtual bank teller that showcases some of the platform’s capabilities.

Value of time in large enterprises

Virtual assistants save employees time. They reduce customer interaction times by cutting the human-hours spent resolving issues that have been solved before. AI-powered solutions are also data-driven and improve over time with continued training, which takes less effort than training and re-training employees for the same tasks.

Data security

For enterprises like banks, customers’ personal data is extremely sensitive. The MindMeld platform offers a significant advantage over cloud-based conversational AI platforms by allowing data to be stored entirely on an organization’s local servers. This makes it attractive for enterprise applications concerned about data privacy and security, since data is never shared with a third party.

Now that we have our motivation, let’s take a look at the development steps.

Building the application

The Banking Assistant allows users to securely access their banking information and complete tasks as if they’re conversing with a teller. Below are some sample conversations for common banking tasks:

Building the application
Sample conversations — Paying credit card bills and reporting stolen/misplaced card.

Design overview

As part of the NLP component of any MindMeld app, we define a set of key use-case domains and the more fine-grained intents within them. The Banking Assistant intents include:

  • Activating a credit card
  • Applying for a loan
  • Transferring money
  • Paying off a credit card bill
  • Activating AutoPay
  • Checking account balances

For the complete description of the app’s architecture and a detailed breakdown of domains, intents, and entities, visit our documentation and refer to the illustration below:

NLP Design Overview for Banking Assistant

Challenges & functionalities

There are a few unique challenges to building a conversational app for a banking firm, which we overcome with some of MindMeld’s built-in functionality.

  • Client authentication through MindMeld
    In our vision for a production application, the frontend would handle user authentication and pass an immutable user token to the MindMeld application layer. This would allow the application to make calls to the bank’s REST APIs to fetch and update the corresponding user’s stored information securely. To demonstrate this, we mimic passing the user tokens of the sample users in our database. When operating the app, you can pass the token for a specific sample user to access only that user’s data and avoid leaking or cross-viewing another user’s information. Find more information on the current set of sample users here and browse the data directory to find the user JSON data file.
MindmeldBankApp
Slot-filling for Money Transferring Intent

 

  • Learning about MindMeld entity roles
    As mentioned earlier, the purpose of blueprints is to exhibit a ready-made app and to allow developers to learn about using the MindMeld platform. This app showcases some unique features mentioned above, as well as some finer details that are really useful. For example, the use of roles in entities. In our banking use cases, the ‘account type’ is a major entity, representing the users’ savings, checking, and credit accounts. While the entity is sufficient by itself, it might not be unique in use cases like money transferring, where two ‘account type’ entities are required. Defining a separate entity just for one use case is also not ideal. Hence, we make use of entity roles. These roles represent the purpose for each use of an entity. Continuing with the same example, there will be two ‘account type’ entities for money transfers: one with the role ‘from account’ and the other ‘to account.’ The use of roles can be extended to a variety of use cases. In the case of a location entity in a travel app, the roles could be ‘departure’ and ‘arrival’ or ‘source’ and ‘destination.’
  • Obtain missing information using slot-filling
    Intents like transferring money or checking account balance require some key information such as account type, account number and amount of money. It’s likely that the user doesn’t provide this information in a single query, and the banking assistant needs to prompt the user for it. Instead of creating a back-and-forth logic to fetch missing information, we make use of MindMeld’s recently released slot-filling or entity-filling feature. We define a slot-filling form for each use case and let the feature prompt the user for this information on our behalf. You can read more about this feature here. A sample conversation using slot-filling for the money transferring intent can be seen here.
  • Querying external storage with dialogue manager
    The current Banking Assistant architecture showcases MindMeld’s support for secure REST APIs by mimicking PUT and GET API calls to retrieve and update information from a local data file. This is done through the Dialogue Manager of the app. This allows for the secure exchange of data and gives users the freedom to connect their REST endpoints and easily expand upon the backend. With this support, it’s easy to modify the underlying data storage as requirements change over time, with minimal design modifications to the app itself. It also allows for updates to the user data through secure API calls.

Code snippets

To give a glimpse of both the dialogue management functionality of the app and the slot-filling feature, here’s a snippet of a dialogue handler. The logic in this function (check_balances_handler) is fairly simple, as you are only expecting one entity: the account type for which the user is checking the balance. If the user does not specify the account type entity, the slot-filling logic will be invoked. You can find an example of a more complex handler function for the Banking Assistant here.

Banking Account code
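Since the handler itself appears above only as an image, here is a minimal, hedged sketch of what a check_balances_handler could look like in the MindMeld dialogue manager style. The get_user_account_balance helper, the context key, and the exact entity and intent names are hypothetical stand-ins; in the blueprint, the prompt for a missing account type is produced by the slot-filling form rather than written inline.

from mindmeld import Application

app = Application(__name__)

def get_user_account_balance(token, account_type):
    """Hypothetical helper: fetch the balance via the bank's REST API or local data file."""
    ...

@app.handle(intent='check_balances')
def check_balances_handler(request, responder):
    # This intent expects a single entity: the account type to check.
    account_entities = [e for e in request.entities if e['type'] == 'account_type']

    if not account_entities:
        # In the blueprint, MindMeld's slot-filling form prompts for this automatically;
        # shown inline here for clarity.
        responder.reply('Sure. Which account: checking, savings, or credit?')
        return

    account_type = account_entities[0]['text']
    token = request.context.get('token')  # immutable user token passed by the frontend
    balance = get_user_account_balance(token, account_type)
    responder.reply(f'Your {account_type} account balance is {balance}.')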

That covers a brief overview of our new Banking Assistant blueprint application! If you would like to try it out, you can find more information here. For help developing your own application, take a look at our documentation.

We welcome every active contribution to our platform. Check us out on GitHub, and send us any questions or suggestions at mindmeld@cisco.com.

About the authors

Ritvik Shrivastava is a Machine Learning Engineer on Cisco’s MindMeld Conversational AI team. He holds an MS in Computer Science from Columbia University, specializing in Machine Learning and Natural Language Processing.

Abhi Sidhu is a Software Engineer at Cisco who specializes in providing practical solutions to emerging technological problems. He holds a BS in Computer Science from Cal Poly San Luis Obispo.

Click here to learn more about the offerings from Webex and to sign up for a free account. 

Read more
3 colleagues sitting around a conference room sharing a report
Are all meeting rooms equal?

Meeting rooms to help teams collaborate effectively

Work has shifted from cubicles to nearly anywhere with a Wi-Fi signal, prompting us to reimagine what the new way of working will look like: a mix of in-office and remote workforces. Distributed workers and in-office employees need a seamless collaboration experience, and technology will help drive this.

Last year, I traveled to a customer site to demo an early prototype of the Webex Room Phone. I met with the IT team in the conference room they typically used for team meetings. It had a BT/USB speaker and sharing hub, with two devices from different vendors. Unfortunately, their solutions did not have a viable way for me to present content and deliver my pitch to both in-person attendees and remote team members. This team was deploying and managing the larger videoconference spaces and board rooms, but they did not have a simple, high-quality solution to help their teams collaborate effectively.

Webex Room Phone— a seamless experience with Webex devices

This is a common theme in many workplaces today. Larger conference rooms and boardrooms are fitted with elegant solutions, but there simply isn’t budget for smaller spaces. As a result, users cannot transition between spaces and collaborate effectively, and IT administrators cannot plan, deploy, and monitor the technology being used. We address these issues and give users the seamless experience they need with our Webex devices.

The Webex Room Phone is one such device. It helps support a modern workforce by providing:

  • Safe and distraction-free meetings with HD audio. The Room Phone utilizes speaker and mic technology that provides 360-degree coverage for 20’x20’ rooms (and 20’x32’ rooms with wired mics). This coverage, paired with echo cancellation and noise reduction capabilities, enables productive meetings with all team members, remote or not.
  • Easy ways to join a meeting. Do not worry about touching the device. Instead, join meetings through proximity with the Webex mobile or desktop app.
  • Simple management and robust analytics. IT admins can use Cisco Webex Control Hub—a single pane of glass for Webex management—to provision new devices, monitor usage, and troubleshoot issues.
  • So much more than just a conference phone. The Webex Room Phone connects to any HD display, so you can easily share content, view in-meeting participant information, and use digital signage capabilities when the device is not being used.

More than a conference phone

At Webex, we believe that less is more, which is why we’ve focused on packaging the Webex Room Phone in a simple, easily deployable way that provides a consistent experience with all other Webex devices (and allows you to scale this to every room you use). Designed for collaboration, touchless meetings, and an intelligent experience, the Webex Room Phone is so much more than just another conference phone—it’s a key element in the future of collaboration.

Sign up here to join the team behind Cisco’s latest Webex device, the Webex Room Phone, on September 14th to learn more about how it can help you scale your Webex device experience.

Flyer: Webex Room Phone, helping you scale your Webex device experience, with profile pictures of David Scott, Jessica Ruffin, Subbu Subramanian, and Anthony Nolasco

Learn More

How team collaboration technology can work together no matter where you are

Returning to work with Intelligent Room Capacity

Make your voice heard with the User Community Feedback Portal


Read more
A conversational AI and scheduling a meeting
Conversational AI at MindMeld

Karthik Raghunathan – Dive deeper into Cisco’s conversational AI platform for deep-domain voice interfaces and chatbots.

MindMeld and conversational AI

The MindMeld team is part of the Webex Intelligence team at Cisco, which develops machine learning-driven, intelligent experiences across Cisco Webex collaboration products.

Before our acquisition by Cisco in 2017, MindMeld was a San Francisco-based AI startup powering intelligent conversational interfaces for several companies in the food, retail, and media industries. Now, we’re bringing that same technology to Cisco’s products to make them smarter, more natural, and easier to use.

Webex Assistant and voice control

Last year we launched Webex Assistant, the first-of-its-kind enterprise voice assistant for the meeting room. With Webex Assistant, customers can use their voice to control their Webex video conferencing devices, review the room’s calendar, join online meetings, call people in their company directory, and much more. While Webex Assistant had its origins as an intelligent assistant for the conference room, we greatly expanded its availability this year by bringing it to our widely popular Webex Meetings software. We also added support for a whole new set of in-meeting voice commands that allow users to create action items, take notes, and even set up future meetings, using just their voice.

Webex Assistant is powered by the MindMeld Conversational AI Platform. We developed this Python-based machine learning framework as a startup, and continue to maintain and improve upon it at Cisco. Teams across Cisco use the MindMeld platform for a wide variety of natural language applications such as chatbots, interactive voice response (IVR) systems, automated FAQ answering, and search. For instance, MindMeld is used for query parsing in both Cisco’s internal enterprise search and the external-facing website search on Cisco.com.

Open-sourced MindMeld conversational AI platform

Following the release of Webex Assistant, we open-sourced the MindMeld Conversational AI Platform. While it’s particularly easy to use MindMeld with other Cisco technologies like Webex, the platform itself is agnostic and can be used to build any kind of conversational interface. As a result, it is now used not only by internal teams at Cisco but also by the wider developer community to build production-quality chatbots and voice assistants.

Given its ease of use and flexibility, MindMeld has been a popular choice at several hackathons, including Cisco’s Smart Spaces Hackathon and the Government of India’s Smart India Hackathon. It was even featured in the winning team’s solution in the 2019 IoT World Hackathon.

Click here to learn more about the offerings from Webex and to sign up for a free account. 

Learn More

To learn more about the MindMeld Conversational AI Platform, check out our website and GitHub repository.

You can also follow the MindMeld team’s blog on Medium where we share regular updates about new MindMeld features, best practices for building conversational interfaces, and other snippets from our ongoing research explorations.

About the author

Karthik Raghunathan is the Director of Machine Learning for Webex Intelligence, which is the team responsible for building machine learning-driven intelligent experiences across all of Cisco’s collaboration products. Karthik was previously the Director of Research at MindMeld, a leading AI company that powered conversational interfaces for some of the world’s largest retailers, media companies, government agencies, and automotive manufacturers. MindMeld was acquired by Cisco in May 2017. Karthik has more than 10 years of combined experience working at reputable academic and industry research labs on the problems of speech, natural language processing, and information retrieval. Prior to joining MindMeld, he was a Senior Scientist in the Microsoft AI & Research Group, where he worked on conversational interfaces such as the Cortana digital assistant and voice search on Bing and Xbox.

Karthik holds an MS in Computer Science with Distinction in Research in Natural Language Processing from Stanford University. He was co-advised by professors Daniel Jurafsky and Christopher Manning, and his graduate research focused on the problems of Coreference Resolution, Spoken Dialogue Systems, and Statistical Machine Translation. Karthik is a co-inventor on two US patents and has publications in leading AI conferences such as EMNLP, SIGIR, and AAAI.

Click here to learn more about the offerings from Webex and to sign up for a free account. 

Read more
3 different imaging of brains with Dr. Don Vaughn
Pay attention now or you might pay for it later

Dr. Don Vaughn is a neuroscientist, an author, a speaker, and an extremely interesting guy to have a conversation with. I recently had the privilege of chatting with Don about the future of our brains, and specifically how we balance the barrage of inbound stimuli that can distract us during the day. As they say, “the struggle is real,” and it is powerful.

Attention is a hot commodity

Prior to the outbreak of the pandemic, studies showed the average person is bombarded with as many as 4,000 messages per day. These messages come from a combination of advertising, marketing, notifications, alerts, emails, calls and more. Some are very passive, blending into the background but building up subtly over time. Others are intrusive and invasive, breaking through the clutter. All of them are intended to invoke a response, and all of them are vying for your attention. It’s like being a parent, working at home and having 4,000 children constantly asking you for something. Okay, that might be an exaggeration to make a point, but to those of us who are parents with young kids these days, it feels like a reality at times.

Your ability to pay attention is one of your most prized resources. Don points out that when you become distracted, it can take an average of 30 minutes to get back into “the groove” you were in. For some people, it can take up to 2 hours, and that delay can crush productivity. Productivity is the word of the day as people try to balance work and life while stuck at home during a pandemic. Companies are looking to the future, both near and long term. In the near term, they’re looking at ways to maintain a high level of productivity. In the long term, they’re looking at a hybrid workplace where teams may remain distributed and need to maintain the ability to collaborate in a flexible working environment. Both situations require tools and platforms that enable interaction and sharing, but also the opportunity for people to focus and pay attention to their work.

Finding productivity in short bursts

graph of how much time to give for productivity

Studies show that short bursts of focus can be extremely productive.  Some of these studies mention 25-minute bursts.  I tend to subscribe to a 60-minute burst with a single small break. These bursts allow you to drop into a groove, maintain your attention and pursue a concept to a point where you feel a logical pause. After the burst you can take a breath, focus and get back into it for a logical conclusion. It becomes important for people to block sections of their week for these bursts of thinking and focus.  I personally lock in 6 hours per week, scattered over different days. These periods can be moved, but never canceled. They ensure I remain productive, even when the rest of my day is spent on video call after video call. During these times, I employ techniques to reduce interruptions and distractions. They include: closing applications, putting on headphones and removing my phone from immediate line of sight. By implementing these little adjustments, productivity is maintained.  Without them, focus is lost and my productivity wanes.

Empathy and human connection

An inability to pay attention diminishes results over time, and productivity with them. You need these bursts of attention to keep you feeling positive. People take pride in their work, and they need to feel they are doing well and producing. Co-workers need to have empathy towards one another and acknowledge that we all need time to focus. Don speaks a lot about empathy and human connection. Video conferences have become the de facto means of maintaining that connection. Video allows you to see the other person and look them in the eye. Video allows body language, tone, and character to enter your conversation. Phone calls are more difficult: it’s easier to be rude when you can’t see the person you’re talking to. Email is worse. Email has no tone; when you read an email, you hear the tone based on your current state of mind. Negativity is more likely to arise from email responses or confrontational phone calls than from video calls. On a video call you can see someone, read their body language, and create empathy in your conversation. You can read the room and adjust your delivery to match. This empathy goes a long way toward fostering a better working environment.

What’s next?

As teams work remotely or find their way back to the office in some capacity, attention and empathy are going to be key to success. Teams need to collaborate. Individual contributors need to be able to focus. Managers need to foster the opportunity for employees to pay attention and not degrade their productivity due to distraction and interruption. And businesses need to understand that technology and flexibility can be employed to create the optimal environment for the future.

Check out the session with Dr. Don Vaughn and hear more about attention, empathy and the future of the workplace.

Learn More

[Webinar] Inside the Brain: Science based methods we can use to increase our productivity

How to prepare for the return to work

How to have a webexceptional video conferencing meeting


Read more
A low angle view on a blue digital key made to resemble a circuit and placed on a surface with encrypted text.
Stay safe – Always authenticate

Richard Barnes – Why authentication always needs to be the first thing you do with something you receive over the Internet.

Authentication lesson

Much like Cisco’s collaboration products, WhatsApp is used by millions of people around the world to communicate and collaborate — and a little while ago, it was discovered that, due to a vulnerability, WhatsApp also allowed anyone on the Internet to take over the phone it was running on. What can developers learn from what went wrong here, to avoid making similar mistakes in other products? The key lessons are:

  • Treat any data you receive from the Internet as potentially hostile 
  • In particular, always use authentication so that you can reject traffic from bad actors 
  • Use memory-safe languages and libraries, especially when handling data from untrusted sources 

Here’s how Facebook described the vulnerability in their advisory:

“A buffer overflow vulnerability in WhatsApp VOIP stack allowed remote code execution via specially crafted series of SRTCP packets sent to a target phone number.”

Let’s unpack this

There are a couple of things to unpack here.  The phrase “remote code execution” hints at the severity of the vulnerability.  By exploiting this vulnerability, an attacker can run any code they want on the victim’s phone.  The really scary phrase, though, is “specially crafted.”  That means that anyone on the Internet could make up some packets, send them to your phone, and take it over.

The core mistake WhatsApp made here was trusting unauthenticated data.  Cryptographic authentication is the way we separate the good guys from the bad guys on the Internet.  Whenever a program or device receives data over the Internet, the very first thing it should do is verify that the data was sent by the entity it thought it was communicating with.  That way, we immediately reject traffic from unknown parties, so the worst that can happen is that the party we’re communicating with sends us bad data; we’ve scaled the risk down from billions of devices to one.

As usual with cryptography, you should use standard tools for this, and most of the standard security tools include authentication.  TLS and its cousin HTTPS are the right tools for most things.  They provide authentication using digital certificates.  For real-time media, the best tool is DTLS-SRTP, which you’ll always be using if you’re using WebRTC.  If you can’t use DTLS-SRTP for some reason, you can fall back to Security Descriptions, which at least ensure that your media packets are from someone who was involved in the call signaling.  With all of these, you should configure your software using an AEAD algorithm such as AES-GCM to make sure that all of your communications are authenticated as well as encrypted.
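As a small illustration of the “authenticate first, with standard tools” advice, here is a minimal Python sketch using the standard ssl module. The default context verifies the server’s certificate and hostname during the handshake, before any application data is parsed; the host is just an example.

import socket
import ssl

HOST = "example.com"  # illustrative peer
context = ssl.create_default_context()  # certificate and hostname verification on by default

with socket.create_connection((HOST, 443)) as raw_sock:
    # The handshake raises ssl.SSLCertVerificationError if the peer cannot prove
    # its identity, so unauthenticated data is rejected before we ever read it.
    with context.wrap_socket(raw_sock, server_hostname=HOST) as tls_sock:
        print("Negotiated cipher:", tls_sock.cipher())  # modern defaults negotiate AEAD suites such as AES-GCM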

In fact, WhatsApp usually uses Cisco software to encrypt and authenticate their real-time traffic!  WhatsApp incorporates the libsrtp open-source library that Cisco maintains.  In this case, though, WhatsApp seems not to have been getting all the protection libsrtp should have offered.  They seem to have done some processing on SRTCP packets before they use libsrtp to authenticate them.  Authentication always needs to be the first thing you do with something you receive over the Internet.

Buffer overflow

A second issue here is indicated by the phrase “buffer overflow.”  This is an ancient class of vulnerability, one that is almost entirely prevented by using more modern languages like Rust, Go, or even Java.  If you’re stuck with C or C++, you should make sure to use defensive coding standards to avoid bad practices, and apply sanitizers and fuzzers to find memory corruption bugs before they turn into vulnerabilities.

It’s never pleasant to see large-scale, high-risk vulnerabilities.  They put people’s data, work, and lives at risk.  But it’s good when these vulnerabilities are found and fixed, and it gives us the opportunity to learn.  This case is a good reminder that we should never trust data from the Internet, and that we should use modern tools to avoid memory corruption.

About the author

Richard Barnes is an IETF appointee to the ISOC Board of Trustees. He is employed as the Chief Security Architect for Collaboration at Cisco. He currently chairs the IETF PERC working group, and is actively involved in working groups across the security and applications areas. Mr. Barnes has been involved in the technical work and management of the IETF for several years. He has served as Area Director for Real-time Applications and Infrastructure (RAI) and co-chair of the IETF ECRIT and GEOPRIV working groups. He is co-author of several RFCs related to geolocation, emergency services, and security, including RFC 6155, RFC 6280, RFC 6394, and RFC 6848. He is also co-author of the book VoIP Emergency Calling: Foundations and Practice (John Wiley and Sons, 2010). Richard has also served as the chair of the RIPE Measurements, Analysis and Tools working group, and on the program committee for the Middle East Network Operators Group (MENOG). Prior to joining Cisco, Mr. Barnes was Firefox Security Lead at Mozilla. In that role, he was responsible for assuring the security of the Firefox web browser. Before joining Mozilla, he was a Principal Investigator at BBN Technologies, leading research activities related to real-time applications and Internet security. He holds a B.A. in Mathematics and an M.S. in Mathematics from the University of Virginia.

Click here to learn more about the offerings from Webex and to sign up for a free account. 

Read more
Caught by the fuzz

Robert Hanton – Learn how Webex uses fuzzing and machine learning as one more way to help prevent security issues.

What is fuzzing?

“Fuzzing” is a security technique that actually goes back to the 1980s. The essential idea is to automatically generate very large numbers of random or semi-random inputs for your system, feed them in, and monitor the system for problems such as crashes, lockups, memory leaks or long delays in processing the data. When a problem is found you then have a discrete and repeatable input that can be used to trigger the problem, diagnose it, and confirm that a code-change has resolved it.
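As a bare-bones illustration of that loop, here is a Python sketch that generates semi-random inputs, feeds them to a hypothetical parse_message function standing in for the system under test, and records any input that triggers a failure so it can be replayed later.

import random
import traceback

def parse_message(data: bytes):
    """Hypothetical target: replace with the real function that ingests untrusted input."""
    ...

def random_input(max_len=256) -> bytes:
    length = random.randint(0, max_len)
    return bytes(random.getrandbits(8) for _ in range(length))

crashing_inputs = []
for _ in range(100_000):
    data = random_input()
    try:
        parse_message(data)
    except Exception:
        crashing_inputs.append(data)  # a discrete, repeatable input that reproduces the problem
        traceback.print_exc()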

 

While any function or module in a codebase can be fuzzed, it is particularly valuable to apply to any unvalidated input the system receives, as these inputs can (accidentally or maliciously) trigger unwanted behaviours. In the case of a collaboration system these can range from the signaling messages in a call flow, to the actual audio and video packets themselves, to control packets such as RTCP (RTP control protocol) that travel alongside the media.

Despite this, fuzzing is perhaps one of the least-utilised tools across the tech industry as a whole, as for a long time setting it up to be of value was regarded as something of a “black art” and the purview of dedicated security experts. There was a time when that was at least somewhat true, but modern fuzzing tools allow for very effective generation of inputs with a minimum of time and training. To understand why this newer generation of fuzzers is so effective, let’s quickly explore how fuzzing used to be done.

Older fuzzing techniques

The challenge of fuzzing has always been the generation of good inputs – the test values fed into the system to attempt to provoke bad behaviour. The very first fuzzers simply generated random data, but in almost all real-world scenarios, random data makes for very ineffective inputs.

To understand why, consider JSON, which has a relatively permissive format. A parser will, however, expect an input to start with “{“. If we are generating inputs of random ASCII, then more than 99% of our inputs will likely be discarded by the very first check in the parser, and the vast majority of the remainder shortly thereafter when they fall foul of other very basic checks.
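A quick experiment makes the point: virtually no random printable strings survive even the JSON parser’s first checks. The exact count will vary from run to run, but it is typically zero or very close to it.

import json
import random
import string

attempts = 100_000
parsed = 0
for _ in range(attempts):
    candidate = ''.join(random.choices(string.printable, k=40))
    try:
        json.loads(candidate)
        parsed += 1
    except ValueError:
        pass  # rejected by the parser's earliest checks

print(f"{parsed} of {attempts} random strings were valid JSON")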

Random inputs are not entirely without value – fuzzing certain binary protocols that have very little space in their format devoted to validation, such as some audio and video codecs, can be effective. But for the vast majority of formats, fuzzing with random inputs is extraordinarily inefficient.

So, to be effective, a fuzzer needs to reliably generate inputs that are at least close to a valid input for the system under test. Traditionally there were two methods for doing this: mutational fuzzing and generation fuzzing. Mutation fuzzing involves taking a number of ‘real’ inputs (often taken from log files or recorded by a packet analyser such as Wireshark) and using them to drive a mutator function. This function would take a valid sample and mutate it in one or more ways by randomly applying a range of rules such as changing bits or characters, duplicating values, removing values and so on.

mutation fuzzing

 

This would result in a large number of inputs for fuzzing that would resemble real-world inputs (and hence not be immediately rejected for violating basic syntax rules) but which might result in internal states that the designer had never contemplated, and hence find crashes, lockups or other issues. A mutational fuzzer could thus be set up relatively quickly if a comprehensive body of real-world inputs was available to seed the mutator. However, skill was involved in picking out a representative sample of real-world inputs, as the mutator would only exercise parts of the format that were reflected in its samples. This was a particular issue when extending a format and adding new functionality, as there wouldn’t be an easily accessible body of data to draw on that included the new syntax.
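Here is a hedged sketch of the mutation approach: start from real, valid samples and apply random edits (bit flips, duplicated slices, deleted bytes) to produce inputs that look plausible but may drive a parser into states its designer never considered. The seed and the mutation rules are purely illustrative.

import random

def mutate(sample: bytes, max_edits: int = 4) -> bytes:
    data = bytearray(sample)
    for _ in range(random.randint(1, max_edits)):
        if not data:
            break
        edit = random.choice(("flip", "duplicate", "delete"))
        i = random.randrange(len(data))
        if edit == "flip":
            data[i] ^= 1 << random.randrange(8)   # flip one bit
        elif edit == "duplicate":
            j = random.randrange(i, len(data)) + 1
            data[i:i] = data[i:j]                 # repeat a slice of the sample
        else:
            data.pop(i)                           # drop a byte
    return bytes(data)

# Seed the mutator with real captured inputs (e.g. from logs or a packet capture).
seeds = [b'{"user": "alice", "amount": 100}']
fuzz_inputs = [mutate(random.choice(seeds)) for _ in range(10_000)]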


mutational fuzzer

 

By contrast, generation fuzzing involves creating a data model describing the syntax that a valid potential input could take. For instance, if seeking to fuzz a SIP parser, you would need a data model defining the SIP protocol (or at least the parts of it that your parser supports). A generator function would then use this to generate a set of inputs, both valid and invalid, based on that data model.

Given a complete data model, generation fuzzing can produce an excellent set of inputs that can thoroughly exercise the system under test. However, producing a complete data model generally involves a considerable investment of time by someone with a deep familiarity with the protocol, and the model must be continually maintained and updated to ensure that any extension is also covered by the fuzzer.
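For contrast, here is a toy sketch of generation fuzzing: a hand-written data model (a tiny recursive grammar, nowhere near a real protocol definition such as SIP) drives a generator that emits mostly valid inputs, with the occasional deliberate violation mixed in.

import random

def gen_value(depth=0):
    kinds = ["number", "string", "object"] if depth < 2 else ["number", "string"]
    kind = random.choice(kinds)
    if kind == "number":
        return str(random.randint(-10**6, 10**6))
    if kind == "string":
        return '"' + "a" * random.randint(0, 50) + '"'
    fields = ", ".join(f'"k{i}": {gen_value(depth + 1)}' for i in range(random.randint(0, 4)))
    return "{" + fields + "}"

def gen_input() -> str:
    doc = gen_value()
    if random.random() < 0.2:                      # occasionally violate the data model on purpose
        doc = doc[:random.randrange(len(doc) + 1)]  # truncate to produce an invalid input
    return doc

samples = [gen_input() for _ in range(10)]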

These barriers of time and skill for the mutation and generation techniques are what contributed to fuzzing being seen as the domain only of dedicated security experts. Companies such as Codenomicon (now part of Synopsys) produced commercial pre-packaged fuzzing tools for well-known protocols such as SIP; these provided turnkey access to high-quality fuzzing for those specific protocols for companies that could afford to license them, but otherwise fuzzing remained a niche tool.

Instrumentation-guided fuzzing with machine learning

However, there is a new generation of fuzzers that can produce high-quality inputs that can exercise the system under test as thoroughly as a generation fuzzer, but can do so automatically and without the need for a predefined data model. They do this by instrumenting the executable under test, detecting what code paths its inputs exercise, and then feeding that data back into input generation to learn to produce new, more effective inputs.

The fuzzer of this type my team uses is American Fuzzy Lop (AFL), but other similar tools exist: other teams in Webex use Clang’s LibFuzzer. These tools instrument the executable under test in a similar way to tools that generate figures for unit-test coverage, inserting hooks for each line or function that detect when that fragment of code is exercised.

This means that when an input is fed into the system under test, the fuzzer can detect what portions of the code that input exercised, and that can be used to assign a fitness to the particular input. Inputs that don’t fit the expected syntax well will be rejected without exercising much code and so will be assigned a lower fitness than one that is a better fit for the expected syntax and hence exercises more code.

With the ability to very accurately assign a fitness to each input it generates, the fuzzer can then learn to generate better and better inputs that exercise more and more of the executable under test. AFL does this through genetic algorithms, a machine learning technique where pseudo-natural selection techniques are used to “breed” new inputs from the fittest of a previous generation.

That means that you just need to give AFL an initial seed input and it will learn to evolve a corpus of inputs that thoroughly exercise your executable under test. Thanks to the instrumentation you can also get real-time feedback on how much of your executable it has managed to explore so far, how many issues it has found, and other key information.

american fuzzy lop

Getting started with instrumentation-guided fuzzing

There are plenty of tutorials out there for AFL, LibFuzzer and other tools, so instead here is a grab-bag of tips and suggestions:

Unless your system is very small, don’t fuzz the entire thing – instead, create a little ‘fuzzable’ executable for each module you want to test that strips it down to the bare minimum that ingests an input, parses/processes it, and exits. The less code there is and the faster it runs, the more generations the fuzzer can run and the more quickly you will get results.
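To make the ‘small fuzzable harness’ idea concrete, here is a hedged Python sketch. AFL and LibFuzzer instrument native executables, so this example uses Atheris, an analogous coverage-guided fuzzer for Python, as a stand-in; my_parser is a hypothetical module whose parse() handles untrusted bytes.

import sys

import atheris

with atheris.instrument_imports():
    import my_parser  # hypothetical module under test

def test_one_input(data: bytes):
    try:
        my_parser.parse(data)
    except ValueError:
        pass  # expected rejection of malformed input; only crashes and hangs count as findings

atheris.Setup(sys.argv, test_one_input)
atheris.Fuzz()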

You can fuzz anything with defined inputs, but focus initially on inputs your system receives from the outside world, particularly those received without any validation from other parts of your overall system. These are some of your most vulnerable attack surfaces, and hence where you really want to find any vulnerabilities.

Fuzz your third-party modules, particularly those that are not pervasively used across the industry. Third-party code has bugs just like first-party code, and just because you didn’t write it doesn’t mean you are not responsible for those bugs if you include the library in your system – your customer won’t care who wrote the code that crashed their system (or worse). Third-party libraries usually have well-defined inputs and hence are highly amenable to fuzzing. If you do find issues don’t forget to push any patches back upstream so the community as a whole can benefit.

While instrumentation-guided fuzzers can produce a fully-compliant input from any arbitrary seed, it can take them quite some time to evolve the basic semantics. You can speed things up significantly by seeding them with real-world input. Similarly, keep your corpus from previous runs and use it to seed the fuzzer when you run it again – that will save a lot of time.

While you’ll get the most benefit from it the first time you run it, consider automating your fuzzing. You can set it up to run periodically or on new changesets and alert if it finds new vulnerabilities introduced by code changes. If so make sure to use the corpus of the previous run to seed the fuzzer, as you want to make the fuzzing process as efficient as possible.

Like any security technique, fuzzing is not a silver bullet for finding every vulnerability your system might have. Using fuzzing does not mean you should not also be using static analysis, threat modeling, and a range of other techniques to make your system as secure as possible. Good security is about defence in depth; fuzzing is one more technique that provides its own unique set of benefits.

About the author

Robert Hanton is a Principal Engineer at Cisco Systems. He has worked in video conferencing ever since he graduated from Cambridge’s Engineering course back in 2005, working in companies with tens of employees, and companies with tens of thousands. He is the primary architect and service owner for some of the key media services in Cisco’s Webex conferencing product, responsible for providing millions of minutes of real-time video and audio every day to users around the world.

Click here to learn more about the offerings from Webex and to sign up for a free account.

Read more
Man using Webex Assistant on Webex Room devices
Top 7 Webex Assistant Device commands to enhance your meetings!

Enhance your meetings with Webex Assistant

As we look to return to the office, the meeting room as you once knew it might be slightly changed. Deploying Webex Assistant will make the return to the office much easier, as it allows users to interact with your Webex Room devices in a whole new way! Webex Assistant can help make calls or even share your screen without your ever touching the device! Let me take you through a few commands that can be useful as you re-enter the workspace.

Useful commands as you re-enter the workspace

1) Start My Meeting – The ability to have what I nickname “No Touch Join” is an amazing feature. If your company has enabled booking of meeting rooms, you can walk into any Webex Assistant-enabled meeting room and, if you are present at the time the meeting starts, Webex Assistant will ask you if you want to join the meeting now! We call this Proactive Meeting Join, and all you have to do is say “Yes” and the meeting will start. If you are running late or want to join early – no problem, you can just say the phrase “OK Webex, Join the Meeting.” The device will take care of the rest and join the meeting for you!

2) Call a colleague – Sometimes you are in a huddle space or focus room, and you just quickly need to call a colleague who might be at home or at their desk. Instead of sending a Webex Teams message, you can just ask the Webex Assistant on the device to call that person directly. To do this, use the following command: “OK Webex, Call Richard Bayes.” This tells the device to search for Richard Bayes in the directory and call their directory number. If this user has Webex Teams or a personal device, it will ring them directly and you can easily have a call without pressing any buttons!

3) Share my screen – Webex Assistant can also easily share your screen into the call or meeting. Just by using the phrase “OK Webex, Share My Screen,” you can start a screen share of any connected source without having to touch the device. Just another way to make your life easier. You can end the screen share as well with “OK Webex, Stop My Screen Share.”

4) End the call – Another useful feature is the ability to end the call without having to touch the device. When you are ready to end the call just say the phrase “OK Webex, End The Call.”

5) Start or Stop a Recording – When you are in a meeting and you want to record it, Webex Assistant is also able to help start and stop the recording with the following phrase, “Ok Webex, Start a Recording.”

6) Turn up or down the Volume – Your colleague is sharing that new awesome marketing video and you want to get the full experience. If the touch panel is 6 ft away or occupied by someone else, no worries: you can just say the following phrase, “OK Webex, Turn up the Volume.” You can easily turn your Webex Room 70D into a party room! No one said work had to be quiet all the time, right?

7) Show the Room Calendar – So you just finished a marathon 2-hour call with Finance and you need some time to digest what was discussed, write some notes, and finish up some messages that you received during the meeting. Instead of being forced to leave the room right away, you can easily ask Webex Assistant if the room is free so you can continue working and be productive. All you need to do is say the following phrase, “OK Webex, Show me the calendar.” Webex Assistant will return a quick summary of the next call and show a detailed list of the rest of the day’s meetings!

Ok webex what can you do?

That’s not all!

These are just some of the ways to get the most out of Webex Assistant, and we are always adding features and more commands to make your life a little easier so you can focus on the important things in the office or at your desk! There’s also Webex Assistant for Webex Meetings, which is the first and only enterprise digital AI meeting assistant on the market. Think Alexa or Siri for the workplace. It also uses voice commands to help improve your productivity and meeting experience. No need to take notes, capture action items, or find meeting controls. Webex Assistant provides real-time meeting transcription, highlights action items, and takes notes.

You can always ask your AI-powered assistant what it can do for you with the following phrase: “OK Webex, What can you do?” This will return a list of different actions you can take!

Please make sure to register for our upcoming webinar, Safely Return to The Office with Webex Rooms, on August 18th, 2020.

Learn More

See all the Webex Meetings updates here

Secure, first-party recording transcripts in Webex Meetings

Webex Meetings June 2020 Update: Transcriptions, Background Blur and Mobile Grid View


Read more
woman standing at her computer and screen sharing
Screen sharing how-to guide: Tips for better real-time collaboration

Tips for better real-time collaboration with screen sharing

Effective collaboration is the holy grail for business: something constantly sought after, but often never reached. It’s not usually the fault of the team members and departments themselves, but more the communication and productivity tools they have.

True collaboration entails much more than a back-and-forth discussion in an email chain that reaches the double digits of replies. It means employees, and more broadly, business units, tightly coordinating with each other on strategy, planning, execution, and follow-up. In reality, companies can’t rely on email or traditional conference calls to support that level of collaboration — especially in real-time.

Screen sharing, however, can be used to power collaboration on all fronts: whether it’s uniting geographically diverse teams on a single video conference so they can meet face-to-face and walk through project outcomes or a quarterly report, or hopping on a quick 5-minute call to review and edit a presentation, screen sharing can greatly benefit teams.

Here’s a guide to using screen sharing for better real-time collaboration and some tips for leveraging its key advantages.

Use video whenever possible

No channel is more effective or efficient for communication than video. While it’s possible to have a conversation over the phone, video fosters deeper engagement and interaction, essential to collaboration. Seeing face to face or making eye contact while sharing your screen ensures your message resonates with everyone in the room.

Consider putting in place a policy that would have remote workers use video by default, or as often as possible. It can be hard to inspire true, real-time collaboration with professionals spread across disparate states and time zones, but video and screen sharing can provide the right conditions. For example, while teams can still utilize chat as much as they like, require that video be used for any kickoff meeting, scheduled biweekly update, or similar milestone meetings.

Ensure teams know how to screen share

The value that screen sharing features can provide won’t materialize if your employees haven’t been trained to use the solution. As you roll out a new tool, take care to hold training and feedback sessions. The second type of outreach is crucial; that way, you can troubleshoot any issues before they become more significant problems with realizing value or return on investment from screen sharing software.

For example, knowing how to pass presenter controls is essential to collaboration on a video call. That way, each person who has something to say or share can do so, knowing that the rest of the attendees are focused on them and able to see the materials.

Enable meeting participants to go mobile

There’s no telling when a collaborative spark might ignite, and that includes when team members are on the road, in transit, or otherwise not in the office setting. A screen sharing solution with a mobile app facilitates collaboration by letting call attendees use their mobile devices to join the meeting and utilize screen-sharing features.

Encourage users to get creative

The beauty of screen sharing is that it can facilitate collaboration in different and creative ways. Encourage your teams to test out screen sharing for all types of meetings to find creative ways to use it. For instance, sales and marketing teams may work together on an interactive presentation in real-time, fine-tuning animations, or dynamic elements before the client meeting.

Have a recording policy

Recording a video meeting is helpful for future reference. Attendees can quickly call up what was said or share the contents with another member who wasn’t there. Recorded video calls help eliminate communication gaps and oversights that negatively impact collaboration. Also, this allows you to track the thought process if you were editing a document in real-time. It’s worth thinking at a high level about a policy for recording videos for preservation purposes.

Last but not least, don’t forget about security

While you want your teams and business units to work closely with one another, it’s crucial to provide a secure platform for doing so. Different security topics you may need to consider include encryption, password management, access controls, cloud infrastructure, hardware security, and data storage. Security is as important as quality when researching software that allows screen sharing.

Ready to try screen sharing? Check out Cisco Webex to learn about our screen-sharing products or get started with a free plan today.

Learn More

Why screen sharing works better for sales than traditional conference calls

4 screen sharing tips to manage a growing business without an office

Personalize your team meetings with these top four screen sharing features


Read more
How to Determine How Effective Your Collaboration Tech Is
How to determine how effective your collaboration tech is

Effective collaborative workplace culture

For a long time, workplace communication was limited to office intercoms and yelling over cubicle walls. As primitive as these approaches sound, they were the standard.

 

Workplaces now enjoy a litany of team collaboration tools developed to engage remote and in-person employees who no longer limit their productivity to a specific workplace or certain office hours. But no matter where employees are, they want high-quality communication and collaboration tools that are convenient to use.

 

How can small business leaders ensure their team collaboration tools continue to meet that demand? By embracing the spirit of those same tools and creating collaborative workplace cultures.

Benefits of collaboration tools

One of the primary benefits of collaboration tools is that ideas can flow freely and go into action at a moment’s notice. When those collaboration tools are added to a collaborative workplace culture, it fosters the kind of environment in which everyone — and every tool — can truly thrive.

 

To get the kind of insights necessary to ensure your collaboration tech is top shelf, you’ll need to start by getting some answers from your team. That way, the tools will fit your employees — not the other way around.

Questions to ask your team

Here are some helpful questions to ask your team:

 

• What tools do you currently use to communicate? Is it a messenger app, or is most of your correspondence via email and other means? You can even open up this question to the communication tools team members use most in their personal lives. Determine how the majority of your office communicates, and it should lead you toward team collaboration tools that engage every employee rather than a select few. You might be surprised by what ideas your team offers.

 

• What’s your preferred mode of remote work? Whether team members are full-time remote employees or just working remotely now and then, they’ll have preferences for how they’re reached. Maybe it’s via phone, conference line, messaging, or video conferencing platform. Use this time surveying employees to find the best remote collaboration tools to foster your collaborative team culture.

 

• What kind of document sharing do you use? More and more platforms offer live editing and commenting features for document sharing. Ask employees how effective those solutions are and how often they leverage them.

 

• How do you track thoughts and take notes? Do your team members put pen to pad and jot down notes the old-fashioned way? Or do they prefer to take notes on their laptops or tablets? What about transcriptions? Do they use their preferred word processors to keep living documents of ideas and insights? Learn their preferences to provide optimal business collaboration tools.

Next steps

You’ll want to ask these questions annually or at least every other year. You can then apply those answers to what’s currently on the market to determine whether you’re getting the best value out of the slate of tools at your disposal. Maybe one — or more — of the team collaboration tools best suited for your team is offered in a stack that you can invest in for the whole staff. That’s far easier and more efficient than using a dozen different solutions from a dozen different providers.

 

Collaborative tools are most effective when they’re easy to use. Learn about the tools that best help your employees communicate, and your business’s ability to produce and collaborate will never falter.

 

Have your own answers to the above questions confirmed your need for new team collaboration tools?

 

Click here to learn more about the offerings from Webex and to sign up for a free account.

Learn More

Delighting remote workers: Why user experience is important

Embracing the rise of remote working

Working smarter: Managing a remote team


Read more