Speech Air

Speech recognition, understanding and conversational interfaces
Alexander Rudnicky School of Computer Science

http://www.cs.cmu.edu/~air
Outline
Speech
Types of speech interfaces
Speech systems and their structure
Designing speech interfaces
Some applications
SpeechWear, Communicator
Speech as a signal
The difference between speech and sound
CD quality vs. intelligible quality
high-quality is 44.1 / 48 kHz sampling
desirable speech bandwidth: 0-8kHz, 16bits
at 16bits/sample: 256kbps (tethered mic)
telephone: 64kbps (and lower)
Compression:
MPEG: 64kbps/channel and up (but not speech-optimal)
CELP: 16kbps to 2.4kbps (optimized for speech)
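The bit rates above follow from sampling rate times sample size. A minimal sketch of that arithmetic, assuming uncompressed PCM, a 16 kHz sampling rate for the 0-8kHz band (twice the bandwidth), and an 8 kHz, 8-bit format for the telephone figure; the function name is illustrative only:

```python
def bit_rate_kbps(sample_rate_hz: int, bits_per_sample: int, channels: int = 1) -> float:
    """Uncompressed PCM bit rate in kbps (1 kbps = 1000 bits per second)."""
    return sample_rate_hz * bits_per_sample * channels / 1000

# A 0-8 kHz bandwidth needs a sampling rate of at least 16 kHz.
wideband = bit_rate_kbps(16_000, 16)

# Assumed telephone format: 8 kHz sampling, 8 bits per sample.
telephone = bit_rate_kbps(8_000, 8)

# Compression ratios implied by the CELP figures, relative to wideband PCM.
celp_ratio_high = wideband / 16    # 16 kbps CELP
celp_ratio_low = wideband / 2.4    # 2.4 kbps CELP

print(wideband, telephone, celp_ratio_high, round(celp_ratio_low, 1))
```

Under these assumptions the two PCM figures come out to 256 and 64 kbps, matching the slide.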
Speech for communication

The difference between speech and language
Speech recognition and speech understanding
Computers and speech

Transcription
dictation, information retrieval
Command and control

data entry, device control, navigation
Information access
airline schedules, stock quotes
Problem solving
travel planning, logistics
Speech system architecture

SIGNAL PROCESSING -> DECODING -> UNDERSTANDING -> DISCOURSE -> ACTION
Varieties of speech systems

Transcription
Command & Control
Information Access
Problem Solving
A generic speech system

speech (input)
Signal processing
Decoder
Parser
Post parser
Dialog manager
Domain agents (three shown)
Language Generator
Speech synthesizer
speech, display, effector (outputs)
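The chain of components above can be sketched as a list of stages that each add to a shared record for one user turn. Everything in this sketch is a hypothetical stand-in: the stage bodies are placeholders (the decoder simply returns a fixed word string), and the names Turn, PIPELINE and run are not from the slides.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Turn:
    """Data accumulated as one utterance moves through the stages."""
    audio: List[float]
    features: List[float] = field(default_factory=list)
    words: List[str] = field(default_factory=list)
    frame: Dict[str, str] = field(default_factory=dict)
    reply: str = ""

def signal_processing(turn: Turn) -> Turn:
    # Placeholder: a real front end computes acoustic feature vectors.
    turn.features = [abs(x) for x in turn.audio]
    return turn

def decoder(turn: Turn) -> Turn:
    # Placeholder: a real decoder searches using acoustic and language models.
    turn.words = "fly to boston".split()
    return turn

def parser(turn: Turn) -> Turn:
    # Toy semantic extraction: the word after "to" is taken as the destination.
    if "to" in turn.words[:-1]:
        turn.frame["destination"] = turn.words[turn.words.index("to") + 1]
    return turn

def dialog_manager(turn: Turn) -> Turn:
    # Choose the next system action from what is known so far.
    if "destination" in turn.frame:
        city = turn.frame["destination"].title()
        turn.reply = f"When would you like to fly to {city}?"
    else:
        turn.reply = "Where would you like to go?"
    return turn

PIPELINE = [signal_processing, decoder, parser, dialog_manager]

def run(audio: List[float]) -> Turn:
    turn = Turn(audio=audio)
    for stage in PIPELINE:
        turn = stage(turn)
    return turn

result = run([0.1, -0.2, 0.05])
print(result.reply)
```

Keeping each stage behind the same interface is one way to let components be replaced independently; the post parser, domain agents and output generators are omitted here for brevity.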
Decoding speech
Signal processing
reduce dimensionality of signal, noise conditioning
Decoder
transcribe speech to words
Acoustic models, Language models
corpus-based statistical models
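One common way to picture how the decoder uses the two models is to score each candidate word string by adding an acoustic log-probability, log P(audio | words), to a language-model log-probability, log P(words), and keep the best total. The sketch below assumes that formulation; the candidate strings and all probabilities are invented for illustration, and a real decoder searches a far larger hypothesis space.

```python
import math

# Invented scores for two candidate transcriptions of the same audio.
candidates = {
    "recognize speech": {
        "acoustic": math.log(0.30),    # log P(audio | words)
        "language": math.log(0.020),   # log P(words)
    },
    "wreck a nice beach": {
        "acoustic": math.log(0.35),
        "language": math.log(0.0001),
    },
}

def total_score(scores: dict, lm_weight: float = 1.0) -> float:
    """Combine the two model scores in the log domain."""
    return scores["acoustic"] + lm_weight * scores["language"]

best = max(candidates, key=lambda words: total_score(candidates[words]))
print(best)
```

Here the second candidate fits the audio slightly better, but the language model makes the first one the more probable word sequence overall.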
Creating models for recognition

Speech data -> Transcribe* -> Train -> Acoustic models
Text data -> Train -> Language models
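As a small illustration of training a language model from text data, the sketch below estimates bigram probabilities by counting. It assumes a maximum-likelihood bigram model with no smoothing, which is only one of many possible model types; the three-sentence corpus and the function name are made up for the example.

```python
from collections import Counter

def train_bigram_model(sentences):
    """Return a function giving P(word | previous) by maximum likelihood."""
    contexts, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        contexts.update(tokens[:-1])              # every token that has a successor
        bigrams.update(zip(tokens, tokens[1:]))   # adjacent word pairs

    def prob(word, previous):
        if contexts[previous] == 0:
            return 0.0                            # unseen context, no smoothing
        return bigrams[(previous, word)] / contexts[previous]

    return prob

corpus = [
    "i would like to fly to boston",
    "i would like to go to boston on friday",
    "show me flights to denver",
]
prob = train_bigram_model(corpus)
print(prob("boston", "to"), prob("would", "i"))
```

Unseen word pairs get probability zero in this sketch, which is why practical systems add smoothing or back-off.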
Understanding speech
Grammar
Ontology design, language acquisition
Parser
Extract semantic content from utterance
Post parser
Introduce context and world knowledge into interpretation
Context
Domain Agents
Grounding, knowledge engineering
Interacting with the user

Task schemas
Task analysis
Context
Dialog manager, Domain agents
Guide interaction through task
Map user inputs and system state into actions
Interact with back-end(s)
Interpret information using domain knowledge
Database
Live data (e.g. Web)
Domain expert
Knowledge engineering
Communicating with the user

Language Generator
Decide what to say to user (and how to phrase it)
Speech synthesizer
Display Generator
Action Generator
Speech recognition and understanding

Sphinx system
speaker-independent, continuous speech, large vocabulary
ATIS system
air travel information retrieval, context management
film clip
Command and control systems

Small vocabularies, fixed syntax
OPEN WINDOW <window_id>
MOVE OBJECT <object_id> to <position>
Applications:
data entry (e.g., zip codes), process control (e.g., electron microscope, darkroom equipment)
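A fixed-syntax command language like the two examples above can be matched with one pattern per command. This is a minimal sketch operating on recognized text, assuming each slot is a single word; the command names and the parse_command helper are hypothetical.

```python
import re

# One pattern per command; slot names follow the examples on the slide.
COMMANDS = [
    ("open_window",
     re.compile(r"^OPEN WINDOW (?P<window_id>\w+)$", re.IGNORECASE)),
    ("move_object",
     re.compile(r"^MOVE OBJECT (?P<object_id>\w+) TO (?P<position>\w+)$", re.IGNORECASE)),
]

def parse_command(utterance: str):
    """Return (command_name, slots) or None if the utterance is out of grammar."""
    text = " ".join(utterance.split())   # normalize whitespace
    for name, pattern in COMMANDS:
        match = pattern.match(text)
        if match:
            return name, match.groupdict()
    return None

print(parse_command("open window three"))
print(parse_command("move object seven to center"))
print(parse_command("please open something"))
```

Anything outside the fixed syntax is rejected outright, which keeps the recognition task small but puts the burden of learning the command language on the user.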
Large vocabulary, fixed syntax

Web browsing (?)
SpeechWear
Vehicle inspection task
USMC mechanics, fixed inspection form
Wearable computer (COTS components)
html-based task representation
film clip
Information access
Moderate to very large vocabulary
IVR and frame-based systems
Commercial systems:
Nuance: http://www.nuance.com/demo/index.html
SpeechWorks: http://www.speechworks.com/demos/demos.htm
lots of others...
IVR and frame-based systems

Interactive voice response (IVR)
interactions specified by a graph (typically a tree)
Frame systems
ergodic graphs
states defined by multi-item forms
Graph-based systems
Welcome to Bank ABC! Please say one of the following: Balance, Hours, Loan, ...
What type of loan are you interested in? Please say one of the following: Mortgage, Car, Personal, ...
. . . .
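The bank prompts above can be represented as a tree whose edges are labeled with keywords. A minimal sketch, assuming keyword spotting on already-recognized text; the state names and the leaf states (balance, hours, mortgage, and so on) are hypothetical and have no prompts of their own here.

```python
IVR_TREE = {
    "root": {
        "prompt": "Welcome to Bank ABC! Please say one of the following: "
                  "Balance, Hours, Loan",
        "next": {"balance": "balance", "hours": "hours", "loan": "loan"},
    },
    "loan": {
        "prompt": "What type of loan are you interested in? Please say one of "
                  "the following: Mortgage, Car, Personal",
        "next": {"mortgage": "mortgage", "car": "car", "personal": "personal"},
    },
}

def step(state: str, utterance: str) -> str:
    """Follow the edge whose keyword was spoken; otherwise stay in place."""
    words = utterance.lower().split()
    node = IVR_TREE.get(state, {})
    for keyword, target in node.get("next", {}).items():
        if keyword in words:
            return target
    return state

def walk(utterances, state: str = "root"):
    """Return the list of states visited for a sequence of user utterances."""
    path = [state]
    for utterance in utterances:
        state = step(state, utterance)
        path.append(state)
    return path

print(walk(["loan", "a car please"]))
```

Because the caller can only move along predefined edges, the system always knows which small set of words to expect next.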
Frame-based systems
I would like to fly to Boston
I'd like to go to Boston on Friday,
Destination_City: Boston
Departure_Date: ______
Departure_Time: ______
Preferred_Airline: ______
. . .
When would you like to fly?
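The form-filling behavior in this example can be sketched as a frame whose empty slots drive the next prompt. The slot names and the prompt "When would you like to fly?" come from the slide; the other prompts, the tiny city and day vocabularies, and the word-spotting logic are assumptions made for illustration.

```python
import re

SLOTS = ["Destination_City", "Departure_Date", "Departure_Time", "Preferred_Airline"]

PROMPTS = {
    "Destination_City": "Where would you like to go?",          # hypothetical
    "Departure_Date": "When would you like to fly?",            # from the slide
    "Departure_Time": "What time would you like to leave?",     # hypothetical
    "Preferred_Airline": "Do you have a preferred airline?",    # hypothetical
}

CITIES = {"boston", "pittsburgh"}
DAYS = {"monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday"}

def update_frame(frame: dict, utterance: str) -> dict:
    """Fill any slots whose values can be spotted in the utterance."""
    for word in re.findall(r"[a-z']+", utterance.lower()):
        if word in CITIES:
            frame["Destination_City"] = word.title()
        elif word in DAYS:
            frame["Departure_Date"] = word.title()
    return frame

def next_prompt(frame: dict):
    """Ask about the first slot that is still empty; None when the form is full."""
    for slot in SLOTS:
        if frame[slot] is None:
            return PROMPTS[slot]
    return None

frame = {slot: None for slot in SLOTS}
update_frame(frame, "I would like to fly to Boston")
first_prompt = next_prompt(frame)

update_frame(frame, "I'd like to go to Boston on Friday")
second_prompt = next_prompt(frame)
print(first_prompt, "|", second_prompt)
```

The user may supply one item or several per utterance; the system only asks about what is still missing.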
Frame-based systems
(Figure: three multi-item forms with placeholder field names)
Transition on keyword or phrase
Some problems
IVR systems work great, but only for well-structured (shallow) tasks
Frame systems are good for tasks that correspond to a single form leading to an action
Neither approach does well with more complex problem-solving activities
Dialog Systems
Problem solving activity; complex task
Order of progression through task depends on user goals (which can change) and system state (a back-end retrieval) and is not predictable.
Track progress and help task along

mixed-initiative dialog
Discourse phenomena
Users expect to converse with the system
Carnegie Mellon Communicator

A dialog system that supports complex problem solving in a travel planning domain
create an itinerary using air schedule, hotel and car information
186 U.S. airports (>140k enplanements/yr)
currently: >500 world airports
Web-based data resources

Live and cached flight information
Airport, airline, etc. information
Value schema/handlers
receptors
transform
value
Domain Agent
Compound schema
Value_1 + Value_2 + Value_3
transform (e.g. SQL query)
value
Domain Agent
Schema ordering
Schema i: Destination airport, Value i
Schema j: Date, Value j
Schema k: Time, Value k
Flight Leg: transform (Database lookup)
Value: Available flights
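The compound-schema idea, where several filled values are combined by a transform such as an SQL query, can be sketched as follows. This is an illustration only, not the Communicator implementation: the table layout, column names, flight rows and the flight_leg_query helper are all invented, and an in-memory SQLite database stands in for the back-end.

```python
import sqlite3

def flight_leg_query(values: dict):
    """Combine the component values into a parameterized query, or return
    None while any component value is still missing."""
    required = ("destination", "date", "time")
    if any(values.get(key) is None for key in required):
        return None
    sql = (
        "SELECT flight_no, depart_time FROM flights "
        "WHERE destination = ? AND depart_date = ? AND depart_time >= ? "
        "ORDER BY depart_time"
    )
    return sql, (values["destination"], values["date"], values["time"])

# Invented back-end data for the example.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE flights "
    "(flight_no TEXT, destination TEXT, depart_date TEXT, depart_time TEXT)"
)
conn.executemany(
    "INSERT INTO flights VALUES (?, ?, ?, ?)",
    [
        ("XX100", "BOS", "2000-01-07", "08:15"),
        ("XX200", "BOS", "2000-01-07", "17:40"),
        ("XX300", "DEN", "2000-01-07", "09:00"),
    ],
)

query = flight_leg_query({"destination": "BOS", "date": "2000-01-07", "time": "12:00"})
available = conn.execute(*query).fetchall()
print(available)
```

The transform runs only once all of its component values are available, which is what lets the ordering of the component schemas vary with the dialog.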
Carnegie Mellon Communicator

CMU Communicator
Call: 268-5144
the information is accurate; you can use it for your own travel planning...
User-aware speech interfaces

Predictable behavior on the system's part
Users communicate at different levels
http://www.speech.cs.cmu.edu/air/papers/InterfaceChars.html
User-aware speech interfaces

Content: task-centric utterances
Possibility: What can I do?
Orientation: Where are we?
Navigation: moving through the task space
Control: verbose/terse, listen!
Customization: define this word
Speech interface guidelines

Speech recognition is errorful
System state is often opaque to the user
http://www.speech.cs.cmu.edu/air/papers/SpInGuidelines/SpInGuidelines.html
Interface guidelines
State transparency
Input control
Error recovery
Error detection
Error correction
Log performance
Application integration
Summary
Speech and language communication
Dialog structure
Interface design
