Outline
Speech
Types of speech interfaces
Speech systems and their structure
Designing speech interfaces
Some applications
SpeechWear Communicator
Speech as a signal
The difference between speech and sound
CD quality vs. intelligible quality
high-quality audio is 44.1 / 48 kHz
desirable speech bandwidth: 0-8 kHz, 16 bits
at 16 bits/sample: 256 kbps (tethered mic)
telephone: 64 kbps (and lower)
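The bit rates above follow from sample rate times bits per sample. A minimal sketch of that arithmetic, assuming the 0-8 kHz bandwidth is sampled at 16 kHz (twice the bandwidth) and assuming telephone audio is 8 kHz at 8 bits per sample; neither sampling figure is stated on the slide, and the function name is made up for illustration:

```python
def bit_rate_kbps(sample_rate_hz, bits_per_sample, channels=1):
    """Uncompressed PCM bit rate in kbps (1 kbps = 1000 bits/s)."""
    return sample_rate_hz * bits_per_sample * channels / 1000

# Assumption: 0-8 kHz speech bandwidth sampled at 16 kHz, 16 bits/sample.
wideband = bit_rate_kbps(16_000, 16)

# Assumption: telephone speech at 8 kHz, 8 bits/sample.
telephone = bit_rate_kbps(8_000, 8)

# CD audio for comparison: 44.1 kHz, 16 bits, two channels.
cd_stereo = bit_rate_kbps(44_100, 16, channels=2)

print(wideband, telephone, cd_stereo)
```

Under these assumptions the first two values match the 256 kbps and 64 kbps figures on the slide.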
Compression:
MPEG: 64 kbps/channel and up (but not speech-optimal)
CELP: 16 kbps down to 2.4 kbps (optimized for speech)
Information access
airline schedules, stock quotes
Problem solving
travel planning, logistics
[Diagram: speech -> Signal processing -> Decoder -> Parser -> Post parser -> display, effector]
Decoding speech
Signal processing: reduce dimensionality of signal, noise conditioning
Decoder: transcribe speech to words
Acoustic models
Language models
[Diagram: Transcribe* -> Train; Text data -> Train -> Language models]
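As an illustration of training a language model from text data, here is a minimal sketch using a bigram model with maximum-likelihood estimates. The slides do not say which kind of language model is used; the bigram choice, the function name, and the two-sentence corpus (borrowed from the frame-based example utterances) are assumptions for illustration only.

```python
from collections import Counter, defaultdict

def train_bigram_model(sentences):
    """Estimate P(word | previous word) from raw text by relative frequency."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        # <s> and </s> mark sentence start and end.
        words = ["<s>"] + sentence.lower().split() + ["</s>"]
        for prev, cur in zip(words, words[1:]):
            counts[prev][cur] += 1
    model = {}
    for prev, following in counts.items():
        total = sum(following.values())
        model[prev] = {w: c / total for w, c in following.items()}
    return model

# Toy corpus, illustrative only.
corpus = [
    "i would like to fly to boston",
    "i would like to go to boston on friday",
]
model = train_bigram_model(corpus)
print(model["to"])
```

A real system would train on far more text and smooth the estimates so that unseen word pairs do not get zero probability.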
Understanding speech
Grammar
Parser
Post parser
Context
Domain Agents
Task analysis
Context
Guide interaction through task
Map user inputs and system state into actions
Interact with back-end(s)
Interpret information using domain knowledge
Database
Domain expert
Knowledge engineering
ATIS system
air travel information retrieval
context management
film clip
SpeechWear
Vehicle inspection task
USMC mechanics, fixed inspection form
Wearable computer (COTS components)
HTML-based task representation
film clip
Information access
Moderate to very large vocabulary
IVR and frame-based systems
Commercial systems:
Nuance: http://www.nuance.com/demo/index.html
SpeechWorks: http://www.speechworks.com/demos/demos.htm
many others
Frame systems
ergodic graphs
states defined by multi-item forms
Graph-based systems
Welcome to Bank ABC! Please say one of the following: Balance, Hours, Loan, ...
What type of loan are you interested in? Please say one of the following: Mortgage, Car, Personal, ...
. . . .
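The menu dialog above can be sketched as a directed graph in which each state carries a prompt and each recognized keyword labels an edge to the next state. This is a minimal sketch, not how any commercial IVR product is implemented; the state names, the `MENU` table, and the re-prompt-on-failure behavior are assumptions.

```python
# Each state has a prompt and keyword-labelled edges to other states.
# States missing from MENU (e.g. "balance") are treated as leaves.
MENU = {
    "welcome": {
        "prompt": "Welcome to Bank ABC! Please say one of the following: "
                  "Balance, Hours, Loan, ...",
        "next": {"balance": "balance", "hours": "hours", "loan": "loan"},
    },
    "loan": {
        "prompt": "What type of loan are you interested in? Please say one "
                  "of the following: Mortgage, Car, Personal, ...",
        "next": {"mortgage": "mortgage", "car": "car", "personal": "personal"},
    },
}

def step(state, utterance):
    """Follow the edge for the recognized word; stay put (re-prompt) otherwise."""
    node = MENU.get(state)
    if node is None:
        return state  # leaf state: nothing further to ask
    return node["next"].get(utterance.strip().lower(), state)

def run(utterances, start="welcome"):
    """Feed a sequence of recognized words through the graph."""
    state = start
    for utterance in utterances:
        state = step(state, utterance)
    return state

print(run(["Loan", "Car"]))
```

The user can only say what the current state offers, which is why this style suits shallow, well-structured tasks.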
Frame-based systems
I would like to fly to Boston
I'd like to go to Boston on Friday,
Destination_City: Boston
Departure_Date: ______
Departure_Time: ______
Preferred_Airline: ______
. . .
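The frame above can be sketched as a dictionary of slots that each utterance may fill in any order, with the system then asking about whatever is still empty. This is a minimal sketch under strong assumptions: the keyword spotting ("to" plus a city, "on" plus a day) stands in for a real parser, and the `CITIES` and `DAYS` vocabularies and all function names are hypothetical.

```python
import re

SLOTS = ["Destination_City", "Departure_Date", "Departure_Time", "Preferred_Airline"]

# Hypothetical toy vocabularies; a real system would use a grammar and parser.
CITIES = {"boston"}
DAYS = {"monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday"}

def new_frame():
    """An empty frame: every slot unfilled."""
    return {slot: None for slot in SLOTS}

def update_frame(frame, utterance):
    """Fill any slots the utterance mentions, leaving the others untouched."""
    words = re.findall(r"[a-z']+", utterance.lower())
    for prev, word in zip(words, words[1:]):
        if prev == "to" and word in CITIES:
            frame["Destination_City"] = word.capitalize()
        elif prev == "on" and word in DAYS:
            frame["Departure_Date"] = word.capitalize()
    return frame

def missing_slots(frame):
    """Slots the system still has to ask about."""
    return [slot for slot in SLOTS if frame[slot] is None]

first = update_frame(new_frame(), "I would like to fly to Boston")
second = update_frame(new_frame(), "I'd like to go to Boston on Friday,")
print(first)
print(missing_slots(second))
```

Unlike the graph-based menu, the user decides how many slots to fill per turn; the system only tracks what remains open.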
Frame-based systems
Zxfgdh_dxab: _____
askjs: _____
dhe: _____
aa_hgjs_aa: _____
. .
Zxfgdh_dxab: _____
askjs: _____
dhe: _____
aa_hgjs_aa: _____
. .
Some problems
IVR systems work great, but only for well-structured (shallow) tasks
Frame systems are good for tasks that correspond to a single form leading to an action
Neither approach does well with more complex problem-solving activities
Dialog Systems
Problem solving activity; complex task
Order of progression through task depends on user goals (which can change) and system state (a back-end retrieval) and is not predictable.
Discourse phenomena
Users expect to converse with the system
Value schema/handlers
receptors
transform
value
Domain Agent
Compound schema
Value_1 Value_2 Value_3 +
transform
e.g. SQL query
value
Domain Agent
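One way to read the compound schema is that several filled values are combined by a transform into a back-end request (the slide gives an SQL query as an example), which a domain agent then executes to produce the compound value. The sketch below follows that reading. The table layout, column names, flight numbers, and function names are all made up for illustration and do not come from the slides.

```python
import sqlite3

def compound_transform(destination, date, time):
    """Combine three filled values into a parameterized SQL query.

    The 'flights' table and its columns are hypothetical.
    """
    sql = ("SELECT flight_no FROM flights "
           "WHERE destination = ? AND dep_date = ? AND dep_time >= ? "
           "ORDER BY dep_time")
    return sql, (destination, date, time)

def domain_agent(conn, sql, params):
    """Stand-in for the domain agent: run the query, return the result value."""
    return [row[0] for row in conn.execute(sql, params)]

# Toy in-memory back-end with invented rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE flights "
             "(flight_no TEXT, destination TEXT, dep_date TEXT, dep_time TEXT)")
conn.executemany("INSERT INTO flights VALUES (?, ?, ?, ?)", [
    ("XX100", "BOS", "2000-01-07", "08:00"),
    ("XX200", "BOS", "2000-01-07", "17:30"),
    ("XX300", "PIT", "2000-01-07", "09:15"),
])

sql, params = compound_transform("BOS", "2000-01-07", "09:00")
available = domain_agent(conn, sql, params)
print(available)
```

Passing the values as query parameters rather than formatting them into the SQL string keeps recognized speech from being interpreted as SQL.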
Schema ordering
[Diagram: Flight Leg combines Schema i (Destination airport) -> Value i, Schema j (Date) -> Value j, Schema k (Time) -> Value k; resulting Value: Available flights]
Interface guidelines
State transparency
Input control
Error recovery
Error detection
Error correction
Log performance
Application integration
Summary
Speech and language communication
Dialog structure
Interface design