
Future Human Interfaces to Computer Controlled Sound Systems

Craig Rosenberg, University of Washington, Seattle, WA, USA
Bob Moses, Rane Corporation, Mukilteo, WA 98275, USA

Preprint 3735 (A2-PM-3)

Presented at the 95th Convention, 1993 October 7-10, New York

This preprint has been reproduced from the author's advance manuscript, without editing, corrections or consideration by the Review Board. The AES takes no responsibility for the contents. Additional preprints may be obtained by sending request and remittance to the Audio Engineering Society, 60 East 42nd St., New York, New York 10165-2520, USA. All rights reserved. Reproduction of this preprint, or any portion thereof, is not permitted without direct permission from the Journal of the Audio Engineering Society.

AN AUDIO ENGINEERING SOCIETY PREPRINT

Future Human Interfaces To Computer Controlled Sound Systems


Craig Rosenberg, Human Factors Lab, University of Washington
Bob Moses, Rane Corporation

Abstract
Computer controlled sound systems are among the most active research topics within the AES community. We are entering an age of interoperability, in which devices and the human operator work as a cohesive team. This paper examines human-machine interface issues pertaining to computer controlled sound systems. Traditional human interfaces are analyzed, and found inappropriate for computer controlled equipment. New human interface technologies are presented, such as spatial position tracking, eye tracking, tactile feedback, and head mounted displays. We describe how these technologies work, and their applications in sound system control and musical performance.

Introduction

Sound systems are rapidly incorporating advanced forms of computer control. Since the early 1980's, a number of computer control systems have flourished: MIDI, IQ, MediaLink, PA422, MindNet, and others. The proliferation of these computer control systems is fueled by a number of technological advances: the personal computer, digital signal processing, and local area networking. Of course, audio is not the only field to benefit from these technologies. The "information age" is upon us, and is changing almost everything in our daily lives: how we bank, how we shop, how we communicate, how we entertain, and so on. As information technologies expand, traditional paradigms for human-machine interaction are stretched beyond their limits. How does a person, for example, effectively manipulate a spreadsheet with millions of cells of information? Or, alternatively, how does a person operate a large distributed sound system in real time with hundreds of channels of audio and nearly a thousand audio signal processing functions? Human factors researchers are studying these questions and devising new human-machine interface techniques and technologies. Some of these technologies have become components of "virtual reality" (VR) systems, while others live a more humble life in less "trendy" applications. This paper provides a review of human interface technologies and their possible applications in computer controlled sound systems.

The Boon of Digital Communications

The primary component of computer controlled systems is a communications channel between devices in the system. This communications channel can take the form of a master-slave bus, or a peer-to-peer local area network. These systems provide at least two valuable benefits:

Remote Control. Computer controlled systems allow an operator to control devices from a remote location, through the bus or network. As a result, equipment can be distributed or centralized--whichever is most convenient. System operation is simplified since a single operator has access to all devices in the system. Non-human operators (i.e. computers) can take over many of the routine tasks, such as watching clip indicators and VU meters and adjusting the levels appropriately. This frees the human operator to concentrate on more of the creative and fun tasks. Few would argue that remote control is not highly desired--just look at the average American coffee table!

Interoperability. Computer controlled systems based on local area networks (and to a certain degree, buses) provide an architecture of interconnected devices. These devices have the opportunity to interact, share resources, and work together as a team rather than a collection of autonomous entities. This capability has not yet been fully exploited by any of the computer control systems in the industry. In the future, DSP processing modules will share their CPUs to allow flexible (and powerful) distributed parallel processing. Controls on one device might be mapped to other functions in other devices (e.g. an amplifier volume control might control an equalizer to dial in a "loudness" curve as level is turned down), and so on (a sketch of such a mapping appears below). Interoperability will have a profound impact on the performance and the flexibility of systems, much more so than we are aware of today.

In this paper, we present a third important benefit of computer controlled systems: the opportunity to implement an improved human interface to the system. New human interface techniques can be implemented within personal computers, or in dedicated hardware designed to interact directly with a person and their senses. New human interfaces have the opportunity to be more intuitive, natural, efficient, and fun. We will explain how, later in this paper. But first, it is instructive to examine some traditional human interfaces to gain perspective.
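Before turning to traditional interfaces, here is a minimal sketch of the control-mapping idea mentioned under Interoperability above: as an amplifier's level control is turned down, an equalizer's shelving bands are boosted to approximate an equal-loudness correction. The device names, the mapping curve, and the send_parameter() call are hypothetical stand-ins; a real system would carry these messages over its control network.

```python
# Hypothetical sketch: map an amplifier level change to "loudness" EQ boosts
# on a networked equalizer. Curve values are illustrative, not calibrated.

def loudness_curve(level_db: float) -> tuple[float, float]:
    """Return (low_shelf_boost_db, high_shelf_boost_db) for a given master level."""
    attenuation = max(0.0, -level_db)          # how far below reference (0 dB) we are
    low_boost = min(12.0, 0.4 * attenuation)   # up to +12 dB of bass compensation
    high_boost = min(6.0, 0.15 * attenuation)  # milder treble compensation
    return low_boost, high_boost

class NetworkedEqualizer:
    """Stand-in for an equalizer reachable over the control network."""
    def send_parameter(self, name: str, value_db: float) -> None:
        print(f"EQ <- {name} = {value_db:+.1f} dB")

def on_amp_level_changed(level_db: float, eq: NetworkedEqualizer) -> None:
    low, high = loudness_curve(level_db)
    eq.send_parameter("low_shelf_gain", low)
    eq.send_parameter("high_shelf_gain", high)

if __name__ == "__main__":
    eq = NetworkedEqualizer()
    for level in (0.0, -10.0, -20.0, -40.0):
        print(f"amp level {level:+.0f} dB:")
        on_amp_level_changed(level, eq)
```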

Traditional Human Interfaces

Analog Controls

Most traditional analog-based controls interact with the human through potentiometers, switches, etc., that are connected directly to the audio signal path in the device. This is very straightforward, but not very flexible. Devices with a large number of controllable functions, such as a mixer, are necessarily large and heavy. The moving parts involved with analog controls are often inaccurate, unreliable, and expensive to automate. To compensate for these shortcomings, manufacturers are now introducing digitally-controlled analog devices which omit the use of moving parts and place functions under the control of a microprocessor.

Touch Keypads

Digitally-controlled analog (and fully digital) devices often interact with the human operator via a touch keypad. Keypads can either be alphanumeric or function-based. Alphanumeric keypads (keyboards) are used with personal computers. The user types commands into the computer, which are then processed and routed to the devices under computer control. Function-based keypads are often found on sound equipment and provide a higher level interface to the device's internal functions. Typical functions include: edit, store, recall, and so on. Touch keypads usually provide an improvement in reliability, accuracy, and size over analog controls, but are frequently more expensive. Touch keypads can provide a simpler, improved human interface to a device, but often a user is left wondering which keys to hit to evoke a desired action. More importantly, many sound engineers are accustomed to interacting with certain devices via traditional analog controls, for example, a graphic equalizer. For these reasons, it should be noted that a keypad does not necessarily provide a more intuitive or natural means of controlling the device than physical knobs and sliders.

Alphanumeric Displays

Alphanumeric displays come in two common flavors: Light Emitting Diodes (LED) and Liquid Crystal Displays (LCD). LED displays are usually small, containing less than ten 7-segment style digits. LCD displays typically provide 32 or more dot matrix characters. LED displays can present simple numeric readouts of a device's parameters. LCD displays communicate with a user through written language, and are very flexible. The primary disadvantage of these displays is that they force the user to translate all meaning from written representations. As one can imagine, the relative positions of faders on a mixer can provide a much better representation of "the mix" than a row of numbers on an alphanumeric display.

Problems With Traditional Human Interfaces

Analog controls, alphanumeric displays, and touch keypads all have limitations, as discussed in the previous sections. In general, none of these interface components provide an economical, intuitive, and flexible interface to the system. Often, the user is left with an interface that is incapable of translating creative intentions into the proper commands that the system is able to understand. Another frequent problem is that the user is not able to effectively understand the outputs that the system is trying to communicate. For these reasons (and others), the user may not be able to accurately control or understand the system. This often results in operator errors and even damage to the system.

Searching for the Ideal Human Interface to Sound Systems

A variety of human interface equipment exists that couples a human operator with computer systems. This equipment can be classified into two major groups: input devices (presenting human actions to the computer), and output devices (presenting information from the computer to the human).

One aspect of human factors research investigates the design and use of computer interface equipment. The field of "virtual reality" is fueled by advances within the field of human factors. It is the physical interface equipment in combination with graphical user interface software that can empower people within a virtual environment. In the following section, advanced computer interface techniques and their associated issues and applications are presented and discussed.

The Human Factors of Advanced Human Interfaces

In this paper, the human factors research areas are divided into four sections: user input and computer recognition, tactile and force feedback, visual display systems, and hearing and spatial sound. Issues are presented along with possible applications of the technologies toward the goal of improving the human interface to computer controlled sound systems.

User Input and Computer Recognition

User input refers to the computer being able to recognize the actions and intentions of the user. A device that recognizes some form of human expression, and translates that input into numerical data that the computer can understand, is sometimes called a behavioral transducer. When the user invokes an action that the system is able to recognize (for example: turns a knob or moves a fader), the system may be equipped to recognize and respond to the user's input. The greater the degree of communication from the user to the computer system, the greater the capability of the system to respond to the user's intentions. However, the bandwidth of control, alone, is not enough to ensure a powerful and intuitive user interface. The methods of control, and the mappings between the user's actual inputs and their meaning to the system, are of great importance. There are many forms of user input available to designers of advanced systems. A variety of new and unique input devices have been developed specifically for advanced human-computer interaction. The following sections detail many of these advanced user interface techniques and present the issues associated with using them. In the final section, the applicability of these new interface tools within sound systems is discussed.

Voice Input

Spoken words are the most common form of communication between human beings. Speech is also the most rapid form of natural communication. To take advantage of these human capabilities, voice recognition systems have been developed to recognize spoken words. There are several aspects that characterize the performance of voice recognition systems. Voice recognition systems are either speaker dependent or speaker independent. Anyone can use a speaker independent system without having to train the system. Speaker dependent systems must be trained to recognize the unique voice of the particular speaker using the system. Another variable that characterizes voice recognition systems is the size of the vocabulary (the number of words) that the system is able to recognize. The greater the vocabulary of the system, the more flexible the system is in recognizing spoken input. Words can be recognized and assigned to commands that the system is able to execute. Words can also be used as modifiers to commands. In this way, the operator of the system is able to speak directly to the program to issue commands to the system. Some systems are able to understand continuously spoken words as opposed to only discretely spoken words. Systems that recognize continuous speech are far more flexible and easier to use than discrete systems. This is because people speak continuously, as opposed to uttering discrete words interspersed by silence. Discrete voice recognition systems are therefore laborious to use. There are still technical problems associated with continuous word recognition; however, there is significant research being done in this area. The maximum benefit associated with voice input will be attained from speaker independent, continuous word, voice recognition systems.

Eye Tracking

Eye tracking is another form of computer input that can be employed to recognize intention from the user. Eye tracking involves recognizing where the user is looking at any given instant and transmitting that information to the computer. Eye tracking hardware is available which can transmit to the computer precise indications of where a person is looking at any given instant. Eye tracking technology is conceptually simple. An infrared light beam is aimed at the cornea of the eye, while a small TV camera monitors the eye. By tracking the position of the pupil with respect to the fixed position of the reflected infrared light, the computer is able to compute the instantaneous direction of gaze of the user's eye. A possible example of eye tracking within the control of sound systems involves a computer display of several components within a system. The user's eye could be tracked to determine which component of the system is being watched at any given moment. Dwell (the time elapsed while staring at the same point on the screen) can effectively be used to select devices or options within a device. To select a component you just look at it for a brief moment. After the device is selected, you could select a control within the device using your eyes. After the control on the device is selected, you could look up or down to adjust the level of the control under your visual command.

Gesture Recognition

We use our hands constantly for almost all physical tasks we do in the world around us. Why not use our hands for input to the computer as well? There are a variety of glove-like devices that turn the hand into an input device to a computer. Gloves work by measuring the bend angle of several joints of each finger. By recognizing certain combinations of bend angles, the computer is able to recognize a gesture (i.e. the "peace" sign, the "thumbs-up" sign, or the "ok" sign). The computer is also able to compare positions of bend angles over time to deduce a moving gesture (like "let-the-fingers-do-the-walking"). Various technologies such as fiber optics, mechanical joints, strain gauges, and Hall effect sensors are used to measure the bending angles of the joints of the fingers. Glove input devices are being used for advanced and intuitive human-computer interaction, telerobotics applications, sign language interpretation, and hand injury evaluation. An application using a glove interface to a sound system is discussed in the last section.

Spatial Tracking

Several types of devices have been invented that are able to track the location and orientation of one object with respect to another object. In Cartesian space these devices track the x, y, and z position and the x, y, and z orientation (sometimes called yaw, pitch, and roll). These devices typically use magnetic induction, ultrasonic sound, mechanical, inertial, or optical tracking to determine spatial location and orientation with respect to some origin. Systems that accomplish spatial tracking by means of magnetic induction have a transmitter with three internal orthogonal wire coils that induce a current through a receiver (also with three orthogonal wire coils) that is proportional to the component distances and the relative angles between the receiver and the transmitter. Mechanical methods rely on directly connected mechanical linkages between a stable origin and the mobile object that is being tracked. Some mechanical trackers also incorporate optically encoded disks at bend positions, which are very accurate. Inertial systems use gyroscopes and accelerometers, but are susceptible to drift and need to be recalibrated. Optical methods usually rely on cameras, light emitting diodes, and image recognition software to accomplish the spatial tracking task.
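A minimal sketch of how a six dimensional tracker sample (x, y, z position plus yaw, pitch, and roll orientation) might be represented, and how a pointing or gaze direction can be derived from the orientation angles. The TrackerSample fields and the axis conventions are assumptions for illustration, not any particular tracker's data format.

```python
# Assumed representation of one 6-DOF tracker sample and a derived pointing vector.
import math
from dataclasses import dataclass

@dataclass
class TrackerSample:
    x: float          # meters, relative to the tracker's origin
    y: float
    z: float
    yaw: float        # radians, rotation about the vertical axis
    pitch: float      # radians, rotation about the side-to-side axis
    roll: float       # radians, rotation about the pointing axis

def pointing_vector(sample: TrackerSample) -> tuple[float, float, float]:
    """Unit vector in the direction the tracked object is pointing
    (roll does not change the pointing direction itself)."""
    dx = math.cos(sample.pitch) * math.cos(sample.yaw)
    dy = math.cos(sample.pitch) * math.sin(sample.yaw)
    dz = math.sin(sample.pitch)
    return dx, dy, dz

if __name__ == "__main__":
    head = TrackerSample(x=0.0, y=0.0, z=1.7,
                         yaw=math.radians(30), pitch=math.radians(-10), roll=0.0)
    print("gaze direction:", pointing_vector(head))
```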

Six dimensional spatial trackers are characterized by the accuracy with which they can collect spatial data (the resolution of the tracker). The repeatability refers to the degree to which the tracker drifts over time. The latency refers to the timeliness of the data. Lastly, the data rate refers to the number of positions per second that can be sent from the tracker to the computer. Some trackers have the capability to track multiple sources concurrently. Six dimensional spatial tracking is necessary for virtual reality applications, as the computer must always know where your head and hand are in three dimensions as well as which direction they are pointing, measured in three dimensions. Applications of spatial tracking within sound systems are discussed in the last section.

The Haptic Channel

Skin is the largest sensory system, with a surface area of about 2 square meters. Mechanoreception refers to the neural events and sensations that result from mechanical displacement of cutaneous (skin) tissue. This includes repetitive displacements such as vibration, single displacements such as touch and pressure, and tangential movement of a stimulus along the surface of the skin. Tactile stimulation in the hands results from information being passed to the spinal cord through two principal nerve tracks, the Median and Ulnar nerves. The Median nerve covers the majority of the palm, all of the thumb, index and middle fingers, and half of the fourth finger. The Ulnar nerve covers the remainder of the palm, half the fourth finger, and the pinkie. These two nerve fibers contain four nerve types: slowly adapting fibers, rapidly adapting fibers, punctuate fibers, and diffuse fibers. Slowly adapting fibers respond as something touches the skin, continue to show activity as long as the pressure is applied, and then taper off. Rapidly adapting fibers respond with a rapid burst of activity as soon as pressure is applied, and then level off. Rapidly adapting fibers respond again when pressure is released. Punctuate fibers have small oval shaped receptor fields with distinct boundaries that tell the brain where the sensation is coming from. Diffuse fibers possess large receptor fields with vague boundaries.

Tactile and Force Feedback, Proprioception

Tactile and force feedback can be exceptionally helpful--even necessary--in human-computer interfaces. In virtual world applications, it is disconcerting to see and hear something but not be able to touch or feel it. There are several devices that have been constructed to provide tactile and force feedback for advanced human-computer interfaces. Tactile output devices can give the user the sensation of pressure, vibration, and heat, as well as the shape of an object. Force feedback is different from tactile feedback, and involves resisting force applied by a human operator. Through force feedback alone, we can tell if we are holding an apple or a sponge based on the weight and the resistance to our hand closing around it. The information that we are using is called proprioceptive cues. Proprioceptive cues are pieces of information gathered from our skin, muscles, and tendons. Proprioceptive cues give information about where any part of our body is, the shape and firmness of objects, the position of the body, and the forces to which the body is subjected. Proprioceptive cues are necessary for most hands-on real world tasks, and are also desirable for most virtual world applications.

A variety of tactile and force feedback devices exist; however, many of the devices that have been built are prototypes and are not available as standard manufactured models. Some examples are the Argonne Remote Manipulator (ARM), the Portable Dextrous Master, the PERForce hand controller, the TeleTact tactile/force feedback system, the Begej Glove Controller, the TiNi Alloy tactile feedback system, and the Sandpaper system developed at MIT.

Visual Display Systems

Visual display systems are a component of computer controlled sound systems. Therefore, the issues associated with designing visual display systems will influence the usability of the sound systems to which they are attached. There are many different issues associated with the use of electro-optical display systems for human-computer interaction. In addition, there are several major types of visual display systems. Display systems can be in the form of opaque or translucent head mounted displays, projectors, as well as common computer monitors and active matrix displays. This section looks at some of the human factors issues associated with visual display systems.

Display Resolution

Resolution refers to the number of picture elements (pixels) of which the display is composed. A greater number of pixels within the display provides a higher resolution image. Screen-based resolution is usually measured in dots per inch. The horizontal display resolution is computed by dividing the number of pixels across the display by the physical width of the display. The vertical display resolution is computed similarly. If a display has a large number of pixels, it can display scenes with a higher degree of complexity. In addition, a higher resolution display can show smaller objects. It has also been shown that the higher the resolution of the display, the less chance of eye fatigue after prolonged use.

Color vs. Black and White

Even though color display systems are a relatively recent advancement in visual display technology, their use has proliferated rapidly. One of the ways in which a color display is characterized is by the number of simultaneous colors the device can produce. In general, it is desirable to have a color display (instead of a monochrome display) unless this results in decreased image resolution. Almost all real world scenes contain color, and a color display can reproduce the natural color within a scene. In addition, color displays can encode different types of information with different colors, increasing the information content and therefore the effectiveness of the display.

Field of View

Field of view refers to the horizontal and vertical angular extent that the image subtends at the retina of the user's eyes. The field of view is a measure of the area of visual stimulation which the display occupies. As the display size increases, the field of view increases. As display distance from the user increases, field of view decreases. Field of view is very important because as the field of view becomes larger, the cognitive sense of being included within the scene also increases. In addition, more information can be included in a display that has a large field of view than a display that has a small field of view, given that the resolution is the same. Display optics can also be employed to increase the field of view of a given display.

Geometric Field of View

Geometric field of view refers to the degree of magnification or minification of the image due to the specific perspective viewing parameters. The distinction between field of view and geometric field of view can be described by imagining two photographs: one picture taken with a telephoto lens and the other taken with a wide angle lens. One can hold both pictures at arm's length and they will both occupy the same field of view, but the picture taken with a wide angle lens has a much greater geometric field of view than the image taken with a telephoto lens. The image taken with the wide angle lens (large geometric field of view) has more of the scene within the frame of the picture. The image taken with the telephoto lens has less of the scene within the frame of the picture, but it has a greater degree of magnification.
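A minimal sketch of the two quantities just discussed: display resolution in dots per inch, and the physical field of view a display of a given width subtends at a given viewing distance. The example display dimensions are illustrative values, not measurements of any particular device.

```python
# Resolution and field-of-view arithmetic for a flat display viewed head-on.
import math

def dots_per_inch(pixels_across: int, width_inches: float) -> float:
    """Horizontal display resolution: pixel count divided by physical width."""
    return pixels_across / width_inches

def field_of_view_deg(width_inches: float, viewing_distance_inches: float) -> float:
    """Horizontal field of view (degrees) subtended by a flat display,
    assuming the viewer is centered in front of it."""
    return math.degrees(2.0 * math.atan(width_inches / (2.0 * viewing_distance_inches)))

if __name__ == "__main__":
    # A hypothetical 640-pixel-wide, 12-inch-wide monitor viewed from 24 inches:
    print(f"resolution: {dots_per_inch(640, 12.0):.1f} dpi")
    print(f"field of view at 24 in: {field_of_view_deg(12.0, 24.0):.1f} degrees")
    # Moving closer increases the field of view, as described above:
    print(f"field of view at 12 in: {field_of_view_deg(12.0, 12.0):.1f} degrees")
```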

Stereoscopic versus Monoscopic

Monoscopic display systems, such as standard computer monitors, provide each eye with the same image. Stereoscopic display systems provide each eye with a different image. The images in a stereoscopic display system are horizontally offset just as the user's eyes are horizontally offset. Stereoscopic display systems are advantageous because they can provide an intuitive sense of depth to the user of the system. Binocular retinal disparity refers to each eye receiving a different image of the scene. The brain is able to synthesize both images into a cohesive scene containing intuitive depth information. Stereoscopic presentation can be accomplished by means of head mounted displays as well as CRT and projection systems. Methods of presenting stereoscopic images include: time multiplexing the images on the screen, polarization of light, and chromatic separation. Many studies have shown the advantage of stereoscopic displays over monoscopic displays for estimating depth information. In a stereoscopic viewing situation there is both an increase in the number of depth cues available to the viewer and an increase in the effectiveness of many of the depth cues that are already available in monoscopic viewing situations. Because of advances in video display technology, stereoscopic presentation is more available and affordable than in the past. Within the domain of computer generated images, stereoscopic presentation can greatly enhance most visual situations due to the increased depth provided by the binocular presentation. The principal disadvantage of stereoscopic presentation is the increased cost of the stereo viewing hardware, as well as the increased computational cost associated with generating two views (one for each eye) for each stereoscopic display. One possible application of stereoscopic visual presentation within the domain of sound systems involves placement of three dimensional sounds. The operator of the system can use a spatial tracker to interactively position computer graphics objects that represent actual sounds in three dimensions. The user receives stereoscopic visual feedback corresponding to the locations of the sounds as well as three dimensional audio feedback coming from the virtual sound sources.
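A minimal sketch of the extra work a stereoscopic display implies: the scene is rendered twice, from two viewpoints horizontally offset by the interpupillary distance, just as the user's eyes are offset. The 0.063 m interpupillary distance is a typical illustrative value, and render_view() is a hypothetical stand-in for a real rendering call.

```python
# Two eye positions, two rendered views -- the source of the doubled rendering cost.
from typing import Tuple

Vec3 = Tuple[float, float, float]

def eye_positions(head: Vec3, right_axis: Vec3, ipd_m: float = 0.063):
    """Return (left_eye, right_eye) positions offset along the head's right axis."""
    half = ipd_m / 2.0
    left = tuple(h - half * r for h, r in zip(head, right_axis))
    right = tuple(h + half * r for h, r in zip(head, right_axis))
    return left, right

def render_view(eye) -> None:
    print(f"rendering view from eye position {eye}")

if __name__ == "__main__":
    head_position = (0.0, 0.0, 1.7)       # meters
    head_right_axis = (1.0, 0.0, 0.0)     # unit vector pointing to the head's right
    left_eye, right_eye = eye_positions(head_position, head_right_axis)
    render_view(left_eye)                 # one view per eye
    render_view(right_eye)
```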

Hearing and Spatial Sound

Besides sight, sound is a primary way in which humans collect information from their environment. The physiology of hearing involves the pinna (outer ear), the ear canal, the eardrum, the hammer, anvil, and stirrup, the cochlea, the organ of Corti, and the auditory nerve. Each of these systems has a unique function in the perception of sound. Sound localization in the horizontal plane is accomplished primarily by means of interaural time differences and interaural intensity differences at the listener's two eardrums. When a sound occurs to one side of your head, the sound reaches the closer ear sooner than it reaches the farther ear. This is the interaural time difference. In addition, the sound will be louder in the closer ear. This is referred to as an interaural intensity difference. The brain is able to interpret differences in timing and loudness of the sound received at the two ears and determine the location from which the sound originated. In addition, there are other cues to sound localization that help a listener determine the location from which a sound originated. The head forms an acoustic shadow that filters the frequencies received in the occluded ear. When a sound occurs to the right of your head, the right ear receives the full frequency range of the sound, whereas the occluded ear only receives frequencies of approximately 1000 Hz and less. The head in effect acts as a low pass filter. Echolocation refers to the ability to judge the size of a room by the amount of reverberance in the room. The pinna of the ear plays a crucial role in our ability to localize sounds in elevation. The pinna performs a frequency dependent filtering of the sound depending on the elevation from which the sound originated. Sounds originating from above the ear sound higher pitched than sounds originating below the ear.
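A minimal sketch of the interaural time difference described above, using a simple far-field approximation: the extra path length to the farther ear is roughly d * sin(azimuth), where d is the distance between the ears. This ignores diffraction around the head, so it somewhat understates the true delay; the ear spacing and speed of sound below are typical illustrative values.

```python
# Far-field approximation of the interaural time difference (ITD).
import math

EAR_SPACING_M = 0.18        # approximate distance between the ears, meters
SPEED_OF_SOUND_M_S = 343.0  # speed of sound in air at room temperature

def interaural_time_difference(azimuth_deg: float) -> float:
    """Arrival-time difference between the ears, in seconds.
    azimuth_deg = 0 is straight ahead, 90 is directly to one side."""
    path_difference = EAR_SPACING_M * math.sin(math.radians(azimuth_deg))
    return path_difference / SPEED_OF_SOUND_M_S

if __name__ == "__main__":
    for azimuth in (0, 30, 60, 90):
        itd_us = interaural_time_difference(azimuth) * 1e6
        print(f"azimuth {azimuth:3d} deg -> ITD of about {itd_us:5.0f} microseconds")
```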

When using computers to recreate the directional components of sound, it is important to first acquire the "earprint" of the user of the system. An earprint is also called a head-related transfer function, and refers to the characteristics of how the sound changes as the location of the sound relative to the listener changes. Earprints are traditionally measured by placing miniature microphones in the subject's ears and then recording white noise originating from many different combinations of azimuth and elevation. The sound received at the subject's eardrums is digitized, and a mathematical technique called Fourier analysis is used to collect coefficients that closely describe the resulting waveform. By collecting a list of coefficients corresponding to directional combinations, it is possible to recreate and simulate the directional components of sound using a convolution engine. A convolution engine uses the Fourier coefficients to perform the frequency and time dependent filtering that is present in reality. An obvious application of three dimensional sound in computer controlled sound systems lies in the ability of the system operator (human and computer) to interactively position sounds in three dimensions. The ability to spatialize sound can be used effectively in recording environments as well as live sound.
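A minimal sketch of the convolution-engine idea described above: a mono input is filtered with a pair of head-related impulse responses (one per ear) to place the sound at the direction where those responses were measured. The "impulse responses" below are crude stand-ins (a delay plus attenuation), not real earprint data; a real system would load measured HRTF coefficients.

```python
# Headphone spatialization by convolving a mono signal with per-ear impulse responses.
import numpy as np

def spatialize(mono: np.ndarray, hrir_left: np.ndarray, hrir_right: np.ndarray) -> np.ndarray:
    """Convolve a mono signal with left/right impulse responses.
    Returns a (num_samples, 2) stereo array suitable for headphone playback."""
    left = np.convolve(mono, hrir_left)
    right = np.convolve(mono, hrir_right)
    return np.stack([left, right], axis=1)

if __name__ == "__main__":
    sample_rate = 44100
    t = np.arange(sample_rate // 10) / sample_rate
    mono_signal = np.sin(2 * np.pi * 440.0 * t)          # 0.1 s test tone

    # Stand-in responses for a source off to the right: the left ear hears the
    # sound slightly later and quieter than the right ear.
    hrir_right = np.zeros(32); hrir_right[0] = 1.0
    hrir_left = np.zeros(32); hrir_left[22] = 0.6        # ~0.5 ms later, attenuated

    stereo = spatialize(mono_signal, hrir_left, hrir_right)
    print("output shape:", stereo.shape)                 # (num_samples + 31, 2)
```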

Applications of Advanced Human Computer Interaction

Architecture, Engineering, and Computer Aided Design

Currently, it is very popular for architects, engineers, and designers to use computer aided design systems to aid them in their work. Virtual interface technology allows architects and designers to intuitively and easily explore their creations by flying through them. The term virtual walk-through is used to describe the use of a computer system to virtually experience walking (or flying) through a simulation of a building. The simulation of this model can be presented to the user both visually and auditorily by means of a stereoscopic head-mounted display (HMD) and headphones presenting three dimensional spatialized sound. Virtual interface technology can also greatly aid in constructing mechanical and architectural models, as the designer can reside inclusively within the space during the design process.

Scientific Visualizations

In the same way that architects can visualize buildings, scientists can visualize data and processes. In addition, scientists are able to use the computer to visualize multidimensional data as graphic forms with changing attributes such as color, size, position, orientation, etc. Applications of scientific visualization of data include mathematics, molecular chemistry, meteorology, atomic physics, astronomy, thermodynamics, fluid flow analysis, as well as financial visualizations. Visualizing multidimensional data spaces enables the scientist to obtain a much greater understanding of the properties and interrelationships of the system under investigation. As an example related to the audio industry, an acoustical engineer could use sophisticated three dimensional computer graphics to visualize sound pressure levels and room modes in an inclusive stereoscopic graphical simulation displaying the acoustics of a performance hall under design. The engineer would be able to see graphical changes in the room modes as she moves walls and baffles in the computer graphic simulation.

Training

The military first used simulators during World War I to train pilots. Since then, the use of simulators has grown substantially within military and civilian markets. There are many benefits associated with using simulators to train and practice. It is much less expensive to "fly" a flight simulator than it is to fly a real plane. Maneuvers can be practiced that require a high degree of precision; these maneuvers would be dangerous to attempt unpracticed. In addition, dangerous situations can be simulated that would rarely be encountered in a real aircraft, such as an engine falling off of the wing, or the loss of the rudder. Advancements in the field of flight simulation have led to many discoveries that have also aided the field of virtual environment research.

Education

Virtual interface technology can be extremely useful in educational environments because it allows us to ask questions and create simulations to answer them. Interactive scenarios could be presented to the student and learning would be accomplished experientially. Relationships and properties of the systems being studied would be deduced through direct manipulation and discovered first hand by the user of the system. One additional advantage of virtual interface technology is that experiences can be custom designed for the participant using the system.

Medicine

Virtual interface tools can also be applied to the area of medicine. Systems are being designed to assist the radiologist in the placement of multiple radiation beams used to destroy tumors within the body. Other systems allow medical students to perform surgical operations virtually using head mounted displays and six dimensional positioning devices. Designing for the disabled is an area in which much research in advanced human-computer interaction is taking place. Systems that translate sensory information from one modality into another are being designed to enable the physically challenged to better function within their surroundings.

Telerobotics and Telemanipulation

Telepresence involves conveying to the user the feeling of being present at one location while actually being present in another. Telepresence is usually employed when the environment is too hazardous for the human operator to experience directly. A telerobotics system employs stereoscopic video cameras for the eyes and binaural microphones for the ears. Sensory data is collected by the stereo cameras, which are controlled by the user's head motions. Binaural microphones collect sound in the remote environment, which is then presented to the user over headphones. In this way the user can experience a remote environment from a safe location. Telerobotic systems have been designed for use in warfare, fire fighting, bomb disposal, assembly in space, and high radiation, high heat, and high pressure environments.

Entertainment and Art

High technology in the entertainment industry is not new. Examples of computer and communications technologies can be found in many entertainment applications, from computer games to rides at amusement parks. Advanced human-computer interfaces can be used to simulate both intellectual and physical games. Interactive systems have been developed to simulate bike riding, playing racquetball, and even being on stage in an interactive virtual theater. New forms of art are being developed by crossing the boundaries of the traditional arts. An example of this is tracking a dancer's movements to create musical expressions. Other new forms of art involve kinetic sculpture, interactive poetry, and unique human and computer controlled musical instruments.


Conceptual Application--Live Sound at the Ear Canal Night Club

This section describes a conceptual night club, the Ear Canal, which has a computer controlled sound system incorporating many of the advanced human interface components discussed earlier.

Human Interface Adaptability to System Operator Skill Level

The sound system at the Ear Canal can be adapted to the skill level of the person(s) operating it. In particular, the system has three standard interface modes: novice, expert, and privileged.

Novice. In novice mode, most of the sound system's functions are hidden from the operator. A very simple interface is provided, with controls resembling a standard home entertainment system.

Expert. In expert mode, all sound system functions are available to the operator, though some functions have restricted operating ranges. For example, the power amplifiers cannot be adjusted beyond safe levels, limiters cannot be uncalibrated, anti-feedback equalization cannot be adjusted, and so on.

Privileged. In privileged mode, all sound system functions are fully accessible. The operator may adjust any parameter in the system, through its full range. Privileged mode is typically reserved for the chief engineer of the system, and is restricted from typical operators (even experts) to protect the calibrated settings of the system.

The adaptable human interface guarantees that any operator will feel comfortable operating the system. It also protects the system from unintentional (or intentional) abuse.

Expert Systems

Recently, an expert system was added to the human interface. The expert system uses artificial intelligence to learn how people operate the system, and will eventually take over many of the routine and non-creative tasks of the system. For example, the expert system has already learned that whenever the clip lights on an amplifier illuminate for an extended time, the operator turns down the level in that channel. In the future, the expert system can perform that task automatically, freeing the system operator to perform other tasks. The expert system has also learned that strong spectral energy in a very small bandwidth (which the human operator knows as feedback) is generally notched out with a parametric equalizer. The expert system could carry out this operation automatically as well. In the future, after the expert system has learned many more tricks of the trade, the novice operator will be able to operate the system with even better results, as the underlying expert system automatically performs most of the work. The expert operator benefits as well, since more time is available to be creative and less time is required for the logistics of controlling the system.

Motion Sensors and Spatial Trackers

The human interface to the Ear Canal sound system uses a number of motion sensors and spatial trackers. These components are used by the main house engineer, the monitor engineer, the lighting engineer, and performers.

House Engineer. The house engineer wears one six dimensional spatial tracker on his hand. This tracker reports hand movement to the sound system control computer. Data from the spatial tracker is used to operate many of the parameters and functions in the sound system. For example, equalization is adjusted by moving the hand horizontally (to select frequency) and vertically (to select level at the current frequency). The mix is adjusted by pointing at a sound source and raising the arm. Virtual knobs can be turned by twisting the wrist.
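A minimal sketch of the hand-tracker mappings just described for the house engineer: horizontal hand position selects an equalizer frequency, vertical position sets the gain at that frequency, and wrist roll turns a virtual knob. The mapping ranges and function names are hypothetical, not part of any actual control protocol.

```python
# Hypothetical mappings from hand pose to sound system parameters.
import math

def hand_to_eq(x_norm: float, y_norm: float) -> tuple[float, float]:
    """Map normalized hand position (0..1, 0..1) to (frequency_hz, gain_db).
    Frequency is mapped logarithmically from 20 Hz to 20 kHz, gain from -12 to +12 dB."""
    x = min(max(x_norm, 0.0), 1.0)
    y = min(max(y_norm, 0.0), 1.0)
    frequency_hz = 20.0 * (1000.0 ** x)       # 20 Hz .. 20,000 Hz
    gain_db = -12.0 + 24.0 * y
    return frequency_hz, gain_db

def wrist_to_knob(roll_rad: float) -> float:
    """Map wrist roll (-pi..pi) to a virtual knob position (0..1)."""
    return (roll_rad + math.pi) / (2.0 * math.pi)

if __name__ == "__main__":
    freq, gain = hand_to_eq(x_norm=0.5, y_norm=0.75)      # hand mid-range, raised
    print(f"EQ: {freq:.0f} Hz at {gain:+.1f} dB")
    print(f"virtual knob position: {wrist_to_knob(math.radians(45)):.2f}")
```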


Almost any physical (real) control can be represented by a virtual one and adjusted by tracking hand movement.

Monitor Engineer. Like the house engineer, the monitor engineer wears a six dimensional spatial tracker on his hand, allowing him to adjust signal processing and virtual controls. In addition, he wears another spatial tracker on his head so the sound system can determine which direction he is looking. Since the monitor engineer is located close to the stage, discrete readings of head position can be correlated to performers on the stage. Therefore, when the monitor engineer looks at a performer, his head position reveals who that performer is, and the sound system knows which monitor mix to apply the corresponding control operations to.

Lighting Engineer. The lighting engineer stands in front of a small video camera, which feeds an image to a personal computer. The computer processes the video image, and tracks the motion of the lighting engineer's body. As the engineer dances to the music, the computer recognizes his movements and controls the lights over the network. The human interface to the system allows the engineer to control the system creatively. Indeed, control of the system is an art in itself. Sometimes the camera is aimed at the audience, and the crowd's motion takes control over the lights. This is a very popular effect, and always gets the crowd excited.

Performers. Performers use motion sensors and spatial trackers in many configurations. The video motion sensor used by the lighting engineer can be applied to algorithmic music composition, effects, and so on. Drummers especially enjoy drumming in front of the camera to trigger virtual percussive devices through MIDI. Six dimensional spatial trackers can be applied to performers' bodies and instruments to control many different aspects of the performance. A favorite application of one regular rock band is to affix a six dimensional spatial tracker to the neck of the lead guitar. As the lead guitarist leans back, sustain is automatically turned up. As she tilts the neck up, level, equalization, and distortion are adjusted to create a piercing lead rock guitar sound.

Transparent Head Mounted Displays (THMDs)

Sound engineers perform several tasks at one time: adjust mix levels, watch meters, adjust parameters on signal processors, watch performers for visual cues, and so on. It is not possible to see all these things simultaneously, since they are located in different places. For this reason, house and monitor engineers at the Ear Canal wear THMDs. The THMD decouples visual information from the actual sources, so the operator can be located anywhere (perhaps in the "sweet spot" in the live sound arena). The operator sees the real world through the display (after all, it is transparent), with overlaid live or computer-generated images. These images can be human interface menus, representations of components of the system, indicators, or live video of action in another part of the venue. The sound system operator can interact with the sound system equipment and watch the performers simultaneously. A THMD combined with a data glove equipped with a six dimensional spatial tracker provides the engineers with a very powerful human interface to the system. The THMD provides visual feedback from the system, while the glove allows the operator to communicate her intentions to the system. For example, the engineer may see a picture of each sound source, scaled in size to represent its current volume setting. The operator can then point to a sound source, which is recognized by virtue of the six dimensional spatial tracker. When she raises her hand, the volume of the selected source increases. When she lowers her hand, the volume decreases.

Spatial Sound

The Ear Canal has a state-of-the-art three dimensional sound localization system. Sounds are placed in three dimensional space by positioning icons representing each source (viewed in the THMD) with the data glove. There are also a number of preset effects such as: ping pong,
wave, spiral, random walk, etc., which move sounds around the venue. True three dimensional spatialized sound is always a crowd pleaser at the Ear Canal. Three dimensional sound is not only used as an effect; it is also helpful for remote control of the system. The house engineer can monitor the sound at any position in the room by localizing her headphones within the virtual space. A sophisticated room model is incorporated to include the known reflections and modes of the room, so the artificially localized sound is quite accurate.

Remote Control

Since all equipment in the Ear Canal sound system is digitally controlled, it can also be remotely controlled through the communications network. A standard modem allows the entire system to be remotely controlled from an off-site location. The modem allows an expert located off-site to monitor the work of inexperienced operators, adding helpful input as necessary, as well as allowing the system's chief engineer to perform weekly diagnostic tests from his lab across town. House calls to reset ailing equipment take on a whole new meaning.

Summary

The age of computer controlled sound systems is upon us, with far reaching benefits. However, until new, improved human interfaces are evolved and incorporated into sound systems, the potential power of computer control will not be fully realized. Today's human interface paradigms do not allow the human to enjoy complete creative freedom within the domain of controlling the system. Human factors researchers are inventing and investigating new and unique technologies that allow humans to interact with computers through the human's natural senses. In the near future, many of these interface technologies will be available for controlling sound systems. Ultimately, the human operator will be liberated from routine, logistical tasks, and will be free to perform creative tasks easily and intuitively. The technologies discussed in this paper can improve the human interface to sound systems, with tremendous benefit to the entire system. As costs come down, and more durable implementations become available, advanced human interfaces for sound system control will proliferate and sound systems will become mature members of the information age.

Acknowledgments

The authors acknowledge many stimulating conversations related to the topics in this paper with the following individuals: Colin Bricken, William Bricken, Garrott Cobarr, Geoff Coco, Brian Kart, Mark Lacas, Philip "Random" Reay, Rick Spirtes, Steve Tumidge, and David Warman.

