A thesis submitted in partial fulfillment for a Bachelor of Science

in Web Design and Interactive Media from the Art Institute of Seattle

March 2011

By understanding which gestures are similar between cultures and what gesture vocabulary definition
methods have had the most success, humans will appropriately assimilate an understanding of the
plausibility and practicality of global hand and finger gesture technology into their lives.
Hand and Finger Gesture Interfaces: An Exploration of Global Practicality and Plausibility

Ever noticed that while people talk on the phone, they still gesticulate, regardless of the fact that the

person on the other end of the phone line cannot see their hands? Since humans first stood up on two

feet, hands have become a major portion of communication. Pointing at prey, pounding a chest while

screaming, and touching are a few of the natural gestures that have evolved from this upward

movement [Barfield, 1997]. Gestures are ingrained so deeply that we use them even when we’re on the

phone talking to someone across town. The gestures are performed regardless of the fact that the

interface isn’t portraying our communicative gestures to the recipient. Gesture understanding interfaces

are becoming popular and affordable, which increases practicality. But are the gestures, decided by HCI⁵

professionals and developers, the right symbolic language for the job? Is the scope of their gesture

vocabulary⁶ limited to malleable consumers, or are these gestures practical for all cultures and less

tech-savvy humans? This paper intends to evaluate the results of an evolution of gestures across cultures, in

order to uncover the practicality and plausibility of a global gesture vocabulary set. If gestures fulfill

these criteria, should implementation be done where a single individual or company decides the

vocabulary to be used by all users? Or should it follow an open source mentality, and allow users to

develop their own gesture motions, therefore defining custom vocabularies? By understanding which

gestures are similar between cultures and what gesture vocabulary definition methods have had the

most success, humans will appropriately assimilate an understanding of the plausibility and practicality

of global hand and finger gesture technology into their lives.

The urge to gesture is undeniable. It takes extra thinking and muscle control to keep gesticulation from

accompanying human communication. Try giving directions down the street without any gesturing. Non-

verbal behavior has accompanied speech for so long that some actions have moved into the sub-

conscious. For example, when someone calls out to another to see if they are OK, the first thing one may

do is display an OK gesture. They will not think about which finger or arm to display; it just shoots out

from their person, and is instantly understood on the receiving end. This type of gesture is used to

provide a piece of information; it signals that they do not need immediate assistance. In less than one

second, the gesture has communicated a few sentences, showing how quickly and effectively a gesture

can replace short verbal communication. As soon as someone blasts through the finish line in first place,

their arms shoot into the air in a victory gesture. They don’t think about how they are going to show

their victory when they win; it’s a natural reaction to the situation (unless they are in the National

Football League, where a dance is required). NFL aside, the emotional state of the individual is

clearly stated simply with their arms. According to Edward Warman in 1892, “All negative gestures fall

below the level of the shoulder-line; all positive gestures rise above the level of the shoulder-line. This is

fully illustrated by animals, their expressive agent being the tail.” Dogs are especially known for their

expressive tails. Often it is the first thing the human examines when confronting a dog. When a dog’s tail

is tucked between its legs, and its head is sunken down towards the floor, humans understand and

empathize with the gesture. Humans communicate with animals all the time, and it is gestures which

make this verbally impossible situation possible. This thesis paper does not intend to focus on sub-

conscious nonverbal communication, unintentional gesturing, or talking with animals. It is driven by the

fact that gestures are interwoven into human experience and communication. This fact serves as the

root for argument and analysis.

Touching is as important in communication as gesturing. It can assist, accentuate or provide pinpoint

information. Touching is a naturally occurring human gesture and communication method. It is unique;

it has dual ways of sending and receiving information. We process and consider the feedback of the

object being touched before making our next decision. Consider the simple gesture of touching a line in

a book; it communicates and draws attention to the information instantly. Touching is fast, precise, and

globally understood. Pointing, similar to touching, is another instant way to communicate vital

information by directing eyes towards the desired object. It is also a global hand gesture and, fortunately,

easy to implement in an interface. Touching and pointing start to fail as global gestural input methods

when they demand specifics and require additional pinpoint information beyond x and y positions. For

example, using both hands to point at separate objects could be interpreted as trying to communicate

length, start and end points, selection points, size, width, etc. Pointing and touching are a firm

foundation for the start of a gesture vocabulary.

Figure 1: [Karam, 2006]

Hands are the most used body part in human gestures. According to Maria Karam, almost 40% of all

gestures are done with the hand, followed by semaphoric¹¹ gesturing at 30%, multiple hands at 20%, and

the rest are distributed between the body and limbs (Figure 1). Over half of all human communication

gestures are supported or driven by hands. It is

obvious while watching any human communicating in their native language that hands drive the

majority of the gesticulation⁴. Maria Karam also notes that gesturing travels over space very well, near

or far: waving across the airport at a family member, or brushing the hair of a companion. There are

gesture zones which influence the type of gestures performed [Karam, 2006]. The first zone is an

intimate zone where hand gestures range from touching to 18 inches. The second zone is the personal

zone, which starts at around 18 inches from our person and ends about four feet away (arm’s length).

The third zone is known as the interpersonal space or social distance (four to eight feet) and the fourth

zone is the public distance zone or anything more than 8 feet. The following will primarily observe and

examine the first and second zones.
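
The four zones above can be expressed as a simple distance lookup, which is roughly what an interface would need in order to decide which kinds of gestures to expect. The following is only a minimal sketch: the function name `classify_zone`, the `Zone` type, and the choice to assign each boundary distance (exactly 18 inches, four feet, eight feet) to the nearer zone are assumptions made for illustration, not something taken from Karam.

```python
from enum import Enum


class Zone(Enum):
    INTIMATE = "intimate"
    PERSONAL = "personal"
    SOCIAL = "social"
    PUBLIC = "public"


# Upper bounds in inches, following the zone descriptions in the text.
INTIMATE_MAX_IN = 18        # touching to 18 inches
PERSONAL_MAX_IN = 4 * 12    # up to about four feet
SOCIAL_MAX_IN = 8 * 12      # four to eight feet


def classify_zone(distance_in: float) -> Zone:
    """Return the gesture zone for a distance measured in inches.

    Assumption: a distance exactly on a boundary belongs to the nearer zone.
    """
    if distance_in < 0:
        raise ValueError("distance must be non-negative")
    if distance_in <= INTIMATE_MAX_IN:
        return Zone.INTIMATE
    if distance_in <= PERSONAL_MAX_IN:
        return Zone.PERSONAL
    if distance_in <= SOCIAL_MAX_IN:
        return Zone.SOCIAL
    return Zone.PUBLIC
```

Under these assumptions, a hand gesture performed 30 inches from the body would fall in the personal zone, one of the two zones examined here.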

Ray Birdwhistell, an American anthropologist, coined the term kinesics, which is the study of body

language, facial expressions and gestures to interpret meaning. From his studies, he named an

interesting set of gesture classifications. He identifies five types: emblems¹⁷, illustrators¹⁸, regulators¹⁹,

adaptors²⁰, and affect displays²¹. Emblems are gestures used in place of words, like the loser ‘L’ done

with the thumb and pointer finger. Illustrators are co-verbal gestures⁷, like clapping your hands together

in a squishing motion while talking about an opposing team. Regulators are gestures used to control the

speed and flow of the communication. Adaptors are gestures which release physical or emotional

tension, like wiping your forehead with the back of your hand. Affect displays are gestures which display

emotion, and emotion plus gesture usually equals dramatic or exaggerated motions. Hand gestures

could be classified using logic similar to Birdwhistell’s. I felt this closely related to Edward Warman, who

in 1892 identified the main functions of a hand to be: define or indicate, affirm or deny, mold or detect,

conceal or reveal, hold or surrender, accept or reject, inquire or acquire, support or protect, and caress

or assail. The hand does so much in our first and second zones of communication. Gesticulation starts to

make sense and appear similar between humans when the hand motions are classified as Warman

describes. For example, Desmond Morris’ movie “The Human Animal – The Language of the Body,”

presents individuals from around the world conversing in public spaces. To an observer who does not

know any of the language, the gesticulation shows many clues about the conversation, especially

when Warman’s and Birdwhistell’s theories are used as interpreters.

Warman says “the use of the little finger represents delicacy and refinement,” which could be the first

clue of global gesture plausibility. The pinky is observed in gestures involving intricacy and pinpoint

instruction across cultures. The pinky is only involved with other gestures though, and there is no single

pinky-related gesture that is cross-cultural. Study of global gestures quickly shows that cultures have

customized gestures to fit their needs. The ‘thumbs up’ hand gesture is a classic example, but so are the

‘head nod’, ‘go away’ or ‘come here’ gestures. Thumbs up origins, according to Desmond Morris, were

from the days of the gladiators in the Roman Coliseums [Rose, 2005]. At that time, thumbs down

indicated a stabbing motion, a gesture meant to indicate ending the life of the losing gladiator. Thumbs

up meant to let them live, or draw back the sword from the gladiator. The reason this gesture pair is a

classic example of cultural differentiation is that thumbs up can mean something entirely different to

people in Iran, Afghanistan, Nigeria and parts of Italy and Greece, where it is an obscene insult

[Axtell, 1991]. This being true, there is little hope for even an ‘OK’ or ‘yes’ symbol to have global

concurrence. More interesting examples are the seemingly common hand gestures to ‘go away’ or

‘come here’. It is particularly important that these motions be examined because in communicating

with computers, the entire experience is about moving through information. This means pulling and

pushing different views of information to the user. Swiping on the iPad is a ‘go away’ and ‘come here’

type of command, pushing old out to let the next in. Unfortunately, the direction that people motion

with their hands to gesture ‘go away’ or ‘come here’ is inconsistent. Desmond Morris evaluated this

gesture across cultures and found that some people motion their hand away from themselves to invite

something in. A pushing gesture logically makes sense as a rejection gesture, but others have evidently

learned through their environment to interpret it as an inviting motion. This does not change the

plausibility factor of this thesis. Gesticulation is unmistakably natural for humans, certainly more natural

than a keyboard and mouse, or a rectangle with buttons that all look the same.

In 1972, Warren Teitelman used experimental programming to develop the concept called DWIM (‘Do

What I Mean’), which included a set of gestures to substitute for functions. He was the first individual to

connect gestures to computer functions with elegance. His intent was similar to that of this thesis.

A system that can efficiently read a user’s gestures and intelligently ‘do what they mean’ is one way of

approaching the definition of a global vocabulary. It could be very powerful if gesticulation were not so

culturally specific.

Finding a foundation for classifying hand gestures into groups and breaking them down into smaller,

more manageable classes is vital to the gesture set’s success. This will make evaluation and

understanding much easier. To start, there are dynamic and static gestures [Freeman and Roth, 1995]. A

thumbs up is a static gesture, and waving is a dynamic gesture². Thumbs up is a stationary

position of the hand that is held and sustained until the message is received. Waving is a gesture of

movement and repetition. The hand shakes back and forth trying to get attention. It is important that

these two gesture properties are supported in a global gesture vocabulary. The following lists common

single-hand gestures, each of which is defined primarily by static or dynamic properties:

crossed fingers, finger-gun, middle finger, fist pump, loser-L, money, poking, Vulcan salute, wave, chop,

point, punch, etc. Two-hand gestures: air quotes, applause, x across chest, gator chomp, hand rubbing,

jazz hands, victory clasp, whatever-W, chest pound, surrender. Gestures with other body parts: air kiss,

bowing, choking, drinking, curtsey, one knee, hand over heart, mooning, nod, shrug, shush, throat slash,

gang sign, cross, crossed arms, right arm across chest, hand over eyes (look), hands over face, rocking

out.
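
The static versus dynamic distinction can be illustrated with a very small motion check over sampled hand positions. This is a sketch under stated assumptions, not the orientation-histogram method of Freeman and Roth: it simply sums how far the hand travels and compares that to a threshold. The names `path_length` and `classify_motion`, the normalized coordinate units, and the threshold value are all hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) hand position in assumed normalized units


def path_length(track: List[Point]) -> float:
    """Total distance the hand travels along the sampled track."""
    return sum(math.dist(a, b) for a, b in zip(track, track[1:]))


def classify_motion(track: List[Point], threshold: float = 0.05) -> str:
    """Label a track 'static' if the hand barely moves, otherwise 'dynamic'.

    The threshold is an assumed tuning value, not a figure from the literature.
    """
    return "dynamic" if path_length(track) > threshold else "static"


# Illustrative samples: a held pose and a repeated left-right movement.
thumbs_up = [(0.5, 0.5)] * 10
wave = [(0.4, 0.5), (0.6, 0.5)] * 5
```

With these sample tracks, the held thumbs up never leaves its starting position and is labeled static, while the wave accumulates travel on every repetition and is labeled dynamic.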

In 1986, Jean-Luc Nespoulous proposed the Nespoulous scheme [Nespoulous, 1986]. This specified

three categories of gestures: mimetic⁹, deictic¹⁰, and arbitrary. Mimetic gestures mimic the object being

described (think air guitar or invisible cell phone call). Deictic gestures are those often related to

pointing, with their reference determined by the situation. An example is when one is specifying

that ‘this should go over there,’ the hand is used to direct without words. Arbitrary gestures are those

that are learned, agreed upon, or customized. Though uncommon, once learned they can be used and

understood instantly among those familiar with them. Maria Karam mentions semaphoric gestures in her

thesis, which are gestures involving objects. It’s important to recognize semaphoric gestures, because

holding objects or a hand position could be a key signal in separating gesticulation from a computer

command gesture. Typically semaphoric gestures are used to describe what airport ground crews do

with their marshaling wands when communicating with the pilot while the plane backs up. Mimetic

gestures are interesting because as long as the referenced object is understood by both parties,

understanding can be transferred with just hands. An interface experience could potentially be tailored so

that all operations animate the way they function. For example, grabbing and dragging are often

indicated by a mouse with a hand icon, visually supporting the action the computer is taking. Newer

pointer icons even appear to grip the page as the click and drag is performed.

Edward Warman classified hand gestures into patterns and clusters. Patterns and clusters are similar to

the gesture categories from Maria Karam, Nespoulous, and Freeman, originating however from a macro,

rather than a micro, point of view. According to Warman, patterns were visual gestures, exaggeration

gestures, and fine point gestures, which are made up of static and dynamic, mimetic, and deictic

properties. Gesture clusters use sequences, combinations, symbols and animated properties to describe

unique gestures. Identifying gestures in this way is advantageous when describing to a human what the

system is capable of and what is available as a gesture. Waving is a left to right sequence which is

repeated. Verbally describing this movement to a new user of this gesture vocabulary would be simple.

Combinations of finger gestures alone could offer enough actions to support a large vocabulary,

including: 3 taps, 2 taps (index to middle), 3 finger tap, 2 finger tap, 3 finger tap followed by 1 finger swipe.

A gesture vocabulary entirely developed as gestures that must be performed in patterns and clusters

could be unique enough to reach an arbitrary gesture status, which Nespoulous said makes the action

easily recalled from the brain. Gestures which use repeated and combined motions could be easily

recognized by the interface, but could also be too complex. With humans, there is also a natural need

when touching and gesturing, to repeat the gesture. It’s repeated to ensure delivery. This is especially

true with hand gestures. Humans repeatedly smash and get frustrated with unresponsive objects which

should be providing feedback. The button to cross the street is designed for smashing, a smart design

choice because it gets smashed while impatient people wait for the light to change. Buttons on a

keyboard get smashed if they don’t work, and humans point at an object over and over again if it’s not

being recognized by the recipient of the gesture. A global gesture library could use a unique gesture

system, such as one that only uses clusters. If a gesture set was simultaneously logical and abstract, the

gestures would not need to fight with natural human gestures. Apple has implemented a system using

gestures which are both logical and relatively abstract. The gestures are simple to remember and

simple for the interface to receive, proving the power of abstracted gesture vocabulary.
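
A vocabulary built from clusters, meaning sequences and combinations of simple finger actions such as the taps and swipes listed earlier, could be stored as a lookup from sequences to commands. The sketch below is one possible representation and not a description of any shipping system: the `Primitive` and `Cluster` types, the `lookup` function, and the command names are hypothetical placeholders.

```python
from typing import Dict, Optional, Sequence, Tuple

Primitive = Tuple[int, str]        # (number of fingers, action), e.g. (3, "tap")
Cluster = Tuple[Primitive, ...]    # an ordered sequence of primitives

# Hypothetical vocabulary: each cluster maps to a placeholder command name.
VOCABULARY: Dict[Cluster, str] = {
    ((1, "tap"), (1, "tap"), (1, "tap")): "command_a",   # 3 taps
    ((3, "tap"),): "command_b",                          # 3 finger tap
    ((2, "tap"),): "command_c",                          # 2 finger tap
    ((3, "tap"), (1, "swipe")): "command_d",             # 3 finger tap, then 1 finger swipe
}


def lookup(sequence: Sequence[Primitive]) -> Optional[str]:
    """Return the command mapped to this exact sequence, or None if unknown."""
    return VOCABULARY.get(tuple(sequence))
```

Because whole sequences are the keys, the same primitive (a three finger tap, for instance) can appear in several clusters without ambiguity, which is what lets a small set of motions support a larger vocabulary.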

The gesture-describing terms identified, classified and accumulated above are now the foundation

for a decision on the global plausibility and practicality of gesture interfaces. The hand and fingers work

together in an uncountable number of gesture combinations. This offers a more robust, natural and

efficient interface for digital communication. Anthropological studies by Desmond Morris clearly show

that it is impractical to pinpoint any single gesture that remains consistent across cultures. Hands are

also not meant for purely static gestures. The keyboard and mouse have familiarized users with

awkward objects to perform computer functions [Anderson, 1984]. A gesture set of themes, patterns,

and clusters could bring the natural gesticulation in conversation to devices which once felt lifeless. This

also switches the user’s mindset from static manipulation to dynamic manipulation. Hand gestures offer

more commands than a button interface ever could. The approach to introducing new gesture

technology needs to be studied and evaluated for the most practical method to emerge. The practicality

and plausibility of interface assimilation depends on it. From data gathered thus far, plausibility of a

gesture interface working globally is very high. There is no doubt that if a vocabulary were created, it

would pave the way for a much more interactive and communication-rich environment for a user and

gesture interface. Hands are natural assistants to all human communication. Gestures are one of a few

ways for humans to communicate without any words, across cultures, and bring meaning to a situation

without knowing each other’s language.

Gesture interpretation is certainly a plausible interface method, since gesturing is a natural way for

humans to communicate or support communication. However, since gesture interfaces are still under-tested,

and interface technology is still so new, gesture interfaces are not practical. Globally, the

keyboard and mouse are still trying to reach new territory. How could a magical hand-reading interface

do any better? Easy: imagine walking into a rural village. In a bag are two monitors and two

computers, as well as a keyboard and mouse, plus a multimodal¹² computer. During set up of the two

machines, villagers are crowding around. Eventually, type one sentence on the keyboard, move the

mouse, and click a couple menu items. Then, switch to the multimodal computer. Do three gestures

(deictic, dynamic, pattern), speak one sentence in English, and watch them after motioning for them to

try. Also watch them all gesture and talk to each other, pointing at the machines, scratching their heads.

Which will a villager naturally gravitate towards? “The advantage of having gestures read directly off

your hand is that it’s more natural than groping for a mouse. Once harnessed, you can pay more

attention to the application at hand. [Popular Science, 1993]” However, these applications are still

expensive and unable to solve the various cultural problems, which makes assimilation difficult. No one

may feel particularly drawn towards such a drastic call to change within the computer world. The

establishment of a gesture vocabulary could open doors for practicality later. It is plausible for the

curious user.

Juan Wachs researched hand gesture vocabularies for his thesis on gesture based robotic control, and

broke the approaches down into 3 types: Centrist or Authoritarian¹⁴, Consensus¹⁵, and Customized¹⁶

[Wachs, 2006]. These classifications determine this thesis’s real plausibility, because the way the world

receives a gesture vocabulary is very important to its acceptance. Apple is notorious for the

Authoritarian approach; their devices require the user to conform to the input programmed by Apple.

Many other items, such as TV controllers and cars, have implemented their interfaces the same way.

The iPad comes equipped with multi-touch gesture recognition, but the user must learn the gestures

created by Apple. Apple is currently making an attempt to define a global gesture vocabulary using this

technique [Raskin, 2000].

The Consensus approach has previously been used for smaller audiences [Munk, 2001], and not the

world. Trying to gather an equal amount of users from all cultural backgrounds to agree upon a gesture

vocabulary is expensive, and from this thesis one could infer the results. None of the gestures would

jump out as largely common between groups. Furthermore, after the interface was released to the end

user, it would still feel Authoritarian, since that single user had no direct input on the gesture choices.

However, an initial consensus approach could inform an educated Authoritarian approach. The data of a

global gesture consensus approach to hand gesture based interfaces would be unprecedented and

extremely influential in future gesture interface approaches if achieved.
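
One way to picture what a consensus study would measure is a simple tally: for a single command, record which gesture each cultural group prefers and see how much agreement the most popular gesture actually reaches. The sketch below assumes that framing. The `consensus` function, the group labels, and the sample preferences are invented placeholders for illustration and are not data from any study.

```python
from collections import Counter
from typing import Dict, Tuple


def consensus(preferences: Dict[str, str]) -> Tuple[str, float]:
    """Return the most preferred gesture and the share of groups choosing it.

    `preferences` maps a group label to that group's preferred gesture
    for one command. Ties go to the gesture that was recorded first.
    """
    if not preferences:
        raise ValueError("at least one group preference is required")
    counts = Counter(preferences.values())
    gesture, votes = counts.most_common(1)[0]
    return gesture, votes / len(preferences)


# Made-up placeholder input for a 'yes' command; not real survey data.
yes_preferences = {
    "group_a": "thumbs_up",
    "group_b": "thumbs_up",
    "group_c": "nod",
    "group_d": "ok_ring",
}
```

A low agreement share for the winning gesture would be the numerical form of the problem described above: even the winner would feel Authoritarian to every group that did not choose it.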

The Customized approach would mean that the delivery of the unit would be a blank gesture slate, and

upon opening, would need calibration. The customized approach requires the user to set his or her own

gesture library. This is a tricky path, because the instructions should be globally accessible and

understandable [Butow, 2007]. A blank slate is not always what people want; they like things to work

out of the box with little to no customization. There would have to be some sort of base interaction for

calibration, for which touch and/or pointing could substitute. The setup instructions would need to be

very well written, and give clear examples of the capabilities of the new gesture-based interface. This

approach would require less a gesture vocabulary than a system for gesture learning and

customization. This system could be an application for creating custom gesture vocabularies, at which

point one could assume that a library of gesture vocabularies could be available for the new user from a

gesture vocabulary market. This gives cultures the ability to find a common library on their own and modify it

until it’s widely used. This would be more of an Android approach, meaning certain companies and

individuals could create gesture vocabularies for their audience.
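
The Customized approach, with a shared library as a starting point, can be sketched as a layered lookup: a user's own definitions are consulted first, and anything the user has not redefined falls back to a vocabulary obtained from the market. This is a minimal illustration assuming that layering; the gesture names, command names, and `build_vocabulary` helper are hypothetical.

```python
from collections import ChainMap
from typing import Dict, Mapping

# Hypothetical vocabulary downloaded from a shared gesture library.
shared_library: Dict[str, str] = {
    "swipe_left": "next_page",
    "swipe_right": "previous_page",
    "pinch": "zoom_out",
}

# Hypothetical user customization: this user reverses the swipe directions.
user_overrides: Dict[str, str] = {
    "swipe_left": "previous_page",
    "swipe_right": "next_page",
}


def build_vocabulary(user: Mapping[str, str], library: Mapping[str, str]) -> ChainMap:
    """Combine vocabularies so user definitions take priority over the library."""
    return ChainMap(dict(user), dict(library))


vocabulary = build_vocabulary(user_overrides, shared_library)
```

Here the user's reversed swipes replace the library's mapping, while the untouched pinch gesture still resolves through the shared library, so the device works out of the box yet remains adjustable.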

Considering that the earth has such a wide variety of languages, gestures, and body language styles, it is

not practical to assume that a single gesture vocabulary could suffice. Too many direct opposites exist,

making a shared gesture meaning across cultures unobtainable. The plausibility of a gesture interface vocabulary,

on the other hand, is apparent. Humans naturally navigate to that which they can touch and interact

with on a physical level, receiving feedback from contact. Touching surfaces with hands or pointing at

objects are fundamentally natural gestures for humans. These gestures are the foundation for more

advanced gestures to naturally evolve. When or if an interface can be built using a customizable method

for gesturing, it could exponentially expand the possibilities of computers and humans communicating

fluently together. “When you find you can relate to computer on an intuitive basis, you are well on your

way to accepting the idea that man and computer can exist in intimate symbiosis [New York Magazine].”

1) Gesture: A form of non-verbal communication in which visible bodily actions communicate particular
messages, either in place of speech or together and in parallel with spoken words.
2) Dynamic Gesture: Gesture which requires a motion pattern, like waving or stirring.
3) Static Gesture: A gesture which is posed and held, like a thumbs up.
4) Gesticulation: The act of assisting spoken words with visually supportive gestures, mainly including the hands
and arms.
5) HCI: Human Computer Interaction.
6) Gesture Vocabulary: A collection of gestures that are mapped to functions or actions.
7) Co-verbal Gesture: A gesture used directly with words to convey additional meaning.
8) Pantomimes: A gesture used to imitate the actions of others.
9) Mimetic: The hand and finger motions describe an object’s main shape or representative feature.
10) Deictic: Point to establish the identity or spatial location of an object.
11) Semaphores: Using lights, flags, or arms as gesture tools for signaling.
12) Multimodal: An interface which recognizes more than 1 of the following: what you say, what you’re looking
at, gestures, and eye tracking.
13) Electromyography: A tool for measuring electric body signals to detect medical abnormalities, activation
levels, recruitment order or to analyze the biomechanics of human or animal movement.
14) Centrist or Authoritarian Approach: A single individual decides what gesture vocabulary should be used for all
users.
15) Consensus Approach: A group of users, either implicitly or explicitly, decide on a common vocabulary to
express a given set of commands.
16) Customized Approach: Each individual defines his/her own gesture vocabulary.
17) Emblems: Gestures used in place of words.
18) Illustrators: Gestures performed in cooperation with a word, to reinforce its meaning.
19) Regulators: Gestures used to control the speed and flow of the communication.
20) Adaptors: Gestures which release physical or emotional tension.
21) Affect displays: Gestures which display emotion; emotion plus gesture usually equals dramatic or
exaggerated motions.

Books___________________________________________________________________________________________________________

• Shneiderman, Ben, and Catherine Plaisant.
Designing the user interface: strategies for effective human-computer interaction.
Addison-Wesley Longman, 2009. eBook.
• Axtell, Roger.
Gestures: the do's and taboos of body language around the world.
John Wiley & Sons, 1991. eBook.
• Warman, Edward.
Gestures and Attitudes.
Boston: Lee and Shepard Publishers, 1892. 374. Print.
• Anderson, Nancy S.
Methods for Designing Software to Fit Human Needs and Capabilities.
Washington, D.C.: National Academy Press, 1984. Print.
• Saffer, Dan.
Designing for interaction: creating smart applications and clever devices.
Peachpit Pr, 2007. 44, 148. Print.
• Butow, Eric.
User interface design for mere mortals.
Addison-Wesley Professional, 2007. 141. Print.
• Raskin, Jef.
The humane interface: new directions for designing interactive systems.
Addison-Wesley, 2000. 9, 24. Print.
• Freeman W. and Roth M.
Orientation histograms for hand gesture recognition, International Workshop on
Automatic Face and Gesture Recognition.
Zurich, June 1995. Print.
• Nespoulous J., Perron P. and Lecours A.
The biological foundation of gestures: motor and semiotic aspects.
Lawrence Erlbaum Associates, Hillsdale, NJ. 1986. Print.
• Munk K.
Development of a gesture plug-in for natural dialogue interfaces, Gesture and Sign
Languages in Human-Computer Interaction.
International Gesture Workshop, GW 2001, London, UK. 2001. Print.
• Barfield, T.
The dictionary of anthropology.
Illinois, 1997. Blackwell Publishing. Print.

Magazines______________________________________________________________________________________________________

• Rose, Lacey. "Desmond Morris On Symbolic Gestures."
Forbes 24 Oct 2005: Web. 5 Feb 2011.
<http://www.forbes.com/2005/10/19/morris-desmond-gestures-culture-comm05-cx_lr_1024morris.html>.
• Antonoff, Michael. "Living In A Virtual World."
Popular Science. Jun 1993: 85. Print.
• O'Malley, Chris. "Computers & Software."
Popular Science. Mar 1998: 31. Print.

Website________________________________________________________________________________________________________

• "Gestures." Wikipedia. Web. 6 Feb 2011.
<http://en.wikipedia.org/wiki/List_of_gestures>.
• "Body Language." Wikipedia. Web. 6 Feb 2011.
<http://en.wikipedia.org/wiki/Body_language>.

PDF______________________________________________________________________________________________________________

• Karam, Maria.
"A framework for research and design of gesture-based human-computer interactions."
University of Southampton. (2006): Print.
<http://eprints.ecs.soton.ac.uk/13149/1/Thesis.pdf>
• Wachs, Juan.
"Optimal Hand Gesture Vocabulary Design Methodology for Virtual Robotic Control."
Ben-Gurion University of the Negev. (2006): Print.
<http://web.ics.purdue.edu/~jpwachs/papers/PHD_JUAN_JW.pdf>

Video___________________________________________________________________________________________________________

• "BBC Present: The Human Animal - The Language of the Body."
Google Video. Web. 5 Feb 2011.
<http://video.google.com/videoplay?docid=-3323021761394989726#>.
