in Web Design and Interactive Media from the Art Institute of Seattle
March 2011
By understanding which gestures are similar between cultures and what gesture vocabulary definition
methods have had the most success, humans will appropriately assimilate an understanding of the
plausibility and practicality of global hand and finger gesture technology into their lives.
Hand and Finger Gesture Interfaces: An Exploration of Global Practicality and Plausibility
Ever noticed that while people talk on the phone, they still gesticulate, even though the
person on the other end of the line cannot see their hands? Since humans first stood up on two
feet, hands have been a major part of communication. Pointing at prey, pounding a chest while
screaming, and touching are a few of the natural gestures that have evolved from this upward
movement [Barfield, 1997]. Gestures are ingrained so deeply that we use them even when we’re on the
phone talking to someone across town. The gestures are performed even though the
interface isn’t conveying our communicative gestures to the recipient. Gesture-understanding interfaces
are becoming popular and affordable, which increases practicality. But are the gestures, decided by HCI5
professionals and developers, the right symbolic language for the job? Is the scope of their gesture
vocabulary6 limited to malleable consumers, or are these gestures practical for all cultures and less
tech-savvy humans? This paper intends to evaluate the results of an evolution of gestures across cultures, in
order to uncover the practicality and plausibility of a global gesture vocabulary set. If gestures fulfill
these criteria, should a single individual or company decide the
vocabulary to be used by all users? Or should implementation follow an open-source mentality and allow users to
develop their own gesture motions, thereby defining custom vocabularies? By understanding which
gestures are similar between cultures and what gesture vocabulary definition methods have had the
most success, humans will appropriately assimilate an understanding of the plausibility and practicality
of global hand and finger gesture technology into their lives.
The urge to gesture is undeniable. It takes extra thinking and muscle control to keep gesticulation from
accompanying human communication. Try giving directions down the street without any gesturing.
Nonverbal behavior has accompanied speech for so long that some actions have moved into the
subconscious. For example, when someone calls out to another to see if they are OK, the first thing one may
do is display an OK gesture. They will not think about which finger or arm to display; it just shoots out
from their person and is instantly understood on the receiving end. This type of gesture is used to
provide a piece of information; it signals that they do not need immediate assistance. In less
than one second the gesture has communicated a few sentences, showing how quickly and effectively a gesture
replaces short verbal communication. As soon as someone blasts through the finish line in first place,
their arms shoot into the air in a victory gesture. They don’t think about how they are going to show
their victory when they win; it’s a natural reaction to the situation (unless they are in the National
Football League, where a dance is required). NFL aside, the emotional state of the individual is
clearly stated simply with their arms. According to Edward Warman in 1892, “All negative gestures fall
below the level of the shoulder-line; all positive gestures rise above the level of the shoulder-line. This is
fully illustrated by animals, their expressive agent being the tail.” Dogs are especially known for their
expressive tails. Often it is the first thing the human examines when confronting a dog. When a dog’s tail
is tucked between its legs, and its head is sunken down towards the floor, humans understand and
empathize with the gesture. Humans communicate with animals all the time, and it is gestures which
make this verbally impossible situation possible. This thesis paper does not intend to focus on
subconscious nonverbal communication, unintentional gesturing, or talking with animals. It is driven by the
fact that gestures are interwoven into human experience and communication. This is especially noted and
information. Touching is a naturally occurring human gesture and communication method. It is unique
in that it both sends and receives information. We process and consider the feedback of the
object being touched before making our next decision. Consider the simple gesture of touching a line in
a book: it communicates and draws attention to the information instantly. Touching is fast, precise, and
globally understood. Pointing, similar to touching, is another instant way to communicate vital
information by directing eyes towards the desired object. It is also a global hand gesture and, fortunately,
easy to implement in an interface. Touching and pointing start to fail as global gestural input methods
when they demand specifics and require additional pinpoint information beyond x and y positions. For
example, using both hands to point at separate objects could be interpreted as trying to communicate
length, start and end points, selection points, size, width, etc. Pointing and touching are a firm
gestures. According to
followed by semaphoric6
limbs (Figure 1). Over half of all human communication gestures are supported or driven by hands. It is
obvious while watching any human communicating in their native language that hands drive the
majority of the gesticulation4. Maria Karam also notes that gesturing travels over space very well, near
or far: waving across the airport at a family member, or brushing the hair of a companion. There are
gesture zones which influence the type of gestures performed [Karam, 2006]. The first zone is an
intimate zone where hand gestures range from touching to 18 inches away. The second zone is the personal
zone, which starts at around 18 inches from our person and ends about four feet away (arm’s length).
The third zone is known as the interpersonal space or social distance (four to eight feet), and the fourth
zone is the public distance zone, or anything more than eight feet. The following will primarily observe and
Ray Birdwhistell, an American anthropologist, coined the term kinesics, which is the study of body
language, facial expressions and gestures to interpret meaning. From his studies, he named an
adaptors20, and affect displays21. Emblems are gestures used in place of words, like the loser ‘L’ done
with the thumb and pointer finger. Illustrators are co-verbal gestures7, like clapping your hands together
in a squishing motion while talking about an opposing team. Regulators are gestures used to control the
speed and flow of the communication. Adaptors are gestures which release physical or emotional
tension, like wiping your forehead with the back of your hand. Affect displays are gestures which display
emotion, and emotion plus gesture usually equals dramatic or exaggerated motions. Hand gestures
could be classified using logic similar to Birdwhistell’s. This logic relates closely to that of Edward Warman, who
in 1892 identified the main functions of a hand to be: define or indicate, affirm or deny, mold or detect,
conceal or reveal, hold or surrender, accept or reject, inquire or acquire, support or protect, and caress
or assail. The hand does so much in our first and second zones of communication. Gesticulation starts to
make sense and appear similar between humans when the hand motions are classified as Warman
describes. For example, Desmond Morris’ film “The Human Animal – The Language of the Body”
presents individuals from around the world conversing in public spaces. Even for an observer who
does not know any of the language, the gesticulation offers many clues about the conversation, especially
Warman says “the use of the little finger represents delicacy and refinement,” which could be the first
clue of global gesture plausibility. The pinky is observed in gestures involving intricacy and pinpoint
instruction across cultures. The pinky is only involved with other gestures, though, and there is no single
pinky-related gesture that is cross-cultural. Study of global gestures quickly shows that cultures have
customized gestures to fit their needs. The ‘thumbs up’ hand gesture is a classic example, but so are the
‘head nod’, ‘go away’ or ‘come here’ gestures. The origins of thumbs up, according to Desmond Morris, lie
in the days of the gladiators in the Roman Colosseum [Rose, 2005]. At that time, thumbs down
indicated a stabbing motion, a gesture meant to indicate ending the life of the losing gladiator. Thumbs
up meant to let them live, or draw back the sword from the gladiator. The reason this gesture pair is a
classic example of cultural differentiation is that thumbs up means something entirely different to
people in Iran, Afghanistan, Nigeria and parts of Italy and Greece, where it is an obscene insult
[Axtell, 1991]. This being true, there is little hope for even an ‘OK’ or ‘yes’ symbol to have global
concurrence. More interesting examples are the seemingly common hand gestures to ‘go away’ or
‘come here’. It is particularly important that these motions be examined because in communicating
with computers, the entire experience is about moving through information. This means pulling and
pushing different views of information to the user. Swiping on the iPad is a ‘go away’ and ‘come here’
type of command, pushing the old view out to let the next in. Unfortunately, the direction that people motion
with their hands to gesture ‘go away’ or ‘come here’ is inconsistent. Desmond Morris evaluated this
gesture across cultures and found that some people motion their hand away from themselves to invite
something in. A pushing gesture logically makes sense as a rejection gesture, but others have evidently
learned through their environment to interpret it as an inviting motion. This does not change the
plausibility factor of this thesis. Gesticulation is unmistakably natural for humans, certainly more natural
than a keyboard and mouse, or a rectangle with buttons that all look the same.
In 1972, Warren Teitelman used experimental programming to develop the concept called DWIM (‘Do
What I Mean’), which included a set of gestures to substitute for functions. He was the first individual to
connect gestures to computer functions with elegance. His intent was similar to that of this thesis.
Defining a global vocabulary must start with a system that can efficiently read a user’s gestures and
intelligently ‘do what they mean’; this is one way of approaching a definition for a global vocabulary. It could be
Finding a foundation for classifying hand gestures into groups and breaking them down into smaller,
more manageable classes is vital to the gesture set’s success. This will make evaluation and
understanding much easier. To start, there are dynamic and static gestures [Freeman and Roth, 1995]. A
thumbs up is a static gesture, and waving is a dynamic gesture2. Thumbs up is a stationary
position of the hand that is held and sustained until the message is received. Waving is a gesture of
movement and repetition. The hand shakes back and forth trying to get attention. It is important that
these two gesture properties are supported in a global gesture vocabulary. The following lists single-hand
gestures, all of which are defined primarily by static or dynamic properties:
crossed fingers, finger-gun, middle finger, fist pump, loser-L, money, poking, Vulcan salute, wave, chop,
point, punch, etc.
Two-hand gestures: air quotes, applause, x across chest, gator chomp, hand rubbing,
jazz hands, victory clasp, whatever-W, chest pound, surrender.
Gestures with other body parts: air kiss, bowing, choking, drinking, curtsey, one knee, hand over heart,
mooning, nod, shrug, shush, throat slash, gang sign, cross, crossed arms, right arm across chest,
hand over eyes (look), hands over face, rocking out.
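The static/dynamic distinction described above could be sketched in code roughly as follows. This is only an illustration under stated assumptions: the function names, the normalized (x, y) hand positions, and the 0.05 movement threshold are all hypothetical and do not come from Freeman and Roth or any real recognizer. The idea is simply that a held pose barely moves, while a wave covers a lot of distance.

```python
from math import hypot


def path_length(points):
    """Total distance travelled through a sequence of (x, y) hand positions."""
    return sum(hypot(x2 - x1, y2 - y1)
               for (x1, y1), (x2, y2) in zip(points, points[1:]))


def classify_motion(points, threshold=0.05):
    """Label a sample 'static' if the hand barely moves, else 'dynamic'.

    threshold is a hypothetical tolerance in the same units as the points.
    """
    return "static" if path_length(points) <= threshold else "dynamic"


# Illustrative samples (made-up coordinates, normalized to a 0..1 frame).
thumbs_up = [(0.50, 0.50), (0.50, 0.51), (0.50, 0.50)]           # held pose, slight jitter
wave = [(0.30, 0.50), (0.70, 0.50), (0.30, 0.50), (0.70, 0.50)]  # repeated side to side

print(classify_motion(thumbs_up))  # static
print(classify_motion(wave))       # dynamic
```

A real interface would also need hand-shape recognition to tell a thumbs up from a fist; the sketch only covers the motion property.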
In 1986, Jean-Luc Nespoulous proposed the Nespoulous scheme [Nespoulous, 1986]. This specified
three categories of gestures: mimetic9, deictic10, and arbitrary. Mimetic gestures mimic the object being
described (think air guitar or invisible cell phone call). Deictic gestures are those often related to
pointing, with their reference determined by the situation. For example, when one specifies
that ‘this should go over there,’ the hand is used to direct without words. Arbitrary gestures are those
that are learned, agreed upon, or customized. Though uncommon, once learned they can be used and
understood instantly by others familiar with them. Maria Karam mentions semaphoric gestures in her
thesis, which are gestures involving objects. It’s important to recognize semaphoric gestures, because
holding an object or a hand position could be a key signal for separating gesticulation from a computer
command gesture. The typical example of semaphoric gestures is what airplane ground crews do
with their marshaling wands when communicating with the pilot while the plane backs up. Mimetic
gestures are interesting because as long as the referenced object is understood by both parties,
understanding can be transferred with just hands. An interface experience could potentially be tailored so
that all operations animate the way they function. For example, grabbing and dragging are often
indicated by a mouse with a hand icon, visually supporting the action the computer is taking. Newer
pointer icons even appear to grip the page as the click and drag is performed.
Edward Warman classified hand gestures into patterns and clusters. Patterns and clusters are similar to
the gesture categories from Maria Karam, Nespoulous, and Freeman, originating however from a macro,
rather than a micro, point of view. According to Warman, patterns were visual gestures, exaggeration
gestures, and fine point gestures, which are made up of static and dynamic, mimetic, and deictic
properties. Gesture clusters use sequences, combinations, symbols and animated properties to describe
unique gestures. Identifying gestures in this way is advantageous when describing to a human what the
system is capable of and what is available as a gesture. Waving is a left to right sequence which is
repeated. Verbally describing this movement to a new user of this gesture vocabulary would be simple.
Combinations of finger gestures alone could offer enough actions to support a large vocabulary,
including: 3 taps, 2 taps (index to middle), 3-finger tap, 2-finger tap, and 3-finger tap followed by 1-finger swipe.
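To make the size of such a vocabulary concrete, the combinations above can be modeled as short sequences ("clusters") of primitives. The sketch below is a rough illustration, not a proposed design: the choice of two motions and three finger counts, the helper names, and the command names in the table are all assumptions introduced here, since the text does not assign commands to gestures.

```python
from itertools import product

MOTIONS = ("tap", "swipe")
FINGER_COUNTS = (1, 2, 3)

# A primitive is (finger_count, motion); a cluster is a tuple of primitives.
PRIMITIVES = tuple(product(FINGER_COUNTS, MOTIONS))


def count_clusters(max_length):
    """Number of distinct clusters made of 1..max_length primitives."""
    return sum(len(PRIMITIVES) ** k for k in range(1, max_length + 1))


# Hypothetical command names; the source does not assign any.
VOCABULARY = {
    ((1, "tap"), (1, "tap"), (1, "tap")): "open",    # 3 taps
    ((1, "tap"), (1, "tap")): "zoom",                # 2 taps
    ((3, "tap"),): "show_menu",                      # 3-finger tap
    ((2, "tap"),): "select",                         # 2-finger tap
    ((3, "tap"), (1, "swipe")): "dismiss",           # 3-finger tap, then 1-finger swipe
}


def lookup(cluster):
    """Return the command mapped to a cluster, or None if it is unassigned."""
    return VOCABULARY.get(tuple(cluster))


print(count_clusters(3))                     # 258
print(lookup([(3, "tap"), (1, "swipe")]))    # dismiss
```

With only six primitives, clusters of up to three steps already give 6 + 36 + 216 = 258 possible gestures, which supports the point that finger combinations alone could carry a large vocabulary. Whether users could remember that many is a separate question.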
A gesture vocabulary entirely developed as gestures that must be performed in patterns and clusters
could be unique enough to reach an arbitrary gesture status, which Nespoulous said makes the action
easily recalled from the brain. Gestures which use repeated and combined motions could be easily
recognized by the interface, but could also be too complex. With humans, there is also a natural need,
when touching and gesturing, to repeat the gesture. It’s repeated to ensure delivery. This is especially
true with hand gestures. Humans repeatedly smash and get frustrated with unresponsive objects which
should be providing feedback. The button to cross the street is designed for smashing, a smart design
choice because it gets smashed while impatient people wait for the light to change. Buttons on a
keyboard get smashed if they don’t work, and humans point at an object over and over again if it’s not
being recognized by the recipient of the gesture. A global gesture library could use a unique gesture
system, such as one that only uses clusters. If a gesture set were simultaneously logical and abstract, the
gestures would not need to fight with natural human gestures. Apple has implemented a system using
gestures which are logical and relatively abstract. The gestures are simple to remember and
simple for the interface to receive, demonstrating the power of an abstracted gesture vocabulary.
Identifying, classifying and accumulating all of these gesture-describing words now forms the foundation
for a decision on the global plausibility and practicality of gesture interfaces. The hand and fingers work
together in an uncountable number of gesture combinations. This offers a more robust, natural and
efficient interface for digital communication. Anthropological studies by Desmond Morris clearly show
that it is impractical to pinpoint any single gesture that remains consistent across cultures. Hands are
also not meant for purely static gestures. The keyboard and mouse have familiarized users with
awkward objects to perform computer functions [Anderson, 1984]. A gesture set of themes, patterns,
and clusters could bring the natural gesticulation in conversation to devices which once felt lifeless. This
also switches the user’s mindset from static manipulation to dynamic manipulation. Hand gestures offer
more commands than a button interface ever could. The approach to introducing new gesture
technology needs to be studied and evaluated for the most practical method to emerge. The practicality
and plausibility of interface assimilation depend on it. From the data gathered thus far, the plausibility of a
gesture interface working globally is very high. There is no doubt that if a vocabulary were created, it
would pave the way for a much more interactive and communication-rich environment for a user and
gesture interface. Hands are natural assistants to all human communication. Gestures are one of a few
ways for humans to communicate without any words, across cultures, and bring meaning to a situation
Interpreting gesturing is certainly a plausible interface method that is supported as a natural method for
humans to communicate or support communication. However, since gesture interfaces are still
under-tested, and interface technology is still so new, gesture interfaces are not yet practical. Globally, the
keyboard and mouse are still trying to reach new territory. How could a magical hand-reading interface
do any better? Easy: imagine walking into a rural village. In a bag are two monitors and two
computers, as well as a keyboard and mouse, plus a multimodal12 computer. During set up of the two
machines, villagers are crowding around. Eventually, type one sentence on the keyboard, move the
mouse, and click a couple menu items. Then, switch to the multimodal computer. Do three gestures
(deictic, dynamic, pattern), speak one sentence in English, and watch them after motioning for them to
try. Also watch them all gesture and talk to each other, pointing at the machines, scratching their heads.
Which will a villager naturally gravitate towards? “The advantage of having gestures read directly off
your hand is that it’s more natural than groping for a mouse. Once harnessed, you can pay more
attention to the application at hand” [Popular Science, 1993]. However, these applications are still
expensive and unable to solve the various cultural problems, which makes assimilation difficult. No one
may feel particularly drawn towards such a drastic call to change within the computer world. The
establishment of a gesture vocabulary could open doors for practicality later. It is plausible for the
curious user.
Juan Wachs researched hand gesture vocabularies for his thesis on gesture based robotic control, and
broke the approaches down into 3 types: Centrist or Authoritarian14, Consensus15, and Customized16
[Wachs, 2006]. These classifications determine this thesis’s real plausibility, because the way the world
receives a gesture vocabulary is very important to its acceptance. Apple is notorious for the
Authoritarian approach; their devices require the user to conform to the input programmed by Apple.
Many other items, such as TV controllers and cars, have implemented their interfaces the same way.
The iPad comes equipped with multi-touch gesture recognition, but the user must learn the gestures
created by Apple. Apple is currently making an attempt to define a global gesture vocabulary using this
The Consensus approach has previously been used for smaller audiences [Munk, 2001], and not the
world. Trying to gather an equal number of users from all cultural backgrounds to agree upon a gesture
vocabulary is expensive, and from this thesis one could infer the results. None of the gestures would
jump out as largely common between groups. Furthermore, after the interface was released to the end
user, it would still feel Authoritarian, since that single user had no direct input on the gesture choices.
However, an initial consensus approach could inform an educated Authoritarian approach. The data of a
global gesture consensus approach to hand gesture based interfaces would be unprecedented and
The Customized approach would mean that the delivery of the unit would be a blank gesture slate, and
upon opening, would need calibration. The Customized approach requires the user to set his or her own
gesture library. This is a tricky path, because the instructions should be globally accessible and
understandable [Butow, 2007]. A blank slate is not always what people want; they like things to work
out of the box with little to no customization. There would have to be some sort of base interaction for
calibration, for which touch and/or pointing could substitute. The setup instructions would need to be
very well written, and give clear examples of the capabilities of the new gesture-based interface. This
approach would require not so much a gesture vocabulary as a system for gesture learning and
customization. This system could be an application for creating custom gesture vocabularies, at which
point one could assume that a library of gesture vocabularies could be available for the new user from a
gesture vocabulary market. This gives cultures the ability to find a common library on their own and modify it
until it’s widely used. This would be more of an Android approach, meaning certain companies and
Considering that the earth has such a wide variety of languages, gestures, and body language styles, it is
not practical to assume that a single gesture vocabulary could suffice. Too many direct opposites exist,
making a shared gesture meaning across cultures unobtainable. The plausibility of a gesture interface vocabulary,
on the other hand, is apparent. Humans naturally gravitate to that which they can touch and interact
with on a physical level, receiving feedback from contact. Touching surfaces with hands or pointing at
objects are fundamentally natural gestures for humans. These gestures are the foundation for more
advanced gestures to naturally evolve. When or if an interface can be built using a customizable method
for gesturing, it could exponentially expand the possibilities of computers and humans communicating
fluently together. “When you find you can relate to computer on an intuitive basis, you are well on your
way to accepting the idea that man and computer can exist in intimate symbiosis” [New York Magazine].
1) Gesture: A form of non-verbal communication in which visible bodily actions communicate particular
messages, either in place of speech or together and in parallel with spoken words.
2) Dynamic Gesture: Gesture which requires a motion pattern, like waving or stirring.
3) Static Gesture: A gesture which is posed and held, like a thumbs up.
4) Gesticulation: The act of assisting spoken words with visually supportive gestures, mainly including the hands
and arms.
5) HCI: Human Computer Interaction.
6) Gesture Vocabulary: A collection of gestures that are mapped to functions or actions.
7) Co-verbal Gesture: A gesture used directly with words to convey additional meaning.
8) Pantomimes: Gestures used to imitate the actions of others.
9) Mimetic: The hand and finger motions describe an object’s main shape or representative feature.
10) Deictic: Point to establish the identity or spatial location of an object.
11) Semaphores: Using lights, flags, or arms as gesture tools for signaling.
12) Multimodal: An interface which recognizes more than 1 of the following: what you say, what you’re looking
at, gestures, and eye tracking.
13) Electromyography: A tool for measuring electric body signals to detect medical abnormalities, activation
levels, recruitment order or to analyze the biomechanics of human or animal movement.
14) Centrist or Authoritarian Approach: A single individual decides what gesture vocabulary should be used for all
users.
15) Consensus Approach: A group of users, either implicitly or explicitly, decide on a common vocabulary to
express a given set of commands.
16) Customized Approach: Each individual defines his/her own gesture vocabulary.
17) Emblems: Gestures used in place of words.
18) Illustrators: Gestures performed in cooperation with a word, to reinforce its meaning.
19) Regulators: Gestures used to control the speed and flow of the communication.
20) Adaptors: Gestures which release physical or emotional tension.
21) Affect displays: Gestures which display emotion; emotion plus gesture usually equals dramatic or
exaggerated motions.
Books___________________________________________________________________________________________________________
Magazines______________________________________________________________________________________________________
Website________________________________________________________________________________________________________
PDF______________________________________________________________________________________________________________
Karam, Maria.
"A framework for research and design of gesture-based human-computer
interactions."
University of Southampton. (2006): Print.
<http://eprints.ecs.soton.ac.uk/13149/1/Thesis.pdf>
Wachs, Juan.
"Optimal Hand Gesture Vocabulary Design Methodology for Virtual Robotic Control."
Ben-Gurion University of the Negev. (2006): Print.
<http://web.ics.purdue.edu/~jpwachs/papers/PHD_JUAN_JW.pdf>
Video___________________________________________________________________________________________________________