
Hand Shape Recognition Using 3D Active Appearance Models

Ryo Yamashita¹, Tetsuya Takiguchi², Yasuo Ariki³

Graduate School of System Informatics, Kobe University, 1-1 Rokkodai, Nada, Kobe, 657-8501, Japan
¹ryo@me.cs.scitec.kobe-u.ac.jp; ²takigu@kobe-u.ac.jp; ³ariki@kobe-u.ac.jp

Abstract

In this paper, a recognition method for discerning complicated hand shapes using 3D models is proposed as an interface for high-functionality TV. In a conventional interface, the user has to show his or her hand directly in front of the camera installed on the TV because the system cannot recognize the hand shape when viewed from arbitrary directions. With this problem in mind, we have made it possible to track hand shapes in any direction by using 3D active appearance models (3D-AAMs). With the high-functionality range image sensor Kinect, RGB images and depth images of the targets can be obtained, from which the hand shape models are constructed. Using multiple 3D-AAMs, robust recognition of such complicated hand shapes in any direction becomes possible.
Keywords

Shape Recognition; Active Appearance Model; 3D Hand Model
Introduction
Recently, there has been high demand for new interface techniques for high-functionality TVs and personal computers that use hand gestures in order to free users from remote controls. With Kinect for Xbox 360, we can use our hands in place of a mouse to control a pointer, or to play movies and music. There are many advantages to being free from a remote control. For example, we can handle the main devices directly with our hands without having to remember how to use the controller buttons or worrying that the batteries will run out.
However, even if a high-functionality range image sensor is utilized, there are some limitations in using hand gestures compared to a conventional remote control, namely limitations associated with the field of recognition and the ability to recognize finger motions. In the Kinect interface, users control the pointer on the TV display with their hands and keep the pointer still for a while on the icon they want. Then they can convey their commands to the TV. This method is not only time-consuming in regard to giving operation commands, but it also easily results in mistakes in controlling the pointer.
On the other hand, when images taken through a camera are used, even the finger shape can be recognized using appearance information, and the total cost is reduced. However, this method requires the user to put his or her hands in front of a camera, and in some cases this makes things difficult for the user.

With these things in mind, in this paper, a method to recognize gestures (including the shape of the fingers) is proposed using both depth information and appearance information. First, multiple 3D hand and finger models are constructed using depth information and appearance information. Then the complicated shapes of the hand and fingers are recognized in any direction, and the 3D model is switched according to the shape change.
System Flow

FIG. 1 SYSTEM FLOW
Fig. 1 shows the flow of the proposed system. In the learning phase, multiple 3D models are trained using depth information obtained from Kinect and RGB images. In this study, the 3D-AAM is applied, which combines active appearance models (AAMs) [T.F. Cootes, 1995] [T.F. Cootes, 1998] [T.F. Cootes, 2002] and three-dimensional information into a three-dimensional model [J. Xiao, 2004] [M. Zhou, 2010] [V. Blanz, 1999]. The next step, in the recognition phase, is to fit each 3D model to the input images obtained from the Kinect sensor. Finally, the finger shape that has the minimal difference between the model and the input finger is output.
Hand Feature Extraction Using Multiple 3D-AAMs

Hand feature extraction using multiple 3D-AAMs [J. Xiao, 2004] is described in this section.
Active Appearance Models

Cootes proposed the AAM, which is mainly used to extract the feature points of a face, to represent the shape and texture variations of an object with a low-dimensional parameter vector [T.F. Cootes, 1995]. The subspace is constructed by applying principal component analysis (PCA) to the shape and texture of an object's feature points.
In the AAM framework, the shape vector x and the texture vector g of the object are given as follows:

x = (x_1, y_1, \ldots, x_n, y_n)^T    (1)

g = (g_1, g_2, \ldots, g_n)^T    (2)
where the shape x indicates the coordinates of the feature points, and the texture vector g indicates the gray levels of the image within the shape. For example, the AAM of the hand is constructed using 44 shape points, as shown in Fig. 2. The texture consists of the intensities at the pixels within the triangular areas spanned by the feature points, as shown in Fig. 3.

FIG.2HANDFEATUREPOINTS

FIG.3TRIANGULARAREASINSHAPE
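To make the texture sampling concrete, the following is a minimal Python sketch (not the authors' implementation) of collecting the gray-level vector g from the pixels inside the landmark triangles of Fig. 3. A full AAM samples the texture after warping the image to the mean-shape frame; here the warp is omitted for brevity, and the triangle index list and landmark array are assumed inputs.

```python
import numpy as np
import cv2  # OpenCV, used only to rasterize the triangles

def sample_texture(image, landmarks, triangles):
    """Collect the gray-level vector g (Eq. 2) from the landmark triangles.

    image     -- grayscale image (H x W, uint8)
    landmarks -- (n, 2) array of feature-point coordinates
    triangles -- list of index triples into `landmarks` (the mesh of Fig. 3)
    """
    mask = np.zeros(image.shape[:2], dtype=np.uint8)
    for (i, j, k) in triangles:
        pts = landmarks[[i, j, k]].astype(np.int32)
        cv2.fillConvexPoly(mask, pts, 255)  # rasterize one triangle
    return image[mask > 0]                  # texture vector g
```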
Next, PCA is applied to the training data in order to obtain the orthonormal matrices P_s and P_g. Using the obtained matrices, the shape vector and texture vector can be approximated as follows:

x = \bar{x} + P_s b_s    (3)

g = \bar{g} + P_g b_g    (4)
where \bar{x} and \bar{g} are the mean shape and mean texture of the training images, respectively, and b_s and b_g are the parameters of variation from the average. A further PCA is applied to the concatenated parameter vector b as follows:

b = \begin{pmatrix} W_s b_s \\ b_g \end{pmatrix} = \begin{pmatrix} W_s P_s^T (x - \bar{x}) \\ P_g^T (g - \bar{g}) \end{pmatrix} = Qc    (5)

Q = \begin{pmatrix} Q_s \\ Q_g \end{pmatrix}    (6)
where W_s is a diagonal weight matrix for the shape parameters, allowing for the difference in units between the shape and texture models, and Q_s and Q_g are the eigenvector matrices. c is a vector of parameters controlling both the shape and the gray levels of the model. Finally, the shape and texture are approximated as functions of c:

x(c) = \bar{x} + P_s W_s^{-1} Q_s c    (7)

g(c) = \bar{g} + P_g Q_g c    (8)
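As an illustration of Eqs. (3)-(8), the sketch below builds the shape, texture, and combined subspaces with PCA in Python. It is a minimal reading of the standard AAM construction, not the authors' code; the random training matrices and the scalar balancing used for W_s are assumptions (a common choice weights by the ratio of texture to shape variance).

```python
import numpy as np

def pca_basis(data, var_kept=0.98):
    """Return the mean and principal axes keeping `var_kept` of the variance."""
    mean = data.mean(axis=0)
    centered = data - mean
    # Rows are training samples; columns are shape or texture dimensions.
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    var = (s ** 2) / (s ** 2).sum()
    k = int(np.searchsorted(np.cumsum(var), var_kept)) + 1
    return mean, vt[:k].T                   # mean, P (columns = eigenvectors)

# Dummy training data so the sketch runs standalone (assumed sizes):
rng = np.random.default_rng(0)
shapes = rng.normal(size=(40, 88))          # 40 samples of 44 (x, y) points
textures = rng.normal(size=(40, 500))       # 40 sampled texture vectors

# Shape model (Eq. 3) and texture model (Eq. 4)
x_mean, P_s = pca_basis(shapes)
g_mean, P_g = pca_basis(textures)

# Combined model (Eqs. 5-6): balance units, stack the parameters, PCA again.
b_s = (shapes - x_mean) @ P_s               # per-sample shape parameters
b_g = (textures - g_mean) @ P_g
r = np.sqrt(b_g.var(axis=0).sum() / b_s.var(axis=0).sum())
W_s = r * np.eye(P_s.shape[1])              # one common choice for W_s
b = np.hstack([b_s @ W_s, b_g])
_, Q = pca_basis(b)                         # Eq. (6); c = Q^T b per sample
```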
Pose Parameter

Using the parameter c, it is possible to control variations in the shape and texture of the AAM. However, it is not possible to express the position of the object in the image, the size of the object, or the object pose. The pose parameter q is therefore defined as the global posture change as follows:

q = [roll  scale  trans_x  trans_y]    (9)
where roll indicates the rotation in the model plane, scale indicates the size of the model, and trans_x and trans_y indicate the translation along the x and y axes, respectively.
Tracking the AAM

The goal of the AAM search is to minimize the error E on the test image Img, as shown in Eq. (10), with respect to c and q:

E = [\, (\bar{g} + P_g Q_g c) - I(\mathrm{Img};\, W(x, q, b_s)) \,]^2    (10)
where W denotes the affine warp function, and I(Img; W(x, q, b_s)) indicates the affine-transformed image controlled by the pose parameter q on the test image. Thus, the optimal parameter c can be extracted from the test image.
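The minimization of Eq. (10) is typically done iteratively. The following is a hedged sketch of one such loop as a damped least-squares update; the helper `warp_sample` and the precomputed residual Jacobian `model.J` are hypothetical stand-ins for the standard AAM search machinery [T.F. Cootes, 1998], not the paper's exact procedure.

```python
import numpy as np

def fit_aam(img, model, c0, q0, n_iter=30, step=0.5):
    """Iterative AAM search sketch for Eq. (10).

    Hypothetical pieces: `model` bundles g_mean, P_g, Q_g, a precomputed
    Jacobian J, and shape_of(c); `warp_sample` applies the affine warp
    W(x, q, b_s) to the image and samples a texture vector.
    """
    c, q = c0.copy(), q0.copy()
    for _ in range(n_iter):
        g_model = model.g_mean + model.P_g @ model.Q_g @ c   # Eq. (8)
        g_image = warp_sample(img, model.shape_of(c), q)     # I(Img; W(x, q, b_s))
        r = g_image - g_model                                # texture residual
        # Damped least-squares step, in the spirit of Cootes' update scheme.
        delta = step * np.linalg.lstsq(model.J, r, rcond=None)[0]
        c = c - delta[:c.size]
        q = q - delta[c.size:]
    return c, q, float(r @ r)                                # parameters, error E
```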
3D-AAM

In this paper, the 3D-AAM is used to extract the hand features and to estimate the shape of the fingers. A 2D AAM must include various fluctuation components (images) because the object images used as training data include various changes, such as left-side or right-side views and upturned or downturned objects. However, if so many changes are included in the training data of the object, the AAM cannot express their variation through PCA, and the extraction accuracy of the feature points becomes lower. Since the variation of directions can be expressed as a geometric change of shape, a 3D shape with the right texture can express the directional variations. This is the reason why the 3D-AAM is employed for hand and finger shape recognition. The shape parameter is expanded into 3D using the z coordinate obtained from a depth image sensor, as shown in Eq. (11).

x = (x_1, y_1, z_1, \ldots, x_n, y_n, z_n)^T    (11)
RGB-D Images Obtained Using a Depth Image Sensor

In the course of creating the 3D-AAM, three-dimensional shape information is required for a target. The 3D shape can be obtained, for example, using a 3D scanner or a stereo camera. However, we chose to use a Kinect sensor because this device is equipped with an RGB camera and an infrared depth sensor. The 3D data expressed in Eq. (11) is obtained as a set of coordinate points on a target using the Kinect depth image sensor.
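For reference, back-projecting a Kinect depth pixel to the 3D coordinates used in Eq. (11) follows the usual pinhole model, sketched below. The intrinsic parameters are typical published values for the Kinect v1 depth camera, stated here as assumptions; a calibrated device should use its own values.

```python
import numpy as np

FX, FY = 594.2, 591.0   # Kinect v1 depth-camera focal lengths (pixels), assumed
CX, CY = 339.5, 242.7   # principal point, assumed typical calibration values

def depth_to_xyz(u, v, depth_mm):
    """Back-project a depth pixel (u, v) to camera coordinates in metres."""
    z = depth_mm / 1000.0               # Kinect reports depth in millimetres
    return np.array([(u - CX) * z / FX, (v - CY) * z / FY, z])

# Example: the 3D coordinates of pixel (320, 240) observed at 800 mm depth
# point = depth_to_xyz(320, 240, 800)
```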
Elimination of Background Images in Triangular Areas Using Background Subtraction

Background images are included in the triangular areas of Fig. 3. Therefore, in order to eliminate the background images, background subtraction is performed using the depth data obtained from the depth image sensor (Fig. 4).

FIG. 4 BACKGROUND SUBTRACTION
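The paper does not give the exact subtraction criterion, only that the depth data drives it. One simple, commonly used reading is a working-range threshold on the depth map, sketched below with illustrative (assumed) thresholds.

```python
import numpy as np

def foreground_mask(depth_mm, near=400, far=900):
    """Boolean mask keeping pixels within the hand's assumed working range.

    `near`/`far` (millimetres) are illustrative thresholds, not values
    taken from the paper.
    """
    valid = depth_mm > 0                # Kinect reports 0 where depth is unknown
    return valid & (depth_mm >= near) & (depth_mm <= far)

# The mask is then used to drop background pixels from the triangular
# texture areas, e.g.: g = gray_image[foreground_mask(depth_image)]
```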
Expanding the Pose Parameters

The 2D pose parameters in Eq. (9) are expanded into 3D by adding yaw and pitch, as shown in Eq. (12):

q = [yaw  pitch  roll  scale  trans_x  trans_y]    (12)
The effects of varying these parameters are shown in Fig. 5.

FIG. 5 3D POSE PARAMETERS (yaw, pitch, roll, scale, trans_x, trans_y)
Using the six parameters, the 2D AAM can be expanded into the three-dimensional AAM, and the parameters can transform the model as viewed from any direction, angle, and position. The transformation of the shape using this pose parameter is given as follows:

x_a = Trans \cdot Scale \cdot RotZ \cdot RotY \cdot RotX \cdot x_b    (13)
where x_a and x_b indicate the shape coordinates after and before the transformation, respectively. Each transformation matrix is given by Eqs. (14)-(18).

Trans = \begin{pmatrix} 1 & 0 & 0 & trans\_x \\ 0 & 1 & 0 & trans\_y \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}    (14)

Scale = \begin{pmatrix} scale & 0 & 0 & 0 \\ 0 & scale & 0 & 0 \\ 0 & 0 & scale & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}    (15)

RotX = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & \cos(yaw \cdot \pi/180) & -\sin(yaw \cdot \pi/180) & 0 \\ 0 & \sin(yaw \cdot \pi/180) & \cos(yaw \cdot \pi/180) & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}    (16)

RotY = \begin{pmatrix} \cos(pitch \cdot \pi/180) & 0 & \sin(pitch \cdot \pi/180) & 0 \\ 0 & 1 & 0 & 0 \\ -\sin(pitch \cdot \pi/180) & 0 & \cos(pitch \cdot \pi/180) & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}    (17)

RotZ = \begin{pmatrix} \cos(roll \cdot \pi/180) & -\sin(roll \cdot \pi/180) & 0 & 0 \\ \sin(roll \cdot \pi/180) & \cos(roll \cdot \pi/180) & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}    (18)
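The composition of Eq. (13) from the matrices of Eqs. (14)-(18) can be written compactly. The sketch below mirrors the equations using homogeneous 4x4 matrices, with angles given in degrees and converted by pi/180 as in the equations above.

```python
import numpy as np

def pose_matrix(yaw, pitch, roll, scale, tx, ty):
    """Compose Eq. (13) from the homogeneous matrices of Eqs. (14)-(18)."""
    a, b, c = np.radians([yaw, pitch, roll])   # degrees -> radians (pi/180)
    rot_x = np.array([[1, 0, 0, 0],
                      [0, np.cos(a), -np.sin(a), 0],
                      [0, np.sin(a),  np.cos(a), 0],
                      [0, 0, 0, 1]])                       # Eq. (16)
    rot_y = np.array([[ np.cos(b), 0, np.sin(b), 0],
                      [0, 1, 0, 0],
                      [-np.sin(b), 0, np.cos(b), 0],
                      [0, 0, 0, 1]])                       # Eq. (17)
    rot_z = np.array([[np.cos(c), -np.sin(c), 0, 0],
                      [np.sin(c),  np.cos(c), 0, 0],
                      [0, 0, 1, 0],
                      [0, 0, 0, 1]])                       # Eq. (18)
    s = np.diag([scale, scale, scale, 1.0])                # Eq. (15)
    t = np.eye(4); t[0, 3], t[1, 3] = tx, ty               # Eq. (14)
    return t @ s @ rot_z @ rot_y @ rot_x                   # Eq. (13)

# Usage: transform homogeneous shape points x_b stored as a 4 x n matrix:
# x_a = pose_matrix(10, -5, 0, 1.2, 15, -8) @ x_b
```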
Tracking the 3D-AAM

Since the 3D-AAM consists of three-dimensional points while the input image consists of two-dimensional pixels, it is necessary to project the 3D space onto the 2D space when the error between the input image and the 3D model is calculated. Using the function P, which projects the 3D space onto the 2D space, the error can be calculated by Eq. (19). The optimized parameters are then calculated using the same approach as in the 2D AAM.

E = [\, (\bar{g} + P_g Q_g c) - I(\mathrm{Img};\, P(W(x, q, b_s))) \,]^2    (19)
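A hedged sketch of evaluating Eq. (19) follows: project the posed 3D shape with a pinhole function P, sample the image texture at the projected points, and take the squared residual against the model texture. `sample_texture_at`, `model.shape_3d`, and the camera intrinsics are illustrative assumptions, not values or routines from the paper.

```python
import numpy as np

def project(points_xyz, fx=594.2, fy=591.0, cx=339.5, cy=242.7):
    """Pinhole projection P of Eq. (19): (n, 3) model points -> (n, 2) pixels."""
    x, y, z = points_xyz[:, 0], points_xyz[:, 1], points_xyz[:, 2]
    return np.stack([fx * x / z + cx, fy * y / z + cy], axis=1)

def model_error(img, model, c, q):
    """Squared texture residual of Eq. (19) (hypothetical helpers)."""
    pts2d = project(model.shape_3d(c, q))       # P(W(x, q, b_s))
    g_image = sample_texture_at(img, pts2d)     # hypothetical texture sampler
    g_model = model.g_mean + model.P_g @ model.Q_g @ c
    return float(np.sum((g_image - g_model) ** 2))
```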
Experiments

Multiple 3D-AAMs were constructed for the finger shapes and, for each frame of the finger video sequence, the model that produces the smallest error when the data is input into Eq. (19) is selected as the final result.
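In code, this selection rule might look like the following sketch: fit each of the four 3D-AAMs to the frame and keep the shape whose converged error is smallest. `fit_3d_aam` is a hypothetical routine wrapping the search of the previous section; `model_error` is the Eq. (19) residual sketched above.

```python
def classify_frame(img, depth, models):
    """Pick the finger shape whose fitted 3D-AAM gives the smallest Eq. (19) error."""
    best_shape, best_err = None, float("inf")
    for shape_id, model in enumerate(models, start=1):
        c, q = fit_3d_aam(img, depth, model)   # hypothetical fitting routine
        err = model_error(img, model, c, q)    # Eq. (19) residual, as above
        if err < best_err:
            best_shape, best_err = shape_id, err
    return best_shape
```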
FIG. 6 FOUR MODELS OF FINGER SHAPE (Shapes 1-4)

FIG. 7 EXAMPLES OF EXPERIMENT RESULTS
Experiment Conditions

Four types of three-dimensional models, shown in Fig. 6, were constructed in this experiment. The numbers of feature points included in the respective models are 44, 34, 34, and 38. During the construction of each model, four depth images were used for training. Before training, the background included in the training images was excluded using background subtraction. The training data and testing data were collected from the same single person and included the finger shapes with several variations taken from different directions.
Experiment Results

Table 1 shows the confusion matrix of the finger shape recognition results. The experiment showed that a high recognition rate was obtained for shape 1 and shape 2. However, false recognition can be seen between shape 3 and shape 4. It is thought that these finger shapes are similar, so the error of Eq. (19) becomes small for both models. Fig. 7 shows examples of the outcome of this experiment.
TABLE 1 CONFUSION MATRIX AMONG FOUR SHAPE MODELS

          Shape 1   Shape 2   Shape 3   Shape 4
Shape 1   1.0       0.0       0.0       0.0
Shape 2   0.0       1.0       0.0       0.0
Shape 3   0.0       0.011     0.966     0.023
Shape 4   0.0       0.0       0.327     0.673
Conclusions

In this paper, a finger shape recognition method based on multiple three-dimensional models has been proposed. Using a depth image sensor as the capture device, three-dimensional models were constructed from the acquired depth information and RGB images of the objects. The models recognized the finger shapes robustly against various changes. Improving the finger shape models for application in interfaces for TVs and other display devices is the focus of future research.
REFERENCES
Jing Xiao, Simon Baker, Iain Matthews, and Takeo Kanade, "Real-Time Combined 2D+3D Active Appearance Models." CVPR, 535-542, 2004.

Mingcai Zhou, Yangsheng Wang, and Xiangsheng Huang, "Real-Time 3D Face and Facial Action Tracking Using Extended 2D+3D AAMs." ICPR, 3963-3966, 2010.

T.F. Cootes, C.J. Taylor, D.H. Cooper, and J. Graham, "Active Shape Models - Their Training and Application." Computer Vision and Image Understanding, Vol. 61, No. 1, 38-59, 1995.

T.F. Cootes, G.J. Edwards, and C.J. Taylor, "Active appearance models." ECCV, 484-498, 1998.

T.F. Cootes, K. Walker, and C.J. Taylor, "View-Based Active Appearance Models." Image and Vision Computing, 657-664, 2002.

V. Blanz and T. Vetter, "A morphable model for the synthesis of 3D faces." SIGGRAPH, 187-194, 1999.

Ryo Yamashita received the B.E. in computer science from Kobe University in 2011. He is currently in the Graduate School of System Informatics, Kobe University.
Tetsuya Takiguchi received the B.S. degree in applied mathematics from Okayama University of Science, Okayama, Japan, in 1994, and the M.E. and Dr. Eng. degrees in information science from Nara Institute of Science and Technology, Nara, Japan, in 1996 and 1999, respectively. From 1999 to 2004, he was a researcher at IBM Research, Tokyo Research Laboratory, Kanagawa, Japan. Since 2004 he has been an Associate Professor at Kobe University. He stayed at the University of Washington as a visiting scholar from April 2012 to October 2012.

Dr. Takiguchi is a member of the IEEE, the Information Processing Society of Japan, and the Acoustical Society of Japan.
Yasuo Ariki received his B.E., M.E. and Ph.D. in information science from Kyoto University in 1974, 1976 and 1979, respectively. He was an assistant professor at Kyoto University from 1980 to 1990, and stayed at Edinburgh University as a visiting academic from 1987 to 1990. From 1990 to 1992 he was an associate professor and from 1992 to 2003 a professor at Ryukoku University. Since 2003 he has been a professor at Kobe University. He is mainly engaged in speech and image recognition and is interested in information retrieval and databases.

Prof. Ariki is a member of IEEE, IPSJ, JSAI, ITE and IIEEJ.
