Hand Shape Recognition Using 3D Active Appearance Models

Ryo Yamashita¹, Tetsuya Takiguchi², Yasuo Ariki³
Graduate School of System Informatics, Kobe University, 1-1 Rokkodai, Nada, Kobe, 657-8501, Japan
¹ryo@me.cs.scitec.kobe-u.ac.jp; ²takigu@kobe-u.ac.jp; ³ariki@kobe-u.ac.jp

International Journal of Computer Science and Application (IJCSA), Volume 2, Issue 2, May 2013
Abstract

In this paper, a recognition method to discern complicated hand shapes using 3D models is proposed as an interface for high-functionality TV. In a conventional interface, the user has to show his hand directly in front of the camera installed on the TV, because the hand shape cannot be recognized when viewed from arbitrary directions. With this problem in mind, we have made it possible to track hand shapes in any direction by using 3D active appearance models (3D-AAMs). With the high-functionality range image sensor Kinect, RGB images and depth images of the targets can be obtained, from which the hand shape models are constructed. Using multiple 3D-AAMs, robust recognition of such complicated hand shapes in any direction becomes possible.
Keywords

Shape Recognition; Active Appearance Model; 3D Hand Model
Introduction
Recently, there has been high demand for new interface techniques for use with high-functionality TVs or personal computers that use hand gestures in order to free users from remote controls. With [Kinect for Xbox 360], we can use our hands in place of a mouse to control a pointer, or play movies and music. There are many advantages to being free from a remote control. For example, we can operate the main devices directly with our hands, without keeping in mind how to use the controller buttons or worrying that the batteries will run out.
However, even if a high-functionality range image sensor is utilized, there are some limitations in using hand gestures compared to a conventional remote control, namely limitations associated with the field of recognition and the ability to recognize finger motions. In the Kinect interface, users control the pointer on the TV display with their hands and keep the pointer still for a while on the icon they want; they can then convey their commands to the TV. This method is not only time-consuming in regard to giving operation commands, but it also easily results in mistakes in controlling the pointer.
On the other hand, when images taken through a camera are used, even the finger shape can be recognized using appearance information, and the total cost is reduced. However, this method requires the user to put his hands in front of a camera, and in some cases this makes things difficult for the user.
With these things in mind, in this paper a method to recognize gestures (including the shape of the fingers) is proposed that uses both depth information and appearance information. First, multiple 3D hand and finger models are constructed using depth information and appearance information. Then the complicated shapes of the hand and fingers are recognized in any direction, and the 3D model is switched according to the shape change.
System Flow

FIG. 1 SYSTEM FLOW
Fig. 1 shows the flow of the proposed system. In the learning phase, multiple 3D models are trained using depth information obtained from Kinect and RGB images. In this study, the 3D-AAM is applied, which combines active appearance models (AAMs) [T.F. Cootes, 1995] [T.F. Cootes, 1998] [T.F. Cootes, 2002] and three-dimensional information into a three-dimensional model [J. Xiao, 2004] [M. Zhou, 2010] [V. Blanz, 1999]. The next step, in the recognition phase, is to fit each 3D model to the input images obtained from the Kinect sensor. Finally, the finger shape that yields the minimal difference between the model and the input finger is output.
Hand Feature Extraction Using Multiple 3D-AAMs

Hand feature extraction using multiple 3D-AAMs [J. Xiao, 2004] is described in this section.
Active Appearance Models
Cootes proposed the AAM, which is mainly used to extract the feature points of a face, to represent the shape and texture variations of an object with a low-dimensional parameter vector [T.F. Cootes, 1995]. The subspace is constructed by applying principal component analysis (PCA) to the shape and texture of an object's feature points.
In the AAM framework, the shape vector x and texture vector g of the object are given as follows:

x = (x_1, y_1, \ldots, x_n, y_n)^T  (1)

g = (g_1, g_2, \ldots, g_n)^T  (2)

where the shape x indicates the coordinates of the feature points, and the texture vector g indicates the gray level of the image within the shape. For example, the AAM of the hand is constructed using 44 shape points, as shown in Fig. 2. The texture consists of the intensities at the pixels within the triangular areas defined by the feature points, as shown in Fig. 3.
FIG. 2 HAND FEATURE POINTS

FIG. 3 TRIANGULAR AREAS IN SHAPE
Next, PCA is applied to the training data in order to obtain the orthonormal matrices P_s and P_g. Using the obtained matrices, the shape vector and texture vector can be approximated as follows:

x = \bar{x} + P_s b_s  (3)

g = \bar{g} + P_g b_g  (4)

where \bar{x} and \bar{g} are the mean shape and mean texture of the training images, respectively, and b_s and b_g are the parameters of variation from the average. A further PCA is applied to the vector b as follows:
b = \begin{pmatrix} W_s b_s \\ b_g \end{pmatrix} = \begin{pmatrix} W_s P_s^T (x - \bar{x}) \\ P_g^T (g - \bar{g}) \end{pmatrix} = Qc  (5)

Q = \begin{pmatrix} Q_s \\ Q_g \end{pmatrix}  (6)
where W_s is a diagonal weight matrix for each shape parameter, allowing for the difference in units between the shape and texture models, and Q_s and Q_g are the eigenvector matrices. c is a vector of parameters controlling both the shape and gray levels of the model. Finally, the shape and texture are approximated as functions of c:

x(c) = \bar{x} + P_s W_s^{-1} Q_s c  (7)

g(c) = \bar{g} + P_g Q_g c  (8)
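The construction in Eqs. (1)-(8) can be sketched with plain PCA. The sketch below is illustrative only: it assumes aligned shape vectors and shape-normalized textures are already available, and the single global scale used for W_s is a common choice, not necessarily the paper's.

```python
import numpy as np

def pca_basis(data, keep=0.95):
    """PCA via SVD of the centered data; returns the basis whose
    columns are the eigenvectors covering `keep` of the variance."""
    centered = data - data.mean(axis=0)
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    var = s ** 2
    k = int(np.searchsorted(np.cumsum(var) / var.sum(), keep)) + 1
    return vt[:k].T

def build_aam(shapes, textures, keep=0.95):
    """Sketch of the AAM parameterization of Eqs. (3)-(8).
    shapes:   (N, 2n) aligned shape vectors (x1, y1, ..., xn, yn)
    textures: (N, m)  shape-normalized texture vectors
    """
    x_mean, g_mean = shapes.mean(axis=0), textures.mean(axis=0)
    P_s = pca_basis(shapes, keep)    # shape basis, Eq. (3)
    P_g = pca_basis(textures, keep)  # texture basis, Eq. (4)

    # Per-sample parameters b_s, b_g by projection onto the bases.
    b_s = (shapes - x_mean) @ P_s
    b_g = (textures - g_mean) @ P_g

    # W_s balances the units of shape vs. texture; a single global
    # scale factor is used here (an assumption, not from the paper).
    r = np.sqrt(b_g.var() / max(b_s.var(), 1e-12))
    W_s = r * np.eye(P_s.shape[1])

    # Combined model: further PCA on the concatenated parameters, Eq. (5).
    b = np.hstack([b_s @ W_s, b_g])
    Q = pca_basis(b, keep)
    return x_mean, g_mean, P_s, P_g, W_s, Q
```

Reconstructing a shape from a combined parameter vector c then follows Eq. (7) directly: x = x_mean + P_s @ np.linalg.inv(W_s) @ Q[:P_s.shape[1]] @ c.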
Pose Parameter
Using the parameter c, it is possible to control variations in the shape and texture of the AAM. However, it is not possible to express the position of the object in the image, the size of the object, or the object pose. The pose parameter q is defined as the global posture change as follows:

q = [roll \; scale \; trans\_x \; trans\_y]  (9)
where roll indicates the rotation in the model plane, scale the size of the object, and trans_x and trans_y its position in the image.

FIG. 4 BACKGROUND SUBTRACTION
Expanding the Pose Parameters
The 2D pose parameters in Eq. (9) are expanded into 3D by adding yaw and pitch, as shown in Eq. (12):

q = [yaw \; pitch \; roll \; scale \; trans\_x \; trans\_y]  (12)

The moving variations of these parameters are shown in Fig. 5.
FIG. 5 3D POSE PARAMETERS (1. yaw, 2. pitch, 3. roll, 4. scale, 5. trans_x, 6. trans_y)
Using the six parameters, the 2D-AAM can be expanded into the three-dimensional AAM, and the parameters can transform the model viewed in all directions, angles, and positions. The transformation of the shape using this pose parameter is given as follows:

x_a = Trans \cdot Scale \cdot RotZ \cdot RotY \cdot RotX \cdot x_b  (13)

where x_a and x_b indicate the shape coordinates after and before the transformation, respectively. Each transformation matrix is given by Eqs. (14)-(18).
Trans = \begin{pmatrix} 1 & 0 & 0 & trans\_x \\ 0 & 1 & 0 & trans\_y \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}  (14)

Scale = \begin{pmatrix} scale & 0 & 0 & 0 \\ 0 & scale & 0 & 0 \\ 0 & 0 & scale & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}  (15)

RotX = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & \cos(yaw \cdot \pi/180) & -\sin(yaw \cdot \pi/180) & 0 \\ 0 & \sin(yaw \cdot \pi/180) & \cos(yaw \cdot \pi/180) & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}  (16)

RotY = \begin{pmatrix} \cos(pitch \cdot \pi/180) & 0 & \sin(pitch \cdot \pi/180) & 0 \\ 0 & 1 & 0 & 0 \\ -\sin(pitch \cdot \pi/180) & 0 & \cos(pitch \cdot \pi/180) & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}  (17)

RotZ = \begin{pmatrix} \cos(roll \cdot \pi/180) & -\sin(roll \cdot \pi/180) & 0 & 0 \\ \sin(roll \cdot \pi/180) & \cos(roll \cdot \pi/180) & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}  (18)
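Eqs. (13)-(18) compose standard homogeneous transforms. A minimal numpy sketch of this composition (angles in degrees, as in the equations; function names are our own):

```python
import numpy as np

def pose_transform(q):
    """Build the combined transform of Eq. (13) from the pose
    parameters q = [yaw, pitch, roll, scale, trans_x, trans_y]."""
    yaw, pitch, roll, scale, tx, ty = q
    ry, rp, rr = np.radians([yaw, pitch, roll])

    trans = np.eye(4)                          # Eq. (14): translation
    trans[0, 3], trans[1, 3] = tx, ty

    S = np.diag([scale, scale, scale, 1.0])    # Eq. (15): uniform scaling

    rot_x = np.eye(4)                          # Eq. (16): rotation about x by yaw
    rot_x[1:3, 1:3] = [[np.cos(ry), -np.sin(ry)],
                       [np.sin(ry),  np.cos(ry)]]

    rot_y = np.eye(4)                          # Eq. (17): rotation about y by pitch
    rot_y[0, 0], rot_y[0, 2] = np.cos(rp), np.sin(rp)
    rot_y[2, 0], rot_y[2, 2] = -np.sin(rp), np.cos(rp)

    rot_z = np.eye(4)                          # Eq. (18): rotation about z by roll
    rot_z[0:2, 0:2] = [[np.cos(rr), -np.sin(rr)],
                       [np.sin(rr),  np.cos(rr)]]

    # Eq. (13): x_a = Trans * Scale * RotZ * RotY * RotX * x_b
    return trans @ S @ rot_z @ rot_y @ rot_x

def apply_pose(points_xyz, q):
    """Transform (n, 3) model points x_b into x_a by the pose q."""
    homo = np.hstack([points_xyz, np.ones((len(points_xyz), 1))])
    return (homo @ pose_transform(q).T)[:, :3]
```

For example, a pose of [0, 0, 90, 1, 0, 0] (a 90-degree roll) rotates the point (1, 0, 0) onto (0, 1, 0) within the model plane.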
Tracking the 3D-AAM
Since the 3D-AAM consists of three-dimensional points and the input image consists of two-dimensional pixels, it is necessary to project the 3D space onto the 2D space when the error between the input image and the 3D model is calculated. Using a function P that projects the 3D space onto the 2D space, the error can be calculated by Eq. (19). The optimized parameters are then calculated using the same approach as in the 2D-AAM.

E = \left[ \bar{g} + P_g Q_g c - \mathrm{Im}\left(I; W(x', q', b_s)\right) \right]^2  (19)
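The error of Eq. (19) compares the model texture g(c) with the texture sampled from the input image under the current geometry. A simplified sketch: an orthographic projection stands in for the paper's (unspecified) projection function P, and per-point nearest-neighbor sampling stands in for the full piecewise-affine warp W.

```python
import numpy as np

def project_orthographic(points_3d):
    """Stand-in for the projection P: drop the z coordinate."""
    return points_3d[:, :2]

def fitting_error(model_texture, image, points_3d):
    """Squared texture error in the spirit of Eq. (19): sample the
    gray image at the projected model points and compare with the
    model texture (one intensity per point here, instead of the
    full warped patch)."""
    pts = np.round(project_orthographic(points_3d)).astype(int)
    h, w = image.shape
    pts[:, 0] = np.clip(pts[:, 0], 0, w - 1)  # x -> column index
    pts[:, 1] = np.clip(pts[:, 1], 0, h - 1)  # y -> row index
    sampled = image[pts[:, 1], pts[:, 0]]
    return float(np.sum((model_texture - sampled) ** 2))
```

During fitting, this error is what the parameter search drives toward zero; the clipping simply keeps projected points inside the image bounds.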
Experiments
Multiple 3D-AAMs are constructed for the finger shapes, and for each frame of the finger-movement sequence, the model that produces the smallest error when the data is input into Eq. (19) is selected as the final result.
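The selection among the multiple models is a simple arg-min over the fitted errors. Schematically (the `fit_model` callable is a stand-in of ours for the per-model fitting step):

```python
def recognize_shape(models, frame, fit_model):
    """Fit every 3D-AAM to the frame and return the label of the
    model with the smallest residual error (Eq. 19), together with
    that error. `fit_model(model, frame)` is assumed to run the
    fitting and return the final error value."""
    errors = {label: fit_model(model, frame) for label, model in models.items()}
    best = min(errors, key=errors.get)
    return best, errors[best]
```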
FIG. 6 FOUR MODELS OF FINGER SHAPE (SHAPES 1-4)
FIG. 7 EXAMPLES OF EXPERIMENT RESULTS
Experiment Conditions
The four types of three-dimensional models shown in Fig. 6 were constructed in this experiment. The numbers of feature points included in the respective models are 44, 34, 34, and 38 points. In the construction of each model, four depth images were used for learning. Before learning, the background included in the training images was excluded by background subtraction. The training data and testing data were collected from the same single person and included the finger shapes with several variations, taken from different directions.
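With a Kinect depth map available, the background removal from the training images can be done with a simple near-range depth threshold. The sketch below is an assumption of ours, including the threshold value; the paper does not specify its background-subtraction procedure.

```python
import numpy as np

def remove_background(rgb, depth, max_depth_mm=800):
    """Zero out pixels whose depth exceeds a near-range threshold,
    keeping only the region closest to the sensor (the hand).
    rgb:   (h, w, 3) uint8 color image
    depth: (h, w) depth map in millimeters (0 = no reading)
    """
    mask = (depth > 0) & (depth < max_depth_mm)
    out = rgb.copy()
    out[~mask] = 0  # suppress background pixels
    return out, mask
```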
Experiment Results
Table 1 shows the confusion matrix of the finger shape recognition results. The experiment showed that high recognition rates were obtained for shape 1 and shape 2. However, false recognition can be seen between shape 3 and shape 4. It is thought that these finger shapes are similar, so that the error of Eq. (19) becomes small for both. Fig. 7 shows examples of the outcome of this experiment.
TABLE 1 CONFUSION MATRIX AMONG FOUR SHAPE MODELS

          Shape 1   Shape 2   Shape 3   Shape 4
Shape 1   1.0       0.0       0.0       0.0
Shape 2   0.0       1.0       0.0       0.0
Shape 3   0.0       0.011     0.966     0.023
Shape 4   0.0       0.0       0.327     0.673
Conclusions
In this paper, a finger shape recognition method using multiple three-dimensional models has been proposed. Using a depth image sensor as the device, three-dimensional models were constructed from the acquired depth information and RGB images of the objects. The models recognized the finger shapes robustly against various changes. The improvement of the finger shape models to be applied in an interface to TVs or other display devices is the focus of further research.
REFERENCES

Jing Xiao, Simon Baker, Iain Matthews, and Takeo Kanade, "Real-Time Combined 2D+3D Active Appearance Models." CVPR, 535-542, 2004.
Mingcai Zhou, Yangsheng Wang, and Xiangsheng Huang, "Real-Time 3D Face and Facial Action Tracking Using Extended 2D+3D AAMs." ICPR, 3963-3966, 2010.
T.F. Cootes, C.J. Taylor, D.H. Cooper, and J. Graham, "Active Shape Models: Their Training and Application." Computer Vision and Image Understanding, Vol. 61, No. 1, 38-59, 1995.
T.F. Cootes, G.J. Edwards, and C.J. Taylor, "Active appearance models." ECCV, 484-498, 1998.
T.F. Cootes, K. Walker, and C.J. Taylor, "View-Based Active Appearance Models." Image and Vision Computing, 657-664, 2002.
V. Blanz and T. Vetter, "A morphable model for the synthesis of 3D faces." SIGGRAPH, 187-194, 1999.