
APPLICATION PROBLEMS

Analysis of Landmarks in Recognition of Face Expressions


N. Alugupally (a), A. Samal (a), D. Marx (b), and S. Bhatia (c)

(a) Department of Computer Science and Engineering, University of Nebraska-Lincoln
(b) Department of Statistics, University of Nebraska-Lincoln
(c) Department of Mathematics and Computer Science, University of Missouri-St. Louis
e-mail: sanjiv@acm.org

Received October 10, 2010

Abstract: Facial expression is a powerful mechanism used by humans to communicate their emotions, intentions, and opinions to each other. The recognition of facial expressions is extremely important for a responsive and socially interactive human-computer interface. An interface with a robust capability to recognize human facial expressions would allow automated systems to be deployed effectively in a variety of applications, including human-computer interaction, security, law enforcement, psychiatry, and education. In this paper, we examine several core problems in face expression analysis from the perspective of landmarks and the distances between them using a statistical approach. We have used statistical analysis to determine the landmarks and features that are best suited to recognize the expressions in a face. We have used a standard database to examine the effectiveness of a landmark-based approach to classify an expression (a) when a face with a neutral expression is available and (b) when there is no a priori information about the face.

Keywords: face expressions, statistical analysis, linear discriminant analysis, face features.

DOI: 10.1134/S105466181104002X

ISSN 1054-6618, Pattern Recognition and Image Analysis, 2011, Vol. 21, No. 4, pp. 681-693. Pleiades Publishing, Ltd., 2011. The article is published in the original.

1. INTRODUCTION

The human face is a rich source of nonverbal communication and may provide important information about human thought and behavior. Facial expression is a powerful and immediate means for human beings to communicate their emotions, intentions, and opinions to each other [7]. The communicative power of the face makes it a focus of attention during social interactions: it displays emotion, regulates social behavior, and signals communicative intent [7, 14]. A face is not only a multi-message system but also a multi-signal system. The signals inferred from a face can be characterized as static (such as skin color), slow (such as permanent wrinkles), and rapid (such as raising the eyebrows) [9].

The messages and signals emanating from the face form the facial expression and provide a guide to the disposition of a person. This makes facial expression important in the manner in which people conduct social interactions. Human beings, both consciously and unconsciously, use facial expressions to indirectly communicate their emotions and intentions. Hence, any system that interacts with humans needs to account for facial expression as a communication tool.

In recent years, there has been a growing interest in improving all aspects of interaction between humans and computers. The emerging field of human-computer interaction (HCI) has been of interest to researchers from a number of diverse fields, including computer science, psychology, and neuroscience. The recognition of facial expressions is important for a responsive and socially interactive HCI. Besides HCI, computers with the capability to recognize facial expressions will have a wide range of applications such as security, law enforcement, psychiatry, and education.

The recognition of facial expression forms a critical component of intelligent systems based on HCI. The more complex interactions in an HCI-based system take place from a human to a computer. An automated system that can determine the emotions of a person via his/her expressions has the opportunity to customize its response. For example, a robot capable of recognizing facial expressions can bring a social dimension to the interaction and can be used in daily life. A system that recognizes expressions can improve the effectiveness of an online/distance learning system. An automobile may be able to detect fatigue in its driver and warn him/her. The recognition of facial expressions can also be used in lie detection to provide incremental validity to polygraph examination. A long-term application of such systems may be in providing security in public places, where expressions and other body language provide clues to a person's emotional state. Although humans can easily recognize expressions, the development of automated systems that recognize facial expressions and infer emotions from those expressions in real time is a challenging research topic. We are interested in the importance of feature points in a human face for differentiating between expressions.

In this paper, we examine several core problems in face expression analysis to provide greater resolution to the understanding of the problem. Specifically, we have analyzed the effect of facial feature points on the expressions in the face. Our approach is based on measurements (distances and ratios) between a set of canonical feature points on a face to train an efficient classifier. We have examined the trackability of features as well as an optimal set of features for expression recognition in the presence or absence of a face with a neutral expression. Using the information thus collected, we examine the effectiveness of a scheme to dynamically recognize an expression in a video sequence. All these results provide greater understanding toward developing automated systems that can dynamically recognize the facial expressions of humans.

2. RELATED RESEARCH

The importance of expressions in a face is well established. Psychologists have grappled with the expressions in a human face for a long time. However, the use of a computer to recognize faces and facial expressions is a relatively recent topic of research, with most of the work initiated in the past decade. In this section, we discuss the role and importance of facial expressions in the field of psychology, the development of databases of facial expressions, and the research in recognizing the expression in a face by a computer.

2.1. Psychology of Expressions

Psychologists have been working on the analysis and understanding of human facial expressions since the nineteenth century. The earliest documented research on the subject is attributed to Darwin, who hypothesized that there are universal facial expressions for emotions, that is, that different cultures express emotions on the face in the same manner [6]. However, a majority of the early studies concluded that the expressions in the face could not be attributed to the emotions [22]. More recently, the research has concentrated on the classification of facial expressions. Ekman and Friesen [8] proposed six primary emotions, each of which has a prototypical facial expression involving changes in multiple regions of the face. These prototypical emotional displays are also known as the six basic emotions: happy, surprise, sad, fear, anger, and disgust. The next step was to create a quantifiable description of these emotions, which is provided by the Facial Action Coding System (FACS) developed by Ekman and Friesen [10]. FACS was designed based on human observations to detect subtle changes in facial features.

FACS is a detailed, technical guide that explains how to categorize facial behaviors based on the muscles that produce them. It relates muscular action to facial appearance. It uses a set of primitive measurement units, called Action Units (AUs), each of which corresponds to a linguistic description of the muscles involved. The richness of AUs allows for the representation of all the visible facial expressions, either by an individual AU or by a combination of a set of AUs. FACS uses 44 AUs to describe facial expressions with regard to their location as well as their intensity. The intensity is further defined on a scale with five levels of magnitude, from trace to maximum. Some other relevant research on the interaction of expressions and emotions is presented in [18, 43, 44].

2.2. Face Expression Databases

The best known database for facial expression analysis has been developed at Carnegie Mellon University and is known as the CMU-Pittsburgh AU-Coded Facial Expression Database, or the Cohn-Kanade database [19]. It provides a large, representative test bed for comparative studies of different approaches to facial expression analysis. The database includes approximately 2000 image sequences from over 100 subjects. The subjects are 100 university students enrolled in introductory psychology classes; they ranged in age from 18 to 30 years. The subject distribution across genders is 69% female and 31% male. In terms of racial distribution, 81% of the subjects are Euro-American, 13% are Afro-American, and 6% belong to other races. The Cohn-Kanade database was created in an observation room equipped with a chair for the subject and two Panasonic WV3230 cameras, each connected to a Panasonic S-VHS AG-7500 video recorder with a Horita synchronized time-code generator. The cameras were located directly in front of the subject. Some other databases are described in [26, 47].

2.3. Automated Face Expression Analysis

Most of the early research in facial expression analysis was primarily in the field of psychology. One of the earliest references in expression analysis dates back to 1978, when Suwa et al. [39] presented a preliminary investigation of expression analysis using an image sequence. The automatic analysis of facial expressions is a complex task, as the physiognomies of faces vary considerably from one individual to another due to differences in age, ethnicity, gender, facial hair, and occluding objects such as glasses and hair. A detailed survey of many different aspects of face expression analysis has been presented by Pantic and Rothkrantz [30]. Here we briefly summarize three important stages of automated facial expression analysis: (a) face tracking, (b) facial expression data extraction, and (c) expression classification.

Face Tracking: The face tracking stage starts with the automatic detection of the face in the frame under consideration. The detection method should be able to
locate faces in complex scenes, possibly with a cluttered background. Some expression analysis methods require the exact position of the face to extract the features of interest, whereas other methods can work with an approximate location of the face. Tracking is similar to the face detection problem, but in a dynamic environment. Hong et al. used the PersonSpotter system to perform real-time tracking of faces [17, 38]. The PersonSpotter system detects a face and creates a bounding box around it; it then obtains the exact dimensions of the face by fitting a labeled graph onto the bounding box. Another technique to locate faces was developed by Essa and Pentland [12] using the view-based and modular eigenspace method developed by Pentland et al. [33].

Facial Expression Data Extraction: The data extraction for facial expressions takes place after the face is successfully tracked. This step can be categorized into two classes: feature-based and template-based data extraction. The feature-based data extraction methods use texture and spatial information of the features in a face, such as the eyes, mouth, eyebrows, and nose, to classify the facial expression. The template-based methods use 2-D or 3-D models of the head and face as templates to extract information that can be used to classify the expression. Essa et al. proposed a 3-D facial model augmented by anatomically based muscles [12]. They used a Kalman filter together with optical flow computation to extract muscle actions in order to form a new model of facial action. Tomasi and Kanade [42] developed a feature tracker based on the matching measure known as the sum of squared intensity differences (SSD), using a translation model. This was followed by an affine transformation model by Shi and Tomasi [37]. The tracker described by Shi and Tomasi [37] is based on earlier work by Lucas, Kanade, and Tomasi [24, 42] and is commonly known as the KLT tracker. The KLT tracker locates features by examining the minimum eigenvalue of gradient matrices. The features are then tracked using the Newton-Raphson method to minimize the difference between two frames, using a small window around the location of the point in the two consecutive frames.

Expression Classification: Expression classification is the final step performed in a facial expression recognition system. The classification is done either (a) directly from the face or its features or (b) by recognizing the action units (AUs) first and then using FACS [10] to classify the expressions.

FACS-based Classification: In FACS-based classification, a system uses temporal information in terms of AUs to discriminate between the expressions. Such systems may use optical flow to track the motion of facial features such as the eyes, brows, nose, and mouth in a rectangular bounding box. This requires the manual initialization of all the bounding boxes surrounding the facial features at the outset.
The system then performs the tracking of the features for all frames, from neutral to the full-blown expression. Seyedarabi et al. [35] developed a facial expression recognition system to classify expressions based on facial features using two classifiers: neural networks and a fuzzy inference system. Essa et al. developed an optical flow based system to recognize action units from facial motions [12]. Pantic and Rothkrantz [31] describe a rule-based system to classify expressions based on 30 feature points; a total of 31 AU classes are considered in the system. Lien et al. [23] use a Hidden Markov Model (HMM) based approach to recognize the AUs. El Kaliouby and Robinson use a multi-level dynamic Bayesian network classifier to model complex mental states [11]; the mental states include agreement, concentrating, disagreement, interested, thinking, and unsure. Support vector machines and their variants have been used widely to classify facial expressions [20, 27, 46]. Kotsia and Pitas [21] use geometric deformations at candidate nodes in a face during facial expressions to train a multiclass support vector machine to classify the expressions. Valstar and Pantic describe a hybrid model that combines a support vector machine and an HMM classifier to recognize the AUs [45]; the hybrid model performed better than a system that uses only an SVM. Tian et al. track the lips and use neural networks to identify 16 AUs in face image sequences [41]. Pantic and Patras use a particle filtering approach to recognize 27 AUs by tracking the movements of feature points in the face [28, 29]. An information fusion approach using dynamic Bayesian networks (DBNs) to model the temporal dynamics of facial expressions is described in [16, 50]; the recognition of facial expressions is accomplished by fusing the information from the current as well as past observations. Bartlett et al. present early work on the detection of spontaneous facial expressions using a FACS-based system [2]; they use support vector machines and AdaBoost to predict the intensity of the AUs in each frame of the video.

Non-FACS-based Classification: Gokturk et al. [15] proposed a 3-D model-based face tracker that does not require the user to assume a particular 3-D pose. In this approach, pose and shape characteristics are factored into two separate signature vectors through tracking. The facial pose is recognized using monocular 3-D tracking of the face in each frame. The shape vector is then passed to a support vector machine trained for that pose, which recognizes the expression. Cohen et al. [5] used a naive Bayes classifier to recognize emotions through facial expressions displayed in a video sequence. A wire-frame model with 12 facial motion measurements, based on the model proposed by Tao and Huang [40], is used for facial expression recognition. The 12 features correspond to the magnitudes of the 12 facial motion measurements defined in the face model, and the combination of these features defines the seven basic classes of facial expressions, including neutral.
Training and testing of the classifier was done in both person-dependent and person-independent manner using 6 subjects. Zhao et al. [52] use a fuzzy kernel along with support vector machines to classify the six basic expressions. Bindu et al. [3] use a 3-D cognitive model to represent expressions with positive and negative reinforcers. Yeasin et al. [49] use a discrete HMM to learn the characteristics of the expressions. Anderson and McOwan [1] describe a real-time system for facial expressions in dynamic and cluttered scenes using optical flow; motion signatures are obtained from the face and classified using support vector machines. Displacements of a set of feature points are used by Zhou et al. [53] to classify expressions as well. Yang et al. proposed a framework for video-based facial expression recognition [48]. They use Haar-like features as the face representation and cluster the features for different expressions to derive temporal patterns; classifiers are designed using a boosting approach, and the effectiveness of the approach is demonstrated on the Cohn-Kanade database. Zhao and Pietikainen extend the concept of texture to the temporal domain and use it for expression analysis [51]. The texture is modeled with volume local binary patterns that combine both motion and appearance; to manage computational costs, the co-occurrences on only three orthogonal planes are considered.

In summary, a significant amount of work has already been performed in automating the process of recognizing expressions from a face. In recent years, the focus has been on FACS-based methods. Despite much progress, there are still significant challenges in solving this problem. Our research adds to the body of knowledge by examining several central issues, including the trackability of features and optimal features (distances and displacements) based on fiducial points. We also present a statistical approach to classify faces with expressions and provide a mechanism to classify expressions dynamically.

3. DATA SETS

There is no standard dataset that is universally used for uniform comparison of different algorithms. Many researchers use the Cohn-Kanade database [19] to analyze face expressions, even though it has some limitations. We have also used this database for our experiments; its details are described in Section 2.2. In addition, we use a database of feature points that is described below.

Feature Points Database: Expressions manifest themselves as changes in the face, which can be tracked by monitoring the locations of key features. For example, the eyebrows become more elliptical (raised) while expressing surprise but do not change their shape during happiness. For our research, we have extracted the locations of a set of features in the face images. These are obtained in two steps. First, the key points are manually located in the first image using a graphical user interface. Then these features are tracked using an automated method. Figure 1 shows the features that are used in our research. The set of points is based on features used for face reconstruction surgery [13] and has also been used for a number of face recognition applications [34, 36].

Fig. 1. Feature points in the face.

We have adapted the automated tracker developed by Lucas and Kanade [24] to track the feature points in successive image frames as the face changes expression from neutral to a full-blown expression. In our experiments, we use a 9 × 9 pixel window to search for the feature point in the current frame, based on its location in the previous frame. The window size was determined experimentally and is related to the image resolution; it can be adapted appropriately for larger images. Once the features are tracked over all the frames, their locations are stored in a database of feature points. All subsequent analysis is performed using these feature points. It should be noted that the manual initial step is not a serious limitation of our work. Many automated feature extraction algorithms are being developed [4], and they can be used to locate the initial feature points. Furthermore, in many applications in which the identity of a person is known, the feature points can be extracted once and stored in a database; these can be used to perform an initial tracking to locate the feature points in the first frame.
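To make the tracking step concrete, the following is a minimal sketch of pyramidal Lucas-Kanade point tracking with a 9 × 9 search window using OpenCV, offered as a stand-in for the adapted tracker described above; the frame file names and the initial landmark file are placeholders, not part of the original work.

```python
import cv2
import numpy as np

# Placeholder inputs: grayscale frames of one expression sequence and the
# 23 manually marked landmarks of the first (neutral) frame.
frame_files = ["frame_000.png", "frame_001.png", "frame_002.png"]      # hypothetical paths
initial_points = np.loadtxt("landmarks_frame0.txt", dtype=np.float32)  # shape (23, 2), hypothetical file

prev_gray = cv2.imread(frame_files[0], cv2.IMREAD_GRAYSCALE)
prev_pts = initial_points.reshape(-1, 1, 2)        # (N, 1, 2) layout expected by OpenCV

tracked = [prev_pts.reshape(-1, 2).copy()]         # landmark locations, one array per frame
for fname in frame_files[1:]:
    gray = cv2.imread(fname, cv2.IMREAD_GRAYSCALE)
    # Pyramidal Lucas-Kanade optical flow; winSize=(9, 9) mirrors the 9 x 9
    # search window mentioned in the text. status[i] == 0 marks a lost point.
    next_pts, status, err = cv2.calcOpticalFlowPyrLK(
        prev_gray, gray, prev_pts, None, winSize=(9, 9), maxLevel=2)
    lost = status.ravel() == 0
    next_pts[lost] = prev_pts[lost]                # keep the last known location of lost points
    tracked.append(next_pts.reshape(-1, 2).copy())
    prev_gray, prev_pts = gray, next_pts

print(len(tracked), "frames tracked,", tracked[0].shape[0], "points per frame")
```

In practice, the status flags returned by such a tracker correspond roughly to the correct/lost distinction analyzed in Section 4.1.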
4. ANALYSIS OF FACIAL EXPRESSIONS

The goal of this section is to answer some fundamental questions that will be helpful in developing an automated system. These questions are:

1. Which feature points in the face are easy to track during the development of expressions?

2. What is the best feature set to recognize expressions (a) if the identity of the person is not known and (b) if the identity of the person is known and we have a neutral face image?

3. How well can we classify expressions with the optimal set of features?

4. How do we build an optimized system to recognize expressions in a dynamic setting?

4.1. Tracking Efficiency

The basis of an automated expression recognition system is tracking the motion of the relevant landmarks in a human face. Detecting feature points dynamically is a challenge due to large variations in facial size, feature location, pose, lighting conditions, and motion patterns. Some features are tied to more rigid parts of the face, e.g., the corners of the eyes, while others have greater degrees of freedom in their movements, e.g., the middle parts of the lips. Some features are stable for some expressions but unstable for others. Identification of the features that can be reliably tracked will provide guidance for automated expression analysis. For our analysis, we use the Cohn-Kanade database described in Section 2.2. We have manually identified a set of 23 feature points in each starting frame, as shown in Fig. 1. The initial feature points are then passed to an automated tracker which follows the points in successive frames until the end of the expression.

To determine the effectiveness of the tracker, we analyzed the tracked points individually and as a function of expression. The feature points are classified into three categories based on how well they are tracked:

1. Correct: If a point is correctly tracked, it is classified as correct.

2. Lost: If the tracker is not able to find a feature point, it is labeled as lost. This generally happens because the location of the feature point differs significantly between successive frames.

3. Drifted: If the tracker finds a match for the feature point in a frame, but does not do so accurately, we classify it as drifted.

Fig. 2. Success rates for tracking individual feature points: (a) all expressions combined; (b) by expression.

Figure 2a shows the percentage of correctly tracked features for all types of expressions combined, and Fig. 2b shows the percentage of feature points that are correctly tracked for the different expressions. Figure 2 shows that the features are best tracked (80% on average) during the sad, happy, fear, and anger expressions.
Tracking is most difficult during the surprise expression (70% on average). This is because there is a sudden change in the location of the features during a surprise expression; the feature points fall out of the search window for tracking, resulting in the failure of the tracker. Feature point 20 (middle of the mouth) is the most difficult point to track in general and is especially hard for surprise, happy, and fear. During these expressions the mouth opens, making the tracking of the point very difficult as the composition of the point disappears. Feature points 22 and 23 (the left point and the midpoint of the upper lip) are also relatively hard to track (60% on average for all types of expressions) because the points lack a distinct window that can be tracked in consecutive frames. The uppermost points on the lip (feature points 16, 17, and 18) are successfully tracked around 70% of the time. Feature points 1, 10, 13, 14, and 15 (midline of the forehead, inner and outer sides of the left eye, and the lowest point on the left eye) are tracked most successfully for all types of expressions due to their relatively small displacement during the evolution of the expression.

Based on these observations, our general conclusion is that feature points 20, 22, and 23 should not be used for analyzing expressions due to their poor trackability (less than 50%). Furthermore, since the uppermost points of the lips are also lost most of the time, we took the average of those three points (16, 17, and 18) to increase the tracking percentage. Thus, we limit ourselves to a total of 18 feature points that are easy to track for our analysis. Next, we look at the set of features that are used to recognize an expression.

4.2. Optimal Feature Set to Recognize an Expression

We select features to recognize an expression by analyzing two different scenarios: (a) when we have a face with an expression but no a priori information is available, and (b) when a neutral face is also available for the same subject. The motivation for using a small subset of feature points is the significant reduction in recognition time, a critical issue in real-time applications.
Analysis with no a priori knowledge: When there is no a priori information about the face, our goal is to classify a given face as either neutral or one of the six basic expressions. Our dataset consists of 598 images: 299 images with neutral expressions and 299 images with the six full-blown expressions. For our analysis, we start with the set of 18 feature points described earlier. We use two types of distance between each pair of feature points and the following convention to represent them:

Ei,j: Euclidean distance between feature points Pi and Pj, where 1 <= i, j <= 18 and j > i.
Bi,j: Block distance between feature points Pi and Pj, where 1 <= i, j <= 18 and j > i.

The Euclidean distance d_E and the block distance d_B between the points Pi and Pj are given by

$$d_E(P_i, P_j) = \sqrt{(x_i - x_j)^2 + (y_i - y_j)^2},$$
$$d_B(P_i, P_j) = |x_i - x_j| + |y_i - y_j|.$$

Thus a total of 153 values for each of the Euclidean distance and the block distance are computed. After computing these distances, we normalize them by the distance between the left end of the left eye and the right end of the right eye. Our goal is to analyze how well a face can be classified into one of the expressions based on the Euclidean distance, the block distance, and their combination. We used stepwise discriminant analysis to obtain the 10 most important distances between different feature points (Table 1).

Table 1. Most significant distances of different types (without a priori knowledge)
Step:                1       2       3       4       5       6       7       8       9       10
Euclidean distance:  E04,18  E17,18  E08,16  E08,09  E03,15  E12,18  E12,17  E16,18  E16,17  E12,16
Block distance:      B04,18  B16,18  B17,18  B08,09  B03,15  B02,17  B02,13  B04,16  B14,18  B06,17
Mixed distances:     E04,18  E17,18  E08,16  E08,09  E03,15  E12,18  E12,17  B12,17  B12,18  E04,16

Figure 3 shows the recognition rates for the three approaches for the different expressions. It shows that images with the sad expression have the lowest recognition rate; the most common misclassifications for these images are fear and neutral. Overall, the worst recognition rate is observed for images with the expression of fear when using the block distance. Images with neutral, surprise, and disgust expressions have the highest recognition rates in all modes. Using both Euclidean and block distances in conjunction resulted in better overall performance (82.6%) than using only the Euclidean distance (79.8%) or the block distance (78.4%).

Fig. 3. Recognition rates for the different expressions with different distances.

Analysis with a priori knowledge of the neutral face: In many applications, it is reasonable to expect that a neutral face of a person is available at the outset. Examples of such situations include human-computer interfaces where the expressions are likely to be monitored continuously. In such cases, in addition to the Euclidean and block distances, we can use the displacement of the features from their neutral positions. We use both horizontal and vertical displacements, defined as follows: the displacement values (i,h) and (i,v) represent the horizontal and vertical displacement of feature i from the neutral face to the expressive face.

Since we use 18 feature points, we have a total of 36 displacements, 153 Euclidean distances, and 153 block distances. Using these variables, we apply discriminant analysis to estimate the classification of the various faces. Table 2 summarizes the 10 best features for each category, given by their indices in the set of 18 features, determined using a stepwise discriminant procedure.

Table 2. The 10 most important distances of different types (with a priori knowledge)
Step:                          1       2       3       4       5       6       7       8       9       10
Displacements:                 02,v    18,v    17,h    15,v    15,h    01,v    14,v    17,v    03,v    02,h
Euclidean distances:           E15,17  E15,18  E03,18  E02,14  E15,16  E01,14  E02,17  E13,14  E01,18  E02,03
Block distances:               B02,17  B15,17  B02,15  B03,17  B01,16  B03,14  B13,16  B03,13  B02,08  B04,07
Distances and displacements:   E05,17  E15,17  B02,15  B02,17  E03,17  01,h    E03,14  B15,17  E01,18  B01,02

Figure 4 shows the recognition rates for the four approaches for the different expressions. The overall recognition rate using displacements only is 84%. If both displacements and distances are allowed in the analysis, the best variables are able to explain around 46.5% of the difference between the various expressions.

Fig. 4. Recognition rates for the expressions with displacements and distances.

It should be noted that some of the distances in Table 2 are different from the ones listed in Table 1, which lists the significant features when no a priori information is available.
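As an illustration of the feature construction used in this section, the sketch below computes the 153 normalized Euclidean distances, 153 normalized block distances, and 36 displacements from 18 tracked landmarks. The landmark arrays are synthetic, and the indices of the two eye corners used for normalization are placeholders that would have to match the actual point numbering of Fig. 1.

```python
import numpy as np
from itertools import combinations

def expression_features(points, neutral_points=None, left_eye_idx=0, right_eye_idx=1):
    """Distance (and optional displacement) feature vector for one face.

    points, neutral_points: (18, 2) arrays of landmark coordinates.
    left_eye_idx, right_eye_idx: placeholder indices of the left end of the
    left eye and the right end of the right eye (actual numbering per Fig. 1).
    """
    points = np.asarray(points, dtype=float)
    # Normalizing distance between the outer eye corners.
    eye_dist = np.linalg.norm(points[left_eye_idx] - points[right_eye_idx])

    euclidean, block = [], []
    for i, j in combinations(range(len(points)), 2):      # C(18, 2) = 153 pairs
        dx, dy = points[i] - points[j]
        euclidean.append(np.hypot(dx, dy) / eye_dist)     # E_{i,j}
        block.append((abs(dx) + abs(dy)) / eye_dist)      # B_{i,j}

    features = euclidean + block
    if neutral_points is not None:
        # Horizontal and vertical displacements of each point from the neutral face.
        disp = points - np.asarray(neutral_points, dtype=float)
        features += disp[:, 0].tolist() + disp[:, 1].tolist()   # 18 + 18 = 36 values
    return np.array(features)

# Synthetic example: a neutral face and a slightly deformed expressive face.
rng = np.random.default_rng(0)
neutral = rng.uniform(0, 100, size=(18, 2))
expressive = neutral + rng.normal(0, 2, size=(18, 2))
print(expression_features(expressive, neutral).shape)     # (342,) = 153 + 153 + 36
```

A stepwise discriminant procedure, as used in the paper, would then select the most informative of these 342 variables.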

4.3. Expression Classification

We use a statistical approach to classify a face with an expression. For our analysis, we use the complete Cohn-Kanade database, in which each image sequence starts with a neutral expression, gradually progresses through the expression, and ends with a full-blown expression. We use linear discriminant analysis (LDA) for the final classification of the face. The features provided to LDA are the optimal set of features described in Section 4.2.

We determine the probability of a frame being one of the six expressions, based on the features in that frame, using discriminant analysis. The frame is classified into the expression with the highest probability. We perform a separate analysis for each type of expression to determine how quickly the expression is recognized. We examine the progression of the assigned probabilities of all expressions as the face changes from neutral to that particular expression. In each case, we average the probabilities over the number of such expression sequences. Assuming that the prior probabilities of group membership (in our case the six expressions) are known and that the group-specific density at an observation x can be estimated, we can compute p(t|x), the probability of x belonging to group t, by applying Bayes' theorem:

$$p(t \mid x) = \frac{q_t f_t(x)}{f(x)}, \qquad (1)$$

where q_t is the prior probability of membership in group t, f_t is the probability density function for group t, and $f(x) = \sum_u q_u f_u(x)$ is the estimated unconditional density at x.

Linear discriminant analysis partitions a p-dimensional vector space into a set of regions {R_t}, where the region R_t is the subspace containing all p-dimensional vectors y such that p(t|y) is the largest among all groups. An observation is classified as coming from group t if it lies in the region R_t. Assuming that each group has a multivariate normal distribution, linear discriminant analysis develops a classification criterion using a measure of squared Mahalanobis distance. The classification criterion is based on the individual within-group covariance matrices; each observation is placed in the class from which it has the smallest squared Mahalanobis distance. The squared Mahalanobis distance from x to group t, where V_t is the within-group covariance matrix, is given by

$$d_t^2(x) = (x - m_t)' V_t^{-1} (x - m_t), \qquad (2)$$

where m_t is the p-dimensional vector containing the variable means in group t. The group-specific density estimate at x from group t is then given by

$$f_t(x) = (2\pi)^{-p/2} \, |V_t|^{-1/2} \exp\bigl(-0.5\, d_t^2(x)\bigr). \qquad (3)$$

Combining Eqs. (1) and (3), the posterior probability of x belonging to group t is given by

$$p(t \mid x) = \frac{\exp\bigl(-0.5\, D_t^2(x)\bigr)}{\sum_u \exp\bigl(-0.5\, D_u^2(x)\bigr)}. \qquad (4)$$

The discriminant scores $D_u^2(x)$ can be calculated using the linear discriminant functions derived by LDA. An observation is classified into group u if setting t = u produces the largest value of p(t|x) or, equivalently, the smallest value of $D_t^2(x)$. The linear functions derived by LDA to calculate the discriminant scores for the six expressions are

D_anger = 3 - 13 E05,17 + 135 E15,17 - 52 B02,15 + 26 B02,17 - 3 E03,17 + 39 (01,v) - 27 E03,14 - 113 B15,17,
D_disgust = 4 - 24 E05,17 + 100 E15,17 - 66 B02,15 + 58 B02,17 + 76 E03,17 + 13 (01,v) + 16 E03,14 - 36 B15,17,
D_fear = 2 + 4 E05,17 - 10 E15,17 + 16 B02,15 - 17 B02,17 + 14 E03,17 + 13 (01,v) - 14 E03,14 + 34 B15,17,
D_happy = 5 - 0 E05,17 + 204 E15,17 + 3 B02,15 - 10 B02,17 + 86 E03,17 - 20 (01,v) + 87 E03,14 - 100 B15,17,
D_sad = 0 + 8 E05,17 + 31 E15,17 - 4 B02,15 - 4 B02,17 + 12 E03,17 + 16 (01,v) + 3 E03,14 - 36 B15,17,
D_surprise = 9 + 33 E05,17 + 106 E15,17 - 16 B02,15 + 31 B02,17 + 35 E03,17 - 12 (01,v) - 48 E03,14 - 50 B15,17.

Once the discriminant scores $D_u^2(x)$ for each expression are calculated from these functions, the posterior probability of each expression is computed using Eq. (4). Figure 5 shows the probabilities of the different expressions as a neutral face is transformed into a face with each expression. Figure 5a shows the probabilities for an average anger image sequence. It can be seen that the probabilities of the other four expressions (disgust, fear, happy, and surprise) begin at zero and remain negligible throughout the development of the expression. The probability of anger gradually goes up as we progress through the expression, while the probability of sad goes down simultaneously; by the seventh frame the probabilities of anger and sad are equal, at around 0.45, and by about the 14th frame the probability of anger reaches around 0.7. Similarly, Figs. 5b-5f show the probabilities of the different expressions as a neutral face is transformed into a face with disgust, fear, happy, sad, and surprise, respectively.

Fig. 5. Progression of probabilities of different expressions for different image sequences: (a) anger; (b) disgust; (c) fear; (d) happy; (e) sad; (f) surprise.
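As a sketch of how Eqs. (2)-(4) can be turned into code, the function below computes, for one feature vector, the squared Mahalanobis distance to each expression, folds in the covariance log-determinant and the prior to obtain a generalized distance D_t^2(x), and normalizes the exponentials as in Eq. (4). The per-expression means, covariance matrices, and priors are placeholders that would be estimated from training data.

```python
import numpy as np

def posterior_probabilities(x, means, covariances, priors):
    """Posterior p(t|x) for each expression t, following Eqs. (2)-(4).

    x: (p,) feature vector; means: dict of (p,) mean vectors; covariances:
    dict of (p, p) within-group covariance matrices; priors: dict of q_t.
    """
    scores = {}
    for t, m in means.items():
        diff = x - m
        d2 = diff @ np.linalg.inv(covariances[t]) @ diff        # Eq. (2)
        # Generalized squared distance: adds the log-determinant from Eq. (3)
        # and the prior from Eq. (1); the (2*pi)^(-p/2) factor cancels in Eq. (4).
        _, logdet = np.linalg.slogdet(covariances[t])
        scores[t] = d2 + logdet - 2.0 * np.log(priors[t])
    # Eq. (4): normalized exponentials of -0.5 * D_t^2(x).
    shift = min(scores.values())                                # for numerical stability
    unnorm = {t: np.exp(-0.5 * (s - shift)) for t, s in scores.items()}
    total = sum(unnorm.values())
    return {t: v / total for t, v in unnorm.items()}

# Toy example with two synthetic expression classes and placeholder statistics.
rng = np.random.default_rng(1)
p = 8
means = {"anger": rng.normal(size=p), "sad": rng.normal(size=p)}
covs = {t: np.eye(p) for t in means}
priors = {t: 0.5 for t in means}
print(posterior_probabilities(rng.normal(size=p), means, covs, priors))
```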

4.4. Design of a Dynamic Expression Classifier

A critical issue in any real-time application, where the expression must be determined almost immediately, is the number of features used in the analysis. More features result in added tracking time and greater complexity in the classifier. Figure 6a shows the overall recognition rate of the expressions as a function of the number of features; the performance levels off after about the top 8 features. Therefore, we use only those features for our dynamic classifier (shown in Fig. 6b), where a red line indicates the Euclidean distance between two feature points, a green line indicates a block distance, and a blue circle indicates the displacement of the point between the current frame and the initial frame.

Fig. 6. (a) Recognition rates for all the expressions as a function of the number of features; (b) the eight features used in the dynamic classifier: 01,v; B02,15; B02,17; B15,17; E05,17; E03,17; E03,14; E15,17.

Our goal in the design of the dynamic classifier is to recognize the expression accurately and as early as possible as the expression develops in a face. The classifier should identify the initial few frames as neutral; as the expression slowly develops, it should classify the frames as mixed (neutral/expression); and once the full-blown expression is formed, it should classify the expression correctly. In order to develop this dynamic classifier, it is important to understand the progression of the probabilities of the different expressions during the process of expression development (shown in Figs. 5a-5f). We make the following observations from the figures.

All the expressions start with a probability of about 0.7 for sad, while the other five expressions have probabilities below 0.2. For the five expressions anger, disgust, fear, happy, and surprise, when the probability of classification reaches 0.5 in a frame, the frame can be safely classified as that expression.

If the probability of sad remains over 0.75, we can safely classify the expression as sad. Using these observations, we can classify full-blown expressions accurately. In order to classify mixed expressions, we use the following two heuristics. If the probability of one of the five expressions is between 0.3 and 0.5, we say that the face is changing from neutral to an expression and classify the frame as mixed.
If the probability of sad is between μs - σs and μs + σs, we classify the frame as mixed (sad), where μs is the average value of the probability of the sad expression in the first frame and σs is its standard deviation. If none of the above conditions hold, the frame is classified as neutral. The schematic of the dynamic classifier is given in Fig. 7.

Fig. 7. Schematic for the dynamic classifier.
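A minimal sketch of the frame-labeling logic described above is given below; `probs` is assumed to be the dictionary of per-frame posterior probabilities produced by the discriminant analysis, and `mu_sad`/`sigma_sad` are the mean and standard deviation of the sad probability in the initial frames, so the thresholds simply restate the heuristics of this section.

```python
def label_frame(probs, mu_sad, sigma_sad):
    """Assign a dynamic label to one frame.

    probs: dict mapping the six expressions to posterior probabilities,
           e.g. {"anger": 0.1, ..., "sad": 0.7}.
    mu_sad, sigma_sad: mean and standard deviation of the sad probability
           observed in the first (neutral) frames.
    """
    others = {e: p for e, p in probs.items() if e != "sad"}
    best, p_best = max(others.items(), key=lambda kv: kv[1])

    if p_best > 0.5:                                   # full-blown non-sad expression
        return best
    if probs["sad"] >= 0.75:                           # full-blown sad expression
        return "sad"
    if 0.3 < p_best <= 0.5:                            # expression still developing
        return "mixed neutral/" + best
    if mu_sad - sigma_sad < probs["sad"] < mu_sad + sigma_sad:
        return "mixed neutral/sad"
    return "neutral"

# Example frame in which anger is still building up.
print(label_frame({"anger": 0.42, "disgust": 0.05, "fear": 0.04,
                   "happy": 0.02, "sad": 0.40, "surprise": 0.03},
                  mu_sad=0.70, sigma_sad=0.05))        # -> mixed neutral/anger
```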
In order to analyze the performance of the algorithm, we extracted a set of frames from the Cohn-Kanade database. Since it is difficult to determine exactly when a frame turns from neutral to a mixed expression and then to a full-blown expression, we used the following rules to label the frames: the first three frames in a sequence are assumed to be neutral, the last three frames are assumed to be the full-blown expression, and the middle three frames are assumed to be mixed. Table 3 shows the performance of the classifier for these three sets of frames.

Table 3. Classification rates (%) of the dynamic classifier for the three sets of frames (rows: actual label; columns: assigned label)
Actual          Neutral    Neutral/Expr    Expression    Other
Neutral         93.6       1.2             2.5           2.7
Neutral/Expr    18.1       13.0            63.9          4.9
Expression      0.9        2.8             92.8          3.5

From the table we see that we were able to correctly classify the first set of frames as neutral (or neutral/expression) 95% of the time. The middle set is classified into a wrong expression only about 5% of the time. It is interesting to note that, in a large number of instances, by the time we reach the middle frames, the expression has already been fully formed. The final, full-blown expression is classified correctly 96% of the time, either as the full-blown expression or as neutral/expression. From this table it is clear that the classifier is sufficiently accurate for dynamic applications. Some reasons for misclassification are (a) wrong ground truth, (b) classifier error, and (c) incomplete or early formation of the expression.

5. CONCLUSION AND FUTURE WORK

In this research, we studied how expressions can be recognized in a human face based on changes in facial features. The automatic analysis of the face and facial expressions is rapidly evolving into a mature scientific discipline. Our research answers some of the fundamental questions useful in developing automated systems to recognize expressions based on landmarks. We have analyzed how facial expressions can be recognized using Euclidean and block distances and displacements of the feature points in a face using statistical analysis. We have identified the features and distances which are useful in differentiating between expressions. Our results show that it is possible to get accurate results with as few as 8 features. We have also developed an algorithm to dynamically classify expressions as they are being formed and evaluated its effectiveness.
An important extension to our manual extraction of features would be to locate the feature points without any assistance from the user. In this framework, if the user goes out of the frame and returns, the application can relocate the points and restart. In addition, the search window used to locate a feature in a new frame can be dynamically adjusted by examining the probabilities of the different expressions. Further research can be done on measuring the intensity of an expression on the face. Identification of expressions other than the six primary ones can also be useful in many applications. The motion of the features can be modeled using techniques such as particle filtering to get a better characterization of the expressions. The research can also be extended to examine datasets with spontaneous expressions [32] and challenging illumination conditions [25].

REFERENCES
1. K. Anderson and P. W. McOwan, A Real-Time Automated System for the Recognition of Human Facial Expressions, IEEE Trans. Syst., Man Cybernet. B 36, 96–105 (2006).
2. M. S. Bartlett, G. C. Littlewort, M. G. Frank, C. Lainscsek, I. R. Fasel, and J. R. Movellan, Automated Face Analysis by Feature Point Tracking Has High Concurrent Validity with Manual FACS Coding, J. Multimedia 1, 22–35 (2006).
3. M. H. Bindu, P. Gupta, and U. S. Tiwary, Cognitive Model-Based Emotion Recognition from Facial Expressions for Live Human Computer Interaction, in CIISP 2007: Proc. IEEE Symp. on Computational Intelligence in Image and Signal Processing (Honolulu, 2007), pp. 351–356.
4. P. Campadelli and R. Lanzarotti, Fiducial Point Localization in Color Images of Face Foregrounds, Image Vision Comput. 22, 863–872 (2004).
5. I. Cohen, N. Sebe, A. Garg, L. S. Chen, and T. S. Huang, Facial Expression from Video Sequences: Temporal and Static Modeling, Computer Vision Image Understand. 91, 160–187 (2003).
6. C. Darwin, The Expression of the Emotions in Man and Animals (Univ. Chicago Press, 1965).
7. P. Ekman, Facial Expression and Emotion, Am. Psychol. 48, 384–392 (1993).
8. P. Ekman and W. V. Friesen, Constants across Cultures in the Face and Emotion, J. Person. Social Psychol. 17, 124–129 (1971).
9. P. Ekman and W. V. Friesen, Unmasking the Face: A Guide to Recognizing Emotions from Clues (Prentice Hall, Englewood Cliffs, 1975).
10. P. Ekman and W. V. Friesen, Facial Action Coding System: A Technique for the Measurement of Facial Movement (Consulting Psychologists Press, Palo Alto, 1978).
11. R. El Kaliouby and P. Robinson, Real-Time Inference of Complex Mental States from Facial Expressions and Head Gestures, in CVPRW 04: Proc. IEEE Computer Soc. Conf. on Computer Vision and Pattern Recognition Workshops (Washington, 2004), p. 154.
12. I. A. Essa and A. P. Pentland, Coding, Analysis, Interpretation and Recognition of Facial Expressions, IEEE Trans. Pattern Anal. Mach. Intell. 19, 757–763 (1997).
13. L. G. Farkas, Anthropometry of the Head and Face (Raven Press, New York, 1994).
14. A. J. Fridlund, Human Facial Expression: An Evolutionary View (Acad. Press, San Diego, 1994).
15. S. B. Gokturk, J. Y. Bouguet, C. Tomasi, and B. Girod, Model-Based Face Tracking for View-Independent Facial Expression Recognition, in Proc. 5th Int. Conf. on Automatic Face and Gesture Recognition (Washington, 2002), p. 287.
16. H. Gu, Y. Zhang, and Q. Ji, Task Oriented Facial Behavior Recognition with Selective Sensing, Comp. Vision Image Understand. 100, 385–415 (2005).
17. H. Hong, H. Neven, and C. von der Malsburg, Online Facial Expression Recognition Based on Personalized Galleries, in Proc. 3rd IEEE Int. Conf. on Automatic Face and Gesture Recognition (Nara, 1998), pp. 354–359.
18. W. James, What Is an Emotion?, Mind 9, 188–205 (1884).
19. T. Kanade, J. Cohn, and Y. Tian, Comprehensive Database for Facial Expression Analysis, in Proc. 4th IEEE Int. Conf. on Automatic Face and Gesture Recognition (Grenoble, 2000), pp. 46–53.
20. I. Kotsia, N. Nikolaidis, and I. Pitas, Facial Expression Recognition in Videos Using a Novel Multi-Class Support Vector Machines Variant, in ICASSP 2007: Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing (Honolulu, 2007), Vol. 2, pp. 585–588.
21. I. Kotsia and I. Pitas, Facial Expression Recognition in Image Sequences Using Geometric Deformation Features and Support Vector Machines, IEEE Trans. Image Processing 16, 172–187 (2007).
22. C. Landis, Studies of Emotional Reactions: II. General Behavior and Facial Expression, J. Comparative Psychol. 4, 447–510 (1924).
23. J. J. Lien, T. Kanade, J. Cohn, and C. Li, Detection, Tracking and Classification of Action Units in Facial Expression, J. Robotics Autonom. Syst. 31, 131–146 (2000).
24. B. D. Lucas and T. Kanade, An Iterative Image Registration Technique with an Application to Stereo Vision, in Int. Joint Conf. on Artificial Intelligence (Vancouver, 1981), pp. 674–679.
25. Machine Perception Laboratory, MPLab GENKI-4K Face, Expression, and Pose Dataset. Available from: http://mplab.ucsd.edu/wordpress/?page_id=398
26. D. Matsumoto and P. Ekman, Japanese and Caucasian Facial Expressions of Emotion (JACFEE) (Intercultural and Emotion Research Laboratory, Department of Psychology, San Francisco State Univ., 1998) (unpublished slide set).
27. P. Michel and R. El Kaliouby, Real Time Facial Expression Recognition in Video Using Support Vector Machines, in Proc. 5th Int. Conf. on Multimodal Interfaces (Vancouver, 2003), pp. 258–264.
28. M. Pantic and I. Patras, Detecting Facial Actions and Their Temporal Segments in Nearly Frontal-View Face Image Sequences, in Proc. IEEE Int. Conf. on Systems, Man and Cybernetics (Hawaii, 2005), pp. 3358–3363.
29. M. Pantic and I. Patras, Dynamics of Facial Expression: Recognition of Facial Actions and Their Temporal Segments from Face Profile Image Sequences, IEEE Trans. Syst., Man Cybern. B 36, 433–449 (2006).
30. M. Pantic and L. J. M. Rothkrantz, Automatic Analysis of Facial Expressions: The State of the Art, IEEE Trans. Pattern Anal. Mach. Intell. 22, 1424–1445 (2000).
31. M. Pantic and L. J. M. Rothkrantz, Expert System for Automatic Analysis of Facial Expression, Image Vision Comput. 18, 881–905 (2000).
32. M. Pantic, M. F. Valstar, R. Rademaker, and L. Maat, Web-Based Database for Facial Expression Analysis, in Proc. IEEE Int. Conf. on Multimedia and Expo (ICME05) (Amsterdam, 2005), pp. 317–321.
33. A. Pentland, B. Moghaddam, and T. Starner, View-Based and Modular Eigenspaces for Face Recognition, in CVPR 94: Proc. IEEE Computer Soc. Conf. on Computer Vision and Pattern Recognition (Seattle, 1994), pp. 84–91.
34. A. Samal, V. Subramanian, and D. Marx, Sexual Dimorphism in Human Faces, J. Visual Commun. Image Rep. 18, 453–463 (2007).
35. H. Seyedarabi, A. Aghagolzadeh, and S. Khanmohammadi, Recognition of Six Basic Facial Expressions by Feature-Point Tracking Using RBF Neural Network and Fuzzy Inference System, in ICME 04: Proc. IEEE Int. Conf. on Multimedia and Expo (Taipei, 2004), pp. 1219–1222.
36. J. Shi, A. Samal, and D. Marx, Face Recognition Using Landmark-Based Bidimensional Regression, Comp. Vision Image Understand. 102, 117–133 (2006).
37. J. Shi and C. Tomasi, Good Features to Track, in CVPR 94: Proc. IEEE Computer Soc. Conf. on Computer Vision and Pattern Recognition (Seattle, 1994), pp. 593–600.
38. J. Steffens, E. Elagin, and H. Neven, PersonSpotter: Fast and Robust System for Human Detection, Tracking and Recognition, in Proc. 3rd Int. Conf. on Automatic Face and Gesture Recognition (Nara, 1998), pp. 516–521.
39. M. Suwa, N. Sugie, and K. Fujimora, A Preliminary Note on Pattern Recognition of Human Emotional Recognition, in Proc. 4th Int. Joint Conf. on Pattern Recognition (Kyoto, 1978), pp. 408–410.
40. H. Tao and T. S. Huang, Connected Vibrations: A Modal Analysis Approach to Non-Rigid Motion Tracking, in CVPR 98: Proc. IEEE Computer Soc. Conf. on Computer Vision and Pattern Recognition (Santa Barbara, 1998), pp. 753750.
41. Y. L. Tian, T. Kanade, and J. F. Cohn, Recognizing Action Units for Facial Expression Analysis, IEEE Trans. Pattern Anal. Mach. Intell. 23, 97–115 (2001).
42. C. Tomasi and T. Kanade, Detection and Tracking of Point Features, Carnegie Mellon Univ. Tech. Rep. No. CMU-CS-91-132 (1991).
43. S. S. Tomkins, The Role of Facial Response in the Experience of Emotion: A Reply to Tourangeau and Ellsworth, J. Person. Social Psychol. 40, 355–357 (1981).
44. S. S. Tomkins, Affect Theory, in Emotion in the Human Face, Ed. by P. Ekman, 2nd ed. (Cambridge Univ. Press, 1982).
45. M. F. Valstar and M. Pantic, Combined Support Vector Machines and Hidden Markov Models for Modeling Facial Action Temporal Dynamics, in Proc. IEEE Int. Workshop on Human-Computer Interaction (Rio de Janeiro, 2007), pp. 188–197.
46. Q. Xu, P. Zhang, W. Pei, L. Yang, and Z. He, An Automatic Facial Expression Recognition Approach Based on Confusion-Crossed Support Vector Machine Tree, in ICASSP 2007: Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing (Honolulu, 2007), pp. 625–628.
47. Y. L. Xue, X. Mao, and F. Zhang, Beihang University Facial Expression Database and Multiple Facial Expression Recognition, in Proc. 5th Int. Conf. on Machine Learning and Cybernetics (Dalian, 2006), pp. 3282–3287.
48. P. Yang, Q. Liu, and D. N. Metaxas, Boosting Coded Dynamic Features for Facial Action Units and Facial Expression Recognition, in Proc. CVPR (Minneapolis, 2007).
49. M. Yeasin, B. Bullot, and R. Sharma, Recognition of Facial Expressions and Measurement of Levels of Interest from Video, IEEE Trans. Multimedia 8, 500–508 (2006).
50. Y. Zhang and Q. Li, Active and Dynamic Information Fusion for Facial Expression Understanding from Image Sequences, IEEE Trans. Pattern Anal. Mach. Intell. 27, 699–714 (2005).
51. G. Zhao and M. Pietikainen, Dynamic Texture Recognition Using Local Binary Patterns with an Application to Facial Expressions, IEEE Trans. Pattern Anal. Mach. Intell. 29, 915–928 (2007).
52. H. Zhao, Z. Wang, and J. Men, Facial Complex Expression Recognition Based on Fuzzy Kernel Clustering and Support Vector Machines, in ICNC 07: Proc. 3rd Int. Conf. on Natural Computation (Haikou, 2007), pp. 562–566.
53. G. Zhou, Y. Zhan, and J. Zhang, Facial Expression Recognition Based on Selective Feature Extraction, in ISDA 06: Proc. 6th Int. Conf. on Intelligent Systems Design and Application (Jinan, 2006), pp. 412–417.
Nripen Alugupally completed his MS from the Department of Computer Science and Engineering at the University of Nebraska-Lincoln. He is currently working as a Software Engineer at Pacific Gas & Electric.


David Marx is a Professor in the Statistics Department at the University of Nebraska-Lincoln. His research interests include spatial statistics, linear and nonlinear models, and biometrics. He has authored or co-authored over 150 papers.

Sanjiv Bhatia is an Associate Professor in the Department of Computer Science and Mathematics at the University of Missouri-St. Louis. His research interests include algorithms, computer vision, and image analysis.

Ashok Samal is a Professor in the Department of Computer Science and Engineering at the University of Nebraska-Lincoln. His research interests include image analysis, including biometrics, document image analysis, and computer vision applications. He has published extensively in these areas.
