Joint Spatial Geometric and Max-margin Classifier Constraints for Facial Expression Recognition Using Nonnegative Matrix Factorization

Based on the constrained non-negative matrix factorization (NMF) algorithm, this article presents a new approach to facial expression recognition. The proposed method incorporates the two tasks of an automatic expression analysis system: facial feature extraction and classification into expressions. To retain as much local and geometric structure information in the data as possible, we integrate max-margin classification into the constrained NMF optimization, and a multiplicative update algorithm is proposed for solving the resulting optimization problem. Experimental results on the JAFFE dataset demonstrate the effectiveness of the proposed method, with improved performance over conventional dimension reduction methods.


Introduction
Facial expression recognition (FER) has attracted increasing attention from psychologists, anthropologists, and computer scientists [1,2,3]. Computer scientists attempt to create sophisticated human-computer interfaces that can automatically recognize and classify human expressions or emotions. Fasel et al. [1] describe facial expressions as facial features that deform over time as the facial muscles contract, which also alters the skin texture. Research has also shown that these changes in muscle activity occur quickly, usually lasting only a few seconds. The study also notes that emotions are one source of facial expressions, alongside others such as verbal and nonverbal communication or physiological activities.
Although facial expressions and emotions are different (and the terms are often incorrectly interchanged), in the computer vision community the term "facial expression recognition" usually refers to the classification of facial features into one of the six so-called basic emotions: sadness, disgust, happiness, surprise, fear, and anger, as presented by Ekman in 1971 [2]. This interpretation rests on the assumption that the expression of emotions is universal across individuals as well as across ethnicities and cultures. An automatic expression analysis system has two compulsory tasks [4]: facial feature extraction and classification into expressions. There are generally two kinds of facial feature extraction methods: appearance-based and geometric feature-based methods. After the face has been localized, as much information as possible has to be extracted about the displayed expression. Most automatic expression analysis systems attempt to recognize a small set of prototypic expressions (i.e., fear, anger, joy, surprise, sadness, and disgust). The most important step in a facial expression recognition system is feature extraction, which analyzes facial action occurrence after the face has been located in the image or video frames. Considerable effort has been made over the past several decades, and significant results have been achieved in FER. A main step in FER is to build or extract expression features from the original facial images. Several widely used feature extraction methods exist, such as principal component analysis (PCA) [5], Eigenfaces [6], singular value decomposition (SVD) [7], and non-negative matrix factorization (NMF) [8].
The NMF algorithm is described as follows. Given a non-negative $m \times n$ matrix $X = (x_1, x_2, \ldots, x_n) \in \mathbb{R}^{m \times n}$ holding the facial data to be analyzed, the goal is to find a non-negative matrix factor $U \in \mathbb{R}^{m \times k}$ and a non-negative matrix factor $V \in \mathbb{R}^{k \times n}$ such that $X \approx UV$, where $k$ is smaller than both $m$ and $n$.
Each column vector of the original matrix $X$ can be regarded as a weighted sum of the column vectors of the left matrix $U$, where the weight coefficients are the elements of the corresponding column vector of the right matrix $V$. The non-negativity constraints on $U$ and $V$ are consistent with the intuitive notion of combining parts to form a whole, which is how NMF learns a parts-based representation. The factorization is obtained by solving

$$\min_{U \ge 0,\ V \ge 0} \ \|X - UV\|_F^2, \qquad (1)$$

where $\|\cdot\|_F$ is the Frobenius norm of a matrix, and the product $UV$ is the non-negative matrix factorization approximation of $X$ of rank at most $k$. The non-negativity constraints on $U$ and $V$ allow only additive (non-subtractive) combinations of parts to build the entire data.
In the application to facial expression recognition, the gray values of each facial image are non-negative and are stored as a matrix $X = [X_1, X_2, \ldots, X_n]$, where $X_j$ is an $m$-dimensional column vector consisting of the non-negative gray values of one facial image. The matrix $X$ can be factored into the product of a non-negative matrix $U$, representing the NMF basis images, and a non-negative weight coefficient matrix $V$. The NMF decomposition reconstructs expression images in a non-subtractive manner, much like forming a whole from its parts.
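As an illustration of the factorization $X \approx UV$ described above, the following is a minimal sketch that uses the widely known multiplicative update rules of Lee and Seung for the Frobenius-norm objective. It is not the constrained algorithm proposed in this paper; the function names `nmf` and `reconstruction_error`, the iteration count, the small constant `eps`, and the random toy data are all assumptions made for the example.

```python
import numpy as np

def nmf(X, k, n_iter=200, eps=1e-10, seed=0):
    """Factor a non-negative m x n matrix X into U (m x k) and V (k x n)."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    U = rng.random((m, k)) + eps  # random non-negative initialization
    V = rng.random((k, n)) + eps
    for _ in range(n_iter):
        # Multiplicative updates: ratios of non-negative terms keep
        # every entry of U and V non-negative (eps avoids division by zero).
        V *= (U.T @ X) / (U.T @ U @ V + eps)
        U *= (X @ V.T) / (U @ V @ V.T + eps)
    return U, V

def reconstruction_error(X, U, V):
    """Squared Frobenius norm of the residual X - UV."""
    return np.linalg.norm(X - U @ V, "fro") ** 2

# Toy data (assumption): 6 "images" of 16 pixels each, stored as columns.
X = np.random.default_rng(1).random((16, 6))
U, V = nmf(X, k=3)
print(reconstruction_error(X, U, V))
```

Each column of `U` plays the role of a basis image, and each column of `V` holds the weights that combine those basis images into one input image.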

Classification by SVM Classifier
The last task of the FER system is classification based on machine learning theory. The input to the classification process is the set of features extracted from the face area in the earlier stage, formed to describe facial expressions. Classification requires supervised training, so the training set has to consist of labeled data. Support vector machines (SVMs) are data classification techniques based on optimization theory and statistical theory. In the SVM technique, the original input space is mapped into a feature space, and the optimal separating hyperplane is determined in this feature space. To separate multiple classes, the original SVM technique divides the data space into two parts and repeats the process many times. Thus, a multiclass problem can be handled entirely as a collection of two-class problems. Assuming the problem involves k classes (k > 2), the "one-against-one" strategy trains k(k-1)/2 binary SVM classifiers. Each class is separated from each of the remaining k-1 classes, yielding k-1 separating functions per class, each obtained from the two-class SVM problem.
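To make the k(k-1)/2 count concrete, the short sketch below enumerates the class pairs that a one-against-one scheme would train on. The helper name `one_vs_one_pairs` is hypothetical, and the six class labels are simply the basic emotions named in the introduction, used here as an assumed label set.

```python
from itertools import combinations

def one_vs_one_pairs(labels):
    """Return every unordered pair of distinct classes."""
    return list(combinations(sorted(set(labels)), 2))

# Assumed label set: the six basic emotions.
expressions = ["anger", "disgust", "fear", "happiness", "sadness", "surprise"]
pairs = one_vs_one_pairs(expressions)

# One binary classifier is trained per pair: k(k-1)/2 = 6*5/2 pairs.
print(len(pairs))
```

Each class appears in exactly k-1 of these pairs, which matches the k-1 separating functions per class mentioned above.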

Adaptive Feature Extraction and Classification Method
Few existing works use constraints aimed at increasing the discriminative ability of the extracted features. Several variants of NMF with discriminant constraints were proposed in [9,10]. Kumar et al. [11] introduced an adaptive feature extraction and classification method that obtains the basis matrix by adding soft max-margin constraints to the NMF objective function, so that the classification margin is maximized using the features extracted with those bases. Inspired by this, we aim at finding a set of basis vectors that maximizes the margin of an SVM classifier.
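Let $\{(x_i, y_i)\}_{i=1}^{L}$ denote a set of data vectors and their corresponding labels, where $y_i \in \{-1, 1\}$. To extract features under the max-margin classifier criterion, we use the basis matrix $U$ and impose constraints on the feature vectors derived from it. The features are the projections of a data example $x$ onto the basis vectors stored in $U$ (that is, $U^T x$), and the optimization problem is formulated as follows:

$$\min_{U, V, w, b, \varepsilon} \ \lambda \|X - UV\|_F^2 + \frac{1}{2} w^T w + C \sum_{i=1}^{L} \varepsilon_i \quad \text{s.t.} \quad y_i (w^T U^T x_i + b) \ge 1 - \varepsilon_i,\ \ \varepsilon_i \ge 0,\ \ i = 1, \ldots, L,\ \ U \ge 0,\ V \ge 0, \qquad (2)$$

where $\varepsilon = (\varepsilon_1, \ldots, \varepsilon_i, \ldots, \varepsilon_L)$ is the slack variable vector, $\lambda$ is a scalar that controls the relative significance of the NMF cost, and $C$ is a scalar that controls the overall significance of the penalty imposed on training examples that are either too close to the separating hyperplane or misclassified.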

Methodology
We propose a unified objective function for the model that achieves the above objectives by combining the benefits of max-margin classifiers and NMF constraints, adding a pixel dispersion penalty and manifold regularization to the objective function. We then derive multiplicative update rules using an optimized gradient method and describe how the system uses this algorithm to perform the classification task.

Max-margin Nonnegative Matrix Factorization via Spatial Constraints and Graph Regularization
The unified objective function is constructed by joining the data reconstruction objective with the max-margin, pixel dispersion, and manifold regularization terms: (3)

All variables are divided into three groups: the coefficient matrix ($V$), the basis matrix ($U$), and the variables of the max-margin projection ($w$, $b$, $\varepsilon$). Here $\|\cdot\|_F$ is the Frobenius norm of a matrix, and the product $UV$ is the non-negative matrix factorization approximation of $X$ of rank at most $k$; $\varepsilon$ is the slack variable vector; $\lambda$ and $C$ are scalars; the regularization parameter controls the smoothness of the new representation; $c_0$ is a simple positive constant bound parameter; $L$ is the graph Laplacian; and $E$ is the dispersion kernel matrix.

Update the Projection Vector and Slack Variables.
When the coefficient matrix and the basis matrix are fixed, the MNMF_SGR optimization problem reduces to the standard binary soft-margin SVM classification problem: (4) The hyperplane parameters $w$ and $b$ and the slack variable vector $\varepsilon$ are obtained using an SVM classifier.

Update the Coefficient Matrix.
When the other variables are fixed, optimizing the coefficient matrix is a quadratic programming problem: (5) where 1 is the all-ones vector of the same size as $v$ and 0 is the zero vector. We can obtain $v$ by solving this problem.

Update the Basis Matrix.
When the other variables are fixed, the model reduces to a non-negative matrix factorization problem: (6) Because of the non-negativity constraints, we use gradient descent methods to solve this problem. The gradient of equation (6) is

Classification
To obtain the feature vector at test time, the input test vector $x_{test}$ is projected onto the basis matrix $U$, giving $U^T x_{test}$. The feature vector is then passed to the max-margin classifier, which predicts the class as $\mathrm{sign}(w^T U^T x_{test} + b)$, where $w$, $b$, and $U$ are computed during training.
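The prediction step described above can be sketched in a few lines. This is a minimal illustration for the binary case, assuming the feature vector is the projection $U^T x_{test}$ and the predicted class is the sign of the linear decision value; the function name `predict` and the toy values of `U`, `w`, and `b` are made up for the example and are not trained parameters.

```python
import numpy as np

def predict(U, w, b, x_test):
    """Project x_test onto the basis U, then apply the linear max-margin rule."""
    features = U.T @ x_test            # k-dimensional feature vector
    score = float(w @ features + b)    # signed distance-like decision value
    return 1 if score >= 0 else -1

# Toy parameters (assumptions): 3 pixels, 2 basis vectors.
U = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
w = np.array([1.0, -1.0])
b = 0.0

print(predict(U, w, b, np.array([2.0, 0.0, 0.0])))
print(predict(U, w, b, np.array([0.0, 3.0, 0.0])))
```

For more than two expression classes, one such decision function would be evaluated per binary classifier and the results combined, for instance by voting.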

Algorithm for MNMF_SGR
Input: Matrix X, rank k, maxIter; positive constants
Output: U, V, w, b
Begin
  Initialize the basis matrix U0 and the coefficient matrix V0, let t=0
  Let s=1, U=Us, V=Vs
  Repeat
    Fix U and Vs to find ws+1, bs+1 via equation (4)
    Fix V, ws and bs+1 to find Us+1 via equation (5)

Datasets
We use the Japanese Female Facial Expression (JAFFE) database [13] for our experiments. The dataset was collected and staged by Michael Lyons, Miyuki Kamachi, and Jiro Gyoba.

Preprocessing
Because the background is much larger than the face region, the Viola-Jones algorithm is first used to detect faces. Within the detected frontal faces, we use a cascade object detector to detect the eyes, nose, and mouth. In practice, Viola-Jones detection is a necessary preprocessing step for achieving good classification. Each original image from both databases is cropped and down-sampled in such a way that the final image size is 16 × 16 pixels.

Figure 3. Face and facial parts detection
All algorithms were initialized with 20 random U and V matrices; each initialization was trained for 20 iterations, and the one with the minimum objective function value was further trained for 1000 iterations.
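The restart protocol described above can be sketched generically. The helper `train_with_restarts` and its callback arguments (`init`, `step`, `objective`) are hypothetical names, and a one-dimensional quadratic stands in for the factorization objective so that the example stays self-contained; only the counts 20, 20, and 1000 come from the text.

```python
import random

def train_with_restarts(init, step, objective,
                        n_restarts=20, short_iters=20, long_iters=1000, seed=0):
    """Run several short trainings, keep the best, then train it further."""
    rng = random.Random(seed)
    best_state, best_value = None, float("inf")
    for _ in range(n_restarts):
        state = init(rng)                 # fresh random initialization
        for _ in range(short_iters):
            state = step(state)           # short warm-up training
        value = objective(state)
        if value < best_value:            # keep the lowest objective value
            best_state, best_value = state, value
    for _ in range(long_iters):           # continue only the best candidate
        best_state = step(best_state)
    return best_state

# Toy stand-in (assumption): minimize (x - 3)^2 with gradient steps.
def objective(x):
    return (x - 3.0) ** 2

def step(x):
    return x - 0.1 * 2.0 * (x - 3.0)

def init(rng):
    return rng.uniform(-10.0, 10.0)

x_final = train_with_restarts(init, step, objective)
print(x_final)
```

In the setting of this paper, `state` would be the pair (U, V), `step` one round of the update rules, and `objective` the factorization cost.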

Parameter Settings
For the training and testing splits, we repeated the following procedure ten times. Each time, we randomly selected two-thirds of the images of each individual and labeled them. All the other images were left unlabeled and used as the testing set.
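A minimal sketch of this per-individual split is shown below. The function name `split_per_individual`, the `(individual, image)` pair representation, and the toy data with two individuals and nine images each are assumptions for illustration only.

```python
import random
from collections import defaultdict

def split_per_individual(samples, train_fraction=2 / 3, seed=0):
    """Split (individual, image) pairs so each individual contributes
    about train_fraction of their images to the labeled set."""
    rng = random.Random(seed)
    by_person = defaultdict(list)
    for person, image in samples:
        by_person[person].append(image)
    train, test = [], []
    for person, images in by_person.items():
        images = images[:]                # copy before shuffling
        rng.shuffle(images)
        n_train = round(len(images) * train_fraction)
        train += [(person, img) for img in images[:n_train]]
        test += [(person, img) for img in images[n_train:]]
    return train, test

# Toy data (assumption): 2 individuals with 9 images each.
samples = [(p, i) for p in ["A", "B"] for i in range(9)]
train, test = split_per_individual(samples)
print(len(train), len(test))
```

Repeating the procedure ten times then amounts to calling the function with ten different seeds and averaging the resulting accuracies.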
In MNMF_SGR, the three parameters were tested over the value sets {0.01, 1, 10}, {1, 100}, and {$10^{-5}$, $10^{-4}$, ..., $10^{2}$}, respectively. For the NMF, SpaNMF, GNMF, and Semi-NMF algorithms, dimensionality reduction is performed first, after which the trained coefficient matrix is ready to be used for classifying a test face image. An SVM is then used as the classifier for expression recognition.
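The parameter search above amounts to a grid over three value sets. The sketch below enumerates that grid; the names `grid_a`, `grid_b`, and `grid_c` are placeholders, since the text does not say here which parameter each set belongs to.

```python
from itertools import product

# Placeholder names (assumption): one list per tested value set.
grid_a = [0.01, 1, 10]
grid_b = [1, 100]
grid_c = [10.0 ** e for e in range(-5, 3)]  # 1e-5, 1e-4, ..., 1e2

# Every combination of the three parameters is one candidate setting.
configs = list(product(grid_a, grid_b, grid_c))
print(len(configs))
```

Each candidate setting would be evaluated with the train/test protocol described earlier, and the best-performing combination retained.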
With MNMF and MNMF_SGR, after training we compute the feature vector by projecting the input test vector onto the basis matrix. This feature vector is then used to predict the expression class.

Classification Results.
The results of facial expression recognition on the JAFFE dataset are shown in Figure 6. Semi-supervised algorithms outperform all unsupervised ones. MNMF_SGR has the highest accuracy, followed by MNMF_FA, MNMF, SpaNMF, GNMF, SemiNMF, and standard NMF. MNMF_SGR outperforms NMF by 21.87%. The highest classification accuracy of 86.94% is achieved with k=30. The confusion matrix of facial expression recognition using the proposed method with 30 feature vectors (k=30) is shown in Table II. Some of the sad and happy facial expressions are confused with each other. The discrimination between happiness and sadness failed because these expressions involve a similar movement of the mouth.

Conclusion
In this paper, we have presented a constrained NMF approach to the problem of facial expression recognition. The proposed MNMF_SGR performs well in the facial expression recognition task, and its effectiveness has been demonstrated experimentally. In summary, combining multiple constraints allows us to build effective models, specifically on high-dimensional, sparse, and noisy datasets. In future work, more sophisticated and efficient ways to tune kernel functions will be explored. We will also apply the proposed method to problems in other fields, such as bioinformatics and computer vision. Studying the convergence rate of MNMF_SGR and improving its efficiency should also be considered.