Facial Expression Recognition
What is facial expression analysis?
What is Facial Expression?
Facial expressions are the facial changes in response to a person’s internal emotional states, intentions, or social communications.
Role of facial expressions
- Among the most powerful, natural, and immediate ways for human beings to communicate emotions and intentions
- The face can express emotion before people verbalize or even realize their feelings
- Faces and facial expressions are an important aspect of interpersonal communication and human-machine interfaces
Facial Expressions
- Facial expression(s):
nonverbal communication
voluntary / involuntary
results from one or more motions or positions of the muscles of the face
closely associated with our emotions
- In fact, most people’s success rate at reading emotions from facial expressions is only a little over 50 percent.
Facial expression analysis vs. Emotion analysis
Emotion analysis requires higher-level knowledge, such as context information.
Besides emotions, facial expressions can also express intention, cognitive processes, physical effort, etc.
Emotions conveyed by Facial Expressions
Six basic emotions (assumed to be innate): happiness, surprise, anger, disgust, fear, sadness
Basic structure of facial expression analysis systems
Levels of description
Emotions
Discrete classes
Six basic emotions
Positive, neutral, negative
Continuous valued dimensions
Emotions as a continuum along two or three dimensions
Circumplex model by Russell
- Valence: unpleasant - pleasant
- Arousal: low – high activation
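A minimal sketch of this continuous description, mapping between the valence-arousal plane and discrete classes. The coordinates below are illustrative placeholders, not values from the circumplex literature:

```python
# Discrete emotions placed in Russell's valence-arousal plane.
# Coordinates are illustrative assumptions for the sketch.
import math

EMOTION_COORDS = {          # (valence, arousal), each in [-1, 1]
    "happiness": ( 0.8,  0.5),
    "surprise":  ( 0.2,  0.9),
    "anger":     (-0.6,  0.8),
    "disgust":   (-0.7,  0.3),
    "fear":      (-0.7,  0.7),
    "sadness":   (-0.7, -0.4),
}

def nearest_emotion(valence: float, arousal: float) -> str:
    """Map a continuous (valence, arousal) estimate to the closest discrete class."""
    return min(EMOTION_COORDS,
               key=lambda e: math.dist((valence, arousal), EMOTION_COORDS[e]))

print(nearest_emotion(0.7, 0.4))  # -> "happiness"
```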
Facial Action Units (AUs)
Facial Action Coding System (FACS)
- A human-observer based system designed to detect subtle changes in facial features
- Viewing videotaped facial behavior in slow motion, trained observers can manually FACS-code all possible facial displays
- These facial displays are referred to as action units (AU) and may occur individually or in combinations.
Action Units (AUs)
There are 44 AUs:
30 AUs are related to contractions of specific facial muscles
12 AUs for the upper face
18 AUs for the lower face
The anatomic basis of the remaining 14 is unspecified $\rightarrow$ referred to in the Facial Action Coding System (FACS) as miscellaneous actions
For action units that vary in intensity, a 5-point ordinal scale (A = trace through E = maximum) is used to measure the degree of muscle contraction
Combination of AUs
More than 7000 different AU combinations have been observed.
Additive: the appearance of the individual AUs does NOT change when they occur together.
Nonadditive: the appearance of the individual AUs does change, e.g., AU 4 looks different when it occurs alone than in the combination AU 1+4 (see the sketch below).
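A minimal sketch of how an AU combination is typically encoded for recognition, as a multi-label binary vector. The AU subset chosen here is an illustrative assumption, not the full 44-AU inventory:

```python
# Encode an AU combination as a 0/1 vector, one slot per AU.
# AU numbers follow FACS; this subset is an illustrative assumption.
AU_LIST = [1, 2, 4, 5, 6, 7, 9, 12, 15, 17, 20, 23, 25, 26, 27]

def encode_aus(active_aus):
    """Return a binary vector with one entry per AU in AU_LIST."""
    active = set(active_aus)
    return [1 if au in active else 0 for au in AU_LIST]

# AU 1+4: a nonadditive combination. The label vector is still the
# union of the single AUs, but the *appearance* of AU 4 differs from
# AU 4 occurring alone, which is what makes recognition hard.
print(encode_aus([1, 4]))
```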
Individual Differences in Subjects
Variations in appearance
Face shape
Texture
Color
Facial and scalp hair
due to sex, ethnic background, and age differences
Variations in expressiveness
Transitions Among Expressions
Simplifying assumption: expressions are singular and begin and end with a neutral position
Transitions from one action unit or combination of action units to another may involve NO intervening neutral state.
Parsing the stream of behavior is an essential requirement of a robust facial analysis system, and training data are needed that include dynamic combinations of action units, which may be either additive or nonadditive.
Intensity of Facial Expression
Facial actions can vary in intensity
FACS coding uses a 5-point intensity scale to describe the intensity variation of action units
Some related action units function as sets to represent intensity variation.
E.g. in the eye region, action units 41, 42, and 43 or 45 can represent intensity variation from slightly drooped to closed eyes.
Relation to other Facial Behavior or Nonfacial Behavior
Facial expression is one of several channels of nonverbal communication.
The message values of various modes may differ depending on context.
For robustness, facial expression analysis should be integrated with
Gesture
Prosody
Speech
Different datasets and systems
Using geometric features + ANN (2001 / early work)
Recognizing Action Units for Facial Expression Analysis[^1]
An Automatic Facial Analysis (AFA) system analyzes facial expressions based on both permanent facial features (brows, eyes, mouth) and transient facial features (deepening of facial furrows) in nearly frontal-view image sequences.
A group of action units (neutral expression, six upper-face AUs, and ten lower-face AUs) is recognized, whether the AUs occur alone or in combinations.
Cohn-Kanade AU-Coded Facial Expression Database
100 subjects from varying ethnic backgrounds.
23 different facial expressions (single action units and combinations of action units)
Frontal faces, small head motion
Variations in lighting
- ambient lighting
- single-high-intensity lamp
- dual high-intensity lamps with reflective umbrellas
Coded with FACS and assigned emotion-specified labels (happy, surprise, anger, disgust, fear, sadness)
Feature-based Automatic Facial Action Analysis (AFA) System
Feature detection & feature location
- Region of the face and location of individual face features detected automatically in the initial frame using neural network based approach
- Contours of face features and components adjusted manually in the initial frame
- Face features are then tracked automatically
- permanent features (e.g., brows, eyes, lips)
- transient features (lines and furrows)
Feature extraction: Group facial features into separate collections of feature parameters
- 15 normalized upper face parameters
- 9 normalized lower face parameters
Parameters fed to two neural-network-based classifiers
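A minimal sketch of the geometric feature idea: distances between tracked landmarks, normalized by a face-scale measure so parameters are comparable across subjects. The landmark names and the particular parameters are assumptions, not the paper's exact 15 upper-face / 9 lower-face definitions:

```python
# Normalized geometric feature parameters from tracked landmarks.
# Landmark keys and parameter choices are illustrative assumptions.
import math

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def upper_face_parameters(lm):
    """lm: dict of (x, y) landmark positions in the face coordinate system."""
    # Inter-ocular distance serves as the normalizing scale factor.
    scale = dist(lm["inner_eye_corner_l"], lm["inner_eye_corner_r"])
    return {
        "eye_height_l": dist(lm["upper_eyelid_l"], lm["lower_eyelid_l"]) / scale,
        "eye_height_r": dist(lm["upper_eyelid_r"], lm["lower_eyelid_r"]) / scale,
        "brow_raise_l": dist(lm["brow_inner_l"], lm["inner_eye_corner_l"]) / scale,
        "brow_raise_r": dist(lm["brow_inner_r"], lm["inner_eye_corner_r"]) / scale,
    }

lm = {  # hypothetical landmark positions (pixels)
    "inner_eye_corner_l": (40, 50), "inner_eye_corner_r": (80, 50),
    "upper_eyelid_l": (32, 45), "lower_eyelid_l": (32, 55),
    "upper_eyelid_r": (88, 45), "lower_eyelid_r": (88, 55),
    "brow_inner_l": (42, 35), "brow_inner_r": (78, 35),
}
print(upper_face_parameters(lm))
```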
Facial Feature Extraction
Multistate Facial Component Models of a Frontal Face
- Permanent components/features
- Lip
- Eye
- Brow
- Cheek
- Transient component/features
- Furrows and wrinkles appear perpendicular to the direction of the motion of the activated muscles
- Classification
- present (appear, deepen or lengthen)
- absent
- Detection
- Canny edge detector
- Nasal root / crow’s-feet wrinkles
- Nasolabial furrows
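A minimal sketch of transient-feature detection along these lines, running OpenCV's Canny detector inside a fixed region of interest and declaring the furrow "present" when enough edge pixels appear. The ROI coordinates and thresholds are illustrative assumptions:

```python
# Present/absent furrow classification via Canny edge density in an ROI.
import cv2
import numpy as np

def furrow_present(gray_face, roi, edge_ratio_thresh=0.05):
    """gray_face: uint8 grayscale face image; roi: (x, y, w, h), e.g. the nasolabial area."""
    x, y, w, h = roi
    patch = gray_face[y:y + h, x:x + w]
    edges = cv2.Canny(patch, 50, 150)               # thresholds are assumptions
    ratio = np.count_nonzero(edges) / edges.size    # fraction of edge pixels
    return ratio > edge_ratio_thresh

face = np.random.randint(0, 256, (128, 128), dtype=np.uint8)  # stand-in image
print(furrow_present(face, roi=(40, 70, 30, 20)))
```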
Facial Feature Representation
Face coordinate system
- $x$: line between the inner corners of the eyes
- $y$: perpendicular to $x$
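A minimal sketch of this normalization, rotating landmark points so the inner-eye-corner line becomes the $x$-axis (landmark names and inputs are assumptions):

```python
# Transform 2-D points into the eye-corner face coordinate frame.
import numpy as np

def to_face_coords(points, inner_l, inner_r):
    """Rotate/translate points so the inner-eye-corner line is the x-axis."""
    origin = (np.asarray(inner_l) + np.asarray(inner_r)) / 2.0
    dx, dy = np.asarray(inner_r) - np.asarray(inner_l)
    theta = np.arctan2(dy, dx)                # angle of the eye line
    c, s = np.cos(-theta), np.sin(-theta)
    R = np.array([[c, -s], [s, c]])           # rotation undoing that angle
    return (np.asarray(points, dtype=float) - origin) @ R.T

pts = [(30.0, 42.0), (50.0, 38.0)]            # hypothetical inner eye corners
print(to_face_coords(pts, inner_l=(30.0, 42.0), inner_r=(50.0, 38.0)))
# both corners land symmetrically on the x-axis
```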
Group facial features
upper face features: 15 parameters
lower face features: 9 parameters
AU Recognition by Neural Networks
- Three layer neural networks (one hidden layer)
- Standard back-propagation method
- Separate networks for upper- / lower face
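A minimal sketch of this classifier setup, using scikit-learn's MLPClassifier as a stand-in for the paper's back-propagation network. Layer sizes, data, and training settings are assumptions, not the paper's values:

```python
# One hidden layer (three-layer network), trained with SGD
# back-propagation; one network per face half.
import numpy as np
from sklearn.neural_network import MLPClassifier

# Hypothetical training data: 15-dim upper-face parameter vectors,
# multi-label AU targets (one column per upper-face AU incl. neutral).
X_upper = np.random.rand(200, 15)
y_upper = np.random.randint(0, 2, size=(200, 7))

upper_net = MLPClassifier(hidden_layer_sizes=(20,),   # single hidden layer
                          solver="sgd",               # plain back-propagation
                          max_iter=2000)
upper_net.fit(X_upper, y_upper)     # multi-label fit: one output unit per AU
print(upper_net.predict(X_upper[:1]))
```

A second network of the same shape would take the 9 lower-face parameters.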
Using appearance-based features + SVM (2006)
Automatic Recognition of Facial Actions in Spontaneous Expressions[^2]
RU-FACS data set
- Contains spontaneous expressions
- 100 subjects
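A minimal sketch in the spirit of this appearance-based approach: Gabor filter responses over the face image as features, fed to a linear SVM. The filter parameters and classifier settings are assumptions, not the paper's configuration:

```python
# Gabor magnitude features + linear SVM for AU present/absent decisions.
import numpy as np
import cv2
from sklearn.svm import LinearSVC

def gabor_features(gray_face, ksize=9):
    """Concatenate Gabor magnitude responses at several orientations."""
    feats = []
    for theta in np.arange(0, np.pi, np.pi / 4):   # 4 orientations (assumption)
        kernel = cv2.getGaborKernel((ksize, ksize), 2.0, theta, 5.0, 0.5)
        resp = cv2.filter2D(gray_face.astype(np.float32), -1, kernel)
        feats.append(np.abs(resp).ravel())
    return np.concatenate(feats)

face = np.random.randint(0, 256, (64, 64), dtype=np.uint8)  # stand-in image
x = gabor_features(face)
print(x.shape)  # 4 * 64 * 64 features

# Hypothetical usage: X = stacked feature vectors, y = AU present/absent
# clf = LinearSVC().fit(X, y)
```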
Using Deep features (CNN) + fusion (2013)
Emotion Recognition in the Wild Challenge (EmotiW)
🎯 Goal: Move to more realistic, out-of-the-lab data
AFEW Dataset (Acted Facial Expressions in the Wild)
Extracted from movies
Annotated with six basic emotions
Movie clips from 330 subjects, age range: 1-70
Semi-automatic annotation pipeline
- Recommender system + manual annotation
2013 Winner
Combining Modality Specific Deep Neural Networks for Emotion Recognition in Video[^3]
Convolutional Network
- Inputs are images of size 40x40, cropped randomly
- Four layers, 3 convolutions followed by max or average pooling and a fully-connected layer
Representing video sequence
- CNN gives 7-dim output per frame
- Multiple frames are averaged into 10 vectors describing the sequence
- For shorter sequences, frames / vectors get expanded (duplicated)
- Results in 70-dim feature vector (10*7)
- Classification with SVM
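A minimal sketch of this sequence descriptor, following the steps above; the exact frame-duplication scheme for short sequences is an assumption:

```python
# Pool per-frame 7-dim CNN outputs into 10 averaged chunk vectors,
# concatenated into a fixed 70-dim descriptor for the SVM.
import numpy as np

def sequence_descriptor(frame_probs, n_chunks=10):
    """frame_probs: (n_frames, 7) array of per-frame class probabilities."""
    n = len(frame_probs)
    if n < n_chunks:                          # expand short sequences
        reps = int(np.ceil(n_chunks / n))     # duplication scheme is an assumption
        frame_probs = np.repeat(frame_probs, reps, axis=0)
    chunks = np.array_split(frame_probs, n_chunks)
    return np.concatenate([c.mean(axis=0) for c in chunks])  # 10 * 7 = 70 dims

desc = sequence_descriptor(np.random.rand(37, 7))
print(desc.shape)  # (70,)
```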
Other Features
- “Bag of Mouth”
- Audio-features
Typical Pipeline
Face detection and alignment
Extract various features and different representations
Build multiple classifiers
Fusion of results
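A minimal sketch of the fusion step as a weighted average of per-classifier class probabilities; the weights are placeholders that would be tuned on validation data:

```python
# Late fusion: weighted average of probability vectors from
# modality-specific classifiers, then argmax over classes.
import numpy as np

def fuse(prob_list, weights):
    """prob_list: list of (n_classes,) probability vectors, one per model."""
    w = np.asarray(weights, dtype=float)
    w /= w.sum()                               # normalize the fusion weights
    fused = sum(wi * p for wi, p in zip(w, prob_list))
    return int(np.argmax(fused))

video_p = np.array([0.60, 0.10, 0.10, 0.10, 0.05, 0.03, 0.02])  # stand-in outputs
audio_p = np.array([0.30, 0.30, 0.10, 0.10, 0.10, 0.05, 0.05])
print(fuse([video_p, audio_p], weights=[0.7, 0.3]))  # placeholder weights
```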
Other Applications
- Pain Analysis
- Analysis of psychological disorders
- Workload / stress analysis
- Adaptive user interfaces
- Advertising
[^1]: Y.-I. Tian, T. Kanade, and J. F. Cohn, “Recognizing action units for facial expression analysis,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 23, no. 2, pp. 97-115, Feb. 2001, doi: 10.1109/34.908962.
[^2]: G. Littlewort, M. Frank, C. Lainscsek, I. Fasel, and J. Movellan, “Automatic recognition of facial actions in spontaneous expressions,” Journal of Multimedia, vol. 1, no. 6, pp. 22-35, 2006, doi: 10.4304/jmm.1.6.22-35.
[^3]: S. E. Kahou et al., “Combining modality specific deep neural networks for emotion recognition in video,” in Proceedings of the 2013 ACM International Conference on Multimodal Interaction (ICMI 2013), pp. 543-550, 2013, doi: 10.1145/2522848.2531745.