Face Recognition: Features

Local Appearance-based Face Recognition

🎯 Objective: To mitigate the effect of expression, illumination, and occlusion variations by performing local analysis and by fusing the outputs of extracted local features at the feature or at the decision level.

Some popular facial descriptors that achieve good results

  • Local Binary Pattern Histogram (LBPH)
  • Gabor Feature
  • Discrete Cosine Transform (DCT)
  • SIFT
  • etc.

Local Binary Pattern Histogram (LBPH) [1]

  • Divide image into cells

  • Compare each pixel to each of its neighbors (8 in the basic operator)

    • Where the neighbor’s value is greater than or equal to the threshold value (the value of the center pixel in this example), write “1”
    • Otherwise, write “0”

    $\rightarrow$ gives a binary number (8 bits for 8 neighbors)

  • Convert the binary number into decimal (the LBP code of the pixel)

  • Compute the histogram of the codes over each cell

  • Use the concatenated cell histograms for classification

    • SVM
    • Histogram-distances
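
The steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not the reference implementation from [1]: the `>=` threshold convention, the clockwise bit ordering, the grid size, and the function names `lbp_codes`, `lbph`, and `chi_square` are all assumptions made for this example.

```python
import numpy as np

def lbp_codes(gray):
    """8-neighbour LBP code for every interior pixel of a 2-D grayscale array."""
    g = np.asarray(gray, dtype=np.int32)
    h, w = g.shape
    center = g[1:-1, 1:-1]
    # Neighbour offsets, clockwise from the top-left corner (assumed bit order).
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros_like(center)
    for bit, (dy, dx) in enumerate(offsets):
        neighbour = g[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        # Write a 1 where the neighbour is >= the center pixel, else 0.
        codes |= (neighbour >= center).astype(np.int32) << bit
    return codes

def lbph(gray, grid=(2, 2)):
    """Concatenate the normalised 256-bin LBP histograms of all grid cells."""
    codes = lbp_codes(gray)
    hists = []
    for rows in np.array_split(codes, grid[0], axis=0):
        for cell in np.array_split(rows, grid[1], axis=1):
            hist = np.bincount(cell.ravel(), minlength=256).astype(np.float64)
            hists.append(hist / max(hist.sum(), 1.0))
    return np.concatenate(hists)

def chi_square(h1, h2, eps=1e-10):
    """Chi-square histogram distance, one common choice for comparing LBPH vectors."""
    return float(np.sum((h1 - h2) ** 2 / (h1 + h2 + eps)))
```

For the 3x3 patch `[[6, 5, 2], [7, 6, 1], [9, 8, 7]]`, the neighbors that are at least as large as the center value 6 set bits 0, 4, 5, 6, and 7, so `lbp_codes` returns the single code 241. A nearest-neighbor classifier would then assign a probe face to the gallery face with the smallest `chi_square` distance.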


High dim. dense local Feature Extraction

  • Computing features densely (e.g. on overlapping patches in many scales in the image)
  • Problem: very high dimensionality!
  • Solution: Encode into a compact form
    • Bag of Visual Words (BoVW) model
    • Fisher encoding
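
As a rough illustration of the BoVW idea, the sketch below turns a variable-size set of local descriptors into one fixed-length histogram. It assumes a codebook (for example, k-means centers) has already been learned elsewhere; `bovw_encode` and its arguments are names chosen for this example.

```python
import numpy as np

def bovw_encode(descriptors, codebook):
    """Histogram of nearest-codeword assignments, L1-normalised."""
    d = np.asarray(descriptors, dtype=np.float64)   # (N, D) local descriptors
    c = np.asarray(codebook, dtype=np.float64)      # (K, D) visual words
    # Squared Euclidean distance between every descriptor and every codeword.
    dist2 = ((d[:, None, :] - c[None, :, :]) ** 2).sum(axis=2)
    nearest = dist2.argmin(axis=1)
    hist = np.bincount(nearest, minlength=len(c)).astype(np.float64)
    return hist / hist.sum()
```

However many patches are sampled from the image, the encoding has length K, which is what makes dense sampling tractable for a downstream classifier.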

Fisher Vector Encoding

  • Aggregates feature vectors into a compact representation
  • Fitting a parametric generative model (e.g. Gaussian Mixture Model)
  • Encode the gradient of the model’s log-likelihood w.r.t. its parameters
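
A minimal sketch of this encoding, assuming a diagonal-covariance GMM that has already been fitted elsewhere (its `weights`, `means`, and per-dimension standard deviations `sigmas` are inputs here). It uses the commonly used normalized gradients with respect to the means and standard deviations only, and omits the power and L2 normalization that are often applied afterwards.

```python
import numpy as np

def fisher_vector(x, weights, means, sigmas):
    """Fisher vector of descriptors x under a diagonal GMM (mean and sigma gradients)."""
    x = np.asarray(x, dtype=np.float64)          # (N, D) descriptors
    w = np.asarray(weights, dtype=np.float64)    # (K,)   mixture weights
    mu = np.asarray(means, dtype=np.float64)     # (K, D) component means
    sd = np.asarray(sigmas, dtype=np.float64)    # (K, D) standard deviations
    n, dim = x.shape
    z = (x[:, None, :] - mu[None, :, :]) / sd[None, :, :]        # (N, K, D)
    # Log of weight * Gaussian density, per descriptor and component.
    log_p = (-0.5 * (z ** 2).sum(axis=2)
             - np.log(sd).sum(axis=1)[None, :]
             - 0.5 * dim * np.log(2.0 * np.pi)
             + np.log(w)[None, :])
    log_p -= log_p.max(axis=1, keepdims=True)    # numerical stability
    gamma = np.exp(log_p)
    gamma /= gamma.sum(axis=1, keepdims=True)    # soft assignments (N, K)
    g_mu = (gamma[:, :, None] * z).sum(axis=0) / (n * np.sqrt(w)[:, None])
    g_sd = (gamma[:, :, None] * (z ** 2 - 1.0)).sum(axis=0) / (n * np.sqrt(2.0 * w)[:, None])
    return np.concatenate([g_mu.ravel(), g_sd.ravel()])
```

The result has length 2·K·D regardless of how many descriptors were extracted, so it can be fed directly to a linear classifier.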


Face recognition across pose (Alignment)

Problem

  • Different view-point / head orientation

  • Recognition results degrade when images with different head orientations have to be matched 😭

Major directions to address the face recognition across pose problem

  • Geometric pose normalization (image affine warps)
  • 2D specific pose models, image rendering at pixel or feature level (2D+3D approaches)
  • 3D face Model fitting

Pose Normalization

💡 Idea

  • Find several facial features (mesh)
  • Use complete mesh to normalize face

Here we will use 2D Active Appearance Models

  • A texture and shape-based parametric model

  • Efficient fitting algorithm: Inverse compositional (IC) algorithm

Model and fitting

Independent shape and appearance model:

$$ \begin{array}{c} \text{shape:} \quad s=\left(x_{1}, y_{1}, x_{2}, y_{2}, \cdots, x_{v}, y_{v}\right)^{T}=s_{0}+\sum_{i=1}^{n} p_{i} s_{i} \\ \text{appearance:} \quad A(x)=A_{0}(x)+\sum_{i=1}^{m} \lambda_{i} A_{i}(x) \quad \forall x \in s_{0} \end{array} $$

Fitting goal:

$$ \arg \min _{p, \lambda} \sum_{x \in s_{0}}\left[A_{0}(x)+\sum_{i=1}^{m} \lambda_{i} A_{i}(x)-I(W(x ; p))\right]^{2} $$

Here $W(x; p)$ warps a pixel $x$ of the base mesh $s_0$ into the input image $I$ according to the shape parameters $p$.

Fitting examples

  • Fitted mesh


  • Mismatched mesh

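
The independent shape and appearance model and its fitting objective can be illustrated with a small linear-algebra sketch. It is not the inverse compositional algorithm: it assumes the warp $W(x; p)$ has already been applied, so that `warped` holds the image pixels sampled inside the base mesh $s_0$, and it only solves for the appearance coefficients $\lambda$ at that fixed warp. All function and argument names are hypothetical.

```python
import numpy as np

def shape_instance(s0, shape_basis, p):
    """s = s0 + sum_i p_i * s_i, with shape_basis of shape (n, 2v)."""
    return np.asarray(s0, float) + np.asarray(shape_basis, float).T @ np.asarray(p, float)

def appearance_instance(a0, app_basis, lam):
    """A = A0 + sum_i lambda_i * A_i, with app_basis of shape (m, num_pixels)."""
    return np.asarray(a0, float) + np.asarray(app_basis, float).T @ np.asarray(lam, float)

def fit_appearance(a0, app_basis, warped):
    """Least-squares lambda for a fixed warp, plus the squared fitting error."""
    a0 = np.asarray(a0, float)
    basis = np.asarray(app_basis, float)
    warped = np.asarray(warped, float)
    lam, *_ = np.linalg.lstsq(basis.T, warped - a0, rcond=None)
    residual = appearance_instance(a0, basis, lam) - warped
    return lam, float(residual @ residual)
```

A full fitting procedure would alternate this appearance step with updates of the shape parameters $p$ and re-sample `warped` after each update. A large remaining error is one sign of a mismatched mesh.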

The fitted model can be used to warp the image to a frontal pose (e.g., using a piecewise affine transformation of the mesh triangles).

Faces with different poses from the FERET database and their pose-aligned images
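
The building block of a piecewise affine warp is the affine map defined by one pair of corresponding triangles: three vertex correspondences determine its six parameters exactly. The sketch below shows only that step; `triangle_affine` and `warp_point` are names chosen for this example, and a full warp would apply the appropriate map to every pixel inside each mesh triangle.

```python
import numpy as np

def triangle_affine(src_tri, dst_tri):
    """2x3 affine matrix M that maps the three src vertices onto the dst vertices."""
    src = np.hstack([np.asarray(src_tri, dtype=np.float64), np.ones((3, 1))])  # rows [x, y, 1]
    dst = np.asarray(dst_tri, dtype=np.float64)                                # rows [x', y']
    # Solve src @ M.T = dst for the six affine parameters.
    return np.linalg.solve(src, dst).T

def warp_point(m, pt):
    """Apply the affine matrix m to a single 2-D point."""
    x, y = pt
    return m @ np.array([x, y, 1.0])
```

Because each triangle gets its own map and neighboring triangles share vertices, the overall warp is continuous across triangle edges, which is what allows a fitted mesh to be pulled into the frontal reference shape.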

Results

  • Much better results under pose variations compared to simple affine transform
  • Different warping functions can be used
    • Piecewise affine transformation worked best
  • Approach works well with local-DCT-based approach
    • but not so well with holistic approaches, such as Eigenfaces (PCA) 🤪

Face Recognition using 3D Models [2]

  • A method for face recognition across variations in pose and illumination.
  • Simulates the process of image formation in 3D space.
  • Estimates 3D shape and texture of faces from single images by fitting a statistical morphable model of 3D faces to images.
  • Faces are represented by model parameters for 3D shape and texture.

Model-based Recognition


Face vectors

The morphable face model is based on a vector space representation of faces that is constructed such that any combination of shape and texture vectors $S_i$ and $T_i$ describes a realistic human face:

$$ S=\sum_{i=1}^{m} a_{i} S_{i} \quad T=\sum_{i=1}^{m} b_{i} T_{i} $$

The definition of shape and texture vectors is based on a reference face $\mathbf{I}_0$.

The location of the vertices of the mesh in Cartesian coordinates is $(x_k, y_k, z_k)$ with colors $(R_k, G_k, B_k)$.

Reference shape and texture vectors are defined by:

$$ \begin{array}{l} S_{0}=\left(x_{1}, y_{1}, z_{1}, x_{2}, \ldots, x_{n}, y_{n}, z_{n}\right)^{T} \\ T_{0}=\left(R_{1}, G_{1}, B_{1}, R_{2}, \ldots, R_{n}, G_{n}, B_{n}\right)^{T} \end{array} $$

To encode a novel scan $\mathbf{I}$, the flow field from $\mathbf{I}_0$ to $\mathbf{I}$ is computed.
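
To make the vector representation concrete, the sketch below flattens per-vertex coordinates (or colors) into a single vector and forms a linear combination of example faces. It assumes the faces are already in dense correspondence, so that vertex $k$ means the same facial point in every scan. The helper names and the choice of coefficients that sum to one are assumptions of this example.

```python
import numpy as np

def to_vector(per_vertex):
    """(n, 3) array of per-vertex xyz or RGB values -> vector of length 3n."""
    return np.asarray(per_vertex, dtype=np.float64).reshape(-1)

def combine(vectors, coeffs):
    """Linear combination sum_i coeffs[i] * vectors[i] of example face vectors."""
    return np.asarray(coeffs, dtype=np.float64) @ np.asarray(vectors, dtype=np.float64)
```

With two example shapes `S1` and `S2`, `combine([S1, S2], [0.5, 0.5])` gives their vertex-wise average, which is only meaningful because of the correspondence established through the reference face.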

PCA

  • PCA is performed on the set of shape and texture vectors separately.

  • Eigenvectors form an orthogonal basis: $$ \mathbf{S}=\overline{\mathbf{s}}+\sum_{i=1}^{m-1} \alpha_{i} \cdot \mathbf{s}_{i}, \quad \mathbf{T}=\overline{\mathbf{t}}+\sum_{i=1}^{m-1} \beta_{i} \cdot \mathbf{t}_{i} $$

  • Example

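
The PCA step can be sketched with a singular value decomposition of the centered data matrix. The same function would be called once on the shape vectors and once on the texture vectors. The names `pca_model`, `encode`, and `decode` are chosen for this example and are not taken from [2].

```python
import numpy as np

def pca_model(vectors):
    """Mean, orthonormal eigenvectors (as rows) and variances of m example vectors."""
    x = np.asarray(vectors, dtype=np.float64)   # (m, d), one face per row
    mean = x.mean(axis=0)
    # Rows of vt are orthonormal eigenvectors of the sample covariance matrix.
    _, sing, vt = np.linalg.svd(x - mean, full_matrices=False)
    k = x.shape[0] - 1                          # centering leaves at most m-1 components
    variances = sing[:k] ** 2 / (x.shape[0] - 1)
    return mean, vt[:k], variances

def encode(v, mean, basis):
    """Coefficients (alpha or beta) of a face vector in the eigenvector basis."""
    return basis @ (np.asarray(v, dtype=np.float64) - mean)

def decode(coeffs, mean, basis):
    """Reconstruct a face vector: mean + sum_i coeffs[i] * basis[i]."""
    return mean + np.asarray(coeffs, dtype=np.float64) @ basis
```

Each training face is reproduced exactly by `decode(encode(...))`, and a novel face is then represented compactly by its coefficient vectors, which are the model parameters used for recognition.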

Model-based Image Analysis

  • 🎯 Goal: find shape and texture coefficients describing a 3D face model such that rendering produces an image $\mathbf{I}_{\text{model}}$ that is as similar as possible to $\mathbf{I}_{\text{input}}$

  • For initialization, 7 facial feature points, such as the corners of the eyes or the tip of the nose, should be labelled manually


  • Model fitting: Minimize $$ E_{I}=\sum_{x, y}\left\|\mathbf{I}_{\text {input }}(x, y)-\mathbf{I}_{\text {model }}(x, y)\right\|^{2} $$

    • Shape, texture, transformation, and illumination are optimized for the entire face and refined for each segment.
    • Complex iterative optimization procedure
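
The analysis-by-synthesis idea behind this fitting can be shown on a toy problem. In the sketch below, `image_error` is the energy $E_I$ from above, while `toy_render` is a purely hypothetical stand-in for the real renderer: a linear image model with no 3D geometry, pose, or illumination. The plain gradient descent in `fit` is likewise only an illustration, not the optimization procedure of [2].

```python
import numpy as np

def image_error(i_input, i_model):
    """E_I: sum over pixels of the squared color difference."""
    diff = np.asarray(i_input, dtype=np.float64) - np.asarray(i_model, dtype=np.float64)
    return float(np.sum(diff ** 2))

def toy_render(coeffs, mean_img, basis_imgs):
    """Hypothetical renderer: mean image plus a linear combination of basis images."""
    return mean_img + np.tensordot(coeffs, basis_imgs, axes=1)

def fit(i_input, mean_img, basis_imgs, steps=200, lr=0.1):
    """Minimise E_I over the coefficients by gradient descent (toy setting only)."""
    coeffs = np.zeros(len(basis_imgs))
    for _ in range(steps):
        residual = toy_render(coeffs, mean_img, basis_imgs) - i_input
        # dE/dc_i = 2 * sum over pixels and channels of residual * basis_i
        grad = 2.0 * np.tensordot(basis_imgs, residual, axes=([1, 2, 3], [0, 1, 2]))
        coeffs -= lr * grad
    return coeffs
```

Here images have shape `(H, W, 3)` and `basis_imgs` has shape `(k, H, W, 3)`. The fixed step size is only safe for well-scaled basis images. In the real method the renderer is nonlinear in shape, pose, and illumination, which is why the optimization is iterative and considerably more involved.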

Databases

  • Necessary to develop and improve algorithms
  • Provide common testbeds and benchmarks which allow for comparing different approaches
  • Different databases focus on different problems

Well-known databases for face recognition

  • FERET
  • FRVT
  • FRGC
  • CMU-PIE
  • BANCA
  • XM2VTS

Observations

  • One 3-D image is more powerful for face recognition than one 2-D image.
  • One high resolution 2-D image is more powerful for face recognition than one 3-D image.
  • Using 4 or 5 well-chosen 2-D face images is more powerful for face recognition than one 3-D face image or multi-modal 3D+2D face.

Wild Face Datasets

Labeled Faces In the Wild Dataset (LFW)

  • Face Verification: Given a pair of images specify whether they belong to the same person

  • 13K images, 5.7K people

  • Standard benchmark in the community

  • Several test protocols depending upon availability of training data within and outside the dataset.
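
A verification decision of this kind is typically a threshold on a similarity score between two face descriptors. The sketch below assumes each image has already been converted into a feature vector (for example, an LBPH or Fisher vector). The cosine similarity, the threshold value, and the function names are illustrative choices, not part of the LFW protocol.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def verify(feat_a, feat_b, threshold=0.5):
    """True if the two descriptors are judged to belong to the same person."""
    return cosine_similarity(feat_a, feat_b) >= threshold

def verification_accuracy(pairs, labels, threshold=0.5):
    """Fraction of pairs whose same/different decision matches the label."""
    preds = [verify(a, b, threshold) for a, b in pairs]
    return float(np.mean([p == bool(l) for p, l in zip(preds, labels)]))
```

In practice the threshold would be selected on training pairs only and the accuracy reported on held-out pairs.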

YouTube Faces Dataset (YTF)

  • Video Face Verification: Given a pair of videos specify whether they belong to the same person

  • 3425 videos, 1595 people

  • Standard benchmark in the community

  • Wide pose, expression and illumination variation


  1. T. Ahonen, A. Hadid and M. Pietikainen, “Face Description with Local Binary Patterns: Application to Face Recognition,” in IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 28, no. 12, pp. 2037-2041, Dec. 2006, doi: 10.1109/TPAMI.2006.244.

  2. V. Blanz and T. Vetter, “Face recognition based on fitting a 3D morphable model,” in IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 25, no. 9, pp. 1063-1074, Sept. 2003, doi: 10.1109/TPAMI.2003.1227983.
