<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Face Detection | Haobin Tan</title><link>https://haobin-tan.netlify.app/tags/face-detection/</link><atom:link href="https://haobin-tan.netlify.app/tags/face-detection/index.xml" rel="self" type="application/rss+xml"/><description>Face Detection</description><generator>Hugo Blox Builder (https://hugoblox.com)</generator><language>en-us</language><lastBuildDate>Fri, 13 Nov 2020 00:00:00 +0000</lastBuildDate><image><url>https://haobin-tan.netlify.app/media/icon_hu7d15bc7db65c8eaf7a4f66f5447d0b42_15095_512x512_fill_lanczos_center_3.png</url><title>Face Detection</title><link>https://haobin-tan.netlify.app/tags/face-detection/</link></image><item><title>Face Detection: Color-Based</title><link>https://haobin-tan.netlify.app/docs/ai/computer-vision/cv-lecture/03-face-detection-color/</link><pubDate>Fri, 06 Nov 2020 00:00:00 +0000</pubDate><guid>https://haobin-tan.netlify.app/docs/ai/computer-vision/cv-lecture/03-face-detection-color/</guid><description>&lt;h2 id="tldr">TL;DR&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>Different color spaces and classifiers can be used&lt;/p>
&lt;ul>
&lt;li>
&lt;p>Models: histograms, Gaussian Models, Mixture of Gaussians Model&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Histogram-backprojection / Histogram matching&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Bayes classifier&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Discriminative Classifiers (ANN, SVM)&lt;/p>
&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>Bayesian classifier and ANN seem to work well&lt;/p>
&lt;ul>
&lt;li>Sufficient training data is needed for modeling the pdfs, in particular for the Bayesian approach (both positive &amp;amp; negative pdfs are learned)&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>Advantages: Fast, rotation &amp;amp; scale invariant, robust against occlusions&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Disadvantages:&lt;/p>
&lt;ul>
&lt;li>Affected by illumination&lt;/li>
&lt;li>Cannot distinguish head and hands&lt;/li>
&lt;li>Skin-colored objects in the background problematic&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>Metric: ROC curve used to compare classification results / methods&lt;/p>
&lt;/li>
&lt;/ul>
&lt;hr>
&lt;h2 id="color-based-face-detection-overview">Color-based face detection overview&lt;/h2>
&lt;p>💡 &lt;strong>Idea: human skin has a fairly consistent color that is distinct from that of many other objects&lt;/strong>&lt;/p>
&lt;p>&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/%E6%88%AA%E5%B1%8F2020-11-10%2014.57.37.png" alt="截屏2020-11-10 14.57.37">&lt;/p>
&lt;p>Possible approach:&lt;/p>
&lt;ol>
&lt;li>Find skin-colored pixels&lt;/li>
&lt;li>Group skin-colored pixels (and apply some heuristics) to find the face&lt;/li>
&lt;/ol>
&lt;h2 id="color">Color&lt;/h2>
&lt;ul>
&lt;li>&lt;strong>Grayscale&lt;/strong> Image: Each pixel represented by &lt;strong>one&lt;/strong> number (typically integer between 0 and 255)&lt;/li>
&lt;li>&lt;strong>Color&lt;/strong> image: Pixels represented by &lt;strong>three&lt;/strong> numbers&lt;/li>
&lt;/ul>
&lt;p>Different representations exist, so-called &amp;ldquo;color spaces&amp;rdquo;&lt;/p>
&lt;h3 id="color-spaces">Color spaces&lt;/h3>
&lt;ul>
&lt;li>
&lt;p>&lt;strong>RGB&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>
&lt;p>most widely used&lt;/p>
&lt;/li>
&lt;li>
&lt;p>specifies colors in terms of the primary colors &lt;strong>red (R), green (G), and blue (B)&lt;/strong>&lt;/p>
&lt;p>&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/%E6%88%AA%E5%B1%8F2020-11-10%2015.00.08-20201110184617048.png" alt="截屏2020-11-10 15.00.08">&lt;/p>
&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>HSV/HSI&lt;/strong>: &lt;strong>hue (H)&lt;/strong>, &lt;strong>saturation (S)&lt;/strong> and &lt;strong>value (V) / intensity (I)&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>
&lt;p>Closely related to human perception (hue, colorfulness and brightness)&lt;/p>
&lt;p>&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/%E6%88%AA%E5%B1%8F2020-11-10%2017.27.38.png" alt="截屏2020-11-10 17.27.38">&lt;/p>
&lt;ul>
&lt;li>Hue: &amp;ldquo;color&amp;rdquo;&lt;/li>
&lt;li>Saturation: how &amp;ldquo;pure&amp;rdquo; the color is&lt;/li>
&lt;li>Value: &amp;ldquo;lightness&amp;rdquo;&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Class Y spaces&lt;/strong>: YCbCr (Digital Video), YIQ (NTSC), YUV (PAL)&lt;/p>
&lt;ul>
&lt;li>
&lt;p>The Y channel contains brightness (luminance); the other two channels store chrominance (e.g., U = B-Y, V = R-Y)&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Conversion from RGB to any of these Y spaces is a linear transformation&lt;/p>
&lt;p>&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/%E6%88%AA%E5%B1%8F2020-11-10%2018.18.27.png" alt="截屏2020-11-10 18.18.27">&lt;/p>
&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Perceptually uniform spaces&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>Perceived color difference is proportional to the difference in color values&lt;/li>
&lt;li>Euclidean distance can therefore be used for color comparison&lt;/li>
&lt;/ul>
&lt;p>&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/%E6%88%AA%E5%B1%8F2020-11-10%2018.19.07.png" alt="截屏2020-11-10 18.19.07">&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Chromatic Color Spaces&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>
&lt;p>Two color channels containing chrominance (colour) information&lt;/p>
&lt;ul>
&lt;li>HS (taken from HSV)&lt;/li>
&lt;li>UV (taken from YUV)&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>Normalized rg from RGB:&lt;/p>
&lt;ul>
&lt;li>
&lt;p>r = R / (R+G+B)&lt;/p>
&lt;/li>
&lt;li>
&lt;p>g = G / (R+G+B)&lt;/p>
&lt;/li>
&lt;li>
&lt;p>b = B / (R+G+B)&lt;/p>
&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>It is sometimes argued that chromatic skin-color models are more robust to illumination changes&lt;/p>
&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
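&lt;p>The rg normalization above keeps only chromaticity; since r + g + b = 1, the third channel is redundant and two channels suffice. A minimal numpy sketch (the function name is my own, not from the lecture):&lt;/p>

```python
import numpy as np

def to_chromatic_rg(image):
    """Convert an H x W x 3 RGB image to normalized rg chromaticity.

    r = R / (R+G+B), g = G / (R+G+B); b = 1 - r - g is redundant,
    so only two channels are returned.
    """
    rgb = image.astype(np.float64)
    s = rgb.sum(axis=-1, keepdims=True)
    s[s == 0] = 1.0  # avoid division by zero for black pixels
    r = rgb[..., 0:1] / s
    g = rgb[..., 1:2] / s
    return np.concatenate([r, g], axis=-1)
```

&lt;p>A useful property: the rg chromaticity of a pixel is unchanged when its brightness is scaled, which is exactly why chromatic spaces are attractive for skin models.&lt;/p>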
&lt;h4 id="problems">Problems&lt;/h4>
&lt;ul>
&lt;li>Reflected color depends on spectrum of the light source (and properties of the object / surface)&lt;/li>
&lt;li>If the light source / illumination changes, the reflected color signal changes!!! 🤪&lt;/li>
&lt;/ul>
&lt;h2 id="how-to-model-skin-color">How to model skin color?&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="#histogram-as-skin-color-model">Non-parametric models: typically histograms&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="#parametric-models">Parametric models&lt;/a>&lt;/p>
&lt;ul>
&lt;li>Gaussian Model&lt;/li>
&lt;li>Gaussian Mixture Model&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>Or just learn decision boundaries between classes (&lt;a href="#discriminative-models--classifiers">discriminative model&lt;/a>)&lt;/p>
&lt;ul>
&lt;li>ANN, SVM, &amp;hellip;&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
&lt;h3 id="histogram-as-skin-color-model">Histogram as skin color model&lt;/h3>
&lt;p>&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/%E6%88%AA%E5%B1%8F2020-11-10%2018.34.57.png" alt="截屏2020-11-10 18.34.57">&lt;/p>
&lt;ul>
&lt;li>👍 Advantages: Works very well in practice&lt;/li>
&lt;li>👎 Disadvantages
&lt;ul>
&lt;li>Memory requirements quickly become large (one bin per color value combination)&lt;/li>
&lt;li>A large number of labelled skin and non-skin samples is needed!&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
&lt;h4 id="histogram-backprojection">Histogram Backprojection&lt;/h4>
&lt;ul>
&lt;li>
&lt;p>The simplest (and fastest) way to utilize histogram information&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Each pixel in the backprojection is set to the value of the (skin-color) histogram bin indexed by the color of the respective pixel&lt;/p>
&lt;ul>
&lt;li>A color $x$ is considered as skin color if $H\_{+}(x) > \theta$&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>E.g.&lt;/p>
&lt;p>&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/%E6%88%AA%E5%B1%8F2021-07-22%2022.20.33.png" alt="截屏2021-07-22 22.20.33">&lt;/p>
&lt;/li>
&lt;/ul>
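&lt;p>Backprojection itself is just a table lookup per pixel. A minimal numpy sketch for a 2-D color histogram (names and bin layout are my own assumptions):&lt;/p>

```python
import numpy as np

def backproject(image, skin_hist, bins=32):
    """Backproject a skin-color histogram onto an image.

    image: H x W x 2 chromaticity image with channel values in [0, 1).
    skin_hist: bins x bins histogram H_+ over the two color channels.
    Returns an H x W map; pixel (i, j) holds H_+(color of pixel (i, j)).
    """
    idx = np.clip((image * bins).astype(int), 0, bins - 1)
    return skin_hist[idx[..., 0], idx[..., 1]]

# A color x is then classified as skin if H_+(x) > theta:
#   mask = backproject(img, skin_hist) > theta
```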
&lt;h4 id="histogram-matching">Histogram Matching&lt;/h4>
&lt;ul>
&lt;li>Backprojection
&lt;ul>
&lt;li>works well when the color distribution of the target is mono-modal&lt;/li>
&lt;li>is not optimal when the target is multi-colored! &amp;#x1f622;&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>🔧 Solution: Build a histogram of the image within the search window, and compare it to the target histogram.
&lt;ul>
&lt;li>distance metrics for histograms, e.g.:
&lt;ul>
&lt;li>
&lt;p>Bhattacharyya distance&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Histogram intersection&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Earth mover&amp;rsquo;s distance, &amp;hellip;&lt;/p>
&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
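&lt;p>Two of the listed distance metrics are easy to state directly. A small numpy sketch (both normalize the histograms first; function names are mine):&lt;/p>

```python
import numpy as np

def bhattacharyya(h1, h2):
    """Bhattacharyya distance between two histograms (0 = identical)."""
    h1 = h1 / h1.sum()
    h2 = h2 / h2.sum()
    bc = np.sum(np.sqrt(h1 * h2))  # coefficient in [0, 1], 1 = identical
    return np.sqrt(max(0.0, 1.0 - bc))

def intersection(h1, h2):
    """Histogram intersection: 1.0 means the histograms match exactly."""
    h1 = h1 / h1.sum()
    h2 = h2 / h2.sum()
    return np.sum(np.minimum(h1, h2))
```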
&lt;h4 id="histogram-backprojection-vs-matching">Histogram Backprojection vs. Matching&lt;/h4>
&lt;ul>
&lt;li>
&lt;p>Histogram Backprojection&lt;/p>
&lt;ul>
&lt;li>
&lt;p>Compares color of a single pixel with color model&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Fast and simple&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Can only cope well with mono-modal distributions&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Often sufficient for skin-color classification&lt;/p>
&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>Histogram Matching / Intersection&lt;/p>
&lt;ul>
&lt;li>
&lt;p>Compares color histogram of image patch with color model&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Better performance&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Can cope with multi-modal distributions&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Computationally expensive&lt;/p>
&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
&lt;h3 id="parametric-models">Parametric models&lt;/h3>
&lt;h4 id="gaussian-density-models">Gaussian Density Models&lt;/h4>
&lt;ul>
&lt;li>
&lt;p>Gaussian Densities&lt;/p>
&lt;ul>
&lt;li>
&lt;p>Assume that the distribution of skin colors $p(x)$ has a parametric functional form&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Most common function: Gaussian function $\mathrm{G}(\mathbf{x} ; \mu, \mathbf{C})$
&lt;/p>
$$
p(x | \text{skin})=G(x ; \mu, C)=\frac{1}{(2 \pi)^{d / 2}|C|^{1 / 2} }\exp \left\\{-1 / 2(x-\mu)^{\top} C^{-1}(x-\mu)\right\\}
$$
&lt;ul>
&lt;li>Mean $\mu$ and covariance matrix $C$ are estimated from a training set of skin colors $S = \{x\_1, x\_2, \ldots, x\_N\}$:
&lt;ul>
&lt;li>$\mu = E\{x\}$&lt;/li>
&lt;li>$C = E\{(\boldsymbol{x}-\mu)(\boldsymbol{x}-\mu)^T\}$&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>A color is considered as skin color if&lt;/p>
&lt;ul>
&lt;li>$p(x|\text{skin}) > \theta$&lt;/li>
&lt;li>$p(x|\text{skin}) > p(x|\text{non-skin})$&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
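&lt;p>The Gaussian model amounts to two steps: estimate $\mu$ and $C$ from the skin samples, then evaluate the density for new colors. A numpy sketch (function names are mine, not from the lecture):&lt;/p>

```python
import numpy as np

def fit_gaussian(samples):
    """Estimate mean and covariance from skin-color samples (N x d)."""
    mu = samples.mean(axis=0)
    diff = samples - mu
    C = diff.T @ diff / len(samples)
    return mu, C

def gaussian_density(x, mu, C):
    """Evaluate G(x; mu, C) for a batch of colors x (N x d)."""
    d = len(mu)
    Cinv = np.linalg.inv(C)
    norm = 1.0 / ((2 * np.pi) ** (d / 2) * np.linalg.det(C) ** 0.5)
    diff = x - mu
    mahal = np.einsum('ni,ij,nj->n', diff, Cinv, diff)  # (x-mu)^T C^-1 (x-mu)
    return norm * np.exp(-0.5 * mahal)

# skin decision, as above: gaussian_density(x, mu, C) > theta
```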
&lt;h4 id="mixture-of-gaussian-models">Mixture of Gaussian Models&lt;/h4>
$$
p(x)=\sum\_{i=1}^{K} \pi\_{i} G\left(x, \mu\_{i}, C\_{i}\right)
$$
&lt;ul>
&lt;li>
&lt;p>Parameter set $\Phi$ can be estimated using the &lt;strong>EM&lt;/strong> algorithm&lt;/p>
&lt;ul>
&lt;li>Iteratively changes parameters so as to maximize the log-likelihood of the training set:
$$
L=\log \prod\_{i=1}^{N} p\left(x\_{i} \mid \Phi\right)
$$&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>A color is considered as skin color if&lt;/p>
&lt;ul>
&lt;li>$p(x|\text{skin}) > \theta$&lt;/li>
&lt;li>$p(x|\text{skin}) > p(x|\text{non-skin})$&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
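&lt;p>Evaluating the mixture density is a weighted sum of Gaussian evaluations. In practice the parameters would come from EM, which is omitted here; this is only a density-evaluation sketch with my own names:&lt;/p>

```python
import numpy as np

def mixture_density(x, weights, means, covs):
    """p(x) = sum_i pi_i * G(x; mu_i, C_i) for a batch x (N x d)."""
    p = np.zeros(len(x))
    for pi_i, mu, C in zip(weights, means, covs):
        d = len(mu)
        Cinv = np.linalg.inv(C)
        norm = 1.0 / ((2 * np.pi) ** (d / 2) * np.linalg.det(C) ** 0.5)
        diff = x - mu
        mahal = np.einsum('ni,ij,nj->n', diff, Cinv, diff)
        p += pi_i * norm * np.exp(-0.5 * mahal)
    return p
```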
&lt;h4 id="bayes-classifier">Bayes Classifier&lt;/h4>
&lt;ul>
&lt;li>
&lt;p>Skin Classification using &lt;strong>Bayes Decision Rule&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>
&lt;p>Minimum cost decision rule&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Classify pixel to skin class if $P(\text{Skin} | x)>P(\text{Non-Skin} | x)$&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Decision Rule:
&lt;/p>
$$
\frac{p(\mathbf{x} \mid \text {Skin})}{p(\mathbf{x} \mid \text {Non-Skin})} \geq \frac{P(\text {Non-Skin})}{P(\text {Skin})}
$$
&lt;/li>
&lt;li>
&lt;p>The class-conditionals $p(x \mid \omega\_i)$ can be estimated from the corresponding histograms:
&lt;/p>
$$
p\left(x \mid \omega\_{i}\right)=h\_{i}(x) / \sum\_{x} h\_{i}(x)
$$
&lt;ul>
&lt;li>$h\_i(x)$: count of pixels from class $\omega\_{i}$ that have value $x$&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
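&lt;p>A minimal sketch of the histogram-based Bayes decision rule above, with equal priors as the default (function name and interface are my own assumptions):&lt;/p>

```python
import numpy as np

def bayes_skin_mask(pixels, h_skin, h_nonskin, p_skin=0.5):
    """Classify colors with the Bayes decision rule built from histograms.

    pixels: integer bin index of each pixel's color (N,).
    h_skin, h_nonskin: raw bin counts of skin / non-skin training pixels.
    """
    eps = 1e-12
    p_x_skin = h_skin / (h_skin.sum() + eps)          # p(x | Skin)
    p_x_nonskin = h_nonskin / (h_nonskin.sum() + eps) # p(x | Non-Skin)
    likelihood_ratio = p_x_skin[pixels] / (p_x_nonskin[pixels] + eps)
    threshold = (1.0 - p_skin) / p_skin  # P(Non-Skin) / P(Skin)
    return likelihood_ratio >= threshold
```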
&lt;h3 id="discriminative-models--classifiers">Discriminative Models / Classifiers&lt;/h3>
&lt;ul>
&lt;li>Artificial Neural Networks&lt;/li>
&lt;li>Support Vector Machine&lt;/li>
&lt;/ul>
&lt;h2 id="performance-measures">Performance Measures&lt;/h2>
&lt;h3 id="for-classification">For classification&lt;/h3>
&lt;p>When comparing recognition hypotheses with ground-truth annotations, four cases have to be considered:&lt;/p>
&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/confusion-matrix.png" alt="Measuring Performance: The Confusion Matrix – Glass Box" style="zoom: 40%;" />
&lt;blockquote>
&lt;p>More see: &lt;a href="https://haobin-tan.netlify.app/docs/ai/machine-learning/ml-fundamentals/evaluation/">Evaluation&lt;/a>&lt;/p>
&lt;/blockquote>
&lt;h4 id="roc-receiver-operating-characteristic">ROC (Receiver Operating Characteristic)&lt;/h4>
&lt;ul>
&lt;li>Used for the task of classification&lt;/li>
&lt;li>Measures the trade-off between true positive rate and false positive rate&lt;/li>
&lt;/ul>
$$
\begin{array}{l}
\text { true positive rate }=\frac{\mathrm{TP}}{\mathrm{Pos}}=\frac{\mathrm{TP}}{\mathrm{TP}+\mathrm{FN}} \\\\
\text { false positive rate }=\frac{\mathrm{FP}}{\mathrm{Neg}}=\frac{\mathrm{FP}}{\mathrm{FP}+\mathrm{TN}}
\end{array}
$$
&lt;ul>
&lt;li>
&lt;p>Each prediction hypothesis has generally an associated probability value or score&lt;/p>
&lt;/li>
&lt;li>
&lt;p>The performance values can therefore be plotted into a graph, using each possible score as a threshold&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Example:&lt;/p>
&lt;p>&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/%E6%88%AA%E5%B1%8F2020-11-12%2023.27.18.png" alt="截屏2020-11-12 23.27.18">&lt;/p>
&lt;/li>
&lt;/ul>
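&lt;p>Sweeping every score as a threshold, as described above, can be done in a few numpy lines (my own helper, not a library call):&lt;/p>

```python
import numpy as np

def roc_curve(scores, labels):
    """Return (FPR, TPR) pairs, one per score used as a threshold.

    scores: classifier scores, higher = more likely positive.
    labels: 1 for positive (face), 0 for negative ground truth.
    """
    order = np.argsort(-scores)            # sort descending by score
    labels = labels[order]
    tp = np.cumsum(labels)                 # TPs after accepting each item
    fp = np.cumsum(1 - labels)
    tpr = tp / max(labels.sum(), 1)        # TP / (TP + FN)
    fpr = fp / max((1 - labels).sum(), 1)  # FP / (FP + TN)
    return fpr, tpr
```

&lt;p>For a perfect classifier the TPR reaches 1 while the FPR is still 0, i.e. the curve hugs the top-left corner.&lt;/p>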
&lt;h3 id="skin-color-analysis-and-comparison">Skin-color: Analysis and Comparison&lt;/h3>
&lt;p>Conclusions &lt;sup id="fnref:1">&lt;a href="#fn:1" class="footnote-ref" role="doc-noteref">1&lt;/a>&lt;/sup>&lt;/p>
&lt;ul>
&lt;li>
&lt;p>Bayesian approach and MLP worked best&lt;/p>
&lt;ul>
&lt;li>Bayesian approach needs much more memory&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>Performance is largely unaffected by the choice of color space, but&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Results degraded when only chrominance channels were used&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="from-skin-colored-pixels-to-faces">From Skin-Colored Pixels to Faces&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>Skin-colored pixels need to be grouped into object representations&lt;/p>
&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/截屏2020-11-13%2014.56.21.png" alt="截屏2020-11-13 14.56.21" style="zoom:80%;" />
&lt;/li>
&lt;li>
&lt;p>🔴 Problems:&lt;/p>
&lt;ul>
&lt;li>skin-colored background,&lt;/li>
&lt;li>further skin-colored body parts (hands, arms, &amp;hellip;),&lt;/li>
&lt;li>Noise, &amp;hellip;&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
&lt;h3 id="perceptual-grouping">Perceptual Grouping&lt;/h3>
&lt;ul>
&lt;li>
&lt;p>&lt;strong>Morphological Operators&lt;/strong>: operators performing an action on shapes, where both the input and the output are binary images.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Thresholding each pixel&amp;rsquo;s skin affiliation yields a binary image&lt;/p>
&lt;p>&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/%E6%88%AA%E5%B1%8F2020-11-13%2014.58.11.png" alt="截屏2020-11-13 14.58.11">&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Morphological Erosion&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>
&lt;p>&lt;em>Remove&lt;/em> pixels from edges of objects&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Set pixel value to &lt;strong>min&lt;/strong> value of surrounding pixels&lt;/p>
&lt;p>&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/%E6%88%AA%E5%B1%8F2020-11-13%2015.00.53.png" alt="截屏2020-11-13 15.00.53">&lt;/p>
&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Morphological Dilation&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>
&lt;p>&lt;em>Add&lt;/em> pixels to edges of objects&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Set pixel value to &lt;strong>max&lt;/strong> value of surrounding pixels&lt;/p>
&lt;p>&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/%E6%88%AA%E5%B1%8F2020-11-13%2015.41.11.png" alt="截屏2020-11-13 15.41.11">&lt;/p>
&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Morphological Opening&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>
&lt;p>Apply erosion, then dilation&lt;/p>
&lt;p>&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/%E6%88%AA%E5%B1%8F2020-11-13%2015.42.38.png" alt="截屏2020-11-13 15.42.38">&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Goal:&lt;/p>
&lt;ul>
&lt;li>Smooth outline&lt;/li>
&lt;li>Open small bridges&lt;/li>
&lt;li>Eliminate outliers&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Morphological Closing&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>
&lt;p>Apply dilation, then erosion&lt;/p>
&lt;p>&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/%E6%88%AA%E5%B1%8F2020-11-13%2015.45.25.png" alt="截屏2020-11-13 15.45.25">&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Goal:&lt;/p>
&lt;ul>
&lt;li>Smooth inner edges&lt;/li>
&lt;li>Connect small distances&lt;/li>
&lt;li>Fill unwanted holes&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>Apply morphological closing then morphological opening&lt;/p>
&lt;ul>
&lt;li>
&lt;p>Resulting image is reduced to connected regions of skin color (blobs)&lt;/p>
&lt;p>&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/%E6%88%AA%E5%B1%8F2020-11-13%2015.59.57.png" alt="截屏2020-11-13 15.59.57">&lt;/p>
&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
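&lt;p>All four operators reduce to min/max filters over a neighborhood. A plain-numpy sketch for a 3x3 neighborhood (the border handling chosen here is one possible convention, not the only one):&lt;/p>

```python
import numpy as np

def dilate(img):
    """Binary dilation: each pixel becomes the max of its 3x3 neighborhood."""
    p = np.pad(img, 1)  # pad with 0 (background)
    out = np.zeros_like(img)
    for dy in (0, 1, 2):
        for dx in (0, 1, 2):
            out = np.maximum(out, p[dy:dy + img.shape[0], dx:dx + img.shape[1]])
    return out

def erode(img):
    """Binary erosion: each pixel becomes the min of its 3x3 neighborhood.
    The border is padded with 1 so the image frame itself is not eroded."""
    p = np.pad(img, 1, constant_values=1)
    out = np.ones_like(img)
    for dy in (0, 1, 2):
        for dx in (0, 1, 2):
            out = np.minimum(out, p[dy:dy + img.shape[0], dx:dx + img.shape[1]])
    return out

def opening(img):
    """Erosion followed by dilation: removes small outliers."""
    return dilate(erode(img))

def closing(img):
    """Dilation followed by erosion: fills small holes."""
    return erode(dilate(img))
```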
&lt;h3 id="from-skin-blobs-to-faces">From Skin Blobs To Faces&lt;/h3>
&lt;ul>
&lt;li>
&lt;p>Goal: align bounding box around face candidate&lt;/p>
&lt;p>&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/%E6%88%AA%E5%B1%8F2020-11-13%2016.01.23.png" alt="截屏2020-11-13 16.01.23">&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Important for:&lt;/p>
&lt;ul>
&lt;li>
&lt;p>Face Recognition&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Head Pose Estimation&lt;/p>
&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>Different approaches:&lt;/p>
&lt;ul>
&lt;li>
&lt;p>Choose cluster with biggest size&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Ellipse fitting (approximate face region by ellipse)&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Heuristics to distinguish between different skin clusters&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Use temporal information (tracking)&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Facial Feature Detection&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&amp;hellip;&lt;/p>
&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
&lt;div class="footnotes" role="doc-endnotes">
&lt;hr>
&lt;ol>
&lt;li id="fn:1">
&lt;p>S. L. Phung, A. Bouzerdoum and D. Chai, &amp;ldquo;Skin segmentation using color pixel classification: analysis and comparison,&amp;rdquo; in IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 1, pp. 148-154, Jan. 2005, doi: 10.1109/TPAMI.2005.17.&amp;#160;&lt;a href="#fnref:1" class="footnote-backref" role="doc-backlink">&amp;#x21a9;&amp;#xfe0e;&lt;/a>&lt;/p>
&lt;/li>
&lt;/ol>
&lt;/div></description></item><item><title>Face Detection: Neural-Network-Based</title><link>https://haobin-tan.netlify.app/docs/ai/computer-vision/cv-lecture/04-face-detection-ann/</link><pubDate>Fri, 13 Nov 2020 00:00:00 +0000</pubDate><guid>https://haobin-tan.netlify.app/docs/ai/computer-vision/cv-lecture/04-face-detection-ann/</guid><description>&lt;h2 id="motivation">Motivation&lt;/h2>
&lt;ul>
&lt;li>Idea: Use a search-window to scan over an image&lt;/li>
&lt;li>Train a classifier to decide whether the search window contains a face or not&lt;/li>
&lt;/ul>
&lt;p>
&lt;figure >
&lt;div class="flex justify-center ">
&lt;div class="w-100" >&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/%e6%88%aa%e5%b1%8f2020-11-13%2016.16.57.png" alt="截屏2020-11-13 16.16.57" loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;h2 id="detection">Detection&lt;/h2>
&lt;h3 id="simple-neuron-model">Simple neuron model&lt;/h3>
&lt;p>
&lt;figure >
&lt;div class="flex justify-center ">
&lt;div class="w-100" >&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/%e6%88%aa%e5%b1%8f2020-11-13%2016.20.47.png" alt="截屏2020-11-13 16.20.47" loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;h3 id="topologies">Topologies&lt;/h3>
&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/截屏2020-11-13%2016.21.15.png" alt="截屏2020-11-13 16.21.15" style="zoom:67%;" />
&lt;h3 id="parameters">Parameters&lt;/h3>
&lt;p>Adjustable Parameters are&lt;/p>
&lt;ul>
&lt;li>
&lt;p>Connection weights (to be learned)&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Activation function (fixed)&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Number of layers (fixed)&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Number of neurons per layer (fixed)&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h3 id="training">Training&lt;/h3>
&lt;p>Backpropagation with gradient descent&lt;/p>
&lt;h2 id="neural-network-based-face-detection1">Neural Network Based Face Detection&lt;sup id="fnref:1">&lt;a href="#fn:1" class="footnote-ref" role="doc-noteref">1&lt;/a>&lt;/sup>&lt;/h2>
&lt;ul>
&lt;li>Idea: Use an artificial neural network to detect upright frontal faces
&lt;ul>
&lt;li>
&lt;p>Network receives as input a 20x20 pixel region of an image&lt;/p>
&lt;/li>
&lt;li>
&lt;p>output ranges from -1 (no face present) to +1 (face present)&lt;/p>
&lt;/li>
&lt;li>
&lt;p>the neural network &amp;ldquo;face filter&amp;rdquo; is applied at every location in the image&lt;/p>
&lt;/li>
&lt;li>
&lt;p>to detect faces with different sizes, the input image is repeatedly scaled down&lt;/p>
&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
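&lt;p>The scan-and-rescale procedure above can be sketched as a sliding window over an image pyramid. Here &lt;code>classify&lt;/code> stands in for the trained network (output above 0 counts as a face, mirroring the -1/+1 convention); all names, the nearest-neighbor rescaling, and the defaults are my own assumptions:&lt;/p>

```python
import numpy as np

def resize_nearest(img, factor):
    """Nearest-neighbor rescaling (a crude stand-in for proper downsampling)."""
    h, w = img.shape
    ys = (np.arange(int(h * factor)) / factor).astype(int)
    xs = (np.arange(int(w * factor)) / factor).astype(int)
    return img[np.ix_(ys, xs)]

def detect_faces(image, classify, window=20, step=4, scale=0.8):
    """Scan a grayscale image with a window over a pyramid of scales.

    classify(patch) returns a score; a score greater than 0 is treated as
    a face. Detections are reported as (x, y, size) in original coordinates.
    """
    detections, factor = [], 1.0
    cur = image.astype(float)
    while min(cur.shape) >= window:
        h, w = cur.shape
        for y in range(0, h - window + 1, step):
            for x in range(0, w - window + 1, step):
                if classify(cur[y:y + window, x:x + window]) > 0:
                    detections.append((int(round(x / factor)),
                                       int(round(y / factor)),
                                       int(round(window / factor))))
        factor *= scale
        cur = resize_nearest(image.astype(float), factor)
    return detections
```

&lt;p>The overlapping detections this produces would then be merged in the post-processing step described below.&lt;/p>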
&lt;h3 id="network-topology">Network Topology&lt;/h3>
&lt;p>
&lt;figure >
&lt;div class="flex justify-center ">
&lt;div class="w-100" >&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/%e6%88%aa%e5%b1%8f2020-11-13%2016.28.33.png" alt="截屏2020-11-13 16.28.33" loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;ul>
&lt;li>20x20 pixel input retina&lt;/li>
&lt;li>4 types of receptive hidden fields&lt;/li>
&lt;li>One real-valued output&lt;/li>
&lt;/ul>
&lt;h3 id="system-overview">System Overview&lt;/h3>
&lt;p>
&lt;figure >
&lt;div class="flex justify-center ">
&lt;div class="w-100" >&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/%e6%88%aa%e5%b1%8f2020-11-13%2016.29.19.png" alt="截屏2020-11-13 16.29.19" loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;h3 id="network-training">Network Training&lt;/h3>
&lt;h4 id="training-set">Training Set&lt;/h4>
&lt;ul>
&lt;li>
&lt;p>1050 normalized face images&lt;/p>
&lt;/li>
&lt;li>
&lt;p>15 face examples generated from each original image by slightly rotating, scaling, translating, and mirroring it&lt;/p>
&lt;/li>
&lt;li>
&lt;p>1000 randomly chosen non-face images&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h4 id="preprocessing">Preprocessing&lt;/h4>
&lt;ul>
&lt;li>correct for different lighting conditions (overall brightness, shadows)&lt;/li>
&lt;li>rescale images to fixed size&lt;/li>
&lt;/ul>
&lt;h4 id="histogram-equalization">Histogram equalization&lt;/h4>
&lt;ul>
&lt;li>
&lt;p>Defines a mapping of gray levels $p$ into gray levels $q$ such that the distribution of $q$ is close to being uniform&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Stretches contrast (expands the range of gray levels)&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Transforms different input images so that they have similar intensity distributions (thus reducing the effect of different illumination)&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Example&lt;/p>
&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/截屏2020-11-13%2016.32.18.png" alt="截屏2020-11-13 16.32.18" style="zoom:67%;" />
&lt;/li>
&lt;li>
&lt;p>Algorithm&lt;/p>
&lt;ul>
&lt;li>
&lt;p>The probability of an occurrence of a pixel of level $i$ in the image:
&lt;/p>
$$
p\left(x\_{i}\right)=\frac{n\_{i}}{n}, \qquad i \in 0, \ldots, L-1
$$
&lt;ul>
&lt;li>$L$: number of gray levels&lt;/li>
&lt;li>$n\_i$: number of occurences of gray level $i$&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>Define $c$ as the cumulative distribution function:
&lt;/p>
$$
c(i)=\sum\_{j=0}^{i} p\left(x\_{j}\right)
$$
&lt;/li>
&lt;li>
&lt;p>Create a transformation of the form
&lt;/p>
$$
y\_i = T(x\_i) = c(i), \qquad y\_i \in [0, 1]
$$
&lt;p>
This produces a level $y$ for each level $x$ in the original image, such that the cumulative probability function of $y$ is linearized across the value range. Finally, the values are rescaled back to the gray-level range:
&lt;/p>
$$
y\_{i}^{\prime}=y\_{i} \cdot(\max -\min )+\min
$$
&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
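&lt;p>The three steps of the algorithm map directly to numpy. The final rescaling here assumes the full gray-level range $[0, L-1]$, i.e. min = 0 and max = L-1, which is one possible choice:&lt;/p>

```python
import numpy as np

def equalize(img, levels=256):
    """Histogram-equalize a grayscale image with integer levels in [0, levels)."""
    hist = np.bincount(img.ravel(), minlength=levels)
    p = hist / img.size                  # p(x_i) = n_i / n
    c = np.cumsum(p)                     # cumulative distribution c(i)
    y = c[img]                           # y_i = T(x_i) = c(i), in [0, 1]
    # rescale back to the gray-level range; min = 0, max = levels - 1 assumed
    return (y * (levels - 1)).astype(np.uint8)
```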
&lt;h4 id="training-procedure">Training Procedure&lt;/h4>
&lt;ol>
&lt;li>
&lt;p>Randomly choose 1000 non-face images&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Train network to produce 1 for faces, -1 for non-faces&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Run network on images containing no faces. Collect subimages in which the network incorrectly identifies a face (output &amp;gt; 0)&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Select up to 250 of these &amp;ldquo;false positives&amp;rdquo; at random and add them to the training set as negative examples&lt;/p>
&lt;/li>
&lt;/ol>
&lt;h3 id="neural-network-based-face-filter">Neural Network Based Face Filter&lt;/h3>
&lt;ul>
&lt;li>
&lt;p>Output of ANN defines a filter for faces&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Search&lt;/p>
&lt;ul>
&lt;li>
&lt;p>Scan input image with search window, apply ANN to search window&lt;/p>
&lt;/li>
&lt;li>
&lt;p>The input image needs to be rescaled in order to detect faces of different sizes&lt;/p>
&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>Output needs to be post-processed&lt;/p>
&lt;ul>
&lt;li>
&lt;p>Noise removal&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Merging overlapping detections&lt;/p>
&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>Speed up can be achieved&lt;/p>
&lt;ul>
&lt;li>
&lt;p>Increase step size&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Make ANN more flexible to translation&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Hierarchical, pyramidal search&lt;/p>
&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
&lt;h3 id="localization-and-ground-truth">Localization and Ground-Truth&lt;/h3>
&lt;ul>
&lt;li>
&lt;p>For localization, the test data is typically annotated with ground-truth bounding boxes&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Comparing hypotheses to Ground-Truth&lt;/p>
&lt;ul>
&lt;li>
&lt;p>Overlap
&lt;/p>
$$
O = \frac{|\text{GT} \cap \text{DET}|}{|\text{GT} \cup \text{DET}|}
$$
&lt;p>
&lt;figure >
&lt;div class="flex justify-center ">
&lt;div class="w-100" >&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/%e6%88%aa%e5%b1%8f2020-11-13%2016.43.11.png" alt="截屏2020-11-13 16.43.11" loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;blockquote>
&lt;p>Also called &lt;strong>Intersection over Union (IoU)&lt;/strong>&lt;/p>
&lt;/blockquote>
&lt;/li>
&lt;li>
&lt;p>Often used as threshold: Overlap &amp;gt; 50%&lt;/p>
&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
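&lt;p>The overlap measure is a few lines of plain Python; the box format (x1, y1, x2, y2) is my own assumption:&lt;/p>

```python
def iou(box_a, box_b):
    """Overlap (intersection over union) of two (x1, y1, x2, y2) boxes."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0
```

&lt;p>With the common 50% threshold, a detection counts as correct if iou(gt, det) &amp;gt; 0.5.&lt;/p>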
&lt;div class="footnotes" role="doc-endnotes">
&lt;hr>
&lt;ol>
&lt;li id="fn:1">
&lt;p>&lt;em>Neural Network Based Face Detection, by Henry A. Rowley, Shumeet Baluja, and Takeo Kanade. IEEE Transactions on Pattern Analysis and Machine Intelligence, volume 20, number 1, pages 23-38, January 1998.&lt;/em>&amp;#160;&lt;a href="#fnref:1" class="footnote-backref" role="doc-backlink">&amp;#x21a9;&amp;#xfe0e;&lt;/a>&lt;/p>
&lt;/li>
&lt;/ol>
&lt;/div></description></item></channel></rss>