Face Detection: Neural-Network-Based
Motivation
- Idea: Use a search-window to scan over an image
- Train a classifier to decide whether the search window contains a face or not
Detection
Simple neuron model
Topologies
Parameters
Adjustable parameters are:
Connection weights (to be learned)
Activation function (fixed)
Number of layers (fixed)
Number of neurons per layer (fixed)
Training
Backpropagation with gradient descent
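To make the training step concrete, here is a minimal sketch of backpropagation with full-batch gradient descent. It assumes a network with one tanh hidden layer, a single tanh output in (-1, +1) and a mean-squared-error loss. The function name `train_mlp` and its defaults (`hidden=4`, `lr=0.1`, `epochs=500`) are hypothetical illustration choices, not the configuration used in the cited paper. Only the connection weights are learned. The activation function, the number of layers and the number of neurons stay fixed, matching the parameter list above.

```python
import numpy as np

def train_mlp(X, t, hidden=4, lr=0.1, epochs=500, seed=0):
    """Train a one-hidden-layer tanh network by backpropagation (sketch).

    X: (N, D) inputs, t: (N,) targets in [-1, +1].
    Returns the learned weights and the per-epoch mean squared error.
    """
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0, 0.5, (X.shape[1], hidden))
    b1 = np.zeros(hidden)
    w2 = rng.normal(0, 0.5, hidden)
    b2 = 0.0
    losses = []
    for _ in range(epochs):
        # Forward pass: tanh hidden layer, tanh output in (-1, +1)
        h = np.tanh(X @ W1 + b1)
        y = np.tanh(h @ w2 + b2)
        err = y - t
        losses.append(np.mean(err ** 2))
        # Backward pass (chain rule); d tanh(a)/da = 1 - tanh(a)^2
        d_out = 2 * err * (1 - y ** 2) / len(t)       # dLoss/d(output pre-activation)
        grad_w2 = h.T @ d_out
        grad_b2 = d_out.sum()
        d_hid = np.outer(d_out, w2) * (1 - h ** 2)    # dLoss/d(hidden pre-activations)
        grad_W1 = X.T @ d_hid
        grad_b1 = d_hid.sum(axis=0)
        # Gradient-descent step on the connection weights
        W1 -= lr * grad_W1
        b1 -= lr * grad_b1
        w2 -= lr * grad_w2
        b2 -= lr * grad_b2
    return (W1, b1, w2, b2), losses
```

For example, `train_mlp(X, t)` on the four XOR points with targets -1/+1 returns a loss history that decreases over the epochs.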
Neural Network Based Face Detection [1]
- Idea: Use an artificial neural network to detect upright frontal faces
The network receives a 20x20 pixel region of the image as input
Output ranges from -1 (no face present) to +1 (face present)
The neural network "face filter" is applied at every location in the image
To detect faces of different sizes, the input image is repeatedly scaled down
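The scanning scheme can be sketched as a sliding 20x20 window over an image pyramid. Assumptions: `classify` is a hypothetical stand-in for the trained network (any function mapping a 20x20 window to a score in [-1, +1]). Nearest-neighbour subsampling and a scale step of 1.2 are illustrative choices. Detections are reported in original-image coordinates, so a window found at a coarse level corresponds to a larger face.

```python
import numpy as np

WINDOW = 20  # size of the network's input retina

def downscale(img, factor):
    """Nearest-neighbour downscaling of a 2-D image by `factor` (>= 1)."""
    h, w = img.shape
    new_h, new_w = int(h / factor), int(w / factor)
    rows = (np.arange(new_h) * factor).astype(int)
    cols = (np.arange(new_w) * factor).astype(int)
    return img[np.ix_(rows, cols)]

def scan(img, classify, scale_step=1.2, stride=1):
    """Apply `classify` to every 20x20 window at every pyramid level.

    Returns (x, y, size, score) tuples in original-image coordinates
    for all windows whose score is > 0 (i.e. classified as a face).
    """
    detections = []
    scale = 1.0
    level = img
    while min(level.shape) >= WINDOW:
        h, w = level.shape
        for y in range(0, h - WINDOW + 1, stride):
            for x in range(0, w - WINDOW + 1, stride):
                score = classify(level[y:y + WINDOW, x:x + WINDOW])
                if score > 0:
                    detections.append((int(x * scale), int(y * scale),
                                       int(WINDOW * scale), score))
        scale *= scale_step
        level = downscale(img, scale)  # always resample from the original
    return detections
```

Each pyramid level is resampled from the original image rather than from the previous level, which avoids accumulating resampling error.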
Network Topology
- 20x20 pixel input retina
- hidden units with 4 types of receptive fields
- One real-valued output
System Overview
Network Training
Training Set
1050 normalized face images
15 face images generated from each original face image by rotating and scaling it
1000 randomly chosen non-face images
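Generating rotated and scaled copies of each face can be sketched as below. Assumptions: the ranges (up to ±10° rotation, 0.9 to 1.1 scaling) and the helper names `rotate_scale` and `augment` are illustrative and hypothetical. Nearest-neighbour inverse mapping about the image centre is used so that only NumPy is needed, and mirroring or translation are not included.

```python
import numpy as np

def rotate_scale(img, angle_deg, scale):
    """Rotate by `angle_deg` and scale by `scale` about the image centre.

    Uses inverse mapping with nearest-neighbour sampling; source positions
    falling outside the image are clamped to the border.
    """
    h, w = img.shape
    cy, cx = (h - 1) / 2, (w - 1) / 2
    theta = np.deg2rad(angle_deg)
    ys, xs = np.mgrid[0:h, 0:w]
    # For each output pixel, find the source pixel (inverse transform)
    dx, dy = (xs - cx) / scale, (ys - cy) / scale
    src_x = np.cos(theta) * dx + np.sin(theta) * dy + cx
    src_y = -np.sin(theta) * dx + np.cos(theta) * dy + cy
    src_x = np.clip(np.rint(src_x), 0, w - 1).astype(int)
    src_y = np.clip(np.rint(src_y), 0, h - 1).astype(int)
    return img[src_y, src_x]

def augment(face, n=15, max_angle=10.0, scale_range=(0.9, 1.1), seed=0):
    """Return `n` randomly rotated and scaled copies of one face image."""
    rng = np.random.default_rng(seed)
    return [rotate_scale(face,
                         rng.uniform(-max_angle, max_angle),
                         rng.uniform(*scale_range))
            for _ in range(n)]
```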
Preprocessing
- correct for different lighting conditions (overall brightness, shadows)
- rescale images to a fixed size
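One simple way to correct for overall brightness and smooth shading is to fit a linear brightness plane $a x + b y + c$ to the window by least squares and subtract it. This is an assumed approach chosen for illustration; the function name `correct_lighting` is hypothetical, and the fit here uses every pixel of the window (no mask).

```python
import numpy as np

def correct_lighting(window):
    """Subtract the least-squares brightness plane a*x + b*y + c.

    The constant term removes the overall brightness; the x/y terms remove
    a linear shading gradient across the window.
    """
    h, w = window.shape
    ys, xs = np.mgrid[0:h, 0:w]
    A = np.column_stack([xs.ravel(), ys.ravel(), np.ones(h * w)])
    coeffs, *_ = np.linalg.lstsq(A, window.ravel().astype(float), rcond=None)
    plane = (A @ coeffs).reshape(h, w)
    return window - plane
```

Because the least-squares fit is linear, adding any brightness plane to a window leaves the corrected result unchanged.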
Histogram equalization
Defines a mapping of gray levels $p$ to gray levels $q$ such that the distribution of $q$ is close to uniform
Stretches contrast (expands the range of gray levels)
Transforms different input images so that they have similar intensity distributions (thus reducing the effect of different illumination)
Example
Algorithm
The probability of an occurrence of a pixel of level $i$ in the image:
$$ p(x_i) = \frac{n_i}{n}, \qquad i \in \{0, \ldots, L-1\} $$
- $L$: number of gray levels
- $n_i$: number of occurrences of gray level $i$
- $n$: total number of pixels in the image
Define $c$ as the cumulative distribution function:
$$ c(i) = \sum_{j=0}^{i} p(x_j) $$
A transformation of the form
$$ y_i = T(x_i) = c(i), \qquad y_i \in [0, 1] $$
produces a level $y$ for each level $x$ in the original image, such that the cumulative distribution function of $y$ is linearized across the value range. To map the values back to the original range, where $\min$ and $\max$ are the smallest and largest gray levels of the original image:
$$ y_i^{\prime} = y_i \cdot (\max - \min) + \min $$
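The algorithm translates almost line by line into NumPy. This is a sketch assuming an integer image with gray levels in $0, \ldots, L-1$; the function name `equalize` is hypothetical. The result is returned as floats, and rounding to integer gray levels is left to the caller.

```python
import numpy as np

def equalize(img, L=256):
    """Histogram equalization following p(x_i), c(i), T and the rescaling step."""
    n = img.size
    n_i = np.bincount(img.ravel(), minlength=L)  # occurrences of each gray level i
    p = n_i / n                                  # p(x_i) = n_i / n
    c = np.cumsum(p)                             # c(i) = sum_{j<=i} p(x_j)
    y = c[img]                                   # y = T(x) = c(x), values in [0, 1]
    lo, hi = img.min(), img.max()
    return y * (hi - lo) + lo                    # y' = y * (max - min) + min
```

For the 2x2 image `[[0, 0], [1, 3]]` with `L=4`, the cumulative distribution is $c = (0.5, 0.75, 0.75, 1)$, so `equalize` returns `[[1.5, 1.5], [2.25, 3.0]]`.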
Training Procedure
Randomly choose 1000 non-face images
Train the network to output +1 for faces and -1 for non-faces
Run the network on images containing no faces. Collect subimages in which the network incorrectly identifies a face (output > 0)
Select up to 250 of these "false positives" at random and add them to the training set as negative examples
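The procedure above can be sketched as a bootstrapping loop. Assumptions: `train` and `scan_no_face_images` are hypothetical callbacks standing in for network training and for scanning face-free images. The stopping rule (a fixed number of rounds, or no remaining false positives) is an illustrative choice.

```python
import numpy as np

def bootstrap(train, scan_no_face_images, faces, nonfaces,
              rounds=5, max_new=250, seed=0):
    """Bootstrap training with false positives as new negatives (sketch).

    train(X, t) -> classifier           (hypothetical training routine)
    scan_no_face_images(clf) -> list of windows where the output is > 0
    """
    rng = np.random.default_rng(seed)
    negatives = list(nonfaces)
    clf = None
    for _ in range(rounds):
        # Targets: +1 for faces, -1 for non-faces
        X = np.array(list(faces) + negatives)
        t = np.array([1.0] * len(faces) + [-1.0] * len(negatives))
        clf = train(X, t)
        false_pos = scan_no_face_images(clf)
        if not false_pos:
            break
        # Add up to `max_new` randomly chosen false positives as negatives
        k = min(max_new, len(false_pos))
        picks = rng.choice(len(false_pos), size=k, replace=False)
        negatives.extend(false_pos[i] for i in picks)
    return clf, negatives
```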
Neural Network Based Face Filter
Output of ANN defines a filter for faces
Search
Scan the input image with a search window and apply the ANN to each window position
The input image needs to be rescaled in order to detect faces of different sizes
The output needs to be post-processed:
Noise removal
Merging overlapping detections
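Noise removal and merging can be sketched with a simple greedy grouping. Assumptions: detections are `(x, y, size)` tuples with `(x, y)` the top-left corner. A detection joins a group if its centre lies within `dist` pixels of the group's mean box centre and the sizes differ by at most a factor of 1.5. Groups with fewer than `min_count` members are discarded as noise. The thresholds and the function names are hypothetical.

```python
def _mean_box(group):
    """Average (x, y, size) over all detections in a group."""
    n = len(group)
    return tuple(sum(d[k] for d in group) / n for k in range(3))

def merge_detections(dets, dist=10.0, min_count=2):
    """Greedily group nearby detections of similar size; drop small groups."""
    groups = []  # each group is a list of (x, y, size) tuples
    for det in dets:
        x, y, s = det
        cx, cy = x + s / 2, y + s / 2
        for g in groups:
            gx, gy, gs = _mean_box(g)
            close = (abs(cx - (gx + gs / 2)) <= dist and
                     abs(cy - (gy + gs / 2)) <= dist)
            similar_size = max(s, gs) / min(s, gs) <= 1.5
            if close and similar_size:
                g.append(det)
                break
        else:  # no matching group: start a new one
            groups.append([det])
    # Noise removal: keep only groups supported by several detections
    return [_mean_box(g) for g in groups if len(g) >= min_count]
```

For example, `[(10, 10, 20), (12, 11, 20), (100, 100, 20)]` yields a single merged box `(11.0, 10.5, 20.0)`. The isolated detection is removed as noise.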
Speed-ups can be achieved:
Increase the step size
Make the ANN more tolerant to translation (so that a larger step size does not miss faces)
Use a hierarchical, pyramidal search
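To see why the step size matters, the sketch below counts how many 20x20 windows the network must evaluate over an image pyramid. It assumes a scale step of 1.2 and truncation of level sizes to integers; `count_windows` is a hypothetical helper.

```python
def count_windows(h, w, window=20, stride=1, scale_step=1.2):
    """Number of window positions over all pyramid levels of an h x w image."""
    total = 0
    scale = 1.0
    while int(h / scale) >= window and int(w / scale) >= window:
        lh, lw = int(h / scale), int(w / scale)
        total += ((lh - window) // stride + 1) * ((lw - window) // stride + 1)
        scale *= scale_step
    return total
```

For a 40x40 image this gives 717 windows with step size 1 and 190 with step size 2. Doubling the step size therefore cuts the work roughly fourfold, which only pays off if the network tolerates the resulting translation.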
Localization and Ground-Truth
For localization, the test data is mostly annotated with ground-truth bounding boxes
Comparing hypotheses to Ground-Truth
Overlap
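$$ O = \frac{|\text{GT} \cap \text{DET}|}{|\text{GT} \cup \text{DET}|} $$
Also called Intersection over Union (IoU): the area of the intersection of the ground-truth box (GT) and the detection (DET), divided by the area of their union
Often used as a threshold: a detection counts as correct if Overlap > 50%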
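The overlap criterion can be sketched for axis-aligned boxes as follows. The box convention `(x1, y1, x2, y2)` with `x2 > x1` and `y2 > y1` and the function names are assumptions for illustration.

```python
def iou(a, b):
    """Intersection over Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)  # zero if boxes are disjoint
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def is_correct(gt, det, threshold=0.5):
    """A detection counts as correct if its overlap with GT exceeds the threshold."""
    return iou(gt, det) > threshold
```

For example, two 10x10 boxes shifted by half their width overlap in 50 pixels out of a union of 150, so $O = 1/3$ and the detection is rejected at the 50% threshold.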
[1] Henry A. Rowley, Shumeet Baluja, and Takeo Kanade. Neural Network Based Face Detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, volume 20, number 1, pages 23-38, January 1998.