Face Detection: Neural-Network-Based


  • Idea: Use a search-window to scan over an image
  • Train a classifier to decide whether the search window contains a face or not

Simple neuron model

Adjustable Parameters are

  • Connection weights (to be learned)

  • Activation function (fixed)

  • Number of layers (fixed)

  • Number of neurons per layer (fixed)
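
The simple neuron model above can be sketched as a weighted sum followed by a fixed activation function. This is a minimal illustration, not a definitive implementation: the choice of `tanh` as activation, the bias term, and the example values are assumptions that do not come from the slides.

```python
import math

def neuron_output(inputs, weights, bias):
    """Weighted sum of the inputs followed by a fixed tanh activation."""
    activation = sum(w * x for w, x in zip(weights, inputs)) + bias
    return math.tanh(activation)

# Hypothetical example values; in a trained network the weights are learned.
out = neuron_output([0.5, -1.0, 0.25], [0.8, 0.2, -0.4], 0.1)
print(out)
```

Only `weights` (and the assumed `bias`) would change during training; the activation function and the network layout stay fixed.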


Backpropagation with gradient descent
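
As a hedged sketch of the idea, the following code trains a single `tanh` neuron by gradient descent on the squared error. For one neuron, backpropagation reduces to this update rule; in a multi-layer network the same chain rule is applied layer by layer. The learning rate, epoch count, bias term, and toy data are assumptions chosen for illustration.

```python
import math

def predict(weights, bias, inputs):
    """Output of a single tanh neuron."""
    return math.tanh(sum(w * x for w, x in zip(weights, inputs)) + bias)

def train_neuron(samples, learning_rate=0.1, epochs=200):
    """Fit a single tanh neuron by stochastic gradient descent."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            output = predict(weights, bias, inputs)
            # Gradient of E = 0.5 * (output - target)^2 with respect to the
            # weighted sum z, using tanh'(z) = 1 - tanh(z)^2.
            delta = (output - target) * (1.0 - output ** 2)
            weights = [w - learning_rate * delta * x
                       for w, x in zip(weights, inputs)]
            bias -= learning_rate * delta
    return weights, bias

# Toy data: target +1 if the first input dominates, -1 otherwise.
samples = [([1.0, 0.0], 1.0), ([0.0, 1.0], -1.0),
           ([0.9, 0.2], 1.0), ([0.1, 0.8], -1.0)]
weights, bias = train_neuron(samples)
```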

Neural Network Based Face Detection [1]

  • Idea: Use an artificial neural network to detect upright frontal faces
    • Network receives as input a 20x20 pixel region of an image

    • output ranges from -1 (no face present) to +1 (face present)

    • the neural network "face filter" is applied at every location in the image

    • to detect faces with different sizes, the input image is repeatedly scaled down
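
The scanning scheme described above (apply the filter at every location, then repeatedly scale the image down) can be sketched as follows. This is a minimal sketch under stated assumptions: the image is a 2D list of gray values, `classify` is a hypothetical stand-in for the trained network, and the scale factor of 1.2 and nearest-neighbour downscaling are illustrative choices.

```python
def downscale(image, factor=1.2):
    """Nearest-neighbour downscaling of a 2D list of gray values."""
    height, width = len(image), len(image[0])
    new_h, new_w = int(height / factor), int(width / factor)
    return [[image[min(int(r * factor), height - 1)]
                  [min(int(c * factor), width - 1)]
             for c in range(new_w)] for r in range(new_h)]

def scan(image, classify, window=20, step=1, scale_factor=1.2, threshold=0.0):
    """Apply `classify` to every window at every pyramid level.

    Returns (x, y, size, score) tuples in original-image coordinates.
    """
    detections = []
    scale = 1.0
    while len(image) >= window and len(image[0]) >= window:
        for y in range(0, len(image) - window + 1, step):
            for x in range(0, len(image[0]) - window + 1, step):
                patch = [row[x:x + window] for row in image[y:y + window]]
                score = classify(patch)
                if score > threshold:
                    detections.append((round(x * scale), round(y * scale),
                                       round(window * scale), score))
        image = downscale(image, scale_factor)
        scale *= scale_factor
    return detections

def brightness_classifier(patch):
    """Hypothetical stand-in for the network: +1 for bright patches, else -1."""
    mean = sum(map(sum, patch)) / (len(patch) * len(patch[0]))
    return 1.0 if mean > 0.5 else -1.0

bright = [[1.0] * 24 for _ in range(24)]
print(len(scan(bright, brightness_classifier)))
```

Because the window size stays fixed at 20x20, a detection at a coarser pyramid level corresponds to a larger face in the original image.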

Network Topology

  • 20x20 pixel input retina
  • 4 types of receptive fields for the hidden units
  • One real-valued output

System Overview

Network Training

Training Set

  • 1050 normalized face images

  • 15 face images generated from each original face image by rotating and scaling

  • 1000 randomly chosen non-face images


  • correct for different lighting conditions (overall brightness, shadows)
  • rescale images to fixed size

Histogram equalization

  • Defines a mapping of gray levels $p$ into gray levels $q$ such that the distribution of $q$ is close to being uniform

  • Stretches contrast (expands the range of gray levels)

  • Transforms different input images so that they have similar intensity distributions (thus reducing the effect of different illumination)

  • Example

  • Algorithm

    • The probability of an occurrence of a pixel of level $i$ in the image:

      $$ p\left(x_{i}\right)=\frac{n_{i}}{n}, \qquad i \in \{0, \ldots, L-1\} $$
      • $L$: number of gray levels
      • $n_i$: number of occurrences of gray level $i$
      • $n$: total number of pixels in the image
    • Define $c$ as the cumulative distribution function:

      $$ c(i)=\sum_{j=0}^{i} p\left(x_{j}\right) $$
    • Create a transformation of the form

      $$ y_i = T(x_i) = c(i), \qquad y_i \in [0, 1] $$

      This produces a level $y$ for each level $x$ in the original image, such that the cumulative distribution function of $y$ is approximately linear across the value range.

    • Map the values back to the original gray-level range $[\min, \max]$:

      $$ y_{i}^{\prime}=y_{i} \cdot(\max -\min )+\min $$
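
The algorithm above can be sketched in a few lines. This is a minimal sketch, assuming the image is given as a flat list of integer gray values and that `min` and `max` are the smallest and largest gray value in the input; mapping to the full range $[0, L-1]$ instead is an equally common choice.

```python
def equalize_histogram(pixels, levels=256):
    """Histogram-equalize a flat list of integer gray values in [0, levels - 1]."""
    n = len(pixels)

    # n_i: number of occurrences of each gray level
    counts = [0] * levels
    for value in pixels:
        counts[value] += 1

    # c(i): cumulative sum of p(x_j) = n_j / n for all j <= i
    cdf = []
    running = 0
    for count in counts:
        running += count
        cdf.append(running / n)

    # y' = y * (max - min) + min, rounded to integer gray levels
    lo, hi = min(pixels), max(pixels)
    return [round(cdf[value] * (hi - lo) + lo) for value in pixels]

# Tiny hypothetical "image": two dark pixels, one mid-gray, one bright.
print(equalize_histogram([0, 0, 100, 200]))  # [100, 100, 150, 200]
```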

Training Procedure

  1. Randomly choose 1000 non-face images

  2. Train network to produce 1 for faces, -1 for non-faces

  3. Run network on images containing no faces. Collect subimages in which the network incorrectly identifies a face (output > 0)

  4. Select up to 250 of these "false positives" at random and add them to the training set as negative examples
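
The training procedure above can be sketched as a bootstrapping loop. This is a hedged outline, not the original system's code: `train` and `predict` are hypothetical callables standing in for network training and evaluation, and repeating steps 2 to 4 for a fixed number of `rounds` is an assumption.

```python
import random

def bootstrap_negatives(train, predict, faces, initial_negatives,
                        scenery_windows, rounds=3, max_new=250, seed=0):
    """Grow the negative set with false positives found on face-free images.

    train(examples) -> model, with examples a list of (window, target) pairs
    predict(model, window) -> score, where score > 0 means "face"
    """
    rng = random.Random(seed)
    negatives = list(initial_negatives)          # step 1: initial non-faces
    model = None
    for _ in range(rounds):
        examples = ([(f, 1.0) for f in faces] +
                    [(n, -1.0) for n in negatives])
        model = train(examples)                  # step 2: targets +1 / -1
        false_positives = [w for w in scenery_windows
                           if predict(model, w) > 0]   # step 3
        if not false_positives:
            break
        chosen = rng.sample(false_positives,
                            min(max_new, len(false_positives)))
        negatives.extend(chosen)                 # step 4: add as negatives
    return model, negatives
```

The point of the loop is that the network itself selects the non-face examples it finds hardest, which avoids having to hand-pick a representative set of non-faces.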

Neural Network Based Face Filter

  • Output of ANN defines a filter for faces

  • Search

    • Scan input image with search window, apply ANN to search window

    • Input image needs to be rescaled in order to detect faces with different size

  • Output needs to be post-processed

    • Noise removal

    • Merging overlapping detections

  • Speed-up can be achieved by

    • Increasing the step size

    • Making the ANN more tolerant to translation

    • Hierarchical, pyramidal search
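
The post-processing steps listed above (noise removal and merging overlapping detections) can be illustrated with a simple grouping heuristic. This is only a sketch and not necessarily the scheme used by the original system: nearby windows are grouped by their centres, groups with too few members are treated as noise, and each remaining group is replaced by its average window. The `radius` and `min_count` values are assumptions.

```python
def merge_detections(detections, radius=5.0, min_count=2):
    """Merge nearby (x, y, size) windows and drop isolated ones."""
    groups = []  # each group is a list of (x, y, size) windows
    for det in detections:
        cx, cy = det[0] + det[2] / 2, det[1] + det[2] / 2
        for group in groups:
            # Mean centre of the windows already in this group
            gx = sum(d[0] + d[2] / 2 for d in group) / len(group)
            gy = sum(d[1] + d[2] / 2 for d in group) / len(group)
            if abs(cx - gx) <= radius and abs(cy - gy) <= radius:
                group.append(det)
                break
        else:
            groups.append([det])

    merged = []
    for group in groups:
        if len(group) >= min_count:  # isolated detections count as noise
            n = len(group)
            merged.append(tuple(sum(d[i] for d in group) / n
                                for i in range(3)))
    return merged

# Three overlapping hits on one face plus one isolated false alarm.
hits = [(10, 10, 20), (12, 10, 20), (11, 12, 20), (100, 100, 20)]
print(merge_detections(hits))
```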

Localization and Ground-Truth

  • For localization, the test data is usually annotated with ground-truth bounding boxes

  • Comparing hypotheses to the ground truth

    • Overlap

      $$ O = \frac{|\text{GT} \cap \text{DET}|}{|\text{GT} \cup \text{DET}|} $$

      where $|\cdot|$ denotes the area of a region

      Also called Intersection over Union (IoU)

    • Commonly used criterion: a detection counts as correct if Overlap > 50%
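
The overlap measure can be computed directly for axis-aligned bounding boxes. The following is a minimal sketch, assuming boxes are given as `(x1, y1, x2, y2)` tuples with `x1 < x2` and `y1 < y2`; the function names are illustrative.

```python
def intersection_over_union(gt, det):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(gt[2], det[2]) - max(gt[0], det[0]))
    inter_h = max(0.0, min(gt[3], det[3]) - max(gt[1], det[1]))
    intersection = inter_w * inter_h
    area_gt = (gt[2] - gt[0]) * (gt[3] - gt[1])
    area_det = (det[2] - det[0]) * (det[3] - det[1])
    union = area_gt + area_det - intersection
    return intersection / union

def is_correct_detection(gt, det, threshold=0.5):
    """A detection counts as correct if its overlap exceeds the threshold."""
    return intersection_over_union(gt, det) > threshold

ground_truth = (0, 0, 10, 10)
print(intersection_over_union(ground_truth, (5, 0, 15, 10)))  # 50 / 150
print(is_correct_detection(ground_truth, (1, 0, 11, 10)))     # 90 / 110 > 0.5
```

Note that a detection shifted by half the box width already drops to an overlap of one third, so the 50% criterion is stricter than it may appear.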

