Tracking

Introduction

Tracking Vs. Detection

  • Detection: Find an object in a single image

    • Face, person, body part, facial landmarks, …
    • No assumptions about dynamics or temporal consistency are made
  • Tracking:

    • Determine a target’s location (and/or rotation, deformation, pose, …) over a sequence of images

      i.e., determine the target’s state (location and/or rotation, deformation, pose, …) over a sequence of observations derived from images

    • Provides object positions (etc.) in each frame

Motivation

  • Use more than one image to analyse the scene
  • Use a priori knowledge to improve analysis
    • System dynamics, imaging / measurement process

Target types

  • Single objects: face, person, …
  • Multiple objects: group of people, head and hands, …
  • Articulated body: full body, hand

Sensor setup

  • Single camera
  • Multiple cameras
  • Active cameras
  • Cameras + microphones

Observations used for tracking

  • Templates
  • Color
  • Foreground-background segmentation
  • Edges
  • Dense Disparity
  • Optical flow
  • Detectors (body, body parts)

Tracking as State Estimation

  • Want to estimate the state of the system (position, pose, …)
    • But the state cannot be measured directly
  • Only certain observations (measurements) can be made
    • But observations are noisy (due to measurement errors)

What is the most likely state $x_t$ of the system at time $t$, given a sequence of observations $Z_t$? $$ \arg \max_{x_{t}} p\left(x_{t} \mid Z_{t}\right) $$

  • $x_t$: state of the system at time $t$

  • $z_t$: observation / measurement of certain aspects of the system at time $t$

  • Observations up to time $t$: $z_{1:t}$ or $Z_t$

Bayes Filter

  • Assume the state $x$ to be a Markov process $$ p\left(x_{t} \mid x_{t-1}, x_{t-2}, \ldots, x_{0}\right)=p\left(x_{t} \mid x_{t-1}\right) $$

  • States $x$ generate observations $z$; the current observation depends only on the current state $$ p\left(z_{t} \mid x_{t}, x_{t-1}, \ldots, x_{0}\right)=p\left(z_{t} \mid x_{t}\right) $$

  • Want to estimate the most likely state $x_t$ given the observation sequence $Z_t$: $$ \arg \max_{x_{t}} p\left(x_{t} \mid Z_{t}\right) $$

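  • Can be estimated recursively by alternating a prediction step and an update step:

    $$ p\left(x_{t} \mid Z_{t-1}\right)=\int p\left(x_{t} \mid x_{t-1}\right) p\left(x_{t-1} \mid Z_{t-1}\right) d x_{t-1} $$

    $$ p\left(x_{t} \mid Z_{t}\right)=\eta\, p\left(z_{t} \mid x_{t}\right) p\left(x_{t} \mid Z_{t-1}\right) $$

    where $\eta$ is a normalizing constant.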

  • Need:

    • Process model: $p(x_t | x_{t-1})$
    • Measurement model: $p(z_t | x_t)$
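The recursive estimate can be sketched for a discretized state space, where the integral over $x_{t-1}$ becomes a sum over grid cells. This is a minimal sketch assuming a 1-D grid of `n` cells, a column-stochastic transition matrix as the process model and a per-cell likelihood vector as the measurement model; the function name `bayes_filter_step` and the toy numbers are hypothetical.

```python
import numpy as np

def bayes_filter_step(belief, transition, likelihood):
    """One Bayes filter cycle on a discrete grid of states.

    belief:     p(x_{t-1} | Z_{t-1}), shape (n,)
    transition: process model, transition[j, i] = p(x_t = j | x_{t-1} = i)
    likelihood: measurement model at the current z_t, likelihood[j] = p(z_t | x_t = j)
    """
    predicted = transition @ belief        # prediction: sum over x_{t-1}
    posterior = likelihood * predicted     # update: multiply by p(z_t | x_t)
    return posterior / posterior.sum()     # normalize so the posterior sums to 1

# Hypothetical toy example: 5 cells on a ring, the target moves one cell
# to the right with probability 0.8 and stays put with probability 0.2.
n = 5
transition = np.zeros((n, n))
for i in range(n):
    transition[(i + 1) % n, i] = 0.8
    transition[i, i] = 0.2
belief = np.full(n, 1.0 / n)                       # uniform prior
likelihood = np.array([0.1, 0.1, 0.9, 0.1, 0.1])   # sensor favors cell 2
belief = bayes_filter_step(belief, transition, likelihood)
print(belief.argmax())  # → 2
```

Calling `bayes_filter_step` once per frame with the latest likelihood vector gives the recursive filter; only the previous belief has to be stored.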

Kalman filter

  • An instance of a Bayes filter
  • Assumes
    • Linear state propagation and measurement model
    • Gaussian process and measurement noise

The process to be estimated: $$ \begin{array}{ll} x_{k}=A x_{k-1}+w_{k-1} & \quad p(w) \sim N(0, Q) \\ z_{k}=H x_{k}+v_{k} & \quad p(v) \sim N(0, R) \end{array} $$

  • $x_k$: state at time $k$
  • $A$: transition matrix
  • $z_k$: observation at time $k$
  • $H$: measurement matrix
  • $p(w) \sim N(0, Q)$: process noise
  • $p(v) \sim N(0, R)$: measurement noise
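For this linear-Gaussian model, each time step consists of a prediction (time update) and a correction (measurement update). The following is a minimal NumPy sketch of these standard equations in the notation above ($A$, $H$, $Q$, $R$); the function name `kalman_step` and the constant-velocity example (state = position and velocity, only the position is measured) are assumptions for illustration.

```python
import numpy as np

def kalman_step(x, P, z, A, H, Q, R):
    """One predict/correct cycle for x_k = A x_{k-1} + w_{k-1}, z_k = H x_k + v_k."""
    # Time update (predict)
    x_pred = A @ x
    P_pred = A @ P @ A.T + Q
    # Measurement update (correct)
    S = H @ P_pred @ H.T + R               # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)    # Kalman gain
    x_new = x_pred + K @ (z - H @ x_pred)
    P_new = (np.eye(len(x)) - K @ H) @ P_pred
    return x_new, P_new

# Hypothetical constant-velocity model: state = [position, velocity], position measured
dt = 1.0
A = np.array([[1.0, dt], [0.0, 1.0]])
H = np.array([[1.0, 0.0]])
Q = 1e-3 * np.eye(2)
R = np.array([[0.5]])
x, P = np.zeros(2), np.eye(2)
for k in range(1, 31):
    z = np.array([2.0 * k])   # noise-free positions of a target moving 2 units/frame
    x, P = kalman_step(x, P, z, A, H, Q, R)
print(x)  # close to [60, 2]
```

`np.linalg.inv` is adequate for this 1×1 innovation covariance; for larger measurement vectors, solving the linear system instead of inverting $S$ is numerically preferable.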

Note:

  • The standard Kalman filter is NOT applicable when the process to be estimated or the measurement relationship to the process is nonlinear.

    $\rightarrow$ The Extended Kalman Filter (EKF) linearizes about the current mean and covariance
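A minimal scalar sketch of this linearization, assuming a hypothetical setup: a static target at an unknown position $x$ on a line, observed by a range sensor mounted at height 1 above the origin, so that $z = h(x) = \sqrt{x^2 + 1} + v$ is nonlinear in $x$. The EKF uses the derivative of $h$, evaluated at the predicted mean, in place of $H$; the function and parameter names are illustrative.

```python
import math

def ekf_step(x, P, z, f, f_jac, h, h_jac, Q, R):
    """One EKF cycle for a scalar state with nonlinear f (process) and h (measurement)."""
    # Predict: propagate the mean through f, the variance through the derivative of f
    F = f_jac(x)
    x_pred = f(x)
    P_pred = F * P * F + Q
    # Update: linearize h about the predicted mean
    H = h_jac(x_pred)
    S = H * P_pred * H + R
    K = P_pred * H / S
    x_new = x_pred + K * (z - h(x_pred))
    P_new = (1.0 - K * H) * P_pred
    return x_new, P_new

# Hypothetical example: target at x = 3, range sensor at height 1
f = lambda x: x                              # static target (random-walk process model)
f_jac = lambda x: 1.0
h = lambda x: math.sqrt(x * x + 1.0)         # nonlinear range measurement
h_jac = lambda x: x / math.sqrt(x * x + 1.0)
z = h(3.0)                                   # noise-free range measurement
x, P = 1.0, 4.0                              # rough initial guess
for _ in range(30):
    x, P = ekf_step(x, P, z, f, f_jac, h, h_jac, Q=1e-4, R=0.01)
print(x)  # converges toward 3.0
```

Because $h$ is symmetric in $x$, a negative initial guess would converge to $-3$ instead: the linearization only follows the local shape of $h$, which is exactly the kind of ambiguity a single Gaussian estimate cannot represent.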

Particle Filter

  • The Kalman Filter often fails when the measurement density is multimodal / non-Gaussian.
  • A Particle Filter represents and propagates arbitrary probability distributions, which are approximated by a set of weighted samples.
    • Particle filtering is a numerical technique (unlike the Kalman filter, which is analytical).
    • Like a Kalman Filter, a Particle Filter incorporates a dynamic model describing the system dynamics.

Bayesian Tracking

Bayes' rule applied to tracking $$ \arg \max _{x_{t}} p\left(x_{t} \mid Z_{t}\right)=\arg \max _{x_{t}} p\left(z_{t} \mid x_{t}\right) p\left(x_{t} \mid Z_{t-1}\right) $$

$$ p\left(x_{t} \mid Z_{t-1}\right)=\int_{x_{t-1}} p\left(x_{t} \mid x_{t-1}\right) p\left(x_{t-1} \mid Z_{t-1}\right) d x_{t-1} $$

Simplifying assumption (Markov): $$ p\left(x_{t} \mid X_{t-1}\right)=p\left(x_{t} \mid x_{t-1}\right) $$ where

  • $x_t$: state at time $t$
  • $z_t$: observation at time $t$
  • $X_t$: history of states up to time $t$
  • $Z_t$: history of observations up to time $t$

Observation and Motion Models

  • $p(z_t | x_t)$: the likelihood of observing $z_t$, given that the true state of the system is $x_t$
  • $p(x_{t} | x_{t-1})$: the probability that the system is in state $x_t$, given that its previous state was $x_{t-1}$

Factored Sampling

The probability density function is represented by weighted samples (“particles”)
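In factored sampling, samples are drawn from a prior $p(x)$ and each is weighted by the likelihood $p(z \mid x)$; the normalized weights then approximate the posterior $p(x \mid z)$. A minimal sketch assuming a 1-D Gaussian prior $N(0, 2^2)$ and a Gaussian likelihood with standard deviation 1 around a hypothetical measurement $z = 3$, for which the exact posterior mean is $3 / (1/4 + 1) = 2.4$; the function name `factored_sampling` is illustrative.

```python
import math
import random

def factored_sampling(sample_prior, likelihood, n):
    """Approximate p(x | z) ∝ p(z | x) p(x) by n samples from the prior, weighted by the likelihood."""
    samples = [sample_prior() for _ in range(n)]
    weights = [likelihood(s) for s in samples]
    total = sum(weights)
    return samples, [w / total for w in weights]

random.seed(0)
z = 3.0
samples, weights = factored_sampling(
    sample_prior=lambda: random.gauss(0.0, 2.0),          # prior N(0, 2^2)
    likelihood=lambda x: math.exp(-0.5 * (z - x) ** 2),   # Gaussian, sigma = 1 (constant cancels)
    n=20000,
)
posterior_mean = sum(s * w for s, w in zip(samples, weights))
print(posterior_mean)  # close to the analytic posterior mean 2.4
```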

Particle Filter (PF)

For a PF tracker, you need

  • a set of $N$ weighted samples (particles) at time $k$ $$ \left\{\left(s_{k}^{(i)}, \pi_{k}^{(i)}\right) \mid i=1 \dots N\right\} $$

  • the motion model $$ s_{k}^{(i)} \leftarrow s_{k-1}^{(i)} $$

  • the observation model $$ \pi_{k}^{(i)} \leftarrow s_{k}^{(i)} $$

The Condensation Algorithm

A popular instance of a particle filter in Computer Vision

  1. Select

    Randomly select $N$ new samples $s_{k}^{(i)}$ from the old sample set $s_{k-1}^{(i)}$ according to their weights $\pi_{k-1}^{(i)}$

  2. Predict

    Propagate the samples using the motion model

  3. Measure

    Calculate weights for the new samples using the observation model $$ \pi_{k}^{(i)}=p\left(z_{k} \mid x_{k}=s_{k}^{(i)}\right) $$
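The three steps can be sketched for a 1-D position state. This is a minimal sketch, not the original Condensation implementation: the motion model (constant drift of +1 per frame plus Gaussian noise), the Gaussian observation model and all parameter values are hypothetical.

```python
import math
import random

def condensation_step(particles, weights, z, drift=1.0, motion_sigma=0.5, meas_sigma=1.0):
    """One Select / Predict / Measure cycle for a 1-D position state."""
    n = len(particles)
    # 1. Select: draw N samples from the old set with probability proportional to the weights
    selected = random.choices(particles, weights=weights, k=n)
    # 2. Predict: hypothetical motion model, constant drift plus Gaussian noise
    predicted = [s + drift + random.gauss(0.0, motion_sigma) for s in selected]
    # 3. Measure: pi_k = p(z_k | x_k = s_k) with a Gaussian observation model
    #    (the normalization constant cancels when the weights are normalized)
    new_weights = [math.exp(-0.5 * ((z - s) / meas_sigma) ** 2) for s in predicted]
    total = sum(new_weights)
    return predicted, [w / total for w in new_weights]

random.seed(1)
particles = [random.uniform(-10.0, 10.0) for _ in range(500)]   # broad initial guess
weights = [1.0 / len(particles)] * len(particles)
for k in range(1, 21):
    z = float(k)   # hypothetical measurements: target moves +1 per frame from 0
    particles, weights = condensation_step(particles, weights, z)
# Weighted mean as a simple point estimate (reasonable here: the posterior is unimodal)
estimate = sum(p * w for p, w in zip(particles, weights))
print(estimate)  # close to 20
```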

How to get the target position?

  • Cluster the particle set and search for the highest mode
  • Just take the strongest particle
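The two options can behave quite differently when a single heavily weighted particle lies away from the bulk of the probability mass. A minimal 1-D sketch; the histogram binning in `highest_mode` is a crude stand-in for a proper clustering step, and the function names, bin width and sample values are hypothetical.

```python
from collections import defaultdict

def strongest_particle(particles, weights):
    """Return the particle with the largest weight."""
    best = max(range(len(particles)), key=lambda i: weights[i])
    return particles[best]

def highest_mode(particles, weights, bin_width=1.0):
    """Crude mode search: bin the 1-D particles, pick the bin with the largest
    total weight, and return the weighted mean of the particles in it."""
    bins = defaultdict(list)
    for p, w in zip(particles, weights):
        bins[int(p // bin_width)].append((p, w))
    heaviest = max(bins.values(), key=lambda members: sum(w for _, w in members))
    total = sum(w for _, w in heaviest)
    return sum(p * w for p, w in heaviest) / total

# Hypothetical particle set: most of the mass near 2, one strong outlier at 8.5
particles = [2.1, 2.3, 2.4, 2.2, 8.5]
weights = [0.15, 0.15, 0.15, 0.15, 0.4]
print(strongest_particle(particles, weights))  # → 8.5
print(highest_mode(particles, weights))        # ≈ 2.25
```

Here the strongest particle carries only 40% of the total weight, while the cluster near 2 carries 60%, so the mode-based estimate is the more representative one.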

How many particles are needed?

  • Depends strongly on the dimension of the state space!
  • Tracking 1 object in the image plane typically requires 50-500 particles

Problem

The Dimensionality Problem
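One way to see the problem: if particles are spread uniformly over a unit hypercube and a particle is only useful when it lies within ±0.1 of the true state in every dimension, the useful fraction is $0.2^d$, so the number of particles needed for the same coverage grows exponentially with the state dimension $d$. A minimal Monte Carlo sketch under these hypothetical assumptions (the function name `hit_rate` and the tolerance are illustrative):

```python
import random

def hit_rate(dim, n=100_000, tol=0.1, seed=0):
    """Fraction of particles drawn uniformly from [0, 1]^dim that land within
    ±tol of the true state (0.5, ..., 0.5) in every dimension."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        if all(abs(rng.random() - 0.5) <= tol for _ in range(dim)):
            hits += 1
    return hits / n

for d in (1, 2, 3, 6):
    # simulated fraction vs. the exact value (2 * tol) ** d = 0.2 ** d
    print(d, hit_rate(d), 0.2 ** d)
```

Going from 1 to 6 dimensions shrinks the useful fraction from 20% to about 0.006%, which is why trackers with large joint state spaces need far more particles than a single target in the image plane.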

Examples

Tracking one Face with a Particle Filter

  • State: ($x$, $y$, scale)

  • Observations: skin color

  • Procedure:

    1. Select and predict samples

    2. Measurement step

      • For each particle

        • Count the supporting skin pixels in the box defined by ($x$, $y$, scale)
        • Determine the particle weights from this skin-color support
      • The particle with the maximum weight is chosen as the best solution
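The measurement step can be sketched on a binary skin mask (1 = skin-colored pixel). This is a minimal sketch assuming integer pixel coordinates, a square box of half-size `scale` centered at ($x$, $y$), and weights proportional to the raw skin-pixel count; the function names and the toy mask are hypothetical. Since raw counts favor larger boxes, a real tracker may also need to take the box area into account.

```python
def skin_support(mask, x, y, scale):
    """Count skin pixels (mask value 1) inside the square box of half-size `scale`
    centered at (x, y); parts of the box outside the image are ignored."""
    height, width = len(mask), len(mask[0])
    count = 0
    for row in range(max(0, y - scale), min(height, y + scale + 1)):
        for col in range(max(0, x - scale), min(width, x + scale + 1)):
            count += mask[row][col]
    return count

def measure(particles, mask):
    """Measurement step: weight each (x, y, scale) particle by its skin support,
    normalize, and return the weights and the particle with maximum weight."""
    support = [skin_support(mask, x, y, s) for x, y, s in particles]
    total = sum(support) or 1   # avoid division by zero when no skin is found
    weights = [c / total for c in support]
    best = particles[max(range(len(particles)), key=lambda i: weights[i])]
    return weights, best

# Hypothetical 10x10 mask with a 3x3 skin blob at rows 2-4, columns 6-8
mask = [[1 if 2 <= r <= 4 and 6 <= c <= 8 else 0 for c in range(10)] for r in range(10)]
particles = [(2, 7, 1), (5, 3, 1), (7, 3, 1)]
weights, best = measure(particles, mask)
print(weights, best)  # → [0.0, 0.25, 0.75] (7, 3, 1)
```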

Tracking multiple objects

Two different approaches:

  • A dedicated tracker for each of the objects
    • Start with one tracker; once an object is tracked, initialize one more tracker to search for further objects
    • Typically fast and well parallelizable
    • An optimal global assignment / tracking is difficult to find; information has to be shared across trackers to find a good assignment
  • A single tracker in a joint state space
    • Easier to find optimal assignment
    • Number of objects has to be known in advance
    • State space becomes high dimensional (curse of dimensionality)
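The assignment issue can be illustrated with a small cost matrix of track-to-detection distances. A minimal sketch, assuming a square cost matrix: `greedy_assignment` lets each tracker in turn grab its cheapest remaining detection (little coordination between trackers), while `optimal_assignment` searches all joint assignments exhaustively, which is only feasible for a handful of objects. Both functions and the numbers are hypothetical.

```python
from itertools import permutations

def greedy_assignment(cost):
    """Each tracker in turn takes its cheapest detection not yet claimed by another tracker."""
    taken, result = set(), []
    for row in cost:
        j = min((j for j in range(len(row)) if j not in taken), key=lambda j: row[j])
        taken.add(j)
        result.append(j)
    return result

def optimal_assignment(cost):
    """Exhaustive search over all track-to-detection permutations (small N only)."""
    n = len(cost)
    best = min(permutations(range(n)), key=lambda p: sum(cost[i][p[i]] for i in range(n)))
    return list(best)

# cost[i][j]: distance between tracker i and detection j (hypothetical values)
cost = [[1, 2],
        [2, 10]]
print(greedy_assignment(cost))   # → [0, 1]  (total cost 11)
print(optimal_assignment(cost))  # → [1, 0]  (total cost 4)
```

The exhaustive search grows as $N!$, which mirrors the trade-off listed above: the joint formulation finds the optimum but scales badly with the number of objects.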

Face and Head Pose Tracking

  • Particle filter: Head-pose estimation integrated in the tracker
  • Observation model
    • Use bank of face detectors for different poses
    • Update particle weights with score of matching detector, i.e. the detector with closest angle to hypothesis
  • Dynamical model: Gaussian noise, no explicit velocity model
  • Occlusion handling
    • Set a particle’s weight to zero if it is too close to another track’s center
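The dynamical model, observation model and occlusion handling described above can be sketched per particle with state ($x$, $y$, pan angle). This is a minimal sketch: the detector bank is represented by a hypothetical list of detector angles plus a scoring callback `score_fn(angle, x, y)`, and the noise levels and distance threshold are illustrative values, not taken from the original tracker.

```python
import math
import random

def predict(particle, rng, sigma_xy=5.0, sigma_angle=10.0):
    """Dynamical model: add Gaussian noise to every component (no velocity term)."""
    x, y, angle = particle
    return (x + rng.gauss(0.0, sigma_xy),
            y + rng.gauss(0.0, sigma_xy),
            angle + rng.gauss(0.0, sigma_angle))

def particle_weight(particle, detector_angles, score_fn, other_centers, min_dist):
    """Observation model with occlusion handling for one particle (x, y, angle)."""
    x, y, angle = particle
    # Occlusion handling: zero weight if too close to another track's center
    if any(math.hypot(x - cx, y - cy) < min_dist for cx, cy in other_centers):
        return 0.0
    # Use the score of the detector whose pose angle is closest to the hypothesis
    closest = min(detector_angles, key=lambda a: abs(a - angle))
    return score_fn(closest, x, y)

# Hypothetical detector bank (angles in degrees) and a fake scoring function
# in which the 45-degree detector responds strongly, all others weakly
detector_angles = [-90, -45, 0, 45, 90]
score_fn = lambda a, x, y: 1.0 if a == 45 else 0.2
other_centers = [(50.0, 50.0)]   # center of another tracked head

print(particle_weight((10.0, 10.0, 40.0), detector_angles, score_fn, other_centers, 15.0))   # → 1.0
print(particle_weight((48.0, 52.0, 40.0), detector_angles, score_fn, other_centers, 15.0))   # → 0.0
print(particle_weight((10.0, 10.0, -10.0), detector_angles, score_fn, other_centers, 15.0))  # → 0.2
moved = predict((10.0, 10.0, 40.0), random.Random(0))
```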