Tracking

Introduction

Tracking vs. Detection

  • Detection: Find an object in a single image

    • Face, person, body part, facial landmarks, …
    • No assumptions are made about dynamics or temporal consistency
  • Tracking:

    • Determine a target’s location (and/or rotation, deformation, pose, …) over a sequence of images

      That is, determine the target’s state (location and/or rotation, deformation, pose, …) over a sequence of observations derived from images

    • Provides object positions (etc.) in each frame

Motivation

  • Use more than one image to analyse the scene
  • Use a-priori knowledge to improve analysis
    • System dynamics, imaging / measurement process, …

Target types

  • Single objects: face, person, …
  • Multiple objects: group of people, head and hands, …
  • Articulated body: full body, hand

Sensor setup

  • Single camera
  • Multiple cameras
  • Active cameras
  • Cameras + microphones

Observations used for tracking

  • Templates
  • Color
  • Foreground-background segmentation
  • Edges
  • Dense Disparity
  • Optical flow
  • Detectors (body, body parts)

Tracking as State Estimation

  • Want to estimate the state of the system (position, pose, …)
    • But the state cannot be measured directly
  • Only certain observations (measurements) can be made
    • But observations are noisy (due to measurement errors)

What is the most likely state $x$ of the system at a given time, given a sequence of observations $Z_t$?

$$
\arg\max p\left(x_{t} \mid Z_{t}\right)
$$

  • $x_t$: state of the system at time $t$

  • $z_t$: observation / measurement of certain aspects of the system at time $t$

  • Observations up to time $t$: $z_{1:t}$ or $Z_t$

Bayes Filter

  • Assume the state $x$ to be a Markov process

    $$
    p\left(x_{t} \mid x_{t-1}, x_{t-2}, \dots, x_{0}\right)=p\left(x_{t} \mid x_{t-1}\right)
    $$
  • States $x$ generate observations $z$

    $$
    p\left(z_{t} \mid x_{t}, x_{t-1}, \dots, x_{0}\right)=p\left(z_{t} \mid x_{t}\right)
    $$
  • Want to estimate the most likely state $x_t$ given the sequence $Z_t$:

    $$
    \arg\max p\left(x_{t} \mid Z_{t}\right)
    $$
  • Can be estimated recursively

  • Need:

    • Process model: $p(x_t \mid x_{t-1})$
    • Measurement model: $p(z_t \mid x_t)$
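
The recursion can be sketched for a small discrete state space, where the process model becomes a transition matrix and the measurement model a vector of likelihoods. This is only an illustrative sketch: the three-cell world, the matrix values, and the function name `bayes_filter_step` are assumptions, not part of the original notes.

```python
import numpy as np

def bayes_filter_step(belief, transition, likelihood):
    """One recursive update of a discrete Bayes filter.

    belief:      p(x_{t-1} | Z_{t-1}), shape (n,)
    transition:  transition[i, j] = p(x_t = i | x_{t-1} = j), shape (n, n)
    likelihood:  p(z_t | x_t = i) for the current observation, shape (n,)
    """
    predicted = transition @ belief        # prediction: p(x_t | Z_{t-1})
    posterior = likelihood * predicted     # update: unnormalized p(x_t | Z_t)
    return posterior / posterior.sum()

# Hypothetical 3-cell world: the target tends to move one cell onward.
transition = np.array([
    [0.2, 0.0, 0.8],
    [0.8, 0.2, 0.0],
    [0.0, 0.8, 0.2],
])
belief = np.full(3, 1.0 / 3.0)             # uniform prior
likelihood = np.array([0.1, 0.8, 0.1])     # assumed sensor model, favors cell 1

belief = bayes_filter_step(belief, transition, likelihood)
most_likely_state = int(np.argmax(belief))
```

Calling `bayes_filter_step` once per frame, with the likelihood vector recomputed from each new observation, yields the running estimate of the most likely state.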

Kalman filter

  • An instance of a Bayes filter
  • Assumes
    • Linear state propagation and measurement model
    • Gaussian process and measurement noise

The process to be estimated:

$$
\begin{array}{ll}
x_{k}=A x_{k-1}+w_{k-1} & \quad p(w) \sim N(0, Q) \\
z_{k}=H x_{k}+v_{k} & \quad p(v) \sim N(0, R)
\end{array}
$$

  • $x_k$: state at time $k$
  • $A$: transition matrix
  • $z_k$: observation at time $k$
  • $H$: measurement matrix
  • $p(w) \sim N(0, Q)$: process noise
  • $p(v) \sim N(0, R)$: measurement noise
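
A minimal sketch of one predict/correct cycle for this model, using the standard Kalman filter equations. The constant-velocity example, the noise values, and the helper name `kalman_step` are assumptions chosen for illustration.

```python
import numpy as np

def kalman_step(x, P, z, A, H, Q, R):
    """One predict/correct cycle for x_k = A x_{k-1} + w, z_k = H x_k + v."""
    # Predict: propagate the state estimate and its covariance.
    x_pred = A @ x
    P_pred = A @ P @ A.T + Q
    # Correct: blend prediction and measurement using the Kalman gain.
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)
    x_new = x_pred + K @ (z - H @ x_pred)
    P_new = (np.eye(len(x)) - K @ H) @ P_pred
    return x_new, P_new

# Hypothetical 1D constant-velocity target; only the position is measured.
A = np.array([[1.0, 1.0], [0.0, 1.0]])   # state: (position, velocity)
H = np.array([[1.0, 0.0]])
Q = 0.01 * np.eye(2)
R = np.array([[1.0]])

rng = np.random.default_rng(0)
x_est, P_est = np.zeros(2), 10.0 * np.eye(2)
true_state = np.array([0.0, 1.0])
for _ in range(50):
    true_state = A @ true_state
    z = H @ true_state + rng.normal(0.0, 1.0, size=1)   # noisy position
    x_est, P_est = kalman_step(x_est, P_est, z, A, H, Q, R)
```

After the loop, `x_est` holds the filtered position and velocity, and `P_est` their covariance.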

Note:

  • The simple Kalman Filter is NOT applicable when the process to be estimated is NOT linear or the measurement relationship to the process is NOT linear.

    $\rightarrow$ The Extended Kalman Filter (EKF) linearizes about the current mean and covariance

Particle Filter

  • The Kalman Filter often fails when the measurement density is multimodal / non-Gaussian.
  • A Particle Filter represents and propagates arbitrary probability distributions as a set of weighted samples.
    • Particle filtering is a numerical technique (unlike the Kalman filter, which is analytical).
    • Like a Kalman Filter, a Particle Filter incorporates a dynamic model describing the system dynamics.

Bayesian Tracking

Bayes rule applied to tracking

$$
\arg\max_{x_{t}} p\left(x_{t} \mid Z_{t}\right)=\arg\max_{x_{t}} p\left(z_{t} \mid x_{t}\right) p\left(x_{t} \mid Z_{t-1}\right)
$$

$$
p\left(x_{t} \mid Z_{t-1}\right)=\int_{x_{t-1}} p\left(x_{t} \mid x_{t-1}\right) p\left(x_{t-1} \mid Z_{t-1}\right)
$$

Simplifying assumption (Markov):

$$
p\left(x_{t} \mid X_{t-1}\right)=p\left(x_{t} \mid x_{t-1}\right)
$$

where

  • $x_t$: state at time $t$
  • $z_t$: observation at time $t$
  • $X_t$: history of states up to time $t$
  • $Z_t$: history of observations up to time $t$

Observation and Motion Model

  • $p(z_t \mid x_t)$: The likelihood that $z_t$ is observed, given that the true state of the system is $x_t$
  • $p(x_t \mid x_{t-1})$: The probability that the state of the system is $x_t$ when the previous state was $x_{t-1}$

Factored Sampling

The probability density function is represented by weighted samples (“particles”)
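
As a rough sketch of the idea: draw samples from a prior, weight each one by the observation likelihood, and treat the weighted set as an approximation of the posterior. The Gaussian prior, the Gaussian likelihood, and all numeric values below are assumptions chosen only to make the example checkable.

```python
import numpy as np

rng = np.random.default_rng(0)

# Draw N samples from a hypothetical broad prior over a 1D state.
N = 5000
samples = rng.normal(loc=0.0, scale=5.0, size=N)

# Weight each sample by an assumed Gaussian observation likelihood p(z | x).
z, sigma = 2.0, 0.5
weights = np.exp(-0.5 * ((z - samples) / sigma) ** 2)
weights /= weights.sum()

# The weighted set approximates the posterior, e.g. its mean:
posterior_mean = float(np.sum(weights * samples))
```

For this Gaussian setup the exact posterior mean is about 1.98, so the weighted-sample estimate can be compared against a known value.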

Particle Filter (PF)

For a PF tracker, you need

  • a set of $N$ weighted samples (particles) at time $k$

    $$
    \left\{\left(s_{k}^{(i)}, \pi_{k}^{(i)}\right) \mid i=1 \dots N\right\}
    $$
  • the motion model

    $$
    s_{k}^{(i)} \leftarrow s_{k-1}^{(i)}
    $$
  • the observation model

    $$
    \pi_{k}^{(i)} \leftarrow s_{k}^{(i)}
    $$

The Condensation Algorithm

A popular instance of a particle filter in Computer Vision

  1. Select

    Randomly select $N$ new samples $s_{k}^{(i)}$ from the old sample set $s_{k-1}^{(i)}$ according to their weights $\pi_{k-1}^{(i)}$

  2. Predict

    Propagate the samples using the motion model

  3. Measure

    Calculate weights for the new samples using the observation model

    $$
    \pi_{k}^{(i)}=p\left(z_{k} \mid x_{k}=s_{k}^{(i)}\right)
    $$
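
The three steps can be sketched for a 1D state as follows. The random-walk motion model, the Gaussian observation model, the noise levels, and the function name `condensation_step` are assumptions for illustration; a real tracker would substitute its own models.

```python
import numpy as np

def condensation_step(samples, weights, z, rng, motion_std=0.3, obs_std=0.5):
    """One select / predict / measure iteration for a 1D state."""
    n = len(samples)
    # 1. Select: resample N particles in proportion to the old weights.
    idx = rng.choice(n, size=n, p=weights)
    selected = samples[idx]
    # 2. Predict: propagate with an assumed random-walk motion model.
    predicted = selected + rng.normal(0.0, motion_std, size=n)
    # 3. Measure: weight by an assumed Gaussian likelihood p(z_k | x_k = s_k).
    new_weights = np.exp(-0.5 * ((z - predicted) / obs_std) ** 2)
    new_weights /= new_weights.sum()
    return predicted, new_weights

rng = np.random.default_rng(0)
N = 500
samples = rng.uniform(-10.0, 10.0, size=N)   # initial particles
weights = np.full(N, 1.0 / N)

true_x = 3.0                                  # hypothetical static target
for _ in range(20):
    z = true_x + rng.normal(0.0, 0.5)         # noisy observation
    samples, weights = condensation_step(samples, weights, z, rng)

# Read out the target position as the strongest particle.
estimate = float(samples[np.argmax(weights)])
```

The sketch assumes at least one particle lands near the observation; if all weights underflow to zero, the normalization would fail and the particle set would need to be reinitialized.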

How to get the target position?

  • Cluster the particle set and search for the highest mode
  • Just take the strongest particle

How many particles are needed?

  • Depends strongly on the dimension of the state space!
  • Tracking 1 object in the image plane typically requires 50-500 particles

Problem

The Dimensionality Problem
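
One rough way to see the problem, under the simplifying assumption that the state space has to be covered at a fixed resolution along every axis: the number of samples then grows exponentially with the number of dimensions. The value of 10 samples per axis and the function name are hypothetical.

```python
def grid_samples_needed(samples_per_dim: int, dims: int) -> int:
    """Samples needed to cover `dims` dimensions at a fixed per-axis resolution."""
    return samples_per_dim ** dims

# With a hypothetical 10 samples per axis:
counts = {d: grid_samples_needed(10, d) for d in (1, 2, 3, 6)}
# (x, y, scale) for one face is 3 dimensions; a joint state of two faces is 6.
```

Particle filters concentrate samples in likely regions rather than covering a full grid, so actual counts are lower, but the growth trend with dimension remains.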

Examples

Tracking one Face with a Particle Filter

  • State: ($x$, $y$, scale)

  • Observations: skin color

  • Procedure:

    1. Select and predict samples

    2. Measurement step

      • For each particle

        • Count supporting skin pixels in the box defined by ($x$, $y$, scale)
        • Particle weights are determined based on skin color support
      • The particle with maximum weight is chosen as the best solution
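
The measurement step could look roughly like the sketch below, which works on a precomputed binary skin mask. Several details are assumptions: the box is centered at ($x$, $y$) with side length `base_size * scale`, and the weight is the fraction of skin pixels in the box rather than the raw count. The function name `skin_support_weights` is hypothetical.

```python
import numpy as np

def skin_support_weights(skin_mask, particles, base_size=20):
    """Weight particles by the fraction of skin pixels inside their box.

    skin_mask: 2D boolean array, True where a pixel is classified as skin
    particles: array of rows (x, y, scale)
    """
    h, w = skin_mask.shape
    weights = np.zeros(len(particles))
    for i, (x, y, scale) in enumerate(particles):
        half = max(1, int(round(base_size * scale / 2)))
        # Clip the box to the image borders.
        x0 = int(np.clip(int(x) - half, 0, w))
        x1 = int(np.clip(int(x) + half, 0, w))
        y0 = int(np.clip(int(y) - half, 0, h))
        y1 = int(np.clip(int(y) + half, 0, h))
        box = skin_mask[y0:y1, x0:x1]
        if box.size > 0:
            weights[i] = box.sum() / box.size
    total = weights.sum()
    if total > 0:
        return weights / total
    return np.full(len(particles), 1.0 / len(particles))

# Synthetic mask with one skin-colored square centered at (60, 40).
mask = np.zeros((100, 120), dtype=bool)
mask[30:50, 50:70] = True
particles = np.array([[60.0, 40.0, 1.0], [20.0, 80.0, 1.0], [65.0, 42.0, 1.0]])
weights = skin_support_weights(mask, particles)
best = particles[np.argmax(weights)]   # particle with maximum weight
```

Here the first particle sits exactly on the skin region and receives the highest weight, the second has no skin support, and the third overlaps partially.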

Tracking multiple objects

Two different approaches:

  • A dedicated tracker for each of the objects
    • Start with one tracker; once an object is tracked, initialize one more tracker to search for more objects
    • Typically fast and well parallelizable
    • An optimal global assignment / tracking is difficult to find; information has to be shared across trackers to find a good assignment
  • A single tracker in a joint state space
    • Easier to find optimal assignment
    • Number of objects has to be known in advance
    • State space becomes high dimensional (curse of dimensionality)

Face and Head Pose Tracking

  • Particle filter: Head-pose estimation integrated in the tracker
  • Observation model
    • Use bank of face detectors for different poses
    • Update particle weights with score of matching detector, i.e. the detector with closest angle to hypothesis
  • Dynamical model: Gaussian noise, no explicit velocity model
  • Occlusion handling
    • Set a particle’s weight to zero if it is too close to another track’s center
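
A minimal sketch of this occlusion rule for particles in the image plane. The distance threshold `min_dist`, the renormalization of the remaining weights, and the function name are assumptions added for illustration.

```python
import numpy as np

def suppress_near_other_tracks(particles, weights, other_centers, min_dist):
    """Zero the weight of particles closer than min_dist to another track."""
    weights = weights.copy()
    for center in other_centers:
        dist = np.linalg.norm(particles[:, :2] - center, axis=1)
        weights[dist < min_dist] = 0.0
    total = weights.sum()
    # Renormalize the surviving weights (assumed; skipped if none survive).
    return weights / total if total > 0 else weights

particles = np.array([[10.0, 10.0], [50.0, 50.0], [52.0, 49.0]])
weights = np.array([0.2, 0.5, 0.3])
other_centers = np.array([[51.0, 50.0]])   # center of a second, hypothetical track
new_weights = suppress_near_other_tracks(
    particles, weights, other_centers, min_dist=5.0
)
```

In this example the two particles near the other track are suppressed, so all remaining weight falls on the first particle.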