Tracking 2
Multi-Camera Systems
Types of multi-camera systems
Stereo-camera system (narrow baseline)
- Close distance and equal orientation
- An object’s appearance is almost the same in both cameras
- Allows for calculation of a dense disparity map (see the sketch after this list)
Wide-baseline multi-camera system
- Arbitrary distance and orientation, overlapping field of view
- An object’s appearance is different in each of the cameras
- Allows for 3D localization of objects in the joint field of view
Multi-camera network
- Non-overlapping field of view
- An object’s appearance differs strongly from one camera to another
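As a concrete illustration of the narrow-baseline case, the following minimal sketch computes a dense disparity map for a rectified stereo pair with OpenCV's block matcher; the file names and matcher settings are assumptions for illustration only.

```python
import cv2

# Minimal sketch: dense disparity from a rectified narrow-baseline stereo pair.
# The file names are placeholders; a real pipeline would rectify the images
# first using the calibration described below.
left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

# Block matching works here because the object's appearance is almost the same
# in both views (close distance, equal orientation).
matcher = cv2.StereoBM_create(numDisparities=64, blockSize=15)
disparity = matcher.compute(left, right)  # fixed-point disparity map

# Normalize for visualization and save.
vis = cv2.normalize(disparity, None, 0, 255, cv2.NORM_MINMAX).astype("uint8")
cv2.imwrite("disparity.png", vis)
```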
3D to 2D projection: Pinhole Camera Model
Summary:
$$ z^{\prime} = -f $$
$$ \frac{y^{\prime}}{-f}=\frac{y}{z} \Rightarrow y^{\prime}=\frac{-f y}{z} $$
$$ \frac{x^{\prime}}{-f}=\frac{x}{z} \Rightarrow x^{\prime}=\frac{-f x}{z} $$
Pixel coordinates $(u, v)$ of the projected points on the image plane:
$$ \begin{array}{l} u = k\_{u} x^{\prime} + u\_{0} \\\\ v = -k\_{v} y^{\prime} + v\_{0} \end{array} $$
where $k\_u$ and $k\_v$ are scaling factors which denote the ratio between world and pixel coordinates.
In matrix formulation:
$$ \left(\begin{array}{l} u \\\\ v \end{array}\right)=\left(\begin{array}{cc} k\_{u} & 0 \\\\ 0 & -k\_{v} \end{array}\right)\left(\begin{array}{l} x^{\prime} \\\\ y^{\prime} \end{array}\right)+\left(\begin{array}{l} u\_{0} \\\\ v\_{0} \end{array}\right) $$
Perspective Projection
Internal camera parameters
$$ \begin{array}{l} \alpha\_{u}=k\_{u} f \\\\ \alpha\_{v}=-k\_{v} f \\\\ u\_{0} \\\\ v\_{0} \end{array} $$
- Have to be known to perform the projection
- They depend only on the camera
- Perform calibration to estimate them
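A minimal NumPy sketch of the projection chain above, going from camera coordinates to pixel coordinates; the parameter values and the test point are made-up assumptions for illustration.

```python
import numpy as np

# Made-up internal camera parameters (assumptions for illustration only).
f = 0.008                       # focal length [m]
k_u, k_v = 125000.0, 125000.0   # pixels per metre on the sensor
u_0, v_0 = 320.0, 240.0         # principal point [px]

def project(point_cam):
    """Project a point given in camera coordinates to pixel coordinates."""
    x, y, z = point_cam
    # Perspective projection onto the image plane at z' = -f.
    x_img = -f * x / z
    y_img = -f * y / z
    # Scale and shift into pixel coordinates.
    u = k_u * x_img + u_0
    v = -k_v * y_img + v_0
    return np.array([u, v])

# Example: a point roughly 2 m in front of the camera.
print(project(np.array([0.1, 0.2, 2.0])))
```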
Calibration
Intrinsic parameters: describe the optical properties of each camera (“the camera model”)
- $f$: focal length
- $c\_x, c\_y$: the principal point (“optical center”), sometimes also denoted as $u\_0, v\_0$
- $K\_1, \dots, K\_n$: distortion parameters (radial and tangential)
Extrinsic parameters: describe the location of each camera with respect to a global coordinate system
- $\mathbf{T}$: translation vector
- $\mathbf{R}$: $3 \times 3$ rotation matrix
Transformation of world coordinate of point $p^* = (x, y, z)$ to camera coordinate $p$:
$$ p = \mathbf{R} (x, y, z)^T + \mathbf{T} $$
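A small sketch of this extrinsic transformation; the rotation matrix and translation vector below are arbitrary example values (a 90° rotation about the z-axis and a one-metre offset along x), not the output of a real calibration.

```python
import numpy as np

# Example extrinsics (assumptions for illustration only).
R = np.array([[0.0, -1.0, 0.0],
              [1.0,  0.0, 0.0],
              [0.0,  0.0, 1.0]])
T = np.array([1.0, 0.0, 0.0])

def world_to_camera(p_world, R, T):
    """Map a point from world coordinates into the camera coordinate frame."""
    return R @ p_world + T

p_cam = world_to_camera(np.array([0.5, 0.2, 3.0]), R, T)
print(p_cam)
```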
Calibration steps
- For each camera: A calibration target with a known geometry is captured from multiple views
- The corner points are extracted (semi-)automatically
- The locations of the corner points are used to estimate the intrinsics iteratively
- Once the intrinsics are known, a fixed calibration target is captured from all of the cameras to estimate the extrinsics
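A hedged sketch of the intrinsic calibration steps for one camera, using OpenCV's standard checkerboard workflow; the board dimensions, square size and file pattern are assumptions, not values from the lecture.

```python
import glob
import cv2
import numpy as np

# Assumed checkerboard geometry (inner corners and square size).
board_cols, board_rows = 9, 6
square = 0.025  # square size in metres

# Known 3D geometry of the planar target (z = 0 in the target frame).
objp = np.zeros((board_rows * board_cols, 3), np.float32)
objp[:, :2] = np.mgrid[0:board_cols, 0:board_rows].T.reshape(-1, 2) * square

obj_points, img_points = [], []
for path in glob.glob("calib/*.png"):          # multiple views of the target
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    found, corners = cv2.findChessboardCorners(gray, (board_cols, board_rows))
    if found:                                   # corners extracted automatically
        obj_points.append(objp)
        img_points.append(corners)

# Iteratively estimates focal length, principal point and distortion parameters.
rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(
    obj_points, img_points, gray.shape[::-1], None, None)
print("reprojection error:", rms)
print("camera matrix:\n", K)
```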
Triangulation
- Assumption: the object location is known in multiple views
- Ideally: the intersection of the lines of sight determines the 3D location
- Practically: the lines of sight do not intersect exactly, so the 3D location is found by a least-squares approximation
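A minimal sketch of least-squares triangulation from two calibrated views using the standard DLT construction; the 3×4 projection matrices are assumed to combine the intrinsics and extrinsics estimated above.

```python
import numpy as np

def triangulate(P1, P2, uv1, uv2):
    """Least-squares triangulation of one point seen in two views.

    P1, P2: 3x4 projection matrices of the calibrated cameras.
    uv1, uv2: pixel coordinates of the same object in the two images.
    """
    # Each view contributes two linear equations in the homogeneous 3D point X.
    A = np.vstack([
        uv1[0] * P1[2] - P1[0],
        uv1[1] * P1[2] - P1[1],
        uv2[0] * P2[2] - P2[0],
        uv2[1] * P2[2] - P2[1],
    ])
    # The lines of sight rarely intersect exactly; the SVD yields the
    # least-squares solution of A X = 0.
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]   # back from homogeneous coordinates
```

Stacking two rows per camera extends the same construction to more than two overlapping views.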