Kalman Filter

The Kalman filter is an efficient recursive filter that estimates the internal state of a linear dynamic system from a series of noisy measurements.

Applications of the Kalman filter include

  • Guidance
  • Navigation
  • Control of vehicles, particularly aircraft, spacecraft, and dynamically positioned ships

💡 The basic idea of the Kalman filter is to obtain an optimal estimate of the (hidden) internal state by combining the state prediction and the measurement in a weighted fashion.


Kalman Filter Summary

Kalman filter in a picture:

[Figure: Kalman filter summary diagram]

Summary of equations:

| | Equation | Equation Name | Alternative Names |
|---|---|---|---|
| Predict | $\hat{\boldsymbol{x}}_{n, n-1}=\mathbf{F} \hat{\boldsymbol{x}}_{n-1, n-1} + \mathbf{G} \boldsymbol{u}_{n}$ | State Extrapolation | Predictor Equation, Transition Equation, Prediction Equation, Dynamic Model, State Space Model |
| | $\mathbf{P}_{n, n-1}=\mathbf{F} \mathbf{P}_{n-1, n-1} \mathbf{F}^{T}+\mathbf{Q}$ | Covariance Extrapolation | Predictor Covariance Equation |
| Update | $\mathbf{K}_{n}=\mathbf{P}_{n, n-1} \mathbf{H}^{T}\left(\mathbf{H} \mathbf{P}_{n, n-1} \mathbf{H}^{T}+\mathbf{R}_{n}\right)^{-1}$ | Kalman Gain | Weight Equation |
| | $\hat{\boldsymbol{x}}_{n, n}=\hat{\boldsymbol{x}}_{n, n-1}+\mathbf{K}_{n}\left(\boldsymbol{z}_{n}-\mathbf{H} \hat{\boldsymbol{x}}_{n, n-1}\right)$ | State Update | Filtering Equation |
| | $\mathbf{P}_{n, n}=\left(\mathbf{I}-\mathbf{K}_{n} \mathbf{H}\right) \mathbf{P}_{n, n-1}$ | Covariance Update | Corrector Equation |
| Auxiliary | $\boldsymbol{z}_{n} = \mathbf{H} \boldsymbol{x}_n + \boldsymbol{v}_n$ | Measurement Equation | |
| | $\mathbf{R}_n = E\left\{\boldsymbol{v}_n \boldsymbol{v}_n^T\right\}$ | Measurement Uncertainty | Measurement Error |
| | $\mathbf{Q}_n = E\left\{\boldsymbol{w}_n \boldsymbol{w}_n^T\right\}$ | Process Noise Uncertainty | Process Noise Error |
| | $\mathbf{P}_{n, n}=E\left\{\boldsymbol{e}_{n} \boldsymbol{e}_{n}^{T}\right\}=E\left\{\left(\boldsymbol{x}_{n}-\hat{\boldsymbol{x}}_{n, n}\right)\left(\boldsymbol{x}_{n}-\hat{\boldsymbol{x}}_{n, n}\right)^{T}\right\}$ | Estimation Uncertainty | Estimation Error |

Summary of notations:

| Term | Name | Alternative Term |
|---|---|---|
| $\boldsymbol{x}$ | State vector | |
| $\boldsymbol{z}$ | Output vector | $\boldsymbol{y}$ |
| $\mathbf{F}$ | State transition matrix | $\mathbf{\Phi}$, $\mathbf{A}$ |
| $\boldsymbol{u}$ | Input variable | |
| $\mathbf{G}$ | Control matrix | $\mathbf{B}$ |
| $\mathbf{P}$ | Estimate uncertainty | $\boldsymbol{\Sigma}$ |
| $\mathbf{Q}$ | Process noise uncertainty | |
| $\mathbf{R}$ | Measurement uncertainty | |
| $\boldsymbol{w}$ | Process noise vector | |
| $\boldsymbol{v}$ | Measurement noise vector | |
| $\mathbf{H}$ | Observation matrix | $\mathbf{C}$ |
| $\mathbf{K}$ | Kalman Gain | |
| $n$ | Discrete time index | $k$ |
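
To make the equations and notation concrete, below is a minimal NumPy sketch of one predict–update cycle. The model, noise levels, and measurement values are illustrative assumptions, not taken from any particular system:

```python
import numpy as np

def predict(x, P, F, G, u, Q):
    """State and covariance extrapolation."""
    x_pred = F @ x + G @ u                 # state extrapolation
    P_pred = F @ P @ F.T + Q               # covariance extrapolation
    return x_pred, P_pred

def update(x_pred, P_pred, z, H, R):
    """Kalman Gain, state update, covariance update (in the stable full form)."""
    S = H @ P_pred @ H.T + R               # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)    # Kalman Gain
    x = x_pred + K @ (z - H @ x_pred)      # state update with the innovation
    I = np.eye(P_pred.shape[0])
    P = (I - K @ H) @ P_pred @ (I - K @ H).T + K @ R @ K.T  # covariance update
    return x, P

# Illustrative constant-velocity model: state = [position, velocity]^T
dt = 1.0
F = np.array([[1.0, dt], [0.0, 1.0]])      # state transition matrix
G = np.array([[0.5 * dt**2], [dt]])        # control matrix (acceleration input)
H = np.array([[1.0, 0.0]])                 # we measure position only
Q = 0.01 * np.eye(2)                       # assumed process noise covariance
R = np.array([[1.0]])                      # assumed measurement noise covariance

x, P = np.zeros((2, 1)), np.eye(2)         # initial estimate and its uncertainty
u = np.array([[0.1]])                      # known acceleration input
z = np.array([[0.7]])                      # one noisy position measurement

x, P = predict(x, P, F, G, u, Q)
x, P = update(x, P, z, H, R)
print(x.ravel(), np.diag(P))
```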

Multidimensional Kalman Filter in Detail

The Kalman filter works as a two-phase (predict and update) process, built on five main equations:

State extrapolation equation

The Kalman filter assumes that the true state of the system at time step $n$ evolves from the state at time step $n-1$ according to

$$ \boldsymbol{x}_n = \mathbf{F} \boldsymbol{x}_{n-1} +\mathbf{G} \boldsymbol{u}_{n} + \boldsymbol{w}_n $$
  • $\boldsymbol{x}_{n}$ : state vector

  • $\boldsymbol{u}_{n}$ : control variable or input variable - a measurable (deterministic) input to the system

  • $\boldsymbol{w}_n$ : process noise or disturbance - an unmeasurable input that affects the state

  • $\mathbf{F}$ : state transition matrix - applies the effect of each system state parameter at time step $n-1$ on the system state at time step $n$

  • $\mathbf{G}$ : control matrix or input transition matrix (mapping control to state variables)

The state extrapolation equation

  • Predicts the next system state, based on the knowledge of the current state

  • Extrapolates the state vector from time step $n-1$ to $n$

  • Also called

    • Predictor Equation
    • Transition Equation
    • Prediction Equation
    • Dynamic Model
    • State Space Model
  • The general form in matrix notation

    $$ \hat{\boldsymbol{x}}_{n, n-1}=\mathbf{F} \hat{\boldsymbol{x}}_{n-1, n-1}+\mathbf{G} \boldsymbol{u}_{n} $$
    • $\hat{\boldsymbol{x}}_{n, n-1}$ : predicted system state vector at time step $n$

    • $\hat{\boldsymbol{x}}_{n-1, n-1}$ : estimated system state vector at time step $n-1$

    $\hat{\boldsymbol{x}}_{n, m}$ denotes the estimate of $\boldsymbol{x}$ at time step $n$ given observations/measurements up to and including time $m \leq n$

Example
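
A standard illustration, assuming a one-dimensional motion model with state $\boldsymbol{x} = [p, v]^T$ (position and velocity), a measured acceleration input $a_n$, and sampling interval $\Delta t$:

$$ \underbrace{\begin{bmatrix} \hat{p}_{n, n-1} \\ \hat{v}_{n, n-1} \end{bmatrix}}_{\hat{\boldsymbol{x}}_{n, n-1}} = \underbrace{\begin{bmatrix} 1 & \Delta t \\ 0 & 1 \end{bmatrix}}_{\mathbf{F}} \underbrace{\begin{bmatrix} \hat{p}_{n-1, n-1} \\ \hat{v}_{n-1, n-1} \end{bmatrix}}_{\hat{\boldsymbol{x}}_{n-1, n-1}} + \underbrace{\begin{bmatrix} \frac{1}{2} \Delta t^{2} \\ \Delta t \end{bmatrix}}_{\mathbf{G}} \underbrace{a_{n}}_{\boldsymbol{u}_{n}} $$

The predicted position is the previous position advanced by the velocity plus the acceleration term, and the predicted velocity is the previous velocity plus the acceleration increment.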

Covariance extrapolation equation

The covariance extrapolation equation extrapolates the uncertainty in our state prediction.

$$ \mathbf{P}_{n, n-1}=\mathbf{F} \mathbf{P}_{n-1, n-1} \mathbf{F}^{T}+\mathbf{Q} $$
  • $\mathbf{P}_{n-1, n-1}$ : uncertainty (covariance matrix) of the estimate at time step $n-1$

    $$ \begin{aligned} \mathbf{P}_{n-1, n-1} &= E\{\underbrace{(\boldsymbol{x}_{n-1} - \hat{\boldsymbol{x}}_{n-1, n-1})}_{=: \boldsymbol{e}_{n-1}} (\boldsymbol{x}_{n-1} - \hat{\boldsymbol{x}}_{n-1, n-1})^T\} \\ & = E\{\boldsymbol{e}_{n-1} \boldsymbol{e}_{n-1}^T\} \end{aligned} $$
  • $\mathbf{P}_{n, n-1}$ : uncertainty (covariance matrix) of the prediction at time step $n$

  • $\mathbf{F}$ : state transition matrix

  • $\mathbf{Q}$ : process noise matrix

    $$ \mathbf{Q}_n = E\{\boldsymbol{w}_n \boldsymbol{w}_n^T\} $$
    • $\boldsymbol{w}_n$ : process noise vector
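
As a concrete example (assuming the constant-velocity model above, with the process noise modeled as a random acceleration of variance $\sigma_a^2$ entering through $\mathbf{G}$):

$$ \mathbf{Q} = \sigma_a^{2} \, \mathbf{G} \mathbf{G}^{T} = \sigma_a^{2} \begin{bmatrix} \frac{\Delta t^{4}}{4} & \frac{\Delta t^{3}}{2} \\ \frac{\Delta t^{3}}{2} & \Delta t^{2} \end{bmatrix} $$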
Derivation

At time step $n$, the Kalman filter assumes

$$ \boldsymbol{x}_n = \mathbf{F} \boldsymbol{x}_{n-1} +\mathbf{G} \boldsymbol{u}_{n} + \boldsymbol{w}_n $$

The state prediction is

$$ \hat{\boldsymbol{x}}_{n, n-1}=\mathbf{F} \hat{\boldsymbol{x}}_{n-1, n-1}+\mathbf{G} \boldsymbol{u}_{n} $$

The difference between $\boldsymbol{x}_n$ and $\hat{\boldsymbol{x}}_{n, n-1}$ is

$$ \begin{aligned} \boldsymbol{x}_{n}-\hat{\boldsymbol{x}}_{n, n-1} &=\mathbf{F} \boldsymbol{x}_{n-1}+\mathbf{G} \boldsymbol{u}_{n}+\boldsymbol{w}_{n}-\left(\mathbf{F} \hat{\boldsymbol{x}}_{n-1, n-1}+\mathbf{G} \boldsymbol{u}_{n}\right) \\ &=\mathbf{F}\left(\boldsymbol{x}_{n-1}-\hat{\boldsymbol{x}}_{n-1, n-1}\right)+\boldsymbol{w}_{n} \end{aligned} $$

The covariance associated with the prediction $\hat{\boldsymbol{x}}_{n, n-1}$ of the unknown true state $\boldsymbol{x}_n$ is

$$ \begin{aligned} \mathbf{P}_{n, n-1} &= E\left\{\left(\boldsymbol{x}_{n}-\hat{\boldsymbol{x}}_{n, n-1}\right)\left(\boldsymbol{x}_{n}-\hat{\boldsymbol{x}}_{n, n-1}\right)^{T}\right\} \\ &= E\left\{\left(\mathbf{F}\left(\boldsymbol{x}_{n-1}-\hat{\boldsymbol{x}}_{n-1, n-1}\right)+\boldsymbol{w}_{n}\right)\left(\mathbf{F}\left(\boldsymbol{x}_{n-1}-\hat{\boldsymbol{x}}_{n-1, n-1}\right)+\boldsymbol{w}_{n}\right)^{T}\right\} \\ &= \mathbf{F} E\left\{\left(\boldsymbol{x}_{n-1}-\hat{\boldsymbol{x}}_{n-1, n-1}\right)\left(\boldsymbol{x}_{n-1}-\hat{\boldsymbol{x}}_{n-1, n-1}\right)^{T}\right\} \mathbf{F}^{T} \\ &\quad + \mathbf{F} E\left\{\left(\boldsymbol{x}_{n-1}-\hat{\boldsymbol{x}}_{n-1, n-1}\right) \boldsymbol{w}_{n}^{T}\right\} + E\left\{\boldsymbol{w}_{n}\left(\boldsymbol{x}_{n-1}-\hat{\boldsymbol{x}}_{n-1, n-1}\right)^{T}\right\} \mathbf{F}^{T} + E\left\{\boldsymbol{w}_{n} \boldsymbol{w}_{n}^{T}\right\} \end{aligned} $$

Noting that the state estimation errors and process noise are uncorrelated:

$$ E\left\{\left(\boldsymbol{x}_{n-1}-\hat{\boldsymbol{x}}_{n-1, n-1}\right) \boldsymbol{w}_{n}^{T}\right\} = E\left\{\boldsymbol{w}_{n}\left(\boldsymbol{x}_{n-1}-\hat{\boldsymbol{x}}_{n-1, n-1}\right)^{T}\right\} = 0 $$

Therefore

$$ \begin{aligned} \mathbf{P}_{n, n-1} &=\mathbf{F} \underbrace{E\left\{\left(\boldsymbol{x}_{n-1}-\hat{\boldsymbol{x}}_{n-1, n-1}\right)\left(\boldsymbol{x}_{n-1}-\hat{\boldsymbol{x}}_{n-1, n-1}\right)^{T}\right\}}_{=\mathbf{P}_{n-1, n-1}} \mathbf{F}^{T}+\underbrace{E\left\{\boldsymbol{w}_{n} \boldsymbol{w}_{n}^{T}\right\}}_{=\mathbf{Q}} \\ &=\mathbf{F} \mathbf{P}_{n-1, n-1}\mathbf{F}^T+\mathbf{Q} \end{aligned} $$

Kalman Gain equation

The Kalman Gain is calculated so that it minimizes the covariance of the a posteriori state estimate.

$$ \mathbf{K}_{n}=\mathbf{P}_{n, n-1} \mathbf{H}^{T}\left(\mathbf{H} \mathbf{P}_{n, n-1} \mathbf{H}^{T}+\mathbf{R}_{n}\right)^{-1} $$
Derivation

Rearrange the covariance update equation

$$ \begin{array}{l} \mathbf{P}_{n, n}=\left(\mathbf{I}-\mathbf{K}_{n} \mathbf{H}\right) \mathbf{P}_{n, n-1}{\color{DodgerBlue} \left(\mathbf{I}-\mathbf{K}_{n} \mathbf{H}\right)^{T}}+\mathbf{K}_{n} \mathbf{R}_{n} \mathbf{K}_{n}^{T} \\[1em] \mathbf{P}_{n, n}=\left(\mathbf{I}-\mathbf{K}_{n} \mathbf{H}\right) \mathbf{P}_{n, n-1}{\color{DodgerBlue}\left(\mathbf{I}-\left(\mathbf{K}_{n} \mathbf{H}\right)^{T}\right)}+\mathbf{K}_{n} \mathbf{R}_{n} \mathbf{K}_{n}^{T} \qquad | \ \mathbf{I} = \mathbf{I}^T \\[1em] \mathbf{P}_{n, n}={\color{ForestGreen}\left(\mathbf{I}-\mathbf{K}_{n} \mathbf{H}\right) \mathbf{P}_{n, n-1}}{\color{DodgerBlue}\left(\mathbf{I}-\mathbf{H}^{T} \mathbf{K}_{n}^{T}\right)}+\mathbf{K}_{n} \mathbf{R}_{n} \mathbf{K}_{n}^{T} \qquad | \ (\mathbf{AB})^T = \mathbf{B}^T \mathbf{A}^T \\[1em] \mathbf{P}_{n, n}={\color{ForestGreen}\left(\mathbf{P}_{n, n-1}-\mathbf{K}_{n} \mathbf{H} \mathbf{P}_{n, n-1}\right)}\left(\mathbf{I}-\mathbf{H}^{T} \mathbf{K}_{n}^{T}\right)+\mathbf{K}_{n} \mathbf{R}_{n} \mathbf{K}_{n}^{T} \\[1em] \mathbf{P}_{n, n}=\mathbf{P}_{n, n-1}-\mathbf{P}_{n, n-1} \mathbf{H}^{T} \mathbf{K}_{n}^{T}-\mathbf{K}_{n} \mathbf{H} \mathbf{P}_{n, n-1} \\ +{\color{MediumOrchid}\mathbf{K}_{n} \mathbf{H} \mathbf{P}_{n, n-1} \mathbf{H}^{T} \mathbf{K}_{n}^{T}+\mathbf{K}_{n} \mathbf{R}_{n} \mathbf{K}_{n}^{T}} \qquad | \ \mathbf{A}\mathbf{B}\mathbf{A}^T + \mathbf{A}\mathbf{C}\mathbf{A}^T = \mathbf{A}(\mathbf{B} + \mathbf{C})\mathbf{A}^T \\[1em] \mathbf{P}_{n, n}=\mathbf{P}_{n, n-1}-\mathbf{P}_{n, n-1} \mathbf{H}^{T} \mathbf{K}_{n}^{T}-\mathbf{K}_{n} \mathbf{H} \mathbf{P}_{n, n-1} \\ +{\color{MediumOrchid}\mathbf{K}_{n}\left(\mathbf{H} \mathbf{P}_{n, n-1} \mathbf{H}^{T}+\mathbf{R}_{n}\right) \mathbf{K}_{n}^{T}} \end{array} $$

As the Kalman filter is an optimal filter, we seek a Kalman Gain that minimizes the estimate variance.

To minimize the estimate variance, we need to minimize the main diagonal (from the upper left to the lower right) of the covariance matrix $\mathbf{P}_{n, n}$.

The sum of the main diagonal of a square matrix is the trace of the matrix. Thus, we need to minimize $tr(\mathbf{P}_{n, n})$. To find the conditions required for a minimum, we differentiate $tr(\mathbf{P}_{n, n})$ w.r.t. $\mathbf{K}_n$ and set the result to zero.

$$ \begin{array}{l} tr\left(\mathbf{P}_{n, n}\right)=tr\left(\mathbf{P}_{n, n-1}\right)-{\color{DarkOrange}tr\left(\mathbf{P}_{n, n-1} \mathbf{H}^{T} \mathbf{K}_{n}^{T}\right)} \\ {\color{DarkOrange} -tr\left(\mathbf{K}_{n} \mathbf{H} \mathbf{P}_{n, n-1}\right)} + tr\left(\mathbf{K}_{n}\left(\mathbf{H} \mathbf{P}_{n, n-1} \mathbf{H}^{T}+\mathbf{R}_{n}\right) \mathbf{K}_{n}^{T}\right) \qquad | \ tr(\mathbf{A}) = tr(\mathbf{A}^T) \\[1em] tr\left(\mathbf{P}_{n, n}\right)=tr\left(\mathbf{P}_{n, n-1}\right)-{\color{DarkOrange}2 tr\left(\mathbf{K}_{n} \mathbf{H} \mathbf{P}_{n, n-1}\right)} \\ +tr\left(\mathbf{K}_{n}\left(\mathbf{H} \mathbf{P}_{n, n-1} \mathbf{H}^{T}+\mathbf{R}_{n}\right) \mathbf{K}_{n}^{T}\right) \\[1em] \frac{d}{d \mathbf{K}_{n}} tr\left(\mathbf{P}_{n, n}\right)={\color{DodgerBlue} \frac{d}{d \mathbf{K}_{n}} tr\left(\mathbf{P}_{n, n-1}\right)}-{\color{ForestGreen}\frac{d}{d \mathbf{K}_{n}} 2 tr\left(\mathbf{K}_{n} \mathbf{H} \mathbf{P}_{n, n-1}\right)} \\ +{\color{MediumOrchid}\frac{d}{d \mathbf{K}_{n}} tr\left(\mathbf{K}_{n}\left(\mathbf{H}\mathbf{P}_{n, n-1}\mathbf{H}^T + \mathbf{R}_n\right)\mathbf{K}_{n}^T\right)} \overset{!}{=} 0 \qquad | \ {\color{ForestGreen} \frac{d}{d \mathbf{A}} tr(\mathbf{A} \mathbf{B}) = \mathbf{B}^T}, {\color{MediumOrchid} \frac{d}{d \mathbf{A}} tr(\mathbf{A} \mathbf{B} \mathbf{A}^T) = 2\mathbf{A} \mathbf{B}} \\[1em] \frac{d\left(tr\left(\mathbf{P}_{n, n}\right)\right)}{d \mathbf{K}_{n}}={\color{DodgerBlue}0}-{\color{ForestGreen}2\left(\mathbf{H} \mathbf{P}_{n, n-1}\right)^{T}} +{\color{MediumOrchid}2 \mathbf{K}_{n}\left(\mathbf{H} \mathbf{P}_{n, n-1} \mathbf{H}^{T}+\mathbf{R}_{n}\right)}=0 \\[1em] {\color{ForestGreen}\left(\mathbf{H} \mathbf{P}_{n, n-1}\right)^{T}}={\color{MediumOrchid}\mathbf{K}_{n}\left(\mathbf{H} \mathbf{P}_{n, n-1} \mathbf{H}^{T}+\mathbf{R}_{n}\right)} \\[1em] \mathbf{K}_{n}=\left(\mathbf{H} \mathbf{P}_{n, n-1}\right)^{T}\left(\mathbf{H} \mathbf{P}_{n, n-1} \mathbf{H}^{T}+\mathbf{R}_{n}\right)^{-1} \qquad | \ (\mathbf{AB})^T = \mathbf{B}^T \mathbf{A}^T \\[1em] \mathbf{K}_{n}=\mathbf{P}_{n, n-1}^{T} \mathbf{H}^{T}\left(\mathbf{H} \mathbf{P}_{n, n-1} \mathbf{H}^{T}+\mathbf{R}_{n}\right)^{-1} \qquad | \ \text{covariance matrix } \mathbf{P} \text{ is symmetric } (\mathbf{P}^T = \mathbf{P}) \\[1em] \mathbf{K}_{n}=\mathbf{P}_{n, n-1} \mathbf{H}^{T}\left(\mathbf{H} \mathbf{P}_{n, n-1} \mathbf{H}^{T}+\mathbf{R}_{n}\right)^{-1} \end{array} $$
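
A quick numerical sanity check of this result (all matrices below are arbitrary assumed values): perturbing the derived gain in random directions should never decrease $tr(\mathbf{P}_{n, n})$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Arbitrary symmetric positive-definite prior covariance and noise (assumed)
A = rng.normal(size=(3, 3)); P_prior = A @ A.T + 3 * np.eye(3)
B = rng.normal(size=(2, 2)); R = B @ B.T + np.eye(2)
H = rng.normal(size=(2, 3))

S = H @ P_prior @ H.T + R
K_opt = P_prior @ H.T @ np.linalg.inv(S)   # the derived optimal gain

def trace_P(K):
    """tr(P_nn) using the full-form covariance update, valid for any gain."""
    I = np.eye(3)
    return np.trace((I - K @ H) @ P_prior @ (I - K @ H).T + K @ R @ K.T)

base = trace_P(K_opt)
never_better = all(trace_P(K_opt + 1e-3 * rng.normal(size=K_opt.shape)) >= base
                   for _ in range(100))
print(base, never_better)   # never_better == True: K_opt is a minimizer
```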

Kalman Gain intuition

The Kalman Gain tells us how much the prediction (i.e., the a priori estimate) should be refined by a given measurement.

We illustrate the intuition behind the Kalman Gain with a one-dimensional Kalman filter.

The one-dimensional Kalman Gain is

$$ K_{n}=\frac{p_{n, n-1}}{p_{n, n-1}+r_{n}} \in [0, 1] $$
  • $p_{n, n-1}$ : variance of the state prediction $\hat{x}_{n, n-1}$
  • $r_n$ : variance of the measurement $z_n$


Let’s rewrite the (one-dimensional) state update equation:

$$ \hat{x}_{n, n}=\hat{x}_{n, n-1}+K_{n}\left(z_{n}-\hat{x}_{n, n-1}\right)=\left(1-K_{n}\right) \hat{x}_{n, n-1}+K_{n} z_{n} $$

The Kalman Gain $K_n$ is the weight that we give to the measurement, and $(1 - K_n)$ is the weight that we give to the state prediction.
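
For example (with assumed numbers): if the prediction variance is $p_{n, n-1} = 9$ and the measurement variance is $r_n = 1$, then $K_n = \frac{9}{9 + 1} = 0.9$, so $\hat{x}_{n, n} = 0.1 \, \hat{x}_{n, n-1} + 0.9 \, z_n$: the far more certain measurement dominates the updated estimate.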

  • High Kalman Gain

    A low measurement uncertainty (small $r_n$) relative to the prediction uncertainty would result in a high Kalman Gain (close to 1). The new estimate would be close to the measurement.

    💡 Intuition

    small $r_n \rightarrow$ accurate measurements $\rightarrow$ place more weight on the measurements and thus conform to them

    [Figure: high Kalman Gain]
  • Low Kalman Gain

    A high measurement uncertainty (large $r_n$) relative to the prediction uncertainty would result in a low Kalman Gain (close to 0). The new estimate would be close to the prediction.

    💡 Intuition

    large $r_n \rightarrow$ measurements are not accurate $\rightarrow$ place more weight on the prediction and trust it more

    [Figure: low Kalman Gain]

State update equation

The state update equation updates/refines/corrects the state prediction with measurements.

$$ \hat{\boldsymbol{x}}_{\boldsymbol{n}, \boldsymbol{n}}=\hat{\boldsymbol{x}}_{\boldsymbol{n}, \boldsymbol{n}-1}+\mathbf{K}_{\boldsymbol{n}}\underbrace{\left(\boldsymbol{z}_{n}-\mathbf{H} \hat{\boldsymbol{x}}_{\boldsymbol{n}, \boldsymbol{n}-1}\right)}_{\text{innovation}} $$
  • $\hat{\boldsymbol{x}}_{n, n}$ : estimated system state vector at time step $n$

  • $\hat{\boldsymbol{x}}_{n, n-1}$ : predicted system state vector at time step $n$

  • $\mathbf{K}_{\boldsymbol{n}}$ : Kalman Gain

  • $\mathbf{H}$ : observation matrix

  • $\boldsymbol{z}_{n}$ : measurement at time step $n$

    $$ \boldsymbol{z}_{n} = \mathbf{H} \boldsymbol{x}_n + \boldsymbol{v}_n $$
    • $\boldsymbol{x}_n$ : true system state (hidden state)

    • $\boldsymbol{v}_n$ : measurement noise

      $\rightarrow$ Measurement uncertainty $\mathbf{R}_n$ is given by

      $$ \mathbf{R}_n = E\{\boldsymbol{v}_n \boldsymbol{v}_n^T\} $$
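
For example, if the state is $\boldsymbol{x} = [p, v]^T$ (position and velocity, as in the model above) but the sensor reports only position, then $\mathbf{H} = \begin{bmatrix} 1 & 0 \end{bmatrix}$, and the innovation $\boldsymbol{z}_{n}-\mathbf{H} \hat{\boldsymbol{x}}_{n, n-1}$ compares the measured position with the predicted position.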

Covariance update equation

The covariance update equation updates the uncertainty of the state estimate based on the covariance prediction:

$$ \mathbf{P}_{n, n}=\left(\mathbf{I}-\mathbf{K}_{n} \mathbf{H}\right) \mathbf{P}_{n, n-1}\left(\mathbf{I}-\mathbf{K}_{n} \mathbf{H}\right)^{T}+\mathbf{K}_{n} \mathbf{R}_{n} \mathbf{K}_{n}^{T} $$
Derivation

According to the state update equation:

$$ \begin{aligned} \hat{\boldsymbol{x}}_{n, n} &= \hat{\boldsymbol{x}}_{n, n-1}+\mathbf{K}_{n}\left(\boldsymbol{z}_{n}-\mathbf{H} \hat{\boldsymbol{x}}_{n, n-1}\right) \\ &= \hat{\boldsymbol{x}}_{n, n-1}+\mathbf{K}_{n}\left(\mathbf{H} \boldsymbol{x}_{n} + \boldsymbol{v}_{n}-\mathbf{H} \hat{\boldsymbol{x}}_{n, n-1}\right) \end{aligned} $$

The estimation error between the true (hidden) state $\boldsymbol{x}_n$ and the estimate $\hat{\boldsymbol{x}}_{n, n}$ is:

$$ \begin{aligned} \boldsymbol{e}_n &= \boldsymbol{x}_n - \hat{\boldsymbol{x}}_{n, n} \\ &= \boldsymbol{x}_n - \hat{\boldsymbol{x}}_{n, n-1} - \mathbf{K}_{n}\mathbf{H}\boldsymbol{x}_n - \mathbf{K}_{n}\boldsymbol{v}_n + \mathbf{K}_{n}\mathbf{H} \hat{\boldsymbol{x}}_{n, n-1} \\ &= \boldsymbol{x}_n - \hat{\boldsymbol{x}}_{n, n-1} - \mathbf{K}_{n}\mathbf{H}(\boldsymbol{x}_n - \hat{\boldsymbol{x}}_{n, n-1}) - \mathbf{K}_{n}\boldsymbol{v}_n \\ &= (\mathbf{I} - \mathbf{K}_{n}\mathbf{H})(\boldsymbol{x}_n - \hat{\boldsymbol{x}}_{n, n-1}) - \mathbf{K}_{n}\boldsymbol{v}_n \end{aligned} $$

Estimate Uncertainty

$$ \begin{array}{l} \mathbf{P}_{n, n}=E\left(\boldsymbol{e}_{n} \boldsymbol{e}_{n}^{T}\right)=E\left(\left(\boldsymbol{x}_{n}-\hat{\boldsymbol{x}}_{n, n}\right)\left(\boldsymbol{x}_{n}-\hat{\boldsymbol{x}}_{n, n}\right)^{T}\right) \qquad | \ \text{plug in } \boldsymbol{e}_n \\[1em] \mathbf{P}_{n, n}=E\left(\left(\left(\mathbf{I}-\mathbf{K}_{n} \mathbf{H}\right)\left(\boldsymbol{x}_{n}-\hat{\boldsymbol{x}}_{n, n-1}\right)-\mathbf{K}_{n} \boldsymbol{v}_{n}\right) \right. \\ \left. \times\left(\left(\mathbf{I}-\mathbf{K}_{n} \mathbf{H}\right)\left(\boldsymbol{x}_{n}-\hat{\boldsymbol{x}}_{n, n-1}\right)-\mathbf{K}_{n} \boldsymbol{v}_{n}\right)^{T}\right) \\[1em] \mathbf{P}_{n, n}=E\left(\left(\left(\mathbf{I}-\mathbf{K}_{n} \mathbf{H}\right)\left(\boldsymbol{x}_{n}-\hat{\boldsymbol{x}}_{n, n-1}\right)-\mathbf{K}_{n} \boldsymbol{v}_{n}\right) \right. \\ \left. \times\left(\left(\left(\mathbf{I}-\mathbf{K}_{n} \mathbf{H}\right)\left(\boldsymbol{x}_{n}-\hat{\boldsymbol{x}}_{n, n-1}\right)\right)^{T}-\left(\mathbf{K}_{n} \boldsymbol{v}_{n}\right)^{T}\right)\right) \qquad | \ (\mathbf{A} \mathbf{B})^{T}=\mathbf{B}^{T} \mathbf{A}^{T} \\[1em] \mathbf{P}_{n, n}=E\left(\left(\left(\mathbf{I}-\mathbf{K}_{n} \mathbf{H}\right)\left(\boldsymbol{x}_{n}-\hat{\boldsymbol{x}}_{n, n-1}\right)-\mathbf{K}_{n} \boldsymbol{v}_{n}\right) \right. \\ \left. \times\left(\left(\boldsymbol{x}_{n}-\hat{\boldsymbol{x}}_{n, n-1}\right)^{T}\left(\mathbf{I}-\mathbf{K}_{n} \mathbf{H}\right)^{T}-\left(\mathbf{K}_{n} \boldsymbol{v}_{n}\right)^{T}\right)\right) \\[1em] \mathbf{P}_{n, n}=E\left(\left(\mathbf{I}-\mathbf{K}_{n} \mathbf{H}\right)\left(\boldsymbol{x}_{n}-\hat{\boldsymbol{x}}_{n, n-1}\right)\left(\boldsymbol{x}_{n}-\hat{\boldsymbol{x}}_{n, n-1}\right)^{T}\left(\mathbf{I}-\mathbf{K}_{n} \mathbf{H}\right)^{T}\right. \\ -\left(\mathbf{I}-\mathbf{K}_{n} \mathbf{H}\right)\left(\boldsymbol{x}_{n}-\hat{\boldsymbol{x}}_{n, n-1}\right)\left(\mathbf{K}_{n} \boldsymbol{v}_{n}\right)^{T} \\ -\mathbf{K}_{n} \boldsymbol{v}_{n}\left(\boldsymbol{x}_{n}-\hat{\boldsymbol{x}}_{n, n-1}\right)^{T}\left(\mathbf{I}-\mathbf{K}_{n} \mathbf{H}\right)^{T} \\ \left. +\mathbf{K}_{n} \boldsymbol{v}_{n}\left(\mathbf{K}_{n} \boldsymbol{v}_{n}\right)^{T}\right) \qquad | \ E(X \pm Y)=E(X) \pm E(Y) \\[1em] \mathbf{P}_{n, n}=E\left(\left(\mathbf{I}-\mathbf{K}_{n} \mathbf{H}\right)\left(\boldsymbol{x}_{n}-\hat{\boldsymbol{x}}_{n, n-1}\right)\left(\boldsymbol{x}_{n}-\hat{\boldsymbol{x}}_{n, n-1}\right)^{T}\left(\mathbf{I}-\mathbf{K}_{n} \mathbf{H}\right)^{T}\right) \\ -{\color{red}E\left(\left(\mathbf{I}-\mathbf{K}_{n} \mathbf{H}\right)\left(\boldsymbol{x}_{n}-\hat{\boldsymbol{x}}_{n, n-1}\right)\left(\mathbf{K}_{n} \boldsymbol{v}_{n}\right)^{T}\right)} \\ -{\color{red}E\left(\mathbf{K}_{n} \boldsymbol{v}_{n}\left(\boldsymbol{x}_{n}-\hat{\boldsymbol{x}}_{n, n-1}\right)^{T}\left(\mathbf{I}-\mathbf{K}_{n} \mathbf{H}\right)^{T}\right)} \\ +E\left(\mathbf{K}_{n} \boldsymbol{v}_{n}\left(\mathbf{K}_{n} \boldsymbol{v}_{n}\right)^{T}\right) \end{array} $$

$\boldsymbol{x}_{n}-\hat{\boldsymbol{x}}_{n, n-1}$ is the error of the prior estimate relative to the true value. It is uncorrelated with the current measurement noise $\boldsymbol{v}_n$, and the noise is zero-mean, so the expectations of the cross terms vanish:

$$ \begin{aligned} {\color{red}E\left(\left(\mathbf{I}-\mathbf{K}_{n} \mathbf{H}\right)\left(\boldsymbol{x}_{n}-\hat{\boldsymbol{x}}_{n, n-1}\right)\left(\mathbf{K}_{n} \boldsymbol{v}_{n}\right)^{T}\right)} &= 0 \\ {\color{red}E\left(\mathbf{K}_{n} \boldsymbol{v}_{n}\left(\boldsymbol{x}_{n}-\hat{\boldsymbol{x}}_{n, n-1}\right)^{T}\left(\mathbf{I}-\mathbf{K}_{n} \mathbf{H}\right)^{T}\right)} &= 0 \end{aligned} $$

Therefore

$$ \begin{array}{l} \mathbf{P}_{n, n}=E\left(\left(\mathbf{I}-\mathbf{K}_{n} \mathbf{H}\right)\left(\boldsymbol{x}_{n}-\hat{\boldsymbol{x}}_{n, n-1}\right)\left(\boldsymbol{x}_{n}-\hat{\boldsymbol{x}}_{n, n-1}\right)^{T}\left(\mathbf{I}-\mathbf{K}_{n} \mathbf{H}\right)^{T}\right) \\ +{\color{DodgerBlue}E\left(\mathbf{K}_{n} \boldsymbol{v}_{n}\left(\mathbf{K}_{n} \boldsymbol{v}_{n}\right)^{T}\right)} \qquad | \ (\mathbf{A} \mathbf{B})^{T}=\mathbf{B}^{T} \mathbf{A}^{T} \\[1em] \mathbf{P}_{n, n}=E\left(\left(\mathbf{I}-\mathbf{K}_{n} \mathbf{H}\right)\left(\boldsymbol{x}_{n}-\hat{\boldsymbol{x}}_{n, n-1}\right)\left(\boldsymbol{x}_{n}-\hat{\boldsymbol{x}}_{n, n-1}\right)^{T}\left(\mathbf{I}-\mathbf{K}_{n} \mathbf{H}\right)^{T}\right) \\ +{\color{DodgerBlue}E\left(\mathbf{K}_{n} \boldsymbol{v}_{n} \boldsymbol{v}_{n}^T \mathbf{K}_{n}^T\right)} \qquad | \ E(a X)=a E(X) \\[1em] \mathbf{P}_{n, n} = \left(\mathbf{I}-\mathbf{K}_{n} \mathbf{H}\right) {\color{ForestGreen}\underbrace{E\left(\left(\boldsymbol{x}_{n}-\hat{\boldsymbol{x}}_{n, n-1}\right)\left(\boldsymbol{x}_{n}-\hat{\boldsymbol{x}}_{n, n-1}\right)^{T}\right)}_{=\mathbf{P}_{n, n-1}}} \left(\mathbf{I}-\mathbf{K}_{n} \mathbf{H}\right)^{T} +\mathbf{K}_{n}{\color{DodgerBlue}\underbrace{E\left(\boldsymbol{v}_{n} \boldsymbol{v}_{n}^T\right)}_{=\mathbf{R}_n}} \mathbf{K}_{n}^T \\[1em] \mathbf{P}_{n, n} = \left(\mathbf{I}-\mathbf{K}_{n} \mathbf{H}\right) {\color{ForestGreen}\mathbf{P}_{n, n-1}} \left(\mathbf{I}-\mathbf{K}_{n} \mathbf{H}\right)^{T} +\mathbf{K}_{n}{\color{DodgerBlue}\mathbf{R}_n} \mathbf{K}_{n}^T \end{array} $$

In many textbooks you will see a simplified form:

$$ \mathbf{P}_{n, n}=\left(\mathbf{I}-\mathbf{K}_{n} \mathbf{H}\right) \mathbf{P}_{n, n-1} $$

This equation is elegant and easy to remember, and in many cases it performs well.

However, even the smallest error in computing the Kalman Gain (due to round-off) can lead to huge computation errors: the subtraction $\left(\mathbf{I}-\mathbf{K}_{n} \mathbf{H}\right)$ can produce non-symmetric matrices under floating-point arithmetic, and the simplified form is only valid for the exact optimal gain. This equation is therefore numerically unstable!
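
A small sketch of the difference (with assumed toy values; the perturbation stands in for accumulated round-off): the full form stays symmetric and positive semidefinite for any gain, while the simplified form does not.

```python
import numpy as np

P_prior = np.eye(2)                 # assumed prior covariance
H = np.array([[1.0, 0.0]])
R = np.array([[1.0]])
I = np.eye(2)

K_opt = P_prior @ H.T @ np.linalg.inv(H @ P_prior @ H.T + R)  # optimal gain
K_bad = K_opt + np.array([[0.0], [0.3]])  # exaggerated round-off error

short  = (I - K_bad @ H) @ P_prior        # simplified covariance update
full   = (I - K_bad @ H) @ P_prior @ (I - K_bad @ H).T + K_bad @ R @ K_bad.T

print(np.abs(short - short.T).max())      # 0.3 -> no longer symmetric
print(np.abs(full - full.T).max())        # 0.0 -> symmetric for any gain
print(np.linalg.eigvalsh(full).min())     # >= 0 -> still a valid covariance
```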

Derivation of a simplified form of the Covariance Update Equation
$$ \begin{array}{l} \mathbf{P}_{n, n} = \left(\mathbf{I}-\mathbf{K}_{n} \mathbf{H}\right) \mathbf{P}_{n, n-1} \left(\mathbf{I}-\mathbf{K}_{n} \mathbf{H}\right)^{T} +\mathbf{K}_{n}\mathbf{R}_n \mathbf{K}_{n}^T \\[1em] \mathbf{P}_{n, n}=\mathbf{P}_{n, n-1}-\mathbf{P}_{n, n-1} \mathbf{H}^{T} \mathbf{K}_{n}^{T}-\mathbf{K}_{n} \mathbf{H} \mathbf{P}_{n, n-1} \\ +{\color{MediumOrchid}\mathbf{K}_{n}}\left(\mathbf{H} \mathbf{P}_{n, n-1} \mathbf{H}^{T}+\mathbf{R}_{n}\right) \mathbf{K}_{n}^{T} \qquad | \ \text{substitute the Kalman Gain} \\[1em] \mathbf{P}_{n, n}=\mathbf{P}_{n, n-1}-\mathbf{P}_{n, n-1} \mathbf{H}^{T} \mathbf{K}_{n}^{T}-\mathbf{K}_{n} \mathbf{H} \mathbf{P}_{n, n-1} \\ +{\color{MediumOrchid}\mathbf{P}_{n, n-1} \mathbf{H}^{T}} \underbrace{{\color{MediumOrchid}\left(\mathbf{H} \mathbf{P}_{n, n-1} \mathbf{H}^{T}+\mathbf{R}_{n}\right)^{-1}}\left(\mathbf{H} \mathbf{P}_{n, n-1} \mathbf{H}^{T}+\mathbf{R}_{n}\right)}_{=\mathbf{I}} \mathbf{K}_{n}^{T} \\[1em] \mathbf{P}_{n, n}=\mathbf{P}_{n, n-1}-\mathbf{P}_{n, n-1} \mathbf{H}^{T} \mathbf{K}_{n}^{T}-\mathbf{K}_{n} \mathbf{H} \mathbf{P}_{n, n-1} +\mathbf{P}_{n, n-1} \mathbf{H}^{T} \mathbf{K}_{n}^{T} \\[1em] \mathbf{P}_{n, n}=\mathbf{P}_{n, n-1}-\mathbf{K}_{n} \mathbf{H} \mathbf{P}_{n, n-1} \\[1em] \mathbf{P}_{n, n}=\left(\mathbf{I}-\mathbf{K}_{n} \mathbf{H}\right) \mathbf{P}_{n, n-1} \end{array} $$
