Kalman Filter

The Kalman filter is an efficient recursive filter that estimates the internal state of a linear dynamic system from a series of noisy measurements.

Applications of the Kalman filter include

  • Guidance
  • Navigation
  • Control of vehicles, particularly aircraft, spacecraft, and dynamically positioned ships

πŸ’‘ The basic idea of the Kalman filter is to obtain the optimal estimate of the (hidden) internal state by combining the state prediction and the measurement in a weighted fashion.


Kalman Filter Summary

Kalman filter in a picture:

(Diagram: Kalman filter summary)

Summary of equations:

| Phase | Equation | Equation Name | Alternative Names |
|---|---|---|---|
| Predict | $\hat{\boldsymbol{x}}_{n, n-1}=\mathbf{F} \hat{\boldsymbol{x}}_{n-1, n-1} + \mathbf{G} \boldsymbol{u}_{n}$ | State Extrapolation | Predictor Equation, Transition Equation, Prediction Equation, Dynamic Model, State Space Model |
| | $\mathbf{P}_{n, n-1}=\mathbf{F} \mathbf{P}_{n-1, n-1} \mathbf{F}^{T}+\mathbf{Q}$ | Covariance Extrapolation | Predictor Covariance Equation |
| Update | $\mathbf{K}_{n}=\mathbf{P}_{n, n-1} \mathbf{H}^{T}\left(\mathbf{H} \mathbf{P}_{n, n-1} \mathbf{H}^{T}+\mathbf{R}_{n}\right)^{-1}$ | Kalman Gain | Weight Equation |
| | $\hat{\boldsymbol{x}}_{n, n}=\hat{\boldsymbol{x}}_{n, n-1}+\mathbf{K}_{n}\left(\boldsymbol{z}_{n}-\mathbf{H} \hat{\boldsymbol{x}}_{n, n-1}\right)$ | State Update | Filtering Equation |
| | $\mathbf{P}_{n, n}=\left(\mathbf{I}-\mathbf{K}_{n} \mathbf{H}\right) \mathbf{P}_{n, n-1}$ | Covariance Update | Corrector Equation |
| Auxiliary | $\boldsymbol{z}_{n} = \mathbf{H} \boldsymbol{x}_n + \boldsymbol{v}_n$ | Measurement Equation | |
| | $\mathbf{R}_n = E\{\boldsymbol{v}_n \boldsymbol{v}_n^T\}$ | Measurement Uncertainty | Measurement Error |
| | $\mathbf{Q}_n = E\{\boldsymbol{w}_n \boldsymbol{w}_n^T\}$ | Process Noise Uncertainty | Process Noise Error |
| | $\mathbf{P}_{n, n}=E\left\{\boldsymbol{e}_{n} \boldsymbol{e}_{n}^{T}\right\}=E\left\{\left(\boldsymbol{x}_{n}-\hat{\boldsymbol{x}}_{n, n}\right)\left(\boldsymbol{x}_{n}-\hat{\boldsymbol{x}}_{n, n}\right)^{T}\right\}$ | Estimation Uncertainty | Estimation Error |
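
To see how the five equations fit together in code, here is a minimal sketch of one predict/update cycle (assuming NumPy; the function names are choices made here, and the Joseph-form covariance update discussed later in this note is used instead of the simplified form for numerical robustness):

```python
import numpy as np

def kf_predict(x_est, P_est, F, G, u, Q):
    """Predict step: state and covariance extrapolation."""
    x_pred = F @ x_est + G @ u                # x̂(n,n-1) = F x̂(n-1,n-1) + G u(n)
    P_pred = F @ P_est @ F.T + Q              # P(n,n-1) = F P(n-1,n-1) Fᡀ + Q
    return x_pred, P_pred

def kf_update(x_pred, P_pred, z, H, R):
    """Update step: Kalman Gain, state update, covariance update (Joseph form)."""
    S = H @ P_pred @ H.T + R                  # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)       # K(n) = P(n,n-1) Hᡀ S⁻¹
    x_est = x_pred + K @ (z - H @ x_pred)     # x̂(n,n) = x̂(n,n-1) + K (z(n) - H x̂(n,n-1))
    I = np.eye(P_pred.shape[0])
    P_est = (I - K @ H) @ P_pred @ (I - K @ H).T + K @ R @ K.T   # Joseph form
    return x_est, P_est, K
```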

Summary of notations:

| Term | Name | Alternative Term |
|---|---|---|
| $\boldsymbol{x}$ | State vector | |
| $\boldsymbol{z}$ | Output vector | $\boldsymbol{y}$ |
| $\mathbf{F}$ | State transition matrix | $\mathbf{\Phi}$, $\mathbf{A}$ |
| $\boldsymbol{u}$ | Input variable | |
| $\mathbf{G}$ | Control matrix | $\mathbf{B}$ |
| $\mathbf{P}$ | Estimate uncertainty | $\boldsymbol{\Sigma}$ |
| $\mathbf{Q}$ | Process noise uncertainty | |
| $\mathbf{R}$ | Measurement uncertainty | |
| $\boldsymbol{w}$ | Process noise vector | |
| $\boldsymbol{v}$ | Measurement noise vector | |
| $\mathbf{H}$ | Observation matrix | $\mathbf{C}$ |
| $\mathbf{K}$ | Kalman Gain | |
| $n$ | Discrete time index | $k$ |

Multidimensional Kalman Filter in Detail

A Kalman filter operates in a two-phase process (predict and update) built on five main equations:

State extrapolation equation

The Kalman filter assumes that the true state of the system at time step $n$ evolves from the state at time step $n-1$ according to

$$\boldsymbol{x}_n = \mathbf{F} \boldsymbol{x}_{n-1} + \mathbf{G} \boldsymbol{u}_{n} + \boldsymbol{w}_n$$
  • $\boldsymbol{x}_{n}$ : state vector

  • $\boldsymbol{u}_{n}$ : control variable or input variable - a measurable (deterministic) input to the system

  • $\boldsymbol{w}_n$ : process noise or disturbance - an unmeasurable input that affects the state

  • $\mathbf{F}$ : state transition matrix - applies the effect of each system state parameter at time step $n-1$ on the system state at time step $n$

  • $\mathbf{G}$ : control matrix or input transition matrix (mapping control to state variables)

The state extrapolation equation

  • Predicts the next system state, based on the knowledge of the current state

  • Extrapolates the state vector from time step $n-1$ to $n$

  • Also called

    • Predictor Equation
    • Transition Equation
    • Prediction Equation
    • Dynamic Model
    • State Space Model
  • The general form in matrix notation:

    $$\hat{\boldsymbol{x}}_{n, n-1}=\mathbf{F} \hat{\boldsymbol{x}}_{n-1, n-1}+\mathbf{G} \boldsymbol{u}_{n}$$
    • $\hat{\boldsymbol{x}}_{n, n-1}$ : predicted system state vector at time step $n$

    • $\hat{\boldsymbol{x}}_{n-1, n-1}$ : estimated system state vector at time step $n-1$

    $\hat{\boldsymbol{x}}_{n, m}$ denotes the estimate of $\boldsymbol{x}$ at time step $n$ given observations/measurements up to and including time $m \leq n$

Example
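
A common illustration (chosen here for concreteness; the specific matrices are an assumption, not given above) is a constant-velocity motion model with state $\boldsymbol{x} = \begin{bmatrix} p & v \end{bmatrix}^T$ (position and velocity), sampling interval $\Delta t$, and a measured acceleration $u_n$ as the control input:

$$\mathbf{F}=\begin{bmatrix}1 & \Delta t \\ 0 & 1\end{bmatrix}, \qquad \mathbf{G}=\begin{bmatrix}\frac{1}{2}\Delta t^{2} \\ \Delta t\end{bmatrix}, \qquad \hat{\boldsymbol{x}}_{n, n-1}=\mathbf{F} \hat{\boldsymbol{x}}_{n-1, n-1}+\mathbf{G} u_{n}=\begin{bmatrix}\hat{p}_{n-1, n-1}+\hat{v}_{n-1, n-1}\Delta t+\frac{1}{2}u_{n}\Delta t^{2} \\ \hat{v}_{n-1, n-1}+u_{n}\Delta t\end{bmatrix}$$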

Covariance extrapolation equation

The covariance extrapolation equation extrapolates the uncertainty in our state prediction.

$$\mathbf{P}_{n, n-1}=\mathbf{F} \mathbf{P}_{n-1, n-1} \mathbf{F}^{T}+\mathbf{Q}$$
  • $\mathbf{P}_{n-1, n-1}$ : uncertainty (covariance matrix) of the estimate at time step $n-1$

    $$\mathbf{P}_{n-1, n-1} = E\{\underbrace{(\boldsymbol{x}_{n-1} - \hat{\boldsymbol{x}}_{n-1, n-1})}_{=: \boldsymbol{e}_{n-1}} (\boldsymbol{x}_{n-1} - \hat{\boldsymbol{x}}_{n-1, n-1})^T\} = E\{\boldsymbol{e}_{n-1} \boldsymbol{e}_{n-1}^T\}$$
  • $\mathbf{P}_{n, n-1}$ : uncertainty (covariance matrix) of the prediction at time step $n$

  • $\mathbf{F}$ : state transition matrix

  • $\mathbf{Q}$ : process noise matrix

    $$\mathbf{Q}_n = E\{\boldsymbol{w}_n \boldsymbol{w}_n^T\}$$
    • $\boldsymbol{w}_n$ : process noise vector
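
As a small numeric sketch of this extrapolation (NumPy, with arbitrary illustrative values for $\Delta t$, $\mathbf{Q}$, and $\mathbf{P}_{n-1, n-1}$, using the constant-velocity model from the example above):

```python
import numpy as np

dt = 1.0
F = np.array([[1.0, dt],
              [0.0, 1.0]])        # constant-velocity state transition matrix
Q = np.diag([0.05, 0.05])         # illustrative process noise covariance
P_est = np.eye(2)                 # P(n-1,n-1): current estimate uncertainty

P_pred = F @ P_est @ F.T + Q      # P(n,n-1) = F P(n-1,n-1) Fᡀ + Q
print(P_pred)                     # variances grow and position/velocity become correlated
```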
Derivation

At time step $n$, the Kalman filter assumes

$$\boldsymbol{x}_n = \mathbf{F} \boldsymbol{x}_{n-1} + \mathbf{G} \boldsymbol{u}_{n} + \boldsymbol{w}_n$$

The prediction of the state is

$$\hat{\boldsymbol{x}}_{n, n-1}=\mathbf{F} \hat{\boldsymbol{x}}_{n-1, n-1}+\mathbf{G} \boldsymbol{u}_{n}$$

The difference between $\boldsymbol{x}_n$ and $\hat{\boldsymbol{x}}_{n, n-1}$ is

$$\begin{aligned} \boldsymbol{x}_{n}-\hat{\boldsymbol{x}}_{n, n-1} &=\mathbf{F} \boldsymbol{x}_{n-1}+\mathbf{G} \boldsymbol{u}_{n}+\boldsymbol{w}_{n}-\left(\mathbf{F} \hat{\boldsymbol{x}}_{n-1, n-1}+\mathbf{G} \boldsymbol{u}_{n}\right) \\ &=\mathbf{F}\left(\boldsymbol{x}_{n-1}-\hat{\boldsymbol{x}}_{n-1, n-1}\right)+\boldsymbol{w}_{n} \end{aligned}$$

The variance associated with the prediction $\hat{\boldsymbol{x}}_{n, n-1}$ of the unknown true state $\boldsymbol{x}_n$ is

$$\begin{aligned} \mathbf{P}_{n, n-1} &= E\left\{\left(\boldsymbol{x}_{n}-\hat{\boldsymbol{x}}_{n, n-1}\right)\left(\boldsymbol{x}_{n}-\hat{\boldsymbol{x}}_{n, n-1}\right)^{T}\right\} \\ &= E\left\{\left(\mathbf{F}\left(\boldsymbol{x}_{n-1}-\hat{\boldsymbol{x}}_{n-1, n-1}\right)+\boldsymbol{w}_{n}\right)\left(\mathbf{F}\left(\boldsymbol{x}_{n-1}-\hat{\boldsymbol{x}}_{n-1, n-1}\right)+\boldsymbol{w}_{n}\right)^{T}\right\} \\ &= \mathbf{F} E\left\{\left(\boldsymbol{x}_{n-1}-\hat{\boldsymbol{x}}_{n-1, n-1}\right)\left(\boldsymbol{x}_{n-1}-\hat{\boldsymbol{x}}_{n-1, n-1}\right)^{T}\right\} \mathbf{F}^{T} + \mathbf{F} E\left\{\left(\boldsymbol{x}_{n-1}-\hat{\boldsymbol{x}}_{n-1, n-1}\right) \boldsymbol{w}_{n}^{T}\right\} \\ &\quad + E\left\{\boldsymbol{w}_{n}\left(\boldsymbol{x}_{n-1}-\hat{\boldsymbol{x}}_{n-1, n-1}\right)^{T}\right\} \mathbf{F}^{T} + E\left\{\boldsymbol{w}_{n} \boldsymbol{w}_{n}^{T}\right\} \end{aligned}$$

Noting that the state estimation errors and the process noise are uncorrelated:

$$E\left\{\left(\boldsymbol{x}_{n-1}-\hat{\boldsymbol{x}}_{n-1, n-1}\right) \boldsymbol{w}_{n}^{T}\right\} = E\left\{\boldsymbol{w}_{n}\left(\boldsymbol{x}_{n-1}-\hat{\boldsymbol{x}}_{n-1, n-1}\right)^{T}\right\} = 0$$

Therefore

$$\begin{aligned} \mathbf{P}_{n, n-1} &= \mathbf{F} \underbrace{E\left\{\left(\boldsymbol{x}_{n-1}-\hat{\boldsymbol{x}}_{n-1, n-1}\right)\left(\boldsymbol{x}_{n-1}-\hat{\boldsymbol{x}}_{n-1, n-1}\right)^{T}\right\}}_{=\mathbf{P}_{n-1, n-1}} \mathbf{F}^{T}+\underbrace{E\left\{\boldsymbol{w}_{n} \boldsymbol{w}_{n}^{T}\right\}}_{=\mathbf{Q}} \\ &=\mathbf{F} \mathbf{P}_{n-1, n-1}\mathbf{F}^T+\mathbf{Q} \end{aligned}$$

Kalman Gain equation

The Kalman Gain is calculated so that it minimizes the covariance of the a posteriori state estimate.

$$\mathbf{K}_{n}=\mathbf{P}_{n, n-1} \mathbf{H}^{T}\left(\mathbf{H} \mathbf{P}_{n, n-1} \mathbf{H}^{T}+\mathbf{R}_{n}\right)^{-1}$$
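
As an implementation aside (a sketch, not part of the derivation below): in code the explicit inverse is often avoided by solving a linear system with the innovation covariance $\mathbf{S}_n = \mathbf{H} \mathbf{P}_{n, n-1} \mathbf{H}^{T}+\mathbf{R}_{n}$ instead:

```python
import numpy as np

def kalman_gain(P_pred, H, R):
    """K = P Hᡀ S⁻¹, computed by solving S Kᡀ = H Pᡀ rather than forming inv(S)."""
    S = H @ P_pred @ H.T + R                    # innovation covariance S(n)
    # Since S is symmetric, solving S X = H P_predᡀ gives X = Kᡀ; transpose to get K
    return np.linalg.solve(S, H @ P_pred.T).T
```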
Derivation

Rearrange the covariance update equation

$$\begin{array}{l} \mathbf{P}_{n, n}=\left(\mathbf{I}-\mathbf{K}_{n} \mathbf{H}\right) \mathbf{P}_{n, n-1}\left(\mathbf{I}-\mathbf{K}_{n} \mathbf{H}\right)^{T}+\mathbf{K}_{n} \mathbf{R}_{n} \mathbf{K}_{n}^{T} \\[6pt] \mathbf{P}_{n, n}=\left(\mathbf{I}-\mathbf{K}_{n} \mathbf{H}\right) \mathbf{P}_{n, n-1}\left(\mathbf{I}-\left(\mathbf{K}_{n} \mathbf{H}\right)^{T}\right)+\mathbf{K}_{n} \mathbf{R}_{n} \mathbf{K}_{n}^{T} \qquad \mid \; \mathbf{I} = \mathbf{I}^T \\[6pt] \mathbf{P}_{n, n}=\left(\mathbf{I}-\mathbf{K}_{n} \mathbf{H}\right) \mathbf{P}_{n, n-1}\left(\mathbf{I}-\mathbf{H}^{T} \mathbf{K}_{n}^{T}\right)+\mathbf{K}_{n} \mathbf{R}_{n} \mathbf{K}_{n}^{T} \qquad \mid \; (\mathbf{A}\mathbf{B})^T = \mathbf{B}^T \mathbf{A}^T \\[6pt] \mathbf{P}_{n, n}=\left(\mathbf{P}_{n, n-1}-\mathbf{K}_{n} \mathbf{H} \mathbf{P}_{n, n-1}\right)\left(\mathbf{I}-\mathbf{H}^{T} \mathbf{K}_{n}^{T}\right)+\mathbf{K}_{n} \mathbf{R}_{n} \mathbf{K}_{n}^{T} \\[6pt] \mathbf{P}_{n, n}=\mathbf{P}_{n, n-1}-\mathbf{P}_{n, n-1} \mathbf{H}^{T} \mathbf{K}_{n}^{T}-\mathbf{K}_{n} \mathbf{H} \mathbf{P}_{n, n-1}+\mathbf{K}_{n} \mathbf{H} \mathbf{P}_{n, n-1} \mathbf{H}^{T} \mathbf{K}_{n}^{T}+\mathbf{K}_{n} \mathbf{R}_{n} \mathbf{K}_{n}^{T} \qquad \mid \; \mathbf{A}\mathbf{B}\mathbf{A}^T + \mathbf{A}\mathbf{C}\mathbf{A}^T = \mathbf{A}(\mathbf{B} + \mathbf{C})\mathbf{A}^T \\[6pt] \mathbf{P}_{n, n}=\mathbf{P}_{n, n-1}-\mathbf{P}_{n, n-1} \mathbf{H}^{T} \mathbf{K}_{n}^{T}-\mathbf{K}_{n} \mathbf{H} \mathbf{P}_{n, n-1}+\mathbf{K}_{n}\left(\mathbf{H} \mathbf{P}_{n, n-1} \mathbf{H}^{T}+\mathbf{R}_{n}\right) \mathbf{K}_{n}^{T} \end{array}$$

As the Kalman Filter is an optimal filter, we will seek a Kalman Gain that minimizes the estimate variance.

In order to minimize the estimate variance, we need to minimize the main diagonal (from the upper left to the lower right) of the covariance matrix $\mathbf{P}_{n, n}$.

The sum of the elements on the main diagonal of a square matrix is its trace. Thus, we need to minimize $tr(\mathbf{P}_{n, n})$. In order to find the conditions required to produce a minimum, we differentiate $tr(\mathbf{P}_{n, n})$ w.r.t. $\mathbf{K}_n$ and set the result to zero.

$$\begin{array}{l} tr\left(\mathbf{P}_{n, n}\right)=tr\left(\mathbf{P}_{n, n-1}\right)-tr\left(\mathbf{P}_{n, n-1} \mathbf{H}^{T} \mathbf{K}_{n}^{T}\right)-tr\left(\mathbf{K}_{n} \mathbf{H} \mathbf{P}_{n, n-1}\right)+tr\left(\mathbf{K}_{n}\left(\mathbf{H} \mathbf{P}_{n, n-1} \mathbf{H}^{T}+\mathbf{R}_{n}\right) \mathbf{K}_{n}^{T}\right) \qquad \mid \; tr(\mathbf{A}) = tr(\mathbf{A}^T) \\[6pt] tr\left(\mathbf{P}_{n, n}\right)=tr\left(\mathbf{P}_{n, n-1}\right)-2\, tr\left(\mathbf{K}_{n} \mathbf{H} \mathbf{P}_{n, n-1}\right)+tr\left(\mathbf{K}_{n}\left(\mathbf{H} \mathbf{P}_{n, n-1} \mathbf{H}^{T}+\mathbf{R}_{n}\right) \mathbf{K}_{n}^{T}\right) \\[6pt] \frac{d}{d \mathbf{K}_{n}} tr\left(\mathbf{P}_{n, n}\right)=\frac{d}{d \mathbf{K}_{n}} tr\left(\mathbf{P}_{n, n-1}\right)-\frac{d}{d \mathbf{K}_{n}} 2\, tr\left(\mathbf{K}_{n} \mathbf{H} \mathbf{P}_{n, n-1}\right)+\frac{d}{d \mathbf{K}_{n}} tr\left(\mathbf{K}_{n}\left(\mathbf{H} \mathbf{P}_{n, n-1} \mathbf{H}^{T}+\mathbf{R}_{n}\right) \mathbf{K}_{n}^{T}\right) \overset{!}{=} 0 \qquad \mid \; \frac{d}{d \mathbf{A}} tr(\mathbf{A} \mathbf{B}) = \mathbf{B}^T, \; \frac{d}{d \mathbf{A}} tr(\mathbf{A} \mathbf{B} \mathbf{A}^T) = 2 \mathbf{A} \mathbf{B} \\[6pt] \frac{d\left(tr\left(\mathbf{P}_{n, n}\right)\right)}{d \mathbf{K}_{n}}=0-2\left(\mathbf{H} \mathbf{P}_{n, n-1}\right)^{T}+2 \mathbf{K}_{n}\left(\mathbf{H} \mathbf{P}_{n, n-1} \mathbf{H}^{T}+\mathbf{R}_{n}\right)=0 \\[6pt] \left(\mathbf{H} \mathbf{P}_{n, n-1}\right)^{T}=\mathbf{K}_{n}\left(\mathbf{H} \mathbf{P}_{n, n-1} \mathbf{H}^{T}+\mathbf{R}_{n}\right) \\[6pt] \mathbf{K}_{n}=\left(\mathbf{H} \mathbf{P}_{n, n-1}\right)^{T}\left(\mathbf{H} \mathbf{P}_{n, n-1} \mathbf{H}^{T}+\mathbf{R}_{n}\right)^{-1} \qquad \mid \; (\mathbf{A}\mathbf{B})^T = \mathbf{B}^T \mathbf{A}^T \\[6pt] \mathbf{K}_{n}=\mathbf{P}_{n, n-1}^{T} \mathbf{H}^{T}\left(\mathbf{H} \mathbf{P}_{n, n-1} \mathbf{H}^{T}+\mathbf{R}_{n}\right)^{-1} \qquad \mid \; \text{covariance matrix } \mathbf{P} \text{ is symmetric } (\mathbf{P}^T = \mathbf{P}) \\[6pt] \mathbf{K}_{n}=\mathbf{P}_{n, n-1} \mathbf{H}^{T}\left(\mathbf{H} \mathbf{P}_{n, n-1} \mathbf{H}^{T}+\mathbf{R}_{n}\right)^{-1} \end{array}$$

Kalman Gain intuition

The Kalman Gain tells us how much the prediction (i.e., the a priori estimate) should be corrected by a given measurement.

We illustrate the intuition behind the Kalman Gain with a one-dimensional Kalman filter.

The one-dimensional Kalman Gain is

$$K_{n}=\frac{p_{n, n-1}}{p_{n, n-1}+r_{n}} \in [0, 1]$$
  • $p_{n, n-1}$ : variance of the state prediction $\hat{x}_{n, n-1}$
  • $r_n$ : variance of the measurement $z_n$

(Derivation see here)

Let’s rewrite the (one-dimensional) state update equation:

$$\hat{x}_{n, n}=\hat{x}_{n, n-1}+K_{n}\left(z_{n}-\hat{x}_{n, n-1}\right)=\left(1-K_{n}\right) \hat{x}_{n, n-1}+K_{n} z_{n}$$

The Kalman Gain $K_n$ is the weight that we give to the measurement, and $(1 - K_n)$ is the weight that we give to the state prediction.

  • High Kalman Gain

    A low measurement uncertainty (small $r_n$) relative to the prediction uncertainty would result in a high Kalman Gain (close to 1). The new estimate would be close to the measurement.

    πŸ’‘ Intuition

    small $r_n$ $\rightarrow$ accurate measurements $\rightarrow$ place more weight on the measurements and thus conform to them

  • Low Kalman Gain

    A high measurement uncertainty (large $r_n$) relative to the prediction uncertainty would result in a low Kalman Gain (close to 0). The new estimate would be close to the prediction.

    πŸ’‘ Intuition

    large $r_n$ $\rightarrow$ measurements are not accurate $\rightarrow$ place more weight on the prediction and trust it more

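
To make the two cases above concrete, here is a tiny one-dimensional sketch (the numbers are arbitrary illustrations):

```python
def update_1d(x_pred, p_pred, z, r):
    """One-dimensional Kalman update: K = p/(p+r), estimate = (1-K)*prediction + K*z."""
    K = p_pred / (p_pred + r)
    return (1 - K) * x_pred + K * z, K

# prediction 10.0 with variance 4.0, measurement 12.0
print(update_1d(10.0, 4.0, 12.0, r=0.1))    # small r  -> K β‰ˆ 0.98, estimate β‰ˆ 11.95 (near z)
print(update_1d(10.0, 4.0, 12.0, r=100.0))  # large r  -> K β‰ˆ 0.04, estimate β‰ˆ 10.08 (near prediction)
```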

State update equation

The state update equation updates/refines/corrects the state prediction with measurements.

$$\hat{\boldsymbol{x}}_{n, n}=\hat{\boldsymbol{x}}_{n, n-1}+\mathbf{K}_{n}\underbrace{\left(\boldsymbol{z}_{n}-\mathbf{H} \hat{\boldsymbol{x}}_{n, n-1}\right)}_{\text{innovation}}$$
  • $\hat{\boldsymbol{x}}_{n, n}$ : estimated system state vector at time step $n$

  • $\hat{\boldsymbol{x}}_{n, n-1}$ : predicted system state vector at time step $n$

  • $\mathbf{K}_{n}$ : Kalman Gain

  • $\mathbf{H}$ : observation matrix

  • $\boldsymbol{z}_{n}$ : measurement at time step $n$

    $$\boldsymbol{z}_{n} = \mathbf{H} \boldsymbol{x}_n + \boldsymbol{v}_n$$
    • $\boldsymbol{x}_n$ : true system state (hidden state)

    • $\boldsymbol{v}_n$ : measurement noise

      $\rightarrow$ The measurement uncertainty $\mathbf{R}_n$ is given by

      $$\mathbf{R}_n = E\{\boldsymbol{v}_n \boldsymbol{v}_n^T\}$$
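
As an illustrative case of the measurement equation (the numbers and the gain below are made up for the example): if the state is $[p, v]^T$ but the sensor measures only position, then $\mathbf{H} = \begin{bmatrix}1 & 0\end{bmatrix}$ and the innovation compares the measured position with the predicted position:

```python
import numpy as np

H = np.array([[1.0, 0.0]])            # observe position only
x_pred = np.array([10.0, 2.0])        # predicted [position, velocity]
z = np.array([10.6])                  # measured position
innovation = z - H @ x_pred           # 10.6 - 10.0 = 0.6
K = np.array([[0.5],                  # illustrative Kalman Gain (normally from the gain equation)
              [0.2]])
x_est = x_pred + K @ innovation       # position corrected by 0.3, velocity by 0.12
print(x_est)                          # [10.3, 2.12]
```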

Covariance update equation

The covariance update equation updates the uncertainty of the state estimate based on the predicted covariance:

$$\mathbf{P}_{n, n}=\left(\mathbf{I}-\mathbf{K}_{n} \mathbf{H}\right) \mathbf{P}_{n, n-1}\left(\mathbf{I}-\mathbf{K}_{n} \mathbf{H}\right)^{T}+\mathbf{K}_{n} \mathbf{R}_{n} \mathbf{K}_{n}^{T}$$
Derivation

According to the state update equation:

$$\begin{aligned} \hat{\boldsymbol{x}}_{n, n} &= \hat{\boldsymbol{x}}_{n, n-1}+\mathbf{K}_{n}\left(\boldsymbol{z}_{n}-\mathbf{H} \hat{\boldsymbol{x}}_{n, n-1}\right) \\ &= \hat{\boldsymbol{x}}_{n, n-1}+\mathbf{K}_{n}\left(\mathbf{H} \boldsymbol{x}_n + \boldsymbol{v}_n-\mathbf{H} \hat{\boldsymbol{x}}_{n, n-1}\right) \end{aligned}$$

The estimation error between the true (hidden) state $\boldsymbol{x}_n$ and the estimate $\hat{\boldsymbol{x}}_{n, n}$ is:

$$\begin{aligned} \boldsymbol{e}_n &= \boldsymbol{x}_n - \hat{\boldsymbol{x}}_{n, n} \\ &= \boldsymbol{x}_n - \hat{\boldsymbol{x}}_{n, n-1} - \mathbf{K}_{n}\mathbf{H}\boldsymbol{x}_n - \mathbf{K}_{n}\boldsymbol{v}_n + \mathbf{K}_{n}\mathbf{H} \hat{\boldsymbol{x}}_{n, n-1} \\ &= \boldsymbol{x}_n - \hat{\boldsymbol{x}}_{n, n-1} - \mathbf{K}_{n}\mathbf{H}(\boldsymbol{x}_n - \hat{\boldsymbol{x}}_{n, n-1}) - \mathbf{K}_{n}\boldsymbol{v}_n \\ &= (\mathbf{I} - \mathbf{K}_{n}\mathbf{H})(\boldsymbol{x}_n - \hat{\boldsymbol{x}}_{n, n-1}) - \mathbf{K}_{n}\boldsymbol{v}_n \end{aligned}$$

Estimate Uncertainty

$$\begin{array}{l} \mathbf{P}_{n, n}=E\left(\boldsymbol{e}_{n} \boldsymbol{e}_{n}^{T}\right)=E\left(\left(\boldsymbol{x}_{n}-\hat{\boldsymbol{x}}_{n, n}\right)\left(\boldsymbol{x}_{n}-\hat{\boldsymbol{x}}_{n, n}\right)^{T}\right) \qquad \mid \; \text{plug in } \boldsymbol{e}_n \\[6pt] \mathbf{P}_{n, n}=E\left(\left(\left(\mathbf{I}-\mathbf{K}_{n} \mathbf{H}\right)\left(\boldsymbol{x}_{n}-\hat{\boldsymbol{x}}_{n, n-1}\right)-\mathbf{K}_{n} \boldsymbol{v}_{n}\right)\left(\left(\mathbf{I}-\mathbf{K}_{n} \mathbf{H}\right)\left(\boldsymbol{x}_{n}-\hat{\boldsymbol{x}}_{n, n-1}\right)-\mathbf{K}_{n} \boldsymbol{v}_{n}\right)^{T}\right) \\[6pt] \mathbf{P}_{n, n}=E\left(\left(\left(\mathbf{I}-\mathbf{K}_{n} \mathbf{H}\right)\left(\boldsymbol{x}_{n}-\hat{\boldsymbol{x}}_{n, n-1}\right)-\mathbf{K}_{n} \boldsymbol{v}_{n}\right)\left(\left(\left(\mathbf{I}-\mathbf{K}_{n} \mathbf{H}\right)\left(\boldsymbol{x}_{n}-\hat{\boldsymbol{x}}_{n, n-1}\right)\right)^{T}-\left(\mathbf{K}_{n} \boldsymbol{v}_{n}\right)^{T}\right)\right) \qquad \mid \; (\mathbf{A} \mathbf{B})^{T}=\mathbf{B}^{T} \mathbf{A}^{T} \\[6pt] \mathbf{P}_{n, n}=E\left(\left(\left(\mathbf{I}-\mathbf{K}_{n} \mathbf{H}\right)\left(\boldsymbol{x}_{n}-\hat{\boldsymbol{x}}_{n, n-1}\right)-\mathbf{K}_{n} \boldsymbol{v}_{n}\right)\left(\left(\boldsymbol{x}_{n}-\hat{\boldsymbol{x}}_{n, n-1}\right)^{T}\left(\mathbf{I}-\mathbf{K}_{n} \mathbf{H}\right)^{T}-\left(\mathbf{K}_{n} \boldsymbol{v}_{n}\right)^{T}\right)\right) \\[6pt] \mathbf{P}_{n, n}=E\left(\left(\mathbf{I}-\mathbf{K}_{n} \mathbf{H}\right)\left(\boldsymbol{x}_{n}-\hat{\boldsymbol{x}}_{n, n-1}\right)\left(\boldsymbol{x}_{n}-\hat{\boldsymbol{x}}_{n, n-1}\right)^{T}\left(\mathbf{I}-\mathbf{K}_{n} \mathbf{H}\right)^{T}-\left(\mathbf{I}-\mathbf{K}_{n} \mathbf{H}\right)\left(\boldsymbol{x}_{n}-\hat{\boldsymbol{x}}_{n, n-1}\right)\left(\mathbf{K}_{n} \boldsymbol{v}_{n}\right)^{T}-\mathbf{K}_{n} \boldsymbol{v}_{n}\left(\boldsymbol{x}_{n}-\hat{\boldsymbol{x}}_{n, n-1}\right)^{T}\left(\mathbf{I}-\mathbf{K}_{n} \mathbf{H}\right)^{T}+\mathbf{K}_{n} \boldsymbol{v}_{n}\left(\mathbf{K}_{n} \boldsymbol{v}_{n}\right)^{T}\right) \qquad \mid \; E(X \pm Y)=E(X) \pm E(Y) \\[6pt] \mathbf{P}_{n, n}=E\left(\left(\mathbf{I}-\mathbf{K}_{n} \mathbf{H}\right)\left(\boldsymbol{x}_{n}-\hat{\boldsymbol{x}}_{n, n-1}\right)\left(\boldsymbol{x}_{n}-\hat{\boldsymbol{x}}_{n, n-1}\right)^{T}\left(\mathbf{I}-\mathbf{K}_{n} \mathbf{H}\right)^{T}\right)-E\left(\left(\mathbf{I}-\mathbf{K}_{n} \mathbf{H}\right)\left(\boldsymbol{x}_{n}-\hat{\boldsymbol{x}}_{n, n-1}\right)\left(\mathbf{K}_{n} \boldsymbol{v}_{n}\right)^{T}\right)-E\left(\mathbf{K}_{n} \boldsymbol{v}_{n}\left(\boldsymbol{x}_{n}-\hat{\boldsymbol{x}}_{n, n-1}\right)^{T}\left(\mathbf{I}-\mathbf{K}_{n} \mathbf{H}\right)^{T}\right)+E\left(\mathbf{K}_{n} \boldsymbol{v}_{n}\left(\mathbf{K}_{n} \boldsymbol{v}_{n}\right)^{T}\right) \end{array}$$

$\boldsymbol{x}_{n}-\hat{\boldsymbol{x}}_{n, n-1}$ is the error of the prior estimate with respect to the true value. It is uncorrelated with the current measurement noise $\boldsymbol{v}_n$, which is zero-mean, so the expectation of their product is zero:

$$\begin{aligned} E\left(\left(\mathbf{I}-\mathbf{K}_{n} \mathbf{H}\right)\left(\boldsymbol{x}_{n}-\hat{\boldsymbol{x}}_{n, n-1}\right)\left(\mathbf{K}_{n} \boldsymbol{v}_{n}\right)^{T}\right) &= 0 \\ E\left(\mathbf{K}_{n} \boldsymbol{v}_{n}\left(\boldsymbol{x}_{n}-\hat{\boldsymbol{x}}_{n, n-1}\right)^{T}\left(\mathbf{I}-\mathbf{K}_{n} \mathbf{H}\right)^{T}\right) &= 0 \end{aligned}$$

Therefore

$$\begin{array}{l} \mathbf{P}_{n, n}=E\left(\left(\mathbf{I}-\mathbf{K}_{n} \mathbf{H}\right)\left(\boldsymbol{x}_{n}-\hat{\boldsymbol{x}}_{n, n-1}\right)\left(\boldsymbol{x}_{n}-\hat{\boldsymbol{x}}_{n, n-1}\right)^{T}\left(\mathbf{I}-\mathbf{K}_{n} \mathbf{H}\right)^{T}\right)+E\left(\mathbf{K}_{n} \boldsymbol{v}_{n}\left(\mathbf{K}_{n} \boldsymbol{v}_{n}\right)^{T}\right) \qquad \mid \; (\mathbf{A} \mathbf{B})^{T}=\mathbf{B}^{T} \mathbf{A}^{T} \\[6pt] \mathbf{P}_{n, n}=E\left(\left(\mathbf{I}-\mathbf{K}_{n} \mathbf{H}\right)\left(\boldsymbol{x}_{n}-\hat{\boldsymbol{x}}_{n, n-1}\right)\left(\boldsymbol{x}_{n}-\hat{\boldsymbol{x}}_{n, n-1}\right)^{T}\left(\mathbf{I}-\mathbf{K}_{n} \mathbf{H}\right)^{T}\right)+E\left(\mathbf{K}_{n} \boldsymbol{v}_{n} \boldsymbol{v}_{n}^T \mathbf{K}_{n}^T\right) \qquad \mid \; E(a X)=a E(X) \\[6pt] \mathbf{P}_{n, n}=\left(\mathbf{I}-\mathbf{K}_{n} \mathbf{H}\right) \underbrace{E\left(\left(\boldsymbol{x}_{n}-\hat{\boldsymbol{x}}_{n, n-1}\right)\left(\boldsymbol{x}_{n}-\hat{\boldsymbol{x}}_{n, n-1}\right)^{T}\right)}_{=\mathbf{P}_{n, n-1}}\left(\mathbf{I}-\mathbf{K}_{n} \mathbf{H}\right)^{T}+\mathbf{K}_{n}\underbrace{E\left(\boldsymbol{v}_{n} \boldsymbol{v}_{n}^T\right)}_{=\mathbf{R}_n} \mathbf{K}_{n}^T \\[6pt] \mathbf{P}_{n, n}=\left(\mathbf{I}-\mathbf{K}_{n} \mathbf{H}\right) \mathbf{P}_{n, n-1}\left(\mathbf{I}-\mathbf{K}_{n} \mathbf{H}\right)^{T}+\mathbf{K}_{n}\mathbf{R}_n \mathbf{K}_{n}^T \end{array}$$

In many textbooks you will see a simplified form:

$$\mathbf{P}_{n, n}=\left(\mathbf{I}-\mathbf{K}_{n} \mathbf{H}\right) \mathbf{P}_{n, n-1}$$

This equation is elegant and easier to remember and in many cases it performs well.

However, even the smallest error in computing the Kalman Gain (due to round-off) can lead to huge computation errors. The subtraction $\left(\mathbf{I}-\mathbf{K}_{n} \mathbf{H}\right)$ can lead to nonsymmetric matrices due to floating-point errors. Therefore, this equation is numerically unstable!
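
A small sketch of the practical difference (assuming NumPy; the tiny perturbation of $\mathbf{K}_n$ below just stands in for round-off error):

```python
import numpy as np

P_pred = np.array([[2.0, 0.5],
                   [0.5, 1.0]])
H = np.array([[1.0, 0.0]])
R = np.array([[0.25]])
S = H @ P_pred @ H.T + R
K = P_pred @ H.T @ np.linalg.inv(S)
K += 1e-6                                    # tiny perturbation mimicking round-off error

I = np.eye(2)
P_simple = (I - K @ H) @ P_pred                                        # simplified form
P_joseph = (I - K @ H) @ P_pred @ (I - K @ H).T + K @ R @ K.T          # Joseph form

print(np.abs(P_simple - P_simple.T).max())   # nonzero: symmetry is lost
print(np.abs(P_joseph - P_joseph.T).max())   # ~0: symmetry is preserved for any gain
```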

Derivation of a simplified form of the Covariance Update Equation
$$\begin{array}{l} \mathbf{P}_{n, n} = \left(\mathbf{I}-\mathbf{K}_{n} \mathbf{H}\right) \mathbf{P}_{n, n-1} \left(\mathbf{I}-\mathbf{K}_{n} \mathbf{H}\right)^{T} +\mathbf{K}_{n}\mathbf{R}_n \mathbf{K}_{n}^T \\[6pt] \mathbf{P}_{n, n}=\mathbf{P}_{n, n-1}-\mathbf{P}_{n, n-1} \mathbf{H}^{T} \mathbf{K}_{n}^{T}-\mathbf{K}_{n} \mathbf{H} \mathbf{P}_{n, n-1}+\mathbf{K}_{n}\left(\mathbf{H} \mathbf{P}_{n, n-1} \mathbf{H}^{T}+\mathbf{R}_{n}\right) \mathbf{K}_{n}^{T} \qquad \mid \; \text{substitute the Kalman Gain} \\[6pt] \mathbf{P}_{n, n}=\mathbf{P}_{n, n-1}-\mathbf{P}_{n, n-1} \mathbf{H}^{T} \mathbf{K}_{n}^{T}-\mathbf{K}_{n} \mathbf{H} \mathbf{P}_{n, n-1}+\mathbf{P}_{n, n-1} \mathbf{H}^{T} \underbrace{\left(\mathbf{H} \mathbf{P}_{n, n-1} \mathbf{H}^{T}+\mathbf{R}_{n}\right)^{-1}\left(\mathbf{H} \mathbf{P}_{n, n-1} \mathbf{H}^{T}+\mathbf{R}_{n}\right)}_{=\mathbf{I}} \mathbf{K}_{n}^{T} \\[6pt] \mathbf{P}_{n, n}=\mathbf{P}_{n, n-1}-\mathbf{P}_{n, n-1} \mathbf{H}^{T} \mathbf{K}_{n}^{T}-\mathbf{K}_{n} \mathbf{H} \mathbf{P}_{n, n-1}+\mathbf{P}_{n, n-1} \mathbf{H}^{T} \mathbf{K}_{n}^{T} \\[6pt] \mathbf{P}_{n, n}=\mathbf{P}_{n, n-1}-\mathbf{K}_{n} \mathbf{H} \mathbf{P}_{n, n-1} \\[6pt] \mathbf{P}_{n, n}=\left(\mathbf{I}-\mathbf{K}_{n} \mathbf{H}\right) \mathbf{P}_{n, n-1} \end{array}$$

Reference