Sampling

Reapproximation of Densities

Approximate the original continuous density by a discrete Dirac mixture

f(xβ€Ύ)=βˆ‘i=1Lwiβ‹…Ξ΄(xβ€Ύβˆ’x^β€Ύi) f(\underline{x})=\sum_{i=1}^{L} w_{i} \cdot \delta\left(\underline{x}-\underline{\hat{x}}_{i}\right)
  • Weights wi>0,βˆ‘i=1Lwi=1w_{i}>0, \displaystyle \sum_{i=1}^{L} w_{i}=1
  • x^β€Ύi\underline{\hat{x}}_i: locations / samples

In univariate case (1D), compare cumulative distribution functions (CDFs) F~(x),F(x)\tilde{F}(x), F(x) using CramΓ©r–von Mises distance:

D(x^β€Ύ)=∫R(F~(x)βˆ’F(x,x^β€Ύ))2 dx D(\underline{\hat{x}})=\int_{\mathbb{R}}\left(\tilde{F}(x)-F\left(x, \underline{\hat{x}}\right)\right)^{2} \mathrm{~d} x

F(x,x^β€Ύ)F(x, \underline{\hat{x}}) : Dirac mixture cumulative distribution

F(x,x^β€Ύ)=βˆ‘i=1LwiH(xβˆ’x^i) with H(x)=βˆ«βˆ’βˆžxΞ΄(t)dt={0x<012x=01x>0 F(x, \underline{\hat{x}})=\sum_{i=1}^{L} w_{i} \mathrm{H}\left(x-\hat{x}_{i}\right) \text { with } \mathrm{H}(x)=\int_{-\infty}^{x} \delta(t) \mathrm{d} t= \begin{cases}0 & x<0 \\ \frac{1}{2} & x=0 \\ 1 & x>0\end{cases}

with the Dirac position

x^β€Ύ=[x^1,x^2,…,x^L]⊀ \underline{\hat{x}}=\left[\hat{x}_{1}, \hat{x}_{2}, \ldots, \hat{x}_{L}\right]^{\top}

We minimize the CramΓ©r–von Mises distance D(x^β€Ύ)D(\underline{\hat{x}}) with Newton’s method.
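
For equally weighted components, the minimizer even has a closed form: setting the derivative of D(x^β€Ύ)D(\underline{\hat{x}}) with respect to each position to zero gives F~(x^i)=(2iβˆ’1)/(2L)\tilde{F}(\hat{x}_i) = (2i-1)/(2L), i.e. the optimal positions are the mid-quantiles of F~\tilde{F}. A minimal sketch in Python (the function name and the exponential example are illustrative, not from the lecture):

```python
import numpy as np

def cvm_dirac_positions(ppf, L):
    """Optimal positions of an equally weighted Dirac mixture under the
    Cramer-von Mises distance: stationarity of D w.r.t. each position
    yields F_tilde(x_i) = (2i - 1) / (2L), i.e. the mid-quantiles,
    which we obtain through the inverse CDF (ppf)."""
    u = (2.0 * np.arange(1, L + 1) - 1.0) / (2.0 * L)
    return ppf(u)

# Example: standard exponential density, F(x) = 1 - exp(-x),
# so the inverse CDF is F^{-1}(u) = -log(1 - u).
positions = cvm_dirac_positions(lambda u: -np.log(1.0 - u), 4)
```

This deterministic placement is what the Newton iteration converges to in the equally weighted univariate case.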

Generalization of the CDF concept

Localized Cumulative Distribution (LCD)

F(mβ€Ύ,b)=∫RNf(xβ€Ύ)K(xβ€Ύβˆ’mβ€Ύ,b)dxβ€Ύ F(\underline{m}, b)=\int_{\mathbb{R}^{N}} f(\underline{x}) K(\underline{x}-\underline{m}, b) \mathrm{d} \underline{x}
  • K(β‹…,β‹…)K(\cdot, \cdot): Kernel

    K(xβ€Ύβˆ’mβ€Ύ,b)=∏k=1Nexp⁑(βˆ’12(xkβˆ’mk)2b2) K(\underline{x}-\underline{m}, b)=\prod_{k=1}^{N} \exp \left(-\frac{1}{2} \frac{\left(x_{k}-m_{k}\right)^{2}}{b^{2}}\right)
  • mβ€Ύ\underline{m}: Kernel location

  • bb: Kernel width

Properties of LCD:

  • Symmetric
  • Unique
  • Multivariate

Generalized CramΓ©r–von Mises Distance (GCvD)

D=∫R+w(b)∫RN(F~(mβ€Ύ,b)βˆ’F(mβ€Ύ,b))2 dmβ€Ύ db D=\int_{\mathbb{R}_{+}} w(b) \int_{\mathbb{R}^{N}}(\tilde{F}(\underline{m}, b)-F(\underline{m}, b))^{2} \mathrm{~d} \underline{m} \mathrm{~d} b
  • F~(mβ€Ύ,b)\tilde{F}(\underline{m}, b): LCD of continuous density
  • F(mβ€Ύ,b)F(\underline{m}, b): LCD of Dirac mixture

Minimization of GCvD: Quasi-Newton method (L-BFGS)
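
For a Dirac mixture, the LCD needed inside the generalized distance has a closed form: integrating the Gaussian-product kernel against each Dirac component simply evaluates the kernel at that sample location. A sketch under that observation (names are illustrative):

```python
import numpy as np

def lcd_dirac_mixture(X, w, m, b):
    """LCD of a Dirac mixture at kernel location m with scalar width b:
        F(m, b) = sum_i w_i * prod_k exp(-0.5 * (x_ik - m_k)^2 / b^2)
    X: (L, N) sample locations, w: (L,) weights, m: (N,) location."""
    d2 = np.sum((np.asarray(X) - np.asarray(m)) ** 2, axis=1)
    return float(np.dot(w, np.exp(-0.5 * d2 / b ** 2)))
```

A single unit-weight sample placed exactly at m yields the kernel's peak value 1; samples far from m contribute almost nothing, which is what makes the distribution "localized".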

Projected Cumulative Distribution (PCD)

Apply the univariate reapproximation methods to the multivariate case via one-dimensional projections.

Radon Transform

Represent general NN-dimensional probability density functions via the set of all one-dimensional projections

  • Linear projection of random vector xβ€ΎβˆˆRN\underline{\boldsymbol{x}} \in \mathbb{R}^{N} to scalar random variable r∈R\boldsymbol{r} \in \mathbb{R} onto the line described by unit vector uβ€ΎβˆˆSNβˆ’1\underline{u} \in \mathbb{S}^{N-1}

    r=uβ€ΎβŠ€xβ€Ύ \boldsymbol{r} = \underline{u}^\top \underline{\boldsymbol{x}}
  • Given the probability density function f(xβ€Ύ)f(\underline{x}) of random vector xβ€Ύ\underline{\boldsymbol{x}}, the density fr(r∣uβ€Ύ)f_r(r \mid \underline{u}) is the Radon transform of f(xβ€Ύ)f(\underline{x}) for all uβ€ΎβˆˆSNβˆ’1\underline{u} \in \mathbb{S}^{N-1}

    fr(r∣uβ€Ύ)=∫RNf(tβ€Ύ)Ξ΄(rβˆ’uβ€ΎβŠ€tβ€Ύ)dtβ€Ύ f_{r}(r \mid \underline{u})=\int_{\mathbb{R}^{N}} f(\underline{t}) \delta\left(r-\underline{u}^{\top} \underline{t}\right) \mathrm{d} \underline{t}

Representing PDFs by all one-dimensional projections

  1. Represent the two densities f~(xβ€Ύ)\tilde{f}(\underline{x}) and f(xβ€Ύ)f(\underline{x}) by their Radon transforms f~(r∣uβ€Ύ)\tilde{f}(r \mid \underline{u}) and f(r∣uβ€Ύ)f(r \mid \underline{u})

  2. Compare the sets of projections f~(r∣uβ€Ύ)\tilde{f}(r \mid \underline{u}) and f(r∣uβ€Ύ)f(r \mid \underline{u}) for every uβ€ΎβˆˆSNβˆ’1\underline{u} \in \mathbb{S}^{N-1}. The resulting distance is

    D1(uβ€Ύ)=D(f~(r∣uβ€Ύ),f(r∣uβ€Ύ)) D_{1}(\underline{u})=D(\tilde{f}(r \mid \underline{u}), f(r \mid \underline{u}))
  3. Integrate these one-dimensional distance measures D1(uβ€Ύ)D_1(\underline{u}) over all unit vectors uβ€ΎβˆˆSNβˆ’1\underline{u} \in \mathbb{S}^{N-1} to get the multivariate distance measure D(f~(xβ€Ύ),f(xβ€Ύ))D(\tilde{f}(\underline{x}), f(\underline{x})). Minimize via univariate Newton updates.
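
For a Dirac mixture this projection is trivial to evaluate: each sample projects to the scalar ri=uβ€ΎβŠ€x^β€Ύir_i = \underline{u}^\top \underline{\hat{x}}_i with unchanged weight, so every direction yields a univariate Dirac mixture that the one-dimensional machinery from above can handle. A sketch (assuming a finite set of test directions rather than all of SNβˆ’1\mathbb{S}^{N-1}):

```python
import numpy as np

def project_dirac_mixture(X, u):
    """Project the (L, N) sample matrix X onto the direction u.
    The result is the position vector of a univariate Dirac mixture
    with the same weights as the original multivariate one."""
    u = np.asarray(u, dtype=float)
    u = u / np.linalg.norm(u)  # ensure u is a unit vector
    return np.asarray(X) @ u

r = project_dirac_mixture([[1.0, 0.0], [0.0, 1.0]], [1.0, 0.0])
```

In practice the integral over all unit vectors is approximated by such a finite set of sampled directions.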

Exercise A13.2

Prediction

πŸ’‘Update the sample positions; the weights stay the same.

  1. Represent fke(xβ€Ύk)f_{k}^{e}\left(\underline{x}_{k}\right) as a Dirac mixture

    fke(xβ€Ύk)=βˆ‘i=1Lwke,iβ‹…Ξ΄(xβ€Ύkβˆ’x^β€Ύke,i)wke,i=1L,i∈{1,…,L} f_{k}^{e}\left(\underline{x}_{k}\right)=\sum_{i=1}^{L} w_{k}^{e, i} \cdot \delta\left(\underline{x}_{k}-\underline{\hat{x}}_{k}^{e, i}\right) \qquad w_{k}^{e, i}=\frac{1}{L}, i \in\left\{1, \ldots, L\right\}
  2. Draw samples for time step k+1k+1

    x^β€Ύk+1p,i∼f(xβ€Ύk+1∣x^β€Ύke,i) \underline{\hat{x}}_{k+1}^{p, i} \sim f\left(\underline{x}_{k+1} \mid \underline{\hat{x}}_{k}^{e, i}\right)

    The weights stay the same

    wk+1p,i=wke,i w_{k+1}^{p, i} = w_{k}^{e, i}
  3. Represent fk+1p(xβ€Ύk+1)f_{k+1}^{p}\left(\underline{x}_{k+1}\right) as a Dirac mixture

    fk+1p(xβ€Ύk+1)=βˆ‘i=1Lwk+1p,iΞ΄(xβ€Ύk+1βˆ’x^β€Ύk+1p,i) f_{k+1}^{p}\left(\underline{x}_{k+1}\right)=\sum_{i=1}^{L} w_{k+1}^{p, i} \delta\left(\underline{x}_{k+1}-\underline{\hat{x}}_{k+1}^{p, i}\right)
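
In code, the prediction step just pushes every particle through one draw from the transition density; the scalar toy model below (parameters a and q are made up for illustration) stands in for f(xβ€Ύk+1∣x^β€Ύke,i)f(\underline{x}_{k+1} \mid \underline{\hat{x}}_{k}^{e, i}):

```python
import numpy as np

rng = np.random.default_rng(0)

def predict(X, sample_transition):
    """Prediction step: draw each new particle from the transition
    density conditioned on the old particle. Weights are untouched,
    so only the positions are returned."""
    return np.array([sample_transition(x) for x in X])

# Assumed toy model: x_{k+1} = a * x_k + w_k with w_k ~ N(0, q)
a, q = 0.9, 0.04
X_pred = predict(np.array([0.0, 1.0, 2.0]),
                 lambda x: a * x + rng.normal(0.0, np.sqrt(q)))
```

Because each draw conditions on its own particle, the sample set as a whole approximates the predicted density without ever computing it in closed form.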

Filtering

πŸ’‘Update the weights; the sample positions stay the same.

fke(xβ€Ύk)∝f(yβ€Ύk∣xβ€Ύk)β‹…fkp(xβ€Ύk)=f(yβ€Ύk∣xβ€Ύk)β‹…βˆ‘i=1Lwkp,iβ‹…Ξ΄(xβ€Ύkβˆ’x^β€Ύkp,i)=βˆ‘i=1Lwkp,iβ‹…f(yβ€Ύk∣xβ€Ύ^kp,i)⏟∝wke,iβ‹…Ξ΄(xβ€Ύkβˆ’x^β€Ύkp,i⏟x^β€Ύke,i) \begin{aligned} f_{k}^{e}\left(\underline{x}_{k}\right) &\propto f\left(\underline{y}_{k} \mid \underline{x}_{k}\right) \cdot f_{k}^{p}\left(\underline{x}_{k}\right)\\ &=f\left(\underline{y}_{k} \mid \underline{x}_{k}\right) \cdot \sum_{i=1}^{L} w_{k}^{p, i} \cdot \delta\left(\underline{x}_{k}-\underline{\hat{x}}_{k}^{p, i}\right)\\ &=\sum_{i=1}^{L} \underbrace{w_{k}^{p, i} \cdot f\left(\underline{y}_{k} \mid \hat{\underline{x}}_{k}^{p, i}\right)}_{\propto w_{k}^{e, i}} \cdot \delta(\underline{x}_{k}-\underbrace{\underline{\hat{x}}_{k}^{p, i}}_{\underline{\hat{x}}_{k}^{e, i}}) \end{aligned}
  1. The positions stay the same

    x^β€Ύke,i=x^β€Ύkp,i \underline{\hat{x}}_{k}^{e, i} = \underline{\hat{x}}_{k}^{p, i}
  2. Adapt the weights

    wke,i∝wkp,iβ‹…f(yβ€Ύk∣xβ€Ύ^kp,i) w_{k}^{e, i} \propto w_{k}^{p, i} \cdot f\left(\underline{y}_{k} \mid \hat{\underline{x}}_{k}^{p, i}\right)

    and normalize

    wke,i:=wke,iβˆ‘iwke,i w_{k}^{e, i}:=\frac{w_{k}^{e, i}}{\displaystyle \sum_{i} w_{k}^{e,i}}
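
The two filtering steps translate directly into code: evaluate the likelihood at each (unchanged) particle position, multiply it into the weights, and renormalize. The Gaussian measurement model below is an assumed example, not part of the lecture:

```python
import numpy as np

def filter_update(w, X, y, likelihood):
    """Measurement update: positions stay fixed, each weight is scaled
    by the particle's likelihood, then the weights are renormalized."""
    w_new = w * likelihood(y, X)
    return w_new / np.sum(w_new)

# Assumed measurement model: y = x + v with v ~ N(0, r); the constant
# normalization factor of the Gaussian cancels in the renormalization.
r = 0.25
gauss_lik = lambda y, X: np.exp(-0.5 * (y - X) ** 2 / r)
w_e = filter_update(np.full(3, 1 / 3), np.array([0.0, 1.0, 2.0]), 1.0, gauss_lik)
```

The particle closest to the measurement receives the largest posterior weight, which is exactly the mechanism that later causes degeneracy when the likelihood is narrow.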

Problem

  • The variance of the sample weights increases with each filter step

  • Particles die out β†’\rightarrow degeneracy of the filter

  • The more accurate the measurement, the narrower the likelihood and the faster particles die out (paradox!)

Resampling

  • Approximate the weighted samples by unweighted ones

    fke(xβ€Ύk)=βˆ‘i=1Lwke,iβ‹…Ξ΄(xβ€Ύkβˆ’x^β€Ύke,i)β‰ˆβˆ‘i=1L1LΞ΄(xβ€Ύkβˆ’x^β€Ύke,i) f_{k}^{e}\left(\underline{x}_{k}\right)=\sum_{i=1}^{L} w_{k}^{e, i} \cdot \delta\left(\underline{x}_{k}-\underline{\hat{x}}_{k}^{e, i}\right) \approx \sum_{i=1}^{L} \frac{1}{L} \delta\left(\underline{x}_{k}-\underline{\hat{x}}_{k}^{e, i}\right)
  • Given: LL particles with weights wiw_i

  • Wanted: LL particles with weights 1L\frac{1}{L} (equally weighted)
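
One common scheme is systematic resampling: draw a single uniform offset, place L evenly spaced pointers into the cumulative sum of the weights, and keep the particles the pointers land on, so heavy particles are duplicated and light ones die. A minimal sketch:

```python
import numpy as np

def systematic_resample(w, rng):
    """Return L indices into the particle set: one uniform offset plus
    L evenly spaced pointers, looked up in the weight CDF. Copying the
    particles at these indices gives an equally weighted sample set."""
    L = len(w)
    pointers = (rng.random() + np.arange(L)) / L
    return np.searchsorted(np.cumsum(w), pointers)

idx = systematic_resample(np.array([0.1, 0.6, 0.3]), np.random.default_rng(1))
```

Compared with drawing L independent categorical samples, the evenly spaced pointers need only one random number and reduce the extra variance introduced by resampling.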

Sequential Importance Sampling

  1. Marginalize fke(xβ€Ύk)=f(xβ€Ύk∣yβ€Ύ1:k)f_{k}^{e}\left(\underline{x}_{k}\right)=f\left(\underline{x}_{k} \mid \underline{y}_{1: k}\right) over xβ€Ύ1:kβˆ’1\underline{x}_{1: k-1}

    fke(xβ€Ύk)=f(xβ€Ύk∣yβ€Ύ1:k)=∫RNβ‹―βˆ«RNf(xβ€Ύ1:k∣yβ€Ύ1:k)dxβ€Ύ1:kβˆ’1 f_{k}^{e}\left(\underline{x}_{k}\right)=f\left(\underline{x}_{k} \mid \underline{y}_{1: k}\right)=\int_{\mathbb{R}^{N}} \cdots \int_{\mathbb{R}^{N}} f\left(\underline{x}_{1: k} \mid \underline{y}_{1 : k}\right) d \underline{x}_{1: k-1}
  2. Importance sampling for f(xβ€Ύ1:k∣yβ€Ύ1:k)f(\underline{x}_{1:k} \mid \underline{y}_{1:k})

    fke(xβ€Ύk)=∫RNβ‹―βˆ«RNf(xβ€Ύ1:k∣yβ€Ύ1:k)p(xβ€Ύ1:k∣yβ€Ύ1:k)⏟=:wke,ip(xβ€Ύ1:k∣yβ€Ύ1:k)dxβ€Ύ1:kβˆ’1 f_{k}^{e}\left(\underline{x}_{k}\right) = \int_{\mathbb{R}^{N}} \cdots \int_{\mathbb{R}^{N}} \underbrace{\frac{f\left(\underline{x}_{1: k} \mid \underline{y}_{1 : k}\right)}{p\left(\underline{x}_{1: k} \mid \underline{y}_{1 : k}\right)}}_{=: w_k^{e, i}} p\left(\underline{x}_{1: k} \mid \underline{y}_{1 : k}\right) d \underline{x}_{1: k-1}
  3. Rewrite f(xβ€Ύ1:k∣yβ€Ύ1:k)p(xβ€Ύ1:k∣yβ€Ύ1:k)\frac{f\left(\underline{x}_{1: k} \mid \underline{y}_{1 : k}\right)}{p\left(\underline{x}_{1: k} \mid \underline{y}_{1 : k}\right)}

    • Numerator

      f(xβ€Ύ1:k∣yβ€Ύ1:k)∝f(yβ€Ύk∣xβ€Ύ1:k,yβ€Ύ1:kβˆ’1)β‹…f(xβ€Ύ1:k∣yβ€Ύ1:kβˆ’1)=f(yβ€Ύk∣xβ€Ύk)β‹…f(xβ€Ύk∣xβ€Ύ1:kβˆ’1,yβ€Ύ1:kβˆ’1)β‹…f(xβ€Ύ1:kβˆ’1∣yβ€Ύ1:kβˆ’1)=f(yβ€Ύk∣xβ€Ύk)β‹…f(xβ€Ύk∣xβ€Ύkβˆ’1)β‹…f(xβ€Ύ1:kβˆ’1∣yβ€Ύ1:kβˆ’1) \begin{aligned} f\left(\underline{x}_{1: k} \mid \underline{y}_{1: k}\right) &\propto f\left(\underline{y}_{k} \mid \underline{x}_{1: k}, \underline{y}_{1: k - 1}\right) \cdot f\left(\underline{x}_{1: k} \mid \underline{y}_{1: k-1}\right)\\ &=f\left(\underline{y}_{k} \mid \underline{x}_{k}\right) \cdot f\left(\underline{x}_{k} \mid \underline{x}_{1:k-1}, \underline{y}_{1:k-1}\right) \cdot f\left(\underline{x}_{1:k-1} \mid \underline{y}_{1: k-1}\right)\\ &=f\left(\underline{y}_{k} \mid \underline{x}_{k}\right) \cdot f\left(\underline{x}_{k} \mid \underline{x}_{k-1}\right) \cdot f\left(\underline{x}_{1: k-1} \mid \underline{y}_{1: k-1}\right) \end{aligned}
    • Denominator

      p(xβ€Ύ1:k∣yβ€Ύ1:k)=p(xβ€Ύk∣xβ€Ύ1:kβˆ’1,yβ€Ύ1:k)β‹…p(xβ€Ύ1:kβˆ’1∣yβ€Ύ1:kβˆ’1) p\left(\underline{x}_{1: k} \mid \underline{y}_{1: k}\right)=p\left(\underline{x}_{k} \mid \underline{x}_{1: k - 1}, \underline{y}_{1: k}\right) \cdot p\left(\underline{x}_{1: k -1} \mid \underline{y}_{1: k - 1}\right)
  4. Substitute and write wke,iw_k^{e, i} in recursive form

    wke,i=f(x^β€Ύ1:k∣yβ€Ύ1:k)p(x^β€Ύ1:k∣yβ€Ύ1:k)∝f(yβ€Ύk∣xβ€Ύki)β‹…f(xβ€Ύki∣xβ€Ύkβˆ’1i)p(xβ€Ύki∣xβ€Ύ1:kβˆ’1i,yβ€Ύ1:k)β‹…f(xβ€Ύ1:kβˆ’1i∣yβ€Ύ1:kβˆ’1)p(xβ€Ύ1:kβˆ’1i∣yβ€Ύ1:kβˆ’1)⏟=wkβˆ’1e,i w_k^{e, i} = \frac{f\left(\underline{\hat{x}}_{1: k} \mid \underline{y}_{1 : k}\right)}{p\left(\underline{\hat{x}}_{1: k} \mid \underline{y}_{1 : k}\right)} \propto \frac{f\left(\underline{y}_{k} \mid \underline{x}_{k}^i\right) \cdot f\left(\underline{x}_{k}^i\mid \underline{x}_{k-1}^i\right)}{p\left(\underline{x}_{k}^i \mid \underline{x}_{1: k - 1}^i, \underline{y}_{1: k}\right)} \cdot \underbrace{\frac{f\left(\underline{x}_{1: k-1}^i \mid \underline{y}_{1: k-1}\right)}{p\left(\underline{x}_{1: k -1}^i \mid \underline{y}_{1: k - 1}\right)}}_{=w_{k-1}^{e, i}}

    and normalize.

Special proposals

  • Standard Proposal

    p(xβ€Ύk∣xβ€Ύkβˆ’1,yβ€Ύk)=!f(xβ€Ύk∣xβ€Ύkβˆ’1) p\left(\underline{x}_{k} \mid \underline{x}_{k-1}, \underline{y}_{k}\right) \stackrel{!}{=} f\left(\underline{x}_{k} \mid \underline{x}_{k-1}\right)

    Then

    wke,i∝f(yβ€Ύk∣xβ€Ύ^ki)β‹…f(xβ€Ύ^ki∣xβ€Ύ^kβˆ’1i)p(x^β€Ύki∣xβ€Ύ^kβˆ’1i,yβ€Ύk)β‹…wkβˆ’1e,i=f(yβ€Ύk∣xβ€Ύ^ki)β‹…wkβˆ’1e,i w_{k}^{e, i} \propto \frac{f\left(\underline{y}_{k} \mid \hat{\underline{x}}_{k}^{i}\right) \cdot f\left(\hat{\underline{x}}_{k}^{i} \mid \hat{\underline{x}}_{k-1}^{i}\right)}{p\left(\underline{\hat{x}}_{k}^{i} \mid \hat{\underline{x}}_{k-1}^{i}, \underline{y}_k\right)} \cdot w_{k-1}^{e, i}=f\left(\underline{y}_{k} \mid \hat{\underline{x}}_{k}^{i}\right) \cdot w_{k - 1}^{e, i}

    Very simple, but no improved performance.

  • Optimal proposal

    p(xβ€Ύk∣xβ€Ύkβˆ’1,yβ€Ύk)=f(xβ€Ύk∣xβ€Ύkβˆ’1,yβ€Ύk)∝f(yβ€Ύk∣xβ€Ύk)β‹…f(xβ€Ύk∣xβ€Ύkβˆ’1) \begin{aligned} p\left(\underline{x}_{k} \mid \underline{x}_{k-1}, \underline{y}_{k}\right) &=f\left(\underline{x}_{k} \mid \underline{x}_{k-1}, \underline{y}_{k}\right) \\ & \propto f\left(\underline{y}_{k} \mid \underline{x}_{k}\right) \cdot f\left(\underline{x}_{k} \mid \underline{x}_{k-1}\right) \end{aligned}

    Then

    wke,i=wkβˆ’1e,i w_k^{e, i} = w_{k-1}^{e, i}

    Minimizes the variance of the weights, but only usable in special cases.