Reapproximation of Densities

Four cases arise, depending on whether the given density and its reapproximation are each continuous or discrete.

Examples for reapproximation

  • Continuous → continuous: Gaussian mixture reduction

  • Continuous → discrete: deterministic sampling, i.e., replacing a continuous density with a Dirac mixture

  • Discrete → continuous: density estimation, i.e., finding a continuous density representing a set of given samples

  • Discrete → discrete: Dirac mixture reduction

Challenge: Three cases involving discrete densities

  • Continuous → continuous case: Use standard distance measures, e.g. integral squared distance (ISD)

  • Discrete densities prohibit use of standard distance measures

Here we focus on continuous → discrete reapproximation

  • Given: Continuous density $\tilde{f}(\underline{x})$

  • Deterministic sampling, i.e., approximation with Dirac mixture

  • Definition of Dirac mixture with $L$ components

    $$ f(\underline{x})=\sum_{i=1}^{L} w_{i} \cdot \delta\left(\underline{x}-\underline{\hat{x}}_{i}\right) $$
    • Weights $w_{i}>0, \displaystyle \sum_{i=1}^{L} w_{i}=1$
    • $\underline{\hat{x}}_i$: locations / samples
  • 🎯 Goal: Systematic approximation of given continuous density

  • Application examples

    • Mapping of random variables through nonlinear functions

    • Sample-based fusion and estimation (UKF)

Univariate Case (1D)

Synthesis

Instead of comparing densities $\tilde{f}(x), f(x)$, we compare cumulative distribution functions (CDFs) $\tilde{F}(x), F(x)$

CDF of $f(x)$:

$$ F(x)=P(\boldsymbol{x} \leq x)=\int_{-\infty}^{x} f(u) \mathrm{d} u $$
  • This definition suffices: the alternative definition $\bar{F}(x)=P(\boldsymbol{x}>x)$ carries no additional information, as it is fully determined by $F(x)$

    $$ \begin{aligned} \bar{F}(x)=&P(\boldsymbol{x}>x) \\ &=\int_{x}^{\infty} f(u) d u \\ &=1-\int_{-\infty}^{x} f(u) d u \\ &=1-P(\boldsymbol{x} \leq x) \\ &=1-F(x) \end{aligned} $$
  • Example

    Dirac mixture density

    $$ f(x, \underline{\hat{x}})=\sum_{i=1}^{L} w_{i} \delta\left(x-\hat{x}_{i}\right) $$

    Dirac mixture cumulative distribution

    $$ F(x, \underline{\hat{x}})=\sum_{i=1}^{L} w_{i} \mathrm{H}\left(x-\hat{x}_{i}\right) \text { with } \mathrm{H}(x)=\int_{-\infty}^{x} \delta(t) \mathrm{d} t= \begin{cases}0 & x<0 \\ \frac{1}{2} & x=0 \\ 1 & x>0\end{cases} $$

    with the Dirac position

    $$ \underline{\hat{x}}=\left[\hat{x}_{1}, \hat{x}_{2}, \ldots, \hat{x}_{L}\right]^{\top} $$
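The Dirac mixture CDF above is straightforward to evaluate; a minimal sketch (the function name is illustrative):

```python
def dirac_cdf(x, weights, locations):
    """Evaluate F(x, x̂) = Σ_i w_i · H(x - x̂_i) with H(0) = 1/2."""
    def H(t):
        return 0.0 if t < 0 else (0.5 if t == 0 else 1.0)
    return sum(w * H(x - xi) for w, xi in zip(weights, locations))

# Two equally weighted components at ±1:
dirac_cdf(0.0, [0.5, 0.5], [-1.0, 1.0])   # 0.5
```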

CDF of $\tilde{f}(x)$ follows analogously:

$$ \tilde{F}(x)=\int_{-\infty}^{x} \tilde{f}(u) \mathrm{d} u $$
  • Example

    Gaussian density:

    $$ \tilde{f}(x)=\frac{1}{\sqrt{2 \pi}} \exp \left(-\frac{1}{2} x^{2}\right) $$

    $\Rightarrow$ Gaussian cumulative distribution:

    $$ \tilde{F}(x)=\frac{1}{2}\left(1+\operatorname{erf}\left(\frac{x}{\sqrt{2}}\right)\right) $$

We compare $\tilde{F}(x)$ and $F(x)$ using the Cramér–von Mises distance:

$$ D(\underline{\hat{x}})=\int_{\mathbb{R}}\left(\tilde{F}(x)-F(x, \underline{\hat{x}})\right)^{2} \mathrm{~d} x $$

Minimization of Cramér–von Mises distance

Gradient of the distance measure $D(\underline{\hat{x}})$:

$$ \underline{G}(\underline{\hat{x}})=\nabla D(\underline{\hat{x}})=\frac{\partial D(\underline{\hat{x}})}{\partial \underline{\hat{x}}}=\left[\frac{\partial D(\underline{\hat{x}})}{\partial \hat{x}_{1}}, \frac{\partial D(\underline{\hat{x}})}{\partial \hat{x}_{2}}, \ldots, \frac{\partial D(\underline{\hat{x}})}{\partial \hat{x}_{L}}\right]^{\top} $$

with

$$ \frac{\partial D(\underline{\hat{x}})}{\partial \hat{x}_{i}}=2 w_{i} \int_{-\infty}^{\infty}[\tilde{F}(t)-F(t, \underline{\hat{x}})] \delta\left(t-\hat{x}_{i}\right) \mathrm{d} t $$

or

$$ \frac{\partial D(\underline{\hat{x}})}{\partial \hat{x}_{i}}=2 w_{i}\left[\tilde{F}\left(\hat{x}_{i}\right)-F\left(\hat{x}_{i}, \underline{\hat{x}}\right)\right] \text { with } F\left(\hat{x}_{i}, \underline{\hat{x}}\right)=\sum_{j=1}^{L} w_{j} \mathrm{H}\left(\hat{x}_{i}-\hat{x}_{j}\right) $$

The Hessian matrix is

$$ \mathbf{H}(\underline{\hat{x}})=\operatorname{diag}\left(\left[\frac{\partial^{2} D(\underline{\hat{x}})}{\partial \hat{x}_{1}^{2}}, \frac{\partial^{2} D(\underline{\hat{x}})}{\partial \hat{x}_{2}^{2}}, \ldots, \frac{\partial^{2} D(\underline{\hat{x}})}{\partial \hat{x}_{L}^{2}}\right]\right) $$

with

$$ \frac{\partial^{2} D(\underline{\hat{x}})}{\partial \hat{x}_{i}^{2}}=2 w_{i} \tilde{f}\left(\hat{x}_{i}\right) $$

Sorted Locations & Equal Weights

When location vector $\underline{\hat{x}}$ is sorted, i.e., $\hat{x}_{1}<\hat{x}_{2}<\ldots<\hat{x}_{L}$ , we obtain

$$ H\left(\hat{x}_{i}-\hat{x}_{j}\right)= \begin{cases} 0 & i < j \\ \frac{1}{2} & i=j \\ 1 & i > j \end{cases} $$

The cumulative distribution then simplifies to

$$ F\left(\hat{x}_{i}, \underline{\hat{x}}\right)=\frac{w_{i}}{2}+\sum_{j=1}^{i-1} w_{j} $$

When samples are equally weighted (i.e. $w_i = \frac{1}{L}$), we get

$$ F(\hat{x}_{i}, \underline{\hat{x}}) = \frac{1}{2L} + \frac{i-1}{L} = \frac{2i - 1}{2L} \qquad i = 1, \dots, L $$

Analytic solutions (possible in some special cases)

$$ \tilde{F}\left(\hat{x}_{i}\right)-F\left(\hat{x}_{i}, \underline{\hat{x}}\right)=0 \Rightarrow \hat{x}_{i}=\tilde{F}^{-1}(\frac{2 i-1}{2 L}) \qquad i=1, \ldots, L $$
  • E.g. standard Gaussian distribution:

    $$ \hat{x}_{i}=\tilde{F}^{-1}\left(\frac{2 i-1}{2 L}\right)=\sqrt{2} \operatorname{erfinv}\left(\frac{2 i-1}{L}-1\right) $$
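For the standard normal, this analytic placement can be computed directly with Python's stdlib quantile function (the function name below is illustrative):

```python
from statistics import NormalDist

def dirac_locations_std_normal(L):
    """Equally weighted Dirac locations x̂_i = F̃⁻¹((2i − 1) / (2L))."""
    std = NormalDist()  # standard normal; inv_cdf is F̃⁻¹
    return [std.inv_cdf((2 * i - 1) / (2 * L)) for i in range(1, L + 1)]

locs = dirac_locations_std_normal(5)
# symmetric about 0: approximately [-1.2816, -0.5244, 0.0, 0.5244, 1.2816]
```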

Example: Dirac Mixture Approximation of the Standard Normal Distribution

With increasing number of Dirac functions, the CDF can be well approximated.



General Optimization

In general: Minimum of $D(\underline{\hat{x}})$ is obtained iteratively using Newton’s method

$$ \Delta \underline{\hat{x}}=-\mathbf{H}^{-1}(\underline{\hat{x}}) \underline{G}(\underline{\hat{x}}) $$

with

$$ \underline{G}(\underline{\hat{x}})=\left[\frac{\partial D(\underline{\hat{x}})}{\partial \hat{x}_{1}}, \frac{\partial D(\underline{\hat{x}})}{\partial \hat{x}_{2}}, \ldots, \frac{\partial D(\underline{\hat{x}})}{\partial \hat{x}_{L}}\right]^{\top} $$

and

$$ \frac{\partial D(\underline{\hat{x}})}{\partial \hat{x}_{i}}=2 w_{i}\left[\tilde{F}\left(\hat{x}_{i}\right)-F\left(\hat{x}_{i}, \underline{\hat{x}}\right)\right] $$

The Hessian $\mathbf{H}(\underline{\hat{x}})$ is given by

$$ \mathbf{H}(\underline{\hat{x}})=2 \operatorname{diag}\left(\left[w_{1} \tilde{f}\left(\hat{x}_{1}\right), w_{2} \tilde{f}\left(\hat{x}_{2}\right), \ldots, w_{L} \tilde{f}\left(\hat{x}_{L}\right)\right]\right) $$

The resulting Newton step:

$$ \Delta \underline{\hat{x}}=-\left[\frac{\tilde{F}\left(\hat{x}_{1}\right)-F\left(\hat{x}_{1}, \underline{\hat{x}}\right)}{\tilde{f}\left(\hat{x}_{1}\right)}, \frac{\tilde{F}\left(\hat{x}_{2}\right)-F\left(\hat{x}_{2}, \underline{\hat{x}}\right)}{\tilde{f}\left(\hat{x}_{2}\right)}, \ldots, \frac{\tilde{F}\left(\hat{x}_{L}\right)-F\left(\hat{x}_{L}, \underline{\hat{x}}\right)}{\tilde{f}\left(\hat{x}_{L}\right)}\right]^{\top} $$
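For a general $\tilde{F}$ without an invertible closed form, this Newton step can be iterated numerically. A minimal sketch for equally weighted, sorted locations, where $F(\hat{x}_{i}, \underline{\hat{x}}) = \frac{2i-1}{2L}$; the standard normal serves here purely as an example target, and the function name is illustrative:

```python
from statistics import NormalDist

def newton_dirac_1d(F_tilde, f_tilde, x_init, iterations=50):
    """Iterate x̂_i ← x̂_i − (F̃(x̂_i) − (2i−1)/(2L)) / f̃(x̂_i)
    for L equally weighted Dirac locations, re-sorting each sweep."""
    L = len(x_init)
    x = sorted(x_init)
    for _ in range(iterations):
        x = sorted(xi - (F_tilde(xi) - (2 * i + 1) / (2 * L)) / f_tilde(xi)
                   for i, xi in enumerate(x))
    return x

std = NormalDist()
x = newton_dirac_1d(std.cdf, std.pdf, [-1.0, 0.5, 2.0])
# converges to the quantiles F̃⁻¹(1/6), F̃⁻¹(1/2), F̃⁻¹(5/6)
```

For the standard normal this reproduces the analytic solution; the iteration is useful when $\tilde{F}^{-1}$ is unavailable.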

Extension to Multivariate Distributions

  • Extension to multivariate case is not trivial

    • Multivariate cumulative distributions have less convenient properties 🤪
  • Distinguish several classes of multivariate methods

    • Methods that generalize concept of CDF
    • Methods that perform reduction to univariate case
    • Kernel-based methods
    • Continuous flow between density approximations

Challenge of Multivariate Cumulative Distributions

Definition for 2D:

$$ F(u, v)=\int_{-\infty}^{u} \int_{-\infty}^{v} f(x, y)\, \mathrm{d} y\, \mathrm{d} x $$

However, $F(u, v)$ is asymmetric and definition is not unique.


Alternative definitions:

$$ \begin{aligned} &F(u, v)=\int_{-\infty}^{u} \int_{v}^{\infty} f(x, y)\, \mathrm{d} y\, \mathrm{d} x \\ &F(u, v)=\int_{u}^{\infty} \int_{v}^{\infty} f(x, y)\, \mathrm{d} y\, \mathrm{d} x \\ &F(u, v)=\int_{u}^{\infty} \int_{-\infty}^{v} f(x, y)\, \mathrm{d} y\, \mathrm{d} x \end{aligned} $$
  • Any three of these CDFs are independent; the fourth is determined by them.

For general $N$-dimensional random vectors there are $2^N$ different variants, of which $2^N - 1$ are independent

$\rightarrow$ exponentially complex!

  • Use in statistical tests difficult
  • Results differ depending on variant

Thus, we require generalization of concept of CDF. 💪

Localized Cumulative Distributions (LCDs)

Univariate case (1D)

💡 Key idea

  • Compare local probability masses of $\tilde{f}(x)$ and $f(x)$
  • Integrate over intervals at all positions $m$ and all widths $b$

Compare $\tilde{A}(m, b)$ and $A(m,b), \forall m, b$

Symmetric, unique, but redundant…

Multivariate case (2D)


Different kernels possible

  • Rectangular kernels
  • Gaussian kernels
  • Anisotropic vs. isotropic kernels
  • Separable vs. inseparable kernels

Cumulative Transformation of Densities

Given

  • Random vector $\underline{x} \in \mathbb{R}^N$
  • Probability density function $f(\underline{x}): \mathbb{R}^N \to \mathbb{R}_+$

Localized Cumulative Distribution (LCD):

$$ F(\underline{m}, b)=\int_{\mathbb{R}^{N}} f(\underline{x}) K(\underline{x}-\underline{m}, b) \mathrm{d} \underline{x} $$
  • $K(\cdot, \cdot)$: Kernel
  • $\underline{m}$: Kernel location
  • $b$: Kernel width

Specific kernel employed:

$$ K(\underline{x}-\underline{m}, b)=\prod_{k=1}^{N} \exp \left(-\frac{1}{2} \frac{\left(x_{k}-m_{k}\right)^{2}}{b^{2}}\right) $$
  • Separable (i.e., in the form of a product)
  • Isotropic (i.e., same width in each direction)
  • Gaussian
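For a Dirac mixture, the integral defining the LCD collapses by the sifting property of the Dirac delta: each component contributes the kernel value at its location. A sketch with the separable isotropic Gaussian kernel above (the function name is illustrative):

```python
import math

def lcd_dirac_mixture(weights, locations, m, b):
    """F(m, b) = Σ_i w_i · Π_k exp(−(x̂_{i,k} − m_k)² / (2 b²))."""
    return sum(
        w * math.prod(math.exp(-0.5 * ((xk - mk) / b) ** 2)
                      for xk, mk in zip(x, m))
        for w, x in zip(weights, locations)
    )

# Single Dirac at the kernel centre: kernel value is 1
lcd_dirac_mixture([1.0], [[0.0, 0.0]], [0.0, 0.0], 1.0)   # 1.0
```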

Properties of LCD:

  • Symmetric
  • Unique
  • Multivariate

Examples

  • LCD (Rectangular Kernel)

  • LCD (Gaussian Kernel)


Generalized Cramér–von Mises Distance (GCvD)

Given:

  • LCD of given continuous density $\tilde{F}(\underline{m}, b)$
  • LCD of Dirac mixture $F(\underline{m}, b)$

Definition:

$$ D=\int_{\mathbb{R}_{+}} w(b) \int_{\mathbb{R}^{N}}(\tilde{F}(\underline{m}, b)-F(\underline{m}, b))^{2} \mathrm{~d} \underline{m} \mathrm{~d} b $$

Minimization of GCvD:

  • For many Dirac components → high-dimensional optimization problem
  • Gradient available, Hessian more difficult
  • Use Quasi-Newton method: L-BFGS

Projected Cumulative Distributions (PCD)

Options for Reduction to Univariate Case

Reapproximation methods for the univariate case are readily available. How can we use univariate methods in the multivariate case?

Solution

  • Approximation on principal axis of PDF

    • Limited to densities where principal axis can be defined
    • Examples: Gaussian PDF, Bingham PDF on sphere
    • Does not cover the entire density 🤪
  • Cartesian product of 1D approximation 👎

    • Curse of dimensionality (as very similar to grid)
    • Only for product densities (or rotations thereof)
    • Inefficient coverage
  • Representing PDFs by all one-dimensional projections (a.k.a. the Radon transform) 👍

    1. Represent the two densities $\tilde{f}(\underline{x})$ and $f(\underline{x})$ by infinite set of one-dimensional projections

      • Projections onto all unit vectors $\underline{u} \in \mathbb{S}^{N-1}$

      • We obtain sets of projections $\tilde{f}(r \mid \underline{u})$ and $f(r \mid \underline{u})$, where $r$ is the coordinate along the unit vector

      • These are the Radon transforms of $\tilde{f}(\underline{x})$ and $f(\underline{x})$

    2. We compare the sets of projections $\tilde{f}(r \mid \underline{u})$ and $f(r \mid \underline{u})$ for every $\underline{u} \in \mathbb{S}^{N-1}$

      • For comparison, we use the univariate cumulative distribution functions $\tilde{F}(r \mid \underline{u})$ and $F(r \mid \underline{u})$

      • These are unique, well defined, and easy to calculate 👏

      • Resulting distance measures

        $$ D_{1}(\underline{u})=D(\tilde{f}(r \mid \underline{u}), f(r \mid \underline{u})) $$

        depend on the projection vector $\underline{u}$

    3. We integrate these one-dimensional distance measures $D_1(\underline{u})$ over all unit vectors $\underline{u} \in \mathbb{S}^{N-1}$

      • This gives multivariate distance measure $D(\tilde{f}(\underline{x}), f(\underline{x}))$
      • Typically a discretized subset of $\underline{u} \in \mathbb{S}^{N-1}$ is used
      • Distance measure minimized via univariate Newton updates

Radon Transform

  • Represent general $N$-dimensional probability density functions via the set of all one-dimensional projections

  • Linear projection of random vector $\underline{\boldsymbol{x}} \in \mathbb{R}^{N}$ to scalar random variable $\boldsymbol{r} \in \mathbb{R}$ onto line described by unit vector $\underline{u} \in \mathbb{S}^{N-1}$

    $$ \boldsymbol{r} = \underline{u}^\top \underline{\boldsymbol{x}} $$
  • Given probability density function $f(\underline{x})$ of random vector $\underline{\boldsymbol{x}}$, density $f_r(r \mid \underline{u})$ of $\boldsymbol{r}$ is given by

    $$ f_{r}(r \mid \underline{u})=\int_{\mathbb{R}^{N}} f(\underline{t}) \delta\left(r-\underline{u}^{\top} \underline{t}\right) \mathrm{d} \underline{t} $$
  • $f_r(r \mid \underline{u})$ is Radon transform of $f(\underline{x})$ for all $\underline{u} \in \mathbb{S}^{N-1}$

Visualization: projecting the density onto the line spanned by the unit vector $\underline{u}$ (i.e., integrating over all directions orthogonal to $\underline{u}$) yields the one-dimensional projected density $f_r(r \mid \underline{u})$.

Dirac Mixture Densities

$$ f(\underline{x} \mid \hat{\mathbf{X}})=\sum_{i=1}^{L} w_{i} \delta\left(\underline{x}-\underline{\hat{x}}_{i}\right) $$

with

$$ \hat{\mathbf{X}}=\left[\underline{\hat{x}}_{1}, \underline{\hat{x}}_{2}, \ldots, \underline{\hat{x}}_{L}\right] $$

Radon transform is given by

$$ f_{r}(r \mid \underline{\hat{r}}, \underline{u})=\sum_{i=1}^{L} w_{i} \delta\left(r-\underline{u}^{\top} \underline{\hat{x}}_{i}\right)=\sum_{i=1}^{L} w_{i} \delta\left(r-\hat{r}_{i}(\underline{u})\right) $$
  • $\hat{r}_{i}(\underline{u})=\underline{u}^{\top} \underline{\hat{x}}_{i},\; i=1, \ldots, L$ are the projected Dirac locations

Collect projected Dirac locations $\hat{r}_{i}(\underline{u})$ in vector

$$ \underline{\hat{r}}=\left[\hat{r}_{1}(\underline{u}), \hat{r}_{2}(\underline{u}), \ldots, \hat{r}_{L}(\underline{u})\right]^{\top} $$
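Computing the projected Dirac locations $\hat{r}_i(\underline{u})$ is one dot product per component; a minimal sketch (the function name is illustrative):

```python
def project_locations(locations, u):
    """Projected Dirac locations r̂_i(u) = uᵀ x̂_i."""
    return [sum(uk * xk for uk, xk in zip(u, x)) for x in locations]

project_locations([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [1.0, 0.0])
# → [1.0, 0.0, 1.0]
```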

Gaussian Densities

  • For a Gaussian density $f(\underline{x})$ with mean vector $\underline{\hat{x}}$ and covariance matrix $\mathbf{C}_x$, the density $f_r(r \mid \underline{u})$ resulting from the projection is also Gaussian

  • Its mean $\hat{r}(\underline{u})$ can simply be calculated by taking the expected value

    $$ \hat{r}(\underline{u})=\mathrm{E}\{\boldsymbol{r}(\underline{u})\}=\mathrm{E}\left\{\underline{u}^{\top} \underline{\boldsymbol{x}}\right\}=\underline{u}^{\top} \underline{\hat{x}} $$
  • Its variance $\sigma_{r}^{2}(\underline{u})$ is given by

    $$ \sigma_{r}^{2}(\underline{u})=\mathrm{E}\left\{(\boldsymbol{r}(\underline{u})-\hat{r}(\underline{u}))^{2}\right\}=\mathrm{E}\left\{\left(\underline{u}^{\top} \boldsymbol{x}-\underline{u}^{\top} \underline{\hat{x}}\right)^{2}\right\}=\mathrm{E}\left\{\underline{u}^{\top}(\boldsymbol{x}-\underline{\hat{x}})(\boldsymbol{x}-\underline{\hat{x}})^{\top} \underline{u}\right\}=\underline{u}^{\top} \mathbf{C}_{x} \underline{u} $$

Gaussian Mixture Densities

  • For $N$-dimensional Gaussian mixture densities $f(\underline{x})$ of the form

    $$ f(\underline{x})=\sum_{i=1}^{M} w_{i} \frac{1}{\sqrt{(2 \pi)^{N}\left|\mathbf{C}_{x, i}\right|}} \exp \left(-\frac{1}{2}\left(\underline{x}-\underline{\hat{x}}_{i}\right)^{\top} \mathbf{C}_{x, i}^{-1}\left(\underline{x}-\underline{\hat{x}}_{i}\right)\right) $$

    the density $f_r(r \mid \underline{u})$ is also a Gaussian mixture

  • Due to the linearity of the projection operator, it is given by

    $$ f_{r}(r \mid \underline{u})=\sum_{i=1}^{M} w_{i} \frac{1}{\sqrt{2 \pi} \sigma_{r, i}(\underline{u})} \exp \left(-\frac{1}{2} \frac{\left(r-\hat{r}_{i}(\underline{u})\right)^{2}}{\sigma_{r, i}^{2}(\underline{u})}\right) $$

    with

    $$ \hat{r}_{i}(\underline{u})=\underline{u}^{\top} \underline{\hat{x}}_{i} $$

    and

    $$ \sigma_{r, i}(\underline{u})=\sqrt{\underline{u}^{\top} \mathbf{C}_{x, i} \underline{u}} $$
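Computing the projected mixture parameters is a direct transcription of these formulas; a sketch (the function name is illustrative), with covariance matrices given as nested lists:

```python
import math

def project_gaussian_mixture(weights, means, covs, u):
    """Map each component (w_i, x̂_i, C_i) to its 1D projection
    (w_i, r̂_i = uᵀ x̂_i, σ_{r,i} = sqrt(uᵀ C_i u))."""
    N = len(u)
    out = []
    for w, mean, C in zip(weights, means, covs):
        r_hat = sum(u[k] * mean[k] for k in range(N))
        var = sum(u[j] * C[j][k] * u[k] for j in range(N) for k in range(N))
        out.append((w, r_hat, math.sqrt(var)))
    return out

# Single 2D component, identity covariance, projected onto the x-axis:
project_gaussian_mixture([1.0], [[2.0, 3.0]], [[[1.0, 0.0], [0.0, 1.0]]],
                         [1.0, 0.0])
# → [(1.0, 2.0, 1.0)]
```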

Multivariate Cramér-von Mises Distance

Multivariate distance measure between two continuous and/or discrete probability density functions

  1. One-dimensional Projections via Radon Transform

    Given density $\tilde{f}(\underline{x})$ and its approximation $f(\underline{x})$, represented by their Radon transforms $\tilde{f}(r \mid \underline{u})$ and $f(r \mid \underline{u})$ (i.e., by their 1D projections onto unit vectors $\underline{u} \in \mathbb{S}^{N-1}$)

  2. One-dimensional Cumulative Distributions

    Based on Radon transform $\tilde{f}(r \mid \underline{u})$, calculate one-dimensional cumulative distributions of the projected densities as

    $$ \tilde{F}(r \mid \underline{u})=\int_{-\infty}^{r} \tilde{f}(t \mid \underline{u}) \mathrm{d} t $$

    and similarly for $F(r \mid \underline{u})$

    • Example: For Dirac mixture approximation, cumulative distribution function of its Radon transform is given by

      $$ F(r \mid \underline{\hat{r}}, \underline{u})=\sum_{i=1}^{L} w_{i} \mathrm{H}\left(r-\hat{r}_{i}(\underline{u})\right) $$
  3. One-dimensional Distance

    • For comparing the one-dimensional projections, we compare their cumulative distributions $\tilde{F}(r \mid \underline{u})$ and $F(r \mid \underline{\hat{r}}, \underline{u})$ for all $\underline{u} \in \mathbb{S}^{N-1}$

    • As distance measure use integral squared distance

      $$ D_{1}(\underline{\hat{r}}, \underline{u})=\int_{\mathbb{R}}[\tilde{F}(r \mid \underline{u})-F(r \mid \underline{\hat{r}}, \underline{u})]^{2} \mathrm{~d} r $$
    • Gives distance between the projected densities in the direction of the unit vector $\underline{u}$ for all $\underline{u}$

  4. One-dimensional Newton Step

    • Newton step can now be written as
    $$ \Delta \underline{\hat{r}}(\underline{\hat{r}}, \underline{u})=-\mathbf{H}^{-1}(\underline{\hat{r}}, \underline{u}) \underline{G}(\underline{\hat{r}}, \underline{u}) $$

    with

    $$ \begin{aligned} \underline{G}(\underline{\hat{r}}, \underline{u})&=\left[\frac{\partial D_{1}(\underline{\hat{r}}, \underline{u})}{\partial \hat{r}_{1}}, \frac{\partial D_{1}(\underline{\hat{r}}, \underline{u})}{\partial \hat{r}_{2}}, \ldots, \frac{\partial D_{1}(\underline{\hat{r}}, \underline{u})}{\partial \hat{r}_{L}}\right]^{\top} \\ \frac{\partial D_{1}(\underline{\hat{r}}, \underline{u})}{\partial \hat{r}_{i}}&=2 w_{i}\left[\tilde{F}\left(\hat{r}_{i} \mid \underline{u}\right)-F\left(\hat{r}_{i} \mid \underline{\hat{r}}, \underline{u}\right)\right] \end{aligned} $$
    • Hessian $\mathbf{H}(\underline{\hat{r}}, \underline{u})$ is given by

      $$ \mathbf{H}(\underline{\hat{r}}, \underline{u})=2 \operatorname{diag}\left(\left[w_{1} \tilde{f}\left(\hat{r}_{1} \mid \underline{u}\right), w_{2} \tilde{f}\left(\hat{r}_{2} \mid \underline{u}\right), \ldots, w_{L} \tilde{f}\left(\hat{r}_{L} \mid \underline{u}\right)\right]\right) $$
    • $\Rightarrow$ Resulting Newton step

      $$ \Delta \underline{\hat{r}}(\underline{\hat{r}}, \underline{u})=-\left[\frac{\tilde{F}\left(\hat{r}_{1} \mid \underline{u}\right)-F\left(\hat{r}_{1} \mid \underline{\hat{r}}, \underline{u}\right)}{\tilde{f}\left(\hat{r}_{1} \mid \underline{u}\right)}, \ldots, \frac{\tilde{F}\left(\hat{r}_{L} \mid \underline{u}\right)-F\left(\hat{r}_{L} \mid \underline{\hat{r}}, \underline{u}\right)}{\tilde{f}\left(\hat{r}_{L} \mid \underline{u}\right)}\right]^{\top} $$
  5. Backprojection to $N$-dimensional space

    • For specific projection vector $\underline{u}$: Newton update $\Delta \underline{\hat{r}}(\underline{\hat{r}}, \underline{u})$

    • Backprojection into original $N$-dimensional space: Update can be used to modify original Dirac locations in direction along the vector $\underline{u}$

    • For every location vector $\underline{\hat{x}}_i$ we obtain

      $$ \Delta \underline{\hat{x}}_{i}(\underline{u})=\Delta \hat{r}_{i}(\underline{\hat{r}}, \underline{u}) \cdot \underline{u} $$
  6. Assemble Multivariate Distance

    • Individual 1D distances $D_1(\underline{\hat{r}}, \underline{u})$ can be assembled to form a multivariate distance measure

    • Performed by integrating over all 1D distances depending on unit vector $\underline{u}$

      $$ D_{N}(\hat{\mathbf{X}})=\int_{\mathbb{S}^{N-1}} D_{1}(\underline{\hat{r}}, \underline{u}) \mathrm{d} \underline{u} $$
    • Plugging in $D_1(\underline{\hat{r}}, \underline{u})$:

      $$ D_{N}(\hat{\mathbf{X}})=\int_{\mathbb{S}^{N-1}} \int_{\mathbb{R}}[\tilde{F}(r \mid \underline{u})-F(r \mid \underline{\hat{r}}, \underline{u})]^{2} \mathrm{~d} r \mathrm{~d} \underline{u} \quad \text { with } r=\underline{u}^{\top} \underline{x} $$
  7. Perform Full Newton Update

    Full Newton update by integrating over all partial updates along projection vectors $\underline{u}$

    $$ \Delta \underline{\hat{x}}_{i}=\int_{\mathbb{S}^{N-1}} \Delta \underline{\hat{x}}_{i}(\underline{u}) \mathrm{d} \underline{u} $$

Discretization of Space of Unit Vectors

  • In practice, the space $\mathbb{S}^{N-1}$ containing the unit vectors $\underline{u}$ has to be discretized

  • Two options are available for performing the discretization:

    • Deterministic discretization, e.g., by calculating a grid
    • Random discretization by drawing uniform samples from the hypersphere
  • For both cases: Consider $K$ samples $\underline{u}_k$

  • Integration reduces to summation

    $$ \Delta \underline{\hat{x}}_{i} \approx \frac{1}{K} \sum_{k=1}^{K} \Delta \underline{\hat{x}}_{i}\left(\underline{u}_{k}\right) \quad \text { for } i=1,2, \ldots, L $$
  • Stopping criterion

    • Given initial locations of the Dirac components

    • Full Newton updates are performed until the maximum change over all location vectors

      $$ \max _{i}\left\|\Delta \underline{\hat{x}}_{i}\right\| $$

      falls below a given threshold

Complete Algorithm (Randomized Variant)

The complete randomized algorithm draws $K$ random unit vectors per sweep, performs the univariate Newton update on the projected Dirac locations for each, backprojects and averages the updates, and repeats until the maximum location change falls below the threshold.
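The randomized variant can be sketched end to end. The following is a simplified illustration, not the exact algorithm from the slides: it assumes equal weights, takes a standard normal target (so every projection is again $\mathcal{N}(0,1)$), uses random unit vectors for the discretization, runs a fixed number of sweeps instead of the stopping criterion, and clips the Newton step for numerical robustness (an added safeguard, not part of the derivation); the function name is illustrative:

```python
import math
import random
from statistics import NormalDist

def pcd_approximate_std_normal(L, dim=2, K=30, sweeps=150, seed=0):
    """Dirac mixture approximation of a standard normal N(0, I) in
    `dim` dimensions via projected cumulative distributions (PCD)."""
    rng = random.Random(seed)
    std = NormalDist()  # every projection of N(0, I) is again N(0, 1)
    # Initialize locations with random draws from the target
    X = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(L)]
    for _ in range(sweeps):
        delta = [[0.0] * dim for _ in range(L)]
        for _ in range(K):
            # Random unit vector u, uniform on the hypersphere S^{dim-1}
            u = [rng.gauss(0.0, 1.0) for _ in range(dim)]
            norm = math.sqrt(sum(c * c for c in u))
            u = [c / norm for c in u]
            # Projected Dirac locations r̂_i = uᵀ x̂_i
            r = [sum(uk * xk for uk, xk in zip(u, x)) for x in X]
            # Sorted, equal weights: F(r̂_i | r̂, u) = (2·(rank+1) − 1)/(2L),
            # with rank 0-based
            for rank, i in enumerate(sorted(range(L), key=lambda j: r[j])):
                dr = -(std.cdf(r[i]) - (2 * rank + 1) / (2 * L)) / std.pdf(r[i])
                dr = max(-0.5, min(0.5, dr))  # damping: added safeguard
                # Backprojection along u, averaged over the K directions
                for k in range(dim):
                    delta[i][k] += dr * u[k] / K
        X = [[xk + dk for xk, dk in zip(x, d)] for x, d in zip(X, delta)]
    return X
```

Larger `L` and `K` tighten the approximation; in practice the loop would terminate via the stopping criterion above instead of a fixed sweep count.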