Boltzmann Machine

  • Stochastic recurrent neural network
  • Introduced by Hinton and Sejnowski
  • Learns internal representations
  • Problem: unconstrained connectivity

Representation

The model can be represented as a graph.


States

Types:

  • Visible states
    • Represent observed data
    • Can be input/output data
  • Hidden states
    • Latent variables we want to learn
  • Bias states
    • Always set to 1, to encode the bias

Binary states

  • Unit values: $s_i \in \{0, 1\}$

Stochastic

  • Whether a state is active or not is decided stochastically

  • The decision depends on the total input (see the sketch after this list):

    $$z_{i}=b_{i}+\sum_{j} s_{j} w_{i j}$$
    • $b_i$: bias
    • $s_j$: state $j$
    • $w_{ij}$: weight between state $j$ and state $i$

    $$p\left(s_{i}=1\right)=\frac{1}{1+e^{-z_{i}}}$$
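A minimal NumPy sketch of this stochastic activation rule (function and variable names here are illustrative, not from the source):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sample_unit(i, s, W, b, rng):
    """Stochastically set s_i: z_i = b_i + sum_j s_j * w_ij,
    then s_i = 1 with probability sigmoid(z_i)."""
    z_i = b[i] + s @ W[i]
    return int(rng.random() < sigmoid(z_i))

# Tiny usage example with a hand-picked weight matrix
rng = np.random.default_rng(0)
W = np.array([[0.0, 1.0], [1.0, 0.0]])   # symmetric, zero diagonal
b = np.array([0.0, -0.5])
s = np.array([1, 0])
s[1] = sample_unit(1, s, W, b, rng)
```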

Connections

  • Graph can be fully connected (no restrictions)

  • Undirected (see the sketch after this list):

    $$w_{ij} = w_{ji}$$

  • No self-connections:

    $$w_{ii} = 0$$
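A quick sketch of enforcing these connectivity constraints on a randomly initialized weight matrix (sizes and names are my choosing):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
W = rng.normal(scale=0.1, size=(n, n))
W = (W + W.T) / 2.0          # undirected: enforce w_ij == w_ji
np.fill_diagonal(W, 0.0)     # no self-connections: w_ii == 0

assert np.allclose(W, W.T) and not np.any(np.diag(W))
```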

Energy

Energy of the network

$$\begin{aligned} E &= -S^{\mathrm{T}} W S - b^{\mathrm{T}} S \\ &= -\sum_{i<j} w_{ij} s_{i} s_{j} - \sum_{i} b_{i} s_{i} \end{aligned}$$

Probability of input vector $v$:

$$p(v)= \frac{e^{-E(v)}}{\displaystyle \sum_{u} e^{-E(u)}}$$
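A brute-force sketch of the energy and of $p(v)$; enumerating all $2^n$ state vectors for the normalizing sum is only feasible for toy sizes (names are illustrative):

```python
import numpy as np
from itertools import product

def energy(s, W, b):
    """E(s) = -s^T W s - b^T s, as in the formula above."""
    return -(s @ W @ s) - (b @ s)

def probability(v, W, b):
    """p(v) = exp(-E(v)) / sum_u exp(-E(u)); the partition sum
    enumerates every binary state vector u."""
    states = [np.array(u) for u in product([0, 1], repeat=len(v))]
    Z = sum(np.exp(-energy(u, W, b)) for u in states)
    return np.exp(-energy(np.array(v), W, b)) / Z

W = np.array([[0.0, 0.5], [0.5, 0.0]])
b = np.zeros(2)
print(probability([1, 1], W, b))  # lower-energy states get higher probability
```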

Updating the nodes

  • Updating decreases the energy of the network on average

  • The network reaches a local minimum (equilibrium)

  • The stochastic process helps avoid getting stuck in local minima (see the sketch after this list)

    $$\begin{array}{c} p\left(s_{i}=1\right)=\frac{1}{1+e^{-z_{i}}} \\ z_{i}=\Delta E_{i}=E_{s_{i}=0}-E_{s_{i}=1} \end{array}$$
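A sketch of the settling process, repeatedly updating units in random order (the sweep count and all names are assumptions):

```python
import numpy as np

def settle(s, W, b, rng, sweeps=100):
    """Run stochastic updates; each uses p(s_i = 1) = sigmoid(z_i)
    with z_i = Delta E_i, so the energy decreases on average."""
    for _ in range(sweeps):
        for i in rng.permutation(len(s)):
            z_i = b[i] + s @ W[i]            # energy gap Delta E_i
            s[i] = int(rng.random() < 1.0 / (1.0 + np.exp(-z_i)))
    return s
```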

Simulated Annealing


Use a temperature $T$ to allow for more changes in the beginning (see the sketch after this list):

  • Start with a high temperature

  • β€œAnneal” by slowly lowering $T$

  • Can escape from local minima πŸ‘
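A sketch of simulated annealing, assuming the temperature enters by scaling the input as $z_i / T$ (a common convention) and using a geometric cooling schedule of my choosing:

```python
import numpy as np

def anneal(s, W, b, rng, T_start=10.0, T_end=0.1, steps=200):
    """High T makes updates nearly random (can escape minima);
    low T makes them nearly deterministic (settles into a minimum)."""
    for T in np.geomspace(T_start, T_end, steps):
        for i in rng.permutation(len(s)):
            z = (b[i] + s @ W[i]) / T
            s[i] = int(rng.random() < 1.0 / (1.0 + np.exp(-z)))
    return s
```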

Search Problem

  • Input is set and fixed (clamped)

  • Annealing is performed (see the sketch after this list)

  • The answer is presented at the output

  • Hidden units add extra representational power
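A sketch of the search procedure, assuming a boolean mask marks which units are clamped (all names are illustrative):

```python
import numpy as np

def search(s, clamped, W, b, rng, T_start=10.0, T_end=0.1, steps=200):
    """Anneal while holding the clamped (input) units fixed; afterwards
    the answer is read off the free output units of s.

    `clamped` is a boolean array of the same length as s."""
    free = np.flatnonzero(~clamped)
    for T in np.geomspace(T_start, T_end, steps):
        for i in rng.permutation(free):
            z = (b[i] + s @ W[i]) / T
            s[i] = int(rng.random() < 1.0 / (1.0 + np.exp(-z)))
    return s
```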

Learning Problem

  • Situation

    • Data vectors are presented to the network
  • Problem

    • Learn weights that generate these data with high probability
  • Approach

    • Perform small updates on the weights (see the sketch after this list)
    • Each time, solve the search problem
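The note does not spell out the weight updates; the classical rule for Boltzmann machines (due to Hinton and Sejnowski) compares pairwise correlations in a clamped phase against a free-running phase:

$$\Delta w_{ij} = \eta \left( \langle s_i s_j \rangle_{\text{data}} - \langle s_i s_j \rangle_{\text{model}} \right)$$

Here $\eta$ is a learning rate, $\langle \cdot \rangle_{\text{data}}$ is measured with the visible units clamped to training vectors, and $\langle \cdot \rangle_{\text{model}}$ with the network running freely at equilibrium.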

Pros & Cons

βœ… Pros

  • A Boltzmann machine with enough hidden units can compute any function

⛔️ Cons

  • Training is very slow and computationally expensive 😒

Restricted Boltzmann Machine

  • A Boltzmann machine with an additional restriction

  • Graph must be bipartite

    • Set of visible units

    • Set of hidden units

  • βœ… Advantage

    • No connections between hidden units
    • Efficient training

Energy


Energy:

$$\begin{aligned} E(v, h) &= -a^{\mathrm{T}} v-b^{\mathrm{T}} h-v^{\mathrm{T}} W h \\ &= -\sum_{i} a_{i} v_{i}-\sum_{j} b_{j} h_{j}-\sum_{i} \sum_{j} v_{i} w_{i j} h_{j} \end{aligned}$$

Probability of hidden unit:

$$p\left(h_{j}=1 \mid v\right)=\sigma\left(b_{j}+\sum_{i=1}^{m} W_{i j} v_{i}\right)$$

Probability of a visible unit:

$$p\left(v_{i}=1 \mid h\right)=\sigma\left(a_{i}+\sum_{j=1}^{F} W_{i j} h_{j}\right)$$

> $\sigma(x)=\frac{1}{1+e^{-x}}$
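Because the graph is bipartite, each layer is conditionally independent given the other, so a whole layer can be sampled in one vectorized step. A NumPy sketch of the two conditionals above (shapes and names are my assumptions: $W$ is $m \times F$, $a$ the visible bias, $b$ the hidden bias):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sample_h_given_v(v, W, b, rng):
    """p(h_j = 1 | v) = sigmoid(b_j + sum_i W_ij v_i), for all j at once."""
    p = sigmoid(b + v @ W)                     # shape (F,)
    return (rng.random(p.shape) < p).astype(float)

def sample_v_given_h(h, W, a, rng):
    """p(v_i = 1 | h) = sigmoid(a_i + sum_j W_ij h_j), for all i at once."""
    p = sigmoid(a + W @ h)                     # shape (m,)
    return (rng.random(p.shape) < p).astype(float)

# One Gibbs step v -> h -> v', the building block of efficient RBM training
rng = np.random.default_rng(0)
m, F = 6, 3
W = rng.normal(scale=0.1, size=(m, F))
a, b = np.zeros(m), np.zeros(F)
v = rng.integers(0, 2, size=m).astype(float)
h = sample_h_given_v(v, W, b, rng)
v_recon = sample_v_given_h(h, W, a, rng)
```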

Free Energy:

$$\begin{array}{l} e^{-F(v)}=\displaystyle\sum_{h} e^{-E(v, h)} \\ F(v)=-\displaystyle\sum_{i=1}^{m} v_{i} a_{i}-\sum_{j=1}^{F} \log \left(1+e^{z_{j}}\right) \\ z_{j}=b_{j}+\displaystyle\sum_{i=1}^{m} W_{i j} v_{i} \end{array}$$
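A small sketch of the free-energy computation, reusing the names from the previous sketch and using `logaddexp` for numerical stability:

```python
import numpy as np

def free_energy(v, W, a, b):
    """F(v) = -sum_i v_i a_i - sum_j log(1 + exp(z_j)),
    with z_j = b_j + sum_i W_ij v_i."""
    z = b + v @ W                                    # shape (F,)
    return -(v @ a) - np.sum(np.logaddexp(0.0, z))   # log(1 + e^z), stable
```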