Gesture Recognition

Introduction

Gesture

  • a movement usually of the body or limbs that expresses or emphasizes an idea, sentiment, or attitude
  • the use of motions of the limbs or body as a means of expression

Automatic Gesture Recognition

  • A gesture recognition system generates a semantic description for certain body motions
  • Gesture recognition exploits the power of non-verbal communication, which is very common in human-human interaction
  • Gesture recognition is often built on top of a human motion tracker

Applications

  • Multimodal Interaction
    • Gestures + Speech recognition
    • Gestures + gaze
    • Human-Robot Interaction
    • Interaction with Smart Environments
  • Understanding Human Interaction

Types of Gestures

  • Hand & arm gestures

    • Pointing Gestures
    • Sign Language
  • Head gestures

    • Nodding, head shaking, turning, pointing
  • Body gestures

[Figure: types of gestures]

Automatic Gesture Recognition

[Figure: overview of an automatic gesture recognition system]

  • Feature Acquisition
    • Appearance-based: markers, color, motion, shape, segmentation, stereo, local descriptors, space-time interest points, …
    • Model-based: body or hand models
  • Classifiers
    • SVM, ANN, HMMs, AdaBoost, decision trees, deep learning, …

Hidden Markov Models (HMMs) for Gesture Recognition

[Figure: hidden Markov model]

  • “Hidden”: we see only the observations and must draw conclusions WITHOUT knowing the sequence of states that produced them

  • Markov assumption (1st order): the next state depends ONLY on the current state (not on the complete state history)

A Hidden Markov Model is a five-tuple

$$(S, \pi, \mathbf{A}, B, V)$$

  • $S = \{s_1, s_2, \dots, s_n\}$: set of states
  • $\pi$: the initial probability distribution
    • $\pi(s_i)$: probability of $s_i$ being the first state of a state sequence
  • $\mathbf{A} = (a_{ij})$: the matrix of state transition probabilities
    • $a_{ij}$: probability of state $s_j$ following $s_i$
  • $B = \{b_1, b_2, \dots, b_n\}$: the set of emission probability distributions/densities
    • $b_i(x)$: probability of observing $x$ when the system is in state $s_i$
  • $V$: the observable feature space
    • Can be discrete ($V = \{x_1, x_2, \dots, x_v\}$) or continuous ($V = \mathbb{R}^d$)

Properties of HMMs

  • For the initial probabilities:

    $$\sum_i \pi(s_i) = 1$$
    • Often simplified by $\pi(s_1) = 1$ and $\pi(s_i) = 0$ for $i > 1$
  • For the state transition probabilities:

    $$\forall i: \sum_j a_{ij} = 1$$
    • Often $a_{ij} = 0$ for most $j$, i.e. each state can only be followed by a few states
  • When $V = \{x_1, x_2, \dots, x_v\}$, the $b_i$ are discrete probability distributions and the HMM is called a discrete HMM

  • When $V = \mathbb{R}^d$, the $b_i$ are continuous probability density functions and the HMM is called a continuous (density) HMM
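The five-tuple and the constraints above can be written down directly. The following is a minimal Python sketch of a discrete HMM, assuming states and the symbols of $V$ are represented by their indices; the class `DiscreteHMM` and its `problems` method are hypothetical names, not part of any library. The toy parameters use both simplifications mentioned above (always start in $s_1$, most $a_{ij}$ zero).

```python
import math
from dataclasses import dataclass


@dataclass
class DiscreteHMM:
    """Discrete HMM (S, pi, A, B, V): states are 0..n-1, symbols of V are 0..v-1."""
    pi: list[float]       # pi[i]   = probability that s_i is the first state
    A: list[list[float]]  # A[i][j] = a_ij, probability of s_j following s_i
    B: list[list[float]]  # B[i][k] = b_i(x_k), probability of observing x_k in s_i

    def problems(self, tol: float = 1e-9) -> list[str]:
        """Return the violated constraints (an empty list means the model is valid)."""

        def is_distribution(row: list[float], size: int) -> bool:
            return (len(row) == size and all(p >= 0 for p in row)
                    and math.isclose(sum(row), 1.0, abs_tol=tol))

        n = len(self.pi)
        if len(self.A) != n or len(self.B) != n:
            return ["pi, A and B must describe the same number of states"]
        found = []
        if not is_distribution(self.pi, n):
            found.append("sum_i pi(s_i) must be 1")
        for i, row in enumerate(self.A):
            if not is_distribution(row, n):
                found.append(f"row {i} of A must sum to 1 over all {n} states")
        v = len(self.B[0]) if n else 0
        for i, row in enumerate(self.B):
            if not is_distribution(row, v):
                found.append(f"b_{i} must sum to 1 over all {v} symbols")
        return found


# Hypothetical 3-state, 2-symbol model: start in s_1, sparse transitions.
hmm = DiscreteHMM(
    pi=[1.0, 0.0, 0.0],
    A=[[0.6, 0.4, 0.0],
       [0.0, 0.7, 0.3],
       [0.0, 0.0, 1.0]],
    B=[[0.8, 0.2],
       [0.3, 0.7],
       [0.5, 0.5]],
)
print(hmm.problems())  # []
```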

HMM Topologies

[Figure: HMM topologies]
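The topologies in the original figure are not recoverable here. As an illustration, two commonly used topologies can be expressed purely through the zero pattern of $\mathbf{A}$: an ergodic (fully connected) model, and a left-to-right (Bakis) model in which a state can only loop or move forward. The sketch below builds both; the function names, the hypothetical `max_jump` parameter and the uniform values are assumptions (such values would normally only be starting points for training).

```python
def ergodic(n: int) -> list[list[float]]:
    """Fully connected topology: every a_ij > 0 (here uniform)."""
    return [[1.0 / n] * n for _ in range(n)]


def left_to_right(n: int, max_jump: int = 1) -> list[list[float]]:
    """Left-to-right topology: a_ij = 0 unless i <= j <= i + max_jump.

    Allowed transitions of each row get equal probability;
    the last state can only loop on itself.
    """
    A = []
    for i in range(n):
        allowed = range(i, min(i + max_jump, n - 1) + 1)
        row = [0.0] * n
        for j in allowed:
            row[j] = 1.0 / len(allowed)
        A.append(row)
    return A


for row in left_to_right(4):
    print(row)
# [0.5, 0.5, 0.0, 0.0]
# [0.0, 0.5, 0.5, 0.0]
# [0.0, 0.0, 0.5, 0.5]
# [0.0, 0.0, 0.0, 1.0]
```

With `max_jump=2` the same function yields a left-to-right model that may also skip one state.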

The Observation Model

Most popular: Gaussian mixture models

$$P(x_t \mid s_j) = \sum_{k=1}^{n_j} c_{jk} \cdot \frac{1}{\sqrt{(2\pi)^{d} \left|\Sigma_{jk}\right|}} \exp\left(-\frac{1}{2} (x_t - \mu_{jk})^{\mathrm{T}} \Sigma_{jk}^{-1} (x_t - \mu_{jk})\right)$$

  • $n_j$: number of Gaussians (in state $j$)
  • $c_{jk}$: mixture weight of the $k$-th Gaussian (in state $j$), with $\sum_k c_{jk} = 1$
  • $\mu_{jk}$: mean of the $k$-th Gaussian (in state $j$)
  • $\Sigma_{jk}$: covariance matrix of the $k$-th Gaussian (in state $j$)
  • $d$: dimension of the feature vector, $x_t \in \mathbb{R}^d$
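When the observation model is evaluated numerically, it is usual to work with $\log P(x_t \mid s_j)$ and to combine the mixture components with the log-sum-exp trick. The sketch below assumes diagonal covariance matrices, a common simplification that makes $|\Sigma_{jk}|$ and $\Sigma_{jk}^{-1}$ trivial to compute; the function name and argument layout are hypothetical, and every mixture weight must be positive.

```python
import math


def gmm_log_density(x, weights, means, variances):
    """log P(x | s_j) for one state s_j modeled as a GMM with diagonal covariances.

    weights[k]   = c_jk (positive, summing to 1)
    means[k]     = mu_jk, a length-d list
    variances[k] = diagonal of Sigma_jk, a length-d list of positive numbers
    """
    d = len(x)
    log_terms = []
    for c, mu, var in zip(weights, means, variances):
        # log(c_jk * N(x; mu_jk, Sigma_jk)) with Sigma_jk = diag(var)
        log_det = sum(math.log(v) for v in var)
        mahalanobis = sum((xi - mi) ** 2 / v for xi, mi, v in zip(x, mu, var))
        log_terms.append(math.log(c) - 0.5 * (d * math.log(2 * math.pi) + log_det + mahalanobis))
    # log-sum-exp over the components avoids underflow of tiny densities
    m = max(log_terms)
    return m + math.log(sum(math.exp(t - m) for t in log_terms))


# Hypothetical 2-D state model with two Gaussians
x_t = [0.2, -0.1]
print(gmm_log_density(x_t, [0.7, 0.3], [[0.0, 0.0], [1.0, 1.0]], [[1.0, 1.0], [0.5, 0.5]]))
```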

Three Main Tasks with HMMs

Given an HMM $\lambda$ and an observation sequence $x_1, x_2, \dots, x_T$:

  • The evaluation problem

    compute the probability of the observation sequence, $p(x_1, x_2, \dots, x_T \mid \lambda)$

    → “Forward Algorithm”

  • The decoding problem

    compute the most likely state sequence $s_{q_1}, s_{q_2}, \dots, s_{q_T}$, i.e.

    $$\operatorname{argmax}_{q_1, \ldots, q_T} p\left(q_1, \ldots, q_T \mid x_1, x_2, \ldots, x_T, \lambda\right)$$

    → “Viterbi Algorithm”

  • The learning/optimization problem

    find an HMM $\lambda'$ such that $p(x_1, x_2, \ldots, x_T \mid \lambda') > p(x_1, x_2, \ldots, x_T \mid \lambda)$

    → “Baum-Welch Algorithm”, “Viterbi Learning”
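For a discrete HMM, the evaluation and decoding problems can be sketched in a few lines of plain Python. Both recursions are computed in log space to avoid numerical underflow on long sequences, which is an implementation choice rather than part of the definitions above; the function names and the toy three-state left-to-right model with two observation symbols are hypothetical.

```python
import math

NEG_INF = float("-inf")


def _log(p: float) -> float:
    return math.log(p) if p > 0 else NEG_INF


def _logsumexp(values: list[float]) -> float:
    m = max(values)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(v - m) for v in values))


def forward_log_likelihood(pi, A, B, obs):
    """Evaluation: log p(x_1..x_T | lambda), with alpha_t(j) = p(x_1..x_t, q_t = s_j)."""
    n = len(pi)
    alpha = [_log(pi[j]) + _log(B[j][obs[0]]) for j in range(n)]
    for x in obs[1:]:
        alpha = [_logsumexp([alpha[i] + _log(A[i][j]) for i in range(n)]) + _log(B[j][x])
                 for j in range(n)]
    return _logsumexp(alpha)


def viterbi(pi, A, B, obs):
    """Decoding: most likely state sequence q_1..q_T and its log probability."""
    n = len(pi)
    delta = [_log(pi[j]) + _log(B[j][obs[0]]) for j in range(n)]
    backpointers = []
    for x in obs[1:]:
        new_delta, pointers = [], []
        for j in range(n):
            # best predecessor i for being in s_j at this time step
            best_i = max(range(n), key=lambda i: delta[i] + _log(A[i][j]))
            pointers.append(best_i)
            new_delta.append(delta[best_i] + _log(A[best_i][j]) + _log(B[j][x]))
        delta = new_delta
        backpointers.append(pointers)
    # backtrack from the best final state
    state = max(range(n), key=lambda j: delta[j])
    best_log_prob = delta[state]
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return path[::-1], best_log_prob


# Hypothetical toy model: 3 states, 2 observation symbols (0 and 1)
pi = [1.0, 0.0, 0.0]
A = [[0.6, 0.4, 0.0], [0.0, 0.7, 0.3], [0.0, 0.0, 1.0]]
B = [[0.8, 0.2], [0.3, 0.7], [0.5, 0.5]]
obs = [0, 1, 1]
print(math.exp(forward_log_likelihood(pi, A, B, obs)))  # 0.18176 (up to rounding)
print(viterbi(pi, A, B, obs)[0])                        # [0, 1, 1]
```

The two recursions differ only in how predecessors are combined: the forward algorithm sums over all of them, while Viterbi keeps the best one and remembers it for backtracking.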

Sign Language Recognition

  • American Sign Language (ASL)
    • 6000 gestures describe persons, places and things
    • Each sign has an exact meaning and strong rules of context and grammar
  • Sign recognition
    • HMMs are well suited to the complex, structured hand gestures of ASL

Feature extraction

  • Camera located either in a first-person view (wearable) or in a second-person view (desk-based)
  • Segment hand blobs by a skin color model
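The notes do not say which skin color model is used. As a hedged illustration, a very simple model classifies a pixel as skin if its chrominance falls into a fixed box in YCrCb space (the Cr/Cb range below is a commonly quoted heuristic, not a value from this system), after which skin pixels are grouped into 4-connected blobs. Images are plain nested lists of RGB tuples to keep the sketch dependency-free; `skin_mask` and `hand_blobs` are hypothetical names.

```python
from collections import deque


def skin_mask(image):
    """Mark each (R, G, B) pixel (values 0..255) as skin or non-skin."""
    mask = []
    for row in image:
        mask_row = []
        for r, g, b in row:
            y = 0.299 * r + 0.587 * g + 0.114 * b
            cr = (r - y) * 0.713 + 128
            cb = (b - y) * 0.564 + 128
            # Assumed heuristic skin box in (Cr, Cb); tune for camera and lighting
            mask_row.append(133 <= cr <= 173 and 77 <= cb <= 127)
        mask.append(mask_row)
    return mask


def hand_blobs(mask, min_size=1):
    """Group skin pixels into 4-connected blobs; return (size, centroid) largest first."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    blobs = []
    for sy in range(h):
        for sx in range(w):
            if not mask[sy][sx] or seen[sy][sx]:
                continue
            queue, pixels = deque([(sy, sx)]), []
            seen[sy][sx] = True
            while queue:  # breadth-first flood fill of one blob
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(pixels) >= min_size:
                cy = sum(p[0] for p in pixels) / len(pixels)
                cx = sum(p[1] for p in pixels) / len(pixels)
                blobs.append((len(pixels), (cy, cx)))
    return sorted(blobs, reverse=True)


skin, background = (224, 172, 140), (0, 0, 255)
image = [
    [skin, skin, background, background, background],
    [skin, skin, background, background, skin],
    [background, background, background, background, skin],
]
print(hand_blobs(skin_mask(image)))  # [(4, (0.5, 0.5)), (2, (1.5, 4.0))]
```

Since the face is skin-colored too, the largest blobs are only hand candidates; a model learned from labeled skin pixels (e.g. a Gaussian or histogram model) is usually more robust than a fixed box.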

HMM for American Sign Language

  • Four-State HMM for each word

    [Figure: four-state word HMM]

  • Training

    • Automatic segmentation of sentences into five portions
    • Initial estimates by iterative Viterbi alignment
    • Then Baum-Welch re-estimation
    • No context used
  • Recognition

    • With and without part-of-speech grammar
    • All features / only relative features used
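The idea of one HMM per word can be illustrated with the simplest case, isolated-word recognition: score the observation sequence with every word model and pick the word with the highest likelihood. This is a simplification, since the ASL system described here recognizes whole sentences, which requires decoding over connected word models (optionally with a grammar). The sketch assumes discrete observation symbols; the word models, their emission tables and all names are hypothetical.

```python
import math


def log_forward(pi, A, B, obs):
    """log p(obs | lambda) for a discrete HMM via the forward algorithm in log space."""
    def lg(p):
        return math.log(p) if p > 0 else float("-inf")

    def lse(vals):
        m = max(vals)
        return m if m == float("-inf") else m + math.log(sum(math.exp(v - m) for v in vals))

    n = len(pi)
    alpha = [lg(pi[j]) + lg(B[j][obs[0]]) for j in range(n)]
    for x in obs[1:]:
        alpha = [lse([alpha[i] + lg(A[i][j]) for i in range(n)]) + lg(B[j][x]) for j in range(n)]
    return lse(alpha)


def four_state_left_to_right(emissions):
    """Hypothetical word model: 4 states, each state may loop or advance by one."""
    pi = [1.0, 0.0, 0.0, 0.0]
    A = [[0.5, 0.5, 0.0, 0.0],
         [0.0, 0.5, 0.5, 0.0],
         [0.0, 0.0, 0.5, 0.5],
         [0.0, 0.0, 0.0, 1.0]]
    return pi, A, emissions


def recognize(word_models, obs):
    """Return the word whose HMM gives the observation sequence the highest likelihood."""
    return max(word_models, key=lambda w: log_forward(*word_models[w], obs))


# Two hypothetical words over 2 observation symbols
models = {
    "word_1": four_state_left_to_right([[0.9, 0.1], [0.9, 0.1], [0.1, 0.9], [0.1, 0.9]]),
    "word_2": four_state_left_to_right([[0.1, 0.9], [0.1, 0.9], [0.9, 0.1], [0.9, 0.1]]),
}
print(recognize(models, [0, 0, 1, 1, 1]))  # word_1
```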

ASL Results

Desk-based

348 training and 94 test sentences, without context

Accuracy:

$$Acc = \frac{N - D - S - I}{N}$$

  • $N$: number of words (in the reference)
  • $D$: number of deletions
  • $S$: number of substitutions
  • $I$: number of insertions
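Counting $D$, $S$ and $I$ requires aligning the recognized word sequence with the reference. A minimal sketch, assuming the usual minimum-edit-distance (Levenshtein) alignment on word level; the function names and example sentences are hypothetical.

```python
def error_counts(reference, hypothesis):
    """Align two word sequences with minimum edit distance.

    Returns (D, S, I): deletions, substitutions and insertions of one optimal alignment.
    """
    n, m = len(reference), len(hypothesis)
    # best[i][j] = (errors, D, S, I) for aligning reference[:i] with hypothesis[:j]
    best = [[(0, 0, 0, 0)] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        best[i][0] = (i, i, 0, 0)  # reference words left unmatched: deletions
    for j in range(1, m + 1):
        best[0][j] = (j, 0, 0, j)  # extra hypothesis words: insertions
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            e, d, s, k = best[i - 1][j - 1]
            sub = int(reference[i - 1] != hypothesis[j - 1])
            e_del, d_del, s_del, k_del = best[i - 1][j]
            e_ins, d_ins, s_ins, k_ins = best[i][j - 1]
            best[i][j] = min(
                (e + sub, d, s + sub, k),              # match or substitution
                (e_del + 1, d_del + 1, s_del, k_del),  # deletion
                (e_ins + 1, d_ins, s_ins, k_ins + 1),  # insertion
            )
    _, d, s, k = best[n][m]
    return d, s, k


def word_accuracy(reference, hypothesis):
    """Acc = (N - D - S - I) / N with N = number of reference words."""
    d, s, i = error_counts(reference, hypothesis)
    n = len(reference)
    return (n - d - s - i) / n


ref = "i want red shoes now".split()
hyp = "i want shoes now please".split()
print(error_counts(ref, hyp))   # (1, 0, 1)
print(word_accuracy(ref, hyp))  # 0.6
```

Because insertions are subtracted too, $Acc$ can become negative when many words are inserted. If several alignments have the same total cost, the split between $D$, $S$ and $I$ may differ, but $D + S + I$ does not.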

[Figure: desk-based recognition results]

Wearable-based

  • 400 training sentences and 100 for testing
  • Test 5-word sentences
  • Restricted and unrestricted recognition give similar results!

[Figure: wearable-based recognition results]

Pointing Gesture Recognition

  • Pointing gestures

    • are used to specify objects and locations

    • can be needed to resolve ambiguities in verbal statements

  • Definition: Pointing gesture = movement of the arm towards a pointing target

  • Tasks

    • Detect occurrence of human pointing gestures in natural arm movements
    • Extract the 3D pointing direction
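The notes do not specify how the 3D pointing direction is extracted. One common approximation is the line from the head to the pointing hand (the forearm direction or the head orientation are alternatives). Given 3D head and hand positions from a tracker, the pointed-at target can then be chosen as the one with the smallest angle to that line. All names, coordinates and targets below are hypothetical.

```python
import math


def pointing_ray(head, hand):
    """Approximate the 3D pointing direction as the unit vector from head to hand."""
    d = [b - a for a, b in zip(head, hand)]
    norm = math.sqrt(sum(c * c for c in d))
    if norm == 0:
        raise ValueError("head and hand positions coincide")
    return [c / norm for c in d]


def angle_to(origin, direction, target):
    """Angle in degrees between the pointing direction and the line origin -> target."""
    v = [t - o for o, t in zip(origin, target)]
    norm = math.sqrt(sum(c * c for c in v))
    cos = sum(a * b for a, b in zip(direction, v)) / norm
    return math.degrees(math.acos(max(-1.0, min(1.0, cos))))  # clamp rounding errors


def pointed_target(head, hand, targets):
    """Pick the named target with the smallest angular deviation from the head-hand line."""
    direction = pointing_ray(head, hand)
    return min(targets, key=lambda name: angle_to(head, direction, targets[name]))


head, hand = (0.0, 0.0, 1.7), (0.5, 0.0, 1.45)  # metres, hypothetical tracker output
targets = {"lamp": (3.0, 0.0, 0.2), "door": (0.0, 3.0, 1.0)}
print(pointed_target(head, hand, targets))      # lamp
```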

Interaction in a Smart Room