<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Face Detection | Haobin Tan</title><link>https://haobin-tan.netlify.app/tags/face-detection/</link><atom:link href="https://haobin-tan.netlify.app/tags/face-detection/index.xml" rel="self" type="application/rss+xml"/><description>Face Detection</description><generator>Hugo Blox Builder (https://hugoblox.com)</generator><language>en-us</language><lastBuildDate>Fri, 13 Nov 2020 00:00:00 +0000</lastBuildDate><image><url>https://haobin-tan.netlify.app/media/icon_hu7d15bc7db65c8eaf7a4f66f5447d0b42_15095_512x512_fill_lanczos_center_3.png</url><title>Face Detection</title><link>https://haobin-tan.netlify.app/tags/face-detection/</link></image><item><title>Face Detection: Color-Based</title><link>https://haobin-tan.netlify.app/docs/ai/computer-vision/cv-lecture/03-face-detection-color/</link><pubDate>Fri, 06 Nov 2020 00:00:00 +0000</pubDate><guid>https://haobin-tan.netlify.app/docs/ai/computer-vision/cv-lecture/03-face-detection-color/</guid><description>&lt;h2 id="tldr">TL;DR&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>Different color spaces and classifiers can be used&lt;/p>
&lt;ul>
&lt;li>
&lt;p>Models: histograms, Gaussian Models, Mixture of Gaussians Model&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Histogram-backprojection / Histogram matching&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Bayes classifier&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Discriminative Classifiers (ANN, SVM)&lt;/p>
&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>Bayesian classifier and ANN seem to work well&lt;/p>
&lt;ul>
&lt;li>Sufficient training data is needed for modeling the pdfs, in particular for the Bayesian approach (both positive &amp;amp; negative pdfs are learned)&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>Advantages: Fast, rotation &amp;amp; scale invariant, robust against occlusions&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Disadvantages:&lt;/p>
&lt;ul>
&lt;li>Affected by illumination&lt;/li>
&lt;li>Cannot distinguish head and hands&lt;/li>
&lt;li>Skin-colored objects in the background problematic&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>Metric: ROC curve used to compare classification results / methods&lt;/p>
&lt;/li>
&lt;/ul>
&lt;hr>
&lt;h2 id="color-based-face-detection-overview">Color-based face detection overview&lt;/h2>
&lt;p>💡 &lt;strong>Idea: human skin has a fairly consistent color that is distinct from that of many other objects&lt;/strong>&lt;/p>
&lt;p>&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/%E6%88%AA%E5%B1%8F2020-11-10%2014.57.37.png" alt="截屏2020-11-10 14.57.37">&lt;/p>
&lt;p>Possible approach:&lt;/p>
&lt;ol>
&lt;li>Find skin-colored pixels&lt;/li>
&lt;li>Group skin-colored pixels (and apply some heuristics) to find the face&lt;/li>
&lt;/ol>
&lt;h2 id="color">Color&lt;/h2>
&lt;ul>
&lt;li>&lt;strong>Grayscale&lt;/strong> Image: Each pixel represented by &lt;strong>one&lt;/strong> number (typically integer between 0 and 255)&lt;/li>
&lt;li>&lt;strong>Color&lt;/strong> image: Pixels represented by &lt;strong>three&lt;/strong> numbers&lt;/li>
&lt;/ul>
&lt;p>Different representations exist, so-called &amp;ldquo;color spaces&amp;rdquo;&lt;/p>
&lt;h3 id="color-spaces">Color spaces&lt;/h3>
&lt;ul>
&lt;li>
&lt;p>&lt;strong>RGB&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>
&lt;p>most widely used&lt;/p>
&lt;/li>
&lt;li>
&lt;p>specifies colors in terms of the primary colors &lt;strong>red (R), green (G), and blue (B)&lt;/strong>&lt;/p>
&lt;p>&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/%E6%88%AA%E5%B1%8F2020-11-10%2015.00.08-20201110184617048.png" alt="截屏2020-11-10 15.00.08">&lt;/p>
&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>HSV/HSI&lt;/strong>: &lt;strong>hue (H)&lt;/strong>, &lt;strong>saturation (S)&lt;/strong> and &lt;strong>value (V) / intensity (I)&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>
&lt;p>Closely related to human perception (hue, colorfulness and brightness)&lt;/p>
&lt;p>&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/%E6%88%AA%E5%B1%8F2020-11-10%2017.27.38.png" alt="截屏2020-11-10 17.27.38">&lt;/p>
&lt;ul>
&lt;li>Hue: &amp;ldquo;color&amp;rdquo;&lt;/li>
&lt;li>Saturation: how &amp;ldquo;pure&amp;rdquo; the color is&lt;/li>
&lt;li>Value: &amp;ldquo;lightness&amp;rdquo;&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Class Y spaces&lt;/strong>: YCbCr (Digital Video), YIQ (NTSC), YUV (PAL)&lt;/p>
&lt;ul>
&lt;li>
&lt;p>The Y channel contains brightness (luminance); the other two channels store chrominance (e.g., U = B-Y, V = R-Y)&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Conversion from RGB to any of these Y spaces is a linear transformation&lt;/p>
&lt;p>&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/%E6%88%AA%E5%B1%8F2020-11-10%2018.18.27.png" alt="截屏2020-11-10 18.18.27">&lt;/p>
&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Perceptually uniform spaces&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>Perceived color difference is proportional to the difference in color values&lt;/li>
&lt;li>Euclidean distance can therefore be used for color comparison&lt;/li>
&lt;/ul>
&lt;p>&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/%E6%88%AA%E5%B1%8F2020-11-10%2018.19.07.png" alt="截屏2020-11-10 18.19.07">&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Chromatic Color Spaces&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>
&lt;p>Two color channels containing chrominance (colour) information&lt;/p>
&lt;ul>
&lt;li>HS (taken from HSV)&lt;/li>
&lt;li>UV (taken from YUV)&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>Normalized rg from RGB:&lt;/p>
&lt;ul>
&lt;li>
&lt;p>r = R / (R+G+B)&lt;/p>
&lt;/li>
&lt;li>
&lt;p>g = G / (R+G+B)&lt;/p>
&lt;/li>
&lt;li>
&lt;p>b = B / (R+G+B)&lt;/p>
&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>It is sometimes argued that chromatic skin-color models are more robust to illumination changes&lt;/p>
&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
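&lt;p>The rg normalization above keeps only chromaticity; since r + g + b = 1, the third channel is redundant and two channels suffice. A minimal numpy sketch (the function name is my own, not from the lecture):&lt;/p>

```python
import numpy as np

def to_chromatic_rg(image):
    """Convert an H x W x 3 RGB image to normalized rg chromaticity.

    r = R / (R+G+B), g = G / (R+G+B); b = 1 - r - g is redundant,
    so only two channels are returned.
    """
    rgb = image.astype(np.float64)
    s = rgb.sum(axis=-1, keepdims=True)
    s[s == 0] = 1.0  # avoid division by zero for black pixels
    r = rgb[..., 0:1] / s
    g = rgb[..., 1:2] / s
    return np.concatenate([r, g], axis=-1)
```

&lt;p>A useful property: the rg chromaticity of a pixel is unchanged when its brightness is scaled, which is exactly why chromatic spaces are attractive for skin models.&lt;/p>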
&lt;h4 id="problems">Problems&lt;/h4>
&lt;ul>
&lt;li>Reflected color depends on spectrum of the light source (and properties of the object / surface)&lt;/li>
&lt;li>If the light source / illumination changes, the reflected color signal changes!!! 🤪&lt;/li>
&lt;/ul>
&lt;h2 id="how-to-model-skin-color">How to model skin color?&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="#histogram-as-skin-color-model">Non-parametric models: typically histograms&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="#parametric-models">Parametric models&lt;/a>&lt;/p>
&lt;ul>
&lt;li>Gaussian Model&lt;/li>
&lt;li>Gaussian Mixture Model&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>Or just learn decision boundaries between classes (&lt;a href="#discriminative-models--classifiers">discriminative model&lt;/a>)&lt;/p>
&lt;ul>
&lt;li>ANN, SVM, &amp;hellip;&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
&lt;h3 id="histogram-as-skin-color-model">Histogram as skin color model&lt;/h3>
&lt;p>&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/%E6%88%AA%E5%B1%8F2020-11-10%2018.34.57.png" alt="截屏2020-11-10 18.34.57">&lt;/p>
&lt;ul>
&lt;li>👍 Advantages: Works very well in practice&lt;/li>
&lt;li>👎 Disadvantages
&lt;ul>
&lt;li>Memory requirements quickly become large (one bin per color value combination)&lt;/li>
&lt;li>A large number of labelled skin and non-skin samples is needed!&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
&lt;h4 id="histogram-backprojection">Histogram Backprojection&lt;/h4>
&lt;ul>
&lt;li>
&lt;p>The simplest (and fastest) way to utilize histogram information&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Each pixel in the backprojection is set to the value of the (skin-color) histogram bin indexed by the color of the respective pixel&lt;/p>
&lt;ul>
&lt;li>A color $x$ is considered as skin color if $H\_{+}(x) > \theta$&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>E.g.&lt;/p>
&lt;p>&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/%E6%88%AA%E5%B1%8F2021-07-22%2022.20.33.png" alt="截屏2021-07-22 22.20.33">&lt;/p>
&lt;/li>
&lt;/ul>
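&lt;p>Backprojection itself is just a table lookup per pixel. A minimal numpy sketch for a 2-D color histogram (names and bin layout are my own assumptions):&lt;/p>

```python
import numpy as np

def backproject(image, skin_hist, bins=32):
    """Backproject a skin-color histogram onto an image.

    image: H x W x 2 chromaticity image with channel values in [0, 1).
    skin_hist: bins x bins histogram H_+ over the two color channels.
    Returns an H x W map; pixel (i, j) holds H_+(color of pixel (i, j)).
    """
    idx = np.clip((image * bins).astype(int), 0, bins - 1)
    return skin_hist[idx[..., 0], idx[..., 1]]

# A color x is then classified as skin if H_+(x) > theta:
#   mask = backproject(img, skin_hist) > theta
```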
&lt;h4 id="histogram-matching">Histogram Matching&lt;/h4>
&lt;ul>
&lt;li>Backprojection
&lt;ul>
&lt;li>works well when the color distribution of the target is mono-modal&lt;/li>
&lt;li>is not optimal when the target is multi-colored! &amp;#x1f622;&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>🔧 Solution: Build a histogram of the image within the search window, and compare it to the target histogram.
&lt;ul>
&lt;li>distance metrics for histograms, e.g.:
&lt;ul>
&lt;li>
&lt;p>Bhattacharyya distance&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Histogram intersection&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Earth mover&amp;rsquo;s distance, &amp;hellip;&lt;/p>
&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
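&lt;p>Two of the listed distance metrics are easy to state directly. A small numpy sketch (both normalize the histograms first; function names are mine):&lt;/p>

```python
import numpy as np

def bhattacharyya(h1, h2):
    """Bhattacharyya distance between two histograms (0 = identical)."""
    h1 = h1 / h1.sum()
    h2 = h2 / h2.sum()
    bc = np.sum(np.sqrt(h1 * h2))  # coefficient in [0, 1], 1 = identical
    return np.sqrt(max(0.0, 1.0 - bc))

def intersection(h1, h2):
    """Histogram intersection: 1.0 means the histograms match exactly."""
    h1 = h1 / h1.sum()
    h2 = h2 / h2.sum()
    return np.sum(np.minimum(h1, h2))
```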
&lt;h4 id="histogram-backprojection-vs-matching">Histogram Backprojection vs. Matching&lt;/h4>
&lt;ul>
&lt;li>
&lt;p>Histogram Backprojection&lt;/p>
&lt;ul>
&lt;li>
&lt;p>Compares color of a single pixel with color model&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Fast and simple&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Can only cope well with mono-modal distributions&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Often sufficient for skin-color classification&lt;/p>
&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>Histogram Matching / Intersection&lt;/p>
&lt;ul>
&lt;li>
&lt;p>Compares color histogram of image patch with color model&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Better performance&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Can cope with multi-modal distributions&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Computationally expensive&lt;/p>
&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
&lt;h3 id="parametric-models">Parametric models&lt;/h3>
&lt;h4 id="gaussian-density-models">Gaussian Density Models&lt;/h4>
&lt;ul>
&lt;li>
&lt;p>Gaussian Densities&lt;/p>
&lt;ul>
&lt;li>
&lt;p>Assume that the distribution of skin colors $p(x)$ has a parametric functional form&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Most common function: Gaussian function $\mathrm{G}(\mathbf{x} ; \mu, \mathbf{C})$
&lt;/p>
$$
p(x | \text{skin})=G(x ; \mu, C)=\frac{1}{(2 \pi)^{d / 2}|C|^{1 / 2} }\exp \left\\{-1 / 2(x-\mu)^{\top} C^{-1}(x-\mu)\right\\}
$$
&lt;ul>
&lt;li>Mean $\mu$ and covariance matrix $C$ are estimated from a training set of skin colors $S = \{x\_1, x\_2, \ldots, x\_N\}$:
&lt;ul>
&lt;li>$\mu = E\{x\}$&lt;/li>
&lt;li>$C = E\{(\boldsymbol{x}-\mu)(\boldsymbol{x}-\mu)^T\}$&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>A color is considered as skin color if&lt;/p>
&lt;ul>
&lt;li>$p(x|\text{skin}) > \theta$&lt;/li>
&lt;li>$p(x|\text{skin}) > p(x|\text{non-skin})$&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
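&lt;p>The Gaussian model amounts to two steps: estimate $\mu$ and $C$ from the skin samples, then evaluate the density for new colors. A numpy sketch (function names are mine, not from the lecture):&lt;/p>

```python
import numpy as np

def fit_gaussian(samples):
    """Estimate mean and covariance from skin-color samples (N x d)."""
    mu = samples.mean(axis=0)
    diff = samples - mu
    C = diff.T @ diff / len(samples)
    return mu, C

def gaussian_density(x, mu, C):
    """Evaluate G(x; mu, C) for a batch of colors x (N x d)."""
    d = len(mu)
    Cinv = np.linalg.inv(C)
    norm = 1.0 / ((2 * np.pi) ** (d / 2) * np.linalg.det(C) ** 0.5)
    diff = x - mu
    mahal = np.einsum('ni,ij,nj->n', diff, Cinv, diff)  # (x-mu)^T C^-1 (x-mu)
    return norm * np.exp(-0.5 * mahal)

# skin decision, as above: gaussian_density(x, mu, C) > theta
```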
&lt;h4 id="mixture-of-gaussian-models">Mixture of Gaussian Models&lt;/h4>
$$
p(x)=\sum\_{i=1}^{K} \pi\_{i} G\left(x, \mu\_{i}, C\_{i}\right)
$$
&lt;ul>
&lt;li>
&lt;p>Parameter set $\Phi$ can be estimated using the &lt;strong>EM&lt;/strong> algorithm&lt;/p>
&lt;ul>
&lt;li>Iteratively changes parameters so as to maximize the log-likelihood of the training set:
$$
L=\log \prod\_{i=1}^{N} p\left(x\_{i} \mid \Phi\right)
$$&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>A color is considered as skin color if&lt;/p>
&lt;ul>
&lt;li>$p(x|\text{skin}) > \theta$&lt;/li>
&lt;li>$p(x|\text{skin}) > p(x|\text{non-skin})$&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
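&lt;p>Evaluating the mixture density is a weighted sum of Gaussian evaluations. In practice the parameters would come from EM, which is omitted here; this is only a density-evaluation sketch with my own names:&lt;/p>

```python
import numpy as np

def mixture_density(x, weights, means, covs):
    """p(x) = sum_i pi_i * G(x; mu_i, C_i) for a batch x (N x d)."""
    p = np.zeros(len(x))
    for pi_i, mu, C in zip(weights, means, covs):
        d = len(mu)
        Cinv = np.linalg.inv(C)
        norm = 1.0 / ((2 * np.pi) ** (d / 2) * np.linalg.det(C) ** 0.5)
        diff = x - mu
        mahal = np.einsum('ni,ij,nj->n', diff, Cinv, diff)
        p += pi_i * norm * np.exp(-0.5 * mahal)
    return p
```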
&lt;h4 id="bayes-classifier">Bayes Classifier&lt;/h4>
&lt;ul>
&lt;li>
&lt;p>Skin Classification using &lt;strong>Bayes Decision Rule&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>
&lt;p>Minimum cost decision rule&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Classify pixel to skin class if $P(\text{Skin} | x)>P(\text{Non-Skin} | x)$&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Decision Rule:
&lt;/p>
$$
\frac{p(\mathbf{x} \mid \text {Skin})}{p(\mathbf{x} \mid \text {Non-Skin})} \geq \frac{P(\text {Non-Skin})}{P(\text {Skin})}
$$
&lt;/li>
&lt;li>
&lt;p>The class-conditionals $p(x \mid \omega\_i)$ can be estimated from the corresponding histograms:
&lt;/p>
$$
p\left(x \mid \omega\_{i}\right)=h\_{i}(x) / \sum\_{x} h\_{i}(x)
$$
&lt;ul>
&lt;li>$h\_i(x)$: count of pixels from class $\omega\_{i}$ that have value $x$&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
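&lt;p>A minimal sketch of the histogram-based Bayes decision rule above, with equal priors as the default (function name and interface are my own assumptions):&lt;/p>

```python
import numpy as np

def bayes_skin_mask(pixels, h_skin, h_nonskin, p_skin=0.5):
    """Classify colors with the Bayes decision rule built from histograms.

    pixels: integer bin index of each pixel's color (N,).
    h_skin, h_nonskin: raw bin counts of skin / non-skin training pixels.
    """
    eps = 1e-12
    p_x_skin = h_skin / (h_skin.sum() + eps)          # p(x | Skin)
    p_x_nonskin = h_nonskin / (h_nonskin.sum() + eps) # p(x | Non-Skin)
    likelihood_ratio = p_x_skin[pixels] / (p_x_nonskin[pixels] + eps)
    threshold = (1.0 - p_skin) / p_skin  # P(Non-Skin) / P(Skin)
    return likelihood_ratio >= threshold
```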
&lt;h3 id="discriminative-models--classifiers">Discriminative Models / Classifiers&lt;/h3>
&lt;ul>
&lt;li>Artificial Neural Networks&lt;/li>
&lt;li>Support Vector Machine&lt;/li>
&lt;/ul>
&lt;h2 id="performance-measures">Performance Measures&lt;/h2>
&lt;h3 id="for-classification">For classification&lt;/h3>
&lt;p>When comparing recognition hypotheses with ground-truth annotations, four cases have to be considered:&lt;/p>
&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/confusion-matrix.png" alt="Measuring Performance: The Confusion Matrix – Glass Box" style="zoom: 40%;" />
&lt;blockquote>
&lt;p>More see: &lt;a href="https://haobin-tan.netlify.app/docs/ai/machine-learning/ml-fundamentals/evaluation/">Evaluation&lt;/a>&lt;/p>
&lt;/blockquote>
&lt;h4 id="roc-receiver-operating-characteristic">ROC (Receiver Operating Characteristic)&lt;/h4>
&lt;ul>
&lt;li>Used for the task of classification&lt;/li>
&lt;li>Measures the trade-off between true positive rate and false positive rate&lt;/li>
&lt;/ul>
$$
\begin{array}{l}
\text { true positive rate }=\frac{\mathrm{TP}}{\mathrm{Pos}}=\frac{\mathrm{TP}}{\mathrm{TP}+\mathrm{FN}} \\\\
\text { false positive rate }=\frac{\mathrm{FP}}{\mathrm{Neg}}=\frac{\mathrm{FP}}{\mathrm{FP}+\mathrm{TN}}
\end{array}
$$
&lt;ul>
&lt;li>
&lt;p>Each prediction hypothesis has generally an associated probability value or score&lt;/p>
&lt;/li>
&lt;li>
&lt;p>The performance values can therefore be plotted into a graph, using each possible score as a threshold&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Example:&lt;/p>
&lt;p>&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/%E6%88%AA%E5%B1%8F2020-11-12%2023.27.18.png" alt="截屏2020-11-12 23.27.18">&lt;/p>
&lt;/li>
&lt;/ul>
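&lt;p>Sweeping every score as a threshold, as described above, can be done in a few numpy lines (my own helper, not a library call):&lt;/p>

```python
import numpy as np

def roc_curve(scores, labels):
    """Return (FPR, TPR) pairs, one per score used as a threshold.

    scores: classifier scores, higher = more likely positive.
    labels: 1 for positive (face), 0 for negative ground truth.
    """
    order = np.argsort(-scores)            # sort descending by score
    labels = labels[order]
    tp = np.cumsum(labels)                 # TPs after accepting each item
    fp = np.cumsum(1 - labels)
    tpr = tp / max(labels.sum(), 1)        # TP / (TP + FN)
    fpr = fp / max((1 - labels).sum(), 1)  # FP / (FP + TN)
    return fpr, tpr
```

&lt;p>For a perfect classifier the TPR reaches 1 while the FPR is still 0, i.e. the curve hugs the top-left corner.&lt;/p>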
&lt;h3 id="skin-color-analysis-and-comparison">Skin-color: Analysis and Comparison&lt;/h3>
&lt;p>Conclusions &lt;sup id="fnref:1">&lt;a href="#fn:1" class="footnote-ref" role="doc-noteref">1&lt;/a>&lt;/sup>&lt;/p>
&lt;ul>
&lt;li>
&lt;p>Bayesian approach and MLP worked best&lt;/p>
&lt;ul>
&lt;li>Bayesian approach needs much more memory&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>Performance is largely unaffected by the choice of color space, but&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Results degraded when only chrominance channels were used&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="from-skin-colored-pixels-to-faces">From Skin-Colored Pixels to Faces&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>Skin-colored pixels need to be grouped into object representations&lt;/p>
&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/截屏2020-11-13%2014.56.21.png" alt="截屏2020-11-13 14.56.21" style="zoom:80%;" />
&lt;/li>
&lt;li>
&lt;p>🔴 Problems:&lt;/p>
&lt;ul>
&lt;li>skin-colored background,&lt;/li>
&lt;li>further skin-colored body parts (hands, arms, &amp;hellip;),&lt;/li>
&lt;li>Noise, &amp;hellip;&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
&lt;h3 id="perceptual-grouping">Perceptual Grouping&lt;/h3>
&lt;ul>
&lt;li>
&lt;p>&lt;strong>Morphological Operators&lt;/strong>: operators performing an action on shapes, where both the input and the output are binary images.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Thresholding each pixel&amp;rsquo;s skin affiliation yields a binary image&lt;/p>
&lt;p>&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/%E6%88%AA%E5%B1%8F2020-11-13%2014.58.11.png" alt="截屏2020-11-13 14.58.11">&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Morphological Erosion&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>
&lt;p>&lt;em>Remove&lt;/em> pixels from edges of objects&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Set pixel value to &lt;strong>min&lt;/strong> value of surrounding pixels&lt;/p>
&lt;p>&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/%E6%88%AA%E5%B1%8F2020-11-13%2015.00.53.png" alt="截屏2020-11-13 15.00.53">&lt;/p>
&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Morphological Dilation&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>
&lt;p>&lt;em>Add&lt;/em> pixels to edges of objects&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Set pixel value to &lt;strong>max&lt;/strong> value of surrounding pixels&lt;/p>
&lt;p>&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/%E6%88%AA%E5%B1%8F2020-11-13%2015.41.11.png" alt="截屏2020-11-13 15.41.11">&lt;/p>
&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Morphological Opening&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>
&lt;p>Apply erosion, then dilation&lt;/p>
&lt;p>&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/%E6%88%AA%E5%B1%8F2020-11-13%2015.42.38.png" alt="截屏2020-11-13 15.42.38">&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Goal:&lt;/p>
&lt;ul>
&lt;li>Smooth outline&lt;/li>
&lt;li>Open small bridges&lt;/li>
&lt;li>Eliminate outliers&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Morphological Closing&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>
&lt;p>Apply dilation, then erosion&lt;/p>
&lt;p>&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/%E6%88%AA%E5%B1%8F2020-11-13%2015.45.25.png" alt="截屏2020-11-13 15.45.25">&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Goal:&lt;/p>
&lt;ul>
&lt;li>Smooth inner edges&lt;/li>
&lt;li>Connect small distances&lt;/li>
&lt;li>Fill unwanted holes&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>Apply morphological closing then morphological opening&lt;/p>
&lt;ul>
&lt;li>
&lt;p>Resulting image is reduced to connected regions of skin color (blobs)&lt;/p>
&lt;p>&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/%E6%88%AA%E5%B1%8F2020-11-13%2015.59.57.png" alt="截屏2020-11-13 15.59.57">&lt;/p>
&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
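&lt;p>All four operators reduce to min/max filters over a neighborhood. A plain-numpy sketch for a 3x3 neighborhood (the border handling chosen here is one possible convention, not the only one):&lt;/p>

```python
import numpy as np

def dilate(img):
    """Binary dilation: each pixel becomes the max of its 3x3 neighborhood."""
    p = np.pad(img, 1)  # pad with 0 (background)
    out = np.zeros_like(img)
    for dy in (0, 1, 2):
        for dx in (0, 1, 2):
            out = np.maximum(out, p[dy:dy + img.shape[0], dx:dx + img.shape[1]])
    return out

def erode(img):
    """Binary erosion: each pixel becomes the min of its 3x3 neighborhood.
    The border is padded with 1 so the image frame itself is not eroded."""
    p = np.pad(img, 1, constant_values=1)
    out = np.ones_like(img)
    for dy in (0, 1, 2):
        for dx in (0, 1, 2):
            out = np.minimum(out, p[dy:dy + img.shape[0], dx:dx + img.shape[1]])
    return out

def opening(img):
    """Erosion followed by dilation: removes small outliers."""
    return dilate(erode(img))

def closing(img):
    """Dilation followed by erosion: fills small holes."""
    return erode(dilate(img))
```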
&lt;h3 id="from-skin-blobs-to-faces">From Skin Blobs To Faces&lt;/h3>
&lt;ul>
&lt;li>
&lt;p>Goal: align bounding box around face candidate&lt;/p>
&lt;p>&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/%E6%88%AA%E5%B1%8F2020-11-13%2016.01.23.png" alt="截屏2020-11-13 16.01.23">&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Important for:&lt;/p>
&lt;ul>
&lt;li>
&lt;p>Face Recognition&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Head Pose Estimation&lt;/p>
&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>Different approaches:&lt;/p>
&lt;ul>
&lt;li>
&lt;p>Choose cluster with biggest size&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Ellipse fitting (approximate face region by ellipse)&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Heuristics to distinguish between different skin clusters&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Use temporal information (tracking)&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Facial Feature Detection&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&amp;hellip;&lt;/p>
&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
&lt;div class="footnotes" role="doc-endnotes">
&lt;hr>
&lt;ol>
&lt;li id="fn:1">
&lt;p>S. L. Phung, A. Bouzerdoum and D. Chai, &amp;ldquo;Skin segmentation using color pixel classification: analysis and comparison,&amp;rdquo; in IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 1, pp. 148-154, Jan. 2005, doi: 10.1109/TPAMI.2005.17.&amp;#160;&lt;a href="#fnref:1" class="footnote-backref" role="doc-backlink">&amp;#x21a9;&amp;#xfe0e;&lt;/a>&lt;/p>
&lt;/li>
&lt;/ol>
&lt;/div></description></item><item><title>Face Detection: Neural-Network-Based</title><link>https://haobin-tan.netlify.app/docs/ai/computer-vision/cv-lecture/04-face-detection-ann/</link><pubDate>Fri, 13 Nov 2020 00:00:00 +0000</pubDate><guid>https://haobin-tan.netlify.app/docs/ai/computer-vision/cv-lecture/04-face-detection-ann/</guid><description>&lt;h2 id="motivation">Motivation&lt;/h2>
&lt;ul>
&lt;li>Idea: Use a search-window to scan over an image&lt;/li>
&lt;li>Train a classifier to decide whether the search window contains a face or not&lt;/li>
&lt;/ul>
&lt;p>
&lt;figure >
&lt;div class="flex justify-center ">
&lt;div class="w-100" >&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/%e6%88%aa%e5%b1%8f2020-11-13%2016.16.57.png" alt="截屏2020-11-13 16.16.57" loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;h2 id="detection">Detection&lt;/h2>
&lt;h3 id="simple-neuron-model">Simple neuron model&lt;/h3>
&lt;p>
&lt;figure >
&lt;div class="flex justify-center ">
&lt;div class="w-100" >&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/%e6%88%aa%e5%b1%8f2020-11-13%2016.20.47.png" alt="截屏2020-11-13 16.20.47" loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;h3 id="topologies">Topologies&lt;/h3>
&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/截屏2020-11-13%2016.21.15.png" alt="截屏2020-11-13 16.21.15" style="zoom:67%;" />
&lt;h3 id="parameters">Parameters&lt;/h3>
&lt;p>Adjustable Parameters are&lt;/p>
&lt;ul>
&lt;li>
&lt;p>Connection weights (to be learned)&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Activation function (fixed)&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Number of layers (fixed)&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Number of neurons per layer (fixed)&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h3 id="training">Training&lt;/h3>
&lt;p>Backpropagation with gradient descent&lt;/p>
&lt;h2 id="neural-network-based-face-detection1">Neural Network Based Face Detection&lt;sup id="fnref:1">&lt;a href="#fn:1" class="footnote-ref" role="doc-noteref">1&lt;/a>&lt;/sup>&lt;/h2>
&lt;ul>
&lt;li>Idea: Use an artificial neural network to detect upright frontal faces
&lt;ul>
&lt;li>
&lt;p>Network receives as input a 20x20 pixel region of an image&lt;/p>
&lt;/li>
&lt;li>
&lt;p>output ranges from -1 (no face present) to +1 (face present)&lt;/p>
&lt;/li>
&lt;li>
&lt;p>the neural network &amp;ldquo;face filter&amp;rdquo; is applied at every location in the image&lt;/p>
&lt;/li>
&lt;li>
&lt;p>to detect faces with different sizes, the input image is repeatedly scaled down&lt;/p>
&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
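&lt;p>The scan-and-rescale procedure above can be sketched as a sliding window over an image pyramid. Here &lt;code>classify&lt;/code> stands in for the trained network (output above 0 counts as a face, mirroring the -1/+1 convention); all names, the nearest-neighbor rescaling, and the defaults are my own assumptions:&lt;/p>

```python
import numpy as np

def resize_nearest(img, factor):
    """Nearest-neighbor rescaling (a crude stand-in for proper downsampling)."""
    h, w = img.shape
    ys = (np.arange(int(h * factor)) / factor).astype(int)
    xs = (np.arange(int(w * factor)) / factor).astype(int)
    return img[np.ix_(ys, xs)]

def detect_faces(image, classify, window=20, step=4, scale=0.8):
    """Scan a grayscale image with a window over a pyramid of scales.

    classify(patch) returns a score; a score greater than 0 is treated as
    a face. Detections are reported as (x, y, size) in original coordinates.
    """
    detections, factor = [], 1.0
    cur = image.astype(float)
    while min(cur.shape) >= window:
        h, w = cur.shape
        for y in range(0, h - window + 1, step):
            for x in range(0, w - window + 1, step):
                if classify(cur[y:y + window, x:x + window]) > 0:
                    detections.append((int(round(x / factor)),
                                       int(round(y / factor)),
                                       int(round(window / factor))))
        factor *= scale
        cur = resize_nearest(image.astype(float), factor)
    return detections
```

&lt;p>The overlapping detections this produces would then be merged in the post-processing step described below.&lt;/p>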
&lt;h3 id="network-topology">Network Topology&lt;/h3>
&lt;p>
&lt;figure >
&lt;div class="flex justify-center ">
&lt;div class="w-100" >&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/%e6%88%aa%e5%b1%8f2020-11-13%2016.28.33.png" alt="截屏2020-11-13 16.28.33" loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;ul>
&lt;li>20x20 pixel input retina&lt;/li>
&lt;li>4 types of receptive hidden fields&lt;/li>
&lt;li>One real-valued output&lt;/li>
&lt;/ul>
&lt;h3 id="system-overview">System Overview&lt;/h3>
&lt;p>
&lt;figure >
&lt;div class="flex justify-center ">
&lt;div class="w-100" >&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/%e6%88%aa%e5%b1%8f2020-11-13%2016.29.19.png" alt="截屏2020-11-13 16.29.19" loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;h3 id="network-training">Network Training&lt;/h3>
&lt;h4 id="training-set">Training Set&lt;/h4>
&lt;ul>
&lt;li>
&lt;p>1050 normalized face images&lt;/p>
&lt;/li>
&lt;li>
&lt;p>15 face examples generated from each original image by slightly rotating, scaling, translating, and mirroring it&lt;/p>
&lt;/li>
&lt;li>
&lt;p>1000 randomly chosen non-face images&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h4 id="preprocessing">Preprocessing&lt;/h4>
&lt;ul>
&lt;li>correct for different lighting conditions (overall brightness, shadows)&lt;/li>
&lt;li>rescale images to fixed size&lt;/li>
&lt;/ul>
&lt;h4 id="histogram-equalization">Histogram equalization&lt;/h4>
&lt;ul>
&lt;li>
&lt;p>Defines a mapping of gray levels $p$ into gray levels $q$ such that the distribution of $q$ is close to being uniform&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Stretches contrast (expands the range of gray levels)&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Transforms different input images so that they have similar intensity distributions (thus reducing the effect of different illumination)&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Example&lt;/p>
&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/截屏2020-11-13%2016.32.18.png" alt="截屏2020-11-13 16.32.18" style="zoom:67%;" />
&lt;/li>
&lt;li>
&lt;p>Algorithm&lt;/p>
&lt;ul>
&lt;li>
&lt;p>The probability of an occurrence of a pixel of level $i$ in the image:
&lt;/p>
$$
p\left(x\_{i}\right)=\frac{n\_{i}}{n}, \qquad i \in 0, \ldots, L-1
$$
&lt;ul>
&lt;li>$L$: number of gray levels&lt;/li>
&lt;li>$n\_i$: number of occurences of gray level $i$&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>Define $c$ as the cumulative distribution function:
&lt;/p>
$$
c(i)=\sum\_{j=0}^{i} p\left(x\_{j}\right)
$$
&lt;/li>
&lt;li>
&lt;p>Create a transformation of the form
&lt;/p>
$$
y\_i = T(x\_i) = c(i), \qquad y\_i \in [0, 1]
$$
&lt;p>
This produces a level $y$ for each level $x$ in the original image, such that the cumulative probability function of $y$ is linearized across the value range. Finally, the values are rescaled back to the gray-level range:
&lt;/p>
$$
y\_{i}^{\prime}=y\_{i} \cdot(\max -\min )+\min
$$
&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
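&lt;p>The three steps of the algorithm map directly to numpy. The final rescaling here assumes the full gray-level range $[0, L-1]$, i.e. min = 0 and max = L-1, which is one possible choice:&lt;/p>

```python
import numpy as np

def equalize(img, levels=256):
    """Histogram-equalize a grayscale image with integer levels in [0, levels)."""
    hist = np.bincount(img.ravel(), minlength=levels)
    p = hist / img.size                  # p(x_i) = n_i / n
    c = np.cumsum(p)                     # cumulative distribution c(i)
    y = c[img]                           # y_i = T(x_i) = c(i), in [0, 1]
    # rescale back to the gray-level range; min = 0, max = levels - 1 assumed
    return (y * (levels - 1)).astype(np.uint8)
```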
&lt;h4 id="training-procedure">Training Procedure&lt;/h4>
&lt;ol>
&lt;li>
&lt;p>Randomly choose 1000 non-face images&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Train network to produce 1 for faces, -1 for non-faces&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Run network on images containing no faces. Collect subimages in which the network incorrectly identifies a face (output &amp;gt; 0)&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Select up to 250 of these &amp;ldquo;false positives&amp;rdquo; at random and add them to the training set as negative examples&lt;/p>
&lt;/li>
&lt;/ol>
&lt;h3 id="neural-network-based-face-filter">Neural Network Based Face Filter&lt;/h3>
&lt;ul>
&lt;li>
&lt;p>Output of ANN defines a filter for faces&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Search&lt;/p>
&lt;ul>
&lt;li>
&lt;p>Scan input image with search window, apply ANN to search window&lt;/p>
&lt;/li>
&lt;li>
&lt;p>The input image needs to be rescaled in order to detect faces of different sizes&lt;/p>
&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>Output needs to be post-processed&lt;/p>
&lt;ul>
&lt;li>
&lt;p>Noise removal&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Merging overlapping detections&lt;/p>
&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>Speed up can be achieved&lt;/p>
&lt;ul>
&lt;li>
&lt;p>Increase step size&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Make ANN more flexible to translation&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Hierarchical, pyramidal search&lt;/p>
&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
&lt;h3 id="localization-and-ground-truth">Localization and Ground-Truth&lt;/h3>
&lt;ul>
&lt;li>
&lt;p>For localization, the test data is typically annotated with ground-truth bounding boxes&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Comparing hypotheses to Ground-Truth&lt;/p>
&lt;ul>
&lt;li>
&lt;p>Overlap
&lt;/p>
$$
O = \frac{|\text{GT} \cap \text{DET}|}{|\text{GT} \cup \text{DET}|}
$$
&lt;p>
&lt;figure >
&lt;div class="flex justify-center ">
&lt;div class="w-100" >&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/%e6%88%aa%e5%b1%8f2020-11-13%2016.43.11.png" alt="截屏2020-11-13 16.43.11" loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;blockquote>
&lt;p>Also called &lt;strong>Intersection over Union (IoU)&lt;/strong>&lt;/p>
&lt;/blockquote>
&lt;/li>
&lt;li>
&lt;p>Often used as threshold: Overlap &amp;gt; 50%&lt;/p>
&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
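&lt;p>The overlap measure is a few lines of plain Python; the box format (x1, y1, x2, y2) is my own assumption:&lt;/p>

```python
def iou(box_a, box_b):
    """Overlap (intersection over union) of two (x1, y1, x2, y2) boxes."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0
```

&lt;p>With the common 50% threshold, a detection counts as correct if iou(gt, det) &amp;gt; 0.5.&lt;/p>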
&lt;div class="footnotes" role="doc-endnotes">
&lt;hr>
&lt;ol>
&lt;li id="fn:1">
&lt;p>&lt;em>Neural Network Based Face Detection, by Henry A. Rowley, Shumeet Baluja, and Takeo Kanade. IEEE Transactions on Pattern Analysis and Machine Intelligence, volume 20, number 1, pages 23-38, January 1998.&lt;/em>&amp;#160;&lt;a href="#fnref:1" class="footnote-backref" role="doc-backlink">&amp;#x21a9;&amp;#xfe0e;&lt;/a>&lt;/p>
&lt;/li>
&lt;/ol>
&lt;/div></description></item></channel></rss>