<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>HPE | Haobin Tan</title><link>https://haobin-tan.netlify.app/tags/hpe/</link><atom:link href="https://haobin-tan.netlify.app/tags/hpe/index.xml" rel="self" type="application/rss+xml"/><description>HPE</description><generator>Hugo Blox Builder (https://hugoblox.com)</generator><language>en-us</language><lastBuildDate>Tue, 25 May 2021 00:00:00 +0000</lastBuildDate><image><url>https://haobin-tan.netlify.app/media/icon_hu7d15bc7db65c8eaf7a4f66f5447d0b42_15095_512x512_fill_lanczos_center_3.png</url><title>HPE</title><link>https://haobin-tan.netlify.app/tags/hpe/</link></image><item><title>Human Pose Estimation (HPE)</title><link>https://haobin-tan.netlify.app/docs/ai/computer-vision/cv-hpe/</link><pubDate>Tue, 25 May 2021 00:00:00 +0000</pubDate><guid>https://haobin-tan.netlify.app/docs/ai/computer-vision/cv-hpe/</guid><description/></item><item><title>Human Pose Estimation Datasets</title><link>https://haobin-tan.netlify.app/docs/ai/computer-vision/cv-hpe/hpe_datasets/</link><pubDate>Tue, 25 May 2021 00:00:00 +0000</pubDate><guid>https://haobin-tan.netlify.app/docs/ai/computer-vision/cv-hpe/hpe_datasets/</guid><description>&lt;h2 id="coco-keypoints-detection">COCO Keypoints Detection&lt;/h2>
&lt;p>&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/keypoints-splash.png" alt="https://cocodataset.org/#keypoints-2018">&lt;/p>
&lt;p>17 Keypoints:&lt;/p>
&lt;p>&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/fix-overlay-issue.jpg" alt="img">&lt;/p>
&lt;p>Keypoint detection format:&lt;/p>
&lt;p>&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/%E6%88%AA%E5%B1%8F2021-05-25%2015.48.14.png" alt="截屏2021-05-25 15.48.14">&lt;/p>
&lt;h3 id="annotations">Annotations&lt;/h3>
&lt;p>Annotations for keypoints are just like in Object Detection (Segmentation), except that the keypoints are additionally specified as triplets &lt;code>(x, y, v)&lt;/code>.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-json" data-lang="json">&lt;span class="line">&lt;span class="cl">&lt;span class="err">annotation&lt;/span>&lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nt">&amp;#34;keypoints&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="p">[&lt;/span>&lt;span class="err">x&lt;/span>&lt;span class="mi">1&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="err">y&lt;/span>&lt;span class="mi">1&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="err">v&lt;/span>&lt;span class="mi">1&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="err">...&lt;/span>&lt;span class="p">],&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nt">&amp;#34;num_keypoints&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="err">int&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nt">&amp;#34;id&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="err">int&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nt">&amp;#34;image_id&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="err">int&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nt">&amp;#34;category_id&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="err">int&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nt">&amp;#34;segmentation&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="err">RLE&lt;/span> &lt;span class="err">or&lt;/span> &lt;span class="p">[&lt;/span>&lt;span class="err">polygon&lt;/span>&lt;span class="p">],&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nt">&amp;#34;area&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="err">float&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nt">&amp;#34;bbox&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="p">[&lt;/span>&lt;span class="err">x&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="err">y&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="err">width&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="err">height&lt;/span>&lt;span class="p">],&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nt">&amp;#34;iscrowd&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="mi">0&lt;/span> &lt;span class="err">or&lt;/span> &lt;span class="mi">1&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;ul>
&lt;li>
&lt;p>&lt;strong>&amp;ldquo;keypoints&amp;rdquo;&lt;/strong>: a length &lt;code>3k &lt;/code>array where &lt;code>k&lt;/code> is the total number of keypoints defined for the category.&lt;/p>
&lt;ul>
&lt;li>
&lt;p>Each keypoint has&lt;/p>
&lt;ul>
&lt;li>
&lt;p>a 0-indexed location &lt;code>x, y&lt;/code>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>a visibility flag &lt;code>v&lt;/code>&lt;/p>
&lt;ul>
&lt;li>
&lt;p>&lt;code>v=0&lt;/code>: not labeled (in which case &lt;code>x=y=0&lt;/code>)&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;code>v=1&lt;/code>: labeled but not visible&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;code>v=2&lt;/code>: labeled and visible&lt;/p>
&lt;blockquote>
&lt;p>A keypoint is considered visible if it falls inside the object segment.&lt;/p>
&lt;/blockquote>
&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>For example, &lt;code>(229, 256, 2)&lt;/code> means there is a keypoint at pixel &lt;code>x=229&lt;/code>, &lt;code>y=256&lt;/code>, and &lt;code>v=2&lt;/code> indicates that it is labeled and visible.&lt;/p>
&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>&amp;ldquo;num_keypoints&amp;rdquo;&lt;/strong>: indicates the number of labeled keypoints (&lt;code>v&amp;gt;0&lt;/code>) for a given object (many objects, e.g. crowds and small objects, will have num_keypoints=0).&lt;/p>
&lt;/li>
&lt;/ul>
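&lt;p>The layout above can be sketched in a few lines of Python (hypothetical standalone helpers, not part of any official COCO tooling): split the flat array into &lt;code>(x, y, v)&lt;/code> triplets and recover &lt;code>num_keypoints&lt;/code> by counting entries with &lt;code>v&amp;gt;0&lt;/code>.&lt;/p>

```python
# Hypothetical helpers (not part of the COCO API): decompose a flat
# COCO "keypoints" array into (x, y, v) triplets and count the
# labeled keypoints, which should match the "num_keypoints" field.

def parse_keypoints(flat):
    """Group [x1, y1, v1, x2, y2, v2, ...] into (x, y, v) triplets."""
    assert len(flat) % 3 == 0
    return [tuple(flat[i:i + 3]) for i in range(0, len(flat), 3)]

def count_labeled(flat):
    """Number of labeled keypoints, i.e. triplets with v greater than 0."""
    return sum(1 for _x, _y, v in parse_keypoints(flat) if v > 0)

# Example: one visible, one occluded, and one unlabeled keypoint.
kps = [229, 256, 2, 231, 250, 1, 0, 0, 0]
print(parse_keypoints(kps))  # [(229, 256, 2), (231, 250, 1), (0, 0, 0)]
print(count_labeled(kps))    # 2
```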
&lt;p>Example&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-json" data-lang="json">&lt;span class="line">&lt;span class="cl">&lt;span class="s2">&amp;#34;annotations&amp;#34;&lt;/span>&lt;span class="err">:&lt;/span> &lt;span class="p">[&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nt">&amp;#34;segmentation&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="p">[[&lt;/span>&lt;span class="mf">204.01&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="mf">306.23&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="err">...&lt;/span>&lt;span class="mf">206.53&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="mf">307.95&lt;/span>&lt;span class="p">]],&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nt">&amp;#34;num_keypoints&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="mi">15&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nt">&amp;#34;area&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="mf">5463.6864&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nt">&amp;#34;iscrowd&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="mi">0&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nt">&amp;#34;keypoints&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="p">[&lt;/span>&lt;span class="mi">229&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="mi">256&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="mi">2&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="err">...&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="mi">223&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="mi">369&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="mi">2&lt;/span>&lt;span class="p">],&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nt">&amp;#34;image_id&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="mi">289343&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nt">&amp;#34;bbox&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="p">[&lt;/span>&lt;span class="mf">204.01&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="mf">235.08&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="mf">60.84&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="mf">177.36&lt;/span>&lt;span class="p">],&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nt">&amp;#34;category_id&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="mi">1&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nt">&amp;#34;id&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="mi">201376&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">]&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h3 id="categories">Categories&lt;/h3>
&lt;p>Currently keypoints are only labeled for the &lt;code>person&lt;/code> category (for most medium/large non-crowd person instances).&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-json" data-lang="json">&lt;span class="line">&lt;span class="cl">&lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nt">&amp;#34;id&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="err">int&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nt">&amp;#34;name&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="err">str&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nt">&amp;#34;supercategory&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="err">str&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nt">&amp;#34;keypoints&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="p">[&lt;/span>&lt;span class="err">str&lt;/span>&lt;span class="p">],&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nt">&amp;#34;skeleton&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="p">[&lt;/span>&lt;span class="err">edge&lt;/span>&lt;span class="p">]&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Compared to Object Detection, the categories for keypoint detection have two additional fields:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>&amp;ldquo;keypoints&amp;rdquo;&lt;/strong>: a length &lt;code>k&lt;/code> array of keypoint names&lt;/li>
&lt;li>&lt;strong>&amp;ldquo;skeleton&amp;rdquo;&lt;/strong>: defines connectivity via a list of keypoint edge pairs and is used for visualization.
&lt;ul>
&lt;li>E.g. &lt;code>[16, 14]&lt;/code> means &amp;ldquo;left_ankle&amp;rdquo; connects to &amp;ldquo;left_knee&amp;rdquo;&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
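&lt;p>The 1-based indexing of skeleton edges is easy to get wrong; a small Python sketch (hypothetical helper, keypoint names taken from the COCO &lt;code>person&lt;/code> category) resolves an edge to its keypoint names:&lt;/p>

```python
# COCO "person" keypoint names in category order; "skeleton" edges
# index into this list with 1-based indices.
KEYPOINTS = [
    "nose", "left_eye", "right_eye", "left_ear", "right_ear",
    "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
    "left_wrist", "right_wrist", "left_hip", "right_hip",
    "left_knee", "right_knee", "left_ankle", "right_ankle",
]

def edge_names(edge):
    """Map a skeleton edge of 1-based keypoint indices to names."""
    a, b = edge
    return KEYPOINTS[a - 1], KEYPOINTS[b - 1]

print(edge_names([16, 14]))  # ('left_ankle', 'left_knee')
```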
&lt;p>Example&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-json" data-lang="json">&lt;span class="line">&lt;span class="cl">&lt;span class="s2">&amp;#34;categories&amp;#34;&lt;/span>&lt;span class="err">:&lt;/span> &lt;span class="p">[&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nt">&amp;#34;supercategory&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="s2">&amp;#34;person&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nt">&amp;#34;id&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="mi">1&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nt">&amp;#34;name&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="s2">&amp;#34;person&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nt">&amp;#34;keypoints&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="p">[&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s2">&amp;#34;nose&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="s2">&amp;#34;left_eye&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="s2">&amp;#34;right_eye&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="s2">&amp;#34;left_ear&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="s2">&amp;#34;right_ear&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s2">&amp;#34;left_shoulder&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="s2">&amp;#34;right_shoulder&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="s2">&amp;#34;left_elbow&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="s2">&amp;#34;right_elbow&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s2">&amp;#34;left_wrist&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="s2">&amp;#34;right_wrist&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="s2">&amp;#34;left_hip&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="s2">&amp;#34;right_hip&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s2">&amp;#34;left_knee&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="s2">&amp;#34;right_knee&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="s2">&amp;#34;left_ankle&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="s2">&amp;#34;right_ankle&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">],&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nt">&amp;#34;skeleton&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="p">[&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">[&lt;/span>&lt;span class="mi">16&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="mi">14&lt;/span>&lt;span class="p">],[&lt;/span>&lt;span class="mi">14&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="mi">12&lt;/span>&lt;span class="p">],[&lt;/span>&lt;span class="mi">17&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="mi">15&lt;/span>&lt;span class="p">],[&lt;/span>&lt;span class="mi">15&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="mi">13&lt;/span>&lt;span class="p">],[&lt;/span>&lt;span class="mi">12&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="mi">13&lt;/span>&lt;span class="p">],[&lt;/span>&lt;span class="mi">6&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="mi">12&lt;/span>&lt;span class="p">],[&lt;/span>&lt;span class="mi">7&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="mi">13&lt;/span>&lt;span class="p">],[&lt;/span>&lt;span class="mi">6&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="mi">7&lt;/span>&lt;span class="p">],&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">[&lt;/span>&lt;span class="mi">6&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="mi">8&lt;/span>&lt;span class="p">],[&lt;/span>&lt;span class="mi">7&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="mi">9&lt;/span>&lt;span class="p">],[&lt;/span>&lt;span class="mi">8&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="mi">10&lt;/span>&lt;span class="p">],[&lt;/span>&lt;span class="mi">9&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="mi">11&lt;/span>&lt;span class="p">],[&lt;/span>&lt;span class="mi">2&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="mi">3&lt;/span>&lt;span class="p">],[&lt;/span>&lt;span class="mi">1&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="mi">2&lt;/span>&lt;span class="p">],[&lt;/span>&lt;span class="mi">1&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="mi">3&lt;/span>&lt;span class="p">],[&lt;/span>&lt;span class="mi">2&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="mi">4&lt;/span>&lt;span class="p">],[&lt;/span>&lt;span class="mi">3&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="mi">5&lt;/span>&lt;span class="p">],[&lt;/span>&lt;span class="mi">4&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="mi">6&lt;/span>&lt;span class="p">],[&lt;/span>&lt;span class="mi">5&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="mi">7&lt;/span>&lt;span class="p">]&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">]&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">]&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Visualization: see &lt;a href="https://github.com/cocodataset/cocoapi/blob/master/PythonAPI/pycocoDemo.ipynb">pycocoDemo.ipynb&lt;/a>&lt;/p>
&lt;h2 id="mpii">MPII&lt;/h2>
&lt;ul>
&lt;li>A state-of-the-art benchmark for evaluation of articulated human pose estimation.&lt;/li>
&lt;li>Includes around &lt;strong>25K images&lt;/strong> containing over &lt;strong>40K people&lt;/strong> with annotated body joints. The images were systematically collected using an established taxonomy of everyday human activities.&lt;/li>
&lt;li>Overall the dataset covers &lt;strong>410 human activities&lt;/strong> and each image is provided with an activity label. Each image was extracted from a YouTube video and provided with preceding and following un-annotated frames.&lt;/li>
&lt;/ul>
&lt;h3 id="keypoints">Keypoints&lt;/h3>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>Id&lt;/th>
&lt;th>Name&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>0&lt;/td>
&lt;td>r ankle&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>1&lt;/td>
&lt;td>r knee&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>2&lt;/td>
&lt;td>r hip&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>3&lt;/td>
&lt;td>l hip&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>4&lt;/td>
&lt;td>l knee&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>5&lt;/td>
&lt;td>l ankle&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>6&lt;/td>
&lt;td>pelvis&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>7&lt;/td>
&lt;td>thorax&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>8&lt;/td>
&lt;td>upper neck&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>9&lt;/td>
&lt;td>head top&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>10&lt;/td>
&lt;td>r wrist&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>11&lt;/td>
&lt;td>r elbow&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>12&lt;/td>
&lt;td>r shoulder&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>13&lt;/td>
&lt;td>l shoulder&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>14&lt;/td>
&lt;td>l elbow&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>15&lt;/td>
&lt;td>l wrist&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/fZbgd1Z10FSRkSsfO21cc3PQlfFIcPs0rCODw12YGKG1-OowzsHg6vy0i7MyeDbpaNgjWiXKAvFr44KnIsDFhdItus9VRl5yrpahpx0gDg7mx7zvhdQmwZtzK0n-fxoHYhQMSy7_.png" alt="img" style="zoom:67%;" />
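&lt;p>The table above can be turned into a lookup dictionary. The left/right mirror pairs below are an assumption about how the ids are typically used for horizontal-flip augmentation, not part of the MPII release itself:&lt;/p>

```python
# MPII joint ids (from the table above) plus assumed left/right
# mirror pairs for horizontal-flip augmentation.
MPII_JOINTS = {
    0: "r ankle", 1: "r knee", 2: "r hip",
    3: "l hip", 4: "l knee", 5: "l ankle",
    6: "pelvis", 7: "thorax", 8: "upper neck", 9: "head top",
    10: "r wrist", 11: "r elbow", 12: "r shoulder",
    13: "l shoulder", 14: "l elbow", 15: "l wrist",
}

FLIP_PAIRS = [(0, 5), (1, 4), (2, 3), (10, 15), (11, 14), (12, 13)]

def flipped_id(joint_id):
    """Id of the mirrored joint; central joints map to themselves."""
    for a, b in FLIP_PAIRS:
        if joint_id == a:
            return b
        if joint_id == b:
            return a
    return joint_id

print(MPII_JOINTS[flipped_id(0)])  # l ankle
print(MPII_JOINTS[flipped_id(6)])  # pelvis
```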
&lt;h2 id="posetrack">PoseTrack&lt;/h2>
&lt;p>&lt;a href="www.posetrack.net">PoseTrack&lt;/a> is a large-scale benchmark for human pose estimation and tracking in image
sequences. It provides a publicly available training and validation set as well as an evaluation server for benchmarking on a held-out test set.&lt;/p>
&lt;h3 id="tasks">Tasks&lt;/h3>
&lt;h4 id="single-frame-pose-estimation">Single-frame Pose Estimation&lt;/h4>
&lt;ul>
&lt;li>The aim of this task is to perform multi-person human pose estimation in single frames.&lt;/li>
&lt;li>It is similar to the tasks covered by existing datasets like &amp;ldquo;MPII Human Pose&amp;rdquo; and the MS COCO Keypoints Challenge.&lt;/li>
&lt;li>Note that this scenario assumes that body poses are estimated independently in each frame.&lt;/li>
&lt;li>&lt;strong>Evaluation:&lt;/strong> The evaluation is performed using the standard &lt;strong>mean Average Precision (mAP)&lt;/strong> metric.&lt;/li>
&lt;/ul>
&lt;h4 id="articulated-people-tracking">Articulated People Tracking&lt;/h4>
&lt;ul>
&lt;li>This task requires providing temporally consistent poses for all people visible in the video. In addition to estimating each person&amp;rsquo;s pose, it is thus also necessary to track their body joints.&lt;/li>
&lt;li>&lt;strong>Evaluation:&lt;/strong> The evaluation will include both pose estimation accuracy as well as pose tracking accuracy.
&lt;ul>
&lt;li>The pose estimation accuracy is evaluated using the standard &lt;strong>mAP&lt;/strong> metric&lt;/li>
&lt;li>The evaluation of pose tracking is according to the &lt;strong>&lt;a href="https://cvhci.anthropomatik.kit.edu/images/stories/msmmi/papers/eurasip2008.pdf">CLEAR MOT&lt;/a>&lt;/strong> metrics, the &lt;em>de-facto&lt;/em> standard for evaluation of multi-target tracking.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>Trajectory-based measures are also evaluated: the number of mostly tracked (MT) and mostly lost (ML) tracks, and the number of times a ground-truth trajectory is fragmented (FM).&lt;/li>
&lt;/ul>
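&lt;p>As a rough sketch of the accuracy score (MOTA) from the CLEAR MOT metrics: it is one minus the ratio of tracking errors (misses, false positives, identity switches) to the total number of ground-truth objects over all frames. This is a simplification; the full metrics also define MOTP and a per-frame matching procedure.&lt;/p>

```python
# Simplified MOTA from the CLEAR MOT metrics. All arguments are totals
# accumulated over the whole sequence; the matching of predictions to
# ground truth (which produces these counts) is omitted here.

def mota(misses, false_positives, id_switches, num_gt):
    return 1.0 - (misses + false_positives + id_switches) / num_gt

# Hypothetical totals for a short sequence:
print(round(mota(10, 5, 2, 100), 2))  # 0.83
```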
&lt;h3 id="annotations-1">Annotations&lt;/h3>
&lt;ul>
&lt;li>Each person is labeled with a &lt;strong>head bounding box&lt;/strong> and &lt;strong>positions of the body joints&lt;/strong>.&lt;/li>
&lt;li>Annotations of people in dense crowds are omitted, and in some cases people in upright standing poses are also skipped.&lt;/li>
&lt;li>Ignore regions specify which people in the image were ignored during annotation.&lt;/li>
&lt;li>Each sequence included in the PoseTrack benchmark corresponds to about &lt;strong>5 seconds of video&lt;/strong>. The number of frames per sequence may vary, as different videos were recorded at different frame rates (FPS).
&lt;ul>
&lt;li>&lt;strong>Training&lt;/strong> sequences: annotations for 30 consecutive frames centered in the middle of the sequence&lt;/li>
&lt;li>&lt;strong>Validation and test&lt;/strong> sequences: 30 consecutive frames are annotated, and in addition every 4th frame of the sequence&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
&lt;h3 id="annotation-format">Annotation Format&lt;/h3>
&lt;p>The file format of PoseTrack 2018 is based on the Microsoft COCO dataset annotation format.&lt;/p>
&lt;h4 id="json-dictionary-structure">&lt;code>.json&lt;/code> Dictionary Structure&lt;/h4>
&lt;p>At the top level, each &lt;code>.json&lt;/code> file stores a dictionary with three elements:&lt;/p>
&lt;ul>
&lt;li>
&lt;p>&lt;strong>images&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>
&lt;p>A list of described images. The list must contain the information for all images referenced by a person description in the file.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Each list element is a dictionary and must contain at least the following two fields&lt;/p>
&lt;ul>
&lt;li>
&lt;p>&lt;code>file_name&lt;/code>: must refer to the original PoseTrack image&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;code>id&lt;/code> (unique int)&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Example&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-json" data-lang="json">&lt;span class="line">&lt;span class="cl">&lt;span class="err">has_no_densepose:&lt;/span>&lt;span class="kc">true&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="err">is_labeled:&lt;/span>&lt;span class="kc">true&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="err">file_name:&lt;/span>&lt;span class="s2">&amp;#34;images/val/000342_mpii_test/000000.jpg&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="err">nframes:&lt;/span>&lt;span class="mi">100&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="err">frame_id:&lt;/span>&lt;span class="mi">10003420000&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="err">vid_id:&lt;/span>&lt;span class="s2">&amp;#34;000342&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="err">id:&lt;/span>&lt;span class="mi">10003420000&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>annotations&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>
&lt;p>Another list of dictionaries&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Each item of the list describes one detected person and is itself a dictionary. It must have at least the following fields:&lt;/p>
&lt;ul>
&lt;li>
&lt;p>&lt;code>image_id&lt;/code>: int, an image with a corresponding id must be in &lt;code>images&lt;/code>,&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;code>track_id&lt;/code>&lt;/p>
&lt;ul>
&lt;li>int, the id of the track this person belongs to&lt;/li>
&lt;li>unique per frame&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>&lt;code>keypoints&lt;/code>: list of floats, with length three times the number of estimated keypoints, in the order &lt;code>x, y, ?&lt;/code> for every point. (The third value per keypoint is only present for COCO format consistency and is not used.)&lt;/p>
&lt;ul>
&lt;li>
&lt;p>Example&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-json" data-lang="json">&lt;span class="line">&lt;span class="cl">&lt;span class="err">bbox_head:&lt;/span> &lt;span class="p">[]&lt;/span> &lt;span class="err">#&lt;/span> &lt;span class="mi">4&lt;/span> &lt;span class="err">items&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="err">keypoints:&lt;/span> &lt;span class="p">[]&lt;/span> &lt;span class="err">#&lt;/span> &lt;span class="mi">51&lt;/span> &lt;span class="err">items&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="err">track_id:&lt;/span> &lt;span class="mi">0&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="err">image_id:&lt;/span> &lt;span class="mi">10003420000&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="err">bbox:&lt;/span> &lt;span class="p">[]&lt;/span> &lt;span class="err">#&lt;/span> &lt;span class="mi">4&lt;/span> &lt;span class="err">items&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="err">scores:&lt;/span> &lt;span class="p">[]&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="err">category_id:&lt;/span> &lt;span class="mi">1&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="err">id:&lt;/span> &lt;span class="mi">1000342000000&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>&lt;code>scores&lt;/code>&lt;/p>
&lt;ul>
&lt;li>list of floats, with length equal to the number of estimated keypoints&lt;/li>
&lt;li>each value between 0 and 1, providing a prediction confidence for the corresponding keypoint&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>categories&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>
&lt;p>Must be a list containing precisely one item, describing the person structure&lt;/p>
&lt;/li>
&lt;li>
&lt;p>The dictionary must contain&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;code>name: person&lt;/code>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;code>keypoints&lt;/code>: a list of strings which must be a superset of [&lt;code>nose&lt;/code>, &lt;code>upper_neck&lt;/code>, &lt;code>head_top&lt;/code>, &lt;code>left_shoulder&lt;/code>,&lt;code>right_shoulder&lt;/code>, &lt;code>left_elbow&lt;/code>, &lt;code>right_elbow&lt;/code>, &lt;code>left_wrist&lt;/code>, &lt;code>right_wrist&lt;/code>,&lt;code>left_hip&lt;/code>, &lt;code>right_hip&lt;/code>, &lt;code>left_knee&lt;/code>, &lt;code>right_knee&lt;/code>, &lt;code>left_ankle&lt;/code>,&lt;code>right_ankle&lt;/code>]. (The order may be arbitrary.)&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Example&lt;/p>
&lt;/li>
&lt;/ul>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-json" data-lang="json">&lt;span class="line">&lt;span class="cl">&lt;span class="err">supercategory:&lt;/span> &lt;span class="s2">&amp;#34;person&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="err">id:&lt;/span> &lt;span class="mi">1&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="err">name:&lt;/span> &lt;span class="s2">&amp;#34;person&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="err">keypoints:&lt;/span> &lt;span class="p">[]&lt;/span> &lt;span class="err">#&lt;/span> &lt;span class="mi">17&lt;/span> &lt;span class="err">items&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="err">skeleton:&lt;/span> &lt;span class="p">[]&lt;/span> &lt;span class="err">#&lt;/span> &lt;span class="mi">19&lt;/span> &lt;span class="err">items&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;/li>
&lt;/ul>
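&lt;p>Putting the three top-level elements together, a minimal structural check for a PoseTrack-style file might look as follows. Field names follow the description above; this is an unofficial sketch, not a validator from the PoseTrack toolkit:&lt;/p>

```python
# Unofficial sanity check of a PoseTrack-style top-level dictionary.

def check_posetrack(d):
    assert set(d) >= {"images", "annotations", "categories"}
    # "categories" must contain precisely one item describing "person".
    assert len(d["categories"]) == 1
    assert d["categories"][0]["name"] == "person"
    image_ids = {img["id"] for img in d["images"]}
    for ann in d["annotations"]:
        assert ann["image_id"] in image_ids  # must refer to a listed image
        assert len(ann["keypoints"]) % 3 == 0  # (x, y, ?) triplets
    return True

doc = {
    "images": [{"file_name": "images/val/000342_mpii_test/000000.jpg",
                "id": 10003420000}],
    "annotations": [{"image_id": 10003420000, "track_id": 0,
                     "keypoints": [0.0] * 51}],
    "categories": [{"name": "person"}],
}
print(check_posetrack(doc))  # True
```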
&lt;h3 id="keypoints-annotations">Keypoints Annotations&lt;/h3>
&lt;p>Keypoint annotations in PoseTrack are similar to COCO keypoints, except that&lt;/p>
&lt;ul>
&lt;li>
&lt;p>&lt;code>left_eye&lt;/code> and &lt;code>right_eye&lt;/code> are changed to &lt;code>head_bottom&lt;/code> and &lt;code>head_top&lt;/code>, respectively&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Annotations for ears are excluded. (I.e., only &lt;strong>15&lt;/strong> keypoints are annotated)&lt;/p>
&lt;blockquote>
&lt;p>Note: If you look at the annotations closely, there are 51 elements in the &lt;code>keypoints&lt;/code> list (3 elements &lt;code>(x, y, v)&lt;/code> for each keypoint). In other words, the list still has slots for all 17 keypoints.&lt;/p>
&lt;p>&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/PoseTrack_keypoint_annotation.png" alt="PoseTrack_keypoint_annotation">&lt;/p>
&lt;p>To (manually) exclude &lt;code>left_ear&lt;/code> and &lt;code>right_ear&lt;/code>, elements 9 to 14, which correspond to the &lt;code>(x, y, v)&lt;/code> values of &lt;code>left_ear&lt;/code> and &lt;code>right_ear&lt;/code>, are all set to 0.&lt;/p>
&lt;p>&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/%E6%88%AA%E5%B1%8F2021-06-08%2012.08.58.png" alt="截屏2021-06-08 12.08.58">&lt;/p>
&lt;/blockquote>
&lt;/li>
&lt;/ul>
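&lt;p>The masking described in the note above can be sketched directly (hypothetical helper): in a 51-element COCO-style keypoint list, zero out elements 9 to 14, i.e. the &lt;code>(x, y, v)&lt;/code> triplets of &lt;code>left_ear&lt;/code> and &lt;code>right_ear&lt;/code>.&lt;/p>

```python
# Zero out the ear triplets (elements 9..14) of a 51-element
# COCO-style keypoint list, as PoseTrack annotations do.

def mask_ears(keypoints):
    assert len(keypoints) == 51  # 17 keypoints x 3 values
    out = list(keypoints)
    for i in range(9, 15):
        out[i] = 0
    return out

kps = list(range(51))   # dummy values 0..50
masked = mask_ears(kps)
print(masked[9:15])     # [0, 0, 0, 0, 0, 0]
print(masked[15])       # 15  (left_shoulder's x untouched)
```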
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>id&lt;/th>
&lt;th>COCO Keypoints&lt;/th>
&lt;th>PoseTrack&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>0&lt;/td>
&lt;td>nose&lt;/td>
&lt;td>nose&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>1&lt;/td>
&lt;td>left_eye&lt;/td>
&lt;td>&lt;strong>head_bottom&lt;/strong>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>2&lt;/td>
&lt;td>right_eye&lt;/td>
&lt;td>&lt;strong>head_top&lt;/strong>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>3&lt;/td>
&lt;td>left_ear&lt;/td>
&lt;td>&lt;strong>&lt;del>left_ear&lt;/del>&lt;/strong>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>4&lt;/td>
&lt;td>right_ear&lt;/td>
&lt;td>&lt;strong>&lt;del>right_ear&lt;/del>&lt;/strong>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>5&lt;/td>
&lt;td>left_shoulder&lt;/td>
&lt;td>left_shoulder&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>6&lt;/td>
&lt;td>right_shoulder&lt;/td>
&lt;td>right_shoulder&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>7&lt;/td>
&lt;td>left_elbow&lt;/td>
&lt;td>left_elbow&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>8&lt;/td>
&lt;td>right_elbow&lt;/td>
&lt;td>right_elbow&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>9&lt;/td>
&lt;td>left_wrist&lt;/td>
&lt;td>left_wrist&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>10&lt;/td>
&lt;td>right_wrist&lt;/td>
&lt;td>right_wrist&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>11&lt;/td>
&lt;td>left_hip&lt;/td>
&lt;td>left_hip&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>12&lt;/td>
&lt;td>right_hip&lt;/td>
&lt;td>right_hip&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>13&lt;/td>
&lt;td>left_knee&lt;/td>
&lt;td>left_knee&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>14&lt;/td>
&lt;td>right_knee&lt;/td>
&lt;td>right_knee&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>15&lt;/td>
&lt;td>left_ankle&lt;/td>
&lt;td>left_ankle&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>16&lt;/td>
&lt;td>right_ankle&lt;/td>
&lt;td>right_ankle&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;p>Visualization:&lt;/p>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th style="text-align:left">&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/b42-wjZiHyFx6ONjlGPlUmKiFWjdsnJqxW6dg1Bt2OkVnXz6g4Z4fPFxNSaqpT0F9OOGTWO_-aixY7B72hyr6j2dPeqKrmzmQ7tSzBF8H1dZVCabe9L-UWHUTSFrcv5mFxdv0Oee.png" alt="img" style="zoom:60%;"/>&lt;/th>
&lt;th style="text-align:right">&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/PoseTrack_visualization.png" alt="PoseTrack_visualization" style="zoom:60%;"/>&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;/table>
&lt;h2 id="reference">Reference&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>COCO&lt;/p>
&lt;ul>
&lt;li>&lt;a href="https://cocodataset.org/#format-data">Data format&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://www.immersivelimit.com/tutorials/create-coco-annotations-from-scratch/#coco-dataset-format">Create COCO Annotations From Scratch&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://zhuanlan.zhihu.com/p/29393415">COCO数据集的标注格式&lt;/a>&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="https://posetrack.net/">PoseTrack&lt;/a>&lt;/p>
&lt;/li>
&lt;/ul></description></item></channel></rss>