<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>YOLO | Haobin Tan</title><link>https://haobin-tan.netlify.app/tags/yolo/</link><atom:link href="https://haobin-tan.netlify.app/tags/yolo/index.xml" rel="self" type="application/rss+xml"/><description>YOLO</description><generator>Hugo Blox Builder (https://hugoblox.com)</generator><language>en-us</language><lastBuildDate>Wed, 02 Dec 2020 00:00:00 +0000</lastBuildDate><image><url>https://haobin-tan.netlify.app/media/icon_hu7d15bc7db65c8eaf7a4f66f5447d0b42_15095_512x512_fill_lanczos_center_3.png</url><title>YOLO</title><link>https://haobin-tan.netlify.app/tags/yolo/</link></image><item><title>You Only Look Once (YOLO)</title><link>https://haobin-tan.netlify.app/docs/ai/computer-vision/object-detection/yolo/</link><pubDate>Wed, 04 Nov 2020 00:00:00 +0000</pubDate><guid>https://haobin-tan.netlify.app/docs/ai/computer-vision/object-detection/yolo/</guid><description>&lt;p>The problem with the sliding windows method is that it does not output the most accurate bounding boxes. A good way to get more accurate bounding boxes is the &lt;strong>YOLO (You Only Look Once)&lt;/strong> algorithm.&lt;/p>
&lt;h2 id="overview-how-does-yolo-work">Overview: How does YOLO work?&lt;/h2>
&lt;p>Let&amp;rsquo;s say we have an input image (e.g. at 100x100), and we&amp;rsquo;re going to place a grid on this image. For simplicity of illustration, we&amp;rsquo;re going to use a 3x3 grid as an example.&lt;/p>
&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/截屏2020-11-05%2011.39.03.png" alt="Screenshot 2020-11-05 11.39.03" style="zoom:80%;" />
&lt;p>(In an actual implementation, we&amp;rsquo;ll use a finer grid, like 19x19)&lt;/p>
&lt;h3 id="labels-for-training">Labels for training&lt;/h3>
&lt;p>For &lt;strong>each&lt;/strong> grid cell, we specify a target label $\mathbf{y}$:
&lt;/p>
$$
\mathbf{y} = \left(
\begin{array}{c}
P\_c \\\\
b\_x \\\\
b\_y \\\\
b\_h \\\\
b\_w \\\\
c\_1 \\\\
c\_2 \\\\
\vdots \\\\
c\_n
\end{array}
\right)
\in \mathbb{R}^{5 + n}
$$
&lt;ul>
&lt;li>
&lt;p>$P\_c$: objectness&lt;/p>
&lt;ul>
&lt;li>depends on whether there&amp;rsquo;s an object in that grid cell.&lt;/li>
&lt;li>If yes, then $P\_c = 1$. else $P\_c=0$&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>Bounding box coordinates&lt;/p>
&lt;ul>
&lt;li>$b\_x, b\_y \in (0, 1)$: describe the center point of the object &lt;strong>relative&lt;/strong> to the grid cell
&lt;ul>
&lt;li>If $>1$, then the center point is outside of the current grid cell and it should be assigned to another grid cell&lt;/li>
&lt;li>Some parameterizations also use a sigmoid function to ensure $b\_x, b\_y \in (0, 1)$&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>$b\_h, b\_w$: height and width of the bounding box
&lt;ul>
&lt;li>specified as fractions of the grid cell&amp;rsquo;s height and width, respectively (can be $\geq 1$)&lt;/li>
&lt;li>Some parameterizations also use an exponential function to ensure non-negativity&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>$c\_1, c\_2, \dots, c\_n$: probabilities of the $n$ object classes we want to detect&lt;/p>
&lt;ul>
&lt;li>
&lt;p>E.g. we want to detect 3 classes of object:&lt;/p>
&lt;ul>
&lt;li>pedestrian ($c\_1$),&lt;/li>
&lt;li>car ($c\_2$),&lt;/li>
&lt;li>motorcycle ($c\_3$),&lt;/li>
&lt;/ul>
&lt;p>so our target $\mathbf{y}$ will be:
&lt;/p>
$$
\mathbf{y} = \left(
\begin{array}{c}
P\_c \\\\
b\_x \\\\
b\_y \\\\
b\_h \\\\
b\_w \\\\
c\_1 \\\\
c\_2 \\\\
c\_3
\end{array}
\right)
\in \mathbb{R}^{8}
$$
&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
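&lt;p>As a small sketch of how such a target vector can be assembled for one grid cell (using the 3-class example above; the helper function and its argument names are made up for illustration):&lt;/p>

```python
# Hypothetical helper: build the target vector
# y = (P_c, b_x, b_y, b_h, b_w, c_1, ..., c_n) for a single grid cell.
def make_cell_label(has_object, box=None, class_id=None, num_classes=3):
    if not has_object:
        # "don't care" entries; filled with zeros here, since a network
        # cannot literally output a question mark
        return [0.0] * (5 + num_classes)
    b_x, b_y, b_h, b_w = box
    classes = [0.0] * num_classes
    classes[class_id] = 1.0  # one-hot class encoding
    return [1.0, b_x, b_y, b_h, b_w] + classes

# A car (class index 1) centered at (0.4, 0.3) within its cell,
# with box height 0.9 and width 1.8 in grid-cell units:
label = make_cell_label(True, box=(0.4, 0.3, 0.9, 1.8), class_id=1)
```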
&lt;h4 id="example">Example&lt;/h4>
&lt;p>If we consider the upper left grid cell (at position $(0, 0)$)&lt;/p>
&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/截屏2020-11-05%2011.58.40.png" alt="Screenshot 2020-11-05 11.58.40" style="zoom:80%;" />
&lt;p>There&amp;rsquo;s no object in this grid cell, so $P\_c = 0$, and we don&amp;rsquo;t care about the remaining elements of $\mathbf{y}$:
&lt;/p>
$$
\mathbf{y} = \left(
\begin{array}{c}
0 \\\\
? \\\\
? \\\\
? \\\\
? \\\\
? \\\\
? \\\\
?
\end{array}
\right)
\in \mathbb{R}^{8}
$$
&lt;blockquote>
&lt;p>Here we use the symbol &lt;code>?&lt;/code> to mark &amp;ldquo;don&amp;rsquo;t care&amp;rdquo;.&lt;/p>
&lt;p>However, a neural network can&amp;rsquo;t output a question mark or a &amp;ldquo;don&amp;rsquo;t care&amp;rdquo;, so it will output some numbers for these elements. These numbers are simply ignored: since the network indicates that there&amp;rsquo;s no object in the cell, it doesn&amp;rsquo;t matter what bounding box or class it outputs there. These values are just noise.&lt;/p>
&lt;/blockquote>
&lt;p>Now, how about the grid cells in the second row?&lt;/p>
&lt;p>To give a bit more detail, this image has two objects. And what the YOLO algorithm does is&lt;/p>
&lt;ul>
&lt;li>
&lt;p>&lt;strong>it takes the midpoint of each of the two objects and then assigns the object to the grid cell containing the midpoint.&lt;/strong> So the left car is assigned to the grid cell marked with green, and the car on the right is assigned to the grid cell marked with yellow.&lt;/p>
&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/截屏2020-11-05%2012.18.07.png" alt="Screenshot 2020-11-05 12.18.07" style="zoom:80%;" />
&lt;ul>
&lt;li>For the left grid cell marked with green, the target label $\mathbf{y}$ would be as follows:
$$
\mathbf{y} = \left(
\begin{array}{c}
1 \\\\
b\_x \\\\
b\_y \\\\
b\_h \\\\
b\_w \\\\
0 \\\\
1 \\\\
0
\end{array}
\right)
$$&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>Even though the central grid cell has some parts of both cars, we&amp;rsquo;ll pretend the central grid cell has &lt;strong>no&lt;/strong> interesting object. So the class label of the central grid cell is
&lt;/p>
$$
\mathbf{y} = \left(
\begin{array}{c}
0 \\\\
? \\\\
? \\\\
? \\\\
? \\\\
? \\\\
? \\\\
?
\end{array}
\right)
$$
&lt;/li>
&lt;/ul>
&lt;p>For each of these 9 grid cells, we end up with an 8-dimensional output vector. So the total target output volume is $(3 \times 3) \times 8$.&lt;/p>
&lt;p>&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/yolo1.png" alt="yolo1">&lt;/p>
&lt;p>&lt;strong>Generally speaking, assuming that we have $n \times n$ grid cells, and we want to detect $C$ classes of objects, then the target output volume will be $(n \times n) \times (5 + C)$.&lt;/strong>&lt;/p>
&lt;h3 id="training">Training&lt;/h3>
&lt;p>To train our neural network, the input is $100 \times 100 \times 3$. We then use a usual ConvNet with conv layers, max pooling layers, and so on, which eventually maps to a $3 \times 3 \times 8$ output volume. So we have an input image $X$ and target labels $\mathbf{y}$ of shape $3 \times 3 \times 8$, and we use backpropagation to train the neural network to map any input $X$ to this type of output volume $\mathbf{y}$.&lt;/p>
&lt;h3 id="thumbsup-advantages">&amp;#x1f44d; Advantages&lt;/h3>
&lt;ul>
&lt;li>
&lt;p>The neural network outputs precise bounding boxes &amp;#x1f44f;&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Efficient and fast thanks to convolution operations &amp;#x1f44f;&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="intersection-over-union-iou">Intersection over Union (IoU)&lt;/h2>
&lt;p>How can we tell whether our object detection algorithm is working well?&lt;/p>
&lt;p>The &lt;strong>Intersection-over-Union (IoU)&lt;/strong>, aka Jaccard Index or Jaccard Overlap, measures the degree to which two boxes overlap.&lt;/p>
&lt;figure>&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/IoU.jpg">&lt;figcaption>
&lt;h4>Intersection over Union (IoU). Src: [a-PyTorch-Tutorial-to-Object-Detection](https://github.com/sgrvinod/a-PyTorch-Tutorial-to-Object-Detection)&lt;/h4>
&lt;/figcaption>
&lt;/figure>
&lt;p>In object detection:
&lt;/p>
$$
\text{IoU} = \frac{\text{Overlapping region between ground truth and prediction bounding box}}{\text{Combined region of ground truth and prediction bounding box}}
$$
&lt;p>
If $\text{IoU} \geq \text{threshold}$, we would say the prediction is correct.&lt;/p>
&lt;p>By convention, $\text{threshold} = 0.5$. We can also choose another value greater than 0.5.&lt;/p>
&lt;p>Example:&lt;/p>
&lt;figure>&lt;img src="https://media5.datahacker.rs/2018/11/IoU.png">&lt;figcaption>
&lt;h4>IoU example. Src: [026 CNN Intersection over Union | Master Data Science](https://www.google.com/url?sa=i&amp;amp;url=http%3A%2F%2Fdatahacker.rs%2Fdeep-learning-intersection-over-union%2F&amp;amp;psig=AOvVaw2K4pvRAkwPw3FZYIelxngf&amp;amp;ust=1604671149058000&amp;amp;source=images&amp;amp;cd=vfe&amp;amp;ved=0CA0QjhxqFwoTCIjNgoLI6-wCFQAAAAAdAAAAABAg)&lt;/h4>
&lt;/figcaption>
&lt;/figure>
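&lt;p>The formula above translates directly into code. A minimal sketch, assuming (for illustration) that boxes are given as corner coordinates (x_min, y_min, x_max, y_max):&lt;/p>

```python
def iou(box_a, box_b):
    # Boxes given as (x_min, y_min, x_max, y_max).
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Overlapping region (zero if the boxes do not intersect)
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    # Combined region = area A + area B - overlap
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

overlap = iou((0, 0, 2, 2), (1, 1, 3, 3))  # overlap 1, union 4 + 4 - 1 = 7
```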
&lt;h2 id="non-max-suppresion">Non-max suppresion&lt;/h2>
&lt;p>One problem with YOLO as described so far is that it can detect the same object multiple times.&lt;/p>
&lt;p>For example:&lt;/p>
&lt;figure>&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/non-max-suppression.png">&lt;figcaption>
&lt;h4>Each car has two or more detections with different probabilities. The reason is that several grid cells think they contain the center point of the object. Src: [a-PyTorch-Tutorial-to-Object-Detection](https://github.com/sgrvinod/a-PyTorch-Tutorial-to-Object-Detection)&lt;/h4>
&lt;/figcaption>
&lt;/figure>
&lt;p>&lt;strong>Non-max Suppression&lt;/strong> is a way to make sure that YOLO detects each object just once. It cleans up redundant detections, so we end up with just one detection per object rather than multiple detections per object.&lt;/p>
&lt;ol>
&lt;li>Take the detection with the largest $P\_c$ (the probability of a detection) &lt;em>(&amp;ldquo;That&amp;rsquo;s my most confident detection, so let&amp;rsquo;s highlight that and just say I found the car there.&amp;rdquo;)&lt;/em>&lt;/li>
&lt;li>Look at all of the remaining rectangles, and suppress/darken/discard all the ones with a high overlap (i.e. a high IoU) with it&lt;/li>
&lt;/ol>
&lt;p>Example:&lt;/p>
&lt;figure>&lt;img src="https://www.jeremyjordan.me/content/images/2018/07/Screen-Shot-2018-07-10-at-9.46.29-PM.png">&lt;figcaption>
&lt;h4>Non-max suppression example. Src: [An overview of object detection: one-stage methods.](https://www.jeremyjordan.me/object-detection-one-stage/)&lt;/h4>
&lt;/figcaption>
&lt;/figure>
&lt;p>For multi-class detection, non-max suppression should be carried out &lt;strong>on each class separately&lt;/strong>.&lt;/p>
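&lt;p>The two steps can be sketched as follows, applied to the detections of a single class (the 0.5 IoU threshold is a common default, not a fixed rule; detections are assumed here to be $(P\_c, \text{box})$ pairs with corner-coordinate boxes):&lt;/p>

```python
def nms(detections, iou_threshold=0.5):
    # detections: list of (P_c, (x_min, y_min, x_max, y_max)) for ONE class.
    def iou(a, b):
        iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
        ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
        inter = iw * ih
        union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
        return inter / union if union > 0 else 0.0

    remaining = sorted(detections, key=lambda d: d[0], reverse=True)
    kept = []
    while remaining:
        best = remaining.pop(0)  # most confident remaining detection
        kept.append(best)
        survivors = []
        for d in remaining:
            if iou(d[1], best[1]) >= iou_threshold:
                continue  # high overlap with the kept box: suppress it
            survivors.append(d)
        remaining = survivors
    return kept

# Two duplicate detections of one car plus a detection of another car:
dets = [(0.9, (0, 0, 2, 2)), (0.8, (0, 0, 2, 2)), (0.7, (5, 5, 6, 6))]
kept = nms(dets)  # keeps the 0.9 and 0.7 detections
```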
&lt;h2 id="anchor-box">Anchor box&lt;/h2>
&lt;p>One of the problems with object detection as we have seen it so far is that &lt;strong>each of the grid cells can detect only one object&lt;/strong>. What if a grid cell wants to detect multiple objects?&lt;/p>
&lt;p>For example: we want to detect 3 classes (pedestrians, cars, motorcycles), and our input image looks like this:&lt;/p>
&lt;p>&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/%E6%88%AA%E5%B1%8F2020-11-05%2015.24.05.png" alt="Screenshot 2020-11-05 15.24.05">&lt;/p>
&lt;p>The midpoint of the pedestrian and the midpoint of the car are in almost the same place, and both of them fall into the same grid cell. If we use the output vector
&lt;/p>
$$
\mathbf{y} = \left(
\begin{array}{c}
P\_c \\\\
b\_x \\\\
b\_y \\\\
b\_h \\\\
b\_w \\\\
c\_1 \\\\
c\_2 \\\\
c\_3
\end{array}
\right)
$$
&lt;p>
we have seen before, the grid cell won&amp;rsquo;t be able to output two detections &amp;#x1f622;.&lt;/p>
&lt;p>With the idea of &lt;strong>anchor boxes&lt;/strong>, we are going to&lt;/p>
&lt;ul>
&lt;li>pre-define a number of different shapes of anchor boxes (in this example, just 2)&lt;/li>
&lt;/ul>
&lt;p>&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/anchor-box.png" alt="anchor-box">&lt;/p>
&lt;ul>
&lt;li>
&lt;p>and associate them with the class labels
&lt;/p>
$$
\mathbf{y} = \left(\underbrace{P\_c, b\_x, b\_y, b\_h, b\_w, c\_1, c\_2, c\_3}\_{\text{anchor box 1}} , \underbrace{P\_c, b\_x, b\_y, b\_h, b\_w, c\_1, c\_2, c\_3}\_{\text{anchor box 2}}\right)^T \in \mathbb{R}^{16}
$$
&lt;ul>
&lt;li>Because the shape of the pedestrian is more similar to the shape of anchor box 1 than to anchor box 2, we use the first eight numbers to encode the pedestrian.&lt;/li>
&lt;li>Because the box around the car is more similar to the shape of anchor box 2 than to anchor box 1, we use the second eight numbers to encode the car.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
&lt;p>&lt;strong>To summarise, with a number of pre-defined anchor boxes, each object in the training image is assigned to&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>&lt;strong>the grid cell that contains the object&amp;rsquo;s midpoint, and&lt;/strong>&lt;/li>
&lt;li>&lt;strong>the anchor box of that grid cell with the highest IoU with the ground-truth bounding box&lt;/strong>&lt;/li>
&lt;/ul>
&lt;p>&lt;strong>In other words, now the object is assigned to a $(\text{grid cell}, \text{anchor box})$ pair.&lt;/strong>&lt;/p>
&lt;div class="flex px-4 py-3 mb-6 rounded-md bg-primary-100 dark:bg-primary-900">
&lt;span class="pr-3 pt-1 text-primary-600 dark:text-primary-300">
&lt;svg height="24" xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24">&lt;path fill="none" stroke="currentColor" stroke-linecap="round" stroke-linejoin="round" stroke-width="1.5" d="m11.25 11.25l.041-.02a.75.75 0 0 1 1.063.852l-.708 2.836a.75.75 0 0 0 1.063.853l.041-.021M21 12a9 9 0 1 1-18 0a9 9 0 0 1 18 0m-9-3.75h.008v.008H12z"/>&lt;/svg>
&lt;/span>
&lt;span class="dark:text-neutral-300">&lt;p>If&lt;/p>
&lt;ul>
&lt;li>we have pre-defined $B$ anchor boxes of different shapes&lt;/li>
&lt;li>the image is divided into an $n \times n$ grid&lt;/li>
&lt;li>we want to detect $C$ classes of objects&lt;/li>
&lt;/ul>
&lt;p>Then the output volume will be
&lt;/p>
$$
(n \times n) \times B(5 + C)
$$
&lt;/span>
&lt;/div>
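&lt;p>As a quick sanity check of this formula on the running example (a $3 \times 3$ grid, $B = 2$ anchor boxes, $C = 3$ classes), here is a small sketch (the function name is made up):&lt;/p>

```python
def output_volume(n, num_anchors, num_classes):
    # (n x n) grid cells, each predicting num_anchors boxes of
    # (P_c, b_x, b_y, b_h, b_w) plus num_classes class probabilities.
    return (n, n, num_anchors * (5 + num_classes))

shape = output_volume(3, 2, 3)  # (3, 3, 16) for the running example
```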
&lt;h3 id="how-to-choose-the-anchor-boxes">How to choose the anchor boxes?&lt;/h3>
&lt;ul>
&lt;li>People used to just choose them &lt;strong>by hand&lt;/strong>, or choose maybe 5 or 10 anchor box shapes that span a variety of shapes covering the types of objects to detect&lt;/li>
&lt;li>A better way is to use a &lt;strong>K-means&lt;/strong> algorithm to cluster the object shapes that tend to occur in the training data (introduced in a later YOLO research paper)&lt;/li>
&lt;/ul>
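&lt;p>A rough sketch of the clustering idea, using plain k-means on (width, height) pairs with Euclidean distance for brevity (the YOLO papers use an IoU-based distance instead; the function name is made up):&lt;/p>

```python
import random

def kmeans_anchors(box_sizes, k, iters=100, seed=0):
    # box_sizes: list of (width, height) of ground-truth boxes.
    # Returns k (width, height) anchor shapes.
    rng = random.Random(seed)
    centers = rng.sample(box_sizes, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for w, h in box_sizes:
            # assign each box to the nearest center
            dists = [(w - cw) ** 2 + (h - ch) ** 2 for cw, ch in centers]
            clusters[dists.index(min(dists))].append((w, h))
        # move each center to the mean of its cluster
        for i, cl in enumerate(clusters):
            if cl:
                centers[i] = (sum(w for w, _ in cl) / len(cl),
                              sum(h for _, h in cl) / len(cl))
    return centers

# Two obvious groups of box shapes (roughly 1x1 and 4x4):
boxes = [(1.0, 1.0), (1.1, 0.9), (0.9, 1.1), (4.0, 4.0), (4.2, 3.8), (3.8, 4.2)]
anchors = kmeans_anchors(boxes, k=2)
```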
&lt;h2 id="putting-them-all-together">Putting them all together&lt;/h2>
&lt;p>Suppose we&amp;rsquo;re trying to train a model to detect three classes of objects:&lt;/p>
&lt;ul>
&lt;li>pedestrians&lt;/li>
&lt;li>cars&lt;/li>
&lt;li>motorcycles&lt;/li>
&lt;/ul>
&lt;p>And the input image looks like this:&lt;/p>
&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/截屏2020-11-05%2016.07.04.png" alt="Screenshot 2020-11-05 16.07.04" style="zoom:80%;" />
&lt;p>Suppose we have pre-defined two anchor boxes of different shapes&lt;/p>
&lt;p>&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/anchor-box.png" alt="anchor-box">&lt;/p>
&lt;p>Anchor box 2 has a higher IoU with the ground-truth bounding box of the car, so:&lt;/p>
&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/YOLO.png" alt="YOLO" style="zoom:80%;" />
&lt;p>The final output volume is $3 \times 3 \times 2 \times 8$&lt;/p>
&lt;h3 id="making-predictions">Making predictions&lt;/h3>
&lt;p>&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/%E6%88%AA%E5%B1%8F2020-11-05%2016.13.12.png" alt="Screenshot 2020-11-05 16.13.12">&lt;/p>
&lt;h3 id="outputing-the-non-max-supressed-outputs">Outputing the non-max supressed outputs&lt;/h3>
&lt;p>Let&amp;rsquo;s look at a new input image,&lt;/p>
&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/截屏2020-11-05%2016.17.32-20201105162152696.png" alt="Screenshot 2020-11-05 16.17.32" style="zoom:67%;" />
&lt;p>and suppose that we still use 2 pre-defined anchor boxes for detecting pedestrians, cars, and motorcycles.&lt;/p>
&lt;ol>
&lt;li>
&lt;p>For each grid cell, get 2 predicted bounding boxes. Notice that some of the bounding boxes can go outside the height and width of the grid cell that they came from&lt;/p>
&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/截屏2020-11-05%2016.17.41.png" alt="Screenshot 2020-11-05 16.17.41" style="zoom:67%;" />
&lt;/li>
&lt;li>
&lt;p>Get rid of low probability predictions&lt;/p>
&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/截屏2020-11-05%2016.18.48.png" alt="Screenshot 2020-11-05 16.18.48" style="zoom: 67%;" />
&lt;/li>
&lt;li>
&lt;p>For each class, use non-max suppression to generate final predictions. And so the output of this is hopefully that we will have detected all the cars and all the pedestrians in this image.&lt;/p>
&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/截屏2020-11-05%2016.24.29-20201105162519547.png" alt="Screenshot 2020-11-05 16.24.29" style="zoom:67%;" />
&lt;/li>
&lt;/ol>
&lt;h2 id="reference">Reference&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="https://www.coursera.org/learn/convolutional-neural-networks">Convolutional Neural Network, &lt;em>Andrew Ng&lt;/em>&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="https://www.jeremyjordan.me/object-detection-one-stage/">An overview of object detection: one-stage methods.&lt;/a>&lt;/p>
&lt;/li>
&lt;/ul></description></item><item><title>Annotation Conversion: COCO JSON to YOLO Txt</title><link>https://haobin-tan.netlify.app/docs/ai/computer-vision/object-detection/coco-json-to-yolo-txt/</link><pubDate>Wed, 02 Dec 2020 00:00:00 +0000</pubDate><guid>https://haobin-tan.netlify.app/docs/ai/computer-vision/object-detection/coco-json-to-yolo-txt/</guid><description>&lt;h2 id="bounding-box-formats-comparison-and-conversion">Bounding box formats comparison and conversion&lt;/h2>
&lt;p>In COCO JSON, the format of a bounding box is:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-json" data-lang="json">&lt;span class="line">&lt;span class="cl">&lt;span class="s2">&amp;#34;bbox&amp;#34;&lt;/span>&lt;span class="err">:&lt;/span> &lt;span class="p">[&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="err">&amp;lt;absolute_x_top_left&amp;gt;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="err">&amp;lt;absolute_y_top_left&amp;gt;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="err">&amp;lt;absolute_width&amp;gt;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="err">&amp;lt;absolute_height&amp;gt;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">]&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>However, the annotation is different in YOLO. For each &lt;code>.jpg&lt;/code> image, there&amp;rsquo;s a &lt;code>.txt&lt;/code> file (in the same directory and with the same name, but with &lt;code>.txt&lt;/code>-extension). This &lt;code>.txt&lt;/code> file holds the objects and their bounding boxes in this image (one line for each object), in the following format &lt;sup id="fnref:1">&lt;a href="#fn:1" class="footnote-ref" role="doc-noteref">1&lt;/a>&lt;/sup>:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-fallback" data-lang="fallback">&lt;span class="line">&lt;span class="cl">&amp;lt;object-class&amp;gt; &amp;lt;relative_x_center&amp;gt; &amp;lt;relative_y_center&amp;gt; &amp;lt;relative_width&amp;gt; &amp;lt;relative_height&amp;gt;
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;ul>
&lt;li>
&lt;p>&lt;code>&amp;lt;object-class&amp;gt;&lt;/code> : integer number of object from &lt;strong>&lt;code>0&lt;/code> to &lt;code>(classes-1)&lt;/code>&lt;/strong>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;code>&amp;lt;relative_x_center&amp;gt; &amp;lt;relative_y_center&amp;gt; &amp;lt;relative_width&amp;gt; &amp;lt;relative_height&amp;gt;&lt;/code>&lt;/p>
&lt;p>float values relative to the width and height of the image, in the range (0.0, 1.0]&lt;/p>
&lt;/li>
&lt;/ul>
&lt;p>For example, for &lt;code>img1.jpg&lt;/code> there should be &lt;code>img1.txt&lt;/code> containing something like the following:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-fallback" data-lang="fallback">&lt;span class="line">&lt;span class="cl">1 0.716797 0.395833 0.216406 0.147222
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">0 0.687109 0.379167 0.255469 0.158333
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">2 0.420312 0.395833 0.140625 0.166667
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>The following figure illustrates the difference of bounding box annotation between COCO and YOLO:&lt;/p>
&lt;figure>&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/annotation-convertion-COCO-and-YOLO.png">&lt;figcaption>
&lt;h4>Bounding box format: COCO vs YOLO&lt;/h4>
&lt;/figcaption>
&lt;/figure>
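&lt;p>Such a YOLO label file can be parsed line by line; a minimal sketch (the function name is made up):&lt;/p>

```python
def parse_yolo_label_file(text):
    # Parse the contents of a YOLO .txt label file:
    # one "class x_center y_center width height" line per object.
    objects = []
    for line in text.strip().splitlines():
        parts = line.split()
        class_id = int(parts[0])
        x, y, w, h = (float(v) for v in parts[1:])
        objects.append((class_id, x, y, w, h))
    return objects

# Example using the first two lines of the img1.txt contents shown above:
labels = parse_yolo_label_file("1 0.716797 0.395833 0.216406 0.147222\n"
                               "0 0.687109 0.379167 0.255469 0.158333\n")
```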
&lt;p>Convert the bounding box annotation format from COCO to YOLO:
&lt;/p>
$$
\begin{array}{ll}
x\_{yolo} &amp;= (x\_{coco} + \frac{w\_{coco}}{2}) / w\_{img} \\\\
y\_{yolo} &amp;= (y\_{coco} + \frac{h\_{coco}}{2}) / h\_{img} \\\\
w\_{yolo} &amp;= w\_{coco} / w\_{img} \\\\
h\_{yolo} &amp;= h\_{coco} / h\_{img}
\end{array}
$$
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="k">def&lt;/span> &lt;span class="nf">convert_bbox_coco2yolo&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">img_width&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">img_height&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">bbox&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="s2">&amp;#34;&amp;#34;&amp;#34;
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="s2"> Convert bounding box from COCO format to YOLO format
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="s2">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="s2"> Parameters
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="s2"> ----------
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="s2"> img_width : int
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="s2"> width of image
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="s2"> img_height : int
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="s2"> height of image
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="s2"> bbox : list[int]
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="s2"> bounding box annotation in COCO format:
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="s2"> [top left x position, top left y position, width, height]
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="s2">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="s2"> Returns
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="s2"> -------
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="s2"> list[float]
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="s2"> bounding box annotation in YOLO format:
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="s2"> [x_center_rel, y_center_rel, width_rel, height_rel]
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="s2"> &amp;#34;&amp;#34;&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># YOLO bounding box format: [x_center, y_center, width, height]&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># (float values relative to width and height of image)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">x_tl&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">y_tl&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">w&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">h&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">bbox&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">dw&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="mf">1.0&lt;/span> &lt;span class="o">/&lt;/span> &lt;span class="n">img_width&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">dh&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="mf">1.0&lt;/span> &lt;span class="o">/&lt;/span> &lt;span class="n">img_height&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">x_center&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">x_tl&lt;/span> &lt;span class="o">+&lt;/span> &lt;span class="n">w&lt;/span> &lt;span class="o">/&lt;/span> &lt;span class="mf">2.0&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">y_center&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">y_tl&lt;/span> &lt;span class="o">+&lt;/span> &lt;span class="n">h&lt;/span> &lt;span class="o">/&lt;/span> &lt;span class="mf">2.0&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">x&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">x_center&lt;/span> &lt;span class="o">*&lt;/span> &lt;span class="n">dw&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">y&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">y_center&lt;/span> &lt;span class="o">*&lt;/span> &lt;span class="n">dh&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">w&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">w&lt;/span> &lt;span class="o">*&lt;/span> &lt;span class="n">dw&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">h&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">h&lt;/span> &lt;span class="o">*&lt;/span> &lt;span class="n">dh&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="p">[&lt;/span>&lt;span class="n">x&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">y&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">w&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">h&lt;/span>&lt;span class="p">]&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h2 id="convert-coco-json-to-yolo-txt">Convert COCO JSON to YOLO txt&lt;/h2>
&lt;p>The structure of training set in COCO format is:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-fallback" data-lang="fallback">&lt;span class="line">&lt;span class="cl">- train
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> |- _annotations.coco.json
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> |- img_001.jpg
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> |- img_002.jpg
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> |- img_003.jpg
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> ...
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>&lt;code>_annotations.coco.json&lt;/code> contains all information about the dataset, images, and annotations. (More see: &lt;a href="https://haobin-tan.netlify.app/docs/ai/computer-vision/object-detection/coco-dataset-format/">COCO JSON Format for Object Detection&lt;/a>)&lt;/p>
&lt;p>The structure of training set in YOLO format is:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-fallback" data-lang="fallback">&lt;span class="line">&lt;span class="cl">- train
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> |- _darknet.labels
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> |- img_001.jpg
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> |- img_001.txt
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> |- img_002.jpg
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> |- img_002.txt
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> |- img_003.jpg
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> |- img_003.txt
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> ...
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;ul>
&lt;li>&lt;code>_darknet.labels&lt;/code> contains the object class names, one per line&lt;/li>
&lt;li>For each &lt;code>.jpg&lt;/code> image there&amp;rsquo;s a corresponding &lt;code>.txt&lt;/code> file with the same name&lt;/li>
&lt;/ul>
&lt;p>Now we create a &lt;code>.txt&lt;/code> file for each image based on &lt;code>_annotations.coco.json&lt;/code>:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="kn">import&lt;/span> &lt;span class="nn">os&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kn">import&lt;/span> &lt;span class="nn">json&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kn">from&lt;/span> &lt;span class="nn">tqdm&lt;/span> &lt;span class="kn">import&lt;/span> &lt;span class="n">tqdm&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kn">import&lt;/span> &lt;span class="nn">shutil&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="k">def&lt;/span> &lt;span class="nf">make_folders&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">path&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s2">&amp;#34;output&amp;#34;&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">if&lt;/span> &lt;span class="n">os&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">path&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">exists&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">path&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">shutil&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">rmtree&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">path&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">os&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">makedirs&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">path&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="n">path&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="k">def&lt;/span> &lt;span class="nf">convert_coco_json_to_yolo_txt&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">output_path&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">json_file&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">path&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">make_folders&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">output_path&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">with&lt;/span> &lt;span class="nb">open&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">json_file&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="k">as&lt;/span> &lt;span class="n">f&lt;/span>&lt;span class="p">:&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">json_data&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">json&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">load&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">f&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># write _darknet.labels, which holds names of all classes (one class per line)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">label_file&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">os&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">path&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">join&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">output_path&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="s2">&amp;#34;_darknet.labels&amp;#34;&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">with&lt;/span> &lt;span class="nb">open&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">label_file&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="s2">&amp;#34;w&amp;#34;&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="k">as&lt;/span> &lt;span class="n">f&lt;/span>&lt;span class="p">:&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">for&lt;/span> &lt;span class="n">category&lt;/span> &lt;span class="ow">in&lt;/span> &lt;span class="n">tqdm&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">json_data&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="s2">&amp;#34;categories&amp;#34;&lt;/span>&lt;span class="p">],&lt;/span> &lt;span class="n">desc&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s2">&amp;#34;Categories&amp;#34;&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">category_name&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">category&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="s2">&amp;#34;name&amp;#34;&lt;/span>&lt;span class="p">]&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">f&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">write&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="sa">f&lt;/span>&lt;span class="s2">&amp;#34;&lt;/span>&lt;span class="si">{&lt;/span>&lt;span class="n">category_name&lt;/span>&lt;span class="si">}&lt;/span>&lt;span class="se">\n&lt;/span>&lt;span class="s2">&amp;#34;&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">    &lt;span class="k">for&lt;/span> &lt;span class="n">image&lt;/span> &lt;span class="ow">in&lt;/span> &lt;span class="n">tqdm&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">json_data&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="s2">&amp;#34;images&amp;#34;&lt;/span>&lt;span class="p">],&lt;/span> &lt;span class="n">desc&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s2">&amp;#34;Annotation txt for each image&amp;#34;&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">img_id&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">image&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="s2">&amp;#34;id&amp;#34;&lt;/span>&lt;span class="p">]&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">img_name&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">image&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="s2">&amp;#34;file_name&amp;#34;&lt;/span>&lt;span class="p">]&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">img_width&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">image&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="s2">&amp;#34;width&amp;#34;&lt;/span>&lt;span class="p">]&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">img_height&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">image&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="s2">&amp;#34;height&amp;#34;&lt;/span>&lt;span class="p">]&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">anno_in_image&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="p">[&lt;/span>&lt;span class="n">anno&lt;/span> &lt;span class="k">for&lt;/span> &lt;span class="n">anno&lt;/span> &lt;span class="ow">in&lt;/span> &lt;span class="n">json_data&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="s2">&amp;#34;annotations&amp;#34;&lt;/span>&lt;span class="p">]&lt;/span> &lt;span class="k">if&lt;/span> &lt;span class="n">anno&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="s2">&amp;#34;image_id&amp;#34;&lt;/span>&lt;span class="p">]&lt;/span> &lt;span class="o">==&lt;/span> &lt;span class="n">img_id&lt;/span>&lt;span class="p">]&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">anno_txt&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">os&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">path&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">join&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">output_path&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">img_name&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">split&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s2">&amp;#34;.&amp;#34;&lt;/span>&lt;span class="p">)[&lt;/span>&lt;span class="mi">0&lt;/span>&lt;span class="p">]&lt;/span> &lt;span class="o">+&lt;/span> &lt;span class="s2">&amp;#34;.txt&amp;#34;&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">with&lt;/span> &lt;span class="nb">open&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">anno_txt&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="s2">&amp;#34;w&amp;#34;&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="k">as&lt;/span> &lt;span class="n">f&lt;/span>&lt;span class="p">:&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">for&lt;/span> &lt;span class="n">anno&lt;/span> &lt;span class="ow">in&lt;/span> &lt;span class="n">anno_in_image&lt;/span>&lt;span class="p">:&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">category&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">anno&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="s2">&amp;#34;category_id&amp;#34;&lt;/span>&lt;span class="p">]&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">bbox_COCO&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">anno&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="s2">&amp;#34;bbox&amp;#34;&lt;/span>&lt;span class="p">]&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">x&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">y&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">w&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">h&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">convert_bbox_coco2yolo&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">img_width&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">img_height&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">bbox_COCO&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">f&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">write&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="sa">f&lt;/span>&lt;span class="s2">&amp;#34;&lt;/span>&lt;span class="si">{&lt;/span>&lt;span class="n">category&lt;/span>&lt;span class="si">}&lt;/span>&lt;span class="s2"> &lt;/span>&lt;span class="si">{&lt;/span>&lt;span class="n">x&lt;/span>&lt;span class="si">:&lt;/span>&lt;span class="s2">.6f&lt;/span>&lt;span class="si">}&lt;/span>&lt;span class="s2"> &lt;/span>&lt;span class="si">{&lt;/span>&lt;span class="n">y&lt;/span>&lt;span class="si">:&lt;/span>&lt;span class="s2">.6f&lt;/span>&lt;span class="si">}&lt;/span>&lt;span class="s2"> &lt;/span>&lt;span class="si">{&lt;/span>&lt;span class="n">w&lt;/span>&lt;span class="si">:&lt;/span>&lt;span class="s2">.6f&lt;/span>&lt;span class="si">}&lt;/span>&lt;span class="s2"> &lt;/span>&lt;span class="si">{&lt;/span>&lt;span class="n">h&lt;/span>&lt;span class="si">:&lt;/span>&lt;span class="s2">.6f&lt;/span>&lt;span class="si">}&lt;/span>&lt;span class="se">\n&lt;/span>&lt;span class="s2">&amp;#34;&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nb">print&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s2">&amp;#34;Converting COCO Json to YOLO txt finished!&amp;#34;&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h3 id="example">Example&lt;/h3>
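&lt;p>The converter above delegates the coordinate math to &lt;code>convert_bbox_coco2yolo&lt;/code>, which is not shown in this excerpt. A minimal sketch of such a helper, assuming COCO&amp;rsquo;s &lt;code>[x_min, y_min, width, height]&lt;/code> absolute-pixel boxes and YOLO&amp;rsquo;s normalized &lt;code>[x_center, y_center, width, height]&lt;/code> format:&lt;/p>

```python
def convert_bbox_coco2yolo(img_width, img_height, bbox):
    """Convert a COCO bbox [x_min, y_min, width, height] (absolute pixels)
    into a YOLO bbox [x_center, y_center, width, height] (normalized to [0, 1])."""
    x_min, y_min, w, h = bbox
    x_center = (x_min + w / 2) / img_width   # box center x, relative to image width
    y_center = (y_min + h / 2) / img_height  # box center y, relative to image height
    return x_center, y_center, w / img_width, h / img_height
```

&lt;p>For the first annotation in the example below (bbox &lt;code>[45, 2, 85, 85]&lt;/code> in a 490x275 image) this yields &lt;code>0.178571 0.161818 0.173469 0.309091&lt;/code>, matching the first line of the generated &lt;code>0001.txt&lt;/code>.&lt;/p>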
&lt;p>Assume we have a COCO JSON file &lt;code>_annotations.coco.json&lt;/code>:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-json" data-lang="json">&lt;span class="line">&lt;span class="cl">&lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nt">&amp;#34;info&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nt">&amp;#34;year&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="s2">&amp;#34;2020&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nt">&amp;#34;version&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="s2">&amp;#34;1&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nt">&amp;#34;description&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="s2">&amp;#34;Exported from roboflow.ai&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nt">&amp;#34;contributor&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="s2">&amp;#34;Roboflow&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nt">&amp;#34;url&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="s2">&amp;#34;https://app.roboflow.ai/datasets/hard-hat-sample/1&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nt">&amp;#34;date_created&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="s2">&amp;#34;2000-01-01T00:00:00+00:00&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">},&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nt">&amp;#34;licenses&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="p">[&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nt">&amp;#34;id&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="mi">1&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nt">&amp;#34;url&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="s2">&amp;#34;https://creativecommons.org/publicdomain/zero/1.0/&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nt">&amp;#34;name&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="s2">&amp;#34;Public Domain&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">],&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nt">&amp;#34;categories&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="p">[&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nt">&amp;#34;id&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="mi">0&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nt">&amp;#34;name&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="s2">&amp;#34;Workers&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nt">&amp;#34;supercategory&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="s2">&amp;#34;none&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">},&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nt">&amp;#34;id&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="mi">1&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nt">&amp;#34;name&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="s2">&amp;#34;head&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nt">&amp;#34;supercategory&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="s2">&amp;#34;Workers&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">},&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nt">&amp;#34;id&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="mi">2&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nt">&amp;#34;name&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="s2">&amp;#34;helmet&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nt">&amp;#34;supercategory&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="s2">&amp;#34;Workers&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">},&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nt">&amp;#34;id&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="mi">3&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nt">&amp;#34;name&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="s2">&amp;#34;person&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nt">&amp;#34;supercategory&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="s2">&amp;#34;Workers&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">],&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nt">&amp;#34;images&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="p">[&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nt">&amp;#34;id&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="mi">0&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nt">&amp;#34;license&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="mi">1&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nt">&amp;#34;file_name&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="s2">&amp;#34;0001.jpg&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nt">&amp;#34;height&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="mi">275&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nt">&amp;#34;width&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="mi">490&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nt">&amp;#34;date_captured&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="s2">&amp;#34;2020-07-20T19:39:26+00:00&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">],&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nt">&amp;#34;annotations&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="p">[&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nt">&amp;#34;id&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="mi">0&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nt">&amp;#34;image_id&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="mi">0&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nt">&amp;#34;category_id&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="mi">2&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nt">&amp;#34;bbox&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="p">[&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="mi">45&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="mi">2&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="mi">85&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="mi">85&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">],&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nt">&amp;#34;area&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="mi">7225&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nt">&amp;#34;segmentation&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="p">[],&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nt">&amp;#34;iscrowd&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="mi">0&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">},&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nt">&amp;#34;id&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="mi">1&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nt">&amp;#34;image_id&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="mi">0&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nt">&amp;#34;category_id&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="mi">2&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nt">&amp;#34;bbox&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="p">[&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="mi">324&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="mi">29&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="mi">72&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="mi">81&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">],&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nt">&amp;#34;area&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="mi">5832&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nt">&amp;#34;segmentation&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="p">[],&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nt">&amp;#34;iscrowd&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="mi">0&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">]&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="n">convert_coco_json_to_yolo_txt&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s2">&amp;#34;output&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="s2">&amp;#34;_annotations.coco.json&amp;#34;&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-fallback" data-lang="fallback">&lt;span class="line">&lt;span class="cl">Categories: 100%|██████████| 4/4 [00:00&amp;lt;00:00, 2471.24it/s]
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">Annotation txt for each image: 100%|██████████| 1/1 [00:00&amp;lt;00:00, 1800.13it/s]
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">Converting COCO Json to YOLO txt finished!
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>A folder named &lt;code>output&lt;/code> is created with the following structure:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-fallback" data-lang="fallback">&lt;span class="line">&lt;span class="cl">- output
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> |- 0001.txt
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> |- _darknet.labels
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Content of &lt;code>_darknet.labels&lt;/code>:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-fallback" data-lang="fallback">&lt;span class="line">&lt;span class="cl">Workers
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">head
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">helmet
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">person
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Content of &lt;code>0001.txt&lt;/code>:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-fallback" data-lang="fallback">&lt;span class="line">&lt;span class="cl">2 0.178571 0.161818 0.173469 0.309091
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">2 0.734694 0.252727 0.146939 0.294545
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h2 id="reference">Reference&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>Instruction from YOLO v4 repo: &lt;a href="https://github.com/AlexeyAB/darknet#how-to-train-to-detect-your-custom-objects">https://github.com/AlexeyAB/darknet#how-to-train-to-detect-your-custom-objects&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="https://github.com/AlexeyAB/Yolo_mark/issues/60#issuecomment-401854885">Specific format of annotation&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="https://www.cnblogs.com/hejunlin1992/p/9925293.html">darknet训练yolov3时的一些注意事项&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="https://manivannan-ai.medium.com/how-to-train-yolov2-to-detect-custom-objects-9010df784f36">How to train YOLOv2 to detect custom objects&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="https://roboflow.com/formats">Computer Vision Annotation Formats&lt;/a>&lt;/p>
&lt;/li>
&lt;/ul>
&lt;div class="footnotes" role="doc-endnotes">
&lt;hr>
&lt;ol>
&lt;li id="fn:1">
&lt;p>Reference: &lt;a href="https://github.com/AlexeyAB/Yolo_mark/issues/60">https://github.com/AlexeyAB/Yolo_mark/issues/60&lt;/a>&amp;#160;&lt;a href="#fnref:1" class="footnote-backref" role="doc-backlink">&amp;#x21a9;&amp;#xfe0e;&lt;/a>&lt;/p>
&lt;/li>
&lt;/ol>
&lt;/div></description></item></channel></rss>