YOLOv4: Train on Custom Dataset
Clone and build Darknet

Clone the darknet repo:

```bash
git clone https://github.com/AlexeyAB/darknet
```

Change the Makefile to have GPU and OpenCV enabled:

```bash
cd darknet
sed -i 's/OPENCV=0/OPENCV=1/' Makefile
sed -i 's/GPU=0/GPU=1/' Makefile
sed -i 's/CUDNN=0/CUDNN=1/' Makefile
sed -i 's/CUDNN_HALF=0/CUDNN_HALF=1/' Makefile
```

Verify CUDA:

```bash
/usr/local/cuda/bin/nvcc --version
```

Compile on Linux using make:

```bash
make
```
The relevant Makefile options:

- `GPU=1`: build with CUDA to accelerate detection by using the GPU
- `CUDNN=1`: build with cuDNN v5-v7 to accelerate training by using the GPU
- `CUDNN_HALF=1`: build for Tensor Cores (on Titan V / Tesla V100 / DGX-2 and later); speeds up detection 3x and training 2x
- `OPENCV=1`: build with OpenCV 4.x/3.x/2.4.x; allows detection on video files and on video streams from network cameras or webcams
- `DEBUG=1`: build a debug version of Yolo
- `OPENMP=1`: build with OpenMP support to accelerate Yolo by using a multi-core CPU

These options must be set in the Makefile before running the make command.

Prepare custom dataset
The custom dataset should be in YOLOv4 or Darknet format:

- For each `.jpg` image file, there should be a corresponding `.txt` file in the same directory, with the same name but with the `.txt` extension. For example, if there's a `.jpg` image named `BloodImage_00001.jpg`, there should also be a corresponding `.txt` file named `BloodImage_00001.txt`.
- This `.txt` file contains the object number and object coordinates on the image, one object per line, in the format: `<object-class> <x_center> <y_center> <width> <height>`
  - `<object-class>`: integer object number from `0` to `(classes-1)`
  - `<x_center> <y_center> <width> <height>`: float values relative to the width and height of the image, in the range `(0.0, 1.0]`
  - `<x_center> <y_center>` are the center of the rectangle (not the top-left corner)
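As a sketch, a pixel-coordinate bounding box can be converted to this format like so (the function name and the example numbers are illustrative, not part of Darknet):

```python
def to_yolo(box, img_w, img_h):
    """Convert a pixel bbox (x_min, y_min, x_max, y_max) to YOLO format."""
    x_min, y_min, x_max, y_max = box
    x_center = (x_min + x_max) / 2 / img_w  # center, relative to image width
    y_center = (y_min + y_max) / 2 / img_h  # center, relative to image height
    width = (x_max - x_min) / img_w
    height = (y_max - y_min) / img_h
    return x_center, y_center, width, height

# a 100x200 box centered at (320, 240) in a 640x480 image:
# x_center=0.5, y_center=0.5, width=0.15625
print(to_yolo((270, 140, 370, 340), 640, 480))
```

Each line written to the `.txt` file would then be `f"{cls} {x_center} {y_center} {width} {height}"`.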
Configure files for training

For training with `cfg/yolov4-custom.cfg`, download the pre-trained weights file yolov4.conv.137:

```bash
cd darknet
wget https://github.com/AlexeyAB/darknet/releases/download/darknet_yolo_v3_optimal/yolov4.conv.137
```

In the folder `./cfg`, create a custom config file (let's call it `custom-yolov4-detector.cfg`) with the same content as in `yolov4-custom.cfg`, and:

- change line batch to `batch=64`
- change line subdivisions to `subdivisions=16`
- change line max_batches to `classes*2000`, but:
  - NOT less than the number of training images
  - NOT less than 6000
  e.g. `max_batches=6000` if you train for 3 classes
- change line steps to 80% and 90% of max_batches (e.g. `steps=4800,5400`)
- set network size `width=416 height=416` or any value that is a multiple of 32
- change line `classes=80` to the number of objects in each of the 3 `[yolo]` layers
- change `filters=255` to $\text{filters}=(\text{classes} + 5) \times 3$ in the 3 `[convolutional]` layers before each `[yolo]` layer; keep in mind that it only has to be the last `[convolutional]` before each of the `[yolo]` layers. Note: do not write `filters=(classes + 5) x 3` in the cfg-file! It has to be the specific number! E.g. for `classes=1` it should be `filters=18`; for `classes=2` it should be `filters=21`.

So for example, for 2 objects, your custom config file should differ from `yolov4-custom.cfg` in such lines in each of the 3 `[yolo]` layers:

```
[convolutional]
filters=21

[yolo]
classes=2
```

- when using `[Gaussian_yolo]` layers, change `filters=57` to $\text{filters}=(\text{classes} + 9) \times 3$ in the 3 `[convolutional]` layers before each `[Gaussian_yolo]` layer
- Create file `obj.names` in the directory `data/`, with the object names, each on a new line.
- Create file `obj.data` in the directory `data/`, containing (where classes = number of objects), for example if we have two objects:

```
classes = 2
train = data/train.txt
valid = data/test.txt
names = data/obj.names
backup = backup/
```

- Put the image files (`.jpg`) of your objects in the directory `data/obj/`.
- Create `train.txt` in the directory `data/` with the filenames of your images, each filename on a new line, with paths relative to `darknet`. For example:

```
data/obj/img1.jpg
data/obj/img2.jpg
data/obj/img3.jpg
```

- Download pre-trained weights for the convolutional layers and put them in the `darknet` directory (the root directory of the project):
  - for `yolov4.cfg`, `yolov4-custom.cfg` (162 MB): yolov4.conv.137
  - for `yolov4-tiny.cfg`, `yolov4-tiny-3l.cfg`, `yolov4-tiny-custom.cfg` (19 MB): yolov4-tiny.conv.29
  - for `csresnext50-panet-spp.cfg` (133 MB): csresnext50-panet-spp.conv.112
  - for `yolov3.cfg`, `yolov3-spp.cfg` (154 MB): darknet53.conv.74
  - for `yolov3-tiny-prn.cfg`, `yolov3-tiny.cfg` (6 MB): yolov3-tiny.conv.11
  - for `enet-coco.cfg` (EfficientNetB0-Yolov3) (14 MB): enetb0-coco.conv.132
Start training

```bash
./darknet detector train data/obj.data custom-yolov4-detector.cfg yolov4.conv.137 -dont_show
```

- the file `yolo-obj_last.weights` will be saved to `backup/` every 100 iterations
- `-dont_show`: disables the Loss window, for when you train on a computer without a monitor (e.g. a remote server)
To see the mAP & loss chart during training on a remote server:

- use the command `./darknet detector train data/obj.data yolo-obj.cfg yolov4.conv.137 -dont_show -mjpeg_port 8090 -map`
- then open the URL `http://ip-address:8090` in a Chrome/Firefox browser
After training is complete, you can get weights from backup/
If you want training to output only the main information (e.g. loss, mAP, remaining training time) instead of the full log, you can use this command:

```bash
./darknet detector train data/obj.data custom-yolov4-detector.cfg yolov4.conv.137 -dont_show -map 2>&1 | tee log/train.log | grep -E "hours left|mean_average"
```

Then the output will look like the following:

```
1189: 1.874030, 2.934438 avg loss, 0.002610 rate, 2.930427 seconds, 76096 images, 3.905244 hours left
```
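For post-hoc inspection of `log/train.log`, lines of this shape can be parsed with a small regex. A sketch; the pattern matches the example line above, and the group names are my own:

```python
import re

# one capture group per field in a Darknet progress line
LOG_RE = re.compile(
    r"(?P<iter>\d+): (?P<loss>[\d.]+), (?P<avg_loss>[\d.]+) avg loss, "
    r"(?P<rate>[\d.]+) rate, (?P<secs>[\d.]+) seconds, "
    r"(?P<images>\d+) images, (?P<hours_left>[\d.]+) hours left"
)

line = ("1189: 1.874030, 2.934438 avg loss, 0.002610 rate, "
        "2.930427 seconds, 76096 images, 3.905244 hours left")
m = LOG_RE.search(line)
print(m.group("iter"), m.group("avg_loss"), m.group("hours_left"))
# prints: 1189 2.934438 3.905244
```

Applied over the whole log file, this gives the avg-loss curve without re-running training.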
Notes
- If during training you see `nan` values in the `avg` (loss) field, then training is going wrong! 🤦♂️ But if `nan` appears in some other lines, then training is going well.
- If an `Out of memory` error occurs, then in the `.cfg` file you should increase `subdivisions` to 16, 32 or 64.
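That subdivisions bump can be scripted rather than edited by hand. A minimal sketch (the helper name and the cap at 64 are my own, following the 16/32/64 values suggested above):

```python
import re

def double_subdivisions(cfg_text):
    """Double the subdivisions value in cfg text, capping at 64."""
    def bump(m):
        return f"subdivisions={min(int(m.group(1)) * 2, 64)}"
    return re.sub(r"subdivisions=(\d+)", bump, cfg_text)

print(double_subdivisions("subdivisions=16"))  # subdivisions=32
```

Read the `.cfg` file, pass its text through this helper, write it back, and retry training.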
Train tiny-YOLO
Do all the same steps as for the full YOLO model described above, with the following exceptions:

- Download the file with the first 29 convolutional layers of yolov4-tiny:

```bash
wget https://github.com/AlexeyAB/darknet/releases/download/darknet_yolo_v4_pre/yolov4-tiny.conv.29
```

(Or get this file from the yolov4-tiny.weights file by using the command `./darknet partial cfg/yolov4-tiny-custom.cfg yolov4-tiny.weights yolov4-tiny.conv.29 29`.)

- Make your custom model `yolov4-tiny-obj.cfg` based on `cfg/yolov4-tiny-custom.cfg` instead of `yolov4.cfg`:

```python
import re

# Assuming that we have already defined the following hyperparameters:
# - num_classes: number of object classes
# - num_train_images: number of training images
# - SUBDIVISION: subdivisions value (e.g. 16)
# - TINY_CONFIG_FILE: config file we're going to use for training
# - WIDTH, HEIGHT: width and height of the network input (multiples of 32)
max_batches = max(num_classes * 2000, num_train_images, 6000)
steps1 = int(0.8 * max_batches)
steps2 = int(0.9 * max_batches)
num_filters = (num_classes + 5) * 3

with open("cfg/yolov4-tiny-custom.cfg", "r") as reader, open(TINY_CONFIG_FILE, "w") as writer:
    content = reader.read()
    content = re.sub(r"subdivisions=\d*", f"subdivisions={SUBDIVISION}", content)
    content = re.sub(r"width=\d*", f"width={WIDTH}", content)
    content = re.sub(r"height=\d*", f"height={HEIGHT}", content)
    content = re.sub(r"max_batches = \d*", f"max_batches = {max_batches}", content)
    content = re.sub(r"steps=\d*,\d*", f"steps={steps1},{steps2}", content)
    content = re.sub(r"classes=\d*", f"classes={num_classes}", content)
    content = re.sub(r"pad=1\nfilters=\d*", f"pad=1\nfilters={num_filters}", content)
    writer.write(content)
```

- Start training:

```bash
./darknet detector train data/obj.data yolov4-tiny-obj.cfg yolov4-tiny.conv.29
```
Google Colab Notebook
Small hacks to keep the Colab notebook training:

- Open up the inspector view in Chrome
- Switch to the console window
- Paste the following code and hit Enter:

```javascript
function ClickConnect(){
  console.log("Working");
  document
    .querySelector('#top-toolbar > colab-connect-button')
    .shadowRoot.querySelector('#connect')
    .click()
}
setInterval(ClickConnect, 60000)
```

It will click the connect button every minute (60000 ms) so that you don't get kicked off for being idle!
Convert YOLOv4 to TensorRT through ONNX
To convert YOLOv4 to TensorRT engine through ONNX, I used the code from TensorRT_demos following its step-by-step instructions. For more details about the code, check out this blog post.
Note that the code in this repo was designed to run on Jetson platforms. In my case, the conversion from YOLOv4 to a TensorRT engine was conducted on a Jetson Nano.
Convert YOLOv4 for custom trained models
To apply the conversion for custom trained models, see TensorRT YOLOv3 For Custom Trained Models. You need to stick to the naming convention {yolo_version}-{custom_name}-{image_size}. Otherwise you’ll get errors during conversion.
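A quick sanity check for that naming convention could look like the following. The regex is only my reading of `{yolo_version}-{custom_name}-{image_size}` (e.g. a hypothetical `yolov4-blood-416`); verify the exact rules against the TensorRT_demos instructions:

```python
import re

# assumed pattern: yolov3/yolov4, optional -tiny, a custom name, and an image size
NAME_RE = re.compile(r"^yolov[34](?:-tiny)?-\w+-\d+$")

for name in ["yolov4-blood-416", "yolov4_blood_416", "yolov4-416"]:
    print(name, bool(NAME_RE.match(name)))
```

Running this check before conversion is cheaper than debugging a failed conversion afterwards.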
Reference
Guide from AlexeyAB/darknet repo: How to train (to detect your custom objects)
Tutorials
👨🏫 How to Train YOLOv4 on a Custom Dataset in Darknet
Train YOLOv4-tiny on custom dataset: Train YOLOv4-tiny on Custom Data - Lightning Fast Object Detection
YOLOv4 in the CLOUD: Build and Train Custom Object Detector (FREE GPU)