COCO JSON Format for Object Detection

A COCO annotation file is a single JSON object with the top-level sections “info”, “licenses”, “images”, “annotations”, “categories” (absent from Captions annotations), and “segment_info” (present only in Panoptic annotations).

{
    "info": {...},
    "licenses": [...],
    "images": [...],
    "annotations": [...],
    "categories": [...], <-- Not in Captions annotations
    "segment_info": [...] <-- Only in Panoptic annotations
}

Note:

  • categories field is NOT in Captions annotations
  • segment_info field is ONLY in Panoptic annotations
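
As a minimal sketch of how this top-level layout can be checked in Python, the snippet below parses a small in-memory COCO-style document and reports which expected sections are absent. The sample string and the helper name `missing_sections` are illustrative assumptions, not part of any COCO tooling.

```python
import json

# Sections an object detection file is expected to carry, per the layout above.
DETECTION_SECTIONS = ("info", "licenses", "images", "annotations", "categories")

# Hypothetical, tiny stand-in for a real annotation file.
SAMPLE = '{"info": {}, "licenses": [], "images": [], "annotations": []}'


def missing_sections(coco, required=DETECTION_SECTIONS):
    """Return the required top-level keys that are absent from `coco`."""
    return [key for key in required if key not in coco]


coco = json.loads(SAMPLE)
print(missing_sections(coco))  # ['categories'], as in a Captions-style file
```

For a real file, the same check would run on the result of `json.load` applied to the opened annotation file.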

Info

The “info” section contains high-level information about the dataset. If you are creating your own dataset, you can fill in whatever is appropriate.

Example:

"info": {
    "description": "COCO 2017 Dataset",
    "url": "http://cocodataset.org",
    "version": "1.0",
    "year": 2017,
    "contributor": "COCO Consortium",
    "date_created": "2017/09/01"
}

Licenses

The “licenses” section lists the image licenses that apply to images in the dataset. Each image refers to one of them through its “license” field, which holds a license id.

Example:

"licenses": [
    {
        "url": "http://creativecommons.org/licenses/by-nc-sa/2.0/",
        "id": 1,
        "name": "Attribution-NonCommercial-ShareAlike License"
    },
    {
        "url": "http://creativecommons.org/licenses/by-nc/2.0/",
        "id": 2,
        "name": "Attribution-NonCommercial License"
    },
    ...
]

Images

  • Contains the complete list of images in your dataset.
  • No labels, bounding boxes, or segmentations are specified in this part; it is simply a list of images and information about each one.
  • coco_url, flickr_url, and date_captured are just for reference. Your deep learning application will probably need only the file_name.
  • Image ids must be unique within the images list.
  • They do not need to match the file name (unless the deep learning code you are using assumes they are the same).
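
The uniqueness requirement can be checked with a few lines of Python. This is a sketch only: the two records are trimmed copies of the example entries in this section, and `duplicate_ids` is a hypothetical helper name.

```python
from collections import Counter

# Two entries from the example in this section, trimmed to the fields used here.
images = [
    {"id": 397133, "file_name": "000000397133.jpg", "height": 427, "width": 640},
    {"id": 37777, "file_name": "000000037777.jpg", "height": 230, "width": 352},
]


def duplicate_ids(records):
    """Return ids that occur more than once, in sorted order."""
    counts = Counter(record["id"] for record in records)
    return sorted(i for i, n in counts.items() if n > 1)


# Map each image id to its file name so annotations can be resolved later.
file_name_by_id = {image["id"]: image["file_name"] for image in images}

print(duplicate_ids(images))   # []
print(file_name_by_id[37777])  # 000000037777.jpg
```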

Example:

"images": [
    {
        "license": 4,
        "file_name": "000000397133.jpg",
        "coco_url": "http://images.cocodataset.org/val2017/000000397133.jpg",
        "height": 427,
        "width": 640,
        "date_captured": "2013-11-14 17:02:52",
        "flickr_url": "http://farm7.staticflickr.com/6116/6255196340_da26cf2c9e_z.jpg",
        "id": 397133
    },
    {
        "license": 1,
        "file_name": "000000037777.jpg",
        "coco_url": "http://images.cocodataset.org/val2017/000000037777.jpg",
        "height": 230,
        "width": 352,
        "date_captured": "2013-11-14 20:55:31",
        "flickr_url": "http://farm9.staticflickr.com/8429/7839199426_f6d48aa585_z.jpg",
        "id": 37777
    },
    ...
]

Annotations

COCO has five annotation types: object detection, keypoint detection, stuff segmentation, panoptic segmentation, and image captioning. All of them are stored as JSON.

Object detection

Object detection annotations draw shapes (bounding boxes and segmentation outlines) around objects in an image. They consist of a list of categories and a list of annotations.

Categories

  • Contains a list of categories (e.g. dog, boat)
    • each of those belongs to a supercategory (e.g. animal, vehicle).
  • The original COCO detection dataset contains 80 object categories; their ids run from 1 to 90 because some ids are unused.
  • You can use the existing COCO categories or create an entirely new list of your own.
  • Each category id must be unique (among the rest of the categories).
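
A short Python sketch of how a categories list is typically consumed: one lookup from id to name, and one grouping of names by supercategory. The three records are copied from the example in this section; the variable names are illustrative.

```python
from collections import defaultdict

categories = [
    {"supercategory": "person", "id": 1, "name": "person"},
    {"supercategory": "vehicle", "id": 2, "name": "bicycle"},
    {"supercategory": "vehicle", "id": 3, "name": "car"},
]

# Resolve a category_id found in an annotation to a readable name.
name_by_id = {c["id"]: c["name"] for c in categories}

# Collect category names under their supercategory.
names_by_super = defaultdict(list)
for c in categories:
    names_by_super[c["supercategory"]].append(c["name"])

print(name_by_id[3])         # car
print(dict(names_by_super))  # {'person': ['person'], 'vehicle': ['bicycle', 'car']}
```

If two categories shared an id, the dictionary would silently keep only the last one, so `len(name_by_id) == len(categories)` is a cheap uniqueness check.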

Example:

"categories": [
    {
        "supercategory": "person",
        "id": 1,
        "name": "person"
    },
    {
        "supercategory": "vehicle",
        "id": 2,
        "name": "bicycle"
    },
    {
        "supercategory": "vehicle",
        "id": 3,
        "name": "car"
    },
    ...
]

Annotations

  • segmentation : list of polygons that outline the object, each written as a flat list of pixel coordinates [x1, y1, x2, y2, ...] (crowd annotations use run-length encoding instead)

  • area : area of the annotated region, measured in pixels (e.g. a 10px by 20px rectangular region would have an area of 200)

  • iscrowd : specifies whether the segmentation is for a single object (iscrowd=0) or for a group/cluster of objects (iscrowd=1)

  • image_id: corresponds to a specific image in the dataset

  • bbox : bounding box, format is [top left x position, top left y position, width, height]

  • category_id: corresponds to a single category specified in the categories section

  • id: unique identifier of the annotation (no two annotations in the dataset share an id)
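
The bbox and segmentation conventions are easy to get wrong, so here is a small Python sketch of both. It assumes polygons are stored as flat lists [x1, y1, x2, y2, ...], as the segmentation values in the example suggest; the function names and the sample rectangle are hypothetical.

```python
def bbox_to_corners(bbox):
    """Convert [x, y, width, height] to (x_min, y_min, x_max, y_max)."""
    x, y, w, h = bbox
    return (x, y, x + w, y + h)


def polygon_points(flat):
    """Turn a flat [x1, y1, x2, y2, ...] polygon into a list of (x, y) pairs."""
    return list(zip(flat[0::2], flat[1::2]))


def polygon_area(flat):
    """Area of a simple polygon via the shoelace formula."""
    pts = polygon_points(flat)
    total = 0.0
    # Pair every vertex with the next one, wrapping around to the first.
    for (x1, y1), (x2, y2) in zip(pts, pts[1:] + pts[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


# Hypothetical 10 px by 20 px rectangle, written as a box and as a polygon.
print(bbox_to_corners([5, 5, 10, 20]))             # (5, 5, 15, 25)
print(polygon_area([5, 5, 15, 5, 15, 25, 5, 25]))  # 200.0
```

Many detection libraries expect corner coordinates rather than width and height, which is why the conversion is worth making explicit.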

Example:

"annotations": [
    {
        "segmentation": [[510.66,423.01,511.72,420.03,...,510.45,423.01]],
        "area": 702.1057499999998,
        "iscrowd": 0,
        "image_id": 289343,
        "bbox": [473.07,395.93,38.65,28.67],
        "category_id": 18,
        "id": 1768
    },
    ...
]

The annotation above:

  • Has a segmentation given as a list of polygon vertices (x, y pixel positions)
  • Has an area of about 702 pixels (pretty small) and a bounding box of [473.07,395.93,38.65,28.67]
  • Is not a crowd (meaning it’s a single object)
  • Has category id 18 (which is a dog)
  • Belongs to the image with id 289343 (which shows a person on a strange bicycle and a tiny dog)

Example

Source: https://roboflow.com/formats/coco-json

{
    "info": {
        "year": "2020",
        "version": "1",
        "description": "Exported from roboflow.ai",
        "contributor": "Roboflow",
        "url": "https://app.roboflow.ai/datasets/hard-hat-sample/1",
        "date_created": "2000-01-01T00:00:00+00:00"
    },
    "licenses": [
        {
            "id": 1,
            "url": "https://creativecommons.org/publicdomain/zero/1.0/",
            "name": "Public Domain"
        }
    ],
    "categories": [
        {
            "id": 0,
            "name": "Workers",
            "supercategory": "none"
        },
        {
            "id": 1,
            "name": "head",
            "supercategory": "Workers"
        },
        {
            "id": 2,
            "name": "helmet",
            "supercategory": "Workers"
        },
        {
            "id": 3,
            "name": "person",
            "supercategory": "Workers"
        }
    ],
    "images": [
        {
            "id": 0,
            "license": 1,
            "file_name": "0001.jpg",
            "height": 275,
            "width": 490,
            "date_captured": "2020-07-20T19:39:26+00:00"
        }
    ],
    "annotations": [
        {
            "id": 0,
            "image_id": 0,
            "category_id": 2,
            "bbox": [
                45,
                2,
                85,
                85
            ],
            "area": 7225,
            "segmentation": [],
            "iscrowd": 0
        },
        {
            "id": 1,
            "image_id": 0,
            "category_id": 2,
            "bbox": [
                324,
                29,
                72,
                81
            ],
            "area": 5832,
            "segmentation": [],
            "iscrowd": 0
        }
    ]
}
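
Before training on a file like this, it can help to confirm that every annotation points at an existing image and category and that each box lies inside its image. The Python sketch below runs those checks on a trimmed copy of the sample above; `find_problems` is a hypothetical helper, not a function from any COCO library.

```python
from collections import defaultdict

# Trimmed copy of the sample above; only the fields used by the checks are kept.
coco = {
    "categories": [{"id": 0, "name": "Workers"}, {"id": 1, "name": "head"},
                   {"id": 2, "name": "helmet"}, {"id": 3, "name": "person"}],
    "images": [{"id": 0, "file_name": "0001.jpg", "height": 275, "width": 490}],
    "annotations": [
        {"id": 0, "image_id": 0, "category_id": 2, "bbox": [45, 2, 85, 85], "area": 7225},
        {"id": 1, "image_id": 0, "category_id": 2, "bbox": [324, 29, 72, 81], "area": 5832},
    ],
}


def find_problems(coco):
    """Return human-readable descriptions of broken references or boxes."""
    images = {img["id"]: img for img in coco["images"]}
    category_ids = {cat["id"] for cat in coco["categories"]}
    problems = []
    for ann in coco["annotations"]:
        if ann["category_id"] not in category_ids:
            problems.append(f"annotation {ann['id']}: unknown category_id")
        img = images.get(ann["image_id"])
        if img is None:
            problems.append(f"annotation {ann['id']}: unknown image_id")
            continue
        x, y, w, h = ann["bbox"]
        if x < 0 or y < 0 or x + w > img["width"] or y + h > img["height"]:
            problems.append(f"annotation {ann['id']}: bbox outside image")
    return problems


# Group annotations per image, the shape most data loaders want.
anns_by_image = defaultdict(list)
for ann in coco["annotations"]:
    anns_by_image[ann["image_id"]].append(ann)

print(find_problems(coco))    # []
print(len(anns_by_image[0]))  # 2
```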

