Object Detection

Object detection combines classification and localization to identify objects in an image and specify their locations through bounding boxes. JustDeepIt supports multiple well-known deep neural network architectures, such as Faster R-CNN[1], YOLOv3[2], SSD[3], RetinaNet[4], and FCOS[5], to build object detection models for training and inference. The inference results can be stored as images with bounding boxes or as a JSON file in the COCO format. The following image is an example of wheat head detection results with Faster R-CNN using the GWHD dataset[6].

../_images/tutorials_GWHD_inference_output.jpg

GUI

The GUI window for object detection consists of three tabs: Preferences, Training, and Inference. These tabs are used for setting common parameters, training models, and running inference (i.e., detecting objects) on test images using the trained model, respectively. Tabs Training and Inference are disabled until the settings in tab Preferences are defined.

Preferences

Tab Preferences is used for setting common parameters, such as the architecture of the detection model, the number of CPUs and GPUs to be used, and the location (i.e., directory path) of the workspace used to save intermediate and final results. The required fields are highlighted with orange borders. Detailed descriptions of the arguments are provided in the following table.

../_images/app_od_pref.png

| Argument | Description |
| --- | --- |
| backend | The backend used to build the object detection model. The current version of JustDeepIt only supports MMDetection as a backend. |
| architecture | Architecture of the object detection model. If custom is specified, the user can supply a custom configuration file to generate the model. |
| config | A path to a configuration file of MMDetection or Detectron2. This field is activated when custom is specified in architecture. |
| class label | A path to a text file containing class labels. The file should have a single column and multiple rows, with each row holding one class label (e.g., class_label.txt). |
| CPU | Number of CPUs. |
| GPU | Number of GPUs. |
| workspace | Workspace to store intermediate and final results. |
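
The class label file described above is a plain text file with one label per row. The following is a minimal sketch of writing and reading such a file using only the Python standard library. The helper names (write_class_labels, read_class_labels) and the example labels are hypothetical and are not part of JustDeepIt.

```python
from pathlib import Path
import tempfile


def write_class_labels(labels, path):
    """Write one class label per line (single column, no header)."""
    path = Path(path)
    path.write_text("\n".join(labels) + "\n", encoding="utf-8")
    return path


def read_class_labels(path):
    """Read the labels back, skipping blank lines."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]


# Hypothetical labels, for illustration only.
label_file = write_class_labels(
    ["wheat_head", "leaf", "stem"],
    Path(tempfile.gettempdir()) / "class_label.txt",
)
labels = read_class_labels(label_file)
print(labels)
```

The resulting file path can then be entered in the class label field.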

Training

Tab Training is used to train the model for object detection. It allows users to set general parameters of training, such as the optimization algorithm, optimization scheduler, batch size, and number of epochs. Detailed descriptions of the arguments are provided in the following table.

../_images/app_od_train.png

| Argument | Description |
| --- | --- |
| model weight | A path to store the model weight. If the file exists, training resumes from the given weight. |
| training dataset | Information about the training dataset: a path to a folder containing the training images; the annotation format (COCO or Pascal VOC); and a path to an annotation file (COCO format) or an annotation folder (Pascal VOC format). |
| validation dataset | Information about the validation dataset. Leave blank if there is no validation dataset. |
| test dataset | Information about the test dataset. Leave blank if there is no test dataset. |
| optimizer | An optimizer for model training. The supported optimizers are listed on the MMDetection website. |
| scheduler | A scheduler for model training. The supported schedulers are listed on the MMDetection website. |
| batch size | Batch size. |
| epochs | Number of epochs. |
| cutoff | Cutoff of confidence score for training. |
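
As a rough guide to how batch size and epochs interact, the number of optimizer iterations can be estimated from the dataset size. The sketch below is plain arithmetic and does not call JustDeepIt. It assumes that the final, partially filled batch of each epoch is kept, which may differ from the behavior of the backend. The function name training_iterations and the example numbers are hypothetical.

```python
import math


def training_iterations(n_images, batch_size, epochs):
    """Return (iterations per epoch, total iterations).

    Assumes the last partial batch in each epoch is kept, not dropped.
    """
    per_epoch = math.ceil(n_images / batch_size)
    return per_epoch, per_epoch * epochs


# Hypothetical example: 1,000 training images, batch size 8, 100 epochs.
per_epoch, total = training_iterations(n_images=1000, batch_size=8, epochs=100)
print(per_epoch, total)
```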

Inference

Tab Inference is used for detecting objects in test images using the trained model. It allows users to set the confidence score cutoff for detection results and the batch size.

../_images/app_od_eval.png

| Argument | Description |
| --- | --- |
| model weight | A path to a trained model weight. |
| image folder | A path to a folder containing test images. |
| batch size | Batch size. |
| cutoff | Cutoff of confidence score for inference (i.e., object detection). |
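
The effect of the cutoff can be illustrated with a small sketch: detections whose confidence score falls below the cutoff are discarded. The code below is not the JustDeepIt implementation. The detection dictionaries, the function name filter_by_cutoff, and the choice of an inclusive comparison (score >= cutoff) are assumptions made for illustration.

```python
def filter_by_cutoff(detections, cutoff=0.5):
    """Keep detections whose confidence score is at least the cutoff."""
    return [d for d in detections if d["score"] >= cutoff]


# Hypothetical detections: label, confidence score, and bounding box.
detections = [
    {"label": "wheat_head", "score": 0.92, "bbox": [34, 50, 120, 140]},
    {"label": "wheat_head", "score": 0.48, "bbox": [200, 80, 260, 150]},
    {"label": "wheat_head", "score": 0.75, "bbox": [310, 40, 380, 120]},
]

kept = filter_by_cutoff(detections, cutoff=0.5)
print(len(kept), [d["score"] for d in kept])
```

A higher cutoff yields fewer but more confident detections, while a lower cutoff keeps more candidates at the cost of more false positives.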

CUI

JustDeepIt implements three simple methods: train, save, and inference. Method train is used for training the models, save is used for saving the trained weight, and inference is used for detecting objects in test images. Detailed descriptions of these methods are provided below.

Architectures

A neural network architecture for object detection can be initialized with class justdeepit.models.OD. For example, Faster R-CNN can be initialized by executing the following code.

from justdeepit.models import OD

model = OD('./class_label.txt', model_arch='fasterrcnn')

To initialize Faster R-CNN with a pre-trained weight (e.g., a weight pre-trained with the COCO dataset), the argument model_weight can be used. Note that the weight file (.pth) pre-trained with the COCO dataset can be downloaded from the GitHub repositories of MMDetection or Detectron2.

from justdeepit.models import OD

weight_fpath = '/path/to/pretrained_weight.pth'
model = OD('./class_label.txt', model_arch='fasterrcnn', model_weight=weight_fpath)

Training

Method train is used for the model training and requires at least two arguments to specify a folder containing the training images and annotations. Annotations can be specified in a single file in the COCO format or a folder containing multiple files in the Pascal VOC format. Refer to the API documentation of train for detailed usage.

Training a model with annotation in COCO format.

from justdeepit.models import OD

train_dataset = {
    'images': '/path/to/folder/images',
    'annotations': '/path/to/coco/annotation.json',
    'annotation_format': 'coco'
}

model = OD('./class_label.txt', model_arch='fasterrcnn')
model.train(train_dataset)
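
For readers unfamiliar with the COCO format, the following sketch builds a minimal annotation structure with the three top-level keys used for object detection (images, annotations, and categories) and checks that the references between them are consistent. The file name, image size, and label are hypothetical. In the COCO format, a bounding box is given as [x, y, width, height] with (x, y) at the top-left corner.

```python
import json

# A minimal COCO-style annotation with one image and one bounding box.
coco = {
    "images": [
        {"id": 1, "file_name": "sample1.jpg", "width": 1024, "height": 1024},
    ],
    "annotations": [
        {
            "id": 1,
            "image_id": 1,       # refers to images[].id
            "category_id": 1,    # refers to categories[].id
            "bbox": [100.0, 150.0, 80.0, 60.0],  # [x, y, width, height]
            "area": 80.0 * 60.0,
            "iscrowd": 0,
        },
    ],
    "categories": [
        {"id": 1, "name": "wheat_head"},
    ],
}

# Check that every annotation points to an existing image and category.
image_ids = {img["id"] for img in coco["images"]}
category_ids = {cat["id"] for cat in coco["categories"]}
is_consistent = all(
    a["image_id"] in image_ids and a["category_id"] in category_ids
    for a in coco["annotations"]
)

coco_json = json.dumps(coco, indent=2)
print(is_consistent)
```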

Training a model with annotation in Pascal VOC (xml) format.

from justdeepit.models import OD

train_dataset = {
    'images': '/path/to/folder/images',
    'annotations': '/path/to/folder/voc',
    'annotation_format': 'voc'
}

model = OD('./class_label.txt', model_arch='fasterrcnn')
model.train(train_dataset)
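
In the Pascal VOC format, each image has its own XML file in which a bounding box is stored as corner coordinates (xmin, ymin, xmax, ymax). The sketch below parses a hypothetical VOC annotation with the Python standard library and converts its box to the COCO convention of [x, y, width, height]. The helper names are illustrative, and the conversion is a simple subtraction that ignores the 1-based pixel convention of the original VOC datasets.

```python
import xml.etree.ElementTree as ET

# Hypothetical Pascal VOC annotation for a single image.
VOC_XML = """<annotation>
  <filename>sample1.jpg</filename>
  <size><width>1024</width><height>1024</height><depth>3</depth></size>
  <object>
    <name>wheat_head</name>
    <bndbox><xmin>100</xmin><ymin>150</ymin><xmax>180</xmax><ymax>210</ymax></bndbox>
  </object>
</annotation>"""


def parse_voc(xml_text):
    """Return (file name, list of {'label', 'bbox'}) from a VOC XML string."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.iter("object"):
        bndbox = obj.find("bndbox")
        corners = [
            int(float(bndbox.findtext(key)))
            for key in ("xmin", "ymin", "xmax", "ymax")
        ]
        boxes.append({"label": obj.findtext("name"), "bbox": corners})
    return root.findtext("filename"), boxes


def voc_to_coco_bbox(bbox):
    """Convert [xmin, ymin, xmax, ymax] to [x, y, width, height]."""
    xmin, ymin, xmax, ymax = bbox
    return [xmin, ymin, xmax - xmin, ymax - ymin]


filename, boxes = parse_voc(VOC_XML)
print(filename, boxes[0]["label"], voc_to_coco_bbox(boxes[0]["bbox"]))
```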

The trained weight can be saved using method save, which simultaneously stores the trained weight (.pth) and model configuration file (.py). Refer to the API documentation of save for detailed usage.

model.save('trained_weight.pth')

Inference

Method inference is used to detect objects in test images using the trained model. This method requires at least one argument to specify a single image, a list of images, or a folder containing multiple images. The detection results are returned as an object of class justdeepit.utils.ImageAnnotations.

To save the results in the COCO format, we can use method format implemented in class justdeepit.utils.ImageAnnotations to generate a JSON file in the COCO format.

from justdeepit.models import OD

test_images = ['sample1.jpg', 'sample2.jpg', 'sample3.jpg']

model = OD('./class_label.txt', model_arch='fasterrcnn', model_weight='trained_weight.pth')
outputs = model.inference(test_images)

outputs.format('coco', './predicted_outputs.coco.json')
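
Once the results are saved, the JSON file can be inspected with standard tools. The sketch below counts detected objects per class. It assumes that the saved file follows the usual COCO layout with annotations and categories keys, which has not been verified against the actual JustDeepIt output. To keep the example self-contained, it operates on an in-memory dictionary with hypothetical content; in practice the dictionary would be obtained by calling json.load on the saved file.

```python
from collections import Counter


def count_per_category(coco):
    """Count annotations per category name in a COCO-style dictionary."""
    id_to_name = {c["id"]: c["name"] for c in coco.get("categories", [])}
    counts = Counter(
        # Fall back to the raw id if a category name is missing.
        id_to_name.get(a["category_id"], str(a["category_id"]))
        for a in coco.get("annotations", [])
    )
    return dict(counts)


# Hypothetical content standing in for a loaded predicted_outputs.coco.json.
predicted = {
    "images": [{"id": 1, "file_name": "sample1.jpg"}],
    "annotations": [
        {"id": 1, "image_id": 1, "category_id": 1, "bbox": [10, 20, 30, 40]},
        {"id": 2, "image_id": 1, "category_id": 1, "bbox": [50, 60, 30, 40]},
        {"id": 3, "image_id": 1, "category_id": 2, "bbox": [90, 10, 20, 20]},
    ],
    "categories": [{"id": 1, "name": "wheat_head"}, {"id": 2, "name": "leaf"}],
}

summary = count_per_category(predicted)
print(summary)
```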

To save the detection results as images, for example, showing the detected bounding boxes on the images, method draw implemented in class justdeepit.utils.ImageAnnotation can be used.

import os
os.makedirs('./predicted_outputs', exist_ok=True)  # make sure the output folder exists
for output in outputs:
    output.draw('bbox', os.path.join('./predicted_outputs', os.path.basename(output.image_path)))

Refer to the corresponding API documentation of inference, format, and draw, for the detailed usage.

References