Automatic Bird Identification: Part II

Tags: birds python OpenCV machine learning

In this post I'll concentrate on creating the lst and rec files needed for model training, using the training data that we generated in Part I of this series.

The model will use the MXNet framework, which reads rec files as training data. First we'll create the lst files, guided by a config file that we can define now:

The Config File

from os import path

# top level path to our bird related directories
BASE_PATH = "/ML/Birds"

# based on the base path, derive the images path
IMAGES_PATH = path.sep.join([BASE_PATH, "DATA"])

# path to the CSV file that pairs image filenames with species labels
LABELS_PATH = "/bird_ml/dataset.csv"

# define the path to the output directory that will hold the lst and
# rec files
MX_OUTPUT = BASE_PATH

# define the paths to the output training, validation, and testing
# list files
TRAIN_MX_LIST = path.sep.join([MX_OUTPUT, "lists/train.lst"])
VAL_MX_LIST = path.sep.join([MX_OUTPUT, "lists/val.lst"])
TEST_MX_LIST = path.sep.join([MX_OUTPUT, "lists/test.lst"])

TRAIN_MX_REC = path.sep.join([MX_OUTPUT, "rec/train.rec"])
VAL_MX_REC = path.sep.join([MX_OUTPUT, "rec/val.rec"])
TEST_MX_REC = path.sep.join([MX_OUTPUT, "rec/test.rec"])

# Label encoder used to convert species names to integer categories
LABEL_ENCODER_PATH = path.sep.join([BASE_PATH, "output/le.cpickle"])

# define the RGB means from the ImageNet dataset
R_MEAN = 123.68
G_MEAN = 116.779
B_MEAN = 103.939

# number of bird species classes (just 2 for this first test of the
# pipeline)
NUM_CLASSES = 2

# define the percentage of validation and testing images relative
# to the number of training images (15% each here; tweak to taste)
NUM_VAL_IMAGES = 0.15
NUM_TEST_IMAGES = 0.15

# define the batch size
BATCH_SIZE = 32
The RGB means are taken from the VGG paper by Simonyan and Zisserman; they are the per-channel means of the ImageNet training set. Subtracting them lets us fine tune an existing ImageNet-trained model rather than training our model entirely from scratch.

This first model will only attempt to classify 2 classes, since we don't have much training data yet and we mainly want to make sure our data processing pipeline is functioning properly. We're not expecting a high quality model just yet.
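To make the mean subtraction concrete, here's a minimal sketch of the preprocessing step, assuming images arrive as float32 NumPy arrays of shape (height, width, 3) in RGB channel order:

```python
import numpy as np

# per-channel ImageNet means from the VGG paper
R_MEAN, G_MEAN, B_MEAN = 123.68, 116.779, 103.939

def preprocess(image):
    # subtract the per-channel means from an RGB float32 image
    # of shape (height, width, 3)
    return image - np.array([R_MEAN, G_MEAN, B_MEAN], dtype="float32")

# a dummy 2x2 "image" with every pixel value set to 128
out = preprocess(np.full((2, 2, 3), 128.0, dtype="float32"))
```

In practice the training framework applies this for us, but it's the same arithmetic: each channel is shifted so the dataset is roughly zero-centered, matching what the pretrained model saw.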

Building the Record File Dataset

While our image dataset is already created, our model actually needs to consume rec files rather than raw image files, so creating those will be our next step. The rec files are essentially just our images packed into MXNet's record file format. Their creation will be handled by the im2rec.py tool included with the MXNet framework, but we'll use our own script to create the lst files that serve to pair each image filepath with its encoded bird species class.
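For reference, each line of a lst file is just a zero-based index, the integer class label, and the image path, separated by tabs. A one-line sketch (the filename here is made up for illustration):

```python
# index 0, encoded class 1, and a hypothetical image path, tab-separated
row = "\t".join([str(0), str(1), "/ML/Birds/DATA/cardinal_0001.jpg"])
```

This is exactly the row format our build script writes out below, and the format im2rec expects as input.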

from config import bird_config as config
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
import progressbar
import pickle
import os

# read the contents of the labels file, then initialize the list of
# image paths and labels
print("[INFO] loading image paths and labels...")
rows = open(config.LABELS_PATH).read()
rows = rows.strip().split("\n")[1:]
trainPaths = []
trainLabels = []

# loop over the rows
for row in rows:
    # unpack the row, then update the image paths and labels lists
    (filename, species) = row.split(",")
    filename = filename[filename.rfind("/") + 1:]
    trainPaths.append(os.sep.join([config.IMAGES_PATH, filename]))
    trainLabels.append(species)

# now that we have the total number of images in the dataset that
# can be used for training, compute the number of images that
# should be used for validation and testing
numVal = int(len(trainPaths) * config.NUM_VAL_IMAGES)
numTest = int(len(trainPaths) * config.NUM_TEST_IMAGES)

# our class labels are represented as strings so we need to encode
# them
print("[INFO] encoding labels...")
le = LabelEncoder().fit(trainLabels)
trainLabels = le.transform(trainLabels)

# perform stratified sampling from the training set to construct a
# validation set
print("[INFO] constructing validation data...")
split = train_test_split(trainPaths, trainLabels, test_size=numVal,
    stratify=trainLabels, random_state=42)
(trainPaths, valPaths, trainLabels, valLabels) = split

# perform stratified sampling from the training set to construct a
# testing set
print("[INFO] constructing testing data...")
split = train_test_split(trainPaths, trainLabels, test_size=numTest,
    stratify=trainLabels, random_state=42)
(trainPaths, testPaths, trainLabels, testLabels) = split

# construct a list pairing the training, validation, and testing
# image paths along with their corresponding labels and output list
# files
datasets = [
    ("train", trainPaths, trainLabels, config.TRAIN_MX_LIST),
    ("val", valPaths, valLabels, config.VAL_MX_LIST),
    ("test", testPaths, testLabels, config.TEST_MX_LIST)]

# loop over the dataset tuples
for (dType, paths, labels, outputPath) in datasets:
    # open the output file for writing
    print("[INFO] building {}...".format(outputPath))
    f = open(outputPath, "w")

    # initialize the progress bar
    widgets = ["Building List: ", progressbar.Percentage(), " ",
               progressbar.Bar(), " ", progressbar.ETA()]
    pbar = progressbar.ProgressBar(maxval=len(paths),
        widgets=widgets).start()

    # loop over each of the individual images + labels
    for (i, (path, label)) in enumerate(zip(paths, labels)):
        # write the image index, label, and output path to file
        row = "\t".join([str(i), str(label), path])
        f.write("{}\n".format(row))
        pbar.update(i)

    # close the output file
    pbar.finish()
    f.close()
# write the label encoder to file
print("[INFO] serializing label encoder...")
f = open(config.LABEL_ENCODER_PATH, "wb")
f.write(pickle.dumps(le))
f.close()
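As a quick illustration of what the LabelEncoder step does (the species names here are hypothetical): it sorts the unique class names alphabetically and assigns each one an integer.

```python
from sklearn.preprocessing import LabelEncoder

# hypothetical species labels as they might appear in our CSV
labels = ["cardinal", "blue_jay", "cardinal", "blue_jay"]
le = LabelEncoder().fit(labels)

# classes_ are stored in sorted order, so blue_jay -> 0, cardinal -> 1
encoded = le.transform(labels)
```

Serializing the fitted encoder matters because we'll need the same name-to-integer mapping later to turn the model's predictions back into species names.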

Running our build dataset script will create the lst files in the directory defined by the config file we created earlier. With those in hand, we can run the im2rec script to generate the rec files.

To make things easy I grabbed the im2rec.py script off of MXNet's GitHub (it lives under tools/im2rec.py in the repository) and put it in my project directory:

cd /bird_ml/

To make generation of the rec files easier, I wrote a small script to create and organize them.


#!/bin/sh

/bird_ml/venv/bin/python /bird_ml/im2rec.py \
    --resize 256 --quality 100 --encoding '.jpg' /ML/Birds/lists ""

# make sure the rec directory exists, then move the generated
# index and record files into it
mkdir -p /ML/Birds/rec
mv /ML/Birds/lists/*.idx /ML/Birds/rec
mv /ML/Birds/lists/*.rec /ML/Birds/rec
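Before handing the lists to im2rec, it's worth a quick sanity check that every row really contains three tab-separated fields. A small sketch (the helper name is my own):

```python
def check_lst(lst_path):
    # read every non-empty line and split it on tabs
    with open(lst_path) as f:
        rows = [line.rstrip("\n").split("\t") for line in f if line.strip()]
    # every row should be exactly (index, label, path)
    assert all(len(r) == 3 for r in rows), "malformed row in " + lst_path
    return len(rows)
```

Running this over train.lst, val.lst, and test.lst also gives us the image counts per split, which should add up to the size of our full dataset.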

Now that we have our rec files, we're ready to start training! I'll handle that in the next post.