
This Tiny Python Package Creates Huge Augmented Datasets

After months of hard work, you and your team have gathered a vast amount of data for your machine-learning project.

The project budget is almost exhausted, and what's left is only enough to train the model.

But as soon as you train the model, you see it isn't generalizing well. The data you collected isn't enough: the training accuracy looks great, but on the validation set, it drops drastically.

Technically, this is what we famously call overfitting.

There are different ways to deal with overfitting, but your team concludes none of them is working.

What’s left is one of two options:

  • Collect more data
  • Create copies of existing data points with slight adjustments (data augmentation)

Data augmentation has been shown to improve machine learning model accuracy without collecting further data. It's a widespread technique that many practitioners use.

Collecting data is a costly task in many cases. You may have to pay for equipment and permissions, not to mention labeling the data after collection.

Related: How to know if deep learning is right for you?

Take, for instance, the medical image classification problem. There are legal restrictions on how you should collect healthcare data. And after collection, to label them, you need the expertise of skilled professionals such as doctors.

This post will discuss an image data augmentation tool called Albumentations through examples. It’s an open-source Python library released under the MIT license.

You can install it with the following command from the PyPI repository. If you are looking for advanced instructions, please consult the official documentation.

pip install -U albumentations

What does data augmentation do?

Data augmentation creates copies of existing data points with some transformation applied. For instance, we can crop and rotate an image so that it looks new to the model during training. This way, we can multiply our dataset and train the model on the expanded data to improve its accuracy.
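Before reaching for a library, the core idea fits in a few lines of plain NumPy. Here's a minimal sketch, using a toy 4×4 array in place of a real photo, of how a flip and a crop turn one sample into several:

```python
import numpy as np

# A tiny 4x4 grayscale "image" stands in for a real photo.
image = np.arange(16, dtype=np.uint8).reshape(4, 4)

# Two simple augmentations built from NumPy alone:
flipped = np.fliplr(image)   # horizontal flip
cropped = image[1:3, 1:3]    # 2x2 crop from the center

# Each copy differs from the original, so the model sees "new" samples.
print(np.array_equal(image, flipped))  # False
print(cropped.shape)                   # (2, 2)
```

A library like Albumentations does the same thing, just with many more transforms and random parameters.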

Here’s an illustration of what it means.

This is how data augmentation works: we create multiple copies of the original image with slight variations, so that the machine learning model treats them as different samples and learns to recognize all of them. Rather than doing this manually, we can use a Python library such as Albumentations to integrate it into our pipeline.

We transformed the base image using different techniques. In many photos, we’ve used more than one technique in combination. By continuing this process, we can generate tons of data points.

Related: Transfer Learning: The highest leverage deep learning skill you can learn.

Creating an image augmentation pipeline using Albumentations

Creating an augmentation pipeline using Albumentations is very straightforward.

Initially, we need to compose an augmentation pipeline by configuring a list of transformations. Then we can use any image processing library, such as Pillow or OpenCV, to read images from the filesystem. Every time we pass an image through the list of transformations we configured, it gives us an altered image.

Here’s an example usage you can replicate to get started.

import albumentations as A
from PIL import Image
import numpy as np

# Create a pipeline with 4 different transformations.
transform = A.Compose([
    A.RandomCrop(width=256, height=256),
    A.HorizontalFlip(p=0.5),
    A.RandomBrightnessContrast(brightness_limit=0.5, contrast_limit=0.3),
    A.Rotate(),
])

# Read the image and convert it to a NumPy array
pillow_image ="image.original.jpg")
image = np.array(pillow_image)

# Apply the transformations
transformed = transform(image=image)

# Access and show the transformed image
transformed_image = transformed["image"]
img = Image.fromarray(transformed_image)

In the above example, we’ve used four types of transformations.

  1. We cropped the image starting from random locations. We configured the RandomCrop API to result in a picture of 256×256 size.
  2. We used a horizontal flip. Note that this operation is not applied all the time. We’ve configured a probability value of 0.5 for it. It means every image going through this pipeline has a 50% chance of being flipped horizontally.
  3. The random brightness and contrast API changes those image features within limits of 0.5 and 0.3, respectively. We haven't explicitly set a probability value as we did for the horizontal flip, but the API defaults to 0.5. Hence, every image has a 50% chance of a brightness and contrast modification. When it applies, brightness won't be altered by more than 50%, and contrast by no more than 30%.
  4. Finally, we randomly rotate the image. We haven’t overridden any default values here.
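To see how a probability like p=0.5 behaves, here's a small sketch that mimics the mechanism with plain NumPy rather than Albumentations itself:

```python
import numpy as np

rng = np.random.default_rng(seed=0)
image = np.arange(16, dtype=np.uint8).reshape(4, 4)

def maybe_flip(img, p=0.5):
    # Mimic the p parameter: apply the transform only a fraction p of the time.
    return np.fliplr(img) if rng.random() < p else img

# Over many passes, roughly half of the copies come out flipped.
flipped_count = sum(
    np.array_equal(maybe_flip(image), np.fliplr(image)) for _ in range(1000)
)
print(400 < flipped_count < 600)  # True
```

Each image that flows through the pipeline rolls these dice independently for every transform, which is what makes the generated copies so varied.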

Then we read an image from the filesystem using Pillow, a widely used Python library for image processing, and converted it into a NumPy array.

Lastly, we sent our image through the configured pipeline and displayed the result.

Albumentations has a few dozen such transformations. You can learn about them in detail in the API documentation.

Related: The Prefect Way to Automate & Orchestrate Data Pipelines

How to create augmented data with annotations?

Most computer vision applications have to deal with annotated images. These are objects marked and labeled in a photo for training the ML model.

A good data augmentation tool should automatically recalculate the positions of the annotated objects in the transformed data. For instance, when an image is flipped, the positions of the dog and the cat in it change; Albumentations correctly re-annotates them on the resulting image.

When augmenting such datasets, we also need to know the new positions of those annotated objects.

Our augmentation technique should recalculate the coordinates accordingly. This task is effortless in Albumentations: we provide the initial coordinates and category IDs of the annotated objects to the pipeline, and the result contains their new positions.

bboxes = [[0, 128, 300, 420], [366.7, 340, 270, 230]]
category_ids = [1, 2]

transform = A.Compose(
    [A.HorizontalFlip(p=0.5), A.Rotate()],
    bbox_params=A.BboxParams(format="coco", label_fields=["category_ids"]),  # Configure the pipeline for annotations
)

# Pass annotation coordinates and categories along with the image
transformed = transform(image=image, bboxes=bboxes, category_ids=category_ids)

We've altered the Compose call that creates the augmentation pipeline in the above example, adding another input, bbox_params, with its configuration.

By this, we're telling Albumentations to use the coco format for the annotated boxes and to look up their labels in the category_ids field. coco is one of the four bounding-box formats available in this library.
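To make the format distinction concrete, here's a tiny helper (a hypothetical name, not part of Albumentations) converting a coco box, [x_min, y_min, width, height], into the pascal_voc corner convention, [x_min, y_min, x_max, y_max]:

```python
def coco_to_pascal_voc(bbox):
    # coco boxes: [x_min, y_min, width, height]
    # pascal_voc boxes: [x_min, y_min, x_max, y_max]
    x, y, w, h = bbox
    return [x, y, x + w, y + h]

# The first box from our example above:
print(coco_to_pascal_voc([0, 128, 300, 420]))  # [0, 128, 300, 548]
```

If your annotations arrive in a different convention, you only change the `format` argument of `BboxParams` rather than converting the boxes yourself.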

Finally, we pass two extra arguments at run time with every image. The first defines each box by its starting coordinates, width, and height; the second gives the label of each box.

Here’s how the result looks.

Resulting image has re-annotated objects.

After the transformation, the positions of the objects have changed. Yet, the pipeline has correctly recalculated their new places.
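The recalculation the pipeline performs can be sketched by hand for the simplest case, a horizontal flip. Assuming a hypothetical image width of 640 pixels and a coco-format box, the new x_min is the mirror of the box's right edge:

```python
def hflip_coco_bbox(bbox, image_width):
    # After a horizontal flip, x_min becomes image_width - (x_min + width);
    # the y coordinate and the box size are unchanged.
    x, y, w, h = bbox
    return [image_width - (x + w), y, w, h]

# The first box from our example, on a 640-pixel-wide image:
print(hflip_coco_bbox([0, 128, 300, 420], image_width=640))  # [340, 128, 300, 420]
```

Albumentations applies the equivalent geometry for every transform in the pipeline, which is why the returned `bboxes` always line up with the transformed image.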

Related: How to speed up your data science pipelines up to 91X?

Final Thoughts

Data augmentation saves an incredible amount of time, effort, and budget by reusing existing images. We could do this with any image processing library, but some specific tasks may require extra effort to do correctly.

For instance, annotated images require re-annotating the objects on the augmented image. A tool like Albumentations comes in handy in those cases.

This post is a short introduction to what we can do with this Python library. I hope the next time you train a machine learning model, you'll use it to improve accuracy before considering further data collection.

Related: 3 Ways to deploy machine learning models in Production

Thanks for the read, friend. It seems you and I have lots of common interests. Say Hi to me on LinkedIn, Twitter, and Medium. I’ll break the ice for you.

Not a Medium member yet? Please use this link to become a member, because I earn a commission for referring you at no extra cost to you.
