{"id":300,"date":"2022-01-18T00:00:00","date_gmt":"2022-01-18T00:00:00","guid":{"rendered":"https:\/\/tac.debuzzify.com\/?p=300"},"modified":"2023-06-21T12:55:08","modified_gmt":"2023-06-21T12:55:08","slug":"data-augmentation-in-python","status":"publish","type":"post","link":"https:\/\/www.the-analytics.club\/data-augmentation-in-python\/","title":{"rendered":"This Tiny Python Package Creates Huge Augmented Datasets"},"content":{"rendered":"\n\n\n
After months of hard work, you and your team have gathered a vast amount of data for your machine-learning project.<\/p>\n\n\n\n
The project budget is almost spent, and what\u2019s left is only enough to train the model.<\/p>\n\n\n\n
But as soon as you train the model, you see that it isn\u2019t generalizing well. The data you collected is not enough. Training accuracy is high, but accuracy on the validation set drops drastically.<\/p>\n\n\n\n
Technically, this is what we call overfitting<\/a>.<\/p>\n\n\n\n There are different ways to deal with overfitting. But your team concludes that none of them is working.<\/p>\n\n\n\n What\u2019s left is one of two options: collect more data, or augment the data you already have.<\/p>\n\n\n\n Data augmentation has been shown to improve machine learning model accuracy<\/a> without collecting further data. It\u2019s a widespread technique that many practitioners use.<\/p>\n\n\n\n Collecting data is costly in many cases. You may have to pay for equipment and permissions, not to mention the cost of labeling the data after collection.<\/p>\n\n\n\n Related:<\/i> How to know if deep learning is right for you?<\/i><\/b><\/a><\/p>\n\n\n\n Take, for instance, the medical image classification problem. There are legal restrictions<\/a> on how you may collect healthcare data. And after collection, labeling it requires the expertise of skilled professionals such as doctors.<\/p>\n\n\n\n This post introduces an image data augmentation tool called Albumentations<\/a> through examples. It\u2019s an open-source Python library released under the MIT license.<\/p>\n\n\n\n You can install it from PyPI with the following command. For advanced instructions, please consult the official documentation<\/a>.<\/p>\n\n\n\n\n