{"id":300,"date":"2022-01-18T00:00:00","date_gmt":"2022-01-18T00:00:00","guid":{"rendered":"https:\/\/tac.debuzzify.com\/?p=300"},"modified":"2023-06-21T12:55:08","modified_gmt":"2023-06-21T12:55:08","slug":"data-augmentation-in-python","status":"publish","type":"post","link":"https:\/\/www.the-analytics.club\/data-augmentation-in-python\/","title":{"rendered":"This Tiny Python Package Creates Huge Augmented Datasets"},"content":{"rendered":"\n\n\n

After months of hard work, you and your team have gathered a vast amount of data for your machine-learning project.<\/p>\n\n\n\n

The project budget is almost exhausted, and what\u2019s left is only enough to train the model.<\/p>\n\n\n\n

But as soon as you train the model, you notice it isn\u2019t generalizing well, because the data you collected is not enough. Accuracy on the training set is excellent, but on the validation set it drops drastically.<\/p>\n\n\n\n

Technically, this is what\u2019s known as overfitting<\/a>.<\/p>\n\n\n\n

There are several ways to deal with overfitting, but your team concludes that none of them are working.<\/p>\n\n\n\n

What\u2019s left is one of two options:<\/p>\n\n\n\n