How could you give anyone in an organization the power to build machine learning models and scale them?
How could anyone build models without heavily relying on data scientists?
AutoML solutions automatically discover the best algorithms and hyperparameters for your data.
Building machine learning models can be tedious and time-consuming, especially if you're not an experienced data scientist. Even experienced data scientists need a lot of trial and error to find the best algorithms and hyperparameters for their data. This is where AutoML solutions come in.
AutoML is the process of automatically discovering the best algorithms and hyperparameters for your data. Instead of spending hours or days trying different models and tweaking parameters, you let the AutoML system run those experiments for you.
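To make the idea concrete, here is a toy sketch of what an AutoML search does: try several candidate algorithms, each with a few hyperparameter values, and keep whichever scores best on held-out validation data. This is not how SageMaker Canvas works internally. The dataset, the three model families (a mean baseline, k-nearest neighbours, and least-squares linear regression), and names such as `auto_ml` are hypothetical, and real AutoML systems use much larger search spaces and smarter search strategies than this exhaustive grid.

```python
# Hypothetical toy data: y is roughly 2 * x + 1 with a little noise.
TRAIN = [(0.0, 1.1), (1.0, 2.9), (2.0, 5.2), (3.0, 6.8), (4.0, 9.1), (5.0, 11.0)]
VALID = [(1.5, 4.0), (3.5, 8.1), (4.5, 9.9)]


def fit_mean(data):
    """Baseline: always predict the mean training target."""
    avg = sum(y for _, y in data) / len(data)
    return lambda x: avg


def fit_knn(data, k):
    """k-nearest-neighbours regressor; k is the hyperparameter being tuned."""
    def predict(x):
        nearest = sorted(data, key=lambda p: abs(p[0] - x))[:k]
        return sum(y for _, y in nearest) / k
    return predict


def fit_linear(data):
    """Ordinary least squares with a single feature."""
    n = len(data)
    mx = sum(x for x, _ in data) / n
    my = sum(y for _, y in data) / n
    slope = (sum((x - mx) * (y - my) for x, y in data)
             / sum((x - mx) ** 2 for x, _ in data))
    intercept = my - slope * mx
    return lambda x: slope * x + intercept


def mse(model, data):
    """Mean squared error of a model on (x, y) pairs."""
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)


# Hypothetical search space: each algorithm with the hyperparameter settings to try.
SEARCH_SPACE = [
    ("mean", fit_mean, [{}]),
    ("knn", fit_knn, [{"k": k} for k in (1, 2, 3)]),
    ("linear", fit_linear, [{}]),
]


def auto_ml(train, valid):
    """Exhaustive grid search: fit every candidate, keep the lowest validation error."""
    results = []
    for name, fit, grid in SEARCH_SPACE:
        for params in grid:
            model = fit(train, **params)
            results.append((mse(model, valid), name, params))
    return min(results, key=lambda r: r[0])


best_score, best_name, best_params = auto_ml(TRAIN, VALID)
print(best_name, best_params, round(best_score, 4))
```

On this nearly linear toy data the linear model wins; on messier data a different candidate could come out on top, which is exactly why the search is worth automating.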
This post will examine Amazon SageMaker Canvas, a brand new AutoML solution to build end-to-end data pipelines and train ML models at scale.
An intro to Amazon SageMaker
SageMaker is a fully managed platform that enables developers and data scientists to quickly build, train, and deploy machine learning models at scale. If you're not familiar with it, I highly recommend checking out the docs.
If you want to build a machine learning workflow without SageMaker, you'll need to set up and manage a lot of infrastructure yourself.
For instance, to store and process your data, you'll need to set up and manage data storage (e.g., Amazon S3, DynamoDB) and compute resources (e.g., EC2, EMR). To train your models, you'll need a training environment for frameworks such as TensorFlow, MXNet, or PyTorch. To deploy them, you'll need a prediction environment (e.g., AWS Lambda behind Amazon API Gateway).
SageMaker takes care of all of that for you.
With SageMaker, you can focus on what matters most to you: building and training your machine learning models. Training jobs run on instances that SageMaker provisions for the job and shuts down when it finishes, and deployed endpoints can be configured to scale automatically with traffic.
SageMaker Canvas is a new end-to-end, no-code AutoML solution that lets anyone in an organization build, train, and deploy machine learning models at scale through a visual interface.
SageMaker Canvas is designed to be used by both data scientists and non-data scientists.
Data scientists can use SageMaker Canvas to build and iterate on machine learning models quickly. Non-data scientists can use SageMaker Canvas to build machine learning models without heavily relying on data scientists.
Building a machine learning model with SageMaker Canvas is simple. Here are the features that make SageMaker Canvas stand out in the way people work with ML:
Connecting a wide range of data sources
Connecting to data sources is the first hurdle for most people. With SageMaker Canvas, you connect to them through a simple point-and-click interface. Canvas works with popular AWS data sources such as Amazon S3, Amazon Redshift, and Amazon Athena, and you can also connect to a data warehouse such as Snowflake.
Once connected, you can join datasets from multiple data sources and get a single view of your data.
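Conceptually, joining two datasets on a shared key works like the inner join sketched below in plain Python. The `customers` and `orders` tables and the `customer_id` key are hypothetical stand-ins for data pulled from two different sources; Canvas performs joins for you through its visual interface, and this sketch makes no claim about how it does so internally.

```python
# Hypothetical rows, e.g. customers from one source and orders from another.
customers = [
    {"customer_id": 1, "region": "EU"},
    {"customer_id": 2, "region": "US"},
    {"customer_id": 3, "region": "APAC"},
]
orders = [
    {"customer_id": 1, "order_total": 120.0},
    {"customer_id": 1, "order_total": 80.0},
    {"customer_id": 2, "order_total": 50.0},
]


def inner_join(left, right, key):
    """Return one merged row for every pair of rows that share the same key value."""
    # Index the right-hand table by key so each lookup is a dictionary access.
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    joined = []
    for row in left:
        for match in index.get(row[key], []):
            joined.append({**row, **match})
    return joined


for row in inner_join(customers, orders, "customer_id"):
    print(row)
```

Customer 3 has no orders, so an inner join drops it; a left join would keep it with an empty `order_total` instead.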
Semi-automated data cleaning and preparation
After your data is connected, the next step is to prepare it for modeling. SageMaker Canvas provides a visual interface that makes this easy.
You select the columns you want your model to use, and Canvas automatically detects their data types. You can also specify how missing values should be handled.
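Type detection and missing-value handling can be sketched in a few lines. This is an assumption-laden illustration, not Canvas's actual logic: it treats a column as numeric when every non-empty value parses as a number, and fills gaps with the column mean (numeric) or the most frequent value (categorical). The column names and values are hypothetical.

```python
from statistics import mean, mode

# Hypothetical raw column values as they might arrive from a CSV file.
# Empty strings stand for missing values.
raw = {
    "age": ["34", "", "51", "28"],
    "plan": ["basic", "premium", "", "basic"],
}


def infer_type(values):
    """Call a column numeric if every non-missing value parses as a float."""
    present = [v for v in values if v != ""]
    try:
        for v in present:
            float(v)
        return "numeric"
    except ValueError:
        return "categorical"


def impute(values, column_type):
    """Fill missing numeric values with the mean and missing categories with the mode."""
    present = [v for v in values if v != ""]
    if column_type == "numeric":
        fill = mean(float(v) for v in present)
        return [float(v) if v != "" else fill for v in values]
    fill = mode(present)
    return [v if v != "" else fill for v in values]


clean = {}
for name, values in raw.items():
    col_type = infer_type(values)
    clean[name] = impute(values, col_type)
    print(name, col_type, clean[name])
```

Mean and mode are only two of many possible strategies; dropping rows or filling with a constant are common alternatives, and the right choice depends on the data.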
Building and training ML models with a point-and-click interface
After your data is prepared, the next step is to build the model. With SageMaker Canvas, you don't need to pick an algorithm or set hyperparameters yourself: you choose the column you want to predict, and Canvas takes it from there.
Behind the scenes, Canvas uses SageMaker's AutoML capabilities to try candidate algorithms, tune their hyperparameters automatically, and select the best-performing model for you.
Deploy your ML model into production systems
After your model is trained, the next step is to put it to work. With SageMaker Canvas, you can quickly deploy your model as a web service for real-time predictions or use it for batch predictions over an entire dataset.
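The two ways of using a trained model differ mainly in how many records are scored at once. In the sketch below, `predict_churn` is a hypothetical hand-written rule standing in for a trained model, and the column names are made up. It only contrasts scoring a single record on request with scoring every row of a CSV file and appending a prediction column, which is roughly what a batch prediction produces.

```python
import csv
import io


def predict_churn(row):
    """Hypothetical stand-in for a trained model: a simple hand-written rule."""
    return "yes" if float(row["monthly_charges"]) > 70 and int(row["tenure_months"]) < 12 else "no"


def single_prediction(record):
    """Real-time style: score one record as soon as it arrives."""
    return predict_churn(record)


def batch_prediction(csv_text):
    """Batch style: score every row of a file and return it with a prediction column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames + ["prediction"])
    writer.writeheader()
    for row in reader:
        writer.writerow({**row, "prediction": predict_churn(row)})
    return out.getvalue()


# Hypothetical input file contents.
DATA = "customer_id,monthly_charges,tenure_months\n1,85.0,3\n2,40.0,24\n"
print(single_prediction({"customer_id": "1", "monthly_charges": "85.0", "tenure_months": "3"}))
print(batch_prediction(DATA))
```

Real-time serving optimizes for the latency of each request, while batch scoring optimizes for throughput over many rows; the model itself is the same in both cases.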
Once a model is deployed, the broader SageMaker platform offers tools such as Amazon SageMaker Model Monitor for watching it in production and raising alerts when issues such as data drift appear.
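One common monitoring signal is data drift: the inputs a model sees in production start to look different from its training data. The sketch below flags a feature when its live mean moves more than a chosen number of training standard deviations away from the training mean. The threshold of 3, the feature names, the data, and the `drift_alerts` helper are assumptions for illustration; production monitoring tools compute richer statistics than this single check.

```python
from statistics import mean, stdev


def drift_alerts(baseline, live, threshold=3.0):
    """Hypothetical helper: flag features whose live mean drifted from the training mean.

    Drift is measured in baseline standard deviations; assumes every baseline
    feature has at least two values and a nonzero spread.
    """
    alerts = []
    for feature, train_values in baseline.items():
        mu, sigma = mean(train_values), stdev(train_values)
        shift = abs(mean(live[feature]) - mu) / sigma
        if shift > threshold:
            alerts.append((feature, round(shift, 2)))
    return alerts


# Hypothetical training baseline and a window of live traffic.
baseline = {
    "monthly_charges": [50, 55, 60, 65, 70],
    "tenure_months": [10, 12, 14, 16, 18],
}
live = {
    "monthly_charges": [52, 58, 61, 66],
    "tenure_months": [40, 42, 45, 44],
}
print(drift_alerts(baseline, live))
```

Here only `tenure_months` is flagged, because its live values sit far outside the range seen during training.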
Flexible Pricing Model
SageMaker Canvas uses a flexible, pay-as-you-go pricing model. At the time of writing, each session costs $1.90 per hour.
Model training is priced separately and in tiers, based on the number of cells (rows × columns) in your training data. The first 10M cells cost $30 per million cells, which is often enough for basic models. The next 90M cells cost $15 per million, and any cells beyond 100M cost only $7 per million, which matters for projects with larger training sets.
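The tiered training price is easiest to follow as arithmetic. The sketch below assumes the tiers are marginal, that is, each rate applies only to the cells inside its band, which is how the description above reads. It uses the prices quoted in this post, which may have changed since, and the function names are hypothetical. For example, a 150M-cell training set would cost 10 × $30 + 90 × $15 + 50 × $7 = $2,000.

```python
import math

# Prices quoted in this post (may have changed since).
SESSION_RATE = 1.90  # USD per session hour
# (cells in this tier, USD per million cells); tiers are assumed to be marginal.
TRAINING_TIERS = [(10_000_000, 30.0), (90_000_000, 15.0), (math.inf, 7.0)]


def training_cost(cells):
    """Hypothetical helper: bill each band of cells at its own rate."""
    cost = 0.0
    remaining = cells
    for tier_size, rate in TRAINING_TIERS:
        billed = min(remaining, tier_size)
        cost += billed / 1_000_000 * rate
        remaining -= billed
        if remaining <= 0:
            break
    return cost


def canvas_cost(session_hours, cells):
    """Hypothetical helper: session time plus one training run."""
    return session_hours * SESSION_RATE + training_cost(cells)


print(training_cost(5_000_000))
print(training_cost(150_000_000))
print(canvas_cost(10, 2_000_000))
```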
Compared with paying for a data scientist's time on every modeling task, these prices make SageMaker Canvas an attractive option.
SageMaker Canvas is an excellent tool from AWS. It lets anyone build an end-to-end machine learning pipeline without relying on a data scientist.
It's built on top of Amazon SageMaker, a one-stop shop for all ML-related infrastructure needs. Canvas extends it with a visual editor and friendly workflow interfaces to simplify the process.
Further, Canvas's flexible pay-as-you-go pricing makes it a good fit for organizations at every stage.