Machine learning for novices and experts

With Amazon Redshift, you can use machine learning (ML) to gain insights from your data, whether you are new to ML or an expert. Amazon Redshift ML is a feature that lets you create, train, and deploy ML models using SQL commands, without extensive ML expertise or complex data engineering.

The following sections describe how to use Amazon Redshift ML to get more value from your data in Amazon Redshift.

Amazon Redshift ML lets you train a model with a single SQL CREATE MODEL command. CREATE MODEL creates a model that Amazon Redshift then uses to generate predictions through familiar SQL constructs.
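To show what "a single command" looks like, the following Python sketch assembles the text of a minimal CREATE MODEL statement. The sketch does not execute the statement. It uses only the required clauses (FROM, TARGET, FUNCTION, IAM_ROLE, and SETTINGS with S3_BUCKET), so the problem type, algorithm, and hyperparameters are left for Amazon Redshift ML to discover. The helper `minimal_create_model` and all table, column, function, and bucket names are hypothetical. The sketch also assumes that a default IAM role is attached to the cluster, which is what `IAM_ROLE default` relies on.

```python
# A minimal sketch: build (but do not run) a CREATE MODEL statement that uses
# only the required clauses, leaving problem type, algorithm, and
# hyperparameters for Amazon Redshift ML to discover.
# The helper and all object names are hypothetical placeholders.


def minimal_create_model(model_name: str, source_query: str, target: str,
                         function_name: str, s3_bucket: str) -> str:
    """Return a CREATE MODEL statement built from the required clauses only."""
    return "\n".join([
        f"CREATE MODEL {model_name}",
        f"FROM ({source_query})",        # training data supplied as a SELECT query
        f"TARGET {target}",              # column the model learns to predict
        f"FUNCTION {function_name}",     # SQL function created for predictions
        "IAM_ROLE default",              # assumes a default IAM role is attached
        f"SETTINGS (S3_BUCKET '{s3_bucket}');",  # S3 bucket used during training
    ])


if __name__ == "__main__":
    print(minimal_create_model(
        model_name="customer_churn_model",                                   # hypothetical
        source_query="SELECT age, monthly_charges, churn FROM customer_activity",
        target="churn",
        function_name="ml_fn_customer_churn",                                # hypothetical
        s3_bucket="amzn-s3-demo-bucket",                                     # placeholder
    ))
```

Because every optional clause is absent, Amazon Redshift ML is free to search across problem types, algorithms, and hyperparameters for this model.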

Amazon Redshift ML is especially useful when you lack expertise in machine learning tools, languages, algorithms, and APIs. With Amazon Redshift ML, you don't have to do the undifferentiated heavy lifting required to integrate with an external machine learning service. Amazon Redshift saves you the time otherwise spent formatting and moving data, managing permission controls, or building custom integrations, workflows, and scripts. You can use popular machine learning algorithms and simplify training workflows that require frequent iteration between training and prediction. Amazon Redshift automatically discovers the best algorithm and tunes the best model for your problem. You can make predictions from within the Amazon Redshift cluster without moving data out of Amazon Redshift or having to interface with, and pay for, another service.

Amazon Redshift ML helps data analysts and data scientists use machine learning. It also lets machine learning experts apply their knowledge by restricting the CREATE MODEL statement to only the options they specify. Doing so can shorten the time that CREATE MODEL needs to find the best candidate, improve the model's accuracy, or both.

The CREATE MODEL statement offers flexibility in how you specify the parameters of the training job. Using this flexibility, both machine learning novices and experts can choose their preferred preprocessors, algorithms, problem types, and hyperparameters. For example, a user predicting customer churn might specify binary classification as the problem type, because each customer either churns or does not. CREATE MODEL then restricts its search for the best model to binary classification models. Even after the user chooses the problem type, CREATE MODEL still has many options to explore. For example, it discovers and applies the best preprocessing transformations and finds the best hyperparameter settings.
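The churn example can be sketched by adding the optional PROBLEM_TYPE and OBJECTIVE clauses to CREATE MODEL. The following Python helper is hypothetical and only assembles SQL text. It appends each optional clause only when the caller supplies it, so the same function covers both the fully automatic case and an expert-guided case. The accepted problem types listed here are the three classic ones (REGRESSION, BINARY_CLASSIFICATION, MULTICLASS_CLASSIFICATION). The `'F1'` objective, the table, the columns, and the bucket are illustrative assumptions; check the CREATE MODEL reference for the full set of values.

```python
# A minimal sketch: assemble a CREATE MODEL statement in which an expert
# optionally fixes the problem type and objective, narrowing the search.
# The helper and all object names are hypothetical placeholders.
from typing import Optional

# Classic problem types accepted by PROBLEM_TYPE (not an exhaustive list).
VALID_PROBLEM_TYPES = {"REGRESSION", "BINARY_CLASSIFICATION", "MULTICLASS_CLASSIFICATION"}


def create_model_with_hints(model_name: str, source_query: str, target: str,
                            function_name: str, s3_bucket: str,
                            problem_type: Optional[str] = None,
                            objective: Optional[str] = None) -> str:
    """Return CREATE MODEL text; omitted hints are left for automatic discovery."""
    if problem_type is not None and problem_type not in VALID_PROBLEM_TYPES:
        raise ValueError(f"unsupported problem type: {problem_type}")
    lines = [
        f"CREATE MODEL {model_name}",
        f"FROM ({source_query})",
        f"TARGET {target}",
        f"FUNCTION {function_name}",
        "IAM_ROLE default",  # assumes a default IAM role is attached
    ]
    if problem_type:
        lines.append(f"PROBLEM_TYPE {problem_type}")  # restricts the model search
    if objective:
        lines.append(f"OBJECTIVE '{objective}'")      # metric used to rank candidates
    lines.append(f"SETTINGS (S3_BUCKET '{s3_bucket}');")
    return "\n".join(lines)


if __name__ == "__main__":
    print(create_model_with_hints(
        "customer_churn_model",
        "SELECT age, monthly_charges, churn FROM customer_activity",
        "churn", "ml_fn_customer_churn", "amzn-s3-demo-bucket",
        problem_type="BINARY_CLASSIFICATION", objective="F1",
    ))
```

Leaving both hints as `None` produces the fully automatic statement. Setting them expresses the expert's knowledge about the problem without fixing the preprocessing or hyperparameters, which CREATE MODEL still searches for.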

Amazon Redshift ML makes training easier by automatically finding the best model with Amazon SageMaker Autopilot. Behind the scenes, SageMaker Autopilot trains and tunes the best machine learning model based on the data you supply. Amazon SageMaker Neo then compiles the trained model and makes it available for prediction in your Redshift cluster. When you run an inference query with a trained model, the query combines machine learning–based prediction with the massively parallel processing capabilities of Amazon Redshift.
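Once training finishes, the prediction function named in the FUNCTION clause can be called like any other SQL function, so inference is an ordinary query that Amazon Redshift runs in parallel across its nodes. The following Python sketch only formats such a query. The helper `prediction_query` and the function, column, and table names are hypothetical. The sketch assumes that the arguments are passed in the same order as the feature columns in the training query.

```python
# A minimal sketch: format an inference query that calls the prediction
# function created by CREATE MODEL. The helper and names are hypothetical;
# arguments are assumed to follow the training query's feature-column order.
from typing import List


def prediction_query(function_name: str, feature_columns: List[str], table: str) -> str:
    """Return a SELECT that scores every row of `table` with the model function."""
    cols = ", ".join(feature_columns)
    return f"SELECT {cols}, {function_name}({cols}) AS prediction\nFROM {table};"


if __name__ == "__main__":
    print(prediction_query("ml_fn_customer_churn",
                           ["age", "monthly_charges"], "customer_activity"))
```

Because the prediction is just a column expression, it can be combined with WHERE, GROUP BY, or joins in the same query.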

As an Amazon Redshift ML user, you can choose any of the following options to train and deploy your model: