June 2023: This post was reviewed and updated for accuracy.

Amazon Redshift, a fast, fully managed, widely used cloud data warehouse, natively integrates with Amazon SageMaker for machine learning (ML). Tens of thousands of customers use Amazon Redshift to process exabytes of data every day to power their analytics workloads. Data analysts and database developers want to use this data to train ML models, which can then be used to generate insights for use cases such as forecasting revenue, predicting customer churn, and detecting anomalies.

Amazon Redshift ML makes it easy for SQL users to create, train, and deploy ML models using familiar SQL commands. In a previous post, we covered how Amazon Redshift ML allows you to use your data in Amazon Redshift with SageMaker, a fully managed ML service, without requiring you to become an expert in ML. In an earlier post, we also discussed how Amazon Redshift ML enables ML experts to create XGBoost or MLP models. Additionally, Amazon Redshift ML allows data scientists to either import existing SageMaker models into Amazon Redshift for in-database inference or remotely invoke a SageMaker endpoint.

This post shows how you can enable your data warehouse users to use SQL to invoke a remote SageMaker endpoint for prediction. We first train and deploy a Random Cut Forest model in SageMaker, and demonstrate how you can create a model with SQL to invoke SageMaker predictions remotely. Then, we show how end users can invoke the model.

To get started, we need an Amazon Redshift cluster with the Amazon Redshift ML feature enabled. For an introduction to Amazon Redshift ML and instructions on setting it up, see Create, train, and deploy machine learning models in Amazon Redshift using SQL with Amazon Redshift ML. You also have to make sure that the SageMaker model is deployed and that you have the endpoint. You can use the following AWS CloudFormation template to provision all the required resources in your AWS account automatically.

Solution overview

Amazon Redshift ML supports text and CSV inference formats. For more information about various SageMaker algorithms and their inference formats, see Random Cut Forest (RCF) Algorithm.

Amazon SageMaker Random Cut Forest (RCF) is an algorithm designed to detect anomalous data points within a dataset. Examples of anomalies that are important to detect include when website activity uncharacteristically spikes, when temperature data diverges from a periodic behavior, or when changes to public transit ridership reflect the occurrence of a special event.

In this post, we use the SageMaker RCF algorithm to train an RCF model on the Numenta Anomaly Benchmark (NAB) NYC Taxi dataset, using the notebook generated by the CloudFormation template. We downloaded the data and stored it in an Amazon Simple Storage Service (Amazon S3) bucket. The data consists of the number of New York City taxi passengers over the course of 6 months, aggregated into 30-minute buckets. We naturally expect to find anomalous events occurring during the NYC marathon, Thanksgiving, Christmas, New Year’s Day, and on the day of a snowstorm. We then use this model to predict anomalous events by generating an anomaly score for each data point.

The following figure illustrates how we use Amazon Redshift ML to create a model using the SageMaker endpoint.

To deploy the model, go to the SageMaker console and open the notebook that was created by the CloudFormation template. Then choose bring-your-own-model-remote-inference.ipynb. Set up parameters as shown in the following screenshot and then run all cells.

On the Amazon SageMaker console, under Inference in the navigation pane, choose Endpoints to find your model name. You use this when you create the remote inference model in Amazon Redshift.

Prepare data to create a remote inference model using Amazon Redshift ML

Create the schema and load the data in Amazon Redshift using SQL.

The following query flags the data points whose anomaly score is more than three standard deviations above the mean score:

with score_cutoff as
(select stddev(public.remote_fn_rcf(nbr_passengers)) as std, avg(public.remote_fn_rcf(nbr_passengers)) as mean, ( mean + 3 * std ) as score_cutoff_value
from ...)
select ride_timestamp, nbr_passengers, public.remote_fn_rcf(nbr_passengers) as score
from ...
where score > (select score_cutoff_value from score_cutoff)

The data in the following screenshot shows that the biggest spike in ridership occurs on November 2, 2014, which was the annual NYC marathon. We also see spikes on Labor Day weekend, New Year’s Day, and the July 4th holiday weekend.

In this post, we used SageMaker Random Cut Forest to detect anomalous data points in a taxi ridership dataset. In this data, the anomalies occurred when ridership was uncharacteristically high or low.
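The cutoff rule that appears in the post's SQL (mean + 3 * std of the anomaly scores) can be sketched outside the database as follows. This is a minimal illustration in Python rather than the post's own code: the function names, the sample rows, and the score values are hypothetical, and the scores stand in for whatever public.remote_fn_rcf would return from the SageMaker endpoint.

```python
from statistics import mean, stdev


def score_cutoff(scores, n_sigmas=3.0):
    """Return mean + n_sigmas * sample standard deviation of the scores."""
    return mean(scores) + n_sigmas * stdev(scores)


def flag_anomalies(rows, n_sigmas=3.0):
    """Keep rows whose score exceeds the cutoff.

    rows: list of (ride_timestamp, nbr_passengers, score) tuples.
    The result is sorted by passenger count, highest first.
    """
    cutoff = score_cutoff([score for _, _, score in rows], n_sigmas)
    flagged = [row for row in rows if row[2] > cutoff]
    return sorted(flagged, key=lambda row: row[1], reverse=True)


# Hypothetical data: 20 ordinary readings plus one unusually high score.
rows = [(f"2014-11-01 {hour:02d}:00", 10000 + hour, 1.0) for hour in range(20)]
rows.append(("2014-11-02 01:00", 39000, 5.0))

cutoff = score_cutoff([score for _, _, score in rows])
anomalies = flag_anomalies(rows)
```

With very few data points a single outlier cannot exceed a three-sigma cutoff, because the outlier itself inflates the standard deviation, so a rule like this is only meaningful on a reasonably large set of scores.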