Actions on rules using Amazon CloudWatch and AWS Lambda
Amazon CloudWatch collects Amazon SageMaker AI model training job logs and Amazon SageMaker Debugger rule processing job logs. You can configure Debugger with Amazon CloudWatch Events and AWS Lambda to take actions based on the evaluation status of Debugger rules.
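The following is a minimal sketch of the CloudWatch Events side of this setup. It assumes boto3 `events` and `lambda` clients are passed in by the caller. The function creates a rule that matches SageMaker Training Job State Change events, allows CloudWatch Events to invoke a Lambda function, and sets that function as the rule's target. The function name `route_training_events_to_lambda` and the statement ID format are illustrative assumptions, not part of the documented procedure.

```python
import json

# Event pattern for SageMaker training job state-change events.
# These events arrive with source "aws.sagemaker" and this detail-type.
TRAINING_JOB_STATE_CHANGE_PATTERN = {
    "source": ["aws.sagemaker"],
    "detail-type": ["SageMaker Training Job State Change"],
}


def route_training_events_to_lambda(events_client, lambda_client,
                                    rule_name, function_name, function_arn):
    """Send SageMaker training job state changes to a Lambda function.

    Hypothetical helper. events_client and lambda_client are boto3 clients,
    for example boto3.client("events") and boto3.client("lambda"). They are
    passed in so the function can be exercised without AWS credentials.
    Returns the ARN of the CloudWatch Events rule.
    """
    # put_rule creates the rule, or updates it if it already exists.
    rule = events_client.put_rule(
        Name=rule_name,
        EventPattern=json.dumps(TRAINING_JOB_STATE_CHANGE_PATTERN),
        State="ENABLED",
    )
    rule_arn = rule["RuleArn"]

    # Allow CloudWatch Events to invoke the function, scoped to this rule only.
    lambda_client.add_permission(
        FunctionName=function_name,
        StatementId=f"{rule_name}-invoke",  # illustrative statement ID
        Action="lambda:InvokeFunction",
        Principal="events.amazonaws.com",
        SourceArn=rule_arn,
    )

    # Make the Lambda function the rule's target.
    events_client.put_targets(
        Rule=rule_name,
        Targets=[{"Id": function_name, "Arn": function_arn}],
    )
    return rule_arn
```

In an account with credentials, a call might look like `route_training_events_to_lambda(boto3.client("events"), boto3.client("lambda"), "stop-on-debugger-issues", "my-function", function_arn)`, where both names are hypothetical. `add_permission` fails if a statement with the same ID already exists, so rerunning the function requires a different statement ID or a prior `remove_permission` call. To turn the automation off later, calling `disable_rule(Name=rule_name)` on the events client disables the rule without deleting it.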
Example notebooks
You can run the following example notebooks to experiment with stopping a training job through actions on Debugger built-in rules, using Amazon CloudWatch and AWS Lambda.
- Amazon SageMaker Debugger - Reacting to CloudWatch Events from Rules
This example notebook runs a training job that has a vanishing gradient issue. The Debugger VanishingGradient built-in rule is attached when constructing the SageMaker AI TensorFlow estimator. When the Debugger rule detects the issue, the training job is terminated.
- Detect Stalled Training and Invoke Actions Using SageMaker Debugger Rule
This example notebook runs a training script containing a line of code that makes it sleep for 10 minutes. The Debugger StalledTrainingRule built-in rule detects the stall and stops the training job.
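In both notebooks, the training job is stopped once a built-in rule flags an issue. Below is a minimal sketch of a Lambda function that could perform that step. It assumes the function receives SageMaker Training Job State Change events whose `detail` carries `TrainingJobName`, `TrainingJobStatus`, and a `DebugRuleEvaluationStatuses` list, where each entry has `RuleConfigurationName` and `RuleEvaluationStatus`. The helper name `stop_on_rule_issues` and the response body format are illustrative, not an AWS API. The SageMaker client is passed in so the logic can be tested without AWS credentials.

```python
import json
import logging

logger = logging.getLogger()
logger.setLevel(logging.INFO)


def stop_on_rule_issues(event, sagemaker_client):
    """Stop the training job in `event` if any Debugger rule reports IssuesFound.

    Hypothetical helper. Returns the names of the rule configurations that
    triggered the stop, or an empty list if nothing was done.
    """
    detail = event.get("detail", {})
    job_name = detail.get("TrainingJobName")

    # Only a job that is still running can be stopped.
    if detail.get("TrainingJobStatus") != "InProgress":
        return []

    # Collect every rule whose evaluation found an issue.
    triggered = [
        status.get("RuleConfigurationName")
        for status in detail.get("DebugRuleEvaluationStatuses") or []
        if status.get("RuleEvaluationStatus") == "IssuesFound"
    ]
    if triggered:
        logger.info("Stopping %s; rules with issues: %s", job_name, triggered)
        sagemaker_client.stop_training_job(TrainingJobName=job_name)
    return triggered


def lambda_handler(event, context):
    # boto3 is imported here so the helper above can be tested
    # without boto3 installed; the Lambda Python runtime provides it.
    import boto3

    triggered = stop_on_rule_issues(event, boto3.client("sagemaker"))
    return {"statusCode": 200, "body": json.dumps({"stoppedFor": triggered})}
```

Checking `TrainingJobStatus` first means later state-change events for the same job (for example, while it is stopping) do not cause repeated `stop_training_job` calls.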
Topics
- Access CloudWatch logs for Debugger rules and training jobs
- Set up Debugger for automated training job termination using CloudWatch and Lambda
- Disable the CloudWatch Events rule to stop using the automated training job termination