Default system monitoring and customized framework profiling for target steps or a target time range

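To profile specific target steps or a target time range in your training job, specify parameters for the FrameworkProfile class. The following code examples show how to specify the target ranges for profiling along with system monitoring.

With the following example configuration, Debugger monitors the entire training job at the default system monitoring interval and profiles a target step range from step 5 to step 15 (for 10 steps).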

from sagemaker.debugger import ProfilerConfig, FrameworkProfile

# Profile 10 steps starting at step 5; system monitoring keeps its default interval.
profiler_config = ProfilerConfig(
    framework_profile_params=FrameworkProfile(start_step=5, num_steps=10)
)

With the following example configuration, Debugger monitors the entire training job every 1000 milliseconds and profiles a target step range from step 5 to step 15 (for 10 steps).

from sagemaker.debugger import ProfilerConfig, FrameworkProfile

# Collect system metrics every 1000 ms and profile 10 steps starting at step 5.
profiler_config = ProfilerConfig(
    system_monitor_interval_millis=1000,
    framework_profile_params=FrameworkProfile(start_step=5, num_steps=10)
)
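
To make the step arithmetic concrete, the following minimal sketch computes which step numbers fall inside a window defined by start_step and num_steps. It is illustrative only and does not call the SageMaker SDK. The helper name profiled_steps is hypothetical, and the sketch assumes a half-open window in which the boundary step (start_step + num_steps) is not itself profiled. That assumption is consistent with the "10 steps" count above but is not confirmed by this page.

```python
def profiled_steps(start_step: int, num_steps: int) -> range:
    """Return the step numbers covered by a profiling window.

    Hypothetical helper for illustration; assumes a half-open window
    [start_step, start_step + num_steps).
    """
    if start_step < 0 or num_steps <= 0:
        raise ValueError("start_step must be >= 0 and num_steps must be > 0")
    return range(start_step, start_step + num_steps)


# Same values as the FrameworkProfile(start_step=5, num_steps=10) examples.
window = profiled_steps(start_step=5, num_steps=10)
print(f"first step: {window.start}, boundary step: {window.stop}, count: {len(window)}")
```

Under that assumption, start_step=5 and num_steps=10 give a window that begins at step 5 and ends at the step 15 boundary.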
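
With the following example configuration, Debugger monitors the entire training job at the default system monitoring interval and profiles a target time range from the current Unix time for 600 seconds.

import time
from sagemaker.debugger import ProfilerConfig, FrameworkProfile

# Profile for 600 seconds starting now; system monitoring keeps its default interval.
profiler_config = ProfilerConfig(
    framework_profile_params=FrameworkProfile(start_unix_time=int(time.time()), duration=600)
)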

With the following example configuration, Debugger monitors the entire training job every 1000 milliseconds and profiles a target time range from the current Unix time for 600 seconds.

import time
from sagemaker.debugger import ProfilerConfig, FrameworkProfile

# Collect system metrics every 1000 ms and profile for 600 seconds starting now.
profiler_config = ProfilerConfig(
    system_monitor_interval_millis=1000,
    framework_profile_params=FrameworkProfile(start_unix_time=int(time.time()), duration=600)
)
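
Because start_unix_time is a plain Unix timestamp in seconds, a start time other than the current time can presumably be expressed the same way. The following minimal sketch derives start_unix_time and duration from a pair of timezone-aware datetime values using only the Python standard library. The helper name time_window_params is hypothetical. The dictionary keys mirror the FrameworkProfile keyword arguments used above, and the call to FrameworkProfile is left as a comment so that the snippet runs without the SDK installed.

```python
from datetime import datetime, timedelta, timezone


def time_window_params(start: datetime, end: datetime) -> dict:
    """Convert a datetime window into start_unix_time and duration (seconds).

    Hypothetical helper for illustration only.
    """
    if start.tzinfo is None or end.tzinfo is None:
        raise ValueError("use timezone-aware datetimes to avoid local-time ambiguity")
    if end <= start:
        raise ValueError("end must be after start")
    return {
        "start_unix_time": int(start.timestamp()),
        "duration": int((end - start).total_seconds()),
    }


# Example: a 600-second window that begins five minutes from now.
begin = datetime.now(timezone.utc) + timedelta(minutes=5)
params = time_window_params(begin, begin + timedelta(seconds=600))
print(params)

# With the SDK installed, these values could then be passed as
# FrameworkProfile(**params), matching the keyword arguments shown above.
```

Requiring timezone-aware datetimes is a design choice of this sketch: it keeps the timestamp conversion independent of the local time zone of the machine that builds the configuration.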

Framework profiling runs with all of the profiling options over the target step or time range.
For more information about the available profiling options, see SageMaker Debugger APIs – FrameworkProfile in the Amazon SageMaker Python SDK.
The next section shows you how to script the available profiling options.
