Pipelines — sagemaker 2.247.0 documentation
ConditionStep
class sagemaker.workflow.condition_step.ConditionStep(name, depends_on=None, display_name=None, description=None, conditions=None, if_steps=None, else_steps=None)
Conditional step for pipelines to support conditional branching in the execution of steps.
Construct a ConditionStep for pipelines to support conditional branching.
If all the conditions in the condition list evaluate to True, the if_steps are marked as ready for execution. Otherwise, the else_steps are marked as ready for execution.
Parameters:
- name (str) – The name of the condition step.
- depends_on (List[Union[str, Step, StepCollection]]) – The list of Step/StepCollection names or Step instances or StepCollection instances that the current Step depends on.
- display_name (str) – The display name of the condition step.
- description (str) – The description of the condition step.
- conditions (List[Condition]) – A list of sagemaker.workflow.conditions.Condition instances.
- if_steps (List[Union[Step, StepCollection]]) – A list of sagemaker.workflow.steps.Step or sagemaker.workflow.step_collections.StepCollection instances that are marked as ready for execution if the list of conditions evaluates to True.
- else_steps (List[Union[Step, StepCollection]]) – A list of sagemaker.workflow.steps.Step or sagemaker.workflow.step_collections.StepCollection instances that are marked as ready for execution if the list of conditions evaluates to False.
Deprecated: sagemaker.workflow.condition_step.JsonGet is deprecated; use sagemaker.workflow.functions.JsonGet instead.
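The branching behavior described above can be sketched in plain Python (a simplified illustrative model, not SDK code; real conditions are evaluated by the workflow service during pipeline execution):

```python
# Simplified model of ConditionStep branch selection: if every condition
# evaluates to True, the if_steps run; otherwise the else_steps run.
# Illustrative sketch only -- not the sagemaker SDK.

def select_branch(conditions, if_steps, else_steps):
    if all(cond() for cond in conditions):
        return if_steps
    return else_steps

# Hypothetical condition, e.g. an accuracy-threshold check on a metric.
accuracy = 0.93
branch = select_branch(
    conditions=[lambda: accuracy >= 0.90],
    if_steps=["RegisterModelStep"],
    else_steps=["FailStep"],
)
print(branch)  # ['RegisterModelStep']
```

Note that an empty condition list selects the if_steps, since "all conditions are True" is vacuously satisfied.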
Conditions
class sagemaker.workflow.conditions.ConditionTypeEnum(*args, value=, **kwargs)
Condition type enum.
class sagemaker.workflow.conditions.Condition(condition_type=_Nothing.NOTHING)
Abstract Condition entity.
Parameters:
condition_type (ConditionTypeEnum) –
condition_type
The type of condition.
Type:
ConditionTypeEnum
Method generated by attrs for class Condition.
class sagemaker.workflow.conditions.ConditionComparison(condition_type=_Nothing.NOTHING, left=None, right=None)
Generic comparison condition that can be used to derive specific condition comparisons.
Parameters:
- condition_type (ConditionTypeEnum) –
- left (ExecutionVariable | Parameter | Properties | StepOutput | str | int | bool | float | None) –
- right (ExecutionVariable | Parameter | Properties | StepOutput | str | int | bool | float | None) –
left
The execution variable, parameter, property, step output or Python primitive value to use in the comparison.
Type:
Union[ConditionValueType, PrimitiveType]
right
The execution variable, parameter, property, step output or Python primitive value to compare to.
Type:
Union[ConditionValueType, PrimitiveType]
Method generated by attrs for class ConditionComparison.
class sagemaker.workflow.conditions.ConditionEquals(left, right)
A condition for equality comparisons.
Construct a ConditionEquals instance for equality comparisons.
Parameters:
- left (Union [ ConditionValueType , PrimitiveType ]) – The execution variable, parameter, property, or Python primitive value to use in the comparison.
- right (Union [ ConditionValueType , PrimitiveType ]) – The execution variable, parameter, property, or Python primitive value to compare to.
class sagemaker.workflow.conditions.ConditionGreaterThan(left, right)
A condition for greater than comparisons.
Construct an instance of ConditionGreaterThan for greater than comparisons.
Parameters:
- left (Union [ ConditionValueType , PrimitiveType ]) – The execution variable, parameter, property, or Python primitive value to use in the comparison.
- right (Union [ ConditionValueType , PrimitiveType ]) – The execution variable, parameter, property, or Python primitive value to compare to.
class sagemaker.workflow.conditions.ConditionGreaterThanOrEqualTo(left, right)
A condition for greater than or equal to comparisons.
Construct an instance of ConditionGreaterThanOrEqualTo for greater than or equal to comparisons.
Parameters:
- left (Union [ ConditionValueType , PrimitiveType ]) – The execution variable, parameter, property, or Python primitive value to use in the comparison.
- right (Union [ ConditionValueType , PrimitiveType ]) – The execution variable, parameter, property, or Python primitive value to compare to.
class sagemaker.workflow.conditions.ConditionLessThan(left, right)
A condition for less than comparisons.
Construct an instance of ConditionLessThan for less than comparisons.
Parameters:
- left (Union [ ConditionValueType , PrimitiveType ]) – The execution variable, parameter, property, or Python primitive value to use in the comparison.
- right (Union [ ConditionValueType , PrimitiveType ]) – The execution variable, parameter, property, or Python primitive value to compare to.
class sagemaker.workflow.conditions.ConditionLessThanOrEqualTo(left, right)
A condition for less than or equal to comparisons.
Construct an instance of ConditionLessThanOrEqualTo for less than or equal to comparisons.
Parameters:
- left (Union [ ConditionValueType , PrimitiveType ]) – The execution variable, parameter, property, or Python primitive value to use in the comparison.
- right (Union [ ConditionValueType , PrimitiveType ]) – The execution variable, parameter, property, or Python primitive value to compare to.
class sagemaker.workflow.conditions.ConditionIn(value, in_values)
A condition to check membership.
Construct a ConditionIn condition to check membership.
Parameters:
- value (Union [ ConditionValueType , PrimitiveType ]) – The execution variable, parameter, property or primitive value to check for membership.
- in_values (List [ Union [ ConditionValueType , PrimitiveType ] ]) – The list of values to check for membership in.
class sagemaker.workflow.conditions.ConditionNot(expression)
A condition for negating another Condition.
Construct a ConditionNot condition for negating another Condition.
Parameters:
expression (Condition) –
expression
A Condition to take the negation of.
Type:
Condition
class sagemaker.workflow.conditions.ConditionOr(conditions=None)
A condition for taking the logical OR of a list of Condition instances.
Construct a ConditionOr condition.
Parameters:
conditions (List[Condition]) –
conditions
A list of Condition instances to logically OR.
Type:
List[Condition]
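A plain-Python sketch of how these combinators evaluate (illustrative only; the helper functions below are not SDK code): ConditionNot negates a single condition, and ConditionOr is true when any member is true.

```python
# Illustrative evaluation semantics for the condition combinators.
def condition_not(expr):
    # True exactly when the wrapped condition is False.
    return lambda: not expr()

def condition_or(conditions):
    # True when at least one member condition is True.
    return lambda: any(c() for c in conditions)

# Hypothetical member conditions.
is_prod = lambda: False
is_staging = lambda: True

either = condition_or([is_prod, is_staging])
neither = condition_not(either)
print(either(), neither())  # True False
```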
CheckJobConfig
class sagemaker.workflow.check_job_config.CheckJobConfig(role, instance_count=1, instance_type='ml.m5.xlarge', volume_size_in_gb=30, volume_kms_key=None, output_kms_key=None, max_runtime_in_seconds=None, base_job_name=None, sagemaker_session=None, env=None, tags=None, network_config=None)
Check job config for QualityCheckStep and ClarifyCheckStep.
Constructs a CheckJobConfig instance.
Parameters:
- role (str) – An AWS IAM role. The Amazon SageMaker jobs use this role.
- instance_count (int) – The number of instances to run the jobs with (default: 1).
- instance_type (str) – Type of EC2 instance to use for the job (default: ‘ml.m5.xlarge’).
- volume_size_in_gb (int) – Size in GB of the EBS volume to use for storing data during processing (default: 30).
- volume_kms_key (str) – A KMS key for the processing volume (default: None).
- output_kms_key (str) – The KMS key id for the job’s outputs (default: None).
- max_runtime_in_seconds (int) – Timeout in seconds. After this amount of time, Amazon SageMaker terminates the job regardless of its current status (default: 3600).
- base_job_name (str) – Prefix for the job name. If not specified, a default name is generated based on the training image name and current timestamp (default: None).
- sagemaker_session (sagemaker.session.Session) – Session object which manages interactions with Amazon SageMaker APIs and any other AWS services needed (default: None). If not specified, one is created using the default AWS configuration chain.
- env (dict) – Environment variables to be passed to the job (default: None).
- tags (Optional [ Tags ]) – List of tags to be passed to the job (default: None).
- network_config (sagemaker.network.NetworkConfig) – A NetworkConfig object that configures network isolation, encryption of inter-container traffic, security group IDs, and subnets (default: None).
Entities
class sagemaker.workflow.entities.Entity
Base object for workflow entities.
Entities must implement the to_request method.
class sagemaker.workflow.entities.DefaultEnumMeta(cls, bases, classdict, *, boundary=None, _simple=False, **kwds)
An EnumMeta which defaults to the first value in the Enum list.
class sagemaker.workflow.entities.Expression
Base object for expressions.
Expressions must implement the expr property.
class sagemaker.workflow.entities.PipelineVariable
Base object for pipeline variables.
PipelineVariable subclasses must implement the expr property. Its subclasses include: Parameter, Properties, Join, JsonGet, ExecutionVariable, and StepOutput.
Execution Variables
class sagemaker.workflow.execution_variables.ExecutionVariable(name)
Pipeline execution variables for workflow.
Create a pipeline execution variable.
Parameters:
name (str) – The name of the execution variable.
class sagemaker.workflow.execution_variables.ExecutionVariables
Provide access to all available execution variables:
- ExecutionVariables.START_DATETIME
- ExecutionVariables.CURRENT_DATETIME
- ExecutionVariables.PIPELINE_NAME
- ExecutionVariables.PIPELINE_ARN
- ExecutionVariables.PIPELINE_EXECUTION_ID
- ExecutionVariables.PIPELINE_EXECUTION_ARN
- ExecutionVariables.TRAINING_JOB_NAME
- ExecutionVariables.PROCESSING_JOB_NAME
Functions
class sagemaker.workflow.functions.Join(on=_Nothing.NOTHING, values=_Nothing.NOTHING)
Join together properties.
Examples: Build an Amazon S3 URI with a bucket name parameter and the pipeline execution ID, and use it as training input:
bucket = ParameterString('bucket', default_value='my-bucket')
TrainingInput(
    s3_data=Join(on='/', values=['s3:/', bucket, ExecutionVariables.PIPELINE_EXECUTION_ID]),
    content_type='text/csv',
)
Parameters:
values
The primitive type values, parameters, step properties, or expressions to join.
Type:
List[Union[PrimitiveType, Parameter, Expression]]
on
The string to join the values on (defaults to "").
Type:
str
Method generated by attrs for class Join.
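At execution time, Join resolves its values and concatenates them with the separator. The example above reduces to an ordinary string join once the parameter and the execution variable have concrete values (note that 's3:/' is used so the join itself supplies the second slash):

```python
# What Join(on='/', values=['s3:/', bucket, execution_id]) resolves to
# once the parameter and execution variable have concrete values.
# The concrete values here are hypothetical.
bucket = "my-bucket"        # resolved ParameterString value
execution_id = "abcd1234"   # resolved PIPELINE_EXECUTION_ID value

s3_uri = "/".join(["s3:/", bucket, execution_id])
print(s3_uri)  # s3://my-bucket/abcd1234
```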
class sagemaker.workflow.functions.JsonGet(step_name=None, property_file=None, json_path=None, s3_uri=None, step=None)
Get JSON properties from PropertyFiles or S3 location.
Parameters:
- step_name (str) –
- property_file (PropertyFile | str | None) –
- json_path (str) –
- s3_uri (Join | None) –
- step (Step) –
step_name
The step name from which to get the property file.
Type:
str
property_file
Either a PropertyFile instance or the name of a property file.
Type:
Optional[Union[PropertyFile, str]]
json_path
The JSON path expression to the requested value.
Type:
str
s3_uri
The S3 location from which to fetch a JSON file. The JSON file is the output of a step defined with the @step decorator.
Type:
Optional[sagemaker.workflow.functions.Join]
step
The upstream step object which the s3_uri is associated with.
Type:
Step
Method generated by attrs for class JsonGet.
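Conceptually, JsonGet resolves a path into a JSON document that an earlier step produced (a property file or an S3 object). A plain-Python sketch of that lookup (the json_get helper and the report document are illustrative assumptions; the real resolution happens in the workflow service and uses JSONPath syntax):

```python
import json

# A hypothetical evaluation report written by an earlier processing step.
report = json.loads('{"metrics": {"accuracy": {"value": 0.93}}}')

def json_get(document, json_path):
    """Resolve a dotted path like 'metrics.accuracy.value' in a dict.

    Simplified stand-in for JSONPath resolution; illustration only.
    """
    node = document
    for key in json_path.split("."):
        node = node[key]
    return node

print(json_get(report, "metrics.accuracy.value"))  # 0.93
```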
Parameters
class sagemaker.workflow.parameters.ParameterTypeEnum(*args, value=, **kwargs)
Parameter type enum.
class sagemaker.workflow.parameters.Parameter(name=_Nothing.NOTHING, parameter_type=_Nothing.NOTHING, default_value=None)
Pipeline parameter for workflow.
Parameters:
name
The name of the parameter.
Type:
str
parameter_type
The type of the parameter.
Type:
ParameterTypeEnum
default_value
The default value of the parameter.
Type:
PrimitiveType
Method generated by attrs for class Parameter.
class sagemaker.workflow.parameters.ParameterString(name, default_value=None, enum_values=None)
String parameter for pipelines.
Create a pipeline string parameter.
Parameters:
- name (str) – The name of the parameter.
- default_value (str) – The default value of the parameter. The default value could be overridden at start of an execution. If not set or it is set to None, a value must be provided at the start of the execution.
- enum_values (List[str]) – Enum values for this parameter.
class sagemaker.workflow.parameters.ParameterInteger(name, default_value=None)
Integer parameter for pipelines.
Create a pipeline integer parameter.
Parameters:
- name (str) – The name of the parameter.
- default_value (int) – The default value of the parameter. The default value could be overridden at start of an execution. If not set or it is set to None, a value must be provided at the start of the execution.
class sagemaker.workflow.parameters.ParameterFloat(name, default_value=None)
Float parameter for pipelines.
Create a pipeline float parameter.
Parameters:
- name (str) – The name of the parameter.
- default_value (float) – The default value of the parameter. The default value could be overridden at start of an execution. If not set or it is set to None, a value must be provided at the start of the execution.
sagemaker.workflow.parameters.ParameterBoolean
alias of functools.partial(<class ‘sagemaker.workflow.parameters.Parameter’>, parameter_type=<ParameterTypeEnum.BOOLEAN: ‘Boolean’>)
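The shared rule for parameter defaults is: an execution-time value overrides the default, and a parameter whose default is None must be supplied at the start of the execution. A minimal sketch of that resolution logic (illustrative, not the SDK's implementation):

```python
# Resolve pipeline parameter values from defaults plus start-time overrides.
# Illustrative model of the documented semantics, not SDK code.
def resolve_parameters(defaults, overrides):
    resolved = {}
    for name, default in defaults.items():
        if name in overrides:
            resolved[name] = overrides[name]      # override wins
        elif default is not None:
            resolved[name] = default              # fall back to default
        else:
            raise ValueError(f"parameter {name!r} requires a value at start")
    return resolved

# Hypothetical parameters: one with a default, one that must be supplied.
defaults = {"InstanceType": "ml.m5.xlarge", "InputData": None}
print(resolve_parameters(defaults, {"InputData": "s3://bucket/data.csv"}))
# {'InstanceType': 'ml.m5.xlarge', 'InputData': 's3://bucket/data.csv'}
```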
Pipeline
class sagemaker.workflow.pipeline.Pipeline(name='', parameters=None, pipeline_experiment_config=<sagemaker.workflow.pipeline_experiment_config.PipelineExperimentConfig object>, steps=None, sagemaker_session=None, pipeline_definition_config=<sagemaker.workflow.pipeline_definition_config.PipelineDefinitionConfig object>)
Pipeline for workflow.
Initialize a Pipeline
Parameters:
- name (str) – The name of the pipeline.
- parameters (Sequence[Parameter]) – The list of the parameters.
- pipeline_experiment_config (Optional[PipelineExperimentConfig]) – If set, the workflow will attempt to create an experiment and trial before executing the steps. Creation will be skipped if an experiment or a trial with the same name already exists. By default, the pipeline name is used as the experiment name and the execution id is used as the trial name. If set to None, no experiment or trial will be created automatically.
- steps (Sequence[Union[Step, StepCollection, StepOutput]]) – The list of the non-conditional steps associated with the pipeline. Any steps that are within the if_steps or else_steps of a ConditionStep cannot be listed in the steps of a pipeline. In particular, the workflow service rejects any pipeline definition that specifies a step both in the list of pipeline steps and in the if_steps or else_steps of any ConditionStep.
- sagemaker_session (sagemaker.session.Session) – Session object that manages interactions with Amazon SageMaker APIs and any other AWS services needed. If not specified, the pipeline creates one using the default AWS configuration chain.
- pipeline_definition_config (Optional[PipelineDefinitionConfig]) – If set, the workflow customizes the pipeline definition using the configurations specified. By default, custom job-prefixing is turned off.
create(role_arn=None, description=None, tags=None, parallelism_config=None)
Creates a Pipeline in the Pipelines service.
Parameters:
- role_arn (str) – The role arn that is assumed by the pipeline to create step artifacts.
- description (str) – A description of the pipeline.
- tags (Optional [ Tags ]) – Tags to be passed to the pipeline.
- parallelism_config (Optional[ParallelismConfiguration]) – Parallelism configuration that is applied to each of the executions of the pipeline. It takes precedence over the parallelism configuration of the parent pipeline.
Returns:
A response dict from the service.
Return type:
describe()
Describes a Pipeline in the Workflow service.
Returns:
Response dict from the service. See boto3 client documentation
Return type:
update(role_arn=None, description=None, parallelism_config=None)
Updates a Pipeline in the Workflow service.
Parameters:
- role_arn (str) – The role arn that is assumed by pipelines to create step artifacts.
- description (str) – A description of the pipeline.
- parallelism_config (Optional[ParallelismConfiguration]) – Parallelism configuration that is applied to each of the executions of the pipeline. It takes precedence over the parallelism configuration of the parent pipeline.
Returns:
A response dict from the service.
Return type:
upsert(role_arn=None, description=None, tags=None, parallelism_config=None)
Creates a pipeline or updates it, if it already exists.
Parameters:
- role_arn (str) – The role arn that is assumed by workflow to create step artifacts.
- description (str) – A description of the pipeline.
- tags (Optional [ Tags ]) – Tags to be passed.
- parallelism_config (Optional[ParallelismConfiguration]) – Parallelism configuration that is applied to each of the executions of the pipeline. It takes precedence over the parallelism configuration of the parent pipeline.
Returns:
response dict from service
Return type:
delete()
Deletes a Pipeline in the Workflow service.
Returns:
A response dict from the service.
Return type:
start(parameters=None, execution_display_name=None, execution_description=None, parallelism_config=None, selective_execution_config=None)
Starts a Pipeline execution in the Workflow service.
Parameters:
- parameters (Dict[str, Union[str, bool, int, float]]) – Values to override pipeline parameters.
- execution_display_name (str) – The display name of the pipeline execution.
- execution_description (str) – A description of the execution.
- parallelism_config (Optional[ParallelismConfiguration]) – Parallelism configuration that is applied to each of the executions of the pipeline. It takes precedence over the parallelism configuration of the parent pipeline.
- selective_execution_config (Optional[SelectiveExecutionConfig]) – The configuration for selective step execution.
Returns:
A _PipelineExecution instance, if successful.
definition()
Converts a request structure to string representation for workflow service calls.
Returns:
A JSON formatted string of pipeline definition.
Return type:
str
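A pipeline definition is a JSON document listing, among other things, the pipeline's parameters and steps. A hand-built sketch of the general shape of such a document (the field values here are illustrative assumptions, not an authoritative or exhaustive schema):

```python
import json

# Hypothetical, hand-built skeleton of a pipeline definition document.
# Field contents are illustrative; the real definition() output is
# produced by the SDK from the Pipeline object.
definition = {
    "Version": "2020-12-01",
    "Parameters": [
        {"Name": "InputData", "Type": "String"},
    ],
    "Steps": [
        {"Name": "PreprocessStep", "Type": "Processing", "Arguments": {}},
    ],
}

definition_json = json.dumps(definition)   # definition() returns a JSON string
parsed = json.loads(definition_json)
print(sorted(parsed))  # ['Parameters', 'Steps', 'Version']
```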
list_executions(sort_by=None, sort_order=None, max_results=None, next_token=None)
Lists a pipeline’s executions.
Parameters:
- sort_by (str) – The field by which to sort results (CreationTime/PipelineExecutionArn).
- sort_order (str) – The sort order for results (Ascending/Descending).
- max_results (int) – The maximum number of pipeline executions to return in the response.
- next_token (str) – If the result of the previous ListPipelineExecutions request was truncated, the response includes a NextToken. To retrieve the next set of pipeline executions, use the token in the next request.
Returns:
List of Pipeline Execution Summaries. See boto3 client list_pipeline_executions: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker.html#SageMaker.Client.list_pipeline_executions
Return type:
build_parameters_from_execution(pipeline_execution_arn, parameter_value_overrides=None)
Gets the parameters from an execution, update with optional parameter value overrides.
Parameters:
- pipeline_execution_arn (str) – The ARN of the reference pipeline execution.
- parameter_value_overrides (Dict[str, Union[str, bool, int, float]]) – Parameter dict to be updated with the parameters from the referenced execution.
Returns:
A parameter dict built from an execution and provided parameter value overrides.
Return type:
Dict[str, str | bool | int | float]
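In effect this takes the reference execution's recorded parameters and applies the overrides on top. A plain-Python sketch of that merge (the build_parameters helper and the sample values are hypothetical):

```python
# Parameters recorded on a hypothetical reference execution.
execution_parameters = {"InstanceType": "ml.m5.xlarge", "Epochs": 10}

def build_parameters(execution_parameters, parameter_value_overrides=None):
    # Start from the reference execution's parameters, then let the
    # explicit overrides win. Illustrative model, not SDK code.
    merged = dict(execution_parameters)
    merged.update(parameter_value_overrides or {})
    return merged

print(build_parameters(execution_parameters, {"Epochs": 20}))
# {'InstanceType': 'ml.m5.xlarge', 'Epochs': 20}
```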
put_triggers(triggers, role_arn=None)
Attach triggers to a parent SageMaker Pipeline.
Parameters:
- triggers (List [ Trigger ]) – List of supported triggers. Currently, this can only be of type PipelineSchedule.
- role_arn (str) – The role arn that is assumed by EventBridge service.
Returns:
Successfully created trigger ARN(s). Currently, the Python SDK only supports PipelineSchedule triggers; thus, this is a list of EventBridge Schedule ARNs that were created/upserted.
Return type:
List[str]
describe_trigger(trigger_name)
Describe Trigger for a parent SageMaker Pipeline.
Parameters:
trigger_name (str) – Trigger name to be described. Currently, this can only be an EventBridge schedule name.
Returns:
Trigger describe responses from EventBridge.
Return type:
delete_triggers(trigger_names)
Delete Triggers for a parent SageMaker Pipeline if they exist.
Parameters:
trigger_names (List[str]) – List of trigger names to be deleted. Currently, these can only be EventBridge schedule names.
class sagemaker.workflow.pipeline._PipelineExecution(arn, sagemaker_session=_Nothing.NOTHING)
Internal class for encapsulating pipeline execution instances.
Parameters:
arn
The ARN of the pipeline execution.
Type:
str
sagemaker_session
Session object which manages interactions with Amazon SageMaker APIs and any other AWS services needed. If not specified, the pipeline creates one using the default AWS configuration chain.
Type:
sagemaker.session.Session
Method generated by attrs for class _PipelineExecution.
stop()
Stops a pipeline execution.
describe()
Describes a pipeline execution.
Returns:
Information about the pipeline execution. See boto3 client describe_pipeline_execution.
list_steps()
Describes a pipeline execution’s steps.
Returns:
Information about the steps of the pipeline execution. See boto3 client list_pipeline_execution_steps.
list_parameters(max_results=None, next_token=None)
Gets a list of parameters for a pipeline execution.
Parameters:
- max_results (int) – The maximum number of parameters to return in the response.
- next_token (str) – If the result of the previous ListPipelineParametersForExecution request was truncated, the response includes a NextToken. To retrieve the next set of parameters, use the token in the next request.
Returns:
Information about the parameters of the pipeline execution. This function is also a wrapper for list_pipeline_parameters_for_execution.
wait(delay=30, max_attempts=60)
Waits for a pipeline execution.
Parameters:
- delay (int) – The polling interval. (Defaults to 30 seconds)
- max_attempts (int) – The maximum number of polling attempts. (Defaults to 60 polling attempts)
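wait() polls until the execution finishes, bounding the total wait at roughly delay * max_attempts seconds. A plain-Python sketch of such a polling loop (illustrative; the wait_for helper and the fake status source are hypothetical, not SDK code):

```python
import time

def wait_for(check_done, delay=30, max_attempts=60, sleep=time.sleep):
    # Poll up to max_attempts times, sleeping `delay` seconds between
    # polls; return the attempt number on success. Illustrative model
    # of the wait() semantics, not the SDK implementation.
    for attempt in range(1, max_attempts + 1):
        if check_done():
            return attempt
        if attempt < max_attempts:
            sleep(delay)
    raise TimeoutError(f"not done after {max_attempts} attempts")

# Fake status source that reports completion on the third poll.
polls = iter([False, False, True])
attempts = wait_for(lambda: next(polls), delay=0)
print(attempts)  # 3
```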
result(step_name)
Retrieves the output of the provided step if it is a @step decorated function.
Parameters:
step_name (str) – The name of the pipeline step.
Returns:
The step output.
Raises:
- ValueError – If the provided step is not a @step decorated function.
- RemoteFunctionError – If the provided step is not in "Completed" status.
Pipeline Context
class sagemaker.workflow.pipeline_context.PipelineSession(boto_session=None, sagemaker_client=None, default_bucket=None, settings=<sagemaker.session_settings.SessionSettings object>, sagemaker_config=None, default_bucket_prefix=None)
Manages interactions with SageMaker APIs and AWS services needed under a pipeline context.
This class inherits from the SageMaker Session. It provides convenient methods for manipulating entities and resources that Amazon SageMaker uses, such as training jobs, endpoints, and input datasets in S3. When composing a SageMaker model-building pipeline, PipelineSession is recommended over a regular SageMaker Session.
Initialize a PipelineSession.
Parameters:
- boto_session (boto3.session.Session) – The underlying Boto3 session which AWS service calls are delegated to (default: None). If not provided, one is created with default AWS configuration chain.
- sagemaker_client (boto3.SageMaker.Client) – Client which makes Amazon SageMaker service calls other than InvokeEndpoint (default: None). Estimators created using this Session use this client. If not provided, one will be created using this instance's boto_session.
- default_bucket (str) – The default Amazon S3 bucket to be used by this session. This will be created the next time an Amazon S3 bucket is needed (by calling default_bucket()). If not provided, a default bucket will be created based on the following format: "sagemaker-{region}-{aws-account-id}". Example: "sagemaker-my-custom-bucket".
- settings (sagemaker.session_settings.SessionSettings) – Optional. Set of optional parameters to apply to the session.
- sagemaker_config (dict) – A dictionary containing default values for the SageMaker Python SDK (default: None). The dictionary must adhere to the schema defined at ~sagemaker.config.config_schema.SAGEMAKER_PYTHON_SDK_CONFIG_SCHEMA. If sagemaker_config is not provided and configuration files exist (at the default paths for admins and users, or paths set through the environment variables SAGEMAKER_ADMIN_CONFIG_OVERRIDE and SAGEMAKER_USER_CONFIG_OVERRIDE), a new dictionary will be generated from those configuration files. Alternatively, this dictionary can be generated by calling load_sagemaker_config() and then be provided to the Session.
- default_bucket_prefix (str) – The default prefix to use for S3 Object Keys. When objects are saved to the Session's default_bucket, the Object Key used will start with the default_bucket_prefix. If not provided here or within sagemaker_config, no additional prefix will be added.
property context
Holds contextual information useful to the session.
init_model_step_arguments(model)
Create a _ModelStepArguments instance (if one does not exist) as the pipeline context.
Parameters:
model (Model or PipelineModel) – A sagemaker.model.Model or sagemaker.pipeline.PipelineModel instance.
class sagemaker.workflow.pipeline_context.LocalPipelineSession(boto_session=None, default_bucket=None, s3_endpoint_url=None, disable_local_code=False, default_bucket_prefix=None)
Manages a session that executes SageMaker pipelines and jobs locally in a pipeline context.
This class inherits from the LocalSession and PipelineSession classes. When running SageMaker pipelines locally, this class is preferred over LocalSession.
Initialize a LocalPipelineSession.
Parameters:
- boto_session (boto3.session.Session) – The underlying Boto3 session which AWS service calls are delegated to (default: None). If not provided, one is created with default AWS configuration chain.
- default_bucket (str) – The default Amazon S3 bucket to be used by this session. This will be created the next time an Amazon S3 bucket is needed (by calling default_bucket()). If not provided, a default bucket will be created based on the following format: "sagemaker-{region}-{aws-account-id}". Example: "sagemaker-my-custom-bucket".
- s3_endpoint_url (str) – Override the default endpoint URL for Amazon S3, if set (default: None).
- disable_local_code (bool) – Set to True to override the default AWS configuration chain to disable the local.local_code setting, which may not be supported for some SDK features (default: False).
- default_bucket_prefix (str) – The default prefix to use for S3 Object Keys. When objects are saved to the Session’s default_bucket, the Object Key used will start with the default_bucket_prefix. If not provided here or within sagemaker_config, no additional prefix will be added.
Pipeline Schedule
class sagemaker.workflow.triggers.PipelineSchedule(name=None, enabled=True, start_date=None, at=None, rate=None, cron=None)
Pipeline Schedule trigger type used to create EventBridge Schedules for SageMaker Pipelines.
To create a pipeline schedule, specify a single type using the at, rate, or cron parameters. For more information about EventBridge syntax, see Schedule types on EventBridge Scheduler.
Parameters:
- start_date (datetime) – The start date of the schedule. Default is time.now().
- at (datetime) – An "At" EventBridge expression. Defaults to UTC timezone. Note that if you use datetime.now(), the result is a snapshot of your current local time. EventBridge requires a time in UTC format. You can convert the result of datetime.now() to UTC by using datetime.utcnow() or datetime.now(tz=pytz.utc). For example, you can create a time two minutes from now with the expression datetime.now(tz=pytz.utc) + timedelta(0, 120).
- rate (tuple) – A "Rate" EventBridge expression. Format is (value, unit).
- cron (str) – A "Cron" EventBridge expression. Format is "minutes hours day-of-month month day-of-week year".
- name (str) – The schedule name. Default is None.
- enabled (boolean) – Whether the schedule is enabled. Defaults to True.
Method generated by attrs for class PipelineSchedule.
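These arguments correspond to EventBridge Scheduler expression types. The sketch below (with hypothetical helper functions, not SDK code) shows the shape of each expression string, following EventBridge Scheduler's rate(...), cron(...), and at(...) syntax:

```python
from datetime import datetime, timezone

# Hypothetical helpers showing the EventBridge Scheduler expression
# shapes that each PipelineSchedule argument corresponds to.
def rate_expression(value, unit):
    return f"rate({value} {unit})"

def cron_expression(fields):
    return f"cron({fields})"

def at_expression(dt):
    # EventBridge "at" expressions use an ISO-like timestamp in UTC.
    return f"at({dt.strftime('%Y-%m-%dT%H:%M:%S')})"

print(rate_expression(5, "minutes"))      # rate(5 minutes)
print(cron_expression("0 12 * * ? *"))    # cron(0 12 * * ? *)
print(at_expression(datetime(2024, 1, 1, tzinfo=timezone.utc)))
# at(2024-01-01T00:00:00)
```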
Parallelism Configuration
class sagemaker.workflow.parallelism_config.ParallelismConfiguration(max_parallel_execution_steps)
Parallelism config for SageMaker pipeline.
Create a ParallelismConfiguration.
Parameters:
max_parallel_execution_steps (int) – The maximum number of steps that can be parallelized.
to_request()
Returns: the request structure.
Return type:
Dict[str, Any] | List[Dict[str, Any]]
Pipeline Definition Config
class sagemaker.workflow.pipeline_definition_config.PipelineDefinitionConfig(use_custom_job_prefix)
Pipeline Definition Configuration for SageMaker pipeline.
Create a PipelineDefinitionConfig.
Examples: Use a PipelineDefinitionConfig to turn on custom job prefixing:
PipelineDefinitionConfig(use_custom_job_prefix=True)
Parameters:
use_custom_job_prefix (bool) – A feature flag to toggle on/off custom name prefixing during pipeline orchestration.
Pipeline Experiment Config
class sagemaker.workflow.pipeline_experiment_config.PipelineExperimentConfig(experiment_name, trial_name)
Experiment config for SageMaker pipeline.
Create a PipelineExperimentConfig.
Examples: Use pipeline name as the experiment name and pipeline execution id as the trial name:
PipelineExperimentConfig( ExecutionVariables.PIPELINE_NAME, ExecutionVariables.PIPELINE_EXECUTION_ID)
Use a customized experiment name and pipeline execution id as the trial name:
PipelineExperimentConfig( 'MyExperiment', ExecutionVariables.PIPELINE_EXECUTION_ID)
Parameters:
- experiment_name (Union[str, Parameter, ExecutionVariable, Expression]) – The name of the experiment that will be created.
- trial_name (Union[str, Parameter, ExecutionVariable, Expression]) – The name of the trial that will be created.
class sagemaker.workflow.pipeline_experiment_config.PipelineExperimentConfigProperty(name)
Reference to pipeline experiment config property.
Create a reference to pipeline experiment property.
Parameters:
name (str) – The name of the pipeline experiment config property.
Selective Execution Config
class sagemaker.workflow.selective_execution_config.SelectiveExecutionConfig(selected_steps, reference_latest_execution=True, source_pipeline_execution_arn=None)
The selective execution configuration, which defines a subset of pipeline steps to run in another SageMaker pipeline run.
Create a SelectiveExecutionConfig.
Parameters:
- source_pipeline_execution_arn (str) – The ARN from a reference execution of the current pipeline. Used to copy input collaterals needed for the selected steps to run. The execution status of the pipeline can be Stopped, Failed, or Succeeded.
- selected_steps (List[str]) – A list of pipeline steps to run. All step(s) in all path(s) between two selected steps should be included.
- reference_latest_execution (bool) – Whether to reference the latest execution if source_pipeline_execution_arn is not provided.
Properties
class sagemaker.workflow.properties.PropertiesMeta(*args, **kwargs)
Loads an internal shapes attribute from the botocore service model for the sagemaker and emr services.
Loads up the shapes from the botocore service model.
class sagemaker.workflow.properties.Properties(step_name, path=None, shape_name=None, shape_names=None, service_name='sagemaker', step=None)
Properties for use in workflow expressions.
Create a Properties instance representing the given shape.
Parameters:
- step_name (str) – The name of the Step this Property belongs to.
- path (str) – The relative path of this Property value.
- shape_name (str) – The botocore service model shape name.
- shape_names (List[str]) – A list of botocore service model shape names.
- step (Step) – The Step object this Property belongs to.
- service_name (str) –
class sagemaker.workflow.properties.PropertiesList(step_name, path, shape_name=None, service_name='sagemaker', step=None)
PropertiesList for use in workflow expressions.
Create a PropertiesList instance representing the given shape.
Parameters:
- step_name (str) – The name of the Step this Property belongs to.
- path (str) – The relative path of this Property value.
- shape_name (str) – The botocore service model shape name.
- service_name (str) – The botocore service name.
- step (Step) – The Step object this Property belongs to.
class sagemaker.workflow.properties.PropertyFile(name, output_name, path)
Provides a property file struct.
Parameters:
name
The name of the property file for reference with JsonGet functions.
Type:
str
output_name
The name of the processing job output channel.
Type:
str
path
The path to the file at the output channel location.
Type:
str
Method generated by attrs for class PropertyFile.
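A PropertyFile points JsonGet at a JSON file written to a processing output channel. As a rough sketch of the lookup it enables, the snippet below resolves a dotted JSON path inside such a file; resolve_json_path is a hypothetical stand-in, not the SDK's implementation:

```python
import json


def resolve_json_path(document: dict, json_path: str):
    """Resolve a dotted path such as 'metrics.accuracy.value' in a parsed JSON document."""
    node = document
    for key in json_path.split("."):
        node = node[key]
    return node


# A processing step might write an evaluation report like this to its output channel:
evaluation = json.loads('{"metrics": {"accuracy": {"value": 0.93}}}')
accuracy = resolve_json_path(evaluation, "metrics.accuracy.value")  # -> 0.93
```

In a real pipeline, the PropertyFile's output_name and path tell the workflow which file to parse, and JsonGet supplies the dotted path at condition-evaluation time.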
Step Collections
class sagemaker.workflow.step_collections.StepCollection(name, steps=_Nothing.NOTHING, depends_on=None)
A wrapper of pipeline steps for workflow.
Parameters:
- name (str) – The name of the StepCollection.
- steps (List[Step]) – A list of steps.
- depends_on (List[str | Step | StepCollection | StepOutput]) – The list of Step/StepCollection names or Step/StepCollection/StepOutput instances that the current Step depends on.
name
The name of the StepCollection.
Type:
steps
A list of steps.
Type:
List[Step]
depends_on
The list of Step/StepCollection names or Step/StepCollection/StepOutput instances that the current Step depends on.
Type:
List[Union[str, Step, StepCollection, StepOutput]]
Method generated by attrs for class StepCollection.
class sagemaker.workflow.step_collections.RegisterModel(name, content_types, response_types, inference_instances=None, transform_instances=None, estimator=None, model_data=None, depends_on=None, repack_model_step_retry_policies=None, register_model_step_retry_policies=None, model_package_group_name=None, model_metrics=None, approval_status=None, image_uri=None, compile_model_family=None, display_name=None, description=None, tags=None, model=None, drift_check_baselines=None, customer_metadata_properties=None, domain=None, sample_payload_url=None, task=None, framework=None, framework_version=None, nearest_model_name=None, data_input_configuration=None, skip_model_validation=None, source_uri=None, model_card=None, model_life_cycle=None, **kwargs)
Register Model step collection for workflow.
Construct steps _RepackModelStep and _RegisterModelStep based on the estimator.
Parameters:
- name (str) – The name of the RegisterModel step collection.
- estimator (EstimatorBase | None) – The estimator instance.
- model_data – The S3 URI of the model data from training.
- content_types (list) – The supported MIME types for the input data (default: None).
- response_types (list) – The supported MIME types for the output data (default: None).
- inference_instances (list) – A list of the instance types that are used to generate inferences in real-time (default: None).
- transform_instances (list) – A list of the instance types on which a transformation job can be run or on which an endpoint can be deployed (default: None).
- depends_on (List[Union[str, Step, StepCollection]]) – The list of Step/StepCollection names or Step instances or StepCollection instances that the first step in the collection depends on (default: None).
- repack_model_step_retry_policies (List[RetryPolicy]) – The list of retry policies for the repack model step.
- register_model_step_retry_policies (List[RetryPolicy]) – The list of retry policies for the register model step.
- model_package_group_name (str) – The Model Package Group name or ARN, exclusive to model_package_name; using model_package_group_name makes the Model Package versioned (default: None).
- model_metrics (ModelMetrics) – ModelMetrics object (default: None).
- approval_status (str) – Model Approval Status, values can be “Approved”, “Rejected”, or “PendingManualApproval” (default: “PendingManualApproval”).
- image_uri (str) – The container image uri for Model Package, if not specified, Estimator’s training container image is used (default: None).
- compile_model_family (str) – The instance family for the compiled model. If specified, a compiled model is used (default: None).
- description (str) – Model Package description (default: None).
- tags (Optional[Tags]) – The list of tags to attach to the model package group. Note that tags will only be applied to newly created model package groups; if the name of an existing group is passed to model_package_group_name, tags will not be applied.
- model (object or Model) – A PipelineModel object that comprises a list of models which gets executed as a serial inference pipeline, or a Model object.
- drift_check_baselines (DriftCheckBaselines) – DriftCheckBaselines object (default: None).
- customer_metadata_properties (dict[str, str]) – A dictionary of key-value paired metadata properties (default: None).
- domain (str) – Domain values can be “COMPUTER_VISION”, “NATURAL_LANGUAGE_PROCESSING”, “MACHINE_LEARNING” (default: None).
- sample_payload_url (str) – The S3 path where the sample payload is stored (default: None).
- task (str) – Task values which are supported by Inference Recommender are “FILL_MASK”, “IMAGE_CLASSIFICATION”, “OBJECT_DETECTION”, “TEXT_GENERATION”, “IMAGE_SEGMENTATION”, “CLASSIFICATION”, “REGRESSION”, “OTHER” (default: None).
- framework (str) – Machine learning framework of the model package container image (default: None).
- framework_version (str) – Framework version of the Model Package Container Image (default: None).
- nearest_model_name (str) – Name of a pre-trained machine learning benchmarked by Amazon SageMaker Inference Recommender (default: None).
- data_input_configuration (str) – Input object for the model (default: None).
- skip_model_validation (str) – Indicates if you want to skip model validation. Values can be “All” or “None” (default: None).
- source_uri (str) – The URI of the source for the model package (default: None).
- model_card (ModelCard or ModelPackageModelCard) – A document that contains qualitative and quantitative information about a model (default: None).
- model_life_cycle (ModelLifeCycle) – ModelLifeCycle object (default: None).
- **kwargs – additional arguments to create_model.
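The documented value sets for approval_status and skip_model_validation can be checked up front. This validator is a hypothetical sketch based only on the values listed above, not SDK behavior:

```python
# Value sets documented for RegisterModel (assumptions drawn from the parameter list above).
VALID_APPROVAL_STATUSES = {"Approved", "Rejected", "PendingManualApproval"}
VALID_SKIP_MODEL_VALIDATION = {"All", "None"}


def check_register_model_args(approval_status=None, skip_model_validation=None):
    """Validate RegisterModel string arguments against their documented value sets."""
    # approval_status defaults to "PendingManualApproval" when not supplied.
    status = approval_status or "PendingManualApproval"
    if status not in VALID_APPROVAL_STATUSES:
        raise ValueError(f"invalid approval_status: {status!r}")
    if skip_model_validation is not None and skip_model_validation not in VALID_SKIP_MODEL_VALIDATION:
        raise ValueError(f"invalid skip_model_validation: {skip_model_validation!r}")
    return status
```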
class sagemaker.workflow.step_collections.EstimatorTransformer(name, estimator, model_data, model_inputs, instance_count, instance_type, transform_inputs, description=None, display_name=None, image_uri=None, predictor_cls=None, env=None, strategy=None, assemble_with=None, output_path=None, output_kms_key=None, accept=None, max_concurrent_transforms=None, max_payload=None, tags=None, volume_kms_key=None, depends_on=None, repack_model_step_retry_policies=None, model_step_retry_policies=None, transform_step_retry_policies=None, **kwargs)
Creates a Transformer step collection for workflow.
Construct steps required for a Transformer step collection:
An estimator-centric step collection. It models what happens in workflows when invoking the transform() method on an estimator instance: first, if custom model artifacts are required, a _RepackModelStep is included; second, a CreateModelStep is added with the model data passed in from a training step or other training job output; finally, a TransformStep.
If repacking the model artifacts is not necessary, only the CreateModelStep and TransformStep are in the step collection.
Parameters:
- name (str) – The name of the Transform Step.
- estimator (EstimatorBase) – The estimator instance.
- instance_count (int) – The number of EC2 instances to use.
- instance_type (str) – The type of EC2 instance to use.
- strategy (str) – The strategy used to decide how to batch records in a single request (default: None). Valid values: ‘MultiRecord’ and ‘SingleRecord’.
- assemble_with (str) – How the output is assembled (default: None). Valid values: ‘Line’ or ‘None’.
- output_path (str) – The S3 location for saving the transform result. If not specified, results are stored to a default bucket.
- output_kms_key (str) – Optional. A KMS key ID for encrypting the transform output (default: None).
- accept (str) – The accept header passed by the client to the inference endpoint. If it is supported by the endpoint, it will be the format of the batch transform output.
- env (dict) – The Environment variables to be set for use during the transform job (default: None).
- depends_on (List[Union[str, Step, StepCollection]]) – The list of Step/StepCollection names or Step instances or StepCollection instances that the first step in the collection depends on (default: None).
- repack_model_step_retry_policies (List[RetryPolicy]) – The list of retry policies for the repack model step.
- model_step_retry_policies (List[RetryPolicy]) – The list of retry policies for the model step.
- transform_step_retry_policies (List[RetryPolicy]) – The list of retry policies for the transform step.
- description (str | None) – The description of the step collection (default: None).
- display_name (str | None) – The display name of the step collection (default: None).
class sagemaker.workflow.model_step.ModelStep(name, step_args, depends_on=None, retry_policies=None, display_name=None, description=None, repack_model_step_settings=None)
ModelStep for SageMaker Pipelines Workflows.
Constructs a ModelStep.
Parameters:
- name (str) – The name of the ModelStep. A name is required and must be unique within a pipeline.
- step_args (_ModelStepArguments) –
The arguments for the ModelStep definition, generated by invoking register() or create() under the PipelineSession. Example:
model = Model(sagemaker_session=PipelineSession())
model_step = ModelStep(step_args=model.register())
- depends_on (List[Union[str, Step, StepCollection]]) – A list of Step or StepCollection names or Step instances or StepCollection instances that it depends on. If a listed Step name does not exist, an error is returned (default: None).
- retry_policies (List[RetryPolicy] or Dict[str, List[RetryPolicy]]) –
The list of retry policies for the ModelStep (default: None).
If a list of retry policies is provided, it is applied to all steps in the ModelStep collection. Note: in this case, SageMakerJobStepRetryPolicy is not allowed, since the create/register model step does not support it. Please find the example below:
ModelStep(
...
retry_policies=[
StepRetryPolicy(...),
],
)
If a dict is provided, it can specify different retry policies for different types of steps in the ModelStep collection. Similarly, SageMakerJobStepRetryPolicy is not allowed for the create/register model step. See examples below:
ModelStep(
...
retry_policies=dict(
register_model_retry_policies=[
StepRetryPolicy(...),
],
repack_model_retry_policies=[
SageMakerJobStepRetryPolicy(...),
],
)
)
or
ModelStep(
...
retry_policies=dict(
create_model_retry_policies=[
StepRetryPolicy(...),
],
repack_model_retry_policies=[
SageMakerJobStepRetryPolicy(...),
],
)
)
- display_name (str) – The display name of the ModelStep. The display name provides better UI readability (default: None).
- description (str) – The description of the ModelStep (default: None).
- repack_model_step_settings (Dict[str, Any]) –
The kwargs passed to the _RepackModelStep to customize the configuration of the underlying repack model job (default: None).
Notes:
- If the _RepackModelStep is unnecessary, the settings are ignored.
- If the _RepackModelStep is added, repack_model_step_settings is honored if set.
- In repack_model_step_settings, arguments with misspelled keys are ignored. Please refer to the expected parameters of the repack model job in SKLearn and its base classes.
class sagemaker.workflow.monitor_batch_transform_step.MonitorBatchTransformStep(name, transform_step_args, monitor_configuration, check_job_configuration, monitor_before_transform=False, fail_on_violation=True, supplied_baseline_statistics=None, supplied_baseline_constraints=None, display_name=None, description=None)
Creates a Transform step with a Quality or Clarify check step.
Used to monitor the inputs and outputs of the batch transform job.
Construct a step collection of TransformStep and QualityCheckStep or ClarifyCheckStep.
Parameters:
- name (str) – The name of the MonitorBatchTransformStep. The corresponding transform step will be named {name}-transform, and the corresponding check step will be named {name}-monitoring.
- transform_step_args (_JobStepArguments) – The arguments for the transform step.
- monitor_configuration (Union[QualityCheckConfig, ClarifyCheckConfig]) – The monitoring configuration used to run model monitoring.
- check_job_configuration (sagemaker.workflow.check_job_config.CheckJobConfig) – The check job (processing job) cluster resource configuration.
- monitor_before_transform (bool) – For the data quality or model explainability monitoring types, a True value of this flag indicates running the check step before the transform job.
- fail_on_violation (Union[bool, PipelineVariable]) – An opt-out flag; if set to False, the check step does not fail when a violation is detected.
- supplied_baseline_statistics (Union[str, PipelineVariable]) – The S3 path to the supplied statistics object, representing the statistics JSON file to be used for the drift check (default: None).
- supplied_baseline_constraints (Union[str, PipelineVariable]) – The S3 path to the supplied constraints object, representing the constraints JSON file to be used for the drift check (default: None).
- display_name (str) – The display name of the MonitorBatchTransformStep. The display name provides better UI readability. The corresponding transform step will be named {display_name}-transform; and the corresponding check step will be named {display_name}-monitoring (default: None).
- description (str) – The description of the MonitorBatchTransformStep (default: None).
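The derived step names described above follow a simple convention, sketched here with a hypothetical helper (not part of the SDK):

```python
def monitor_batch_transform_child_names(name: str) -> dict:
    """Derive the documented child step names from a MonitorBatchTransformStep name."""
    return {
        "transform_step": f"{name}-transform",
        "monitoring_step": f"{name}-monitoring",
    }
```

When display_name is set, the same convention applies to it instead.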
Steps
class sagemaker.workflow.steps.StepTypeEnum(value, names=None, *values, module=None, qualname=None, type=None, start=1, boundary=None)
Enum of Step types.
class sagemaker.workflow.steps.Step(name, display_name=None, description=None, step_type=None, depends_on=None)
Pipeline Step for SageMaker Pipelines Workflows.
Initialize a Step
Parameters:
- name (str) – The name of the Step.
- display_name (str) – The display name of the Step.
- description (str) – The description of the Step.
- step_type (StepTypeEnum) – The type of the Step.
- depends_on (List[Union[str, Step, StepCollection, StepOutput]]) – The list of Step/StepCollection names or Step, StepCollection, or StepOutput instances that the current Step depends on.
class sagemaker.workflow.steps.TrainingStep(name, step_args=None, estimator=None, display_name=None, description=None, inputs=None, cache_config=None, depends_on=None, retry_policies=None)
TrainingStep for SageMaker Pipelines Workflows.
Construct a TrainingStep, given an EstimatorBase instance.
In addition to the EstimatorBase instance, the other arguments are those that are supplied to the fit method of the sagemaker.estimator.Estimator.
Parameters:
- name (str) – The name of the TrainingStep.
- step_args (_JobStepArguments) – The arguments for the TrainingStep definition.
- estimator (EstimatorBase) – A sagemaker.estimator.EstimatorBase instance.
- display_name (str) – The display name of the TrainingStep.
- description (str) – The description of the TrainingStep.
- inputs (Union[str, dict, TrainingInput, FileSystemInput]) –
Information about the training data. This can be one of the following types:
- (str) the S3 location where training data is saved, or a file:// path in local mode.
- (dict[str, str] or dict[str, sagemaker.inputs.TrainingInput]) If using multiple channels for training data, you can specify a dictionary mapping channel names to strings or TrainingInput() objects.
- (sagemaker.inputs.TrainingInput) - channel configuration for S3 data sources that can provide additional information as well as the path to the training dataset. See sagemaker.inputs.TrainingInput() for full details.
- (sagemaker.inputs.FileSystemInput) - channel configuration for a file system data source that can provide additional information as well as the path to the training dataset.
- cache_config (CacheConfig) – A sagemaker.workflow.steps.CacheConfig instance.
- depends_on (List[Union[str, Step, StepCollection]]) – A list of Step/StepCollection names or Step instances or StepCollection instances that this TrainingStep depends on.
- retry_policies (List[RetryPolicy]) – A list of retry policies.
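For the multi-channel dict form of inputs documented above, a minimal sketch (the bucket name and prefixes are placeholders):

```python
# Multiple training channels as a plain dict of channel name -> S3 URI.
# sagemaker.inputs.TrainingInput objects could be used as the values instead
# when extra channel configuration is needed.
inputs = {
    "train": "s3://my-bucket/data/train/",
    "validation": "s3://my-bucket/data/validation/",
}
```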
class sagemaker.workflow.steps.TuningStep(name, step_args=None, tuner=None, display_name=None, description=None, inputs=None, job_arguments=None, cache_config=None, depends_on=None, retry_policies=None)
TuningStep for SageMaker Pipelines Workflows.
Construct a TuningStep, given a HyperparameterTuner instance.
In addition to the HyperparameterTuner instance, the other arguments are those that are supplied to the fit method of the sagemaker.tuner.HyperparameterTuner.
Parameters:
- name (str) – The name of the TuningStep.
- step_args (_JobStepArguments) – The arguments for the TuningStep definition.
- tuner (HyperparameterTuner) – A sagemaker.tuner.HyperparameterTuner instance.
- display_name (str) – The display name of the TuningStep.
- description (str) – The description of the TuningStep.
- inputs –
Information about the training data. Please refer to the fit() method of the associated estimator, as this can take any of the following forms:
- (str) - The S3 location where training data is saved.
- (dict[str, str] or dict[str, sagemaker.inputs.TrainingInput]) - If using multiple channels for training data, you can specify a dictionary mapping channel names to strings or TrainingInput() objects.
- (sagemaker.inputs.TrainingInput) - Channel configuration for S3 data sources that can provide additional information about the training dataset. See sagemaker.inputs.TrainingInput() for full details.
- (sagemaker.session.FileSystemInput) - Channel configuration for a file system data source that can provide additional information as well as the path to the training dataset.
- (sagemaker.amazon.amazon_estimator.RecordSet) - A collection of Amazon Record objects serialized and stored in S3. For use with an estimator for an Amazon algorithm.
- (sagemaker.amazon.amazon_estimator.FileSystemRecordSet) - Amazon SageMaker channel configuration for a file system data source for Amazon algorithms.
- (list[sagemaker.amazon.amazon_estimator.RecordSet]) - A list of RecordSet objects, where each instance is a different channel of training data.
- (list[sagemaker.amazon.amazon_estimator.FileSystemRecordSet]) - A list of FileSystemRecordSet objects, where each instance is a different channel of training data.
- job_arguments (List _[_str]) – A list of strings to be passed into the processing job. Defaults to None.
- cache_config (CacheConfig) – A sagemaker.workflow.steps.CacheConfig instance.
- depends_on (List[Union[str, Step, StepCollection]]) – A list of Step/StepCollection names or Step instances or StepCollection instances that this TuningStep depends on.
- retry_policies (List[RetryPolicy]) – A list of retry policies.
sagemaker.workflow.steps.TuningStep.get_top_model_s3_uri(self, top_k, s3_bucket, prefix='')
Get the model artifact S3 URI from the top performing training jobs.
Parameters:
- top_k (int) – The index of the top performing training job. The tuning step stores up to 50 top performing training jobs; a valid top_k value is from 0 to 49. The best training job model is at index 0.
- s3_bucket (str) – The S3 bucket to store the training job output artifact.
- prefix (str) – The S3 key prefix to store the training job output artifact.
Return type:
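A sketch of the documented top_k contract; validate_top_k is a hypothetical helper (not SDK code), and the actual method returns a pipeline expression that resolves to the model artifact URI at execution time:

```python
def validate_top_k(top_k: int) -> int:
    """Check top_k against the documented range: the tuning step stores up to 50
    top training jobs, so valid indices are 0..49, with 0 being the best job."""
    if not 0 <= top_k <= 49:
        raise ValueError("top_k must be in [0, 49]; 0 is the best training job")
    return top_k
```

A call such as tuning_step.get_top_model_s3_uri(top_k=0, s3_bucket=bucket) would then reference the best model's artifact.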
class sagemaker.workflow.steps.TransformStep(name, step_args=None, transformer=None, inputs=None, display_name=None, description=None, cache_config=None, depends_on=None, retry_policies=None)
TransformStep for SageMaker Pipelines Workflows.
Constructs a TransformStep, given a Transformer instance.
In addition to the Transformer instance, the other arguments are those that are supplied to the transform method of the sagemaker.transformer.Transformer.
Parameters:
- name (str) – The name of the TransformStep.
- step_args (_JobStepArguments) – The arguments for the TransformStep definition.
- transformer (Transformer) – A sagemaker.transformer.Transformer instance.
- inputs (TransformInput) – A sagemaker.inputs.TransformInput instance.
- cache_config (CacheConfig) – A sagemaker.workflow.steps.CacheConfig instance.
- display_name (str) – The display name of the TransformStep.
- description (str) – The description of the TransformStep.
- depends_on (List[Union[str, Step, StepCollection]]) – A list of Step/StepCollection names or Step instances or StepCollection instances that this TransformStep depends on.
- retry_policies (List[RetryPolicy]) – A list of retry policies.
class sagemaker.workflow.steps.ProcessingStep(name, step_args=None, processor=None, display_name=None, description=None, inputs=None, outputs=None, job_arguments=None, code=None, property_files=None, cache_config=None, depends_on=None, retry_policies=None, kms_key=None)
ProcessingStep for SageMaker Pipelines Workflows.
Construct a ProcessingStep, given a Processor instance.
In addition to the Processor instance, the other arguments are those that are supplied to the process method of the sagemaker.processing.Processor.
Parameters:
- name (str) – The name of the ProcessingStep.
- step_args (_JobStepArguments) – The arguments for the ProcessingStep definition.
- processor (Processor) – A sagemaker.processing.Processor instance.
- display_name (str) – The display name of the ProcessingStep.
- description (str) – The description of the ProcessingStep.
- inputs (List[ProcessingInput]) – A list of sagemaker.processing.ProcessingInput instances. Defaults to None.
- outputs (List[ProcessingOutput]) – A list of sagemaker.processing.ProcessingOutput instances. Defaults to None.
- job_arguments (List _[_str]) – A list of strings to be passed into the processing job. Defaults to None.
- code (str) – This can be an S3 URI or a local path to a file with the framework script to run. Defaults to None.
- property_files (List _[_PropertyFile]) – A list of property files that workflow looks for and resolves from the configured processing output list.
- cache_config (CacheConfig) – A sagemaker.workflow.steps.CacheConfig instance.
- depends_on (List[Union[str, Step, StepCollection]]) – A list of Step/StepCollection names or Step instances or StepCollection instances that this ProcessingStep depends on.
- retry_policies (List[RetryPolicy]) – A list of retry policies.
- kms_key (str) – The ARN of the KMS key that is used to encrypt the user code file. Defaults to None.
class sagemaker.workflow.notebook_job_step.NotebookJobStep(input_notebook, image_uri, kernel_name, name=None, display_name=None, description=None, notebook_job_name=None, role=None, s3_root_uri=None, parameters=None, environment_variables=None, initialization_script=None, s3_kms_key=None, instance_type='ml.m5.large', volume_size=30, volume_kms_key=None, encrypt_inter_container_traffic=True, security_group_ids=None, subnets=None, max_retry_attempts=1, max_runtime_in_seconds=172800, tags=None, additional_dependencies=None, retry_policies=None, depends_on=None)
NotebookJobStep for SageMaker Pipelines Workflows.
For more details about SageMaker Notebook Jobs, see SageMaker Notebook Jobs.
Constructs a NotebookJobStep.
Parameters:
- name (Optional[str]) – The name of the NotebookJobStep. If not provided, it is derived from the notebook file name.
- display_name (Optional[str]) – The display name of the NotebookJobStep. Default is None.
- description (Optional[str]) – The description of the NotebookJobStep. Default is None.
- notebook_job_name (Optional[str]) – An optional user-specified descriptive name for the notebook job. If provided, the sanitized notebook job name is used as a prefix for the underlying training job. If not provided, it is derived from the notebook file name.
- input_notebook (str) – A required local path pointing to the notebook to be executed. The notebook file is uploaded to {s3_root_uri}/{pipeline_name}/{step_name}/input-{timestamp} in the job preparation step.
- image_uri (str) – A required universal resource identifier (URI) location of a Docker image on Amazon Elastic Container Registry (ECR). Use the following images:
  - SageMaker Distribution Images. See Available Amazon SageMaker Images to get the image ECR URIs.
  - Custom images with required dependencies installed. For information about notebook job image requirements, see Image Constraints.
- kernel_name (str) – A required name of the kernel that is used to run the notebook. The kernelspec of the specified kernel needs to be registered in the image.
- role (str) – An IAM role (either name or full ARN) used to run your SageMaker training job. Defaults to one of the following:
  - The SageMaker default IAM role if the SDK is running in SageMaker Notebooks or SageMaker Studio Notebooks.
  - Otherwise, a ValueError is thrown.
- s3_root_uri (str) – The root S3 folder to which the notebook job input and output are uploaded. The inputs and outputs are uploaded to the following folders, respectively:
  {s3_root_uri}/{pipeline_name}/{step_name}/input-{timestamp}
  {s3_root_uri}/{pipeline_name}/{execution_id}/{step_name}/{job_name}/output
  Note that job_name is the name of the underlying SageMaker training job.
- parameters (Dict[str, Union[str, PipelineVariable]]) – Key-value pairs passed to the notebook execution for parameterization. Defaults to None.
- environment_variables (Dict[str, Union[str, PipelineVariable]]) – The environment variables used inside the job image container. They could be existing environment variables that you want to override, or new environment variables that you want to introduce and use in your notebook. Defaults to None.
- initialization_script (str) – A path to a local script you can run when your notebook starts up. An initialization script is sourced from the same shell as the notebook job. This script is uploaded to {s3_root_uri}/{pipeline_name}/{step_name}/input-{timestamp} in the job preparation step. Defaults to None.
- s3_kms_key (str, PipelineVariable) – A KMS key to use if you want to customize the encryption key used for your notebook job input and output. If you do not specify this field, your notebook job outputs are encrypted with SSE-KMS using the default Amazon S3 KMS key. Defaults to None.
- instance_type (str, PipelineVariable) – The Amazon Elastic Compute Cloud (EC2) instance type to use to run the notebook job. The notebook job uses a SageMaker Training Job as a computing layer, so the specified instance type should be a SageMaker Training supported instance type. Defaults to ml.m5.large.
- volume_size (int, PipelineVariable) – The size in GB of the storage volume for storing input and output data during training. Defaults to 30.
- volume_kms_key (str, PipelineVariable) – An Amazon Key Management Service (KMS) key used to encrypt an Amazon Elastic Block Storage (EBS) volume attached to the training instance. Defaults to None.
- encrypt_inter_container_traffic (bool, PipelineVariable) – A flag that specifies whether traffic between training containers is encrypted for the training job. Defaults to True.
- security_group_ids (List[str, PipelineVariable]) – A list of security group IDs. Defaults to None, and the training job is created without a VPC config.
- subnets (List[str, PipelineVariable]) – A list of subnet IDs. Defaults to None, and the job is created without a VPC config.
- max_retry_attempts (int) – The max number of times the job is retried after an InternalServerFailure error, configured in the underlying SageMaker training job. Defaults to 1.
- max_runtime_in_seconds (int) – The maximum length of time, in seconds, that a notebook job can run before it is stopped. If you configure both the max run time and max retry attempts, the run time applies to each retry. If a job does not complete in this time, its status is set to Failed. Defaults to 172800 seconds (2 days).
- tags (Optional[Tags]) – Tags attached to the job. Defaults to None, and the training job is created without tags. Your tags control how the Studio UI captures and displays the job created by the pipeline in the following ways:
  - If you only attach the domain tag, then the notebook job is displayed to all user profiles and spaces.
  - If the domain and user profile/space tags are attached, then the notebook job is displayed to those specific user profiles and spaces.
  - If you do not attach any domain or user profile/space tags, the Studio UI does not show the notebook job created by the pipeline step. You have to use the training job console to view the underlying training job.
- additional_dependencies (List[str] | None) – The list of dependencies for the notebook job. The list contains local file or folder paths. The dependent files or folders are uploaded to {s3_root_uri}/{pipeline_name}/{step_name}/input-{timestamp}. If a path points to a directory, the subfolders are uploaded recursively. Defaults to None.
- sagemaker_session (sagemaker.session.Session) – The underlying SageMaker session to which SageMaker service calls are delegated. Default is None. If not provided, one is created using a default configuration chain.
- retry_policies (List[RetryPolicy]) – A list of retry policies for the notebook job step.
- depends_on (List[Union[Step, StepCollection, StepOutput]]) – A list of Step/StepCollection/StepOutput instances on which this NotebookJobStep depends.
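The documented upload locations can be sketched as plain string templates. These helpers are hypothetical (not SDK code), and the exact timestamp format used by the SDK is an assumption:

```python
def notebook_job_input_uri(s3_root_uri: str, pipeline_name: str, step_name: str, timestamp: str) -> str:
    """Input location per the documented template: {s3_root_uri}/{pipeline_name}/{step_name}/input-{timestamp}."""
    return f"{s3_root_uri}/{pipeline_name}/{step_name}/input-{timestamp}"


def notebook_job_output_uri(s3_root_uri: str, pipeline_name: str, execution_id: str,
                            step_name: str, job_name: str) -> str:
    """Output location per the documented template, where job_name is the underlying training job name."""
    return f"{s3_root_uri}/{pipeline_name}/{execution_id}/{step_name}/{job_name}/output"
```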
class sagemaker.workflow.steps.CreateModelStep(name, step_args=None, model=None, inputs=None, depends_on=None, retry_policies=None, display_name=None, description=None)
CreateModelStep for SageMaker Pipelines Workflows.
Construct a CreateModelStep, given an sagemaker.model.Model instance.
In addition to the Model instance, the other arguments are those that are supplied to the _create_sagemaker_model method of sagemaker.model.Model.
Parameters:
- name (str) – The name of the CreateModelStep.
- step_args (dict) – The arguments for the CreateModelStep definition (default: None).
- model (Model or PipelineModel) – A sagemaker.model.Model or sagemaker.pipeline.PipelineModel instance (default: None).
- inputs (CreateModelInput) – A sagemaker.inputs.CreateModelInput instance (default: None).
- depends_on (List[Union[str, Step, StepCollection]]) – A list of Step/StepCollection names or Step instances or StepCollection instances that this CreateModelStep depends on (default: None).
- retry_policies (List[RetryPolicy]) – A list of retry policies (default: None).
- display_name (str) – The display name of the CreateModelStep (default: None).
- description (str) – The description of the CreateModelStep (default: None).
class sagemaker.workflow.callback_step.CallbackStep(name, sqs_queue_url, inputs, outputs, display_name=None, description=None, cache_config=None, depends_on=None)
Callback step for workflow.
Constructs a CallbackStep.
Parameters:
- name (str) – The name of the callback step.
- sqs_queue_url (str) – An SQS queue URL for receiving callback messages.
- inputs (dict) – Input arguments that will be provided in the SQS message body of callback messages.
- outputs (List [ CallbackOutput ]) – Outputs that can be provided when completing a callback.
- display_name (str) – The display name of the callback step.
- description (str) – The description of the callback step.
- cache_config (CacheConfig) – A sagemaker.workflow.steps.CacheConfig instance.
- depends_on (List[Union[str, Step, StepCollection]]) – A list of Step/StepCollection names or Step instances or StepCollection instances that this CallbackStep depends on.
class sagemaker.workflow.steps.CacheConfig(enable_caching=False, expire_after=None)
Configuration class to enable caching in SageMaker Pipelines Workflows.
If caching is enabled, the pipeline attempts to find a previous execution of a Step that was called with the same arguments. Step caching only considers successful executions. If a successful previous execution is found, the pipeline propagates the values from the previous execution rather than recomputing the Step. When multiple successful executions exist within the timeout period, it uses the result of the most recent successful execution.
Parameters:
enable_caching (bool) –
enable_caching
Whether to enable Step caching. Defaults to False.
Type:
bool
expire_after
If Step caching is enabled, a timeout also needs to be defined. It defines how old a previous execution can be to be considered for reuse. The value should be an ISO 8601 duration string. Defaults to None.
Examples:
'p30d' # 30 days
'P4DT12H' # 4 days and 12 hours
'T12H' # 12 hours
Type:
str
Method generated by attrs for class CacheConfig.
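A value for expire_after can be sanity-checked before building the pipeline. The pattern below is an approximation of the duration subset shown in the examples above, not the SDK's own validator:

```python
import re

# Rough pattern for the duration strings accepted by expire_after, per the
# examples above ('p30d', 'P4DT12H', 'T12H'). This is a hedged approximation.
DURATION_PATTERN = re.compile(
    r"^p?(\d+y)?(\d+m)?(\d+w)?(\d+d)?(t(\d+h)?(\d+m)?(\d+s)?)?$",
    re.IGNORECASE,
)

def is_valid_expire_after(value):
    """Return True if `value` looks like an accepted duration string."""
    return value != "" and DURATION_PATTERN.match(value) is not None

for example in ("p30d", "P4DT12H", "T12H"):
    assert is_valid_expire_after(example)
```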
class sagemaker.workflow.lambda_step.LambdaStep(name, lambda_func, display_name=None, description=None, inputs=None, outputs=None, cache_config=None, depends_on=None)
Lambda step for workflow.
Constructs a LambdaStep.
Parameters:
- name (str) – The name of the lambda step.
- display_name (str) – The display name of the Lambda step.
- description (str) – The description of the Lambda step.
- lambda_func (Lambda) – An instance of sagemaker.lambda_helper.Lambda. If a Lambda ARN is specified in the instance, LambdaStep just invokes the function; otherwise, the Lambda function is created while creating the pipeline.
- inputs (dict) – Input arguments that will be provided to the lambda function.
- outputs (List [ LambdaOutput ]) – List of outputs from the lambda function.
- cache_config (CacheConfig) – A sagemaker.workflow.steps.CacheConfig instance.
- depends_on (List[Union[str, Step, StepCollection]]) – A list of Step/StepCollection names or Step instances or StepCollection instances that this LambdaStep depends on.
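On the Lambda side, the step's inputs dict arrives in the handler's event, and keys in the returned dict must match the output_name values of the LambdaOutput objects declared on the step. A hedged sketch with hypothetical names (model_name, endpoint_name are assumptions for illustration):

```python
def lambda_handler(event, context):
    """Hypothetical LambdaStep handler: echoes a derived value to the pipeline.

    `event` carries the LambdaStep `inputs` dict; returned keys are consumed
    via LambdaOutput objects on the step.
    """
    model_name = event["model_name"]  # provided via LambdaStep inputs
    return {
        "statusCode": 200,
        "endpoint_name": f"{model_name}-endpoint",  # read via a LambdaOutput
    }

result = lambda_handler({"model_name": "my-model"}, context=None)
```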
class sagemaker.workflow.quality_check_step.QualityCheckConfig(baseline_dataset, dataset_format, *, output_s3_uri=None, post_analytics_processor_script=None)
Quality Check Config.
Parameters:
- baseline_dataset (str | PipelineVariable) –
- dataset_format (dict) –
- output_s3_uri (str | PipelineVariable) –
- post_analytics_processor_script (str) –
baseline_dataset
The path to the baseline_dataset file. This can be a local path or an S3 URI string.
Type:
str or PipelineVariable
dataset_format
The format of the baseline_dataset.
Type:
dict
output_s3_uri
Desired S3 destination of the constraint_violations and statistics JSON files (default: None). If not specified, an auto-generated path is used: "s3://<default_session_bucket>/model-monitor/baselining/<job_name>/results"
Type:
str or PipelineVariable
post_analytics_processor_script
The path to the record post-analytics processor script (default: None). This can be a local path or an S3 URI string but CANNOT be any type of the PipelineVariable.
Type:
str
Method generated by attrs for class QualityCheckConfig.
class sagemaker.workflow.quality_check_step.QualityCheckStep(name, quality_check_config, check_job_config, skip_check=False, fail_on_violation=True, register_new_baseline=False, model_package_group_name=None, supplied_baseline_statistics=None, supplied_baseline_constraints=None, display_name=None, description=None, cache_config=None, depends_on=None)
QualityCheck step for workflow.
Constructs a QualityCheckStep.
To understand the skip_check, fail_on_violation, register_new_baseline, supplied_baseline_statistics and supplied_baseline_constraints parameters, check the following documentation: https://docs.aws.amazon.com/sagemaker/latest/dg/pipelines-quality-clarify-baseline-lifecycle.html
Parameters:
- name (str) – The name of the QualityCheckStep step.
- quality_check_config (QualityCheckConfig) – A QualityCheckConfig instance.
- check_job_config (CheckJobConfig) – A CheckJobConfig instance.
- skip_check (bool or PipelineVariable) – Whether the check should be skipped (default: False).
- fail_on_violation (bool or PipelineVariable) – Whether to fail the step if violation detected (default: True).
- register_new_baseline (bool or PipelineVariable) – Whether the new baseline should be registered (default: False).
- model_package_group_name (str or PipelineVariable) – The name of a registered model package group; the baseline is fetched from the latest approved model in the group (default: None).
- supplied_baseline_statistics (str or PipelineVariable) – The S3 path to the supplied statistics object representing the statistics JSON file to be used for the drift check (default: None).
- supplied_baseline_constraints (str or PipelineVariable) – The S3 path to the supplied constraints object representing the constraints JSON file to be used for the drift check (default: None).
- display_name (str) – The display name of the QualityCheckStep step (default: None).
- description (str) – The description of the QualityCheckStep step (default: None).
- cache_config (CacheConfig) – A sagemaker.workflow.steps.CacheConfig instance (default: None).
- depends_on (List[Union[str, Step, StepCollection]]) – A list of Step/StepCollection names or Step instances or StepCollection instances that this QualityCheckStep depends on (default: None).
class sagemaker.workflow.clarify_check_step.ClarifyCheckConfig(data_config, *, kms_key=None, monitoring_analysis_config_uri=None)
Clarify Check Config.
Parameters:
- data_config (DataConfig) –
- kms_key (str) –
- monitoring_analysis_config_uri (str) –
data_config
Config of the input/output data.
Type:
DataConfig
kms_key
The ARN of the KMS key that is used to encrypt the user code file (default: None). This field CANNOT be any type of the PipelineVariable.
Type:
str
monitoring_analysis_config_uri
The URI of the monitoring analysis config. This field does not take user input; it is generated once the created analysis config file is uploaded.
Type:
str
Method generated by attrs for class ClarifyCheckConfig.
class sagemaker.workflow.clarify_check_step.ClarifyCheckStep(name, clarify_check_config, check_job_config, skip_check=False, fail_on_violation=True, register_new_baseline=False, model_package_group_name=None, supplied_baseline_constraints=None, display_name=None, description=None, cache_config=None, depends_on=None)
ClarifyCheckStep step for workflow.
Constructs a ClarifyCheckStep.
To understand the skip_check, fail_on_violation, register_new_baseline and supplied_baseline_constraints parameters, check the following documentation: https://docs.aws.amazon.com/sagemaker/latest/dg/pipelines-quality-clarify-baseline-lifecycle.html
Parameters:
- name (str) – The name of the ClarifyCheckStep step.
- clarify_check_config (ClarifyCheckConfig) – A ClarifyCheckConfig instance.
- check_job_config (CheckJobConfig) – A CheckJobConfig instance.
- skip_check (bool or PipelineVariable) – Whether the check should be skipped (default: False).
- fail_on_violation (bool or PipelineVariable) – Whether to fail the step if violation detected (default: True).
- register_new_baseline (bool or PipelineVariable) – Whether the new baseline should be registered (default: False).
- model_package_group_name (str or PipelineVariable) – The name of a registered model package group; the baseline is fetched from the latest approved model in the group (default: None).
- supplied_baseline_constraints (str or PipelineVariable) – The S3 path to the supplied constraints object representing the constraints JSON file to be used for the drift check (default: None).
- display_name (str) – The display name of the ClarifyCheckStep step (default: None).
- description (str) – The description of the ClarifyCheckStep step (default: None).
- cache_config (CacheConfig) – A sagemaker.workflow.steps.CacheConfig instance (default: None).
- depends_on (List[Union[str, Step, StepCollection]]) – A list of Step/StepCollection names or Step instances or StepCollection instances that this ClarifyCheckStep depends on (default: None).
class sagemaker.workflow.fail_step.FailStep(name, error_message=None, display_name=None, description=None, depends_on=None)
FailStep for SageMaker Pipelines Workflows.
Constructs a FailStep.
Parameters:
- name (str) – The name of the FailStep. A name is required and must be unique within a pipeline.
- error_message (str or PipelineVariable) – An error message defined by the user. Once the FailStep is reached, the execution fails and the error message is set as the failure reason (default: None).
- display_name (str) – The display name of the FailStep. The display name provides better UI readability. (default: None).
- description (str) – The description of the FailStep (default: None).
- depends_on (List[Union[str, Step, StepCollection]]) – A list of Step/StepCollection names or Step instances or StepCollection instances that this FailStep depends on. If a listed Step name does not exist, an error is returned (default: None).
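A FailStep is typically placed in the else_steps of a ConditionStep so that a failed check ends the execution with a readable reason. In a real pipeline the error_message is usually composed with sagemaker.workflow.functions.Join over pipeline variables; the plain-Python sketch below only illustrates the message shape (the function and field names are assumptions):

```python
def build_failure_reason(metric_name, observed, threshold):
    """Format the failure reason a FailStep would surface as error_message."""
    return " ".join(
        [
            "Execution failed:",
            metric_name,
            f"{observed:.3f}",
            "exceeded threshold",
            str(threshold),
        ]
    )

reason = build_failure_reason("mse", 12.5, 10)
```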
class sagemaker.workflow.emr_step.EMRStepConfig(jar, args=None, main_class=None, properties=None)
Config for a Hadoop Jar step.
Create a definition for input data used by an EMR cluster (job flow) step.
See AWS documentation for more information about the StepConfig API parameters.
Parameters:
- args (List _[_str]) – A list of command line arguments passed to the JAR file’s main function when executed.
- jar (str) – A path to a JAR file run during the step.
- main_class (str) – The name of the main class in the specified Java file.
- properties (List (dict)) – A list of key-value pairs that are set when the step runs.
class sagemaker.workflow.emr_step.EMRStep(name, display_name, description, cluster_id, step_config, depends_on=None, cache_config=None, cluster_config=None, execution_role_arn=None)
EMR step for workflow.
Constructs an EMRStep.
Parameters:
- name (str) – The name of the EMR step.
- display_name (str) – The display name of the EMR step.
- description (str) – The description of the EMR step.
- cluster_id (str) – The ID of the running EMR cluster.
- step_config (EMRStepConfig) – One StepConfig to be executed by the job flow.
- depends_on (List[Union[str, Step, StepCollection]]) – A list of Step/StepCollection names or Step instances or StepCollection instances that this EMRStep depends on.
- cache_config (CacheConfig) – A sagemaker.workflow.steps.CacheConfig instance.
- cluster_config (Dict[str, Any]) – The recipe of the EMR cluster, passed as a dictionary. The elements are defined in the request syntax for RunJobFlow. However, the following elements are not recognized as part of the cluster configuration and you should not include them in the dictionary:
  - cluster_config[Name]
  - cluster_config[Steps]
  - cluster_config[AutoTerminationPolicy]
  - cluster_config[Instances][KeepJobFlowAliveWhenNoSteps]
  - cluster_config[Instances][TerminationProtected]
  For more information about the fields you can include in your cluster configuration, see https://docs.aws.amazon.com/emr/latest/APIReference/API_RunJobFlow.html. Note that if you want to use cluster_config, then you have to set cluster_id as None.
- execution_role_arn (str) – The ARN of the runtime role assumed by this EMRStep. The job submitted to your EMR cluster uses this role to access AWS resources. This value is passed as ExecutionRoleArn to the AddJobFlowSteps request (an EMR request) called on the cluster specified by cluster_id, so you can only include this field if cluster_id is not None.
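Those cluster_config restrictions can be checked up front. A hedged sketch (not SDK code) that reports the disallowed keys before the dictionary is handed to EMRStep:

```python
# Keys that EMRStep does not accept inside cluster_config, per the docs above.
FORBIDDEN_TOP_LEVEL = {"Name", "Steps", "AutoTerminationPolicy"}
FORBIDDEN_INSTANCES = {"KeepJobFlowAliveWhenNoSteps", "TerminationProtected"}

def validate_cluster_config(cluster_config):
    """Return the keys that must be removed before passing to EMRStep."""
    bad = [k for k in cluster_config if k in FORBIDDEN_TOP_LEVEL]
    instances = cluster_config.get("Instances", {})
    bad += [f"Instances.{k}" for k in instances if k in FORBIDDEN_INSTANCES]
    return bad

# Illustrative config containing two disallowed entries.
config = {
    "Instances": {"InstanceGroups": [], "TerminationProtected": False},
    "Steps": [],
}
problems = validate_cluster_config(config)
```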
class sagemaker.workflow.automl_step.AutoMLStep(name, step_args, display_name=None, description=None, cache_config=None, depends_on=None, retry_policies=None)
AutoMLStep for SageMaker Pipelines Workflows.
Construct an AutoMLStep, given an AutoML instance.
In addition to the AutoML instance, the other arguments are those that are supplied to the fit method of sagemaker.automl.automl.AutoML.
Parameters:
- name (str) – The name of the AutoMLStep.
- step_args (_JobStepArguments) – The arguments for the AutoMLStep definition.
- display_name (str) – The display name of the AutoMLStep.
- description (str) – The description of the AutoMLStep.
- cache_config (CacheConfig) – A sagemaker.workflow.steps.CacheConfig instance.
- depends_on (List[Union[str, Step, StepCollection]]) – A list of Step/StepCollection names or Step instances or StepCollection instances that this AutoMLStep depends on.
- retry_policies (List [ RetryPolicy ]) – A list of retry policies.
@step decorator
function_step.step(*, name=None, display_name=None, description=None, retry_policies=None, dependencies=None, pre_execution_commands=None, pre_execution_script=None, environment_variables=None, image_uri=None, instance_count=1, instance_type=None, job_conda_env=None, job_name_prefix=None, keep_alive_period_in_seconds=0, max_retry_attempts=1, max_runtime_in_seconds=86400, role=None, security_group_ids=None, subnets=None, tags=None, volume_kms_key=None, volume_size=30, encrypt_inter_container_traffic=None, spark_config=None, use_spot_instances=False, max_wait_time_in_seconds=None)
Decorator for converting a python function to a pipeline step.
This decorator wraps the annotated code into a DelayedReturn object which can then be passed to a pipeline as a step. This creates a new pipeline that proceeds from the step of the DelayedReturn object.
If the value for a parameter is not set, the decorator first looks up the value from the SageMaker configuration file. If no value is specified in the configuration file or no configuration file is found, the decorator selects the default as specified in the following list. For more information, see Configuring and using defaults with the SageMaker Python SDK.
Parameters:
- _func – A Python function to run as a SageMaker pipeline step.
- name (str) – Name of the pipeline step. Defaults to a generated name using function name and uuid4 identifier to avoid duplicates.
- display_name (str) – The display name of the pipeline step. Defaults to the function name.
- description (str) – The description of the pipeline step. Defaults to the function docstring. If there is no docstring, then it defaults to the function file path.
- retry_policies (List [ RetryPolicy ]) – A list of retry policies configured for this step. Defaults to None.
- dependencies (str) – The path to a dependencies file. Defaults to None. If dependencies is provided, the value must be one of the following:
  - A path to a conda environment.yml file. The following conditions apply:
    * If job_conda_env is set, then the conda environment is updated by installing dependencies from the yaml file and the function is invoked within that conda environment. For this to succeed, the specified conda environment must already exist in the image.
    * If the environment variable SAGEMAKER_JOB_CONDA_ENV is set in the image, then the conda environment is updated by installing dependencies from the yaml file and the function is invoked within that conda environment. For this to succeed, the conda environment name must already be set with SAGEMAKER_JOB_CONDA_ENV, and SAGEMAKER_JOB_CONDA_ENV must already exist in the image.
    * If none of the previous conditions are met, a new conda environment named sagemaker-runtime-env is created and the function annotated with the remote decorator is invoked in that conda environment.
  - A path to a requirements.txt file. The following conditions apply:
    * If job_conda_env is set in the remote decorator, dependencies are installed within that conda environment and the function annotated with the remote decorator is invoked in the same conda environment. For this to succeed, the specified conda environment must already exist in the image.
    * If an environment variable SAGEMAKER_JOB_CONDA_ENV is set in the image, dependencies are installed within that conda environment and the function annotated with the remote decorator is invoked in the environment. For this to succeed, the conda environment name must already be set in SAGEMAKER_JOB_CONDA_ENV, and SAGEMAKER_JOB_CONDA_ENV must already exist in the image.
    * If none of the above conditions are met, conda is not used. Dependencies are installed at the system level without any virtual environment, and the function annotated with the remote decorator is invoked using the Python runtime available in the system path.
  - None. SageMaker assumes that there are no dependencies to install while executing the remote annotated function in the training job.
- pre_execution_commands (List [str]) – A list of commands to be executed prior to executing the pipeline step. Only one of pre_execution_commands or pre_execution_script can be specified at the same time. Defaults to None.
- pre_execution_script (str) – A path to a script file to be executed prior to executing the pipeline step. Only one of pre_execution_commands or pre_execution_script can be specified at the same time. Defaults to None.
- environment_variables (dict[str, str] or dict[str, PipelineVariable]) – Environment variables to be used inside the step. Defaults to None.
- image_uri (str, PipelineVariable) – The universal resource identifier (URI) location of a Docker image on Amazon Elastic Container Registry (ECR). Defaults to the following, based on where the SDK is running:
  - If you specify spark_config and want to run the step in a Spark application, the image_uri should be None. A SageMaker Spark image is used for training; otherwise, a ValueError is thrown.
  - If you use SageMaker Studio notebooks, the image used as the kernel image for the notebook is used.
  - Otherwise, it is resolved to a base Python image with the same Python version as the environment running the local code. If no compatible image is found, a ValueError is thrown.
- instance_count (int, PipelineVariable) – The number of instances to use. Defaults to 1. Note that pipeline steps do not support values of instance_count greater than 1 for non-Spark jobs.
- instance_type (str, PipelineVariable) – The Amazon Elastic Compute Cloud (EC2) instance type to use to run the SageMaker job. For example, ml.c4.xlarge. If not provided, a ValueError is thrown.
- job_conda_env (str, PipelineVariable) – The name of the conda environment to activate during the job's runtime. Defaults to None.
- job_name_prefix (str) – The prefix used to create the underlying SageMaker job.
- keep_alive_period_in_seconds (int, PipelineVariable) – The duration in seconds to retain and reuse provisioned infrastructure after the completion of a training job. This infrastructure is also known as SageMaker managed warm pools. The use of warm pools reduces the latency spent provisioning new resources. Defaults to 0. Note that additional charges associated with warm pools may apply. Using this parameter also activates a persistent cache feature which reduces job startup latency more than warm pools alone, because the package sources downloaded in previous runs are cached.
- max_retry_attempts (int, PipelineVariable) – The maximum number of times the job is retried after an InternalServerFailure error from the SageMaker service. Defaults to 1.
- max_runtime_in_seconds (int, PipelineVariable) – The upper limit in seconds to be used for training. After this specified amount of time, SageMaker terminates the job regardless of its current status. Defaults to 1 day (86400 seconds).
- role (str) – The IAM role (either name or full ARN) used to run your SageMaker training job. Defaults to one of the following:
  - The SageMaker default IAM role if the SDK is running in SageMaker Notebooks or SageMaker Studio Notebooks.
  - Otherwise, a ValueError is thrown.
- security_group_ids (List [str, PipelineVariable]) – A list of security group IDs. Defaults to None, and the training job is created without a VPC config.
- subnets (List [str, PipelineVariable]) – A list of subnet IDs. Defaults to None, and the job is created without a VPC config.
- tags (Optional [ Tags ]) – Tags attached to the job. Defaults to None, and the training job is created without tags.
- volume_kms_key (str, PipelineVariable) – An Amazon Key Management Service (KMS) key used to encrypt an Amazon Elastic Block Storage (EBS) volume attached to the training instance. Defaults to None.
- volume_size (int, PipelineVariable) – The size in GB of the storage volume that stores input and output data during training. Defaults to 30.
- encrypt_inter_container_traffic (bool, PipelineVariable) – A flag that specifies whether traffic between training containers is encrypted for the training job. Defaults to False.
- spark_config (SparkConfig) – Configuration of the Spark application that runs on the Spark image. If spark_config is specified, a SageMaker Spark image URI is used for training. Note that image_uri cannot be specified at the same time, otherwise a ValueError is thrown. Defaults to None.
- use_spot_instances (bool, PipelineVariable) – Specifies whether to use SageMaker Managed Spot instances for training. If enabled, the max_wait_time_in_seconds argument should also be set. Defaults to False.
- max_wait_time_in_seconds (int, PipelineVariable) – Timeout in seconds waiting for the spot training job. After this amount of time, Amazon SageMaker stops waiting for the managed spot training job to complete. Defaults to None.
class sagemaker.workflow.function_step.DelayedReturn(function_step, reference_path=())
A proxy to the function returns of arbitrary type.
When a function decorated with @step
is invoked, the return of that function is of type DelayedReturn. If the DelayedReturn object represents a Python collection, such as a tuple, list, or dict, you can reference the child items in the following ways:
a_member = a_delayed_return[2]
a_member = a_delayed_return["a_key"]
a_member = a_delayed_return[2]["a_key"]
Initializes a DelayedReturn object.
Parameters:
- function_step (_FunctionStep) – A sagemaker.workflow.step._FunctionStep instance.
- reference_path (tuple) – A tuple that represents the path to the child member.
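The programming model above can be illustrated with a toy stand-in: a decorator that records the call instead of running it and returns a proxy whose child items extend a reference path, resolved only at "execution" time. This is a conceptual sketch, not the SDK implementation:

```python
class ToyDelayedReturn:
    """A toy proxy mimicking DelayedReturn's reference-path indexing."""

    def __init__(self, func, args, kwargs, path=()):
        self._func, self._args, self._kwargs = func, args, kwargs
        self._path = path

    def __getitem__(self, key):
        # Referencing a child member extends the reference path.
        return ToyDelayedReturn(self._func, self._args, self._kwargs,
                                self._path + (key,))

    def resolve(self):
        # At "execution" time, run the function and walk the reference path.
        value = self._func(*self._args, **self._kwargs)
        for key in self._path:
            value = value[key]
        return value

def step(func):
    """Toy @step: defer the call and hand back a proxy for its return."""
    def wrapper(*args, **kwargs):
        return ToyDelayedReturn(func, args, kwargs)
    return wrapper

@step
def train(x):
    return {"metrics": [0.1, 0.2], "model": f"model-{x}"}

delayed = train(7)              # no execution yet, just a proxy
member = delayed["metrics"][1]  # cf. a_delayed_return[2]["a_key"] above
```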
class sagemaker.workflow.step_outputs.StepOutput(step=None)
Base class representing @step
decorator outputs.
Initializes a StepOutput object.
Parameters:
step (Step) – A sagemaker.workflow.steps.Step instance.
sagemaker.workflow.step_outputs.get_step(step_output)
Get the step associated with this output.
Parameters:
step_output (StepOutput) – A sagemaker.workflow.step_outputs.StepOutput instance.
Returns:
A sagemaker.workflow.steps.Step instance.