PyTorch — sagemaker 2.247.0 documentation

PyTorch Estimator

class sagemaker.pytorch.estimator.PyTorch(entry_point=None, framework_version=None, py_version=None, source_dir=None, hyperparameters=None, image_uri=None, distribution=None, compiler_config=None, training_recipe=None, recipe_overrides=None, **kwargs)

Bases: Framework

Handle end-to-end training and deployment of custom PyTorch code.

This Estimator executes a PyTorch script in a managed PyTorch execution environment.

The managed PyTorch environment is an Amazon-built Docker container that executes functions defined in the supplied entry_point Python script within a SageMaker Training Job.

Training is started by calling fit() on this Estimator. After training is complete, calling deploy() creates a hosted SageMaker endpoint and returns a PyTorchPredictor instance that can be used to perform inference against the hosted model.

Technical documentation on preparing PyTorch scripts for SageMaker training and using the PyTorch Estimator is available on the project home-page: https://github.com/aws/sagemaker-python-sdk
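For illustration, here is a minimal sketch of the train-then-deploy flow; the entry script, role ARN, S3 paths, instance types, and hyperparameter names below are placeholder assumptions:

from sagemaker.pytorch import PyTorch

# Placeholder values throughout: substitute your own script, role, and bucket.
estimator = PyTorch(
    entry_point="train.py",
    framework_version="2.2",
    py_version="py310",
    role="arn:aws:iam::123456789012:role/SageMakerRole",
    instance_count=1,
    instance_type="ml.g5.xlarge",
    hyperparameters={"epochs": 10, "lr": 1e-3},
)

# fit() starts the training job; the "training" channel name is an assumption.
estimator.fit({"training": "s3://my-bucket/train-data"})

# deploy() creates a hosted endpoint and returns a PyTorchPredictor.
predictor = estimator.deploy(initial_instance_count=1, instance_type="ml.m5.xlarge")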

Parameters:

distribution (dict): Configuration for distributed training (default: None). To enable the SageMaker model parallelism library v2 (SMP v2):

{
    "torch_distributed": {
        "enabled": True
    },
    "smdistributed": {
        "modelparallel": {
            "enabled": True,
            "parameters": {
                "tensor_parallel_degree": 8,
                "hybrid_shard_degree": 1,
                ...
            },
        }
    },
}

Besides activating the SMP library v2 through this parameter, you also need to add a few lines of code to your training script to initialize PyTorch Distributed with the SMP setups. To learn how to configure your training job with the SMP library v2, see Run distributed training with the SageMaker model parallelism library v2 in the Amazon SageMaker User Guide.
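As a sketch of those script-side lines (assuming the SMP v2 training container, which ships the torch.sagemaker module):

import torch.sagemaker as tsm

# Initializes PyTorch Distributed together with the SMP v2 configuration
# supplied through the distribution parameter above.
tsm.init()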

Note

The SageMaker distributed model parallel library v2 requires torch_distributed to be enabled.

To enable PyTorch DDP:
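{
    "pytorchddp": {
        "enabled": True
    }
}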

To enable Torch Distributed:
This is available for general distributed training on GPU instances from PyTorch v1.13.1 and later.

{
    "torch_distributed": {
        "enabled": True
    }
}

This option also supports distributed training on Trn1. To learn more, see Distributed PyTorch Training on Trainium.
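For illustration, a minimal sketch of the script side under torch_distributed, which launches the entry point with torchrun so the standard rendezvous environment variables are already set (GPU case shown; Trn1 uses the Neuron XLA backend instead of NCCL):

import os

import torch
import torch.distributed as dist

# torchrun sets RANK, WORLD_SIZE, and LOCAL_RANK for each worker.
dist.init_process_group(backend="nccl")
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)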
To enable MPI:
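{
    "mpi": {
        "enabled": True
    }
}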

To enable parameter server:
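{
    "parameter_server": {
        "enabled": True
    }
}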

To enable distributed training with SageMaker Training Compiler:
{
    "pytorchxla": {
        "enabled": True
    }
}

To learn more, see SageMaker Training Compiler in the Amazon SageMaker Developer Guide.

Note

When you use this PyTorch XLA option for distributed training strategy, you must add the compiler_config parameter and activate SageMaker Training Compiler.
compiler_config (TrainingCompilerConfig): Configures SageMaker Training Compiler to accelerate training.
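For illustration, a minimal sketch combining the two (the script, role, and instance values are placeholders):

from sagemaker.pytorch import PyTorch, TrainingCompilerConfig

estimator = PyTorch(
    entry_point="train.py",
    framework_version="1.13.1",
    py_version="py39",
    role="arn:aws:iam::123456789012:role/SageMakerRole",
    instance_count=2,
    instance_type="ml.p4d.24xlarge",
    distribution={"pytorchxla": {"enabled": True}},
    compiler_config=TrainingCompilerConfig(),
)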

LAUNCH_PYTORCH_DDP_ENV_NAME = 'sagemaker_pytorch_ddp_enabled'

LAUNCH_TORCH_DISTRIBUTED_ENV_NAME = 'sagemaker_torch_distributed_enabled'

INSTANCE_TYPE_ENV_NAME = 'sagemaker_instance_type'

hyperparameters()

Return hyperparameters used by your custom PyTorch code during model training.
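Hyperparameters are passed to the entry point as command-line arguments, so the script typically reads them back with argparse; a minimal sketch (the argument names are assumptions matching whatever was passed to the estimator):

import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--epochs", type=int, default=1)
parser.add_argument("--lr", type=float, default=1e-3)
args, _ = parser.parse_known_args()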

create_model(model_server_workers=None, role=None, vpc_config_override='VPC_CONFIG_DEFAULT', entry_point=None, source_dir=None, dependencies=None, **kwargs)

Create a SageMaker PyTorchModel object that can be deployed to an Endpoint.

Parameters:

Returns:

A SageMaker PyTorchModel object. See PyTorchModel() for full details.

Return type:

sagemaker.pytorch.model.PyTorchModel
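For example, a sketch that attaches a separate inference script to the trained model before deploying (script name, source directory, and instance type are placeholders):

model = estimator.create_model(
    entry_point="inference.py",
    source_dir="src",
)
predictor = model.deploy(initial_instance_count=1, instance_type="ml.m5.xlarge")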

PyTorch Model

class sagemaker.pytorch.model.PyTorchModel(model_data, role=None, entry_point=None, framework_version='1.3', py_version=None, image_uri=None, predictor_cls=<class 'sagemaker.pytorch.model.PyTorchPredictor'>, model_server_workers=None, **kwargs)

Bases: FrameworkModel

A PyTorch SageMaker Model that can be deployed to a SageMaker Endpoint.

Initialize a PyTorchModel.

Parameters:

Tip

You can find additional parameters for initializing this class at FrameworkModel and Model.
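A minimal construction sketch, assuming existing model artifacts in S3 and placeholder role and script values:

from sagemaker.pytorch import PyTorchModel

model = PyTorchModel(
    model_data="s3://my-bucket/model.tar.gz",
    role="arn:aws:iam::123456789012:role/SageMakerRole",
    entry_point="inference.py",
    framework_version="2.2",
    py_version="py310",
)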

register(content_types=None, response_types=None, inference_instances=None, transform_instances=None, model_package_name=None, model_package_group_name=None, image_uri=None, model_metrics=None, metadata_properties=None, marketplace_cert=False, approval_status=None, description=None, drift_check_baselines=None, customer_metadata_properties=None, domain=None, sample_payload_url=None, task=None, framework=None, framework_version=None, nearest_model_name=None, data_input_configuration=None, skip_model_validation=None, source_uri=None, model_card=None, model_life_cycle=None)

Creates a model package for creating SageMaker models or listing on Marketplace.

Parameters:

Returns:

A sagemaker.model.ModelPackage instance.
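For illustration, a sketch of registering the model to a model package group (the group name, content types, and instance types are placeholders):

package = model.register(
    content_types=["application/x-npy"],
    response_types=["application/x-npy"],
    inference_instances=["ml.m5.xlarge"],
    transform_instances=["ml.m5.xlarge"],
    model_package_group_name="my-pytorch-models",
)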

prepare_container_def(instance_type=None, accelerator_type=None, serverless_inference_config=None, accept_eula=None, model_reference_arn=None)

A container definition with framework configuration set in model environment variables.

Parameters:

Returns:

A container definition object usable with the CreateModel API.

Return type:

dict[str, str]

serving_image_uri(region_name, instance_type, accelerator_type=None, serverless_inference_config=None)

Create a URI for the serving image.

Parameters:

Returns:

The appropriate image URI based on the given parameters.

Return type:

str
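For example (region and instance type are placeholders):

uri = model.serving_image_uri(
    region_name="us-east-1",
    instance_type="ml.m5.xlarge",
)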

PyTorch Predictor

class sagemaker.pytorch.model.PyTorchPredictor(endpoint_name, sagemaker_session=None, serializer=<sagemaker.base_serializers.NumpySerializer object>, deserializer=<sagemaker.base_deserializers.NumpyDeserializer object>, component_name=None)

Bases: Predictor

A Predictor for inference against PyTorch Endpoints.

This is able to serialize Python lists, dictionaries, and numpy arrays to multidimensional tensors for PyTorch inference.

Initialize a PyTorchPredictor.

Parameters:
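A minimal usage sketch against an existing endpoint (the endpoint name and input shape are placeholders); the default NumPy serializer and deserializer round-trip arrays:

import numpy as np

from sagemaker.pytorch import PyTorchPredictor

predictor = PyTorchPredictor(endpoint_name="my-pytorch-endpoint")
result = predictor.predict(np.zeros((1, 3, 224, 224), dtype=np.float32))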

PyTorch Processor

class sagemaker.pytorch.processing.PyTorchProcessor(framework_version, role=None, instance_count=None, instance_type=None, py_version='py3', image_uri=None, command=None, volume_size_in_gb=30, volume_kms_key=None, output_kms_key=None, code_location=None, max_runtime_in_seconds=None, base_job_name=None, sagemaker_session=None, env=None, tags=None, network_config=None)

Bases: FrameworkProcessor

Handles Amazon SageMaker processing tasks for jobs using PyTorch containers.

This processor executes a Python script in a PyTorch execution environment.

Unless image_uri is specified, the PyTorch environment is an Amazon-built Docker container that executes functions defined in the supplied code Python script.

The arguments have the exact same meaning as in FrameworkProcessor.

Parameters:

estimator_cls

alias of PyTorch
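For illustration, a minimal run sketch (the script, role, and S3 paths are placeholders):

from sagemaker.processing import ProcessingInput, ProcessingOutput
from sagemaker.pytorch.processing import PyTorchProcessor

processor = PyTorchProcessor(
    framework_version="2.2",
    py_version="py310",
    role="arn:aws:iam::123456789012:role/SageMakerRole",
    instance_count=1,
    instance_type="ml.m5.xlarge",
)
processor.run(
    code="preprocess.py",
    inputs=[ProcessingInput(source="s3://my-bucket/raw",
                            destination="/opt/ml/processing/input")],
    outputs=[ProcessingOutput(source="/opt/ml/processing/output",
                              destination="s3://my-bucket/processed")],
)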