Processing — sagemaker 2.247.0 documentation

This module contains code related to the Processor class, which is used for Amazon SageMaker Processing Jobs. These jobs let users perform data pre-processing, post-processing, feature engineering, data validation, model evaluation, and model interpretation on Amazon SageMaker.

class sagemaker.processing.Processor(role=None, image_uri=None, instance_count=None, instance_type=None, entrypoint=None, volume_size_in_gb=30, volume_kms_key=None, output_kms_key=None, max_runtime_in_seconds=None, base_job_name=None, sagemaker_session=None, env=None, tags=None, network_config=None)

Bases: object

Handles Amazon SageMaker Processing tasks.

Initializes a Processor instance.

The Processor handles Amazon SageMaker Processing tasks.

Parameters:

JOB_CLASS_NAME = 'processing-job'

run(inputs=None, outputs=None, arguments=None, wait=True, logs=True, job_name=None, experiment_config=None, kms_key=None)

Runs a processing job.

Parameters:

Returns:

None or pipeline step arguments in case the Processor instance is built with PipelineSession

Raises:

ValueError – if logs is True but wait is False.
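As a rough usage sketch (the role ARN, image URI, S3 paths, and script path below are hypothetical placeholders, not values from this documentation):

    from sagemaker.processing import Processor, ProcessingInput, ProcessingOutput

    processor = Processor(
        role="arn:aws:iam::123456789012:role/MySageMakerRole",  # hypothetical role
        image_uri="123456789012.dkr.ecr.us-east-1.amazonaws.com/my-image:latest",  # hypothetical image
        instance_count=1,
        instance_type="ml.m5.xlarge",
        entrypoint=["python3", "/opt/ml/code/preprocess.py"],  # command baked into the image
    )

    # wait=True blocks until completion; logs=True streams the job's logs.
    processor.run(
        inputs=[ProcessingInput(source="s3://my-bucket/raw",
                                destination="/opt/ml/processing/input")],
        outputs=[ProcessingOutput(source="/opt/ml/processing/output",
                                  destination="s3://my-bucket/processed")],
        arguments=["--train-test-split-ratio", "0.2"],
    )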

class sagemaker.processing.ScriptProcessor(role=None, image_uri=None, command=None, instance_count=None, instance_type=None, volume_size_in_gb=30, volume_kms_key=None, output_kms_key=None, max_runtime_in_seconds=None, base_job_name=None, sagemaker_session=None, env=None, tags=None, network_config=None)

Bases: Processor

Handles Amazon SageMaker processing tasks for jobs using a machine learning framework.

Initializes a ScriptProcessor instance.

The ScriptProcessor handles Amazon SageMaker Processing tasks for jobs using a machine learning framework, which allows for providing a script to be run as part of the Processing Job.

Parameters:

get_run_args(code, inputs=None, outputs=None, arguments=None)

Returns a RunArgs object.

For processors (PySparkProcessor, SparkJarProcessor) that have special run() arguments, this object contains the normalized arguments for passing to ProcessingStep.

Parameters:

run(code, inputs=None, outputs=None, arguments=None, wait=True, logs=True, job_name=None, experiment_config=None, kms_key=None)

Runs a processing job.

Parameters:

Returns:

None or pipeline step arguments in case the Processor instance is built with PipelineSession
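A minimal sketch of the ScriptProcessor flow (the role ARN, image URI, bucket names, and script name are hypothetical):

    from sagemaker.processing import ScriptProcessor, ProcessingInput, ProcessingOutput

    script_processor = ScriptProcessor(
        role="arn:aws:iam::123456789012:role/MySageMakerRole",  # hypothetical
        image_uri="123456789012.dkr.ecr.us-east-1.amazonaws.com/sklearn:latest",  # hypothetical
        command=["python3"],  # interpreter used to run the provided script
        instance_count=1,
        instance_type="ml.m5.xlarge",
    )

    # The code argument points at a local or S3 script that is uploaded and then
    # executed inside the container with the configured command.
    script_processor.run(
        code="preprocess.py",
        inputs=[ProcessingInput(source="s3://my-bucket/raw",
                                destination="/opt/ml/processing/input")],
        outputs=[ProcessingOutput(source="/opt/ml/processing/output")],
    )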

class sagemaker.processing.ProcessingJob(sagemaker_session, job_name, inputs, outputs, output_kms_key=None)

Bases: _Job

Provides functionality to start, describe, and stop processing jobs.

Initializes a Processing job.

Parameters:

classmethod start_new(processor, inputs, outputs, experiment_config)

Starts a new processing job using the provided inputs and outputs.

Parameters:

Returns:

The instance of ProcessingJob created using the Processor.

Return type:

ProcessingJob

classmethod from_processing_name(sagemaker_session, processing_job_name)

Initializes a ProcessingJob from a processing job name.

Parameters:

Returns:

The instance of ProcessingJob created from the job name.

Return type:

ProcessingJob

classmethod from_processing_arn(sagemaker_session, processing_job_arn)

Initializes a ProcessingJob from a Processing ARN.

Parameters:

Returns:

The instance of ProcessingJob created from the processing job’s ARN.

Return type:

ProcessingJob

wait(logs=True)

Waits for the processing job to complete.

Parameters:

logs (bool) – Whether to show the logs produced by the job (default: True).
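A short sketch of re-attaching to an existing job by name and waiting on it (the job name below is a hypothetical placeholder):

    from sagemaker import Session
    from sagemaker.processing import ProcessingJob

    session = Session()
    job = ProcessingJob.from_processing_name(
        sagemaker_session=session,
        processing_job_name="my-processing-job-2024-01-01-00-00-00-000",  # hypothetical
    )
    job.wait(logs=True)   # block until the job finishes, streaming its logs
    job.describe()        # inspect the DescribeProcessingJob response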

describe()

Prints out a response from the DescribeProcessingJob API call.

stop()

Stops the processing job.

static prepare_app_specification(container_arguments, container_entrypoint, image_uri)

Prepares a dict that represents a ProcessingJob’s AppSpecification.

Parameters:

Returns:

Represents AppSpecification which configures the processing job to run a specified Docker container image.

Return type:

dict

static prepare_output_config(kms_key_id, outputs)

Prepares a dict that represents a ProcessingOutputConfig.

Parameters:

Returns:

Represents output configuration for the processing job.

Return type:

dict

static prepare_processing_resources(instance_count, instance_type, volume_kms_key_id, volume_size_in_gb)

Prepares a dict that represents the ProcessingResources.

Parameters:

Returns:

Represents ProcessingResources, which identifies the resources (ML compute instances and ML storage volumes) to deploy for a processing job.

Return type:

dict

static prepare_stopping_condition(max_runtime_in_seconds)

Prepares a dict that represents the job’s StoppingCondition.

Parameters:

max_runtime_in_seconds (int) – Specifies the maximum runtime in seconds.

Returns:

dict

class sagemaker.processing.ProcessingInput(source=None, destination=None, input_name=None, s3_data_type='S3Prefix', s3_input_mode='File', s3_data_distribution_type='FullyReplicated', s3_compression_type='None', s3_input=None, dataset_definition=None, app_managed=False)

Bases: object

Accepts parameters that specify an Amazon S3 input for a processing job.

Also provides a method to turn those parameters into a dictionary.

Initializes a ProcessingInput instance.

ProcessingInput accepts parameters that specify an Amazon S3 input for a processing job and provides a method to turn those parameters into a dictionary.

Parameters:

class sagemaker.processing.ProcessingOutput(source=None, destination=None, output_name=None, s3_upload_mode='EndOfJob', app_managed=False, feature_store_output=None)

Bases: object

Accepts parameters that specify an Amazon S3 output for a processing job.

It also provides a method to turn those parameters into a dictionary.

Initializes a ProcessingOutput instance.

ProcessingOutput accepts parameters that specify an Amazon S3 output for a processing job and provides a method to turn those parameters into a dictionary.

Parameters:
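For illustration, a sketch of how inputs and outputs are typically declared (the S3 URIs and names are hypothetical):

    from sagemaker.processing import ProcessingInput, ProcessingOutput

    train_input = ProcessingInput(
        source="s3://my-bucket/datasets/train.csv",    # hypothetical S3 prefix or object
        destination="/opt/ml/processing/input/train",  # path inside the container
        input_name="train",
        s3_data_distribution_type="ShardedByS3Key",    # shard objects across instances
    )

    features_output = ProcessingOutput(
        source="/opt/ml/processing/output/features",   # container path the job writes to
        destination="s3://my-bucket/features",         # uploaded here when the job ends
        output_name="features",
        s3_upload_mode="EndOfJob",
    )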

class sagemaker.processing.RunArgs(code, inputs=None, outputs=None, arguments=None)

Bases: object

Accepts parameters that correspond to ScriptProcessors.

An instance of this class is returned from the get_run_args() method on processors, and is used for normalizing the arguments so that they can be passed to ProcessingStep

Parameters:

Method generated by attrs for class RunArgs.

class sagemaker.processing.FeatureStoreOutput(**kwargs)

Bases: ApiObject

Configuration for processing job outputs in Amazon SageMaker Feature Store.

Init ApiObject.

feature_group_name: str | None = None

class sagemaker.processing.FrameworkProcessor(estimator_cls, framework_version, role=None, instance_count=None, instance_type=None, py_version='py3', image_uri=None, command=None, volume_size_in_gb=30, volume_kms_key=None, output_kms_key=None, code_location=None, max_runtime_in_seconds=None, base_job_name=None, sagemaker_session=None, env=None, tags=None, network_config=None)

Bases: ScriptProcessor

Handles Amazon SageMaker processing tasks for jobs using a machine learning framework.

Initializes a FrameworkProcessor instance.

The FrameworkProcessor handles Amazon SageMaker Processing tasks for jobs using a machine learning framework, which allows for a set of Python scripts to be run as part of the Processing Job.

Parameters:

framework_entrypoint_command = ['/bin/bash']

get_run_args(code, source_dir=None, dependencies=None, git_config=None, inputs=None, outputs=None, arguments=None, job_name=None)

Returns a RunArgs object.

This object contains the normalized inputs, outputs and arguments needed when using a FrameworkProcessor in a ProcessingStep.

Parameters:

run(code, source_dir=None, dependencies=None, git_config=None, inputs=None, outputs=None, arguments=None, wait=True, logs=True, job_name=None, experiment_config=None, kms_key=None, codeartifact_repo_arn=None)

Runs a processing job.

Parameters:

Returns:

None or pipeline step arguments in case the Processor instance is built with PipelineSession
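A rough sketch using the scikit-learn framework (the estimator class, framework_version, role ARN, and local paths are illustrative assumptions):

    from sagemaker.processing import FrameworkProcessor
    from sagemaker.sklearn.estimator import SKLearn

    processor = FrameworkProcessor(
        estimator_cls=SKLearn,
        framework_version="1.2-1",  # assumed available scikit-learn container version
        role="arn:aws:iam::123456789012:role/MySageMakerRole",  # hypothetical
        instance_count=1,
        instance_type="ml.m5.xlarge",
    )

    # source_dir and dependencies are packaged and uploaded alongside the entry script.
    processor.run(
        code="process.py",
        source_dir="src",             # local directory containing process.py
        dependencies=["my_package"],  # extra local files/dirs shipped with the job
    )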

This module is the entry point for running Spark processing scripts.

This module contains code related to Spark Processors, which are used for Processing jobs. These jobs let customers perform data pre-processing, post-processing, feature engineering, data validation, and model evaluation on SageMaker using Spark and PySpark.

class sagemaker.spark.processing.PySparkProcessor(role=None, instance_type=None, instance_count=None, framework_version=None, py_version=None, container_version=None, image_uri=None, volume_size_in_gb=30, volume_kms_key=None, output_kms_key=None, configuration_location=None, dependency_location=None, max_runtime_in_seconds=None, base_job_name=None, sagemaker_session=None, env=None, tags=None, network_config=None)

Bases: _SparkProcessorBase

Handles Amazon SageMaker processing tasks for jobs using PySpark.

Initializes a PySparkProcessor instance.

The PySparkProcessor handles Amazon SageMaker processing tasks for jobs using SageMaker PySpark.

Parameters:

get_run_args(submit_app, submit_py_files=None, submit_jars=None, submit_files=None, inputs=None, outputs=None, arguments=None, job_name=None, configuration=None, spark_event_logs_s3_uri=None)

Returns a RunArgs object.

This object contains the normalized inputs, outputs and arguments needed when using a PySparkProcessor in a ProcessingStep.

Parameters:

run(submit_app, submit_py_files=None, submit_jars=None, submit_files=None, inputs=None, outputs=None, arguments=None, wait=True, logs=True, job_name=None, experiment_config=None, configuration=None, spark_event_logs_s3_uri=None, kms_key=None)

Runs a processing job.

Parameters:
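A sketch of a typical PySpark job submission (framework_version, role ARN, file names, and S3 URIs are illustrative assumptions):

    from sagemaker.spark.processing import PySparkProcessor

    spark_processor = PySparkProcessor(
        base_job_name="spark-preprocess",
        framework_version="3.3",  # assumed supported Spark version
        role="arn:aws:iam::123456789012:role/MySageMakerRole",  # hypothetical
        instance_count=2,
        instance_type="ml.m5.xlarge",
    )

    spark_processor.run(
        submit_app="preprocess.py",            # local PySpark entry script
        submit_py_files=["utils.py"],          # extra Python files shipped to the job
        arguments=["--input", "s3://my-bucket/raw",
                   "--output", "s3://my-bucket/processed"],
        spark_event_logs_s3_uri="s3://my-bucket/spark-event-logs",
    )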

class sagemaker.spark.processing.SparkJarProcessor(role=None, instance_type=None, instance_count=None, framework_version=None, py_version=None, container_version=None, image_uri=None, volume_size_in_gb=30, volume_kms_key=None, output_kms_key=None, configuration_location=None, dependency_location=None, max_runtime_in_seconds=None, base_job_name=None, sagemaker_session=None, env=None, tags=None, network_config=None)

Bases: _SparkProcessorBase

Handles Amazon SageMaker processing tasks for jobs using Spark with Java or Scala Jars.

Initializes a SparkJarProcessor instance.

The SparkJarProcessor handles Amazon SageMaker processing tasks for jobs using SageMaker Spark.

Parameters:

get_run_args(submit_app, submit_class=None, submit_jars=None, submit_files=None, inputs=None, outputs=None, arguments=None, job_name=None, configuration=None, spark_event_logs_s3_uri=None)

Returns a RunArgs object.

This object contains the normalized inputs, outputs and arguments needed when using a SparkJarProcessor in a ProcessingStep.

Parameters:

run(submit_app, submit_class, submit_jars=None, submit_files=None, inputs=None, outputs=None, arguments=None, wait=True, logs=True, job_name=None, experiment_config=None, configuration=None, spark_event_logs_s3_uri=None, kms_key=None)

Runs a processing job.

Parameters:
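A comparable sketch for a Java/Scala Spark job (the jar location, main class, and other values are hypothetical):

    from sagemaker.spark.processing import SparkJarProcessor

    jar_processor = SparkJarProcessor(
        base_job_name="spark-jar-job",
        framework_version="3.3",  # assumed supported Spark version
        role="arn:aws:iam::123456789012:role/MySageMakerRole",  # hypothetical
        instance_count=2,
        instance_type="ml.m5.xlarge",
    )

    jar_processor.run(
        submit_app="s3://my-bucket/code/preprocessing.jar",  # hypothetical jar location
        submit_class="com.example.SparkPreprocess",          # main class inside the jar
        arguments=["--input", "s3://my-bucket/raw"],
    )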

class sagemaker.spark.processing.FileType(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)

Bases: Enum

Enum of file types.

JAR = 1

PYTHON = 2

FILE = 3

class sagemaker.spark.processing.SparkConfigUtils

Bases: object

Utility class for Spark configurations.

static validate_configuration(configuration)

Validates the user-provided Hadoop/Spark/Hive configuration.

This ensures that the list or dictionary the user provides will serialize to JSON matching the schema of EMR’s application configuration.

Parameters:

configuration (Dict) – A dict that contains the configuration overrides to the default values. For more information, please visit: https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-configure-apps.html
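For example, a configuration override in the EMR application-configuration shape that this validator expects might look like the following (the property values are illustrative, not recommendations):

    # Typically passed to PySparkProcessor.run(..., configuration=configuration);
    # the structure mirrors EMR application configuration documents.
    configuration = [
        {
            "Classification": "spark-defaults",
            "Properties": {
                "spark.executor.memory": "2g",
                "spark.executor.cores": "2",
            },
        }
    ]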

static validate_s3_uri(spark_output_s3_path)

Validate whether the URI uses an S3 scheme.

In the future, this validation will perform deeper S3 validation.

Parameters:

spark_output_s3_path (str) – The URI of the Spark output S3 Path.

This module configures the SageMaker Clarify bias and model explainability processor jobs.

SageMaker Clarify

class sagemaker.clarify.DatasetType(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)

Bases: Enum

Enum to store different dataset types supported in the Analysis config file

TEXTCSV = 'text/csv'

JSONLINES = 'application/jsonlines'

JSON = 'application/json'

PARQUET = 'application/x-parquet'

IMAGE = 'application/x-image'

class sagemaker.clarify.TimeSeriesJSONDatasetFormat(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)

Bases: Enum

Possible dataset formats for JSON time series data files.

Below is an example COLUMNS dataset for time series explainability:

{ "ids": [1, 2], "timestamps": [3, 4], "target_ts": [5, 6], "rts1": [0.25, 0.5], "rts2": [1.25, 1.5], "scv1": [10, 20], "scv2": [30, 40] }

For this example, JMESPaths are specified when creating TimeSeriesDataConfig as follows:

item_id="ids" timestamp="timestamps" target_time_series="target_ts" related_time_series=["rts1", "rts2"] static_covariates=["scv1", "scv2"]

Below is an example ITEM_RECORDS dataset for time series explainability:

[ { "id": 1, "scv1": 10, "scv2": "red", "timeseries": [ {"timestamp": 1, "target_ts": 5, "rts1": 0.25, "rts2": 10}, {"timestamp": 2, "target_ts": 6, "rts1": 0.35, "rts2": 20}, {"timestamp": 3, "target_ts": 4, "rts1": 0.45, "rts2": 30} ] }, { "id": 2, "scv1": 20, "scv2": "blue", "timeseries": [ {"timestamp": 1, "target_ts": 4, "rts1": 0.25, "rts2": 40}, {"timestamp": 2, "target_ts": 2, "rts1": 0.35, "rts2": 50} ] } ]

For this example, JMESPaths are specified when creating TimeSeriesDataConfig as follows:

item_id="[].id" timestamp="[].timeseries[].timestamp" target_time_series="[].timeseries[].target_ts" related_time_series=["[].timeseries[].rts1", "[].timeseries[].rts2"] static_covariates=["[].scv1", "[*].scv2"]

Below is an example TIMESTAMP_RECORDS dataset for time series explainability:

[ {"id": 1, "timestamp": 1, "target_ts": 5, "scv1": 10, "rts1": 0.25}, {"id": 1, "timestamp": 2, "target_ts": 6, "scv1": 10, "rts1": 0.5}, {"id": 1, "timestamp": 3, "target_ts": 3, "scv1": 10, "rts1": 0.75}, {"id": 2, "timestamp": 5, "target_ts": 10, "scv1": 20, "rts1": 1} ]

For this example, JMESPaths are specified when creating TimeSeriesDataConfig as follows:

item_id="[].id" timestamp="[].timestamp" target_time_series="[].target_ts" related_time_series=["[].rts1"] static_covariates=["[*].scv1"]

COLUMNS = 'columns'

ITEM_RECORDS = 'item_records'

TIMESTAMP_RECORDS = 'timestamp_records'

class sagemaker.clarify.SegmentationConfig(name_or_index, segments, config_name=None, display_aliases=None)

Bases: object

Config object that defines segment(s) of the dataset on which metrics are computed.

Initializes a segmentation configuration for a dataset column.

Parameters:

Raises:

ValueError – when the name_or_index is None, segments is invalid, or a wrong number of display_aliases are specified.

to_dict()

Returns SegmentationConfig as a dict.

Return type:

Dict[str, Any]

class sagemaker.clarify.TimeSeriesDataConfig(target_time_series, item_id, timestamp, related_time_series=None, static_covariates=None, dataset_format=None)

Bases: object

Config object for TimeSeries explainability data configuration fields.

Initialises TimeSeries explainability data configuration fields.

Parameters:

Raises:

ValueError – If any required arguments are not provided or are the wrong type.

get_time_series_data_config()

Returns part of an analysis config dictionary.

class sagemaker.clarify.DataConfig(s3_data_input_path, s3_output_path, s3_analysis_config_output_path=None, label=None, headers=None, features=None, dataset_type='text/csv', s3_compression_type='None', joinsource=None, facet_dataset_uri=None, facet_headers=None, predicted_label_dataset_uri=None, predicted_label_headers=None, predicted_label=None, excluded_columns=None, segmentation_config=None, time_series_data_config=None)

Bases: object

Config object related to configurations of the input and output dataset.

Initializes a configuration of both input and output datasets.

Parameters:

Raises:

ValueError – when the dataset_type is invalid, predicted label dataset parameters are used with an unsupported dataset_type, or facet dataset parameters are used with an unsupported dataset_type.

get_config()

Returns part of an analysis config dictionary.

class sagemaker.clarify.BiasConfig(label_values_or_threshold, facet_name, facet_values_or_threshold=None, group_name=None)

Bases: object

Config object with user-defined bias configurations of the input dataset.

Initializes a configuration of the sensitive groups in the dataset.

Parameters:

Raises:

ValueError – If the number of facet names doesn’t equal the number of facet values.

get_config()

Returns a dictionary of bias detection configurations, part of the analysis config
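A sketch of a typical pairing of DataConfig and BiasConfig for a tabular CSV dataset (the column names, label values, and S3 paths are hypothetical):

    from sagemaker.clarify import DataConfig, BiasConfig

    data_config = DataConfig(
        s3_data_input_path="s3://my-bucket/clarify/train.csv",  # hypothetical
        s3_output_path="s3://my-bucket/clarify/output",
        label="approved",                                 # target column
        headers=["age", "income", "gender", "approved"],
        dataset_type="text/csv",
    )

    bias_config = BiasConfig(
        label_values_or_threshold=[1],  # which label values count as the positive outcome
        facet_name="gender",            # sensitive attribute to analyze
        facet_values_or_threshold=[0],  # facet values defining the sensitive group
    )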

class sagemaker.clarify.TimeSeriesModelConfig(forecast)

Bases: object

Config object for TimeSeries predictor configuration fields.

Initializes model configuration fields for TimeSeries explainability use cases.

Parameters:

forecast (str) – JMESPath expression to extract the forecast result.

Raises:

ValueError – when forecast is not a string or not provided

get_time_series_model_config()

Returns TimeSeries model config dictionary

class sagemaker.clarify.ModelConfig(model_name=None, instance_count=None, instance_type=None, accept_type=None, content_type=None, content_template=None, record_template=None, custom_attributes=None, accelerator_type=None, endpoint_name_prefix=None, target_model=None, endpoint_name=None, time_series_model_config=None)

Bases: object

Config object related to a model and its endpoint to be created.

Initializes a configuration of a model and the endpoint to be created for it.

Parameters:

Raises:

ValueError – when any of the following holds:
- endpoint_name_prefix is invalid,
- accept_type is invalid,
- content_type is invalid,
- content_template has no placeholder “features”,
- both [endpoint_name] AND [model_name, instance_count, instance_type] are set,
- both [endpoint_name] AND [endpoint_name_prefix] are set.

get_predictor_config()

Returns part of the predictor dictionary of the analysis config.
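A minimal sketch of a ModelConfig for an existing SageMaker model (the model name and payload formats are assumptions):

    from sagemaker.clarify import ModelConfig

    model_config = ModelConfig(
        model_name="my-trained-model",  # hypothetical SageMaker model name
        instance_count=1,
        instance_type="ml.m5.xlarge",
        content_type="text/csv",        # request format sent to the shadow endpoint
        accept_type="text/csv",         # response format expected back
    )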

class sagemaker.clarify.ModelPredictedLabelConfig(label=None, probability=None, probability_threshold=None, label_headers=None)

Bases: object

Config object to extract a predicted label from the model output.

Initializes a model output config to extract the predicted label or predicted score(s).

The following examples show different parameter configurations depending on the endpoint:
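For instance, two hedged sketches with hypothetical endpoint output shapes (not taken verbatim from this page):

    from sagemaker.clarify import ModelPredictedLabelConfig

    # Endpoint returns a single probability score per record, e.g. "0.93":
    # treat scores above 0.8 as the positive label.
    pred_cfg_score = ModelPredictedLabelConfig(probability_threshold=0.8)

    # Endpoint returns JSON like {"predicted_label": 1, "probability": 0.93}:
    # extract both fields with JMESPath expressions.
    pred_cfg_json = ModelPredictedLabelConfig(
        label="predicted_label",
        probability="probability",
    )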

Parameters:

Raises:

TypeError – when the probability_threshold cannot be cast to a float

get_predictor_config()

Returns probability_threshold and predictor config dictionary.

class sagemaker.clarify.ExplainabilityConfig

Bases: ABC

Abstract config class to configure an explainability method.

abstract get_explainability_config()

Returns config.

class sagemaker.clarify.PDPConfig(features=None, grid_resolution=15, top_k_features=10)

Bases: ExplainabilityConfig

Config class for Partial Dependence Plots (PDP).

PDPs show the marginal effect (the dependence) a subset of features has on the predicted outcome of an ML model.

When PDP is requested (by passing in a PDPConfig to the explainability_config parameter of SageMakerClarifyProcessor), the Partial Dependence Plots are included in the output report and the corresponding values are included in the analysis output.

Initializes PDP config.

Parameters:

get_explainability_config()

Returns PDP config dictionary.

class sagemaker.clarify.TextConfig(granularity, language)

Bases: object

Config object to handle text features for text explainability

SHAP analysis breaks down longer text into chunks (e.g. tokens, sentences, or paragraphs) and replaces them with the strings specified in the baseline for that feature. The SHAP value of a chunk then captures how much replacing it affects the prediction.

Initializes a text configuration.

Parameters:

Raises:

ValueError – when granularity is not in the list of supported values, or language is not in the list of supported values.

get_text_config()

Returns a text config dictionary, part of the analysis config dictionary.

class sagemaker.clarify.ImageConfig(model_type, num_segments=None, feature_extraction_method=None, segment_compactness=None, max_objects=None, iou_threshold=None, context=None)

Bases: object

Config object for handling images

Initializes a config object for Computer Vision (CV) Image explainability.

SHAP for CV explainability generates heat maps that visualize feature attributions for input images. These heat maps highlight the image’s features according to how much they contribute to the CV model prediction.

"IMAGE_CLASSIFICATION" and "OBJECT_DETECTION" are the two supported CV use cases.

Parameters:

get_image_config()

Returns the image config part of an analysis config dictionary.

class sagemaker.clarify.SHAPConfig(baseline=None, num_samples=None, agg_method=None, use_logit=False, save_local_shap_values=True, seed=None, num_clusters=None, text_config=None, image_config=None, features_to_explain=None)

Bases: ExplainabilityConfig

Config class for SHAP.

The SHAP algorithm calculates feature attributions by computing the contribution of each feature to the prediction outcome, using the concept of Shapley values.

These attributions can be provided for specific predictions (locally) and at a global level for the model as a whole.

Initializes config for SHAP analysis.

Parameters:

Raises:

ValueError – when agg_method is invalid, baseline and num_clusters are provided together, or features_to_explain is specified when text_config or image_config is provided

get_explainability_config()

Returns a shap config dictionary.
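A hedged sketch of a SHAPConfig for a small tabular model (the baseline record and sample count are illustrative assumptions):

    from sagemaker.clarify import SHAPConfig

    shap_config = SHAPConfig(
        baseline=[[35, 50000, 0]],  # one baseline record matching the feature order
        num_samples=100,            # number of perturbed copies generated per example
        agg_method="mean_abs",      # aggregate local attributions into global importance
        save_local_shap_values=True,
    )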

class sagemaker.clarify.AsymmetricShapleyValueConfig(direction='chronological', granularity='timewise', num_samples=None, baseline=None)

Bases: ExplainabilityConfig

Config class for Asymmetric Shapley value algorithm for time series explainability.

Asymmetric Shapley Values are a variant of the Shapley Value that drop the symmetry axiom [1]. We use these to determine how features contribute to the forecasting outcome. Asymmetric Shapley values can take into account the temporal dependencies of the time series that forecasting models take as input.

[1] Frye, Christopher, Colin Rowat, and Ilya Feige. “Asymmetric shapley values: incorporating causal knowledge into model-agnostic explainability.” NeurIPS (2020). https://doi.org/10.48550/arXiv.1910.06358

Initialises config for time series explainability with Asymmetric Shapley Values.

AsymmetricShapleyValueConfig is used specifically and only for TimeSeries explainability purposes.

Parameters:

Raises:

ValueError – when direction or granularity are not valid, num_samples is not provided for fine-grained explanations, num_samples is provided for non fine-grained explanations, or when direction is not "chronological" while granularity is "fine_grained".

get_explainability_config()

Returns an asymmetric shap config dictionary.

class sagemaker.clarify.SageMakerClarifyProcessor(role=None, instance_count=None, instance_type=None, volume_size_in_gb=30, volume_kms_key=None, output_kms_key=None, max_runtime_in_seconds=None, sagemaker_session=None, env=None, tags=None, network_config=None, job_name_prefix=None, version=None, skip_early_validation=False)

Bases: Processor

Handles SageMaker Processing tasks to compute bias metrics and model explanations.

Initializes a SageMakerClarifyProcessor to compute bias metrics and model explanations.

Instance of Processor.

Parameters:

run(**_)

Overriding the base class method but deferring to specific run_* methods.

run_pre_training_bias(data_config, data_bias_config, methods='all', wait=True, logs=True, job_name=None, kms_key=None, experiment_config=None)

Runs a ProcessingJob to compute pre-training bias methods

Computes the requested methods on the input data. The methods compare metrics (e.g. fraction of examples) for the sensitive group(s) vs. the other examples.

Parameters:

run_post_training_bias(data_config, data_bias_config, model_config=None, model_predicted_label_config=None, methods='all', wait=True, logs=True, job_name=None, kms_key=None, experiment_config=None)

Runs a ProcessingJob to compute post-training bias methods

Spins up a model endpoint and runs inference over the input dataset in the s3_data_input_path (from the DataConfig) to obtain predicted labels. Using model predictions, computes the requested post-training bias methods that compare metrics (e.g. accuracy, precision, recall) for the sensitive group(s) versus the other examples.

Parameters:

run_bias(data_config, bias_config, model_config=None, model_predicted_label_config=None, pre_training_methods='all', post_training_methods='all', wait=True, logs=True, job_name=None, kms_key=None, experiment_config=None)

Runs a ProcessingJob to compute the requested bias methods

Computes metrics for both the pre-training and the post-training methods. To calculate post-training methods, it spins up a model endpoint and runs inference over the input examples in ‘s3_data_input_path’ (from the DataConfig) to obtain predicted labels.

Parameters:

run_explainability(data_config, model_config, explainability_config, model_scores=None, wait=True, logs=True, job_name=None, kms_key=None, experiment_config=None)

Runs a ProcessingJob computing feature attributions.

Spins up a model endpoint.

Currently, only SHAP and Partial Dependence Plots (PDP) are supported as explainability methods. You can request both methods or one at a time with the explainability_config parameter.

When SHAP is requested in the explainability_config, the SHAP algorithm calculates the feature importance for each input example in the s3_data_input_path of the DataConfig, by creating num_samples copies of the example with a subset of features replaced with values from the baseline. It then runs model inference to see how the model’s prediction changes with the replaced features. If the model output returns multiple scores, importance is computed for each score. Across examples, feature importance is aggregated using agg_method.

When PDP is requested in the explainability_config, the PDP algorithm calculates the dependence of the target response on the input features and marginalizes over the values of all other input features. The Partial Dependence Plots are included in the output report and the corresponding values are included in the analysis output.

Parameters:
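Putting the Clarify pieces together, a hedged end-to-end sketch (it reuses the data_config, model_config, and shap_config objects from the earlier sketches; the role ARN is hypothetical):

    from sagemaker.clarify import SageMakerClarifyProcessor

    clarify_processor = SageMakerClarifyProcessor(
        role="arn:aws:iam::123456789012:role/MySageMakerRole",  # hypothetical
        instance_count=1,
        instance_type="ml.m5.xlarge",
    )

    # Computes SHAP feature attributions by spinning up a shadow endpoint for the model.
    clarify_processor.run_explainability(
        data_config=data_config,
        model_config=model_config,
        explainability_config=shap_config,
    )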

run_bias_and_explainability(data_config, model_config, explainability_config, bias_config, pre_training_methods='all', post_training_methods='all', model_predicted_label_config=None, wait=True, logs=True, job_name=None, kms_key=None, experiment_config=None)

Runs a ProcessingJob computing feature attributions.

For bias: Computes metrics for both the pre-training and the post-training methods. To calculate post-training methods, it spins up a model endpoint and runs inference over the input examples in ‘s3_data_input_path’ (from the DataConfig) to obtain predicted labels.

For Explainability: Spins up a model endpoint.

Currently, only SHAP and Partial Dependence Plots (PDP) are supported as explainability methods. You can request both methods or one at a time with the explainability_config parameter.

When SHAP is requested in the explainability_config, the SHAP algorithm calculates the feature importance for each input example in the s3_data_input_path of the DataConfig, by creating num_samples copies of the example with a subset of features replaced with values from the baseline. It then runs model inference to see how the model’s prediction changes with the replaced features. If the model output returns multiple scores, importance is computed for each score. Across examples, feature importance is aggregated using agg_method.

When PDP is requested in the explainability_config, the PDP algorithm calculates the dependence of the target response on the input features and marginalizes over the values of all other input features. The Partial Dependence Plots are included in the output report and the corresponding values are included in the analysis output.

Parameters:

class sagemaker.clarify.ProcessingOutputHandler

Bases: object

Class to handle the parameters for the SageMaker Processor ProcessingOutput.

class S3UploadMode(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)

Bases: Enum

Enum values for different upload modes to an S3 bucket.

CONTINUOUS = 'Continuous'

ENDOFJOB = 'EndOfJob'

classmethod get_s3_upload_mode(analysis_config)

Fetches the s3_upload_mode based on the shap_config values.

Parameters:

analysis_config (dict) – Config dict following the analysis_config.json format.

Returns:

The s3_upload_mode type for the processing output.

Return type:

str