Random Cut Forest — sagemaker 2.199.0 documentation


The Amazon SageMaker Random Cut Forest algorithm.

class sagemaker.RandomCutForest(role=None, instance_count=None, instance_type=None, num_samples_per_tree=None, num_trees=None, eval_metrics=None, **kwargs)

Bases: sagemaker.amazon.amazon_estimator.AmazonAlgorithmEstimatorBase

An unsupervised algorithm for detecting anomalous data points within a data set.

These are observations which diverge from otherwise well-structured or patterned data. Anomalies can manifest as unexpected spikes in time series data, breaks in periodicity, or unclassifiable data points.

An Estimator class implementing a Random Cut Forest.

Typically used for anomaly detection, this Estimator may be fit via calls to fit(). It requires Amazon Record protobuf serialized data to be stored in S3. There is a utility, record_set(), that can be used to upload data to S3 and create a RecordSet to be passed to the fit call.

To learn more about the Amazon protobuf Record class and how to prepare bulk data in this format, please consult AWS technical documentation: https://docs.aws.amazon.com/sagemaker/latest/dg/cdf-training.html

After this Estimator is fit, model data is stored in S3. The model may be deployed to an Amazon SageMaker Endpoint by invoking deploy(). As well as deploying an Endpoint, deploy returns a RandomCutForestPredictor object that can be used for inference calls using the trained model hosted in the SageMaker Endpoint.

RandomCutForest Estimators can be configured by setting hyperparameters. The available hyperparameters for RandomCutForest are documented below.

For further information on the AWS Random Cut Forest algorithm, please consult AWS technical documentation: https://docs.aws.amazon.com/sagemaker/latest/dg/randomcutforest.html
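As a rough end-to-end sketch (the role ARN, bucket layout, hyperparameter values, and the randomly generated training data below are illustrative assumptions, not defaults), training typically looks like this:

import numpy as np
import sagemaker
from sagemaker import RandomCutForest

session = sagemaker.Session()
role = "arn:aws:iam::111122223333:role/SageMakerRole"  # hypothetical execution role
bucket = session.default_bucket()

# Hypothetical unlabeled data: 10,000 one-dimensional points.
train_data = np.random.rand(10000, 1).astype("float32")

rcf = RandomCutForest(
    role=role,
    instance_count=1,
    instance_type="ml.m5.xlarge",
    num_samples_per_tree=512,
    num_trees=50,
    data_location=f"s3://{bucket}/rcf/train",  # where record_set() uploads the protobuf data
    output_path=f"s3://{bucket}/rcf/output",   # where the trained model artifact is stored
)

# record_set() serializes the ndarray to Amazon Record protobuf and uploads it to S3.
records = rcf.record_set(train_data)
rcf.fit(records)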

Parameters

Tip

You can find additional parameters for initializing this class at AmazonAlgorithmEstimatorBase and EstimatorBase.

repo_name: str = 'randomcutforest'

repo_version: str = '1'

create_model(vpc_config_override='VPC_CONFIG_DEFAULT', **kwargs)

Return a RandomCutForestModel.

It references the latest S3 model data produced by this Estimator.

Parameters

CONTAINER_CODE_CHANNEL_SOURCEDIR_PATH = '/opt/ml/input/data/code/sourcedir.tar.gz'

DEFAULT_MINI_BATCH_SIZE = None

INSTANCE_TYPE = 'sagemaker_instance_type'

JOB_CLASS_NAME = 'training-job'

LAUNCH_MPI_ENV_NAME = 'sagemaker_mpi_enabled'

LAUNCH_MWMS_ENV_NAME = 'sagemaker_multi_worker_mirrored_strategy_enabled'

LAUNCH_PS_ENV_NAME = 'sagemaker_parameter_server_enabled'

LAUNCH_PT_XLA_ENV_NAME = 'sagemaker_pytorch_xla_multi_worker_enabled'

LAUNCH_SM_DDP_ENV_NAME = 'sagemaker_distributed_dataparallel_enabled'

MPI_CUSTOM_MPI_OPTIONS = 'sagemaker_mpi_custom_mpi_options'

MPI_NUM_PROCESSES_PER_HOST = 'sagemaker_mpi_num_of_processes_per_host'

SM_DDP_CUSTOM_MPI_OPTIONS = 'sagemaker_distributed_dataparallel_custom_mpi_options'

classmethod attach(training_job_name, sagemaker_session=None, model_channel_name='model')

Attach to an existing training job.

Create an Estimator bound to an existing training job. Each subclass is responsible for implementing _prepare_init_params_from_job_description(), as this method delegates the actual conversion of a training job description to the arguments that the class constructor expects. After attaching, if the training job has a Complete status, it can be deploy() ed to create a SageMaker Endpoint and return a Predictor.

If the training job is in progress, attach will block until the training job completes, but logs of the training job will not be displayed. To see the log content, please call logs().

Examples

my_estimator.fit(wait=False)
training_job_name = my_estimator.latest_training_job.name

Later on:

attached_estimator = Estimator.attach(training_job_name)
attached_estimator.logs()
attached_estimator.deploy()

Parameters

Returns

Instance of the calling Estimator Class with the attached training job.

compile_model(target_instance_family, input_shape, output_path, framework=None, framework_version=None, compile_max_run=900, tags=None, target_platform_os=None, target_platform_arch=None, target_platform_accelerator=None, compiler_options=None, **kwargs)

Compile a Neo model using the input model.

Parameters

Returns

A SageMaker Model object. See Model() for full details.

Return type

sagemaker.model.Model

property data_location

Placeholder docstring

delete_endpoint(**kwargs)

deploy(initial_instance_count=None, instance_type=None, serializer=None, deserializer=None, accelerator_type=None, endpoint_name=None, use_compiled_model=False, wait=True, model_name=None, kms_key=None, data_capture_config=None, tags=None, serverless_inference_config=None, async_inference_config=None, volume_size=None, model_data_download_timeout=None, container_startup_health_check_timeout=None, inference_recommendation_id=None, explainer_config=None, **kwargs)

Deploy the trained model to an Amazon SageMaker endpoint.

It then returns a sagemaker.Predictor object.

More information: http://docs.aws.amazon.com/sagemaker/latest/dg/how-it-works-training.html

Parameters

Returns

A predictor that provides a predict() method, which can be used to send requests to the Amazon SageMaker endpoint and obtain inferences.

Return type

sagemaker.predictor.Predictor
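As an illustrative sketch (the instance type and endpoint name are assumptions, not defaults), deploying the fitted estimator from the earlier example might look like:

rcf_predictor = rcf.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.large",
    endpoint_name="rcf-anomaly-endpoint",  # hypothetical endpoint name
)

The returned object is a RandomCutForestPredictor bound to the new endpoint; call rcf_predictor.delete_endpoint() when finished to stop incurring charges.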

disable_profiling()

Update the current training job in progress to disable profiling.

Debugger stops collecting the system and framework metrics and turns off the Debugger built-in monitoring and profiling rules.

enable_default_profiling()

Update training job to enable Debugger monitoring.

This method enables Debugger monitoring with the default profiler_config parameter to collect system metrics and the default built-in profiler_report rule. Framework metrics won’t be saved. To update the training job to emit framework metrics, you can use the update_profiler method and specify the framework metrics you want to enable.

This method is callable when the training job is in progress while Debugger monitoring is disabled.

enable_network_isolation()

Return True if this Estimator will need network isolation to run.

Returns

Whether this Estimator needs network isolation or not.

Return type

bool

fit(records, mini_batch_size=None, wait=True, logs=True, job_name=None, experiment_config=None)

Fit this Estimator on serialized Record objects, stored in S3.

records should be an instance of RecordSet. This defines a collection of S3 data files to train this Estimator on.

Training data is expected to be encoded as dense or sparse vectors in the “values” feature on each Record. If the data is labeled, the label is expected to be encoded as a list of scalars in the “values” feature of the Record label.

More information on the Amazon Record format is available at:https://docs.aws.amazon.com/sagemaker/latest/dg/cdf-training.html

See record_set() to construct a RecordSet object from numpy ndarrays.

Parameters
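If the Amazon Record protobuf data already exists in S3 (for example from an earlier record_set() call), a RecordSet can be constructed directly and passed to fit(). The sketch below assumes the RecordSet class from sagemaker.amazon.amazon_estimator and a hypothetical S3 prefix:

from sagemaker.amazon.amazon_estimator import RecordSet

existing_records = RecordSet(
    s3_data="s3://my-bucket/rcf/train",  # hypothetical prefix holding recordIO-protobuf files
    num_records=10000,
    feature_dim=1,
    s3_data_type="S3Prefix",
    channel="train",
)
rcf.fit(existing_records, wait=True, logs=True)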

get_app_url(app_type, open_in_default_web_browser=True, create_presigned_domain_url=False, domain_id=None, user_profile_name=None, optional_create_presigned_url_kwargs=None)

Generate a URL to help access the specified app hosted in Amazon SageMaker Studio.

Parameters

Returns

A URL for the requested app in SageMaker Studio.

Return type

str

get_vpc_config(vpc_config_override='VPC_CONFIG_DEFAULT')

Returns the VpcConfig dict either from this Estimator’s subnets and security groups, or else validates and returns an optional override value.

Parameters

vpc_config_override

hyperparameters()

Placeholder docstring

latest_job_debugger_artifacts_path()

Gets the path to the DebuggerHookConfig output artifacts.

Returns

An S3 path to the output artifacts.

Return type

str

latest_job_profiler_artifacts_path()

Gets the path to the profiling output artifacts.

Returns

An S3 path to the output artifacts.

Return type

str

latest_job_tensorboard_artifacts_path()

Gets the path to the TensorBoardOutputConfig output artifacts.

Returns

An S3 path to the output artifacts.

Return type

str

logs()

Display the logs for Estimator’s training job.

If the output is a tty or a Jupyter cell, it will be color-coded based on which instance the log entry is from.

property model_data

The model location in S3. Only set if Estimator has been fit().

Type

str or dict

prepare_workflow_for_training(records=None, mini_batch_size=None, job_name=None)

Calls _prepare_for_training. Used when setting up a workflow.

Parameters

record_set(train, labels=None, channel='train', encrypt=False)

Build a RecordSet from a numpy ndarray matrix and label vector.

For the 2D ndarray train, each row is converted to a Record object. The vector is stored in the “values” entry of the features property of each Record. If labels is not None, each corresponding label is assigned to the “values” entry of the labels property of each Record.

The collection of Record objects are protobuf serialized and uploaded to new S3 locations. A manifest file is generated containing the list of objects created and also stored in S3.

The number of S3 objects created is controlled by the instance_count property on this Estimator. One S3 object is created per training instance.

Parameters

Returns

A RecordSet referencing the encoded, uploaded training and label data.

Return type

RecordSet
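For illustration (the array names, shapes, and the optional labeled test channel below are assumptions), separate channels can be built and passed to fit() together as a list:

train_records = rcf.record_set(train_data, channel="train")
# An optional labeled test channel; labels are hypothetical 0/1 anomaly flags.
test_records = rcf.record_set(test_data, labels=test_labels, channel="test")
rcf.fit([train_records, test_records])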

register(content_types=None, response_types=None, inference_instances=None, transform_instances=None, image_uri=None, model_package_name=None, model_package_group_name=None, model_metrics=None, metadata_properties=None, marketplace_cert=False, approval_status=None, description=None, compile_model_family=None, model_name=None, drift_check_baselines=None, customer_metadata_properties=None, domain=None, sample_payload_url=None, task=None, framework=None, framework_version=None, nearest_model_name=None, data_input_configuration=None, skip_model_validation=None, **kwargs)

Creates a model package for creating SageMaker models or listing on Marketplace.

Parameters

Returns

The ARN of the created SageMaker Model Package, as a string.

Return type

str

training_image_uri()

Placeholder docstring

property training_job_analytics

Return a TrainingJobAnalytics object for the current training job.

transformer(instance_count, instance_type, strategy=None, assemble_with=None, output_path=None, output_kms_key=None, accept=None, env=None, max_concurrent_transforms=None, max_payload=None, tags=None, role=None, volume_kms_key=None, vpc_config_override='VPC_CONFIG_DEFAULT', enable_network_isolation=None, model_name=None)

Return a Transformer that uses a SageMaker Model based on the training job.

It reuses the SageMaker Session and base job name used by the Estimator.

Parameters
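A minimal batch-transform sketch, assuming the estimator has already been fit and that the S3 locations below are placeholders:

transformer = rcf.transformer(
    instance_count=1,
    instance_type="ml.m5.xlarge",
    output_path="s3://my-bucket/rcf/transform-output",  # hypothetical output location
)
transformer.transform(
    "s3://my-bucket/rcf/batch-input",  # hypothetical input prefix with recordIO-protobuf data
    content_type="application/x-recordio-protobuf",
    split_type="RecordIO",
)
transformer.wait()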

update_profiler(rules=None, system_monitor_interval_millis=None, s3_output_path=None, framework_profile_params=None, disable_framework_metrics=False)

Update training jobs to enable profiling.

This method updates the profiler_config parameter and initiates Debugger built-in rules for profiling.

Parameters

Attention

Updating the profiling configuration for TensorFlow dataloader profiling is currently not available. If you started a TensorFlow training job only with monitoring and want to enable profiling while the training job is running, the dataloader profiling cannot be updated.

class sagemaker.RandomCutForestModel(model_data, role=None, sagemaker_session=None, **kwargs)

Bases: sagemaker.model.Model

Reference RandomCutForest S3 model data.

Calling deploy() creates an Endpoint and returns a Predictor that calculates anomaly scores for datapoints.

Initialization for RandomCutForestModel class.

Parameters

class sagemaker.RandomCutForestPredictor(endpoint_name, sagemaker_session=None, serializer=<sagemaker.amazon.common.RecordSerializer object>, deserializer=<sagemaker.amazon.common.RecordDeserializer object>, component_name=None)

Bases: sagemaker.base_predictor.Predictor

Assigns an anomaly score to each of the datapoints provided.

The implementation of predict() in this Predictor requires a numpy ndarray as input. The array should contain the same number of columns as the feature-dimension of the data used to fit the model this Predictor performs inference on.

predict() returns a list of Record objects (assuming the default recordio-protobuf deserializer is used), one for each row in the input. Each row’s score is stored in the key score of the Record.label field.

Initialization for RandomCutForestPredictor class.

Parameters
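For example (the endpoint name and the data points are placeholders), anomaly scores can be read back from the returned Record objects like this:

import numpy as np
from sagemaker import RandomCutForestPredictor

predictor = RandomCutForestPredictor("my-rcf-endpoint")  # hypothetical existing endpoint
points = np.array([[0.1], [0.5], [9.7]], dtype="float32")

results = predictor.predict(points)
scores = [record.label["score"].float32_tensor.values[0] for record in results]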