tft.TFTransformOutput | TFX | TensorFlow

A wrapper around the output of tf.Transform.

tft.TFTransformOutput(
    transform_output_dir: str
)

Used in the notebooks

Used in the tutorials
Preprocessing data with TensorFlow Transform
Preprocess data with TensorFlow Transform
TFX Estimator Component Tutorial
TFX Keras Component Tutorial
Graph-based Neural Structured Learning in TFX

Args
transform_output_dir	The directory containing tf.Transform output.

Attributes
post_transform_statistics_path	Returns the path to the post-transform datum statistics.
pre_transform_statistics_path	Returns the path to the pre-transform datum statistics.
raw_metadata	A DatasetMetadata.
transform_savedmodel_dir	A Python str.
transformed_metadata	A DatasetMetadata.

Methods

`load_transform_graph`

load_transform_graph()

Load the transform graph without replacing any placeholders.

This is necessary to ensure that variables in the transform graph are included in the training checkpoint when using tf.Estimator. This should be called in the training input_fn.
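The placement described above can be sketched as follows. This is a minimal illustration, not the library's own code: `make_training_input_fn` and `build_dataset` are hypothetical names introduced here, and `tft_output` is assumed to be a `tft.TFTransformOutput` instance.

```python
from typing import Any, Callable


def make_training_input_fn(
    tft_output: Any, build_dataset: Callable[[dict], Any]
) -> Callable[[], Any]:
    """Returns an input_fn that loads the transform graph before building data.

    tft_output is assumed to be a tft.TFTransformOutput. build_dataset is a
    hypothetical callable that turns a feature spec into a dataset.
    """

    def input_fn():
        # Load the transform graph inside input_fn so that its variables are
        # included in the training checkpoint.
        tft_output.load_transform_graph()
        return build_dataset(tft_output.transformed_feature_spec())

    return input_fn
```

The call happens when `input_fn` runs, not when it is created, so the graph is loaded at the point where the Estimator builds its training input.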

`num_buckets_for_transformed_feature`

num_buckets_for_transformed_feature(
    name: str
) -> int

Returns the number of buckets for an integerized transformed feature.
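As an illustration, the helper below collects bucket counts for several features at once. `bucket_counts` is a hypothetical name, and `tft_output` is assumed to be a `tft.TFTransformOutput` instance; only the method documented above is called.

```python
from typing import Any, Dict, Iterable


def bucket_counts(tft_output: Any, feature_names: Iterable[str]) -> Dict[str, int]:
    """Maps each integerized transformed feature to its number of buckets.

    tft_output is assumed to be a tft.TFTransformOutput instance.
    """
    return {
        name: tft_output.num_buckets_for_transformed_feature(name)
        for name in feature_names
    }
```

Counts like these are typically used to size an embedding or one-hot encoding for each integerized feature.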

`raw_domains`

raw_domains() -> Dict[str, common_types.DomainType]

Returns domains for the raw features.

Returns
A dict from feature names to one of schema_pb2.IntDomain, schema_pb2.StringDomain or schema_pb2.FloatDomain.

`raw_feature_spec`

raw_feature_spec() -> Dict[str, common_types.FeatureSpecType]

Returns a feature_spec for the raw features.

Returns
A dict from feature names to FixedLenFeature/SparseFeature/VarLenFeature.

`transform_features_layer`

transform_features_layer() -> tf_keras.Model

Creates a TransformFeaturesLayer from this transform output.

If a TransformFeaturesLayer has already been created for self, the same one will be returned.

Returns
A TransformFeaturesLayer instance.
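A minimal sketch of one way to use the layer, assuming it is called on a dict of raw feature tensors and returns a dict of transformed tensors. `make_preprocess_fn` and `label_key` are hypothetical names introduced for illustration, and `tft_output` is assumed to be a `tft.TFTransformOutput` instance.

```python
from typing import Any, Callable, Dict, Optional, Tuple


def make_preprocess_fn(
    tft_output: Any, label_key: Optional[str] = None
) -> Callable[[Dict[str, Any]], Tuple[Dict[str, Any], Any]]:
    """Builds a function that applies the transform layer to raw features.

    tft_output is assumed to be a tft.TFTransformOutput instance.
    """
    # Fetch the layer once and reuse it for every call to preprocess.
    layer = tft_output.transform_features_layer()

    def preprocess(raw_features: Dict[str, Any]) -> Tuple[Dict[str, Any], Any]:
        transformed = dict(layer(raw_features))
        # Optionally split the (transformed) label off from the features.
        label = transformed.pop(label_key, None) if label_key else None
        return transformed, label

    return preprocess
```

Holding on to a single layer mirrors the documented behavior that repeated calls to `transform_features_layer()` return the same instance.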

`transform_raw_features`

transform_raw_features(
    raw_features: Mapping[str, common_types.TensorType],
    drop_unused_features: bool = True
) -> Dict[str, common_types.TensorType]

Takes a dict of tensors representing raw features and transforms them.

Takes a dictionary of Tensor, SparseTensor, or RaggedTensors that represent the raw features, and applies the transformation defined by tf.Transform.

If drop_unused_features is False, all transformed features defined by tf.Transform are returned. To return only the features transformed from the given 'raw_features', set drop_unused_features to True.

Args
raw_features	A dict whose keys are feature names and values are Tensors, SparseTensors, or RaggedTensors.
drop_unused_features	If True, the result will be filtered. Only the features that are transformed from 'raw_features' will be included in the returned result. If a feature is transformed from multiple raw features (e.g., a feature cross), it will only be included if all its base raw features are present in raw_features.

Returns
A dict whose keys are feature names and values are Tensors, SparseTensors, or RaggedTensors representing transformed features.
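The filtering rule for drop_unused_features can be modeled with plain Python sets. This is a toy model of the documented behavior, not the library's implementation: `filter_transformed_features` and the `sources` mapping (transformed feature name to the raw features it is derived from) are hypothetical.

```python
from typing import Dict, Iterable, Set


def filter_transformed_features(
    sources: Dict[str, Set[str]],
    provided_raw: Iterable[str],
    drop_unused_features: bool = True,
) -> Set[str]:
    """Returns the names of the transformed features that would be kept.

    sources is a hypothetical mapping from each transformed feature to the
    set of raw features it depends on.
    """
    if not drop_unused_features:
        # No filtering: every transformed feature is returned.
        return set(sources)
    provided = set(provided_raw)
    # Keep a feature only if all of its base raw features were provided.
    return {name for name, deps in sources.items() if deps <= provided}


sources = {
    "age_scaled": {"age"},
    "city_id": {"city"},
    "age_x_city": {"age", "city"},  # a feature cross
}
print(sorted(filter_transformed_features(sources, ["age"])))
# ['age_scaled']
print(sorted(filter_transformed_features(sources, ["age", "city"])))
# ['age_scaled', 'age_x_city', 'city_id']
```

With only `age` provided, the cross `age_x_city` is dropped because `city` is missing, which matches the rule stated for features built from multiple raw features.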

`transformed_domains`

transformed_domains() -> Dict[str, common_types.DomainType]

Returns domains for the transformed features.

Returns
A dict from feature names to one of schema_pb2.IntDomain, schema_pb2.StringDomain or schema_pb2.FloatDomain.

`transformed_feature_spec`

transformed_feature_spec() -> Dict[str, common_types.FeatureSpecType]

Returns a feature_spec for the transformed features.

Returns
A dict from feature names to FixedLenFeature/SparseFeature/VarLenFeature.
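Because both feature specs are dicts keyed by feature name, they are easy to compare. The helper below is a small sketch; `feature_name_summary` is a hypothetical name and `tft_output` is assumed to be a `tft.TFTransformOutput` instance.

```python
from typing import Any, Dict, List


def feature_name_summary(tft_output: Any) -> Dict[str, List[str]]:
    """Compares feature names before and after the transform.

    tft_output is assumed to be a tft.TFTransformOutput instance.
    """
    raw = set(tft_output.raw_feature_spec())
    transformed = set(tft_output.transformed_feature_spec())
    return {
        "raw_only": sorted(raw - transformed),
        "transformed_only": sorted(transformed - raw),
        "shared": sorted(raw & transformed),
    }
```

In practice the transformed feature spec is typically what is passed to an example parser when reading already-transformed data, while the raw feature spec is used to parse inputs that still need the transform applied.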

`vocabulary_by_name`

vocabulary_by_name(
    vocab_filename: str
) -> List[bytes]

Like vocabulary_file_by_name, but returns the vocabulary contents as a list of bytes instead of a file path.
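A short sketch of turning the returned list into a lookup table. It assumes the tokens are UTF-8 encoded and that a token's position in the list is the index of interest; `token_to_index` is a hypothetical helper and `tft_output` is assumed to be a `tft.TFTransformOutput` instance.

```python
from typing import Any, Dict


def token_to_index(tft_output: Any, vocab_filename: str) -> Dict[str, int]:
    """Builds a token -> position mapping from a tf.Transform vocabulary.

    Assumes tokens are UTF-8 encoded bytes, as suggested by the List[bytes]
    return type of vocabulary_by_name.
    """
    tokens = tft_output.vocabulary_by_name(vocab_filename)
    return {token.decode("utf-8"): index for index, token in enumerate(tokens)}
```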

`vocabulary_file_by_name`

vocabulary_file_by_name(
    vocab_filename: str
) -> Optional[str]

Returns the vocabulary file path created in the preprocessing function.

vocab_filename must either be (i) the name used as the vocab_filename argument to tft.compute_and_apply_vocabulary / tft.vocabulary or (ii) the key used in tft.annotate_asset.

When a mapping has been specified by calls to tft.annotate_asset, it will be checked first for the provided filename. If present, this filename will be used directly to construct a path.

If the mapping does not exist or vocab_filename is not present within it, we will default to sanitizing vocab_filename and searching for files matching it within the assets directory.

In either case, if the constructed path does not point to an existing file within the assets subdirectory, None is returned.

Args
vocab_filename	The vocabulary name to look up.
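The lookup order described above can be sketched in plain Python. This is a simplified model under stated assumptions, not the library's implementation: `find_vocabulary_file` and `sanitize` are hypothetical, the sanitization rule shown is a guess, the asset mapping is passed in as an already-loaded dict, and the fallback checks for one exact file rather than searching for matches.

```python
import os
import re
from typing import Dict, Optional


def sanitize(name: str) -> str:
    """Hypothetical sanitizer: the real rule used by tf.Transform may differ."""
    return re.sub(r"[^A-Za-z0-9_.\-]", "_", name)


def find_vocabulary_file(
    assets_dir: str,
    vocab_filename: str,
    asset_map: Optional[Dict[str, str]] = None,
) -> Optional[str]:
    """Resolves a vocabulary name to a file path, or None if no file exists."""
    if asset_map and vocab_filename in asset_map:
        # 1. A mapping entry is checked first and used directly.
        path = os.path.join(assets_dir, asset_map[vocab_filename])
    else:
        # 2. Otherwise fall back to the sanitized name in the assets directory.
        path = os.path.join(assets_dir, sanitize(vocab_filename))
    # 3. In either case, a path that is not an existing file yields None.
    return path if os.path.isfile(path) else None
```

Callers should therefore handle a None result, for example when a vocabulary name is misspelled or the asset was never written.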

`vocabulary_size_by_name`

vocabulary_size_by_name(
    vocab_filename: str
) -> int

Like vocabulary_file_by_name, but returns the size of the vocabulary.

Class Variables
ASSET_MAP	'asset_map'
POST_TRANSFORM_FEATURE_STATS_PATH	'post_transform_feature_stats/FeatureStats.pb'
PRE_TRANSFORM_FEATURE_STATS_PATH	'pre_transform_feature_stats/FeatureStats.pb'
RAW_METADATA_DIR	'metadata'
TRANSFORMED_METADATA_DIR	'transformed_metadata'
TRANSFORM_FN_DIR	'transform_fn'
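The constants above read as paths relative to transform_output_dir. The sketch below joins them onto a base directory to show the resulting layout; it assumes that interpretation, copies the constant values from the table, and uses a hypothetical helper name `expected_layout`. ASSET_MAP is left out because this page does not say where that file lives.

```python
import os
from typing import Dict

# Values copied from the Class Variables table.
TRANSFORM_FN_DIR = "transform_fn"
RAW_METADATA_DIR = "metadata"
TRANSFORMED_METADATA_DIR = "transformed_metadata"
PRE_TRANSFORM_FEATURE_STATS_PATH = "pre_transform_feature_stats/FeatureStats.pb"
POST_TRANSFORM_FEATURE_STATS_PATH = "post_transform_feature_stats/FeatureStats.pb"


def expected_layout(transform_output_dir: str) -> Dict[str, str]:
    """Joins each constant onto the output directory (assumed layout)."""
    join = lambda rel: os.path.join(transform_output_dir, rel)
    return {
        "transform_savedmodel_dir": join(TRANSFORM_FN_DIR),
        "raw_metadata_dir": join(RAW_METADATA_DIR),
        "transformed_metadata_dir": join(TRANSFORMED_METADATA_DIR),
        "pre_transform_statistics_path": join(PRE_TRANSFORM_FEATURE_STATS_PATH),
        "post_transform_statistics_path": join(POST_TRANSFORM_FEATURE_STATS_PATH),
    }
```

A listing like this can help when checking by hand that a transform run produced the directories the wrapper expects.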