tf.keras.layers.CategoryEncoding | TensorFlow v2.16.1 (original) (raw)

A preprocessing layer which encodes integer features.

tf.keras.layers.CategoryEncoding(
    num_tokens=None, output_mode='multi_hot', sparse=False, **kwargs
)

Used in the notebooks

Used in the guide	Used in the tutorials
Migrate `tf.feature_column`s to Keras preprocessing layers Working with preprocessing layers	Load CSV data Classify structured data using Keras preprocessing layers

This layer provides options for condensing data into a categorical encoding when the total number of tokens are known in advance. It accepts integer values as inputs, and it outputs a dense or sparse representation of those inputs. For integer inputs where the total number of tokens is not known, use keras.layers.IntegerLookup instead.

Examples:

One-hot encoding data

layer = keras.layers.CategoryEncoding( num_tokens=4, output_mode="one_hot") layer([3, 2, 0, 1]) array([[0., 0., 0., 1.], [0., 0., 1., 0.], [1., 0., 0., 0.], [0., 1., 0., 0.]]>

Multi-hot encoding data

layer = keras.layers.CategoryEncoding( num_tokens=4, output_mode="multi_hot") layer([[0, 1], [0, 0], [1, 2], [3, 1]]) array([[1., 1., 0., 0.], [1., 0., 0., 0.], [0., 1., 1., 0.], [0., 1., 0., 1.]]>

Using weighted inputs in "count" mode

layer = keras.layers.CategoryEncoding( num_tokens=4, output_mode="count") count_weights = np.array([[.1, .2], [.1, .1], [.2, .3], [.4, .2]]) layer([[0, 1], [0, 0], [1, 2], [3, 1]], count_weights=count_weights) array([[0.1, 0.2, 0. , 0. ], [0.2, 0. , 0. , 0. ], [0. , 0.2, 0.3, 0. ], [0. , 0.2, 0. , 0.4]]>

Args
num_tokens	The total number of tokens the layer should support. All inputs to the layer must integers in the range 0 <= value < num_tokens, or an error will be thrown.
output_mode	Specification for the output of the layer. Values can be "one_hot", "multi_hot" or "count", configuring the layer as follows:- `"one_hot"`: Encodes each individual element in the input into an array of `num_tokens` size, containing a 1 at the element index. If the last dimension is size 1, will encode on that dimension. If the last dimension is not size 1, will append a new dimension for the encoded output. - `"multi_hot"`: Encodes each sample in the input into a single array of `num_tokens` size, containing a 1 for each vocabulary term present in the sample. Treats the last dimension as the sample dimension, if input shape is `(..., sample_length)`, output shape will be `(..., num_tokens)`. - `"count"`: Like `"multi_hot"`, but the int array contains a count of the number of times the token at that index appeared in the sample. For all output modes, currently only output up to rank 2 is supported. Defaults to "multi_hot".
sparse	Whether to return a sparse tensor; for backends that support sparse tensors.

Call arguments
inputs	A 1D or 2D tensor of integer inputs.
count_weights	A tensor in the same shape as inputs indicating the weight for each sample value when summing up in count mode. Not used in "multi_hot" or "one_hot" modes.

Attributes
input	Retrieves the input tensor(s) of a symbolic operation.Only returns the tensor(s) corresponding to the _first time_the operation was called.
output	Retrieves the output tensor(s) of a layer.Only returns the tensor(s) corresponding to the _first time_the operation was called.

Methods

`from_config`

View source

@classmethod from_config( config )

Creates a layer from its config.

This method is the reverse of get_config, capable of instantiating the same layer from the config dictionary. It does not handle layer connectivity (handled by Network), nor weights (handled by set_weights).

Args
config	A Python dictionary, typically the output of get_config.

Returns
A layer instance.

`symbolic_call`

View source

symbolic_call(
    *args, **kwargs
)