tf.feature_column.categorical_column_with_hash_bucket | TensorFlow v2.16.1
Represents a sparse feature whose IDs are assigned by hashing. (deprecated)
Compat aliases for migration
See the Migration guide for more details.
tf.compat.v1.feature_column.categorical_column_with_hash_bucket
tf.feature_column.categorical_column_with_hash_bucket(
key,
hash_bucket_size,
dtype=tf.dtypes.string
)
Used in the notebooks
Used in the guide | Used in the tutorials |
---|---|
Estimators | Classify structured data with feature columns |
Use this when your sparse features are in string or integer format and you want to distribute your inputs into a finite number of buckets by hashing. For string input, output_id = Hash(input_feature_string) % hash_bucket_size. For integer input, the value is first converted to its string representation and then hashed with the same formula.
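The bucketing formula can be reproduced with TensorFlow string ops. The sketch below assumes the column hashes with tf.strings.to_hash_bucket_fast; that is our reading of the implementation, not something this page states. It also illustrates the integer rule: the integer 42 is hashed through its string form, so it lands in the same bucket as the string "42".

```python
import tensorflow as tf

HASH_BUCKET_SIZE = 10000  # illustrative bucket count, as in the example below

# Assumption: the column uses tf.strings.to_hash_bucket_fast for Hash(...) % size.
string_ids = tf.strings.to_hash_bucket_fast(
    tf.constant(["Tensorflow", "Keras", "42"]), HASH_BUCKET_SIZE)

# Integer input is converted to its string representation before hashing.
int_values = tf.constant([42], dtype=tf.int64)
int_ids = tf.strings.to_hash_bucket_fast(
    tf.strings.as_string(int_values), HASH_BUCKET_SIZE)

print(string_ids.numpy())  # three bucket ids, each in [0, HASH_BUCKET_SIZE)
print(int_ids.numpy())     # the same bucket id as the string "42" above
```

Because the hash is a fixed fingerprint rather than a learned vocabulary, unrelated values can share a bucket; a larger hash_bucket_size lowers the collision rate at the cost of a wider ID space.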
For the input dictionary features, features[key] is either a Tensor or a SparseTensor. If it is a Tensor, missing values can be represented by -1 for integer input and '' for string input; this feature column drops them.
Example:
import tensorflow as tf
keywords = tf.feature_column.categorical_column_with_hash_bucket("keywords",
10000)
columns = [keywords]
features = {'keywords': tf.constant([['Tensorflow', 'Keras', 'RNN', 'LSTM',
'CNN'], ['LSTM', 'CNN', 'Tensorflow', 'Keras', 'RNN'], ['CNN', 'Tensorflow',
'LSTM', 'Keras', 'RNN']])}
# linear_model returns a single logits Tensor of shape [batch_size, units].
linear_prediction = tf.compat.v1.feature_column.linear_model(features,
columns)
# or
import tensorflow as tf
keywords = tf.feature_column.categorical_column_with_hash_bucket("keywords",
10000)
keywords_embedded = tf.feature_column.embedding_column(keywords, 16)
columns = [keywords_embedded]
features = {'keywords': tf.constant([['Tensorflow', 'Keras', 'RNN', 'LSTM',
'CNN'], ['LSTM', 'CNN', 'Tensorflow', 'Keras', 'RNN'], ['CNN', 'Tensorflow',
'LSTM', 'Keras', 'RNN']])}
input_layer = tf.keras.layers.DenseFeatures(columns)
dense_tensor = input_layer(features)
Args | |
---|---|
key | A unique string identifying the input feature. It is used as the column name and the dictionary key for feature parsing configs, feature Tensor objects, and feature columns. |
hash_bucket_size | An int > 1. The number of buckets. |
dtype | The type of features. Only string and integer types are supported. |
Returns |
---|
A HashedCategoricalColumn. |
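The returned HashedCategoricalColumn can be inspected directly. This sketch assumes, from our reading of the implementation rather than this page, that the column is a named tuple with key, hash_bucket_size and dtype fields, plus name and num_buckets properties.

```python
import tensorflow as tf

col = tf.feature_column.categorical_column_with_hash_bucket(
    "keywords", 10000, dtype=tf.int32)

print(type(col).__name__)                        # HashedCategoricalColumn
print(col.key, col.hash_bucket_size, col.dtype)  # the constructor arguments
print(col.name)         # assumed: the key doubles as the column name
print(col.num_buckets)  # assumed: equal to hash_bucket_size
```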
Raises | |
---|---|
ValueError | hash_bucket_size is not greater than 1. |
ValueError | dtype is neither string nor integer. |
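Both errors are raised when the column is constructed, before any data is seen. The sketch below uses a hypothetical helper, try_make_column, to capture the ValueError message. It uses a bucket count of 0 rather than 1 so the result does not depend on exactly where the boundary check sits.

```python
import tensorflow as tf

def try_make_column(**kwargs):
    """Hypothetical helper: return the ValueError message, or None on success."""
    try:
        tf.feature_column.categorical_column_with_hash_bucket("keywords", **kwargs)
    except ValueError as err:
        return str(err)
    return None

# A non-positive bucket count is rejected.
print(try_make_column(hash_bucket_size=0))
# Floating-point features are rejected: only string and integer dtypes are allowed.
print(try_make_column(hash_bucket_size=100, dtype=tf.float32))
# A valid integer configuration constructs without error.
print(try_make_column(hash_bucket_size=100, dtype=tf.int64))  # None
```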