tft.ngrams | TFX | TensorFlow

tft.ngrams


Create a SparseTensor of n-grams.

tft.ngrams(
    tokens: tf.SparseTensor,
    ngram_range: Tuple[int, int],
    separator: str,
    name: Optional[str] = None
) -> tf.SparseTensor

Given a SparseTensor of tokens, returns a SparseTensor containing the ngrams that can be constructed from consecutive tokens in each row.

separator is inserted between each pair of adjacent tokens in an ngram, so " " is an appropriate choice if the tokens are words, while "" is an appropriate choice if they are characters.
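To make the construction concrete, here is a minimal pure-Python sketch of how the ngrams of a single row could be formed. It is an illustration, not the tft implementation, and the helper name `row_ngrams` is hypothetical. It emits ngrams ordered by start position and then by size, which is the order seen in the documented example output.

```python
from typing import List, Tuple


def row_ngrams(tokens: List[str], ngram_range: Tuple[int, int], separator: str) -> List[str]:
    """Hypothetical helper: list the ngrams of one row, position-major.

    For each start position, emits ngrams of increasing size within
    ngram_range (inclusive), joining adjacent tokens with `separator`.
    """
    min_n, max_n = ngram_range
    result = []
    for start in range(len(tokens)):
        for n in range(min_n, max_n + 1):
            if start + n > len(tokens):
                break  # not enough tokens left for this size (or any larger one)
            result.append(separator.join(tokens[start:start + n]))
    return result


# Word tokens: join with a space.
print(row_ngrams(['One', 'was', 'Johnny'], (1, 3), ' '))
# ['One', 'One was', 'One was Johnny', 'was', 'was Johnny', 'Johnny']

# Character tokens: join with the empty string.
print(row_ngrams(list('abc'), (2, 3), ''))
# ['ab', 'abc', 'bc']
```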

Example:

import tensorflow as tf
import tensorflow_transform as tft

tokens = tf.SparseTensor(
    indices=[[0, 0], [0, 1], [0, 2], [1, 0], [1, 1], [1, 2], [1, 3]],
    values=['One', 'was', 'Johnny', 'Two', 'was', 'a', 'rat'],
    dense_shape=[2, 4])
print(tft.ngrams(tokens, ngram_range=(1, 3), separator=' '))

Output:

SparseTensor(indices=tf.Tensor(
    [[0 0] [0 1] [0 2] [0 3] [0 4] [0 5]
     [1 0] [1 1] [1 2] [1 3] [1 4] [1 5] [1 6] [1 7] [1 8]],
    shape=(15, 2), dtype=int64),
  values=tf.Tensor(
    [b'One' b'One was' b'One was Johnny' b'was' b'was Johnny' b'Johnny'
     b'Two' b'Two was' b'Two was a' b'was' b'was a' b'was a rat'
     b'a' b'a rat' b'rat'], shape=(15,), dtype=string),
  dense_shape=tf.Tensor([2 9], shape=(2,), dtype=int64))
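The output layout in the example can be reproduced with plain Python lists. In this sketch (the name `ngrams_sparse` is hypothetical, and each row is assumed to be given as a list of its tokens in column order), each ngram gets the index [row, k], where k counts the ngrams already emitted for that row. A row of L tokens yields one ngram of size n for each of the L - n + 1 start positions, so with ngram_range=(1, 3) the first row gives 3 + 2 + 1 = 6 ngrams and the second gives 4 + 3 + 2 = 9, for 15 values in total. Taking the second dimension of dense_shape as the largest per-row count is an assumption that matches [2 9] here.

```python
from typing import List, Tuple


def ngrams_sparse(rows: List[List[str]], ngram_range: Tuple[int, int], separator: str):
    """Hypothetical pure-Python model of the SparseTensor layout in the example.

    Returns (indices, values, dense_shape) as plain Python lists.
    """
    min_n, max_n = ngram_range
    indices, values = [], []
    max_count = 0
    for row_id, tokens in enumerate(rows):
        col = 0  # position of the next ngram within this output row
        for start in range(len(tokens)):
            for n in range(min_n, max_n + 1):
                if start + n > len(tokens):
                    break
                indices.append([row_id, col])
                values.append(separator.join(tokens[start:start + n]))
                col += 1
        max_count = max(max_count, col)
    # Assumption: the second dimension is the largest per-row ngram count,
    # which matches dense_shape [2 9] in the documented example.
    dense_shape = [len(rows), max_count]
    return indices, values, dense_shape


rows = [['One', 'was', 'Johnny'], ['Two', 'was', 'a', 'rat']]
indices, values, dense_shape = ngrams_sparse(rows, (1, 3), ' ')
print(len(values), dense_shape)  # 15 [2, 9]
```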

Args
tokens	A two-dimensional SparseTensor of dtype tf.string containing the tokens from which ngrams will be constructed.
ngram_range	A pair giving the inclusive range of ngram sizes to return; for example, (1, 3) returns unigrams, bigrams, and trigrams.
separator	A string that will be inserted between tokens when ngrams are constructed.
name	(Optional) A name for this operation.

Returns
A SparseTensor containing all ngrams from each row of the input. Note: if an ngram appears multiple times in the input row, it will be present the same number of times in the output. For unique ngrams, see tft.bag_of_words.
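The note on repeated ngrams can be illustrated in pure Python. In this sketch (the helpers `row_ngrams` and `unique_in_order` are hypothetical), the row ['to', 'be', 'or', 'not', 'to', 'be'] with sizes 1 to 2 yields 'to', 'be', and 'to be' twice each, and every copy is kept, as the note describes for tft.ngrams. The deduplicated list only illustrates the idea of unique ngrams; keeping first occurrences in order is a choice of this sketch, not a statement about how tft.bag_of_words orders or represents its output.

```python
from collections import Counter
from typing import List, Tuple


def row_ngrams(tokens: List[str], ngram_range: Tuple[int, int], separator: str) -> List[str]:
    """Hypothetical helper: all ngrams of one row, duplicates included."""
    min_n, max_n = ngram_range
    result = []
    for start in range(len(tokens)):
        for n in range(min_n, max_n + 1):
            if start + n > len(tokens):
                break
            result.append(separator.join(tokens[start:start + n]))
    return result


def unique_in_order(ngrams: List[str]) -> List[str]:
    """Hypothetical helper: drop repeats, keeping each ngram's first occurrence."""
    seen = set()
    unique = []
    for g in ngrams:
        if g not in seen:
            seen.add(g)
            unique.append(g)
    return unique


row = ['to', 'be', 'or', 'not', 'to', 'be']
all_ngrams = row_ngrams(row, (1, 2), ' ')
counts = Counter(all_ngrams)
print(counts['to'], counts['be'], counts['to be'])  # 2 2 2
print(unique_in_order(all_ngrams))
# ['to', 'to be', 'be', 'be or', 'or', 'or not', 'not', 'not to']
```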

Raises
ValueError	if tokens is not 2D.
ValueError	if ngram_range[0] < 1 or ngram_range[1] < ngram_range[0].
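The documented ValueError conditions amount to two simple checks. The sketch below mirrors them; it is not the library's validation code. The function `check_ngrams_args` is hypothetical, it takes a plain dense_shape list instead of a real SparseTensor, and its error messages are made up for illustration.

```python
from typing import Sequence, Tuple


def check_ngrams_args(dense_shape: Sequence[int], ngram_range: Tuple[int, int]) -> None:
    """Hypothetical mirror of the documented ValueError conditions."""
    # tokens must be two-dimensional (rank 2).
    if len(dense_shape) != 2:
        raise ValueError(f'tokens must be 2D, got rank {len(dense_shape)}')
    # Sizes must start at 1 or more, and the range must not be reversed.
    if ngram_range[0] < 1 or ngram_range[1] < ngram_range[0]:
        raise ValueError(f'Invalid ngram_range: {ngram_range!r}')


check_ngrams_args([2, 4], (1, 3))  # valid: no exception

for bad in [(0, 2), (3, 2)]:
    try:
        check_ngrams_args([2, 4], bad)
    except ValueError as e:
        print(e)
# Invalid ngram_range: (0, 2)
# Invalid ngram_range: (3, 2)
```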