HashingTF (Spark 3.5.5 JavaDoc)


public class HashingTF
extends Transformer
implements HasInputCol, HasOutputCol, HasNumFeatures, DefaultParamsWritable
Maps a sequence of terms to their term frequencies using the hashing trick. The hash code of each term object is currently computed with Austin Appleby's MurmurHash 3 algorithm (MurmurHash3_x86_32). Because a simple modulo is used to map the hash value to a column index, it is advisable to use a power of two as the numFeatures parameter; otherwise the features will not be mapped evenly to the columns.
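The hash-then-modulo mapping described above can be illustrated with a minimal, self-contained sketch. It is not Spark's implementation: the class HashingTrickSketch and its methods columnIndex and termFrequencies are hypothetical names, and String.hashCode() stands in for MurmurHash3_x86_32 purely to keep the example free of dependencies, so the indices it produces will differ from those produced by HashingTF.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration of the hashing trick; not part of the Spark API.
public class HashingTrickSketch {

    // Maps a (possibly negative) hash code to a column index in [0, numFeatures).
    static int columnIndex(int hash, int numFeatures) {
        return Math.floorMod(hash, numFeatures);
    }

    // Counts how many terms land in each column. Terms whose hashes collide
    // modulo numFeatures share a column and their counts are summed.
    static Map<Integer, Double> termFrequencies(List<String> terms, int numFeatures) {
        Map<Integer, Double> tf = new TreeMap<>();
        for (String term : terms) {
            // Assumption: String.hashCode() is a stand-in for MurmurHash3_x86_32.
            int idx = columnIndex(term.hashCode(), numFeatures);
            tf.merge(idx, 1.0, Double::sum);
        }
        return tf;
    }

    public static void main(String[] args) {
        List<String> doc = Arrays.asList("a", "b", "a", "c");
        // numFeatures = 16 is a power of two, as the description advises.
        System.out.println(termFrequencies(doc, 16)); // prints {1=2.0, 2=1.0, 3=1.0}
    }
}
```

The sketch returns a sparse index-to-count map rather than a Spark vector. In HashingTF itself, the number of columns is set through the numFeatures parameter.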
See Also:
Serialized Form