MinHashLSH (Spark 3.5.5 JavaDoc)


public class MinHashLSH
extends Estimator
implements HasSeed
LSH class for Jaccard distance.
The input can be dense or sparse vectors, but sparse vectors are more efficient. For example, Vectors.sparse(10, Array((2, 1.0), (3, 1.0), (5, 1.0))) describes a set drawn from a space of 10 elements; the set contains elements 2, 3, and 5. Every input vector must have at least one non-zero entry, and all non-zero values are treated as binary "1" values.
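To make the set interpretation concrete, here is a minimal standalone sketch in plain Java that does not use Spark. The class `JaccardSketch` and its methods `toIndexSet` and `jaccardDistance` are hypothetical names chosen for illustration and are not part of Spark's API. It only shows how a sparse vector maps to a set of indices, with value magnitudes ignored, and how the Jaccard distance between two such sets is computed; it does not reproduce the hashing performed by this class.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper for illustration only; not part of Spark.
class JaccardSketch {

    // Collect the indices whose values are non-zero. Magnitudes are ignored,
    // mirroring the "all non-zero values are treated as binary 1" rule.
    static Set<Integer> toIndexSet(int[] indices, double[] values) {
        Set<Integer> set = new TreeSet<>();
        for (int i = 0; i < indices.length; i++) {
            if (values[i] != 0.0) {
                set.add(indices[i]);
            }
        }
        if (set.isEmpty()) {
            throw new IllegalArgumentException(
                "vector must have at least one non-zero entry");
        }
        return set;
    }

    // Jaccard distance: 1 - |intersection| / |union|.
    static double jaccardDistance(Set<Integer> a, Set<Integer> b) {
        Set<Integer> intersection = new TreeSet<>(a);
        intersection.retainAll(b);
        Set<Integer> union = new TreeSet<>(a);
        union.addAll(b);
        return 1.0 - (double) intersection.size() / union.size();
    }

    public static void main(String[] args) {
        // Same indices as the sparse vector in the text: {2, 3, 5}.
        Set<Integer> a = toIndexSet(new int[] {2, 3, 5}, new double[] {1.0, 1.0, 1.0});
        // Different magnitudes, but only the non-zero indices {2, 3, 7} matter.
        Set<Integer> b = toIndexSet(new int[] {2, 3, 7}, new double[] {4.0, 0.5, 1.0});
        System.out.println(a);                     // prints [2, 3, 5]
        System.out.println(jaccardDistance(a, b)); // prints 0.5
    }
}
```

The two sets share elements 2 and 3 out of four distinct elements overall, so the distance is 1 - 2/4 = 0.5.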
References: Wikipedia on MinHash
See Also:
Serialized Form