pandas.util.hash_array — pandas 3.0.0.dev0+2106.gb2b2d04e41 documentation

pandas.util.hash_array(vals, encoding='utf8', hash_key='0123456789123456', categorize=True)

Given a 1-D array, return an array of deterministic integers.

Parameters:

vals : ndarray or ExtensionArray

The input array to hash.

encoding : str, default 'utf8'

Encoding for the data and the key when they are strings.

hash_key : str, default _default_hash_key

Key used when hashing string values; it is encoded with `encoding`.

categorize : bool, default True

Whether to first categorize object arrays before hashing. This is more efficient when the array contains duplicate values.
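The following is a minimal sketch of how `hash_key` and `categorize` behave on an object array of strings. The alternate key `"abcdefghijklmnop"` is a hypothetical value, kept at 16 characters to match the length of the default key; the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# Object array of strings with repeated values.
vals = np.array(["a", "b", "a", "c", "b"], dtype=object)

# categorize is described as an efficiency option, so both settings
# are expected to produce the same hashes.
with_cat = pd.util.hash_array(vals, categorize=True)
without_cat = pd.util.hash_array(vals, categorize=False)
same_result = bool(np.array_equal(with_cat, without_cat))

# Hashing the same strings with a different key (hypothetical value)
# should give different integers.
other_key = pd.util.hash_array(vals, hash_key="abcdefghijklmnop")
key_changes_hash = bool((other_key != with_cat).all())

print(same_result, key_changes_hash)
```

Equal input values map to equal hashes under either `categorize` setting, so the choice only affects how much work is done for arrays with many duplicates.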

Returns:

ndarray[np.uint64, ndim=1]

Hashed values, same length as vals.
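As a quick check of the return contract described above, this sketch hashes a small integer array and inspects the dtype, the shape, and the repeatability of the result. The input values are arbitrary.

```python
import numpy as np
import pandas as pd

vals = np.array([1, 2, 3, 2])
hashed = pd.util.hash_array(vals)

# One uint64 hash per input element.
print(hashed.dtype, hashed.shape)

# Hashing the same input again gives the same integers.
repeat = pd.util.hash_array(vals)
is_deterministic = bool(np.array_equal(hashed, repeat))
print(is_deterministic)
```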

See also

util.hash_pandas_object

Return a data hash of the Index/Series/DataFrame.

util.hash_tuples

Hash a MultiIndex / listlike-of-tuples efficiently.

Examples

>>> pd.util.hash_array(np.array([1, 2, 3]))
array([ 6238072747940578789, 15839785061582574730,  2185194620014831856],
      dtype=uint64)
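Because the hashes are deterministic, one possible use is assigning values to a fixed number of buckets. The sketch below is an illustration only, not a pattern taken from the pandas documentation; `n_buckets` is a hypothetical name and value.

```python
import numpy as np
import pandas as pd

keys = np.array([1, 2, 3, 1, 2])
n_buckets = 4  # hypothetical bucket count

# Reduce each uint64 hash to a bucket index in [0, n_buckets).
buckets = pd.util.hash_array(keys) % np.uint64(n_buckets)

# Equal keys always land in the same bucket.
print(buckets[0] == buckets[3], buckets[1] == buckets[4])
```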