Python | Pandas.factorize()
Last Updated : 27 Sep, 2018
The pandas.factorize() method returns a numeric representation of an array by identifying its distinct values. It is available both as pandas.factorize() and as Series.factorize().
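Parameters:
values : 1D sequence to factorize.
sort : [bool, default False] Sort uniques and shuffle the codes to maintain the relationship.
na_sentinel : [int, default -1] Value used to mark missing ('not found') entries in the codes. This parameter was deprecated in pandas 1.5 and removed in 2.0, where use_na_sentinel takes its place.
Return: A tuple (codes, uniques), where codes is the numeric representation of the array and uniques holds the distinct values.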
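To make the parameters above concrete, here is a minimal pure-Python sketch of the idea behind factorization. The function name factorize_sketch is hypothetical and is not part of pandas; it treats only None as missing and skips the dtype handling the real method performs, so read it as an illustration rather than the library's implementation.

```python
def factorize_sketch(values, sort=False, na_sentinel=-1):
    """Hypothetical helper: return (codes, uniques) for a 1D sequence.

    Only None is treated as missing in this sketch.
    """
    # Assign each distinct value a code in order of first appearance.
    position = {}
    for v in values:
        if v is not None and v not in position:
            position[v] = len(position)

    # With sort=True, renumber so that codes follow the sorted uniques.
    if sort:
        position = {v: i for i, v in enumerate(sorted(position))}

    uniques = list(position)
    # Missing entries get the sentinel instead of a real code.
    codes = [na_sentinel if v is None else position[v] for v in values]
    return codes, uniques


codes, uniques = factorize_sketch(['b', 'd', 'd', 'c', 'a', 'c', 'a', 'b'])
print(codes)    # [0, 1, 1, 2, 3, 2, 3, 0]
print(uniques)  # ['b', 'd', 'c', 'a']
```

Each code is simply the position of the original value in uniques, which is why sorting the uniques also changes the codes.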
Code: Demonstrating how the factorize() method works
import numpy as np
import pandas as pd
from pandas.api.types import CategoricalDtype

# Default behaviour: codes follow the order in which values first appear
labels, uniques = pd.factorize(['b', 'd', 'd', 'c', 'a', 'c', 'a', 'b'])
print("Numeric Representation : \n", labels)
print("Unique Values : \n", uniques)

# sort=True: uniques are sorted and the codes are renumbered to match
label1, unique1 = pd.factorize(['b', 'd', 'd', 'c', 'a', 'c', 'a', 'b'],
                               sort=True)
print("\n\nNumeric Representation : \n", label1)
print("Unique Values : \n", unique1)
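
# Missing values (None) are marked with the sentinel and left out of uniques.
# Note: na_sentinel was deprecated in pandas 1.5 and removed in 2.0,
# so this call needs an older pandas version.
label2, unique2 = pd.factorize(['b', None, 'd', 'c', None, 'a'],
                               na_sentinel=-101)
print("\n\nNumeric Representation : \n", label2)
print("Unique Values : \n", unique2)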
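
# Factorizing a Categorical: uniques is also a Categorical and keeps the
# declared categories, including 'b', which never occurs in the data
a = pd.Categorical(['a', 'a', 'c'], categories=['a', 'b', 'c'])
label3, unique3 = pd.factorize(a)
print("\n\nNumeric Representation : \n", label3)
print("Unique Values : \n", unique3)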
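The examples above all call pandas.factorize(). As a short complementary sketch, the same result can be obtained through Series.factorize(), and the original values can be rebuilt by looking each code up in uniques. The variable names here are illustrative only, and the round trip shown assumes the data contains no missing values, since a sentinel code does not point at any entry in uniques.

```python
import pandas as pd

values = ['b', 'd', 'd', 'c', 'a', 'c', 'a', 'b']

# Function form: returns the codes and the distinct values
codes, uniques = pd.factorize(values)

# Round trip: each code is a position in uniques, so take() rebuilds the data
restored = uniques.take(codes)
print(list(restored) == values)  # True

# Method form: Series.factorize() gives the same codes; uniques is an Index
s = pd.Series(values)
s_codes, s_uniques = s.factorize()
print(list(s_codes))
print(list(s_uniques))
```

Storing codes together with uniques is often more compact than keeping repeated strings, and the lookup shown above recovers the original values when needed.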