ENH: guarantee pandas.Series.value_counts "sort=False" to be original ordering · Issue #12679 · pandas-dev/pandas

Hello,

I'm trying to make a new DataFrame that contains the value counts of a column of an existing DataFrame (spreadsheet.xlsx), but I want the rows in the new DataFrame to be in the same order as the old one.

When I do:

    import pandas as pd
    df = pd.read_excel('./spreadsheet.xlsx')
    print(df[0].value_counts(sort=False))

I get this output:

    h4      8
    ct1     6
    f2      2
    s1      2
    EST2    2
    f5      2
    E4      8
    h2      8
    hd2     7
    f3      2
    ART1    2
    s2      2
    f1      2
    h3      8
    EST1    2
    s3      2
    E6      8
    ART2    2
    DGT2    2
    ct2     6
    s4      2
    ct3     6
    f4      2
    DGT1    2
    s5      2

When what I really want is:

    h2      8
    h3      8
    h4      8
    hd2     7
    E4      8
    E6      8
    ct1     6
    ...

...because that's the order in which the values occur in the original DataFrame.

I can't tell how it's sorting them, but it is somehow. Is this the expected behavior?
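For reference, here is a minimal sketch of one way to get the ordering I'm after, assuming it is acceptable to reorder the counts after the fact. The helper name `value_counts_in_order` and the sample data are made up for illustration; it relies on `Series.unique()` returning values in order of first appearance, and it is a workaround rather than a statement about what `sort=False` is meant to guarantee.

```python
import pandas as pd


def value_counts_in_order(series: pd.Series) -> pd.Series:
    """Count values, ordered by first appearance in `series` (hypothetical helper)."""
    # The order of this result is not relied upon; it is fixed up below.
    counts = series.value_counts(sort=False)
    # unique() keeps the order in which values first occur.
    # NaN is dropped because value_counts() excludes it by default.
    first_seen = series.dropna().unique()
    return counts.reindex(first_seen)


# Made-up sample data, not the contents of spreadsheet.xlsx.
s = pd.Series(["h2", "h2", "h3", "hd2", "h3", "E4", "h2"])
ordered = value_counts_in_order(s)
print(ordered)
```

With the real data this would be called as `value_counts_in_order(df[0])`.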

Thanks

Installed versions:

    commit: None
    python: 3.5.1.final.0
    python-bits: 64
    OS: Darwin
    OS-release: 15.3.0
    machine: x86_64
    processor: i386
    byteorder: little
    LC_ALL: None
    LANG: None

    pandas: 0.18.0
    nose: 1.3.7
    pip: 8.1.1
    setuptools: 20.3
    Cython: 0.23.4
    numpy: 1.10.4
    scipy: 0.17.0
    statsmodels: None
    xarray: None
    IPython: 4.1.2
    sphinx: 1.3.5
    patsy: 0.4.0
    dateutil: 2.5.0
    pytz: 2016.1
    blosc: None
    bottleneck: 1.0.0
    tables: 3.2.2
    numexpr: 2.4.6
    matplotlib: 1.5.1
    openpyxl: 2.3.2
    xlrd: 0.9.4
    xlwt: 1.0.0
    xlsxwriter: 0.8.4
    lxml: 3.6.0
    bs4: 4.4.1
    html5lib: None
    httplib2: 0.9.2
    apiclient: 1.5.0
    sqlalchemy: 1.0.12
    pymysql: None
    psycopg2: None
    jinja2: 2.8
    boto: 2.39.0