BUG: Inconsistent Sorting using DataFrame.from_dict() method · Issue #55683 · pandas-dev/pandas (original) (raw)
Pandas version checks
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import pandas as pd
print(pd.version)
d1 = { "value2": 123, "value1": 532, "animal": 222, "plant": False, "name": "bunny" }
d2 = { "value2": 321, "value1": 235, "animal": 555, "plant": False, "name": "lamb" }
d3 = { "value2": 777, "value1": 999, "animal": 333, "plant": True, "name": "rosebush" }
combined = {}
combined["alpha"] = d1 combined["bravo"] = d2 combined["charlie"] = d3
combined1 = {} combined1["alpha"] = d1
All four of these situations should produce their keys in the same order...
test_df = pd.DataFrame.from_dict(data=combined, orient = "columns") print(test_df.to_markdown())
print("")
#this one is different; it only has one column in final so gets sorted alphabetically by row headers test_df1 = pd.DataFrame.from_dict(data=combined1, orient = "columns") print(test_df1.to_markdown())
print("")
test_df = pd.DataFrame.from_dict(data=combined, orient="index") test_df = test_df.T print(test_df.to_markdown())
print("")
test_df1 = pd.DataFrame.from_dict(data=combined1, orient="index") test_df1 = test_df1.T print(test_df1.to_markdown())
Recommendation 1:
all should produce the same behavior, whatever that is.
In particular, orient = columns, should produce same order whether there are 1 data column or many.
Recommendation 2:
add a keyword argument for sort in the pandas.DataFrame.from_dict method so that user can decide at
run time which sorting they'd like
reproducible in pandas 2.0.3, 2.1.1
Issue Description
When using the pandas.DataFrame.from_dict() method to create a data frame, function produces different results when using orient="columns" depending on whether the number of indices is 1 or more. If index count == 1, columns will be sorted alphabetically. If index count >= 2, columns will be sorted as provided in the dict.
Expected Behavior
Expected behavior should be that, regardless of the index count, the .from_dict() method should produce columns in the same order every time. Even better would be to provide a sort argument to allow the user to sort in a certain way when they call the function. Barring that, choose a way and function should perform that same way every time. Recommend using "as provided" as default sorting (i.e. no sorting); this is the same behavior as from_dict(orient="index") and would give consistency across the function.
Installed Versions
INSTALLED VERSIONS
commit : e86ed37
python : 3.10.12.final.0
python-bits : 64
OS : Darwin
OS-release : 22.6.0
Version : Darwin Kernel Version 22.6.0: Fri Sep 15 13:41:28 PDT 2023; root:xnu-8796.141.3.700.8~1/RELEASE_ARM64_T6020
machine : arm64
processor : arm
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
pandas : 2.1.1
numpy : 1.25.1
pytz : 2023.3
dateutil : 2.8.2
setuptools : 68.0.0
pip : 23.2.1
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.9.3
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.1.2
IPython : 8.14.0
pandas_datareader : None
bs4 : 4.12.2
bottleneck : None
dataframe-api-compat: None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : 3.7.2
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : 1.11.1
sqlalchemy : None
tables : None
tabulate : 0.9.0
xarray : None
xlrd : None
zstandard : 0.19.0
tzdata : 2023.3
qtpy : None
pyqt5 : None
None