API: Should DataFrame constructor allow sets for index and columns? · Issue #47215 · pandas-dev/pandas (original) (raw)

While looking into some typing issues, I found that the following code works:

import pandas as pd

df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=set(["a", "b", "c"])) print(df)

The result:

We document that the columns and index arguments should be Index or "array-like". Certainly a set is not array-like!

Internally, we type those arguments as Axes, which is defined as Collection[Any], but in the pandas-stubs, we are more restrictive, not allowing sets to be passed.

I think we should disallow set as a type in the constructor, since it doesn't follow the documentation, and if you were to use it, the behavior is unclear. We could do one or more of the following:

  1. Add code in the DataFrame constructor to raise an error if index or columns are sets
  2. Change the type declaration for Axes to be more restrictive, i.e., not use Collection

I think we should definitely do (2), but I'm not sure if (1) would be considered a change in the API, since we never documented that sets are acceptable, but they did work (and possibly give funky behavior).