Creating a column with a set replicates the set n times · Issue #32582 · pandas-dev/pandas

Code Sample

If we try to define a dataframe using a dictionary containing a set, we get:

import pandas as pd
pd.DataFrame({'a':{1,2,3}})

           a
0  {1, 2, 3}
1  {1, 2, 3}
2  {1, 2, 3}

Problem description

The set is replicated n times, where n is the number of elements in the set.
Defining a column directly from a set may not make much sense, since sets are by definition unordered collections, but the behaviour still seems clearly unexpected.
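As a possible way to sidestep the replication described above, the set can be converted to an ordered sequence before it is passed to the constructor. This is only a sketch: the variable name `values` and the choice of `sorted` as the ordering are assumptions made for illustration, not something the issue prescribes.

```python
import pandas as pd

# Hypothetical input; any set of mutually comparable values would do.
values = {1, 2, 3}

# Sets are unordered, so pick an explicit order before building the column.
df = pd.DataFrame({'a': sorted(values)})
print(df)
```

Choosing the order explicitly avoids relying on how the constructor treats a bare set.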

Expected Output

In the case of a list, to obtain a single row containing the list we have to pass a nested list, such as pd.DataFrame({'a':[[1,2,3]]}).
Similarly, with sets I would expect a single row containing the set only when it is wrapped in a list, as in pd.DataFrame({'a':[{1,2,3}]}).
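The two nested forms mentioned above can be compared with a short sketch. The names `list_row` and `set_row` are illustrative only; the point is that wrapping the container in a list yields one row holding the whole container.

```python
import pandas as pd

# A nested list gives a single row whose cell is the inner list.
list_row = pd.DataFrame({'a': [[1, 2, 3]]})

# A set wrapped in a list likewise gives a single row whose cell is the set.
set_row = pd.DataFrame({'a': [{1, 2, 3}]})

print(list_row.shape, set_row.shape)
```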

In the case of a single set, even if the order is not guaranteed to be preserved, I would find it more reasonable to get the same output that we obtain with:

pd.DataFrame({'a':[1,2,3]})

   a
0  1
1  2
2  3

So:

pd.DataFrame({'a':{1,2,3}})

   a
0  1
1  2
2  3

Where: