Creating a column with a set replicates the set n times · Issue #32582 · pandas-dev/pandas (original) (raw)
Code Sample
If we try to define a dataframe using a dictionary containing a set, we get:
pd.DataFrame({'a':{1,2,3}})
a
0 {1, 2, 3} 1 {1, 2, 3} 2 {1, 2, 3}
Problem description
The set is being replicated n
times, n
being the length of the actual set.
While defining a column with a set directly might not make a lot of sense given that they are by definition unordered collections, the behaviour in any case seems clearly unexpected.
Expected Output
In the case of a list, in order to obtain a single row containing a list, we would have to define a nested list, such as pd.DataFrame({'a':[[1,2,3]]})
.
So similarly, with sets I would expect the same behaviour by defining the row with pd.DataFrame({'a':[{1,2,3}]})
.
In the case of a single set, even if the order is not guaranteed to be preserved, I'd see more reasonable the same output that we would obtain with:
pd.DataFrame({'a':[1,2,3]})
a 0 1 1 2 2 3
So:
pd.DataFrame({'a':{1,2,3}})
a 0 1 1 2 2 3
Where: