pandas.DataFrame.duplicated to allow take_all · Issue #6511 · pandas-dev/pandas
When working with external data, I often see rows with primary key violations. Currently, there is no easy way to select all of the violating rows. For example, suppose I have a massive file with some inconsistent data:
datecol,valuecol
...
2014-01-01,12
2014-01-01,13
2014-01-02,10
...
In this use case, it would be good if we could do df[df.duplicated('datecol', take_all=True)] to get all of the bad rows directly, rather than only the second and later occurrences:
2014-01-01,12
2014-01-01,13
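
For comparison, here is a minimal sketch of how the same selection can be written, assuming a pandas version in which `duplicated` accepts `keep=False` (this is not the `take_all` spelling proposed above). The sample frame contains only the three rows shown in the example; the elided rows are left out.

```python
import io

import pandas as pd

# Only the rows shown in the example above; the "..." rows are omitted.
csv_text = """datecol,valuecol
2014-01-01,12
2014-01-01,13
2014-01-02,10
"""
df = pd.read_csv(io.StringIO(csv_text))

# keep=False flags every row whose key occurs more than once,
# including the first occurrence, so the whole violating group is selected.
bad_rows = df[df.duplicated("datecol", keep=False)]
print(bad_rows)
```

With the default `keep="first"`, only the second `2014-01-01` row would be flagged, which is the limitation this issue describes.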