DataFrame.__finalize__ not called in pd.concat · Issue #6927 · pandas-dev/pandas
When I assign metadata to a df
import numpy as np
import pandas as pd
np.random.seed(10)
pd.DataFrame._metadata = ['filenames']
df1 = pd.DataFrame(np.random.randint(0, 4, (3, 2)), columns=list('ab'))
df1.filenames = {'a': 'f1', 'b': 'f2'}
df1
   a  b
0  1  1
1  0  3
2  0  1
and define a __finalize__ that prints when it's called
def finalize_df(self, other, method=None, **kwargs):
    print('finalize called')
    # Copy every registered metadata attribute from the source object.
    for name in self._metadata:
        object.__setattr__(self, name, getattr(other, name, None))
    return self
pd.DataFrame.__finalize__ = finalize_df
nothing is preserved when pd.concat is called:
stacked = pd.concat([df1, df1]) # Nothing printed
stacked
   a  b
0  1  1
1  0  3
2  0  1
0  1  1
1  0  3
2  0  1
stacked.filenames  # => AttributeError
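For contrast, here is a minimal sketch of an operation that does go through the hook. It assumes a hypothetical subclass, MetaFrame, in place of patching pd.DataFrame globally, and it records calls in a list (calls, also just for illustration) instead of printing. Only copy() is exercised, since that is a method I'm confident routes its result through __finalize__.

```python
import numpy as np
import pandas as pd

calls = []  # records the `method` argument each time the hook fires


class MetaFrame(pd.DataFrame):
    """Hypothetical subclass, used only for this illustration."""

    _metadata = ['filenames']

    @property
    def _constructor(self):
        # Make pandas operations return MetaFrame rather than DataFrame.
        return MetaFrame

    def __finalize__(self, other, method=None, **kwargs):
        calls.append(method)
        # Copy every registered metadata attribute from the source object.
        for name in self._metadata:
            object.__setattr__(self, name, getattr(other, name, None))
        return self


df = MetaFrame(np.random.randint(0, 4, (3, 2)), columns=list('ab'))
df.filenames = {'a': 'f1', 'b': 'f2'}

copied = df.copy()       # copy() passes its result through __finalize__
print(len(calls) > 0)    # True
print(copied.filenames)  # {'a': 'f1', 'b': 'f2'}
```

So the hook itself works; the problem is only that pd.concat does not invoke it.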
For this specific case it seems reasonable that __finalize__ should be called, since all of the inputs are the same DataFrame. I'm not sure about the general case, though, since concat can also take types other than DataFrame. Do we have, or should we have, some method to stack DataFrames that preserves metadata?
Similar to #6923.
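In the meantime, a workaround along these lines might do. This is only a sketch: concat_with_metadata is a hypothetical helper, not a pandas function, and the rule it applies (copy an attribute only when every input carries the same value) is my assumption about what "preserving" should mean. It also assumes the metadata values can be compared with == to give a plain bool.

```python
import numpy as np
import pandas as pd


def concat_with_metadata(frames, names, **concat_kwargs):
    """Hypothetical helper: pd.concat, then copy metadata the inputs agree on."""
    frames = list(frames)
    result = pd.concat(frames, **concat_kwargs)
    for name in names:
        values = [getattr(frame, name, None) for frame in frames]
        first = values[0]
        # Propagate only when every input carries the same, non-missing value.
        if first is not None and all(value == first for value in values[1:]):
            object.__setattr__(result, name, first)
    return result


np.random.seed(10)
df1 = pd.DataFrame(np.random.randint(0, 4, (3, 2)), columns=list('ab'))
# Set the attribute directly on the instance, as the hook above does.
object.__setattr__(df1, 'filenames', {'a': 'f1', 'b': 'f2'})

stacked = concat_with_metadata([df1, df1], ['filenames'])
print(stacked.filenames)  # {'a': 'f1', 'b': 'f2'}
```

When the inputs disagree on a value, the helper leaves the attribute unset rather than guessing which one should win.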