DataFrame.__finalize__ not called in pd.concat · Issue #6927 · pandas-dev/pandas

When I assign metadata to a DataFrame:

import numpy as np
import pandas as pd
np.random.seed(10)

pd.DataFrame._metadata = ['filenames']
df1 = pd.DataFrame(np.random.randint(0, 4, (3, 2)), columns=list('ab'))
df1.filenames = {'a': 'f1', 'b': 'f2'}
df1
       a  b
    0  1  1
    1  0  3
    2  0  1

and define a __finalize__ that prints when it's called:

def finalize_df(self, other, method=None, **kwargs):
    print('finalize called')
    for name in self._metadata:
        object.__setattr__(self, name, getattr(other, name, None))
    return self

pd.DataFrame.__finalize__ = finalize_df

nothing is preserved when pd.concat is called:

stacked = pd.concat([df1, df1])  # Nothing printed
stacked
       a  b
    0  1  1
    1  0  3
    2  0  1
    0  1  1
    1  0  3
    2  0  1
stacked.filenames  # => AttributeError

For this specific case it seems reasonable for __finalize__ to be called, since all of the elements come from the same DataFrame. I'm not sure about the general case, though, since concat can also take types other than a DataFrame. Do we have (or should we have) some method to stack DataFrames that preserves metadata?
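
In the meantime, one possible workaround is a thin wrapper around pd.concat that copies the metadata over by hand. This is only a sketch: concat_with_metadata and its names argument are hypothetical (not part of pandas), and it assumes the metadata values are plain Python objects that can be compared with ==. An attribute is carried over only when every input frame has an equal value for it, which sidesteps the question of how to merge conflicting metadata.

```python
import pandas as pd


def concat_with_metadata(frames, names, **concat_kwargs):
    """Hypothetical helper (not a pandas API): concatenate `frames`, then copy
    each attribute listed in `names` onto the result when all inputs agree."""
    frames = list(frames)
    result = pd.concat(frames, **concat_kwargs)
    missing = object()  # sentinel for "this frame does not have the attribute"
    for name in names:
        values = [getattr(frame, name, missing) for frame in frames]
        first = values[0]
        # Assumes plain Python values, so == returns a single bool.
        if first is not missing and all(v == first for v in values[1:]):
            # object.__setattr__ stores a plain instance attribute and
            # bypasses pandas' own attribute handling.
            object.__setattr__(result, name, first)
    return result


df1 = pd.DataFrame({"a": [1, 0, 0], "b": [1, 3, 1]})
object.__setattr__(df1, "filenames", {"a": "f1", "b": "f2"})

stacked = concat_with_metadata([df1, df1], ["filenames"])
print(stacked.filenames)

# A frame without the attribute means nothing is copied to the result.
df2 = pd.DataFrame({"a": [2], "b": [2]})
mixed = concat_with_metadata([df1, df2], ["filenames"])
print(hasattr(mixed, "filenames"))
```

Note that the copied attribute lives only on the object returned by the wrapper. Later operations on it (slicing, arithmetic, another concat) produce new frames, and whether the attribute survives those depends on _metadata and __finalize__ as described above.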

Similar to #6923.