DEPR: GH10623 remove items from msgpack.encode for blocks by kawochen · Pull Request #12129 · pandas-dev/pandas (original) (raw)

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Conversation23 Commits1 Checks0 Files changed

Conversation

This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.Learn more about bidirectional Unicode characters

[ Show hidden characters]({{ revealButtonHref }})

kawochen

@kawochen

hmm legacy file for 0.17.1 not passing. I think it's tz-related. Will check later.

@kawochen

passing now. had to make changes for DatetimeTZBlock (it wasn't tested?) and skip files packed in P2 with 0.17 :( due to #12142

jreback

'locs': b.mgr_locs.as_array,
'values': convert(b.values),
'blocks': [{'locs': b.mgr_locs.as_array,
'values': convert_if_not_dt_index(b.values),

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is way hacky. you shouldn't need to specifically do this (test for DatetimeIndex. It should just be converted normally. The entire difference here is the dispatch on the dtype, which in this case would be a string.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if it gets converted to a numpy array now, wouldn't the whole encode/decode for DatetimeIndex be skipped? Or is there a way to construct DatetimeTZBlock from numpy arrays?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually I think you could trivially change core/internals.py/make_block to take a DatetimeTZDtype and correctly create it. Then you would have to make to serialize that properly (as a string is prob ok) I think (you would need to fix the reverse lookup as it only deals with numpy atm).

I think the same problem exists with categorical, prob not testing it.

@kawochen

jreback

dtype = kwargs.pop('dtype')
if isinstance(dtype, compat.string_types):
dtype = DatetimeTZDtype.construct_from_string(dtype)
values = values.tz_localize('UTC').tz_convert(dtype.tz)
if not isinstance(values, self._holder):

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you should do this first, then you can coerce if you have a dtype

@jreback

fyi some of these fixes will prob address: #11455, maybe #11755

@kawochen

updated... but I forgot to squash :( will do that later

jreback

@@ -245,6 +246,10 @@ def unconvert(values, dtype, compress=None):
if dtype == np.object_:
return np.array(values, dtype=object)
if isinstance(dtype, string_types):

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I get that you have to do this, we prob need something like (in core/common.py)

def pandas_dtype(dtype):
     if isinstance(dtype, compat.string_types):
            try:
                return DatetimeTZDtype.construct_from_string(dtype)
            except TypeError:
                pass

            try:
               return CategoricalDtype.construct_from_string(dtype)
            except TypeError:
               pass

           dtype = np.dtype(dtype)

    return dtype

@jreback

pls add a whatsnew as well (put in API changes).
make a note in io.rst that the writing format is changing in 0.18.0 so we won't have forward compat (IOW older version of pandas cannot read newer version)

@jreback

this would close #12142 as well? (should add a warning in the docs about this) (e.g. show your matrix)

@kawochen

I can change all the keys to unicode, and then it will. will update tonight

@kawochen

Ah half way through but have to go. I will get it done tonight but is that too late? :(

@jreback

@kawochen

updated. manually tested with python 2.7 and 3.5 on linux (zlib and no compress).

jreback

@@ -265,28 +269,31 @@ def unconvert(values, dtype, compress=None):
return np.fromstring(values, dtype=dtype)
def _u(x):

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just use u() from compat/__init__.py (and amend with this).

@kawochen

@kawochen

updated. So this is how I tested P2 & P3 compatibility: use the script to generate legacy data for 0.18.0 in P2 and P3, put them in 0.18.0 folder, and let the tests pick them up.

@jreback

thanks!

I added some more test msgpacks for 0.17.1 to cover the bases

@jreback

2 participants

@kawochen @jreback