DEPR: GH10623 remove items from msgpack.encode for blocks by kawochen · Pull Request #12129 · pandas-dev/pandas (original) (raw)
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service andprivacy statement. We’ll occasionally send you account related emails.
Already on GitHub?Sign in to your account
Conversation23 Commits1 Checks0 Files changed
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.Learn more about bidirectional Unicode characters
[ Show hidden characters]({{ revealButtonHref }})
hmm legacy file for 0.17.1 not passing. I think it's tz-related. Will check later.
passing now. had to make changes for DatetimeTZBlock
(it wasn't tested?) and skip files packed in P2 with 0.17 :( due to #12142
'locs': b.mgr_locs.as_array, |
---|
'values': convert(b.values), |
'blocks': [{'locs': b.mgr_locs.as_array, |
'values': convert_if_not_dt_index(b.values), |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is way hacky. you shouldn't need to specifically do this (test for DatetimeIndex
. It should just be converted normally. The entire difference here is the dispatch on the dtype, which in this case would be a string.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if it gets converted to a numpy array now, wouldn't the whole encode
/decode
for DatetimeIndex
be skipped? Or is there a way to construct DatetimeTZBlock
from numpy arrays?
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually I think you could trivially change core/internals.py/make_block
to take a DatetimeTZDtype
and correctly create it. Then you would have to make to serialize that properly (as a string is prob ok) I think (you would need to fix the reverse lookup as it only deals with numpy atm).
I think the same problem exists with categorical, prob not testing it.
dtype = kwargs.pop('dtype') |
---|
if isinstance(dtype, compat.string_types): |
dtype = DatetimeTZDtype.construct_from_string(dtype) |
values = values.tz_localize('UTC').tz_convert(dtype.tz) |
if not isinstance(values, self._holder): |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you should do this first, then you can coerce if you have a dtype
fyi some of these fixes will prob address: #11455, maybe #11755
updated... but I forgot to squash :( will do that later
@@ -245,6 +246,10 @@ def unconvert(values, dtype, compress=None): |
---|
if dtype == np.object_: |
return np.array(values, dtype=object) |
if isinstance(dtype, string_types): |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I get that you have to do this, we prob need something like (in core/common.py)
def pandas_dtype(dtype):
if isinstance(dtype, compat.string_types):
try:
return DatetimeTZDtype.construct_from_string(dtype)
except TypeError:
pass
try:
return CategoricalDtype.construct_from_string(dtype)
except TypeError:
pass
dtype = np.dtype(dtype)
return dtype
pls add a whatsnew as well (put in API changes).
make a note in io.rst
that the writing format is changing in 0.18.0 so we won't have forward compat (IOW older version of pandas cannot read newer version)
this would close #12142 as well? (should add a warning in the docs about this) (e.g. show your matrix)
I can change all the keys to unicode, and then it will. will update tonight
Ah half way through but have to go. I will get it done tonight but is that too late? :(
updated. manually tested with python 2.7 and 3.5 on linux (zlib and no compress).
@@ -265,28 +269,31 @@ def unconvert(values, dtype, compress=None): |
---|
return np.fromstring(values, dtype=dtype) |
def _u(x): |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just use u()
from compat/__init__.py
(and amend with this).
updated. So this is how I tested P2 & P3 compatibility: use the script to generate legacy data for 0.18.0 in P2 and P3, put them in 0.18.0 folder, and let the tests pick them up.
thanks!
I added some more test msgpacks for 0.17.1 to cover the bases
2 participants