encoding not respected on read_msgpack · Issue #10581 · pandas-dev/pandas
As discussed on https://groups.google.com/forum/#!topic/pydata/ngROaML_hLI
Encoding does not seem to be respected when reading a msgpack. Below, I expect to get back
what I put in, encoded as utf8:
In [17]: s
Out[17]: u'\u2019'
In [18]: s = pd.Series({'a' : u"\u2019" })
In [19]: s.values[0]
Out[19]: u'\u2019'
In [20]: pd.read_msgpack(s.to_msgpack(encoding='utf8')).values[0]
Out[20]: u'\xe2\x80\x99'
Stepping through, part of the problem seems to be that the call to unpack at https://github.com/pydata/pandas/blob/master/pandas/io/packers.py#L134 passes no encoding argument, so it defaults to latin1 at https://github.com/pydata/pandas/blob/master/pandas/io/packers.py#L558
Changing L134 to:
l = list(unpack(fh, **kwargs))
and passing the encoding like:
pandas.read_msgpack(m, encoding='utf8')
makes it work for me. However, I don't have an environment set up to submit this as a pull request via GH, and we're still using 0.14.1 due to compatibility issues.
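
For anyone trying to confirm the diagnosis without pandas, here is a minimal sketch that reproduces the same garbled value using only str/bytes methods. It assumes the packed payload holds the UTF-8 bytes of the string and that the reader decodes them as latin1; the helper name undo_latin1_misread is hypothetical and not part of pandas.

```python
# -*- coding: utf-8 -*-
# Sketch only: mimics the suspected encode/decode mismatch, not pandas itself.

original = u"\u2019"  # RIGHT SINGLE QUOTATION MARK, as in the session above

# What to_msgpack(encoding='utf8') is assumed to write for this string.
packed_bytes = original.encode("utf8")

# What a reader produces if it decodes those bytes as latin1 instead.
misread = packed_bytes.decode("latin1")
print(repr(misread))


def undo_latin1_misread(text):
    """Hypothetical helper: reverse a latin1 decode of UTF-8 bytes."""
    return text.encode("latin1").decode("utf8")


repaired = undo_latin1_misread(misread)
print(repaired == original)
```

The misread value is the three characters \xe2, \x80, \x99, which matches Out[20] above. The round trip through latin1 is lossless because latin1 maps every byte to exactly one code point, so already-affected strings can be repaired after the fact, although passing the encoding through to unpack is the cleaner fix.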