Roundtrip unicode strings even when written as character arrays by shoyer · Pull Request #1648 · pydata/xarray
Unicode strings (`str` on Python 3) are now round-tripped successfully even when written as character arrays (e.g., as netCDF3 files or when using `engine='scipy'`). This is controlled by the `_Encoding` attribute convention, which is also understood directly by the netCDF4-Python interface.
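To make the convention concrete, here is a rough, standard-library-only sketch of the idea. It is not the code in this PR: `encode_as_char_array` and `decode_char_array` are hypothetical helpers, and plain nested lists stand in for a real fixed-width character array. The only thing taken from the description above is the `_Encoding` attribute name.

```python
# Conceptual sketch only: hypothetical helpers, not xarray's implementation.

def encode_as_char_array(strings, encoding="utf-8"):
    """Encode unicode strings as rows of single bytes plus an attrs dict."""
    encoded = [s.encode(encoding) for s in strings]
    width = max((len(b) for b in encoded), default=0)
    rows = []
    for b in encoded:
        # Pad with null bytes so every row has the same fixed width.
        padded = b.ljust(width, b"\x00")
        rows.append([padded[i:i + 1] for i in range(width)])
    # Record the encoding so a reader knows how to turn bytes back into text.
    return rows, {"_Encoding": encoding}


def decode_char_array(rows, attrs):
    """Join characters per row; decode to str only if _Encoding is present."""
    joined = [b"".join(row).rstrip(b"\x00") for row in rows]
    encoding = attrs.get("_Encoding")
    if encoding is None:
        # Without the attribute there is no way to know the encoding,
        # so the values stay as byte strings.
        return joined
    return [b.decode(encoding) for b in joined]


names = ["café", "北京", "abc"]
rows, attrs = encode_as_char_array(names)
restored = decode_char_array(rows, attrs)
without_attr = decode_char_array(rows, {})
```

Here `restored` compares equal to `names`, while `without_attr` holds byte strings. That second case is a simplified picture of the behavior reported in #1638.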
This PR also resolves some long-standing technical debt in the test suite related to the hacky use of `decode_bytes` in `assert_allclose` (recently encountered by @jhamman in #1609). Once we're sure that we don't need it anymore, I'd like to deprecate and eventually remove the `decode_bytes` option.
Note that there are still a few unresolved issues with regard to serializing missing values in strings, so I've intentionally held off on documenting the handling of `_FillValue` for now. I'd like to resolve those separately after discussion in #1647, but ideally this PR could make it in for the v0.10 release.
- Closes #1638 (Unicode strings unexpectedly transformed to byte strings upon `open_dataset`)
- Tests added / passed
- Passes `git diff upstream/master | flake8 --diff`
- Fully documented, including `whats-new.rst` for all changes and `api.rst` for new API