Roundtrip unicode strings even when written as character arrays by shoyer · Pull Request #1648 · pydata/xarray

Unicode strings (str on Python 3) are now round-tripped successfully even when written as character arrays (e.g., as netCDF3 files or when using engine='scipy'). This is controlled by the _Encoding attribute convention, which is also understood directly by the netCDF4-Python interface.
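As a rough illustration of the idea (not the code in this PR), the sketch below shows how an _Encoding attribute lets a reader turn a fixed-width character array back into str. The helper names encode_to_char_array and decode_from_char_array are hypothetical, plain Python lists stand in for arrays, and treating a missing _Encoding as "leave the values as bytes" is an assumption made for the example.

```python
# Illustrative sketch only: hypothetical helpers, not xarray's implementation.

def encode_to_char_array(strings, encoding="utf-8"):
    """Encode str values as fixed-width rows of single bytes, plus attributes."""
    encoded = [s.encode(encoding) for s in strings]
    width = max((len(b) for b in encoded), default=0)
    # Pad short values with NUL bytes so that every row has the same width,
    # as a character array requires.
    rows = [
        [b[i:i + 1] if i < len(b) else b"\x00" for i in range(width)]
        for b in encoded
    ]
    # Record the encoding so that a reader knows how to decode the bytes.
    attrs = {"_Encoding": encoding}
    return rows, attrs


def decode_from_char_array(rows, attrs):
    """Join each row back into bytes; decode to str only if _Encoding is set."""
    joined = [b"".join(row).rstrip(b"\x00") for row in rows]
    encoding = attrs.get("_Encoding")
    if encoding is None:
        # Assumption: without _Encoding there is no way to know the text
        # encoding, so the values stay as bytes.
        return joined
    return [b.decode(encoding) for b in joined]


rows, attrs = encode_to_char_array(["abc", "ñandú", ""])
roundtripped = decode_from_char_array(rows, attrs)
```

Note that this sketch strips trailing NUL bytes on decode, so a string that legitimately ends in "\x00" would not survive the round trip.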

This PR also resolves some long-standing technical debt in the test suite related to the hacky use of decode_bytes in assert_allclose (recently encountered by @jhamman in #1609). Once we're sure that we don't need it anymore, I'd like to deprecate and eventually remove the decode_bytes option.

Note that there are still a few unresolved issues with regard to serializing missing values in strings, so I've intentionally held off on documenting the handling of _FillValue for now. I'd like to resolve those separately after discussion in #1647, but ideally this PR could make it in for the v0.10 release.