ENH: Add option to read_sas to infer encoding from file, then use encoding · Issue #48048 · pandas-dev/pandas (original) (raw)

Feature Type

Problem Description

Currently, pd.read_sas(f) only accepts and encoding argument that is either the correct python encoding for the file or None, in which case the default encoding, here utf-8, will be used.

This requires the user to know in-before which encoding the file uses. Fixing this is straightforward since the current implementation already reads the current file encoding into self.file_encoding.
However, this self.file_encoding is not used during decoding.

Additionally, the current encoding constants have incorrect names so they can't be used in python's .decode function. The correct names can be sampled by matching the SAS documentation with the Python standard reference.

Feature Description

To not break backwards-compatibility I would add another option called infer to the encoding parameter, such that the interface would look like this:

df = pd.read_sas(f, encoding="infer")

Alternative Solutions

The simplest alternative is to just know the file encoding of the sas file already and pass it as an argument.

Additional Context

No response