ENH: The XLS_SIGNATURE is too restrictive · Issue #41225 · pandas-dev/pandas

pandas 1.2.4 fails to open XLS files generated by Lotus 1-2-3 because their header differs from the one expected by XLS_SIGNATURE (io.excel._base, lines 1001 and 1050).

I know I'm likely to be the only person in the world with this issue, but my boss still uses Lotus 1-2-3 🙄

I'm willing to submit a pull request with my fix if people are okay with it. The fix basically involves changing XLS_SIGNATURE to a list:

XLS_SIGNATURES = [
  b"\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1",
  b"\t\x04\x06",
]
ZIP_SIGNATURE = b"PK\x03\x04"
PEEK_SIZE = max(map(len, XLS_SIGNATURES + [ZIP_SIGNATURE]))

and then later:

if any(peek.startswith(signature) for signature in XLS_SIGNATURES):
  return "xls"
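
For illustration, here is a minimal, self-contained sketch of how the two pieces above could fit together. The helper name `sniff_format` is hypothetical and is not the function pandas actually uses; the second signature is only known to match the 1-2-3 export described in this issue.

```python
import io

# Signatures as proposed in this issue. The second entry is an assumption:
# it is only known to match the Lotus 1-2-3 export described here.
XLS_SIGNATURES = [
    b"\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1",  # OLE2 compound file header
    b"\t\x04\x06",  # header observed in the 1-2-3 export
]
ZIP_SIGNATURE = b"PK\x03\x04"
# Read enough bytes to compare against the longest signature.
PEEK_SIZE = max(map(len, XLS_SIGNATURES + [ZIP_SIGNATURE]))


def sniff_format(stream):
    """Hypothetical helper: return "xls", "zip", or None for a binary stream."""
    stream.seek(0)
    peek = stream.read(PEEK_SIZE)
    stream.seek(0)  # leave the stream where a reader expects it
    if any(peek.startswith(signature) for signature in XLS_SIGNATURES):
        return "xls"
    if peek.startswith(ZIP_SIGNATURE):
        return "zip"
    return None


# Usage with in-memory data standing in for real files:
print(sniff_format(io.BytesIO(b"\t\x04\x06\x00...")))
print(sniff_format(io.BytesIO(b"PK\x03\x04...")))
```

Because `startswith` is used, a short signature such as the three-byte one still matches even though `PEEK_SIZE` is driven by the longer eight-byte header.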

I don't know whether the second XLS_SIGNATURES value is a "good" one, but it works. I can attach an XLS file exported by 1-2-3 if that would be useful to others, or if it would help someone find a better second value for XLS_SIGNATURES. I looked through the file's raw bytes for anything similar to the "DOCFILE" XLS signature, but didn't find anything.
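
For anyone who wants to compare headers from their own files, the leading bytes can be dumped as hex with a small helper like the one below. This is only a sketch: `header_hex` and the file name in the usage comment are hypothetical. As an unverified assumption, the bytes `09 04 06` may correspond to a BIFF4 BOF record written without an OLE2 container, which would explain why the "DOCFILE" header is absent.

```python
def header_hex(data: bytes, n: int = 16) -> str:
    """Hypothetical helper: format the first n bytes as space-separated hex."""
    return " ".join(f"{b:02X}" for b in data[:n])


# The two signatures from this issue, shown as hex:
print(header_hex(b"\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1"))  # D0 CF 11 E0 A1 B1 1A E1
print(header_hex(b"\t\x04\x06"))  # 09 04 06

# With a real file (the path is a placeholder):
# with open("export_from_123.xls", "rb") as fh:
#     print(header_hex(fh.read(16)))
```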