ENH: The XLS_SIGNATURE is too restrictive · Issue #41225 · pandas-dev/pandas
Pandas 1.2.4 fails to open XLS files generated by Lotus 1-2-3 because they have a different header than the one expected in XLS_SIGNATURE (io.excel._base, line 1001 and then 1050).
I know I'm likely to be the only person in the world with this issue, but my boss still uses Lotus 1-2-3 🙄
I'm willing to submit a pull request with my fix if people are okay with it. The fix basically involves changing XLS_SIGNATURE to a list:
XLS_SIGNATURES = [
b"\xD0\xCF\x11\xE0\xA1\xB1\xA1\xE1",
b"\t\x04\x06",
]
ZIP_SIGNATURE = b"PK\x03\x04"
PEEK_SIZE = max(map(len, XLS_SIGNATURES + [ZIP_SIGNATURE]))
and then later:
if any(peek.startswith(signature) for signature in XLS_SIGNATURES):
    return "xls"
I don't know if the second XLS_SIGNATURES value is "good," but it works. I can attach an XLS that was exported by 1-2-3 if it would be beneficial to others, or if they'd be able to help me find a better second value for XLS_SIGNATURES. I tried looking through the file's raw bytes for anything similar to the "DOCFILE" XLS signature, but didn't see anything.
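To make the proposal easier to try outside of pandas, here is a minimal, self-contained sketch of the check. The `sniff_format` helper and its `"zip"` / `None` return values are hypothetical names used only for illustration, not the actual function in io.excel._base, and the second signature is still the tentative value from above.

```python
import io

# Signatures from the proposal above. The second entry is the tentative
# value observed in a Lotus 1-2-3 export; it has not been validated further.
XLS_SIGNATURES = [
    b"\xD0\xCF\x11\xE0\xA1\xB1\xA1\xE1",
    b"\t\x04\x06",
]
ZIP_SIGNATURE = b"PK\x03\x04"

# Read only as many bytes as the longest signature needs.
PEEK_SIZE = max(map(len, XLS_SIGNATURES + [ZIP_SIGNATURE]))


def sniff_format(stream):
    """Hypothetical helper: guess the container type from the leading bytes.

    Returns "xls", "zip", or None, and restores the stream position.
    """
    start = stream.tell()
    peek = stream.read(PEEK_SIZE)
    stream.seek(start)

    if any(peek.startswith(signature) for signature in XLS_SIGNATURES):
        return "xls"
    if peek.startswith(ZIP_SIGNATURE):
        return "zip"
    return None


if __name__ == "__main__":
    # Synthetic byte strings stand in for real files here.
    print(sniff_format(io.BytesIO(b"\t\x04\x06\x00 rest of file")))
    print(sniff_format(io.BytesIO(b"PK\x03\x04 rest of file")))
```

Checking a list of prefixes with `any(...)` keeps the existing behaviour for the standard signature and makes it cheap to add further headers later if other exporters turn up.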