Issue 32491: base64.decode: linebreaks are not ignored

Created on 2018-01-03 23:35 by gregory.p.smith, last changed 2022-04-11 14:58 by admin.

Messages (3)
msg309449 - (view)	Author: Gregory P. Smith (gregory.p.smith) *	Date: 2018-01-03 23:35
I've tried reading various RFCs around Base64 encoding, but I couldn't make the ends meet. Yet there is an inconsistency between base64.decodebytes() and base64.decode() in how they handle linebreaks that were used to collate the encoded text. Below is an example of what I'm talking about:

>>> import base64
>>> foo = base64.encodebytes(b'123456789')
>>> foo
b'MTIzNDU2Nzg5\n'
>>> foo = b'MTIzND\n' + b'U2Nzg5\n'
>>> foo
b'MTIzND\nU2Nzg5\n'
>>> base64.decodebytes(foo)
b'123456789'
>>> from io import BytesIO
>>> bytes_in = BytesIO(foo)
>>> bytes_out = BytesIO()
>>> bytes_in.seek(0)
0
>>> base64.decode(bytes_in, bytes_out)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/somewhere/lib/python3.6/base64.py", line 512, in decode
    s = binascii.a2b_base64(line)
binascii.Error: Incorrect padding
>>> bytes_in = BytesIO(base64.encodebytes(b'123456789'))
>>> bytes_in.seek(0)
0
>>> base64.decode(bytes_in, bytes_out)
>>> bytes_out.getvalue()
b'123456789'

Obviously, I'd expect decodebytes() and decode() both to either accept or to reject the same input.

Thanks.
Oleg

via Oleg Sivokon on python-dev (who was having trouble getting bugs.python.org account creation to work)
msg309451 - (view)	Author: R. David Murray (r.david.murray) *	Date: 2018-01-04 00:52
This reduces to the following:

>>> from binascii import a2b_base64 as f
>>> f(b'MTIzND\nU2Nzg5\n')
b'123456789'
>>> f(b'MTIzND\n')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
binascii.Error: Incorrect padding

That is, decode does its decoding line by line, whereas decodebytes passes the entire object to a2b_base64 as a single entity. Apparently a2b_base64 looks at the padding for the entirety of what it is given, which I believe is in accordance with the RFC. This means that decode is fundamentally broken per the RFC, and there is no obvious way to fix it without adding an incremental decoder to binascii. And an incremental decoder probably belongs in codecs (assuming we ever resolved the transcode interface issue, I can't actually remember...).

Note that it will work as long as an "integral" number of base64 encoding units are in each line.
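
For readers who want to reproduce the distinction described above, here is a minimal sketch. The helper name decode_by_line is hypothetical; it only imitates the per-line loop attributed to base64.decode() in this message and is not the standard library's code.

```python
import binascii


def decode_by_line(raw):
    """Decode each line separately (hypothetical helper imitating
    the line-by-line behaviour described in this message)."""
    parts = []
    for line in raw.splitlines(keepends=True):
        # Each line must be a valid base64 unit on its own here.
        parts.append(binascii.a2b_base64(line))
    return b''.join(parts)


# Whole input at once: the embedded newline is skipped.
whole = binascii.a2b_base64(b'MTIzND\nU2Nzg5\n')

# Lines holding a whole number of 4-character units decode fine.
aligned = decode_by_line(b'MTIz\nNDU2\nNzg5\n')

# A line that splits a 4-character unit fails on its own.
try:
    decode_by_line(b'MTIzND\nU2Nzg5\n')
    split_failed = False
except binascii.Error:
    split_failed = True
```

The last case is the one reported in this issue: the same bytes succeed when handed to a2b_base64 in one piece and fail when handed over one line at a time.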
msg309454 - (view)	Author: Martin Panter (martin.panter) *	Date: 2018-01-04 03:44
I wrote an incremental base-64 decoder for the "codecs" module in Issue 27799, which you could use. It just does some preprocessing using a regular expression to pick four-character chunks before passing the data to a2b_base64. Or maybe implementing it properly in the "binascii" module is better. Quickly reading RFC 2045, I saw it says "All line breaks or other characters not found in Table 1 [64 alphabet characters plus padding character] must be ignored by decoding software." So this is a real bug, although I think a base-64 encoder that triggers it would be rare.
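
A rough sketch of the preprocessing idea, for illustration only. This is not the patch from Issue 27799; the function name decode_stream, its parameters, and the regular expression are assumptions made for this example. It drops every byte outside the base64 alphabet and padding character, buffers the rest, and passes only whole four-character groups to a2b_base64. It does not try to handle padding that appears in the middle of a stream.

```python
import binascii
import re
from io import BytesIO

# Bytes outside the 64-character alphabet and '=' are dropped,
# following the RFC 2045 sentence quoted above.
_NON_ALPHABET = re.compile(rb'[^A-Za-z0-9+/=]')


def decode_stream(src, dst, chunk_size=8192):
    """Hypothetical incremental decoder: read src in chunks and
    write the decoded bytes to dst."""
    pending = b''
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        pending += _NON_ALPHABET.sub(b'', chunk)
        # Only decode complete 4-character groups; keep the rest.
        usable = len(pending) - len(pending) % 4
        if usable:
            dst.write(binascii.a2b_base64(pending[:usable]))
            pending = pending[usable:]
    if pending:
        # Leftover characters: let a2b_base64 decide whether the
        # input was truncated (it raises binascii.Error if so).
        dst.write(binascii.a2b_base64(pending))


# The input from the original report, read in deliberately
# awkward 5-byte chunks so that groups straddle chunk boundaries.
out = BytesIO()
decode_stream(BytesIO(b'MTIzND\nU2Nzg5\n'), out, chunk_size=5)
result = out.getvalue()
```

Because the buffer carries incomplete groups across reads, the position of line breaks in the input no longer matters.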

History
Date	User	Action	Args
2022-04-11 14:58:56	admin	set	github: 76672
2018-01-04 03:44:16	martin.panter	set	nosy: + martin.panter; messages: +
2018-01-04 00:52:25	r.david.murray	set	nosy: + r.david.murray; messages: +
2018-01-03 23:35:25	gregory.p.smith	create