Issue 34736: Confusing base64.b64decode output (original) (raw)

Created on 2018-09-19 13:00 by mark.dickinson, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 9563 merged taleinat,2018-09-25 13:43
PR 9614 merged miss-islington,2018-09-28 05:57
Messages (9)
msg325757 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2018-09-19 13:00
The following error message in Python 3.7 is confusing and unhelpful: >>> s = "FS9qzW_oliGH_Yo=" >>> base64.b64decode(s) Traceback (most recent call last): File "", line 1, in File "/opt/local/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/base64.py", line 87, in b64decode return binascii.a2b_base64(s) binascii.Error: Invalid base64-encoded string: length cannot be 1 more than a multiple of 4 >>> len(s) % 4 0 This had me staring at the string s and scratching my head, thinking: "s doesn't have length 1 more than a multiple of 4, either with or without the "=" padding byte. What's this talking about?" The actual user error here is a failure to use the "altchars" argument correctly: >>> base64.b64decode(s, altchars="-_") b'\x15/j\xcdo\xe8\x96!\x87\xfd\x8a' The error message in versions prior to Python 3.7 is no more helpful, but it's not quite as misleading, either. Python 3.6.6 (default, Jun 28 2018, 05:43:53) [GCC 4.2.1 Compatible Apple LLVM 9.1.0 (clang-902.0.39.2)] on darwin Type "help", "copyright", "credits" or "license" for more information. >>> import base64 >>> s = "FS9qzW_oliGH_Yo=" >>> base64.b64decode(s) Traceback (most recent call last): File "", line 1, in File "/opt/local/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/base64.py", line 87, in b64decode return binascii.a2b_base64(s) binascii.Error: Incorrect padding Related: #1466065, #33770
msg325844 - (view) Author: Tal Einat (taleinat) * (Python committer) Date: 2018-09-20 05:42
I welcome suggestions for improvement! We could either find a better error message covering all cases, or emit a more specific message when non-base64 characters have been skipped. Unfortunately, skipping non-base64 characters is a basic assumption of a2b_base64 which is used under the hood, and it currently provides no indication whether such characters are skipped, and if so how many. Also, it is common in some contexts to have line breaks in base64-encoded data, which are skipped when decoding.
msg325845 - (view) Author: Tal Einat (taleinat) * (Python committer) Date: 2018-09-20 06:22
Perhaps something such as "number of base64 data characters cannot be 1 more than a multiple of 4"?
msg326106 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2018-09-22 18:25
Yes, I'm having trouble thinking of an alternative message that's both accurate and helpful. I think what I _really_ want as a user is for b64decode to reject strings containing "_" and/or "-" as invalid (assuming altchars has been provided), but that would be a backwards-incompatible change, and has already been discussed in #1466065. Not sure it's worth revisiting that discussion. Maybe just something like "invalid number of base64 characters"? We could even embed the actual number of base64 characters in the error message, which would give the user a clue that some characters are not being considered base64 characters. I find the "1 more than a multiple of 4" wording a bit clunky, and potentially misleading.
msg326111 - (view) Author: Tal Einat (taleinat) * (Python committer) Date: 2018-09-22 20:08
> I think what I _really_ want as a user is for b64decode to reject strings containing... Do you mean you'd like to have this behavior by default? One can already use validate=True to have invalid characters cause an exception. I too find it surprising the False is the default, but changing this would be backwards incompatible. > I find the "1 more than a multiple of 4" wording a bit clunky, and potentially misleading. I chose that to avoid mentioning "modulu" or "remainder". I find it straightforward and clear, though admittedly a bit long and clumsy. I don't believe it is inherently misleading, though. I like your idea of including the number of base64 characters in the error message. I find the phrase "base64 characters" ambiguous, though. I suggest: "Invalid base64-encoded string: number of data characters (13) cannot be 1 more than a multiple of 4"
msg326243 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2018-09-24 15:03
> Do you mean you'd like to have this behavior by default? Ideally, yes, but I think that's out of scope for this bug report. > I don't believe it is inherently misleading, though. Yes, the "1 more than a multiple of 4" bit is clunky, but not misleading per se. It becomes misleading in combination with "length cannot be", since the obvious meaning of "length" here is the length of the string passed to the "b64decode" function, and that's not the correct interpretation in this context. Your suggested rewording gets rid of the "length" reference, and looks like an improvement to me. "data characters" seems fairly clear, too.
msg326606 - (view) Author: miss-islington (miss-islington) Date: 2018-09-28 05:57
New changeset 1fba2ffc37da52c08db51fe4360459990b0311c9 by Miss Islington (bot) (Tal Einat) in branch 'master': bpo-34736: improve error message for invalid length b64decode inputs (GH-9563) https://github.com/python/cpython/commit/1fba2ffc37da52c08db51fe4360459990b0311c9
msg326608 - (view) Author: miss-islington (miss-islington) Date: 2018-09-28 06:12
New changeset 7e35081bc828291da5793db49ab45dee4fda5043 by Miss Islington (bot) in branch '3.7': bpo-34736: improve error message for invalid length b64decode inputs (GH-9563) https://github.com/python/cpython/commit/7e35081bc828291da5793db49ab45dee4fda5043
msg326609 - (view) Author: Tal Einat (taleinat) * (Python committer) Date: 2018-09-28 06:14
Thanks, Mark!
History
Date User Action Args
2022-04-11 14:59:06 admin set github: 78917
2018-09-28 06:14:40 taleinat set status: open -> closedresolution: fixedmessages: + stage: patch review -> resolved
2018-09-28 06:12:58 miss-islington set messages: +
2018-09-28 05:57:36 miss-islington set pull_requests: + <pull%5Frequest9014>
2018-09-28 05:57:33 miss-islington set nosy: + miss-islingtonmessages: +
2018-09-25 13:43:16 taleinat set keywords: + patchstage: patch reviewpull_requests: + <pull%5Frequest8965>
2018-09-24 15:03:03 mark.dickinson set messages: +
2018-09-22 20:08:14 taleinat set messages: +
2018-09-22 18:25:20 mark.dickinson set messages: +
2018-09-22 17:28:08 taleinat set nosy: + ned.deily
2018-09-20 06:22:39 taleinat set messages: +
2018-09-20 05:42:17 taleinat set messages: + versions: + Python 3.8
2018-09-19 13:58:18 serhiy.storchaka set nosy: + taleinatcomponents: + Library (Lib)
2018-09-19 13:00:53 mark.dickinson create