msg318691 - (view) |
Author: Dmitry (dniq) |
Date: 2018-06-04 19:53 |
All base64 decoding methods fail to decode a valid base64 string, throwing 'incorrect padding' regardless of the string padding. Here's an example: >>> base64.urlsafe_b64decode('AQAAQDhAAMAAQAAAAAAAAthAAAAJDczODFmZDM2LTNiOTYtNDVmYS04MjQ2LWRkYzJkMmViYjQ2YQ===') Traceback (most recent call last): File "", line 1, in File "/export/apps/python/3.6/lib/python3.6/base64.py", line 133, in urlsafe_b64decode return b64decode(s) File "/export/apps/python/3.6/lib/python3.6/base64.py", line 87, in b64decode return binascii.a2b_base64(s) binascii.Error: Incorrect padding The same string gets decoded without any issues using Perl's MIME::Base64 module or Java. So far Python's base64 module is the only one that fails to decode it. |
|
|
msg318697 - (view) |
Author: Mark Dickinson (mark.dickinson) *  |
Date: 2018-06-04 20:29 |
This doesn't look like a valid base64 string to me: the padding (if present) at the end of the string should be either "=" or "==", never "===". Is the length of the decoded string equal to 58? If so, what's the last character of that decoded string? Whatever it is, it should end up being encoded as "xx==" (for some values of "x"), not as "Q===". |
|
|
msg318701 - (view) |
Author: Dmitry (dniq) |
Date: 2018-06-04 20:45 |
It doesn’t matter how many “=“ characters are appended - it’s always throwing the same exception :( |
|
|
msg318704 - (view) |
Author: Dmitry (dniq) |
Date: 2018-06-04 21:04 |
Apologies: apparently this particular string was given with one character missing in the beginning - should be "AAQAAQDhAAMAAQAAAAAAAAthAAAAJDczODFmZDM2LTNiOTYtNDVmYS04MjQ2LWRkYzJkMmViYjQ2YQ". In this case, the problem is the exception itself: it's not an "incorrect padding" issue. |
|
|
msg318716 - (view) |
Author: R. David Murray (r.david.murray) *  |
Date: 2018-06-04 22:15 |
I always assumed that any string composed of valid base64 characters could be decoded to *something* if you added some padding characters, but apparently that was an invalid assumption. I agree that the message is incorrect for this case. |
|
|
msg318721 - (view) |
Author: Dmitry (dniq) |
Date: 2018-06-04 22:45 |
Ideally there needs to be an option to ignore non-fatal errors. Indeed, pretty much any string should be base64-decodeable. Take a look at Perl's MIME::Base64 - it can decode pretty much any random string (so long as it only contains valid base64 characters). |
|
|
msg318727 - (view) |
Author: Tal Einat (taleinat) *  |
Date: 2018-06-05 08:41 |
A base64-encoded string's length can only have a remainder of 0, 2 or 3, but never 1. This is why the padding you get when encoding can only be '=' or '==' but never '==='. |
|
|
msg318728 - (view) |
Author: Tal Einat (taleinat) *  |
Date: 2018-06-05 08:43 |
Oops: when I wrote "length can only have a remainder of ..." I meant remainder upon division by 4. |
|
|
msg318729 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2018-06-05 08:55 |
I'm not sure that "Invalid length" is better than "Invalid padding". Doesn't it mean the same? |
|
|
msg318733 - (view) |
Author: Tal Einat (taleinat) *  |
Date: 2018-06-05 10:43 |
They're not the same. When the encoded string's length modulu 4 is 2 or 3, there just need to be (at least) 2 or 1 padding characters ('=') for decoding to be successful, due to our decoder being rather strict. Less strict decoders may ignore the missing padding and successfully decode the encoded string. When the remainder is 0, no padding is needed and everything is fine. When the remainder is 1, the encoded string is simply invalid. It is not a padding issue. There is no valid way to decode the encoded string. |
|
|
msg318752 - (view) |
Author: Dmitry (dniq) |
Date: 2018-06-05 13:07 |
I think something like “invalid length” message does make more sense in this case than the misleading “incorrect padding”. It would also be great if there was a way to force it to try its best at decoding the string. :) |
|
|
msg318755 - (view) |
Author: R. David Murray (r.david.murray) *  |
Date: 2018-06-05 13:30 |
I agree, but that would be a separate enhancement PR. The email library would use that capability if it existed...currently it adds padding in a loop trying to do the decode if it gets the invalid padding message. Which of course is going to fail for this case. |
|
|
msg318762 - (view) |
Author: Tal Einat (taleinat) *  |
Date: 2018-06-05 15:00 |
Would an exception message as following be acceptable? "Invalid base64-encoded string: length cannot be 1 more than a multiple of 4" |
|
|
msg318789 - (view) |
Author: Dmitry (dniq) |
Date: 2018-06-05 23:12 |
@taleinat - yes, that does look much better! |
|
|
msg319093 - (view) |
Author: Terry J. Reedy (terry.reedy) *  |
Date: 2018-06-08 18:52 |
Is the new message a clarification, in which case we would typically not backport, or a correction, which we usually would? |
|
|
msg319094 - (view) |
Author: Tal Einat (taleinat) *  |
Date: 2018-06-08 18:58 |
It is more a correction than a clarification. After looking through the module some more, I'm considering using binascii.Incomplete for this case, since it seems appropriate enough that it's worth using it rather than adding another, separate exception. |
|
|
msg319101 - (view) |
Author: R. David Murray (r.david.murray) *  |
Date: 2018-06-08 19:50 |
I think that would move it even more into the realm of a bugfix, then, since code that cared about specific binascii exceptions could be looking for that one already. |
|
|
msg319102 - (view) |
Author: Tal Einat (taleinat) *  |
Date: 2018-06-08 19:55 |
Code using only a2b_base64() would likely not be expecting this exception. We have at least one such example in the stdlib: decode_b() in Lib/email/_encoded_words.py (which needs to be fixed regardless; see #27397). |
|
|
msg319104 - (view) |
Author: Tal Einat (taleinat) *  |
Date: 2018-06-08 20:09 |
PR is ready with an improved exception message and raising binascii.Incomplete rather than binascii.Error. A final review would be welcome! |
|
|
msg319202 - (view) |
Author: Tal Einat (taleinat) *  |
Date: 2018-06-10 07:01 |
New changeset 1b85c71a2136d3fa6a1da05b27b1fe4e4b8ee45e by Tal Einat in branch 'master': bpo-33770: improve base64 exception message for encoded inputs of invalid length (#7416) https://github.com/python/cpython/commit/1b85c71a2136d3fa6a1da05b27b1fe4e4b8ee45e |
|
|
msg319248 - (view) |
Author: Ned Deily (ned.deily) *  |
Date: 2018-06-10 21:37 |
New changeset 053d6c5ce246e6ba9c046467b02a0b6ba4abb8bf by Ned Deily (Miss Islington (bot)) in branch '3.7': bpo-33770: improve base64 exception message for encoded inputs of invalid length (GH-7416) (GH-7602) https://github.com/python/cpython/commit/053d6c5ce246e6ba9c046467b02a0b6ba4abb8bf |
|
|
msg319249 - (view) |
Author: Ned Deily (ned.deily) *  |
Date: 2018-06-10 21:43 |
I backported Tal's fix for 3.7.0rc1. I am less certain about backporting to 3.6 and 2.7 at this stage of their lives but I don't have a strong feeling about it so I'll leave the issue open for that. |
|
|
msg319272 - (view) |
Author: Tal Einat (taleinat) *  |
Date: 2018-06-11 05:26 |
The change is not entirely backward-compatible, so not back-porting before 3.7 seems good to me. IMO this should be closed. |
|
|