bpo-24214: Fixed the UTF-8 and UTF-16 incremental decoders. by serhiy-storchaka · Pull Request #14304 · python/cpython
- The UTF-8 incremental decoder now fails fast if it encounters a sequence that can't be handled by the error handler.
- The UTF-16 incremental decoder with the surrogatepass error handler now decodes a lone low surrogate with final=False.
https://bugs.python.org/issue24214
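The second point can be illustrated with a minimal sketch. It assumes a Python build that already includes this change, and the particular code point (U+DC02) and the `utf-16-le` variant are chosen only for illustration; the variable names are not taken from the patch.

```python
import codecs

# Encode a lone low surrogate; "surrogatepass" lets the encoder emit it.
data = "\udc02".encode("utf-16-le", "surrogatepass")  # two bytes: b'\x02\xdc'

decoder = codecs.getincrementaldecoder("utf-16-le")("surrogatepass")

# Feed one byte at a time without signalling end of input (final=False).
first = decoder.decode(data[:1], False)   # half a code unit: nothing to emit yet
second = decoder.decode(data[1:], False)  # code unit complete: a lone low surrogate

# ascii() keeps the surrogate escaped so it can be printed on any terminal.
print(repr(first), ascii(second))
```

A low surrogate can never start a surrogate pair, so the decoder does not need to wait for more input before handing it to the error handler.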
Is there a test case in test_codecs.py, similar to the one in the wsproto project's tests, that checks for UnicodeDecodeError? With this PR, the test below raises UnicodeDecodeError as in the older behavior, whereas on master it returns 'f'.
    from codecs import getincrementaldecoder

    decoder = getincrementaldecoder("utf-8")()
    print(decoder.decode(b'f\xf1\xf6rd', False))
I was not sure that we should guarantee this behavior. But new tests helped to make the fix more limited.
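The distinction discussed above, between input that is merely truncated and input that can never become valid, can be sketched as follows. This is an illustration rather than the test added by the PR: the `feed` helper is hypothetical, and the error case assumes a Python build that includes this change.

```python
import codecs


def feed(chunks, errors="strict"):
    """Feed byte chunks to a fresh UTF-8 incremental decoder with final=False."""
    decoder = codecs.getincrementaldecoder("utf-8")(errors)
    return [decoder.decode(chunk, False) for chunk in chunks]


# A truncated sequence is buffered until the next chunk completes it:
# b'\xc3\xb6' is the UTF-8 encoding of "ö".
truncated = feed([b"f\xc3", b"\xb6rd"])
print(truncated)

# b'\xf1' starts a four-byte sequence, but b'\xf6' is not a continuation
# byte, so no later input can make this valid. The decoder is expected to
# raise immediately instead of returning 'f' and keeping the rest pending.
try:
    feed([b"f\xf1\xf6rd"])
    failed_fast = False
except UnicodeDecodeError as exc:
    failed_fast = True
    print(exc)
```

Callers that validate streamed input chunk by chunk, as in the wsproto test mentioned above, depend on the error being raised on the chunk that contains the bad bytes.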
Thanks @serhiy-storchaka for the PR 🌮🎉.. I'm working now to backport this PR to: 3.7, 3.8.
🐍🍒⛏🤖
miss-islington pushed a commit to miss-islington/cpython that referenced this pull request
(cherry picked from commit 894263b)
Co-authored-by: Serhiy Storchaka storchaka@gmail.com
miss-islington added a commit that referenced this pull request
(cherry picked from commit 894263b)
Co-authored-by: Serhiy Storchaka storchaka@gmail.com
vstinner pushed a commit that referenced this pull request
bpo-24214: Fixed the UTF-8 and UTF-16 incremental decoders. (GH-14304)
(cherry picked from commit 894263b)
Co-authored-by: Serhiy Storchaka storchaka@gmail.com
ned-deily pushed a commit to ned-deily/cpython that referenced this pull request
…thonGH-14304) (pythonGH-14369)
bpo-24214: Fixed the UTF-8 and UTF-16 incremental decoders. (pythonGH-14304)
(cherry picked from commit 894263b)
Co-authored-by: Serhiy Storchaka storchaka@gmail.com
lisroach pushed a commit to lisroach/cpython that referenced this pull request
DinoV pushed a commit to DinoV/cpython that referenced this pull request