bpo-24214: Fixed the UTF-8 and UTF-16 incremental decoders. by serhiy-storchaka · Pull Request #14304 · python/cpython

The UTF-8 incremental decoder now fails fast if it encounters
a sequence that can't be handled by the error handler.
The UTF-16 incremental decoder with the surrogatepass error
handler now decodes a lone low surrogate with final=False.

https://bugs.python.org/issue24214
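
The UTF-16 part of the change can be illustrated with a small sketch that uses only the standard `codecs` API. The constant name `LONE_LOW_SURROGATE` and the choice of the `utf-16-le` codec are assumptions made for illustration. Whether the character comes out of the first call or only at the final flush depends on whether the interpreter includes this fix, so the sketch records both values instead of assuming one.

```python
import codecs

# U+DC00 is a low surrogate; in UTF-16-LE it is the two bytes 00 DC.
# (Illustrative constant, not taken from the PR.)
LONE_LOW_SURROGATE = b"\x00\xdc"

# One-shot decoding with surrogatepass passes the lone surrogate through.
one_shot = codecs.decode(LONE_LOW_SURROGATE, "utf-16-le", "surrogatepass")

# Incremental decoding: feed the bytes with final=False, then flush.
decoder = codecs.getincrementaldecoder("utf-16-le")(errors="surrogatepass")
early = decoder.decode(LONE_LOW_SURROGATE, False)  # versions without the fix may buffer here
flushed = decoder.decode(b"", True)                # final=True flushes anything still buffered

# ascii() avoids writing a raw surrogate to stdout, which may not be encodable.
print(ascii(one_shot), ascii(early), ascii(flushed))
```

With the fix applied, `early` should already hold the surrogate and `flushed` should be empty. In either case the concatenation of the two matches the one-shot result.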

Is there a test case in test_codecs.py, similar to the one in the wsproto project's tests, that checks for UnicodeDecodeError? With this PR, the test below raises UnicodeDecodeError as in the older behavior, whereas on master it returns 'f'.

from codecs import getincrementaldecoder

decoder = getincrementaldecoder("utf-8")()
print(decoder.decode(b'f\xf1\xf6rd', False))

I was not sure that we should guarantee this behavior, but the new tests helped to narrow the scope of the fix.
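
For context, here is a minimal sketch of the two situations being distinguished, using only the standard `codecs` API. The helper name `try_decode` is hypothetical. Which branch is taken for the invalid input depends on the interpreter version, so the sketch reports the outcome rather than assuming one.

```python
import codecs


def try_decode(chunk, final=False):
    """Feed one chunk to a fresh UTF-8 incremental decoder and report the outcome.

    `try_decode` is a hypothetical helper for illustration, not part of CPython.
    """
    decoder = codecs.getincrementaldecoder("utf-8")()  # default errors="strict"
    try:
        return ("decoded", decoder.decode(chunk, final))
    except UnicodeDecodeError as exc:
        return ("error", exc.reason)


# A truncated but still valid prefix is buffered until more bytes arrive.
decoder = codecs.getincrementaldecoder("utf-8")()
first = decoder.decode(b"\xe2\x82", False)   # first two bytes of U+20AC
second = decoder.decode(b"\xac", False)      # last byte completes the character

# b"\xf1" starts a four-byte sequence, but b"\xf6" is not a continuation byte,
# so this input can never become valid no matter what follows.
outcome = try_decode(b"f\xf1\xf6rd", final=False)
print(repr(first), repr(second), outcome)
```

The first case is why an incremental decoder cannot simply raise on every incomplete sequence. The second case is the one from the comment above, where failing fast is possible because no further input could repair the sequence.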

Thanks @serhiy-storchaka for the PR 🌮🎉.. I'm working now to backport this PR to: 3.7, 3.8.
🐍🍒⛏🤖

miss-islington pushed a commit to miss-islington/cpython that referenced this pull request

Jun 25, 2019

…-14304)

The UTF-8 incremental decoder now fails fast if it encounters a sequence that can't be handled by the error handler.
The UTF-16 incremental decoder with the surrogatepass error handler now decodes a lone low surrogate with final=False. (cherry picked from commit 894263b)

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>

miss-islington added a commit that referenced this pull request

Jun 25, 2019

vstinner pushed a commit that referenced this pull request

Jun 25, 2019

…-14304) (GH-14369)

bpo-24214: Fixed the UTF-8 and UTF-16 incremental decoders. (GH-14304)

ned-deily pushed a commit to ned-deily/cpython that referenced this pull request

Jul 2, 2019

…thonGH-14304) (pythonGH-14369)

bpo-24214: Fixed the UTF-8 and UTF-16 incremental decoders. (pythonGH-14304)

lisroach pushed a commit to lisroach/cpython that referenced this pull request

Sep 10, 2019

…-14304)

DinoV pushed a commit to DinoV/cpython that referenced this pull request

Jan 14, 2020

…-14304)

Labels

type-bug

An unexpected behavior, bug, or error