Issue 18572: Remove redundant note about surrogates in string escape doc

Created on 2013-07-27 16:12 by steven.daprano, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Messages (8)
msg193787 - Author: Steven D'Aprano (steven.daprano) * (Python committer) Date: 2013-07-27 16:12
The documentation for string escapes suggests that \uxxxx escapes can be used to generate characters in the Supplementary Multilingual Planes by using surrogate pairs:

"Individual code units which form parts of a surrogate pair can be encoded using this escape sequence."

http://docs.python.org/3/reference/lexical_analysis.html#string-and-bytes-literals

E.g. in Python 3.2:

py> '\uD80C\uDC80' == '\U00013080'
True

but that is no longer the case in Python 3.3. I suggest the documentation should just remove that note.
msg193790 - Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2013-07-27 20:03
3.3.2:

>>> '\uD80C\uDC80' == '\U00013080'
False

The statement that surrogate code units can be encoded this way is still true. Indeed, it is now the only way to get such code units into a string. The suggestion that a pair will make an astral char is now false. The sentence could be changed to

"Individual surrogate code units can be encoded using this escape sequence."

On the other hand, the same is true of *any* BMP char, including all the *other* non-graphic chars that can only be entered this way. So I think the sentence, if not deleted, should be replaced by what seems to me a more useful (complete) statement.

"Any Basic Multilingual Plane (BMP) codepoint can be encoded using this escape sequence."
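For readers following along, here is a minimal sketch of the behaviour described in this message, assuming Python 3.3 or later. The helper name code_points is made up for illustration and is not part of any Python API.

```python
# Assumes Python 3.3 or later, where narrow/wide builds no longer exist.
pair = '\uD80C\uDC80'      # two \u escapes: a high and a low surrogate
astral = '\U00013080'      # one \U escape: a single non-BMP code point


def code_points(s):
    """Return the code points of s as 'U+XXXX' strings (hypothetical helper)."""
    return ['U+%04X' % ord(ch) for ch in s]


print(pair == astral)       # False
print(code_points(pair))    # ['U+D80C', 'U+DC80']
print(code_points(astral))  # ['U+13080']

# \uxxxx works for any BMP code point, not only for surrogates.
print('\u0041' == 'A')      # True
```

The two surrogates stay two separate code points in the string, which is why the comparison is False.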
msg193860 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2013-07-29 12:27
Python 3.2.3 (default, Jun 15 2013, 14:13:52)
[GCC 4.7.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> '\uD80C\uDC80'
'\ud80c\udc80'
>>> '\uD80C\uDC80' == '\U00013080'
False
msg193870 - Author: Steven D'Aprano (steven.daprano) * (Python committer) Date: 2013-07-29 15:03
On 29/07/13 22:27, R. David Murray wrote:
> >>> '\uD80C\uDC80' == '\U00013080'
> False

Are you running a wide build? In a narrow build, it returns True.
msg193881 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2013-07-29 16:58
Probably. I think the default build on Gentoo is wide. That seems to make the existing text even more incorrect :)
msg194671 - Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2013-08-08 13:34
I think it's OK to remove the sentence. Converting a surrogate pair to a non-BMP char is something that works only while decoding a UTF-16 byte sequence. Surrogates are invalid in UTF-8/32, and while dealing with Unicode strings, surrogates have no special meaning and are no different from any other codepoint, whether they are lone or paired.
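The codec behaviour described above can be checked with a short sketch. It assumes Python 3.4 or later, since it relies on the 'surrogatepass' error handler being accepted by the UTF-16 codec. The helper utf8_accepts is a hypothetical name used only for this illustration.

```python
# Assumes Python 3.4+ ('surrogatepass' support in the UTF-16 codecs).
pair = '\uD80C\uDC80'      # high + low surrogate as two separate code points
astral = '\U00013080'      # the non-BMP character that pair would encode


def utf8_accepts(s):
    """Return True if s can be encoded as strict UTF-8 (hypothetical helper)."""
    try:
        s.encode('utf-8')
    except UnicodeEncodeError:
        return False
    return True


print(utf8_accepts(pair))    # False: surrogates are not allowed in UTF-8
print(utf8_accepts(astral))  # True

# Writing the surrogates out as raw UTF-16 code units and decoding them
# again is the step that combines the pair into one code point.
raw = pair.encode('utf-16-le', 'surrogatepass')
merged = raw.decode('utf-16-le')
print(merged == astral)      # True
print(len(merged))           # 1
```

Inside the str object the pair has no special meaning; only the round trip through UTF-16 bytes turns it into U+13080.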
msg264080 - Author: Roundup Robot (python-dev) (Python triager) Date: 2016-04-24 00:13
New changeset 79e7808c3941 by Berker Peksag in branch '3.5':
Issue #18572: Remove redundant note about surrogates in string escape doc
https://hg.python.org/cpython/rev/79e7808c3941

New changeset ee815d3535f5 by Berker Peksag in branch 'default':
Issue #18572: Remove redundant note about surrogates in string escape doc
https://hg.python.org/cpython/rev/ee815d3535f5
msg264081 - Author: Berker Peksag (berker.peksag) * (Python committer) Date: 2016-04-24 00:14
I removed the sentence in the 3.5 and default branches.
History
Date User Action Args
2022-04-11 14:57:48 admin set github: 62772
2016-04-24 00:14:43 berker.peksag set status: open -> closed; versions: + Python 3.5, Python 3.6, - Python 3.3, Python 3.4; nosy: + berker.peksag; messages: +; resolution: fixed; stage: needs patch -> resolved
2016-04-24 00:13:50 python-dev set nosy: + python-dev; messages: +
2013-08-08 13:34:14 ezio.melotti set nosy: + ezio.melotti; messages: +
2013-07-29 16:58:04 r.david.murray set messages: +
2013-07-29 15:03:29 steven.daprano set messages: +
2013-07-29 12:27:06 r.david.murray set nosy: + r.david.murray; messages: +
2013-07-27 20:03:56 terry.reedy set nosy: + terry.reedy; messages: +
2013-07-27 19:05:45 terry.reedy set stage: needs patch; type: behavior; versions: + Python 3.4
2013-07-27 16:12:13 steven.daprano create