Issue 18572: Remove redundant note about surrogates in string escape doc

Created on 2013-07-27 16:12 by steven.daprano, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Messages (8)
msg193787 - (view)	Author: Steven D'Aprano (steven.daprano) *	Date: 2013-07-27 16:12
The documentation for string escapes suggests that \uxxxx escapes can be used to generate characters in the Supplementary Multilingual Planes by using surrogate pairs:

"Individual code units which form parts of a surrogate pair can be encoded using this escape sequence."

http://docs.python.org/3/reference/lexical_analysis.html#string-and-bytes-literals

E.g. in Python 3.2:

py> '\uD80C\uDC80' == '\U00013080'
True

but that is no longer the case in Python 3.3. I suggest the documentation should just remove that note.
msg193790 - (view)	Author: Terry J. Reedy (terry.reedy) *	Date: 2013-07-27 20:03
3.3.2:

>>> '\uD80C\uDC80' == '\U00013080'
False

The statement that surrogate code units can be encoded this way is still true. Indeed, it is now the only way to get such code units into a string. The suggestion that a pair will make an astral char is now false. The sentence could be changed to

"Individual surrogate code units can be encoded using this escape sequence."

On the other hand, the same is true of any BMP char, including all the other non-graphic chars that can only be entered this way. So I think the sentence, if not deleted, should be replaced by what seems to me a more useful (complete) statement:

"Any Basic Multilingual Plane (BMP) codepoint can be encoded using this escape sequence."
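The distinction drawn above can be checked directly. The following is a minimal sketch, assuming Python 3.3 or later; it only inspects the two literals already quoted in this thread and shows that the \u pair stays as two separate surrogate code points rather than one astral character.

```python
# Assumes Python 3.3+; on a pre-3.3 narrow build the results differ.
pair = '\uD80C\uDC80'     # two \u escapes, each a lone surrogate code point
astral = '\U00013080'     # one \U escape, a single non-BMP code point

print(len(pair))                       # 2
print(len(astral))                     # 1
print(pair == astral)                  # False
print([hex(ord(c)) for c in pair])     # ['0xd80c', '0xdc80']
```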
msg193860 - (view)	Author: R. David Murray (r.david.murray) *	Date: 2013-07-29 12:27
Python 3.2.3 (default, Jun 15 2013, 14:13:52)
[GCC 4.7.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> '\uD80C\uDC80'
'\ud80c\udc80'
>>> '\uD80C\uDC80' == '\U00013080'
False
msg193870 - (view)	Author: Steven D'Aprano (steven.daprano) *	Date: 2013-07-29 15:03
On 29/07/13 22:27, R. David Murray wrote:
> >>> '\uD80C\uDC80' == '\U00013080'
> False

Are you running a wide build? In a narrow build, it returns True.
msg193881 - (view)	Author: R. David Murray (r.david.murray) *	Date: 2013-07-29 16:58
Probably. I think the default build on Gentoo is wide. That seems to make the existing text even more incorrect :)
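One way to tell the two build types apart is to look at sys.maxunicode. This is a rough sketch: build_kind is a hypothetical helper name, not something from the thread, and on Python 3.3 and later the narrow/wide distinction no longer exists, so the value is always 0x10FFFF there.

```python
import sys

def build_kind(maxunicode=sys.maxunicode):
    """Classify string storage from sys.maxunicode (hypothetical helper)."""
    # Narrow builds (before Python 3.3) stored strings as UTF-16 code
    # units, so the largest single "character" was U+FFFF.
    if maxunicode == 0xFFFF:
        return "narrow"
    # Wide builds, and every build since Python 3.3, report 0x10FFFF.
    return "wide or 3.3+"

print(build_kind())
```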
msg194671 - (view)	Author: Ezio Melotti (ezio.melotti) *	Date: 2013-08-08 13:34
I think it's OK to remove the sentence. Converting a surrogate pair to a non-BMP char is something that works only while decoding a UTF-16 byte sequence. Surrogates are invalid in UTF-8/32, and while dealing with Unicode strings, surrogates have no special meaning and are no different from any other codepoint, whether they are lone or paired.
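The point about codecs can be illustrated with a short sketch, assuming Python 3.4 or later (where the 'surrogatepass' error handler also works with the UTF-16 codecs). Strict UTF-8 encoding rejects the surrogates, while a round trip through UTF-16 bytes combines the pair into a single non-BMP code point.

```python
pair = '\uD80C\uDC80'

# Strict UTF-8 refuses to encode surrogate code points.
try:
    pair.encode('utf-8')
    utf8_rejected = False
except UnicodeEncodeError:
    utf8_rejected = True
print(utf8_rejected)

# 'surrogatepass' writes the two surrogates out as UTF-16 code units;
# decoding those bytes as UTF-16 then joins them into one code point.
raw = pair.encode('utf-16-le', 'surrogatepass')
combined = raw.decode('utf-16-le')
print(combined == '\U00013080')
```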
msg264080 - (view)	Author: Roundup Robot (python-dev)	Date: 2016-04-24 00:13
New changeset 79e7808c3941 by Berker Peksag in branch '3.5':
Issue #18572: Remove redundant note about surrogates in string escape doc
https://hg.python.org/cpython/rev/79e7808c3941

New changeset ee815d3535f5 by Berker Peksag in branch 'default':
Issue #18572: Remove redundant note about surrogates in string escape doc
https://hg.python.org/cpython/rev/ee815d3535f5
msg264081 - (view)	Author: Berker Peksag (berker.peksag) *	Date: 2016-04-24 00:14
I removed the sentence in 3.5 and default branches.

History
Date	User	Action	Args
2022-04-11 14:57:48	admin	set	github: 62772
2016-04-24 00:14:43	berker.peksag	set	status: open -> closed; versions: + Python 3.5, Python 3.6, - Python 3.3, Python 3.4; nosy: + berker.peksag; messages: +; resolution: fixed; stage: needs patch -> resolved
2016-04-24 00:13:50	python-dev	set	nosy: + python-dev; messages: +
2013-08-08 13:34:14	ezio.melotti	set	nosy: + ezio.melotti; messages: +
2013-07-29 16:58:04	r.david.murray	set	messages: +
2013-07-29 15:03:29	steven.daprano	set	messages: +
2013-07-29 12:27:06	r.david.murray	set	nosy: + r.david.murray; messages: +
2013-07-27 20:03:56	terry.reedy	set	nosy: + terry.reedy; messages: +
2013-07-27 19:05:45	terry.reedy	set	stage: needs patch; type: behavior; versions: + Python 3.4
2013-07-27 16:12:13	steven.daprano	create