Issue 23088: Document that PyUnicode_AsUTF8() returns a null-terminated string

Created on 2014-12-19 04:42 by martin.panter, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Files
File name	Uploaded	Description	Edit
utf8-null.patch	martin.panter,2014-12-19 04:42	review
utf8-null.v2.patch	martin.panter,2015-03-10 10:47	review
utf8-null.v3.patch	martin.panter,2015-03-12 00:51	review
utf8-null.v4.patch	martin.panter,2015-03-31 04:11	review
utf8-null.v5.patch	martin.panter,2015-05-13 12:08	review

Messages (15)
msg232925 - (view)	Author: Martin Panter (martin.panter) *	Date: 2014-12-19 04:42
As discussed in , and later confirmed in the code
msg233028 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2014-12-22 23:00
This looks good to me.
msg236878 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2015-02-28 11:33
Maybe mention that the result of PyUnicode_AsUTF8() can contain null bytes? And the same for PyBytes_AS_STRING()/PyBytes_AsString()?
msg237743 - (view)	Author: Martin Panter (martin.panter) *	Date: 2015-03-10 10:47
Posting a new patch that says that the NUL is always appended for both Unicode and Bytes, and explicitly says that internal NULs are allowed.
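The two properties described here (a null byte is always appended, and embedded nulls are allowed) can be observed from Python itself. The following is only a minimal sketch, assuming CPython and using ctypes.pythonapi to call PyUnicode_AsUTF8AndSize(); the sample string and the variable names are hypothetical and not taken from the patch.

```python
import ctypes

# Assumption: running on CPython, where ctypes.pythonapi exposes the C API.
api = ctypes.pythonapi
api.PyUnicode_AsUTF8AndSize.restype = ctypes.c_void_p
api.PyUnicode_AsUTF8AndSize.argtypes = [
    ctypes.py_object,
    ctypes.POINTER(ctypes.c_ssize_t),
]

text = "ab\0cd"  # hypothetical sample with an embedded null character
size = ctypes.c_ssize_t()
ptr = api.PyUnicode_AsUTF8AndSize(text, ctypes.byref(size))

# Reading the buffer as a C string stops at the first null byte.
c_view = ctypes.string_at(ptr)

# Reading exactly `size` bytes recovers the whole UTF-8 encoding.
full_view = ctypes.string_at(ptr, size.value)

# The byte just past the reported size is the appended null terminator.
terminator = ctypes.string_at(ptr + size.value, 1)
```

Here c_view holds only b"ab", while full_view holds all five bytes, which is why C code that may see embedded nulls should rely on the returned size rather than on strlen().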
msg237747 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2015-03-10 11:16
There are other functions that return null-terminated data: PyByteArray_AsString(), PyBytes_AsStringAndSize(), PyUnicode_AS_UNICODE(), PyUnicode_AsUCS4Copy(), PyUnicode_AsUnicode(), PyUnicode_AsUnicodeAndSize(), PyUnicode_AsWideCharString(), and maybe more. See also examples of notes about embedded null characters. And for consistency with all other documentation this should be written as "null byte/character", not NUL.
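For one of the functions listed, PyBytes_AsStringAndSize(), the behaviour can be sketched as follows. This is an illustration only, assuming CPython 3.5 or later (where an embedded null with a NULL length argument raises ValueError) and using ctypes.pythonapi; the sample data and variable names are hypothetical.

```python
import ctypes

# Assumption: running on CPython 3.5+, called through ctypes.pythonapi.
api = ctypes.pythonapi
api.PyBytes_AsStringAndSize.restype = ctypes.c_int
api.PyBytes_AsStringAndSize.argtypes = [
    ctypes.py_object,
    ctypes.POINTER(ctypes.c_void_p),   # stands in for char **buffer
    ctypes.POINTER(ctypes.c_ssize_t),  # Py_ssize_t *length, may be NULL
]

data = b"ab\0cd"  # hypothetical sample with an embedded null byte
buf = ctypes.c_void_p()
length = ctypes.c_ssize_t()

# With a length pointer, embedded nulls are accepted and the size is reported.
api.PyBytes_AsStringAndSize(data, ctypes.byref(buf), ctypes.byref(length))

# Read one byte past the reported length to include the appended null.
whole = ctypes.string_at(buf.value, length.value + 1)

# With length == NULL the caller can only treat the result as a C string,
# so a bytes object containing an embedded null is rejected.
try:
    api.PyBytes_AsStringAndSize(data, ctypes.byref(buf), None)
    rejected = False
except ValueError:
    rejected = True
```

The extra trailing byte in whole is the terminator; it is not counted in length.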
msg237750 - (view)	Author: STINNER Victor (vstinner) *	Date: 2015-03-10 11:27
Serhiy Storchaka added the comment:
> And for consistency with all other documentation this should be written as "null byte/character", not NUL.

Agreed!
msg237751 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2015-03-10 11:29
Yes, and for agreement with Victor. ;-)
msg237909 - (view)	Author: Martin Panter (martin.panter) *	Date: 2015-03-12 00:51
Posting a new patch updating the documentation for some of the extra functions Serhiy mentioned. Also changed references of “NUL”, “nul” and “0” characters to “null”. I’m not very familiar with Python’s C API, so I am mainly relying on what you guys say without much of my own verification. But if there are other related doc fixes you can think of, I’m happy to include them. The PyUnicode_AsWideCharString() function already seems to document null termination well enough, so I did not change it. Let me know if you had a specific change in mind.
msg239661 - (view)	Author: Martin Panter (martin.panter) *	Date: 2015-03-31 04:11
utf8-null.v4.patch:
* Clarified some mentions of “string” and “character” as bytes or code points
* Copied the warning about embedded nulls to PyUnicode_AS_UNICODE()
msg239670 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2015-03-31 05:49
The patch LGTM, but someone else should look at it. David, could you please take a look?
msg242418 - (view)	Author: R. David Murray (r.david.murray) *	Date: 2015-05-02 17:53
Added some review comments.
msg243076 - (view)	Author: Martin Panter (martin.panter) *	Date: 2015-05-13 12:08
Thanks for looking at this David. I am posting utf8-null.v5.patch, which tweaks some of the wording.
msg243132 - (view)	Author: Roundup Robot (python-dev)	Date: 2015-05-14 00:32
New changeset 99d2f83290c0 by R David Murray in branch '3.4': #23088: Clarify null termination of bytes and strings in C API. https://hg.python.org/cpython/rev/99d2f83290c0
New changeset 863f7c57081b by R David Murray in branch 'default': Merge: #23088: Clarify null termination of bytes and strings in C API. https://hg.python.org/cpython/rev/863f7c57081b
msg243134 - (view)	Author: R. David Murray (r.david.murray) *	Date: 2015-05-14 00:35
Oh, I just realized I committed this without checking how it rendered...
msg243135 - (view)	Author: R. David Murray (r.david.murray) *	Date: 2015-05-14 00:41
OK, I didn't see anything obvious at least :) Thanks, Martin.

History
Date	User	Action	Args
2022-04-11 14:58:11	admin	set	github: 67277
2015-05-14 00:41:12	r.david.murray	set	status: open -> closed; resolution: fixed; messages: +; stage: commit review -> resolved
2015-05-14 00:35:21	r.david.murray	set	messages: +
2015-05-14 00:32:33	python-dev	set	nosy: + python-dev; messages: +
2015-05-13 12:08:49	martin.panter	set	files: + utf8-null.v5.patch; messages: +
2015-05-02 17:53:42	r.david.murray	set	messages: +
2015-03-31 05:49:03	serhiy.storchaka	set	nosy: + r.david.murray; messages: +
2015-03-31 04:11:11	martin.panter	set	files: + utf8-null.v4.patch; messages: +
2015-03-12 00:51:22	martin.panter	set	files: + utf8-null.v3.patch
2015-03-12 00:51:09	martin.panter	set	messages: +
2015-03-10 11:29:31	serhiy.storchaka	set	messages: +
2015-03-10 11:27:57	vstinner	set	messages: +
2015-03-10 11:16:40	serhiy.storchaka	set	messages: +
2015-03-10 10:47:52	martin.panter	set	files: + utf8-null.v2.patch; messages: +
2015-02-28 11:33:06	serhiy.storchaka	set	nosy: + serhiy.storchaka; messages: +
2014-12-22 23:00:53	pitrou	set	versions: + Python 3.5; nosy: + pitrou; messages: +; type: behavior; stage: commit review
2014-12-20 22:33:25	vstinner	set	nosy: + vstinner
2014-12-19 04:42:34	martin.panter	create