Issue 23088: Document that PyUnicode_AsUTF8() returns a null-terminated string

Created on 2014-12-19 04:42 by martin.panter, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Files
File name Uploaded Description
utf8-null.patch martin.panter, 2014-12-19 04:42
utf8-null.v2.patch martin.panter, 2015-03-10 10:47
utf8-null.v3.patch martin.panter, 2015-03-12 00:51
utf8-null.v4.patch martin.panter, 2015-03-31 04:11
utf8-null.v5.patch martin.panter, 2015-05-13 12:08
Messages (15)
msg232925 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2014-12-19 04:42
As discussed in , and later confirmed in the code
msg233028 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2014-12-22 23:00
This looks good to me.
msg236878 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-02-28 11:33
Maybe mention that the result of PyUnicode_AsUTF8() can contain null bytes? And the same for PyBytes_AS_STRING()/PyBytes_AsString()?
msg237743 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-03-10 10:47
Posting a new patch that says that the NUL is always appended for both Unicode and Bytes, and explicitly says that internal NULs are allowed.
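For readers who want to see the two properties described above (a null byte appended after the data, and embedded nulls left in place), here is a minimal sketch. It assumes CPython, where `ctypes.pythonapi` exposes the C API, and it calls PyUnicode_AsUTF8AndSize() rather than PyUnicode_AsUTF8() so that the size is available. The helper name `utf8_with_terminator` is made up for illustration and is not part of any patch on this issue.

```python
import ctypes

# Assumption: running on CPython, where ctypes.pythonapi exposes the C API.
_as_utf8_and_size = ctypes.pythonapi.PyUnicode_AsUTF8AndSize
_as_utf8_and_size.argtypes = [ctypes.py_object, ctypes.POINTER(ctypes.c_ssize_t)]
# Use a raw pointer as the result type; c_char_p would stop at the first null byte.
_as_utf8_and_size.restype = ctypes.c_void_p


def utf8_with_terminator(text):
    """Hypothetical helper, for illustration only.

    Returns a tuple of:
      - the UTF-8 buffer plus the one byte that follows it,
      - the size reported by the C API (terminator not counted),
      - what a C caller relying on the first null byte would see.
    """
    size = ctypes.c_ssize_t()
    ptr = _as_utf8_and_size(text, ctypes.byref(size))
    # Copy size + 1 bytes so the trailing null byte is included.
    whole = ctypes.string_at(ptr, size.value + 1)
    # Without an explicit size, string_at() reads up to the first null byte,
    # which mimics strlen()-style handling in C.
    c_view = ctypes.string_at(ptr)
    return whole, size.value, c_view


whole, size, c_view = utf8_with_terminator("ab\0cd")
print(whole, size, c_view)
```

The gap between `whole` and `c_view` is the reason for the documentation warning: the buffer is always null-terminated, but code that treats the first null byte as the end will silently truncate strings that contain one.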
msg237747 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-03-10 11:16
There are other functions that return null-terminated data: PyByteArray_AsString(), PyBytes_AsStringAndSize(), PyUnicode_AS_UNICODE(), PyUnicode_AsUCS4Copy(), PyUnicode_AsUnicode(), PyUnicode_AsUnicodeAndSize(), PyUnicode_AsWideCharString(), and maybe more. See also examples of notes about embedded null characters. And for consistency with all other documentation this should be written as "null byte/character", not NUL.
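As an illustration of one function from this list, the sketch below exercises PyBytes_AsStringAndSize() through `ctypes`, again assuming CPython. It relies on the documented behavior that when the length argument is NULL the function refuses bytes objects with embedded null bytes. The helper names `bytes_with_terminator` and `as_c_string` are hypothetical.

```python
import ctypes

# Assumption: running on CPython, where ctypes.pythonapi exposes the C API.
_as_string_and_size = ctypes.pythonapi.PyBytes_AsStringAndSize
_as_string_and_size.argtypes = [
    ctypes.py_object,
    ctypes.POINTER(ctypes.c_void_p),   # char **buffer
    ctypes.POINTER(ctypes.c_ssize_t),  # Py_ssize_t *length (may be NULL)
]
_as_string_and_size.restype = ctypes.c_int


def bytes_with_terminator(data):
    """Hypothetical helper: copy the internal buffer plus the byte after it."""
    buf = ctypes.c_void_p()
    length = ctypes.c_ssize_t()
    _as_string_and_size(data, ctypes.byref(buf), ctypes.byref(length))
    return ctypes.string_at(buf.value, length.value + 1)


def as_c_string(data):
    """Hypothetical helper: request the buffer without a length (length == NULL).

    In this mode the C API raises an exception if the data contains an
    embedded null byte; ctypes.pythonapi propagates that exception.
    """
    buf = ctypes.c_void_p()
    _as_string_and_size(data, ctypes.byref(buf), None)
    return ctypes.string_at(buf.value)


print(bytes_with_terminator(b"ab\x00cd"))
print(as_c_string(b"abc"))
```

Passing a length pointer is the safer pattern when the data may contain null bytes, because the caller then never depends on the terminator to find the end of the buffer.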
msg237750 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2015-03-10 11:27
Serhiy Storchaka added the comment:
> And for consistency with all other documentation this should be written as "null byte/character", not NUL.

Agreed!
msg237751 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-03-10 11:29
Yes, and for agreement with Victor. ;-)
msg237909 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-03-12 00:51
Posting a new patch updating the documentation for some of the extra functions Serhiy mentioned. Also changed references of “NUL”, “nul” and “0” characters to “null”. I’m not very familiar with Python’s C API, so I am mainly relying on what you guys say without much of my own verification. But if there are other related doc fixes you can think of, I’m happy to include them. The PyUnicode_AsWideCharString() function already seems to document null termination well enough, so I did not change it. Let me know if you had a specific change in mind.
msg239661 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-03-31 04:11
utf8-null.v4.patch:
* Clarified some mentions of “string” and “character” as bytes or code points
* Copied the warning about embedded nulls to PyUnicode_AS_UNICODE()
msg239670 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-03-31 05:49
The patch LGTM, but someone else should look at it. David, could you please take a look?
msg242418 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2015-05-02 17:53
Added some review comments.
msg243076 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-05-13 12:08
Thanks for looking at this David. I am posting utf8-null.v5.patch, which tweaks some of the wording.
msg243132 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2015-05-14 00:32
New changeset 99d2f83290c0 by R David Murray in branch '3.4':
#23088: Clarify null termination of bytes and strings in C API.
https://hg.python.org/cpython/rev/99d2f83290c0

New changeset 863f7c57081b by R David Murray in branch 'default':
Merge: #23088: Clarify null termination of bytes and strings in C API.
https://hg.python.org/cpython/rev/863f7c57081b
msg243134 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2015-05-14 00:35
Oh, I just realized I committed this without checking how it rendered...
msg243135 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2015-05-14 00:41
OK, I didn't see anything obvious at least :) Thanks, Martin.
History
Date User Action Args
2022-04-11 14:58:11 admin set github: 67277
2015-05-14 00:41:12 r.david.murray set status: open -> closed; resolution: fixed; messages: +; stage: commit review -> resolved
2015-05-14 00:35:21 r.david.murray set messages: +
2015-05-14 00:32:33 python-dev set nosy: + python-dev; messages: +
2015-05-13 12:08:49 martin.panter set files: + utf8-null.v5.patch; messages: +
2015-05-02 17:53:42 r.david.murray set messages: +
2015-03-31 05:49:03 serhiy.storchaka set nosy: + r.david.murray; messages: +
2015-03-31 04:11:11 martin.panter set files: + utf8-null.v4.patch; messages: +
2015-03-12 00:51:22 martin.panter set files: + utf8-null.v3.patch
2015-03-12 00:51:09 martin.panter set messages: +
2015-03-10 11:29:31 serhiy.storchaka set messages: +
2015-03-10 11:27:57 vstinner set messages: +
2015-03-10 11:16:40 serhiy.storchaka set messages: +
2015-03-10 10:47:52 martin.panter set files: + utf8-null.v2.patch; messages: +
2015-02-28 11:33:06 serhiy.storchaka set nosy: + serhiy.storchaka; messages: +
2014-12-22 23:00:53 pitrou set versions: + Python 3.5; nosy: + pitrou; messages: +; type: behavior; stage: commit review
2014-12-20 22:33:25 vstinner set nosy: + vstinner
2014-12-19 04:42:34 martin.panter create