changeset 96042:863f7c57081b
Merge: #23088: Clarify null termination of bytes and strings in C API. [#23088]
author | R David Murray <rdmurray@bitdance.com> |
---|---|
date | Wed, 13 May 2015 20:32:19 -0400 |
parents | 1e1bb3eb6f93, 99d2f83290c0 |
children | d56a941865fb |
files | Doc/c-api/unicode.rst |
diffstat | 3 files changed, 44 insertions(+), 31 deletions(-): Doc/c-api/bytearray.rst 3, Doc/c-api/bytes.rst 32, Doc/c-api/unicode.rst 40 |
--- a/Doc/c-api/bytearray.rst
+++ b/Doc/c-api/bytearray.rst
@@ -64,7 +64,8 @@ Direct API functions
 .. c:function:: char* PyByteArray_AsString(PyObject *bytearray)
    Return the contents of bytearray as a char array after checking for a
 .. c:function:: int PyByteArray_Resize(PyObject *bytearray, Py_ssize_t len)
--- a/Doc/c-api/bytes.rst
+++ b/Doc/c-api/bytes.rst
@@ -69,8 +69,8 @@ called with a non-bytes parameter.
    +===================+===============+================================+
    | :attr:`%%`        | n/a           | The literal % character.       |
    +-------------------+---------------+--------------------------------+
+   | :attr:`%c`        | int           | A single byte,                 |
+   |                   |               | represented as a C int.        |
    +-------------------+---------------+--------------------------------+
    | :attr:`%d`        | int           | Exactly equivalent to          |
    |                   |               | ``printf("%d")``.              |
@@ -109,7 +109,7 @@ called with a non-bytes parameter.
    +-------------------+---------------+--------------------------------+
    An unrecognized format character causes all the rest of the format string to be
 .. c:function:: PyObject* PyBytes_FromFormatV(const char *format, va_list vargs)
@@ -136,11 +136,13 @@ called with a non-bytes parameter.
 .. c:function:: char* PyBytes_AsString(PyObject *o)
-   Return a NUL-terminated representation of the contents of o. The pointer
-   refers to the internal buffer of o, not a copy. The data must not be
-   modified in any way, unless the string was just created using
+   Return a pointer to the contents of o. The pointer
+   refers to the internal buffer of o, which consists of ``len(o) + 1``
+   bytes. The last byte in the buffer is always null, regardless of
+   whether there are any other null bytes. The data must not be
+   modified in any way, unless the object was just created using
    ``PyBytes_FromStringAndSize(NULL, size)``. It must not be deallocated. If
@@ -151,16 +153,18 @@ called with a non-bytes parameter.
 .. c:function:: int PyBytes_AsStringAndSize(PyObject *obj, char **buffer, Py_ssize_t *length)
+   Return the null-terminated contents of the object obj
    through the output variables buffer and length.
+   If length is NULL, the bytes object
+   may not contain embedded null bytes;
    if it does, the function returns ``-1`` and a :exc:`TypeError` is raised.
-   The buffer refers to an internal string buffer of obj, not a copy. The data
-   must not be modified in any way, unless the string was just created using
+   The buffer refers to an internal buffer of obj, which includes an
+   additional null byte at the end (not counted in length). The data
+   must not be modified in any way, unless the object was just created using
    ``PyBytes_FromStringAndSize(NULL, size)``. It must not be deallocated. If
+   obj is not a bytes object at all, :c:func:`PyBytes_AsStringAndSize`
    returns ``-1`` and raises :exc:`TypeError`.
@@ -168,14 +172,14 @@ called with a non-bytes parameter.
    Create a new bytes object in *bytes containing the contents of newpart
    appended to bytes; the caller will own the new reference. The reference to
+   the old value of bytes will be stolen. If the new object cannot be created,
    the old reference to bytes will still be discarded and the value of
    *bytes will be set to NULL; the appropriate exception will be set.
 .. c:function:: void PyBytes_ConcatAndDel(PyObject **bytes, PyObject *newpart)
+   Create a new bytes object in *bytes containing the contents of newpart
    appended to bytes. This version decrements the reference count of newpart.
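To make the revised wording for PyBytes_AsString and PyBytes_AsStringAndSize concrete, here is a minimal sketch that calls both functions from Python through ctypes.pythonapi rather than from a C extension. It assumes CPython (other implementations may not expose these symbols), and the variable names are illustrative only, not part of the changeset.

```python
import ctypes

# Assumption: CPython, where ctypes.pythonapi exposes the C API symbols.
api = ctypes.pythonapi

api.PyBytes_AsString.argtypes = [ctypes.py_object]
api.PyBytes_AsString.restype = ctypes.c_void_p  # raw address of the internal buffer

api.PyBytes_AsStringAndSize.argtypes = [
    ctypes.py_object,
    ctypes.POINTER(ctypes.c_void_p),
    ctypes.POINTER(ctypes.c_ssize_t),
]
api.PyBytes_AsStringAndSize.restype = ctypes.c_int

data = b"ab\x00cd"  # a bytes object with an embedded null byte
addr = api.PyBytes_AsString(data)  # valid only while `data` stays alive

# The internal buffer holds len(data) + 1 bytes; the last one is always null.
whole = ctypes.string_at(addr, len(data) + 1)

# Treating the same pointer as a C string stops at the first null byte.
as_c_string = ctypes.string_at(addr)

# With a non-NULL length argument, embedded nulls are accepted and the
# reported length excludes the extra trailing null byte.
buf = ctypes.c_void_p()
n = ctypes.c_ssize_t()
rc = api.PyBytes_AsStringAndSize(data, ctypes.byref(buf), ctypes.byref(n))

print(whole, as_c_string, rc, n.value)
```

The sized read returns all six bytes, including the trailing null, while the unsized read stops at the embedded null. That second case is the truncation that most C string functions would exhibit.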
--- a/Doc/c-api/unicode.rst
+++ b/Doc/c-api/unicode.rst
@@ -227,7 +227,10 @@ access internal read-only data of Unicod
                 const char* PyUnicode_AS_DATA(PyObject *o)
    Return a pointer to a :c:type:`Py_UNICODE` representation of the object. The
+   returned buffer is always terminated with an extra null code point. It
+   may also contain embedded null code points, which would cause the string
+   to be truncated when used in most C functions. The ``AS_DATA`` form
+   casts the pointer to :c:type:`const char *`. The o argument
    has to be a Unicode object (not checked).
    .. versionchanged:: 3.3
@@ -650,7 +653,8 @@ APIs:
    Copy the string u into a new UCS4 buffer that is allocated using
    :c:func:`PyMem_Malloc`. If this fails, NULL is returned with a
+   :exc:`MemoryError` set. The returned buffer always has an extra
+   null code point appended.
    .. versionadded:: 3.3
@@ -689,8 +693,9 @@ 3.x, but need to be aware that their use
    Return a read-only pointer to the Unicode object's internal
    :c:type:`Py_UNICODE` buffer, or NULL on error. This will create the
    :c:type:`Py_UNICODE*` representation of the object if it is not yet
-   available. Note that the resulting :c:type:`Py_UNICODE` string may contain
-   embedded null characters, which would cause the string to be truncated when
+   available. The buffer is always terminated with an extra null code point.
+   Note that the resulting :c:type:`Py_UNICODE` string may also contain
+   embedded null code points, which would cause the string to be truncated when
    used in most C functions.
    Please migrate to using :c:func:`PyUnicode_AsUCS4`,
@@ -708,8 +713,9 @@ 3.x, but need to be aware that their use
 .. c:function:: Py_UNICODE* PyUnicode_AsUnicodeAndSize(PyObject *unicode, Py_ssize_t *size)
    Like :c:func:`PyUnicode_AsUnicode`, but also saves the :c:func:`Py_UNICODE`
-   array length in size. Note that the resulting :c:type:`Py_UNICODE*` string
-   may contain embedded null characters, which would cause the string to be
+   array length (excluding the extra null terminator) in size.
+   Note that the resulting :c:type:`Py_UNICODE*` string
+   may contain embedded null code points, which would cause the string to be
    truncated when used in most C functions.
    .. versionadded:: 3.3
@@ -717,11 +723,11 @@ 3.x, but need to be aware that their use
 .. c:function:: Py_UNICODE* PyUnicode_AsUnicodeCopy(PyObject *unicode)
+   Create a copy of a Unicode string ending with a null code point. Return NULL
    and raise a :exc:`MemoryError` exception on memory allocation failure,
    otherwise return a new allocated buffer (use :c:func:`PyMem_Free` to free
    the buffer). Note that the resulting :c:type:`Py_UNICODE*` string may
+   contain embedded null code points, which would cause the string to be
    truncated when used in most C functions.
    .. versionadded:: 3.2
@@ -902,10 +908,10 @@ wchar_t Support
    Copy the Unicode object contents into the :c:type:`wchar_t` buffer w. At most
    size :c:type:`wchar_t` characters are copied (excluding a possibly trailing
+   null termination character). Return the number of :c:type:`wchar_t` characters
    copied or -1 in case of an error. Note that the resulting :c:type:`wchar_t*`
-   string may or may not be 0-terminated. It is the responsibility of the caller
-   to make sure that the :c:type:`wchar_t*` string is 0-terminated in case this is
+   string may or may not be null-terminated. It is the responsibility of the caller
+   to make sure that the :c:type:`wchar_t*` string is null-terminated in case this is
    required by the application. Also, note that the :c:type:`wchar_t*` string
    might contain null characters, which would cause the string to be truncated
    when used with most C functions.
@@ -914,8 +920,8 @@ wchar_t Support
 .. c:function:: wchar_t* PyUnicode_AsWideCharString(PyObject *unicode, Py_ssize_t *size)
    Convert the Unicode object to a wide character string. The output string
-   always ends with a nul character. If size is not NULL, write the number
-   of wide characters (excluding the trailing 0-termination character) into
+   always ends with a null character. If size is not NULL, write the number
+   of wide characters (excluding the trailing null termination character) into
    *size.
    Returns a buffer allocated by :c:func:`PyMem_Alloc` (use
@@ -1045,9 +1051,11 @@ These are the UTF-8 codec APIs:
 .. c:function:: char* PyUnicode_AsUTF8AndSize(PyObject *unicode, Py_ssize_t *size)
-   Return a pointer to the default encoding (UTF-8) of the Unicode object, and
-   store the size of the encoded representation (in bytes) in size. size
-   can be NULL, in this case no size will be stored.
+   Return a pointer to the UTF-8 encoding of the Unicode object, and
+   store the size of the encoded representation (in bytes) in size. The
+   size argument can be NULL; in this case no size will be stored. The
+   returned buffer always has an extra null byte appended (not included in
+   size), regardless of whether there are any other null code points.
    In the case of an error, NULL is returned with an exception set and no size
    is stored.
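The new PyUnicode_AsUTF8AndSize wording (size excludes the extra null byte, and embedded null code points are allowed) can be checked with a short sketch along the same lines. It again assumes CPython and reaches the function through ctypes.pythonapi; the sample string and variable names are illustrative only.

```python
import ctypes

# Assumption: CPython, where ctypes.pythonapi exposes the C API symbols.
api = ctypes.pythonapi
api.PyUnicode_AsUTF8AndSize.argtypes = [
    ctypes.py_object,
    ctypes.POINTER(ctypes.c_ssize_t),
]
api.PyUnicode_AsUTF8AndSize.restype = ctypes.c_void_p  # raw address of the UTF-8 buffer

text = "h\u00e9\x00!"  # 'h', 'e' with acute accent, an embedded null code point, '!'
size = ctypes.c_ssize_t()
addr = api.PyUnicode_AsUTF8AndSize(text, ctypes.byref(size))  # valid while `text` is alive

# `size` counts the encoded bytes only, not the extra trailing null byte.
encoded = ctypes.string_at(addr, size.value)

# Reading one byte further shows the appended null terminator.
with_terminator = ctypes.string_at(addr, size.value + 1)

# A plain C-string read stops early at the embedded null.
truncated = ctypes.string_at(addr)

print(size.value, encoded, with_terminator, truncated)
```

Because the string contains an embedded null, only the explicit size recovers the full encoding; code that relies on the terminator alone sees just the first three bytes.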