Issue 19424: _warnings: patch to avoid conversions from/to UTF-8 (original) (raw)

Created on 2013-10-28 17:43 by vstinner, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name	Uploaded	Description	Edit
warnings.patch	vstinner,2013-10-28 17:43	review
remove_compiler_warning_pyunicode_compare_with_ascii_string.patch	vajrasky,2013-10-30 02:19	review
issue19424-windows-warnings.diff	zach.ware,2013-10-30 17:15	review

Messages (16)
msg201558 - (view)	Author: STINNER Victor (vstinner) *	Date: 2013-10-28 17:43
Attached patch removes usage of _PyUnicode_AsString() to not convert strings from/to UTF-8.
msg201655 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2013-10-29 18:53
I don't see a benefit from this patch. _PyUnicode_AsString() is very fast in most cases (because source lines are mostly ASCII only). On other hand, the patch replaces one-time _PyUnicode_AsString() by multiple less effective (even for ASCII strings) operations.
msg201657 - (view)	Author: STINNER Victor (vstinner) *	Date: 2013-10-29 19:19
> I don't see a benefit from this patch. Oh, sorry, I forgot to explain the motivation. Performances of the warnings module are not critical module. The motivation here is to avoid to encoding string to UTF-8 for correctness. For example, _PyUnicode_AsString(filename) fails if the filename contains a surrogate character. >>> warnings.warn_explicit("text", RuntimeError, "filename", 5) filename:5: RuntimeError: text >>> warnings.warn_explicit("text", RuntimeError, "filename\udc80", 5) Traceback (most recent call last): File "", line 1, in UnicodeEncodeError: 'utf-8' codec can't encode character '\udc80' in position 8: surrogates not allowed Another example where a string to encoded to UTF-8 and decoded from UTF-8 a few instructions later: PyObject *to_str = PyObject_Str(item); err_str = _PyUnicode_AsString(to_str); ... PyErr_Format(PyExc_RuntimeError, "...%s", err_str); Using "%R" avoids any encoding conversion.
msg201680 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2013-10-29 21:47
Well, it looks reasonable. But an action should be ASCII only string. So perhaps we should first check PyUnicode_IS_ASCII() and then use PyUnicode_1BYTE_DATA() and strcpy().
msg201688 - (view)	Author: Roundup Robot (python-dev)	Date: 2013-10-29 22:32
New changeset 34e166d60f37 by Victor Stinner in branch 'default': Issue #19424: Optimize PyUnicode_CompareWithASCIIString() http://hg.python.org/cpython/rev/34e166d60f37
msg201689 - (view)	Author: STINNER Victor (vstinner) *	Date: 2013-10-29 22:32
"Well, it looks reasonable. But an action should be ASCII only string. So perhaps we should first check PyUnicode_IS_ASCII() and then use PyUnicode_1BYTE_DATA() and strcpy()." Such optimizations in warnings seem overkill, performances are not critical in this module. I optimized PyUnicode_CompareWithASCIIString() instead.
msg201691 - (view)	Author: Roundup Robot (python-dev)	Date: 2013-10-29 22:44
New changeset c7326aa0b69c by Victor Stinner in branch 'default': Issue #19424: Fix the warnings module to accept filename containing surrogate http://hg.python.org/cpython/rev/c7326aa0b69c
msg201692 - (view)	Author: STINNER Victor (vstinner) *	Date: 2013-10-29 22:46
@Serhiy: Thanks for your review. I modified warnings.warn_explicit() to reject types different than str for the filename. I also removed the useless optimization for PyUnicode_Substring() when i=0.
msg201694 - (view)	Author: Roundup Robot (python-dev)	Date: 2013-10-29 22:58
New changeset 05e8dde3229c by Victor Stinner in branch 'default': Issue #19424: Fix test_warnings for locale encoding unable to encode http://hg.python.org/cpython/rev/05e8dde3229c
msg201697 - (view)	Author: STINNER Victor (vstinner) *	Date: 2013-10-29 23:37
Oops, while fixing this issue, I found a new issue related to warnings: I opened issue #19442 "Python crashes when a warning is emitted during shutdown". This issue can now be fixed.
msg201703 - (view)	Author: Vajrasky Kok (vajrasky) *	Date: 2013-10-30 02:19
Victor, I found out that this commit http://hg.python.org/cpython/rev/34e166d60f37 gives me compiler warning. Objects/unicodeobject.c: In function ‘PyUnicode_CompareWithASCIIString’: Objects/unicodeobject.c:10583:22: warning: pointer targets in initialization differ in signedness [-Wpointer-sign] gcc -pthread -c -Wno-unused-result -Werror=declaration-after-statement -g -O0 -Wall -Wstrict-prototypes -I. -IInclude -I./Include -DPy_BUILD_CORE \ Attached the patch to remove the compiler warning.
msg201749 - (view)	Author: Zachary Ware (zach.ware) *	Date: 2013-10-30 17:15
Adding to Vajrasky's report, the same commit also adds 3 warnings when building on Windows: ..\Objects\unicodeobject.c(10588): warning C4018: '>' : signed/unsigned mismatch [P:\Projects\OSS\Python\cpython\PCbuild\pythoncore.vcxproj] ..\Objects\unicodeobject.c(10592): warning C4018: '>' : signed/unsigned mismatch [P:\Projects\OSS\Python\cpython\PCbuild\pythoncore.vcxproj] ..\Objects\unicodeobject.c(10594): warning C4018: '>' : signed/unsigned mismatch [P:\Projects\OSS\Python\cpython\PCbuild\pythoncore.vcxproj] Vajrasky's patch doesn't fix the Windows warnings (but doesn't create any new ones either); the attached patch does (but likely doesn't do anything for the other). I don't know nearly enough C to say whether my patch is any good or not, just enough to say it compiles, doesn't break anything immediately obvious, and removes the warnings on Windows.
msg201752 - (view)	Author: Roundup Robot (python-dev)	Date: 2013-10-30 17:27
New changeset 52ec6a3eeda5 by Victor Stinner in branch 'default': Issue #19424: Fix a compiler warning http://hg.python.org/cpython/rev/52ec6a3eeda5
msg202013 - (view)	Author: Vajrasky Kok (vajrasky) *	Date: 2013-11-03 11:55
Py_ssize_t is signed long. size_it is unsigned long. In this case, I suppose we should avoid unsigned as much as possible in comparison with signed. So I think Zachary's patch is reasonable. What do you think, Victor?
msg202022 - (view)	Author: Roundup Robot (python-dev)	Date: 2013-11-03 12:56
New changeset 2ed8d500e113 by Victor Stinner in branch 'default': Issue #19424: Fix a compiler warning on comparing signed/unsigned size_t http://hg.python.org/cpython/rev/2ed8d500e113
msg202113 - (view)	Author: Roundup Robot (python-dev)	Date: 2013-11-04 10:29
New changeset 494f736f5945 by Victor Stinner in branch 'default': Issue #19424: PyUnicode_CompareWithASCIIString() normalizes memcmp() result http://hg.python.org/cpython/rev/494f736f5945

History
Date	User	Action	Args
2022-04-11 14:57:52	admin	set	github: 63623
2013-11-04 10:29:09	python-dev	set	messages: +
2013-11-03 12:56:27	python-dev	set	messages: +
2013-11-03 11:55:25	vajrasky	set	messages: +
2013-10-30 17:27:24	python-dev	set	messages: +
2013-10-30 17:15:12	zach.ware	set	files: + issue19424-windows-warnings.diffnosy: + zach.waremessages: +
2013-10-30 02:19:29	vajrasky	set	files: + remove_compiler_warning_pyunicode_compare_with_ascii_string.patchnosy: + vajraskymessages: +
2013-10-29 23:38:04	Arfrever	set	stage: resolved
2013-10-29 23:37:08	vstinner	set	status: open -> closedresolution: fixedmessages: +
2013-10-29 22:58:35	python-dev	set	messages: +
2013-10-29 22:46:47	vstinner	set	messages: +
2013-10-29 22:44:44	python-dev	set	messages: +
2013-10-29 22:32:58	vstinner	set	messages: +
2013-10-29 22:32:22	python-dev	set	nosy: + python-devmessages: +
2013-10-29 21:47:00	serhiy.storchaka	set	messages: +
2013-10-29 19:19:01	vstinner	set	messages: +
2013-10-29 18:53:23	serhiy.storchaka	set	nosy: + serhiy.storchakamessages: +
2013-10-28 23:10:23	Arfrever	set	nosy: + Arfrever
2013-10-28 17:43:22	vstinner	create