msg201558 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2013-10-28 17:43 |
Attached patch removes usage of _PyUnicode_AsString() to not convert strings from/to UTF-8. |
|
|
msg201655 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2013-10-29 18:53 |
I don't see a benefit from this patch. _PyUnicode_AsString() is very fast in most cases (because source lines are mostly ASCII only). On other hand, the patch replaces one-time _PyUnicode_AsString() by multiple less effective (even for ASCII strings) operations. |
|
|
msg201657 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2013-10-29 19:19 |
> I don't see a benefit from this patch. Oh, sorry, I forgot to explain the motivation. Performances of the warnings module are not critical module. The motivation here is to avoid to encoding string to UTF-8 for correctness. For example, _PyUnicode_AsString(filename) fails if the filename contains a surrogate character. >>> warnings.warn_explicit("text", RuntimeError, "filename", 5) filename:5: RuntimeError: text >>> warnings.warn_explicit("text", RuntimeError, "filename\udc80", 5) Traceback (most recent call last): File "", line 1, in UnicodeEncodeError: 'utf-8' codec can't encode character '\udc80' in position 8: surrogates not allowed Another example where a string to encoded to UTF-8 and decoded from UTF-8 a few instructions later: PyObject *to_str = PyObject_Str(item); err_str = _PyUnicode_AsString(to_str); ... PyErr_Format(PyExc_RuntimeError, "...%s", err_str); Using "%R" avoids any encoding conversion. |
|
|
msg201680 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2013-10-29 21:47 |
Well, it looks reasonable. But an action should be ASCII only string. So perhaps we should first check PyUnicode_IS_ASCII() and then use PyUnicode_1BYTE_DATA() and strcpy(). |
|
|
msg201688 - (view) |
Author: Roundup Robot (python-dev)  |
Date: 2013-10-29 22:32 |
New changeset 34e166d60f37 by Victor Stinner in branch 'default': Issue #19424: Optimize PyUnicode_CompareWithASCIIString() http://hg.python.org/cpython/rev/34e166d60f37 |
|
|
msg201689 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2013-10-29 22:32 |
"Well, it looks reasonable. But an action should be ASCII only string. So perhaps we should first check PyUnicode_IS_ASCII() and then use PyUnicode_1BYTE_DATA() and strcpy()." Such optimizations in warnings seem overkill, performances are not critical in this module. I optimized PyUnicode_CompareWithASCIIString() instead. |
|
|
msg201691 - (view) |
Author: Roundup Robot (python-dev)  |
Date: 2013-10-29 22:44 |
New changeset c7326aa0b69c by Victor Stinner in branch 'default': Issue #19424: Fix the warnings module to accept filename containing surrogate http://hg.python.org/cpython/rev/c7326aa0b69c |
|
|
msg201692 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2013-10-29 22:46 |
@Serhiy: Thanks for your review. I modified warnings.warn_explicit() to reject types different than str for the filename. I also removed the useless optimization for PyUnicode_Substring() when i=0. |
|
|
msg201694 - (view) |
Author: Roundup Robot (python-dev)  |
Date: 2013-10-29 22:58 |
New changeset 05e8dde3229c by Victor Stinner in branch 'default': Issue #19424: Fix test_warnings for locale encoding unable to encode http://hg.python.org/cpython/rev/05e8dde3229c |
|
|
msg201697 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2013-10-29 23:37 |
Oops, while fixing this issue, I found a new issue related to warnings: I opened issue #19442 "Python crashes when a warning is emitted during shutdown". This issue can now be fixed. |
|
|
msg201703 - (view) |
Author: Vajrasky Kok (vajrasky) * |
Date: 2013-10-30 02:19 |
Victor, I found out that this commit http://hg.python.org/cpython/rev/34e166d60f37 gives me compiler warning. Objects/unicodeobject.c: In function ‘PyUnicode_CompareWithASCIIString’: Objects/unicodeobject.c:10583:22: warning: pointer targets in initialization differ in signedness [-Wpointer-sign] gcc -pthread -c -Wno-unused-result -Werror=declaration-after-statement -g -O0 -Wall -Wstrict-prototypes -I. -IInclude -I./Include -DPy_BUILD_CORE \ Attached the patch to remove the compiler warning. |
|
|
msg201749 - (view) |
Author: Zachary Ware (zach.ware) *  |
Date: 2013-10-30 17:15 |
Adding to Vajrasky's report, the same commit also adds 3 warnings when building on Windows: ..\Objects\unicodeobject.c(10588): warning C4018: '>' : signed/unsigned mismatch [P:\Projects\OSS\Python\cpython\PCbuild\pythoncore.vcxproj] ..\Objects\unicodeobject.c(10592): warning C4018: '>' : signed/unsigned mismatch [P:\Projects\OSS\Python\cpython\PCbuild\pythoncore.vcxproj] ..\Objects\unicodeobject.c(10594): warning C4018: '>' : signed/unsigned mismatch [P:\Projects\OSS\Python\cpython\PCbuild\pythoncore.vcxproj] Vajrasky's patch doesn't fix the Windows warnings (but doesn't create any new ones either); the attached patch does (but likely doesn't do anything for the other). I don't know nearly enough C to say whether my patch is any good or not, just enough to say it compiles, doesn't break anything immediately obvious, and removes the warnings on Windows. |
|
|
msg201752 - (view) |
Author: Roundup Robot (python-dev)  |
Date: 2013-10-30 17:27 |
New changeset 52ec6a3eeda5 by Victor Stinner in branch 'default': Issue #19424: Fix a compiler warning http://hg.python.org/cpython/rev/52ec6a3eeda5 |
|
|
msg202013 - (view) |
Author: Vajrasky Kok (vajrasky) * |
Date: 2013-11-03 11:55 |
Py_ssize_t is signed long. size_it is unsigned long. In this case, I suppose we should avoid unsigned as much as possible in comparison with signed. So I think Zachary's patch is reasonable. What do you think, Victor? |
|
|
msg202022 - (view) |
Author: Roundup Robot (python-dev)  |
Date: 2013-11-03 12:56 |
New changeset 2ed8d500e113 by Victor Stinner in branch 'default': Issue #19424: Fix a compiler warning on comparing signed/unsigned size_t http://hg.python.org/cpython/rev/2ed8d500e113 |
|
|
msg202113 - (view) |
Author: Roundup Robot (python-dev)  |
Date: 2013-11-04 10:29 |
New changeset 494f736f5945 by Victor Stinner in branch 'default': Issue #19424: PyUnicode_CompareWithASCIIString() normalizes memcmp() result http://hg.python.org/cpython/rev/494f736f5945 |
|
|