msg181465 - (view) |
Author: Jan Lachnitt (pepalogik) |
Date: 2013-02-05 17:32 |
Python 3.3 64-bit seems to compile one of my files incorrectly. Specifically, os.path.isdir returns True for a nonexistent folder. The important point is that the code works correctly when it is executed step by step in pdb.

Python version: Python 3.3.0 (v3.3.0:bd8afb90ebf2, Sep 29 2012, 10:57:17) [MSC v.1600 64 bit (AMD64)] on win32
OS: Windows 8 64-bit

The code works fine in Python 3.2.3 32-bit on Windows XP.

My project is quite complex and it interacts with other software packages. I tried to make a reduced test case but I could not reproduce the problem this way. What files do you need for processing this bug report? Will e.g. the source file in question and the corresponding compiled file (*.pyc) be enough? Or should I upload the whole project here, along with the instructions on how to run it? |
|
|
msg181481 - (view) |
Author: Mark Dickinson (mark.dickinson) *  |
Date: 2013-02-05 20:36 |
Reminds me of this question on StackOverflow: http://stackoverflow.com/questions/14135846/string-concatenation-with-python-33-isdir-always-returns-true-3-hours-head |
|
|
msg181482 - (view) |
Author: Georg Brandl (georg.brandl) *  |
Date: 2013-02-05 20:55 |
The SO post is scary. Maybe a non-normalized (smallest representation) PEP393 string is escaping into the wild? |
|
|
msg181488 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2013-02-05 22:01 |
On Windows, os.path.isdir calls nt._isdir(). Core of this C function:

wchar_t *wpath = PyUnicode_AsUnicode(po);
if (wpath == NULL)
    return NULL;
attributes = GetFileAttributesW(wpath);
if (attributes == INVALID_FILE_ATTRIBUTES)
    Py_RETURN_FALSE;
...

Can you please try to call nt._isdir() directly? Can you also compare with stat.S_ISDIR(os.stat(fn).st_mode)?

If the problem is something with the implementation of Unicode, it would be interesting to try to get the content of the string using:

* print(ascii(path.encode("unicode_internal"))) # should be the result of PyUnicode_AsUnicode(), which is cached
* print(ascii(path.encode("utf-8"))) |
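A minimal sketch of the comparison asked for above, written so that it runs on any platform. The helper names isdir_via_stat and compare_isdir are made up for this illustration, and nt._isdir() is left out because it only exists on Windows. On a healthy interpreter both checks should agree.

```python
import os
import stat
import tempfile


def isdir_via_stat(path):
    """Return True if os.stat() reports a directory, False if the path is missing."""
    try:
        return stat.S_ISDIR(os.stat(path).st_mode)
    except OSError:
        # os.stat() raises for a nonexistent path instead of returning False.
        return False


def compare_isdir(path):
    """Return the answers of both checks so that a disagreement is visible."""
    return {"os.path.isdir": os.path.isdir(path), "stat": isdir_via_stat(path)}


with tempfile.TemporaryDirectory() as parent:
    existing = compare_isdir(parent)
    missing = compare_isdir(os.path.join(parent, "bulk"))  # never created
    print(existing)
    print(missing)
```

If the two answers differ for the same path, the path string itself (rather than the file system) is the first thing to inspect.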
|
|
msg181509 - (view) |
Author: Jan Lachnitt (pepalogik) |
Date: 2013-02-06 12:24 |
Here is a part of my code (with some comments added):

for struct in ini_structures:
    dirname = wrkdir+os.sep+struct.name
    if not os.path.isdir(dirname): # This works fine. If the directory doesn't exist,...
        try:
            os.mkdir(dirname) # ... it is created here.
        except OSError:
            raise AutoLEEDError('Cannot create directory "'+dirname+'".')
    dirname += os.sep+'bulk' # This defines a subdirectory.
    if not os.path.isdir(dirname): ## Though it doesn't exist, os.path.isdir returns True,...
        try:
            os.mkdir(dirname) # ... so it is not created here.
        except OSError:
            raise AutoLEEDError('Cannot create directory "'+dirname+'".')
    fn = dirname+os.sep+'cluster.i' # This defines a filename.
    print('Writing file "'+fn+'"...')
    straos = struct.write_bulk_cluster(fn,at_rad) # Here it fails (cannot write to file).

According to Victor's post, I have inserted these lines before the line marked ## (and added necessary imports):

print('dirname =', dirname)
print('os.path.isdir(dirname) =', os.path.isdir(dirname))
print('nt._isdir(dirname) =', nt._isdir(dirname))
print('stat.S_ISDIR(os.stat(dirname).st_mode) =', stat.S_ISDIR(os.stat(dirname).st_mode))
print(ascii(dirname.encode("unicode_internal")))
print(ascii(dirname.encode("utf-8")))

Here is the output of these lines (that directory really does not exist but its parent directory does):

dirname = D:\Bug reports\Python\AutoLEED\default\sub-fcc\bulk
os.path.isdir(dirname) = True
nt._isdir(dirname) = True
stat.S_ISDIR(os.stat(dirname).st_mode) = True
b'D\x00:\x00\\\x00B\x00u\x00g\x00 \x00r\x00e\x00p\x00o\x00r\x00t\x00s\x00\\\x00P\x00y\x00t\x00h\x00o\x00n\x00\\\x00A\x00u\x00t\x00o\x00L\x00E\x00E\x00D\x00\\\x00d\x00e\x00f\x00a\x00u\x00l\x00t\x00\\\x00s\x00u\x00b\x00-\x00f\x00c\x00c\x00\x00\x002\x00\x03\x00\x00\x00\x00\x00'
b'D:\\Bug reports\\Python\\AutoLEED\\default\\sub-fcc\\bulk'

Yeah, the result of ascii(dirname.encode("unicode_internal")) seems to be wrong (at the end). |
|
|
msg181516 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2013-02-06 13:19 |
I'm interested in your struct.name string: can you also dump it? Where does it come from? Does it use ctypes?

* print(ascii(struct.name))
* print(ascii(struct.name.encode("unicode_internal")))
* print(ascii(struct.name.encode("utf-8")))

I'm interested in all variables used to build the final path.

nt._isdir() doesn't check if the path contains a NUL character. It should: see also #13617.

> b'D\x00:\x00\\\x00B\x00u\x00g\x00 \x00r\x00e\x00p\x00o\x00r\x00t\x00s\x00\\\x00P\x00y\x00t\x00h\x00o\x00n\x00\\\x00A\x00u\x00t\x00o\x00L\x00E\x00E\x00D\x00\\\x00d\x00e\x00f\x00a\x00u\x00l\x00t\x00\\\x00s\x00u\x00b\x00-\x00f\x00c\x00c\x00\x00\x002\x00\x03\x00\x00\x00\x00\x00'

Decoded from UTF-16-LE, it gives: 'D:\\Bug reports\\Python\\AutoLEED\\default\\sub-fcc\x002\x03\x00\x00'

> b'D:\\Bug reports\\Python\\AutoLEED\\default\\sub-fcc\\bulk'

Decoded from UTF-8, it gives: 'D:\\Bug reports\\Python\\AutoLEED\\default\\sub-fcc\\bulk'

It looks like the wstr representation of the string is corrupted. |
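The decoding step above can be reproduced without the unicode_internal codec. The sketch below rebuilds the healthy part of the dump from the path text and appends the ten trailing bytes copied from the report. The variable names, and the split at the first NUL used to imitate a consumer of a NUL-terminated wchar_t string, are assumptions made for illustration.

```python
# Healthy prefix of the dump, rebuilt from the path text.
prefix = "D:\\Bug reports\\Python\\AutoLEED\\default\\sub-fcc".encode("utf-16-le")
# Trailing bytes copied from the reported dump (five 16-bit code units).
tail = b"\x00\x002\x00\x03\x00\x00\x00\x00\x00"

wide = (prefix + tail).decode("utf-16-le")
# A C function taking a NUL-terminated wchar_t* stops at the first NUL.
c_view = wide.split("\x00", 1)[0]
expected = "D:\\Bug reports\\Python\\AutoLEED\\default\\sub-fcc\\bulk"

print(ascii(wide))
print(ascii(c_view))
print(len(wide) == len(expected))
```

The wide copy has the right length but the last five code units are garbage starting with NUL, so the truncated view names the parent directory. Since that directory exists, this would explain why the directory checks reported True.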
|
|
msg181538 - (view) |
Author: Jan Lachnitt (pepalogik) |
Date: 2013-02-06 16:07 |
print(ascii(struct.name))
print(ascii(struct.name.encode("unicode_internal")))
print(ascii(struct.name.encode("utf-8")))

produces:

'sub-fcc'
b's\x00u\x00b\x00-\x00f\x00c\x00c\x00'
b'sub-fcc'

and that looks correct.

struct.name originally comes from an ini-file:

cp = configparser.ConfigParser(interpolation=None)
try:
    cp.read(filename)
...

The ini-file is encoded in pure ASCII (while my Python sources are in UTF-8 with the identification bytes at the beginning of the file). struct.name is the name of a section in this file, as provided by cp.sections() . The name gets through several objects. I am not pasting all the relevant code pieces here because there are too many relevant pieces but they do nothing special (just passing and copying the name). I do not use ctypes.

wrkdir is generated from inp_file_name, which is 'default.ini', by this statement:

wrkdir = os.path.splitext(os.path.abspath(inp_file_name))[0]

BTW, ascii(dirname.encode("unicode_internal")) result is different in this run:

b'D\x00:\x00\\\x00B\x00u\x00g\x00 \x00r\x00e\x00p\x00o\x00r\x00t\x00s\x00\\\x00P\x00y\x00t\x00h\x00o\x00n\x00\\\x00A\x00u\x00t\x00o\x00L\x00E\x00E\x00D\x00\\\x00d\x00e\x00f\x00a\x00u\x00l\x00t\x00\\\x00s\x00u\x00b\x00-\x00f\x00c\x00c\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00' |
|
|
msg181575 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2013-02-06 21:45 |
It would really help if you can write a short script reproducing the problem. Can you reproduce the problem with Python 3.2 on Windows 8, or with Python 3.3 on Windows XP or 7? |
|
|
msg181596 - (view) |
Author: Jan Lachnitt (pepalogik) |
Date: 2013-02-07 12:26 |
Knowing that the problem is related to the internal representation of the strings, I have written a short script which reproduces the problem. It is this simple:

import os
name = 'sub-fcc'
wrkdir = 'D:\\Bug reports\\Python\\test'
dirname = wrkdir+os.sep+name
print(dirname)
print(ascii(dirname.encode("unicode_internal")))
dirname += os.sep+'bulk'
print(dirname)
print(ascii(dirname.encode("unicode_internal")))

Output:

D:\Bug reports\Python\test\sub-fcc
b'D\x00:\x00\\\x00B\x00u\x00g\x00 \x00r\x00e\x00p\x00o\x00r\x00t\x00s\x00\\\x00P\x00y\x00t\x00h\x00o\x00n\x00\\\x00t\x00e\x00s\x00t\x00\\\x00s\x00u\x00b\x00-\x00f\x00c\x00c\x00'
D:\Bug reports\Python\test\sub-fcc\bulk
b'D\x00:\x00\\\x00B\x00u\x00g\x00 \x00r\x00e\x00p\x00o\x00r\x00t\x00s\x00\\\x00P\x00y\x00t\x00h\x00o\x00n\x00\\\x00t\x00e\x00s\x00t\x00\\\x00s\x00u\x00b\x00-\x00f\x00c\x00c\x00\x00\x00\x00\x00\xd8\xa3\x90\x02\x00\x00'

The end of the output varies from run to run.

It works correctly in Python 3.2.3 64-bit on Windows 8. |
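A check in the same spirit as the script above can be written for interpreters where the unicode_internal codec is not available (it was deprecated and later removed). This sketch swaps in ctypes.create_unicode_buffer to obtain a wchar_t copy of the string before and after the concatenation. That is a substitution made for illustration, not the method used in this report, and it is not claimed to trigger the 3.3.0 bug. The helper name wide_copy is hypothetical.

```python
import ctypes
import os


def wide_copy(text):
    """Copy text into a wchar_t buffer; return (text read back up to NUL, buffer length)."""
    buf = ctypes.create_unicode_buffer(text)
    return buf.value, len(buf)


name = "sub-fcc"
wrkdir = "D:\\Bug reports\\Python\\test"
dirname = wrkdir + os.sep + name
first = wide_copy(dirname)       # wide copy requested before the in-place concatenation
dirname += os.sep + "bulk"
second = wide_copy(dirname)      # wide copy requested again after the string has grown

print(first)
print(second)
```

For this ASCII-only path the value read back should equal the Python string, and the buffer should be one element longer than the string because of the terminating NUL.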
|
|
msg181602 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2013-02-07 13:11 |
> It works correctly in Python 3.2.3 64-bit on Windows 8.

Can you reproduce the issue on other Windows versions? |
|
|
msg181611 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2013-02-07 14:38 |
Can you try the following commands to get the size in bytes of the wchar_t type?

>>> import ctypes
>>> ctypes.sizeof(ctypes.c_wchar)
4

You can also use _PyObject_Dump() to dump your string:

>>> import ctypes
>>> x="abc"
>>> _PyObject_Dump=ctypes.pythonapi._PyObject_Dump
>>> _PyObject_Dump.argtypes=(ctypes.py_object,)
>>> _PyObject_Dump(x)
object : 'abc'
type : str
refcount: 5
address : 0xb70bf980
48

Then you can use _PyObject_Dump() on your string. You may also try: print(list(dirname)).

It's really strange that something as common as string concatenation returns an invalid string. |
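The wchar_t size matters when reading the dumps in this thread: with 2-byte units each character appears as two bytes, with 4-byte units as four. A small sketch that picks a matching codec for the current platform follows. The names WIDE_CODEC and decode_wide_dump are invented for this example, and errors="replace" is an added precaution because a corrupted tail may not be valid UTF-16 or UTF-32.

```python
import ctypes
import sys

WCHAR_SIZE = ctypes.sizeof(ctypes.c_wchar)  # size of wchar_t in bytes on this platform
SUFFIX = "le" if sys.byteorder == "little" else "be"
WIDE_CODEC = ("utf-16-" if WCHAR_SIZE == 2 else "utf-32-") + SUFFIX


def decode_wide_dump(raw):
    """Decode a raw wchar_t dump produced on this platform into a str."""
    # Invalid units (for example in a corrupted tail) become U+FFFD.
    return raw.decode(WIDE_CODEC, errors="replace")


sample = "sub-fcc".encode(WIDE_CODEC)
print(WCHAR_SIZE, WIDE_CODEC, len(sample))
print(decode_wide_dump(sample))
```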
|
|
msg181617 - (view) |
Author: Mark Dickinson (mark.dickinson) *  |
Date: 2013-02-07 15:04 |
I don't think this is just Windows; I see similarly odd results on OS X. The first encode call gives expected results; the second ends in garbage.

Python 3.4.0a0 (default:eb0370d4686c+, Feb 7 2013, 14:59:41)
[GCC 4.2.1 (Apple Inc. build 5664)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> dir1 = "D:\\Bug reports\\Python\\test\\sub-fcc"
[66291 refs, 23475 blocks]
>>> dir1 += "\\bulk"
[66291 refs, 23474 blocks]
>>> ascii(dir1.encode('unicode_internal'))
"b'D\\x00\\x00\\x00:\\x00\\x00\\x00\\\\\\x00\\x00\\x00B\\x00\\x00\\x00u\\x00\\x00\\x00g\\x00\\x00\\x00 \\x00\\x00\\x00r\\x00\\x00\\x00e\\x00\\x00\\x00p\\x00\\x00\\x00o\\x00\\x00\\x00r\\x00\\x00\\x00t\\x00\\x00\\x00s\\x00\\x00\\x00\\\\\\x00\\x00\\x00P\\x00\\x00\\x00y\\x00\\x00\\x00t\\x00\\x00\\x00h\\x00\\x00\\x00o\\x00\\x00\\x00n\\x00\\x00\\x00\\\\\\x00\\x00\\x00t\\x00\\x00\\x00e\\x00\\x00\\x00s\\x00\\x00\\x00t\\x00\\x00\\x00\\\\\\x00\\x00\\x00s\\x00\\x00\\x00u\\x00\\x00\\x00b\\x00\\x00\\x00-\\x00\\x00\\x00f\\x00\\x00\\x00c\\x00\\x00\\x00c\\x00\\x00\\x00\\\\\\x00\\x00\\x00b\\x00\\x00\\x00u\\x00\\x00\\x00l\\x00\\x00\\x00k\\x00\\x00\\x00'"
[69015 refs, 24925 blocks]
>>> dir1 += "\\bulk"
[69015 refs, 24925 blocks]
>>> ascii(dir1.encode('unicode_internal'))
"b'D\\x00\\x00\\x00:\\x00\\x00\\x00\\\\\\x00\\x00\\x00B\\x00\\x00\\x00u\\x00\\x00\\x00g\\x00\\x00\\x00 \\x00\\x00\\x00r\\x00\\x00\\x00e\\x00\\x00\\x00p\\x00\\x00\\x00o\\x00\\x00\\x00r\\x00\\x00\\x00t\\x00\\x00\\x00s\\x00\\x00\\x00\\\\\\x00\\x00\\x00P\\x00\\x00\\x00y\\x00\\x00\\x00t\\x00\\x00\\x00h\\x00\\x00\\x00o\\x00\\x00\\x00n\\x00\\x00\\x00\\\\\\x00\\x00\\x00t\\x00\\x00\\x00e\\x00\\x00\\x00s\\x00\\x00\\x00t\\x00\\x00\\x00\\\\\\x00\\x00\\x00s\\x00\\x00\\x00u\\x00\\x00\\x00b\\x00\\x00\\x00-\\x00\\x00\\x00f\\x00\\x00\\x00c\\x00\\x00\\x00c\\x00\\x00\\x00\\\\\\x00\\x00\\x00b\\x00\\x00\\x00u\\x00\\x00\\x00l\\x00\\x00\\x00k\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\xfb\\xfb\\xfb\\xfb\\xfb\\xfb\\xfb\\xfb\\x00\\x00\\x00\\x00\\x00\\x015\\x1a'"
[69015 refs, 24925 blocks] |
|
|
msg181619 - (view) |
Author: Jan Lachnitt (pepalogik) |
Date: 2013-02-07 15:19 |
On Windows XP 32-bit: 3.2.3 works, 3.3.0 fails. |
|
|
msg181621 - (view) |
Author: Florent Xicluna (flox) *  |
Date: 2013-02-07 15:41 |
Confirmed on OSX 64bits with Mark's sample.

$ python3.3
Python 3.3.0 (default, Jan 24 2013, 08:28:09)
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> dir1 = "D:\\Bug reports\\Python\\test\\sub-fcc"
>>> dir1 += "\\bulk"
>>> s1 = ascii(dir1.encode('unicode_internal'))
>>> dir1 += "\\bulk"
>>> s1 = ascii(dir1.encode('unicode_internal'))
>>> len(s1), s1[499:]
(586, "00k\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\xf0\\xbda\\x00\\x01\\x00\\x00\\x00X\\x1da\\x00\\x01\\x00\\x00\\x00'")
>>> dir1 = "D:\\Bug reports\\Python\\test\\sub-fcc"
>>> dir1 += "\\bulk"
>>> s1 = ascii(dir1.encode('unicode_internal'))
>>> dir1 += "\\bulk"
>>> s1 = ascii(dir1.encode('unicode_internal'))
>>> len(s1), s1[499:]
(586, "00k\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x10\\xbca\\x00\\x01\\x00\\x00\\x00X\\x16a\\x00\\x01\\x00\\x00\\x00'")
>>> dir1 = "D:\\Bug reports\\Python\\test\\sub-fcc"
>>> dir1 += "\\bulk"
>>> s1 = ascii(dir1.encode('unicode_internal'))
>>> dir1 += "\\bulk"
>>> s1 = ascii(dir1.encode('unicode_internal'))
>>> len(s1), s1[499:]
(595, "00k\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00'")
>>> dir1 = "D:\\Bug reports\\Python\\test\\sub-fcc"
>>> dir1 += "\\bulk"
>>> s1 = ascii(dir1.encode('unicode_internal'))
>>> dir1 += "\\bulk"
>>> s1 = ascii(dir1.encode('unicode_internal'))
>>> len(s1), s1[499:]
(586, "00k\\x00\\x00\\x00\\x00\\x00\\x00\\x00p\\xbba\\x00\\x01\\x00\\x00\\x00\\x88\\x14a\\x00\\x01\\x00\\x00\\x00'")
>>>

Darwin Kernel Version 10.8.0: Tue Jun 7 16:32:41 PDT 2011; root:xnu-1504.15.3~1/RELEASE_X86_64 x86_64 |
|
|
msg181622 - (view) |
Author: Jan Lachnitt (pepalogik) |
Date: 2013-02-07 15:44 |
...
print(ctypes.sizeof(ctypes.c_wchar))
_PyObject_Dump=ctypes.pythonapi._PyObject_Dump
_PyObject_Dump.argtypes=(ctypes.py_object,)
print(_PyObject_Dump(dirname))
print(list(dirname))

in Python 3.3.0 64-bit on Windows 8 produces:

2
object : 'D:\\Bug reports\\Python\\test\\sub-fcc\\bulk'
type : str
refcount: 3
address : 00000000028AC298
54
['D', ':', '\\', 'B', 'u', 'g', ' ', 'r', 'e', 'p', 'o', 'r', 't', 's', '\\', 'P', 'y', 't', 'h', 'o', 'n', '\\', 't', 'e', 's', 't', '\\', 's', 'u', 'b', '-', 'f', 'c', 'c', '\\', 'b', 'u', 'l', 'k'] |
|
|
msg181625 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2013-02-07 16:19 |
OK, it's a bug in the function that resizes a compact Unicode string, resize_compact(): the wstr field is not updated to the new size. The attached patch should fix it. The bug was introduced by me in Python 3.3.

I don't think that it's possible to resize the wstr buffer instead of freeing it: it will not be refilled by PyUnicode_AsUnicodeAndSize() if wstr is not NULL.

An alternative is to create a new string (instead of using realloc) if wstr is not NULL. |
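The mechanism described above can be pictured with a toy model in pure Python. This is not CPython's code: the class ToyCompactStr and its methods are invented for illustration, and the list used as a "wide copy" only stands in for the cached wstr buffer. It shows why a cache that is filled only when empty must be dropped when the string is resized.

```python
class ToyCompactStr:
    """Toy stand-in for a string object with a lazily cached wide copy."""

    def __init__(self, text):
        self.text = text
        self.wstr = None  # cached wide-character copy, built on first request

    def as_wide(self):
        # The cache is only filled when it is empty, so a stale cache
        # is handed back unchanged.
        if self.wstr is None:
            self.wstr = list(self.text) + ["\0"]
        return self.wstr

    def resize_buggy(self, new_text):
        self.text = new_text  # cache left untouched: it is now stale

    def resize_fixed(self, new_text):
        self.text = new_text
        self.wstr = None  # drop the cache so it is rebuilt at the new size


buggy = ToyCompactStr("sub-fcc")
buggy.as_wide()                      # populate the cache before resizing
buggy.resize_buggy("sub-fcc\\bulk")
stale = "".join(buggy.as_wide()).rstrip("\0")

fixed = ToyCompactStr("sub-fcc")
fixed.as_wide()
fixed.resize_fixed("sub-fcc\\bulk")
fresh = "".join(fixed.as_wide()).rstrip("\0")

print(ascii(stale), ascii(fresh))
```

In the toy model the stale cache still holds the old, shorter text. In the real reports the bytes beyond the old length were whatever happened to be in memory, which fits the observation that the end of the dump varied from run to run.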
|
|
msg181626 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2013-02-07 16:23 |
@Jan Lachnitt: Thanks for your patience and for running all my commands :-) Thanks also for the short script reproducing the issue. |
|
|
msg181645 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2013-02-07 21:29 |
The "import random" isn't needed in your patch. |
|
|
msg181648 - (view) |
Author: Roundup Robot (python-dev)  |
Date: 2013-02-07 22:18 |
New changeset 3b316ea5aa82 by Victor Stinner in branch '3.3': Issue #17137: When an Unicode string is resized, the internal wide character http://hg.python.org/cpython/rev/3b316ea5aa82 New changeset c10a3ddba483 by Victor Stinner in branch 'default': (Merge 3.3) Issue #17137: When an Unicode string is resized, the internal wide http://hg.python.org/cpython/rev/c10a3ddba483 |
|
|
msg182687 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2013-02-22 19:02 |
Shouldn't this issue be closed now? |
|
|
msg182704 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2013-02-22 22:09 |
> Shouldn't this issue be closed now? Correct, I forgot to close it. |
|
|