msg246971 - (view) |
Author: Steffen Kampmann (Steffen Kampmann) |
Date: 2015-07-20 08:40 |
I run Python 2.7 on Windows 7, and the rmtree function of the shutil package fails to remove files with a non-ASCII filename:

  File "C:\Users\skampmann\AppData\Local\Continuum\Anaconda\lib\shutil.py", line 247, in rmtree
    rmtree(fullname, ignore_errors, onerror)
  File "C:\Users\skampmann\AppData\Local\Continuum\Anaconda\lib\shutil.py", line 247, in rmtree
    rmtree(fullname, ignore_errors, onerror)
  File "C:\Users\skampmann\AppData\Local\Continuum\Anaconda\lib\shutil.py", line 247, in rmtree
    rmtree(fullname, ignore_errors, onerror)
  File "C:\Users\skampmann\AppData\Local\Continuum\Anaconda\lib\shutil.py", line 252, in rmtree
    onerror(os.remove, fullname, sys.exc_info())
  File "C:\Users\skampmann\AppData\Local\Continuum\Anaconda\lib\shutil.py", line 250, in rmtree
    os.remove(fullname)
WindowsError: [Error 2] Das System kann die angegebene Datei nicht finden: 'H:\\ihre_perso\xa6\xeanlichen_Zugangsdaten600.jpg'

(In English, the message reads "The system cannot find the file specified.")

Please let me know if I can help with something. |
|
|
msg246973 - (view) |
Author: Tim Golden (tim.golden) *  |
Date: 2015-07-20 08:48 |
Can you confirm whether it also fails if you pass in a unicode string? For example: shutil.rmtree(u"filename.txt") |
|
|
msg271661 - (view) |
Author: Jason R. Coombs (jaraco) *  |
Date: 2016-07-30 03:46 |
I've confirmed the issue. It does indeed only occur if the string passed to rmtree is bytes. I discovered this during my investigation of https://github.com/cherrypy/cherrypy/issues/1467.

The following script will replicate the failure on Windows systems on Python 2 and Python 3, but not on other operating systems:

---
# encoding: utf-8

from __future__ import unicode_literals

import os
import shutil

os.mkdir('temp')

with open('temp/Слава Україні.html', 'w'):
    pass

print(os.listdir(b'temp')[0])

shutil.rmtree(b'temp')
---

The error on Python 2.7 is this:

????? ???????.html
Traceback (most recent call last):
  File "C:\Users\jaraco\p\cherrypy\issue-1467.py", line 15, in <module>
    shutil.rmtree(b'temp')
  File "C:\Program Files\Python27\lib\shutil.py", line 252, in rmtree
    onerror(os.remove, fullname, sys.exc_info())
  File "C:\Program Files\Python27\lib\shutil.py", line 250, in rmtree
    os.remove(fullname)
WindowsError: [Error 123] The filename, directory name, or volume label syntax is incorrect: 'temp\\????? ???????.html'

This issue might be related to or or or or and probably others. It's not obvious to me browsing through those tickets why Windows should behave differently when a bytestring is passed to listdir. Perhaps I'll delve into those tickets in more depth. |
|
|
msg271664 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2016-07-30 04:55 |
See also . On Windows there are two sets of APIs: Unicode and bytes. File names are stored as Unicode (UTF-16) in modern filesystems and are encoded to bytes by the system for the bytes API. Unfortunately, this encoding is lossy. Windows tries to find the closest equivalent if a character is not encodable in the current code page (for example, it drops diacritics) and silently replaces the character with "?" if it can't find anything appropriate. We can't do anything about this from the Python side except use the Unicode API. |
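The lossy conversion described above can be imitated on any platform. This is a minimal sketch, assuming Python 3 and using cp1252 as a stand-in for the system's ANSI code page. The real Windows conversion is a best-fit mapping that may substitute look-alike characters, whereas Python's errors='replace' only ever produces "?", so treat this as an illustration of the effect rather than of the Windows implementation.

```python
# -*- coding: utf-8 -*-
# Illustration only: a Python codec stands in for the Windows bytes API.
name = u'Слава Україні.html'

# Characters that the code page cannot represent become "?".
lossy = name.encode('cp1252', errors='replace')
print(lossy)

# The original name cannot be recovered from the lossy bytes, so a later
# os.remove() on that byte string has no real file to point at.
round_tripped = lossy.decode('cp1252')
print(round_tripped == name)
```

The byte string produced here matches the "????? ???????.html" name shown in the Python 2.7 traceback earlier in the thread.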
|
|
msg271666 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2016-07-30 06:22 |
Use Unicode on Python 3, it will work on all platforms. Problem solved :-) |
|
|
msg271699 - (view) |
Author: Jason R. Coombs (jaraco) *  |
Date: 2016-07-30 17:09 |
I agree. I was able to apply a fairly simple fix to setuptools to address the failure (https://github.com/pypa/setuptools/commit/857949575022946cc60c7cd1d0d088246d3f7540). I suggest closing this ticket as won't fix. |
|
|
msg283707 - (view) |
Author: Jason R. Coombs (jaraco) *  |
Date: 2016-12-20 19:14 |
I'm afraid I need to re-open this issue. Although passing unicode names to rmtree fixes the issue on Windows systems, it causes problems on Linux systems where LC_ALL=C. Consider this script:

#################################
# encoding: utf-8

from __future__ import unicode_literals

import os
import shutil

os.mkdir('temp')

with open('temp/Слава Україні.html'.encode('utf-8'), 'w'):
    pass

print(os.listdir(b'temp')[0])

shutil.rmtree('temp')
#################################

Invoked thus, a UnicodeDecodeError occurs:

vagrant@trusty:/vagrant$ LC_ALL=C python2.7 .py
Слава Україні.html
Traceback (most recent call last):
  File ".py", line 15, in <module>
    shutil.rmtree('temp')
  File "/usr/lib/python2.7/shutil.py", line 241, in rmtree
    fullname = os.path.join(path, name)
  File "/usr/lib/python2.7/posixpath.py", line 80, in join
    path += '/' + b
UnicodeDecodeError: 'ascii' codec can't decode byte 0xd0 in position 1: ordinal not in range(128)

This is the same error seen trying to rmtree an extraction of Sphinx (a package containing an offending non-ASCII character):

vagrant@trusty:/vagrant$ wget 'https://files.pythonhosted.org/packages/b2/d5/bb4bf7fbc2e6b85d1e3832716546ecd434632d9d434a01efe87053fe5f25/Sphinx-1.5.1.tar.gz' -O - | tar xz
--2016-12-20 19:07:21-- https://files.pythonhosted.org/packages/b2/d5/bb4bf7fbc2e6b85d1e3832716546ecd434632d9d434a01efe87053fe5f25/Sphinx-1.5.1.tar.gz
Resolving files.pythonhosted.org (files.pythonhosted.org)... 151.101.33.63
Connecting to files.pythonhosted.org (files.pythonhosted.org)|151.101.33.63|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 4397246 (4.2M) [binary/octet-stream]
Saving to: ‘STDOUT’

100%[========================================================>] 4,397,246 2.06MB/s in 2.0s

2016-12-20 19:07:23 (2.06 MB/s) - written to stdout [4397246/4397246]

vagrant@trusty:/vagrant$ LC_ALL=C python2.7 -c "import shutil; shutil.rmtree(u'Sphinx-1.5.1')"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/lib/python2.7/shutil.py", line 247, in rmtree
    rmtree(fullname, ignore_errors, onerror)
  File "/usr/lib/python2.7/shutil.py", line 247, in rmtree
    rmtree(fullname, ignore_errors, onerror)
  File "/usr/lib/python2.7/shutil.py", line 247, in rmtree
    rmtree(fullname, ignore_errors, onerror)
  File "/usr/lib/python2.7/shutil.py", line 241, in rmtree
    fullname = os.path.join(path, name)
  File "/usr/lib/python2.7/posixpath.py", line 80, in join
    path += '/' + b
UnicodeDecodeError: 'ascii' codec can't decode byte 0xcc in position 8: ordinal not in range(128)

Is the solution to call rmtree with unicode on Windows, but with bytes when on Python 2 and Linux? What else can be done? |
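One reading of the traceback above: on Python 2, os.listdir() called with a unicode path returns any name it cannot decode as a byte string, and os.path.join() then concatenates unicode with bytes, which triggers an implicit ASCII decode. The sketch below makes that decode explicit so it behaves the same on Python 2 and Python 3. The variable names are illustrative, and this is not the code path inside posixpath itself.

```python
# -*- coding: utf-8 -*-
# The UTF-8 bytes of the file name, as stored on a Linux filesystem.
raw_name = u'Слава Україні.html'.encode('utf-8')

error = None
try:
    # Python 2 performs this decode implicitly when joining a unicode
    # directory with a bytes file name; here it is spelled out.
    (b'/' + raw_name).decode('ascii')
except UnicodeDecodeError as exc:
    error = exc
    print(exc)
```

The failure is reported at position 1 because the leading "/" is position 0 and the first byte of the name, 0xd0, is the first one outside the ASCII range.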
msg283710 - (view) |
Author: Steve Dower (steve.dower) *  |
Date: 2016-12-20 20:39 |
Lib/posixpath.py needs a huge amount of work to behave correctly for either bytes or Unicode paths. I don't know why Lib/ntpath.py is okay here, but the code is different so I suspect it just happens to not need the same conversion. Switching for each platform is probably the only way, unless you find someone willing to go through and make Unicode paths viable on Python 2.7 (this came up earlier today on one of the lists). |
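The per-platform switch suggested above could look roughly like the following. This is a sketch under stated assumptions, not the change that went into setuptools: native_tree_path and demo are hypothetical names, and the helper assumes the top-level path itself can be encoded with the filesystem encoding (true for ASCII names such as 'temp' or 'Sphinx-1.5.1', even under LC_ALL=C).

```python
# -*- coding: utf-8 -*-
import os
import shutil
import sys
import tempfile


def native_tree_path(path, windows=(os.name == 'nt')):
    """Hypothetical helper: return text on Windows, bytes elsewhere."""
    encoding = sys.getfilesystemencoding() or 'utf-8'
    is_bytes = isinstance(path, bytes)
    if windows:
        # The Unicode API avoids the lossy code page conversion.
        return path.decode(encoding) if is_bytes else path
    # Bytes avoid implicit ASCII decoding on POSIX with Python 2.
    return path if is_bytes else path.encode(encoding)


def demo():
    """Create a tree holding a non-ASCII file name, then remove it."""
    root = tempfile.mkdtemp()
    root_bytes = native_tree_path(root, windows=False)
    name = u'Слава Україні.html'.encode('utf-8')
    open(os.path.join(root_bytes, name), 'w').close()
    shutil.rmtree(native_tree_path(root))
    # False means the tree was removed.
    return os.path.exists(root)
```

Calling shutil.rmtree(native_tree_path(some_dir)) then passes whichever string type the current platform handles without a lossy or implicit conversion. On Python 3, plain text paths already work on all platforms, so the helper only matters for code that must also run on Python 2.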
|
|
msg283776 - (view) |
Author: Jason R. Coombs (jaraco) *  |
Date: 2016-12-21 19:48 |
In https://github.com/pypa/setuptools/issues/706, I've addressed this additional concern. |
|
|