Issue 24672: shutil.rmtree fails on non ascii filenames

Created on 2015-07-20 08:40 by Steffen Kampmann, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Messages (9)
msg246971 - Author: Steffen Kampmann (Steffen Kampmann) Date: 2015-07-20 08:40
I run Python 2.7 on Windows 7, and the rmtree function of the shutil package fails to remove files with a non-ASCII filename:

  File "C:\Users\skampmann\AppData\Local\Continuum\Anaconda\lib\shutil.py", line 247, in rmtree
    rmtree(fullname, ignore_errors, onerror)
  File "C:\Users\skampmann\AppData\Local\Continuum\Anaconda\lib\shutil.py", line 247, in rmtree
    rmtree(fullname, ignore_errors, onerror)
  File "C:\Users\skampmann\AppData\Local\Continuum\Anaconda\lib\shutil.py", line 247, in rmtree
    rmtree(fullname, ignore_errors, onerror)
  File "C:\Users\skampmann\AppData\Local\Continuum\Anaconda\lib\shutil.py", line 252, in rmtree
    onerror(os.remove, fullname, sys.exc_info())
  File "C:\Users\skampmann\AppData\Local\Continuum\Anaconda\lib\shutil.py", line 250, in rmtree
    os.remove(fullname)
WindowsError: [Error 2] Das System kann die angegebene Datei nicht finden: 'H:\\ihre_perso\xa6\xeanlichen_Zugangsdaten600.jpg'

[Error text in English: "The system cannot find the file specified"]

Please let me know if I can help with something.
msg246973 - Author: Tim Golden (tim.golden) * (Python committer) Date: 2015-07-20 08:48
Can you confirm whether it also fails if you pass in a unicode string? E.g. shutil.rmtree(u"filename.txt")
msg271661 - Author: Jason R. Coombs (jaraco) * (Python committer) Date: 2016-07-30 03:46
I've confirmed the issue. It does indeed only occur if the string passed to rmtree is bytes. I discovered this during my investigation of https://github.com/cherrypy/cherrypy/issues/1467.

The following script will replicate the failure on Windows systems on Python 2 and Python 3, but not on other operating systems:

---
# encoding: utf-8

from __future__ import unicode_literals

import os
import shutil

os.mkdir('temp')

with open('temp/Слава Україні.html', 'w'):
    pass

print(os.listdir(b'temp')[0])

shutil.rmtree(b'temp')
---

The error on Python 2.7 is this:

????? ???????.html
Traceback (most recent call last):
  File "C:\Users\jaraco\p\cherrypy\issue-1467.py", line 15, in <module>
    shutil.rmtree(b'temp')
  File "C:\Program Files\Python27\lib\shutil.py", line 252, in rmtree
    onerror(os.remove, fullname, sys.exc_info())
  File "C:\Program Files\Python27\lib\shutil.py", line 250, in rmtree
    os.remove(fullname)
WindowsError: [Error 123] The filename, directory name, or volume label syntax is incorrect: 'temp\\????? ???????.html'

This issue might be related to several other tickets, and probably more. It's not obvious to me from browsing through those tickets why Windows should behave differently when a bytestring is passed to listdir. Perhaps I'll delve into those tickets in more depth.
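A minimal Python 3 sketch of the mechanism behind this report: os.listdir returns names of the same type as its argument, so a bytes path handed to rmtree means every child name is also fetched as bytes. The temporary directory and the file name a.txt are illustrative and not taken from the report.

```python
import os
import shutil
import tempfile

# Throwaway directory with a single file (both names are illustrative).
root = tempfile.mkdtemp()
open(os.path.join(root, "a.txt"), "w").close()

# A text path yields text names; a bytes path yields bytes names.
text_names = os.listdir(root)
byte_names = os.listdir(os.fsencode(root))

print(type(text_names[0]).__name__, type(byte_names[0]).__name__)

# Clean up using the text path.
shutil.rmtree(root)
```

On the Windows setups described in this thread, the bytes names are the ones that come back with "?" in place of characters the code page cannot represent.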
msg271664 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-07-30 04:55
On Windows there are two sets of APIs: Unicode and bytes. File names are stored as Unicode (UTF-16) in modern filesystems and are encoded to bytes by the system for the bytes API. Unfortunately, this encoding is lossy. Windows tries to find the closest equivalent if a character is not encodable with the current code page (for example, it drops diacritics) and silently replaces it with "?" if it can't find anything appropriate. We can't do anything about this from the Python side except use the Unicode API.
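The lossy step can be imitated in pure Python. This is only an analogy: it uses Python's own codec with errors="replace" rather than the Windows bytes API, it does not reproduce the "closest equivalent" substitution, and cp1252 is an assumed code page chosen for illustration.

```python
# File name from the reproduction script earlier in this thread.
name = "Слава Україні.html"

# Encode to a code page that has no Cyrillic letters; unencodable
# characters are replaced with "?" instead of raising an error.
lossy = name.encode("cp1252", errors="replace")
print(lossy)

# Decoding again cannot bring the original characters back.
roundtrip = lossy.decode("cp1252")
print(roundtrip == name)
```

Once the name has been reduced to question marks, no file with that name exists on disk, which is consistent with the "cannot find the file" and "syntax is incorrect" errors reported above.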
msg271666 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-07-30 06:22
Use Unicode on Python 3, it will work on all platforms. Problem solved :-)
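A short sketch of that advice, assuming Python 3 and a filesystem that accepts the name: the tree is created and removed using text (str) paths only. The directory layout mirrors the earlier reproduction script, but it is placed under a temporary directory, which is an addition for safe experimentation.

```python
import os
import shutil
import tempfile

# Work inside a throwaway directory so nothing else is touched.
root = tempfile.mkdtemp()
target = os.path.join(root, "temp")
os.mkdir(target)

# Create a file with a non-ASCII name using a text path.
with open(os.path.join(target, "Слава Україні.html"), "w"):
    pass

# Remove the tree, again with a text path.
shutil.rmtree(target)
removed = not os.path.exists(target)
print(removed)

shutil.rmtree(root)
```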
msg271699 - Author: Jason R. Coombs (jaraco) * (Python committer) Date: 2016-07-30 17:09
I agree. I was able to apply a fairly simple fix to setuptools to address the failure (https://github.com/pypa/setuptools/commit/857949575022946cc60c7cd1d0d088246d3f7540). I suggest closing this ticket as won't fix.
msg283707 - Author: Jason R. Coombs (jaraco) * (Python committer) Date: 2016-12-20 19:14
I'm afraid I need to re-open this issue. Although passing unicode names to rmtree fixes the issue on Windows systems, it causes problems on Linux systems where LC_ALL=C. Consider this script:

#################################
# encoding: utf-8

from __future__ import unicode_literals

import os
import shutil

os.mkdir('temp')

with open('temp/Слава Україні.html'.encode('utf-8'), 'w'):
    pass

print(os.listdir(b'temp')[0])

shutil.rmtree('temp')
#################################

Invoked thus, a UnicodeDecodeError occurs:

vagrant@trusty:/vagrant$ LC_ALL=C python2.7 .py
Слава Україні.html
Traceback (most recent call last):
  File ".py", line 15, in <module>
    shutil.rmtree('temp')
  File "/usr/lib/python2.7/shutil.py", line 241, in rmtree
    fullname = os.path.join(path, name)
  File "/usr/lib/python2.7/posixpath.py", line 80, in join
    path += '/' + b
UnicodeDecodeError: 'ascii' codec can't decode byte 0xd0 in position 1: ordinal not in range(128)

This is the same error seen trying to rmtree an extraction of Sphinx (a package containing an offending non-ascii character):

vagrant@trusty:/vagrant$ wget 'https://files.pythonhosted.org/packages/b2/d5/bb4bf7fbc2e6b85d1e3832716546ecd434632d9d434a01efe87053fe5f25/Sphinx-1.5.1.tar.gz' -O - | tar xz
--2016-12-20 19:07:21-- https://files.pythonhosted.org/packages/b2/d5/bb4bf7fbc2e6b85d1e3832716546ecd434632d9d434a01efe87053fe5f25/Sphinx-1.5.1.tar.gz
Resolving files.pythonhosted.org (files.pythonhosted.org)... 151.101.33.63
Connecting to files.pythonhosted.org (files.pythonhosted.org)|151.101.33.63|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 4397246 (4.2M) [binary/octet-stream]
Saving to: ‘STDOUT’

100%[========================================================>] 4,397,246 2.06MB/s in 2.0s

2016-12-20 19:07:23 (2.06 MB/s) - written to stdout [4397246/4397246]

vagrant@trusty:/vagrant$ LC_ALL=C python2.7 -c "import shutil; shutil.rmtree(u'Sphinx-1.5.1')"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/lib/python2.7/shutil.py", line 247, in rmtree
    rmtree(fullname, ignore_errors, onerror)
  File "/usr/lib/python2.7/shutil.py", line 247, in rmtree
    rmtree(fullname, ignore_errors, onerror)
  File "/usr/lib/python2.7/shutil.py", line 247, in rmtree
    rmtree(fullname, ignore_errors, onerror)
  File "/usr/lib/python2.7/shutil.py", line 241, in rmtree
    fullname = os.path.join(path, name)
  File "/usr/lib/python2.7/posixpath.py", line 80, in join
    path += '/' + b
UnicodeDecodeError: 'ascii' codec can't decode byte 0xcc in position 8: ordinal not in range(128)

Is the solution to call rmtree with unicode on Windows, but with bytes when on Python 2 and Linux? What else can be done?
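The failing step can be isolated. In Python 2, adding a byte string to a unicode string decodes the bytes implicitly as ASCII, which is what happens inside posixpath.join when the path is unicode but a listed name comes back as undecodable bytes. The sketch below performs that decode explicitly in Python 3; the variable names are illustrative.

```python
# The file name as raw UTF-8 bytes, i.e. a name that could not be decoded
# to text under an ASCII-only locale.
name = "Слава Україні.html".encode("utf-8")

error = None
try:
    # Equivalent of Python 2's implicit conversion in: path += '/' + b
    (b"/" + name).decode("ascii")
except UnicodeDecodeError as exc:
    error = exc
    print(exc)
```

The first non-ASCII byte sits at index 1 because the separator comes first, which lines up with the position reported in the first traceback above.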
msg283710 - Author: Steve Dower (steve.dower) * (Python committer) Date: 2016-12-20 20:39
Lib/posixpath.py needs a huge amount of work to behave correctly for either bytes or Unicode paths. I don't know why Lib/ntpath.py is okay here, but the code is different so I suspect it just happens to not need the same conversion. Switching for each platform is probably the only way, unless you find someone willing to go through and make Unicode paths viable on Python 2.7 (this came up earlier today on one of the lists).
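One way the per-platform switch might look is sketched below. The helper path_for_rmtree and its py2 and windows parameters are hypothetical names introduced here, not part of shutil or of any fix referenced in this thread. Treating every non-Windows system like Linux is also an assumption, as is using the filesystem encoding for the conversion.

```python
import os
import sys


def path_for_rmtree(path, py2=sys.version_info[0] == 2, windows=os.name == "nt"):
    """Return path as text or bytes, following the rule discussed above.

    Text is used everywhere except Python 2 on non-Windows systems,
    where bytes avoid the implicit ASCII decode in posixpath.join.
    """
    text_type = type(u"")
    encoding = sys.getfilesystemencoding() or "utf-8"
    want_text = windows or not py2
    if want_text and not isinstance(path, text_type):
        # Bytes in, text wanted.
        return path.decode(encoding)
    if not want_text and isinstance(path, text_type):
        # Text in, bytes wanted.
        return path.encode(encoding)
    return path


# Typical use would be: shutil.rmtree(path_for_rmtree(some_path))
print(repr(path_for_rmtree(b"temp", py2=False, windows=False)))
```

The two keyword parameters default to the running interpreter and platform; they exist only so that each branch can be exercised from a single machine.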
msg283776 - Author: Jason R. Coombs (jaraco) * (Python committer) Date: 2016-12-21 19:48
In https://github.com/pypa/setuptools/issues/706, I've addressed this additional concern.
History
Date                 User              Action  Args
2022-04-11 14:58:19  admin             set     github: 68860
2016-12-21 19:48:45  jaraco            set     status: open -> closed; resolution: wont fix; messages: +
2016-12-20 20:39:54  steve.dower       set     messages: +
2016-12-20 19:14:03  jaraco            set     status: closed -> open; resolution: wont fix -> (no value); messages: +
2016-07-30 19:46:23  r.david.murray    set     status: open -> closed; stage: resolved; resolution: wont fix; versions: - Python 3.5, Python 3.6
2016-07-30 17:09:15  jaraco            set     messages: +
2016-07-30 06:22:00  vstinner          set     messages: +
2016-07-30 04:55:52  serhiy.storchaka  set     nosy: + serhiy.storchaka; messages: +
2016-07-30 03:46:24  jaraco            set     nosy: + jaraco; title: shutil.rmtree failes on non ascii filenames -> shutil.rmtree fails on non ascii filenames; messages: +; versions: + Python 3.5, Python 3.6
2015-07-20 08:48:48  tim.golden        set     messages: +
2015-07-20 08:47:06  serhiy.storchaka  set     nosy: + paul.moore, vstinner, tim.golden, zach.ware, steve.dower; components: + Windows
2015-07-20 08:40:07  Steffen Kampmann  create