Issue 846133: os.chmod/os.utime/shutil do not work with unicode filenames

Created on 2003-11-20 21:27 by meyeet, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name Uploaded Description
共有される.txt meyeet, 2003-11-20 21:30 filename with kanji characters
unicode_filenames.patch mhammond,2003-11-28 09:58 Patch, as discussed
Messages (13)
msg19050 - Author: Eric Meyer (meyeet) Date: 2003-11-20 21:27
I have a filename that contains Kanji characters and I'm trying to change the permissions on the file. I am running Python 2.3.1 on Windows 2000. I also have the Japanese language pack installed so that I can view the kanji characters in Windows Explorer.

>>> part
u'\u5171\u6709\u3055\u308c\u308b.txt'
>>> os.chmod(part, 0777)
Traceback (most recent call last):
  File "", line 1, in ?
OSError: [Errno 22] Invalid argument: '?????.txt'
>>>

I attached the above named file for you to test against. Thanks.
msg19051 - Author: George Yoshida (quiver) (Python committer) Date: 2003-11-21 00:07
I'm running Python in almost the same environment. I guess this results from the different behavior of u'' and unicode(''). If you convert a multi-byte character to a unicode character, u'' and unicode('') don't return the same string: unicode('') works as intended but u'' doesn't. This is probably caused by a bug in the Japanese codecs package. Eric, please try the session below and tell me what happens.

NOTE: The Japanese codecs package needs to be installed to test the code below. Otherwise, UnicodeDecodeError will be raised.

---
>>> import os
>>> os.listdir('.')
[]
>>> lst = ['\x82', '\xa0']  # japanese character
>>> u1 = unicode('\x82\xa0')
>>> u2 = u'\x82\xa0'
>>> u1 == u2
False
>>> u1, u2
(u'\u3042', u'\x82\xa0')  # u2 is odd
>>> print >> file(u1, 'w'), "hello world"
>>> os.listdir('.')
['B']
>>> os.chmod(u1, 0777)
>>> os.chmod(u2, 0777)
Traceback (most recent call last):
  File "<pyshell#179>", line 1, in -toplevel-
    os.chmod(u2, 0777)
OSError: [Errno 22] Invalid argument: '??'
msg19052 - Author: Eric Meyer (meyeet) Date: 2003-11-21 16:18
George, I tried the following, but I had to specify one of the Japanese codecs during the unicode() call. What is your default encoding set to? Below are my results.

>>> import os
>>> os.listdir('.')
[]
>>> u1 = unicode('\x82\xa0', 'cp932')
>>> u2 = u'\x82\xa0'
>>> u1, u2
(u'\u3042', u'\x82\xa0')
>>> print >> file(u1, 'w'), "hello world"
>>> os.listdir('.')
['?']
>>> os.chmod(u1, 0777)
Traceback (most recent call last):
  File "", line 1, in ?
OSError: [Errno 22] Invalid argument: '?'
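Editor's note: the mismatch between u1 and u2 in the two sessions above can be reproduced without any file system involved. The sketch below is written in Python 3 syntax (an assumption, since the thread uses Python 2) and only illustrates that a string literal with \x escapes holds two separate code points, while decoding the same two bytes as cp932 yields the single character U+3042. The variable names are illustrative.

```python
# Python 3 sketch; the sessions in this thread use Python 2 syntax.
raw = b"\x82\xa0"              # the two bytes typed in the sessions above
decoded = raw.decode("cp932")  # analogous to unicode('\x82\xa0', 'cp932')
literal = "\x82\xa0"           # analogous to u'\x82\xa0': code points U+0082, U+00A0

print(ascii(decoded))          # '\u3042'
print(ascii(literal))          # '\x82\xa0'
print(decoded == literal)      # False
```

Read this way, u2 is a different (and unusual) filename rather than a damaged copy of u1, so the failure of os.chmod(u1, ...) in Eric's session is the more relevant result.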
msg19053 - Author: George Yoshida (quiver) (Python committer) Date: 2003-11-22 00:51
Hi, Eric. My previous post was probably wrong. This is a problem with os.chmod. I've confirmed that two kinds of exceptions are raised when using os.chmod with unicode filenames.

The first one is [Errno 22] Invalid argument. You can read and write the file but cannot use os.chmod on it.

The second one is [Errno 2] No such file or directory. Although the file exists, Python complains "No such file or directory".

test.test_codecs has a bunch of international unicode characters, so I borrowed them for testing.

>>> import os
>>> from test.test_codecs import punycode_testcases
>>> def unicode_test(name):
        try:
            f = file(name, 'w')
            f.close()
        except IOError, e:
            print e
            return
        try:
            os.chmod(name, 0777)
        except OSError, e:
            print e

>>> for i, (uni, puny) in enumerate(punycode_testcases):
        print i
        unicode_test(uni)

I ran this script on Windows 2000 (Japanese edition) using Python 2.3 and got "[Errno 22]" for 0, 1, 2, 3, 4, 5, 7, 10 and "[Errno 2]" for 9.
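Editor's note: for readers who want to repeat this check on a current interpreter, here is a minimal Python 3 sketch of the same create-then-chmod probe. It is not the script used in the thread: the helper name try_chmod is hypothetical, it works in a temporary directory instead of the current one, and it returns the error text instead of printing it.

```python
import os
import tempfile


def try_chmod(name, mode=0o777):
    """Create `name` in a scratch directory and chmod it.

    Returns None on success, or the error text if creating the
    file or changing its mode fails.
    """
    with tempfile.TemporaryDirectory() as scratch:
        path = os.path.join(scratch, name)
        try:
            with open(path, "w") as f:
                f.write("hello world\n")
            os.chmod(path, mode)
        except (OSError, UnicodeError) as exc:
            return str(exc)
    return None


# The second name is the one from the original report.
for candidate in ("plain.txt", "\u5171\u6709\u3055\u308c\u308b.txt"):
    outcome = try_chmod(candidate)
    print(ascii(candidate), ascii(outcome))
```

The result for the non-ASCII name depends on the platform and its file system encoding, so no expected output is given for it here.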
msg19054 - Author: Martin v. Löwis (loewis) * (Python committer) Date: 2003-11-24 22:21
If you look at the source of os.chmod, it is not at all surprising that it does not work for characters outside the file system encoding: it is simply not implemented. Patches are welcome.
msg19055 - Author: Mark Hammond (mhammond) * (Python committer) Date: 2003-11-28 09:58
I opened http://www.python.org/sf/846133 regarding os.utime, which I found via the "shutil" module, via SpamBayes, also on a Japanese system (see that bug for details), but then I saw this and decided to tackle them both.

I rolled my fix for that in with a fix for chmod. I also hacked the test suite radically:

* Created a test_support.TESTFN_UNICODE_UNENCODEABLE variable, which is a Unicode string that can *not* be encoded using the file system encoding. This will cause functions with 'encoding' support but without Unicode support (such as utime/chmod) to fail.
* Made functions of all the test cases, so more combinations of unicode/encoded can be tested. Many are redundant, but that is OK.
* Added shutil tests of the filenames.
* While I was there, converted it to a unittest test.

The new test case blows up with a couple of errors before the posixmodule patch is applied, and passes after.

Note that shutil.move/copy etc. cannot handle being passed one string and one unicode arg, and therefore this combination is skipped. I'd like any opinions on whether this is a bug in shutil or not.

Also note the new comment in test_support.py regarding a potential bug in the 'mbcs' encoding: it appears to always work as though errors=ignore.

Comments/reviews?
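Editor's note: the idea of a name that cannot be encoded with the file system encoding can be illustrated with a small Python 3 sketch. The helper can_encode is hypothetical and is not part of test_support, and the encodings are passed explicitly here as an assumption; a real check would more likely consult sys.getfilesystemencoding().

```python
def can_encode(name, encoding):
    """Return True if every character of `name` is representable in `encoding`."""
    try:
        name.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


kanji_name = "\u5171\u6709\u3055\u308c\u308b.txt"  # filename from the original report

print(can_encode(kanji_name, "cp932"))   # True: a Japanese code page covers it
print(can_encode(kanji_name, "cp1252"))  # False: a Western code page does not
print(can_encode(kanji_name, "utf-8"))   # True: UTF-8 covers every code point

# A lossy encode substitutes '?' for each character it cannot represent.
print(kanji_name.encode("cp1252", "replace"))  # b'?????.txt'
```

The last line resembles the '?????.txt' shown in the original traceback, although this sketch does not establish that the same substitution produced that message.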
msg19056 - Author: Mark Hammond (mhammond) * (Python committer) Date: 2003-11-29 01:32
I created www.python.org/sf/850997 about the MBCS encoding issue.
msg19057 - Author: Martin v. Löwis (loewis) * (Python committer) Date: 2003-12-01 21:39
The patches to posixmodule.c are fine for both 2.3 and 2.4. Can you apply them before 2.3.3 is frozen? The patches to the test suite are fine for 2.4 only, and they probably need to be relaxed. For example, on OSX, there simply is no file name that fails to work for the normal file system API: the file system encoding is UTF-8, so it supports all file names. You should consider changing test_pep277.py instead.
msg19058 - Author: Mark Hammond (mhammond) * (Python committer) Date: 2003-12-03 01:33
release23-maint:
Checking in posixmodule.c; new revision: 2.300.8.5; previous revision: 2.300.8.4

trunk:
Checking in posixmodule.c; new revision: 2.309; previous revision: 2.308
Checking in test_support.py; new revision: 1.59; previous revision: 1.58
Checking in test_unicode_file.py; new revision: 1.11; previous revision: 1.10
Removing output/test_unicode_file; new revision: delete; previous revision: 1.1
msg19059 - Author: Eric Meyer (meyeet) Date: 2003-12-03 19:16
Is there an approximate date (or month) when 2.3.3 is likely to be released?
msg19060 - Author: Tim Peters (tim.peters) * (Python committer) Date: 2003-12-03 19:21
meyeet, 2.3.3 should be released this month (December).

Mark, I reopened this, because test_unicode_filename fails on Win98SE now (see Python-Dev report; that was on the trunk; I don't know about 2.3 maint).
msg19061 - Author: Martin v. Löwis (loewis) * (Python committer) Date: 2003-12-04 07:18
2.3 maint should be fine: the problems are more likely in the new test cases than in the code itself.
msg19062 - Author: Mark Hammond (mhammond) * (Python committer) Date: 2004-05-05 12:26
I'm fairly sure this has been nailed (including the test failure) for some time?
History
Date User Action Args
2022-04-11 14:56:01 admin set github: 39572
2003-11-20 21:27:12 meyeet create