[Python-Dev] tarfile and unicode filenames in windows

Facundo Batista facundobatista at gmail.com
Thu Jun 8 21:11:06 CEST 2006


I'm working on Windows 2000 SP4. I have a directory containing files with non-ASCII names (e.g., "camión.txt").

I'm trying to archive it as a bzip2-compressed tar file:

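import sys
import tarfile

nomdir = sys.argv[1]  # directory name as a plain byte string
tar = tarfile.open("prueba.tar.bz2", "w:bz2")
tar.add(nomdir)  # recursively adds the directory's contents
tar.close()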

This works fine, even though the "ó" in the filename is not 7-bit ASCII.

But then I put a file in that directory with a stranger name (one with an "o" with a macron, i.e. a dash, above it): Myō-ō.txt

Here, tarfile can't find the file. This is the same limitation as with listdir(), where I have to pass the directory name as a unicode string for the system to find it. So:

import sys
import tarfile

nomdir = unicode(sys.argv[1])  # unicode name, as listdir() requires
tar = tarfile.open("prueba.tar.bz2", "w:bz2")
tar.add(nomdir)
tar.close()
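The listdir() limitation mentioned above can be illustrated with a minimal sketch, assuming modern Python 3 rather than the Python 2.4 on Windows 2000 used in this report. There, the str/bytes split makes the choice explicit: a str path makes os.listdir() return decoded str names, and a bytes path makes it return raw bytes names. (In the Python 2 setup described here, byte-string paths on Windows likely go through the ANSI code page, which can represent "ó" but not "ō".) The helper `list_both_ways` and the temporary directory are hypothetical, for illustration only.

```python
import os
import tempfile


def list_both_ways(directory):
    """Hypothetical helper: list a directory once with a str path and
    once with a bytes path, returning both results."""
    as_text = os.listdir(directory)                # str path -> list of str
    as_bytes = os.listdir(os.fsencode(directory))  # bytes path -> list of bytes
    return as_text, as_bytes


with tempfile.TemporaryDirectory() as tmp:
    # Create one file whose name is not 7-bit ASCII.
    with open(os.path.join(tmp, "Myō-ō.txt"), "w"):
        pass
    text_names, byte_names = list_both_ways(tmp)

# Print only the element types, to keep console output ASCII-safe.
print(type(text_names[0]).__name__, type(byte_names[0]).__name__)
```

The bytes result is the same name in the filesystem encoding, so os.fsdecode() maps it back to the str form.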

The problem is that when tarfile finds that name, it crashes:

Traceback (most recent call last):
  File "comprim.py", line 8, in ?
    tar.add(nomdir)
  File "C:\python24\lib\tarfile.py", line 1239, in add
    self.add(os.path.join(name, f), os.path.join(arcname, f))
  File "C:\python24\lib\tarfile.py", line 1232, in add
    self.addfile(tarinfo, f)
  File "C:\python24\lib\tarfile.py", line 1297, in addfile
    self.fileobj.write(tarinfo.tobuf())
UnicodeEncodeError: 'ascii' codec can't encode character u'\xf3' in position 8: ordinal not in range(128)

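This is because tarinfo.tobuf() returns a unicode object (because the header contains the unicode filename), and file.write() needs a plain byte string, so Python implicitly encodes the header with the default 'ascii' codec, which fails on the first non-ASCII character.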
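A minimal sketch of the same failure and the usual fix, assuming modern Python 3's tarfile rather than the 2.4 module in the traceback: the header holds text until it is encoded to bytes, so an ASCII encoding cannot represent "ó" or "ō", while UTF-8 can. The helper `roundtrip_names` is hypothetical. GNU_FORMAT is chosen deliberately, because the PAX format (the default in recent Python 3 versions) would sidestep the problem by storing non-ASCII names in UTF-8 extended headers.

```python
import io
import tarfile


def roundtrip_names(names, encoding):
    """Hypothetical helper: write empty members with the given names into
    an in-memory .tar.bz2 and return the names read back."""
    buf = io.BytesIO()
    # GNU_FORMAT writes each name directly into the fixed-size header, so it
    # must be encoded with `encoding`; errors="strict" makes failures visible.
    with tarfile.open(fileobj=buf, mode="w:bz2", format=tarfile.GNU_FORMAT,
                      encoding=encoding, errors="strict") as tar:
        for name in names:
            info = tarfile.TarInfo(name)
            info.size = 0
            tar.addfile(info, io.BytesIO(b""))
    buf.seek(0)
    with tarfile.open(fileobj=buf, mode="r:bz2", encoding=encoding) as tar:
        return tar.getnames()


names = ["camión.txt", "Myō-ō.txt"]

try:
    roundtrip_names(names, "ascii")
    ascii_error = None
except UnicodeEncodeError as exc:
    # The same kind of error as the implicit ASCII conversion in file.write().
    ascii_error = exc
print("ascii:", ascii_error)

# Encoding the header as UTF-8 works, and the names survive the round trip.
utf8_names = roundtrip_names(names, "utf-8")
print("utf-8 round trip ok:", utf8_names == names)
```

The compression mode only mirrors the report; the encoding problem is the same with an uncompressed "w" archive.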

Is this a known problem? Should I file a bug report? I couldn't find any existing report about this, and Google didn't help here.

Thank you very much!

--
.    Facundo

Blog: http://www.taniquetil.com.ar/plog/
PyAr: http://www.python.org/ar/


