Issue 13207: os.path.expanduser breaks when using unicode character in the username

Created on 2011-10-18 11:34 by mandel, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name Uploaded Description
expanduser.py mandel,2011-10-18 11:34 Example of using the win api to expand the user.
Messages (5)
msg145798 - Author: Manuel de la Pena (mandel) Date: 2011-10-18 11:34
During our development we have experienced the following: if you have a user on your Windows machine whose name uses Japanese characters such as “雄鳥お人好し”, you will see the following on your system:

* The Windows Shell will show the path correctly, that is: “C:\Users\雄鳥お人好し”
* cmd.exe will show: “C:\Users\??????”
* All the env variables will be wrong, which means they will be similar to the info shown in cmd.exe

The above is a problem because the implementation of expanduser in ntpath.py uses the env variables to expand the path, which means that in this case the returned path will be wrong. I have attached a small example of how to get the user profile path (~) on Windows using SHGetFolderPathW or SHGetKnownFolderPathW to fix the issue.

PS: I don't know if this issue also occurs on Python 3.
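The attached expanduser.py is not reproduced on this page. As a rough idea of the approach described above, a minimal sketch using ctypes and SHGetFolderPathW might look like the following. The function name `user_profile_dir`, the constants, and the non-Windows fallback are assumptions made for illustration, not the contents of the attachment.

```python
import os
import sys

CSIDL_PROFILE = 0x0028  # shell constant identifying the user's profile folder
MAX_PATH = 260          # buffer size SHGetFolderPathW expects, in characters


def user_profile_dir():
    """Return the current user's profile directory as a Unicode string."""
    if sys.platform != "win32":
        # Assumption for this sketch: off Windows, defer to the standard library.
        return os.path.expanduser("~")

    import ctypes
    from ctypes import wintypes

    shell32 = ctypes.WinDLL("shell32")
    get_folder_path = shell32.SHGetFolderPathW
    get_folder_path.argtypes = [
        wintypes.HWND,    # hwnd (reserved, pass NULL)
        ctypes.c_int,     # csidl
        wintypes.HANDLE,  # hToken (NULL means the current user)
        wintypes.DWORD,   # dwFlags (0 means the folder's current path)
        wintypes.LPWSTR,  # pszPath (output buffer)
    ]
    get_folder_path.restype = ctypes.c_long  # HRESULT

    buf = ctypes.create_unicode_buffer(MAX_PATH)
    hresult = get_folder_path(None, CSIDL_PROFILE, None, 0, buf)
    if hresult != 0:
        raise OSError("SHGetFolderPathW failed: 0x%08X" % (hresult & 0xFFFFFFFF))
    return buf.value


print(user_profile_dir())
```

Because the wide-character API fills a Unicode buffer, the result never passes through the ANSI code page, which is where the question marks come from.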
msg145819 - Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2011-10-18 15:53
On POSIX, Python 3 works correctly if my home dir is /tmp/éric, and Python 2.7 returns a UTF-8-encoded (not locale-encoded!) bytes string. For Windows, a patch would probably need to add a private function to the _nt module (in C): ctypes is too dangerous to be used in the standard library.
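The POSIX behaviour in Python 3 can be checked with a short sketch like the one below. The helper name `expand_with_home` and the file name `notes.txt` are made up for illustration; the sketch temporarily overrides HOME and calls `posixpath.expanduser` directly, so it exercises the POSIX implementation on any platform.

```python
import os
import posixpath
from unittest import mock


def expand_with_home(home, path):
    """Expand `path` the way posixpath.expanduser does when $HOME is `home`."""
    # patch.dict restores the original environment when the block exits.
    with mock.patch.dict(os.environ, {"HOME": home}):
        return posixpath.expanduser(path)


expanded = expand_with_home("/tmp/éric", "~/notes.txt")
print(expanded)
```

In Python 3 the result is a str, so the non-ASCII character in the home directory survives without any explicit encoding or decoding step.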
msg146450 - Author: Santoso Wijaya (santoso.wijaya) * Date: 2011-10-26 18:32
Unicode environment vars work properly in Python 3.x on Windows, too, because the convertenviron() function in posixmodule.c uses extern _wenviron and PyUnicode_FromWideChar() in Python 3.x. In Python 2.7, convertenviron() uses extern environ and PyString_FromString*().
msg146460 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-10-26 22:50
Python 2 uses byte strings. If characters are not encodable to the ANSI code page, Windows replaces them with question marks. See issue #13247 for another example (in Python 3, when explicitly using the bytes API). To be able to support characters not encodable to the ANSI code page, you have to use Unicode *everywhere*. Because Python 2 doesn't have access to the Unicode environment and uses bytes in most cases, I don't think that we can fix this issue in Python 2. I am closing this issue because it would require too much work to fix in Python 2, whereas it already works in Python 3. Moving to Python 3 is the best solution to this issue.
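The question-mark substitution can be imitated in pure Python. This is only an analogy: it uses Python's own `errors="replace"` encoder rather than the Windows conversion routines, and it assumes cp1252 as the machine's ANSI code page. The helper name `to_ansi_lossy` is hypothetical.

```python
def to_ansi_lossy(text, code_page="cp1252"):
    """Encode text to a legacy code page, replacing unencodable characters with '?'."""
    return text.encode(code_page, errors="replace")


profile = "C:\\Users\\雄鳥お人好し"

# None of the six Japanese characters exist in cp1252, so each becomes '?'.
lossy = to_ansi_lossy(profile)
print(lossy)  # b'C:\\Users\\??????'

# With a Japanese code page the same name is encodable and round-trips intact.
japanese = to_ansi_lossy(profile, "cp932")
print(japanese.decode("cp932") == profile)  # True
```

Once the characters have been replaced, the original name cannot be recovered from the byte string, which is why a path built from such environment variables points at a directory that does not exist.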
msg263052 - Author: Arkady “KindDragon” Shapkin (Arkady “KindDragon” Shapkin) Date: 2016-04-09 00:50
At the very least, Python 2.7 should return the path in the locale.getpreferredencoding() encoding.
History
Date User Action Args
2022-04-11 14:57:22 admin set github: 57416
2016-04-09 00:50:26 Arkady “KindDragon” Shapkin set nosy: + Arkady “KindDragon” Shapkin; messages: +
2011-10-26 22:50:03 vstinner set status: open -> closed; resolution: wont fix; messages: +
2011-10-26 18:32:52 santoso.wijaya set nosy: + santoso.wijaya; messages: +
2011-10-26 04:30:43 ezio.melotti set nosy: + ezio.melotti
2011-10-18 15:53:36 eric.araujo set nosy: + vstinner, eric.araujo; messages: +
2011-10-18 12:16:31 flox set nosy: + flox; title: os.path.expanduser brakes when using unicode character in the username -> os.path.expanduser breaks when using unicode character in the username
2011-10-18 11:34:28 mandel create