Issue 10187: exec encode unicode to utf-8 str automatically in GBK environment

Created on 2010-10-24 03:04 by wjm251, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name	Uploaded	Description
test.py	wjm251,2010-10-24 03:04

Messages (9)
msg119482 - (view)	Author: wjm251 (wjm251)	Date: 2010-10-24 03:04
Windows XP, Chinese version. See the attached file: the header is set to GBK and the file is GBK-encoded, so why is the output '\xe5\xa4\xa7' (the UTF-8 encoding of the Chinese character "大")?
msg119483 - (view)	Author: wjm251 (wjm251)	Date: 2010-10-24 03:24
Windows (English version) and Ubuntu 10.04 (UTF-8 locale) both show the same behavior. Am I wrong?
msg119487 - (view)	Author: Martin v. Löwis (loewis) *	Date: 2010-10-24 06:38
This is not a bug, but intentional. a is a Unicode string; it does not have an encoding internally (not GBK, not UTF-8). Then, the string being exec'ed also becomes a Unicode string. exec'ing Unicode strings is confusing; try to avoid this. The semantics of exec'ing a Unicode string is that all str (but not unicode) literals get encoded as UTF-8. To see the result you expect, write a = "麓贸"
msg119488 - (view)	Author: Martin v. Löwis (loewis) *	Date: 2010-10-24 06:42
Oops, I meant a = "大"
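The point that a Unicode string has no encoding of its own, and that the bytes only appear once a codec is chosen, can be illustrated with a minimal sketch. It assumes Python 3 (the thread itself concerns Python 2.6), and the variable names are illustrative only, not taken from the attached test.py.

```python
# Python 3 sketch; the thread above is about Python 2.6 behavior.
text = "\u5927"  # the character "大" as a text (Unicode) string

# The same character yields different bytes depending on the codec chosen.
utf8_bytes = text.encode("utf-8")
gbk_bytes = text.encode("gbk")

print(repr(utf8_bytes))  # b'\xe5\xa4\xa7', the output reported in the first message
print(repr(gbk_bytes))   # b'\xb4\xf3', the bytes the reporter expected

# Decoding with the matching codec recovers the same character either way.
assert utf8_bytes.decode("utf-8") == gbk_bytes.decode("gbk") == text
```

The reporter saw the first byte sequence because exec encoded the literal as UTF-8, while the GBK-encoded file on disk contains the second.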
msg119489 - (view)	Author: wjm251 (wjm251)	Date: 2010-10-24 06:44
But why is it forced to be encoded as UTF-8? I think it should be encoded with the locale's encoding, not always UTF-8. For example, in a GBK locale it should use GBK to encode the unicode object, right?
msg119490 - (view)	Author: Martin v. Löwis (loewis) *	Date: 2010-10-24 06:55
> but why it is forced to encoded to utf-8,
> I think it should be encoded by the locale related encodings,not always utf-8,
> for example,in GBK locale,it should use GBK to encode the unicode object,right?

Wrong. Exec'ing Unicode strings has been specified to encode all strings as UTF-8. This cannot be changed anymore.

Even if this was possible to change, it should not use the locale encoding. The source encoding and the locale encoding are independent; the source encoding is normally determined from PEP 263 declarations. So if anything, exec'ing Unicode strings should use an encoding declaration that you have in that string. However, you don't have one, and they are unsupported for Unicode strings, anyway.
msg119491 - (view)	Author: wjm251 (wjm251)	Date: 2010-10-24 07:01
Oh, you mentioned PEP 263, but I already set a header like this (you can see it in the attached test.py):
#coding=GBK
Why does exec choose UTF-8 and not GBK? GBK is a valid Chinese character set in Python 2.6.
msg119492 - (view)	Author: Martin v. Löwis (loewis) *	Date: 2010-10-24 07:15
> oh,you mentioned the PEP 263
> but I already set a header like this,you can see the attached test.py
> #coding=GBK

You have that in test.py, but not in the string you are giving to exec. This is really a separate source code. So you could have written

    exec '''#coding:GBK
    print hi('%s')
    ''' % a

You didn't, hence the code you pass to exec has no declared source encoding.

> why exec choose to use utf-8 not GBK?

exec always chooses UTF-8 when exec'ing Unicode strings. The source encoding of the file that has the exec statement must be irrelevant: the string being exec'ed may have been received from a different source file.
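The idea that code passed to exec is separate source with its own encoding declaration can be sketched as follows. This is a Python 3 illustration, not the Python 2.6 behavior discussed above: it assumes that exec accepts a byte string and applies a PEP 263 declaration found in it, and the names `source`, `namespace`, and `value` are hypothetical.

```python
# Python 3 sketch; semantics differ from the Python 2.6 case in this thread.

# Source code as raw bytes: a GBK declaration followed by a literal whose
# two bytes (0xb4 0xf3) are the GBK encoding of the character "大".
source = b"# coding: gbk\nvalue = '\xb4\xf3'\n"

namespace = {}
exec(source, namespace)  # the declaration inside the bytes selects GBK
print(namespace["value"] == "\u5927")

# Without a declaration the same bytes are read as UTF-8 (the Python 3
# default source encoding), where 0xb4 0xf3 is not a valid sequence.
undeclared_failed = False
try:
    exec(b"value = '\xb4\xf3'\n", {})
except (SyntaxError, ValueError):
    undeclared_failed = True
print(undeclared_failed)
```

Either way, the encoding of the file that contains the exec call plays no part; only the declaration inside the exec'ed source matters.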
msg119493 - (view)	Author: wjm251 (wjm251)	Date: 2010-10-24 07:38
Got that, thank you.

History
Date	User	Action	Args
2022-04-11 14:57:07	admin	set	github: 54396
2010-10-24 07:38:36	wjm251	set	messages: +
2010-10-24 07:15:03	loewis	set	messages: +
2010-10-24 07:01:54	wjm251	set	messages: +
2010-10-24 06:55:55	loewis	set	messages: + title: exec encode unicode to utf-8 str automatically in GBK environment -> exec encode unicode to utf-8 str automatically in GBK environment
2010-10-24 06:44:50	wjm251	set	messages: +
2010-10-24 06:42:41	loewis	set	messages: +
2010-10-24 06:38:41	loewis	set	status: open -> closed nosy: + loewis messages: + resolution: not a bug
2010-10-24 03:24:24	wjm251	set	messages: +
2010-10-24 03:04:45	wjm251	create