Issue 13432: Encoding alias "unicode"

Created on 2011-11-19 11:35 by kxroberto, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Messages (9)
msg147936 - Author: kxroberto (kxroberto) Date: 2011-11-19 11:35
"unicode" does not seem to be an official encoding name alias. Yet it is quite frequent on the web, and obviously means UTF-8 (search for '"text/html; charset=unicode"' in Google). Chrome and IE display it as UTF-8; Mozilla treats it as ASCII, so characters get mixed up. Should it be added to aliases.py?

--- ./aliases.py
+++ ./aliases.py
@@ -511,6 +511,7 @@
     'utf8' : 'utf_8',
     'utf8_ucs2' : 'utf_8',
     'utf8_ucs4' : 'utf_8',
+    'unicode' : 'utf_8',

     # uu_codec codec
     'uu' : 'uu_codec',
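For context, the status quo the patch would change can be checked with a few lines. This is only a sketch: the helper name `is_known_encoding` is made up for illustration, and it assumes a stock CPython whose alias table has not been modified.

```python
import codecs
import encodings.aliases


def is_known_encoding(name):
    """Return True if the codec registry can resolve `name` (hypothetical helper)."""
    try:
        codecs.lookup(name)
    except LookupError:
        return False
    return True


# 'utf8' is one of the existing, non-canonical aliases for the utf_8 codec.
print(encodings.aliases.aliases["utf8"])
# On an unmodified interpreter, 'unicode' is not in the alias table.
print("unicode" in encodings.aliases.aliases)
print(is_known_encoding("unicode"))
```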
msg147937 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-11-19 11:49
Sorry, but it's not obvious that Unicode means UTF-8.
msg147938 - Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2011-11-19 12:03
Definitely; this will just serve to create more confusion for beginners over what a Unicode string is: unicodestring.encode('unicode') <- WTF?
msg147969 - Author: Martin v. Löwis (loewis) * (Python committer) Date: 2011-11-19 20:28
Joining the chorus: people who need it in their application will have to add it themselves (monkeypatching the aliases dictionary as appropriate).
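A minimal sketch of what such application-level monkeypatching might look like. It assumes the application has decided that "unicode" always means UTF-8 in its own inputs, which is exactly the assumption disputed in this thread. The patch should run before anything tries to look up "unicode", because the `encodings` search function also caches failed lookups.

```python
import codecs
import encodings.aliases

# Application-level decision (an assumption, not a standard):
# treat the label "unicode" as UTF-8. setdefault() leaves any
# existing entry untouched.
encodings.aliases.aliases.setdefault("unicode", "utf_8")

# The codec registry now resolves the label through the alias table.
codec_info = codecs.lookup("unicode")
print(codec_info.name)

# Decoding with the patched alias behaves like plain UTF-8.
print(b"caf\xc3\xa9".decode("unicode"))
```

This affects every library running in the same interpreter, so it is best kept in application startup code rather than in a reusable module.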
msg148309 - Author: kxroberto (kxroberto) Date: 2011-11-25 08:22
I wonder where the frequent charset=unicode originated, and who invented it.

> Sorry, but it's not obvious that Unicode means UTF-8.

When I first came across it on the web, I guessed it meant UTF-8 without looking. It even sounds colloquially reasonable ;-) And it is right in 99.999% of cases. (UTF-16 is less frequent than this non-canonical "unicode".)

> Definitely; this will just serve to create more confusion for beginners over what a Unicode string is: unicodestring.encode('unicode') <- WTF?

I guess no Python tutorial writer or encoding menu writer would pose that example. That string comes in via technical paths: web, MIME, etc. aliases.py contains many other names that are not canonical. frequency > convenience > alias

> Joining the chorus: people who need it in their application will have to add it themselves (monkeypatching the aliases dictionary as appropriate).

Those people would first need to be aware of the option: either be all-seeing, or wait for the first bug reports ...

Reverse question: what would be the minus of having this alias?
msg148312 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-11-25 11:43
Python is not a language written for the web; it's a generic language to program anything! If you have a problem parsing an HTML page, the special case should be added to the HTML parser, not to the language. Do you have the encoding issue with a parser included in Python (html.parser.*)? If you have the issue with a third-party parser, you have to report the bug there.
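One way to read this suggestion is to normalize charset labels at the parsing boundary, before they reach the codec registry. The sketch below is an illustration only: `CHARSET_FIXUPS` and `resolve_charset` are hypothetical names, and mapping "unicode" to UTF-8 is an application choice, not something Python or any parser in the standard library defines.

```python
import codecs

# Hypothetical table of non-standard labels seen in the wild.
# Which entries belong here is an application decision.
CHARSET_FIXUPS = {"unicode": "utf-8"}


def resolve_charset(label, default="utf-8"):
    """Map a declared charset label to a codec name Python knows.

    Falls back to `default` when the label cannot be resolved.
    """
    cleaned = label.strip().strip("\"'").lower()
    cleaned = CHARSET_FIXUPS.get(cleaned, cleaned)
    try:
        return codecs.lookup(cleaned).name
    except LookupError:
        return default


print(resolve_charset("unicode"))
print(resolve_charset(" Latin-1 "))
print(resolve_charset("no-such-charset"))
```

Keeping the special case in the parser layer leaves the global alias table untouched, so other code in the same process still gets a `LookupError` for "unicode".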
msg148353 - Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2011-11-25 19:38
The mapping "unicode" -> "utf-8" is simply not defined unambiguously, in addition to being factually wrong. For example, when Microsoft talks about Unicode they mean UTF-16.
msg148354 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-11-25 19:46
> For example, when Microsoft talks about Unicode they mean UTF-16.

Sorry, but UTF-16 is ambiguous: do you mean UTF-16-LE or UTF-16-BE? ;-)
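The byte-order point can be seen directly with Python's own codecs. This is a small illustration, not part of the original discussion: the explicit `utf-16-le` and `utf-16-be` codecs fix the byte order, while plain `utf-16` prepends a byte order mark (BOM) and uses the platform's native order when encoding.

```python
import codecs

text = "A"

little = text.encode("utf-16-le")  # low byte first, no BOM
big = text.encode("utf-16-be")     # high byte first, no BOM
with_bom = text.encode("utf-16")   # BOM followed by native byte order

print(little, big, with_bom)

# The BOM tells a decoder which byte order follows.
starts_with_bom = with_bom[:2] in (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)
print(starts_with_bom)

# The same two bytes decode to different characters
# depending on the byte order that is assumed.
print(little.decode("utf-16-le"), little.decode("utf-16-be"))
```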
msg148362 - Author: Martin v. Löwis (loewis) * (Python committer) Date: 2011-11-25 21:09
> Reverse question: what would be the minus of having this alias?

Please accept that this issue is closed.
History
Date User Action Args
2022-04-11 14:57:23 admin set github: 57641
2011-11-25 21:09:54 loewis set messages: +
2011-11-25 19:46:49 vstinner set messages: +
2011-11-25 19:38:42 georg.brandl set messages: +
2011-11-25 11:43:21 vstinner set messages: +
2011-11-25 08:22:26 kxroberto set messages: +
2011-11-19 20:57:25 ezio.melotti set stage: resolved; versions: - Python 2.6, Python 3.1, Python 2.7, Python 3.2, Python 3.4
2011-11-19 20:28:46 loewis set nosy: + loewis; messages: +
2011-11-19 12:03:32 georg.brandl set status: open -> closed; nosy: + georg.brandl; messages: +; resolution: rejected
2011-11-19 11:49:47 vstinner set nosy: + vstinner; messages: +
2011-11-19 11:36:16 kxroberto set nosy: + ezio.melotti; type: enhancement; components: + Unicode; versions: + Python 2.6, Python 3.1, Python 2.7, Python 3.2, Python 3.3, Python 3.4
2011-11-19 11:35:12 kxroberto create