[Python-Dev] Python and the Unicode Character Database
Nick Coghlan ncoghlan at gmail.com
Mon Nov 29 13:43:26 CET 2010
On Mon, Nov 29, 2010 at 9:02 PM, M.-A. Lemburg <mal at egenix.com> wrote:
> If we would go down that road, we would also have to disable other Unicode features based on locale, e.g. whether to apply non-ASCII case mappings, what to consider whitespace, etc.
>
> We don't do that for a good reason: Unicode is supposed to be universal and not limited to a single locale.
Because parsing numbers is about more than just the characters used for the individual digits. There are additional semantics associated with digit ordering (for any number), and with decimal separators and exponential notation (for floating point numbers), and those vary by locale. We deliberately chose to make the builtin numeric parsers unaware of all of those things. Assuming that we can simply parse other digits as if they were their ASCII equivalents, while otherwise assuming a C locale, seems questionable.
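To make the concern concrete, here is a minimal sketch of the mismatch, assuming current CPython 3 behaviour (not something the language reference guarantees). The helper try_float and the variable names are made up for illustration. The builtin float() accepts Arabic-Indic digits, but only in combination with the ASCII '.' separator, so a number written with the separator that normally accompanies those digits is rejected.

```python
# Arabic-Indic digits for 1234 and 56, written as escapes so the
# source is not affected by bidirectional text rendering.
INT_PART = "\u0661\u0662\u0663\u0664"
FRAC_PART = "\u0665\u0666"


def try_float(text):
    """Return float(text), or None if the builtin parser rejects it."""
    try:
        return float(text)
    except ValueError:
        return None


# Non-ASCII digits combined with the ASCII decimal point are accepted.
with_ascii_point = try_float(INT_PART + "." + FRAC_PART)

# U+066B ARABIC DECIMAL SEPARATOR is not treated as a decimal point.
with_arabic_separator = try_float(INT_PART + "\u066b" + FRAC_PART)

# A comma as decimal separator is not recognised either.
with_comma = try_float("1234,56")

print(with_ascii_point, with_arabic_separator, with_comma)
```

In other words, the parser is universal about digit characters but fixed to C-locale conventions for everything else.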
If the existing semantics can be adequately defined, documented and defended, then retaining them would be fine. However, the language reference needs to define the behaviour properly so that other implementations know what they need to support and what can be chalked up as being just an implementation accident of CPython. (As a point in the plus column, both decimal.Decimal and fractions.Fraction were able to handle the '١٢٣٤.٥٦' example in a manner consistent with the int and float handling.)
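The consistency mentioned in the parenthetical can be checked with a short sketch like the one below. It assumes CPython 3 and spells the example string with escapes. The variable names are illustrative only, and other implementations may behave differently, which is exactly the gap in the specification.

```python
from decimal import Decimal
from fractions import Fraction

# The example string: Arabic-Indic digits 1234, ASCII '.', digits 56.
text = "\u0661\u0662\u0663\u0664.\u0665\u0666"

as_int = int(text[:4])        # integer part only; int() rejects the '.'
as_float = float(text)
as_decimal = Decimal(text)
as_fraction = Fraction(text)

print(as_int, as_float, as_decimal, as_fraction)
```

All four constructors treat the non-ASCII digits as their ASCII equivalents and agree on the resulting value.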
Regards, Nick.
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia