[Python-Dev] Python and the Unicode Character Database

Alexander Belopolsky alexander.belopolsky at gmail.com
Thu Dec 2 04:28:49 CET 2010


On Wed, Dec 1, 2010 at 10:11 PM, Terry Reedy <tjreedy at udel.edu> wrote:

On 12/1/2010 7:44 PM, Alexander Belopolsky wrote:

it. The argument was that if there was a use case for parsing Eastern Arabic numerals, it would be better served by a module written by someone who speaks one of the Arabic languages and knows the details of how Eastern Arabic numerals are written. So far nobody has even claimed to know conclusively that Arabic-Indic digits are always written left-to-right.

Both my personal observations when travelling from Turkey to India and Wikipedia say yes.

"When representing a number in Arabic, the lowest-valued position is placed on the right, so the order of positions is the same as in left-to-right scripts."

https://secure.wikimedia.org/wikipedia/en/wiki/Arabiclanguage#Numerals

This matches my limited research on this topic as well. However, I am not sure that when these codes are embedded in Arabic text, their logical order always matches their display order. It seems to me that it can go either way depending on the surrounding text and/or presence of explicit formatting codes. Also, I don't understand why Eastern Arabic-Indic digits have the same Bidi-Class as European digits, but Arabic-Indic digits, Arabic decimal and thousands separators have Bidi-Class "AN".

http://www.unicode.org/reports/tr9/tr9-23.html#Bidirectional_Character_Types
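The Bidi-Class values mentioned above can be checked from Python with the standard `unicodedata` module. This is only a minimal sketch: the choice of sample code points and the names `SAMPLES` and `bidi_class_table` are illustrative assumptions, and `unicodedata` reports the UCD version bundled with the running interpreter, which may not be the revision described in the report linked above.

```python
import unicodedata

# Illustrative sample; the selection of code points is an assumption
# made for this sketch, not something taken from the thread.
SAMPLES = [
    "\u0030",  # DIGIT ZERO (European digit)
    "\u0660",  # ARABIC-INDIC DIGIT ZERO
    "\u06F0",  # EXTENDED ARABIC-INDIC DIGIT ZERO (Eastern form)
    "\u066B",  # ARABIC DECIMAL SEPARATOR
    "\u066C",  # ARABIC THOUSANDS SEPARATOR
]


def bidi_class_table(chars):
    """Map each character's Unicode name to its Bidi-Class abbreviation."""
    return {unicodedata.name(ch): unicodedata.bidirectional(ch) for ch in chars}


if __name__ == "__main__":
    # Print one line per character: Bidi-Class, then the character name.
    for name, bidi in bidi_class_table(SAMPLES).items():
        print(f"{bidi:>2}  {name}")
```

If the interpreter's database agrees with the observation above, the European and Eastern digits should come out with the same class, while the Arabic-Indic digit and the two separators should come out as "AN".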


