[Tutor] Python for Windows: module re, re.LOCALE different for Idle and python shell?

Steckel, Ralf Ralf.Steckel at AtosOrigin.com
Thu Jul 29 13:22:11 CEST 2004


Hi Steve,

thanks for your suggestion (which actually prompted me to improve my script by opening the file via codecs), but this doesn't fix the problem.

In my first script I used open() and lines = f.readlines() to read the input. After converting the lines to unicode and printing the German Umlaute as characters, Idle prints the Umlaute correctly, but the Python shell prints some DOS special characters instead.
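The difference in what gets printed might come down to code pages. As a rough sketch, assuming the console uses code page 850 (an assumption; nothing above says which code page is active), the same bytes decode to different characters under iso-8859-1 and cp850. The variable names are only illustrative, and the snippet is written to run on Python 2.7 or 3:

```python
# Bytes for the first three Umlaute as stored in an iso-8859-1 file.
raw = b'\xf6\xe4\xfc'

# Decoded with the encoding the file was written in.
as_latin1 = raw.decode('iso-8859-1')

# Decoded with cp850, assumed here to be the console code page.
as_cp850 = raw.decode('cp850')

# The same bytes map to different characters in the two code pages.
same_text = (as_latin1 == as_cp850)

# In cp850 the letter o-umlaut is stored as byte 0x94, not 0xf6.
o_umlaut_cp850 = u'\xf6'.encode('cp850')
```

If this is what happens, the character codes reported by ord() would be identical in both environments and only the glyphs drawn by the console would differ.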

Even with codecs.open and encoding = 'iso-8859-1', my basic problem still exists: in the Python shell, re does not recognize German Umlaute as valid word characters.
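One thing that may be worth trying, sketched here under the assumption that the lines are already unicode objects (as they are after codecs.open): compile the pattern with re.UNICODE instead of re.LOCALE, so that \w is decided by the Unicode character tables rather than by the process locale. The sample text is a stand-in for the contents of umlaute.txt, and the snippet is written to run on Python 2.7 or 3:

```python
import re

# Stand-in for the decoded contents of umlaute.txt (hypothetical sample).
text = u'\xf6\xe4\xfc\xc4\xd6\xdc\xdf\r\nEnd.\r\n'

# re.UNICODE makes \w follow the Unicode character database,
# independent of whatever locale the process happens to run under.
word_pattern = re.compile(r'\w+', re.UNICODE)

words = word_pattern.findall(text)
print(words)
```

This does not explain why re.LOCALE behaves differently in the two environments; it only sidesteps the locale for text that has already been decoded.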

Greetings,

Ralf

PS: Please see the sample:

-file umlaute.txt:
öäüÄÖÜß
End.
-end file umlaute.txt

-script: umlaute.py:

import codecs
import re

r = re.compile('[\w]+', re.LOCALE )

f = codecs.open('umlaute.txt', 'r', 'iso-8859-1')

lines = f.readlines()

for line in lines:
    print 'line', line
    l = len(line)
    i = 0
    while i < l:
        print 'character:', line[i], ord(line[i])
        i = i + 1

    words = r.findall(line)
    print 'words:', words

f.close()

dummy = raw_input('')
-end script

-output from Idle:
>>> ================================ RESTART

line öäüÄÖÜß
character: ö 246
character: ä 228
character: ü 252
character: Ä 196
character: Ö 214
character: Ü 220
character: ß 223
character: 13
character: 10
words: [u'\xf6\xe4\xfc\xc4\xd6\xdc\xdf']
line End.
character: E 69
character: n 110
character: d 100
character: . 46
character: 13
character: 10
words: [u'End']
-end output from Idle

-output from python shell:
D:\Src\Python\wordcount>python umlaute.py
line öäüÄÖÜß

character: ö 246
character: ä 228
character: ü 252
character: Ä 196
character: Ö 214
character: Ü 220
character: ß 223
13aracter:
character: 10
words: []
line End.

character: E 69
character: n 110
character: d 100
character: . 46
13aracter:
character: 10
words: [u'End']

D:\Src\Python\wordcount>
-end output from python shell

-----Original Message-----
From: Steve [mailto:lonetwin at gmail.com]
Sent: Thursday, July 29, 2004 11:57 AM
To: Steckel, Ralf
Subject: Re: [Tutor] Python for Windows: module re, re.LOCALE different for Idle and python shell?

Hi Ralf,
Just a wild guess here ....haven't actually tried this ..

On Thu, 29 Jul 2004 09:25:45 +0200, Steckel, Ralf
<ralf.steckel at atosorigin.com> wrote:
> i've written a python script to extract all words from a text file and to
> print how often they are used. For doing that i use the re module with:
>
> r=re.compile('[\w]+', re.LOCALE | re.IGNORECASE)
<...snip...>
> My question is: how do i get for the command line the same environment as
> for Idle?
>
> I guess this is rather a Windows question than a Python one, because Windows
> and DOS both support German 'Umlaute', but it seems they do it with
> different character codes.

How are you actually passing the contents of the file to the re
expression? Probably you'd have to enforce your particular encoding
before having re parse the string. Something like:

s = file('foo.txt').read()
unicode(s, )
re.search(r, s)

HTH
Steve


