[Tutor] Python for Windows: module re, re.LOCALE different for Idle and python shell?
Steckel, Ralf Ralf.Steckel at AtosOrigin.com
Thu Jul 29 13:22:11 CEST 2004
Hi Steve,
Thanks for your suggestion (which actually prompted me to improve my script by opening the file via codecs), but it doesn't fix the problem.
In my first script I used open() and lines = f.readlines() to get the input. After converting the lines to unicode and printing the German umlauts character by character, Idle displays them correctly, but the Python shell prints some DOS special characters instead.
Even with codecs.open and encoding = 'iso-8859-1', my basic problem remains: in the Python shell, re does not recognize German umlauts as word characters.
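To see why the same umlaut can appear as a different glyph on the console, compare its byte values in the two encodings. This is a minimal Python 3 sketch, not part of the original post; it assumes the console uses code page 850 (a common OEM code page on German Windows) and that the file is ISO-8859-1.

```python
# Minimal sketch (Python 3): the same umlaut text has different byte values
# in ISO-8859-1 (Latin-1) and in code page 850 (assumed console code page).
text = "öäüÄÖÜß"

latin1_bytes = text.encode("iso-8859-1")
cp850_bytes = text.encode("cp850")

# Latin-1 byte values equal the code points the script prints.
print(list(latin1_bytes))  # [246, 228, 252, 196, 214, 220, 223]
print(list(cp850_bytes))   # [148, 132, 129, 142, 153, 154, 225]

# Interpreting Latin-1 bytes as cp850 yields other characters, one way
# that "DOS special chars" can show up on the console.
misread = latin1_bytes.decode("cp850")
print(misread)
```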
Greetings,
Ralf
PS: Please see this sample:
-file umlaute.txt:
öäüÄÖÜß
End.
-end file umlaute.txt
-script: umlaute.py:

import codecs
import re

r = re.compile('[\w]+', re.LOCALE)
f = codecs.open('umlaute.txt', 'r', 'iso-8859-1')
lines = f.readlines()
for line in lines:
    print 'line', line
    l = len(line)
    i = 0
    while i < l:
        print 'character:', line[i], ord(line[i])
        i = i + 1
    words = r.findall(line)
    print 'words:', words
f.close()

dummy = raw_input('')
-end script
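For comparison, here is a hypothetical Python 3 port of umlaute.py; the names WORD_RE, extract_words and main are illustrative and not from the post. It is a sketch under the assumption that Python 3 is available: it drops re.LOCALE, because Python 3.6+ rejects that flag for str patterns, and str patterns already treat umlauts and ß as word characters.

```python
import re

# Hypothetical Python 3 port of umlaute.py; names are illustrative.
# A str pattern is Unicode-aware by default, so \w matches umlauts and ß
# whatever the console locale is. re.LOCALE is omitted because Python 3.6+
# raises ValueError when it is combined with a str pattern.
WORD_RE = re.compile(r"\w+")


def extract_words(lines):
    """Print each character with its code point; return the words of each line."""
    result = []
    for line in lines:
        print("line", repr(line))
        for ch in line:
            # repr() shows '\r' and '\n' as escapes instead of moving the cursor.
            print("character:", repr(ch), ord(ch))
        words = WORD_RE.findall(line)
        print("words:", words)
        result.append(words)
    return result


def main(path):
    # Assumes the file is ISO-8859-1, as in the post. newline="" keeps the
    # '\r\n' endings, like codecs.open did in the original script.
    with open(path, encoding="iso-8859-1", newline="") as f:
        return extract_words(f)
```

Under these assumptions, main('umlaute.txt') on the sample file should return [['öäüÄÖÜß'], ['End']]. Printing repr(ch) instead of ch also keeps the carriage return (13) from jumping back to the start of the console line and overwriting "ch" with "13", which is what produces the "13aracter:" lines in the shell output.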
-output from Idle:
>>> ================================ RESTART
line öäüÄÖÜß
character: ö 246
character: ä 228
character: ü 252
character: Ä 196
character: Ö 214
character: Ü 220
character: ß 223
character: 13
character: 10
words: [u'\xf6\xe4\xfc\xc4\xd6\xdc\xdf']
line End.
character: E 69
character: n 110
character: d 100
character: . 46
character: 13
character: 10
words: [u'End']
-end output from Idle
-output from python shell:
D:\Src\Python\wordcount>python umlaute.py
line öäüÄÖÜß
character: ö 246
character: ä 228
character: ü 252
character: Ä 196
character: Ö 214
character: Ü 220
character: ß 223
13aracter:
character: 10
words: []
line End.
character: E 69
character: n 110
character: d 100
character: . 46
13aracter:
character: 10
words: [u'End']
D:\Src\Python\wordcount>
-end output from python shell
-----Original Message-----
From: Steve [mailto:lonetwin at gmail.com]
Sent: Thursday, July 29, 2004 11:57 AM
To: Steckel, Ralf
Subject: Re: [Tutor] Python for Windows: module re, re.LOCALE different for Idle and python shell?
Hi Ralf,

Just a wild guess here... I haven't actually tried this.

On Thu, 29 Jul 2004 09:25:45 +0200, Steckel, Ralf <ralf.steckel at atosorigin.com> wrote:
> I've written a Python script to extract all words from a text file and to
> print how often they are used. For doing that I use the re module with:
>
> r=re.compile('[\w]+', re.LOCALE | re.IGNORECASE)
<...snip...>
> My question is: how do I get the same environment on the command line
> as in Idle?
>
> I guess this is more a Windows question than a Python one, because Windows
> and DOS both support German 'Umlaute', but it seems they do it with
> different character codes.

How are you actually passing the contents of the file to the re expression? You probably have to enforce your particular encoding before having re parse the string. Something like:

s = file('foo.txt').read()
unicode(s, )
re.search(r, s)

HTH
Steve
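Steve's suggestion, decoding before matching, can be illustrated in Python 3, where the difference is explicit: a bytes pattern sees only ASCII word characters, while a pattern applied to decoded text also sees the umlauts. This is a sketch, not Steve's code; the sample bytes and the ISO-8859-1 encoding are assumptions standing in for the contents of foo.txt, and the decoded result is assigned back to a variable before searching.

```python
import re

# Sketch of "decode first, then search" (Python 3). The sample bytes stand in
# for the raw file contents; ISO-8859-1 is the assumed file encoding.
raw = "öäüÄÖÜß End.".encode("iso-8859-1")

# A bytes pattern without re.LOCALE treats \w as ASCII-only, so the umlaut
# bytes are not word characters.
print(re.findall(rb"\w+", raw))   # [b'End']

# Decoding to str first makes \w Unicode-aware.
text = raw.decode("iso-8859-1")
print(re.findall(r"\w+", text))   # ['öäüÄÖÜß', 'End']
```

The bytes result resembles the command-line run above, where the umlaut line produced words: [] and only 'End' was found.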