Issue 937263: cannot find or replace umlauts

#codecs1.py
# -- coding: --
import codecs
import string
from string import *
import re

s = 'Cica Ûl daß Õrölt Ûz'
print s
s = unicode(s, "iso-8859-1")  # <-- this is needed!!
print s
print s.lower()
print find(s, u'daß')
s = replace(s, u'daß', u'dass')
print s

Without the unicode conversion, a normal string cannot be searched for, or have replaced, anything other than ASCII characters (code points below 128). This is very bad practice. At least iso-8859-1 should be the default codec, not ASCII.

Please inform me over email (eleonora46@gmx.net) about the processing of this issue, thanks.

Logged In: YES user_id=89016

If you replace s = 'Cica Ûl daß Õrölt Ûz' with s = u'Cica Ûl daß Õrölt Ûz' (note the u prefix), you can drop the s = unicode(...) line. Specifying an encoding header only changes how the unicode literals in the script are interpreted, not the str literals. Note that ASCII was chosen as the default encoding because it helps to detect conversion errors.
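
For illustration, here is a minimal sketch of the explicit-decoding approach described above. It assumes Python 3 rather than the Python 2 of the report: there, bytes plays the role of the old str and str the role of unicode. The names raw, text and decodes_as_ascii are hypothetical and do not come from the original script.

```python
# Sketch only, assuming Python 3. All names here are illustrative.

# Bytes as they would sit in an ISO-8859-1 encoded source file or data file.
raw = u'Cica Ûl daß Õrölt Ûz'.encode('iso-8859-1')

# Explicit conversion, the counterpart of unicode(s, "iso-8859-1") in the report.
text = raw.decode('iso-8859-1')

# Once the data is text, searching, replacing and case mapping work on characters.
position = text.find(u'daß')
replaced = text.replace(u'daß', u'dass')
lowered = text.lower()


def decodes_as_ascii(data):
    """Return True if the bytes are pure ASCII, False otherwise."""
    try:
        data.decode('ascii')
    except UnicodeDecodeError:
        # A strict ASCII codec refuses non-ASCII bytes instead of guessing
        # an encoding, which is how it surfaces conversion errors early.
        return False
    return True


print(position)
print(replaced)
print(lowered)
print(decodes_as_ascii(raw))
```

Decoding once at the point where the data enters the program, and working with text from then on, keeps the choice of codec in a single visible place rather than relying on a default.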