Regular expressions and non-standard character set

Petri Mikael Kuittinen eye at niksula.hut.fi
Tue Mar 27 10:01:39 EST 2001


"Fredrik Lundh" <fredrik at pythonware.com> writes:

> >>> import locale
> >>> locale.setlocale(locale.LC_ALL, "")
> 'Swedish_Sweden.1252'

I tried to find information about setting the locale. Neither the Linux
man pages nor the standard Python documentation were helpful on this matter.

I want to set the character set manually e.g. to ISO-8859-1 (ISO Latin
1). I tried something like:

locale.setlocale(locale.LC_CTYPE, "iso_88591_1")
locale.setlocale(locale.LC_CTYPE, "en_US.ISO8859-1")
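
Attempts like these can be checked in a small loop. What follows is only a minimal sketch: the candidate names are guesses (which names are accepted depends on the locales installed on the system), and try_locales is a hypothetical helper, not part of the locale module.

```python
import locale

def try_locales(candidates, category=locale.LC_CTYPE):
    """Return the first candidate name that setlocale() accepts, or None.

    Hypothetical helper for illustration; it changes the process locale
    as a side effect when a candidate is accepted.
    """
    for name in candidates:
        try:
            locale.setlocale(category, name)
            return name
        except locale.Error:
            # The platform does not know this locale name; try the next one.
            continue
    return None

# The names below are assumptions; "C" is listed last as a portable fallback.
chosen = try_locales(["en_US.ISO8859-1", "en_US.ISO-8859-1", "C"])
print("accepted locale:", chosen)
```

If none of the names is accepted, the helper returns None, which at least shows that the name, and not the call, is the problem.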

Is there any good tutorial on locale? The best I could find is
http://www.uni-ulm.de/~s_smasch/Locale/

> >>> import re
> >>> re.findall(r"\b...\b", "spam, egg, bacon, and åäö")
> ['egg', 'and']
> >>> re.findall(r"(?L)\b...\b", "spam, egg, bacon, and åäö")
> ['egg', 'and', 'åäö']

I tried the above. It didn't work under Python 2.0 on Windows 2000,
but it worked using pre instead of re (just like you said). The
problem is that it still doesn't solve my problem. åäö work because
they are part of the Finnish LC_CTYPE, but how do I get all the other
national characters (German umlauts etc.) to work at the same time?
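
One direction that might cover several languages at once is to match against Unicode strings instead of relying on a single locale. The sketch below assumes a Python version where str is Unicode (Python 3); on Python 2 the equivalent would presumably need unicode strings and the re.UNICODE flag. The sample text is made up, and \w{3} is used in place of the three dots from the quoted example so that only word characters are matched.

```python
import re
import unicodedata

# Made-up sample mixing Finnish/Swedish, German and French letters.
text = "spam, egg, bacon, and åäö, für, été"

# Normalize to composed form (NFC) so each accented letter is one code point;
# combining marks on their own would not count as word characters.
text = unicodedata.normalize("NFC", text)

# For str patterns in Python 3, \w and \b follow Unicode rules by default,
# so no locale setting is involved here.
words = re.findall(r"\b\w{3}\b", text)
print(words)
```

This sidesteps LC_CTYPE entirely, at the cost of having to decode the input to Unicode first.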


Petri

-- 
<(O)> Petri Kuittinen, also known as Eye, Dj Eye or Peku               <(O)>
<(O)> ADDRESS: Postipuuntie 10 A 14, FIN-02600 Espoo, Finland          <(O)>
<(O)> EMAIL: eye at iki.fi WWW: http://www.iki.fi/~eye/ PHONE: 09-5472380 <(O)>
~Steckel's Rule to Success: Good enough is never good enough. 



