[solved] RE: No explanation for weird behavior in re module! [reading files in ISO-8859-1]
synthespian
synthespian at uol.com.br
Mon Feb 11 20:17:23 EST 2002
In article <Xns91B26FC9941CEmhueningzedatfuberli at 130.133.1.4>, Matthias Huening <mhuening at zedat.fu-berlin.de> wrote:
> "Tim Peters" <tim.one at home.com> wrote in
> news:mailman.1013386937.27383.python-list at python.org:
>
>>
>>> Other than the fact that 'Tür' has the 'ü' Unicode character, I
>>> fail to see any difference!
>>
>> Heh. Leaving this joy to someone else <wink>.
>>
>
> Okay, I'll try...
> If your string comes in as 'Latin-1' bytes, you will have to decode it
> to Unicode before matching. And when you want to print it afterwards,
> you'll have to encode it as 'Latin-1' again.
>
> ---
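> import re
> # byte string, as it would be read from a Latin-1 file
> txt = 'die Tür, Türen'
> # decode the Latin-1 bytes into a unicode object
> txt = unicode(txt, 'latin-1')
> # re.UNICODE makes \w match non-ASCII letters such as 'ü'
> pattern = re.compile(ur'(der|die|das)\s+(\w+)', re.UNICODE)
>
> match = pattern.match(txt)
> article = match.group(1)
> noun = match.group(2)
>
> # encode back to Latin-1 for output
> print article.encode('latin-1')
> print noun.encode('latin-1')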
> ---
>
> Hope this helps.
> Matthias
Matthias-
Thank you for this simple, elegant solution.
Although Jason Orendorf has also posted a similar solution, this one is simpler.
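For anyone reading ISO-8859-1 files on a newer interpreter, here is a rough sketch of the same idea, assuming Python 3 (not what Matthias posted). There, str is already Unicode and \w in a str pattern matches Unicode word characters by default, so only the decode step remains. The name "raw" is just an illustrative stand-in for bytes read from a Latin-1 file.

```python
import re

# Bytes as they might arrive from an ISO-8859-1 (Latin-1) file; 0xFC is 'ü'.
# "raw" is a hypothetical stand-in for the file contents.
raw = b'die T\xfcr, T\xfcren'

# Decode to text first (the Python 3 counterpart of unicode(txt, 'latin-1')).
txt = raw.decode('latin-1')

# For str patterns in Python 3, \w matches Unicode word characters by default,
# so no re.UNICODE flag is needed here.
pattern = re.compile(r'(der|die|das)\s+(\w+)')

match = pattern.match(txt)
article = match.group(1)
noun = match.group(2)

print(article)
print(noun)
```

When reading from an actual file, passing encoding='latin-1' to open() should do the decoding for you.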
Thanks to all the fine people who tried to help, too.
Learning German will never be the same again for me!!
Danke schön!!! :-))
Cheers
Henry
synthespian at uol.com.br