[solved] RE: No explanation for weird behavior in re module! [reading files in ISO-8859-1]

synthespian synthespian at uol.com.br
Mon Feb 11 20:17:23 EST 2002


In article <Xns91B26FC9941CEmhueningzedatfuberli at 130.133.1.4>, Matthias Huening <mhuening at zedat.fu-berlin.de> wrote:
> "Tim Peters" <tim.one at home.com> wrote in
> news:mailman.1013386937.27383.python-list at python.org: 
> 
>> 
>>> Other than the fact that 'Tür' has the 'ü' Unicode character, I fail
>>> to see any difference!
>> 
>> Heh.  Leaving this joy to someone else <wink>.
>> 
> 
> Okay, I'll try...
> If your string comes in as 'Latin-1' bytes, you have to decode it to 
> Unicode before matching, so that \w (with re.UNICODE) treats 'ü' as a 
> letter. And when you want to print it afterwards, you have to encode it 
> as 'Latin-1' again.
> 
> ---
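> import re
> # the byte string as it arrives from a Latin-1 (ISO-8859-1) source
> txt = 'die Tür, Türen'
> # decode the bytes to a Unicode string
> txt = unicode(txt, 'latin-1')
> # re.UNICODE makes \w match non-ASCII letters such as 'ü'
> pattern = re.compile(ur'(der|die|das)\s+(\w+)', re.UNICODE)
> 
> match = pattern.match(txt)
> article = match.group(1)
> noun = match.group(2)
> 
> # encode back to Latin-1 for output
> print article.encode('latin-1')
> print noun.encode('latin-1')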
> ---
> 
> Hope this helps.
> Matthias

Matthias-

	Thank you for this simple, elegant solution.
	Jason Orendorf also posted a similar solution, but this one is simpler.
	Thanks to all the fine people who tried to help, too.
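
	For the archive, here is a rough sketch of the same decode, match, encode
	round trip. It rests on assumptions that are not in Matthias's post: it
	targets Python 3, where unicode() no longer exists and str is already
	Unicode text, and the byte literal and variable names are made up for
	illustration. It also shows the truncated match that you get when the
	bytes are not decoded first.

```python
import re

# Hypothetical sample: the bytes of 'die Tür, Türen' as stored in an
# ISO-8859-1 (Latin-1) file. 0xFC is 'ü' in that encoding.
raw = b'die T\xfcr, T\xfcren'

# Matching on the undecoded bytes: with a bytes pattern, \w only covers
# ASCII letters, digits and '_', so the noun is cut off before 0xFC.
bytes_pattern = re.compile(rb'(der|die|das)\s+(\w+)')
truncated = bytes_pattern.match(raw).group(2)

# Decode first, as Matthias suggests. On a str pattern, \w is
# Unicode-aware by default in Python 3, so 'ü' counts as a letter.
txt = raw.decode('latin-1')
text_pattern = re.compile(r'(der|die|das)\s+(\w+)')
match = text_pattern.match(txt)
article = match.group(1)
noun = match.group(2)

# Encode again only when Latin-1 bytes are needed, e.g. to write the
# result back to an ISO-8859-1 file.
noun_bytes = noun.encode('latin-1')

print(truncated)
print(article)
print(noun_bytes)
```

	When the text really comes from a file, opening it with
	open(path, encoding='latin-1') should do the decoding step for you.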

	Learning German will never be the same again for me!!
	Danke schön!!! :-))

	Cheers
	Henry
	synthespian at uol.com.br





