[Tutor] UTF-8 title() string method
Kent Johnson
kent37 at tds.net
Thu Jul 5 19:52:54 CEST 2007
Jon Crump wrote:
> On Wed, 4 Jul 2007, Kent Johnson wrote:
>> First, don't confuse unicode and utf-8.
>
> Too late ;-) already pitifully confused.
This is a good place to start correcting that:
http://www.joelonsoftware.com/articles/Unicode.html
>> Second, convert the string to unicode and then title-case it, then
>> convert back to utf-8 if you need to:
>
> I'm having trouble figuring out where, in the context of my code, to
> effect these translations.
If s is your utf-8 string, instead of s.title(), use
s.decode('utf-8').title().encode('utf-8')
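As a minimal sketch of that round trip: the helper name title_utf8 and the sample place name below are made up for illustration, not taken from your code.

```python
def title_utf8(raw):
    """Title-case a utf-8 encoded byte string (hypothetical helper)."""
    text = raw.decode('utf-8')       # utf-8 bytes -> unicode string
    titled = text.title()            # case rules now apply to whole characters
    return titled.encode('utf-8')    # unicode string -> utf-8 bytes again

# Sample data, assumed for illustration: a lower-case name starting with e-acute.
raw = u'\u00e9vreux sur seine'.encode('utf-8')
result = title_utf8(raw)
```

Here result is again utf-8 data, so it can be written straight back to a file. If the rest of the program can work with unicode strings, the final encode() can wait until output time.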
> In parsing the text file, I depend on
> matching a re:
>
> if re.match(r'[A-Z]{2,}', line)
>
> to identify and process the place name data. If I translate the line to
> unicode, the re fails.
I don't know why that is; re works with unicode strings:
In [1]: import re
In [2]: re.match(r'[A-Z]{2,}', 'ABC')
Out[2]: <_sre.SRE_Match object at 0x12078e0>
In [3]: re.match(r'[A-Z]{2,}', u'ABC')
Out[3]: <_sre.SRE_Match object at 0x11c1f00>
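Putting the two pieces together, here is a rough sketch of where the conversions could go when processing one line at a time. I haven't seen the rest of your code, so the function name process_line and the sample lines are assumptions; only the pattern comes from your message.

```python
import re

# Same pattern as in the original message: two or more ASCII capitals
# at the start of the line.
PLACE_RE = re.compile(r'[A-Z]{2,}')

def process_line(raw_line):
    """Title-case a utf-8 line if it looks like a place name (hypothetical helper)."""
    line = raw_line.decode('utf-8')      # decode once, right after reading
    if PLACE_RE.match(line):             # the match runs on the unicode string
        line = line.title()
    return line.encode('utf-8')          # encode once, right before writing

# Sample lines, assumed for illustration.
place = process_line(u'ANGOUL\u00caME'.encode('utf-8'))
other = process_line(u'see the entry above'.encode('utf-8'))
```

Note that [A-Z] only covers the ASCII capitals, so a line whose first two characters include an accented capital would not match with this pattern, whether or not the line has been decoded.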
Kent