Parsing

Timothy Wu huggiepython at graffiti.idv.tw
Sun Apr 11 12:13:09 EDT 2004


I'm parsing Firefox bookmarks and writing the same bookmarks to another 
file. To make sure I read and write UTF-8 correctly, I open the files 
like this for reading and writing:

   codecs.open(file, "r", "utf-8")
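
For illustration, a minimal sketch of that read/write round trip, assuming Python 3. The helper name copy_utf8 and the file names are hypothetical and not part of the original script; the demo writes a small sample file to a temporary directory so that it runs on its own.

```python
import codecs
import os
import tempfile


def copy_utf8(src_path, dst_path):
    """Read src_path as UTF-8 and write the same text to dst_path as UTF-8."""
    # codecs.open decodes on read, so `text` is a Unicode string, not bytes.
    with codecs.open(src_path, "r", "utf-8") as src:
        text = src.read()
    # On write, the Unicode string is encoded back to UTF-8 bytes.
    with codecs.open(dst_path, "w", "utf-8") as dst:
        dst.write(text)
    return text


# Demo with a throwaway file (hypothetical names, sample content).
workdir = tempfile.mkdtemp()
src_path = os.path.join(workdir, "bookmarks.html")
dst_path = os.path.join(workdir, "bookmarks_copy.html")
original = "<TITLE>Caf\u00e9 \u4e2d\u6587</TITLE>\n".encode("utf-8")
with open(src_path, "wb") as f:
    f.write(original)

text = copy_utf8(src_path, dst_path)
with open(dst_path, "rb") as f:
    copied = f.read()
```

The point of the sketch is that everything between the read and the write is decoded text, so later processing never has to deal with UTF-8 bytes directly.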

For the regular expression, I search each line like this:

   m = re.search("<TITLE>(.*?)</TITLE>", line, re.I)

How do I tell the regular expression to parse in UTF-8? From the docs it 
seems like I can do re.compile("<TITLE>(.*?)</TITLE>", 'U') for Unicode. 
But does it need to be specified as UTF-8 rather than some other 
Unicode encoding? Or does that not matter at all?
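
A hedged sketch of how this usually works, assuming Python 3: the flag is the constant re.U (also spelled re.UNICODE), not the string 'U', and it does not select an encoding. It only affects how classes such as \w and \b treat characters. Once a line has been decoded from UTF-8, the pattern is matched against Unicode characters, so the encoding of the file no longer matters to the regular expression. The names TITLE_RE, raw and title are made up for this example.

```python
import re

# Flags are module constants combined with |, passed as the second argument.
TITLE_RE = re.compile(r"<TITLE>(.*?)</TITLE>", re.I | re.U)

# Simulate a line as it sits on disk: UTF-8 encoded bytes.
raw = "<title>Caf\u00e9 \u4e2d\u6587</title>".encode("utf-8")

# Decoding is the only step where UTF-8 is mentioned
# (codecs.open does this step implicitly when reading).
line = raw.decode("utf-8")

# The regex sees decoded characters, whatever the original encoding was.
m = TITLE_RE.search(line)
title = m.group(1) if m else None
```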

Also, I'm not calling compile() directly at all. I'm simply calling 
re.search(). How would I specify Unicode there? Is it simply 
re.flags = 'U' before any call to search()?
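
For reference, a small sketch of two ways flags can be given without calling compile(), assuming Python 3. re.search() accepts the flags as its third argument, and the pattern itself can carry inline flags such as (?iu) at its start. As far as I know the re module has no global flags setting, so the flags are supplied per call or per pattern. The sample line and variable names are hypothetical.

```python
import re

line = "<Title>R\u00e9sum\u00e9</Title>"

# Option 1: pass the combined flags as the third argument to re.search().
m = re.search(r"<TITLE>(.*?)</TITLE>", line, re.I | re.U)

# Option 2: put inline flags at the very start of the pattern.
inline = re.search(r"(?iu)<TITLE>(.*?)</TITLE>", line)

title_from_flags = m.group(1) if m else None
title_from_inline = inline.group(1) if inline else None
```

Both calls match the mixed-case tag here because of the ignore-case flag; the Unicode flag only changes the behaviour of classes like \w, which this pattern does not use.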

Timothy




More information about the Python-list mailing list