[Tutor] Regular Expression guru sought

Jeff Shannon jeff@ccvcorp.com
Mon Aug 4 15:01:14 EDT 2003


Kirk Bailey wrote:

> This thing is just flat going to need a lot of re stuff, and I need 
> therefore to come up to speed on re. 


I'm not so sure that re's are quite what you want -- or at least, I'm 
not sure if re's are enough.

The problem with re's is that they're not very good at handling nested 
data structures.  It's often mentioned that re's are not appropriate for 
parsing HTML or XML because of this limitation, and I suspect that the 
same will apply to parsing your simple wiki code as well.  You could 
perhaps write re's that will handle the majority of likely cases, but 
(if I'm right) it's almost assured that eventually, someone will write a 
wiki page that can't be properly parsed with a re-based approach.  
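To make the nesting problem concrete, here is a minimal sketch. The `[[ ... ]]` markup is made up for illustration (I don't know what your wiki syntax actually looks like), and `top_level_spans` is a hypothetical helper, not something from the standard library. A non-greedy pattern stops at the first closing marker, so it cuts a nested span short; a greedy pattern runs to the last closing marker, so it merges two separate spans. A small scanner that counts nesting depth gets both cases right.

```python
import re

# Hypothetical wiki markup: [[ ... ]] spans that are allowed to nest.
NON_GREEDY = re.compile(r"\[\[(.*?)\]\]")
GREEDY = re.compile(r"\[\[(.*)\]\]")

nested = "[[outer [[inner]] tail]]"
siblings = "[[a]] and [[b]]"

# Non-greedy stops at the first "]]", cutting the outer span short.
non_greedy_nested = NON_GREEDY.findall(nested)

# Greedy runs to the last "]]", gluing two sibling spans together.
greedy_siblings = GREEDY.findall(siblings)


def top_level_spans(text):
    """Return the contents of each outermost [[ ... ]] span.

    Tracks nesting depth explicitly, which is the bookkeeping a
    single regular expression cannot do for arbitrary depth.
    """
    spans = []
    depth = 0
    start = 0
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair == "[[":
            if depth == 0:
                start = i + 2  # content begins after the opener
            depth += 1
            i += 2
        elif pair == "]]" and depth > 0:
            depth -= 1
            if depth == 0:
                spans.append(text[start:i])
            i += 2
        else:
            i += 1
    return spans


print(non_greedy_nested)
print(greedy_siblings)
print(top_level_spans(nested))
print(top_level_spans(siblings))
```

Neither pattern is wrong for flat input; the trouble only shows up once spans can contain other spans, which is exactly the case someone will eventually write.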

It seems to me that you may be better off investigating a full 
lexer/parser, a la the HTMLParser or sgmllib standard library modules. 
In fact, you may be able to use a lot of the code from HTMLParser.  It 
looks like this uses re's internally, to identify tags, but it does a 
lot more than just extracting re groups.
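As a rough sketch of what the parser approach looks like, here is a small subclass of the standard library's HTMLParser class. In current Python 3 it is imported from `html.parser`; the `HTMLParser` module name above is the older Python 2 spelling. The class name `OutlineParser` and its `stack` and `events` attributes are my own invention for the example. The parser finds the tags for you, and your handler methods keep a stack so that every piece of text knows which elements it sits inside.

```python
from html.parser import HTMLParser


class OutlineParser(HTMLParser):
    """Record each run of text together with the tags enclosing it."""

    def __init__(self):
        super().__init__()
        self.stack = []   # currently open tags, outermost first
        self.events = []  # (path, text) pairs in document order

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)

    def handle_endtag(self, tag):
        # Only pop when the closing tag matches the innermost open tag;
        # a real wiki parser would need a policy for mismatches.
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.events.append(("/".join(self.stack), text))


parser = OutlineParser()
parser.feed("<p>plain <b>bold <i>both</i></b> done</p>")
parser.close()

for path, text in parser.events:
    print(path, "->", text)
```

For a wiki you would swap the HTML tokenizing for your own markup, but the shape stays the same: a tokenizer that recognizes the markers (quite possibly with re's) feeding handlers that track the nesting.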

Jeff Shannon
Technician/Programmer
Credit International





