Regular expression problem

Sean 'Shaleh' Perry shalehperry at attbi.com
Thu Feb 28 01:17:11 EST 2002


On 28-Feb-2002 Asheesh Laroia wrote:
> I've been trying to use sgmllib, actually, to delete all the other tags.
> 
> It just doesn't handle the <@ [...] > condition well.  It refuses to
> parse it, treating it as text.
> 

The reason is this:

starttagopen = re.compile('<[>a-zA-Z]')
tagfind = re.compile(r'[a-zA-Z][-_.a-zA-Z0-9]*')

near the top of sgmllib.py.  Neither pattern accepts '@' as the first
character of a tag name.

Changing them in your code (so that they also accept '@') will allow the
parser to understand the tag.  However, there is another problem which
requires more work.  When a tag is found, the parser tries to run
'start_' + tag, and start_@Trap() is not a valid Python name.  You could
redefine the function which calls the handlers so that it looks for, say,
start_atTrap() instead.  This would let you use the SGMLParser for all of
your parsing needs, but it may also be overkill for the problem.
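
A rough sketch of both ideas, kept independent of sgmllib itself so it runs
with nothing but the re module.  The widened patterns, the handler_name()
helper and the Dispatcher class are assumptions made up for illustration;
they are not part of sgmllib, and a real fix would still have to be wired
into SGMLParser's own tag-handling code.

```python
import re

# The two patterns as quoted from sgmllib.py above.
starttagopen = re.compile('<[>a-zA-Z]')
tagfind = re.compile(r'[a-zA-Z][-_.a-zA-Z0-9]*')

# Widened variants (assumed for this sketch): also allow a leading '@'.
starttagopen_at = re.compile('<[>@a-zA-Z]')
tagfind_at = re.compile(r'@?[a-zA-Z][-_.a-zA-Z0-9]*')


def handler_name(tag):
    """Turn a tag name into a valid method name, e.g. '@Trap' -> 'start_atTrap'.

    Hypothetical helper; the 'at' prefix is just one possible convention.
    """
    if tag.startswith('@'):
        tag = 'at' + tag[1:]
    return 'start_' + tag


class Dispatcher:
    """Toy stand-in for the part of a parser that calls start_* handlers."""

    def __init__(self):
        self.seen = []

    def start_atTrap(self):
        # Handler for the <@Trap ...> tag.
        self.seen.append('Trap')

    def dispatch(self, text):
        """Call the handler for the tag at the start of text, if there is one."""
        if not starttagopen_at.match(text):
            return False
        found = tagfind_at.match(text, 1)  # the tag name starts after '<'
        if found is None:
            return False
        method = getattr(self, handler_name(found.group()), None)
        if method is None:
            return False
        method()
        return True
```

With this, Dispatcher().dispatch('<@Trap foo>') ends up calling
start_atTrap(), while the original starttagopen pattern does not match
'<@Trap foo>' at all.  The sketch keeps the tag's case as written and
ignores attributes.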



