sgmllib too slow

Stuart D. Gathman stuart at bmsi.com
Mon May 6 23:16:10 EDT 2002


I've run into my very first situation where Python is not "fast enough". I
am using the sgmllib module to parse HTML attachments in a milter.  Processor
idle time drops from 80% to 30% when HTML parsing is turned on (the machine
is also a web server, so this is bad).  It takes 5 minutes to parse a 150K
attachment on a 100 MHz PowerPC 604.

1. Rewriting the whole thing in C is out of the question.  Rewriting in
Java is a possibility, and easier than C, but not nearly as easy as
Python.

2. Since sgmllib.SGMLParser is callback based, I could write a flex or
bison grammar in C that calls back to the SGMLParser methods for each
recognized element.  That may or may not speed things up.
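
For reference, the callback style that option 2 would have to preserve looks
roughly like the sketch below. It is only an illustration under stated
assumptions: it uses html.parser.HTMLParser from the Python 3 standard
library as a stand-in for sgmllib.SGMLParser (sgmllib is not available in
Python 3), and the TagCounter class, the parse_html helper, and the sample
input are hypothetical names made up for this example, not part of the
milter.

```python
from html.parser import HTMLParser  # stand-in; sgmllib exists only in Python 2


class TagCounter(HTMLParser):
    """Hypothetical handler: the parser drives it through callbacks."""

    def __init__(self):
        super().__init__()
        self.tags = {}   # start-tag name -> number of occurrences
        self.text = []   # character data in document order

    def handle_starttag(self, tag, attrs):
        # Called once per recognized start tag (tag name is lowercased).
        self.tags[tag] = self.tags.get(tag, 0) + 1

    def handle_data(self, data):
        # Called for each run of text between tags.
        self.text.append(data)


def parse_html(source):
    """Feed a complete HTML string and return (tag counts, joined text)."""
    parser = TagCounter()
    parser.feed(source)
    parser.close()
    return parser.tags, "".join(parser.text)


tags, text = parse_html(
    "<html><body><p>Hello <b>world</b></p><p>Bye</p></body></html>"
)
```

A C scanner would replace the tokenizing done inside feed() and invoke the
same handle_* methods, so the Python handler code would not need to change.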

Any suggestions?

-- 
	      Stuart D. Gathman <stuart at bmsi.com>
Business Management Systems Inc.  Phone: 703 591-0911 Fax: 703 591-6154
"Confutatis maledictis, flammis acribus addictis" - background song for
a Microsoft-sponsored "Where do you want to go from here?" commercial.


