Something faster than sgmllib for sucking out URLs

Alex Polite m2 at plusseven.com
Wed Jun 12 16:28:29 EDT 2002


I'm working on a web spider to fit my sick needs. The profiler
tells me that about 95% of the time is spent in sgmllib. I use sgmllib
solely for extracting URLs. I'm looking for a faster way of doing
this. Regular expressions, string searches? What's the way to go? I'm
not a Python purist. Calling some fast C program with the HTML as
its argument and getting back a list of URLs would be fine by me.
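
To make the regular-expression option concrete, here is a minimal sketch
of what it might look like. The names URL_ATTR_RE and extract_urls are
made up for illustration, and it assumes that only href= and src=
attribute values count as URLs. It does not parse the markup at all, so
it will also pick up matches inside comments or scripts and can be
fooled by badly broken HTML.

```python
import re

# Hypothetical pattern (an assumption, not a full HTML grammar): matches the
# value of an href= or src= attribute, whether double-quoted, single-quoted,
# or unquoted.
URL_ATTR_RE = re.compile(
    r'''\b(?:href|src)\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s>"']+))''',
    re.IGNORECASE,
)


def extract_urls(html):
    """Return href/src attribute values in the order they appear."""
    urls = []
    for match in URL_ATTR_RE.finditer(html):
        # At most one of the three groups is non-empty for a given match.
        url = match.group(1) or match.group(2) or match.group(3)
        if url:  # skip empty values such as href=""
            urls.append(url)
    return urls
```

Calling extract_urls(page_text) on a downloaded page returns a plain list
of strings. Relative URLs come back as written, so they would still need
to be resolved against the page's own address.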

-- 
Alex Polite
http://plusseven.com/gpg/




