[Tutor] splice a string object based on embedded html tag...

Stella Rockford stella at solarenergenex.com
Fri Jan 30 01:08:36 EST 2004


I am parsing some HTML with sgmllib.
The end result is to feed an RSS feed with info scraped using pycurl
and the like.

When I run sgmllib on the HTML file object,
I get back an extremely long list of "pieces".

This is good; however, there is a lot of data I have no use for,
and it slows the parser down.
I have studied the HTML code and found that the info I need
is, naturally, nested in a table with a unique id for CSS.

I am assuming that removing everything but this table
before it gets parsed will allow sgmllib to function faster...

I would like to SLICE everything before and after this table off of
the file object.
This would be the first operation on the object, but when I looked up
the string methods
I couldn't quite find what I am looking for to do this.

Indexing and slicing of a string seem to respond only to integers.
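
To make the question concrete, here is a rough sketch of the kind of thing I am after: turn the tag text into integer offsets with str.find and str.rfind, then slice with those integers. The function name extract_table and the id value "listing" are made up for illustration. The sketch assumes the id is written as id="..." in double quotes, that the tags are lowercase, and that the table contains no nested tables.

```python
def extract_table(html, table_id):
    """Return the text of the <table> carrying the given id, or None.

    Hypothetical helper: relies on plain string searches, not a real parser.
    """
    # Assumption: the attribute appears exactly as id="..." (double quotes).
    marker = 'id="%s"' % table_id
    id_pos = html.find(marker)
    if id_pos == -1:
        return None

    # Search backwards from the id for the opening of the enclosing table tag.
    start = html.rfind("<table", 0, id_pos)
    if start == -1:
        return None

    # Assumption: no nested tables, so the first </table> after the id closes it.
    end_tag = "</table>"
    end = html.find(end_tag, id_pos)
    if end == -1:
        return None

    # find/rfind gave integer offsets, so ordinary slicing now works.
    return html[start:end + len(end_tag)]


page = ('<html><body><p>junk</p>'
        '<table id="listing"><tr><td>data</td></tr></table>'
        '<p>more junk</p></body></html>')
fragment = extract_table(page, "listing")
print(fragment)
```

The returned fragment could then be handed to the parser in place of the whole page. Whether that actually speeds sgmllib up is still my assumption.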

Any advice would be well received

thanks in advance,

Stella



