Is this sgmllib.py's bug?

Sean 'Shaleh' Perry shalehperry at home.com
Thu Oct 18 01:39:24 EDT 2001


On 18-Oct-2001 limodou wrote:
> Sometimes I use Python to analyse an HTML document. But I found that if
> a tag starts with '<!' rather than '<!--', sgmllib will treat it as a
> 'special' pattern. That is OK most of the time, but occasionally it
> fails, because some people use a '<!' tag as a comment. I fixed it by
> treating every '<!' as a comment, but then declarations like DOCTYPE
> are lost. Does anyone have any ideas?

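At the start of sgmllib.py:
special = re.compile('<![^<>]*>')
Then later, in the parsing loop:
match = special.match(rawdata, i)
if match:
    if self.literal:
        # literal mode: emit the '<' as plain data and move on one character
        self.handle_data(rawdata[i])
        i = i+1
        continue
    # otherwise skip the whole '<!...>' construct; no handler is called
    i = match.end(0)
    continue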

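So anything that matches '<!...>' is skipped without calling a handler,
unless the parser is in literal mode, where it is passed through as data.
If you want to handle <!DOCTYPE>, it needs to be done in a data handler.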
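
One way to tell the two cases apart, as a rough sketch rather than
anything sgmllib does itself: keep the same 'special' pattern, then look
at the character right after '<!'. The classify_special() helper and the
'declaration' pattern are made-up names for illustration, and the rule
"a letter after '<!' means a declaration" is an assumption that will not
fit every document (a DOCTYPE with an internal subset containing '<' or
'>' does not match 'special' at all).

```python
import re

# Same pattern as the one quoted from sgmllib.py above.
special = re.compile('<![^<>]*>')

# Assumption: a real declaration has a name (a letter) right after '<!',
# e.g. '<!DOCTYPE ...>'. Anything else is treated as an ad hoc comment.
declaration = re.compile(r'<![A-Za-z]')


def classify_special(rawdata, i=0):
    """Classify a '<!...>' construct starting at position i.

    Returns (kind, text, end) where kind is 'declaration' or 'comment',
    or None if nothing matching 'special' starts at i.
    """
    match = special.match(rawdata, i)
    if match is None:
        return None
    text = match.group(0)
    if declaration.match(text):
        kind = 'declaration'
    else:
        kind = 'comment'
    return kind, text, match.end(0)


if __name__ == '__main__':
    for sample in ('<!DOCTYPE html>', '<! just a note>', '<p>'):
        print(sample, classify_special(sample))
```

A parser subclass could call something like this where the loop above
just skips the match, and pass declarations on to its own handler
instead of dropping them.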

-----
We have buried the putrid corpse of Liberty. -- Benito Mussolini



