sgmllib parser keeps old tag data?

MRAB google at mrabarnett.plus.com
Fri Feb 13 10:41:52 EST 2009


Berend van Berkum wrote:
> On Fri, Feb 13, 2009 at 02:31:40PM +0000, MRAB wrote:
>> Berend van Berkum wrote:
>>> import sgmllib
>>>
>>>
>>> class MyParser(sgmllib.SGMLParser):
>>>
>>> 	content = ''		
>>> 	markup = []
>>> 	span_stack = []
>>>
>> These are in the _class_ itself, so they will be shared by all its
>> instances. You should do something like this instead:
>>
>> 	def __init__(self):
>> 		self.content = ''
>> 		self.markup = []
>> 		self.span_stack = []
>>
> 
> Yes.. tested that and SGMLParser won't let me override __init__, 
> (SGMLParser vars are uninitialized even with sgmllib.SGMLParser(self) call).

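OK, so SGMLParser needs to be initialised. Note that sgmllib.SGMLParser(self)
creates a second, separate parser object; it doesn't initialise self. Call the
base class's __init__ on self instead: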

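	import sgmllib

	class MyParser(sgmllib.SGMLParser):
		def __init__(self):
			# Set up the base parser's internal state first.
			sgmllib.SGMLParser.__init__(self)
			# Per-instance state, so separate parsers don't share lists.
			self.content = ''
			self.markup = []
			self.span_stack = []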

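To see why the class-level lists "keep old tag data", here is a minimal sketch with plain classes (no parser involved; the class names are made up for illustration). A mutable list defined in the class body is one object shared by every instance, while one assigned in __init__ is created fresh per instance. The class-level `content = ''` is less of a problem because strings are immutable: `self.content += data` rebinds the name on the instance rather than changing the shared string.

```python
class Shared:
    # Class attribute: one list object shared by every instance.
    markup = []

class PerInstance:
    def __init__(self):
        # Instance attribute: each object gets its own fresh list.
        self.markup = []

a, b = Shared(), Shared()
a.markup.append('<p>')
print(b.markup)   # b sees the tag that was appended through a

c, d = PerInstance(), PerInstance()
c.markup.append('<p>')
print(d.markup)   # d's list is still empty
```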
> Tried some but not the following:
> with a differently named init function and one boolean class var 'initialized'
> it can check 'if self.initialized' in front of each handler. Does the trick.
> 
> Confusion dissolved :)
> thanks.
> 
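The 'initialized' flag works, but once the base class's __init__ is actually called, the flag shouldn't be needed. sgmllib was removed in Python 3, so the rough sketch below swaps in html.parser.HTMLParser, whose __init__ calls reset(). Putting the per-document state in an overridden reset() (rather than __init__) means an explicit parser.reset() clears it too. The handler bodies are illustrative assumptions, not code from this thread.

```python
from html.parser import HTMLParser

class MyParser(HTMLParser):
    # No __init__ override needed: HTMLParser.__init__ calls self.reset(),
    # which dispatches to the override below.

    def reset(self):
        # Reset the base parser first, then our own per-document state.
        super().reset()
        self.content = ''
        self.markup = []
        self.span_stack = []

    def handle_starttag(self, tag, attrs):
        # Record every start tag; remember open <span> attributes.
        self.markup.append(tag)
        if tag == 'span':
            self.span_stack.append(attrs)

    def handle_endtag(self, tag):
        if tag == 'span' and self.span_stack:
            self.span_stack.pop()

    def handle_data(self, data):
        self.content += data
```

Usage: each new parser starts empty, and reset() wipes a used one.

```python
p = MyParser()
p.feed('<p>hello <span class="x">world</span></p>')
p.close()
q = MyParser()   # fresh state, nothing carried over from p
```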
