Problem with "&" character in XML.

Stefan Behnel stefan.behnel-n05pAM at web.de
Thu Jul 13 02:55:06 EDT 2006


Kirt wrote:
> How do I append characters to a string?

I think the normal approach is to store an empty string (or list) in an
attribute in startElement(), append to it in characters() and use the result
in endElement().

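def startElement(self, name, attrs):
    self.chars = ''       # start with an empty buffer for each element
def characters(self, s):
    self.chars += s       # may be called several times per text node
def endElement(self, name):
    value = self.chars    # the complete text is available here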

Or use a list (self.char_list = [] in startElement(), self.char_list.append(s)
in characters()) and do this:

def endElement(self, name):
    value = ''.join(self.char_list)
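
For illustration, here is a minimal, self-contained sketch of the list-based
approach, assuming Python 3. The class name DirnameHandler, the attributes
chunks and dirnames, the wrapping <root> element and the shortened path are
assumptions made for this example, not part of Kirt's code.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class DirnameHandler(ContentHandler):
    """Collect the complete text of every <dirname> element."""

    def __init__(self):
        super().__init__()
        self.chunks = None    # None means "not inside <dirname>"
        self.dirnames = []    # finished directory names

    def startElement(self, name, attrs):
        if name == 'dirname':
            self.chunks = []  # start a fresh buffer

    def characters(self, content):
        # The parser may call this several times for one text node,
        # so only collect the pieces here.
        if self.chunks is not None:
            self.chunks.append(content)

    def endElement(self, name):
        if name == 'dirname':
            self.dirnames.append(''.join(self.chunks))
            self.chunks = None


# Hypothetical sample data; "&" is written as "&amp;" in the XML source.
SAMPLE = b"<root><Directory><dirname>C:\\1\\bye w&amp;y</dirname></Directory></root>"

handler = DirnameHandler()
xml.sax.parseString(SAMPLE, handler)
for dirname in handler.dirnames:
    print(dirname)
```

Joining once in endElement() means it does not matter how the parser splits
the text around the entity.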

Maybe you should consider switching to iterparse() of ElementTree or lxml.
Should be a bit easier to use than SAX ...

http://effbot.org/zone/element-iterparse.htm
http://codespeak.net/svn/lxml/trunk/doc/api.txt
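
A rough sketch of what the iterparse() route could look like, using the
ElementTree module from the standard library (xml.etree.ElementTree) and
assuming Python 3. The sample document, its <root> wrapper and the shortened
paths are made up for the example.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample data with a single root element.
SAMPLE = b"""<root>
<Directory><dirname>C:\\1\\bye w&amp;y</dirname></Directory>
<Directory><dirname>C:\\1\\hii wx</dirname></Directory>
</root>"""

dirnames = []
for event, elem in ET.iterparse(io.BytesIO(SAMPLE), events=('end',)):
    if elem.tag == 'dirname':
        # On the 'end' event the element text is complete and unescaped.
        dirnames.append(elem.text)
    elif elem.tag == 'Directory':
        elem.clear()  # drop the subtree once it has been processed

for dirname in dirnames:
    print(dirname)
```

There is no need to collect character chunks by hand here, since elem.text
already holds the whole string when the end event arrives.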

Stefan


> Stefan Behnel wrote:
>> Kirt wrote:
>>> I have walked a directory and have written the following XML document.
>>> One of the folders had an "&" character, so I replaced it with "&amp;".
>>> #------------------test1.xml
>>> <Directory>
>>>   <dirname>C:\Documents and Settings\Administrator\Desktop\1\bye
>>> w&amp;y </dirname>
>>>   <file>
>>>   <name>def.txt</name>
>>>   <time>200607130417</time>
>>>   </file>
>>> </Directory>
>>>  <Directory>
>>>   <dirname>C:\Documents and Settings\Administrator\Desktop\1\hii
>>> wx</dirname>
>>>   <file>
>>>   <name>abc.txt</name>
>>>   <time>200607130415</time>
>>>   </file>
>>> </Directory>
>>>
>>> Now, in my Python code, I want to parse this document and print the
>>> directory name.
>>> ###----------handler------------filename---handler.py
>>> from xml.sax.handler import ContentHandler
>>> class oldHandler(ContentHandler):
>>>     def __init__(self):
>>>         self.dn = 0
>>>
>>>     def startElement(self, name, attrs):
>>>         if name == 'dirname':
>>>             self.dn = 1
>>>
>>>     def characters(self, str):
>>>         if self.dn:
>>>             print str
>>
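>> The problem is here. The parser may deliver the text of one element in
>> several characters() calls (here it splits around the "&amp;" entity), and
>> "print" adds a newline after each chunk. Don't use print, just append the
>> characters (to a string or list) until the endElement callback is called.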
>>
>>
>>>     def endElement(self, name):
>>>         if name == 'dirname':
>>>             self.dn = 0
>>>
>>>
>>> #---------------------------------------------------------------------
>>> #main code--- fname----art.py
>>> import sys
>>> from xml.sax import make_parser
>>> from handler import oldHandler
>>>
>>> ch = oldHandler()
>>> saxparser = make_parser()
>>>
>>> saxparser.setContentHandler(ch)
>>> saxparser.parse(sys.argv[1])
>>> #-----------------------------------------------------------------------------
>>> i run the code as:  $python art.py test1.xml
>>>
>>> i am getting output as:
>>>
>>> C:\Documents and Settings\Administrator\Desktop\1\bye w
>>> &
>>> y
>>> C:\Documents and Settings\Administrator\Desktop\1\hii wx
>>>
>>> whereas I need output that looks like this:
>>> C:\Documents and Settings\Administrator\Desktop\1\bye w&y
>>>
>>> C:\Documents and Settings\Administrator\Desktop\1\hii wx
>>>
>>> Can someone tell me the solution for this?
>>>