Problem with "&" character in xml.

Kirt moqtar at gmail.com
Thu Jul 13 03:25:13 EDT 2006


Thanks Stefan, your approach worked.
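
For anyone finding this thread later, here is a minimal sketch of the
accumulate-then-join approach quoted below. It is written for Python 3 (the
code in the thread is Python 2). The names DirnameHandler and parse_dirnames,
the <listing> wrapper element and the shortened paths are assumptions made up
for illustration; they are not from the original script.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class DirnameHandler(ContentHandler):
    """Collect the complete text of every <dirname> element."""

    def __init__(self):
        super().__init__()
        self.in_dirname = False
        self.chunks = []    # pieces delivered by characters()
        self.dirnames = []  # finished directory names

    def startElement(self, name, attrs):
        if name == "dirname":
            self.in_dirname = True
            self.chunks = []

    def characters(self, content):
        # The parser may call this several times for one element,
        # for example before and after an entity such as "&amp;".
        if self.in_dirname:
            self.chunks.append(content)

    def endElement(self, name):
        if name == "dirname":
            self.in_dirname = False
            # Join the pieces only once the element is complete.
            self.dirnames.append("".join(self.chunks).strip())


# Hypothetical sample data with a single root element (illustration only).
SAMPLE = b"""<listing>
  <Directory><dirname>C:\\Desktop\\1\\bye w&amp;y </dirname></Directory>
  <Directory><dirname>C:\\Desktop\\1\\hii wx</dirname></Directory>
</listing>"""


def parse_dirnames(data):
    """Parse XML bytes and return the list of directory names."""
    handler = DirnameHandler()
    xml.sax.parseString(data, handler)
    return handler.dirnames


if __name__ == "__main__":
    for dirname in parse_dirnames(SAMPLE):
        print(dirname)
```

Collecting the pieces in a list and joining them in endElement() avoids
printing each fragment on its own line, which is what split the name around
the "&".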

Stefan Behnel wrote:
> Kirt wrote:
> > How do I append characters to a string?
>
> I think the normal approach is to store an empty string (or list) in an
> attribute in startElement(), append to it in characters() and use the result
> in endElement().
>
> def startElement(self, name, attrs):
>     self.chars = ''
> def characters(self, s):
>     self.chars += s
> def endElement(self, name):
>     value = self.chars
>
> Or use a list (append to self.char_list in characters()) and do this:
>
> def endElement(self, name):
>     value = ''.join(self.char_list)
>
> Maybe you should consider switching to iterparse() of ElementTree or lxml.
> Should be a bit easier to use than SAX ...
>
> http://effbot.org/zone/element-iterparse.htm
> http://codespeak.net/svn/lxml/trunk/doc/api.txt
>
> Stefan
>
>
> > Stefan Behnel wrote:
> >> Kirt wrote:
> >>> I have walked a directory and written the following XML document.
> >>> One of the folders had an "&" character, so I replaced it with "&amp;".
> >>> #------------------test1.xml
> >>> <Directory>
> >>>   <dirname>C:\Documents and Settings\Administrator\Desktop\1\bye
> >>> w&amp;y </dirname>
> >>>   <file>
> >>>   <name>def.txt</name>
> >>>   <time>200607130417</time>
> >>>   </file>
> >>> </Directory>
> >>>  <Directory>
> >>>   <dirname>C:\Documents and Settings\Administrator\Desktop\1\hii
> >>> wx</dirname>
> >>>   <file>
> >>>   <name>abc.txt</name>
> >>>   <time>200607130415</time>
> >>>   </file>
> >>> </Directory>
> >>>
> >>> Now in my Python code I want to parse this document and print the
> >>> directory name.
> >>> ###----------handler------------filename---handler.py
> >>> from xml.sax.handler import ContentHandler
> >>> class oldHandler(ContentHandler):
> >>>     def __init__(self):
> >>>         self.dn = 0
> >>>
> >>>     def startElement(self, name, attrs):
> >>>         if name == 'dirname':
> >>>             self.dn = 1
> >>>
> >>>     def characters(self, str):
> >>>         if self.dn:
> >>>             print str
> >>
> >> The problem is here. "print" adds a newline. Don't use print, just append the
> >> characters (to a string or list) until the endElement callback is called.
> >>
> >>
> >>>     def endElement(self, name):
> >>>         if name == 'dirname':
> >>>             self.dn = 0
> >>>
> >>>
> >>> #---------------------------------------------------------------------
> >>> #main code--- fname----art.py
> >>> import sys
> >>> from xml.sax import make_parser
> >>> from handler import oldHandler
> >>>
> >>> ch = oldHandler()
> >>> saxparser = make_parser()
> >>>
> >>> saxparser.setContentHandler(ch)
> >>> saxparser.parse(sys.argv[1])
> >>> #-----------------------------------------------------------------------------
> >>> I run the code as:  $ python art.py test1.xml
> >>>
> >>> I am getting this output:
> >>>
> >>> C:\Documents and Settings\Administrator\Desktop\1\bye w
> >>> &
> >>> y
> >>> C:\Documents and Settings\Administrator\Desktop\1\hii wx
> >>>
> >>> whereas I need output that looks like this:
> >>> C:\Documents and Settings\Administrator\Desktop\1\bye w&y
> >>>
> >>> C:\Documents and Settings\Administrator\Desktop\1\hii wx
> >>>
> >>> Can someone tell me the solution for this?
> >>>
> >
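
As a follow-up on the iterparse() suggestion quoted above, this is a rough
sketch of how the same directory names might be read with the ElementTree
module from the Python 3 standard library. The function name iter_dirnames,
the <listing> wrapper element and the shortened paths are assumptions for
illustration only, not part of the original thread.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample data with a single root element (illustration only).
SAMPLE = b"""<listing>
  <Directory><dirname>C:\\Desktop\\1\\bye w&amp;y </dirname></Directory>
  <Directory><dirname>C:\\Desktop\\1\\hii wx</dirname></Directory>
</listing>"""


def iter_dirnames(source):
    """Yield the text of each <dirname> element found in source.

    source can be a filename or a binary file object.
    """
    # On an "end" event the element is complete, so elem.text already
    # holds the whole string with "&amp;" decoded to "&".
    for event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "dirname":
            text = (elem.text or "").strip()
            elem.clear()  # drop the element's content once it has been read
            yield text


if __name__ == "__main__":
    for dirname in iter_dirnames(io.BytesIO(SAMPLE)):
        print(dirname)
```

With iterparse() there is no need to track state across callbacks, because
the text is only read after the element has been fully parsed.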



