xml.dom.minidom weirdness: bug?

Tue Apr 29 22:51:14 EDT 2008

Hi.

I was writing an XMLTV parser in Python when I ran into some weirdness 
that I couldn't explain.

What I'm doing is reading an XML file, creating a second DOM document 
and copying elements from the first to the second.

At no time do I ever modify the original dom object, yet it gets modified.

Unless I missed something, it sounds like a bug to me.

The XML file is simply:
<?xml version="1.0" encoding="utf-8"?>
<tv><channel id="id1"><display-name lang="en">full name</display-name></channel></tv>

which I store under the name test.xmltv

Here is the code. I've removed everything that isn't relevant to my 
description; I can't make it any simpler, I'm afraid:

from xml.dom.minidom import Document
import xml.dom.minidom

def adjusttimezone(docxml, timezone):
	doc = Document()

	# Create the <tv> base element
	tv_xml = doc.createElement("tv")
	doc.appendChild(tv_xml)

	#Create the channel list
	channellist = docxml.getElementsByTagName('channel')

	for x in channellist:
		#Copy the original attributes
		elem = doc.createElement("channel")
		for y in x.attributes.keys():
			name = x.attributes[y].name
			value = x.attributes[y].value
			elem.setAttribute(name,value)
		for y in x.getElementsByTagName('display-name'):
			elem.appendChild(y)
		tv_xml.appendChild(elem)

	return doc

if __name__ == '__main__':
	handle = open('test.xmltv','r')
	docxml = xml.dom.minidom.parse(handle)
	print 'step1'
	print docxml.toprettyxml(indent="  ",encoding="utf-8")
	doc = adjusttimezone(docxml, 1000)
	print 'step2'
	print docxml.toprettyxml(indent="  ",encoding="utf-8")

At "step1" I print the content of the DOM object and, quite 
naturally, it shows:
<?xml version="1.0" encoding="utf-8"?>
<tv>
  <channel id="id1">
    <display-name lang="en">
      full name
    </display-name>
  </channel>
</tv>

After the call to adjusttimezone, however, "step2" shows:
<?xml version="1.0" encoding="utf-8"?>
<tv>
  <channel id="id1"/>
</tv>

That's it!

You'll note that at no time do I modify the content of docxml, yet it 
gets modified.
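
A smaller sketch that appears to reproduce the same effect, assuming the 
XML is parsed from a string instead of test.xmltv. SOURCE, src_channel 
and new_channel are names made up for this sketch. The only operation 
applied to a node coming from docxml is appendChild on an element of the 
new document:

```python
from xml.dom.minidom import Document, parseString

# Same content as test.xmltv, inlined so the sketch is self-contained
# (assumption: parsing from a string behaves like parsing from the file).
SOURCE = ('<tv><channel id="id1">'
          '<display-name lang="en">full name</display-name>'
          '</channel></tv>')

docxml = parseString(SOURCE)
src_channel = docxml.getElementsByTagName('channel')[0]
display_name = src_channel.getElementsByTagName('display-name')[0]

# Number of children of the original <channel> before anything is appended.
before = len(src_channel.childNodes)

# Build the second document and append the node taken from docxml.
doc = Document()
new_channel = doc.createElement('channel')
doc.appendChild(new_channel)
new_channel.appendChild(display_name)

# Number of children of the original <channel> afterwards, and the
# parent the appended node now reports.
after = len(src_channel.childNodes)
moved_parent = display_name.parentNode
```

Comparing before with after, and checking moved_parent, shows whether the 
node is still attached to the original <channel> once it has been 
appended elsewhere.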

The weirdness disappears if I add "import copy" and change the line
	channellist = docxml.getElementsByTagName('channel')
to
	channellist = copy.deepcopy(docxml.getElementsByTagName('channel'))

However, my understanding is that it shouldn't be necessary.
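
For comparison, here is a sketch of a per-node copy instead of a 
deepcopy of the whole list. It assumes that cloneNode(True) is an 
acceptable way to duplicate each display-name element; copy_channels and 
SOURCE are hypothetical names, and the unused timezone argument is left 
out:

```python
from xml.dom.minidom import Document, parseString

# Same content as test.xmltv, inlined so the sketch is self-contained.
SOURCE = ('<tv><channel id="id1">'
          '<display-name lang="en">full name</display-name>'
          '</channel></tv>')


def copy_channels(docxml):
    """Build a new document holding copies of the <channel> elements."""
    doc = Document()
    tv_xml = doc.createElement('tv')
    doc.appendChild(tv_xml)

    for channel in docxml.getElementsByTagName('channel'):
        elem = doc.createElement('channel')
        # Copy the original attributes.
        for name, value in channel.attributes.items():
            elem.setAttribute(name, value)
        for display_name in channel.getElementsByTagName('display-name'):
            # Append a deep clone rather than the node taken from docxml.
            elem.appendChild(display_name.cloneNode(True))
        tv_xml.appendChild(elem)

    return doc


docxml = parseString(SOURCE)
before = docxml.documentElement.toxml()
doc = copy_channels(docxml)
after = docxml.documentElement.toxml()
copied = doc.documentElement.toxml()
```

With this variant, before and after can be compared directly to check 
whether docxml was left alone, and copied shows what ended up in the new 
document.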

Any thoughts on this weirdness?

Thanks
Jean-Yves

-- 
They who would give up an essential liberty for temporary security, 
deserve neither liberty nor security (Benjamin Franklin)