[XML-SIG] Parsing XML file with Minidom has problem with cr/lf

Bill Kinnersley billk at sunflower.com
Mon May 10 19:59:17 CEST 2010


> I am parsing an XML file with Python 2.6.5 minidom in Windows and it is
> mostly working but minidom seems to have problems dealing with Windows
> cr/lf characters. It creates an extra textnode that needs to be ignored
> instead of just returning the xml elements. I have tried different
> methods of opening the file but it doesn’t seem to make a difference. It
> is happiest when reading a file in Unix format.
>
> Wayne Peterson | Consultant
> Sierra Systems

Wayne,

It sounds to me like you're doing everything correctly.

- XML files are text files, and should be read as text.

- In the absence of a DTD, all whitespace is regarded as significant. 
Typically this means yes, there will be a text node between consecutive 
element nodes.

- The XML processor is required to return each end-of-line as a single
'\n', regardless of the operating system or programming language, so a
Windows cr/lf pair reaches your code as one '\n'.
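
To make the last two points concrete, here is a minimal sketch. The
sample document and its element names are made up for illustration and
are not taken from your file; it only assumes the standard-library
minidom.

```python
from xml.dom import minidom

# Hypothetical sample input with Windows (CR/LF) line endings.
doc = b"<root>\r\n<item>a</item>\r\n<item>b</item>\r\n</root>"

dom = minidom.parseString(doc)
children = dom.documentElement.childNodes

# Each child is either a whitespace text node ('#text') or an <item> element.
kinds = [child.nodeName for child in children]
print(kinds)

# The text nodes carry the line breaks between the elements.
# The parser has already normalized each CR/LF pair to a single '\n'.
whitespace = [child.data for child in children
              if child.nodeType == child.TEXT_NODE]
print(whitespace)
```

The text nodes show up whether the file uses Unix or Windows line
endings; with a Unix-format file they are just as present, which suggests
the extra nodes come from the whitespace itself rather than from cr/lf.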

If you are traversing every node, you'll need to explicitly ignore the
text nodes. More usually you don't have to deal with them, because you
know which elements you're looking for and can pick them out with
getElementsByTagName.
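
Both approaches can be sketched roughly as follows. Again the sample
document and the tag name "item" are hypothetical stand-ins for whatever
your file contains.

```python
from xml.dom import minidom

# Hypothetical sample input with Windows (CR/LF) line endings.
doc = b"<root>\r\n<item>a</item>\r\n<item>b</item>\r\n</root>"
dom = minidom.parseString(doc)
root = dom.documentElement

# Option 1: walk every child and keep only the element nodes,
# which skips the whitespace text nodes in between.
elements = [node for node in root.childNodes
            if node.nodeType == node.ELEMENT_NODE]
element_names = [node.tagName for node in elements]

# Option 2: ask for the elements by name; text nodes never appear
# in the returned list.
items = dom.getElementsByTagName("item")
values = [item.firstChild.data for item in items]

print(element_names)
print(values)
```

Option 2 is usually less code, but note that getElementsByTagName
searches all descendants, not just direct children, so Option 1 is the
safer choice when the same tag name can appear at several nesting levels.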



