[ python-Bugs-937282 ] minidom.py writes TEXT_NODE wrong

SourceForge.net noreply at sourceforge.net
Wed Apr 21 17:33:16 EDT 2004


Bugs item #937282, was opened at 2004-04-18 14:11
Message generated for change (Comment added) made by loewis
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=937282&group_id=5470

Category: XML
Group: Python 2.3
Status: Closed
Resolution: Invalid
Priority: 5
Submitted By: Bernd Preusing (bpreusing)
Assigned to: Nobody/Anonymous (nobody)
Summary: minidom.py writes TEXT_NODE wrong

Initial Comment:
Hi,

If I read in

<tag>value</tag>

and write it out with indentation and line feeds like this:
domdoc.writexml(of, "", "\t", "\n", self.encoding)

the output is:

<tag>
     value
</tag>

This behaviour corrupts the value: when the file is read
back in, the added white space and line feeds are part of
the value.

I could work around this with strip(), but any XML
validator raises an error if the value is an enumeration
or boolean.

CDATA has a similar problem.
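
The round trip described above can be checked with a small script. This is
only a sketch in Python 3 syntax (the report is against Python 2.3); the
helper name roundtrip_text is made up for illustration, and whether the
pretty-printed value comes back with extra white space depends on the
minidom version in use.

```python
import io
import xml.dom.minidom


def roundtrip_text(source, indent="", addindent="", newl=""):
    """Parse source, write it with writexml, re-parse, return the root's text."""
    doc = xml.dom.minidom.parseString(source)
    buf = io.StringIO()
    doc.writexml(buf, indent, addindent, newl)
    reparsed = xml.dom.minidom.parseString(buf.getvalue())
    # Concatenate every text/CDATA child of the document element.
    return "".join(
        child.data
        for child in reparsed.documentElement.childNodes
        if child.nodeType in (child.TEXT_NODE, child.CDATA_SECTION_NODE)
    )


# Hypothetical helper, not part of minidom.
plain = roundtrip_text("<tag>value</tag>")
pretty = roundtrip_text("<tag>value</tag>", "", "\t", "\n")
print(repr(plain))
print(repr(pretty))  # may carry added white space on some minidom versions
```

Comparing the two printed values shows whether the writer in use changes
the text content when the white-space arguments are passed.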

Thanks,
  Bernd


----------------------------------------------------------------------

>Comment By: Martin v. Löwis (loewis)
Date: 2004-04-21 23:33

Message:
Logged In: YES 
user_id=21627

If you want line breaks at certain points where pretty
printing would not normally put them, you either need to
traverse the tree yourself and put out XML in the form you
like, or you can add explicit text nodes to the tree where
you think they belong.
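
As a rough illustration of the second option (explicit text nodes), the
sketch below inserts white-space text nodes only into elements whose
children are all elements, and leaves text, CDATA and mixed content alone.
The function name add_layout and its parameters are assumptions made for
this example; it is written in Python 3 syntax.

```python
import xml.dom.minidom
from xml.dom import Node


def add_layout(element, doc, depth=0, indent="  "):
    """Insert white-space text nodes, but only inside element-only content."""
    children = list(element.childNodes)
    if not children or any(c.nodeType != Node.ELEMENT_NODE for c in children):
        return  # leave text, CDATA and mixed content exactly as it is
    for child in children:
        # A line break plus indentation in front of every child element.
        element.insertBefore(doc.createTextNode("\n" + indent * (depth + 1)), child)
        add_layout(child, doc, depth + 1, indent)
    # A final line break so the closing tag lines up with the opening tag.
    element.appendChild(doc.createTextNode("\n" + indent * depth))


doc = xml.dom.minidom.parseString("<element><tag>value</tag></element>")
add_layout(doc.documentElement, doc)
# Written without pretty-printing arguments, so nothing else is added.
print(doc.documentElement.toxml())
```

Because the layout is now part of the tree, the document is written with
no white-space arguments and the text inside <tag> is not touched.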

Alternatively, you can, of course, modify minidom and use
the modified implementation instead. 

I personally see no problem with having an XML file of 5 MB
with no line breaks. Python will parse such a file just as
efficiently as it parses a file with line breaks; most
likely, all other XML applications have no problems with
that, either.

----------------------------------------------------------------------

Comment By: Bernd Preusing (bpreusing)
Date: 2004-04-21 10:35

Message:
Logged In: YES 
user_id=879395

So I should put a 5 megabyte XML file into one single line?
You are kidding!

White space between tags is only significant inside leaf
elements, i.e. those that contain text or CDATA.
I have a modified myminidom.py now, which tests whether the child
is a text node, so the output looks like this:

<element>
  <tag>value</tag>
</element>
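
The modified myminidom.py is not shown in this report. The following is
only a guess at the idea, written as a standalone Python 3 function rather
than a patch to minidom: an element is indented only when none of its
children is a text or CDATA node. The name write_pretty is hypothetical.

```python
import io
import xml.dom.minidom
from xml.dom import Node
from xml.sax.saxutils import quoteattr


def write_pretty(node, out, indent="", addindent="  ", newl="\n"):
    """Write a node; indent its children only if none is text or CDATA."""
    children = node.childNodes
    text_types = (Node.TEXT_NODE, Node.CDATA_SECTION_NODE)
    if (node.nodeType != Node.ELEMENT_NODE or not children
            or any(c.nodeType in text_types for c in children)):
        # Non-element, empty or text-bearing node: emit it verbatim.
        out.write(indent + node.toxml() + newl)
        return
    attrs = "".join(
        " %s=%s" % (name, quoteattr(value))
        for name, value in node.attributes.items()
    )
    out.write("%s<%s%s>%s" % (indent, node.tagName, attrs, newl))
    for child in children:
        write_pretty(child, out, indent + addindent, addindent, newl)
    out.write("%s</%s>%s" % (indent, node.tagName, newl))


doc = xml.dom.minidom.parseString("<element><tag>value</tag></element>")
buf = io.StringIO()
write_pretty(doc.documentElement, buf)
print(buf.getvalue(), end="")
```

Note that an input which already contains white-space text nodes between
elements is treated as text-bearing here and is written back unchanged.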



----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2004-04-20 21:53

Message:
Logged In: YES 
user_id=21627

This is not a bug. The white-space arguments to writexml
exist specifically for pretty-printing. By design, they add
white space to the output document that was not present in
the input document.

If you don't want pretty-printing, just don't pass these
additional parameters.

>>> import xml.dom.minidom
>>> d=xml.dom.minidom.parseString("<tag>value</tag>")
>>> import sys
>>> d.writexml(sys.stdout)
<?xml version="1.0" ?>
<tag>value</tag>>>>
>>> d.writexml(sys.stdout);print
<?xml version="1.0" ?>
<tag>value</tag>



----------------------------------------------------------------------
