parsing XML - getting lots of white space

Robert Roy rjroy at takingcontrol.com
Sun Nov 5 09:12:49 EST 2000


On Wed, 1 Nov 2000 19:22:30 -0000, "Tom wright"
<thomas.wright1 at ntlworld.REMOVETHIS.com> wrote:

>hi all,
>
>when parsing the following message
>
><?xml version="1.0"?>
><ServerMessage>
>    <Command fromDirection="North">AddUser</Command>
>    <User>
>        <UserName>$userName</UserName>
>        <UserId>$userId</UserId>
>    </User>
></ServerMessage>
>
snip

>Is there a way to lose the '\012' strings and the space filled string ?
>and why is everything in Unicode?

The XML standard specifies that all whitespace is to be passed on to
the processing application. This differs from SGML, where extraneous
whitespace is not passed through.
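To see this in practice, here is a minimal sketch using Python's
standard xml.dom.minidom. That parser is an assumption: the parsing
code in the original question was snipped. The whitespace between
elements comes through as ordinary text nodes; '\012' is just the
octal escape for a newline character.

```python
from xml.dom import Node, minidom

# The message from the original question, indentation included.
# The XML declaration must be the very first thing in the string.
MESSAGE = """<?xml version="1.0"?>
<ServerMessage>
    <Command fromDirection="North">AddUser</Command>
    <User>
        <UserName>$userName</UserName>
        <UserId>$userId</UserId>
    </User>
</ServerMessage>"""

doc = minidom.parseString(MESSAGE)
root = doc.documentElement

# List every child of <ServerMessage>, including the whitespace-only
# text nodes that sit between the elements.
for child in root.childNodes:
    if child.nodeType == Node.TEXT_NODE:
        print("text:", repr(child.data))
    else:
        print("element:", child.tagName)
```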

From section 2.10 ("White Space Handling") of the XML spec:

"An XML processor must always pass all characters in a document that
are not markup through to the application. A validating XML processor
must also inform the application which of these characters constitute
white space appearing in element content."

In the Annotated XML spec, Tim Bray discusses this.

http://www.xml.com/axml/testaxml.htm
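A non-validating parser such as minidom's (it is built on expat) does
not tell the application which whitespace is ignorable, so the
application has to decide for itself. A minimal sketch, assuming that
every whitespace-only text node in these messages is insignificant
(true for the message above, but not for mixed content in general);
strip_whitespace_nodes is a hypothetical helper, not part of minidom:

```python
from xml.dom import Node, minidom


def strip_whitespace_nodes(node):
    """Recursively remove text nodes that contain only whitespace.

    Treating all whitespace-only text as insignificant is an
    application-level assumption, not something the XML spec promises.
    """
    # Iterate over a copy, because childNodes changes as we remove nodes.
    for child in list(node.childNodes):
        if child.nodeType == Node.TEXT_NODE and not child.data.strip():
            node.removeChild(child)
        elif child.hasChildNodes():
            strip_whitespace_nodes(child)


doc = minidom.parseString(
    "<ServerMessage>\n    <Command>AddUser</Command>\n</ServerMessage>"
)
strip_whitespace_nodes(doc.documentElement)
print(doc.documentElement.toxml())
```

Text nodes that contain real data, such as "AddUser", are kept; only
the indentation and newlines between elements are dropped.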
