parsing XML - getting lots of white space

Fernando Pereira fcnpereira at home.com
Sat Nov 4 12:57:29 EST 2000


In article <f5_L5.1176$1B5.25648 at news2-win.server.ntlworld.com>, Tom
wright <thomas.wright1 at ntlworld.REMOVETHIS.com> wrote:

> hi all,
> 
> when parsing the following message
> 
> <?xml version="1.0"?>
> <ServerMessage>
>     <Command fromDirection="North">AddUser</Command>
>     <User>
>         <UserName>$userName</UserName>
>         <UserId>$userId</UserId>
>     </User>
> </ServerMessage>

The parser has no way of knowing that the whitespace between tags is
not significant in your application, since it could be significant to
other applications. There are optional ways of ignoring that whitespace
if a DTD is provided, but I don't know whether the Python XML tools
support this. In any case, the safe thing to do is to check explicitly
for the node types that are relevant to your application and ignore or
warn about the rest. In the case at hand, look explicitly for the
elements that contain the text of interest (e.g. <UserName>) and just
ignore all other text nodes.
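
A minimal sketch of that approach, assuming the standard library's
xml.dom.minidom is the parser in use (the original message does not say
which tool it was). The helper names element_text and
strip_whitespace_nodes are made up for illustration. The first pulls the
text out of a named element and never looks at the whitespace around it;
the second is an optional alternative that drops whitespace-only text
nodes, which is only safe if your application really does not care
about them.

```python
from xml.dom import minidom, Node

# The message from the question, with the placeholders left as-is.
MESSAGE = """<?xml version="1.0"?>
<ServerMessage>
    <Command fromDirection="North">AddUser</Command>
    <User>
        <UserName>$userName</UserName>
        <UserId>$userId</UserId>
    </User>
</ServerMessage>"""


def element_text(parent, tag):
    """Return the text inside the first <tag> descendant, or None if absent.

    Hypothetical helper: only text and CDATA children of the matched
    element are read, so whitespace between other tags is never seen.
    """
    matches = parent.getElementsByTagName(tag)
    if not matches:
        return None
    return "".join(
        child.data
        for child in matches[0].childNodes
        if child.nodeType in (Node.TEXT_NODE, Node.CDATA_SECTION_NODE)
    )


def strip_whitespace_nodes(node):
    """Recursively remove text nodes that contain only whitespace.

    Hypothetical helper: this assumes inter-tag whitespace is
    insignificant for the application, which the parser cannot know.
    """
    for child in list(node.childNodes):  # copy, since we mutate the tree
        if child.nodeType == Node.TEXT_NODE and not child.data.strip():
            node.removeChild(child)
        elif child.nodeType == Node.ELEMENT_NODE:
            strip_whitespace_nodes(child)


doc = minidom.parseString(MESSAGE)
root = doc.documentElement

# Look explicitly for the elements of interest; ignore everything else.
command = element_text(root, "Command")
user_name = element_text(root, "UserName")
user_id = element_text(root, "UserId")
```

If you would rather walk childNodes directly, calling
strip_whitespace_nodes(root) first leaves only the element nodes
(Command and User) as children of ServerMessage.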

-- F


