SOT : & in XML-documents

Thomas Weholt 2002 at weholt.org
Tue Oct 8 15:49:13 EDT 2002


Damn!! The original data is filled with it. My solution so far has been to
keep a list (limited to '&' so far) of characters to replace ('&' is
replaced with 'and', etc.).

Are there any other characters I must avoid/replace?

Thanks for your help.

Best regards,
Thomas

"Henrik Motakef" <henrik.motakef at web.de> wrote in message
news:87bs64903u.fsf at pokey.henrik-motakef.de...
> "Thomas Weholt" <2002 at weholt.org> writes:
>
> > I'm trying to parse an old fileformat into xml. The problem is that the
> > character & appears from time to time in the original file.
> [...]
> > Anybody got any clues on how to avoid problems with characters like
> > this?
>
> Don't use them ;-) Or, better, properly escape them as &amp;. This is
> not an issue of the charset, so no XML declaration will save you.
>
> If you are dealing with HTML, you could use tidy (google will find it
> for you) to create well-formed XML. IIRC there is also a shareware
> program that tries to clean up broken XML regardless of its document
> type, probably called "XML tidy" or some such.
>
> Good luck
> Henrik
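
For reference, the escaping Henrik describes can be sketched in Python with
the standard library's xml.sax.saxutils module. This is only a minimal
sketch, not the code used on the original data: the helper name to_element
and the sample string are made up for illustration. In element text, '&' and
'<' must be escaped ('>' is usually escaped as well); inside attribute
values the quote character needs care too, which quoteattr handles.

```python
from xml.sax.saxutils import escape, quoteattr


def to_element(tag, text):
    """Wrap raw text in an XML element, escaping &, < and > in the text.

    Hypothetical helper for illustration; `tag` is assumed to be a valid
    XML element name already.
    """
    return "<%s>%s</%s>" % (tag, escape(text), tag)


# Sample input standing in for a line from the old file format (made up).
raw = "Tom & Jerry <1940>"

xml_fragment = to_element("title", raw)
print(xml_fragment)

# For attribute values, quoteattr escapes the text and adds the quotes.
attr = quoteattr("R & D")
print("<dept name=%s/>" % attr)
```

Escaping keeps the original '&' in the data instead of rewriting it to
'and', and any XML parser turns &amp; back into '&' when the file is read.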
