[Expat-discuss] Extra character inserted in CharacterData Handler?

Karl Waclawek karl@waclawek.net
Fri Jun 14 17:08:18 2002


> Hi expaters,
>
> I'm having a problem where this xml document:
>
> <SomeTag>
> <Text>
> · some bulleted point
> </Text>
> </SomeTag>
>
> gets parsed into this output:
>
>
> <SomeTag>
>   <Text>
>  · some bulleted point
> </Text>
>
> </SomeTag>
>
> as you can see I get the extra  added after the bulleted point.
>
> I've seen this behavior in both expat-1.95.2 and expat-1.95.3 on Linux
> 2.2.19 (x86) as well as in vxWorks (Tornado 2.0.1 for ARM). I'm about to
> try and wade into the source myself, but I thought I might ask you folks
> first, just in case I'm missing something obvious.
>
> I've attached a tarball with the xml document, output, example parser
> and Makefile. (the parser's pretty dumb and ugly, I'll apologise for
> that now. It does show the bug though) if the list strips attachments,
> I've also posted the tarball here:
> http://macboy.homeip.net/~mike/PLCM/expatCharError.tar.gz
>

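The "·" character in your file (byte 0xB7) is not valid UTF-8 on its own,
and UTF-8 is what Expat assumes when no encoding is declared.
Maybe it is valid ISO-8859-1?
In that case you must add an XML declaration that names the encoding,
e.g. <?xml version="1.0" encoding="ISO-8859-1"?>.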

Actually, 1.95.3 should reject it (and it does so on my system).

Karl
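
For anyone wanting to reproduce the point above, here is a minimal sketch using Python's standard xml.parsers.expat module, which wraps Expat. It is not the poster's parser: the helper name parse_bytes and the one-element test document are made up for illustration, and the sketch assumes a reasonably recent Expat that rejects the stray byte, as 1.95.3 is reported to do above. It feeds the same body to the parser twice, once without a declaration and once with an ISO-8859-1 declaration.

```python
import xml.parsers.expat


def parse_bytes(data: bytes) -> str:
    """Parse a complete document and return all character data joined together.

    Hypothetical helper for this sketch; the handler may be called several
    times for one text node, so the pieces are collected and concatenated.
    """
    chunks = []
    parser = xml.parsers.expat.ParserCreate()
    parser.CharacterDataHandler = chunks.append
    parser.Parse(data, True)  # True marks this as the final chunk of input
    return "".join(chunks)


# 0xB7 is the middle dot in ISO-8859-1, but a lone continuation byte in UTF-8.
body = b"<Text>\xb7 some bulleted point</Text>"

# Case 1: no XML declaration, so the parser treats the input as UTF-8.
try:
    parse_bytes(body)
    rejected = False
except xml.parsers.expat.ExpatError as err:
    rejected = True
    print("undeclared encoding rejected:", err)

# Case 2: the same bytes with the encoding named in the XML declaration.
declared = b'<?xml version="1.0" encoding="ISO-8859-1"?>' + body
text = parse_bytes(declared)
print("declared encoding parsed:", repr(text))
```

With the declaration in place, the character data handler receives the middle dot as an ordinary character. In the C API the handler receives it re-encoded as UTF-8, so output code that expects one byte per character has to account for that.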