[XML-SIG] Processing xml files with ISO 8859-1 chars

Dan Gunter DKGunter@lbl.gov
Wed, 07 Nov 2001 10:15:33 -0800



I stand corrected. That's what you get for skimming the first
reference that pops up in Google :) Of course, checking an _arbitrary_
encoding for correctness seems like a real burden on the parser, but
maybe UTF-8 is so common it should be checked. This time, I will do
the wise thing and defer to the experts on this issue.

Dan


Morus Walter wrote:
>
> Dan Gunter writes:
> > The simple answer is that the XML parser is illiterate. Since there are
> > no bit patterns that are illegal in UTF-8, I don't see how the parser
> > could know that the chosen encoding produced, from the user's
> > perspective, garbage. The pretty-printer, on the other hand, knows the
> > difference between printable and non-printable characters and can thus
> > complain.
> >
> Sorry. This is wrong.
> There are a lot of byte combinations that can never occur
> in UTF-8. E.g. there can never be a single 8-bit character between 7-bit
> characters ([\x20-\x7F][\x80-\xFF][\x20-\x7F]).
> So the parser could check whether the byte stream forms valid UTF-8.
>
> greetings
>         Morus
>
> --
> Th. Morus Walter · Manager Content & Data Development
> xipolis.net GmbH & Co. KG · Schellingstrasse 35 · 80799 München
>
> _______________________________________________
> XML-SIG maillist  -  XML-SIG@python.org
> http://mail.python.org/mailman/listinfo/xml-sig
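
Morus's point can be tried out directly. The following is only a minimal
sketch, assuming a current Python 3 interpreter rather than the Python of
2001; the names LONE_HIGH_BYTE and is_valid_utf8 are made up for this
illustration and do not come from any XML package. It applies the byte
pattern quoted above and, as a more complete check, a strict UTF-8 decode.

```python
import re

# Morus's example: a single high-bit byte sandwiched between two 7-bit bytes.
# Such a sequence can never occur in well-formed UTF-8.
LONE_HIGH_BYTE = re.compile(rb"[\x20-\x7F][\x80-\xFF][\x20-\x7F]")


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes as strict UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# The same word encoded two ways.
latin1_bytes = "München".encode("iso-8859-1")  # the u-umlaut is one byte, 0xFC
utf8_bytes = "München".encode("utf-8")         # the u-umlaut is two bytes, 0xC3 0xBC

print(is_valid_utf8(latin1_bytes), is_valid_utf8(utf8_bytes))
print(bool(LONE_HIGH_BYTE.search(latin1_bytes)),
      bool(LONE_HIGH_BYTE.search(utf8_bytes)))
```

The regular expression catches only the one case named in the message; the
strict decode rejects every malformed sequence, which is closer to what a
parser would need. Neither check can tell that well-formed bytes were
produced in the wrong encoding.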

-- 
[ Dan Gunter, LBNL -  http://www-didc.lbl.gov/~dang/ ]