A challenge to the ASCII proponents.

Steven D'Aprano steve at cyber.com.au
Sun Jul 20 21:56:51 EDT 2003


Alan Kennedy <alanmk at hotmail.com> wrote in message news:<3F1AAC0A.7A6326B at hotmail.com>...
> Alan Kennedy:
> 
> > The final point I'd like to make [explicit] is: nobody had to ask
> > me how or why my xml snippet worked: there were no tricks. Nobody
> > asked for debugging information, or for reasons why they couldn't
> > see it:

Sorry Alan, but when I follow your instructions and save your XML to
disk and open it in Opera 6.01 on Win 98, I get this:

XML parsing failed: not well-formed (1:0)

At least it renders visibly in my browser, although I don't think it's
rendering the way you wished. <grin>

(For the record, here are the contents of the XML file, triple-quoted
for your convenience:
"""<?xml version="1.0" encoding="utf-8"?>
<verb>&#x3b3;&#x3af;&#x3b3;&#x3bd;&#x3c9;&#x3c3;&#x3ba;&#x3c9;</verb>""")
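For what it's worth, here is a minimal sketch in modern Python 3 (which obviously postdates this thread) of what a conforming XML parser ought to do with that file. Every Greek letter is a numeric character reference, so the bytes on disk are pure ASCII, and parsing expands them back into the word. The `xmlcharrefreplace` step shows the reverse trip. It uses only the standard library's `xml.etree.ElementTree` and is not a diagnosis of the Opera error above.

```python
import xml.etree.ElementTree as ET

# The snippet quoted above. Every Greek letter is a numeric character
# reference, so the text itself is plain 7-bit ASCII.
SNIPPET = (
    '<?xml version="1.0" encoding="utf-8"?>\n'
    "<verb>&#x3b3;&#x3af;&#x3b3;&#x3bd;&#x3c9;&#x3c3;&#x3ba;&#x3c9;</verb>"
)

raw = SNIPPET.encode("ascii")  # raises UnicodeEncodeError if anything is non-ASCII
assert all(byte < 0x80 for byte in raw)  # 7-bit clean

# A conforming parser expands the references into real Greek letters.
word = ET.fromstring(raw).text
print(ascii(word))  # '\u03b3\u03af\u03b3\u03bd\u03c9\u03c3\u03ba\u03c9'

# The reverse trip: turn non-ASCII characters into decimal references.
refs = word.encode("ascii", "xmlcharrefreplace").decode("ascii")
print(refs)  # &#947;&#943;&#947;&#957;&#969;&#963;&#954;&#969;
```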


[snip]
> In summary:
> 
> 1. I managed to make a greek word, using the original greek glyphs,
> appear on everyone's "rendering surface", by posting a 7-bit clean XML
> snippet. Another poster widened the software coverage even further by
> posting a 7-bit clean HTML snippet. Both of our 7-bit markup snippets
> travelled safely throughout the entirety of UseNet, including all the
> 7-bit relays and gateways.

I couldn't see either rendered correctly in either Opera's newsreader
or the Google archive.

> 2. The only other person who managed it, without using markup, was
> Martin von Loewis, who is so good at this stuff that he confidently
> makes statements like "what I did was right: it was Google that got it
> wrong". Martin used the UTF-8 character set, i.e. a non-ASCII,
> non-7-bit-clean character set, to achieve this. Although I'm sure
> Martin could have managed it with UTF-7 as well.

Martin's effort did work for me in Opera's newsreader, but not in the
Google Groups archive. But we already knew that Google broke it.
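On Alan's point 2: a quick way to see the difference between the UTF-8 Martin used and the UTF-7 Alan mentions is to encode the same word both ways and check which result is 7-bit clean. Another modern Python 3 sketch; `is_7bit_clean` is just a hypothetical helper made up for this example.

```python
# The Greek word from the XML snippet, written with escapes so this
# source stays ASCII.
WORD = "\u03b3\u03af\u03b3\u03bd\u03c9\u03c3\u03ba\u03c9"


def is_7bit_clean(data: bytes) -> bool:
    """Hypothetical helper: True if every byte has its high bit clear."""
    return all(byte < 0x80 for byte in data)


utf8 = WORD.encode("utf-8")  # two bytes per Greek letter, high bit set
utf7 = WORD.encode("utf-7")  # modified base64, introduced by '+'

print(is_7bit_clean(utf8), is_7bit_clean(utf7))  # False True
assert utf7.decode("utf-7") == WORD  # UTF-7 round-trips losslessly
print(utf7.decode("ascii"))  # the 7-bit form as it would travel
```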

> 3. If anybody else was willing to give it a try, they don't seem to
> have had enough confidence in their knowledge of encodings, MIME,
> transports, NNTP, etc, etc, to have actually hit the "send" button, in
> case it didn't work. Which doesn't bode well for the average person in
> the street: if the technology specialists in this newsgroup don't feel
> in command of the issue, what hope for everyone else?

Exactly. Which brings us back to Ben's suggestion: when writing for a
general audience using unknown systems, stick to ASCII, or at least
follow your rich text with a description of what your reader should
see:

"""And I can use Umlauts (äöü) -- you should see a, o and u all in
lowercase with two dots on top."""

It's a mess and I despair. It would be nice if everyone used bug-free
XML-aware newsreaders, browsers and mail clients, but the majority
don't. That's why I always practice defensive writing whenever I use
any character I can't see on my keyboard, and spell it out in ASCII.
That's not very satisfactory, but it's better than some random
percentage of your audience seeing "?????".
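In the same defensive spirit, the spelling-out can be half-automated. A minimal Python 3 sketch using the standard `unicodedata` module; `spell_out` is a hypothetical name made up for this example, and the official Unicode character names are rather more verbose than a hand-written description.

```python
import unicodedata


def spell_out(text: str) -> str:
    """Hypothetical helper: describe each distinct non-ASCII character in ASCII."""
    notes = []
    for ch in dict.fromkeys(text):  # distinct characters, first-seen order
        if ord(ch) > 0x7F:
            name = unicodedata.name(ch, "UNNAMED CHARACTER")
            notes.append(f"U+{ord(ch):04X} {name}")
    return "; ".join(notes)


line = "And I can use Umlauts (\u00e4\u00f6\u00fc)"
# Roughly what a client that can't handle the charset might show:
print(line.encode("ascii", "replace").decode("ascii"))  # And I can use Umlauts (???)
print(spell_out(line))
```

The second line printed is the ASCII footnote: the code point and official name of each of the three umlauted letters.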


-- 
Steven D'Aprano