A challenge to the ASCII proponents.

Alan Kennedy alanmk at hotmail.com
Mon Jul 21 05:51:32 EDT 2003


I don't want to go on and on about this, and I'm happy to concede that
some of my points are far from proven, and others are disproven.
However, there are one or two small points I'd like to make.

Ben Finney wrote:

>>> Which quickly leads to "You must use $BROWSER to view this site".
>>> No thanks.

Alan Kennedy wrote:

>> No, that's the precise opposite of the point I was making.

Ben Finney wrote:

> You also stipulated "... from a Usenet post".  Most Usenet readers
> do not handle markup, nor should they.  There are many benefits from
> the fact that posts are plain text, readable by any software that can
> handle character streams;

1. While there may be benefits from posts being plain text, there are
also costs. The cost is a "semantic disconnect", where related
concepts are not searchable, linkable or matchable, because their
character representations are not comparable.

2. I chose the "from a Usenet post" restriction precisely because of
the 7-bit issue, because I knew that 8-bit character sets would break
in some places. It was an obstacle course.

> parsing a markup tree for an article is a whole order
> of complexity that I'd rather not have in my newsreader.
>
> Expecting people to use a news reader that attempts to parse markup
> and render the result, is like expecting people to use an email reader
> that attempts to parse markup and render the result.  Don't.

I don't expect people's newsreaders or email clients to start parsing
embedded XML (I nearly barfed when I saw Microsoft's "XML Data
Islands" for the first time).

What I'm really concerned about is the cultural impact. I voluntarily
maintain a web site for an organisation that has members in 26
countries, who not surprisingly have lots of non-ASCII characters in
their names. Here's one:

http://www.paratuberculosis.org/members/pavlik.htm

Because of the ASCII restriction in URLs, I was only able to offer Dr.
Pavlík the above URI, or this:

http://www.paratuberculosis.org/members/pavl%EDk.htm

which sucks.
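
For anyone wondering where the %ED comes from, here is a minimal
sketch, assuming Python 3 and its standard urllib.parse module. The
variable names are mine, purely for illustration. It percent-encodes
the same name under two charsets, and shows that decoding only
round-trips if the reader guesses the charset the writer used.

```python
from urllib.parse import quote, unquote

name = "pavlík"

# Percent-encode the non-ASCII character under two different charsets.
latin1_path = quote(name, encoding="latin-1")  # í is the single byte 0xED
utf8_path = quote(name, encoding="utf-8")      # í is the two bytes 0xC3 0xAD

print(latin1_path)  # pavl%EDk
print(utf8_path)    # pavl%C3%ADk

# Decoding with the same charset recovers the name ...
roundtrip = unquote(latin1_path, encoding="latin-1")
# ... but decoding with a different one yields a replacement character.
mismatch = unquote(latin1_path, encoding="utf-8")

print(roundtrip)       # pavlík
print(repr(mismatch))  # 'pavl\ufffdk'
```

The URI itself says nothing about which charset the escapes are in,
which is part of why the escaped form is so fragile.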

Little wonder then that the next generation are choosing to remove
the accents from their names. For example, his colleague Dr. Machackova
explicitly asked to have the accents in her name removed. Although I
assured her that her name would be spelled correctly on the web sites
that I maintain, the fact that her name continually breaks in various
ASCII-centric technologies makes her think it's not worth the hassle,
or the risk of searches for her name failing.

http://www.paratuberculosis.org/members/machackova.htm

And what about Dr. Sigurðardóttir, Dr. Djønne, and Dr. de la Cruz
Domínguez Punaro? Are they destined to be passed over more often than
ASCII-named people?
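
To illustrate the search problem, here is a rough sketch, assuming
Python 3 and the standard unicodedata module. The fold_accents helper
is hypothetical, not something the site above actually uses. It
strips combining accents so that an ASCII query can match an accented
name. Note that it does nothing for letters such as ð and ø, which
have no accent to strip.

```python
import unicodedata


def fold_accents(text: str) -> str:
    """Drop combining marks after compatibility decomposition (NFKD)."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


names = ["Pavlík", "Domínguez", "Sigurðardóttir", "Djønne"]
folded = [fold_accents(n) for n in names]

for original, plain in zip(names, folded):
    print(original, "->", plain)
# Pavlík -> Pavlik
# Domínguez -> Dominguez
# Sigurðardóttir -> Sigurðardottir   (ð is a letter in its own right)
# Djønne -> Djønne                   (so is ø)
```

So even a search engine that folds accents on both sides would still
fail to match "Djonne" against "Djønne" without an extra,
language-specific mapping table.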

[BTW, I've written the above in "windows-1252", apologies if it gets
mangled]
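
To make the possible mangling concrete, here is a small sketch,
assuming Python 3, of two failure modes I'd expect along the way: a
7-bit hop that clears the high bit of every byte, and a reader that
assumes UTF-8 when the bytes are really windows-1252. Both scenarios
are illustrative assumptions, not a trace of any particular server.

```python
name = "Pavlík"
raw = name.encode("windows-1252")  # b'Pavl\xedk'

# A 7-bit transport that clears the high bit turns 0xED into 0x6D ('m').
stripped = bytes(b & 0x7F for b in raw).decode("ascii")

# A reader that wrongly assumes UTF-8 cannot decode the lone 0xED byte.
misread = raw.decode("utf-8", errors="replace")

print(stripped)       # Pavlmk
print(repr(misread))  # 'Pavl\ufffdk'
```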

Solely because of technical inertia, and unwillingness to address the
(perhaps excessive) complexity of our various communications layers,
i.e. our own "Tower of 7-bit Babel", we're suppressing cultural
diversity, for no technically valid reason.

I personally don't have the slightest problem with reformulating NNTP
and POP to use XML instead. In a way, I think it's almost inevitable,
given how poor our existing "ASCII" technologies are at dealing with
i18n and l10n issues. Emails and Usenet posts are all just documents
after all.

Would something like this really be so offensive (the Gaelic isn't, I
promise :-)? Or inefficient?

#begin---------
<?xml version="1.0" encoding="windows-1252"?>
<xnntp>
  <subject>An mhaith l'éinne dul go dtí an nGaillimh Dé Domhnaigh?</subject>
  <from>aláin ó cinnéide</from>
  <to>na cailíní agus na buachaillí</to>
</xnntp>
#end-----------
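
As a rough check on the "inefficient" question, here is a sketch,
assuming Python 3 and the standard xml.etree.ElementTree module. The
"xnntp" vocabulary is only the made-up one from the example above. It
builds the same message, serialises it as windows-1252 bytes, and
parses those bytes back, with the declared encoding doing all the
work.

```python
import xml.etree.ElementTree as ET

# Build the hypothetical <xnntp> message from the example above.
root = ET.Element("xnntp")
ET.SubElement(root, "subject").text = (
    "An mhaith l'éinne dul go dtí an nGaillimh Dé Domhnaigh?"
)
ET.SubElement(root, "from").text = "aláin ó cinnéide"
ET.SubElement(root, "to").text = "na cailíní agus na buachaillí"

# Serialise to bytes; the XML declaration records the encoding used.
wire = ET.tostring(root, encoding="windows-1252")

# The receiver needs no out-of-band charset hint: the parser reads the
# encoding from the declaration at the top of the document.
parsed = ET.fromstring(wire)
sender = parsed.findtext("from")

print(sender)  # aláin ó cinnéide
```

The point of the sketch is only that the charset travels with the
document, rather than being guessed at by each layer in between.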

--
alan kennedy
-----------------------------------------------------
check http headers here: http://xhaus.com/headers
email alan:              http://xhaus.com/mailto/alan



