Managing Google Groups headaches

Ned Batchelder ned at nedbatchelder.com
Fri Dec 6 21:24:50 EST 2013


On 12/6/13 8:03 AM, rusi wrote:
>> I think you're off on the wrong track here.  This has nothing to do with
>> plain text (ascii or otherwise).  It has to do with divorcing how you
>> store and transport messages (be they plain text, HTML, or whatever)
>> from how a user interacts with them.
>
> Evidently (and completely inadvertently) this exchange has just
> illustrated one of the inadmissible assumptions:
>
> "unicode as a medium is universal in the same way that ASCII used to be"
>
> I wrote a number of ellipsis characters, i.e. codepoint U+2026, as in:
>
>    - human communication…
> (is not very different from)
>    - machine communication…
>
> Somewhere between my sending and your quoting those ellipses became
> the replacement character FFFD
>
>>> > >   - human communication�
>>> > >(is not very different from)
>>> > >   - machine communication�
> Leaving aside whose fault this is (very likely buggy google groups),
> this mojibaking cannot happen if the assumption "All text is ASCII"
> were to uniformly hold.
>
> Of course with unicode also this can be made to not happen, but that
> is fragile and error-prone.  And that is because ASCII (not extended)
> is ONE thing in a way that unicode is hopelessly a motley inconsistent
> variety.
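
For illustration only, here is a minimal Python sketch of one way an
ellipsis (U+2026) can end up as the replacement character (U+FFFD). The
actual cause in this thread is not known; the sketch assumes, purely as a
hypothesis, that the text was encoded in one charset (cp1252) and decoded
as another (UTF-8). The helper name round_trip is made up for this example.

```python
# Hypothetical illustration: the real cause of the garbling in this
# thread is unknown. This only shows one common mechanism.

ELLIPSIS = "\u2026"      # HORIZONTAL ELLIPSIS
REPLACEMENT = "\ufffd"   # REPLACEMENT CHARACTER


def round_trip(text, encode_as, decode_as):
    """Encode text with one charset and decode the bytes with another.

    Bytes that are invalid in the decoding charset are replaced with
    U+FFFD instead of raising UnicodeDecodeError.
    """
    return text.encode(encode_as).decode(decode_as, errors="replace")


original = "human communication" + ELLIPSIS

# Matching charsets on both ends: the ellipsis survives.
same = round_trip(original, "utf-8", "utf-8")

# Mismatched charsets: cp1252 stores the ellipsis as the single byte
# 0x85, which is not a valid start byte in UTF-8, so it is replaced.
garbled = round_trip(original, "cp1252", "utf-8")

# Pure ASCII text has the same bytes in both charsets, so it survives
# the same mismatch unchanged.
ascii_only = round_trip("human communication...", "cp1252", "utf-8")

print(same == original)
print(garbled.endswith(REPLACEMENT))
print(ascii_only)
```

The ASCII string survives only because cp1252 and UTF-8 agree on those
byte values; the ellipsis is lost because the two ends disagreed about the
charset, not because of anything in the character itself.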

You seem to be suggesting that we should stick to ASCII.  There are of 
course languages that need more than just the Latin alphabet.  How would 
you suggest we support them?  Or maybe I don't understand?

--Ned.



