Is Unicode support so hard...

Chris “Kwpolska” Warrick kwpolska at gmail.com
Sat Apr 20 14:15:07 EDT 2013


On Sat, Apr 20, 2013 at 8:02 PM, Benjamin Kaplan
<benjamin.kaplan at case.edu> wrote:
> On Sat, Apr 20, 2013 at 10:22 AM, Ned Batchelder <ned at nedbatchelder.com> wrote:
>> On 4/20/2013 1:12 PM, jmfauth wrote:
>>>
>>> In a previous post,
>>>
>>>
>>> http://groups.google.com/group/comp.lang.python/browse_thread/thread/6aec70817705c226#
>>> ,
>>>
>>> Chris “Kwpolska” Warrick wrote:
>>>
>>> “Is Unicode support so hard, especially in the 21st century?”
>>>
>>> --
>>>
>>> Unicode is not really complicated and it works very well (more
>>> than two decades of development if you take into account
>>> iso-14****).
>>>
>>> But, - I can say, "as usual" - people prefer to spend their
>>> time making a "better Unicode than Unicode" and it usually
>>> fails. Python is no exception to this rule.
>>>
>>> -----
>>>
>>> I'm "busy" with TeX (unicode engine variant), fonts and typography.
>>> This gives me plenty of ideas to test the "flexible string
>>> representation" (FSR). I must admit this FSR is failing
>>> particularly well...
>>>
>>> I can almost say, a delight.
>>>
>>> jmf
>>> Unicode lover
>>
>> I'm totally confused about what you are saying.  What does "make a better
>> Unicode than Unicode" mean?  Are you saying that Python is guilty of this?
>> In what way?  Can you provide specifics?  Or are you saying that you like
>> how Python has implemented it?  "FSR is failing ... a delight"?  I don't
>> know what you mean.
>>
>> --Ned.
>
> Don't bother trying to figure this out. jmfauth has been hijacking
> every thread that mentions Unicode to complain about the flexible
> string representation introduced in Python 3.3. Apparently, having
> proper Unicode semantics (indexing is based on code points, not code
> units) at the expense of performance when calling .replace on the
> only non-ASCII or BMP character in the string is a horrible bug.
> --
> http://mail.python.org/mailman/listinfo/python-list

Don’t forget the original context: this was a short remark to a guy I
was responding to.  His newsgroups software (slrn according to the
headers) mangled the encoding of U+201C and U+201D in my From field,
turning them into three question marks each.  And jmf started a rant,
as usual…
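
For the curious, here is a rough sketch of why each quote mark could end
up as exactly three question marks.  It assumes the header was UTF-8 and
that every non-ASCII byte got swapped for "?"; I have not checked what
slrn really does, and mangle_to_ascii is a made-up helper name, so treat
this as an illustration only.

```python
# U+201C and U+201D each take three bytes in UTF-8, so a byte-wise
# replacement yields three "?" per quote mark.
name = "Chris \u201cKwpolska\u201d Warrick"
raw = name.encode("utf-8")


def mangle_to_ascii(data: bytes) -> str:
    """Hypothetical helper: replace every non-ASCII byte with '?'."""
    return bytes(b if b < 0x80 else ord("?") for b in data).decode("ascii")


mangled = mangle_to_ascii(raw)
print(len("\u201c".encode("utf-8")))  # 3
print(mangled)                        # Chris ???Kwpolska??? Warrick
```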

PS. There are two fancy Unicode characters around.  Can you find both
of them, jmf?
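
If you would rather not hunt by eye, something along these lines lists
every non-ASCII character in a piece of text.  It is only a sketch:
non_ascii_characters is a name I made up, and the sample string is an
arbitrary example, not a hint.

```python
import unicodedata


def non_ascii_characters(text):
    """Return (code point, Unicode name) pairs for each non-ASCII character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for ch in text
        if ord(ch) > 0x7F
    ]


# Arbitrary sample text containing two non-ASCII characters.
sample = "na\u00efve caf\u00e9"
for code_point, char_name in non_ascii_characters(sample):
    print(code_point, char_name)
```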

--
Kwpolska <http://kwpolska.tk> | GPG KEY: 5EAAEA16
stop html mail                | always bottom-post
http://asciiribbon.org        | http://caliburn.nl/topposting.html


