Python Unicode handling wins again -- mostly

Neil Cerutti neilc at norwich.edu
Tue Dec 3 08:47:42 EST 2013


On 2013-12-02, Ethan Furman <ethan at stoneleaf.us> wrote:
> On 11/29/2013 04:44 PM, Steven D'Aprano wrote:
>> Out of the nine tests, Python 3.3 passes six, with three tests
>> being failures or dubious. If you believe that the native
>> string type should operate on code-points, then you'll think
>> that Python does the right thing.
>
> I think Python is doing it correctly.  If I want to operate on
> "clusters" I'll normalize the string first.

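Normalizing doesn't resolve the issues the blog brings up. NFC
can't compose every multi-code-point sequence into a single code
point, because many sequences have no precomposed form, and
normalizing can lose information: the compatibility forms (NFKC,
NFKD) fold away distinctions such as ligatures. There are good
examples in UAX #15: http://unicode.org/reports/tr15/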
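
A minimal sketch of both points, using only the standard
unicodedata module. The helper name nfc_len and the sample
strings are my own, picked for illustration; they are not taken
from the blog post.

```python
import unicodedata


def nfc_len(s):
    """Return the number of code points left after NFC normalization."""
    return len(unicodedata.normalize("NFC", s))


# "e" + COMBINING ACUTE ACCENT has a precomposed form (U+00E9),
# so NFC collapses it to one code point.
e_acute = "e\u0301"

# "x" + COMBINING ACUTE ACCENT has no precomposed form,
# so NFC has to leave it as two code points.
x_acute = "x\u0301"

# ANGSTROM SIGN (U+212B) is canonically equivalent to U+00C5, so even
# NFC replaces it and the original code point cannot be recovered.
angstrom = unicodedata.normalize("NFC", "\u212b")

# NFKC expands the "fi" ligature (U+FB01) into two plain letters.
ligature = unicodedata.normalize("NFKC", "\ufb01")

print(nfc_len(e_acute), nfc_len(x_acute))  # 1 2
print(hex(ord(angstrom)), ligature)        # 0xc5 fi
```

So len() on a normalized str still counts code points, not what
a reader would call characters.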

> Thanks for this excellent post.

Agreed.

-- 
Neil Cerutti



