[Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

M.-A. Lemburg mal at egenix.com
Wed Apr 22 16:43:34 EDT 2009


On 2009-04-22 22:06, Walter Dörwald wrote:
> Martin v. Löwis wrote:
>>> "correct" -> "corrected"
>> Thanks, fixed.
>>
>>>> To convert non-decodable bytes, a new error handler "python-escape" is
>>>> introduced, which decodes non-decodable bytes into a private-use
>>>> character U+F01xx, which is believed to not conflict with private-use
>>>> characters that currently exist in Python codecs.
>>> Would this mean that real private use characters in the file name would
>>> raise an exception? How? The UTF-8 decoder doesn't pass those bytes to
>>> any error handler.
>> The python-escape codec is only used/meaningful if the env encoding
>> is not UTF-8. For any other encoding, it is assumed that no character
>> actually maps to the private-use characters.
> 
> Which should be true for any encoding from the pre-unicode era, but not
> for UTF-16/32 and variants.

Actually it's not even true for the pre-Unicode codecs. It was and is common
for Asian companies to use company-specific symbols in private areas
or extended versions of CJK character sets.
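
For the UTF-16/32 case mentioned above, here is a minimal sketch (Python 3
syntax). The helper name is_private_use and the sample code point U+F0100,
standing in for the proposed U+F01xx range, are chosen for illustration and
are not taken from the PEP. It only shows that the Unicode-based codecs
decode private-use code points directly, so no error handler is ever
consulted for them:

```python
import unicodedata


def is_private_use(ch):
    """Return True if ch has Unicode general category Co (private use)."""
    return unicodedata.category(ch) == "Co"


# Assumed sample points: one from the proposed U+F01xx escape range
# (Supplementary Private Use Area-A) and one from the BMP private use area.
escape_char = chr(0xF0100)
bmp_pua_char = chr(0xE000)

for codec in ("utf-16-le", "utf-32-le"):
    for ch in (escape_char, bmp_pua_char):
        # A plain strict decode succeeds: the PUA character is valid input.
        roundtrip = ch.encode(codec).decode(codec)
        print(codec, hex(ord(ch)), is_private_use(roundtrip), roundtrip == ch)
```

Whether a given legacy CJK codec also maps bytes into the private use areas
depends on its mapping table, so that case is not shown here.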

Microsoft even published an editor for Asian users to create their
own glyphs as needed:

    http://msdn.microsoft.com/en-us/library/cc194861.aspx

Here's an overview of some US companies using such extensions:

    http://scripts.sil.org/cms/SCRIPTs/page.php?site_id=nrsi&item_id=VendorUseOfPUA
(it's no surprise that most of these actually defined their own charsets)

SIL even started a registry for the private use areas (PUAs):

    http://scripts.sil.org/cms/SCRIPTs/page.php?site_id=nrsi&cat_id=UnicodePUA

This is their current list of assignments:

    http://scripts.sil.org/cms/SCRIPTs/page.php?site_id=nrsi&item_id=SILPUAassignments

and here's how to register:

    http://scripts.sil.org/cms/SCRIPTs/page.php?site_id=nrsi&cat_id=UnicodePUA#404a261e

-- 
Marc-Andre Lemburg
eGenix.com