[Python-Dev] PEP 460 reboot

M.-A. Lemburg mal at egenix.com
Mon Jan 13 10:06:00 CET 2014


On 13.01.2014 07:51, Nick Coghlan wrote:
>
> [Using a new asciistr type]
>
> The key thing that the text model change in Python 3 enabled is for us
> to use the type system to *help* with managing the complexity of
> dealing with text encodings. We've got a long way with just the two
> pure types, and no additional types that straddle the binary/text
> boundary the way the Python 2 str type did. Unlike introducing *new*
> ASCII-only operations to the bytes type, adding new types specifically
> for dealing with ASCII compatible formats (especially starting life as
> a third party library) isn't compromising the Python 3 text model,
> it's embracing it and making it work for us (which is why I've been
> suggesting that it be considered since at least 2010). The problem
> with "str" in Python 2 was that one type was used to represent too
> many things with serious semantic differences.
> 
> The ongoing attempts to reintroduce that ambiguity to the core bytes
> type rather than exploring the creation of new types and then filing
> bugs for any interoperability issues those attempts uncover in the
> core types represents one of the worst cases of paradigm lock that I
> have ever seen :P

In theory this sounds nice, but in practice you often run into the issue
that when you pass such a str subtype to a function that works on str,
the function doesn't return the str subtype as its result, but a new
plain str object instead.

As a result, you have to keep track of which operations work
on your str-subtype alone and which convert it back to a str,
making the approach infeasible for all but the most basic
uses.
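
To illustrate the point above, here is a minimal sketch in Python 3. The
AsciiStr class is a made-up stand-in for the kind of str subtype being
discussed, not an existing or proposed type:

```python
class AsciiStr(str):
    """Hypothetical ASCII-only str subtype, for illustration only."""

    def __new__(cls, value):
        # Reject non-ASCII input; raises UnicodeEncodeError otherwise.
        value.encode("ascii")
        return super().__new__(cls, value)


s = AsciiStr("header")

# The inherited str operations know nothing about the subtype,
# so each of them hands back a plain str.
upper = s.upper()
joined = s + AsciiStr("-value")
sliced = s[:3]

for name, obj in [("s", s), ("upper", upper),
                  ("joined", joined), ("sliced", sliced)]:
    print(name, type(obj).__name__)
```

Only s itself is still an AsciiStr; keeping the subtype through these
operations would mean overriding every such method by hand.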

This is why we try to make the basic types as useful as possible
for everyone. It's also the main reason why subtyping 8-bit strings
and Unicode in Python 2 wasn't a popular sport :-)

Leaving aside the discussion about str and bytes, I think PEP 460
has a lot of potential to make life easier for people dealing with
binary data: the formatting codes for the bytes format methods could
be extended to include the struct module features - with the struct
module then turning into a proxy for these new format methods (much
like we did with the string module when string methods were
introduced).
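
For reference, this is the kind of struct functionality meant here, using
only the existing struct API. What the corresponding bytes formatting
codes would look like is left open above, so none are shown; the record
layout is an arbitrary example:

```python
import struct

# "<"  little-endian, no padding
# "H"  unsigned 16-bit integer
# "I"  unsigned 32-bit integer
# "4s" 4-byte bytes field
layout = "<HI4s"

# Pack three Python values into a 10-byte binary record ...
record = struct.pack(layout, 1, 1024, b"DATA")

# ... and unpack the record back into Python values.
version, length, tag = struct.unpack(layout, record)

print(record)
print(version, length, tag)
```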


BTW: There's a little-known trick in Python 2 which also lets you
disable the string to Unicode coercion: all you have to do is
set the default encoding to "undefined" (see site.py:setencoding()).
Python 2 will then raise a UnicodeError whenever coercion would trigger.
I added that codec to experiment with this scenario in the early days
of the Unicode integration.

-- 
Marc-Andre Lemburg
eGenix.com


