[Python-ideas] bytes indexing behavior
Nick Coghlan
ncoghlan at gmail.com
Tue Jun 7 16:07:14 EDT 2016
On 7 June 2016 at 12:01, Stephen J. Turnbull <stephen at xemacs.org> wrote:
> Serhiy Storchaka writes:
>
> > I think representing bytes as an array of ints was a good decision. If you
> > need indexing to return a substring, you should use str instead. It is
> > also memory efficient thanks to PEP 393.
>
> You can do this by using latin-1 as the codec, but that's pretty
> unpleasant, because of the risk of combining with another str and
> getting mojibake.
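
For concreteness, a minimal sketch of the latin-1 approach and the hazard
described above. The sample string and variable names are illustrative
assumptions, not anything from the thread:

```python
data = "café".encode("utf-8")      # UTF-8 bytes: b'caf\xc3\xa9'

# Indexing bytes yields an int; slicing yields a bytes object.
first_byte = data[0]
first_slice = data[0:1]

# latin-1 maps every byte to the code point with the same value, so the
# round trip is lossless and indexing returns a one-character str.
as_text = data.decode("latin-1")
first_char = as_text[0]
round_trip = as_text.encode("latin-1")

# The hazard: nothing marks as_text as "really bytes", so it combines
# silently with genuine text and the UTF-8 bytes show up as mojibake.
mixed = "naïve " + as_text
print(repr(mixed))
```

The concatenation succeeds without any error, which is the problem: the
result is a perfectly valid str whose tail is no longer meaningful text.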
>
> I have long thought that it would be interesting to have a codec and
> an extension to PEP 393 that gives "asciibytes" behavior. That is,
> the codec simply slops the bytes into the 8-bit storage of a string,
> but when joined with another string the result types are:
>
>   asciibytes    other arg     result
>   has 8bit      type          type
>   yes           pure ascii    asciibytes
>   yes           asciibytes    asciibytes
>   yes           other str     str with 8bit bytes from asciibytes
>                               encoded as PEP 383 surrogateescape
>                               (note: promotes latin1 to 2-byte-wide)
>   no            whatever      whatever
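
The "other str" row can be approximated today with the standard
surrogateescape error handler. This is only a sketch of that one row using
stock codecs, not the proposed asciibytes type; the sample values are
assumptions chosen for illustration:

```python
raw = b"abc\xff"                   # ASCII plus one 8-bit byte

# PEP 383: each undecodable byte 0xNN becomes the lone surrogate U+DCNN.
text = raw.decode("ascii", errors="surrogateescape")
escaped = ord(text[3])             # code point standing in for byte 0xff

# Joining with an arbitrary str gives an ordinary str that still carries
# the escaped byte. The surrogate is above U+00FF, so it cannot live in
# 1-byte-wide storage.
joined = "π = " + text

# Encoding with the same handler turns the surrogate back into the byte.
recovered = joined.encode("utf-8", errors="surrogateescape")
print(recovered)
```

Unlike the latin-1 trick, the 8-bit byte stays distinguishable from real
text after the join, at the cost of a str that a strict encoder will reject.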
>
> I think Nick actually had a module that worked pretty much like this,
> but he never pushed it. I've never had time to reason out the
> possible failure modes, though, or the performance issues. And it's
> not an itch I personally need to scratch.
Benno Rice, rather than me (although I gave Benno the idea):
https://github.com/jeamland/asciicompat
Managing extra C dependencies is a pain though, and it's a dubious
idea at best, so neither of us seriously pushed for anyone to use it.
Cheers,
Nick.
--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia