string storage [was: Re: imaplib: is this really so unwieldy?]

Wed May 26 12:07:19 EDT 2021

On Thu, May 27, 2021 at 1:59 AM Jon Ribbens via Python-list
<python-list at python.org> wrote:
>
> On 2021-05-26, Alan Gauld <alan.gauld at yahoo.co.uk> wrote:
> > On 25/05/2021 23:23, Terry Reedy wrote:
> >> In CPython's Flexible String Representation all characters in a string
> >> are stored with the same number of bytes, depending on the largest
> >> codepoint.
> >
> > I'm learning lots of new things in this thread!
> >
> > Does that mean that if I give Python a UTF8 string that is mostly single
> > byte characters but contains one 4-byte character that Python will store
> > the string as all 4-byte characters?
> >
> > If so, doesn't that introduce a pretty big storage overhead for
> > large strings?
>
> Memory is cheap ;-)
>

This is true, but sometimes memory translates into time, in either
direction. And yes, a single non-BMP character does widen the whole
string to four bytes per character. But when the Flexible String
Representation came in (PEP 393, Python 3.3), it was an alternative
to using four bytes per character on ALL strings (not just those that
contain non-BMP characters), and it improved performance quite
notably, despite some additional complications.
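
For anyone who wants to see the widening for themselves, here is a rough
sketch using sys.getsizeof. It assumes CPython 3.3 or later; the helper
name bytes_per_char is made up for this example, and the exact object
header sizes vary between versions, which is why it measures a
difference between two lengths rather than an absolute size.

```python
import sys

def bytes_per_char(ch, n=1000):
    """Estimate per-character storage for a string made only of `ch`.

    Subtracting the sizes of two strings of different lengths cancels
    out the fixed object header, leaving only the character data.
    """
    return (sys.getsizeof(ch * (2 * n)) - sys.getsizeof(ch * n)) // n

ascii_width = bytes_per_char("a")              # largest code point < 128
latin1_width = bytes_per_char("\xe9")          # largest code point < 256
bmp_width = bytes_per_char("\u20ac")           # largest code point < 65536
astral_width = bytes_per_char("\U0001F600")    # non-BMP code point

# One non-BMP character at the end widens every character in the string.
mostly_ascii = "a" * 1000
with_one_astral = mostly_ascii + "\U0001F600"

if __name__ == "__main__":
    print("bytes per char:", ascii_width, latin1_width, bmp_width, astral_width)
    print("1000 ASCII chars:       ", sys.getsizeof(mostly_ascii), "bytes")
    print("same plus one non-BMP:  ", sys.getsizeof(with_one_astral), "bytes")
```

On CPython the four widths should come out as 1, 1, 2 and 4, and the
second string should be roughly four times the size of the first. Other
implementations are free to store strings differently.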

Performance optimization is a funny science :)

ChrisA