string storage [was: Re: imaplib: is this really so unwieldy?]

Wed May 26 19:17:10 EDT 2021

On 26/05/2021 22:15, Tim Chase wrote:

> If you don't decode it upon reading it in, it should still be 100MB
> because it's a stream of encoded bytes.  

I usually convert them to UTF-8.

> You don't specify what you then do with this humongous string, 

Mainly I search for regex patterns which can span multiple lines.
I could chunk it up if memory was an issue but a single read is
just more convenient. Up until now it hasn't been an issue and,
to be honest, I don't often hit multi-byte characters, so mostly
it will be single-byte character strings.
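
A minimal sketch of that kind of single-read search, for illustration only. The names load_text and find_multiline and the sample text are made up for this example, and it assumes the files decode as UTF-8 (undecodable bytes are replaced rather than raising):

```python
import re
from pathlib import Path


def load_text(path):
    """Read a whole file into one str in a single call (hypothetical helper)."""
    # errors="replace" is an assumption: old converted files may hold stray bytes.
    return Path(path).read_text(encoding="utf-8", errors="replace")


def find_multiline(text, first, second):
    """Return every place where `first` is followed by `second`,
    separated only by whitespace, which may include line breaks."""
    pattern = re.compile(re.escape(first) + r"\s+" + re.escape(second))
    return [m.group(0) for m in pattern.finditer(text)]


sample = "the quick\nbrown fox\njumps over the quick brown dog\n"
hits = find_multiline(sample, "quick", "brown")
print(hits)
```

Because the whole text is one string, a match that straddles a line break needs no special handling. With chunked reads, the same match could be split across two chunks and missed.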

They are mostly research papers and such from my university days
written on a Commodore PET and various early DOS computers with
weird long-lost word processors. Over the years they've been
exported/converted/reimported and then re-exported several times.
A very few have embedded text or "graphics"/equations which might
have some Unicode characters but it's not a big issue for me in practice.
I was more just thinking of the kinds of scenario where big strings
might become a problem if they suddenly consume 4x the storage
you expect.
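
A rough way to see that effect, assuming CPython 3.3 or later, where a str is stored at 1, 2 or 4 bytes per character depending on its widest character. The helper name per_char_width is made up for this sketch, and other Python implementations may store strings differently:

```python
import sys


def per_char_width(extra, n=100_000):
    """Estimate bytes per character for a mostly-ASCII string that also
    contains `extra`. Measuring at two lengths and taking the difference
    cancels out the fixed per-object overhead."""
    small = sys.getsizeof("a" * n + extra)
    large = sys.getsizeof("a" * (2 * n) + extra)
    return (large - small) / n


print(per_char_width(""))            # pure ASCII
print(per_char_width("\u00e9"))      # one Latin-1 character
print(per_char_width("\u20ac"))      # one character outside Latin-1 (euro sign)
print(per_char_width("\U0001F600"))  # one character outside the BMP
```

On CPython this should report 1.0, 1.0, 2.0 and 4.0: a single character outside the Basic Multilingual Plane is enough to make every character in the string take four bytes, which is where a 100MB file could turn into roughly 400MB in memory.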

-- 
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos