string storage [was: Re: imaplib: is this really so unwieldy?]

Tim Chase python.list at tim.thechases.com
Wed May 26 17:15:38 EDT 2021


On 2021-05-26 18:43, Alan Gauld via Python-list wrote:
> On 26/05/2021 14:09, Tim Chase wrote:
>>> If so, doesn't that introduce a pretty big storage overhead for
>>> large strings?  
>> 
>> Yes.  Though such large strings tend to be more rare, largely
>> because they become unwieldy for other reasons.
> 
> I do have some scripts that work on large strings - mainly produced
> by reading an entire text file into a string using file.read().
> Some of these are several MB long so potentially now 4x bigger than
> I thought. But you are right, even a 100MB string should still be
> OK on a modern PC with 8GB+ RAM!...

If you don't decode it upon reading it in, it should still be 100MB
because it's a stream of encoded bytes.  It would only grow to 2x or
4x the size if you decoded it (either as a consequence of how you
opened the file, or if you later took that byte-string and decoded
it explicitly, though then you'd have the original 100MB byte-string
**plus** the 100/200/400MB decoded unicode string).
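
For example, a rough sketch (exact byte counts vary across CPython
versions, and the rocket emoji is just an arbitrary non-BMP
character to force the 4-bytes-per-char representation):

  import sys

  data = b"hello world" * 1000    # raw bytes, as read from disk
  text = data.decode("ascii")     # ASCII-only str: ~1 byte/char
  wide = text + "\N{ROCKET}"      # one astral char widens the lot

  print(sys.getsizeof(data))  # ~11033: payload + bytes overhead
  print(sys.getsizeof(text))  # ~11049: still ~1 byte per char
  print(sys.getsizeof(wide))  # ~44080: whole string now 4 bytes/char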

You don't specify what you then do with this humongous string, but
for most of my large files like this, I end up iterating over them
piecewise rather than f.read()'ing them all in at once.  Or even if
the whole file does end up in memory, it's usually chunked and split
into useful pieces.  That could mean that each line is its own
string, almost all of which are one byte per char, with a couple of
strings at sporadic positions in the list-of-strings that need 2/4
bytes per char.
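
Something along these lines (hypothetical file name and handlers,
just to show the shape of it):

  # text mode: decode and hold one line at a time
  with open("huge.txt", encoding="utf-8") as f:
      for line in f:
          handle(line)            # hypothetical per-line processing

  # or stay in bytes and chunk it manually (Python 3.8+):
  with open("huge.txt", "rb") as f:
      while chunk := f.read(1 << 20):   # 1MB at a time
          handle_bytes(chunk)           # hypothetical chunk handler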

-tkc



