"".join(string_generator()) fails to be magic

Marc 'BlackJack' Rintsch bj_666 at gmx.net
Thu Oct 11 02:53:11 EDT 2007


On Thu, 11 Oct 2007 01:26:04 -0500, Matt Mackal wrote:

> I have an application that occasionally is called upon to process
> strings that are a substantial portion of the size of memory. For
> various reasons, the resultant strings must fit completely in RAM.
> Occasionally, I need to join some large strings to build some even
> larger strings.
> 
> Unfortunately, there's no good way of doing this without using 2x the
> amount of memory as the result. You can get most of the way there with
> things like cStringIO or mmap objects, but when you want to actually
> get the result as a Python string, you run into the copy again.
> 
> Thus, it would be nice if there was a way to join the output of a
> string generator so that I didn't need to keep the partial strings in
> memory. <subject> would be the obvious way to do this, but it of
> course converts the generator output to a list first.

Even if `str.join()` did not convert the generator into a list first,
you would still have overallocation.  You don't know the final string size
beforehand, so intermediate strings must get moved around in memory while
concatenating.  Worst case: all but the last string are already
concatenated and the last one does not fit into the allocated memory
anymore, so new memory that can hold both strings must be allocated ->
double the amount of memory needed.
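
A rough sketch of that worst case, as plain arithmetic rather than a
measurement.  The function name `peak_bytes_when_appending` and the chunk
sizes are made up for illustration, and the model assumes every append
forces a fresh allocation (a real allocator can sometimes grow a block in
place, so this is the pessimistic bound, not what CPython always does):

```python
def peak_bytes_when_appending(chunk_sizes):
    """Model a buffer that is reallocated on every append.

    Returns (final_size, peak_size) in bytes.  Simplified model only:
    it assumes the old buffer can never be extended in place.
    """
    buffer_size = 0
    peak = 0
    for chunk in chunk_sizes:
        new_size = buffer_size + chunk
        # While copying, the old buffer, the new buffer and the incoming
        # chunk are all alive at the same time.
        peak = max(peak, buffer_size + new_size + chunk)
        buffer_size = new_size
    return buffer_size, peak


final_size, peak_size = peak_bytes_when_appending([400, 400, 200])
print(final_size, peak_size)  # 1000 2000
```

In this model the last append always needs the old buffer, the new buffer
and the last chunk at once, which adds up to twice the final size no matter
how the chunks are split.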

Ciao,
	Marc 'BlackJack' Rintsch


