"".join(string_generator()) fails to be magic

Dan Stromberg dstromberglists at gmail.com
Thu Oct 11 13:36:06 EDT 2007


On Thu, 11 Oct 2007 01:26:04 -0500, Matt Mackal wrote:

> I have an application that occasionally is called upon to process
> strings that are a substantial portion of the size of memory. For
> various reasons, the resultant strings must fit completely in RAM.
> Occasionally, I need to join some large strings to build some even
> larger strings.
> 
> Unfortunately, there's no good way of doing this without using 2x the
> amount of memory as the result. You can get most of the way there with
> things like cStringIO or mmap objects, but when you want to actually
> get the result as a Python string, you run into the copy again.
> 
> Thus, it would be nice if there was a way to join the output of a
> string generator so that I didn't need to keep the partial strings in
> memory. <subject> would be the obvious way to do this, but it of
> course converts the generator output to a list first.
> 
> -- 
>  "Love the dolphins," she advised him. "Write by W.A.S.T.E.."

Some options you might evaluate (I'm -not- guaranteeing all of these'll
work "as advertised"):

1) Add some swap space to your machine and use standard python strings

2) Use mmap.  I may be wrong, and I know you mentioned mmap, but I suspect
that mmap won't use up physical memory or swap equal to the size of an
mmap'd file; I suspect it just caches portions of the data in physical
memory when it's convenient to do so, with the primary copy of the data
residing on disk in a file.

3) Use ctypes, and stay in ctypes - don't convert them to python str's.
Of course, then you're basically writing a C program using the python
interpreter.

4) Use temporary files via the usual file API

5) If you can live with alpha code, you might try the Python 3 alpha and
use the mutable "bytes" type, and stay in the "bytes" type - don't convert
it to a str.
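
Options 2 and 4 can be combined. What follows is only a rough sketch, not
something I've tested against memory-sized data: it assumes a current
Python 3 and that the generator yields bytes objects. The helper name
join_to_mmap is made up for illustration, and string_generator here is a
small stand-in for the real generator.

```python
import mmap
import tempfile


def join_to_mmap(chunks):
    """Stream chunks into an unnamed temporary file, then map that file.

    Only one chunk needs to be alive at a time while writing. The mapped
    result is backed by the file, so the OS pages it in on demand.
    """
    tmp = tempfile.TemporaryFile()
    for chunk in chunks:
        tmp.write(chunk)
    tmp.flush()
    # Length 0 maps the whole file; mmap raises ValueError on an empty file.
    mapped = mmap.mmap(tmp.fileno(), 0, access=mmap.ACCESS_READ)
    # Assumption: the mapping holds its own duplicate of the descriptor,
    # so the file object can be closed here without invalidating it.
    tmp.close()
    return mapped


def string_generator():
    # Stand-in for the real generator of large strings.
    yield b"abc"
    yield b"def"
    yield b"ghi"


big = join_to_mmap(string_generator())
print(len(big), big[:6])
```

The result supports len(), slicing and find(), but it is not a str, so this
only helps if the rest of the program can work with the mmap object
directly. Slicing the whole thing (big[:]) makes a full in-memory copy
again.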

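For option 5, note that in the released Python 3 the mutable type is named
bytearray (the name "bytes" ended up on the immutable type). A minimal
sketch, again assuming the generator yields bytes; join_into_bytearray is a
made-up name:

```python
def join_into_bytearray(chunks):
    """Append each chunk to one growing mutable buffer.

    The partial strings are not kept in a list; each chunk can be freed
    as soon as it has been appended.
    """
    buf = bytearray()
    for chunk in chunks:
        buf += chunk  # in-place extend, no new object per step
    return buf


result = join_into_bytearray(iter([b"ab", b"cd", b"ef"]))
print(len(result))
```

I'm not promising this never exceeds the size of the result: when the
buffer grows, the allocator may have to move it, which can briefly need
room for both the old and the new block.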



