"".join(string_generator()) fails to be magic
Dan Stromberg
dstromberglists at gmail.com
Thu Oct 11 13:36:06 EDT 2007
On Thu, 11 Oct 2007 01:26:04 -0500, Matt Mackal wrote:
> I have an application that occasionally is called upon to process
> strings that are a substantial portion of the size of memory. For
> various reasons, the resultant strings must fit completely in RAM.
> Occasionally, I need to join some large strings to build some even
> larger strings.
>
> Unfortunately, there's no good way of doing this without using 2x the
> amount of memory as the result. You can get most of the way there with
> things like cStringIO or mmap objects, but when you want to actually
> get the result as a Python string, you run into the copy again.
>
> Thus, it would be nice if there was a way to join the output of a
> string generator so that I didn't need to keep the partial strings in
> memory. <subject> would be the obvious way to do this, but it of
> course converts the generator output to a list first.
>
> --
> "Love the dolphins," she advised him. "Write by W.A.S.T.E.."
Some options you might evaluate (I'm -not- guaranteeing all of these'll
work "as advertised"):
1) Add some swap space to your machine and use standard python strings
2) Use mmap. I may be wrong, and I know you mentioned mmap, but I suspect
that mmap won't use up RAM or swap equal to the size of an mmap'd file
(though it does occupy address space). I suspect it just caches portions
of the data in physical memory when it's convenient to do so, with the
primary copy of the data residing on disk in a file.
3) Use ctypes, and stay in ctypes - don't convert the buffers to python
strs. Of course, then you're basically writing a C program using the
python interpreter
4) Use temporary files via the usual file API
5) If you can live with alpha code, you might try the python 3 alpha and
use the mutable "bytes" type, and stay in the "bytes" type - don't convert
it to a str