[Tutor] Load Entire File into memory

eryksun eryksun at gmail.com
Tue Nov 5 19:37:05 CET 2013


On Mon, Nov 4, 2013 at 11:26 AM, Amal Thomas <amalthomas111 at gmail.com> wrote:
> @Dave: thanks. By the way, I am running my code on a server with about
> 100 GB of RAM, but I can't afford to have my code use 4-5 times the size
> of the text file. Now I am using read() / readlines(); these seem to be
> more efficient in memory usage than io.StringIO(f.read()).

f.read() creates a string to initialize a StringIO object. You could
instead initialize a BytesIO object with a mapped file; that should
cut the peak RSS down by half. If you need decoded text, add a
TextIOWrapper.

    import io
    import mmap

    with open('output.txt') as f:
        # Map the whole file read-only; the mapping is backed by the OS
        # page cache rather than a separate in-process copy.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mf:
            # BytesIO copies the mapped bytes once; TextIOWrapper then
            # decodes them to text lazily, line by line.
            content = io.TextIOWrapper(io.BytesIO(mf))

    for line in content:
        'process line'

However, before you do something extreme (like say... loading a 50 GiB
file into RAM), try tweaking the TextIOWrapper object's readline() by
increasing _CHUNK_SIZE. This can be up to 2**63-1 in a 64-bit process.

    with open('output.txt') as content:
        # TextIOWrapper pulls _CHUNK_SIZE bytes from its buffer per
        # internal read, so a larger value means fewer, bigger reads.
        content._CHUNK_SIZE = 65536
        for line in content:
            'process line'

Check content.buffer.tell() to confirm that the file pointer is
increasing in steps of the given chunk size.
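
For example, something along these lines (just a sketch; the file name,
chunk size, and sampling interval are arbitrary) prints the buffer
position periodically, and it should jump by the chunk size rather than
advance line by line:

    with open('output.txt') as content:
        content._CHUNK_SIZE = 65536
        for i, line in enumerate(content):
            if i % 10000 == 0:
                # buffer.tell() stays put between refills and advances
                # by about the chunk size whenever the text layer
                # pulls in another chunk.
                print(i, content.buffer.tell())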

Built-in open() also lets you set the "buffering" size for the
BufferedReader, content.buffer. However, in this case I don't think
you need to worry about it. content.readline() calls
content.buffer.read1() to read directly from the FileIO object,
content.buffer.raw.
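
If you want to see the stack that open() builds (a rough sketch;
'output.txt' and the 1 MiB buffering value are only placeholders), pass
buffering and inspect the layers:

    # 1 MiB BufferedReader; readline() largely bypasses it via read1().
    with open('output.txt', buffering=1024*1024) as content:
        print(type(content))             # _io.TextIOWrapper
        print(type(content.buffer))      # _io.BufferedReader (sized by 'buffering')
        print(type(content.buffer.raw))  # _io.FileIO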
