python: ascii read

Brian van den Broek bvande at po-box.mcgill.ca
Thu Sep 16 11:56:35 EDT 2004


Alex Martelli said unto the world upon 2004-09-16 07:22:
> Sebastian Krause <canopus at gmx.net> wrote:
> 
> 
>>Hello,
>>
>>I tried to read in some large ascii files (200MB-2GB) in Python using
>>scipy.io.read_array, but it did not work as I expected. The whole idea
>>was to find a fast Python routine to read in arbitrary ascii files, to
>>replace Yorick (which I use right now and which is really fast, but not
>>as general as Python). The problem with scipy.io.read_array was that it
>>is really slow, returns errors when trying to process large files, and
>>also changes (cuts) the files (after scipy.io.read_array processed a 2GB
>>file, its size was only 64MB).
>>
>>Can someone give me a hint on how to use Python to do this job correctly
>>and fast? (Maybe with another read-in routine.)
> 
> 
> If all you need is what you say -- read a huge amount of ASCII data into
> memory -- it's hard to beat
>     data = open('thefile.txt').read()
> 
> mmap may in fact be preferable for many uses, but it doesn't actually
> read (it _maps_ the file into memory instead).
> 
> 
> Alex

Hi all,

[neophyte question warning]

I'd not been aware of mmap until this post. Looking at the Library 
Reference and my trusty copy of Python in a Nutshell, I've gotten some 
idea of the differences between using mmap and the .read() method on a 
file object: mmap returns a mutable object where .read() returns an 
immutable string, slice assignment on an mmap object requires that 
len(oldslice) equal len(newslice), etc.
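
To make those differences concrete, here is a minimal sketch. It is written 
for current Python 3 rather than the Python of this thread, so files are 
opened in binary mode and the values are bytes objects instead of 2004-era 
strings. The scratch file made with tempfile and its contents are 
hypothetical stand-ins for 'thefile.txt'; only open(), .read() and 
mmap.mmap() come from the discussion above.

```python
import mmap
import os
import tempfile

# Hypothetical scratch file standing in for 'thefile.txt'.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"1.0 2.0 3.0\n4.0 5.0 6.0\n")

# .read() copies the whole file into one immutable object.
with open(path, "rb") as f:
    data = f.read()

# mmap gives a mutable object backed by the file itself.
with open(path, "r+b") as f:
    m = mmap.mmap(f.fileno(), 0)   # length 0 maps the whole file
    first_line = m[:11]            # slicing returns an ordinary bytes copy
    m[0:3] = b"9.9"                # same length as the old slice: allowed
    try:
        m[0:3] = b"10.25"          # different length: rejected
        resize_error = None
    except (IndexError, ValueError) as exc:
        resize_error = exc
    m.flush()                      # push the change out to the file
    m.close()

# The in-place edit made through the map is visible in the file.
with open(path, "rb") as f:
    changed = f.read()

os.remove(path)
```

After this runs, data still holds the original contents, while changed 
begins with 9.9, since the assignment through the map went to the file 
itself.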

But I don't really feel I've a handle on the significance of saying it 
maps the file into memory versus reading the file. The naive thought is 
that since the data gets into memory, the file must be read. But this 
makes me sure I'm missing a distinction in the terminology. Explanations 
and pointers for what to read gratefully received.

In case it matters, since mmap behaves differently on different 
platforms: I'm mostly a win32 user looking to transition to Linux.

Best to all,

Brian vdB



