reading file contents to an array (newbie)

John Lenton jlenton at gmail.com
Tue Jul 6 23:33:58 EDT 2004


On Tue, 6 Jul 2004 21:02:32 -0400, Christopher T King <squirrel at wpi.edu> wrote:
> For greater readability (at the cost of some speed), I might suggest writing
> the above using a nested function, so your final output looks like this:
> 
> from numarray import *
> 
> def parseline(line):
>         return [float(value) for value in line.split()]
> 
> myFile=file('test.dat',mode='rt')
> data=array([parseline(line) for line in myFile])

actually, I find the following more readable, and even faster:

    from mmap import mmap, MAP_PRIVATE, PROT_READ
    from os import fstat

    f = open('test.dat', mode='rt')
    fd = f.fileno()
    # map the whole file read-only (MAP_PRIVATE and PROT_READ are Unix-only)
    m = mmap(fd, fstat(fd).st_size, MAP_PRIVATE, PROT_READ)

    data = []
    while True:
        line = m.readline()
        if not line: break    # readline() returns an empty line at end of file
        data.extend(map(float, line.split()))

of course the speedup is because of mmap, not because of faster Python
code; however, remember this is (once you've got rid of the evil eval)
an IO-bound task, so anything you do to speed up the IO (like the mmap)
is a gain. If mmap returned something you could iterate over, you
could probably shave a second off (I shaved 3 seconds off your example
with this, and your example shaved 11 seconds off the original; that is
on my machine, with my data, and my wife asking for the computer).

(I'd replace the map with a list comprehension as soon as the function
being mapped stopped being one implemented in C)
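
For readers on Python 3, the same idea can be sketched so that the mapped
file is iterated line by line, which is roughly what the paragraph above
wishes mmap offered. This is only a sketch under assumptions: read_floats
is a hypothetical helper name, the file is assumed to be non-empty (mapping
an empty file raises ValueError) and to contain nothing but
whitespace-separated numbers, and no timing claim is made for it.

```python
import mmap


def read_floats(path):
    """Return every whitespace-separated number in the file as a flat list."""
    data = []
    with open(path, "rb") as f:
        # length 0 maps the whole file; ACCESS_READ keeps the mapping read-only
        # and works on both Unix and Windows
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            # iter(callable, sentinel) keeps calling m.readline() until it
            # returns b"", which is what readline() gives at end of file
            for line in iter(m.readline, b""):
                # lines are bytes here; float() accepts ASCII bytes directly
                data.extend(float(value) for value in line.split())
    return data
```

Like the loop above, this builds one flat list; to get one row per line
instead, append a list per line rather than calling extend.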

I'd talk about numarray.memmap if I knew it were going to be useful,
but as I don't, I won't.



PS: use mmap! it's not the '70s any more!

-- 
John Lenton (jlenton at gmail.com) -- Random fortune:
bash: fortune: command not found


