reading file contents to an array (newbie)
John Lenton
jlenton at gmail.com
Tue Jul 6 23:33:58 EDT 2004
On Tue, 6 Jul 2004 21:02:32 -0400, Christopher T King <squirrel at wpi.edu> wrote:
> For great readability (at the cost of some speed), I might suggest writing
> the above using a nested function, so your final output looks like this:
>
> from numarray import *
>
> def parseline(line):
>     return [float(value) for value in line.split()]
>
> myFile=file('test.dat',mode='rt')
> data=array([parseline(line) for line in myFile])
actually, I find the following more readable, and even faster:
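from mmap import mmap, MAP_PRIVATE, PROT_READ
from os import fstat

f = file('test.dat',mode='rt')
fd = f.fileno()
# map the whole file read-only (MAP_PRIVATE and PROT_READ are Unix-only)
m = mmap(fd, fstat(fd).st_size, MAP_PRIVATE, PROT_READ)
data=[]
while True:
    line = m.readline()
    if not line: break    # readline returns '' at the end of the map
    # note: extend builds one flat list, not one row per line
    data.extend(map(float, line.split()))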
of course the speedup comes from mmap, not from faster python
code; however, remember this is (once you've got rid of the evil eval)
an IO-bound task, so anything you do to speed up the IO (like the mmap)
is a gain. If mmap returned something you could iterate over, you
could probably shave off another second (I shaved 3 seconds off your
example with this, and your example shaved 11 seconds off the original,
on my machine, with my data, and my wife asking for the computer).
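For anyone who wants the "something you could iterate over" part, here is a minimal sketch in present-day Python 3, not the code I timed. It relies on the two-argument form of iter(), which keeps calling m.readline() until it returns the sentinel b"". The name read_floats is made up for illustration, and access=ACCESS_READ with length 0 is swapped in for MAP_PRIVATE/PROT_READ so that it is not Unix-only. Note that mmap refuses to map an empty file.

```python
import mmap

def read_floats(path):
    """Return every whitespace-separated number in the file as one flat list."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file; mapping an empty file raises ValueError.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            data = []
            # iter(callable, sentinel) calls m.readline() until it returns b"".
            for line in iter(m.readline, b""):
                # float() accepts the bytes tokens produced by split().
                data.extend(map(float, line.split()))
            return data
```

Whether this is any faster than the explicit while loop would have to be measured; the point is only that the loop reads like ordinary file iteration.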
(I'd replace the map with a list comprehension as soon as the function
being mapped stopped being one implemented in C.)
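A small sketch of what that swap might look like, assuming a hypothetical pure-Python converter parse_value that is not part of this thread (here it treats 'NA' as a missing value). Once the per-token function is written in Python, map() gains little over a comprehension, and the comprehension is easier to read.

```python
def parse_value(token):
    """Hypothetical converter: 'NA' becomes NaN, anything else is parsed as a float."""
    return float("nan") if token == "NA" else float(token)

def parse_line(line):
    # parse_value is a Python-level function, so the call overhead is the same
    # with map() or a comprehension; the comprehension is the clearer of the two.
    return [parse_value(token) for token in line.split()]
```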
I'd talk about numarray.memmap if I knew it were going to be useful,
but as I don't, I won't.
PS: use mmap! it's not the '70s any more!
--
John Lenton (jlenton at gmail.com) -- Random fortune:
bash: fortune: command not found