best way to read a huge ascii file.

BartC bc at freeuk.com
Fri Nov 25 11:39:12 EST 2016


On 25/11/2016 15:17, Heli wrote:

> I have a huge ASCII file (40 GB) with around 100 million lines. I read this file using:
>
> f1=np.loadtxt(os.path.join(dir,myfile), delimiter=None, skiprows=0)
>
> x=f1[:,1]
> y=f1[:,2]
> z=f1[:,3]
> id=f1[:,0]
>
> I will need the x, y, z and id arrays later for interpolation. The problem is that reading the file takes around 80 minutes, while the interpolation only takes 15 minutes.
>
> I was wondering if there is a more optimized way to read the file that would reduce the time spent on input?
>
> I have the same problem when writing the output using np.savetxt.

Is the file read entirely into RAM? I suppose lines are discarded once they 
are read; otherwise it would have to load all 40 GB before it could do anything.

How much of your RAM is used up during the operation? If memory starts to 
run out, things can get very slow. (Same with a fragmented hard drive.)
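
If you want to measure it, something like this reports the process's 
resident memory while the load is running (psutil is a third-party package, 
so treat this as a sketch assuming it is installed):

    import os
    import psutil

    proc = psutil.Process(os.getpid())
    # Resident set size in bytes: the physical RAM the process currently holds.
    print("RSS: %.1f GB" % (proc.memory_info().rss / 1e9))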

Where does the file come from in the first place (is it the one written 
with savetxt?); could it be generated more compactly?
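
If you do control the producing step, one thing worth trying is NumPy's 
binary .npy format instead of text, which avoids parsing (and the savetxt 
formatting) entirely. A rough sketch, with made-up file names:

    import numpy as np

    # Writing side: instead of np.savetxt.
    np.save('points.npy', f1)

    # Reading side: instead of np.loadtxt.  mmap_mode='r' avoids pulling the
    # whole array into RAM at once, if that turns out to be the bottleneck.
    f1 = np.load('points.npy')        # or np.load('points.npy', mmap_mode='r')
    ids, x, y, z = f1[:, 0], f1[:, 1], f1[:, 2], f1[:, 3]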

I don't quite understand what f1[:,1] does; if that's a slice that makes a 
copy, that's extra memory on top of the array itself. But presumably you've 
already measured the 80 minutes as just the np.loadtxt part. (I guess 
the processing won't allow the file to be split into smaller pieces.)
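
For what it's worth, a quick check on a small made-up array suggests that 
basic slicing like f1[:,1] gives back a view of the same data rather than 
a copy:

    import numpy as np

    f1 = np.arange(12.0).reshape(4, 3)   # tiny stand-in for the real data
    col = f1[:, 1]                       # second column, shape (4,)
    print(np.shares_memory(f1, col))     # True: basic slicing returns a view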


-- 
Bartc
