[Numpy-discussion] loadtxt slow

Brent Pedersen bpederse at gmail.com
Sun Mar 1 19:51:00 EST 2009


On Sun, Mar 1, 2009 at 11:29 AM, Michael Gilbert
<michael.s.gilbert at gmail.com> wrote:
> On Sun, 1 Mar 2009 16:12:14 -0500 Gideon Simpson wrote:
>
>> So I have some data sets of about 160000 floating point numbers stored
>> in text files.  I find that loadtxt is rather slow.  Is this to be
>> expected?  Would it be faster if it were loading binary data?
>
> i have run into this as well.  loadtxt uses a python list to allocate
> memory for the data it reads in, so once you get to about 1/4th of your
> available memory, it will start allocating the updated list (every
> time it reads a new value from your data file) in swap instead of main
> memory, which is ridiculously slow (in fact it makes my system quite
> unresponsive, with a jumpy cursor). i have rewritten loadtxt to be
> smarter about allocating memory, but it is slower overall and doesn't
> support all of the original arguments/options (yet).  i have some
> ideas to make it smarter/more efficient, but have not had the time
> to work on it recently.
>
> i will send the current version to the list tomorrow when i have access
> to the system that it is on.
>
> best wishes,
> mike

to address the slowness, i use wrappers around savetxt/loadtxt that
save/load a .npy file along with (or instead of) the .txt file. the
loadtxt wrapper checks whether the .npy is up-to-date before using it.
code here:

http://rafb.net/p/dGBJjg80.html
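
in case the paste is not reachable, here is a rough sketch of the idea. it is
not the code from the link: the names cached_loadtxt / cached_savetxt and the
convention of appending ".npy" to the text file name are assumptions made for
illustration, and "up-to-date" is taken to mean a simple modification-time
comparison.

```python
import os
import numpy as np


def cached_loadtxt(txt_path, **kwargs):
    """Load txt_path with np.loadtxt, caching the array in txt_path + '.npy'.

    Hypothetical helper: the cache is reused only when it exists and is at
    least as new as the text file.
    """
    npy_path = txt_path + ".npy"
    if (os.path.exists(npy_path)
            and os.path.getmtime(npy_path) >= os.path.getmtime(txt_path)):
        # Fast path: binary load, no text parsing.
        return np.load(npy_path)
    # Slow path (first call, or text file changed): parse, then refresh cache.
    data = np.loadtxt(txt_path, **kwargs)
    np.save(npy_path, data)
    return data


def cached_savetxt(txt_path, data, **kwargs):
    """Write data as text and also as a .npy cache next to it."""
    np.savetxt(txt_path, data, **kwargs)
    # Saved after the text file, so the cache is not older than it.
    np.save(txt_path + ".npy", np.asarray(data))
```

one caveat with this sketch: the cache ignores the loadtxt keyword arguments,
so calling it again with different options (e.g. usecols) would return the
previously cached array until the text file changes.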

of course it's still slow the first time. i look forward to your speedups.
-brentp


