[Numpy-discussion] seeking advice on a fast string->array conversion
Christopher Barker
Chris.Barker at noaa.gov
Tue Nov 16 13:44:10 EST 2010
On 11/16/10 10:01 AM, Christopher Barker wrote:
> OK -- I'll whip up a test similar to yours -- stay tuned!
Here's what I've done:
import numpy as np
from maproomlib.utility import file_scanner


def gen_file():
    # 1200 lines of 2048 whitespace-separated values each
    f = open('test.dat', 'w')
    for i in range(1200):
        f.write('1 ' * 2048)
        f.write('\n')
    f.close()

def read_file1():
    """ read unknown length: doubles"""
    f = open('test.dat')
    arr = file_scanner.FileScan(f)
    f.close()
    return arr


def read_file2():
    """ read known length: doubles"""
    f = open('test.dat')
    arr = file_scanner.FileScanN(f, 1200*2048)
    f.close()
    return arr


def read_file3():
    """ read known length: singles"""
    f = open('test.dat')
    arr = file_scanner.FileScanN_single(f, 1200*2048)
    f.close()
    return arr

def read_fromfile1():
    """ read unknown length with fromfile(): singles"""
    f = open('test.dat')
    arr = np.fromfile(f, dtype=np.float32, sep=' ')
    f.close()
    return arr


def read_fromfile2():
    """ read unknown length with fromfile(): doubles"""
    f = open('test.dat')
    arr = np.fromfile(f, dtype=np.float64, sep=' ')
    f.close()
    return arr


def read_fromstring1():
    """ read unknown length with fromstring(): singles"""
    f = open('test.dat')
    text = f.read()  # renamed from "str" to avoid shadowing the builtin
    f.close()
    arr = np.fromstring(text, dtype=np.float32, sep=' ')
    return arr

And the results (ipython's timeit):
In [40]: timeit test.read_fromfile1()
1 loops, best of 3: 561 ms per loop
In [41]: timeit test.read_fromfile2()
1 loops, best of 3: 570 ms per loop
In [42]: timeit test.read_file1()
1 loops, best of 3: 336 ms per loop
In [43]: timeit test.read_file2()
1 loops, best of 3: 341 ms per loop
In [44]: timeit test.read_file3()
1 loops, best of 3: 515 ms per loop
In [46]: timeit test.read_fromstring1()
1 loops, best of 3: 301 ms per loop
So my file_scanner is faster than fromfile(), but not radically so.
However, reading the whole file into a string and then calling fromstring()
is, in fact, the fastest method here -- interesting, and it shows why you
need to profile!
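
For anyone who wants to repeat the fromfile()/fromstring() part of this comparison without the file_scanner extension, here is a rough, self-contained sketch. The function names and the path/rows/cols/repeat parameters are made up for illustration; only np.fromfile, np.fromstring and timeit.repeat are real APIs. It uses timeit.repeat instead of ipython's timeit, and the numbers will vary with machine and NumPy version.

```python
import timeit

import numpy as np


def gen_file(path, rows=1200, cols=2048):
    """Write rows lines of cols whitespace-separated '1' values (same layout as test.dat)."""
    with open(path, "w") as f:
        for _ in range(rows):
            f.write("1 " * cols)
            f.write("\n")


def read_fromfile(path, dtype=np.float64):
    """Let NumPy parse the text file directly."""
    return np.fromfile(path, dtype=dtype, sep=" ")


def read_fromstring(path, dtype=np.float64):
    """Read the whole file into memory first, then parse the string."""
    with open(path) as f:
        text = f.read()
    return np.fromstring(text, dtype=dtype, sep=" ")


def compare(path, repeat=3):
    """Return the best wall-clock time in seconds for each reader."""
    readers = (("fromfile", read_fromfile), ("fromstring", read_fromstring))
    results = {}
    for name, func in readers:
        # Bind func as a default argument so each lambda keeps its own reader.
        timings = timeit.repeat(lambda f=func: f(path), number=1, repeat=repeat)
        results[name] = min(timings)
    return results
```

Calling gen_file('test.dat') and then compare('test.dat') gives a dict of best-of-N times, which is roughly what ipython's "best of 3" reports.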
Also, with my code, reading singles is slower than reading doubles -- odd.
Perhaps the C library's fscanf reads doubles anyway, then converts them to singles?
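
The fscanf guess can't be checked from Python, but the same two routes can at least be compared at the NumPy level: parse straight into float32, or parse into float64 and downcast afterwards. This is only a sketch; parse_direct and parse_then_cast are hypothetical helper names, and it says nothing about what the C extension actually does internally.

```python
import numpy as np


def parse_direct(text):
    """Parse whitespace-separated numbers straight into single precision."""
    return np.fromstring(text, dtype=np.float32, sep=" ")


def parse_then_cast(text):
    """Parse into double precision first, then downcast to single precision."""
    return np.fromstring(text, dtype=np.float64, sep=" ").astype(np.float32)
```

Timing both functions on the contents of test.dat would show whether the extra conversion step costs anything measurable on a given setup.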
Anyway, for my needs, my file_scanner and fromfile() are fast enough,
and much faster than parsing the files with Python. My issue with
fromfile is flexibility and robustness -- it's buggy in the face of
ill-formed files. See the list archives and the bug reports for more detail.
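
As a point of comparison for the robustness issue, a strict parser is easy to write in pure Python: it is the slow path, but it fails loudly on a bad token instead of returning something questionable. A minimal sketch, with parse_strict being a hypothetical name and not part of NumPy or file_scanner:

```python
import numpy as np


def parse_strict(text, dtype=np.float64):
    """Parse whitespace-separated numbers, raising on the first bad token."""
    values = []
    for index, token in enumerate(text.split()):
        try:
            values.append(float(token))
        except ValueError:
            raise ValueError("token %d is not a number: %r" % (index, token))
    return np.array(values, dtype=dtype)
```

This is useful mainly as a cross-check on small or suspect files, since the per-token Python loop gives up the speed that the C-level readers provide.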
Still, it seems your very basic method is indeed a faster way to go.
I've enclosed the files. It's currently built as part of a larger lib,
so no setup.py -- though it could be written easily enough.
-Chris
--
Christopher Barker, Ph.D.
Oceanographer
Emergency Response Division
NOAA/NOS/OR&R (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception
Chris.Barker at noaa.gov
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: file_scan_module.c
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20101116/b40e5c38/attachment.c>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: test_simple_large.py
Type: application/x-python
Size: 1354 bytes
Desc: not available
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20101116/b40e5c38/attachment.bin>