[Numpy-discussion] seeking advice on a fast string->array conversion

Darren Dale dsdale24 at gmail.com
Tue Nov 16 09:20:29 EST 2010


I am wrapping up a small package to parse a particular ascii-encoded
file format generated by a program we use heavily here at the lab. (In
the unlikely event that you work at a synchrotron, and use Certified
Scientific's "spec" program, and are actually interested, the code is
currently available at
https://github.com/darrendale/praxes/tree/specformat/praxes/io/spec/
.)

I have been benchmarking the project against another python package
developed by a colleague, which is an extension module written in pure
C. My python/cython project takes about twice as long to parse and
index a file (~0.8 seconds for 100MB), which is acceptable. However,
actually converting ASCII strings to numpy arrays, which is done using
numpy.fromstring, takes about 10 times longer than the extension
module. So I am wondering about the performance of np.fromstring:

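import time
import numpy as np

# ~2.4 million whitespace-separated ASCII values
s = b'1 ' * 2048 * 1200
d = time.time()
# sep=' ' selects text parsing; without it the bytes are read as raw binary
x = np.fromstring(s, sep=' ')
print(time.time() - d)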
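
A single time.time() difference can be noisy, so here is a rough sketch of a more repeatable measurement using timeit. The helper names (parse_fromstring, parse_split) and the smaller input size are my own choices for illustration, and the pure-python split/float path is only a baseline for comparison, not what the C extension module does:

```python
import timeit
import numpy as np

# Assumption: a smaller input than 2048 * 1200 values, so this runs quickly.
n_values = 2048 * 12
s = b'1 ' * n_values

def parse_fromstring(buf):
    # Text-mode parsing: sep=' ' tells numpy the buffer holds ASCII numbers.
    return np.fromstring(buf, sep=' ')

def parse_split(buf):
    # Baseline for comparison: split on whitespace, convert each token.
    return np.array([float(tok) for tok in buf.split()])

for parse in (parse_fromstring, parse_split):
    # Best of 3 repeats, 5 calls each, reported per call.
    best = min(timeit.repeat(lambda: parse(s), number=5, repeat=3)) / 5
    print('%s: %.6f s per call' % (parse.__name__, best))
```

Taking the minimum over several repeats should reduce the influence of whatever else the machine is doing; the absolute numbers will of course depend on the numpy version and hardware.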


