[SciPy-User] numpy I/O question

Matwey V. Kornilov matwey.kornilov at gmail.com
Sun Jan 2 11:09:37 EST 2011


These files are pipe streams, but when they are dumped they are about 50M.
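
Because the input arrives through a pipe, one option is to tokenize it line by line, so the dump never has to sit in memory as one big string. This is a minimal sketch under stated assumptions, not an existing NumPy feature. `read_numbers` and `NUMBER` are hypothetical names, and the token pattern is the regexp proposed further down in this message.

```python
import io
import re

import numpy as np

# Token pattern taken from the regexp proposed in this message: an optional
# minus sign followed by digits and/or dots. It is deliberately loose (it
# would also match a lone "."), so it should be tightened for real data.
NUMBER = re.compile(r"-?[\d\.]+")


def read_numbers(stream):
    """Return every number found in a text stream as a 1-D float64 array.

    The stream is consumed one line at a time, so the whole input is never
    held in memory as a single string.
    """
    tokens = (float(tok) for line in stream for tok in NUMBER.findall(line))
    return np.fromiter(tokens, dtype=np.float64)


# Demo with an in-memory stream (hypothetical sample data).
print(read_numbers(io.StringIO("1.5 -2\nx=3.25;4\n")))
```

With real piped input it would be called as `read_numbers(sys.stdin)`. Each token still costs a regexp match and a `float()` call, so this saves memory rather than parsing time.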

The replacement you described takes O(N) time (where N is the line length),
whereas C++ operator>> requires O(1) for the same parsing.

I had hoped there was a way to make numpy split the data by a regexp instead
of by a delimiter.

i.e.

np.genfromtxt(StringIO(data), regexp=r"-?[\d\.]+")

instead of

np.genfromtxt(StringIO(data), delimiter=None)
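
Current NumPy releases do ship a regexp-based reader, `np.fromregex`, although it is a separate function rather than a `regexp=` argument to `genfromtxt`. A minimal sketch, assuming the sample data below (which is hypothetical) and the same pattern as above:

```python
import io

import numpy as np

# Hypothetical sample input with mixed separators.
data = "1.5 -2\nx=3.25;4\n"

# np.fromregex applies the regexp to the whole text and returns a
# structured array. The pattern has no capture group, so each complete
# match fills the single field "value".
records = np.fromregex(io.StringIO(data), r"-?[\d\.]+", dtype=[("value", np.float64)])
values = records["value"]  # plain 1-D float64 array
print(values)
```

Note that `fromregex` reads the whole file before matching, and the result is flat, so the row structure has to be restored (e.g. with `reshape`) if the number of columns is known.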


Yury V. Zaytsev wrote:

> On Sun, 2011-01-02 at 18:51 +0300, Matwey V. Kornilov wrote:
>> 
>> I will be asked 'Why should we use Python if it can't even parse as well
>> as C++ does?' `sed` isn't a solution.
> 
> How big are these files in question?
> 
> If you don't want to pre-process the files, why can't you just load them
> into memory and do the replacement before feeding them into NumPy? This is
> just 2-3 lines of code.
>  
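
Reading the suggestion as "turn every non-numeric separator into whitespace, then parse" (an assumption, since the exact replacement discussed earlier in the thread is not quoted here), a minimal sketch with hypothetical input is:

```python
import io
import re

import numpy as np

# Hypothetical input: numbers separated by assorted punctuation.
raw = "1.5,-2;3\n4|5.25 6\n"

# Replace every run of characters that cannot belong to a number with a
# single space, keeping newlines so that the row structure survives.
cleaned = re.sub(r"[^\d.\-\n]+", " ", raw)

# After the replacement, genfromtxt's default whitespace splitting is enough.
table = np.genfromtxt(io.StringIO(cleaned))
print(table)
```

This is the extra O(N) pass over each line that the message at the top objects to; in exchange, the rows come back as a 2-D array.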




