[Numpy-discussion] Fastest way to parse a specific binary file
Robert Kern
robert.kern at gmail.com
Wed Sep 2 13:46:16 EDT 2009
On Wed, Sep 2, 2009 at 12:33, Gökhan Sever<gokhansever at gmail.com> wrote:
> How does your find suggestion work? It just returns the location of the first
> occurrence.
http://docs.python.org/library/stdtypes.html#str.find
str.find(sub[, start[, end]])
Return the lowest index in the string where substring sub is
found, such that sub is contained in the range [start, end]. Optional
arguments start and end are interpreted as in slice notation. Return
-1 if sub is not found.
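In other words, the optional start argument lets you resume the search just past the previous hit, so a simple loop can collect every occurrence. A minimal sketch, assuming Python 3 bytes objects; find_all is a hypothetical helper name, not something in the standard library, and the sample data is made up for illustration:

```python
def find_all(data, sub):
    """Return the offsets of every non-overlapping occurrence of sub in data."""
    if not sub:
        # An empty pattern would never advance the search position.
        raise ValueError("sub must be non-empty")
    offsets = []
    start = 0
    while True:
        idx = data.find(sub, start)  # search from 'start' onwards
        if idx == -1:                # no further occurrence
            break
        offsets.append(idx)
        start = idx + len(sub)       # resume just past this hit
    return offsets


# Illustrative data: three blocks, each an 8-byte marker plus 8 payload bytes.
marker = b"\x00\x00\\@\x00$\x00\x02"
data = (marker + b"\xde\xca\xfb\xad" * 2) * 3
print(find_all(data, marker))
```

The same loop works on str objects, since str.find and bytes.find take the same arguments.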
But perhaps you should profile your code to see where it is actually
taking up the time. Regexes on 1.3 MB of data should be quite fast.
In [21]: marker = '\x00\x00\@\x00$\x00\x02'
In [22]: block = marker + '\xde\xca\xfb\xad' * ((1024-8) // 4)
In [23]: data = int(round(1.3 * 1024)) * block
In [24]: import re
In [25]: r = re.compile(re.escape(marker))
In [26]: %time r.findall(data)
CPU times: user 0.01 s, sys: 0.00 s, total: 0.01 s
Wall time: 0.01 s
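If you need the offsets of the markers rather than the matched strings, finditer yields match objects whose start() method gives the position. A rough sketch of the same experiment as a plain script, assuming Python 3 bytes literals and using time.perf_counter in place of IPython's %time; the variable names are only for illustration, and the timing will depend on your machine:

```python
import re
import time

marker = b"\x00\x00\\@\x00$\x00\x02"                       # 8-byte marker
block = marker + b"\xde\xca\xfb\xad" * ((1024 - 8) // 4)   # one 1 KiB block
data = block * int(round(1.3 * 1024))                      # roughly 1.3 MB

# Escape the marker so that '$' and '\' are matched literally.
pattern = re.compile(re.escape(marker))

t0 = time.perf_counter()
offsets = [m.start() for m in pattern.finditer(data)]
elapsed = time.perf_counter() - t0

print(len(offsets), "markers found in", elapsed, "seconds")
```

With the offsets in hand, each block can be sliced out as data[offsets[i]:offsets[i + 1]].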
--
Robert Kern
"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
-- Umberto Eco