[Numpy-discussion] Fastest way to parse a specific binary file

Robert Kern robert.kern at gmail.com
Wed Sep 2 13:46:16 EDT 2009


On Wed, Sep 2, 2009 at 12:33, Gökhan Sever<gokhansever at gmail.com> wrote:
> How does your find suggestion work? It just returns the location of the
> first occurrence.

http://docs.python.org/library/stdtypes.html#str.find

str.find(sub[, start[, end]])
    Return the lowest index in the string where substring sub is
    found, such that sub is contained in the range [start, end]. Optional
    arguments start and end are interpreted as in slice notation. Return
    -1 if sub is not found.
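
The optional start argument is what lets find() go past the first
occurrence: call it again, starting just after the previous hit. A
minimal sketch of that loop follows. The helper name find_all is
hypothetical and not part of the standard library, and it reports
non-overlapping matches only.

```python
def find_all(data, sub):
    """Return the offsets of every non-overlapping occurrence of sub in data.

    Works for str or bytes, as long as data and sub are the same type.
    """
    if not sub:
        # An empty pattern matches everywhere and would never advance.
        raise ValueError("sub must be non-empty")
    offsets = []
    start = 0
    while True:
        idx = data.find(sub, start)
        if idx == -1:  # find() returns -1 when nothing is left to match
            break
        offsets.append(idx)
        start = idx + len(sub)  # resume the search just past this match
    return offsets
```

Each call to find() runs in C, so the Python-level loop executes only
once per match rather than once per byte.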

But perhaps you should profile your code to see where it is actually
taking up the time. Regexes on 1.3 MB of data should be quite fast.

In [21]: marker = '\x00\x00\@\x00$\x00\x02'

In [22]: block = marker + '\xde\xca\xfb\xad' * ((1024-8) // 4)

In [23]: data = int(round(1.3 * 1024)) * block

In [24]: import re

In [25]: r = re.compile(re.escape(marker))

In [26]: %time r.findall(data)
CPU times: user 0.01 s, sys: 0.00 s, total: 0.01 s
Wall time: 0.01 s
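
For reference, a rough sketch of the same experiment as a standalone
script, assuming Python 3, where binary data is a bytes object. It
uses finditer() rather than findall() so that the offset of each
marker is available. The marker bytes are copied from the session
above, with the backslash written explicitly; the names payload,
pattern and offsets are illustrative. No timing figures are claimed
here, since they depend on the machine.

```python
import re
import time

# 8-byte marker from the session above; '\\' is one literal backslash byte.
marker = b'\x00\x00\\@\x00$\x00\x02'
payload = b'\xde\xca\xfb\xad' * ((1024 - 8) // 4)  # 1016 bytes of filler
block = marker + payload                           # one 1024-byte block
data = int(round(1.3 * 1024)) * block              # about 1.3 MB in total

pattern = re.compile(re.escape(marker))

t0 = time.perf_counter()
# m.start() is the byte offset at which each marker begins.
offsets = [m.start() for m in pattern.finditer(data)]
elapsed = time.perf_counter() - t0

print("found %d markers in %.4f s" % (len(offsets), elapsed))
```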

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco


