Regular expression to match a #

Devan L devlai at gmail.com
Thu Aug 11 18:27:32 EDT 2005


John Machin wrote:
> Aahz wrote:
> > In article <42fb45d7$1 at news.eftel.com>,
> > John Machin  <sjmachin at lexicon.net> wrote:
> >
> >>Search for r'^something' can never be better/faster than match for
> >>r'something', and with a dopey implementation of search [which Python's
> >>re is NOT] it could be much worse. So please don't tell newbies to
> >>search for r'^something'.
> >
> >
> > You're somehow getting mixed up in thinking that "^" is some kind of
> > "not" operator -- it's the start of line anchor in this context.
>
> I can't imagine where you got that idea from.
>
> If I change "[which Python's re is NOT]" to "[Python's re's search() is
> not dopey]", does that help you?
>
> The point was made in a context where the OP appeared to be reading a
> line at a time and parsing it, and re.compile(r'something').match()
> would do the job; re.compile(r'^something').search() will do the job too
> -- BECAUSE ^ means start of line anchor -- but somewhat redundantly, and
> very inefficiently in the failing case with dopey implementations of
> search() (which apply match() at offsets 0, 1, 2, .....).
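
To make the quoted point concrete, here is a minimal sketch of what such a "dopey" search() would look like. The name dopey_search and the attempt counter are hypothetical, added only for illustration; this is not how Python's re module is implemented. It relies on the documented behaviour that, when match() is given a start position, "^" still only matches at the real beginning of the string.

```python
import re

def dopey_search(pattern, text):
    """Hypothetical naive search: retry match() at offsets 0, 1, 2, ...

    Returns (match_or_None, number_of_match_attempts).
    """
    attempts = 0
    for offset in range(len(text) + 1):
        attempts += 1
        # With an explicit start offset, "^" still refers to the real start
        # of the string, so an anchored pattern fails at every offset > 0,
        # but each failure still costs one call.
        m = pattern.match(text, offset)
        if m is not None:
            return m, attempts
    return None, attempts

if __name__ == "__main__":
    anchored = re.compile(r"^something")
    m, attempts = dopey_search(anchored, "x" * 50)
    print(m, attempts)
```

In the failing case the naive loop makes one attempt per offset, whereas a single match() call makes exactly one attempt, which is the inefficiency being described.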

I don't see much difference between the two in practice:
Python 2.4.1 (#65, Mar 30 2005, 09:13:57) [MSC v.1310 32 bit (Intel)]
on win32
Type "copyright", "credits" or "license()" for more information.

IDLE 1.1.1
>>> import timeit
>>> t1 = timeit.Timer('re.search("^\w"," will not work")','import re')
>>> t1.timeit()
34.938577109660628
>>> t2 = timeit.Timer('re.match("\w"," will not work")','import re')
>>> t2.timeit()
31.381461330979164
>>> 3.0/1000000
3.0000000000000001e-006
>>> t1.timeit()
35.282282524734228
>>> t2.timeit()
31.403153752781463

A difference of roughly 3.5 to 4 seconds over a million calls (about 3 to 4
microseconds per call) seems trivial.
Then again, I haven't tested it for larger patterns and strings.
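
A rough way to try the longer-string case is sketched below. The input LINE, the helper name time_failing_case, and the repeat count are all assumptions chosen for illustration. The patterns are compiled up front so that the timing is not dominated by the module-level cache lookup, and the results will depend on the Python version and machine.

```python
import re
import timeit

# Hypothetical test input: a long line that does NOT start with a word
# character, so both calls hit the failing case.
LINE = " " + "x" * 10000

MATCH_RE = re.compile(r"\w")    # used with match(): implicitly anchored
SEARCH_RE = re.compile(r"^\w")  # used with search(): explicitly anchored

def time_failing_case(number=100000):
    """Return (match_seconds, search_seconds) for `number` failing calls."""
    t_match = timeit.timeit(lambda: MATCH_RE.match(LINE), number=number)
    t_search = timeit.timeit(lambda: SEARCH_RE.search(LINE), number=number)
    return t_match, t_search

if __name__ == "__main__":
    t_match, t_search = time_failing_case()
    print("match  r'\\w' : %.3f s" % t_match)
    print("search r'^\\w': %.3f s" % t_search)
```

If search() with "^" scaled with the length of the line, the second figure would grow as LINE gets longer while the first stays flat.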



