re module non-greedy matches broken

lothar lothar at ultimathule.nul
Mon Apr 4 12:20:05 EDT 2005


how then, do i specify a non-greedy regex
  <1st-pat><not-1st-pat>*?<follow-pat>

that is, such that non-greedy part <not-1st-pat>*?
excludes a match of <1st-pat>

in other words, how do i write regexes for my examples?
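
one way this is commonly written is to guard every character of the
non-greedy part with a negative lookahead, so that it can never step over
a new occurrence of <1st-pat>. the sketch below is only an illustration:
the strings "<b>" and "</b>" are hypothetical stand-ins for <1st-pat> and
<follow-pat>, not the patterns from my examples.

```python
import re

# Hypothetical stand-ins: "<b>" plays <1st-pat>, "</b>" plays <follow-pat>.
FIRST = r"<b>"
FOLLOW = r"</b>"

# Plain non-greedy form: nothing stops ".*?" from running across a second FIRST.
plain = re.compile(FIRST + r".*?" + FOLLOW)

# Guarded form: (?:(?!FIRST).)*? consumes a character only at positions
# where FIRST does not begin, so the match cannot contain another FIRST.
tempered = re.compile(FIRST + r"(?:(?!" + FIRST + r").)*?" + FOLLOW)

text = "<b>one <b>two</b>"
print(plain.search(text).group(0))     # <b>one <b>two</b>
print(tempered.search(text).group(0))  # <b>two</b>
```

the guarded pattern fails at the first "<b>" because it cannot reach
"</b>" without crossing the second "<b>", so the search moves on and
matches from the second one.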

what book or books on regexes or with a good section on regexes would you
recommend?
Hopcroft and Ullman?


"André Malo" <auch-ich-m at g-kein-spam.com> wrote in message
news:d2qnf5$a1b$1 at news.web.de...
> * "lothar" <lothar at ultimathule.nul> wrote:
>
> > this response is nothing but a description of the behavior i reported.
>
> Then you have not read my response carefully enough.
>
> > as to whether this behaviour was intended, one would have to ask the
> > module writer about that.
>
> No, I've responded with a view on regexes, not on the module. That is the
> way _regexes_ work. Non-greedy regexes do not match the minimal length at
> all, they are just ... non-greedy (technically the backtracking just
> stacks the longest instead of the shortest). They *may* match the shortest
> match, but that's a special case. Therefore I've stated that the
> documentation is incomplete.
>
> Actually your expectations go a bit beyond the documentation. From a
> certain point of view (matches always start at the leftmost possible
> position) the matches you're seeing *are* the minimal-length matches.
>
> > because of the statement in the documentation, which places
> > no qualification
>   ^^^^^^^^^^^^^^^^ that's the point.
>
> > on how the scan for the shortest possible match is to be done, my guess
> > is that this problem was overlooked.
>
> In the docs, yes. But buy yourself a regex book and learn for yourself ;-)
> The first thing you should learn about regexes is that the source of pain
> of most regex implementations is the documentation, which is very likely
> to be wrong.
>
> Finally let me ask a question:
>
> import re
> x = re.compile('<.*?>')
> print(x.search('<title>...</title><body>...</body>').group(0))
>
> What would you expect to be printed out? <title> or <body>? Why?
>
> nd
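
The quoted example can be run as a small sketch to see the rule described
above: the engine commits to the leftmost starting position first, and the
non-greedy quantifier only decides where the match ends from there. The
second pattern and its input string ("a.*?b" on "aaab") are an added
illustration, not part of the original exchange.

```python
import re

lazy = re.compile(r"<.*?>")
text = "<title>...</title><body>...</body>"

# search() returns the match that starts at the leftmost possible position.
first = lazy.search(text)
print(first.group(0), first.start())   # <title> 0

# findall() scans left to right and returns every non-overlapping match.
print(lazy.findall(text))   # ['<title>', '</title>', '<body>', '</body>']

# Laziness affects only the end of the match, not its start: the shortest
# substring matching a.*?b is "ab", but the match begins at the first "a".
m = re.search(r"a.*?b", "aaab")
print(m.group(0))   # aaab
```

Seen this way, "<title>" is printed because it starts furthest left, and
"aaab" is the shortest match available from its starting position even
though a shorter match exists further right.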






