[Python-Dev] Re: Discordance in documentation...

Terry Reedy tjreedy at udel.edu
Thu Sep 4 21:43:31 EDT 2003


"gminick" <gminick at hacker.pl> wrote in message
news:20030904200448.GA1164 at hannibal...
> ...or is this just me?
>
> Let's take a look, Reference Lib, 4.2.1 Regular Expression Syntax says:

It is the Library Reference, not Ref Lib.

>    "|"
>            A|B, where A and B can be arbitrary REs, creates a regular
>            expression that will match either A or B.
>            [...]
>            REs separated by "|" are tried from left to right, and the
>            first one that allows the complete pattern to match is considered
>            the accepted branch. This means that if A matches, B will never
>            be tested, even if it would produce a longer overall match. [...]

I think the following version of the last four lines is correct and
clearer.

As the target string is scanned, REs separated by "|" are tried from
left to right.  When one pattern completely matches, that branch is
accepted.  This means that once A matches, B will not be tested
further, even if it would produce a longer overall match.
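
A small sketch of that rule at a single starting position, written for
a current Python 3 interpreter.  The patterns and target strings are
made up for illustration and do not come from the Library Reference.

```python
import re

# At one starting position the alternatives are tried left to right.
# "ab" matches first, so the longer "abcd" branch is never used.
short_first = re.search("ab|abcd", "abcd").group()
print(short_first)

# With a trailing "$" the "ab" branch cannot complete the whole pattern,
# so the engine backtracks and accepts the next alternative instead.
anchored = re.search("(?:ab|abcd)$", "abcd").group()
print(anchored)
```

The second case is why the wording says a branch is accepted only when
it lets the complete pattern match, not merely when the branch itself
matches.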

> And now a little test:
>
> import re
> a = "Fuentes Rushdie Marquez"
> print re.search("Rushdie|Fuentes", a).group() # returns "Fuentes"
>
> According to the documentation I suspected it will return "Rushdie"
> rather than "Fuentes", but it looks like it returns first part of the
> string that matches rather than first part of regular expression.

As I hope my rewrite makes clearer, consideration of alternatives is
nested within scanning of the source, and not vice versa as you
inferred from the current doc.
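
The nesting can be modelled roughly as two loops.  This is only a toy
sketch for Python 3, not how the re engine is actually implemented;
search_alternatives is a hypothetical helper written for this
illustration.

```python
import re

def search_alternatives(alternatives, text):
    """Toy model of searching for A|B|...: positions outside, branches inside."""
    for start in range(len(text) + 1):      # outer loop: scan the target string
        for alt in alternatives:            # inner loop: try branches left to right
            m = re.compile(alt).match(text, start)
            if m:
                return m.group()            # first branch to match at this position wins
    return None

a = "Fuentes Rushdie Marquez"
modelled = search_alternatives(["Rushdie", "Fuentes"], a)
actual = re.search("Rushdie|Fuentes", a).group()
print(modelled, actual)
```

Both calls find "Fuentes": at position 0 the "Rushdie" branch fails
and the "Fuentes" branch succeeds, so the scan never advances to the
position where "Rushdie" would match.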

Doc bugs like this can be (and should be) reported on SourceForge just
like program bugs.
http://sourceforge.net/tracker/?group_id=5470
That way, the report stays on the open list until someone decides
either to make a fix or that no fix is needed.

Terry J. Reedy





