[Python-Dev] Missing arguments in RE functions

Noam Raphael noamr at myrealbox.com
Fri Sep 10 01:03:05 CEST 2004


I've read the objections. I understand being careful about extending an 
API, but I still think that there are things to improve, even when being 
conservative about the API.

I think that the straightforward functions should be taken seriously. 
The reason is that although you can write 
re.compile(pattern).match(...), re.match(pattern, ...) is shorter and 
just as clear - I think of the fact that REs are first compiled and then 
applied as an implementation issue, which lets you save time when 
applying the same RE many times. The documentation is with me - let me 
quote:

=====================
The sequence

prog = re.compile(pat)
result = prog.match(str)

is equivalent to

result = re.match(pat, str)

but the version using compile() is more efficient when the expression 
will be used several times in a single program.

=====================
findall(string)
    Identical to the findall() function, using the compiled pattern.
=====================

Not only are the straightforward functions not described as being 
"only there for trivial cases"; the methods of the compiled RE are 
presented as sometimes-more-efficient versions of the straightforward 
functions. This is why I didn't even know, until I did my research 
before sending my message to python-dev, that you could match from a 
given start position. I studied the page documenting the functions, 
because I didn't want to bother my students at an early stage with the 
fact that REs are first compiled and then applied, and I didn't find any 
mention of the start position option.
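
To make the gap concrete, here is a minimal sketch. The sample string and 
pattern are made up for illustration. It shows the start position that the 
compiled pattern's match() method accepts, next to the closest thing the 
module-level function offers, which is slicing the string first.

```python
import re

text = "key=value"          # illustrative input, not from the docs
prog = re.compile(r"\w+")

# The compiled pattern's match() takes an optional start position.
m = prog.match(text, 4)
print(m.group(), m.start())    # offsets refer to the original string

# The module-level match() has no position argument, so the nearest
# one-liner copies part of the string and matches against the copy.
m2 = re.match(r"\w+", text[4:])
print(m2.group(), m2.start())  # offsets now refer to the slice
```

The two are not fully interchangeable: with a slice, the reported offsets 
are relative to the copy, and "^" matches at the start of the copy rather 
than at the real beginning of the string.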

So, as I see it, there are two options.

The first one is to decide that the functions are a legitimate way of 
using REs in Python, and add the optional parameters that I added in my 
patch. In this way, anything you can do with the compiled pattern you 
could do using the functions. (I'm not much of an expert in REs, but I 
checked through the documentation and didn't find any functionality that 
would be missing from the functions after adding these parameters.)
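
As a rough sketch of what the first option means in practice, a function 
of this shape would cover the missing case. The name match_at and its 
parameter order are hypothetical, chosen only for illustration; this is 
not the code from the patch.

```python
import re

def match_at(pattern, string, flags=0, pos=0, endpos=None):
    """Hypothetical helper: a module-level style match() that also
    accepts the position arguments of the compiled pattern's method."""
    prog = re.compile(pattern, flags)
    if endpos is None:
        endpos = len(string)   # default: search up to the end of the string
    return prog.match(string, pos, endpos)

# Match letters starting at index 2, without slicing the string.
print(match_at(r"[a-z]+", "12abc", pos=2).group())
```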

The second option is to decide that the functions are only a shortcut, 
meant for use in trivial cases. In that case, two things should be done, 
IMHO: The main thing is to update the documentation, to make that clear. 
It means at least adding a prominent note in the "module contents" page, 
stating something like "these functions are here only as shortcuts; to 
access the full functionality, use compiled patterns". I think that in 
this case, the documentation should be further updated, by changing all 
the function explanations to something like "equivalent to 
re.compile(pattern, flags).match(string)", instead of the detailed 
explanations now given. The second thing, which should be done even if 
the functions are considered shortcuts, is to add the "flags" parameter to 
the findall() and finditer() functions - I really can't see any reason 
why the search() and match() functions should have that parameter and 
findall() and finditer() shouldn't - they all get two arguments, pattern 
and string. Why should the optional parameter be available only for the 
older functions?
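
For illustration, these are the two routes that work when the findall() 
function takes no "flags" argument: compiling the pattern with the flag, 
or embedding the flag in the pattern itself. The sample text is invented.

```python
import re

text = "Spam spam SPAM eggs"   # illustrative input

# Route 1: pass the flag to compile() and use the pattern's method.
hits = re.compile(r"spam", re.IGNORECASE).findall(text)

# Route 2: embed the flag in the pattern with the (?i) inline syntax.
hits_inline = re.findall(r"(?i)spam", text)

print(hits)
print(hits_inline)
```

Both work, but neither is as direct as passing the flag the way search() 
and match() already allow.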

And a final note: the parameters for start and end positions are already 
available in the findall() and finditer() methods. Should this be left 
an undocumented feature? It seems to me perfectly legitimate to search 
for all the matches of a specific RE in a substring without actually 
copying all the characters of the substring to another string.
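
A short sketch of that use, with a made-up string and pattern: the 
compiled pattern's findall() and finditer() methods are given start and 
end positions, so only part of the string is searched and no substring is 
created.

```python
import re

text = "a1 b2 c3 d4"            # illustrative input
prog = re.compile(r"[a-z]\d")

# Search only the region text[3:8] ("b2 c3") without building the slice.
inside = prog.findall(text, 3, 8)

# finditer() takes the same positions; spans index the original string.
spans = [m.span() for m in prog.finditer(text, 3, 8)]

print(inside)
print(spans)
```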

Noam

(P.S. Can you please add me to the CC of your replies? It would make it 
easier for me to reply, since I'm not a member of python-dev.)

