Basic Python V3 Search Tool using RE module

Steven D'Aprano steve+comp.lang.python at pearwood.info
Thu Mar 26 19:01:38 EDT 2015


On Fri, 27 Mar 2015 04:11 am, Gregg Dotoli wrote:

>> Thanks for your help and patience. I'm new with Python.

No problems! If you hang around here, pay attention to the constructive
criticism you are given, and ignore the troll over on the "test1" thread,
you'll learn a lot.

Let's look at your code:


>> import os
>> import re
>> # From the Root
>> topdir = "."

Typically, "." is not considered the root, so that comment may be a bit
misleading. On Linux systems, "/" is the root. On Windows, each drive has
its own root, e.g. "C:/". You should consider a more descriptive comment.


>> # Regex Pattern
>> pattern="DECRYPT_I"
>> regexp=re.compile(pattern)

As given, using a regex to search for a fixed substring is rather like
firing up a nuclear-powered bulldozer to crack open a peanut. I will assume
that later you will add more complicated regexes with wildcards. If not,
you are literally wasting time here: substring matching with the "in"
operator will be significantly faster than matching using a regex.
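
A minimal sketch of the two approaches side by side, in case you want to
measure the difference yourself. The helper names (found_with_in,
found_with_regex, compare_timings) are mine, made up for illustration, and
the actual timings depend on your machine and your data, so I make no
promises about the numbers you will see.

```python
import re
import timeit

pattern = "DECRYPT_I"
# re.escape is a no-op for this pattern, but it keeps the comparison fair
# if the fixed string ever contains regex metacharacters.
regexp = re.compile(re.escape(pattern))

def found_with_in(text):
    """Fixed-substring test using the "in" operator."""
    return pattern in text

def found_with_regex(text):
    """The same test using the compiled regex."""
    return regexp.search(text) is not None

def compare_timings(text, number=100000):
    """Return (seconds for "in", seconds for regex) over `number` calls."""
    t_in = timeit.timeit(lambda: found_with_in(text), number=number)
    t_re = timeit.timeit(lambda: found_with_regex(text), number=number)
    return t_in, t_re
```

Both functions give the same answer for a fixed string; call
compare_timings("xxxxxDECRYPT_Ixxxxx") to see how they compare on your
system.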


>> for dirpath,dirnames, files in os.walk(topdir):
>>     for name in files:
>>             result=regexp.search(name)
>>             print(os.path.join(dirpath,name))
>>             print (result)


All this does is check whether the string "DECRYPT_I" is in the file name.


> I posted this because I thought it may be of help to others. This does
> grep through all the files 

It absolutely does not.


> and is very fast because the regex is compiled 
> in Python , rather than sitting in some directory as an external command.
> That is where the optimization comes in.

Please take this with the intention I give it: constructive advice.

    "More computing sins are committed in the name of efficiency 
    (without necessarily achieving it) than for any other single
    reason — including blind stupidity." — W.A. Wulf

This is a great example. You have been too focused on optimizing your code
and not focused enough on getting it to actually work correctly. It's fast,
*not* because "the regex is compiled in Python", but because it doesn't do
the work you think it does.

If you had tested this code, by creating a file called "FOUND IT" containing
the string "xxxxxDECRYPT_Ixxxxx" (for example), you would have discovered
for yourself that your search tool does not in fact search correctly.
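
For comparison, here is a rough sketch of what searching the *contents*
might look like. It is one possible approach, not the only one: the
function names file_matches and grep_tree are hypothetical, it assumes the
files can be read as text in your default encoding (undecodable bytes are
replaced rather than raising an error), and it silently skips files it
cannot open.

```python
import os
import re

def file_matches(path, regexp):
    """Return True if any line of the file at `path` matches `regexp`."""
    try:
        # errors="replace" avoids UnicodeDecodeError on binary-ish files.
        with open(path, errors="replace") as f:
            return any(regexp.search(line) for line in f)
    except OSError:
        # Unreadable file (permissions, broken symlink, etc.): skip it.
        return False

def grep_tree(topdir, pattern):
    """Yield the path of every file under `topdir` whose contents match."""
    regexp = re.compile(pattern)
    for dirpath, dirnames, files in os.walk(topdir):
        for name in files:
            path = os.path.join(dirpath, name)
            if file_matches(path, regexp):
                yield path
```

Usage would be something like:

    for path in grep_tree(".", "DECRYPT_I"):
        print(path)

Note that this reads each file line by line, so a pattern that spans a line
break will not be found.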

Write your code first. Get it working. Make sure it is working. Then, and
only then, should you try to optimize it. Test your code: if you haven't
tested it, you don't know if it works or not.

Testing means checking that it works the way it needs to, with files
containing the string *as well as* files not containing it. It's trivial to
check that the program doesn't find DECRYPT files when there are no DECRYPT
files, but if it fails to find them when they are actually there, that's a
pretty big bug.
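
A sketch of such a test, under a few assumptions of my own: the search tool
is wrapped in a function taking (topdir, pattern) and returning the matching
paths, and the names search_by_name and check_search_tool are invented for
this example. search_by_name mirrors the posted loop, matching against file
names only, so you can see the check catch the problem.

```python
import os
import re
import tempfile

def search_by_name(topdir, pattern):
    """Mirror of the posted loop: matches the pattern against file *names*."""
    regexp = re.compile(pattern)
    return [os.path.join(dirpath, name)
            for dirpath, dirnames, files in os.walk(topdir)
            for name in files
            if regexp.search(name)]

def check_search_tool(search, pattern="DECRYPT_I"):
    """Return a list of problems found; an empty list means both cases pass."""
    problems = []
    with tempfile.TemporaryDirectory() as topdir:
        hit = os.path.join(topdir, "FOUND IT")
        miss = os.path.join(topdir, "nothing here")
        # One file that contains the string, one that does not.
        with open(hit, "w") as f:
            f.write("xxxxxDECRYPT_Ixxxxx\n")
        with open(miss, "w") as f:
            f.write("just some ordinary text\n")
        found = set(search(topdir, pattern))
        if hit not in found:
            problems.append("missed a file that contains the string")
        if miss in found:
            problems.append("reported a file that does not contain the string")
    return problems
```

Calling check_search_tool(search_by_name) reports the missed file, because
neither file *name* contains the pattern. A tool that really searches file
contents should come back with an empty list.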


And one more quote:

    "The First Rule of Program Optimization: Don't do it. 
    The Second Rule of Program Optimization (for experts only!): 
    Don't do it yet." — Michael A. Jackson

(No, not Michael Jackson the dead pop singer.)



-- 
Steven



