Searching through more than one file.

Cem Karan cfkaran2 at gmail.com
Mon Dec 29 08:35:33 EST 2014


On Dec 29, 2014, at 2:47 AM, Rick Johnson <rantingrickjohnson at gmail.com> wrote:

> On Sunday, December 28, 2014 11:29:48 AM UTC-6, Seymore4Head wrote:
>> I need to search through a directory of text files for a string.
>> Here is a short program I made in the past to search through a single
>> text file for a line of text.
> 
> Step1: Search through a single file. 
> # Just a few more brush strokes...
> 
> Step2: Search through all files in a directory. 
> # Time to go exploring! 
> 
> Step3: Option to filter by file extension. 
> # Waste not, want not!
> 
> Step4: Option for recursing down sub-directories. 
> # Look out deeply nested structures, here i come!
> # Look out deeply nested structures, here i come!
> # Look out deeply nested structures, here i come!
> # Look out deeply nested structures, here i come!
> # Look out deeply nested structures, here i come!
> # Look out deeply nested structures, here i come!
> # Look out deeply nested structures, here i come!
> [Opps, fell into a recursive black hole!]
> # Look out deeply nested structures, here i come!
> # Look out deeply nested structures, here i come!
> # Look out deeply nested structures, here i come!
> # Look out deeply nested structures, here i come!
> [BREAK]
> # Whew, no worries, MaximumRecursionError is my best friend! 
> 
> ;-)
> 
> In addition to the other advice, you might want to check out os.walk()

DEFINITELY use os.walk() if you're going to recurse through a directory tree.  Here is an untested program I wrote that should do what you want.  Modify as needed:

"""
# This is all Python 3 code, although I believe it will run under Python 2
# as well.  

# os.path is documented at https://docs.python.org/3/library/os.path.html
# os.walk is documented at https://docs.python.org/3/library/os.html#os.walk
# logging is documented at https://docs.python.org/3/library/logging.html

import os
import os.path
import logging

# Logging messages can be filtered by level.  If you set the level really
# low, then low-level messages, and all higher-level messages, will be
# logged.  However, if you set the filtering level higher, then low-level
# messages will not be logged.  Debug messages are lower than info messages,
# so if you comment out the first line, and uncomment the second, you will
# only get info messages (right now you're getting both).  If you look
# through the code, you'll see that I go up in levels as I work my way 
# inward through the filters; this makes debugging really, really easy.
# I'll start out with my level high, and if my code works, I'm done. 
# However, if there is a bug, I'll work my way downwards towards lower and
# lower debug levels, which gives me more and more information.  Eventually
# I'll hit a level where I know enough about what is going on that I can 
# fix the problem.  By the way, if you comment out both lines, the default
# level (WARNING) applies, so none of the debug or info messages in this
# program will be shown.
logging.basicConfig(level=logging.DEBUG)
##logging.basicConfig(level=logging.INFO)

EXTENSIONS = {".txt"}

def do_something_useful(real_path):
    # I deleted the original message, so I have no idea 
    # what you were trying to accomplish, so I'm punting 
    # the definition of this function back to you.
    pass

for root, dirs, files in os.walk('/'):
    for f in files:
        # This expands symbolic links, cleans up double slashes, etc.
        # This can be useful when you're trying to debug why something
        # isn't working via logging.
        real_path = os.path.realpath(os.path.join(root, f))
        logging.debug("operating on path '{0!s}'".format(real_path))
        (r, e) = os.path.splitext(real_path)
        if e in EXTENSIONS:
            # If we've made a mistake in our EXTENSIONS set, we might never
            # reach this point.  
            logging.info("Selected path '{0!s}'".format(real_path))
            do_something_useful(real_path)
"""

As a note, for the sake of speed and your own sanity, you probably want to do the easiest/computationally cheapest filtering first here.  That means selecting the files that match your extensions first, and then filtering those files by their contents second.
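
To make the ordering concrete, here is a rough sketch of the two-stage filter.  It is Python 3 only (because of the errors= argument to open()), and the names file_contains and find_matches are ones I made up for illustration; they aren't from the program above or from the standard library.  It also assumes a plain substring match is what you want.

```python
import os

EXTENSIONS = {".txt"}  # same idea as the EXTENSIONS set in the program above


def file_contains(path, needle):
    """Return True if any line of the text file at `path` contains `needle`."""
    try:
        # errors="replace" keeps odd bytes from raising UnicodeDecodeError.
        with open(path, "r", errors="replace") as handle:
            return any(needle in line for line in handle)
    except OSError:
        # Unreadable files (permissions, broken links) are simply skipped.
        return False


def find_matches(top, needle, extensions=EXTENSIONS):
    """Yield paths under `top` that pass the extension test, then the content test."""
    for root, dirs, files in os.walk(top):
        for name in files:
            # Cheap test first: a string comparison on the file name.
            if os.path.splitext(name)[1] not in extensions:
                continue
            path = os.path.join(root, name)
            # Expensive test second: actually open and read the file.
            if file_contains(path, needle):
                yield path
```

Something like `for path in find_matches("some/dir", "text to find"): print(path)` would then list the matching files.  Because the extension check runs before open(), files with other extensions are never read at all.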

Finally, if you are planning on parsing command-line options, DON'T do it by hand!  Use argparse (https://docs.python.org/3/library/argparse.html) instead.
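
As a starting point, a minimal argparse setup for this kind of search tool might look like the sketch below.  The option names (needle, top, -e/--ext, -v/--verbose) and the helper names build_parser and parse_args are my own guesses at what you'd want, not anything from your original program, so rename them freely.

```python
import argparse


def build_parser():
    """Build a parser for a hypothetical 'search files for a string' script."""
    parser = argparse.ArgumentParser(
        description="Search text files under a directory for a string.")
    parser.add_argument("needle", help="string to search for")
    parser.add_argument("top", nargs="?", default=".",
                        help="directory to search (default: current directory)")
    parser.add_argument("-e", "--ext", action="append", default=None,
                        help="file extension to include, e.g. .txt; may be repeated")
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="log debug messages")
    return parser


def parse_args(argv=None):
    """Parse `argv` (or sys.argv[1:] when None) and normalize the extensions."""
    args = build_parser().parse_args(argv)
    # Fall back to .txt when no -e option was given.
    args.ext = set(args.ext) if args.ext else {".txt"}
    return args
```

Calling `parse_args()` with no arguments reads the real command line, and you get `--help` output and error messages for free.  The resulting `args.ext` set can be used in place of a hard-coded EXTENSIONS set, and `args.verbose` can pick between logging.DEBUG and logging.INFO.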

Thanks,
Cem Karan



