Organizing function calls once files have been moved to a directory

Steven D'Aprano steve at pearwood.info
Tue Jun 23 22:18:28 EDT 2015


On Wed, 24 Jun 2015 06:16 am, kbtyo wrote:

> I am working on a workflow module that will allow one to recursively check
> for file extensions and if there is a match move them to a folder for
> processing (parsing, data wrangling etc).
> 
> I have a simple search process, and log for the files that are present
> (see below). However, I am puzzled by what the most efficient
> method/syntax is to call functions once the selected files have been
> moved? 

The most efficient syntax is the regular syntax that you always use when
calling a function:

    function(arg, another_arg)


What else would you use?
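To make that concrete, here is a minimal sketch. `process()` is a hypothetical stand-in for one of your own functions, and `move_and_process()` is a name I made up. `shutil.move` returns the file's new path, so you can pass that path straight to your function:

```python
import os
import shutil
import tempfile


def process(path):
    # Hypothetical stand-in for your own parsing/processing function.
    with open(path) as f:
        return len(f.read())


def move_and_process(src, dest_dir):
    # shutil.move returns the path the file ended up at.
    new_path = shutil.move(src, dest_dir)
    # No special syntax needed: just call the function on the new path.
    return process(new_path)


if __name__ == '__main__':
    # Demo using throwaway temporary directories.
    src_dir, dest_dir = tempfile.mkdtemp(), tempfile.mkdtemp()
    src = os.path.join(src_dir, 'report.txt')
    with open(src, 'w') as f:
        f.write('hello')
    print(move_and_process(src, dest_dir))  # prints 5
```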


> I have the functions and classes written in another file. Should I 
> import them or should I include them in the same file as the following
> mini-script?

That's entirely up to you. Some factors you might consider:

- Are these functions and classes reusable by other code? Then you might
want to keep them separate in another file, treated as a library, and
import the library into your application.

- If you merge the two files together, will it be so big that it is
difficult to work with? Then don't merge them together. My opinion is that
the decimal module from the standard library is about as big as a single
module should ever be, and it is almost 6,500 lines. So if your
application is bigger than that, you might want to split it.



> Moreover, should I create another log file for processing? If so, what is
> an idiomatically correct method to do so?

I don't know. Do you want a second log file? How will it be different from
the first?

As for creating another log file, I guess the most correct way to do so
would be the same way you created the first log file.
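I'm guessing here, since the part of your code that sets up the log isn't shown, but if the first log uses the standard logging module, a second log is just a second named logger with its own FileHandler. A minimal sketch: `make_logger()` and the names `search`, `processing` and `processing.log` are hypothetical; `file_search.log` is the name from your script:

```python
import logging
import os
import tempfile


def make_logger(name, filename):
    # Each named logger gets its own FileHandler, so its records go to
    # its own file. Call this once per name: calling it again with the
    # same name would add a second handler and duplicate every line.
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    handler = logging.FileHandler(filename)
    handler.setFormatter(
        logging.Formatter('%(asctime)s %(levelname)s %(message)s'))
    logger.addHandler(handler)
    # Don't also pass records up to the root logger.
    logger.propagate = False
    return logger


if __name__ == '__main__':
    logdir = tempfile.mkdtemp()
    search_log = make_logger('search', os.path.join(logdir, 'file_search.log'))
    process_log = make_logger('processing', os.path.join(logdir, 'processing.log'))
    search_log.info('found report.pdf')
    process_log.info('parsed report.pdf')
```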

I'm not sure I actually understand your questions so far.

Some further comments on your code:

> if __name__ == '__main__':
> 
>     # The top argument for name in files
>     topdir = '.'
>     dest = 'C:\\Users\\wynsa2\\Desktop\\'

Rather than escaping backslashes, you can use regular forward slashes:

    dest = 'C:/Users/wynsa2/Desktop/'


Windows will accept either.
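If you'd rather keep backslashes, a raw string avoids doubling them, with one catch: a raw string can't end in a single backslash, so drop the trailing one. As a quick check, the standard pathlib.PureWindowsPath class (which works on any OS) treats all three spellings as the same directory:

```python
from pathlib import PureWindowsPath

escaped = 'C:\\Users\\wynsa2\\Desktop\\'
forward = 'C:/Users/wynsa2/Desktop/'
raw = r'C:\Users\wynsa2\Desktop'  # r'...Desktop\' would be a SyntaxError

# PureWindowsPath accepts either separator and drops the trailing one.
paths = {PureWindowsPath(p) for p in (escaped, forward, raw)}
print(len(paths))  # 1: all three spellings are the same path

# Joining with / gives a native Windows path string.
print(PureWindowsPath(forward) / 'report.pdf')  # C:\Users\wynsa2\Desktop\report.pdf
```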


>     extens = ['docs', 'docx', 'pdf'] # the extensions to search for
>     found = {x: [] for x in extens} # lists of found files
>  
>     # Directories to ignore
>     ignore = ['docs', 'doc', 'py', 'pdf']
>     logname = "file_search.log"
>     print('Beginning search for files in %s' % os.path.realpath(topdir))
>   
>     # Walk the tree
>     for dirpath, dirnames, files in os.walk(topdir):
>         # Remove directories in ignore
>         # directory names must match exactly!
>         for idir in ignore:
>             if idir in dirnames:
>                 dirnames.remove(idir)
>      
>         # Loop through the file names for the current step
>         for name in files:
>             # Calling str.rsplit on name splits the string into a list
>             # (from the right), with the first argument "." as the
>             # delimiter, making only as many splits as the second
>             # argument (1). The [-1] retrieves the last element of the
>             # list; we use this instead of an index of 1 because if no
>             # splits are made (if there is no "." in name), no
>             # IndexError will be raised.
>             ext = name.lower().rsplit('.', 1)[-1]

The better way to split the extension from the file name is to use
os.path.splitext(name):


py> import os
py> os.path.splitext("this/file.txt")
('this/file', '.txt')
py> os.path.splitext("this/file")  # no extension
('this/file', '')
py> os.path.splitext("this/file.tar.gz")
('this/file.tar', '.gz')
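Note that splitext keeps the leading dot, so you compare against '.pdf' rather than 'pdf'. It also avoids a subtle problem with the rsplit version: for a name with no dot at all, such as a file called `pdf`, `name.rsplit('.', 1)[-1]` returns the whole name, which would then count as a match. Here's a sketch of your search loop rewritten around splitext. `find_files` is a name I've made up, and its `extens` and `ignore` arguments play the same roles as the lists in your script:

```python
import os
import tempfile


def find_files(topdir, extens, ignore):
    # os.path.splitext returns the extension with its leading dot and
    # original case, so build the comparison set the same way.
    wanted = {'.' + e.lower() for e in extens}
    found = {e.lower(): [] for e in extens}  # lists of found files
    for dirpath, dirnames, files in os.walk(topdir):
        # Prune ignored directories in place; os.walk (top-down by default)
        # then skips them. Names must match exactly, as in your version.
        dirnames[:] = [d for d in dirnames if d not in ignore]
        for name in files:
            ext = os.path.splitext(name)[1].lower()
            if ext in wanted:
                found[ext[1:]].append(os.path.join(dirpath, name))
    return found


if __name__ == '__main__':
    # Build a tiny throwaway tree to search.
    top = tempfile.mkdtemp()
    os.makedirs(os.path.join(top, 'sub'))
    os.makedirs(os.path.join(top, 'pdf'))  # matches an ignored name
    for parts in (('a.PDF',), ('sub', 'b.docx'), ('pdf', 'c.pdf'), ('notes.txt',)):
        open(os.path.join(top, *parts), 'w').close()
    print(find_files(top, ['doc', 'docx', 'pdf'], ['doc', 'py', 'pdf']))
```

Assigning to dirnames[:] rather than rebinding the name matters: os.walk only sees the pruning if the same list object is modified, which is also why your dirnames.remove() loop works.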


-- 
Steven



