Script Optimization

Mon May 5 01:04:53 EDT 2008

On Sun, 04 May 2008 17:01:15 -0300, lev <levlozhkin at gmail.com> wrote:

>> * Change indentation from 8 spaces to 4
>     I like using tabs because of the text editor I use, the script at
> the end is with 4 though.

Can't you configure it to use 4 spaces per indent - and not use "hard" tabs?

>> * Remove useless "pass" and "return" lines
>     I replaced the return nothing lines with passes, but I like
> keeping them in case the indentation is ever lost - makes it easy to
> go back to original indentation

I can't think of a case when only indentation "is lost" - if you have a crash or something, normally you lose much more than indentation... Simple backups or an SCM system like cvs/svn will help. So I don't see the usefulness of those "pass" statements; I think that after some time using Python you'll consider them just garbage, like everyone else does.

>> * Temporarily change broken "chdir" line
>     removed as many instances of chdir as possible (a few useless ones
> to accomodate the functions - changed functions to not chdir as much),
> that line seems to work... I made it in case the script is launched
> with say: 'python somedir\someotherdir\script.py' rather than 'python
> script.py', because I need it to work in it's own and parent
> directory.

You can determine the directory where the script resides using

import os
basedir = os.path.dirname(os.path.abspath(__file__))

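This way it doesn't matter how it was launched. But execute the above code as soon as possible (before any chdir): __file__ may hold a relative path, and abspath() resolves it against the current directory at the moment it is called.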
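
Something like this could replace most of the remaining chdir calls. It is only a sketch: script_dir and data_path are names I made up, and 'checksums.txt' is a placeholder, not a file from your script.

```python
import os


def script_dir(script_path):
    # Absolute directory containing the given script path.
    # abspath() resolves a relative path against the current directory,
    # so call this before anything changes the working directory.
    return os.path.dirname(os.path.abspath(script_path))


def data_path(script_path, *parts):
    # Build a path relative to the script's own directory (hypothetical helper).
    return os.path.join(script_dir(script_path), *parts)


# Inside the real script you would pass __file__, for example:
#     basedir = script_dir(__file__)
#     checksums_path = data_path(__file__, 'checksums.txt')
```

With absolute paths built this way, the functions can open files directly and it no longer matters which directory is current.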

>     checksums = open(checksums, 'r')
>     for fline in checksums.readlines():

You can directly iterate over the file:

     for fline in checksums:

(readlines() reads the whole file contents into memory; I guess this is not an issue here, but in other cases it may be an important difference)

Although it's perfectly valid, I would not recommend using the same name for two different things (checksums refers to the file name *and* the file itself)
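
A rough sketch of both points together, with one name for the path and another for the file object. I don't know the exact layout of your checksums file, so I'm assuming one "checksum filename" pair per line; read_checksums is a made-up name.

```python
def read_checksums(checksums_path):
    # Assumed line format (not taken from the original script):
    #     <checksum><whitespace><file name>
    entries = {}
    checksums_file = open(checksums_path, 'r')
    try:
        # Iterating over the file object yields one line at a time,
        # without loading the whole file as readlines() does.
        for fline in checksums_file:
            fline = fline.strip()
            if not fline:
                continue  # skip blank lines
            checksum, file_name = fline.split(None, 1)
            entries[file_name] = checksum
    finally:
        checksums_file.close()
    return entries
```

The try/finally just makes sure the file gets closed even if a line cannot be parsed.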

>     changed_files_keys = changed_files.keys()
>     changed_files_keys.sort()
>     missing_files.sort()
>     print '\n'
>     if len(changed_files) != 0:
>         print 'File(s) changed:'
>         for key in changed_files_keys:

You don't have to copy the keys and sort; use the sorted() builtin:

     for key in sorted(changed_files.iterkeys()):

Also, "if len(changed_files) != 0" is usually written as:

     if changed_files:

The same for missing_files.
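
Putting those two suggestions together, the reporting part might look roughly like this. It is a sketch, not your code: report_lines is a name I invented, the 'File(s) missing:' heading is a guess, and I collect the lines in a list instead of printing so the example is easy to check. Iterating over a dict yields its keys, so sorted(changed_files) gives the same result as the iterkeys() version.

```python
def report_lines(changed_files, missing_files):
    # changed_files: dict keyed by file name; missing_files: list of names.
    lines = []
    if changed_files:  # an empty dict is false
        lines.append('File(s) changed:')
        for key in sorted(changed_files):  # sorted copy of the keys
            lines.append('\t' + key)
    if missing_files:  # an empty list is false
        lines.append('File(s) missing:')
        for missing_file in sorted(missing_files):
            lines.append('\t' + missing_file)
    return lines
```

Note that sorted() returns a new list and leaves missing_files itself untouched, unlike missing_files.sort().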

>         for x in range(len(missing_files)):
>             print '\t', missing_files[x]

The construct range(len(somelist)) is rarely needed. Either you don't need the index, and write:

for missing_file in missing_files:
     print '\t', missing_file

Or you want the index too, and write:

for i, missing_file in enumerate(missing_files):
     print '%2d: %s' % (i, missing_file)

> def calculate_checksum(file_name):
>     file_to_check = open(file_name, 'rb')
>     chunk = 8196

Any reason to use such a number? 8K is 8192, not 8196; you could write 8*1024 if you don't remember the value. I usually write 1024*1024 when I want exactly 1M.
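
For what it's worth, a chunked version could look like the sketch below. The quoted part of your function doesn't show which hash you use, so md5 from hashlib (Python 2.5 and later) is purely my assumption, as is the CHUNK_SIZE name.

```python
import hashlib

CHUNK_SIZE = 8 * 1024  # 8K, spelled out so the value is obviously right


def calculate_checksum(file_name, chunk_size=CHUNK_SIZE):
    # md5 is an assumption; swap in whichever hash the script really uses.
    digest = hashlib.md5()
    file_to_check = open(file_name, 'rb')
    try:
        while True:
            chunk = file_to_check.read(chunk_size)
            if not chunk:  # empty result means end of file
                break
            digest.update(chunk)
    finally:
        file_to_check.close()
    return digest.hexdigest()
```

Reading in fixed-size chunks keeps memory use constant however large the file is; the chunk size only affects speed, not the resulting checksum.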

-- 
Gabriel Genellina