file_name_fixer.py

bruno at modulix onurb at xiludom.gro
Tue Jan 24 08:50:10 EST 2006


ewaguespack at gmail.com wrote:
> i put this together to fix a bunch of files with wierd names, please
> gimme feedback, i am a newbie

Ok, so let's go... Hope you won't hate me too much !-)

> 
> #!/usr/bin/env python
> import os
> import sys
> import string
> import platform
> dir = sys.argv[1]
This will shadow the builtin 'dir' function.

> noworky = sys.argv[2]

If the user fails to provide args, the program will crash with an
IndexError - which may not be very helpful.

Also, a better scheme would be
  myprog [-opt1 [-opt2=val [-optN]]] arg1 arg2 argN

Hint: the optparse module is your friend
http://www.python.org/doc/2.4.2/lib/module-optparse.html

> if platform.system() == 'Linux':
>     uglychars = ''.join( set(string.punctuation+' ') - set('/_.') )
> else:
>     if platform.system() == 'Windows':#this is broken because windows
> is gay with case
>         uglychars = ''.join( set(string.punctuation+' ') -
> set(':\\/_.') )
>     else:
>         print "wtf... what platform is this anyway?"

May be MacOS Classic, MacOS X or any *n*x or *BSD variant, or any other
platform supporting Python - are there are some...

> underscore = '_'
> underscore = underscore * len(uglychars)

You don't need the intermediate value:
  underscores = '_' * len(uglychars)

> chars = string.maketrans(uglychars, underscore)

> print "# PHASE I, DIRECTORIES"

Why not processing dirs and files in one pass ?

> for path, subdirs, files in os.walk(dir, topdown=True):

Err... is the 'dir' argument supposed to be an absolute path or a
relative path ?

And why using the topdown option ?

>     oldname = path

woops ! this may be the absolute path. Are you sure you want to process
an absolute path ?

I think you'd better process files and dirs in one path, walking bottom
up (so you don't process any file twice).

>     newname = oldname.translate(chars)
>     newname = string.lower(newname)

Use the 'str' object methods instead of functions from the string module:

     newname = newname.lower()

You can also chain method/function calls:
     newname = oldname.translate(chars).lower()


>     while string.count(newname, "__") > 0:

in the context of a boolean expression, 0 evaluate to False, non-zero to
True. So you don't have to be so explicit:
      while newname.count('__'):

>         newname = string.replace(newname,"__","_")

You don't need to actually *count* the occurrences of  '__' - if there's
one, that's enough:
     while '__' in newname:
         # proceed

Also, a regexp may be more effective here.

>     while string.count(newname, "..") > 0:
>         newname = string.replace(newname,"..",".")

Don't forget the '..' and '.' special directories in unix filesystems...

>     if oldname != newname:
>         if os.path.isfile(newname) or os.path.isdir(newname):

And if there's a special file (device etc) ?
hint : os.path.exists()


>             print oldname, "-->\n", newname, "\t\t\tERROR: file/dir
> exists\n"

stdout is for 'normal' program outputs (ie: outputs that may be used as
inputs to another program). This kind of output should go to stderr:
           print >> sys.stdout, "%s --> %s : \t\t\tERROR: "
                                "file/dir exists" % (oldname,
                                                     newname,)

>         else:
>             print oldname, "-->\n", newname, "\t\t\tYAY: file
> renamed\n"
>             if noworky == "doit":
>                 os.renames(oldname, newname)

How is the user supposed to know that he has to pass the string "doit"
as a second arg ?

There are some (more or less agreed upon) conventions about cli options.
Like '--dry-run' to express the fact that the program shouldn't actually
do more than simulate it's execution.

> print "# PHASE II, FILES"
> for path, subdirs, files in os.walk(dir, topdown=True):
>     for oldname in files:
>         oldname = os.path.join(path, oldname)

Are you *sure* you want to operate on the *whole* *absolute* path ?
(cf my thoughts about the "first phase" and the whole algorithm)

>         newname = oldname.translate(chars)
>         newname = string.lower(newname)

Aren't you repeating some code here ?
hint : all duplicated code should be factored out into a function.

>         newname = string.replace(newname,".mpeg",".mpg")
>         newname = string.replace(newname,".ram",".rm")
>         newname = string.replace(newname,".jpeg",".jpg")
>         newname = string.replace(newname,".qt",".mov")

# outside the loop, define a dict like:
ext_map = {'mpeg': 'mpg',
           'ram' : 'rm',
           'jpeg' : 'jpg',
           # etc
          }

# then in the loop:
          base, ext = os.path.split(newname)
          if ext in ext_map:
             newname = "%s.%s" % (base, ext_map[ext])


>         while string.count(newname, "__") > 0:
>             newname = string.replace(newname,"__","_")

duplicated code...

>         while string.count(newname, "..") > 0:
>             newname = string.replace(newname,"..",".")
>         newname = string.replace(newname,"._","_")
>         newname = string.replace(newname,"_.",".")

all this is a perfect usecase for regexps...

>         if oldname != newname:
>             if os.path.isfile(newname) or os.path.isdir(newname):
>                 print oldname, "-->\n", newname, "\t\t\tERROR: file/dir
> exists\n"
>             else:
>                 print oldname, "-->\n", newname, "\t\t\tYAY: file
> renamed\n"
>                 if noworky == "doit":
>                     os.renames(oldname, newname)

duplicated code.


Still alive ?-)

I thing the first step would be to correct your algorithm, then use some
more efficient or idiomatic constructs where possible (like using dicts
instead of repeated tests, sending messages to stderr etc).

Then here are some more hints:

- put the processing algorithm in a dedicated function, if possible one
that doesn't rely on any global variable,
- factor out any duplicated code into helper functions

 - then add a 'main' function that take cares of options and call the
'processing' function if everything's ok. The main function should
return 0 if ok, non-zero if errors (usually 2 for invalid cli options or
args, 1 for other errors)

- and finally add this at the end of your module:

if __name__ == '__main__':
  sys.exit(main(sys.argv))


This will allow your script to be used either as a program or as a module.


-- 
bruno desthuilliers
python -c "print '@'.join(['.'.join([w[::-1] for w in p.split('.')]) for
p in 'onurb at xiludom.gro'.split('@')])"



More information about the Python-list mailing list