file_name_fixer.py

Tue Jan 24 11:26:34 EST 2006

thanks for the feedback!

I'll work on your suggestions.

bruno at modulix wrote:
> ewaguespack at gmail.com wrote:
> > i put this together to fix a bunch of files with wierd names, please
> > gimme feedback, i am a newbie
>
> Ok, so let's go... Hope you won't hate me too much !-)
>
> >
> > #!/usr/bin/env python
> > import os
> > import sys
> > import string
> > import platform
> > dir = sys.argv[1]
> This will shadow the builtin 'dir' function.
>
> > noworky = sys.argv[2]
>
> If the user fails to provide args, the program will crash with an
> IndexError - which may not be very helpful.
>
> Also, a better scheme would be
>   myprog [-opt1 [-opt2=val [-optN]]] arg1 arg2 argN
>
> Hint: the optparse module is your friend
> http://www.python.org/doc/2.4.2/lib/module-optparse.html
>
> > if platform.system() == 'Linux':
> >     uglychars = ''.join( set(string.punctuation+' ') - set('/_.') )
> > else:
> >     if platform.system() == 'Windows':#this is broken because windows
> > is gay with case
> >         uglychars = ''.join( set(string.punctuation+' ') -
> > set(':\\/_.') )
> >     else:
> >         print "wtf... what platform is this anyway?"
>
> May be MacOS Classic, MacOS X or any *n*x or *BSD variant, or any other
> platform supporting Python - are there are some...
>
> > underscore = '_'
> > underscore = underscore * len(uglychars)
>
> You don't need the intermediate value:
>   underscores = '_' * len(uglychars)
>
> > chars = string.maketrans(uglychars, underscore)
>
> > print "# PHASE I, DIRECTORIES"
>
> Why not processing dirs and files in one pass ?
>
> > for path, subdirs, files in os.walk(dir, topdown=True):
>
> Err... is the 'dir' argument supposed to be an absolute path or a
> relative path ?
>
> And why using the topdown option ?
>
> >     oldname = path
>
> woops ! this may be the absolute path. Are you sure you want to process
> an absolute path ?
>
> I think you'd better process files and dirs in one path, walking bottom
> up (so you don't process any file twice).
>
> >     newname = oldname.translate(chars)
> >     newname = string.lower(newname)
>
> Use the 'str' object methods instead of functions from the string module:
>
>      newname = newname.lower()
>
> You can also chain method/function calls:
>      newname = oldname.translate(chars).lower()
>
>
> >     while string.count(newname, "__") > 0:
>
> in the context of a boolean expression, 0 evaluate to False, non-zero to
> True. So you don't have to be so explicit:
>       while newname.count('__'):
>
> >         newname = string.replace(newname,"__","_")
>
> You don't need to actually *count* the occurrences of  '__' - if there's
> one, that's enough:
>      while '__' in newname:
>          # proceed
>
> Also, a regexp may be more effective here.
>
> >     while string.count(newname, "..") > 0:
> >         newname = string.replace(newname,"..",".")
>
> Don't forget the '..' and '.' special directories in unix filesystems...
>
> >     if oldname != newname:
> >         if os.path.isfile(newname) or os.path.isdir(newname):
>
> And if there's a special file (device etc) ?
> hint : os.path.exists()
>
>
> >             print oldname, "-->\n", newname, "\t\t\tERROR: file/dir
> > exists\n"
>
> stdout is for 'normal' program outputs (ie: outputs that may be used as
> inputs to another program). This kind of output should go to stderr:
>            print >> sys.stdout, "%s --> %s : \t\t\tERROR: "
>                                 "file/dir exists" % (oldname,
>                                                      newname,)
>
> >         else:
> >             print oldname, "-->\n", newname, "\t\t\tYAY: file
> > renamed\n"
> >             if noworky == "doit":
> >                 os.renames(oldname, newname)
>
> How is the user supposed to know that he has to pass the string "doit"
> as a second arg ?
>
> There are some (more or less agreed upon) conventions about cli options.
> Like '--dry-run' to express the fact that the program shouldn't actually
> do more than simulate it's execution.
>
> > print "# PHASE II, FILES"
> > for path, subdirs, files in os.walk(dir, topdown=True):
> >     for oldname in files:
> >         oldname = os.path.join(path, oldname)
>
> Are you *sure* you want to operate on the *whole* *absolute* path ?
> (cf my thoughts about the "first phase" and the whole algorithm)
>
> >         newname = oldname.translate(chars)
> >         newname = string.lower(newname)
>
> Aren't you repeating some code here ?
> hint : all duplicated code should be factored out into a function.
>
> >         newname = string.replace(newname,".mpeg",".mpg")
> >         newname = string.replace(newname,".ram",".rm")
> >         newname = string.replace(newname,".jpeg",".jpg")
> >         newname = string.replace(newname,".qt",".mov")
>
> # outside the loop, define a dict like:
> ext_map = {'mpeg': 'mpg',
>            'ram' : 'rm',
>            'jpeg' : 'jpg',
>            # etc
>           }
>
> # then in the loop:
>           base, ext = os.path.split(newname)
>           if ext in ext_map:
>              newname = "%s.%s" % (base, ext_map[ext])
>
>
> >         while string.count(newname, "__") > 0:
> >             newname = string.replace(newname,"__","_")
>
> duplicated code...
>
> >         while string.count(newname, "..") > 0:
> >             newname = string.replace(newname,"..",".")
> >         newname = string.replace(newname,"._","_")
> >         newname = string.replace(newname,"_.",".")
>
> all this is a perfect usecase for regexps...
>
> >         if oldname != newname:
> >             if os.path.isfile(newname) or os.path.isdir(newname):
> >                 print oldname, "-->\n", newname, "\t\t\tERROR: file/dir
> > exists\n"
> >             else:
> >                 print oldname, "-->\n", newname, "\t\t\tYAY: file
> > renamed\n"
> >                 if noworky == "doit":
> >                     os.renames(oldname, newname)
>
> duplicated code.
>
>
> Still alive ?-)
>
> I thing the first step would be to correct your algorithm, then use some
> more efficient or idiomatic constructs where possible (like using dicts
> instead of repeated tests, sending messages to stderr etc).
>
> Then here are some more hints:
>
> - put the processing algorithm in a dedicated function, if possible one
> that doesn't rely on any global variable,
> - factor out any duplicated code into helper functions
>
>  - then add a 'main' function that take cares of options and call the
> 'processing' function if everything's ok. The main function should
> return 0 if ok, non-zero if errors (usually 2 for invalid cli options or
> args, 1 for other errors)
>
> - and finally add this at the end of your module:
>
> if __name__ == '__main__':
>   sys.exit(main(sys.argv))
>
>
> This will allow your script to be used either as a program or as a module.
>
>
> --
> bruno desthuilliers
> python -c "print '@'.join(['.'.join([w[::-1] for w in p.split('.')]) for
> p in 'onurb at xiludom.gro'.split('@')])"