[Tutor] improvements on a renaming script

Mon Mar 10 03:50:02 CET 2014

On 3/9/2014 3:22 PM, street.sweeper at mailworks.org wrote:
> Hello all,
>
> A bit of background, I had some slides scanned and a 3-character
> slice of the file name indicates what roll of film it was.
> This is recorded in a tab-separated file called fileNames.tab.
> Its content looks something like:
>
> p01     200511_autumn_leaves
> p02     200603_apple_plum_cherry_blossoms
>
> The original file names looked like:
>
> 1p01_abc_0001.jpg
> 1p02_abc_0005.jpg
>
> The renamed files are:
>
> 200511_autumn_leaves_-_001.jpeg
> 200603_apple_plum_cherry_blossoms_-_005.jpeg
>
> The script below works and has done what I wanted, but I have a
> few questions:
>
> - In the get_long_names() function, the for/if thing is reading
> the whole fileNames.tab file every time, isn't it?  In reality,
> the file was only a few dozen lines long, so I suppose it doesn't
> matter, but is there a better way to do this?
The "usual" way is to build a dictionary with the row[0] contents as keys
and the row[1] contents as values. Do this once per run. Then look up each
glnAbbrev in the dictionary and return the corresponding value.
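
A rough sketch of the dictionary approach follows. The function name
load_long_names and the io.StringIO stand-in for the real file are my own
choices for illustration, not part of your script; the two sample rows are
the ones from your message.

```python
import csv
import io

def load_long_names(lines):
    """Build {abbreviation: long name} from tab-separated lines."""
    reader = csv.reader(lines, delimiter='\t')
    # Skip blank or short rows so a stray empty line does not raise IndexError.
    return {row[0]: row[1] for row in reader if len(row) >= 2}

# Stand-in for the opened fileNames.tab (hypothetical in-memory copy).
sample = io.StringIO(
    "p01\t200511_autumn_leaves\n"
    "p02\t200603_apple_plum_cherry_blossoms\n"
)
long_names = load_long_names(sample)  # read the file once per run
print(long_names["p01"])              # each later lookup is a plain dict access
```

In the real script you would pass the file object from open() instead of
the StringIO, and look names up with long_names[abbrev] inside the loop.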
> - Really, I wanted to create a new sequence number at the end of
> each file name, but I thought this would be difficult.  In order
> for it to count from 01 to whatever the last file is per set p01,
> p02, etc, it would have to be aware of the set name and how many
> files are in it.  So I settled for getting the last 3 digits of
> the original file name using splitext().  The strings were unique,
> so it worked out.  However, I can see this being useful in other
> places, so I was wondering if there is a good way to do this.
> Is there a term or phrase I can search on?
I'm sorry, but I don't fully understand that paragraph. And why would
you need to know the number of files?
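
If the aim is a counter that restarts at 1 for each set, one possible
approach is to keep a running count per set in a dictionary, which does not
require knowing the size of a set in advance. This is only a sketch under
that reading of the question; number_per_set and the third sample file name
are made up for illustration.

```python
from collections import defaultdict

def number_per_set(filenames):
    """Return (filename, sequence number) pairs, counting from 1 within each set."""
    counters = defaultdict(int)   # one running count per set abbreviation
    numbered = []
    for fn in sorted(filenames):  # sort so the numbering is reproducible
        set_name = fn[1:4]        # same 3-character slice the script uses
        counters[set_name] += 1
        numbered.append((fn, counters[set_name]))
    return numbered

# The last name is a hypothetical extra file in set p01.
files = ["1p01_abc_0001.jpg", "1p02_abc_0005.jpg", "1p01_abc_0007.jpg"]
for fn, seq in number_per_set(files):
    print(fn, "{:03d}".format(seq))  # zero-padded, e.g. 001
```

A search phrase that may help is "group by key and enumerate".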
> - I'd be interested to read any other comments on the code.
> I'm new to python and I have only a bit of computer science study,
> quite some time ago.
Beware of using tabs for indentation. As rendered by Thunderbird they
appear as 8 spaces, which is IMHO overkill.
It is much better to use spaces. Most Python IDEs have an option to
convert tabs to spaces.

The Python recommendation (PEP 8) is 4 spaces per indentation level; I use 2.
> #!/usr/bin/env python3
>
> import os
> import csv
>
> # get longnames from fileNames.tab
> def get_long_name(glnAbbrev):
> 	with open(
> 		  os.path.join(os.path.expanduser('~'),'temp2','fileNames.tab')
> 		  ) as filenames:
> 		filenamesdata = csv.reader(filenames, delimiter='\t')
> 		for row in filenamesdata:
> 			if row[0] == glnAbbrev:
> 				return row[1]
>
> # find shortname from slice in picture filename
> def get_slice(fn):
> 	threeColSlice = fn[1:4]
> 	return threeColSlice
Writing a function just to get a slice also seems like overkill. Just slice in place.
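
For instance, the new name could be built with both slices written inline.
This is a minimal sketch: new_name is a hypothetical helper, and long_names
is assumed to be a dictionary mapping each abbreviation to its long name.

```python
import os

def new_name(fn, long_names):
    """Build the new file name, slicing in place instead of calling helpers."""
    base = os.path.splitext(fn)[0]  # file name without its extension
    return long_names[fn[1:4]] + "_-_" + base[-3:] + ".jpeg"

# Assumed mapping; the entry is taken from the sample data in the question.
long_names = {"p01": "200511_autumn_leaves"}
print(new_name("1p01_abc_0001.jpg", long_names))
# prints 200511_autumn_leaves_-_001.jpeg
```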
> # get 3-digit sequence number from basename
> def get_bn_seq(fn):
> 	seq = os.path.splitext(fn)[0][-3:]
> 	return seq
>
> # directory locations
> indir = os.path.join(os.path.expanduser('~'),'temp4')
> outdir = os.path.join(os.path.expanduser('~'),'temp5')
>
> # rename
> for f in os.listdir(indir):
> 	if f.endswith(".jpg"):
> 		os.rename(
> 			os.path.join(indir,f),os.path.join(
> 				outdir,
> 				get_long_name(get_slice(f))+"_-_"+get_bn_seq(f)+".jpeg")
> 				)
>
> exit()
>
HTH - remember to reply-all so a copy goes to the list, place your 
comments in-line as I did, and delete irrelevant text.