dummy needs help with Python

Tim Chase python.list at tim.thechases.com
Sat Dec 27 10:28:10 EST 2008


> I am trying to find somebody who can give me a simple python 
> program I can use to "program by analogy".  I just want to 
> read two CSV files and match them on several fields, 
> manipulate some of the fields, and write a couple of output 
> files.
...
> Please forgive me if this is so, and take pity on a stranger 
> in a strange land.

Pittsburgh is a little strange, but not *that* bad :)

Just for fun, I threw together a simple (about 30 lines) program 
to do what you describe.  Consider it a bit of slightly belated 
Christmas pity on the assumption that this isn't classwork (a 
little googling suggests that it's not homework).  It's 100% 
untested, so if it formats your hard-drive, steals your spouse, 
wrecks your truck, kicks your dog, makes a mess of your 
trailer-home, and drinks all your beer, caveat coder.  But you've 
got the source, so you can vet it...and it's even commented a bit 
for pedagogical amusement if you plan to mung with it :)

   from csv import reader
   SMALL = 'a.txt'
   OTHER = 'b.txt'
   smaller_file = {} # key->line mapping dict for the smaller file
   f_a = file(SMALL)
   r_a = reader(f_a)
   #a_headers = r_a.next() # optionally discard a header row

   # build up the map in smaller_file of key->line
   for i, line in enumerate(r_a):
     a1, a2, a3, a4, a5 = line # name the fields
     key = a1, a3, a5
     if key in smaller_file:
       print "Duplicate key [%r] in %s:%i" % (key, SMALL, i+1)
       #continue # does the 1st or 2nd win? uncomment for 1st
     smaller_file[key] = line
   f_a.close()

   b = file(OTHER)
   r_b = reader(b)
   #b_headers = r_b.next() # optionally discard a header row
   for i, line in enumerate(r_b):
     b1, b2, b3, b4, b5, b6, b7, b8, b9 = line
     key = b2, b8, b9
     if key not in smaller_file:
       print "Key for line #%i (%r) not in %s" % (i+1, key, SMALL)
       continue
     a1, a2, a3, a4, a5 = smaller_file[key]
     # do manipulation with a[1-5]/b[1-9] here
     # and do something with them
   b.close()

It makes more sense if instead of calling them a[1-5]/b[1-9], you 
actually use the field-names that may have been in the header rows 
such as

   cost_center, store, location, manager_id = line
   key = cost_center, store, location
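
If the files do carry header rows, the standard csv.DictReader can 
do the naming for you.  A minimal sketch, assuming Python 3 and a 
made-up four-column header (the in-memory sample just stands in 
for a real file):

```python
import csv
import io

# Made-up sample standing in for a file with a header row.
sample = io.StringIO(
    "cost_center,store,location,manager_id\n"
    "CC101,Main Street,PA,m42\n"
)

keys = []
for row in csv.DictReader(sample):
    # Each row is a dict keyed by the header names.
    key = row["cost_center"], row["store"], row["location"]
    keys.append(key)
```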

You may also have to manipulate some of the values to make 
key-matches work, such as

   cc, store, loc, mgr = line
   cc = cc.strip().upper()
   store = store.strip().title()
   key = cc, store, loc

ensuring that you do the same manipulations for both files.
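
One way to keep the two files in step is to put the clean-up in a 
single function and call it from both loops.  A rough sketch, 
assuming Python 3; make_key and the sample rows are made up for 
illustration:

```python
def make_key(cc, store, loc):
    """Apply the same clean-up to the key fields of either file."""
    # Same manipulations as in the snippet above; loc is left untouched.
    return cc.strip().upper(), store.strip().title(), loc

# Made-up rows: the same record, spelled differently in each file.
row_a = [" cc101 ", "main street", "PA", "m42"]
row_b = ["CC101", " Main Street", "PA", "x"]

key_a = make_key(*row_a[:3])
key_b = make_key(*row_b[:3])
```

Both rows end up with the same key, so the dictionary lookup 
matches them.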

The code above reads the entire smaller file into memory and uses 
it for fast lookup.  However, if you have gargantuan files, you 
may need to process them differently.  You don't detail the 
fields/organization of the files, so if they're both sorted by 
key, you can change the algorithm to behave like the standard 
*nix "join" command.
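
A sketch of that join-style approach, assuming Python 3, that both 
inputs are already sorted by key in plain string order, and that 
keys are unique in the first input.  merge_join and the sample 
rows are made up for illustration; csv reader objects could be 
passed in place of the lists:

```python
from operator import itemgetter

def merge_join(rows_a, rows_b, key_a, key_b):
    """Yield (row_a, row_b) pairs with equal keys.

    Both inputs must be sorted by key; keys in rows_a are assumed
    unique, while rows_b may repeat a key.  Only one row from each
    input is held in memory at a time.
    """
    iter_a = iter(rows_a)
    a = next(iter_a, None)
    for b in rows_b:
        kb = key_b(b)
        # Skip rows in a whose key sorts before the current b key.
        while a is not None and key_a(a) < kb:
            a = next(iter_a, None)
        if a is None:
            return  # a is exhausted, nothing left to match
        if key_a(a) == kb:
            yield a, b

rows_a = [["1", "x"], ["3", "y"], ["5", "z"]]
rows_b = [["1", "p"], ["2", "q"], ["3", "r"], ["3", "s"], ["6", "t"]]
first = itemgetter(0)  # here the key is simply the first column
pairs = list(merge_join(rows_a, rows_b, first, first))
```

Rows in b with no partner in a (keys "2" and "6" here) are 
silently dropped, which is also what "join" does by default.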

Other asides:  you may have to tweak treatment of a header-row 
(and correspondingly the line-numbers), as well as 
conflict-handling for keys in your a.txt source if they exist, 
along with the behavior when a key can't be found in a.txt but is 
requested in b.txt (maybe set some defaults instead of logging 
the error and skipping the row?), and then lastly and most 
importantly, you have to fill in the manipulations you desire and 
then actually do something with the processed results (write them 
to a file, upload them to a database, send them via email, output 
them to a text-to-speech engine and have it speak them, etc).
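
As one possible shape for that last step, here is a sketch that 
writes the joined rows out as CSV and falls back to defaults when 
a key is missing.  It assumes Python 3; write_joined, DEFAULT_A, 
the sample data, and the "out.csv" name in the comment are all 
made up for illustration:

```python
import csv
import io

# Hypothetical defaults used when b.txt asks for a key a.txt lacks.
DEFAULT_A = [""] * 5

def write_joined(smaller_file, rows_b, out):
    """Write each b row followed by its matching (or default) a row."""
    writer = csv.writer(out)
    for line in rows_b:
        key = line[1], line[7], line[8]  # b2, b8, b9, as in the program above
        writer.writerow(list(line) + list(smaller_file.get(key, DEFAULT_A)))

smaller_file = {("k1", "k2", "k3"): ["k1", "a2", "k2", "a4", "k3"]}
rows_b = [
    ["b1", "k1", "b3", "b4", "b5", "b6", "b7", "k2", "k3"],  # has a match
    ["b1", "zz", "b3", "b4", "b5", "b6", "b7", "k2", "k3"],  # no match
]

out = io.StringIO()  # stands in for open("out.csv", "w", newline="")
write_joined(smaller_file, rows_b, out)
lines = out.getvalue().splitlines()
```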

> I come from 30 years of mainframe programming so I understand
> how computers work at a bits/bytes /machine language/ source
> vs.executable/reading core dumps level,  and I can program in
> a lot of languages most people using Python have never even
> heard of,

If there's such urgency, I hope you resorted to simply using one 
of the multitude of other languages you know -- even in C, this 
wouldn't be too painful as projects go (there's a phrase you 
won't hear me utter frequently).  Or maybe try your hand at it in 
Pascal, shell-scripting (see the "join" command) or even assembly 
language.  Not sure I'd use Logo, Haskell, Erlang, or Prolog. :)

> My problem is that I want to do this all yesterday, and the
> Python text I bought is not easy to understand. I don't have
> time to work my way through the online Python tutorial. 

As Rick mentioned, there are a number of free online sources for 
tutorials, books, and the like.  Dive Into Python is one of the 
classics.  Searching the archives of comp.lang.python for 
"beginner books" will yield the same thread coming up every 
couple weeks.  For future reference, if you've got time-sensitive 
projects to tackle "yesterday", it's usually not the best time to 
try and learn a new language.  Good luck in your exploration of 
Python.

-tkc