newbie file/DB processing

Mike Meyer mwm at mired.org
Thu May 19 03:27:03 EDT 2005


"len" <lsumnler at gmail.com> writes:

> I am in the process of learning python.  I have bought Learning Python
> by Mark Lutz, printed a copy of Dive into Python and various other
> books and looked at several tutorials.  I have started a stupid little
> project in python and things are proceeding well.  I am an old time
> cobol programmer from the IBM 360/370 era and this ingrained idea of
> file processing using file definition (FD's) I believe  is causing me
> problems because I think python requires a different way of looking at
> datafiles and I haven't really gotten my brain around it yet.  I would
> like to create a small sequential file, processing at first to store a
> group id, name, amount, and date which I can add to delete from and
> update. Could someone point me to some code that would show me how this
> is done in python.  Eventually, I intend to expand my little program to
> process this file as a flat comma delimited file, move it to some type
> of indexed file and finally to some RDBMS system.  My little program
> started out at about 9 lines of code and is now at about 100 with 5 or
> six functions which I will eventually change to classes (I need to
> learn OOP too but one step at a time).

What you're looking for isn't so much the Python way of doing things;
it's the Unix way of doing things. The OS doesn't present the file as
a sequence of records; files are presented as a sequence of bytes. Any
structure beyond that is provided by the application - possibly via a
library. This is a sufficiently powerful way of looking at files that
every modern OS I'm familiar with uses this view of files.
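
To make the "sequence of bytes" point concrete, here is a minimal sketch
(in Python 3, which postdates this post) of an application imposing
fixed-length records on a byte stream, roughly the job a COBOL FD does
for you. The record layout and the names RECORD, pack_record and
read_records are made up for illustration.

```python
import io
import struct

# Hypothetical layout: 4-byte id, 20-byte name, 8-byte float amount, 10-byte date.
RECORD = struct.Struct("<i20sd10s")

def pack_record(rec_id, name, amount, date):
    """Turn one record into a fixed-length chunk of bytes."""
    return RECORD.pack(rec_id, name.encode("ascii"), amount, date.encode("ascii"))

def read_records(stream):
    """Yield (id, name, amount, date) tuples from a stream of bytes."""
    while True:
        chunk = stream.read(RECORD.size)
        if len(chunk) < RECORD.size:
            break  # end of file (or a truncated trailing record)
        rec_id, name, amount, date = RECORD.unpack(chunk)
        # struct pads short strings with NUL bytes; strip them off again.
        yield rec_id, name.rstrip(b"\0").decode("ascii"), amount, date.decode("ascii")

# The OS only ever sees bytes; BytesIO stands in for a file opened in "rb" mode.
raw = pack_record(1, "alice", 12.5, "2005-05-19") + pack_record(2, "bob", 7.25, "2005-05-20")
records = list(read_records(io.BytesIO(raw)))
```

The record structure lives entirely in the application code; nothing in
the byte stream itself marks where one record ends and the next begins.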

You might want to look at "The Art of Unix Programming"
<URL: http://www.faqs.org/docs/artu/ >. It's not really an answer to
your question, but it looks at Unix programming in general. It uses
fetchmail as an example application, including examining its
configuration editor, which is written in Python.

A classic Unix approach to small databases is to use text files. When
you need to update the file, you just rewrite the whole thing. This
works well on Unix, because it comes with a multitude of tools for
processing text files. Such an approach is simple and easy to
implement, but not very efficient for large files. A classic example
is a simple phone book application: you have a simple tool for
updating the phone book, and use the "grep" command for searching
it. Works like a charm for small files, and allows for some amazingly
sophisticated queries.

To provide some (simple) code, assume your file is a list of lines,
with id, name, amount, date on each line, separated by spaces. Loading
this into a list in memory is trivial:

datafile = open("file", "r")
datalist = []
for line in datafile:
    # split() with no arguments splits on runs of whitespace
    datalist.append(line.split())
datafile.close()

At this point, datalist is a list of lists. datalist[0] is a list of
[id, name, amount, date]. You could (for example) sum all the amounts
like so:

total = 0
for datum in datalist:
    # fields come back from split() as strings, so convert before adding
    total += float(datum[2])

There are more concise ways to write this, but this is simple and
obvious.
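
For instance, one of the more concise forms uses the built-in sum() with
a generator expression (so it needs Python 2.4 or later). This is a
sketch with made-up sample rows; the amounts are converted with float()
because split() yields strings.

```python
# Sample rows in the shape produced by line.split(): [id, name, amount, date].
datalist = [
    ["1", "alice", "12.50", "2005-05-19"],
    ["2", "bob", "7.25", "2005-05-20"],
]

# Convert each amount field to a number and add them up in one expression.
total = sum(float(datum[2]) for datum in datalist)
```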

Writing the list back out is also trivial:

datafile = open("file", "w")
datafile.writelines(" ".join(x) + "\n" for x in datalist)
datafile.close()

Note that this uses a generator expression, which is a Python 2.4
feature. The more portable - and obvious - way to write this is:

datafile = open("file", "w")
for datum in datalist:
    datafile.write(" ".join(datum) + "\n")
datafile.close()
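
Putting the pieces together, the add, delete and update operations from
the original question become ordinary list manipulation between a load
and a save. This is only a sketch, written for Python 3; the function
names and the file name are invented for the example, and it assumes no
field contains spaces.

```python
import os
import tempfile

def load(path):
    """Read the whole file into a list of [id, name, amount, date] lists."""
    with open(path, "r") as datafile:
        return [line.split() for line in datafile]

def save(path, datalist):
    """Rewrite the whole file from the in-memory list."""
    with open(path, "w") as datafile:
        for datum in datalist:
            datafile.write(" ".join(datum) + "\n")

def add(datalist, rec_id, name, amount, date):
    datalist.append([rec_id, name, amount, date])

def delete(datalist, rec_id):
    # Rebuild the list in place, keeping every record except the matching id.
    datalist[:] = [d for d in datalist if d[0] != rec_id]

def update_amount(datalist, rec_id, amount):
    for datum in datalist:
        if datum[0] == rec_id:
            datum[2] = amount

# Example run against a scratch file (hypothetical name).
path = os.path.join(tempfile.mkdtemp(), "ledger.txt")
save(path, [["1", "alice", "12.50", "2005-05-19"]])

datalist = load(path)
add(datalist, "2", "bob", "7.25", "2005-05-20")
update_amount(datalist, "1", "20.00")
delete(datalist, "2")
save(path, datalist)
```

Every change costs a full rewrite of the file, which is the trade-off
described above: fine for small files, wasteful for large ones.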

For comma delimited files, there's the csv module. It loads files
formatted as Comma Separated Values (a common interchange format for
spreadsheets) into memory, and writes them back out again. This is a
slightly more structured version of the simple text file approach. It
may be just what you're looking for.
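
A minimal round trip through the csv module might look like the sketch
below. It is written for Python 3 (the newline="" argument to open() is
a Python 3 detail), and the file name and sample rows are made up. Note
that, unlike the space-separated format, a field may contain spaces or
even commas.

```python
import csv
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "ledger.csv")  # hypothetical file name

rows = [
    ["1", "Alice Smith", "12.50", "2005-05-19"],
    ["2", "Bob, Jr.", "7.25", "2005-05-20"],  # the comma gets quoted for us
]

# newline="" lets the csv module control line endings itself.
with open(path, "w", newline="") as datafile:
    csv.writer(datafile).writerows(rows)

with open(path, "r", newline="") as datafile:
    datalist = list(csv.reader(datafile))
```

As with split(), every field comes back as a string, so amounts still
need converting before any arithmetic.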

If you want to store string objects selectable by a single string key,
the various Unix db libraries are just what the doctor ordered. The
underlying C libraries allow arbitrary memory chunks as keys/objects,
but the Python libraries use Python strings, and dbm files look like
dictionaries. The shelve module is built on top of these, allowing you
to store arbitrary Python objects instead of just strings.
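
As a rough illustration of the dictionary-like interface, here is a
sketch using shelve in Python 3 (using a shelf in a with statement is a
later addition to the module). The database name and the record layout
are assumptions made for the example.

```python
import os
import shelve
import tempfile

path = os.path.join(tempfile.mkdtemp(), "ledger")  # hypothetical database name

# A shelf behaves like a dictionary whose contents live on disk.
with shelve.open(path) as db:
    db["1"] = {"name": "alice", "amount": 12.5, "date": "2005-05-19"}
    db["2"] = {"name": "bob", "amount": 7.25, "date": "2005-05-20"}
    del db["2"]  # delete a record by key

    # Fetch, modify, and store again; mutating the fetched object alone
    # does not write it back unless the shelf is opened with writeback=True.
    record = db["1"]
    record["amount"] = 20.0
    db["1"] = record

# Reopen to confirm the changes were persisted.
with shelve.open(path) as db:
    stored = dict(db)
```

Keys must be strings, which matches the single-string-key model
described above.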

Finally, for RDBMS, you almost always get SQL these days. The options
run from small embedded databases built in Python to network
connections to full-blown SQL servers. I'd put that decision off until
you really need a database.
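
When that point does arrive, an embedded database is the smallest step
up. The sketch below uses the sqlite3 module, which joined the standard
library in Python 2.5, after this post was written. The table and column
names are invented for illustration, and an in-memory database stands in
for a real file.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ledger (id INTEGER PRIMARY KEY, name TEXT, amount REAL, date TEXT)"
)

# Add records; the ? placeholders keep values out of the SQL text.
conn.executemany(
    "INSERT INTO ledger (id, name, amount, date) VALUES (?, ?, ?, ?)",
    [(1, "alice", 12.5, "2005-05-19"), (2, "bob", 7.25, "2005-05-20")],
)

conn.execute("UPDATE ledger SET amount = ? WHERE id = ?", (20.0, 1))
conn.execute("DELETE FROM ledger WHERE id = ?", (2,))
conn.commit()

rows = conn.execute("SELECT id, name, amount, date FROM ledger").fetchall()
total = conn.execute("SELECT SUM(amount) FROM ledger").fetchone()[0]
conn.close()
```

Here the database, rather than the application, does the searching,
updating and summing.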

    <mike
-- 
Mike Meyer <mwm at mired.org>			http://www.mired.org/home/mwm/
Independent WWW/Perforce/FreeBSD/Unix consultant, email for more information.


