Newbie query - reading text file (with column headings) into dictionary

Bengt Richter bokr at oz.net
Sun Dec 8 05:19:54 EST 2002


On Sun, 8 Dec 2002 12:44:42 +1300, "Andy Elvey" <andy.elvey at paradise.net.nz> wrote:

>Hi all.
>  I have a text file (comma separated) which looks like this -
> Country, City,  Var1, Var2, Var3
> Australia, Sydney, 123,456,789
> Australia, Perth, 415, 201, 142
> .... and so on.
How far will "so on" go? A thousand lines? 100k lines? 10 million?
It will make a difference. But later. We haven't got a spec yet.

>( Btw- I'm running Python 2.2 on Win98 )
>  I want to read the file into a dictionary so that the column headings
>(Country, city ,... ) are the keys, and the rest of the file is slotted into

You are saying "dictionary," but are you sure you mean dictionary?

Let's just say you want a magic black box (MBB) that will do what you want ;-)
Things that do things with information are often well modeled with classes.
So we'll start with
_____________________________________

class MBB:
    """magic black box for Andy"""
_____________________________________

and think where to go next.

To get an instance, we'll write

mbb = MBB()

Now what should mbb be able to do?

It needs to know about column names, so we'll give it a way to do that, namely a
mbb.set_col_names method. Likewise it will need to load your comma-delimited file,
so we'll provide a method for that. We can call it mbb.load_csv. We'll assume that
we'll just make a list of cleaned-up row field tuples for starters.

Later you may want to open and save an existing MBB content from a previous time,
so we'll assume it's going to do that by some name. We'll give each MBB instance
a name when we create it. Ok, that gets us to (untested):
_____________________________________

class MBB:
    """magic black box for Andy"""

    def __init__(self, name):
        self.name = name

    def set_col_names(self, *names):
        """save column names, e.g., mbb.set_colnames("Country", "City", "Var1", "Var2", "Var3")
        if len(names)==1 and isinstance(names[0],(list,tuple)): #allow call with single list or tuple
            self.colnames = tuple(names[0])
        else:
            self.col_names = namelist

    def open(self, mode):
        self.mode = mode
        # XXX ??

    def close(self, how=None):
        pass   # XXX ??

    def load_csv(self, filepath):
        """Load the black box with rows of data from a file"""
        f = file(filepath)
        self.rows = []    # this will hold all the row data
        for line in f:
            fields = line.split(',')
            fields = map(str.strip, fields) # trim leading and trailing whitespace??
            rows.append(tuple(fields))    # tentatively, a list of field tuples
        f.close()
_____________________________________

We assumed you want to split the fields out and clean off leading and trailing spaces.
If you have to deal with quoted strings containing commas (the delimiter), that can be
done, but we'll cross that bridge later.

I'll assume you want to be able to retrieve data from rows by specifying country and/or
city. You may want to use a real database and simple sql select statements, but we'll
just proceed so you can get some ideas.

Time to decide what you want to retrieve, and how you want to specify it. The data is
a table of n rows with labeled columns.

Presumably you will want to get back selected column values from selected rows. Maybe one
row at a time, or maybe a chunk or maybe all at once. That will depend on your application.
We'll reserve a place for chunk size.

Ok, how to express that a row is selected? You probably want to be able to say that
a series of things should be true about certain column values or else another series
of things about other column values, etc. But let's start with something easy, like
Country=="Australia". How to express that as a condition we can pass to a query method of
our MBB?

How would SQL suit you? E.g.,

    result = MBB.SQL('''\
        SELECT City, Var1, Var3
        FROM CityInfo 
        WHERE Country = "Australia" AND Var2 < 100
    ''')

We'd have to limit the syntax etc., so it's legal SQL but very restricted.
(But a first cut would not likely be secure against malicious SQL!)


Anyway, it might be fun to reinvent(?) a little dumbSQL database that would be just big
enough a subset to handle your problem, but whose query interface would be all SQL.
I'm changing course here, because I got a little interested in doing dumbSQL and don't
have much to go on. ;-)

I.e., we don't know what you really need. Suppose mbb.load_csv(csvfile) loads up your
data one way or the other, what will you want to get back out? Write the specs:

 values =  mbb.get_what_andy_wants(parameters, required, by, andy_s, selection, criteria, etc)

>the appropriate key ( so that "Australia" has "Country" as the key, "Sydney"
>and "Perth" have "City" as the key, and so on.  This must be quite
>straightforward to do, but searching the 'net (including the Python Cookbook
Nothing is straight forward until you have your requirements clear.
You'll have to specify what kinds of output you want from what kinds of inputs ;-)

>site) turns up nothing along these lines .... So - any help is very much
>appreciated.
You may want to look at relational databases a bit.

Or you may want to leave the data as is in the file, and just process the whole
thing every time you want to extract some information. For something small, that
could work fine. Your interface to mbb.get_what_andy_wants() could be identical
either way. You have to decide what you want and how many different methods you
may need.

> (Oh - and the solution to this would be very useful to have on the "Python
>Cookbook" site .... :-)  Thanks in advance ...

"this" still needs to be nailed down. Requirements, requirements ;-)

Regards,
Bengt Richter



More information about the Python-list mailing list