Newbie query - reading text file (with column headings) into dictionary

Mike Meyer mwm at mired.org
Tue Dec 10 03:03:14 EST 2002


"Terry Reedy" <tjreedy at udel.edu> writes:

> "Mike Meyer" <mwm at mired.org> wrote in message
> news:x7lm2z3h86.fsf at guru.mired.org...
> > > Fortran (at least in the past) slices arrays in this 'unusual
> > > direction'.
> > > It facilitates adding new columns (new = log(old),
> > Adding new features to a class is easy: my.new = log(my.old).
> If the table is 100 lines of 10 attributes, then the OP should do
> what's easiest and clearest (for the language he is using).  If the
> table is a million lines with a thousand attributes, then data
> organization can make a big difference in performance.  Unfortunately,
> the optimal organization (row versus column) depends on the operation.

Right. I won't dispute that it depends on your usage. That's why I
only said that it didn't belong in a cookbook, which should stick to
what's easiest and clearest in the language, rather than saying that
he was doing things the wrong way.

> So my point is that column organization can sometimes be a rational
> choice.  With column organization, 'new=log(old)' requires
> sequentially reading one of the thousand blocks and sequentially
> writing one.  With row/object organization, 'my.new = log(my.old)'
> requires reading and rewriting everything (slightly expanded).

I think something got lost. Doing it with columns as objects requires
walking the entire column to generate a new one. Doing it with rows as
objects requires walking the list of rows and computing the new
attribute once for each row. Either way you touch every record once,
so how much the column layout saves you depends on the storage medium.
If it all fits in memory, there's no saving at all.
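
To make the comparison concrete, here is a minimal sketch of both
layouts in memory. The sample values, the Row class, and the names
columns and rows are made up for illustration; only the attribute
names old and new come from the discussion above.

```python
import math

# Hypothetical sample data: three records with one attribute, "old".
old_values = [1.0, 10.0, 100.0]

# Column organization: a dict of lists, one list per attribute.
# Adding "new" walks the whole "old" column and builds a new column.
columns = {"old": list(old_values)}
columns["new"] = [math.log(v) for v in columns["old"]]

# Row organization: a list of objects, one object per record.
class Row:
    def __init__(self, old):
        self.old = old

rows = [Row(v) for v in old_values]

# Adding "new" walks the list of rows and sets the attribute on each.
for my in rows:
    my.new = math.log(my.old)

# Collect the per-row results so the two layouts can be compared.
row_new = [my.new for my in rows]
```

Both versions call log() once per record, so in memory the work is
the same; the layouts only start to differ once the data has to be
paged in from somewhere slower.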

If the table has millions of lines and thousands of attributes, I'd
argue that neither organization is correct. For a scripting language,
the correct organization is to put the data in an application
optimized for dealing with such data, and then deal with that. SQL
databases work quite well, and there are even some one-file SQL
databases written in Python if you really want that.
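
As a rough sketch of what that looks like, the following uses the
sqlite3 module from the Python standard library. Note the swap:
SQLite is a one-file SQL database, but it is written in C, not
Python, and it is only standing in here for whichever database you
pick. The table name samples, the column names, the py_log function
name, and the sample values are all assumptions for illustration.

```python
import math
import sqlite3

# An in-memory database keeps the sketch self-contained; pass a
# filename instead of ":memory:" to keep the data in one file on disk.
conn = sqlite3.connect(":memory:")

# Register Python's math.log as a one-argument SQL function, since a
# logarithm function is not guaranteed to be built into SQLite.
conn.create_function("py_log", 1, math.log)

conn.execute("CREATE TABLE samples (id INTEGER PRIMARY KEY, old_value REAL)")
conn.executemany(
    "INSERT INTO samples (old_value) VALUES (?)",
    [(1.0,), (10.0,), (100.0,)],
)

# Adding the derived attribute is one schema change plus one UPDATE;
# the database decides how to walk its own storage.
conn.execute("ALTER TABLE samples ADD COLUMN new_value REAL")
conn.execute("UPDATE samples SET new_value = py_log(old_value)")
conn.commit()

result = conn.execute(
    "SELECT old_value, new_value FROM samples ORDER BY id"
).fetchall()
```

The script never has to choose between rows and columns; it states
the operation and leaves the storage layout to the database.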

        <mike
-- 
Mike Meyer <mwm at mired.org>			http://www.mired.org/home/mwm/
Independent WWW/Perforce/FreeBSD/Unix consultant, email for more information.


