Transforming ASCII file (pseudo database) into proper database

George Sakkis george.sakkis at gmail.com
Mon Jan 21 17:34:09 EST 2008


On Jan 21, 4:45 pm, Paul Rubin <http://phr...@NOSPAM.invalid> wrote:
> "p." <ppetr... at gmail.com> writes:
> > 1. Has anyone done anything like this before, and if so, do you have
> > any advice?
>
> Sort all the files with an external sort utility (e.g. unix sort), so
> that records with the same key are all brought together.  Then you can
> process the files sequentially.

Seconded. Unix sort can do external sorting [1], so your program will
work even if the files don't fit in memory. Once they are sorted,
itertools (especially groupby) is your friend: it collects consecutive
records that share a key, so you only hold one key's records at a time.
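
A minimal sketch of the groupby step, assuming each record is one line
with the key in the first tab-separated field and that the file has
already been sorted on that field. The record layout and the names
parse and grouped_records are made up for illustration; the original
poster's format may differ.

```python
import itertools
import operator


def parse(line, sep="\t"):
    """Split one record into (key, rest). Key-first layout is an assumption."""
    key, _, rest = line.rstrip("\n").partition(sep)
    return key, rest


def grouped_records(lines, sep="\t"):
    """Yield (key, [rest, ...]) for each run of consecutive lines sharing a key.

    groupby only merges adjacent items, which is why the input must be
    sorted (or at least clustered) by key beforehand.
    """
    parsed = (parse(line, sep) for line in lines)
    for key, group in itertools.groupby(parsed, key=operator.itemgetter(0)):
        yield key, [rest for _, rest in group]
```

Something like sort -k1,1 on the input beforehand should bring equal
keys together. You can then pass the open file object straight to
grouped_records(), since iterating a file reads it line by line, and
insert each group into the real database as it comes out.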

George


[1] http://en.wikipedia.org/wiki/External_sort


