Transforming ascii file (pseudo database) into proper database

Bruno Desthuilliers bdesth.quelquechose at free.quelquepart.fr
Mon Jan 21 17:15:40 EST 2008


p. wrote:
> I need to take a series of ascii files and transform the data
> contained therein so that it can be inserted into an existing
> database. The ascii files are just a series of lines, each line
> containing fields separated by '|' character. Relations amongst the
> data in the various files are denoted through an integer identifier, a
> pseudo key if you will. Unfortunately, the relations in the ascii file
> do not match up with those in the database into which I need to insert
> the data, i.e., I need to transform the data from the files before
> inserting into the database. Now, this would all be relatively simple
> if not for the following fact: The ascii files are each around 800MB,
> so pulling everything into memory and matching up the relations before
> inserting the data into the database is impossible.
> 
> My questions are:
> 1. Has anyone done anything like this before,

More than once, yes.

> and if so, do you have
> any advice?

1/ Use the csv module to parse your text files.

2/ Use a temporary database (whose schema mimics the one in the flat
files), so you can work with the appropriate tools, i.e., the RDBMS will
take care of disk/memory management, and you'll have a specialized,
high-level language (namely, SQL) to reassemble your data the right way.
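
As a rough illustration of the two points above, here is a minimal sketch. It
assumes SQLite (via the standard sqlite3 module) as the temporary database,
which is only one possible choice, and the table and column names (people,
addresses, person_id, name, city) are made up for the example; your real
schema would mirror your own flat files. The helper names load_pipe_rows,
build_staging and joined_rows are likewise hypothetical.

```python
import csv
import sqlite3
from itertools import islice


def load_pipe_rows(conn, lines, table, n_cols, batch_size=10_000):
    """Stream '|'-delimited lines into a staging table, one batch at a time.

    `lines` is any iterable of strings, typically an open file object, so
    at most `batch_size` parsed rows are held in memory at once.
    """
    placeholders = ", ".join(["?"] * n_cols)
    insert_sql = f"INSERT INTO {table} VALUES ({placeholders})"
    reader = csv.reader(lines, delimiter="|", quoting=csv.QUOTE_NONE)
    rows = (row for row in reader if row)  # skip blank lines
    while True:
        batch = list(islice(rows, batch_size))
        if not batch:
            break
        conn.executemany(insert_sql, batch)
    conn.commit()


def build_staging(conn):
    """Create staging tables mirroring the (hypothetical) flat files."""
    conn.executescript("""
        CREATE TABLE people (person_id INTEGER, name TEXT);
        CREATE TABLE addresses (person_id INTEGER, city TEXT);
        CREATE INDEX idx_addresses_person ON addresses (person_id);
    """)


def joined_rows(conn):
    """Let SQL match up the pseudo keys; iterate the cursor row by row."""
    return conn.execute("""
        SELECT p.person_id, p.name, a.city
        FROM people AS p
        JOIN addresses AS a ON a.person_id = p.person_id
        ORDER BY p.person_id, a.city
    """)
```

With files of the size you mention, you would connect to an on-disk database
file rather than ":memory:", pass each open file (opened with newline="") as
`lines`, and then loop over joined_rows(conn) to feed the target database one
row at a time.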


> 2. In the abstract, can anyone think of a way of amassing all the
> related data for a specific identifier from all the individual files
> without pulling all of the files into memory and without having to
> repeatedly open, search, and close the files over and over again?

See the answer above: load the files into a temporary database and let
SQL do the matching.


