Fast file data retrieval?

Jon Clements joncle at googlemail.com
Mon Mar 12 23:38:25 EDT 2012


On Monday, 12 March 2012 20:31:35 UTC, MRAB wrote:
> On 12/03/2012 19:39, Virgil Stokes wrote:
> > I have a rather large ASCII file that is structured as follows
> >
> > header line
> > 9 nonblank lines with alphanumeric data
> > header line
> > 9 nonblank lines with alphanumeric data
> > ...
> > ...
> > ...
> > header line
> > 9 nonblank lines with alphanumeric data
> > EOF
> >
> > where a data set contains 10 lines (header + 9 nonblank) and there can
> > be several thousand data sets in a single file. In addition, *each
> > header has a unique ID code*.
> >
> > Is there a fast method for the retrieval of a data set from this large
> > file given its ID code?
> >
> Probably the best solution is to put it into a database. Have a look at
> the sqlite3 module.
> 
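A minimal sketch of the sqlite3 approach (assuming the ID is the first whitespace-delimited token of each header line -- the actual header format wasn't shown):

```python
import sqlite3

def build_db(path, db_path="datasets.db"):
    """Load each 10-line data set into an SQLite table keyed by its ID.

    Assumes the ID is the first whitespace-delimited token of the
    header line; adjust the split for the real header format.
    """
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS datasets (id TEXT PRIMARY KEY, body TEXT)"
    )
    with open(path) as f:
        while True:
            header = f.readline()
            if not header:          # EOF
                break
            ident = header.split()[0]
            # The 9 nonblank data lines that follow the header.
            body = "".join(f.readline() for _ in range(9))
            conn.execute(
                "INSERT OR REPLACE INTO datasets VALUES (?, ?)",
                (ident, body),
            )
    conn.commit()
    return conn

def lookup(conn, ident):
    """Return the 9 data lines for an ID, or None if it's absent."""
    row = conn.execute(
        "SELECT body FROM datasets WHERE id = ?", (ident,)
    ).fetchone()
    return row[0] if row else None
```

After the one-time load, lookups go through the primary-key index, so they stay fast no matter how many thousand data sets the file holds.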
> Alternatively, you could scan the file, recording the ID and the file
> offset in a dict so that, given an ID, you can seek directly to that
> file position.
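The offset-dict idea might look like this (again assuming the ID is the first token of the header line; the file is opened in binary mode so tell()/seek() offsets are exact):

```python
def build_index(path):
    """Scan the file once, mapping each header's ID to the byte offset
    where its 10-line data set begins."""
    index = {}
    with open(path, "rb") as f:
        while True:
            pos = f.tell()
            header = f.readline()
            if not header:          # EOF
                break
            # Assumed: ID is the first whitespace-delimited token.
            index[header.split()[0].decode()] = pos
            for _ in range(9):      # skip the 9 data lines
                f.readline()
    return index

def fetch(path, index, ident):
    """Seek straight to a data set and return its 10 lines."""
    with open(path, "rb") as f:
        f.seek(index[ident])
        return b"".join(f.readline() for _ in range(10)).decode()
```

The index is rebuilt each run (or could be pickled), but a single scan of even a multi-thousand-set file is cheap, and each retrieval afterwards is one seek plus ten reads.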

I would have a look at either bsddb, Tokyo (or Kyoto) Cabinet, or hamsterdb. If it's really going to get large and needs a full-blown server, maybe MongoDB/redis/hadoop...


