text file vs. cPickle vs. sqlite: a design question

Wed Apr 11 17:11:08 EDT 2007

On Apr 12, 7:09 am, Bruno Desthuilliers
<bdesth.quelquech... at free.quelquepart.fr> wrote:
> Dag wrote:
>
> > I have an application which works with lists of tuples of the form
> > (id_nr, 'text', 'more text', 1 or 0). I'll have maybe 20-50 of these
> > lists, each containing anywhere from 3 to over 30,000 tuples. The actions I
> > need to perform are: append a new tuple to the end of a list, display
> > all the tuples, or display all the tuples whose last element is 1.
>
> > Basically, what I'm wondering is the best way to store these data structures
> > to disc. As the subject mentions, I've got three approaches:
> > store each list as a text file, pickle each list to a file, or shove the
> > whole thing into a bunch of database tables. I can see pros and cons
> > with each approach. Does anybody have any advice as to whether any of
> > these approaches is obviously better than the others? On one hand I like
> > the text file approach since it lets me append without loading
> > everything into memory; on the other hand the sqlite approach makes it
> > easy to select stuff with SELECT * FROM foo WHERE..., which could be
> > handy if I ever need to add more advanced filtering.

s/if/when/
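Dag's trade-off between the text-file and pickle approaches can be sketched as below. This is a minimal sketch, not anyone's actual code: it is written for Python 3, where the C implementation that cPickle used to provide now backs the standard `pickle` module, and it assumes one file per list, with hypothetical function and file names. The CSV file can be appended to without reading the existing rows, while a file holding one pickled list must be loaded and rewritten in full on every append.

```python
import csv
import os
import pickle
import tempfile

# Hypothetical record layout taken from the question: (id_nr, text, more_text, flag)


def append_record_text(path, record):
    """Append one tuple as a CSV row; existing rows are never read."""
    with open(path, "a", newline="", encoding="utf-8") as f:
        csv.writer(f).writerow(record)


def iter_records_text(path, only_flagged=False):
    """Stream tuples back one row at a time, optionally keeping only flag == 1."""
    with open(path, newline="", encoding="utf-8") as f:
        for id_nr, text, more_text, flag in csv.reader(f):
            record = (int(id_nr), text, more_text, int(flag))
            if not only_flagged or record[3] == 1:
                yield record


def append_record_pickle(path, record):
    """With one pickled list per file, appending means load, append, rewrite."""
    records = []
    if os.path.exists(path):
        with open(path, "rb") as f:
            records = pickle.load(f)
    records.append(record)
    with open(path, "wb") as f:
        pickle.dump(records, f, protocol=pickle.HIGHEST_PROTOCOL)


def load_records_pickle(path, only_flagged=False):
    """Load the whole list into memory, then filter it."""
    with open(path, "rb") as f:
        records = pickle.load(f)
    return [r for r in records if not only_flagged or r[3] == 1]


if __name__ == "__main__":
    tmp = tempfile.mkdtemp()
    txt = os.path.join(tmp, "list_a.csv")  # hypothetical file names
    pkl = os.path.join(tmp, "list_a.pkl")
    for rec in [(1, "text", "more text", 1), (2, "a, b", "c", 0)]:
        append_record_text(txt, rec)
        append_record_pickle(pkl, rec)
    print(list(iter_records_text(txt, only_flagged=True)))
    print(load_records_pickle(pkl, only_flagged=True))
```

In both versions, filtering on the flag still means scanning every stored tuple in Python, which is where the SELECT ... WHERE approach starts to pay off.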

>
> Given your specs, I'd go for SQLite without any hesitation. Your data
> structure is obviously relational (a list of tuples is a pretty good
> definition of a relation), so a relational DBMS is the obvious solution,
> and you'll get lots of other benefits from it (SQL being only one of
> them; others include query optimization for free, scalability, and
> interoperability). And if you don't like raw SQL and prefer something
> more pythonic, then you have SQLAlchemy and Elixir.
>
> My 2 cents...

... and a few more cents:

There are *two* relations/tables involved (at least): a "tuple" table
and a "list" table. The 20-50 or so lists each need a unique name or
number, and other attributes of a list are sure to come out of the
woodwork later. Each tuple will need a column containing the ID of the
list it belongs to. It's a bit boggling that (1) each tuple has an
id_nr but there's no requirement to query on it, (2) the only requirement
is to "append" new tuples without checking whether the id_nr already
exists, and (3) there's a requirement to "display" all of 30,000 tuples ...
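The two-table layout suggested above might look like this with Python's built-in `sqlite3` module. It is a sketch under stated assumptions: the table, column, and function names are hypothetical; `id_nr` is assumed to be unique within a list, so a UNIQUE constraint rejects duplicate appends (one possible answer to point (2)); and the `seq` column, an alias for SQLite's rowid, is assumed to be an acceptable way to preserve append order.

```python
import sqlite3

# Hypothetical schema: one row per list, one row per tuple ("entry").
SCHEMA = """
CREATE TABLE IF NOT EXISTS lists (
    list_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL UNIQUE
);
CREATE TABLE IF NOT EXISTS entries (
    seq     INTEGER PRIMARY KEY,  -- rowid alias; records append order
    list_id INTEGER NOT NULL REFERENCES lists(list_id),
    id_nr   INTEGER NOT NULL,
    text1   TEXT,
    text2   TEXT,
    flag    INTEGER NOT NULL CHECK (flag IN (0, 1)),
    UNIQUE (list_id, id_nr)       -- assumption: id_nr is unique within a list
);
CREATE INDEX IF NOT EXISTS entries_by_flag ON entries (list_id, flag);
"""


def open_db(path=":memory:"):
    """Open (or create) the database and make sure the schema exists."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
    conn.executescript(SCHEMA)
    return conn


def append_entry(conn, list_name, id_nr, text1, text2, flag):
    """Append one tuple to the named list, creating the list on first use.

    Raises sqlite3.IntegrityError if id_nr already exists in that list.
    """
    with conn:  # commit on success, roll back on error
        conn.execute("INSERT OR IGNORE INTO lists (name) VALUES (?)", (list_name,))
        (list_id,) = conn.execute(
            "SELECT list_id FROM lists WHERE name = ?", (list_name,)
        ).fetchone()
        conn.execute(
            "INSERT INTO entries (list_id, id_nr, text1, text2, flag) "
            "VALUES (?, ?, ?, ?, ?)",
            (list_id, id_nr, text1, text2, flag),
        )


def get_entries(conn, list_name, only_flagged=False):
    """Return a list's tuples in append order, optionally only those with flag = 1."""
    sql = (
        "SELECT e.id_nr, e.text1, e.text2, e.flag "
        "FROM entries AS e JOIN lists AS l ON l.list_id = e.list_id "
        "WHERE l.name = ?"
    )
    if only_flagged:
        sql += " AND e.flag = 1"
    sql += " ORDER BY e.seq"
    return conn.execute(sql, (list_name,)).fetchall()
```

For example, `append_entry(conn, "inbox", 1, "text", "more text", 1)` followed by `get_entries(conn, "inbox", only_flagged=True)` covers the operations from the original question (the list name is made up). More advanced filtering then becomes a matter of extending the WHERE clause, and later list attributes become new columns on `lists`.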