[Tutor] Filesystem vs Database vs Lucene

Shitiz Bansal shitizb at yahoo.com
Thu Mar 29 02:13:29 CEST 2007


Thanks, Alan, for your reply.
I do, however, have a few more concerns.

Memory is not an option, not just because of capacity but also because I need persistence.
Also, I feel that 500 MB of data in a dictionary would typically cross 1 GB in total used memory. That will of course depend on the hashing algorithm used and the actual data. Anyway, this program is just one module of something much larger, so...
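
One rough way to check the memory estimate above is to build a small sample dictionary and extrapolate. This is only a sketch: it assumes CPython, ASCII keys of exactly 50 characters and None as the value. The name estimate_entry_bytes and its parameters are made up for illustration, and the figures vary by Python version and platform.

```python
import sys


def estimate_entry_bytes(sample_size=10000, string_length=50):
    """Approximate bytes per dictionary entry, measured on a small sample."""
    # Zero-padded decimal strings, each exactly string_length characters.
    sample = {"%0*d" % (string_length, i): None for i in range(sample_size)}
    table_bytes = sys.getsizeof(sample)                    # the hash table itself
    key_bytes = sum(sys.getsizeof(key) for key in sample)  # the string objects
    return (table_bytes + key_bytes) / sample_size


per_entry = estimate_entry_bytes()
total_mib = per_entry * 10000000 / 2.0 ** 20
print("about %.0f bytes per entry, about %.0f MiB for 10 million strings"
      % (per_entry, total_mib))
```

The point is only that each 50-character string costs noticeably more than 50 bytes once the string object header and the hash table slot are counted.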

Shelves seem nice, but I would like to be sure how they are implemented. Would a shelf seek to a particular point in the file and load just the desired data into memory, or does it unpickle everything and create a full-blown dictionary in memory? In that case the memory problem would resurface.
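
For what it is worth, in CPython's standard library a shelf is a thin layer over a dbm-style database file: each value is pickled separately and stored under its key, so reading one key unpickles only that value (as long as writeback is left off). The sketch below peeks at the underlying file to illustrate this. The path, keys and values are made up for the example.

```python
import dbm
import os
import pickle
import shelve
import tempfile

# Hypothetical location for the example shelf.
path = os.path.join(tempfile.mkdtemp(), "strings_shelf")

with shelve.open(path) as shelf:
    shelf["alpha"] = ("first value", 1)
    shelf["beta"] = ("second value", 2)

# Open the same file with the low-level dbm module: every entry is an
# independent pickle stored under the UTF-8 encoded key.
with dbm.open(path, "r") as raw:
    blob = raw[b"alpha"]        # fetches only this record
    value = pickle.loads(blob)  # unpickles only this value

print(value)
```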

If this is addressed I would definitely opt for shelve over db/dbm, as it would be simpler. Also, since the underlying algorithm is a hash search instead of a B-tree search/insert as in most database indexes, I would expect it to outperform them.

It might be blasphemous to say this here, but is there an equivalent C library? I am willing to spend extra time coding for that extra zing in performance.

Alan Gauld <alan.gauld at btinternet.com> wrote:
"Shitiz Bansal" wrote

> I need to implement a system which stores Strings (average length
> 50 chars). For every input String it would need to tell the user
> whether that string already exists in the system. It would also
> need to add that input String to the system if it did not exist.

Sounds like a job for a dictionary, except...

> It will also be useful to know the last accessed datetime value
> of that string.

That can be done with a bit of effort.

> The number of strings is in the millions and I also need persistence,
> so keeping all Strings in memory is not an option.

10 million x 50 chars = 500MB. So if you have a Gig of RAM
and not much else running on the machine memory might still
be a valid option... but if not...

This rules out a normal dictionary, but what about a shelf?
Have a look at the shelve module, it makes a file look a lot like
a dictionary. It should solve your problem. And you can store
either a string or a string/date tuple.
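
A minimal sketch of the shelf idea, assuming each string is used directly as the key and the last-accessed datetime is stored as the value. The function name check_and_add and the file name are hypothetical.

```python
import datetime
import os
import shelve
import tempfile


def check_and_add(shelf, text):
    """Return the previous last-accessed datetime, or None if text is new."""
    previous = shelf.get(text)             # looks up this one key only
    shelf[text] = datetime.datetime.now()  # add the string or refresh its timestamp
    return previous


# Hypothetical file location for the example.
path = os.path.join(tempfile.mkdtemp(), "seen_strings")

with shelve.open(path) as shelf:
    first = check_and_add(shelf, "some input string")   # string is new
    second = check_and_add(shelf, "some input string")  # string is now known

# Reopening the file shows that the data persisted.
with shelve.open(path) as shelf:
    still_there = "some input string" in shelf

print(first, second, still_there)
```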

I'm not sure how a shelf would perform compared to a database,
but it's a lot simpler to manage.

> Would it be wiser to keep these Strings in an indexed column
> of the DB or would it be better to keep these strings as filenames
> on the filesystem in a folder hierarchy of some sort.

I'd definitely go for the database approach if not using shelve.

> Please also bear in mind the time required to insert the
> strings (e.g. I tried using a database but found the insertion
> time to be very high once I indexed the particular column).

That's common, so I'd suggest not indexing. It's the rebuild of the
index that takes the time. Or if you can break the strings into
categories to reduce the size of the tables that would help.
But that depends on how easy it is to categorise the strings
such that you know where to insert/search.
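
One way the categorising idea could look, as a rough sketch only: it assumes SQLite through the standard sqlite3 module and, in place of a meaningful category, simply picks a table from a checksum of the string. The bucket count, table names and function names are all made up for the example.

```python
import sqlite3
import zlib

BUCKETS = 16  # hypothetical number of category tables


def table_for(text):
    """Pick the table a string belongs to from a checksum of its bytes."""
    return "strings_%02d" % (zlib.crc32(text.encode("utf-8")) % BUCKETS)


def open_store(path=":memory:"):
    """Create one small unindexed table per bucket."""
    conn = sqlite3.connect(path)
    for n in range(BUCKETS):
        conn.execute(
            "CREATE TABLE IF NOT EXISTS strings_%02d "
            "(value TEXT, last_access REAL)" % n)
    return conn


def seen_before_sql(conn, text, now):
    """Return True if text was already stored; record the access time either way."""
    table = table_for(text)  # generated name, never user input
    row = conn.execute(
        "SELECT 1 FROM %s WHERE value = ? LIMIT 1" % table, (text,)).fetchone()
    if row is None:
        conn.execute(
            "INSERT INTO %s (value, last_access) VALUES (?, ?)" % table,
            (text, now))
    else:
        conn.execute(
            "UPDATE %s SET last_access = ? WHERE value = ?" % table,
            (now, text))
    return row is not None


conn = open_store()
first = seen_before_sql(conn, "some input string", 1.0)
second = seen_before_sql(conn, "some input string", 2.0)
conn.commit()
print(first, second)
```

Without an index each lookup still scans a table, but only the one bucket the string maps to, which is roughly 1/16 of the data here.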

Also consider using the dbm family of modules; for simple
data access they often outperform a full SQL database.
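
A minimal sketch of the dbm approach, assuming the current standard-library dbm package and a timestamp stored as text for the last-accessed value. The function name seen_before and the file name are hypothetical.

```python
import dbm
import os
import tempfile
import time


def seen_before(db, text):
    """Return True if text was already stored; record the access time either way."""
    key = text.encode("utf-8")  # dbm keys and values are bytes
    existed = key in db
    db[key] = repr(time.time()).encode("ascii")
    return existed


# Hypothetical file location for the example.
path = os.path.join(tempfile.mkdtemp(), "strings_dbm")

with dbm.open(path, "c") as db:  # "c" creates the file if it does not exist
    first = seen_before(db, "hello world")
    second = seen_before(db, "hello world")

print(first, second)
```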

HTH,
-- 
Alan Gauld
Author of the Learn to Program web site
http://www.freenetpages.co.uk/hp/alan.gauld 

