[Tutor] Filesystem vs Database vs Lucene

Alan Gauld alan.gauld at btinternet.com
Mon Mar 26 23:24:01 CEST 2007


"Shitiz Bansal" <shitizb at yahoo.com> wrote

> I need to implement a system which stores Strings (average length
> 50 chars). For every input String it would need to tell the user
> whether that string already exists in the system. It would also
> need to add that input String to the system if it did not exist.

Sounds like a job for a dictionary, except...

> It will also be useful to know the last accessed datetime value of 
> that string.

That can be done with a bit of effort.
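
For the in-memory case, a minimal sketch of what I mean, assuming
Python 3 (the names seen and check_and_add are made up for
illustration, not from any library):

```python
from datetime import datetime

# Maps each stored string to the datetime it was last accessed.
seen = {}

def check_and_add(s, now=None):
    """Return True if s was already stored, False if it is new.

    Either way the string ends up stored with a fresh access time.
    """
    now = now or datetime.now()
    existed = s in seen
    seen[s] = now
    return existed
```

A lookup and an insert are each a single dictionary operation, so
the "bit of effort" is really just storing the timestamp as the value.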

> The number of strings is in millions and I also need persistence
> so keeping all Strings in memory is not an option.

10 million x 50 chars = 500MB. So if you have a Gig of RAM
and not much else running on the machine memory might still
be a valid option... but if not...

This rules out a normal dictionary, but what about a shelf?
Have a look at the shelve module, it makes a file look a lot like
a dictionary. It should solve your problem. And you can store
either a string or a string/date tuple.

I'm not sure how a shelf would perform compared to a database,
but it's a lot simpler to manage.
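
Something along these lines, as a rough sketch assuming Python 3
(the file name strings_shelf and the helper check_and_add are
invented for the example; I use a temporary directory so it can be
run anywhere):

```python
import os
import shelve
import tempfile
from datetime import datetime

def check_and_add(shelf, s):
    """Return True if s was already in the shelf, False if it is new.

    The value stored against each string is its last access time.
    """
    existed = s in shelf
    shelf[s] = datetime.now()
    return existed

# Hypothetical location for the shelf file(s).
path = os.path.join(tempfile.mkdtemp(), "strings_shelf")

with shelve.open(path) as shelf:
    first = check_and_add(shelf, "hello world")   # new string
    second = check_and_add(shelf, "hello world")  # seen before

# Reopen the shelf to show that the data persisted on disk.
with shelve.open(path) as shelf:
    persisted = "hello world" in shelf
    last_access = shelf["hello world"]
```

Shelf keys must be strings, which suits this problem, and the values
can be anything picklable, so a datetime or a string/date tuple both
work.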

> Would it be wiser to keep these Strings in an indexed column
> of the DB or would it be better to keep these strings as filenames
> on the filesystem in a folder hierarchy of some sort.

I'd definitely go for the database approach if not using shelve.

> Please also bear in mind the time required to insert the
> strings (e.g. I tried using a database but found the insertion
> time to be very high once I indexed the particular column).

That's common, so I'd suggest not indexing. It's the rebuild of the
index that takes the time. Or if you can break the strings into
categories to reduce the size of the tables that would help.
But that depends on how easy it is to categorise the strings
such that you know where to insert/search.
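
One way the categorising idea might look, purely as a sketch. All
the specifics here are my assumptions: Python 3, sqlite3 as the
database, 16 tables, and a CRC32 of the string as the category. If
your strings have a natural category you would use that instead.

```python
import sqlite3
import zlib
from datetime import datetime

NUM_BUCKETS = 16  # hypothetical number of tables, pick to suit your data

def bucket_table(s):
    """Pick a table name for s using a hash that is stable between runs."""
    return "strings_%02d" % (zlib.crc32(s.encode("utf-8")) % NUM_BUCKETS)

def create_tables(conn):
    # Unindexed tables; each holds only a fraction of the strings.
    for i in range(NUM_BUCKETS):
        conn.execute(
            "CREATE TABLE strings_%02d (value TEXT, last_access TEXT)" % i)

def check_and_add(conn, s):
    """Return True if s was already stored, False if it was just added."""
    table = bucket_table(s)
    now = datetime.now().isoformat()
    row = conn.execute(
        "SELECT 1 FROM %s WHERE value = ? LIMIT 1" % table, (s,)).fetchone()
    if row is None:
        conn.execute(
            "INSERT INTO %s (value, last_access) VALUES (?, ?)" % table,
            (s, now))
        return False
    conn.execute(
        "UPDATE %s SET last_access = ? WHERE value = ?" % table, (now, s))
    return True

conn = sqlite3.connect(":memory:")  # in-memory database, for the demo only
create_tables(conn)
```

Because the table is computed from the string itself, the same
function tells you where to insert and where to search, and each
search only scans one small table.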

Also consider using the dbm family of modules; for simple
data access they often outperform a full SQL database.
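
A minimal sketch of the dbm approach, assuming Python 3 where the
family lives under a single dbm package (the file name strings_dbm
and the helper check_and_add are invented for the example):

```python
import dbm
import os
import tempfile
import time

def check_and_add(db, s):
    """Return True if s was already stored, False if it is new.

    dbm stores bytes, so the string is encoded and the last access
    time is kept as a text timestamp (seconds since the epoch).
    """
    key = s.encode("utf-8")
    existed = key in db
    db[key] = str(time.time()).encode("ascii")
    return existed

# Hypothetical location for the database file(s).
path = os.path.join(tempfile.mkdtemp(), "strings_dbm")

with dbm.open(path, "c") as db:  # "c" creates the file if needed
    first = check_and_add(db, "hello world")
    second = check_and_add(db, "hello world")

# Reopen read-only to show that the data persisted on disk.
with dbm.open(path, "r") as db:
    persisted = b"hello world" in db
    last_access = float(db[b"hello world"].decode("ascii"))
```

Unlike shelve there is no pickling involved, so keys and values are
plain bytes and you do any conversion yourself.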

HTH,
-- 
Alan Gauld
Author of the Learn to Program web site
http://www.freenetpages.co.uk/hp/alan.gauld 



