Need design advice. What's my best approach for storing this data?

Mudcat mnations at gmail.com
Fri Mar 17 12:08:03 EST 2006


Hi,

I am trying to build a tool that analyzes stock data. Therefore I am
going to download and store quite a vast amount of it. Just for a
general number - assuming there are about 7000 listed stocks on the two
major markets plus some extras, 255 tradying days a year for 20 years,
that is about 36 million entries.

Obviously a database is a logical choice for that. However I've never
used one, nor do I know what benefits I would get from using one. I am
worried about speed, memory usage, and disk space.

My initial thought was to put the data in large dictionaries and shelve
them (and possibly zipping them to save storage space until the data is
needed). However, these are huge files. Based on ones that I have
already done, I estimated at least 5 gigs for storage this way. My
structure for this files was a 3 layered dictionary.
[Market][Stock][Date](Data List). That allows me to easily access any
data for any date or stock in a particular market. Therefore I wasn't
really concerned about the organizational aspects of a db since this
would serve me fine.

But before I put this all together I wanted to ask around to see if
this is a good approach. Will it be faster to use a database over a
structured dictionary? And will I get a lot of overhead if I go with a
database? I'm hoping people who have dealt with such large data before
can give me a little advice.

Thanks ahead of time,
Marc




More information about the Python-list mailing list