Memory Error while constructing Compound Dictionary

Benjamin Scott mynewjunkaccount at hotmail.com
Tue Sep 7 23:46:18 EDT 2004


Thanks for the replies.

First I will make a minor correction to the code I originally posted
and then I will describe the original problem I am trying to solve,
per Alex's request.

Correction:

for s in Lst:
    for t in nuerLst:
        for r in nuestLst:
            Dict[s][t][r] = {}

...should actually be...

for s in Lst:
    for t in nuerLst:
        for r in nuestLst:
            Dict[s][t][r] = []

That is, the object accessed by 3 keys is a list, not a 4th
dictionary.
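
(For context, the full construction looks roughly like the sketch below --
the intermediate dictionaries have to be created before the innermost
assignment.  Lst, nuerLst and nuestLst are the lists of unique keys from my
first post.  This is the step that tries to allocate 1000 * 250 * 500 =
125 million empty lists, which is where the memory goes.)

Dict = {}
for s in Lst:                      # 1,000 factories
    Dict[s] = {}
    for t in nuerLst:              # 250 dates
        Dict[s][t] = {}
        for r in nuestLst:         # 500 widget types
            Dict[s][t][r] = []     # one (initially empty) list per triple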


The Original Problem:

The data set:  3 columns and at least 100,000 rows; it can run to 1,000,000
rows.

For the purpose of illustration, let's suppose that the first column contains
the names of 1,000 "Factories", i.e. there are 1,000 unique symbols in the
first column.  Likewise, suppose the second column contains a "production
date", or just a date; there are 250 unique dates in the second column.
Finally, suppose the third column contains a description of a "widget type";
there are 500 unique widget descriptions.

*** i.e. each row contains the name of one factory which produced one widget
type on a particular date.  If a factory produced more than one widget on a
given date, that is reflected in the data as an additional row. ***
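
For concreteness, a few made-up rows might look like this (the actual names,
dates and delimiter in my file are different; this is just to show the shape):

FactoryA   2004-06-01   WidgetX
FactoryA   2004-06-01   WidgetX
FactoryB   2004-06-01   WidgetY

Here FactoryA produced two WidgetX's on 2004-06-01, so it appears twice.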

The motivation to construct the mentioned compound dictionary comes
from the fact that I need quick access to the following data sets:

Data Set for Factory #1:
Column#1: time 1, time 2, ... , time 250
Column#2: #widgets, #widgets, ... , #widgets <- same widget types

Data Set for Factory #2:
Column#1: time 1, time 2, ... , time 250
Column#2: #widgets, #widgets, ... , #widgets <- same widget types

.
.
.

Data Set for Factory #1000:
Column#1: time 1, time 2, ... , time 250
Column#2: #widgets, #widgets, ... , #widgets <- same widget types

Note that if the compound dictionary were created, it would be easy to
construct these data sets like so:

File = open('DataSet', 'r')
Lst = File.readlines()
.
.
.
(each line then gets split into its three fields; that step is omitted here)

len(Lst[n]) == 3

Lst[n][0] == "Factory"
Lst[n][1] == "date"
Lst[n][2] == "WidgetType"

for s in Lst:
    Dict[s[0]][s[1]][s[2]].append('1')
.
.
.

len(Dict["Factory"]["date"]["WidgetType"]) == the number of widgets of some
type produced at a Factory on a given date.
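
As a rough sketch, one of the per-factory data sets described above could then
be pulled out with a small helper (factory_series is just a name I'm making up
here, and this assumes the compound dictionary was actually built):

def factory_series(Dict, factory, widget):
    # all 250 dates for this factory, in sorted order
    dates = Dict[factory].keys()
    dates.sort()
    # number of widgets of the given type produced on each date
    counts = [len(Dict[factory][d][widget]) for d in dates]
    return dates, counts

dates, counts = factory_series(Dict, "Factory", "WidgetType")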

The idea here is that I will be graphing a handful of the data sets at
a time; they will then be discarded and a new handful will be
graphed... etc.

What I might attempt next is to construct the required data in R (or
NumPy) since an array object seems better suited for the task. 
However, I'm not sure this will avert the memory error.  So, does
anyone know how to increase the RAM limit for a process?  Other
suggestions are also welcome.
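
If I do try the array route, I imagine it would look something like the sketch
below (untested, and 'numpy' here just stands in for whichever array package I
end up using): map each factory, date and widget symbol to a small integer and
keep a 3-D array of counts.  1000 x 250 x 500 32-bit integers is roughly 500
MB, which at least fits under the 2 GB ceiling.

import numpy

# index maps built from the lists of unique symbols
f_index = dict([(name, i) for i, name in enumerate(Lst)])
d_index = dict([(name, i) for i, name in enumerate(nuerLst)])
w_index = dict([(name, i) for i, name in enumerate(nuestLst)])

counts = numpy.zeros((len(Lst), len(nuerLst), len(nuestLst)), numpy.int32)

for row in rows:    # rows = the parsed (factory, date, widget) triples
    counts[f_index[row[0]], d_index[row[1]], w_index[row[2]]] += 1

# one factory's full time series for one widget type:
series = counts[f_index["Factory"], :, w_index["WidgetType"]]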



-Benjamin Scott




aleaxit at yahoo.com (Alex Martelli) wrote in message news:<1gjroty.tp0ndj1vwu3e7N%aleaxit at yahoo.com>...
> Benjamin Scott <mynewjunkaccount at hotmail.com> wrote:
>    ...
> > len(Lst)=1000
> > len(nuerLst)=250
> > len(nuestLst)=500
> 
> So you want 1000*250*500 = 125 million dictionaries...?
> 
> > Specs:
> > 
> > Python 2.3.4
> > XPpro
> > 4 GB RAM
> > 
> > 
> > Python was utilizing 2.0 GB when the error was generated.  I have
> 
> So you've found out that a dictionary takes at least (about) 16 bytes
> even when empty -- not surprising since 16 bytes is typically the least
> slice of memory the system will allocate at a time.  And you've found
> out that XP so-called pro doesn't let a user program have more than 2GB
> to itself -- I believe there are slight workarounds for that, as in
> costly hacks that may let you have 3GB or so, but it's not going to help
> if you want to put any information in those dictionaries; even a tiny
> amount of info per dict will easily bump each dict's size to 32 bytes
> and overwhelm your 32-bit processor's addressing capabilities (I'm
> assuming you have a 32-bit CPU -- you don't say, but few people use
> 64-bitters yet).
> 
> What problem are you really trying to solve?  Unless you can splurge
> into a 64-bit CPU with an adequate OS (e.g., AMD 64 with a Linux for it,
> or a G5-based Mac) anything requiring SO many gigabytes probably needs a
> radical rethink of your intended architecture/strategy, and it's hard to
> give suggestions without knowing what problem you need to solve.
> 
> 
> Alex


