[Tutor] dictionaries and memory handling

Arild B. Næss arildna at stud.ntnu.no
Mon Feb 26 13:27:13 CET 2007


Thanks a lot for your replies. Using a dbm seems to be a very good  
solution in some cases.

But most of my dictionaries are nested, and since both keys and  
values in the dbm 'dictionaries' have to be strings, I can't  
immediately see how I could get it to work.
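
One possible workaround, sketched here under assumptions rather than taken from the thread: flatten the nested keys into a single string, so that the lookup d[p1][p2][p3][p4] becomes db["p1<TAB>p2<TAB>p3<TAB>p4"]. The helper names make_key and split_key, the tab separator and the demo function are all hypothetical, and the sketch assumes the separator never occurs inside a parameter. It uses the Python 3 module name dbm (the module was called anydbm in Python 2).

```python
import dbm
import os
import tempfile

SEP = "\t"  # assumed separator; it must never occur inside a parameter


def make_key(*params):
    """Join the parameters into one flat string key."""
    return SEP.join(str(p) for p in params)


def split_key(key):
    """Recover the parameters from a flat key (they come back as strings)."""
    return key.split(SEP)


def demo(path):
    """Store two probabilities under flat keys and read one of them back."""
    db = dbm.open(path, "c")  # "c": create the database if it does not exist
    try:
        # Values have to be strings (or bytes) too, so store repr() of the number.
        db[make_key("the", "cat", 3, "sat")] = repr(0.25)
        db[make_key("the", "cat", 3, "ran")] = repr(0.75)
        raw = db[make_key("the", "cat", 3, "sat")]  # comes back as bytes
        return float(raw.decode("utf-8"))
    finally:
        db.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(demo(os.path.join(tmp, "probs")))
```

The cost of this design is that numeric parameters come back as strings and have to be converted again by the caller.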


A bit more detail: I deal with conditional probabilities with up to
4 parameters. These parameters are numbers or words, and together they
determine the value (which is always a number). For example, I have a
dictionary {p1: {p2: {p3: {p4: value}}}}, where the p's are the
different parameters. I sometimes need to sum over one or more of the
parameters. For now I have managed to structure the dictionaries so
that I only need to sum over the innermost parameter, although this
has been a bit cumbersome.
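
To illustrate the summing problem, here is a minimal sketch of an alternative layout, not the structure described above: a flat dictionary keyed by the tuple (p1, p2, p3, p4) lets you sum over any parameter, not only the innermost one. The function name marginalize, the keep argument and the sample table are made up for the example.

```python
from collections import defaultdict


def marginalize(table, keep):
    """Sum a flat {(p1, p2, p3, p4): value} table over all positions not in keep.

    keep is a tuple of the positions (0-based) to retain in the result keys.
    """
    totals = defaultdict(float)
    for params, value in table.items():
        reduced = tuple(params[i] for i in keep)
        totals[reduced] += value
    return dict(totals)


# Hypothetical sample data; the values are chosen to be exact in binary.
table = {
    ("a", "x", 1, "u"): 0.5,
    ("a", "x", 2, "u"): 0.25,
    ("a", "y", 1, "v"): 0.125,
    ("b", "x", 1, "u"): 0.125,
}

# Sum over p2 and p3, keeping p1 (position 0) and p4 (position 3).
by_p1_p4 = marginalize(table, keep=(0, 3))

# Sum over everything except p1.
by_p1 = marginalize(table, keep=(0,))
```

Each call makes one pass over the whole table, so this trades the cumbersome nesting for a full scan per summation.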

regards,
Arild Næss


Forwarded message:
> From: "Arild B. Næss" <arildna at stud.ntnu.no>
> Date: 23 February 2007 18:30:40 GMT+01:00
> To: tutor at python.org
> Subject: [Tutor] dictionaries and memory handling
> Delivered-To: tutor at bag.python.org
>
> Hi,
>
> I'm working on a Python script for a task in statistical language
> processing. Briefly put, it all boils down to counting different
> things in very large text files, doing simple computations on these
> counts and storing the results. I have been using Python's dictionary
> type as my basic data structure for storing the counts. This has been
> a nice and simple solution, but it turns out to be a bad idea in the
> long run, since the dictionaries become _very_ large and cause
> MemoryErrors when I try to run my script on texts of a certain size.
>
> It seems that an SQL database would probably be the way to go, but I
> am a bit concerned about speed issues (even though running time is
> not all that crucial here). In any case it would probably take me a
> while to get a database up and running and I need to hand in some
> preliminary results pretty soon, so for now I think I'll postpone the
> SQL and try to tweak my current script to be able to run it on
> slightly longer texts than it can handle now.
>
> So, enough beating around the bush, my questions are:
>
> - Will the dictionaries take up less memory if I use numbers rather
> than words as keys (i.e. will {3:45, 6:77, 9:33} consume less memory
> than {"eloquent":45, "helpless":77, "samaritan":33} )? And if so:
> Slightly less, or substantially less memory?
>
> - What are common methods to monitor the memory usage of a script?
> Can I add a snippet to the code that prints out how many MBs of
> memory a certain dictionary takes up at that particular time?
>
> regards,
> Arild Næss
