nntplib, huge xover object

carroll at tjc.com
Thu Apr 3 03:15:37 EST 2003


On Tue, 1 Apr 2003 23:52:52 -0500, David Sfiligoi
<webmaster at quanta1.world--vr.com> wrote:

>I built a small script that uses the xover function in the nntplib module.
>The problem I came across is that xover can return a huge tuple when there
>are thousands of articles in a newsgroup (which is frequent):
>
>testxover_resp, testxover_subs = s.xover(start, end)
>
>On my system it's not an issue since I have 768 MB of RAM... but I have to
>believe there is a way to optimise this while keeping it all simple.
>
>How can I limit the amount of memory xover takes? The other day I did an
>xover of a huge group and the Python process was taking about 650 MB of
>resident memory.

This is a tough one; the problem is, if you were issuing the XOVER
command to the NNTP server yourself, you could handle the response
line by line as it comes in.  But nntplib does that on your behalf and
hands you the whole tuple, filled with all the headers.
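
For what it's worth, the NNTP class does expose the lower-level pieces
it uses internally (putcmd, getresp, getline), so you could stream the
reply yourself.  A rough sketch; the server name, group, and handle()
function are placeholders:

import nntplib

def handle(fields):
    # placeholder: do something with one overview line
    print fields[0], fields[1]              # article number, subject

s = nntplib.NNTP('news.example.com')        # placeholder server
resp, count, first, last, name = s.group('comp.lang.python')

# Issue XOVER ourselves and read the multi-line reply incrementally,
# instead of letting nntplib buffer every line into one big list.
s.putcmd('XOVER %s-%s' % (first, last))
resp = s.getresp()                          # e.g. '224 Overview follows'
while 1:
    line = s.getline()
    if line == '.':                         # a lone dot ends the reply
        break
    if line[:2] == '..':
        line = line[1:]                     # undo the protocol's dot-stuffing
    handle(line.split('\t'))                # number, subject, poster, date, ...

s.quit()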

So as long as you're using nntplib instead of rolling your own (and I
think using nntplib is the way to go), you're going to have the
constraint that you need the resources to handle that big tuple.
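
One way to live with that constraint is to ask for the range in
slices, so no single tuple gets huge.  Something like this (again, the
server name, group, and chunk size are placeholders):

import nntplib

CHUNK = 500                                 # articles per XOVER call; tune it

s = nntplib.NNTP('news.example.com')        # placeholder server
resp, count, first, last, name = s.group('comp.lang.python')

art = int(first)
while art <= int(last):
    top = min(art + CHUNK - 1, int(last))
    # xover returns at most CHUNK overview tuples per call
    resp, overviews = s.xover(str(art), str(top))
    for (number, subject, poster, date, msgid,
         references, size, lines) in overviews:
        pass                                # handle one article's overview here
    art = top + 1

s.quit()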

>Then I go on and put the article numbers and subjects into a massive
>dictionary, which I agree is not a very intelligent way to do this... but
>right now it works. I will later get rid of this huge dictionary and
>replace it with something more memory efficient.

I'm doing something similar on a smaller scale, with the Unicode Unihan
table.  I have one program that reads in the entire Unihan.txt file
(about 26 MB), which takes about 15 seconds or so.  I save only the
entries and attributes I need in a big dictionary, and then pickle the
dictionary to a file.  Later, when I run programs against the Unihan
data, the first thing I do is load the pickled file back into a
dictionary, and then run against that.
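
In outline it looks like this (this assumes Unihan.txt's tab-separated
codepoint/field/value layout, and the field names are just examples of
ones you might keep):

import cPickle as pickle                    # cPickle is much faster than pickle

WANTED = ('kDefinition', 'kMandarin')       # example fields; keep what you need

# One-time pass: parse the big file, keeping only the wanted attributes.
table = {}
for line in open('Unihan.txt'):
    if line[:1] == '#' or not line.strip():
        continue                            # skip comments and blank lines
    parts = line.rstrip().split('\t', 2)
    if len(parts) != 3:
        continue
    codepoint, field, value = parts
    if field in WANTED:
        table.setdefault(codepoint, {})[field] = value

pickle.dump(table, open('unihan.pkl', 'wb'), 1)   # 1 = binary pickle format

# Later programs reload the trimmed table in a fraction of the parse time:
table = pickle.load(open('unihan.pkl', 'rb'))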

Mind you, you're still back to having the entire dictionary in memory.

To avoid this, you might want to take a look at the various dbm
modules.  I haven't used them yet, but I'd start at
http://www.python.org/doc/current/lib/module-anydbm.html
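
From the docs, the dbm files act like dictionaries with string keys
and values, but live on disk, so only the entries you touch occupy
memory.  Untested, but it should look something like:

import anydbm

db = anydbm.open('overview.db', 'c')        # 'c' = create the file if needed

# keys and values must be strings, so convert article numbers with str()
db['105533'] = 'Re: nntplib, huge xover object'
print db['105533']

db.close()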

Caveat: I've only been playing with Python for a month or so.  You
may get better responses from some of the veterans.
