[Tutor] Python NNTP Scripts?

Daniel Kinnaer Daniel.Kinnaer@Advalvas.be
Thu, 19 Apr 2001 22:42:26 +0200


Below is a very straightforward script I wrote. It scans the NNTP headers for
a single word and downloads the message body to disk only if that word is
found in the headers.  I must admit that I copied much from the EffBot Guide:
The Standard Python Library by /F :) It is my first 'real' Python program, so
don't be too hard on me :)


##############################################################
# script : NNTP_ScanHeaders_GetBody_on_found.py
# author : Daniel Kinnaer
# ref    : The EffBot Guide: The Standard Python Library by /F
# start  : 26 December 2000
# finish : 28 December 2000
# updates:
#
# purpose: connect to an NNTP Server
#          check all the messages kept on the server for
#          authors named "Daniel". If found, download
#          that message and save it on disk
#
##############################################################

#get libraries (only the NNTP class is needed from nntplib)
from nntplib import NNTP
import string, StringIO, rfc822

#name variables
SERVER="news.skynet.be"
GROUP="comp.lang.python"
KEYWORD="Daniel"
THEPATH="C:/Data/Py/AProject/NNTP/NNTPBodies/" #Slash, not BackSlash!

print " "

#connect to server (announce it first, then open the connection)
print "### Connecting to " + SERVER + "..."
server=NNTP(SERVER)

#select a newsgroup
print "### reading group " + GROUP + "..."
resp,count,first,last,name=server.group(GROUP)
print "count=%s  first=%s  last=%s " %(count,first,last)

#get every item from id=first to id=last
resp, items = server.xover(first,last)

#search every item for the KEYWORD
print ("### now searching for keyword " + KEYWORD +
       " in each article from group " + name)
for id,subject,author,date,message_id,references,size,lines in items:
    if string.find(author,KEYWORD)>=0:
        resp,id,message_id,text=server.article(id)

        #save headers and body
        artikellijnen=str(len(text))
        FileNaam=THEPATH + str(id)+'.txt'
        f=open(FileNaam,'w')
        print "writing %s lines to file %s "  %(artikellijnen,FileNaam)

        #write author, subject and line count at the top of the file
        f.write(author+'\n')
        f.write(subject+'\n')
        f.write(artikellijnen+'\n')

        #parse the article: rfc822.Message reads the headers and
        #leaves message.fp positioned at the start of the body
        text=string.join(text,"\n")
        file=StringIO.StringIO(text)
        message=rfc822.Message(file)
        for k,v in message.items():
            f.write(k+"="+v+'\n')
        f.write(message.fp.read())
        #plain "\n": text mode already converts it to "\r\n" on Windows
        f.write("\n")
        f.close()
#close the connection to the server
server.quit()
print "End of Script..."


How can this be improved with regular expressions ('regex')?  I'd like to scan
several parts of the header (for example: must contain A and B or C). What is
the best way to do this?
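
To make the question concrete, here is a rough sketch of the kind of test I
have in mind, using only the standard re module. It reads "A and B or C" as
"the author matches A, and the subject matches B or C", which is only one
possible reading. The patterns, the sample data and the names AUTHOR_RE,
SUBJECT_RE and wanted() are all made up for illustration.

```python
import re

# Hypothetical patterns: replace them with the words you care about.
AUTHOR_RE = re.compile(r"daniel", re.IGNORECASE)             # condition A
SUBJECT_RE = re.compile(r"\b(nntp|regex)\b", re.IGNORECASE)  # condition B or C


def wanted(subject, author):
    """Return True if the author matches A and the subject matches B or C."""
    return bool(AUTHOR_RE.search(author) and SUBJECT_RE.search(subject))


# Made-up (subject, author) pairs standing in for the overview data.
samples = [
    ("Python NNTP Scripts?", "Daniel K <daniel@example.invalid>"),
    ("Tkinter question", "Daniel K <daniel@example.invalid>"),
    ("regex help", "Someone Else <someone@example.invalid>"),
]

# Keep only the entries that pass both conditions.
matches = [(subject, author) for subject, author in samples
           if wanted(subject, author)]
```

In the script, the line that tests the author field would then become
something like "if wanted(subject, author):". An "or" inside a single header
field can live in one pattern with "|", while an "and" across different fields
is easier to read as separate searches joined with Python's own "and".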

Hope to read you soon.  Best regards,  Daniel...