[Tutor] Python NNTP Scripts?
Daniel Kinnaer
Daniel.Kinnaer@Advalvas.be
Thu, 19 Apr 2001 22:42:26 +0200
Below is a very straightforward script that scans the NNTP headers for a
single word and downloads the message body to disk only if that word is
found in the headers. I must admit that I copied much of it from the EffBot
Guide: The Standard Python Library by /F :) It is my first 'real' Python
program, so don't be too hard on me :)
##############################################################
# script : NNTP_ScanHeaders_GetBody_on_found.py
# author : Daniel Kinnaer
# ref : The EffBot Guide: The Standard Python Library by /F
# start : 26 December 2000
# finish : 28 December 2000
# updates:
#
# purpose: connect to an NNTP Server
#          check all the messages kept on the server for
#          authors named "Daniel". If found, download
#          that message and save it on disk
#
##############################################################
#get libraries
from nntplib import *
import string, StringIO, rfc822
#name variables
SERVER="news.skynet.be"
GROUP="comp.lang.python"
KEYWORD="Daniel"
THEPATH="C:/Data/Py/AProject/NNTP/NNTPBodies/" #Slash, not BackSlash!
print " "
#connect to server
print "### Connecting to " + SERVER + "..."
server=NNTP(SERVER)
#select a newsgroup
print "### reading group " + GROUP + "..."
resp,count,first,last,name=server.group(GROUP)
print "count=%s first=%s last=%s " %(count,first,last)
#get every item from id=first to id=last
resp, items = server.xover(first,last)
#search every item for the KEYWORD
print "### now searching for keyword " + KEYWORD + " in each article from group " + name
for id,subject,author,date,message_id,references,size,lines in items:
    if string.find(author,KEYWORD)>=0:
        resp,id,message_id,text=server.article(id)
        #save body
        artikellijnen=str(len(text))
        FileNaam=THEPATH + str(id)+'.txt'
        f=open(FileNaam,'w')
        print "writing %s lines to file %s " %(artikellijnen,FileNaam)
        #first the author, subject and line count from the overview
        f.write(author+'\n')
        f.write(subject+'\n')
        f.write(artikellijnen+'\n')
        #parse the article into headers and body
        text=string.join(text,"\n")
        file=StringIO.StringIO(text)
        message=rfc822.Message(file)
        #write every header as key=value
        for k,v in message.items():
            f.write(k+"="+v+'\n')
        #what is left in the file object is the message body
        lijnen=message.fp.read()
        f.write(lijnen)
        f.write("\r\n")
        f.close()
print "End of Script..."
How can this be improved with regular expressions ('regex')? I'd like to
test several parts of the header at once (for example, it must contain
A and B or C). What is the best way to do this?
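
To make the question concrete, here is a minimal sketch of the kind of filter meant, not a definitive answer. It assumes that "A and B or C" reads as "A and (B or C)", that A is tested against the author field and B/C against the subject field, and that the overview data has already been fetched. The names AUTHOR_RE, SUBJECT_RE, wanted and sample_items, as well as the patterns and the sample data, are made up for illustration.

```python
import re

# Hypothetical patterns, chosen only for illustration.
AUTHOR_RE = re.compile(r"\bDaniel\b", re.IGNORECASE)          # condition A
SUBJECT_RE = re.compile(r"\b(nntp|regex)\b", re.IGNORECASE)   # condition B or C

def wanted(author, subject):
    """Return True when the author matches A and the subject matches B or C."""
    return bool(AUTHOR_RE.search(author)) and bool(SUBJECT_RE.search(subject))

# Made-up (id, subject, author) triples standing in for the overview items.
sample_items = [
    ("101", "Python NNTP Scripts?", "Daniel Kinnaer"),
    ("102", "Tkinter question", "Daniel Kinnaer"),
    ("103", "regex help", "Someone Else"),
]

# Keep only the ids whose header fields pass the combined test.
hits = [art_id for art_id, subject, author in sample_items
        if wanted(author, subject)]
print(hits)
```

In the script above, a test like wanted(author, subject) would take the place of the string.find(author,KEYWORD)>=0 check. Keeping one compiled pattern per header field makes it easy to change the and/or combination later without touching the patterns themselves.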
Hope to read you soon. Best regards, Daniel...