URGENT: REALLY NEED HELP: Feel Helpless
dont bother
dontbotherworld at yahoo.com
Tue Mar 9 04:25:05 EST 2004
Hey Friends,
I am stuck. I have to finish this class project. I
tried out Python in the few days I had, and now I feel
a bit nervous about it, since I have to complete the
project in the middle of finals. The only problem I
have is generating feature vectors for spam
classification; the rest I can do myself with a C
engine.
Here's the Python code for generating a dictionary and
a vector. My problem is that when a new email arrives,
I have to parse it, remove the HTML tags, and compare
the words in the payload with the words in the
dictionary (which my program is doing).
If there is a match, I want the exact index of the
word in the dictionary. That is the only part I have
to figure out; the rest is not so difficult.
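For the "parse it and remove HTML tags" step, here is a minimal sketch, written for Python 3 (unlike the Python 2 scripts in this post). It walks every part of the message, keeps only text/plain and text/html parts, strips tags with the standard library's HTMLParser, and then keeps lowercase a-z words, roughly the same normalisation as the translate() table in the scripts. The function names (extract_words, strip_html) are hypothetical, and falling back to latin-1 when no usable charset is declared is an assumption, not something the original code does. One caveat: HTMLParser still passes the contents of <script> and <style> blocks to handle_data, so those words are kept.

```python
import email
import re
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Keeps the text between HTML tags and throws the tags away."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def strip_html(html):
    """Return the text of an HTML fragment with the tags removed."""
    collector = _TextCollector()
    collector.feed(html)
    collector.close()  # flush any text still buffered by the parser
    return ' '.join(collector.chunks)


def extract_words(raw_message):
    """Lowercase alphabetic words from the text/plain and text/html parts."""
    msg = email.message_from_string(raw_message)
    texts = []
    for part in msg.walk():  # visits the top-level message and every subpart
        ctype = part.get_content_type()
        if ctype not in ('text/plain', 'text/html'):
            continue  # skip multipart containers, images, attachments
        payload = part.get_payload(decode=True)  # bytes, base64/QP undone
        if payload is None:
            continue
        # Assumption: fall back to latin-1 when no charset is declared.
        charset = part.get_content_charset() or 'latin-1'
        try:
            text = payload.decode(charset, errors='replace')
        except LookupError:  # charset name Python does not recognise
            text = payload.decode('latin-1', errors='replace')
        if ctype == 'text/html':
            text = strip_html(text)
        texts.append(text)
    # Like the translate() table in the scripts: keep a-z only, lowercased.
    return re.findall(r'[a-z]+', ' '.join(texts).lower())
```

Because it uses walk(), this also copes with multipart messages, where get_payload() without decode=True returns a list of parts rather than a string.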
I am listing my previous email here, and I will be
really grateful if someone can help me get around
this.
I promise myself to become a big Python player in the
days ahead.
Thanks
Dont
----------------------------------------------------
# python code for creating dictionary of words from an input file
import sys
import email

fp = open(sys.argv[1], 'r')
msg = email.message_from_file(fp)
msg = msg.get_payload()
dictpos = {}
wordcount = {}
# get rid of anything that isn't a letter, and make it all lowercase:
lower = ''.join(map(chr, range(97, 123)))
fixed_body = msg.translate(65*' ' + lower + 6*' ' + lower + 133*' ')
#words_in_body = fixed_body.split()
msg = fixed_body.split()
# map each dictionary word (one per line) to its line number
for i, w in enumerate(file('dictionary_index')):
    dictpos[w.strip()] = i
    #print i
    #print w
# count how often each word occurs in the message
for w in msg:
    try:
        wordcount[w] += 1
    except KeyError:
        wordcount[w] = 1
#print wordcount
# print "dictionary index : count" for words found in the dictionary
for w, c in wordcount.iteritems():
    try:
        print dictpos[w], ':', c
    except KeyError:
        pass
#print wordcount
#print dictpos
But this does not give me anything; I get no output at
all, and I don't really understand why. If this code
matches the words in the email message against the
words in the dictionary, then whenever a word matches
it should print the corresponding index.
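Here is a minimal Python 3 sketch of what the first script is aiming for: a {dictionary index: count} map for the message words that appear in the dictionary. The helper names load_dictionary and index_counts are hypothetical. One difference from the first script is worth checking: that script keys the dictionary on whole stripped lines, so it only matches if dictionary_index holds exactly one word per line. This sketch splits on whitespace instead, as the second script does with read().split(), so it works either way. With one word per line, the index is the line number; with several words per line, it is the word's position in the file. If a word is listed twice, the sketch keeps its first position, whereas the second script keeps the last.

```python
from collections import Counter


def load_dictionary(lines):
    """Map each dictionary word to its 0-based position in the dictionary.

    Splitting on whitespace handles one word per line or several per line;
    if a word appears more than once, its first position is kept.
    """
    dictpos = {}
    for position, word in enumerate(' '.join(lines).split()):
        dictpos.setdefault(word, position)
    return dictpos


def index_counts(message_words, dictpos):
    """Return {dictionary index: count} for the message words in the dictionary."""
    counts = Counter(w for w in message_words if w in dictpos)
    return {dictpos[w]: c for w, c in counts.items()}
```

With the real file, this would look something like `dictpos = load_dictionary(open('dictionary_index'))`, followed by printing `sorted(index_counts(words, dictpos).items())`, where `words` is the split message body.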
I have another piece of code that does check for
matches, but, as I mentioned, the problem is that I
need the index of the word in the dictionary, not the
index of the word in the message.
Here's the code that gives me the vector by comparing
the words in the email message with the words in the
dictionary:
import sys
import email

# load up external dictionary:
words = open('dictionary_index', 'r').read().split()
dct = {}
for i, w in enumerate(words):
    dct[w] = i
print dct.values()
# make vector:
vector = {}
fp = open(sys.argv[1], 'r')
msg = email.message_from_file(fp)
msg = msg.get_payload()
#a = float(len(fp))
#a = float(len(words_in_body))
# get rid of anything that isn't a letter, and make it all lowercase:
lower = ''.join(map(chr, range(97, 123)))
fixed_body = msg.translate(65*' ' + lower + 6*' ' + lower + 133*' ')
#words_in_body = fixed_body.split()
msg = fixed_body.split()
a = float(len(msg))
print a
# count the message words that are in the dictionary
for i in msg:
    if i in dct:
        try:
            vector[i] += 1
        except KeyError:
            vector[i] = 1
# normalise by message length; v counts positions within vector, not indices in dct
for v, i in enumerate(vector):
    vector[i] /= a
    print v, i, vector[i]
    # if you want to see the word too that was common:
    #print v, ":", vector[i]
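In the second script, `enumerate(vector)` numbers the entries of `vector` itself, so `v` is not a dictionary index; looking the word up in `dct` gives that index. A minimal Python 3 sketch that keys the vector on the dictionary index and keeps the script's normalisation (count divided by the total number of words in the message) could look like the code below. The names feature_vector and format_vector are hypothetical, and the space-separated "index:value" output is only an assumption about what the C engine might read, modelled on the "index : count" lines the first script prints.

```python
from collections import Counter


def feature_vector(message_words, dct):
    """Sparse vector {dictionary index: relative frequency}.

    As in the original script, each count is divided by the total number of
    words in the message, including words that are not in the dictionary.
    """
    total = len(message_words)
    if total == 0:
        return {}
    counts = Counter(w for w in message_words if w in dct)
    return {dct[w]: c / total for w, c in counts.items()}


def format_vector(vector):
    """One 'index:value' pair per feature, sorted by dictionary index."""
    return ' '.join('%d:%g' % (i, v) for i, v in sorted(vector.items()))
```

Sorting by index keeps the output stable from one email to the next, which is usually convenient when another program reads it.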
-----------------------------------------------
More information about the Python-list mailing list