beginner's question

Sean Ross sross at connectmail.carleton.ca
Fri May 14 13:24:31 EDT 2004


"Hadi" <hadi at nojunk.com.au> wrote in message
news:c82s64$f66$1 at lust.ihug.co.nz...
> Sean,
>
> I appreciate your help. Your script works fine. I have one last
> question.
> What do I have to modify in the script so that it will work for more
> than one line?
> File-1 has about 500 lines.
> This is for my Machine Learning assignment, which I am using for a Naive
> Bayes classifier for spam mail classification.
> Thank you again; I would very much appreciate your help once more.
>
> Regards,
> Halit
>
[snip]

Well, you have a file (file-1), which you now know how to open.

src = open("file-1.txt")

There are a few ways to proceed, but I'll choose one and stick with that.
You'll want to go through the file, one line at a time (it's possible to
read in the entire file and process it that way, but we'll skip that for
now). You can go through a file one line at a time using a for loop:

for line in src:
    # ... do stuff with line ...
    pass

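A minimal, self-contained sketch of that loop follows. It uses `io.StringIO` as a stand-in for file-1.txt so it runs without a file on disk, and the sample contents are made up; with a real file you would loop over the object returned by `open()` in exactly the same way. Note that each `line` keeps its trailing newline.

```python
import io

# Hypothetical stand-in for the real input file, so the example runs anywhere.
src = io.StringIO("free,money,now\nmeeting,at,noon\n")

lines_seen = []
for line in src:
    # Each line arrives with its trailing "\n" still attached.
    lines_seen.append(line)

print(lines_seen)  # ['free,money,now\n', 'meeting,at,noon\n']
```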

Now, in the code I posted earlier I used a list comprehension where I read
one line of the file and extracted the words (storing them in a list called
words).

words = [w.strip() for w in src.readline().split(',')]
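
To see what that comprehension does, here it is applied to a single sample line held in a string. The sample text and the helper name `extract_words` are illustrative, not taken from the earlier post. `split(',')` breaks the line at each comma, and `strip()` removes the surrounding spaces, including the trailing newline.

```python
def extract_words(line):
    """Split one comma-separated line into a list of cleaned-up words."""
    # split(',') cuts at each comma; strip() trims spaces and the trailing newline.
    return [w.strip() for w in line.split(',')]

print(extract_words("free, money ,now\n"))  # ['free', 'money', 'now']
```

A blank line ("\n") comes back as [''], so you may want to skip empty strings.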


With the for loop, each line has already been read for you, so you no longer
need src.readline(), but you still need to process the line to extract the
words. I think you should be able to figure out how to do that, so I won't
show the code changes here. Once you have the list of
words for that line, you need to see whether they are in your lexicon and
write True or False to your output file (file-2.txt, in this case). The code
I provided earlier gives an example of how to do that. If you need the
output to be in a certain format, other than what the example shows, you
should be able to figure out how to get that done as well.
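
The lexicon check itself can be sketched as below. The earlier code is not reproduced here, so this assumes the lexicon is a Python set of words (membership tests on a set are fast) and that the output is one True/False per word, comma-separated on a single line. The names `flags_line` and `lexicon` and the sample words are hypothetical.

```python
def flags_line(words, lexicon):
    """Return one line of True/False flags, one per word, separated by commas."""
    # 'word in lexicon' evaluates to the bool True or False.
    flags = [str(word in lexicon) for word in words]
    return ",".join(flags) + "\n"

lexicon = {"free", "money", "offer"}  # hypothetical sample lexicon
print(flags_line(["free", "money", "now"], lexicon), end="")  # True,True,False
```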

So, to recap, you'll be using a for loop to go through the input file, one
line at a time. For each line, you will extract the words and store them in
a list. Then, for each word in that list, you will check whether it is in
your lexicon, and write either True or False to your output file. When
you're done, you can close both files explicitly, or let Python take care of
it for you.
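
Putting the recap together, a minimal sketch under the same assumptions (comma-separated words on each line, a set-based lexicon, one output line of True/False flags per input line) might look like this. The function name `classify_file` and the `io.StringIO` stand-ins are illustrative. With real files you would open them in a `with` statement, which closes both files automatically when the block ends.

```python
import io

def classify_file(src, out, lexicon):
    """For each input line, write one line of True/False flags to out."""
    for line in src:
        # Extract the words from this line, as in the earlier list comprehension.
        words = [w.strip() for w in line.split(',')]
        # Check each word against the lexicon and record the results.
        out.write(",".join(str(w in lexicon) for w in words) + "\n")

# With real files this would be:
# with open("file-1.txt") as src, open("file-2.txt", "w") as out:
#     classify_file(src, out, lexicon)
lexicon = {"free", "money", "offer"}  # hypothetical sample lexicon
src = io.StringIO("free,money,now\nmeeting,at,noon\n")
out = io.StringIO()
classify_file(src, out, lexicon)
print(out.getvalue(), end="")
```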

That should be enough to get you started. If you have more difficulties,
post your code, and people will help point you in the right direction.

Good luck,
Sean