Building a word list from multiple files

Jeff Shannon jeff at ccvcorp.com
Thu Nov 18 23:41:41 EST 2004


Manu wrote:

>hi,
>
>>1) How large are the files you are reading (e.g. can they
>>fit in memory)?
>
>The files are email messages.
>I will be using the builtin email module to extract only the content
>whose type is plain text or HTML. So no line-by-line processing is
>possible unless I write my own parser for email.
>

The email package can do that parsing for you -- it's not too difficult 
to feed it a raw message file and get back only the text and/or html 
payload.
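As a rough sketch of what that might look like (the function name 
extract_text_parts is made up for illustration, and the UTF-8 fallback 
for parts that declare no charset is an assumption, not something the 
email package requires):

```python
import email


def extract_text_parts(raw_bytes):
    """Return a list of (content_type, text) pairs for the
    text/plain and text/html parts of a raw email message."""
    msg = email.message_from_bytes(raw_bytes)
    parts = []
    # walk() visits the message itself and every nested sub-part.
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers carry no text of their own
        ctype = part.get_content_type()
        if ctype not in ("text/plain", "text/html"):
            continue
        # decode=True undoes base64 / quoted-printable and returns bytes.
        payload = part.get_payload(decode=True)
        if payload is None:
            continue
        # Assumption: fall back to UTF-8 when no charset is declared.
        charset = part.get_content_charset() or "utf-8"
        parts.append((ctype, payload.decode(charset, errors="replace")))
    return parts
```

Reading each message file in binary mode and passing the bytes to this 
function should give you just the text and HTML bodies, with attachments 
skipped. A message that names a charset Python does not know would still 
raise LookupError here, so real mail may need a little more care.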


>>If not, preprocess the files and use shelve to save a
>>dictionary that has already been processed.  When you
>
>This is what I was planning to do. Once the processing is done for a
>set of files, they are never processed again. I was going to store the
>dict as a string in a file and then use eval() to get it back.
>

Use the shelve module instead of eval()ing it yourself -- the shelve 
authors have already done all of the hard work for you.  It'll act 
almost like a regular dictionary, but is extremely easy to save to disk 
and reload later.
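Something along these lines might do for the word list. This is only a 
sketch: the function name add_words, the shelf keys "counts" and 
"__done__", and the very simple word-splitting regex are all my own 
choices for illustration, not anything shelve dictates.

```python
import re
import shelve


def add_words(shelf_path, file_id, text):
    """Merge the word counts for `text` into the shelf at `shelf_path`.

    Returns False without doing anything if `file_id` was already
    processed, True otherwise.
    """
    with shelve.open(shelf_path) as shelf:
        done = shelf.get("__done__", [])
        if file_id in done:
            return False  # this file has been counted before
        counts = shelf.get("counts", {})
        # Assumption: a "word" is a run of ASCII letters or apostrophes.
        for word in re.findall(r"[a-z']+", text.lower()):
            counts[word] = counts.get(word, 0) + 1
        # Reassign the keys so the updated objects are written to disk.
        shelf["counts"] = counts
        shelf["__done__"] = done + [file_id]
        return True
```

Calling add_words("wordlist", "msg001.eml", body_text) once per message 
builds up the counts across runs, and a file that was already handled is 
skipped the next time. Keeping the whole dict under one key is the 
simplest thing to do; for a very large vocabulary you might prefer one 
shelf key per word instead.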

This is why Python is called "batteries included".  :)

Jeff Shannon
Technician/Programmer
Credit International



