Help With EOF character and regular expression matching: URGENT
Robert Brewer
fumanchu at amor.org
Sun Feb 22 19:58:22 EST 2004
> I want to strip off the headers, like To, From,
> Return-Path, etc.
Have a look at the 'email' module in the Library.
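As a rough sketch of what that could look like (the message text below is made up for illustration, and this is written for a current Python 3 rather than the interpreter of the day), the 'email' module can parse a raw message so that the body is available separately from the headers:

```python
import email

# Hypothetical raw message, for illustration only.
raw = (
    "Return-Path: <sender@example.com>\n"
    "From: sender@example.com\n"
    "To: reader@example.com\n"
    "Subject: hello\n"
    "\n"
    "This is the body.\n"
)

# Parse the text into a message object; the headers end at the first blank line.
msg = email.message_from_string(raw)

# For a simple (non-multipart) message, get_payload() returns the body as a string.
body = msg.get_payload()
print(body)
```

Headers are still reachable by name if you need them later, e.g. msg["Subject"]. A multipart message would need its parts walked individually, which this sketch does not attempt.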
> and also the characters that are not ASCII and also
> the characters that are between <> so as to avoid HTML
> Tags.
> I have zero experience with regular expressions,
> but if you or someone can give me an idea/snippet I
> think I can make it work.
import re
text = "look ma, <b>no</b> html!"
# Remove anything from a '<' up to the next '>', i.e. simple tags.
cleaned = re.sub(r'<[^>]*>', '', text)
print(cleaned)
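The non-ASCII part of the question can probably be handled the same way. A minimal sketch, assuming Python 3 strings and assuming that "not ASCII" means any character outside the range 0x00-0x7F (the sample text is made up):

```python
import re

# Hypothetical input containing tags and two non-ASCII characters.
text = "caf\xe9 <b>menu</b> na\xefve"

# First drop the tags, as in the snippet above.
no_tags = re.sub(r'<[^>]*>', '', text)

# Then drop every character that is not in the ASCII range.
ascii_only = re.sub(r'[^\x00-\x7f]', '', no_tags)
print(ascii_only)
```

Note that this simply deletes the offending characters rather than replacing them with a close ASCII equivalent, which may or may not be what you want for indexing words.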
> Also, while I can write the extracted words to a file,
> what are the advisable ways to associate them with the
> index? I also want to avoid writing the same word to the
> dictionary twice with different indexes.
Look at the 'sets' module in the Library.
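One possible way to put that together, as a sketch only: in later Python versions (2.4 onward) sets are also available as the built-in set type, which is what this example uses. The word list and the names 'seen' and 'index' are invented for illustration.

```python
# Hypothetical list of extracted words, with repeats.
words = ["spam", "eggs", "spam", "ham", "eggs"]

seen = set()   # words already given an index
index = {}     # word -> index assigned at its first occurrence

for word in words:
    if word not in seen:
        seen.add(word)
        # Use the current dictionary size as the next free index.
        index[word] = len(index)

print(index)
```

Since each word is added only the first time it is seen, no word ends up in the dictionary under two different indexes. A plain "if word not in index" test would work as well; the set is shown only because it is the tool suggested above.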