Help With EOF character: URGENT

dont bother dontbotherworld at yahoo.com
Sun Feb 22 21:22:18 EST 2004


I am trying to build an engine for text classification
with AI techniques, and I am also trying to use it to
classify email.
How can I use an HTML parser with Python?
Any hints?
I agree that HTML documents have lots of attributes and
that parsing them is not easy. I had thought Python
would help, but I am still unsure. I am fairly good
at C, but I switched to Python in case it has some
modules that can help me.
Thanks
Dont
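For the question above, applied to the `jorgeGodoy<b>Tall</b>...` example in the quoted exchange below, a minimal sketch using the standard library's HTML parser might look like this. It assumes Python 3, where the class lives in `html.parser` (in the Python 2 of 2004 the same class was in the `HTMLParser` module), and it assumes the closing tags in the sample are meant to be `</b>`. The names `TextChunkCollector` and `indexed_text` are hypothetical, chosen only for illustration.

```python
from html.parser import HTMLParser


class TextChunkCollector(HTMLParser):
    """Hypothetical helper: collect the text runs between tags, ignoring the tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Called with the text found between tags; skip whitespace-only runs.
        text = data.strip()
        if text:
            self.chunks.append(text)


def indexed_text(html):
    """Hypothetical helper: return (index, text) pairs, numbered from 1."""
    parser = TextChunkCollector()
    parser.feed(html)
    parser.close()  # flush any text still buffered at the end of the input
    return list(enumerate(parser.chunks, start=1))


if __name__ == "__main__":
    sample = "jorgeGodoy<b>Tall</b><b>file</b>"
    for index, text in indexed_text(sample):
        print(index, text)
```

Running it prints `1 jorgeGodoy`, `2 Tall` and `3 file` on separate lines. Overriding `handle_data` leaves the tag and attribute handling to the library, which is the reason to use a parser rather than slicing the string by hand.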
--- Jorge Godoy <godoy at ieee.org> wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> On Sunday 22 February 2004 22:36, you wrote:
> > Can you tell me how to strip the tags from text,
> > for example:
> >
> > jorgeGodoy<b>Tall</b><b>file</b>
> > I want to split this string into:
> >
> > 1 jorgeGodoy
> > 2 Tall
> > 3 file
> >
> > ignoring the HTML tags and associating an index
> > with each of them
> >
> > Thanks a Ton
> > Dont
> 
> You'd better go with an HTML parser or something
> like that. 
> 
> Parsing HTML is not all that easy, especially if it
> has attributes and other stuff.
> 
> As for indexing each word, you might use a
> dictionary. You can even use it to count how many
> times each word appears.
> 
> There's a nice example at diveintopython.org, IIRC.
> 
> 
> Be seeing you,
> - -- 
> Godoy.     <godoy at ieee.org>
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.2.3 (GNU/Linux)
> 
> iD8DBQFAOVysEzC+baSjBiURAnbSAJ42r8NzidJyyMC60Ydn0ok1vcklUQCeK3k/
> 3e3vPeMRQZLA1ht0zR/8Pj0=
> =6yeL
> -----END PGP SIGNATURE-----
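Jorge's suggestion of using a dictionary to index each word, and to count how often it appears, could be sketched as follows. The function names are hypothetical, and the sketch assumes the text has already been extracted from the HTML and split into words (for example with `str.split()`); it does no case folding or punctuation handling.

```python
def index_words(words):
    """Hypothetical helper: map each word to the 1-based positions where it occurs."""
    positions = {}
    for position, word in enumerate(words, start=1):
        # setdefault creates the empty list the first time a word is seen
        positions.setdefault(word, []).append(position)
    return positions


def count_words(words):
    """Hypothetical helper: map each word to the number of times it occurs."""
    counts = {}
    for word in words:
        counts[word] = counts.get(word, 0) + 1
    return counts


if __name__ == "__main__":
    # Assumed input: words already extracted from the HTML and split on whitespace
    words = "Tall file Tall".split()
    print(index_words(words))  # {'Tall': [1, 3], 'file': [2]}
    print(count_words(words))  # {'Tall': 2, 'file': 1}
```

In current Python, `collections.Counter(words)` produces the same counts in one call; the explicit `dict.get` loop is shown because it is the plain-dictionary approach the reply describes.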



More information about the Python-list mailing list