Help With EOF character: URGENT
dont bother
dontbotherworld at yahoo.com
Sun Feb 22 21:22:18 EST 2004
I am trying to build a text classification engine
using AI techniques, and I also want to use it for
email classification.
How can I use an HTML parser with Python?
Any hints?
I agree that an HTML document has a lot of attributes
and that parsing one is not easy. I had thought Python
would help, but I am still wondering. I am comfortable
with C, but I switched in case Python has some
modules that can assist me.
Thanks
Dont
--- Jorge Godoy <godoy at ieee.org> wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> On Sunday 22 February 2004 22:36, you wrote:
> > Can you tell me something for stripping a text
> > for example:
> >
> > jorgeGodoy<b>Tall</b><b>file</b>
> > I want to strip this string in:
> >
> > 1 jorgeGodoy
> > 2 Tall
> > 3 file
> >
> > ignoring the HTML tags and associating an index
> > with each of them
> >
> > Thanks a Ton
> > Dont
>
> You'd better go with an HTML parser or something
> like that.
>
> Parsing HTML is not all that easy, especially if it
> has attributes and other stuff.
>
> With regards to indexing each word, you might use a
> dictionary. You can even
> count how many times a word appears with that.
>
> There's a nice example at diveintopython.org,
> IIRC.
>
>
> Be seeing you,
> - --
> Godoy. <godoy at ieee.org>
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.2.3 (GNU/Linux)
>
> iD8DBQFAOVysEzC+baSjBiURAnbSAJ42r8NzidJyyMC60Ydn0ok1vcklUQCeK3k/
> 3e3vPeMRQZLA1ht0zR/8Pj0=
> =6yeL
> -----END PGP SIGNATURE-----
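
The two suggestions in the quoted reply (use an HTML parser, then
keep the words in a dictionary) might be sketched roughly as below.
This is only an illustration under some assumptions: it targets
Python 3, where the standard parser lives in html.parser (in the
Python 2 releases current for this thread the module was named
HTMLParser), and it assumes the closing tags in the input use the
standard </b> form. The names TextCollector, strip_and_index, and
count_words are made up for this example.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collect the text found between tags; the tags are ignored."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Called by the parser for each run of text between tags.
        text = data.strip()
        if text:
            self.chunks.append(text)


def strip_and_index(markup):
    """Return a list of (index, text) pairs, numbered from 1."""
    parser = TextCollector()
    parser.feed(markup)
    parser.close()  # flush any text still buffered by the parser
    return list(enumerate(parser.chunks, start=1))


def count_words(chunks):
    """Count how many times each whitespace-separated word appears."""
    counts = {}
    for chunk in chunks:
        for word in chunk.split():
            counts[word] = counts.get(word, 0) + 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample, with closing tags assumed to be </b>.
    sample = "jorgeGodoy<b>Tall</b><b>file</b>"
    for index, text in strip_and_index(sample):
        print(index, text)
```

Run as a script, this should print the three pieces of text on
separate lines, numbered 1 to 3. The text returned by
strip_and_index can then be passed to count_words to get the
per-word counts mentioned in the reply.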