Check URL --> Simply? (fwd)

Brian Quinlan BrianQ at ActiveState.com
Thu Aug 16 13:21:25 EDT 2001


> Ever more opportunity at shameless self-promotion.  This
> zillion special cases of 404-ish pages is something I use
> as an example in my forthcoming book _Text Processing in
> Python_ (a few more months until done).  Here's the code
> I present as an attempt at recognizing what only humans
> can:
>
> [code snipped]

Cool. The more books the better. Quick question: why are all of the
probabilities cumulative? For example, did I analyze these scenarios
correctly?

<BODY>
	<H1>404</H1>
</BODY>
=> 0.10 + 0.15 = 0.25

and:

<TITLE>
	Not Found
</TITLE>
=> 0.80 + 0.40 = 1.20
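
To make the arithmetic above concrete, here is a minimal sketch of
cumulative scoring. The snipped code is not reproduced here: the patterns
and weights below (and the names HYPOTHETICAL_WEIGHTS and
cumulative_score) are assumptions, picked only so that the totals match
the two sums in this message.

```python
import re

# Hypothetical (pattern, weight) pairs; NOT the snipped original code.
# The weights are chosen only so the totals match the two scenarios above.
HYPOTHETICAL_WEIGHTS = [
    (re.compile(r"404"), 0.10),
    (re.compile(r"<H1>\s*404\s*</H1>", re.I), 0.15),
    (re.compile(r"Not Found", re.I), 0.80),
    (re.compile(r"<TITLE>\s*Not Found\s*</TITLE>", re.I), 0.40),
]

def cumulative_score(html):
    """Add up the weight of every pattern that matches the page text."""
    return sum(weight for pattern, weight in HYPOTHETICAL_WEIGHTS
               if pattern.search(html))

body_404 = "<BODY>\n\t<H1>404</H1>\n</BODY>"
title_not_found = "<TITLE>\n\tNot Found\n</TITLE>"

# Rounded for display, since floating-point sums carry small errors.
print(round(cumulative_score(body_404), 2))
print(round(cumulative_score(title_not_found), 2))
```

Under these assumed weights the second page totals more than 1.0, so a
plain sum reads more naturally as a score than as a probability.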

Cheers,
Brian






