high performance hyperlink extraction

felipevaldez auditor400 at gmail.com
Tue Sep 13 16:15:51 EDT 2005




Pretty nice. However, you won't capture the increasingly common
JavaScript-driven redirections, like



<b onclick='location.href="http://www.nowhere.com"'>click me</b>

nor

<form action="http://www.yahoo.com">
<input type=submit value="clickme">
</form>

nor

<form action="http://www.yahoo.com" name=x>
<input type=button value="clickme" onclick=document.x.submit()>
</form>

.
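A rough sketch of how those three cases could be picked up, assuming
Python 3's standard html.parser (not whatever the original extractor
uses). The class name LooseLinkParser, the helper extract_loose_links
and the URL_IN_SCRIPT pattern are made up for illustration. It records
form action values and any absolute http(s) URL it finds inside on*
event-handler attributes; it does not try to execute or understand the
JavaScript.

```python
import re
from html.parser import HTMLParser

# Hypothetical pattern: an absolute http(s) URL embedded in script text.
# It stops at whitespace, quotes, angle brackets and closing parens.
URL_IN_SCRIPT = re.compile(r"""https?://[^\s"'<>)]+""")


class LooseLinkParser(HTMLParser):
    """Collect URLs from form actions and on* event handlers."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if value is None:  # attribute present without a value
                continue
            if tag == "form" and name == "action":
                # Covers both the submit-button and the onclick=submit() forms.
                self.links.append(value)
            elif name.startswith("on"):
                # onclick, onmouseover, ...: pull URL-looking substrings.
                self.links.extend(URL_IN_SCRIPT.findall(value))


def extract_loose_links(html):
    parser = LooseLinkParser()
    parser.feed(html)
    parser.close()
    return parser.links
```

This only catches URLs written out literally in the handler; anything
built up at runtime by the script is still missed.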

I'm guessing it also won't correctly handle things like:

<a href='javascript:alert("...")'>click</a>
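One possible way to keep such hrefs out of the results is to check the
URL scheme before accepting a link. This is only a sketch using
urllib.parse.urlsplit from the Python 3 standard library; the name
is_navigable and the set of accepted schemes are assumptions, so adjust
them to taste.

```python
from urllib.parse import urlsplit

# Assumed whitelist; the empty string lets relative URLs through.
NAVIGABLE_SCHEMES = {"", "http", "https", "ftp"}


def is_navigable(href):
    """Return True if href looks like a link a crawler could follow."""
    scheme = urlsplit(href.strip()).scheme.lower()
    return scheme in NAVIGABLE_SCHEMES
```

With this check, the javascript: pseudo-link above is rejected while
ordinary absolute and relative links pass.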


But you probably already knew all this, didn't you?


Well, anyway, my 2 cents: instead of parsing the HTML, just look
for URLs directly, like http://XXXX.XXXXXX.XXX/XXX?xXXx=xXx#x
or something like that.
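One reading of that suggestion is to skip the HTML structure entirely
and scan the raw text for URL-shaped strings. A minimal sketch with the
standard re module follows; URL_PATTERN and find_urls are illustrative
names, and the pattern is deliberately loose (absolute http/https URLs
only, ending at whitespace, quotes or angle brackets).

```python
import re

# Loose, assumed pattern: scheme, then everything up to a delimiter.
# Query strings and fragments (?x=y#z) are kept as part of the match.
URL_PATTERN = re.compile(r"""https?://[^\s"'<>]+""", re.IGNORECASE)


def find_urls(text):
    """Return the distinct URLs found in text, in order of appearance."""
    seen = set()
    urls = []
    for match in URL_PATTERN.finditer(text):
        url = match.group(0)
        if url not in seen:
            seen.add(url)
            urls.append(url)
    return urls
```

The trade-off is that relative links are invisible to this approach,
and URLs that only appear in comments or plain text get picked up too.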


//f3l




More information about the Python-list mailing list