It is difficult to get all the URLs in a page. You can use the sgmllib module to parse HTML files and get the standard href attributes.
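
Here is a minimal sketch of the idea. Note that sgmllib only exists in Python 2 (it was removed in Python 3), so this example swaps in the standard library's html.parser, which uses the same handler-style approach. The names HrefCollector and extract_hrefs are made up for illustration.

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Collects the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # The parser lowercases tag and attribute names;
        # attrs is a list of (name, value) pairs.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.hrefs.append(value)


def extract_hrefs(html_text):
    # Hypothetical helper: parse a string of HTML and return its hrefs.
    parser = HrefCollector()
    parser.feed(html_text)
    parser.close()
    return parser.hrefs


if __name__ == "__main__":
    sample = '<p><a href="http://example.com/">home</a> <a href="/about">about</a></p>'
    for url in extract_hrefs(sample):
        print(url)
```

This only picks up the standard href attributes on anchor tags. URLs that appear in scripts, stylesheets, or plain text will not be found this way, which is part of why getting every URL is hard.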