spider, why isnt it finding the url?

notnorwegian at yahoo.se notnorwegian at yahoo.se
Thu May 22 20:02:35 EDT 2008


this program doesnt produce any output, however i know from testing
that the url-regexp matches urls...

import urllib
import re

site = urllib.urlopen("http://www.python.org")

email = re.compile(r'[\w\-][\w\-\.]+@[\w\-][\w\-\.]+[a-zA-Z]{1,4}')
url = re.compile("^((ht|f)tp(s?)\:\/\/|~/|/)?([\w]+:\w+@)?([a-zA-Z]{1}
([\w\-]+\.)+([\w]{2,5}))(:[\d]{1,5})?((/?\w+/)+|/?)(\w+\.[\w]{3,4})?
((\?\w+=\w+)?(&\w+=\w+)*)?")

for row in site:
    obj = url.search(row)
    if obj != None:
        print obj.group()



More information about the Python-list mailing list