[Tutor] Pictures

naheed arafat naheedcse at gmail.com
Thu Apr 28 17:03:58 CEST 2011


From looking at the page source, I think something like this should work:

    # Python 2 (urllib.urlopen and the print statement)
    import re
    import urllib

    page = urllib.urlopen('http://finance.blog.lemonde.fr').read()

    # Matches single-quoted image sources, e.g.
    # <img src='http://finance.blog.lemonde.fr/filescropped/7642_300_400/2011/04/1157.1301668834.jpg'
    x = re.findall(r"<img\s+src='(\S+)'", page)
    # Matches double-quoted image sources, e.g.
    # <img src="http://s2.lemde.fr/image/2011/02/16/87x0/1480844_7_87fe_bandeau-lycee-electrique.jpg"
    y = re.findall(r'<img\s+src="(\S+)"', page)

    x.extend(y)
    x = list(set(x))  # drop duplicate URLs
    for img in x:
        # keep only URLs whose extension is exactly 'jpg'
        if img.split('.')[-1] == 'jpg':
            print img
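
As a possible variation on the two patterns above, both quote styles could be handled by one regex that uses a backreference to the opening quote. This is only a sketch, not part of the original suggestion: the helper name jpg_sources, the IMG_SRC constant, and the example.com sample URLs are made up for illustration, and unlike the code above it treats the extension case-insensitively. It runs on a literal string so it can be tried without fetching the page.

```python
import re

# Hypothetical helper (not from the original post): a single pattern
# covers both quote styles; \1 must match whichever quote opened src.
IMG_SRC = re.compile(r"""<img\s+src=(['"])(\S+?)\1""")


def jpg_sources(html):
    """Return the unique image URLs ending in .jpg found in html, sorted."""
    urls = set(m.group(2) for m in IMG_SRC.finditer(html))
    # Assumption: compare the extension case-insensitively (.JPG counts too).
    return sorted(u for u in urls if u.lower().endswith('.jpg'))


# Made-up sample markup standing in for the downloaded page.
sample = (
    "<img src='http://example.com/a.jpg'>"
    '<img src="http://example.com/b.png">'
    '<img src="http://example.com/a.jpg">'
    '<img  src="http://example.com/c.JPG" alt="x">'
)
print(jpg_sources(sample))
```

Like the original patterns, this only finds tags where src comes directly after <img, so an HTML parser would likely be more robust on pages that put other attributes first.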