Parsing

John Hunter jdhunter at ace.bsd.uchicago.edu
Thu Jul 10 13:59:19 EDT 2003


>>>>> "Michael" == Michael  <whatsupg21 at hotmail.com> writes:

    Michael> I have been assigned a project to parse a webpage for
    Michael> data using Python. I have finished only basic
    Michael> tutorials. Any suggestions as to where I should go from
    Michael> here? Thanks in advance.

Check out this article

  http://www.unixreview.com/documents/s=7822/ur0302h/


and then search the comp.lang.python archives for "web scraping"

  http://www.google.com/groups?as_q=web%20scraping&safe=off&ie=UTF-8&oe=UTF-8&as_ugroup=*python*&lr=&num=20&hl=en

See, for example,

  http://www.google.com/groups?hl=en&lr=&ie=UTF-8&oe=UTF-8&safe=off&threadm=mailman.1055971051.1456.python-list%40python.org&rnum=1&prev=/groups%3Fnum%3D20%26hl%3Den%26lr%3D%26ie%3DUTF-8%26oe%3DUTF-8%26safe%3Doff%26q%3Dparsing%2Bcomplex%2Bweb%2Bpages%2Bgroup%253A*python*%26btnG%3DGoogle%2BSearch


You should look at the string and re modules for relatively simple
information retrieval, and the htmllib and formatter modules for more
complicated solutions where parsing is required.  The best solution
depends on the format of the web pages you need to parse.
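
To make the two approaches concrete, here is a minimal sketch rather than
a definitive recipe. It assumes a current Python 3, where htmllib is no
longer available, so html.parser.HTMLParser from the standard library
stands in for it. The sample page, PRICE_RE, and LinkCollector are made up
for illustration; a real script would fetch the page first (for example
with urllib) and adapt the pattern to the actual markup.

```python
import re
from html.parser import HTMLParser

# Hypothetical sample page, inlined so the example runs without a network.
PAGE = """<html><body>
<h1>Prices</h1>
<p>Widget: $4.50</p>
<p>Gadget: $12.00</p>
<a href="/next">next page</a>
<a href="http://example.com/about">about</a>
</body></html>"""

# Simple case: the data follows a fixed textual pattern, so re is enough.
PRICE_RE = re.compile(r"(\w+): \$(\d+\.\d{2})")
prices = {name: float(amount) for name, amount in PRICE_RE.findall(PAGE)}


# Harder case: the data lives in tag attributes, so use a real parser.
# HTMLParser is a stand-in for the older htmllib module (an assumption).
class LinkCollector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the opening tag.
        if tag == "a":
            for key, value in attrs:
                if key == "href" and value is not None:
                    self.links.append(value)


collector = LinkCollector()
collector.feed(PAGE)
collector.close()

print(prices)
print(collector.links)
```

The regular expression is fine as long as the page layout stays fixed; once
the data is spread over nested tags or attributes, the parser subclass
tends to be less fragile.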

John Hunter




