Parsing
John Hunter
jdhunter at ace.bsd.uchicago.edu
Thu Jul 10 13:59:19 EDT 2003
>>>>> "Michael" == Michael <whatsupg21 at hotmail.com> writes:
Michael> I have been assigned a project to parse a webpage for
Michael> data using Python. I have finished only basic
Michael> tutorials. Any suggestions as to where I should go from
Michael> here? Thanks in advance.
Check out this article
http://www.unixreview.com/documents/s=7822/ur0302h/
and then search the comp.lang.python archives for "web scraping"
http://www.google.com/groups?as_q=web%20scraping&safe=off&ie=UTF-8&oe=UTF-8&as_ugroup=*python*&lr=&num=20&hl=en
See, for example,
http://www.google.com/groups?hl=en&lr=&ie=UTF-8&oe=UTF-8&safe=off&threadm=mailman.1055971051.1456.python-list%40python.org&rnum=1&prev=/groups%3Fnum%3D20%26hl%3Den%26lr%3D%26ie%3DUTF-8%26oe%3DUTF-8%26safe%3Doff%26q%3Dparsing%2Bcomplex%2Bweb%2Bpages%2Bgroup%253A*python*%26btnG%3DGoogle%2BSearch
You should look at the string and re modules for relatively simple
information retrieval, and the htmllib and formatter modules for more
complicated solutions where parsing is required. The best solution
depends on the format of the web pages you need to parse.
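To make the distinction concrete, here is a rough sketch of both approaches, not a definitive recipe. It makes a few assumptions: the PAGE string is a made-up sample (a real script would fetch the HTML first), the LinkCollector class is a hypothetical helper, and html.parser is swapped in for htmllib, which was removed in Python 3.

```python
import re
from html.parser import HTMLParser

# Hypothetical sample page, used only for illustration.
PAGE = """<html><body>
<h1>Prices</h1>
<p>Widget: $4.50</p>
<p>Gadget: $12.00</p>
<a href="/more.html">More <b>items</b></a>
</body></html>"""

# Simple retrieval: a regular expression is enough when the data
# follows a fixed textual pattern.
prices = re.findall(r"(\w+): \$(\d+\.\d{2})", PAGE)


# Structured retrieval: a parser tracks tags, so nested markup
# inside a link does not confuse it.
class LinkCollector(HTMLParser):
    """Hypothetical helper that collects (href, link text) pairs."""

    def __init__(self):
        super().__init__()
        self.links = []      # finished (href, text) pairs
        self._href = None    # href of the <a> currently open, if any
        self._text = []      # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


collector = LinkCollector()
collector.feed(PAGE)
collector.close()

print(prices)
print(collector.links)
```

The regular expression is shorter but breaks as soon as the page layout changes, while the parser version keeps working when tags are nested or attributes are reordered.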
John Hunter