Access to database other web sites

Cousin Stanley CousinStanley at hotmail.com
Fri Sep 26 14:22:26 EDT 2003


| IIUYC, what you're contemplating is called "web scraping"
| ....

John .... 

    I did a bit of web scraping over the past weekend
    for a friend who is interested in Lotto numbers ....
 
    The Lotto numbers were readily available on the web
    and presented as well-formed and readable HTML tables .... 

    The primary problem I found up front was being able
    to parse and transform this data into something
    that Python, or any other language, might be able
    to cope with for subsequent analysis ....
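A minimal sketch of that parsing step, using only Python 3's standard-library html.parser and assuming the results sit in a plain <table> with one draw per <tr>. The SAMPLE markup, its made-up draw, and the names TableRowExtractor and extract_rows are hypothetical; they are not taken from the actual Lotto site.

```python
from html.parser import HTMLParser


class TableRowExtractor(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # completed rows, each a list of cell strings
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def _finish_cell(self):
        # Close an open cell, even if its </td> was omitted in the source.
        if self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
        self._cell = None

    def _finish_row(self):
        # Close an open row (and its last cell), skipping rows with no cells.
        self._finish_cell()
        if self._row:
            self.rows.append(self._row)
        self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._finish_row()
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._finish_cell()
            self._cell = []

    def handle_data(self, data):
        # Text outside a cell (whitespace between tags, etc.) is ignored.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._finish_cell()
        elif tag == "tr":
            self._finish_row()

    def close(self):
        super().close()
        self._finish_row()  # keep a final row whose </tr> was missing


def extract_rows(html):
    """Return every table row in `html` as a list of cell strings."""
    parser = TableRowExtractor()
    parser.feed(html)
    parser.close()
    return parser.rows


# Hypothetical page fragment with a header row and one made-up draw.
SAMPLE = """
<table>
  <tr><th>Date</th><th>Numbers</th></tr>
  <tr><td>2003-09-20</td><td>3 11 17 24 35 41</td></tr>
</table>
"""

if __name__ == "__main__":
    print(extract_rows(SAMPLE))
```

Tolerating missing </td> and </tr> tags costs only a few lines here, but it does not make the parser robust against arbitrary broken markup; cleaning the pages first still pays off.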

    Since the number of records that I was dealing with
    in this case was relatively small, only a couple of thousand,
    I could manage the initial data transformations 
    using my genetically encoded EyeBall parser, 
    a text editor, and a couple of one-off Python scripts ....
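One of those one-off scripts might look something like the following sketch, which turns parsed text rows into typed records and writes them out as CSV for later analysis. It assumes a two-column layout (date, then space-separated numbers); ROWS, rows_to_records, and records_to_csv are hypothetical names with made-up data.

```python
import csv
import io

# Hypothetical rows in the shape a table parser might return:
# a header row, then one [date, "space-separated numbers"] row per draw.
ROWS = [
    ["Date", "Numbers"],
    ["2003-09-20", "3 11 17 24 35 41"],
    ["2003-09-24", "3 8 17 22 30 44"],
]


def rows_to_records(rows):
    """Drop the header row and turn each numbers cell into a list of ints."""
    return [(date, [int(n) for n in numbers.split()]) for date, numbers in rows[1:]]


def records_to_csv(records):
    """Write one CSV line per draw: the date, then each drawn number."""
    buf = io.StringIO()
    writer = csv.writer(buf)  # default "excel" dialect ends lines with \r\n
    for date, numbers in records:
        writer.writerow([date, *numbers])
    return buf.getvalue()


if __name__ == "__main__":
    print(records_to_csv(rows_to_records(ROWS)), end="")
```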
    
    For each source file, the first step was to run it
    through HTML Tidy to ...

        "clean up the horrid HTML you'll find on the web"

    I'd like to emphasize for the benefit of the original poster
    that the initial data parsing will probably entail a fair amount
    of non-trivial work and that the subsequent data analysis
    and reporting will seem almost trivial by comparison ....
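To illustrate how short the analysis side can be once the draws are clean lists of numbers, here is a frequency count using collections.Counter from the standard library. The DRAWS data and the number_frequencies name are made up for illustration.

```python
from collections import Counter

# Hypothetical draws, as lists of ints, standing in for real parsed data.
DRAWS = [
    [3, 11, 17, 24, 35, 41],
    [3, 8, 17, 22, 30, 44],
    [5, 11, 17, 29, 33, 41],
]


def number_frequencies(draws):
    """Count how many times each number appears across all draws."""
    counts = Counter()
    for numbers in draws:
        counts.update(numbers)
    return counts


if __name__ == "__main__":
    # Report the three most frequently drawn numbers.
    for number, hits in number_frequencies(DRAWS).most_common(3):
        print(f"{number:2d}: {hits}")
```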

    Thanks for posting the info regarding different approaches,
    as I think it will be useful for me when I get around
    to replacing my EyeBall parser with something more effective ....  
     
-- 
Cousin Stanley
Human Being
Phoenix, Arizona