Access to database other web sites
Cousin Stanley
CousinStanley at hotmail.com
Fri Sep 26 14:22:26 EDT 2003
| IIUYC, what you're contemplating is called "web scraping"
| ....
John ....
I did a bit of web scraping over the past weekend
for a friend that is interested in Lotto numbers ....
The Lotto numbers were readily available on the web
and presented as well-formed and readable HTML tables ....
The primary problem I found up front was being able
to parse and transform this data into something
that Python, or any other language, could
cope with for subsequent analysis ....
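A minimal sketch of that parsing step, using Python's standard-library
html.parser module (an assumption; the post doesn't say which tool did
the job). It collects the text of each <td>/<th> cell into one list per
<tr> row. It assumes simple, non-nested tables, and the names
TableParser, parse_tables, and SAMPLE are hypothetical, as is the
sample HTML ....

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row.

    Hypothetical helper; nested tables are not handled.
    """

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # row currently being built, or None
        self._cell = None   # text fragments of the current cell, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:               # skip empty rows
                self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._cell is not None:      # ignore text outside cells
            self._cell.append(data)


def parse_tables(html):
    """Return every table row in `html` as a list of cell strings."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    return parser.rows


# Hypothetical sample; a real results page will look different.
SAMPLE = """
<table>
  <tr><th>Date</th><th>Numbers</th></tr>
  <tr><td>2003-09-20</td><td>3 14 15 26 35 42</td></tr>
  <tr><td>2003-09-24</td><td>5 14 22 26 31 40</td></tr>
</table>
"""

if __name__ == "__main__":
    for row in parse_tables(SAMPLE):
        print(row)
```

For badly broken markup this kind of parser will stumble, which is
where a cleanup pass comes in ....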
Since the number of records that I was dealing with
in this case was relatively small, only a couple of thousand,
I could manage the initial data transformations
using my genetically encoded EyeBall parser,
a text editor, and a couple of one-off Python scripts ....
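A one-off script of that sort might look something like the sketch
below. It assumes a hypothetical two-column layout (a date, then the
drawn numbers separated by spaces) with a header row first; the
function names rows_to_records and records_to_csv are made up for
illustration. Rows that don't fit the layout are skipped rather than
guessed at, leaving them for a manual look ....

```python
import csv
import io


def rows_to_records(rows):
    """Turn [date, "n1 n2 ..."] rows into (date, [int, ...]) records.

    Assumes a hypothetical two-column layout with a header row first.
    """
    records = []
    for row in rows[1:]:                # skip the header row
        if len(row) != 2:
            continue                    # unexpected shape: skip it
        date, numbers = row
        try:
            nums = [int(n) for n in numbers.split()]
        except ValueError:
            continue                    # non-numeric cell: check by hand
        records.append((date, nums))
    return records


def records_to_csv(records):
    """Serialize records as CSV text: the date, then one column per number."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    for date, nums in records:
        writer.writerow([date] + nums)
    return buf.getvalue()


if __name__ == "__main__":
    rows = [
        ["Date", "Numbers"],
        ["2003-09-20", "3 14 15 26 35 42"],
        ["2003-09-24", "5 14 22 26 31 40"],
        ["2003-09-27", "n/a"],
    ]
    print(records_to_csv(rows_to_records(rows)), end="")
```

Writing the cleaned records out as CSV means the analysis scripts never
have to touch the HTML again ....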
The first step for each of the source files
was running HTML Tidy to ...
"clean up the horrid HTML you'll find on the web"
I'd like to emphasize for the benefit of the original poster
that the initial data parsing will probably entail a fair amount
of non-trivial work, and that the subsequent data analysis
and reporting will seem almost trivial by comparison ....
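As an illustration of how small the analysis side can be once the data
is clean, here is a sketch that counts how often each number was drawn,
using collections.Counter from the standard library. The record format
(date, list of ints) and the function name number_frequencies are
assumptions carried over for illustration, not anything from the
original scripts ....

```python
from collections import Counter


def number_frequencies(records):
    """Count how often each number appears across all (date, nums) records."""
    counts = Counter()
    for _date, nums in records:
        counts.update(nums)             # add one to the tally for each number drawn
    return counts


if __name__ == "__main__":
    records = [
        ("2003-09-20", [3, 14, 15, 26, 35, 42]),
        ("2003-09-24", [5, 14, 22, 26, 31, 40]),
    ]
    # Print the three most frequently drawn numbers with their counts.
    for number, count in number_frequencies(records).most_common(3):
        print(number, count)
```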
Thanks for posting the info regarding different approaches,
as I think it will be useful for me when I get around
to replacing my EyeBall parser with something more effective ....
--
Cousin Stanley
Human Being
Phoenix, Arizona