searching string url

Mike Meyer mwm at mired.org
Wed Jul 27 23:38:54 EDT 2005


googlinggoogler at hotmail.com writes:

> Anyway, to the original replier - I wish it was homework ;-), that
> would mean I wouldn't be trying to find myself a job as a recent
> graduate... I decided to crawl something similar to the yellow pages
> (do you have them in the US?) for my selected area and then find all
> pages corresponding to my ideal field of work, and grab their details
> into a txt file.

I'm actually working on a general framework for doing this kind of
thing. It's designed specifically for walking through a collection of
pages from a web-based search engine, applying extra criteria to the
results, and then running a bit of code on any that pass that check.
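
As a rough illustration of that shape (not the framework itself), here
is a minimal sketch. The names process_results, passes and action, and
the sample listings, are all made up for the example; a real crawler
would feed in parsed result pages instead of dictionaries.

```python
def process_results(results, passes, action):
    """Yield action(result) for every result that satisfies passes(result)."""
    for result in results:
        if passes(result):
            yield action(result)


# Hypothetical stand-in for results pulled from a search engine.
listings = [
    {"name": "Acme Software", "field": "software"},
    {"name": "Bob's Bakery", "field": "catering"},
    {"name": "Initech", "field": "software"},
]

# Extra criterion: keep only the field of interest.
# Action: pull out the detail we want to record.
matches = list(
    process_results(
        listings,
        passes=lambda r: r["field"] == "software",
        action=lambda r: r["name"],
    )
)
```

Keeping the criterion and the action as plain callables means either
one can be swapped without touching the loop that walks the results.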

It works for one site, but my attempt to try it on a second site
turned up a fundamental flaw. My first site used full URLs for
everything, so I happily passed soup between various methods. The
second site used relative URLs for everything, and it all broke.
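
One common way around this kind of breakage is to resolve every link
against the URL of the page it came from before handing it on. The
sketch below assumes that approach; it uses the standard library's
html.parser rather than soup so it runs on its own, and LinkCollector,
absolute_links and the example.com addresses are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag as an absolute URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # urljoin leaves full URLs alone and resolves
                # relative ones against the page's own URL.
                self.links.append(urljoin(self.base_url, href))


def absolute_links(page_url, html):
    """Return all links in html, resolved against page_url."""
    collector = LinkCollector(page_url)
    collector.feed(html)
    return collector.links


page_url = "http://example.com/search/results?page=1"
html = (
    '<a href="detail?id=7">one</a>'
    '<a href="/about">two</a>'
    '<a href="http://other.example/x">three</a>'
)
links = absolute_links(page_url, html)
```

Because full URLs pass through urljoin unchanged, the same code should
cope with sites of either kind.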

> Trouble is I keep thinking of cool new bits to add, Python truly is a
> beautiful language. Ideally would like to somehow write all the
> information into a Word mail merge - but I think that requires more
> research!

Given a working scrape, the only extra work is how to get it into a
mail merge. That depends on your platform and the software you're
using to send the mail. Shouldn't be all that hard.
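
If the mail software can read a CSV file as its merge data source (an
assumption; check what yours accepts), then writing the scraped
details out with the csv module may be all that's needed. The function
name write_merge_source, the field names and the sample record below
are invented for the sketch.

```python
import csv
import io


def write_merge_source(records, fieldnames, out):
    """Write records (a list of dicts) to out as CSV with a header row."""
    writer = csv.DictWriter(out, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(records)


# Hypothetical scraped details.
records = [{"name": "Acme Software", "phone": "555-0100"}]

# An in-memory buffer stands in for a real file here; for a file, use
# open(path, "w", newline="") so the csv module controls line endings.
buffer = io.StringIO()
write_merge_source(records, ["name", "phone"], buffer)
```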

      <mike
-- 
Mike Meyer <mwm at mired.org>			http://www.mired.org/home/mwm/
Independent WWW/Perforce/FreeBSD/Unix consultant, email for more information.


