scraping from bundes-telefonbuch.de with python

Sat Jun 19 08:16:45 EDT 2010

On 19 lip, 12:23, davidgp <davidvanijzendo... at gmail.com> wrote:
> hello, i'm new on this group, and quiet new to python!
> i'm trying to scrap some adress data from bundes-telefonbuch.de but i
> run into a problem:
> the link is like this:http://www.bundes-telefonbuch.de/cgi-btbneu/chtml/chtml?WA=20
> and it is basically the same for every search query.
> thus i need to submit post data to the webserver, i try to do this
> like this:
>
> opener = urllib2.build_opener()
> opener.addheaders = [('User-Agent', 'Mozilla/5.0 (compatible;
> Konqueror/3.5; Linux) KHTML/3.5.4 (like Gecko)')]
> urllib2.install_opener(opener)
>
> data = urllib.urlencode({'F0': 'mySearchKeyword','B': 'T','F8': 'A ||
> G','W': '1','Z': '0','HA': '10','SAS_static_0_treffer_treffer': 'Suche
> starten','S': '1','translationtemplate': 'checkstrasse'})
>
> url = 'http://www.bundes-telefonbuch.de/cgi-btbneu/chtml/chtml?WA=20'
> response = urllib2.urlopen(url, data)
>
> this returns a page saying i have to reenter my search terms..
> what's going wrong here?
>
> Thanks!!

Try mechanize : http://wwwsearch.sourceforge.net/mechanize/

import mechanize
response = mechanize.urlopen("http://www.bundes-telefonbuch.de/")
forms = mechanize.ParseResponse(response, backwards_compat=False)
form = forms[0]
form["F0"] = "query" #enter query
html =  mechanize.urlopen(form.click()).read()
f = open("tmp.html","w")
f.writelines(html)
f.close()

Or you can try to parse response but I think that their HTML is not
valid