urllib accept-language doesn't have any effect

Martin Bachwerk martin.bachwerk at rwth-aachen.de
Thu Oct 16 10:07:46 EDT 2008


Hey Philip,

thanks for the snippet, but I have tried that code already. It does 
indeed give me a Swedish version... of www.google.de :) That's the beauty 
of Google: they have all languages available for all domains.

However, if I try it with www.gizmodo.com (a tech blog in several 
languages) I still get the German version.

Both sites obviously redirect the client to the country-based version 
according to the IP first, and Google presents that page in the desired 
language only AFTER that. Most other multihost sites won't have a Swedish 
version of the .de site, so this doesn't quite help :(
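
For anyone who wants to look at that first redirect directly instead of
the page it ends on, here is a rough sketch. It assumes Python 3, where
urllib2 became urllib.request; NoRedirect and probe are names made up for
this example, not part of urllib. It sends the Accept-Language header but
refuses to follow any redirect, so the status code and Location header of
the very first response can be inspected:

```python
import urllib.error
import urllib.request


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Hypothetical handler that declines to follow redirects."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Returning None makes urllib raise HTTPError for the 3xx response
        # instead of silently fetching the new URL.
        return None


def probe(url, language):
    """Return (status, Location header or None) of the first response."""
    opener = urllib.request.build_opener(NoRedirect)
    req = urllib.request.Request(url, headers={"Accept-Language": language})
    try:
        with opener.open(req) as resp:
            return resp.status, resp.headers.get("Location")
    except urllib.error.HTTPError as err:
        # A 302 ends up here because NoRedirect refused to follow it.
        return err.code, err.headers.get("Location")
```

Calling probe("http://www.google.com/", "sv") from a German IP should then
show whether a 302 comes back, and where it points, before any language
selection happens on the target page.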

Thanks anyway,

Martin
>
> On Oct 16, 2008, at 6:50 AM, Martin Bachwerk wrote:
>
>> Hmm, thanks for the ideas,
>>
>> I've checked the requests in Firefox one more time after deleting all 
>> the cookies and both google.com and gizmodo.com do indeed forward me 
>> to the German site without caring about the browser settings.
>>
>> wget shows me that the server does a 302 redirect straight away.. soo..
>
> I'm not sure what you mean by this. In my experiment with wget, Google 
> respects the Accept-Language header. In other words, this returns a 
> Swedish page even though I'm executing it from a U.S. IP address:
>
> wget  "--header=Accept-Language: sv" http://www.google.com/
>
>
> I see the same behavior from urllib2, although my code is slightly 
> different from yours. Here's my code. If I use "sv" in the header I 
> get Swedish, "pl" gives me Polish, etc.  I get the same result when I 
> add your Mozilla user-agent string.
>
> ----------------------------------------
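> import urllib2
>
> # Ask for Swedish; "pl" gives Polish, etc.
> headers = {"Accept-Language": "sv"}
>
> # Passing None as the data argument makes this a plain GET request.
> req = urllib2.Request("http://www.google.com/", None, headers)
> f = urllib2.urlopen(req)
> content = f.read()
> f.close()
>
> print content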
> ----------------------------------------
>
>
> Do you get different results with this same code in Germany?
>
> Cheers
> Philip
>
>
>
>>
>>>
>>> On Oct 15, 2008, at 9:50 AM, Martin Bachwerk wrote:
>>>
>>>> Hello,
>>>>
>>>> I'm trying to load a couple of pages using the urllib2 module. The 
>>>> problem is that I live in Germany and some sites seem to look at 
>>>> the IP of the client and forward it to a localized page. Here's 
>>>> an example of the code: I want to access the main English page of 
>>>> google.com, but get the German one instead. (Those of you who live 
>>>> in the US will probably get correct results; try emulating the 
>>>> problem with 'fr' in the accepted languages or something.)
>>>>
>>>> import urllib2
>>>> url = 'http://www.google.com/'  # assumed; not shown in the original post
>>>> opener = urllib2.build_opener()
>>>> opener.addheaders = [
>>>>     ('Host', 'www.google.com'),
>>>>     ('Accept-Language', 'en-gb,en;q=0.5'),
>>>>     ('User-agent', 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; '
>>>>      'rv:1.9.0.1) Gecko/2008070208 Firefox/3.0.1'),
>>>> ]
>>>> webfile = opener.open(url)
>>>
>>> Martin,
>>> It looks to me like what you're sending is correct. Debugging 
>>> suggestions --
>>>
>>> - Set up a Web server on 127.0.0.1 and see what that server receives 
>>> when your Python code connects to it. Maybe you're not sending quite 
>>> what you think.
>>> - Try emulating your Python code with wget or a similar command line 
>>> tool that lets you set headers.
>>> - Sniff the conversation you're having with google using Wireshark. 
>>> Maybe you're getting redirected by the remote server.
>>>
>>> Good luck
>>> Philip
>>>
>>
>
>
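
The first debugging suggestion quoted above (run a web server on 127.0.0.1
and see what it receives) can be sketched roughly as follows. This assumes
Python 3 and only the standard library; EchoHeaders and headers_as_sent
are hypothetical names chosen for the example. The server simply sends
back the request headers it got, so you can see exactly what urllib put
on the wire:

```python
import http.server
import threading
import urllib.request


class EchoHeaders(http.server.BaseHTTPRequestHandler):
    """Hypothetical handler that replies with the request headers it got."""

    def do_GET(self):
        body = "".join("%s: %s\n" % (k, v) for k, v in self.headers.items())
        data = body.encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def log_message(self, format, *args):
        pass  # keep the console quiet


def headers_as_sent(extra_headers):
    """Start a throwaway local server, send one request, return the echo."""
    # Port 0 asks the OS for any free port.
    server = http.server.HTTPServer(("127.0.0.1", 0), EchoHeaders)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        url = "http://127.0.0.1:%d/" % server.server_address[1]
        req = urllib.request.Request(url, headers=extra_headers)
        with urllib.request.urlopen(req) as resp:
            return resp.read().decode("utf-8")
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(headers_as_sent({"Accept-Language": "en-gb,en;q=0.5"}))
```

The header name may come back as "Accept-language": urllib re-capitalises
header names before sending, which is harmless because HTTP header names
are case-insensitive.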



