Getting a 401 from requests.get, but not when logging in via the browser.

Tue Apr 21 16:25:37 EDT 2020

> On 21 Apr 2020, at 20:47, dcwhatthe at gmail.com wrote:
> 
> On Tuesday, April 21, 2020 at 3:16:51 PM UTC-4, Barry Scott wrote:
>>> On 21 Apr 2020, at 18:11, dc wrote:
>>> 
>>> On Tuesday, April 21, 2020 at 12:40:25 PM UTC-4, Dieter Maurer wrote:
>>>> dc wrote at 2020-4-20 14:48 -0700:
>>>>> ...
>>>>> I tried telnetting to the landing page, i.e. without the specific node that requires the login.  So e.g.
>>>>> 
>>>>> Telnet thissite.oh.gov 80
>>>>> 
>>>>> but it returns a 400 Bad Request.  Before that, the Telnet screen is completely blank; I have to press a key before it returns the Bad Request.
>>>>> 
>>>>> 
>>>>> Roger on knowing what the site is asking for.  But I don't know how to determine that.
>>>> 
>>>> I use `wget -S` to learn about server responses.
>>>> It has the advantage (over `telnet`) of knowing the HTTP protocol.
>>> 
>>> Sure enough, wget DOES return a lot of information.  In fact, although an initial response of 401 is returned, it waits for the response and finally returns a 200.
>>> 
>>> So, I guess the question finally comes down to:  How do we make the requests.get() wait for a response?  The timeout value isn't the same thing that I thought it was.  So how do we tell .get() to wait 20 or 30 seconds for an OK response?
>> 
>> The way the HTTP protocol works is that you send a request and get a response. 1 in, 1 out.
>> The response can tell you that you need to do more work, like add authentication data.
>> 
>> The only use of the timeout is to allow you to give up if a response does not come back
>> before you get bored waiting.
>> 
>> In the case of the 401 you can read what it means here: https://httpstatuses.com/401
>> 
>> It is then up to your code to issue a new request with the required authentication headers.
>> The headers you got back in the first response will tell you what type of authentication is required:
>> basic, digest, etc.
>> 
>> The library you are using should be able to handle this if you provide what the library requires from
>> you to authenticate.
>> 
>> Personally I debug stuff using the curl command. curl -v <url> shows you the request and the response.
>> You can then add curl options to provide authentication data (username/password) and how to use it: --basic
>> and --digest, for example.
>> 
>> Oh, and the other status that needs handling is a 302 redirect. This allows a web site to move a page
>> and tell you the new location. Again, you have to allow your library to do this for you.
>> 
>> Barry
>> 
>> 
>> 
>>> 
>>> -- 
>>> https://mail.python.org/mailman/listinfo/python-list
>>> 
> 
> Barry, Thanks.  I'm starting to get a bigger picture, now.
> 
> So I really do need to raise the status, in order to get the headers.  I had put this in originally, but then thought it wasn't necessary.

In a response you always get a status line, headers and a body. In case of a response that is not a 200 there is often
important information in the headers. The body is usually for showing to humans when the program does not know
how to handle the status code.
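
As a rough illustration of those three parts, here is a minimal sketch that splits a
raw response into status line, headers and body using only the standard library. The
response bytes and the split_response() helper are made up for this example (modelled
on the headers quoted in this thread); this is not what the real site sent. With
requests, the same pieces are available as r.status_code, r.headers and r.text.

```python
import io
from http.client import parse_headers

# Hypothetical raw response, modelled on the headers quoted in this thread.
RAW_RESPONSE = (
    b"HTTP/1.1 401 Unauthorized\r\n"
    b"Content-Type: text/html\r\n"
    b"Content-Length: 13\r\n"
    b"WWW-Authenticate: Negotiate\r\n"
    b"WWW-Authenticate: NTLM\r\n"
    b"\r\n"
    b"Access denied"
)


def split_response(raw):
    """Split raw HTTP response bytes into (status, reason, headers, body)."""
    stream = io.BytesIO(raw)
    # First line is the status line: "<version> <code> <reason>".
    status_line = stream.readline().decode("iso-8859-1").strip()
    _version, code, reason = status_line.split(" ", 2)
    # parse_headers() reads up to and including the blank line.
    headers = parse_headers(stream)
    # Whatever is left in the stream is the body.
    body = stream.read()
    return int(code), reason, headers, body


status, reason, headers, body = split_response(RAW_RESPONSE)
print(status, reason)
# The headers say how to authenticate, even though the status is not 200.
print(headers.get_all("WWW-Authenticate"))
print(body)
```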

> 
> So in the case of this particular site, if I understand correctly, I would be using the NTLM to decide which type of Authentication to follow up with (I think).
> 
> Content-Length:          1293
> Content-Type:            text/html
> WWW-Authenticate:        Negotiate, NTLM

Yep, that is right. The site wants you to use NTLM to authenticate with it (the header offers Negotiate as well).
NTLM is not always supported; you will need to check your library docs to see if it supports NTLM.
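
To make the "which scheme do I use" step concrete, here is a small sketch that pulls
the scheme names out of a simple WWW-Authenticate value and picks the first one your
code supports. offered_schemes() and pick_scheme() are hypothetical helpers written for
this example, and the parsing is deliberately naive: it assumes bare scheme names
separated by commas, as in the header above, and would mis-split challenges that carry
parameters (a Digest challenge, for instance).

```python
def offered_schemes(www_authenticate):
    """Return the scheme names from a simple WWW-Authenticate value.

    Assumes a comma-separated list of challenges with no commas inside
    their parameters, e.g. "Negotiate, NTLM".
    """
    schemes = []
    for challenge in www_authenticate.split(","):
        parts = challenge.split()
        if parts:
            # The scheme name is the first token of each challenge.
            schemes.append(parts[0])
    return schemes


def pick_scheme(offered, supported):
    """Return the first offered scheme that is also supported, else None."""
    wanted = {name.lower() for name in supported}
    for scheme in offered:
        # Scheme names are case-insensitive.
        if scheme.lower() in wanted:
            return scheme
    return None


offered = offered_schemes("Negotiate, NTLM")
print(offered)
print(pick_scheme(offered, ["Basic", "NTLM"]))
```

For requests itself, one third-party option I am aware of is the requests_ntlm package,
whose HttpNtlmAuth class can be passed as the auth= argument to requests.get(). Whether
it works against this particular site is something you would have to try.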

Barry

> 
> -- 
> https://mail.python.org/mailman/listinfo/python-list