Getting a 401 from requests.get, but not when logging in via the browser.

Tue Apr 21 16:48:14 EDT 2020

On Tuesday, April 21, 2020 at 4:38:52 PM UTC-4, Chris Angelico wrote:
> On Wed, Apr 22, 2020 at 6:30 AM Barry Scott wrote:
> >
> >
> >
> > > On 21 Apr 2020, at 20:47, dc wrote:
> > >
> > > On Tuesday, April 21, 2020 at 3:16:51 PM UTC-4, Barry Scott wrote:
> > >>> On 21 Apr 2020, at 18:11, dc wrote:
> > >>>
> > >>> On Tuesday, April 21, 2020 at 12:40:25 PM UTC-4, Dieter Maurer wrote:
> > >>>> dc wrote at 2020-4-20 14:48 -0700:
> > >>>>> ...
> > >>>>> I tried telnetting to the landing page, i.e. without the specific node that requires the login.  So e.g.
> > >>>>>
> > >>>>> Telnet thissite.oh.gov 80
> > >>>>>
> > >>>>> , but it returns a 400 Bad Request.  Before that, the Telnet screen is completely blank; I have to press a key before it returns the Bad Request.
> > >>>>>
> > >>>>>
> > >>>>> Roger on knowing what the site is asking for.  But I don't know how to determine that.
> > >>>>
> > >>>> I use `wget -S` to learn about server responses.
> > >>>> It has the advantage (over `telnet`) of knowing the HTTP protocol.
> > >>>
> > >>> Sure enough, wget DOES return a lot of information.  In fact, although an initial response of 401 is returned, it waits for the response and finally returns a 200.
> > >>>
> > >>> So, I guess the question finally comes down to:  How do we make the requests.get() wait for a response?  The timeout value isn't the same thing that I thought it was.  So how do we tell .get() to wait 20 or 30 seconds for an OK response?
> > >>
> > >> The way the HTTP protocol works is that you send a request and get a response: one in, one out.
> > >> The response can tell you that you need to do more work, like add authentication data.
> > >>
> > >> The only use of the timeout is to allow you to give up if a response does not come back
> > >> before you get bored waiting.
> > >>
> > >> In the case of the 401 you can read what it means here: https://httpstatuses.com/401
> > >>
> > >> It is then up to your code to issue a new request with the required authentication headers.
> > >> The headers you got back in the first response will tell you what type of authentication is required:
> > >> basic, digest, etc.
> > >>
> > >> The library you are using should be able to handle this if you provide what the library requires from
> > >> you to authenticate.
> > >>
> > >> Personally I debug stuff using the curl command. curl -v <url> shows you the request and the response.
> > >> You can then add curl options to provide authentication data (username/password) and how to use it: --basic
> > >> and --digest, for example.
> > >>
> > >> Oh, and the other status that needs handling is a 302 redirect. This allows a web site to move a page
> > >> and tell you the new location. Again, you have to allow your library to do this for you.
> > >>
> > >> Barry
> > >>
> > >>
> > >>
> > >>>
> > >
> > > Barry, Thanks.  I'm starting to get a bigger picture, now.
> > >
> > > So I really do need to raise the status in order to get the headers.  I had put this in originally, but then thought it wasn't necessary.
> >
> > In a response you always get a status line, headers and a body. In the case of a response that is not a 200 there is often
> > important information in the headers. The body is usually for showing to humans when the program does not know
> > how to handle the status code.
> >
> > >
> > > So in the case of this particular site, if I understand correctly, I would be using the NTLM to decide which type of Authentication to follow up with (I think).
> > >
> > > Content-Length:          1293
> > > Content-Type:            text/html
> > > WWW-Authenticate:        Negotiate, NTLM
> >
> > Yep, that is right. The site wants you to use NTLM to authenticate with it.
> > NTLM is not always supported, so you will need to check your library docs to see if it supports NTLM.
> >
> 
> I believe the 'requests' library supports NTLM, although I haven't
> personally used it so I can't check.
> 
> ChrisA

In this case, I used the requests_ntlm import:

    from requests_ntlm import HttpNtlmAuth
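
For anyone finding this thread later, here is a minimal sketch of how that import can be used, following the request/401/retry pattern Barry described.  It assumes the requests and requests_ntlm packages are installed.  The helper names offered_schemes and get_with_ntlm, the 30-second timeout, and the DOMAIN\username form of the user name are my own illustrative choices, not anything dictated by the site or by the replies above.  The header parsing is deliberately simple and would be confused by a comma inside a quoted parameter.

```python
def offered_schemes(www_authenticate):
    """Return the lowercased scheme names found in a WWW-Authenticate value."""
    schemes = []
    for part in www_authenticate.split(","):
        tokens = part.split()
        # Skip empty pieces and auth parameters such as realm="x".
        if tokens and "=" not in tokens[0]:
            schemes.append(tokens[0].lower())
    return schemes


def get_with_ntlm(url, username, password, timeout=30):
    """Fetch url, retrying with NTLM credentials if the server asks for them."""
    # Third-party imports are kept local so offered_schemes() works without them.
    import requests
    from requests_ntlm import HttpNtlmAuth

    # First request without credentials: one request in, one response out.
    first = requests.get(url, timeout=timeout)
    if first.status_code != 401:
        return first

    # The 401 response headers say which authentication schemes are accepted.
    challenge = first.headers.get("WWW-Authenticate", "")
    if "ntlm" not in offered_schemes(challenge):
        return first

    # Second request, this time letting requests_ntlm do the NTLM handshake.
    auth = HttpNtlmAuth(username, password)
    return requests.get(url, auth=auth, timeout=timeout)


if __name__ == "__main__":
    # The header value reported earlier in this thread.
    print(offered_schemes("Negotiate, NTLM"))
```

With the page that needs the login in a variable url, the call would be something like get_with_ntlm(url, "DOMAIN\\username", "password"), followed by a check of the returned status_code.  Passing auth=HttpNtlmAuth(...) on the very first request should also work; the two-step version is only there to make the 401 and its WWW-Authenticate header visible.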