[New-bugs-announce] [issue35891] urllib.parse.splituser has no suitable replacement

Sun Feb 3 10:11:00 EST 2019

New submission from Jason R. Coombs <jaraco at jaraco.com>:

The deprecation of splituser (issue27485) has the undesirable effect of leaving the programmer without a suitable alternative. The deprecation warning says to use `urlparse` instead, but `urlparse` doesn't provide access to the `credential` or `address` components of a URL.

Consider for example:

>>> import urllib.parse
>>> url = 'https://user:password@host:port/path'
>>> parsed = urllib.parse.urlparse(url)
>>> urllib.parse.splituser(parsed.netloc)
('user:password', 'host:port')

It's not readily obvious how one might get those two values, the credential and the address, from `parsed`. Sure, you can get `username` and `password`, and you can get `hostname` and `port`. But if what you want is to remove the credential and keep the address, or to extract the credential and pass it unchanged as a single string to something like an `_encode_auth` handler, that's no longer possible without some careful handling: because either value may be None, re-assembling a username/password pair into a colon-separated string is more complicated than simply doing a ':'.join.
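To illustrate the "careful handling" involved, here is a minimal sketch of re-assembling the credential from a `urlparse` result. The helper name `rebuild_userinfo` is hypothetical and not part of urllib.parse; it only shows the None checks that a plain ':'.join skips.

```python
from urllib.parse import urlparse


def rebuild_userinfo(parsed):
    """Reassemble 'user[:password]' from a urlparse result, or None if absent.

    Hypothetical helper for illustration; not part of urllib.parse.
    """
    if parsed.username is None:
        # No '@' in the netloc at all, so there is no credential.
        return None
    if parsed.password is None:
        # A username without a ':' separator; do not append a colon.
        return parsed.username
    return f'{parsed.username}:{parsed.password}'


parsed = urlparse('https://user@host/path')
# ':'.join((parsed.username, parsed.password)) would raise TypeError here,
# because parsed.password is None.
userinfo = rebuild_userinfo(parsed)
```

Note that this covers only the credential half; rebuilding the address from `hostname` and `port` has its own pitfalls, since `hostname` is lowercased and `port` is validated on access.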

This recommendation and limitation led to issues in production code and ultimately the inline adoption of the deprecated function, [summarized here](https://github.com/pypa/setuptools/pull/1670).

I believe that if splituser is to be deprecated, the netloc should provide a suitable alternative, namely that a `urlparse` result should supply `address` and `userinfo`. Such functionality would make it easier to transition code that currently relies on splituser for more than parsing out the username and password.
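As a rough sketch of what such attributes could look like, the following subclasses `urllib.parse.ParseResult` and adds two read-only properties. The class name `NetlocParts` and the property names are assumptions made for illustration; nothing like this exists in urllib.parse today. The split is done on the last '@' in the netloc.

```python
from urllib.parse import ParseResult, urlparse


class NetlocParts(ParseResult):
    """Hypothetical ParseResult subclass exposing `userinfo` and `address`."""

    __slots__ = ()

    @property
    def userinfo(self):
        # Everything before the last '@', or None when there is no '@'.
        userinfo, delim, _ = self.netloc.rpartition('@')
        return userinfo if delim else None

    @property
    def address(self):
        # Everything after the last '@' (the whole netloc when there is none).
        return self.netloc.rpartition('@')[2]


parsed = NetlocParts(*urlparse('https://user:password@host:8080/path'))
credential, address = parsed.userinfo, parsed.address
```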

Even better would be for the urlparse result to support `_replace` operations on these attributes, so that one wouldn't have to construct a whole netloc just to build a URL that replaces only some portion of it. One could then do something like:

>>> parsed = urllib.parse.urlparse(url)
>>> without_userinfo = parsed._replace(userinfo=None).geturl()
>>> alt_port = parsed._replace(port=443).geturl()

I realize that because of the nesting of abstractions (a namedtuple for the main parts), this technique may not extend nicely, so maybe the netloc itself should provide this extensibility, for a usage something like this:

>>> parsed = urllib.parse.urlparse(url)
>>> without_userinfo = parsed._replace(netloc=parsed.netloc._replace(userinfo=None)).geturl()
>>> alt_port = parsed._replace(netloc=parsed.netloc._replace(port=443)).geturl()

It's not as elegant, but likely simpler to implement, with netloc being extended with a _replace method to support replacing segments of itself (and still immutable)... and is dramatically less error-prone than the status quo without splituser.
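For comparison, this is roughly what the status quo looks like when the netloc has to be rebuilt by hand. The helper `replace_userinfo` is a hypothetical name used only for this sketch; it relies on the existing `_replace` on the parse result and on splitting the netloc at its last '@'.

```python
from urllib.parse import urlparse


def replace_userinfo(url, userinfo=None):
    """Return `url` with its credential replaced, or dropped when None.

    Hypothetical helper for illustration; not part of urllib.parse.
    """
    parsed = urlparse(url)
    # Keep only the address part of the netloc (after the last '@').
    address = parsed.netloc.rpartition('@')[2]
    netloc = address if userinfo is None else f'{userinfo}@{address}'
    return parsed._replace(netloc=netloc).geturl()


url = 'https://user:password@host:8080/path'
without_userinfo = replace_userinfo(url)
with_other_user = replace_userinfo(url, 'alice:s3cret')
```

Replacing only the port would need a similar helper, with extra care for bracketed IPv6 hosts.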

In any case, I don't think it's suitable to leave it to the programmer to have to muddle around with their own URL parsing logic. urllib.parse should provide some help here.

----------
components: Library (Lib)
messages: 334793
nosy: jason.coombs
priority: normal
severity: normal
status: open
title: urllib.parse.splituser has no suitable replacement
type: behavior

_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue35891>
_______________________________________