[issue43882] [security] urllib.parse should sanitize urls containing ASCII newline and tabs.

Gregory P. Smith report at bugs.python.org
Thu May 6 16:57:20 EDT 2021


Gregory P. Smith <greg at krypto.org> added the comment:

Of note: If we had chosen to raise a ValueError (or similar) for these characters by default, the cloud-init code would also fail to behave as intended today (based on what I see in https://github.com/canonical/cloud-init/commit/c478d0bff412c67280dfe8f08568de733f9425a1)

Recommendation for cloud-init - Do your hostname transformation early using as simple as possible logic.  By virtue of accepting (and encouraging?) invalid characters and transforming them, what you have today that you call urlsplit on is more of a url template, not really a url.  something like this:

```
if m := re.search(r'^(?P<scheme_colon_slashes>[^/:]+://|)(?P<hostname>[^/]+)(?P<rest>/.*)', url_template):
    start, hostname, end = m.groups()
    for transformation in transformations:
        ... fixup hostname ...
    url = f'{start}{hostname}{end}'
else:
    ... # doesn't look like a URL template
```

yes this simplicity would allow your transformations to apply to the :port number.  you could simplify further by including the scheme_colon_slashes in the part transformed.  as your values are coming from user written config files, do you need to care about invalid characters in those transforming into invalid in the scheme or port number - characters in the resulting url anyways?

after that, do you even need urlsplit at all in your `_apply_hostname_transformations_to_url()` function?

----------

_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue43882>
_______________________________________


More information about the Python-bugs-list mailing list