[issue43882] [security] urllib.parse should sanitize urls containing ASCII newline and tabs.

Fri May 7 14:51:59 EDT 2021

Senthil Kumaran <senthil at uthcode.com> added the comment:

Hello All,

I think, the current striping of ASCII newline and tab is a _reasonable_ solution given it was a security issue. 

It also follows the guidelines of "WHATWG" (Specifically Point 3)

> 2. If input contains any ASCII tab or newline, validation error.
> 3. Remove all ASCII tab or newline from input.

And as per WHATWG, "A validation error does not mean that the parser terminates.  Termination of a parser is always stated explicitly, e.g., through a return statement."

I agree that terms used in spec vs representing it with library code may not be 1:1, but we tried to stay close and followed the existing behavior of widely used clients.

This is a fix, per a security report, and per an evolv{ed,ing} standard recommendation. When dealing with security fixes, there could be simple or more complex migration involvements. 

My reading of the previous message was, even if we raised exception or gave as a parameter, it wont be any better for certain downstream users, as we let the security problem open, and have it only as opt-in fix.

With respect to control 

The comment in the review - https://github.com/python/cpython/pull/25595#pullrequestreview-647122723 was to make these characters available in module level parameters, so that if users prefer to override, they could patch it.

so a revert may not be necessary for the reason of lack of control.

In short, at this moment, I still feel that is reasonable fix at this moment for the problem report, and intention to move closer to whatwg spec.

----------

_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue43882>
_______________________________________