[issue28945] get_boundary invokes unquote twice

Mon Dec 19 19:20:28 EST 2016

bpoaugust added the comment:

I agree that, strictly speaking, the boundary is invalid.
However, the usual robustness rule applies:
'Be strict in what you generate, be liberal in what you accept'
The email package should never create such boundaries,
but it should process them where possible.

If the boundary parameter is mangled by stripping out all the invalid characters, it will no longer match the boundary markers in the message body, so that approach does not solve the issue.

Ensuring that only a single level of quotes is removed, on the other hand, does fix the issue.
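
To illustrate why the second unquote matters, here is a minimal sketch. The header value and body marker are hypothetical, made up for this example; the only library call used is email.utils.unquote, which strips one pair of enclosing double quotes or, failing that, one pair of enclosing angle brackets.

```python
from email.utils import unquote

# Hypothetical header value and body marker, made up for this illustration.
raw_param = '"<<example-boundary>>"'
body_marker = '--<<example-boundary>>'

once = unquote(raw_param)   # outer double quotes removed
twice = unquote(once)       # outer angle brackets removed as well


def matches(boundary, marker):
    """Return True if `marker` is the delimiter line for `boundary`."""
    return marker == '--' + boundary


print(repr(once), matches(once, body_marker))
print(repr(twice), matches(twice, body_marker))
```

After one unquote the value still matches the marker in the body; after the second it has lost a level of brackets and no longer does.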

This is what works for me:

def get_boundary(self, failobj=None):
    # Replacement for email.message.Message.get_boundary;
    # `utils` is the email.utils module imported by email/message.py
    missing = object()
    # get_param removes the outer quotes
    boundary = self.get_param('boundary', missing)
    if boundary is missing:
        return failobj
    # A plain string is returned as-is, so that it is not
    # unquoted again in collapse_rfc2231_value
    if not isinstance(boundary, tuple) or len(boundary) != 3:
        return boundary
    # RFC 2046 says that boundaries may begin but not end in whitespace
    return utils.collapse_rfc2231_value(boundary).rstrip()

I think the bug is actually in collapse_rfc2231_value: it should not do any unquoting, because the value passed to it has already been unquoted (at least in this case). However, there may be other cases where collapse_rfc2231_value is called with a quoted value, so to fix the immediate problem it is safer to change get_boundary. (I could instead have re-quoted the value and let collapse_rfc2231_value do its thing.)
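
The re-quoting alternative mentioned in parentheses could look roughly like the sketch below. The helper name requote and the sample values are hypothetical; email.utils.unquote stands in here for the unquoting step that collapse_rfc2231_value applies to a plain string.

```python
from email.utils import quote, unquote


def requote(value):
    """Wrap `value` in double quotes, escaping backslashes and quotes.

    Hypothetical helper: it restores the level of quoting that
    get_param() has already removed.
    """
    return '"' + quote(value) + '"'


# Hypothetical values that have already been unquoted once.
boundary = '<<example-boundary>>'
with_quotes = 'say "hi"'

# A later unquote now only undoes the re-quoting.
restored = unquote(requote(boundary))
restored_with_quotes = unquote(requote(with_quotes))

print(repr(restored))
print(repr(restored_with_quotes))
```

This keeps the second unquote harmless, at the cost of quoting a value only so that it can be unquoted again, which is why changing get_boundary directly seemed simpler.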

unquote is correct as it stands: it should only remove the outer quotes. A string that just happens to be enclosed in quote characters may itself need to be quoted.

----------

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue28945>
_______________________________________