[New-bugs-announce] [issue22264] Add wsgiref.fix_encoding

Sun Aug 24 14:45:41 CEST 2014

New submission from Nick Coghlan:

The WSGI 1.0.1 standard (PEP 3333) mandates that binary data be decoded as latin-1 text: http://www.python.org/dev/peps/pep-3333/#unicode-issues

This means that many WSGI headers will in fact contain *improperly encoded data*. Developers working directly with WSGI (rather than using a WSGI framework like Django, Flask or Pyramid) need to convert those strings back to bytes and decode them properly before passing them on to user applications.

I suggest adding a simple "fix_encoding" function to wsgiref that covers this:

    def fix_encoding(data, encoding, errors="surrogateescape"):
        return data.encode("latin-1").decode(encoding, errors)
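
As a rough illustration of what the function does, the sketch below simulates a server handing over UTF-8 bytes decoded as latin-1 and then repairs the result. The sample path and the variable names are made up for the example; only fix_encoding itself comes from the proposal.

```python
def fix_encoding(data, encoding, errors="surrogateescape"):
    # Recover the original bytes, then decode them with the real encoding.
    return data.encode("latin-1").decode(encoding, errors)

# Simulate a WSGI server: raw UTF-8 bytes exposed as a latin-1 decoded str.
raw = "/caf\u00e9".encode("utf-8")        # b"/caf\xc3\xa9"
wsgi_value = raw.decode("latin-1")        # mojibake: two characters for one

fixed = fix_encoding(wsgi_value, "utf-8")

# Bytes that are not valid UTF-8 survive as lone surrogates and can be
# turned back into the original bytes with the same error handler.
bad = fix_encoding("\xff", "utf-8")
restored = bad.encode("utf-8", "surrogateescape")
```

With the default "surrogateescape" handler the conversion never raises on undecodable input, so the original bytes remain recoverable.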

The primary intended benefit is to make WSGI-related code more self-documenting. Compare the proposal with the status quo:

    data = wsgiref.fix_encoding(data, "utf-8")
    data = data.encode("latin-1").decode("utf-8", "surrogateescape")

The proposal hides the mechanical details of what is going on in order to emphasise *why* the change is needed, and provides you with a name to go look up if you want to learn more.

The latter just looks nonsensical unless you're already familiar with this particular corner of the WSGI specification.
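
For context, here is a minimal sketch of how such a helper might be used in a bare WSGI application. The function is defined locally because wsgiref.fix_encoding is only a proposal; the application, the fake environ and the assumption that the client sent UTF-8 are all hypothetical.

```python
def fix_encoding(data, encoding, errors="surrogateescape"):
    # Local stand-in for the proposed wsgiref.fix_encoding.
    return data.encode("latin-1").decode(encoding, errors)

def app(environ, start_response):
    # PATH_INFO arrives as latin-1 decoded text; assume the client sent UTF-8.
    path = fix_encoding(environ.get("PATH_INFO", ""), "utf-8")
    body = path.encode("utf-8", "surrogateescape")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]

# Call the application directly with a hand-built environ (no server needed).
captured = []
result = app({"PATH_INFO": "/caf\xc3\xa9"},
             lambda status, headers: captured.append((status, headers)))
```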

----------
messages: 225814
nosy: ncoghlan
priority: normal
severity: normal
status: open
title: Add wsgiref.fix_encoding
type: enhancement
versions: Python 3.5

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue22264>
_______________________________________