[issue30564] Base64 decoding gives incorrect outputs.

Sun Jun 4 11:28:17 EDT 2017

Gareth Rees added the comment:

RFC 4648 section 3.5 says:

   The padding step in base 64 and base 32 encoding can, if improperly
   implemented, lead to non-significant alterations of the encoded data.
   For example, if the input is only one octet for a base 64 encoding,
   then all six bits of the first symbol are used, but only the first
   two bits of the next symbol are used.  These pad bits MUST be set to
   zero by conforming encoders, which is described in the descriptions
   on padding below.  If this property do not hold, there is no
   canonical representation of base-encoded data, and multiple base-
   encoded strings can be decoded to the same binary data.  If this
   property (and others discussed in this document) holds, a canonical
   encoding is guaranteed.

   In some environments, the alteration is critical and therefore
   decoders MAY chose to reject an encoding if the pad bits have not
   been set to zero.

If decoders may choose to reject non-canonical encodings, then they may also
choose to accept them. (That's the meaning of "MAY" in RFC 2119.) So I think
Python's behaviour is conforming to the standard.

----------
nosy: +gdr at garethrees.org

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue30564>
_______________________________________