[Mailman-Users] cause of bounces

Wed Oct 18 15:32:48 EDT 2017

On 10/18/2017 01:07 PM, Dimitri Maziuk wrote:
> 17 == 0x11. "17" != "0x11". Which was precisely the point: if your MTA, 
> say, does unicodedata.normalize( 'NFKD' ... ), and turns u-umlaut into a 
> regular "u", you may consider it benign. Many won't.

I would not consider that benign at all.
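For reference, Python's `unicodedata.normalize('NFKD', ...)` does not turn "ü" into a plain "u" by itself. It decomposes "ü" into "u" followed by a combining diaeresis. The plain "u" Dimitri describes only appears after a second, lossy step that throws the combining marks away. The following is a minimal sketch; the `strip_marks` helper is hypothetical and exists only for illustration:

```python
import unicodedata

def strip_marks(text: str) -> str:
    """Hypothetical helper: decompose with NFKD, then drop combining marks.

    This second step is what turns "ü" into "u"; NFKD alone does not.
    """
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

# NFKD alone: "ü" (U+00FC) becomes "u" (U+0075) + U+0308 COMBINING DIAERESIS.
print([hex(ord(c)) for c in unicodedata.normalize("NFKD", "\u00fc")])  # ['0x75', '0x308']

# Dropping the combining marks afterwards is the step that loses information.
print(strip_marks("W\u00fctrich"))  # Wutrich
```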

I'm referring to the difference between:

  - ü      - raw UTF-8 (8-bit)
  - =C3=BC - quoted-printable
  - w7w=   - base 64
  - &uuml; - HTML

All four representations are for the *same* letter / character / glyph / 
byte(s).

I consider those to be (effectively) benign content-encoding changes.
Note that the content is the same; the only difference is how it's
encoded.
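Python's standard library can decode each of the four representations. This makes it easy to confirm that they all carry the same character. The sketch assumes that the HTML form is a character reference such as `&uuml;` or `&#252;`. It also assumes the underlying bytes are UTF-8, which the quoted-printable `=C3=BC` implies.

```python
import base64
import html
import quopri

raw = "\u00fc"                                            # the character itself: "ü"
from_qp = quopri.decodestring(b"=C3=BC").decode("utf-8")  # quoted-printable
from_b64 = base64.b64decode("w7w=").decode("utf-8")       # base 64
from_html_named = html.unescape("&uuml;")                 # HTML named reference
from_html_numeric = html.unescape("&#252;")               # HTML numeric reference (decimal 252)

# Collect every decoded form; a single-element set means they all agree.
decoded = {raw, from_qp, from_b64, from_html_named, from_html_numeric}
print(decoded)  # {'ü'}
```

Encoding goes in the other direction: `base64.b64encode("ü".encode("utf-8"))` gives back `b"w7w="`. Only the transfer representation changes; the text does not.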

> Most importantly, crypto signature will change, and DKIM check will fail.

DKIM, by design, will fail if anything that is signed changes.  (See the
ROPEMAKER attack for a better explanation of what "anything signed" does
and does not cover.)
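Here is a rough illustration of why re-encoding breaks DKIM even when the text is unchanged. The signature covers a hash of the message body (the `bh=` tag in RFC 6376). That hash is computed over the body as transmitted, meaning the encoded bytes, not the decoded text. The sketch below hashes the same "Wütrich" body twice, once in quoted-printable form and once in base 64. It is a deliberate simplification. It models only "simple" body canonicalization and the SHA-256 body hash, and it leaves out header signing and the signature step itself. `simple_body_hash` is a hypothetical helper, not a real DKIM library call.

```python
import base64
import hashlib
import quopri

def simple_body_hash(body: bytes) -> str:
    """Rough DKIM "bh=" value: "simple" body canonicalization + SHA-256.

    Only the body hash is modelled; header signing and the actual
    signature computation are left out.
    """
    # "simple" canonicalization ignores empty lines at the end of the body,
    # leaving exactly one trailing CRLF (an empty body becomes a lone CRLF).
    while body.endswith(b"\r\n\r\n"):
        body = body[:-2]
    if not body.endswith(b"\r\n"):
        body += b"\r\n"
    return base64.b64encode(hashlib.sha256(body).digest()).decode("ascii")

text = "W\u00fctrich"
qp_body = b"W=C3=BCtrich\r\n"                                # quoted-printable
b64_body = base64.b64encode(text.encode("utf-8")) + b"\r\n"  # base 64

# Both bodies decode to the same text...
print(quopri.decodestring(qp_body).decode("utf-8").strip() == text)  # True
print(base64.b64decode(b64_body).decode("utf-8") == text)            # True
# ...but the signed bytes differ, so the body hash no longer matches.
print(simple_body_hash(qp_body) == simple_body_hash(b64_body))       # False
```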

> Benign is in the eye of the beholder.

~eh~ ... Okay.

> We're inserting this stuff into a
> database where a search for "Wutrich" will find neither "Wütrich" nor 
> "W\u0308trich" so I wouldn't consider it benign at all.

I do not consider "Wutrich" and "Wütrich" to be the same string.  The 
former may be considered a poor representation of the latter.

I'm not sure which character Unicode code point U+0308 is, but I doubt
that it is the same as "ü" (decimal 252, hex 00FC, octal 374).  (I would
have to look it up to know for sure.)
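Python's `unicodedata` module can do that lookup. It reports U+0308 as COMBINING DIAERESIS, a mark that attaches to the preceding base character. "ü", by contrast, is the single precomposed code point U+00FC. The decomposed spelling is a "u" followed by U+0308. That spelling differs from the precomposed one code point by code point, but the two are canonically equivalent, so normalizing (for example to NFC) makes them compare equal. The sample strings below are illustrative.

```python
import unicodedata

precomposed = "W\u00fctrich"   # "ü" as one code point, U+00FC
decomposed = "Wu\u0308trich"   # "u" followed by U+0308

print(unicodedata.name("\u0308"))  # COMBINING DIAERESIS
print(unicodedata.name("\u00fc"))  # LATIN SMALL LETTER U WITH DIAERESIS
print(ord("\u00fc"), hex(ord("\u00fc")), oct(ord("\u00fc")))  # 252 0xfc 0o374

# Code point by code point these are different strings...
print(precomposed == decomposed)  # False
# ...but they are canonically equivalent, so NFC makes them equal.
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
```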

I would hope that the data would be decoded and normalized to a single
encoding before it goes into the database.  E.g. "=C3=BC"
(quoted-printable) would be decoded to "ü" and stored in the database as
such.
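The following is a minimal sketch of "decode, then normalize, then store" using Python's `email` package. `get_content()` on an `EmailMessage` undoes the transfer encoding (quoted-printable or base 64) and applies the declared charset. NFC normalization then makes precomposed and decomposed input store identically. The raw message and the `to_storage_form` helper are hypothetical. The choice of NFC as the storage form is an assumption; the thread does not specify one.

```python
import email
import unicodedata
from email import policy

# Hypothetical raw message: a UTF-8 text body sent as quoted-printable.
RAW = (
    b"Content-Type: text/plain; charset=utf-8\r\n"
    b"Content-Transfer-Encoding: quoted-printable\r\n"
    b"\r\n"
    b"W=C3=BCtrich\r\n"
)

def to_storage_form(raw: bytes) -> str:
    """Hypothetical helper: undo the transfer encoding, then NFC-normalize."""
    msg = email.message_from_bytes(raw, policy=policy.default)
    text = msg.get_content()  # decodes the transfer encoding and the charset
    return unicodedata.normalize("NFC", text).strip()

print(to_storage_form(RAW))  # Wütrich
```

The same body sent as base 64 (`V8O8dHJpY2g=`) goes through the same helper and comes out as the same stored string.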

I would further hope that any search of the database would be able to do 
something like a character class (type) search so that it could match on 
"W[üu]trich".  (Adjust as necessary.)
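The `W[üu]trich` idea can be sketched with Python's `re` module. There is one catch. The character class matches single code points, so a decomposed "u" + U+0308 would not match unless the text is NFC-normalized first. Another common approach, swapped in here rather than taken from the thread, is to compare accent-folded search keys instead of raw strings. The `fold` helper and the sample rows are hypothetical.

```python
import re
import unicodedata

def fold(text: str) -> str:
    """Hypothetical search key: NFD-decompose, drop combining marks, casefold."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).casefold()

rows = ["W\u00fctrich", "Wu\u0308trich", "Wutrich", "Wittrich"]

# Character-class search: normalize to NFC first, so that a decomposed
# "u" + U+0308 becomes the single "ü" the class expects.
pattern = re.compile("W[\u00fcu]trich")
by_class = [r for r in rows if pattern.fullmatch(unicodedata.normalize("NFC", r))]

# Alternative: compare accent-folded keys instead of raw strings.
by_fold = [r for r in rows if fold(r) == fold("Wutrich")]

print(len(by_class), len(by_fold))  # 3 3  ("Wittrich" is excluded by both)
```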

-- 
Grant. . . .
unix || die