[Mailman-Users] cause of bounces
Grant Taylor
gtaylor at tnetconsulting.net
Wed Oct 18 15:32:48 EDT 2017
On 10/18/2017 01:07 PM, Dimitri Maziuk wrote:
> 17 == 0x11. "17" != "0x11". Which was precisely the point: if your MTA,
> say, does unicodedata.normalize( 'NFKD' ... ), and turns u-umlaut into a
> regular "u", you may consider it benign. Many won't.
I would not consider that benign at all.
I'm referring to the difference between:
- ü - raw 8-bit character (not strictly ASCII)
- =C3=BC - quoted-printable
- w7w= - base 64
- &uuml; - HTML
All four representations are for the *same* letter / character / glyph /
byte(s).
I consider those to be (effectively) benign content encoding changes.
Note that the content is the same; the only difference is how it's
encoded.
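A minimal Python sketch of that equivalence, using only the standard library. It assumes the bytes are UTF-8 and that the HTML form is the named entity `&uuml;`; the variable names are purely illustrative.

```python
import base64
import html
import quopri

raw = "\u00fc"  # the character itself, U+00FC (LATIN SMALL LETTER U WITH DIAERESIS)

# Decode each transfer / markup encoding back to text (UTF-8 assumed).
from_qp = quopri.decodestring(b"=C3=BC").decode("utf-8")
from_b64 = base64.b64decode("w7w=").decode("utf-8")
from_html = html.unescape("&uuml;")

# All four routes end up at the same single character.
print(raw == from_qp == from_b64 == from_html)
```

Only the wrapping differs; once decoded, the text compares equal.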
> Most importantly, crypto signature will change, and DKIM check will fail.
DKIM, by design, will fail if anything that is signed changes. (See the
ROPEMAKER attack for a better explanation about anything signed.)
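A rough illustration of why a re-encoding breaks the signature even though the decoded text is unchanged. This is only a sketch: real DKIM canonicalizes the body (simple or relaxed) before hashing, which is skipped here, and the sample body is made up.

```python
import base64
import hashlib
import quopri

text = "W\u00fctrich".encode("utf-8")  # hypothetical message body, UTF-8

qp_body = b"W=C3=BCtrich"              # as sent with quoted-printable encoding
b64_body = base64.b64encode(text)      # as sent with base64 encoding

# The decoded content is identical ...
same_content = quopri.decodestring(qp_body) == base64.b64decode(b64_body) == text
# ... but a hash over the bytes on the wire is not.
same_hash = hashlib.sha256(qp_body).digest() == hashlib.sha256(b64_body).digest()

print(same_content, same_hash)
```

The hash is taken over the encoded bytes, so an intermediary that changes the transfer encoding changes what the verifier hashes.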
> Benign is in the eye of the beholder.
~eh~ ... Okay.
> We're inserting this stuff into a
> database where a search for "Wutrich" will find neither "Wütrich" nor
> "W\u0308trich" so I wouldn't consider it benign at all.
I do not consider "Wutrich" and "Wütrich" to be the same string. The
former may be considered a poor representation of the latter.
I'm not sure which Unicode code point 308 is, but I doubt that it is the
same as <ü> 252, Hex 00fc, Octal 374. (I would have to look it up to
know for sure.)
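For reference, Python's `unicodedata` module can answer this. U+0308 is COMBINING DIAERESIS, the mark that NFKD splits off from the precomposed ü (U+00FC), so the two are indeed different code points. A short check:

```python
import unicodedata

precomposed = "\u00fc"  # ü as one code point (decimal 252, hex 00fc)
decomposed = unicodedata.normalize("NFKD", precomposed)

print(unicodedata.name("\u0308"))          # COMBINING DIAERESIS
print([hex(ord(c)) for c in decomposed])   # ['0x75', '0x308']
print(precomposed == decomposed)           # False: different code point sequences
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
```

So "u" followed by U+0308 renders as ü but does not compare equal to U+00FC unless both sides are normalized to the same form first.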
I would hope that data would be normalized to the same encoding in the
database. I.e. "=C3=BC" (quoted-printable) would be normalized to "ü"
and stored in the database as such.
I would further hope that any search of the database would be able to do
something like a character class (type) search so that it could match on
"W[üu]trich". (Adjust as necessary.)
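One way this could look in Python, purely as a sketch. The `fold` helper and the sample names are hypothetical, and a real database would more likely use its own collation or normalization features than application code like this.

```python
import re
import unicodedata

def fold(text):
    """Hypothetical helper: decompose, drop combining marks, casefold."""
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return stripped.casefold()

# Sample stored values: precomposed, decomposed, plain, and a non-match.
names = ["W\u00fctrich", "Wu\u0308trich", "Wutrich", "Wuthrich"]

# Option 1: compare accent-folded keys.
matches = [n for n in names if fold(n) == fold("Wutrich")]

# Option 2: normalize to NFC first, then apply the character class.
pattern = re.compile("W[\u00fcu]trich")
class_matches = [n for n in names
                 if pattern.fullmatch(unicodedata.normalize("NFC", n))]

print(matches == class_matches == names[:3])
```

The character class alone would miss the decomposed spelling, which is why the sketch normalizes to NFC before matching.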
--
Grant. . . .
unix || die