[New-bugs-announce] [issue31170] expat: utf8_toUtf8 cannot properly handle exhausting buffer
Lin Tian
report at bugs.python.org
Thu Aug 10 00:48:25 EDT 2017
New submission from Lin Tian:
static enum XML_Convert_Result PTRCALL
utf8_toUtf8(const ENCODING *UNUSED_P(enc),
            const char **fromP, const char *fromLim,
            char **toP, const char *toLim)
{
  char *to;
  const char *from;
  const char *fromLimInitial = fromLim;

  /* Avoid copying partial characters. */
  align_limit_to_full_utf8_characters(*fromP, &fromLim);

  for (to = *toP, from = *fromP; (from < fromLim) && (to < toLim); from++, to++)
    *to = *from;
  *fromP = from;
  *toP = to;

  if (fromLim < fromLimInitial)
    return XML_CONVERT_INPUT_INCOMPLETE;
  else if ((to == toLim) && (from < fromLim))
    // <===== Bug is here. When (to == toLim), the copy loop may have
    // stopped in the middle of a multi-byte character. For example,
    // with a 3-byte character (A, B, C), from may be left pointing
    // at C, which means only A and B were copied to the output buffer.
    // The next scan starts at C, which can be treated as an invalid
    // byte and dropped. After this, only "AB" is kept in memory,
    // which leads to an "invalid continuation byte" error.
    return XML_CONVERT_OUTPUT_EXHAUSTED;
  else
    return XML_CONVERT_COMPLETED;
}
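
One possible way to avoid the partial copy is to clamp the input window to the free output space first, and only then trim it back to a character boundary. The sketch below is a standalone illustration, not a patch against expat: copy_utf8, utf8_boundary and enum convert_result are hypothetical names invented here, and the boundary check assumes the input is well-formed UTF-8 apart from being cut short.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result codes, modeled on the names in the report. */
enum convert_result {
  CONVERT_COMPLETED,
  CONVERT_INPUT_INCOMPLETE,
  CONVERT_OUTPUT_EXHAUSTED
};

/* Hypothetical helper: return the largest limit <= lim such that
   [from, limit) does not end inside a multi-byte character. */
static const char *utf8_boundary(const char *from, const char *lim)
{
  const char *p = lim;
  size_t trailing = 0;
  size_t need;
  unsigned char lead;

  /* Step back over continuation bytes (10xxxxxx), at most three. */
  while (p > from && trailing < 3 && ((unsigned char)p[-1] & 0xC0u) == 0x80u) {
    p--;
    trailing++;
  }
  if (p == from)
    return lim; /* no lead byte in range; nothing sensible to trim */

  lead = (unsigned char)p[-1];
  if ((lead & 0x80u) == 0x00u)      need = 1;
  else if ((lead & 0xE0u) == 0xC0u) need = 2;
  else if ((lead & 0xF0u) == 0xE0u) need = 3;
  else if ((lead & 0xF8u) == 0xF0u) need = 4;
  else return lim; /* malformed; left to the caller's validation */

  /* The lead byte plus `trailing` bytes are present; cut before the
     lead byte if that is fewer than the character needs. */
  return (trailing + 1 < need) ? p - 1 : lim;
}

/* Copy only whole characters, whether the limit comes from the input
   or from the output buffer. */
static enum convert_result
copy_utf8(const char **fromP, const char *fromLim, char **toP, const char *toLim)
{
  const char *from = *fromP;
  const char *aligned;
  char *to = *toP;
  int output_exhausted = 0;
  int input_incomplete = 0;
  size_t n;

  assert(from <= fromLim && to <= toLim);

  /* Clamp the input window to the space left in the output buffer. */
  if ((size_t)(fromLim - from) > (size_t)(toLim - to)) {
    fromLim = from + (toLim - to);
    output_exhausted = 1;
  }

  /* Then make sure the (possibly clamped) window ends on a boundary. */
  aligned = utf8_boundary(from, fromLim);
  if (aligned < fromLim) {
    if (!output_exhausted)
      input_incomplete = 1;
    fromLim = aligned;
  }

  n = (size_t)(fromLim - from);
  memcpy(to, from, n);
  *fromP = from + n;
  *toP = to + n;

  if (output_exhausted)
    return CONVERT_OUTPUT_EXHAUSTED;
  if (input_incomplete)
    return CONVERT_INPUT_INCOMPLETE;
  return CONVERT_COMPLETED;
}
```

With this ordering, a 3-byte character that does not fit in the remaining output space is left entirely in the input, so the next call sees A, B and C together. A caller would still need to supply at least 4 free bytes at some point, otherwise a call can copy nothing and keep reporting an exhausted buffer.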
----------
components: Library (Lib)
messages: 300043
nosy: Lin Tian
priority: normal
severity: normal
status: open
title: expat: utf8_toUtf8 cannot properly handle exhausting buffer
type: behavior
versions: Python 3.6, Python 3.7
_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue31170>
_______________________________________