[New-bugs-announce] [issue23997] unicodedata_UCD_lookup() has theoretical buffer overflow

Sun Apr 19 00:32:38 CEST 2015

New submission from Christian Heimes:

Coverity has found a potential buffer overflow in the unicodedata module. The function lookup() calls _getcode(), which in turn calls _cmpname(). _cmpname() copies data into a fixed-size buffer of length NAME_MAXLEN. Neither lookup() nor _getcode() limits name_length to NAME_MAXLEN. Therefore the buffer could theoretically overflow.

In practice the buffer overflow can't be abused because Tools/unicode/makeunicodedata.py already limits the maximum name length. I would still like to fix the bug because it is low-hanging fruit. In most versions of Python the code already checks that name_length fits in INT_MAX.
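
The kind of length check described above can be sketched roughly as follows. This is only an illustration, not the attached patch and not the actual unicodedata code: cmpname_sketch() is a hypothetical helper, and NAME_MAXLEN is set to 256 here only because that is the array size quoted in the Coverity report.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Assumed value for illustration; 256 is the array size Coverity reports. */
#define NAME_MAXLEN 256

/* Hypothetical helper (not the real _cmpname()): uppercases `name` into a
 * fixed-size buffer and compares it with `expected`.
 * Returns 1 on a match and 0 otherwise. */
static int
cmpname_sketch(const char *name, size_t namelen, const char *expected)
{
    char buffer[NAME_MAXLEN + 1];
    size_t i;

    /* The guard: reject over-long names before touching the buffer.
     * Without it, namelen > NAME_MAXLEN would write past the end. */
    if (namelen > NAME_MAXLEN)
        return 0;

    for (i = 0; i < namelen; i++)
        buffer[i] = (char)toupper((unsigned char)name[i]);
    buffer[namelen] = '\0';

    return strlen(expected) == namelen &&
           memcmp(buffer, expected, namelen) == 0;
}
```

With a check like this in place, a name longer than NAME_MAXLEN is treated as "not found" rather than being copied, so the caller never depends on the generated database to keep names short.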

CID 1295028 (#1 of 1): Out-of-bounds access (OVERRUN)
overrun-call: Overrunning callee's array of size 256 by passing argument (int)name_length (which evaluates to 2147483647) in call to _getcode

----------
files: unicode_name_maxlen.patch
keywords: patch
messages: 241461
nosy: benjamin.peterson, christian.heimes, ezio.melotti, haypo, lemburg, pitrou, serhiy.storchaka
priority: normal
severity: normal
stage: patch review
status: open
title: unicodedata_UCD_lookup() has theoretical buffer overflow
type: behavior
versions: Python 2.7, Python 3.3, Python 3.4, Python 3.5
Added file: http://bugs.python.org/file39109/unicode_name_maxlen.patch

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue23997>
_______________________________________