[Patches] [ python-Patches-1101726 ] Patch for potential buffer overrun in tokenizer.c

SourceForge.net noreply at sourceforge.net
Thu Jan 13 16:45:20 CET 2005


Patches item #1101726, was opened at 2005-01-13 06:45
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1101726&group_id=5470

Category: Core (C code)
Group: Python 2.4
Status: Open
Resolution: None
Priority: 5
Submitted By: Greg Chapman (glchapman)
Assigned to: Nobody/Anonymous (nobody)
Summary: Patch for potential buffer overrun in tokenizer.c

Initial Comment:
The fp_readl function in tokenizer.c has a potential
buffer overrun; see:

www.python.org/sf/1089395

It is also triggered by trying to generate the pycom
file for the Excel 11.0 typelib, which is where I ran
into it.
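
For context, the failure mode is an unchecked copy of decoded
data into a fixed-size caller buffer. The sketch below is my own
illustration of that pattern and of the kind of length check a
fix needs; the names are hypothetical and this is not the actual
fp_readl code:

```c
#include <stddef.h>
#include <string.h>

/* Sketch of the overrun pattern (hypothetical names): fp_readl
 * decodes a source line and copies its UTF-8 re-encoding into a
 * caller-supplied buffer of `size` bytes.  An unchecked
 * strcpy(buf, decoded) runs past the end of buf whenever the
 * decoded line is longer than `size`.  The safe version checks
 * the length first and lets the caller grow the buffer: */
int copy_decoded_line(char *buf, size_t size, const char *decoded)
{
    size_t n = strlen(decoded);
    if (n + 1 > size)
        return -1;               /* caller must grow the buffer */
    memcpy(buf, decoded, n + 1); /* copy including the NUL */
    return 0;
}
```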

The attached patch allows successful generation of the
Excel file, and it runs the fail.py script from the
above report without an access violation.  It also
doesn't break any of the tests in the regression suite
(something like fail.py should probably be added as a
test).

It is not as efficient as it might be; with a function
for determining the number of Unicode characters in a
UTF-8 string, some memory allocations could be avoided.
Perhaps such a function should be added to unicodeobject.c?
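
As a rough illustration of the helper suggested above (the name
and signature are my own invention, not an existing CPython API):
counting code points in UTF-8 only requires skipping continuation
bytes, so it can be done in a single pass with no allocation:

```c
#include <stddef.h>

/* Count Unicode code points in a UTF-8 byte string.
 * Continuation bytes have the form 10xxxxxx, so every byte
 * that is NOT a continuation byte starts a new character.
 * Hypothetical helper; not currently in unicodeobject.c. */
size_t utf8_count_chars(const char *s, size_t len)
{
    size_t count = 0;
    size_t i;
    for (i = 0; i < len; i++) {
        if (((unsigned char)s[i] & 0xC0) != 0x80)
            count++;
    }
    return count;
}
```

With something like this, fp_readl could in principle size its
output buffer from the character count directly instead of going
through intermediate allocations.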

And, of course, the patch definitely needs review.  I'm
especially concerned that my use of
tok->decoding_buffer might be violating some kind of
assumption that I missed.

----------------------------------------------------------------------

