[Patches] [ python-Patches-549375 ] Compromise PyUnicode_EncodeUTF8

noreply@sourceforge.net
Sat, 27 Apr 2002 11:05:13 -0700


Patches item #549375, was opened at 2002-04-27 00:35
You can respond by visiting: 
http://sourceforge.net/tracker/?func=detail&atid=305470&aid=549375&group_id=5470

Category: Core (C code)
Group: Python 2.3
>Status: Closed
>Resolution: Accepted
Priority: 5
Submitted By: Tim Peters (tim_one)
>Assigned to: Tim Peters (tim_one)
Summary: Compromise PyUnicode_EncodeUTF8

Initial Comment:
This combines various ideas from Python-Dev.  It 
overallocates, but:

1) For short strings it does the conversion into a 
stack buffer, and allocates exactly as much string 
space as it turns out it needs at the end.  So it 
should be faster, but not waste any small-block memory.

2) For long strings it knows it's going to end up in 
the system malloc/realloc, so it asks for the maximum 
possibly needed at the start, returning the excess 
untouched at the end.  This gets rid of all the 
embedded "but did I really get enough memory yet?" 
tests and reallocations.
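
The two cases above can be sketched in plain C. This is only an
illustration under stated assumptions, not the checked-in code: it uses
malloc/realloc in place of the PyString routines, takes input as 32-bit
code points rather than Py_UNICODE, and the names encode_utf8, put_utf8
and MAX_SHORT_CHARS (including its value) are hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define MAX_SHORT_CHARS 300   /* hypothetical cutoff for the stack path */

/* Write one code point as UTF-8 and return the number of bytes used (1..4).
   Assumes cp <= 0x10FFFF; surrogates are not treated specially here. */
static size_t put_utf8(char *p, uint32_t cp)
{
    if (cp < 0x80) {
        p[0] = (char)cp;
        return 1;
    }
    if (cp < 0x800) {
        p[0] = (char)(0xC0 | (cp >> 6));
        p[1] = (char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        p[0] = (char)(0xE0 | (cp >> 12));
        p[1] = (char)(0x80 | ((cp >> 6) & 0x3F));
        p[2] = (char)(0x80 | (cp & 0x3F));
        return 3;
    }
    p[0] = (char)(0xF0 | (cp >> 18));
    p[1] = (char)(0x80 | ((cp >> 12) & 0x3F));
    p[2] = (char)(0x80 | ((cp >> 6) & 0x3F));
    p[3] = (char)(0x80 | (cp & 0x3F));
    return 4;
}

/* Returns a malloc'ed, NUL-terminated UTF-8 string, or NULL on failure.
   The caller frees the result. */
char *encode_utf8(const uint32_t *s, size_t size, size_t *out_len)
{
    char stackbuf[MAX_SHORT_CHARS * 4];   /* worst case: 4 bytes per char */
    char *heap = NULL;
    char *buf, *p, *result;
    size_t i, n;

    if (size <= MAX_SHORT_CHARS) {
        buf = stackbuf;                   /* case 1: convert on the stack */
    } else {
        if (size > (SIZE_MAX - 1) / 4)    /* 4*size + 1 would overflow */
            return NULL;
        heap = malloc(4 * size + 1);      /* case 2: ask for the maximum */
        if (heap == NULL)
            return NULL;
        buf = heap;
    }

    /* No "is there enough room yet?" tests are needed inside the loop. */
    p = buf;
    for (i = 0; i < size; i++)
        p += put_utf8(p, s[i]);
    n = (size_t)(p - buf);

    if (heap == NULL) {
        result = malloc(n + 1);           /* allocate exactly what is needed */
        if (result == NULL)
            return NULL;
        memcpy(result, stackbuf, n);
    } else {
        result = realloc(heap, n + 1);    /* return the unused excess */
        if (result == NULL) {
            free(heap);
            return NULL;
        }
    }
    result[n] = '\0';
    *out_len = n;
    return result;
}
```

In both paths the conversion loop writes into a buffer already known to be
large enough, so the only size decisions happen once before and once after
the loop.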

----------------------------------------------------------------------

>Comment By: Tim Peters (tim_one)
Date: 2002-04-27 14:05

Message:
Logged In: YES 
user_id=31435

I added runtime release-build verification that 4*size 
doesn't overflow a C int, and cleaned up the patch a 
little.  Since you and Martin both seem basically happy 
with it, I just checked it in:

Objects/unicodeobject.c new revision: 2.146

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-04-27 13:41

Message:
Logged In: YES 
user_id=31435

Well, the overallocation is exactly the same whether it's 
on the stack or on the heap:  where size is the # of 
Unicode characters, it's guaranteed that 4*size bytes are 
available for writing.  The PyString_xyz routines guarantee 
to make an additional byte available to store a trailing 
\0, and indeed they add a trailing \0 automatically.

So the only question remaining is whether 4*size is a 
correct upper bound.  I think it's clear enough from your 
code that it is, and so I'm happy to leave verification of 
that to the debug build.  What it could use more is runtime 
release-build verification that 4*size doesn't overflow a C 
int.
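
A check of this kind might look like the following sketch. It is an
assumption about the general shape, not the code that was checked in;
the function name worst_case_utf8_bytes is hypothetical. The point is
that the comparison is made against INT_MAX / 4 before multiplying, so
the test itself cannot overflow, and it runs in release builds because
it does not rely on assert().

```c
#include <limits.h>

/* Hypothetical helper: store 4*size in *nbytes and return 1 if the product
   fits in a C int; return 0 (leaving *nbytes untouched) otherwise. */
static int worst_case_utf8_bytes(int size, int *nbytes)
{
    /* Compare before multiplying so the check cannot itself overflow. */
    if (size < 0 || size > INT_MAX / 4)
        return 0;
    *nbytes = 4 * size;
    return 1;
}
```

A caller would treat a 0 return as an error (for example, by reporting
that the string is too long to encode) rather than going on to allocate.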

----------------------------------------------------------------------

Comment By: M.-A. Lemburg (lemburg)
Date: 2002-04-27 10:53

Message:
Logged In: YES 
user_id=38388

Cool. I like it.

You'd better make sure the stack buffer doesn't overrun, though
-- I've only skimmed the implementation, but I would suggest
adding an explicit test for this that is executed not only in
the debug build.


----------------------------------------------------------------------
