[Patches] [ python-Patches-549375 ] Compromise PyUnicode_EncodeUTF8
noreply@sourceforge.net
Sat, 27 Apr 2002 11:05:13 -0700
Patches item #549375, was opened at 2002-04-27 00:35
You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=305470&aid=549375&group_id=5470
Category: Core (C code)
Group: Python 2.3
>Status: Closed
>Resolution: Accepted
Priority: 5
Submitted By: Tim Peters (tim_one)
>Assigned to: Tim Peters (tim_one)
Summary: Compromise PyUnicode_EncodeUTF8
Initial Comment:
This combines various ideas from Python-Dev. It
overallocates, but:
1) For short strings it does the conversion into a
stack buffer, and allocates exactly as much string
space as it turns out it needs at the end. So it
should be faster, but not waste any small-block memory.
2) For long strings it knows it's going to end up in
the system malloc/realloc, so it asks for the maximum
possibly needed at the start, returning the excess
untouched at the end. This gets rid of all the
embedded "but did I really get enough memory yet?"
tests and reallocations.
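
To make the two paths concrete, here is a minimal standalone sketch in plain C. It is not the code from the patch: it does not use the Python C API, the names encode_utf8_sketch and put_utf8 and the threshold MAX_SHORT_UNICHARS are hypothetical, and it assumes the input is an array of code points no larger than 0x10FFFF, with no special handling of surrogates.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical threshold for the "short string" path (an assumption). */
#define MAX_SHORT_UNICHARS 300

/* Write one code point as UTF-8 at p; return the bytes written (1..4).
   Assumes ch <= 0x10FFFF; surrogates get no special treatment here. */
static size_t put_utf8(unsigned char *p, uint32_t ch)
{
    if (ch < 0x80) {
        p[0] = (unsigned char)ch;
        return 1;
    }
    if (ch < 0x800) {
        p[0] = (unsigned char)(0xC0 | (ch >> 6));
        p[1] = (unsigned char)(0x80 | (ch & 0x3F));
        return 2;
    }
    if (ch < 0x10000) {
        p[0] = (unsigned char)(0xE0 | (ch >> 12));
        p[1] = (unsigned char)(0x80 | ((ch >> 6) & 0x3F));
        p[2] = (unsigned char)(0x80 | (ch & 0x3F));
        return 3;
    }
    p[0] = (unsigned char)(0xF0 | (ch >> 18));
    p[1] = (unsigned char)(0x80 | ((ch >> 12) & 0x3F));
    p[2] = (unsigned char)(0x80 | ((ch >> 6) & 0x3F));
    p[3] = (unsigned char)(0x80 | (ch & 0x3F));
    return 4;
}

/* Return a malloc'ed, NUL-terminated UTF-8 buffer, or NULL on failure.
   *out_len receives the encoded length, excluding the trailing NUL. */
unsigned char *encode_utf8_sketch(const uint32_t *s, size_t size,
                                  size_t *out_len)
{
    unsigned char stackbuf[4 * MAX_SHORT_UNICHARS];
    unsigned char *buf, *p, *result;
    size_t i, n;

    if (size > (SIZE_MAX - 1) / 4)      /* 4*size + 1 would overflow */
        return NULL;

    if (size <= MAX_SHORT_UNICHARS) {
        buf = stackbuf;                 /* short: convert on the stack */
    } else {
        buf = malloc(4 * size + 1);     /* long: ask for the maximum up front */
        if (buf == NULL)
            return NULL;
    }

    p = buf;
    for (i = 0; i < size; i++)          /* no "enough memory yet?" tests */
        p += put_utf8(p, s[i]);
    n = (size_t)(p - buf);
    assert(n <= 4 * size);              /* debug-build check of the bound */

    if (buf == stackbuf) {
        result = malloc(n + 1);         /* allocate exactly what was needed */
        if (result == NULL)
            return NULL;
        memcpy(result, stackbuf, n);
    } else {
        result = realloc(buf, n + 1);   /* hand back the untouched excess */
        if (result == NULL) {
            free(buf);
            return NULL;
        }
    }
    result[n] = '\0';
    *out_len = n;
    return result;
}
```

The caller owns the returned buffer and releases it with free(). The inner loop writes without any bounds checks, which is only safe because 4*size bytes are reserved before encoding starts.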
----------------------------------------------------------------------
>Comment By: Tim Peters (tim_one)
Date: 2002-04-27 14:05
Message:
Logged In: YES
user_id=31435
I added runtime release-build verification that 4*size
doesn't overflow a C int, and cleaned up the patch a
little. Since you and Martin both seem basically happy
with it, I just checked it in:
Objects/unicodeobject.c new revision: 2.146
----------------------------------------------------------------------
Comment By: Tim Peters (tim_one)
Date: 2002-04-27 13:41
Message:
Logged In: YES
user_id=31435
Well, the overallocation is exactly the same whether it's
on the stack or on the heap: where size is the # of
Unicode characters, it's guaranteed that 4*size bytes are
available for writing. The PyString_xyz routines guarantee
to make an additional byte available to store a trailing
\0, and indeed they add a trailing \0 automatically.
So the only question remaining is whether 4*size is a
correct upper bound. I think it's clear enough from your
code that it is, and so I'm happy to leave verification of
that to the debug build. What it could use more is runtime
release-build verification that 4*size doesn't overflow a C
int.
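
One way such a release-build check could look is sketched below. The helper name fits_4x_in_int is hypothetical and this is not necessarily the test that was checked in. It compares against INT_MAX / 4 before multiplying, because signed integer overflow is undefined behaviour in C, so the product cannot be inspected safely after the fact.

```c
#include <assert.h>
#include <limits.h>

/* Return 1 if 4*size can be computed in a C int without overflow,
   0 otherwise. Negative sizes are rejected as well. */
int fits_4x_in_int(int size)
{
    return size >= 0 && size <= INT_MAX / 4;
}
```

A caller would run this test first and report an out-of-memory error if it fails, rather than computing 4*size and allocating.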
----------------------------------------------------------------------
Comment By: M.-A. Lemburg (lemburg)
Date: 2002-04-27 10:53
Message:
Logged In: YES
user_id=38388
Cool. I like it.
You'd better make sure the stack buffer doesn't overrun though
-- I've only skimmed the implementation, but would suggest
adding an explicit test for this that is executed in release
builds too, not only in the debug build.
----------------------------------------------------------------------