[Python-Dev] Re: [Python-checkins] python/dist/src/Objects unicodeobject.c,2.139,2.140

M.-A. Lemburg mal@lemburg.com
Sun, 21 Apr 2002 16:31:10 +0200


Tim Peters wrote:
> 
> I expect Martin checked in this change because of the unhappy hours he spent
> determining that the previous two versions of this function wrote beyond the
> memory they allocated.  Since the most recent version still didn't bother to
> assert that it wasn't writing out of bounds, I can't blame Martin for
> checking in a version that does so assert; since I spent hours on this too,
> and this function has a repeated history of bad memory behavior, I viewed
> the version Martin replaced as unacceptable.

Are you sure you're talking about the latest version I checked in?
I spent hours on this too, and I'm pretty sure I have fixed the
buffer overruns now.

> However, the slowdown on large strings isn't attractive, and the previous
> version could easily enough have asserted its memory correctness.
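
To put a number on the slowdown for large strings, here is a small
sketch that works out the relative change from a few of the timings in
the patch log quoted further down in this message. The figures are
copied from that log; the helper name percent_change and the choice of
rows are only illustrative assumptions.

```python
# Timings copied from the patch log quoted below (same units as the log).
# "baseline" is unicodeobject.c 2.136, "patched" is unicode3.diff.
baseline = {
    "1000 spaces": 0.930,
    "10000 spaces": 0.690,
    "3000 bytes": 2.480,
    "30000 bytes": 2.230,
}
patched = {
    "1000 spaces": 1.650,
    "10000 spaces": 1.450,
    "3000 bytes": 3.740,
    "30000 bytes": 3.540,
}


def percent_change(old, new):
    """Relative change from old to new, in percent (positive = slower)."""
    return (new - old) / old * 100.0


if __name__ == "__main__":
    for label in baseline:
        change = percent_change(baseline[label], patched[label])
        print(f"{label:15s} {change:+6.1f}%")
```

The "3000 bytes" row is the 2.48 versus 3.74 comparison cited in the
quoted message.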

So why not just add the assert to my original version?
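
For readers following the thread, the two strategies under discussion
can be sketched roughly as follows. This is a Python illustration only,
not the C code in unicodeobject.c: the function names, the fixed
bytes_per_char factor, and the placement of the assert are all
assumptions made for the example. One variant overallocates the buffer
and asserts on every write that it stays in bounds; the other counts the
required bytes first and allocates exactly that much.

```python
def utf8_size(code_point):
    """Number of UTF-8 bytes needed for one code point."""
    if code_point < 0x80:
        return 1
    if code_point < 0x800:
        return 2
    if code_point < 0x10000:
        return 3
    return 4


def write_code_point(code_point, buf, pos):
    """Write one code point into buf at pos; return the new position."""
    size = utf8_size(code_point)
    # The bounds assertion: never write past the allocated buffer.
    assert pos + size <= len(buf), "write would run past the buffer"
    if size == 1:
        buf[pos] = code_point
    elif size == 2:
        buf[pos] = 0xC0 | (code_point >> 6)
        buf[pos + 1] = 0x80 | (code_point & 0x3F)
    elif size == 3:
        buf[pos] = 0xE0 | (code_point >> 12)
        buf[pos + 1] = 0x80 | ((code_point >> 6) & 0x3F)
        buf[pos + 2] = 0x80 | (code_point & 0x3F)
    else:
        buf[pos] = 0xF0 | (code_point >> 18)
        buf[pos + 1] = 0x80 | ((code_point >> 12) & 0x3F)
        buf[pos + 2] = 0x80 | ((code_point >> 6) & 0x3F)
        buf[pos + 3] = 0x80 | (code_point & 0x3F)
    return pos + size


def encode_overalloc(text, bytes_per_char=4):
    """Single pass: overallocate, write, then trim to the used size."""
    buf = bytearray(bytes_per_char * len(text))  # hypothetical factor
    pos = 0
    for ch in text:
        pos = write_code_point(ord(ch), buf, pos)
    return bytes(buf[:pos])


def encode_count_first(text):
    """Two passes: count the required bytes, then allocate exactly."""
    needed = sum(utf8_size(ord(ch)) for ch in text)
    buf = bytearray(needed)
    pos = 0
    for ch in text:
        pos = write_code_point(ord(ch), buf, pos)
    assert pos == needed
    return bytes(buf)
```

The first variant touches the input once but temporarily holds a larger
buffer; the second uses less memory at the cost of an extra pass over
the input. The sketch does not handle lone surrogates specially.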
 
> > -----Original Message-----
> > From: python-checkins-admin@python.org
> > [mailto:python-checkins-admin@python.org]On Behalf Of M.-A. Lemburg
> > Sent: Saturday, April 20, 2002 11:26 AM
> > To: loewis@sourceforge.net
> > Cc: python-checkins@python.org
> > Subject: Re: [Python-checkins] python/dist/src/Objects
> > unicodeobject.c,2.139,2.140
> >
> >
> > loewis@sourceforge.net wrote:
> >>
> >> Update of /cvsroot/python/python/dist/src/Objects
> >> In directory usw-pr-cvs1:/tmp/cvs-serv30961
> >>
> >> Modified Files:
> >>         unicodeobject.c
> >> Log Message:
> >> Patch #495401: Count number of required bytes for encoding UTF-8
> >> before allocating the target buffer.
> >
> > Martin, please back out this change again. We have discussed this
> > quite a few times and I am against using your strategy since
> > it introduces a performance hit which does not relate to the
> > gained advantage of (temporarily) using less memory.
> >
> > Your timings also show this, so I wonder why you checked in this
> > patch, e.g. from the patch log:
> > """
> > For the current
> > CVS (unicodeobject.c 2.136: MAL's change to use a variable
> > overalloc), I get
> >
> > 10 spaces                      20.060
> > 100 spaces                     2.600
> > 200 spaces                     2.030
> > 1000 spaces                    0.930
> > 10000 spaces                   0.690
> > 10 spaces, 3 bytes             23.520
> > 100 spaces, 3 bytes            3.730
> > 200 spaces, 3 bytes            2.470
> > 1000 spaces, 3 bytes           0.980
> > 10000 spaces, 3 bytes          0.690
> > 30 bytes                       24.800
> > 300 bytes                      5.220
> > 600 bytes                      3.830
> > 3000 bytes                     2.480
> > 30000 bytes                    2.230
> >
> > With unicode3.diff (that's the one you checked in), I get
> >
> > 10 spaces                      19.940
> > 100 spaces                     3.260
> > 200 spaces                     2.340
> > 1000 spaces                    1.650
> > 10000 spaces                   1.450
> > 10 spaces, 3 bytes             21.420
> > 100 spaces, 3 bytes            3.410
> > 200 spaces, 3 bytes            2.420
> > 1000 spaces, 3 bytes           1.660
> > 10000 spaces, 3 bytes          1.450
> > 30 bytes                       22.260
> > 300 bytes                      5.830
> > 600 bytes                      4.700
> > 3000 bytes                     3.740
> > 30000 bytes                    3.540
> > """
> >
> > The only case where your patch is faster is for very short
> > strings and then only by a few percent, whereas for all
> > longer strings you get worse timings, e.g. 3.74 seconds
> > compared to 2.48 seconds -- that's a 50% increase in
> > run-time!
> >
> > Thanks,
> > --
> > Marc-Andre Lemburg
> > CEO eGenix.com Software GmbH
> > ______________________________________________________________________
> > Company & Consulting:                           http://www.egenix.com/
> > Python Software:                   http://www.egenix.com/files/python/

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                   http://www.egenix.com/files/python/