yEnc implementation in Python, bit slow

Wed Aug 6 04:34:39 EDT 2003

On 5 Aug 2003 14:37:55 +1000, Freddie <oinkfreddie at oinkshlick.oinknet> wrote:

>Freddie <oinkfreddie at oinkshlick.oinknet> wrote in
>news:Xns93CE8D81747C5freddiethescaryeleph at 218.100.3.9: 
>
>Arr. There's an error here, the [n+i+256+1] shouldn't have a 1. I always get 
>that wrong :) The posted files actually decode now, and the yEncode() 
>overhead is a lot lower.
>
><snip>
>
>>      encoded = []
>>      n = 0
>>      for i in range(0, len(translated), 256):
>>           chunk = translated[n+i:n+i+256]
>>           if chunk[-1] == '=':
>>                chunk += translated[n+i+256]       <<< this line
>>                n += 1
>>           encoded.append(chunk)
>>           encoded.append('\n')
>
>-- 
>Remove the oinks!

Ok, I was't going to get too involved, but I thought first to try something different,
using a unicode translation table in the attempt to get all the translations in one sweep,
including the = escapes. It works (after hacking a while re encoding stuff ;-/ ), but it's
not that fast. But for small input strings it is faster than yours above, due to some things
which you can see I changed for yEncode2x below. That generally was fastest. I noticed big
differences according to input string length, so I decided to parameterize the testing,
to control the number of loops and the input string length by a steppable factor times
the 256-byte string you defined. Kind of got carried away with histograms to show throughput
comparisons, and ratios etc.

As you might expect, with large inputs, most of the time is spent in the translate routines,
and the setup and other overhead gets asymptoted out (verbing weirds language). The unicode
version's apparent throughput settles relatively early to something fairly constant. I thought
it could possibly beat the other versions because of the multiple replace passes, but not so.
Not sure where the major slowdown is.

On my old 300mhz P2 w/320MB ram and NT4 and Python 2.3, for your 256*1562 test strings,
I get:

[ 1:22] C:\pywk\clp>testyenc.py  10
Ratios of total test times over version 2x:
yEncode2:  1.888619 ratio/2x: 1.008296
yEncode3:  4.692256 ratio/2x: 2.505102
yEncode2x: 1.873080 ratio/2x: 1.000000

I.e., the average is practically identical for 2 and 2x, whereas the unicode (3)
is 2.5 times slower. Now for a minimal input (1 block of 256 bytes input):

[ 1:25] C:\pywk\clp>testyenc.py  10 1
Ratios of total test times over version 2x:
yEncode2:  0.019924 ratio/2x: 11.664868
yEncode3:  0.003890 ratio/2x: 2.277723
yEncode2x: 0.001708 ratio/2x: 1.000000

2x is over ten times faster than 2! And not quite as much faster than the unicode.

Comparing thoughput as ratios of how many times faster x2 is than 2 or 3
for 1-15 x 256 bytes input, 10 loops:

[ 1:25] C:\pywk\clp>testyenc.py -trh 10 1 15
  1  13.254171 x/2: 222222222222222222222222222222222222222222222222222222222222222222
  1   2.377331 x/3: 33333333333
  2   8.490966 x/2: 222222222222222222222222222222222222222222
  2   2.816887 x/3: 33333333333333
  3   6.828756 x/2: 2222222222222222222222222222222222
  3   2.932418 x/3: 33333333333333
  4   5.663449 x/2: 2222222222222222222222222222
  4   2.985414 x/3: 33333333333333
  5   4.859705 x/2: 222222222222222222222222
  5   3.008702 x/3: 333333333333333
  6   4.408558 x/2: 2222222222222222222222
  6   3.136186 x/3: 333333333333333
  7   4.052319 x/2: 22222222222222222222
  7   3.181253 x/3: 333333333333333
  8   3.694041 x/2: 222222222222222222
  8   3.220882 x/3: 3333333333333333
  9   3.462013 x/2: 22222222222222222
  9   3.271515 x/3: 3333333333333333
 10   3.263672 x/2: 2222222222222222
 10   3.283939 x/3: 3333333333333333
 11   3.082912 x/2: 222222222222222
 11   3.298241 x/3: 3333333333333333
 12   2.951584 x/2: 22222222222222
 12   3.319393 x/3: 3333333333333333
 13   2.799223 x/2: 2222222222222
 13   3.486121 x/3: 33333333333333333
 14   2.653915 x/2: 2222222222222
 14   3.383072 x/3: 3333333333333333

Now just kb/sec throughput, for input sizes of factor*256, steppin factor from 1 to 1562 by 100:
Oops, has to hand-adjust for 4 digits in factor (format had 3) :-/

I guess I'd recommend version yEncode2x ;-)

[ 1:30] C:\pywk\clp>testyenc.py -tkh 10 1 1562 100
   1   1466.546 : xxxxxxxxxxxxxxxxxxxxxxxxxxxxx
   1    124.638 : 22
   1    640.256 : 333333333333
 101   2775.813 : xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
 101   2142.211 : 222222222222222222222222222222222222222222
 101    913.269 : 333333333333333333
 201   2699.580 : xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
 201   2307.571 : 2222222222222222222222222222222222222222222222
 201    901.330 : 333333333333333333
 301   2719.819 : xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
 301   2417.963 : 222222222222222222222222222222222222222222222222
 301    898.533 : 33333333333333333
 401   2469.232 : xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
 401   2459.443 : 2222222222222222222222222222222222222222222222222
 401    899.486 : 33333333333333333
 501   2493.177 : xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
 501   2326.776 : 2222222222222222222222222222222222222222222222
 501    824.358 : 3333333333333333
 601   2519.179 : xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
 601   2177.450 : 2222222222222222222222222222222222222222222
 601    867.528 : 33333333333333333
 701   2485.202 : xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
 701   2188.600 : 2222222222222222222222222222222222222222222
 701    840.751 : 3333333333333333
 801   2383.141 : xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
 801   2283.640 : 222222222222222222222222222222222222222222222
 801    853.356 : 33333333333333333
 901   2462.079 : xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
 901   2140.306 : 222222222222222222222222222222222222222222
 901    837.844 : 3333333333333333
1001   2326.644 : xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
1001   2167.332 : 2222222222222222222222222222222222222222222
1001    846.462 : 3333333333333333
1101   2154.616 : xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
1101   2031.622 : 2222222222222222222222222222222222222222
1101    841.033 : 3333333333333333
1201   2015.904 : xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
1201   2008.703 : 2222222222222222222222222222222222222222
1201    830.437 : 3333333333333333
1301   2102.667 : xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
1301   2048.542 : 2222222222222222222222222222222222222222
1301    830.532 : 3333333333333333
1401   2117.665 : xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
1401   2036.031 : 2222222222222222222222222222222222222222
1401    831.369 : 3333333333333333
1501   2106.134 : xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
1501   2030.204 : 2222222222222222222222222222222222222222
1501    823.828 : 3333333333333333

====< testyenc.py >==================================================================
## testyenc.py
## Orig as posted by Freddie + his correction:
## except as noted, and tabs changed to 4 spaces
## Subject: yEnc implementation in Python, bit slow
## From: Freddie <oinkfreddie at oinkshlick.oinknet>
## Message-ID: <Xns93CE3ABF6418freddiethescaryeleph at 218.100.3.9>
## Date: 5 Aug 2003 00:50:58 +1000
## 
## followed by an improved version, followed by an experiment using a unicode translation table
##
def yEncode2(data):
    trans = ''
    for i in range(256):
        trans += chr((i+42)%256)

    translated = data.translate(trans)

    # escape =, NUL, LF, CR
    for i in (61, 0, 10, 13):
        j = '=%c' % (i + 64)
        translated = translated.replace(chr(i), j)

    encoded = []
    n = 0
    for i in range(0, len(translated), 256):
        chunk = translated[n+i:n+i+256]
        if chunk[-1] == '=':
            chunk += translated[n+i+256]    #  <<< this line changed per Freddie's post
#           chunk += translated[n+i+256+1]     <--was
            n += 1
        encoded.append(chunk)
        encoded.append('\n')

    result = ''.join(encoded)

#XXX#    print len(result),
    return result
#---------------------------------------------------
# above version somewhat optimized
def yEncode2x(data,
    trans = ''.join([chr((i+42)%256) for i in xrange(256)]),
    escapes = [(chr(i), '=%c'%(i+64)) for i in (61, 0, 10, 13)]
    ):
    translated = data.translate(trans)
    # escape =, NUL, LF, CR
    for cold, cnew in escapes:
        translated = translated.replace(cold, cnew)

    encoded = []
    start = 0; end = len(translated)
    while start<end:
        stop = start+256
        if translated[stop-1:stop]=='=': stop+=1
        encoded.append(translated[start:stop])
        start = stop
    encoded.append('')
    return '\n'.join(encoded)
#---------------------------------------------------

def yEncode3(data,
    # set up translation tables at def-time (expensive, but only once on import)
    utrans = u''.join([
        # >256 chars don't matter
        (u>=256 and u'\x00') or
        # escape if translated by +42 mod256 is in (=, NUL, LF, CR)
        (((u+42)%256) in (61, 0, 10, 13) and unicode('=%c'%chr((u+42+64)%256), 'utf-16')) or
        # translate others +42 mod 256
        unicode('%c\x00'%chr((u+42)%256),'utf-16') for u in xrange(256*256)]),
    identity = ''.join(map(chr,xrange(256)))
):
    utranslated = unicode(data,'latin-1').translate(utrans)
    no_lf = utranslated.encode('utf-16').translate(identity, '\x00')[2:]
    ret = []; i = 0; eod = len(no_lf)
    while i < eod:
        top = i+256
        if no_lf[top-1:top]=='=': top+=1
        ret.append(no_lf[i:top])
        i = top
    ret.append('')
    return '\n'.join(ret)

def test(nloops, factor=1562):
    from time import clock
    test = [chr(i) for i in xrange(256)]
    teststr = ''.join(test*factor)

    # sanity check
    assert yEncode2(teststr) == yEncode3(teststr) == yEncode2x(teststr)
    assert yEncode2(teststr[:-1]) == yEncode3(teststr[:-1]) == yEncode2x(teststr[:-1])
    assert yEncode2(teststr+'?') == yEncode3(teststr+'?') == yEncode2x(teststr+'?')
    t1 = clock()
    for i in xrange(nloops): dummy = yEncode2(teststr)
    t2 = clock()
    for i in xrange(nloops): dummy = yEncode3(teststr)
    t3 = clock()
    for i in xrange(nloops): dummy = yEncode2x(teststr)
    t4 = clock()
    tenc2 = t2-t1
    tenc3 = t3-t2
    tenc2x = t4-t3
    return tenc2, tenc3, tenc2x

def test2(opt,nloops, factorstart=1562, factorstop=1563, factorstep=1):
    for factor in xrange(factorstart,factorstop,factorstep):
        kb = nloops*256.0*factor/1024.
        tenc2, tenc3, tenc2x = test(nloops, factor)
        tenc2 = kb/tenc2; tenc3 = kb/tenc3; tenc2x = kb/tenc2x
        ratio2x = tenc2x/tenc2; ratio3x=tenc2x/tenc3
        if opt=='-trh':
            print '%3s %10.6f x/2: %s'%(factor, ratio2x,'2'*int(ratio2x*5))
            print '%3s %10.6f x/3: %s'%(factor, ratio3x,'3'*int(ratio3x*5))
        elif opt=='-tkb':
            print '%3s: e2x kb/s = %10.3f, e2 kb/s = %10.3f, e3 kb/s = %10.3f' %(factor,tenc2x, tenc2, tenc3)
        elif opt=='-tkh':
            print '%3s %10.3f : %s'%(factor, tenc2x,'x'*int(tenc2x/50))
            print '%3s %10.3f : %s'%(factor, tenc2,'2'*int(tenc2/50))
            print '%3s %10.3f : %s'%(factor, tenc3,'3'*int(tenc3/50))
        else: raise ValueError, 'test2 only undestands one of: -trh -tkb -tkg'
if __name__ == '__main__':
    def boxit(s): hbar = '-'*len(s); return ' +-%s-+\n | %s |\n +-%s-+' % (hbar, s, hbar)
    import sys
    args = sys.argv[1:]
    try:
        if args and args[0].startswith('-t'):
            test2(args[0], *map(int,args[1:]))
        else:
            tenc2, tenc3, tenc2x = test(*map(int,sys.argv[1:]))
            print (
                'Ratios of total test times over version 2x:\n'
                'yEncode2:  %f ratio/2x: %f\n'
                'yEncode3:  %f ratio/2x: %f\n'
                'yEncode2x: %f ratio/2x: %f' % (
                 tenc2, tenc2/tenc2x, tenc3, tenc3/tenc2x, tenc2x, tenc2x/tenc2x)
            )
    except Exception, e:
        box = boxit('%s: %s'%(e.__class__.__name__, e))
        print """
%(box)s
 Usage: testyenc.py nloops  factor <1562>
            for nloops with factor*<256-byte-test-strings>,
            outputting total times and ratios, or
        testyenc.py opt nloops [start <1562> [stop <1563> [step <1>]]]
            for testing the same with a range of factors, and
        where <> is for defaults and opt:
           -trh    for ratio histogram of enc2x kb/sec over enc2 and enc3 rates
           -tkb    for kb/sec numbers for the three versions 
           -tkh    for a kb/sec histogram
""" % vars()
=====================================================================================

Regards,
Bengt Richter