[Patches] [ python-Patches-536661 ] splitext performances improvement

noreply@sourceforge.net noreply@sourceforge.net
Fri, 29 Mar 2002 10:56:33 -0800


Patches item #536661, was opened at 2002-03-29 03:06
You can respond by visiting: 
http://sourceforge.net/tracker/?func=detail&atid=305470&aid=536661&group_id=5470

Category: Library (Lib)
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Sebastien Keim (s_keim)
Assigned to: Nobody/Anonymous (nobody)
Summary: splitext performances improvement

Initial Comment:
After more thought, I must admit that the behavior change to splitext that I proposed in patch 536120 is not acceptable. I propose this patch instead, which should only improve performance without modifying behavior.
The following benchmark shows that the patched splitext is between 2x (for l1) and 25x (for l2) faster than the original.

The diff also patches test_posixpath.py to check the pitfall described in Tim's comments on the patch 536120 page.

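# Original implementation: scans the path one character at a time.
def splitext(p):
    root, ext = '', ''
    for c in p:
        if c == '/':
            # A slash ends any pending extension; fold it back into root.
            root, ext = root + ext + c, ''
        elif c == '.':
            if ext:
                root, ext = root + ext, c
            else:
                ext = c
        elif ext:
            ext = ext + c
        else:
            root = root + c
    return root, ext

# Patched implementation: locate the last dot and the last slash directly.
def splitext2(p):
    i = p.rfind('.')
    # No dot at all, or the last dot precedes the last slash: no extension.
    if i <= p.rfind('/'):
        return p, ''
    else:
        return p[:i], p[i:]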

l1 = ('t','.t','a.b/','a.b','/a.b','a.b/.c','a.b/c.d')

l2 = (
'usr/tmp.doc/list/home/sebastien/foo/bar/hghgt/yttyutyuyuttyuyut.tyyttyt',
'usr/tmp.doc/list/home/sebastien/foo/bar/hghgt/yttyutyuyuttyuyut.',
'usr/tmp.doc/list/home/sebastien/foo/bar/hghgt/.tyyttyt',
'usr/tmp.doc/list/home/sebastien/foo/bar/hghgt/yttyutyuyuttyuyut',
'reeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeyttyutyuyuttyuyut.tyyttyt',
'/iuouiiuuoiiuiikhjzekezhjzekejkejkzejkhejkhzejzehjkhjezhjkehzkhjezh.tyyttyt'
    )

for i in l1+l2:
    assert splitext2(i) == splitext(i)

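import time

def test(f, args):
    # Time 1000 calls of f on each path in args.
    t = time.clock()
    for p in args:
        for i in range(1000):
            f(p)
    return time.clock() - t

# Empty function, used to measure the call and loop overhead.
def f(p): pass

# Print both timings, the overhead, and the speedup with overhead subtracted.
a = test(splitext, l1)
b = test(splitext2, l1)
c = test(f, l1)
print a, b, c, (a-c)/(b-c)

a = test(splitext, l2)
b = test(splitext2, l2)
c = test(f, l2)
print a, b, c, (a-c)/(b-c)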
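
The script above relies on time.clock() and the print statement, neither of which exists in current Python 3 releases. For readers who want to repeat the comparison today, here is a minimal sketch using the standard timeit module. The names splitext_loop, splitext_rfind, SHORT_PATHS and bench are chosen here for illustration and are not part of the patch. Unlike the original script, it does not subtract the empty-call overhead, so the ratio it prints is only a rough indication.

```python
import timeit


def splitext_loop(p):
    """Character-by-character version, copied from the message above."""
    root, ext = '', ''
    for c in p:
        if c == '/':
            root, ext = root + ext + c, ''
        elif c == '.':
            if ext:
                root, ext = root + ext, c
            else:
                ext = c
        elif ext:
            ext = ext + c
        else:
            root = root + c
    return root, ext


def splitext_rfind(p):
    """rfind-based version, copied from the message above."""
    i = p.rfind('.')
    if i <= p.rfind('/'):
        return p, ''
    return p[:i], p[i:]


# Same short test paths as l1 in the original script.
SHORT_PATHS = ('t', '.t', 'a.b/', 'a.b', '/a.b', 'a.b/.c', 'a.b/c.d')


def bench(func, paths, number=1000):
    """Return the seconds taken to call func on every path, `number` times."""
    return timeit.timeit(lambda: [func(p) for p in paths], number=number)


if __name__ == '__main__':
    # Check that both versions agree before timing them.
    for p in SHORT_PATHS:
        assert splitext_loop(p) == splitext_rfind(p)
    slow = bench(splitext_loop, SHORT_PATHS)
    fast = bench(splitext_rfind, SHORT_PATHS)
    print('loop: %.4fs  rfind: %.4fs  ratio: %.1fx' % (slow, fast, slow / fast))
```

Timings depend on the interpreter and machine, so the 2x and 25x figures quoted above should not be expected to reproduce exactly.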


----------------------------------------------------------------------

>Comment By: Tim Peters (tim_one)
Date: 2002-03-29 13:56

Message:

I like it fine so far as it goes, but I'd like it a lot 
more if it also patched the splitext and test 
implementations for other platforms.  It's not good that, 
e.g., posixpath.py and ntpath.py get more and more out of 
synch over time, and that their test suites also diverge.
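
One way to keep the per-platform modules from drifting apart would be to route each of them through a single helper that takes the separators as parameters. The following is only a sketch of that idea: split_extension, posix_splitext and nt_splitext are hypothetical names, not functions from the patch or from the standard library, and the assumption that the NT variant accepts both '\\' and '/' as separators is made here for illustration.

```python
def split_extension(p, sep, altsep=None, extsep='.'):
    """Hypothetical shared helper: split p into (root, ext).

    sep is the primary directory separator; altsep is an optional
    second separator (assumed here for Windows-style paths).
    """
    last_sep = p.rfind(sep)
    if altsep:
        last_sep = max(last_sep, p.rfind(altsep))
    last_dot = p.rfind(extsep)
    # No dot, or the last dot precedes the last separator: no extension.
    if last_dot <= last_sep:
        return p, ''
    return p[:last_dot], p[last_dot:]


def posix_splitext(p):
    """POSIX flavour: only '/' separates directories."""
    return split_extension(p, '/')


def nt_splitext(p):
    """NT flavour (assumed): both backslash and slash separate directories."""
    return split_extension(p, '\\', '/')


if __name__ == '__main__':
    print(posix_splitext('a.b/c.d'))
    print(nt_splitext('dir.old\\file'))
```

With this layout a single set of test cases, parameterized by separator, could exercise every platform variant.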

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2002-03-29 04:49

Message:

The patch looks good to me.

----------------------------------------------------------------------
