sorting 1172026 entries

Sun May 6 20:31:14 EDT 2012

On 06May2012 17:10, Chris Rebert <clp2 at rebertia.com> wrote:
| On Sun, May 6, 2012 at 4:54 PM, Cameron Simpson <cs at zip.com.au> wrote:
| > On 06May2012 18:36, J. Mwebaze <jmwebaze at gmail.com> wrote:
| > | > for filename in txtfiles:
| > | >    temp=[]
| > | >    f=open(filename)
| > | >    for line in f.readlines():
| > | >      line = line.strip()
| > | >      line=line.split()
| > | >      temp.append((parser.parse(line[0]), float(line[1])))
| >
| > Have you timed the different parts of your code instead of the whole
| > thing?
| >
| > Specificly, do you know the sort time is the large cost?
| >
| > I would point out that the loop above builds the list by append(), one
| > item at a time. That should have runtime cost of the square of the list
| > length, 1172026 * 1172026. Though I've just done this:
| 
| Er, what? list.append() is O(1) amortized.

I didn't mean per .append() call (which I'd expect to be O(n) for large
n), I meant overall for the completed list.

Don't the realloc()s make it O(n^2) overall for large n? The list
must get copied when the underlying space fills. I confess to being
surprised at how quick it went for me though. I suppose reallocing in
chunks (eg to double the available size) might conceal this for a while.
It should still be well over O(n) overall (not amortised per .append()
call).

| Perhaps you're confusing list.append() with list.insert(), which is indeed O(n)?

I wasn't, though .insert() is surely way more expensive.

Cheers,
-- 
Cameron Simpson <cs at zip.com.au> DoD#743
http://www.cskk.ezoshosting.com/cs/

The Borg assimilated my race and all I got was this lousy tagline.
        - Cath Lawrence <Cath_Lawrence at premium.com.au>