Issue587076
This issue tracker has been migrated to GitHub,
and is currently read-only.
For more information,
see the GitHub FAQs in the Python's Developer Guide.
Created on 2002-07-26 15:51 by tim.peters, last changed 2022-04-10 16:05 by admin. This issue is now closed.
Files | ||||
---|---|---|---|---|
File name | Uploaded | Description | Edit | |
merge.patch | tim.peters, 2002-07-26 15:51 | adds .mosrt() and .hsort() | ||
merge4.patch | tim.peters, 2002-07-30 15:42 | Version 4 of the listobject.c patch | ||
timsort.txt | tim.peters, 2002-07-31 20:37 | Plain text doc file |
Messages (26) | |||
---|---|---|---|
msg40650 - (view) | Author: Tim Peters (tim.peters) * | Date: 2002-07-26 15:51 | |
This adds method list.msort([compare]). Lib/test/sortperf.py is already a sort performance test. To run it on exactly the same data I used, run it via python -O sortperf.py 15 20 1 That will time the current samplesort (even after this patch). After getting stable numbers for that, change sortperf's doit() to say L.msort() instead of L.sort(), and you'll time the mergesort instead. CAUTION: To save time across many runs, sortperf saves the random floats it generates, into temp files. If those temp files already exist when sortperf starts, it reads them up instead of generating new numbers. As a result, it's important in the above to pass "1" as the last argument the *first* time you run sortperf -- that forces the random # generator into the same state it was when I used it. This patch also gives lists a new list.hsort() method, which is a weak heapsort I gave up on. Time it if you want to see how bad an excellent sort can get <wink>. |
|||
msg40651 - (view) | Author: Neil Schemenauer (nascheme) * | Date: 2002-07-26 16:23 | |
Logged In: YES user_id=35752 AMD 1.4 Ghz Athon CPU L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line) L2 Cache: 256K (64 bytes/line) Linux 2.4.19-pre10-ac1 gcc 2.95.4 samplesort: i 2**i *sort \sort /sort 3sort +sort ~sort =sort !sort 15 32768 0.06 0.01 0.01 0.07 0.01 0.03 0.01 0.07 16 65536 0.16 0.02 0.02 0.15 0.02 0.07 0.02 0.17 17 131072 0.37 0.03 0.03 0.39 0.04 0.16 0.04 0.41 18 262144 0.84 0.07 0.08 0.87 0.10 0.34 0.07 0.93 19 524288 1.89 0.16 0.16 1.97 0.21 0.70 0.16 2.08 20 1048576 4.20 0.33 0.34 4.55 0.41 1.45 0.34 4.61 timsort: i 2**i *sort \sort /sort 3sort +sort ~sort =sort !sort 15 32768 0.06 0.00 0.01 0.01 0.01 0.03 0.00 0.01 16 65536 0.14 0.02 0.02 0.02 0.02 0.06 0.02 0.04 17 131072 0.35 0.04 0.04 0.04 0.04 0.12 0.04 0.08 18 262144 0.79 0.08 0.08 0.09 0.09 0.27 0.09 0.16 19 524288 1.79 0.17 0.17 0.18 0.17 0.54 0.17 0.33 20 1048576 3.96 0.35 0.34 0.34 0.36 1.12 0.34 0.70 |
|||
msg40652 - (view) | Author: Tim Peters (tim.peters) * | Date: 2002-07-26 16:30 | |
Logged In: YES user_id=31435 Wow! Thanks, Neil! That's impressive, even if I say so myself <wink>. |
|||
msg40653 - (view) | Author: Kevin Jacobs (jacobs99) | Date: 2002-07-26 16:54 | |
Logged In: YES user_id=459565 Intel 1266 MHz Penguin III x2 (Dual processor) 512KB cache Linux 2.4.19-pre1-ac2 gcc 3.1 20020205 samplesort: i 2**i *sort \sort /sort 3sort +sort ~sort =sort !sort 15 32768 0.07 0.00 0.01 0.06 0.01 0.02 0.00 0.07 16 65536 0.16 0.02 0.01 0.15 0.01 0.06 0.02 0.17 17 131072 0.37 0.04 0.04 0.35 0.04 0.15 0.03 0.38 18 262144 0.84 0.07 0.08 0.80 0.09 0.31 0.07 0.86 19 524288 1.89 0.16 0.15 1.78 0.19 0.66 0.15 1.92 20 1048576 4.12 0.33 0.31 4.07 0.37 1.34 0.31 4.22 timsort: i 2**i *sort \sort /sort 3sort +sort ~sort =sort !sort 15 32768 0.07 0.01 0.00 0.01 0.01 0.03 0.01 0.01 16 65536 0.17 0.01 0.02 0.01 0.02 0.06 0.02 0.04 17 131072 0.37 0.04 0.03 0.04 0.04 0.13 0.04 0.08 18 262144 0.84 0.07 0.07 0.08 0.08 0.27 0.07 0.16 19 524288 1.89 0.16 0.15 0.15 0.17 0.55 0.15 0.33 20 1048576 4.16 0.32 0.31 0.31 0.32 1.14 0.31 0.66 |
|||
msg40654 - (view) | Author: Tim Peters (tim.peters) * | Date: 2002-07-26 17:52 | |
Logged In: YES user_id=31435 Numbers from Marc-Andre Lemburg, "AMD Athlon 1.2GHz/Linux/gcc". samplesort i 2**i *sort \sort /sort 3sort +sort ~sort =sort !sort 15 32768 0.07 0.00 0.01 0.09 0.01 0.03 0.01 0.08 16 65536 0.18 0.02 0.02 0.19 0.03 0.07 0.02 0.20 17 131072 0.43 0.05 0.04 0.46 0.05 0.18 0.05 0.48 18 262144 0.99 0.09 0.10 1.04 0.13 0.40 0.09 1.11 19 524288 2.23 0.19 0.21 2.32 0.24 0.83 0.20 2.46 20 1048576 4.96 0.40 0.40 5.41 0.47 1.72 0.40 5.46 samplesort again (run twice by mistake) i 2**i *sort \sort /sort 3sort +sort ~sort =sort !sort 15 32768 0.08 0.01 0.01 0.09 0.01 0.03 0.00 0.09 16 65536 0.20 0.02 0.01 0.20 0.03 0.07 0.02 0.20 17 131072 0.46 0.06 0.02 0.45 0.05 0.20 0.04 0.49 18 262144 0.99 0.09 0.10 1.09 0.11 0.40 0.12 1.12 19 524288 2.33 0.20 0.20 2.30 0.24 0.83 0.19 2.47 20 1048576 4.89 0.40 0.41 5.37 0.48 1.71 0.38 6.22 timsort i 2**i *sort \sort /sort 3sort +sort ~sort =sort !sort 15 32768 0.08 0.01 0.01 0.01 0.01 0.03 0.00 0.02 16 65536 0.17 0.02 0.02 0.02 0.02 0.07 0.02 0.06 17 131072 0.41 0.05 0.04 0.05 0.04 0.16 0.04 0.09 18 262144 0.95 0.10 0.10 0.10 0.10 0.33 0.10 0.20 19 524288 2.17 0.20 0.21 0.20 0.21 0.66 0.20 0.44 20 1048576 4.85 0.42 0.40 0.41 0.41 1.37 0.41 0.84 |
|||
msg40655 - (view) | Author: Tim Peters (tim.peters) * | Date: 2002-07-26 18:54 | |
Logged In: YES user_id=31435 Pentium III, 866 MHz, 16KB L1 D-cache, 16KB L1 I- cache, 256KB L2 cache, Win98SE, MSVC 6 samplesort i 2**i *sort \sort /sort 3sort +sort ~sort =sort !sort 15 32768 0.17 0.01 0.01 0.17 0.01 0.05 0.01 0.11 16 65536 0.24 0.02 0.02 0.25 0.02 0.08 0.02 0.24 17 131072 0.53 0.05 0.04 0.49 0.05 0.18 0.04 0.52 18 262144 1.16 0.09 0.09 1.06 0.12 0.37 0.09 1.14 19 524288 2.53 0.18 0.17 2.30 0.24 0.75 0.17 2.47 20 1048576 5.48 0.37 0.35 5.17 0.45 1.51 0.35 5.34 timsort i 2**i *sort \sort /sort 3sort +sort ~sort =sort !sort 15 32768 0.15 0.03 0.02 0.02 0.01 0.04 0.01 0.02 16 65536 0.23 0.02 0.02 0.02 0.02 0.09 0.02 0.04 17 131072 0.53 0.04 0.04 0.05 0.04 0.19 0.04 0.09 18 262144 1.16 0.09 0.09 0.10 0.09 0.38 0.09 0.19 19 524288 2.54 0.18 0.17 0.18 0.18 0.78 0.17 0.36 20 1048576 5.50 0.36 0.35 0.36 0.37 1.60 0.35 0.73 |
|||
msg40656 - (view) | Author: Skip Montanaro (skip.montanaro) * | Date: 2002-07-26 19:50 | |
Logged In: YES user_id=44345 Pentium III, 450MHz, 256KB L2 cache, Mandrake Linux 8.1, gcc 2.96 L.sort(): i 2**i *sort \sort /sort 3sort +sort ~sort =sort !sort 15 32768 0.32 0.02 0.03 0.30 0.03 0.09 0.03 0.32 16 65536 0.73 0.06 0.05 0.66 0.06 0.20 0.05 0.71 17 131072 1.53 0.11 0.12 1.42 0.13 0.44 0.11 1.51 18 262144 3.28 0.21 0.21 3.09 0.28 0.89 0.21 3.26 19 524288 7.05 0.44 0.42 6.60 0.59 1.81 0.42 7.03 20 1048576 15.30 0.90 0.86 14.10 1.13 3.62 0.86 14.96 L.msort(): i 2**i *sort \sort /sort 3sort +sort ~sort =sort !sort 15 32768 0.32 0.02 0.03 0.03 0.02 0.13 0.02 0.05 16 65536 0.70 0.05 0.06 0.05 0.06 0.27 0.07 0.10 17 131072 1.53 0.09 0.11 0.10 0.11 0.59 0.10 0.21 18 262144 3.27 0.22 0.21 0.23 0.21 1.13 0.21 0.43 19 524288 7.10 0.43 0.45 0.44 0.45 2.27 0.43 0.88 20 1048576 15.03 0.86 0.87 0.87 0.89 4.70 0.89 1.74 |
|||
msg40657 - (view) | Author: Tim Peters (tim.peters) * | Date: 2002-07-26 20:38 | |
Logged In: YES user_id=31435 Intrigued by a comment of McIlroy, I tried catenating all the .c files in Objects and Modules, into one giant file, and sorted that. msort got a 22% speedup there, suggesting there's *some* kind of significant pre-existing lexicographic order (and/or reverse order) in C source files that msort is able to exploit. Trying it again on about 1.33 million lines of Python-Dev archive (including assorted uuencoded attachmets). msort got a 32% speedup. I'm not sure what to make of that, but we needed some real life data here <wink>. |
|||
msg40658 - (view) | Author: Tim Peters (tim.peters) * | Date: 2002-07-27 01:24 | |
Logged In: YES user_id=31435 I attached timsort.txt, a plain-text detailed description of the algorithm. After I dies, it's the only clue that will remain <wink>. |
|||
msg40659 - (view) | Author: Anthony Baxter (anthonybaxter) | Date: 2002-07-27 08:20 | |
Logged In: YES user_id=29957 Sun Ultra 5, gcc 2.95.2, 512M ram, sunos 5.7. (sort) imperial% ./python -O Lib/test/sortperf.py 15 20 1 i 2**i *sort \sort /sort 3sort +sort ~sort =sort !sort 15 32768 0.29 0.03 0.02 0.29 0.03 0.09 0.02 0.31 16 65536 0.66 0.05 0.05 0.68 0.05 0.20 0.05 0.71 17 131072 1.50 0.11 0.11 1.51 0.12 0.47 0.11 1.60 18 262144 3.25 0.23 0.22 3.37 0.25 1.18 0.22 3.52 19 524288 6.88 0.45 0.43 7.30 0.51 1.91 0.43 7.43 20 1048576 14.90 0.92 0.88 15.49 1.05 3.89 0.90 16.04 (timsort) imperial% ./python -O Lib/test/sortperf.py 15 20 1 i 2**i *sort \sort /sort 3sort +sort ~sort =sort !sort 15 32768 0.28 0.02 0.02 0.03 0.02 0.13 0.02 0.05 16 65536 0.59 0.05 0.05 0.06 0.05 0.26 0.05 0.11 17 131072 1.33 0.10 0.09 0.11 0.11 0.54 0.10 0.21 18 262144 2.92 0.22 0.20 0.22 0.21 1.10 0.20 0.44 19 524288 6.33 0.44 0.42 0.43 0.43 2.21 0.41 0.90 20 1048576 13.56 0.89 0.85 0.84 0.87 4.51 0.87 1.82 |
|||
msg40660 - (view) | Author: Anthony Baxter (anthonybaxter) | Date: 2002-07-27 11:23 | |
Logged In: YES user_id=29957 PIII Mobile 1.2GHz, 512k cache, 256M, Redhat 7.2, gcc 2.96 (samplesort) i 2**i *sort \sort /sort 3sort +sort ~sort =sort !sort 15 32768 0.08 0.01 0.01 0.07 0.01 0.03 0.01 0.08 16 65536 0.18 0.02 0.02 0.17 0.02 0.06 0.01 0.19 17 131072 0.41 0.04 0.04 0.41 0.04 0.16 0.04 0.44 18 262144 0.93 0.09 0.08 0.90 0.10 0.33 0.08 0.97 19 524288 2.04 0.18 0.16 1.98 0.23 0.69 0.17 2.13 20 1048576 4.49 0.36 0.34 4.52 0.43 1.44 0.33 4.65 (timsort) i 2**i *sort \sort /sort 3sort +sort ~sort =sort !sort 15 32768 0.08 0.01 0.01 0.00 0.01 0.04 0.00 0.01 16 65536 0.18 0.02 0.02 0.02 0.01 0.07 0.02 0.04 17 131072 0.42 0.03 0.04 0.04 0.04 0.14 0.03 0.08 18 262144 0.95 0.08 0.08 0.09 0.08 0.30 0.07 0.17 19 524288 2.08 0.17 0.16 0.17 0.17 0.63 0.17 0.34 20 1048576 4.56 0.33 0.33 0.33 0.35 1.29 0.33 0.71 PIII Mobile 1.2GHz, 512k cache, 256M, Redhat 7.2, gcc 3.0.4 (samplesort) i 2**i *sort \sort /sort 3sort +sort ~sort =sort !sort 15 32768 0.08 0.01 0.01 0.08 0.00 0.02 0.01 0.08 16 65536 0.18 0.01 0.02 0.18 0.01 0.06 0.02 0.19 17 131072 0.41 0.04 0.04 0.39 0.04 0.16 0.04 0.44 18 262144 0.94 0.08 0.08 0.91 0.10 0.33 0.07 0.95 19 524288 2.05 0.17 0.16 2.07 0.20 0.70 0.16 2.11 20 1048576 4.50 0.34 0.32 4.30 0.42 1.41 0.32 4.61 (timsort) i 2**i *sort \sort /sort 3sort +sort ~sort =sort !sort 15 32768 0.09 0.01 0.00 0.01 0.01 0.04 0.01 0.01 16 65536 0.18 0.02 0.02 0.02 0.01 0.07 0.02 0.04 17 131072 0.41 0.04 0.04 0.04 0.03 0.14 0.03 0.08 18 262144 0.93 0.08 0.07 0.08 0.08 0.31 0.08 0.16 19 524288 2.07 0.15 0.15 0.16 0.16 0.63 0.16 0.34 20 1048576 4.54 0.33 0.31 0.32 0.33 1.28 0.32 0.67 |
|||
msg40661 - (view) | Author: Tim Peters (tim.peters) * | Date: 2002-07-28 18:08 | |
Logged In: YES user_id=31435 Deleting old doc file. |
|||
msg40662 - (view) | Author: Tim Peters (tim.peters) * | Date: 2002-07-28 18:09 | |
Logged In: YES user_id=31435 Adding new doc file. |
|||
msg40663 - (view) | Author: Tim Peters (tim.peters) * | Date: 2002-07-28 18:14 | |
Logged In: YES user_id=31435 Adding new patch, merge2.patch. Most of this is semantically neutral compared to the last version -- more asserts, better comments, minor code fiddling for clarity, got rid of the weak heapsort. There is one useful change, extracting more info out of the pre-merge "find the endpoints" searches. This helps "in theory" most of the time, but probably not enough to measure. In some odd cases it can help a lot, though. See Python-Dev for discussion. There's no strong reason to time this stuff again, if you already did it once (and thanks to those who did!). |
|||
msg40664 - (view) | Author: Tim Peters (tim.peters) * | Date: 2002-07-28 19:00 | |
Logged In: YES user_id=31435 Just van Rossum 400Mhz G4 PowerPC running MacOSX 10.1.5. original patch From an email report; I chopped the "n" column and removed some whitespace so it's easier to read on SF. L.sort() i *sort \sort /sort 3sort +sort ~sort =sort !sort 15 0.28 0.03 0.02 0.29 0.03 0.10 0.02 0.31 16 0.65 0.05 0.04 0.65 0.06 0.20 0.05 0.71 17 1.47 0.11 0.12 1.53 0.13 0.50 0.10 1.54 18 3.19 0.24 0.25 3.19 0.29 0.98 0.23 3.39 19 6.96 0.52 0.48 7.11 0.55 2.00 0.45 7.48 20 15.15 0.99 0.94 15.96 1.12 4.20 1.02 16.32 L.msort() i *sort \sort /sort 3sort +sort ~sort =sort !sort 15 0.31 0.03 0.02 0.02 0.03 0.11 0.02 0.04 16 0.64 0.04 0.04 0.05 0.05 0.25 0.06 0.11 17 1.42 0.14 0.13 0.10 0.12 0.51 0.12 0.20 18 3.01 0.26 0.21 0.23 0.22 1.07 0.19 0.46 19 6.54 0.51 0.44 0.47 0.45 2.17 0.45 0.90 20 14.27 0.98 0.96 0.96 0.96 4.34 0.95 2.04 |
|||
msg40665 - (view) | Author: Tim Peters (tim.peters) * | Date: 2002-07-28 19:28 | |
Logged In: YES user_id=31435 Dang! That little optimization introduced a subtle assumption that the comparison function is consistent. We can't assume that in Python (user-supplied functions can be arbitrarily goofy). Deleted merge2.patch and added merge3.patch to repair that. |
|||
msg40666 - (view) | Author: Andrew I MacIntyre (aimacintyre) * | Date: 2002-07-29 11:53 | |
Logged In: YES user_id=250749 The following results are from your original patch (the n column dropped for better SF display). System 1: Athlon 1.4Ghz, 256MB PC2100 RAM, OS2 v4 FixPack 12, EMX 0.9d Fix 4 gcc 2.8.1 -O2 samplesort i *sort \sort /sort 3sort +sort ~sort =sort !sort 15 0.07 0.01 0.01 0.07 0.01 0.03 0.02 0.08 16 0.18 0.02 0.01 0.18 0.02 0.08 0.01 0.20 17 0.41 0.04 0.04 0.43 0.05 0.18 0.04 0.46 18 0.93 0.09 0.10 1.00 0.10 0.39 0.10 1.05 19 2.08 0.18 0.20 2.34 0.23 0.81 0.20 2.36 20 4.69 0.37 0.40 5.02 0.47 1.68 0.40 5.28 timsort i *sort \sort /sort 3sort +sort ~sort =sort !sort 15 0.06 0.01 0.01 0.01 0.01 0.03 0.01 0.02 16 0.15 0.03 0.01 0.02 0.02 0.06 0.02 0.04 17 0.37 0.04 0.05 0.04 0.05 0.13 0.05 0.10 18 0.88 0.10 0.09 0.10 0.10 0.28 0.10 0.19 19 1.97 0.20 0.18 0.21 0.21 0.58 0.20 0.39 20 4.40 0.41 0.40 0.42 0.40 1.21 0.40 0.81 gcc 2.95.2 -O3 samplesort i *sort \sort /sort 3sort +sort ~sort =sort !sort 15 0.07 0.01 0.00 0.07 0.01 0.03 0.00 0.08 16 0.17 0.01 0.03 0.17 0.02 0.09 0.02 0.19 17 0.42 0.05 0.04 0.46 0.06 0.18 0.05 0.45 18 0.99 0.09 0.09 1.05 0.12 0.40 0.09 1.05 19 2.09 0.18 0.21 2.18 0.23 0.84 0.20 2.45 20 4.73 0.39 0.41 5.13 0.47 1.70 0.40 5.38 timsort i *sort \sort /sort 3sort +sort ~sort =sort !sort 15 0.10 0.01 0.01 0.01 0.01 0.04 0.01 0.01 16 0.18 0.02 0.01 0.03 0.02 0.07 0.03 0.03 17 0.37 0.06 0.05 0.04 0.05 0.14 0.04 0.09 18 0.91 0.10 0.10 0.10 0.10 0.27 0.09 0.20 19 1.97 0.21 0.21 0.20 0.20 0.59 0.19 0.40 20 4.31 0.44 0.40 0.44 0.40 1.21 0.40 0.82 System 2: P5-166 SMP (2 CPU), 64MB 60ns FPM RAM, FreeBSD 4.4-RELEASE with a patch to re-enable CPU L1 caches (SMP BIOS issue) gcc 2.95.3 -O3 samplesort i *sort \sort /sort 3sort +sort ~sort =sort !sort 15 0.73 0.06 0.05 0.74 0.07 0.23 0.05 0.77 16 1.60 0.12 0.12 1.66 0.13 0.48 0.12 1.71 17 3.54 0.26 0.24 3.55 0.27 1.05 0.25 3.74 18 7.63 0.52 0.51 7.73 0.58 2.12 0.50 8.05 19 16.38 1.04 1.01 17.03 1.15 4.28 1.01 17.17 20 34.94 2.09 2.02 35.04 2.37 8.62 2.02 36.58 timsort i *sort \sort /sort 3sort +sort ~sort =sort !sort 15 0.74 0.05 0.06 0.06 0.06 0.32 0.06 0.12 16 1.64 0.12 0.12 0.12 0.12 0.65 0.12 0.26 17 3.62 0.25 0.25 0.27 0.26 1.32 0.25 0.52 18 7.78 0.51 0.50 0.53 0.52 2.69 0.50 1.06 19 16.76 1.03 1.01 1.09 1.04 5.46 1.01 2.12 20 35.93 2.09 2.02 2.14 2.09 11.05 2.04 4.38 System 3: 486DX4-100, 32MB 60ns FPM RAM, FreeBSD 4.4-RELEASE gcc 2.95.3 -O3 samplesort i *sort \sort /sort 3sort +sort ~sort =sort !sort 15 2.62 0.21 0.21 2.61 0.24 0.83 0.21 2.71 16 5.73 0.45 0.44 5.75 0.48 1.71 0.44 5.94 17 12.46 0.90 0.88 12.34 1.00 3.70 0.89 13.00 18 27.15 1.82 1.80 27.12 2.17 7.59 1.80 28.10 19 57.22 3.77 3.68 59.52 4.41 15.40 3.66 59.62 20 126.80 7.96 7.80 127.63 9.58 32.72 7.46 134.45 timsort i *sort \sort /sort 3sort +sort ~sort =sort !sort 15 2.52 0.21 0.20 0.20 0.20 1.05 0.20 0.42 16 5.49 0.45 0.41 0.43 0.44 2.13 0.43 0.90 17 12.15 0.88 0.84 0.85 0.88 4.34 0.88 1.83 18 26.11 1.82 1.74 1.84 1.81 8.70 1.74 3.67 19 56.34 3.67 3.55 3.80 3.67 17.84 3.53 7.48 20 121.95 7.89 7.37 8.24 7.98 39.38 7.44 16.83 NOTES: System 2 is just starting to swap in the i=20 case. System 3 starts to swap at i=18; at i=19, process:resident size is 2:1; at i=20, process:resident size is a bit over 4:1. |
|||
msg40667 - (view) | Author: Michael Hudson (mwh) | Date: 2002-07-29 13:12 | |
Logged In: YES user_id=6656 On my iBook (600 MHz G3 with 384 megs of RAM, OS X 10.1.5): L.sort(): i *sort \sort /sort 3sort +sort ~sort =sort !sort 15 0.19 0.01 0.00 0.20 0.02 0.07 0.01 0.21 16 0.45 0.05 0.04 0.43 0.04 0.15 0.05 0.47 17 1.00 0.09 0.09 1.01 0.09 0.37 0.09 1.08 18 2.16 0.16 0.16 2.26 0.22 0.75 0.18 2.35 19 4.80 0.38 0.36 5.08 0.46 1.45 0.35 5.31 20 10.65 0.79 0.79 11.83 0.89 3.33 0.78 11.88 L.msort(): i *sort \sort /sort 3sort +sort ~sort =sort !sort 15 0.18 0.02 0.03 0.02 0.03 0.08 0.02 0.04 16 0.43 0.03 0.03 0.04 0.04 0.17 0.04 0.08 17 0.95 0.08 0.09 0.09 0.08 0.34 0.08 0.18 18 2.08 0.18 0.18 0.19 0.18 0.72 0.18 0.37 19 4.59 0.37 0.38 0.39 0.38 1.47 0.36 0.76 20 10.22 0.83 0.76 0.79 0.78 3.04 0.79 1.66 I've run this often enough to believe they're typical (inc. .msort() beating .sort() on *sort and ~sort by a small margin). Looks like an unequivocal win on this box. |
|||
msg40668 - (view) | Author: Tim Peters (tim.peters) * | Date: 2002-07-30 15:41 | |
Logged In: YES user_id=31435 Deleting old doc file and merge3.patch; adding new doc file. |
|||
msg40669 - (view) | Author: Tim Peters (tim.peters) * | Date: 2002-07-30 15:42 | |
Logged In: YES user_id=31435 Adding merge4.patch; explanation to follow. |
|||
msg40670 - (view) | Author: Tim Peters (tim.peters) * | Date: 2002-07-30 16:14 | |
Logged In: YES user_id=31435 In Kevin's company database (see Python-Dev), there were 8 fields we could sort on. Two fields were strongly correlated with the order of the records as given, and msort() was >6x faster on those. Two fields were weakly correlated, and msort was a major win on those (>25% speedup). One field had many duplicates, with a highly skewed distribution. msort was better than 2x faster on that. But the rest (phone#, #employees, address) were essentially randomly ordered, and msort was systematically a few percent slower on those. That wouldn't have been remarkable, except that the percentage slowdown was a few times larger than the percentage by which msort did more comparisons than sort(). I eventually figured out the obvious: the # of records wasn't an exact power of 2, and on random data msort then systematically arranged for the final merge to be between a run with a large power-of-2 size, and a run with the little bit left over. That adds a bunch of compares over perfectly balanced merges, plus O(N) pointer copies, just to get that little bit in place. The new merge4.patch repairs that as best as (I think) non- heroically possible, quickly picking a minimum run length in advance that should almost never lead to a "bad" final merge when the data is randomly ordered. In each of Kevin's 3 "problem sorts", msort() does fewer compares than sort() now, and the runtime is generally within a fraction of a percent. These all-in-cache cases still seem to favor sort(), though, and it appears to be because msort() does a lot more data movement (note that quicksorts do no more than one swap per compare, and often none, while mergesorts do a copy on every compare). The other 5 major-to-killer wins msort got on this data remain intact. The code changes needed were tiny, but the doc file changed a lot more. Note that this change has no effect on arrays with power-of- 2 sizes, so sortperf.py timings shouldn't change (and don't on my box). The code change is solely to compute a good minimum run length before the main loop begins, and it happens to return the same value as was hard-coded before when the array has a power-of-2 size. More testing on real data would be most welcome! Kevin's data was very helpful to me. |
|||
msg40671 - (view) | Author: Tim Peters (tim.peters) * | Date: 2002-07-31 01:12 | |
Logged In: YES user_id=31435 New doc file, with an intro at the start and a program at the end. Turns out that merge4.patch actually reversed the random-array #-of-comparisons advantage samplesort had enjoyed: it's now timsort that does 1-2% fewer comparisons on random arrays of random lengths. See the end of the file for why samplesort does 50% more comparisons on average for random arrays of length two <wink>. Near the end of the new Intro section at the start, I suggest a couple experiments people might try on boxes where ~sort is much slower under timsort. That remains baffling, but the algorithm doesn't *do* much in that case, so someone on a box where it flounders could surely figure out why. |
|||
msg40672 - (view) | Author: Tim Peters (tim.peters) * | Date: 2002-07-31 06:02 | |
Logged In: YES user_id=31435 ~sort gets more mysterious all the time: the mystery now is why it's *not* much slower everywhere! Here are the exact # of compares ~sort does: i n sort msort %ch lg(n!) -- ------ ------- ------- ----- -------- 15 32768 130484 188720 44.63 444255 16 65536 260019 377634 45.23 954037 17 131072 555035 755476 36.11 2039137 18 262144 1107826 1511174 36.41 4340409 19 524288 2218562 3022584 36.24 9205096 20 1048576 4430616 6045418 36.45 19458756 The last column is the information-theoretic lower bound for sorting random arrays of this size (no comparison-based algorithm can do better than than on average), showing that sort() and msort() are both getting a lot of good out of the duplicates. But sort()'s special case for equal elements is extremely effective on ~sort's specific data pattern, and msort just isn't going to get close to that (it does better than sort() on skewed distributions with lots of duplicates, though). The only thing I can think of that could transform what "should be" highly significant slowdowns into highly significant speedups on some boxes are catastrophic cache effects in samplesort. But knowing something about how both algorithms work <wink>, that's not screaming "oh, of course". |
|||
msg40673 - (view) | Author: Tim Peters (tim.peters) * | Date: 2002-07-31 20:37 | |
Logged In: YES user_id=31435 Replaced the doc file. The new one contains more info comparing msort to sort. There's nothing more I want to do here, and it looks like everyone who might time this already did. Assigned to Guido for pronouncement. I recommend replacing list.sort() with this. The only real downside is the potential for requiring 2*N temp bytes; that (and everything else <wink>) is discussed in the doc file. If this is accepted, another issue is whether to *advertise* that this sort is stable. Some people really want that, but requiring stability constrains implementations. Another possibility is to give lists two sort methods, one guaranteed stable and the other not, where in 2.3 CPython both map to this code. In no case do I want to keep both the samplesort and timsort implementations in the core -- one brain-busting sort implementation is quite enough. This one has many wonderful properties the samplesort hybrid lacks. |
|||
msg40674 - (view) | Author: Guido van Rossum (gvanrossum) * | Date: 2002-07-31 20:45 | |
Logged In: YES user_id=6380 1. Go for it. 2. Advertise it as an implementation feature. |
|||
msg40675 - (view) | Author: Tim Peters (tim.peters) * | Date: 2002-08-01 03:11 | |
Logged In: YES user_id=31435 This is checked in now, so closing this patch. |
History | |||
---|---|---|---|
Date | User | Action | Args |
2022-04-10 16:05:31 | admin | set | github: 36940 |
2002-07-26 15:51:08 | tim.peters | create |