[Python-checkins] r46100 - sandbox/trunk/stringbench/README

Tue May 23 13:07:09 CEST 2006

Author: andrew.dalke
Date: Tue May 23 13:07:09 2006
New Revision: 46100

Added:
   sandbox/trunk/stringbench/README
Log:
Info about this benchmark


Added: sandbox/trunk/stringbench/README
==============================================================================

--- (empty file)
+++ sandbox/trunk/stringbench/README	Tue May 23 13:07:09 2006
@@ -0,0 +1,67 @@
+stringbench is a set of performance tests comparing byte string
+operations with unicode operations.  The two string implementations
+are loosely based on each other and sometimes the algorithm for one is
+faster than the other.
+
+This test set was started at the Need For Speed sprint in Reykjavik
+to identify which string methods could be sped up quickly and to
+identify obvious places for improvement.
+
+Here is an example of a benchmark:
+
+
+@bench('"Andrew".startswith("A")', 'startswith single character', 1000)
+def startswith_single(STR):
+    s1 = STR("Andrew")
+    s2 = STR("A")
+    s1_startswith = s1.startswith
+    for x in _RANGE_1000:
+        s1_startswith(s2)
+
+The bench decorator takes three parameters.  The first is a short
+description of what the benchmark measures.  In most cases this is a
+Python code snippet.  It is not the code that is actually run, because
+the real code is hand-optimized to focus on the method being tested.
+
+The second parameter is a group title.  All benchmarks with the same
+group title are listed together.  This lets you compare different
+implementations of the same algorithm, such as "t in s"
+vs. "s.find(t)".
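The sketch below shows how a report might collect benchmarks under their group title before printing them. It uses the descriptions from the sample output in this README. The `BENCHMARKS` list and the `group_benchmarks` function are hypothetical. Groups come out in the order they are first seen.

```python
from collections import defaultdict

# Hypothetical (group title, description) pairs, as a registry might hold them.
BENCHMARKS = [
    ("count newlines", '...text.with.2000.newlines.count("\\n")'),
    ("early match, single character", '("A"*1000).find("A")'),
    ("early match, single character", '"A" in "A"*1000'),
    ("early match, single character", '("A"*1000).index("A")'),
]


def group_benchmarks(entries):
    """Collect descriptions under their group title, keeping first-seen order."""
    groups = defaultdict(list)
    for group, description in entries:
        groups[group].append(description)
    return groups


if __name__ == "__main__":
    for title, descriptions in group_benchmarks(BENCHMARKS).items():
        print("==========", title)
        for description in descriptions:
            print(description)
```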
+
+The last is the repeat count.  Each benchmark loops over the operation
+either 100 or 1000 times, depending on how fast the operation is.  The
+reported time is for one benchmark call, covering all of those loops,
+so the count tells the reader how to scale it to a single operation.
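As a worked example, take the figures from the sample output in this README. Dividing the reported per-call time by the repeat count gives the per-operation time:

- 38.54 ms over 100 loops is about 385 microseconds per `count()` call.
- 1.14 ms over 1000 loops is about 1.14 microseconds per `find()` call.

The helper name `per_operation_us` is hypothetical.

```python
def per_operation_us(ms_per_call, repeat_count):
    """Convert a reported per-call time (ms) into microseconds per operation."""
    # One benchmark call runs the operation repeat_count times, so divide
    # by the count, then multiply by 1000 to go from ms to microseconds.
    return ms_per_call / repeat_count * 1000.0


if __name__ == "__main__":
    print(round(per_operation_us(38.54, 100), 3))   # count newlines
    print(round(per_operation_us(1.14, 1000), 3))   # early-match find
```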
+
+These parameters become function attributes.
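Below is a minimal Python 3 sketch of a decorator that behaves as described. It records its three arguments on the function and returns the function unchanged.

Several details are assumptions for illustration, not necessarily what stringbench uses:

- The attribute names `comment`, `group`, `repeat_count` and `is_bench`.
- The `to_bytes` helper.
- That `STR` is the constructor of the string type under test.

Python 3 has `bytes` and `str` rather than byte strings and unicode, and `bytes` needs an encoding. For that reason `to_bytes` stands in for the byte-string constructor.

```python
def bench(description, group, repeat_count):
    """Sketch: store the three decorator arguments as function attributes."""
    def decorate(func):
        func.comment = description          # assumed attribute name
        func.group = group                  # assumed attribute name
        func.repeat_count = repeat_count    # assumed attribute name
        func.is_bench = True                # assumed marker for discovery
        return func
    return decorate


_RANGE_1000 = range(1000)  # assumption: a prebuilt loop range


@bench('"Andrew".startswith("A")', 'startswith single character', 1000)
def startswith_single(STR):
    s1 = STR("Andrew")
    s2 = STR("A")
    s1_startswith = s1.startswith
    for x in _RANGE_1000:
        s1_startswith(s2)


def to_bytes(text):
    # Hypothetical stand-in for the byte-string constructor under Python 3.
    return text.encode("ascii")


if __name__ == "__main__":
    # Run the same benchmark body once per string type.
    for STR in (to_bytes, str):
        startswith_single(STR)
    print(startswith_single.group, startswith_single.repeat_count)
```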
+
+
+Here is an example of the output:
+
+
+========== count newlines
+38.54   41.60   92.7    ...text.with.2000.newlines.count("\n") (*100)
+========== early match, single character
+1.14    1.18    96.8    ("A"*1000).find("A") (*1000)
+0.44    0.41    105.6   "A" in "A"*1000 (*1000)
+1.15    1.17    98.1    ("A"*1000).index("A") (*1000)
+
+The first column is the run time in milliseconds for byte strings.
+The second is the run time for unicode strings.  The third is byte
+time / unicode time as a percentage: above 100 means unicode is
+faster than byte strings, below 100 means it is slower.
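The third column can be recomputed from the first two, as in the sketch below. The function name `relative_speed` is hypothetical.

Recomputing from the rounded times printed in the sample gives slightly different values: 92.6 rather than 92.7 for count newlines. This is presumably because the harness works from unrounded times.

```python
def relative_speed(byte_ms, unicode_ms):
    """Third output column: byte time / unicode time, as a percentage."""
    return 100.0 * byte_ms / unicode_ms


if __name__ == "__main__":
    # Rounded byte and unicode times taken from the sample output.
    for byte_ms, unicode_ms in [(38.54, 41.60), (0.44, 0.41)]:
        pct = relative_speed(byte_ms, unicode_ms)
        # A larger byte time means the unicode version ran faster.
        verdict = "unicode faster" if pct > 100 else "unicode slower or equal"
        print(f"{pct:.1f}  {verdict}")
```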
+
+The last column contains the code snippet and the repeat count for the
+internal benchmark loop.
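The sketch below puts the four columns together into one report line. The function name `format_row` is hypothetical. The tab separators are an assumption, since the sample output appears space-aligned.

```python
def format_row(byte_ms, unicode_ms, snippet, repeat_count):
    """Build one report line: byte ms, unicode ms, ratio %, snippet (*count)."""
    ratio = 100.0 * byte_ms / unicode_ms
    return "%.2f\t%.2f\t%.1f\t%s (*%d)" % (
        byte_ms, unicode_ms, ratio, snippet, repeat_count)


if __name__ == "__main__":
    print(format_row(1.15, 1.17, '("A"*1000).index("A")', 1000))
```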
+
+The times are computed with 'timeit.py', which repeats the test more
+and more times until the total run time exceeds 0.2 seconds and then
+reports the best time for a single iteration.
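The sketch below shows that loop-count search, written against the standard `timeit.Timer` API. The function name `best_time_per_call` and its defaults are hypothetical.

The factor-of-10 growth follows the `timeit` command line of that era. Python 3.6 and later expose a similar search as `Timer.autorange()`.

```python
import timeit


def best_time_per_call(func, min_total=0.2, repeat=3):
    """Return seconds per call of func, found the way timeit.py does.

    The loop count grows by a factor of 10 until one timing run takes at
    least min_total seconds; the best of `repeat` runs at that count is
    then divided by the count to give the time for a single call.
    """
    timer = timeit.Timer(func)
    number = 1
    while timer.timeit(number) < min_total:
        number *= 10
    best = min(timer.repeat(repeat=repeat, number=number))
    return best / number


if __name__ == "__main__":
    t = best_time_per_call(lambda: ("A" * 1000).find("A"))
    print("%.3f microseconds per call" % (t * 1e6))
```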
+
+The final line of the output is the cumulative time for byte and
+unicode strings, and the overall performance of unicode relative to
+bytes.  For example:
+
+4079.83 5432.25 75.1    TOTAL
+
+However, this total has no real meaning, as it weights every test equally.
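The sketch below recomputes a TOTAL line from the four sample rows shown earlier, which makes the weighting problem concrete. The count newlines test alone supplies about 93% of the byte-string total, so it effectively decides the overall ratio. The names `ROWS` and `total_line` are hypothetical.

```python
# (byte ms, unicode ms) pairs taken from the four sample rows above.
ROWS = [
    (38.54, 41.60),  # count newlines
    (1.14, 1.18),    # ("A"*1000).find("A")
    (0.44, 0.41),    # "A" in "A"*1000
    (1.15, 1.17),    # ("A"*1000).index("A")
]


def total_line(rows):
    """Sum each column and express the totals as byte / unicode percent."""
    byte_total = sum(b for b, _ in rows)
    unicode_total = sum(u for _, u in rows)
    return byte_total, unicode_total, 100.0 * byte_total / unicode_total


if __name__ == "__main__":
    b, u, pct = total_line(ROWS)
    print("%.2f\t%.2f\t%.1f\tTOTAL" % (b, u, pct))
    # Share of the byte total contributed by the single slowest test.
    print("count newlines share: %.0f%%" % (100.0 * ROWS[0][0] / b))
```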