[Python-checkins] r77617 - peps/trunk/pep-3146.txt
collin.winter
python-checkins at python.org
Wed Jan 20 23:08:05 CET 2010
Author: collin.winter
Date: Wed Jan 20 23:08:04 2010
New Revision: 77617
Log:
Add PEP 3146: Merge Unladen Swallow into CPython.
Added:
peps/trunk/pep-3146.txt
Added: peps/trunk/pep-3146.txt
==============================================================================
--- (empty file)
+++ peps/trunk/pep-3146.txt Wed Jan 20 23:08:04 2010
@@ -0,0 +1,1315 @@
+PEP: 3146
+Title: Merging Unladen Swallow into CPython
+Version: $Revision$
+Last-Modified: $Date$
+Author: Collin Winter <collinwinter at google.com>,
+ Jeffrey Yasskin <jyasskin at google.com>,
+ Reid Kleckner <rnk at mit.edu>
+Status: Draft
+Type: Standards Track
+Content-Type: text/x-rst
+Created: 1-Jan-2010
+Python-Version: 3.3
+Post-History:
+
+
+Abstract
+========
+
+This PEP proposes the merger of the Unladen Swallow project [#us]_ into
+CPython's source tree. Unladen Swallow is an open-source branch of CPython
+focused on performance. Unladen Swallow is source-compatible with valid Python
+2.6.4 applications and C extension modules.
+
+Unladen Swallow adds a just-in-time (JIT) compiler to CPython, allowing for the
+compilation of selected Python code to optimized machine code. Beyond classical
+static compiler optimizations, Unladen Swallow's JIT compiler takes advantage of
+data collected at runtime to make checked assumptions about code behaviour,
+allowing the production of faster machine code.
+
+This PEP proposes to integrate Unladen Swallow into CPython's development tree
+in a separate ``py3k-jit`` branch, targeted for eventual merger with the main
+``py3k`` branch. While Unladen Swallow is by no means finished or perfect, we
+feel that Unladen Swallow has reached sufficient maturity to warrant
+incorporation into CPython's roadmap. We have sought to create a stable platform
+that the wider CPython development team can build upon, a platform that will
+yield increasing performance for years to come.
+
+This PEP will detail Unladen Swallow's implementation and how it differs from
+CPython 2.6.4; the benchmarks used to measure performance; the tools used to
+ensure correctness and compatibility; the impact on CPython's current platform
+support; and the impact on the CPython core development process. The PEP
+concludes with a proposed merger plan and brief notes on possible directions
+for future work.
+
+We seek the following from the BDFL:
+
+- Approval for the overall concept of adding a just-in-time compiler to CPython,
+ following the design laid out below.
+- Permission to continue working on the just-in-time compiler in the CPython
+ source tree.
+- Permission to eventually merge the just-in-time compiler into the ``py3k``
+ branch once all blocking issues have been addressed.
+- A pony.
+
+
+Rationale, Implementation
+=========================
+
+Many companies and individuals would like Python to be faster, to enable its
+use in more projects. Google is one such company.
+
+Unladen Swallow is a Google-sponsored branch of CPython, initiated to improve
+the performance of Google's numerous Python libraries, tools and applications.
+To make the adoption of Unladen Swallow as easy as possible, the project
+initially aimed at four goals:
+
+- A performance improvement of 5x over the baseline of CPython 2.6.4 for
+ single-threaded code.
+- 100% source compatibility with valid CPython 2.6 applications.
+- 100% source compatibility with valid CPython 2.6 C extension modules.
+- Design for eventual merger back into CPython.
+
+We chose 2.6.4 as our baseline because Google uses CPython 2.4 internally, and
+jumping directly from CPython 2.4 to CPython 3.x was considered infeasible.
+
+To achieve the desired performance, Unladen Swallow has implemented a
+just-in-time (JIT) compiler [#jit]_ in the tradition of Urs Hoelzle's work on
+Self [#urs-self]_, gathering feedback at runtime and using that to inform
+compile-time optimizations. This is similar to the approach taken by the current
+breed of JavaScript engines [#v8]_, [#squirrelfishextreme]_; most Java virtual
+machines [#hotspot]_; Rubinius [#rubinius]_, MacRuby [#macruby]_, and other Ruby
+implementations; Psyco [#psyco]_; and others.
+
+We explicitly reject any suggestion that our ideas are original. We have sought
+to reuse the published work of other researchers wherever possible. If we have
+done any original work, it is by accident. We have tried, as much as possible,
+to take good ideas from all corners of the academic and industrial community. A
+partial list of the research papers that have informed Unladen Swallow is
+available on the Unladen Swallow wiki [#us-relevantpapers]_.
+
+The key observation about optimizing dynamic languages is that they are only
+dynamic in theory; in practice, each individual function or snippet of code is
+relatively static, using a stable set of types and child functions. The current
+CPython bytecode interpreter assumes the worst about the code it is running:
+that at any moment the user might override the ``len()`` function or pass a
+never-before-seen type into a function. In practice this almost never happens,
+but all user code pays for that support. Unladen Swallow takes advantage of the
+relatively static nature of user code to improve performance.
+
+At a high level, the Unladen Swallow JIT compiler works by translating a
+function's CPython bytecode to platform-specific machine code, using data
+collected at runtime, as well as classical compiler optimizations, to improve
+the quality of the generated machine code. Because we only want to spend
+resources compiling Python code that will actually benefit the runtime of the
+program, an online heuristic is used to assess how hot a given function is. Once
+the hotness value for a function crosses a given threshold, it is selected for
+compilation and optimization. Until a function is judged hot, however, it runs
+in the standard CPython eval loop, which in Unladen Swallow has been
+instrumented to record interesting data about each bytecode executed. This
+runtime data is used to reduce the flexibility of the generated machine code,
+allowing us to optimize for the common case. For example, we collect data on
+
+- Whether a branch was taken/not taken. If a branch is never taken, we will not
+ compile it to machine code.
+- Types used by operators. If we find that ``a + b`` is only ever adding
+ integers, the generated machine code for that snippet will not support adding
+ floats.
+- Functions called at each callsite. If we find that a particular ``foo()``
+ callsite is always calling the same ``foo`` function, we can optimize the
+ call or inline it away.
+
+Refer to [#us-llvm-notes]_ for a complete list of data points gathered and how
+they are used.
+
+However, if by chance the historically-untaken branch is now taken, or some
+integer-optimized ``a + b`` snippet receives two strings, we must support this.
+We cannot change Python semantics. Each of these sections of optimized machine
+code is preceded by a `guard`, which checks whether the simplifying assumptions
+we made when optimizing still hold. If the assumptions are still valid, we run
+the optimized machine code; if they are not, we fall back to the interpreter
+and pick up where we left off.
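+
+The profiling, threshold and guard steps described above can be illustrated
+with a toy model in pure Python. This is only a sketch of the idea, not Unladen
+Swallow's implementation: ``HOT_THRESHOLD`` and ``FeedbackAdd`` are hypothetical
+names invented for this example, and the "optimized" path is an ordinary
+``int`` addition standing in for specialized machine code.
+
```python
HOT_THRESHOLD = 3  # hypothetical value, chosen only to keep the example small


class FeedbackAdd:
    """Toy model of one ``a + b`` site: profile, specialize, guard, bail out."""

    def __init__(self):
        self.calls = 0            # hotness counter for this site
        self.seen_types = set()   # type feedback gathered on the generic path
        self.specialized = False  # True once the int-only fast path is enabled
        self.bailouts = 0         # number of times the guard has failed

    def _generic_add(self, a, b):
        # Stand-in for the interpreter: full semantics plus profiling.
        self.calls += 1
        self.seen_types.add((type(a), type(b)))
        # Specialize only when the site is hot and has seen nothing but ints.
        if self.calls >= HOT_THRESHOLD and self.seen_types == {(int, int)}:
            self.specialized = True
        return a + b

    def __call__(self, a, b):
        if self.specialized:
            # Guard: does the assumption made when specializing still hold?
            if type(a) is int and type(b) is int:
                return int.__add__(a, b)  # stand-in for int-only machine code
            self.bailouts += 1  # guard failed, fall through to the generic path
        return self._generic_add(a, b)
```
+
+With this model, three integer calls make the site hot and enable the fast
+path; a later call with two strings fails the guard, is counted in
+``bailouts`` and still returns the correct result through the generic path.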
+
+We have chosen to reuse a set of existing compiler libraries called LLVM
+[#llvm]_ for code generation and code optimization. This has saved our small
+team from needing to understand and debug code generation on multiple machine
+instruction sets and from needing to implement a large set of classical compiler
+optimizations. The project would not have been possible without such code reuse.
+We have found LLVM easy to modify and its community receptive to our suggestions
+and modifications.
+
+In somewhat more depth, Unladen Swallow's JIT works by compiling CPython
+bytecode to LLVM's own intermediate representation (IR) [#llvm-langref]_, taking
+into account any runtime data from the CPython eval loop. We then run a set of
+LLVM's built-in optimization passes, producing a smaller, optimized version of
+the original LLVM IR. LLVM then lowers the IR to platform-specific machine code,
+performing register allocation, instruction scheduling, and any necessary
+relocations. This arrangement of the compilation pipeline allows the LLVM-based
+JIT to be easily omitted from a compiled ``python`` binary by passing
+``--without-llvm`` to ``./configure``; various use cases for this flag are
+discussed later.
+
+For a complete detailing of how Unladen Swallow works, consult the Unladen
+Swallow documentation [#us-projectplan]_, [#us-llvm-notes]_.
+
+Unladen Swallow has focused on improving the performance of single-threaded,
+pure-Python code. We have not made an effort to remove CPython's global
+interpreter lock (GIL); we feel this is separate from our work, and due to its
+sensitivity, is best done in a mainline development branch. We considered
+making GIL-removal a part of Unladen Swallow, but were concerned by the
+possibility of introducing subtle bugs when porting our work from CPython 2.6
+to 3.x.
+
+A JIT compiler is an extremely versatile tool, and we have by no means
+exhausted its full potential. We have tried to create a framework flexible
+enough that the wider CPython development community can build upon it for
+years to come, extracting increased performance in each subsequent release.
+
+
+Performance
+===========
+
+Benchmarks
+----------
+
+Unladen Swallow has developed a fairly large suite of benchmarks, ranging from
+synthetic microbenchmarks designed to test a single feature up through
+whole-application macrobenchmarks. The inspiration for these benchmarks has come
+variously from third-party contributors (in the case of the ``html5lib``
+benchmark), Google's own internal workloads (``slowspitfire``, ``pickle``,
+``unpickle``), as well as tools and libraries in heavy use throughout the wider
+Python community (``django``, ``2to3``, ``spambayes``). These benchmarks are run
+through a single interface called ``perf.py`` that takes care of collecting
+memory usage information, graphing performance, and running statistics on the
+benchmark results to ensure significance.
+
+The full list of benchmarks is available on the Unladen Swallow wiki
+[#us-benchmarks]_, including instructions on downloading and running the
+benchmarks for yourself. All our benchmarks are open-source; none are
+Google-proprietary. We believe this collection of benchmarks serves as a useful
+tool to benchmark any complete Python implementation, and indeed, PyPy is
+already using these benchmarks for their own performance testing
+[#pypy-bmarks]_, [#us-wider-perf-issue]_. We welcome this, and we seek
+additional workloads for the benchmark suite from the Python community.
+
+We have focused our efforts on collecting macrobenchmarks and benchmarks that
+simulate real applications as well as possible, when running a whole application
+is not feasible. Along a different axis, our benchmark collection originally
+focused on the kinds of workloads seen by Google's Python code (webapps, text
+processing), though we have since expanded the collection to include workloads
+Google cares nothing about. We initially shied away from heavily-numerical
+workloads: NumPy [#numpy]_ already does an excellent job on such code, so
+improving numerical performance was not a high priority for the team at first.
+We have since begun to incorporate such benchmarks into the collection
+[#us-nbody]_ and have started work on optimizing numerical Python code.
+
+Beyond these benchmarks, there are also a variety of workloads we are explicitly
+not interested in benchmarking. Unladen Swallow is focused on improving the
+performance of pure-Python code, so the performance of extension modules like
+NumPy is uninteresting since NumPy's core routines are implemented in
+C. Similarly, workloads that involve a lot of IO like GUIs, databases or
+socket-heavy applications would, we feel, fail to accurately measure interpreter
+or code generation optimizations. That said, there's certainly room to improve
+the performance of C-language extension modules in the standard library, and
+as such, we have added benchmarks for the ``cPickle`` and ``re`` modules.
+
+
+Performance vs CPython 2.6.4
+----------------------------
+
+The charts below compare the arithmetic mean of multiple benchmark iterations
+for CPython 2.6.4 and Unladen Swallow. ``perf.py`` gathers more data than this,
+and indeed, arithmetic mean is not the whole story; we reproduce only the mean
+for the sake of conciseness. We include the ``t`` score from the Student's
+two-tailed T-test [#students-t-test]_ at the 95% confidence level to indicate
+the significance of the result. Most benchmarks are run for 100 iterations,
+though some longer-running whole-application benchmarks are run for fewer
+iterations.
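+
+As a rough guide to reading the ``Change`` and ``Significance`` columns, the
+sketch below computes a speedup ratio and a two-sample t score using only the
+standard library. It is not ``perf.py``'s code: the function names and the
+sample timings are made up for illustration, and the pooled-variance form of
+Student's t-test is an assumption of this example.
+
```python
import math
import statistics


def speedup(base_times, changed_times):
    """Ratio of mean times; values above 1.0 mean the changed binary is faster."""
    return statistics.mean(base_times) / statistics.mean(changed_times)


def t_score(base_times, changed_times):
    """Student's two-sample t statistic using a pooled variance estimate."""
    n1, n2 = len(base_times), len(changed_times)
    pooled_var = (
        (n1 - 1) * statistics.variance(base_times)
        + (n2 - 1) * statistics.variance(changed_times)
    ) / (n1 + n2 - 2)
    std_err = math.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    return (statistics.mean(base_times) - statistics.mean(changed_times)) / std_err


# Made-up timings in seconds, not measurements from the tables in this PEP.
base = [1.00, 1.10, 0.90, 1.00]
changed = [0.80, 0.90, 0.70, 0.80]
ratio = speedup(base, changed)
t = t_score(base, changed)
```
+
+For these made-up samples the ratio is 1.25 and the t score is roughly 3.46.
+With this sign convention a positive score means the second binary was
+faster and a negative score means it was slower.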
+
+A description of each of these benchmarks is available on the Unladen Swallow
+wiki [#us-benchmarks]_.
+
+Command:
+
+::
+
+ ./perf.py -r -b default,apps ../a/python ../b/python
+
+
+32-bit; gcc 4.0.3; Ubuntu Dapper; Intel Core2 Duo 6600 @ 2.4GHz; 2 cores; 4MB L2 cache; 4GB RAM
+
++--------------+---------------+----------------------+--------------+---------------+----------------------------+
+| Benchmark | CPython 2.6.4 | Unladen Swallow r988 | Change | Significance | Timeline |
++==============+===============+======================+==============+===============+============================+
+| 2to3 | 25.13 s | 24.87 s | 1.01x faster | t=8.94 | http://tinyurl.com/yamhrpg |
++--------------+---------------+----------------------+--------------+---------------+----------------------------+
+| django | 1.08 s | 0.80 s | 1.35x faster | t=315.59 | http://tinyurl.com/y9mrn8s |
++--------------+---------------+----------------------+--------------+---------------+----------------------------+
+| html5lib | 14.29 s | 13.20 s | 1.08x faster | t=2.17 | http://tinyurl.com/y8tyslu |
++--------------+---------------+----------------------+--------------+---------------+----------------------------+
+| nbody | 0.51 s | 0.28 s | 1.84x faster | t=78.007 | http://tinyurl.com/y989qhg |
++--------------+---------------+----------------------+--------------+---------------+----------------------------+
+| rietveld | 0.75 s | 0.55 s | 1.37x faster | Insignificant | http://tinyurl.com/ye7mqd3 |
++--------------+---------------+----------------------+--------------+---------------+----------------------------+
+| slowpickle | 0.75 s | 0.55 s | 1.37x faster | t=20.78 | http://tinyurl.com/ybrsfnd |
++--------------+---------------+----------------------+--------------+---------------+----------------------------+
+| slowspitfire | 0.83 s | 0.61 s | 1.36x faster | t=2124.66 | http://tinyurl.com/yfknhaw |
++--------------+---------------+----------------------+--------------+---------------+----------------------------+
+| slowunpickle | 0.33 s | 0.26 s | 1.26x faster | t=15.12 | http://tinyurl.com/yzlakoo |
++--------------+---------------+----------------------+--------------+---------------+----------------------------+
+| spambayes | 0.31 s | 0.34 s | 1.10x slower | Insignificant | http://tinyurl.com/yem62ub |
++--------------+---------------+----------------------+--------------+---------------+----------------------------+
+
+
+64-bit; gcc 4.2.4; Ubuntu Hardy; AMD Opteron 8214 HE @ 2.2 GHz; 4 cores; 1MB L2 cache; 8GB RAM
+
++--------------+---------------+----------------------+--------------+---------------+----------------------------+
+| Benchmark | CPython 2.6.4 | Unladen Swallow r988 | Change | Significance | Timeline |
++==============+===============+======================+==============+===============+============================+
+| 2to3 | 31.98 s | 30.41 s | 1.05x faster | t=8.35 | http://tinyurl.com/ybcrl3b |
++--------------+---------------+----------------------+--------------+---------------+----------------------------+
+| django | 1.22 s | 0.94 s | 1.30x faster | t=106.68 | http://tinyurl.com/ybwqll6 |
++--------------+---------------+----------------------+--------------+---------------+----------------------------+
+| html5lib | 18.97 s | 17.79 s | 1.06x faster | t=2.78 | http://tinyurl.com/yzlyqvk |
++--------------+---------------+----------------------+--------------+---------------+----------------------------+
+| nbody | 0.77 s | 0.27 s | 2.86x faster | t=133.49 | http://tinyurl.com/yeyqhbg |
++--------------+---------------+----------------------+--------------+---------------+----------------------------+
+| rietveld | 0.74 s | 0.80 s | 1.08x slower | t=-2.45 | http://tinyurl.com/yzjc6ff |
++--------------+---------------+----------------------+--------------+---------------+----------------------------+
+| slowpickle | 0.91 s | 0.62 s | 1.48x faster | t=28.04 | http://tinyurl.com/yf7en6k |
++--------------+---------------+----------------------+--------------+---------------+----------------------------+
+| slowspitfire | 1.01 s | 0.72 s | 1.40x faster | t=98.70 | http://tinyurl.com/yc8pe2o |
++--------------+---------------+----------------------+--------------+---------------+----------------------------+
+| slowunpickle | 0.51 s | 0.34 s | 1.51x faster | t=32.65 | http://tinyurl.com/yjufu4j |
++--------------+---------------+----------------------+--------------+---------------+----------------------------+
+| spambayes | 0.43 s | 0.45 s | 1.06x slower | Insignificant | http://tinyurl.com/yztbjfp |
++--------------+---------------+----------------------+--------------+---------------+----------------------------+
+
+
+Many of these benchmarks take a hit under Unladen Swallow because the current
+version blocks execution to compile Python functions down to machine code. This
+leads to the behaviour seen in the timeline graphs for the ``html5lib`` and
+``rietveld`` benchmarks, for example, and slows down the overall performance of
+``2to3``. We have an active development branch to fix this problem
+([#us-background-thread]_, [#us-background-thread-issue]_), but working within
+the strictures of CPython's current threading system has complicated the process
+and required far more care and time than originally anticipated. We view this
+issue as critical to final merger into the ``py3k`` branch.
+
+We have obviously not met our initial goal of a 5x performance improvement. A
+`performance retrospective`_ follows, which addresses why we failed to meet our
+initial performance goal. We maintain a list of yet-to-be-implemented
+performance work [#us-perf-punchlist]_.
+
+
+Memory Usage
+------------
+
+The following table shows maximum memory usage (in kilobytes) for each of
+Unladen Swallow's default benchmarks for both CPython 2.6.4 and Unladen Swallow
+r988, as well as a timeline of memory usage across the lifetime of the
+benchmark. We include tables for both 32- and 64-bit binaries. Memory usage was
+measured on Linux 2.6 systems by summing the ``Private_`` sections from the
+kernel's ``/proc/$pid/smaps`` pseudo-files [#smaps]_.
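+
+The measurement described above can be approximated with a few lines of
+Python. This is a sketch rather than ``perf.py``'s implementation:
+``private_kb`` is a hypothetical helper, and ``SAMPLE`` is an abbreviated,
+invented excerpt in the ``smaps`` format.
+
```python
def private_kb(smaps_text):
    """Sum every ``Private_*`` field (reported in kB) in smaps-formatted text."""
    total = 0
    for line in smaps_text.splitlines():
        if line.startswith("Private_"):
            # Lines look like "Private_Dirty:        12 kB".
            total += int(line.split()[1])
    return total


# Abbreviated, invented sample; real files list many more mappings and fields.
SAMPLE = """\
08048000-08105000 r-xp 00000000 08:01 123456     /usr/bin/python
Size:                756 kB
Rss:                 500 kB
Shared_Clean:        100 kB
Private_Clean:       380 kB
Private_Dirty:        20 kB
b7d00000-b7e00000 rw-p 00000000 00:00 0
Size:               1024 kB
Rss:                 300 kB
Private_Clean:         0 kB
Private_Dirty:       300 kB
"""

sample_total = private_kb(SAMPLE)
```
+
+For the sample text the total is 700 kB. On a Linux system, passing the
+contents of ``/proc/self/smaps`` to the same function should report the
+private memory of the running interpreter.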
+
+Command:
+
+::
+
+ ./perf.py -r --track_memory -b default,apps ../a/python ../b/python
+
+
+32-bit
+
++--------------+---------------+----------------------+--------+----------------------------+
+| Benchmark | CPython 2.6.4 | Unladen Swallow r988 | Change | Timeline |
++==============+===============+======================+========+============================+
+| 2to3 | 26396 kb | 46896 kb | 1.77x | http://tinyurl.com/yhr2h4z |
++--------------+---------------+----------------------+--------+----------------------------+
+| django | 10028 kb | 27740 kb | 2.76x | http://tinyurl.com/yhan8vs |
++--------------+---------------+----------------------+--------+----------------------------+
+| html5lib | 150028 kb | 173924 kb | 1.15x | http://tinyurl.com/ybt44en |
++--------------+---------------+----------------------+--------+----------------------------+
+| nbody | 3020 kb | 16036 kb | 5.31x | http://tinyurl.com/ya8hltw |
++--------------+---------------+----------------------+--------+----------------------------+
+| rietveld | 15008 kb | 46400 kb | 3.09x | http://tinyurl.com/yhd5dra |
++--------------+---------------+----------------------+--------+----------------------------+
+| slowpickle | 4608 kb | 16656 kb | 3.61x | http://tinyurl.com/ybukyvo |
++--------------+---------------+----------------------+--------+----------------------------+
+| slowspitfire | 85776 kb | 97620 kb | 1.13x | http://tinyurl.com/y9vj35z |
++--------------+---------------+----------------------+--------+----------------------------+
+| slowunpickle | 3448 kb | 13744 kb | 3.98x | http://tinyurl.com/yexh4d5 |
++--------------+---------------+----------------------+--------+----------------------------+
+| spambayes | 7352 kb | 46480 kb | 6.32x | http://tinyurl.com/yem62ub |
++--------------+---------------+----------------------+--------+----------------------------+
+
+
+64-bit
+
++--------------+---------------+----------------------+--------+----------------------------+
+| Benchmark | CPython 2.6.4 | Unladen Swallow r988 | Change | Timeline |
++==============+===============+======================+========+============================+
+| 2to3 | 51596 kb | 82340 kb | 1.59x | http://tinyurl.com/yljg6rs |
++--------------+---------------+----------------------+--------+----------------------------+
+| django | 16020 kb | 38908 kb | 2.43x | http://tinyurl.com/ylqsebh |
++--------------+---------------+----------------------+--------+----------------------------+
+| html5lib | 259232 kb | 324968 kb | 1.25x | http://tinyurl.com/yha6oee |
++--------------+---------------+----------------------+--------+----------------------------+
+| nbody | 4296 kb | 23012 kb | 5.35x | http://tinyurl.com/yztozza |
++--------------+---------------+----------------------+--------+----------------------------+
+| rietveld | 24140 kb | 73960 kb | 3.06x | http://tinyurl.com/ybg2nq7 |
++--------------+---------------+----------------------+--------+----------------------------+
+| slowpickle | 4928 kb | 23300 kb | 4.73x | http://tinyurl.com/yk5tpbr |
++--------------+---------------+----------------------+--------+----------------------------+
+| slowspitfire | 133276 kb | 148676 kb | 1.11x | http://tinyurl.com/y8bz2xe |
++--------------+---------------+----------------------+--------+----------------------------+
+| slowunpickle | 4896 kb | 16948 kb | 3.46x | http://tinyurl.com/ygywwoc |
++--------------+---------------+----------------------+--------+----------------------------+
+| spambayes | 10728 kb | 84992 kb | 7.92x | http://tinyurl.com/yhjban5 |
++--------------+---------------+----------------------+--------+----------------------------+
+
+
+The increased memory usage comes from a) LLVM code generation, analysis and
+optimization libraries; b) native code; c) memory usage issues or leaks in
+LLVM; d) data structures needed to optimize and generate machine code; e)
+as-yet uncategorized other sources.
+
+While we have made significant progress in reducing memory usage since the
+initial naive JIT implementation [#us-memory-issue]_, there is obviously more
+to do. We believe that there are still memory savings to be made without
+sacrificing performance. We have tended to focus on raw performance, and we
+have not yet made a concerted push to reduce memory usage. We view reducing
+memory usage as a blocking issue for final merger into the ``py3k`` branch. We
+seek guidance from the community on an acceptable level of increased memory
+usage.
+
+
+Start-up Time
+-------------
+
+Statically linking LLVM's code generation, analysis and optimization libraries
+increases the time needed to start the Python binary. C++ static initializers
+used by LLVM also increase start-up time, as does importing the collection of
+pre-compiled C runtime routines we want to inline into Python code.
+
+Results from Unladen Swallow's ``startup`` benchmarks:
+
+::
+
+ $ ./perf.py -r -b startup /tmp/cpy-26/bin/python /tmp/unladen/bin/python
+
+ ### normal_startup ###
+ Min: 0.219186 -> 0.352075: 1.6063x slower
+ Avg: 0.227228 -> 0.364384: 1.6036x slower
+ Significant (t=-51.879098, a=0.95)
+ Stddev: 0.00762 -> 0.02532: 3.3227x larger
+ Timeline: http://tinyurl.com/yfe8z3r
+
+ ### startup_nosite ###
+ Min: 0.105949 -> 0.264912: 2.5004x slower
+ Avg: 0.107574 -> 0.267505: 2.4867x slower
+ Significant (t=-703.557403, a=0.95)
+ Stddev: 0.00214 -> 0.00240: 1.1209x larger
+ Timeline: http://tinyurl.com/yajn8fa
+
+
+Unladen Swallow has made headway toward optimizing start-up time, but there is
+still more work to do and further optimizations to implement. Improving start-up
+time is a high-priority item [#us-issue-startup-time]_ in Unladen Swallow's
+merger punchlist.
+
+
+Binary Size
+-----------
+
+Statically linking LLVM's code generation, analysis and optimization libraries
+significantly increases the size of the ``python`` binary.
+
+
+32-bit; gcc 4.0.3
+
++-------------+---------------+---------------+----------------------+
+| Binary size | CPython 2.6.4 | CPython 3.1.1 | Unladen Swallow r988 |
++=============+===============+===============+======================+
+| Release | 3.8M | 4.0M | 74M |
++-------------+---------------+---------------+----------------------+
+| Debug | 3.3M | 3.6M | 118M |
++-------------+---------------+---------------+----------------------+
+
+64-bit; gcc 4.2.4
+
++-------------+---------------+---------------+----------------------+
+| Binary size | CPython 2.6.4 | CPython 3.1.1 | Unladen Swallow r988 |
++=============+===============+===============+======================+
+| Release | 5.5M | 5.7M | 89M |
++-------------+---------------+---------------+----------------------+
+| Debug | 4.1M | 4.4M | 128M |
++-------------+---------------+---------------+----------------------+
+
+The increased binary size is due to statically linking LLVM's code generation,
+analysis and optimization libraries into the ``python`` binary. This can be
+straightforwardly addressed by modifying LLVM to better support shared linking
+and then using that, instead of the current static linking. For the moment,
+though, static linking provides an accurate look at the cost of linking against
+LLVM.
+
+Unladen Swallow recently experienced a regression in binary size, going from
+19MB in Unladen's 2009Q3 release up to the current 74MB shown in the table
+above. Resolution of this issue [#us-binary-size]_ will block final merger into
+the ``py3k`` branch.
+
+
+Performance Retrospective
+-------------------------
+
+Our initial goal for Unladen Swallow was a 5x performance improvement over
+CPython 2.6. We did not hit that, nor, to put it bluntly, did we even come
+close. Why did the project not hit that goal, and can an LLVM-based JIT ever hit
+that goal?
+
+Why did Unladen Swallow not achieve its 5x goal? The primary reason was
+that LLVM required more work than we had initially anticipated. Based on the
+fact that Apple was shipping products based on LLVM [#llvm-users]_, and
+other high-level languages had successfully implemented LLVM-based JITs
+([#rubinius]_, [#macruby]_, [#hlvm]_), we had assumed that LLVM's JIT was
+relatively free of show-stopper bugs.
+
+That turned out to be incorrect. We had to turn our attention away from
+performance to fix a number of critical bugs in LLVM's JIT infrastructure (for
+example, [#llvm-far-call-issue]_, [#llvm-jmm-rev]_) and to implement a number
+of nice-to-have enhancements that would enable further optimizations along
+various axes (for example, [#llvm-globaldce-rev]_,
+[#llvm-memleak-rev]_, [#llvm-availext-issue]_). LLVM's static code generation
+facilities, tools and optimization passes are stable and stress-tested, but the
+just-in-time infrastructure was relatively untested and buggy. We have fixed
+this.
+
+(Our hypothesis is that we hit these problems -- problems other projects had
+avoided -- because of the complexity and thoroughness of CPython's standard
+library test suite.)
+
+We also diverted engineering effort away from performance and into support tools
+such as gdb and oProfile. gdb did not work well with JIT compilers at all, and
+LLVM previously had no integration with oProfile. Having JIT-aware debuggers and
+profilers has been very valuable to the project, and we do not regret
+channeling our time in these directions. See the `Debugging`_ and `Profiling`_
+sections for more information.
+
+Can an LLVM-based CPython JIT ever hit the 5x performance target? The benchmark
+results for JIT-based JavaScript implementations suggest that 5x is indeed
+possible, as do the results PyPy's JIT has delivered for numeric workloads. The
+experience of Self-92 [#urs-self]_ is also instructive.
+
+Can LLVM deliver this? We believe that we have only begun to scratch the surface
+of what our LLVM-based JIT can deliver. The optimizations we have incorporated
+into this system thus far have borne significant fruit (for example,
+[#us-specialization-issue]_, [#us-direct-calling-issue]_,
+[#us-fast-globals-issue]_). Our experience to date is that the limiting factor
+on Unladen Swallow's performance is the engineering cycles needed to implement
+the literature. We have found LLVM easy to work with and to modify, and its
+built-in optimizations have greatly simplified the task of implementing
+Python-level optimizations.
+
+An overview of further performance opportunities is discussed in the
+`Future Work`_ section.
+
+
+
+Correctness and Compatibility
+=============================
+
+Unladen Swallow's correctness test suite includes CPython's test suite (under
+``Lib/test/``), as well as a number of important third-party applications and
+libraries [#tested-apps]_. A full list of these applications and libraries is
+reproduced below. Any dependencies needed by these packages, such as
+``zope.interface`` [#zope-interface]_, are also tested indirectly as a part of
+testing the primary package, thus widening the corpus of tested third-party
+Python code.
+
+- 2to3
+- Cheetah
+- cvs2svn
+- Django
+- Nose
+- NumPy
+- PyCrypto
+- pyOpenSSL
+- PyXML
+- Setuptools
+- SQLAlchemy
+- SWIG
+- SymPy
+- Twisted
+- ZODB
+
+These applications pass all relevant tests when run under Unladen Swallow. Note
+that some tests that failed against our baseline of CPython 2.6.4 were disabled,
+as were tests that made assumptions about CPython internals such as exact
+bytecode numbers or bytecode format. Any package with disabled tests includes
+a ``README.unladen`` file that details the changes (for example,
+[#us-sqlalchemy-readme]_).
+
+In addition, Unladen Swallow is tested automatically against an array of
+internal Google Python libraries and applications. These include Google's
+internal Python bindings for BigTable [#bigtable]_, the Mondrian code review
+application [#mondrian]_, and Google's Python standard library, among others.
+The changes needed to run these projects under Unladen Swallow have consistently
+fallen into one of three categories:
+
+- Adding CPython 2.6 C API compatibility. Since Google still primarily uses
+ CPython 2.4 internally, we have needed to convert uses of ``int`` to
+ ``Py_ssize_t`` and similar API changes.
+- Fixing or disabling explicit, incorrect tests of the CPython version number.
+- Conditionally disabling code that worked around or depended on bugs in
+ CPython 2.4 that have since been fixed.
+
+Testing against this wide range of public and proprietary applications and
+libraries has been instrumental in ensuring the correctness of Unladen Swallow.
+Testing has exposed bugs that we have duly corrected. Our automated regression
+testing regime has given us high confidence in our changes as we have moved
+forward.
+
+In addition to third-party testing, we have added further tests to CPython's
+test suite for corner cases of the language or implementation that we felt were
+untested or underspecified (for example, [#us-import-tests]_,
+[#us-tracing-tests]_). These have been especially important when implementing
+optimizations, helping make sure we have not accidentally broken the darker
+corners of Python.
+
+We have also constructed a test suite focused solely on the LLVM-based JIT
+compiler and the optimizations implemented for it [#us-test_llvm]_. Because of
+the complexity and subtlety inherent in writing an optimizing compiler, we have
+attempted to exhaustively enumerate the constructs, scenarios and corner cases
+we are compiling and optimizing. The JIT tests also include tests for things
+like the JIT hotness model, making the JIT easier for future CPython developers
+to maintain and improve.
+
+We have recently begun using fuzz testing [#fuzz-testing]_ to stress-test the
+compiler. We have used both pyfuzz [#pyfuzz]_ and Fusil [#fusil]_ in the past,
+and we recommend they be introduced as an automated part of the CPython testing
+process.
+
+Known Incompatibilities
+-----------------------
+
+The only application or library we know of that works with CPython 2.6.4 but
+not with Unladen Swallow is Psyco [#psyco]_. We are aware of some libraries
+such as PyGame [#pygame]_ that work well with CPython 2.6.4, but suffer some
+degradation due to changes made in Unladen Swallow. We are tracking this issue
+[#us-background-thread-issue]_ and are working to resolve these instances of
+degradation.
+
+While Unladen Swallow is source-compatible with CPython 2.6.4, it is not
+binary compatible. C extension modules compiled against one will need to be
+recompiled to work with the other.
+
+
+Platform Support
+================
+
+Unladen Swallow is inherently limited by the platform support provided by LLVM,
+especially LLVM's JIT compilation system [#llvm-hardware]_. LLVM's JIT has the
+best support on x86 and x86-64 systems, and these are the platforms where
+Unladen Swallow has received the most testing. We are confident in LLVM/Unladen
+Swallow's support for x86 and x86-64 hardware. PPC and ARM support exists, but
+is not widely used and may be buggy.
+
+Unladen Swallow is known to work on the following operating systems: Linux,
+Darwin and Windows. Unladen Swallow has received the most testing on Linux and
+Darwin, though it still builds and passes its tests on Windows.
+
+In order to support hardware and software platforms where LLVM's JIT does not
+work, Unladen Swallow provides a ``./configure --without-llvm`` option. This
+flag carves out any part of Unladen Swallow that depends on LLVM, yielding a
+Python binary that works and passes its tests, but has no performance
+advantages. This configuration is recommended for hardware unsupported by LLVM,
+or systems that care more about memory usage than performance.
+
+
+Impact on CPython Development
+=============================
+
+Experimenting with Changes to Python or CPython Bytecode
+--------------------------------------------------------
+
+Unladen Swallow's JIT compiler operates on CPython bytecode, and as such, it is
+immune to Python language changes that only affect the parser.
+
+We recommend that changes to the CPython bytecode compiler or the semantics of
+individual bytecodes be prototyped in the interpreter loop first, then be ported
+to the JIT compiler once the semantics are clear. To make this easier, Unladen
+Swallow includes a ``--without-llvm`` configure-time option that strips out the
+JIT compiler and all associated infrastructure. This leaves the current burden
+of experimentation unchanged so that developers can prototype in the current
+low-barrier-to-entry interpreter loop.
+
+Unladen Swallow began implementing its JIT compiler by doing straightforward,
+naive translations from bytecode implementations into LLVM API calls. We found
+this process to be easily understood, and we recommend the same approach for
+CPython. We include several sample changes from the Unladen Swallow repository
+here as examples of this style of development: [#us-r359]_, [#us-r376]_,
+[#us-r417]_, [#us-r517]_.
+
+
+Debugging
+---------
+
+The Unladen Swallow team implemented changes to gdb to make it easier to use gdb
+to debug JIT-compiled Python code. These changes were released in gdb 7.0
+[#gdb70]_. They make it possible for gdb to identify and unwind past
+JIT-generated call stack frames. This allows gdb to continue to function as
+before for CPython development if one is changing, for example, the ``list``
+type or builtin functions.
+
+Example backtrace after our changes, where ``baz``, ``bar`` and ``foo`` are
+JIT-compiled:
+
+::
+
+ Program received signal SIGSEGV, Segmentation fault.
+ 0x00002aaaabe7d1a8 in baz ()
+ (gdb) bt
+ #0 0x00002aaaabe7d1a8 in baz ()
+ #1 0x00002aaaabe7d12c in bar ()
+ #2 0x00002aaaabe7d0aa in foo ()
+ #3 0x00002aaaabe7d02c in main ()
+ #4 0x0000000000b870a2 in llvm::JIT::runFunction (this=0x1405b70, F=0x14024e0, ArgValues=...)
+ at /home/rnk/llvm-gdb/lib/ExecutionEngine/JIT/JIT.cpp:395
+ #5 0x0000000000baa4c5 in llvm::ExecutionEngine::runFunctionAsMain
+ (this=0x1405b70, Fn=0x14024e0, argv=..., envp=0x7fffffffe3c0)
+ at /home/rnk/llvm-gdb/lib/ExecutionEngine/ExecutionEngine.cpp:377
+ #6 0x00000000007ebd52 in main (argc=2, argv=0x7fffffffe3a8,
+ envp=0x7fffffffe3c0) at /home/rnk/llvm-gdb/tools/lli/lli.cpp:208
+
+Previously, the JIT-compiled frames would have caused gdb to unwind incorrectly,
+generating lots of obviously-incorrect ``#6 0x00002aaaabe7d0aa in ?? ()``-style
+stack frames.
+
+Highlights:
+
+- gdb 7.0 is able to correctly parse JIT-compiled stack frames, allowing full
+ use of gdb on non-JIT-compiled functions, that is, the vast majority of the
+ CPython codebase.
+- Disassembling inside a JIT-compiled stack frame automatically prints the full
+ list of instructions making up that function. This is an advance over the
+ state of gdb before our work: developers needed to guess the starting address
+ of the function and manually disassemble the assembly code.
+- Flexible underlying mechanism allows CPython to add more and more information,
+ and eventually reach parity with C/C++ support in gdb for JIT-compiled machine
+ code.
+
+Lowlights:
+
+- gdb cannot print local variables or tell you what line you're currently
+ executing inside a JIT-compiled function. Nor can it step through
+ JIT-compiled code, except for one instruction at a time.
+- Not yet integrated with Apple's gdb or Microsoft's Visual Studio debuggers.
+
+The Unladen Swallow team is working with Apple to get these changes
+incorporated into their future gdb releases.
+
+
+Profiling
+---------
+
+Unladen Swallow integrates with oProfile 0.9.4 and newer [#oprofile]_ to support
+assembly-level profiling on Linux systems. This means that oProfile will
+correctly symbolize JIT-compiled functions in its reports.
+
+Example report, where the ``#u#``-prefixed symbol names are JIT-compiled Python
+functions:
+
+::
+
+ $ opreport -l ./python | less
+ CPU: Core 2, speed 1600 MHz (estimated)
+ Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (Unhalted core cycles) count 100000
+ samples % image name symbol name
+ 79589 4.2329 python PyString_FromFormatV
+ 62971 3.3491 python PyEval_EvalCodeEx
+ 62713 3.3354 python tupledealloc
+ 57071 3.0353 python _PyEval_CallFunction
+ 50009 2.6597 24532.jo #u#force_unicode
+ 47468 2.5246 python PyUnicodeUCS2_Decode
+ 45829 2.4374 python PyFrame_New
+ 45173 2.4025 python lookdict_string
+ 43082 2.2913 python PyType_IsSubtype
+ 39763 2.1148 24532.jo #u#render5
+ 38145 2.0287 python _PyType_Lookup
+ 37643 2.0020 python PyObject_GC_UnTrack
+ 37105 1.9734 python frame_dealloc
+ 36849 1.9598 python PyEval_EvalFrame
+ 35630 1.8950 24532.jo #u#resolve
+ 33313 1.7717 python PyObject_IsInstance
+ 33208 1.7662 python PyDict_GetItem
+ 33168 1.7640 python PyTuple_New
+ 30458 1.6199 python PyCFunction_NewEx
+
+This support is functional, but as-yet unpolished. Unladen Swallow maintains a
+punchlist of items we feel are important to improve in our oProfile integration
+to make it more useful to core CPython developers [#us-oprofile-punchlist]_.
+
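+As a rough sketch of how such a report might be post-processed, the snippet
+below picks out the ``#u#``-prefixed rows and sums their share of samples. The
+function names are hypothetical, the rows are a small excerpt copied from the
+report above, and the four-column layout (samples, percent, image, symbol) is
+assumed from that excerpt rather than from any oProfile specification.
+
```python
# A few rows copied from the report above: samples, percent, image, symbol.
REPORT = """\
79589 4.2329 python PyString_FromFormatV
62971 3.3491 python PyEval_EvalCodeEx
50009 2.6597 24532.jo #u#force_unicode
39763 2.1148 24532.jo #u#render5
35630 1.8950 24532.jo #u#resolve
"""

# Prefix that marks JIT-compiled Python functions in the report.
JIT_PREFIX = "#u#"


def jit_rows(report_text):
    """Yield (python_function_name, percent) for JIT-compiled symbols."""
    for line in report_text.splitlines():
        fields = line.split()
        if len(fields) != 4:
            continue  # skip blank lines and the multi-word header lines
        _samples, percent, _image, symbol = fields
        if symbol.startswith(JIT_PREFIX):
            yield symbol[len(JIT_PREFIX):], float(percent)


def jit_share(report_text):
    """Total percentage of samples attributed to JIT-compiled functions."""
    return sum(percent for _name, percent in jit_rows(report_text))


if __name__ == "__main__":
    for name, percent in jit_rows(REPORT):
        print("%-16s %.4f%%" % (name, percent))
    print("JIT-compiled total: %.4f%%" % jit_share(REPORT))
```
+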
+Highlights:
+
+- Symbolization of JITted frames working in oProfile on Linux.
+
+Lowlights:
+
+- No work yet invested in improving symbolization of JIT-compiled frames for
+ Apple's Shark [#shark]_ or Microsoft's Visual Studio profiling tools.
+- Some polishing still desired for oProfile output.
+
+We recommend using oProfile 0.9.5 (and newer) to work around a now-fixed bug on
+x86-64 platforms in oProfile. oProfile 0.9.4 will work fine on 32-bit platforms,
+however.
+
+Given the ease of integrating oProfile with LLVM [#llvm-oprofile-change]_ and
+Unladen Swallow [#us-oprofile-change]_, other profiling tools should be easy to
+integrate as well, provided they support a similar JIT interface
+[#oprofile-jit-interface]_.
+
+
+Addition of C++ to CPython
+--------------------------
+
+In order to use LLVM, Unladen Swallow has introduced C++ into the core CPython
+tree and build process. This is an unavoidable part of depending on LLVM; though
+LLVM offers a C API [#llvm-c-api]_, it is limited and does not expose the
+functionality needed by CPython. Because of this, we have implemented the
+internal details of the Unladen Swallow JIT and its supporting infrastructure
+in C++. We do not propose converting the entire CPython codebase to C++.
+
+Highlights:
+
+- Easy use of LLVM's full, powerful code generation and related APIs.
+- Convenient, abstract data structures simplify code.
+- C++ is limited to relatively small corners of the CPython codebase.
+
+Lowlights:
+
+- Developers must know two related languages, C and C++, to work on the full
+ range of CPython's internals.
+- A C++ style guide will need to be developed and enforced. See `Open Issues`_.
+
+
+Managing LLVM Releases, C++ API Changes
+---------------------------------------
+
+LLVM is released on a regular six-month cycle. This means that LLVM may be
+released two or three times during the course of development of a CPython 3.x
+release. Each LLVM release brings newer and more powerful optimizations,
+improved platform support and more sophisticated code generation.
+
+LLVM releases usually include incompatible changes to the LLVM C++ API; the
+release notes for LLVM 2.6 [#llvm-26-whatsnew]_ include a list of
+intentionally-introduced incompatibilities. Unladen Swallow has tracked LLVM
+trunk closely over the course of development. Our experience has been
+that LLVM API changes are obvious and easily or mechanically remedied. We
+include two such changes from the Unladen Swallow tree as references here:
+[#us-llvm-r820]_, [#us-llvm-r532]_.
+
+Due to API incompatibilities, we recommend that an LLVM-based CPython target
+compatibility with a single version of LLVM at a time. This will lower the
+overhead on the core development team. Pegging to an LLVM version should not be
+a problem from a packaging perspective, because pre-built LLVM packages
+generally become available via standard system package managers fairly quickly
+following an LLVM release, and failing that, llvm.org itself includes binary
+releases.
+
+Pre-built LLVM packages are available from MacPorts [#llvm-macports]_ for
+Darwin, and from most major Linux distributions ([#llvm-ubuntu]_,
+[#llvm-debian]_, [#llvm-fedora]_). LLVM itself provides additional binaries,
+such as for MinGW [#llvm-mingw]_.
+
+LLVM is currently intended to be statically linked; this means that binary
+releases of CPython will include the relevant parts (not all!) of LLVM. This
+will increase the binary size, as noted above.
+
+Unladen Swallow has tasked a full-time engineer with fixing any remaining
+critical issues in LLVM before LLVM's 2.7 release. We would like CPython 3.x to
+be able to depend on a released version of LLVM, rather than closely tracking
+LLVM trunk as Unladen Swallow has done. We believe we will finish this work
+before the release of LLVM 2.7, expected in May 2010.
+
+
+Building CPython
+----------------
+
+In addition to a runtime dependency on LLVM, Unladen Swallow includes a
+build-time dependency on Clang [#clang]_, an LLVM-based C/C++ compiler. We use
+this to compile parts of the C-language Python runtime to LLVM's intermediate
+representation; this allows us to perform cross-language inlining, yielding
+increased performance. Clang is not required to run Unladen Swallow. Clang
+binary packages are available from most major Linux distributions (for example,
+[#clang-debian]_).
+
+We examined the impact of Unladen Swallow on the time needed to build Python,
+including configure, full builds and incremental builds after touching a single
+C source file.
+
++-------------+---------------+---------------+----------------------+
+| ./configure | CPython 2.6.4 | CPython 3.1.1 | Unladen Swallow r988 |
++=============+===============+===============+======================+
+| Run 1 | 0m20.795s | 0m16.558s | 0m15.477s |
++-------------+---------------+---------------+----------------------+
+| Run 2 | 0m15.255s | 0m16.349s | 0m15.391s |
++-------------+---------------+---------------+----------------------+
+| Run 3 | 0m15.228s | 0m16.299s | 0m15.528s |
++-------------+---------------+---------------+----------------------+
+
++-------------+---------------+---------------+----------------------+
+| Full make | CPython 2.6.4 | CPython 3.1.1 | Unladen Swallow r988 |
++=============+===============+===============+======================+
+| Run 1 | 1m30.776s | 1m22.367s | 1m54.053s |
++-------------+---------------+---------------+----------------------+
+| Run 2 | 1m21.374s | 1m22.064s | 1m49.448s |
++-------------+---------------+---------------+----------------------+
+| Run 3 | 1m22.047s | 1m23.645s | 1m49.305s |
++-------------+---------------+---------------+----------------------+
+
+Full builds take a hit due to a) additional ``.cc`` files needed for LLVM
+interaction, b) statically linking LLVM into ``libpython``, c) compiling parts
+of the Python runtime to LLVM IR to enable cross-language inlining.
+
+Incremental builds are slowed far more severely. The table below shows
+incremental rebuild times after touching ``Objects/listobject.c``.
+
++-------------+---------------+---------------+----------------------+
+| Incr make | CPython 2.6.4 | CPython 3.1.1 | Unladen Swallow r988 |
++=============+===============+===============+======================+
+| Run 1 | 0m1.854s | 0m1.456s | 0m24.464s |
++-------------+---------------+---------------+----------------------+
+| Run 2 | 0m1.437s | 0m1.442s | 0m24.416s |
++-------------+---------------+---------------+----------------------+
+| Run 3 | 0m1.440s | 0m1.425s | 0m24.352s |
++-------------+---------------+---------------+----------------------+
+
+As with full builds, this extra time comes from a) additional ``.cc`` files
+needed for LLVM interaction, and b) statically linking LLVM into ``libpython``.
+
+If ``libpython`` were linked shared against LLVM, this overhead would go down.
+Incremental builds of Unladen Swallow also currently (as of r988) suffer from a
+known bug in the Unladen Swallow ``Makefile`` [#rebuild-too-much]_ where too
+many ``.cc`` files are recompiled. We consider this a blocking issue for full
+merger with the ``py3k`` branch.
+
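+The build-time tables above can be condensed into average slowdown factors. The
+following is only an illustrative calculation over the timings already listed;
+the helper names are hypothetical, and CPython 2.6.4 is used as the baseline
+because Unladen Swallow is based on it.
+
```python
def parse_time(text):
    """Convert a ``time``-style string such as '1m30.776s' to seconds."""
    minutes, seconds = text.rstrip("s").split("m")
    return int(minutes) * 60 + float(seconds)


def mean_seconds(runs):
    """Arithmetic mean of several 'XmY.YYYs' timings, in seconds."""
    values = [parse_time(run) for run in runs]
    return sum(values) / len(values)


# Timings copied from the "Full make" and "Incr make" tables above.
FULL_MAKE = {
    "CPython 2.6.4": ["1m30.776s", "1m21.374s", "1m22.047s"],
    "Unladen Swallow r988": ["1m54.053s", "1m49.448s", "1m49.305s"],
}
INCR_MAKE = {
    "CPython 2.6.4": ["0m1.854s", "0m1.437s", "0m1.440s"],
    "Unladen Swallow r988": ["0m24.464s", "0m24.416s", "0m24.352s"],
}


def slowdown(table, baseline="CPython 2.6.4",
             candidate="Unladen Swallow r988"):
    """Ratio of the candidate's mean build time to the baseline's."""
    return mean_seconds(table[candidate]) / mean_seconds(table[baseline])


if __name__ == "__main__":
    print("full make slowdown: %.2fx" % slowdown(FULL_MAKE))
    print("incr make slowdown: %.2fx" % slowdown(INCR_MAKE))
```
+
+On these numbers, a full build takes about 1.3 times as long as under CPython
+2.6.4, while an incremental build takes about 15 times as long.
+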
+
+Proposed Merge Plan
+===================
+
+We propose focusing our efforts on eventual merger with CPython's 3.x line of
+development. The BDFL has indicated that 2.7 is to be the final release of
+CPython's 2.x line of development [#bdfl-27-final]_, and since 2.7 alpha 1 has
+already been released [#cpy-27a1]_, we have missed the window. Python 3 is the
+future, and that is where we will target our performance efforts.
+
+We recommend the following plan for merger of Unladen Swallow into the CPython
+source tree:
+
+- Creation of a branch in the CPython SVN repository to work in, call it
+ ``py3k-jit`` as a strawman. This will be a branch of the CPython ``py3k``
+ branch.
+- We will keep this branch closely integrated to ``py3k``. The further we
+ deviate, the harder our work will be.
+- Any JIT-related patches will go into the ``py3k-jit`` branch.
+- Non-JIT-related patches will go into the ``py3k`` branch (once reviewed and
+ approved) and be merged back into the ``py3k-jit`` branch.
+- Potentially-contentious issues, such as the introduction of new command line
+ flags or environment variables, will be discussed on python-dev.
+
+
+Because Google uses CPython 2.x internally, Unladen Swallow is based on CPython
+2.6. We would need to port our compiler to Python 3; this would be done as
+patches are applied to the ``py3k-jit`` branch, so that the branch remains a
+consistent implementation of Python 3 at all times.
+
+We believe this approach will be minimally disruptive to the 3.2 or 3.3 release
+process while we iron out any remaining issues blocking final merger into
+``py3k``. Unladen Swallow maintains a punchlist of known issues to be resolved
+before final merger [#us-punchlist]_, which includes all problems mentioned in
+this PEP; we trust the CPython community will have its own concerns.
+
+See the `Open Issues`_ section for questions about code review policy for the
+``py3k-jit`` branch.
+
+
+Future Work
+===========
+
+A JIT compiler is an extremely flexible tool, and we have by no means exhausted
+its full potential. Unladen Swallow maintains a list of performance
+optimizations [#us-perf-punchlist]_ that the team has not yet had time to fully
+implement. Examples:
+
+- Python/Python inlining [#inlining]_. Our compiler currently performs no
+ inlining between pure-Python functions. Work on this is on-going
+ [#us-inlining]_.
+- Unboxing [#unboxing]_. Unboxing is critical for numerical performance. PyPy
+ in particular has demonstrated the value of unboxing to heavily-numeric
+ workloads.
+- Recompilation, adaptation. Unladen Swallow currently only compiles a Python
+ function once, based on its usage pattern up to that point. If the usage
+ pattern changes, limitations in LLVM [#us-recompile-issue]_ prevent us from
+ recompiling the function to better serve the new usage pattern.
+- JIT-compile regular expressions. Modern JavaScript engines reuse their JIT
+ compilation infrastructure to boost regex performance [#us-regex-perf]_.
+ Unladen Swallow has developed benchmarks for Python regular expression
+ performance ([#us-bm-re-compile]_, [#us-bm-re-v8]_, [#us-bm-re-effbot]_), but
+ work on regex performance is still at an early stage [#us-regex-issue]_.
+- Trace compilation [#traces-waste-of-time]_, [#traces-explicit-pipeline]_.
+ Based on the results of PyPy and Tracemonkey [#tracemonkey]_, we believe that
+ a CPython JIT should incorporate trace compilation to some degree. We
+ initially avoided a purely-tracing JIT compiler in favor of a simpler,
+ function-at-a-time compiler. However, this function-at-a-time compiler has laid
+ the groundwork for a future tracing compiler implemented in the same terms.
+
+This list is by no means exhaustive. There is a vast literature on optimizations
+for dynamic languages that could and should be implemented in terms of Unladen
+Swallow's LLVM-based JIT compiler [#us-relevantpapers]_.
+
+
+Open Issues
+===========
+
+- *Code review policy for the ``py3k-jit`` branch.* How does the CPython
+ community want us to proceed with respect to checkins on the ``py3k-jit``
+ branch? Pre-commit reviews? Post-commit reviews?
+
+ Unladen Swallow has enforced pre-commit reviews in our trunk, but we realize
+ this may lead to long review/checkin cycles in a purely-volunteer
+ organization. We would like a non-Google-affiliated member of the CPython
+ development team to review our work for correctness and compatibility, but we
+ realize this may not be possible for every commit.
+- *How to link LLVM.* Should we change LLVM to better support shared linking,
+ and then use shared linking to link the parts of it we need into CPython?
+- *Prioritization of remaining issues.* We would like input from the CPython
+ development team on how to prioritize the remaining issues in the Unladen
+ Swallow codebase. Some issues like memory usage are obviously critical before
+ merger with ``py3k``, but others may fall into a "nice to have" category that
+ could be deferred for resolution in a future CPython 3.x release.
+
+- *Create a C++ style guide.* Should PEP 7 be extended to include C++, or
+ should a separate C++ style PEP be created? Unladen Swallow maintains its own
+ style guide [#us-styleguide]_, which may serve as a starting point; the
+ Unladen Swallow style guide is based on both LLVM's [#llvm-styleguide]_ and
+ Google's [#google-styleguide]_ C++ style guides.
+
+
+Unladen Swallow Community
+=========================
+
+We would like to thank the community of developers who have contributed to
+Unladen Swallow, in particular: James Abbatiello, Joerg Blank, Eric Christopher,
+Alex Gaynor, Chris Lattner, Nick Lewycky, Evan Phoenix and Thomas Wouters.
+
+
+Licensing
+=========
+
+All work on Unladen Swallow is licensed to the Python Software Foundation (PSF)
+under the terms of the Python Software Foundation License v2 [#psf-lic]_ under
+the umbrella of Google's blanket Contributor License Agreement with the PSF.
+
+
+References
+==========
+
+.. [#us]
+ http://code.google.com/p/unladen-swallow/
+
+.. [#llvm]
+ http://llvm.org/
+
+.. [#clang]
+ http://clang.llvm.org/
+
+.. [#tested-apps]
+ http://code.google.com/p/unladen-swallow/wiki/Testing
+
+.. [#llvm-hardware]
+ http://llvm.org/docs/GettingStarted.html#hardware
+
+.. [#rebuild-too-much]
+ http://code.google.com/p/unladen-swallow/issues/detail?id=115
+
+.. [#llvm-c-api]
+ http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm-c/
+
+.. [#llvm-26-whatsnew]
+ http://llvm.org/releases/2.6/docs/ReleaseNotes.html#whatsnew
+
+.. [#us-llvm-r820]
+ http://code.google.com/p/unladen-swallow/source/detail?r=820
+
+.. [#us-llvm-r532]
+ http://code.google.com/p/unladen-swallow/source/detail?r=532
+
+.. [#llvm-macports]
+ http://trac.macports.org/browser/trunk/dports/lang/llvm/Portfile
+
+.. [#llvm-ubuntu]
+ http://packages.ubuntu.com/karmic/llvm
+
+.. [#llvm-debian]
+ http://packages.debian.org/unstable/devel/llvm
+
+.. [#clang-debian]
+ http://packages.debian.org/sid/clang
+
+.. [#llvm-fedora]
+ http://koji.fedoraproject.org/koji/buildinfo?buildID=134384
+
+.. [#gdb70]
+ http://www.gnu.org/software/gdb/download/ANNOUNCEMENT
+
+.. [#oprofile]
+ http://oprofile.sourceforge.net/news/
+
+.. [#us-oprofile-punchlist]
+ http://code.google.com/p/unladen-swallow/issues/detail?id=63
+
+.. [#shark]
+ http://developer.apple.com/tools/sharkoptimize.html
+
+.. [#llvm-oprofile-change]
+ http://llvm.org/viewvc/llvm-project?view=rev&revision=75279
+
+.. [#us-oprofile-change]
+ http://code.google.com/p/unladen-swallow/source/detail?r=986
+
+.. [#oprofile-jit-interface]
+ http://oprofile.sourceforge.net/doc/devel/jit-interface.html
+
+.. [#llvm-mingw]
+ http://llvm.org/releases/download.html
+
+.. [#us-r359]
+ http://code.google.com/p/unladen-swallow/source/detail?r=359
+
+.. [#us-r376]
+ http://code.google.com/p/unladen-swallow/source/detail?r=376
+
+.. [#us-r417]
+ http://code.google.com/p/unladen-swallow/source/detail?r=417
+
+.. [#us-r517]
+ http://code.google.com/p/unladen-swallow/source/detail?r=517
+
+.. [#bdfl-27-final]
+ http://mail.python.org/pipermail/python-dev/2010-January/095682.html
+
+.. [#cpy-27a1]
+ http://www.python.org/dev/peps/pep-0373/
+
+.. [#cpy-32]
+ http://www.python.org/dev/peps/pep-0392/
+
+.. [#us-punchlist]
+ http://code.google.com/p/unladen-swallow/issues/list?q=label:Merger
+
+.. [#us-binary-size]
+ http://code.google.com/p/unladen-swallow/issues/detail?id=118
+
+.. [#us-issue-startup-time]
+ http://code.google.com/p/unladen-swallow/issues/detail?id=64
+
+.. [#zope-interface]
+ http://www.zope.org/Products/ZopeInterface
+
+.. [#bigtable]
+ http://en.wikipedia.org/wiki/BigTable
+
+.. [#mondrian]
+ http://www.niallkennedy.com/blog/2006/11/google-mondrian.html
+
+.. [#us-sqlalchemy-readme]
+ http://code.google.com/p/unladen-swallow/source/browse/tests/lib/sqlalchemy/README.unladen
+
+.. [#us-test_llvm]
+ http://code.google.com/p/unladen-swallow/source/browse/trunk/Lib/test/test_llvm.py
+
+.. [#fuzz-testing]
+ http://en.wikipedia.org/wiki/Fuzz_testing
+
+.. [#pyfuzz]
+ http://bitbucket.org/ebo/pyfuzz/overview/
+
+.. [#fusil]
+ http://lwn.net/Articles/322826/
+
+.. [#us-memory-issue]
+ http://code.google.com/p/unladen-swallow/issues/detail?id=68
+
+.. [#us-benchmarks]
+ http://code.google.com/p/unladen-swallow/wiki/Benchmarks
+
+.. [#students-t-test]
+ http://en.wikipedia.org/wiki/Student's_t-test
+
+.. [#smaps]
+ http://bmaurer.blogspot.com/2006/03/memory-usage-with-smaps.html
+
+.. [#us-background-thread]
+ http://code.google.com/p/unladen-swallow/source/browse/branches/background-thread
+
+.. [#us-background-thread-issue]
+ http://code.google.com/p/unladen-swallow/issues/detail?id=40
+
+.. [#us-import-tests]
+ http://code.google.com/p/unladen-swallow/source/detail?r=888
+
+.. [#us-tracing-tests]
+ http://code.google.com/p/unladen-swallow/source/diff?spec=svn576&r=576&format=side&path=/trunk/Lib/test/test_trace.py
+
+.. [#us-perf-punchlist]
+ http://code.google.com/p/unladen-swallow/issues/list?q=label:Performance
+
+.. [#jit]
+ http://en.wikipedia.org/wiki/Just-in-time_compilation
+
+.. [#urs-self]
+ http://research.sun.com/self/papers/urs-thesis.html
+
+.. [#us-projectplan]
+ http://code.google.com/p/unladen-swallow/wiki/ProjectPlan
+
+.. [#us-relevantpapers]
+ http://code.google.com/p/unladen-swallow/wiki/RelevantPapers
+
+.. [#us-llvm-notes]
+ http://code.google.com/p/unladen-swallow/source/browse/trunk/Python/llvm_notes.txt
+
+.. [#psf-lic]
+ http://www.python.org/psf/license/
+
+.. [#v8]
+ http://code.google.com/p/v8/
+
+.. [#squirrelfishextreme]
+ http://webkit.org/blog/214/introducing-squirrelfish-extreme/
+
+.. [#rubinius]
+ http://rubini.us/
+
+.. [#parrot-on-llvm]
+ http://lists.parrot.org/pipermail/parrot-dev/2009-September/002811.html
+
+.. [#macruby]
+ http://www.macruby.org/
+
+.. [#hotspot]
+ http://en.wikipedia.org/wiki/HotSpot
+
+.. [#psyco]
+ http://psyco.sourceforge.net/
+
+.. [#pypy]
+ http://codespeak.net/pypy/dist/pypy/doc/
+
+.. [#inlining]
+ http://en.wikipedia.org/wiki/Inline_expansion
+
+.. [#unboxing]
+ http://en.wikipedia.org/wiki/Object_type_(object-oriented_programming)
+
+.. [#us-inlining]
+ http://code.google.com/p/unladen-swallow/issues/detail?id=86
+
+.. [#us-styleguide]
+ http://code.google.com/p/unladen-swallow/wiki/StyleGuide
+
+.. [#llvm-styleguide]
+ http://llvm.org/docs/CodingStandards.html
+
+.. [#google-styleguide]
+ http://google-styleguide.googlecode.com/svn/trunk/cppguide.xml
+
+.. [#us-recompile-issue]
+ http://code.google.com/p/unladen-swallow/issues/detail?id=41
+
+.. [#us-regex-perf]
+ http://code.google.com/p/unladen-swallow/wiki/ProjectPlan#Regular_Expressions
+
+.. [#us-bm-re-compile]
+ http://code.google.com/p/unladen-swallow/source/browse/tests/performance/bm_regex_compile.py
+
+.. [#us-bm-re-v8]
+ http://code.google.com/p/unladen-swallow/source/browse/tests/performance/bm_regex_v8.py
+
+.. [#us-bm-re-effbot]
+ http://code.google.com/p/unladen-swallow/source/browse/tests/performance/bm_regex_effbot.py
+
+.. [#us-regex-issue]
+ http://code.google.com/p/unladen-swallow/issues/detail?id=13
+
+.. [#pygame]
+ http://www.pygame.org/
+
+.. [#numpy]
+ http://numpy.scipy.org/
+
+.. [#pypy-bmarks]
+ http://codespeak.net:8099/plotsummary.html
+
+.. [#llvm-users]
+ http://llvm.org/Users.html
+
+.. [#hlvm]
+ http://www.ffconsultancy.com/ocaml/hlvm/
+
+.. [#llvm-far-call-issue]
+ http://llvm.org/PR5201
+
+.. [#llvm-jmm-rev]
+ http://llvm.org/viewvc/llvm-project?view=rev&revision=76828
+
+.. [#llvm-memleak-rev]
+ http://llvm.org/viewvc/llvm-project?rev=91611&view=rev
+
+.. [#llvm-globaldce-rev]
+ http://llvm.org/viewvc/llvm-project?rev=85182&view=rev
+
+.. [#llvm-availext-issue]
+ http://llvm.org/PR5735
+
+.. [#us-specialization-issue]
+ http://code.google.com/p/unladen-swallow/issues/detail?id=73
+
+.. [#us-direct-calling-issue]
+ http://code.google.com/p/unladen-swallow/issues/detail?id=88
+
+.. [#us-fast-globals-issue]
+ http://code.google.com/p/unladen-swallow/issues/detail?id=67
+
+.. [#traces-waste-of-time]
+ http://www.ics.uci.edu/~franz/Site/pubs-pdf/C44Prepub.pdf
+
+.. [#traces-explicit-pipeline]
+ http://www.ics.uci.edu/~franz/Site/pubs-pdf/ICS-TR-07-12.pdf
+
+.. [#tracemonkey]
+ https://wiki.mozilla.org/JavaScript:TraceMonkey
+
+.. [#llvm-langref]
+ http://llvm.org/docs/LangRef.html
+
+.. [#us-wider-perf-issue]
+ http://code.google.com/p/unladen-swallow/issues/detail?id=120
+
+.. [#us-nbody]
+ http://code.google.com/p/unladen-swallow/source/browse/tests/performance/bm_nbody.py
+
+
+Copyright
+=========
+
+This document has been placed in the public domain.
+
+..
+ Local Variables:
+ mode: indented-text
+ indent-tabs-mode: nil
+ sentence-end-double-space: t
+ fill-column: 70
+ coding: utf-8
+ End:
+
+
+