[pypy-svn] r76592 - pypy/extradoc/planning
fijal at codespeak.net
Wed Aug 11 18:12:32 CEST 2010
Author: fijal
Date: Wed Aug 11 18:12:31 2010
New Revision: 76592
Modified:
pypy/extradoc/planning/jit.txt
Log:
(anto, fijal, arigo)
Clean up the jit-related tasks.
Modified: pypy/extradoc/planning/jit.txt
==============================================================================
--- pypy/extradoc/planning/jit.txt (original)
+++ pypy/extradoc/planning/jit.txt Wed Aug 11 18:12:31 2010
@@ -1,21 +1,20 @@
NEW TASKS
---------
-- look at assembler-assembler calls again: if the inner function is traced
- after the outer one, the call is slow. other cases could be faster too,
- probably.
+- look at assembler-assembler calls again: if the inner function is
+ traced after the outer one, the call is slow. Might be solved
+ easily if we implement full out-of-line guards (e.g. by invalidating
+ the outer function when the inner one gets compiled)
- have benchmarks for jit compile time and jit memory usage
- trace into functions even if they have a loop. only if the loop is actually
- hit, a residual portal call is produced
+ hit, a residual portal call is produced (status: kill-caninline branch,
+ buggy)
- generators are not really fast – maybe add a JUMP_ABSOLUTE_GENERATOR that
does not call can_enter_jit after an iteration in which there was a yield.
- obviously.
-
-- think about handling hidden registers - mayb we can use couple of
- first spots on the stack as registers
+ obviously. (status: kill-caninline branch)
- think again about perfect specialization. Check if we loose anything
if we turn it off. Another approach to specialization: specialize things
@@ -28,60 +27,25 @@
current exception from the struct in memory, followed by a regular
GUARD_CLASS.
-- getfields which result is never used never get removed (probably cause -
- they used to be as livevars in removed guards). also getfields which result
- is only used as a livevar in a guard should be removed and encoded in
- the guard recovert code.
+- write a document that says what you cannot expect the jit to optimize.
+ E.g. http://paste.pocoo.org/show/181319/ with B being old-style and
+ C being new-style, or vice-versa.
-- think about strings more. since string are immutable, unnecessary copies
- does not make any sense (sometimes people construct strings through
- arrays, which is harder to track and then not use the result)
-TASKS
------
+OPTIMIZATIONS
+-------------
-- think about code memory management
-
-- forcing virtualizables should only force fields affected, not everything
+Things we can do mostly by editing optimizeopt.py:
-- think out looking into functions or not, based on arguments,
- for example contains__Tuple should be unrolled if tuple is of constant
- length. HARD, blocked by the fact that we don't know constants soon enough
-
-- look at example of storing small strings in large lists (any sane templating
- engine would do it) and not spend all the time in
- _trace_and_drag_out_of_nursery.
- Requires thinking about card-marking GC, which is hard, postpone
-
- Card marking GC is hard in our model. Think about solutions to chunked
- list (a list that if big enough is a list of lists of stuff). Experiments
- show that it helps a lot for some examples.
-
-- improve tracing/blackholing speed
- Essential, especially blackholing in translate.py, as well as html5lib.
-
-- some guards will always fail if they ever start failing
- (e.g. the class version tag). Do something more clever about it.
- (already done a hack that helps: don't compile more guard_value's
- if the value guarded for keeps changing fast enough: r71527)
+- getfields which result is never used never get removed (probably cause -
+ they used to be as livevars in removed guards). also getfields which result
+ is only used as a livevar in a guard should be removed and encoded in
+ the guard recovert code (only if we are sure that the stored field cannot
+ change)
- int_add_ovf(x, 0) guard_overflow is 20% of all int_add_ovf, not much
overall, but probably worth attacking
-- think about such example:
-
- http://paste.pocoo.org/show/188520/
-
- this will compile new assembler path for each new type, even though that's
- overspecialization since in this particular case it's not relevant.
- This is treated as a megamorphic call (promotion of w_self in typeobject.py)
- while in fact it is not.
-
-- a suggestion - if we call some code via call_assembler that raises an
- exception, in theory we could do something smarter in case our frames
- don't escape and call simplified version that does not allocate all
- frames. Sounds hard
-
- if we move a promotion up the chain, some arguments don't get replaced
with constants (those between current and previous locations). So we get
like
@@ -93,102 +57,38 @@
maybe we should move promote even higher, before the first use and we
could possibly remove more stuff?
-Python interpreter:
-
-- goal: on average <=5 guards per original bytecode.
- Almost achieved :-) pypy/jit/tool/traceviewer.py can view where we're
- failing (red boxes out of jit-log-opt logging)
-
-- put the class into the structure to get only one promote when using an
- instance
-
-- this example: http://paste.pocoo.org/show/181319/
- showcases a problem that works fine as long as you not present a
- combination of oldstyle and newstyle classes. If you however present
- a combination of old and newstyle classes (try modifying) things go
- far slower and traces look bad.
- DON'T DO THAT?
-
-Benchmark Notes
-----------------------------
- - spitfire:
- - it's an issue with GC, that's probably won't fix. On spitfire's small
- benchmarks we're either matching CPython or are faster (if using cStringIO)
+PYTHON EXAMPLES
+---------------
- - html5lib:
- - we're faster (a bit) than CPython on a long enough run. blackholing is
- an issue there.
- - slowness seems to be mostly the fault of PyUnicode_DecodeCharmap in
- module/_codecs/app_codecs.py. Are such things not jitted?
- - the tokenizer uses regular expressions and generators, which probably
- doesn't help
+Extracted from some real-life Python programs, examples that don't give
+nice code at all so far:
- - spambayes
- - uses regular expressions and generators a lot
- - regexes are 80% of runtime of long-enough run
-
- - ai
- - the slowness is the fault of generators and generator expressions
- - many of the generator expressions are a bit stupid (like tuple(<genexp>))
- WON'T FIX, maybe?
-
-
-JIT-related Release Tasks
----------------------------
-
-(there are other release tasks, specifically about packaging, documentation,
-website and stability that need sorting out too. However, they are beyond the
-scope of this section)
-
-wishlist:
-- the checks that look whether profiling/tracing in the Python interpreter is
- enabled look expensive. Do we want to do something about them?
-
-
-
-
-META
------
-
-- stability!
-
-- keep test coverage in check
-
-- prevent too much method and fields demoting in the jit
-
-- the tragedy of the skipped tests
-
-- update things in metainterp/doc
-
-inlining discussion
---------------------
-
-- at some point we need to merge the tails of loops, to avoid exponential
- explosion
-- tracing aggressively will put pressure on the speed of tracing
-- what should we do about recursive calls?
+- http://paste.pocoo.org/show/188520/
+ this will compile new assembler path for each new type, even though that's
+ overspecialization since in this particular case it's not relevant.
+ This is treated as a megamorphic call (promotion of w_self in typeobject.py)
+ while in fact it is not.
+- pypy/objspace/std/inlinedict: put the class into the structure to get
+ only one promote when using an instance, instead of two: the promotion
+ of the '.w__class__' and the promotion of the '.structure'
-things we know are missing
----------------------------
+- guard_true(frame.is_being_profiled) all over the place
-tests:
-- find a test for r64742 (JitException capture)
+- xxx (find more examples :-)
-Goals/Benchmarks
------------------
-Goal: be somehow faster than CPython in real programs
- actually, DONE!
+LATER (maybe) TASKS
+-------------------
-Benchmarks:
- they live at svn+ssh://codespeak.net/svn/pypy/benchmarks
+- think about code memory management
+- think out looking into functions or not, based on arguments,
+ for example contains__Tuple should be unrolled if tuple is of constant
+ length. HARD, blocked by the fact that we don't know constants soon enough
-ootype discussion
-------------------
+- out-of-line guards (when an external change would invalidate existing
+ pieces of assembler)
-- try to unify interfaces to make doing the right thing for ootype easier
-- different constraints for different groups of people
-- what to do with ootype jit support after Anto finished his PhD?
+- merge tails of loops-and-bridges?