When will Java go mainstream like Python?

Wanja Gayk brixomatic at yahoo.com
Thu Feb 25 14:08:43 EST 2010


Am 25.02.2010, 02:05 Uhr, schrieb Lawrence D'Oliveiro  
<ldo at geek-central.gen.new_zealand>:

> In message <op.u8nfpex8y5e8ok at laptopwanja>, Wanja Gayk wrote:
>
>> Reference counting is about the worst technique for garbage collection.
>
> It avoids the need for garbage collection. It means I can write things  
> like
>
>     contents = open(filename, "r").read()
>
> and know the file object will be immediately closed after its contents  
> are returned.

Which has nothing to do with reference counting in general.
This is what JVMs can do aswell.
Just have a look at the behaviour of Weak- and SoftReferences to get an  
idea what goes on behind the scenes.
For example:
int x = new Point(10,20).getX();
The point object will be collectable as soon as getX() returned - even  
better: todays JIT compilers will inline the call and not even allocate  
the object, if it's allocation is free of side effects.
What you write above is something entirely different: it's just a  
mechanism to release file handles, syntactic sugar for what you have in  
java when you close a stream in a finally-block.

If you like to know a litle more about reference counting, feel free to  
study the following paper on a very sophistcated reference counting GC for  
the JVM by Levanoni and Petrank:
http://reference.kfupm.edu.sa/content/e/o/e___an_on_the_fly_reference_counting_gar_61636.pdf
The paper talks about some ofthe problems that occur with reference  
counting.
Their algorithm uses some nice techniques to avoid thread contention (the  
reference-counter updates must be atomic, thus require that all "mutators"  
are stopped) and other problems connected to "referece counting" in  
general, such as slow writes (-> cache coherence, etc. read-only access is  
faster).
What the paper doesn't tell is how they deal with fragmented heap, so I  
wouldn't trust that this Refcount-GC will have a stable performance over a  
long period of time (we're talking apps like Ebay, which run 24/7 for  
months if not years).
Their algorithm may have a "comparable" speed to the (meanwhile) old  
implementation the JVM's GC, but it still needs the original JVM clean up  
what it leaves behind(it forgets some references) - maybe that's also how  
they deal with a fragmeted heap, as java uses a compacting collector.
So, yes, it can be fast, or let's say responsive, but in general it's not.  
And when it is up to speed you get stability issues at least I have not  
heard anything different yet.

To have some numbers:
Benchmark Original RC
Total 2582.2 2676.0
compress 720.8 723.3
db 374.0 383.7
jack 264.6 299.7
javac 225.0 235.2
jess 181.7 209.7
mpegaudio 607.1 610.6

As you see: SUN's GC is faster. Even though: what those folks have done is  
impressive (but buggy).

> It also means I don’t have to predefine how much memory my program will  
> use.

Which again has nothing to do with GC. This has been a design decision.
In fact todays JVM dynamically allocate memory and gro their heaps.
A quick look at the command line options will make that clear:
java -X
  [...]
    -Xms<size>        set initial Java heap size
    -Xmx<size>        set maximum Java heap size

Todays GC run as folows, they initialize the heap size at what you giove  
them in your Xmx-Setting (or use the default).
When the heap is filled up to that initial size, they do a full GC and if  
that won't free enough memory, they will allocate some more.
This is done until the upper limit from -Xmx is hit, when the program  
requires more than that, it will fail with an OutOfMemoryError.
As you can imagine those limits may influence the GC, but they are no  
requirement for a GC.
Those limits exists for the security of the host system (which may not be  
a system that allows setting own itself, like Unix/Linux).

> A garbage collector will only kick in when the app runs low on memory.

Wrong. Please inform yourself. Current generational GCs have minor and  
major collections. The JLS only requires that before the program runs out  
of memory the garbage has to be collected, which does not mean it cannot  
do that earlier. In fact you may also request a full GC by calling  
"System.gc()" when you know there's time for it. But it's not recommended  
in general. With JDK 7 we will get a new "Garbage First" Collector, which  
works differently than the ones we have today. It will promise lower  
response times and a more predictable behaviour with about the same speed.  
Starting with Java6 update 14 you may try that one (if you like, I can  
give ou the command line switch to activate it). Basically it is also a  
copying collector and in some way it's also generational, but it's  
"promotion" strategy is different.
Anyway, it also won't only clean up when memory is low, but when it  
detects that there is a segment which is hardly live.

> On a
> system with dynamic memory allocation, that will not happen until the  
> entire
> system runs low on memory, unless you limit the app to some fixed amount.
> Which is an antiquated way of doing things.

Which is wrong, see above.
Today eve the C++ folks start using Garbage Collectors, because  
allocating/deallocating space using free-lists has proven to be slower  
than copying live objects.

> And then there’s caching. Modern CPUs owe most of their speed to  
> assumptions that programs will obey locality of reference.

Well, and what's the point? Stack allocation is just not necessary in a  
copy-colllector, bevause such a collector knows no deallocation (it just  
saves the survovors), so there is no gain from stack allocation. Stack  
allocation is faster in traditional C++ programs (without GCs), because  
the free-lists have to be managed.
Todays HotSpot-Compilers take memory-locality into account aswell, just  
read some papers about it, have a look at "escape analysis" for example.

Some further reading:
http://www.ibm.com/developerworks/java/library/j-jtp09275.html
http://www.ibm.com/developerworks/java/library/j-jtp04223.html

> Pointer-chasing is a cache- hostile activity. Garbage collection  
> involves a lot of pointer-chasing,
> particularly of dead objects that have long since been flushed from the
> cache, compared with reference counting of recently-accessed objects.
> Therefore garbage collection loses performance.

You haven't read my previous posts, have you?
Dead objects are not considered by today's Garbage collectors.
Those trace only "reachable" objects and copy those. They won't even look  
at "dead" objects.
Since most objects die young, this is where you gain a lot of performance.

http://www.ibm.com/developerworks/library/j-jtp01274.html

Reference counters, on the other hand, must detect when the reference  
count falls to zero. So they must look at each object, no matter if they  
are about to die, just to find out if they will die.

Regards
-Wanja-

-- 
Erstellt mit Operas revolutionärem E-Mail-Modul: http://www.opera.com/mail/

--- news://freenews.netfront.net/ - complaints: news at netfront.net ---



More information about the Python-list mailing list