[Chicago] Advice about a Java program?

Sun Oct 12 19:04:28 CEST 2014

A lot of what a computer does is translating between different languages.
Java is a language. Python is a language. Their respective bytecodes are
languages. The machine code of the system is a language. Java gets compiled
to JVM bytecode (.class files) by javac. Python gets compiled to Python
bytecode (cached in .pyc files for imported modules) when you run python,
but it's different from JVM
bytecode. In Jython, an alternative to CPython, you can run a .py file and
it will create JVM bytecode in-memory (no .class file is saved to the
filesystem). Clojure works similarly, generating JVM bytecode as you
evaluate code. The Java Virtual Machine interprets bytecode for you so it
can run on your machine. Or at least, it used to. Nowadays, the JVM has
something called a "Just in Time" (JIT) compiler, which will compile the
frequently run parts of your bytecode to machine code (again, all in
memory), and execute it
directly to speed things up.
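
You can poke at the Python side of this yourself. Here's a minimal sketch
using the standard dis module. The add function is just a made-up example,
and the exact opcode names differ between CPython versions, so I'm not
showing any output:

```python
import dis


def add(a, b):
    return a + b


# The compiled bytecode lives on the function's code object as raw bytes.
raw = add.__code__.co_code

# dis decodes those bytes into named instructions, one per operation.
opnames = [instruction.opname for instruction in dis.get_instructions(add)]

print(raw)
print(opnames)
```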

JVM bytecode is essentially a relatively simple assembly language, with built-in
support for Java's object system. It is a little limiting in some ways
(maybe look up "invokedynamic" and "JVM tail recursion"), but because it is
relatively well-engineered and has a large library ecosystem that works
fairly well cross-platform (on Windows, Linux, Mac) many languages use it
as a backend: Clojure, Groovy, Scala, Jython, Fantom, etc.

Clojure and Common Lisp are both lisps, but they are as different as day
and night (respectively). Common Lisp descends from one of the world's
oldest programming languages (Lisp dates to 1958). Clojure came out in
2007, I believe.
Common Lisp relies heavily on destructive updates for performance (it was
made in a time when modern-day functional programming was not possible due
to memory and speed constraints). Like the urban myth about Eskimos having
a dozen words for snow, it is actually *true* in the case of Common Lisp
that there are a dozen (all distinct) ways of doing assignment. (Something
every other language manages with just one = sign).

Clojure on the other hand was designed with modern, large-scale,
distributed systems in mind. I highly recommend watching videos and talks
on YouTube by the creator, Rich Hickey. He is quite a funny guy.

I don't know if many people still use the phrase "AI" any more in research.
The in-vogue thing is to call it "machine learning" and be up-front about
how it's all just linear algebra and statistical methods piled on
Google-sized warehouses of data.

Lisp was a favorite among early AI researchers for a few reasons, I think.
First, Lisp was invented in academia, and stayed there for a long time.
Researchers were familiar with it and the benefits it provided over the
alternatives at the time. Most people were still using assembler when Lisp
came out. Even fifteen years later, the industry had just C as their silver
bullet. But assembler and C are terrible mismatches for the kind of
programming done by AI researchers. The data structures of assembler are
merely what's allowed by the machine architecture. In C, you have the
machine data types and structs, which let you create tuples of those types.
But it is incredibly difficult in C to create any kind of data type that
requires a dynamic amount of memory (especially if it starts changing in
size), because you have to manage your own garbage collection.

Lisp on the other hand supported linked lists as a fundamental data type,
along with a special type called a symbol. Symbols were useful for tagging
data structures with metadata. And because of lisp's macro system, you
could easily create small languages inside itself that better modeled what
you were doing.
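
To give a rough feel for the "tagged data" idea, here is a sketch in Python
rather than Lisp. It is only an assumption about how you might mimic it:
nested tuples stand in for linked lists, plain strings stand in for symbols,
and the tags "add" and "mul" and the evaluate function are made up for the
example. It doesn't reproduce macros at all.

```python
def evaluate(expr):
    """Evaluate a tiny arithmetic language built from tagged tuples."""
    # Plain numbers evaluate to themselves.
    if isinstance(expr, (int, float)):
        return expr
    # Otherwise the first element is the tag and the rest are sub-expressions.
    tag, *args = expr
    values = [evaluate(arg) for arg in args]
    if tag == "add":
        return sum(values)
    if tag == "mul":
        result = 1
        for value in values:
            result *= value
        return result
    raise ValueError(f"unknown tag: {tag!r}")


# Represents 1 + (2 * 3) as data rather than as Python code.
expr = ("add", 1, ("mul", 2, 3))
total = evaluate(expr)
```

The point is that the program is just a data structure, so ordinary code
can build it, inspect it, and transform it before anything gets evaluated.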

Nowadays, the advantages of lisps aren't quite as pronounced. (And the same
goes for the disadvantages). Computers are fast enough to run Lisp and the
"it's too slow and memory hungry" argument is moot for many problem
domains. But most languages have garbage collection now (which was
something invented specifically for lisp), and so creating dynamically
sized data structures is not painful.

And just to mention Haskell, as it is one of my primary languages. Haskell
is something very different. It's still a language that grew up in
academia, but it had a very different goal in mind: compiler research for
statically typed languages, and in particular, studying the implementation
of "lazy evaluation", where all values (arithmetic expressions, function
calls, etc.) are computed on demand. It makes it possible to do
interesting things you can't in other languages. For instance, if I wanted
to write the AI for a computer game, in Python, I might write an algorithm
that grows the search tree as I walk down it. My algorithm would
necessarily need to know both about how to generate the tree and how to
traverse it. But in Haskell, I can split those two concerns up: I can
simply generate an infinite(!) search tree then hit it with a naive
breadth-first search algorithm. At first glance, it looks like my code
generates the entire (infinite) tree, then passes it to the search. But
Haskell doesn't evaluate any more of the tree than is actually necessary to
do the search. Once my algorithm is complete, the only bits of the search
tree that actually get computed and loaded into memory are those that the
search needed to look at.
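
Python isn't lazy, but you can approximate the idea with generators. This is
only a sketch under my own assumptions: Node, bfs, binary_children and the
expanded list are names I made up, and the tree (node n has children 2n and
2n + 1) is a toy stand-in for a real game tree. In Haskell you get this
behavior without having to ask for it.

```python
from collections import deque


class Node:
    """A tree node whose children are only built when someone asks for them."""

    def __init__(self, value, expand):
        self.value = value
        self._expand = expand  # function: value -> iterable of child values

    def children(self):
        # Generator: nothing below this node exists until it is iterated.
        for child_value in self._expand(self.value):
            yield Node(child_value, self._expand)


def bfs(root, is_goal):
    """Naive breadth-first search. It knows nothing about how the tree is made."""
    queue = deque([root])
    while queue:
        node = queue.popleft()
        if is_goal(node.value):
            return node.value
        queue.extend(node.children())
    return None


expanded = []  # records which nodes actually had their children generated


def binary_children(n):
    """Hypothetical infinite tree: node n has children 2n and 2n + 1."""
    expanded.append(n)
    return (2 * n, 2 * n + 1)


tree = Node(1, binary_children)  # conceptually infinite, nothing built yet
found = bfs(tree, lambda value: value == 11)
```

The search returns 11, and only nodes 1 through 10 ever had their children
generated. Tree generation and traversal live in separate functions, which
is the split I was describing. (If the goal isn't in the tree, this version
never terminates, since the tree has no end.)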

On Sat, Oct 11, 2014 at 10:19 PM, Lewit, Douglas <d-lewit at neiu.edu> wrote:

> That's a lot of interesting stuff to digest!  Sounds like Clojure might be
> a language that I would like to explore when I have some time.  (Not sure
> when that will be!  Right now it feels like I'm juggling several dishes
> with one hand tied behind my back. )
>
> Yes, I've heard some pretty bad things about Stephen Wolfram.  The guy is
> first and foremost a businessman.  He likes money!  But then again we all
> do, right?  I love Maple and Mathematica.  I learned Maple in the math
> department at NEIU, and I started working with Mathematica when I began
> teaching math courses at Oakton.  However, as much as I love Maple and
> Mathematica, it appears that both of them are in the shadow of Matlab,
> which has become extremely popular at every college and university that has
> an engineering program.  But Matlab is mostly numerical.  Maple and
> Mathematica really excel at computer algebra and symbolic computation.  I
> **THINK** you can declare symbolic variables with Python's sympy module,
> but honestly it's been a while since I played with those commands in
> Python.  (And where is the supporting documentation for sympy? )
>
> Kind of a dumb question here.  How can Lisp (or any other programming
> language for that matter) use the JVM, the Java interpreter?  I don't get
> it.  Isn't that like a French student carrying around a Spanish dictionary?
>   : )
>
> Is Clojure similar or almost the same as "Common Lisp"?  I think
> artificial intelligence researchers are really into functional programming
> languages, such as Lisp and Haskell.  I wonder why they are so into
> functional programming.
>
> Isn't Python essentially 50% procedural, 40% object-oriented and 10%
> functional?  (Those are NOT official statistics!  Just guesses or estimates
> based on my limited knowledge of the language. )
>
> On Sat, Oct 11, 2014 at 2:51 AM, Michael Maloney <tac at tac-tics.net> wrote:
>
>> Yeah. Clojure is a lisp that runs on the JVM. It's notable for being one
>> of the only lisps that seems to have gained any popularity among people
>> writing real-world code. (Lisp has always been a powerful, but comically
>> unpopular language). It is not object oriented, but rather, functional,
>> relying on immutable data structures and data transformations rather
>> than mutable updates. It is dynamic in a way similar to Python. But its
>> macro system is much more flexible than Python's metaprogramming
>> capabilities. It is a very good choice for building "domain specific
>> languages" that better model your problem domain than the out-of-the-box
>> language does.
>>
>> Mathematica is a powerful software package, but Wolfram himself is a
>> pretty despicable guy. He allegedly screwed over a few of his cofounders
>> early on in his company's career and took their credit and intellectual
>> property. The marketing for his company's products is hyped to the point
>> of nausea. And his views on mathematics are also borderline crank. His
>> book, "A New Kind of Science" greatly exaggerates the importance of his
>> work, makes pseudo-scientific claims, and has nothing close to the level of
>> rigor expected from either science or mathematics.
>>
>> On Sat, Oct 11, 2014 at 1:47 AM, eviljoel <eviljoel at linux.com> wrote:
>>
>>> Hey Doug,
>>>
>>> You can figure this out by referencing the excellent Javadocs.  The one
>>> for Integer is here:
>>>
>>> http://docs.oracle.com/javase/7/docs/api/java/lang/Integer.html
>>>
>>> As you can see, the parent of Integer is Number.
>>>
>>> Thanks,
>>> eviljoel
>>>
>>>
>>> On 10/10/2014 06:46 PM, Lewit, Douglas wrote:
>>> > Interesting stuff about rings.  I read somewhere on Google last night
>>> > that the whole matrix multiplication thing could be simplified if I
>>> > could find the superclass that contains the Integer class and Double
>>> > class, and then declare my array to belong to that particular object
>>> > type.  However, what is the superclass for Integers and Doubles?  I
>>> > haven't got a clue!  Remember guys that this is my second semester of
>>> > Java!  (But I've got a little more math under my belt than the average
>>> > bear--or average programmer. )
>>> >
>>> > Doug.
>>> >
>>> > On Fri, Oct 10, 2014 at 3:47 PM, Michael Maloney <tac at tac-tics.net
>>> > <mailto:tac at tac-tics.net>> wrote:
>>> >
>>> >     One last thing I wanted to say!
>>> >
>>> >     You mentioned that it's nice that Python allows ints and floats to
>>> >     be kept in the same list. I just wanted to issue a warning on that.
>>> >
>>> >     In 99.9% of situations, /keep your lists homogeneous/. That is,
>>> >     *don't* keep more than one type of data in them. (This goes for any
>>> >     sort of collection).
>>> >
>>> >     The basic operations on lists are essentially:
>>> >
>>> >     1) taking a list and mapping a function over it
>>> >     2) taking the list and filtering out items that don't interest
>>> >        you, and
>>> >     3) reducing or folding the list into a summary value
>>> >
>>> >     All three of these require that the items in the list share some
>>> >     common operations. For mapping, you need to supply a function that
>>> >     behaves for every element in the list. For filtering, you need to
>>> >     supply a test that makes sense for all elements. And for reduction,
>>> >     you need some binary operation that works with each element.
>>> >
>>> >     So while it may seem neat that you can store elements of different
>>> >     types, if you end up with, say, a list containing bools and ints
>>> >     together, the number of "shared operations" that work on these is
>>> >     much smaller than either a homogeneous list of bools or a
>>> >     homogeneous list of ints separately.
>>> >
>>> >     It /is/ possible in Python to do a run-time test of what type an
>>> >     object is (and then presumably do something intelligent based on
>>> >     the result). However, it is arguably bad coding practice. It's
>>> >     easier to reason about code that acts "uniformly" on its inputs,
>>> >     rather than dispatching based on type. Runtime type inspection is
>>> >     also slightly nuanced when dealing with subtyping, and in Python,
>>> >     it is limited in what it can tell you. (You can tell an integer is
>>> >     an integer, but given a function, you can't tell what inputs are
>>> >     valid nor what outputs to expect back from it).
>>> >
>>> >     In practice, mathematical and algorithmic code tends to be
>>> >     specialized to one data type. (And perhaps re-implemented
>>> >     separately for different types, say for single-precision floats
>>> >     then for double-precision floats). This is because performance is
>>> >     often paramount in these domains. But just realize that this kind
>>> >     of thing is done because it's necessary, not because it's a good
>>> >     thing to do.
>>> >
>>> >     If execution speed wasn't an issue, you would want to parametrize
>>> >     your matrix code by the data type. Java can do this (to some
>>> >     degree) with subtyping and generic types. (It's a non-issue for
>>> >     Python because of its lack of static typing).
>>> >
>>> >     At the risk of alienating some people, in mathematics, matrices are
>>> >     commonly done over some arbitrary ring. (A ring is a set of numbers
>>> >     which support addition, subtraction, and multiplication, but may or
>>> >     may not support division). In Java, you might consider defining an
>>> >     abstract base class called Ring with three methods: add, negate, and
>>> >     multiply. Then, for your matrix code, instead of working with
>>> >     int[][]'s or double[][]'s, you would have Ring[][]. Then, during
>>> >     matrix multiplication, any place you would use matrix1[i][j] *
>>> >     matrix2[j][k], you would instead write it as
>>> >     matrix1[i][j].multiply(matrix2[j][k]) (and analogously for
>>> >     addition).
>>> >
>>> >     Then, you could create an IntRing class where multiplication is
>>> >     just integer multiplication, etc, and a FloatRing with the
>>> >     operations for floats. I could also later decide to create a
>>> >     ComplexRing, and now, without any changes, my code (which I made
>>> >     into a library and published to GitHub) works for complex numbers.
>>> >     I could also create a ModularIntRing, where operations are taken
>>> >     mod some number for my (slow-ass, unverified) crypto
>>> >     implementations! I could even have really fancy classes like
>>> >     PolynomialRing or PowerSeriesRing, and now I can do a bit of
>>> >     symbolic mathematics. Or I might make my Matrix class itself a
>>> >     Ring, and now I can work with block matrices (with square blocks).
>>> >
>>> >     Of course, again, none of this matters in Python. Python trades
>>> >     away any sort of static guarantees about your program for an
>>> >     incredible amount of flexibility. The Java version, on the other
>>> >     hand, makes you jump through more hoops up front, but it will also
>>> >     catch more errors at compile time.
>>> >
>>> >     I think I got carried away. If any of this doesn't make sense, just
>>> >     ignore it!
>>> >
>>> >
>>> >     On Fri, Oct 10, 2014 at 3:11 PM, Michael Maloney <tac at tac-tics.net
>>> >     <mailto:tac at tac-tics.net>> wrote:
>>> >
>>> >         @Philip, I believe Douglas is taking a class, so unfortunately,
>>> >         Clojure is probably not an option. (Although I encourage anyone
>>> >         to look at Clojure).
>>> >
>>> >         A few comments on the code:
>>> >
>>> >         Watch your alignment. On line 11, for example, the block inside
>>> >         the main method lines up with the declaration. You want to tab
>>> >         it. Even though Java doesn't enforce indentation, you should
>>> >         pretend it does. On line 18 and other places, you've tabbed the
>>> >         curly brace. This is a relatively stylistic choice. (I think C
>>> >         programmers use it still, though?) Java's official style guide
>>> >         says opening curly braces should come at the end of the same
>>> >         line, closing curly braces should line up with the if/for/while
>>> >         statement or method declaration that opened it:
>>> >
>>> >         while (...) {
>>> >             // ...
>>> >             // ...
>>> >             // ...
>>> >         }
>>> >
>>> >         At 101 lines of code, your main method is excruciatingly long.
>>> >         Most methods you write should be between 1 and ~8 lines long.
>>> >         Highly algorithmic code (say, an implementation of a mergesort)
>>> >         might be around 30 lines long. The main method of a script
>>> >         might be that long too in some cases. But 101 is enough to
>>> >         exhaust anyone's attention. As I mention in the other email I
>>> >         sent, you shouldn't need to double-space all of your code. And
>>> >         you should consider splitting the main method up into separate
>>> >         smaller "helper" methods.
>>> >
>>> >         A good way to do this might be to break out these pieces: the
>>> >         code to read the user's dimension input (~18 lines), the user's
>>> >         choice of data type (~10 lines), the user's entry input (~20
>>> >         lines), and the output code (~30 lines).
>>> >
>>> >         Even in cases where splitting a method up into separate pieces
>>> >         doesn't decrease the total line count of your program, it often
>>> >         helps your code's readability considerably. Especially if you
>>> >         choose your method names carefully, it's easier to glance at
>>> >         the function call and /guess/ what it should be doing, without
>>> >         having to be presented with the gory details.
>>> >
>>> >         I know none of this addresses your question directly, but
>>> >         often, having your code more organized will help you isolate
>>> >         the errors you run into. Rearranging your code to make it more
>>> >         understandable is what we call refactoring. I don't know
>>> >         about the rest of the community, but I've always found it very
>>> >         relaxing. Like trimming a bonsai tree or raking the sand in a
>>> >         zen garden :)
>>> >
>>> >
>>> >         On Fri, Oct 10, 2014 at 1:21 PM, Philip Doctor
>>> >         <diomedestydeus at gmail.com <mailto:diomedestydeus at gmail.com>>
>>> >         wrote:
>>> >
>>> >
>>> >             For console output I'd recommend checking the docs on how
>>> >             to pad numbers into columns
>>> >             (http://docs.oracle.com/javase/tutorial/java/data/numberformat.html).
>>> >             If this is not a homework assignment and you're allowed to
>>> >             use 3rd party libraries I've never used it but I heard good
>>> >             things about https://code.google.com/p/j-text-utils/ .
>>> >
>>> >             Of course if you're not doing it for a class I'm not
>>> >             totally sure why you would reinvent the wheel on matrix
>>> >             multiplication as there's tons of good math libraries out
>>> >             there for java that will do this:
>>> >
>>> >             http://commons.apache.org/proper/commons-math/
>>> >             http://math.nist.gov/javanumerics/jama/
>>> >             https://code.google.com/p/efficient-java-matrix-library/
>>> >
>>> >             (a dozen more if you google it)
>>> >
>>> >             Best of luck (p.s. if jvm is a requirement but java isn't,
>>> >             I'm going to fan-boy plug clojure as a language you might
>>> >             enjoy more given your statements about python).
>>> >
>>> >             /off-topic
>>> >
>>> >
>>> >             On Fri, Oct 10, 2014 at 12:43 PM, Lewit, Douglas
>>> >             <d-lewit at neiu.edu <mailto:d-lewit at neiu.edu>> wrote:
>>> >
>>> >                 This is probably the wrong forum for this, but I
>>> >                 thought I would give it a try because the people at my
>>> >                 university cannot always be counted on for good
>>> >                 feedback.
>>> >
>>> >                 I wrote this Java program that multiplies two matrices.
>>> >                 I think it's basically pretty good.  (And doing this in
>>> >                 Python is WAY EASIER because Python doesn't distinguish
>>> >                 between lists of ints and lists of doubles, and
>>> >                 actually allows both data types to get combined in the
>>> >                 same list. )
>>> >
>>> >                 However, I'm having some issues with the formatted
>>> >                 output of my "float" matrices.  They are technically
>>> >                 doubles, but in the program I refer to them as floating
>>> >                 point values for the sake of clarity because some users
>>> >                 of the program may not know what a double is.  Is that
>>> >                 like a double martini?  : )
>>> >
>>> >                 I'm trying to get all of my numbers lined up properly
>>> >                 in their respective columns, but it's just not working
>>> >                 out that way, even with the *printf* command.....???
>>> >
>>> >                 If anyone can offer some good suggestions about good
>>> >                 formatting, that would be great.  I would really
>>> >                 appreciate it.
>>> >
>>> >                 By the way, I did the same thing in Python and it took
>>> >                 less than half as much code!  The Python code was short
>>> >                 and to the point.  I guess Java has its uses, but for
>>> >                 some things it is really tedious and overly
>>> >                 complicated.  Ah well.... but then again Java
>>> >                 developers make really good money, so I guess I'll have
>>> >                 to study both Java AND Python!
>>> >
>>> >                 Take care and thanks for the feedback.
>>> >
>>> >                 Best,
>>> >
>>> >                 Douglas Lewit
>>> >
>>> >                 _______________________________________________
>>> >                 Chicago mailing list
>>> >                 Chicago at python.org <mailto:Chicago at python.org>
>>> >                 https://mail.python.org/mailman/listinfo/chicago