Why isn't Python king of the hill?

Sat Jun 2 06:51:44 EDT 2001

Martijn writes, sometimes quoting me:

> Ah, a powerful query system, then. Difficult, too. Could you give
> a real life example of an actual arbitrary retrieval system? It's
> not full-text-search search, right? (the ZCatalog I believe already
> can do something like that, on multiple attributes). 

Arbitrary and full-text are more or less identical. I can't speak
competently about ZCatalog. If it's got it, that's great -- but
I'm not sure what it is exactly, or how effective or efficient it
is to use that on a collection of large text. :)

> > I can't speak to how easy it is to move from one app server to
> > another. I /can/ speak for how much of the "hard work" is done
> > for you by some of these enterprise containers -- a whole lot!
> 
> Could you go into some more detail on this? I've heard this 
> more often,
> but I'm still not entirely clear on *what* work is done. :)

I think the meat of the response lies in answering this question.
I'm going to try to answer it first with regard to JSP and
servlets, and I'm going to mix in some of the J2EE stuff near the
end. I'm nowhere near as proficient with J2EE as I am with the
other two.

Meanwhile, others will I'm sure jump in and correct me where I'm
wrong. Or I /hope/ they will, because I'm sure to be wrong. :)

First, one framing quote:

> I'm not interested in promoting a standard, I'm just interested in 
> stealing ideas for Python-based APIs. I'm not interested in 
> standardizing these APIs. :)

That frames my first issue.

-----
STANDARDIZATION
-----
I think that's a big part of what's great about these APIs.
They're standard! It's not just a question of whether you can
find a Java programmer or whether you can find a Python
programmer. The languages are not really a significant factor, so
much as the toolsets.

Finding Zope programmers is /hard/. Finding someone who's had
experience with WebSphere or WebLogic or one of the other servlet
containers is /easy/. The APIs are standard. Maybe they won't
develop their app in exactly the same way you developed yours,
but the odds are pretty good they'll be similar -- they're based
on the same APIs, making use of the same mechanisms.

Is it harder to separate the wheat from the chaff when selecting
these programmers? Maybe. But it's easier to let go of someone
you have, than to hold on to someone you don't.

I hafta grant that standardization is in some ways problematic.
Bad standards can bestow ruin as easily as good ones can bestow
greatness. (Just look at C++! <wink>)

The Java servlet and JSP standards are pretty good, and I think
the J2EE stuff is pretty good, too. The standards are evolving in
a fairly nondestructive manner, and addressing some of the issues
left outstanding.

-----
REQUEST-CENTERED FRAMEWORK
-----

When I'm developing under Zope, I start with a page and work
back from that. It's possible to determine from the request which
bit of content to serve up -- you either make a conditional page,
or you stick a bit of code at the top of the page that does
redirection based on the contents of the request object.

That's reasonable, and it makes sense to people who have done
lots of server-side scripting. I think PHP and ASP kinda
encourage this sort of coding. You can do it with the Java
products too -- you just use JSP to develop your page content,
and bang-presto, you've got a page-centered application.

I don't want a page-centered application; in fact, I would go so
far as to say that page centering breaks the MVC (model / view /
controller) paradigm that I aspire toward in my software designs.
The way Zope is set up, it seems to me that it encourages a
design that embeds controller elements in the view.

You can hack around that; that's true. But it's a hack, and it's
not the "natural" way to program. Zope is set up to support (and
support well!) content with dynamic elements.

With a pure JSP site, you're still going to be hacking. You can
do some things to clean up the hack and make it less crude, but
JSP is like PHP or Zope: it's page-centered.

When you add servlets to the mix, though, you start eking your
way into deeper potential. Instead of requests being aimed at
your view components, requests are aimed at a servlet. The
servlet is invoked via routines that are designed to handle a
request. The objects associated with the request are a little
more "primitive" than the objects you pull out of Zope, but they
are handled in a fairly elegant fashion.

Having the servlet in charge of processing the request puts the
control into the controller, if you will. It makes it easy for
the app to center around the controller logic, without relying on
the view logic to hand over the reins.

Model/View/Controller separation is a topic of some discussion in
servlet circles. I think the jury's still out on the best way to
do it. I think that one of the more common solutions uses JSP as
the view components, servlets as the controller components, and
beans as the model components, with various and sundry utilities,
factories, and other classes thrown in to stitch it all together.
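
To make that division of labor concrete, here's a minimal sketch, written in Python rather than Java since the point of this thread is stealing ideas for Python. Every name in it (`Request`, `Employee`, `render_employee`, `controller`, the `EMPLOYEES` store) is made up for illustration; none of it is the servlet API. The thing to notice is that the request lands on the controller, which picks the model and only then hands off to a view.

```python
from dataclasses import dataclass, field


@dataclass
class Request:
    """Stand-in for a servlet request: form parameters plus attributes."""
    path: str
    params: dict = field(default_factory=dict)
    attributes: dict = field(default_factory=dict)


class Employee:
    """Model: a plain 'bean' with an accessor and a mutator."""
    def __init__(self, name):
        self._name = name

    def get_name(self):
        return self._name

    def set_name(self, name):
        self._name = name


def render_employee(request):
    """View: only formats what the controller left in the request."""
    employee = request.attributes["employee"]
    return "<h1>%s</h1>" % employee.get_name()


def render_not_found(request):
    """View: fallback page for anything the controller can't place."""
    return "<h1>No such page: %s</h1>" % request.path


# Hypothetical in-memory model store, standing in for a database.
EMPLOYEES = {"42": Employee("Pat")}


def controller(request):
    """Controller: inspects the request, picks a model, then picks a view."""
    if request.path == "/employee":
        employee = EMPLOYEES.get(request.params.get("id"))
        if employee is not None:
            request.attributes["employee"] = employee
            return render_employee(request)
    return render_not_found(request)
```

The views never look at `request.params`; they only see what the controller chose to put in `request.attributes`, which is the separation I'm after.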

Do I need to define these? JSP is a lot like PHP or DTML -- you
write Java code in the page. Servlets have access to the request
directly, as well as repositories for session, request, and
application state. Beans are basically just any old object, but
with properly-named accessors and mutators for all public
attributes. When you move from a simple servlet container to an
enterprise servlet app, Enterprise Java Beans effectively take
the place of the Beans (though you can still use simple beans
locally). Let's talk about these components and their features in
a little more depth.

-----
JSP FEATURES
-----

Like I said, JSP is much like DTML or PHP in appearance. JSP has
a few "directives" that you can use to apply general processing
rules to your page. Of particular interest:

Your JSP pages get compiled into a servlet. I think that makes
them pretty fast when compared with something that does page
interpretation at request time.

Your JSP pages have access to at least four levels of state
(there might be another I'm forgetting): application, session,
request, and page. The first three correspond to the same state
repositories managed by the servlet. The 'page' context is for
local objects.
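
One way to picture the four levels is as nested namespaces searched from narrowest to widest. The sketch below is Python, not JSP, and the data in it is invented. As far as I know, JSP's own `PageContext.findAttribute` walks the scopes in this same order (page, request, session, application), but treat that as my understanding rather than gospel.

```python
from collections import ChainMap

# Hypothetical contents for each scope, widest first.
application = {"site_name": "Example"}              # shared by every user
session = {"user": "pat"}                           # one per visitor
request = {"user": "guest", "item": "widget"}       # one per request
page = {"item": "gadget"}                           # local to one page

# ChainMap searches its maps left to right, so the narrowest scope wins.
scopes = ChainMap(page, request, session, application)


def find_attribute(name):
    """Return the value from the narrowest scope that defines name, else None."""
    return scopes.get(name)
```

Here `find_attribute("item")` comes from the page scope, `find_attribute("user")` from the request scope (shadowing the session's value), and `find_attribute("site_name")` falls all the way through to the application scope.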

You can forward control to other servlets or JSP pages, and you
can also include pages. I believe (though I'm not sure I've
tested this) that a page you forward to gets a fresh "page"
context, while an include uses the current page context. Also,
forward is terminal, while an include will return control after
processing the child. Not too revolutionary here, but the
features are present.
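
The difference in control flow can be sketched like so. This is a toy model in Python, with every name (`Response`, `Forwarded`, `include`, `forward`, `dispatch`) invented for the purpose. I've modeled "forward is terminal" with an exception, which is just a convenient way to show it and not a claim about how any real container does it.

```python
class Response:
    """Collects output so we can see what finally gets sent."""
    def __init__(self):
        self.chunks = []

    def write(self, text):
        self.chunks.append(text)

    def body(self):
        return "".join(self.chunks)


class Forwarded(Exception):
    """Raised to abandon the current handler; control never comes back."""
    def __init__(self, target):
        super().__init__(target)
        self.target = target


def include(target, response):
    """Run the child handler, then return control to the caller."""
    target(response)


def forward(target):
    """Hand the request over for good."""
    raise Forwarded(target)


def dispatch(handler):
    """Run a handler, following forwards until one finishes normally."""
    while True:
        response = Response()   # a forward discards anything already buffered
        try:
            handler(response)
            return response.body()
        except Forwarded as hop:
            handler = hop.target


def footer(response):
    response.write("[footer]")


def error_page(response):
    response.write("error")


def normal_page(response):
    response.write("body ")
    include(footer, response)   # child runs, then we carry on below
    response.write(" done")


def guarded_page(response):
    response.write("partial output ")
    forward(error_page)         # nothing after this line runs
    response.write("never reached")
```

Dispatching `normal_page` yields the page's own output wrapped around the included footer, while dispatching `guarded_page` yields only what `error_page` wrote.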

You can embed Java. In fact, that's the way you do your
conditional display, and almost everything else interesting. This
can look kinda goofy in practice, but it does increase ease of
use.

You can associate a bean with a page, and map form properties
onto the bean's attributes. I think that this is most useful when
you're programming pure JSP, but it's still pretty useful when
you're not.
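
Here's roughly what that form-to-bean mapping amounts to, sketched in Python. `SignupBean` and `populate` are hypothetical names of mine. In JSP the equivalent move is, I believe, `<jsp:setProperty>` with `property="*"`, which matches request parameters to bean mutators by name; the sketch only imitates the idea.

```python
class SignupBean:
    """A 'bean': private state reachable only through get_/set_ methods."""
    def __init__(self):
        self._name = ""
        self._email = ""

    def get_name(self):
        return self._name

    def set_name(self, value):
        self._name = value

    def get_email(self):
        return self._email

    def set_email(self, value):
        self._email = value


def populate(bean, params):
    """Call set_<key>(value) for each parameter the bean has a mutator for.

    Returns the list of parameter names that were applied; parameters with
    no matching mutator are silently skipped.
    """
    applied = []
    for key, value in params.items():
        setter = getattr(bean, "set_" + key, None)
        if callable(setter):
            setter(value)
            applied.append(key)
    return applied
```

Skipping unknown parameters is a design choice in this sketch: a form can carry fields (submit buttons, hidden tokens) that the bean has no business knowing about.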

You can create "custom tag libraries" for a more abstract and
less "page-centered" approach to reusable page content. Custom
tag libraries are more Java-centric than markup-centric.

-----
SERVLET FEATURES
-----

Most of what you get out of servlets is also available indirectly
in JSP if you jump through the right hoops. For pieces of the
application that don't need to display, though, there's not much
sense in hopping through the hoops -- it can be downright
annoying to need to. That's what makes servlets "natural" for the
controller component.

I've discussed session management before. Sounds like they're
doing some of that in newer Zope? That's good. It's a very hard
thing to do well. I would be concerned about ZEO and ZODB being
used in their unadulterated form to handle session management for
a high traffic site. It just seems like there'd be too much I/O
for ZODB/ZEO to handle it efficiently.

Servlet containers implement session management for you. The
servlet containers are presently not obliged to make these
sessions either persistent or distributable, but some of them do,
including at least one of the freely-available containers.
Persistent, distributable sessions provide good load balancing
and good failover.
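
For a sense of what the container is doing for you, here's the easy part of session management as a Python sketch. `SessionStore` and its methods are invented names. It lives in one process's memory, so it is neither persistent nor distributable; that's exactly the hard part the sketch doesn't attempt.

```python
import time
import uuid


class SessionStore:
    """In-memory sessions keyed by an opaque ID, expiring after inactivity."""

    def __init__(self, timeout_seconds=1800, clock=time.time):
        self._timeout = timeout_seconds
        self._clock = clock          # injectable so expiry can be tested
        self._sessions = {}          # session id -> (last access time, data)

    def create(self):
        """Start a new session and return its ID (to be sent as a cookie)."""
        session_id = uuid.uuid4().hex
        self._sessions[session_id] = (self._clock(), {})
        return session_id

    def get(self, session_id):
        """Return the session's data dict, or None if unknown or expired."""
        entry = self._sessions.get(session_id)
        if entry is None:
            return None
        last_access, data = entry
        now = self._clock()
        if now - last_access > self._timeout:
            del self._sessions[session_id]
            return None
        # Each successful access restarts the inactivity timer.
        self._sessions[session_id] = (now, data)
        return data
```

Once a second server enters the picture, every one of those dictionary operations has to become a shared, failure-tolerant operation, which is where the containers earn their keep.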

Servlets get a request and a response object passed to them.
Generally you manipulate both. The request object has a list of
parameters and also a list of attributes. The former are
generally the form values while the latter is where you might
store state to pass it between servlets or between a servlet and
a JSP page.

Servlets can dispatch a request using either a forward or an
include, just like in JSP.

The container's handling of a servlet (or entire web app, which
is generally a bunch of servlets and lots of supporting pages) is
controlled by the "deployment descriptor", a file called web.xml.
This file has a standard, mandated format. It gives the developer
a great deal of control over how his or her servlet will be
instantiated, and how it will be managed once in memory.

Once written, a web application can be packed up into a .war file
(analogous to a .jar file) for easy deployment. The first time
the app is accessed, the .war file is unpacked. That makes
deployment remarkably (almost disgustingly) simple.

-----
ENTERPRISE JAVA BEANS
-----

The 'enterprise' in the J2EE stuff basically means "designed for
large and diverse environments," I think. When the word was first
used in the phrase "enterprise networking" it usually referred to
the challenges involved in getting Macs, PCs, and other hardware
all talking on the same LAN. These days, that's a non-problem.
Now enterprise generally seems to refer to anything of large
scope, and particularly applications distributed across multiple
pieces of hardware. But I think the "what the hell does that
mean" camp may have a point here. :)

Adding EJB entails a few sacrifices and a few big gains. Sticking
your servlets into the J2EE "framework" means that you hafta put
your "application" level state into an EJB or a database (or at
least accept that anything you stick into the application
storage can't be counted on to persist any longer than the
request). You really need to make anything you put into the
session area serializable (though there are a couple exceptions
to that rule). There are a couple other rules that go along with
that, but I can't put my finger on them off the top of my head.
All the sacrifices are really pretty minimal.

The big gains: you get the distribution and persistence of
sessions. That's a hard thing to do well. Your servlets load
balance across all the servers that are running them. That's a
pretty big win, too -- that's also pretty hard to do in a
reliable fashion. And you get the EJB stuff, which means your
backend model objects pretty much distribute themselves with no
real additional design effort -- the binding mechanism is all
there, the publication mechanism, the lookup mechanism. It's all
there.

The J2EE stuff makes deployment still one more step simpler, too.
It pushes parameters that might change during runtime out one
notch further, so that deployment configuration can be separated
from the details of servlet instantiation, URI mappings, and
other miscellany concerning the interaction of the app and the
app server.

There's lots more to the J2EE stuff than what I've identified
here. For where I'm at and the kinds of projects I'm thinking
about, it's enough to use a servlets + JSP + simple beans
architecture. The kinds of projects I'm thinking about and
drafting start small -- start out living on a single server. But
by following a half-dozen simple design principles (most of which
you'd pretty much follow anyway), moving a web app from servlets
and JSPs into a full J2EE app is a (fairly) straightforward
affair. I suspect (though I haven't had the chance to put it to
the test yet) that an app set up to use beans heavily could have
pieces of the model re-written to capitalize on the J2EE features
without touching any of the controller or view code.

-----
JAVA DATA OBJECTS
-----
JDO is still in development, so it's maybe not fair of me to talk
about it. But since a good proportion of the Python/Zope
technologies that we're trotting out for comparison are still in
development, I figure I might as well. Like I said once before,
maybe we can beat Java to the punch here.

JDO is a way of mapping an object graph onto a database. I
haven't read the draft of the standard nearly as closely as I
should -- I'm the sort that waits for an implementation to play
with before learning the technology, because that's the way I
learn best. However, I'm told that it bears an uncanny
resemblance to the Enterprise Objects Framework from Apple (and
NeXT before that).

The EOF I've used, and the EOF I can comment on with a fair
amount of accuracy <wink>. The way EOF works, in its most potent
form, is thus: you develop a class with all its accessors and
mutators, plus all the business logic you might want to include.
You map some or all of its data fields to columns in your
database in what's called an entity description. You include
things like relationships to other entities -- does my Employee
entity have a one-to-one relationship with a Boss entity? Model it.
Does my Boss entity have a one-to-many relationship with Employee
entities? Model it.
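
A stripped-down picture of an entity description, in Python. The dictionary layout, the column names, and the helper functions are all assumptions of mine for illustration; real EOF models are far richer (relationships, types, keys) and aren't written this way.

```python
class Employee:
    """Business object; the mapping below decides what gets stored where."""
    def __init__(self):
        self.name = None
        self.salary = None


# Hypothetical entity description: which class, which table, and which
# attribute maps to which column.
EMPLOYEE_ENTITY = {
    "class": Employee,
    "table": "EMPLOYEE",
    "attributes": {"name": "EMP_NAME", "salary": "EMP_SALARY"},
}


def select_statement(entity):
    """Build the SELECT that fetches just the mapped columns."""
    columns = ", ".join(entity["attributes"].values())
    return "SELECT %s FROM %s" % (columns, entity["table"])


def object_from_row(entity, row):
    """Instantiate the entity's class and fill it from a row (a dict)."""
    obj = entity["class"]()
    for attribute, column in entity["attributes"].items():
        setattr(obj, attribute, row[column])
    return obj
```

The class itself knows nothing about tables or columns; all of that lives in the description, which is what lets you map "some or all" of the fields.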

Fetches can be performed by key, or by selector. Selectors can
pick out individual objects based on the properties of the
objects, or through elaborate path constructions that might trace
through a long series of relationships. An example: you might
fetch all Bosses who have an employee who has a spouse who has a
birthday this month. The selector might look something like:

    employees.spouse.birthday.month == now.month
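
To show what evaluating a path like that involves, here's a sketch in Python over plain in-memory objects. `values_at` and `select` are names I made up, and the sample data is invented. A real EOF would turn the path into a database query with joins rather than walking objects; this only illustrates the semantics of fanning out across to-many relationships.

```python
from datetime import date
from types import SimpleNamespace as Record


def values_at(obj, path):
    """Follow a dotted path, fanning out across list-valued relationships."""
    current = [obj]
    for key in path.split("."):
        following = []
        for item in current:
            value = getattr(item, key)
            if isinstance(value, (list, tuple)):
                following.extend(value)      # to-many: follow every target
            elif value is not None:
                following.append(value)      # to-one: follow it if present
        current = following
    return current


def select(objects, path, predicate):
    """Keep objects for which any value reached via path satisfies predicate."""
    return [o for o in objects if any(predicate(v) for v in values_at(o, path))]


# Invented sample data: two bosses, their employees, and the spouses.
ann = Record(name="Ann", employees=[
    Record(spouse=Record(birthday=date(1970, 6, 12))),
    Record(spouse=None),
])
bob = Record(name="Bob", employees=[
    Record(spouse=Record(birthday=date(1968, 1, 3))),
])

june_bosses = select([ann, bob], "employees.spouse.birthday.month",
                     lambda month: month == 6)
```

With this data, `june_bosses` contains only Ann, since she is the one boss with an employee whose spouse has a June birthday.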

When you fetch these objects from the database, the EOF puts
"Faults" in the place of relationships. When you first try to
read from a fault object, it runs off and fetches the object from
the database.
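
The fault idea is easy to imitate with a lazy proxy. The Python sketch below uses hypothetical names (`Fault`, `fetch_boss`, the `FETCHES` log) and shows the general trick only; I'm not claiming this is how EOF implements faults internally.

```python
class Fault:
    """Placeholder that fetches the real object the first time it's read."""

    def __init__(self, fetch):
        self._fetch = fetch      # zero-argument callable that loads the object
        self._target = None
        self._fired = False

    def __getattr__(self, name):
        # Python calls this only for names not found on the Fault itself,
        # which in practice means attributes of the real object.
        if not self._fired:
            self._target = self._fetch()
            self._fired = True
        return getattr(self._target, name)


class Boss:
    def __init__(self, name):
        self.name = name


class Employee:
    def __init__(self, name, boss):
        self.name = name
        self.boss = boss


FETCHES = []   # records each simulated database round trip


def fetch_boss():
    FETCHES.append("boss")   # stands in for a real query
    return Boss("Ann")


# Building the employee costs nothing; the boss is still just a fault.
pat = Employee("Pat", boss=Fault(fetch_boss))
```

Nothing is fetched when `pat` is created. The first read of `pat.boss.name` triggers one fetch, and later reads reuse the loaded object.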

But that's not all! EOF also provides sophisticated caching and
data source controls. EOF also provides multiple 'editing
contexts' with undo capabilities and the ability to nest editing
contexts so that changes can be made within the context of other
changes, and committed incrementally, or abandoned en masse.
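
Nested editing contexts can be sketched as layers of pending changes, each committing into the layer above it. This is a Python toy with invented names (`EditingContext`, `commit`, `revert`) operating on a plain dict as the "store"; it leaves out undo stacks, object identity, and everything else that makes the real thing sophisticated.

```python
class EditingContext:
    """Holds pending changes; commits into its parent, or into a store."""

    def __init__(self, parent=None, store=None):
        if (parent is None) == (store is None):
            raise ValueError("give exactly one of parent or store")
        self._parent = parent
        self._store = store
        self._pending = {}

    def get(self, key):
        """Read through pending changes, then the parent chain, then the store."""
        if key in self._pending:
            return self._pending[key]
        if self._parent is not None:
            return self._parent.get(key)
        return self._store.get(key)

    def set(self, key, value):
        """Record a change locally; nothing outside this context sees it yet."""
        self._pending[key] = value

    def commit(self):
        """Push pending changes one level up, then clear them."""
        if self._parent is not None:
            self._parent._pending.update(self._pending)
        else:
            self._store.update(self._pending)
        self._pending = {}

    def revert(self):
        """Abandon everything pending in this context."""
        self._pending = {}
```

A child's `commit` only reaches its parent, so a whole batch of nested edits can still be thrown away with one `revert` on the parent before anything touches the store.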

EOF is a powerful tool for management of object graphs and
mapping the object graph onto the database. If even a fraction of
the power EOF offers developers comes through in JDO, that will
be a huge boon to anyone who works with a sophisticated dynamic
data model.

-----
AND HERE IT IS AGAIN...
-----

The best part of it all is that all the things outlined above are
STANDARD. They're part of the API. When programmers learn
WebLogic or WebSphere or Tomcat or Enhydra or whatever else, they
learn to develop to the servlet API. JSP is an extension of the
servlet API; it's a five-minute lesson taught to a servlet
programmer, or a gentle introduction to an HTML hacker.

Sure, your webapp may do some funky things with the technology,
and the way you do MVC might be different, and maybe you've got a
toolkit or two that you've integrated or a different templating
engine. But the core of the architecture is the same, and the
application is as simple or as complex as you make it from there.

I choose to keep it simple, and I choose to write the code so
that it's easily extended toward J2EE in case I need that kind of
robustness. For now, a single server running a servlet container
is good enough, and in the immediate future, that breaks out
quite nicely into a database server and maybe Enhydra on the
front end running on a couple boxes. If the load continues to
grow, though, I may need to go to a full-fledged J2EE model, and
pull the backend objects out onto their own machines so that the
servlet machines only hafta worry about processing requests. I've
got lots of room to grow, and the upgrade path is very clear.

Meanwhile, the APIs required are standardized and well-known.
I'm pretty confident that someone who's developed web apps under
the servlet API could grasp my architecture and start
contributing within a day, two at the worst. I'm pretty confident
that someone who knows a bit of Java could climb on board inside
a week and be adding significant features.

What's more, I'm pretty confident that if I were on another
project when the customer needed work done, my attentions might
not even be necessary.

I will note, in closing, that recent days have been something of
a reawakening of faith in the darkened halls of my soul. Sunlight
has begun to break through the clouds as I consider Python all
over again. I have enjoyed this discussion a great deal, and
enjoyed most of all that it has stirred my humility some -- I
haven't seen all there is to see in the Python world, and much of
what I haven't seen is quite promising.

Thanks! :)
--G.