[Edu-sig] PySqueak issues: image storage
Paul D. Fernhout
pdfernhout at kurtz-fernhout.com
Thu May 4 14:37:26 CEST 2006
In trying to further think through what would be involved in supporting
Squeak-like (or just Smalltalk-like) capabilities for Python in
constructivist education, on reflection, I think the single biggest issue
is that of the Squeak/Smalltalk "image". There are lots of other issues,
but I am now thinking those are more easily solvable by just programming,
often using existing Python libraries, than this one, which is more
reflective of deeper issues.
For those of you unfamiliar with the notion of a Smalltalk "image", it is
essentially this: you can pick "save image" from your running Squeak
environment and the whole running VM's object memory is saved to one file.
Then when you start up Squeak and specify your image file, everything is
back the way you had it when you saved (in theory) -- windows, open files,
open socket connections, everything. In practice, open socket connections
and open files sometimes can't be reopened (the port may be in use for a
server socket, or a server may be down for a client socket), files may
have been deleted or locked and so can't be reopened, a database may have
changed, and so on. (And making this work under the covers can be
non-trivial.) But, at least, the system tries to put everything back the
way it was, and it almost always does a perfect job for plain GUIs. What's
more, you can bring this file to any machine on any OS and processor that
runs a fairly similar version of Squeak (like moving Mac/PPC to
GNU/Linux/AMD64), and just start it up and you are back where you saved.
And here is the key point: you get all this for "free" when you work with
Squeak. You generally don't have to use a "pickle" library yourself, or
write object saving and loading code for your windows, or do anything like
that. It just works.
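For contrast, here is a minimal sketch of what doing it by hand tends to look like in Python, using the standard pickle module. The state dictionary and the file name are made up for illustration; a real application would also have to decide what belongs in the saved state in the first place.

```python
import os
import pickle
import tempfile

def save_state(state, path):
    """Write the whole state object to one binary file."""
    with open(path, "wb") as f:
        pickle.dump(state, f)

def load_state(path):
    """Read the state object back from the file."""
    with open(path, "rb") as f:
        return pickle.load(f)

# Hypothetical learner project: one window and a counter.
state = {
    "windows": [{"title": "Simulation", "x": 40, "y": 60}],
    "counter": 3,
}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "project.pickle")
    save_state(state, path)
    restored = load_state(path)

print(restored == state)
```

Even in this tiny case the application author has to write and call the save and load functions; with an image, neither step exists.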
Why is the image so important for the novice user in a constructivist
educational setting? If a learner is learning by building, then it stands
to reason that tomorrow they would like to learn by building on top of
what they have already built more often than they would want to start from
scratch. Starting from scratch can be a good learning experience, no
doubt, but it gets old if you do it over and over again. So, when a
learner can save their image and reload it tomorrow, they are exactly
where they left off. So, if they are in the midst of building a simulation
by dragging graphical widgets around, then they just save it, and tomorrow
there it is. And here is the key: they never had to write any saving or
loading code to do this. And neither did the author of the underlying
package they are using. This is a big thing IMHO. It lets the learner
focus more on what they are doing without the distraction of "how can I
save and load this".
Now, in practice, Squeak does save and load things outside the image. It
uses files to store code, communicates over sockets, uses databases, and
so on. In some ways, using an external file to save, say, your email seems
much safer than storing it just in an image (which is a binary file and
could easily get corrupted). To do that, an application has to use some
explicit form of representing objects as text and reading them back in
itself. If you want to share a small part of your image, like just the
simulation you wrote, then it makes sense to be able to export that part
(preferably as text) and share it (most Smalltalks have some support for
this equivalent to pickle or xml object writing). So the image isn't
everything. But for the basics of where windows are, what widgets are in
them, and what dynamic objects exist in the environment, it does a great job.
So, where is the Python PySqueak problem? The graphics widget set is the
biggest issue (sockets, files, databases, etc. are hard, but simpler).
Smalltalks are designed from the ground up to do this. In Squeak's case,
all the widgets are native to Squeak (drawn by Smalltalk code rather than
by the host OS), so their state is defined purely by Smalltalk
objects. In other Smalltalks that use native OS widgets, the issues are
thought through at the beginning on how the windows will be created and
then recreated on reloading an image and that part is usually hidden from
the user. And how to figure out what native GUI widgets are up on the
screen and how to save that and reload it, all transparently to the user,
is a non-trivial thing, and it varies from widget set to widget set. Since
Python wants to be platform agnostic, and since there are a lot of widget
sets out there (x, wx, tk, swt, swing, gtk, qt, mac flavors, mozilla and
other web browser widgets, etc.), plus a lot of code written for them,
that means a bit of a problem. I think in theory one could write widget
set savers and reloaders that know enough about a specific widget set to
walk the tree of displayed widgets and rebuild it. But, that is a lot of
work that needs to be done for each widget set. And then, ideally, that
all needs to work with existing widget-using code not written to use a
well designed library that hides these reloading problems.
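A toolkit-independent sketch of the tree-walking idea follows. The Widget class is a hypothetical stand-in, not any real toolkit's API; an actual saver would have to query each widget set in its own way, and that is where the per-toolkit work comes in.

```python
class Widget:
    """Hypothetical stand-in for a toolkit widget (not a real toolkit API)."""

    def __init__(self, kind, **options):
        self.kind = kind
        self.options = options
        self.children = []

    def add(self, child):
        self.children.append(child)
        return child

def snapshot(widget):
    """Walk the widget tree and describe it with plain dicts and lists."""
    return {
        "kind": widget.kind,
        "options": dict(widget.options),
        "children": [snapshot(child) for child in widget.children],
    }

def rebuild(description):
    """Recreate a widget tree from a description made by snapshot()."""
    widget = Widget(description["kind"], **description["options"])
    for child_description in description["children"]:
        widget.add(rebuild(child_description))
    return widget

# Build a small tree, save its description, and rebuild it.
window = Widget("window", title="Simulation", width=400, height=300)
panel = window.add(Widget("panel"))
panel.add(Widget("button", text="Run", x=10, y=10))
panel.add(Widget("slider", value=0.5, x=10, y=50))

saved = snapshot(window)
restored = rebuild(saved)
print(snapshot(restored) == saved)
```

The description holds only plain data, so it could be handed to pickle or written out as text. What it does not capture is anything the toolkit keeps outside the widget objects, such as event bindings or focus, which is part of why this is hard for real widget sets.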
So, I think that would be the biggest issue to solve -- giving Python an
image capacity. It doesn't have to be solved because one could always
continue with the Python assumption that applications will be written to
save and load their state. One could also build that into specific
educational constructivist widget sets like an eToys clone. But it remains
a mismatch in philosophy.
This isn't a plug for C++, but consider:
http://www.whysmalltalk.com/quotes/index.htm
"Smalltalk is the best Smalltalk around" [on using C++ to code dynamic
language idioms more appropriately done in Lisp or Smalltalk] - Bjarne
Stroustrup
Which is meant more or less as a reminder that languages have things they
do well and not so well, and philosophies and communities built around
them. So, while it is possible to ape Squeak Smalltalk in Python
(including an image), is that worth doing? If you want everything
Smalltalk (or Squeak) has to offer, then you can use Smalltalk (or Squeak)
and live with other limitations (its license, stability, and world view).
I'm not sure there is that much value in reinventing that wheel. And it's
also probably much easier to just put a Python parser into Squeak than
reinvent Squeak on Python. So I continue to think what is interesting (and
challenging) about the notion of a PySqueak is to try to understand what
the core issues are that Squeak tries to solve (in this case, making it
easy to save and load current state of an object system, including that of
GUIs) and think of Python oriented approaches to do that.
Again though, the notion of having a Python image could be rejected as a
goal. Even Squeak does use external files for various tasks. When the
outside world changes, the inside world in the image gets out of date. More
and more community-oriented applications (including Squeak's Croquet
shared 3D world, but also others down to simple web applets or html forms)
rely on storing state in outside servers or across peers and so need to
reload from the network on startup anyway in practice. Keeping everything
in one image mixes application and user data, and also often leads to an
unmanaged growth in complexity as the image accumulates clutter. When the
Squeak VM changes, one needs to clone the image into a new format, and in
practice, end users aren't going to want to do that with their old images
and will just start over from scratch, filing code or application
state out and back in. Squeak stores its source code history and other changes outside
the image in a couple of text files (in part to rebuild the image if it
crashes or gets corrupt). Even within the Squeak community, there are
constant pressures to be able to build an image from a textual
representation (something not trivial to do, and historically resisted by
the main people in the project, perhaps out of sentimentality? or
compassion? as the Squeak image has roots as a living thing back more than
20 years). However, even with pressure to be able to build an image from
scratch, something every Python program essentially does by running from
*.py files, I'm sure almost no one in the Squeak community would want to
lose saving and loading their current image.
So anyway, that's the outline of one of the biggest Squeak->Python issues
IMHO. And, I'd caution, it is one that is easy to dismiss as unimportant
if you do not have experience working with images, the same way it is easy
to dismiss something like garbage collection as unimportant if you are
comfortable working in C++. Still, you can patch garbage collection onto
C++ (in an awkward fashion) and one could probably patch images onto
Python somehow (if that was desired). It's mostly a matter of considering
how images interact with the Python glue philosophy, and also, in this
case, in an educational constructivist setting, where I think saving state
easily is a big win for everybody, especially in a classroom setting often
with very short time periods for doing a bit of exploring and
constructing, but potentially lots of them over the course of months. The
default right now is every Python application must invent its own way of
saving its state. So, should that be revisited? Or perhaps, this instead
points to a need to improve pickle or create a community-wide widget
reloading standard?
It occurs to me, just now, at the end, as I revise this, what the Python
way might be. :-) And it is to write its state as a Python text file!
Something I need to muse over. :-)
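One way that idea might look, as a minimal sketch only: the state is written out as Python source and loaded again by executing that source. It assumes the state is made of plain literals (dicts, lists, tuples, strings, numbers); the names save_as_python and load_from_python and the file name are invented for illustration.

```python
import os
import pprint
import tempfile

def save_as_python(state, path):
    """Write the state as a small Python source file defining `state`."""
    with open(path, "w", encoding="utf-8") as f:
        f.write("# Saved state, written as ordinary Python source.\n")
        f.write("state = " + pprint.pformat(state) + "\n")

def load_from_python(path):
    """Run the saved file in a fresh namespace and return its `state`.

    Executing a file runs whatever code it contains, so this is only
    reasonable for files you wrote yourself or otherwise trust.
    """
    namespace = {}
    with open(path, encoding="utf-8") as f:
        exec(compile(f.read(), path, "exec"), namespace)
    return namespace["state"]

# Hypothetical state of a learner's project.
state = {
    "windows": [{"title": "Simulation", "position": (120, 80)}],
    "widgets": ["button", "slider"],
    "speed": 0.5,
}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "saved_state.py")
    save_as_python(state, path)
    restored = load_from_python(path)

print(restored == state)
```

The saved file is readable, diffable text that can be shared or edited, unlike a binary image. The open question is how far this stretches beyond plain data to live objects such as windows.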
--Paul Fernhout
Learning by writing. :-)