[Edu-sig] PySqueak issues: image storage

Thu May 4 14:37:26 CEST 2006

In trying to further think through what would be involved in supporting 
Squeak-like (or just Smalltalk-like) capabilities for Python in 
constructivist education, on reflection, I think the single biggest issue 
is that of the Squeak/Smalltalk "image". There are lots of other issues, 
but I am now thinking those are more easily solvable by just programming, 
often using existing Python libraries, than this one, which is more 
reflective of deeper issues.

For those of you unfamiliar with the notion of a Smalltalk "image", it is 
essentially this: you can pick "save image" from your running Squeak 
environment and the whole running vm's object memory is saved to one file. 
Then when you start up Squeak and specify your image file, everything is 
back the way you had it when you saved (in theory) -- windows, open files, 
open socket connections, everything. In practice, open socket connections 
and open files sometimes can't be reopened (the port may be in use for a 
server socket, or a server may be down for a client socket), files may 
have been deleted or locked and so can't be reopened, a database may have 
changed, and so on. (And making this work under the covers can be 
non-trivial.) But, at least, the system tries to put everything back the 
way it was, and it almost always does a perfect job for plain GUIs. What's 
more, you can bring this file to any machine on any OS and processor that 
runs a fairly similar version of Squeak (like moving Mac/PPC to 
GNU/Linux/AMD64), and just start it up and you are back where you saved. 
And here is the key point: you get all this for "free" when you work with 
Squeak. You generally don't have to use a "pickle" library yourself, or 
write object saving and loading code for your windows, or do anything like 
that. It just works.
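
For comparison, here is roughly what the do-it-yourself version looks like
in Python today. This is only a minimal sketch using the standard pickle
module; the names save_image, load_image, round_trip and the toy "world"
of turtles are made up for illustration. It saves one object graph that
the program hands over explicitly, which is far from a whole running
environment with its windows and sockets.

```python
import os
import pickle
import tempfile

def save_image(world, path):
    """Write every object reachable from `world` into one file."""
    with open(path, "wb") as f:
        pickle.dump(world, f)

def load_image(path):
    """Read the object graph back, shared references and cycles included."""
    with open(path, "rb") as f:
        return pickle.load(f)

def round_trip(world):
    """Save `world` to a temporary file and load it back (demonstration only)."""
    fd, path = tempfile.mkstemp(suffix=".image")
    os.close(fd)
    try:
        save_image(world, path)
        return load_image(path)
    finally:
        os.remove(path)

# A toy "world" (hypothetical data): two turtles that refer to each other.
alice = {"name": "alice", "x": 10, "y": 20, "friends": []}
bob = {"name": "bob", "x": -5, "y": 0, "friends": [alice]}
alice["friends"].append(bob)
world = {"turtles": [alice, bob]}

restored = round_trip(world)
```

The restored graph is a copy in which the two turtles still point at each
other. Pickle also handles instances of importable classes, but the
program still has to decide what to save and when, and unpickling a file
from an untrusted source is not safe.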

Why is the image so important for the novice user in a constructivist 
educational setting? If a learner is learning by building, then it stands 
to reason that tomorrow they would like to learn by building on top of 
what they have already built more often than they would want to start from 
scratch. Starting from scratch can be a good learning experience, no 
doubt, but it gets old if you do it over and over again. So, when a 
learner can save their image and reload it tomorrow, they are exactly 
where they left off. So, if they are in the midst of building a simulation 
by dragging graphical widgets around, then they just save it, and tomorrow 
there it is. And here is the key: they never had to write any saving or 
loading code to do this. And neither did the author of the underlying 
package they are using. This is a big thing IMHO. It lets the learner 
focus more on what they are doing without the distraction of "how can I 
save and load this".

Now, in practice, Squeak does save and load things outside the image. It 
uses files to store code, communicates over sockets, uses databases, and 
so on. In some ways, using an external file to save, say, your email seems 
much safer than storing it just in an image (which is a binary file and 
could easily get corrupted). To do that, an application has to use some 
explicit form of representing objects as text and reading them back in 
itself. If you want to share a small part of your image, like just the 
simulation you wrote, then it makes sense to be able to export that part 
(preferably as text) and share it (most Smalltalks have some support for 
this, equivalent to pickle or XML object writing). So the image isn't 
everything. But for the basics of where windows are, what widgets are in 
them, and what dynamic objects exist in the environment, it does a great job.

So, where is the Python PySqueak problem? The graphics widget set is the 
biggest issue (sockets, files, databases, etc. are hard, but simpler). 
Smalltalks are designed from the ground up to do this. In Squeak's case, 
all the widgets are native to Squeak itself, so their state is defined 
purely by Smalltalk objects. In other Smalltalks which use the platform's 
native widgets, the issues are 
thought through at the beginning on how the windows will be created and 
then recreated on reloading an image and that part is usually hidden from 
the user. And how to figure out what native GUI widgets are up on the 
screen and how to save that and reload it, all transparently to the user, 
is a non-trivial thing, and it varies from widget set to widget set. Since 
Python wants to be platform agnostic, and since there are a lot of widget 
sets out there (x, wx, tk, swt, swing, gtk, qt, mac flavors, mozilla and 
other web browser widgets, etc.), plus a lot of code written for them, 
that means a bit of a problem. I think in theory one could write widget 
set savers and reloaders that know enough about a specific widget set to 
walk the tree of displayed widgets and rebuild it. But, that is a lot of 
work that needs to be done for each widget set. And then, ideally, that 
all needs to work with existing widget-using code not written to use a 
well designed library that hides these reloading problems.
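
The idea of a widget set saver and reloader can be sketched in a few
lines. Everything here is an assumption for illustration: the Widget
class is a toy stand-in, not the API of tk, wx, gtk or any real toolkit,
and snapshot and rebuild are hypothetical names. A real saver would have
to ask the actual toolkit for each widget's children and properties,
which is exactly the per-toolkit work described above.

```python
class Widget:
    """Hypothetical stand-in for a real toolkit widget (not tk, wx, gtk, ...)."""

    def __init__(self, kind, **options):
        self.kind = kind                # e.g. "window", "button"
        self.options = dict(options)    # e.g. position, size, label text
        self.children = []

    def add(self, child):
        """Attach a child widget and return it, so calls can be chained."""
        self.children.append(child)
        return child

def snapshot(widget):
    """Walk the widget tree and describe it with plain dicts and lists."""
    return {
        "kind": widget.kind,
        "options": dict(widget.options),
        "children": [snapshot(child) for child in widget.children],
    }

def rebuild(description):
    """Recreate a widget tree from the output of snapshot()."""
    widget = Widget(description["kind"], **description["options"])
    for child_description in description["children"]:
        widget.add(rebuild(child_description))
    return widget

# Build a small tree the way a learner's simulation window might look.
window = Widget("window", title="Simulation", x=40, y=60, width=320, height=200)
panel = window.add(Widget("panel", layout="vertical"))
panel.add(Widget("button", label="Run"))
panel.add(Widget("slider", minimum=0, maximum=100, value=25))

saved = snapshot(window)      # plain data, easy to write to a file
restored = rebuild(saved)     # a fresh tree with the same shape and options
```

Because the snapshot is plain dicts and lists, it can be written out with
whatever saving mechanism is at hand. What the sketch leaves out is the
hard part: callbacks, focus, scroll positions, and any other state that a
real toolkit keeps outside of Python objects.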

So, I think that would be the biggest issue to solve -- giving Python an 
image capacity. It doesn't have to be solved because one could always 
continue with the Python assumption that applications will be written to 
save and load their state. One could also build that into specific 
educational constructivist widget sets like an eToys clone. But it remains 
a mismatch in philosophy.

This isn't a plug for C++, but consider:
   http://www.whysmalltalk.com/quotes/index.htm
"Smalltalk is the best Smalltalk around" [on using C++ to code dynamic 
language idioms more appropriately done in Lisp or Smalltalk] - Bjarne 
Stroustrup

Which is meant more or less as a reminder that languages have things they 
do well, and not so well, and philosophies and communities built around 
them. So, while it is possible to ape Squeak Smalltalk in Python 
(including an image), is that worth doing? If you want everything 
Smalltalk (or Squeak) has to offer, then you can use Smalltalk (or Squeak) 
and live with other limitations (its license, stability, and world view). 
I'm not sure there is that much value in reinventing that wheel. And it's 
also probably much easier to just put a Python parser into Squeak than 
reinvent Squeak on Python. So I continue to think what is interesting (and 
challenging) about the notion of a PySqueak is to try to understand what 
the core issues are that Squeak tries to solve (in this case, making it 
easy to save and load current state of an object system, including that of 
GUIs) and think of Python oriented approaches to do that.

Again though, the notion of having a Python image could be rejected as a 
goal. Even Squeak does use external files for various tasks. When the 
outside world changes, the inside world in the image gets out of date. More 
and more community-oriented applications (including Squeak's Croquet 
shared 3D world, but also others down to simple web applets or html forms) 
rely on storing state in outside servers or across peers and so need to 
reload from the network on startup anyway in practice. Keeping everything 
in one image mixes application and user data, and also often leads to an 
unmanaged growth in complexity as the image accumulates clutter. When the 
Squeak VM changes, one needs to clone the image into a new format, and in 
practice, end users aren't going to want to do that with their old images 
and will just start over from scratch, filing code or application state 
out and back in. Squeak stores its source code history and other changes 
outside 
the image in a couple of text files (in part to rebuild the image if it 
crashes or gets corrupt). Even within the Squeak community, there are 
constant pressures to be able to build an image from a textual 
representation (something not trivial to do, and historically resisted by 
the main people in the project, perhaps out of sentimentality? or 
compassion? as the Squeak image has roots as a living thing back more than 
20 years). However, even with pressure to be able to build an image from 
scratch, something every Python program essentially does by running from 
*.py files, I'm sure almost no one in the Squeak community would want to 
lose saving and loading their current image.

So anyway, that's the outline of one of the biggest Squeak->Python issues 
IMHO. And, I'd caution, it is one that is easy to dismiss as unimportant 
if you do not have experience working with images, the same way it is easy 
to dismiss something like garbage collection as unimportant if you are 
comfortable working in C++. Still, you can patch garbage collection onto 
C++ (in an awkward fashion) and one could probably patch images onto 
Python somehow (if that were desired). It's mostly a matter of considering 
how images interact with the Python glue philosophy, and how they fit an 
educational constructivist setting. There, I think saving state easily is 
a big win for everybody, especially in a classroom, which often has very 
short time periods for doing a bit of exploring and constructing, but 
potentially lots of them over the course of months. The 
default right now is every Python application must invent its own way of 
saving its state. So, should that be revisited? Or perhaps, this instead 
points to a need to improve pickle or create a community-wide widget 
reloading standard?

It occurs to me, just now, at the end, as I revise this, what the Python 
way might be. :-) And it is to write its state as a Python text file!
Something I need to muse over. :-)
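
One way that idea might look, as a rough sketch only: use pprint to write
the state out as Python source, and run that file to get the state back.
The function names write_state_as_python and read_state_from_python and
the sample state are made up for illustration. This only works for values
whose printed form is valid Python (dicts, lists, tuples, strings,
numbers, booleans, None), not for arbitrary objects or live widgets.

```python
import os
import pprint
import runpy
import tempfile

def write_state_as_python(state, path):
    """Write `state` as Python source that rebuilds it when executed."""
    with open(path, "w", encoding="utf-8") as f:
        f.write("# Saved state; this file is ordinary Python source.\n")
        f.write("state = " + pprint.pformat(state) + "\n")

def read_state_from_python(path):
    """Execute the saved file and return the `state` name it defines."""
    return runpy.run_path(path)["state"]

# Hypothetical application state made only of plain Python literals.
state = {
    "windows": [
        {"title": "Simulation", "position": (40, 60), "size": (320, 200)},
    ],
    "turtle": {
        "x": 10.5,
        "y": -3.0,
        "pen_down": True,
        "trail": [(0, 0), (10.5, -3.0)],
    },
}

fd, path = tempfile.mkstemp(suffix=".py")
os.close(fd)
try:
    write_state_as_python(state, path)
    with open(path, encoding="utf-8") as f:
        source_text = f.read()          # human-readable, editable, diffable
    restored = read_state_from_python(path)
finally:
    os.remove(path)
```

The saved file can be read, edited, and shared like any other Python
file, which fits the running-from-*.py-files habit mentioned earlier.
The cost is that loading it means executing it, so a saved state from an
untrusted source is as risky as any other untrusted script.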

--Paul Fernhout
Learning by writing. :-)