[Edu-sig] PySqueak: "Self" as the source of many ideas in Squeak

Sun May 7 02:00:54 CEST 2006

Paul D. Fernhout wrote:
> Ian Bicking wrote:
>> This is all not to say that modifying Python source in a manner 
>> different than ASCII is infeasible.  Only that "source" and "runtime" 
>> have to be kept separate to keep this process sane.  Images -- which 
>> mingle the two -- have never felt sane to me; clever, but not 
>> manageable.  
> 
> Well, perhaps sanity is overrated. :-)
> 
> And Smalltalk has been managing this for thirty years. :-)
> 
> But it is true, it requires a bunch of special purpose tools. Changeset 
> browsers. A change log. External source code control. Image cloners. and 
> so on. Probably a Python that stored a running image as Python program 
> files might need some specialized tools too.

Zope went down that path with through-the-web development with the ZODB 
as the image.  I think everyone agrees that didn't work.  Smalltalk has 
pursued that with more vigor, but I was never comfortable with the 
image.  In some ways you can read my comments as me figuring out just 
why I dislike the image model.

I think it's important to figure out, because the image model seems very 
compelling for an education environment.

>> Unfortunately we do not have very good abstractions in
>> Python related to source, so they have to be invented from scratch.  The 
>> AST might help, but higher-level abstractions are also called for.  For 
>> instance, you might define a color interactively in some fashion, and 
>> the color gets serialized to:
>>
>>    import color
>>    color.Color(r, g, b)
>>
>> That color object might be immutable (probably should be), but the 
>> *source* isn't immutable, and the source means something.  How can we 
>> tell what it means?
> 
> I'm not sure anyone needs to ever tell what that means, except the Python 
> parser when it reads in the code and runs it (in the context of other code 
> it also loaded). Obviously when code is being written out which can 
> reconstruct a current object tree, then the writing process needs to 
> understand what it means by each piece of code, especially as it might 
> refer to previously defined bits to shorten or make more readable the 
> output code file. But since it is writing the code, there should not be 
> any ambiguities of intent. The hardest issue perhaps becomes making the 
> results look as close to human written Python as possible (say, using 
> "class" to define a class instead of making a plain object and turning it 
> into a class by setting fields).

With the color example, I'm really thinking about richer kinds of source 
literals.  Maybe color.Color(1, 0, 0) is a better example.  You can 
statically determine that is "red", and you could display it as red.
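
A minimal sketch of that static determination, assuming the call is spelled exactly color.Color(...) with literal arguments. The function name and the NAMED_COLORS table are hypothetical, and nothing is imported from a real "color" module or executed:

```python
import ast

# Hypothetical lookup table from literal RGB triples to display names.
NAMED_COLORS = {(1, 0, 0): "red", (0, 1, 0): "green", (0, 0, 1): "blue"}

def static_color_args(source):
    """Return the literal arguments of a color.Color(...) call, or None.

    Only the parsed source is examined; the call is never evaluated.
    """
    node = ast.parse(source, mode="eval").body
    if not isinstance(node, ast.Call) or node.keywords:
        return None
    func = node.func
    # Accept only the exact spelling color.Color(...).
    if not (isinstance(func, ast.Attribute) and func.attr == "Color"
            and isinstance(func.value, ast.Name) and func.value.id == "color"):
        return None
    try:
        return tuple(ast.literal_eval(arg) for arg in node.args)
    except ValueError:
        # Some argument was a name or expression rather than a literal.
        return None
```

With this, static_color_args("color.Color(1, 0, 0)") gives a triple that can be looked up in NAMED_COLORS, while color.Color(r, g, b) yields None because the values are only known at runtime.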

And maybe it's not that much of a stretch to think about the program as 
a set of these objects.  They aren't live objects, and they aren't 
entirely equivalent to the live objects.  So a function definition is a 
source object; the function itself is the runtime object.  We can treat 
them as equivalent, until we get to the point that we want to update things.

>> I can imagine a bit of interaction between the source and the runtime to 
>> do this.  For instance, we might see that "color" is bound to a specific 
>> module, and "color.Color" to a specific object.  We'll disallow certain 
>> tricks, like binding "color" dynamically, monkeypatching something into 
>> "color.Color", etc.  Ultimately figuring out exactly what color.Color is 
>> isn't *easy*, but at least it is feasible.
>>
>> Using existing introspection we can figure out some parts, some of the 
>> sorts of things that IDEs with auto-completion figure out.  They can 
>> figure out what the arguments to Color() are and the docstring.
> 
> Perhaps you are thinking about this from the point of view of something 
> like PyDev parser, which reads Python code to syntax highlight it and 
> perhaps provide code assistance. But, since the code has already been 
> read, and we are manipulating the object tree directly, we know what 
> everything means, because we just look at what the objects are directly by 
> inspecting them. Granted, some of those objects represent code, and the 
> code may be wrong or ambiguous or confusing from the user's point of view, 
> but that is ignorable by the code writer, which just writes out the 
> snippet of code the way a user put it in.

You can't know much about the objects inside a function, because they 
aren't bound to anything.  You can know about objects defined at the 
module level.  So even with runtime interaction you really can't do a 
whole lot better than PyDev.

>> But, you can also imagine adding an editor or other things to that 
>> object; a richer form of __repr__, or a richer editable form than the 
>> ASCII source.  Maybe there would be a flag on Color that says it can be 
>> called without side effect (meaning we can speculatively call it when 
>> inspecting source); and then the resulting object might have something 
>> that says it can be displayed in a certain way (with its real color), 
>> and has certain slots you can edit and then turn into a new Color 
>> invocation.
> 
> I can see the value of all this from the GUI side, with various desired 
> display options defined in the class or prototype. But again, Python 
> objects are (or should be :-) constructible from scratch without calling a 
> class' init function. 

I don't understand what you mean here.  Most objects aren't 
constructible from scratch.  Many objects are immutable, and can only be 
recreated, not modified.

> So, given that a writer can inspect an instance and 
> just write out all the fields in its dictionary and their values it seems 
> like we can write out any object (though perhaps it may need to 
> recursively write out parts of embedded objects first, and then perhaps 
> patch up circular references at the end).

You mean pickle?
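
A rough sketch of the "write out the fields, rebuild without __init__" idea, which is roughly what pickle does by default for plain instances. The Point class and both helper names are hypothetical, and it assumes the instance keeps all of its state in __dict__:

```python
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

def dump_fields(obj):
    """Return the class and a copy of the instance dictionary."""
    return type(obj), dict(vars(obj))

def rebuild(cls, fields):
    """Recreate an instance from its fields without calling __init__."""
    obj = object.__new__(cls)    # allocates the instance, skips __init__
    obj.__dict__.update(fields)  # restores the saved attributes
    return obj
```

This only covers instances whose state lives in __dict__. Immutable built-ins such as tuples and strings cannot be filled in after the fact and have to be recreated through their own constructors.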

>> This is all harder than what HTConsole is doing currently, mostly 
>> because Python source introspection is much poorer than Python object 
>> introspection.
> 
> Good point. And perhaps an area of Python that this project could work on?
> Can you elaborate on what parts of source introspection might need the 
> most work and in what ways so that either programs or people can better 
> inspect running Python programs?

There's a bunch of pieces.  There's work already done on many of them, 
and people (particularly in the IDE crowd) interested in refining that 
work.  It's not something I've done a lot with, but here are some 
non-obvious pieces:

* Better source code readers.  Perhaps the work on a Python API to the 
AST would help here.  Python 2.5 introduces a better internal model for 
source (the AST), but actual access to that isn't worked out yet.  It's 
important to know how that abstract structure maps to actual source; 
right now I think it just has a line-level granularity.

* The decompiling tools might be useful.  There are already several; I 
know they drop things, but I don't know what.  I think they do pretty 
well.  If things like comments lived in the AST, that'd be great, but 
they don't currently.  I think some variable names might be lost too, 
but I'm not sure.

* Tools to figure out the relations between modules.  Or maybe more 
generally, just a way of determining as much as possible statically. 
PyChecker and other tools do this some.  It doesn't have to be perfect.

* We can only really work with modules that can be safely imported and 
do not care about their environment.  Lots of modules aren't like that, 
but I think those modules should just be given up on, or they need to be 
fixed.  PyChecker and other projects have to worry about this a lot, and 
it's just not worth it -- environment-dependent modules just suck and 
need to be fixed.

And maybe that's mostly it.  The AST seems most important, though 
current tools actually do a lot of the necessary parts.
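
A minimal sketch of mapping AST nodes back to positions in the source, assuming a present-day Python (3.8 or later) where the standard ast module exposes lineno, col_offset and ast.get_source_segment. It does not reflect what the AST API offered when this was written, and the SOURCE string and find_calls name are made up for illustration:

```python
import ast

# Example source, made up for illustration.
SOURCE = "import color\nRED = color.Color(1, 0, 0)  # a comment\n"

def find_calls(source):
    """Return (line, column, text) for every call expression in source."""
    tree = ast.parse(source)
    found = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Call):
            # Recover the exact source text the node was parsed from.
            text = ast.get_source_segment(source, node)
            found.append((node.lineno, node.col_offset, text))
    return found
```

The comment on the second line never shows up as a node, which matches the point above that comments do not live in the AST.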

So, for instance, I was stressed out about how a function gets edited in 
place.  But maybe that doesn't need to be too hard.  A function usually 
has a reference to where it comes from (the file and line number).  You 
could then parse the code to figure out where the function ends.  Then 
you rewrite the file with the new function definition, indented to the 
same level as the old function definition.
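
The steps above could be sketched like this, assuming the file on disk has not changed since the function was imported (otherwise the recorded line number is stale). Both function names are hypothetical, and the sketch only returns the new file text rather than writing it back:

```python
import inspect
import textwrap

def splice_definition(file_text, first_lineno, old_line_count, new_source):
    """Replace old_line_count lines starting at first_lineno (1-based)
    with new_source, re-indented to match the line being replaced."""
    lines = file_text.splitlines(keepends=True)
    start = first_lineno - 1
    old_first = lines[start]
    # Leading whitespace of the old "def" line is the target indentation.
    indent = old_first[:len(old_first) - len(old_first.lstrip())]
    body = textwrap.dedent(new_source).strip("\n") + "\n"
    new_lines = textwrap.indent(body, indent).splitlines(keepends=True)
    lines[start:start + old_line_count] = new_lines
    return "".join(lines)

def rewritten_file_text(func, new_source):
    """Return the text of func's source file with func's definition replaced."""
    filename = inspect.getsourcefile(func)
    # getsourcelines parses forward from the first line to find where
    # the function ends, and reports the 1-based starting line number.
    old_lines, first_lineno = inspect.getsourcelines(func)
    with open(filename) as f:
        file_text = f.read()
    return splice_definition(file_text, first_lineno, len(old_lines), new_source)
```

Keeping the splice separate from the inspect lookup means the text manipulation can be tried on plain strings, without touching any real file.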

Few other objects carry their source information.  Modules do (just 
their __file__).  Classes do not, nor do attributes or other 
assignments, and objects like lists often have no direct representation 
in the source.

But classes are easy enough to figure out by just looking at the source, 
and at least they carry their original name.
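
A small sketch of what each kind of object carries at runtime, using only standard attributes (__code__, __file__, __name__, __module__). The source_hints helper is a hypothetical name:

```python
import inspect

def source_hints(obj):
    """Collect whatever location information obj carries at runtime."""
    hints = {}
    code = getattr(obj, "__code__", None)
    if code is not None:
        # Functions record the file and first line they were compiled from.
        hints["file"] = code.co_filename
        hints["line"] = code.co_firstlineno
    if inspect.ismodule(obj):
        # Modules know only their file (built-in modules may lack it).
        hints["file"] = getattr(obj, "__file__", None)
    if inspect.isclass(obj):
        # Classes keep their name and defining module, but no line number.
        hints["name"] = obj.__name__
        hints["module"] = obj.__module__
    return hints
```

A list or an arbitrary instance comes back with an empty dictionary, which is why those are the objects to avoid editing.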

Well, at least this makes me feel a little better about what I want to 
do with HTConsole.  I just have to avoid editing objects that are not 
obviously represented by source.  I.e., you can edit/reassign variables 
and class attributes and functions, but not instances or lists.

-- 
Ian Bicking  |  ianb at colorstudy.com  |  http://blog.ianbicking.org