[Python-ideas] Enabling access to the AST for Python code

Fri Jul 3 22:42:55 CEST 2015

On Fri, Jul 3, 2015 at 6:20 AM, Nick Coghlan <ncoghlan at gmail.com> wrote:

> On 3 July 2015 at 06:25, Neil Girdhar <mistersheik at gmail.com> wrote:
> > Why would it require "a lot of extra memory"?  A program text size is
> > measured in megabytes, and the AST is typically more compact than the
> > code as text.  A few megabytes is nothing.
>
> It's more complicated than that.
>
> What happens when we multiply that "nothing" by 10,000 concurrent
> processes across multiple servers? Is it still nothing? How about
> 10,000,000?
>

I guess we find a way to share data between the processes?
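Whether an AST really is more compact than the source text can be checked empirically. Below is a rough sketch, assuming CPython and that `tracemalloc` is not already in use elsewhere. It measures the Python-level `ast` objects that `ast.parse` returns, not the parser's internal representation. `measure_ast` is a hypothetical helper, and the numbers will vary by Python version and by module.

```python
import ast
import inspect
import json.decoder
import tracemalloc

def measure_ast(source: str) -> tuple[int, int]:
    """Return (size of the source in bytes, bytes still allocated for its AST)."""
    tracemalloc.start()  # assumes no one else is tracing; stop() below ends tracing
    try:
        before, _ = tracemalloc.get_traced_memory()
        tree = ast.parse(source)  # keep the tree alive while we take the reading
        after, _ = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    del tree
    return len(source.encode("utf-8")), after - before

# Any module whose source is available will do; json.decoder is just an example.
source = inspect.getsource(json.decoder)
text_bytes, ast_bytes = measure_ast(source)
print(f"source: {text_bytes} bytes, AST objects: {ast_bytes} bytes")
```

Multiplying the per-module figure by the number of modules and processes gives a rough sense of the scale Nick is describing.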

>
> What does keeping the extra data around do to our CPU level cache
> efficiency? Is there a key data structure we're adding a new pointer
> to? What does *that* do to our performance?
>

Why would a few megabytes of data affect your CPU level cache?  If I have a
Python program that generates a data structure that's a few megabytes, does
it slow down the rest of the program?

>
> Where are the AST objects being kept? Do they become part of the
> serialised form of the affected object? If yes, what does that do to
> the wire protocol overhead for inter-process communication, or to the
> size of cached bytecode files? If no, does that mean these objects may
> be missing the AST data when deserialised?
>

When do you send code objects on the wire?  I'm not even sure if pickle
supports that yet.
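For reference, pickle does not handle code objects, but marshal does. Marshal is also the format CPython uses for cached bytecode (.pyc) files, which is where any extra data attached to code objects would end up. Here is a small sketch:

```python
import marshal
import pickle

code = compile("x * 2", "<example>", "eval")

# pickle has no reducer for code objects, so this raises TypeError.
try:
    pickle.dumps(code)
    pickle_ok = True
except TypeError:
    pickle_ok = False

# marshal round-trips code objects; .pyc files are written with it.
restored = marshal.loads(marshal.dumps(code))
result = eval(restored, {"x": 21})

print(pickle_ok)  # False
print(result)     # 42
```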

> When we're talking about sufficiently central data structures, a few
> *bytes* can end up counting as "a lot". Code and function objects
> aren't quite *that* central (unlike, say, tuple instances), but adding
> things to them can still have a significant impact (hence the ability
> to avoid creating docstrings).
>
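The docstring case is what `python -OO` does. The same effect can be seen through the `optimize` argument of the built-in `compile()`, where level 2 drops docstrings (and asserts). The source snippet and the `load_greet` helper below are made up for illustration.

```python
SOURCE = '''
def greet():
    """Say hello."""
    return "hello"
'''

def load_greet(optimize):
    """Compile SOURCE at the given optimization level and return greet()."""
    namespace = {}
    exec(compile(SOURCE, "<example>", "exec", optimize=optimize), namespace)
    return namespace["greet"]

with_doc = load_greet(0)     # like a plain `python` run
without_doc = load_greet(2)  # like `python -OO`: docstrings are not kept

print(with_doc.__doc__)     # Say hello.
print(without_doc.__doc__)  # None
```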

Thanks, I'm interested in learning more about this.

There are a lot of messages in this discussion.  Was there a final
consensus about how the AST for a given code object should be calculated?
Was it re-parsing the source?  Was it an import hook?  Something else?  I
want to do this with a personal project.  I realize we may not get the AST
by default, but it would be nice to know how I should best determine it
myself.
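For a personal project, the two options mentioned above can be sketched roughly as follows. This is not a consensus from the thread, just plain stdlib usage. Approach 1 re-parses whatever source `inspect.getsource` can find. It fails for code typed at the REPL or built with `exec`, can be stale if the file changed after import, and for lambdas `body[0]` is the enclosing statement rather than the lambda. Approach 2 subclasses `importlib.machinery.SourceFileLoader` and overrides `source_to_code` to keep the AST. Note that when a valid .pyc cache exists, the loader skips `source_to_code`, so the AST is only recorded when the module is actually compiled from source. `function_ast`, `AST_REGISTRY`, `ASTRecordingLoader` and `import_with_ast` are hypothetical names.

```python
import ast
import importlib.machinery
import importlib.util
import inspect
import textwrap

# Approach 1: re-parse the source text that inspect can locate for a function.
def function_ast(func):
    """Return the AST node for func's definition by re-parsing its source."""
    source = textwrap.dedent(inspect.getsource(func))  # dedent so methods parse
    return ast.parse(source).body[0]

# Approach 2: a loader that records each module's AST as it is compiled.
AST_REGISTRY: dict[str, ast.Module] = {}  # hypothetical: source path -> AST

class ASTRecordingLoader(importlib.machinery.SourceFileLoader):
    def source_to_code(self, data, path, *args, **kwargs):
        # Keep our own parse of the raw source bytes, keyed by file path,
        # then let the standard loader compile the module as usual.
        AST_REGISTRY[path] = ast.parse(data, filename=path)
        return super().source_to_code(data, path, *args, **kwargs)

def import_with_ast(name, path):
    """Import the module at path through ASTRecordingLoader."""
    loader = ASTRecordingLoader(name, path)
    spec = importlib.util.spec_from_file_location(name, path, loader=loader)
    module = importlib.util.module_from_spec(spec)
    loader.exec_module(module)
    return module

print(function_ast(textwrap.dedent).name)  # dedent
```

Re-parsing costs nothing until you ask for an AST. The loader approach pays the parse and memory cost up front for every module it imports, which is exactly the trade-off discussed above.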

>
> Cheers,
> Nick.
>
> --
> Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
>