[Python-ideas] .pyu nicode syntax symbols (was Re: Empty set, Empty dict)

Tue Jul 1 23:33:02 CEST 2014

> On Tuesday, July 1, 2014 10:35 AM, Steven D'Aprano <steve at pearwood.info> wrote:

> I think that micro-optimization is probably the wrong reason to hack 
> bytecodes. What I'm more interested in is exploring potential new 
> features, or to add functionality, for example:
> 
> Adding the ability to trace individual expressions, not just lines:
> http://nedbatchelder.com/blog/200804/wicked_hack_python_bytecode_tracing.html
> 
> Exploring dynamic scoping:
> http://www.voidspace.org.uk/python/articles/code_blocks.shtml
> 
> A proposal from Python 2.3 days for a brand-new decorator syntax:
> http://code.activestate.com/recipes/286147
> 
> A (serious!) defence of GOTO in Python:
> http://www.dr-josiah.com/2012/04/python-bytecode-hacks-gotos-revisited.html
> 
> (although even Josiah doesn't suggest using COMEFROM :-)
> 
> 
> I don't know that such bytecode manipulations should be provided in the
> standard library, and certainly not as a built-in "asm" command. But, I
> think that we ought to acknowledge that bytecode hacking has a role to
> play in the wider Python ecosystem.

I think CPython provides just about the right level of support here.

The documentation, the APIs, and the helper tools for dealing with bytecode are all superb, and get better with each release. It's all more than sufficient to figure out what you're doing, and how to do it.
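As a small illustration of those helper tools, the sketch below inspects a function through `dis.get_instructions` and the code object rather than by scraping printed `dis.dis()` output. The function `add` is a hypothetical example. The exact opcode names it prints differ between CPython versions, which is the point made further down about not relying on specific bytecode.

```python
import dis

def add(a, b):
    return a + b

# dis.get_instructions (new in 3.4) yields dis.Instruction objects, so a tool
# can read opcode names, arguments and offsets as data instead of parsing text.
for instr in dis.get_instructions(add):
    print(f"{instr.offset:>4}  {instr.opname:<24} {instr.argrepr}")

# The code object exposes the tables that the bytecode indexes into.
print(add.__code__.co_varnames)  # ('a', 'b')

# dis.code_info gives a human-readable summary of the whole code object.
print(dis.code_info(add))
```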

It might be nice if there were an assembler in the stdlib, but the format is simple enough, and the documentation complete enough, that you can write one in a couple of hours (as I did). And, honestly, I suspect a stdlib assembler wouldn't be updated fast enough. For example, when support for Instruction objects was added to CPython's dis module in 3.4, I doubt an existing assembler would have been modified to take advantage of that, but a new one that you slap together can do so easily.
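For a sense of how little code such a home-made tool needs, here is a minimal sketch (not a full assembler) of a bytecode-level rewrite. It uses `dis.Instruction` objects to locate what it wants to change and `CodeType.replace`, which arrived in Python 3.8, long after this post, to rebuild the code object. `patch_constant` and `greet` are hypothetical names for illustration.

```python
import dis
import types

def greet():
    return "spam"

def patch_constant(func, old, new):
    """Return a copy of func whose constant `old` is replaced by `new`.

    Hypothetical helper for illustration only: it rewrites co_consts and
    leaves the instruction stream itself untouched.
    """
    code = func.__code__
    # Use Instruction objects to confirm some instruction really loads `old`,
    # instead of decoding raw co_code bytes by hand.
    if not any(instr.argval == old for instr in dis.get_instructions(code)):
        raise ValueError(f"{func.__name__} never loads {old!r}")
    consts = tuple(
        new if type(c) is type(old) and c == old else c
        for c in code.co_consts
    )
    # CodeType.replace copies the code object with only the given fields changed.
    return types.FunctionType(
        code.replace(co_consts=consts),
        func.__globals__,
        func.__name__,
        func.__defaults__,
        func.__closure__,
    )

shout = patch_constant(greet, "spam", "eggs")
print(greet(), shout())  # spam eggs
```

Matching on `argval` rather than on a fixed offset or opcode is what lets a hack like this survive CPython releases that compile the same `return` statement to different instructions.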

Documenting that bytecode is only supported on CPython, and can change between CPython versions, isn't a problem for anyone who's just looking to experiment with and explore ideas, rather than write production code. As your examples show, you can usually even publish your explorations for others to experiment with, granting those limitations, and maintain them for years without much headache. (In practice, bytecode has changed much more conservatively than the documentation allows for; hacks generally break only when they rely on knowing exactly what bytecode will be generated for a given Python expression. But even then, with a sufficient test suite, it's usually pretty simple to adapt.)
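One way to read "a sufficient test suite" is a set of tests that pin down exactly what a bytecode hack assumes, so that a CPython upgrade fails loudly instead of silently misbehaving. A minimal sketch, assuming a hypothetical hack that rewrites how `target` loads the global `helper`; both functions are invented for illustration.

```python
import dis
import unittest

def helper(x):
    return x * 2

def target(x):
    # Hypothetical function whose bytecode some hack rewrites.
    return helper(x) + 1

class BytecodeAssumptionTests(unittest.TestCase):
    """Pin down what the hack relies on, and nothing more."""

    def test_helper_is_loaded_as_a_global(self):
        # Loose check: only assumes *some* LOAD_GLOBAL names `helper`,
        # not its offset or the exact opcodes around it.
        loads = {
            instr.argval
            for instr in dis.get_instructions(target)
            if instr.opname == "LOAD_GLOBAL"
        }
        self.assertIn("helper", loads)

    def test_function_still_behaves(self):
        self.assertEqual(target(3), 7)
```

Run it with `python -m unittest` (or pytest) next to the hack; the looser the checks, the fewer false alarms on each new CPython release.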

> I'm led to understand that in the Java community, bytecode hacking is,
> perhaps not common, but accepted as something that power users do when
> all else fails:
> 
> https://weblogs.java.net/blog/simonis/archive/2009/02/we_need_a_dirty.html

Here, it sounds like you _are_ suggesting that bytecode hacking may need to be used for production code, not just for exploration. But there are some pretty big differences between Java and Python that I think are relevant here:

 * Java is designed for one specific VM, on which many other languages run; Python is designed to run on a variety of VMs, and nothing else runs on the CPython VM.
 * Java is designed to be secure first, fast second, and flexible a distant third; Python is designed to be simple and transparent first, flexible and dynamic second, and everything else a distant third. So most of what you'd want to do (including solving problems like the one in the blog) can be done with simple monkey-patching and related techniques—and you can go a lot deeper than that without getting beyond the supported, portable reflection techniques.
 * Java's VM is designed to be debuggable and optimizable; CPython's is designed to be the simplest thing that could support CPython. So, anything that's too hard to do with runtime structures is often easier at the VM level in Java, while the reverse is true in CPython.
 * Java code is often distributed, and always deployed, as binary files; Python code almost always as source. Besides causing problems like the one in this article, that also means that if you have to go below the runtime level in Java, you don't have the intermediate steps of source and AST hacking; you have no choice but to go to the bytecode.
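The last bullet's point, that Python offers source and AST layers before you ever reach bytecode, can be sketched with the `ast` module: a `NodeTransformer` rewrites the tree and the result is compiled normally, with no knowledge of opcodes required. `SafeDivision` and `safe_div` are hypothetical names for illustration, and `ast.unparse` needs Python 3.9+.

```python
import ast

SOURCE = """
def ratio(a, b):
    return a / b
"""

class SafeDivision(ast.NodeTransformer):
    """Rewrite every `x / y` into a call to the hypothetical `safe_div(x, y)`."""

    def visit_BinOp(self, node):
        self.generic_visit(node)  # transform nested expressions first
        if isinstance(node.op, ast.Div):
            call = ast.Call(
                func=ast.Name(id="safe_div", ctx=ast.Load()),
                args=[node.left, node.right],
                keywords=[],
            )
            return ast.copy_location(call, node)
        return node

def safe_div(x, y):
    return float("nan") if y == 0 else x / y

tree = SafeDivision().visit(ast.parse(SOURCE))
ast.fix_missing_locations(tree)  # give the new Name node line/column info
print(ast.unparse(tree))         # shows the rewritten source

namespace = {"safe_div": safe_div}
exec(compile(tree, "<ast-hack>", "exec"), namespace)
print(namespace["ratio"](1, 0))  # nan
print(namespace["ratio"](1, 4))  # 0.25
```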