Simple question about how the optimizer works

Tim Peters tim.one at comcast.net
Fri May 10 11:52:42 EDT 2002


[Andrew Dalke, on "optimization"]
> ...
> Given the limited resources for Python development, how much time
> should be spent on this?  I think very little.  As Tim Peters once
> pointed out, there haven't been problems with Python's optimization
> code <wink>.

That was before we tried any <wink>.  One of my coworkers decided they were
tired of seeing LOAD_CONST followed by UNARY_NEGATIVE whenever they had a
negative literal in the source (like -4 or -1.23), and changed the compiler
to store the negation of the literal instead, leaving just LOAD_CONST at run
time.

We do that now, but it introduced several bugs, and the process of stumbling
into them stretched over almost a year.  At the parsing end, it wound up
breaking mixtures of unary minus with exponentiation (-2**2 must mean -(2**2), not (-2)**2), and at the semantic
end it screwed up on negative float 0 literals (like -0.0).  There was also
a bug in memory management, due to indirect mixing of the PyObject_xyz
memory API with raw platform malloc, and that bug went uncaught until very
recently because it could only matter if pymalloc was enabled.

The code today looks like this:

	if ((childtype == PLUS || childtype == MINUS || childtype == TILDE)
	    && NCH(n) == 2
	    && TYPE((pfactor = CHILD(n, 1))) == factor
	    && NCH(pfactor) == 1
	    && TYPE((ppower = CHILD(pfactor, 0))) == power
	    && NCH(ppower) == 1
	    && TYPE((patom = CHILD(ppower, 0))) == atom
	    && TYPE((pnum = CHILD(patom, 0))) == NUMBER
	    && !(childtype == MINUS && is_float_zero(STR(pnum)))) {
		if (childtype == TILDE) {
			com_invert_constant(c, pnum);
			return;
		}
		if (childtype == MINUS) {
			char *s = PyMem_Malloc(strlen(STR(pnum)) + 2);
			if (s == NULL) {
				com_error(c, PyExc_MemoryError, "");
				com_addbyte(c, 255);
				return;
			}
			s[0] = '-';
			strcpy(s + 1, STR(pnum));
			PyMem_Free(STR(pnum));
			STR(pnum) = s;
		}
		com_atom(c, patom);
	}
	else if (childtype == PLUS) {
		com_factor(c, CHILD(n, 1));
		com_addbyte(c, UNARY_POSITIVE);
	}
	else if (childtype == MINUS) {
		com_factor(c, CHILD(n, 1));
		com_addbyte(c, UNARY_NEGATIVE);
	}
	else if (childtype == TILDE) {
		com_factor(c, CHILD(n, 1));
		com_addbyte(c, UNARY_INVERT);
	}
	else {
		com_power(c, CHILD(n, 0));
	}

As that strongly hints, CPython's intermediate code representation is a
concrete syntax tree, and so very difficult for any sort of semantic
analysis to process.  This is Good, because it inhibits my coworkers from
doing more of this <wink>.






More information about the Python-list mailing list