[Python-Dev] PEP 399: Pure Python/C Accelerator Module Compatibility Requirements

Terry Reedy tjreedy at udel.edu
Thu Apr 7 04:42:21 CEST 2011


On 4/6/2011 2:54 PM, Terry Reedy wrote:

> I believe that at the time of that decision, the Python [heapq] code was only
> intended for humans, like the Python (near) equivalents in the itertools
> docs to C-coded itertool functions. Now that we are aiming to have
> stdlib Python code be a reference implementation for all interpreters,
> that decision should be revisited.

OK so far.

> Either the C code should be generalized to sequences or
> the Python code specialized to lists, making sure the doc matches
> either way.

After rereading the heapq doc and .py file and thinking some more, I 
retract this statement for the following reasons.

1. The heapq doc clearly states that a list is required. It leaves the 
behavior for other types undefined. Let it be so.

2. Both _heapq.c (or its actual name) and heapq.py meet (I presume) the 
documented requirements and pass (or would pass) a complete test suite 
based on using lists as heaps. In that regard, both are conformant and 
should be considered 'equivalent'.

3. _heapq.c is clearly optimized for speed. It allows a list subclass as 
input and will heapify such, but it ignores a custom __getitem__. My 
informal test on a shuffled list(range(9999999)) (shuffled in place with 
random.shuffle) shows that heapify is over 10x as fast as .sort(). Let 
it be so.

4. When I suggested changing heapq.py, I had forgotten that heapq.py 
defined several functions rather than a wrapper class with methods. I 
was thinking of putting a type check in .__init__, where it would be 
applied once per heap (and possibly bypassed), and could easily be 
removed. Instead every function would require a type check for every 
call. This would be too obnoxious to me. I love duck typing and held my 
nose a bit when suggesting a one-time type check.

5. Python already has an "extras allowed" principle. In other words, an 
implementation does not have to bother to enforce documented 
restrictions. For one example, Python 2 manuals restrict identifiers to 
ASCII letters. CPython 2 (at least in recent versions) actually allows 
extended ASCII letters, as in Latin-1. For another, namespaces (globals 
and attribute namespaces), by their name, only need to map identifiers 
to objects. However, CPython uses general dicts rather than specialized 
string dicts with validity checks. People have exploited both loopholes. 
But those who have should not complain to us if such code fails on a 
different implementation that adheres to the doc.
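
Point 3 can be checked with a small experiment. The following is only a 
sketch: CountingList and is_heap are hypothetical names introduced for 
illustration, and the zero-call result is expected only where heapify is 
the C-coded version (as on CPython). An implementation running the pure 
Python code would report a nonzero count and still be conformant.

```python
import heapq
import random
import types


class CountingList(list):
    """Hypothetical list subclass that counts calls to its __getitem__."""

    def __init__(self, *args):
        super().__init__(*args)
        self.getitem_calls = 0

    def __getitem__(self, index):
        self.getitem_calls += 1
        return super().__getitem__(index)


def is_heap(seq):
    """Check the heap invariant: every parent is <= each of its children."""
    items = list(seq)
    return all(items[(i - 1) // 2] <= items[i] for i in range(1, len(items)))


data = CountingList(range(1000))
random.shuffle(data)        # shuffling goes through __getitem__/__setitem__
data.getitem_calls = 0      # so reset the counter before the real test

heapq.heapify(data)
calls = data.getitem_calls  # capture the count right after heapify

# True when heapify is a built-in (C-coded) function rather than Python code.
accelerated = isinstance(heapq.heapify, types.BuiltinFunctionType)
print("C-coded heapify in use:", accelerated)
print("__getitem__ calls during heapify:", calls)
print("result is a valid heap:", is_heap(data))
```

Either way the list ends up a valid heap, which is all the doc promises.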

I think the Language and Library references should start with something 
a bit more specific than at present:

"The Python x.y Language and Library References define the Python x.y 
language, its builtin objects, and standard library. Code written to 
these docs should run on any implementation that includes the features 
used. Code that exploits or depends on any implementation-specific 
feature or behavior may not be portable."

_x.c and x.py are separate implementations of module x. I think they 
should be subject to the same disclaimer.
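
As an illustration of the kind of code the disclaimer would cover, here 
is a minimal sketch of the namespace loophole from point 5. The class 
name Namespace is hypothetical, and the behavior shown is what CPython 
does with its general dicts; an implementation using specialized string 
dicts would be free to reject the non-string key.

```python
class Namespace:
    """Hypothetical empty class used only to get an attribute namespace."""


ns = Namespace()
ns.answer = 42                     # ordinary use: an identifier as the key

# Exploits the loophole: the attribute namespace is a general dict on
# CPython, so it accepts a key that could never be an attribute name.
vars(ns)[7] = "not an identifier"

print(sorted(vars(ns), key=str))
```

Code like this runs on CPython but is not written to the docs, so it 
should not be expected to be portable.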


Therefore, I currently think that the only change needed for heapq 
(assuming both versions pass complete tests as per the doc) is an 
explanation at the top of heapq.py that goes something like this:

"Heapq.py is a reference implementation of the heapq module for both 
humans and implementations that do not have an accelerated version. For 
CPython, most of the functions are replaced by much faster C-coded versions.

Heapq is documented to require a Python list as input to the heap 
functions. The C functions enforce this restriction. The Python versions 
do not and should work with any mutable random-access sequence. Should 
you wish to run the Python code with CPython, copy this file, give it a 
new name, delete the following lines:

try:
     from _heapq import *
except ImportError:
     pass

make any other changes you wish, and do not expect the result to be 
portable."
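
For experimenting without copying and editing the file, the same effect 
can be approximated at run time. This is only a sketch under stated 
assumptions: load_pure_python_heapq and the module name py_heapq are 
hypothetical, heapq.__file__ is assumed to point at a readable heapq.py, 
and the trick relies on a None entry in sys.modules making the import of 
_heapq raise ImportError, so the try/except above falls through to 
"pass".

```python
import collections
import heapq
import importlib.util
import sys


def load_pure_python_heapq(name="py_heapq"):
    """Load heapq.py under a new (hypothetical) name with _heapq blocked."""
    missing = object()
    saved = sys.modules.get("_heapq", missing)
    sys.modules["_heapq"] = None  # a None entry makes importing _heapq fail
    try:
        spec = importlib.util.spec_from_file_location(name, heapq.__file__)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
    finally:
        # Restore the import state so the regular heapq is unaffected.
        if saved is missing:
            del sys.modules["_heapq"]
        else:
            sys.modules["_heapq"] = saved
    return module


py_heapq = load_pure_python_heapq()

# A deque is a mutable random-access sequence but not a list.
heap = collections.deque([5, 3, 8, 1, 9, 2])
py_heapq.heapify(heap)
py_heapq.heappush(heap, 0)
drained = [py_heapq.heappop(heap) for _ in range(len(heap))]
print(drained)  # [0, 1, 2, 3, 5, 8, 9]
```

The Python functions only need len(), indexing, append() and pop(), 
which is why the deque works here. That is outside what the doc 
promises, so the result should not be expected to be portable.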

-- 
Terry Jan Reedy


