[Python-Dev] sys.implementation

Thu May 10 03:33:14 CEST 2012

On Thu, May 10, 2012 at 2:53 AM, Barry Warsaw <barry at python.org> wrote:
> On May 09, 2012, at 11:09 AM, Brett Cannon wrote:
>
>>Sure, but couldn't we define this "empty" class in C code so that you can
>>use the C API with it as well and just provide a C function to get a new
>>instance?
>
> +1
>
> ISTM to be a companion to collections.namedtuple.  IWBNI this new type was
> also exposed in the collections module.

Please, no. No new
just-like-a-namedtuple-except-you-can't-iterate-over-it type, and
definitely not one exposed in the collections module.

We've been over this before: collections.namedtuple *is* the standard
library's answer for structured records. TOOWTDI, and the way we have
already chosen includes iterability as one of its expected properties.

People shouldn't be so quick to throw away ordered iterability - it
makes a lot of things like generic display routines and serialisation
*much* easier, and without incurring the runtime cost of multiple
calls to sorted().

The original concern (that sys.implementation may differ in length
across implementations) has been eliminated by moving all
implementation-specific values into sys.implementation.metadata. The
top-level record now has a consistent length for any given language
version. The fact that the length of the record may still change in
*future* versions of Python can be handled through documentation - we
can simply tell people "it's OK to iterate over the fields, and even
to use tuple unpacking, but if you want to future-proof your code,
make sure to include a trailing ', *_' to ignore any fields that get
added in the future".
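
As a minimal sketch of that advice (the record types and field names
below are hypothetical stand-ins, not the actual layout of
sys.implementation), a starred target absorbs whatever trailing fields
a later version might add:

```python
from collections import namedtuple

# Hypothetical records standing in for sys.implementation in two
# language versions; the field names are illustrative only.
ImplementationV1 = namedtuple("ImplementationV1",
                              ["name", "version", "metadata"])
ImplementationV2 = namedtuple("ImplementationV2",
                              ["name", "version", "metadata", "cache_tag"])


def describe(impl):
    # The starred target collects any remaining fields, so this keeps
    # working when new fields are appended to the record.
    name, version, *_ = impl
    return "{} {}".format(name, ".".join(map(str, version)))


old = ImplementationV1("cpython", (3, 3, 0), {})
new = ImplementationV2("cpython", (3, 4, 0), {}, "cpython-34")
print(describe(old))
print(describe(new))
```

The same describe() function handles both record lengths without
modification, which is the property the documentation note would rely
on.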

To help focus the discussion, I am going to propose a specific (albeit
still somewhat hypothetical) use case: a cross-implementation testing
system that wants to be able to consistently capture data about the
version of Python that was tested, *without* needing
implementation-specific code in the metadata capture step.

That produces the following set of requirements:

1. sys.implementation should be immutable for a given execution of Python
2. repr(sys.implementation) should display all recorded details of the
implementation
3. It should be possible to write a generic, future-proof
serialisation of sys.implementation that captures all recorded details

collections.namedtuple meets all those requirements (_structseq
doesn't meet the last one at this point, but more on that later).
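
To make the three requirements concrete, here is a rough sketch that
checks each one against a namedtuple. The Implementation type, its
field names, and the capture() helper are all hypothetical and exist
only for illustration:

```python
import json
from collections import namedtuple

# Hypothetical stand-in for sys.implementation; names and values are
# illustrative only.
Implementation = namedtuple("Implementation",
                            ["name", "version", "metadata"])
impl = Implementation("cpython", (3, 3, 0), "hypothetical-metadata")

# Requirement 1: fields cannot be rebound on an instance.
try:
    impl.name = "other"
    mutable = True
except AttributeError:
    mutable = False

# Requirement 2: repr() displays every field by name.
text = repr(impl)


# Requirement 3: a generic serialisation that never names a field
# explicitly, so it keeps working if fields are added later. Values
# are stored via repr(), which is the only operation assumed to be
# supported on the metadata field.
def capture(record):
    pairs = [[field, repr(value)]
             for field, value in zip(record._fields, record)]
    return json.dumps(pairs)


serialised = capture(impl)
print(mutable)
print(text)
print(serialised)
```

Because the record is ordered, capture() emits the fields in
declaration order without any call to sorted().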

It also shows that we only need to place very minimal constraints on
sys.implementation.metadata: the type of that structure can be
entirely up to the implementation, with the only requirement being
that repr(sys.implementation.metadata) should produce a string that
accurately captures the stored information. The only
cross-implementation operation that is supported on that field would
be to take its representation.

Now, because this is going to be in the sys module, for CPython, we
would actually need to use _structseq rather than
collections.namedtuple. To do so in a useful way, _structseq should
get two new additions:
- the "_fields" attribute
- the "_asdict" method

As an added bonus, sys.float_info and sys.hash_info would also gain
the new operations.
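
For illustration, the sketch below approximates what an "_asdict"
operation on sys.float_info would return. It does not use any existing
_structseq API: the asdict() helper is hypothetical, and the field
names are copied by hand in positional order, which is exactly the
step a "_fields" attribute would make unnecessary:

```python
import sys
from collections import OrderedDict

# Field names of sys.float_info in positional order, listed by hand
# because the struct sequence discussed here does not expose _fields.
FLOAT_INFO_FIELDS = (
    "max", "max_exp", "max_10_exp",
    "min", "min_exp", "min_10_exp",
    "dig", "mant_dig", "epsilon", "radix", "rounds",
)


def asdict(seq, fields):
    # Hypothetical helper: pair each name with the value at the same
    # position, preserving the order of the record.
    if len(fields) != len(seq):
        raise ValueError("field names do not match record length")
    return OrderedDict(zip(fields, seq))


info = asdict(sys.float_info, FLOAT_INFO_FIELDS)
for field, value in info.items():
    print(field, value)
```

With "_fields" available on the type itself, the hand-written name
list disappears and the same helper works for sys.hash_info or any
other struct sequence.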

Regards,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia