[IronPython] Array Access Problem

Tue May 10 19:57:48 CEST 2005

I think that value types are a real advantage of the Common Language
Infrastructure (CLI) even though they make life more difficult for
IronPython.  They're an advantage because they're an essential concept
for a number of data structures and people will use the notion of a
value type whether or not you give it to them.  In a case like that it's
better to capture it explicitly so that tools and languages can reliably
work with it.

One example of value types is Numeric Python's arrays of complex
numbers.  It's vital for the performance of these arrays that they're
encoded as arrays of C structs rather than arrays of pointers to small
objects.  If I wanted to implement these same arrays in Java I'd
probably implement them as a double[2*N] where I'd then manually keep
track of the different indices for the real and imaginary parts.  The use
of structs for complex numbers in Numeric Python arrays has never been a
source of complaints.  The reason is that complex numbers in Python are
immutable and none of the issues we're discussing here are a problem for
immutable data types.  If you're interested in a deeper technical dive
into value types, I'd recommend this blog post:
http://blogs.msdn.com/cbrumme/archive/2003/05/10/51425.aspx
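A minimal Python sketch of the manual Java-style layout described above: one flat buffer of 2*N doubles, with the real part of element i at index 2*i and the imaginary part at 2*i + 1. It uses the standard `array` module; the class name `InterleavedComplexArray` is hypothetical, and this is not how Numeric Python itself is implemented.

```python
from array import array


class InterleavedComplexArray:
    """Hypothetical helper: N complex numbers stored as 2*N contiguous C doubles."""

    def __init__(self, n):
        # One flat buffer [re0, im0, re1, im1, ...]; no per-element objects.
        self._data = array("d", [0.0] * (2 * n))
        self._n = n

    def __len__(self):
        return self._n

    def __getitem__(self, i):
        if not 0 <= i < self._n:
            raise IndexError(i)
        # Build a fresh (immutable) complex value from the two stored doubles.
        return complex(self._data[2 * i], self._data[2 * i + 1])

    def __setitem__(self, i, value):
        if not 0 <= i < self._n:
            raise IndexError(i)
        value = complex(value)
        # Copy the parts into the buffer; nothing keeps a reference to `value`.
        self._data[2 * i] = value.real
        self._data[2 * i + 1] = value.imag
```

Because Python's `complex` is immutable, handing out a fresh copy on every read is indistinguishable from handing out a reference, which is why this layout never surprises anyone.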

Since we need to live with value types as part of the cost of seamless
interaction with the rest of the CLS world, I think that Bob's ideas are
intriguing.  After looking at them more closely, I'm pretty sure that
mutable proxies would have even more potential for confusion than the
current situation of always copying value types.  The idea of treating
value types as immutable from IronPython is really appealing even though
that also has some significant unresolved issues.

> Bob Ippolito wrote:
> > (1) Don't have mutable value types, use a reference type that points
> > to a value type (some kind of proxy)

I don't think that this is possible to do in a consistent way and my
suspicion is that doing this half-way would be more confusing than not
doing it at all.  Let's walk through the original example:

>>> apt = Array.CreateInstance(Point, 1)
This creates a true CLI array of Point structs.

>>> pt = Point(1,2)
Today this makes a new Point struct and returns the boxed version of
that struct.  We could instead return a new instance of an imaginary new
type, ValueProxy<Point>.  This new instance is a standard reference type
that holds a point as its data.  This proxy will need to forward all
field, property and method accesses to the contained Point struct.
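A rough Python analogue of the imaginary ValueProxy<Point>: a reference object that owns a private copy of the struct's data and forwards attribute reads and writes to it. `Point` is a stand-in dataclass and `ValueProxy` is hypothetical; neither is an existing IronPython or CLI API.

```python
import copy
from dataclasses import dataclass


@dataclass
class Point:
    """Stand-in for a CLI Point struct with two mutable fields."""
    X: int = 0
    Y: int = 0


class ValueProxy:
    """Hypothetical reference type that owns a private copy of a struct."""

    def __init__(self, value):
        # Copy on construction, just as boxing copies the struct's data.
        object.__setattr__(self, "_value", copy.copy(value))

    def __getattr__(self, name):
        # Called only for names not found on the proxy: forward to the struct.
        return getattr(self._value, name)

    def __setattr__(self, name, val):
        # Forward field/property writes to the wrapped struct.
        setattr(self._value, name, val)


pt = ValueProxy(Point(1, 2))
pt.X = 5  # updates the proxy's own copy
```

Method calls are forwarded too, since `__getattr__` returns methods bound to the wrapped struct.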

>>> apt[0] = pt
What do we do here?  We need to copy the data in pt into apt[0].  This
is what it means to have an array of structs.  No matter what we do with
proxies or wrappers there's no way out of this copy.  We could add some
kind of pointer to the ValueProxy<Point> keeping track of the fact that
there's a copy of this variable now held in apt[0].  This would need to
be an arbitrarily large list of pointers.  This list would also be easy
to break with CLI code that directly modified apt or other containers
holding on to the value types.

>>> pt.X = 0
The only way this can modify apt[0] is if we keep the full list of
references in ValueProxy.  See above for why keeping that full list
still wouldn't always work.
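The bookkeeping described in the last two steps can be sketched as follows, assuming a hypothetical `StructArray` whose slots hold their own field values (as a CLI array of structs must) and a hypothetical `TrackingProxy` that remembers every slot it was copied into. Writing through the proxy updates those slots, but code that writes the array storage directly leaves the proxy's list silently stale.

```python
from dataclasses import dataclass


@dataclass
class Point:
    X: int = 0
    Y: int = 0


class StructArray:
    """Simulated CLI array of structs: each slot holds its own field values."""

    def __init__(self, n):
        self.slots = [Point() for _ in range(n)]


class TrackingProxy:
    """Hypothetical proxy that records every array slot holding a copy of it."""

    def __init__(self, x, y):
        self.value = Point(x, y)
        self.copies = []  # (array, index) pairs; can grow without bound

    def store_into(self, arr, i):
        # `apt[i] = pt`: copy the data (unavoidable) and remember where it went.
        arr.slots[i] = Point(self.value.X, self.value.Y)
        self.copies.append((arr, i))

    def set_x(self, x):
        # `pt.X = x`: update our own data, then every copy we know about.
        self.value.X = x
        for arr, i in self.copies:
            arr.slots[i].X = x


apt = StructArray(1)
pt = TrackingProxy(1, 2)
pt.store_into(apt, 0)
pt.set_x(0)                 # apt.slots[0] follows: Point(X=0, Y=2)
apt.slots[0] = Point(7, 7)  # CLI code replaces the slot, bypassing the proxy
pt.set_x(3)                 # ...yet pt still writes into it: Point(X=3, Y=7)
```

The last line shows the failure mode: the slot no longer holds a copy of `pt`, but `pt` clobbers it anyway.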

>>> apt[0].X = 0
This example would work using the ValueProxy that pointed to apt[0];
however, when apt[0] is assigned to a variable the situation becomes as
bad as it is for pt.
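To make `apt[0].X = 0` work, indexing could return a view bound to the array slot instead of a copy. This sketch uses hypothetical `SlotView` and `StructArray` classes and a stand-in `Point` dataclass. Writes through the view reach the slot, but storing the view into another slot still has to copy the struct, and from then on the two slots diverge.

```python
from dataclasses import dataclass


@dataclass
class Point:
    X: int = 0
    Y: int = 0


class SlotView:
    """Hypothetical proxy that refers to one slot of a struct array."""

    def __init__(self, arr, index):
        object.__setattr__(self, "_arr", arr)
        object.__setattr__(self, "_index", index)

    def __getattr__(self, name):
        return getattr(self._arr._slots[self._index], name)

    def __setattr__(self, name, val):
        # Writes go straight into the array's storage for this slot.
        setattr(self._arr._slots[self._index], name, val)


class StructArray:
    def __init__(self, n):
        self._slots = [Point() for _ in range(n)]

    def __getitem__(self, i):
        return SlotView(self, i)

    def __setitem__(self, i, value):
        # Assigning a struct (or a view) copies its fields into the slot.
        self._slots[i] = Point(value.X, value.Y)


apt = StructArray(2)
apt[0].X = 5   # works: the view writes into slot 0
p = apt[0]     # p is a view of slot 0
apt[1] = p     # copy: slot 1 gets its own Point(X=5, Y=0)
p.X = 9        # changes slot 0 only
```

After the last line, `p` keeps writing into `apt[0]` while `apt[1]` holds an independent copy, which is the same situation as `pt` above.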

>>> for pt in apt:
...     pt.X = 0
The for loop uses an Enumerator to loop through the points in apt.
Without constructing a custom enumerator for arrays there's no way to
get anything but copy semantics here.  While we could build a custom
enumerator for arrays this wouldn't solve the general case of value
types being returned from methods.
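The two behaviors can be contrasted in a Python sketch where a list of mutable dataclass instances stands in for a CLI array of structs (all names hypothetical). The default enumerator yields copies, as with value types today; a custom array enumerator yields the stored elements themselves, which in this simulation gives reference semantics.

```python
import copy
from dataclasses import dataclass


@dataclass
class Point:
    X: int = 0
    Y: int = 0


def copying_enumerator(items):
    """Models enumeration of value types today: every item is a fresh copy."""
    for item in items:
        yield copy.copy(item)


def array_enumerator(items):
    """Hypothetical array-specific enumerator: yields the stored elements themselves."""
    for item in items:
        yield item


apt = [Point(1, 2), Point(3, 4)]

for pt in copying_enumerator(apt):
    pt.X = 0  # changes a copy; apt is untouched

for pt in array_enumerator(apt):
    pt.X = 0  # changes the elements stored in apt
```

Only the second loop changes `apt`. A custom enumerator helps only for arrays, though; any method that returns a `Point` still hands back a copy.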

When I played with this example in C#, I discovered something
interesting:

Point[] pa = new Point[3];
foreach (Point p in pa) {
    p.X = 10;   // compile-time error: p is a read-only foreach iteration variable
}

The code above generates an error from the C# compiler:
"Cannot modify members of 'p' because it is a 'foreach iteration
variable'"

The C# compiler is treating these iteration variables as semi-immutable
in order to minimize the confusion that can come from the copy semantics
of value types.  This seems like a promising idea...
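The C# rule can be approximated in Python by having iteration hand out read-only wrappers, so an attempted field write fails at once instead of silently modifying a copy. `ReadOnlyView` and `readonly_iter` are hypothetical names, and unlike the C# check this one happens at run time rather than compile time.

```python
from dataclasses import dataclass


@dataclass
class Point:
    X: int = 0
    Y: int = 0


class ReadOnlyView:
    """Hypothetical wrapper: field reads are forwarded, field writes are refused."""

    def __init__(self, value):
        object.__setattr__(self, "_value", value)

    def __getattr__(self, name):
        return getattr(self._value, name)

    def __setattr__(self, name, val):
        raise AttributeError(
            f"Cannot modify members of {type(self._value).__name__} "
            "because it is a loop iteration variable"
        )


def readonly_iter(items):
    """Yield each item wrapped so the loop body cannot assign to its fields."""
    for item in items:
        yield ReadOnlyView(item)
```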

> > (2) Make value types immutable (or at least the ones you grab from
> > collections)

All of the problems with value types stem from their mutability.  Nobody
ever complains that int, double, char, etc. are value types because
those are all immutable.  For immutable objects there's no difference
between pass by reference and pass by value.
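A Python analogy, with a frozen dataclass standing in for an immutable value type: whether a callee receives the original object or a copy, it cannot change the caller's value, so the two calling conventions behave the same apart from object identity. The function `shifted` is hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Point:
    X: int = 0
    Y: int = 0


def shifted(p, dx):
    """A callee cannot change p; it can only hand back a new value."""
    return replace(p, X=p.X + dx)


a = Point(1, 2)
by_reference = a          # same object as a
by_value = replace(a)     # distinct copy with equal fields
```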

The CLR team's API Design Guidelines say this:
- Do not create mutable value types.
http://blogs.msdn.com/kcwalina/archive/2004/09/28/235232.aspx
(or see here - http://peter.golde.org/2003/10/13.html#a16)

In some ways, this would simply carry that good design sense over into
IronPython.

One advantage of immutability is that it would make failures like the
following much more obvious:

>>> apt[0].X = 0
If value types were immutable this would throw.  The exception message
might give people enough information to get started tracking down the
issue and modifying their code to work correctly.
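The effect can be previewed in plain Python with a frozen dataclass standing in for a value type that IronPython would treat as immutable. This is an analogy, not IronPython's actual mechanism, and `try_set_x` is a hypothetical helper: the assignment that would silently change a temporary copy under copy semantics now raises instead.

```python
from dataclasses import FrozenInstanceError, dataclass


@dataclass(frozen=True)
class Point:
    """Stand-in for a value type that IronPython treats as immutable."""
    X: int = 0
    Y: int = 0


def try_set_x(points, i, x):
    """Attempt `points[i].X = x`; return the error message if it is refused."""
    try:
        points[i].X = x
    except FrozenInstanceError as err:
        return str(err)
    return None
```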

What are the problems with this approach?

1. C#/VB examples won't port very naturally to IronPython and the docs
will need a section explaining the various workarounds to the fact that
IronPython doesn't support this idiom.  This isn't ideal, but I could
easily live with this doc burden.

2. There's no way that I know of to make a value type 100% immutable
without controlling its implementation.  IronPython could block setting
of fields and properties on value types, but there's no way to reliably
detect and block all mutations made inside methods.  Blocking just the
property and field sets would probably cover 95% of the cases where people
try to mutate a value type, but it seems pretty awkward to me to say
that value types in IronPython are sort-of immutable unless they have
mutating methods.  The fact that this is what the C# compiler does for
iteration variables is encouraging, at least in that it's a precedent.

3. There might be things that are impossible to express with this
restriction.  I don't think that's true, particularly with the use of
named parameters to initialize fields and properties in the value type's
constructor.  However, one of the principles of IronPython is that it
should be able to use any CLS library and it's possible there's some
weird library design with value types that wouldn't work if they were
considered virtually immutable by IronPython.
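The loophole in point 2 can be sketched in Python with a hypothetical `FieldLockedView` that refuses field and property sets but forwards method calls to the underlying struct. A mutating method (in the spirit of `System.Drawing.Point.Offset`) still changes the value, which is exactly the "sort-of immutable" behavior described above.

```python
from dataclasses import dataclass


@dataclass
class Point:
    X: int = 0
    Y: int = 0

    def offset(self, dx, dy):
        # A mutating method: it changes the struct's own fields.
        self.X += dx
        self.Y += dy


class FieldLockedView:
    """Hypothetical wrapper that blocks field/property sets only."""

    def __init__(self, value):
        object.__setattr__(self, "_value", value)

    def __getattr__(self, name):
        # Method lookups land here too, so they are bound to the real struct.
        return getattr(self._value, name)

    def __setattr__(self, name, val):
        raise AttributeError(f"cannot set {name!r} on a value type")


p = Point(1, 2)
view = FieldLockedView(p)
view.offset(1, 1)  # not blocked: p is now Point(X=2, Y=3)
```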

If we went down the immutable value type route, it would be interesting
to look at different kinds of sugar that could be provided to make the
impact on most programs less than it currently is.
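As one illustration of such sugar (all names hypothetical, with a frozen Python dataclass standing in for a CLI value type): initialize values with named parameters, and replace `apt[0].X = 0` with a helper that builds a modified copy and writes it back into the slot. In IronPython itself the copy would come from calling the struct's constructor, e.g. `Point(0, apt[0].Y)`, since `dataclasses.replace` only works on dataclasses.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Point:
    """Stand-in for a value type treated as immutable."""
    X: int = 0
    Y: int = 0


def update_item(seq, index, **changes):
    """Hypothetical sugar for `seq[index].Field = value` on an immutable value.

    Builds a copy of the element with the named fields changed and stores the
    copy back into the same slot, so no existing value is ever mutated.
    """
    seq[index] = replace(seq[index], **changes)
    return seq[index]


apt = [Point(X=1, Y=2)]   # named parameters initialize the fields up front
update_item(apt, 0, X=0)  # instead of apt[0].X = 0
```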

-Jim