[Python-Dev] Re: [Patches] make 'b','h','i' raise overflow exception

Trent Mick trentm@activestate.com
Mon, 8 May 2000 13:29:21 -0700


On Mon, May 08, 2000 at 10:00:30AM -0400, Guido van Rossum wrote:
> > Changes the 'b', 'h', and 'i' formatters in PyArg_ParseTuple to raise an
> > Overflow exception if they overflow (previously they just silently
> > overflowed).
> 
> Trent,
> 
> There's one issue with this: I believe the 'b' format is mostly used
> with unsigned character arguments in practice.
> However on systems
> with default signed characters, CHAR_MAX is 127 and values 128-255 are
> rejected.  I'll change the overflow test to:
> 
> 	else if (ival > CHAR_MAX && ival >= 256) {
> 
> if that's okay with you.
> 
Okay, I guess. Two things:

1. In a way this defeats the main purpose of the checks: a silent overflow
can still happen for a signed byte value over CHAR_MAX. The only way to do
the bounds checking automatically is if the exact type is known, i.e. with
different formatters for signed and unsigned integral values. I don't know
if that is desired (is it?). The obvious choice, a 'u' prefix to specify
unsigned, is of course not an option.

Another option might be to document 'b' as taking an unsigned char and 'h',
'i', 'l' as taking signed integral values, and then to set the bounds checks
appropriately ([0, UCHAR_MAX] for 'b'). Can we constrain the formatters like
that? I.e. we would be limiting the user to unsigned or signed values
depending on the formatter. (Which, again, means that it would be nice to
have separate formatters for signed and unsigned values.) I think the bounds
checking is false security unless some such restriction is made.
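To make that concrete, here is a minimal sketch of what per-format bounds
could look like under that convention. This is my illustration only -- the
helper name and structure are hypothetical, not code from getargs.c:

    #include <limits.h>

    /* Hypothetical helper, not actual getargs.c code: does 'ival' fit
     * the range implied by the format character, taking 'b' to mean
     * unsigned char and 'h'/'i' to mean signed short/int? */
    static int
    fits_format(long ival, char fmt)
    {
        switch (fmt) {
        case 'b':   /* unsigned char: [0, UCHAR_MAX] */
            return 0 <= ival && ival <= UCHAR_MAX;
        case 'h':   /* signed short: [SHRT_MIN, SHRT_MAX] */
            return SHRT_MIN <= ival && ival <= SHRT_MAX;
        case 'i':   /* signed int: [INT_MIN, INT_MAX] */
            return INT_MIN <= ival && ival <= INT_MAX;
        default:    /* 'l' and others: no narrowing from long */
            return 1;
        }
    }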


2. The above aside, I would be more inclined to change the line in question
to:

   else if (ival > UCHAR_MAX) {

as this is more explicit about what is being done. (Given 8-bit chars the
two tests accept exactly the same values -- both reject only 256 and up --
but the intent is much clearer this way.)

> Another issue however is that there are probably cases where an 'i'
> format is used (which can't overflow on 32-bit architectures) but
> where the int value is then copied into a short field without an
> additional check...  I'm not sure how to fix this except by a complete
> inspection of all code...  Not clear if it's worth it.

Yes, a complete code inspection seems to be the only way. That is some of
what I am doing. Again, I have two questions:

1. There are a fairly large number of downcasts in the Python code (not
necessarily tied to PyArg_ParseTuple results). I was wondering if you think
a generalized check on each such downcast would be advisable. This would
take the form of a macro that does a bounds check before doing the cast. A
common example is the cast of strlen's size_t return value to int -- Python
strings use an int for their length, so on 64-bit systems this is a
downcast:

  size_t len = strlen(s);
  obj = PyString_FromStringAndSize(s, len);

would become
  
  size_t len = strlen(s);
  obj = PyString_FromStringAndSize(s, CAST_TO_INT(len));

CAST_TO_INT would ensure that 'len' fits in an int and would raise an
exception otherwise.
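As a strawman, the macro could look something like the following. This is
only my sketch -- the error value (-1 here) and the exact error-checking
convention would need discussion:

    #include <limits.h>
    #include "Python.h"

    /* Strawman only: checked downcast from size_t to int.  On
     * overflow this sets OverflowError and evaluates to -1; the
     * caller must then detect the error via PyErr_Occurred(). */
    #define CAST_TO_INT(v) \
        (((size_t)(v) > (size_t)INT_MAX) \
         ? (PyErr_SetString(PyExc_OverflowError, \
                            "value too large to convert to int"), -1) \
         : (int)(v))

A call site would then follow up with something like
'if (PyErr_Occurred()) return NULL;', which is the cost noted under Cons
below.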

Pros:

- should never have to worry about overflows again
- easy to find (given MSC warnings) and easy to code (straightforward)

Cons:

- more code, more time to execute
- looks ugly
- have to check PyErr_Occurred every time a cast is done


I would like other people's opinion on this kind of change. There are three
possible answers:

  -1 this is a bad idea because...<reason>
  +1 this is a good idea, go for it
  +0 (most likely) this is probably a good idea for cases where the
     overflow *could* happen; however, the strlen example I gave is
     *not* such a case. As Tim Peters said, a 2GB limit on string
     lengths is a reasonable assumption/limitation.



2. Microsoft's compiler gives good warnings for casts where information
loss is possible. However, I cannot find a way to get similar warnings out
of gcc. Does anyone know if that is possible? I.e. the following should
draw a warning:

    int i = 123456;
    short s = i;  /* possible loss of information */


Thanks,
Trent

-- 
Trent Mick
trentm@activestate.com