[Python-Dev] hey, who broke the array module?

Jack Jansen jack@oratrix.nl
Wed, 07 Jun 2000 23:47:01 +0200


Recently, Guido van Rossum <guido@python.org> said:
> > On Thu, May 18, 2000 at 12:01:16PM +0200, Jack Jansen wrote:
> > > > I broke it with my patches to test overflow for some of the PyArg_Parse*()
> > > > formatting characters. The upshot of testing for overflow is that now those
> > > > formatting characters ('b', 'h', 'i', 'l') enforce signed-ness or
> > > > unsigned-ness as appropriate (you have to know if the value is signed or
> > > > unsigned to know what limits to check against for overflow). Two
> > > > possibilities presented themselves:
> > > 
> > > I think this is a _very_ bad idea. I have a few thousand (literally) routines
> > > calling to Macintosh system calls that use "h" for 16 bit flag-word values,
> > > and the constants are all of the form
> > > 
> > > kDoSomething = 0x0001
> > > kDoSomethingElse = 0x0002
> > > ...
> > > kDoSomethingEvenMoreBrilliant = 0x8000
> > > 
> > > I'm pretty sure other operating systems have lots of calls with similar 
> > > problems. I would strongly suggest using a new format char if you want 
> > > overflow-tested integers.
> > 
> > Sigh. What do you think, Guido? This is your call.
> > 
> > 1. go back to no bounds testing
> > 2. bounds check for [SHRT_MIN, USHRT_MAX] etc (this would allow signed and
> > unsigned values but is sort of false security for bounds checking)
> > 3. keep it the way it is: 'b' is unsigned and the rest are signed
> > 4. add new format characters or a modifying character for signed and unsigned
> > versions of these.
> 
> Sigh indeed.  Ideally, we'd introduce H for unsigned and then lock
> Jack in a room with his Macintosh computer for 48 hours to fix all his
> code...
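
To make the options above concrete, here is a rough Python sketch of the 16-bit range checks for 'h'. It assumes a 16-bit C short (as on common platforms), and the helper names are hypothetical; this is not how PyArg_Parse*() is actually implemented.

```python
# Sketch of the 16-bit range checks discussed above for the 'h' format.
# Assumes a 16-bit C short; the function names are hypothetical, not CPython's API.

SHRT_MIN, SHRT_MAX, USHRT_MAX = -(1 << 15), (1 << 15) - 1, (1 << 16) - 1


def fits_signed_short(value):
    """Option 3: 'h' treated strictly as a signed short."""
    return SHRT_MIN <= value <= SHRT_MAX


def fits_unsigned_short(value):
    """A separate unsigned format (such as the 'H' Guido suggests)."""
    return 0 <= value <= USHRT_MAX


def fits_either_short(value):
    """Option 2: accept anything in [SHRT_MIN, USHRT_MAX]."""
    return SHRT_MIN <= value <= USHRT_MAX


def as_c_short(value):
    """Reinterpret an accepted value as the signed bit pattern a C short holds."""
    if not fits_either_short(value):
        raise OverflowError("%r does not fit in 16 bits" % (value,))
    value &= 0xFFFF
    return value - (1 << 16) if value & 0x8000 else value


kDoSomethingEvenMoreBrilliant = 0x8000  # flag constant from the Mac example
print(fits_signed_short(kDoSomethingEvenMoreBrilliant))    # False
print(fits_unsigned_short(kDoSomethingEvenMoreBrilliant))  # True
print(fits_either_short(kDoSomethingEvenMoreBrilliant))    # True
print(as_c_short(kDoSomethingEvenMoreBrilliant))           # -32768
```

Under option 2 both -1 and 0xFFFF are accepted and end up as the same bit pattern, which is why that option only gives a sort of false security.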

Hmm, hmm. As I said before, I'm not too fond of this: it is a
gratuitous change that breaks lots of things, and I don't really see
all that much benefit in it. It's not only the Mac modules: the array
module was what started this discussion, test_cPickle and test_pkg
crash, socketmodule and os.stat() are affected, and I assume all code
that uses structmodule will also have to be looked at. Quite possibly
there's a lot more.

As a datapoint: a quick search for 8-digit hex numbers starting with
digit 8-F and not suffixed with an L throughout the Python files in
the distribution found a stunning 464 matches. Add another 1147
4-digit hex numbers with the sign bit on and there are an awful lot of 
candidates for breaking...
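
The search above can be approximated with a couple of regular expressions. This is only a sketch under my own assumptions about what counts as a match (exactly 8 or 4 hex digits, leading digit 8-F, no trailing L or further digits); it is not the command that produced the numbers above, so its counts need not reproduce 464 and 1147.

```python
import os
import re

# Hypothetical patterns: a hex literal whose leading digit is 8-F (sign bit set),
# with exactly 8 (or 4) digits, not followed by another hex digit or an L suffix.
SIGNED_32 = re.compile(r"\b0[xX][89a-fA-F][0-9a-fA-F]{7}(?![0-9a-fA-FlL])")
SIGNED_16 = re.compile(r"\b0[xX][89a-fA-F][0-9a-fA-F]{3}(?![0-9a-fA-FlL])")


def count_candidates(text):
    """Return (8-digit, 4-digit) counts of sign-bit-set hex literals in text."""
    return len(SIGNED_32.findall(text)), len(SIGNED_16.findall(text))


def scan_tree(root):
    """Sum the counts over every .py file below root."""
    total32 = total16 = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(".py"):
                path = os.path.join(dirpath, name)
                # latin-1 decodes any byte, so odd files don't abort the scan
                with open(path, encoding="latin-1") as f:
                    n32, n16 = count_candidates(f.read())
                total32 += n32
                total16 += n16
    return total32, total16


sample = "kDoSomething = 0x0001\nkBig = 0x8000\nmask = 0xFFFFFFFF\nok = 0x80000000L\n"
print(count_candidates(sample))  # (1, 1)
```

Running scan_tree() on the Lib directory of a source tree would give a comparable, though not necessarily identical, datapoint.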
--
Jack Jansen             | ++++ stop the execution of Mumia Abu-Jamal ++++
Jack.Jansen@oratrix.com | ++++ if you agree copy these lines to your sig ++++
www.oratrix.nl/~jack    | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm