[Python-Dev] PyArg_ParseTuple and 16 bit bitpatterns

Tue, 4 Jul 2000 16:42:58 -0700

Hi Jack,

I am your evil man for this one (but then you already knew that).

On Tue, Jul 04, 2000 at 11:50:02PM +0200, Jack Jansen wrote:
> I've adapted PyArg_ParseTuple (and Py_BuildValue) to understand the H
> format specifier, which is meant for 16-bit bitpatterns. (in case you
> didn't follow the discussion last month: the old lowercase h now
> checks values to be in the range -32768..32767, so constants like
> 0x8000 are not acceptable anymore).

I think that the Right Answer is:

b = signed byte
B = unsigned byte
h = signed short
H = unsigned short
i = signed int
I = unsigned int
l = signed long
L = unsigned long

Before my patch we had (no range checking was done so signed vs. unsigned
made no difference):

b = byte
h = short
i = int
l = long
L = LONG_LONG

After my patch (i.e. now) we have:

b = unsigned byte
h = signed short
i = signed int
l = signed long
L = signed LONG_LONG

Notes on that:
- The choice of signed or unsigned for each was based on the common case
  (or on my presumption of what the common case is). For example, unsigned
  bytes are more common than signed bytes.
- unsigned byte is 'b' and not (my ideal) 'B', for backward compatibility
  reasons
- LONG_LONG as 'L' really sucks because that is the blocker to
  PyArg_ParseTuple nirvana (my first list). It should be 'q' and 'Q' for Quad
  or something like that.

Your patch is adding (right?):

H = unsigned short

Aside: Now that it will be called 2.0, would we maybe want to go for the
Right Answer? I suspect that a *lot* more people would complain of breakage
with the use of 'L' changing to 'Q', and that I am asking for a lynching.

> 
> I haven't added an I and L specifier, because (surprise, surprise:-)
> for 32-bit integers 0x80000000 turns out to be a legal value, unlike
> for their poor 16-bit brethren.

I can't see how 'I' allows 0x80000000 (unless, of course, sizeof(int) > 4 on
your machine) because the 'I' formatter is not in PyArg_ParseTuple. Yes, 'L'
will probably accept 0x80000000, because LONG_LONG is probably 64 bits wide
on your machine.
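
To make the 0x80000000 case concrete, here is a minimal sketch of the kind
of range check I mean. It assumes a 32-bit int and a 64-bit LONG_LONG, which
I model with int32_t/int64_t from <stdint.h>. The function names are
hypothetical and this is not the code in getargs.c.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical check: does v fit in a signed 32-bit integer?
 * int64_t stands in for a 64-bit LONG_LONG holding the parsed value. */
int
fits_signed_32(int64_t v)
{
    return v >= INT32_MIN && v <= INT32_MAX;
}

/* Usage: 0x80000000 is one past INT32_MAX, so a range-checked signed
 * 32-bit format has to reject it, while a 64-bit type holds it easily. */
void
signed_32_examples(void)
{
    assert(!fits_signed_32(INT64_C(0x80000000)));
    assert(fits_signed_32(INT64_C(0x7fffffff)));
}
```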

> 
> I've currently implemented H as meaning unsigned (i.e. 0..0xffff), but 

I would suggest allowing [0, USHRT_MAX]. USHRT_MAX is defined in limits.h.
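
As a rough sketch of what a [0, USHRT_MAX] check for 'H' could look like:
the helper name and its return convention below are my own invention for
illustration, not what your patch or getargs.c actually does.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper: store value in *out if it lies in
 * [0, USHRT_MAX]; return 1 on success and 0 if it is out of range
 * (the caller would then raise the Python exception). */
int
convert_ushort(long value, unsigned short *out)
{
    if (value < 0 || value > USHRT_MAX)
        return 0;
    *out = (unsigned short)value;
    return 1;
}

/* Usage: 0x8000 is accepted as a bit pattern, -1 is rejected. */
void
convert_ushort_examples(void)
{
    unsigned short v = 0;
    assert(convert_ushort(0x8000, &v) && v == 0x8000);
    assert(!convert_ushort(-1, &v));
}
```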

> on second thoughts I think allowing -32768..0xffff might be better:
> there's probably lots of code out there that passes -1 when all 16
> flag bits should be set. Please let me know if you have strong

I think that uses of -1 should use either USHRT_MAX or (unsigned short)-1.
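
For what it's worth, the two spellings are equivalent, since C defines the
conversion of -1 to an unsigned type as wrapping to that type's maximum.
A small sketch (the macro and function names are made up for illustration):

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical "all 16 flag bits set" constant, spelled with a cast
 * rather than passing a bare -1 to the 'H' format. */
#define ALL_FLAGS_SET ((unsigned short)-1)

/* Returns 1 when the cast spelling equals USHRT_MAX, which C guarantees. */
int
all_flags_is_ushrt_max(void)
{
    return ALL_FLAGS_SET == USHRT_MAX;
}
```

Either form stays inside [0, USHRT_MAX], so it passes an unsigned range
check where a plain -1 would not.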

> opinions on either meaning before I check this in.
>

> <grumpy mode="on">Note that I'll only adapt PyArg_ParseTuple and the
> gazzilion mac-specific occurrences of "h" where a 16-bit pattern is
> needed. I've done only a very cursory check of other occurences of
> "h", but someone else will have to pick that up if they feel like.
> </grumpy>

I am sorry to have been the cause of work for you. I just think that the
'security' offered by bounds checking all values converted by
PyArg_ParseTuple is important.

Trent

-- 
Trent Mick
trentm@activestate.com