strong/weak typing and pointers

Thu Nov 4 17:19:07 EST 2004

Steven Bethard <steven.bethard at gmail.com> wrote:
   ...
> Yeah, this goes to the heart of the misunderstanding.  I'm not asking
> anyone to justify the _existence_ of weak-typing.  Weak-typing is a direct
> result of a language's support for untyped (bit/byte) data.  I agree 100%
> that this sort of data is not only useful, but often essential in any
> low-level (e.g. OS, hardware driver, etc.) code.

But so is the ability to get at the same bits/bytes in structured ways.

> > So, we have an area of 8 bytes in memory which we need to be able to
> > treat as:
> >     8 bytes, for I/O purposes, say;
> >     a float, to feed it to some specialized register, say;
> >     a bit indicating sign plus 15 for mantissa plus 48 for significand,
> >         or the like, to perform masking and shifting thereof in SW -- a
> >         structure of three odd-bit-sized integers juxtaposed;
> 
> As a quick refresher, I quote myself in what I was looking for: "taking
> advantage of weak-typing would be a case where you treat the bits as three
> different things: the sequence of bits, and two (mutually exclusive)
> intended structures."
> 
> My response to this example is that your two intended structures are not
> mutually exclusive.  Yes, you have to do some bit-twiddling, but only
> because your float struct doesn't have get_sign, get_mantissa and
> get_significand methods.  ;)  You're still dealing with the same
> representation, not converting to a different type.  You're just
> addressing a lower level part of the representation.

What do you mean by "mutually exclusive"?  "Never useful at the same
time"?  You're asking for an example of things never useful at the same
time that are useful at the same time?!

The struct type with so many bits being signs, exponent, significands,
IS a distinct type from double-precision float -- it's the
representation of the latter according to some standard.  To multiply by
0.1 I have to have a float, to 'get the N-bit integer that gives the
exponent shifted right by 3' I have to have that struct type.  They're
totally distinct (not "mutually exclusive" because they ARE useful as
ways to look at the same bitbunch at the same time, of course) types,
ways to analyze or interpret the same bunch of bits (apart from the
untyped representation where I can do binary I/O with them, too).
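
A minimal Python sketch of those three views, under assumptions that are
mine and not from the thread: it uses the IEEE 754 double layout (1 sign
bit, 11 exponent bits, 52 significand bits) rather than the "or the like"
split quoted above, and the name three_views is hypothetical.  Python
copies the bytes rather than superimposing types on one memory area, so
this only illustrates the idea:

```python
import math
import struct

def three_views(value: float):
    """Return the same 8 bytes seen as raw bytes, a float, and bitfields."""
    raw = struct.pack(">d", value)           # untyped: 8 bytes, e.g. for I/O
    (as_float,) = struct.unpack(">d", raw)   # typed: a double-precision float
    (bits,) = struct.unpack(">Q", raw)       # typed: one 64-bit unsigned int
    sign = bits >> 63                        # 1 bit
    exponent = (bits >> 52) & 0x7FF          # 11 bits, biased by 1023 (assumed IEEE 754)
    significand = bits & ((1 << 52) - 1)     # 52 bits, implicit leading 1
    return raw, as_float, (sign, exponent, significand)

raw, as_float, (sign, exponent, significand) = three_views(-0.1)
# Rebuild the (normal) float from its fields to show both views agree.
rebuilt = (-1) ** sign * math.ldexp(1 + significand / 2**52, exponent - 1023)
```

Multiplying by 0.1 needs as_float; shifting the exponent needs the
bitfield view; binary I/O needs raw.  All three describe the same bits.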

> I can see the point though: at least in most of the languages I'm familiar
> with, float is declared as a type while there's no subtype of float that
> specifies the sign, mantissa and significand.

Right.  To get at the bitfields, you use weak typing instead.

> > Another example: we're going to send a controlblock of 64 bytes to some
> > HW peripheral, and get it back perhaps with some mods -- a typical
> > control/status arrangement.  Depending on the top 2 (or in some cases 4)
> > bytes' value, the structure may need to be interpreted in several
> > possible ways, in terms of juxtaposition of characters, halfwords and
> > longwords.  Again, the driver responsible for talking with this
> > peripheral needs to be able to superimpose on the 64 bytes any of
> > several possible C-level struct's -- the cleanest way to do this would
> > appear to be pointer-casting, though unions would (as usual, of course)
> > be essentially equivalent.
> 
> Is the interpretation of the controlblock uniquely defined by the top 2 or 4
> bytes, or are there some values for the top 2 or 4 bytes for which I have to
> apply two different interpretations (C-level structs) to the same sequence of
> bits?

In the HW I was thinking of, the former is the case.

> If the top 2 or 4 bytes uniquely define the structs, then I would just say
> you're just going back and forth between a typed structure and its untyped
> representation.  If the top 2 or 4 bytes can specify multiple interpretations
> for the same sequence of bits, then this is the example I was looking for. =)

I need to examine the top bytes of the block as the HW returned it, in
some cases, to know what struct type is most useful to interpret the
bunch of bits.  There is typically only one type (besides 'just a bunch
of 64 bytes') that is useful at _one_ given time.  But weak typing does
not require parallel processing without locks -- only if two independent
threads of control were looking at the same bits concurrently from two
separate processors would saying "at ONE time" make sense... true and
unfettered concurrent access...
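
A rough sketch of that kind of dispatch, using ctypes.  Everything
specific here is hypothetical and only for illustration: the two
layouts, their field names, the kind codes 1 and 2, and little-endian
byte order are all assumptions, not the real hardware's format:

```python
import ctypes

BLOCK_SIZE = 64

class StatusBlock(ctypes.LittleEndianStructure):   # hypothetical layout
    _fields_ = [
        ("kind", ctypes.c_uint16),
        ("status", ctypes.c_uint16),
        ("count", ctypes.c_uint32),
        ("payload", ctypes.c_uint8 * 56),
    ]

class TextBlock(ctypes.LittleEndianStructure):     # hypothetical layout
    _fields_ = [
        ("kind", ctypes.c_uint16),
        ("length", ctypes.c_uint16),
        ("text", ctypes.c_uint8 * 60),
    ]

# Both layouts must cover exactly the same 64 bytes.
assert ctypes.sizeof(StatusBlock) == ctypes.sizeof(TextBlock) == BLOCK_SIZE

LAYOUTS = {1: StatusBlock, 2: TextBlock}           # assumed kind codes

def interpret(block: bytes):
    """Look at the top 2 bytes, then view the whole block as that layout."""
    if len(block) != BLOCK_SIZE:
        raise ValueError("control block must be exactly 64 bytes")
    kind = int.from_bytes(block[:2], "little")     # untyped peek at the bytes
    return LAYOUTS[kind].from_buffer_copy(block)   # typed view of a copy
```

Here from_buffer_copy plays the role of the pointer cast: the bytes
arrive untyped, and the discriminator picks which struct to lay over
them.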

As for two different interpretations of the same bits being useful (not
"at the same time"), consider a 16-bit field that can be seen as one
16-bit word or two 8-bit bytes.  In the former case, '0' means the whole
operation concluded successfully, any non-0 means problems were
encountered.  So, a piece of code that just needs a pass/nonpass filter
on the operation is best advised to treat that field as a 16-bit word,
so it can test it for == or != 0 atomically.

At a deeper level, one byte indicates possible problems of one kind (say
ones "intrinsic" to the procedure/operation in question), another
indicates possible problems of a different kind (say ones "extrinsic" to
the procedure per se, but caused by preemption, power failures, etc).
Unix return-status values aren't too far away from this.  If you need
accurate diagnosis of what went wrong, seeing the same field as two
8-bit bytes is handier (assuming you can get some kind of lock in that
case, since you are then dealing with nonatomic testing).

You could see a test such as "if x->field16 == 0:" as a weird shorthand
for "if x->field8_a == 0 and x->field8_b == 0:", but depending on
considerations of atomicity it might not even be.
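
As a sketch only, a ctypes union can show the two views of one 16-bit
field.  The class names are hypothetical, which byte lands in field8_a
depends on the machine's byte order, and nothing in Python demonstrates
the atomicity point, which is a property of the hardware and compiled
code:

```python
import ctypes

class TwoBytes(ctypes.Structure):
    _fields_ = [("field8_a", ctypes.c_uint8),
                ("field8_b", ctypes.c_uint8)]

class Status(ctypes.Union):
    # field16 and the pair (field8_a, field8_b) share the same 2 bytes.
    _anonymous_ = ("parts",)
    _fields_ = [("field16", ctypes.c_uint16),
                ("parts", TwoBytes)]

def passed(status: Status) -> bool:
    """Pass/nonpass filter: one comparison on the 16-bit view."""
    return status.field16 == 0

def diagnose(status: Status):
    """Detailed view: the two 8-bit codes, read separately."""
    return status.field8_a, status.field8_b
```

Setting either byte makes passed() return False, while diagnose() says
which kind of problem was reported.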

Another example where the same sequence of bits may be usefully
interpreted in more ways at the same time: given a string of bytes which
encodes some unicode text in utf-8 it's clearly useful to consider it as
such, parsing it left to right byte by byte to find the unicode chars
being encoded and display the proper glyphs, etc.  But I may also want
to walk the same area of memory as a sequence of 64-bit words to compute
a simple checksum to ensure data integrity (as well as the usual need
for 'untyped' bytescan for I/O).  Or, say I don't know whether the
incoming data were utf-8 or utf-16; by walking over them in both 1-byte
(utf-8) and 2-byte units I may well be able to get strong heuristic
indications of which of the two encodings was in use.  Similar
heuristics are sometimes very useful even in determining whether a bunch
of 4-byte words from a record are floats or ints -- as long, of course,
as you CAN walk them both ways and compare strangeness-indicators.  If
you ever need to recover old data from datasets whose details were lost,
you'll find that out for yourself.
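
A crude sketch of walking the same bytes more than one way.  The
function names, the XOR checksum, the zero padding, the choice of
utf-16-le, and the "mostly printable ASCII" strangeness measure are all
assumptions for illustration; a real heuristic would need to be much
more careful:

```python
import struct

def checksum64(data: bytes) -> int:
    """XOR the data as little-endian 64-bit words (zero-padded at the end)."""
    padded = data + b"\x00" * (-len(data) % 8)
    total = 0
    for (word,) in struct.iter_unpack("<Q", padded):
        total ^= word
    return total

def plausibility(data: bytes, encoding: str) -> float:
    """Fraction of decoded characters that look like ordinary ASCII text."""
    try:
        text = data.decode(encoding)
    except UnicodeDecodeError:
        return 0.0
    if not text:
        return 0.0
    ok = sum(1 for ch in text
             if ch.isascii() and (ch.isprintable() or ch in "\r\n\t"))
    return ok / len(text)

def guess_encoding(data: bytes) -> str:
    """Walk the bytes in 1-byte and 2-byte units; keep the less strange view."""
    scores = {enc: plausibility(data, enc) for enc in ("utf-8", "utf-16-le")}
    return max(scores, key=scores.get)
```

The same buffer is read as 64-bit words for the checksum and as 1-byte
or 2-byte units for the encoding guess, with neither reading being the
"real" type of the data.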

Alex