Getting started

Alex Martelli aleax at aleax.it
Mon Sep 23 05:17:48 EDT 2002


James J. Besemer wrote:

> 
> Alex Martelli wrote:
> 
>>$a = "2.3";
>>$b = 45;
>>print $a+$b;
>>
>>emitting 47.3 is pretty close to what I consider "weak typing".
>>
> 
> I don't see why you call this "weak typing".  In Perl, the "+" operator
> is defined to work with strings or numbers.  

It's defined to work *INDIFFERENTLY* on numbers, or on strings that
look like numbers, "ignoring" the types of the operands.  Since most
(all?) operations are designed to ignore the operands' types, this
makes those types much weaker than they would be in languages where
the operands' types are respected.


> It may seem more confusing

Not at all: to the typical newbie, it seems LESS confusing to be
allowed to ignore the differences between numbers and strings,
because the significance of the types is so weak.  Like so many
other aspects of Perl, weak typing is *intended* to be a handy
convenience for the user.  It turns out NOT to be, IMNSHO (or
else I'd still be using Perl, no?-), because it ends up as a
collection of traps and pitfalls.  E.g., if $b erroneously turns
out to be "hello world!", $a+$b silently and transparently masks
the error by treating $b just as if its value was 0.  By masking
all kinds of such errors, weak typing makes debugging large
programs a nightmare (in my experience with Perl -- extensive,
though old enough to be mostly pre-Perl 5 -- and in my experience
with other weakly typed languages such as Basic dialects).
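
Compare, in Python (a rough interactive sketch -- the exact wording
of the error message varies with the version):

    >>> a = 2.3
    >>> b = "hello world!"   # b was *supposed* to be a number
    >>> a + b                # the mistake surfaces at once, instead of b acting as 0
    Traceback (most recent call last):
      ...
    TypeError: unsupported operand type(s) for +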


> at first if you expect "+" also to be the string concatenation operator,
> which it is not.  In the above example,
> 
>     print $a . $b;
> 
> will print the string "2.345".

String concatenation has little to do with it.  I have no
special problem against a "stringize and concatenate" operator.
I'm not a "strong-typing evangelist" -- I'm a pragmatist and
my experience tells me that a "stringize and whatever" operator
is not especially error-prone, no more than, say, the rough
Python equivalent "%s%s" % (a, b) would be.
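
E.g. (a quick interactive sketch):

    >>> a, b = 2.3, 45
    >>> "%s%s" % (a, b)      # explicit "stringize and concatenate"
    '2.345'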


> This all is no different in principle from Python's:
> 
>     a = 2.3
>     b = 45L
>     print a + b

Wrong!  It *is* VERY different in principle to have a hierarchy
of types such that "A is usable where B is required" (often
called A IS-A B: a special, type-safe case of polymorphism --
but such an important special case that many modern languages
HINGE on it, and almost all languages accept some version of
it, possibly limited to numerics), rather than having specific
operators that "treat both operands as A" vs "treat both
operands as B".  The former is NO weakening of typing, in
principle.  The latter is.
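
In Python terms (a rough sketch; exact error wording varies with the
Python version):

    >>> a = "2.3"
    >>> b = 45
    >>> print a + b          # no operator silently "treats both as numbers"
    Traceback (most recent call last):
      ...
    TypeError: cannot concatenate 'str' and 'int' objects
    >>> print 2.3 + 45L      # but the long IS-A number: promoted to float
    47.3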

I'm somewhat surprised at seeing you make such assertions at
obvious variance with type theory -- I thought you had a solid
CS background (stronger than my own -- I'm just an engineer,
more interested in pragmatics than in theory).  We _do_ share
the fundamentals, I hope?  E.g., my favorite "essential Luca
Cardelli bibliography":

Cardelli, Wegner, "On understanding types, data abstraction, and
polymorphism", ACM Computing Surveys, 1985.

Cardelli, "Type Systems", in "Handbook of Computer Science and
Engineering", CRC Press, 1997.

Cardelli, "Typeful programming", in "Formal Description of Programming
Concepts", Springer-Verlag, 1989.

Abadi, Cardelli, Pierce, Plotkin, "Dynamic typing in a
statically-typed language", ACM Transactions on Programming
Languages and Systems, April 1991.

Abadi, Cardelli, Pierce, Rémy, "Dynamic typing in polymorphic
languages", Journal of Functional Programming, January 1995.

Abadi, Cardelli, Viswanathan, "An interpretation of objects and object
types", Proceedings POPL 1996.


> and getting 47.3.  The "+" operator is defined for floats and also for
> longs.  Since they're all "numbers" it would seem really strange if they
> did not interact naturally like this.  But fundamentally they're

Indeed it would; as far as I know, the CAML dialect of ML (also in
the O'Caml extension, last I looked) is one of the very few
programming languages that doesn't have a polymorphic + operator --
you have to use + for integers only, +. for floating-point.  As I
recall, some dialects of Forth had a similar lack of polymorphism
for the *opposite* reason -- typing so weak that, once data was on
the stack, the system no longer had any idea how it should be
taken, FP or integer.  CAML (but not Standard ML, if I recall
correctly) ends up in the same place, as a kind of reductio ad
absurdum of VERY strong vs weak typing ("the extremes meet":-).

"Promotion" rules are (or should be) simple: whenever (e.g.) a
float is required, an integer is accepted instead and gets
promoted to a float transparently and implicitly (ideally without
loss of information -- in practice, since longs are represented
by unbounded numbers of bits and floats aren't, the "without
loss of information" is only an ideal).  Operator / used to be
a notable exception -- fortunately, that notable exception is
at long last going away.
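
E.g., in a 2.2-ish interactive session (a rough sketch):

    >>> 1 + 2.0                            # int accepted where float is required
    3.0
    >>> 1 / 2                              # the notable exception: no promotion (yet)
    0
    >>> from __future__ import division    # PEP 238: the exception goes away
    >>> 1 / 2
    0.5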

As everybody but CAML's designers seems to acknowledge, such
polymorphism in no way weakens typing (in a formal sense, as
per the above bibliog, this only holds under the constraint of
no-information-loss).  Perl's treatment of arithmetic operators
cannot be framed this way, nor made consistent with typing
theory: it's _weak_ typing, not just _dynamic_ typing.


> Anyway, why should "2" + 5 == "7" be all that much stranger than "2" * 5
> == "22222"?  If the operation is unambiguous and useful what rule would
> it break?

As I already said, newbies do appear to like the idea of confusing
numbers and strings -- in that sense, such weak typing may be taken
as the reverse of "much stranger"... it's what most beginners seem
to expect (take a tour of duty on the help hotline to get a very
good sample of beginners' misconceptions and problems...:-).  But
experience with Perl (and other weakly typed languages) shows that
such crucial weakening of typing, by hiding mistakes, is a serious
burden to bear in developing and maintaining large programs.

The rule broken is that the implicit-conversions directed graph
must be acyclic -- if type A can be implicitly converted to B,
then B cannot be implicitly converted to A.  It would (I suppose)
be possible to build a consistent (strong) typing system where
all strings can be implicitly converted to numbers, but only
if no numbers were ever implicitly converted to strings (so,
for example, the print statement as it stands in Python today
would have to disappear -- more relevantly, so would the %s
format specifier's ability to apply to any object).  Personally,
I think that would be absurd, but I don't think it would break
the rules (as long as information loss in the conversion could
somehow be contained -- which I think means the conversion rule
would have to be something like "treat as a base-256 number
with the char's ordinal values as digits"... forget "1.0"
being treated as the number 1...!-).
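
A minimal sketch of that hypothetical conversion rule (the function
name is made up, of course):

    def string_as_number(s):
        # hypothetical rule: treat s as a base-256 number whose "digits"
        # are the characters' ordinal values -- lossless, but useless
        # for making "1.0" mean the number 1
        n = 0L
        for c in s:
            n = n * 256 + ord(c)
        return n

    print string_as_number("1.0")    # prints 3223088 -- nothing like 1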

Incidentally, the DAG-rule also explains why it's a good idea
to have a[b] raise an exception if b is floating -- since ints
become floats implicitly, the reverse should not hold.  I'm
not claiming that Python is 100%-perfect in this (as it's not
in information-loss terms, see above), but I think it strikes
a roughly reasonable pragmatic compromise.
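
E.g. (a tiny sketch):

    a = ['x', 'y', 'z']
    print a[1]      # fine: ints index sequences
    print a[1.0]    # raises TypeError -- the float is NOT demoted to an int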

The int/long situation is interesting, and shows why the
unification of those two types (currently ongoing) IS most
likely a good idea -- we do want them to offer the same
"interface" to the rest of the program, with the only
distinction being one of how they're implemented internally;
wanting them totally interchangeable means we should thus
have just one type (with two different implementations --
no problem with that).  I expressed different opinions in
the past, but framing things in terms of type theory helps
me change my mind:-).
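
E.g. (a rough sketch; exact behavior depends on how far the
unification has gotten in your Python version):

    >>> print 3 + 4L       # int and long mix freely: one "interface"
    7
    >>> print 2 ** 100     # the representation switches under the covers
    1267650600228229401496703205376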


> Personally, I think Python would be improved slightly by
> adding this conversion.  It's convenient and there's ample precedent in
> other languages (Awk, Snobol, Icon, Perl, etc.).  But I am sure that
> notion will provoke howls of protest from the True Believers.

The only reason I'm sniggering rather than howling is that I'm
confident that this (expletive deleted) proposal will never be
implemented in Python.


> There are a lot of reasons why Python is better than Perl but strongly
> vs. weakly typed is not one of them.  I don't see a material difference
> in the "strength," per se, of typing between Perl and Python.  They're

Then you are not looking at things in the right way.  Maybe you
do need to refresh your knowledge on type theory.  Python (and
Smalltalk) are much more suitable than Perl (awk, Snobol, etc)
to implement large programs, apart from all other differences,
exactly *because* typing is stronger in Python and Smalltalk
than in those other languages.

> Getting back to the original question, the quintessential example of
> "weak typing" is in old C or Fortran where you can define a function to
> take, say, a string argument but are allowed instead pass in a real or a

In the Fortran language (Fortran IV and Fortran 77 -- the versions
of Fortran of which I had to become a guru: I have no deep knowledge
of earlier or later versions) you are NOT allowed to pass a real to
a function that wants (e.g.) an integer (no strings in Fortran IV).
If you do so, your program's behavior becomes undefined.

No compilers that I know of actually *diagnosed* this programming
mistake (hmmm, perhaps Waterloo Fortran did -- that WOULD be quite
consistent with its didactic orientation -- but it's been too many
years since I last used it, so I don't recall), but that's quite
another issue compared to how the LANGUAGE is defined.  "Old C"
was never precisely specified, so it is harder to say what was in
fact an error leading to undefined behavior that a given compiler
simply could not diagnose; but if we take lint as part and parcel
of the language, as I recall C's authors and/or main practitioners
advised, then again we can say that such mismatches were NOT
allowed.


> pointer to a struct and get garbled results or even corrupt the program:
> 
>     printf( "%s\n", 2.5 );        // possible program fault

Variadics are indeed a problem (gcc diagnoses this -- but wouldn't if you
didn't use a constant as the format).

> Another example is all too familiar to MFC programmers:
> 
>     CEdit* edit = (CEdit*)GetDialogItem( ID_FOO );
> 
> GetDialogItem() essentially returns a raw pointer.  The programmer is
> required to explicitly cast the pointer into the proper type, the same
> type as the dialog item identified by the numeric ID.  However, if the
> dialog item itself changes, say, from an Edit box to a List box then the
> cast becomes wrong, some operations on edit may fail catastrophically,
> and -- worst of all -- the compiler cannot detect and diagnose this
> error for you.
>   
> These types of errors are not possible in Python or Perl, which
> illustrates that they're both "strong".

Yes, this is a good example of the dangers of (old-style) casting.
Of course, you could (and probably should) just assign to a plain
CWnd* (no raw pointer involved: it's a pointer to the base class),
then check it's indeed a CEdit* before you proceed, but in practice
programmers DO perform unchecked downcasts and come to grief thereby.


> Curiously, in Python
> 
>     print "%s" % 2.5
> 
> prints the string "2.5" instead of, say, raising an exception.  So
> there's some small precedent for treating numbers as strings in a string
> "context" even in Python.

See above: in a sense, it's exactly BECAUSE we _want_ to have
operations that "transform into a string then ...", that theory
tells us the _reverse_ (making the types-DAG into a graph with
loops) is a disaster.  Incidentally, Java (strongly AND
statically typed) does that, for strings specifically, too
(using the toString method, like Python uses __str__).
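
For instance (an illustrative sketch -- the class is made up):

    class Temperature:
        # __str__ is what %s (and str()) use to stringize the object
        def __init__(self, degrees):
            self.degrees = degrees
        def __str__(self):
            return "%s degrees" % self.degrees

    print "It's %s outside" % Temperature(21.5)   # It's 21.5 degrees outside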


> I'll go out on a limb and say that most "dynamic typed" languages are
> also "strong" by necessity.  If the user isn't keeping track of the
> types then the system HAS to.  An exception I can think of would be an
> ancient dialect of Lisp where integers also were machine addresses and
> thus callable without restriction.  E.g., if a function name evaluated
> to a number the interpreter performed a subroutine call to that address
> instead of, say, applying a lambda.  It was the mechanism through which
> system builtins were accessed.  However,  nothing prevented a user from
> "calling" an arbitrary integer and leaping off into never never land.

Yep, that's pretty weak typing indeed -- Forth-ish, I would say.

But you miss the essential distinction: what DRIVES typical ops,
the types of the objects involved, or the operators &c that the
programmer has coded?

In a strongly-typed language, types drive.  In a weakly-typed
language, operators (or the equivalent thereof) drive.  It's not
a black-and-white issue, but it's the key distinction.  In your
"weak Lisp" example, the "call this thing" operator (so to speak:
not _syntactically_ an operator, but that matters not) drives.

Recall your question up above...?

> If the operation is unambiguous and useful what rule would
> it break?

I think you've given a good example of why, even though the definition
WAS unambiguous (if it's an integer, unambiguously this happens)
AND useful (e.g. to call system builtins), this weak Lisp was
not optimally designed for practical use, particularly in the
development and maintenance of large programs.


>>find it very hard to produce substantial, large programs using
>>weakly-typed languages.
>>
> Given my definition of "weakly typed," I agree it's more difficult than
> in strongly typed languages.
> 
> On the other hand, isn't it the case that much of Python is implemented
> in C, which is weakly typed?  So "weakly typed" makes it harder but not
> impossible, at least not for some.

Why, sure.  HUGE programs have indeed been developed even in
assembly language, after all (they may be riddled with enough
bugs to put an ant-hill to shame, maybe, but they're still in
use:-).  C has been used (and IS still in use today) to develop
more and larger programs than most other languages.  That most of
them could have been developed and could be maintained with 1/10
the effort in Python is another issue.

Human ingenuity (and stubbornness) is strong enough that any
claim that some intellectual-work endeavour is "impossible" (to
a sufficient number of sufficiently-motivated human beings,
with unbounded amounts of cash to keep them going:-) had better
be backed by a very solid mathematical proof...:-).


>> Python is strongly, albeit dynamically, typed,
>>
> FWIW, so is Perl.

Nope: see all of the above.


Alex



