Perl is worse! (was: Python is Wierd!)

Alex Martelli alex at magenta.com
Fri Jul 28 06:57:31 EDT 2000


"Steve Lamb" <grey at despair.rpglink.com> wrote in message
news:slrn8o2fgr.1e2.grey at teleute.rpglink.com...
> On 28 Jul 2000 19:46:29 +1200, Paul Foley <see at below> wrote:
> >The distinction you're trying to make (between "types" and "just
> >data") is rather foolish.  Data are the things that have types.
>
>     Rather foolish?  For whom?  Humans don't think in "integer" and "long
> integers" or "strings".  We think in terms of data whose exact value,
> "type", is determined by context alone.

Human language, and human thought, are really all like this: fuzzy,
imprecise, vague, ambiguous, highly context-dependent.  Ludwig
Wittgenstein, in my opinion (and not mine only:-), is the thinker who
most lucidly explored this issue -- in a manner all the more striking
because, in his youth, he took such a drastically different tack,
striving to define the conditions for "proper" speech, by which he meant
speech (and thought) that is logical, precise, specific, effective.

He concludes his first masterpiece, the "Tractatus Logico-Philosophicus",
with the famous admonition "Whereof you cannot speak, thereof must you be
silent".  The main implication of which is that there ARE things,
extremely significant ones as it happens, of which "speech" (in the
logical, etc., sense) is just unfeasible (e.g., earlier in the Tractatus,
"The Mystical cannot be spoken of: it SHOWS itself").

Alas, a built-in contradiction lies at the root of this -- because, among
the Unspeakable things are most of those the Tractatus itself speaks
of!-)

Young W, quite aware of this, makes heroic attempts to talk his way
around the issue ("this is just a ladder, to be discarded once you've
used it to climb" -- I'm quoting from memory, and no doubt unreliably).
But the fact remains that humans WANT to speak, most of all, about
"unspeakable" things (in the precise, young-Wittgensteinian sense of
"speech").  And precision, sharpness, logical rigour, are just unfeasible
in such domains.


And so, after decades of development and reflection, Wittgenstein comes
up with the framing best displayed in the posthumously published
"Philosophical Investigations" (alas, not a finished work: he never
thought it ready for publication).  "Whatever we're talking about, we are
talking about the natural history of human beings".  Language -- actual
use of speech -- is a collection of language-games that are played out,
for biological reasons connected to how evolution shaped us, in their
various, separate contexts.  Rigour, sharpness, precision, are ideals
never to be attained (save by hypothetical inhuman speakers), and their
relentless pursuit can be a snare and a delusion.


Now this framing leads to an interesting perspective regarding the
unceasing stream of requests for computers to be able to handle "natural
language".  This really translates to a request that the machines be able
to deal with deliberate and accidental ambiguity, mixed signals,
vagueness, imprecision, fuzzy requests that had better be ignored, all
the mix of issues that human beings themselves often have trouble with
(else, young geniuses would not spend years writing "Tractatus"-like
pleas for precision, sharpness, logic...:-), and are, roughly speaking,
able to disambiguate/resolve by relying on shared context, which is
largely cultural but ultimately rests on biological foundations.

Yes, I would call this quest "foolish", just as I would call Don
Quixote's quest for unattainable chivalric ideals, young Wittgenstein's
quest for unassailable, perfect sharpness in language, and many other of
humanity's noblest pursuits.  This is a strictly descriptive term and
does not imply value-judgments in the general case; we'd all be less
human, and poorer for it, if all of humanity's efforts were always bent
towards practical, attainable, sensible ends (besides, pragmatically
speaking, a tiny fraction of such 'foolish' projects ends up with
often-unexpected side benefits of such magnitude as to repay humanity,
collectively, for all of what a bean-counter could consider "wasted"
effort; although it's not a strictly good example, since the collective
benefits are highly debatable, an analogy might be Columbus' foolish
quest for a way West towards Asia -- unattainable due to Earth's actual
girth, which he should have known but had wilfully blinded himself to,
but serendipitously ending in the lucrative-for-his-backers landing in
unforeseen America).


Floating down from these dizzy heights, back to everyday programming:
that human beings tend to think in certain fuzzy/imprecise/ambiguous ways
is hardly a recommendation for a language intended for human use *to
interact with computers* to shy away from precision, clarity, and
sharpness, nor does it necessarily make it any less 'foolish' to do so.

Very specific analysis of the various possible trade-offs will be needed:
what levels of ambiguity are allowed, what their practical consequences
are, and so forth.

>     As a human, type 1.  Character?  String?  Integer?  Floating number?
> No, it is 1 and it can be all of those all on context.  Types are a
> construct created to help computers deal with human concepts.  Why, then,
> when a

Not really: they were molded to help humans deal with *PRECISION*, well
before computers existed.  The need for such precision came from maths,
and often from its growing applications in science and technology as well.

To a human being's typically-fuzzy thinking, the number '23' and the set
containing the number '23' as its only member are basically one and the
same thing, for example.  But mathematicians found it necessary to draw
a sharp distinction between an elementary object, and the singleton set
containing that object, well before computers existed.  "CATS": is this
a 4-letter string? a set of animals? a popular musical? or an important
diagnostic technique?  Philosophers felt the need for such precision
of concepts well before mathematicians did -- and laid the foundations
for what we can recognize today as a theory of types.

Human beings are often able to disambiguate from context -- but not
always.  One of my cherished memories is a Uniforum session, many
years ago, which was billed as being about "ATM Networks" -- keen,
like many others, to learn the latest about Asynchronous Transfer Mode
networks, I trooped into the small room and, like all others, found myself
surprised to see it packed with twice as many people as it could hold;
half of them, as it happened, the suit-and-tie kind of people, rather than
the T-shirt-and-sneakers guys you would expect to be interested in such
tech esoterica, who made up the other half of the audience.  It soon
transpired that half the audience were in the room to learn the latest about
technology being applied to networks of Automatic Teller Machines, of
course.

It so happened that the context of being at a computer and networking
conference was NOT sufficient to disambiguate -- the guys from the
banking and finance sectors had quite reasonably misread 'ATM', it
being such a pervasively used acronym in _their_ world.

It was bad enough (if excessively amusing) among human beings; it
could be very serious, and potentially VERY costly, if such ambiguity
were let loose among clearly-common-sense-free computers.


> language comes along that does a darn fine job of doing the right thing by
> defining data as a scalar (it isn't a string, it isn't an integer, it
> isn't floating point, it isn't a character, it can be all of them
> depending on context) do you call it "foolish" to think in those terms?

In my practical experience of Perl use, this can and does indeed give
rise to substantial problems.  For example, if a floating number gets
implicitly coerced to a string, and that string gets coerced back to a
number, a loss of precision is a frequent occurrence.  Of course, this
specific issue is not going to hit you on the nose until and unless you
attempt to use Perl for substantial computations.  But the general issue,
which tends to loom larger and larger as program size and complexity
grows, is that coercions are being silently, implicitly perpetrated *that
are NOT reversible*.  Whenever a pair of conversions, A->B->A,
happens implicitly and silently, and the A you end up with is NOT the
same A you started out with, then you have a problem just waiting to
show up.  By general law, it will show up when you can least afford it
to, i.e., when you have just deployed your application at your most
valuable customer's site:-).  ("Only the paranoid survive"...:-).
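
To make the round-trip concern concrete, here's a minimal Python sketch
(the "%g" format is just my stand-in for whatever fixed-precision
stringification a language might apply implicitly): pushing a float
through a textual form and back does not, in general, give back the
number you started with.

    # A -> B -> A round-trip: float -> string -> float, with loss.
    # "%g" (6 significant digits) stands in for any implicit,
    # fixed-precision coercion-to-string a language might perform.
    x = 1.0 / 3.0            # a float needing many digits
    s = "%g" % x             # coerced to a string: '0.333333'
    y = float(s)             # ...and coerced back to a float
    print x == y             # 0 (false): the round trip lost information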

The ancillary costs must also be considered.  By choosing not to
distinguish between two types of data, you miss being able to use
those types to provide you with context to clarify the meaning of
the operations you perform; rather, the _operations_ must supply
the context and therefore the meaning themselves.

I.e., having 'abc' + "def" return 0 is NOT very desirable: surely the
"context" of the "+" being applied to strings should let it mean
"concatenate these strings" rather than "sum these numbers"?
They *aren't* numbers after all, are they?

But you can't have it both ways: either the operations fix the
context rigidly, and the data gets coerced (with information
loss) to fit that context; or the data has a specific type that can
supply the context in which the operation is performed.  Perl
picks the first alternative, and thus you must have different
operations for "concatenate these strings" and "add these
numbers", specify explicitly every time whether a scalar or
a collection (and of which kind) is being used, etc, etc.  I see
colleagues with years of experience in Perl, and enthusiastic
about the language, STILL write "@x[3] oops I mean $x[3]"
and so on.
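
By way of contrast, here's a tiny hypothetical Python sketch of the
second alternative: '+' takes its meaning from the types of its operands,
and refuses to guess when the operands disagree.

    # The data supplies the context: '+' concatenates strings, adds
    # numbers, and mixing the two is an error rather than a silent guess.
    print 'abc' + 'def'          # abcdef
    print 1 + 2                  # 3
    try:
        print 'abc' + 3          # no coercion is silently picked for us...
    except TypeError, msg:
        print 'TypeError:', msg  # ...a TypeError is raised instead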

I'm not being a Python apologist here; I think that using typed data
is preferable in most cases _but not in all_, and that Python should
let the operation determine the context, rather than the data, in a
few more cases than it currently does.  E.g., the print statement now
coerces its arguments to strings, and I consider it a good thing.  But
the operator / does integer (truncating) division on integers, and that
is a detail which I consider not optimal -- I believe it would be better
(like, say, in Pascal) to have different operators for division-with-
truncation and division-with-floating-result (the existing divmod builtin
function, actually, could perfectly well carry the burden of division with
truncation, IMHO, letting / always mean division-with-floating-result --
were it not for the need of backwards compatibility, of course, i.e. in
an abstract and hypothetical language-design-ab-ovo context).
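
For the record, a quick sketch of the current behaviour I'm describing
(as of the Python releases I have at hand; the details may well change in
future versions):

    # The print statement coerces each argument to a string for display:
    print 1, 'spam', 3.14, [1, 2]    # 1 spam 3.14 [1, 2]

    # '/' truncates when both operands are integers...
    print 7 / 2                      # 3
    # ...but gives a floating result as soon as one operand is a float:
    print 7 / 2.0                    # 3.5
    # divmod already packages truncating division plus remainder:
    print divmod(7, 2)               # (3, 1)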


> >character "a" (Python, sadly, lacks a proper character type);
>
>     I'd consider that a blessing.  Once less artifical construct to get in
the
> way.

Here I happen to agree with you -- since the bidirectional transformation
between a character and a 1-character string yields no information
loss, and no 'intuitively-desirable' "overloads" of operations to mean
different things in the two contexts appear obvious, little would be
gained in drawing the distinction.
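
A trivial sketch of why nothing is lost either way: ord and chr
round-trip a 1-character string and its code exactly.

    # A 1-character string and its character code convert back and forth
    # with no information loss; there is no separate character type.
    c = 'a'
    print ord(c)               # 97
    print chr(ord(c)) == c     # 1 (true): the round trip is exact
    print 'abc'[1]             # b -- indexing a string yields another
                               # (1-character) string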

A similar situation, IMHO, holds between integers and long-integers;
I hear that the current sharp distinction between them is going to be
somewhat relaxed in a near-future release, and that seems to be a
good thing to me.  So, you see, I'm being anything but a type-purist;
I dislike Perl's silent implicit coercion between strings and numbers
for highly pragmatical reasons, direct and consequential, borne out
by many years of practice (and tens of thousands of lines of code)
with Perl, and a few months (and thousands of lines of code) with
Python.  My pragmatical inference from this experience (and from
reasoning about it) is that keeping the distinction just works better
in most cases, although specific, constrained, motivated cases of
relaxation thereof (such as in 'print', and IMHO also in some other
cases where it is not currently applied) might be worthwhile.
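
To show the distinction as it stands in the releases I use today (the
planned relaxation would presumably smooth over the overflow case; take
the details as illustrative, not gospel):

    import sys

    # Plain integers are machine-width and can overflow...
    print sys.maxint             # e.g. 2147483647 on a 32-bit platform
    try:
        print sys.maxint + 1     # overflows a plain int in current releases
    except OverflowError:
        print 'OverflowError'
    # ...while long integers (trailing L) have unbounded precision:
    print sys.maxint + 1L        # 2147483648 (perhaps with a trailing L)
    print 2L ** 100              # 1267650600228229401496703205376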


> >No; "a" is a string and strings are sequences; 1 is an integer, and
> >integers are not sequences.  This particular quirk is in your head.
>
>     No, the particular quirk is in the language by forcing types.
Remember,
> all this came about because I pointed out that Python has just as much
> automagical "typing" going on as Perl does, it just limits that typing
from
> what I can see.

What "automagical typing"?  Are you referring to, e.g., "print" always
trying to turn every argument into a string in order to print it?  Yes, it
does happen, in some very specific cases.  But it's definitely not "just
as much" as in Perl, so I suspect you must be intending something else.

Just in case you're thinking of the fact that a variable can refer to
different types in different cases, why, of course -- in fact Python is
much more general in this than Perl, since the same variable can
refer to a string, number, list, tuple, dictionary, object, &c, without any
syntax quirks needed to distinguish between the various cases.  The
type is carried by the *data* being referred-to, and never by the variable
doing the referring; which is a completely different issue, with nothing
"automagical" about it (no transformation whatsoever takes place, i.e.,
no coercion -- no risk of information loss whatsoever -- etc, etc).
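
A quick sketch of what that looks like in practice -- the name is simply
rebound, the type travels with the object it is bound to, and nothing is
ever converted:

    # The same name can be re-bound to objects of completely different
    # types; the type belongs to the object, not to the name.
    x = 'spam'             # x refers to a string
    print type(x)          # <type 'string'> (spelling varies by version)
    x = [1, 2, 3]          # now x refers to a list; no conversion happened
    print type(x)          # <type 'list'>
    x = {'key': 42}        # now a dictionary -- still no coercion at all
    print type(x)          # <type 'dictionary'> (or 'dict', by version)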


Alex





