New language

Martijn Faassen m.faassen at vet.uu.nl
Sun Jun 3 10:08:32 EDT 2001


In comp.object Topmind <topmind at technologist.com> wrote:
[snip]
>> I think I did elsewhere. Too heavy in syntax, 

> The difference is minor between arrays and 
> tables for a well-tuned table API. 

I was talking about the difference between tuples and dictionaries,
not arrays and tables.

> I will usually take the slightly-more up-front syntax in
> order to get longer-term adaptability any day.

Less syntax sometimes helps you with longer-term adaptability.
This is another tradeoff. As I tried to express before, I use
tuples, dictionaries, arrays and tables in different contexts and
I don't have to switch between them often. For me, the specialized
syntax/semantics help me far more than they harm me.

> I already gave an example of where I was burned by
> dictionaries about 5 or so messages ago.

I'll have to go look it up.

>> However, as I said before, it's mostly a syntax issue to me
>> [not just speed]. There is
>> a case to be made for a kind of tuple-like dictionary, though the
>> advantages of that are minimal in my opinion; for anything more complex
>> I'd be inclined to use a full-fledged object anyway. 

> But an object does not give you a full-fledge collection still.
> I can view a table while the program is running, for example.
> All an object gives you above a basic array is more columns.

> http://www.geocities.com/tablizer/collrght.htm

And a full-fledged table still doesn't give me a list of objects.
It goes both ways. Objects can have attached methods, and together
with polymorphism this can be exploited.

You can of course build an object wrapper on top of tables to do the same
thing, and for scalable apps with lots of data this can obviously be useful,
but I frequently have collections of dozens of objects, not thousands. Doing
a decent object wrapper on top of a table is a lot of work, too, and it often
pays to do something simpler.

[snip]
>> >> I disagree; I think a distinction like this can help sometimes. Look at
>> >> Perl and their scalars, which merges things like integers, floats and
>> >> strings into 'one thing'. 
>> 
>> > I like that approach. It makes the code leaner and cleaner IMO.
>> > Less casting, converting, and declaration clutter. It allows
>> > you to look at raw business logic instead of diddling with
>> > conversions, casting, and bloated declarations.
>> 
>> You're confusing things here; you're confusing the effect of static
>> type checking (declarations and casting) with that of having different
>> datatypes. In Python, you don't need to declare or cast integers, only
>> convert when necessary. You need to convert an integer to a string if you
>> want to use integers in a string, for instance.
>> 
>> Because of this, the program stops when you do something silly, instead of 
>> going on blindly and making a mishmash of your data.

> I prefer that the "conversion" be done when comparing. IOW, having
> an API that says "compare as numbers" or "compare as strings".

I find the conversion doesn't happen often enough to make something like
that pay for the drawbacks it has (doing 'magical things' by trying to
guess what the programmer really meant; what integer value does the
string "foo" have? What about "1foo"? What about "foo1"?).

I convert things on input only once; this is the more efficient way to do
it anyway. Similarly, I convert things to a string on output; I don't
mind a little bit of automation there -- everything can be made to have
*some* string representation. So I rather like that:

 "%s" % foo 

in Python converts foo, whatever it may be, to a string (the %s indicates that).
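A quick illustration of how forgiving %s is (the values here are just examples of mine):

```python
# "%s" accepts anything; every object has *some* string representation
for foo in (42, 3.14, [1, 2, 3], None):
    s = "%s" % foo
    assert isinstance(s, str)

assert "%s" % 42 == "42"
```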

> In my pet language, a comparison might look like:

>   if x %cta> y
>     .....
>   end if

> The "c" means compare as character and the t means trim and
> the "a" means compare case-sensative.

> Most languages require one to do this:

>   if trim(uppercase(toString(x))) > trim(uppercase(y)) then .....

> I factor these operations into the middle. 

Frequently it's quite possible to factor them out before you do the
comparison. You also haven't addressed the case where you want to do different
things to x than to y, right?

It's an interesting idea, though. You can of course also simply produce
a comparison function which does the same:

def cta_greater(a, b):
    return str(a).upper().strip() > str(b).upper().strip()

And then you could just use:

if cta_greater(a, b):
    ...

Effectively you could reach your result by going one step further;
introducing the ability to use cta_greater as an infix operator instead.
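That step is actually possible in plain Python via a well-known trick (not a
language feature): abuse an existing operator such as | on a small wrapper
object. The Infix helper below is my own sketch of the idea:

```python
class Infix:
    """Wraps a two-argument function so it can be spelled  a |op| b."""
    def __init__(self, fn):
        self.fn = fn
    def __ror__(self, left):          # handles the  left |op  half
        return Infix(lambda right: self.fn(left, right))
    def __or__(self, right):          # handles the  op| right  half
        return self.fn(right)

def cta_greater(a, b):
    return str(a).upper().strip() > str(b).upper().strip()

cta = Infix(cta_greater)
assert (" apple " |cta| "ant")        # "APPLE" > "ANT"
assert not ("ant" |cta| "apple")
```

Since | binds left to right, `a |cta| b` evaluates as `(a | cta) | b`: the
first | partially applies the function, the second one calls it.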

>> This is *not* the same argument as that for static type checking however;
>> it is important to see the distinction. It's an argument for a light-weight
>> dynamically checked type (or interface/protocol) system.
>> 
>> What you seem to be describing as the benefits of the Perl scalar may
>> instead be the benefits of the absence of statically checked types.

> Perhaps. I still have not got the hang of Python's typing system
> approach yet. I don't like types anymore. Types create "hard
> to disect binary blobs" in my view. I have grown toward the
> Unix philosophy of "every interface between systems should
> be ASCII" (Or UNICODE, perhaps) if possible. I now apply this at a
> smaller level than just "between systems and applications".
> (Accept I evolved it up to tables also. The ultimate
> xfer protocol: text and tables.)

Starts to sound like XML. :)

Python's 'type system' is fairly simple; if you call a method on any
object, it just tries to find a method of that name on the object (looking
in base classes if available), and if it succeeds, the call happens.

The 'special things' just get translated to such method calls; they're
just syntactic sugar:

a[i]      a.__getitem__(i)
str(a)    a.__str__() 
a()       a.__call__()
a + b     a.__add__(b) 

There's a bit of subtlety that's planned to go away soon in that 
built-in types coded in C are somewhat different from the 'types'
you can create by defining a class in Python.
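As a minimal sketch of that translation, here's a toy class of my own that
plugs into all four pieces of sugar above:

```python
class Pair:
    def __init__(self, a, b):
        self.a, self.b = a, b
    def __getitem__(self, i):         # enables  p[i]
        return (self.a, self.b)[i]
    def __str__(self):                # enables  str(p)
        return "(%s, %s)" % (self.a, self.b)
    def __call__(self):               # enables  p()
        return self.a + self.b
    def __add__(self, other):         # enables  p + q
        return Pair(self.a + other.a, self.b + other.b)

p = Pair(1, 2) + Pair(3, 4)
assert p[0] == 4 and p[1] == 6
assert str(p) == "(4, 6)"
assert p() == 10
```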

The __add__ defined on strings and integers (the built-in variety)
refuses to add a string to an integer just like that; instead it
raises an exception. You can write your own string or integer classes if
you really insist on doing that (presumably by trying to convert the
strings to integers), though I doubt anybody actually does.
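To make that concrete (LaxStr is a hypothetical class of mine, not anything
standard):

```python
# Standard behavior: mixing the built-in types is refused outright.
refused = False
try:
    "foo" + 1
except TypeError:
    refused = True
assert refused

# A sketch of a string class that coerces instead (rarely a good idea):
class LaxStr(str):
    def __add__(self, other):
        return LaxStr(str.__add__(self, str(other)))

assert LaxStr("foo") + 1 == "foo1"
```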

>> > A jillion messages
>> > are already devoted to that topic, with no "killer proof"
>> > on either side. It may be subjective which is the "best".
>> > I grew up on strong typing, but have gravitated toward
>> > prefering dynamic typing over the years.
>> 
>> Me too.

> I should change that to "type-free". I grew up on strong
> and explicit typing, but have completely reversed 
> and wish my language of use was completely type free.

So how does the '+' operator deal with adding two integers?
Two strings? A string that can't be converted to an integer sensibly
to one that can? Two collections? Note that I'm
talking in the language of types here, but I don't see how to
avoid it; not all strings make sensible integers.

If it 'all just works', then I'll have to remember a tremendous lot
about the + operator, and if I accidentally add the wrong things together
my program will happily continue mulching the data and it'll come out
wrong (and where did it fail?). Effectively + may have to define
n * n operations, where 'n' is a 'category of things' (if we want to
avoid the word 'type').

Another alternative along these lines is to make + do an implicit conversion
of both its operands, say, to integer. Non-integer scalars will be converted
to, say, 0 if they can't be converted to a sensible integer. Collections
will also be converted to integers; let's say it's their size, or 0 as
well. The result of + will always be an integer-category scalar.
This still needs you to remember the n conversions that + may do, for
all 'categories of things', including special casing like strings that
may or may not contain integers. The data mulching problem remains as well.

Yet another alternative is to make + fail on all categories of things
it can't really add. You'll have to use another operator for that.
This is in effect Python's or Smalltalk's system; though for 'operator'
you should read method. Methods fail whenever the system just doesn't
know what to do (the method is not defined on the object you're calling
it on). You can see this as a 'typeless' system; in effect types are
an abstraction that the language doesn't really have to know about; the
language only cares about operations. In this "typeless" system you 
get some of the effects of a typed system if you want to; I can't add
'foo' to 'bar' if foo and bar don't know how to be added.  
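In Python terms, a sketch of that (Money is a made-up class of mine):

```python
class Money:
    def __init__(self, amount):
        self.amount = amount
    def __add__(self, other):
        # only meaningful for things that expose .amount
        return Money(self.amount + other.amount)

total = Money(2) + Money(3)       # both know how to be added
assert total.amount == 5

failed = False
try:
    Money(2) + "five"             # the system just doesn't know what to do
except AttributeError:
    failed = True
assert failed
```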

> My pet language has only one type: a dictionary
> array. It is used for *everything* including scalers.
> scalers are simply a shortcut for something like:

> x = 5
> x.__value__ = 5        // same

> (It does not use underscores, but something
> equivalent.)

You'll have to show more about this for me to make sense of it.
Is 5 a dictionary too? If not, not everything's a dictionary;
literals evidently aren't. Is 'x' a reference or an actual
value? Is assignment polymorphic based on the left hand side, and
if so, there must be more than just dictionaries on the left hand side.

How does your language know what to do with this?

x = y

If y is a dictionary, then this could rebind the reference of name 'x'
to whatever dictionary y is pointing at. However, how would you do
this, then?

y = 5
x = y

> Well, I should say everything except for internal
> structures.

I'm not sure what you mean by this either.

> It is not like Python that way. I
> see no real need for that. I would probably use
> tables to pull off what others would use the
> Python meta language tools for.

> My pet language in many ways is similar to Python,
> but much more minimalist WRT types and collections.
> Python has too many syntax variation IMO.

Note that this is in the end mostly syntactic sugar in Python as
described before; I won't go into how syntactic sugar can help
again, I did that before.

>> >> It's funny you should compare tuples with dictionaries and say they 
>> >> should be conflated; most people complaining about tuples say they're
>> >> too much like *lists* (arrays). They're right that they're very much
>> >> like lists, 
>> 
>> > That too. Roll 'em all up. Requirements change. I hate recoding
>> > from lists to touples to dictionaries to tables, etc.
>> > Make the interfaces the *same*, and only swap the engine, NOT
>> > the interface.
>> 
>> Oh, I'd say make the interfaces different, use the same engine where
>> possible. 

> Why make the interface different? Then you have to overhaul
> everything if your collection needs change. (A Meyerian
> Continuity sin.)

I already said before why I think this type of collection change is
not as frequent as you say it is, at least in my programs. I think
the tradeoffs weigh in different directions than you do.
  
> It is not just "minimalism", but anti-sub-typing also.

There's no sub-typing here; just things with different interfaces.
A single interface for everything simply does not make any sense;
you'll have to have different interfaces at some level; if they're
not explicit they'll be implicit in use, and that can lead to hackery
(trying to use an interface beyond its design constraints) and maintenance
problems (due to the hackery and because the reader of the code gets
little clues about what you're doing at any particular point with
the interface).

>> I changed my mind a little about dictionaries; in practice
>> they're often used to store lots of homogenous values, not as a kind
>> of datatype (in Python, class instances (objects) are used for that).
>> 
>> I think it's a myth that having a universal rolled-into-one collection
>> type helps your program deal with change more easily. 

> Well, I am pushing that "myth" and have no reason to
> back down.

>> In my Python
>> programs, I use lists and tuples and dictionaries and tables in 
>> rather different places in different idioms. 

> But do they STAY different?

Not always, but usually. The constraints of the average application
don't change that radically. Change happens more often if you're
whipping up a quick prototype of something that you'll do again
in a more robust fashion later, but then the 'throw-away' is planned,
and you know it's not going to scale.

> I find that collections often need more than what they started
> out needing. IS-A collections cannot hop IS-A fences very well.

> Perhaps an example would help.

Okay; I thought I gave you some before, but I'll try:

lists: homogenous ordered collections of objects:

# going through all the lines in a shortish file and doing something to them
f = open(filename)
lines = f.readlines()
for line in lines:
   ...some processing here...          

Lists generally have dozens to thousands of elements, are often thrown
away after use and tend to be generated by iteratively using .append() or
.extend() to add new elements to the end. The same thing frequently gets done 
to all elements, such as in a for loop.

dictionaries: homogenous unordered keyed collections of objects:

# totalling something
totals = {}
for element in something:
    totals[element.id] = totals.get(element.id, 0) + element.get_some_number()
 
Similar to lists, but order is not important or can alternatively be
easily derived by sorting the keys. New elements are frequently added
by indexing to new keys; deleting occurs occasionally as well. Asking for an 
object by key happens a lot. Loops are common to do the same thing to all
key/value pairs. 

objects (class instances): heterogenous unordered keyed collections of objects:

# setting some attributes
a.foo = "a text"
a.bar = [1, 2, 3]

Each attribute of an object is completely different and treated in a
different way. Loops through all attributes are extremely rare. Attributes
are often queried for and changed. Order of attributes is unimportant.
Just a few attributes, though some may of course be other collections.
New attributes are rarely added after initialization. Deletion of
attributes is even rarer. 

tuples: heterogenous ordered collections of objects:

a, b = foo()

Just a few elements, though of course some may be other collections.
Each element is different; loops through all elements are infrequent.
Elements are rarely added and virtually never removed; the only way
to 'add' an element in fact is to create a new tuple. Frequently
thrown away after temporary use.
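For instance, a function result packed into a tuple and instantly unpacked
(min_max is just a made-up helper of mine):

```python
def min_max(seq):
    return min(seq), max(seq)         # packed into a throwaway tuple...

lo, hi = min_max([3, 1, 4, 1, 5])     # ...and unpacked right after the call
assert (lo, hi) == (1, 5)
```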

tables: homogenous keyed/ordered collections of objects:

Often very many elements. Access by key is frequent; access by query is
frequent. Ordering can often be easily imposed (and internal
ordering often exists), and operations to all or subsets of elements are
fairly common. Elements are added and removed fairly frequently. 
Usually persistent. 

Switching from a heterogenous collection to a homogenous collection is
extremely rare. Switching inside the categories is more frequent; while
moving from a list to a dictionary and vice versa (ordered to keyed) is
relatively infrequent, switching from either to a table is more common
as the application needs to scale. Heterogenous collections such as
instances can then be mapped to records.

Such scalability is often completely unnecessary though, and can hinder
by taking up resources, program space and programmer attention. Frequently
data structures are temporary, or you know you're not dealing with
more than just dozens of objects.

>> While it is possible I
>> change one into the other on occasion, this is the exception, not the rule.

> For what I do, rule.

> At least frequent enuf to want to prevent it up front.

Yes, this often makes sense. Scalability is important. But in my 
applications this still leaves very many situations where I know my
list isn't going to be that large.

>> When such changes do happen so many other changes tend to happen it
>> doesn't really matter anymore anyway; the change in collection type is
>> probably caused by such a larger change.

> I disagree. It might simply be another view of the *same* data,
> another new column, etc. One thing about custom business
> programming is that many different parts often need the
> same data, but with a different view, lookup, join, sort,
> etc.

I frequently use tables for that kind of thing. But that's not the only
thing that happens in my programs. There is user interface code, there
are throwaway data structures.

> IOW, you never know what or who will need data from
> your collection(s).

For some collections, yes. But there is a large set of collections used
in other contexts, in my programs.

>> With a universal collection type you lose some of the benefit of these
>> separate idioms (which can help with the readability of the program). 

> I disagree about readability. When collection needs change, trying
> to force a linked list or dictionary into something else makes for
> a much larger readibility problem.

> Like I said, array syntax may give you a SLIGHT benefit up
> front, but the loss down it road more than makes up for
> this.

Again, depends on what your usage is.

>> You
>> also may increase errors, as due to the absence of different interfaces
>> and idioms you run a higher risk the program will continue after an error
>> and mangle your data in unpredictable and hard to track down ways.

> I would have to see some examples of this.

I think using an array for something you
really need a heterogenous collection for is an example.

a[0] = "foo"
a[1] = 2
a[2] = [1, 2, 3]

is going to cause no end of confusion. Same for most, though not all,
uses of dictionaries in that way. This is also why table records are
not tables; a table record is a heterogenous collection.
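A sketch of the contrast (Record is a name I made up for illustration):

```python
# Positional slots say nothing about their contents:
rec = ["foo", 2, [1, 2, 3]]
title = rec[0]                        # which slot was the title again?

# A small class gives each heterogenous field a name:
class Record:
    def __init__(self, name, count, values):
        self.name = name
        self.count = count
        self.values = values

r = Record("foo", 2, [1, 2, 3])
assert r.name == "foo" and r.count == 2 and r.values == [1, 2, 3]
```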

> The "protection" needs
> often don't align along the collection type's boundaries
> or features. Not allowing dictionaries to be sorted (in place) is
> an *arbitrary* limit in my book.

Then you don't want to implement your dictionaries using a hash table
but some form of tree. Just a side-note. 
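In stock Python you instead impose an order on a dictionary's keys at the
moment you need it (shown here with the modern sorted() builtin), which
covers much of the same ground:

```python
d = {"pear": 3, "apple": 1, "mango": 2}
keys = sorted(d)                      # order imposed on demand, not stored
assert keys == ["apple", "mango", "pear"]
assert [d[k] for k in keys] == [1, 2, 3]
```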

>> Anyway, as I said before, you're in the minimalist camp here, along with
>> Smalltalkers (everything's an object with messages) and Lispers (everything's
>> a list). 

> WRT to collections, yes, but not control structures
> (IF, loops, etc.)

And not with respect to operators either, as we saw before. :)

>> I take the position that syntactic sugar can help with idioms,
>> which can help with clarity, readability and error detection.

> Well, we will just have to agree to disagree. I have used
> both approaches, and don't like collection type proliferation
> the least bit.

All right. We'll agree to disagree. Note that I never said tables suck;
tables are cool. I'm interested in seeing a language which tries to
push tables further; allow light-weight throwaway tables that do
have querying abilities. You may run into some difficulties, however;
I've actually tried to use a light-weight table system, called MetaKit, 
in a Python XML DOM implementation. It turned out to be rather slow still.
You may call it my fault, but when I created another backend which used
dictionaries and lists, it was faster. :) Still, Metakit was neat and
certainly could be the right solution in other circumstances. You may
want to look at it for inspiration, or perhaps even to use as a backend
for your pet language (does it have a name?).

Metakit:

http://www.equi4.com/metakit/

>> >> except that they're immutable (like integers and strings
>> >> in Python, but unlike lists and dictionaries and instances). Your
>> >> desire to conflate them with dictionaries is in my opinion wrong as well,
>> >> but you're more right than those who want to merge them with lists; 
>> 
>> > Show me "wrong".
>> 
>> Wrong as in "I think there are arguments against this which you are missing
>> and I disagree with your evaluation of the tradeoffs". This is a
>> subjective issue. I imagine you can do empirical research about programming
>> language effectiveness and these issues, but I'm not going to do it.
>> Are you? 

> Nope. I won't challenge any agreement that it is subjective. I 
> should be happy enough that you agree it is likely subjective. This is
> a lot more than I often get out of the pro-OO camp.

Of course since it's *my* subjective impression I think it's important,
just like you think yours is. Like you, I think others may benefit from
my impression. :)

> I just wish the industry would realize this and knock it off
> with the one-paradigm/language-fits all scenario, such as the
> Java-tization of everything.

Me too. That just seems to be the industry; there seems to be a longing for
'magic pixie dust' you just sprinkle over your problems in IT and it'll
solve everything. Java, XML, OO, open source, structured programming,
4GL languages, whatever is the hype of the moment. All can be good
tools and you can learn something from them (if even how not to do
it :); but there simply is no silver bullet, as it'll always be a question
of tradeoffs, which will be different in each new context.

>> >> tuples are generally used as 'records' (heterogenous objects) and not
>> >> as lists of homogenous objects.
>> 
>> > Doesn't matter. Needs change. See above. Homo today, hetero tomorrow.
>> > Micheal Jackson Collections, you could say.
>> 
>> Heterogenous collections are not going to change into homogenous collection
>> and vice versa in by far the most circumstances. If you disagree, you
>> should name some cases; I can't think of any.

> You would have to leave an actual example because "heterogenous" may
> depend on how one classifies the world in their head.

Okay, see the examples higher up. In short: a heterogenous collection
rarely has the same thing done to all elements; all elements are
created and treated *differently*, while a homogenous collection
usually has all elements created and treated in the same way.

[snip]

> It would be interesting to see some of your designs.

Hm, it's mostly a bunch of idioms I and other Python programmers use
in our code. I can't think of any good source code to show you that
would make much sense in isolation from the framework in which it
is used... I'll try to think of something.

[reference parameters versus returning multiple things in a tuple]
>> > But harder than looking at the top.
>> 
>> Yes, but the problem already exists in any dynamically typed language
>> where any kind of heterogenous collection can be returned, and you said
>> you prefer dynamic typing. It doesn't add to the problem therefore;
>> it's just as hard if you're returning a record or dictionary. The
>> advantage of tuples is that they can be instantly unpacked after the
>> function call. 

> Another point where a specific example might be helpful. 
> You argument has "when X happens..." arguments in them,
> and the way I code/design, X may not happen very often.

Well, imagine you have a function which returns a year, a month
and a day.

Using by-reference parameters in some imaginary language, it could
look like this:

def current_datetime(&ref_year, &ref_month, &ref_day):
    ... calculate year/month/day ...
    ref_year = year 
    ref_month = month 
    ref_day = day 

You use it like this:

year = None
month = None
day = None
current_datetime(year, month, day)
print year, month, day
 
But, almost certainly, if this language is dynamically typed, you can
instead do something like this:

def current_datetime():
   ... calculate year/month/day ...
   return { 'year': year, 'month': month, 'day': day }
   
result = current_datetime()
print result['year'], result['month'], result['day']

I myself would be inclined to do this, as it's less verbose and clearer.

If you do that, you'll have to look at the result type anyway, which
was your objection to the even less verbose use of tuples:

def current_datetime():
    return year, month, day

year, month, day = current_datetime()
print year, month, day

[snip]
>> > Having to also check return statements is a two-stop deal.
>> > (I don't end up looking at return statements very often.)
>> 
>> The other deal is that I don't have to go look up the function definition
>> each time I see a function call I don't know about, just in case this may
>> involve reference parameters! That's a huge deal in my opinion. :)

> I guess the naming conventions I use for routines is the
> primary indicator as to whether it is mostly changing
> or using info. I tend to use prefixes like "put" or
> "change" or "move" to indicate that a lot of changing 
> is going on.

Makes sense. Anyway, it still seems more unwieldy to me, but let's
just quit this argument too. :)

[snip]
> To say that "math didn't do it" is misleading IMO. What is good
> for math may not be good for programming.

No, of course not. But I can still make my case that output parameters
were a hack introduced to be able to return multiple values, by pointing
out that originally languages didn't have them (and many still don't).

>> (and of course in Python you can mutate mutable objects passed to a function and
>> *any* variable in Python is a reference. But it's better style to avoid
>> mutating input if possible, in my opinion. It encourages more independent
>> functions which makes for easier to maintain and debug code).

> Yip. Complex things passed in are alterable anyhow. For example,
> it makes more sense to change a large array *in place* rather
> than make a copy and pass it back out.

If the array is expected to be large, yes. I usually prefer generating
a new array, though.

> Thus, you have *two* param changing
> conventions floating around in Python.

Not exactly; it's just that all variables are references in Python. If
I pass in immutable references (integers, strings, tuples), or simply
choose not to mutate the references, it's fine. That's somewhat different
from pass-by-reference, which would allow me to do this:

a = 1
foo(a)
print a # prints 2

That's impossible. It's impossible to make 'a' point to any other object
than the one you passed in, though you may change the object. If you
call it another param changing convention, then allowing this behavior
would give the language *three* conventions. :)
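A quick demonstration of the distinction:

```python
def rebind(x):
    x = 2               # rebinds the *local* name only

def mutate(lst):
    lst.append(2)       # mutates the shared object itself

a = 1
rebind(a)
assert a == 1           # impossible to reseat 'a' from inside the callee

b = [1]
mutate(b)
assert b == [1, 2]      # but a mutable object can be changed in place
```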

[snip]
[snip]
>> > I meant industry domains, like business versus embedded systems versus 
>> > scientific computing, etc. I don't do a lot of X, Y coordinate work, BTW.
>> 
>> European example for the industry domain is a 'year/weeknumber' tuple.
>> In Europe industry often works with (ISO) weeknumbers. To calculate 
>> weeknumers back to a date (beginning of the week), you need the year as well,
>> so it can make sense to pass these around as pairs in ones application.

> I prefer to pass dates around as single strings. Formatting it for 
> different countries is a formatting (output) issue and not an internal 
> issue. IOW, the internal representation and the external do not
> have to be the same.

Of course; but I need these functions for input/output issues, right? :)

[snip more tuple stuff]
> I guess I don't see a significant net value of touples. They
> just complicate the syntax and are often used for
> stuff that can be done other ways.

That's true for most things in programming. Assembler language has a
very simple syntax. :)

[more tuples]
>> I tried to show you how this *is* common stuff. Not swapping variables,
>> but collecting a bunch of things together and passing them around as
>> a whole, and returning them as a whole, and easily separating them into
>> pieces again. It happens frequently in software.

> Not in a way that makes much use in touples. Perhaps you use
> touples the way that I use relational tables, and that is
> why my approach has less use for them.

Hm, I don't know how you use relational tables, but I'm hard pressed
to think of an example here. :)

[snip]
>> Yes, a small record can grow into a large one, and you will have to adapt
>> some code when it does (in a dynamically typed language, not a lot). 
>> There are many cases when this just doesn't happen, though; x, y coordinates
>> are an example, so are year/weeknumbers, or 'year/month/day' pairs, or
>> 'amount/currency_type' pairs, and so on.

> Well, I don't use much X and Y coordinates, and would probably
> use tables if I did, since such an app probably has lots of them.

r/g/b color triplets are another good example. A table can be rather
heavyweight if it's a persistent relational one; in graphics 
applications one doesn't need that. Also, x, y pairs are often
calculated, not stored.

> Often in table-land you pass around a record ID or record
> reference instead of the record contents. Such a
> record reference is similar in concept to a touple
> of X and Y.

Yes, though a bit harder to handle, and likely less efficient once you
need to access the fields (though that depends on implementation,
I guess). I frequently pass around record IDs as well, and of course
everything in a language like Python is a reference *already* so that
happens all the time. You're just passing around a reference to the
tuple, by the way; the tuple itself stays where it was. It may be
in a list or dictionary or wherever. 

[snip]
>> > Performance often only becomes an issue when stupid programmers play with
>> > too many features. Thus, reduce the syntax features and you have less 
>> > playing around with wasteful things and cryptic tricks.
>> 
>> The tradeoff here is that one thing often doesn't fit all. 

> But most :-)

> That is the prestine beauty of tables. They flex like nothing
> else I have ever seen in programming. I am trying to spread
> the Table Godspel. Mabye we will get our own common language
> and widely used buzzwords just like everybody else.

Oh no! So in 2010 everybody is using TOP, the Java of its day, and
I hang out in comp.table saying how OO works great and you have object
databases and so on? :)

Anyway, I'd be interested in seeing a language that tries to push the
table concept further into the syntax and semantics. It could have
some interesting features. In fact many OO languages try to gain such
features by doing things like mapping objects onto tables, etc.

Regards,

Martijn
-- 
History of the 20th Century: WW1, WW2, WW3?
No, WWW -- Could we be going in the right direction?


