Comparisons and sorting of a numeric class....

Thu Jan 15 20:45:22 EST 2015

On 01/15/2015 12:41 AM, Steven D'Aprano wrote:
> On Wed, 14 Jan 2015 23:23:54 -0800, Andrew Robinson wrote:
>
> [...]
>> A subclass is generally backward compatible in any event -- as it is
>> built upon a class, so that one can almost always revert to the base
>> class's meaning when desired -- but subclassing allows extended meanings
>> to be carried.  eg: A subclass of bool is a bool -- but it can be MORE
>> than a bool in many ways.
> You don't have to explain the benefits of subclassing here.
>
> I'm still trying to understand why you think you *need* to use a bool
> subclass. I can think of multiple alternatives:
>
> - don't use True and False at all, create your own multi-valued
>    truth values ReallyTrue, MaybeTrue, SittingOnTheFence, ProbablyFalse,
>    CertainlyFalse (or whatever names you choose to give them);
>
> - use delegation to proxy True and False;
>
> - write a class to handle the PossiblyTrue and PossiblyFalse cases,
>    and use True and False for the True and False cases;
>
> There may be other alternatives, but what problem are you solving that
> you think
>
>      class MyBool(bool): ...
>
> is the only solution?
That's an unfair question that has multiple overlapping answers.
Especially since I never said subclassing bool is the 'only' solution; I 
have indicated it's a far better solution than many.
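For anyone following along, here is a minimal sketch of what happens when the quoted class statement is actually run. I am assuming a current CPython here; the name subclass_error is only for illustration.

```python
# Attempt the quoted subclass; CPython refuses bool as a base type.
try:
    class MyBool(bool):
        pass
except TypeError as exc:
    subclass_error = str(exc)   # the interpreter's refusal message
else:
    subclass_error = None       # would mean subclassing was allowed

print(subclass_error)
```

So any bool-like type has to be built some other way, which is what the rest of this message is about.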

So -- I'll just walk you through my thought processes and you will see 
what I consider problems:

Start with the concept that as an engineer, I have spent well over 
twenty years on and off dealing with boolean values that are very often 
mixed indistinguishably with 'don't care' or 'tri-state' or 'metastable 
states'.   A metastable state *is* going to be True or False once the 
metastability resolves by some condition of measurement/timing/etc.; but 
that value can not be known in advance.   eg: similar to the idea that 
there is early and late binding in programming.... Sometimes there is a 
very good reason to delay making a final decision until the last 
possible moment; and it is good to have a default value defined if no 
decision is made at all.

So -- From my perspective, Guido making Python go from an open-ended and 
permissive use of anything-goes as a return value that can handle 
metastable states -- to a historical version of 'logic' having *only* 
two values in a very puritanical sense, is rather -- well -- 
disappointing.  It makes me wonder -- what hit the fan?!  Is it 
lemmings syndrome ? a fight ? no idea....  and is there any hope of 
recovery or a work around ?

eg: To me -- (as an engineer) undefined *IS* equivalent in usage to an 
actual logic value, just as infinity is a floating point value that is 
returned as a 'float'.  You COULD/CAN separate the two values from each 
other -- but always with penalties.  They generally share an OOP 'is' 
relationship with respect to how and when they are used. (inf) 'IS' a 
float value and -- uncertain -- 'IS' a logic value.

That is why I automatically thought before I ever started writing on 
this list (and you are challenging me to change...) -- that 'uncertain' 
should share the same type (or at least subtype) as Bool.  
Mathematicians can argue all they want that 'infinity' is not a float 
value, and uncertain is not a True or False.  And they are/will be 
technically right -- But as a practical matter -- I think programmers 
have demonstrated over the years that good code can handle 'infinity' 
most efficiently by considering it a value rather than an exception.  
And I think the same kind of considerations very very likely apply to 
Truth values returned from comparisons found in statistics, quantum 
mechanics, computer logic design, and several other fields that I am 
less familiar with.

So -- let's look at the examples you gave:

> - don't use True and False at all, create your own multi-valued
>    truth values ReallyTrue, MaybeTrue, SittingOnTheFence, ProbablyFalse,
>    CertainlyFalse (or whatever names you choose to give them);
>
OK.  So -- what do I think about when I see your suggestion:

First I need to note where my booleans come from -- although I've never 
called it multi-valued logic... so jargon drift is an issue... though 
you're not wrong, please note the idea of multi-value is mildly misleading.

The return values I'm concerned about come from a decimal value after a 
comparison with another decimal value.
eg:

a = magicFloat( '2.15623423423(1)' )
b = magicFloat('3()')

myTruthObject = a>b

Then I look at python development historically and look at the built in 
class's return values for compares; and I notice; they have over time 
become more and more tied to the 'type' bool.  I expect sometime in the 
future that python may implement an actual type check on all comparison 
operators so they can not be used to return anything but a bool.  (eg:  
I already noticed a type check on the return value of len() so that I 
can't return infinity, even when a method clearly is returning an 
infinitely long iterator -- such as a method computing PI dynamically.  
That suggests to me that there is significant risk in python of having 
type checking on all __xx__ methods in the future. )  This inspection is 
what points me foremost to saying that very likely, I am going to want a 
bool or subtype of it (if possible) as a return type as self defense 
against future changes in Python -- although at present, I can still get 
away with returning other types if bool turns out to be impossible.
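The two behaviours I am contrasting can be sketched as follows. This is only an illustration under my own assumptions: Verdict, Measured and Endless are hypothetical names, and Measured is a crude stand-in for a number with an error bound, not my magicFloat class.

```python
class Verdict:
    """Hypothetical non-bool comparison result with a default truth value."""
    def __init__(self, default, note):
        self.default = default
        self.note = note

    def __bool__(self):
        return self.default


class Measured:
    """Hypothetical number with an error bound (value +/- err)."""
    def __init__(self, value, err):
        self.value = value
        self.err = err

    def __gt__(self, other):
        gap = self.value - other.value
        if abs(gap) > self.err + other.err:
            return Verdict(gap > 0, "certain")
        # The error bounds overlap: fall back to a default of False.
        return Verdict(False, "uncertain")


class Endless:
    def __len__(self):
        return float("inf")     # not an integer, so len() will reject it


# A comparison operator may currently return any object at all.
result = Measured(3.0, 0.5) > Measured(2.9, 0.5)
print(type(result).__name__, bool(result), result.note)

# len(), by contrast, checks what __len__ returns.
try:
    len(Endless())
except TypeError as exc:
    len_error = str(exc)
else:
    len_error = None
print(len_error)
```

The comparison hands back a Verdict untouched, while len() raises TypeError on the float. That difference is what makes me wary about the future of the comparison methods.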

Next, I notice that for compatibility it *is* very desirable that I use 
the existing '>' operator,  because programmers generally want to be 
able to use '>' when they are testing greater than -- and in legacy code 
I expect people have exclusively done so -- and I know from past 
experience that programmers in general will not be happy with typing 
'a.greaterThan(b)' religiously. ( Extend my reasoning to all other 
comparison operators.)

It would be worse to use '>' and have it trigger an exception when a 
non-bool is encountered to force the programmer to attend to special 
metastable states differently; because then the programmer has to write 
a complete set of secondary handling routines or a 'try' statement around 
a very large number of lines of code and that makes for legacy code 
rewriting rather than minor upgrading...

It would also be bad to have my code have modal settings because I don't 
want to bother with thread information or have programmers consider 
thread issues unless it's a last resort; although that's the approach 
used by the Decimal class and other examples I have seen.

So:  In general, the most desirable return type is determined by what 
python actually returns for normal comparison operations; eg: apparently 
a bool -- but with some way of signaling (if the user cares) that more 
precise information is available as to why a value is False if it is False.

Unfortunately, the '>', '<', '==', and other operators have no way of 
returning additional information on their own; so again, a second 
(undesirable) function/method would need to be invoked to overcome the 
limitation if only a strict bool is allowed as a return type; and that 
means concomitant issues of storage and wasted re-computation and 
threads.

So, your first alternative is the most at risk of future problems due to 
constraints likely to be placed on comparison return types ; and as I 
don't want to do much maintenance on my library in the future -- I don't 
think that is a very good choice for making my library with.

> - use delegation to proxy True and False;
That sounds like a far more likely to succeed alternative, and is one of 
a handful of alternatives I have been exploring on my own.

Proxies allow detection of an actual deterministic False vs. a default 
False.  So a proxy's id() can signal to a user when it is possible to 
upgrade a False to True should they care.   Therefore -- If Guido would 
see fit to permanently allow proxied True and False values to be 
returned in lieu OF an actual True and False value, then this would be a 
near ideal alternative.  But Python does not implement a general purpose 
proxy that I know of ...

I have gotten a single instance of a class acting as a proxy to mostly 
work; and I have gotten isinstance( myTruthValue, bool ) to return True 
for the proxy object -- which is not a bool itself.  However, when I 
attempt multiple instances of the proxy -- it becomes more difficult.  I 
think a pure python implementation might be possible -- and I'll 
continue to try for a while -- but python may not be able to do it 
totally from the python side because there is a difference in how Python 
handles type() checks and isinstance() checks.
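A minimal sketch of the kind of proxy I mean, and of the type() versus isinstance() difference. BoolProxy and its 'certain' attribute are hypothetical names for illustration, not my working code; the sketch relies on isinstance() consulting the object's __class__ attribute when the real type does not match.

```python
class BoolProxy:
    """Hypothetical proxy that delegates truthiness to a wrapped bool."""

    # isinstance() falls back to this attribute; type() ignores it.
    __class__ = property(lambda self: bool)

    def __init__(self, target, certain):
        self._target = bool(target)
        self.certain = certain      # extra information a plain bool cannot carry

    def __bool__(self):
        return self._target

    def __eq__(self, other):
        return self._target == bool(other)

    def __hash__(self):
        return hash(self._target)


default_false = BoolProxy(False, certain=False)

print(isinstance(default_false, bool))   # the proxy passes the isinstance check
print(type(default_false) is bool)       # but its real type is still BoolProxy
print(default_false is False)            # and it is not the False singleton
```

Several independent proxy instances can coexist this way, but any code that tests type(x) is bool or 'x is False' will still see through the disguise.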

> - write a class to handle the PossiblyTrue and PossiblyFalse cases,
>    and use True and False for the True and False cases;
I very much would want to do as you state here because it would preserve 
both True and False unaltered --- which would ALWAYS work in legacy 
code; but I don't know how to do it safely.

Although I can use True for absolute Truth -- I can not use False as 
absolute False without inviting confusion as to when to allow advanced 
compares.

When I do a comparison on any False < False, in legacy code -- it needs 
to return False.

But, when looking at uncertainty values, if totally False 'is' the same 
as base type False -- then the issue arises that a comparison False < 
AnyOtherPartTrueFalse   needs to be False for legacy compares but True 
for advanced compares;

It's inconsistent and I have no way of detecting where the False I am 
comparing with came from in order to make a proper decision.

So:  The only solution I see is to assume that whenever an uncertainty is 
compared against a legacy bool -- that the legacy style of comparison is 
absolutely required for safety; and a second version of False must be 
defined to detect when the compare needs to take uncertainty into account.

All of these issues are handled correctly in the example tuple class I 
already showed.  So the tuple class I showed is presently the best 
solution with the most compatibility that I have found so far.
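
To make the legacy-versus-advanced distinction concrete, here is a rough sketch. It is NOT the tuple class referred to above; Uncertain, its 'certain' flag and the ranking order are assumptions made up for this illustration.

```python
class Uncertain:
    """Hypothetical truth value: a default bool plus a certainty flag."""

    def __init__(self, default, certain):
        self.default = bool(default)
        self.certain = bool(certain)

    def __bool__(self):
        return self.default

    def _rank(self):
        # certain False < uncertain False < uncertain True < certain True
        second = self.certain if self.default else not self.certain
        return (self.default, second)

    def __lt__(self, other):
        if isinstance(other, Uncertain):
            return self._rank() < other._rank()    # advanced compare
        if isinstance(other, bool):
            return bool(self) < other              # legacy compare
        return NotImplemented

    def __gt__(self, other):
        if isinstance(other, Uncertain):
            return self._rank() > other._rank()    # advanced compare
        if isinstance(other, bool):
            return bool(self) > other              # legacy compare
        return NotImplemented


FALSE = Uncertain(False, certain=True)         # the "second version of False"
MAYBE_FALSE = Uncertain(False, certain=False)

legacy = False < MAYBE_FALSE      # built-in False involved: legacy rules
advanced = FALSE < MAYBE_FALSE    # both operands Uncertain: certainty counts
print(legacy, advanced)
```

With the built-in False on either side the result follows plain bool ordering; only when both operands are Uncertain does certainty enter the comparison.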
>
>
>> One example: It can also be a union.
> I don't understand what you think this means. I know what *I* think it
> means, but "subclass = union" doesn't make sense to me, so wonder what
> you think it means.

It's a fringe use-case in Python that I don't think many people use/know 
about.  I was just being thorough in listing it.

I haven't seen it used in actual python code myself -- but I know from 
the literature I've read on Python that it is occasionally used in a 
manner analogous to that of C/C++.

In C/C++ unions are a datatype that allow two (or more) different types 
of data to be defined as held by one object, but only one of them is 
allowed to be initialized at a time because their locations in computer 
memory overlap.  C places no restrictions on the compatibility of 
the datatypes -- which is different than Python, but Python has a 
similar ability.

In Python, when multiple inheritance is invoked -- it does some kind of 
check on the base types for compatibility; but it still appears able to 
overlap (or simply does overlap) the allocated memory for the different 
base types; at least according to several sources I have read on 
C-python internals.

So one can semantically instantiate a subclass of one subtype without 
semantically instantiating the other.

All I know about it is what I have seen in Python literature -- and I 
tested the examples I was shown to see if they still work, and they do 
-- and noted that at least at one time Guido apparently thought it was a 
good idea ; but I haven't pursued it beyond that.
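
The compatibility check itself is easy to see from the Python side. This sketch shows only that check under CPython; it says nothing either way about memory being overlapped. Tag, TaggedInt and IntStr are hypothetical names.

```python
class Tag:
    """Plain Python mixin with no C-level instance layout of its own."""
    label = "tagged"


class TaggedInt(int, Tag):
    """Accepted: only one base (int) dictates the instance layout."""


print(TaggedInt(5) + 1, TaggedInt(5).label)

# Two built-in bases with different instance layouts are refused.
try:
    IntStr = type("IntStr", (int, str), {})
except TypeError as exc:
    layout_error = str(exc)
else:
    layout_error = None
print(layout_error)
```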

>> So when Guido chose to cut off
>> subclassing -- his decision had a wider impact than just the one he
>> mentioned; eg: extra *instances* of True and False.... as if he were
>> trying to save memory or something.
> *shrug* well maybe he was.
:) LOL.  I don't have any real idea.... but it would be useful to know 
for sure.

>
>> The reason Guido's action puzzles me is twofold -- first it has been
>> standard industry practice to subclass singleton  (or n-ton) objects to
>> expand their meaning in new contexts,
> I dispute that. I *strongly* dispute that.
>
> Industry practice, in my experience, is that there is one and only one
> case where you can subclass singleton classes: when you have a factory
> which chooses at runtime which subclass to instantiate, after which you
> can no longer instantiate any of the other subclasses.
OK.
Well, I'll just say that I believe you -- and I'm not really sure what 
you're objecting to in what I said -- but if a singleton subclass / 
factory existed for my purpose -- I would be happy to choose it at 
runtime just like your maze guys do...!  If Guido would do that... he 
would give me a subtype of bool and that would be very nice indeed.

But dreams aside -- I still note your admission shows that industry does 
allow subclassing of singletons even if it requires the owner of the 
singleton (Guido) to allow the subtypes.

Cf: Design Patterns: Elements of Reusable Object-Oriented Software ( 
Erich Gamma, Richard Helm, Ralph Johnson, John Vlissides ) p. 127 
"Chapter, Singleton -- Object Creational"
--------------------------------------------------------

-- Applicability:
Use the singleton pattern when:
-- There must be exactly one instance of a class, and it must be 
accessible to clients from a well known access point.
-- When the sole instance should be extensible by subclassing, and 
clients should be able to use an extended instance without modifying 
their code.

...

Consequences:
The singleton pattern has several benefits:
...
4. Permits a variable number of instances.   The pattern makes it easy 
to change your mind and allow more than one instance of the Singleton 
Class.  Moreover, you can use the same approach to control the number of 
instances that the application uses.  Only the operation that grants 
access to the singleton instance needs change.

...

Implementation:
2. Subclassing the Singleton Class:  The main issue is not so much 
defining the subclass but installing its unique instance so that 
clients will be able to use it.
...
A more flexible approach uses a registry of singletons.  Instead of 
having Instance define the set of possible Singleton Classes, The 
Singleton classes can register their singleton instance by name in a 
well known registry.
...
Of course, the constructor won't get called unless someone instantiates 
the class, which echoes the problem the Singleton is trying to solve.
---------------------------------------------------------
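
The registry approach quoted above could look roughly like this in Python. It is a loose sketch of the idea and not the book's code; the register/instance method names, the "default" key and EnchantedMaze are assumptions for illustration.

```python
class Singleton:
    """Keeps one registered instance per name in a well known registry."""
    _registry = {}

    @classmethod
    def register(cls, name, instance):
        cls._registry[name] = instance

    @classmethod
    def instance(cls, name="default"):
        # The single access point: clients never call constructors directly.
        return cls._registry[name]


class Maze(Singleton):
    kind = "plain"


class EnchantedMaze(Maze):
    kind = "enchanted"


# Someone still has to instantiate and register each singleton once.
Singleton.register("default", Maze())
Singleton.register("enchanted", EnchantedMaze())

print(Singleton.instance().kind, Singleton.instance("enchanted").kind)
```

Clients that ask for the "enchanted" entry get an extended instance without changing how they obtain or use it, which is the applicability point quoted above.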

> Why are we limited to a single Maze? Making Maze a singleton in the first
> place was a bad idea. The singleton design pattern is *highly* over-used
> and abused. But that is another story. Let's just assume that the
> designer has good reason to insist that there be only one Maze, perhaps
> it uses some resource which truly is limited to there being one only. If
> that is the case, then allowing the caller to break that invariant by
> subclassing will just lead to horrible bugs.
Right -- when a user does not know the reason for a singleton ; breaking 
it is just ASKING for bugs.  I agree.  That's why I have been asking 
about why Guido did it...  there are times to avoid breaking the rules, 
and times to crush them.

>   By using a factory and
> controlling access to the subclasses, Maze can keep the singleton
> invariant and allow subclasses.
>
> This is not relevant to bool, since True and False are already
> instantiated.

There's nothing stopping Guido from making it relevant...
>
> [...]
>> In general -- it's not the goal of subclassing to create more instances
>> of the base types
> That might not be the goal, but it is the effect. When you instantiate
> the subclass, by definition the instances are also instances of the base
> classes.
All right -- I can agree to that and will concede that point -- as I 
don't see much purpose in pursuing it further as I suspect (without 
proof) that Guido might not like the extra instances of class 
definitions that I might use as a work-around... although I don't really 
know why it's so important to him.
>> -- but rather to refine meaning in a way that can be
>> automatically reverted to the base class value (when appropriate) and to
>> signal to users that the type can be passed to functions that require a
>> bool because of backward compatibility.
> And I am wondering what you think you can extend bools to perform that is
> completely backwards compatible to code that requires bools?
I've never said 'completely compatible', and have been very careful not 
to make extremist remarks.
I want to get as close as I can to fully backward compatible -- and am 
willing to put some time into it rather than taking the first solution 
that vaguely works...

> I don't think you can. I think you are engaged on a fool's errand, trying
> to do something impossible *even if subclassing bool were allowed*. But I
> could be wrong. I just don't think you can possibly write code which is
> backwards-compatible with code that expects bools while still extending
> it. People are so prone to write:
>
>      if flag is True: ...
>      if flag is False: ...
D'Aprano -- I think you're making what is known as a straw man argument.

Refer back to your earlier suggestion of re-using True and False to 
represent themselves, and some other type to represent the intermediate 
metastates;  From your remark here, I surmise that you must have already 
figured out that the alternative you gave me was never meant to work -- 
otherwise it would solve the very problem you now present me with for 
any case where my numbers are identical in meaning with legacy numbers 
-- eg: it *would* work perfectly for any truly legacy application.  The 
failures -- would show up with new applications or non legacy data which 
could erroneously trigger a legacy compare when it ought not do so 
because you can not get the new types returned unless non-legacy data 
has been encountered.
> (which is naughty of them, but what are you going to do?)
>
Nothing, except hope that the people who wrote Python itself didn't do 
anything naughty in sort() min() and max() and friends.  So far my tests 
show that they didn't.  But -- you're right -- Non core language 
implementation programmers, are going to have occasional bugs that 
either they or I will have to hunt down, depending on who it is that 
needs their software to work with my library.
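
The kind of test I mean can be sketched like this. Truthy and Num are hypothetical stand-ins, not my library classes; the point is only that the built-ins judge a comparison result by its truth value, while 'is True' judges it by identity.

```python
class Truthy:
    """Hypothetical comparison result that is not a bool."""
    def __init__(self, value):
        self.value = value

    def __bool__(self):
        return self.value


class Num:
    """Hypothetical number whose comparisons return Truthy objects."""
    def __init__(self, n):
        self.n = n

    def __lt__(self, other):
        return Truthy(self.n < other.n)

    def __gt__(self, other):
        return Truthy(self.n > other.n)


flag = Num(1) < Num(2)
taken_by_truthiness = bool(flag)    # what 'if flag:' sees
taken_by_identity = flag is True    # what 'if flag is True:' sees

items = [Num(3), Num(1), Num(2)]
ordered = [x.n for x in sorted(items)]
smallest = min(items).n
largest = max(items).n
print(taken_by_truthiness, taken_by_identity, ordered, smallest, largest)
```

sorted(), min() and max() all cope with the non-bool results, whereas the identity test quietly takes the wrong branch.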
> C doesn't have instances because it doesn't have objects. I'm not
> certain, but I don't think the other languages you refer to are object-
> oriented either. Verilog is a structured programming language, Silos is a
> Verilog simulator, and I think VHDL and HDL are versions of Verilog (that
> is, I've only seen them written as "Verilog-VHDL" and "Verilog-HDL").
OOP programming in C is not done using formal class keywords, etc, but 
it is done by defining structs and compiler modules and pointers to 
functions;   So -- C -- doesn't have the security measures that a C++ 
compiler implements for OOP ( 'private' ,'protected' ); but OOP can 
still be done in C including inheritance.  'C' most certainly does have 
instances and singletons.  Several packages available under GPL, such as 
the GTK widget set, are implemented in strict C (not C++) and as full 
object oriented packages, then another optional package can be compiled 
if C++ bindings to the objects are desired.

Verilog is the originator of the language family I mentioned, yes; and 
they are all variations on a theme -- but there are versions of HDL's by 
other companies, and the US government;  Most are based on C syntax, 
some are based on ADA, and other languages that engineers happen to like 
for various applications; etc.  I mentioned only the most used versions.

> In any case, Verilog *by design* uses four-state logic, modelling 1, 0,
> floating, undefined. It is not a bool, since *by definition* bools model
> only two states.

Not quite, verilog is meant to handle two state logic. AKA: Binary bit 
or Boolean, and to also work with  metastable data; eg: In electronics, 
floating or unknown or oscillating or frozen between states for a period 
of time while settling are traditionally called metastable.  I am not 
sure if this is a mathematician's definition, or if it's because these 
quasi-states were defined with/after (meta) the two stable ones.  It's 
something I will have to check. But I remember from my early college 
courses that it is technically wrong to call them all states, even 
though 'don't care' is often referred to as the tri-state.

In any event -- Your comment about verilog still just demonstrates that 
Guido has downgraded python's return types into a more primitive 
system than is warranted by the history of the creation of the 
computer.  Verilog (1984) existed before Python (1989) so *even* 
verilog's conventions predate python's.  And I don't even remember when 
HiLo used to be around but I'm sure it's older than verilog. So -- from 
the very beginning of Python, design logic for boolean systems has 
*always* included meta-state information with boolean values.

>> The third value is usually called "TRI-state" or "don't care". (Though
>> it's sometimes a misnomer which means -- don't know, but do care.)
> And SQL has NULL, which makes it an example of tri-state logic. (To be
> precise, SQL uses a version of Kleene K3 logic.)
OK.  I agree -- it does.
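
For reference, the Kleene K3 connectives are small enough to sketch directly. Here None stands in for the unknown value, much as NULL does in SQL; the function names are my own choice for illustration.

```python
def k3_not(a):
    """Negation: unknown stays unknown."""
    return None if a is None else (not a)


def k3_and(a, b):
    """Conjunction: a definite False decides the result on its own."""
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return True


def k3_or(a, b):
    """Disjunction: a definite True decides the result on its own."""
    if a is True or b is True:
        return True
    if a is None or b is None:
        return None
    return False


print(k3_and(False, None), k3_and(True, None), k3_or(True, None), k3_or(False, None))
```

Note that the unknown only propagates when the other operand cannot settle the answer by itself.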
>
> [snip description of modelling hardware circuits]
> All very interesting, but completely irrelevant to the question of
> subclassing bool.

No, not really -- but I'll respect your difference of opinion.
I'm getting the message that the reason Guido thought this was important 
was because the historical meaning of bool is more important than the 
idea that metastability is an issue to be dealt with at the same time as 
the data value -- like electrical engineers do regularly.

>
>> We've discovered that we live in a quantum-mechanical universe -- yet
>> people still don't grasp the pragmatic issue that basic logic can be
>> indeterminate at least some of the time ?!
> Of course they do. My first post to you in this thread suggested that
> before you start re-inventing the wheel you look at prior art in the
> multi-value logic field.
Did you ? -- Did I reply to that e-mail?  I'm not sure I read it...
But the word is different from what I am used to -- eg: meta-stable 
logic 'states' ... ?

Now that I'm looking up words -- I see that wikipedia is calling the 
indeterminate states 'multi-value'; I'm getting old... I am used to the 
term metastable; not multi-value.  Weird.  Jargon problems...

Even so -- I seriously don't think of Quantum mechanics as multi-value ; 
it's uncertain and 'collapses' to a definite value when measured.  I can 
understand your intention now... I'll have to go back and search for the 
old email. My apology.

>
>
>> The name 'boolean logic' has never been re-named in honor of the many
>> people who developed the advancements in computers -- including things
>> like data sheets for electronic parts,
> Are you really suggesting that the name of Boolean Logic should be
> renamed away from the person who invented the field and instead named
> after the person who first wrote down a list of electronic part numbers
> and their specifications?
Nope.
Though I DO want to point out that George Boole did not invent the 
computer, build the microprocessor, or any of the things which would 
give a logical reason why his more archaic *usage* is given preference 
over the usage preferred by the very people who DID invent the 
microprocessor, computer, and programming languages.

Name recognition is great for honoring a man -- but makes for a poor 
reason to choose a strict implementation of bool.

>
>> or the code base used for solving
>> large numbers of simultaneous logic equations with uncertainty included
>> -- which have universally refined the boolean logic meanings found in
>> "Truth" tables having clearly more than two values -- but don't take my
>> word for it -- look in any digital electronics data book, and there they
>> will be more than two states; marked with rising edges, falling edges,
>> X's for don't cares, and so forth.
> And those truth tables are not part of Boolean algebra.
Oh wow!!!! I never expected to hear that --  But I guess you were never 
trained to do boolean algebra, formally ? Or did you mean something else ?

http://en.wikipedia.org/wiki/Truth_table

The truth tables on data sheets are VERY VERY much intended to be 
related to boolean logic.
Electronic engineers routinely put the words "Truth table" on datasheets 
where the boolean information is recorded (and I'm sure even on relay 
logic prior to the vacuum tube) but still add  x's because *as 
inventors* they knew Boole's usage wasn't enough to convey ideas 
efficiently and fully.

It's pragmatism over rigid formalism left over from an age where the 
computer as we have it was not even conceived.

http://www.eleccircuit.com/cd4027b_datasheet-of-dual-j-k-flip-flop/

> [...]
>> As I said to D'Aprano -- even a *cursory* examination (eg: as in not
>> detailed) shows I could do things which he wasn't considering.
> Andrew, I think you will be surprised at what I have considered. If you
> search the archives, you will find that (by memory) a decade ago I had
> considered using classes without instantiating them.
No surprise.  I know from reading your work that you have been doing 
programming a long time, and have fairly well substantiated / reasonable 
opinions even if I disagree with some of them as trying to overemphasize 
to a fault a definition which has never been honored in the past by 
those who used it most.

> The questions I have about your strategy is not what can be done in
> Python, but how you think these things you want to do will solve the
> problem you apparently have?
>
> To give an analogy... I have no doubt that you can build a television.
> But I question how building a television solves your problem of
> transporting a goat, a wolf and a cabbage across the river.
Ask away.  I already have one solution that works reasonably well, the 
tuple rich compare;
So it's not like I don't have a solution -- it's just that I'm not sure 
that I can't do better.

If Python had never added the bool definition to the language, I 
wouldn't even have to bother with any of this supposed 'fool's errand' 
nuisance in the first place... but I'll make the best of it.

> [...]
>> I don't have a setting on my email to turn off html.  Sorry. Can't help.
> You are using Thunderbird. You certainly do have such a setting.
It's nice to know that you read and believe what you see in an email header.
Note: Headers are sometimes modified by sysadmins who actually care 
about security.

PPS: If there is a way to turn off HTML in this email program -- it is 
not obvious -- and I have looked.
I've done my best not to push any HTML enhancement buttons...