Comparisons and sorting of a numeric class....

Steven D'Aprano steve+comp.lang.python at pearwood.info
Sat Jan 17 07:27:20 EST 2015


Andrew, sorry for the delay in responding. Your response has been extremely
long, so for the interest of brevity I've tried to trim things down to the
most pertinent points.

Andrew Robinson wrote:

[...]
> So -- From my perspective, Guido making Python go from an open ended and
> permissive use of anything goes as a return value that can handle
> metastable states

That absolutely has not changed. You seem to be mixing the concept of
*subclassing* with the concept of being able to handle multi-state
(metastable) logic. You can certainly write multi-state logic in Python.
 

> -- into to a historical version of 'logic' being 
> having *only* two values in a very puritanical sense, is rather -- well
> -- disappointing.  It makes me wonder -- what hit the fan?!  Is it
> lemmings syndrome ? a fight ? no idea....  and is there any hope of
> recovery or a work around ?

Do you have much programming experience outside of the niche of modelling
electronic hardware?

I can tell you that it is standard for general purpose programming languages
to be restricted to two-state (Boolean) comparisons. The basic decision
construct is "if...else", which is based on two states.

The only exception that I know of is the very early, and long deprecated,
Fortran "Arithmetic IF" statement:

    IF (e) label1, label2, label3

which did a three-way jump depending on whether e was less than zero, equal
to zero, or greater than zero. Some assemblers also included three-way
jumps.

All forms of N-valued logic can be reduced to Boolean logic with appropriate
comparisons. Instead of trying to come up with syntax for a three-way
branch:

if flag: print True
else: print False
otherwise: print Indeterminate

and four-way branches, a five-way branches, ... ad infinitum, it is easier
to leave it up to the programmer to build N-way logic on top of 2-way
True/False comparisons:

# equality here returns True or False
if flag == True: print True
else:
    if flag == False: print False
    else: print Indeterminate


Some programming languages add syntax to make this more friendly:

if flag == True: print True
elif flag == False: print False
else: print Indeterminate

or even:

case flag of:
    True: ...
    False: ...
    Indeterminate: ...
    Maybe: ...
    Uncertain: ...
    Could_Be: ...

(for languages with a case or switch statement).

Not only can all N-way logics be mathematically generated from Boolean
algebra, but I'll go further and say that *almost certainly* the hardware
simulation languages you are used to with 3-way or 4-way logic use 2-way
Boolean logic under the hood.

The point of this is that you need not think of Guido "taking away"
anything. It is normal for programming languages to use Boolean
comparisons.

But in fact Python is *less* restrictive than many other languages. In
Pascal, for instance, comparisons like <= or > etc. can only return true or
false, while Python allows them to return anything.


> eg: To me -- (as an engineer) undefined *IS* equivalent in useage to an
> acutal logic value, just as infinity is a floating point value that is
> returned as a 'float'.  You COULD/CAN separate the two values from each
> other -- but always with penalties.  They generally share an OOP 'is'
> relationship with respect to how and when they are used. (inf) 'IS' a
> float value and -- uncertain -- 'IS' a logic value.

That's fine, we have no dispute with the need for multi-state logics. But
*which* multi-state logic? There are *at least* three useful ternary
logics: Kleene logic, Łukasiewicz logic, Bochvar logic. And then there is
Belnap's four-valued relevance logic (True, False, Both True and False,
Neither True nor False), and an infinite number of other possible logics.
Which should the programming language support?

In a sense, any Turing Complete programming language supports *all of them*.
All you need to do is program the details yourself. But in the sense of
which multi-state logic should the language itself include, I think that
the answer is ... none of them. Most programming languages have no need to
include this complexity, just as most languages don't include tensors as a
language primitive.

I guess this is just a long-winded way of me saying that HDLs are
effectively a "Domain Specific Language" for hardware, and can *and should*
contain specialist semantics needed by hardware designers, e.g. 4-way
logic. But Python is a general purpose language, and while you *can*
program your own domain specific features, you have to do so within the
constraints of the needs of a general-purpose language. In Python's case,
those constraints are actually remarkably loose (not as loose as languages
like Lisp or Forth, where you can literally redesign the interpreter on the
fly!) but they do exist.

[...]
> So -- let's look at the examples you gave:
> 
>> - don't use True and False at all, create your own multi-valued
>>    truth values ReallyTrue, MaybeTrue, SittingOnTheFence, ProbablyFalse,
>>    CertainlyFalse (or whatever names you choose to give them);
>>
> OK.  So -- what do I think about when I see your suggestion:
> 
> First I need to note where my booleans come from -- although I've never
> called it multi-valued logic... so jargon drift is an issue... though
> you're not wrong, please note the idea of muti-value is mildly misleading.

I accept that jargon is an issue, but I believe that "N-way logic"
and "multi-value logic" is, if not standard, at least something that we can
agree to use. The use of "boolean" to describe anything with more than two
states is just weird.

I don't understand what you mean by "note where my booleans come from".



> The return values I'm concerned about come from a decimal value after a
> comparison with another decimal value.
> eg:
> 
> a = magicFloat( '2.15623423423(1)' )
> b = magicFloat('3()')
> 
> myTruthObject = a>b
> 
> Then I look at python development historically and look at the built in
> class's return values for compares; and I notice; they have over time
> become more and more tied to the 'type' bool.  I expect sometime in the
> future that python may implement an actual type check on all comparison
> operators so they can not be used to return anything but a bool.

That will never happen. Comparison operators are explicitly documented as
being allowed to return anything that you wish. Changing that will break so
much code, including important, high-profile libraries like numpy, that it
is never going to occur even if somebody wanted it to.


> (eg: 
> I already noticed a type check on the return value of len() so that I
> can't return infinity, even when a method clearly is returning an
> infinitely long iterator

Yes, that's a known issue, and you're not the first to notice it. That's
partly due to history, and partly to disagreements about API design.


> Next, I notice that for compatibility it *is* very desirable that I use
> the existing '>' operator,  because programmers generally want to be
> able to use '>' when they are testing greater than -- and in legacy code
> I expect people have exclusively done so -- and I know from past
> experience that programmers in general will not be happy with typing
> 'a.greaterThan(b)' religiously. ( Extend my reasoning to all other
> comparison operators.)

Don't worry, customising comparison operators for your own classes is fully
supported.


[...]
> So:  In general, the most desirable return type is determined by what
> python actually returns for normal comparison operations; eg: apparently
> a bool -- but with some way of signaling (if the user cares) that more
> precise information is available as to why a value is False if it is
> False.

Forget the "apparently a bool" part. Comparison operators can return
anything you like. If you have a good reason to return a bool, feel free,
but you are not required too.

You could, should you so choose, implement a complete fuzzy logic system:

py> class C(object):
...     def __eq__(self, other):
...             return 0.5
...
py> c = C()
py> c == 23
0.5

Although you cannot attach extra information to built-in floats like 0.5,
you can certainly create your own classes and attach extra information to
their instances.



> Unfortunately, the '>', '<', '==', and other operators have no way of
> returning additional information on their own; 

They can return any value you like. Perhaps the biggest, most important
third-party library in the Python ecosystem uses this extensively:


py> import numpy
py> a = numpy.array([1, 2, 3, 4, 5, 6])
py> b = numpy.array([2, 1, 3, 6, 5, 4])
py> a < b
array([ True, False, False,  True, False, False], dtype=bool)
py> a == b
array([False, False,  True, False,  True, False], dtype=bool)

If numpy can return an array, you can return a MaybeTrue or ProbablyFalse
object, with whatever diagnostic information you require.

(You will also note that numpy arrays don't inherit from bool.)


> So, your first alternative is the most at risk of future problems due to
> constraints likely to be placed on comparison return types ; 

"Most at risk" in this case equals zero risk. Comparison operators will not
be restricted to only returning bools. That will never happen.


>> - write a class to handle the PossiblyTrue and PossiblyFalse cases,
>>    and use True and False for the True and False cases;
>
> I very much would want to do as you state here because it would preserve
> both True and False unaltered --- which would ALWAYS work in legacy
> code; but I don't know how to do it safely.
> 
> Although I can use True for absolute Truth -- I can not use False as
> absolute False without inviting confusion as to when to allow advanced
> compares.
>  
> When I do a comparison on any False < False, in legacy code -- it needs
> to return False.

That's trivial. Give your Maybe objects __lt__ methods that return False
when compared to False. Here is a stub to get you started:


    def __lt__(self, other):
        if other is False:
            return False
        elif other == ... 



> But, when looking at uncertainty values, if totally False 'is' the same
> as base type False -- then the issue arises that a comparison False <
> AnyOtherPartTrueFalse   needs to be False for legacy compares but True
> for advanced compares;

What's a "legacy compare"? What's "advanced compares"? It seems to me that
you are inventing difficulties with Python that don't exist, and then
creating complex solutions to work around the non-existent problem.

I'm going to take a guess here... 

You have "fuzzy decimals", and when you compare two fuzzy decimals, you want
to return something like "equal", "completely less than", "partially less
than" and so on. For simplicity, I'm only going to discuss the less than
operator, but the same could apply to any comparison operator:

a < b can return:

True -- means that a certainly is less than b

False -- means that a certainly is not less than b

MostlyTrue -- means that a is probably less than b

MostlyFalse -- means that a is probably not less than b


So *fuzzy decimals* have to compare by returning four (or three, or five, or
however many you choose) tetra-logic (tri-logic, quin-logic...) values. To
implement this, you define a __lt__ method on the fuzzy decimal class,
together with the other comparison operators.

But the tetra-logic values themselves shouldn't compare fuzzily with each
other. You should define a fixed order for them, most likely:

False < MostlyFalse < MostlyTrue < True


and have comparison operators on the tetra-logic values themselves return
True or False:

False < True --> returns True
MostlyFalse < True --> returns True


However, should you disregard my advice and decide that MostlyFalse is only
*mostly* larger than False, but is also *slightly* equal to False, you can
certainly have the tetra-logic objects return *themselves* when compared. I
advise against it, not because it is hard to do in Python, but because I
think it will be hard for people to understand. But it can certainly be
done, by defining a __lt__ method on the tetra-logic class itself.


The only thing you can't do in Python is to override the boolean operators
`or` and `and` to implement three-value or four-value logic, e.g. you
cannot have this:

    MostlyFalse and True --> MostlyTrue  # or whatever rule you prefer

There have been proposals to allow overriding `and` and `or`, but so far
they have all run into difficulties with semantics, backwards
compatibility, efficiency, or all three.

I can only suggest that you accept that the `and` and `or` operators will
implicitly coerce their arguments into boolean True/False, and if you want
fuzzy operators, define some functions which do what you want:

def and_(a, b): ...
def or_(a, b): ...

A small inconvenience.


> It's inconsistent and I have no way of detecting where the False I am
> comparing with to make a proper decision.

I don't understand why you think you need to detect where False is. Where in
what sense? Objects in Python don't have a spacial location.


> So:  The only solution I see is to assume that whenever a uncertainty is
> compared against a legacy bool -- that the legacy style of comparison is
> absolutely required for safety; and a second version of False must be
> defined to detect when the compare needs to take uncertainty into account.

I don't understand this.


[...]
>> Industry practice, in my experience, is that there is one and only one
>> case where you can subclass singleton classes: when you have a factory
>> which chooses at runtime which subclass to instantiate, after which you
>> can no longer instantiate any of the other subclasses.
>
> OK.
> Well, I'll just say that I believe you -- and I'm not really sure what
> you're objecting to in what I said -- but if a singleton subclass /
> factory existed for my purpose -- I would be happy to choose it at
> runtime just like your maze guys do...!  If Guido would do that... he
> would give me a subtype of bool and that would be very nice indeed.

You cannot choose True and False at runtime because they are required by the
language. You cannot say "I don't wish to use True and False as my bools, I
wish to instantiate this alternate class instead and use CouldBe and
BuckleysChance as the two bools." That cannot work because long before your
code gets to run and create the subclass, Python has already instantiated
and used True and False. Functions are documented to return True or False,
not "some arbitrary bool subclass instances". The interpreter *cannot work*
without two known values for True and False, so replacing them at runtime
is impossible.

Could Python have been designed differently? Well yes, of course it could
have. If it were designed to be like PHP or Smalltalk or C++ then it would
have been just like PHP or Smalltalk or C++, but it wasn't. Python has the
design it has now, and some changes (like allowing the user to select which
two bools are used at runtime) are simply too extensive a change to be
practical. A bit like changing a three-story house built on a concrete slab
into a two story house on wooden foundations with a deep cellar, without
tearing the house down and starting again from scratch.



> But dreams aside -- I still note your admission shows that industry does
> allow subclassing of singletons even if it requires the owner of the
> singleton (Guido) to allow the subtypes.

And you are free to allow subclassing of your own singletons. Feel free to
allow an million instances of your singleton class, Python won't stop you
from shooting yourself in the foot if it is your own class.


[...]
>> I don't think you can. I think you are engaged on a fool's errand, trying
>> to do something impossible *even if subclassing bool were allowed*. But I
>> could be wrong. I just don't think you can possibly write code which is
>> backwards-compatible with code that expects bools while still extending
>> it. People are so prone to write:
>>
>>      if flag is True: ...
>>      if flag is False: ...
>
> D'Aprano -- I think your making what is known as a straw man argument.

I am sorry if I have misunderstood your position, but believe me, I am
responding to what I understand your position to be as accurately and
fairly as I can. But honestly Andrew, I believe that either your design is
confused and confusing, or you have failed to explain it well enough for me
to see that it is not confused.


> Refer back to your earlier suggestion of re-using True and False to
> represent themselves, and some other type to represent the intermediate
> metastates;  From your remark here, I surmise that you must have already
> figured out that the alternative you gave me was never meant to work --
> otherwise it would solve the very problem you now present me with for
> any case where my numbers are identical in meaning with legacy numbers

This is exactly the sort of confusion I'm talking about.

Logically, your numbers (I presume you are talking about the fuzzy decimals
you mentioned earlier, with some sort of uncertainty or error-bars
attached) cannot possibly be identical in meaning to legacy numbers (i.e.
regular decimals).

There is no logical sense that "number plus or minus uncertainty" can be
*identical* to "number alone", even if the uncertainty is zero. Because the
fuzzy decimals will have an uncertainty but the regular decimals will not.
If you have methods for dealing with that uncertainty, regular decimals
won't have those methods. If they have an "uncertainty" attribute, regular
decimals won't have one. If they print in such a way to show that
uncertainty, regular decimals will not.

If they have *identical* behaviour to regular decimals, then they merely
*are* regular decimals with a different (and likely slower or buggy)
implementation.

None of this is to say that you cannot have fuzzy decimals interoperate with
regular decimals, according to some rules which define how they should
combine. E.g. by treating regular numbers as exact, or giving them some
default uncertainty.


[...]
>>> or the code base used for solving
>>> large numbers of simultaneous logic equations with uncertainty included
>>> -- which have universally refined the boolean logic meanings found in
>>> "Truth" tables having clearly more than two values -- but don't take my
>>> word for it -- look in any digital electronics data book, and there they
>>> will be more than two states; marked with rising edges, falling edges,
>>> X's for don't cares, and so forth.
>>
>> And those truth tables are not part of Boolean algebra.
>
> Oh wow!!!! I never expected to hear that --  But I guess you were never
> trained to do boolean algebra, formally ? Or did you mean something else ?
> 
> http://en.wikipedia.org/wiki/Truth_table
> 
> The truth tables on data sheets are VERY VERY much intended to be
> related to boolean logic.

*Related to* boolean logic, but not *of* boolean logic. It is a
generalisation of boolean logic to more than two states, in a similar way
that complex numbers (3+2j where j = sqrt -1) are extended from the real
numbers. As an electrical engineer, I expect you are familiar with complex
numbers, and I hope that you would also understand that complex numbers are
not part of the Reals.

In any case, while the use of 3- and 4-valued logic is all very interesting
and important to electrical engineering, it is very much domain-specific
and is not so useful for a general purpose programming language. People
have difficulty enough dealing with SQL NULLs and floating point NANs.

You can write your own multi-valued logic in Python, just don't expect
Python to support it natively. Python is not a HDL.



-- 
Steven




More information about the Python-list mailing list