[Tutor] SENTINEL, & more

Sat May 29 10:29:43 CEST 2010

Hello,

from the thread: "class methods: using class vars as args?"

On Sat, 29 May 2010 11:01:10 +1000
Steven D'Aprano <steve at pearwood.info> wrote:

> On Fri, 28 May 2010 07:42:30 am Alex Hall wrote:
> > Thanks for all the explanations, everyone. This does make sense, and
> > I am now using the
> > if(arg==None): arg=self.arg
> > idea. It only adds a couple lines, and is, if anything, more explicit
> > than what I was doing before.
> 
> You should use "if arg is None" rather than an equality test.
> 
> In this case, you are using None as a sentinel value. That is, you want 
> your test to pass only if you actually receive None as an argument, not 
> merely something that is equal to None.
> 
> Using "arg is None" as the test clearly indicates your intention:
> 
> The value None, and no other value, is the sentinel triggering special 
> behaviour
> 
> while the equality test is potentially subject to false positives, e.g. 
> if somebody calls your code but passes it something like this:
> 
> class EqualsEverything:
>     def __eq__(self, other):
>         return True
> 
> instead of None.

I'll try to clarify the purpose and use of sentinels with an example. Advanced programmers, please correct me. One point is that, in languages like Python, dedicated sentinels are under-used, because everybody tends to use None instead, as an all-purpose sentinel.

Imagine you're designing a kind of database of books, with a user interface to enter new data. What happens when an author is unknown? A proper way to cope with this case, I guess, is to define a sentinel object, e.g.:
    UNKNOWN_AUTHOR = object()
There are many ways to define a sentinel; one could have defined "=0" or "=False" or whatever. But this choice is simple, clear, and safe, because a plain object in Python compares equal only to itself -- by default. Sentinels are commonly written in uppercase because they are constant, predefined elements.

Say that, when users have to deal with an unknown author, they press a special button or enter a special value, e.g. '*', which the software then silently converts to UNKNOWN_AUTHOR. Now, all cases of unknown authors are *marked* with the same mark UNKNOWN_AUTHOR; this mark only occurs in this very case, and thus only means this. In other words, it is a clear & safe *semantic* mark.

Later, when the application operates on data, it can compare the value stored in the "author" field against UNKNOWN_AUTHOR, to catch this special case. E.g.:

class Book(object):
    ...
    AUTHOR_DEFAULT_TEXT = "<unknown>"
    def write(self):
        ...
        if self.author is UNKNOWN_AUTHOR:
            author_text = Book.AUTHOR_DEFAULT_TEXT
            ...
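
To make the fragment above something one can actually run, here is a minimal sketch. The function parse_author, the constructor arguments, and the returned string format are my own assumptions for illustration; only UNKNOWN_AUTHOR, the '*' marker and the "is" test come from the text.

```python
# Sentinel: a bare object() compares equal only to itself by default.
UNKNOWN_AUTHOR = object()


def parse_author(text):
    """Convert raw user input to an author value.

    Hypothetical helper: '*' is the assumed marker for "unknown author".
    """
    text = text.strip()
    return UNKNOWN_AUTHOR if text == "*" else text


class Book(object):
    AUTHOR_DEFAULT_TEXT = "<unknown>"

    def __init__(self, title, author):
        self.title = title
        self.author = author

    def write(self):
        # Identity test: only the sentinel itself triggers the default text.
        if self.author is UNKNOWN_AUTHOR:
            author_text = Book.AUTHOR_DEFAULT_TEXT
        else:
            author_text = self.author
        return "%s, by %s" % (self.title, author_text)


anonymous = Book("Beowulf", parse_author("*"))
known = Book("Emma", parse_author("Jane Austen"))
```

Here anonymous.write() falls back to the default text, while known.write() uses the name as entered.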

Hope I'm clear. In the very case of UNKNOWN_AUTHOR, it would hardly have any consequence to use "==" instead of "is" as the comparison operator, because, as said above, custom objects compare equal only to themselves by default in Python. But:
* This default behaviour can be overridden, as shown by Steven above.
* Using "is" clarifies your intent to the reader, including yourself.
* Not all languages make a difference between "==" and "is". (As far as I know, few do it this explicitly.) Good habits...
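
A small sketch of the first point, reusing Steven's EqualsEverything class; the variable names are only for illustration:

```python
class EqualsEverything(object):
    def __eq__(self, other):
        # Claims to be equal to anything at all.
        return True


UNKNOWN_AUTHOR = object()
impostor = EqualsEverything()

# The equality test is fooled by the overridden __eq__ ...
equal_by_value = (impostor == UNKNOWN_AUTHOR)

# ... while the identity test only accepts the sentinel object itself.
same_object = (impostor is UNKNOWN_AUTHOR)
```

Here equal_by_value is True (a false positive), while same_object is False.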

=== additional stuff -- more personal reflection -- criticism welcome ===

Sentinels belong to a wider category of programming elements, or objects, that I call "marks". (A conventional term for this notion is welcome.) Marks are elements that play a role in a programmer's model, but have no value. What is the value of NOVICE_MODE for a game? Of the SPADE card suit? Of the character 'ø'? These are notions that carry meaning and must exist in an application, but have no "natural" value -- since they are not values semantically, unlike a position or a color.
In C, one could use a preprocessor flag for this:
   #define NOVICE_MODE
   ...
   #ifdef NOVICE_MODE
   ...
   #endif
NOVICE_MODE is here like a value-less symbol in the program: precisely what we mean. But not all languages have such features. (Indeed, there is a value behind the scenes, but it is not accessible to the programmer; so, the semantics are correct.)

Thus, we need to _arbitrarily_ assign marks values. Commonly, natural numbers are used for that: they are called "nominals" (--> http://en.wikipedia.org/wiki/Nominal_number) precisely because they act like symbol names for things that have no value.
The case of characters is typical: that 'ø' is represented by 248 (in Latin-1 and Unicode) is just arbitrary; we just need something, and software can only deal with values; so, we need a value. The only subset of a character set that is not arbitrarily ordered is precisely the sequence of digits: because they are used to compose ordinals, which themselves form the archetype of every order.
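
This can be checked from Python itself. A minimal sketch, assuming Python 3 (where strings are Unicode); the variable names are mine:

```python
# 'ø' written as an escape so the example does not depend on file encoding.
o_slash = "\u00f8"
code = ord(o_slash)        # the arbitrary number standing for the character
back = chr(code)           # and the way back from number to character

# The digits are the one subset whose codes follow a meaningful order:
digits = "0123456789"
codes = [ord(d) for d in digits]
digits_are_ordered = (codes == sorted(codes))

# Their codes are even consecutive, so the numeric value can be computed.
values = [ord(d) - ord("0") for d in digits]
```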

In the case of card suits, I could define an independent mark for each suit. But the 4 of them also build a whole, namely the set of card suits. For such a notion, some languages introduce a builtin feature; for instance Pascal has "enumerations" for this (http://en.wikipedia.org/wiki/Enumeration_%28programming%29):
    var suit : (clubs, diamonds, hearts, spades);
A side-advantage of a nominal enumeration is that, each mark silently mapping to an ordinal number, marks happen to be ordered. Then, it's possible to compare them for inequality like in the game of bridge: clubs<diamonds<hearts<spades. Enumerations are thus mark _sequences_. Pascal calls this an ordinal type.
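
For comparison, a sketch of the same ordered enumeration in Python. It assumes the standard enum module, which only exists since Python 3.4 (so it is not available in older versions); the class name Suit and the numbers 1 to 4 are my own arbitrary choices:

```python
from enum import IntEnum


class Suit(IntEnum):
    # The numbers are arbitrary nominals; only their order matters here.
    CLUBS = 1
    DIAMONDS = 2
    HEARTS = 3
    SPADES = 4


# Members map to integers, so they can be compared as in bridge.
bridge_order_holds = Suit.CLUBS < Suit.DIAMONDS < Suit.HEARTS < Suit.SPADES

# Sorting follows the same order.
ordered = sorted([Suit.SPADES, Suit.CLUBS, Suit.HEARTS])
```

IntEnum is used rather than plain Enum because plain Enum members deliberately do not support "<".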

Pascal also has a notion of mark set (not collection set like in python). A bit too complicated to introduce here, maybe.
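
Just to give a rough idea, here is a sketch of something similar to a mark set in Python. It assumes enum.Flag from the standard library (Python 3.6 or later); it is only an approximation of Pascal's set type, and the names Suit, red and majors are mine:

```python
from enum import Flag, auto


class Suit(Flag):
    # Each mark gets its own bit, so marks can be combined into sets.
    CLUBS = auto()
    DIAMONDS = auto()
    HEARTS = auto()
    SPADES = auto()


# Sets of marks are built with "|" (union).
red = Suit.DIAMONDS | Suit.HEARTS
majors = Suit.HEARTS | Suit.SPADES

# Membership test and intersection.
hearts_is_red = Suit.HEARTS in red
clubs_is_red = Suit.CLUBS in red
red_majors = red & majors
```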

An interesting exercise is to define, in and for Python, practical types for isolated marks (sentinels), mark sequences (enumerations), and mark sets.
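
As a possible starting point for the first of these, a minimal sketch of a type for isolated marks. The class name Mark and its repr format are hypothetical choices of mine, not an established convention:

```python
class Mark(object):
    """A named object with no meaningful value (hypothetical design)."""

    def __init__(self, name):
        self.name = name

    def __repr__(self):
        # A readable name instead of the default "<object object at 0x...>".
        return "<%s>" % self.name


UNKNOWN_AUTHOR = Mark("UNKNOWN_AUTHOR")
NOVICE_MODE = Mark("NOVICE_MODE")

# __eq__ is not overridden, so a mark still compares equal only to itself,
# even against another mark carrying the same name.
lookalike = Mark("UNKNOWN_AUTHOR")
```

Compared to a bare object(), the only gain is that the mark shows its name when printed or inspected in a debugger.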

Hope it's clear,

Denis
________________________________

vit esse estrany ☣

spir.wikidot.com