Friday Finking: initialising values and implied tuples

dn PythonList at DancesWithMice.info
Sun Apr 4 19:47:53 EDT 2021


On 04/04/2021 01.00, Rob Cliffe via Python-list wrote:
> 
> 
> On 03/04/2021 04:09, 2QdxY4RzWzUUiLuE at potatochowder.com wrote:
>> On 2021-04-03 at 02:41:59 +0100,
>> Rob Cliffe via Python-list <python-list at python.org> wrote:
>>
>>>      x1 = 42; y1 =  3;  z1 = 10
>>>      x2 = 41; y2 = 12; z2 = 9
>>>      x3 =  8;  y3 =  8;  z3 = 10
>>> (please imagine it's in a fixed font with everything neatly vertically
>>> aligned).
>>> This has see-at-a-glance STRUCTURE: the letters are aligned vertically
>>> and the "subscripts" horizontally.  Write it as 9 lines and it becomes
>>> an amorphous mess in which mistakes are harder to spot.
>> I agree that writing it as 9 lines is an accident waiting to happen, but
>> if you must see that structure, then go all in:
>>
>>      (x1, y1, z1) = (43,  3, 10)
>>      (x2, y2, z2) = (41, 12,  9)
>>      (x3, y3, z3) = ( 8,  8, 10)
> Agreed, that is even easier to read.  (It would be kinda nice if the
> compiler could optimise the tuples away, for those of us who are
> paranoid about performance.)

I think I've read that the compiler is smart enough to realise that the
RHS 'literal-tuples'/'tuple-literals' are being used merely as a
'mechanism', and thus the inits are in-lined. Apologies: I can't find a
web.ref. Likely one of our colleagues, who is 'into' Python-internals,
could quickly clarify...
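
In the meantime, one way to check on your own machine is to ask the dis
module what the compiler actually emits (a quick sketch only; the exact
byte-code differs between CPython versions, so no output is reproduced
here):

>>> import dis
>>> dis.dis("(x1, y1, z1) = (43,  3, 10)")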


Can I invite you/us/all/each to think on these ideas a bit further?
(I'd like to conflate a few earlier contributions, if I may)


Way back, when people like @Alan and myself were young and innocent (and
the power to run mainframe computers was produced by dinosaurs running
in treadmills) we didn't have PEP-008 advice, but suffered hard-limits,
such as 80-column cards. Both RAM and secondary-storage costs were $big,
so we were taught all manner of 'conservation' techniques, eg

scX = 1    # starting co-ordinates
scY = 2
scZ = 3

The abbreviated names required noticeably shorter processing times. The
comment was necessary because "sc" or "scX" could be interpreted as
abbreviations for many, diverse, things. However, the obvious draw-back?
Later in the code, when we came to "scX", we had no such
handy-reminder/Cliff-Notes!

(and yes, teaching-languages, such as Dartmouth BASIC, did limit
variable-names to single letters, etc, - and yes, once common letters
such as r, e, i, n, x, etc had been assigned one might be forced to use
"q" to count the number of eggs. What? Once such languages have warped
one's mind, recovery is all-but impossible! So, now you know...)


Accordingly, having overcome the limits of the past, current advice is
to choose names wisely - and, in doing so, obviate the need for such
comments, eg

starting_coordinate_X = 1
...


The initial 'complaint' against the idea of "long" identifiers relates
to typing competence. It's all very well to have meaningful-names, but
the more typing that has to be done, the more opportunity for errors to
be introduced/creep-in/insinuate themselves (let's be clear - it's not
my typing, it's all the error's fault!)

However, today's editors and IDEs are sufficiently agile as to be able
to offer intelligent and context-aware "completion" suggestions - most
even allowing the choice to be made without lifting one's hands from the
keyboard. So, can we discard the 'too hard' complaint?


A point made earlier (in the thread), was that some of the data-items
'belonged together' and thus it would be reasonably sensible to group
them, eg

(x1, y1, z1) = (43,  3, 10)

This is often the reason for a longer 'list' of initialisations (but the
converse does not apply: a longer list may not consist of
closely-related data-items!)

The implication is that the x, y, and z values are related, eg a 3D
co-ordinate; and thus it does not impose "cognitive load" to initialise
them in a relative fashion (the second value, "3", relates to the second
identifier, "y1"). The small number of elements is relevant!

Good thinking!


Another suggestion for this situation is to use a named-tuple. From
the (fine) manual:

"Named tuples assign meaning to each position in a tuple and allow for
more readable, self-documenting code. They can be used wherever regular
tuples are used, and they add the ability to access fields by name
instead of position index."
(https://docs.python.org/3/library/collections.html?highlight=namedtuple#collections.namedtuple)


Borrowing the manual's example-code (apologies for any unfortunate
word-wrapping by my email-client):

>>> from collections import namedtuple
>>> # Basic example
>>> Point = namedtuple('Point', ['x', 'y'])
>>> p = Point(11, y=22)     # instantiate with positional or keyword arguments
>>> p[0] + p[1]             # indexable like the plain tuple (11, 22)
33
>>> x, y = p                # unpack like a regular tuple
>>> x, y
(11, 22)
>>> p.x + p.y               # fields also accessible by name
33
>>> p                       # readable __repr__ with a name=value style
Point(x=11, y=22)

The drawback/cost/effort-required is that the named-tuple must have been
defined previously, and only then can we employ such to initialise a
group of values.

Assuming we chose a more illuminating identifier than "p", later in the
code it would be quite apparent what "p.x" means (per critique, above), eg:

starting_point = Point(11, y=22)

It may also be worth noting that named-tuples can be pre-set with
default values. This may reduce the lines of code needed to achieve
initialisation - assuming we (coders) remember the default-values
(correctly).
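
For example (a sketch only - the field-names and the default value are
merely illustrative, and the "defaults" keyword requires Python 3.7+):

>>> Point3D = namedtuple('Point3D', ['x', 'y', 'z'], defaults=[0])
>>> Point3D(11, 22)         # z falls back to its default
Point3D(x=11, y=22, z=0)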


Thinking a bit more deeply, if we are going to take advantage of the
named-tuple which offers access to the data as a collection (eg "p"), as
well as to the individual/component values (eg "p.x") perhaps we should
be examining just how this data is used within the code.

In which case, a class-y guy such as I (if I keep saying it
often-enough, will you believe me?) will quickly prefer a custom-class
over a named-tuple (also they've been available for longer, thus
'tradition'). Not only will the individual data-items be mutable, but
the initialisation process happens as/at instantiation:

class coordinates():
    def __init__( self, x, y, z=None ):
        self.x = x
        self.y = y
        self.z = z

...
starting_coordinates = coordinates( 1, 2, 3 )

Alternatively, use the dataclass approach (per the earlier
contribution), or remove the relative nature of argument-tuples by using
explicit (named-) parameters:

    def __init__( self, x=0, y=0, z=None ):
        ...

...
starting_coordinates = coordinates( x=1, y=2, z=3 )

This is 'self-documenting'. Thus no need for explanatory comments.
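
For completeness, the dataclass approach mentioned above might look
something like the following (a sketch only - the defaults, type-hints,
and capitalised class-name are merely illustrative):

from dataclasses import dataclass
from typing import Optional

@dataclass
class Coordinates:
    x: float = 0.0
    y: float = 0.0
    z: Optional[float] = None    # optional, as in the __init__ above

starting_coordinates = Coordinates( x=1, y=2, z=3 )

Again, the initialisation is 'self-documenting', and the __init__ is
generated for us.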

Using a structured-object, we have the capability to do more with the
data, either as an entity or individually, eg

    def move_horizontally( self, delta=1 ):
        self.x += delta
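
and later, a hypothetical call on the instance created earlier (assuming
move_horizontally is defined as a method of the coordinates class):

starting_coordinates.move_horizontally( delta=5 )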


Of course, it all hinges on how the data-items will be used after the
initialisation stage. There is utterly no point in coding a class merely
to shorten an initialisation-phase! (Same logic applies to named-tuples)


Did you spot how various contributors identified when they prefer one
method in a specific situation, but reach for another under differing
circumstances?
-- 
Regards,
=dn

