[Python-ideas] SI scale factors in Python

Fri Aug 26 02:54:03 EDT 2016

On Thu, Aug 25, 2016 at 08:46:54PM -0700, Ken Kundert wrote:

> This idea is new to general purpose languages, 

For the record, at least some HP calculators include a "units" data type 
as part of the programming language "RPL", e.g. the HP-28 and HP-48 
series. I've been using those for 20+ years so I'm quite familiar with 
how useful this feature can be.

> but it has been used for over 40 
> years in the circuit design community. Specifically, SPICE, an extremely heavily 
> used circuit simulation package, introduced this concept in 1974. In SPICE the 
> scale factor is honored but any thing after the scale factor is ignored.  Being 
> both a heavy user and developer of SPICE, I can tell you that in all that time 
> this issue has never come up. In fact, the users never expected there to be any 
> support for dimensional analysis, nor did they request it.

I can't comment on the circuit design community, but you're trying to 
extrapolate from a single specialist application to a general purpose 
programming language used by people with widely varying levels of 
expertise and competence, and with many different needs.

It makes a lot of sense for applications to allow SI prefixes as 
suffixes within a restricted range. For example, the dd application 
allows the user to specify the amount of data to copy using either bytes 
or blocks, with optional suffixes:

    BLOCKS  and BYTES may be followed by the following multiplicative 
    suffixes: xM M, c 1, w 2, b 512, kB 1000, K 1024, MB 1000*1000, 
    M 1024*1024, GB 1000*1000*1000, G 1024*1024*1024, and so on for  
    T, P, E, Z, Y.

(Quoting from the man page.)

That makes excellent sense for a specialist application where numeric 
quantities always mean the same thing, or in this case, one of two 
things. As purely multiplicative suffixes, that even makes sense for 
Python: earlier I said that it was a good idea to add a simple module 
defining SI and IEC multiplicative constants in the std lib so that we 
could do x = 42*M or similar.
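
To make that concrete, here is a rough sketch of what such a module 
might contain. The names (k, M, Ki and so on) are my own assumptions 
for illustration only; no such module exists in the std lib today.

```python
# Hypothetical constants module: none of these names exist in the
# standard library, they are assumptions for illustration only.

# SI (decimal) multipliers
k = 10**3
M = 10**6
G = 10**9
T = 10**12

# IEC (binary) multipliers
Ki = 2**10
Mi = 2**20
Gi = 2**30
Ti = 2**40

x = 42*M       # an ordinary int, 42000000
buf = 4*Ki     # an ordinary int, 4096
print(x, buf)
```

These are plain integers, so there is no new syntax and no unit that 
could be silently dropped.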

But that's a far cry from allowing and ignoring units.

> > Don't think of people writing code like this:
> > 
> >     result = 23mA + 75MHz
> > 
> > which is obviously wrong. Think about them writing code like this:
> > 
> >     total = sum_resistors_in_parallel(input, extra)
> 
> You say that '23mA + 75MHz' is obviously wrong, but it is only obviously wrong 
> because the units are included, which is my point. If I had written '0.023 
> + 76e6', it would not be obviously wrong.

I understand your point and the benefit of dimensional analysis. But the 
problem is, as users of specialised applications we may be used to 
doing direct arithmetic on numeric literal values, with or without 
attached units:

    23mA + 75MHz  # error is visible
    23 + 75  # error is hidden

but as *programmers* we rarely do that. Generally speaking, it is rare 
to be doing arithmetic on literals where we might have the opportunity 
to attach a unit. We are doing arithmetic on *variables* that have come from 
elsewhere. Reading the source code doesn't show us something that might 
be improved by adding a unit:

    # we hardly ever see this
    23 + 75

    # we almost always see something like this
    input + argument

At best, we can choose descriptive variable names that hint what the 
correct dimensions should be:

    weight_of_contents + weight_of_container

The argument that units would make it easier for the programmer to spot 
errors is, I think, invalid, because the programmer will hardly ever get 
to see the units.

[...]
> Indeed that is the point of dimensional analysis. However, despite the 
> availability of all these packages, they are rarely if ever used because you 
> have to invest a tremendous effort before they can be effective. For example, 
> consider the simple case of Ohms Law:
> 
>     V = R*I
> 
> To perform dimensional analysis we need to know the units of V, R, and I. These 
> are variables not literals, so some mechanism needs to be provided for 
> specifying the units of variables, even those that don't exist yet, like V. 

This is not difficult, and you exaggerate the amount of effort required. 

To my sorrow, I'm not actually familiar with any of the Python libraries 
for this, so I'll give an example using the HP-48GX RPL language. 
Suppose I have a value which I am expecting to be current in amperes. 
(On the HP calculator, it will be passed to me on the stack, but the 
equivalent in Python will be an argument passed to a function.) For 
simplicity, let's assume that if it is a raw number, I will trust that 
the user knows what they are doing and just convert it to a unit object 
with dimension "ampere", otherwise I expect some sort of unit object 
which is dimensionally compatible:

    1_A CONV

is the RPL program to perform this conversion on the top of the stack, 
and raise an error if the dimensions are incompatible. Converting to a 
more familiar Python-like API, I would expect something like:

    current = Unit("A").convert(current)

or possibly:

    current = Unit("A", current)

take your pick. That's not a "tremendous" amount of effort, it is 
comparable to ensuring that I'm using (say) floats in the first place:

    if not isinstance(current, float):
        raise TypeError
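
For what it's worth, a minimal sketch of how such a convert() might 
behave is below. Unit, Quantity and the tiny table of symbols are all 
hypothetical names I've made up for illustration; this is not the API 
of any existing units package.

```python
# Hypothetical sketch: Unit, Quantity and convert() are illustrative
# names only, not the API of any real units library.

class Quantity:
    """A number tagged with a Unit."""

    def __init__(self, value, unit):
        self.value = value
        self.unit = unit

    def __repr__(self):
        return f"{self.value!r}_{self.unit.symbol}"


class Unit:
    # symbol -> (dimension, scale factor relative to the base unit)
    _known = {
        "A": ("current", 1.0),
        "mA": ("current", 1e-3),
        "V": ("voltage", 1.0),
    }

    def __init__(self, symbol):
        self.symbol = symbol
        self.dimension, self.scale = self._known[symbol]

    def convert(self, obj):
        # Raw numbers are trusted and simply tagged with this unit.
        if isinstance(obj, (int, float)):
            return Quantity(float(obj), self)
        if not isinstance(obj, Quantity):
            raise TypeError(f"expected a number or Quantity, got {type(obj).__name__}")
        # Anything else must be dimensionally compatible.
        if obj.unit.dimension != self.dimension:
            raise ValueError(f"cannot convert {obj.unit.symbol} to {self.symbol}")
        return Quantity(obj.value * obj.unit.scale / self.scale, self)


print(Unit("A").convert(5))                           # raw number, trusted
print(Unit("A").convert(Quantity(23.0, Unit("mA"))))  # rescaled to amperes
try:
    Unit("A").convert(Quantity(1.0, Unit("V")))       # wrong dimension
except ValueError as err:
    print("error:", err)
```

The caller still writes a single line per argument, which is the 
point: the check costs about as much as an isinstance() test.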

> And what if the following is encountered:
> 
>     V = I
> 
> Dimensional analysis says this is wrong, 

That's because it is wrong.

> but the it may be that the resistance 
> is simply being suppressed because it is unity.

Your specialist experience in the area of circuit design is 
misleading you. There's effectively only one unit of resistance, the 
ohm, although my "units" program also lists:

    R_K         25812.807 ohm
    abohm       abvolt / abamp
    intohm      1.000495 ohm
    kilohm      kiloohm
    megohm      megaohm
    microhm     microohm
    ohm         V/A
    siemensunit 0.9534 ohm
    statohm     statvolt / statamp

So even here with resistance, "unity" is ambiguous. Do you mean one 
intohm, one microhm, one statohm or something else? I'll grant you that 
in the world of circuit design perhaps it could only mean the SI ohm.

But you're not in the world of circuit design any more, you are dealing 
with a programming language that will be used by people for many, 
many different purposes, for whom "unity" might mean (for example):

    1 foot per second
    1 foot per minute
    1 metre per second
    1 kilometre per hour
    1 mile per hour
    1 lightspeed
    1 knot
    1 mach

Specialist applications might be able to take shortcuts in dimensional 
analysis when "everybody knows" what the suppressed units must be. 
General purpose programming languages *cannot*. It is better NOT to 
offer the illusion of dimensional analysis than to mislead the user into 
thinking they are covered when they are not.
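
To put numbers on how far apart those "unities" are, here is a small 
sketch that expresses some of the speed units above in metres per 
second. The table name is my own invention. I've left out mach, since 
the speed of sound depends on the conditions assumed.

```python
# Conversion factors to metres per second, built from the standard
# definitions: foot = 0.3048 m, mile = 1609.344 m, nautical mile = 1852 m,
# speed of light = 299792458 m/s. SPEED_IN_M_PER_S is a hypothetical name.
SPEED_IN_M_PER_S = {
    "foot per second": 0.3048,
    "foot per minute": 0.3048 / 60,
    "metre per second": 1.0,
    "kilometre per hour": 1000 / 3600,
    "mile per hour": 1609.344 / 3600,
    "lightspeed": 299_792_458.0,
    "knot": 1852 / 3600,
}

# A bare "1" means a different speed for every one of these units.
for name, factor in sorted(SPEED_IN_M_PER_S.items(), key=lambda kv: kv[1]):
    print(f"1 {name:<20} = {factor:g} m/s")
```

The smallest and largest of these differ by many orders of magnitude, 
and nothing in a bare literal says which one was meant.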

Better to let them use a dedicated units package, not build a half-baked 
bug magnet into the language syntax.

-- 
Steve