Does Python really follow its philosophy of "Readability counts"?

Steven D'Aprano steve at REMOVE-THIS-cybersource.com.au
Sun Jan 18 00:28:43 EST 2009


On Sat, 17 Jan 2009 20:49:38 +0100, Bruno Desthuilliers wrote:

> Russ P. wrote:
>> On Jan 15, 12:21 pm, Bruno Desthuilliers
>> <bdesth.quelquech... at free.quelquepart.fr> wrote:
>> 
>>> Once again, the important point is that there's a *clear* distinction
>>> between interface and implementation, and that you *shouldn't* mess
>>> with implementation.
>> 
>> If you "*shouldn't* mess with the implementation", then what is wrong
>> with enforcing that "shouldn't" in the language itself?

Russ: There are SHOULD NOTs and there are MUST NOTs.

In Python, direct access to pointers is a MUST NOT. So the language 
itself prohibits arbitrary access to memory, and pure Python programs 
aren't subject to the same security holes and crashes that C programs are 
subject to because of their use of pointers.

Messing with the implementation is a SHOULD NOT. There are times 
(possibly rare) where you are allowed to mess with the implementation. 
Hence the language doesn't prohibit it, only discourages it. The language 
designers choose how many hoops you have to jump through (and what 
performance penalty you suffer) in order to mess with the implementation.

 
> Because sometimes you have a legitimate reason to do so and are ok to
> deal with the possible issues.

Bruno: Yes, but most of the time you don't.

The consequence of this dynamism is that the Python VM can't do many 
optimizations at all, because *at any time* somebody might mess with the 
implementation. But 90% of the time nobody does, so Python is needlessly 
slow 90% of the time. Wouldn't it be nice if there was a way to speed up 
that 90% of the time while still allowing the 10% to take place?
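
To make the problem concrete, here is a minimal sketch (the Account class is a made-up example, not taken from any real code base). Because the method can be rebound at any moment, the interpreter cannot assume that a call made twice in a row reaches the same code:

```python
class Account:
    def __init__(self, balance):
        self._balance = balance

    def balance(self):
        return self._balance


acct = Account(100)
before = acct.balance()

# Nothing stops code elsewhere from replacing the method on the class at
# runtime, so `balance` has to be looked up again on every call.
Account.balance = lambda self: self._balance * 2
after = acct.balance()

print(before, after)  # 100 200
```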

The current solution to this problem is to try to push as much as 
possible into functions written in C: built-ins and custom C extensions. 
It's not a bad solution: most Python programs rely on many C built-ins, 
which enforce real encapsulation and data hiding. With the possible 
exception of mucking about with ctypes, you simply can't mess with 
the internals of (say) lists *at all*. Is this a bad thing?
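
A small sketch of the difference, assuming CPython (other implementations may behave differently). MyList and try_patch are hypothetical names used only for illustration:

```python
class MyList(list):
    """A pure-Python subclass; its class attributes can be rebound freely."""


def try_patch(cls):
    """Try to replace cls.append; return True if the assignment succeeded."""
    try:
        cls.append = lambda self, item: None
    except TypeError:
        return False
    return True


patched_builtin = try_patch(list)     # the C-implemented type refuses
patched_subclass = try_patch(MyList)  # the pure-Python subclass accepts

items = MyList()
items.append(1)                       # now silently does nothing

print(patched_builtin, patched_subclass, len(items))
```

The built-in type is protected from this kind of tampering; the pure-Python subclass is not.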

Would it be so terrible if we could do the same thing in pure Python? Why 
should I have to write in C if I want the same protection?


 
>> Why leave to
>> coding standards and company policy what can be encoded right into the
>> language?
> 
> Because human are smarter than computers.

That's an awfully naive statement. It's a sound bite instead of a 
reasoned argument. We're smarter than computers? Then why are we 
programming in languages like Python instead of directly in machine code? 
Why can optimizing C compilers make more efficient code than the best 
human assembly language programmers?

Humans and computers are smart at different things. Human beings are not 
terribly good at keeping track of more than about seven things at once, 
on average, and consequently we easily forget what scope variables are 
in. It's probably been at least 15 years since any released version of 
Python has been buggy enough to "forget" whether a name was in one scope 
or another, and yet human programmers still generate NameError and 
AttributeError exceptions *all the time*. I bet even Guido still makes 
them occasionally.



>> Why leave to humans (who are known to err) what can be automated
>> relatively easily? Isn't that what computers are for?
> 
> Error is human. For a real catastrophic failure, it requires a computer.

Oh rubbish. That's a sound bite invented by nervous technophobes scared 
of computers. I expected better from a programmer. When computers fail, 
it is almost certainly because of human error: some human being wrote 
buggy code, some human being turned off a test because it was generating 
too many warnings, some human being failed to prove their code was 
correct.

Chernobyl was a catastrophic failure that happened when *humans* turned 
off their safety systems to do a test, then couldn't turn them back on.


>> All those "setters" and
>> "getters" are a kludge. I think Python "properties" are a major step
>> forward here. Just for fun, I checked to see if Scala has properties.
>> Guess what? Not only does it have them, but they are generated
>> automatically for all member data. That's even better than Python
>> properties!
> 
> Oh yes ? "Better", really ? So it's better to have a language that
> automagically breaks encapsulation (requiring an additionnal level of
> indirection) than a language that do the right thing by default ? I'm
> afraid I missed the point ???

You certainly did. How do properties "break" encapsulation rather than 
enforcing it?
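
For instance, a property lets the class decide what may be stored while callers keep using plain attribute syntax. A minimal sketch (Thermostat is a hypothetical class made up for this example):

```python
class Thermostat:
    """Hypothetical class: the public attribute is backed by a property."""

    def __init__(self, celsius):
        self.celsius = celsius  # goes through the setter below

    @property
    def celsius(self):
        return self._celsius

    @celsius.setter
    def celsius(self, value):
        # The class, not the caller, decides which values are acceptable.
        if value < -273.15:
            raise ValueError("temperature below absolute zero")
        self._celsius = value


t = Thermostat(20.0)
t.celsius = 25.0  # plain attribute syntax, but validated
try:
    t.celsius = -500.0
    rejected = False
except ValueError:
    rejected = True

print(t.celsius, rejected)  # 25.0 True
```

Of course, in Python nothing stops a caller from assigning to t._celsius directly, which is the very point under dispute in this thread.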



>>>> As I said before, enforced encapsulation may not be appropriate for
>>>> every application, but it is definitely appropriate for some.
>>> No. It is appropriate for dummy managers hiring dummy programmers. The
>>> project's size and domain have nothing to do with it.
>> 
>> Let me try to be very clear here. We are dealing with two separate but
>> related issues. The first is whether data hiding should be added to
>> Python.
> 
> No need to add it, it's already there : every name that starts with an
> underscore is hidden !-)

That's not hidden. It's there in plain sight. 

In Unix, file names with a leading dot are hidden in the shell: when you 
do a file listing, you don't see them. It's not difficult to get to see 
them: you just pass -a to the ls command. As data hiding goes, it's 
pretty lame, but Python doesn't even suppress _ names when you call dir. 
Frankly, I wish that by default it would -- 99% of the time when I call 
dir, I *don't* want to see _ names. They just get in the way.
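
Getting that behaviour today takes a one-line filter. A minimal sketch (public_dir is a made-up helper name, not a built-in):

```python
def public_dir(obj):
    """Return the names from dir(obj) that do not start with an underscore."""
    return [name for name in dir(obj) if not name.startswith("_")]


class Parrot:
    _private = "spam"
    colour = "blue"

    def speak(self):
        return "squawk"


names = public_dir(Parrot())
print(names)  # ['colour', 'speak']
```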


[...] 
>> Whether it can be added without screwing up the language, I don't know.
>> If not, then so be it. That just limits the range of domains where
>> Python is suitable.
> 
> That's just plain stupid.

No it's not. It's *practical*. There are domains where *by law* code 
needs to meet all sorts of strict standards to prove safety and security, 
and Python *simply cannot meet those standards*.


>> As for whether data hiding provides a net benefit in any language, it
>> certainly does for large programs and for safety-critical programs. For
>> large, safety-critical systems, it's a no-brainer.
> 
> Only if you fail to use your brain. Now, except for regurgitating the
> official OMG prose, do you have *anything* to back these claims ? Python
> is older than Java, and there are quite enough man/years of experience
> and Python success stories to prove that it *just work*.

Nobody doubts that Python works for many applications. But can you point 
to any large, safety-critical system programmed in Python?



>> I like to use the example of the flight software for a large commercial
>> transport aircraft, but many other examples could be given. How about
>> medical systems that control radiation therapy or chemotherapy? How
>> about financial systems that could take away your retirement account in
>> 1.5 milliseconds. Or how about the control software for the strategic
>> nuclear arsenals of the US or Russia? When you consider the sea, air,
>> and land-based components, I'm sure that's one hell of a lot of code!
> 
> And ? Such systems have been written (and quite a lot are still running)
> with languages way more permissive than Python. You know, languages like
> C or assembly. 

Yes, and it is *hard* because the programmer has to worry about data 
hiding *on his own*. That's why people no longer write large systems in 
assembly and use high-level languages that deal with all those data 
hiding issues for you.

One of my friends has worked for many years programming some pretty high-
powered banking software. Their approach is to move data-hiding into the 
database, or the operating system. Cobol doesn't enforce encapsulation, 
but the database and OS certainly do, with a permissions-based approach.

Speaking of banking software, consider a typical banking application. It 
may have dozens or hundreds of programmers working on it. It's too big 
for any one person to understand all of it. Once deployed it may 
potentially have access to hundreds of billions of dollars of other 
people's money. Don't you imagine that one or two of these programmers 
might be tempted to skim a little off the top?

Data hiding is a good way of making sure that the guy writing the front 
end can't just turn off the audit trail and transfer $60,000,000 into his 
bank account. Why don't you approach your bank and suggest that it would 
be a Good Thing if he could? Think of the programming time they would 
save with the added dynamism! Why, it might shave off *weeks* from a six 
year project!



> Until you understand that *no technology is idiot-proof*,
>   you'll get nowhere in "software engineering".

I suspect that Russ has got a lot further in software engineering than 
you have. I suspect your attitude is part of the reason why, as they say, 
"If engineers built bridges the way programmers build software, the first 
woodpecker that came along would destroy civilization".

No technology is failure proof. But there's no reason in the world why 
most technology can't be idiot-proof. Televisions are idiot-proof, 
because they protect people from casual mistakes. If televisions were 
built according to the Python model, the internals of the TV would be 
exposed, without even a cover. All the major parts would be plug-in 
rather than soldered in, and there would be no cover over the parts that 
were live. Every year, tens of thousands of people would electrocute 
themselves fatally (because parts of the TV hold a massive charge for 
days after you unplug it from the mains) but that would be okay, 
because you never know when somebody might want to pull out the fly-back 
transformer and replace it with a six ohm resistor. That sort of dynamism 
is important!

Well, maybe so, but not in a television. Consequently televisions are 
built to hide the internals from the user. If you *really* want to, then 
you can get a screwdriver and remove the back and unsolder the fly-back 
transformer and replace it with a six ohm resistor. It's your TV, do what 
you want. But nobody is going to die because the picture was fuzzy and 
they heard from some chat forum that the way to fix that was to poke the 
back of the picture tube with a screw driver "to let out the excess ohms".



[...]
>> An FMS programmer could perhaps
>> decide to change the parameters of the engine controls, for example.
> 
> Why on earth would he do something so stupid ?

I'm sure he'd think he had a good reason. As you said, 

"Because sometimes you have a legitimate reason to do so and are ok to 
deal with the possible issues."

Maybe other people would disagree about whether it was a legitimate 
reason, or whether he was OK dealing with the possible issues.

Perhaps he needed the extra optimization of skipping the getter/setters. 
Perhaps he needed it for testing, and somehow one thing led to another. 
Who knows?

It is strange that on the one hand you should insist that programmers 
sometimes need to mess with internals, and on the other dismiss those who 
do as "stupid". Your position is inconsistent.



>> To prevent that sort of thing from happening, the management could
>> decree that henceforth all "private" variable names will start with an
>> underscore. Problem solved, eh?
> 
> Certainly not. The only way to solve such a problem is to fire this
> cretin.

Again, we shouldn't enforce encapsulation and data hiding because there 
are legitimate reasons for breaking it, but anyone who does break it is a 
cretin. You have a very strange attitude.

Besides, it's not very practical. When you fire "the cretin", it has 
consequences. Everyone else in the project has to work harder, which has 
costs, or you have to replace him, which also has costs. Maybe you can't 
fire him, because he's the only one who can debug problems in the 
autopilot. Perhaps the rest of the team downs tools and walks off the job 
in sympathy. Perhaps he sues you for unfair dismissal. Expenses rise. 
Timelines slip. Uncertainty increases.

It's also bad for morale when you fire somebody for messing with the 
internals when you have a policy that messing with the internals is 
allowed. That's why you picked a dynamic language like Python in the 
first place, because it doesn't prevent you from messing with the 
internals. And now when somebody does, you sack him? If you ask me, it's 
management who needs to be sacked, not the programmer who merely used the 
tools given to him.

Perhaps you can't fire the cretin, because the first time you discover 
the problem is eight years later when a plane filled with nuns and 
orphans flips upside down and flies straight into the ground.

Perhaps it would have been better to prevent him from messing with the 
internals in the first place, even at some extra cost. When you're in 
business, you have to make decisions like:

* do I write the software in Python, which will have a 99% chance of 
costing $100,000 and a 1% chance of costing $100,000,000?

* or do I write it in a B&D language like Java, which will have a 100% 
chance of costing $2,000,000?
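
One rough way to compare the two options is to look at both the expected cost and the worst case. The sketch below uses only the illustrative figures from the two bullets above; expected_cost and worst_case are helper names made up for this example:

```python
# (probability, cost in dollars) pairs taken from the two options above.
python_outcomes = [(0.99, 100_000), (0.01, 100_000_000)]
java_outcomes = [(1.00, 2_000_000)]


def expected_cost(outcomes):
    """Probability-weighted average cost."""
    return sum(p * cost for p, cost in outcomes)


def worst_case(outcomes):
    """Largest cost that has a non-zero probability."""
    return max(cost for p, cost in outcomes if p > 0)


for label, outcomes in [("Python", python_outcomes), ("Java", java_outcomes)]:
    print(label, round(expected_cost(outcomes)), worst_case(outcomes))
```

On these made-up figures the first option has the lower expected cost (about $1.1 million against $2 million) but a worst case fifty times larger, so the decision turns on how much of that tail risk the business can absorb.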


[...]
> Please educate yourself and learn about why Ariane 5 crashed on it's
> first flight, due to an error in a module written in ADA (which is such
> a psychorigid language that C++ and Java are even looser than Javascript
> in comparison). Perhaps will it light a bulb for you.

Others have already pointed out that the error in the Ariane 5 rocket was 
*human* error due to somebody messing with the internals, namely 
defeating the compiler's default type checking. I'd just like to ask: 
Bruno, were you aware of the cause of the crash, and if not, why did you 
raise the issue in the first place? Did you think it was a compiler bug 
that caused the crash?


>>>> Not
>>>> every door needs a lock, but certainly some do.
>>> You only need locks when you don't trust your neighbours.
>> 
>> Yeah, if you live in Nome, Alaska.
> 
> Brillant. You obviously failed to understand the differences between
> "software engineering" and real life. When it comes to computers, the
> only doors I care closing are those which would let someone crack my
> computer.

If only there was some way to know which bugs could let people crack our 
computer and which bugs couldn't.

Anyway, that's your choice. Personally, I'd much prefer my software not 
to cause data loss, not to crash, not to DoS me, not to hang, and not to 
generate bogus data, as well as not letting strangers crack into my 
system. There are many, many program paths which could potentially lead 
to these results. It would be nice if I could close those doors.


 
> But FWIW, I never close my car. And, in case you don't know, it's
> perfectly possible to access "private" members in Java (and, if you do
> have access to the source code, in C++ too).

Yes, but it is more difficult. There's a larger psychological barrier. 
It's easier to audit for such access, even in a large project. It 
encourages a more careful attitude: "do I *really* need to mess with the 
internals, or is there a safer way?". It forces the programmer to *think* 
before doing something potentially dangerous.
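
Even with nothing but a naming convention, part of such an audit can be automated. A rough sketch using the standard ast module (find_private_access is a hypothetical helper; it only catches direct attribute syntax, so getattr(p, '_private') would slip past it):

```python
import ast


def find_private_access(source):
    """Return (line, attribute) pairs for `x._name` where x is not self/cls.

    Dunder names such as __init__ are ignored.
    """
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Attribute):
            continue
        name = node.attr
        is_dunder = name.startswith("__") and name.endswith("__")
        if not name.startswith("_") or is_dunder:
            continue
        target = node.value
        if isinstance(target, ast.Name) and target.id in ("self", "cls"):
            continue
        hits.append((node.lineno, name))
    return sorted(hits)


sample = """\
class Parrot:
    def __init__(self):
        self._private = 'spam'

p = Parrot()
p._private = 'ham'
print(p.__class__)
"""

print(find_private_access(sample))  # [(6, '_private')]
```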

This is why I wish eval and exec were in a module instead of built-ins. 
They'd still be there, but you'd have to jump through one small hoop 
before using them.

Suppose Python worked like this:


>>> class Parrot:
...     _private = 'spam'
...
>>> p = Parrot()
>>> p._private = 'ham'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ProtectionError: attribute is read-only from outside of class Parrot
>>> from protection import unlock
>>> unlock(p)._private = 'ham'
>>> p._private
'ham'


I don't know if this scenario is even possible in Python, but pretend 
that it is. Would it be so terrible? If a particular project wanted to 
enforce encapsulation, all they need do is replace or remove the 
protection module from their Python installations. (I assume the project 
developers aren't *hostile*. If they are, then there's almost nothing you 
can do to make the code safe. Encapsulation is about protecting from 
accidents, not sabotage.) If you wanted to mess with the internals in 
your own project, all you need do is import a module.
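
Something close to this can be approximated in today's Python. What follows is only a rough sketch under stated assumptions: Protected, _Unlocked, ProtectionError and this unlock function are all made-up names, not an existing protection module, and the check cannot tell "inside the class" from "outside", so the class's own methods would have to write through object.__setattr__ as well.

```python
class ProtectionError(AttributeError):
    """Hypothetical exception, named after the one in the session above."""


class Protected:
    """Base class whose underscore attributes cannot be rebound directly."""

    def __setattr__(self, name, value):
        if name.startswith("_"):
            raise ProtectionError(
                "attribute is read-only from outside of class "
                + type(self).__name__
            )
        object.__setattr__(self, name, value)


class _Unlocked:
    """Proxy returned by unlock(); writes bypass Protected.__setattr__."""

    def __init__(self, target):
        object.__setattr__(self, "_target", target)

    def __getattr__(self, name):
        return getattr(object.__getattribute__(self, "_target"), name)

    def __setattr__(self, name, value):
        target = object.__getattribute__(self, "_target")
        object.__setattr__(target, name, value)


def unlock(obj):
    """The one small hoop: ask explicitly before touching internals."""
    return _Unlocked(obj)


class Parrot(Protected):
    _private = "spam"


p = Parrot()
try:
    p._private = "ham"
    blocked = False
except ProtectionError:
    blocked = True

unlock(p)._private = "ham"
print(blocked, p._private)  # True ham
```

Anyone can still call object.__setattr__ directly, so this is a hoop rather than a wall, which is all the scenario asks for.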


We could imagine the same scenario in reverse. Python allows getters and 
setters, but they're more work, and so people just don't use them unless 
they really need to. Suppose Python offered real data encapsulation, but 
you had to work to get it:

>>> class Parrot:
...     _private = 'spam'
...
>>> p = Parrot()
>>> p._private = 'ham'  # allowed by default
>>> from protection import lock
>>> lock(p)._private
>>> p._private = 'spam'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ProtectionError: attribute is read-only from outside of class Parrot


Would that be so bad? I don't think so.
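
The opt-in version can also be approximated in current Python. Again, this is a sketch under assumptions rather than a real protection module: the lock function and ProtectionError are hypothetical, and the call takes the attribute names as arguments, lock(p, "_private"), instead of the lock(p)._private spelling used in the session above.

```python
class ProtectionError(AttributeError):
    """Hypothetical exception, as in the session above."""


def lock(obj, *names):
    """Make the given attribute names read-only on this one instance."""
    cls = type(obj)
    frozen = frozenset(names)

    def guarded_setattr(self, name, value):
        if name in frozen:
            raise ProtectionError(
                "attribute is read-only from outside of class " + cls.__name__
            )
        cls.__setattr__(self, name, value)

    # Swap in a one-off subclass so other instances are unaffected.
    obj.__class__ = type(cls.__name__, (cls,), {"__setattr__": guarded_setattr})
    return obj


class Parrot:
    _private = "spam"


p = Parrot()
p._private = "ham"      # allowed by default
lock(p, "_private")
try:
    p._private = "spam"
    blocked = False
except ProtectionError:
    blocked = True

q = Parrot()
q._private = "eggs"     # a different, unlocked instance

print(blocked, p._private, q._private)  # True ham eggs
```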


-- 
Steven


