Why does python not have a mechanism for data hiding?

Tue Jun 3 14:02:38 EDT 2008

On Jun 3, 3:07 am, "BJörn Lindqvist" <bjou... at gmail.com> wrote:
> On Mon, Jun 2, 2008 at 10:50 PM, Russ P. <Russ.Paie... at gmail.com> wrote:
> > On Jun 2, 6:41 am, Carl Banks <pavlovevide... at gmail.com> wrote:
>
>> You are not realizing that the only useful(**) thing about data hiding is
> >> that some code has access to the data, other code does not.  If you
> >> "hide" data equally from everyone it's just a useless spelling change.
>
> > I think you're missing the point.
>
> > As I see it, the primary value of data hiding is that it provides
> > useful information on which data and methods are intended for the
> > client and which are intended for internal use. It's like putting a
> > front panel on a TV set with the main controls intended for the
> > viewer.
>
> Here's my two cents. First of all, a TV is a bad analogy compared to
> reusable software libraries. Really bad analogy. A TV is a horribly
> complicated device which has to be dumbed down because otherwise it
> would be too hard to use for ordinary people.
>
> A software developer's relation to a third-party library is more
> similar to a TV repair man trying to repair a TV than to a random
> person watching TV. For a repair man, the front panel is just useless
> and in the way.
>
> Oh, and to continue the TV analogy, one of the reasons why a TV is
> complicated is because its interface is totally different from its
> implementation. Channels are just a bad abstraction for tuning the
> receiver to different frequencies and for switching inputs. Merely
> using a TV doesn't teach you anything about how it actually works.
>
> KISS: Keep It Simple, Stupid. And it is always simpler not to implement
> the gunk needed for data hiding than to do it. By keeping things
> simple you keep your code easy to implement, easy to understand and
> easy to reuse.
>
> Data hiding sacrifices implementation simplicity supposedly to make
> the interface simpler and to keep backwards compatibility. It allows
> you to change implementation details without affecting the
> interface. But do you really want to do that? Consider this silly Java
> example:
>
>     class Foo {
>         private int bar;
>         public int getBar() {
>             return bar;
>         }
>     }
>
> Then for some reason you decide that hm, "bar" is not a good attribute
> name so you change it to "babar". And you can do that without changing
> the public interface! Woho! So now you have a public getter named
> "getBar" that returns an attribute named "babar". That's in reality
> just bad and whoever is maintaining the implementation is going to be
> annoyed that the getter's name doesn't match the attribute name.
>
> What would have happened without data hiding? Renaming the public
> attribute "bar" to "babar" would probably cause some grief for someone
> reusing your library, but you would keep your implementation pure.
>
> What about semantic changes? Data hiding doesn't protect you against
> that, so you'll have to change your interface anyway. The interface
> for a car hasn't changed much in the last 100 years, but the
> implementation has. How easy is it to repair a car nowadays compared
> to 30 years ago?
>
> And data hiding as a documentation aid is just a sham. "These methods
> are public so you can call them, these aren't so hands off!" A reuser
> of your library *will* want to know what happens on the inside; by
> trying to make stuff impossible to reach, you are just making that kind
> of information much harder to come by.
>
> The better method is to just write proper docstrings that tell the
> user what the methods do and when they can be called.
>
> Another good way to see how useless data hiding is, is to try and unit
> test a very encapsulated library. You'll see that it is almost
> impossible to write good unit tests unless you publicly export
> almost everything in the code. At which point you come to realize that
> all the data hiding was for naught.
>
> --
> mvh Björn

I really like this message and find it very true.  Writing unit tests
for private data is nigh impossible.  You end up either creating
accessors, or passing in parameters via the constructor (resulting in
a huge constructor).  Personally, I'd rather have better test coverage
than data hiding.

Second, private vars with third-party libs suck, and are nothing but
an infuriating frustration.  I'm currently dealing with about 3 or 4
different libs; one of them uses private variables and it's a huge
headache.  I have to access some of those private vars occasionally to
make my thing work.  The other libs I'm using don't have any private
vars (__) (only a couple protected ones, _), and it's a breeze.  The
docs say "this does x" or there's a comment that says "don't use this
unless you really know what you're doing," and I respect their
warnings.
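To make the `__` / `_` distinction concrete: Python has no access modifiers. A single leading underscore is purely a convention, while a name with two leading underscores (and no two trailing ones) inside a class body triggers name mangling, which renames the attribute rather than hiding it. That is why such "private" variables can still be reached when needed. A minimal sketch with a hypothetical `Client` class:

```python
class Client:
    """Hypothetical class illustrating Python's naming conventions."""

    def __init__(self):
        self.timeout = 30        # public: part of the documented interface
        self._retries = 3        # single underscore: internal by convention only
        self.__token = "secret"  # double underscore: stored as _Client__token

    def token_length(self):
        # Inside the class body, self.__token is rewritten to self._Client__token.
        return len(self.__token)


c = Client()
print(c.timeout)         # prints 30
print(c._retries)        # prints 3; nothing stops outside access
print(list(vars(c)))     # prints ['timeout', '_retries', '_Client__token']
print(c._Client__token)  # prints secret: the mangled name is still reachable

try:
    c.__token            # outside a class body no mangling happens
except AttributeError as exc:
    print("AttributeError:", exc)
```

Mangling mainly exists so that attributes defined in a subclass do not collide with a base class's, rather than to enforce privacy.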

When I was fooling around with SQLAlchemy, it made heavy use of
protected vars but had a straightforward public API.  Unfortunately,
writing plugins for it required access to some of those protected
vars.  That wouldn't be possible if they were strictly controlled and
restricted by the language itself.  Whenever I used those protected
vars, I expected an odd behavior or two.  When using private vars, I
don't expect it to work at all, and I refrain from using them
unless I've grokked the source.

My point is that I currently like the private/protected/public scheme
Python has going on.  It lets me fix or alter things if I have to, but
also provides a warning that I shouldn't be doing this.

As for customers using the internals and worrying about an upgrade
breaking them, it seems like a silly issue, at least in Python.  If
there are internals that the customer would be playing with, then they
should be exposed publicly, since customers want them that way to begin
with.  If they're using defunct variables or methods, you can use
properties and __getattr__ to maintain backwards compatibility for a
version or two.
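A minimal sketch of that backwards-compatibility approach, reusing the rename from `bar` to `babar` in the quoted message; the `Foo` class and the `get_bar`/`get_babar` methods are hypothetical. A property keeps the old attribute name working for reads and writes, and `__getattr__`, which Python calls only when normal attribute lookup fails, maps old method names onto new ones. Both emit a `DeprecationWarning` so callers get notice before the aliases are removed.

```python
import warnings


class Foo:
    """Hypothetical class whose attribute `bar` was renamed to `babar`."""

    # Old method names kept working for a release or two, mapped to new names.
    _renamed_methods = {"get_bar": "get_babar"}

    def __init__(self, babar=0):
        self.babar = babar

    def get_babar(self):
        return self.babar

    @property
    def bar(self):
        """Deprecated alias for `babar`."""
        warnings.warn("Foo.bar is deprecated; use Foo.babar",
                      DeprecationWarning, stacklevel=2)
        return self.babar

    @bar.setter
    def bar(self, value):
        # Deprecated alias for assigning to `babar`.
        warnings.warn("Foo.bar is deprecated; use Foo.babar",
                      DeprecationWarning, stacklevel=2)
        self.babar = value

    def __getattr__(self, name):
        # Only reached when normal lookup fails, so current names never get here.
        new_name = type(self)._renamed_methods.get(name)
        if new_name is None:
            raise AttributeError(
                f"{type(self).__name__!r} object has no attribute {name!r}")
        warnings.warn(f"Foo.{name} is deprecated; use Foo.{new_name}",
                      DeprecationWarning, stacklevel=2)
        return getattr(self, new_name)


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # record every warning, even repeats
    f = Foo(babar=7)
    print(f.bar)        # prints 7 (old attribute name, via the property)
    f.bar = 8           # old name still works for assignment
    print(f.get_bar())  # prints 8 (old method name, via __getattr__)
print(len(caught))      # prints 3
```

Once callers have migrated, the property and the mapping entry can simply be deleted.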