Why does python not have a mechanism for data hiding?

Tue Jun 3 14:29:13 EDT 2008

On Jun 3, 11:02 am, Richard Levasseur <richard... at gmail.com> wrote:
> On Jun 3, 3:07 am, "BJörn Lindqvist" <bjou... at gmail.com> wrote:
>
> > On Mon, Jun 2, 2008 at 10:50 PM, Russ P. <Russ.Paie... at gmail.com> wrote:
> > > On Jun 2, 6:41 am, Carl Banks <pavlovevide... at gmail.com> wrote:
>
> > >> You are not realizing that the only useful(**) thing about data hiding is
> > >> that some code has access to the data, other code does not.  If you
> > >> "hide" data equally from everyone it's just a useless spelling change.
>
> > > I think you're missing the point.
>
> > > As I see it, the primary value of data hiding is that it provides
> > > useful information on which data and methods are intended for the
> > > client and which are intended for internal use. It's like putting a
> > > front panel on a TV set with the main controls intended for the
> > > viewer.
>
> > Here's my two cents. First of all, a TV is a bad analogy compared to
> > reusable software libraries. Really bad analogy. A TV is a horribly
> > complicated device which has to be dumbed down because otherwise it
> > would be too hard to use for ordinary people.
>
> > A software developer's relationship to a third-party library is more
> > similar to a TV repair man trying to repair a TV than to a random
> > person watching TV. For a repair man, the front panel is just useless
> > and in the way.
>
> > Oh, and to continue the TV analogy, one of the reasons why a TV is
> > complicated is because its interface is totally different from its
> > implementation. Channels are just a bad abstraction for tuning the
> > receiver to different frequencies and for switching inputs. Merely
> > using a TV doesn't teach you anything about how it actually works.
>
> > KISS: Keep It Simple, Stupid. And it is always simpler to not implement
> > the gunk needed for data hiding than to do it. By keeping things
> > simple you keep your code easy to implement, easy to understand and
> > easy to reuse.
>
> > Data hiding sacrifices implementation simplicity supposedly to make
> > the interface simpler and to keep backwards compatibility. It allows
> > you to change implementation details without affecting the
> > interface. But do you really want to do that? Consider this silly Java
> > example:
>
> >     class Foo {
> >         private int bar;
> >         public int getBar() {
> >             return bar;
> >         }
> >     }
>
> > Then for some reason you decide that hm, "bar" is not a good attribute
> > name so you change it to "babar". And you can do that without changing
> > the public interface! Woohoo! So now you have a public getter named
> > "getBar" that returns an attribute named "babar". That's in reality
> > just bad and whoever is maintaining the implementation is going to be
> > annoyed that the getter's name doesn't match the attribute name.
>
> > What would have happened without data hiding? Renaming the public
> > attribute "bar" to "babar" would probably cause some grief for someone
> > reusing your library, but you would keep your implementation pure.
>
> > What about semantic changes? Data hiding doesn't protect you against
> > that, so you'll have to change your interface anyway. The interface
> > for a car hasn't changed much in the last 100 years, but the
> > implementation has. How easy is it to repair a car nowadays compared
> > to 30 years ago?
>
> > And data hiding as a documentation aid is just a sham. "These methods
> > are public so you can call them, these aren't, so hands off!" A reuser
> > of your library *will* want to know what happens on the inside; by
> > trying to make stuff impossible to reach you are just making that kind
> > of information much harder to come by.
>
> > The better method is to just write proper docstrings that tell the
> > user what the methods do and when they can be called.
>
> > Another good way to see how useless data hiding is, is to try and unit
> > test a very encapsulated library. You'll see that it is almost
> > impossible to write good unit tests unless you publicly export
> > almost everything in the code. At which point you come to realize that
> > all the data hiding was for naught.
>
> > --
> > mvh Björn
>
> I really like this message and find it very true.  Writing unit tests
> for private data is nigh impossible.  You end up either creating
> accessors, or passing in parameters via the constructor (resulting in
> a huge constructor).  Personally, I'd rather have better test coverage
> than data hiding.
>
> Second, private vars with third-party libs suck, and are nothing but
> an infuriating frustration.  I'm currently dealing with about 3 or 4
> different libs; one of them uses private variables and it's a huge
> headache.  I have to access some of those private vars occasionally to
> make my thing work.  The other libs I'm using don't have any private
> vars (__) (only a couple protected ones, _), and it's a breeze.  The
> docs say "this does x" or there's a comment that says "don't use this
> unless you really know what you're doing," and I respect their
> warnings.
>
> When I was fooling around with sqlalchemy, it made heavy use of
> protected vars but had a straightforward public API.  Unfortunately,
> writing plugins for it required access to some of those protected
> vars.  It wouldn't be possible if they were strictly controlled and
> restricted by the language itself.  Whenever I'd use those protected
> vars, I expected an odd behavior or two.  When using private vars, I
> don't expect it to work at all, and really, I refrain from using them
> unless I've grokked the source.
>
> My point is that I currently like the private/protected/public scheme
> python has going on.  It lets me fix or alter things if I have to, but
> also provides a warning that I shouldn't be doing this.
>
> As for customers using the internals and worrying about an upgrade
> breaking them, it seems like a silly issue, at least in python.  If
> there are internals that the customer would be playing with, then those
> should be exposed publicly, since that's what the customer wanted to begin
> with. If they're using defunct variables or methods, you use
> properties and __getattr__ to maintain backwards compatibility for a
> version or two.
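For anyone following along in Python, here is a minimal sketch of the properties-plus-__getattr__ approach described above. It is applied to the bar/babar rename from the Java example. Several pieces are hypothetical illustrations, not anything from the thread: the class body, the _RENAMED table, and the old_total/total names. Emitting a DeprecationWarning is also an assumption rather than something prescribed here.

```python
import warnings


class Foo:
    """Hypothetical class whose public attribute 'bar' was renamed to 'babar'."""

    # Hypothetical table of other retired names, resolved generically in __getattr__.
    _RENAMED = {"old_total": "total"}

    def __init__(self, babar=0, total=0):
        self.babar = babar
        self.total = total

    @property
    def bar(self):
        """Deprecated alias for 'babar', kept for a version or two."""
        warnings.warn("Foo.bar is deprecated; use Foo.babar",
                      DeprecationWarning, stacklevel=2)
        return self.babar

    @bar.setter
    def bar(self, value):
        warnings.warn("Foo.bar is deprecated; use Foo.babar",
                      DeprecationWarning, stacklevel=2)
        self.babar = value

    def __getattr__(self, name):
        # Python calls __getattr__ only after normal lookup fails, so current
        # attributes such as 'babar' and 'total' never pass through here.
        new_name = type(self)._RENAMED.get(name)
        if new_name is None:
            raise AttributeError(
                f"{type(self).__name__!r} object has no attribute {name!r}")
        warnings.warn(f"Foo.{name} is deprecated; use Foo.{new_name}",
                      DeprecationWarning, stacklevel=2)
        return getattr(self, new_name)
```

Old client code that reads foo.bar or foo.old_total keeps working, though it now gets a warning. Whether that warning is visible depends on the active warning filters. The implementation itself only ever uses the new names. A property handles a single rename explicitly, while __getattr__ is the catch-all when several names change at once.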

If you think that private data and methods should not be allowed
because they complicate unit testing, then I suggest you take a look
at how unit testing is done in C++, Java, and Ada. They seem to do
just fine. Also, I have stated several times now that "back door"
access should be allowed. That should satisfy any need for access to
"private" data in unit testing.
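For comparison, this is roughly what such back-door access looks like in Python as it stands. Here a double-underscore attribute is only name-mangled, not enforced, so a test can still reach it under its mangled name. This is a minimal sketch, and the Account class and its tests are hypothetical.

```python
import unittest


class Account:
    """Hypothetical class with a 'private' (double-underscore) balance."""

    def __init__(self, balance=0):
        # Inside the class body this name is mangled to _Account__balance.
        self.__balance = balance

    def deposit(self, amount):
        if amount <= 0:
            raise ValueError("amount must be positive")
        self.__balance += amount


class AccountTest(unittest.TestCase):
    def test_deposit_updates_private_balance(self):
        acct = Account(10)
        acct.deposit(5)
        # The back door: the mangled name is an ordinary, reachable attribute.
        self.assertEqual(acct._Account__balance, 15)

    def test_unmangled_name_is_not_found(self):
        acct = Account()
        with self.assertRaises(AttributeError):
            # String names passed to getattr() are never mangled.
            getattr(acct, "__balance")

    def test_rejects_nonpositive_deposit(self):
        with self.assertRaises(ValueError):
            Account().deposit(0)
```

Running python -m unittest on a module containing this should report all three tests passing. The first test reaches into the implementation deliberately. Nothing stops it, but the awkward spelling flags that the balance is not part of the public interface.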

But I think there is a more fundamental issue here. You complain about
problems with software that uses data encapsulation. So two
possibilities exist here: either the designers of the code were not
smart enough to understand what data or methods the client would need,
or the client is not smart enough to understand what they need. Maybe
the solution is smarter programmers and clients rather than a dumber
language.