data validation when creating an object

Thu Jan 16 11:58:03 EST 2014

On 2014-01-16 16:18, Roy Smith wrote:
> On Thursday, January 16, 2014 10:46:10 AM UTC-5, Robert Kern wrote:
>
>> I prefer to keep my __init__() methods as dumb as possible to retain the
>> flexibility to construct my objects in different ways. Sure, it's convenient to,
>> say, pass a filename and have the __init__() open() it for me. But then I'm
>> stuck with only being able to create this object with a true, named file on
>> disk. I can't create it with a StringIO for testing, or by opening a file and
>> seeking to a specific spot where the relevant data starts, etc. I can keep the
>> flexibility and convenience by keeping __init__() dumb and relegating various
>> smarter and more convenient ways to instantiate the object to classmethods.
>
> There's two distinct things being discussed here.
>
> The idea of passing a file-like object vs. a filename gives you flexibility, that's for sure.  But, that's orthogonal to how much work should be done in the constructor.  Consider this class:

Where the two get conflated is that both lead to advice that looks the same (or 
at least can get interpreted the same by newbies who are trying to learn and 
don't have the experience to pick out the subtleties): "do nothing in __init__". 
That's why I am trying to clarify where this advice might be coming from and why 
at least one version of it may be valid.

> class DataSlurper:
>      def __init__(self):
>          self.slurpee = None
>
>      def attach_slurpee(self, slurpee):
>          self.slurpee = slurpee
>
>      def slurp(self):
>          for line in self.slurpee:
>              # whatever
>
> This exhibits the nice behavior you describe; you can pass it any iterable, not just a file, so you have a lot more flexibility.  But, it's also exhibiting what many people call the "two-phase constructor" anti-pattern.  When you construct an instance of this class, it's not usable until you call attach_slurpee(), so why not just do that in the constructor?

That's where my recommendation of classmethods come in. The result of __init__() 
should always be usable. It's just that its arguments may not be as convenient 
as you like because you pass in objects that are closer to the internal 
representation than you normally want to deal with (e.g. file objects instead of 
filenames). You make additional constructors (initializers, whatever) as 
classmethods to restore convenience.

class DataSlurper:
   def __init__(self, slurpee):
     self.slurpee = slurpee

   @classmethod
   def fromfile(cls, filename):
     slurpee = open(filename)
     return cls(slurpee)

   @classmethod
   def fromurl(cls, url):
     slurpee = urllib.urlopen(url)
     return cls(slurpee)

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
  that is made terrible by our own mad attempt to interpret it as though it had
  an underlying truth."
   -- Umberto Eco