[Tutor] How to refactor a simple, straightforward script into a "proper" program?

boB Stepp robertvstepp at gmail.com
Sun Jan 5 00:40:57 EST 2020


On Sat, Jan 4, 2020 at 5:43 PM Alan Gauld via Tutor <tutor at python.org> wrote:
>
> On 04/01/2020 07:10, boB Stepp wrote:

> I've just joined this thread, but I believe the intent was to produce an
> industrial standard module?
>
> A few comments...

Uh, oh.  Alan is joining in.  Now I am going to get it good!  ~(:>))

> > <final version2.py>
>
> Terrible name for a module! :-)
> Rename it to express what it does.

Drat!  I was hoping to keep upping the number on the end...

> > ============================================================================
> > #!/usr/bin/env python3
> > """Calculate the number of pages per day needed to read a book by a
> > given date."""
>
> Give some examples. eg. If a book has 400 pages and there is one day it
> will be 400, if 20 days, 20 pages/day. Something like that.

As I said in response to Joel's comments, I am currently engaged on
some supplementary reading on this topic.  But in particular for a
stand-alone program file, I have found a lot of contradictory advice
in the past of what *should* be put at the top of the file.

> >
> > from datetime import date
> >
> >
> > def get_input(str_converter, msg):
> >     """Prompt user with a message and return user input with the
> > correct data type."""
>
> That's what it says it does but....
>
> >     while True:
> >         try:
> >             return str_converter(input(msg))
> >         except ValueError:
> >             if str_converter == int:
> >                 print("\nPlease enter a whole number!")
> >             elif str_converter == date.fromisoformat:
> >                 print("\nPlease enter a valid date in the following
> > format yyyy-mm-dd!")
>
> And if its not one of those types? I get a ValueError raised then it
> does nothing. It just returns None. That's not very friendly.

Huh?  Joel said something similar.  Maybe I am being dense, but it
*does* do something.  If the user types in something that is not one
of the allowed types, he gets a helpful message telling him/her what
the program is expecting, and the program loops back to the original
message requesting input.  The function will not terminate until the
user enters the expected input.  Once the user does enter
syntactically valid input, it will return that value.  What am I
missing or not understanding?

> Maybe you should mention that. And describe the type limitations
> in the doc string. Or rename the function to get_number_input()

But it also does date input.  And if in the future I wanted other
sorts of input I need only add new string converters for the new data
types.

> This sounds like a reusable function, but in fact its pretty tied into
> the problem at hand.

Again, I wonder if I am being dense, but my intent with this function
is to handle a variety of input types.  I would have to add new
ValueError possibilities for new data types.  Is this what bothers
you?

Hmm.  I see that in case of ValueError it could look elsewhere for
what sort of user friendly error message to display for a particular
value of str_converter.  Is this what you are getting at?

> > def get_input_params():
> >     """Collect all needed input parameters."""
>
> What? For anything? Wow! a magic function that I can use to get any
> input parameters I ever need. Fantastic. Except.... That's not
> really what it does.

Not a winning docstring selection.

> Again this is not reusable code, its just a bit of the main function
> broken out. Thats OK but you need to tell people that its problem
> specific and maybe indicate its not reusable by renaming it
> _get_input_params() or something?
>
>
> >     while True:
> >         str_converters = [int, int, date.fromisoformat]
> >         input_msgs = [
> >             "How many pages are there in your book?  ",
> >             "How many pages have you read?  ",
> >             "What is your date to finish the book (yyyy-mm-dd)?  ",
> >         ]
> >         num_pages_in_book, num_pages_read, goal_date = map(
> >             get_input, str_converters, input_msgs
> >         )
> >         input_params = num_pages_in_book, num_pages_read, goal_date
> >         if validate_input_params(*input_params):
> >             return input_params
> >         print()
>
> Frankly this isn't helping with DRY, you could just as well put the
> innards into main(). It is all specific to the main function, there is
> only one call to the function. So you might as well put the code where
> it is needed.
>
> Personally, I think 3 calls to get_input() with the different
> arguments would be nicer too. Its not repeated code because you are
> passing different values. Its like using print(). Just because we call
> print() multiple times doesn't mean we should build some kind of output
> function that stores all of the output strings and then displays them,
> with pauses at key points etc. It doable (via yield etc) but not good
> code. We just call print() with different values.

This function/portion of the code is what I struggled with the most.
Yes, its existence is to call get_input() as many times as needed for
the total number of pieces of data it needs from the user and to
validate each datum for logical correctness.  And it is where a lot of
problem specific things are being kept. str_converters and input_msgs
and the identifiers for each datum needed from the user.  I still feel
awkward about this function, but it looks worse to me as originally
presented in the "simple" script.  I have quickly looked over D. L.
Neil's comments, and, combined with yours, wonder if I should redesign
the whole shebang from the ground up?  Maybe this problem does need an
OO approach.

> > def validate_input_params(num_pages_in_book, num_pages_read, goal_date):
> >     """Verify input parameters for logical correctness."""
>
> Needs much more work to describe what goes in, what comes out and what
> exceptions might be thrown.... see comments below on that topic.
>
>
> >     valid_input = True
> >     print()
> >     if num_pages_in_book <= 0:
> >         print("The number of pages in a book must be a positive integer!")
> >         valid_input = False
> >     if num_pages_read < 0:
> >         print("The number of pages read must be a whole number!")
> >         valid_input = False
> >     if num_pages_read == num_pages_in_book and num_pages_in_book > 0:
> >         print(
> >             "You have already read the entire book!  Why are you
> > running this program?"
> >         )
>
> Note, that last clause returns True. So even though it makes no sense
> you consider these valid inputs? Maybe that needs to be explained in the
> doc-string too.

After some thought I decided that this should be logically valid input
despite the user being a bit thick-headed in this case.  So if the
user has already read the book, he is dutifully told this and the
program will ultimately tell him he needs to read zero pages per day.
I did not explicitly code "valid_input = True" as that is the default.
Perhaps I should have put in a comment regarding this design decision?

> But...
> Validation functions really shouldn't have lots of prints in them.
> Separate logic from presentation. This should either return a boolean.
> Or you can throw suitable errors as found. Let the caller decide what
> to tell the user. This function should only validate the inputs.

I grudgingly concede your point.  Grudgingly because now I have to
figure out how to accomplish this.  ~(:>))

> >     if num_pages_read > num_pages_in_book:
> >         print("You cannot have read more pages than there are in the book!")
> >         valid_input = False
> >     if goal_date < date.today():
> >         print("You cannot set a goal date in the past!")
> >         valid_input = False
> >     return valid_input
> >
>
> Personally I'd opt to raise ValueErrors for te fails with the strings
> attached. Then return True if the function passes (so you can use
> it in a test).
>
> > def calc_pages_per_day(num_pages_in_book, num_pages_read, goal_date):
> >     """Return number of pages to read each day to attain goal."""
>
> I'd remove the calc from the name. Call it after what it returns.
> I'd rather read
>
> ppd = pages_per_day(bp, rp, due)
>
> than
>
> ppd = calc_pages_per_day(bp,rp,due)
>
> Maybe its just me but the calc seems implied by the function call.

But I thought the great preference was for function and method names
to be verbs not nouns?

> >     COUNT_TODAY = 1
> >     days_to_goal = (goal_date - date.today()).days
> >     pages_per_day = (num_pages_in_book - num_pages_read) /
> > (days_to_goal + COUNT_TODAY)
> >     return pages_per_day
>
> > def display_result(pages_per_day):
> >     """Display how many pages per day the reader must read to finish
> > the book."""
> >     if 0 < pages_per_day <= 1:
> >         word_to_display = "page"
>
> Would you really say page if it was 0.5 pages per day?

Actually, yes.  Just as I would say a half page per day.  OTOH, I
would say zero pages per day.  Maybe I need to dust off a grammar
book...

> I'd only use 'page' if it was exactly 1.
>
> But since its a float you need to use an epsilon value.
> So
>
> e = 0.0001
> if 1-e < pages_per_day < 1+e:
>     word_to_display = 'page'
> else: word_to_display = 'pages'

Do I really need an epsilon value for this particular case?  I am not
seeing where my intentions would fail here.  Can you provide an
example?  I am happy, say, if a user puts in some absurd goal date
that results in a pages per day of 1.7 e -100, if he/she really wants
to.  Originally I thought I would prohibit such things, but I finally
thought if the user wishes to be silly here, why not?

> >     else:
> >         word_to_display = "pages"
> >     print(
> >         "You must read {0:.2n} {1} per day to reach your goal.".format(
> >             pages_per_day, word_to_display
> >         )
> >     )
>
> > def main():
> >     """Run program and display results."""
> >     print(
> >         'Welcome to "HOW MANY PAGES???"\n'
> >         "Follow on-screen directions to determine how many pages to
> > read each day to "
> >         "finish your book.\n\n"
> >     )
>
> Why the mixed quotes? Make them all single quotes?
> Or use one pair of triple quotes?

I originally in my code did not use mixed quotes.  But when I saved
the file my new code formatting tyrant, Black, changed my consistent
quoting style to mixed quotes.  Apparently it insists on all double
quotes unless there are embedded quotes within when it converts those
to single quotes.

> >     while True:
> >         input_params = get_input_params()
> >         display_result(calc_pages_per_day(*input_params))
> >         run_again = input("Enter 'q' to quit or anything else to continue:  ")
> >         if run_again.lower().startswith("q"):
> >             break
> >
> > if __name__ == "__main__":
> >     main()
>
> In summary:
> 1) expand the doc-strings to describe input/output/errors
>    and limitations. Think about using the type inference
>    style notation for showing input and return types.
> 2) if a function isn't generally reusable make it obvious
>    by putting an underscore in front of the name. (or
>    make it generally reusable...usually much harder :-)

This is where I was hoping for some concrete, helpful examples.  I do
indeed find this "much harder".

> 3) separate presentation and logic.

Again, I find this not so easy.

> 4) use exceptions more
>
> Finally, consider wider usage. eBooks don't have consistent
> numbers of pages - it depends on font-size set by the user
> etc. So maybe an option to use lines or characters instead
> of pages? Or even percent as reported by a Kindle?
>
> The calculations are the same.
> The validation might need extra work.
> The output display would be different - needing a unit added?

FEATURE CREEP!  Actually sounds like an interesting addition.  But I
don't much like eReaders myself.  I do read my newspaper on an iPad
the company kindly supplied, but usually not books.  I like the feel
of a tangible book, but I'm an old fart, I guess.  I'd have to
research this some to consider implementing it.  My wife would make a
good source of info as she is always reading stuff on her phone or
Surface Pro.

Thanks for the detailed breakdown!

boB


More information about the Tutor mailing list