help me debug my "word capitalizer" script

Chris Angelico rosuav at gmail.com
Wed Aug 22 03:43:33 EDT 2012


On Wed, Aug 22, 2012 at 4:21 PM, Santosh Kumar <sntshkmr60 at gmail.com> wrote:
> Purpose of the script:
> To capitalize the first letter of any word in a given file, leaving
> words which have 3 or less letters.
>
> Bugs:
> I know it has many bugs or/and it can be improved by cutting down the
> code, but my current focus is to fix this bug:
>   1. When I pass it any file, it does it stuff but inserts a blank
> line everytime it processes a new line. (Please notice that I don't
> want the output in an another file, I want it on screen).

Firstly, THANK YOU for making your question so easy to work with!

I'll start with a quick answer to your immediate question, then add
some other comments about the code.

If you print(repr(words[-1])) somewhere in there, you'll notice that
the lines you're iterating over actually have a \n at the end. That's
where your duplication is coming from; you're adding a line separator,
but one's already being added for you.
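
A quick way to see this for yourself, as a rough sketch: io.StringIO
stands in for your opened file (my assumption, so it runs without a
file on disk), and the sample text is made up.

```python
import io

# io.StringIO stands in for the opened file; this is an assumption made
# so the sketch runs without a real file on disk.
fake_file = io.StringIO("hello world\nsecond line here\n")

last_words = []
for line in fake_file:
    words = line.split(' ')
    last_words.append(words[-1])
    # repr() makes the trailing newline visible as \n
    print(repr(words[-1]))
```

The last word of each line still carries its newline, so adding a
separator of your own on top of that produces the blank line.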

Since you're working in text, you can just let \n be your line
separator; input and output should both translate as required.
Probably the easiest fix is to simply ignore os.linesep and keep the
newlines on the last words of the lines; all you care about is the
beginning of the word. However, this might raise another odd issue, in
that a three-letter word at the end of a line would be capitalized
(with its newline attached, it counts as four characters). This can be
fixed by adding '\n' to the set of characters that you strip.
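
Roughly what I have in mind, as a sketch only: the function name
capitalize_long_words is made up for illustration, and I'm assuming
the "longer than three letters" test from your description.

```python
import string

def capitalize_long_words(line):
    """Capitalize words longer than three letters; the newline stays put."""
    # Strip punctuation AND the newline before measuring the word, so a
    # short word at the end of a line is not counted as a longer one.
    strip_chars = string.punctuation + '\n'
    return ' '.join(
        word.capitalize() if len(word.strip(strip_chars)) > 3 else word
        for word in line.split(' ')
    )
```

Since each line keeps its own newline, you'd print it with
print(capitalize_long_words(line), end='') so that a second one isn't
added.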

Alternatively, trim off the newlines and then join them back in again.
Not difficult and I'm sure you'll see where that needs to be done :)
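
In case it helps, here is one way the trim-and-rejoin version might
look. It's a sketch, not your script: process_lines is a hypothetical
name, and it takes any iterable of lines (an open file would do).

```python
import string

def process_lines(lines):
    """Capitalize words longer than three letters in each line."""
    processed = []
    for line in lines:
        # Trim the newline first so it never sticks to the last word.
        body = line.rstrip('\n')
        words = [
            word.capitalize() if len(word.strip(string.punctuation)) > 3 else word
            for word in body.split(' ')
        ]
        processed.append(' '.join(words))
    # Join the lines back together with a single '\n' between them.
    return '\n'.join(processed)
```

A plain print(process_lines(file)) then supplies the final newline.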

So! Unsolicited comments follow.

> Here is the script I am using:
>
> from os import linesep
> from string import punctuation
> from sys import argv

I would be inclined instead to simply "import os", "import string",
and "import sys", then use the qualified names. It's not something
you're doing a lot of - each of these is used in precisely one place
in the script - so the adorned names aren't going to get much in the
way.

> script, givenfile = argv
>
> with open(givenfile) as file:

Presumably you don't need a full-on arg parser. I'd actually just
inline this as "open(argv[1])" rather than unpack into separate
variables; that way, if the user provides more args, the extras will
be ignored (instead of throwing an exception).
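
To illustrate the difference, here's a small sketch. The argument list
is passed in as a plain list rather than read from sys.argv so it can
be run anywhere, and the file names are invented.

```python
def filename_by_unpacking(argv):
    # Tuple unpacking demands exactly two items: script name + one file.
    script, givenfile = argv
    return givenfile

def filename_by_index(argv):
    # Indexing only looks at argv[1]; any extra arguments are ignored.
    # (It still raises IndexError if no file name was given at all.)
    return argv[1]

# Hypothetical command line with one argument too many.
args = ['capitalizer.py', 'notes.txt', 'extra']

try:
    filename_by_unpacking(args)
    unpack_failed = False
except ValueError:
    unpack_failed = True
```

With the extra argument, the unpacking version raises ValueError while
the indexing version quietly returns 'notes.txt'.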

Something you may want to look into at some point is list
comprehensions. Whenever you have a loop that iterates over a list and
builds another list, consider a more functional notation. For
instance, your inner loop could become:

words = [(word.capitalize() if len(word.strip(punctuation)) > 3 else
word) for word in line.split(' ')]

The parentheses are optional, but may help you see how the interpreter
parses that.
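
For comparison, here is a loop next to the equivalent comprehension.
The loop is only my guess at the general shape of yours (it isn't
quoted above), and both function names are made up.

```python
from string import punctuation

def with_loop(line):
    # Build the new list one word at a time.
    words = []
    for word in line.split(' '):
        if len(word.strip(punctuation)) > 3:
            words.append(word.capitalize())
        else:
            words.append(word)
    return words

def with_comprehension(line):
    # Same logic as a single expression that builds the list directly.
    return [(word.capitalize() if len(word.strip(punctuation)) > 3 else word)
            for word in line.split(' ')]

sample = "the quick brown fox, lazy dog"
```

Both return the same list for the sample line; the comprehension just
says it in one expression.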

Code looks pretty good though, and again, a big thank you for making
your question so clear :) It's a pleasure to help you.

Chris Angelico


