My first ever Python program, comments welcome

Peter Otten __peter__ at web.de
Sun Jul 22 03:56:50 EDT 2012


Lipska the Kat wrote:

> Greetings Pythoners
> 
> A short while back I posted a message that described a task I had set
> myself. I wanted to implement the following bash shell script in Python
> 
> Here's the script
> 
> sort -nr $1 | head -${2:-10}
> 
> this script takes a filename and an optional number of lines to display
> and sorts the lines in numerical order, printing them to standard out.
> if no optional number of lines are input the script prints 10 lines
> 
> Here's the file.
> 
> 50	Parrots
> 12	Storage Jars
> 6	Lemon Currys
> 2	Pythons
> 14	Spam Fritters
> 23	Flying Circuses
> 1	Meaning Of Life
> 123	Holy Grails
> 76	Secret Policemans Balls
> 8	Something Completely Differents
> 12	Lives of Brian
> 49	Spatulas
> 
> 
> ... and here's my very first attempt at a Python program
> I'd be interested to know what you think, you can't hurt my feelings
> just be brutal (but fair). There is very little error checking as you
> can see and I'm sure you can crash the program easily.
> 'Better' implementations most welcome

> #! /usr/bin/env python3.2
> 
> import fileinput
> from sys import argv
> from operator import itemgetter
> 
> l=[]
> t = tuple
> filename=argv[1]
> lineCount=10
> 
> with fileinput.input(files=(filename)) as f:

Note that (filename) is not a tuple, just a string surrounded by superfluous 
parens. 

>>> filename = "foo.bar"
>>> (filename)
'foo.bar'
>>> (filename,)
('foo.bar',)
>>> filename,
('foo.bar',)

You are lucky that FileInput() checks whether its files argument is a single 
string and, if so, wraps it in a tuple for you.
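
To see why the trailing comma matters, here is a small sketch of what happens 
when code iterates over its argument without such a string check. The helper 
name count_files is made up for illustration and is not part of fileinput:

```python
def count_files(files):
    # Hypothetical helper: naively treats its argument as a sequence of names.
    return len(list(files))

filename = "foo.bar"
n_string = count_files((filename))   # parens only: iterates over the characters
n_tuple = count_files((filename,))   # trailing comma: a one-element tuple
print(n_string, n_tuple)
```

The first call sees one "file" per character of the name, the second sees the 
single file name you meant.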

>         for line in f:
>                 t=(line.split('\t'))
>                 t[0]=int(t[0])
>                 l.append(t)
>         l=sorted(l, key=itemgetter(0))
> 
>         try:    
>                 inCount = int(argv[2])
>                 lineCount = inCount
>         except IndexError:
>                 #just catch the error and continue              
>                 None
> 
>         for c in range(lineCount):
>                 t=l[c]
>                 print(t[0], t[1], sep='\t', end='')
> 

I prefer a more structured approach even for such a tiny program:

- process all commandline args
- read data
- sort
- clip extra lines
- write data

I'd break it into these functions:

def get_commandline_args():
    """Recommended library: argparse.
       Its FileType can deal with stdin/stdout.
    """

def get_quantity(line):
    return int(line.split("\t", 1)[0])

def sorted_by_quantity(lines):
    """Leaves the lines intact, so you don't 
       have to reassemble them later on."""
    return sorted(lines, key=get_quantity)

def head(lines, count):
    """Have a look at itertools.islice() for a more
       general approach"""
    return lines[:count]

if __name__ == "__main__":
    # protecting the script body allows you to import
    # the script as a library into other programs
    # and reuse its functions and classes.
    # Also: play nice with pydoc. Try
    # $ python -m pydoc -w ./yourscript.py

    args = get_commandline_args()
    with args.infile as f:
        lines = sorted_by_quantity(f)
    with args.outfile as f:
        f.writelines(head(lines, args.line_count))
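
One way get_commandline_args() might be filled in with argparse is sketched 
below. The attribute names infile, outfile and line_count match the script 
body above; the -o/--outfile option, the default of 10 lines and the optional 
argv parameter are my assumptions, not something the original script requires:

```python
import argparse
import sys

def get_commandline_args(argv=None):
    # argv=None makes argparse fall back to sys.argv[1:];
    # passing a list explicitly is handy for testing.
    parser = argparse.ArgumentParser(
        description="Print the lines with the smallest quantities.")
    # FileType maps "-" to stdin (mode "r") or stdout (mode "w").
    parser.add_argument("infile", nargs="?",
                        type=argparse.FileType("r"), default=sys.stdin)
    parser.add_argument("line_count", nargs="?", type=int, default=10)
    # Assumed option: where to write the result, stdout by default.
    parser.add_argument("-o", "--outfile",
                        type=argparse.FileType("w"), default=sys.stdout)
    return parser.parse_args(argv)
```

With this in place, something like

$ python yourscript.py data.txt 5

should print five lines, and leaving out both arguments reads from stdin and 
prints ten.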

Note that if you want to handle large files gracefully you need to recombine 
sorted_by_quantity() and head() (have a look at heapq.nsmallest() which was 
already mentioned in the other thread).
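
A minimal sketch of that recombination, assuming the same tab-separated input 
as above. The name head_by_quantity is made up for illustration:

```python
import heapq

def get_quantity(line):
    return int(line.split("\t", 1)[0])

def head_by_quantity(lines, count):
    # Hypothetical replacement for sorted_by_quantity() followed by head():
    # nsmallest() consumes any iterable and only keeps `count` candidates
    # in memory, so the whole file never has to be held as a list.
    return heapq.nsmallest(count, lines, key=get_quantity)

sample = ["50\tParrots\n", "12\tStorage Jars\n",
          "6\tLemon Currys\n", "2\tPythons\n"]
smallest = head_by_quantity(sample, 2)
print("".join(smallest), end="")
```

The result comes back in ascending order, like sorted(). If you want the 
largest quantities first, as sort -nr gives you, heapq.nlargest() takes the 
same arguments.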



