Evaluate my first python script, please

Thu Mar 4 14:54:55 EST 2010

On Mar 4, 2:30 pm, MRAB <pyt... at mrabarnett.plus.com> wrote:
> Pete Emerson wrote:
> > I've written my first python program, and would love suggestions for
> > improvement.
>
> > I'm a perl programmer and used a perl version of this program to guide
> > me. So in that sense, the python is "perlesque"
>
> > This script parses /etc/hosts for hostnames, and based on terms given
> > on the command line (argv), either prints the list of hostnames that
> > match all the criteria, or uses ssh to connect to the host if the
> > number of matches is unique.
>
> > I am looking for advice along the lines of "an easier way to do this"
> > or "a more python way" (I'm sure that's asking for trouble!) or
> > "people commonly do this instead" or "here's a slick trick" or "oh,
> > interesting, here's my version to do the same thing".
>
> > I am aware that there are performance improvements and error checking
> > that could be made, such as making sure the file exists and is
> > readable and precompiling the regular expressions and not calculating
> > how many sys.argv arguments there are more than once. I'm not hyper
> > concerned with performance or idiot proofing for this particular
> > script.
>
> > Thanks in advance.
>
> > ########################################################
> > #!/usr/bin/python
>
> > import sys, fileinput, re, os
>
> > filename = '/etc/hosts'
>
> > hosts = []
>
> > for line in open(filename, 'r'):
> >    match = re.search('\d+\.\d+\.\d+\.\d+\s+(\S+)', line)
>
> Use 'raw' strings for regular expressions.
>
> 'Normal' Python string literals use backslashes in escape sequences (eg,
> '\n'), but so do regular expressions, and regular expressions are passed
> to the 're' module in strings, so that can lead to a profusion of
> backslashes!
>
> You could either escape the backslashes:
>
>         match = re.search('\\d+\\.\\d+\\.\\d+\\.\\d+\s+(\\S+)', line)
>
> or use a raw string:
>
>         match = re.search(r'\d+\.\d+\.\d+\.\d+\s+(\S+)', line)
>
> A raw string literal is just like a normal string literal, except that
> backslashes aren't special.
>
> >    if match is None or re.search('^(?:float|localhost)\.', line):
> > continue
>
> It would be more 'Pythonic' to say "not match".
>
>         if not match or re.search(r'^(?:float|localhost)\.', line):
>
> >    hostname = match.group(1)
> >    count = 0
> >    for arg in sys.argv[1:]:
> >            for section in hostname.split('.'):
> >                    if section == arg:
> >                            count = count + 1
>
> Shorter is:
>
>                                 count += 1
>
> >                            break
> >    if count == len(sys.argv) - 1:
> >            hosts.append(hostname)
>
> > if len(hosts) == 1:
> >    os.system("ssh -A " + hosts[0])
> > else:
> >    print '\n'.join(hosts)
>
> You're splitting the hostname repeatedly. You could split it just once,
> outside the outer loop, and maybe make it into a set. You could then
> have:
>
>          host_parts = set(hostname.split('.'))
>         count = 0
>         for arg in sys.argv[1:]:
>                 if arg in host_parts:
>                         count += 1
>
> Incidentally, the re module contains a cache, so the regular expressions
> won't be recompiled every time (unless you have a lot of them and the re
> module flushes the cache).

I think that you really just need a set intersection for the count
part and this should work:

host_parts = set(hostname.split('.'))
count = len(host_parts.intersection(sys.argv[1:]))