Python vs. C/C++/Java: quantitative data ?
GerritM
gmuller at worldonline.nl
Wed Mar 20 13:19:51 EST 2002
<..snip...>
> > For comparison of utilization level still other types of programs are
> > needed. In this type of comparison I expect even more discussion on the
> > "rules"; Are you allowed to use the plain language only, or also the
> > batteries which are included or everything on CPAN like archives; Are
> > packages which are used included in the line count; etcetera.
>
> I agree. Even things like a strong, helpful community for support are
> important for being productive. Python certainly has that. But I'm not
> sure how to measure that fairly :-)
>
> cheers,
> doug
> --
> http://www.bagley.org/~doug/contact.shtml
In a recent thread (see below) the wc utility was shown in both C and
Python. These utilities are relatively well defined and one step larger than
the current shootout programs. They might be useful for benchmarking
purposes.
regards Gerrit
www.extra.research.philips.com/natlab/sysarch
---begin included message---
Re: Book Draft Available: Text Processing in Python
From: jimd at vega.starshine.org (Jim Dennis)
In article <mailman.1015993462.32701.python-list at python.org>,
David Mertz, Ph.D. wrote:
> Pythonistas:
> As some folks know, I am working on a book called _Text Processing in
> Python_. With the gracious permission of my publisher, Addison Wesley,
> I am hereby making in-progress drafts of the book available to the
> Python community. It's about half done right now, but that will
> increase over time.
> Take a look at the book URL: http://gnosis.cx/TPiP/
> I welcome any comments or feedback the Python community has about my
> book.
> Yours, David...
I was glancing through it and stopped when I read your word
counter (with no support for the command line options). I just
had to do one to emulate the GNU wc utility as closely as I can
in one quick session.
Below is a somewhat more faithful rendering of the GNU wc command.
Although it's about 120 lines long, almost forty of those are
blank lines, docstrings, or comments. In most cases it gives output
that is identical to GNU wc (including the character spacing).
The only discrepancies I've seen are in the -L (--max-line-length)
calculation (particularly on binary files).
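One likely source of the -L discrepancy can be sketched in a few lines. This is a minimal illustration in Python 3 rather than the Python 2.2 of the script, and both function names are hypothetical. It compares the `length - 1` approach, which assumes every line ends in a newline, with splitting on the newline byte. It does not try to reproduce whatever else GNU wc does for -L on binary data.

```python
import io


def max_line_length_naive(data):
    # Mirrors the len(line) - 1 idea: iterate lines with their
    # terminators and always subtract one for the newline.
    longest = 0
    for line in io.BytesIO(data):
        if len(line) - 1 > longest:
            longest = len(line) - 1
    return longest


def max_line_length(data):
    # Split only on the newline byte; each piece is a line without its
    # terminator, so an unterminated final line keeps its full length.
    return max(len(piece) for piece in data.split(b"\n"))


sample = b"ab\ncdef"  # the last line has no trailing newline
print(max_line_length_naive(sample), max_line_length(sample))
```

The two functions agree whenever the input ends with a newline and differ by one when the longest line is an unterminated final line, which is common in binary files.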
Its pedagogical value is more in the use of the getopt module
and possibly in file iteration (for line in file: ...). The text
processing being done here is trivial. There's also a little bit
of exception handling, and a minimal amount of error avoidance
(since Python will allow me to open a directory but will complain if
I try to read lines therefrom).
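The two idioms mentioned above, getopt parsing and iterating over a file line by line, can be shown in isolation. This is a rough sketch in Python 3 syntax, not the script's own code: the helper names `parse_flags` and `count_lines_words` and the `NAMES` table are made up for illustration, while the option strings are the ones the script uses.

```python
import getopt
import io

SHORT = "clLw"
LONG = ["bytes", "chars", "lines", "max-line-length",
        "words", "help", "version"]

# Hypothetical mapping from each accepted switch to one counter name.
NAMES = {
    "-c": "chars", "--bytes": "chars", "--chars": "chars",
    "-l": "lines", "--lines": "lines",
    "-w": "words", "--words": "words",
    "-L": "maxline", "--max-line-length": "maxline",
}


def parse_flags(argv):
    # getopt.getopt returns (option, value) pairs plus the leftover
    # arguments; it raises getopt.GetoptError on an unknown option.
    opts, args = getopt.getopt(argv, SHORT, LONG)
    selected = {NAMES[opt] for opt, _ in opts if opt in NAMES}
    return selected, args


def count_lines_words(stream):
    # Iterating over a file-like object yields one line at a time.
    lines = words = 0
    for line in stream:
        lines += 1
        words += len(line.split())
    return lines, words


print(parse_flags(["-l", "--words", "a.txt"]))
print(count_lines_words(io.StringIO("one two\nthree\n")))
```

Mapping the short and long spellings to one name keeps the later logic independent of which form the user typed, which is the same effect the script gets from its `opt in (...)` tests.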
It is mildly interesting that this Python implementation of wc
is only about a third the length of the GNU version from the
textutils package (wc.c is 371 lines). Measured in words and
characters rather than lines, the Python version is only about half the length.
(Glancing at the sources I see that I missed support for the
POSIXLY_CORRECT environment variable -- which modifies, or uglifies
if you prefer, the output format; I could add that in a few lines).
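Since the comparison above depends on whether length is measured in lines, words, or characters, here is one way such measurements might be taken. It is a small sketch in Python 3 with a hypothetical helper name, and it only recognises blank lines and `#` comments. Docstrings and C-style comments would need extra rules, which is exactly the kind of counting convention a fair comparison has to agree on first.

```python
def source_size(text):
    # Measure a source text several ways (hypothetical helper).
    lines = text.splitlines()
    # "Code" lines here means: not blank and not starting with '#'.
    code = [ln for ln in lines
            if ln.strip() and not ln.strip().startswith("#")]
    return {
        "lines": len(lines),
        "code_lines": len(code),
        "words": len(text.split()),
        "chars": len(text),
    }


sample = "# comment\n\nx = 1\ny = x + 2\n"
print(source_size(sample))
```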
David, you're welcome to use this script as an example. Perhaps
you could list this as an example of how 14 lines of simple, focused
code grow to 140 lines by the time we add option handling, help
and error messages, exception handling and error avoidance, and all
that other stuff. (If you really want to scare people you could
include the wc.c from the GNU textutils package by way of comparison).
#!/usr/bin/env python2.2
""" wc: Emulate GNU wc (word count) """
import sys, os

help = '''Usage: wc [OPTION]... [FILE]...
Print line, word, and byte counts for each FILE, and a total line if
more than one FILE is specified. With no FILE, or when FILE is -,
read standard input.
  -c, --bytes, --chars   print the byte counts
  -l, --lines            print the newline counts
  -L, --max-line-length  print the length of the longest line
  -w, --words            print the word counts
      --help             display this help and exit
      --version          output version information and exit
Report bugs to <jimd+python at starshine.org>'''

version = """Python word count: wc(1) emulation by James T. Dennis
version 0.1"""

def options():
    """Process command line options"""
    import getopt
    short = "clLw"
    long = ('bytes', 'chars', 'lines', 'max-line-length',
            'words', 'help', 'version')
    try:
        opts, args = getopt.getopt(sys.argv[1:], short, long)
    except getopt.GetoptError, err:
        msg = "wc: invalid option \nTry `wc --help' for more information."
        print >> sys.stderr, sys.argv[0], err
        print >> sys.stderr, msg
        sys.exit(1)
    return opts, args

def count(f=None):
    """Count and return lines, words, chars, and maxlength"""
    # We count them all, since that's much easier than a lot of
    # conditional logic to decide what to count.
    # We return it all, and main() can decide what to print.
    lines = words = chars = maxline = 0
    if f is None:
        file = sys.stdin
    else:
        if os.path.isdir(f):
            print >> sys.stderr, "wc: %s: Is a directory" % f
            return lines, words, chars, maxline
        try:
            file = open(f, 'r')
        except IOError:
            print >> sys.stderr, "Error opening:", f
            return lines, words, chars, maxline
    # If we get this far, we can count stuff
    for line in file:
        length = len(line)
        lines += 1
        chars += length
        words += len(line.split())
        if length - 1 > maxline: maxline = length - 1
        # GNU wc doesn't count line terminator in maxlength?
        # +++ binary files have much different line length semantics!
    return lines, words, chars, maxline

def printcount(flags, totals, filename=None):
    """Print counts for each file and for the grand totals
    takes two 4-tuples, the flags for which items to print, and
    the total lines, words, characters, and max-line-length
    and an optional filename"""
    if filename is None: filename = ""
    dolines, dochars, dowords, domaxln = flags
    l, w, c, m = totals
    print "",   # GNU wc prints one leading space?
    if dolines: print "%6d" % l,
    if dowords: print "%7d" % w,
    if dochars: print "%7d" % c,
    if domaxln: print "%7d" % m,
    print filename

if __name__ == "__main__":
    opts, args = options()
    dolines = dochars = dowords = domaxln = 0
    for opt, arg in opts:
        if opt == '--help':
            print help
            sys.exit()
        elif opt == '--version':
            print version
            sys.exit()
        elif opt in ('-l', '--lines'): dolines = 1
        elif opt in ('-c', '--chars', '--bytes'): dochars = 1
        elif opt in ('-w', '--words'): dowords = 1
        elif opt in ('-L', '--max-line-length'): domaxln = 1
    if dolines + dochars + dowords + domaxln == 0:
        # None specified so default is to do lines, chars, and words
        dolines = dochars = dowords = 1
    # Else we do only the ones that are specified

    # GNU wc always prints the stats in the same order, regardless
    # of the order of the options/switches.
    printflags = (dolines, dochars, dowords, domaxln)

    if not args:
        # No files named: so just do stdin
        # No grand totals, and no filename
        l, w, c, m = count()
        printcount(printflags, (l, w, c, m))
    else:   # Else we do each file and keep track of grand totals
        all_lines = all_words = all_chars = longest_line = 0
        files_processed = 0
        for i in args:
            if i == '-': l, w, c, m = count()
            else: l, w, c, m = count(i)
            all_lines += l
            all_words += w
            all_chars += c
            if m > longest_line: longest_line = m
            printcount(printflags, (l, w, c, m), i)
            files_processed += 1
        if files_processed > 1:     # Print totals
            totals = (all_lines, all_words, all_chars, longest_line)
            printcount(printflags, totals, "total")
---end included message---