[Edu-sig] [edupython] Python in Education Advocacy Article
Toby Donaldson
tjd at sfu.ca
Tue Mar 27 06:39:34 CEST 2007
Actually, it appears your code makes a common error: according to the
problem specification, when frequencies are tied, they should be
ordered alphabetically. But your code orders tied frequencies in
reverse, e.g.
7 which # oops: wrong order
7 this
7 there
7 one
7 me
7 by
7 about
7 ``the
Toby
On 3/26/07, Michael Tobis <mtobis at gmail.com> wrote:
> <grin>
>
> Seven lines seems reasonable to me:
>
> ##########
> import sys
>
> concord = {}
> for word in [token.lower() for token in
> open(sys.argv[1],"r").read().split()]:
> concord[word] = concord.get(word,0) + 1
> result = sorted([(item[1],item[0]) for item in
> concord.items()],reverse=True)
> for pair in result:
> print "%s\t%s" % pair
> ###########
>
> While I wonder if the problem wasn't set up to be in Python's favor, this
> sort of thing is a big plus for Python in daily use.
>
> Terseness, though, is not enough in general and certainly not in education.
> Perl makes a virtue of terseness, and I have a strong sense it is not very
> useful as a first language.
>
> mt
>
>
> On 3/26/07, Toby Donaldson <tjd at sfu.ca> wrote:
> >
> > Michael,
> >
> > On the sigcse list there was recently a coding challenge
> > (http://www.cs.duke.edu/csed/code) that asked for
> solutions to a word
> > frequency problem in various languages. I believe the author is
> > planning to eventually list all the solutions received (entries came
> > in many different languages); that could make for some interesting
> > comparisons.
> >
> > I wrote one solution in Python, and one in Java, which I give below;
> > needless to say, I found the Python version far easier to write.
> >
> > Toby
> >
> >
> -------------------------------------------------------------------------------------------------------------------------------
> > import sys
> >
> > if __name__ == '__main__':
> > # get the path from the command-line
> > fname = sys.argv[1]
> >
> > # read in the file as a list of tokens
> > tokens = [tok.strip().lower() for tok in open(fname,
> 'r').read().split()]
> >
> > # calculate the frequency of each token
> > freq = {}
> > for tok in tokens:
> > if tok in freq:
> > freq[tok] += 1
> > else:
> > freq[tok] = 1
> >
> > # Sort the list in highest frequency to lowest frequency,
> > # with ties sorted by lexicographics order of the words.
> > # Uses the Python sort-decorate-unsort idiom. We sort by
> > # the negation of the frequency values to get the proper
> > # ordering.
> >
> > lst = [(-freq[tok], tok) for tok in freq] # decorate
> > lst.sort() # sort
> > lst = [(-freq, tok) for freq, tok in lst] # undecorate
> >
> > # print the results
> > for freq, tok in lst:
> > print '%s\t%s' % (freq, tok)
> >
> >
> -------------------------------------------------------------------------------------------------------------------------------
> >
> > import java.io.File;
> > import java.io.IOException;
> > import java.util.ArrayList ;
> > import java.util.Collections;
> > import java.util.Scanner;
> > import java.util.TreeMap;
> > import java.util.TreeSet;
> >
> > public class Puzzle {
> >
> > public static void main(String[] args) throws IOException {
> > // get the file to process
> > String fname = args[0];
> >
> > Scanner sc = new Scanner(new File(fname));
> >
> > // initialize the map for the words and counts;
> > // a TreeMap is always ordered by keys
> > TreeMap<String, Integer> map = new TreeMap<String,
> Integer>();
> >
> > // process the file a line at a time
> > while ( sc.hasNextLine()) {
> > // chop each line into its constituent tokens
> > // "\\s+" is a regular expression that matches
> > one or more
> > // whitespace characters
> > String[] tokens = sc.nextLine().split("\\s+");
> >
> > // make all the strings lower case, and remove
> > any excess whitespace
> > for (int i = 0; i < tokens.length; ++i) {
> > tokens[i] = tokens[i].toLowerCase().trim();
> > }
> >
> > // add each token to the map
> > for (String tok : tokens) {
> > if (map.containsKey(tok)) {
> > map.put(tok,
> map.get(tok) + 1);
> > } else {
> > map.put(tok, 1);
> > }
> > }
> > }
> >
> > // remove the empty string if it is present
> > map.remove("");
> >
> > // sort the data by storing each word that occurs the
> > same number of
> > // times in a TreeMap of sets keyed by count; TreeSet
> stores its
> > // values in sorted order
> > TreeMap<Integer, TreeSet<String>> sortMap = new
> TreeMap<Integer,
> > TreeSet<String>>();
> > for (String tok : map.keySet()) {
> > int count = map.get(tok);
> > if (sortMap.containsKey(count)) {
> > TreeSet<String> arr = sortMap.get(count);
> > arr.add(tok);
> > sortMap.put(count, arr);
> > } else {
> > TreeSet<String> arr = new
> TreeSet<String>();
> > arr.add(tok);
> > sortMap.put(count, arr);
> > }
> > }
> >
> > // print the data
> >
> > // first reverse the keys to print data in the proper order
> > ArrayList<Integer> idx = new ArrayList<Integer>();
> > idx.addAll(sortMap.keySet());
> > Collections.reverse(idx);
> >
> > // print it to stdout
> > for (Integer key : idx) {
> > TreeSet<String> toks = sortMap.get(key);
> > for (String t : toks) {
> > System.out.printf("%s\t%s\n", key, t);
> > }
> > }
> > }
> > }
> > _______________________________________________
> > Edu-sig mailing list
> > Edu-sig at python.org
> > http://mail.python.org/mailman/listinfo/edu-sig
> >
>
>
--
Dr. Toby Donaldson
School of Computing Science
Simon Fraser University (Surrey)
More information about the Edu-sig
mailing list