HELP Newbie solve this Problem

Dennis E. Hamilton infonuovo at email.com
Sat Mar 25 01:51:49 EST 2000


Uh, Mark, you left out the richest part of the problem!  Bieber provides a
complete solution and demonstrates that producing the output in the desired
form is a trivial part of the solution, only three of the FORTRAN
statements.

The idea is to derive the frequencies of each letter in a body of text, then
print them out by bucket.  Instead, your program treats the input as already
providing the results and trivially solves for reversing the fields in the
output.   That omits all the work done in lines from the FORTRAN statement
numbered 10 to the close(10) statement.

There is certainly an interesting solution in Python, using dictionaries or
lists, but not so easy as one that doesn't do the key part of the problem.
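For what it is worth, the dictionary approach might be sketched roughly as follows. This is only an illustration in present-day Python 3, not anyone's posted solution; the names letter_frequencies and format_rows and the sample sentence are made up for the example.

```python
import string


def letter_frequencies(text):
    """Count occurrences of each letter a..z, folding upper case into lower."""
    # Start every letter at zero so letters that never occur still print.
    counts = {letter: 0 for letter in string.ascii_lowercase}
    for ch in text.lower():
        if ch in counts:  # skips digits, punctuation, whitespace, etc.
            counts[ch] += 1
    return counts


def format_rows(counts):
    """Return one 'letter count' row per letter, in alphabetical order."""
    return ["%s %d" % (letter, counts[letter]) for letter in sorted(counts)]


if __name__ == "__main__":
    sample = "The quick brown fox jumps over the lazy dog."
    print("\n".join(format_rows(letter_frequencies(sample))))
```

To run it against a file rather than a literal string, one would pass open(name).read() as the text. Swapping the two fields in format_rows gives the "count letter" layout instead, which is the easy part.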

Bieber's comment is accurate: Getting the output in the desired form is
easy, even in FORTRAN.  The statements from "do 30 ..." to the "30 continue"
do the same thing as your Python sample, except you have added a read to the
loop to have something to output.  Bieber's program first reads the file of
text and builds the letter frequencies in the vector counts[1:27].  (FORTRAN
has 1-origin indexing of arrays by default and doesn't use the same slice
model; I am translating to Python notation here.)  As he observes, it is
trivial to output the result in rows with either order of columns.

Bieber finesses the problem of sorting the dictionary and of tallying
characters other than letters a .. z and A .. Z by translating from FORTRAN
character codes to integers, after filtering out the non-letters.  He is
depending on the character codes for letters a .. z being 26 consecutive
integers, and likewise for the capitals, A .. Z.  This is true for any
character code based on ASCII for languages limited to the 26 Roman letters.
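The same index trick can be written in Python, if only to show what the FORTRAN is doing. This is a rough sketch that assumes an ASCII-compatible character set; letter_index and tally are names invented here, and slot 0 of the list is left unused so the indices line up with the 1-origin FORTRAN vector.

```python
def letter_index(ch):
    """Return 1..26 for a..z or A..Z, and 0 for anything else.

    Relies on the letter codes being consecutive, as they are in ASCII.
    """
    if "a" <= ch <= "z":
        return ord(ch) - ord("a") + 1
    if "A" <= ch <= "Z":
        return ord(ch) - ord("A") + 1
    return 0


def tally(text):
    """Build a 1-origin vector of letter counts (slot 0 is unused)."""
    counts = [0] * 27
    for ch in text:
        inx = letter_index(ch)
        if inx:  # non-letters map to 0 and are filtered out
            counts[inx] += 1
    return counts
```

Because the index already encodes alphabetical order, no sorting step is needed before printing.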

It seems to me that it might be just as useful to tally all (single-byte)
characters in the text, and then use a given vector as a source of indices
for which the output is to be produced, in the alphabetical order of
interest for the language community of the text being analyzed.  The basic
solution model is useful for other text-analysis activities: counting lines
(by the number of \n occurrences) and doing frequency distributions, in
descending popularity of character, etc.  Folding upper and lower case
letters together is also part of the original problem, but that's not hard
in Python, as Bieber notes.
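A sketch of that more general model, under some stated assumptions: the text is read as bytes in a single-byte encoding, the alphabet string holds only characters whose codes fit in one byte, and the function names and sample data are invented for illustration. Case folding is left out here to keep the example short.

```python
def tally_bytes(data):
    """Count every byte value 0..255 in a bytes object."""
    counts = [0] * 256
    for b in data:  # iterating over bytes yields integers in Python 3
        counts[b] += 1
    return counts


def report(counts, alphabet):
    """Pair each character of `alphabet` with its count, in the given order."""
    return [(ch, counts[ord(ch)]) for ch in alphabet]


def by_popularity(counts):
    """Byte values that occur at least once, most frequent first."""
    seen = [b for b in range(256) if counts[b]]
    return sorted(seen, key=lambda b: (-counts[b], b))


data = b"banana\nbandana\n"
counts = tally_bytes(data)
line_count = counts[ord("\n")]  # lines counted by newline occurrences
```

The alphabet passed to report decides both which characters appear and in what order, so the same tally serves any language community whose letters fit the encoding.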

There is a bug in the Bieber solution: it does not print a .. z in the
first column, because char(i) for i = 1 .. 26 yields the characters with
codes 1 .. 26 (control characters in ASCII), not the letters.  In the print
statement, I would replace char(i) by char(i+ichar('a')-1), and I would put
a space at the beginning of each output line just to be safe from some old
FORTRAN printer-spacing artifacts.
If I used Python dictionaries, I would consider using the characters as
keys, rather than synthesizing them from the indices of the count vector.

You want to give it another good-old Python try?

-- Dennis

------------------
Dennis E. Hamilton
InfoNuovo
mailto:infonuovo at email.com
tel. +1-206-779-9430 (gsm)
fax. +1-425-793-0283
http://www.infonuovo.com

-----Original Message-----
From: python-list-admin at python.org
[mailto:python-list-admin at python.org]On Behalf Of Mark Hathaway
Sent: Friday, March 24, 2000 16:06
To: python-list at python.org
Subject: Re: HELP Newbie solve this Problem


>> On Fri, 24 Mar 2000 03:43:38 GMT, race9047 at my-deja.com declaimed the
>> following in comp.lang.python:
>>
>> 7 a
>> 12 b
>> 16 c
>> . . .
>> 3 z
>> But the real challenge is to invert the list to look like this:
>> a 7
>> b 12
>> c 16
>> . . .
>> z 3
>>

> Dennis Lee Bieber wrote:
>
>         I must be from a different world then... The second part is what
> I'd find easy... even in FORTRAN (maybe too easy in FORTRAN)
>
>         integer counts(26)      / 26 * 0 /      ! or is it 0 * 26?
>         character       line*132        ! set for the longest line
>                                         ! the data file can contain
>         open(10, file="data", status="old")
>         read(10, '(a132)', iostat=ios) line
>    10   continue
>         if (ios .eq. 0) then
>             do 20 i=1, 132              ! or the line length max
>                 if (line(i:i) .ge. 'a' .and. line(i:i) .le. 'z') then
>                         inx = ichar(line(i:i)) - ichar('a') + 1
>                         counts(inx) = counts(inx) + 1
>                 elseif (line(i:i) .ge. 'A' .and. line(i:i) .le. 'Z')
>      x                                                  then
>                         inx = ichar(line(i:i)) - ichar('A') + 1
>                         counts(inx) = counts(inx) + 1
>                 else
>                         continue        ! just my style
>                 endif
>    20       continue
>             read(10, '(a132)', iostat=ios) line
>         goto 10
>         endif
>         close(10)
>         do 30 i=1, 26
>             print *, char(i), '     ', counts(i)
>    30   continue
>         stop
>
>         It would look much cleaner in Python, and I'd use case
> conversion modules rather than the duplicates in the IF statements...

I can't really believe the Fartran code takes that much space.
I suspect it's given as a set-up, so the Python code to follow
will be all the more impressive. Well, here's a Python version.

import string
lines = open('file.txt').readlines()
for each in lines:
    eachlist = string.split(each)
    str = eachlist[1] + " " + eachlist[0]
    print str

If you'd wanted the output printed back to a file then you could
add something other than the final 'print str' statement.

Amazing, isn't it? When a language doesn't get in your way the code
is brief and more readable. It's all the niggling little details
that a language like Fortran or C requires that make them very ugly.


Mark Hathaway
e-mail: hathawa2 at marshall.edu
--
http://www.python.org/mailman/listinfo/python-list