[Tutor] Difficult loop?

Emad Nawfal (عماد نوفل) emadnawfal at gmail.com
Wed Oct 15 23:59:58 CEST 2008


Dear Tutors,
I needed a program to go through a word like this: "Almuta$ar~id"

1-  a, i, u, and o are the short vowels
2-  ~ marks a double consonant.
I wanted the program to do the following:

- For each letter of the word that is not a short vowel, print the 5
preceding letters (skipping short vowels), then the letter itself, then the
5 following letters (again skipping short vowels), then the short vowel
that follows the letter.
- If the letter is followed by the doubling character "~", do the same, but
instead of printing just the vowel following the letter as the last
item, print "~" + the short vowel.
- If the letter is not followed by a vowel, print an underscore as the last
character.
- If there is a "+" sign, ignore it.
- If there are fewer than 5 preceding or following letters, print an
underscore in place of each missing letter.

For example, the word "Almuta$ar~id" would be printed as follows:

_  _  _  _  _  A  l  m  t  $  r  _
_  _  _  _  A  l  m  t  $  r  d  _
_  _  _  A  l  m  t  $  r  d  _  u
_  _  A  l  m  t  $  r  d  _  _  a
_  A  l  m  t  $  r  d  _  _  _  a
A  l  m  t  $  r  d  _  _  _  _  ~i
l  m  t  $  r  d  _  _  _  _  _  _
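
The rules above can also be sketched directly with list slicing. This is only a minimal illustration under a few assumptions, not the script that follows: the names context_rows, SHORT_VOWELS, SHADDA and PAD are made up for the example; the vowel set is a, e, i, o, u; "+" signs are removed before anything else, so a letter followed by "+" and a vowel is treated as followed by that vowel; and a "~" with no vowel after it gives an underscore.

```python
SHORT_VOWELS = set("aeiou")  # assumed vowel set for this sketch
SHADDA = "~"
PAD = "_"


def context_rows(word, width=5):
    """Return one output line per letter of `word` that is not a short vowel."""
    word = word.replace("+", "")  # "+" signs are ignored
    letters, marks = [], []
    i = 0
    while i < len(word):
        ch = word[i]
        i += 1
        if ch in SHORT_VOWELS or ch == SHADDA:
            continue  # a vowel or "~" with no letter before it is dropped
        has_shadda = i < len(word) and word[i] == SHADDA
        if has_shadda:
            i += 1
        if i < len(word) and word[i] in SHORT_VOWELS:
            # the last column is the vowel, prefixed by "~" when doubled
            marks.append((SHADDA if has_shadda else "") + word[i])
            i += 1
        else:
            marks.append(PAD)  # no vowel follows this letter
        letters.append(ch)

    rows = []
    for k, (letter, mark) in enumerate(zip(letters, marks)):
        before = letters[max(0, k - width):k]
        after = letters[k + 1:k + 1 + width]
        row = ([PAD] * (width - len(before)) + before + [letter]
               + after + [PAD] * (width - len(after)) + [mark])
        rows.append("  ".join(row))
    return rows


for line in context_rows("Almuta$ar~id"):
    print(line)
```

Called on "Almuta$ar~id" (the spelling used in the test input of the script below), it produces seven lines of twelve columns each, one per letter that is not a short vowel. Collecting the letters and their vowel marks first, and only then building the windows, keeps the padding logic in one place.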

I took the problem to a friend of mine who is a Haskell programmer. He wrote
the following Python script, which works perfectly, but I'm wondering whether
there is an easier, more Pythonic way to write this:


# Given a transliterated Arabic text like:
# luwnog biyt$ Al+wilAy+At+u ...

# + are to be ignored
# y, A, and w are the long vowels
# shadda is written ~

# produce a table with:
# (letter, vowel, pre5, post5)
# where vowel is either _ (if the letter is not followed by a vowel), a
# vowel, or a shadda with a vowel; pre5 and post5 are the letters
# surrounding the letter

###############################################################################

# Initializes a few variables to be used or updated by the main loop below
space = ' '
plus = '+'
dash = '_'
shadda = '~'
vowels = ('a', 'e', 'i', 'o', 'u')
skips = (space,plus) + vowels

# a small input for testing

inputString = "Al+muta$ar~id+i Al+muta$ar~id+i Al+muta$ar~id+i"

# for convenience surround the input with six dashes both ways to make the
# loop below uniform

def makeWord (s): return dash*6 + s + dash*6

# A few utility constants, variables, and functions...

def shiftLeft (context, ch): return context[1:] + (ch,)

def isSkip (ch): return (ch in skips)

def isShadda (ch): return (ch == shadda)

def isVowel (ch): return (ch in vowels)

def nextCh (str,i):
    c = str[i]
    try:
        if isSkip(c):
            return nextCh(str,i+1)
        elif isShadda(str[i+1]):
            if isVowel(str[i+2]):
                return (c,str[i+1:i+3],i+3)
            else:
                return (c,dash,i+2)
        elif isVowel(str[i+1]):
            return (c,str[i+1],i+2)
        else:
            return (c,dash,i+1)
    except IndexError:
        return (c,dash,i+1)

def advance (str,pre,post,horizon):
    (cc,cv) = post[0]
    (hc,hv,nextHorizon) = nextCh(str,horizon)
    nextPre = shiftLeft(pre,(cc,cv))
    nextPost = shiftLeft(post,(hc,hv))
    return (cc,cv,nextPre,nextPost,nextHorizon)

def printLine (cc,cv,pre,post):
    if cc == dash: return
    simplePre = [c for (c,v) in pre]
    simplePost = [c for (c,v) in post[1:]]
    for c in simplePre: print "%s " % c,
    print "%s " % cc,
    for c in simplePost: print "%s " % c,
    print cv

def processWord (str):
    d = (dash,dash)
    pre = (d,d,d,d,d)
    post = (d,d,d,d,d,d)
    horizon = 6
    while horizon < len(str):
        (cc,cv,nextPre,nextPost,nextHorizon) = advance(str,pre,post,horizon)
        printLine(cc,cv,pre,post)
        pre = nextPre
        post = nextPost
        horizon = nextHorizon

def main ():
    strlist = map(makeWord, inputString.split())
    map(processWord,strlist)

main()


###############################################################################



-- 
لا أعرف مظلوما تواطأ الناس علي هضمه ولا زهدوا في إنصافه كالحقيقة.....محمد
الغزالي
"No victim has ever been more repressed and alienated than the truth"

Emad Soliman Nawfal
Indiana University, Bloomington
http://emnawfal.googlepages.com