[Tutor] Finding the shortest word in a list of words

Emad Nawfal (عماد نوفل) emadnawfal at gmail.com
Tue Jan 20 22:08:00 CET 2009


On Tue, Jan 20, 2009 at 3:14 PM, Marc Tompkins <marc.tompkins at gmail.com> wrote:

> On Tue, Jan 20, 2009 at 11:23 AM, Lie Ryan <lie.1296 at gmail.com> wrote:
>
>> what I meant as wrong is that it is possible that the code would be used
>> for a string that doesn't represent human language, but arbitrary array
>> of bytes. Also, it is a potential security issue.
>
>
> This is something I need to know, then - sys.maxint is a potential security
> issue?  How?   Should it be avoided?  (Guess I'd better get Googling...)
>
>
>> > You could just simply use the len of the first word.
>> >
>> > True dat.  Requires an extra step or two, though - initializing with
>> > some impossibly huge number is quick.
>>
>> len() is fast, and it also removes the need to import sys, which actually
>> removes an extra step or two
>>
> Unintended consequence - initializing minLen with the length of the first
> word results in trying to append to a list that doesn't exist yet - so must
> create minWord and maxWord ahead of time.  (Could use try/except... no.)
> Also - it occurred to me that the input might be sentences, and sentences
> contain punctuation... I already took the liberty of adding "is" to the end
> of the OP's signature quotation; now I add a period:
>
>> corpus = "No victim has ever been more repressed and alienated than the
>> truth is."
>
> Now "is." has length 3, not 2; probably not what we had in mind.
> So, new version:
>
>
>> def MinMax(corpus=""):
>>     import string
>>     corpus = "".join( [x for x in corpus if x not in string.punctuation] )
>>     words = corpus.split()
>>     if not words:  # empty input: words[0] below would raise IndexError
>>         return 0, [], 0, []
>>     minLen = len(words[0])
>>     maxLen = 0
>>     minWord, maxWord = [],[]
>>     for word in words:
>>         curLen = len(word)
>>         if curLen == minLen:
>>             minWord.append(word)
>>         if curLen == maxLen:
>>             maxWord.append(word)
>>         if curLen > maxLen:
>>             maxWord = [word]
>>             maxLen = curLen
>>         if curLen < minLen:
>>             minWord = [word]
>>             minLen = curLen
>>     return minLen, minWord, maxLen, maxWord
>>
>
> Is there a good/efficient way to do this without importing string?
> Obviously best to move the import outside the function to minimize
> redundancy, but any way to avoid it at all?
>
>
> which, at the time of writing, was my impression on the OP's request.
>>
>
> Quote: "I need to find the shortest / longest word(s) in a sequence of
> words."
>
> I'm sure the OP has moved on by now... time I did likewise.
> --
> www.fsrtechnologies.com
>
>
Yes, I have moved on. I got what I needed. I have nonetheless been
following your advanced discussion and trying to understand what you're
saying, though I have not fully managed it yet. As far as punctuation is
concerned, I separate the punctuation marks from the text before I look for
the shortest / longest words.
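
A minimal sketch of that order of operations (separate the punctuation first,
then measure the words), which also avoids importing string. The function name
min_max_words is hypothetical, and treating every character that is neither
alphanumeric nor whitespace as punctuation is an assumption, not something
settled in this thread:

```python
def min_max_words(text):
    """Return (min_len, shortest_words, max_len, longest_words) for text."""
    # Assumption: anything that is not a letter, digit, or whitespace counts
    # as punctuation. Replace it with a space so it is split off the words.
    cleaned = "".join(
        ch if ch.isalnum() or ch.isspace() else " " for ch in text
    )
    words = cleaned.split()
    if not words:
        # No words at all: report zero lengths and empty lists.
        return 0, [], 0, []
    min_len = min(len(w) for w in words)
    max_len = max(len(w) for w in words)
    # Keep every word that ties for shortest or longest, in input order.
    shortest = [w for w in words if len(w) == min_len]
    longest = [w for w in words if len(w) == max_len]
    return min_len, shortest, max_len, longest


corpus = "No victim has ever been more repressed and alienated than the truth is."
print(min_max_words(corpus))
```

Unlike the MinMax version quoted above, this replaces punctuation with a space
rather than deleting it, so "don't" would become two words. Combining marks
such as Arabic vowel diacritics are not alphanumeric under this test either,
so vocalized text would need different handling.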
I'm working with two agglutinative languages (Swahili and Arabic), and I
wanted to see how much agglutination there could be in both languages.
Thank you all for your help.

-- 
لا أعرف مظلوما تواطأ الناس علي هضمه ولا زهدوا في إنصافه كالحقيقة.....محمد
الغزالي
"No victim has ever been more repressed and alienated than the truth"

Emad Soliman Nawfal
Indiana University, Bloomington
http://emnawfal.googlepages.com
--------------------------------------------------------