How to remove repeated words from a paragraph

S.Selvam s.selvamsiva at gmail.com
Wed Nov 4 09:04:27 EST 2009


On Wed, Nov 4, 2009 at 4:27 AM, Tim Chase <python.list at tim.thechases.com> wrote:

> kylin wrote:
>
>> I need to remove a word if it appears in the paragraph twice. Could
>> someone give me a clue or point me to a useful function in Python?
>>
>
> Sounds like homework.  To fail your class, use this one:
>
> >>> p = "one two three four five six seven three four eight"
> >>> s = set()
> >>> print(' '.join(w for w in p.split() if not (w in s or s.add(w))))
> one two three four five six seven eight
>
> which is absolutely horrible because it mutates the set inside the generator
> expression, relying on set.add() returning None.  The passable solution would
> use a for-loop to iterate over each word in the paragraph, emitting it if it
> hadn't already been seen.  Maintain those words in a set, so your words know
> how not to be seen. ("Mr. Nesbitt, would you please stand up?")
>
>
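For reference, the "passable" for-loop described above could look roughly like the following Python 3 sketch. The function name dedupe_words is made up for illustration; like the one-liner, it splits on whitespace only and compares words case-sensitively.

```python
def dedupe_words(paragraph):
    """Return the paragraph with repeated words dropped, keeping first occurrences."""
    seen = set()   # words emitted so far
    kept = []      # output words, in their original order
    for word in paragraph.split():
        if word not in seen:
            seen.add(word)
            kept.append(word)
    return ' '.join(kept)


paragraph = "one two three four five six seven three four eight"
print(dedupe_words(paragraph))  # one two three four five six seven eight
```

The set keeps each membership test cheap, and the separate list preserves the original word order.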
Can we use inp_paragraph.count(iter_word) to make it simpler?
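Not quite as written: str.count() counts substring occurrences, so a word like "the" is also found inside "other" and "theme". Counting over a list of words avoids that. Below is a rough Python 3 sketch; the helper names are hypothetical. Filtering on count == 1 removes every copy of a repeated word, including the first, which may or may not be what the original question meant. Keeping the first occurrence with count() means slicing and counting once per word, which is quadratic, whereas the set-based loop stays linear.

```python
sample = "the other theme is the best"

# str.count() counts substrings, so "the" also matches inside "other" and "theme".
print(sample.count("the"))          # 4
# Counting over the list of words only matches whole words.
print(sample.split().count("the"))  # 2


def keep_unique_only(paragraph):
    """Drop every copy of any word that appears more than once."""
    words = paragraph.split()
    return ' '.join(word for word in words if words.count(word) == 1)


def dedupe_with_count(paragraph):
    """Keep the first occurrence of each word, using list.count() (O(n**2))."""
    words = paragraph.split()
    # words[:i].count(word) == 0 means this is the word's first appearance.
    return ' '.join(word for i, word in enumerate(words)
                    if words[:i].count(word) == 0)


p = "one two three four five six seven three four eight"
print(keep_unique_only(p))   # one two five six seven eight
print(dedupe_with_count(p))  # one two three four five six seven eight
```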

> This also assumes your paragraph consists only of words and whitespace.  But
> since you posted your previous homework-sounding question on stripping out
> non-word/whitespace characters, you'll want to look into using a character
> class like "[\w\s]" (negated as "[^\w\s]" to match the cruft itself) to clean
> up the paragraph.  Neither solution above preserves non-whitespace/non-word
> characters, for which I'd recommend using re.sub() with a callback.  Such a
> callback class might look something like this:
>
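> >>> class Dedupe:
> ...     def __init__(self):
> ...         self.s = set()               # words already emitted
> ...     def __call__(self, m):
> ...         w = m.group(0)               # the word matched by the regexp
> ...         if w in self.s: return ''    # seen before: drop it
> ...         self.s.add(w)
> ...         return w                     # first occurrence: keep it
> ...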
> >>> r.sub(Dedupe(), p)
>
> where I leave the definition of "r" to the student.  Also beware of case
> differences, which you might have to normalize (e.g. by lowercasing).
>
> You'll also want to use more descriptive variable names than my one-letter
> tokens.
>
> -tkc
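A minimal Python 3 sketch of the cleanup route, assuming "[^\w\s]" is the cruft pattern. Note that this deletes all punctuation, apostrophes included, before deduplicating, so "don't" becomes "dont". The names CRUFT and clean_and_dedupe are illustrative only.

```python
import re

# Matches any character that is neither a word character nor whitespace
# (punctuation, quotes, apostrophes, ...).
CRUFT = re.compile(r"[^\w\s]")


def clean_and_dedupe(paragraph):
    """Strip punctuation, then keep only the first occurrence of each word."""
    cleaned = CRUFT.sub('', paragraph)
    seen = set()
    kept = []
    for word in cleaned.split():
        if word not in seen:
            seen.add(word)
            kept.append(word)
    return ' '.join(kept)


print(clean_and_dedupe("one, two; three! one two."))  # one two three
```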

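One possible way to finish the re.sub() exercise, sketched in Python 3 as an adaptation of the Dedupe callback above. The pattern used for "r" is an assumption, not Tim's: it matches a word together with the whitespace in front of it, so dropping a repeat does not leave a double space, while punctuation is never matched and passes through unchanged. The callback also folds case with str.casefold(), per the warning about case differences. The name word_re is hypothetical.

```python
import re


class Dedupe:
    """re.sub() callback: keep the first occurrence of each word, drop repeats.

    Words are compared case-insensitively, so "Three" repeats "three".
    """
    def __init__(self):
        self.seen = set()  # case-folded words already kept

    def __call__(self, match):
        key = match.group('word').casefold()
        if key in self.seen:
            return ''           # drop the repeat and the whitespace before it
        self.seen.add(key)
        return match.group(0)   # keep the word and its leading whitespace


# Assumed definition of "r": optional leading whitespace plus one word.
word_re = re.compile(r"\s*(?P<word>\w+)")

text = "one, two; three four. Three four eight!"
print(word_re.sub(Dedupe(), text))  # one, two; three four. eight!
```

A fresh Dedupe() is needed per paragraph, since the instance remembers every word it has seen.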


-- 
Yours,
S.Selvam