replace only full words

Tim Chase python.list at tim.thechases.com
Sat Sep 28 12:54:35 EDT 2013


On 2013-09-28 09:11, cerr wrote:
> I have a list of sentences and a list of words. Every full word
> that appears within sentence shall be extended by <WORD> i.e. "I
> drink in the house." Would become "I <drink> in the <house>." (and
> not "I <d<rink> in the <house>.")

This is a good place to reach for regular expressions.  The regexp
syntax comes with an "ensure there is a word-boundary here" token
(\b), so you can do something like the code at the (way) bottom of
this email.  I've pushed it off the bottom in case you want to try
regexps on your own first.  Or, if this is homework, to at least make
you work a *little* :-)

> Also, is there a way to make it faster?

The code below should do the processing in roughly O(n) time, as it
only makes one pass through the data and does O(1) (average-case)
lookups into your set of nouns.  I included code in the regexp to
roughly find contractions and hyphenated words.  Your original code
grows slower as your list of nouns grows bigger, and it also suffers
from multiple-replacement issues: if you have the noun-list of
["drink", "rink"], you'll get results that you likely don't want.
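
To make the multiple-replacement issue concrete, here is a minimal
sketch.  It assumes a replace-in-a-loop approach built on str.replace;
that shape is a guess for illustration, not a copy of the original
code, and the variable names are hypothetical.

```python
# Hypothetical reconstruction of a replace-in-a-loop approach
# (an assumption for illustration, not the original poster's code).
nouns = ["drink", "rink"]
sentence = "I drink in the house."

naive = sentence
for noun in nouns:
    # Each pass rescans the whole sentence, including the text
    # that earlier passes already wrapped in <...>.
    naive = naive.replace(noun, "<%s>" % noun)

print(naive)  # I <d<rink>> in the house.
```

The second pass finds "rink" inside the already-wrapped "<drink>",
which is exactly the nesting you said you want to avoid.  Each extra
noun also costs another full scan of the sentence.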

My code hasn't considered case differences, but you should be able to
normalize both the list of nouns and the word you're testing in the
"modify()" function (e.g. by lower-casing both) so that it would find
"Drink" as well as "drink".

Also, note that some words serve both as nouns and other parts of
speech, e.g. "It's kind of you to house me for the weekend and drink
tea with me."

-tkc

































import re

r = re.compile(r"""
  \b    # assert a word boundary
  \w+   # 1+ word characters
  (?:   # a group
   [-']  # a dash or apostrophe
   \w+   # followed by 1+ word characters
   )?    # make the group optional (0 or 1 instances)
  \b    # assert a word boundary here
  """, re.VERBOSE)

nouns = set([
  "drink",
  "house",
  ])

def modify(matchobj):
  word = matchobj.group(0)
  if word in nouns:
    return "<%s>" % word
  else:
    return word

# print() with parentheses runs under both Python 2 and Python 3
print(r.sub(modify, "I drink in the house"))
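
As for the case differences mentioned above, one way to handle them
is to lower-case both sides of the lookup.  This is only a sketch
under that assumption; the names WORD_RE, NOUNS and
modify_ignoring_case are made up for illustration, and the pattern is
the same one as above written on a single line.

```python
import re

# Same pattern as above, written compactly: a word, optionally
# followed by a dash or apostrophe and more word characters.
WORD_RE = re.compile(r"\b\w+(?:[-']\w+)?\b")

# Store the nouns lower-cased so the lookup can ignore case.
NOUNS = set(n.lower() for n in ["drink", "house"])

def modify_ignoring_case(matchobj):
    word = matchobj.group(0)
    # Compare in lower case, but keep the original spelling in the output.
    if word.lower() in NOUNS:
        return "<%s>" % word
    return word

result = WORD_RE.sub(
    modify_ignoring_case,
    "Drink in the House, not the drink-house.",
)
print(result)  # <Drink> in the <House>, not the drink-house.
```

Note that "drink-house" is matched as a single hyphenated word, so it
is left alone unless that exact word is in the set.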


