[Tutor] about python RE

Michael Janssen Janssen@rz.uni-frankfurt.de
Thu Mar 20 03:49:01 2003


On Wed, 19 Mar 2003, Abdirizak abdi wrote:
> hi everyone,
>
> can anyone give me an idea how to setup a regular expression that
> deals with full stops after the last word is read and commas after the
> words and double quotes in words. I am having a go, any given idea
> will be helpfull.

Hello Abdirizak,

you want to know how to find all "words" in a natural-language
sentence, right? A "word" is whatever stands between whitespace (or the
start/end of the sentence), without leading or trailing quotes, commas
and so on, right?

You will need a set of characters that are allowed in words:
"[-a-zA-Z0-9]" # correct? Note: a leading "-" is taken as the character
itself, despite its special meaning (marking a range) inside character
sets. This can be enhanced (obviously). Compare the \w sequence or
string.letters.
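
To compare the hand-written set with \w, here is a small sketch. The
sample text is made up for illustration, and the behaviour shown assumes
Python 3, where \w on a str pattern also matches the underscore and
non-ASCII letters but never the hyphen:

```python
import re

# Made-up sample text, only for illustration.
text = "well-known café_1"

# Hand-written set: the leading "-" is literal; "é" and "_" are not included.
by_set = re.findall(r"[-a-zA-Z0-9]+", text)

# \w alone: letters, digits and "_" match, but the hyphen splits the word.
by_w = re.findall(r"\w+", text)

# \w plus a literal leading hyphen inside a character set.
by_w_hyphen = re.findall(r"[-\w]+", text)

print(by_set)       # ['well-known', 'caf', '1']
print(by_w)         # ['well', 'known', 'café_1']
print(by_w_hyphen)  # ['well-known', 'café_1']
```

Which variant fits depends on what you want to count as a word.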

re.findall("[-a-zA-Z0-9]+", sentence) now already finds every "word" and
leaves whitespace and punctuation alone. The regular expression "consumes"
(while iterating through the sentence) any character given in
[-a-zA-Z0-9]. It stops when it comes to a character not in the set (that
means: you needn't explicitly forbid "non-word characters").
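
A minimal sketch of that call; the sample sentence is my own invention,
not taken from your data:

```python
import re

# Invented sample sentence with commas, double quotes and a full stop.
sentence = 'He said, "well-known words end here."'

# Each match is a maximal run of characters from the set; anything else
# (spaces, commas, quotes, the full stop) simply ends the current match.
words = re.findall(r"[-a-zA-Z0-9]+", sentence)

print(words)  # ['He', 'said', 'well-known', 'words', 'end', 'here']
```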

In case you want to *preserve* punctuation and/or quotes, put them into
your character set.
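
For instance, adding the full stop, the comma and the double quote to the
set might look like this (again a sketch with an invented sentence; which
characters you add is up to you):

```python
import re

# Invented sample sentence, only for illustration.
sentence = 'He said, "stop here."'

# ".", "," and '"' are now part of the set, so they stay attached to the
# neighbouring word instead of ending the match.
tokens = re.findall(r'[-a-zA-Z0-9.,"]+', sentence)

print(tokens)  # ['He', 'said,', '"stop', 'here."']
```

Inside a character set the "." stands for itself, so it needs no
backslash there.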

Is this sufficient for you? If not, please give us an example where it
isn't.

Michael

>
> thanks in advance