[Tutor] regular expression question
D Elliott
debe at comp.leeds.ac.uk
Thu Apr 7 14:01:39 CEST 2005
I wonder if anyone can help me with an RE. I also wonder if there is an RE
mailing list anywhere - I haven't managed to find one.
I'm trying to use this regular expression to delete particular strings
from a file before tokenising it.
I want to delete every string that contains a full stop (period) in the
middle of a word, that is, not at the beginning or end of the word and not
immediately followed by a closing bracket. So I want to delete file names
(e.g. fileX.doc) and websites (even when www/http is not given), but not
file extensions (e.g. "this is in .jpg format"). I also don't want to
delete the last word of a sentence just because it precedes a full stop,
or a word where the full stop is followed by a closing bracket.
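To make the requirement concrete, here is a small sketch of the behaviour I am after, written without a regular expression. The function name should_delete and the example tokens other than fileX.doc, bbc.co.uk and .jpg are made up for illustration, and this is only my reading of the rules above, not a finished solution.

```python
CLOSING = ")}]"  # the closing brackets I want to treat as exceptions

def should_delete(token):
    """Hypothetical helper: True if the token has a full stop strictly
    inside it that is not immediately followed by a closing bracket."""
    for i, ch in enumerate(token):
        inside = 0 < i < len(token) - 1
        if ch == "." and inside and token[i + 1] not in CLOSING:
            return True
    return False

# Tokens I want deleted (True) and tokens I want kept (False).
examples = {
    "fileX.doc": True,   # file name
    "bbc.co.uk": True,   # website without www/http
    ".jpg": False,       # bare file extension
    "sentence.": False,  # last word of a sentence
    "bracket.)": False,  # full stop followed by a closing bracket
}

for token, expected in examples.items():
    print(token, should_delete(token), expected)
```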
fullstopRe = re.compile (r'\S+\.[^)}]]+')
I've also tried
fullstopRe = re.compile (r'\S+[.][^)}]]+')
I understand this to mean: any non-whitespace character one or more times,
a full stop (I'm using the backslash, or putting it in a character class,
to make it literal), then any character except a closing bracket of any
kind, one or more times.
If I forget about the bracket exceptions, the following works:
fullstopRe = re.compile (r'\S+[.]\S+')
But the first two patterns, with the bracket exceptions, are not deleting
e.g. bbc.co.uk
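In case it helps anyone reproduce this, here is a minimal script that runs all three patterns over one line of text. The sample sentence is made up for this message (only bbc.co.uk, fileX.doc and .jpg come from my description above), and I am assuming that deleting means substituting an empty string with sub().

```python
import re

# Made-up sample line containing the cases described above.
sample = "see bbc.co.uk or fileX.doc (in .jpg format). The end."

# The three patterns from this message, unchanged.
patterns = [
    r'\S+\.[^)}]]+',   # backslash version with bracket exceptions
    r'\S+[.][^)}]]+',  # character-class version with bracket exceptions
    r'\S+[.]\S+',      # version without bracket exceptions
]

results = {}
for pattern in patterns:
    fullstopRe = re.compile(pattern)
    # Replace every match with nothing, i.e. delete it.
    results[pattern] = fullstopRe.sub('', sample)
    print(pattern, '->', repr(results[pattern]))
```

With this sample, the first two patterns leave the line untouched, while the third removes bbc.co.uk and fileX.doc and keeps .jpg.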
Can anyone enlighten me?
Thanks
Debbie
--
***************************************************
Debbie Elliott
Computer Vision and Language Research Group,
School of Computing,
University of Leeds,
Leeds LS2 9JT
United Kingdom.
Tel: 0113 3437288
Email: debe at comp.leeds.ac.uk
***************************************************