regexp qns

Gabriel Genellina gagsl-py at yahoo.com.ar
Sat Jan 20 01:13:27 EST 2007


<eight02645999 at yahoo.com> wrote in message 
news:1169268024.164643.71320 at l53g2000cwa.googlegroups.com...
> hi
> suppose i have a string like
>
> test1?test2t-test3*test4*test5$test6#test7*test8
>
> how can i construct the regexp to get test3*test4*test5 and
> test7*test8, ie, i want to match * and the words before and after?
> thanks

I suppose this is just an example and you mean "any word" instead of test1, 
test2, etc.
So your pattern would be: word*word*word*word, that is, word* repeated one 
or more times, followed by another word.
To match a word we'll use "\w+"; to match a literal * we have to escape it 
as "\*" (unescaped, * is a special character meaning "zero or more").
So the regexp would be: "(\w+\*)+\w+"
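We are not interested in the () as a group by itself; it was only there to 
describe the repeating pattern. It also gets in the way: when a pattern 
contains a capturing group, findall returns the text of the group instead 
of the whole match. So we change it into a non-capturing group, (?:...).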
Final version: "(?:\w+\*)+\w+"
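
To see why the non-capturing form matters with findall, here is a small 
sketch comparing both versions on the original string. The names text, 
capturing and non_capturing are only illustrative.

```python
import re

text = 'test1?test2t-test3*test4*test5$test6#test7*test8'

# Capturing group: findall returns only what the group matched last,
# not the whole match.
capturing = re.findall(r"(\w+\*)+\w+", text)

# Non-capturing group: findall returns each whole match.
non_capturing = re.findall(r"(?:\w+\*)+\w+", text)

print(capturing)      # ['test4*', 'test7*']
print(non_capturing)  # ['test3*test4*test5', 'test7*test8']
```

Both patterns match the same spans of text; only what findall reports 
differs.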

import re
rexp = re.compile(r"(?:\w+\*)+\w+")
lines = [
 'test1?test2t-test3*test4*test5$test6#test7*test8',
 'test1?test2t-test3*test4$test6#test7_test8',
 'test1?nada-que-ver$esto.no.matchea',
 'test1?test2t-test3*test4*',
 'test1?test2t-test3*test4',
 'test1?test2t-test3*',
]

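# Print each input line followed by every match found in it
# (print() calls, so this runs on Python 3).
for line in lines:
    print(line)
    for txt in rexp.findall(line):
        print('->', txt)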

Test it with some corner cases and see if it does what you expect: no "*", 
starting with "*", ending with "*", embedded whitespace before and after the 
"*", whitespace inside a word, the very definition of "word"...
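
A quick way to try those corner cases is sketched below. The sample 
strings are made up for illustration, and whether each result is "right" 
depends on what you want a word to be.

```python
import re

rexp = re.compile(r"(?:\w+\*)+\w+")

# Hypothetical corner cases, one per situation mentioned above.
cases = [
    'no stars here',      # no "*" at all
    '*abc*def',           # starts with "*"
    'abc*def*',           # ends with "*"
    'abc * def',          # whitespace before and after the "*"
    'ab cd*ef gh',        # whitespace inside a "word"
    'foo_1*bar-baz*qux',  # "_" and digits count as \w, "-" does not
    'a**b',               # two "*" in a row
]

# Map each test string to the list of matches found in it.
results = {case: rexp.findall(case) for case in cases}

for case in cases:
    print(repr(case), '->', results[case])
```

Note that a leading or trailing "*" is silently dropped, whitespace around 
the "*" prevents any match, and "-" splits a word while "_" does not.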

-- 
Gabriel Genellina 




