re question
Cliff Crawford
cjc26 at nospam.cornell.edu
Fri Oct 15 19:03:19 EDT 1999
On Fri, 15 Oct 1999 14:26:50 -0700, Max M. Stalnaker said:
| I have the following code:
|
| def subset(self):
|     group=re.search(r"%%%([^%]+)%%%",self.data)
|     self.data=group.groups(0)[0]
|
| Essentially, I get an HTML page, change some tags to %%% and extract the
| stuff between. But the way I do it above fails if the stuff between has a
| single %. The main goal is to extract the stuff. Changing the tags is
| just the way I tried, and I had some success with it.
|
| Maybe there is a better way to do this. Or someone could perhaps suggest re
| code that would do it. Thank you.
You could try using re.split:
>>> s=r"blah blah blah %%%important stuff to be extracted%%%more useless junk"
>>> import re
>>> re.split(r"%%%", s)
['blah blah blah ', 'important stuff to be extracted', 'more useless junk']
and use the odd-indexed items of the returned list (indexes 1, 3, ...),
which are the pieces that sat between the %%% markers.
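As a rough sketch of the re.split idea wrapped in a function (the function
names here are made up for illustration, and it assumes the markers come in
balanced pairs), something like the following should also cope with a single %
inside the marked text. The second function is a different technique, a
non-greedy search, included only for comparison:

```python
import re

MARKER = "%%%"  # the sentinel from the original question


def extract_by_split(text, marker=MARKER):
    # Hypothetical helper: split on the literal marker. With balanced
    # markers, the pieces at odd indexes (1, 3, ...) are the ones that
    # sat between an opening and a closing marker.
    parts = re.split(re.escape(marker), text)
    return parts[1::2]


def extract_by_search(text, marker=MARKER):
    # Hypothetical helper: (.*?) is non-greedy, so each match stops at the
    # first closing marker; re.DOTALL lets a match span line breaks.
    pattern = re.escape(marker) + r"(.*?)" + re.escape(marker)
    return re.findall(pattern, text, re.DOTALL)


sample = "blah blah %%%50% off, today only%%% more useless junk"
print(extract_by_split(sample))
print(extract_by_search(sample))
```

If a marker is left unclosed, the split version will still return the trailing
text as if it were marked, while the search version skips it, so which one
fits depends on how trustworthy the input is.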
| My current idea is to construct a single character sentinel out of something
| greater than chr(128) and use that. This will probably work in this
| application, but I feel like I am missing something.
Another option is to use '$'; it's pretty common to see
"$Last modified: 10 Oct 1999 4:23 PM$"
or something similar on a web page.
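One thing to watch if you go the '$' route with the re module: an unescaped $
in a pattern is an end-of-string anchor, not a literal dollar sign, so it
needs escaping. A minimal sketch, assuming balanced '$' pairs (the function
name and the sample page text are made up for illustration):

```python
import re


def between_dollars(text):
    # Hypothetical helper: re.escape turns "$" into a literal match,
    # since a bare "$" would anchor to the end of the string instead.
    parts = re.split(re.escape("$"), text)
    # Odd-indexed pieces are the ones between a pair of "$" characters.
    return parts[1::2]


page = "header $Last modified: 10 Oct 1999 4:23 PM$ footer"
print(between_dollars(page))
```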
--
cliff crawford http://www.people.cornell.edu/pages/cjc26/
There are more stars in the sky than there are
-><- grains of sand on all the beaches of the world.