re question

Cliff Crawford cjc26 at nospam.cornell.edu
Fri Oct 15 19:03:19 EDT 1999


On Fri, 15 Oct 1999 14:26:50 -0700, Max M. Stalnaker said:
| I have the following code:
| 
|  def subset(self):
|   group=re.search(r"%%%([^%]+)%%%",self.data)
|   self.data=group.groups(0)[0]
| 
| Essentially, I get an HTML page, change some tags to %%% and extract the
| stuff between.  But the way I do it above fails if the stuff between has a
| single %.  The main goal is to extract the stuff.  Changing the tags is
| just the way I tried, and it had some success.
| 
| Maybe there is a better way to do this.  Or someone could perhaps suggest re
| code that would do it.  Thank you.

You could try using re.split:

>>> text = "blah blah blah %%%important stuff to be extracted%%%more useless junk"
>>> import re
>>> re.split(r"%%%", text)
['blah blah blah ', 'important stuff to be extracted', 'more useless junk']

and use the items at odd indices (1, 3, 5, ...) of the returned list.
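Wrapped up as a function, the re.split idea might look like the sketch below. The name extract_between and its marker parameter are hypothetical, just for illustration. Because it splits on the whole %%% string, a single % inside the extracted text no longer causes trouble:

```python
import re

def extract_between(text, marker="%%%"):
    """Return the pieces of text that sit between pairs of marker strings."""
    # re.escape makes the marker literal, so markers like '$' work too.
    parts = re.split(re.escape(marker), text)
    # re.split gives [outside, inside, outside, inside, ...],
    # so the extracted pieces are at the odd indices 1, 3, 5, ...
    return parts[1::2]

page = "blah blah blah %%%important stuff, 50% off%%%more useless junk"
print(extract_between(page))  # ['important stuff, 50% off']
```

One caveat: if the markers are unbalanced (an opening %%% with no closing one), the last odd-index piece is simply everything after the unclosed marker.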


| My current idea is to construct a single character sentinel out of something
| greater than chr(128) and use that.  This will probably work in this
| application, but I feel like I am missing something.

Another option is to use '$' as the marker; it's pretty common to see
"$Last modified: 10 Oct 1999 4:23 PM$"
or something similar on a web page.
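For comparison, here is a different technique from the re.split suggestion: keeping re.search but swapping [^%]+ for a non-greedy pattern. In the original subset code, [^%]+ refuses to match across any single %. The pattern %%%(.*?)%%% instead stops at the first closing %%% and lets lone % characters through. This is a minimal sketch, assuming the same %%% markers. The helper names extract_first and extract_all are hypothetical:

```python
import re

# Non-greedy: .*? stops at the first closing %%%, so a lone % inside is fine.
# re.DOTALL lets . match newlines, since HTML content often spans lines.
MARKED = re.compile(r"%%%(.*?)%%%", re.DOTALL)

def extract_first(text):
    """Return the text between the first pair of %%% markers, or None."""
    match = MARKED.search(text)
    # re.search returns None when nothing matches, so check before .group().
    return match.group(1) if match else None

def extract_all(text):
    """Return every piece of text enclosed by a pair of %%% markers."""
    # With exactly one group in the pattern, findall returns that group's text.
    return MARKED.findall(text)

print(extract_first("junk %%%50% off\ntoday%%% more"))
```

The same pattern works for '$' markers if you write it as r"\$(.*?)\$", because $ is special in regular expressions and must be escaped.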


-- 
cliff crawford   http://www.people.cornell.edu/pages/cjc26/
            There are more stars in the sky than there are
-><-        grains of sand on all the beaches of the world.




More information about the Python-list mailing list