String parsing
Gabriel Genellina
gagsl-py2 at yahoo.com.ar
Tue May 8 21:43:44 EDT 2007
On Tue, 08 May 2007 22:09:52 -0300, HMS Surprise <john at datavoiceint.com>
wrote:
> The string below is a piece of a longer string of about 20000
> characters returned from a web page. I need to isolate the number at
> the end of the line containing 'LastUpdated'. I can find
> 'LastUpdated' with .find but not sure about how to isolate the
> number. 'LastUpdated' is guaranteed to occur only once. Would
> appreciate it if one of you string parsing whizzes would take a stab
> at it.
> <input type="hidden" name="RFP" value="-1"/>
> <!--<input type="hidden" name="EnteredBy" value="johnxxxx"/>-->
> <input type="hidden" name="EnteredBy" value="john"/>
> <input type="hidden" name="ServiceIndex" value="1"/>
> <input type="hidden" name="LastUpdated" value="1178658863"/>
> <input type="hidden" name="NextPage" value="../active/active.php"/>
> <input type="hidden" name="ExistingStatus" value="10" ?>
> <table width="98%" cellpadding="0" cellspacing="0" border="0"
> align="center"
You really should use an HTML parser here. But assuming that the page's
structure will not change much, you could use a regular expression like
this:
import re

expr = re.compile(r'name\s*=\s*"LastUpdated"\s+value\s*=\s*"(.*?)"',
                  re.IGNORECASE)
# text holds the page contents; group(1) is the captured value attribute
number = expr.search(text).group(1)
(Handling of "not found" and "duplicate" cases is left as an exercise for
the reader)
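For what it's worth, one way that exercise might go is sketched below. It
reuses the same pattern with findall() and raises an error unless there is
exactly one match. The helper name find_last_updated and the choice of
ValueError are my own assumptions, not anything required by the problem.

```python
import re

# Same pattern as in the reply above.
EXPR = re.compile(r'name\s*=\s*"LastUpdated"\s+value\s*=\s*"(.*?)"',
                  re.IGNORECASE)

def find_last_updated(text):
    """Return the single LastUpdated value (hypothetical helper)."""
    # findall() returns the captured group for every match in the text.
    matches = EXPR.findall(text)
    if not matches:
        raise ValueError("LastUpdated not found")
    if len(matches) > 1:
        raise ValueError("LastUpdated found %d times" % len(matches))
    return matches[0]
```

The value comes back as a string; wrap it in int() if you need a number.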
Note that <input value="1178658863" type="hidden" name="LastUpdated" /> is
just as valid as your HTML, but won't match the expression, because the
attributes appear in a different order.
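To illustrate the parser route, here is a minimal sketch that does not care
about attribute order. It assumes Python 3's standard html.parser module (in
Python 2 the module is named HTMLParser); the class name HiddenInputFinder
and the function last_updated are made up for this example.

```python
from html.parser import HTMLParser

class HiddenInputFinder(HTMLParser):
    """Collect the value attribute of <input> tags with a given name."""

    def __init__(self, wanted_name):
        super().__init__()
        self.wanted_name = wanted_name
        self.values = []

    def handle_starttag(self, tag, attrs):
        # Also reached for self-closing tags such as <input ... />,
        # via the default handle_startendtag().
        if tag != "input":
            return
        attr_map = dict(attrs)  # attrs is a list of (name, value) pairs
        if attr_map.get("name") == self.wanted_name:
            self.values.append(attr_map.get("value"))

def last_updated(text):
    """Return every LastUpdated value found (hypothetical helper)."""
    finder = HiddenInputFinder("LastUpdated")
    finder.feed(text)
    finder.close()
    return finder.values
```

The result is a list, so the "not found" and "duplicate" cases show up as a
list of length zero or more than one. Tags inside HTML comments are reported
as comments rather than as start tags, so a commented-out field is ignored.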
--
Gabriel Genellina