Getting a value that follows string.find()

Dave Angel davea at davea.name
Tue Aug 13 21:31:58 EDT 2013


englishkevin110 at gmail.com wrote:

> I know the title doesn't make much sense, but I didn't know how to explain my problem.
>
> Anywho, I've opened a page's source in URLLIB
> starturlsource = starturlopen.read()
> string.find(starturlsource, '<a href="/profile.php?id=')
> And I used string.find to find a specific area in the page's source.
> I want to store what comes after ?id= in a variable.
> Can someone help me with this?

Python 3.3.0 (default, Mar  7 2013, 00:24:38) 
[GCC 4.6.3] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import string
>>> help(string.find)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'module' object has no attribute 'find'

There is no find function in the string module [1].  But assuming
starturlsource is a str, you could do:

pattern = '<a href="/profile.php?id='
index = starturlsource.find(pattern)

index will then be -1 if there's no match, or have a non-negative value
if a match is found.
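
To see both outcomes side by side, here is a small sketch run against a made-up page fragment (the sample string and the id in it are assumptions for illustration, not taken from your page):

```python
# Hypothetical sample text; the real page source will differ.
sample = 'junk <a href="/profile.php?id=12345">Kevin</a> more junk'
pattern = '<a href="/profile.php?id='

hit = sample.find(pattern)            # offset where the pattern starts
miss = sample.find('no such text')    # -1, since the text is absent

print(hit, miss)
```

So a result of -1 means no match, and any other value is the offset at which the match begins.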

In the latter case, you can extract the next 17 characters with

newstr = starturlsource[index+len(pattern):index+len(pattern)+17]
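
A fixed length of 17 only works if every id is exactly that long. As a rough sketch of an alternative, the function below reads up to the closing quote of the href instead, and returns None when the pattern is missing. The name extract_id and the sample text are made up for illustration, and it assumes the id always ends at the next double quote:

```python
def extract_id(source, pattern='<a href="/profile.php?id='):
    """Return the text between pattern and the next double quote, or None."""
    index = source.find(pattern)
    if index == -1:
        return None                      # pattern not present at all
    start = index + len(pattern)         # first character after "?id="
    end = source.find('"', start)        # closing quote of the href attribute
    if end == -1:
        return None                      # malformed: no closing quote
    return source[start:end]

# Hypothetical sample text; the real page source will differ.
sample = 'junk <a href="/profile.php?id=12345">Kevin</a> more junk'
print(extract_id(sample))
```

Note that in Python 3, urlopen(...).read() returns bytes rather than str, so you would decode it first (for example with .decode('utf-8'), assuming that is the page's encoding) before passing it in.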

You are of course making several assumptions about the web page, which
are perfectly reasonable since it's a page under your control.  Or is
it?


[1]  I'm assuming Python 3.3, since you didn't say which version you're
using.  But even in Python 2.7, the string.find function is
deprecated in favor of the str.find method.

-- 
DaveA



