Getting a value that follows string.find()

Steven D'Aprano steve at pearwood.info
Wed Aug 14 02:29:39 EDT 2013


On Tue, 13 Aug 2013 16:03:46 -0700, englishkevin110 wrote:


> On Tuesday, August 13, 2013 5:58:07 PM UTC-5, Joel Goldstick wrote:
[fixing Joel's top-posting]

>> On Tue, Aug 13, 2013 at 6:51 PM,  <> wrote:
>> 
>> > I know the title doesn't make much sense, but I didn't know how to
>> > explain my problem.
>> >
>> > Anywho, I've opened a page's source in urllib:
>> >
>> > starturlsource = starturlopen.read()
>> > string.find(starturlsource, '<a href="/profile.php?id=')
>> >
>> > And I used string.find to find a specific area in the page's source.
>> > I want to store what comes after ?id= in a variable.
>> >
>> > Can someone help me with this?


>> look up urlparse for your answer


> I don't want to do any kind of HTML parsing.


What you are doing *is* HTML parsing, or at least a half-baked, fragile, 
likely-to-go-wrong form of parsing.
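For comparison, here is a minimal sketch of the sturdier route using only the standard library, written for Python 3, where the urlparse Joel mentioned lives in urllib.parse. It assumes the links look like `<a href="/profile.php?id=...">` as in the original question; the class name, the helper function and the sample HTML are hypothetical. Note that in Python 3 `starturlopen.read()` returns bytes, so the source would have to be decoded to str (using the page's charset) before being fed to the parser.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse, parse_qs


class ProfileIdCollector(HTMLParser):
    """Collect the 'id' query parameter from <a> links to /profile.php.

    Hypothetical helper class, written for illustration only.
    """

    def __init__(self):
        super().__init__()
        self.ids = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        # attrs is a list of (name, value) pairs; a value may be None
        href = dict(attrs).get("href")
        if not href:
            return
        parts = urlparse(href)
        if parts.path == "/profile.php":
            # parse_qs maps each query key to a list of values
            self.ids.extend(parse_qs(parts.query).get("id", []))


def profile_ids(html_source):
    """Return every profile id found in the (already decoded) HTML text."""
    collector = ProfileIdCollector()
    collector.feed(html_source)
    collector.close()
    return collector.ids


sample = '<p><a href="/profile.php?id=12345">Kevin</a> <a href="/other.php?id=9">x</a></p>'
print(profile_ids(sample))  # ['12345']
```

Unlike a bare find(), this does not care about attribute order, whitespace inside the tag, or other query parameters sharing the URL.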

But if you insist, the algorithm is simple: find() returns the offset at 
which the search string starts (or -1 if it isn't there). You know the 
length of the search string. Therefore you can calculate the index of the 
first character that follows the search string:

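text = "blah blah blah blah spam spam... blah blah blah blah..."
needle = "spam spam"  # what we search for

i = text.find(needle)  # index where needle starts, or -1 if absent
if i == -1:
    print("not found")
else:
    # slice from the first character after the end of needle
    print(text[i+len(needle):])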


Of course, the problem is that you need to know not just the *start* 
offset of the bit that follows, but its *ending* offset as well, which 
brings you into the realm of half-arsed parsing.
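If you do stay with find(), here is a minimal sketch of getting both offsets. str.find accepts an optional start index, so the search for the end of the value can begin where the value starts. It assumes the value is terminated by the closing double quote of the href attribute; `value_after` is a hypothetical helper, not part of any library.

```python
def value_after(text, marker, terminator='"'):
    """Return the text between marker and the next terminator, or None.

    Hypothetical helper. Assumes the value ends at the first occurrence
    of terminator after the marker, which is exactly the fragile part.
    """
    start = text.find(marker)
    if start == -1:
        return None
    start += len(marker)                 # first character after the marker
    end = text.find(terminator, start)   # search only from start onward
    if end == -1:
        return None
    return text[start:end]


source = '<li><a href="/profile.php?id=12345">Kevin</a></li>'
print(value_after(source, '<a href="/profile.php?id='))  # 12345
```

It only returns the first match, and the result changes as soon as the page quotes its attributes differently or puts another parameter after id (`?id=123&ref=x` would give `123&ref=x`), which is the fragility described above.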



-- 
Steven


