get a field

Mon Feb 15 13:28:25 EST 2010

  Holden wrote:
> mierdatutis mi wrote:
>> I have this:
>>
>> pe="http://www.rtve.es/mediateca/videos/20100211/saber-comer---patatas-castellanas-costillas-11-02-10/691046.shtml"
>>
>> I would like to extract this: 691046.shtml
>>
>> But it is dynamic. The string does not always have the same length.
> 
>>>> s = "http://server/path/to/file/file.shtml"
>>>> s.rfind("/")         # finds rightmost "/"
> 26
>>>> s[s.rfind("/")+1:]   # substring starting after "/"
> 'file.shtml'

If I didn't use os.path.basename(s) then I'd write this as 
"s.rsplit('/', 1)[-1]"

   >>> "http://server/path/to/file/file.shtml".rsplit('/', 1)[-1]
   'file.shtml'
   >>> "".rsplit('/', 1)[-1]
   ''
   >>> "file.html".rsplit('/', 1)[-1]
   'file.html'

I don't know how much of a difference it makes, but I always 
appreciate seeing how various people solve the same problem.  I 
tend to lean away from the find()/index() methods on strings 
because I have to stop and think which one raises the exception 
and which one returns -1 (and that it's -1 instead of 0).  I 
usually end up dropping to a Python shell and doing

   >>> help("".find)
   >>> help("".index)

to refresh my memory.
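
For the record, here is a minimal sketch of the difference, using 
only the standard str methods (the variable names are just for 
illustration).  It also shows why the rfind() slice still works 
when the string has no "/" at all: the -1 it returns becomes 0 
after adding 1.

```python
s = "file.html"          # no "/" anywhere in this string

# str.find() reports a miss by returning -1
miss = s.find("/")

# str.index() reports a miss by raising ValueError
try:
    s.index("/")
    index_raised = False
except ValueError:
    index_raised = True

# rfind() follows the find() convention, so -1 + 1 == 0 and the
# slice s[0:] is the whole string when there is no "/" to split on
tail = s[s.rfind("/") + 1:]

print(miss, index_raised, tail)   # -1 True file.html
```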

FWIW, Steve's solution and os.path.basename() both produce the 
same results as rsplit() with all 3 input values above, so it's 
more a matter of personal style preference.  The basename() 
version gives different results when run on Windows against 
paths with backslashes, because there it splits on backslashes 
as well as forward-slashes:

   s = r'c:\path\to\file.txt'

but since you (OP) know that they should be URLs with 
forward-slashes, it should be a non-issue.
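
A rough sketch of that comparison, if you want to check it 
yourself.  The helper names by_rfind/by_rsplit are made up for 
illustration; ntpath and posixpath are the stdlib modules that 
os.path resolves to on Windows and on Linux/macOS respectively, 
so using them directly shows both behaviours on any platform.

```python
import ntpath      # Windows path rules (os.path on Windows)
import posixpath   # POSIX path rules (os.path on Linux/macOS)

def by_rfind(s):
    # slice after the rightmost "/" (whole string if there is none)
    return s[s.rfind("/") + 1:]

def by_rsplit(s):
    # split once from the right and keep the last piece
    return s.rsplit("/", 1)[-1]

# The three inputs from the examples above: all approaches agree.
samples = ["http://server/path/to/file/file.shtml", "", "file.html"]
agree = all(
    by_rfind(s) == by_rsplit(s) == posixpath.basename(s)
    for s in samples
)
print(agree)                     # True

# A Windows path with backslashes: only the Windows rules split it.
win = r'c:\path\to\file.txt'
print(ntpath.basename(win))      # file.txt
print(posixpath.basename(win))   # c:\path\to\file.txt
print(by_rsplit(win))            # c:\path\to\file.txt
```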

-tkc