[Python-Dev] Proof of the pudding: str.partition()

Thu Sep 1 13:26:56 CEST 2005

Charles Cazabon wrote:
>>also, a Boolean positional argument is a really poor clue about its meaning,
>>and it's easy to misremember the sense reversed.
> 
> 
> I totally agree.  I therefore borrowed the time machine and modified my
> proposal to suggest it should be a keyword argument, not a positional one :).

The best alternative to rpartition I've encountered so far is Reinhold's
proposal of a 'separator index' that selects which occurrence of the separator
in the string should be used to perform the partitioning. However, even that
doesn't measure up, as you will see if you read on...

The idea is that, rather than "partition(sep)" and "rpartition(sep)", we have 
a single method "partition(sep, [at_sep=1])".

The behaviour could be written up like this:
"""
Partition splits the string into three pieces (`before`, `sep`, `after`) - the 
part of the string before the separator, the separator itself and the part of 
the string after the separator. If the relevant portion of the string doesn't 
exist, then the corresponding element of the tuple returned is the empty string.

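The `at_sep` argument determines which occurrence of the separator is used to
perform the partitioning. The default value of 1 means the partitioning occurs
at the first occurrence of the separator. If the `at_sep` argument is negative,
occurrences of the separator are counted from the end of the string instead of
the start. An `at_sep` value of 0 will result in the original string being
returned as the part 'after' the separator. If the string contains fewer than
abs(at_sep) occurrences of the separator, the whole string is returned as the
part 'before' the separator when `at_sep` is positive, and as the part 'after'
the separator when `at_sep` is negative.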
"""

A concrete implementation is below. Comparing it to Raymond's examples that 
use rpartition, I find that the only benefit in these examples is that the use 
of the optional second argument is far harder to miss than the single 
additional letter in the method name, particularly if partition and rpartition 
are used close together. Interestingly, out of 31 examples in Raymond's patch, 
only 7 used rpartition.

The implementation, however, is significantly less obvious than that for the 
simple version, and likely slower due to the extra conditional, the extra list 
created, and the need to use join.
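
The "likely slower" guess can be checked with timeit. The sketch below times a
split/join emulation of a last-occurrence partition, written the same way as
the at_sep=-1 branch of the implementation, against the built-in
str.rpartition (available in Python 2.5 and later). The helper name
rpartition_via_split and the sample string are hypothetical, and no particular
numbers are implied: results depend on the interpreter and the input.

```python
import timeit


def rpartition_via_split(s, sep):
    # Hypothetical helper: build a list with rsplit, then rejoin the pieces,
    # as the at_sep=-1 branch does.
    parts = s.rsplit(sep, 1)
    if len(parts) <= 1:
        return ('', '', s)
    return (parts[0], sep, sep.join(parts[1:]))


S = 'http://www.python.org/dev/peps/index.html'

# Both approaches must agree before timing them means anything.
assert rpartition_via_split(S, '/') == S.rpartition('/')

split_time = timeit.timeit(lambda: rpartition_via_split(S, '/'), number=100000)
builtin_time = timeit.timeit(lambda: S.rpartition('/'), number=100000)
print('split/join emulation: %.4fs' % split_time)
print('str.rpartition:       %.4fs' % builtin_time)
```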

It also breaks symmetry with index/rindex and split/rsplit.
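
For reference, the naming convention in question looks like this on a small,
arbitrary string, using partition/rpartition as they exist in Python 2.5 and
later: the plain method works from the first separator, and its r-prefixed
twin works from the last one.

```python
s = 'a.b.c'

# Plain methods work from the first '.', their r-prefixed twins from the last.
first = (s.index('.'), s.split('.', 1), s.partition('.'))
last = (s.rindex('.'), s.rsplit('.', 1), s.rpartition('.'))

print(first)  # (1, ['a', 'b.c'], ('a', '.', 'b.c'))
print(last)   # (3, ['a.b', 'c'], ('a.b', '.', 'c'))
```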

Additionally, if splitting on anything other than the first or last occurrence
of the separator were going to be a significant use case for str.partition,
wouldn't the idea have already come up in the context of str.find and str.index?

I actually thought the 'at_sep' argument was a decent idea when I started 
writing this message, but I have found my arguments in favour of it to be 
wholly unconvincing, and the arguments against it perfectly sound ;)

Cheers,
Nick.

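def partition(s, sep, at_sep=1):
    """ Returns a three element tuple, (head, found, tail) where:

        head + found + tail == s
        found == '' or found is sep
        bool(found) == (sep in s)   # when at_sep is 1 or -1

    >>> s = 'http://www.python.org'
    >>> partition(s, '://')
    ('http', '://', 'www.python.org')
    >>> partition(s, '?')
    ('http://www.python.org', '', '')
    >>> partition(s, 'http://')
    ('', 'http://', 'www.python.org')
    >>> partition(s, 'org')
    ('http://www.python.', 'org', '')
    >>> partition(s, '.', at_sep=2)
    ('http://www.python', '.', 'org')
    >>> partition(s, '.', at_sep=-1)
    ('http://www.python', '.', 'org')
    >>> partition(s, '?', at_sep=-1)
    ('', '', 'http://www.python.org')

    """
    if not isinstance(sep, basestring) or not sep:
        raise ValueError('partition argument must be a non-empty string')
    if at_sep == 0:
        # The whole string is the part 'after' the separator
        result = ('', '', s)
    elif at_sep > 0:
        # Split off at most at_sep pieces, counting from the left
        parts = s.split(sep, at_sep)
        if len(parts) <= at_sep:
            # Fewer than at_sep occurrences: the whole string is 'before'
            result = (s, '', '')
        else:
            result = (sep.join(parts[:at_sep]), sep, parts[at_sep])
    else:
        # rsplit needs a positive count (a negative maxsplit means no limit)
        count = -at_sep
        parts = s.rsplit(sep, count)
        if len(parts) <= count:
            # Fewer than -at_sep occurrences: the whole string is 'after'
            result = ('', '', s)
        else:
            result = (parts[0], sep, sep.join(parts[1:]))
    assert len(result) == 3
    assert ''.join(result) == s
    assert result[1] == '' or result[1] is sep
    return result

if __name__ == '__main__':
    import doctest
    print doctest.testmod()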

==================================
**** Standard lib comparisons ****
==================================
=====CGIHTTPServer.py=====
       def run_cgi(self):
           """Execute a CGI script."""
           dir, rest = self.cgi_info
!         rest, _, query = rest.rpartition('?')
!         script, _, rest = rest.partition('/')
           scriptname = dir + '/' + script
           scriptfile = self.translate_path(scriptname)
           if not os.path.exists(scriptfile):

       def run_cgi(self):
           """Execute a CGI script."""
           dir, rest = self.cgi_info
!         rest, _, query = rest.partition('?', at_sep=-1)
!         script, _, rest = rest.partition('/')
           scriptname = dir + '/' + script
           scriptfile = self.translate_path(scriptname)
           if not os.path.exists(scriptfile):
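
To make the CGIHTTPServer change concrete, here is what the two calls do to a
sample `rest` value. The value is made up for illustration; in the real
method it comes from self.cgi_info.

```python
rest = 'script.py/extra/path?x=1&y=2'   # made-up example value

# Strip the query string at the last '?', then split off the script name
# at the first '/'.
rest, _, query = rest.rpartition('?')
script, _, rest = rest.partition('/')

print(script)  # script.py
print(rest)    # extra/path
print(query)   # x=1&y=2
```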

=====cookielib.py=====
           else:
               path_specified = False
               path = request_path(request)
!             head, sep, _ = path.rpartition('/')
!             if sep:
                   if version == 0:
                       # Netscape spec parts company from reality here
!                     path = head
                   else:
!                     path = head + sep
               if len(path) == 0: path = "/"

           else:
               path_specified = False
               path = request_path(request)
!             head, sep, _ = path.partition('/', at_sep=-1)
!             if sep:
                   if version == 0:
                       # Netscape spec parts company from reality here
!                     path = head
                   else:
!                     path = head + sep
               if len(path) == 0: path = "/"

=====httplib.py=====
       def _set_hostport(self, host, port):
           if port is None:
!             host, _, port = host.rpartition(':')
!             if ']' not in port:         # ipv6 addresses have [...]
                   try:
!                     port = int(port)
                   except ValueError:
!                     raise InvalidURL("nonnumeric port: '%s'" % port)
               else:
                   port = self.default_port
               if host and host[0] == '[' and host[-1] == ']':

       def _set_hostport(self, host, port):
           if port is None:
!             host, _, port = host.partition(':', at_sep=-1)
!             if ']' not in port:         # ipv6 addresses have [...]
                   try:
!                     port = int(port)
                   except ValueError:
!                     raise InvalidURL("nonnumeric port: '%s'" % port)
               else:
                   port = self.default_port
               if host and host[0] == '[' and host[-1] == ']':

=====modulefinder.py=====
               assert caller is parent
               self.msgout(4, "determine_parent ->", parent)
               return parent
!         pname, found, _ = pname.rpartition('.')
!         if found:
               parent = self.modules[pname]
               assert parent.__name__ == pname
               self.msgout(4, "determine_parent ->", parent)

               assert caller is parent
               self.msgout(4, "determine_parent ->", parent)
               return parent
!         pname, found, _ = pname.partition('.', at_sep=-1)
!         if found:
               parent = self.modules[pname]
               assert parent.__name__ == pname
               self.msgout(4, "determine_parent ->", parent)

=====pdb.py=====
           filename = None
           lineno = None
           cond = None
!         arg, found, cond = arg.partition(',')
!         if found and arg:
               # parse stuff after comma: "condition"
!             arg = arg.rstrip()
!             cond = cond.lstrip()
           # parse stuff before comma: [filename:]lineno | function
           funcname = None
!         filename, found, arg = arg.rpartition(':')
!         if found:
!             filename = filename.rstrip()
               f = self.lookupmodule(filename)
               if not f:
                   print '*** ', repr(filename),

           filename = None
           lineno = None
           cond = None
!         arg, found, cond = arg.partition(',')
!         if found and arg:
               # parse stuff after comma: "condition"
!             arg = arg.rstrip()
!             cond = cond.lstrip()
           # parse stuff before comma: [filename:]lineno | function
           funcname = None
!         filename, found, arg = arg.partition(':', at_sep=-1)
!         if found:
!             filename = filename.rstrip()
               f = self.lookupmodule(filename)
               if not f:
                   print '*** ', repr(filename),

*****
               return
           if ':' in arg:
               # Make sure it works for "clear C:\foo\bar.py:12"
!             filename, _, arg = arg.rpartition(':')
               try:
                   lineno = int(arg)
               except:

               return
           if ':' in arg:
               # Make sure it works for "clear C:\foo\bar.py:12"
!             filename, _, arg = arg.partition(':', at_sep=-1)
               try:
                   lineno = int(arg)
               except:
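
The comment in the pdb "clear" hunk is the clearest case for splitting at the
last separator: with a Windows path, the first ':' belongs to the drive
letter. The argument string is taken from that comment.

```python
arg = 'C:\\foo\\bar.py:12'   # the "clear C:\foo\bar.py:12" argument

# Splitting at the first ':' would cut the path at the drive letter...
print(arg.partition(':'))   # ('C', ':', '\\foo\\bar.py:12')

# ...so the code splits at the last ':' instead.
filename, _, lineno = arg.rpartition(':')
print(filename)      # C:\foo\bar.py
print(int(lineno))   # 12
```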

=====smtplib.py=====
           """
           if not port and (host.find(':') == host.rfind(':')):
!             host, found, port = host.rpartition(':')
!             if found:
                   try: port = int(port)
                   except ValueError:
                       raise socket.error, "nonnumeric port"

           """
           if not port and (host.find(':') == host.rfind(':')):
!             host, found, port = host.partition(':', at_sep=-1)
!             if found:
                   try: port = int(port)
                   except ValueError:
                       raise socket.error, "nonnumeric port"

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
             http://boredomandlaziness.blogspot.com