Need better string methods

Stephen Horne steve at ninereeds.fsnet.co.uk
Sun Mar 7 13:10:53 EST 2004


On Sat, 06 Mar 2004 12:01:16 -0700, David MacQuigg <dmq at gain.com>
wrote:

># Ruby:
># clean = line.chomp.strip('.').squeeze.split(/\s*\|\s*/)
>
>This is pretty straight-forward once you know what each of the methods
>do.
>
># Current best Python:
>clean = [' '.join(t.split()).strip('.') for t in line.split('|')]

So what you are saying is that non-programmers just naturally
understand what "/\s*\|\s*/" means!

I kind of agree with you about the join method - I far prefer the now
deprecated string.join function. But it's not much of a problem - you
don't _have_ to use method-call syntax in Python, just get the unbound
method from the class and call it with the object as the first
parameter...

>>> str.join (' ', ['a', 'b', 'c'])
'a b c'
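
The same trick works for any string method, not just join. A small
sketch (the sample strings here are made up for illustration) showing
that the class-level call matches the method call, and that it is handy
when you want to pass a method to map:

```python
parts = ['a', 'b', 'c']

# Calling through the class gives the same result as the method call.
joined_via_class = str.join(' ', parts)
joined_via_method = ' '.join(parts)

# The class-level form can be passed around like any other function.
raw_fields = [' SPICE ', ' 3.2.7  ']
stripped = list(map(str.strip, raw_fields))

print(joined_via_class, stripped)
```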

I guess I see the advantage in the Ruby form. It can of course be
replicated in Python using a library, but being able to handle the
task as neatly by default would be a plus.

So, how about this...

>>> line.lstrip ('.'); re.sub (' +', ' ', _).strip (); re.split (' ?\| ?', _)
'/bgref/stats.stf| SPICE | 3.2.7  | John    Anderson  \n'
'/bgref/stats.stf| SPICE | 3.2.7 | John Anderson'
['/bgref/stats.stf', 'SPICE', '3.2.7', 'John Anderson']



Using ';' and '_', you can chain any functions or methods you want.
The downsides are (1) it only works at the interactive prompt, where
'_' holds the last displayed result, and (2) you get intermediate
results displayed.

A temporary variable can handle both issues, of course...

>>> t=line.lstrip('.'); t=re.sub(' +', ' ', t).strip(); re.split(' ?\| ?', t)
['/bgref/stats.stf', 'SPICE', '3.2.7', 'John Anderson']


or, to save some hassle...

>>> def squeeze (p) :
...   return re.sub (' +', ' ', p)
...
>>> t=line.lstrip('.'); t=squeeze(t).strip(); re.split(' ?\| ?', t)
['/bgref/stats.stf', 'SPICE', '3.2.7', 'John Anderson']


On this basis, perhaps it would be useful to support the '_' variable
outside of the interactive prompt, and maybe to suppress all but the
last result when ';' is used at the prompt.
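
In a script, much the same left-to-right chaining can be had today with
a small helper. This is only a rough sketch: the name 'pipe' is my own
invention, not something from the standard library, and each step is
simply a function taking the previous result.

```python
import re
from functools import reduce

def pipe(value, *funcs):
    # Hypothetical helper: feed value through each function in turn,
    # left to right, much like '_' does at the prompt.
    return reduce(lambda acc, f: f(acc), funcs, value)

line = "..../bgref/stats.stf| SPICE | 3.2.7  | John    Anderson  \n"

fields = pipe(
    line,
    lambda s: s.lstrip("."),                 # drop the leading dots
    lambda s: re.sub(" +", " ", s).strip(),  # squeeze spaces, trim ends
    lambda s: re.split(r" ?\| ?", s),        # split on '|' plus padding
)
print(fields)
```

Only the final result is kept, so there are no intermediate values to
display or to name.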

OTOH, as you suggest, maybe we could use some extra string methods.
With an equivalent to the Ruby 'squeeze' and support for regular
expression methods, we could write...

line.strip().lstrip('.').squeeze().resplit(' ?\| ?')

Which is very much like the Ruby example.
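
To get a feel for how that would read, here is a sketch using a str
subclass. Everything in it is an assumption for illustration: 'Str',
'squeeze' and 'resplit' are made-up names, not real str methods, and
this squeeze only collapses runs of spaces.

```python
import re

class Str(str):
    # Hypothetical subclass; strip/lstrip are overridden only so the
    # chain keeps returning Str instead of plain str.

    def strip(self, *args):
        return Str(str.strip(self, *args))

    def lstrip(self, *args):
        return Str(str.lstrip(self, *args))

    def squeeze(self):
        # Collapse each run of spaces to a single space.
        return Str(re.sub(" +", " ", self))

    def resplit(self, pattern):
        # Split on a regular expression instead of a fixed separator.
        return re.split(pattern, self)

line = Str("..../bgref/stats.stf| SPICE | 3.2.7  | John    Anderson  \n")
fields = line.strip().lstrip(".").squeeze().resplit(r" ?\| ?")
print(fields)
```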

Finally, it seems to me that this kind of tidy-and-split is probably a
common requirement. The split is easy enough, but after pondering
Robert Brewer's argument I wondered if maybe a specialised tidying
class could do the job...

import re

class cleaner :
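  def __init__ (self) :
    # Per-instance list; a class-level list would be shared by every cleaner.
    self.steps = []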

  def lstrip (self, *args) :
    self.steps.append (lambda s : s.lstrip (*args))
    return self

  def rstrip (self, *args) :
    self.steps.append (lambda s : s.rstrip (*args))
    return self

  def strip (self, *args) :
    self.steps.append (lambda s : s.strip (*args))
    return self

  def squeeze (self) :
    pat = re.compile (' +')
    self.steps.append (lambda s : pat.sub (' ', s))
    return self

  def resub (self, regex, rep) :
    pat=re.compile (regex)
    self.steps.append (lambda s : pat.sub (rep, s))
    return self

  def clean (self, p) :
    for i in self.steps :
      p = i (p)
    return p

line = "..../bgref/stats.stf| SPICE | 3.2.7  | John    Anderson  \n"

mycleaner = cleaner().lstrip(".").strip()            \
                     .squeeze().resub(r' ?\| ?','|')

print(mycleaner.clean(line).split("|"))


-- 
Steve Horne

steve at ninereeds dot fsnet dot co dot uk


