Need better string methods

David MacQuigg dmq at gain.com
Tue Mar 9 11:52:15 EST 2004


Here is the simplest design pattern so far (taking the suggestions of
Skip Montanaro, Christian Tismer and Stephen Horne a few steps
further):

line = "..../bgref/stats.stf| SPICE | 3.2.7  | John    Anderson  \n"

import re  # regular-expression module
t = line.lstrip('.').rstrip('\n ')  # strip both ends
t = re.sub(r'\s+', ' ', t)  # squeeze excess white space
clean = re.split(r'\s*\|\s*', t)  # split on '|', absorbing adjacent whitespace

print(clean)
# ['/bgref/stats.stf', 'SPICE', '3.2.7', 'John Anderson']

Notes on this example:
1) Having to import re is a small minus, but the module will probably
be needed elsewhere anyway, it is a long-established part of Python,
and it provides functionality that is well known in the EDA tools
industry.
2) I chose a meaningless temporary variable 't' to chain these
operations together, rather than more colorful names, which I felt
were actually a distraction in this case.  Fancy names suggest "this
is something important you need to remember for later."
3) The lstrip and rstrip operations can be chained on one line with no
loss in clarity, and clarity is much more important than compactness.
My rule is: use whatever space will improve clarity, but don't waste
any.
4) I have used regular expressions in spite of their complexity,
because in this case the complexity is 'encapsulated'.  You don't
need to understand the expression to see that the re.split line splits
around the '|' and takes the adjacent whitespace with it.
5) The use of regular expressions allows a pattern like the above code
to serve as a template for other similar tasks. Some of the earlier
solutions relied too much on the specifics of this example.
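
To make note 5 concrete, the pattern can be wrapped in a small
function.  This is only a sketch: the name split_fields and its
parameters sep and lead are made up here for illustration and are not
part of the re module or of the example above.

```python
import re

def split_fields(line, sep='|', lead='.'):
    """Strip, squeeze, and split one record (hypothetical helper)."""
    t = line.lstrip(lead).rstrip('\n ')   # strip both ends
    t = re.sub(r'\s+', ' ', t)            # squeeze excess white space
    # re.escape lets any literal separator be used safely in the pattern
    pattern = r'\s*' + re.escape(sep) + r'\s*'
    return re.split(pattern, t)           # split on sep, absorbing whitespace

line = "..../bgref/stats.stf| SPICE | 3.2.7  | John    Anderson  \n"
fields = split_fields(line)
print(fields)
```

Only the separator and the leading characters change from task to
task; the strip / squeeze / split sequence stays the same.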

The design pattern above looks to me like a good general pattern for
processing strings, with the limitation that the split operation must
be last.
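
When the split cannot come last, current Python needs a loop or a list
comprehension to apply the remaining operations to each piece.  A
minimal sketch using the same sample line (the variable names
raw_fields and field are mine):

```python
line = "..../bgref/stats.stf| SPICE | 3.2.7  | John    Anderson  \n"

# Split first, then clean each field separately.
# str.split() with no argument splits on runs of whitespace and ignores
# leading and trailing whitespace, so ' '.join(...) strips and squeezes.
raw_fields = line.lstrip('.').split('|')
clean = [' '.join(field.split()) for field in raw_fields]
print(clean)
```

This works without a regex, but the per-field step has to be written
as a comprehension rather than as one more method in the chain.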

I still think Ruby has a slight advantage in string processing.  You
just can't beat:
line.chomp.squeeze.split(/\s*\|\s*/)
for clarity and conciseness.  (Again, assuming we can tolerate not
knowing the details of the regex.)

I would like to see Python go one better than Ruby by adding some
methods to simplify working with lists of strings.  The general
pattern is:

string_or_list.op1.op2.split.op3.op4.join.op5.op6

All methods work on either strings or lists of strings.  The split
method makes subsequent operations apply to each split separately.
The join method converts a list of strings back into a simple string.

Having this much flexibility means you don't have to be so clever in
sequencing the operations so the split is last.  In the above example,
we might do something like:

line.lstrip('.').rstrip('\n').squeeze().split('|').strip()

The last strip operation applies to each substring, so we no longer
need a regex in the split.
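
Nothing like this exists in Python's str or list types, but the idea
can be approximated with a small wrapper class.  The sketch below is
my assumption about how such an interface might behave: the class name
Strs, its squeeze method, and the rule that every operation maps over
the current pieces are all invented here for illustration.

```python
import re

class Strs(object):
    """Hypothetical wrapper: string methods apply to every piece it holds."""

    def __init__(self, value):
        # Accept either a single string or a list of strings.
        self.items = [value] if isinstance(value, str) else list(value)

    def _map(self, func):
        # Apply func to each piece and wrap the results again.
        return Strs([func(s) for s in self.items])

    def lstrip(self, chars=None):
        return self._map(lambda s: s.lstrip(chars))

    def rstrip(self, chars=None):
        return self._map(lambda s: s.rstrip(chars))

    def strip(self, chars=None):
        return self._map(lambda s: s.strip(chars))

    def squeeze(self):
        # Collapse each run of spaces into a single space.
        return self._map(lambda s: re.sub(' +', ' ', s))

    def split(self, sep):
        # Split every piece; later operations see the flattened pieces.
        return Strs([part for s in self.items for part in s.split(sep)])

    def join(self, sep):
        # Convert the pieces back into a single string.
        return Strs(sep.join(self.items))

line = "..../bgref/stats.stf| SPICE | 3.2.7  | John    Anderson  \n"
clean = Strs(line).lstrip('.').rstrip('\n').squeeze().split('|').strip().items
print(clean)
```

With this wrapper the split no longer has to come last, and a join can
bring the pieces back together for further whole-string operations.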

-- Dave



