Need better string methods

William Park opengeometry at yahoo.ca
Sat Mar 6 15:41:56 EST 2004


David MacQuigg <dmq at gain.com> wrote:
> The resistance will come from people who throw at us little bits and
> pieces of code that can be done more easily in their chosen CPL.
> String processing, for example, is one area where we may face some
> difficulty.  Here is a typical line of garbage from a statefile
> revision control system (simplified to eliminate some items that pose
> no new challenges):
> 
> line = "..../bgref/stats.stf| SPICE | 3.2.7  | John    Anderson  \n"
> 
> The problem is to break this into its component parts, and eliminate
> spaces and other gradoo.  The cleaned-up list should look like:
> 
> ['/bgref/stats.stf', 'SPICE', '3.2.7', 'John Anderson']
> 
> # Ruby:
> # clean = line.chomp.strip('.').squeeze.split(/\s*\|\s*/)
> 
> This is pretty straightforward once you know what each of the methods
> does.
> 
> # Current best Python:
> clean = [' '.join(t.split()).strip('.') for t in line.split('|')]

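Both Bash and Python can split on a regular expression.  For this job,
though, plain shell tools are not a bad alternative.  Here tr squeezes
runs of blanks, sed removes the blanks around each '|' and trims the
leading dots and stray spaces at the ends of the line, and read (with
Bash's -a option) splits the result into an array:
    tr -s ' \t' ' ' | sed -e 's/ *| */|/g' -e 's/^[ .]*//' -e 's/ *$//' |
    while IFS='|' read -r -a clean; do
	...
    done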
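
For comparison, here is a rough sketch of the regular-expression split
on the Python side.  The function name clean_fields is only an
illustrative choice, not something from the original thread, and the
sketch strips dots from the ends of the whole line rather than from each
field as the quoted list comprehension does.

```python
import re

# Sample record quoted earlier in the thread.
line = "..../bgref/stats.stf| SPICE | 3.2.7  | John    Anderson  \n"

def clean_fields(line):
    """Split a '|'-delimited record and tidy each field (hypothetical helper)."""
    # Drop surrounding whitespace first, then any dots at either end.
    trimmed = line.strip().strip('.')
    # Split on '|' together with any whitespace around it.
    fields = re.split(r'\s*\|\s*', trimmed)
    # Collapse runs of whitespace inside each field to a single space.
    return [' '.join(f.split()) for f in fields]

print(clean_fields(line))
# ['/bgref/stats.stf', 'SPICE', '3.2.7', 'John Anderson']
```

Whether this reads better than the one-line list comprehension is a
matter of taste; the regex mainly helps when the delimiter itself is
irregular.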

-- 
William Park, Open Geometry Consulting, <opengeometry at yahoo.ca>
Linux solution for data processing and document management.


