Need better string methods

Skip Montanaro skip at pobox.com
Sun Mar 7 09:29:21 EST 2004


    David> I am convinced that Python can do anything that can be done by
    David> these CPL's, but I know it will be an uphill battle getting
    David> design engineers to learn yet another scripting language....

    David> The resistance will come from people who throw at us little bits
    David> and pieces of code that can be done more easily in their chosen
    David> CPL.

Then throw little bits and pieces of code back at them that can be done more
easily in Python. <0.5 wink>

    David> String processing, for example, is one area where we may face
    David> some difficulty.

    ...

    David> # Ruby:
    David> # clean = line.chomp.strip('.').squeeze.split(/\s*\|\s*/)

    David> This is pretty straight-forward once you know what each of the
    David> methods do.

    David> # Current best Python:
    David> clean = [' '.join(t.split()).strip('.') for t in line.split('|')]

    David> This is too much to expect of a non-programmer, even one who
    David> understands the methods.

    ...

My arguments from the "Zen of Python" would be:

    Beautiful is better than ugly.
    Simple is better than complex.
    Sparse is better than dense.
    Readability counts.

These aphorisms are especially important for non-programmers.  Six months
from now they simply won't remember what the above Ruby or Python code does
without at least a little study, especially if it's buried in other similar
code.  That study distracts them, however momentarily, from the actual task
at hand, which breaks their concentration and lowers their productivity.

To that end, my proposed solution for your string smashing problem would be
something like:

    import csv

    # newline="" lets the csv module handle line endings itself
    with open("gradoo.csv", newline="") as f:
        for row in csv.reader(f, delimiter='|'):
            print(row)
            # elide spaces
            row = [" ".join(s.split()) for s in row]
            print(row)
            # trim leading ...
            row = [s.lstrip(".") for s in row]
            print(row)

given that gradoo.csv contains the line from your example.  The advantages
that I see are:

    * it's got some simple comments which identify the work being done

    * it's easier to add new operations if needed in the future

    * avoiding long chains of string methods makes the code easier to read
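
To make the second point concrete, here is a minimal sketch of the same
cleanup with each operation pulled out into a small named function.  The
names squeeze_spaces, trim_leading_dots, STEPS and clean_line are mine, and
the sample line is invented for illustration; it is not the line from your
example.  The sketch also splits on the delimiter directly rather than going
through csv.reader, so it ignores any quoting rules.

```python
def squeeze_spaces(field):
    """Collapse runs of whitespace to single spaces and trim both ends."""
    return " ".join(field.split())


def trim_leading_dots(field):
    """Remove any leading '.' characters."""
    return field.lstrip(".")


# Operations are applied to every field, in order.  Adding a new
# operation later means adding one more named function to this list.
STEPS = [squeeze_spaces, trim_leading_dots]


def clean_line(line, delimiter="|", steps=STEPS):
    """Split a line on the delimiter and run each step over the fields."""
    fields = line.rstrip("\n").split(delimiter)
    for step in steps:
        fields = [step(f) for f in fields]
    return fields


# Hypothetical input, made up for this sketch.
sample = "...this  is   | a    test |  of  string   cleanup\n"
print(clean_line(sample))  # prints ['this is', 'a test', 'of string cleanup']
```

Each step can be read, tested and commented on its own, which is most of
what a non-programmer needs when they come back to the code later.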

Skip



