split encloser

Robin Munn rmunn at pobox.com
Fri Apr 4 12:29:57 EST 2003


Jason Tiller <jtiller at sjm.com> wrote:
> Hi, "Aussie", :)
> 
> On 3 Apr 2003 aussie2010 at yahoo.com wrote:
> 
>> string.split() takes a delimiter and works fine as long as the
>> delimiter isn't part of the data fields. But frequently it is.
>> e.g. 'John Doe,135 South Main St.,#122, Springfield, Iowa' or
>>       ' so long goodbye see ya'
> 
>> Because the fields can contain the delimiter in some cases, an
>> encloser is usually used (typically "") to handle those fields.
> 
>> The above strings would be written:
>>
>> 'John Doe,"135 South Main St., #122", Springfield, Iowa'
>>    and
>> '"so long" goodbye "see ya"'
> 
>> I don't understand regular expressions but I was wondering if anyone
>> that did knew of a way to get re.split() to handle "enclosers" as
>> used above.
> 
> Hmm.  I am not yet a knowledgeable user of Python's regex features, but
> I *do* know Perl's pretty well.  With Perl, you might split up your
> fields like this:

[snip regex-based solution]

My rule of thumb regarding regular expressions is that if you can't grok
the *entire* regex in fifteen seconds or less, you're almost certainly
better off with some other solution. Split the regex, use string
methods, anything -- just don't use any regex so complicated that it
takes you more than fifteen seconds to understand it. I'm sure Jason
Tiller's solution is a clever one, but I took one look at that regex and
said, "Too hard -- there's got to be a better way."
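To illustrate the "some other solution" route, here is a minimal sketch of a plain character loop with no regex at all. It is written in modern Python 3 rather than the 2.x of this thread, and `split_enclosed` is a hypothetical name. It assumes the encloser is a plain double quote that is never escaped or doubled inside a field, and it keeps any spaces around the fields:

```python
def split_enclosed(line, delimiter=",", encloser='"'):
    """Split line on delimiter, except where the delimiter sits inside
    a pair of enclosers. Hypothetical helper: no support for escaped or
    doubled enclosers, and no whitespace trimming."""
    fields = []
    current = []
    inside = False  # True while we are between an opening and closing encloser
    for ch in line:
        if ch == encloser:
            inside = not inside  # toggle state; the encloser itself is dropped
        elif ch == delimiter and not inside:
            fields.append("".join(current))  # delimiter outside quotes ends a field
            current = []
        else:
            current.append(ch)
    fields.append("".join(current))  # the last field has no trailing delimiter
    return fields
```

Called as `split_enclosed('"so long" goodbye "see ya"', delimiter=" ")`, it returns `['so long', 'goodbye', 'see ya']`. The whole thing can be read in well under fifteen seconds, which is the point.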

In this case, I believe that better way is to use the csv module. It
will be in the standard library in Python 2.3; meanwhile, you can
download and use the csv module written by Object Craft:

    http://www.object-craft.com.au/projects/csv/
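A rough sketch of the csv approach, using the csv module as it ships in the Python 3 standard library (the Object Craft module's interface may differ in details). `split_enclosed_csv` is a hypothetical wrapper name; `csv.reader` accepts any iterable of strings, so a one-element list is enough for a single line:

```python
import csv

def split_enclosed_csv(line, delimiter=",", skipinitialspace=False):
    # Hypothetical wrapper: parse one line with csv.reader, treating
    # double quotes as the encloser. skipinitialspace=True drops the
    # blanks that follow a delimiter (" Springfield" -> "Springfield").
    reader = csv.reader([line], delimiter=delimiter, quotechar='"',
                        skipinitialspace=skipinitialspace)
    return next(reader)  # the single parsed row, as a list of strings
```

With `delimiter=","` and `skipinitialspace=True`, the address line splits into four fields with the quoted street address kept whole; with `delimiter=" "`, the second example yields `['so long', 'goodbye', 'see ya']`. Unlike a hand-rolled loop, csv also handles a doubled quote (`""`) inside a quoted field as a literal quote character.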

As a parting shot, here's a quote from Jamie Zawinski:

    Some people, when confronted with a problem, think "I know,
    I'll use regular expressions." Now they have two problems.

-- 
Robin Munn <rmunn at pobox.com>
http://www.rmunn.com/
PGP key ID: 0x6AFB6838    50FF 2478 CFFB 081A 8338  54F7 845D ACFD 6AFB 6838
