split encloser
Robin Munn
rmunn at pobox.com
Fri Apr 4 12:29:57 EST 2003
Jason Tiller <jtiller at sjm.com> wrote:
> Hi, "Aussie", :)
>
> On 3 Apr 2003 aussie2010 at yahoo.com wrote:
>
>> string.split() takes a delimiter and works fine as long as the
>> delimiter isn't part of the data fields. But frequently they are.
>> e.g. 'John Doe,135 South Main St.,#122, Springfield, Iowa' or
>> ' so long goodbye see ya'
>
>> Because the fields can contain the delimiter in some cases, an
>> encloser is usually used (typically "") to handle those fields.
>
>> The above strings would be written:
>>
>> 'John Doe,"135 South Main St., #122", Springfield, Iowa'
>> and
>> '"so long" goodbye "see ya"'
>
>> I don't understand regular expressions but I was wondering if anyone
>> that did knew of a way to get re.split() to handle "enclosers" as
>> used above.
>
> Hmm. I am not yet a knowledgeable user of Python's regex features, but
> I *do* know Perl's pretty well. With Perl, you might split up your
> fields like this:
[snip regex-based solution]
My rule of thumb regarding regular expressions is that if you can't grok
the *entire* regex in fifteen seconds or less, you're almost certainly
better off with some other solution. Split the regex, use string
methods, anything -- just don't use any regex so complicated that it
takes you more than fifteen seconds to understand it. I'm sure Jason
Tiller's solution is a clever one, but I took one look at that regex and
said, "Too hard -- there's got to be a better way."
In this case, I believe that better way is to use the csv module. It
will be in the standard library in Python 2.3; meanwhile, you can
download and use the csv module written by Object Craft:
http://www.object-craft.com.au/projects/csv/
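
For the two example strings from the original post, a minimal sketch might look like the following. It assumes the csv.reader interface of the standard library module; the Object Craft module's interface may differ. The helper name split_enclosed is made up for illustration.

```python
import csv

def split_enclosed(line, delimiter=","):
    """Split one line on `delimiter`, treating "..." as an encloser.

    Hypothetical helper for illustration; not part of the csv module.
    """
    # csv.reader accepts any iterable of lines, so wrap the single
    # line in a list.  skipinitialspace=True drops the blank right
    # after a delimiter, so a field written as ', "quoted"' is still
    # recognised as enclosed.
    reader = csv.reader([line], delimiter=delimiter, quotechar='"',
                        skipinitialspace=True)
    return next(reader)

print(split_enclosed('John Doe,"135 South Main St., #122", Springfield, Iowa'))
print(split_enclosed('"so long" goodbye "see ya"', delimiter=" "))
```

The first call should keep "135 South Main St., #122" together as one field, and the second should treat "so long" and "see ya" as single fields even though the delimiter is a space. No regex required.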
As a parting shot, here's a quote from Jamie Zawinski:
    Some people, when confronted with a problem, think "I know,
    I'll use regular expressions." Now they have two problems.
--
Robin Munn <rmunn at pobox.com>
http://www.rmunn.com/
PGP key ID: 0x6AFB6838 50FF 2478 CFFB 081A 8338 54F7 845D ACFD 6AFB 6838