String splitting with exceptions

Neil Cerutti neilc at norwich.edu
Wed Aug 28 14:08:11 EDT 2013


On 2013-08-28, John Levine <johnl at iecc.com> wrote:
> I have a crufty old DNS provisioning system that I'm rewriting and I
> hope improving in python.  (It's based on tinydns if you know what
> that is.)
>
> The record formats are, in the worst case, like this:
>
> foo.[DOM]::[IP6::4361:6368:6574]:600::
>
> What I would like to do is to split this string into a list like this:
>
> [ 'foo.[DOM]','','[IP6::4361:6368:6574]','600','' ]
>
> Colons are separators except when they're inside square
> brackets.  I have been messing around with re.split() and
> re.findall() and haven't been able to come up with either a
> working separator pattern for split() or a working field
> pattern for findall().  I came pretty close with findall() but
> can't get it to reliably match the nothing between two adjacent
> colons not inside brackets.
>
> Any suggestions? I realize I could do it in a loop where I pick
> stuff off the front of the string, but yuck.

A little parser, as Skip suggested, is a good way to go.

The brackets make the meaning of a colon depend on context
(inside or outside brackets), which is difficult to express
cleanly in a single regex.

I initially hoped a csv module dialect could work, but the quote
character is (currently) hard-coded to be a single, simple
character, i.e., I can't tell it to treat [xxx] as "xxx".
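
To show what I mean, here is a quick check with the standard-library
csv module. It is only an illustration using John's sample string;
the variable names are mine, and I have only confirmed the TypeError
behaviour for the csv module as I know it.

```python
import csv

s = 'foo.[DOM]::[IP6::4361:6368:6574]:600::'

# With ':' as the delimiter, csv knows nothing about the brackets,
# so the bracketed field is split at its inner colons as well.
row = next(csv.reader([s], delimiter=':'))
print(row)

# A two-character quotechar such as '[]' should be rejected outright.
try:
    csv.reader([s], delimiter=':', quotechar='[]')
except TypeError as e:
    print('TypeError:', e)
```

The first print shows the bracketed IPv6 field chopped into pieces,
which is exactly what we are trying to avoid.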

So, back to Skip's suggestion: a little parser. It might seem
crass or something, but it really is easier than muscling a
regex into a context-sensitive grammar.

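def dns_split(s):
    """Yield the fields of s; a ':' ends a field unless inside [...]."""
    in_brackets = False
    b = 0  # index of beginning of current field
    for i, c in enumerate(s):
        if not in_brackets:
            if c == "[":
                in_brackets = True
            elif c == ':':
                yield s[b:i]
                b = i+1
        elif c == "]":
            in_brackets = False
    # Note: text after the last unbracketed ':' is never yielded.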

>>> s = 'foo.[DOM]::[IP6::4361:6368:6574]:600::'
>>> print(list(dns_split(s)))
['foo.[DOM]', '', '[IP6::4361:6368:6574]', '600', '']

It'll gag on nested brackets (fixable with a counter) and has no
error handling (requires thought), but it's a start.
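
For what it's worth, here is a rough sketch of the counter idea. The
name dns_split_nested and the decision to raise ValueError on
unbalanced brackets are my own assumptions, not anything tinydns
specifies. It keeps the same convention as above: each colon outside
brackets ends a field, and anything after the last one is dropped.

```python
def dns_split_nested(s):
    """Split s on colons that are not inside (possibly nested) brackets.

    Hypothetical variant of dns_split: every ':' at depth 0 ends a
    field, and text after the last such colon is not yielded.
    """
    depth = 0  # number of '[' currently open
    b = 0      # index of beginning of current field
    for i, c in enumerate(s):
        if c == "[":
            depth += 1
        elif c == "]":
            if depth == 0:
                raise ValueError("unmatched ']' at index %d" % i)
            depth -= 1
        elif c == ":" and depth == 0:
            yield s[b:i]
            b = i + 1
    if depth:
        raise ValueError("%d unclosed '[' in %r" % (depth, s))


s = 'foo.[DOM]::[IP6::4361:6368:6574]:600::'
print(list(dns_split_nested(s)))
```

Since it is a generator, the ValueError only shows up once you
iterate over it, e.g. when calling list() on the result.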

-- 
Neil Cerutti


