String splitting with exceptions
Neil Cerutti
neilc at norwich.edu
Wed Aug 28 14:08:11 EDT 2013
On 2013-08-28, John Levine <johnl at iecc.com> wrote:
> I have a crufty old DNS provisioning system that I'm rewriting and I
> hope improving in python. (It's based on tinydns if you know what
> that is.)
>
> The record formats are, in the worst case, like this:
>
> foo.[DOM]::[IP6::4361:6368:6574]:600::
>
> What I would like to do is to split this string into a list like this:
>
> [ 'foo.[DOM]','','[IP6::4361:6368:6574]','600','' ]
>
> Colons are separators except when they're inside square
> brackets. I have been messing around with re.split() and
> re.findall() and haven't been able to come up with either a
> working separator pattern for split() or a working field
> pattern for findall(). I came pretty close with findall() but
> can't get it to reliably match the nothing between two adjacent
> colons not inside brackets.
>
> Any suggestions? I realize I could do it in a loop where I pick
> stuff off the front of the string, but yuck.
A little parser, as Skip suggested, is a good way to go.
The brackets make your string context-sensitive, and that is
hard to parse cleanly with a regex.
I initially hoped a csv module dialect could work, but the quote
character is (currently) hard-coded to be a single, simple
character, i.e., I can't tell it to treat [xxx] as "xxx".
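To show what goes wrong, here is a small sketch of my own (not something
from the original poster) that feeds the sample record to csv.reader
with ':' as the delimiter. Since csv has no notion of a bracket pair as
a quote, the colons inside [IP6::...] are treated as separators too.
The names record and fields are just illustrative.

```python
import csv

# Sample record from the original question.
record = 'foo.[DOM]::[IP6::4361:6368:6574]:600::'

# csv.reader takes any iterable of lines; pass a one-element list.
# The default quotechar is '"', which never appears here, so the
# brackets get no special treatment at all.
fields = next(csv.reader([record], delimiter=':'))

print(fields)
# ['foo.[DOM]', '', '[IP6', '', '4361', '6368', '6574]', '600', '', '']
```

The bracketed IPv6 field gets chopped into pieces, which is exactly
the behavior we need to avoid.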
So, back to Skip's suggestion: a little parser. It might seem
crass or something, but it really is easier than muscling a
regex into a context-sensitive grammar.
def dns_split(s):
    in_brackets = False
    b = 0  # index of beginning of current string
    for i, c in enumerate(s):
        if not in_brackets:
            if c == "[":
                in_brackets = True
            elif c == ':':
                # A field is emitted only when its closing colon is seen.
                yield s[b:i]
                b = i+1
        elif c == "]":
            in_brackets = False
>>> s = 'foo.[DOM]::[IP6::4361:6368:6574]:600::'
>>> print(list(dns_split(s)))
['foo.[DOM]', '', '[IP6::4361:6368:6574]', '600', '']
It'll gag on nested brackets (fixable with a counter) and has no
error handling (requires thought), but it's a start.
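For what it's worth, here is a rough sketch of the counter idea with
some minimal error handling. The name dns_split_nested and the choice
to raise ValueError on unbalanced brackets are my assumptions, not
anything tinydns specifies. It also yields any text left after the last
colon as a final field, which the version above silently drops; that is
another assumption about what you'd want.

```python
def dns_split_nested(s):
    """Split s on colons that are outside (possibly nested) brackets."""
    depth = 0  # current bracket nesting level
    b = 0      # index where the current field begins
    for i, c in enumerate(s):
        if c == "[":
            depth += 1
        elif c == "]":
            if depth == 0:
                # Assumed policy: a stray ']' is an error.
                raise ValueError("unmatched ']' at index %d" % i)
            depth -= 1
        elif c == ":" and depth == 0:
            yield s[b:i]
            b = i + 1
    if depth:
        # Assumed policy: an unclosed '[' is an error.
        raise ValueError("unclosed '[' in %r" % s)
    if b < len(s):
        # Assumption: text after the last colon counts as a field.
        yield s[b:]


print(list(dns_split_nested('foo.[DOM]::[IP6::4361:6368:6574]:600::')))
print(list(dns_split_nested('a[b[c:d]:e]:f:')))
```

Because it is a generator, the errors only surface once you iterate
over it (for example with list()), not at the call itself.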
--
Neil Cerutti