String splitting with exceptions

random832 at fastmail.us
Wed Aug 28 13:14:03 EDT 2013


On Wed, Aug 28, 2013, at 12:44, John Levine wrote:
> I have a crufty old DNS provisioning system that I'm rewriting and I
> hope improving in python.  (It's based on tinydns if you know what
> that is.)
> 
> The record formats are, in the worst case, like this:
> 
> foo.[DOM]::[IP6::4361:6368:6574]:600::
> 
> What I would like to do is to split this string into a list like this:
> 
> [ 'foo.[DOM]','','[IP6::4361:6368:6574]','600','' ]
> 
> Colons are separators except when they're inside square brackets.  I
> have been messing around with re.split() and re.findall() and haven't
> been able to come up with either a working separator pattern for
> split() or a working field pattern for findall().  I came pretty
> close with findall() but can't get it to reliably match the
> nothing between two adjacent colons not inside brackets.
> 
> Any suggestions? I realize I could do it in a loop where I pick stuff
> off the front of the string, but yuck.
> 
> This is in python 2.7.5.

Can you have brackets within brackets? If so, a regex can't handle it in
general: Python's re module has no recursive patterns, so it can't match
brackets nested to arbitrary depth.

Otherwise:
>>> import re
>>> s = 'foo.[DOM]::[IP6::4361:6368:6574]:600::'
>>> re.findall(r'((?:[^[:]|\[[^]]*\])*):?',s)
['foo.[DOM]', '', '[IP6::4361:6368:6574]', '600', '', '']

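I'm not sure why _your_ list has only one empty string at the end: the
record ends in two colons, which as plain separators leave two empty
fields. Is the record always terminated by a colon that is not meant to
imply an empty field after it? If so, make the colon mandatory by
removing the question mark: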

>>> re.findall(r'((?:[^[:]|\[[^]]*\])*):',s)
['foo.[DOM]', '', '[IP6::4361:6368:6574]', '600', '']
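
Since the question also asked for a separator pattern for split(), here
is one possible sketch. It is not from the reply above, and it assumes
brackets are always balanced and never nested. The names SEPARATOR and
split_fields are made up for illustration.

```python
import re

# A colon counts as a separator unless, scanning to its right, a ']' turns
# up before any '[' does, which would mean the colon sits inside a
# bracketed group. Assumes balanced, non-nested brackets.
SEPARATOR = re.compile(r':(?![^\[\]]*\])')

def split_fields(record):
    """Split record on colons that are outside square brackets."""
    return SEPARATOR.split(record)

s = 'foo.[DOM]::[IP6::4361:6368:6574]:600::'
print(split_fields(s))
# ['foo.[DOM]', '', '[IP6::4361:6368:6574]', '600', '', '']
```

This treats every colon strictly as a separator, so the sample record
ends with two empty strings. If records are always colon-terminated,
split_fields(s)[:-1] gives the same list as the second findall() above.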

I've done this kind of thing before for email addresses (for validation,
not capturing), since some obscure bits of email address syntax need it,
so it came to mind immediately.
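
On the nested-brackets case raised at the top: if brackets can nest, one
fallback is the plain loop the original question hoped to avoid, tracking
bracket depth by hand. A minimal sketch follows, assuming a stray ']'
with no matching '[' should just be kept as ordinary text. The name
split_outside_brackets is made up for illustration.

```python
def split_outside_brackets(record, sep=':'):
    """Split record on sep, ignoring any sep nested inside [ ... ]."""
    fields = []
    current = []   # characters of the field being built
    depth = 0      # how many '[' are currently open
    for ch in record:
        if ch == '[':
            depth += 1
        elif ch == ']' and depth > 0:
            depth -= 1
        if ch == sep and depth == 0:
            # Separator at top level: close off the current field.
            fields.append(''.join(current))
            current = []
        else:
            current.append(ch)
    fields.append(''.join(current))  # whatever follows the last separator
    return fields

print(split_outside_brackets('a:[b:[c:d]:e]:f'))
# ['a', '[b:[c:d]:e]', 'f']
```

On the un-nested sample record this returns the same six-element list as
the first findall() above.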


