regexp for sequence of quoted strings
Alexander Schmolck
a.schmolck at gmx.net
Wed May 25 16:55:28 EDT 2005
gry at ll.mit.edu writes:
> I have a string like:
> {'the','dog\'s','bite'}
> or maybe:
> {'the'}
> or sometimes:
> {}
>
> [FYI: this is postgresql database "array" field output format]
>
> which I'm trying to parse with the re module.
> A single quoted string would, I think, be:
> r"\{'([^']|\\')*'\}"
What about {'dog \\', ...}? There the closing quote follows an escaped
backslash, so the quote itself is not escaped.
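One way to see the issue is the small check below. It is only a sketch: the pattern is the one quoted above, while the names op_rex and bogus and the test input are my own. Because [^'] also matches a backslash, the pattern reads the \' at the end of \\' as an escaped quote and accepts input where the string has really ended already.

```python
import re

# The single-string pattern from the quoted message.
op_rex = re.compile(r"\{'([^']|\\')*'\}")

# Not a well-formed value: the string 'a\\' ends after the escaped
# backslash, so the b'} that follows is stray text.
bogus = r"{'a\\'b'}"

# The pattern still accepts it, because it treats the quote after the
# two backslashes as an escaped quote inside the string.
m = op_rex.match(bogus)
print(m is not None)
```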
If you don't need to validate anything, you can just forget about the commas
etc. and extract all the 'strings' with findall.
The regexp below is a bit too complicated (adapted from something else), but I
think it will work:
In [90]:rex = re.compile(r"'(?:[^\n]|(?<!\\)(?:\\)(?:\\\\)*\n)*?(?<!\\)(?:\\\\)*?'")
In [91]:rex.findall(r"{'the','dog\'s','bite'}")
Out[91]:["'the'", "'dog\\'s'", "'bite'"]
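For comparison, here is the same findall approach as a plain script, using a shorter pattern. This is a sketch and not the regexp from the session above: the pattern and the names QUOTED and quoted_strings are my own, and it assumes that a backslash always escapes exactly the one character that follows it.

```python
import re

# Assumed pattern: an opening quote, then any run of either ordinary
# characters (anything but a quote or backslash) or a backslash followed
# by one character, then the closing quote.
QUOTED = re.compile(r"'(?:[^'\\]|\\.)*'", re.DOTALL)

def quoted_strings(text):
    """Return every single-quoted string found in text, quotes included."""
    return QUOTED.findall(text)

print(quoted_strings(r"{'the','dog\'s','bite'}"))
print(quoted_strings(r"{'dog \\','cat'}"))
print(quoted_strings("{}"))
```

Since a backslash is always consumed together with the character after it, a string ending in \\ is closed by the next quote, while \' stays inside the string.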
Otherwise, just add something like ",|}$" after the string pattern, so that the
last string can be followed by the final } instead of a comma.
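If validation does matter, one possible reading of that suggestion is to check the whole value first and only then pull out the strings. The sketch below is an assumption about how that could look, not a tested recipe from this thread: ITEM, ARRAY and parse_array are names I made up, the item pattern is a simplified one, and re.fullmatch needs Python 3.4 or later.

```python
import re

# Assumed pattern for one quoted string with backslash escapes.
ITEM = r"'(?:[^'\\]|\\.)*'"

# Whole value: braces around zero or more comma-separated items.
ARRAY = re.compile(r"\{(?:%s(?:,%s)*)?\}" % (ITEM, ITEM), re.DOTALL)

def parse_array(text):
    """Validate text as {'a','b',...} and return the quoted strings."""
    if ARRAY.fullmatch(text) is None:
        raise ValueError("not a well-formed array: %r" % (text,))
    return re.findall(ITEM, text, re.DOTALL)

print(parse_array(r"{'the','dog\'s','bite'}"))
print(parse_array("{}"))
```

Input such as {'the' 'dog'} raises ValueError instead of silently yielding two strings.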
Alternatively, you could write a regexp to split on the "','" bit and trim the
leading {' from the first piece and the trailing '} from the last.
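A minimal sketch of the split idea follows. The function name split_array is hypothetical, and the code assumes the input is already well-formed. It also assumes that no string contains an escaped quote directly followed by a comma and the closing quote (as in 'a\','), since that would produce a spurious "','" to split on.

```python
import re

def split_array(text):
    """Split {'a','b',...} on the ',' separators; escapes are left as-is."""
    if text == "{}":
        return []
    parts = re.split(r"','", text)
    parts[0] = parts[0][2:]      # drop the leading {'
    parts[-1] = parts[-1][:-2]   # drop the trailing '}
    return parts

print(split_array(r"{'the','dog\'s','bite'}"))
print(split_array("{'the'}"))
print(split_array("{}"))
```

Unlike findall, this returns the strings without their surrounding quotes.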
'as