regexp for sequence of quoted strings

Alexander Schmolck a.schmolck at gmx.net
Wed May 25 16:55:28 EDT 2005


gry at ll.mit.edu writes:

> I have a string like:
>  {'the','dog\'s','bite'}
> or maybe:
>  {'the'}
> or sometimes:
>  {}
>
> [FYI: this is postgresql database "array" field output format]
>
> which I'm trying to parse with the re module.
> A single quoted string would, I think, be:
>  r"\{'([^']|\\')*'\}"

What about {'dog \\', ...}, where the closing quote follows an escaped
backslash?
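Here is a small Python 3 comparison that makes the edge case concrete. The input and both patterns are my own illustration. A rule that only checks whether a quote is directly preceded by a backslash mistakes \\' for an escaped quote. Treating every backslash as consuming the character after it gets the boundary right:

```python
import re

s = r"{'dog \\','x'}"   # first item holds the text  dog \  (an escaped backslash)

# Naive rule (for contrast only): a closing quote is any quote that is not
# directly preceded by a backslash.
naive = re.compile(r"'.*?(?<!\\)'")

# Escape-aware rule: inside the quotes, a backslash always consumes the
# character after it, so "\\" is one escaped backslash and the quote that
# follows it really does close the string.
escape_aware = re.compile(r"'(?:[^'\\]|\\.)*'")

print(naive.findall(s))         # ["'dog \\\\','"]  (runs past the real end)
print(escape_aware.findall(s))  # ["'dog \\\\'", "'x'"]
```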

If you don't need to validate anything, you can just forget about the commas
etc. and extract all the 'strings' with findall:

The regexp below is a bit too complicated (it's adapted from something else),
but I think it will work:

In [90]:rex = re.compile(r"'(?:[^\n]|(?<!\\)(?:\\)(?:\\\\)*\n)*?(?<!\\)(?:\\\\)*?'")

In [91]:rex.findall(r"{'the','dog\'s','bite'}")
Out[91]:["'the'", "'dog\\'s'", "'bite'"]
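Here is the same session as a self-contained Python 3 script, for reference. The unescape step is my addition. It drops the surrounding quotes and replaces each backslash escape with the character it protects, which assumes that a backslash simply escapes the next character. parse_array and unescape are made-up names:

```python
import re

# The pattern from the session above: a quoted string whose closing quote is
# preceded by an even number (possibly zero) of backslashes.
STRING_RE = re.compile(r"'(?:[^\n]|(?<!\\)(?:\\)(?:\\\\)*\n)*?(?<!\\)(?:\\\\)*?'")

def unescape(quoted):
    """Strip the surrounding quotes and replace each backslash escape with
    the character it protects (assumption: backslash escapes the next char)."""
    return re.sub(r"\\(.)", r"\1", quoted[1:-1], flags=re.DOTALL)

def parse_array(text):
    """Return the unescaped contents of every quoted string in text."""
    return [unescape(s) for s in STRING_RE.findall(text)]

print(parse_array(r"{'the','dog\'s','bite'}"))  # ['the', "dog's", 'bite']
print(parse_array("{'the'}"))                   # ['the']
print(parse_array("{}"))                        # []
```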

Otherwise (if you do want to validate), just add something like "(?:,|}$)"
after the string pattern so that the final } is accepted in place of a comma.
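Here is a minimal sketch of the validating variant. It assumes the single-quoted format from the examples above. Instead of attaching the comma-or-brace check to each string, it checks the whole value at once with re.fullmatch (Python 3.4+). It also uses a simpler item pattern, '(?:[^'\\]|\\.)*', where a backslash always consumes the character after it. ITEM, ARRAY_RE and parse_validated are hypothetical names:

```python
import re

# One quoted item (an assumed, simpler pattern): any character except a quote
# or backslash, or a backslash together with the character it escapes.
ITEM = r"'(?:[^'\\]|\\.)*'"

# The whole value: braces around zero or more comma-separated items.
ARRAY_RE = re.compile(r"\{(?:%s(?:,%s)*)?\}" % (ITEM, ITEM), re.DOTALL)
ITEM_RE = re.compile(ITEM, re.DOTALL)

def parse_validated(text):
    """Return the quoted items (still escaped), or raise ValueError if the
    whole of text is not a well-formed array literal."""
    if ARRAY_RE.fullmatch(text) is None:
        raise ValueError("malformed array literal: %r" % (text,))
    return ITEM_RE.findall(text)

print(parse_validated(r"{'the','dog\'s','bite'}"))  # ["'the'", "'dog\\'s'", "'bite'"]
print(parse_validated("{}"))                        # []
```

For example, parse_validated("{'the' 'dog'}") raises ValueError because the comma is missing.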

Alternatively, you could write a regexp that splits on the "','" separators
and then trim the leading "{'" from the first piece and the trailing "'}"
from the last.
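Here is a minimal sketch of the split-and-trim idea. It assumes the single-quoted, backslash-escaped format from the examples, and the function name is hypothetical. A plain re.split(r"','", ...) cuts in the wrong place when an item's value ends in an escaped quote followed by a comma, as in {'it\','}. This version scans with re.finditer instead. It skips over escape pairs and cuts only at a ',' that is not part of one:

```python
import re

def split_pg_array(text):
    """Split a literal such as {'a','b'} into its items, still escaped.

    Sketch only: assumes single-quoted items with backslash escapes, as in
    the examples above. split_pg_array is a made-up name.
    """
    if not (text.startswith("{") and text.endswith("}")):
        raise ValueError("not a braced array literal: %r" % (text,))
    inner = text[1:-1]
    if not inner:
        return []  # the empty array {}
    pieces, start = [], 0
    # Scan for escape pairs and ',' separators together: a backslash and the
    # character after it are consumed as one match, so an escaped quote can
    # never be taken as the start of a separator.
    for m in re.finditer(r"\\.|','", inner, re.DOTALL):
        if m.group() == "','":
            pieces.append(inner[start:m.start()])
            start = m.end()
    pieces.append(inner[start:])
    # Trim the opening quote of the first item and the closing quote of the last.
    pieces[0] = pieces[0][1:]
    pieces[-1] = pieces[-1][:-1]
    return pieces

print(split_pg_array(r"{'the','dog\'s','bite'}"))  # ['the', "dog\\'s", 'bite']
print(split_pg_array(r"{'it\','}"))                 # ["it\\',"]
```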

'as







More information about the Python-list mailing list