help with simple regular expression grouping with re

Tim Peters tim_one at email.msn.com
Tue May 11 11:50:13 EDT 1999


[Bob Horvath, elaborates on his flavor of comma-separated values]
> My problem has CSV that does not cross word boundaries, and does
> not contain quotes within the fields

Plus never has whitespace adjacent to the separating commas?  So long as
that's all true, and assuming there's not a newline at the end of a string,
it's enough to do

    answer = string.split(s[1:-1], '","')

That is, remove the leading and trailing double quotes, then split on

    ","

If there is a trailing newline, change s[1:-1] to s[1:-2].
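
A minimal sketch of the split approach for anyone who wants to paste and try it. It assumes every field is double-quoted, no field contains a double quote, and no whitespace sits next to the separating commas. The function name split_quoted_csv is made up for illustration, and it uses the str.split method in place of string.split so that it also runs on Pythons without that function in the string module.

```python
def split_quoted_csv(line):
    # Drop a single trailing newline if there is one, so the slice
    # below always removes exactly the outer pair of double quotes.
    if line.endswith("\n"):
        line = line[:-1]
    # Strip the leading and trailing quote, then split on the
    # three-character separator: quote, comma, quote.
    return line[1:-1].split('","')


fields = split_quoted_csv('"alpha","beta gamma","delta"\n')
```

Checking for the newline explicitly avoids having to choose between s[1:-1] and s[1:-2] by hand.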

> (I had to check), but probably could some day.   I'll have to try it
> and see what it does.

If it does, and an embedded double quote is represented by two adjacent
double quotes, then we're back to regexps; this will do as the guts of the
findall pattern:

    "([^"]*(?:""[^"]*)*)"

Or if it uses backslash escapes,

    "([^"\\]*(?:\\.[^"\\]*)*)"

There are more obvious ways to write those, but these run faster; see
Friedl's "Mastering Regular Expressions" for detailed explanation.  Note
that with any sort of escape convention, regexps can merely *recognize* the
convention and pass it on as-is; you'll need to write some post-regexp code
to undo the escapes (if, of course, that's what you need).
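
Here is a rough sketch of both patterns fed to re.findall, followed by the post-regexp cleanup. The helper names fields_doubled and fields_backslash are invented for this example, and the backslash version assumes that a backslash followed by any character simply stands for that character (so no special meaning for sequences like \n); adjust if your data uses a different convention.

```python
import re

# The two patterns from above, written as raw strings.
DOUBLED = re.compile(r'"([^"]*(?:""[^"]*)*)"')
BACKSLASH = re.compile(r'"([^"\\]*(?:\\.[^"\\]*)*)"')


def fields_doubled(line):
    # findall hands back each field with "" still doubled;
    # collapse each pair to a single quote afterwards.
    return [f.replace('""', '"') for f in DOUBLED.findall(line)]


def fields_backslash(line):
    # Assumption: backslash + character means that character itself.
    return [re.sub(r'\\(.)', r'\1', f) for f in BACKSLASH.findall(line)]


raw = DOUBLED.findall('"a","say ""hi""","c"')
clean = fields_doubled('"a","say ""hi""","c"')
```

Here raw still carries the doubled quotes inside the second field, while clean has them collapsed, which is the split between recognizing the convention and undoing it.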

there-are-even-those-who-say-regexps-are-obscure<wink>-ly y'rs  - tim





