help with simple regular expression grouping with re
Tim Peters
tim_one at email.msn.com
Tue May 11 11:50:13 EDT 1999
[Bob Horvath, elaborates on his flavor of comma-separated values]
> My problem has CSV that does not cross word boundaries, and does
> not contain quotes within the fields
And it never has whitespace adjacent to the separating commas? So long as
that's all true, and assuming there's no newline at the end of the string,
it's enough to do
answer = string.split(s[1:-1], '","')
That is, remove the leading and trailing double quotes, then split on the
three-character separator "," (quote, comma, quote).
If there is a trailing newline, change s[1:-1] to s[1:-2].
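For the record, here is a minimal sketch of the same idea wrapped in a function, assuming a Python that has string methods (string.split from the string module does the same job). The name split_quoted_csv and the sample line are made up for illustration; rather than switching slices, it strips one trailing newline first, if there is one.

```python
def split_quoted_csv(line):
    """Split a line like '"a","b","c"' into ['a', 'b', 'c'].

    Assumes every field is quoted, no quotes appear inside fields,
    and no whitespace sits next to the separating commas.
    """
    # Drop a single trailing newline, if present.
    if line.endswith("\n"):
        line = line[:-1]
    # Remove the leading and trailing double quotes, then split on ","
    return line[1:-1].split('","')


fields = split_quoted_csv('"alpha","beta, with comma","gamma"\n')
```

Commas inside a field survive, since only the quote-comma-quote sequence is treated as a separator.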
> (I had to check), but probably could some day. I'll have to try it
> and see what it does.
If it does, and an embedded double quote is represented by two adjacent
double quotes, then we're back to regexps; this will do as the guts of the
findall pattern:
"([^"]*(?:""[^"]*)*)"
Or if it uses backslash escapes,
"([^"\\]*(?:\\.[^"\\]*)*)"
There are more obvious ways to write those, but these run faster; see
Friedl's "Mastering Regular Expressions" for a detailed explanation. Note
that with any sort of escape convention, regexps can merely *recognize* the
convention and pass it on as-is; you'll need to write some post-regexp code
to undo the escapes (if, of course, that's what you need).
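To make that concrete, here is a rough sketch that feeds the two patterns above to re.findall and then undoes the escapes afterwards. The helper names and sample lines are invented for illustration, and the backslash rule is an assumption: a backslash followed by any character stands for that character itself, with no special treatment of sequences like \n or \t. Adjust to whatever your data really does.

```python
import re

# Embedded double quotes written as two adjacent double quotes.
DOUBLED = re.compile(r'"([^"]*(?:""[^"]*)*)"')
# Embedded special characters written with backslash escapes.
BACKSLASH = re.compile(r'"([^"\\]*(?:\\.[^"\\]*)*)"')


def fields_doubled(line):
    # findall hands back the field text as-is; collapse each "" to one ".
    return [f.replace('""', '"') for f in DOUBLED.findall(line)]


def fields_backslash(line):
    # Assumed convention: backslash + any character means that character.
    return [re.sub(r'\\(.)', r'\1', f) for f in BACKSLASH.findall(line)]


# The regexp alone only recognizes the convention and passes it through.
raw = DOUBLED.findall('"plain","say ""hi"" now","a,b"')
# The post-regexp step is what undoes the escapes.
clean = fields_doubled('"plain","say ""hi"" now","a,b"')
escaped = fields_backslash(r'"plain","say \"hi\" now","a,b"')
```

Here raw still carries the doubled quotes, while clean and escaped hold the fields with the quoting undone.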
there-are-even-those-who-say-regexps-are-obscure<wink>-ly y'rs - tim