question on pattern

Kragen Sitaker kragen at pobox.com
Mon Jun 10 18:19:44 EDT 2002


sjmachin at lexicon.net (John Machin) writes:
> You have got an answer to your problem of how to get DOTALL behaviour
> with .findall(). However you should be aware that your pattern can
> match strings that are not syntactically C or C++ comments. Here is an
> example from page 172 of "Mastering Regular Expressions" by Jeffrey E.
> F. Friedl:
> 
>    const char *cstart = "/*", *cend = "*/";
> 
> I'd agree that it's a pathological case. I'm just pointing out that in
> most work on program source files, to be 100% correct you need to have
> a lexical analyser for the language in question. Regexes won't take
> you the whole distance.

It's true that you need a lexical analyzer for the language, but most
languages, including C and C++, can be lexically analyzed with a
regular expression.  Indeed, lex essentially compiles one big regular
expression, built from the rules you give it, into C.
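To make the "one big regular expression" idea concrete, here is a minimal sketch in Python that joins a list of token rules into a single alternation of named groups and walks the source with `finditer`.  The token set is hypothetical and covers only a tiny subset of C.  One assumption worth flagging: Python's `re` tries alternatives left to right and takes the first one that matches, whereas lex takes the longest match, so here the rule order (comments before operators, for instance) stands in for lex's longest-match rule.

```python
import re

# Hypothetical token rules for a tiny subset of C (not a complete C lexer).
# Order matters: Python's re takes the first alternative that matches,
# unlike lex, which prefers the longest match.
TOKEN_SPEC = [
    ("COMMENT", r"/\*.*?\*/|//[^\n]*"),
    ("STRING", r'"(?:[^"\\\n]|\\.)*"'),
    ("CHAR", r"'(?:[^'\\\n]|\\.)*'"),
    ("NUMBER", r"\d+"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("OP", r"[{}()\[\];,=*+\-/<>!&|]"),
    ("SPACE", r"\s+"),
    ("MISMATCH", r"."),
]

# One big regular expression: each rule becomes a named group.
MASTER = re.compile(
    "|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC),
    re.DOTALL,
)


def tokenize(src):
    """Yield (kind, text) pairs, skipping whitespace."""
    for m in MASTER.finditer(src):
        kind = m.lastgroup  # name of the rule that matched
        if kind == "SPACE":
            continue
        yield kind, m.group()


for tok in tokenize('x = 3; /* hi */ s = "/*";'):
    print(tok)
```

Note that the `"/*"` in the last statement comes out as a single `STRING` token, because the string rule consumes it before any comment rule gets a chance to start inside it.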

It's also likely that a lexical analyzer built for parsing will make
many distinctions, such as between {, 3, and strcpy, that aren't
relevant to a particular application.  In particular, a regular
expression that finds comments and handles C string and character
constants correctly is relatively simple, although harder than you'd
think.  Friedl raises the problem on page 172, as you say, but solves
it on page 173.
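The general trick can be sketched in Python: make string and character literals alternatives of the same regex as the comments, so that the scanner swallows a literal whole and can never start a "comment" inside it, then keep only the matches that landed in the comment group.  This is a sketch in that spirit, not a transcription of Friedl's expression; it assumes plain C source and ignores trigraphs and backslash-newline splicing.  The name `find_comments` is hypothetical.

```python
import re

# Comments, string literals and character constants are all alternatives,
# tried left to right at each position.  Because literals are consumed
# whole, a "/*" inside "..." or '...' never begins a comment match.
C_COMMENT_OR_LITERAL = re.compile(
    r"""
    (?P<comment> /\*.*?\*/ | //[^\n]* )   # block or line comment
  | "(?:[^"\\\n]|\\.)*"                   # string literal, with escapes
  | '(?:[^'\\\n]|\\.)*'                   # character constant, with escapes
    """,
    re.VERBOSE | re.DOTALL,
)


def find_comments(src):
    """Return the comments in C source, skipping look-alikes in literals."""
    return [m.group("comment") for m in C_COMMENT_OR_LITERAL.finditer(src)
            if m.group("comment") is not None]


src = 'const char *cstart = "/*", *cend = "*/";'
print(re.findall(r"/\*.*?\*/", src, re.DOTALL))  # ['/*", *cend = "*/']
print(find_comments(src))                          # []
```

On the example from page 172, the naive comment pattern grabs everything between the two string constants, while the version that also knows about literals correctly finds no comment at all.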