String Manipulation Help!

Sat Jan 28 17:33:25 EST 2006

I really enjoyed your article. I will try to understand this. 
Will you be doing more of this in the future with more complicated examples? 

Paul McGuire wrote:

> "Dave" <davidworley at gmail.com> wrote in message
> news:1138481853.165529.321870 at z14g2000cwz.googlegroups.com...
>> OK, I'm stumped.
>>
>> I'm trying to find newline characters (\n, specifically) that are NOT
>> in comments.
>>
>> So, for example (where "<-" = a newline character):
>> ==========================================
>> 1: <-
>> 2: /*<-
>> 3: ----------------------<-
>> 4:     comment<-
>> 5: ----------------------<-
>> 6: */<-
>> 7: <-
>> 8: CODE CODE CODE<-
>> 9: <-
>> ==========================================
>>
>> I want to return the newline characters at lines 1, 6, 7, 8, and 9 but
>> NOT the others.
>>
> 
> Dave -
> 
> Pyparsing has built-in support for detecting line breaks and comments, and
> the syntax is pretty simple, I think.  Here's a pyparsing program that
> gives your desired results:
> 
> ===============================
> from pyparsing import lineEnd, cStyleComment, lineno
> 
> testsource = """
> /*
> ----------------------
>     comment
> ----------------------
> */
> 
> CODE CODE CODE
> 
> """
> 
> # define the expression you want to search for
> eol = lineEnd
> 
> # specify that you don't want to match within C-style comments
> eol.ignore(cStyleComment.leaveWhitespace())
> 
> # loop through all the occurrences returned by scanString
> # and print the line number of that location within the original string
> for toks,startloc,endloc in eol.scanString(testsource):
>     print(lineno(startloc, testsource))
> ===============================
> 
> The expression you are searching for is pretty basic: a plain
> end-of-line, which is pyparsing's built-in expression lineEnd.  The curve
> you are throwing is that you *don't* want eols inside C-style comments.
> Pyparsing lets you designate an "ignore" expression to skip undesirable
> content, and because ignoring comments comes up so often during parsing,
> pyparsing includes common comment expressions for C, C++, Java, Python,
> and HTML.  Next, scanString is pyparsing's counterpart to re.search (or,
> more closely, re.finditer): it returns a generator that yields the
> matching tokens, start location, and end location of every occurrence of
> the given parse expression, in your case eol.  Finally, in the body of
> the for loop, we use pyparsing's lineno function to get the line number
> of a string location within the original string.
> 
> About the only real wart in all this is that pyparsing implicitly skips
> over leading whitespace, even when looking for expressions to be ignored.
> In order not to lose eols that come just before a comment (like your
> line 1), we have to call leaveWhitespace() on cStyleComment so that it
> does not skip leading whitespace.
> 
> Download pyparsing at http://pyparsing.sourceforge.net.
> 
> -- Paul
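
For comparison, the same idea (find every newline, but skip the ones that fall inside a C-style comment) can be sketched with only the standard library. This is not Paul's pyparsing approach; it swaps in the re module, and the names COMMENT and newline_lines_outside_comments are made up for illustration. It assumes comments are never nested and that "/*" never appears inside a string literal, which a real parser would have to handle.

```python
import re

# Non-greedy match of a C-style comment; DOTALL lets "." span newlines.
COMMENT = re.compile(r"/\*.*?\*/", re.DOTALL)


def newline_lines_outside_comments(text):
    """Return 1-based line numbers of newlines that are not inside /* ... */."""
    # (start, end) offsets of every comment; end is exclusive.
    spans = [m.span() for m in COMMENT.finditer(text)]
    result = []
    for m in re.finditer(r"\n", text):
        pos = m.start()
        inside = any(start <= pos < end for start, end in spans)
        if not inside:
            # Line number = count of newlines before this one, plus one.
            result.append(text.count("\n", 0, pos) + 1)
    return result


# Same sample text as in the original question.
sample = """
/*
----------------------
    comment
----------------------
*/

CODE CODE CODE

"""

if __name__ == "__main__":
    print(newline_lines_outside_comments(sample))  # [1, 6, 7, 8, 9]
```

The newline that ends line 6 comes after the closing "*/", so it counts as outside the comment, which matches the result Dave asked for. Rescanning the comment spans for every newline is fine for small inputs; for large files a single pass that tracks the current comment would be a better fit.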