Regex help needed

Tue Jan 10 20:03:39 EST 2006

rh0dium wrote:
> Michael Spencer wrote:
>>   >>> def parse(source):
>>   ...     source = source.splitlines()
>>   ...     original, rest = source[0], "\n".join(source[1:])
>>   ...     return original, rest_eval(get_tokens(rest))
> 
> This is a very clean and elegant way to separate them - Very nice!!  I
> like this a lot - I will definitely use this in the future!!
> 
>> Cheers
>>
>> Michael
> 
On reflection, this simplifies further (to 9 lines), at least for the test cases
you provide, which don't involve any nested parens:

  >>> import cStringIO, tokenize
  ...
  >>> def get_tokens2(source):
  ...     src = cStringIO.StringIO(source).readline
  ...     src = tokenize.generate_tokens(src)
  ...     return [token[1][1:-1] for token in src if token[0] == tokenize.STRING]
  ...
  >>> def parse2(source):
  ...     source = source.splitlines()
  ...     original, rest = source[0], "\n".join(source[1:])
  ...     return original, get_tokens2(rest)
  ...
  >>>
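For anyone reading this on Python 3, where cStringIO no longer exists, the same idea can be sketched with io.StringIO. The names get_tokens3 and parse3 are made up for this sketch, and the sample input in the usage note is only a guess at the shape of the test sources, which are not reproduced in this message:

```python
import io
import tokenize


def get_tokens3(source):
    """Return the body of each string literal in source, outer quotes removed."""
    readline = io.StringIO(source).readline
    return [
        tok.string[1:-1]  # drop one quote character from each end
        for tok in tokenize.generate_tokens(readline)
        if tok.type == tokenize.STRING
    ]


def parse3(source):
    """Split off the first line, then collect string literals from the rest."""
    lines = source.splitlines()
    original, rest = lines[0], "\n".join(lines[1:])
    return original, get_tokens3(rest)
```

With an assumed input such as 'someFunction\n("test" "foo")', parse3 gives back the first line plus the list of quoted strings. Note that the [1:-1] slice only suits plain single- or double-quoted literals; a triple-quoted or prefixed string (r"...") would need different handling.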

This matches your main function for the three tests where main works...

  >>> for source in sources[:3]: #matches your main function where it works
  ...     assert parse2(source) == main(source)
  ...
  Original someFunction
  Orig someFunction Results ['test', 'foo']
  Original someFunction
  Orig someFunction Results ['test  foo']
  Original someFunction
  Orig someFunction Results ['test', 'test1', 'foo aasdfasdf', 'newline', 'test2']

...and handles the case where main fails (I think correctly, although I'm not
entirely sure what your desired output is in this case):
  >>> parse2(sources[3])
  ('getVersion()', ['@(#)$CDS: icfb.exe version 5.1.0 05/22/2005 23:36 (cicln01) $'])
  >>>

If you really do need nested parens, then you'd need the slightly longer version
I posted earlier.
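To give a rough idea of what handling nesting involves, here is a minimal Python 3 sketch that keeps a stack of lists while walking the token stream. It is not the version posted earlier in the thread; parse_nested is a hypothetical name, and the nested-list output shape is an assumption about what would be wanted:

```python
import io
import tokenize


def parse_nested(source):
    """Turn parenthesised string literals into nested lists (illustrative only)."""
    readline = io.StringIO(source).readline
    root = []
    stack = [root]  # stack[-1] is the list currently being filled
    for tok in tokenize.generate_tokens(readline):
        if tok.type == tokenize.OP and tok.string == "(":
            child = []
            stack[-1].append(child)  # attach the new group to its parent
            stack.append(child)
        elif tok.type == tokenize.OP and tok.string == ")":
            if len(stack) == 1:
                raise ValueError("unbalanced ')'")
            stack.pop()  # close the current group
        elif tok.type == tokenize.STRING:
            stack[-1].append(tok.string[1:-1])  # strip the outer quotes
    return root
```

Under these assumptions, parse_nested('("a" ("b" "c") "d")') yields one outer group containing 'a', an inner group, and 'd'. Parentheses inside a quoted string are left alone, because the tokenizer reports the whole literal as a single STRING token.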

Cheers

Michael