how to use pyparsing for identifiers that start with a constant string

Tue Jun 14 18:20:35 EDT 2005

phil_nospam_schmidt at yahoo.com wrote:
> I am scanning text that has identifiers with a constant prefix string
> followed by alphanumerics and underscores. I can't figure out, using
> pyparsing, how to match for this. The example expression below seems to
> be looking for whitespace between the 'atod' and the rest of the
> identifier.
> 
> identifier_atod = 'atod' + pp.Word('_' + pp.alphanums)
> 
> How can I get pyparsing to match 'atodkj45k' and 'atod_asdfaw', but not
> 'atgdkasdjfhlksj' and 'atod asdf4er', where the first four characters
> must be 'atod', and not followed by whitespace?

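Here is one way, using pyparsing.Combine, which requires the matched
tokens to be adjacent (no intervening whitespace) and returns them
joined into a single string: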

 >>> from pyparsing import *
 >>> tests = [ 'atodkj45k', 'atod_asdfaw', 'atgdkasdjfhlksj', 'atod asdf4er']
 >>> ident = Combine(Literal('atod') +  Word('_' + alphanums))
 >>> for t in tests:
 ...   try:
 ...     print ident.parseString(t)
 ...   except ParseException:
 ...     print 'No match', t
 ...
['atodkj45k']
['atod_asdfaw']
No match atgdkasdjfhlksj
No match atod asdf4er
 >>>
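
For anyone running this on Python 3, here is a rough sketch of the same
idea as a script. It also shows why the original expression accepted
'atod asdf4er': a plain '+' skips whitespace between tokens by default.
The Regex variant is just one possible alternative, and the helper name
matches() is made up for this illustration, not part of pyparsing.

```python
from pyparsing import Combine, Literal, ParseException, Regex, Word, alphanums

tests = ['atodkj45k', 'atod_asdfaw', 'atgdkasdjfhlksj', 'atod asdf4er']

# Plain '+' skips whitespace between tokens and yields two separate tokens.
loose = Literal('atod') + Word('_' + alphanums)

# Combine requires the pieces to be adjacent and joins them into one token.
ident = Combine(Literal('atod') + Word('_' + alphanums))

# Alternative sketch: one regular expression for the whole identifier.
ident_re = Regex(r'atod[A-Za-z0-9_]+')


def matches(expr, text):
    """Hypothetical helper: return the tokens as a list, or None on failure."""
    try:
        return list(expr.parseString(text))
    except ParseException:
        return None


loose_result = matches(loose, 'atod asdf4er')
results = {t: matches(ident, t) for t in tests}
results_re = {t: matches(ident_re, t) for t in tests}

for t in tests:
    print(repr(t), '->', results[t])
```

The loose expression accepts 'atod asdf4er' as two tokens, while the
Combine and Regex versions both reject it.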

Kent