[Tutor] Parsing problem

Liam Clarke cyresse at gmail.com
Mon Jul 25 06:49:05 CEST 2005


Hi Paul, 

My apologies. As I was jumping into my car after sending that email, it
clicked in my brain:
"Oh yeah... initial & body..."

But it's good to know how to accept valid numbers.

Sorry, getting a bit too quick to fire off emails here.

Regards, 

Liam Clarke

On 7/25/05, Paul McGuire <paul at alanweberassociates.com> wrote:
> 
> Liam -
> 
> The two arguments to Word work this way:
> - the first argument lists valid *initial* characters
> - the second argument lists valid *body* or subsequent characters
> 
> For example, in the identifier definition,
> 
> identifier = pp.Word(pp.alphas, pp.alphanums + "_/:.")
> 
> identifiers *must* start with an alphabetic character, and then may be
> followed by 0 or more alphanumeric, "_", "/", ":" or "." characters. If only
> one argument is supplied, then the same string of characters is used as both
> initial and body. Identifiers are typical two-argument Words, as they often
> start with alphas but then accept digits and other punctuation.
> No whitespace is permitted within a Word. The Word matching will end when a
> non-body character is seen.
> 
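A minimal sketch of the initial/body behaviour described above, assuming pyparsing is installed and imported as pp. The helper matched_prefix is a hypothetical name introduced only for this illustration; it is not part of pyparsing.

```python
import pyparsing as pp

# The identifier definition from the message: alphabetic initial character,
# then alphanumerics or any of "_/:." as body characters.
identifier = pp.Word(pp.alphas, pp.alphanums + "_/:.")


def matched_prefix(expr, text):
    """Return what expr matches at the start of text, or None if it fails.

    Hypothetical helper for this sketch only.
    """
    try:
        return expr.parseString(text)[0]
    except pp.ParseException:
        return None


# Matching stops at the space, because a space is not a body character.
print(matched_prefix(identifier, "path/to:file.txt rest"))
# A digit is a valid body character but not a valid initial character.
print(matched_prefix(identifier, "9lives"))
```

Recent pyparsing releases also provide snake_case spellings such as parse_string; the camelCase names are used here to match the message.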
> Using this definition:
> 
> integer = pp.Word(pp.nums+"-+.", pp.nums)
> 
> It will accept "+123", "-345", "678", and ".901". But in a real number, a
> period may occur anywhere in the number, not just as the initial character,
> as in "3.14159". So your bodyCharacters must also include a ".", as in:
> 
> integer = pp.Word(pp.nums+"-+.", pp.nums+".")
> 
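To make the difference between the two definitions concrete, here is a small sketch, assuming pyparsing is imported as pp. The variable names digits_only_body and point_in_body are made up for this comparison and do not appear in the message.

```python
import pyparsing as pp

# First definition: "." is allowed only as the initial character.
digits_only_body = pp.Word(pp.nums + "-+.", pp.nums)
# Second definition: "." is also allowed as a body character.
point_in_body = pp.Word(pp.nums + "-+.", pp.nums + ".")

for text in ["+123", ".901", "3.14159"]:
    first = digits_only_body.parseString(text)[0]
    second = point_in_body.parseString(text)[0]
    # For "3.14159" the first definition stops at the "." and matches only "3".
    print(text, "->", first, "|", second)
```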
> Let me say, though, that this is a very permissive definition of integer -
> for one thing, we really should rename it something like "number", since it
> now accepts non-integers as well! But also, there is no restriction on the
> frequency of body characters. This definition would accept a "number" that
> looks like "3.4.3234.111.123.3234". If you are certain that you will only
> receive valid inputs, then this simple definition will be fine. But if you
> will have to handle and reject erroneous inputs, then you might do better
> with a number definition like:
> 
> number = Combine( Word( "+-"+nums, nums ) +
>                   Optional( point + Optional( Word( nums ) ) ) )
> 
> This will handle "+123", "-345", "678", and "0.901", but not ".901". If you
> want to accept numbers that begin with a ".", then you'll need to tweak this
> a bit further.
> 
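The sketch below compares the permissive Word-based definition with the Combine-based one, and then shows one possible way to also accept a leading ".". It makes two assumptions: point is taken to be Literal("."), since the message does not define it, and number_with_leading_point is a hypothetical variant, not the tweak Paul had in mind. The helper accepts is likewise invented for this example.

```python
import pyparsing as pp

# Assumption: `point` is a Literal for the decimal point (not defined in the message).
point = pp.Literal(".")
digits = pp.Word(pp.nums)

# The permissive definition and the stricter Combine-based one from the message.
loose_number = pp.Word(pp.nums + "-+.", pp.nums + ".")
number = pp.Combine(
    pp.Word("+-" + pp.nums, pp.nums) + pp.Optional(point + pp.Optional(digits))
)

# Hypothetical tweak: an optional sign, then either digits with an optional
# fraction, or a leading point that must be followed by digits.
sign = pp.Optional(pp.Literal("+") | pp.Literal("-"))
number_with_leading_point = pp.Combine(
    sign + ((digits + pp.Optional(point + pp.Optional(digits))) | (point + digits))
)


def accepts(expr, text):
    """Return True if expr matches the whole of text (hypothetical helper)."""
    try:
        expr.parseString(text, parseAll=True)
        return True
    except pp.ParseException:
        return False


for text in ["+123", "0.901", ".901", "3.4.3234.111.123.3234"]:
    print(text,
          accepts(loose_number, text),
          accepts(number, text),
          accepts(number_with_leading_point, text))
```

Passing parseAll=True makes the whole string count, so "3.4.3234.111.123.3234" is rejected by both Combine-based definitions instead of being silently matched up to "3.4".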
> One last thing: you may want to start using setName() on some of your
> expressions, as in:
> 
> number = Combine( Word( "+-"+nums, nums ) +
>                   Optional( point + Optional( Word( nums ) ) )
>                 ).setName("number")
> 
> Note, this is *not* the same as setResultsName. Here setName is attaching a
> name to this pattern, so that when it appears in an exception, the name will
> be used instead of an encoded pattern string (such as W:012345...). No need
> to do this for Literals: the literal string is used when it appears in an
> exception.
> 
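A short sketch of the distinction between setName and setResultsName, assuming pyparsing is imported as pp. To keep it small it uses a plain Word of digits rather than the Combine expression above; the names "integer" and "count" are chosen only for this illustration.

```python
import pyparsing as pp

# setName labels the expression itself; the label shows up in error messages.
integer = pp.Word(pp.nums).setName("integer")

try:
    integer.parseString("abc")
    message = None
except pp.ParseException as err:
    message = str(err)
print(message)  # the text mentions "Expected integer"

# setResultsName labels the matched token in the parse results instead.
count = integer.setResultsName("count")
result = count.parseString("42")
print(result["count"])
```

The exact wording of the exception text varies between pyparsing versions, so it is safer to rely only on the name appearing in it.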
> -- Paul
> 
> 
> 


-- 
'There is only one basic human right, and that is to do as you damn well please.
And with it comes the only basic human duty, to take the consequences.'