pyparsing and svg

Thu Nov 8 10:11:13 EST 2007

Paul McGuire wrote:
> On Nov 8, 3:14 am, Donn Ingle <donn.in... at gmail.com> wrote:
>
>   
>> float = nums + dot + nums
>>     
>
> Should be:
>
> float = Combine(Word(nums) + dot + Word(nums))
>
> nums is a string that defines the set of numeric digits for composing
> Word instances.  nums is not an expression by itself.
>
> For that matter, I see in your later tests that some values have a
> leading minus sign, so you should really go with:
>
> float = Combine(Optional("-") + Word(nums) + dot + Word(nums))
>
>
>   

I have a working path data parser (in pyparsing) at 
http://code.google.com/p/wxpsvg.

Parsing the numeric values initially gave me a lot of trouble - I 
translated the BNF in the spec literally and there was a *ton* of 
backtracking going on with every numeric value. I ended up using a more 
generous grammar, and letting Python's float() reject invalid values.
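
A minimal sketch of that idea, written with the standard re module rather
than pyparsing so that it runs on its own. The pattern and the function name
are assumptions made for illustration, not what pathdata.py actually uses,
and the sketch does not handle compact forms such as "1-2".

```python
import re

# Deliberately permissive: any run of sign, digit, dot and exponent characters.
# Illustrative pattern only; it is not taken from pathdata.py.
LOOSE_NUMBER = re.compile(r"[-+0-9.eE]+")

def parse_numbers(text):
    """Return the floats found in text; raise ValueError on a malformed one."""
    values = []
    for match in LOOSE_NUMBER.finditer(text):
        # float() performs the strict validation that the loose pattern skips.
        values.append(float(match.group()))
    return values

print(parse_numbers("100 -2.5,3e2"))  # [100.0, -2.5, 300.0]
```

Something like "1.2.3" matches the loose pattern but makes float() raise
ValueError, so the bad value is still rejected.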

I couldn't get repeating path elements (like M 100 100 200 200, which 
the SVG spec treats as M 100 100 L 200 200) working right in the grammar, 
so I expand those with post-processing.
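
Roughly, the post-processing could look like the sketch below. The ARITY
table, the function name and the (command, args) tuple layout are all
hypothetical and cover only a few absolute commands. Note that the SVG path
grammar treats extra coordinate pairs after a moveto as implicit lineto
commands, and the sketch follows that rule.

```python
# Number of numeric arguments each command takes (small illustrative subset).
ARITY = {"M": 2, "L": 2, "C": 6, "Z": 0}

def expand_repeats(segments):
    """Turn (command, args) pairs that hold repeated argument groups
    into one (command, args) pair per group."""
    expanded = []
    for command, args in segments:
        size = ARITY[command]
        if size == 0:
            expanded.append((command, []))
            continue
        if not args or len(args) % size:
            raise ValueError("wrong argument count for %r" % command)
        for i in range(0, len(args), size):
            # Per the SVG spec, extra pairs after a moveto are implicit linetos.
            name = "L" if command == "M" and i > 0 else command
            expanded.append((name, args[i:i + size]))
    return expanded

print(expand_repeats([("L", [100, 100, 200, 200])]))
# [('L', [100, 100]), ('L', [200, 200])]
```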

The parser itself can be seen at 
http://wxpsvg.googlecode.com/svn/trunk/svg/pathdata.py

> Some other comments:
>
> 1. Read up on the Word class; you are not using it quite right.
>
> command = Word("MLCZ")
>
> will work with your test set, but it is not the form I would choose.
> Word(characterstring) will match any "word" made up of the characters
> in the input string.  So Word("MLCZ") will match
> M
> L
> C
> Z
> MM
> LC
> MCZL
> MMLCLLZCZLLM
>
> I would suggest instead using:
>
> command = Literal("M") | "L" | "C" | "Z"
>
> or
>
> command = oneOf("M L C Z")
>
> 2. Change comma to
>
> comma = Literal(",").suppress()
>
> The comma is important to the parsing process, but the ',' token is
> not much use in the returned set of matched tokens, so get rid of it
> (by using suppress).
>
> 3. Group your expressions, such as
>
> couple = Group(float + comma + float)
>
> It will really simplify getting at the resulting parsed tokens.
>
>
> 4. What is the purpose of (couple + couple)?  This is sufficient:
>
> phrase = OneOrMore(command + Group(OneOrMore(couple)) )
>
> (Note use of Group to return the coord pairs as a sublist.)
>
>
> 5. Results names!
>
> phrase = OneOrMore(command("command") + Group(OneOrMore(couple))("coords"))
>
> will allow you to access these fields by name instead of by index.
> This will make your parser code *way* more readable.
>
>
> -- Paul
>
>
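
Putting the five suggestions above together gives something like the sketch
below. A few things in it are assumptions rather than part of the quoted
advice: the expression is called number instead of float so it does not
shadow the builtin, each command is wrapped in its own Group (segment) so
the results names stay attached to one command at a time, and the sample
path string is made up.

```python
from pyparsing import (Combine, Group, Literal, OneOrMore, Optional,
                       Word, nums, oneOf)

dot = Literal(".")
comma = Literal(",").suppress()  # needed for parsing, dropped from results

# Called "number" here instead of "float" to avoid shadowing the builtin.
number = Combine(Optional("-") + Word(nums) + dot + Word(nums))
couple = Group(number + comma + number)
command = oneOf("M L C Z")  # exactly one letter, unlike Word("MLCZ")

# Wrapping each segment in a Group is an addition to the quoted grammar;
# it keeps "command" and "coords" separate for every segment.
segment = Group(command("command") + Group(OneOrMore(couple))("coords"))
phrase = OneOrMore(segment)

result = phrase.parseString("M 10.0,20.5 L 30.0,-40.25 50.5,60.0")
for seg in result:
    # Fields are reachable by name instead of by index.
    print(seg.command, seg.coords.asList())
```

As in the quoted grammar, a command with no coordinates (such as Z) does not
match, because at least one couple is required after each command.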