Ideas for parsing this text?
Paul McGuire
ptmcg at austin.rr.com
Thu Apr 24 12:08:10 EDT 2008
On Apr 24, 10:42 am, "Eric Wertman" <ewert... at gmail.com> wrote:
> I'm sure there are cooler ways to do some of that. I spent most of my
> time expanding the characters that constitute content. I'm concerned
> that over time I'll have things break as other characters show up.
> Specifically a few of the nodes are of German locale.. so I could get
> some odd international characters.
>
If you want to add international characters without going to Unicode,
a first cut would be to add pyparsing's string constant "alphas8bit"
to the characters that make up your content word.
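
Something along these lines, as a rough sketch only. The name "word" is
made up for illustration and is not taken from your parser, and on a
current Python 3 the strings are Unicode anyway, so alphas8bit simply
contributes the accented Latin-1 letters:

```python
from pyparsing import Word, alphas, alphas8bit

# alphas8bit holds the accented letters from the upper half of Latin-1
# (for example the German umlauts and sharp s).
# "word" is a hypothetical stand-in for the content word in your grammar.
word = Word(alphas + alphas8bit)

print(word.parseString("Straße")[0])
print(word.parseString("Müller")[0])
```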
> It looks like pyparser has a constant for printable characters. I'm
> not sure if I can just use that, without worrying about it?
>
I would discourage you from using printables, since it also includes
'[', ']', and '"', which are significant to other elements of the
parser (but you could create your own variable initialized with
printables, and then use replace("[","") etc. to strip out the
offending characters). I'm also a little concerned that you needed to
add \t and \n to the content word - was this really necessary? None
of your examples showed such words, and I would rather have you let
pyparsing skip over the whitespace as is its natural behavior.
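
Here is a minimal sketch of both points, assuming the only characters
you need to keep out of the content word are '[', ']', and '"'. The
names content_chars and content_word are hypothetical and not from your
code:

```python
from pyparsing import OneOrMore, Word, printables

# Start from printables and drop the characters that the rest of the
# grammar treats as delimiters (assumed here to be [ ] and ").
content_chars = printables
for ch in '[]"':
    content_chars = content_chars.replace(ch, "")

# No \t or \n in the character set: pyparsing skips whitespace
# between tokens by default.
content_word = Word(content_chars)
words = OneOrMore(content_word)

# Tabs and newlines separate the words; parsing stops at the '['.
result = words.parseString("alpha\tbeta\n  gamma [ignored]")
print(result.asList())
```

If words separated by tabs or newlines parse this way against your real
input, the \t and \n should not need to be in the content word at all.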
-- Paul