pyparsing: match empty line

Paul McGuire ptmcg at austin.rr.com
Wed Sep 3 09:12:47 EDT 2008


On Sep 3, 4:26 am, Marek Kubica <ma... at xivilization.net> wrote:
> Hi,
>
> First of all a big thank you for your excellent library and of course
> also for your extensive and enlightening answer!
>

I'm glad pyparsing has been of help to you.  Pyparsing is building its
own momentum these days.  I have a new release in SVN that I'll put
out in the next week or so.


> Ok, I didn't think about this. But as my program is not only a parser but
> a long-running process and setDefaultWhitespace modifies a global
> variable I don't feel too comfortable with it.

Pyparsing isn't really all that thread-friendly.  You definitely
should not have multiple threads using the same grammar.  The
approaches I've seen people use in multithread applications are: 1)
synchronize access to a single parser across multiple threads, and 2)
create a parser per-thread, or use a pool of parsers.  Pyparsing
parsers can be pickled, so a quick way to reconstitute a parser is to
create the parser at startup time and pickle it to a string, then
unpickle a new parser as needed.
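
A minimal sketch of the two approaches, assuming a small hypothetical
grammar. The names build_parser, parse_locked and parse_per_thread are
made up for illustration and are not part of pyparsing. The per-thread
variant here rebuilds the grammar from a function instead of
unpickling it, which is a substitution for the pickle route above:

```python
import threading
from pyparsing import Word, alphas, nums

def build_parser():
    # Hypothetical grammar: a word followed by an integer.
    return Word(alphas) + Word(nums)

# Approach 1: a single shared parser, with access serialized by a lock.
_shared_parser = build_parser()
_shared_lock = threading.Lock()

def parse_locked(text):
    with _shared_lock:
        return _shared_parser.parseString(text).asList()

# Approach 2: one parser per thread, built lazily on first use.
_local = threading.local()

def parse_per_thread(text):
    parser = getattr(_local, "parser", None)
    if parser is None:
        parser = _local.parser = build_parser()
    return parser.parseString(text).asList()
```

The lock is simpler but serializes all parsing; the per-thread parser
trades some memory and startup cost for parsing in parallel.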


> I could set the whitespace
> on every element, but that is as you surely agree quite ugly. Do you
> accept patches? I'm thinking about some kind of factory-class which would
> automatically set the whitespaces:
>
> >>> factory = TokenFactory(' \t\r')
> >>> word = factory.Word(alphas)
>
> That way, one wouldn't need to set a global value which might interfere
> with other pyparsers running in the same process.

I tried to prototype up your TokenFactory class, but once I got as far
as implementing __getattribute__ to return the corresponding pyparsing
class, I couldn't see how to grab the object generated for that class,
and modify its whitespace values.  I did cook up this, though:

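from pyparsing import Word, alphas

class SetWhitespace(object):
    """Callable that applies a fixed set of whitespace chars to an expression."""
    def __init__(self, whitespacechars):
        self.whitespacechars = whitespacechars

    def __call__(self, pyparsing_expr):
        # setWhitespaceChars overrides the default for this expression only
        pyparsing_expr.setWhitespaceChars(self.whitespacechars)
        return pyparsing_expr

# skip spaces, tabs and carriage returns, but not newlines
noNLskipping = SetWhitespace(' \t\r')
word = noNLskipping(Word(alphas))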

I'll post this on the wiki and see what kind of comments we get.
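
To see what a per-expression whitespace setting changes, here is a
small sketch that compares a default Word with one whose whitespace
chars leave out the newline. It calls setWhitespaceChars directly on
the expression; the helper name skips_leading_newline is hypothetical:

```python
from pyparsing import ParseException, Word, alphas

default_word = Word(alphas)

no_nl_word = Word(alphas)
no_nl_word.setWhitespaceChars(' \t\r')   # newline is no longer skippable

def skips_leading_newline(expr):
    # True if the expression silently skips a newline before the word.
    try:
        expr.parseString("\nhello")
        return True
    except ParseException:
        return False

print(skips_leading_newline(default_word))
print(skips_leading_newline(no_nl_word))
```

Only the second expression treats the newline as significant; spaces
and tabs in front of the word are still skipped by both.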

By the way, setDefaultWhitespaceChars only updates a class-level
default that is read at parser definition time, *not* at parse time.
So, again, you can set this default during the initialization of your
program, before any incoming requests need to make use of one parser
or another.
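
A short sketch of the definition-time behavior, assuming the usual
default of ' \n\t\r'. An expression created before the default changes
keeps the whitespace chars it was created with. The names before,
after and skips_leading_newline are hypothetical:

```python
from pyparsing import ParserElement, ParseException, Word, alphas

before = Word(alphas)                     # picks up the default in effect now

ParserElement.setDefaultWhitespaceChars(' \t\r')
after = Word(alphas)                      # picks up the new default

# Restore the usual default so later definitions are unaffected.
ParserElement.setDefaultWhitespaceChars(' \n\t\r')

def skips_leading_newline(expr):
    # True if the expression silently skips a newline before the word.
    try:
        expr.parseString("\nhello")
        return True
    except ParseException:
        return False

print(skips_leading_newline(before))
print(skips_leading_newline(after))
```

Restoring the default afterwards does not change either expression,
since both were already defined by then.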


> > 4) leaveempty probably needs this parse action to be attached to it:
>
> >     leaveempty = Literal('EMPTY').setParseAction(replaceWith('<EMPTY>'))
>
> I added this in the meantime. replaceWith is really a handy helper.

After I released replaceWith, I received a parser from someone who
hadn't read down to the 'R's yet in the documentation, and he
implemented the same thing with this simple format:

     leaveempty = Literal('EMPTY').setParseAction(lambda : '<EMPTY>')

These are pretty much equivalent; I was just struck by how easy Python
makes things for us, too!
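
For a side-by-side check, this sketch attaches each form to its own
Literal and parses the same input. The variable names with_helper and
with_lambda are made up for the example:

```python
from pyparsing import Literal, replaceWith

# Parse action built with the replaceWith helper.
with_helper = Literal('EMPTY').setParseAction(replaceWith('<EMPTY>'))

# The same effect with a zero-argument lambda as the parse action.
with_lambda = Literal('EMPTY').setParseAction(lambda: '<EMPTY>')

print(with_helper.parseString('EMPTY').asList())
print(with_lambda.parseString('EMPTY').asList())
```

Both expressions replace the matched token with the same string.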


> > If you have more pyparsing questions, you can also post them on the
> > pyparsing wiki - the Discussion tab on the wiki Home page has become a
> > running support forum - and there is also a Help/Discussion mailing
> > list.
>
> Which of these two would you prefer?
>

They are equivalent; I monitor them both, and you can browse previous
discussions either in the Discussion tab's online threads or in the
mailing list archive on SF.  Use whichever is easier for you to work
with.

Cheers, and Welcome to Pyparsing!
-- Paul


