pyparsing: match empty line

Marek Kubica marek at xivilization.net
Tue Sep 2 12:38:10 EDT 2008


Hi,

I am trying to get a pyparsing grammar working, but I keep failing.

I have a format which consists of three elements:
\d{4}M?-\d (four digits, an optional M, a dash, one more digit)
EMPTY (the <EMPTY> token)
[Empty line] (the <PAGEBREAK> token. The line may contain whitespace,
but nothing else)

While ``watchname`` and ``leaveempty`` were trivial, I cannot get
``pagebreak`` to work properly.

#!/usr/bin/env python
# -*- coding: UTF-8 -*-

from pyparsing import (Word, Literal, Optional, Group, OneOrMore, Regex,
        Combine, ParserElement, nums, LineStart, LineEnd, White,
        replaceWith)

ParserElement.setDefaultWhitespaceChars(' \t\r')

watchseries = Word(nums, exact=4)
watchrev = Word(nums, exact=1)

watchname = Combine(watchseries + Optional('M') + '-' + watchrev)

leaveempty = Literal('EMPTY').setParseAction(replaceWith('<EMPTY>'))

def breaks(s, loc, tokens):
    # Debugging aid: show what the expression matched.
    print(repr(tokens[0]))
    #return ['<PAGEBREAK>' for token in tokens[0]]
    return ['<PAGEBREAK>']

#pagebreak = Regex('^\s*$').setParseAction(breaks)
pagebreak = LineStart() + LineEnd().setParseAction(
        replaceWith('<PAGEBREAK>'))

parser = OneOrMore(watchname ^ pagebreak ^ leaveempty)

tests = [
    "2134M-2",
    """3245-3
    3456M-5""",
    """3256-4

    4563-4""",
    """4562M-6
     EMPTY
    3246-5"""
]

for test in tests:
    print(parser.parseString(test))

The output should be:
['2134M-2']
['3245-3', '3456M-5']
['3256-4', '<PAGEBREAK>', '4563-4']
['4562M-6', '<EMPTY>', '3246-5']
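
In case it helps, here is a rough sketch of the behaviour I am after,
written with plain ``re`` instead of pyparsing and working line by line.
The names ``WATCHNAME`` and ``tokenize`` are made up for this
illustration only, and the sketch assumes that every line holds exactly
one element:

```python
import re

# Hypothetical names; only the three line formats come from the
# description above.
WATCHNAME = re.compile(r'\d{4}M?-\d$')


def tokenize(text):
    """Return one token per line of ``text``."""
    tokens = []
    for line in text.split('\n'):
        stripped = line.strip(' \t\r')
        if stripped == '':
            # A line holding nothing but whitespace is a page break.
            tokens.append('<PAGEBREAK>')
        elif stripped == 'EMPTY':
            tokens.append('<EMPTY>')
        elif WATCHNAME.match(stripped):
            tokens.append(stripped)
        else:
            raise ValueError('unexpected line: %r' % line)
    return tokens


for test in ["2134M-2", "3256-4\n\n    4563-4"]:
    print(tokenize(test))
```

Note that in this sketch a trailing newline at the end of the input
counts as one more empty line, so it would add a final '<PAGEBREAK>'.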

Thanks in advance!
regards,
Marek


