Regular expression gone mad

Mon Feb 20 10:50:59 EST 2006

"fileexit" <mayyash at gmail.com> wrote in message
news:1140432942.392985.148820 at z14g2000cwz.googlegroups.com...
> Hi,
> Would someone please tell me what is going on here??!! Why does the
> following code work
>
> >>> a=r"Mem"
> >>> pat = re.compile(a)
> >>> m=pat.search(ProcMem, re.DOTALL)
> >>> m
> <_sre.SRE_Match object at 0xb7f7eaa0>
> >>> m.group(0)
> 'Mem'
>
>
> ProcMem contains:
>
> >>> print ProcMem
> MemTotal:      8247952 kB
> MemFree:       5980920 kB
> Buffers:        417044 kB
> Cached:         703036 kB
> SwapCached:          0 kB
> Active:        1440136 kB
> Inactive:       370668 kB
> HighTotal:     7405512 kB
> HighFree:      5977600 kB
> LowTotal:       842440 kB
> LowFree:          3320 kB
> SwapTotal:     8339440 kB
> SwapFree:      8339296 kB
> Dirty:              96 kB
> Writeback:           0 kB
> Mapped:         786672 kB
> Slab:           359208 kB
> Committed_AS:  2453912 kB
> PageTables:      24696 kB
> VmallocTotal:   106488 kB
> VmallocUsed:      8700 kB
> VmallocChunk:    96708 kB
> HugePages_Total:     0
> HugePages_Free:      0
> Hugepagesize:     2048 kB
>
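
A side note on the quoted session, since the question as quoted is cut off: on a compiled pattern, the second positional argument of search() is the start position (pos), not a flags value. re.DOTALL is the integer 16, so pat.search(ProcMem, re.DOTALL) quietly starts looking at index 16. A minimal sketch of the effect, using a shortened, made-up sample string rather than the full ProcMem text:

```python
import re

# Shortened, made-up sample in the same layout as the quoted data
sample = "MemTotal:      8247952 kB\nMemFree:       5980920 kB\n"

# On a compiled pattern the second argument is pos, and re.DOTALL == 16,
# so this search starts at index 16 and skips the first "Mem"
hit = re.compile(r"Mem").search(sample, re.DOTALL)
print(hit.start())

# A pattern that can only match at the very start is missed entirely
print(re.compile(r"MemTotal").search(sample, re.DOTALL))

# Flags belong in re.compile(); search() then scans from index 0
pat = re.compile(r"MemTotal.*MemFree", re.DOTALL)
print(pat.search(sample).group(0)[:8])
```

Whether this is what tripped up the original poster is a guess, but it would explain a short pattern like "Mem" matching while a pattern anchored to the first line does not.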

Are you going to create re's for every line of that data?  Here's a
pyparsing version that generates the grammar for you (or you can modify this
example to generate re's if you prefer).

This is an unusual form for pyparsing. Typically, people construct an And in
their grammar by connecting expressions with '+' operators (as in the
procMemEntry function in the example).  Here, though, we generate the list
of entry expressions first, and then create the And directly from that
list.

Download pyparsing at http://pyparsing.sourceforge.net.

-- Paul

procMemData = """
MemTotal:      8247952 kB
MemFree:       5980920 kB
Buffers:        417044 kB
Cached:         703036 kB
SwapCached:          0 kB
Active:        1440136 kB
Inactive:       370668 kB
HighTotal:     7405512 kB
HighFree:      5977600 kB
LowTotal:       842440 kB
LowFree:          3320 kB
SwapTotal:     8339440 kB
SwapFree:      8339296 kB
Dirty:              96 kB
Writeback:           0 kB
Mapped:         786672 kB
Slab:           359208 kB
Committed_AS:  2453912 kB
PageTables:      24696 kB
VmallocTotal:   106488 kB
VmallocUsed:      8700 kB
VmallocChunk:    96708 kB
HugePages_Total:     0
HugePages_Free:      0
Hugepagesize:     2048 kB
"""
from pyparsing import Word,nums,Literal,Suppress,Group,And,Dict

# define an integer, and an integer kB value
integer = Word(nums).setParseAction(lambda s,l,t:int(t[0]))
numKb = integer + Suppress("kB")

# convenience method for extracting procMem entries
def procMemEntry(name, valExpr):
    return Group(Literal(name) + Suppress(":") + valExpr).setName(name)

# extract names from sample procMem data, and create list of matching value
# expressions
names = [l.split(':')[0] for l in procMemData.split('\n') if l]
exprs = [n.startswith("HugePages_") and integer or numKb for n in names]

# generate grammar using names and exprs lists
procMemGrammar = Dict(And([procMemEntry(nam, expr)
                           for nam, expr in zip(names, exprs)]))

# check grammar by parsing input string
pmData = procMemGrammar.parseString(procMemData)

# access pmData as a dict
for k in pmData.keys():
    print("%s %s" % (k, pmData[k]))

# or create a standard Python dict from pmData
print(dict(pmData))
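
The reply mentions that the same approach could generate re's instead of a pyparsing grammar. Here is a minimal sketch of that variant, using only the standard re module. The names proc_mem_data, entry_pattern and proc_mem_re are made up for this illustration, and the sample is trimmed to four lines to keep it short.

```python
import re

# Trimmed sample; the real data has one such line per field
proc_mem_data = """\
MemTotal:      8247952 kB
MemFree:       5980920 kB
HugePages_Total:     0
Hugepagesize:     2048 kB
"""

# Pull the field names out of the sample, as the pyparsing version does
names = [line.split(":")[0] for line in proc_mem_data.splitlines() if line]

def entry_pattern(name):
    # HugePages_* counters carry no unit; every other entry ends in kB
    unit = "" if name.startswith("HugePages_") else r"[ \t]+kB"
    # One named group per field, so the match can be read back by name
    return r"%s:[ \t]*(?P<%s>\d+)%s[ \t]*\n" % (re.escape(name), name, unit)

# Concatenate the per-line patterns into one regex, in input order
proc_mem_re = re.compile("".join(entry_pattern(n) for n in names))

m = proc_mem_re.match(proc_mem_data)
values = dict((k, int(v)) for k, v in m.groupdict().items())
print(values)
```

Like the And in the grammar above, the combined pattern expects the fields in the same order as the sample. It also relies on every field name being a valid group name, which holds for the names shown here.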