Regular expression help

Gerard Flanagan grflanagan at gmail.com
Fri Jul 18 10:14:49 EDT 2008


nclbndk759 at googlemail.com wrote:
> Hello,
> 
> I am new to Python, with a background in scientific computing. I'm
> trying to write a script that will take a file with lines like
> 
> c afrac=.7 mmom=0 sev=-9.56646 erep=0 etot=-11.020107 emad=-3.597647 3pv=0
> 
> extract the values of afrac and etot and plot them. I'm really
> struggling with getting the values of afrac and etot. So far I have
> come up with (small snippet of script just to get the energy, etot):
> 
> def get_data_points(filename):
>     file = open(filename,'r')
>     data_points = []
>     while 1:
>         line = file.readline()
>         if not line: break
>         energy = get_total_energy(line)
>         data_points.append(energy)
>     return data_points
> 
> def get_total_energy(line):
>     rawstr = r"""(?P<key>.*?)=(?P<value>.*?)\s"""
>     p = re.compile(rawstr)
>     return p.match(line,5)
> 
> What is being stored in energy is '<_sre.SRE_Match object at
> 0x2a955e4ed0>', not '-11.020107'. Why? 



1. Consider using the 'split' method on each line rather than regexes.
2. Your code compiles the regex for every line in the file. Lift it
out of the 'get_total_energy' function so that the compilation is done
only once.
3. 'p.match' returns a Match object (or None if nothing matches), not
the matched text, which is why a Match object is what ends up in
'energy'. A Match object has 'group' and 'groups' methods, which are
what you need to retrieve the data.
4. Also look at the 'findall' method:

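import re

data = 'c afrac=.7 mmom=0 sev=-9.56646 erep=0 etot=-11.020107 emad=-3.597647 3pv=0 '

# findall returns one (key, value) tuple per 'key=value' token
rx = re.compile(r'(\w+)=(\S+)')

# the values are still strings at this point, e.g. fields['etot'] == '-11.020107'
fields = dict(rx.findall(data))

print(fields)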
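
For what it's worth, here is a rough sketch of how points 1 to 3 might fit together for your afrac/etot case. The names ETOT_RX and parse_line_split are made up for illustration, and I'm assuming every field of interest appears as a whitespace-separated 'key=value' token, as in your sample line. Note that it uses 'search' rather than 'match', since 'match' only tries the pattern at the given start position.

```python
import re

# Compiled once at module level instead of inside the function (point 2).
# ETOT_RX is a hypothetical name; the pattern assumes an 'etot=<value>' token.
ETOT_RX = re.compile(r'\betot=(\S+)')


def get_total_energy(line):
    """Return the etot value of one line as a float, or None if absent."""
    m = ETOT_RX.search(line)   # search scans the whole line
    if m is None:
        return None
    return float(m.group(1))   # group(1) is the captured text (point 3)


def parse_line_split(line):
    """Regex-free variant (point 1): split on whitespace, then on '='."""
    pairs = (tok.split('=', 1) for tok in line.split() if '=' in tok)
    return dict(pairs)


def get_data_points(lines):
    """Collect (afrac, etot) float pairs from an iterable of lines."""
    points = []
    for line in lines:
        fields = parse_line_split(line)
        if 'afrac' in fields and 'etot' in fields:
            points.append((float(fields['afrac']), float(fields['etot'])))
    return points


sample = ('c afrac=.7 mmom=0 sev=-9.56646 erep=0 '
          'etot=-11.020107 emad=-3.597647 3pv=0')
```

Since get_data_points accepts any iterable of lines, you could pass it an open file object directly; lines lacking either field are skipped.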

hth

G.



