Regular expression help
Gerard Flanagan
grflanagan at gmail.com
Fri Jul 18 10:14:49 EDT 2008
nclbndk759 at googlemail.com wrote:
> Hello,
>
> I am new to Python, with a background in scientific computing. I'm
> trying to write a script that will take a file with lines like
>
> c afrac=.7 mmom=0 sev=-9.56646 erep=0 etot=-11.020107 emad=-3.597647 3pv=0
>
> extract the values of afrac and etot and plot them. I'm really
> struggling with getting the values of afrac and etot. So far I have
> come up with (small snippet of script just to get the energy, etot):
>
> def get_data_points(filename):
>     file = open(filename,'r')
>     data_points = []
>     while 1:
>         line = file.readline()
>         if not line: break
>         energy = get_total_energy(line)
>         data_points.append(energy)
>     return data_points
>
> def get_total_energy(line):
>     rawstr = r"""(?P<key>.*?)=(?P<value>.*?)\s"""
>     p = re.compile(rawstr)
>     return p.match(line,5)
>
> What is being stored in energy is '<_sre.SRE_Match object at
> 0x2a955e4ed0>', not '-11.020107'. Why?
1. Consider using the 'split' method on each line rather than regexes.
2. In your code you are compiling the regex for every line in the file;
you should lift it out of the 'get_total_energy' function so that the
compilation is only done once.
3. 'match' returns a Match object (or None if nothing matched), not the
matched text, which is why you see '<_sre.SRE_Match object at ...>'. A
Match object has a 'groups' method, which is what you need to retrieve
the data.
4. Also look at the 'findall' method:
import re

data = 'c afrac=.7 mmom=0 sev=-9.56646 erep=0 etot=-11.020107 emad=-3.597647 3pv=0 '
rx = re.compile(r'(\w+)=(\S+)')
# findall returns a list of (key, value) tuples; dict() turns it into a mapping
data = dict(rx.findall(data))
print(data)
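
For points 1 to 3, something along these lines might do. This is only a rough sketch: 'parse_line_split' and 'ETOT_RX' are names made up here, and it assumes every field looks like key=value with no spaces inside it.

```python
import re

# Point 2: compile once at module level, not once per line.
# ETOT_RX is a hypothetical name; the pattern assumes 'etot=<non-space chars>'.
ETOT_RX = re.compile(r'\betot=(\S+)')


def parse_line_split(line):
    """Point 1: no regex, just split on whitespace and then on the first '='."""
    pairs = {}
    for token in line.split():
        if '=' in token:
            key, value = token.split('=', 1)
            pairs[key] = value
    return pairs


def get_total_energy(line):
    """Point 3: pull the text out of the Match object instead of returning it."""
    m = ETOT_RX.search(line)
    if m is None:
        return None  # no etot field on this line
    return float(m.group(1))
```

With the sample line from the question, parse_line_split(line)['afrac'] gives the string '.7', and get_total_energy(line) gives a float that get_data_points can append directly.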
hth
G.