Regular expression help

nclbndk759 at googlemail.com
Fri Jul 18 10:40:02 EDT 2008


On Jul 18, 3:35 pm, Nick Dumas <drako... at gmail.com> wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> I think you're over-complicating this. I'm assuming that you're going to
> do a line graph of some sort, and each new line of the file contains a
> new set of data.
>
> The problem you mentioned, with your regex returning a match object
> rather than a string, arises because match() returns a match object,
> not the matched text. re.findall() is what you want: it returns a list
> of the matched strings. That being said, here is working code to mine
> data from your file.
>
> [code]
> import re
>
> line = ('c afrac=.7 mmom=0 sev=-9.56646 erep=0 etot=-11.020107 '
>         'emad=-3.597647 3pv=0')
>
> energypat = r'\betot=(-?\d*?[.]\d*)'
>
> #Note: To change the data grabbed from the line, you can change the
> #'etot' to 'afrac' or 'emad' or anything that doesn't contain a regex
> #special character.
>
> energypat = re.compile(energypat)
>
> re.findall(energypat, line)  # returns a list of strings: ['-11.020107']
>
> [/code]
>
> This returns a list of strings, and each element is easy enough to
> convert to a float with float(). After that, you can
> data_points.append() to your heart's content. Good luck with your work.
>
>
>
> nclbndk... at googlemail.com wrote:
> > Hello,
>
> > I am new to Python, with a background in scientific computing. I'm
> > trying to write a script that will take a file with lines like
>
> > c afrac=.7 mmom=0 sev=-9.56646 erep=0 etot=-11.020107 emad=-3.597647
> > 3pv=0
>
> > extract the values of afrac and etot and plot them. I'm really
> > struggling with getting the values of afrac and etot. So far I have
> > come up with (small snippet of script just to get the energy, etot):
>
> > def get_data_points(filename):
> >     file = open(filename,'r')
> >     data_points = []
> >     while 1:
> >         line = file.readline()
> >         if not line: break
> >         energy = get_total_energy(line)
> >         data_points.append(energy)
> >     return data_points
>
> > def get_total_energy(line):
> >     rawstr = r"""(?P<key>.*?)=(?P<value>.*?)\s"""
> >     p = re.compile(rawstr)
> >     return p.match(line,5)
>
> > What is being stored in energy is '<_sre.SRE_Match object at
> > 0x2a955e4ed0>', not '-11.020107'. Why? I've been struggling with
> > regular expressions for two days now, with no luck. Could someone
> > please put me out of my misery and give me a clue as to what's going
> > on? Apologies if it's blindingly obvious or if this question has been
> > asked and answered before.
>
> > Thanks,
>
> > Nicole
>
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.9 (MingW32)
> Comment: Using GnuPG with Mozilla -http://enigmail.mozdev.org
>
> iEYEARECAAYFAkiAqiAACgkQLMI5fndAv9h7HgCfU6a7v1nE5iLYcUPbXhC6sfU7
> mpkAn1Q/DyOI4Zo7QJhF9zqfqCq6boXv
> =L2VZ
> -----END PGP SIGNATURE-----

Thanks guys :-)
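
For anyone finding this thread later, here is a minimal sketch of how
both afrac and etot might be pulled out and converted to floats. The
helper name get_value and the NUMBER pattern are my own inventions, not
part of the code quoted above, and I am assuming every value of interest
is a plain decimal number (no exponents).

```python
import re

# Assumed number format: optional sign, then digits with an optional
# fractional part, or a bare fraction such as ".7".
NUMBER = r'-?(?:\d+\.?\d*|\.\d+)'

def get_value(line, key):
    """Return the value following 'key=' on the line as a float, or None."""
    m = re.search(r'\b' + re.escape(key) + r'=(' + NUMBER + r')', line)
    # re.search returns a match object (or None when nothing matches);
    # .group(1) is the text captured by the parentheses.
    return float(m.group(1)) if m else None

def get_data_points(lines):
    """Collect (afrac, etot) pairs from an iterable of lines."""
    points = []
    for line in lines:
        afrac = get_value(line, 'afrac')
        etot = get_value(line, 'etot')
        # Skip lines that are missing either value.
        if afrac is not None and etot is not None:
            points.append((afrac, etot))
    return points

sample = ['c afrac=.7 mmom=0 sev=-9.56646 erep=0 etot=-11.020107 '
          'emad=-3.597647 3pv=0']
print(get_data_points(sample))
```

Since an open file is itself an iterable of lines, get_data_points
should accept the file object directly in place of the sample list.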


