[Tutor] regex and parsing through a semi-csv file

Wayne Werner waynejwerner at gmail.com
Wed Oct 5 22:24:05 CEST 2011


On Wed, Oct 5, 2011 at 1:12 PM, Mina Nozar <nozarm at triumf.ca> wrote:

> <snip>
> If there is a more graceful way of doing this, please let me know as well.
>  I am new to python...
>

I just glanced through your email, but my initial thought would be to use a
regex to collect the entire segment that you're looking for, and then
string methods to split it up:

import re

# Capture everything from "AC,225" up to the next run of 1-2 capital letters
pat = re.compile('({name},{number}.*?)[A-Z]{{1,2}}'.format(name='AC', number='225'),
                 re.DOTALL)

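raw_data = re.search(pat, f.read())
if raw_data is None:
    # we didn't find the isotope, so take appropriate action (quit or tell
    # the user), for example:
    raise SystemExit('isotope not found')
else:
    # group(1) is the captured segment; .string would be the whole input
    raw_data = raw_data.group(1).strip().split('\n')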

Then it depends on how you want to process your data, but you could easily
use list comprehensions/generator expressions.

The most terse syntax I know of:

data = [[float(x) for x in d.split(',')]
        for d in raw_data if d[0].isdigit()]

Which is basically the equivalent of:

data = []
for d in raw_data:
    if d[0].isdigit():
        floats = []
        for x in d.split(','):
            floats.append(float(x))  # convert each field, as in the comprehension
        data.append(floats)

data will then contain a list of 3-element lists of floating point values.
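
Putting the pieces above together, here is a rough end-to-end sketch. The
sample text and the function name extract_rows are made up for illustration
(the real file layout was snipped from this thread), and the \Z alternative is
my own addition so that the last isotope in a file still matches. Note that a
capital letter inside the data itself (for example 1.0E-5) would end the match
early with this pattern.

```python
import re

# Hypothetical sample text; the real file layout is an assumption.
SAMPLE = """\
AC,225
1.0,2.0,3.0
4.0,5.0,6.0
BI,213
7.0,8.0,9.0
"""


def extract_rows(text, name, number):
    """Return the numeric rows following the 'name,number' line, or None."""
    pat = re.compile(
        r'({name},{number}.*?)(?:[A-Z]{{1,2}}|\Z)'.format(
            name=re.escape(name), number=re.escape(number)),
        re.DOTALL)
    match = pat.search(text)
    if match is None:
        return None
    lines = match.group(1).strip().split('\n')
    # line[:1] (rather than line[0]) also tolerates empty lines
    return [[float(x) for x in line.split(',')]
            for line in lines if line[:1].isdigit()]


rows = extract_rows(SAMPLE, 'AC', '225')
```

With this sample, rows holds the two numeric lines under AC,225, and asking for
an isotope that is not in the text returns None.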

If you want to "rotate" the list, you can do data = list(zip(*data)). To
illustrate:

>>> d = [['a', 'b', 'c'], ['a', 'b', 'c'], ['a', 'b', 'c'], ['a', 'b', 'c']]
>>> d = list(zip(*d))
>>> d
[('a', 'a', 'a', 'a'), ('b', 'b', 'b', 'b'), ('c', 'c', 'c', 'c')]
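
The same trick lets you pull each column out into its own name. A small
sketch, with made-up numbers standing in for the parsed rows (the variable
names are just placeholders):

```python
# Hypothetical rows, shaped like the list of 3-element lists described above.
data = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]

# zip(*data) groups the i-th element of every row, i.e. it yields columns.
first, second, third = zip(*data)

# Zipping the columns back together restores the rows (as tuples).
rows_again = list(zip(first, second, third))
```

Each column comes back as a tuple, so wrap it in list() if you need to modify
it afterwards.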

HTH,
Wayne