regular expression to extract text
Mark Light
light at soton.ac.uk
Thu Nov 20 10:52:04 EST 2003
"Peter Hansen" <peter at engcorp.com> wrote in message
news:3FBCDFB3.E01417E1 at engcorp.com...
> Mark Light wrote:
> >
> > Hi, I have a file read in as a string that looks like the one below. What I
> > want to do is pull out the bits of information to eventually put in an HTML
> > table. For the 1st example the 3 bits are:
> > 1. QEXZUO
> > 2. C26 H31 N1 O3
> > 3. 6.164 15.892 22.551 90.00 90.00 90.00
> >
> > Any ideas of the best way to do this? I was trying regular expressions but
> > not getting very far.
> >
> > Thanks,
> >
> > Mark.
> >
> > """
> > Using unit cell orientation matrix from collect.rmat
> > NOTICE: Performing automatic cell standardization
> > The following database entries have similar unit cells:
> > Refcode Sumformula
> > <Conventional cell parameters>
> > ------------------------------------------
> > QEXZUO C26 H31 N1 O3
> > 6.164 15.892 22.551 90.00 90.00 90.00
> > ------------------------------------------
> > ARQTYD C19 H23 N1 O5
> > 6.001 15.227 22.558 90.00 90.00 90.00
> > ------------------------------------------
> > NHDIIS C45 H40 Cl2
> > 6.532 15.147 22.453 90.00 90.00 90.00 """
>
> I don't think you've given enough information here. Are those
> "bits" supposed to be kept intact, complete with internal spacing,
> or are you doing more manipulation of them? What is the definition
> of the "bits"? Specifically, is bit 1 "the first non-space token
> after a line of hyphens"? Is bit 2 "everything on the line after
> bit 1, with leading and trailing spaces stripped"? Is bit 3
> "everything on the following line, with leading/trailing spaces
> stripped"?
>
> Those definitions roughly fit what you describe, and if that's
> all you need, the solution should be pretty trivial, without
> having to use regular expressions which would be overkill in this
> case.
Sorry for being inexact - the definitions you proposed do fit the bill.
Mark.
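
For reference, here is a minimal sketch of the three definitions agreed on above, using plain string methods rather than regular expressions. It assumes the output has already been read into a string and that every entry starts on the line after a line made up only of hyphens. The names extract_entries and SAMPLE are illustrative and do not come from the original posts.

```python
# Sample search output, copied from the original post (header lines trimmed).
SAMPLE = """\
The following database entries have similar unit cells:
Refcode Sumformula
<Conventional cell parameters>
------------------------------------------
QEXZUO C26 H31 N1 O3
6.164 15.892 22.551 90.00 90.00 90.00
------------------------------------------
ARQTYD C19 H23 N1 O5
6.001 15.227 22.558 90.00 90.00 90.00
------------------------------------------
NHDIIS C45 H40 Cl2
6.532 15.147 22.453 90.00 90.00 90.00"""


def extract_entries(text):
    """Return a list of (refcode, formula, cell) tuples.

    bit 1: first non-space token after a line of hyphens
    bit 2: the rest of that line, stripped
    bit 3: the whole following line, stripped
    """
    lines = [line.strip() for line in text.splitlines()]
    entries = []
    # Stop two lines early so lines[i + 2] always exists.
    for i, line in enumerate(lines[:-2]):
        # A separator is a non-empty line containing nothing but hyphens.
        if not line or set(line) != {"-"}:
            continue
        parts = lines[i + 1].split(None, 1)
        if not parts:
            continue  # blank line after the separator, nothing to extract
        refcode = parts[0]
        formula = parts[1] if len(parts) > 1 else ""
        cell = lines[i + 2]
        entries.append((refcode, formula, cell))
    return entries


for refcode, formula, cell in extract_entries(SAMPLE):
    print(refcode, "|", formula, "|", cell)
```

Each tuple maps directly onto one row of the HTML table. Internal spacing within the formula and the cell parameters is kept as it appears in the input.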