regular expression to extract text
Peter Hansen
peter at engcorp.com
Thu Nov 20 10:37:23 EST 2003
Mark Light wrote:
>
> Hi I have a file read in as a string that looks like below. What I want to
> do is pull out the bits of information to eventually put in an html table.
> For the 1st example the 3 bits are:
> 1. QEXZUO
> 2. C26 H31 N1 O3
> 3. 6.164 15.892 22.551 90.00 90.00 90.00
>
> Any ideas of the best way to do this - I was trying regular expressions but
> not getting very far.
>
> Thanks,
>
> Mark.
>
> """
> Using unit cell orientation matrix from collect.rmat
> NOTICE: Performing automatic cell standardization
> The following database entries have similar unit cells:
> Refcode Sumformula
> <Conventional cell parameters>
> ------------------------------------------
> QEXZUO C26 H31 N1 O3
> 6.164 15.892 22.551 90.00 90.00 90.00
> ------------------------------------------
> ARQTYD C19 H23 N1 O5
> 6.001 15.227 22.558 90.00 90.00 90.00
> ------------------------------------------
> NHDIIS C45 H40 Cl2
> 6.532 15.147 22.453 90.00 90.00 90.00 """
I don't think you've given enough information here. Are those
"bits" supposed to be kept intact, complete with internal spacing,
or are you doing more manipulation of them? What is the definition
of the "bits"? Specifically, is bit 1 "the first non-space token
after a line of hyphens"? Is bit 2 "everything on the line after
bit 1, with leading and trailing spaces stripped"? Is bit 3
"everything on the following line, with leading/trailing spaces
stripped"?
Those definitions roughly fit what you describe. If that's all
you need, the solution should be pretty trivial, with no need for
regular expressions, which would be overkill in this case.
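Here is a minimal sketch of that non-regex approach. It assumes the three definitions above hold: bit 1 is the first token after a line of hyphens, bit 2 is the rest of that line, and bit 3 is the whole following line, each stripped. The function name `extract_entries` is hypothetical. The sample is the data from the original post with the `>` quote markers removed. The code uses only `str.splitlines`, `str.strip` and `str.split`.

```python
def extract_entries(text):
    """Return a list of (refcode, formula, cell) tuples.

    Assumes (hypothetically) the layout shown in the post: each entry
    starts on the line right after a line of hyphens.
    """
    lines = text.splitlines()
    entries = []
    for i, line in enumerate(lines):
        stripped = line.strip()
        # A separator is a non-empty line made only of hyphens.
        if not stripped or set(stripped) != {"-"}:
            continue
        # An entry needs two more lines after the separator.
        if i + 2 >= len(lines):
            continue
        header = lines[i + 1].strip()
        if not header:
            continue
        # Bit 1: first token. Bit 2: rest of the line, stripped.
        parts = header.split(None, 1)
        refcode = parts[0]
        formula = parts[1].strip() if len(parts) > 1 else ""
        # Bit 3: the whole following line, stripped.
        cell = lines[i + 2].strip()
        entries.append((refcode, formula, cell))
    return entries


sample = """\
Using unit cell orientation matrix from collect.rmat
NOTICE: Performing automatic cell standardization
The following database entries have similar unit cells:
Refcode Sumformula
<Conventional cell parameters>
------------------------------------------
QEXZUO C26 H31 N1 O3
6.164 15.892 22.551 90.00 90.00 90.00
------------------------------------------
ARQTYD C19 H23 N1 O5
6.001 15.227 22.558 90.00 90.00 90.00
------------------------------------------
NHDIIS C45 H40 Cl2
6.532 15.147 22.453 90.00 90.00 90.00
"""

entries = extract_entries(sample)
for refcode, formula, cell in entries:
    print(refcode, "|", formula, "|", cell)
```

The first tuple is ("QEXZUO", "C26 H31 N1 O3", "6.164 15.892 22.551 90.00 90.00 90.00"). The tuples could then be fed into whatever builds the HTML table. If the real output deviates from these definitions (for example, blank lines between an entry's two lines), the index arithmetic would need adjusting.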