splitting a string into 2 new strings

Andrew Dalke adalke at mindspring.com
Wed Jul 2 15:17:13 EDT 2003


trp:
> I'm assuming that these are chemical compounds, so you're not limited to
> one-character symbols.

The problem is underspecified.  Multi-character symbols are usually 2
characters long (or 3 for some elements with high atomic number, if you
are not assuming the newer IUPAC names like "Dubnium", which was also
called Unnilpentium (Unp) or, depending on your political persuasion,
Joliotium (Jl) or Hahnium (Ha)).  They have the first letter capitalized
and the rest in lower case.

> re_pat = re.compile('([A-Z]+)(\d+)')

So this should be written ([A-Z][A-Za-z]*)(\d+), where I explicitly allow
both lower and upper case trailing letters to be more accepting.  (In some
systems, "CU" is "1 carbon + 1 uranium" and in others it's an alternate way
to write "1 copper".  Though I suspect it's not allowed in the OP's
problem.)
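
Here is a minimal sketch of how that pattern could be used to split one
"symbol + count" term into its two parts.  The helper name split_term and
the choice to reject anything that is not exactly one symbol followed by
digits are my assumptions, not part of the OP's problem statement.

```python
import re

# One capital letter, then any number of letters in either case,
# then one or more digits for the count.
re_pat = re.compile(r'([A-Z][A-Za-z]*)(\d+)')

def split_term(text):
    """Split a term such as 'Unp2' into (symbol, count).

    split_term is a hypothetical helper for illustration only.
    """
    m = re_pat.fullmatch(text)
    if m is None:
        raise ValueError("not a symbol followed by a count: %r" % (text,))
    symbol, count = m.groups()
    return symbol, int(count)

print(split_term("Cu2"))    # ('Cu', 2)
print(split_term("CU2"))    # ('CU', 2), accepted as a single symbol
print(split_term("Unp12"))  # ('Unp', 12)
```

Note that the accepting pattern treats "CU2" as one symbol "CU", so it
cannot express the "1 carbon + 1 uranium" reading.  A system that needs
that reading would have to use a stricter pattern.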

                    Andrew
                    dalke at dalkescientific.com





