Regular Expression

Marc 'BlackJack' Rintsch bj_666 at gmx.net
Mon Oct 22 18:56:03 EDT 2007


On Mon, 22 Oct 2007 22:29:38 +0000, patrick.waldo wrote:

> I'm trying to learn regular expressions, but I am having trouble with
> this.  I want to search a document that has mixed data; however, the
> last line of every entry has something like C5H4N4O3 or CH5N3.ClH.
> All of the letters are upper case and there will always be numbers and
> possibly one .
> 
> However, the code below only gave me None.
> 
> […]
>
> test = re.compile('\u+\d+\.')

There is no '\u'.  'u' doesn't have a special meaning, so the '\' is
pointless.  Your expression matches one or more lowercase 'u's followed by
one or more digits followed by a period.  Examples are 'u1.',
'uuuuuuuu42.', etc.
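
A quick way to check that reading, as a minimal sketch: the pattern is
written here as r'u+\d+\.' on the assumption that it is equivalent to the
original once the pointless backslash is dropped (a current Python 3 rejects
the string literal '\u+\d+\.' outright, so the original cannot be pasted
as-is).

```python
import re

# Assumed equivalent of the original pattern, minus the backslash before 'u'.
test = re.compile(r'u+\d+\.')

# The first two strings are the examples from the text; the last two are
# the formulas from the question.
for text in ('u1.', 'uuuuuuuu42.', 'C5H4N4O3', 'CH5N3.ClH'):
    # search() returns a match object, or None if nothing matches.
    print(text, test.search(text))
```

Neither formula contains a lowercase 'u', so search() returns None for both.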

An expression that matches your first example would be: r'([A-Z]|\d|\.)+'.
That's a non-empty sequence of upper case letters, digits and periods.  To
limit this to just one optional period the expression gets a little
longer: r'([A-Z]|\d)+\.?([A-Z]|\d)+'

Neither expression matches your second example in full, because there is a
lower case letter in it.
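
To try the two expressions above against both examples, something like the
following should do.  It is only a sketch: the names any_periods,
one_period and matches_whole are made up for illustration, and
fullmatch() needs Python 3.4 or later.

```python
import re

# The two expressions from the text above.
any_periods = re.compile(r'([A-Z]|\d|\.)+')
one_period = re.compile(r'([A-Z]|\d)+\.?([A-Z]|\d)+')


def matches_whole(pattern, text):
    """Return True if the pattern matches the entire string."""
    return pattern.fullmatch(text) is not None


# The two formulas from the question.
for formula in ('C5H4N4O3', 'CH5N3.ClH'):
    print(formula,
          matches_whole(any_periods, formula),
          matches_whole(one_period, formula))
```

This prints True for both expressions on 'C5H4N4O3' and False for both on
'CH5N3.ClH', where the lower case 'l' stops the match.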

Ciao,
	Marc 'BlackJack' Rintsch


