regular expression

Sun May 19 07:15:53 EDT 2002

Sean 'Shaleh' Perry <shalehperry at attbi.com> wrote:
> On 18-May-2002 Batara Kesuma wrote:

>> Hi Sean,

>> On Sat, 18 May 2002 10:46:19 -0700 (PDT)
>> "Sean 'Shaleh' Perry" <shalehperry at attbi.com> wrote:

>>> you are very close to what you need.

>>> rule = re.compile(r'^\d{6}$') # ^ means start of string, then \d{6} is
>>>                               # 6 digits, then $ is end of string.

>> Thank you very much. But what does the 'r' in (r'^\d{6}$') mean?

> r'' is a 'raw' string: backslashes in it are taken literally, so they
> generally do not need to be escaped.
> If you did not use the r'' syntax the above call would have been:
> rule = re.compile('^\\d{6}$') # note the escaped backslash.
> If you have a more complex regex all of the escaping makes it hard to read.
> rule = re.compile('^\\d{6}\\s+\\d{3}')
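
 As a quick check of the raw-string point, here is a minimal sketch
 (the sample strings are made up, and it assumes a current Python 3
 interpreter) showing that the raw and the escaped spellings are the
 same pattern:

```python
import re

# The raw spelling and the doubled-backslash spelling are the same string.
raw = r'^\d{6}$'
escaped = '^\\d{6}$'
print(raw == escaped)   # True

rule = re.compile(raw)
print(bool(rule.match('123456')))    # True
print(bool(rule.match('12345')))     # False: only five digits
print(bool(rule.match('1234567')))   # False: seven digits
```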

 Getting back to the original question: it may be possible to do
 this efficiently without a regex.  You could start with:

	s=filter(lambda x: len(x) == 6, l)

 ... to prefilter the list, leaving only six-character items, and

	s=filter(lambda x: x.isdigit(), s)

 ... to keep only the strings that are composed entirely of digits.

 So you can combine these to use:

	s=filter(lambda x: len(x) == 6 and x.isdigit(), l)

 ... for the whole job.
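
 A runnable sketch of the combined filter, assuming Python 3 (where
 filter() returns an iterator rather than a list) and a made-up sample
 list named items standing in for l:

```python
# Hypothetical sample data; "l" in the text above is the original list.
items = ['123456', '12345', 'abcdef', '000000', '1234567', '12 456']

# In Python 3, filter() returns a lazy iterator, so wrap it in list().
six_digit = list(filter(lambda x: len(x) == 6 and x.isdigit(), items))
print(six_digit)   # ['123456', '000000']

# The same selection written as a list comprehension.
same = [x for x in items if len(x) == 6 and x.isdigit()]
```

 The list comprehension is only an alternative spelling; both forms
 apply the same test to every element.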

 I don't know if this is faster than using regular expressions,
 but I'd guess that it might be.  Personally I like to avoid regexes
 unless they are clearly the right answer.  A regex like r'^\d{6}$'
 seems simple enough --- but complex regular expressions can be 
 phenomenally difficult to debug.  It's very easy to create regexps that 
 work for all of your test cases, but fail on some unanticipated 
 form of input.
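
 As one illustration of that kind of surprise, here is a sketch with
 made-up inputs, assuming Python 3's re module.  Even the simple
 pattern above accepts a string with a trailing newline, because $
 also matches just before a final newline, and \d accepts non-ASCII
 decimal digits in str patterns:

```python
import re

rule = re.compile(r'^\d{6}$')

# "$" matches at the very end, or just before a trailing newline.
print(bool(rule.match('123456\n')))              # True, perhaps unexpectedly

# fullmatch() requires the entire string to match the pattern.
print(bool(re.fullmatch(r'\d{6}', '123456\n')))  # False

# In Python 3 str patterns, \d also matches non-ASCII decimal digits,
# such as these fullwidth forms; re.ASCII restricts it to 0-9.
wide = '\uff11\uff12\uff13\uff14\uff15\uff16'
print(bool(rule.match(wide)))                              # True
print(bool(re.fullmatch(r'\d{6}', wide, flags=re.ASCII)))  # False
print(wide.isdigit())                                      # True
```

 Note that str.isdigit() accepts the same fullwidth digits, so the
 filter-based version is not immune to unanticipated input either.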

 BTW: if you aren't sure that your list is composed exclusively of strings 
 you can use a more defensive and complex lambda expression as follows:

	s=filter(lambda x: hasattr(x,'__len__') and len(x) == 6 and
	    hasattr(x,'isdigit') and x.isdigit(), l )

 ... this should even work if some or all of the objects in l are 
 not strings --- so long as they have meaningful "length" and "isdigit"
 methods.  
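
 A small sketch of the defensive version on a mixed list (the sample
 data is hypothetical, and list() is added on the assumption of
 Python 3, where filter() is lazy):

```python
# Hypothetical mixed input: strings, numbers, None and a list.
mixed = ['123456', 123456, None, ['1', '2', '3', '4', '5', '6'],
         'abcdef', '654321', 3.14]

# Objects lacking __len__ or isdigit are skipped instead of raising.
s = list(filter(lambda x: hasattr(x, '__len__') and len(x) == 6 and
                hasattr(x, 'isdigit') and x.isdigit(), mixed))
print(s)   # ['123456', '654321']
```

 The integer, the float and None are rejected by the __len__ test, and
 the six-element list is rejected because it has no isdigit method.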

 I guess that exhibits the Pythonic value of being flexible about 
 dynamic typing while avoiding the use of explicit type and subclass
 tests.  (Introspecting on subclass and type information breaks on
 objects that are wrapped by Decorators, Proxy, Adapter or other 
 patterns of object usage).

 To gild this lily all the way to the roots I suppose we could also extend
 the lambda expression to handle six digit numerics (it depends on whether
 you later use them as integers, and whether you're willing to coerce 
 some elements from your original list into the desired form from a few
 "equivalent" forms).  (It also depends on if you'd consider '000000' to
 be a valid value for your application; if so you need to use strings).

 Here's one that works:

	s=filter(lambda x: (hasattr(x,'__len__') and len(x) == 6
		  and hasattr(x, 'isdigit') and x.isdigit())
		or (hasattr(x, '__int__') and int(x) == x
		  and 99999 < x < 1000000), mylist)

 ... though I'll admit that this lambda expression is getting to be
 absurd.  Once we have the resulting list we can then coerce them all
 to the correct type using map() as follows:

	map(int,s)    # or: map(str,s)

 which should be pretty safe for any sequence of objects that made it
 through our filter.  (Any wrappers, decorators, proxies, etc. had better
 provide support for these methods, obviously).
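
 Since the lambda has grown unwieldy, here is a sketch of roughly the
 same test as a named function, followed by the map() step.  The name
 is_six_digit and the sample list are hypothetical, and list() is
 added on the assumption of Python 3:

```python
def is_six_digit(x):
    """Hypothetical helper: roughly the lambda's test, spelled out."""
    # String-like: six characters, all digits.
    if hasattr(x, '__len__') and hasattr(x, 'isdigit'):
        return len(x) == 6 and x.isdigit()
    # Number-like: integral and in the range 100000..999999.
    if hasattr(x, '__int__'):
        return int(x) == x and 99999 < x < 1000000
    return False

# Hypothetical sample data mixing strings and numbers.
mylist = ['123456', 123456, 123456.0, '000000', 99999, 1000000,
          'abc', None, 123456.5]

s = list(filter(is_six_digit, mylist))
print(s)         # ['123456', 123456, 123456.0, '000000']

as_ints = list(map(int, s))
print(as_ints)   # [123456, 123456, 123456, 0]

as_strs = list(map(str, s))
print(as_strs)   # ['123456', '123456', '123456.0', '000000']
```

 Neither coercion raises an error here, but the output shows the
 caveats: int() turns '000000' into 0, and str() renders the float as
 '123456.0', so pick the target type with the later use in mind.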