regular expression
James T. Dennis
jadestar at idiom.com
Sun May 19 07:15:53 EDT 2002
Sean 'Shaleh' Perry <shalehperry at attbi.com> wrote:
> On 18-May-2002 Batara Kesuma wrote:
>> Hi Sean,
>> On Sat, 18 May 2002 10:46:19 -0700 (PDT)
>> "Sean 'Shaleh' Perry" <shalehperry at attbi.com> wrote:
>>> you are very close to what you need.
>>> rule = re.compile(r'^\d{6}$') # ^ means start of string, then \d{6} is 6
>>> # digits, then $ is end of string.
>> Thank you very much. But what does the 'r' in (r'^\d{6}$') mean?
> r'' is a 'raw' string: backslashes in it are kept literally, so they
> generally do not need to be escaped.
> If you did not use the r'' syntax the above call would have been:
> rule = re.compile('^\\d{6}$') # note the escaped backslash.
> If you have a more complex regex all of the escaping makes it hard to read.
> rule = re.compile('^\\d{6}\\s+\\d{3}')
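To make the raw-string point concrete, here is a small sketch (assuming a
current Python 3 interpreter, which this thread predates). It checks that
the raw and the escaped spellings are the same pattern text and then tries
the pattern on a few sample strings. The sample strings are made up for
illustration.

```python
import re

# The raw-string and the escaped spellings denote the same pattern text.
raw_pattern = r'^\d{6}$'
escaped_pattern = '^\\d{6}$'
same_text = (raw_pattern == escaped_pattern)

rule = re.compile(raw_pattern)

# Illustrative inputs, not taken from the original thread.
samples = ['123456', '12345', '1234567', 'abc123']
matches = [s for s in samples if rule.match(s)]

print(same_text, matches)
```

Only the string that is exactly six digits long survives; the anchors
reject the shorter, longer and mixed strings.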
Getting back to the original question: it may be possible to do
this efficiently without a regex. You could start with:
s=filter(lambda x: len(x) == 6, l)
... to prefilter the list, leaving only six char items and
s=filter(lambda x: x.isdigit(), s)
... seems to work for strings that are composed entirely of digits.
So you can combine these to use:
s=filter(lambda x: len(x) == 6 and x.isdigit(), l)
... for the whole job.
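For anyone trying the combined filter on a current interpreter, here is a
minimal sketch. It assumes Python 3, where filter() returns a lazy iterator
rather than a list, so the result is wrapped in list(). The function name
and the sample data are hypothetical.

```python
def six_digit_strings(items):
    """Keep only strings that are exactly six characters long and all digits."""
    # list() is needed on Python 3, where filter() returns a lazy iterator.
    return list(filter(lambda x: len(x) == 6 and x.isdigit(), items))

# Illustrative inputs, not taken from the original thread.
sample = ['123456', '12345', 'abcdef', '000000', '12 456']
result = six_digit_strings(sample)
print(result)
```

Note that '000000' is kept, since it is tested as a string and not as a
number.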
I don't know if this is faster than using regular expressions,
but I'd guess that it might be. Personally I like to avoid regexes
unless they are clearly the right answer. A regex like r'^\d{6}$'
seems simple enough --- but complex regular expressions can be
phenomenally difficult to debug. It's very easy to create regexps that
work for all of your test cases, but fail on some unanticipated
form of input.
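Rather than guess about speed, one could measure it. The following is only
a sketch, assuming Python 3 and the standard timeit module; the data set,
the repeat count and the helper names are arbitrary choices of mine, and
the timings will vary by machine and interpreter version.

```python
import re
import timeit

rule = re.compile(r'^\d{6}$')

# Arbitrary illustrative data: 1000 short strings.
data = ['123456', '12345', 'abcdef', '654321', '1234567'] * 200

def with_regex(items):
    """Select six-digit strings using the compiled regex."""
    return [x for x in items if rule.match(x)]

def with_methods(items):
    """Select six-digit strings using len() and str.isdigit()."""
    return [x for x in items if len(x) == 6 and x.isdigit()]

# Check that both approaches agree on this data before timing them.
same_result = with_regex(data) == with_methods(data)

regex_time = timeit.timeit(lambda: with_regex(data), number=100)
method_time = timeit.timeit(lambda: with_methods(data), number=100)

print(same_result, regex_time, method_time)
```

The two approaches agree on this ASCII data, but they are not identical in
general: on Python 3, str.isdigit() also accepts some non-decimal digit
characters (superscript digits, for instance) that \d does not match.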
BTW: if you aren't sure that your list is composed exclusively of strings
you can use a more defensive and complex lambda expression as follows:
s=filter(lambda x: hasattr(x,'__len__') and len(x) == 6 and
hasattr(x,'isdigit') and x.isdigit(), l )
... this should even work if some or all of the objects in l are
not strings --- so long as they have meaningful "length" and "isdigit"
properties.
I guess that exhibits the Pythonic value of being flexible about
dynamic typing while avoiding the use of explicit type and subclass
tests. (Introspecting on subclass and type information breaks on
objects that are wrapped by Decorators, Proxy, Adapter or other
patterns of object usage).
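As a sketch of that point, here is a wrapper object passing the defensive
test without being a string at all. DigitProxy and looks_like_six_digits
are hypothetical names invented for this illustration, and the code assumes
Python 3.

```python
class DigitProxy:
    """Hypothetical wrapper that forwards only the operations the filter needs."""

    def __init__(self, text):
        self._text = text

    def __len__(self):
        return len(self._text)

    def isdigit(self):
        return self._text.isdigit()

    def __str__(self):
        return self._text


def looks_like_six_digits(x):
    """Duck-typed test: no isinstance() or type() checks involved."""
    return (hasattr(x, '__len__') and len(x) == 6
            and hasattr(x, 'isdigit') and x.isdigit())


# Mixed bag of objects; only the first two should pass.
mixed = ['123456', DigitProxy('654321'), 42, None, ['a'] * 6, 'abcdef']
kept = [str(x) for x in mixed if looks_like_six_digits(x)]
print(kept)
```

An isinstance(x, str) test would have rejected the proxy, while the
hasattr() checks accept anything that behaves enough like a string.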
To gild this lily all the way to the roots I suppose we could also extend
the lambda expression to handle six digit numerics (it depends on whether
you later use them as integers, and whether you're willing to coerce
some elements from your original list into the desired form from a few
"equivalent" forms). (It also depends on if you'd consider '000000' to
be a valid value for your application; if so you need to use strings).
Here's one that works:
s=filter(lambda x: (hasattr(x,'__len__') and len(x) == 6
and hasattr(x, 'isdigit') and x.isdigit())
or (hasattr(x, '__int__') and int(x) == x
and x > 99999 and x < 1000000), mylist)
... though I'll admit that this lambda expression is getting to be
absurd. Once we have the resulting list we can then coerce them all
to the correct type using map() as follows:
map(int,s) or map(str,s)
which should be pretty safe for any sequence of objects that made it
through our filter. (Any wrappers, decorators, proxies, etc. had better
provide support for these methods, obviously).
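Here is the whole filter-then-coerce idea as a runnable sketch, assuming
Python 3 (where filter() and map() return iterators, hence the list()
calls). The named function is my own restatement of the lambda, and the
sample list is made up.

```python
def is_six_digit_value(x):
    """Accept six-character digit strings or whole numbers in 100000..999999."""
    if (hasattr(x, '__len__') and len(x) == 6
            and hasattr(x, 'isdigit') and x.isdigit()):
        return True
    return hasattr(x, '__int__') and int(x) == x and 99999 < x < 1000000


# Illustrative inputs, not taken from the original thread.
mylist = ['123456', 654321, 100000.0, '000000',
          99999, 1000000, 'abcdef', 123456.5]

s = list(filter(is_six_digit_value, mylist))
as_ints = list(map(int, s))
as_strs = list(map(str, s))

print(s)
print(as_ints)
print(as_strs)
```

Two caveats show up in the output: int('000000') collapses to 0, and
str(100000.0) gives '100000.0' rather than six digits, so which coercion is
"safe" depends on what the application needs afterwards.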