[Tutor] confusions about re module

Peter Otten __peter__ at web.de
Sun May 1 09:14:42 CEST 2011


naheed arafat wrote:

> someone please tell me why i'm getting this output?
> specially the 'e3%' ! ! !
> >>> import re
> >>> re.findall('([\w]+.)','abdd.e3\45 dret.8dj st.jk')
> ['abdd.', 'e3%', 'dret.', '8dj ', 'st.', 'jk']
> 
> I am getting the same output for the following too..
> >>> re.findall(r'([\w]+.)','abdd.e3\45 dret.8dj st.jk')
> ['abdd.', 'e3%', 'dret.', '8dj ', 'st.', 'jk']
> 
> wasn't i supposed to get ['abdd.','dret.'] ??
> python version: 2.6.5
> os: windows

Quoting http://docs.python.org/library/re.html :

"""
'.'
(Dot.) In the default mode, this matches any character except a newline. If 
the DOTALL flag has been specified, this matches any character including a 
newline.

[...]

'\'
Either escapes special characters (permitting you to match characters like 
'*', '?', and so forth), or signals a special sequence; special sequences 
are discussed below.
"""

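In your pattern the unescaped dot matches whatever single character follows
the word characters, not just a literal '.'. So you get the desired behaviour
by escaping the dot: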

>>> re.findall(r'([\w]+\.)','abdd.e3\45 dret.8dj st.jk')
['abdd.', 'dret.', 'st.']
>>> re.findall(r'([\w]+[.])','abdd.e3\45 dret.8dj st.jk')
['abdd.', 'dret.', 'st.']

(assuming that you left 'st.' out of your expected output accidentally)
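
As for the 'e3%': a sketch of what is probably going on, assuming the string is typed exactly as in the question. In a normal (non-raw) string literal, '\45' is an octal escape, and octal 45 is the code of '%'. The r prefix in the second attempt applies only to the pattern, not to the string being searched, so both attempts search the same text:

```python
import re

# '\45' in a non-raw string literal is an octal escape: 0o45 == 37 == ord('%')
text = 'abdd.e3\45 dret.8dj st.jk'
print(text)              # abdd.e3% dret.8dj st.jk

# Unescaped dot: any character may follow the word characters,
# so '%', ' ' and even the 'k' in 'jk' are accepted.
unescaped = re.findall(r'([\w]+.)', text)
print(unescaped)         # ['abdd.', 'e3%', 'dret.', '8dj ', 'st.', 'jk']

# Escaped dot: only a literal '.' may follow the word characters.
escaped = re.findall(r'([\w]+\.)', text)
print(escaped)           # ['abdd.', 'dret.', 'st.']
```

If a literal backslash followed by 45 was intended in the text, the string itself would need to be written as a raw string or with a doubled backslash.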


