[Tutor] RE expressions

Fri Aug 15 23:46:33 CEST 2008

Johan Nilsson wrote:
> 'text  "http:\123\interesting_adress\etc\etc\" more text'

Does this really use backslashes in the text?  The standard for URLs (if 
that's what it is) is to use forward slashes.

For your RE, though, you can always use [...] to specify a character class 
containing whatever you like.  Remember that \ is a special symbol in REs, 
too.  If you want to match a literal \ character, the RE for that is \\. 
Also remember to use a raw string in Python so the string-literal syntax 
doesn't interpret the backslashes before the RE engine ever sees them.  How 
about something along the lines of:

re.compile(r'"[a-zA-Z0-9_\\]*"')

but why constrain what may be between the quotes?

re.compile(r'"[^"]*"')

or even

re.compile(r'".*?"')
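
As a rough sketch, here is how those three patterns behave on the sample 
line from the question.  The names text, patterns and results are just 
placeholders for this illustration, and the sample is written as a raw 
string on the assumption that the backslashes really are in the data.  
Note that the restricted character class finds nothing here, because the 
":" after "http" is not in the class, while the two unconstrained patterns 
both pick up the whole quoted part.

```python
import re

# Sample line from the question, as a raw string so that the
# backslashes stay literal (assumption: they really are in the data).
text = r'text  "http:\123\interesting_adress\etc\etc\" more text'

patterns = {
    "char class": re.compile(r'"[a-zA-Z0-9_\\]*"'),  # word chars and \ only
    "negated class": re.compile(r'"[^"]*"'),         # anything but a quote
    "non-greedy": re.compile(r'".*?"'),              # shortest run up to a quote
}

results = {name: p.findall(text) for name, p in patterns.items()}
for name, found in results.items():
    print(name, found)
```

One difference between the last two: "." does not match a newline unless 
re.DOTALL is given, whereas [^"] does, so they only diverge when the quoted 
text spans more than one line.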

> 
> I have figured out that if it wasn't for the \ a simple
> p=re.compile('\"\w+\"') would do the trick. From what I understand \w 
> only covers the set [a-zA-Z0-9_] and hence not the "\".
> I assume the solution is just in front of my eyes, and I have been 
> looking on the screen for too long. Any hints would be appreciated.
> 
> 
> In [72]: p=re.compile('"\w+\"')
> 
> In [73]: p.findall('asdsa"123abc123"jggfds')
> Out[73]: ['"123abc123"']
> 
> In [74]: p.findall('asdsa"123abc\123"jggfds')
> Out[74]: ['"123abcS"']
> 
> /Johan
>
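
The "S" in Out[74] looks like the raw-string issue rather than an RE 
problem: in an ordinary string literal, \123 is an octal escape for the 
character "S", so the backslash never reaches the RE at all.  A small 
sketch of the difference, where plain, raw and the variable names for the 
results are made up for the illustration:

```python
import re

p = re.compile(r'"\w+"')

plain = 'asdsa"123abc\123"jggfds'   # '\123' is the octal escape for 'S'
raw = r'asdsa"123abc\123"jggfds'    # raw string: the backslash is kept

plain_hits = p.findall(plain)       # matches, but on "S" rather than \123
raw_hits = p.findall(raw)           # no match: \w does not cover the backslash

# Adding a literal backslash to the class makes the raw version match.
fixed_hits = re.findall(r'"[\w\\]+"', raw)

print(plain_hits, raw_hits, fixed_hits)
```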