raw strings
Michele Simionato
mis6 at pitt.edu
Fri Oct 11 15:29:46 EDT 2002
Duncan Booth <duncan at rcp.co.uk> wrote in message
>> s/regexp1/regexp2/
>... where regexp1 is a regular expression and regexp2 is a string.
Maybe regexp2 is not a regular expression, but it is certainly not a
standard string either, since it can contain group references. For instance,
in a text I needed to change expressions of the kind
[decimal number] --> (decimal number)
and I used
sub(r'\[(\d+)\]', r'(\1)', text)
If the second argument were an ordinary literal string, the text '(\1)'
itself would be inserted instead of the matched decimal number! With this in
mind I used the term regular expression for regexp2, even though I agree that
it is not a regular expression in the same sense as regexp1. But it is not a
standard string either. For lack of a better term I used the notation regexp2.
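To make the distinction concrete, here is a small sketch (the sample text and variable names are made up for illustration, and it assumes a current Python 3). A string passed as the replacement is treated as a template in which \1 is expanded, while a string returned from a replacement function is inserted as-is:

```python
import re

# Hypothetical sample text; the pattern is the one from the post.
text = "see [12] and [345]"
pattern = r'\[(\d+)\]'

# Template replacement: re.sub expands \1 to the first captured group.
with_group = re.sub(pattern, r'(\1)', text)

# Function replacement: the returned string is inserted verbatim,
# so the characters (\1) end up in the result unchanged.
literal = re.sub(pattern, lambda m: r'(\1)', text)

print(with_group)  # see (12) and (345)
print(literal)     # see (\1) and (\1)
```

So the replacement argument is an ordinary Python string object; what is special is how re.sub interprets its contents.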
> You could try writing re.sub(regexp1, replacement, string), or using
> your terminology:
> re.sub(r'regexp1', r'regexp2', text)
> where regexp2 is not a regular expression.
I had the impression that using re.sub() without first compiling the
regular expression was quite inefficient. I have now done some profiling and
found that it is indeed slower, but only by about 10%, practically nothing.
Therefore I will use the non-compiled form in the future.
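A comparison of this kind could be sketched as follows. This is not the profiling actually run for this post; the sample text, repetition count and function names are assumptions, and the numbers will vary by machine and Python version:

```python
import re
import timeit

# Hypothetical input: the bracketed-number example repeated many times.
text = "see [12] and [345] " * 100
pattern = r'\[(\d+)\]'
compiled = re.compile(pattern)

def uncompiled_sub():
    # Module-level function: looks the pattern up on every call.
    return re.sub(pattern, r'(\1)', text)

def compiled_sub():
    # Method on a pattern object compiled once, up front.
    return compiled.sub(r'(\1)', text)

t_uncompiled = timeit.timeit(uncompiled_sub, number=1000)
t_compiled = timeit.timeit(compiled_sub, number=1000)

print("uncompiled: %.4f s" % t_uncompiled)
print("compiled:   %.4f s" % t_compiled)
```

The re module keeps an internal cache of recently compiled patterns, which is a plausible reason why the gap between the two forms is small.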
> I think you have a fundamental misunderstanding of what a 'raw
> string' actually is.
Even though at the time of my first posting I was unsure about the exact
meaning of a raw string, after the reply by Bengt Richter I quickly
realized how things work. This is the reason why I wrote
> The problem seems much more complicated than I expected.
Now I understand well the way Python interprets strings and the reason
why it is not obvious at all to define a raw_string function.
I had already thought of the preprocessor idea suggested by Gerhard Häring,
but I discarded it since I wanted raw_string() to work on variables, not
only on string constants, which is all a preprocessor could handle. With a
preprocessor I would simply be giving a longer name to the r prefix!
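The difficulty can be illustrated with a short sketch. The raw_string() below is a hypothetical attempt, not something proposed in this thread, and it assumes a current Python 3. Escape sequences are resolved when the literal is compiled, so a function only ever sees the resulting characters and cannot always tell how they were written:

```python
# The same source characters give different strings with and without r.
cooked = 'a\tb'   # three characters: a, TAB, b
raw = r'a\tb'     # four characters: a, backslash, t, b
print(len(cooked), len(raw))  # 3 4

def raw_string(s):
    # Hypothetical best effort: re-escape the characters that are present.
    return s.encode('unicode_escape').decode('ascii')

# Works when the escape can be guessed from the resulting character...
print(raw_string('a\tb') == r'a\tb')   # True

# ...but '\x41' is already just 'A' by the time the function is called,
# so the original spelling cannot be recovered.
print(raw_string('\x41') == r'\x41')   # False
```

This is why the r prefix has to live in the literal syntax itself rather than in a function applied afterwards.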
Therefore for the moment I will stay with the ugly r notation.
Still, I don't believe I am the only one who thinks the "r" is ugly!
It seems to me a last minute hack more than a pythonic construct.
At least, IMHO.
Thanks to everyone who answered and helped me understand,
Michele