[Tutor] RE module is working ?

Karim karim.liateni at free.fr
Fri Feb 4 20:37:41 CET 2011


By the way, your helper function algorithm and the comments from Steven 
and Peter made me think of this change:

karim at Requiem4Dream:~$ echo 'prima " "' | sed -e 's/""/\\"\\"/g;s/\([^\]\)"/\1\\"/g'
prima \" \"
karim at Requiem4Dream:~$ echo 'prima ""' | sed -e 's/""/\\"\\"/g;s/\([^\]\)"/\1\\"/g'
prima \"\"
karim at Requiem4Dream:~$ echo 'prima "Ich Karim"' | sed -e 's/""/\\"\\"/g;s/\([^\]\)"/\1\\"/g'
prima \"Ich Karim\"
karim at Requiem4Dream:~$ echo 'prima "Ich Karim"' | sed -e 's/""/\\"\\"/g;s/\([^\]\)"/\1\\"/g'
prima \"Ich Karim\"
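
For comparison, the two-pass sed script above could be ported to Python's re
module roughly as follows. This is only a sketch: the function name
escape_two_pass is made up for illustration, and it assumes the same two
substitutions applied in the same order.

```python
import re

def escape_two_pass(text):
    # Pass 1: turn each adjacent pair "" into \"\"
    text = re.sub(r'""', r'\\"\\"', text)
    # Pass 2: escape a quote that follows any character other than a backslash
    return re.sub(r'([^\\])"', r'\1\\"', text)

print(escape_two_pass('prima " "'))          # prima \" \"
print(escape_two_pass('prima ""'))           # prima \"\"
print(escape_two_pass('prima "Ich Karim"'))  # prima \"Ich Karim\"
```

Note that pass 2 needs a character in front of the quote, so in this Python
port a single quote at the very start of the text is left untouched.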

Regards
Karim


On 02/04/2011 08:07 PM, Karim wrote:
> On 02/04/2011 02:36 AM, Steven D'Aprano wrote:
>> Karim wrote:
>>
>>>>> *Indeed what's the matter with RE module!?*
>>>> You should really fix the problem with your email program first;
>>> Thunderbird issue with bold type (appears as stars) but I don't know 
>>> how to fix it yet.
>>
>> A man went to a doctor and said, "Doctor, every time I do this, it 
>> hurts. What should I do?"
>>
>> The doctor replied, "Then stop doing that!"
>>
>> :)
>
> Yes, these words made me laugh. I will keep them in my funny box.
>
>>
>>
>> Don't add bold or any other formatting to things which should be 
>> program code. Even if it looks okay in *your* program, you don't know 
>> how it will look in other people's programs. If you need to draw 
>> attention to something in a line of code, add a comment, or talk 
>> about it in the surrounding text.
>>
>>
>> [...]
>>> That is not what I want. I want to escape any " that is not 
>>> already escaped.
>>> The sed regex  '/\([^\\]\)\?"/\1\\"/g' is exactly what I need (I 
>>> have been writing regexes on Unix for 15 years).
>
> Mainly sed, awk and perl, and sometimes grep and egrep. I know it is a 
> jungle.
>
>> Which regex? Perl regexes? sed or awk regexes? Extended regexes? GNU 
>> posix compliant regexes? grep or egrep regexes? They're all different.
>>
>> In any case, I am sorry, I don't think your regex does what you say. 
>> When I try it, it doesn't work for me.
>>
>> [steve at sylar ~]$ echo 'Some \"text"' | sed -e 's/\([^\\]\)\?"/\1\\"/g'
>> Some \\"text\"
>
> I give you my word on this. Here is the exact output, which I just reproduced:
>
> #MY OS VERSION
> karim at Requiem4Dream:~$ uname -a
> Linux Requiem4Dream 2.6.32-28-generic #55-Ubuntu SMP Mon Jan 10 
> 23:42:43 UTC 2011 x86_64 GNU/Linux
> #MY SED VERSION
> karim at Requiem4Dream:~$ sed --version
> GNU sed version 4.2.1
> Copyright (C) 2009 Free Software Foundation, Inc.
> This is free software; see the source for copying conditions.  There is NO
> warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR 
> PURPOSE,
> to the extent permitted by law.
>
> GNU sed home page: <http://www.gnu.org/software/sed/>.
> General help using GNU software: <http://www.gnu.org/gethelp/>.
> E-mail bug reports to: <bug-gnu-utils at gnu.org>.
> Be sure to include the word ``sed'' somewhere in the ``Subject:'' field.
> #MY SED OUTPUT COMMAND:
> karim at Requiem4Dream:~$  echo 'Some ""' | sed -e 's/\([^\\]\)\?"/\1\\"/g'
> Some \"\"
> # THIS IS WHAT I WANT: WITH 2 CONSECUTIVE QUOTES, IF THE FIRST ONE IS 
> # ALREADY ESCAPED, I DON'T WANT TO ESCAPE IT TWICE.
> karim at Requiem4Dream:~$ echo 'Some \""' | sed -e 's/\([^\\]\)\?"/\1\\"/g'
> Some \"\"
> # BY THE WAY THIS ONE WORKS:
> karim at Requiem4Dream:~$ echo 'Some "text"' | sed -e 's/\([^\\]\)\?"/\1\\"/g'
> Some \"text\"
> # BUT NOT THIS ONE, WHICH MY REGEX DOES NOT COVER (I KNOW THAT, AND 
> # ORIGINALLY WANTED TO COVER IT):
> karim at Requiem4Dream:~$ echo 'Some \"text"' | sed -e 's/\([^\\]\)\?"/\1\\"/g'
> Some \\"text\"
>
> By the way, in every sed version I have worked with, the '?' (zero or 
> one match) must be escaped, which is why I wrote '\?'. The same goes for 
> '\(' and '\)', which save a value. In perl and grep you don't need to escape them.
>
> # SAMPLE FROM http://www.gnu.org/software/sed/manual/sed.html
>
> |\+|
>     Same as |*|, but matches one or more. It is a GNU extension.
> |\?|
>     Same as |*|, but only matches zero or one. It is a GNU extension.
>
>> I wouldn't expect it to work. See below.
>>
>> By the way, you don't need to escape the brackets or the question mark:
>>
>> [steve at sylar ~]$ echo 'Some \"text"' | sed -re 's/([^\\])?"/\1\\"/g'
>> Some \\"text\"
>>
>>
>>> For me the equivalent python regex is buggy: r'([^\\])?"', r'\1\\"'
>>
>> No it is not.
>>
>
> Yes, I know; see my latest post for details. I have already found the 
> solution. Here it is again:
>
> # Found the solution: '?' needs to be inside the parentheses (the saved 
> # pattern), because outside them we don't know whether the saved match, 
> # namely '\1', will exist or not.
>
> >>> re.subn(r'([^\\]?)"', r'\1\\"', expression)
>
> (' \\"\\" ', 2)
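
A different way to say "a quote that is not preceded by a backslash" in
Python is a negative lookbehind. The sketch below is not from the thread; the
function name escape_unescaped_quotes is made up for illustration. Because the
lookbehind consumes no characters, there is no group that can fail to match.

```python
import re

def escape_unescaped_quotes(text):
    # (?<!\\) is a zero-width check: the quote must not be preceded by a backslash
    return re.sub(r'(?<!\\)"', r'\\"', text)

print(escape_unescaped_quotes(' "" '))           #  \"\" 
print(escape_unescaped_quotes(r'Some \"text"'))  # Some \"text\"
print(escape_unescaped_quotes('"aaaa'))          # \"aaaa
```

One caveat: this treats any quote that directly follows a backslash as already
escaped, even when that backslash is itself the second half of an escaped
backslash pair.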
>
>
>> The pattern you are matching does not do what you think it does. 
>> "Zero or one of not-backslash, followed by a quote" will match a 
>> single quote *regardless* of what is before it. This is true even in 
>> sed, as you can see above, your sed regex matches both quotes.
>>
>> \" will match, because the regular expression will match zero 
>> characters, followed by a quote. So the regex is correct.
>>
>> >>> match = r'[^\\]?"'  # zero or one not-backslash followed by quote
>> >>> re.search(match, r'aaa\"aaa').group()
>> '"'
>>
>> Now watch what happens when you call re.sub:
>>
>>
>> >>> match = r'([^\\])?"'  # group 1 equals a single non-backslash
>> >>> replace = r'\1\\"'  # group 1 followed by \ followed by "
>> >>> re.sub(match, replace, 'aaaa')  # no matches
>> 'aaaa'
>> >>> re.sub(match, replace, 'aa"aa')  # one match
>> 'aa\\"aa'
>> >>> re.sub(match, replace, '"aaaa')  # one match, but there's no group 1
>> Traceback (most recent call last):
>>   File "<stdin>", line 1, in <module>
>>   File "/usr/local/lib/python3.1/re.py", line 166, in sub
>>     return _compile(pattern, flags).sub(repl, string, count)
>>   File "/usr/local/lib/python3.1/re.py", line 303, in filter
>>     return sre_parse.expand_template(template, match)
>>   File "/usr/local/lib/python3.1/sre_parse.py", line 807, in expand_template
>>     raise error("unmatched group")
>> sre_constants.error: unmatched group
>>
>> Because group 1 was never matched, Python's re.sub raised an error. 
>> It is not a very informative error, but it is valid behaviour.
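
One way to sidestep the "unmatched group" error while keeping the same pattern
is to pass a function as the replacement and handle the missing group
explicitly. This is a minimal sketch; the helper name add_backslash is
hypothetical. (Since Python 3.5, re.sub also replaces unmatched groups in a
template string with an empty string, so the error shown above is specific to
older versions.)

```python
import re

pattern = r'([^\\])?"'

def add_backslash(match):
    # group(1) is None when the optional group did not take part in the match
    prefix = match.group(1) or ''
    return prefix + '\\"'

print(re.sub(pattern, add_backslash, '"aaaa'))  # \"aaaa
print(re.sub(pattern, add_backslash, 'aa"aa'))  # aa\"aa
```

This only removes the exception; the pattern still matches a quote regardless
of what comes before it.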
>>
>> If I try the same thing in sed, I get something different:
>>
>> [steve at sylar ~]$ echo '"Some text' | sed -re 's/([^\\])?"/\1\\"/g'
>> \"Some text
>>
>> It looks like this version of sed defines backreferences on the 
>> right-hand side to be the empty string, in the case that they don't 
>> match at all. But this is not standard behaviour. The sed FAQs say 
>> that this behaviour will depend on the version of sed you are using:
>>
>> "Seds differ in how they treat invalid backreferences where no 
>> corresponding group occurs."
>>
>> http://sed.sourceforge.net/sedfaq3.html
>>
>> So you can't rely on this feature. If it works for you, great, but it 
>> may not work for other people.
>>
>>
>> When you delete the ? from the Python regex, group 1 is always valid, 
>> and you don't get an exception. Or if you ensure the input always 
>> matches group 1, no exception:
>>
>> >>> match = r'([^\\])?"'
>> >>> replace = r'\1\\"'
>> >>> re.sub(match, replace, 'a"a"a"a') # group 1 always matches
>> 'a\\"a\\"a\\"a'
>>
>> (It still won't do what you want, but that's a *different* problem.)
>>
>>
>>
>> Jamie Zawinski wrote:
>>
>>   Some people, when confronted with a problem, think "I know,
>>   I'll use regular expressions." Now they have two problems.
>>
>> How many hours have you spent trying to solve this problem using 
>> regexes? This is a *tiny* problem that requires an easy solution, not 
>> wrestling with a programming language that looks like line-noise.
>>
>> This should do what you ask for:
>>
>> def escape(text):
>>     """Escape any double-quote characters if and only if they
>>     aren't already escaped."""
>>     output = []
>>     escaped = False
>>     for c in text:
>>         if c == '"' and not escaped:
>>             output.append('\\')
>>         elif c == '\\':
>>             output.append('\\')
>>             escaped = True
>>             continue
>>         output.append(c)
>>         escaped = False
>>     return ''.join(output)
>>
>
> Thank you for this one! This gives me some inspiration for other more 
> complicated parsing. :-)
>
>
>>
>> Armed with this helper function, which took me two minutes to write, 
>> I can do this:
>>
>> >>> text = 'Some text with backslash-quotes \\" and plain quotes " together.'
>> >>> print(escape(text))
>> Some text with backslash-quotes \" and plain quotes \" together.
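
For readers who still prefer a regex, the same "skip what is already escaped"
idea can be sketched with an alternation that consumes each backslash together
with the character after it. This is an illustration only, not code from the
thread, and the name escape_quotes is made up.

```python
import re

def escape_quotes(text):
    # \\. consumes a backslash together with the character it escapes,
    # so only quotes outside such pairs reach the second alternative.
    return re.sub(r'(\\.)|"', lambda m: m.group(1) or '\\"', text,
                  flags=re.DOTALL)

text = 'Some text with backslash-quotes \\" and plain quotes " together.'
print(escape_quotes(text))
# Some text with backslash-quotes \" and plain quotes \" together.
```

Unlike the loop-based helper above, this version treats two backslashes in a
row as one escaped backslash, so a quote right after such a pair gets escaped.
Which behaviour is right depends on the input format.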
>>
>>
>> Most problems that people turn to regexes for are best solved without 
>> regexes. Even Larry Wall, inventor of Perl, is dissatisfied with 
>> regex culture and syntax:
>>
>> http://dev.perl.org/perl6/doc/design/apo/A05.html
>
> OK, but if I had to give up all my one-liner sed regexes, my most 
> used utilities, it would be like refusing to use my car to go to work
> and walking the 20 km instead.
>  I can understand the point about overuse, although I have already 
> written 30 lines of pure sed script using all its features,
> which would have taken many more lines in awk or perl.
>
> Anyway, I am leaning toward Python now, so if a re module exists that 
> handles my small regex, it is no big deal to become familiar with it.
>
> Thanks for all your efforts.
>
> Regards
> Karim