How to match literal backslashes read from a text file using regular expressions?

Tue Jul 12 19:40:53 EDT 2005

cricfan at gmail.com wrote:
> I'm parsing a text file to extract word definitions. For example the
> input text file contains the following content:
> 
> di.va.gate \'di_--v*-.ga_-t\ vb
> pas.sim \'pas-*m\ adv : here and there : THROUGHOUT
> 
> I am trying to obtain words between two literal backslashes (\ .. \). I
> am not able to match words between two literal backslashes using the
> regexp re.compile(r'\\[^\\]*\\').
> 
> Here is my sample script:
> 
> import re;

Lose the semicolons ...

> 
> #slashPattern = re.compile(re.escape(r'\\[^\\]*\\'));
> pattern = r'\\[^\\]*\\'
> slashPattern = re.compile(pattern);
> 
> fdr = file( "parseinput",'r');
> line = fdr.readline();
> 

You should upgrade to a modern Python and work through a modern
tutorial. File objects can then be iterated over directly, one line at
a time, so you will be writing:

for line in fdr:
    do_something_with(line)
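A minimal sketch of the poster's whole loop in that style, assuming Python 3 (the original is Python 2 code). It already uses search() instead of match(), which is the main fix discussed further down. The helper name `report_matches` is hypothetical; it accepts any iterable of lines, so it works on an open file and on io.StringIO (used here only so the example runs without the poster's `parseinput` file).

```python
import io
import re

# The poster's pattern, unchanged: a literal backslash, any run of
# non-backslash characters, then another literal backslash.
PATTERN = r'\\[^\\]*\\'
slash_pattern = re.compile(PATTERN)


def report_matches(lines):
    """Return one report string per input line (hypothetical helper).

    Uses search(), which looks for the pattern anywhere in the line,
    rather than match(), which only tries at the start of the line.
    """
    reports = []
    for line in lines:  # a file object yields its lines one at a time
        text = line.rstrip()
        if slash_pattern.search(text):
            reports.append(text + " <-- matches pattern " + PATTERN)
        else:
            reports.append(text + " <-- DOES not match pattern " + PATTERN)
    return reports


# Stand-in for the input file; the doubled backslashes in these normal
# string literals each produce a single backslash in the text.
sample = io.StringIO(
    "di.va.gate \\'di_--v*-.ga_-t\\ vb\n"
    "pas.sim \\'pas-*m\\ adv : here and there : THROUGHOUT\n"
)
for report in report_matches(sample):
    print(report)
```

With a real file the call would be `with open("parseinput") as fdr:` followed by a loop over `report_matches(fdr)`; the `with` block closes the file when done.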

> while (line != ""):

Lose the extraneous parentheses ...

>     if (slashPattern.match(line)):

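Your main problem is that you should be using the search() method, not
the match() method. match() only tries to match at the very start of the
string (here, the first character of each line, which is never a
backslash), while search() scans the whole string for the first place
the pattern matches. Read the section on matching vs. searching in the
re docs!!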

 >>> import re
 >>> pat = re.compile(r'\\[^\\]*\\')
 >>> pat.match(r'abcd \xyz\ pqr')
 >>> pat.search(r'abcd \xyz\ pqr')
<_sre.SRE_Match object at 0x00AE8988>
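Since the goal is to extract the text between the backslashes, not just to detect it, one option is to wrap the middle of the pattern in a capturing group and read it back with group(1) or findall(). A sketch assuming Python 3; the group is an addition to the poster's pattern, not part of it.

```python
import re

# The poster's pattern with the text between the backslashes captured
# in group 1 so it can be pulled out.
pat = re.compile(r'\\([^\\]*)\\')

line = r"di.va.gate \'di_--v*-.ga_-t\ vb"

print(pat.match(line))      # None: match() only tries at index 0
m = pat.search(line)        # search() scans the whole line
print(m.group(1))           # prints: 'di_--v*-.ga_-t

# With one group, findall() returns the captured text of every match.
print(pat.findall(r"pas.sim \'pas-*m\ adv : here and there : THROUGHOUT"))
```

The leading quote in the captured text is the stress mark from the dictionary entry; strip it afterwards if only the pronunciation letters are wanted.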

>         print  line.rstrip()  + " <-- matches pattern " + pattern
>     else:
>         print  line.rstrip()  + " <-- DOES not match pattern " + pattern
>     line = fdr.readline();
>     print;
> 
> 
> ----------
> The output
> 
> C:\home\krishna\lang\python>python wsparsetest.py
> python wsparsetest.py
> di.va.gate \'di_--v*-.ga_-t\ vb                     <-- DOES not match pattern \\[^\\]*\\
> pas.sim \'pas-*m\ adv : here and there : THROUGHOUT <-- DOES not match pattern \\[^\\]*\\
> -----------
> 
> What should I be doing to match those literal backslashes? 
> 
> Thanks
>
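As for the literal backslashes themselves: the pattern in the question already handles them. In a raw string, r'\\' is two characters, which the regex engine reads as one escaped, literal backslash. The commented-out re.escape() attempt goes the other way: it escapes the regex syntax itself, so the compiled pattern would only find the pattern's own text. A small check, assuming Python 3:

```python
import re

pattern = r'\\[^\\]*\\'
line = r"pas.sim \'pas-*m\ adv : here and there : THROUGHOUT"

# r'\\' is a two-character string (backslash, backslash); the regex
# engine treats that pair as a single literal backslash.
print(len(r'\\'))                          # 2
print(re.search(r'\\', line).start())      # 8: position of the first backslash

# The poster's pattern, used with search(), finds the pronunciation field,
# backslashes included.
print(re.search(pattern, line).group())

# re.escape() turns every special character into a literal, so this
# compiled pattern looks for the pattern text itself, not the input.
escaped = re.compile(re.escape(pattern))
print(escaped.search(line))                # None
print(escaped.search(pattern) is not None) # True
```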