How to match literal backslashes read from a text file using regular expressions?
John Machin
sjmachin at lexicon.net
Tue Jul 12 19:40:53 EDT 2005
cricfan at gmail.com wrote:
> I'm parsing a text file to extract word definitions. For example the
> input text file contains the following content:
>
> di.va.gate \'di_--v*-.ga_-t\ vb
> pas.sim \'pas-*m\ adv : here and there : THROUGHOUT
>
> I am trying to obtain words between two literal backslashes (\ .. \). I
> am not able to match words between two literal backslashes using the
> regxp - re.compile(r'\\[^\\]*\\').
>
> Here is my sample script:
>
> import re;
Lose the semicolons ...
>
> #slashPattern = re.compile(re.escape(r'\\[^\\]*\\'));
> pattern = r'\\[^\\]*\\'
> slashPattern = re.compile(pattern);
>
> fdr = file( "parseinput",'r');
> line = fdr.readline();
>
You should upgrade so that you have a modern Python and a modern
tutor[ial] -- then you will be writing:
    for line in fdr:
        do_something_with(line)
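A minimal, runnable sketch of that loop follows. Note the assumptions: io.StringIO stands in for the real input file so the example is self-contained, and do_something_with() is a hypothetical placeholder that merely strips the trailing newline.

```python
import io


def do_something_with(line):
    # Hypothetical placeholder: just drop the trailing newline.
    return line.rstrip()


# StringIO stands in for the open file object (an assumption made so
# the sketch runs on its own); a real file object iterates the same way.
fdr = io.StringIO("first line\nsecond line\n")

processed = []
for line in fdr:
    # The loop ends by itself at end of file; no readline() call and
    # no comparison against "" are needed.
    processed.append(do_something_with(line))

print(processed)
```

Iterating over the file object directly replaces both readline() calls and the while test against the empty string.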
> while (line != ""):
Lose the extraneous parentheses ...
>     if (slashPattern.match(line)):
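Your main problem is that you should be using the search() method, not
the match() method. match() succeeds only if the pattern matches at the
very start of the string; search() scans the string for the first place
where it matches. Your backslashes are in the middle of the line, so
match() returns None. Read the section on this topic in the re docs!!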
>>> import re
>>> pat = re.compile(r'\\[^\\]*\\')
>>> pat.match(r'abcd \xyz\ pqr')
>>> pat.search(r'abcd \xyz\ pqr')
<_sre.SRE_Match object at 0x00AE8988>
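Applied to the two sample lines from your post, the difference can be sketched as below. The capturing group and the helper name between_backslashes() are my additions for illustration, not part of your script; the rest of the pattern is the one you already have.

```python
import re

# Same pattern as in the post, plus a capturing group (an addition made
# for illustration) around the text between the two backslashes.
slash_pattern = re.compile(r'\\([^\\]*)\\')

sample_lines = [
    r"di.va.gate \'di_--v*-.ga_-t\ vb",
    r"pas.sim \'pas-*m\ adv : here and there : THROUGHOUT",
]


def between_backslashes(line):
    # Hypothetical helper: return the text between the first pair of
    # backslashes, or None if the line has no such pair.
    m = slash_pattern.search(line)
    if m is None:
        return None
    return m.group(1)


for line in sample_lines:
    # match() is anchored at the start of the line, so it fails here;
    # search() finds the backslash pair wherever it occurs.
    print(line)
    print("    match():  ", slash_pattern.match(line))
    print("    search(): ", between_backslashes(line))
```

Using group(1) hands back just the text between the backslashes, without the backslashes themselves.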
>         print line.rstrip() + " <-- matches pattern " + pattern
>     else:
>         print line.rstrip() + " <-- DOES not match pattern " + pattern
>     line = fdr.readline();
> print;
>
>
> ----------
> The output
>
> C:\home\krishna\lang\python>python wsparsetest.py
> python wsparsetest.py
> di.va.gate \'di_--v*-.ga_-t\ vb <-- DOES not match
> pattern \\[^\\]*\\
> pas.sim \'pas-*m\ adv : here and there : THROUGHOUT <-- DOES not match
> pattern \\[^\\]*\\
> -----------
>
> What should I be doing to match those literal backslashes?
>
> Thanks
>