Ask for help on using re

Jach Feng jfong at ms4.hinet.net
Thu Aug 5 20:57:14 EDT 2021


ast wrote on Thursday, August 5, 2021 at 11:29:15 PM [UTC+8]:
> On 05/08/2021 at 17:11, ast wrote:
> > On 05/08/2021 at 11:40, Jach Feng wrote:
> >> I want to distinguish between numbers with/without a dot attached: 
> >> 
> >>>>> text = 'ch 1. is\nch 23. is\nch 4 is\nch 56 is\n' 
> >>>>> re.compile(r'ch \d{1,}[.]').findall(text) 
> >> ['ch 1.', 'ch 23.'] 
> >>>>> re.compile(r'ch \d{1,}[^.]').findall(text) 
> >> ['ch 23', 'ch 4 ', 'ch 56 '] 
> >> 
> >> I can guess why the 'ch 23' appears in the second list. But how to get 
> >> rid of it? 
> >> 
> >> --Jach 
> >> 
> > 
> > >>> import re 
> > 
> > >>> text = 'ch 1. is\nch 23. is\nch 4 is\nch 56 is\n' 
> > 
> > >>> re.findall(r'ch \d+\.', text) 
> > ['ch 1.', 'ch 23.'] 
> > 
> > >>> re.findall(r'ch \d+(?!\.)', text) # (?!\.) for negated look ahead 
> > ['ch 2', 'ch 4', 'ch 56']
> import regex 
> 
> # regex is more powerful than re
> >>> text = 'ch 1. is\nch 23. is\nch 4 is\nch 56 is\n'
> >>> regex.findall(r'ch \d++(?!\.)', text) 
> 
> ['ch 4', 'ch 56'] 
> 
> ## ++ means "possessive": no backtracking is allowed
Can someone explain how the difference arises? I just can't figure it out :-(

>>> text = 'ch 1. is\nch 23. is\nch 4 is\nch 56 is\n'
>>> re.compile(r'ch \d+[^.]').findall(text)
['ch 23', 'ch 4 ', 'ch 56 ']
>>> re.compile(r'ch \d+[^.0-9]').findall(text)
['ch 4 ', 'ch 56 ']

--Jach
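
The difference between the two results above appears to come down to backtracking. Here is a minimal sketch that reuses the same text. The variable names and the third pattern (a negative lookahead that also rejects a following digit) are illustrations added here, not something proposed in the thread.

```python
import re

text = 'ch 1. is\nch 23. is\nch 4 is\nch 56 is\n'

# \d+ is greedy but may backtrack. On 'ch 23.' it first takes '23', finds
# that [^.] cannot match '.', gives back the '3', and then [^.] happily
# matches that '3'. Hence the unwanted 'ch 23'.
with_backtrack = re.findall(r'ch \d+[^.]', text)

# If the final class also excludes digits, the given-back '3' can no longer
# satisfy it, so 'ch 23.' produces no match at all.
no_digit_tail = re.findall(r'ch \d+[^.0-9]', text)

# Illustrative alternative: a lookahead rejecting both '.' and a digit has
# the same effect without consuming the trailing character.
lookahead = re.findall(r'ch \d+(?![.\d])', text)

print(with_backtrack)  # ['ch 23', 'ch 4 ', 'ch 56 ']
print(no_digit_tail)   # ['ch 4 ', 'ch 56 ']
print(lookahead)       # ['ch 4', 'ch 56']
```

The lookahead form returns 'ch 4' and 'ch 56' without the trailing space, which matches the result of the possessive `\d++` pattern quoted earlier in the thread.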

