Ask for help on using re

Fri Aug 6 10:37:43 EDT 2021

On 06/08/2021 16:17, jak wrote:
> On 06/08/2021 12:57, Jach Feng wrote:
>> On Friday, 6 August 2021 at 4:10:05 PM [UTC+8], jak wrote:
>>> On 05/08/2021 11:40, Jach Feng wrote:
>>>> I want to distinguish between numbers with/without a dot attached:
>>>>
>>>>>>> text = 'ch 1. is\nch 23. is\nch 4 is\nch 56 is\n'
>>>>>>> re.compile(r'ch \d{1,}[.]').findall(text)
>>>> ['ch 1.', 'ch 23.']
>>>>>>> re.compile(r'ch \d{1,}[^.]').findall(text)
>>>> ['ch 23', 'ch 4 ', 'ch 56 ']
>>>>
>>>> I can guess why the 'ch 23' appears in the second list. But how to 
>>>> get rid of it?
>>>>
>>>> --Jach
>>>>
>>> import re
>>> t = 'ch 1. is\nch 23. is\nch 4 is\nch 56 is\n'
>>> r = re.compile(r'(ch +\d+\.)|(ch +\d+)', re.M)
>>>
>>> res = r.findall(t)
>>>
>>> dot = [x[0] for x in res if x[0] != '']
>>> udot = [x[1] for x in res if x[1] != '']
>>>
>>> print(f"dot: {dot}")
>>> print(f"undot: {udot}")
>>>
>>> out:
>>>
>>> dot: ['ch 1.', 'ch 23.']
>>> undot: ['ch 4', 'ch 56']
>> That's an interesting solution! Where is the '|' operator in
>> re.compile() documented?
>>
>> --Jach
>>
> 
> I honestly can't tell you; I've been using it for over 30 years. In any
> case, you can find some traces of it in the "regular expressions quick
> reference" on the site https://regex101.com (bottom right side).
> 
...if I'm not mistaken, '|' is part of standard regular expression
syntax, so it is not a specific extension of the Python libraries.
Perhaps this is why you couldn't find any documentation on it.
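
For what it's worth, the Python documentation for the re module does
describe '|' in its "Regular Expression Syntax" section, as A|B. Below
is a minimal sketch of the alternation approach from this thread. It
also includes a negative-lookahead variant, which is not from the thread
and is only one possible alternative. The variable names are
illustrative.

```python
import re

text = 'ch 1. is\nch 23. is\nch 4 is\nch 56 is\n'

# Alternation: the left branch (number followed by a dot) is tried first;
# the right branch (bare number) is used only if the left one fails.
pattern = re.compile(r'(ch +\d+\.)|(ch +\d+)')

# With two groups, findall() returns (group 1, group 2) tuples; the group
# that did not take part in the match is an empty string.
pairs = pattern.findall(text)
dotted = [g1 for g1, g2 in pairs if g1]
undotted = [g2 for g1, g2 in pairs if g2]

# Alternative (an assumption, not from the thread): a negative lookahead
# rejects a number that is followed by a dot. Excluding a following digit
# as well stops the engine from backtracking from '23' to '2'.
undotted_lookahead = re.findall(r'ch \d+(?![\d.])', text)

print(dotted)
print(undotted)
print(undotted_lookahead)
```

The original r'ch \d{1,}[^.]' returned 'ch 23' because the regex engine
backtracks: \d{1,} gives back the '3', which then satisfies [^.]. The
lookahead variant closes that gap without consuming the character after
the number.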