Regular expression for file name

Bengt Richter bokr at oz.net
Sun Jul 18 15:40:24 EDT 2004


On Sun, 18 Jul 2004 14:21:14 +0200, "Miki Tebeka" <miki.tebeka at zoran.com> wrote:

>Hello All,
>
>In a configuration file there can be ID's and filename tokens.
>The file names have a known suffix (.o or .mls) and I need to get a regular
>expression that will catch filename but not an ID.
>
>Currently:
>ID = r"[a-zA-Z\.]\w+(?![/\\])"
>FILENAME = r"([a-zA-Z]:)?[\w./\\]+\.((mls)|(o))" 
>
>However if I have the filename "Sources/kernel/rom_kernel.mls" then
>"Source" is interrupted as ID and "s/kernel/rom_kernel.mls" is interrupted
>as file name.
ITYM s/interrupted/interpreted/ ;-)
>
>Any way to do better?
If you want to prioritize matching among several patterns
that share a common leading part, UIAM the alternatives in an
or'ed (|) pattern are tried left to right at each position, and
the first one that succeeds wins. I haven't checked your
individual terms, but here's a possible way to give priority
to the FILENAME pattern:

 >>> import re
 >>> ID = r"[a-zA-Z\.]\w+(?![/\\])"
 >>> FILENAME = r"([a-zA-Z]:)?[\w./\\]+\.((mls)|(o))"
 >>> COMBINED = '(?P<file>%s)|(?P<id>%s)' % (FILENAME, ID)
 >>> rxo = re.compile(COMBINED)
 >>> filename = "Sources/kernel/rom_kernel.mls"
 >>> rxo.search(filename).groupdict()
 {'id': None, 'file': 'Sources/kernel/rom_kernel.mls'}

Try it with an id:

 >>> rxo.search('no_slashes_in_this').groupdict()
 {'id': 'no_slashes_in_this', 'file': None}

Of course you can pull the individual pieces out of the result, e.g.,

 >>> result = rxo.search('no_slashes_in_this').groupdict()
 >>> result['id']
 'no_slashes_in_this'
 >>> result['file']
 >>> result['file'] is None
 True
 >>> result['id'], result['file']
 ('no_slashes_in_this', None)
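
Since the tokens come from a configuration file, the same combined
pattern could presumably be run over a whole line with finditer. This is
only a sketch: the sample line and the name classify are invented for
illustration, and the terms themselves are still unchecked.

```python
import re

# Patterns copied from the original post, unchanged.
ID = r"[a-zA-Z\.]\w+(?![/\\])"
FILENAME = r"([a-zA-Z]:)?[\w./\\]+\.((mls)|(o))"
COMBINED = '(?P<file>%s)|(?P<id>%s)' % (FILENAME, ID)
rxo = re.compile(COMBINED)

def classify(line):
    """Return (kind, text) pairs for each token found in line."""
    tokens = []
    for m in rxo.finditer(line):
        # Exactly one of the two named groups is set for each match.
        kind = 'file' if m.group('file') is not None else 'id'
        tokens.append((kind, m.group(kind)))
    return tokens

# Hypothetical configuration line, not taken from the original post.
line = "obj1 Sources/kernel/rom_kernel.mls start.o main_id"
tokens = classify(line)
for kind, text in tokens:
    print(kind, text)
```

Anything between the matches (here just spaces) is silently skipped by
finditer, so real input may need a stricter check than this.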

No guarantees, but HTH

Regards,
Bengt Richter


