[Tutor] re.compile ?? (fwd)

Danny Yoo dyoo@hkn.eecs.berkeley.edu
Thu Apr 24 19:01:00 2003


> def usage():
>         print __doc__ % version
>         sys.exit(2)
>
>
> numberAddedRE = re.compile("(.*)#\d+$")
>
> def makeOutputFileName(input, outputDir, extension):
>         dir, file = os.path.split(input)
>         file, ext = os.path.splitext(file)
>         if outputDir:
>                 dir = outputDir
>         output = os.path.join(dir, file + extension)
>         m = numberAddedRE.match(file)
>         if m:
>                 file = m.group(1)
>         n = 1
>         while os.path.exists(output):
>                 output = os.path.join(dir, file + "#" + repr(n) + extension)
>                 n = n + 1
>         return output
>
> I am trying to understand this code. Here dir, file is assigned the
> directory and file after os.path.split(input) divides the input in two
> parts of file and directory. Next if the output directory is mentioned
> then output dir is assigned to dir. Then in the next line it is joined
> to make the required path having output directory.
>
> After that I am not getting what is expected from the next few lines
> having match and group.



Hi Anish,


The code:

    m = numberAddedRE.match(file)
    if m:
        file = m.group(1)

is a check to see if the 'file' fits the pattern defined in numberAddedRE.
If it doesn't match, 'm' will have the value None, so we'll skip the body
of the if statement.  But if we do match the pattern, we get back a
"match" object that can tell us how the match worked.


In particular, when we say m.group(1), the regular expression engine gives
us the part of the string that matched against the first pair of
parentheses in:

     re.compile("(.*)#\d+$")
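To make that concrete, here is a small sketch of what the pattern captures.
It is written in Python 3 syntax (the quoted code is Python 2), the helper
name strip_number and the sample names are made up for illustration, and the
pattern is spelled as a raw string so the backslash reaches the regex engine
untouched.

```python
import re

# Same pattern as numberAddedRE, written as a raw string.
number_added_re = re.compile(r"(.*)#\d+$")

def strip_number(name):
    """Return name without a trailing '#<digits>', or name unchanged."""
    m = number_added_re.match(name)
    if m:
        return m.group(1)   # text matched by the first pair of parentheses
    return name             # no match: m was None

for name in ["python#1", "python#12", "python", "a#b#3"]:
    print(name, "->", strip_number(name))
```

Only the trailing '#' plus digits is dropped; a name without that suffix
comes back unchanged because match() returned None.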


Here's an example of group() in action:

###
>>> import re
>>> regex = re.compile('(fo+)bar')
>>> m = regex.match('foooooobar!')
>>> m.group(0)
'foooooobar'
>>> m.group(1)
'foooooo'
###
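Notice that the trailing '!' didn't stop the match: match() only insists
that the pattern fit at the start of the string.  The sketch below (Python 3
syntax, with made-up sample strings) shows that, and how a '$' like the one
in numberAddedRE pins the end as well.

```python
import re

regex = re.compile(r"(fo+)bar")

# match() only tries at the very start of the string...
starts_ok = regex.match("foooooobar!")   # a match object
not_at_start = regex.match("xfoobar")    # None: 'x' comes first

# ...and it doesn't need to reach the end unless the pattern says so.
anchored = re.compile(r"(fo+)bar$")
end_ok = anchored.match("foobar")        # a match object
trailing = anchored.match("foobar!")     # None: '!' follows 'bar'

print(starts_ok.group(1), not_at_start, end_ok.group(0), trailing)
```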



It might also help if we look at the function in context.  Let's first
pretend that we have a directory structure like:

###
/home/anish/src/
    python#1.py
    python#2.py
    python#3.py
    python#4.py
###

That is, let's say that we have a directory called '/home/anish/src/',
with four Python source files.


Try working out what happens if we call:

    print makeOutputFileName('/home/anish/src/python#1.py', None, '.py')



The code, I think, is doing too much work, so that might be what's
confusing.  It tries to find an available file name, making sure it
doesn't take the name of an existing file.  If the name already ends in
'#' followed by a number, it first strips that suffix; then it tries '#1',
'#2', and so on, until it finds a name that isn't taken.
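
If you'd like to check your answer to the exercise above, here is a rough
sketch that rebuilds the four files in a temporary directory (standing in
for '/home/anish/src/') and calls the function.  It's a Python 3
transcription of the quoted code with renamed variables; the demo() helper
is my own addition, not part of the original program.

```python
import os
import re
import tempfile

number_added_re = re.compile(r"(.*)#\d+$")

def make_output_file_name(input_path, output_dir, extension):
    directory, name = os.path.split(input_path)
    name, _ext = os.path.splitext(name)
    if output_dir:
        directory = output_dir
    output = os.path.join(directory, name + extension)
    # Drop a trailing '#<digits>' so numbering restarts from the base name.
    m = number_added_re.match(name)
    if m:
        name = m.group(1)
    # Try name#1, name#2, ... until the path is not taken.
    n = 1
    while os.path.exists(output):
        output = os.path.join(directory, name + "#" + repr(n) + extension)
        n = n + 1
    return output

def demo():
    # Hypothetical stand-in for /home/anish/src/ with python#1.py .. python#4.py
    with tempfile.TemporaryDirectory() as src:
        for i in range(1, 5):
            open(os.path.join(src, "python#%d.py" % i), "w").close()
        first = os.path.join(src, "python#1.py")
        return os.path.basename(make_output_file_name(first, None, ".py"))

print(demo())   # python#5.py
```

Since 'python#1.py' through 'python#4.py' all exist, the loop walks past
each of them and settles on 'python#5.py'.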