[Tutor] operate on files based on comparing filenames to today's date

Cameron Simpson cs at cskk.id.au
Thu Mar 28 18:12:56 EDT 2019


Before getting to your specific question, a few remarks below:

On 28Mar2019 12:08, Matthew Herzog <matthew.herzog at gmail.com> wrote:
>I have cobbled together this code which gives me regex matches of files
>whose names either begin with either YYYYMMDD_? or erased_YYMMDD_? and
>whose extensions are exml or ewav.

>todaystring = date.today().strftime("%Y%m%d")
>oldest = date.today() - timedelta(days=180)
>def get_files(extensions):
>    all_files = []
>    for ext in extensions:
>        all_files.extend(Path('/Users/msh/Python/calls').glob(ext))
>        return all_files
>for file in get_files(('*.ewav', '*.exml')):
>    print(re.match("[0-9]{8}|erased_",file.name))

Your regexp in the "print" loop at the bottom does not do what you say.  
You have:

    print(re.match("[0-9]{8}|erased_",file.name))

i.e. the regexp is:

    [0-9]{8}|erased_

(a) That matches 8 digits _or_ the string "erased_".

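(b) [0-9] can be written as \d, which is more readable. (In Python 3, \d 
in a str pattern also matches non-ASCII decimal digits unless you pass 
re.ASCII; for filenames like yours that should make no difference.)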

(c) I'd use:

    (erased_)?\d{8}

which is an optional "erased_" followed by 8 digits. And for your 
purposes:

    (erased_)?(\d{8})

which will group the digits together for easy extraction:

    DATED_FILENAME_re = re.compile(r'(erased_)?(\d{8})')

    for file in get_files(('*.ewav', '*.exml')):
        # file is a Path object; the regexp needs the name as a string
        m = DATED_FILENAME_re.match(file.name)
        if m:
            # a suitable filename
            datepart = m.group(2)
            # now you can turn that into an actual datetime.date object
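
To see the difference between the two regexps concretely, here is a 
small sketch. The sample filenames are invented for illustration; they 
are not taken from your directory.

```python
import re

# Made-up sample filenames, for illustration only.
names = ["20190328_call.ewav", "erased_20190328_call.exml", "erased_notes.exml"]

original_re = re.compile(r'[0-9]{8}|erased_')      # 8 digits OR "erased_"
suggested_re = re.compile(r'(erased_)?(\d{8})')    # optional "erased_", then 8 digits

for name in names:
    m_orig = original_re.match(name)
    m_new = suggested_re.match(name)
    print(name,
          "| original matched:", m_orig.group(0) if m_orig else None,
          "| suggested date part:", m_new.group(2) if m_new else None)
```

The original pattern accepts "erased_notes.exml" (it matches just the 
"erased_" prefix) and never captures the date of an "erased_" file, 
while the suggested one rejects the former and hands you the 8 digits 
either way.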

>Now I need to compare the date string (regex match) in the filename to
>today's date. If the result of the comparison results in YYYMMDD being
>older than 180 days, I should print something to indicate this. If not,
>nothing need be done.
>Should I write another function to compare each matched regex to today's
>date or is that overkill? Thanks.

If you want to figure out the age, you need to convert the YYYYMMDD into 
a datetime.date and then compare _that_ to today's date. Why?  Because 
the datetime.date type knows how to work with dates correctly, avoiding 
all sorts of bug prone efforts on your own part (because human calendars 
are irregular tricky things with many special cases).

So I would drop the "todaystring" altogether. You're thinking "get the 
string from the filename and compare it to todaystring". But what you 
_want_ is to measure the age, and that requires working with dates, not 
strings.  So instead convert the filename's string into a date and do 
straight arithmetic with today().
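
As a rough illustration of that arithmetic, here is a sketch using two 
fixed dates (chosen arbitrarily so the result is repeatable; in real 
use "today" would come from datetime.date.today()):

```python
import datetime

# Fixed, made-up dates so the example gives the same answer every run.
today = datetime.date(2019, 3, 28)
file_date = datetime.date(2018, 9, 1)

# Subtracting two dates gives a datetime.timedelta.
age = today - file_date
print(age.days)          # 208
print(age.days > 180)    # True

# Equivalent test: compute the cutoff date once, then compare dates.
cutoff = today - datetime.timedelta(days=180)
print(file_date < cutoff)    # True
```

The second form is close to the "oldest" value you already compute; 
either way the comparison is between date objects, not strings.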

So you might upgrade that regexp to group the year, month and day 
individually, pull them out and make a date object:

    import datetime
    ........
    DATED_FILENAME_re = re.compile(r'(erased_)?(\d\d\d\d)(\d\d)(\d\d)')
    ........
    # get today as a datetime.date object
    today = datetime.date.today()
    for file in ......:
        # match against the filename string, not the Path object
        m = DATED_FILENAME_re.match(file.name)
        if m:
            prefix, year, month, day = m.group(1,2,3,4)
            year = int(year)
            month = int(month)
            day = int(day)
            # turn this into a datetime.date object
            date = datetime.date(year, month, day)
            # compute a datetime.timedelta object
            age = today - date
            if age.days > 180:
                print("old!")

I think you're right: make a function to return the datetime.date of the 
filename, so this portion:

    m = DATED_FILENAME_re.match(file.name)
    if m:
        prefix, year, month, day = m.group(1,2,3,4)
        year = int(year)
        month = int(month)
        day = int(day)
        # turn this into a datetime.date object
        date = datetime.date(year, month, day)

and return the computed date. Then just keep this in the for loop so it 
is obvious what's going on; it is too small to hide in a function 
without a better reason.

    # compute a datetime.timedelta object
    age = today - date
    if age.days > 180:
        print("old!")
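
Putting those pieces together, one possible shape is sketched below. 
The function name date_from_filename, the sample filenames and the 
ValueError handling (for 8 digits that are not a real date) are my own 
additions for illustration, not something from your code.

```python
import datetime
import re

DATED_FILENAME_re = re.compile(r'(erased_)?(\d\d\d\d)(\d\d)(\d\d)')

def date_from_filename(name):
    """Return the datetime.date embedded in name, or None if there isn't one.

    Hypothetical helper: the name and the None convention are assumptions.
    """
    m = DATED_FILENAME_re.match(name)
    if not m:
        return None
    year, month, day = (int(s) for s in m.group(2, 3, 4))
    try:
        return datetime.date(year, month, day)
    except ValueError:
        # 8 digits that do not form a valid date, e.g. 20191345
        return None

today = datetime.date.today()
# Made-up names; in your script these would be file.name for each Path.
for name in ["20180101_a.ewav", "erased_20180101_b.exml", "notes.exml"]:
    date = date_from_filename(name)
    if date is None:
        continue
    # compute a datetime.timedelta object
    age = today - date
    if age.days > 180:
        print("old!", name)
```

The age test stays visible in the loop, and the function only does the 
parsing.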

Cheers,
Cameron Simpson <cs at cskk.id.au>

