[Tutor] RE match on a file (fwd)

Danny Yoo dyoo@hkn.eecs.berkeley.edu
Mon Feb 17 19:41:04 2003


[I'll forward Bob's response to the python-tutor list.]


---------- Forwarded message ----------
Date: Mon, 17 Feb 2003 18:26:30 -0500
From: Bob Hicks <bokchoy@adelphia.net>
To: 'Danny Yoo' <dyoo@hkn.eecs.berkeley.edu>
Subject: RE: [Tutor] RE match on a file

Maybe I am confusing myself as well. The match may work, but I think I am
confused about what I am getting back.

This is what I am trying to do:

1.  Connect to the Symantec site
2.  Get the latest NAVCE file

This is my code:

import ftplib, re

site = 'ftp.symantec.com'
dir='/public/english_us_canada/antivirus_definitions/norton_antivirus'

ftp = ftplib.FTP(site)
ftp.login()
ftp.cwd(dir)
files = ftp.nlst()
for fn in re.findall(r'\d{8}.*x86\.exe', "\n".join(files)):
    print "Fetching", fn,
    f = file(fn, 'wb')
    ftp.retrbinary('RETR ' + fn, f.write)
    # In case of timeout problems, decrease block size from default 8192
    # ftp.retrbinary('RETR ' + fn, f.write, 1024)
    f.close()
    print 'Done!'
ftp.quit()

Q1: Does the re.findall return a list of files?
Q2: If Q1 is yes, then I should be able to use [-1] to get the last one,
correct?
Q3: If Q1 is no, then how do I make it a list?
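
On Q1 and Q2: with a pattern that has no capturing groups, re.findall
returns a plain list of the matched strings, so ordinary list indexing
applies. The following is a minimal sketch, written for a current Python 3
interpreter rather than the one used in the message. The file names other
than 20030214-001-x86.exe are hypothetical stand-ins for what ftp.nlst()
might return. Sorting before taking [-1] is an assumption that the leading
8 digits are a YYYYMMDD date, in which case alphabetical order is also date
order.

```python
import re

# Hypothetical directory listing standing in for the result of ftp.nlst()
files = [
    "20030210-003-x86.exe",
    "20030214-001-x86.exe",
    "20030212-002-i32.exe",
    "readme.txt",
]

# '.' does not match a newline, so each match stays within one file name
matches = re.findall(r'\d{8}.*x86\.exe', "\n".join(files))
print(type(matches))  # a list
print(matches)        # the matched file names, as strings

# Sort so the last element is the newest, then index it like any list
latest = sorted(matches)[-1]
print(latest)
```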

I think my confusion is that I assumed re.findall returned a list of objects
and that I could then do something like:

files[-1] = files

I was thinking this would a) get the last file name and b) make that file
name the only one, so that when the download is started only that one file
(the newest one date-wise) would be downloaded.

No matter how I try to "slice" it though it downloads every match. Of course
my assumption could have been initially wrong.
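
One likely reason every match is downloaded is that the for loop iterates
over whatever re.findall returns, so changing the files list has no effect
on it; the selection has to be applied to the matches themselves. Below is a
hedged sketch in Python 3 of one way to do that. The helper names
newest_match and fetch_newest are hypothetical, and picking the newest file
with max() assumes the names begin with a YYYYMMDD date.

```python
import re

PATTERN = re.compile(r'\d{8}.*x86\.exe')

def newest_match(names):
    """Return the newest matching file name, or None if nothing matches."""
    # match() anchors at the start of each name, where the date digits are
    matches = [n for n in names if PATTERN.match(n)]
    # Assumption: names start with YYYYMMDD, so the largest string is newest
    return max(matches) if matches else None

def fetch_newest(ftp, names):
    """Download only the newest matching file using an ftplib.FTP object."""
    fn = newest_match(names)
    if fn is None:
        return None
    with open(fn, 'wb') as f:
        ftp.retrbinary('RETR ' + fn, f.write)
    return fn
```

With the connection from the script above, this would be called as
fetch_newest(ftp, ftp.nlst()) in place of the for loop.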

Ah...upon looking again, I can see I changed to re.findall from re.match,
which was matching the first 8 digits of all the files and giving me back
more than I thought it should.
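
The difference between the two calls can be seen on a small joined listing.
This is an illustrative Python 3 sketch with hypothetical file names:
re.match only tries the pattern at the very start of the string and returns
a single match object (or None), while re.findall scans the whole string and
returns every non-overlapping match as a list of strings.

```python
import re

# Hypothetical two-file listing, joined the same way as in the script
listing = "20030210-003-x86.exe\n20030214-001-x86.exe"
pattern = r'\d{8}.*x86\.exe'

# match(): anchored at position 0, yields one match object or None
m = re.match(pattern, listing)
first = m.group(0)
print(first)

# findall(): every match in the string, as a list of strings
every = re.findall(pattern, listing)
print(every)
```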

> -----Original Message-----
> From: Danny Yoo [mailto:dyoo@hkn.eecs.berkeley.edu]
> Sent: Monday, February 17, 2003 6:04 PM
> To: Bob Hicks
> Cc: tutor@python.org
> Subject: Re: [Tutor] RE match on a file
>
>
>
> On Mon, 17 Feb 2003, Bob Hicks wrote:
>
> > I am trying to do a regular expression match on:
> >
> > 20030214-001-x86.exe
> >
> > r'\d{8}.*x86\.exe' does not seem to do it.
>
> Hi Bob,
>
>
> Can you explain why you think the above pattern doesn't work? I just want
> to make sure we're on the same page.  *grin*
>
> The reason we need to ask this is because r'\d{8}.*x86\.exe' does appear to
> recognise your sample string:
>
> ###
> >>> pattern = re.compile(r'\d{8}.*x86\.exe')
> >>> pattern.match('20030214-001-x86.exe')
> <_sre.SRE_Match object at 0x816f360>
> ###
>
> So that matches.
>
>
> In general, whenever we're debugging, we need some test case to verify the
> problem.  Your example doesn't exhibit the problem you're looking for, so
> we can't do much without reading your mind, or guessing.  *grin*
>
> Show us a counterexample that shows where the regular expression is either
> overzealous or too lenient, and we can work from there.  Talk to you
> later!