sync databse table based on current directory data without losign previous values

Lele Gaifax lele at metapensiero.it
Wed Mar 6 03:19:06 EST 2013


Νίκος Γκρ33κ <nikos.gr33k at gmail.com> writes:

> How can i update  the databse to only contain the existing filenames without losing the previous stored data?

Basically you need to keep a list (or better, a set) containing all
current filenames that you are going to insert, and finally do another
"inverse" loop where you scan all the records and delete those that are
not present anymore.

Of course, this assume you have a "bidirectional" identity between the
filenames you are loading and the records you are inserting, which is
not the case in the code you show:

> #read the containing folder and insert new filenames
> for result in os.walk(path):
> 	for filename in result[2]:

Here "filename" is just that, not the full path: this could result in
collisions, if your are actually loading a *tree* instead of a flat
directory, that is multiple source files are squeezed into a single
record in your database (imagine "/foo/index.html" and
"/foo/subdir/index.html").

With that in mind, I would do something like the following:

  # Compute a set of current fullpaths
  current_fullpaths = set()
  for root, dirs, files in os.walk(path):
    for fullpath in files:
      current_fullpaths.add(os.path.join(root, file))

  # Load'em
  for fullpath in current_fullpaths:
    
    try:
      #find the needed counter for the page URL
      cur.execute('''SELECT URL FROM files WHERE URL = %s''', (fullpath,) ) 
      data = cur.fetchone()        #URL is unique, so should only be one

      if not data:
        #first time for file; primary key is automatic, hit is defaulted
        cur.execute('''INSERT INTO files (URL, host, lastvisit) VALUES (%s, %s, %s)''', (fullpath, host, date) )
    except MySQLdb.Error, e:
      print ( "Query Error: ", sys.exc_info()[1].excepinfo()[2] )

  # Delete spurious
  cur.execute('''SELECT url FROM files''')  
  for rec in cur:
    fullpath = rec[0]
    if fullpath not in current_fullpaths:
      other_cur.execute('''DELETE FROM files WHERE url = %s''', (fullpath,))

Of course here I am assuming a lot (a typical thing we do to answer your
questions :-), in particular that the "url" field content matches the
filesystem layout, which may not be the case. Adapt it to your usecase.

hope this helps,
ciao, lele.
-- 
nickname: Lele Gaifax | Quando vivrò di quello che ho pensato ieri
real: Emanuele Gaifas | comincerò ad aver paura di chi mi copia.
lele at metapensiero.it  |                 -- Fortunato Depero, 1929.




More information about the Python-list mailing list