sync database table based on current directory data without losing previous values

Νίκος Γκρ33κ nikos.gr33k at gmail.com
Wed Mar 6 03:57:26 EST 2013


On Wednesday, 6 March 2013 10:19:06 AM UTC+2, user Lele Gaifax wrote:
> Νίκος Γκρ33κ <nikos.gr33k at gmail.com> writes:
>
> > How can I update the database to only contain the existing filenames without losing the previously stored data?
>
> Basically you need to keep a list (or better, a set) containing all
> current filenames that you are going to insert, and finally do another
> "inverse" loop where you scan all the records and delete those that are
> not present anymore.
>
> Of course, this assumes you have a "bidirectional" identity between the
> filenames you are loading and the records you are inserting, which is
> not the case in the code you show:
>
> > #read the containing folder and insert new filenames
> > for result in os.walk(path):
> > 	for filename in result[2]:
>
> Here "filename" is just that, not the full path: this could result in
> collisions if you are actually loading a *tree* instead of a flat
> directory, that is, multiple source files would be squeezed into a single
> record in your database (imagine "/foo/index.html" and
> "/foo/subdir/index.html").
>
> With that in mind, I would do something like the following:
>
>   # Compute a set of current fullpaths
>   current_fullpaths = set()
>   for root, dirs, files in os.walk(path):
>     for filename in files:
>       current_fullpaths.add(os.path.join(root, filename))
>
>   # Load'em
>   for fullpath in current_fullpaths:
>     try:
>       #find the needed counter for the page URL
>       cur.execute('''SELECT URL FROM files WHERE URL = %s''', (fullpath,))
>       data = cur.fetchone()        #URL is unique, so should only be one
>
>       if not data:
>         #first time for file; primary key is automatic, hit is defaulted
>         cur.execute('''INSERT INTO files (URL, host, lastvisit) VALUES (%s, %s, %s)''', (fullpath, host, date))
>     except MySQLdb.Error as e:
>       print("Query Error: ", e)
>
>   # Delete spurious
>   cur.execute('''SELECT url FROM files''')
>   for rec in cur:
>     fullpath = rec[0]
>     if fullpath not in current_fullpaths:
>       # other_cur is a second cursor, so the iteration over cur is not disturbed
>       other_cur.execute('''DELETE FROM files WHERE url = %s''', (fullpath,))
>
> Of course here I am assuming a lot (a typical thing we do to answer your
> questions :-), in particular that the "url" field content matches the
> filesystem layout, which may not be the case. Adapt it to your use case.
>
> hope this helps,
> ciao, lele.
> -- 
> nickname: Lele Gaifax | Quando vivrò di quello che ho pensato ieri
> real: Emanuele Gaifas | comincerò ad aver paura di chi mi copia.
> lele at metapensiero.it  |                 -- Fortunato Depero, 1929.

You are fantastic! Your straightforward logic amazes me!

Thank you very much for making things clear to me!!

But there is a slight problem: when I try to run the code I get an error. You can see its output here:

http://superhost.gr/cgi-bin/files.py
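
For reference, the approach Lele describes above (collect the current full paths in a set, insert the ones that are missing, delete the records that no longer match a file) can be sketched in a self-contained form. This is only a minimal sketch under several assumptions: it uses the standard-library sqlite3 module instead of MySQLdb so that it runs without a server (note the "?" placeholders instead of "%s"), the function name sync_files_table is hypothetical, and the table is assumed to have the url, host and lastvisit columns from the thread. Existing rows are never rewritten, so whatever they already store is left untouched.

```python
import os
import sqlite3


def sync_files_table(conn, path, host, date):
    """Make the files table mirror the tree under path, keeping existing rows.

    Hypothetical helper: assumes a table files(url, host, lastvisit, ...)
    where url holds the full filesystem path of each file.
    """
    # Full paths of every file currently on disk
    current = set()
    for root, dirs, files in os.walk(path):
        for filename in files:
            current.add(os.path.join(root, filename))

    cur = conn.cursor()
    cur.execute("SELECT url FROM files")
    stored = {row[0] for row in cur.fetchall()}

    added = current - stored      # on disk, not yet in the table
    removed = stored - current    # in the table, no longer on disk

    # Insert only the new files; other columns take their defaults
    cur.executemany(
        "INSERT INTO files (url, host, lastvisit) VALUES (?, ?, ?)",
        [(p, host, date) for p in sorted(added)],
    )
    # Delete only the records whose file has vanished
    cur.executemany(
        "DELETE FROM files WHERE url = ?",
        [(p,) for p in sorted(removed)],
    )
    conn.commit()
    return added, removed
```

Fetching all stored URLs into a set before deleting avoids the need for a second cursor, at the cost of holding every URL in memory, which should be fine for a modest number of files.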


