Uniquely identifying each & every html template

Ferrous Cranus nikos.gr33k at gmail.com
Mon Jan 21 02:08:00 EST 2013


Τη Σάββατο, 19 Ιανουαρίου 2013 11:00:15 π.μ. UTC+2, ο χρήστης Dave Angel έγραψε:
> On 01/19/2013 03:39 AM, Ferrous Cranus wrote:
> 
> > Τη Σάββατο, 19 Ιανουαρίου 2013 12:09:28 π.μ. UTC+2, ο χρήστης Dave Angel έγραψε:
> 
> >
> 
> >> I don't understand the problem.  A trivial Python script could scan
> 
> >>
> 
> >> through all the files in the directory, checking which ones are missing
> 
> >>
> 
> >> the identifier, and rewriting the file with the identifier added.
> 
> >
> 
> >>
> 
> >> So, since you didn't come to that conclusion, there must be some other
> 
> >>
> 
> >> reason you don't want to edit the files.  Is it that the real sources
> 
> >>
> 
> >> are elsewhere (e.g. Dreamweaver), and whenever one recompiles those
> 
> >>
> 
> >> sources, these files get replaced (without identifiers)?
> 
> >
> 
> > Exactly. Files get modified/updates thus the embedded identifier will be missing each time. So, relying on embedding code to html template content is not practical.
> 
> >
> 
> >
> 
> >> If that's the case, then I figure you have about 3 choices:
> 
> >> 1) use the file path as your key, instead of requiring a number
> 
> >
> 
> > No, i cannot, because it would mess things at a later time on when i for example:
> 
> >
> 
> > 1. mv name.html othername.html   (document's filename altered)
> 
> > 2. mv name.html /subfolder/name.html   (document's filepath altered)
> 
> >
> 
> > Hence, new database counters will be created for each of the above actions, therefore i will be having 2 counters for the same file, and the latter one will start from a zero value.
> 
> >
> 
> > Pros: If the file's contents gets updated, that won't affect the counter.
> 
> > Cons: If filepath is altered, then duplicity will happen.
> 
> >
> 
> >
> 
> >> 2) use a hash of the page  (eg. md5) as your key.  of course this could
> 
> >> mean that you get a new value whenever the page is updated.  That's good
> 
> >> in many situations, but you don't give enough information to know if
> 
> >> that's desirable for you or not.
> 
> >
> 
> > That sounds nice! A hash is a mathematical algorithm that produce a unique number after analyzing each file's contents? But then again what if the html templated gets updated? That update action will create a new hash for the file, hence another counter will be created for the same file, same end result as (1) solution.
> 
> >
> 
> > Pros: If filepath is altered, that won't affect the counter.
> 
> > Cons: If file's contents gets updated the, then duplicity will happen.
> 
> >
> 
> >
> 
> >> 3) Keep an external list of filenames, and their associated id numbers.
> 
> >> The database would be a good place to store such a list, in a separate table.
> 
> >
> 
> > I did not understand that solution.
> 
> >
> 
> >
> 
> > We need to find a way so even IF:
> 
> >
> 
> > (filepath gets modified && file content's gets modified) simultaneously the counter will STILL retains it's value.
> 
> >
> 
> 
> 
> You don't yet have a programming problem, you have a specification 
> 
> problem.  Somehow, you want a file to be considered "the same" even when 
> 
> it's moved, renamed and/or modified.  So all files are the same, and you 
> 
> only need one id.
> 
> Don't pick a mechanism until you have an self-consistent spec.


I do have the specification.

An .html page must retain its database counter value even if its:

(renamed && moved && contents altered)


[original attributes of the file]:

filename: index.html
filepath: /home/nikos/public_html/
contents: <html> Hello </html>

[get modified to]:

filename: index2.html
filepath: /home/nikos/public_html/folder/subfolder/
contents: <html> Hello, people </html>


The file is still the same, even though its attributes got modified.
We want counter.py script to still be able to "identify" the .html page, hence its counter value in order to get increased properly.



More information about the Python-list mailing list