Uniquely identifying each & every html template

Dave Angel d at davea.name
Fri Jan 18 17:09:28 EST 2013


On 01/18/2013 03:48 PM, Ferrous Cranus wrote:
> I use this .htaccess file to rewrite every .html request to counter.py
>
> # =================================================================================================================
> RewriteEngine On
> RewriteCond %{REQUEST_FILENAME} -f
> RewriteRule ^/?(.+\.html) /cgi-bin/counter.py?htmlpage=$1 [L,PT,QSA]
> # =================================================================================================================
>
>
>
> The counter.py script creates, stores, increments and displays a counter for each webpage of every website I have.
> It's supposed to identify each webpage by a <!-- Number --> comment and then do its database work from there.
>
> # =================================================================================================================
> # open current html template and get the page ID number
> # =================================================================================================================
> import re
>
> with open( '/home/nikos/public_html/' + page ) as f:
>     # read first line of the file
>     firstline = f.readline()
>
> # find the ID of the file and store it (None if the first line carries no ID)
> match = re.match( r'<!-- (\d+) -->', firstline )
> pin = match.group(1) if match else None
> # =================================================================================================================
>
> It works as expected, and you can see it working by viewing http://superhost.gr (the counter is at the bottom).
>
> What is the problem, you ask?!
> The problem is that I have to insert, at the very first line of every .html template of mine, a unique string containing a number, like:
>
> index.html      <!-- 1 -->
> somefile.html   <!-- 2 -->
> other.html      <!-- 3 -->
> nikos.html      <!-- 4 -->
> cool.html       <!-- 5 -->
>
> to HELP counter.py identify each webpage in a unique way.
>
> Well... there are about 1000 .html files inside my DocumentRoot and I cannot edit ALL of them, of course!
> Some of them were created with Notepad++, some with Dreamweaver and others with Joomla CMS.
> Even if I could embed a number in every html page, it would be a very tedious task, and what if a change was in order? Edit them ALL again? Of course not.
>
> My question is: HOW am I supposed to identify each and every html webpage I have without editing them to embed a string containing a number? In other words, without altering their contents.
>
> Or perhaps by modifying them a bit... but in an automatic way...?
>
> Thank you ALL in advance.
>
>

I don't understand the problem.  A trivial Python script could scan 
through all the files in the directory, checking which ones are missing 
the identifier, and rewriting the file with the identifier added.
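For example, here is a minimal sketch of such a script. It assumes the identifier has to be the very first line (since that is what counter.py reads), that it uses the same <!-- N --> form the regex in counter.py expects, and that new numbers should continue from the highest one already in use.  The function name add_missing_ids is just illustrative.  Files are handled as raw bytes, which mostly sidesteps the question of which editor or encoding produced them (UTF-16 files would still need special care).  Back up the directory before running anything that rewrites files in place.

```python
import os
import re

# Same pattern counter.py looks for on the first line, compiled as bytes so
# files saved by different editors can be read without decoding them.
ID_PATTERN = re.compile(rb'<!-- (\d+) -->')


def add_missing_ids(root):
    """Prepend '<!-- N -->' to every .html file under root that lacks one.

    Returns a dict mapping each newly tagged file path to the id it received.
    """
    html_files = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith('.html'):
                html_files.append(os.path.join(dirpath, name))
    html_files.sort()  # deterministic numbering from run to run

    # First pass: collect ids already present and files missing one.
    existing = {}
    missing = []
    for path in html_files:
        with open(path, 'rb') as f:
            first = f.readline()
        m = ID_PATTERN.match(first)
        if m:
            existing[path] = int(m.group(1))
        else:
            missing.append(path)

    # Second pass: give each untagged file the next free number.
    next_id = max(existing.values(), default=0) + 1
    assigned = {}
    for path in missing:
        with open(path, 'rb') as f:
            content = f.read()
        with open(path, 'wb') as f:
            f.write(b'<!-- %d -->\n' % next_id + content)
        assigned[path] = next_id
        next_id += 1
    return assigned
```

Calling add_missing_ids('/home/nikos/public_html') once tags everything that is missing an id; running it again after new pages are added only touches the new ones.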

So, since you didn't come to that conclusion, there must be some other 
reason you don't want to edit the files.  Is it that the real sources 
are elsewhere (e.g. Dreamweaver), and whenever one recompiles those 
sources, these files get replaced (without identifiers)?

If that's the case, then I figure you have about 3 choices:

1) use the file path as your key, instead of requiring a number
2) use a hash of the page  (e.g. md5) as your key.  Of course this could 
mean that you get a new value whenever the page is updated.  That's good 
in many situations, but you don't give enough information to know if 
that's desirable for you or not.
3) Keep an external list of filenames, and their associated id numbers. 
  The database would be a good place to store such a list, in a separate 
table.
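Rough sketches of these, for illustration only.  Option 1 needs no code at all beyond using the htmlpage value that the RewriteRule already passes to counter.py as the key.  For options 2 and 3, sqlite3 stands in for whatever database counter.py actually uses, and the pages table, its columns and the function names are all hypothetical.

```python
import hashlib
import sqlite3


# Option 2: the md5 of the file's contents is the key.  Any edit to the
# page produces a different key, and therefore a fresh counter.
def key_from_hash(path):
    h = hashlib.md5()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(65536), b''):
            h.update(chunk)
    return h.hexdigest()


# Option 3: a separate table maps each page path to a numeric id.
def init_db(conn):
    conn.execute(
        'CREATE TABLE IF NOT EXISTS pages ('
        'id INTEGER PRIMARY KEY, '
        'path TEXT UNIQUE NOT NULL)'
    )
    conn.commit()


def page_id(conn, htmlpage):
    """Return the id for htmlpage, creating one on its first request."""
    # The UNIQUE constraint makes this a no-op for paths already stored.
    conn.execute('INSERT OR IGNORE INTO pages (path) VALUES (?)', (htmlpage,))
    conn.commit()
    row = conn.execute(
        'SELECT id FROM pages WHERE path = ?', (htmlpage,)
    ).fetchone()
    return row[0]
```

With option 3 the id survives edits to the page, but renaming or moving a file gives it a new id unless you update its row in the table.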

-- 
DaveA


