Using filepath method to identify an .html page

Ferrous Cranus nikos.gr33k at gmail.com
Tue Jan 22 10:21:28 EST 2013


On Tuesday, 22 January 2013 at 5:05:49 PM UTC+2, Dave Angel wrote:
> On 01/22/2013 09:55 AM, Ferrous Cranus wrote:
> > On Tuesday, 22 January 2013 at 4:33:03 PM UTC+2, Chris Angelico wrote:
> >> On Wed, Jan 23, 2013 at 12:57 AM, Ferrous Cranus <nikos.gr33k at gmail.com> wrote:
> >>> On Tuesday, 22 January 2013 at 3:04:41 PM UTC+2, Steven D'Aprano wrote:
> >>>> What do you expect int("my-web-page.html") to return? Should it return 23
> >>>> or 794 or 109432985462940911485 or 42?
> >>>
> >>> I expected a unique number to be produced from the given string, so I could have a (number <=> string) relation. What does int( somestring ) really return? I don't have IDLE to test.
> >>
> >> Just run python without any args, and you'll get interactive mode. You
> >> can try things out there.
> >>
> >>> This counter.py will run in a shared hosting environment, so absolute paths are BIG and are expected to look like this:
> >>>
> >>> /home/nikos/public_html/varsa.gr/articles/html/files/index.html
> >>
> >> That's not big. Trust me, modern databases work just fine with unique
> >> indexes like that. The most common way to organize the index is with a
> >> binary tree, so the database has to look through log(N) entries.
> >> That's like figuring out if the two numbers 142857 and 857142 are the
> >> same; you don't need to look through 1,000,000 possibilities, you just
> >> need to look through the six digits each number has.
> >>
> >>> 'pin' has to be a number, because if I used the column 'page' instead, just imagine the database capacity needed to hold detailed information for each and every .html page requested by visitors!!!
> >>
> >> Not that bad actually. I've happily used keys easily that long, and
> >> expected the database to ensure uniqueness without costing
> >> performance.
> >>
> >>> So I really, really need to associate a (4-digit integer <=> HTML page's absolute path)
> >>
> >> Is there any chance that you'll have more than 10,000 pages? If so, a
> >> four-digit number is *guaranteed* to have duplicates. And if you
> >> research the Birthday Paradox, you'll find that any sort of hashing
> >> algorithm is likely to produce collisions a lot sooner than that.
> >>
> >>> Maybe it can be done by creating a MySQL association between the two columns, but I don't know how such a thing can be done (if it can).
> >>>
> >>> So, that's why I need to get a "unique" number out of a string. Please help.
> >>
> >> Ultimately, that unique number would end up being a foreign key into a
> >> table of URLs and IDs. So just skip that table and use the URLs
> >> directly - much easier. In this instance, there's no value in
> >> normalizing.
> >>
> >> ChrisA
> >
> > I insist on using, or perhaps am compelled to use, a key that associates a number with a filename.
> > Would you help please?
> >
> > I don't know how this is supposed to be written. I just know I need this:
> >
> > number = function_that_returns_a_number_out_of_a_string( absolute_path_of_a_html_file)
> >
> > Would someone help me write that in Python? We are talking about 1 line of code here....
> >
>
> I gave you every piece of that code in my last response.  So you're not
> willing to compose the line from the clues?

I cannot.
I don't even know yet whether hashing is the right tool for what I need.

The only thing I know is that:

a) I only need to get a number out of a string (the string being an absolute path)
b) That number needs to be unique, because it identifies the actual HTML file.

Would you help me write this in Python?
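
For reference, one way to satisfy (a) and (b) is a lookup table that hands out the next free integer the first time a path is seen, along the lines of the "table of URLs and IDs" mentioned in the quoted text. This is only a minimal sketch under assumptions: it uses Python's standard sqlite3 module with an in-memory database as a stand-in for the MySQL database discussed in the thread, and the names pages, pin, path and pin_for are hypothetical.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the real MySQL one.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pages ("
    " pin INTEGER PRIMARY KEY AUTOINCREMENT,"
    " path TEXT NOT NULL UNIQUE)"
)


def pin_for(path):
    """Return the integer assigned to path, assigning a new one if needed."""
    row = conn.execute(
        "SELECT pin FROM pages WHERE path = ?", (path,)
    ).fetchone()
    if row is not None:
        return row[0]  # path already known: reuse its number
    cur = conn.execute("INSERT INTO pages (path) VALUES (?)", (path,))
    conn.commit()
    return cur.lastrowid  # number the database generated for the new row


first = pin_for("/home/nikos/public_html/index.html")
second = pin_for("/home/nikos/public_html/varsa.gr/articles/html/files/index.html")
again = pin_for("/home/nikos/public_html/index.html")
print(first, second, again)
```

Here the uniqueness comes from the database's key constraint and not from arithmetic on the string, so two different paths can never share a number, and the same path always maps back to the number it was first given.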

Why the hell does

pin = int ( '/home/nikos/public_html/index.html' )

fail? Is it because it has slashes in it?
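
For what it's worth, the slashes are not the specific problem. int() only converts text that is itself a written-out integer, such as '42'; any other string raises ValueError, with or without slashes. A small sketch of that behaviour follows (try_int is a hypothetical helper name used only for illustration):

```python
def try_int(text):
    """Return int(text), or None when text is not a written-out integer."""
    try:
        return int(text)
    except ValueError:
        # Raised for any string that is not an integer literal.
        return None


print(try_int("42"))
print(try_int("index.html"))  # no slashes, still not a number
print(try_int("/home/nikos/public_html/index.html"))
```

Only the first call yields a number; the other two strings are rejected because int() parses numbers and does not derive one from arbitrary text.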


