Using filepath method to identify an .html page
Dave Angel
d at davea.name
Tue Jan 22 14:00:16 EST 2013
On 01/22/2013 01:26 PM, Ferrous Cranus wrote:
>
>> <snip>
>
> sub hashit {
> my $url=shift;
> my @ltrs=split(//,$url);
> my $hash = 0;
>
> foreach my $ltr(@ltrs){
> $hash = ( $hash + ord($ltr)) %10000;
> }
> printf "%s: %0.4d\n",$url,$hash
>
> }
>
>
> which yields:
> $ perl testMD5.pl
> /index.html: 1066
> /about/time.html: 1547
>
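For experimenting, here is a rough Python port of the quoted Perl `hashit` sub. The port is illustrative and not from the thread, and it assumes ASCII paths, where Perl's byte-wise `split(//, ...)` and Python's character iteration agree. It reproduces the two outputs above. It also shows a weakness beyond the small range: the hash is just a sum of character codes, so any rearrangement of the same characters gets the same number.

```python
def hashit(url: str) -> int:
    """Port of the quoted Perl sub: sum of character codes, mod 10000."""
    h = 0
    for ch in url:
        h = (h + ord(ch)) % 10000
    return h


for url in ("/index.html", "/about/time.html", "/time/about.html"):
    # "%04d" mirrors the Perl "%0.4d": zero-pad to at least 4 digits
    print("%s: %04d" % (url, hashit(url)))
```

The last two paths contain the same characters in a different order, so they print the same value.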
If you use that algorithm to get a 4-digit number, it'll look fine for
the first few files. But with 100 files you've got almost a 40% chance
of a collision (that's the birthday problem, even if the values were
spread evenly over 0000-9999), and with 10001 files you've got a 100%
chance, because there are only 10000 possible values.
So is it really okay to reuse the same integer for different files?
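The figures above can be checked with the exact birthday-problem formula. This is a minimal sketch under one assumption: an ideal hash that spreads values uniformly over 10000 possibilities. The sum-of-characters hash quoted above does worse than that, since typical URL paths sum to only a few thousand and anagrams collide outright. The function name `collision_probability` is just for illustration.

```python
def collision_probability(n_items: int, n_values: int) -> float:
    """Exact chance that at least two of n_items uniformly random values,
    each drawn from n_values possibilities, are equal."""
    if n_items > n_values:
        return 1.0  # pigeonhole: more items than possible values
    p_all_distinct = 1.0
    for k in range(n_items):
        # the (k+1)-th item must avoid the k values already taken
        p_all_distinct *= (n_values - k) / n_values
    return 1.0 - p_all_distinct


print(f"{collision_probability(100, 10_000):.2f}")     # 0.39
print(f"{collision_probability(10_001, 10_000):.2f}")  # 1.00
```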
I tried to help you when you were using the md5 algorithm. By keeping
enough digits/characters of the digest, you can make the likelihood of
a collision vanishingly small. But 4 digits? Don't be ridiculous.
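A sketch of the md5 approach in Python, assuming each page is identified by its URL path encoded as UTF-8. The helper names `page_id` and `approx_collision_probability` are hypothetical. The estimate uses the standard birthday approximation 1 - exp(-n(n-1)/(2N)), where N = 16**n_hex is the number of possible truncated IDs.

```python
import hashlib
import math


def page_id(url: str, n_hex: int = 16) -> str:
    """Hypothetical helper: first n_hex hex characters of the MD5 of the path."""
    digest = hashlib.md5(url.encode("utf-8")).hexdigest()  # 32 hex characters
    return digest[:n_hex]


def approx_collision_probability(n_items: int, n_values: int) -> float:
    """Birthday approximation 1 - exp(-n(n-1)/(2N)); expm1 keeps tiny values accurate."""
    return -math.expm1(-n_items * (n_items - 1) / (2 * n_values))


print(page_id("/about/time.html"), page_id("/time/about.html"))  # anagrams no longer collide
for n_hex in (4, 8, 16):
    p = approx_collision_probability(10_000, 16 ** n_hex)
    print(f"{n_hex:2d} hex chars, 10000 pages: p(collision) ~ {p:.3g}")
```

With only 4 hex characters, a collision among 10000 pages is practically certain. With 16, the chance is negligible. MD5 is used here only as an identifier, not for security.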
--
DaveA