Generate unique ID for URL

Chris Kaynor ckaynor at zindagigames.com
Tue Nov 13 19:26:19 EST 2012


One option would be to use a hash: Python's built-in hash(), a 32-bit
CRC, 128-bit MD5, 256-bit SHA-256, or one of the many others that exist,
depending on your needs. (Note that the built-in hash() of a string is
not stable across runs when hash randomization is enabled.) Higher bit
counts reduce the odds of accidental collisions; use a
cryptographically secure hash if outside attacks matter. A hash is
one-way, so you'd have to roll your own means of mapping the hash back
to the string (a lookup table, for instance) if you ever need it for
debugging, and there is always the possibility of collisions. A
similar solution would be to generate a pseudo-random GUID using the
URL as the seed.
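
As a rough sketch of the hashing options above (assuming Python 3, where
the URL must be encoded to bytes first; the function names hash_id,
crc_id and uuid_id are made up for illustration). The last function uses
uuid.uuid5, which is a name-based UUID rather than a literally seeded
random one, but it gives the same "same URL, same GUID" behaviour:

```python
import base64
import hashlib
import uuid
import zlib


def hash_id(url, digest_bytes=8):
    """Return a URL-safe ID built from the first digest_bytes of a SHA-256 hash."""
    digest = hashlib.sha256(url.encode("utf-8")).digest()[:digest_bytes]
    # Drop the '=' padding; it carries no information for a fixed-length digest.
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def crc_id(url):
    """Return a 32-bit CRC of the URL as 8 hex digits (fast, but collides easily)."""
    return "%08x" % (zlib.crc32(url.encode("utf-8")) & 0xFFFFFFFF)


def uuid_id(url):
    """Return a name-based (SHA-1) UUID; the same URL always yields the same UUID."""
    return str(uuid.uuid5(uuid.NAMESPACE_URL, url))
```

Truncating the digest trades length for collision risk: 8 bytes gives an
11-character ID, and raising digest_bytes lowers the odds of two URLs
sharing an ID.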

You could use a counter if all IDs are generated by a single process
(and even in other cases with some work).
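
A minimal single-process sketch of the counter idea, assuming Python 3
and that keeping both lookup tables in memory is acceptable. The class
name UrlCounter and its methods are hypothetical:

```python
import itertools


class UrlCounter:
    """Hand out sequential integer IDs and remember the mapping both ways."""

    def __init__(self):
        self._next = itertools.count(1)
        self._id_by_url = {}
        self._url_by_id = {}

    def get_id(self, url):
        # Reuse the existing ID so the same URL always maps to the same value.
        if url not in self._id_by_url:
            new_id = next(self._next)
            self._id_by_url[url] = new_id
            self._url_by_id[new_id] = url
        return self._id_by_url[url]

    def get_url(self, url_id):
        # Raises KeyError for an ID that was never handed out.
        return self._url_by_id[url_id]
```

The IDs are as short as they can be and never collide, but they only
mean something to the process (or shared store) that holds the tables.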

If you want to be able to go both ways, base64 encoding is
probably your best bet, though you might get shorter IDs by
compressing the URL before encoding it.
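
Something like the following could combine the two, assuming Python 3.
The helper names encode_url and decode_url and the one-character "z"/"r"
prefix are my own invention. zlib adds a few bytes of overhead, so short
URLs can come out longer compressed; the sketch keeps whichever form is
smaller:

```python
import base64
import zlib


def encode_url(url):
    """Return a reversible URL-safe ID, compressed only when that helps."""
    raw = url.encode("utf-8")
    packed = zlib.compress(raw, 9)
    # Mark the chosen form with a prefix: "z" = zlib-compressed, "r" = raw.
    if len(packed) < len(raw):
        return "z" + base64.urlsafe_b64encode(packed).decode("ascii")
    return "r" + base64.urlsafe_b64encode(raw).decode("ascii")


def decode_url(url_id):
    """Recover the original URL from an ID produced by encode_url."""
    payload = base64.urlsafe_b64decode(url_id[1:].encode("ascii"))
    if url_id[0] == "z":
        payload = zlib.decompress(payload)
    return payload.decode("utf-8")
```

Since base64 expands its input by about a third, the result is only
shorter than the URL itself when compression saves more than that.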
Chris


On Tue, Nov 13, 2012 at 3:56 PM, Richard <richardbp at gmail.com> wrote:
> Good point - one way encoding would be fine.
>
> Also this is performed millions of times so ideally efficient.
>
>
> On Wednesday, November 14, 2012 10:34:03 AM UTC+11, John Gordon wrote:
>> In <0692e6a2-343c-4eb0-be57-fe5c815efb99 at googlegroups.com> Richard <richardbp at gmail.com> writes:
>>
>> > I want to create a URL-safe unique ID for URL's.
>>
>> > Currently I use:
>> > url_id = base64.urlsafe_b64encode(url)
>>
>> > >>> base64.urlsafe_b64encode('docs.python.org/library/uuid.html')
>> > 'ZG9jcy5weXRob24ub3JnL2xpYnJhcnkvdXVpZC5odG1s'
>>
>> > I would prefer more concise ID's.
>> > What do you recommend? - Compression?
>>
>> Does the ID need to contain all the information necessary to recreate the
>> original URL?
>>
>> --
>> John Gordon                   A is for Amy, who fell down the stairs
>> gordon at panix.com              B is for Basil, assaulted by bears
>>                                 -- Edward Gorey, "The Gashlycrumb Tinies"