Generating valid identifiers

Steven D'Aprano steve+comp.lang.python at pearwood.info
Thu Jul 26 11:30:09 EDT 2012


On Thu, 26 Jul 2012 14:26:16 +0200, Laszlo Nagy wrote:

> I do not want this program to generate very long identifiers. It would
> increase SQL parsing time,

Will that increase in SQL parsing time be more, or less, than the time it
takes to compute CRC32 or SHA hashes and append them to a truncated
identifier?


> * Would it be a problem to use CRC32 instead of SHA? (Since security is
> not a problem, and CRC32 is faster.)

What happens if you get a collision?

That is, you have two different long identifiers:

a.b.c.d...something
a.b.c.d...anotherthing

which by bad luck both hash to the same value:

a.b.c.d.$AABB99
a.b.c.d.$AABB99

(or whatever).
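
To make the collision question concrete, here is a minimal sketch of one way
to notice a collision rather than silently emit two identical names. It is
not the original poster's code: the names crc32_hex, shorten and shorten_all,
the 30-character limit and the "$" suffix format are all assumptions chosen
for illustration.

```python
import zlib


def crc32_hex(text):
    """Return the CRC32 of text as 8 uppercase hex digits."""
    return "%08X" % (zlib.crc32(text.encode("utf-8")) & 0xFFFFFFFF)


def shorten(identifier, max_len=30, digest=crc32_hex):
    """Hypothetical helper: truncate a long name and append '$' + digest."""
    if len(identifier) <= max_len:
        return identifier
    suffix = "$" + digest(identifier)
    return identifier[:max_len - len(suffix)] + suffix


def shorten_all(identifiers, max_len=30, digest=crc32_hex):
    """Shorten every name, raising instead of returning duplicates."""
    owner = {}   # short form -> the original name that produced it
    result = {}  # original name -> short form
    for name in identifiers:
        short = shorten(name, max_len, digest)
        # setdefault records the first owner; a different owner is a clash.
        if owner.setdefault(short, name) != name:
            raise ValueError(
                "%r and %r both shorten to %r" % (owner[short], name, short)
            )
        result[name] = short
    return result
```

Raising an exception is only one possible policy. The point is that whatever
hash you pick, the generator has to decide what to do when two names collide.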



> * I'm truncating the digest value to 10 characters.  Is it safe enough?
> I don't want to use more than 10 characters, because then it wouldn't be
> possible to recognize the original name. 

> * Can somebody think of a
> better algorithm, that would give a bigger chance of recognizing the
> original identifier from the modified one?

Rather than truncating the most significant part of the identifier, the 
field name, you should truncate the least important part, the middle.

a.b.c.d.e.f.g.something

goes to:

a.b...g.something

or similar.
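
A rough sketch of that idea follows. The function name elide_middle, its
parameters and the "..." marker are my own assumptions for illustration. It
always keeps the final component (the field name), then alternately adds
components from the front and the back for as long as the result still fits.

```python
def elide_middle(identifier, max_len=30, sep=".", marker="..."):
    """Hypothetical helper: drop components from the middle of a dotted name.

    If the final component plus the marker is already longer than max_len,
    the result will exceed max_len; this sketch does not handle that case.
    """
    if len(identifier) <= max_len:
        return identifier

    parts = identifier.split(sep)
    head, tail = [], [parts[-1]]      # always keep the final component
    lo, hi = 0, len(parts) - 2        # unused components are parts[lo..hi]
    take_head = True

    def render(h, t):
        return sep.join(h) + marker + sep.join(t)

    while lo <= hi:
        if take_head:
            cand_head, cand_tail = head + [parts[lo]], tail
        else:
            cand_head, cand_tail = head, [parts[hi]] + tail
        if len(render(cand_head, cand_tail)) > max_len:
            break                     # the next component would not fit
        head, tail = cand_head, cand_tail
        if take_head:
            lo += 1
        else:
            hi -= 1
        take_head = not take_head

    return render(head, tail)


print(elide_middle("a.b.c.d.e.f.g.something", 17))   # a.b...g.something
```

Note that this does nothing for uniqueness: two names that differ only in
the dropped middle components shorten to the same string, so the collision
question above still applies.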



-- 
Steven


