16bit hash

Robin Becker robin at reportlab.com
Thu Jun 28 08:13:33 EDT 2007


Martin v. Löwis wrote:

0 the ideal hash

:)

can't be argued with

>.......
> So: what are your input data, and what is the
> distribution among them?
> 
> Regards,
> Martin
> 
I'm trying to create UniqueIDs for dynamic PostScript fonts. According to my 
resources we don't actually need to use these, but if they are required by a 
particular PostScript program (perhaps to make a print run efficient) then the 
private range of these IDs is 4000000 <= UID <= 4999999, i.e. a range of one 
million.

So I probably really need an 18-bit hash.
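
As a quick check of the arithmetic, here is a minimal sketch comparing 
power-of-two hash widths with the private range quoted above. The names 
UID_MIN, UID_MAX, RANGE_SIZE and fits_in_range are made up for illustration.

```python
# Private UniqueID range quoted above: 4000000 <= UID <= 4999999.
UID_MIN = 4000000
UID_MAX = 4999999
RANGE_SIZE = UID_MAX - UID_MIN + 1  # one million distinct values


def fits_in_range(bits):
    """Return True if every value of a `bits`-wide hash fits in the range."""
    return 2 ** bits <= RANGE_SIZE


# Print the number of values each hash width produces and whether it fits.
for bits in (16, 18, 19, 20):
    print(bits, 2 ** bits, fits_in_range(bits))
```

An 18-bit hash fits comfortably, 19 bits is the widest power-of-two width that 
still fits, and 20 bits slightly overshoots the one million available values.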

The data going into the font consists of:

fontBBox '[-415 -431 2014 2033]'
charmaps ['dup (\000) 0 get /C0 put',......]
metrics ['/C0 1251 def',.....]
bboxes ['/C29 [0 0 512 0] def',.......]
chardefs ['/C0 {newpath 224 418 m 234 336 ......def}',......]

i.e. a bunch of lists of strings which are eventually joined together and 
written out with a template to make the PostScript definition.
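
One possible data-based scheme, sketched under assumptions: feed all the 
strings into a standard digest and reduce the result modulo the size of the 
private range. The function name font_uid, the choice of SHA-1, the UTF-8 
encoding and the separator bytes are all illustrative choices, not anything 
the PostScript side requires.

```python
import hashlib

UID_MIN = 4000000    # start of the private UniqueID range
UID_RANGE = 1000000  # number of IDs in 4000000..4999999


def font_uid(font_bbox, charmaps, metrics, bboxes, chardefs):
    """Derive a UniqueID in the private range from the font's data.

    Hypothetical helper: the same inputs always give the same ID, and
    different inputs give IDs spread roughly uniformly over the range.
    """
    h = hashlib.sha1()
    for part in ([font_bbox], charmaps, metrics, bboxes, chardefs):
        for s in part:
            h.update(s.encode("utf-8"))
            h.update(b"\n")   # separator between strings
        h.update(b"\x00")     # separator between lists
    # Reduce the 160-bit digest to the one-million-wide range.
    return UID_MIN + int.from_bytes(h.digest(), "big") % UID_RANGE


# Usage with the sample values shown above (chardefs left empty here
# because the example in the message is elided).
uid = font_uid(
    "[-415 -431 2014 2033]",
    ["dup (\\000) 0 get /C0 put"],
    ["/C0 1251 def"],
    ["/C29 [0 0 512 0] def"],
    [],
)
print(uid)
```

Taking the digest modulo one million avoids having to pick a power-of-two 
width at all, and identical fonts produced by separate processes get the same 
ID without the processes needing to communicate.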

The UniqueID is used by PS interpreters to avoid recreating particular glyphs, 
so ideally I would number these fonts sequentially using a global count, but in 
practice several processes separated by application and time can produce 
PostScript which eventually gets merged back together.

If the UIDs clash then the printer produces very strange output.

I'm fairly sure there's no obvious Python way to ensure the separated processes 
can communicate except via the printer. So either I use a Python-based scheme 
which reduces the risk of clashes, i.e. a random or some data-based hash scheme, 
or I attempt to produce a PostScript solution like looking for a private global 
sequence number.

I'm not sure my PostScript is really good enough to do the latter, so I hoped 
to pursue a Python-based approach which has a low probability of a clash. 
Originally I thought the range was a 16-bit number, which is why I started with 
16-bit hashes.
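
To put a number on "low probability", here is a rough sketch of the usual 
birthday-problem estimate. It assumes the IDs behave like independent uniform 
random draws from the one-million-wide range, which a random scheme or a good 
hash should approximate. The function name clash_probability is made up for 
illustration.

```python
import math


def clash_probability(n_fonts, range_size=1000000):
    """Probability that at least two of n_fonts uniform random IDs coincide."""
    if n_fonts > range_size:
        return 1.0  # more fonts than IDs: a clash is certain
    # Sum the logs of the per-font "no clash so far" probabilities.
    log_no_clash = sum(math.log1p(-i / range_size) for i in range(n_fonts))
    # 1 - exp(x), computed without losing precision for small x.
    return -math.expm1(log_no_clash)


for n in (10, 100, 1000):
    print(n, clash_probability(n))
```

Under that assumption the risk grows roughly with the square of the number of 
fonts that end up merged into one job, and reaches about even odds at around 
1,180 fonts.
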
-- 
Robin Becker



