[Python-3000] Immutable bytes type and dbm modules

"Martin v. Löwis" martin at v.loewis.de
Tue Aug 7 05:27:40 CEST 2007


> I thought about this issue some more.
> 
> Given that the *dbm types strive for emulating dicts, I think it makes
> sense to use strings for the keys, and bytes for the values; this
> makes them more plug-compatible with real dicts. (We should ideally
> also change the keys() method etc. to return views.) This of course
> requires that we know the encoding used for the keys. Perhaps it would
> be acceptable to pick a conservative default encoding (e.g. ASCII) and
> add an encoding argument to the open() method.
> 
> Perhaps this will work? It seems better than using str8 or bytes for the keys.

It would work, but it would not be good. Traditionally, dbm files have
had no notion of character encoding for keys or values; they are
really bytes:bytes mappings. The encoding used for the keys might not
be known, or it might not be consistent across all keys.
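
To illustrate the point, here is a minimal sketch in which a plain dict
stands in for a dbm file (an assumption made only so the example is
self-contained; the keys and the decodable_keys helper are hypothetical).
It shows that no single encoding necessarily covers every key:

```python
# A plain dict stands in for a dbm file; the keys below are made-up examples.
db = {
    b"plain": b"pure ASCII key",
    b"caf\xe9": b"written by a latin-1 program",
    b"caf\xc3\xa9": b"written by a UTF-8 program",
    b"\x00\x00\x00\xc8": b"key is a packed integer, not text",
}

def decodable_keys(mapping, encoding):
    """Return the keys that decode without error under the given encoding."""
    ok = []
    for key in mapping:
        try:
            key.decode(encoding)
        except UnicodeDecodeError:
            continue
        ok.append(key)
    return ok

ascii_ok = decodable_keys(db, "ascii")
utf8_ok = decodable_keys(db, "utf-8")
print(len(db), len(ascii_ok), len(utf8_ok))
```

With an ASCII default, three of the four keys could not even be
listed as strings.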

Furthermore, for the specific case of bsddb, some users pointed out that
they firmly believe keys must be bytes, since they *conceptually*
aren't text at all. "Big" users of bsddb create databases where some
tables are index tables for other tables; in such tables, the keys are
combinations of fields whose byte representation allows for
efficient lookup (akin to PostgreSQL's "create index foo_idx on foo(f1,
f2, f3);" where the key to the index becomes the concatenation of f1,
f2, and f3 - and f2 may be INTEGER, f3 TIMESTAMP WITHOUT TIME ZONE, say).
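
A rough sketch of such a composite key, using the struct module. The
layout is an assumption chosen for illustration (f1 as text padded to
8 bytes, f2 as an unsigned 32-bit integer, f3 as unsigned 64-bit
microseconds since 1970), and make_index_key is a hypothetical helper,
not anything bsddb provides. Big-endian, fixed-width fields make the
byte order of the keys agree with the order of the field tuples:

```python
import struct
from datetime import datetime, timedelta

# Assumed layout: 8-byte padded f1, unsigned 32-bit f2, unsigned 64-bit f3.
KEY_FORMAT = ">8sIQ"
EPOCH = datetime(1970, 1, 1)

def make_index_key(f1, f2, f3):
    """Concatenate the three fields into one fixed-width, big-endian key."""
    micros = (f3 - EPOCH) // timedelta(microseconds=1)
    return struct.pack(KEY_FORMAT, f1, f2, micros)

rows = [
    (b"bob", 2, datetime(2007, 8, 7)),
    (b"alice", 10, datetime(2007, 1, 1)),
    (b"alice", 2, datetime(2007, 8, 7, 5, 27, 40)),
    (b"alice", 2, datetime(2007, 8, 6)),
]

# Sorting the raw bytes gives the same order as sorting the field tuples.
keys = sorted(make_index_key(*row) for row in rows)

# A lookup on (f1, f2) becomes a scan over keys sharing a byte prefix.
prefix = struct.pack(">8sI", b"alice", 2)
matches = [k for k in keys if k.startswith(prefix)]
```

Such keys are arbitrary binary data; nothing about them is text, which
is why an encoding argument does not fit them.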

It's always possible to treat such keys as if they were latin-1, but this
is so unnaturally hacky that I didn't think of it.
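
For reference, a small sketch of why the latin-1 trick works at all:
latin-1 maps each of the 256 byte values to exactly one code point, so
any byte string round-trips through str unchanged. The packed key is a
made-up example:

```python
# Every possible byte value survives a latin-1 decode/encode round trip.
all_bytes = bytes(range(256))
as_text = all_bytes.decode("latin-1")
round_tripped = as_text.encode("latin-1")

# The resulting "text" is meaningless, though: a packed integer key
# (here the big-endian 32-bit value 200) turns into control characters
# followed by an accented letter.
packed = b"\x00\x00\x00\xc8"
fake_text = packed.decode("latin-1")
```

The round trip is lossless, but the strings it produces carry no
textual meaning.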

Regards,
Martin


