[Python-3000] Immutable bytes type and dbm modules

Guido van Rossum guido at python.org
Tue Aug 7 02:39:15 CEST 2007


On 8/6/07, "Martin v. Löwis" <martin at v.loewis.de> wrote:
> >> I don't know how to do that. All implementation strategies I
> >> can think of have significant drawbacks.
> >
> > Can you elaborate about the issues?
>
> It's a decision tree:
>
> 0. whichdb fails
>
> 1. should the DB APIs use strings or bytes as keys and values?
>    Given the discussion of bsddb, I went for "bytes". I replace
>
>    f["1"] = b"1"
>  with
>    f[b"1"] = b"1"
>
> 2. then, dumbdbm fails, with TypeError: keys must be strings.
>    I change __setitem__ to expect bytes instead of basestring
>
> 3. it fails with unhashable type: 'bytes' in line 166:
>
>    if key not in self._index:
>
>    _index is a dictionary. It's really essential that the key
>    can be found quickly in _index, since this is how it finds
>    the data in the database (so using, say, a linear search would
>    be no option)

I thought about this issue some more.

Given that the *dbm types strive to emulate dicts, I think it makes
sense to use strings for the keys, and bytes for the values; this
makes them more plug-compatible with real dicts. (We should ideally
also change the keys() method etc. to return views.) This of course
requires that we know the encoding used for the keys. Perhaps it would
be acceptable to pick a conservative default encoding (e.g. ASCII) and
add an encoding argument to the open() method.

Perhaps this will work? It seems better than using str8 or bytes for the keys.
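
A minimal sketch of the idea, not an actual patch to any *dbm module.
The names StrKeyDB and open_db are hypothetical, and a plain dict stands
in for the on-disk store. The sketch assumes a Python where bytes objects
are hashable, so it only illustrates the str-key / bytes-value interface
and the encoding argument, not dumbdbm's _index problem itself.

```python
from collections.abc import MutableMapping


class StrKeyDB(MutableMapping):
    """Hypothetical wrapper: str keys, bytes values, bytes-keyed store."""

    def __init__(self, store, encoding="ascii"):
        self._store = store        # stand-in for the real database
        self._encoding = encoding  # conservative default, as proposed

    def _encode(self, key):
        if not isinstance(key, str):
            raise TypeError("keys must be str")
        # Raises UnicodeEncodeError if the key does not fit the encoding.
        return key.encode(self._encoding)

    def __getitem__(self, key):
        return self._store[self._encode(key)]

    def __setitem__(self, key, value):
        if not isinstance(value, bytes):
            raise TypeError("values must be bytes")
        self._store[self._encode(key)] = value

    def __delitem__(self, key):
        del self._store[self._encode(key)]

    def __iter__(self):
        # Keys come back as str; keys() and items() are views via the ABC.
        for raw in self._store:
            yield raw.decode(self._encoding)

    def __len__(self):
        return len(self._store)


def open_db(store=None, encoding="ascii"):
    """Hypothetical open() taking the proposed encoding argument."""
    return StrKeyDB({} if store is None else store, encoding)
```

With this, f = open_db(); f["1"] = b"1" behaves like a dict with str
keys, while open_db(encoding="utf-8") would be needed for non-ASCII keys.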

> > Not quite. It's the least evil. I'm hoping to put off the decision.
>
> For how long? Do you expect to receive further information that will
> make a decision simpler?

I'm waiting for a show-stopper issue that can't be solved without
having an immutable bytes type. It would be great if we could prove to
ourselves that such a show-stopper will never happen; or if we found
one quickly. But so far the show-stopper candidates aren't convincing.
At the same time we still have enough uses of str8 and PyString left
in the code base that we can't kill str8 yet.

It would be great if we had the decision before alpha 1 but I'm okay
if it remains open a bit longer (1-2 months past alpha 1).

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

