{} for set notation

Nick Vatamaniuc vatamane at gmail.com
Fri Jul 14 12:05:33 EDT 2006


I really like the set notation idea. Now that sets are first class
"citizens" along with dicts, lists and tuples I think they should be
used when it makes sense to use them. A keyset of a dictionary should be
viewed as a set, not a list, because it is a key_set_ after all. Also
sets should get back their traditional notation of '{' and '}', which
dictionaries have been 'borrowing' all this time in Python.

When defining a non-empty set it could be unambiguous to share the
notation. So:
{1,2,3}  is  set([1,2,3])
while
{1:'a', 2:'b', 3:'c'} could still be a dictionary. Of course then
{1, 2, 3:'b'} would be undefined and would raise an exception or
alternatively be equivalent to {1:None, 2:None, 3:'b'} - a dictionary.
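
The three cases above can be tried in an interpreter that accepts brace
set literals. This is only a sketch, written against Python 3 rather
than the 2.x of this thread, and `classify` is a hypothetical helper
name made up for the illustration. In that interpreter the mixed form
takes the first of the two options above: it is rejected outright.

```python
def classify(source):
    """Report what kind of object a brace literal builds.

    `classify` is a hypothetical helper used only for this illustration.
    """
    try:
        code = compile(source, "<literal>", "eval")
    except SyntaxError:
        return "syntax error"
    return type(eval(code)).__name__


print(classify("{1, 2, 3}"))                 # set
print(classify("{1: 'a', 2: 'b', 3: 'c'}"))  # dict
print(classify("{1, 2, 3: 'b'}"))            # syntax error
```

The alternative reading, {1:None, 2:None, 3:'b'}, would need the parser
to accept the mixed form, which this sketch does not attempt to model.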

As far as the set and dictionary notation being "confusing" it seems
that a common notation is actually a _benefit_; after all, sets and
dictionaries are not that different! A dictionary is a mapping
(surjective explicit function?) from a _set_ of keys to a set
(actually a bag or multiset in Python) of values. At the same time a
set can also be regarded as a degenerate dictionary as in  {1:None,
2:None, 3:None}. A similarity of notation does make sense to me.
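
A small sketch of that relationship, using only existing built-ins.
`collections.Counter` stands in for a bag here; that choice is an
assumption of the example, not something Python does with values
itself.

```python
from collections import Counter

d = {1: 'a', 2: 'b', 3: 'a'}

keys = set(d)                 # the keys form a set: no duplicates possible
values = Counter(d.values())  # the values form a bag: 'a' occurs twice

# A set viewed as a "degenerate" dictionary whose values are all None.
degenerate = dict.fromkeys([1, 2, 3])

print(keys == set([1, 2, 3]))                     # True
print(values['a'], values['b'])                   # 2 1
print(degenerate == {1: None, 2: None, 3: None})  # True
print(set(degenerate) == keys)                    # True
```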

Of course the empty set is the problem since {} could be interpreted
ambiguously as both a set or a dictionary. I think it makes sense to
give the '{}' notation to the empty _set_ as this will make more sense
as far as common sense goes. If the proposal is to have {x} be a
set([x]) then it only makes sense for {} to be a set([]). This will break
compatibility with old code and that is why it should be in Python 3000,
not in 2.x.x. The empty dictionary is probably best represented as {:},
it is actually more clear this way as it shows that there is a key and
a value separated by ':' and in this case they are both missing so it
is an empty dictionary.
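
For comparison, this is how the empty cases behave without the change
proposed above: {} builds a dictionary, the empty set has to be spelled
set(), and {:} is not accepted at all. A minimal sketch, checked
against Python 3; the variable names are made up for the example.

```python
empty_braces = {}                 # today this is an empty dict
empty_set = set()                 # the only direct spelling of an empty set
also_empty = set([1]) & set([2])  # the "{1}&{2}" trick from the quoted PEP text

# {:} is only a proposal here, so the compiler is expected to reject it.
try:
    compile("{:}", "<literal>", "eval")
    colon_form_accepted = True
except SyntaxError:
    colon_form_accepted = False

print(type(empty_braces).__name__)  # dict
print(type(empty_set).__name__)     # set
print(also_empty == empty_set)      # True
print(empty_braces == empty_set)    # False
print(colon_form_accepted)          # False
```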

Also the frozenset, although not used as often, could still probably
get its own notation too.
For example:
1.    ({1,2,3})  - a symmetry with tuples, which are also immutable.
The problem of a one element tuple i.e. (10,) not (10) will also be
present here. So just as there is a need to use a comma to make
(10,)=tuple([10]) one would have to use a comma to specify that a tuple
is needed and not a frozenset(), but at the same time the ({1,2,3})
could then never be reduced to {1,2,3}.
In other words:
({1,2,3},) is  tuple([{1,2,3}])
({1,2,3}) is a frozenset([1,2,3]) and never just {1,2,3}.
This notation would make the parser go 'nuts'. I think the next idea
might be better:

2.  _{1,2,3}_ - the underscores '_'  intuitively could mean that the
braces are fixed blocks and will not "move"  to accommodate addition or
removal of elements i.e.  the fact that the frozenset is immutable. Or
perhaps a more verbose _{_1,2,3_}_ would be better, not sure...

3. {|1,2,3|}  or maybe  |{1,2,3}|  - same aesthetic rationale as above,
'|'s look like 'fences' that will not allow the braces to 'move', but
this would look too much like Ruby's blocks so  1 and 2 might be better.
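
The parsing worry in option 1 and the property a frozenset literal
would have to convey can both be seen with existing built-ins. A rough
sketch, using set([...]) and frozenset([...]) in place of any proposed
notation; the variable names are made up for the example.

```python
# Parentheses only group; it is the comma that makes a tuple.
print(type((10)).__name__)   # int
print(type((10,)).__name__)  # tuple

mutable = set([1, 2, 3])
frozen = frozenset([1, 2, 3])

print((mutable) is mutable)  # True: wrapping in () changes nothing today
print(frozen == mutable)     # True: same elements, different mutability

mutable.add(4)               # fine, a set can grow
try:
    frozen.add(4)            # frozenset has no add() method
except AttributeError:
    print("frozenset cannot grow")

# Only the immutable variant is hashable, so only it can be a dict key.
index = {frozen: "hashable"}
try:
    index[mutable] = "unhashable"
except TypeError:
    print("a mutable set cannot be a dict key")
```

Since (expr) is already plain grouping everywhere else, giving
({1,2,3}) a meaning of its own would make braces a special case inside
parentheses, which is the ambiguity described under option 1.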

In general, sets are a fundamental data structure. They have always been
secondary in traditional programming languages.  For simplicity in
implementation arrays and lists have been  used to mimic a set.  Now
that Python has a built-in set it only makes sense to give it its own
notation and maybe Python 3000 is just the right time for it.

- Nick Vatamaniuc

bearophileHUGS at lycos.com wrote:
> From this interesting blog entry by Lawrence Oluyede:
> http://www.oluyede.org/blog/2006/07/05/europython-day-2/
> and the Py3.0 PEPs, I think the people working on Py3.0 are doing a
> good job, I am not expert enough (so I don't post this on the Py3.0
> mailing list), but I agree with most of the things they are agreeing
> to. A few notes:
>
> - input() vanishes and raw_input() becomes sys.stdin.readline(). I
> think a child that is learning to program with Python can enjoy
> something simpler: input() meaning what raw_input() means today.
>
>
> - dict.keys() and items() return a set view
>
> This is being discussed here too a lot, I agree that they will just
> become equivalent to iterkeys() and iteritems().
>
>
> - dict.values() a bag (multiset) view
>
> I think this isn't a good idea, I think bags can be useful but in this
> situation they make things much too complex.
>
>
> http://www.python.org/dev/peps/pep-3100/#core-language :
> - Set literals and comprehensions: {x} means set([x]); {x, y} means
> set([x, y]). {F(x) for x in S if P(x)} means set(F(x) for x in S if
> P(x)). NB. {range(x)} means set([range(x)]), NOT set(range(x)). There's
> no literal for an empty set; use set() (or {1}&{2} :-). There's no
> frozenset literal; they are too rarely needed.
>
> I like the idea of set literals, but using {1:2} for dicts and {1, 2}
> for sets may look a bit confusing.
> And using {} for the empty dict is confusing even more, newbies will
> use it a lot for empty sets. Maybe the {:} for the empty dict and {}
> for the empty set are a bit better.
> Maybe a better syntax could use a different delimiter, to distinguish
> the two structures better. Some possibilities are nice looking but not
> easy to type:
> «1, 2»
> Others may be confused with bitwise operators:
> |1, 2|
> Others are bad looking and not easy to type (some nationalized
> keyboards don't have the `):
> §1, 2§
> `1, 2`
> Some of them are much too long:
> <<<1, 2>>>
> Maybe using two nested like this is better (you can't put a dict or set
> in a set, so there are no ambiguities):
> {{1, 2}}
>
> I don't have a definitive good solution, but I think that adopting a
> bad solution is worse than using set(...). Set literals are cute but
> not necessary. Choosing things that increase the probability of bugs
> isn't good.
>
> ---------------------
>
> In the past I have suggested other possibilities for Py3.0, nothing
> really important, but a few things can be interesting.
>
> - cmp() (or comp(), comp4(), etc) returns 4 values (<, ==, >, not
> comparable).
> - do - while.
> - NOT OR AND XOR as bitwise operators syntax.
> - Better syntax for octals and hex numbers.
> - obj.copy() and obj.deepcopy() methods for all objects.
> - Simplification and unification of string formatting (I have seen they
> are working on this already).
> - Intersection and subtraction among dicts.
>
> Less important things:
> - More human syntax for REs
> - Optional auto stripping of newlines during an iteration on a file.
> - String split that accepts a sequence of separators too.
> - a function in the math library to test for approximate FP equality.
> - something in cmath to do a better polar<->cartesian complex
> number conversion
> - xpermutations and xcombinations generators in the standard library.
> 
> Bye,
> bearophile



