From tim.one@home.com Fri Jun 1 01:24:01 2001
From: tim.one@home.com (Tim Peters)
Date: Thu, 31 May 2001 20:24:01 -0400
Subject: [Python-Dev] One more dict trick
In-Reply-To:
Message-ID:

This is a multi-part message in MIME format.

------=_NextPart_000_0005_01C0EA0F.A145F760
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: 7bit

Another version of the patch attached, a bit faster and with a large new
comment block explaining it.  It's looking good!  As I hope the new comments
make clear, nothing about this approach is "a mystery" -- there are
explainable reasons for each fiddly bit.  This gives me more confidence in
it than in the previous approach, and, indeed, it turned out that when I
*thought* "hmm!  I bet this change would be a little faster!", it actually
was.

------=_NextPart_000_0005_01C0EA0F.A145F760
Content-Type: text/plain; name="dict.txt"
Content-Transfer-Encoding: quoted-printable
Content-Disposition: attachment; filename="dict.txt"

Index: Objects/dictobject.c
===================================================================
RCS file: /cvsroot/python/python/dist/src/Objects/dictobject.c,v
retrieving revision 2.96
diff -c -r2.96 dictobject.c
*** Objects/dictobject.c	2001/05/27 07:39:22	2.96
--- Objects/dictobject.c	2001/06/01 00:17:07
***************
*** 12,123 ****
   */
  #define MINSIZE 8

! /* define this out if you don't want conversion statistics on exit */
  #undef SHOW_CONVERSION_COUNTS

  /*
! Table of irreducible polynomials to efficiently cycle through
! GF(2^n)-{0}, 2<=n<=30.  A table size is always a power of 2.
! For a table size of 2**i, the polys entry is 2**i + j for some j in 1 thru
! 2**i-1 inclusive.  The polys[] entries here happen to add in the smallest j
! values "that work".  Work means this:  given any integer k in 1 thru 2**i-1
! inclusive, a poly works if & only if repeating this code:
!     print k
!     k <<= 1
!     if k >= 2**i:
!         k ^= poly
! prints every integer in 1 thru 2**i-1 inclusive exactly once before printing
! k a second time.  Theory can be used to find such polys efficiently, but the
! operational defn. of "works" is sufficient to find them in reasonable time
! via brute force program (hint: any poly that has an even number of 1 bits
! cannot work; ditto any poly with low bit 0; exploit those).
!
! Some major subtleties:  Most hash schemes depend on having a "good" hash
! function, in the sense of simulating randomness.  Python doesn't:  some of
! its hash functions are trivial, such as hash(i) == i for ints i (excepting
! i == -1, because -1 is the "error occurred" return value from tp_hash).
!
! This isn't necessarily bad!  To the contrary, that our hash tables are powers
! of 2 in size, and that we take the low-order bits as the initial table index,
! means that there are no collisions at all for dicts indexed by a contiguous
! range of ints.  This is "better than random" behavior, and that's very
! desirable.
!
! On the other hand, when collisions occur, the tendency to fill contiguous
! slices of the hash table makes a good collision resolution strategy crucial;
! e.g., linear probing is right out.
!
! Reimer Behrends contributed the idea of using a polynomial-based approach,
! using repeated multiplication by x in GF(2**n) where a polynomial is chosen
! such that x is a primitive root.  This visits every table location exactly
! once, and the sequence of locations probed is highly non-linear.
!
! The same is also largely true of quadratic probing for power-of-2 tables, of
! the specific
!
!     (i + comb(1, 2)) mod size
!     (i + comb(2, 2)) mod size
!     (i + comb(3, 2)) mod size
!     (i + comb(4, 2)) mod size
!     ...
!     (i + comb(j, 2)) mod size
!
! flavor.  The polynomial approach "scrambles" the probe indices better, but
! more importantly allows us to get *some* additional bits of the hash code
! into play via computing the initial increment, thus giving a weak form of
! double hashing.  Quadratic probing cannot be extended that way (the first
! probe offset must be 1, the second 3, the third 6, etc).
!
! Christian Tismer later contributed the idea of using polynomial division
! instead of multiplication.  The problem is that the multiplicative method
! can't get *all* the bits of the hash code into play without expensive
! computations that slow down the initial index and/or initial increment
! computation.  For a set of keys like [i << 16 for i in range(20000)], under
! the multiplicative method the initial index and increment were the same for
! all keys, so every key followed exactly the same probe sequence, and so
! this degenerated into a (very slow) linear search.  The division method uses
! all the bits of the hash code naturally in the increment, although it *may*
! visit locations more than once until such time as all the high bits of the
! increment have been shifted away.  It's also impossible to tell in advance
! whether incr is congruent to 0 modulo poly, so each iteration of the loop has
! to guard against incr becoming 0.  These are minor costs, as we usually don't
! get into the probe loop, and when we do we usually get out on its first
! iteration.
  */

- static long polys[] = {
- 	/* 4 + 3, */	/* first active entry if MINSIZE == 4 */
- 	8 + 3,		/* first active entry if MINSIZE == 8 */
- 	16 + 3,
- 	32 + 5,
- 	64 + 3,
- 	128 + 3,
- 	256 + 29,
- 	512 + 17,
- 	1024 + 9,
- 	2048 + 5,
- 	4096 + 83,
- 	8192 + 27,
- 	16384 + 43,
- 	32768 + 3,
- 	65536 + 45,
- 	131072 + 9,
- 	262144 + 39,
- 	524288 + 39,
- 	1048576 + 9,
- 	2097152 + 5,
- 	4194304 + 3,
- 	8388608 + 33,
- 	16777216 + 27,
- 	33554432 + 9,
- 	67108864 + 71,
- 	134217728 + 39,
- 	268435456 + 9,
- 	536870912 + 5,
- 	1073741824 + 83
- 	/* 2147483648 + 9 -- if we ever boost this to unsigned long */
- };
-
  /* Object used as dummy key to fill deleted entries */
  static PyObject *dummy; /* Initialized by first call to newdictobject() */

--- 12,117 ----
   */
  #define MINSIZE 8

! /* Define this out if you don't want conversion statistics on exit. */
  #undef SHOW_CONVERSION_COUNTS

+ /* See large comment block below.  This must be >= 1. */
+ #define PERTURB_SHIFT 5
+
  /*
! Major subtleties ahead:  Most hash schemes depend on having a "good" hash
! function, in the sense of simulating randomness.  Python doesn't:  its most
! important hash functions (for strings and ints) are very regular in common
! cases:
!
!     >>> map(hash, (0, 1, 2, 3))
!     [0, 1, 2, 3]
!     >>> map(hash, ("namea", "nameb", "namec", "named"))
!     [-1658398457, -1658398460, -1658398459, -1658398462]
!     >>>
!
! This isn't necessarily bad!  To the contrary, in a table of size 2**i, taking
! the low-order i bits as the initial table index is extremely fast, and there
! are no collisions at all for dicts indexed by a contiguous range of ints.
! The same is approximately true when keys are "consecutive" strings.  So this
! gives better-than-random behavior in common cases, and that's very desirable.
!
! OTOH, when collisions occur, the tendency to fill contiguous slices of the
! hash table makes a good collision resolution strategy crucial.  Taking only
! the last i bits of the hash code is also vulnerable:  for example, consider
! [i << 16 for i in range(20000)] as a set of keys.  Since ints are their own
! hash codes, and this fits in a dict of size 2**15, the last 15 bits of every
! hash code are all 0:  they *all* map to the same table index.
!
! But catering to unusual cases should not slow the usual ones, so we just take
! the last i bits anyway.  It's up to collision resolution to do the rest.  If
! we *usually* find the key we're looking for on the first try (and, it turns
! out, we usually do -- the table load factor is kept under 2/3, so the odds
! are solidly in our favor), then it makes best sense to keep the initial index
! computation dirt cheap.
!
! The first half of collision resolution is to visit table indices via this
! recurrence:
!
!     j = ((5*j) + 1) mod 2**i
!
! For any initial j in range(2**i), repeating that 2**i times generates each
! int in range(2**i) exactly once (see any text on random-number generation for
! proof).  By itself, this doesn't help much:  like linear probing (setting
! j += 1, or j -= 1, on each loop trip), it scans the table entries in a fixed
! order.  This would be bad, except that's not the only thing we do, and it's
! actually *good* in the common cases where hash keys are consecutive.  In an
! example that's really too small to make this entirely clear, for a table of
! size 2**3 the order of indices is:
!
!     0 -> 1 -> 6 -> 7 -> 4 -> 5 -> 2 -> 3 -> 0 [and here it's repeating]
!
! If two things come in at index 5, the first place we look after is index 2,
! not 6, so if another comes in at index 6 the collision at 5 didn't hurt it.
! Linear probing is deadly in this case because there the fixed probe order
! is the *same* as the order consecutive keys are likely to arrive.  But it's
! extremely unlikely hash codes will follow a 5*j+1 recurrence by accident,
! and certain that consecutive hash codes do not.
!
! The other half of the strategy is to get the other bits of the hash code
! into play.  This is done by initializing a (unsigned) vrbl "perturb" to the
! full hash code, and changing the recurrence to:
!
!     j = (5*j) + 1 + perturb;
!     perturb >>= PERTURB_SHIFT;
!     use j % 2**i as the next table index;
!
! Now the probe sequence depends (eventually) on every bit in the hash code,
! and the pseudo-scrambling property of recurring on 5*j+1 is more valuable,
! because it quickly magnifies small differences in the bits that didn't affect
! the initial index.  Note that because perturb is unsigned, if the recurrence
! is executed often enough perturb eventually becomes and remains 0.  At that
! point (very rarely reached) the recurrence is on (just) 5*j+1 again, and
! that's certain to find an empty slot eventually (since it generates every int
! in range(2**i), and we make sure there's always at least one empty slot).
!
! Selecting a good value for PERTURB_SHIFT is a balancing act.  You want it
! small so that the high bits of the hash code continue to affect the probe
! sequence across iterations; but you want it large so that in really bad cases
! the high-order hash bits have an effect on early iterations.  5 was "the
! best" in minimizing total collisions across experiments Tim Peters ran (on
! both normal and pathological cases), but 4 and 6 weren't significantly worse.
!
! Historical:  Reimer Behrends contributed the idea of using a polynomial-based
! approach, using repeated multiplication by x in GF(2**n) where an irreducible
! polynomial for each table size was chosen such that x was a primitive root.
! Christian Tismer later extended that to use division by x instead, as an
! efficient way to get the high bits of the hash code into play.  This scheme
! also gave excellent collision statistics, but was more expensive:  two
! if-tests were required inside the loop; computing "the next" index took about
! the same number of operations but without as much potential parallelism
! (e.g., computing 5*j can go on at the same time as computing 1+perturb in the
! above, and then shifting perturb can be done while the table index is being
! masked); and the dictobject struct required a member to hold the table's
! polynomial.  In Tim's experiments the current scheme ran faster, and with
! less code and memory.
  */

  /* Object used as dummy key to fill deleted entries */
  static PyObject *dummy; /* Initialized by first call to newdictobject() */

***************
*** 168,174 ****
  	int ma_fill;  /* # Active + # Dummy */
  	int ma_used;  /* # Active */
  	int ma_size;  /* total # slots in ma_table */
- 	int ma_poly;  /* appopriate entry from polys vector */
  	/* ma_table points to ma_smalltable for small tables, else to
  	 * additional malloc'ed memory.  ma_table is never NULL!  This rule
  	 * saves repeated runtime null-tests in the workhorse getitem and
--- 162,167 ----
***************
*** 202,209 ****
  	(mp)->ma_table = (mp)->ma_smalltable; \
  	(mp)->ma_size = MINSIZE; \
  	(mp)->ma_used = (mp)->ma_fill = 0; \
- 	(mp)->ma_poly = polys[0]; \
- 	assert(MINSIZE < (mp)->ma_poly && (mp)->ma_poly < MINSIZE*2); \
      } while(0)

  PyObject *
--- 195,200 ----
***************
*** 235,262 ****
  This is based on Algorithm D from Knuth Vol. 3, Sec. 6.4.
  Open addressing is preferred over chaining since the link overhead for
  chaining would be substantial (100% with typical malloc overhead).
- However, instead of going through the table at constant steps, we cycle
- through the values of GF(2^n).  This avoids modulo computations, being
- much cheaper on RISC machines, without leading to clustering.
-
- The initial probe index is computed as hash mod the table size.
- Subsequent probe indices use the values of x^i in GF(2^n)-{0} as an offset,
- where x is a root.  The initial offset is derived from hash, too.

  All arithmetic on hash should ignore overflow.

! (This version is due to Reimer Behrends, some ideas are also due to
! Jyrki Alakuijala and Vladimir Marangozov.)

  This function must never return NULL; failures are indicated by returning
  a dictentry* for which the me_value field is NULL.  Exceptions are never
  reported by this function, and outstanding exceptions are maintained.
  */
  static dictentry *
  lookdict(dictobject *mp, PyObject *key, register long hash)
  {
  	register int i;
! 	register unsigned int incr;
  	register dictentry *freeslot;
  	register unsigned int mask = mp->ma_size-1;
  	dictentry *ep0 = mp->ma_table;
--- 226,268 ----
  This is based on Algorithm D from Knuth Vol. 3, Sec. 6.4.
  Open addressing is preferred over chaining since the link overhead for
  chaining would be substantial (100% with typical malloc overhead).

+ The initial probe index is computed as hash mod the table size.  Subsequent
+ probe indices are computed as explained earlier.
+
  All arithmetic on hash should ignore overflow.

! (The details in this version are due to Tim Peters, building on many past
! contributions by Reimer Behrends, Jyrki Alakuijala, Vladimir Marangozov and
! Christian Tismer).

  This function must never return NULL; failures are indicated by returning
  a dictentry* for which the me_value field is NULL.  Exceptions are never
  reported by this function, and outstanding exceptions are maintained.
  */
+
+ /* #define DUMP_HASH_STUFF */
+ #ifdef DUMP_HASH_STUFF
+ static int nEntry = 0, nCollide = 0, nTrip = 0;
+ #define BUMP_ENTRY ++nEntry
+ #define BUMP_COLLIDE ++nCollide
+ #define BUMP_TRIP ++nTrip
+ #define PRINT_HASH_STUFF \
+ 	if ((nEntry & 0x1ff) == 0) \
+ 		fprintf(stderr, "%d %d %d\n", nEntry, nCollide, nTrip)
+
+ #else
+ #define BUMP_ENTRY
+ #define BUMP_COLLIDE
+ #define BUMP_TRIP
+ #define PRINT_HASH_STUFF
+ #endif
+
  static dictentry *
  lookdict(dictobject *mp, PyObject *key, register long hash)
  {
  	register int i;
! 	register unsigned int perturb;
  	register dictentry *freeslot;
  	register unsigned int mask = mp->ma_size-1;
  	dictentry *ep0 = mp->ma_table;
***************
*** 265,273 ****
  	register int checked_error = 0;
  	register int cmp;
  	PyObject *err_type, *err_value, *err_tb;
! 	/* We must come up with (i, incr) such that 0 <= i < ma_size
! 	   and 0 < incr < ma_size and both are a function of hash.
! 	   i is the initial table index and incr the initial probe offset. */
  	i = hash & mask;
  	ep = &ep0[i];
  	if (ep->me_key == NULL || ep->me_key == key)
--- 271,277 ----
  	register int checked_error = 0;
  	register int cmp;
  	PyObject *err_type, *err_value, *err_tb;
! 	BUMP_ENTRY;
  	i = hash & mask;
  	ep = &ep0[i];
  	if (ep->me_key == NULL || ep->me_key == key)
***************
*** 294,309 ****
  		}
  		freeslot = NULL;
  	}
! 	/* Derive incr from hash, just to make it more arbitrary. Note that
! 	   incr must not be 0, or we will get into an infinite loop.*/
! 	incr = hash ^ ((unsigned long)hash >> 3);
!
  	/* In the loop, me_key == dummy is by far (factor of 100s) the
  	   least likely outcome, so test for that last. */
! 	for (;;) {
! 		if (!incr)
! 			incr = 1;	/* and incr will never be 0 again */
! 		ep = &ep0[(i + incr) & mask];
  		if (ep->me_key == NULL) {
  			if (restore_error)
  				PyErr_Restore(err_type, err_value, err_tb);
--- 298,310 ----
  		}
  		freeslot = NULL;
  	}
! 	BUMP_COLLIDE;
  	/* In the loop, me_key == dummy is by far (factor of 100s) the
  	   least likely outcome, so test for that last. */
! 	for (perturb = hash; ; perturb >>= PERTURB_SHIFT) {
! 		BUMP_TRIP;
! 		i = (i << 2) + i + perturb + 1;
! 		ep = &ep0[i & mask];
  		if (ep->me_key == NULL) {
  			if (restore_error)
  				PyErr_Restore(err_type, err_value, err_tb);
***************
*** 335,344 ****
  		}
  		else if (ep->me_key == dummy && freeslot == NULL)
  			freeslot = ep;
- 		/* Cycle through GF(2**n). */
- 		if (incr & 1)
- 			incr ^= mp->ma_poly; /* clears the lowest bit */
- 		incr >>= 1;
  	}
  }

--- 336,341 ----
***************
*** 356,362 ****
  lookdict_string(dictobject *mp, PyObject *key, register long hash)
  {
  	register int i;
! 	register unsigned int incr;
  	register dictentry *freeslot;
  	register unsigned int mask = mp->ma_size-1;
  	dictentry *ep0 = mp->ma_table;
--- 353,359 ----
  lookdict_string(dictobject *mp, PyObject *key, register long hash)
  {
  	register int i;
! 	register unsigned int perturb;
  	register dictentry *freeslot;
  	register unsigned int mask = mp->ma_size-1;
  	dictentry *ep0 = mp->ma_table;
***************
*** 370,377 ****
  		mp->ma_lookup = lookdict;
  		return lookdict(mp, key, hash);
  	}
! 	/* We must come up with (i, incr) such that 0 <= i < ma_size
! 	   and 0 < incr < ma_size and both are a function of hash */
  	i = hash & mask;
  	ep = &ep0[i];
  	if (ep->me_key == NULL || ep->me_key == key)
--- 367,374 ----
  		mp->ma_lookup = lookdict;
  		return lookdict(mp, key, hash);
  	}
! 	BUMP_ENTRY;
! 	PRINT_HASH_STUFF;
  	i = hash & mask;
  	ep = &ep0[i];
  	if (ep->me_key == NULL || ep->me_key == key)
***************
*** 385,400 ****
  		}
  		freeslot = NULL;
  	}
! 	/* Derive incr from hash, just to make it more arbitrary. Note that
! 	   incr must not be 0, or we will get into an infinite loop.*/
! 	incr = hash ^ ((unsigned long)hash >> 3);
!
  	/* In the loop, me_key == dummy is by far (factor of 100s) the
  	   least likely outcome, so test for that last. */
! 	for (;;) {
! 		if (!incr)
! 			incr = 1;	/* and incr will never be 0 again */
! 		ep = &ep0[(i + incr) & mask];
  		if (ep->me_key == NULL)
  			return freeslot == NULL ? ep : freeslot;
  		if (ep->me_key == key
--- 382,394 ----
  		}
  		freeslot = NULL;
  	}
! 	BUMP_COLLIDE;
  	/* In the loop, me_key == dummy is by far (factor of 100s) the
  	   least likely outcome, so test for that last. */
! 	for (perturb = hash; ; perturb >>= PERTURB_SHIFT) {
! 		BUMP_TRIP;
! 		i = (i << 2) + i + perturb + 1;
! 		ep = &ep0[i & mask];
  		if (ep->me_key == NULL)
  			return freeslot == NULL ? ep : freeslot;
  		if (ep->me_key == key
***************
*** 404,413 ****
  			return ep;
  		if (ep->me_key == dummy && freeslot == NULL)
  			freeslot = ep;
- 		/* Cycle through GF(2**n). */
- 		if (incr & 1)
- 			incr ^= mp->ma_poly; /* clears the lowest bit */
- 		incr >>= 1;
  	}
  }

--- 398,403 ----
***************
*** 448,454 ****
  static int
  dictresize(dictobject *mp, int minused)
  {
! 	int newsize, newpoly;
  	dictentry *oldtable, *newtable, *ep;
  	int i;
  	int is_oldtable_malloced;
--- 438,444 ----
  static int
  dictresize(dictobject *mp, int minused)
  {
! 	int newsize;
  	dictentry *oldtable, *newtable, *ep;
  	int i;
  	int is_oldtable_malloced;
***************
*** 456,475 ****

  	assert(minused >= 0);

! 	/* Find the smallest table size > minused, and its poly[] entry. */
! 	newpoly = 0;
! 	newsize = MINSIZE;
! 	for (i = 0; i < sizeof(polys)/sizeof(polys[0]); ++i) {
! 		if (newsize > minused) {
! 			newpoly = polys[i];
! 			break;
! 		}
! 		newsize <<= 1;
! 		if (newsize < 0)   /* overflow */
! 			break;
! 	}
! 	if (newpoly == 0) {
! 		/* Ran out of polynomials or newsize overflowed. */
  		PyErr_NoMemory();
  		return -1;
  	}
--- 446,457 ----

  	assert(minused >= 0);

! 	/* Find the smallest table size > minused. */
! 	for (newsize = MINSIZE;
! 	     newsize <= minused && newsize >= 0;
! 	     newsize <<= 1)
! 		;
! 	if (newsize < 0) {
  		PyErr_NoMemory();
  		return -1;
  	}
***************
*** 511,517 ****
  	mp->ma_table = newtable;
  	mp->ma_size = newsize;
  	memset(newtable, 0, sizeof(dictentry) * newsize);
- 	mp->ma_poly = newpoly;
  	mp->ma_used = 0;
  	i = mp->ma_fill;
  	mp->ma_fill = 0;
--- 493,498 ----
***************
*** 1255,1261 ****
  	if (a->ma_used != b->ma_used)
  		/* can't be equal if # of entries differ */
  		return 0;
! 	
  	/* Same # of entries -- check all of 'em.  Exit early on any diff. */
  	for (i = 0; i < a->ma_size; i++) {
  		PyObject *aval = a->ma_table[i].me_value;
--- 1236,1242 ----
  	if (a->ma_used != b->ma_used)
  		/* can't be equal if # of entries differ */
  		return 0;
!
  	/* Same # of entries -- check all of 'em.  Exit early on any diff. */
  	for (i = 0; i < a->ma_size; i++) {
  		PyObject *aval = a->ma_table[i].me_value;

------=_NextPart_000_0005_01C0EA0F.A145F760--
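To make the comment blocks in the attachment concrete, here is a small Python model of both probe schemes -- a sketch only, not part of the patch; poly_works() and probe_order() are invented names, and the constants mirror the ones in the code:

    # Sketch of the two probe schemes described in the dict.txt comments.
    # poly_works() is the brute-force "works" test from the old comment;
    # probe_order() follows the new j = 5*j + 1 + perturb recurrence.
    PERTURB_SHIFT = 5

    def poly_works(poly, i):
        # 1 iff doubling-and-reducing by poly cycles through every
        # integer in 1 .. 2**i-1 exactly once (old GF(2**n) scheme).
        seen = {}
        k = 1
        for trip in range(2**i - 1):
            if seen.has_key(k):
                return 0
            seen[k] = 1
            k = k << 1
            if k >= 2**i:
                k = k ^ poly
        return k == 1   # back at the start after visiting everything

    def probe_order(h, i, n):
        # First n table slots visited for hash code h, table size 2**i.
        mask = 2**i - 1
        j = h & mask
        slots = [j]
        perturb = h
        while len(slots) < n:
            j = 5*j + 1 + perturb
            perturb = perturb >> PERTURB_SHIFT
            slots.append(j & mask)
        return slots

    print poly_works(8 + 3, 3)     # 1: the first active polys[] entry
    print probe_order(0, 3, 9)     # [0, 1, 6, 7, 4, 5, 2, 3, 0]
    print probe_order(42 << 16, 15, 5)  # high hash bits perturb the walk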
*/ for (i =3D 0; i < a->ma_size; i++) { PyObject *aval =3D a->ma_table[i].me_value; --- 1236,1242 ---- if (a->ma_used !=3D b->ma_used) /* can't be equal if # of entries differ */ return 0; !=20 /* Same # of entries -- check all of 'em. Exit early on any diff. */ for (i =3D 0; i < a->ma_size; i++) { PyObject *aval =3D a->ma_table[i].me_value; ------=_NextPart_000_0005_01C0EA0F.A145F760-- From tim.one@home.com Fri Jun 1 02:32:30 2001 From: tim.one@home.com (Tim Peters) Date: Thu, 31 May 2001 21:32:30 -0400 Subject: [Python-Dev] One more dict trick In-Reply-To: <20010531044332.B5026@thyrsus.com> Message-ID: Heh. I was implementing 128-bit floats in software, for Cray, in about 1980. They didn't do it because they *wanted* to make the Cray boxes look like pigs . A 128-bit float type is simply necessary for some scientific work: not all problems are well-conditioned, and the "extra" bits can vanish fast. Went thru the same bit at KSR. Just yesterday Konrad Hinsen was worrying on c.l.py that his scripts that took 2 hours using native floats zoomed to 5 days when he started using GMP's arbitrary-precision float type *just* to get 100 bits of precision. When KSR died, the KSR-3 on the drawing board had 128-bit registers. I was never quite sure why the founders thought that would be a killer selling point, but it wasn't for floats. Down in the trenches we thought it would be mondo cool to have an address space so large that for the rest of our lives we'd never need to bother calling free() again <0.8 wink>. From tim.one@home.com Fri Jun 1 02:46:11 2001 From: tim.one@home.com (Tim Peters) Date: Thu, 31 May 2001 21:46:11 -0400 Subject: [Python-Dev] One more dict trick In-Reply-To: <20010531124533.J690@xs4all.nl> Message-ID: [Thomas Wouters] > Why ? Bumping register size doesn't mean Intel expects to use it all as > address space. They could be used for video-processing, Bingo. Common wisdom holds that vector machines are dead, but the truth is virtually *everyone* runs on a vector box now: Intel just renamed "vector" to "multimedia" (or AMD to "3D Now!"), and adopted a feeble (but ever-growing) subset of traditional vector machines' instruction sets. > or to represent a modest range of rationals , or to help core > 'net routers deal with those nasty IPv6 addresses. KSR's founders had in mind bit-level addressability of networks of machines spanning the globe. Were he to press the point, though, I'd have to agree with Eric that they didn't really *need* 128 bits for that modest goal. > I'm sure cryptomunchers would like bigger registers as well. Agencies we can't talk about would like them as big as they can get them. Each vector register in a Cray box actually consisted of 64 64-bit words, or 4K bits per register. Some "special" models were constructed where the vector FPU was thrown away and additional bit-fiddling units added in its place: they really treated the vector registers as giant bitstrings, and didn't want to burn 64 clock cycles just to do, e.g., "one" conceptual xor. > Oh wait... I get it! You were trying to get yourself in the > historybooks as the guy that said "64 bits ought to be enough for > everyone" :-) That would be foolish indeed! 128, though, now *that's* surely enough for at least a decade . From fdrake@acm.org Fri Jun 1 02:45:45 2001 From: fdrake@acm.org (Fred L. Drake, Jr.) 
From fdrake@acm.org Fri Jun 1 02:45:45 2001
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Thu, 31 May 2001 21:45:45 -0400 (EDT)
Subject: [Python-Dev] One more dict trick
In-Reply-To:
References: <20010531044332.B5026@thyrsus.com>
Message-ID: <15126.62409.909290.736779@cj42289-a.reston1.va.home.com>

Tim Peters writes:
 > When KSR died, the KSR-3 on the drawing board had 128-bit registers.  I was
 > never quite sure why the founders thought that would be a killer selling
 > point, but it wasn't for floats.  Down in the trenches we thought it would
 > be mondo cool to have an address space so large that for the rest of our
 > lives we'd never need to bother calling free() again <0.8 wink>.

  And given what (little) I know about the memory architecture on those
things, that actually would have been quite reasonable on that platform!

  -Fred

--
Fred L. Drake, Jr.
PythonLabs at Digital Creations

From tim.one@home.com Fri Jun 1 03:23:47 2001
From: tim.one@home.com (Tim Peters)
Date: Thu, 31 May 2001 22:23:47 -0400
Subject: [Python-Dev] FW: CP4E and Python newbies, it works!
Message-ID:

Good for the soul!

-----Original Message-----
From: python-list-admin@python.org [mailto:python-list-admin@python.org]
On Behalf Of Ron Stephens [mailto:rdsteph@earthlink.net]
Sent: Thursday, May 31, 2001 7:12 PM
To: python-list@python.org
Subject: CP4E and Python newbies, it works!

I am a complete newbie, and with a very low programming IQ.  Although I had
programmed a little in college thirty years ago, in Basic, PL/1 and a very
little assembler, and fooled around in later years on PC's at home with
Basic, then tried PERL, then an effort at Java, they were all too much
trouble to really use to program, given that it was a *hobby* that was
supposed to be fun.  After all, I have a demanding day job that has nothing
to do with software, that requires extensive travel, and four kids, a wife,
two dogs, and a cat.  Java et al, by the time I had digested a couple of
books and put in a lot of hours, was just no fun at all to program; and I
had to look in the book every other line of code just to recall the syntax
etc.; I could not keep it in my head.

Now, four months into Python, after being attracted by reading a blurb
about Guido van Rossum's Computer Programming for Everybody project, I am
in awe of his achievement.  I am having fun; and if I can do so then almost
anyone can.  I am really absent minded, lazy, and not good at detail.  Yet
I have done the following in four months, and I believe Python therefore
has the potential to open up programming to a much wider audience for a lot
of people, which is nice:

1. I have written a half dozen scripts that are meaningful to me in Python,
more than I ever accomplished with any other language.

2. I am able to have fun by sitting down in the evening, or especially on a
weekend, and just programming in Python.  The syntax and keywords are
gratifyingly just in my head, enough anyway that I can just program like I
am having a conversation, and check the details later for errors etc.  This
is the most satisfying thing of all.

3. I find the debugger just works; magically, it helps me turn my scripts
into actual working programs, simply by rather mindlessly following the
road laid out for me by using the debugger.

4. I have pleasurably read more Python books from front cover to back than
I care to admit.  I must be enjoying myself ;-)))

5. I am exploring Jython, which is also pleasurable.
After fooling around with Java a couple of years ago, it is really a kick
to see jython generating such detailed Java code for me, just as if I had
written it (but it would have taken me untold pain to actually do so in
Java).  Whether or not I actually end up using the java code so generated,
I still am enjoying the sheer experience.

6. I have Zope and other things to look forward to.

7. I am able to enjoy the discussions on this newsgroup, even though they
are over my head technically.  I find them intriguing.

Now, I may never actually accomplish anything truly useful by my
programming.  But I am happy.  I hope that others, younger and brighter
than myself, who have an interest in programming, but need the right
stimulus to get going, will find Python and produce programs of real value.

I think Guido van Rossum and his team should be very proud of what they are
enabling.  The CP4E idea is alive and well.  My hat's off to Guido and the
whole community which he has spawned, especially those on this newsgroup.
I am humbled and honored to read your erudite technical discussions, as a
voyeur of mysteries and wonders I can only dimly see on the horizon, but
that nonetheless fill me with mental delight.

Ron Stephens

--
http://mail.python.org/mailman/listinfo/python-list

From esr@thyrsus.com Fri Jun 1 04:51:48 2001
From: esr@thyrsus.com (Eric S. Raymond)
Date: Thu, 31 May 2001 23:51:48 -0400
Subject: [Python-Dev] One more dict trick
In-Reply-To: ; from tim.one@home.com on Thu, May 31, 2001 at 09:32:30PM -0400
References: <20010531044332.B5026@thyrsus.com>
Message-ID: <20010531235148.B14591@thyrsus.com>

Tim Peters:
> A 128-bit float type is simply necessary for some
> scientific work: not all problems are well-conditioned, and the "extra"
> bits can vanish fast.

Makes me wonder how competent your customers' numerical analysts were.
Where the heck did they think they were getting data with that many
digits of accuracy?  (Note that I didn't say "precision"...)
--
		Eric S. Raymond

Strict gun laws are about as effective as strict drug laws...It pains
me to say this, but the NRA seems to be right: The cities and states
that have the toughest gun laws have the most murder and mayhem.
	-- Mike Royko, Chicago Tribune

From esr@thyrsus.com Fri Jun 1 04:54:33 2001
From: esr@thyrsus.com (Eric S. Raymond)
Date: Thu, 31 May 2001 23:54:33 -0400
Subject: [Python-Dev] One more dict trick
In-Reply-To: ; from tim.one@home.com on Thu, May 31, 2001 at 09:46:11PM -0400
References: <20010531124533.J690@xs4all.nl>
Message-ID: <20010531235433.C14591@thyrsus.com>

Tim Peters:
> Agencies we can't talk about would like them as big as they can get them.
> Each vector register in a Cray box actually consisted of 64 64-bit words, or
> 4K bits per register.  Some "special" models were constructed where the
> vector FPU was thrown away and additional bit-fiddling units added in its
> place: they really treated the vector registers as giant bitstrings, and
> didn't want to burn 64 clock cycles just to do, e.g., "one" conceptual xor.

You've got a point...but I don't think it's really economical to build
that kind of hardware into general-purpose processors.  You end up with
a camel.  You know, a horse designed by committee?
--
		Eric S. Raymond

To make inexpensive guns impossible to get is to say that you're putting
a money test on getting a gun.  It's racism in its worst form.
	-- Roy Innis, president of the Congress of Racial Equality (CORE), 1988

From tim.one@home.com Fri Jun 1 07:58:08 2001
From: tim.one@home.com (Tim Peters)
Date: Fri, 1 Jun 2001 02:58:08 -0400
Subject: [Python-Dev] One more dict trick
In-Reply-To: <20010531235148.B14591@thyrsus.com>
Message-ID:

[Tim]
> A 128-bit float type is simply necessary for some scientific work: not
> all problems are well-conditioned, and the "extra" bits can vanish fast.

[ESR]
> Makes me wonder how competent your customers' numerical analysts were.
> Where the heck did they think they were getting data with that many
> digits of accuracy?  (Note that I didn't say "precision"...)

Not all scientific work consists of predicting the weather with inputs
known to half a digit on a calm day.  Knuth gives examples of
ill-conditioned problems where resorting to unbounded rationals is faster
than any known stable f.p. approach (stuck with limited precision) --
think, e.g., chaotic systems here, which includes parts of many
hydrodynamics problems in real life.

Some scientific work involves modeling ab initio across trillions of
computations (and on a Cray box in particular, where addition didn't even
bother to round, nor multiplication bother to compute the full product
tree, the error bounds per operation were much worse than in a 754 world).
You shouldn't overlook either that algorithms often needed massive
rewriting to exploit vector and parallel architectures, and in a world
where a supremely competent numerical analysis can take a month to verify
the numerical robustness of a new algorithm covering two pages of Fortran,
a million lines of massively reworked seat-of-the-pants modeling code
couldn't be trusted at all without running it under many conditions in at
least two precisions (it only takes one surprise catastrophic cancellation
to destroy everything).

A major oil company once threatened to sue Cray when their reservoir model
produced wildly different results under a new release of the compiler.
Some exceedingly sharp analysts worked on that one for a solid week.
Turned out the new compiler evaluated a subexpression A*B*C by doing (B*C)
first instead of (A*B), because it was faster in context (and fine to do
so by Fortran's rules).  It so happened A was very large, and B and C both
small, and doing B*C first caused the whole product to underflow to zero
where doing A*B first left a product of roughly C's magnitude.

I can't imagine how they ever would have found this if they weren't able
to recompile the code using twice the precision (which worked fine thanks
to the larger dynamic range), then tracing to see where the runs diverged.
Even then it took a week because this was 100s of thousands of lines of
crufty Fortran that ran for hours on the world's then-fastest machine
before delivering bogus results.

BTW, if you think the bulk of the world's numeric production code has even
been *seen* by a qualified numerical analyst, you should ride on planes
more often.
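The A*B*C association effect Tim describes is easy to reproduce with
ordinary IEEE-754 doubles; a tiny sketch with made-up magnitudes (nothing
to do with the actual reservoir-model numbers):

    # Made-up magnitudes illustrating the A*B*C story above; any
    # IEEE-754 double implementation behaves this way.
    A = 1e300    # very large
    B = 1e-300   # small
    C = 1e-160   # small
    print (A*B)*C    # 1e-160, roughly C's magnitude
    print A*(B*C)    # 0.0 -- B*C underflows to zero first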
From tim.one@home.com Fri Jun 1 08:08:28 2001
From: tim.one@home.com (Tim Peters)
Date: Fri, 1 Jun 2001 03:08:28 -0400
Subject: [Python-Dev] One more dict trick
In-Reply-To: <20010531235433.C14591@thyrsus.com>
Message-ID:

[EAR]
> You've got a point...

Well, really, they do -- but they had a much more compelling point when
the Cold War came with an unlimited budget.

> but I don't think it's really economical to build that kind of
> hardware into general-purpose processors.

Economical?  The marginal cost of adding even nutso new features in
silicon now for mass-market chips is pretty close to zero.  Indeed, if
you're in the speech recog or 3D imaging games (i.e., things that still
tax a PC), Intel comes around *begging* for new ideas to use up all their
chip real estate.  The only one I recall them turning down was a request
from Dragon's founder to add an instruction that, given x and y, returned
log(exp(x)+exp(y)).  They were skeptical, and turned out even *we* didn't
need it.

> You end up with a camel.  You know, a horse designed by committee?

Yup!  But that's the camel Intel rides to the bank, so it will probably
grow more humps, on which to hang more bags of gold.

From esr@thyrsus.com Fri Jun 1 08:23:16 2001
From: esr@thyrsus.com (Eric S. Raymond)
Date: Fri, 1 Jun 2001 03:23:16 -0400
Subject: [Python-Dev] One more dict trick
In-Reply-To: ; from tim.one@home.com on Fri, Jun 01, 2001 at 02:58:08AM -0400
References: <20010531235148.B14591@thyrsus.com>
Message-ID: <20010601032316.A15635@thyrsus.com>

Tim Peters:
> Not all scientific work consists of predicting the weather with inputs known
> to half a digit on a calm day.  Knuth gives examples of
> ill-conditioned problems where resorting to unbounded rationals is faster
> than any known stable f.p. approach (stuck with limited precision) -- think,
> e.g., chaotic systems here, which includes parts of many hydrodynamics
> problems in real life.

Hmmm...good answer.  I still believe it's the case that real-world
measurements max out below 48 bits or so of precision because the real
world is a noisy, fuzzy place.  But I can see that most of the algorithms
for partial differential equations would multiply those by very small or
very large quantities repeatedly.  The range-doubling trick for catching
divergences is neat, too.  So maybe there's a market for 128-bit floats
after all.

I'm still skeptical about how likely those applications are to influence
the architecture of general-purpose processors.  I saw a study once that
said heavy-duty scientific floating point only accounts for about 2% of
the computing market -- and I think it's significant that MMX instructions
and so forth entered the Intel line to support *games*, not Navier-Stokes
calculations.  That 2% will have to get a lot bigger before I can see
Intel doubling its word size again.  It's not just the processor design;
the word size has huge implications for buses, memory controllers, and the
whole system architecture.
--
		Eric S. Raymond

The United States is in no way founded upon the Christian religion
	-- George Washington & John Adams, in a diplomatic message to Malta.

From pf@artcom-gmbh.de Fri Jun 1 08:22:50 2001
From: pf@artcom-gmbh.de (Peter Funk)
Date: Fri, 1 Jun 2001 09:22:50 +0200 (MEST)
Subject: [Python-Dev] precision thread (was One more dict trick)
Message-ID:

Eric:
> > You end up with a camel.  You know, a horse designed by committee?

Tim:
> Yup!  But that's the camel Intel rides to the bank, so it will probably grow
> more humps, on which to hang more bags of gold.

cam*ls?  Guido is only one week on vacation and soon heretical words
show up here. ;-)

sorry, couldn't resist, Peter
From thomas@xs4all.net Fri Jun 1 08:28:01 2001
From: thomas@xs4all.net (Thomas Wouters)
Date: Fri, 1 Jun 2001 09:28:01 +0200
Subject: [Python-Dev] Damn... I think I might have just muffed a checkin
In-Reply-To: <15126.34825.167026.520535@beluga.mojam.com>; from skip@pobox.com on Thu, May 31, 2001 at 01:06:01PM -0500
References: <15126.34825.167026.520535@beluga.mojam.com>
Message-ID: <20010601092800.K690@xs4all.nl>

On Thu, May 31, 2001 at 01:06:01PM -0500, Skip Montanaro wrote:

> I just updated httplib.py to expand the list of names in its __all__ list.
> I was operating on version 1.34.  After the checkin I am looking at version
> 1.34.2.1.  I see that Lib/CVS/Tag exists in my directory tree and says
> "release21-maint".  Did I muff it?  If so, how should I do an unmuff
> operation?

You had a sticky tag on the file, probably because you used
'-rrelease21-maint' on a cvs checkout or update.  Good thing it was
release21-maint, though, and not some random other revision, or you would
have created another branch :-)  You can remove stickyness by using
'cvs update -A'.  I personally just have two trees, ~/python/python-2.2
and ~/python/python-2.1.1, where the last one was checked out with
-rrelease21-maint.

--
Thomas Wouters

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!

From gmcm@hypernet.com Fri Jun 1 12:29:28 2001
From: gmcm@hypernet.com (Gordon McMillan)
Date: Fri, 1 Jun 2001 07:29:28 -0400
Subject: [Python-Dev] One more dict trick
In-Reply-To:
References: <20010531235433.C14591@thyrsus.com>
Message-ID: <3B174458.1998.46DEEE2B@localhost>

[ESR]
> > You end up with a camel.  You know, a horse designed by
> > committee?

[Tim]
> Yup!  But that's the camel Intel rides to the bank, so it will
> probably grow more humps, on which to hang more bags of gold.

Been a camel a long time, too.  x86 assembler is the, er, Perl of
assemblers.

- Gordon

From mwh@python.net Fri Jun 1 12:54:40 2001
From: mwh@python.net (Michael Hudson)
Date: 01 Jun 2001 12:54:40 +0100
Subject: [Python-Dev] another dict crasher
Message-ID:

Adapted from a report on comp.lang.python from Wolfgang Lipp:

    class Child:
        def __init__(self, parent):
            self.__dict__['parent'] = parent
        def __getattr__(self, attr):
            self.parent.a = 1
            self.parent.b = 1
            self.parent.c = 1
            self.parent.d = 1
            self.parent.e = 1
            self.parent.f = 1
            self.parent.g = 1
            self.parent.h = 1
            self.parent.i = 1
            return getattr(self.parent, attr)

    class Parent:
        def __init__(self):
            self.a = Child(self)

    print Parent().__dict__

segfaults both 2.1 and current (well, maybe a day old) CVS.  Haven't
tried Tim's latest patch, but I don't believe that will make any
difference.

It's obvious what's happening; the dict's resizing inside the for loop
in dict_repr and the ep pointer is dangling.

By the time we've shaken all of these out of dictobject.c it's going to
be pretty close to free-threading safe, I'd have thought.

reentrancy-sucks-ly y'rs
M.

--
  But since I'm not trying to impress anybody in The Software Big Top,
  I'd rather walk the wire using a big pole, a safety harness, a net,
  and with the wire not more than 3 feet off the ground.
                                   -- Grant Griffin, comp.lang.python
From mwh@python.net Fri Jun 1 13:12:55 2001
From: mwh@python.net (Michael Hudson)
Date: 01 Jun 2001 13:12:55 +0100
Subject: [Python-Dev] another dict crasher
In-Reply-To: Michael Hudson's message of "01 Jun 2001 12:54:40 +0100"
References:
Message-ID:

Michael Hudson writes:

> Adapted from a report on comp.lang.python from Wolfgang Lipp:
[snip]
> segfaults both 2.1 and current (well, maybe a day old) CVS.  Haven't
> tried Tim's latest patch, but I don't believe that will make any
> difference.
>
> It's obvious what's happening; the dict's resizing inside the
> for loop in dict_repr and the ep pointer is dangling.

Actually this crash was dict_print (I always forget about tp_print...).

It's pretty easy to mend:

*** dictobject.c	Fri Jun  1 13:08:13 2001
--- dictobject.c-fixed	Fri Jun  1 12:59:07 2001
***************
*** 793,795 ****
  	any = 0;
! 	for (i = 0, ep = mp->ma_table; i < mp->ma_size; i++, ep++) {
  		if (ep->me_value != NULL) {
--- 793,796 ----
  	any = 0;
! 	for (i = 0; i < mp->ma_size; i++) {
! 		ep = &mp->ma_table[i];
  		if (ep->me_value != NULL) {
***************
*** 833,835 ****
  	any = 0;
! 	for (i = 0, ep = mp->ma_table; i < mp->ma_size && v; i++, ep++) {
  		if (ep->me_value != NULL) {
--- 834,837 ----
  	any = 0;
! 	for (i = 0; i < mp->ma_size && v; i++) {
! 		ep = &mp->ma_table[i];
  		if (ep->me_value != NULL) {

I'm not sure this stops still more Machiavellian behaviour from crashing
the interpreter, and you can certainly get items being printed more than
once or not at all.  I'm not sure this last is a problem; if the user's
being this contrary there's only so much we can do to help him or her.

Cheers,
M.

--
  I also feel it essential to note, [...], that Description Logics,
  non-Monotonic Logics, Default Logics and Circumscription Logics can
  all collectively go suck a cow.  Thank you.
              -- http://advogato.org/person/Johnath/diary.html?start=4

From Samuele Pedroni Fri Jun 1 13:49:11 2001
From: Samuele Pedroni (Samuele Pedroni)
Date: Fri, 1 Jun 2001 14:49:11 +0200 (MET DST)
Subject: [Python-Dev] __xxxattr__ caching semantic
Message-ID: <200106011249.OAA05837@core.inf.ethz.ch>

Hi.

What is the intended semantics wrt __xxxattr__ caching:

    class X:
        pass

    def cga(self, name):
        print name

    def iga(name):
        print name

    x = X()
    x.__dict__['__getattr__'] = iga  # 1.
    x.__getattr__ = iga              # 2.
    X.__dict__['__getattr__'] = cga  # 3.
    X.__getattr__ = cga              # 4.
    x.a

According to the manual
(http://www.python.org/doc/current/ref/customization.html) all the
variants should fail: x.a should raise an error, and they should have
no effect.  In practice 4. works.  Is that an implementation/manual
mismatch?  Is this intended?  Is there code around using 4.?

I'm asking this because jython has differences/bugs in this respect.

I imagine that 1.-4. should work for all other __magic__ methods (this
should be fixed in jython for some methods), OTOH jython has such a
restriction on __del__ too, and this one cannot be removed (it is not
simply a matter of caching/non caching).

regards, Samuele Pedroni.

From Greg.Wilson@baltimore.com Fri Jun 1 13:59:28 2001
From: Greg.Wilson@baltimore.com (Greg Wilson)
Date: Fri, 1 Jun 2001 08:59:28 -0400
Subject: [Python-Dev] re: %b format
Message-ID: <930BBCA4CEBBD411BE6500508BB3328F2E1E47@nsamcanms1.ca.baltimore.com>

My thanks to everyone who commented on the idea of adding a binary
format specifier to Python.  I'll volunteer to draft the PEP ---
volunteers for a co-author?

Greg

-----------------------------------------------------------------------------------------------------------------
The information contained in this message is confidential and is
intended for the addressee(s) only.  If you have received this message
in error or there are any problems please notify the originator
immediately.  The unauthorized use, disclosure, copying or alteration
of this message is strictly forbidden.  Baltimore Technologies plc will
not be liable for direct, special, indirect or consequential damages
arising from alteration of the contents of this message by a third
party or as a result of any virus being passed on.

In addition, certain Marketing collateral may be added from time to
time to promote Baltimore Technologies products, services, Global
e-Security or appearance at trade shows and conferences.

This footnote confirms that this email message has been swept by
Baltimore MIMEsweeper for Content Security threats, including computer
viruses.
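For a rough sense of what a binary conversion could produce -- a sketch
only, since the actual semantics would be for the PEP to pin down, and
to_bin() here is just an invented stand-in for the proposed %b:

    # Hypothetical model of what '%b' might emit; not the PEP's spec.
    def to_bin(n):
        if n < 0:
            return '-' + to_bin(-n)
        digits = ''
        while n:
            digits = str(n & 1) + digits
            n = n >> 1
        return digits or '0'

    print to_bin(10)    # 1010
    print to_bin(0)     # 0
    print to_bin(-6)    # -110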
From tismer@tismer.com Fri Jun 1 14:56:26 2001
From: tismer@tismer.com (Christian Tismer)
Date: Fri, 01 Jun 2001 15:56:26 +0200
Subject: [Python-Dev] One more dict trick
References:
Message-ID: <3B179F0A.CFA3B2C@tismer.com>

Tim Peters wrote:
>
> Another version of the patch attached, a bit faster and with a large new
> comment block explaining it.  It's looking good!  As I hope the new comments
> make clear, nothing about this approach is "a mystery" -- there are
> explainable reasons for each fiddly bit.  This gives me more confidence in
> it than in the previous approach, and, indeed, it turned out that when I
> *thought* "hmm!  I bet this change would be a little faster!", it actually
> was.

Thanks a lot for this nice patch.  It looks like a real improvement.
Also thanks for mentioning my division idea.  Since all bits of the
hash are eventually taken into account, this idea has somehow survived
in an even more efficient solution, good end, file closed.
(and good that I saved the time to check my patch in, lately :-)

cheers - chris

--
Christian Tismer             :^)
Mission Impossible 5oftware  :    Have a break! Take a ride on Python's
Kaunstr. 26                  :    *Starship* http://starship.python.net/
14163 Berlin                 :    PGP key -> http://wwwkeys.pgp.net/
PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF
     where do you want to jump today?   http://www.stackless.com/

From Samuele Pedroni Fri Jun 1 15:18:20 2001
From: Samuele Pedroni (Samuele Pedroni)
Date: Fri, 1 Jun 2001 16:18:20 +0200 (MET DST)
Subject: [Python-Dev] Re: [Jython-dev] Using PyChecker in Jython
Message-ID: <200106011418.QAA13570@core.inf.ethz.ch>

Hi.

[Neal Norwitz]
> Hello!
>
> I have created a program PyChecker to perform Python source code checking.
> (http://pychecker.sourceforge.net).
>
> PyChecker is implemented in C Python and does some "tricky" things.
> It doesn't currently work in Jython due to the module dis (disassemble
> code) not being available in Jython.
>
> Is there any fundamental problem with getting PyChecker to work under
> Jython?
>
> Here's a high-level overview of what PyChecker does:
>
>     imp.find_module()
>     imp.load_module()
>     for each object in dir(module):
>         # object can be a class, function, imported module, etc.
>         for each instruction in disassembled byte code:
>             # handle each instruction appropriately
>
> This hides a lot of details, but I do lots of things like getting the
> code objects from the classes, methods, and functions, look at the
> arguments in functions, etc.
>
> Is it possible to make work in Jython?  Easy?
>
> Thanks for any guidance,
> Neal

It would be great -- really -- but about easy?  As easy as making
PyChecker work on source code without using dis and without
importing/executing modules and their top defs.  I think there will be
no dis support on the jython side (we produce java bytecode, and
getting "back" to python vm bytecode would be very tricky, not very
elegant, etc.) any time soon.

Seriously, two possible workaround hacks (they are also not very easy);
this is just after small brainstorming and ignoring the concrete needs
and code of PyChecker:

+) the more elegant one, but maybe still too difficult or requiring too
much work: let PyChecker run under CPython even when checking jython
code.  jython code can compile down to py vm bytecode but then does not
run: why?  java class imports and the jython-specific builtin modules
(not so many).  So one needs to implement a sufficient amount of python
code (an import hook, etc.) that does the minimal partial evaluation
required and the required amount of loading & introspection on java and
jython-specific stuff in order to have the imports work and PyChecker
fed with the things it needs.  This means dealing with the java class
format, or a two-pass approach: run the code under jython in order to
gather the information needed to load it successfully under python.
If the top level code contains conditionals that depend on jython stuff
this could be hard, but one can ignore that (at least for starting).
Clearly the main PyChecker loop would require some adaptation, and
maybe include some logic to check some jython-specific stuff
(subclassing from java, etc).

*) let an adapted PyChecker run under jython, and obtain someway the
needed py vm bytecode stream from a source -> py vm bytecode compiler
written in python (such a thing exists -- if I remember well).

And similar ideas ...

regards, Samuele Pedroni.
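The loop Neal sketches takes only a few lines on top of CPython's dis
module -- which is exactly the piece Jython lacks.  An illustrative
sketch, not PyChecker's actual source (the FunctionType filter and the
check() name are invented here):

    # Rough model of the PyChecker loop described above.
    import dis, imp, types

    def check(name):
        file, path, desc = imp.find_module(name)
        module = imp.load_module(name, file, path, desc)
        for attr in dir(module):
            obj = getattr(module, attr)
            if type(obj) is types.FunctionType:
                print 'disassembling', attr
                dis.dis(obj.func_code)   # "handle each instruction" here

    check('string')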
From barry@digicool.com Fri Jun 1 15:43:59 2001
From: barry@digicool.com (Barry A. Warsaw)
Date: Fri, 1 Jun 2001 10:43:59 -0400
Subject: [Python-Dev] Damn... I think I might have just muffed a checkin
References: <15126.34825.167026.520535@beluga.mojam.com>
	<20010601092800.K690@xs4all.nl>
Message-ID: <15127.43567.202950.192811@anthem.wooz.org>

>>>>> "TW" == Thomas Wouters writes:

    TW> You can remove stickyness by using 'cvs update -A'.  I
    TW> personally just have two trees, ~/python/python-2.2 and
    TW> ~/python/python-2.1.1, where the last one was checked out with
    TW> -rrelease21-maint.

Very good advice for anybody playing with branches!

-Barry

From barry@digicool.com Fri Jun 1 16:12:33 2001
From: barry@digicool.com (Barry A. Warsaw)
Date: Fri, 1 Jun 2001 11:12:33 -0400
Subject: [Python-Dev] another dict crasher
References:
Message-ID: <15127.45281.435849.822222@anthem.wooz.org>

>>>>> "MH" == Michael Hudson writes:

    MH> segfaults both 2.1 and current (well, maybe a day old) CVS.
    MH> Haven't tried Tim's latest patch, but I don't believe that
    MH> will make any difference.

That is highly, highly nasty.  Sounds to me like there ought to be an
emergency 2.1.1 patch made for this, bumping Thomas's work to 2.1.2 if
necessary.  And if we can trojan in the NAIPL (New And Improved Python
License), I wouldn't mind.  :)

-Barry

From jeremy@digicool.com Fri Jun 1 16:18:05 2001
From: jeremy@digicool.com (Jeremy Hylton)
Date: Fri, 1 Jun 2001 11:18:05 -0400 (EDT)
Subject: [Python-Dev] another dict crasher
In-Reply-To: <15127.45281.435849.822222@anthem.wooz.org>
References: <15127.45281.435849.822222@anthem.wooz.org>
Message-ID: <15127.45613.947590.246269@slothrop.digicool.com>

>>>>> "BAW" == Barry A Warsaw writes:
>>>>> "MH" == Michael Hudson writes:

    MH> segfaults both 2.1 and current (well, maybe a day old) CVS.
    MH> Haven't tried Tim's latest patch, but I don't believe that will
    MH> make any difference.

    BAW> That is highly, highly nasty.  Sounds to me like there ought to
    BAW> be an emergency 2.1.1 patch made for this, bumping Thomas's
    BAW> work to 2.1.2 if necessary.  And if we can trojan in the NAIPL
    BAW> (New And Improved Python License), I wouldn't mind.  :)
We can release a critical patch for this bug, ala the CriticalPatches
page for the Python 2.0 release.

Jeremy

From mwh@python.net Fri Jun 1 17:03:55 2001
From: mwh@python.net (Michael Hudson)
Date: Fri, 1 Jun 2001 17:03:55 +0100 (BST)
Subject: [Python-Dev] another dict crasher
In-Reply-To: <15127.45281.435849.822222@anthem.wooz.org>
Message-ID:

On Fri, 1 Jun 2001, Barry A. Warsaw wrote:
>
> >>>>> "MH" == Michael Hudson writes:
>
>     MH> segfaults both 2.1 and current (well, maybe a day old) CVS.
>     MH> Haven't tried Tim's latest patch, but I don't believe that
>     MH> will make any difference.
>
> That is highly, highly nasty.

Yes.

> Sounds to me like there ought to be an emergency 2.1.1 patch made for
> this, bumping Thomas's work to 2.1.2 if necessary.

Really?  Two mild counterpoints:

1) It's *old*; 1.5.2 at least, and that's only because that's the
   oldest version I happen to have lying around.  It's quite similar
   to the test_mutants oddness in some ways.

2) There's at least one other crasher in 2.1; the one in the compiler
   where a variable is referenced in a class and in a contained
   method.  (I've actually run into that one).

But a "fix these crashers" release seems reasonable if there's someone
with the time to put it out (not me!).

> And if we can trojan in the NAIPL (New And Improved Python
> License), I wouldn't mind. :)

Well me neither...

Cheers,
M.

From skip@pobox.com (Skip Montanaro) Fri Jun 1 17:26:35 2001
From: skip@pobox.com (Skip Montanaro) (Skip Montanaro)
Date: Fri, 1 Jun 2001 11:26:35 -0500
Subject: [Python-Dev] Damn... I think I might have just muffed a checkin
In-Reply-To: <20010601092800.K690@xs4all.nl>
References: <15126.34825.167026.520535@beluga.mojam.com>
	<20010601092800.K690@xs4all.nl>
Message-ID: <15127.49723.186388.220648@beluga.mojam.com>

    Thomas> I personally just have two trees, ~/python/python-2.2 and
    Thomas> ~/python/python-2.1.1, where the last one was checked out with
    Thomas> -rrelease21-maint.

Thanks, good advice.  httplib.py has now been updated on both the head
and release21-maint branches.

Skip

From loewis@informatik.hu-berlin.de Fri Jun 1 18:07:52 2001
From: loewis@informatik.hu-berlin.de (Martin von Loewis)
Date: Fri, 1 Jun 2001 19:07:52 +0200 (MEST)
Subject: [Python-Dev] METH_NOARGS calling convention
Message-ID: <200106011707.TAA21329@pandora.informatik.hu-berlin.de>

The patch

http://sourceforge.net/tracker/?func=detail&atid=305470&aid=427190&group_id=5470

introduces two new calling conventions, METH_O and METH_NOARGS.  The
rationale for METH_O has been discussed already; the rationale for
METH_NOARGS is that it allows a convenient simplification (plus a
marginal speed-up) of functions which do either PyArg_NoArgs(args) or
PyArg_ParseTuple(args, ":function_name").

Now, one open issue is whether the METH_NOARGS functions should have a
signature of

    PyObject * (*unaryfunc)(PyObject *);

or of

    PyObject * (*PyCFunction)(PyObject *, PyObject *);

which then would be called with a NULL second argument; the first
argument would be self in either case.

IMO, the advantage of passing the NULL argument is that NOARGS methods
don't need to be cast into PyCFunction in the method table; the
advantage of the unaryfunc signature is that it is clearer in the
function implementation.

Any opinions which signature to use?

Regards,
Martin
From mal@lemburg.com Fri Jun 1 18:18:21 2001
From: mal@lemburg.com (M.-A. Lemburg)
Date: Fri, 01 Jun 2001 19:18:21 +0200
Subject: [Python-Dev] METH_NOARGS calling convention
References: <200106011707.TAA21329@pandora.informatik.hu-berlin.de>
Message-ID: <3B17CE5D.9D4CE8D4@lemburg.com>

Martin von Loewis wrote:
>
> The patch
>
> http://sourceforge.net/tracker/?func=detail&atid=305470&aid=427190&group_id=5470
>
> introduces two new calling conventions, METH_O and METH_NOARGS.  The
> rationale for METH_O has been discussed already; the rationale for
> METH_NOARGS is that it allows a convenient simplification (plus a
> marginal speed-up) of functions which do either PyArg_NoArgs(args) or
> PyArg_ParseTuple(args, ":function_name").
>
> Now, one open issue is whether the METH_NOARGS functions should have
> a signature of
>
>     PyObject * (*unaryfunc)(PyObject *);
>
> or of
>
>     PyObject * (*PyCFunction)(PyObject *, PyObject *);
>
> which then would be called with a NULL second argument; the first
> argument would be self in either case.
>
> IMO, the advantage of passing the NULL argument is that NOARGS methods
> don't need to be cast into PyCFunction in the method table; the
> advantage of the unaryfunc signature is that it is clearer in the
> function implementation.
>
> Any opinions which signature to use?

The second... I'm not sure how you will get extension writers who
have to maintain packages for all three Python versions to ever change
their code to use the new style calling scheme: there simply is no
clean way to use the same code base unless you are willing to add tons
of #ifdefs.

--
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                       http://www.lemburg.com/python/

From fdrake@acm.org Fri Jun 1 18:31:15 2001
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Fri, 1 Jun 2001 13:31:15 -0400 (EDT)
Subject: [Python-Dev] METH_NOARGS calling convention
In-Reply-To: <3B17CE5D.9D4CE8D4@lemburg.com>
References: <200106011707.TAA21329@pandora.informatik.hu-berlin.de>
	<3B17CE5D.9D4CE8D4@lemburg.com>
Message-ID: <15127.53603.87216.103262@cj42289-a.reston1.va.home.com>

M.-A. Lemburg writes:
> > Any opinions which signature to use?
>
> The second...

Seconded.  ;-)

> I'm not sure how you will get extension writers who
> have to maintain packages for all three Python versions to
> ever change their code to use the new style calling scheme:
> there simply is no clean way to use the same code base unless
> you are willing to add tons of #ifdefs.

You won't, and that's OK.  Even if 3rd-party extensions never use it,
there are plenty of functions/methods in the standard distribution
which can use it, and I imagine those would be converted fairly
quickly.

-Fred

--
Fred L. Drake, Jr.
PythonLabs at Digital Creations

From tismer@tismer.com Fri Jun 1 19:29:11 2001
From: tismer@tismer.com (Christian Tismer)
Date: Fri, 01 Jun 2001 20:29:11 +0200
Subject: [Python-Dev] Marshal bug in 2.1?
Message-ID: <3B17DEF7.3E7C6BC6@tismer.com>

This is a multi-part message in MIME format.
--------------6AB95E65519E7075E373B33F
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit

Hi friends,

there is a script which generates encrypted passwords for Starship
users.  There is a series of marshal, zlib and base64 calls, which is
reversed by the script.

Is there a known bug in Marshal, or should I start the debugger now?
The passphrase for the attached script is "hey".

cheers - chris

--
Christian Tismer             :^)
Mission Impossible 5oftware  :    Have a break!
Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net/ 14163 Berlin : PGP key -> http://wwwkeys.pgp.net/ PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? http://www.stackless.com/ --------------6AB95E65519E7075E373B33F Content-Type: text/plain; charset=us-ascii; name="letmein.py" Content-Transfer-Encoding: 7bit Content-Disposition: inline; filename="letmein.py" import marshal,base64,zlib exec marshal.loads(zlib.decompress(base64.decodestring(""" eJytVM+PGzUUfs6PzWZYwapAqbbAuiyF6Yqsqt2iomq1HGkvuQQJaS+pM3YzbjP2yHY6CdrVHNr+ Exz5L/gn4MidC2f+Az5Pkq0QlFMnmTf2s+d73/vmPWeEq43b/wxT498mSXSOwbskGZ0zqm+QbNF5 i+o9km16idU21bdIdUh26GmLrCRWf0ayS8+6dN6l+oAU0XcP689JbZHcohfA6VF9mxQj1SbVi57r 2PAFqS7p7bVH9+kFkew1mDvA/JJUCziGEYs3AozS7ch1yIiSg7dwJfjxzCkRVFml4Q7ng8F6zgUv hfeVdZLzJ84WXJgln+rnyvCgFuEIbzoV5s54/g3PcuFEFpTzvMp1lnPhFM9sUc6DklwboEmF5UIb 7YPO8PJkHvhz5ZbcWDOYaaOE45VYrmI18N/n2sctXlvDMczmPthC/wjEJ9bxUrtFTOBt6OAPoqSH h4c85MqrdUaeT1SoFDIenJ0OmpyWdu5AxDllwmuB8GLC33gNzm7700EytBWfA3s0esiD5TM7hTAY +IBIuS6PymXIrTkyKiRYjKL5+MI607nXZsrVAjLPlpHmFck0m+lyYgWIOAXRC2UkNHowuJMII+Mm M10zv2K8QosojUvy0tmpE0WyomQLFfK4o7BIGgUhxWSmjhJ/F/U3CdVX/BHPRKyE2SwiA0mEVQgI g49agXtmIVMWbmWMOvi1yZexyfaovhmb7BnRJWsGjC7RXh/TBZqgFdsO3XCJJvuELtqkO3RB0cPq T5v5VmyTSwDt00WLdI/CduxQNGbc14pNGm2H+Ajgo7SLoEPfhz25e3x8cv/eyX0wYuADRjepAQpE ga3jIP514H2E4SiNZ8NQj2E1h2nmPposd80TYnrUDi3SaFdD/37c8O9q9bF7T2eimEhxtk8+Hj6N 0XEh7W+wC/m134qT4PANGpdRVYMtm4V5KdGijSM0DqmnygffwfCp1WaFIsq0s+EU/gt4Bfh/ZDdn wx75JJ6U7EN2je2y91izOh4XQpvxeOj3MStnSqC88f1RsqtSiMXKy9zB/8DvYs/jH/46fWR+q3+v fv3lz5/+eJUmm5ylzRr6eB5vBif/4LAOaUShxuOrdKJoTlRjbXDWNN6wCFeSvdYmbcR+U65RiW9R Dh/gufNOP+m3dnq7bIdtI9VrbJ/9DYOcdyU= """))) --------------6AB95E65519E7075E373B33F-- From tismer@tismer.com Fri Jun 1 19:47:02 2001 From: tismer@tismer.com (Christian Tismer) Date: Fri, 01 Jun 2001 20:47:02 +0200 Subject: [Python-Dev] Marshal bug in 2.1? References: <3B17DEF7.3E7C6BC6@tismer.com> Message-ID: <3B17E326.41D82CCE@tismer.com> Christian Tismer wrote: > > Hi friends, > > there is a script which generates encrypted passwords for > Starship users. There is a series of marshal, zlib and base64 > calls, which is reversed by the script. > > Is there a known bug in Marshal, or should I start the debugger now? > The passwphrase for the attached script is "hey". Aehmmm... can it be that code objects are no longer compatible between Python 2.0 and 2.1? sigh - ciao - chris -- Christian Tismer :^) Mission Impossible 5oftware : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net/ 14163 Berlin : PGP key -> http://wwwkeys.pgp.net/ PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? http://www.stackless.com/ From mwh@python.net Fri Jun 1 19:52:17 2001 From: mwh@python.net (Michael Hudson) Date: 01 Jun 2001 19:52:17 +0100 Subject: [Python-Dev] another dict crasher In-Reply-To: barry@digicool.com's message of "Fri, 1 Jun 2001 11:12:33 -0400" References: <15127.45281.435849.822222@anthem.wooz.org> Message-ID: Warning! VERY SICK CODE INDEED ahead! barry@digicool.com (Barry A. Warsaw) writes: > >>>>> "MH" == Michael Hudson writes: > > MH> segfaults both 2.1 and current (well, maybe a day old) CVS. > MH> Haven't tried Tim's latest patch, but I don't believe that > MH> will make any difference. > > That is highly, highly nasty. 
Not as nasty as this, though: dict = {} # let's force dict to malloc its table for i in range(1,10): dict[i] = i class Machiavelli: def __repr__(self): dict.clear() print # doesn't crash without this. don't know why return `"machiavelli"` def __hash__(self): return 0 dict[Machiavelli()] = Machiavelli() print dict gives, even with my posted patch to dictobject.c $ ./python crash2.py { Segmentation fault (core dumped) Any ideas what the above code should do? (Other than use the secret PSU website to hire a hitman and shoot whoever wrote the code, I mean). Cheers, M. -- Well, yes. I don't think I'd put something like "penchant for anal play" and "able to wield a buttplug" in a CV unless it was relevant to the gig being applied for... -- Matt McLeod, alt.sysadmin.recovery From mal@lemburg.com Fri Jun 1 20:01:38 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Fri, 01 Jun 2001 21:01:38 +0200 Subject: [Python-Dev] Marshal bug in 2.1? References: <3B17DEF7.3E7C6BC6@tismer.com> <3B17E326.41D82CCE@tismer.com> Message-ID: <3B17E692.281A329B@lemburg.com> Christian Tismer wrote: > > Christian Tismer wrote: > > > > Hi friends, > > > > there is a script which generates encrypted passwords for > > Starship users. There is a series of marshal, zlib and base64 > > calls, which is reversed by the script. > > > > Is there a known bug in Marshal, or should I start the debugger now? > > The passphrase for the attached script is "hey". > > Aehmmm... can it be that code objects are no longer compatible > between Python 2.0 and 2.1? Yes, not surprisingly though... AFAIK the pyc format changed in every single version between 1.5.2 and 2.1. -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From tim.one@home.com Fri Jun 1 21:36:21 2001 From: tim.one@home.com (Tim Peters) Date: Fri, 1 Jun 2001 16:36:21 -0400 Subject: [Python-Dev] another dict crasher In-Reply-To: Message-ID: I suspect there are many ways to get the dict code to blow up, and always have been. I picked on dict compare a month or so ago mostly because nobody cares how fast that runs except in the == and != cases. Others are a real bitch; for example, the fundamental lookdict function caches dictentry *ep0 = mp->ma_table; at the start as if it were invariant -- but very unlikely sequences of collisions with identical hash codes combined with mutating comparisons can turn that into a bogus pointer. List objects used to have similar vulnerabilities during sorting (where comparison is the *norm*, not a one-in-a-billion freak occurrence), and no amount of slow-the-code paranoia sufficed to plug all conceivable holes. In the end we invented an internal "immutable list type", and replace the list object's type pointer for the duration of the sort (you can still try to mutate a list during a sort, but all the mutating list methods are redirected to raise an exception when you do). The dict code has even more holes and in more places, but they're generally much harder to provoke, so they've gone unnoticed for 10 years. All in all, seemed like a good tradeoff to me .
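[A rough sketch of the "immutable list" trick Tim describes; the actual implementation lives in CPython's Objects/listobject.c, and this abbreviates it heavily -- the helper name is invented and error paths are omitted:]

static PyObject *
listsort_sketch(PyListObject *self)
{
	self->ob_type = &immutable_list_type;	/* mutating methods now raise */
	/* ... do the sort; comparisons may run arbitrary Python code,
	   but they can no longer resize the list out from under us ... */
	self->ob_type = &PyList_Type;		/* restore the real type */
	Py_INCREF(Py_None);
	return Py_None;
}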
From tim.one@home.com Fri Jun 1 23:08:32 2001 From: tim.one@home.com (Tim Peters) Date: Fri, 1 Jun 2001 18:08:32 -0400 Subject: [Python-Dev] METH_NOARGS calling convention In-Reply-To: <200106011707.TAA21329@pandora.informatik.hu-berlin.de> Message-ID: Cool! [Martin von Loewis] > ... > Now, one open issue is whether the METH_NOARGS functions should have > a signature of > > PyObject * (*unaryfunc)(PyObject *); > > or of > > PyObject *(*PyCFunction)(PyObject *, PyObject *); > > which then would be called with a NULL second argument; the first > argument would be self in either case. > > IMO, the advantage of passing the NULL argument is that NOARGS methods > don't need to be cast into PyCFunction in the method table; the > advantage of the second approach is that it is clearer in the function > implementation. > > Any opinions which signature to use? The one that makes sense : declare functions with the number of arguments they use. I don't care about needing to cast in the table: you do that once, but people read the *code* over and over, and an unused arg will be a mystery (or even a source of compiler warnings) every time you bump into one. The only way needing to cast could be "a problem" is if this remains an undocumented gimmick that developers have to reverse-engineer from staring at the (distributed all over the place) implementation. I like what the patch does, but I'd reject it just for continuing to leave this stuff Utterly Mysterious: please add comments saying what METH_NOARGS and METH_O *mean*: what's the point, why are these defined, how and when are you supposed to use them? That's where to explain the need to cast METH_NOARGS. From thomas@xs4all.net Fri Jun 1 23:42:35 2001 From: thomas@xs4all.net (Thomas Wouters) Date: Sat, 2 Jun 2001 00:42:35 +0200 Subject: [Python-Dev] another dict crasher In-Reply-To: <15127.45281.435849.822222@anthem.wooz.org>; from barry@digicool.com on Fri, Jun 01, 2001 at 11:12:33AM -0400 References: <15127.45281.435849.822222@anthem.wooz.org> Message-ID: <20010602004235.Q690@xs4all.nl> On Fri, Jun 01, 2001 at 11:12:33AM -0400, Barry A. Warsaw wrote: > > >>>>> "MH" == Michael Hudson writes: > MH> segfaults both 2.1 and current (well, maybe a day old) CVS. > MH> Haven't tried Tim's latest patch, but I don't believe that > MH> will make any difference. > That is highly, highly nasty. Sounds to me like there ought to be an > emergency 2.1.1 patch made for this, bumping Thomas's work to 2.1.2 if > necessary. Why bump 'my work'? I'm just reviewing patches checked into the head. A fix for the above problems would fit in a patch release very nicely, and a release is a release. Besides, releasing 2.1.1 as 2.1 + dict fix would be a CVS nightmare. Unless you propose to keep it out of CVS, Barry? :) > And if we can trojan in the NAIPL (New And Improved Python > License), I wouldn't mind. :) I'll channel Guido by saying he wouldn't even allow us to ship it with anything other than the PSF licence :) Gee-I'd-almost-think-you-had-a-need-for-an-FSF-suffered-licence-ly y'rs -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From thomas@xs4all.net Fri Jun 1 23:47:16 2001 From: thomas@xs4all.net (Thomas Wouters) Date: Sat, 2 Jun 2001 00:47:16 +0200 Subject: [Python-Dev] Marshal bug in 2.1? In-Reply-To: <3B17E692.281A329B@lemburg.com>; from mal@lemburg.com on Fri, Jun 01, 2001 at 09:01:38PM +0200 References: <3B17DEF7.3E7C6BC6@tismer.com> <3B17E326.41D82CCE@tismer.com> <3B17E692.281A329B@lemburg.com> Message-ID: <20010602004716.R690@xs4all.nl> On Fri, Jun 01, 2001 at 09:01:38PM +0200, M.-A. Lemburg wrote: > Yes, not surprisingly though... AFAIK the pyc format changed > in every single version between 1.5.2 and 2.1. Worse, it's changed several times between each release :) -- Thomas Wouters Hi!
I'm a .signature virus! copy me into your .signature file to help me spread! From barry@digicool.com Sat Jun 2 00:12:30 2001 From: barry@digicool.com (Barry A. Warsaw) Date: Fri, 1 Jun 2001 19:12:30 -0400 Subject: [Python-Dev] another dict crasher References: <15127.45281.435849.822222@anthem.wooz.org> <20010602004235.Q690@xs4all.nl> Message-ID: <15128.8542.51241.192412@anthem.wooz.org> >>>>> "TW" == Thomas Wouters writes: >> That is highly, highly nasty. Sounds to me like there ought to >> be an emergency 2.1.1 patch made for this, bumping Thomas's >> work to 2.1.2 if necessary. TW> Why bump 'my work' ? I'm just reviewing patches checked into TW> the head. A fix for the above problems would fit in a patch TW> release very nicely, and a release is a release. Besides, TW> releasing 2.1.1 as 2.1 + dict fix would be a CVS TW> nightmare. Unless you propose to keep it out of CVS, Barry ? TW> :) Oh no! You know me, I like to release those maintenance releases early and often. :) Anyway, that's why /you're/ the 2.1.1 czar. >> And if we can trojan in the NAIPL (New And Improved Python >> License), I wouldn't mind. :) TW> I'll channel Guido by saying he wouldn't even allow us to ship TW> it with anything other than the PSF licence :) :) TW> Gee-I'd-almost-think-you-had-a-need-for-an-FSF-suffered-licence-ly TW> y'rs Where'd you get /that/ idea? :) -Barry From mwh@python.net Sat Jun 2 00:20:26 2001 From: mwh@python.net (Michael Hudson) Date: 02 Jun 2001 00:20:26 +0100 Subject: [Python-Dev] another dict crasher In-Reply-To: "Tim Peters"'s message of "Fri, 1 Jun 2001 16:36:21 -0400" References: Message-ID: "Tim Peters" writes: > The dict code has even more holes and in more places, but they're > generally much harder to provoke, so they've gone unnoticed for 10 > years. All in all, seemed like a good tradeoff to me . Are you suggesting that we should just leave these crashers in? They're not *particularly* hard to provoke if you know the implementation - and I was inspired to look for them by someone's report of actually running into one. Cheers, M. -- Java sucks. [...] Java on TV set top boxes will suck so hard it might well inhale people from off their sofa until their heads get wedged in the card slots. --- Jon Rabone, ucam.chat From tim.one@home.com Sat Jun 2 02:04:36 2001 From: tim.one@home.com (Tim Peters) Date: Fri, 1 Jun 2001 21:04:36 -0400 Subject: [Python-Dev] another dict crasher In-Reply-To: Message-ID: [Michael Hudson] > Are you suggesting that we should just leave these crashers in? > They're not *particularly* hard to provoke if you know the > implementation - and I was inspired to look for them by someone's > report of actually running into one. I certainly don't object to fixing ones that bite innocent users, but there are also costs of several kinds. In this case, I couldn't care less how long printing a dict takes -- go for it. When adversarial abuse starts interfering with the speed of crucial operations, though, I'm simply not a "safety at any cost" person. Guido is much more of one, although the number of holes remaining in Python could plausibly fill Albert Hall . short-of-50-easy-ways-to-crash-win98-just-think-hard-about-each-"+"-in- the-code-base-ly y'rs - tim From gstein@lyra.org Sat Jun 2 06:52:03 2001 From: gstein@lyra.org (Greg Stein) Date: Fri, 1 Jun 2001 22:52:03 -0700 Subject: [Python-Dev] strop vs. 
string In-Reply-To: ; from tim.one@home.com on Sun, May 27, 2001 at 09:42:30PM -0400 References: <3B10D758.3741AC2F@lemburg.com> Message-ID: <20010601225203.R23560@lyra.org> On Sun, May 27, 2001 at 09:42:30PM -0400, Tim Peters wrote: >... > [Greg Ewing] > > I think it would be safe if: > > > > 1) it kept a reference to the underlying object, and > > That much it already does. > > > 2) it re-fetched the pointer and length info each time it was > > needed, using the underlying object's buffer interface. > > If after > > b = buffer(some_object) > > b.__getitem__ needed to refetch the info between > > b[i] > and > b[i+1] > > I expect it would be so slow even Greg wouldn't want it anymore. Huh? I don't think it would be all that slow. It is just a function call. And I don't think that the getitem slot is really used all that frequently (in a loop) for buffer type objects. I've been thinking that refetching the ptr/len is the right fix. Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein@lyra.org Sat Jun 2 06:54:23 2001 From: gstein@lyra.org (Greg Stein) Date: Fri, 1 Jun 2001 22:54:23 -0700 Subject: [Python-Dev] strop vs. string In-Reply-To: ; from tim.one@home.com on Sat, May 26, 2001 at 02:44:04AM -0400 References: <3B0ED784.FC53D01@lemburg.com> Message-ID: <20010601225423.S23560@lyra.org> On Sat, May 26, 2001 at 02:44:04AM -0400, Tim Peters wrote: > The buffer object has been neglected for years: is that because it's in > prime shape, or because nobody cares about it enough to maintain it? "Works for me" :-) Part of the neglect is also based on Guido's ambivalence. Part is that I haven't needed more from it. The day that I do, then I'll code it up :-) But that doesn't help the "generic" case, unfortunately. Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein@lyra.org Sat Jun 2 06:55:33 2001 From: gstein@lyra.org (Greg Stein) Date: Fri, 1 Jun 2001 22:55:33 -0700 Subject: [Python-Dev] strop vs. string In-Reply-To: <3B0FD023.C4588919@lemburg.com>; from mal@lemburg.com on Sat, May 26, 2001 at 05:47:47PM +0200 References: <3B0FD023.C4588919@lemburg.com> Message-ID: <20010601225533.T23560@lyra.org> On Sat, May 26, 2001 at 05:47:47PM +0200, M.-A. Lemburg wrote: >... > Even the idea of replacing the usage of strings as data buffers > with buffer object didn't get very far; common habits are simply > hard to break. That idea was shot down when Guido said that 'c' arrays should be the "official form of a data buffer." Cheers, -g -- Greg Stein, http://www.lyra.org/ From tim.one@home.com Sat Jun 2 07:13:49 2001 From: tim.one@home.com (Tim Peters) Date: Sat, 2 Jun 2001 02:13:49 -0400 Subject: [Python-Dev] another dict crasher In-Reply-To: Message-ID: [Michael Hudson] > Actually this crash was dict_print (I always forget about tp_print...). We all should . > It's pretty easy to mend: > > *** dictobject.c Fri Jun 1 13:08:13 2001 > --- dictobject.c-fixed Fri Jun 1 12:59:07 2001 > *************** > *** 793,795 **** > any = 0; > ! for (i = 0, ep = mp->ma_table; i < mp->ma_size; i++, ep++) { > if (ep->me_value != NULL) { > --- 793,796 ---- > any = 0; > ! for (i = 0; i < mp->ma_size; i++) { > ! ep = &mp->ma_table[i]; > if (ep->me_value != NULL) { > *************** > *** 833,835 **** > any = 0; > ! for (i = 0, ep = mp->ma_table; i < mp->ma_size && v; i++, ep++) { > if (ep->me_value != NULL) { > --- 834,837 ---- > any = 0; > ! for (i = 0; i < mp->ma_size && v; i++) { > ! 
ep = &mp->ma_table[i]; > if (ep->me_value != NULL) { > > I'm not sure this stops still more Machiavellian behaviour from > crashing the interpreter, Alas, it doesn't. You can't trust *anything* about a container you're iterating over across any call that may call back into Python. In these cases, the call to PyObject_Repr() can execute any code at all, including code that mutates the dict you're crawling over. In particular, calling PyObject_Repr() to format the key means the ep = &mp->ma_table[i] pointer may be trash by the time PyObject_Repr() is called again to format the value. See characterize() for the pain it takes to guard against everything, including encouraging comments like: if (cmp > 0 || i >= a->ma_size || a->ma_table[i].me_value == NULL) { /* Not the *smallest* a key; or maybe it is * but the compare shrunk the dict so we can't * find its associated value anymore; or * maybe it is but the compare deleted the * a[thiskey] entry. */ Py_DECREF(thiskey); continue; } It should really add "or maybe it just shuffled the dict around and the value at ma_table[i] is no longer associated with the key that *used* to be at ma_table[i], but since there's still *some* non-NULL pointer there we'll just pretend that didn't happen and press onward". > and you can certainly get items being printed more than once or not > at all. I'm not sure this last is a problem; Those don't matter: in a long tradition, we buy "safety" not only at the cost of bloating the code, but also by making the true behavior in case of mutation unpredictable & inexplicable. That's why I *really* liked the "immutable list" trick in list.sort(): even if we could have made the code bulletproof without it, we couldn't usefully explain what the heck it actually did. It's not Pythonic to blow up, but neither is it Pythonic to be incomprehensible. You simply can't win here. > if the user's being this contrary there's only so much we can > do to help him or her. I'd prefer a similar internal immutable-dict trick that raised an exception if the user was pushing Python into a corner where "blow up or do something baffling" were its only choices. That would render the original example illegal, of course. But would that be a bad thing? What *should* it mean when the user invokes an operation on a container and mutates the container during that operation? There's almost no chance that Jython does the same thing as CPython in all these cases, so it's effectively undefined behavior no matter how you plug the holes (short of raising an exception). From tim.one@home.com Sat Jun 2 07:34:43 2001 From: tim.one@home.com (Tim Peters) Date: Sat, 2 Jun 2001 02:34:43 -0400 Subject: [Python-Dev] strop vs. string In-Reply-To: <20010601225203.R23560@lyra.org> Message-ID: [Tim] > If after > > b = buffer(some_object) > > b.__getitem__ needed to refetch the info between > > b[i] > and > b[i+1] > > I expect it would be so slow even Greg wouldn't want it anymore. [Greg] > Huh? I don't think it would be all that slow. It is just a function > call. And I don't think that the getitem slot is really used all that > frequently (in a loop) for buffer type objects. I expect they index into the buffer memory directly then, right? Then for buffers obtained from mutable objects, any such loop is unsafe in the absence of the GIL, or even in its presence if the loop contains code that may call back into Python. > I've been thinking that refetching the ptr/len is the right fix. 
So is calling __getitem__ all the time then, unless you want to dance on the razor's edge. The idea that you can safely "borrow" memory from a mutable object without copying it is brittle. > Part of the neglect is also based on Guido's ambivalence. Part is > that I haven't needed more from it. The day that I do, then I'll > code it up :-) But that doesn't help the "generic" case, > unfortunately. I take that as "yes" to my "nobody cares about it enough to maintain it?". In that light, Guido's ambivalence is indeed surprising . From mwh@python.net Sat Jun 2 08:09:07 2001 From: mwh@python.net (Michael Hudson) Date: 02 Jun 2001 08:09:07 +0100 Subject: [Python-Dev] another dict crasher In-Reply-To: "Tim Peters"'s message of "Sat, 2 Jun 2001 02:13:49 -0400" References: Message-ID: "Tim Peters" writes: > [Michael Hudson] > > Actually this crash was dict_print (I always forget about tp_print...). > > We all should . > > > It's pretty easy to mend: [snip] > > I'm not sure this stops still more Machiavellian behaviour from > > crashing the interpreter, > > Alas, it doesn't. No, that's what my "dict[Machiavelli()] = Machiavelli()" example was demonstrating. If noone beats me to it, I'll post a better fix to sf next week, complete with test-cases and suitably "encouraging" comments. I can't easily see other examples of the problem; there certainly might be things you could do with comparisons that could trigger crashes, but that code's so hairy that it's almost impossible for me to be sure. There are two ways of constructing a software design: one way is to make it so simple that there are obviously no deficiencies and the other way is to make it so complicated that there are no obvious deficiencies. -- C. A. R. Hoare > > and you can certainly get items being printed more than once or not > > at all. I'm not sure this last is a problem; > > Those don't matter: in a long tradition, we buy "safety" not only at the > cost of bloating the code, but also by making the true behavior in case of > mutation unpredictable & inexplicable. This is what I thought. [snip] > > if the user's being this contrary there's only so much we can > > do to help him or her. > > I'd prefer a similar internal immutable-dict trick that raised an exception > if the user was pushing Python into a corner where "blow up or do something > baffling" were its only choices. That would render the original example > illegal, of course. But would that be a bad thing? It's hard to see how. > What *should* it mean when the user invokes an operation on a > container and mutates the container during that operation? I don't think there's a meaning you can attach to this kind of behaviour. The "immutable dict trick" looks better the more I think about it, but I guess that will have to wait until Guido gets back from the sun... Cheers, M. -- incidentally, asking why things are "left out of the language" is a good sign that the asker is fairly clueless. -- Erik Naggum, comp.lang.lisp From gstein@lyra.org Sat Jun 2 08:40:05 2001 From: gstein@lyra.org (Greg Stein) Date: Sat, 2 Jun 2001 00:40:05 -0700 Subject: [Python-Dev] strop vs. 
string In-Reply-To: ; from tim.one@home.com on Sat, Jun 02, 2001 at 02:34:43AM -0400 References: <20010601225203.R23560@lyra.org> Message-ID: <20010602004005.F23560@lyra.org> On Sat, Jun 02, 2001 at 02:34:43AM -0400, Tim Peters wrote: > [Tim] > > If after > > > > b = buffer(some_object) > > > > b.__getitem__ needed to refetch the info between > > > > b[i] > > and > > b[i+1] > > > > I expect it would be so slow even Greg wouldn't want it anymore. > > [Greg] > > Huh? I don't think it would be all that slow. It is just a function > > call. And I don't think that the getitem slot is really used all that > > frequently (in a loop) for buffer type objects. > > I expect they index into the buffer memory directly then, right? Then for > buffers obtained from mutable objects, any such loop is unsafe in the > absence of the GIL, or even in its presence if the loop contains code that > may call back into Python. Most access is: fetch ptr/len, index into the memory. And yes: anything within that loop which could conceivably change the target object (especially a call into Python) could move that ptr. I was saying that, at the Python level, using a loop and doing b[i] into a buffer/string/unicode object would seem to be relatively rare. b[0] and stuff is reasonably common. > > I've been thinking that refetching the ptr/len is the right fix. > > So is calling __getitem__ all the time then, unless you want to dance on the > razor's edge. The idea that you can safely "borrow" memory from a mutable > object without copying it is brittle. Stay in C code and don't call into Python. It is safe then. The buffer API is exactly what you're saying: borrow a memory reference. The concept makes a lot of things possible that weren't before. The buffer object's storing of that reference was a mistake. > > Part of the neglect is also based on Guido's ambivalence. Part is > > that I haven't needed more from it. The day that I do, then I'll > > code it up :-) But that doesn't help the "generic" case, > > unfortunately. > > I take that as "yes" to my "nobody cares about it enough to maintain it?". > In that light, Guido's ambivalence is indeed surprising . Eh? I'll maintain the thing, but you're confusing that with adding more features into it. Different question. Cheers, -g -- Greg Stein, http://www.lyra.org/
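[For concreteness, a sketch of the "refetch" approach Greg favors, against the PyBufferProcs API of this era; the helper name is invented and error handling is minimal. The point is only that the pointer/length are asked of the base object again on every access, instead of being cached when the buffer object is created:]

static int
get_buffer_byte(PyObject *base, int i, char *out)
{
	PyBufferProcs *pb = base->ob_type->tp_as_buffer;
	void *ptr;
	int len;

	if (pb == NULL || pb->bf_getreadbuffer == NULL) {
		PyErr_SetString(PyExc_TypeError,
				"base object has no read buffer");
		return -1;
	}
	/* refetch: the base object may have resized or moved its memory
	   since the last access */
	len = (*pb->bf_getreadbuffer)(base, 0, &ptr);
	if (len < 0)
		return -1;	/* error already set by the base object */
	if (i < 0 || i >= len) {
		PyErr_SetString(PyExc_IndexError,
				"buffer index out of range");
		return -1;
	}
	*out = ((char *)ptr)[i];
	return 0;
}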
From tim.one@home.com Sat Jun 2 09:17:39 2001 From: tim.one@home.com (Tim Peters) Date: Sat, 2 Jun 2001 04:17:39 -0400 Subject: [Python-Dev] another dict crasher In-Reply-To: Message-ID: [Michael Hudson] > ... > If no one beats me to it, I'll post a better fix to sf next week, > complete with test-cases and suitably "encouraging" comments. Ah, no need -- looks like I was doing that while you were writing this. Checked in already. So long as we're happy to settle for senseless results that simply don't blow up, the only other trick you really needed was to save away the value in a local vrbl and incref it across the key->string bit; then you don't have to worry about key->string deleting the value, or about the table entry it lived in going away (because you get the value from the (still-incref'ed) *local* vrbl later, not from the table again). > I can't easily see other examples of the problem; there certainly > might be things you could do with comparisons that could trigger > crashes, but that code's so hairy that it's almost impossible for me > to be sure. It's easy to be sure: any code that tries to remember anything about a dict (ditto any mutable object) across a "dangerous" call, other than the mere address of the object, is a place you *can* provoke a core dump. It may not be easy to provoke, and a given provoking test case may not fail across all platforms, or even every time you run it on a single platform, but it's "an obvious" hole all the same. From tismer@tismer.com Sat Jun 2 10:49:35 2001 From: tismer@tismer.com (Christian Tismer) Date: Sat, 02 Jun 2001 11:49:35 +0200 Subject: [Python-Dev] Marshal bug in 2.1? References: <3B17DEF7.3E7C6BC6@tismer.com> <3B17E326.41D82CCE@tismer.com> <3B17E692.281A329B@lemburg.com> <20010602004716.R690@xs4all.nl> Message-ID: <3B18B6AE.88EA6926@tismer.com> Thomas Wouters wrote: > > On Fri, Jun 01, 2001 at 09:01:38PM +0200, M.-A. Lemburg wrote: > > > Yes, not surprisingly though... AFAIK the pyc format changed > > in every single version between 1.5.2 and 2.1. > > Worse, it's changed several times between each release :) But I didn't use .pyc at all, just a marshalled code object. There are no version headers or such. The same object worked in fact for Py 1.5.2 and 2.0, but no longer with 2.1. I debugged the unmarshalling and saw what happened: The new code objects with their new scoping features were the problem. The new structures were simply added, and there is no way to skip these for older code objects, since there isn't any info. Some option for marshal to unmarshal old-style code objects would have helped. But then, I'm not sure if the opcodes are still assigned the same way in 2.1, or if there was some movement? This would kill it anyway. ciao - chris (now looking for another cheap way to do something invisible in Python without installing *anything* ) -- Christian Tismer :^) Mission Impossible 5oftware : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net/ 14163 Berlin : PGP key -> http://wwwkeys.pgp.net/ PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? http://www.stackless.com/
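[One cheap way to get such a version header by hand -- a sketch only, not what the Starship script did -- is to tag the blob with the interpreter's pyc magic number, which changes whenever the code-object format or the opcodes do:]

import imp, marshal

MAGIC = imp.get_magic()   # 4-byte magic string, bumped on format changes

def dump_code(code):
    # tag the marshalled code object with the producing interpreter's magic
    return MAGIC + marshal.dumps(code)

def load_code(data):
    if data[:len(MAGIC)] != MAGIC:
        raise ValueError("code was marshalled by an incompatible Python")
    return marshal.loads(data[len(MAGIC):])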
From mal@lemburg.com Sat Jun 2 12:09:13 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Sat, 02 Jun 2001 13:09:13 +0200 Subject: [Python-Dev] Marshal bug in 2.1? References: <3B17DEF7.3E7C6BC6@tismer.com> <3B17E326.41D82CCE@tismer.com> <3B17E692.281A329B@lemburg.com> <20010602004716.R690@xs4all.nl> <3B18B6AE.88EA6926@tismer.com> Message-ID: <3B18C958.598A9891@lemburg.com> Christian Tismer wrote: > > Thomas Wouters wrote: > > > > On Fri, Jun 01, 2001 at 09:01:38PM +0200, M.-A. Lemburg wrote: > > > > > Yes, not surprisingly though... AFAIK the pyc format changed > > > in every single version between 1.5.2 and 2.1. > > > > Worse, it's changed several times between each release :) > > But I didn't use .pyc at all, just a marshalled code object. That's the point: the header in pyc files is meant to signal the incompatibility of the following code object. Perhaps we should move this version information into the marshal format of code objects themselves... > There are no version headers or such. > The same object worked in fact for Py 1.5.2 and 2.0, but no > longer with 2.1. > I debugged the unmarshalling and saw what happened: > The new code objects with their new scoping features were > the problem. The new structures were simply added, and there > is no way to skip these for older code objects, since there > isn't any info. > Some option for marshal to unmarshal old-style code objects > would have helped. > But then, I'm not sure if the opcodes are still assigned > the same way in 2.1, or if there was some movement? This would > kill it anyway. AFAIK, the assignments did not change, but several opcodes were added in 2.1, so code compiled in 2.1 will not run in 2.0. > ciao - chris > > (now looking for another cheap way to do something invisible in > Python without installing *anything* ) Why don't you use freeze or py2exe or Gordon's installer for these one-file executables? Alternatively, you should check the Python version and make sure that it matches the one used for compiling the byte code. -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From mwh@python.net Sat Jun 2 12:40:56 2001 From: mwh@python.net (Michael Hudson) Date: 02 Jun 2001 12:40:56 +0100 Subject: [Python-Dev] another dict crasher In-Reply-To: "Tim Peters"'s message of "Sat, 2 Jun 2001 04:17:39 -0400" References: Message-ID: "Tim Peters" writes: > > I can't easily see other examples of the problem; there certainly > > might be things you could do with comparisons that could trigger > > crashes, but that code's so hairy that it's almost impossible for me > > to be sure. > > It's easy to be sure: any code that tries to remember anything about a dict > (ditto any mutable object) across a "dangerous" call, other than the mere > address of the object, is a place you *can* provoke a core dump. It may not > be easy to provoke, and a given provoking test case may not fail across all > platforms, or even every time you run it on a single platform, but it's "an > obvious" hole all the same. Ah, like this one: dict = {} # let's force dict to malloc its table for i in range(1,10): dict[i] = i class Machiavelli2: def __eq__(self, other): dict.clear() return 1 def __hash__(self): return 0 dict[Machiavelli2()] = Machiavelli2() print dict[Machiavelli2()] I'll attach a patch, but it's another branch inside lookdict (though not lookdict_string which is I guess the really performance sensitive one). Cheers, M.
Index: dictobject.c =================================================================== RCS file: /cvsroot/python/python/dist/src/Objects/dictobject.c,v retrieving revision 2.100 diff -c -1 -r2.100 dictobject.c *** dictobject.c 2001/06/02 08:27:39 2.100 --- dictobject.c 2001/06/02 11:36:47 *************** *** 273,274 **** --- 273,281 ---- cmp = PyObject_RichCompareBool(ep->me_key, key, Py_EQ); + if (ep0 != mp->ma_table) { + PyErr_SetString(PyExc_RuntimeError, + "dict resized on comparison"); + ep = mp->ma_table; + while (ep->me_value) ep++; + return ep; + } if (cmp > 0) { *************** *** 310,311 **** --- 317,325 ---- cmp = PyObject_RichCompareBool(ep->me_key, key, Py_EQ); + if (ep0 != mp->ma_table) { + PyErr_SetString(PyExc_RuntimeError, + "dict resized on comparison"); + ep = mp->ma_table; + while (ep->me_value) ep++; + return ep; + } if (cmp > 0) { Here's another test case to work out the second of those new if statements: dict = {} # let's force dict to malloc its table for i in range(1,10): dict[i] = i class Machiavelli3: def __init__(self, id): self.id = id def __eq__(self, other): if self.id == other.id: dict.clear() return 1 else: return 0 def __repr__(self): return "%s(%s)"%(self.__class__.__name__, self.id) def __hash__(self): return 0 dict[Machiavelli3(1)] = Machiavelli3(0) dict[Machiavelli3(2)] = Machiavelli3(0) print dict[Machiavelli3(2)] -- M-x psych[TAB][RETURN] -- try it From pedroni@inf.ethz.ch Sat Jun 2 19:58:55 2001 From: pedroni@inf.ethz.ch (Samuele Pedroni) Date: Sat, 2 Jun 2001 20:58:55 +0200 Subject: [Python-Dev] What should changing/setting __getattr__ (and similars) after classdef time do ? Message-ID: <004d01c0eb96$24b5f460$8a73fea9@newmexico> Hi. Is this a case that only the BDFL could know and pronounce on ... or I'm missing somenthing ... Thanks for any feedback, Samuele Pedroni. ----- Original Message ----- From: Samuele Pedroni To: Sent: Friday, June 01, 2001 2:49 PM Subject: [Python-Dev] __xxxattr__ caching semantic > Hi. > > What is the intendend semantic wrt to __xxxattr__ caching: > > class X: > pass > > def cga(self,name): > print name > > def iga(name): > print name > > x=X() > x.__dict__['__getattr__'] = iga # 1. > x.__getattr__ = iga # 2. > X.__dict__['__getattr__'] = cga # 3. > X.__getattr__ = cga # 4. > x.a > > for the manual > > http://www.python.org/doc/current/ref/customization.html > > with all the variants x.a should fail, they should have > no effect. In practice 4. work. > > Is that an implementation manual mismatch, is this indented, is there > code around using 4. ? > > I'm asking this because jython has differences/bugs in this respect? > > I imagine that 1.-4. should work for all other __magic__ methods > (this should be fixed in jython for some methods), > OTOH jython has such a restriction on __del__ too, and this one cannot > be removed (is not simply a matter of caching/non caching). > > regards, Samuele Pedroni. > > > _______________________________________________ > Python-Dev mailing list > Python-Dev@python.org > http://mail.python.org/mailman/listinfo/python-dev > From tim.one@home.com Sat Jun 2 23:57:57 2001 From: tim.one@home.com (Tim Peters) Date: Sat, 2 Jun 2001 18:57:57 -0400 Subject: [Python-Dev] What should changing/setting __getattr__ (and similars) after classdef time do ? In-Reply-To: <004d01c0eb96$24b5f460$8a73fea9@newmexico> Message-ID: [Samuele Pedroni] > Is this a case that only the BDFL could know and pronounce on ... > or I'm missing somenthing ... 
The referenced URL http://www.python.org/doc/current/ref/customization.html appears irrelevant to me, so unsure what you're asking about. Perhaps http://www.python.org/doc/current/ref/attribute-access.html was intended? If so, the "these methods are cached in the class object at class definition time; therefore, they cannot be changed after the class definition is executed." there doesn't mean exactly what it says: it's trying to say that the __XXXattr__ methods *inherited from base classes* (if any) are cached in the class object at class definition time, so that changing them in the base classes later has no effect on the derived class. It should be clearer. A direct class setattr can still change them; indirect assignment via class.__dict__ is ineffective for the __dict__, __bases__, __name__, __getattr__, __setattr__ and __delattr__ class attributes (yes, you'll create a dict entry then, but class getattr doesn't look in the dict to get the value of these specific keys). Didn't understand the program snippet. Much of this is due to hoary optimizations and I agree is ill-documented. I hope Guido's current rework of all this stuff will leave the end cases more explainable. > ----- Original Message ----- > From: Samuele Pedroni > To: > Sent: Friday, June 01, 2001 2:49 PM > Subject: [Python-Dev] __xxxattr__ caching semantic > > > Hi. > > What is the intended semantic wrt __xxxattr__ caching: > > class X: > pass > > def cga(self,name): > print name > > def iga(name): > print name > > x=X() > x.__dict__['__getattr__'] = iga # 1. > x.__getattr__ = iga # 2. > X.__dict__['__getattr__'] = cga # 3. > X.__getattr__ = cga # 4. > x.a > > for the manual > > http://www.python.org/doc/current/ref/customization.html > > with all the variants x.a should fail, they should have > no effect. In practice 4. works. > > Is that an implementation/manual mismatch, is this intended, is there > code around using 4.? > > I'm asking this because jython has differences/bugs in this respect. > > I imagine that 1.-4. should work for all other __magic__ methods > (this should be fixed in jython for some methods), > OTOH jython has such a restriction on __del__ too, and this one cannot > be removed (it is not simply a matter of caching/non caching). From pedroni@inf.ethz.ch Sun Jun 3 00:46:42 2001 From: pedroni@inf.ethz.ch (Samuele Pedroni) Date: Sun, 3 Jun 2001 01:46:42 +0200 Subject: [Python-Dev] What should changing/setting __getattr__ (and similars) after classdef time do ? References: Message-ID: <001801c0ebbe$47b60a40$8a73fea9@newmexico> Hi. Thanks a lot for the answer, and sorry for the ill-formed question. [Tim Peters] > [Samuele Pedroni] > > Is this a case that only the BDFL could know and pronounce on ... > > or I'm missing something ... > > The referenced URL > > http://www.python.org/doc/current/ref/customization.html > > appears irrelevant to me, so unsure what you're asking about. Perhaps > > http://www.python.org/doc/current/ref/attribute-access.html > > was intended? If so, the Yes, pilot error with browser and copy&paste; I intended the latter. > these methods are cached in the class object at class > definition time; therefore, they cannot be changed after > the class definition is executed. > > there doesn't mean exactly what it says: it's trying to say that the > __XXXattr__ methods *inherited from base classes* (if any) are cached in the > class object at class definition time, so that changing them in the base > classes later has no effect on the derived class. It should be clearer.
> > A direct class setattr can still change them; indirect assignment via > class.__dict__ is ineffective for the __dict__, __bases__, __name__, > __getattr__, __setattr__ and __delattr__ class attributes (yes, you'll create > a dict entry then, but class getattr doesn't look in the dict to get the > value of these specific keys). > This matches what I understood reading the CPython C code (yes, I did that too ), and what the snippets were trying to point out. And I see the problem with derived classes too. > Didn't understand the program snippet. Sorry, it is not one snippet; the 4 variants should be considered independently. > > Much of this is due to hoary optimizations and I agree is ill-documented. I > hope Guido's current rework of all this stuff will leave the end cases more > explainable. That will be a lot of work for porting it to jython . In any case the manual is really not clear (euphemism ) about this. The point is that jython implements the letter of the manual, and even extends the caching opt to some other __magic__ methods. I wanted to know the intended behaviour in order to fix that in jython. regards Samuele Pedroni. From tim.one@home.com Sun Jun 3 00:56:34 2001 From: tim.one@home.com (Tim Peters) Date: Sat, 2 Jun 2001 19:56:34 -0400 Subject: [Python-Dev] What should changing/setting __getattr__ (and similars) after classdef time do ? In-Reply-To: <001801c0ebbe$47b60a40$8a73fea9@newmexico> Message-ID: [Samuele Pedroni] > ... > The point is that jython implements the letter of the manual, and even > extends the caching opt to some other __magic__ methods. I wanted to > know the intended behaviour in order to fix that in jython. You got that one right the first time: this requires BDFL pronouncement! As semantically significant optimizations (the only reason for caching __getattr__, e.g.) creep into the code but the docs lag behind, it gets more and more unclear what's mandatory behavior and what's implementation-defined. This came up a couple weeks ago again in the context of what, exactly, rich comparisons are supposed to do in all cases. After poking holes in everything Guido wrote, he turned it around and told me to write up what I think it should say (which I have yet to do, as it's time-consuming and it appears some of the current CPython behavior is at least partly accidental -- but unclear exactly which parts). So don't be surprised if the same trick gets played on you ... From tim.one@home.com Sun Jun 3 05:04:57 2001 From: tim.one@home.com (Tim Peters) Date: Sun, 3 Jun 2001 00:04:57 -0400 Subject: [Python-Dev] another dict crasher In-Reply-To: Message-ID: [Michael Hudson] > Ah, like this one: > > dict = {} > > # let's force dict to malloc its table > for i in range(1,10): > dict[i] = i > > class Machiavelli2: > def __eq__(self, other): > dict.clear() > return 1 > def __hash__(self): > return 0 > > dict[Machiavelli2()] = Machiavelli2() > > print dict[Machiavelli2()] Told you it was easy . > I'll attach a patch, but it's another branch inside lookdict (though > not lookdict_string which is I guess the really performance sensitive > one). lookdict_string is crucial to Python's own performance. Dicts indexed by ints or class instances or ... are vital to other apps.
> Index: dictobject.c > =================================================================== > RCS file: /cvsroot/python/python/dist/src/Objects/dictobject.c,v > retrieving revision 2.100 > diff -c -1 -r2.100 dictobject.c > *** dictobject.c 2001/06/02 08:27:39 2.100 > --- dictobject.c 2001/06/02 11:36:47 > *************** > *** 273,274 **** > --- 273,281 ---- > cmp = > PyObject_RichCompareBool(ep->me_key, key, Py_EQ); > + if (ep0 != mp->ma_table) { > + PyErr_SetString(PyExc_RuntimeError, > + "dict resized on > comparison"); > + ep = mp->ma_table; > + while (ep->me_value) ep++; > + return ep; > + } > if (cmp > 0) { > *************** > *** 310,311 **** > --- 317,325 ---- > cmp = > PyObject_RichCompareBool(ep->me_key, key, Py_EQ); > + if (ep0 != mp->ma_table) { > + PyErr_SetString(PyExc_RuntimeError, > + "dict resized on > comparison"); > + ep = mp->ma_table; > + while (ep->me_value) ep++; > + return ep; > + } > if (cmp > 0) { Then we have other problems. Note the comment before lookdict: Exceptions are never reported by this function, and outstanding exceptions are maintained. The patched code doesn't preserve that. Looking for "the first" unused or dummy slot isn't good enough either, as surely the user has the right to expect that after, e.g., d[m] = 1, d[m] retrieves 1. That is, picking a reusable slot "at random" doesn't respect the *semantics* of dict operations ("just because" the dict resized doesn't mean the key they're looking for went away!). It would be better in this case to go back to the top and start over. However, then an adversarial user can construct a case that never terminates. Unclear what to do. From tim.one@home.com Sun Jun 3 08:55:43 2001 From: tim.one@home.com (Tim Peters) Date: Sun, 3 Jun 2001 03:55:43 -0400 Subject: [Python-Dev] strop vs. string In-Reply-To: <20010602004005.F23560@lyra.org> Message-ID: [Greg Stein] > ... > I was saying that, at the Python level, using a loop and doing b[i] into > a buffer/string/unicode object would seem to be relatively rare. b[0] > and stuff is reasonably common. Well, at the Python level buffer objects seem never to be used, probably because all the people who know about them don't advertise it because it's an easy way to provoke core dumps now. I don't have any real objection to any way anyone wants to fix that, just so long as it gets fixed. >> I take that as "yes" to my "nobody cares about it enough to >> maintain it?". In that light, Guido's ambivalence is indeed >> surprising . > Eh? I'll maintain the thing, but you're confusing that with adding more > features into it. Different question. I haven't asked for new features, just that what's already there get fixed: Python-level buffer objects are unsafe, the docs remain incomplete, there's random stuff like file.readinto() that's not documented at all (could be that's the only one -- it's certainly "discovered" on c.l.py often enough, though), and there are no buffer tests in the std test suite. The work to introduce the type wasn't completed, nobody works on it, and finishing work 3 years late doesn't count as "new feature" in my book . From gstein@lyra.org Sun Jun 3 10:10:36 2001 From: gstein@lyra.org (Greg Stein) Date: Sun, 3 Jun 2001 02:10:36 -0700 Subject: [Python-Dev] strop vs. string In-Reply-To: ; from tim.one@home.com on Sun, Jun 03, 2001 at 03:55:43AM -0400 References: <20010602004005.F23560@lyra.org> Message-ID: <20010603021036.U23560@lyra.org> On Sun, Jun 03, 2001 at 03:55:43AM -0400, Tim Peters wrote: > [Greg Stein] > > ... 
> > I was saying that, at the Python level, using a loop and doing b[i] into > > a buffer/string/unicode object would seem to be relatively rare. b[0] > > and stuff is reasonably common. > > Well, at the Python level buffer objects seem never to be used, probably I'm talking about string objects and unicode objects, too. The point is that b[i] loops don't have to be all that speedy because it isn't used often. > because all the people who know about them don't advertise it because it's > an easy way to provoke core dumps now. Easy? Depends on what you use them with. >... > >> I take that as "yes" to my "nobody cares about it enough to > >> maintain it?". In that light, Guido's ambivalence is indeed > >> surprising . > > > Eh? I'll maintain the thing, but you're confusing that with adding more > > features into it. Different question. > > I haven't asked for new features, just that what's already there get fixed: > Python-level buffer objects are unsafe, the docs remain incomplete, I'll fix the code. > there's > random stuff like file.readinto() that's not documented at all (could be > that's the only one -- it's certainly "discovered" on c.l.py often enough, > though), Find another goat to screw for that one. I don't know anything about it. Hmm... Using the "annotate" feature of ViewCVS, I see that Guido added it. Go blame him if you want to scream about that function and its lack of doc. > and there are no buffer tests in the std test suite. The work to > introduce the type wasn't completed, nobody works on it, and finishing work > 3 years late doesn't count as "new feature" in my book . Now you're just being bothersome. You want all that stuff, then feel free. I'll volunteer to do the code. You can go beat some heads, or find other volunteers. I'll do the code fixing just to placate you, and to get all this ranting about the buffer object to quiet down, but not because I'm joyful to do it. not-cheers, -g -- Greg Stein, http://www.lyra.org/ From dgoodger@bigfoot.com Sun Jun 3 15:39:42 2001 From: dgoodger@bigfoot.com (David Goodger) Date: Sun, 03 Jun 2001 10:39:42 -0400 Subject: [Python-Dev] new PEP candidates Message-ID: I have just posted three related PEP candidates to the Doc-SIG: - PEP: Docstring Processing System Framework http://mail.python.org/pipermail/doc-sig/2001-June/001855.html - PEP: DPS Generic Implementation Details http://mail.python.org/pipermail/doc-sig/2001-June/001856.html - PEP: Docstring Conventions http://mail.python.org/pipermail/doc-sig/2001-June/001857.html These are all part of the newly created Python Docstring Processing System project, http://docstring.sf.net. Barry: Please assign PEP numbers to these if possible. Once PEP numbers have been assigned, I will post to comp.lang.python. Thanks. A related project is the second draft of reStructuredText, a docstring markup syntax definition. The project is http://structuredtext.sf.net, and I've posted the following to Doc-SIG: - An Introduction to reStructuredText http://mail.python.org/pipermail/doc-sig/2001-June/001858.html - Problems With StructuredText http://mail.python.org/pipermail/doc-sig/2001-June/001859.html - reStructuredText Markup Specification http://mail.python.org/pipermail/doc-sig/2001-June/001860.html - Python Extensions to the reStructuredText Markup Specification http://mail.python.org/pipermail/doc-sig/2001-June/001861.html I am not seeking PEP status for reStructuredText at this time; I think it's one step too far removed from the Python language to warrant a PEP. 
If you think it *should* be a PEP, I will be happy to convert it. -- David Goodger dgoodger@bigfoot.com Open-source projects: - Python Docstring Processing System: http://docstring.sf.net - reStructuredText: http://structuredtext.sf.net - The Go Tools Project: http://gotools.sf.net From mwh@python.net Sun Jun 3 22:47:48 2001 From: mwh@python.net (Michael Hudson) Date: 03 Jun 2001 22:47:48 +0100 Subject: [Python-Dev] another dict crasher In-Reply-To: "Tim Peters"'s message of "Sun, 3 Jun 2001 00:04:57 -0400" References: Message-ID: "Tim Peters" writes: > It would be better in this case to go back to the top and start > over. Yes. What you checked in is obviously better. I'll stick to being the bearer of bad tidings... > However, then an adversarial user can construct a case that never > terminates. I seem to have done this - it was odd, though - it only loops when I bump the dict to fairly enormous proportions for reasons I don't really (want to) understand. > Unclear what to do. Not worrying about it seems entirely reasonable - I now have sitting on my hard drive the weirdest way of spelling "while 1: pass" *I've* ever seen. and-I'll-stop-poking-holes-now-ly y'rs m. -- The rapid establishment of social ties, even of a fleeting nature, advance not only that goal but its standing in the uberconscious mesh of communal psychic, subjective, and algorithmic interbeing. But I fear I'm restating the obvious. -- Will Ware, comp.lang.python From tim.one@home.com Mon Jun 4 00:03:31 2001 From: tim.one@home.com (Tim Peters) Date: Sun, 3 Jun 2001 19:03:31 -0400 Subject: [Python-Dev] another dict crasher In-Reply-To: Message-ID: [Tim] >> It would be better in this case to go back to the top and start >> over. [Michael Hudson] > Yes. What you checked in is obviously better. I'll stick to being > the bearer of bad tidings... Hey, if it's fun, do whatever you want! If you hadn't provoked me, I would have let it slide. Guido only cares about the end result . >> However, then an adversarial user can construct a case that never >> terminates. > I seem to have done this - it was odd, though - it only loops when I > bump the dict to fairly enormous proportions for reasons I don't > really (want to) understand. Pass it on. I deliberately "started over" via a recursive call instead of a goto so that an offending program would eventually die with a stack fault instead of just running forever. So if you're seeing something run forever, it may be a different problem. >> Unclear what to do. > Not worrying about it seems entirely reasonable I don't think anyone is happy leaving an exploitable hole in Python -- we endure enormous pain to plug those. Except, I guess, for buffer objects . I simply haven't thought of a good and efficient way to plug this one. Implementing an "internal immutable dict" type appeals to me, but it conflicts with the fact that the affected routines believe to the core of their souls that exceptions raised during comparisons are to be ignored -- and raising a "hey, you can't change the dict *now*!" exception doesn't do the user any good if they never see it. Would plug the hole, but an *innocent* user would never know why their program failed to work as (probably) expected. From tim.one@home.com Mon Jun 4 01:38:53 2001 From: tim.one@home.com (Tim Peters) Date: Sun, 3 Jun 2001 20:38:53 -0400 Subject: [Python-Dev] strop vs.
string In-Reply-To: <20010603021036.U23560@lyra.org> Message-ID: [Tim] >> because all the people who know about them don't advertise it >> because it's an easy way to provoke core dumps now. [Greg Stein] > Easy? Depends on what you use them with. "Easy" and "depends" both, sure. I don't understand the argument: core dumps are always presumed to be errors in the Python implementation, not the users's fault. In this case, they are Python's fault by any accounting. On rare occasions we just give up and say "sorry, but we simply don't know a reasonable way fix it -- but it's still Python's fault" (for example, see the dict thread this weekend). >> I haven't asked for new features, just that what's already there get >> fixed: Python-level buffer objects are unsafe > I'll fix the code. Thank you! >> the docs remain incomplete, there's random stuff like file.readinto() >> that's not documented at all (could be that's the only one -- it's >> certainly "discovered" on c.l.py often enough, though), > Find another goat to screw for that one. I don't know anything about it. > > Hmm... Using the "annotate" feature of ViewCVS, I see that Guido > added it. Go blame him if you want to scream about that function and > its lack of doc. I don't care who added it: I haven't asked anyone specific to do anything. I've been asking whether *anyone* cares enough to address the backlog of buffer maintenance work. I don't even know who dreamed up the buffer object -- although at this point I bet I can guess . >> and there are no buffer tests in the std test suite. The work to >> introduce the type wasn't completed, nobody works on it, and >> finishing work 3 years late doesn't count as "new feature" in my book > Now you're just being bothersome. You bet. It's the same list of things I gave in my first msg; nobody volunteered to do any work then, so I repeated them. > You want all that stuff, then feel free. "All that stuff" is the minimum now required of new features. Buffers got in before Guido got tougher about this stuff, but if they're worth having at all then surely they're worth bringing up to current standards. > I'll volunteer to do the code. You can go beat some heads, or find other > volunteers. Anyone else care to chip in? > I'll do the code fixing just to placate you, and to get all this ranting > about the buffer object to quiet down, but not because I'm joyful > to do it. OK, I feel guitly -- but if that's enough to make you feel joyful again, the psychology here is just sick . From Barrett@stsci.edu Mon Jun 4 14:22:14 2001 From: Barrett@stsci.edu (Paul Barrett) Date: Mon, 04 Jun 2001 09:22:14 -0400 Subject: [Python-Dev] strop vs. string References: <3B1214B3.9A4C295D@lemburg.com> Message-ID: <3B1B8B86.68E99328@STScI.Edu> "M.-A. Lemburg" wrote: > > Tim Peters wrote: > > > > [Tim] > > > About combining strop and buffers and strings, don't forget > > > unicodeobject.c: that's got oodles of basically duplicate code too. > > > /F suggested dealing with the minor differences via maintaining one > > > code file that gets compiled multiple times w/ appropriate #defines. > > > > [MAL] > > > Hmm, that only saves us a few kB in source, but certainly not > > > in the object files. > > > > That's not the point. Manually duplicated code blocks always get out of > > synch, as people fix bugs in, or enhance, one of them but don't even know > > about the others. 
> > /F brought this up after I pissed away a few hours trying > > to repair one of these in all places, and he noted that strop.replace() and > > string.replace() are woefully inefficient anyway. > > Ok, so what we'd need is a bunch of generic low-level string > operations: one set for 8-bit and one for 16-bit code. > > Looking at unicodeobject.c it seems that the section "Helpers" would > be a good start, plus perhaps a few bits from the method implementations > refactored to form a low-level string template library. > > Perhaps we should move this code into > a file stringhelpers.h which then gets included by stringobject.c > and unicodeobject.c with appropriate #defines set up for > 8-bit strings and for Unicode. > > > > The better idea would be making the types subclass from a generic > > > abstract string object -- I just don't know how this will be > > > possible with Guido's type patches. We'll just have to wait, > > > I guess.

From the discussion so far, it appears that the buffer object is intended solely to support string-like objects. I've seen no mention of their use for binary data objects, such as multidimensional arrays and matrices. Will the buffer object also support these objects? If no, then I suggest it be renamed to one that is less generic and more descriptive. On the other hand, if yes, then I think the buffer C/API needs to be reimplemented, because the current design/implementation falls far short of what I would expect for a buffer object. First, it is overly complex: the support for multiple buffers does not appear necessary. Second, the dangling pointer issue has not been resolved. I suggest the addition of a lock flag which indicates that the data is currently inaccessible, i.e. that the data and/or data pointer is in the process of being modified. I would suggest the following structure to be much more useful for char and binary data:

typedef struct {
    char* rf_pointer;
    int   rf_length;
    int   rf_access;  /* read, write, etc. */
    int   rf_lock;    /* data is in use */
    int   rf_flags;   /* type of data; char, binary, unicode, etc. */
} PyBufferProcs;

But I'm guessing my proposal is way off base. If I find some time, I'll prepare a PEP to air these issues, since they are very important to those of us working on and with multidimensional arrays. We find the current buffer API lacking. -- Paul Barrett, PhD Space Telescope Science Institute Phone: 410-338-4475 ESS/Science Software Group FAX: 410-338-4767 Baltimore, MD 21218

From fdrake@acm.org Mon Jun 4 15:07:37 2001 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Mon, 4 Jun 2001 10:07:37 -0400 (EDT) Subject: [Python-Dev] strop vs. string In-Reply-To: <3B1B8B86.68E99328@STScI.Edu> References: <3B1214B3.9A4C295D@lemburg.com> <3B1B8B86.68E99328@STScI.Edu> Message-ID: <15131.38441.301314.46009@cj42289-a.reston1.va.home.com> Paul Barrett writes: > From the discussion so far, it appears that the buffer object is > intended solely to support string-like objects. I've seen no mention > of their use for binary data objects, such as multidimensional arrays > and matrices. Will the buffer object also support these objects? If > no, then I suggest it be renamed to one that is less generic and more > descriptive. In a development version of my bindings to a Type-1 font rasterizer, I exposed a buffer interface to the resulting image data. Unfortunately, that code was lost and I've not had time to work that up again.
I *think* that sort of thing was part of the intended application for the buffer interface, but I was not one of the "movers & shakers" for it, so I'm not entirely sure. > On the other hand, if yes, then I think the buffer C/API needs to be > reimplemented, because the current design/implementation falls far > short of what I would expect for a buffer object. First, it is overly > complex: the support for multiple buffers does not appear necessary. > Second, the dangling pointer issue has not been resolved. I suggest I agree. From the discussions I remember, I don't recall a clear explanation of the need for "segmented" buffers. But that may just be a failing of my recollection. > the addition of a lock flag which indicates that the data is currently > inaccessible, i.e. that the data and/or data pointer is in the process of > being modified. > > I would suggest the following structure to be much more useful for > char and binary data:
>
> typedef struct {
>     char* rf_pointer;
>     int   rf_length;
>     int   rf_access;  /* read, write, etc. */
>     int   rf_lock;    /* data is in use */
>     int   rf_flags;   /* type of data; char, binary, unicode, etc. */
> } PyBufferProcs;
I'm not sure about the "rf_flags" field -- I see two aspects that you seem to be describing, and wouldn't call either use a "flag". There's data type (characters, anonymous binary data, image data, etc.), and element size (1 byte, 2 bytes, variable width). Those values may or may not be associated with the specific buffer or the type implementing the buffer (I'd go with the specific buffer just to allow buffer types that support different flavors). > If I find some time, I'll prepare a PEP to air these issues, since > they are very important to those of us working on and with > multidimensional arrays. We find the current buffer API lacking. PEPs are good; I'll look forward to seeing it! -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations

From skip@pobox.com (Skip Montanaro) Mon Jun 4 17:29:53 2001 From: skip@pobox.com (Skip Montanaro) (Skip Montanaro) Date: Mon, 4 Jun 2001 11:29:53 -0500 Subject: [Python-Dev] readline not getting built when .so symlink doesn't exist Message-ID: <15131.46977.861815.323386@beluga.mojam.com> I recently upgraded to Mandrake 8.0. I find that the readline module is no longer getting built. When building, it builds rgbimg followed immediately by crypt. Readline, which is tested for in between, is not built. Apparently, it can't find one of the libraries required to build it. On my system, both readline and termcap are in /lib. Neither has a static version available and neither has a plain .so file available. The .so file always has a version number tacked onto the end:

% ls -l /lib/libtermcap* /lib/libreadline*
lrwxrwxrwx 1 root root     18 May 29 10:53 /lib/libreadline.so.4 -> libreadline.so.4.1
-rwxr-xr-x 1 root root 152440 Mar 25 01:26 /lib/libreadline.so.4.1
lrwxrwxrwx 1 root root     19 May 29 10:53 /lib/libtermcap.so.2 -> libtermcap.so.2.0.8
-rwxr-xr-x 1 root root  11608 Mar 26 10:32 /lib/libtermcap.so.2.0.8

If I create the necessary .so symlinks it builds okay. Perhaps this is a bug in Mandrake 8.0 (it wouldn't be the first one), but if it is valid for shared libraries to be installed with only a version-numbered .so file, then it seems to me that distutils ought to handle that. There are several programs in /usr/bin on my machine that seem to be dynamically linked to libreadline.
In addition, /usr/lib/python2.0/lib-dynload/readline.so exists, which suggests that the .so without a version number is valid as far as ld is concerned. Skip

From Greg.Wilson@baltimore.com Mon Jun 4 18:33:29 2001 From: Greg.Wilson@baltimore.com (Greg Wilson) Date: Mon, 4 Jun 2001 13:33:29 -0400 Subject: [Python-Dev] struct.getorder() ? Message-ID: <930BBCA4CEBBD411BE6500508BB3328F2E1F1D@nsamcanms1.ca.baltimore.com> The 'struct' module allows packing and unpacking orders to be specified, but doesn't provide a hook to report on the order used by the machine the script is running on. As I'm likely going to be using this module in future runs of my course, I'd like to add 'struct.getorder()', which would return either "<" or ">" (the characters used to signal little-endian and big-endian respectively). Does this duplicate something in some other standard module? Does it seem like a sensible idea? Thanks Greg

From fdrake@acm.org Mon Jun 4 18:42:28 2001 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Mon, 4 Jun 2001 13:42:28 -0400 (EDT) Subject: [Python-Dev] struct.getorder() ? In-Reply-To: <930BBCA4CEBBD411BE6500508BB3328F2E1F1D@nsamcanms1.ca.baltimore.com> References: <930BBCA4CEBBD411BE6500508BB3328F2E1F1D@nsamcanms1.ca.baltimore.com> Message-ID: <15131.51332.73137.795543@cj42289-a.reston1.va.home.com> Greg Wilson writes: > The 'struct' module allows packing and unpacking > orders to be specified, but doesn't provide a hook > to report on the order used by the machine the Python 2.0 introduced sys.byteorder; check it out: http://www.python.org/doc/current/lib/module-sys.html -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations

From Greg.Wilson@baltimore.com Mon Jun 4 18:41:45 2001 From: Greg.Wilson@baltimore.com (Greg Wilson) Date: Mon, 4 Jun 2001 13:41:45 -0400 Subject: [Python-Dev] struct.getorder() ? Message-ID: <930BBCA4CEBBD411BE6500508BB3328F2E1F1E@nsamcanms1.ca.baltimore.com> > Python 2.0 introduced sys.byteorder; check it out: > http://www.python.org/doc/current/lib/module-sys.html Woo hoo! Thanks, Fred --- should've guessed someone would be ahead of me :-). Greg

From barry@scottb.demon.co.uk Mon Jun 4 19:00:05 2001 From: barry@scottb.demon.co.uk (Barry Scott) Date: Mon, 4 Jun 2001 19:00:05 +0100 Subject: [Python-Dev] %b format? - 8088 like a PDP-11 I think not... In-Reply-To: <20010530183833.B1654@thyrsus.com> Message-ID: <000201c0ed20$2f295c30$060210ac@private> Eric wrote: > While I'm at it, I should note that the design of the 11 was ancestral > to both the 8088 and 68000 microprocessors, and thus to essentially > every new general-purpose computer designed in the last fifteen years. The key to the PDP-11 and VAX was lots of registers all alike and rich addressing modes for the instructions. The 8088 is very far from this design; it owes its design more to the 4004 than the PDP-11. However, the 68000 is closer, but not as nice to program, as there are too many special cases in its instruction set for my liking. Barry

From mwh@python.net Mon Jun 4 19:05:10 2001 From: mwh@python.net (Michael Hudson) Date: 04 Jun 2001 19:05:10 +0100 Subject: [Python-Dev] readline not getting built when .so symlink doesn't exist In-Reply-To: Skip Montanaro's message of "Mon, 4 Jun 2001 11:29:53 -0500" References: <15131.46977.861815.323386@beluga.mojam.com> Message-ID: Skip Montanaro writes: > I recently upgraded to Mandrake 8.0. I find that the readline > module is no longer getting built. When building, it builds rgbimg > followed immediately by crypt. Readline, which is tested for in > between, is not built. Apparently, it can't find one of the > libraries required to build it. On my system, both readline and > termcap are in /lib. Neither has a static version available and > neither has a plain .so file available. The .so file always has a > version number tacked onto the end:
>
> % ls -l /lib/libtermcap* /lib/libreadline*
> lrwxrwxrwx 1 root root     18 May 29 10:53 /lib/libreadline.so.4 -> libreadline.so.4.1
> -rwxr-xr-x 1 root root 152440 Mar 25 01:26 /lib/libreadline.so.4.1
> lrwxrwxrwx 1 root root     19 May 29 10:53 /lib/libtermcap.so.2 -> libtermcap.so.2.0.8
> -rwxr-xr-x 1 root root  11608 Mar 26 10:32 /lib/libtermcap.so.2.0.8
>
> If I create the necessary .so symlinks it builds okay.
>
> Perhaps this is a bug in Mandrake 8.0 (it wouldn't be the first
> one), but if it is valid for shared libraries to be installed with
> only a version-numbered .so file, then it seems to me that distutils
> ought to handle that.
Hmm. Does compiling a proggie

$ gcc foo.c -lreadline

work? It doesn't here if I move libreadline.so & libreadline.a out of the way. If the C compiler isn't going to find readline, there ain't much point distutils trying to find it... > There are several programs in /usr/bin on my machine that seem to be > dynamically linked to libreadline. Those things will be directly linked to libreadline.so.whatever; I believe the libfoo.so files are only for the (compile time) linker's benefit.
> In addition, /usr/lib/python2.0/lib-dynload/readline.so exists, > which suggests that the .so without a version number is valid as far > as ld is concerned. ld != ld.so. Do you need a readline-devel package or something? Cheers, M. -- It's actually a corruption of "starling". They used to be carried. Since they weighed a full pound (hence the name), they had to be carried by two starlings in tandem, with a line between them. -- Alan J Rosenthal explains "Pounds Sterling" on asr

From mwh@python.net Mon Jun 4 20:01:10 2001 From: mwh@python.net (Michael Hudson) Date: 04 Jun 2001 20:01:10 +0100 Subject: [Python-Dev] another dict crasher In-Reply-To: "Tim Peters"'s message of "Sun, 3 Jun 2001 19:03:31 -0400" References: Message-ID: "Tim Peters" writes: > >> However, then an adversarial user can construct a case that never > >> terminates. > > > I seem to have done this - it was odd, though - it only loops when I > > bump the dict to fairly enormous proportions for reasons I don't > > really (want to) understand. > > Pass it on. I deliberately "started over" via a recursive call instead of a > goto so that an offending program would eventually die with a stack fault > instead of just running forever. So if you're seeing something run forever, > it may be a different problem. I left it running overnight, and it terminated! (with a KeyError). I can't say I really understand what's going on, but I'm in Exam Hell at the moment (for the last time! Yippee!), so don't have any spare cycles to think about it hard. Anyway, this is what I was running:

dict = {}

# let's force dict to malloc its table
for i in range(1,10000):
    dict[i] = i

hashcode = 0

class Machiavelli2:
    def __eq__(self, other):
        global hashcode
        d2 = dict.copy()
        dict.clear()
        hashcode += 1
        for k,v in d2.items():
            dict[k] = v
        return 1
    def __hash__(self):
        return hashcode

dict[Machiavelli2()] = Machiavelli2()

print dict[Machiavelli2()]

If you thought my last test case was contrived, I look forward to you finding adjectives for this one... Cheers, M. -- (ps: don't feed the lawyers: they just lose their fear of humans) -- Peter Wood, comp.lang.lisp

From barry@digicool.com Mon Jun 4 20:42:34 2001 From: barry@digicool.com (Barry A. Warsaw) Date: Mon, 4 Jun 2001 15:42:34 -0400 Subject: [Python-Dev] Status of 2.0.1? Message-ID: <15131.58538.121723.671374@anthem.wooz.org> I've just fixed two buglets in the regression test suite for Python 2.0.1 (release20-maint branch). Now I get the following results from regrtest:

88 tests OK.
20 tests skipped: test_al test_audioop test_cd test_cl test_dbm test_dl test_gl test_imageop test_imgfile test_largefile test_linuxaudiodev test_minidom test_nis test_pyexpat test_rgbimg test_sax test_sunaudiodev test_timing test_winreg test_winsound

Has anybody else tested out the 2.0.1 branch on anything? I'm going to run some quick tests with Mailman 2.0.x on Python 2.0.1 over the next hour or so. I'm just wondering what's left to do for this release, and how I can help out. -Barry

From esr@thyrsus.com Mon Jun 4 21:11:14 2001 From: esr@thyrsus.com (Eric S. Raymond) Date: Mon, 4 Jun 2001 16:11:14 -0400 Subject: [Python-Dev] %b format? - 8088 like a PDP-11 I think not...
In-Reply-To: <000201c0ed20$2f295c30$060210ac@private>; from barry@scottb.demon.co.uk on Mon, Jun 04, 2001 at 07:00:05PM +0100 References: <20010530183833.B1654@thyrsus.com> <000201c0ed20$2f295c30$060210ac@private> Message-ID: <20010604161114.A20979@thyrsus.com> Barry Scott : > Eric wrote: > > While I'm at it, I should note that the design of the 11 was ancestral > > to both the 8088 and 68000 microprocessors, and thus to essentially > > every new general-purpose computer designed in the last fifteen years. > > The key to the PDP-11 and VAX was lots of registers all alike and rich > addressing modes for the instructions. > > The 8088 is very far from this design; it owes its design more to > the 4004 than the PDP-11. Yes, but the 4004 was designed as a sort of lobotomized imitation of the 65xx, which was descended from the 11. Admittedly, in the chain of transmission here were two stages of redesign so bad that the connection got really tenuous. -- Eric S. Raymond ...Virtually never are murderers the ordinary, law-abiding people against whom gun bans are aimed. Almost without exception, murderers are extreme aberrants with lifelong histories of crime, substance abuse, psychopathology, mental retardation and/or irrational violence against those around them, as well as other hazardous behavior, e.g., automobile and gun accidents." -- Don B. Kates, writing on statistical patterns in gun crime

From skip@pobox.com (Skip Montanaro) Mon Jun 4 21:49:07 2001 From: skip@pobox.com (Skip Montanaro) (Skip Montanaro) Date: Mon, 4 Jun 2001 15:49:07 -0500 Subject: [Python-Dev] readline not getting built when .so symlink doesn't exist In-Reply-To: References: <15131.46977.861815.323386@beluga.mojam.com> Message-ID: <15131.62531.595208.65994@beluga.mojam.com> [my readline woes snipped] Michael> Hmm. Does compiling a proggie Michael> $ gcc foo.c -lreadline Michael> work? It doesn't here if I move libreadline.so & libreadline.a Michael> out of the way. Yup, it does:

beluga:tmp% cc -o foo foo.c -lreadline -ltermcap
beluga:tmp% ./foo
>>sdfsdfsdf
sdfsdfsdf

(This after deleting both /lib/libreadline.so and /lib/libhistory.so.) In this case, foo.c is

#include <stdio.h>
#include <readline/readline.h>
#include <readline/history.h>

main()
{
    printf("%s\n", readline(">>"));
}

Michael> Do you need a readline-devel package or something? Got that. I just noticed that "rpm -q --whatprovides /lib/libreadline.so" does list readline-devel as the provider. I just reinstalled it using --force. Now the .so symlinks are there. Go figure... Oh well, probably ought to drop it unless another Mandrake user complains. I'm really amazed at how many packages Mandrake chose *not* to install even though I selected all the groups during install and was installing into fresh / and /usr partitions. I've been dribbling various packages in bit-by-bit as I've discovered omissions. In the past I've also noticed files apparently not installed even though the packages that were supposed to provide them were installed. Skip

From guido@digicool.com Mon Jun 4 22:03:35 2001 From: guido@digicool.com (Guido van Rossum) Date: Mon, 04 Jun 2001 17:03:35 -0400 Subject: [Python-Dev] Re: What happened to Idle's extend.py? In-Reply-To: Your message of "Tue, 29 May 2001 02:15:07 EDT." References: Message-ID: <200106042103.RAA04077@cj20424-a.reston1.va.home.com> > > Idle-0.3, shipped with Python 1.5.2 had an extend.py module that was > > used to extend Idle. We've used this extensively, building entire > > "applications" as Idle extensions.
> > > > Now that we're moving to Python 2.1, we find the same old directions > > for extending Idle (in extend.txt), but there appears to be no > > extend.py in Idle-0.8. > > > > Does anyone know how we can add extensions to Idle-0.8? It's simpler than before. Extensions are now loaded simply by being named in config.txt (or any of the other custom configuration files). For example, ZoomHeight.py is a very simple extension; it is loaded because of the line [ZoomHeight] somewhere in config.txt. The interface for extensions is the same as before; ZoomHeight.py hasn't changed since 1999. I'll update extend.txt. Can someone forward this to the original asker of the question, or to the list where it was posted? --Guido van Rossum (home page: http://www.python.org/~guido/) From skip@pobox.com (Skip Montanaro) Mon Jun 4 22:03:58 2001 From: skip@pobox.com (Skip Montanaro) (Skip Montanaro) Date: Mon, 4 Jun 2001 16:03:58 -0500 Subject: [Python-Dev] %b format? - 8088 like a PDP-11 I think not... In-Reply-To: <20010604161114.A20979@thyrsus.com> References: <20010530183833.B1654@thyrsus.com> <000201c0ed20$2f295c30$060210ac@private> <20010604161114.A20979@thyrsus.com> Message-ID: <15131.63422.695297.393477@beluga.mojam.com> Eric> Yes, but the 4004 was designed as a sort of lobotomized imitation Eric> of the 65xx, which was descended from the 11. Really? I was always under the impression the 4004 was considered the first microprocessor. The page below says that and gives a date of 1971 for it. I have no idea if the author is correct, just that what he says agrees with my memory. He does seem to have an impressive collection of old computer iron: http://www.piercefuller.com/collect/i4004/ I haven't found a statement about the origins of the 6502, but this page suggests that commercial computers were being made from 8080's before 6502's: http://www.speer.org/2backup/pcbs_pch.html Ah, wait a minute... This page: http://www.geocities.com/SiliconValley/Byte/6508/6502/english/versoes.htm says the 6502 was descended from the 6800. I'm getting less and less convinced that the 4004 somehow descended from the 65xx family. (Maybe we should shift this thread to the always entertaining folks at comp.arch... ;-) Skip From esr@thyrsus.com Mon Jun 4 22:19:08 2001 From: esr@thyrsus.com (Eric S. Raymond) Date: Mon, 4 Jun 2001 17:19:08 -0400 Subject: [Python-Dev] %b format? - 8088 like a PDP-11 I think not... In-Reply-To: <15131.63422.695297.393477@beluga.mojam.com>; from skip@pobox.com on Mon, Jun 04, 2001 at 04:03:58PM -0500 References: <20010530183833.B1654@thyrsus.com> <000201c0ed20$2f295c30$060210ac@private> <20010604161114.A20979@thyrsus.com> <15131.63422.695297.393477@beluga.mojam.com> Message-ID: <20010604171908.A21831@thyrsus.com> Skip Montanaro : > Really? I was always under the impression the 4004 was considered the first > microprocessor. The page below says that and gives a date of 1971 for it. First sentence is widely believed, but there was an earlier micro called the Star-8 designed at Burroughs that has been almost completely forgotten. I only know about it because I worked there in 1980 with one of the people who designed it. I think I had a brain fart and it's the Z80 that was descended from the 6502. I was going by a remark in some old lecture notes. I've got a copy of the definitive reference on history of computer architecture and will check. -- Eric S. Raymond "Extremism in the defense of liberty is no vice; moderation in the pursuit of justice is no virtue." 
-- Barry Goldwater (actually written by Karl Hess) From mwh@python.net Mon Jun 4 22:55:34 2001 From: mwh@python.net (Michael Hudson) Date: 04 Jun 2001 22:55:34 +0100 Subject: [Python-Dev] readline not getting built when .so symlink doesn't exist In-Reply-To: Skip Montanaro's message of "Mon, 4 Jun 2001 15:49:07 -0500" References: <15131.46977.861815.323386@beluga.mojam.com> <15131.62531.595208.65994@beluga.mojam.com> Message-ID: Skip Montanaro writes: > [my readline woes snipped] > > Michael> Hmm. Does compiling a proggie > > Michael> $ gcc foo.c -lreadline > > Michael> work? It doesn't here if I move libreadline.so & libreadline.a > Michael> out of the way. > > Yup, it does: > > beluga:tmp% cc -o foo foo.c -lreadline -ltermcap > beluga:tmp% ./foo > >>sdfsdfsdf > sdfsdfsdf > > (This after deleting both /lib/libreadline.so and /lib/libhistory.so.) Odd. What does the output of $ gcc -o foo foo.c -lreadline -ltermcap -Wl,--verbose look like? In particular the bit at the end where you get things like: attempt to open /usr/lib/gcc-lib/i386-redhat-linux/2.95.1/libreadline.so failed attempt to open /usr/lib/gcc-lib/i386-redhat-linux/2.95.1/libreadline.a failed attempt to open /usr/i386-redhat-linux/lib/libreadline.so failed attempt to open /usr/i386-redhat-linux/lib/libreadline.a failed attempt to open /usr/bin/../lib/libreadline.so succeeded -lreadline (/usr/bin/../lib/libreadline.so) (this is more for my personal curiosity than any important reason). > Got that. I just noticed that "rpm -q --whatprovides /lib/libreadline.so" > does list readline-devel as the provider. I just reinstalled it using > --force. Now the .so symlinks are there. Go figure... No :-) > Oh well, probably ought to drop it unless another Mandrake user complains. Sounds reasonable. Cheers, M. -- After a heavy night I travelled on, my face toward home - the comma being by no means guaranteed. -- paraphrased from cam.misc From tim.one@home.com Mon Jun 4 22:58:48 2001 From: tim.one@home.com (Tim Peters) Date: Mon, 4 Jun 2001 17:58:48 -0400 Subject: [Python-Dev] Re: What happened to Idle's extend.py? In-Reply-To: <200106042103.RAA04077@cj20424-a.reston1.va.home.com> Message-ID: [Guido] > Can someone forward this to the original asker of the question, or to > the list where it was posted? Done. Thanks! From skip@pobox.com (Skip Montanaro) Tue Jun 5 02:01:01 2001 From: skip@pobox.com (Skip Montanaro) (Skip Montanaro) Date: Mon, 4 Jun 2001 20:01:01 -0500 Subject: [Python-Dev] readline not getting built when .so symlink doesn't exist In-Reply-To: References: <15131.46977.861815.323386@beluga.mojam.com> <15131.62531.595208.65994@beluga.mojam.com> Message-ID: <15132.12109.914981.110774@beluga.mojam.com> >> (This after deleting both /lib/libreadline.so and >> /lib/libhistory.so.) Michael> Odd. What does the output of Michael> $ gcc -o foo foo.c -lreadline -ltermcap -Wl,--verbose Michael> look like? Well, what it looks like is "Skip's a dunce...". Turns out there was a libreadline.so symlink /usr/lib also. It found that. When I deleted that it found /usr/lib/libreadline.a. Getting rid of that caused the link to (finally) fail. With just the version-based .so files cc apparently can't do the trick. Sorry to have wasted the bandwidth. Skip From skip@pobox.com (Skip Montanaro) Tue Jun 5 02:16:00 2001 From: skip@pobox.com (Skip Montanaro) (Skip Montanaro) Date: Mon, 4 Jun 2001 20:16:00 -0500 Subject: [Python-Dev] %b format? - 8088 like a PDP-11 I think not... 
In-Reply-To: <20010604171908.A21831@thyrsus.com> References: <20010530183833.B1654@thyrsus.com> <000201c0ed20$2f295c30$060210ac@private> <20010604161114.A20979@thyrsus.com> <15131.63422.695297.393477@beluga.mojam.com> <20010604171908.A21831@thyrsus.com> Message-ID: <15132.13008.429800.585157@beluga.mojam.com> Eric> Skip Montanaro : >> Really? I was always under the impression the 4004 was considered >> the first microprocessor. The page below says that and gives a date >> of 1971 for it. Eric> First sentence is widely believed, but there was an earlier micro Eric> called the Star-8 designed at Burroughs that has been almost Eric> completely forgotten. There was also a GE-8 (I think that was the name) developed at GE's R&D Center in the early 1970's timeframe - long before my time there. It was apparently very competitive with the other microprocessors produced about that time but never saw the light of day. I suspect that was at least due in part to the fact that GE built mainframes back then. Skip

From tim.one@home.com Tue Jun 5 05:07:27 2001 From: tim.one@home.com (Tim Peters) Date: Tue, 5 Jun 2001 00:07:27 -0400 Subject: [Python-Dev] another dict crasher In-Reply-To: Message-ID: [Michael Hudson, taking a break from exams] > I left it running overnight, and it terminated! (with a KeyError). I > can't say I really understand what's going on, but I'm in Exam Hell at > the moment (for the last time! Yippee!), so don't have any spare > cycles to think about it hard. Good luck! I really shouldn't tell you this now, but the real reason people dread turning 30, 40, 50, 60-- and so on --is that every 10th birthday starting at 30 they test you *again*! On every course you ever took. It's grueling. The penalty for failure is severe: flunk just one review exam, and they pick a date at random over the following 10 years for you to die. No point fighting it, it's just civilization's nasty little secret. This is why life expectancy correlates with education, but it does appear that the human limit for remembering both plane geometry and the names of hundreds of dead psychopaths is about 120 years. In the meantime, I built a test case to tickle stack overflow directly, and it does so quickly:

class Yuck:
    def __init__(self):
        self.i = 0

    def make_dangerous(self):
        self.i = 1

    def __hash__(self):
        # direct to slot 4 in table of size 8; slot 12 when size 16
        return 4 + 8

    def __eq__(self, other):
        if self.i == 0:
            # leave dict alone
            pass
        elif self.i == 1:
            # fiddle to 16 slots
            self.__fill_dict(6)
            self.i = 2
        else:
            # fiddle to 8 slots
            self.__fill_dict(4)
            self.i = 1
        return 1

    def __fill_dict(self, n):
        self.i = 0
        dict.clear()
        for i in range(n):
            dict[i] = i
        dict[self] = "OK!"

y = Yuck()
dict = {y: "OK!"}

z = Yuck()
y.make_dangerous()
print dict[z]

It just arranges to move y to a different slot in a different-sized table each time __eq__ is invoked, alternating between slot 4 in a size-8 table and slot 12 in a size-16 table. However, if I stick "print self.i" at the start of __eq__, it dies with a KeyError instead! That's why I'm mentioning it -- could be the same misdirection you're seeing. I can't account for the KeyError in any rational way: under Windows, it's actually hitting a stack overflow in the bowels of the system malloc() then. Windows "recovers" from that and presses on. Everything that happens after appears to be an accident. win98-as-usual-ly y'rs - tim PS: You'll be tested on this, too.
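(To make the mechanism being discussed concrete, here is a toy pure-Python model of the probe loop. Everything in it is invented for illustration -- ToyDict, the linear probing, a Python list standing in for the C slot table; the real logic is lookdict() in Objects/dictobject.c and differs in many details:)

class ToyDict:
    def __init__(self, size=8):
        self.table = [None] * size   # each slot: None or a (hash, key, value) tuple
        self.used = 0

    def _grow(self):
        # Resizing replaces self.table with a fresh list, which is exactly
        # what an in-flight lookup checks for.  Reinserting never calls a
        # user __eq__, because we only probe for empty slots here.
        old = self.table
        self.table = [None] * (2 * len(old))
        for slot in old:
            if slot is not None:
                i = slot[0] % len(self.table)
                while self.table[i] is not None:
                    i = (i + 1) % len(self.table)
                self.table[i] = slot

    def _lookup(self, key, h):
        table = self.table           # remember which table we started probing
        n = len(table)
        i = h % n
        while 1:
            slot = table[i]
            if slot is None:
                return i, None       # empty slot: key is absent
            if slot[0] == h:
                equal = (slot[1] == key)   # may run arbitrary user __eq__!
                if self.table is not table:
                    # The comparison resized the dict out from under us:
                    # start the whole lookup over.  Doing it with a
                    # recursive call rather than a goto means a hostile
                    # __eq__ eventually dies with a stack fault instead
                    # of running forever.
                    return self._lookup(key, h)
                if equal:
                    return i, slot
            i = (i + 1) % n          # toy linear probing; CPython does better

    def __setitem__(self, key, value):
        i, slot = self._lookup(key, hash(key))
        self.table[i] = (hash(key), key, value)
        if slot is None:
            self.used += 1
            if 2 * self.used > len(self.table):
                self._grow()

    def __getitem__(self, key):
        i, slot = self._lookup(key, hash(key))
        if slot is None:
            raise KeyError, key
        return slot[2]

if __name__ == "__main__":
    d = ToyDict()
    for i in range(10):
        d[i] = i * i
    print d[7]   # prints 49

Fed a key whose __eq__ keeps forcing resizes, in the spirit of the Yuck class above, this model just recurses in _lookup() until the interpreter runs out of stack, which is the failure mode Tim says he chose deliberately.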
From greg@cosc.canterbury.ac.nz Tue Jun 5 06:00:30 2001 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Tue, 05 Jun 2001 17:00:30 +1200 (NZST) Subject: [Python-Dev] One more dict trick In-Reply-To: <20010601032316.A15635@thyrsus.com> Message-ID: <200106050500.RAA02362@s454.cosc.canterbury.ac.nz> "Eric S. Raymond" : > I think it's significant that MMX > instructions and so forth entered the Intel line to support *games*, > not Navier-Stokes calculations. But when version 1.0 of FlashFlood! comes out, requiring high-quality real-time hydrodynamics simulation, Navier-Stokes calculations will suddenly become very important... Greg Ewing, Computer Science Dept, University of Canterbury, Christchurch, New Zealand | A citizen of NewZealandCorp, a wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz

From tim.one@home.com Tue Jun 5 06:18:50 2001 From: tim.one@home.com (Tim Peters) Date: Tue, 5 Jun 2001 01:18:50 -0400 Subject: [Python-Dev] strop vs. string In-Reply-To: <3B1B8B86.68E99328@STScI.Edu> Message-ID: [Paul Barrett] > From the discussion so far, it appears that the buffer object is > intended solely to support string-like objects. Unsure where that impression came from. Since buffers wrap a slice "of memory", they don't make much sense except where raw memory makes sense. That includes the guts of strings, but also (in the core distribution) memory-mapped files (the mmap module) and arrays (the array module), which also support the buffer interface. > I've seen no mention of their use for binary data objects, I mentioned two above. The use of buffers with mutable objects is dangerous, though, because of the dangling-pointer problem, and Python itself never uses buffers except for strings. Even arrays are stretching it; e.g.,

>>> import array
>>> a = array.array('i')
>>> a.append(2)
>>> a.append(3)
>>> a
array('i', [2, 3])
>>> b = buffer(a)
>>> len(b)
8
>>> [b[i] for i in range(len(b))]
['\x02', '\x00', '\x00', '\x00', '\x03', '\x00', '\x00', '\x00']
>>>

While of *some* conceivable use, that's not exactly destined to become wildly popular. > such as multidimensional arrays and matrices. Since core Python has no such things, of course it doesn't use buffers for those either. > Will the buffer object also support these objects? In what sense? If you have an implementation of such things, and believe that getting at raw memory slices is useful, sure -- fill in its tp_as_buffer slot. > ... > On the other hand, if yes, then I think the buffer C/API needs to be > reimplemented, Or do you mean redesigned? > because the current design/implementation falls far > short of what I would expect for a buffer object. First, it is overly > complex: the support for multiple buffers does not appear necessary. AFAICT it's entirely unused; everything in the core that supports the buffer interface returns a segment count of 1, and the buffer object itself appears to raise exceptions whenever it sees a reference to a segment other than "the first". I don't know why it's there. > Second, the dangling pointer issue has not been resolved. I expect Greg will fix that now. > I suggest the addition of a lock flag which indicates that the data is > currently inaccessible, i.e. that the data and/or data pointer is in the > process of being modified. To sell that (but please save it for the PEP) I expect you have to provide some compelling uses for it. The current uses have no need of it.
In the absence of specific good uses, I'm afraid it just sounds like another variant of "I can't prove segments *won't* be useful, so let's toss them in too!". > I would suggest the following structure to be much more useful for > char and binary data: > > typedef struct { > char* rf_pointer; > int rf_length; > int rf_access; /* read, write, etc. */ > int rf_lock; /* data is in use */ > int rf_flags; /* type of data; char, binary, unicode, etc. */ > } PyBufferProcs; > > But I'm guessing my proposal is way off base. Depends on what you want to do. You've only mentioned multidimensional arrays, and the need for umpteen flavors of access control there, beyond the current object's b_readonly flag, is simply unclear. Also unclear why you've dropped the current object's b_base pointer: without it, the buffer has no way to get back to the object from which the memory is borrowed, nor even a guarantee that the object won't die while the buffer is still active. If you do pursue this, please please please boost the rf_length field! An int is too small to hold real-life sizes anymore, and "large files" are becoming common even on 32-bit boxes. Python needs to grow a wholly supported way to pass 8-byte ints around (and it looks like I'll be adding that to the struct module, possibly to the array module and marshal too). > If I find some time, I'll prepare a PEP to air these issues, since > they are very important to those of us working on and with > multidimensional arrays. We find the current buffer API lacking. A PEP is always a good idea. From aahz@rahul.net Tue Jun 5 06:41:28 2001 From: aahz@rahul.net (Aahz Maruch) Date: Mon, 4 Jun 2001 22:41:28 -0700 (PDT) Subject: [Python-Dev] strop vs. string In-Reply-To: from "Tim Peters" at Jun 05, 2001 01:18:50 AM Message-ID: <20010605054129.933C199C83@waltz.rahul.net> Tim Peters wrote: > > If you do pursue this, please please please boost the rf_length field! An > int is too small to hold real-life sizes anymore, and "large files" are > becoming common even on 32-bit boxes. Python needs to grow a wholly > supported way to pass 8-byte ints around (and it looks like I'll be adding > that to the struct module, possibly to the array module and marshal too). Hey! Are you discriminating against 128-bit ints? -- --- Aahz (@pobox.com) Hugs and backrubs -- I break Rule 6 <*> http://www.rahul.net/aahz/ Androgynous poly kinky vanilla queer het Pythonista I don't really mind a person having the last whine, but I do mind someone else having the last self-righteous whine. From tim.one@home.com Tue Jun 5 06:53:26 2001 From: tim.one@home.com (Tim Peters) Date: Tue, 5 Jun 2001 01:53:26 -0400 Subject: [Python-Dev] One more dict trick In-Reply-To: <20010601032316.A15635@thyrsus.com> Message-ID: [Eric S. Raymond] > ... > So maybe there's a market for 128-bit floats after all. I think very small. There's a much larger market for 128-bit float *registers*, though -- in the "treat it as 2 64-bit, or 4 32-bit, floats, and operate on them in parallel" sense. That's the baby vector register view, and is already happening. > I'm still skeptical about how likely those applications are to > influence the architecture of general-purpose processors. I saw a > study once that said heavy-duty scientific floating point only > accounts for about 2% of the computing market -- and I think it's > significant that MMX instructions and so forth entered the Intel > line to support *games*, not Navier-Stokes calculations. Heh. 
I used to wonder about that, but not any more: games may have no more than entertainment (sometimes disguised as education ) in mind, but what do the latest & greatest games do? Strive to simulate physical reality (sometimes with altered physical laws), just as closely as possible. Whether it's ray-tracing, effective motion-compression, or N-body simulations, games are easily as demanding as what computational chemists do. A difference is that general-purpose *compilers* aren't being taught how to use these "new" architectural gimmicks. All that new hardware sits unused unless you've got an app dipping into assembler, or into a hand-coded utility library written in assembler. The *general* market for pure floating-point can barely support what's left of the supercomputer industry anymore (btw, Cray never became a billion-dollar company even in its heyday, and what's left of them gets passed around for peanuts now). > That 2% will have to get a lot bigger before I can see Intel doubling > its word size again. It's not just the processor design; the word size > has huge implications for buses, memory controllers, and the whole > system architecture. Intel is just now getting its foot wet with with 64-bit boxes. That was old news to me 20 years ago. All I hope to see 20 years from now is that somewhere along the way I got smart enough to drop computers and get a real life . by-then-the-whole-system-will-exist-in-the-superposition-of-a- single-plutonium-atom's-states-anyway-ly y'rs - tim From tim.one@home.com Tue Jun 5 06:55:48 2001 From: tim.one@home.com (Tim Peters) Date: Tue, 5 Jun 2001 01:55:48 -0400 Subject: [Python-Dev] strop vs. string In-Reply-To: <20010605054129.933C199C83@waltz.rahul.net> Message-ID: [Aahz] > Hey! Are you discriminating against 128-bit ints? Nope! I'm Guido's marketing guy: 128-bit ints will be the killer reason you need to upgrade to Python 3000, when the time comes. Python didn't get to where it is by giving away all the good stuff early . From MarkH@ActiveState.com Tue Jun 5 08:10:53 2001 From: MarkH@ActiveState.com (Mark Hammond) Date: Tue, 5 Jun 2001 17:10:53 +1000 Subject: [Python-Dev] strop vs. string In-Reply-To: <3B1B8B86.68E99328@STScI.Edu> Message-ID: > complex: the support for multiple buffers does not appear necessary. I seem to recall Guido telling me once that this was implemented for NumPy, specifically for some of their matrices. Not being a user of that package means that unfortunately I can not be any more specific... I am confident Guido will recall the specific details... Mark. From mwh@python.net Tue Jun 5 09:39:24 2001 From: mwh@python.net (Michael Hudson) Date: Tue, 5 Jun 2001 09:39:24 +0100 (BST) Subject: [Python-Dev] another dict crasher In-Reply-To: Message-ID: Haven't run your example yet as my machine's not on at the moment. On Tue, 5 Jun 2001, Tim Peters wrote: > However, if I stick "print self.i" at the start of __eq__, it dies > with a KeyError instead! That's why I'm mentioning it -- could be the > same misdirection you're seeing. I can't account for the KeyError in > any rational way: under Windows, it's actually hitting a stack > overflow in the bowels of the system malloc() then. Hmm. It's quite likely that PyMem_Malloc (or whatever) crapping out and returning NULL will get turned into a MemoryError, which will then get turned into a KeyError, isn't it? I could believe that malloc would set up some fancy sigsegv-type handlers for memory management purposes which then get called when it tramples all over the end of the stack. 
But I'm making this up as I go along... > Windows "recovers" from that and presses on. Everything that happens > after appears to be an accident. > > win98-as-usual-ly y'rs - tim Well, linux seems to be similarly inscrutable here. One problem is that this is a pig to run under the debugger - setting a breakpoint on lookdict isn't terribly interesting way to spend your time. I suppose you could just set the breakpoint on the recursive call... later. > PS: You'll be tested on this, too . Oh, piss off . Cheers, M. From guido@digicool.com Tue Jun 5 10:07:34 2001 From: guido@digicool.com (Guido van Rossum) Date: Tue, 05 Jun 2001 05:07:34 -0400 Subject: [Python-Dev] Happy event Message-ID: <200106050907.FAA08198@cj20424-a.reston1.va.home.com> I just wanted to send a note about a happy event in the Python family. Jeremy Hylton and his wife became the proud parents of twin girls on Sunday June 3rd. Please join Pythonlabs and Digital Creations in congratulating them, and wishing them much joy and luck. Also, don't expect Jeremy to be too responsive to email for the next 6-8 weeks. :) --Guido van Rossum (home page: http://www.python.org/~guido/) From uche.ogbuji@fourthought.com Tue Jun 5 13:28:45 2001 From: uche.ogbuji@fourthought.com (Uche Ogbuji) Date: Tue, 05 Jun 2001 06:28:45 -0600 Subject: [Python-Dev] One more dict trick In-Reply-To: Message from Greg Ewing of "Tue, 05 Jun 2001 17:00:30 +1200." <200106050500.RAA02362@s454.cosc.canterbury.ac.nz> Message-ID: <200106051228.f55CSjk18336@localhost.local> > "Eric S. Raymond" : > > > I think it's significant that MMX > > instructions and so forth entered the Intel line to support *games*, > > not Navier-Stokes calculations. > > But when version 1.0 of FlashFlood! comes out, requiring > high-quality real-time hydrodynamics simulation, > Navier-Stokes calculations will suddenly become very > important... Shoot, I thought that was what Microsoft Hailstorm was all about. Path integrals about the atmospheric isobars, and all that... -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA XML strategy, XML tools (http://4Suite.org), knowledge management From uche.ogbuji@fourthought.com Tue Jun 5 13:32:07 2001 From: uche.ogbuji@fourthought.com (Uche Ogbuji) Date: Tue, 05 Jun 2001 06:32:07 -0600 Subject: [Python-Dev] Happy event In-Reply-To: Message from Guido van Rossum of "Tue, 05 Jun 2001 05:07:34 EDT." <200106050907.FAA08198@cj20424-a.reston1.va.home.com> Message-ID: <200106051232.f55CW7618353@localhost.local> > I just wanted to send a note about a happy event in the Python family. > Jeremy Hylton and his wife became the proud parents of twin girls on > Sunday June 3rd. Please join Pythonlabs and Digital Creations in > congratulating them, and wishing them much joy and luck. > > Also, don't expect Jeremy to be too responsive to email for the next > 6-8 weeks. :) *twin* girls? Try 6-8 years. Congrats and felicits of the highest order, of course, Jeremy. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. 
C, Boulder, CO 80301-2537, USA XML strategy, XML tools (http://4Suite.org), knowledge management From Barrett@stsci.edu Tue Jun 5 13:53:46 2001 From: Barrett@stsci.edu (Paul Barrett) Date: Tue, 05 Jun 2001 08:53:46 -0400 Subject: [Python-Dev] Happy event References: <200106051232.f55CW7618353@localhost.local> Message-ID: <3B1CD65A.595E8CD@STScI.Edu> Uche Ogbuji wrote: > > > I just wanted to send a note about a happy event in the Python family. > > Jeremy Hylton and his wife became the proud parents of twin girls on > > Sunday June 3rd. Please join Pythonlabs and Digital Creations in > > congratulating them, and wishing them much joy and luck. > > > > Also, don't expect Jeremy to be too responsive to email for the next > > 6-8 weeks. :) > > *twin* girls? Try 6-8 years. > > Congrats and felicits of the highest order, of course, Jeremy. Actually girls are fine until about 13, after that I expect Jeremy won't be too responsive. Something about hormones and such. In any case, all the best, Jeremy! -- Paul Barrett, PhD Space Telescope Science Institute Phone: 410-338-4475 ESS/Science Software Group FAX: 410-338-4767 Baltimore, MD 21218 From aahz@rahul.net Tue Jun 5 15:41:10 2001 From: aahz@rahul.net (Aahz Maruch) Date: Tue, 5 Jun 2001 07:41:10 -0700 (PDT) Subject: [Python-Dev] Happy event In-Reply-To: <3B1CD65A.595E8CD@STScI.Edu> from "Paul Barrett" at Jun 05, 2001 08:53:46 AM Message-ID: <20010605144110.DD90C99C84@waltz.rahul.net> Paul Barrett wrote: > Uche Ogbuji wrote: >> Guido: >>> >>> Also, don't expect Jeremy to be too responsive to email for the next >>> 6-8 weeks. :) >> >> *twin* girls? Try 6-8 years. > > Actually girls are fine until about 13, after that I expect Jeremy > won't be too responsive. Something about hormones and such. Are you trying to imply that there's a difference between girls and boys? compressing-a-five-screen-rant-down-to-a-single-sentence-ly y'rs -- --- Aahz (@pobox.com) Hugs and backrubs -- I break Rule 6 <*> http://www.rahul.net/aahz/ Androgynous poly kinky vanilla queer het Pythonista I don't really mind a person having the last whine, but I do mind someone else having the last self-righteous whine. From esr@thyrsus.com Tue Jun 5 15:55:59 2001 From: esr@thyrsus.com (Eric S. Raymond) Date: Tue, 5 Jun 2001 10:55:59 -0400 Subject: [Python-Dev] Happy event In-Reply-To: <20010605144110.DD90C99C84@waltz.rahul.net>; from aahz@rahul.net on Tue, Jun 05, 2001 at 07:41:10AM -0700 References: <3B1CD65A.595E8CD@STScI.Edu> <20010605144110.DD90C99C84@waltz.rahul.net> Message-ID: <20010605105559.A28963@thyrsus.com> Aahz Maruch : > Paul Barrett wrote: > > Uche Ogbuji wrote: > >> Guido: > >>> > >>> Also, don't expect Jeremy to be too responsive to email for the next > >>> 6-8 weeks. :) > >> > >> *twin* girls? Try 6-8 years. > > > > Actually girls are fine until about 13, after that I expect Jeremy > > won't be too responsive. Something about hormones and such. > > Are you trying to imply that there's a difference between girls and > boys? Of course there's a difference. Girls, er, *mature* sooner. Congratulations, Jeremy! -- Eric S. Raymond If I were to select a jack-booted group of fascists who are perhaps as large a danger to American society as I could pick today, I would pick BATF [the Bureau of Alcohol, Tobacco, and Firearms]. -- U.S. 
Representative John Dingell, 1980

From Samuele Pedroni Tue Jun 5 16:05:03 2001 From: Samuele Pedroni (Samuele Pedroni) Date: Tue, 5 Jun 2001 17:05:03 +0200 (MET DST) Subject: [Python-Dev] Happy event Message-ID: <200106051505.RAA24810@core.inf.ethz.ch> > Subject: Re: [Python-Dev] Happy event > From: aahz@rahul.net (Aahz Maruch) > Date: Tue, 5 Jun 2001 07:41:10 -0700 (PDT) > > Paul Barrett wrote: > > Uche Ogbuji wrote: > >> Guido: > >>> > >>> Also, don't expect Jeremy to be too responsive to email for the next > >>> 6-8 weeks. :) > >> > >> *twin* girls? Try 6-8 years. > > > > Actually girls are fine until about 13, after that I expect Jeremy > > won't be too responsive. Something about hormones and such. > > Are you trying to imply that there's a difference between girls and > boys? > > compressing-a-five-screen-rant-down-to-a-single-sentence-ly y'rs > -- The simple fact that we are still moving from the previous bad habit of considering them different to considering them equal itself implies and evolves differences. A neutral viewpoint would be: the N/S ratio between gender-physiological differences and the overall interpersonal differences is very big, at least when considering the whole personality and not single aspects. There is no established truth; we are just longing for equilibrium: in the actual transition phase boys and girls are under different kinds of cultural tensions related to self-identification, etc., and this makes for differences. regards, Samuele Pedroni.

From aahz@rahul.net Tue Jun 5 16:17:38 2001 From: aahz@rahul.net (Aahz Maruch) Date: Tue, 5 Jun 2001 08:17:38 -0700 (PDT) Subject: [Python-Dev] Happy event In-Reply-To: <20010605105559.A28963@thyrsus.com> from "Eric S. Raymond" at Jun 05, 2001 10:55:59 AM Message-ID: <20010605151739.3864199C83@waltz.rahul.net> Eric S. Raymond wrote: > Aahz Maruch : >> >> Are you trying to imply that there's a difference between girls and >> boys? > > Of course there's a difference. Girls, er, *mature* sooner. Not legally. -- --- Aahz (@pobox.com) Hugs and backrubs -- I break Rule 6 <*> http://www.rahul.net/aahz/ Androgynous poly kinky vanilla queer het Pythonista I don't really mind a person having the last whine, but I do mind someone else having the last self-righteous whine.

From esr@thyrsus.com Tue Jun 5 16:30:08 2001 From: esr@thyrsus.com (Eric S. Raymond) Date: Tue, 5 Jun 2001 11:30:08 -0400 Subject: [Python-Dev] Happy event In-Reply-To: <20010605151739.3864199C83@waltz.rahul.net>; from aahz@rahul.net on Tue, Jun 05, 2001 at 08:17:38AM -0700 References: <20010605105559.A28963@thyrsus.com> <20010605151739.3864199C83@waltz.rahul.net> Message-ID: <20010605113008.A29236@thyrsus.com> Aahz Maruch : > Eric S. Raymond wrote: > > Aahz Maruch : > >> > >> Are you trying to imply that there's a difference between girls and > >> boys? > > > > Of course there's a difference. Girls, er, *mature* sooner. > > Not legally. My point was that the hormone thing is likely to be an issue sooner with twin girls. Hey, Jeremy...fraternal or identical? -- Eric S. Raymond What is a magician but a practicing theorist?
-- Obi-Wan Kenobi, 'Return of the Jedi'

From guido@digicool.com Tue Jun 5 18:21:32 2001 From: guido@digicool.com (Guido van Rossum) Date: Tue, 05 Jun 2001 13:21:32 -0400 Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects? Message-ID: <200106051721.f55HLW729400@odiug.digicool.com> While thinking about metatypes, I had an interesting idea. In PEP 252 and 253 (which still need much work, please bear with me!) I describe making classes and types more similar to each other. In particular, you'll be able to subclass built-in object types in much the same way as you can subclass user-defined classes today. One nice property of classes is that a class is a factory function for its instances; in other words, if C is a class, C() returns a C instance. Now, for built-in types, it makes sense to do the same. In my current prototype, after "from types import *", DictType() returns an empty dictionary and ListType() returns an empty list. It would be nice to take this much further: IntType() could return an integer, TupleType() could return a tuple, StringType() could return a string, and so on. These are immutable types, so to make this useful, these constructors need to take an argument to specify a specific value. What should the type of such an argument be? It's not very interesting to require that int(x) takes an integer argument! Most of the popular standard types already have a constructor function that's named after their type:

int(), long(), float(), complex(), str(), unicode(), tuple(), list()

We could make the constructor take the same argument(s) as the corresponding built-in function. Now invoke the Zen of Python: "There should be one-- and preferably only one --obvious way to do it." So why not make these built-in functions *be* the corresponding types? Then instead of

>>> int
<built-in function int>

you would see

>>> int
<type 'int'>

but otherwise the behavior would be identical. (Note that I don't require that a factory function returns a *new* object each time.) If we did this for all built-in types, we'd have to add maybe a dozen new built-in names -- I think that's no big deal and actually helps naming types. The types module, with its awkward names and usage, can be deprecated. There are details to be worked out, e.g.

- Do we really want to have built-in names for code objects, traceback objects, and other figments of Python's internal workings?

- What should the argument to dict() be? A list of (key, value) pairs, a list of alternating keys and values, or something else?

- What else?

Comments? --Guido van Rossum (home page: http://www.python.org/~guido/)

From fredrik@pythonware.com Tue Jun 5 18:34:35 2001 From: fredrik@pythonware.com (Fredrik Lundh) Date: Tue, 5 Jun 2001 19:34:35 +0200 Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects? References: <200106051721.f55HLW729400@odiug.digicool.com> Message-ID: <001301c0ede5$cb804a10$e46940d5@hagrid> guido wrote: > Now invoke the Zen of Python: "There should be one-- and preferably > only one --obvious way to do it." So why not make these built-in > functions *be* the corresponding types? +1 from here. > - Do we really want to have built-in names for code objects, traceback > objects, and other figments of Python's internal workings? nope. > - What should the argument to dict() be? A list of (key, value) > pairs, a list of alternating keys and values, or something else? how about supporting the following:

d == dict(d.items())
d == dict(d.keys(), d.values())

and also:

d = dict(k=v, k=v, ...)
Cheers /F

From ping@lfw.org Tue Jun 5 18:41:22 2001 From: ping@lfw.org (Ka-Ping Yee) Date: Tue, 5 Jun 2001 12:41:22 -0500 (CDT) Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects? In-Reply-To: <200106051721.f55HLW729400@odiug.digicool.com> Message-ID: On Tue, 5 Jun 2001, Guido van Rossum wrote: > Now invoke the Zen of Python: "There should be one-- and preferably > only one --obvious way to do it." So why not make these built-in > functions *be* the corresponding types? Then instead of
>
> >>> int
> <built-in function int>
>
> you would see
>
> >>> int
> <type 'int'>
I'm all in favour of this. In fact, i had the impression that you were planning to do exactly this all along. I seem to recall some conversation about this a long time ago -- am i dreaming? > If we did this for all built-in types, we'd have to add maybe a dozen > new built-in names -- I think that's no big deal and actually helps > naming types. The types module, with its awkward names and usage, can > be deprecated. I would love this. > - Do we really want to have built-in names for code objects, traceback > objects, and other figments of Python's internal workings? Perhaps we would only provide built-in names for objects that are commonly constructed. For things like code objects that are never user-constructed, their type objects could be set aside in a module. > - What should the argument to dict() be? A list of (key, value) > pairs, a list of alternating keys and values, or something else? A list of (key, value) pairs. It's the only sensible choice, given that dict.items() is the obvious way to get all the information out of a dictionary into a list. -- ?!ng

From aahz@rahul.net Tue Jun 5 18:40:27 2001 From: aahz@rahul.net (Aahz Maruch) Date: Tue, 5 Jun 2001 10:40:27 -0700 (PDT) Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects? In-Reply-To: <200106051721.f55HLW729400@odiug.digicool.com> from "Guido van Rossum" at Jun 05, 2001 01:21:32 PM Message-ID: <20010605174027.17A4199C83@waltz.rahul.net> I'm +1 on the general concept; I think it will make explaining Python easier in the long run. I'm not competent to vote on the details, but I'll complain if something seems too confused to me. Currently in the Decimal class I'm working on, I can take any of the following types in the constructor: Decimal, tuple, string, int, float. I'm wondering whether that approach makes sense, that any "compatible" type should be accepted in an explicit constructor. So for your question about dict(), perhaps any sequence/iterator type that returns 2-element sequences would be accepted. -- --- Aahz (@pobox.com) Hugs and backrubs -- I break Rule 6 <*> http://www.rahul.net/aahz/ Androgynous poly kinky vanilla queer het Pythonista I don't really mind a person having the last whine, but I do mind someone else having the last self-righteous whine.

From Donald Beaudry Tue Jun 5 18:50:34 2001 From: Donald Beaudry (Donald Beaudry) Date: Tue, 05 Jun 2001 13:50:34 -0400 Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects? References: <200106051721.f55HLW729400@odiug.digicool.com> Message-ID: <200106051750.NAA25458@localhost.localdomain> Guido van Rossum wrote, > Now invoke the Zen of Python: "There should be one-- and preferably > only one --obvious way to do it." So why not make these built-in > functions *be* the corresponding types? I like it! > but otherwise the behavior would be identical. (Note that I don't > require that a factory function returns a *new* object each time.) Of course...
singletons (which would also break that requirement) are quite useful. > If we did this for all built-in types, we'd have to add maybe a dozen > new built-in names -- I think that's no big deal and actually helps > naming types. The types module, with its awkward names and usage, can > be deprecated. > > There are details to be worked out, e.g. > > - Do we really want to have built-in names for code objects, traceback > objects, and other figments of Python's internal workings? I dont think so. Having easy access to these things might be good but since they are implementation specific it might be best to discourage their use by putting them somewhere more implementation specific, like the newmodule or even sys. > - What should the argument to dict() be? A list of (key, value) > pairs, a list of alternating keys and values, or something else? At a minimum, I'd like to see a list of key/value tuples. I seem to find myself reconstructing dicts from the .items() of other dicts. For 'something else', I'd like to be able to pass keyword arguments to initialize the new dict. Going really crazy, I'd like to be able to pass a dict as an argument to dict()... just another way to spell copy, but combined with keywords, it would be more like copy followed by an update. > - What else? Well, since you are asking ;) I havnt read the PEP, so perhaps I shouldnt be commenting just yet, but. I'd hope that the built-in types are sub-classable from C as well as from Python. This is most interesting for types like instance, class, method, but I can imagine reasons for doing it to tuple, list, dict, and even int. > Comments? Fantastic! -- Donald Beaudry Ab Initio Software Corp. 201 Spring Street donb@init.com Lexington, MA 02421 ...Will hack for sushi... From mal@lemburg.com Tue Jun 5 18:53:18 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Tue, 05 Jun 2001 19:53:18 +0200 Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects? References: <200106051721.f55HLW729400@odiug.digicool.com> Message-ID: <3B1D1C8E.B7770419@lemburg.com> Guido van Rossum wrote: > > While thinking about metatypes, I had an interesting idea. > > In PEP 252 and 253 (which still need much work, please bear with me!) > I describe making classes and types more similar to each other. In > particular, you'll be able to subclass built-in object types in much > the same way as you can subclass user-defined classes today. One nice > property of classes is that a class is a factory function for its > instances; in other words, if C is a class, C() returns a C instance. > > Now, for built-in types, it makes sense to do the same. In my current > prototype, after "from types import *", DictType() returns an empty > dictionary and ListType() returns an empty list. It would be nice > take this much further: IntType() could return an integer, TupleType() > could return a tuple, StringType() could return a string, and so on. > These are immutable types, so to make this useful, these constructors > need to take an argument to specify a specific value. What should the > type of such an argument be? It's not very interesting to require > that int(x) takes an integer argument! > > Most of the popular standard types already have a constructor function > that's named after their type: > > int(), long(), float(), complex(), str(), unicode(), tuple(), list() > > We could make the constructor take the same argument(s) as the > corresponding built-in function. 
> > Now invoke the Zen of Python: "There should be one-- and preferably > > only one --obvious way to do it." So why not make these built-in > > functions *be* the corresponding types? Then instead of > > > > >>> int > > > > > > you would see > > > > >>> int > > > > > > but otherwise the behavior would be identical. (Note that I don't > > require that a factory function returns a *new* object each time.) -1 While this looks cute, I think it would break a lot of introspection code or other code which special cases Python functions for some reason since type(int) would no longer return types.BuiltinFunctionType. If you don't like the names, why not take the chance and create a new module which then exposes the Python class hierarchy (much like we did with the exceptions.py module before it was integrated as a C module) ?! > If we did this for all built-in types, we'd have to add maybe a dozen > new built-in names -- I think that's no big deal and actually helps > naming types. The types module, with its awkward names and usage, can > be deprecated. > > There are details to be worked out, e.g. > > - Do we really want to have built-in names for code objects, traceback > objects, and other figments of Python's internal workings? Not really. > - What should the argument to dict() be? A list of (key, value) > pairs, a list of alternating keys and values, or something else? As a function, I'd say: take either a sequence of tuples or another dictionary as argument. mxTools already has such a function, BTW. -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From skip@pobox.com (Skip Montanaro) Tue Jun 5 19:12:09 2001 From: skip@pobox.com (Skip Montanaro) (Skip Montanaro) Date: Tue, 5 Jun 2001 13:12:09 -0500 Subject: [Python-Dev] Shouldn't I be able to print Unicode objects? Message-ID: <15133.8441.983687.572159@beluga.mojam.com> Just catching up on a little c.l.py and I noticed the effbot's response to the Unicode degree inquiry. I tried to create and print one and got this: % python Python 2.1.1a1 (#9, Jun 4 2001, 11:32:33) [GCC 2.96 20000731 (Linux-Mandrake 8.0 2.96-0.48mdk)] on linux2 Type "copyright", "credits" or "license" for more information. >>> u"\N{DEGREE SIGN}" u'\xb0' >>> print u"\N{DEGREE SIGN}" Traceback (most recent call last): File "<stdin>", line 1, in ? UnicodeError: ASCII encoding error: ordinal not in range(128) Shouldn't I be able to print arbitrary Unicode objects? What am I missing (this time)? Skip From mwh@python.net Tue Jun 5 19:16:52 2001 From: mwh@python.net (Michael Hudson) Date: 05 Jun 2001 19:16:52 +0100 Subject: [Python-Dev] Shouldn't I be able to print Unicode objects? In-Reply-To: Skip Montanaro's message of "Tue, 5 Jun 2001 13:12:09 -0500" References: <15133.8441.983687.572159@beluga.mojam.com> Message-ID: Skip Montanaro writes: > Just catching up on a little c.l.py and I noticed the effbot's response to > the Unicode degree inquiry. I tried to create and print one and got this: > > % python > Python 2.1.1a1 (#9, Jun 4 2001, 11:32:33) > [GCC 2.96 20000731 (Linux-Mandrake 8.0 2.96-0.48mdk)] on linux2 > Type "copyright", "credits" or "license" for more information. > >>> u"\N{DEGREE SIGN}" > u'\xb0' > >>> print u"\N{DEGREE SIGN}" > > Traceback (most recent call last): > File "<stdin>", line 1, in ? > UnicodeError: ASCII encoding error: ordinal not in range(128) > > Shouldn't I be able to print arbitrary Unicode objects?
What am I missing > (this time)? The encoding: >>> print u"\N{DEGREE SIGN}".encode("latin1") ° Cheers, Skippy's little helper. -- In case you're not a computer person, I should probably point out that "Real Soon Now" is a technical term meaning "sometime before the heat-death of the universe, maybe". -- Scott Fahlman From guido@digicool.com Tue Jun 5 19:26:22 2001 From: guido@digicool.com (Guido van Rossum) Date: Tue, 05 Jun 2001 14:26:22 -0400 Subject: [Python-Dev] SourceForget Python Foundry needs help Message-ID: <200106051826.f55IQMS29540@odiug.digicool.com> The Python Foundry at SF could use a hand. If you're interested in helping out, please write to Chuck Esterbrook, below! --Guido van Rossum (home page: http://www.python.org/~guido/) ------- Forwarded Message Date: Tue, 05 Jun 2001 14:12:07 -0400 From: Chuck Esterbrook To: guido@python.org Subject: SourceForget Python Foundry Hi Guido, I'm one of the admins of the SourceForge Python Foundry. In case you're not familiar with them, foundries are simply SF web portals centered around a particular topic. Admins can customize the HTML text and graphics and SourceForge stats are integrated on the side. I haven't had much time to give the Python Foundry the attention it deserves. I was wondering if you knew of anyone who had the inclination, time and energy to join the Foundry as an admin and expand it. If it becomes strong enough, we could possibly get it featured on the sidebar of the main SF page, which would then bring more attention to Python and its related projects. The foundry is at: http://sourceforge.net/foundry/python-foundry/ - -Chuck ------- End of Forwarded Message From barry@digicool.com Tue Jun 5 19:31:12 2001 From: barry@digicool.com (Barry A. Warsaw) Date: Tue, 5 Jun 2001 14:31:12 -0400 Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects? References: <200106051721.f55HLW729400@odiug.digicool.com> Message-ID: <15133.9584.871074.255497@anthem.wooz.org> >>>>> "GvR" == Guido van Rossum writes: GvR> Now invoke the Zen of Python: "There should be one-- and GvR> preferably only one --obvious way to do it." So why not make GvR> these built-in functions *be* the corresponding types? Then GvR> instead of >> int GvR> GvR> you would see >> int GvR> +1 GvR> but otherwise the behavior would be identical. (Note that I GvR> don't require that a factory function returns a *new* object GvR> each time.) GvR> If we did this for all built-in types, we'd have to add maybe GvR> a dozen new built-in names -- I think that's no big deal and GvR> actually helps naming types. The types module, with its GvR> awkward names and usage, can be deprecated. I'm a little concerned about this, since the names that would be added are probably in common use as variable and/or argument names. I.e. At one point `list' was a very common identifier in Mailman, and I'm sure `dict' is used quite often still. I guess this would be okay as long as working code doesn't break because of it. OTOH, I've had fewer needs for a dict builtin (though not non-zero), and easily zero needs for traceback objects, code objects, etc. GvR> There are details to be worked out, e.g. GvR> - Do we really want to have built-in names for code objects, GvR> traceback objects, and other figments of Python's internal GvR> workings? I'd say no. However, we could probably C-ify the types module, a la, the exceptions module, and that would be the logical place to put the type factories. GvR> - What should the argument to dict() be? 
A list of (key, GvR> value) pairs, a list of alternating keys and values, or GvR> something else? You definitely want to at least accept a sequence of key/value 2-tuples, so that d.items() can be retransformed into a dictionary object. -Barry From guido@digicool.com Tue Jun 5 19:38:23 2001 From: guido@digicool.com (Guido van Rossum) Date: Tue, 05 Jun 2001 14:38:23 -0400 Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects? In-Reply-To: Your message of "Tue, 05 Jun 2001 14:31:12 EDT." <15133.9584.871074.255497@anthem.wooz.org> References: <200106051721.f55HLW729400@odiug.digicool.com> <15133.9584.871074.255497@anthem.wooz.org> Message-ID: <200106051838.f55IcNk29624@odiug.digicool.com> > I'm a little concerned about this, since the names that would be added > are probably in common use as variable and/or argument names. I.e. At > one point `list' was a very common identifier in Mailman, and I'm sure > `dict' is used quite often still. I guess this would be okay as long > as working code doesn't break because of it. It would be hard to see how this would break code, since built-ins are searched *after* all variables that the user defines. --Guido van Rossum (home page: http://www.python.org/~guido/) From bckfnn@worldonline.dk Tue Jun 5 19:46:04 2001 From: bckfnn@worldonline.dk (Finn Bock) Date: Tue, 05 Jun 2001 18:46:04 GMT Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects? In-Reply-To: <200106051721.f55HLW729400@odiug.digicool.com> References: <200106051721.f55HLW729400@odiug.digicool.com> Message-ID: <3b1d2894.16564838@smtp.worldonline.dk> [Guido] >Now invoke the Zen of Python: "There should be one-- and preferably >only one --obvious way to do it." So why not make these built-in >functions *be* the corresponding types? Then instead of > > >>> int > > >you would see > > >>> int > > >but otherwise the behavior would be identical. (Note that I don't >require that a factory function returns a *new* object each time.) I think that it will be difficult to avoid creating a new object under jython because calling a type already directly calls the type's java constructor. >If we did this for all built-in types, we'd have to add maybe a dozen >new built-in names -- I think that's no big deal and actually helps >naming types. The types module, with its awkward names and usage, can >be deprecated. > >There are details to be worked out, e.g. > >- Do we really want to have built-in names for code objects, traceback > objects, and other figments of Python's internal workings? > >- What should the argument to dict() be? A list of (key, value) > pairs, a list of alternating keys and values, or something else? Jython already interprets the arguments to the dict type as alternating key/values: >>> from types import DictType as dict >>> dict('a', 97, 'b', 98, 'c', 99) {'b': 98, 'a': 97, 'c': 99} >>> This behaviour isn't documented on the python side so it can be changed. However, it is necessary to maintain this API on the java side and we have currently no way to prevent the type constructors from being visible and callable from python. Whatever is decided, I hope jython can keep the current semantics of its dict type. regards, finn From fdrake@acm.org Tue Jun 5 20:11:58 2001 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Tue, 5 Jun 2001 15:11:58 -0400 (EDT) Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects?
In-Reply-To: <3b1d2894.16564838@smtp.worldonline.dk> References: <200106051721.f55HLW729400@odiug.digicool.com> <3b1d2894.16564838@smtp.worldonline.dk> Message-ID: <15133.12030.538647.295809@cj42289-a.reston1.va.home.com> Finn Bock writes: > >>> from types import DictType as dict > >>> dict('a', 97, 'b', 98, 'c', 99) > {'b': 98, 'a': 97, 'c': 99} > >>> > > This behaviour isn't documented on the python side so it can be changed. > However, it is necessary to maintain this API on the java side and we > have currently no way to prevent the type constructors from being > visible and callable from python. This should not be a problem: If dict() is called with one arg, the new semantics can be used, but with an odd number of args, your existing semantics can be used. -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From skip@pobox.com (Skip Montanaro) Tue Jun 5 20:23:54 2001 From: skip@pobox.com (Skip Montanaro) (Skip Montanaro) Date: Tue, 5 Jun 2001 14:23:54 -0500 Subject: [Python-Dev] Shouldn't I be able to print Unicode objects? In-Reply-To: References: <15133.8441.983687.572159@beluga.mojam.com> Message-ID: <15133.12746.666351.127286@beluga.mojam.com> Me> [what am I missing?] Michael> The encoding: >>> print u"\N{DEGREE SIGN}".encode("latin1") ° Hmmm... I don't believe I've ever encountered an object in Python before that you couldn't simply print. Are Unicode objects unique in this respect? Seems like a bug (or at least a feature) to me. Skip From mwh@python.net Tue Jun 5 20:31:33 2001 From: mwh@python.net (Michael Hudson) Date: 05 Jun 2001 20:31:33 +0100 Subject: [Python-Dev] Shouldn't I be able to print Unicode objects? In-Reply-To: Skip Montanaro's message of "Tue, 5 Jun 2001 14:23:54 -0500" References: <15133.8441.983687.572159@beluga.mojam.com> <15133.12746.666351.127286@beluga.mojam.com> Message-ID: Skip Montanaro writes: > Me> [what am I missing?] > > Michael> The encoding: > > >>> print u"\N{DEGREE SIGN}".encode("latin1") > ° > > Hmmm... I don't believe I've ever encountered an object in Python before > that you couldn't simply print. Are Unicode objects unique in this respect? > Seems like a bug (or at least a feature) to me. Well, what would you have >>> print u"\N{DEGREE SIGN}" (or equivalently str(u"\N{DEGREE SIGN}") since we're eventually going to have to stuff an 8-bit string down stdout) do? I don't think >>> print u"\N{DEGREE SIGN}" u'\xb0' is really an option. This is old news. It must have been discussed here before 1.6, I'd have thought. Cheers, M. -- 58. Fools ignore complexity. Pragmatists suffer it. Some can avoid it. Geniuses remove it. -- Alan Perlis, http://www.cs.yale.edu/homes/perlis-alan/quotes.html From barry@digicool.com Tue Jun 5 20:46:54 2001 From: barry@digicool.com (Barry A. Warsaw) Date: Tue, 5 Jun 2001 15:46:54 -0400 Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects? References: <200106051721.f55HLW729400@odiug.digicool.com> <15133.9584.871074.255497@anthem.wooz.org> <200106051838.f55IcNk29624@odiug.digicool.com> Message-ID: <15133.14126.221568.235269@anthem.wooz.org> >>>>> "GvR" == Guido van Rossum writes: >> I'm a little concerned about this, since the names that would >> be added are probably in common use as variable and/or argument >> names. I.e. At one point `list' was a very common identifier >> in Mailman, and I'm sure `dict' is used quite often still. I >> guess this would be okay as long as working code doesn't break >> because of it.
GvR> It would be hard to see how this would break code, since GvR> built-ins are searched *after* all variables that the user GvR> defines. Wasn't there talk about issuing warnings for locals shadowing built-ins (or was that globals?). If not, fergitaboutit. If so, that would fall under the category of "breaking". -Barry From tim@digicool.com Tue Jun 5 20:56:59 2001 From: tim@digicool.com (Tim Peters) Date: Tue, 5 Jun 2001 15:56:59 -0400 Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects? Message-ID: Just to reduce this to its most trivial point , > - What should the argument to dict() be? A list of (key, value) > pairs, a list of alternating keys and values, or something else? the middle one (perhaps generalized to "iterable object alternately producing keys and values") is most useful in practice. Perl gets a lot of mileage of that, e.g. think of using re.findall() to build a list of mail-header field, value, field, value, ... thingies to feed to a dict. A list of (key, value) pairs is prettiest, but almost nothing *produces* such a list except for dict.items(); we don't need another way to spell dict.copy(). From guido@digicool.com Tue Jun 5 20:56:05 2001 From: guido@digicool.com (Guido van Rossum) Date: Tue, 05 Jun 2001 15:56:05 -0400 Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects? In-Reply-To: Your message of "Tue, 05 Jun 2001 15:46:54 EDT." <15133.14126.221568.235269@anthem.wooz.org> References: <200106051721.f55HLW729400@odiug.digicool.com> <15133.9584.871074.255497@anthem.wooz.org> <200106051838.f55IcNk29624@odiug.digicool.com> <15133.14126.221568.235269@anthem.wooz.org> Message-ID: <200106051956.f55Ju5130078@odiug.digicool.com> > >>>>> "GvR" == Guido van Rossum writes: > > >> I'm a little concerned about this, since the names that would > >> be added are probably in common use as variable and/or argument > >> names. I.e. At one point `list' was a very common identifier > >> in Mailman, and I'm sure `dict' is used quite often still. I > >> guess this would be okay as long as working code doesn't break > >> because of it. > > GvR> It would be hard to see how this would break code, since > GvR> built-ins are searched *after* all variables that the user > GvR> defines. > > Wasn't there talk about issuing warnings for locals shadowing > built-ins (or was that globals?). If not, fergitaboutit. If so, that > would fall under the category of "breaking". > > -Barry You may be thinking of this: >>> def f(int): def g(): int :1: SyntaxWarning: local name 'int' in 'f' shadows use of 'int' as global in nested scope 'g' >>> This warns you when you override a built-in or global *and* you use that same name in a nested function. This code will mean something different in 2.2 anyway (g's reference to int will become a reference to f's int because of nested scopes). But this does not cause a warning: >>> def g(): int = 12 >>> Nor does this: >>> int = 12 >>> So we're safe. --Guido van Rossum (home page: http://www.python.org/~guido/) From skip@pobox.com (Skip Montanaro) Tue Jun 5 21:01:47 2001 From: skip@pobox.com (Skip Montanaro) (Skip Montanaro) Date: Tue, 5 Jun 2001 15:01:47 -0500 Subject: [Python-Dev] Shouldn't I be able to print Unicode objects? 
In-Reply-To: References: <15133.8441.983687.572159@beluga.mojam.com> <15133.12746.666351.127286@beluga.mojam.com> Message-ID: <15133.15019.237484.605267@beluga.mojam.com> Michael> Well, what would you have >>>> print u"\N{DEGREE SIGN}" Michael> (or equivalently Michael> str(u"\N{DEGREE SIGN}") Michael> since we're eventually going to have to stuff an 8-bit string Michael> down stdout) do? How about if print calls the .encode("latin1") method for me when it gets an ASCII encoding error? If "latin1" isn't a reasonable default choice, it could pick an encoding based on the current locale. Michael> I don't think >>>> print u"\N{DEGREE SIGN}" Michael> u'\xb0' Michael> is really an option. I agree. I'd like to see a little circle. Michael> This is old news. It must have been discussed here before 1.6, Michael> I'd have thought. Perhaps, but I suspect many people suffered from glazing over of the eyes reading all the messages exchanged about Unicode arcana. I know I did. Skip From barry@digicool.com Tue Jun 5 21:01:29 2001 From: barry@digicool.com (Barry A. Warsaw) Date: Tue, 5 Jun 2001 16:01:29 -0400 Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects? References: <200106051721.f55HLW729400@odiug.digicool.com> <15133.9584.871074.255497@anthem.wooz.org> <200106051838.f55IcNk29624@odiug.digicool.com> <15133.14126.221568.235269@anthem.wooz.org> <200106051956.f55Ju5130078@odiug.digicool.com> Message-ID: <15133.15001.19308.108288@anthem.wooz.org> >>>>> "GvR" == Guido van Rossum writes: GvR> You may be thinking of this: Yup. GvR> So we're safe. Cool! Count me as a solid +1 then. -Barry From aahz@rahul.net Tue Jun 5 21:10:06 2001 From: aahz@rahul.net (Aahz Maruch) Date: Tue, 5 Jun 2001 13:10:06 -0700 (PDT) Subject: [Python-Dev] Shouldn't I be able to print Unicode objects? In-Reply-To: <15133.15019.237484.605267@beluga.mojam.com> from "Skip Montanaro" at Jun 05, 2001 03:01:47 PM Message-ID: <20010605201006.15CAD99C83@waltz.rahul.net> Skip Montanaro wrote: > > Perhaps, but I suspect many people suffered from glazing over of the eyes > reading all the messages exchanged about Unicode arcana. I know I did. Ditto. -- --- Aahz (@pobox.com) Hugs and backrubs -- I break Rule 6 <*> http://www.rahul.net/aahz/ Androgynous poly kinky vanilla queer het Pythonista I don't really mind a person having the last whine, but I do mind someone else having the last self-righteous whine. From mal@lemburg.com Tue Jun 5 21:14:39 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Tue, 05 Jun 2001 22:14:39 +0200 Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects? References: <200106051721.f55HLW729400@odiug.digicool.com> <3B1D1C8E.B7770419@lemburg.com> Message-ID: <3B1D3DAF.DAE727AE@lemburg.com> > > [Guido] > > Now invoke the Zen of Python: "There should be one-- and preferably > > only one --obvious way to do it." So why not make these built-in > > functions *be* the corresponding types? Then instead of > > > > >>> int > > > > > > you would see > > > > >>> int > > > > > > but otherwise the behavior would be identical. (Note that I don't > > require that a factory function returns a *new* object each time.) > > -1 > > While this looks cute, I think it would break a lot of introspection > > code or other code which special cases Python functions for > > some reason since type(int) would no longer return > > types.BuiltinFunctionType.
> > If you don't like the names, why not take the change and > create a new module which then exposes the Python class hierarchy > (much like we did with the exceptions.py module before it was > intregrated as C module) ?! Looks like I'm alone with my uncertain feeling about this move... oh well. BTW, we should consider having more than one contructor for an object rather than trying to stuff all possible options and parameters into one overloaded super-constructor. I've done this in many of my mx extensions and have so far had great success with it (better programming error detection, better docs, more intuitive interfaces, etc.). In that sense, more than one way to do something will actually help clarify what the programmer really wanted. Just a thought... -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From mal@lemburg.com Tue Jun 5 21:16:02 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Tue, 05 Jun 2001 22:16:02 +0200 Subject: [Python-Dev] Shouldn't I be able to print Unicode objects? References: <15133.8441.983687.572159@beluga.mojam.com> <15133.12746.666351.127286@beluga.mojam.com> <15133.15019.237484.605267@beluga.mojam.com> Message-ID: <3B1D3E02.3C9AE1F4@lemburg.com> Skip Montanaro wrote: > > Michael> Well, what would you have > > >>>> print u"\N{DEGREE SIGN}" > > Michael> (or equivalently > > Michael> str(u"\N{DEGREE SIGN}") > > Michael> since we're eventually going to have to stuff an 8-bit string > Michael> down stdout) do? > > How about if print calls the .encode("latin1") method for me it gets an > ASCII encoding error? If "latin1" isn't a reasonable default choice, it > could pick an encoding based on the current locale. Please see Lib/site.py for details on how to enable all these goodies -- it's all there, just disabled and meant for super-users only ;-) -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From guido@digicool.com Tue Jun 5 21:22:43 2001 From: guido@digicool.com (Guido van Rossum) Date: Tue, 05 Jun 2001 16:22:43 -0400 Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects? In-Reply-To: Your message of "Tue, 05 Jun 2001 22:14:39 +0200." <3B1D3DAF.DAE727AE@lemburg.com> References: <200106051721.f55HLW729400@odiug.digicool.com> <3B1D1C8E.B7770419@lemburg.com> <3B1D3DAF.DAE727AE@lemburg.com> Message-ID: <200106052022.f55KMhq30227@odiug.digicool.com> > > -1 > > > > While this looks cute, I think it would break a lot of introspection > > code or other code which special cases Python functions for > > some reason since type(int) would no longer return > > types.BuiltinFunctionType. > > Looks like I'm alone with my uncertain feeling about this move... > oh well. Well, I don't see how someone could be doing introspection on int and be confused when it's not a function -- either you (think you) know it's a function, so you use it as a function without introspecting it, and that continues to work; or you're open to all possibilities, and then you'll introspect it, and then you'll discover what it is. > BTW, we should consider having more than one contructor for an > object rather than trying to stuff all possible options and parameters > into one overloaded super-constructor. 
I've done this in many of > my mx extensions and have so far had great success with it (better > programming error detection, better docs, more intuitive interfaces, > etc.). In that sense, more than one way to do something will > actually help clarify what the programmer really wanted. Just > a thought... Yes, but the other ways are spelled as factory functions. Maybe, *maybe* the other factory functions could be class-methods, but don't hold your hopes high. --Guido van Rossum (home page: http://www.python.org/~guido/) From martin@loewis.home.cs.tu-berlin.de Tue Jun 5 21:30:18 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Tue, 5 Jun 2001 22:30:18 +0200 Subject: [Python-Dev] Shouldn't I be able to print Unicode objects? Message-ID: <200106052030.f55KUIu02762@mira.informatik.hu-berlin.de> > How about if print calls the .encode("latin1") method for me it gets an > ASCII encoding error? If "latin1" isn't a reasonable default choice, it > could pick an encoding based on the current locale. These are both bad ideas. First, there is no guarantee that your terminal is capable of displaying the circle at all. Maybe the typewriter connected to your computer doesn't even have a degree type. Further, maybe it does support displaying the degree sign, but then it likely fails for >>> print u"\N{EURO SIGN}" Or, worse, instead of displaying the EURO SIGN, it may just display the CURRENCY SIGN (since it may chose to use ISO-8859-15, but the terminal assumes ISO-8859-1). So unless you can come up with a really good way to find out what the terminal is capable of displaying (plus finding out how to make it display these things), I think Python is better off raising an exception than producing garbage output. In addition, what you see is the "default encoding", i.e. it doesn't just apply to print; it also applies to all places where Unicode objects are converted into byte strings. Assuming any default other than ASCII has been considered as a bad idea by the authors of the Unicode support. IMO, the next-most reasonable default would have been UTF-8, *not* Latin-1, since UTF-8 can represent the EURO SIGN and every other character in Unicode. Most likely, you terminal will have difficulties producing a circle symbol when it gets the UTF-8 representation of the DEGREE SIGN, though. So the best thing is still to give it into the hands of the application author. As MAL points out, the administrator can give a different default encoding in site.py. Since the default default is ASCII, applications assuming that the default is ASCII won't break on your system. OTOH, applications developed on your system may then break elsewhere, since the default in site.py might be different. Regards, Martin From sdm7g@Virginia.EDU Tue Jun 5 21:41:11 2001 From: sdm7g@Virginia.EDU (Steven D. Majewski) Date: Tue, 5 Jun 2001 16:41:11 -0400 (EDT) Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects? In-Reply-To: <200106051721.f55HLW729400@odiug.digicool.com> Message-ID: On Tue, 5 Jun 2001, Guido van Rossum wrote: > Now invoke the Zen of Python: "There should be one-- and preferably > only one --obvious way to do it." So why not make these built-in > functions *be* the corresponding types? Then instead of +1 > - Do we really want to have built-in names for code objects, traceback > objects, and other figments of Python's internal workings? I would say to put all of the common constructors in __builtin__, and all of the odd ducks can go into the new module. 
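A sketch of how that split already looks in 2.1, using only the existing types module (just an illustration; nothing here depends on the proposal):

    >>> import types
    >>> codeobj = compile("0", "<string>", "eval")
    >>> type(codeobj) is types.CodeType   # an "odd duck", via the module
    1
    >>> int("42"), str(42), tuple([1, 2]) # common constructors, already builtins
    (42, '42', (1, 2))

Under the proposal the common names would additionally *be* the type objects, while code, traceback and friends would stay tucked away in a module.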
> - What should the argument to dict() be? A list of (key, value) > pairs, a list of alternating keys and values, or something else? A varargs list of (key,value) tuples would probably be most useful. Since most of these functions, before being classed as constructors, were considered coercion functions, I wouldn't be against having it try to do something sensible with a variety of args. -- sdm From skip@pobox.com (Skip Montanaro) Tue Jun 5 21:47:17 2001 From: skip@pobox.com (Skip Montanaro) (Skip Montanaro) Date: Tue, 5 Jun 2001 15:47:17 -0500 Subject: [Python-Dev] Shouldn't I be able to print Unicode objects? In-Reply-To: <3B1D3E02.3C9AE1F4@lemburg.com> References: <15133.8441.983687.572159@beluga.mojam.com> <15133.12746.666351.127286@beluga.mojam.com> <15133.15019.237484.605267@beluga.mojam.com> <3B1D3E02.3C9AE1F4@lemburg.com> Message-ID: <15133.17749.390756.115544@beluga.mojam.com> mal> Please see Lib/site.py for details on how to enable all these mal> goodies -- it's all there, just disabled and meant for super-users mal> only ;-) Okay, I found the encoding section. I changed the encoding variable assignment to be encoding = "latin1" and now the degree sign print works. What other side-effects will that have besides on printed representations? It appears I can create (but not see properly?) variable names containing latin1 characters: >>> ümlaut = "ümlaut" >>> print locals().keys() ['orca', 'dir', '__doc__', 'rlcompleter', 'missionb', 'version', 'dirpat', 'xmlrpclib', 'belugab', '__builtin__', 'beluga', 'readline', '__name__', 'orcab', 'addpath', 'Writer', 'atexit', 'sys', 'dolphinb', 'mission', 'pprint', 'dolphin', '__builtins__', 'mlaut', 'help'] I am having trouble printing some strings containing latin1 characters: >>> print ümlaut mlaut >>> type("ümlaut") >>> type(string.letters) >>> print "ümlaut" mlaut >>> print string.letters abcdefghijklmnopqrstuvwxyzµßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþÿABCDEFGHIJKLMNOPQRSTUVWXYZÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞ >>> print string.letters[55:] üýþÿABCDEFGHIJKLMNOPQRSTUVWXYZÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞ The above was pasted from Python running in a shell session in XEmacs, which is certainly latin1-aware. Why did I have trouble seeing the ü in some situations, but not in others? Are the ramifications of all this encoding stuff documented somewhere? Skip From skip@pobox.com (Skip Montanaro) Tue Jun 5 21:56:58 2001 From: skip@pobox.com (Skip Montanaro) (Skip Montanaro) Date: Tue, 5 Jun 2001 15:56:58 -0500 Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects? References: <200106051721.f55HLW729400@odiug.digicool.com> Message-ID: <15133.18330.910736.249838@beluga.mojam.com> Is the intent of using int and friends as constructors instead of just coercion functions that I should (eventually) be able to do this: class NonNegativeInt(int): def __init__(self, val): if int(val) < 0: raise ValueError, "Value must be >= 0" int.__init__(self, val) self.a = 47 ... ? Skip From tim@digicool.com Tue Jun 5 22:01:23 2001 From: tim@digicool.com (Tim Peters) Date: Tue, 5 Jun 2001 17:01:23 -0400 Subject: [Python-Dev] another dict crasher Message-ID: [Tim's dict-crasher dies w/ a stack overflow, but with a KeyError when he sticks a print inside __eq__] OK, I understand this now, at least on Windows.
In PyObject_Print(), #ifdef USE_STACKCHECK if (PyOS_CheckStack()) { PyErr_SetString(PyExc_MemoryError, "stack overflow"); return -1; } #endif On Windows, PyOs_CheckStack() is __try { /* _alloca throws a stack overflow exception if there's not enough space left on the stack */ _alloca(PYOS_STACK_MARGIN * sizeof(void*)); return 0; } __except (EXCEPTION_EXECUTE_HANDLER) { /* just ignore all errors */ } return 1; The _alloca dies, so the __except falls thru and PyOs_CheckStack returns 1. PyObject_Print sets the "stack overflow" error and returns -1. This winds its way thru the rich comparison attempt, until lookdict() sees it and says, Hmm. I can't compare this thing without raising error. So this can't be the key I'm looking for. First I'll clear the error. Hmm. Can't find it anywhere else in the dict either. Hmm. There were no errors pending at the time I got called, so I'll leave things that way and return "not found". At that point about 15,000 levels of recursion unwind, and KeyError gets raised. I don't believe PyOS_CheckStack() is implemented on Unixoid systems (just Windows and Macs), so some other accident must account for the KeyError on Linux. Remains unclear what to do about it; the idea that all errors raised by dict lookup comparisons are ignorable is sure a tempting target. From mal@lemburg.com Tue Jun 5 22:00:23 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Tue, 05 Jun 2001 23:00:23 +0200 Subject: [Python-Dev] Shouldn't I be able to print Unicode objects? References: <15133.8441.983687.572159@beluga.mojam.com> <15133.12746.666351.127286@beluga.mojam.com> <15133.15019.237484.605267@beluga.mojam.com> <3B1D3E02.3C9AE1F4@lemburg.com> <15133.17749.390756.115544@beluga.mojam.com> Message-ID: <3B1D4866.A40AAB1C@lemburg.com> Skip Montanaro wrote: > > mal> Please see Lib/site.py for details on how to enable all these > mal> goodies -- it's all there, just disabled and meant for super-users > mal> only ;-) > > Okay, I found the encoding section. I changed the encoding variable > assignment to be > > encoding = "latin1" > > and now the degree sign print works. What other side-effects will that have > besides on printed representations? It appears I can create (but not see > properly?) variable names containing latin1 characters: > > >>> ümlaut = "ümlaut" Huh ? That should not be possible ! Python literals are still ASCII. >>> ümlaut = 'ümlaut' File "", line 1 ümlaut = 'ümlaut' ^ SyntaxError: invalid syntax > >>> print locals().keys() > ['orca', 'dir', '__doc__', 'rlcompleter', 'missionb', 'version', 'dirpat', 'xmlrpclib', 'belugab', '__builtin__', 'beluga', 'readline', '__name__', 'orcab', 'addpath', 'Writer', 'atexit', 'sys', 'dolphinb', 'mission', 'pprint', 'dolphin', '__builtins__', 'mlaut', 'help'] > > I am having trouble printing some strings containing latin1 characters: > > >>> print ümlaut > mlaut > >>> type("ümlaut") > > >>> type(string.letters) > > >>> print "ümlaut" > mlaut > >>> print string.letters > abcdefghijklmnopqrstuvwxyzµßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþÿABCDEFGHIJKLMNOPQRSTUVWXYZÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞ > >>> print string.letters[55:] > üýþÿABCDEFGHIJKLMNOPQRSTUVWXYZÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞ > > The above was pasted from Python running in a shell session in XEmacs, which > is certainly latin1-aware. Why did I have trouble seeing the ü in some > situations, but not in others? No idea what's going on there... the encoding parameter should not have any effect on printing normal 8-bit strings. 
It only defines the standard encoding used in coercion and auto-conversion from Unicode to 8-bit strings and vice-versa. > Are the ramifications of all this encoding stuff documented somewhere? The basic things can be found in Misc/unicode.txt, on the i18n sig page and some resources on the web. I'll give a talk in Bordeaux about Unicode too, which will probably provide some additional help as well. -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From guido@digicool.com Tue Jun 5 22:14:07 2001 From: guido@digicool.com (Guido van Rossum) Date: Tue, 05 Jun 2001 17:14:07 -0400 Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects? In-Reply-To: Your message of "Tue, 05 Jun 2001 16:59:01 EDT." References: Message-ID: <200106052114.f55LE7P30481@odiug.digicool.com> > Is the intent of using int and friends as constructors instead of just > coercion functions that I should (eventually) be able to do this: > > class NonNegativeInt(int): > def __init__(self, val): > if int(val) < 0: > raise ValueError, "Value must be >= 0" > int.__init__(self, val) > self.a = 47 > ... > > ? Yes, sort-of. The details will be slightly different. I'm not comfortable with letting a user-provided __init__() method change the value of self, so I am brooding on a work-around that separates allocation and one-time initialization from __init__(). Watch PEP 253. --Guido van Rossum (home page: http://www.python.org/~guido/) From tim@digicool.com Tue Jun 5 22:16:03 2001 From: tim@digicool.com (Tim Peters) Date: Tue, 5 Jun 2001 17:16:03 -0400 Subject: [Python-Dev] Shouldn't I be able to print Unicode objects? Message-ID: [MAL, to Skip] > Huh ? That should not be possible ! Python literals are still > ASCII. > > >>> ümlaut = 'ümlaut' > File "", line 1 > ümlaut = 'ümlaut' > ^ > SyntaxError: invalid syntax That was Guido's intent, and what the Ref Man says, but the tokenizer uses C's isalpha() so in reality it's locale-dependent. I think at least one German on Python-Dev has already threatened to kill him if he ever fixes this bug . From gward@python.net Tue Jun 5 23:29:49 2001 From: gward@python.net (Greg Ward) Date: Tue, 5 Jun 2001 18:29:49 -0400 Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects? In-Reply-To: <200106051721.f55HLW729400@odiug.digicool.com>; from guido@digicool.com on Tue, Jun 05, 2001 at 01:21:32PM -0400 References: <200106051721.f55HLW729400@odiug.digicool.com> Message-ID: <20010605182949.A7545@gerg.ca> On 05 June 2001, Guido van Rossum said: > Now invoke the Zen of Python: "There should be one-- and preferably > only one --obvious way to do it." So why not make these built-in > functions *be* the corresponding types? Then instead of +1 from me too. > If we did this for all built-in types, we'd have to add maybe a dozen > new built-in names -- I think that's no big deal and actually helps > naming types. The types module, with its awkward names and usage, can > be deprecated. Cool! > There are details to be worked out, e.g. > > - Do we really want to have built-in names for code objects, traceback > objects, and other figments of Python's internal workings? Probably not, as long as they are accessible somewhere. I could live with either a C-ified 'types' module or shoving these into the 'new' module, although I think I prefer the latter slightly. > - What should the argument to dict() be? 
A list of (key, value) > pairs, a list of alternating keys and values, or something else? I love /F's suggestion dict(k=v, k=v, ...) but that's icing on the cake -- cool feature, looks pretty, etc. (And *finally* Python will have all the syntactic sugar that Perl programmers like to have. ;-) I think the real answer should be dict(k, v, k, v) like Jython. If both can be supported, that would be swell. Greg -- Greg Ward - Linux geek gward@python.net http://starship.python.net/~gward/ Does your DRESSING ROOM have enough ASPARAGUS? From barry@digicool.com Tue Jun 5 23:45:00 2001 From: barry@digicool.com (Barry A. Warsaw) Date: Tue, 5 Jun 2001 18:45:00 -0400 Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects? References: <200106051721.f55HLW729400@odiug.digicool.com> <20010605182949.A7545@gerg.ca> Message-ID: <15133.24812.791796.557452@anthem.wooz.org> >>>>> "GW" == Greg Ward writes: GW> I love /F's suggestion GW> dict(k=v, k=v, ...) One problem with this syntax is that the `k's can only be valid Python identifiers, so you'd at least need /some/ other syntax to support construction with arbitrary hashable keys. -Barry From fredrik@pythonware.com Tue Jun 5 23:57:43 2001 From: fredrik@pythonware.com (Fredrik Lundh) Date: Wed, 6 Jun 2001 00:57:43 +0200 Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects? References: <200106051721.f55HLW729400@odiug.digicool.com> <20010605182949.A7545@gerg.ca> Message-ID: <011f01c0ee12$eeda9ba0$0900a8c0@spiff> greg wrote: > > - What should the argument to dict() be? A list of (key, value) > > pairs, a list of alternating keys and values, or something else? > > I love /F's suggestion > > dict(k=v, k=v, ...) > > but that's icing on the cake -- cool feature, looks pretty, etc. note that the python interpreter builds that dictionary for you if you use the METH_KEYWORDS flag... > I think the real answer should be > > dict(k, v, k, v) > > like Jython. given that Jython already gives a meaning to dict with more than one argument, I suggest: dict(d) # consistency dict(k, v, k, v, ...) # jython compatibility dict(*[k, v, k, v, ...]) # convenience dict(k=v, k=v, ...) # common pydiom and maybe: dict(d.items()) # symmetry > If both can be supported, that would be swell. how about: if (PyTuple_GET_SIZE(args)) { assert PyDict_GET_SIZE(kw) == 0 if (PyTuple_GET_SIZE(args) == 1) { args = PyTuple_GET_ITEM(args, 0); if (PyDict_Check(args)) dict = args.copy() else if (PySequence_Check(args)) dict = {} for k, v in args: dict[k] = v } else { assert (PySequence_Size(args) & 1) == 0 # maybe dict = {} for i in range(0, len(args), 2): dict[args[i]] = args[i+1] } } else { assert PyDict_GET_SIZE(kw) > 0 # probably dict = kw } From MarkH@ActiveState.com Wed Jun 6 00:13:27 2001 From: MarkH@ActiveState.com (Mark Hammond) Date: Wed, 6 Jun 2001 09:13:27 +1000 Subject: [Python-Dev] Happy event In-Reply-To: <20010605144110.DD90C99C84@waltz.rahul.net> Message-ID: [Paul] > > Actually girls are fine until about 13, after that I expect Jeremy > > won't be too responsive. Something about hormones and such. As a father of a 14 year old girl, I can relate to that!! [Aahz] > Are you trying to imply that there's a difference between girls and > boys? It would seem a safe assumption that you are not a parent of a teenager. :) Mark. From gward@python.net Wed Jun 6 02:03:33 2001 From: gward@python.net (Greg Ward) Date: Tue, 5 Jun 2001 21:03:33 -0400 Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects?
In-Reply-To: <011f01c0ee12$eeda9ba0$0900a8c0@spiff>; from fredrik@pythonware.com on Wed, Jun 06, 2001 at 12:57:43AM +0200 References: <200106051721.f55HLW729400@odiug.digicool.com> <20010605182949.A7545@gerg.ca> <011f01c0ee12$eeda9ba0$0900a8c0@spiff> Message-ID: <20010605210333.B7687@gerg.ca> On 06 June 2001, Fredrik Lundh said: > given that Jython already gives a meaning to dict with more > than one argument, I suggest: > > dict(d) # consistency > dict(k, v, k, v, ...) # jython compatibility > dict(*[k, v, k, v, ...]) # convenience > dict(k=v, k=v, ...) # common pydiom Yikes. I still think that #2 is the "essential" spelling. I think Tim was speaking of #1 when he said we don't need another way to spell copy() -- I'm inclined to agree. I think the fact that you can say int(3) or str("foo") are not strong arguments in favour of dict({...}), because of mutability, because of the overhead of dicts, because we already have the copy module, maybe other factors as well. > and maybe: > > dict(d.items()) # symmetry I think this is massive overloading. Two interfaces to a single function ought to be enough. I for one have long wished for syntactic sugar like Perl's => operator, which lets you do this: %band = { geddy => "bass", alex => "guitar", neil => "drums" } ...and keyword arg syntax is really the natural thing here. Being able to say band = dict(geddy="bass", alex="guitar", neil="drums") would be good enough for me. And it's less mysterious than Perl's =>, which is just a magic comma that forces its LHS to be interpreted as a string. Weird. Greg -- Greg Ward - Linux geek gward@python.net http://starship.python.net/~gward/ If you and a friend are being chased by a lion, it is not necessary to outrun the lion. It is only necessary to outrun your friend. From mal@lemburg.com Wed Jun 6 09:03:13 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Wed, 06 Jun 2001 10:03:13 +0200 Subject: [Python-Dev] Shouldn't I be able to print Unicode objects? References: Message-ID: <3B1DE3C1.90BA3DD6@lemburg.com> Tim Peters wrote: > > [MAL, to Skip] > > Huh ? That should not be possible ! Python literals are still > > ASCII. > > > > >>> ümlaut = 'ümlaut' > > File "", line 1 > > ümlaut = 'ümlaut' > > ^ > > SyntaxError: invalid syntax > > That was Guido's intent, and what the Ref Man says, but the tokenizer uses > C's isalpha() so in reality it's locale-dependent. I think at least one > German on Python-Dev has already threatened to kill him if he ever fixes > this bug . Wasn't me for sure... even in the Unicode age, I believe that Python source code should maintain readability by not allowing all alpha(numeric) characters for use in identifiers (there are lots of them in Unicode). Shouldn't we fix the tokenizer to explicitely check for 'a'...'z' and 'A'...'Z' ?! (same for digits) ?! -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From jack@oratrix.nl Wed Jun 6 12:24:32 2001 From: jack@oratrix.nl (Jack Jansen) Date: Wed, 06 Jun 2001 13:24:32 +0200 Subject: [Python-Dev] %b format? - 8088 like a PDP-11 I think not... In-Reply-To: Message by "Eric S. 
Raymond" , Mon, 4 Jun 2001 17:19:08 -0400 , <20010604171908.A21831@thyrsus.com> Message-ID: <20010606112432.C4A43303181@snelboot.oratrix.nl> The early microcomputers (8008, 6800, 6502) are actually a lot more like the PDP-8 than the PDP-11: a single (or possibly double) accumulator register and a few special purpose registers hardwired to various instructions. The 68000, Z8000 and NS16032 were the first true successors of the PDP-11, sharing (to an extent) the unique characteristics of it's design with general purpose registers (with even SP and PC being general purpose registers with only very little magic attached to them) and an orthogonal design. The 68000 still had lots of little quirks in the instruction set, the latter two actually improved on the PDP-11 set (where a couple of instructions like XOR would only work with register-destination because it was added to the design in a stage where there weren't enough bits left in the instruction space, I guess). And the 8086 was just a souped-up 8080/8008: each register had a different function, no orthogonality, etc. Intel didn't get it "right" until the 386 32-bit instruction set (and even there some of the old baggage can still be seen). -- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen@oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From jack@oratrix.nl Wed Jun 6 12:39:56 2001 From: jack@oratrix.nl (Jack Jansen) Date: Wed, 06 Jun 2001 13:39:56 +0200 Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects? In-Reply-To: Message by "Fredrik Lundh" , Tue, 5 Jun 2001 19:34:35 +0200 , <001301c0ede5$cb804a10$e46940d5@hagrid> Message-ID: <20010606113957.4A395303181@snelboot.oratrix.nl> For the dictionary initializer I would definitely want to be able to give an object that adheres to the dictionary protocol, so that I can to things like import anydbm f = anydbm.open("foo", "r") incore = dict(f) Hmm, I guess this goes for most types: list() and tuple() should take any iterable object, etc. The one question is what "dictionary protocol" mean. Should it support items()? Is only x.keys()/x[] good enough? -- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen@oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From mal@lemburg.com Wed Jun 6 19:36:48 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Wed, 06 Jun 2001 20:36:48 +0200 Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects? References: <200106051721.f55HLW729400@odiug.digicool.com> <3B1D1C8E.B7770419@lemburg.com> <3B1D3DAF.DAE727AE@lemburg.com> <200106052022.f55KMhq30227@odiug.digicool.com> Message-ID: <3B1E7840.C93EA788@lemburg.com> Guido van Rossum wrote: > > > > -1 > > > > > > While this looks cute, I think it would break a lot of introspection > > > code or other code which special cases Python functions for > > > some reason since type(int) would no longer return > > > types.BuiltinFunctionType. > > > > Looks like I'm alone with my uncertain feeling about this move... > > oh well. 
> > Well, I don't see how someone could be doing introspection on int and > be confused when it's not a function -- either you (think you) know > it's a function, so you use it as a function without introspecting it, > and that continues to work; or you're open to all possibilities, and > then you'll introspect it, and then you'll discover what it is. Ok, let's put it another way: The point is that you are changing the type of very basic building parts in Python and that is likely to cause failure in places which will most likely be hard to find and fix. Besides, we don't really gain anything from replacing builtin functions with classes (to the contrary: we lose some, since we can no longer use the function call optimizations for builtins and have to go through all the generic call mechanism code instead). Also, have you considered the effects this has on restricted execution mode ? What will happen if someone replaces the builtins with special versions which hide some security relevant objects, e.g. open() is a prominent candidate for this. Why not put the type objects into a separate module instead of reusing the builtins ? > > BTW, we should consider having more than one contructor for an > > object rather than trying to stuff all possible options and parameters > > into one overloaded super-constructor. I've done this in many of > > my mx extensions and have so far had great success with it (better > > programming error detection, better docs, more intuitive interfaces, > > etc.). In that sense, more than one way to do something will > > actually help clarify what the programmer really wanted. Just > > a thought... > > Yes, but the other ways are spelled as factory functions. Maybe, > *maybe* the other factory functions could be class-methods, but don't > hold your hopes high. No... why make things complicated when simple functions work just fine as factories. Multiple constructors on a class would make subclassing a pain... -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From paulp@ActiveState.com Wed Jun 6 20:00:07 2001 From: paulp@ActiveState.com (Paul Prescod) Date: Wed, 06 Jun 2001 12:00:07 -0700 Subject: [Python-Dev] Shouldn't I be able to print Unicode objects? References: <15133.8441.983687.572159@beluga.mojam.com> <15133.12746.666351.127286@beluga.mojam.com> <15133.15019.237484.605267@beluga.mojam.com> <3B1D3E02.3C9AE1F4@lemburg.com> <15133.17749.390756.115544@beluga.mojam.com> Message-ID: <3B1E7DB7.408BC089@ActiveState.com> Skip Montanaro wrote: > >... > > Okay, I found the encoding section. I changed the encoding variable > > assignment to be > > encoding = "latin1" Danger, Will Robinson! You can now write software that will work great on your version of Python and will crash on everyone else's. You haven't just changed the behavior of "print" but of EVERY attempted automatic coercion from Unicode to an 8-bit string. -- Take a recipe. Leave a recipe. Python Cookbook! http://www.ActiveState.com/pythoncookbook From tim.one@home.com Wed Jun 6 20:27:59 2001 From: tim.one@home.com (Tim Peters) Date: Wed, 6 Jun 2001 15:27:59 -0400 Subject: [Python-Dev] -U option? Message-ID: http://sf.net/tracker/?func=detail&atid=105470&aid=430269&group_id=5470 python -U breaks import with 2.1 Anyone understand -U?
Like, should it work, why is it there if it doesn't and isn't expected to, and are there docs for it beyond the "python -h" blurb? Last mention of it I found in c.l.py was """ Date: Tue, 06 Feb 2001 16:09:46 +0100 From: "M.-A. Lemburg" Subject: Re: [Python-Dev] Pre-PEP: Python Character Model ... Well, with -U on, Python will compile "" into u"", ... last I tried, Python didn't even start up :-( ... """ An earlier msg (08 Sep 2000) said: """ Note that many thing fail when Python is started with -U... that switch was introduced to be able to get an idea of which parts of the standard fail to work in a mixed string/Unicode environment. """ If this is just an internal development switch, python -h probably shouldn't advertise it. From barry@digicool.com Wed Jun 6 20:37:26 2001 From: barry@digicool.com (Barry A. Warsaw) Date: Wed, 6 Jun 2001 15:37:26 -0400 Subject: [Python-Dev] -U option? References: Message-ID: <15134.34422.62060.936788@anthem.wooz.org> >>>>> "TP" == Tim Peters writes: TP> Anyone understand -U? Like, should it work, why is it there TP> if it doesn't and isn't expected to, and are there docs for it TP> beyond the "python -h" blurb? Nope, except that /for me/ an installed Python 2.1 seems to start up just fine with -U. My uninstalled (i.e. run from the source tree) 2.2a0 fails when given -U: @anthem[[~/projects/python:1068]]% ./python Python 2.2a0 (#4, Jun 6 2001, 13:03:36) [GCC egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)] on linux2 Type "copyright", "credits" or "license" for more information. >>> @anthem[[~/projects/python:1069]]% ./python -U 'import site' failed; use -v for traceback Python 2.2a0 (#4, Jun 6 2001, 13:03:36) [GCC egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)] on linux2 Type "copyright", "credits" or "license" for more information. >>> @anthem[[~/projects/python:1070]]% ./python -U -v # ./Lib/site.pyc matches ./Lib/site.py import site # precompiled from ./Lib/site.pyc # ./Lib/os.pyc matches ./Lib/os.py import os # precompiled from ./Lib/os.pyc import posix # builtin # ./Lib/posixpath.pyc matches ./Lib/posixpath.py import posixpath # precompiled from ./Lib/posixpath.pyc # ./Lib/stat.pyc matches ./Lib/stat.py import stat # precompiled from ./Lib/stat.pyc # ./Lib/UserDict.pyc matches ./Lib/UserDict.py import UserDict # precompiled from ./Lib/UserDict.pyc 'import site' failed; traceback: Traceback (most recent call last): File "./Lib/site.py", line 91, in ? from distutils.util import get_platform ImportError: No module named distutils.util Python 2.2a0 (#4, Jun 6 2001, 13:03:36) [GCC egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)] on linux2 Type "copyright", "credits" or "license" for more information. >>> # clear __builtin__._ # clear sys.path # clear sys.argv # clear sys.ps1 # clear sys.ps2 # clear sys.exitfunc # clear sys.exc_type # clear sys.exc_value # clear sys.exc_traceback # clear sys.last_type # clear sys.last_value # clear sys.last_traceback # restore sys.stdin # restore sys.stdout # restore sys.stderr # cleanup __main__ # cleanup[1] signal # cleanup[1] site # cleanup[1] posix # cleanup[1] exceptions # cleanup[2] stat # cleanup[2] posixpath # cleanup[2] UserDict # cleanup[2] os # cleanup sys # cleanup __builtin__ # cleanup ints: 1 unfreed int in 1 out of 3 blocks # cleanup floats -Barry From mal@lemburg.com Wed Jun 6 21:27:19 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Wed, 06 Jun 2001 22:27:19 +0200 Subject: [Python-Dev] -U option? 
References: Message-ID: <3B1E9227.7F67971E@lemburg.com> Tim Peters wrote:
>
> http://sf.net/tracker/?func=detail&atid=105470&aid=430269&group_id=5470
> python -U breaks import with 2.1
>
> Anyone understand -U? Like, should it work, why is it there if it doesn't
> and isn't expected to, and are there docs for it beyond the "python -h"
> blurb?

The -U option is there to be able to test drive Python into the Unicode age. As you and many others have noted, there's still a long way to go...

> Last mention of it I found in c.l.py was
>
> """
> Date: Tue, 06 Feb 2001 16:09:46 +0100
> From: "M.-A. Lemburg"
> Subject: Re: [Python-Dev] Pre-PEP: Python Character Model
>
> ...
> Well, with -U on, Python will compile "" into u"",
> ...
> last I tried, Python didn't even start up :-(
> ...
> """
>
> An earlier msg (08 Sep 2000) said:
>
> """
> Note that many thing fail when Python is started with -U... that
> switch was introduced to be able to get an idea of which parts of
> the standard fail to work in a mixed string/Unicode environment.
> """
>
> If this is just an internal development switch, python -h probably shouldn't
> advertise it.

-- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From martin@loewis.home.cs.tu-berlin.de Wed Jun 6 21:34:30 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Wed, 6 Jun 2001 22:34:30 +0200 Subject: [Python-Dev] -U option? Message-ID: <200106062034.f56KYUI02246@mira.informatik.hu-berlin.de> [Tim]

> Anyone understand -U? Like, should it work, why is it there if it
> doesn't and isn't expected to, and are there docs for it beyond the
> "python -h" blurb?

I'm not surprised it doesn't work, but I think it could be made to work in many cases. I also think it would be worthwhile making that work; in the process, many places will be taught to accept Unicode strings which currently don't. [Barry]

> Nope, except that /for me/ an installed Python 2.1 seems to start up
> just fine with -U. [...]

Sure, but it won't work:

martin@mira:~ > python -U                                          [22:29]
Python 2.2a0 (#336, May 29 2001, 09:28:57)
[GCC 2.95.2 19991024 (release)] on linux2
Type "copyright", "credits" or "license" for more information.
>>> import string
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
ImportError: No module named string
>>> import sys
>>> sys.path
['', u'/usr/src/omni/lib/python', u'/usr/src/omni/lib/i586_linux_2.0_glibc2.1',
u'/usr/ilu-2.0b1/lib', u'/home/martin', u'/usr/local/lib/python2.2',
u'/usr/local/lib/python2.2/plat-linux2', u'/usr/local/lib/python2.2/lib-tk',
u'/usr/local/lib/python2.2/lib-dynload',
u'/usr/local/lib/python2.2/site-packages', u'/usr/local/lib/site-python']

The main problem (also with the SF bug report) seems to be that Unicode objects in sys.path are not accepted, but I think they should be. Regards, Martin From tim.one@home.com Wed Jun 6 21:52:02 2001 From: tim.one@home.com (Tim Peters) Date: Wed, 6 Jun 2001 16:52:02 -0400 Subject: [Python-Dev] -U option? In-Reply-To: <3B1E9227.7F67971E@lemburg.com> Message-ID: [MAL]

> The -U option is there to be able to test drive Python into
> the Unicode age. As you and many others have noted, there's
> still a long way to go...

That's cool. My question is why we're advertising (via -h) an option that end users have no chance of using successfully. From mal@lemburg.com Wed Jun 6 22:47:25 2001 From: mal@lemburg.com (M.-A.
Lemburg) Date: Wed, 06 Jun 2001 23:47:25 +0200 Subject: [Python-Dev] -U option? References: Message-ID: <3B1EA4ED.38BEB1AA@lemburg.com> Tim Peters wrote:
>
> [MAL]
> > The -U option is there to be able to test drive Python into
> > the Unicode age. As you and many others have noted, there's
> > still a long way to go...
>
> That's cool. My question is why we're advertising (via -h) an option that
> end users have no chance of using successfully.

I guess I just added the flag to the -h message without thinking much about it... it was added in some alpha release. Anyway, these bug reports will keep hitting us, which is good in the sense that it'll eventually push Python into the Unicode arena. We could use some funding for this, though. -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From paulp@ActiveState.com Thu Jun 7 00:00:52 2001 From: paulp@ActiveState.com (Paul Prescod) Date: Wed, 06 Jun 2001 16:00:52 -0700 Subject: [Python-Dev] urllib2 Message-ID: <3B1EB624.563DABE0@ActiveState.com> Tim asked me to look into the test_urllib2 failure. I notice that Guido's name is in the relevant RFC so I guess he's the real expert <0.5 wink>: http://www.faqs.org/rfcs/rfc1738.html Anyhow, there are a variety of problems. :( First, test_urllib2 says:

    file_url = "file://%s" % urllib2.__file__

This is not going to construct a strictly standards-conforming URL on Windows, but that form is still common enough and obvious enough that maybe we should support it. So that's problem #1: we aren't compatible with mildly broken Windows file URLs. Problem #2 is that the test program generates mildly broken URLs on Windows. That raises the question of what IS the right way to construct file URLs in a cross-platform manner. I would have thought that urllib.pathname2url was the way, but I note that it isn't documented. Plus it is poorly named. A function that does this:

    """Convert a DOS path name to a file url.

    C:\foo\bar\spam.foo

    becomes

    ///C|/foo/bar/spam.foo
    """

is not really constructing a URL! And the semantics of the function on multiple platforms do not seem to me to be identical. On Windows it adds a bunch of leading slashes, and Mac and Unix seem not to. So you can't safely paste a "file:" or "file://" on the front. I don't know how widely pathname2url has been used even though it is undocumented... should we fix it and document it, or write a new function? -- Take a recipe. Leave a recipe. Python Cookbook! http://www.ActiveState.com/pythoncookbook From barry@scottb.demon.co.uk Thu Jun 7 00:31:51 2001 From: barry@scottb.demon.co.uk (Barry Scott) Date: Thu, 7 Jun 2001 00:31:51 +0100 Subject: [Python-Dev] %b format? - 8088 like a PDP-11 I think not... In-Reply-To: <20010604161114.A20979@thyrsus.com> Message-ID: <000a01c0eee0$dcfe9250$060210ac@private> Eric, As others have pointed out, your timeline is wrong... Barry p.s. I'm ex-DEC and old enough to have seen the introduction of the 6502 (got mine at university for $25 inc. postage to the U.K.), Z80 and VAX (worked on product for V1.0 of VMS). Also for my sins argued with Gordon Bell and Dave Cutler about CPU architecture.

> -----Original Message-----
> From: Eric S. Raymond [mailto:esr@thyrsus.com]
> Sent: 04 June 2001 21:11
> To: Barry Scott
> Cc: python-dev (E-mail)
> Subject: Re: [Python-Dev] %b format? - 8088 like a PDP-11 I think not...
> > > Barry Scott :
> > Eric wrote:
> > > While I'm at it, I should note that the design of the 11 was ancestral
> > > to both the 8088 and 68000 microprocessors, and thus to essentially
> > > every new general-purpose computer designed in the last fifteen years.
> >
> > The key to PDP-11 and VAX was lots of registers all alike and rich
> > addressing modes for the instructions.
> >
> > The 8088 is very far from this design; it owes its design more to the
> > 4004 than the PDP-11.
>
> Yes, but the 4004 was designed as a sort of lobotomized imitation of the 65xx,
> which was descended from the 11. Admittedly, in the chain of transmission here
> were two stages of redesign so bad that the connection got really tenuous.
> --
> Eric S. Raymond
>
> ...Virtually never are murderers the ordinary, law-abiding people
> against whom gun bans are aimed. Almost without exception, murderers
> are extreme aberrants with lifelong histories of crime, substance
> abuse, psychopathology, mental retardation and/or irrational violence
> against those around them, as well as other hazardous behavior, e.g.,
> automobile and gun accidents."
> -- Don B. Kates, writing on statistical patterns in gun crime

From barry@scottb.demon.co.uk Thu Jun 7 00:57:11 2001 From: barry@scottb.demon.co.uk (Barry Scott) Date: Thu, 7 Jun 2001 00:57:11 +0100 Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects? In-Reply-To: <3B1E7840.C93EA788@lemburg.com> Message-ID: <000b01c0eee4$66f8a7e0$060210ac@private> Adding the atomic types of Python as classes I'm +1 on. Performance is a problem for the parser to handle. If you have not already done so, I suggest that you look at what Microsoft .NET is doing in this area. In .NET, for example, int is a class and they have the technology to define the interface to an int and optimize the performance of the non-derived cases. Barry From barry@scottb.demon.co.uk Thu Jun 7 01:03:54 2001 From: barry@scottb.demon.co.uk (Barry Scott) Date: Thu, 7 Jun 2001 01:03:54 +0100 Subject: [Python-Dev] Shouldn't I be able to print Unicode objects? In-Reply-To: <3B1DE3C1.90BA3DD6@lemburg.com> Message-ID: <001001c0eee5$571a8090$060210ac@private>

> Shouldn't we fix the tokenizer to explicitly check for 'a'...'z'
> and 'A'...'Z' ?! (same for digits) ?!

If you embrace the world then NO. If America is your world then maybe. Barry From paulp@ActiveState.com Thu Jun 7 01:42:03 2001 From: paulp@ActiveState.com (Paul Prescod) Date: Wed, 06 Jun 2001 17:42:03 -0700 Subject: [Python-Dev] Shouldn't I be able to print Unicode objects? References: <001001c0eee5$571a8090$060210ac@private> Message-ID: <3B1ECDDB.F1E8B19D@ActiveState.com> Barry Scott wrote:

> > > Shouldn't we fix the tokenizer to explicitly check for 'a'...'z'
> > > and 'A'...'Z' ?! (same for digits) ?!
> >
> > If you embrace the world then NO. If America is your world then maybe.

Actually, if we were really going to embrace the world we'd need to handle more than a few European languages! -- Take a recipe. Leave a recipe. Python Cookbook! http://www.ActiveState.com/pythoncookbook From MarkH@ActiveState.com Thu Jun 7 02:09:51 2001 From: MarkH@ActiveState.com (Mark Hammond) Date: Thu, 7 Jun 2001 11:09:51 +1000 Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects? In-Reply-To: <000b01c0eee4$66f8a7e0$060210ac@private> Message-ID:

> If you have not already done so, I suggest that you look at
> what Microsoft .NET is doing in this area.
> In .NET, for example, int is a class and they have the technology to
> define the interface to an int and optimize the performance of the
> non-derived cases.

Actually, that is not completely true. There is a "value type" and a class version. The value type is just the bits. The VM has instructions that work on the value type. As far as I am aware, you cannot use a derived class with these instructions. They also have the concept of "sealed", meaning they cannot be subclassed. Last time I looked, strings were an example of sealed classes. Mark. From greg@cosc.canterbury.ac.nz Thu Jun 7 03:16:00 2001 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Thu, 07 Jun 2001 14:16:00 +1200 (NZST) Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects? In-Reply-To: <20010606113957.4A395303181@snelboot.oratrix.nl> Message-ID: <200106070216.OAA02594@s454.cosc.canterbury.ac.nz> Jack Jansen :

> Should it support
> items()? Is only x.keys()/x[] good enough?

Check for items(), and fall back on x.keys()/x[] if necessary. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From greg@cosc.canterbury.ac.nz Thu Jun 7 03:19:03 2001 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Thu, 07 Jun 2001 14:19:03 +1200 (NZST) Subject: [Python-Dev] Shouldn't I be able to print Unicode objects? In-Reply-To: <3B1ECDDB.F1E8B19D@ActiveState.com> Message-ID: <200106070219.OAA02597@s454.cosc.canterbury.ac.nz>

> if we were really going to embrace the world we'd need to
> handle more than a few European languages!

-1 on allowing Kanji in Python identifiers. :-( I like to be able to at least imagine some sort of pronunciation for variable names! Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From greg@cosc.canterbury.ac.nz Thu Jun 7 03:22:33 2001 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Thu, 07 Jun 2001 14:22:33 +1200 (NZST) Subject: [Python-Dev] %b format? - 8088 like a PDP-11 I think not... Message-ID: <200106070222.OAA02600@s454.cosc.canterbury.ac.nz> Jack Jansen :

> with even SP and PC being general purpose registers

The PC is not a general-purpose register in the 68000. I've heard that this was because DEC had a patent on the idea.

> the latter two actually improved on the PDP-11

The 16032 was certainly extremely orthogonal. I wrote an assembler and a compiler for it once, and it was a joy after coming from the Z80! It wasn't quite perfect, though - its lack of a "top-of-stack-indirect" addressing mode was responsible for the one wart in my otherwise-beautiful code generation strategy. Also, it must have been the most CISCy instruction set the world has ever seen, with the possible exception of the VAX... Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc.
| greg@cosc.canterbury.ac.nz +--------------------------------------+ From tim.one@home.com Thu Jun 7 05:54:42 2001 From: tim.one@home.com (Tim Peters) Date: Thu, 7 Jun 2001 00:54:42 -0400 Subject: [Python-Dev] urllib2 In-Reply-To: <3B1EB624.563DABE0@ActiveState.com> Message-ID: [Paul Prescod] > Tim asked me to look into test_urllib2 failure. Wow! I'm going to remember that. Have to ask people to do things more often . > notice that Guido's name is in the relevant RFC so I guess he's the > real expert <0.5 wink>: > > http://www.faqs.org/rfcs/rfc1738.html > > Anyhow, there are a variety of problems. :( I'm going to add one more. The spec says this is a file URL: fileurl = "file://" [ host | "localhost" ] "/" fpath But on Windows, urllib2.urlopen() throws up even on URLs like: file:///c:/bootlog.txt and file://localhost/c:/bootlog.txt AFAICT, those conform to the spec (the first with an empty host, the second with the special reserved hostname), Windows has no problem with either of them (heck, in Outlook I can click on them while I'm typing this email -- works fine), but urllib2 mangles them into (repr) '\\c:\\bootlog.txt', which Windows has no idea what to do with. Hard to see why it should, either. > First, test_urllib2 says: > > file_url = "file://%s" % urllib2.__file__ > > This is not going to construct a strictly standards conforming URL on > Windows but that form is still common enough and obvious enough that > maybe we should support it. Common among what? > So that's problem #1, we aren't compatible with mildly broken Windows > file URLs. I haven't found a sense in which Windows file URLs are broken. test_urllib2 creates bad URLs on Windows, and urllib2 itself transforms legit file URLs into broken ones on Windows, but both of those appear to be our (Python's) fault. Until std stuff works, worrying about extensions to the std seems premature. > Problem #2 is that the test program generates mildly broken URLs > on Windows. Yup. > That begs the question of what IS the right way to construct file urls > in a cross-platform manner. The spec seems vaguely clear to me on this point (it's vaguely unclear to me whether a colon is allowed in an fpath -- the text seems to say one thing but the BNF another). > I would have thought that urllib.pathname2url was the way but I note > that it isn't documented. Plus it is poorly named. A function that > does this: > > """Convert a DOS path name to a file url. > > C:\foo\bar\spam.foo > > becomes > > ///C|/foo/bar/spam.foo > """ > > is not really constructing a URL! Or anything else recognizable . > And the semantics of the function on multiple platforms do not seem > to me to be identical. On Windows it adds a bunch of leading slashes > and mac and Unix seem not to. So you can't safely paste a "file:" or > "file://" on the front. I don't know how widely pathname2url has been > used even though it is undocumented....should we fix it and document > it or write a new function? Maybe it's just time to write urllib3.py <0.8 wink>. no-conclusions-from-me-ly y'rs - tim From tim@digicool.com Thu Jun 7 06:16:37 2001 From: tim@digicool.com (Tim Peters) Date: Thu, 7 Jun 2001 01:16:37 -0400 Subject: [Python-Dev] Shouldn't I be able to print Unicode objects? In-Reply-To: <3B1DE3C1.90BA3DD6@lemburg.com> Message-ID: [M.-A. Lemburg] > Wasn't me for sure... even in the Unicode age, I believe that > Python source code should maintain readability by not allowing > all alpha(numeric) characters for use in identifiers (there are > lots of them in Unicode). 
> > Shouldn't we fix the tokenizer to explicitly check for 'a'...'z'
> > and 'A'...'Z' ?! (same for digits) ?!

That's certain to break code, and it's certain that some of those whose code gets broken would scream very loudly about it. OTOH, nobody would come to its defense with a hearty "whew! I'm so glad *that* hole finally got plugged!". I'm sure it would cause less trouble to take away <> as an alternative spelling of != (except that Barry is actually close enough to strangle Guido a few days each week <wink>). Is it worth the hassle? I don't know, but I'd *guess* Guido would rather endure the complaints for something more substantial (like, say, breaking 10 lines of an expert's obscure code that relies on int() being a builtin instead of a class <wink>). From fredrik@pythonware.com Thu Jun 7 06:50:35 2001 From: fredrik@pythonware.com (Fredrik Lundh) Date: Thu, 7 Jun 2001 07:50:35 +0200 Subject: [Python-Dev] Shouldn't I be able to print Unicode objects? References: Message-ID: <004c01c0ef15$c97f51d0$4ffa42d5@hagrid> Tim Peters wrote:

> > Shouldn't we fix the tokenizer to explicitly check for 'a'...'z'
> > and 'A'...'Z' ?! (same for digits) ?!
>
> That's certain to break code, and it's certain that some of those whose code
> gets broken would scream very loudly about it.

I don't get it. If people use non-ascii characters, they're clearly not using Python. from the language reference:

    ... Python uses the 7-bit ASCII character set for program text and
    string literals. ...

    Identifiers (also referred to as names) are described by the
    following lexical definitions:

        identifier: (letter|"_") (letter|digit|"_")*
        letter: lowercase | uppercase
        lowercase: "a"..."z"
        uppercase: "A"..."Z"
        digit: "0"..."9"

    Identifiers are unlimited in length. Case is significant ...

either change the specification, and break every single tool written by anyone who actually bothered to read the specification [1], or add a warning to 2.2. 1) I assume the specification didn't exist when GvR wrote the first CPython implementation ;-) From tim.one@home.com Thu Jun 7 07:15:35 2001 From: tim.one@home.com (Tim Peters) Date: Thu, 7 Jun 2001 02:15:35 -0400 Subject: [Python-Dev] Shouldn't I be able to print Unicode objects? In-Reply-To: <004c01c0ef15$c97f51d0$4ffa42d5@hagrid> Message-ID: [/F]

> I don't get it. If people use non-ascii characters, they're clearly not
> using Python. from the language reference:

My *first* reply in this thread said the lang ref required this. That doesn't mean people read the ref. IIRC, you were one of the most strident complainers about list.append(1, 2, 3) "breaking", so just rekindle that mindset but intensify it fueled by nationalism <0.5 wink>.

> ...
> either change the specification, and break every single tool written by
> anyone who actually bothered to read the specification [1], or add a
> warning to 2.2.

This is up to Guido; doesn't affect my code one way or the other (and, yes, e.g., IDLE's parser follows the manual here).

> ...
> 1) I assume the specification didn't exist when GvR wrote the first
> CPython implementation ;-)

Thanks to the magic of CVS, you can see that the BNF for identifiers has remained unchanged since it was first checked in (Thu Nov 21 13:53:03 1991 rev 1.1 of ref1.tex). The problem is that locale was a new-fangled idea then, and I believe Guido simply didn't anticipate isalpha() and isalnum() would vary across non-EBCDIC platforms. From mal@lemburg.com Thu Jun 7 09:29:52 2001 From: mal@lemburg.com (M.-A.
Lemburg) Date: Thu, 07 Jun 2001 10:29:52 +0200 Subject: [Python-Dev] Shouldn't I be able to print Unicode objects? References: <001001c0eee5$571a8090$060210ac@private> <3B1ECDDB.F1E8B19D@ActiveState.com> Message-ID: <3B1F3B80.DB8F4117@lemburg.com> Paul Prescod wrote: > > Barry Scott wrote: > > > > > Shouldn't we fix the tokenizer to explicitly check for 'a'...'z' > > > and 'A'...'Z' ?! (same for digits) ?! > > > > If you embrace the world then NO. If America is you world then maybe. > > Actually, if we were really going to embrace the world we'd need to > handle more than a few European languages! I was just suggesting to make the parser actually do what the language spec defines. And yes: I don't like non-ASCII identifiers (even though I live in Europe). This is just bound to cause trouble, e.g. people forgetting accents on characters, editors displaying code using wild approximations of what the code author intended to write, etc. -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From mal@lemburg.com Thu Jun 7 09:42:40 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Thu, 07 Jun 2001 10:42:40 +0200 Subject: [Python-Dev] Shouldn't I be able to print Unicode objects? References: Message-ID: <3B1F3E80.F8CC16D7@lemburg.com> Tim Peters wrote: > > [M.-A. Lemburg] > > Wasn't me for sure... even in the Unicode age, I believe that > > Python source code should maintain readability by not allowing > > all alpha(numeric) characters for use in identifiers (there are > > lots of them in Unicode). > > > > Shouldn't we fix the tokenizer to explicitely check for 'a'...'z' > > and 'A'...'Z' ?! (same for digits) ?! > > That's certain to break code, and it's certain that some of those whose code > gets broken would scream very loudly about it. OTOH, nobody would come to > its defense with a hearty "whew! I'm so glad *that* hole finally got > plugged!". I'm sure it would cause less trouble to take away <> as an > alternative spelling of != (except that Barry is actually close enough to > strangle Guido a few days each week ). Is it worth the hassle? I > don't know, but I'd *guess* Guido would rather endure the complaints for > something more substantial (like, say, breaking 10 lines of an expert's > obscure code that relies on int() being a builtin instead of a class > ). Ok, point taken... still, it's funny sometimes how pydevs are willing to break perfectly valid code in some areas while not considering pointing users to clean up invalid code in other areas. -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From thomas@xs4all.net Thu Jun 7 13:03:20 2001 From: thomas@xs4all.net (Thomas Wouters) Date: Thu, 7 Jun 2001 14:03:20 +0200 Subject: [Python-Dev] Shouldn't I be able to print Unicode objects? In-Reply-To: <3B1F3E80.F8CC16D7@lemburg.com>; from mal@lemburg.com on Thu, Jun 07, 2001 at 10:42:40AM +0200 References: <3B1F3E80.F8CC16D7@lemburg.com> Message-ID: <20010607140320.Z690@xs4all.nl> On Thu, Jun 07, 2001 at 10:42:40AM +0200, M.-A. Lemburg wrote: > still, it's funny sometimes how pydevs are willing to break perfectly > valid code in some areas while not considering pointing users to clean up > invalid code in other areas. 
Well, I consider myself one of the more backward-oriented people on py-dev (or at least a vocal member of that sub-group ;) and I don't think changing int et al to be types/class-constructors is a problem. People who rely on int being a *function*, rather than being a callable, are either writing a Python-specific script or a quick hack, or really, really know what they are getting into. I'm also not terribly worried about the use of non-ASCII characters in identifiers in Python, though a warning for the next one or two releases would be a good thing -- if anything, it should warn that that trick won't work for people with different locale settings! -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From mwh@python.net Thu Jun 7 13:54:55 2001 From: mwh@python.net (Michael Hudson) Date: Thu, 7 Jun 2001 13:54:55 +0100 (BST) Subject: [Python-Dev] python-dev summary 2001-05-24 - 2001-06-07 Message-ID: This is a summary of traffic on the python-dev mailing list between May 24 and Jun 7 (inclusive) 2001. It is intended to inform the wider Python community of ongoing developments. To comment, just post to python-list@python.org or comp.lang.python in the usual way. Give your posting a meaningful subject line, and if it's about a PEP, include the PEP number (e.g. Subject: PEP 201 - Lockstep iteration) All python-dev members are interested in seeing ideas discussed by the community, so don't hesitate to take a stance on a PEP if you have an opinion. This is the ninth summary written by Michael Hudson. Summaries are archived at:

Posting distribution (with apologies to mbm)

Number of articles in summary: 305

[ASCII bar chart of daily posting volume; the daily counts, left to right, were: Thu 24: 18, Fri 25: 14, Sat 26: 11, Sun 27: 14, Mon 28: 20, Tue 29: 19, Wed 30: 34, Thu 31: 35, Fri 01: 32, Sat 02: 14, Sun 03: 8, Mon 04: 20, Tue 05: 51, Wed 06: 15]

Another busy-ish fortnight. I've been in Exam Hell(tm) and am writing this while hungover, so this summary might be a bit sketchier than normal. Apologies in advance.

* strop vs. string *

Greg Stein leapt up to defend the slated-to-be-deprecated strop module by pointing out that its functions work on any object that supports the buffer API, whereas the 1.6-era string.py only works with objects that sprout the right methods. The discussion quickly degenerated into the usual griping about the fact that the buffer API is flawed and undocumented and not really well understood by many people.

* Special-casing "O" *

As a followup to the discussion mentioned in the last summary, Martin von Loewis posted a patch to SF enabling functions written in C that expect zero or one object arguments to dispense with the time-wasting call to PyArg_ParseTuple. The first version of the patch was criticized for being overly general, and for not being general enough <wink>. It seems the forces of simplicity have won, but I don't think the patch has been checked in yet.

* the late, unlamented, yearly list.append panic *

Tim Peters posted that c.l.py has rediscovered the quadratic-time worst-case behavior of list.append() (a rough sketch of the effect follows below).
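A minimal timing sketch of that worst case (not from the thread itself; the function name fill is made up, absolute numbers are illustrative only, and it assumes a 2.x-era CPython):

    import time

    def fill(n):
        # Append n items one at a time; with naive resizing each
        # append may realloc-and-copy the whole list, giving O(n**2)
        # total work instead of amortized O(n).
        lst = []
        t0 = time.clock()
        for i in xrange(n):
            lst.append(i)
        return time.clock() - t0

    # If appends are amortized O(1), doubling n should roughly double
    # the time; the quadratic worst case shows up as much worse.
    for n in (50000, 100000, 200000):
        print n, fill(n)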
And then ameliorated the worst-case behaviour. So that one was easy.

* making dicts ... *

You might think that, as dictionaries are so central to Python, their implementation would be bulletproof and one of the areas of the source least likely to change. This might be true *now*; Tim Peters seems to have spent most of the last fortnight implementing performance improvements one after the other and fixing core-dumping holes in the implementation pointed out by Michael Hudson. The first improvement was to use "polynomial division instead of multiplication for generating the probe sequence, as a way to get all the bits of the hash code into play." If you don't understand what that means, ignore it, because Tim came up with a more radical rewrite, which seems to be a win but sadly removes the shock of finding comments about Galois theory in dictobject.c... Most of the discussion in the thread following Tim's patch was about whether we need 128-bit floats or ints, which is another way of saying everyone liked it :-) This one hasn't been checked in either.

* ... and breaking dicts *

Inspired by a post to comp.lang.python by Wolfgang Lipp and driven slightly insane by revision, Michael Hudson posted a short program that used a hole in the dict implementation to trigger a core dump. This got fixed, so he did it again. The cause of both problems was C code assuming things about dictionaries remained the same across calls to code that ended up executing arbitrary Python code, which could mutate the dict exactly as much as it pleased, which in turn caused pointers to dangle. This problem has a history in Python; the .sort() method on lists has to fight the same issues. These holes have been plugged, although it is still possible to crash Python with exceptionally contrived code. There's another approach, which is what the .sort() method uses:

>>> list = range(10)
>>> def c(x,y):
...     del list[:]
...     return cmp(x, y)
...
>>> list.sort(c)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "<stdin>", line 2, in c
TypeError: a list cannot be modified while it is being sorted

The .sort() method magically changes the type of the list being sorted to one that doesn't support mutation while it's sorting the list. This approach would have some merit for dictionaries too; for one thing, we could lose all the contrived code in dictobject.c protecting against this sort of silliness...

* arbitrary radix formatting *

Greg Wilson made a plea for the addition of a "%b" formatting operator to display integers in binary, e.g.:

>>> print "%d %x %o %b"%(10,10,10,10)
10 a 12 1010

There was general support for the idea, but Tim Peters and Greg Ewing pointed out that it would be neater to invent a general format code that would enable one to format an integer into an arbitrary base, so that

>>> int("1111", 7)
400

has an inverse at long last. But no-one could think of a spelling that wasn't in general use, and the discussion died :-(.

* quick poll *

Guido asked if anyone would object violently to the builtin conversion functions becoming type objects on the descr-branch, in analogy to class objects. There was general support and only a few concerns, and the changes have begun to hit descr-branch. I'm sure I'm not the only one who wishes they had the time to understand what is going on in there... Cheers, M.
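To make the dict-breaking pattern in the summary concrete, here is a small illustrative sketch (not code from the thread; the class name Boom is made up, and on an interpreter with the holes plugged this merely behaves oddly rather than dumping core):

    # A key whose __hash__ mutates the dict it is being looked up in.
    # C code that caches pointers into the table across the hash call
    # is exactly the kind of code that used to end up dangling.
    class Boom:
        def __init__(self, d):
            self.d = d
        def __hash__(self):
            self.d.clear()   # mutate the dict mid-lookup
            return 0

    d = {}
    for i in range(10):
        d[i] = i
    print d.get(Boom(d))     # probes d while clearing it; prints None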
From gmcm@hypernet.com Thu Jun 7 14:06:55 2001 From: gmcm@hypernet.com (Gordon McMillan) Date: Thu, 7 Jun 2001 09:06:55 -0400 Subject: [Python-Dev] urllib2 In-Reply-To: References: <3B1EB624.563DABE0@ActiveState.com> Message-ID: <3B1F442F.26920.1ECC32A9@localhost> [Tim & Paul on file URLs] [Tim]

> But on Windows, urllib2.urlopen() throws up even on URLs like:
>
>     file:///c:/bootlog.txt

Curiously enough,

    url = "file:///" + urllib.quote_plus(fnm)

seems to work on Windows. It even seems to work on Mac, if you first turn '/' into '%2f', then undo the double quoting (turn '%252f' back into '%2f' in the ensuing url). It even seems to work on Mac directory names with Unicode characters in them (though I haven't looked too closely, in fear of jinxing it). eye-of-newt-considered-helpful-ly y'rs - Gordon From Samuele Pedroni Thu Jun 7 14:56:30 2001 From: Samuele Pedroni (Samuele Pedroni) Date: Thu, 7 Jun 2001 15:56:30 +0200 (MET DST) Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects? Message-ID: <200106071356.PAA04511@core.inf.ethz.ch> Hi. [GvR]

> > Is the intent of using int and friends as constructors instead of just
> > coercion functions that I should (eventually) be able to do this:
> >
> >     class NonNegativeInt(int):
> >         def __init__(self, val):
> >             if int(val) < 0:
> >                 raise ValueError, "Value must be >= 0"
> >             int.__init__(self, val)
> >             self.a = 47
> >         ...
> >
> > ?
>
> Yes, sort-of. The details will be slightly different. I'm not
> comfortable with letting a user-provided __init__() method change the
> value of self, so I am brooding on a work-around that separates
> allocation and one-time initialization from __init__(). Watch PEP
> 253.

jython already supports vaguely this:

    from types import IntType as Int

    class NonNegInt(Int):
        def __init__(self,val,annot=None):
            if int(val)<0:
                raise ValueError,"val<0"
            Int.__init__(self,val)
            self._annot = annot
        def neg(self):
            return -self
        def __add__(self,b):
            if type(b) is NonNegInt:
                return NonNegInt(Int.__add__(self,b))
            return Int.__add__(self,b)
        def annot(self):
            return self._annot

Jython 2.0 on java1.3.0 (JIT: null)
Type "copyright", "credits" or "license" for more information.
>>> from NonNegInt import NonNegInt
>>> x=NonNegInt(-2)
Traceback (innermost last):
  File "<console>", line 1, in ?
  File "/home/pedroni/BOX/exp/NonNegInt.py", line 5, in __init__
ValueError: val<0
>>> x=NonNegInt(2)
>>> y=NonNegInt(3,"foo")
>>> y._annot
Traceback (innermost last):
  File "<console>", line 1, in ?
AttributeError: 'int' object has no attribute '_annot'
>>> y.annot()
Traceback (innermost last):
  File "<console>", line 1, in ?
  File "/home/pedroni/BOX/exp/NonNegInt.py", line 15, in annot
AttributeError: 'int' object has no attribute '_annot'
>>> x+y, type(x+y)
(5, )
>>> x.neg()
-2
>>> x+(-2),type(x+(-2))
(0, )
>>>

As one can see, the semantics are not without holes. The support for this is mainly a side-effect of the fact that internally jython objects are instances of java classes and jython allows subclassing of java classes. I have no idea whether someone is already using this kind of stuff; I just remember that someone reported a bug concerning subclassing ListType, so ... By the way, int and long being types seems nice and elegant to me. A more general note FYI: I have read the PEP drafts about descrs and type as classes; I have not played with the descr-branch yet.
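For comparison, under the allocation/initialization split Guido alludes to above, the CPython shape of the same idea might be sketched like this (an illustration only, assuming the __new__-style hook that PEP 253 was brooding on, not code from the thread):

    class NonNegInt(int):
        # Validation happens in __new__ because the int's value is
        # fixed at allocation time; __init__ is too late to change it.
        def __new__(cls, val, annot=None):
            if int(val) < 0:
                raise ValueError, "val<0"
            return int.__new__(cls, val)
        def __init__(self, val, annot=None):
            self._annot = annot   # per-instance state is fine here
        def annot(self):
            return self._annot

    x = NonNegInt(3, "foo")
    print x + 1, x.annot()    # 4 foo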
I think that the descr and metaclasses stuff can help on the jython side to put a lot of things (dealing with java classes, subclassing from them, etc.) in a more precise framework, polishing up many design aspects and the code. First, I suppose that backward compatibility on the jython side is not a real problem; these aspects are so under-documented that there are no promises about them. On the other hand, until we start coding things on the jython side (it's complex stuff and jython internals are already complex) it will be really difficult to make constructive comments on possible problems for jython, or toward a design that better fits both jython and CPython needs. Given that we are still working on jython 2.1, maybe we will be able to start working on jython 2.2 only late in the 2.2 release cycle, when things are somehow fixed and we can only do our best to re-implement them. regards Samuele Pedroni. From Greg.Wilson@baltimore.com Thu Jun 7 17:03:44 2001 From: Greg.Wilson@baltimore.com (Greg Wilson) Date: Thu, 7 Jun 2001 12:03:44 -0400 Subject: [Python-Dev] re: %b format (no, really) Message-ID: <930BBCA4CEBBD411BE6500508BB3328F2E20EF@nsamcanms1.ca.baltimore.com> Prompted in part by the comment in Michael Hudson's python-dev summary about this discussion having died, I'd like to summarize:

1. Most people who commented felt that a base-2 format
   would be useful, if only for teaching and debugging.
   With regard to questions about byte order:

   A. Integer values are printed as base-2 numbers, so
      byte order is irrelevant.

   B. Floating-point numbers are printed as:

      [sign] [mantissa] [exponent]

      The mantissa and exponent are shown according
      to rule A.

2. Inventing a format for converting to arbitrary
   bases is dubious hypergeneralization (to borrow a
   phrase).

3. Implementation should mirror octal and hexadecimal
   support, e.g. a 'bin()' function to go with 'oct()'
   and 'hex()'.

4. The desirability or otherwise of a "%b" format
   specifier has nothing to do with the relative
   merits of any early microprocessor :-).

If no-one has strong objections, I'll put together a PEP on this basis. Thanks Greg ----------------------------------------------------------------------------------------------------------------- The information contained in this message is confidential and is intended for the addressee(s) only. If you have received this message in error or there are any problems please notify the originator immediately. The unauthorized use, disclosure, copying or alteration of this message is strictly forbidden. Baltimore Technologies plc will not be liable for direct, special, indirect or consequential damages arising from alteration of the contents of this message by a third party or as a result of any virus being passed on. In addition, certain Marketing collateral may be added from time to time to promote Baltimore Technologies products, services, Global e-Security or appearance at trade shows and conferences. This footnote confirms that this email message has been swept by Baltimore MIMEsweeper for Content Security threats, including computer viruses.
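As a concrete illustration of item 3 above, a bin() mirroring oct() and hex() can be sketched in pure Python (an illustration only -- bin() was not a builtin at the time, and the '0b' prefix simply mirrors the 0b1101 notation floated later in this thread):

    def bin(n):
        # binary analogue of oct()/hex(); handles 0 and negatives,
        # and works for plain ints and longs alike
        if n < 0:
            return '-' + bin(-n)
        digits = []
        while 1:
            digits.append("01"[int(n & 1)])
            n = n >> 1
            if n == 0:
                break
        digits.reverse()
        return '0b' + ''.join(digits)

    print bin(10)    # 0b1010
    print bin(-5)    # -0b101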
From greg@cosc.canterbury.ac.nz Fri Jun 8 01:55:05 2001 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Fri, 08 Jun 2001 12:55:05 +1200 (NZST) Subject: [Python-Dev] re: %b format (no, really) In-Reply-To: <930BBCA4CEBBD411BE6500508BB3328F2E20EF@nsamcanms1.ca.baltimore.com> Message-ID: <200106080055.MAA02711@s454.cosc.canterbury.ac.nz> Greg Wilson : [good stuff about binary format support]

> If no-one has strong objections, I'll put together a
> PEP on this basis.

Sounds okay to me. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From tim.one@home.com Fri Jun 8 02:39:53 2001 From: tim.one@home.com (Tim Peters) Date: Thu, 7 Jun 2001 21:39:53 -0400 Subject: [Python-Dev] Shouldn't I be able to print Unicode objects? In-Reply-To: <20010607140320.Z690@xs4all.nl> Message-ID: [Thomas Wouters]

> ...
> I'm also not terribly worried about the use of non-ASCII characters in
> identifiers in Python, though a warning for the next one or two releases
> would be a good thing -- if anything, it should warn that that trick
> won't work for people with different locale settings!

Fine by me! Someone who cares enough to write the warning code and docs should just do so, although it may be wise to secure Guido's blessing first. From skip@pobox.com (Skip Montanaro) Fri Jun 8 15:51:27 2001 From: skip@pobox.com (Skip Montanaro) Date: Fri, 8 Jun 2001 09:51:27 -0500 Subject: [Python-Dev] sys.modules["__main__"] in Jython Message-ID: <15136.58991.72069.433197@beluga.mojam.com> Would someone with Jython experience check to see if it interprets sys.modules["__main__"] in the same manner as Python? I'm interested to see if doctest's normal usage can be simplified slightly. The doctest documentation states: In normal use, end each module M with:

    def _test():
        import doctest, M           # replace M with your module's name
        return doctest.testmod(M)   # ditto

    if __name__ == "__main__":
        _test()

I'm wondering if this works for Jython as well as Python:

    def _test():
        import doctest, sys
        return doctest.testmod(sys.modules["__main__"])

    if __name__ == "__main__":
        _test()

If so, then I think doctest.testmod's signature can be changed to

    def testmod(m=None, name=None, globs=None, verbose=None,
                isprivate=None, report=1):

with the following extra code added to the start of the function:

    if m is None:
        import sys
        m = sys.modules["__main__"]

That way the most common doctest usage can be changed to

    def _test():
        import doctest
        return doctest.testmod()

    if __name__ == "__main__":
        _test()

(I ran into a problem with a module that had initialization code that barfed if executed more than once.) Of course, these changes are ultimately Tim's decision. I'm just trying to knock down various potential hurdles. Thx, Skip From guido@digicool.com Fri Jun 8 17:06:19 2001 From: guido@digicool.com (Guido van Rossum) Date: Fri, 08 Jun 2001 12:06:19 -0400 Subject: [Python-Dev] re: %b format (no, really) In-Reply-To: Your message of "Fri, 08 Jun 2001 12:01:37 EDT." References: Message-ID: <200106081606.f58G6Jj11829@odiug.digicool.com>

> Prompted in part by the comment in Michael Hudson's
> python-dev summary about this discussion having died,
> I'd like to summarize:
>
> 1. Most people who commented felt that a base-2 format
>    would be useful, if only for teaching and debugging.
>    With regard to questions about byte order:
>
> A.
Integer values are printed as base-2 numbers, so > byte order is irrelevant. > > B. Floating-point numbers are printed as: > > [sign] [mantissa] [exponent] > > The mantissa and exponent are shown according > to rule A. Why bother with floats at all? We can't print floats as hex either. If I were doing any kind of float-representation fiddling, I'd probably want to print it in hex anyway (I can read hex). But as I say, that's not for the general public. > 2. Inventing a format for converting to arbitrary > bases is dubious hypergeneralization (to borrow a > phrase). Agreed. > 3. Implementation should mirror octal and hexadecimal > support, e.g. a 'bin()' function to go with 'oct()' > and 'hex()'. > > 4. The desirability or otherwise of a "%b" format > specifier has nothing to do with the relative > merits of any early microprocessor :-). > > If no-one has strong objections, I'll put together a > PEP on this basis. Go for it. Or just submit a patch to SF -- this seems almost too small for a PEP to me. :-) --Guido van Rossum (home page: http://www.python.org/~guido/) From barry@digicool.com Fri Jun 8 17:10:50 2001 From: barry@digicool.com (Barry A. Warsaw) Date: Fri, 8 Jun 2001 12:10:50 -0400 Subject: [Python-Dev] re: %b format (no, really) References: <200106081606.f58G6Jj11829@odiug.digicool.com> Message-ID: <15136.63754.927103.77358@anthem.wooz.org> >>>>> "GvR" == Guido van Rossum writes: GvR> Go for it. Or just submit a patch to SF -- this seems almost GvR> too small for a PEP to me. :-) Since we all seem to agree, I'd agree. :) From Greg.Wilson@baltimore.com Fri Jun 8 17:14:14 2001 From: Greg.Wilson@baltimore.com (Greg Wilson) Date: Fri, 8 Jun 2001 12:14:14 -0400 Subject: [Python-Dev] re: %b format (no, really) Message-ID: <930BBCA4CEBBD411BE6500508BB3328F2E2179@nsamcanms1.ca.baltimore.com> > > Greg: > > B. Floating-point numbers are printed as: > > [sign] [mantissa] [exponent] > Guido: > Why bother with floats at all? For teaching purposes, which is what started me on this in the first place --- I would like an easy way to show people the bit patterns corresponding to basic types. > Guido: > Go for it. Or just submit a patch to SF -- this seems almost too > small for a PEP to me. :-) Thanks, Greg ----------------------------------------------------------------------------------------------------------------- The information contained in this message is confidential and is intended for the addressee(s) only. If you have received this message in error or there are any problems please notify the originator immediately. The unauthorized use, disclosure, copying or alteration of this message is strictly forbidden. Baltimore Technologies plc will not be liable for direct, special, indirect or consequential damages arising from alteration of the contents of this message by a third party or as a result of any virus being passed on. In addition, certain Marketing collateral may be added from time to time to promote Baltimore Technologies products, services, Global e-Security or appearance at trade shows and conferences. This footnote confirms that this email message has been swept by Baltimore MIMEsweeper for Content Security threats, including computer viruses. From esr@snark.thyrsus.com Fri Jun 8 17:23:34 2001 From: esr@snark.thyrsus.com (Eric S. 
Raymond) Date: Fri, 8 Jun 2001 12:23:34 -0400 Subject: [Python-Dev] Glowing endorsement of open source and Python Message-ID: <200106081623.f58GNYf22712@snark.thyrsus.com> It doesn't get much better than this: http://it.mycareer.com.au/news/2001/06/05/FFX9ZT7UENC.html -- Eric S. Raymond In the absence of any evidence tending to show that possession or use of a 'shotgun having a barrel of less than eighteen inches in length' at this time has some reasonable relationship to the preservation or efficiency of a well regulated militia, we cannot say that the Second Amendment guarantees the right to keep and bear such an instrument. [...] The Militia comprised all males physically capable of acting in concert for the common defense. -- Majority Supreme Court opinion in "U.S. vs. Miller" (1939) From mal@lemburg.com Fri Jun 8 18:08:53 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Fri, 08 Jun 2001 19:08:53 +0200 Subject: [Python-Dev] Glowing endorsement of open source and Python References: <200106081623.f58GNYf22712@snark.thyrsus.com> Message-ID: <3B2106A5.FD16D95C@lemburg.com> "Eric S. Raymond" wrote: > > It doesn't get much better than this: > > http://it.mycareer.com.au/news/2001/06/05/FFX9ZT7UENC.html I wonder what those MS Office XP ads are doing on that page... -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From tim.one@home.com Fri Jun 8 18:21:10 2001 From: tim.one@home.com (Tim Peters) Date: Fri, 8 Jun 2001 13:21:10 -0400 Subject: [Python-Dev] re: %b format (no, really) In-Reply-To: <930BBCA4CEBBD411BE6500508BB3328F2E2179@nsamcanms1.ca.baltimore.com> Message-ID: [Guido] > Why bother with floats at all? [Greg Wilson] > For teaching purposes, which is what started me on this > in the first place --- I would like an easy way to show > people the bit patterns corresponding to basic types. I'm confused by this: while for integers the bits correspond very clearly to what's stored in the machine, if you separate the mantissa and exponent for floats the result won't "look like" the storage at all. Please give an example first, like what do you intend to produce for print "%b" % 0.1 print "%b" % -42e300 ? You have to make decisions about whether or not to unbias the exponent for display (if you don't, it's incomprehensible; if you do, it's not really what's stored); whether or not to materialize the implicit most-significant mantissa bit in 754 normalized values (pretty much ditto); and what to do about Infs, NaNs, signed zeroes and denormal numbers. The kicker is that, to be truly useful for teaching floats, you need a way to select among all combinations of "yes" and "no" for each such decision. A single fixed set of answers will confound more than clarify; e.g., it's important to know what the "true exponent" is, but also to know what biased exponents look like inside the box. This is too much for %b -- write a float-format module instead. From Greg.Wilson@baltimore.com Fri Jun 8 18:34:13 2001 From: Greg.Wilson@baltimore.com (Greg Wilson) Date: Fri, 8 Jun 2001 13:34:13 -0400 Subject: [Python-Dev] RE: %b format (no, really) Message-ID: <930BBCA4CEBBD411BE6500508BB3328F2E2188@nsamcanms1.ca.baltimore.com> > [Guido] > > Why bother with floats at all? > > [Greg Wilson] > > For teaching purposes > [Tim Peters] > if you separate the mantissa and exponent > for floats the result won't "look like" the storage at all. 
> Please give an example first This is part of what was going to go into the PEP, along with what to do about character data (I've had a couple of emails from people who'd like to be able to look at 8-bit and Unicode characters as bit patterns). > This is too much for %b -- write a float-format module instead. How about a quick patch to do "%b" for int and long-int, and a PEP for a generic "format" module --- arbitrary radix, options for IEEE numbers, etc.? Any objections? Greg ----------------------------------------------------------------------------------------------------------------- The information contained in this message is confidential and is intended for the addressee(s) only. If you have received this message in error or there are any problems please notify the originator immediately. The unauthorized use, disclosure, copying or alteration of this message is strictly forbidden. Baltimore Technologies plc will not be liable for direct, special, indirect or consequential damages arising from alteration of the contents of this message by a third party or as a result of any virus being passed on. In addition, certain Marketing collateral may be added from time to time to promote Baltimore Technologies products, services, Global e-Security or appearance at trade shows and conferences. This footnote confirms that this email message has been swept by Baltimore MIMEsweeper for Content Security threats, including computer viruses. From esr@thyrsus.com Fri Jun 8 18:44:40 2001 From: esr@thyrsus.com (Eric S. Raymond) Date: Fri, 8 Jun 2001 13:44:40 -0400 Subject: [Python-Dev] RE: %b format (no, really) In-Reply-To: <930BBCA4CEBBD411BE6500508BB3328F2E2188@nsamcanms1.ca.baltimore.com>; from Greg.Wilson@baltimore.com on Fri, Jun 08, 2001 at 01:34:13PM -0400 References: <930BBCA4CEBBD411BE6500508BB3328F2E2188@nsamcanms1.ca.baltimore.com> Message-ID: <20010608134440.A23160@thyrsus.com> Greg Wilson : > How about a quick patch to do "%b" for int and long-int, and a > PEP for a generic "format" module --- arbitrary radix, options > for IEEE numbers, etc.? Any objections? I like it. -- Eric S. Raymond The people cannot delegate to government the power to do anything which would be unlawful for them to do themselves. -- John Locke, "A Treatise Concerning Civil Government" From tim.one@home.com Fri Jun 8 18:51:50 2001 From: tim.one@home.com (Tim Peters) Date: Fri, 8 Jun 2001 13:51:50 -0400 Subject: [Python-Dev] RE: %b format (no, really) In-Reply-To: <930BBCA4CEBBD411BE6500508BB3328F2E2188@nsamcanms1.ca.baltimore.com> Message-ID: [Greg Wilson] > How about a quick patch to do "%b" for int and long-int, Don't know how quick it will be (it should cover type slots and bin() and __bin__ and 0b1101 notation too, right?), but +1 from me. That much is routinely requested. > and a PEP for a generic "format" module --- arbitrary radix, options > for IEEE numbers, etc.? Any objections? None here. From bckfnn@worldonline.dk Fri Jun 8 20:15:14 2001 From: bckfnn@worldonline.dk (Finn Bock) Date: Fri, 08 Jun 2001 19:15:14 GMT Subject: [Python-Dev] sys.modules["__main__"] in Jython In-Reply-To: <15136.58991.72069.433197@beluga.mojam.com> References: <15136.58991.72069.433197@beluga.mojam.com> Message-ID: <3b212431.21754982@smtp.worldonline.dk> [Skip] >Would someone with Jython experience check to see if it interprets >sys.modules["__main__"] in the same manner as Python? To me it seems like Jython defines sys.modules["__main__"] in the same way as CPython. 
>I'm wondering if this works for Jython as well as Python:
>
>    def _test():
>        import doctest, sys
>        return doctest.testmod(sys.modules["__main__"])
>
>    if __name__ == "__main__":
>        _test()

It works for Jython. regards, finn From thomas@xs4all.net Fri Jun 8 22:41:02 2001 From: thomas@xs4all.net (Thomas Wouters) Date: Fri, 8 Jun 2001 23:41:02 +0200 Subject: [Python-Dev] Glowing endorsement of open source and Python In-Reply-To: <200106081623.f58GNYf22712@snark.thyrsus.com>; from esr@snark.thyrsus.com on Fri, Jun 08, 2001 at 12:23:34PM -0400 References: <200106081623.f58GNYf22712@snark.thyrsus.com> Message-ID: <20010608234102.B690@xs4all.nl> On Fri, Jun 08, 2001 at 12:23:34PM -0400, Eric S. Raymond wrote:

> It doesn't get much better than this:
> http://it.mycareer.com.au/news/2001/06/05/FFX9ZT7UENC.html

It's a nice (and very flattering!) piece, but it's a tad buzzword heavy. "[Python] supports XML for e-commerce and mobile applications" ? Well, shit, so *that*'s what XML is for :) -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From tim.one@home.com Fri Jun 8 23:02:06 2001 From: tim.one@home.com (Tim Peters) Date: Fri, 8 Jun 2001 18:02:06 -0400 Subject: [Python-Dev] sys.modules["__main__"] in Jython In-Reply-To: <3b212431.21754982@smtp.worldonline.dk> Message-ID: [Finn Bock]

> To me it seems like Jython defines sys.modules["__main__"] in the same
> way as CPython.

Thank you, Finn! doctest has always avoided introspection tricks for which Jython doesn't work "exactly the same way" as CPython. However, in the past it achieved this by not paying any attention <wink>, then ripping out bad ideas when a Jython user reported failure. But now that it's in the std library, I want to proceed more carefully. Skip's idea is much more attractive now that you've confirmed it will work there too. From tim.one@home.com Sun Jun 10 02:10:53 2001 From: tim.one@home.com (Tim Peters) Date: Sat, 9 Jun 2001 21:10:53 -0400 Subject: [Python-Dev] Struct schizophrenia Message-ID: I'm adding "long long" integral types to struct (in native mode, "long long" or __int64 on platforms that have them; in standard mode, 64 bits). This is proving harder than it should be, because the code that's already there is schizophrenic across boundaries, so is failing as a base to build on (raises more questions than it answers). Like:

>>> x = 256
>>> struct.pack("b", x)  # complains about magnitude in native mode
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
struct.error: byte format requires -128<=number<=127
>>> struct.pack("=b", x) # but doesn't with native order + std align
'\x00'
>>> struct.pack("<b", x)
'\x00'
>>> struct.pack("<h", x)
'\x00\x01'
>>> struct.pack("<b", 2**64)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
OverflowError: long int too large to convert
>>>

Much the same is true of other small int sizes: you can't predict what will happen without trying it; and once you get to ints, no range-checking is performed even in native mode. Surely this can't stand, but what do people *want*? My preference is to raise the same "byte format requires -128<=number<=127" exception in all these cases; OTOH, the code structure fights that, working with Python longs is clumsy in C, and there are other "undocumented features" here that may or may not be accidents:

>>> struct.pack("B", 234.3)
'\xea'
>>>

That is, did we *intend* to accept floats packed via integer typecodes? Feature or bug?
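A small probe script makes the inconsistencies above easy to tabulate on any given build (an illustration, not part of the original mail; exact messages and results vary by Python version and platform):

    import struct

    # Try one out-of-range value against several codes and mode prefixes.
    for fmt in ("b", "=b", "<b", ">b", "B", "h", "i"):
        try:
            print fmt, repr(struct.pack(fmt, 256))
        except struct.error, why:
            print fmt, "struct.error:", why
        except OverflowError, why:
            print fmt, "OverflowError:", why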
In the other (unpack) direction, the docs say for 'I' (unsigned int):

    The "I" conversion code will convert to a Python long if the C int
    is the same size as a C long, which is typical on most modern
    systems. If a C int is smaller than a C long, a Python integer
    will be created instead.

That's in a footnote. In another part, they say:

    For the "I" and "L" format characters, the return value is a
    Python long integer.

The footnote is wrong -- but is the footnote what was intended (somebody went to a fair bit of work to write all the stuff <wink>)? From tim.one@home.com Sun Jun 10 05:25:51 2001 From: tim.one@home.com (Tim Peters) Date: Sun, 10 Jun 2001 00:25:51 -0400 Subject: [Python-Dev] FW: printing Python stack info from gdb Message-ID: Fwd'ing this Skip gem from c.l.py, primarily so I can find it again next time I'm thrashing on a Unix box. gdb clearly needs to adopt Python as its extension language. but-then-what-doesn't-ly y'rs - tim -----Original Message----- From: python-list-admin@python.org [mailto:python-list-admin@python.org]On Behalf Of Skip Montanaro Sent: Saturday, June 09, 2001 12:31 AM To: python-list@python.org Subject: printing Python stack info from gdb

From time to time I've wanted to be able to print the Python stack from gdb. Today I broke down and spent some time actually implementing something.

    set $__trimpath = 1
    define ppystack
      set $__fr = 0
      select-frame $__fr
      while !($pc > Py_Main && $pc < Py_GetArgcArgv)
        if $pc > eval_code2 && $pc < set_exc_info
          set $__fn = PyString_AsString(co->co_filename)
          set $__n = PyString_AsString(co->co_name)
          if $__n[0] == '?'
            set $__n = ""
          end
          if $__trimpath
            set $__f = strrchr($__fn, '/')
            if $__f
              set $__fn = $__f + 1
            end
          end
          printf "%s (%d): %s\n", $__fn, f->f_lineno, $__n
        end
        set $__fr = $__fr + 1
        select-frame $__fr
      end
      select-frame 0
    end

Output looks like this (and dribbles out *quite slowly*):

    Text_Editor.py (147): apply_tag
    Text_Editor.py (152): apply_tag_by_name
    Script_GUI.py (302): push_help
    Script_GUI.py (113): put_help
    Script_GUI.py (119): focus_enter
    Signal.py (34): handle_signal
    Script_GUI.py (324): main
    Script_GUI.py (338):

If you don't want to trim the paths from the filenames, set $__trimpath to 0. Warning: I've only tried this with a very recent CVS version of Python on a PIII-based Linux system with an interpreter compiled using gcc. I rely on the ordering of functions within the while loop to detect when to exit the loop and when the frame I'm examining is an eval_code2 frame. I'm sure there are plenty of people out there with more gdb experience than me. I welcome any feedback on ways to improve this little bit of code. -- Skip Montanaro (skip@pobox.com) (847)971-7098 -- http://mail.python.org/mailman/listinfo/python-list From tim.one@home.com Sun Jun 10 20:36:50 2001 From: tim.one@home.com (Tim Peters) Date: Sun, 10 Jun 2001 15:36:50 -0400 Subject: [Python-Dev] FW: list-display semantics? Message-ID: I opened a bug on this. If anyone's keen to play with the grammar, have at it! Everyone at PythonLabs would +1 it. -----Original Message----- From: python-list-admin@python.org [mailto:python-list-admin@python.org]On Behalf Of jainweiwu Sent: Sunday, June 10, 2001 2:30 PM To: python-list@python.org Subject: list-display semantics? Hi all: I tried this one-line command in interactive mode:

    [x for x in [1, 2, 3], y for y in [4, 5, 6]]

and the result surprised me, that is:

    [[1,2,3],[1,2,3],[1,2,3],9,9,9]

Who can explain the behavior? Since I expected the result to be:

    [[1,4],[1,5],[1,6],[2,4],...]

-- Pary All Rough Yet.
parywu@seed.net.tw -- http://mail.python.org/mailman/listinfo/python-list From dan@cgsoftware.com Sun Jun 10 21:30:24 2001 From: dan@cgsoftware.com (Daniel Berlin) Date: 10 Jun 2001 16:30:24 -0400 Subject: [Python-Dev] FW: printing Python stack info from gdb In-Reply-To: ("Tim Peters"'s message of "Sun, 10 Jun 2001 00:25:51 -0400") References: Message-ID: <87n17grsbj.fsf@cgsoftware.com> "Tim Peters" writes: > Fwd'ing this Skip gem from c.l.py, primarily so I can find it again next > time I'm thrashing on a Unix box. gdb clearly needs to adopt Python as its > extension language. HP has patches to do this, actually. Works quite nicely. And trust me, i've tried to get them to do it more than once. As I pointed out to skip, if he can profile gdb and tell me where the slowness is, it's likely I can make it a ton faster. GDB could use major optimizations almost everywhere. And i've done quite a lot of them, they just haven't been reviewed/integrated yet. --Dan C++ support maintainer - GDB DWARF2 reader person - GDB Symbol table patch submitting weirdo - GDB etc > > but-then-what-doesn't-ly y'rs - tim > > -----Original Message----- > From: python-list-admin@python.org > [mailto:python-list-admin@python.org]On Behalf Of Skip Montanaro > Sent: Saturday, June 09, 2001 12:31 AM > To: python-list@python.org > Subject: printing Python stack info from gdb > > >>From time to time I've wanted to be able to print the Python stack from gdb. > Today I broke down and spent some time actually implementing something. > > set $__trimpath = 1 > define ppystack > set $__fr = 0 > select-frame $__fr > while !($pc > Py_Main && $pc < Py_GetArgcArgv) > if $pc > eval_code2 && $pc < set_exc_info > set $__fn = PyString_AsString(co->co_filename) > set $__n = PyString_AsString(co->co_name) > if $__n[0] == '?' > set $__n = "" > end > if $__trimpath > set $__f = strrchr($__fn, '/') > if $__f > set $__fn = $__f + 1 > end > end > printf "%s (%d): %s\n", $__fn, f->f_lineno, $__n > end > set $__fr = $__fr + 1 > select-frame $__fr > end > select-frame 0 > end > > Output looks like this (and dribbles out *quite slowly*): > > Text_Editor.py (147): apply_tag > Text_Editor.py (152): apply_tag_by_name > Script_GUI.py (302): push_help > Script_GUI.py (113): put_help > Script_GUI.py (119): focus_enter > Signal.py (34): handle_signal > Script_GUI.py (324): main > Script_GUI.py (338): > > If you don't want to trim the paths from the filenames, set $__trimpath to > 0. > > Warning: I've only tried this with a very recent CVS version of Python on a > PIII-based Linux system with an interpreter compiled using gcc. I rely on > the ordering of functions within the while loop to detect when to exit the > loop and when the frame I'm examining is an eval_code2 frame. I'm sure > there are plenty of people out there with more gdb experience than me. I > welcome any feedback on ways to improve this little bit of code. > > -- > Skip Montanaro (skip@pobox.com) > (847)971-7098 > > -- > http://mail.python.org/mailman/listinfo/python-list > > > _______________________________________________ > Python-Dev mailing list > Python-Dev@python.org > http://mail.python.org/mailman/listinfo/python-dev -- "I saw a man with a wooden leg, and a real foot. "-Steven Wright From greg@cosc.canterbury.ac.nz Mon Jun 11 03:44:54 2001 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Mon, 11 Jun 2001 14:44:54 +1200 (NZST) Subject: [Python-Dev] FW: list-display semantics? 
In-Reply-To: Message-ID: <200106110244.OAA03090@s454.cosc.canterbury.ac.nz> parywu@seed.net.tw: > [x for x in [1, 2, 3], y for y in [4, 5, 6]] > and the result surprised me, that is: > [[1,2,3],[1,2,3],[1,2,3],9,9,9] Did you by any chance execute that in an environment where y was previously bound to 9? It will be parsed as [x for x in ([1, 2, 3], y) for y in [4, 5, 6]] which should give a NameError if y is previously unbound, since it will try to evaluate ([1, 2, 3], y) before y is bound by the inner loop. But executing y = 9 beforehand will give the results you got. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From gstein@lyra.org Mon Jun 11 12:31:59 2001 From: gstein@lyra.org (Greg Stein) Date: Mon, 11 Jun 2001 04:31:59 -0700 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Objects object.c,2.124.4.11,2.124.4.12 In-Reply-To: ; from gvanrossum@users.sourceforge.net on Wed, Jun 06, 2001 at 07:34:15AM -0700 References: Message-ID: <20010611043158.E26210@lyra.org> On Wed, Jun 06, 2001 at 07:34:15AM -0700, Guido van Rossum wrote: > Update of /cvsroot/python/python/dist/src/Objects > In directory usw-pr-cvs1:/tmp/cvs-serv17474 > > Modified Files: > Tag: descr-branch > object.c > Log Message: > Add _PyObject_GetDictPtr() -- an internal API to get a pointer to > where __dict__ is stored in an object. The simplest case is to add > tp_dictoffset to the start of the object, but there are comlications: > tp_flags may tell us that tp_dictoffset is not defined, or the offset > may be negative: indexing from the end of the object, where > tp_itemsize may have to be taken into account. Why would you ever have a negative size in there? That seems like an unnecessary "feature". The offsets are easily set up by the compiler as positive values. (not even sure how you'd come up with a proper/valid negative value) Cheers, -g > > > Index: object.c > =================================================================== > RCS file: /cvsroot/python/python/dist/src/Objects/object.c,v > retrieving revision 2.124.4.11 > retrieving revision 2.124.4.12 > diff -C2 -r2.124.4.11 -r2.124.4.12 > *** object.c 2001/06/06 14:27:54 2.124.4.11 > --- object.c 2001/06/06 14:34:13 2.124.4.12 > *************** > *** 1074,1077 **** > --- 1074,1111 ---- > } > > + /* Helper to get a pointer to an object's __dict__ slot, if any */ > + > + PyObject ** > + _PyObject_GetDictPtr(PyObject *obj) > + { > + #define PTRSIZE (sizeof(PyObject *)) > + > + long dictoffset; > + PyTypeObject *tp = obj->ob_type; > + > + if (!(tp->tp_flags & Py_TPFLAGS_HAVE_CLASS)) > + return NULL; > + dictoffset = tp->tp_dictoffset; > + if (dictoffset == 0) > + return NULL; > + if (dictoffset < 0) { > + dictoffset += tp->tp_basicsize; > + assert(dictoffset > 0); /* Sanity check */ > + if (tp->tp_itemsize > 0) { > + int n = ((PyVarObject *)obj)->ob_size; > + if (n > 0) { > + dictoffset += tp->tp_itemsize * n; > + /* Round up, if necessary */ > + if (tp->tp_itemsize % PTRSIZE != 0) { > + dictoffset += PTRSIZE - 1; > + dictoffset /= PTRSIZE; > + dictoffset *= PTRSIZE; > + } > + } > + } > + } > + return (PyObject **) ((char *)obj + dictoffset); > + } > + > /* Generic GetAttr functions - put these in your tp_[gs]etattro slot */ > > *************** > *** 1082,1086 **** > PyObject *descr; > descrgetfunc f; > ! 
int dictoffset; > > if (tp->tp_dict == NULL) { > --- 1116,1120 ---- > PyObject *descr; > descrgetfunc f; > ! PyObject **dictptr; > > if (tp->tp_dict == NULL) { > *************** > *** 1097,1103 **** > } > > ! dictoffset = tp->tp_dictoffset; > ! if (dictoffset != 0) { > ! PyObject *dict = * (PyObject **) ((char *)obj + dictoffset); > if (dict != NULL) { > PyObject *res = PyDict_GetItem(dict, name); > --- 1131,1137 ---- > } > > ! dictptr = _PyObject_GetDictPtr(obj); > ! if (dictptr != NULL) { > ! PyObject *dict = *dictptr; > if (dict != NULL) { > PyObject *res = PyDict_GetItem(dict, name); > *************** > *** 1129,1133 **** > PyObject *descr; > descrsetfunc f; > ! int dictoffset; > > if (tp->tp_dict == NULL) { > --- 1163,1167 ---- > PyObject *descr; > descrsetfunc f; > ! PyObject **dictptr; > > if (tp->tp_dict == NULL) { > *************** > *** 1143,1149 **** > } > > ! dictoffset = tp->tp_dictoffset; > ! if (dictoffset != 0) { > ! PyObject **dictptr = (PyObject **) ((char *)obj + dictoffset); > PyObject *dict = *dictptr; > if (dict == NULL && value != NULL) { > --- 1177,1182 ---- > } > > ! dictptr = _PyObject_GetDictPtr(obj); > ! if (dictptr != NULL) { > PyObject *dict = *dictptr; > if (dict == NULL && value != NULL) { > > > _______________________________________________ > Python-checkins mailing list > Python-checkins@python.org > http://mail.python.org/mailman/listinfo/python-checkins -- Greg Stein, http://www.lyra.org/ From guido@digicool.com Mon Jun 11 13:57:18 2001 From: guido@digicool.com (Guido van Rossum) Date: Mon, 11 Jun 2001 08:57:18 -0400 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Objects object.c,2.124.4.11,2.124.4.12 In-Reply-To: Your message of "Mon, 11 Jun 2001 04:31:59 PDT." <20010611043158.E26210@lyra.org> References: <20010611043158.E26210@lyra.org> Message-ID: <200106111257.IAA03505@cj20424-a.reston1.va.home.com> > > Modified Files: > > Tag: descr-branch > > object.c > > Log Message: > > Add _PyObject_GetDictPtr() -- an internal API to get a pointer to > > where __dict__ is stored in an object. The simplest case is to add > > tp_dictoffset to the start of the object, but there are comlications: > > tp_flags may tell us that tp_dictoffset is not defined, or the offset > > may be negative: indexing from the end of the object, where > > tp_itemsize may have to be taken into account. > > Why would you ever have a negative size in there? That seems like an > unnecessary "feature". The offsets are easily set up by the compiler as > positive values. (not even sure how you'd come up with a proper/valid > negative value) When extending a type like tuple or string, the __dict__ has to be added to the end, after the last item, because we can't change the starting offset of the first item. This is not at a fixed offset from the start of the structure. --Guido van Rossum (home page: http://www.python.org/~guido/) From mal@lemburg.com Mon Jun 11 17:50:11 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Mon, 11 Jun 2001 18:50:11 +0200 Subject: [Python-Dev] Adding .decode() method to Unicode Message-ID: <3B24F6C3.C911C0BF@lemburg.com> I would like to add a .decode() method to Unicode objects and also enable the builtin unicode() to accept Unicode object as input. The .decode() method will work just like the .encode() method except that it interfaces to the decode API of the codec in question. While this may seem useless for the currently available encodings, it does have some use for codecs which recode Unicode to Unicode, e.g. 
codecs which do XML escaping or Unicode compression. Any objections ? -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From mal@lemburg.com Mon Jun 11 17:57:12 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Mon, 11 Jun 2001 18:57:12 +0200 Subject: [Python-Dev] Adding Asian codecs to the core Message-ID: <3B24F868.A3DFA649@lemburg.com> Tamito KAJIYAMA recently announced that he changed the licenses on his Japanese codecs from GPL to a BSD variant. This is great news since this would allow adding the codecs to the Python core which would certainly attract more users to Python in Asia. The codecs are available at: http://pseudo.grad.sccs.chukyo-u.ac.jp/~kajiyama/python/ The codecs are 280kB when compressed as .tar.gz file. Thoughts ? -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From aahz@rahul.net Mon Jun 11 18:42:30 2001 From: aahz@rahul.net (Aahz Maruch) Date: Mon, 11 Jun 2001 10:42:30 -0700 (PDT) Subject: [Python-Dev] Adding Asian codecs to the core In-Reply-To: <3B24F868.A3DFA649@lemburg.com> from "M.-A. Lemburg" at Jun 11, 2001 06:57:12 PM Message-ID: <20010611174230.0625E99C8D@waltz.rahul.net> M.-A. Lemburg wrote: > > Tamito KAJIYAMA recently announced that he changed the licenses > on his Japanese codecs from GPL to a BSD variant. This is great > news since this would allow adding the codecs to the Python core > which would certainly attract more users to Python in Asia. > > The codecs are 280kB when compressed as .tar.gz file. +0 I like the idea, am uncomfortable with that amount of space. -- --- Aahz (@pobox.com) Hugs and backrubs -- I break Rule 6 <*> http://www.rahul.net/aahz/ Androgynous poly kinky vanilla queer het Pythonista I don't really mind a person having the last whine, but I do mind someone else having the last self-righteous whine. From fdrake@cj42289-a.reston1.va.home.com Mon Jun 11 20:15:06 2001 From: fdrake@cj42289-a.reston1.va.home.com (Fred Drake) Date: Mon, 11 Jun 2001 15:15:06 -0400 (EDT) Subject: [Python-Dev] [development doc updates] Message-ID: <20010611191506.DC49F28923@cj42289-a.reston1.va.home.com> The development version of the documentation has been updated: http://python.sourceforge.net/devel-docs/ Substantial additional material on floating point arithmetic in the tutorial, written by Tim Peters to explain why FP can fail to reflect the decimal world presented to the user. Lots of additional updates and corrections. From guido@digicool.com Mon Jun 11 21:07:40 2001 From: guido@digicool.com (Guido van Rossum) Date: Mon, 11 Jun 2001 16:07:40 -0400 Subject: [Python-Dev] PEP 259: Omit printing newline after newline Message-ID: <200106112007.f5BK7eW22506@odiug.digicool.com> Please comment on the following. This came up a while ago in python-dev and I decided to follow through. I'm making this a PEP because of the risk of breaking code (which everybody on Python-dev seemed to think was acceptable). 
--Guido van Rossum (home page: http://www.python.org/~guido/)

PEP: 259
Title: Omit printing newline after newline
Version: $Revision: 1.1 $
Author: guido@python.org (Guido van Rossum)
Status: Draft
Type: Standards Track
Python-Version: 2.2
Created: 11-Jun-2001
Post-History: 11-Jun-2001


Abstract

    Currently, the print statement always appends a newline, unless a
    trailing comma is used.  This means that if we want to print data
    that already ends in a newline, we get two newlines, unless
    special precautions are taken.

    I propose to skip printing the newline when it follows a newline
    that came from data.

    In order to avoid having to add yet another magic variable to
    file objects, I propose to give the existing 'softspace' variable
    an extra meaning: a negative value will mean "the last data
    written ended in a newline so no space *or* newline is required."


Problem

    When printing data that resembles the lines read from a file
    using a simple loop, double-spacing occurs unless special care is
    taken:

        >>> for line in open("/etc/passwd").readlines():
        ...     print line
        ...
        root:x:0:0:root:/root:/bin/bash

        bin:x:1:1:bin:/bin:

        daemon:x:2:2:daemon:/sbin:

        (etc.)

        >>>

    While there are easy work-arounds, this is often noticed only
    during testing and requires an extra edit-test roundtrip; the
    fixed code is uglier and harder to maintain.


Proposed Solution

    In the PRINT_ITEM opcode in ceval.c, when a string object is
    printed, a check is already made that looks at the last character
    of that string.  Currently, if that last character is a
    whitespace character other than space, the softspace flag is
    reset to zero; this suppresses the space between two items if the
    first item is a string ending in newline, tab, etc. (but not when
    it ends in a space).  Otherwise the softspace flag is set to one.

    The proposal changes this test slightly so that softspace is set
    to:

    -1 -- if the last object written is a string ending in a newline

     0 -- if the last object written is a string ending in a
          whitespace character that's neither space nor newline

     1 -- in all other cases (including the case when the last object
          written is an empty string or not a string)

    Then, in the PRINT_NEWLINE opcode, printing of the newline is
    suppressed if the value of softspace is negative; in any case the
    softspace flag is reset to zero.


Scope

    This only affects printing of 8-bit strings.  It doesn't affect
    Unicode, although that could be considered a bug in the Unicode
    implementation.  It doesn't affect other objects whose string
    representation happens to end in a newline character.


Risks

    This change breaks some existing code.  For example:

        print "Subject: PEP 259\n"
        print message_body

    In current Python, this produces a blank line separating the
    subject from the message body; with the proposed change, the body
    begins immediately below the subject.  This is not very robust
    code anyway; it is better written as

        print "Subject: PEP 259"
        print
        print message_body

    In the test suite, only test_StringIO (which explicitly tests for
    this feature) breaks.


Implementation

    A patch relative to current CVS is here:

        http://sourceforge.net/tracker/index.php?func=detail&aid=432183&group_id=5470&atid=305470


Copyright

    This document has been placed in the public domain.
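To see the proposed semantics in one place, here is a rough Python model
of the PRINT_ITEM/PRINT_NEWLINE behavior described above (an illustrative
sketch only: the real change lives in ceval.c, and the real opcodes
inspect only genuine string objects, where this model string-converts
everything):

    def print_item(f, item):
        # A positive softspace means the previous item asked for a
        # separating space.
        if getattr(f, "softspace", 0) > 0:
            f.write(" ")
        s = str(item)
        f.write(s)
        if s and s[-1] == "\n":
            f.softspace = -1   # suppress the next space *and* the newline
        elif s and s[-1] in "\t\v\f\r":
            f.softspace = 0    # suppress the next space, keep the newline
        else:
            f.softspace = 1    # all other cases: space and newline wanted

    def print_newline(f):
        # The newline is suppressed only when softspace went negative.
        if getattr(f, "softspace", 0) >= 0:
            f.write("\n")
        f.softspace = 0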
Local Variables: mode: indented-text indent-tabs-mode: nil End: From BPettersen@NAREX.com Mon Jun 11 21:20:38 2001 From: BPettersen@NAREX.com (Bjorn Pettersen) Date: Mon, 11 Jun 2001 14:20:38 -0600 Subject: [Python-Dev] RE: PEP 259: Omit printing newline after newline Message-ID: <6957F6A694B49A4096F7CFD0D900042F27D452@admin56.narex.com> > From: Guido van Rossum [mailto:guido@digicool.com] > > Subject: PEP 259: Omit printing newline after newline This would probably break most of the cgi scripts I did at my last job without giving any useful error message. But then again... why should I care ? -- bjorn From skip@pobox.com (Skip Montanaro) Mon Jun 11 21:20:33 2001 From: skip@pobox.com (Skip Montanaro) (Skip Montanaro) Date: Mon, 11 Jun 2001 15:20:33 -0500 Subject: [Python-Dev] Feedback on new floating point info in tutorial In-Reply-To: <20010611191506.DC49F28923@cj42289-a.reston1.va.home.com> References: <20010611191506.DC49F28923@cj42289-a.reston1.va.home.com> Message-ID: <15141.10257.487549.196538@beluga.mojam.com> Fred> Substantial additional material on floating point arithmetic in Fred> the tutorial, written by Tim Peters to explain why FP can fail to Fred> reflect the decimal world presented to the user. I took a quick look at that appendix. One thing that confused me a bit was that if 0.1 is approximated by something ever-so-slightly larger than 0.1, how is it that if you add ten of them together you wind up with a result that is ever-so-slightly less than 1.0? I didn't expect it to be exactly 1.0. Other floating point naifs may be confused in the same way: >>> "%.55f" % 0.5 '0.5000000000000000000000000000000000000000000000000000000' >>> "%.55f" % 0.1 '0.1000000000000000055511151231257827021181583404541015625' >>> "%.55f" % (0.5+0.1) '0.5999999999999999777955395074968691915273666381835937500' I guess the explanation is that not only can't most decimals be represented exactly, but that summing the same approximation multiple times doesn't always skew the error in the same direction either: >>> "%.55f" % (0.1+0.1+0.1+0.1+0.1+0.1+0.1+0.1) '0.7999999999999999333866185224906075745820999145507812500' >>> "%.55f" % (0.8) '0.8000000000000000444089209850062616169452667236328125000' IEEE-754-is-full-of-traps-for-the-unwary-ly y'rs, Skip From mal@lemburg.com Mon Jun 11 21:55:13 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Mon, 11 Jun 2001 22:55:13 +0200 Subject: [Python-Dev] PEP 259: Omit printing newline after newline References: <200106112007.f5BK7eW22506@odiug.digicool.com> Message-ID: <3B253031.AB1954CB@lemburg.com> Guido van Rossum wrote: > > Please comment on the following. This came up a while ago in > python-dev and I decided to follow through. I'm making this a PEP > because of the risk of breaking code (which everybody on Python-dev > seemed to think was acceptable). > > --Guido van Rossum (home page: http://www.python.org/~guido/) > > PEP: 259 > Title: Omit printing newline after newline > ... > Scope > > This only affects printing of 8-bit strings. It doesn't affect > Unicode, although that could be considered a bug in the Unicode > implementation. It doesn't affect other objects whose string > representation happens to end in a newline character. I guess I should fix the Unicode stuff ;-) > Risks > > This change breaks some existing code. For example: > > print "Subject: PEP 259\n" > print message_body > > In current Python, this produces a blank line separating the > subject from the message body; with the proposed change, the body > begins immediately below the subject. 
This is not very robust > code anyway; it is better written as > > print "Subject: PEP 259" > print > print message_body > > In the test suite, only test_StringIO (which explicitly tests for > this feature) breaks. Hmm, I think the above is a very typical idiom for RFC822 style content and used in CGI scripts a lot. I'm not sure whether this change is worth getting the CGI crowd upset... Wouldn't it make sense to only use this technique in inter- active mode ? -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From martin@loewis.home.cs.tu-berlin.de Mon Jun 11 23:00:54 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Tue, 12 Jun 2001 00:00:54 +0200 Subject: [Python-Dev] Adding .decode() method to Unicode Message-ID: <200106112200.f5BM0sO01692@mira.informatik.hu-berlin.de> > I would like to add a .decode() method to Unicode objects and also > enable the builtin unicode() to accept Unicode object as input. -1. What is this good for? > While this may seem useless for the currently available encodings, > it does have some use for codecs which recode Unicode to Unicode, > e.g. codecs which do XML escaping or Unicode compression. I still can see the value. If you think the codec API is good for such transformation, why not use it? I.e. enc,dec,_,_ = codecs.lookup("compress-form-foo") s = dec(s) Furthermore, this seems like a form of hypergeneralization. If you have this, why not also add s = s.decode("capitalize") # instead of s.capitalize() i = s.decode("int") # instead of int(s) > Any objections ? Yes, I think this should not be added. Regards, Martin From paulp@ActiveState.com Tue Jun 12 00:38:55 2001 From: paulp@ActiveState.com (Paul Prescod) Date: Mon, 11 Jun 2001 16:38:55 -0700 Subject: [Python-Dev] Adding .decode() method to Unicode References: <200106112200.f5BM0sO01692@mira.informatik.hu-berlin.de> Message-ID: <3B25568F.B766E00D@ActiveState.com> "Martin v. Loewis" wrote: > >... > > I still can see the value. If you think the codec API is good for such > transformation, why not use it? I.e. > > enc,dec,_,_ = codecs.lookup("compress-form-foo") > s = dec(s) IMO, there is a huge usability difference between the above and mystr.decode("base64"). I think that we've done a good job of providing better ways to get at codecs than the codecs.lookup function. I don't see how this is any different. -- Take a recipe. Leave a recipe. Python Cookbook! http://www.ActiveState.com/pythoncookbook From greg@cosc.canterbury.ac.nz Tue Jun 12 00:51:55 2001 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Tue, 12 Jun 2001 11:51:55 +1200 (NZST) Subject: [Python-Dev] Feedback on new floating point info in tutorial In-Reply-To: <15141.10257.487549.196538@beluga.mojam.com> Message-ID: <200106112351.LAA03197@s454.cosc.canterbury.ac.nz> Skip Montanaro : > One thing that confused me a bit was > that if 0.1 is approximated by something ever-so-slightly larger than 0.1, > how is it that if you add ten of them together you wind up with a result > that is ever-so-slightly less than 1.0? I think what's happening is that the exact binary result of adding 0.1_plus_a_little to itself has one more bit than there is room for, so it gets shifted right and one bit falls off the end. The amount you lose when that happens a few times ends up outweighing the extra that you would expect. 
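A short session makes the effect visible (the exact digits are platform
dependent, as Tim points out elsewhere in this thread, but on a 754 box
the repeated sum comes out a hair short while the multiply rounds just
once):

    s = 0.0
    for i in range(10):
        s = s + 0.1             # ten additions, ten chances to round
    print "%.20f" % s           # 0.99999999999999988898 here
    print "%.20f" % (0.1 * 10)  # 1.00000000000000000000
    print s == 1.0              # prints 0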
Whether it's worth trying to explain *that* in the tutorial I don't know! Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From greg@cosc.canterbury.ac.nz Tue Jun 12 01:00:33 2001 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Tue, 12 Jun 2001 12:00:33 +1200 (NZST) Subject: [Python-Dev] PEP 259: Omit printing newline after newline In-Reply-To: <200106112007.f5BK7eW22506@odiug.digicool.com> Message-ID: <200106120000.MAA03201@s454.cosc.canterbury.ac.nz> Guido: > I propose to skip printing the newline when it follows a newline > that came from data. -1 There's too much magic in the way print handles spaces and newlines already. Making it even more magical and inconsistent seems like exactly the wrong direction to be going in. If there are to be any changes to the way print works, I would prefer to see one that removes the need for the softspace flag altogether. The behaviour of a given print should not depend on state left behind by some previous one. Neither should it depend on whether the characters being printed come directly from a string or not. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From tim.one@home.com Tue Jun 12 03:17:24 2001 From: tim.one@home.com (Tim Peters) Date: Mon, 11 Jun 2001 22:17:24 -0400 Subject: [Python-Dev] Feedback on new floating point info in tutorial In-Reply-To: <15141.10257.487549.196538@beluga.mojam.com> Message-ID: [Skip Montanaro, on the in-progess 2.2 Tutorial appendix] > I took a quick look at that appendix. One thing that confused me > a bit was that if 0.1 is approximated by something ever-so-slightly > larger than 0.1, how is it that if you add ten of them together you > wind up with a result that is ever-so-slightly less than 1.0? Good for you, Skip! In all the years I've been explaining this stuff, I only recall one other picking up on that immediately. I'm not writing a book here, though , and any intro numeric programming text emphasizes that n*x is a better bet than adding x together n times. >>> .1 * 10 1.0 >>> Greg Ewing put you on the right track, if you want to figure it out yourself (as Deep Throat said, "follow the bits, Skip -- follow the bits"). > I didn't expect it to be exactly 1.0. Other floating point naifs > may be confused in the same way: > > >>> "%.55f" % 0.5 > '0.5000000000000000000000000000000000000000000000000000000' > >>> "%.55f" % 0.1 > '0.1000000000000000055511151231257827021181583404541015625' > >>> "%.55f" % (0.5+0.1) > '0.5999999999999999777955395074968691915273666381835937500' Note that this output is platform-dependent. For example, the last on Windows is >>> "%.55f" % (0.5+0.1) '0.5999999999999999800000000000000000000000000000000000000' > ... > IEEE-754-is-full-of-traps-for-the-unwary-ly y'rs, All computer arithmetic is; and among binary fp systems, 754 has got to be the best-behaved there is. Know how many irksome bugs I've fixed in Python mucking with different sizes of integers across platforms, and what C does and doesn't guarantee about them? About 20x more than fp bugs. Of course there's 10000x as much integer code in Python too . 
god-created-the-integers-from-1-through-3-inclusive-and-that's-it-ly y'rs - tim From barry@digicool.com Tue Jun 12 04:00:52 2001 From: barry@digicool.com (Barry A. Warsaw) Date: Mon, 11 Jun 2001 23:00:52 -0400 Subject: [Python-Dev] PEP 259: Omit printing newline after newline References: <200106112007.f5BK7eW22506@odiug.digicool.com> <200106120000.MAA03201@s454.cosc.canterbury.ac.nz> Message-ID: <15141.34276.191510.708654@anthem.wooz.org> >>>>> "GE" == Greg Ewing writes: GE> There's too much magic in the way print handles spaces and GE> newlines already. Making it even more magical and inconsistent GE> seems like exactly the wrong direction to be going in. I tend to agree. I'm sometimes bitten by the double newlines, but as I think Andrew brought up in c.l.py, I'd rather see a way to tell readlines() to strip the newlines than to add more magic to print. print-has-all-the-magic-it-needs-now-<>-ly y'rs, -Barry From fredrik@pythonware.com Tue Jun 12 07:21:55 2001 From: fredrik@pythonware.com (Fredrik Lundh) Date: Tue, 12 Jun 2001 08:21:55 +0200 Subject: [Python-Dev] PEP 259: Omit printing newline after newline References: <200106112007.f5BK7eW22506@odiug.digicool.com> Message-ID: <004c01c0f307$fb0540c0$4ffa42d5@hagrid> guido wrote: > Please comment on the following. This came up a while ago in > python-dev and I decided to follow through. I'm making this a PEP > because of the risk of breaking code (which everybody on Python-dev > seemed to think was acceptable). when was this discussed on python-dev? From mal@lemburg.com Tue Jun 12 08:09:05 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Tue, 12 Jun 2001 09:09:05 +0200 Subject: [Python-Dev] Adding .decode() method to Unicode References: <200106112200.f5BM0sO01692@mira.informatik.hu-berlin.de> Message-ID: <3B25C011.125B6462@lemburg.com> "Martin v. Loewis" wrote: > > > I would like to add a .decode() method to Unicode objects and also > > enable the builtin unicode() to accept Unicode object as input. > > -1. What is this good for? See below :) > > While this may seem useless for the currently available encodings, > > it does have some use for codecs which recode Unicode to Unicode, > > e.g. codecs which do XML escaping or Unicode compression. > > I still can see the value. If you think the codec API is good for such > transformation, why not use it? I.e. > > enc,dec,_,_ = codecs.lookup("compress-form-foo") > s = dec(s) Sure and that's the point. I would like to add the .decode() method to make this just as simple as encoding Unicode to UTF-8. Note that strings already have this method: str.encode() str.decode() uni.encode() #uni.decode() # still missing > Furthermore, this seems like a form of hypergeneralization. If you > have this, why not also add > > s = s.decode("capitalize") # instead of s.capitalize() > i = s.decode("int") # instead of int(s) No, that's not the intention. One very useful application for this method is XML unescaping which turns numeric XML entities into Unicode chars. Others are Unicode decompression (using the Unicode compression algorithm) and certain forms of Unicode normalization. The key argument for these interfaces is that they provide an extensible transformation mechanism for string and binary data. > > Any objections ? > > Yes, I think this should not be added. 
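For concreteness, the sort of transformation chaining meant here looks
like this (a sketch which assumes the string-to-string codecs from the
recent checkins, "zlib" and "base64", are registered under those names):

    data = "some fairly repetitive text " * 4

    # Each step is just another registered codec.
    packed = data.encode("zlib").encode("base64")
    print packed

    # Decoding peels the layers off again, symmetrically.
    print repr(packed.decode("base64").decode("zlib"))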
-- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From tim.one@home.com Tue Jun 12 08:29:02 2001 From: tim.one@home.com (Tim Peters) Date: Tue, 12 Jun 2001 03:29:02 -0400 Subject: [Python-Dev] PEP 259: Omit printing newline after newline In-Reply-To: <004c01c0f307$fb0540c0$4ffa42d5@hagrid> Message-ID: [/F] > when was this discussed on python-dev? It wasn't -- it actually came up on one of the SourceForge mailing lists ... ah, of course, tried to search but "Geocrawler is down for nightly database maintenance". They sure have long nights . I'm guessing it's the python-iterators list. It spun off of a thread where Guido was wondering whether one of the new ways to spell "iterate over a file" should return lines without trailing \n, so that e.g. for line in sys.stdin: print line wasn't a surprise. I opined it would be better to make all ways of iterating a file do the same thing, but change print instead. We both agreed that couldn't happen. But then I couldn't find any code it would break, only code of the form print line, where the "," was trying to suppress the extra newline, and that would continue to work the same way even if print were changed. The notion that legions of people are using print line as an obscure way to get double-spacing is taking me by surprise. Nobody on the iterators list had this objection. win-some-lose-some-lose-some-lose-some-lose-some-ly y'rs - tim From mal@lemburg.com Tue Jun 12 08:35:08 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Tue, 12 Jun 2001 09:35:08 +0200 Subject: [Python-Dev] Adding Asian codecs to the core References: <20010611174230.0625E99C8D@waltz.rahul.net> Message-ID: <3B25C62C.969B40B3@lemburg.com> Aahz Maruch wrote: > > M.-A. Lemburg wrote: > > > > Tamito KAJIYAMA recently announced that he changed the licenses > > on his Japanese codecs from GPL to a BSD variant. This is great > > news since this would allow adding the codecs to the Python core > > which would certainly attract more users to Python in Asia. > > > > The codecs are 280kB when compressed as .tar.gz file. > > +0 > > I like the idea, am uncomfortable with that amount of space. Tamito corrected me about the size (his file includes the .pyc byte code files): the correct size for the sources is 143kB -- almost half of what I initially wrote. If that should still be too much, there are probably some ways to further compress the size of the mapping tables which could be investigated. PS: Tamito is very thrilled about getting his codecs into the core and I am quite certain that he is also prepared to maintain them (I have put him on CC). -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From tim@digicool.com Tue Jun 12 08:37:55 2001 From: tim@digicool.com (Tim Peters) Date: Tue, 12 Jun 2001 03:37:55 -0400 Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/Include longobject.h,2.19,2.20 In-Reply-To: <3B25C116.3E65A32D@lemburg.com> Message-ID: [M.-A. Lemburg] > I have tried to compile longobject.c/h on a HP-UX box and am getting > warnings about MIN/MAX being redefined. Perhaps you should add > an #undef for these before the #define ?! I changed nothing relevant here. Are you certain this is a new problem? 
The MIN/MAX macros have been in longobject.c for a long time, and I didn't touch them. In any case, I'm not inclined to fiddle things on a box where I can't see a problem so can't know whether I'm fixing it or just creating new problems. If you can figure out why it's happening on that box, and it's a legit problem there, feel free to fix it. From SBrunning@trisystems.co.uk Tue Jun 12 09:25:19 2001 From: SBrunning@trisystems.co.uk (Simon Brunning) Date: Tue, 12 Jun 2001 09:25:19 +0100 Subject: [Python-Dev] RE: PEP 259: Omit printing newline after newline Message-ID: <31575A892FF6D1118F5800600846864D78BD25@intrepid> > From: Guido van Rossum [SMTP:guido@digicool.com] > In order to avoid having to add yet another magic variable to file > objects, I propose to give the existing 'softspace' variable an > extra meaning: a negative value will mean "the last data written > ended in a newline so no space *or* newline is required." Better another magic variable than a magic value for an old one, I think. Cheers, Simon Brunning TriSystems Ltd. sbrunning@trisystems.co.uk ----------------------------------------------------------------------- The information in this email is confidential and may be legally privileged. It is intended solely for the addressee. Access to this email by anyone else is unauthorised. If you are not the intended recipient, any disclosure, copying, distribution, or any action taken or omitted to be taken in reliance on it, is prohibited and may be unlawful. TriSystems Ltd. cannot accept liability for statements made which are clearly the senders own. From thomas@xs4all.net Tue Jun 12 09:33:30 2001 From: thomas@xs4all.net (Thomas Wouters) Date: Tue, 12 Jun 2001 10:33:30 +0200 Subject: [Python-Dev] PEP 259: Omit printing newline after newline In-Reply-To: ; from tim.one@home.com on Tue, Jun 12, 2001 at 03:29:02AM -0400 References: <004c01c0f307$fb0540c0$4ffa42d5@hagrid> Message-ID: <20010612103330.D690@xs4all.nl> On Tue, Jun 12, 2001 at 03:29:02AM -0400, Tim Peters wrote: > [/F] > > when was this discussed on python-dev? > It wasn't -- it actually came up on one of the SourceForge mailing lists ... > I'm guessing it's the python-iterators list. I'm guessing the same thing, because I *did* see the proposal somewhere. I recall thinking 'that might work' but not much else, anyway. > The notion that legions of people are using > print line > as an obscure way to get double-spacing is taking me by surprise. Bah, humbug! (And you can quote me on that.) Backward compatibility is not an issue -- that's why we have future-imports and warning mechanisms. Import smart-print from future to get the new behaviour, and warn whenever print *would* *have* printed one newline less otherwise. Regardless, I'm -1 on this change. Not because of backward compatibility problem, but because of what GregE said. Let's not make print even more magically unpredictably confusing than it already is, with comma's that do something magical, softspace to control that magic, and shifting the print operator to the right :-) Why can't we use for line in file: print line, to print all lines in a file ? Softspace doesn't seem to add a space (though I had to write a testcase to make sure ;) and 'explicit is better than implicit'. I'd also prefer special syntax to control the softspace behaviour, like say: print "spam:", "ham" : "and" : "eggs" to print 'spamandeggs' without a space inbetween. Too late for that, I 'spose :) -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! 
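For the record, the single-spacing idiom Thomas asks about does work
today, for the reason he half-trusts: after a string ending in a newline,
the softspace flag is already suppressed, so the trailing comma adds
neither a blank line nor a stray leading space (a quick check, assuming
there is an /etc/passwd to read):

    for line in open("/etc/passwd").readlines():
        # The comma suppresses print's own newline; softspace stays off
        # because line already ends in '\n'.
        print line,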
From martin@loewis.home.cs.tu-berlin.de Tue Jun 12 10:42:52 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Tue, 12 Jun 2001 11:42:52 +0200 Subject: [Python-Dev] Adding .decode() method to Unicode In-Reply-To: "mal@lemburg.com"'s message of Tue, 12 Jun 2001 09:09:05 +0200 Message-ID: <200106120942.f5C9gqT02418@mira.informatik.hu-berlin.de> > str.encode() > str.decode() > uni.encode() > #uni.decode() # still missing It's not missing. str.decode and uni.encode go through a single codec; that's easy. str.encode is somewhat more confusing, because it really is unicode(str).encode. Now, you are not proposing that uni.decode is str(uni).decode, are you? If not that, what else would it mean? And if it means something else, it is clearly not symmetric to str.encode, so it is not "missing". > One very useful application for this method is XML unescaping > which turns numeric XML entities into Unicode chars. Ok. Please show me how that would work. More precisely, please write a PEP describing the rationale for this feature, including use case examples and precise semantics of the proposed addition. > The key argument for these interfaces is that they provide > an extensible transformation mechanism for string and binary > data. That is too general for me to understand; I need to see detailed examples that solve real-world problems. Regards, Martin P.S. I don't think that unescaping XML characters entities into Unicode characters is a useful application in itself. This is normally done by the XML parser, which not only has to deal with character entities, but also with general entities and a lot of other markup. Very few people write XML parsers, and they are using the string methods and the sre module successfully (if the parser is written in Python - a C parser would do the unescaping before even passing the text to Python). From thomas@xs4all.net Tue Jun 12 11:02:03 2001 From: thomas@xs4all.net (Thomas Wouters) Date: Tue, 12 Jun 2001 12:02:03 +0200 Subject: [Python-Dev] PEP 259: Omit printing newline after newline In-Reply-To: <20010612103330.D690@xs4all.nl>; from thomas@xs4all.net on Tue, Jun 12, 2001 at 10:33:30AM +0200 References: <004c01c0f307$fb0540c0$4ffa42d5@hagrid> <20010612103330.D690@xs4all.nl> Message-ID: <20010612120203.E690@xs4all.nl> On Tue, Jun 12, 2001 at 10:33:30AM +0200, Thomas Wouters wrote: > print "spam:", "ham" : "and" : "eggs" > to print 'spamandeggs' without a space inbetween. Err. I meant "hamandeggs" with no space inbetween. -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From mal@lemburg.com Tue Jun 12 11:13:21 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Tue, 12 Jun 2001 12:13:21 +0200 Subject: [Python-Dev] Adding .decode() method to Unicode References: <200106120942.f5C9gqT02418@mira.informatik.hu-berlin.de> Message-ID: <3B25EB41.807C2C51@lemburg.com> "Martin v. Loewis" wrote: > > > str.encode() > > str.decode() > > uni.encode() > > #uni.decode() # still missing > > It's not missing. str.decode and uni.encode go through a single codec; > that's easy. str.encode is somewhat more confusing, because it really > is unicode(str).encode. Now, you are not proposing that uni.decode is > str(uni).decode, are you? No. uni.decode() will (just like the other methods) directly interface to the codecs decoder -- there is no magic conversion involved. It is meant to be used by Unicode-Unicode codecs > If not that, what else would it mean? 
And if it means something else, > it is clearly not symmetric to str.encode, so it is not "missing". It is in the sense that strings support this method and Unicode currently doesn't. > > One very useful application for this method is XML unescaping > > which turns numeric XML entities into Unicode chars. > > Ok. Please show me how that would work. More precisely, please write a > PEP describing the rationale for this feature, including use case > examples and precise semantics of the proposed addition. There's no need for a PEP. This addition is much too simple to require a PEP on its own. As for use cases: I have already given a whole bunch of them (Unicode compression, normalization, escaping in various ways). Codecs are in no way constrained to only interface between strings and Unicode. There are many other possibilities for their usage out there. Just look at the latest checkins for a bunch of string-string codecs for examples of codecs which solve common real-life problems and do not interface to Unicode. > > The key argument for these interfaces is that they provide > > an extensible transformation mechanism for string and binary > > data. > > That is too general for me to understand; I need to see detailed > examples that solve real-world problems. > > Regards, > Martin > > P.S. I don't think that unescaping XML characters entities into > Unicode characters is a useful application in itself. This is normally > done by the XML parser, which not only has to deal with character > entities, but also with general entities and a lot of other markup. > Very few people write XML parsers, and they are using the string > methods and the sre module successfully (if the parser is written in > Python - a C parser would do the unescaping before even passing the > text to Python). True, but not all XML text out there is meant for XML parsers to read ;-). Preprocessing of e.g. XML text in Python is a rather common thing to do and this is what the direct codec access methods are meant for. -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From fredrik@pythonware.com Tue Jun 12 11:46:36 2001 From: fredrik@pythonware.com (Fredrik Lundh) Date: Tue, 12 Jun 2001 12:46:36 +0200 Subject: [Python-Dev] Adding .decode() method to Unicode References: <200106120942.f5C9gqT02418@mira.informatik.hu-berlin.de> <3B25EB41.807C2C51@lemburg.com> Message-ID: <00aa01c0f32c$f4a4b740$0900a8c0@spiff> mal wrote: > > Ok. Please show me how that would work. More precisely, please write a > > PEP describing the rationale for this feature, including use case > > examples and precise semantics of the proposed addition. > > There's no need for a PEP. This addition is much too simple > to require a PEP on its own. we'd been better off if you'd written a PEP before you started adding decode and encode stuff. what's currently implemented is ugly enough; adding more warts won't make it any prettier. 
-1 on anything except a PEP that covers *all* aspects of encode/decode (including things that are already implemented) From fredrik@pythonware.com Tue Jun 12 11:47:49 2001 From: fredrik@pythonware.com (Fredrik Lundh) Date: Tue, 12 Jun 2001 12:47:49 +0200 Subject: [Python-Dev] PEP 259: Omit printing newline after newline References: <004c01c0f307$fb0540c0$4ffa42d5@hagrid> <20010612103330.D690@xs4all.nl> <20010612120203.E690@xs4all.nl> Message-ID: <00ba01c0f32d$208d4160$0900a8c0@spiff> Thomas Wouters wrote: > > print "spam:", "ham" : "and" : "eggs" > > > to print 'spamandeggs' without a space inbetween. > > Err. I meant "hamandeggs" with no space inbetween. and "+" (or plain whitespace) instead of ":", right? From fredrik@pythonware.com Tue Jun 12 11:55:27 2001 From: fredrik@pythonware.com (Fredrik Lundh) Date: Tue, 12 Jun 2001 12:55:27 +0200 Subject: [Python-Dev] RE: PEP 259: Omit printing newline after newline References: <31575A892FF6D1118F5800600846864D78BD25@intrepid> Message-ID: <00c301c0f32e$31cd7ed0$0900a8c0@spiff> simon wrote: > > > In order to avoid having to add yet another magic variable to file > > objects, I propose to give the existing 'softspace' variable an > > extra meaning: a negative value will mean "the last data written > > ended in a newline so no space *or* newline is required." > > Better another magic variable than a magic value for an old one, I think. many file-like C types (e.g. cStringIO) already have special code to deal with a softspace integer attribute. From mal@lemburg.com Tue Jun 12 11:57:32 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Tue, 12 Jun 2001 12:57:32 +0200 Subject: [Python-Dev] Adding .decode() method to Unicode References: <200106120942.f5C9gqT02418@mira.informatik.hu-berlin.de> <3B25EB41.807C2C51@lemburg.com> <00aa01c0f32c$f4a4b740$0900a8c0@spiff> Message-ID: <3B25F59C.9AAF604A@lemburg.com> Fredrik Lundh wrote: > > mal wrote: > > > Ok. Please show me how that would work. More precisely, please write a > > > PEP describing the rationale for this feature, including use case > > > examples and precise semantics of the proposed addition. > > > > There's no need for a PEP. This addition is much too simple > > to require a PEP on its own. > > we'd been better off if you'd written a PEP before you started > adding decode and encode stuff. what's currently implemented > is ugly enough; adding more warts won't make it any prettier. Could you please be more specific about what is "ugly" in the current implementation ? The .encode/.decode methods are a direct interface to the codecs encoder and decoder APIs. I can't find anything ugly about this in general except maybe some of the constraints which were originally put into these interface on the grounds of using them for string/Unicode conversions -- I have already removed most of these and would like to clean this up completely before 2.2 gets out. > -1 on anything except a PEP that covers *all* aspects of > encode/decode (including things that are already implemented) Gee, Guido starts breaking code and nobody objects; I try to clean up some left-overs in the Unicode implementation and people start huge discussions about it. Something is backwards here... -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From martin@loewis.home.cs.tu-berlin.de Tue Jun 12 12:00:40 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. 
Loewis) Date: Tue, 12 Jun 2001 13:00:40 +0200 Subject: [Python-Dev] Adding .decode() method to Unicode In-Reply-To: <3B25EB41.807C2C51@lemburg.com> (mal@lemburg.com) References: <200106120942.f5C9gqT02418@mira.informatik.hu-berlin.de> <3B25EB41.807C2C51@lemburg.com> Message-ID: <200106121100.f5CB0em03009@mira.informatik.hu-berlin.de> > > > str.encode() > > > str.decode() > > > uni.encode() > > > #uni.decode() # still missing > > > > It's not missing. str.decode and uni.encode go through a single codec; > > that's easy. str.encode is somewhat more confusing, because it really > > is unicode(str).encode. Now, you are not proposing that uni.decode is > > str(uni).decode, are you? > > No. uni.decode() will (just like the other methods) directly > interface to the codecs decoder -- there is no magic conversion > involved. It is meant to be used by Unicode-Unicode codecs When invoking "Hallo".encode("utf-8"), two conversions are executed: first the default decoding into Unicode, then the UTF-8 encoding. Of course, that is not the intended use (but then, is the intended use documented anywhere?): instead, people should write "Hallo".encode("base64") instead. This is an example I can understand, although I'm not sure why it is inherently better to write this instead of writing base64.encodestring("Hallo"). > > If not that, what else would it mean? And if it means something else, > > it is clearly not symmetric to str.encode, so it is not "missing". > > It is in the sense that strings support this method and Unicode > currently doesn't. The rationale for string.encode is weak: it argues that string->string conversions are frequent enough to justify this API, even though these conversions have nothing to do with coded character sets. So far, I can see *no* rationale for unicode.decode. > There's no need for a PEP. This addition is much too simple > to require a PEP on its own. PEP 1 says: # We intend PEPs to be the primary mechanisms for proposing new # features, for collecting community input on an issue, and for # documenting the design decisions that have gone into Python. The # PEP author is responsible for building consensus within the # community and documenting dissenting opinions. So we have a proposal for a new feature, and we have dissenting opinions. Who are you to decide that this additions is too simple to require a PEP on its own? > As for use cases: I have already given a whole bunch of them > (Unicode compression, normalization, escaping in various ways). I was asking for specific examples: Names of specific codecs that you want to implement, and application code fragments using these specific codecs. I don't know how to use Unicode compression if I had such this proposed feature, for example. I know what XML escaping is, and I cannot see how this feature would help. > True, but not all XML text out there is meant for XML parsers to > read ;-). Preprocessing of e.g. XML text in Python is a rather common > thing to do and this is what the direct codec access methods are > meant for. Can you give an example of an application which processes XML without a parser, but with converting character entities (preferably open-source, so I can study its code)? I wonder whether they get CDATA sections right... MAL, I really mean that: Please don't make claims that something is common or useful without giving an *exact* example. Regards, Martin P.S. This insistence on adding Unicode and string methods makes it appear as if the author of the codecs module now thinks that the API of it sucks. 
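For readers keeping score, the three spellings under debate look like
this side by side (a sketch; it assumes the just-checked-in base64 codec
is registered under the name "base64", and note that the raw callables
from codecs.lookup() return an (output, length consumed) pair, which the
bare lookup idiom earlier in the thread glosses over):

    import codecs
    import base64

    s = "Hallo"

    a = base64.encodestring(s)            # the module API
    b = s.encode("base64")                # the string-method shortcut

    enc, dec, _, _ = codecs.lookup("base64")
    c, consumed = enc(s)                  # codec callables return a pair

    print a == b == c                     # all three spell the same thing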
From thomas@xs4all.net Tue Jun 12 12:16:05 2001 From: thomas@xs4all.net (Thomas Wouters) Date: Tue, 12 Jun 2001 13:16:05 +0200 Subject: [Python-Dev] PEP 259: Omit printing newline after newline In-Reply-To: <00ba01c0f32d$208d4160$0900a8c0@spiff> References: <004c01c0f307$fb0540c0$4ffa42d5@hagrid> <20010612103330.D690@xs4all.nl> <20010612120203.E690@xs4all.nl> <00ba01c0f32d$208d4160$0900a8c0@spiff> Message-ID: <20010612131605.Q22849@xs4all.nl> On Tue, Jun 12, 2001 at 12:47:49PM +0200, Fredrik Lundh wrote: > Thomas Wouters wrote: > > > print "spam:", "ham" : "and" : "eggs" > > > > > to print 'spamandeggs' without a space inbetween. > > Err. I meant "hamandeggs" with no space inbetween. > and "+" (or plain whitespace) instead of ":", right? Not really. That would only work for string-types. Print auto-converts, remember ? At least the ':' is unambiguous. -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From mal@lemburg.com Tue Jun 12 12:42:31 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Tue, 12 Jun 2001 13:42:31 +0200 Subject: [Python-Dev] Adding .decode() method to Unicode References: <200106120942.f5C9gqT02418@mira.informatik.hu-berlin.de> <3B25EB41.807C2C51@lemburg.com> <200106121100.f5CB0em03009@mira.informatik.hu-berlin.de> Message-ID: <3B260027.7DD33246@lemburg.com> "Martin v. Loewis" wrote: > > > > > str.encode() > > > > str.decode() > > > > uni.encode() > > > > #uni.decode() # still missing > > > > > > It's not missing. str.decode and uni.encode go through a single codec; > > > that's easy. str.encode is somewhat more confusing, because it really > > > is unicode(str).encode. Now, you are not proposing that uni.decode is > > > str(uni).decode, are you? > > > > No. uni.decode() will (just like the other methods) directly > > interface to the codecs decoder -- there is no magic conversion > > involved. It is meant to be used by Unicode-Unicode codecs > > When invoking "Hallo".encode("utf-8"), two conversions are executed: > first the default decoding into Unicode, then the UTF-8 encoding. Of > course, that is not the intended use (but then, is the intended use > documented anywhere?): instead, people should write > "Hallo".encode("base64") instead. This is an example I can understand, > although I'm not sure why it is inherently better to write this > instead of writing base64.encodestring("Hallo"). Please note that the conversion from string to Unicode is done by the codec, not the .encode() interface. > > > If not that, what else would it mean? And if it means something else, > > > it is clearly not symmetric to str.encode, so it is not "missing". > > > > It is in the sense that strings support this method and Unicode > > currently doesn't. > > The rationale for string.encode is weak: it argues that string->string > conversions are frequent enough to justify this API, even though these > conversions have nothing to do with coded character sets. You still don't get it: codecs can be used for much more than just character set conversion ! > So far, I can see *no* rationale for unicode.decode. > > > There's no need for a PEP. This addition is much too simple > > to require a PEP on its own. > > PEP 1 says: > > # We intend PEPs to be the primary mechanisms for proposing new > # features, for collecting community input on an issue, and for > # documenting the design decisions that have gone into Python. The > # PEP author is responsible for building consensus within the > # community and documenting dissenting opinions. 
> > So we have a proposal for a new feature, and we have dissenting > opinions. Who are you to decide that this additions is too simple to > require a PEP on its own? So you want a PEP for each and every small addition to in the core ?! (I am not talking about features which might break code !) > > As for use cases: I have already given a whole bunch of them > > (Unicode compression, normalization, escaping in various ways). > > I was asking for specific examples: Names of specific codecs that you > want to implement, and application code fragments using these specific > codecs. I don't know how to use Unicode compression if I had such this > proposed feature, for example. I know what XML escaping is, and I > cannot see how this feature would help. I think I have given enough examples in this thread already. See below for some more. > > True, but not all XML text out there is meant for XML parsers to > > read ;-). Preprocessing of e.g. XML text in Python is a rather common > > thing to do and this is what the direct codec access methods are > > meant for. > > Can you give an example of an application which processes XML without > a parser, but with converting character entities (preferably > open-source, so I can study its code)? I wonder whether they get CDATA > sections right... MAL, I really mean that: Please don't make claims > that something is common or useful without giving an *exact* example. Yes, I am using these feature in real code and no, I can't show it to you because it's closed source. XML is only one example where this would be useful, HTML is another text format which would benefit from it, URL encoding is yet another application. You basically find these applications in all situations where some form of escaping is needed. What I am trying to do here is simplify codec access and usage for the casual user. .encode() and .decode() are very intuitive ways to deal with data transformation, IMHO. > Regards, > Martin > > P.S. This insistence on adding Unicode and string methods makes it > appear as if the author of the codecs module now thinks that the API > of it sucks. No comment. -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From barry@digicool.com Tue Jun 12 15:22:26 2001 From: barry@digicool.com (Barry A. Warsaw) Date: Tue, 12 Jun 2001 10:22:26 -0400 Subject: [Python-Dev] Adding .decode() method to Unicode References: <200106120942.f5C9gqT02418@mira.informatik.hu-berlin.de> <3B25EB41.807C2C51@lemburg.com> Message-ID: <15142.9634.842402.241225@anthem.wooz.org> >>>>> "M" == M writes: M> Codecs are in no way constrained to only interface between M> strings and Unicode. There are many other possibilities for M> their usage out there. Just look at the latest checkins for a M> bunch of string-string codecs for examples of codecs which M> solve common real-life problems and do not interface to M> Unicode. Having just followed this thread tangentially, I do have to say it seems quite cool to be able to do something like the following in Python 2.2: >>> s = msg['from'] >>> parts = s.split('?') >>> if parts[2].lower() == 'q': ... name = parts[3].decode('quopri') ... elif parts[2].lower() == 'b': ... name = parts[3].decode('base64') ... 
-Barry

From fredrik@pythonware.com Tue Jun 12 15:45:16 2001
From: fredrik@pythonware.com (Fredrik Lundh)
Date: Tue, 12 Jun 2001 16:45:16 +0200
Subject: [Python-Dev] Adding .decode() method to Unicode
References: <200106120942.f5C9gqT02418@mira.informatik.hu-berlin.de>
 <3B25EB41.807C2C51@lemburg.com> <15142.9634.842402.241225@anthem.wooz.org>
Message-ID: <004b01c0f34e$4d3e96c0$4ffa42d5@hagrid>

barry wrote:

> Having just followed this thread tangentially, I do have to say it
> seems quite cool to be able to do something like the following in
> Python 2.2:
>
> >>> s = msg['from']
> >>> parts = s.split('?')
> >>> if parts[2].lower() == 'q':
> ...     name = parts[3].decode('quopri')
> ... elif parts[2].lower() == 'b':
> ...     name = parts[3].decode('base64')

uhuh? and how exactly is this cooler than being able to do something like the following:

import quopri, base64

s = msg['from']
parts = s.split('?')
if parts[2].lower() == 'q':
    name = quopri.decodestring(parts[3])
elif parts[2].lower() == 'b':
    name = base64.decodestring(parts[3])

(going through the codec registry is slower, and imports more modules, but what's so cool with that?)

From barry@digicool.com Tue Jun 12 15:50:01 2001
From: barry@digicool.com (Barry A. Warsaw)
Date: Tue, 12 Jun 2001 10:50:01 -0400
Subject: [Python-Dev] Adding .decode() method to Unicode
References: <200106120942.f5C9gqT02418@mira.informatik.hu-berlin.de>
 <3B25EB41.807C2C51@lemburg.com> <15142.9634.842402.241225@anthem.wooz.org>
 <004b01c0f34e$4d3e96c0$4ffa42d5@hagrid>
Message-ID: <15142.11289.16053.424966@anthem.wooz.org>

>>>>> "FL" == Fredrik Lundh writes:

FL> uhuh? and how exactly is this cooler than being able to do
FL> something like the following:

| import quopri, base64
| s = msg['from']
| parts = s.split('?')
| if parts[2].lower() == 'q':
|     name = quopri.decodestring(parts[3])
| elif parts[2].lower() == 'b':
|     name = base64.decodestring(parts[3])

FL> (going through the codec registry is slower, and imports more
FL> modules, but what's so cool with that?)

-------------------- snip snip --------------------
Python 2.2a0 (#4, Jun 6 2001, 13:03:36)
[GCC egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)] on linux2
Type "copyright", "credits" or "license" for more information.
>>> import quopri
>>> quopri.decodestring
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
AttributeError: 'quopri' module has no attribute 'decodestring'
>>> quopri.encodestring
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
AttributeError: 'quopri' module has no attribute 'encodestring'
-------------------- snip snip --------------------

Much cooler :) Okay, okay, so we /could/ add encodestring/decodestring to quopri.py, which isn't a bad idea. But it seems to me that the s.encode() s.decode() API is nicely universal for any supported encoding.

but-what-do-i-know?-ly y'rs,
-Barry

From skip@pobox.com (Skip Montanaro) Tue Jun 12 16:32:11 2001
From: skip@pobox.com (Skip Montanaro)
Date: Tue, 12 Jun 2001 10:32:11 -0500
Subject: [Python-Dev] Re: metaclasses -- aka Don Beaudry hook/hack
In-Reply-To:
References:
Message-ID: <15142.13819.477491.993419@beluga.mojam.com>

James> Before I head too deeply into Zope dependencies, I would be
James> interested in knowing whether or not "type(MyClass) ==
James> types.ClassType" and "isinstance(myInstance,MyClass)" work for
James> classes derived from ExtensionClass.
Straight from the horse's mouth: >>> type(gtk.GtkButton) >>> type(gtk.GtkButton) == types.ClassType 0 >>> isinstance(gtk.GtkButton(), gtk.GtkButton) 1 James> (And if so, why do these work for C extension classes using the James> Don Beaudry hook but not for Python classes using the same hook?) You'll have to ask someone with more subject knowledge. (Don would probably be a good start. ;-) I've cc'd python-dev because the experts in this area are all there. -- Skip Montanaro (skip@pobox.com) (847)971-7098 From skip@pobox.com (Skip Montanaro) Tue Jun 12 16:53:24 2001 From: skip@pobox.com (Skip Montanaro) (Skip Montanaro) Date: Tue, 12 Jun 2001 10:53:24 -0500 Subject: [Python-Dev] PEP 259: Omit printing newline after newline In-Reply-To: References: <004c01c0f307$fb0540c0$4ffa42d5@hagrid> Message-ID: <15142.15092.57490.275201@beluga.mojam.com> Tim> The notion that legions of people are using Tim> print line Tim> as an obscure way to get double-spacing is taking me by surprise. Tim> Nobody on the iterators list had this objection. I suspect that most CGI scripts that didn't use any abstraction for HTTP responses suffer from this potential problem. I've been using one abstraction or another for quite awhile now, but I still have a few CGI scripts laying around that still use print to emit headers and bodies of HTTP responses. Skip From barry@digicool.com Tue Jun 12 17:06:53 2001 From: barry@digicool.com (Barry A. Warsaw) Date: Tue, 12 Jun 2001 12:06:53 -0400 Subject: [Python-Dev] PEP 259: Omit printing newline after newline References: <004c01c0f307$fb0540c0$4ffa42d5@hagrid> <15142.15092.57490.275201@beluga.mojam.com> Message-ID: <15142.15901.223641.151562@anthem.wooz.org> >>>>> "SM" == Skip Montanaro writes: SM> I suspect that most CGI scripts that didn't use any SM> abstraction for HTTP responses suffer from this potential SM> problem. I've been using one abstraction or another for quite SM> awhile now, but I still have a few CGI scripts laying around SM> that still use print to emit headers and bodies of HTTP SM> responses. Same here. From paulp@ActiveState.com Tue Jun 12 18:22:31 2001 From: paulp@ActiveState.com (Paul Prescod) Date: Tue, 12 Jun 2001 10:22:31 -0700 Subject: [Python-Dev] Adding .decode() method to Unicode References: <200106120942.f5C9gqT02418@mira.informatik.hu-berlin.de> <3B25EB41.807C2C51@lemburg.com> <15142.9634.842402.241225@anthem.wooz.org> Message-ID: <3B264FD7.86ACB034@ActiveState.com> "Barry A. Warsaw" wrote: > >... > > Having just followed this thread tangentially, I do have to say it > seems quite cool to be able to do something like the following in > Python 2.2: > > >>> s = msg['from'] > >>> parts = s.split('?') > >>> if parts[2].lower() == 'q': > ... name = parts[3].decode('quopri') > ... elif parts[2].lower() == 'b': > ... name = parts[3].decode('base64') > ... I think that the central point is that if code like the above is useful and supported then it needs to be the same for Unicode strings as for 8-bit strings. If the code above is NOT useful and should NOT be supported then we need to undo it before 2.2 ships. This unicode.decode argument is just a proxy for the real argument about the above. I don't feel strongly one way or another about this (ab?)use of the codecs concept, myself, but I do feel strongly that Unicode strings should behave as much as possible like 8-bit strings. -- Take a recipe. Leave a recipe. Python Cookbook! 
http://www.ActiveState.com/pythoncookbook

From paulp@ActiveState.com Tue Jun 12 18:31:54 2001
From: paulp@ActiveState.com (Paul Prescod)
Date: Tue, 12 Jun 2001 10:31:54 -0700
Subject: [Python-Dev] Adding .decode() method to Unicode
References: <200106120942.f5C9gqT02418@mira.informatik.hu-berlin.de>
 <3B25EB41.807C2C51@lemburg.com> <15142.9634.842402.241225@anthem.wooz.org>
 <004b01c0f34e$4d3e96c0$4ffa42d5@hagrid>
Message-ID: <3B26520A.C579D00C@ActiveState.com>

Fredrik Lundh wrote:
>
>...
>
> uhuh? and how exactly is this cooler than being able to do
> something like the following:
>
> import quopri, base64
>...
>
> (going through the codec registry is slower, and imports more
> modules, but what's so cool with that?)

One argument in favor is that the base64 and quopri modules are not standardized today. In fact, Python has a huge problem with standardization of access paradigms in the standard library. We get the best standardization (i.e. of the "file interface") when we force module authors to conform to a standard in order to get some "extra feature" of the standard library.

A counter argument is that the conflation of the concept of Unicode encoding/decoding and other forms of encoding/decoding could be confusing. MAL would not have to keep pointing out that "codecs are for more than Unicode encoding/decoding" if it was obvious.

--
Take a recipe. Leave a recipe.
Python Cookbook! http://www.ActiveState.com/pythoncookbook

From barry@digicool.com Tue Jun 12 19:24:25 2001
From: barry@digicool.com (Barry A. Warsaw)
Date: Tue, 12 Jun 2001 14:24:25 -0400
Subject: [Python-Dev] Adding .decode() method to Unicode
References: <200106120942.f5C9gqT02418@mira.informatik.hu-berlin.de>
 <3B25EB41.807C2C51@lemburg.com> <15142.9634.842402.241225@anthem.wooz.org>
 <3B264FD7.86ACB034@ActiveState.com>
Message-ID: <15142.24153.921774.610559@anthem.wooz.org>

>>>>> "PP" == Paul Prescod writes:

PP> I don't feel strongly one way or another about this (ab?)use
PP> of the codecs concept, myself, but I do feel strongly that
PP> Unicode strings should behave as much as possible like 8-bit
PP> strings.

I'd agree with both statements.

time-to-add-{encode,decode}string()-to-quopri-ly y'rs,
-Barry

From martin@loewis.home.cs.tu-berlin.de Tue Jun 12 19:00:19 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Tue, 12 Jun 2001 20:00:19 +0200
Subject: [Python-Dev] Adding .decode() method to Unicode
In-Reply-To: <3B260027.7DD33246@lemburg.com> (mal@lemburg.com)
References: <200106120942.f5C9gqT02418@mira.informatik.hu-berlin.de>
 <3B25EB41.807C2C51@lemburg.com> <200106121100.f5CB0em03009@mira.informatik.hu-berlin.de>
 <3B260027.7DD33246@lemburg.com>
Message-ID: <200106121800.f5CI0Jw00946@mira.informatik.hu-berlin.de>

> > So we have a proposal for a new feature, and we have dissenting
> > opinions. Who are you to decide that this addition is too simple to
> > require a PEP on its own?
>
> So you want a PEP for each and every small addition to the
> core ?! (I am not talking about features which might break code !)

No, additions that find immediate consent and come with complete patches (including documentation and test cases) don't need this overhead. Features that find resistance should go through the full process.

> > I was asking for specific examples: Names of specific codecs that you
> > want to implement, and application code fragments using these specific
> > codecs. I don't know how I would use Unicode compression if I had this
> > proposed feature, for example.
I know what XML escaping is, and I > > cannot see how this feature would help. > > I think I have given enough examples in this thread already. See > below for some more. I haven't seen a single example involving actual Python code. > > > True, but not all XML text out there is meant for XML parsers to > > > read ;-). Preprocessing of e.g. XML text in Python is a rather common > > > thing to do and this is what the direct codec access methods are > > > meant for. > > > > Can you give an example of an application [...] > > Yes, I am using these feature in real code and no, I can't show it to > you because it's closed source. Not very convincing... If this is "a rather common thing to do", it shouldn't be hard to find examples in other people's code, shouldn't it? > XML is only one example where this would be useful, HTML is another > text format which would benefit from it, URL encoding is yet another > application. You basically find these applications in all situations > where some form of escaping is needed. These are all not specific examples. I'm still looking for a specific application that might use this feature, and specific codec names and implementations. Regards, Martin From martin@loewis.home.cs.tu-berlin.de Tue Jun 12 19:08:31 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Tue, 12 Jun 2001 20:08:31 +0200 Subject: [Python-Dev] Adding .decode() method to Unicode In-Reply-To: <15142.9634.842402.241225@anthem.wooz.org> (barry@digicool.com) References: <200106120942.f5C9gqT02418@mira.informatik.hu-berlin.de> <3B25EB41.807C2C51@lemburg.com> <15142.9634.842402.241225@anthem.wooz.org> Message-ID: <200106121808.f5CI8VO00950@mira.informatik.hu-berlin.de> > Having just followed this thread tangentially, I do have to say it > seems quite cool to be able to do something like the following in > Python 2.2: > > >>> s = msg['from'] > >>> parts = s.split('?') > >>> if parts[2].lower() == 'q': > ... name = parts[3].decode('quopri') > ... elif parts[2].lower() == 'b': > ... name = parts[3].decode('base64') > ... What is the type of parts[3] here? If it is a plain string, it is already possible: >>> 'SGVsbG8=\n'.decode("base64") 'Hello' I doubt you'd ever have a Unicode string that represents a base64-encoded byte string, and if you had, .decode would probably do the wrong thing: >>> import codecs >>> enc,dec,_,_ = codecs.lookup("base64") >>> dec(u'SGVsbG8=\n') ('Hello', 9) Note that this returns a byte string, not a Unicode string. Regards, Martin From martin@loewis.home.cs.tu-berlin.de Tue Jun 12 19:18:45 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Tue, 12 Jun 2001 20:18:45 +0200 Subject: [Python-Dev] Adding .decode() method to Unicode In-Reply-To: <3B264FD7.86ACB034@ActiveState.com> (message from Paul Prescod on Tue, 12 Jun 2001 10:22:31 -0700) References: <200106120942.f5C9gqT02418@mira.informatik.hu-berlin.de> <3B25EB41.807C2C51@lemburg.com> <15142.9634.842402.241225@anthem.wooz.org> <3B264FD7.86ACB034@ActiveState.com> Message-ID: <200106121818.f5CIIjj00983@mira.informatik.hu-berlin.de> > > Having just followed this thread tangentially, I do have to say it > > seems quite cool to be able to do something like the following in > > Python 2.2: > > > > >>> s = msg['from'] > > >>> parts = s.split('?') > > >>> if parts[2].lower() == 'q': > > ... name = parts[3].decode('quopri') > > ... elif parts[2].lower() == 'b': > > ... name = parts[3].decode('base64') > > ... 
> > I think that the central point is that if code like the above is useful > and supported then it needs to be the same for Unicode strings as for > 8-bit strings. Why is that? An encoding, by nature, is something that produces a byte sequence from some input. So you can only decode byte sequences, not character strings. > If the code above is NOT useful and should NOT be supported then we > need to undo it before 2.2 ships. This unicode.decode argument is > just a proxy for the real argument about the above. No, it isn't. The code is useful for byte strings, but not for Unicode strings. > I don't feel strongly one way or another about this (ab?)use of the > codecs concept, myself, but I do feel strongly that Unicode strings > should behave as much as possible like 8-bit strings. Not at all. Byte strings and character strings are as different as are byte strings and lists of DOM child nodes (i.e. the only common thing is that they are sequences). Regards, Martin From barry@digicool.com Tue Jun 12 19:35:10 2001 From: barry@digicool.com (Barry A. Warsaw) Date: Tue, 12 Jun 2001 14:35:10 -0400 Subject: [Python-Dev] Adding .decode() method to Unicode References: <200106120942.f5C9gqT02418@mira.informatik.hu-berlin.de> <3B25EB41.807C2C51@lemburg.com> <15142.9634.842402.241225@anthem.wooz.org> <200106121808.f5CI8VO00950@mira.informatik.hu-berlin.de> Message-ID: <15142.24798.941322.762791@anthem.wooz.org> >>>>> "MvL" == Martin v Loewis writes: MvL> What is the type of parts[3] here? If it is a plain string, MvL> it is already possible: >> 'SGVsbG8=\n'.decode("base64") MvL> 'Hello' But only in Python 2.2a0 currently, right? And yes, the type is plain string. MvL> I doubt you'd ever have a Unicode string that represents a MvL> base64-encoded byte string, and if you had, .decode would MvL> probably do the wrong thing: >> import codecs enc,dec,_,_ = codecs.lookup("base64") >> dec(u'SGVsbG8=\n') MvL> ('Hello', 9) MvL> Note that this returns a byte string, not a Unicode string. I trust you on that. ;) I've only played with this tangentially since this thread cropped up. -Barry From paulp@ActiveState.com Tue Jun 12 19:51:25 2001 From: paulp@ActiveState.com (Paul Prescod) Date: Tue, 12 Jun 2001 11:51:25 -0700 Subject: [Python-Dev] Adding .decode() method to Unicode References: <200106120942.f5C9gqT02418@mira.informatik.hu-berlin.de> <3B25EB41.807C2C51@lemburg.com> <15142.9634.842402.241225@anthem.wooz.org> <3B264FD7.86ACB034@ActiveState.com> <200106121818.f5CIIjj00983@mira.informatik.hu-berlin.de> Message-ID: <3B2664AD.B560D685@ActiveState.com> "Martin v. Loewis" wrote: > >... > > Why is that? An encoding, by nature, is something that produces a byte > sequence from some input. So you can only decode byte sequences, not > character strings. According to this logic, it is not logical to "encode" a Unicode string into a base64'd Unicode string or "decode" a Unicode string from a base64'd Unicode string. But I have seen circumstances where one XML document is base64'd into another. In that circumstance, it would be useful to say node.nodeValue.decode("base64"). Let me turn the argument around? What would the *harm* in having 8-bit strings and Unicode strings behave similarly in this manner? >... > Not at all. Byte strings and character strings are as different as are > byte strings and lists of DOM child nodes (i.e. the only common thing > is that they are sequences). 8-bit strings are not purely byte strings. They are also "character strings". 
That's why they have methods like "capitalize", "isalpha", "lower", "swapcase", "title" and so forth. DOM nodes and byte strings have virtually no methods in common. We could argue angels on the head of a pin until the cows come home but 90% of all Python users think of 8-bit strings as strings of characters. So arguments based on the idea that they are not "really" character strings are wishful thinking. -- Take a recipe. Leave a recipe. Python Cookbook! http://www.ActiveState.com/pythoncookbook From martin@loewis.home.cs.tu-berlin.de Tue Jun 12 21:01:39 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Tue, 12 Jun 2001 22:01:39 +0200 Subject: [Python-Dev] Adding .decode() method to Unicode In-Reply-To: <15142.24798.941322.762791@anthem.wooz.org> (barry@digicool.com) References: <200106120942.f5C9gqT02418@mira.informatik.hu-berlin.de> <3B25EB41.807C2C51@lemburg.com> <15142.9634.842402.241225@anthem.wooz.org> <200106121808.f5CI8VO00950@mira.informatik.hu-berlin.de> <15142.24798.941322.762791@anthem.wooz.org> Message-ID: <200106122001.f5CK1de01350@mira.informatik.hu-berlin.de> > MvL> What is the type of parts[3] here? If it is a plain string, > MvL> it is already possible: > > >> 'SGVsbG8=\n'.decode("base64") > MvL> 'Hello' > > But only in Python 2.2a0 currently, right? Exactly, since MAL's last patch. If people think that byte strings must behave exactly as Unicode strings, I'd rather prefer to back out this patch instead of adding unicode.decode. Personally, I think the status quo is fine and should not be changed. Regards, Martin From aahz@rahul.net Wed Jun 13 00:48:14 2001 From: aahz@rahul.net (Aahz Maruch) Date: Tue, 12 Jun 2001 16:48:14 -0700 (PDT) Subject: [Python-Dev] Adding Asian codecs to the core In-Reply-To: <3B25C62C.969B40B3@lemburg.com> from "M.-A. Lemburg" at Jun 12, 2001 09:35:08 AM Message-ID: <20010612234815.2C90599C82@waltz.rahul.net> M.-A. Lemburg wrote: > Aahz Maruch wrote: >> M.-A. Lemburg wrote: >>> >>> Tamito KAJIYAMA recently announced that he changed the licenses >>> on his Japanese codecs from GPL to a BSD variant. This is great >>> news since this would allow adding the codecs to the Python core >>> which would certainly attract more users to Python in Asia. >>> >>> The codecs are 280kB when compressed as .tar.gz file. >> >> +0 >> >> I like the idea, am uncomfortable with that amount of space. > > Tamito corrected me about the size (his file includes the .pyc > byte code files): the correct size for the sources is 143kB -- > almost half of what I initially wrote. That makes me +0.5, possibly a bit higher. -- --- Aahz (@pobox.com) Hugs and backrubs -- I break Rule 6 <*> http://www.rahul.net/aahz/ Androgynous poly kinky vanilla queer het Pythonista I don't really mind a person having the last whine, but I do mind someone else having the last self-righteous whine. From greg@cosc.canterbury.ac.nz Wed Jun 13 00:57:35 2001 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Wed, 13 Jun 2001 11:57:35 +1200 (NZST) Subject: [Python-Dev] PEP 259: Omit printing newline after newline In-Reply-To: <20010612103330.D690@xs4all.nl> Message-ID: <200106122357.LAA03316@s454.cosc.canterbury.ac.nz> Thomas Wouters : > I'd also prefer special syntax to control the softspace > behaviour... Too late for that, I 'spose Maybe not. I'd suggest spelling "don't add a newline or a space after this" as: print a, b, c... This could coexist with the current softspace behaviour, and the use of a trailing comma could be deprecated. 
After a suitable warning period, the softspace flag could then be removed. > print "spam:", "ham" : "and" : "eggs" > to print 'spamandeggs' without a space inbetween. I don't think it's so important to have a special syntax for that, since it can be accomplished in other ways without too much difficulty, e.g. print "%s: %s%s%s" % ("spam", "ham", "and", "eggs")... The main thing I'd like is to get rid of the statefulness of the current behaviour. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From greg@cosc.canterbury.ac.nz Wed Jun 13 01:02:40 2001 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Wed, 13 Jun 2001 12:02:40 +1200 (NZST) Subject: [Python-Dev] Adding .decode() method to Unicode In-Reply-To: <00aa01c0f32c$f4a4b740$0900a8c0@spiff> Message-ID: <200106130002.MAA03319@s454.cosc.canterbury.ac.nz> > -1 on anything except a PEP that covers *all* aspects of > encode/decode (including things that are already implemented) Particularly, it should clearly explain why we need a completely new and separate namespace mechanism for these codec things, and provide a firm rationale for deciding whether any proposed new form of encoding or decoding should be placed in this namespace or the module namespace. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From paulp@ActiveState.com Wed Jun 13 01:32:17 2001 From: paulp@ActiveState.com (Paul Prescod) Date: Tue, 12 Jun 2001 17:32:17 -0700 Subject: [Python-Dev] Adding Asian codecs to the core References: <20010612234815.2C90599C82@waltz.rahul.net> Message-ID: <3B26B491.CA8536BD@ActiveState.com> Aahz Maruch wrote: > >.... > > > > Tamito corrected me about the size (his file includes the .pyc > > byte code files): the correct size for the sources is 143kB -- > > almost half of what I initially wrote. > > That makes me +0.5, possibly a bit higher. We really shouldn't consider the Japanese without Chinese and Korean. And those both seem *larger* than the Japanese. :( What if we add them to CVS and formally maintain them as part of the core but distribute them as a separate download? -- Take a recipe. Leave a recipe. Python Cookbook! http://www.ActiveState.com/pythoncookbook From paulp@ActiveState.com Wed Jun 13 03:25:23 2001 From: paulp@ActiveState.com (Paul Prescod) Date: Tue, 12 Jun 2001 19:25:23 -0700 Subject: [Python-Dev] Pure Python strptime Message-ID: <3B26CF13.2A337AC6@ActiveState.com> Should this strptime implementation be added to the standard library? http://aspn.activestate.com/ASPN/Python/Cookbook/Recipe/56036 -- Take a recipe. Leave a recipe. Python Cookbook! 
http://www.ActiveState.com/pythoncookbook From paulp@ActiveState.com Wed Jun 13 03:41:53 2001 From: paulp@ActiveState.com (Paul Prescod) Date: Tue, 12 Jun 2001 19:41:53 -0700 Subject: [Python-Dev] Adding .decode() method to Unicode References: <200106130002.MAA03319@s454.cosc.canterbury.ac.nz> Message-ID: <3B26D2F1.8840FB1A@ActiveState.com> Greg Ewing wrote: > > > -1 on anything except a PEP that covers *all* aspects of > > encode/decode (including things that are already implemented) > > Particularly, it should clearly explain why we need a > completely new and separate namespace mechanism for these > codec things, I don't know whether MAL will write the PEP or not but the rationale for a new namespace is trivial. The namespace exists and is maintained by the Internet Assigned Names Association. You can't work with Unicode without working with names from this list: http://www.iana.org/assignments/character-sets MAL is basically exending it to include names from this list: http://www.iana.org/assignments/transfer-encodings and others. > and provide a firm rationale for deciding > whether any proposed new form of encoding or decoding > should be placed in this namespace or the module namespace. *My* answer would be that any function that has strings (8-bit or Unicode) as both domain and range is potentially a codec. -- Take a recipe. Leave a recipe. Python Cookbook! http://www.ActiveState.com/pythoncookbook From greg@cosc.canterbury.ac.nz Wed Jun 13 05:45:36 2001 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Wed, 13 Jun 2001 16:45:36 +1200 (NZST) Subject: [Python-Dev] Adding .decode() method to Unicode In-Reply-To: <3B26D2F1.8840FB1A@ActiveState.com> Message-ID: <200106130445.QAA03370@s454.cosc.canterbury.ac.nz> Paul Prescod : > The namespace exists and is maintained by > the Internet Assigned Names Association. Hmmm... so, is the only reason that we're not using the module namespace the fact that these names can contain non-alphanumeric characters? Or is there more to it than that? Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From skip@pobox.com (Skip Montanaro) Wed Jun 13 06:09:38 2001 From: skip@pobox.com (Skip Montanaro) (Skip Montanaro) Date: Wed, 13 Jun 2001 00:09:38 -0500 Subject: [Python-Dev] Adding Asian codecs to the core In-Reply-To: <3B26B491.CA8536BD@ActiveState.com> References: <20010612234815.2C90599C82@waltz.rahul.net> <3B26B491.CA8536BD@ActiveState.com> Message-ID: <15142.62866.180570.158325@beluga.mojam.com> Paul> What if we add them to CVS and formally maintain them as part of Paul> the core but distribute them as a separate download? That seems to make sense to me. I suspect most Linux distributions (for example) bundle Python into multiple pieces already. My Mandrake system splits the core into (I think) four pieces. It also bundles several other RPMs for PIL, NumPy, Postgres and RPM. Adding another package for a set of codecs doesn't seem like a big deal. Skip From mal@lemburg.com Wed Jun 13 08:02:05 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Wed, 13 Jun 2001 09:02:05 +0200 Subject: [Python-Dev] Adding Asian codecs to the core References: <20010612234815.2C90599C82@waltz.rahul.net> Message-ID: <3B270FED.8E2A4ECB@lemburg.com> Aahz Maruch wrote: > > M.-A. Lemburg wrote: > > Aahz Maruch wrote: > >> M.-A. 
Lemburg wrote: > >>> > >>> Tamito KAJIYAMA recently announced that he changed the licenses > >>> on his Japanese codecs from GPL to a BSD variant. This is great > >>> news since this would allow adding the codecs to the Python core > >>> which would certainly attract more users to Python in Asia. > >>> > >>> The codecs are 280kB when compressed as .tar.gz file. > >> > >> +0 > >> > >> I like the idea, am uncomfortable with that amount of space. > > > > Tamito corrected me about the size (his file includes the .pyc > > byte code files): the correct size for the sources is 143kB -- > > almost half of what I initially wrote. > > That makes me +0.5, possibly a bit higher. We will be working on reducing the size of the mapping tables. Can't promise anything, but I believe that Tamito can squeeze them into under 100k using some compression technique (which one is yet to be determined ;). -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From mal@lemburg.com Wed Jun 13 08:05:31 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Wed, 13 Jun 2001 09:05:31 +0200 Subject: [Python-Dev] Adding Asian codecs to the core References: <20010612234815.2C90599C82@waltz.rahul.net> <3B26B491.CA8536BD@ActiveState.com> Message-ID: <3B2710BB.CFD8215@lemburg.com> Paul Prescod wrote: > > Aahz Maruch wrote: > > > >.... > > > > > > Tamito corrected me about the size (his file includes the .pyc > > > byte code files): the correct size for the sources is 143kB -- > > > almost half of what I initially wrote. > > > > That makes me +0.5, possibly a bit higher. > > We really shouldn't consider the Japanese without Chinese and Korean. > And those both seem *larger* than the Japanese. :( Unfortunately, these aren't available under a usable (=non-GPL) license yet. > What if we add them to CVS and formally maintain them as part of the > core but distribute them as a separate download? Good idea. -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From mal@lemburg.com Wed Jun 13 08:17:14 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Wed, 13 Jun 2001 09:17:14 +0200 Subject: [Python-Dev] Adding .decode() method to Unicode References: <200106130002.MAA03319@s454.cosc.canterbury.ac.nz> <3B26D2F1.8840FB1A@ActiveState.com> Message-ID: <3B27137A.E7BFC4EC@lemburg.com> Paul Prescod wrote: > > Greg Ewing wrote: > > > > > -1 on anything except a PEP that covers *all* aspects of > > > encode/decode (including things that are already implemented) > > > > Particularly, it should clearly explain why we need a > > completely new and separate namespace mechanism for these > > codec things, > > I don't know whether MAL will write the PEP or not With the kind of attitude towards the proposed extensions which I am currently getting in this forum, I'd rather spend my time on something more useful. > but the rationale for > a new namespace is trivial. The namespace exists and is maintained by > the Internet Assigned Names Association. You can't work with Unicode > without working with names from this list: > > http://www.iana.org/assignments/character-sets > > MAL is basically exending it to include names from this list: > > http://www.iana.org/assignments/transfer-encodings > > and others. Right. 
Since these codecs live in the encoding package, I don't think we have a namespace problem here. Codecs which are hooked into the codec registry by the encoding package's search function will have to provide a getregentry() entry point. If this API is not available, the codec won't load.

Since the encoding package's search function is using standard Python imports for loading the codecs, we can also benefit from a nice side-effect: codec names can use Python's dotted names (which then map to standard Python packages). This allows codec writers like Tamito to place their codecs into a Python package, thereby avoiding any conflict with other authors of codecs with similar names.

> > and provide a firm rationale for deciding
> > whether any proposed new form of encoding or decoding
> > should be placed in this namespace or the module namespace.
>
> *My* answer would be that any function that has strings (8-bit or
> Unicode) as both domain and range is potentially a codec.

Right. (Hey, the first time *we* agree on something ;-)

--
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting: http://www.egenix.com/
Python Software: http://www.lemburg.com/python/

From mal@lemburg.com Wed Jun 13 13:53:50 2001
From: mal@lemburg.com (M.-A. Lemburg)
Date: Wed, 13 Jun 2001 14:53:50 +0200
Subject: [Python-Dev] Weird message to stderr
Message-ID: <3B27625E.F18046F7@lemburg.com>

Running Python 2.1 using a .pyc file I get these weird messages printed to stderr:

run_pyc_file: nested_scopes: 0

These originate in pythonrun.c:

static PyObject *
run_pyc_file(FILE *fp, char *filename, PyObject *globals, PyObject *locals,
             PyCompilerFlags *flags)
{
    PyCodeObject *co;
    PyObject *v;
    long magic;
    long PyImport_GetMagicNumber(void);

    magic = PyMarshal_ReadLongFromFile(fp);
    if (magic != PyImport_GetMagicNumber()) {
        PyErr_SetString(PyExc_RuntimeError,
                        "Bad magic number in .pyc file");
        return NULL;
    }
    (void) PyMarshal_ReadLongFromFile(fp);
    v = PyMarshal_ReadLastObjectFromFile(fp);
    fclose(fp);
    if (v == NULL || !PyCode_Check(v)) {
        Py_XDECREF(v);
        PyErr_SetString(PyExc_RuntimeError,
                        "Bad code object in .pyc file");
        return NULL;
    }
    co = (PyCodeObject *)v;
    v = PyEval_EvalCode(co, globals, locals);
    if (v && flags) {
        if (co->co_flags & CO_NESTED)
            flags->cf_nested_scopes = 1;
        fprintf(stderr, "run_pyc_file: nested_scopes: %d\n",
                flags->cf_nested_scopes);
    }
    Py_DECREF(co);
    return v;
}

Is this a left-over debug printf or should I be warned in some way ?

--
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting: http://www.egenix.com/
Python Software: http://www.lemburg.com/python/

From guido@digicool.com Wed Jun 13 15:41:37 2001
From: guido@digicool.com (Guido van Rossum)
Date: Wed, 13 Jun 2001 10:41:37 -0400
Subject: [Python-Dev] Re: Adding .decode() method to Unicode
In-Reply-To: Your message of "Tue, 12 Jun 2001 22:40:01 EDT."
References:
Message-ID: <200106131441.KAA16557@cj20424-a.reston1.va.home.com>

Wow, this almost looks like a real flamefest. ("Flame" being defined as the presence of metacomments.)

(In the following, s is an 8-bit string, u is a Unicode string, and e is an encoding name.)
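A quick sketch of the calls under discussion, for concreteness ('latin-1' is just a stand-in for any 8-bit encoding, and the data is made up):

s = 'Andr\xe9'                    # an 8-bit string
u = unicode(s, 'latin-1')         # decode: 8-bit string -> Unicode
assert u.encode('latin-1') == s   # encode: Unicode -> 8-bit string
# s.encode(e) exists too; it is unicode(s).encode(e) in disguise,
# using the default (ASCII) codec for the implicit decoding step:
assert 'abc'.encode('latin-1') == unicode('abc').encode('latin-1')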
The original design of the encode() methods of string and Unicode objects (in 2.0 and 2.1) is asymmetric, and clearly geared towards Unicode codecs only: to decode an 8-bit string you *have* to use unicode(s, encoding) while to encode a Unicode string into a specific 8-bit encoding you *have* to use u.encode(e). 8-bit strings also have an encode() method: s.encode(e) is the same as unicode(s).encode(e). (This is useful since code that expects Unicode strings should also work when it is passed ASCII-encoded 8-bit strings.) I'd say there's no need for s.decode(e), since this can already be done with unicode(s, e) -- and to me that API looks better since it clearly states that the result is Unicode. We *could* have designed the encoding API similarly: str(u, e) is available, symmetric with unicode(s, e), and a logical extension of str(u) which uses the default encoding. But I accept the argument that u.encode(e) is better because it emphasizes the encoding action, and because it means no API changes to str(). I guess what I'm saying here is that 'str' does not give enough of a clue that an encoding action is going on, while 'unicode' *does* give a clue that a decoding action is being done: as soon as you read "Unicode" you think "Mmm, encodings..." -- but "str" is pretty neutral, so u.encode(e) is needed to give a clue. Marc-Andre proposes (and has partially checked in) changes that stretch the meaning of the encode() method, and add a decode() method, to be basically interfaces to anything you can do with the codecs module. The return type of encode() and decode() is now determined by the codec (formerly, encode() always returned an 8-bit string). Some new codecs have been added that do things like gzip and base64. Initially, I liked this, and even contributed a codec. But questions keep coming up. What is the problem being solved? True, the codecs module has a clumsy interface if you just want to invoke a codec on some data. But that can easily be remedied by adding convenience functions encode() and decode() to codecs.py -- which would have the added advantage that it would work for other datatypes that support the buffer interface, e.g. codecs.encode(myPILobject, "base64"). True, the "codec" pattern can be used for other encodings than Unicode. But it seems to me that the entire codecs architecture is rather strongly geared towards en/decoding Unicode, and it's not clear how well other codecs fit in this pattern (e.g. I noticed that all the non-Unicode codecs ignore the error handling parameter or assert that it is set to 'strict'). Is it really right that x.encode("gzip") and x.encode("utf-8") look similar, while the former requires an 8-bit string and the latter only makes sense if x is a Unicode string? Another (minor) issue is that Unicode encoding names are an IANA namespace. Is it wise to add our own names to this? I'm not forcing a decision here, but I do ask that we consider these issues before forging ahead with what might be a mistake. A PEP would be most helpful to focus the discussion. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@digicool.com Wed Jun 13 16:19:03 2001 From: guido@digicool.com (Guido van Rossum) Date: Wed, 13 Jun 2001 11:19:03 -0400 Subject: [Python-Dev] Releasing 2.0.1 Message-ID: <200106131519.LAA16978@cj20424-a.reston1.va.home.com> I think it's now or never with the 2.0.1 release. Moshe seems to have disappeared from the face of the earth. 
His last mail to me (May 23) suggested that it was good to go except for the SRE checkin and the NEWS file. I did the SRE checkin today (making it identical to what's in 2.1, per /F's recommendation) and added a note about that to the NEWS file -- I wouldn't know what else would be needed there. So I think it's good to go now. I can release a 2.0.1c1 this week (indicating a release candidate) and a final 2.0.1 next week. If you know a good reason why I should hold off on releasing this, or if you have a patch that absolutely should make it into 2.0.1, please let me know NOW! This project is way overdue. (Thomas is ready to release 2.1.1 as soon as this goes out, I believe. :-) --Guido van Rossum (home page: http://www.python.org/~guido/) From fredrik@pythonware.com Wed Jun 13 16:29:19 2001 From: fredrik@pythonware.com (Fredrik Lundh) Date: Wed, 13 Jun 2001 17:29:19 +0200 Subject: [Python-Dev] Releasing 2.0.1 References: <200106131519.LAA16978@cj20424-a.reston1.va.home.com> Message-ID: <023f01c0f41d$9dfb87b0$0900a8c0@spiff> guido wrote: > So I think it's good to go now. I can release a 2.0.1c1 this week > (indicating a release candidate) and a final 2.0.1 next week. +1 From skip@pobox.com (Skip Montanaro) Wed Jun 13 16:49:58 2001 From: skip@pobox.com (Skip Montanaro) (Skip Montanaro) Date: Wed, 13 Jun 2001 10:49:58 -0500 Subject: [Python-Dev] on announcing point releases Message-ID: <15143.35750.837420.376281@beluga.mojam.com> (Just thinking out loud) I wonder if it would help gain wider distribution for the point releases if explicit announcements were sent to the various Linux distributors so they could create updated packages (RPMs, debs, whatever) for their users. On a related note, I see one RedHat email address on python-dev (and one Debian address on python-list). Are there other Linux distributions that are heavy Python users (as opposed to simply packaging it up for inclusion)? If so, perhaps they should be invited to join python-dev. Skip From niemeyer@conectiva.com Wed Jun 13 16:54:08 2001 From: niemeyer@conectiva.com (Gustavo Niemeyer) Date: Wed, 13 Jun 2001 12:54:08 -0300 Subject: [Python-Dev] sre improvements Message-ID: <20010613125408.W13940@tux.distro.conectiva> I'm forwarding this to the dev list.. probably somebody here knows about this... -------------- Hi there!! I have looked into sre, and was wondering if somebody is working to implement more features in it. I'd like, for example, to see the (?(1)blah) operator, available in perl, working. Should I care about this? Should I write some code?? Anybody working in sre currently? Thanks! -- Gustavo Niemeyer [ 2AAC 7928 0FBF 0299 5EB5 60E2 2253 B29A 6664 3A0C ] From skip@pobox.com (Skip Montanaro) Wed Jun 13 17:03:58 2001 From: skip@pobox.com (Skip Montanaro) (Skip Montanaro) Date: Wed, 13 Jun 2001 11:03:58 -0500 Subject: [Python-Dev] sre improvements In-Reply-To: <20010613125408.W13940@tux.distro.conectiva> References: <20010613125408.W13940@tux.distro.conectiva> Message-ID: <15143.36590.447465.657241@beluga.mojam.com> Gustavo> I'd like, for example, to see the (?(1)blah) operator, Gustavo> available in perl, working. Gustavo, For the non-Perl-heads on the list, can you explain what the (?(1)blah) operator does? 
-- Skip Montanaro (skip@pobox.com) (847)971-7098 From gregor@hoffleit.de Wed Jun 13 17:13:17 2001 From: gregor@hoffleit.de (Gregor Hoffleit) Date: Wed, 13 Jun 2001 18:13:17 +0200 Subject: [Python-Dev] on announcing point releases In-Reply-To: <15143.35750.837420.376281@beluga.mojam.com>; from skip@pobox.com on Wed, Jun 13, 2001 at 10:49:58AM -0500 References: <15143.35750.837420.376281@beluga.mojam.com> Message-ID: <20010613181317.B30006@mediasupervision.de> On Wed, Jun 13, 2001 at 10:49:58AM -0500, Skip Montanaro wrote: > I wonder if it would help gain wider distribution for the point releases if > explicit announcements were sent to the various Linux distributors so they > could create updated packages (RPMs, debs, whatever) for their users. > > On a related note, I see one RedHat email address on python-dev (and one > Debian address on python-list). Are there other Linux distributions that > are heavy Python users (as opposed to simply packaging it up for inclusion)? > If so, perhaps they should be invited to join python-dev. Rest assured that Debian is present on python-dev as well, and nervously looking forward to the maintenance releases ;-) I hope 2.1.1 will make it out in time as well for our next release (being aware that 'before the next Debian release happens' is no very tight timeframe ;-). Gregor From guido@digicool.com Wed Jun 13 17:16:42 2001 From: guido@digicool.com (Guido van Rossum) Date: Wed, 13 Jun 2001 12:16:42 -0400 Subject: [Python-Dev] Re: PEP 259: Omit printing newline after newline Message-ID: <200106131616.MAA17468@cj20424-a.reston1.va.home.com> OK, OK, PEP 259 is dead. It seemed a nice idea at the time. :-) Alex and others, if you're serious about implementing print as __print__(), why don't you write a PEP? --Guido van Rossum (home page: http://www.python.org/~guido/) From fdrake@acm.org Wed Jun 13 17:21:20 2001 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Wed, 13 Jun 2001 12:21:20 -0400 (EDT) Subject: [Python-Dev] on announcing point releases In-Reply-To: <20010613181317.B30006@mediasupervision.de> References: <15143.35750.837420.376281@beluga.mojam.com> <20010613181317.B30006@mediasupervision.de> Message-ID: <15143.37632.758887.966026@cj42289-a.reston1.va.home.com> Gregor Hoffleit writes: > looking forward to the maintenance releases ;-) I hope 2.1.1 will make it > out in time as well for our next release (being aware that 'before the next Personally, I see no reason for Thomas to wait for the 2.0.1 release if he doesn't want to. -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From fredrik@pythonware.com Wed Jun 13 17:32:13 2001 From: fredrik@pythonware.com (Fredrik Lundh) Date: Wed, 13 Jun 2001 18:32:13 +0200 Subject: [Python-Dev] sre improvements References: <20010613125408.W13940@tux.distro.conectiva> <15143.36590.447465.657241@beluga.mojam.com> Message-ID: <007801c0f426$84d1f220$4ffa42d5@hagrid> skip wrote: > Gustavo> I'd like, for example, to see the (?(1)blah) operator, > Gustavo> available in perl, working. > > For the non-Perl-heads on the list, can you explain what the (?(1)blah) > operator does? conditionals: (?(cond)true) (?(cond)true|false) where cond is a group number (true if defined) or an assertion pattern, and true/false are patterns. 
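sre doesn't implement these today, but for simple cases an ordinary alternation gets the same effect -- a sketch, using the example from perlre of a chunk of non-parens optionally wrapped in parens:

import re

# Perl: m{ ( \( )? [^()]+ (?(1) \) ) }x
# a rough conditional-free equivalent:
p = re.compile(r"\([^()]+\)|[^()]+")

assert p.match("(hello)").group() == "(hello)"
assert p.match("hello").group() == "hello"
assert p.match("(hello") is None    # unbalanced "(" is rejected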
(imo, whoever invented that needs help ;-) From akuchlin@mems-exchange.org Wed Jun 13 17:39:58 2001 From: akuchlin@mems-exchange.org (Andrew Kuchling) Date: Wed, 13 Jun 2001 12:39:58 -0400 Subject: [Python-Dev] sre improvements Message-ID: >For the non-Perl-heads on the list, can you explain what the (?(1)blah) >operator does? Conditionals. From http://www.perl.com/pub/doc/manual/html/pod/perlre.html, (...)(?(1)A|B) will match 'A' if group 1 matched, and B if it didn't. I'm not sure how "matched" is defined, as the Perl docs are vague; judging from the example, it means 'matched something of nonzero length'. Perl 5.6 introduced a bunch of new regex features, but I'm not sure how much we actually *care* about them; they're no doubt useful if regexes are the only tool you've got and you try to do full parsers using them, but they're also complicated to explain and will make the compiler messier. For example, lookaheads can also go into the conditional, not just an integer. (?i) now obeys the scoping from parens, and you can turn it off with (?-i). If Gustavo wants to implement these features and /F approves of his patches, then sure, put them in. But if either of those conditions fails, little will be lost. --amk From dmitry.antipov@auriga.ru Wed Jun 13 17:46:09 2001 From: dmitry.antipov@auriga.ru (dmitry.antipov@auriga.ru) Date: Wed, 13 Jun 2001 20:46:09 +0400 Subject: [Python-Dev] Why not Lisp-like list-related functions ? Message-ID: <3B2798D1.16F832A3@auriga.ru> Hello all, I'm new to Python but quite familiar with Lisp. So my question is about Python list-related functions. Why append(), extend(), sort(), reverse() etc. doesn't return a reference to it's own (modified) argument ? IMHO (I'm tweaking Python 2.1 to allow first example possible), >>> [3 + x * 2 for x in [5, 8, 9, 3].sort()].extend([6, 3, 8].reverse()) [9, 13, 19, 21, 8, 3, 6] >>> looks much better (and more "functional") than >>> x = [5, 8, 9, 3] >>> x.sort() >>> x = [3 + x * 2 for x in x] >>> y = [6, 3, 8] >>> y.reverse() >>> x.extend(y) >>> x [9, 13, 19, 21, 8, 3, 6] >>> Python designers and fans, please explain it to me :-). Any comments are welcome. Thanks and reply to me directly if possible, Dmitry Antipov From guido@digicool.com Wed Jun 13 18:01:34 2001 From: guido@digicool.com (Guido van Rossum) Date: Wed, 13 Jun 2001 13:01:34 -0400 Subject: [Python-Dev] Weird message to stderr Message-ID: <200106131701.NAA17619@cj20424-a.reston1.va.home.com> > Running Python 2.1 using a .pyc file I get these weird messages > printed to stderr: > > run_pyc_file: nested_scopes: 0 > > These originate in pythonrun.c: > > static PyObject * > run_pyc_file(FILE *fp, char *filename, PyObject *globals, PyObject *locals, > PyCompilerFlags *flags) > { [...] > if (v && flags) { > if (co->co_flags & CO_NESTED) > flags->cf_nested_scopes = 1; > fprintf(stderr, "run_pyc_file: nested_scopes: %d\n", > flags->cf_nested_scopes); > } > Py_DECREF(co); > return v; > } > > Is this is left over debug printf or should I be warned > in some way ? I'll channel Jeremy... Looks like a debug message -- this code isn't tested by the standard test suite. Feel free to get rid of the fprintf() statement (and no, you don't have to write a PEP for this :-). --Guido van Rossum (home page: http://www.python.org/~guido/) From fredrik@pythonware.com Wed Jun 13 18:06:52 2001 From: fredrik@pythonware.com (Fredrik Lundh) Date: Wed, 13 Jun 2001 19:06:52 +0200 Subject: [Python-Dev] Why not Lisp-like list-related functions ? 
References: <3B2798D1.16F832A3@auriga.ru> Message-ID: <012d01c0f42b$45453b30$4ffa42d5@hagrid> Dmitry wrote: > I'm new to Python but quite familiar with Lisp. So my question is > about Python list-related functions. Why append(), extend(), sort(), > reverse() etc. doesn't return a reference to it's own (modified) > argument ? doesn't Lisp have a FAQ? ;-) http://www.python.org/doc/FAQ.html#6.20 Q. Why doesn't list.sort() return the sorted list? ... basically, operations that modify an object generally don't return the object itself, to avoid mistakes like: for item in list.reverse(): print item # backwards ... for item in list.reverse(): print item # backwards, or? a slightly more pythonic way would be to add sorted, extended, reversed (etc) -- but that leads to method bloat. in addition, based on studying huge amounts of python code, I doubt cascading list operations would save the world that much typing... followups to python-list@python.org From paulp@ActiveState.com Wed Jun 13 18:22:09 2001 From: paulp@ActiveState.com (Paul Prescod) Date: Wed, 13 Jun 2001 10:22:09 -0700 Subject: [Python-Dev] Adding Asian codecs to the core References: <20010612234815.2C90599C82@waltz.rahul.net> <3B26B491.CA8536BD@ActiveState.com> <3B2710BB.CFD8215@lemburg.com> Message-ID: <3B27A141.6C69EC55@ActiveState.com> "M.-A. Lemburg" wrote: > >... > > > > We really shouldn't consider the Japanese without Chinese and Korean. > > And those both seem *larger* than the Japanese. :( > > Unfortunately, these aren't available under a usable (=non-GPL) > license yet. Frank Chen has agreed to make them available under a Python-style license. > > What if we add them to CVS and formally maintain them as part of the > > core but distribute them as a separate download? > > Good idea. All in favour? -- Take a recipe. Leave a recipe. Python Cookbook! http://www.ActiveState.com/pythoncookbook From aahz@rahul.net Wed Jun 13 18:32:24 2001 From: aahz@rahul.net (Aahz Maruch) Date: Wed, 13 Jun 2001 10:32:24 -0700 (PDT) Subject: [Python-Dev] Adding Asian codecs to the core In-Reply-To: <3B27A141.6C69EC55@ActiveState.com> from "Paul Prescod" at Jun 13, 2001 10:22:09 AM Message-ID: <20010613173224.0FFB999C87@waltz.rahul.net> >>> What if we add them to CVS and formally maintain them as part of the >>> core but distribute them as a separate download? >> >> Good idea. > > All in favour? +1 -- --- Aahz (@pobox.com) Hugs and backrubs -- I break Rule 6 <*> http://www.rahul.net/aahz/ Androgynous poly kinky vanilla queer het Pythonista I don't really mind a person having the last whine, but I do mind someone else having the last self-righteous whine. From gward@python.net Wed Jun 13 19:53:20 2001 From: gward@python.net (Greg Ward) Date: Wed, 13 Jun 2001 14:53:20 -0400 Subject: [Python-Dev] sre improvements In-Reply-To: <007801c0f426$84d1f220$4ffa42d5@hagrid>; from fredrik@pythonware.com on Wed, Jun 13, 2001 at 06:32:13PM +0200 References: <20010613125408.W13940@tux.distro.conectiva> <15143.36590.447465.657241@beluga.mojam.com> <007801c0f426$84d1f220$4ffa42d5@hagrid> Message-ID: <20010613145320.G5114@gerg.ca> On 13 June 2001, Fredrik Lundh said: > conditionals: > > (?(cond)true) > (?(cond)true|false) > > where cond is a group number (true if defined) or an assertion > pattern, and true/false are patterns. > > (imo, whoever invented that needs help ;-) I think I'd have to agree with /F on this one... 
somewhere around Perl 5.003 or 5.004, regexes in Perl went from being a powerful and really cool facility to being a massively overgrown language-within-a-language. I *tried* to use some of the fancy new features a few times out of curiosity, but could never get them to work. (At the time, I think I was a pretty sharp Perl programmer, although I've dulled since then.) Greg -- Greg Ward - Unix bigot gward@python.net http://starship.python.net/~gward/ No animals were harmed in transmitting this message. From jepler@inetnebr.com Wed Jun 13 17:09:58 2001 From: jepler@inetnebr.com (Jeff Epler) Date: Wed, 13 Jun 2001 11:09:58 -0500 Subject: [Python-Dev] sre improvements In-Reply-To: <15143.36590.447465.657241@beluga.mojam.com>; from skip@pobox.com on Wed, Jun 13, 2001 at 11:03:58AM -0500 References: <20010613125408.W13940@tux.distro.conectiva> <15143.36590.447465.657241@beluga.mojam.com> Message-ID: <20010613110957.C29405@inetnebr.com> On Wed, Jun 13, 2001 at 11:03:58AM -0500, Skip Montanaro wrote: > > Gustavo> I'd like, for example, to see the (?(1)blah) operator, > Gustavo> available in perl, working. > > Gustavo, > > For the non-Perl-heads on the list, can you explain what the (?(1)blah) > operator does? from perlre(1): (?(condition)yes-pattern) Conditional expression. (condition) should be either an integer in parentheses (which is valid if the corresponding pair of parentheses matched), or lookahead/lookbehind/evaluate zero- width assertion. Say, m{ ( \( )? [^()]+ (?(1) \) ) }x matches a chunk of non-parentheses, possibly included in parentheses themselves. Jeff From tim.one@home.com Thu Jun 14 07:12:48 2001 From: tim.one@home.com (Tim Peters) Date: Thu, 14 Jun 2001 02:12:48 -0400 Subject: [Python-Dev] Adding .decode() method to Unicode In-Reply-To: <3B2664AD.B560D685@ActiveState.com> Message-ID: [Paul Prescod] > ... > We could argue angels on the head of a pin until the cows come home but > 90% of all Python users think of 8-bit strings as strings of characters. Actually, if you count me, make that 92%. some-things-were-easier-when-python-had-50-users-and-i-was-two- of-them-ly y'rs - tim From paulp@ActiveState.com Thu Jun 14 08:30:19 2001 From: paulp@ActiveState.com (Paul Prescod) Date: Thu, 14 Jun 2001 00:30:19 -0700 Subject: [Python-Dev] sre improvements References: <20010613125408.W13940@tux.distro.conectiva> <15143.36590.447465.657241@beluga.mojam.com> <007801c0f426$84d1f220$4ffa42d5@hagrid> <20010613145320.G5114@gerg.ca> Message-ID: <3B28680B.A46CF171@ActiveState.com> Greg Ward wrote: > >... > > I think I'd have to agree with /F on this one... somewhere around Perl > 5.003 or 5.004, regexes in Perl went from being a powerful and really > cool facility to being a massively overgrown language-within-a-language. > I *tried* to use some of the fancy new features a few times out of > curiosity, but could never get them to work. (At the time, I think I > was a pretty sharp Perl programmer, although I've dulled since then.) I would rather see us try a new approach to regular expressions. I've seen a few proposals for more verbose-but-readable syntaxes. I think one was from Greg Ewing? And maybe one from Ping? For those of us who use regular expressions only once in a while (i.e. the lucky ones), the current syntax is a holy terror. Which characters are magical again? In what contexts? With how many levels of backslashing? Upper case W versus lower case W? Obviously we can never abandon the tried and true Perl5 RE module, but I think we could have another syntax on top. -- Take a recipe. 
Leave a recipe. Python Cookbook! http://www.ActiveState.com/pythoncookbook From arigo@ulb.ac.be Thu Jun 14 09:58:48 2001 From: arigo@ulb.ac.be (Armin Rigo) Date: Thu, 14 Jun 2001 10:58:48 +0200 (MET DST) Subject: [Python-Dev] Special-casing "O" Message-ID: Hello everybody, For comparison purposes, I implemented the idea of optimizing PyArg_ParseTuple calls by modifying the C code itself. Here is the result: http://homepages.ulb.ac.be/~arigo/pyarg_pp.tgz I did not upload this as a patch at SourceForge for several reasons. The most fundamental is that it raises bootstrapping issues: how can we compile the Python interpreter if we first have to run a Python script on the source files ? Fixing this would make the Makefiles significantly more complex. The other reason is that the METH_O solution is probably still faster, as it often completely avoids to build the 1-tuple of arguments. More serious performance tests might be needed, however. A bientot, Armin. From thomas@xs4all.net Thu Jun 14 12:10:01 2001 From: thomas@xs4all.net (Thomas Wouters) Date: Thu, 14 Jun 2001 13:10:01 +0200 Subject: [Python-Dev] Releasing 2.0.1 In-Reply-To: <200106131519.LAA16978@cj20424-a.reston1.va.home.com> References: <200106131519.LAA16978@cj20424-a.reston1.va.home.com> Message-ID: <20010614131001.B1659@xs4all.nl> On Wed, Jun 13, 2001 at 11:19:03AM -0400, Guido van Rossum wrote: > So I think it's good to go now. I can release a 2.0.1c1 this week > (indicating a release candidate) and a final 2.0.1 next week. +1 here. > If you know a good reason why I should hold off on releasing this, or > if you have a patch that absolutely should make it into 2.0.1, please > let me know NOW! This project is way overdue. (Thomas is ready to > release 2.1.1 as soon as this goes out, I believe. :-) Well, not quite, but I can put in a couple of allnighters (I want to do a review of all log-messages since 2.1-final, to see if I missed any checkin messages, and I want to update the NEWS file with a list of bugs fixed) and have it ready in a week or two. I don't think 2.1.1 should be released *that* soon after 2.0.1 anyway. I noticed this in the LICENCE file, by the way: Python 2.1 is a derivative work of Python 1.6.1, as well as of Python 2.0. and 8. By copying, installing or otherwise using Python 2.1, Licensee agrees to be bound by the terms and conditions of this License Agreement. -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From guido@digicool.com Thu Jun 14 12:14:22 2001 From: guido@digicool.com (Guido van Rossum) Date: Thu, 14 Jun 2001 07:14:22 -0400 Subject: [Python-Dev] Re: Why not Lisp-like list-related functions ? Message-ID: <200106141114.HAA25430@cj20424-a.reston1.va.home.com> > Hello all, > > I'm new to Python but quite familiar with Lisp. So my question is > about Python list-related functions. Why append(), extend(), sort(), > reverse() etc. doesn't return a reference to it's own (modified) > argument ? IMHO (I'm tweaking Python 2.1 to allow first example > possible), > > >>> [3 + x * 2 for x in [5, 8, 9, 3].sort()].extend([6, 3, 8].reverse()) > [9, 13, 19, 21, 8, 3, 6] > >>> > > looks much better (and more "functional") than > > >>> x = [5, 8, 9, 3] > >>> x.sort() > >>> x = [3 + x * 2 for x in x] > >>> y = [6, 3, 8] > >>> y.reverse() > >>> x.extend(y) > >>> x > [9, 13, 19, 21, 8, 3, 6] > >>> > > Python designers and fans, please explain it to me :-). > Any comments are welcome. 
> > Thanks and reply to me directly if possible, > Dmitry Antipov Funny, to me your first form is much harder to read than your second. With the first form, I have to stop and think and look carefully at where the brackets are to see in which order the operations are executed, while in the second form it's obvious, because it's broken down in smaller chunks. So I guess that's the real reason: Python users have a procedural brain, not a functional brain, and we don't like Lispish code. Maybe we also have a smaller brain than the typical Lisper -- I would say, that would make us more normal, and if Python caters to people with a closer-to-average brain size, that would mean more people will be able to program in Python. History will decide... --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@digicool.com Thu Jun 14 12:31:16 2001 From: guido@digicool.com (Guido van Rossum) Date: Thu, 14 Jun 2001 07:31:16 -0400 Subject: [Python-Dev] Adding Asian codecs to the core Message-ID: <200106141131.HAA25522@cj20424-a.reston1.va.home.com> > > > What if we add them to CVS and formally maintain them as part of the > > > core but distribute them as a separate download? > > > > Good idea. > > All in favour? +1, as long as they're not in the CVS subtree that's normally extracted for a regular source distribution. I propose this location in the CVS tree: python/dist/encodings/... (So 'encodings' would be a sibling of 'src', which has been pretty lonely ever since I started using CVS. ;-) --Guido van Rossum (home page: http://www.python.org/~guido/) From akuchlin@mems-exchange.org Thu Jun 14 16:19:28 2001 From: akuchlin@mems-exchange.org (Andrew Kuchling) Date: Thu, 14 Jun 2001 11:19:28 -0400 Subject: [Python-Dev] Re: Why not Lisp-like list-related functions ? In-Reply-To: <200106141114.HAA25430@cj20424-a.reston1.va.home.com>; from guido@digicool.com on Thu, Jun 14, 2001 at 07:14:22AM -0400 References: <200106141114.HAA25430@cj20424-a.reston1.va.home.com> Message-ID: <20010614111928.A4560@ute.cnri.reston.va.us> On Thu, Jun 14, 2001 at 07:14:22AM -0400, Guido van Rossum wrote: >Maybe we also have a smaller brain than the typical Lisper -- I would >say, that would make us more normal, and if Python caters to people >with a closer-to-average brain size, that would mean more people will >be able to program in Python. History will decide... I thought it already has, pretty much. --amk From tim@digicool.com Thu Jun 14 17:49:07 2001 From: tim@digicool.com (Tim Peters) Date: Thu, 14 Jun 2001 12:49:07 -0400 Subject: [Python-Dev] PEP 255: Simple Generators Message-ID: You can view an HTML version of PEP 255 here: http://python.sourceforge.net/peps/pep-0255.html Discussion should take place primarily on the Python Iterators list: mailto:python-iterators@lists.sourceforge.net If replying directly to this message, please remove (at least) Python-Dev and Python-Announce. PEP: 255 Title: Simple Generators Version: $Revision: 1.3 $ Author: nas@python.ca (Neil Schemenauer), tim.one@home.com (Tim Peters), magnus@hetland.org (Magnus Lie Hetland) Discussion-To: python-iterators@lists.sourceforge.net Status: Draft Type: Standards Track Requires: 234 Created: 18-May-2001 Python-Version: 2.2 Post-History: 14-Jun-2001 Abstract This PEP introduces the concept of generators to Python, as well as a new statement used in conjunction with them, the "yield" statement. 
Motivation When a producer function has a hard enough job that it requires maintaining state between values produced, most programming languages offer no pleasant and efficient solution beyond adding a callback function to the producer's argument list, to be called with each value produced. For example, tokenize.py in the standard library takes this approach: the caller must pass a "tokeneater" function to tokenize(), called whenever tokenize() finds the next token. This allows tokenize to be coded in a natural way, but programs calling tokenize are typically convoluted by the need to remember between callbacks which token(s) were seen last. The tokeneater function in tabnanny.py is a good example of that, maintaining a state machine in global variables, to remember across callbacks what it has already seen and what it hopes to see next. This was difficult to get working correctly, and is still difficult for people to understand. Unfortunately, that's typical of this approach. An alternative would have been for tokenize to produce an entire parse of the Python program at once, in a large list. Then tokenize clients could be written in a natural way, using local variables and local control flow (such as loops and nested if statements) to keep track of their state. But this isn't practical: programs can be very large, so no a priori bound can be placed on the memory needed to materialize the whole parse; and some tokenize clients only want to see whether something specific appears early in the program (e.g., a future statement, or, as is done in IDLE, just the first indented statement), and then parsing the whole program first is a severe waste of time. Another alternative would be to make tokenize an iterator[1], delivering the next token whenever its .next() method is invoked. This is pleasant for the caller in the same way a large list of results would be, but without the memory and "what if I want to get out early?" drawbacks. However, this shifts the burden on tokenize to remember *its* state between .next() invocations, and the reader need only glance at tokenize.tokenize_loop() to realize what a horrid chore that would be. Or picture a recursive algorithm for producing the nodes of a general tree structure: to cast that into an iterator framework requires removing the recursion manually and maintaining the state of the traversal by hand. A fourth option is to run the producer and consumer in separate threads. This allows both to maintain their states in natural ways, and so is pleasant for both. Indeed, Demo/threads/Generator.py in the Python source distribution provides a usable synchronized-communication class for doing that in a general way. This doesn't work on platforms without threads, though, and is very slow on platforms that do (compared to what is achievable without threads). A final option is to use the Stackless[2][3] variant implementation of Python instead, which supports lightweight coroutines. This has much the same programmatic benefits as the thread option, but is much more efficient. However, Stackless is a controversial rethinking of the Python core, and it may not be possible for Jython to implement the same semantics. This PEP isn't the place to debate that, so suffice it to say here that generators provide a useful subset of Stackless functionality in a way that fits easily into the current CPython implementation, and is believed to be relatively straightforward for other Python implementations. That exhausts the current alternatives. 
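To make the iterator alternative's burden concrete before moving on, consider what even a trivial producer looks like when its state has to be kept between .next() invocations by hand (an illustrative sketch, not text from the PEP; compare it with the fib() generator shown below):

    # Illustrative sketch: a Fibonacci producer recast as a
    # hand-written iterator.  Every piece of state must be hoisted
    # into instance attributes and maintained by hand across
    # .next() calls.
    class Fib:
        def __init__(self):
            self.a, self.b = 0, 1

        def __iter__(self):
            return self

        def next(self):
            value = self.b
            self.a, self.b = self.b, self.a + self.b
            return value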
Some other high-level languages provide pleasant solutions, notably iterators in Sather[4], which were inspired by iterators in CLU; and generators in Icon[5], a novel language where every expression "is a generator". There are differences among these, but the basic idea is the same: provide a kind of function that can return an intermediate result ("the next value") to its caller, but maintaining the function's local state so that the function can be resumed again right where it left off.

A very simple example:

    def fib():
        a, b = 0, 1
        while 1:
            yield b
            a, b = b, a+b

When fib() is first invoked, it sets a to 0 and b to 1, then yields b back to its caller. The caller sees 1. When fib is resumed, from its point of view the yield statement is really the same as, say, a print statement: fib continues after the yield with all local state intact. a and b then become 1 and 1, and fib loops back to the yield, yielding 1 to its invoker. And so on. From fib's point of view it's just delivering a sequence of results, as if via callback. But from its caller's point of view, the fib invocation is an iterable object that can be resumed at will. As in the thread approach, this allows both sides to be coded in the most natural ways; but unlike the thread approach, this can be done efficiently and on all platforms. Indeed, resuming a generator should be no more expensive than a function call.

The same kind of approach applies to many producer/consumer functions. For example, tokenize.py could yield the next token instead of invoking a callback function with it as argument, and tokenize clients could iterate over the tokens in a natural way: a Python generator is a kind of Python iterator[1], but of an especially powerful kind.

Specification

A new statement is introduced:

    yield_stmt: "yield" expression_list

"yield" is a new keyword, so a future statement[8] is needed to phase this in. [XXX spell this out]

The yield statement may only be used inside functions. A function that contains a yield statement is called a generator function.

When a generator function is called, the actual arguments are bound to function-local formal argument names in the usual way, but no code in the body of the function is executed. Instead a generator-iterator object is returned; this conforms to the iterator protocol[6], so in particular can be used in for-loops in a natural way. Note that when the intent is clear from context, the unqualified name "generator" may be used to refer either to a generator-function or a generator-iterator.

Each time the .next() method of a generator-iterator is invoked, the code in the body of the generator-function is executed until a yield or return statement (see below) is encountered, or until the end of the body is reached. If a yield statement is encountered, the state of the function is frozen, and the value of expression_list is returned to .next()'s caller. By "frozen" we mean that all local state is retained, including the current bindings of local variables, the instruction pointer, and the internal evaluation stack: enough information is saved so that the next time .next() is invoked, the function can proceed exactly as if the yield statement were just another external call.

A generator function can also contain return statements of the form:

    "return"

Note that an expression_list is not allowed on return statements in the body of a generator (although, of course, they may appear in the bodies of non-generator functions nested within the generator).

When a return statement is encountered, nothing is returned, but a StopIteration exception is raised, signalling that the iterator is exhausted. The same is true if control flows off the end of the function. Note that return means "I'm done, and have nothing interesting to return", for both generator functions and non-generator functions.

Example

    # A binary tree class.
    class Tree:

        def __init__(self, label, left=None, right=None):
            self.label = label
            self.left = left
            self.right = right

        def __repr__(self, level=0, indent="    "):
            s = level*indent + `self.label`
            if self.left:
                s = s + "\n" + self.left.__repr__(level+1, indent)
            if self.right:
                s = s + "\n" + self.right.__repr__(level+1, indent)
            return s

        def __iter__(self):
            return inorder(self)

    # Create a Tree from a list.
    def tree(list):
        n = len(list)
        if n == 0:
            return []
        i = n / 2
        return Tree(list[i], tree(list[:i]), tree(list[i+1:]))

    # A recursive generator that generates Tree leaves in in-order.
    def inorder(t):
        if t:
            for x in inorder(t.left):
                yield x
            yield t.label
            for x in inorder(t.right):
                yield x

    # Show it off: create a tree.
    t = tree("ABCDEFGHIJKLMNOPQRSTUVWXYZ")
    # Print the nodes of the tree in in-order.
    for x in t:
        print x,
    print

    # A non-recursive generator.
    def inorder(node):
        stack = []
        while node:
            while node.left:
                stack.append(node)
                node = node.left
            yield node.label
            while not node.right:
                try:
                    node = stack.pop()
                except IndexError:
                    return
                yield node.label
            node = node.right

    # Exercise the non-recursive generator.
    for x in t:
        print x,
    print

Q & A

    Q. Why a new keyword? Why not a builtin function instead?

    A. Control flow is much better expressed via keyword in Python,
       and yield is a control construct. It's also believed that
       efficient implementation in Jython requires that the compiler
       be able to determine potential suspension points at
       compile-time, and a new keyword makes that easy.

Reference Implementation

    A preliminary patch against the CVS Python source is available[7].

Footnotes and References

    [1] PEP 234, http://python.sf.net/peps/pep-0234.html
    [2] http://www.stackless.com/
    [3] PEP 219, http://python.sf.net/peps/pep-0219.html
    [4] "Iteration Abstraction in Sather", Murer, Omohundro, Stoutamire and Szyperski,
        http://www.icsi.berkeley.edu/~sather/Publications/toplas.html
    [5] http://www.cs.arizona.edu/icon/
    [6] The concept of iterators is described in PEP 234,
        http://python.sf.net/peps/pep-0234.html
    [7] http://python.ca/nas/python/generator.diff
    [8] http://python.sf.net/peps/pep-0236.html

Copyright

    This document has been placed in the public domain.

From guido@digicool.com Thu Jun 14 18:30:42 2001
From: guido@digicool.com (Guido van Rossum)
Date: Thu, 14 Jun 2001 13:30:42 -0400
Subject: [Python-Dev] Python 2.0.1c1 - GPL-compatible release candidate
Message-ID: <200106141730.f5EHUgX03621@odiug.digicool.com>

With a sigh of relief I announce Python 2.0.1c1 -- the first Python release in a long time whose license is fully compatible with the GPL:

    http://www.python.org/2.0.1/

I thank Moshe Zadka who did almost all of the work to make this a useful bugfix release, and then went incommunicado for several weeks. (I hope you're OK, Moshe!)

Note that this is a release candidate. We don't expect any problems, but we're being careful nevertheless. We're planning to do the final release of 2.0.1 a week from now; expect it to be identical to the release candidate except for some dotted i's and crossed t's.
Python 2.0 users should be able to replace their 2.0 installation with the 2.0.1 release without any ill effects; apart from the license change, we've only fixed bugs that didn't require us to make feature changes. The SRE package (regular expression matching, used by the "re" module) was brought in line with the version distributed with Python 2.1; this is stable feature-wise but much improved bug-wise. For the full scoop, see the release notes on SourceForge: http://sourceforge.net/project/shownotes.php?release_id=39267 Python 2.1 users can ignore this release, unless they have an urgent need for a GPL-compatible Python version and are willing to downgrade. Rest assured that we're planning a bugfix release there too: I expect that Python 2.1.1 will be released within a month, with the same GPL-compatible license. (Right, Thomas?) We don't intend to build RPMs for 2.0.1. If someone else is interested in doing so, we can link to them. --Guido van Rossum (home page: http://www.python.org/~guido/) From fredrik@pythonware.com Thu Jun 14 12:46:25 2001 From: fredrik@pythonware.com (Fredrik Lundh) Date: Thu, 14 Jun 2001 13:46:25 +0200 Subject: [Python-Dev] recognizing \u escapes in regular expressions Message-ID: <02db01c0f4c7$a491c620$0900a8c0@spiff> during a late hacking pass, I was perplexed to realized that r"[\u0000-\uffff]" didn't match any unicode character, and reported it as bug #420011. but a few minutes later, I realized that SRE doesn't support \u and \U escapes at all -- and that the pattern u"[\u0000-\uffff]" works as expected. should I close the bug report, or turn it into a feature request? From fredrik@pythonware.com Thu Jun 14 12:52:26 2001 From: fredrik@pythonware.com (Fredrik Lundh) Date: Thu, 14 Jun 2001 13:52:26 +0200 Subject: [Python-Dev] Adding Asian codecs to the core References: <20010612234815.2C90599C82@waltz.rahul.net> <3B26B491.CA8536BD@ActiveState.com> <3B2710BB.CFD8215@lemburg.com> <3B27A141.6C69EC55@ActiveState.com> Message-ID: <02ef01c0f4c8$7bdc8520$0900a8c0@spiff> Paul wrote: > > > What if we add them to CVS and formally maintain them as part of the > > > core but distribute them as a separate download? > > > > Good idea. > > All in favour? +0.5 I still think adding them to the core is okay, but that's me. Cheers /F From gward@python.net Thu Jun 14 21:11:49 2001 From: gward@python.net (Greg Ward) Date: Thu, 14 Jun 2001 16:11:49 -0400 Subject: [Python-Dev] sre improvements In-Reply-To: <3B28680B.A46CF171@ActiveState.com>; from paulp@ActiveState.com on Thu, Jun 14, 2001 at 12:30:19AM -0700 References: <20010613125408.W13940@tux.distro.conectiva> <15143.36590.447465.657241@beluga.mojam.com> <007801c0f426$84d1f220$4ffa42d5@hagrid> <20010613145320.G5114@gerg.ca> <3B28680B.A46CF171@ActiveState.com> Message-ID: <20010614161149.C9884@gerg.ca> On 14 June 2001, Paul Prescod said: > I would rather see us try a new approach to regular expressions. I've > seen a few proposals for more verbose-but-readable syntaxes. I think one > was from Greg Ewing? And maybe one from Ping? I remember Ping's from a few year's back. It was pretty cool, but awfully verbose. I *like* the compactness of the One True Regex Language (ie. the one implemented by Perl 5, PCRE, and SRE). > For those of us who use regular expressions only once in a while (i.e. > the lucky ones), the current syntax is a holy terror. Which characters > are magical again? In what contexts? With how many levels of > backslashing? Upper case W versus lower case W? Wow, you should try keeping grep vs. egrep vs. sed vs. 
awk (which version again?) vs. emacs straight. I generally don't bother: as soon as a problem gets too hairy for grep/sed/awk/etc., I whip out my trusty old friend "perl -e" and all is well again. Unless I'm already coding in Python of course, in which case I whip out my trusty old friend re.compile(), and everything just works. I guess I just have a good memory for line noise. > Obviously we can never abandon the tried and true Perl5 RE module, but I > think we could have another syntax on top. Yeah, I s'pose it could be useful. Yet another great teaching tool, at any rate. Greg -- Greg Ward - Python bigot gward@python.net http://starship.python.net/~gward/ Quick!! Act as if nothing has happened! From greg@cosc.canterbury.ac.nz Fri Jun 15 01:56:50 2001 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Fri, 15 Jun 2001 12:56:50 +1200 (NZST) Subject: [Python-Dev] sre improvements In-Reply-To: <20010614161149.C9884@gerg.ca> Message-ID: <200106150056.MAA03621@s454.cosc.canterbury.ac.nz> Paul Prescod: > I think one > was from Greg Ewing? And maybe one from Ping? I can't remember what my first proposal (many years ago now) was like, but you might like to look at what I'm using in my Plex module: http://www.cosc.canterbury.ac.nz/~greg/python/Plex Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From paulp@ActiveState.com Fri Jun 15 02:36:13 2001 From: paulp@ActiveState.com (Paul Prescod) Date: Thu, 14 Jun 2001 18:36:13 -0700 Subject: [Python-Dev] sre improvements References: <200106150056.MAA03621@s454.cosc.canterbury.ac.nz> Message-ID: <3B29668D.ADFB3C22@ActiveState.com> Greg Ewing wrote: > > Paul Prescod: > > > I think one > > was from Greg Ewing? And maybe one from Ping? > > I can't remember what my first proposal (many years ago > now) was like, but you might like to look at what I'm > using in my Plex module: > > http://www.cosc.canterbury.ac.nz/~greg/python/Plex I would be interested in *both* your regular expression library and your lexer for the Python standard library. But separately. Maybe we need two short PEPs that point to the documentation and suggest how the two packages could be integrated into the standard library. What do you think? -- Take a recipe. Leave a recipe. Python Cookbook! http://www.ActiveState.com/pythoncookbook From greg@cosc.canterbury.ac.nz Fri Jun 15 02:49:04 2001 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Fri, 15 Jun 2001 13:49:04 +1200 (NZST) Subject: [Python-Dev] sre improvements In-Reply-To: <3B29668D.ADFB3C22@ActiveState.com> Message-ID: <200106150149.NAA03631@s454.cosc.canterbury.ac.nz> > I would be interested in *both* your regular expression library and your > lexer for the Python standard library. But separately. Well, the regular expressions aren't really a separable part of Plex. I mentioned it as a possible source of ideas for anyone working on a new syntax for the regexp stuff. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From mal@lemburg.com Fri Jun 15 08:58:47 2001 From: mal@lemburg.com (M.-A. 
Lemburg) Date: Fri, 15 Jun 2001 09:58:47 +0200 Subject: [Python-Dev] Adding Asian codecs to the core References: <20010612234815.2C90599C82@waltz.rahul.net> <3B26B491.CA8536BD@ActiveState.com> <3B2710BB.CFD8215@lemburg.com> <3B27A141.6C69EC55@ActiveState.com> <02ef01c0f4c8$7bdc8520$0900a8c0@spiff> Message-ID: <3B29C037.FB1DB6B8@lemburg.com> Fredrik Lundh wrote: > > Paul wrote: > > > > > What if we add them to CVS and formally maintain them as part of the > > > > core but distribute them as a separate download? > > > > > > Good idea. > > > > All in favour? > > +0.5 > > I still think adding them to the core is okay, but that's me. What would be the threshold for doing so ? Tamito is actively working on reducing the table sizes of the the codecs and after what I have seen you do on these sort of tables I am pretty sure Tamito can turn these tables into shared libs which are smaller than 200k. -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From MarkH@ActiveState.com Fri Jun 15 09:05:26 2001 From: MarkH@ActiveState.com (Mark Hammond) Date: Fri, 15 Jun 2001 18:05:26 +1000 Subject: [Python-Dev] Adding Asian codecs to the core In-Reply-To: <3B29C037.FB1DB6B8@lemburg.com> Message-ID: > > I still think adding them to the core is okay, but that's me. > > What would be the threshold for doing so ? > > Tamito is actively working on reducing the table sizes of the the > codecs and after what I have seen you do on these sort of tables I > am pretty sure Tamito can turn these tables into shared libs which are > smaller than 200k. But isn't this set only one of the many possible Asian codecs? I would have no objection to one 200k module, but if we really wanted to handle "asian codecs" I believe this is only the start. For this reason, I would give a -0 to adding these to the core, and a +1 to adding them to the directory structure proposed by Guido. Mark. From guido@digicool.com Fri Jun 15 17:59:40 2001 From: guido@digicool.com (Guido van Rossum) Date: Fri, 15 Jun 2001 12:59:40 -0400 Subject: [Python-Dev] recognizing \u escapes in regular expressions Message-ID: <200106151659.MAA30396@cj20424-a.reston1.va.home.com> > during a late hacking pass, I was perplexed to realized that > r"[\u0000-\uffff]" didn't match any unicode character, and reported > it as bug #420011. > > but a few minutes later, I realized that SRE doesn't support \u and > \U escapes at all -- and that the pattern u"[\u0000-\uffff]" works > as expected. > > should I close the bug report, or turn it into a feature request? > > You meant ur"[\u0000-\uffff]", right? (It works the same -- Unicode raw strings still do \u expansion, although the rationale escapes me at the moment -- as does the rationale for why ru"..." is a syntax error...) Looks like a feature request to me. Since \000 and \x00 work in that context, \u0000 would be expected to work. And suppose someone uses u"[\u0000-\u005d]"... --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@digicool.com Fri Jun 15 20:00:26 2001 From: guido@digicool.com (Guido van Rossum) Date: Fri, 15 Jun 2001 15:00:26 -0400 Subject: [Python-Dev] gen-branch: CVS branch for Neil's generator patch Message-ID: <200106151900.PAA31935@cj20424-a.reston1.va.home.com> I've checked in Neil's latest generator patch into a branch of the CVS tree. That makes it (hopefully) easier for folks to play with. 
Tim, can you update the PEP to point to this branch? (There's some boilerplate code about branches in PEP 252 or 253 that you could adapt.) I had to change the code in ceval.c because of recent conflicting changes there. The test suite runs (except test_inspect), but I'd appreciate it if someone (Neil?) could make sure that I didn't overlook anything. (I should probably check the CVS logs. :-) --Guido van Rossum (home page: http://www.python.org/~guido/) PS. If you saw a checkin of Grammar/Grammar in the *head* branch, that was a mistake, and I've already corrected it. From paulp@ActiveState.com Fri Jun 15 20:19:08 2001 From: paulp@ActiveState.com (Paul Prescod) Date: Fri, 15 Jun 2001 12:19:08 -0700 Subject: [Python-Dev] Adding Asian codecs to the core References: <20010612234815.2C90599C82@waltz.rahul.net> <3B26B491.CA8536BD@ActiveState.com> <3B2710BB.CFD8215@lemburg.com> <3B27A141.6C69EC55@ActiveState.com> <02ef01c0f4c8$7bdc8520$0900a8c0@spiff> <3B29C037.FB1DB6B8@lemburg.com> Message-ID: <3B2A5FAC.C5089CC2@ActiveState.com> "M.-A. Lemburg" wrote: > >... > > What would be the threshold for doing so ? > > Tamito is actively working on reducing the table sizes of the the > codecs and after what I have seen you do on these sort of tables I > am pretty sure Tamito can turn these tables into shared libs which are > smaller than 200k. Don't forget Chinese (Taiwan and mainland) and Korean! I guess I don't see the big deal in making them separate downloads. We can use distutils to make them easy to install .exe's for Reference Python and PPM for ActivePython. -- Take a recipe. Leave a recipe. Python Cookbook! http://www.ActiveState.com/pythoncookbook From mal@lemburg.com Fri Jun 15 21:05:47 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Fri, 15 Jun 2001 22:05:47 +0200 Subject: [Python-Dev] Adding Asian codecs to the core References: <20010612234815.2C90599C82@waltz.rahul.net> <3B26B491.CA8536BD@ActiveState.com> <3B2710BB.CFD8215@lemburg.com> <3B27A141.6C69EC55@ActiveState.com> <02ef01c0f4c8$7bdc8520$0900a8c0@spiff> <3B29C037.FB1DB6B8@lemburg.com> <3B2A5FAC.C5089CC2@ActiveState.com> Message-ID: <3B2A6A9B.AC156262@lemburg.com> Paul Prescod wrote: > > "M.-A. Lemburg" wrote: > > > >... > > > > What would be the threshold for doing so ? > > > > Tamito is actively working on reducing the table sizes of the the > > codecs and after what I have seen you do on these sort of tables I > > am pretty sure Tamito can turn these tables into shared libs which are > > smaller than 200k. > > Don't forget Chinese (Taiwan and mainland) and Korean! > > I guess I don't see the big deal in making them separate downloads. We > can use distutils to make them easy to install .exe's for Reference > Python and PPM for ActivePython. Ok. BTW, how come www.python.org no longer provides precompiled (contributed) binaries for the various OSes out there ? The FTP server only has these for Python <= 1.5.2. -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From tim.one@home.com Fri Jun 15 22:39:42 2001 From: tim.one@home.com (Tim Peters) Date: Fri, 15 Jun 2001 17:39:42 -0400 Subject: [Python-Dev] gen-branch: CVS branch for Neil's generator patch In-Reply-To: <200106151900.PAA31935@cj20424-a.reston1.va.home.com> Message-ID: [Guido] > I've checked in Neil's latest generator patch into a branch of the CVS > tree. That makes it (hopefully) easier for folks to play with. 
It will for me, and I thank you. > Tim, can you update the PEP to point to this branch? Done. From martin@loewis.home.cs.tu-berlin.de Fri Jun 15 23:17:49 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Sat, 16 Jun 2001 00:17:49 +0200 Subject: [Python-Dev] recognizing \u escapes in regular expressions Message-ID: <200106152217.f5FMHnI01360@mira.informatik.hu-berlin.de> > should I close the bug report, or turn it into a feature request? I think the bug report can be closed. Myself, I found it sufficient that you can write normal \u escapes in strings, in particular as you can also use them in raw strings: >>> ur"Ha\u006Clo" u'Hallo' Perhaps not very intuitive, and perhaps even a bug (how do you put a backslash in front of a "u" in a raw unicode string), but useful in this context. Regards, Martin From guido@digicool.com Sat Jun 16 16:46:14 2001 From: guido@digicool.com (Guido van Rossum) Date: Sat, 16 Jun 2001 11:46:14 -0400 Subject: [Python-Dev] 2.0.1's GPL-compatibility is official! Message-ID: <200106161546.LAA05521@cj20424-a.reston1.va.home.com> Richard Stallman, Eben Moglen and the FSF agree: Python 2.0.1 is compatible with the GPL. They've updated the text about the Python license on http://www.gnu.org/philosophy/license-list.html, stating in particular: GPL-Compatible, Free Software Licenses [...] The License of Python 1.6a2 and earlier versions. This is a free software license and is compatible with the GNU GPL. Please note, however, that newer versions of Python are under other licenses (see below). The License of Python 2.0.1, 2.1.1, and newer versions. This is a free software license and is compatible with the GNU GPL. Please note, however, that intermediate versions of Python (1.6b1, through 2.0 and 2.1) are under a different license (see below). I would like to emphasize and clarify (again!) that Python is *not* released under the GPL, so if you think the GPL is a bad thing, you don't have to worry about Python being contaminated. The GPL compatibility is important for folks who distribute Python binaries: e.g. the new license makes it okay to release Python binaries linked with GNU readline and other GPL-covered libraries. We'll release the final release of 2.0.1 within a week; so far we've had only one bug reported in the release candidate. I expect that we won't have to wait long for 2.1.1, which will have the same GPL-compatible license as 2.0.1. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@digicool.com Sat Jun 16 17:10:27 2001 From: guido@digicool.com (Guido van Rossum) Date: Sat, 16 Jun 2001 12:10:27 -0400 Subject: [Python-Dev] contributed binaries (was: Adding Asian codecs...) Message-ID: <200106161610.MAA05684@cj20424-a.reston1.va.home.com> > BTW, how come www.python.org no longer provides precompiled > (contributed) binaries for the various OSes out there ? > The FTP server only has these for Python <= 1.5.2. There are some binaries for newer versions, mostly Linux RPMs, but these are in different places. I agree the FTP download area is a mess. I propose to give up on the FTP area and start over on the new Zope-based web server, if and when it's ready. Not enough people are helping out, so it's going slowly. --Guido van Rossum (home page: http://www.python.org/~guido/) From mal@lemburg.com Sat Jun 16 19:59:52 2001 From: mal@lemburg.com (M.-A. 
Lemburg) Date: Sat, 16 Jun 2001 20:59:52 +0200 Subject: [Python-Dev] recognizing \u escapes in regular expressions References: <200106152217.f5FMHnI01360@mira.informatik.hu-berlin.de> Message-ID: <3B2BACA7.CDA96737@lemburg.com> "Martin v. Loewis" wrote: > > > should I close the bug report, or turn it into a feature request? > > I think the bug report can be closed. Myself, I found it sufficient > that you can write normal \u escapes in strings, in particular as you > can also use them in raw strings: > > >>> ur"Ha\u006Clo" > u'Hallo' > > Perhaps not very intuitive, and perhaps even a bug (how do you put a > backslash in front of a "u" in a raw unicode string), but useful in > this context. >>> print ur"backslash in front of an 'u': \u005cu" backslash in front of an 'u': \u A double backslash is easier to have: >>> print ur"double backslash in front of an 'u': \\u" double backslash in front of an 'u': \\u Python uses C's convention for \uXXXX where \u is only interpreted as Unicode escape of it is used with an odd number of backslashes in front of it. -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From tim.one@home.com Mon Jun 18 01:57:53 2001 From: tim.one@home.com (Tim Peters) Date: Sun, 17 Jun 2001 20:57:53 -0400 Subject: [Python-Dev] Re: Why not Lisp-like list-related functions ? In-Reply-To: <20010614111928.A4560@ute.cnri.reston.va.us> Message-ID: [Guido] > Maybe we also have a smaller brain than the typical Lisper -- I would > say, that would make us more normal, and if Python caters to people > with a closer-to-average brain size, that would mean more people will > be able to program in Python. History will decide... [Andrew Kuchling] > I thought it already has, pretty much. OK, I've kept quiet for days, but can't bear it any longer: Andrew, are you waiting for someone to *force* you to immortalize this exchange in your Python Quotes collection? If so, the PSU knows where you liv From mal@lemburg.com Mon Jun 18 11:14:04 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Mon, 18 Jun 2001 12:14:04 +0200 Subject: [Python-Dev] Adding Asian codecs to the core References: <200106141131.HAA25522@cj20424-a.reston1.va.home.com> Message-ID: <3B2DD46C.EEC20857@lemburg.com> Guido van Rossum wrote: > > > > > What if we add them to CVS and formally maintain them as part of the > > > > core but distribute them as a separate download? > > > > > > Good idea. > > > > All in favour? > > +1, as long as they're not in the CVS subtree that's normally > extracted for a regular source distribution. I propose this location > in the CVS tree: > > python/dist/encodings/... > > (So 'encodings' would be a sibling of 'src', which has been pretty > lonely ever since I started using CVS. ;-) Ok. When Tamito has completed his work on the codecs (he is currently reimplementing them in C), I'll check them in under the new directory. BTW, how should we ship these codecs ? I'd propose to provide a distutils setup.py file which wraps up all codecs under encodings and can be used to create a standard Python add-on "Python-X.X Encoding Add-on". The generated files should then ideally be published right next to the Python source/binary links on the python.org web-pages to achieve high visibility. 
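For concreteness, a minimal setup.py for such an add-on might look something like this (the distribution and package names below are invented placeholders; the real layout is up to Tamito):

    # Sketch of a distutils script for a separately shipped
    # encodings add-on.  "encodings.japanese" is a placeholder
    # package name, not the actual codec layout.
    from distutils.core import setup

    setup(name="Python-encodings-addon",
          version="1.0",
          description="Extra codecs for the standard encodings package",
          packages=["encodings.japanese"],
         )

Running "python setup.py sdist" (or bdist_wininst on Windows) over that would produce the downloadable archives to publish there.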
-- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From guido@digicool.com Mon Jun 18 13:25:35 2001 From: guido@digicool.com (Guido van Rossum) Date: Mon, 18 Jun 2001 08:25:35 -0400 Subject: [Python-Dev] Adding Asian codecs to the core In-Reply-To: Your message of "Mon, 18 Jun 2001 12:14:04 +0200." <3B2DD46C.EEC20857@lemburg.com> References: <200106141131.HAA25522@cj20424-a.reston1.va.home.com> <3B2DD46C.EEC20857@lemburg.com> Message-ID: <200106181225.IAA15518@cj20424-a.reston1.va.home.com> > Ok. When Tamito has completed his work on the codecs (he is currently > reimplementing them in C), I'll check them in under the new directory. Excellent! > BTW, how should we ship these codecs ? > > I'd propose to provide a distutils setup.py file which wraps up > all codecs under encodings and can be used to create a standard > Python add-on "Python-X.X Encoding Add-on". Sounds like a good plan. > The generated files should then ideally be published right next > to the Python source/binary links on the python.org web-pages to > achieve high visibility. Sure, for some defininition of "right next to" :-) --Guido van Rossum (home page: http://www.python.org/~guido/) From thomas@xs4all.net Mon Jun 18 15:35:12 2001 From: thomas@xs4all.net (Thomas Wouters) Date: Mon, 18 Jun 2001 16:35:12 +0200 Subject: [Python-Dev] Moshe Message-ID: <20010618163512.D8098@xs4all.nl> Just FYI: Moshe has been sighted, alive and well. He's been caught up in personal matters, apparently. He apologized and said he'd mail python-dev with an update soonish. Don't-you-wish-you-lurked-on-#python-too-ly y'rs ;) -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From m.favas@per.dem.csiro.au Mon Jun 18 22:28:23 2001 From: m.favas@per.dem.csiro.au (Mark Favas) Date: Tue, 19 Jun 2001 05:28:23 +0800 Subject: [Python-Dev] Anyone else seeing test_struct fail? Message-ID: <3B2E7277.D6109E7E@per.dem.csiro.au> [Platform: Tru64 Unix, Compaq C compiler) The current CVS of 2.2a0 fails test_struct for me with: test test_struct failed -- pack('>i', -2147483649) did not raise error more extensively, trying std iI on -2147483649 == 0xffffffff7fffffff Traceback (most recent call last): File "Lib/test/test_struct.py", line 367, in ? t.run() File "Lib/test/test_struct.py", line 353, in run self.test_one(x) File "Lib/test/test_struct.py", line 269, in test_one any_err(pack, ">" + code, x) File "Lib/test/test_struct.py", line 38, in any_err raise TestFailed, "%s%s did not raise error" % ( test_support.TestFailed: pack('>i', -2147483649) did not raise error A 64-bit platform issue? Also, the current imap.py causes "make test" (test___all__ and test_sundry) to fail with: "exceptions.TabError: inconsistent use of tabs and spaces in indentation (imaplib.py, line 576)" - untested checkin ? -- Mark Favas - m.favas@per.dem.csiro.au CSIRO, Private Bag No 5, Wembley, Western Australia 6913, AUSTRALIA From tim@digicool.com Mon Jun 18 23:04:06 2001 From: tim@digicool.com (Tim Peters) Date: Mon, 18 Jun 2001 18:04:06 -0400 Subject: [Python-Dev] Anyone else seeing test_struct fail? 
In-Reply-To: <3B2E7277.D6109E7E@per.dem.csiro.au> Message-ID: [Mark Favas] > [Platform: Tru64 Unix, Compaq C compiler) > The current CVS of 2.2a0 fails test_struct for me with: > > test test_struct failed -- pack('>i', -2147483649) did not raise error > > more extensively, > trying std iI on -2147483649 == 0xffffffff7fffffff > Traceback (most recent call last): > File "Lib/test/test_struct.py", line 367, in ? > t.run() > File "Lib/test/test_struct.py", line 353, in run > self.test_one(x) > File "Lib/test/test_struct.py", line 269, in test_one > any_err(pack, ">" + code, x) > File "Lib/test/test_struct.py", line 38, in any_err > raise TestFailed, "%s%s did not raise error" % ( > test_support.TestFailed: pack('>i', -2147483649) did not raise error > > A 64-bit platform issue? In test_struct.py, please change this line (right after "class IntTester"): BUGGY_RANGE_CHECK = "bBhHIL" to BUGGY_RANGE_CHECK = "bBhHiIlL" and try again. I suspect you're bumping into a pre-existing bug that simply wasn't checked before (and, yes, there's A Reason it *may* screw up on a 64-bit box but not a 32-bit one). Note that since in standard mode, "i" is considered to be a 4-byte int regardless of platform, we really *should* bitch about trying to pack -2147483649 under "i" (but we don't -- and in general no codes except the new q/Q reliably bitch about out-of-range errors in the standard modes). > Also, the current imap.py causes "make test" (test___all__ and > test_sundry) to fail with: "exceptions.TabError: inconsistent use of > tabs and spaces in indentation (imaplib.py, line 576)" - untested > checkin ? Leaving that to some loser who cares about whitespace . From m.favas@per.dem.csiro.au Mon Jun 18 23:11:37 2001 From: m.favas@per.dem.csiro.au (Mark Favas) Date: Tue, 19 Jun 2001 06:11:37 +0800 Subject: [Python-Dev] Anyone else seeing test_struct fail? References: Message-ID: <3B2E7C99.E9BEFC3C@per.dem.csiro.au> [Tim Peters suggests] > > [Mark Favas] > > [Platform: Tru64 Unix, Compaq C compiler) > > The current CVS of 2.2a0 fails test_struct for me with: > > > > test test_struct failed -- pack('>i', -2147483649) did not raise error > > In test_struct.py, please change this line (right after "class IntTester"): > > BUGGY_RANGE_CHECK = "bBhHIL" > > to > > BUGGY_RANGE_CHECK = "bBhHiIlL" > > and try again. Yep, passes with this change. > > Also, the current imap.py causes "make test" (test___all__ and > > test_sundry) to fail with: "exceptions.TabError: inconsistent use of > > tabs and spaces in indentation (imaplib.py, line 576)" - untested > > checkin ? > > Leaving that to some loser who cares about whitespace . Guess we'll have to advertise widely, then . -- Mark Favas - m.favas@per.dem.csiro.au CSIRO, Private Bag No 5, Wembley, Western Australia 6913, AUSTRALIA From barry@digicool.com Mon Jun 18 23:28:21 2001 From: barry@digicool.com (Barry A. Warsaw) Date: Mon, 18 Jun 2001 18:28:21 -0400 Subject: [Python-Dev] Bogosities in quopri module? Message-ID: <15150.32901.611349.524220@yyz.digicool.com> I've been playing a bit with the quopri module (trying to support RFC 2047 in mimelib), and I've run across a few bogosities that I'd like to fix. Fixing some of them could break code, so I wanted to see what people think first. First, quopri should have encodestring() and decodestring() functions which take a string and return a string. This would make it more consistent API-wise with e.g. base64. 
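Concretely, the two could be thin wrappers around the existing file-based encode() and decode() (a sketch of the proposed API built on StringIO, not a final patch):

    # Possible shape for the proposed string interface; the names
    # follow base64's, and the bodies just wrap encode()/decode().
    from StringIO import StringIO
    import quopri

    def encodestring(s, quotetabs=1):
        infp, outfp = StringIO(s), StringIO()
        quopri.encode(infp, outfp, quotetabs)
        return outfp.getvalue()

    def decodestring(s):
        infp, outfp = StringIO(s), StringIO()
        quopri.decode(infp, outfp)
        return outfp.getvalue()

The quotetabs default of 1 in the sketch anticipates the argument discussed next.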
One difference is that quopri.encodestring() should probably take a default argument quotetabs (defaulted to 1) for passing to the encode() function. This shouldn't be very controversial.

Second, I think there are a couple of problems with encode() itself. One is that it always tacks on an extra \n character, such that an encode->decode roundtrip is not idempotent. I propose fixing this so that encode() doesn't add the extra newline, but this can break code that expects that newline to be present.

Third, I think that encode()'s quotetabs flag should also apply to spaces. RFC 1521 says that both ASCII tabs and spaces may be encoded, and I don't think it's worthwhile that there be a separate flag to independently choose to encode tabs or spaces.

Lastly, if you buy the extra-newline solution above, then encode() has to be fixed w.r.t. trailing spaces and tabs. Currently, an encode->decode roundtrip for, e.g. "hello " returns "hello =\n", but what it should really return is "hello=20". Likewise "hello\t" should return "hello=09". The patches must take multiline strings into account though, so that they don't chomp newlines out of

    """hello
    great
    big
    world
    """

I haven't worked up a patch yet, but when I do I'll upload it to SF to get some feedback. I think there are a few other things in the module that could be cleaned up. I also plan to add a test_quopri.py.

Comments?
-Barry

From see@my.signature Tue Jun 19 07:21:14 2001
From: see@my.signature (Greg Ewing)
Date: Tue, 19 Jun 2001 18:21:14 +1200
Subject: [Python-Dev] Re: PEP 255: Simple Generators
References:
Message-ID: <3B2EEF5A.8FF0DAFB@cosc.canterbury.ac.nz>

Something is bothering me about this. In fact, it's bothering me a LOT. In the following, will f() work as a generator-function:

    def f():
        for i in range(5):
            g(i)

    def g(i):
        for j in range(10):
            yield i,j

If I understand PEP 255 correctly, this will *not* work. But it seems entirely reasonable to me that it *should* work. It *has* to work, otherwise how am I to write generators that are too complicated to fit into a single function?

Someone please tell me I'm wrong about this!

--
Greg Ewing, Computer Science Dept,
University of Canterbury,
Christchurch, New Zealand
To get my email address, please visit my web page:
http://www.cosc.canterbury.ac.nz/~greg

From jepler@inetnebr.com Tue Jun 19 14:25:23 2001
From: jepler@inetnebr.com (Jeff Epler)
Date: Tue, 19 Jun 2001 08:25:23 -0500
Subject: [Python-Dev] Re: PEP 255: Simple Generators
In-Reply-To: <3B2EEF5A.8FF0DAFB@cosc.canterbury.ac.nz>; from see@my.signature on Tue, Jun 19, 2001 at 06:21:14PM +1200
References: <3B2EEF5A.8FF0DAFB@cosc.canterbury.ac.nz>
Message-ID: <20010619082522.A12200@inetnebr.com>

On Tue, Jun 19, 2001 at 06:21:14PM +1200, Greg Ewing wrote:
> Something is bothering me about this. In fact,
> it's bothering me a LOT. In the following, will
> f() work as a generator-function:
>
>     def f():
>         for i in range(5):
>             g(i)
>
>     def g(i):
>         for j in range(10):
>             yield i,j
>
> If I understand PEP 255 correctly, this will *not*
> work. But it seems entirely reasonable to me that
> it *should* work. It *has* to work, otherwise how
> am I to write generators that are too complicated
> to fit into a single function?

The following similar code seems to produce the results you have in mind.

    def f():
        for i in range(5):
            #g(i)
            #yield g(i)
            for x in g(i):
                yield x

    def g(i):
        for j in range(10):
            yield i, j

It would be nice to have a succinct way to say 'for dummy in iterator: yield dummy'. Maybe 'yield from iterator'? Then f would become:

    def f():
        for i in range(5):
            yield from g(i)

Jeff

PS I noticed that the generator branch got merged into the trunk. Cool!

From fdrake@acm.org Tue Jun 19 14:24:46 2001
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Tue, 19 Jun 2001 09:24:46 -0400 (EDT)
Subject: [Python-Dev] Python & GCC 3.0
Message-ID: <15151.21150.214238.429130@cj42289-a.reston1.va.home.com>

I built GCC 3.0 last night, and Python built and passed the regression tests. I've not done any further comparisons, but using --with-cxx=... failed; the C++ ABI changed and a new version of the C++ runtime is required before that will work. I didn't want to install that over my working installation, just in case. ;-) I'll report more as I find out more.

-Fred

--
Fred L. Drake, Jr.
PythonLabs at Digital Creations

From nas@python.ca Tue Jun 19 15:00:39 2001
From: nas@python.ca (Neil Schemenauer)
Date: Tue, 19 Jun 2001 07:00:39 -0700
Subject: [Python-Dev] Re: PEP 255: Simple Generators
In-Reply-To: <3B2EEF5A.8FF0DAFB@cosc.canterbury.ac.nz>; from see@my.signature on Tue, Jun 19, 2001 at 06:21:14PM +1200
References: <3B2EEF5A.8FF0DAFB@cosc.canterbury.ac.nz>
Message-ID: <20010619070039.A13712@glacier.fnational.com>

Greg Ewing wrote:
> Something is bothering me about this. In fact,
> it's bothering me a LOT. In the following, will
> f() work as a generator-function:
>
>     def f():
>         for i in range(5):
>             g(i)
>
>     def g(i):
>         for j in range(10):
>             yield i,j
>
> If I understand PEP 255 correctly, this will *not*
> work.

No, it will not work. The title of PEP 255 is "Simple Generators". What you want will require something like Stackless in order to get the C stack out of the way. That's a major change to the Python internals. To make your example work you need to do:

    def f():
        for i in range(5):
            for j in g(i):
                yield j

    def g(i):
        for j in range(10):
            yield i,j

Stackless may still be in Python's future, but not for 2.2.

Neil

From barry@digicool.com Tue Jun 19 15:19:58 2001
From: barry@digicool.com (Barry A. Warsaw)
Date: Tue, 19 Jun 2001 10:19:58 -0400
Subject: [Python-Dev] Python & GCC 3.0
References: <15151.21150.214238.429130@cj42289-a.reston1.va.home.com>
Message-ID: <15151.24462.400930.295658@anthem.wooz.org>

>>>>> "Fred" == Fred L Drake, Jr writes:

    Fred> I built GCC 3.0 last night, and Python built and passed
    Fred> the regression tests.

Hey, you were actually able to download it!? :) I couldn't get an ftp connection for the longest time and finally gave up.

It'd be interesting to see if there are any performance improvements, esp. on x86 boxen.

-Barry

From fdrake@acm.org Tue Jun 19 16:07:48 2001
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Tue, 19 Jun 2001 11:07:48 -0400 (EDT)
Subject: [Python-Dev] Python & GCC 3.0
In-Reply-To: <15151.24462.400930.295658@anthem.wooz.org>
References: <15151.21150.214238.429130@cj42289-a.reston1.va.home.com> <15151.24462.400930.295658@anthem.wooz.org>
Message-ID: <15151.27332.418303.872171@cj42289-a.reston1.va.home.com>

Barry A. Warsaw writes:
> It'd be interesting to see if there are any performance
> improvements, esp. on x86 boxen.
GCC 2.95.3:

cj42289-a(.../python/linux); time ./python -tt ../Lib/test/pystone.py
Pystone(1.1) time for 10000 passes = 1.58
This machine benchmarks at 6329.11 pystones/second
1.66user 0.01system 0:03.40elapsed 49%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (280major+241minor)pagefaults 0swaps

GCC 3.0:

cj42289-a(.../python/linux); cd ../linux-gcc-3.0/
cj42289-a(.../python/linux-gcc-3.0); time ./python -tt ../Lib/test/pystone.py
Pystone(1.1) time for 10000 passes = 1.65
This machine benchmarks at 6060.61 pystones/second
1.77user 0.01system 0:03.52elapsed 50%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (307major+239minor)pagefaults 0swaps

There is a little variation with multiple runs, but it varies less than 5% from the numbers above. Bumping up the LOOPS constant in pystone.py changes the numbers a small bit, but the relationship remains constant.

This is on a Linux-Mandrake 7.2 installation with non-cooker updates installed, and still using the Linux 2.2 kernel:

cj42289-a(.../python/linux-gcc-3.0); uname -a
Linux cj42289-a.reston1.va.home.com 2.2.17-21mdk #1 Thu Oct 5 13:16:08 CEST 2000 i686 unknown

-Fred

--
Fred L. Drake, Jr.
PythonLabs at Digital Creations

From dan@cgsoftware.com Tue Jun 19 17:19:14 2001
From: dan@cgsoftware.com (Daniel Berlin)
Date: 19 Jun 2001 12:19:14 -0400
Subject: [Python-Dev] Python & GCC 3.0
In-Reply-To: <15151.27332.418303.872171@cj42289-a.reston1.va.home.com> ("Fred L. Drake, Jr."'s message of "Tue, 19 Jun 2001 11:07:48 -0400 (EDT)")
References: <15151.21150.214238.429130@cj42289-a.reston1.va.home.com> <15151.24462.400930.295658@anthem.wooz.org> <15151.27332.418303.872171@cj42289-a.reston1.va.home.com>
Message-ID: <87vglsbfy5.fsf@cgsoftware.com>

"Fred L. Drake, Jr." writes:

> Barry A. Warsaw writes:
> > It'd be interesting to see if there are any performance
> > improvements, esp. on x86 boxen.

Except, I bet you didn't use one of the "optimize for a given cpu" switches. Try adding -mpentiumpro -march=pentiumpro to your compiler flags. Otherwise, it's scheduling for a 386. And the old x86 backend wasn't all that bad at scheduling for the 386. Hell, i'm not that bad at scheduling for a 386. :)

--Dan

>
> GCC 2.95.3:
>
> cj42289-a(.../python/linux); time ./python -tt ../Lib/test/pystone.py
> Pystone(1.1) time for 10000 passes = 1.58
> This machine benchmarks at 6329.11 pystones/second
> 1.66user 0.01system 0:03.40elapsed 49%CPU (0avgtext+0avgdata 0maxresident)k
> 0inputs+0outputs (280major+241minor)pagefaults 0swaps
>
> GCC 3.0:
>
> cj42289-a(.../python/linux); cd ../linux-gcc-3.0/
> cj42289-a(.../python/linux-gcc-3.0); time ./python -tt ../Lib/test/pystone.py
> Pystone(1.1) time for 10000 passes = 1.65
> This machine benchmarks at 6060.61 pystones/second
> 1.77user 0.01system 0:03.52elapsed 50%CPU (0avgtext+0avgdata 0maxresident)k
> 0inputs+0outputs (307major+239minor)pagefaults 0swaps
>
> There is a little variation with multiple runs, but it varies less than
> 5% from the numbers above. Bumping up the LOOPS constant in
> pystone.py changes the numbers a small bit, but the relationship
> remains constant.
>
> This is on a Linux-Mandrake 7.2 installation with non-cooker updates
> installed, and still using the Linux 2.2 kernel:
>
> cj42289-a(.../python/linux-gcc-3.0); uname -a
> Linux cj42289-a.reston1.va.home.com 2.2.17-21mdk #1 Thu Oct 5 13:16:08 CEST 2000 i686 unknown
>
> -Fred
>
> --
> Fred L. Drake, Jr.
> PythonLabs at Digital Creations > > > _______________________________________________ > Python-Dev mailing list > Python-Dev@python.org > http://mail.python.org/mailman/listinfo/python-dev -- "If all the nations in the world are in debt, where did all the money go? "-Steven Wright From mal@lemburg.com Tue Jun 19 17:55:47 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Tue, 19 Jun 2001 18:55:47 +0200 Subject: [Python-Dev] Python & GCC 3.0 References: <15151.21150.214238.429130@cj42289-a.reston1.va.home.com> <15151.24462.400930.295658@anthem.wooz.org> <15151.27332.418303.872171@cj42289-a.reston1.va.home.com> Message-ID: <3B2F8413.77F40494@lemburg.com> "Fred L. Drake, Jr." wrote: > > Barry A. Warsaw writes: > > It'd be interesting to see if there are any performance > > improvements, esp. on x86 boxen. > > GCC 2.95.3: > > cj42289-a(.../python/linux); time ./python -tt ../Lib/test/pystone.py > Pystone(1.1) time for 10000 passes = 1.58 > This machine benchmarks at 6329.11 pystones/second > 1.66user 0.01system 0:03.40elapsed 49%CPU (0avgtext+0avgdata 0maxresident)k > 0inputs+0outputs (280major+241minor)pagefaults 0swaps > > GCC 3.0: > > cj42289-a(.../python/linux); cd ../linux-gcc-3.0/ > cj42289-a(.../python/linux-gcc-3.0); time ./python -tt ../Lib/test/pystone.py > Pystone(1.1) time for 10000 passes = 1.65 > This machine benchmarks at 6060.61 pystones/second > 1.77user 0.01system 0:03.52elapsed 50%CPU (0avgtext+0avgdata 0maxresident)k > 0inputs+0outputs (307major+239minor)pagefaults 0swaps > > There is a little variation with multiple run, but it varies less than > 5% from the numbers above. Bumping up the LOOPS constant in > pystone.py changes the numbers a small bit, but the relationship > remains constant. > > This is one a Linux-Mandrake 7.2 installation with non-cooker updates > installed, and still using the Linux 2.2 kernel: > > cj42289-a(.../python/linux-gcc-3.0); uname -a > Linux cj42289-a.reston1.va.home.com 2.2.17-21mdk #1 Thu Oct 5 13:16:08 CEST 2000 i686 unknown Note that if you really want to see a speedup for x86 boxes then you should take a look at PGCC, the Pentium GCC compiler group: http://www.goof.com/pcg/ You can then adjust the compiler to various x86 CPUs and take advantage of some special optimizations they have intergrated into 2.95.2.1. -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From skip@pobox.com (Skip Montanaro) Tue Jun 19 18:44:47 2001 From: skip@pobox.com (Skip Montanaro) (Skip Montanaro) Date: Tue, 19 Jun 2001 12:44:47 -0500 Subject: [Python-Dev] example of module interface to a varargs function? Message-ID: <15151.36751.406758.577420@beluga.mojam.com> I am trying to add a module interface to some of the bits missing from PyGtk2. Some functions I'm interested in have varargs signatures, e.g.: void gtk_binding_entry_add_signal (GtkBindingSet *binding_set, guint keyval, guint modifiers, const gchar *signal_name, guint n_args, ...) >From Python, this would be called something like bs = gtk.GtkBindingSet("somename") bs.add_signal(gtk.GDK.K_Up, 0, "scroll_vertical", gtk.TYPE_ENUM, gtk.SCROLL_STEP_BACKWARD, gtk.TYPE_FLOAT, 0.0) with n_args inferred from the number of arguments following the "scroll_vertical" parameter. I'm a bit stumped how to handle this with the PyArg_Parse* routines. (I'll worry about calling gtk_binding_entry_add_signal after I figure out how to marshal the args.) 
The only place in the standard modules I saw that processed a truly arbitrary number of arguments is the struct_pack method of the struct module, and it doesn't use PyArg_Parse* to process them. Can someone point me to an example of marshalling arbitrary numbers of arguments then calling a varargs function? Thx, Skip From fdrake@acm.org Tue Jun 19 20:04:18 2001 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Tue, 19 Jun 2001 15:04:18 -0400 (EDT) Subject: [Python-Dev] Python & GCC 3.0 In-Reply-To: <87vglsbfy5.fsf@cgsoftware.com> References: <15151.21150.214238.429130@cj42289-a.reston1.va.home.com> <15151.24462.400930.295658@anthem.wooz.org> <15151.27332.418303.872171@cj42289-a.reston1.va.home.com> <3B2F8413.77F40494@lemburg.com> <87vglsbfy5.fsf@cgsoftware.com> Message-ID: <15151.41522.200832.655534@cj42289-a.reston1.va.home.com> Daniel Berlin writes: > Except, I bet you didn't use one of the "optimize for a given cpu" > switches. No, I hadn't. My main interest was in the GCC team's claim that the generated code was faster. Compiling with "make OPT='-mcpu=i686 -O3'" did not make much difference at all. M.-A. Lemburg writes: > Note that if you really want to see a speedup for x86 boxes then > you should take a look at PGCC, the Pentium GCC compiler group: > > http://www.goof.com/pcg/ > > You can then adjust the compiler to various x86 CPUs and > take advantage of some special optimizations they have intergrated > into 2.95.2.1. If they have any improved optimizations for recent x86 chips, I'd like to see them folded into GCC. I'd hate to see another egcs-style split. It doesn't look like I can just download a single source package from them and wait 3 hours for it to build, so I won't plan on pursuing this further. -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From tim@digicool.com Tue Jun 19 20:14:10 2001 From: tim@digicool.com (Tim Peters) Date: Tue, 19 Jun 2001 15:14:10 -0400 Subject: [Python-Dev] Python & GCC 3.0 In-Reply-To: <15151.27332.418303.872171@cj42289-a.reston1.va.home.com> Message-ID: [Fred L. Drake, Jr.] > GCC 2.95.3: > This machine benchmarks at 6329.11 pystones/second > ... > GCC 3.0: > This machine benchmarks at 6060.61 pystones/second > ... > This is one a Linux-Mandrake 7.2 installation with non-cooker updates > installed, and still using the Linux 2.2 kernel: > > cj42289-a(.../python/linux-gcc-3.0); uname -a > Linux cj42289-a.reston1.va.home.com 2.2.17-21mdk #1 Thu Oct 5 > 13:16:08 CEST 2000 i686 unknown This is a good place to note that the single biggest "easy win" for pystone is to run it with -O (that is, Python's -O). Yields a 10% boost on Fred's box, and about 7% on MSVC6+Win2K. pystone is more sensitive to -O than most "real Python apps", probably because it's masses of very simple operations on scalar types -- no real classes, no dicts, no lists except to simulate fixed-size C arrays, lots of globals, and so on. The dynamic frequency of SET_LINENO is high, and the avg work per other opcode is low. OTOH, that's typical of *some* Python apps, and typical of *parts* of almost all Python apps. So it would be worth getting ridding of SET_LINENO even in non- -O runs. Note that SET_LINENO isn't needed to get correct line numbers in tracebacks (and hasn't been needed for years), it's "just" there to support tracing now. Vladimir had what looked to be a workable scheme for doing that a different way, and that would be a cool project for someone to revive (IMO -- Guido's may differ, but he's too busy to notice what we're doing ). 
From michel@digicool.com Tue Jun 19 20:12:14 2001 From: michel@digicool.com (Michel Pelletier) Date: Tue, 19 Jun 2001 12:12:14 -0700 (PDT) Subject: [Python-Dev] Anyone else seeing test_struct fail? In-Reply-To: <3B2E7277.D6109E7E@per.dem.csiro.au> Message-ID:

On Tue, 19 Jun 2001, Mark Favas wrote: > Also, the current imap.py causes "make test" (test___all__ and > test_sundry) to fail with: "exceptions.TabError: inconsistent use of > tabs and spaces in indentation (imaplib.py, line 576)" - untested > checkin ? I submitted a patch right on this line the other day that Guido applied, but I tested it and neither test___all__ nor test_sundry fails for me today. -Michel

From mal@lemburg.com Tue Jun 19 20:28:14 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Tue, 19 Jun 2001 21:28:14 +0200 Subject: [Python-Dev] Python & GCC 3.0 References: <15151.21150.214238.429130@cj42289-a.reston1.va.home.com> <15151.24462.400930.295658@anthem.wooz.org> <15151.27332.418303.872171@cj42289-a.reston1.va.home.com> <3B2F8413.77F40494@lemburg.com> <87vglsbfy5.fsf@cgsoftware.com> <15151.41522.200832.655534@cj42289-a.reston1.va.home.com> Message-ID: <3B2FA7CE.DD1602F7@lemburg.com>

"Fred L. Drake, Jr." wrote: > > Daniel Berlin writes: > > Except, I bet you didn't use one of the "optimize for a given cpu" > > switches. > > No, I hadn't. My main interest was in the GCC team's claim that the > generated code was faster. Compiling with "make OPT='-mcpu=i686 -O3'" > did not make much difference at all. > > M.-A. Lemburg writes: > > Note that if you really want to see a speedup for x86 boxes then > > you should take a look at PGCC, the Pentium GCC compiler group: > > > > http://www.goof.com/pcg/ > > > > You can then adjust the compiler to various x86 CPUs and > > take advantage of some special optimizations they have integrated > > into 2.95.2.1. > > If they have any improved optimizations for recent x86 chips, I'd > like to see them folded into GCC. I'd hate to see another egcs-style > split. > It doesn't look like I can just download a single source package > from them and wait 3 hours for it to build, so I won't plan on > pursuing this further. Oh, it's fairly easy to get a pgcc compiler: all you have to do is apply their small set of patches to the gcc source before compiling it. And then you should set your OPT environment variable to e.g. OPT="-g -O3 -Wall -Wstrict-prototypes -mcpu=k6". This will cause the pgcc compiler to use these settings in pretty much all compiles you ever do without having to think about it every time. -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/

From tim@digicool.com Tue Jun 19 20:36:41 2001 From: tim@digicool.com (Tim Peters) Date: Tue, 19 Jun 2001 15:36:41 -0400 Subject: [Python-Dev] Anyone else seeing test_struct fail? In-Reply-To: Message-ID:

[Michel Pelletier] > I submitted a patch right on this line the other day that Guido applied, > but I tested it and neither test___all__ nor test_sundry fails for me > today. Not to worry! I fixed all this stuff yesterday. imaplib.py had an ambiguous mix of hard tabs and spaces, which Guido "should have" caught before checking in, and that Python itself complained about when run with -tt (which is how Mark ran the test suite). There's no problem anymore.
From nas@python.ca Tue Jun 19 21:37:18 2001 From: nas@python.ca (Neil Schemenauer) Date: Tue, 19 Jun 2001 13:37:18 -0700 Subject: [Python-Dev] Python & GCC 3.0 In-Reply-To: <15151.41522.200832.655534@cj42289-a.reston1.va.home.com>; from fdrake@acm.org on Tue, Jun 19, 2001 at 03:04:18PM -0400 References: <15151.21150.214238.429130@cj42289-a.reston1.va.home.com> <15151.24462.400930.295658@anthem.wooz.org> <15151.27332.418303.872171@cj42289-a.reston1.va.home.com> <3B2F8413.77F40494@lemburg.com> <87vglsbfy5.fsf@cgsoftware.com> <15151.41522.200832.655534@cj42289-a.reston1.va.home.com> Message-ID: <20010619133718.A14814@glacier.fnational.com> Fred L. Drake, Jr. wrote: > Compiling with "make OPT='-mcpu=i686 -O3'" did not make much > difference at all. Try OPT="-m486 -O2". That gave me the best results last time I played with this stuff. > If they have any improved optimizations for recent x86 chips, I'd > like to see them folded into GCC. I'd hate to see another egcs-style > split. Some people say you should avoid PGCC since it generates buggy code. I don't know if that's true or not. Neil From thomas@xs4all.net Tue Jun 19 22:04:46 2001 From: thomas@xs4all.net (Thomas Wouters) Date: Tue, 19 Jun 2001 23:04:46 +0200 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Lib/test test_mailbox.py,1.5,1.6 In-Reply-To: Message-ID: <20010619230446.E8098@xs4all.nl> On Tue, Jun 19, 2001 at 01:20:07PM -0700, Jack Jansen wrote: > The test used int(time.time()) to get a random number, but this doesn't > work on the mac (where times are bigger than ints). Changed to > int(time.time()%1000000). Doesn't int(time.time()%sys.maxint) make more sense ? At least you won't be degrading the sequentiality of this particularly unrandom random number on platforms where ints really are big enough to hold times :) -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From loewis@informatik.hu-berlin.de Tue Jun 19 22:25:26 2001 From: loewis@informatik.hu-berlin.de (Martin von Loewis) Date: Tue, 19 Jun 2001 23:25:26 +0200 (MEST) Subject: [Python-Dev] example of module interface to a varargs function? Message-ID: <200106192125.XAA27631@pandora.informatik.hu-berlin.de> > The only place in the standard modules I saw that processed a truly > arbitrary number of arguments is the struct_pack method of the > struct module, and it doesn't use PyArg_Parse* to process them. Can > someone point me to an example of marshalling arbitrary numbers of > arguments then calling a varargs function? In a true varargs function, you cannot use PyArg_Parse*. Instead, you have to iterate over the argument tuple with PyTuple_GetItem, fetching one argument after another. Another example of such a function is builtin_max. > (I'll worry about calling gtk_binding_entry_add_signal after I > figure out how to marshal the args.) I'd worry about this first: In C, it is not possible to call a true varargs function in a portable way if the caller doesn't statically (i.e. in source code) know the number of arguments. Only the callee can be variable, not the caller. A slight exception is that you are allowed to pass-through va_list objects from one function to another. However, that requires that the callee expects a va_list argument, i.e. is not a varargs function, plus there is no portable way to create a va_list object from scratch. 
If you absolutely need to call such a function, you can use the Cygnus libffi library, which, for a number of microprocessors and C ABIs, allows calling arbitrary function pointers. However, I'd rather recommend looking for alternatives to gtk_binding_entry_add_signal. E.g. gtk_binding_entry_add_signall accepts a GSList*, which is a chained list of arguments, instead of being varargs. This you can call in a C module - the other one is out of reach. Regards, Martin

From skip@pobox.com (Skip Montanaro) Tue Jun 19 22:32:50 2001 From: skip@pobox.com (Skip Montanaro) (Skip Montanaro) Date: Tue, 19 Jun 2001 16:32:50 -0500 Subject: [Python-Dev] Python & GCC 3.0 In-Reply-To: <20010619133718.A14814@glacier.fnational.com> References: <15151.21150.214238.429130@cj42289-a.reston1.va.home.com> <15151.24462.400930.295658@anthem.wooz.org> <15151.27332.418303.872171@cj42289-a.reston1.va.home.com> <3B2F8413.77F40494@lemburg.com> <87vglsbfy5.fsf@cgsoftware.com> <15151.41522.200832.655534@cj42289-a.reston1.va.home.com> <20010619133718.A14814@glacier.fnational.com> Message-ID: <15151.50434.297860.277726@beluga.mojam.com>

Neil> Some people say you should avoid PGCC since it generates buggy Neil> code. I don't know if that's true or not. If nothing else, PGCC almost certainly gets a lot less exercise than the mainstream GCC code. Given the statement in the PGCC FAQ that typical speedups are in the range of 5%: http://www.goof.com/pcg/pgcc-faq.html#SEC0119 it doesn't seem like it would be worth the effort to use it in any critical applications. Better to just wait for PGCC optimizations to trickle into GCC itself. Skip

From jack@oratrix.nl Tue Jun 19 22:56:43 2001 From: jack@oratrix.nl (Jack Jansen) Date: Tue, 19 Jun 2001 23:56:43 +0200 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Lib/test test_mailbox.py,1.5,1.6 In-Reply-To: Message by Thomas Wouters , Tue, 19 Jun 2001 23:04:46 +0200 , <20010619230446.E8098@xs4all.nl> Message-ID: <20010619215648.B2A7CE267B@oratrix.oratrix.nl>

Recently, Thomas Wouters said: > On Tue, Jun 19, 2001 at 01:20:07PM -0700, Jack Jansen wrote: > > > The test used int(time.time()) to get a random number, but this doesn't > > work on the mac (where times are bigger than ints). Changed to > > int(time.time()%1000000). > > Doesn't int(time.time()%sys.maxint) make more sense ? At least you won't be > degrading the sequentiality of this particularly unrandom random number on > platforms where ints really are big enough to hold times :) I think the last sentence should be "... platforms where time before 1970 doesn't exist so they can fit it in a measly 32 bits":-) But anyway: I haven't a clue whether the sequentiality is important, it doesn't really seem to be from a quick glance. If you want to fix it: allez votre corridor. -- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen@oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm

From skip@pobox.com (Skip Montanaro) Tue Jun 19 23:01:13 2001 From: skip@pobox.com (Skip Montanaro) (Skip Montanaro) Date: Tue, 19 Jun 2001 17:01:13 -0500 Subject: [Python-Dev] Re: example of module interface to a varargs function?
In-Reply-To: <200106192125.XAA27631@pandora.informatik.hu-berlin.de> References: <200106192125.XAA27631@pandora.informatik.hu-berlin.de> Message-ID: <15151.52137.623119.852524@beluga.mojam.com>

>> The only place in the standard modules I saw that processed a truly >> arbitrary number of arguments is the struct_pack method of the struct >> module, and it doesn't use PyArg_Parse* to process them. Can someone >> point me to an example of marshalling arbitrary numbers of arguments >> then calling a varargs function? Martin> In a true varargs function, you cannot use PyArg_Parse*. Martin> Instead, you have to iterate over the argument tuple with Martin> PyTuple_GetItem, fetching one argument after another. I think it would be nice if PyArg_ParseTuple and friends took a "*" format character. It would only be useful at the end of a format string, but would allow the generic argument parsing machinery to be used for those arguments that precede it. The argument it writes into would be an int, which would represent the offset of the first argument not processed by PyArg_ParseTuple. Reusing my example:

    void gtk_binding_entry_add_signal (GtkBindingSet *binding_set,
                                       guint keyval,
                                       guint modifiers,
                                       const gchar *signal_name,
                                       guint n_args,
                                       ...)

If I had a Python module wrapper function for this it might call PyArg_ParseTuple as

    PyArg_ParseTuple(args, "iis*", &keyval, &modifiers, &signal_name, &offset);

Processing of the rest of the argument list would be the responsibility of the author and start at args[offset]. >> (I'll worry about calling gtk_binding_entry_add_signal after I figure >> out how to marshal the args.) Martin> I'd worry about this first: In C, it is not possible to call a Martin> true varargs function in a portable way if the caller doesn't Martin> statically (i.e. in source code) know the number of Martin> arguments. Only the callee can be variable, not the caller. Understood. It turns out that the function I used as an example is actually only called in a few distinct ways. I can analyze its var-arguments fairly easily and dispatch to the appropriate call to the underlying function. Martin> However, I'd rather recommend looking for alternatives to Martin> gtk_binding_entry_add_signal. E.g. gtk_binding_entry_add_signall Martin> accepts a GSList*, which is a chained list of arguments, instead Martin> of being varargs. This you can call in a C module - the other Martin> one is out of reach. Hmm... thanks, this does look like the correct solution. I failed to notice the distinction between the two functions when I first scanned the source code: the signall (two-els) version is never called outside of gtkbindings.c, the Gtk documentation in this area is, well, rather sparse, to say the least (nine comments over 1200 lines of code, the only two substantial ones of which are boilerplate at the top), and there is no reference manual documentation for any of the interesting functions. By comparison, the Python documentation looks as if Guido has employed a team of full-time tech writers for years. Way to go, Fred! Skip

From nas@python.ca Tue Jun 19 23:12:49 2001 From: nas@python.ca (Neil Schemenauer) Date: Tue, 19 Jun 2001 15:12:49 -0700 Subject: [Python-Dev] OS timer and profiling Python code Message-ID: <20010619151249.A15126@glacier.fnational.com>

On x86 hardware the Linux timer runs at 100 Hz by default. On modern hardware that is probably much too slow to accurately profile programs using the Python profiler.
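One quick way to see the granularity Neil is talking about from Python itself -- a sketch added for illustration, not part of the original message -- is to busy-wait until the process's CPU clock visibly advances:

    import os

    def cpu_tick():
        # Spin until user CPU time moves; the difference is the
        # kernel timer's granularity (1/HZ, i.e. 0.01s at 100 Hz).
        t0 = os.times()[0]
        while 1:
            t1 = os.times()[0]
            if t1 != t0:
                return t1 - t0

    print "CPU timer granularity: %g seconds" % cpu_tick()

Anything the profiler times that runs for much less than this value is essentially invisible to it.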
Changing the value in include/asm-i386/param.h from 100 to 1024 and recompiling the kernel made a huge difference for me. Perhaps we should include a note in the profiler documentation. I'm not sure if this affects gprof as well but I suspect it does. Neil

From moshez@zadka.site.co.il Wed Jun 20 06:31:23 2001 From: moshez@zadka.site.co.il (Moshe Zadka) Date: Wed, 20 Jun 2001 08:31:23 +0300 Subject: [Python-Dev] Moshe In-Reply-To: <20010618163512.D8098@xs4all.nl> References: <20010618163512.D8098@xs4all.nl> Message-ID:

On Mon, 18 Jun 2001 16:35:12 +0200, Thomas Wouters wrote: > Just FYI: Moshe has been sighted, alive and well. He's been caught up in > personal matters, apparently. He apologized and said he'd mail python-dev > with an update soonish. Yes, indeed, and soonish got sorta delayed too... Anyway, I am alive and well, and the bad guys will have to do better than 300m to get me in an explosion ;-) Anyway, I'm terribly sorry for disappearing - my personal life caught up with me and stuff. I'm now trying to catch up with everything. Thanks to whoever took 2.0.1 from where I left off and kept it going. -- "I'll be ex-DPL soon anyway so I'm |LUKE: Is Perl better than Python? looking for someplace else to grab power."|YODA: No...no... no. Quicker, -- Wichert Akkerman (on debian-private)| easier, more seductive. For public key, finger moshez@debian.org |http://www.{python,debian,gnu}.org

From greg@cosc.canterbury.ac.nz Wed Jun 20 06:55:28 2001 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Wed, 20 Jun 2001 17:55:28 +1200 Subject: [Python-Dev] Suggested amendment to PEP 255 References: Message-ID: <3B303AD0.1884E173@cosc.canterbury.ac.nz>

Tim Peters wrote: > > Who would this help? Seriously. There's nothing special about a generator > to a caller, except that it returns an object that implements the iterator > interface. What matters to the caller is irrelevant here. We're talking about what matters to someone writing or reading the implementation. To those people, there is a VERY big difference between a regular function and a generator-function -- about as big as the difference between a class and a function! In fact, a generator-function is in many ways much more like a class than a function. Calling a generator-function doesn't execute any of the code in its body; instead, it creates an instance of the generator, much like calling a class creates an instance of the class. Calling them "generator classes" and "generator instances" would perhaps be more appropriate, and more suggestive of the way they actually behave. The more I think about this, the more I agree with those who say that overloading the function-definition syntax for defining generators is a bad idea. It seems to make about as much sense as saying that there shouldn't be any special syntax for defining a class -- the header of a class definition should look exactly like a function definition, and to tell the difference you have to look for some subtle clue further down. I suggest dropping the "def" altogether and using:

    generator foo(args):
        ...
        yield x
        ...

Right from the word go, this says loudly and clearly that this thing is *not* a function, it's something else. If you haven't come across generators before, you go and look in the manual to find out what it means. There you're told something like

    Executing a generator statement creates a special callable object
    called a generator. Calling a generator creates a generator-instance,
    which is an iterator object...

    [...stuff about the "yield" statement...]
I think this is going to be easier to document and lead to much less confusion than trying to explain the magic going on when you call something that looks for all the world like a function and it doesn't execute any of the code in it. Explicit is better than implicit! -- Greg Ewing, Computer Science Dept, University of Canterbury, Christchurch, New Zealand To get my email address, please visit my web page: http://www.cosc.canterbury.ac.nz/~greg

From greg@cosc.canterbury.ac.nz Wed Jun 20 07:17:09 2001 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Wed, 20 Jun 2001 18:17:09 +1200 Subject: [Python-Dev] Re: PEP 255: Simple Generators References: Message-ID: <3B303FE5.735A5FDC@cosc.canterbury.ac.nz>

Tim Peters wrote: > > This is like saying that functions returning integers should be declared > "defint" instead, or some such gibberish. Not the same thing. If a function returns an integer, somewhere in it or in something that it calls there is a piece of code that explicitly creates an integer. But under PEP 255, there is *nothing* anywhere in the code that you can point to and say "look, here is where the generator-iterator is created!" Instead, it happens implicitly at some point just after the generator-function is called, but before any of its code is executed. You could say that the same thing is true when you call a class object -- creation of the instance happens implicitly before __init__ is called. But there is no secret made of the fact that classes are not functions, and there is nothing in the syntax to lead you to believe that they behave like functions. In contrast, the proposed generator syntax makes generators look so nearly like functions that their actual behaviour, once you get your head around it, seems quite bizarre. I just think it's going to lead to a lot of confusion and misunderstanding, among newcomers especially. -- Greg Ewing, Computer Science Dept, University of Canterbury, Christchurch, New Zealand To get my email address, please visit my web page: http://www.cosc.canterbury.ac.nz/~greg

From greg@cosc.canterbury.ac.nz Wed Jun 20 07:28:13 2001 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Wed, 20 Jun 2001 18:28:13 +1200 Subject: [Python-Dev] Re: PEP 255: Simple Generators References: <3B2EEF5A.8FF0DAFB@cosc.canterbury.ac.nz> Message-ID: <3B30427D.5A90DDE7@cosc.canterbury.ac.nz>

Olaf Delgado Friedrichs wrote:
>
> If I understand correctly, this should work:
>
> def f():
>     for i in range(5):
>         for x in g(i):
>             yield x
>
> def g(i):
>     for j in range(10):
>         yield i,j

Yes, I realised that shortly afterwards. But I think we're going to get a lot of questions from newcomers who have tried to implicitly nest iterators and are very confused about why it doesn't work and what needs to be done to make it work. An explicit generator definition syntax would help here, I think. First of all, it would be a syntax error to use "yield" outside of a generator definition, so they would be forced to declare the inner one as a generator. Then, if they neglected to make the outer one a generator too, it would look like this:

    def f():
        for i in range(5):
            g(i)

    generator g(i):
        for j in range(10):
            yield i,j

from which it is glaringly obvious that f() is NOT a generator, and therefore can't be used as one.
-- Greg Ewing, Computer Science Dept, University of Canterbury, Christchurch, New Zealand To get my email address, please visit my web page: http://www.cosc.canterbury.ac.nz/~greg

From loewis@informatik.hu-berlin.de Wed Jun 20 11:27:30 2001 From: loewis@informatik.hu-berlin.de (Martin von Loewis) Date: Wed, 20 Jun 2001 12:27:30 +0200 (MEST) Subject: [Python-Dev] Re: example of module interface to a varargs function? In-Reply-To: <15151.52137.623119.852524@beluga.mojam.com> (message from Skip Montanaro on Tue, 19 Jun 2001 17:01:13 -0500) References: <200106192125.XAA27631@pandora.informatik.hu-berlin.de> <15151.52137.623119.852524@beluga.mojam.com> Message-ID: <200106201027.MAA06782@pandora.informatik.hu-berlin.de>

> I think it would be nice if PyArg_ParseTuple and friends took a "*" format > character. It would only be useful at the end of a format string, but would > allow the generic argument parsing machinery to be used for those arguments > that precede it. Now I understand. Yes, that would be useful, but apparently it hasn't been needed often enough so far for somebody to ask for it. Regards, Martin

From aahz@rahul.net Wed Jun 20 14:00:08 2001 From: aahz@rahul.net (Aahz Maruch) Date: Wed, 20 Jun 2001 06:00:08 -0700 (PDT) Subject: [Python-Dev] Suggested amendment to PEP 255 In-Reply-To: <3B303AD0.1884E173@cosc.canterbury.ac.nz> from "Greg Ewing" at Jun 20, 2001 05:55:28 PM Message-ID: <20010620130008.7880D99C88@waltz.rahul.net>

Greg Ewing wrote: > > I suggest dropping the "def" altogether and using: > > generator foo(args): > ... > yield x > ... +2 -- --- Aahz (@pobox.com) Hugs and backrubs -- I break Rule 6 <*> http://www.rahul.net/aahz/ Androgynous poly kinky vanilla queer het Pythonista I don't really mind a person having the last whine, but I do mind someone else having the last self-righteous whine.

From nas@python.ca Wed Jun 20 15:28:20 2001 From: nas@python.ca (Neil Schemenauer) Date: Wed, 20 Jun 2001 07:28:20 -0700 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Python ceval.c,2.250,2.251 In-Reply-To: ; from tim_one@users.sourceforge.net on Tue, Jun 19, 2001 at 11:57:34PM -0700 References: Message-ID: <20010620072820.A16584@glacier.fnational.com>

Tim Peters wrote: > gen_iternext(): repair subtle refcount problem. > NeilS, please check! This came from staring at your genbug.py, but I'm > not sure it plugs all possible holes. Without this, I caught a > frameobject refcount going negative, and it was also the cause (in debug > build) of _Py_ForgetReference's attempt to forget an object with already- > NULL _ob_prev and _ob_next pointers -- although I'm still not entirely > sure how! Doesn't this cause a memory leak? f_back is INCREFed in PyFrame_New. There are other problems lurking here as well.

    def f():
        try:
            yield 1
        finally:
            print "finally"

    def h():
        g = f()
        g.next()

    while 1:
        h()

The above code leaks memory like mad, with or without your change. Also, the finally clause is never executed although it probably should be. My feeling is that the reference counting of f_back should be done by ceval and not by the frame object. The problem with the finally clause is another ball of wax. I think it's fixable though. I'll look at it closer this evening. Neil

From tim.one@home.com Wed Jun 20 15:28:19 2001 From: tim.one@home.com (Tim Peters) Date: Wed, 20 Jun 2001 10:28:19 -0400 Subject: [Python-Dev] Suggested amendment to PEP 255 In-Reply-To: <3B303AD0.1884E173@cosc.canterbury.ac.nz> Message-ID:

[Greg Ewing] > ... Why is this on Python-Dev?
The PEP announcement specifically asked for discussion to occur on the Iterators list, and specifically asked to keep it *off* of Python-Dev. I've been playing along with people who wanted to discuss it on c.l.py instead, as finite time allows, but no way does the discussion belong here.

From arigo@ulb.ac.be Wed Jun 20 15:30:49 2001 From: arigo@ulb.ac.be (Armin Rigo) Date: Wed, 20 Jun 2001 16:30:49 +0200 (MET DST) Subject: [Python-Dev] Suggested amendment to PEP 255 In-Reply-To: <3B303AD0.1884E173@cosc.canterbury.ac.nz> Message-ID:

Hi, On Wed, 20 Jun 2001, Greg Ewing wrote: > I suggest dropping the "def" altogether and using: > > generator foo(args): > ... > yield x > ... Nice idea. We might even think about dropping the 'yield' keyword altogether and using 'return' instead (although I'm not quite sure it is a good idea; I'm just suggesting it with a personal -0.5). A bientot, Armin.

From tim.one@home.com Wed Jun 20 15:41:13 2001 From: tim.one@home.com (Tim Peters) Date: Wed, 20 Jun 2001 10:41:13 -0400 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Python ceval.c,2.250,2.251 In-Reply-To: <20010620072820.A16584@glacier.fnational.com> Message-ID:

[Neil Schemenauer] > Doesn't this cause a memory leak? f_back is INCREFed in > PyFrame_New. There are other problems lurking here as well. > ... Our msgs crossed in the mail. Unfortunately, I have to get off email now and probably won't get on again before this evening. Tracebacks appear to be a potential problem too ... we'll-reinvent-stackless-before-this-is-over<0.9-wink>-ly y'rs - tim

From barry@digicool.com Wed Jun 20 17:35:49 2001 From: barry@digicool.com (Barry A. Warsaw) Date: Wed, 20 Jun 2001 12:35:49 -0400 Subject: [Python-Dev] Suggested amendment to PEP 255 References: <3B303AD0.1884E173@cosc.canterbury.ac.nz> Message-ID: <15152.53477.212348.243592@anthem.wooz.org>

>>>>> "GE" == Greg Ewing writes: GE> What matters to the caller is irrelevant here. We're talking GE> about what matters to someone writing or reading the GE> implementation. To those people, there is a VERY big GE> difference between a regular function and a GE> generator-function -- about as big as the difference GE> between a class and a function! GE> In fact, a generator-function is in many ways much more GE> like a class than a function. Calling a generator-function GE> doesn't execute any of the code in its body; instead, it GE> creates an instance of the generator, much like calling GE> a class creates an instance of the class. Calling them GE> "generator classes" and "generator instances" would GE> perhaps be more appropriate, and more suggestive of the GE> way they actually behave. Thanks Greg, I think you've captured perfectly my discomfort with the proposal. I'm fine with return being "special" inside a generator, along with most of the other details of the pep. But it bugs me that the semantics of calling the thing created by `def' is different depending on some statement embedded deep in the body of the code. Think about it from a teaching perspective: You're taught that def creates a function, perhaps called foo. You know that calling foo starts execution at the first line in the function block. You know you can put a print statement on the first line and it will print something out when the function is called. You know that you can set a debugger break point at foo's first line and when you call the function, the debugger will leave you on that first line of code. But all that changes with a generator!
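A minimal sketch of the behavior Barry is describing (using the `from __future__ import generators` form the gen-branch requires; the session is illustrative, not from the original message):

    from __future__ import generators

    def foo():
        print "first line of foo"   # a newbie expects this on call
        yield 1

    g = foo()        # nothing is printed: the body hasn't started
    print type(g)    # not an int, not None -- a generator-iterator
    print g.next()   # only now does "first line of foo" appear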
My print statement isn't executed when I call the function... how weird! Hey, the debugger doesn't even break on the line when I call the function. Okay, maybe it's some /other/ foo my program is really calling. So let's hunt around for other possible foo's that my program might be calling. Hmm, no dice there. Now I'm really confused because I haven't gotten to the chapter that says "Now that you know all about functions, forget most of that if you find a yield statement in the body of the function, because it's a special kind of function called a generator. Calling such a special function doesn't execute any code, it just instantiates a built-in object called a generator object. To get any of the generator's code to execute, you have to call the generator object's next() method." Further, I print out the type of the object returned by calling foo and I see it's a <generator object>. Okay, so now let me search foo for a return statement. Because I know about functions, and I know that the returned object isn't None, I know that the function isn't falling off the end. So there must be a return statement that explicitly returns a generator object (whatever that is). Hmm, nope, there's just a bare return sitting there. That's damn confusing. I wonder what those yield statements are doing. Well, I look those up in my book's index and I see that's described in chapter 57, which I haven't gotten to yet. Besides, those yields clearly have integers after them, so that can't be it. So how the heck do I get a generator object by calling this function??? You'll counter that the "search for yield to find out if the function is special" is a simple rule that, once learned, is easily remembered. I'll counter that it's harder for me to do an Isearch in XEmacs to find out what kind of thing foo is. :) To me, it's just bad mojo to have the behavior of the thing created by `def' determined by what's embedded in the body of the program. I don't buy the defint argument, because by searching for a return statement in the function, you can find out exactly what is being returned when the function is called. Not so with a generator. My vote is for a "generator" keyword to introduce the code block of a generator. Makes perfect sense to me, and it will be a strong indication to anybody reading my code that something special is going on. And something special /is/ going on! An informal poll of PythonLabs indicates a split on this subject, perhaps setting Jeremy up as a Sandra Day O'Connor swing vote. But who said this was a democracy anyway? :) somewhat-like-my-own-country-of-origin-ly y'rs, -Barry

From tim@digicool.com Wed Jun 20 17:42:00 2001 From: tim@digicool.com (Tim Peters) Date: Wed, 20 Jun 2001 12:42:00 -0400 Subject: [Python-Dev] Suggested amendment to PEP 255 In-Reply-To: <15152.53477.212348.243592@anthem.wooz.org> Message-ID:

Please keep this off Python-Dev. Paul Prescod has already fwd'ed Greg's msg to the Iterators list, and let's keep it there.

From fredrik@pythonware.com Wed Jun 20 17:54:22 2001 From: fredrik@pythonware.com (Fredrik Lundh) Date: Wed, 20 Jun 2001 18:54:22 +0200 Subject: [Python-Dev] Suggested amendment to PEP 255 References: <3B303AD0.1884E173@cosc.canterbury.ac.nz> <15152.53477.212348.243592@anthem.wooz.org> Message-ID: <006d01c0f9a9$a879fcd0$4ffa42d5@hagrid>

barry wrote: > My vote is for a "generator" keyword to introduce the code block of a > generator. Makes perfect sense to me, and it will be a strong > indication to anybody reading my code that something special is going > on.
> And something special /is/ going on!

agreed. +1 on generator instead of def. (and +0 on suspend instead of yield, but that's me) Cheers /F

From jeremy@alum.mit.edu Wed Jun 20 18:25:05 2001 From: jeremy@alum.mit.edu (Jeremy Hylton) Date: Wed, 20 Jun 2001 13:25:05 -0400 Subject: [Python-Dev] Suggested amendment to PEP 255 In-Reply-To: Message-ID:

Why can't we discuss Python development on python-dev? please-take-replies-to-python-dev-meta-ly y'rs, Jeremy -----Original Message----- From: python-dev-admin@python.org [mailto:python-dev-admin@python.org]On Behalf Of Tim Peters Sent: Wednesday, June 20, 2001 12:42 PM To: Barry A. Warsaw Cc: python-dev@python.org Subject: RE: [Python-Dev] Suggested amendment to PEP 255 Please keep this off Python-Dev. Paul Prescod has already fwd'ed Greg's msg to the Iterators list, and let's keep it there. _______________________________________________ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev

From tim@digicool.com Wed Jun 20 19:28:17 2001 From: tim@digicool.com (Tim Peters) Date: Wed, 20 Jun 2001 14:28:17 -0400 Subject: [Python-Dev] Suggested amendment to PEP 255 In-Reply-To: Message-ID:

[Jeremy Hylton] > Why can't we discuss Python development on python-dev? You can, but without me in this case. The arguments aren't new (they were discussed on the Iterators list before the PEP was posted), and I don't have time to repeat them on (now three) different forums. The PEP announcement clearly said discussion belonged on the Iterators list, specifically asked that it stay off of Python-Dev, and the PEP Discussion-To field (which I assume Barry filled in -- I did not) reads Discussion-To: python-iterators@lists.sourceforge.net If you want a coherent historic record (I do), that's where this belongs.

From aahz@rahul.net Wed Jun 20 19:37:49 2001 From: aahz@rahul.net (Aahz Maruch) Date: Wed, 20 Jun 2001 11:37:49 -0700 (PDT) Subject: [Python-Dev] Suggested amendment to PEP 255 In-Reply-To: from "Jeremy Hylton" at Jun 20, 2001 01:25:05 PM Message-ID: <20010620183749.B419E99C82@waltz.rahul.net>

Jeremy Hylton wrote: > > Why can't we discuss Python development on python-dev? I'm split on this issue. I understand why Tim wants to have the discussion corralled into a single place; it's also a moderate inconvenience to have to add another mailing list every time a "critical" issue comes up. I think the best compromise is to follow the rules currently in existence for the PEP process, and if one doesn't wish to subscribe to another mailing list, e-mail one's feedback to the PEP author directly and raise bloody hell if the next PEP revision doesn't include a mention of the feedback. -- --- Aahz (@pobox.com) Hugs and backrubs -- I break Rule 6 <*> http://www.rahul.net/aahz/ Androgynous poly kinky vanilla queer het Pythonista I don't really mind a person having the last whine, but I do mind someone else having the last self-righteous whine.

From barry@digicool.com Wed Jun 20 20:07:00 2001 From: barry@digicool.com (Barry A. Warsaw) Date: Wed, 20 Jun 2001 15:07:00 -0400 Subject: [Python-Dev] Suggested amendment to PEP 255 References: Message-ID: <15152.62548.504923.152041@anthem.wooz.org>

>>>>> "TP" == Tim Peters writes: TP> and the PEP Discussion-To field (which I assume Barry filled TP> in -- I did not) reads Not me. I believe it was in Magnus's original version of the PEP.
But I do think that now that the code is in the main CVS trunk, it is appropriate to remove the Discussion-To: header and redirect comments back to python-dev. That may be difficult in practice, however. -Barry

From jack@oratrix.nl Wed Jun 20 22:52:16 2001 From: jack@oratrix.nl (Jack Jansen) Date: Wed, 20 Jun 2001 23:52:16 +0200 Subject: [Python-Dev] _PyTrace_init declaration Message-ID: <20010620215221.1697FE267B@oratrix.oratrix.nl>

I'm getting "no prototype" warnings on _PyTrace_init, and inspection shows that this routine indeed doesn't show up in an include file. As it is used elsewhere (in sysmodule.c) shouldn't it be called PyTrace_init and have its prototype declared somewhere? -- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen@oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | ++++ see http://www.xs4all.nl/~tank/ ++++

From tim.one@home.com Wed Jun 20 23:31:10 2001 From: tim.one@home.com (Tim Peters) Date: Wed, 20 Jun 2001 18:31:10 -0400 Subject: [Python-Dev] _PyTrace_init declaration In-Reply-To: <20010620215221.1697FE267B@oratrix.oratrix.nl> Message-ID:

[Jack Jansen] > I'm getting "no prototype" warnings on _PyTrace_init, and inspection > shows that this routine indeed doesn't show up in an include file. As > it is used elsewhere (in sysmodule.c) shouldn't it be called > PyTrace_init and have its prototype declared somewhere? It should indeed be declared in ceval.h (Fred?), but so long as it's part of the private API it should not lose the leading underscore.

From thomas@xs4all.net Wed Jun 20 23:29:51 2001 From: thomas@xs4all.net (Thomas Wouters) Date: Thu, 21 Jun 2001 00:29:51 +0200 Subject: [Python-Dev] _PyTrace_init declaration In-Reply-To: <20010620215221.1697FE267B@oratrix.oratrix.nl> References: <20010620215221.1697FE267B@oratrix.oratrix.nl> Message-ID: <20010621002951.H8098@xs4all.nl>

On Wed, Jun 20, 2001 at 11:52:16PM +0200, Jack Jansen wrote: > I'm getting "no prototype" warnings on _PyTrace_init, and inspection > shows that this routine indeed doesn't show up in an include file. As > it is used elsewhere (in sysmodule.c) shouldn't it be called > PyTrace_init and have its prototype declared somewhere? No, and yes. The _Py* functions are internal, but non-static (used in other files). They should have a prototype declared somewhere, but they shouldn't be used outside of Python itself. It shouldn't be named 'PyTrace_init' unless it is a supported part of the API. -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread!

From greg@cosc.canterbury.ac.nz Thu Jun 21 00:39:17 2001 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Thu, 21 Jun 2001 11:39:17 +1200 (NZST) Subject: [Python-Dev] Suggested amendment to PEP 255 In-Reply-To: Message-ID: <200106202339.LAA04351@s454.cosc.canterbury.ac.nz>

> The PEP announcement specifically asked for > discussion to occur on the Iterators list Sorry, I missed that - I was paying more attention to the PEP itself than what the announcement said. Going now to subscribe to the iterators list forthwith. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc.
| greg@cosc.canterbury.ac.nz +--------------------------------------+

From jeremy@alum.mit.edu Thu Jun 21 00:47:28 2001 From: jeremy@alum.mit.edu (Jeremy Hylton) Date: Wed, 20 Jun 2001 19:47:28 -0400 Subject: [Python-Dev] Suggested amendment to PEP 255 In-Reply-To: <15152.53477.212348.243592@anthem.wooz.org> Message-ID:

> My vote is for a "generator" keyword to introduce the code block of a > generator. Makes perfect sense to me, and it will be a strong > indication to anybody reading my code that something special is going > on. And something special /is/ going on! > > An informal poll of PythonLabs indicates a split on this subject, > perhaps setting Jeremy up as a Sandra Day O'Conner swing vote. But > who said this was a democracy anyway? :) > > somewhat-like-my-own-country-of-origin-ly y'rs, > -Barry That's a nice analogy, Ruth Barry Ginsburg; a Supreme Court, which appoints the president, seems a closer fit to Python's dictatorship than some sort of democratic process. I wasn't present for the oral arguments, but I'm sure we all know how Tim Scalia voted and that Guido van Clarence Thomas agreed without comment. I assume, then, that Anthony Kennedy Jr. joined you, although he's often a swing vote, too. Can't wait to hear the report from Nina "Michael Hudson" Totenberg. I was originally happy with the use of def. It's not much of a stretch since the def statement defines a code block that has formal parameters and creates a new scope. I certainly wouldn't be upset if Python ended up using def to define a generator. I appreciate, though, that the definition of a generator may look an awful lot like a function. I can imagine a user reading a module, missing the yield statement, and trying to use the generator as a function. I can't imagine this would happen often. My limited experience with CLU suggests that iterators aren't going to be huge, unwieldy blocks where it's hard to see what the ultimate control flow is. If a confused user treats a generator as a regular function, he or she certainly can't expect it to return anything useful, since all the return statements are bare returns; the expected behavior would be some side-effect on global state, which seems both unlikely and unseemly for an iterator. I'm not sure how hard it will be to explain generators to new users. I expect you would teach functions and iteration via for loops, then explain that there is a special kind of function called a generator that can be used in a for loop. It uses a yield statement instead of a return statement to return values. Not all that hard. If we use a different keyword to introduce them, you'd probably explain them much the same way: A generator is a special kind of function that can be used in a for loop and is defined with generator instead of def. As other people have mentioned, Icon doesn't use special syntax to introduce generators. We might as well look at CLU, too, which took a different approach. You can view the CLU Reference Manual at: http://ncstrl.mit.edu/Dienst/UI/2.0/Describe/ncstrl.mit_lcs%2fMIT%2fLCS%2fTR-225 It uses "proc" to introduce a procedure and "iter" to introduce an iterator. See page 72 for the details: http://ncstrl.mit.edu/Dienst/UI/2.0/Page/ncstrl.mit_lcs%2fMIT%2fLCS%2fTR-225/72 It's a toss-up, then, between the historical antecedents Icon and CLU. I'd tend to favor a new keyword for generators, but could be talked out of that position. Jeremy

From fdrake@acm.org Thu Jun 21 00:57:57 2001 From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Wed, 20 Jun 2001 19:57:57 -0400 (EDT) Subject: [Python-Dev] _PyTrace_init declaration In-Reply-To: <20010620215221.1697FE267B@oratrix.oratrix.nl> References: <20010620215221.1697FE267B@oratrix.oratrix.nl> Message-ID: <15153.14469.903865.533713@cj42289-a.reston1.va.home.com>

Jack Jansen writes: > I'm getting "no prototype" warnings on _PyTrace_init, and inspection > shows that this routine indeed doesn't show up in an include file. As > it is used elsewhere (in sysmodule.c) shouldn't it be called > PyTrace_init and have its prototype declared somewhere? No. I thought I had a prototype for it just above the usage. Anyway, I'm re-working that code this week, so you can assign this to me in the bug tracker. Thanks! -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations

From guido@digicool.com Thu Jun 21 15:32:40 2001 From: guido@digicool.com (Guido van Rossum) Date: Thu, 21 Jun 2001 10:32:40 -0400 Subject: [Python-Dev] PEP 255 - BDFL Pronouncement: 'def' it stays Message-ID: <200106211432.f5LEWeA03163@odiug.digicool.com>

I've thought long and hard and tried to read almost all the mail on this topic, and I cannot get myself to change my mind. No argument on either side is totally convincing, so I have consulted my language designer's intuition. It tells me that the syntax proposed in the PEP is exactly right - not too hot, not too cold. But, like the Oracle at Delphi in Greek mythology, it doesn't tell me why, so I don't have a rebuttal for the arguments against the PEP syntax. The best I can come up with (apart from agreeing with the rebuttals that Tim and others have already made) is "FUD". If this had been part of the language from day one, I very much doubt it would have made Andrew Kuchling's "Python Warts" page. So I propose that Tim and others defending 'def' save their remaining breath, and I propose that Paul and others in favor of 'gen[erator]' start diverting their energy towards thinking about how to best teach generators given the PEP syntax. Tim, please add a BDFL pronouncement to the PEP to end the argument. You can also summarize the arguments on either side, for posterity -- without trying to counter them. I found one useful comment on the PEP that isn't addressed and is orthogonal to the whole discussion: try/finally. When you have a try/finally around a yield statement, it is possible that the finally clause is not executed at all when the iterator is never resumed. I find this disturbing, and am tempted to propose that yield inside try/finally be disallowed (but yield inside try/except is still allowed). Another idea might be to somehow continue the frame with an exception at this point -- but I don't have a clue what exception would be appropriate (StopIteration isn't because it goes in the other direction) and I don't know what to do if the generator catches exception and tries to yield again (maybe the exception should be raised again?). The continued execution of the frame would be part of the destructor for the generator-iterator object, so, like a __del__ method, any unhandled exceptions wouldn't be able to propagate out of it. PS I lost my personal archive of the last 18 hours of the iter mailing list, and the web archive is down, alas, so I'm writing this from memory. I *did* read most of the messages in my archive before I accidentally deleted it, though.
;-) --Guido van Rossum (home page: http://www.python.org/~guido/)

From tdickenson@geminidataloggers.com Thu Jun 21 16:02:54 2001 From: tdickenson@geminidataloggers.com (Toby Dickenson) Date: Thu, 21 Jun 2001 16:02:54 +0100 Subject: [Python-Dev] Re: [Python-iterators] PEP 255 - BDFL Pronouncement: 'def' it stays In-Reply-To: <200106211432.f5LEWeA03163@odiug.digicool.com> References: <200106211432.f5LEWeA03163@odiug.digicool.com> Message-ID:

On Thu, 21 Jun 2001 10:32:40 -0400, Guido van Rossum wrote: > Another idea might be to somehow continue the frame with an > exception at this point -- but I don't have a clue what exception > would be appropriate (StopIteration isn't because it goes in the other > direction) I'm sure any exception is appropriate there. What about restarting the frame as if the 'yield' had been followed by a 'return'? Toby Dickenson tdickenson@geminidataloggers.com

From mwh@python.net Fri Jun 22 00:20:17 2001 From: mwh@python.net (Michael Hudson) Date: Fri, 22 Jun 2001 00:20:17 +0100 (BST) Subject: [Python-Dev] python-dev summary 2001-06-07 - 2001-06-21 Message-ID:

This is a summary of traffic on the python-dev mailing list between June 7 and June 21 (inclusive) 2001. It is intended to inform the wider Python community of ongoing developments. To comment, just post to python-list@python.org or comp.lang.python in the usual way. Give your posting a meaningful subject line, and if it's about a PEP, include the PEP number (e.g. Subject: PEP 201 - Lockstep iteration) All python-dev members are interested in seeing ideas discussed by the community, so don't hesitate to take a stance on a PEP if you have an opinion. This is the tenth summary written by Michael Hudson. Summaries are archived at:

Posting distribution (with apologies to mbm)

Number of articles in summary: 192

[ASCII bar chart of daily posting volume omitted; the daily counts were:]

     0 +-019-014-001-003-014-039-026-013-009-004-001-005-023-021
        Thu 07| Sat 09| Mon 11| Wed 13| Fri 15| Sun 17| Tue 19|
           Fri 08  Sun 10  Tue 12  Thu 14  Sat 16  Mon 18  Wed 20

Quiet fortnight.

* Adding .decode() method to Unicode * Marc-Andre Lemburg asked for opinions on adding a .decode method to unicode objects: He certainly got them; the responses ranged from neutral to negative, and there was a surprising amount of hostility in the air. The problem (as ever in these matters) seems to be that Python currently uses the same type for 8-bit strings and gobs of arbitrary data. Guido came to the rescue and calmed everyone down: since when discussion has vanished again.

* Adding Asian codecs to the core * Marc-Andre Lemburg announced that Tamito KAJIYAMA has decided to relicense his Japanese codecs with a BSD-style license, enabling them to be included in the core: This is clearly a good thing; the only quibble is that the encodings are by their nature rather large, so they will probably go into a separate directory in CVS (probably python/dist/encodings/) and not go into the source tarball released on python.org.

* Omit printing newline after newline * As readers of comp.lang.python will have noticed, Guido posted: and retracted: PEP 259, a proposal for changing the behaviour of the print statement.
* sre "improvements" * Gustavo Niemeyer asked if anyone planned to add the "(?(1)blah)" re operators to Python: but Python is not perl and there wasn't much support for making regular expressions more baffling than they already are. * Generators * In a discussion that slobbered across comp.lang.python, python-dev and the python-iterators list at sf (and belongs on the latter!) there was much talk of PEP 255, Simple Generators. Most was positive; the main dissent was from people that thought it was too hard to tell a generator from a regular function (at the source level). However Guido listened to Tim's repeated claims that this is insignificant once you've actually used generators once or twice and Pronounced "'def' it is": and noticed that there are still some issues wrt try/finally blocks. However, clever people seem to be thinking about it, so I'm sure the problem's days are numbered :-) I should also note that the gen-branch has been checked into the trunk of CVS. Woohoo! Cheers, M. From arigo@ulb.ac.be Fri Jun 22 12:00:34 2001 From: arigo@ulb.ac.be (Armin Rigo) Date: Fri, 22 Jun 2001 13:00:34 +0200 (MET DST) Subject: [Python-Dev] Python Specializing Compiler Message-ID: Hello everybody, I implemented a proof-of-concept version of a "Python compiler". It is not really a compiler. I know perfectly well that you cannot compile Python into something more efficient than a bunch of calls to PyObject_xxx. Still, this very preliminary version runs the following function twice as fast as the python interpreter: def f(n): result = 0 i = 0 while i; from arigo@ulb.ac.be on Fri, Jun 22, 2001 at 01:00:34PM +0200 References: Message-ID: <20010622071846.A7014@craie.housenet> On Fri, Jun 22, 2001 at 01:00:34PM +0200, Armin Rigo wrote: > Hello everybody, > > I implemented a proof-of-concept version of a "Python compiler". It is not > really a compiler. I know perfectly well that you cannot compile Python > into something more efficient than a bunch of calls to PyObject_xxx. > Still, this very preliminary version runs the following function twice as > fast as the python interpreter: I've implemented something similar, but didn't get such favorable results yet. I was concentrating more on implementing a type system and code to infer type information, and had spent less time on the code generation. (For instance, my system could determine the result type of subscript-type operations, and infer the types of lists over a loop, as in: l1 = [1,3.14159, "tubers"] l2 = [0]*3 for j in range(3): l2[j] = l1[j-3] # Type of l2 is HeterogeneousListType([IntType, FloatType, # StringType]) You could make it run forever on a pathological case like l = [] while 1: l = [l] with the fix being to "give up" after some number of iterations, and declare the unstable object (l) as having type "ObjectType", which is always correct but overbroad. My code is still available, but my motivation has faded somewhat and I haven't had the time to work on it recently in any case. It uses "GNU Lightning" for JIT code generation, rather than using an external compiler. 
(If I were to approach the problem again, I might discard the JIT code generator in favor of starting over again with the python2c compiler and adding type information) It can make judgements about sequences of calls, such as

    def f():
        return g()

when g is given the "solid" attribute, and the compilation process begins by hoisting the former global load of g into a constant load, something like

    def make_f():
        local_g = g
        def f():
            return local_g()
        return f
    f = make_f()

What are you using to generate code? How would you compare the sophistication of your type inference system to the one I've outlined above? Jeff

From Greg.Wilson@baltimore.com Fri Jun 22 13:34:17 2001 From: Greg.Wilson@baltimore.com (Greg Wilson) Date: Fri, 22 Jun 2001 08:34:17 -0400 Subject: [Python-Dev] ...und zen, ze world! Message-ID: <930BBCA4CEBBD411BE6500508BB3328F2E26E8@nsamcanms1.ca.baltimore.com>

From David Wheeler's paper estimating the size of stuff in the Red Hat 7.1 distribution: http://www.dwheeler.com/sloc/redhat71-v1/redhat71sloc.html

    Language                 SLOC      (%)
    C                    21461450  (71.18%)
    C++                   4575907  (15.18%)
    Shell (Bourne-like)    793238  ( 2.63%)
    Lisp                   722430  ( 2.40%)
    Assembly               565536  ( 1.88%)
    Perl                   562900  ( 1.87%)
    Fortran                493297  ( 1.64%)
    Python                 285050  ( 0.95%)
    Tcl                    213014  ( 0.71%)
    Java                   147285  ( 0.49%)
    yacc/bison             122325  ( 0.41%)
    Expect                 103701  ( 0.34%)
    lex/flex                41967  ( 0.14%)
    awk/gawk                17431  ( 0.06%)
    Objective-C             14645  ( 0.05%)
    Ada                     13200  ( 0.04%)
    C shell                 10753  ( 0.04%)
    Pascal                   4045  ( 0.01%)
    sed                      3940  ( 0.01%)

Interesting that there's as much Perl as assembly code, and more Fortran than Python :-). Thanks, Greg

From Samuele Pedroni Fri Jun 22 13:59:40 2001 From: Samuele Pedroni (Samuele Pedroni) Date: Fri, 22 Jun 2001 14:59:40 +0200 (MET DST) Subject: [Python-Dev] Python Specializing Compiler Message-ID: <200106221259.OAA02519@core.inf.ethz.ch>

Hi. Just after reading the README, it's very intriguing and interesting (if I remember well, this resembles the customization approach of the Self VM compiler). Ideally it could evolve into a loadable extension that then works together with the normal interp (unchanged up to offering some hooks*) in a transparent way for the user ... emitting native code for the major platforms or just specialized bytecodes. I will take a serious look at it. regards, Samuele Pedroni.

*: some possible useful hooks would be:
- minimal profiling support in order to specialize only things called often
- feedback for dynamic changing of methods, class hierarchy, ...
  if we want to optimize method lookup (which would make sense)
- a mixed fixed slots/dict layout for instances.

From nas@python.ca Fri Jun 22 15:43:17 2001 From: nas@python.ca (Neil Schemenauer) Date: Fri, 22 Jun 2001 07:43:17 -0700 Subject: [Python-Dev] why not "return StopIteration"? Message-ID: <20010622074317.A22058@glacier.fnational.com>

Is "raise StopIteration" an abuse of exceptions? Why can we not use "return StopIteration" to signal the end of an iterator? I've done a bit of hacking and the idea seems to work. One possible problem is that the StopIteration object in the builtin module could cause some confusing behavior. For example the code:

    for obj in __builtin__.__dict__.values():
        print obj

would not work as expected. This could be fixed in most cases by changing the tp_iternext protocol. Something like:

    int tp_iternext(PyObject *it, PyObject **item)

where the return value is 1, 0, or -1. IOW, StopIteration would not have to come into the protocol if the object implemented tp_iternext. Neil

From guido@digicool.com Fri Jun 22 17:19:34 2001 From: guido@digicool.com (Guido van Rossum) Date: Fri, 22 Jun 2001 12:19:34 -0400 Subject: [Python-Dev] why not "return StopIteration"? Message-ID: <200106221619.f5MGJY306866@odiug.digicool.com>

This is treated extensively in the discussion section of the iterators-PEP; quoting:

- It has been questioned whether an exception to signal the end of the iteration isn't too expensive. Several alternatives for the StopIteration exception have been proposed: a special value End to signal the end, a function end() to test whether the iterator is finished, even reusing the IndexError exception.

- A special value has the problem that if a sequence ever contains that special value, a loop over that sequence will end prematurely without any warning. If the experience with null-terminated C strings hasn't taught us the problems this can cause, imagine the trouble a Python introspection tool would have iterating over a list of all built-in names, assuming that the special End value was a built-in name!

- Calling an end() function would require two calls per iteration. Two calls is much more expensive than one call plus a test for an exception. Especially the time-critical for loop can test very cheaply for an exception.

- Reusing IndexError can cause confusion because it can be a genuine error, which would be masked by ending the loop prematurely.

I'm not sure why you are reopening this -- special terminating values are evil IMO. :-) --Guido van Rossum (home page: http://www.python.org/~guido/)

From guido@digicool.com Fri Jun 22 17:20:43 2001 From: guido@digicool.com (Guido van Rossum) Date: Fri, 22 Jun 2001 12:20:43 -0400 Subject: [Python-Dev] Python Specializing Compiler Message-ID: <200106221620.f5MGKib06875@odiug.digicool.com>

Very cool, Armin! Did you announce this on c.l.py too? I wish I had time to look at this in more detail -- but please do go on developing it, and look at what others have tried... --Guido van Rossum (home page: http://www.python.org/~guido/)

From barry@digicool.com Fri Jun 22 17:30:44 2001 From: barry@digicool.com (Barry A. Warsaw) Date: Fri, 22 Jun 2001 12:30:44 -0400 Subject: [Python-Dev] why not "return StopIteration"? References: <200106221619.f5MGJY306866@odiug.digicool.com> Message-ID: <15155.29364.416545.301534@anthem.wooz.org>

>>>>> "GvR" == Guido van Rossum writes: | - Calling an end() function would require two calls per | iteration. Two calls is much more expensive than one call | plus a test for an exception.
From guido@digicool.com  Fri Jun 22 17:20:43 2001
From: guido@digicool.com (Guido van Rossum)
Date: Fri, 22 Jun 2001 12:20:43 -0400
Subject: [Python-Dev] Python Specializing Compiler
Message-ID: <200106221620.f5MGKib06875@odiug.digicool.com>

Very cool, Armin!  Did you announce this on c.l.py too?  I wish I had
time to look at this in more detail -- but please do go on developing
it, and look at what others have tried...

--Guido van Rossum (home page: http://www.python.org/~guido/)

From barry@digicool.com  Fri Jun 22 17:30:44 2001
From: barry@digicool.com (Barry A. Warsaw)
Date: Fri, 22 Jun 2001 12:30:44 -0400
Subject: [Python-Dev] why not "return StopIteration"?
References: <200106221619.f5MGJY306866@odiug.digicool.com>
Message-ID: <15155.29364.416545.301534@anthem.wooz.org>

>>>>> "GvR" == Guido van Rossum writes:

    | - Calling an end() function would require two calls per
    | iteration.  Two calls is much more expensive than one call
    | plus a test for an exception.  Especially the time-critical
    | for loop can test very cheaply for an exception.

Plus, if the exception is both raised and caught in C, it is never
instantiated, so exception matching is a pointer compare.  I know this
isn't the case with user-defined iterators (since Python's raise
semantics is to instantiate the exception), but it helps.

-Barry

From guido@digicool.com  Fri Jun 22 18:12:20 2001
From: guido@digicool.com (Guido van Rossum)
Date: Fri, 22 Jun 2001 13:12:20 -0400
Subject: [Python-Dev] Python 2.0.1 released!
Message-ID: <200106221712.f5MHCLF07192@odiug.digicool.com>

I'm happy to announce Python 2.0.1 -- the final release of the first
Python version in a long time whose license is fully compatible with the
GPL:

    http://www.python.org/2.0.1/

I thank Moshe Zadka who did almost all of the work to make this a useful
bugfix release, and then went incommunicado for several weeks.  (I hope
you're OK, Moshe!)

Compared to the release candidate, we've fixed a few typos in the
license, tweaked the documentation a bit, and fixed an indentation error
in statcache.py; other than that, the release candidate was perfect. :-)

Python 2.0 users should be able to replace their 2.0 installation with
the 2.0.1 release without any ill effects; apart from the license
change, we've only fixed bugs that didn't require us to make feature
changes.  The SRE package (regular expression matching, used by the "re"
module) was brought in line with the version distributed with Python
2.1; this is stable feature-wise but much improved bug-wise.

For the full scoop, see the release notes on SourceForge:

    http://sourceforge.net/project/shownotes.php?release_id=40616

Python 2.1 users can ignore this release, unless they have an urgent
need for a GPL-compatible Python version and are willing to downgrade.
Rest assured that we're planning a bugfix release there too: I expect
that Python 2.1.1 will be released within a month, with the same
GPL-compatible license.  (Right, Thomas?)

We don't intend to build RPMs for 2.0.1.  If someone else is interested
in doing so, we can link to them.

--Guido van Rossum (home page: http://www.python.org/~guido/)

From tim.one@home.com  Fri Jun 22 18:21:03 2001
From: tim.one@home.com (Tim Peters)
Date: Fri, 22 Jun 2001 13:21:03 -0400
Subject: [Python-Dev] why not "return StopIteration"?
In-Reply-To: <20010622074317.A22058@glacier.fnational.com>
Message-ID: 

[Neil Schemenauer]
> Is "raise StopIteration" an abuse of exceptions?

I only care whether it works .  It certainly came as a surprise to me,
though, that I'm going to need to fiddle PEP 255 to explain that return
in a generator isn't really equivalent to raise StopIteration (because a
return in the try-part of a try/except should not trigger the
except-part if the generator is pumped again).  While a minor wart, it's
a wart.

If this stands, I'm going to look into changing gen_iternext() to
determine whether eval_frame() finished by raising StopIteration, and
mark the iterator as done if so.  That is, force "return" and "raise
StopIteration" to act the same inside generators, and to force "raise
StopIteration" inside a generator to truly *mean* "I'm done" in all
cases.  This would also allow us to avoid the proposed special-casing of
generators at the tail end of eval_frame() (yes, I'm anal <0.9 wink>:
since it's a problem unique to generators, this simply should not be
eval_frame's problem to solve -- if generators create the problem,
generators should pay to solve it).

> Why can we not use "return StopIteration" to signal the end of an
> iterator?
Just explained why not yesterday, and you did two sentences later .

> ....
> This could be fixed in most cases by changing the tp_iternext
> protocol.  Something like:
>
>     int tp_iternext(PyObject *it, PyObject **item)
>
> where the return value is 1, 0, or -1.

Meaning 13, 42, and 666 respectively ?  That is, one for "error", one
for "OK, and item is the next value", and one for "no error but no next
value either -- this iterator terminated normally"?  That could work.
At one point during the development of the iterator PEP, Guido had some
code like that in the internals, on *top* of the exception business.  It
was clumsy then because redundant.

At the level of Python code, how would a user spell "end of iteration"?
Would iterators need to return a 2-tuple in all non-exception cases
then, e.g. a (next_value, i_am_done_flag) pair?  Or would Python-level
iterators simply be unable to return StopIteration as a normal value?

> IOW, StopIteration would not have to come into the protocol if the
> object implemented tp_iternext.

All iterable objects in 2.2 implement tp_iternext, although sometimes
it's a Miranda tp_iternext (i.e., one created for an object that doesn't
supply its own), so that shouldn't be a worry.

All in all, I'm -0 on changing the exception approach -- it's worked
very well so far.
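For concreteness, one hypothetical spelling of the flag-based idea at
the Python level -- a sketch with an invented method name, not anything
actually proposed in the PEP:

    class Countdown:
        # Toy iterator using an invented (done, value) pair protocol.
        def __init__(self, n):
            self.n = n
        def next2(self):            # hypothetical name, not .next()
            if self.n <= 0:
                return (1, None)    # done; no value
            self.n = self.n - 1
            return (0, self.n + 1)  # not done; here is the next value

    it = Countdown(3)
    while 1:
        done, value = it.next2()
        if done:
            break
        print value                 # prints 3, then 2, then 1

StopIteration could then be returned as an ordinary value without
ambiguity, but every client pays for tuple packing, unpacking and a flag
test on every pass -- exactly the per-iteration overhead the
exception-based protocol avoids in the common case.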
From thomas@xs4all.net  Fri Jun 22 19:02:59 2001
From: thomas@xs4all.net (Thomas Wouters)
Date: Fri, 22 Jun 2001 20:02:59 +0200
Subject: [Python-Dev] why not "return StopIteration"?
In-Reply-To: 
References: 
Message-ID: <20010622200259.N8098@xs4all.nl>

On Fri, Jun 22, 2001 at 01:21:03PM -0400, Tim Peters wrote:

> If this stands, I'm going to look into changing gen_iternext() to
> determine whether eval_frame() finished by raising StopIteration, and
> mark the iterator as done if so.  That is, force "return" and "raise
> StopIteration" to act the same inside generators, and to force "raise
> StopIteration" inside a generator to truly *mean* "I'm done" in all
> cases.  This would also allow us to avoid the proposed special-casing
> of generators at the tail end of eval_frame() (yes, I'm anal <0.9
> wink>: since it's a problem unique to generators, this simply should
> not be eval_frame's problem to solve -- if generators create the
> problem, generators should pay to solve it).

I don't get this.  Currently, (unless Just checked in his patch)
generators work in exactly that way: the compiler compiles 'return' into
'raise StopIteration' if it encounters it inside a generator, and into a
regular return otherwise.  Why would you ask for the patch Just
provided, and then change it back ?

--
Thomas Wouters 

Hi! I'm a .signature virus! copy me into your .signature file to help me
spread!

From tim.one@home.com  Fri Jun 22 19:11:13 2001
From: tim.one@home.com (Tim Peters)
Date: Fri, 22 Jun 2001 14:11:13 -0400
Subject: [Python-Dev] why not "return StopIteration"?
In-Reply-To: <20010622200259.N8098@xs4all.nl>
Message-ID: 

[Thomas Wouters]
> I don't get this.  Currently, (unless Just checked in his patch)
> generators work in exactly that way: the compiler compiles 'return'
> into 'raise StopIteration' if it encounters it inside a generator,
> and into a regular return otherwise.

Yes.  The part about analyzing the return value inside gen_iternext()
would be the only change from the status quo.

> Why would you ask for the patch Just provided, and then change it back ?

I wouldn't.  I asked *you* for a patch (which I haven't yet applied, but
will) in a different area, but Just's patch was his own initiative.
I hesitated on that one for reasons beyond just lack of time to get to
it, and I'm still reluctant to accept it.  My msg sketched an
alternative to that patch.  Note that Just has also (very recently)
sketched another alternative, but on the Iterators list instead.

just-isn't-in-need-of-defense-because-he-isn't-being-abused-ly y'rs
- tim

From fdrake@beowolf.digicool.com  Fri Jun 22 19:31:44 2001
From: fdrake@beowolf.digicool.com (Fred Drake)
Date: Fri, 22 Jun 2001 14:31:44 -0400 (EDT)
Subject: [Python-Dev] [maintenance doc updates]
Message-ID: <20010622183144.C6A5428927@beowolf.digicool.com>

The development version of the documentation has been updated:

    http://python.sourceforge.net/maint-docs/

Lots of smallish updates and corrections, moved the license statements
to an appendix.

From paulp@ActiveState.com  Fri Jun 22 19:37:01 2001
From: paulp@ActiveState.com (Paul Prescod)
Date: Fri, 22 Jun 2001 11:37:01 -0700
Subject: [Python-Dev] ...und zen, ze world!
References: <930BBCA4CEBBD411BE6500508BB3328F2E26E8@nsamcanms1.ca.baltimore.com>
Message-ID: <3B33904D.F821FE36@ActiveState.com>

> Interesting that there's as much Perl as assembly code,
> and more Fortran than Python :-).

The Fortran is basically one big package: LAPACK.  A bunch of the Python
is 4Suite.  If we got Red Hat to ship Zope (or even Python 2.1!) we'd
improve our numbers quite a bit. :)

--
Take a recipe.  Leave a recipe.
Python Cookbook!  http://www.ActiveState.com/pythoncookbook

From esr@thyrsus.com  Fri Jun 22 19:46:11 2001
From: esr@thyrsus.com (Eric S. Raymond)
Date: Fri, 22 Jun 2001 14:46:11 -0400
Subject: [Python-Dev] ...und zen, ze world!
In-Reply-To: <3B33904D.F821FE36@ActiveState.com>; from paulp@ActiveState.com on Fri, Jun 22, 2001 at 11:37:01AM -0700
References: <930BBCA4CEBBD411BE6500508BB3328F2E26E8@nsamcanms1.ca.baltimore.com> <3B33904D.F821FE36@ActiveState.com>
Message-ID: <20010622144611.A15388@thyrsus.com>

Paul Prescod :
> > Interesting that there's as much Perl as assembly code,
> > and more Fortran than Python :-).
>
> The Fortran is basically one big package: LAPACK.  A bunch of the
> Python is 4Suite.  If we got Red Hat to ship Zope (or even Python
> 2.1!) we'd improve our numbers quite a bit. :)

I'm working on it.
--
		Eric S. Raymond

The whole of the Bill [of Rights] is a declaration of the right of the
people at large or considered as individuals...  It establishes some
rights of the individual as unalienable and which consequently, no
majority has a right to deprive them of.
	-- Albert Gallatin, Oct 7 1789

From fdrake@beowolf.digicool.com  Fri Jun 22 19:53:37 2001
From: fdrake@beowolf.digicool.com (Fred Drake)
Date: Fri, 22 Jun 2001 14:53:37 -0400 (EDT)
Subject: [Python-Dev] [development doc updates]
Message-ID: <20010622185337.BE51228927@beowolf.digicool.com>

The development version of the documentation has been updated:

    http://python.sourceforge.net/devel-docs/

Lots of smallish updates and corrections, moved the license statements
to an appendix.

This version includes some contributed changes to the documentation for
the cmath module.  To make the LaTeX to HTML conversion work, I have
made the resulting HTML contain entity references for the "plus/minus"
and "infinity" symbols (± and ∞, respectively).  These may be
problematic for some browsers.  Please let me know how it looks on your
browser by sending an email to python-docs@python.org.  Be sure to state
your browser name and version, and what operating system you are using.
Thanks!
    http://python.sourceforge.net/devel-docs/lib/module-cmath.html

From nas@python.ca  Fri Jun 22 21:13:14 2001
From: nas@python.ca (Neil Schemenauer)
Date: Fri, 22 Jun 2001 13:13:14 -0700
Subject: [Python-Dev] why not "return StopIteration"?
In-Reply-To: <200106221619.f5MGJY306866@odiug.digicool.com>; from guido@digicool.com on Fri, Jun 22, 2001 at 12:19:34PM -0400
References: <200106221619.f5MGJY306866@odiug.digicool.com>
Message-ID: <20010622131314.A22978@glacier.fnational.com>

Guido van Rossum wrote:
> This is treated extensively in the discussion section of the
> iterators-PEP

Ah.  I don't remember reading that part or seeing the discussion.
Sorry I brought it up.

Neil

From fdrake@beowolf.digicool.com  Fri Jun 22 21:52:48 2001
From: fdrake@beowolf.digicool.com (Fred Drake)
Date: Fri, 22 Jun 2001 16:52:48 -0400 (EDT)
Subject: [Python-Dev] [development doc updates]
Message-ID: <20010622205248.6290128927@beowolf.digicool.com>

The development version of the documentation has been updated:

    http://python.sourceforge.net/devel-docs/

Changed the revised cmath documentation to use "j" as a suffix for
complex literals instead of using "i" as a prefix; this is more similar
to Python.  Changed the font of the suffix to match that used elsewhere
in the documentation.

This should be a little more readable, but does not change any potential
browser compatibility issues, so I still need reports of compatibility
or non-compatibility.  See my preliminary report on the topic at:

    http://mail.python.org/pipermail/doc-sig/2001-June/001940.html

From arigo@ulb.ac.be  Sat Jun 23 09:13:04 2001
From: arigo@ulb.ac.be (Armin Rigo)
Date: Sat, 23 Jun 2001 10:13:04 +0200 (MET DST)
Subject: [Python-Dev] Python Specializing Compiler
In-Reply-To: <20010622071846.A7014@craie.housenet>
Message-ID: 

Hello Jeff,

On Fri, 22 Jun 2001, Jeff Epler wrote:
> What are you using to generate code?

I am generating pseudo-code, which is interpreted by a C module.  (With
real assembler code, it would of course be much faster, but it was just
simpler for the moment.)

> How would you compare the
> sophistication of your type inference system to the one I've outlined
> above?

Yours is much more complete, but runs statically.  Mine works at
run-time.  As explained in detail in the readme file, my plan is not to
make a "compiler" in the usual sense.  I actually have no type
inferences; I just collect at run time what types are used at what
places, and generate (and possibly modify) the generated code according
to that information.  (More about it later.)

A bientot,

Armin.

From tim.one@home.com  Sat Jun 23 10:17:54 2001
From: tim.one@home.com (Tim Peters)
Date: Sat, 23 Jun 2001 05:17:54 -0400
Subject: [Python-Dev] PEP 255: Simple Generators, Revised Posting
In-Reply-To: 
Message-ID: 

Major revision: more details about exceptions, return vs StopIteration,
and interactions with try/except/finally; more Q&A; and a BDFL
Pronouncement.  The reference implementation appears solid and works as
described here in all respects, so I expect this will be the last major
revision (and so also last full posting) of this PEP.

The output below is in ndiff format (see Tools/scripts/ndiff.py in your
Python distribution).  Just the new text can be seen in HTML form here:

    http://python.sf.net/peps/pep-0255.html

"Feature discussions" should take place primarily on the Python
Iterators list:

    mailto:python-iterators@lists.sourceforge.net

Implementation discussions may wander in and out of Python-Dev too.

PEP: 255 Title: Simple Generators - Version: $Revision: 1.3 $ ?
^ + Version: $Revision: 1.12 $ ? ^^ Author: nas@python.ca (Neil Schemenauer), tim.one@home.com (Tim Peters), magnus@hetland.org (Magnus Lie Hetland) Discussion-To: python-iterators@lists.sourceforge.net Status: Draft Type: Standards Track Requires: 234 Created: 18-May-2001 Python-Version: 2.2 - Post-History: 14-Jun-2001 + Post-History: 14-Jun-2001, 23-Jun-2001 ? +++++++++++++ Abstract This PEP introduces the concept of generators to Python, as well as a new statement used in conjunction with them, the "yield" statement. Motivation When a producer function has a hard enough job that it requires maintaining state between values produced, most programming languages offer no pleasant and efficient solution beyond adding a callback function to the producer's argument list, to be called with each value produced. For example, tokenize.py in the standard library takes this approach: the caller must pass a "tokeneater" function to tokenize(), called whenever tokenize() finds the next token. This allows tokenize to be coded in a natural way, but programs calling tokenize are typically convoluted by the need to remember between callbacks which token(s) were seen last. The tokeneater function in tabnanny.py is a good example of that, maintaining a state machine in global variables, to remember across callbacks what it has already seen and what it hopes to see next. This was difficult to get working correctly, and is still difficult for people to understand. Unfortunately, that's typical of this approach. An alternative would have been for tokenize to produce an entire parse of the Python program at once, in a large list. Then tokenize clients could be written in a natural way, using local variables and local control flow (such as loops and nested if statements) to keep track of their state. But this isn't practical: programs can be very large, so no a priori bound can be placed on the memory needed to materialize the whole parse; and some tokenize clients only want to see whether something specific appears early in the program (e.g., a future statement, or, as is done in IDLE, just the first indented statement), and then parsing the whole program first is a severe waste of time. Another alternative would be to make tokenize an iterator[1], delivering the next token whenever its .next() method is invoked. This is pleasant for the caller in the same way a large list of results would be, but without the memory and "what if I want to get out early?" drawbacks. However, this shifts the burden on tokenize to remember *its* state between .next() invocations, and the reader need only glance at tokenize.tokenize_loop() to realize what a horrid chore that would be. Or picture a recursive algorithm for producing the nodes of a general tree structure: to cast that into an iterator framework requires removing the recursion manually and maintaining the state of the traversal by hand. A fourth option is to run the producer and consumer in separate threads. This allows both to maintain their states in natural ways, and so is pleasant for both. Indeed, Demo/threads/Generator.py in the Python source distribution provides a usable synchronized-communication class for doing that in a general way. This doesn't work on platforms without threads, though, and is very slow on platforms that do (compared to what is achievable without threads). A final option is to use the Stackless[2][3] variant implementation of Python instead, which supports lightweight coroutines. 
This has much the same programmatic benefits as the thread option, but is much more efficient. However, Stackless is a controversial rethinking of the Python core, and it may not be possible for Jython to implement the same semantics. This PEP isn't the place to debate that, so suffice it to say here that generators provide a useful subset of Stackless functionality in a way that fits easily into the current CPython implementation, and is believed to be relatively straightforward for other Python implementations. That exhausts the current alternatives. Some other high-level languages provide pleasant solutions, notably iterators in Sather[4], which were inspired by iterators in CLU; and generators in Icon[5], a novel language where every expression "is a generator". There are differences among these, but the basic idea is the same: provide a kind of function that can return an intermediate result ("the next value") to its caller, but maintaining the function's local state so that the function can be resumed again right where it left off. A very simple example: def fib(): a, b = 0, 1 while 1: yield b a, b = b, a+b When fib() is first invoked, it sets a to 0 and b to 1, then yields b back to its caller. The caller sees 1. When fib is resumed, from its point of view the yield statement is really the same as, say, a print statement: fib continues after the yield with all local state intact. a and b then become 1 and 1, and fib loops back to the yield, yielding 1 to its invoker. And so on. From fib's point of view it's just delivering a sequence of results, as if via callback. But from its caller's point of view, the fib invocation is an iterable object that can be resumed at will. As in the thread approach, this allows both sides to be coded in the most natural ways; but unlike the thread approach, this can be done efficiently and on all platforms. Indeed, resuming a generator should be no more expensive than a function call. The same kind of approach applies to many producer/consumer functions. For example, tokenize.py could yield the next token instead of invoking a callback function with it as argument, and tokenize clients could iterate over the tokens in a natural way: a Python generator is a kind of Python iterator[1], but of an especially powerful kind. - Specification + Specification: Yield ? ++++++++ A new statement is introduced: yield_stmt: "yield" expression_list "yield" is a new keyword, so a future statement[8] is needed to phase - this in. [XXX spell this out] + this in. [XXX spell this out -- but new keywords have ripple effects + across tools too, and it's not clear this can be forced into the future + framework at all -- it's not even clear that Python's parser alone can + be taught to swing both ways based on a future stmt] The yield statement may only be used inside functions. A function that - contains a yield statement is called a generator function. + contains a yield statement is called a generator function. A generator ? +++++++++++++ + function is an ordinary function object in all respects, but has the + new CO_GENERATOR flag set in the code object's co_flags member. When a generator function is called, the actual arguments are bound to function-local formal argument names in the usual way, but no code in the body of the function is executed. Instead a generator-iterator object is returned; this conforms to the iterator protocol[6], so in particular can be used in for-loops in a natural way. 
Note that when the intent is clear from context, the unqualified name "generator" may be used to refer either to a generator-function or a generator- iterator. Each time the .next() method of a generator-iterator is invoked, the code in the body of the generator-function is executed until a yield or return statement (see below) is encountered, or until the end of the body is reached. If a yield statement is encountered, the state of the function is frozen, and the value of expression_list is returned to .next()'s caller. By "frozen" we mean that all local state is retained, including the current bindings of local variables, the instruction pointer, and the internal evaluation stack: enough information is saved so that the next time .next() is invoked, the function can proceed exactly as if the yield statement were just another external call. + Restriction: A yield statement is not allowed in the try clause of a + try/finally construct. The difficulty is that there's no guarantee + the generator will ever be resumed, hence no guarantee that the finally + block will ever get executed; that's too much a violation of finally's + purpose to bear. + + + Specification: Return + A generator function can also contain return statements of the form: "return" Note that an expression_list is not allowed on return statements in the body of a generator (although, of course, they may appear in the bodies of non-generator functions nested within the generator). - When a return statement is encountered, nothing is returned, but a + When a return statement is encountered, control proceeds as in any + function return, executing the appropriate finally clauses (if any - StopIteration exception is raised, signalling that the iterator is ? ------------ + exist). Then a StopIteration exception is raised, signalling that the ? ++++++++++++++++ - exhausted. The same is true if control flows off the end of the + iterator is exhausted. A StopIteration exception is also raised if + control flows off the end of the generator without an explict return. + - function. Note that return means "I'm done, and have nothing ? ----------- + Note that return means "I'm done, and have nothing interesting to ? +++++++++++++++ - interesting to return", for both generator functions and non-generator ? --------------- + return", for both generator functions and non-generator functions. ? +++++++++++ - functions. + + Note that return isn't always equivalent to raising StopIteration: the + difference lies in how enclosing try/except constructs are treated. + For example, + + >>> def f1(): + ... try: + ... return + ... except: + ... yield 1 + >>> print list(f1()) + [] + + because, as in any function, return simply exits, but + + >>> def f2(): + ... try: + ... raise StopIteration + ... except: + ... yield 42 + >>> print list(f2()) + [42] + + because StopIteration is captured by a bare "except", as is any + exception. + + + Specification: Generators and Exception Propagation + + If an unhandled exception-- including, but not limited to, + StopIteration --is raised by, or passes through, a generator function, + then the exception is passed on to the caller in the usual way, and + subsequent attempts to resume the generator function raise + StopIteration. In other words, an unhandled exception terminates a + generator's useful life. + + Example (not idiomatic but to illustrate the point): + + >>> def f(): + ... return 1/0 + >>> def g(): + ... yield f() # the zero division exception propagates + ... 
yield 42 # and we'll never get here + >>> k = g() + >>> k.next() + Traceback (most recent call last): + File "", line 1, in ? + File "", line 2, in g + File "", line 2, in f + ZeroDivisionError: integer division or modulo by zero + >>> k.next() # and the generator cannot be resumed + Traceback (most recent call last): + File "", line 1, in ? + StopIteration + >>> + + + Specification: Try/Except/Finally + + As noted earlier, yield is not allowed in the try clause of a try/ + finally construct. A consequence is that generators should allocate + critical resources with great care. There is no restriction on yield + otherwise appearing in finally clauses, except clauses, or in the try + clause of a try/except construct: + + >>> def f(): + ... try: + ... yield 1 + ... try: + ... yield 2 + ... 1/0 + ... yield 3 # never get here + ... except ZeroDivisionError: + ... yield 4 + ... yield 5 + ... raise + ... except: + ... yield 6 + ... yield 7 # the "raise" above stops this + ... except: + ... yield 8 + ... yield 9 + ... try: + ... x = 12 + ... finally: + ... yield 10 + ... yield 11 + >>> print list(f()) + [1, 2, 4, 5, 8, 9, 10, 11] + >>> Example # A binary tree class. class Tree: def __init__(self, label, left=None, right=None): self.label = label self.left = left self.right = right def __repr__(self, level=0, indent=" "): s = level*indent + `self.label` if self.left: s = s + "\n" + self.left.__repr__(level+1, indent) if self.right: s = s + "\n" + self.right.__repr__(level+1, indent) return s def __iter__(self): return inorder(self) # Create a Tree from a list. def tree(list): n = len(list) if n == 0: return [] i = n / 2 return Tree(list[i], tree(list[:i]), tree(list[i+1:])) # A recursive generator that generates Tree leaves in in-order. def inorder(t): if t: for x in inorder(t.left): yield x yield t.label for x in inorder(t.right): yield x # Show it off: create a tree. t = tree("ABCDEFGHIJKLMNOPQRSTUVWXYZ") # Print the nodes of the tree in in-order. for x in t: print x, print # A non-recursive generator. def inorder(node): stack = [] while node: while node.left: stack.append(node) node = node.left yield node.label while not node.right: try: node = stack.pop() except IndexError: return yield node.label node = node.right # Exercise the non-recursive generator. for x in t: print x, print + Both output blocks display: + + A B C D E F G H I J K L M N O P Q R S T U V W X Y Z + Q & A + Q. Why not a new keyword instead of reusing "def"? + + A. See BDFL Pronouncements section below. + - Q. Why a new keyword? Why not a builtin function instead? + Q. Why a new keyword for "yield"? Why not a builtin function instead? ? ++++++++++++ A. Control flow is much better expressed via keyword in Python, and yield is a control construct. It's also believed that efficient implementation in Jython requires that the compiler be able to determine potential suspension points at compile-time, and a new - keyword makes that easy. + keyword makes that easy. The CPython referrence implementation also + exploits it heavily, to detect which functions *are* generator- + functions (although a new keyword in place of "def" would solve that + for CPython -- but people asking the "why a new keyword?" question + don't want any new keyword). + + Q: Then why not some other special syntax without a new keyword? 
For + example, one of these instead of "yield 3": + + return 3 and continue + return and continue 3 + return generating 3 + continue return 3 + return >> , 3 + from generator return 3 + return >> 3 + return << 3 + >> 3 + << 3 + + A: Did I miss one ? Out of hundreds of messages, I counted two + suggesting such an alternative, and extracted the above from them. + It would be nice not to need a new keyword, but nicer to make yield + very clear -- I don't want to have to *deduce* that a yield is + occurring from making sense of a previously senseless sequence of + keywords or operators. Still, if this attracts enough interest, + proponents should settle on a single consensus suggestion, and Guido + will Pronounce on it. + + Q. Why allow "return" at all? Why not force termination to be spelled + "raise StopIteration"? + + A. The mechanics of StopIteration are low-level details, much like the + mechanics of IndexError in Python 2.1: the implementation needs to + do *something* well-defined under the covers, and Python exposes + these mechanisms for advanced users. That's not an argument for + forcing everyone to work at that level, though. "return" means "I'm + done" in any kind of function, and that's easy to explain and to use. + Note that "return" isn't always equivalent to "raise StopIteration" + in try/except construct, either (see the "Specification: Return" + section). + + Q. Then why not allow an expression on "return" too? + + A. Perhaps we will someday. In Icon, "return expr" means both "I'm + done", and "but I have one final useful value to return too, and + this is it". At the start, and in the absence of compelling uses + for "return expr", it's simply cleaner to use "yield" exclusively + for delivering values. + + + BDFL Pronouncements + + Issue: Introduce another new keyword (say, "gen" or "generator") in + place of "def", or otherwise alter the syntax, to distinguish + generator-functions from non-generator functions. + + Con: In practice (how you think about them), generators *are* + functions, but with the twist that they're resumable. The mechanics of + how they're set up is a comparatively minor technical issue, and + introducing a new keyword would unhelpfully overemphasize the + mechanics of how generators get started (a vital but tiny part of a + generator's life). + + Pro: In reality (how you think about them), generator-functions are + actually factory functions that produce generator-iterators as if by + magic. In this respect they're radically different from non-generator + functions, acting more like a constructor than a function, so reusing + "def" is at best confusing. A "yield" statement buried in the body is + not enough warning that the semantics are so different. + + BDFL: "def" it stays. No argument on either side is totally + convincing, so I have consulted my language designer's intuition. It + tells me that the syntax proposed in the PEP is exactly right - not too + hot, not too cold. But, like the Oracle at Delphi in Greek mythology, + it doesn't tell me why, so I don't have a rebuttal for the arguments + against the PEP syntax. The best I can come up with (apart from + agreeing with the rebuttals ... already made) is "FUD". If this had + been part of the language from day one, I very much doubt it would have + made Andrew Kuchling's "Python Warts" page. Reference Implementation - A preliminary patch against the CVS Python source is available[7]. 
+ The current implementation, in a preliminary state (no docs and no
+ focused tests), is part of Python's CVS development tree[9].
+ Using this requires that you build Python from source.
+
+ This was derived from an earlier patch by Neil Schemenauer[7].

  Footnotes and References

    [1] PEP 234, http://python.sf.net/peps/pep-0234.html
    [2] http://www.stackless.com/
    [3] PEP 219, http://python.sf.net/peps/pep-0219.html
    [4] "Iteration Abstraction in Sather"
        Murer, Omohundro, Stoutamire and Szyperski
        http://www.icsi.berkeley.edu/~sather/Publications/toplas.html
    [5] http://www.cs.arizona.edu/icon/
    [6] The concept of iterators is described in PEP 234
        http://python.sf.net/peps/pep-0234.html
    [7] http://python.ca/nas/python/generator.diff
    [8] http://python.sf.net/peps/pep-0236.html
+   [9] To experiment with this implementation, check out Python from CVS
+       according to the instructions at
+       http://sf.net/cvs/?group_id=5470

  Copyright

    This document has been placed in the public domain.

Local Variables:
mode: indented-text
indent-tabs-mode: nil
End:

From mal@lemburg.com  Sat Jun 23 11:54:27 2001
From: mal@lemburg.com (M.-A. Lemburg)
Date: Sat, 23 Jun 2001 12:54:27 +0200
Subject: [Python-Dev] Python Specializing Compiler
References: 
Message-ID: <3B347563.9BBEF858@lemburg.com>

Armin Rigo wrote:
>
> Hello Jeff,
>
> On Fri, 22 Jun 2001, Jeff Epler wrote:
> > What are you using to generate code?
>
> I am generating pseudo-code, which is interpreted by a C module.  (With
> real assembler code, it would of course be much faster, but it was just
> simpler for the moment.)
>
> > How would you compare the
> > sophistication of your type inference system to the one I've outlined
> > above?
>
> Yours is much more complete, but runs statically.  Mine works at
> run-time.  As explained in detail in the readme file, my plan is not to
> make a "compiler" in the usual sense.  I actually have no type
> inferences; I just collect at run time what types are used at what
> places, and generate (and possibly modify) the generated code according
> to that information.

Sounds like you are using (re)compiling on-the-fly -- that would
certainly be a very reasonable way to deal with Python's dynamic object
world.  It would also solve the problems of static compilers with type
inference nicely.

A very nice idea!

--
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/

From skip@pobox.com (Skip Montanaro)  Sat Jun 23 15:11:03 2001
From: skip@pobox.com (Skip Montanaro) (Skip Montanaro)
Date: Sat, 23 Jun 2001 09:11:03 -0500
Subject: [Python-Dev] Python Specializing Compiler
In-Reply-To: <3B347563.9BBEF858@lemburg.com>
References: <3B347563.9BBEF858@lemburg.com>
Message-ID: <15156.41847.86431.594106@beluga.mojam.com>

    mal> Sounds like you are using (re)compiling on-the-fly ...

This is what the Self compiler did, though I don't know if its
granularity was as fine as I understand psyco's is from reading its
README file.  It's been awhile since I read through that stuff, but I
seem to recall it would compile functions to machine code only if they
were heavily executed.  It also did a lot of type inferencing.

Skip
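The run-time scheme Armin describes can be imitated in pure Python.
What follows is a minimal sketch of type-feedback specialization --
hypothetical code, nothing like psyco's actual pseudo-code machinery --
that records argument types at a call site and switches to a specialized
version once the site looks hot and monomorphic:

    def generic_add(x, y):
        return x + y

    def int_add(x, y):
        # Pretend this is generated, int-only code.
        return int(x) + int(y)

    class Specializer:
        # Wrap a function; count calls whose argument types match
        # `types`, and dispatch to `specialized` after `threshold` hits.
        def __init__(self, func, specialized, types, threshold=100):
            self.func = func
            self.specialized = specialized
            self.types = types
            self.threshold = threshold
            self.hits = 0
        def __call__(self, *args):
            if map(type, args) == self.types:
                self.hits = self.hits + 1
                if self.hits >= self.threshold:
                    return self.specialized(*args)  # hot and monomorphic
            return self.func(*args)

    add = Specializer(generic_add, int_add, [type(0), type(0)])
    for i in xrange(1000):
        add(i, i)    # after 100 int/int calls, int_add takes over

Real specialization pays off only when the specialized version avoids
work the generic one must do; the wrapper here is pure overhead and is
meant only to show the bookkeeping.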
From guido@digicool.com  Sat Jun 23 16:58:40 2001
From: guido@digicool.com (Guido van Rossum)
Date: Sat, 23 Jun 2001 11:58:40 -0400
Subject: [Python-Dev] Python Specializing Compiler
In-Reply-To: Your message of "Sat, 23 Jun 2001 10:13:04 +0200."
References: 
Message-ID: <20010623160024.QWCF14539.femail14.sdc1.sfba.home.com@cj20424-a.reston1.va.home.com>

> I am generating pseudo-code, which is interpreted by a C module.  (With
> real assembler code, it would of course be much faster, but it was just
> simpler for the moment.)

This has great promise!  Once you have an interpreter for some kind of
pseudo-code, it's always possible to tweak the interpreter or the
pseudo-code to make it faster.  And you can make another jump to machine
code to make it a lot faster.

There was a project (p2c or python2c) that tried to compile an entire
Python program to C code that was mostly just calling the Python runtime
C API functions.  It also obtained about a factor of 2 in speed-up, but
its problem was (if I recall) that even a small Python module translated
into hundreds of thousands of lines of C -- think what that would do to
locality.  Since you have already obtained the same speedup with your
approach, I think there's great promise.  Count on sending in a paper
for the next Python conference!

> > How would you compare the
> > sophistication of your type inference system to the one I've outlined
> > above?
>
> Yours is much more complete, but runs statically.  Mine works at
> run-time.  As explained in detail in the readme file, my plan is not to
> make a "compiler" in the usual sense.  I actually have no type
> inferences; I just collect at run time what types are used at what
> places, and generate (and possibly modify) the generated code according
> to that information.

Very cool: a Python JIT compiler.

> (More about it later.)

Can't wait!

--Guido van Rossum (home page: http://www.python.org/~guido/)

From fdrake@beowolf.digicool.com  Sun Jun 24 03:41:04 2001
From: fdrake@beowolf.digicool.com (Fred Drake)
Date: Sat, 23 Jun 2001 22:41:04 -0400 (EDT)
Subject: [Python-Dev] [development doc updates]
Message-ID: <20010624024104.A757728927@beowolf.digicool.com>

The development version of the documentation has been updated:

    http://python.sourceforge.net/devel-docs/

A couple of small updates, including spelling the keywords correctly in
the language reference.  This version brings back the hyperlinked
grammar productions I played around with earlier.  They still need work,
but they are somewhat better than plain text.
From m.favas@per.dem.csiro.au  Sun Jun 24 05:25:27 2001
From: m.favas@per.dem.csiro.au (Mark Favas)
Date: Sun, 24 Jun 2001 12:25:27 +0800
Subject: [Python-Dev] Problems with socketmodule (getnameinfo & getaddrinfo)
Message-ID: <3B356BB7.9BE71569@per.dem.csiro.au>

Socketmodule at the moment has multiple problems after the changes to
handle IPv6:

1: socketmodule.c now #includes getnameinfo.c and getaddrinfo.c.  These
functions both use offsetof(), which is defined (on my system, at least)
in stddef.h.  The #include for this file is inside a #if 0 block.

2: #including this file allows the compile to complete without error.
However, there is no Makefile dependency on these two files, once
socketmodule.o has been built.  Changes to either of the
get{name,addr}info.c files will not cause socketmodule to be rebuilt.

3: The socket module still does not work, however, since it refers to an
unresolved symbol inet_pton:

    >>> import socket
    Traceback (most recent call last):
      File "", line 1, in ?
      File "/home/gonzo1/mark/groucho1/mark/src/python/CVS/python/dist/src/Lib/socket.py", line 41, in ?
        from _socket import *
    ImportError: Unresolved symbol in /home/gonzo1/mark/groucho1/mark/src/python/CVS/python/dist/src/build/lib.osf1-V4.0-alpha-2.2/_socket.so: inet_pton

inet_pton is called in two places in getaddrinfo.c... there's likely to
be other platforms besides Tru64 Unix that do not have this function.

--
Mark Favas  -  m.favas@per.dem.csiro.au
CSIRO, Private Bag No 5, Wembley, Western Australia 6913, AUSTRALIA

From tim.one@home.com  Sun Jun 24 05:48:32 2001
From: tim.one@home.com (Tim Peters)
Date: Sun, 24 Jun 2001 00:48:32 -0400
Subject: [Python-Dev] Problems with socketmodule (getnameinfo & getaddrinfo)
In-Reply-To: <3B356BB7.9BE71569@per.dem.csiro.au>
Message-ID: 

[Mark Favas]
> Socketmodule at the moment has multiple problems after the changes to
> handle IPv6:
>
> 1: socketmodule.c now #includes getnameinfo.c and getaddrinfo.c.  These
> functions both use offsetof(), which is defined (on my system, at
> least) in stddef.h.  The #include for this file is inside a #if 0
> block.
>
> 2: #including this file allows the compile to complete without error.
> However, there is no Makefile dependency on these two files, once
> socketmodule.o has been built.  Changes to either of the
> get{name,addr}info.c files will not cause socketmodule to be rebuilt.
>
> 3: The socket module still does not work, however, since it refers to
> an unresolved symbol inet_pton:
>
>     >>> import socket
>     Traceback (most recent call last):
>       File "", line 1, in ?
>       File "/home/gonzo1/mark/groucho1/mark/src/python/CVS/python/dist/src/Lib/socket.py", line 41, in ?
>         from _socket import *
>     ImportError: Unresolved symbol in /home/gonzo1/mark/groucho1/mark/src/python/CVS/python/dist/src/build/lib.osf1-V4.0-alpha-2.2/_socket.so: inet_pton
>
> inet_pton is called in two places in getaddrinfo.c... there's likely to
> be other platforms besides Tru64 Unix that do not have this function.
If it's any consolation, the Windows build is in worse shape:

    socketmodule.c
    Modules\addrinfo.h(123) : error C2632: 'long' followed by 'long' is illegal
    Modules\addrinfo.h(125) : error C2632: 'long' followed by 'long' is illegal
    Modules\addrinfo.h(129) : error C2632: 'long' followed by 'long' is illegal
    Modules\addrinfo.h(129) : error C2632: 'long' followed by 'long' is illegal
    Modules\getaddrinfo.c(109) : warning C4013: 'offsetof' undefined; assuming extern returning int
    Modules\getaddrinfo.c(109) : error C2143: syntax error : missing ')' before 'type'
    Modules\getaddrinfo.c(109) : error C2099: initializer is not a constant
    Modules\getaddrinfo.c(109) : error C2059: syntax error : ')'
    Modules\getaddrinfo.c(111) : error C2059: syntax error : ','
    Modules\getaddrinfo.c(407) : warning C4013: 'inet_pton' undefined; assuming extern returning int
    Modules\getaddrinfo.c(414) : warning C4013: 'IN_MULTICAST' undefined; assuming extern returning int
    Modules\getaddrinfo.c(414) : warning C4013: 'IN_EXPERIMENTAL' undefined; assuming extern returning int
    Modules\getaddrinfo.c(417) : error C2065: 'IN_LOOPBACKNET' : undeclared identifier
    Modules\getaddrinfo.c(417) : warning C4018: '==' : signed/unsigned mismatch
    Modules\getaddrinfo.c(531) : error C2373: 'WSAGetLastError' : redefinition; different type modifiers
            C:\VC98\INCLUDE\winsock.h(787) : see declaration of 'WSAGetLastError'
    Modules\getnameinfo.c(66) : error C2143: syntax error : missing ')' before 'type'
    Modules\getnameinfo.c(66) : error C2099: initializer is not a constant
    Modules\getnameinfo.c(66) : error C2059: syntax error : ')'
    Modules\getnameinfo.c(67) : error C2059: syntax error : ','
    Modules\getnameinfo.c(133) : warning C4013: 'snprintf' undefined; assuming extern returning int
    Modules\getnameinfo.c(153) : warning C4018: '==' : signed/unsigned mismatch
    Modules\getnameinfo.c(167) : warning C4013: 'inet_ntop' undefined; assuming extern returning int
    Modules\getnameinfo.c(168) : warning C4047: '==' : 'int ' differs in levels of indirection from 'void *'
    Modules\getnameinfo.c(200) : warning C4047: '==' : 'int ' differs in levels of indirection from 'void *'

Martin should revert the changes to socketmodule.c until this has a
prayer of working.

From est@hyperreal.org  Sun Jun 24 06:38:06 2001
From: est@hyperreal.org (est@hyperreal.org)
Date: Sat, 23 Jun 2001 22:38:06 -0700 (PDT)
Subject: [Python-Dev] Python Specializing Compiler
In-Reply-To: "from Armin Rigo at Jun 22, 2001 01:00:34 pm"
Message-ID: <20010624053806.16277.qmail@hyperreal.org>

Am I seeing things or does it actually speed up five to six times on my
machine?  Very exciting!

    timing specializing_call(, 2000)...
    result 1952145856 in 4.94 seconds
    timing specializing_call(, 2000)...
    result 1952145856 in 3.91 seconds
    timing f(2000,)...
    result 1952145856 in 25.17 seconds

I wonder to what extent this approach can be applied to method calls.
My analysis of my performance-bound Python apps convinces me that those
are a major bottleneck for me.  About a fifth of their time seems to go
into creating the bound method object (reducible by caching them on the
instance)... another fifth into allocating the memory for the frame
object (ameliorated by pymalloc).  As for the rest, I really don't know.

E

From martin@loewis.home.cs.tu-berlin.de  Sun Jun 24 09:34:06 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v.
Loewis)
Date: Sun, 24 Jun 2001 10:34:06 +0200
Subject: [Python-Dev] gethostbyname2
Message-ID: <200106240834.f5O8Y6t01609@mira.informatik.hu-berlin.de>

The IPv6 patch proposes to introduce a new socket function,
socket.gethostbyname2(name, af).  This becomes necessary as a name might
have both an IPv4 and an IPv6 address.

One alternative for providing such an API is to give
socket.gethostbyname an optional second argument (the address family).
itojun's rationale for calling it gethostbyname2 is that it matches the
C API, as defined in RFC 2133.

Which of these alternatives would you prefer?

Regards,
Martin
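For concreteness, the optional-argument spelling might look like the
following sketch (hypothetical code, not itojun's patch; it assumes the
getaddrinfo support from the patch is available):

    import socket

    def gethostbyname(name, af=socket.AF_INET):
        # Existing callers are unaffected: af defaults to AF_INET.
        # IPv6-aware callers pass socket.AF_INET6 explicitly.
        addrinfo = socket.getaddrinfo(name, None, af)
        family, socktype, proto, canonname, sockaddr = addrinfo[0]
        return sockaddr[0]    # the address string, as gethostbyname returns

    # gethostbyname("www.python.org")                    # IPv4, as today
    # gethostbyname("www.python.org", socket.AF_INET6)   # explicit family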
From martin@loewis.home.cs.tu-berlin.de  Sun Jun 24 09:20:31 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Sun, 24 Jun 2001 10:20:31 +0200
Subject: [Python-Dev] IPv6 and Windows
Message-ID: <200106240820.f5O8KVN01435@mira.informatik.hu-berlin.de>

After integrating the first chunk of IPv6 changes, Tim Peters quickly
found that they won't compile on Windows - even though this was the
least-critical part of the patch.

Specifically, this code emulates the getaddrinfo and getnameinfo calls,
which will be exposed to Python programs in a later patch.  Therefore,
it is essential that they are available on every system, either directly
or through emulation.

For Windows, one option is to use the Microsoft-provided emulation,
which is available from

    http://msdn.microsoft.com/downloads/sdks/platform/tpipv6.asp

To use this emulation, only the header files of the package are
required; it is not necessary to actually install the IPv6 preview on
the system.  The MS emulation will try to load a few DLLs which are
known to provide getaddrinfo.  If neither DLL is found, the code in the
header file falls back to an emulation.  That way, the resulting
socket.pyd would use the true API function on installations that provide
them, and the emulation on all other systems.

The only requirement for building Python is then that the header file
from the technology preview is available on the build machine
(tpipv6.h).  It may be that the header file is also included in recent
SDK releases, I haven't checked.

Is such a requirement acceptable for building the socket module on
Windows?

Regards,
Martin

From m.favas@per.dem.csiro.au  Sun Jun 24 09:58:42 2001
From: m.favas@per.dem.csiro.au (Mark Favas)
Date: Sun, 24 Jun 2001 16:58:42 +0800
Subject: [Python-Dev] IPv6 support
Message-ID: <3B35ABC2.11F3B261@per.dem.csiro.au>

IPv6 support may be nice, and even desirable.  However, supporting IPv6
should not come at the cost of causing problems either in compilation or
at runtime on those platforms that do not support IPv6 natively.
Requiring additional preview code or non-standardly-supplied packages to
be installed is fine if people _want_ to take advantage of the new IPv6
functionality, but _not_ fine if this IPv6 functionality is not
required.  IPv4 support should not require the installation of
additional IPv6 packages.  Well, that's my 2 cents' worth (even if
that's only 1 cent US ).

--
Mark Favas  -  m.favas@per.dem.csiro.au
CSIRO, Private Bag No 5, Wembley, Western Australia 6913, AUSTRALIA

From pf@artcom-gmbh.de  Sun Jun 24 10:20:10 2001
From: pf@artcom-gmbh.de (Peter Funk)
Date: Sun, 24 Jun 2001 11:20:10 +0200 (MEST)
Subject: foobar2(), foobar3(), ... (was Re: [Python-Dev] gethostbyname2)
In-Reply-To: <200106240834.f5O8Y6t01609@mira.informatik.hu-berlin.de> from "Martin v. Loewis" at "Jun 24, 2001 10:34:06 am"
Message-ID: 

Martin v. Loewis:
> The IPv6 patch proposes to introduce a new socket function,
> socket.gethostbyname2(name, af).  This becomes necessary as a name
> might have both an IPv4 and an IPv6 address.
>
> One alternative for providing such an API is to give
> socket.gethostbyname an optional second argument (the address family).
> itojun's rationale for calling it gethostbyname2 is that it matches
> the C API, as defined in RFC 2133.
>
> Which of these alternatives would you prefer?

IMO: the possibility to add new keyword arguments with default values is
one of the major strengths Python has compared to other programming
languages, especially in the scenario where an existing mature API later
has to be enhanced with added features.  In such a situation I always
prefer APIs with fewer functions (maybe with large lists of optional
arguments) over APIs containing a bunch of functions or methods called
'popen2()', 'gethostbyname2()' and so on.

Regards, Peter
--
Peter Funk, Oldenburger Str.86, D-27777 Ganderkesee, Germany, Fax:+49 4222950260
office: +49 421 20419-0 (ArtCom GmbH, Grazer Str.8, D-28359 Bremen, Germany)

From tim.one@home.com  Sun Jun 24 11:51:40 2001
From: tim.one@home.com (Tim Peters)
Date: Sun, 24 Jun 2001 06:51:40 -0400
Subject: [Python-Dev] IPv6 and Windows
In-Reply-To: <200106240820.f5O8KVN01435@mira.informatik.hu-berlin.de>
Message-ID: 

[Martin v. Loewis]
> After integrating the first chunk of IPv6 changes, Tim Peters quickly
> found that they won't compile on Windows - even though this was the
> least-critical part of the patch.

Mark Favas also reported failure on a Unix box -- we can't leave the CVS
tree in an unusable state, and Mark in particular provides uniquely
valuable feedback from his collection of Platforms from Mars .  I
#ifdef'ed out the offending includes on Windows for now, but that
doesn't help Mark.

> Specifically, this code emulates the getaddrinfo and getnameinfo
> calls, which will be exposed to Python programs in a later patch.
> Therefore, it is essential that they are available on every system,
> either directly or through emulation.
>
> For Windows, one option is to use the Microsoft-provided emulation,
> which is available from
>
>     http://msdn.microsoft.com/downloads/sdks/platform/tpipv6.asp

It says it's unsupported preview software for Win2K only.  Since even
the first *real* release of anything from MS sucks, I wouldn't touch
this unless I absolutely had to.  But I don't have any cycles for this
project anyway, so this:

> ...
> Is such a requirement acceptable for building the socket module on
> Windows?

will have to be addressed by someone who does.  Is anyone, e.g., at
ActiveState keen on this?

From mal@lemburg.com  Sun Jun 24 12:06:19 2001
From: mal@lemburg.com (M.-A. Lemburg)
Date: Sun, 24 Jun 2001 13:06:19 +0200
Subject: [Python-Dev] IPv6 and Windows
References: <200106240820.f5O8KVN01435@mira.informatik.hu-berlin.de>
Message-ID: <3B35C9AB.2D1D2185@lemburg.com>

"Martin v. Loewis" wrote:
>
> After integrating the first chunk of IPv6 changes, Tim Peters quickly
> found that they won't compile on Windows - even though this was the
> least-critical part of the patch.
>
> Specifically, this code emulates the getaddrinfo and getnameinfo
> calls, which will be exposed to Python programs in a later patch.
> Therefore, it is essential that they are available on every system,
> either directly or through emulation.
> > For Windows, one option is to use the Microsoft-provided emulation, > which is available from > > http://msdn.microsoft.com/downloads/sdks/platform/tpipv6.asp > > To use this emulation, only the header files of the package are > required; it is not necessary to actually install the IPv6 preview on > the system. The MS emulation will try to load a few DLLs which are > known to provide getaddrinfo. If neither DLL is found, the code in the > header file falls back to an emulation. That way, the resulting > socket.pyd would use the true API function on installations that > provide them, and the emulation on all other systems. > > The only requirement for building Python is then that the header file > from the technology preview is available on the build machine > (tpipv6.h). It may be that the header file is also included in recent > SDK releases, I haven't checked. > > Is such a requirement acceptable for building the socket module on > Windows? Isn't this the MS SDK that has the new "Open Source" license clause in it ?! If yes, I very much doubt that this approach would be feasable for Python... http://msdn.microsoft.com/downloads/eula_mit.htm Quote from a recent posting by Steven Majewski on c.l.p.: """ (c) Open Source. Recipients license rights to the Software are conditioned upon Recipient (i) not distributing such Software, in whole or in part, in conjunction with Potentially Viral Software (as defined below); and (ii) not using Potentially Viral Software (e.g. tools) to develop Recipient software which includes the Software, in whole or in part. For purposes of the foregoing, Potentially Viral Software means software which is licensed pursuant to terms that: (x) create, or purport to create, obligations for Microsoft with respect to the Software or (y) grant, or purport to grant, to any third party any rights to or immunities under Microsofts intellectual property or proprietary rights in the Software. By way of example but not limitation of the foregoing, Recipient shall not distribute the Software, in whole or in part, in conjunction with any Publicly Available Software. Publicly Available Software means each of (i) any software that contains, or is derived in any manner (in whole or in part) from, any software that is distributed as free software, open source software (e.g. Linux) or similar licensing or distribution models; and (ii) any software that requires as a condition of use, modification and/or distribution of such software that other software distributed with such software (A) be disclosed or distributed in source code form; (B) be licensed for the purpose of making derivative works; or (C) be redistributable at no charge. Publicly Available Software includes, without limitation, software licensed or distributed under any of the following licenses or distribution models, or licenses or distribution models similar to any of the following: (A) GNUs General Public License (GPL) or Lesser/Library GPL (LGPL), (B) The Artistic License (e.g., PERL), (C) the Mozilla Public License, (D) the Netscape Public License, (E) the Sun Community Source License (SCSL), and (F) the Sun Industry Standards License (SISL). 
""" -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From guido@digicool.com Sun Jun 24 14:23:52 2001 From: guido@digicool.com (Guido van Rossum) Date: Sun, 24 Jun 2001 09:23:52 -0400 Subject: [Python-Dev] gethostbyname2 In-Reply-To: Your message of "Sun, 24 Jun 2001 10:34:06 +0200." <200106240834.f5O8Y6t01609@mira.informatik.hu-berlin.de> References: <200106240834.f5O8Y6t01609@mira.informatik.hu-berlin.de> Message-ID: <20010624132540.RTEI4013.femail3.sdc1.sfba.home.com@cj20424-a.reston1.va.home.com> > The IPv6 patch proposes to introduce a new socket function, > socket.gethostbyname2(name, af). This becomes necessary as a name > might have both an IPv4 and an IPv6 address. > > One alternative for providing such API is to get socket.gethostbyname > an optional second argument (the address family). itojun's rationale > for calling it gethostbyname2 is that the C API, as defined in RFC > 2133. > > Which of these alternatives would you prefer? Definitely an optional 2nd arg to gethostbyname() -- in C, you can't do tht, so they *had* to create a new function, but Python is more flexible. --Guido van Rossum (home page: http://www.python.org/~guido/) From DavidA@ActiveState.com Sun Jun 24 16:18:22 2001 From: DavidA@ActiveState.com (David Ascher) Date: Sun, 24 Jun 2001 08:18:22 -0700 Subject: [Python-Dev] IPv6 and Windows References: Message-ID: <3B3604BE.7E2F6C6E@ActiveState.com> Tim Peters wrote: > > Is such a requirement acceptable for building the socket module on > > Windows? > > will have to be addressed by someone who does. Is anyone, e.g., at > ActiveState keen on this? Not as far as I know. I haven't looked at the patches, but couldn't we have the IPv6 code be #ifdef'ed out, so that those who care about IPv6 can periodically test it while the various OS-level libraries are ramped up over the next months/years, but w/o disturbing the 'current' builds? --david From martin@loewis.home.cs.tu-berlin.de Sun Jun 24 18:00:43 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Sun, 24 Jun 2001 19:00:43 +0200 Subject: [Python-Dev] IPv6 and Windows In-Reply-To: <3B35C9AB.2D1D2185@lemburg.com> (mal@lemburg.com) References: <200106240820.f5O8KVN01435@mira.informatik.hu-berlin.de> <3B35C9AB.2D1D2185@lemburg.com> Message-ID: <200106241700.f5OH0hm01021@mira.informatik.hu-berlin.de> > > Is such a requirement acceptable for building the socket module on > > Windows? > > Isn't this the MS SDK that has the new "Open Source" license > clause in it ?! 
No, this has a different license text, which can be seen on http://msdn.microsoft.com/downloads/sdks/platform/tpipv6/download.asp On redistribution, it says # If you redistribute the SOFTWARE and/or your Source Modifications, # or any portion thereof as provided above, you agree: (i) to # distribute the SOFTWARE only in conjunction with, and as part of, # your Source Modifications which add significant functionality to the # SOFTWARE; (ii) to distribute the SOFTWARE and/or your Source # Modifications solely as part of your research and not in any # commercial product; (iii) the SOFTWARE and/or your Source # Modifications will not be distributed for profit; (iv) to retain all # branding, copyright and trademark notices included with the SOFTWARE # and include a copy of this EULA with any distribution of the # SOFTWARE, or any portion thereof; and (v) to indemnify, hold # harmless, and defend Microsoft from and against any claims or # lawsuits, including attorneys' fees, that arise or result from # the use or distribution of your Source Modifications. I don't know whether this is acceptable or not. Regards, Martin From mal@lemburg.com Sun Jun 24 19:08:13 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Sun, 24 Jun 2001 20:08:13 +0200 Subject: [Python-Dev] IPv6 and Windows References: <200106240820.f5O8KVN01435@mira.informatik.hu-berlin.de> <3B35C9AB.2D1D2185@lemburg.com> <200106241700.f5OH0hm01021@mira.informatik.hu-berlin.de> Message-ID: <3B362C8D.D3AECE3C@lemburg.com> "Martin v. Loewis" wrote: > > > > Is such a requirement acceptable for building the socket module on > > > Windows? > > > > Isn't this the MS SDK that has the new "Open Source" license > > clause in it ?! > > No, this has a different license text, which can be seen on > > http://msdn.microsoft.com/downloads/sdks/platform/tpipv6/download.asp > > On redistribution, it says > > # If you redistribute the SOFTWARE and/or your Source Modifications, > # or any portion thereof as provided above, you agree: (i) to > # distribute the SOFTWARE only in conjunction with, and as part of, > # your Source Modifications which add significant functionality to the > # SOFTWARE; (ii) to distribute the SOFTWARE and/or your Source > # Modifications solely as part of your research and not in any > # commercial product; (iii) the SOFTWARE and/or your Source > # Modifications will not be distributed for profit; (iv) to retain all > # branding, copyright and trademark notices included with the SOFTWARE > # and include a copy of this EULA with any distribution of the > # SOFTWARE, or any portion thereof; and (v) to indemnify, hold > # harmless, and defend Microsoft from and against any claims or > # lawsuits, including attorneys' fees, that arise or result from > # the use or distribution of your Source Modifications. > > I don't know whether this is acceptable or not. Most likely not: there are lots of commercial Python users out there who wouldn't like these clauses at all... we'd also lose the GPL compatibility. -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From martin@loewis.home.cs.tu-berlin.de Sun Jun 24 18:48:03 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. 
Loewis) Date: Sun, 24 Jun 2001 19:48:03 +0200 Subject: [Python-Dev] IPv6 and Windows Message-ID: <200106241748.f5OHm3001480@mira.informatik.hu-berlin.de> > I haven't looked at the patches, but couldn't we have the IPv6 code > be #ifdef'ed out, so that those who care about IPv6 can periodically > test it while the various OS-level libraries are ramped up over the > next months/years, but w/o disturbing the 'current' builds? Not if we are going to introduce itojun's patch. In that patch, the IPv6 code *is* actually ifdef'ed out. It is getaddrinfo/getnameinfo that gives problems, which isn't IPv6 specific at all. The problem is that the library patches (httplib, ftplib, etc) do use getaddrinfo to find out how to contact a remote system, which is the right thing to do IMO. So even if the IPv6 support can be activated only if desired, getaddrinfo absolutely has to work. So the only question then is where we get an implementation of these functions if the system doesn't provide one. itojun has suggested the WIDE libraries; since they apparently don't compile on Windows, I've suggested the MS TP emulation. If the latter is not acceptable, we either have to fix the WIDE implementation to work on Windows also; As for the problems Mark reported: I think they can get fixed. Regards, Martin From thomas@xs4all.net Sun Jun 24 22:35:37 2001 From: thomas@xs4all.net (Thomas Wouters) Date: Sun, 24 Jun 2001 23:35:37 +0200 Subject: [Python-Dev] IPv6 and Windows In-Reply-To: <200106241748.f5OHm3001480@mira.informatik.hu-berlin.de> References: <200106241748.f5OHm3001480@mira.informatik.hu-berlin.de> Message-ID: <20010624233537.R8098@xs4all.nl> On Sun, Jun 24, 2001 at 07:48:03PM +0200, Martin v. Loewis wrote: > The problem is that the library patches (httplib, ftplib, etc) do use > getaddrinfo to find out how to contact a remote system, which is the > right thing to do IMO. So even if the IPv6 support can be activated > only if desired, getaddrinfo absolutely has to work. Why ? Why can't those parts be 'if it exists'-ed out ? We do it for SSL support. I'm only comfortable with the IPv6 patch if it's optional, or can at least be disabled. I haven't looked at the patch, but why is getaddrinfo absolutely necessary, if the code works without it now, too ? > So the only question then is where we get an implementation of these > functions if the system doesn't provide one. itojun has suggested the > WIDE libraries; since they apparently don't compile on Windows, I've > suggested the MS TP emulation. If the latter is not acceptable, we > either have to fix the WIDE implementation to work on Windows also; > As for the problems Mark reported: I think they can get fixed. What about the zillion other 'obscure' ports ? OS/2 ? Palm ? MacOS 9 ;) If this patch can't be zero-impact-if-necessary, I'm a firm -1 on it. But I don't think it can't, it just takes more work. -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From martin@loewis.home.cs.tu-berlin.de Sun Jun 24 22:39:45 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Sun, 24 Jun 2001 23:39:45 +0200 Subject: [Python-Dev] Problems with socketmodule (getnameinfo & getaddrinfo) Message-ID: <200106242139.f5OLdjR22560@mira.informatik.hu-berlin.de> > 1: socketmodule.c now #includes getnameinfo.c and > getaddrinfo.c. These functions both use offsetof(), which is defined > (on my system, at least) in stddef.h. That should be fixed now. 
stddef.h is included in socketmodule.c; if it is not available or does not define offsetof, an additional definition is provided. > 2. [...] Changes to either of the get{name,addr}info.c files will > not cause socketmodule to be rebuilt. I don't know how to solve this one. If distutils builds the modules, makefile dependencies won't help. > 3. The socket module still does not work, however, since it refers > to an unresolved symbol inet_pton I took the simplest solution that I could think of, delegating inet_{pton,ntop} to inet_{ntoa,addr} for AF_INET, failing for all other address families (AF_INET6 in particular). I've verified that this code does the same as the builtin functions on my Linux system; please let me know whether it compiles for you. Regards, Martin From martin@loewis.home.cs.tu-berlin.de Sun Jun 24 22:56:48 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Sun, 24 Jun 2001 23:56:48 +0200 Subject: [Python-Dev] IPv6 and Windows In-Reply-To: <20010624233537.R8098@xs4all.nl> (message from Thomas Wouters on Sun, 24 Jun 2001 23:35:37 +0200) References: <200106241748.f5OHm3001480@mira.informatik.hu-berlin.de> <20010624233537.R8098@xs4all.nl> Message-ID: <200106242156.f5OLum222759@mira.informatik.hu-berlin.de> > Why ? Why can't those parts be 'if it exists'-ed out ? We do it for SSL > support. I'm only comfortable with the IPv6 patch if it's optional, or can > at least be disabled. I haven't looked at the patch, but why is getaddrinfo > absolutely necessary, if the code works without it now, too ? getaddrinfo offers protocol-independent address lookup. It is necessary to use that API to support AF_INET and AF_INET6 transparently in application code. itojun proposes to change a number of standard library modules. Please have a look at the actual patch for details; the typical change will look like this (for httplib):

diff -u -r1.35 httplib.py
--- Lib/httplib.py	2001/06/01 16:25:38	1.35
+++ Lib/httplib.py	2001/06/24 04:41:48
@@ -357,10 +357,22 @@
 
     def connect(self):
         """Connect to the host and port specified in __init__."""
-        self.sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
-        if self.debuglevel > 0:
-            print "connect: (%s, %s)" % (self.host, self.port)
-        self.sock.connect((self.host, self.port))
+        for res in socket.getaddrinfo(self.host, self.port, 0, socket.SOCK_STREAM):
+            af, socktype, proto, canonname, sa = res
+            try:
+                self.sock = socket.socket(af, socktype, proto)
+                if self.debuglevel > 0:
+                    print "connect: (%s, %s)" % (self.host, self.port)
+                self.sock.connect(sa)
+            except socket.error, msg:
+                if self.debuglevel > 0:
+                    print 'connect fail:', (self.host, self.port)
+                self.sock.close()
+                self.sock = None
+                continue
+            break
+        if not self.sock:
+            raise socket.error, msg
 
     def close(self):
         """Close the connection to the HTTP server."""

As you can see, the modified code can simultaneously access both IPv4 and IPv6 hosts, and will pick whatever it can connect to best. Without getaddrinfo, httplib would continue to support IPv4 hosts only. The IPv6 support itself is absolutely optional. If it is not available, getaddrinfo will never return IPv6 addresses, or propose AF_INET6 as the address family. > What about the zillion other 'obscure' ports ? OS/2 ? Palm ? MacOS 9 ;) If > this patch can't be zero-impact-if-necessary, I'm a firm -1 on it. But I > don't think it can't, it just takes more work. Depends on what zero-impact-if-necessary means to you. The patch, as it stands, can be fixed to compile on all systems that are currently supported. It cannot be fixed to be taken completely out (unless you literally do that: take it out). I don't plan to fight for it too much. Please have a look at the code itself, and try to cooperate on integrating it. Don't reject it outright without having even looked at it. If I get strong rejections from everybody, I'll just withdraw it and feel sorry for the time I've already spent with it. Regards, Martin
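[An aside on the pattern above: the same loop could live in one place instead of being repeated per module -- essentially the "convenience function" idea Fredrik Lundh raises later in this thread. The sketch below is illustrative only: the name connect_to_host is invented here, it assumes the socket.getaddrinfo exposed by itojun's patch, and, like the patch's own loop, it assumes getaddrinfo returns at least one candidate.]

    import socket

    def connect_to_host(host, port):
        # Try each candidate address in turn; on an IPv4-only system
        # getaddrinfo never proposes AF_INET6 entries, so nothing here
        # depends on IPv6 being available.
        sock = None
        for res in socket.getaddrinfo(host, port, 0, socket.SOCK_STREAM):
            af, socktype, proto, canonname, sa = res
            try:
                sock = socket.socket(af, socktype, proto)
                sock.connect(sa)
            except socket.error, msg:
                if sock:
                    sock.close()
                sock = None
                continue
            break
        if not sock:
            # msg was bound by the except clause above, as in the patch
            raise socket.error, msg
        return sock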
From m.favas@per.dem.csiro.au Sun Jun 24 23:16:25 2001 From: m.favas@per.dem.csiro.au (Mark Favas) Date: Mon, 25 Jun 2001 06:16:25 +0800 Subject: [Python-Dev] Re: Problems with socketmodule (getnameinfo & getaddrinfo) References: <200106242139.f5OLdjR22560@mira.informatik.hu-berlin.de> Message-ID: <3B3666B9.335DA17E@per.dem.csiro.au> [Martin v. Loewis] > > > 1: socketmodule.c now #includes getnameinfo.c and > > getaddrinfo.c. These functions both use offsetof(), which is defined > > (on my system, at least) in stddef.h. > > That should be fixed now. stddef.h is included in socketmodule.c; if > it is not available or does not define offsetof, an additional > definition is provided. Yes, this is fine now... > > > 2. [...] Changes to either of the get{name,addr}info.c files will > > not cause socketmodule to be rebuilt. > > I don't know how to solve this one. If distutils builds the modules, > makefile dependencies won't help. > > > 3. The socket module still does not work, however, since it refers > > to an unresolved symbol inet_pton > > I took the simplest solution that I could think of, delegating > inet_{pton,ntop} to inet_{ntoa,addr} for AF_INET, failing for all > other address families (AF_INET6 in particular). I've verified that > this code does the same as the builtin functions on my Linux system; > please let me know whether it compiles for you. > To get socketmodule.c to compile, I had to make a change to line 2963 so that the declaration of inet_pton matched the previous declaration on line 220 (changing char *src to const char *src). Still have problems though, due to the use of snprintf in getnameinfo.c:

Python 2.2a0 (#444, Jun 25 2001, 05:58:17) [C] on osf1V4
Type "copyright", "credits" or "license" for more information.
>>> import socket
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/home/gonzo1/mark/groucho1/mark/src/python/CVS/python/dist/src/Lib/socket.py", line 41, in ?
    from _socket import *
ImportError: Unresolved symbol in /home/gonzo1/mark/groucho1/mark/src/python/CVS/python/dist/src/build/lib.osf1-V4.0-alpha-2.2/_socket.so: snprintf

Cheers, Mark -- Mark Favas - m.favas@per.dem.csiro.au CSIRO, Private Bag No 5, Wembley, Western Australia 6913, AUSTRALIA From tim.one@home.com Mon Jun 25 06:02:30 2001 From: tim.one@home.com (Tim Peters) Date: Mon, 25 Jun 2001 01:02:30 -0400 Subject: [Python-Dev] IPv6 and Windows In-Reply-To: <3B35C9AB.2D1D2185@lemburg.com> Message-ID: >> http://msdn.microsoft.com/downloads/sdks/platform/tpipv6.asp [MAL] > Isn't this the MS SDK that has the new "Open Source" license > clause in it ?! No. That was for the "Mobile Internet Toolkit" toolkit; no relation, AFAICT. > If yes, I very much doubt that this approach > would be feasible for Python... > > http://msdn.microsoft.com/downloads/eula_mit.htm From tim.one@home.com Mon Jun 25 06:14:17 2001 From: tim.one@home.com (Tim Peters) Date: Mon, 25 Jun 2001 01:14:17 -0400 Subject: [Python-Dev] IPv6 and Windows In-Reply-To: <200106241748.f5OHm3001480@mira.informatik.hu-berlin.de> Message-ID: [Martin v. Loewis] > ...
> So the only question then is where we get an implementation of these > functions if the system doesn't provide one. itojun has suggested the > WIDE libraries; since they apparently don't compile on Windows, I've > suggested the MS TP emulation. If the latter is not acceptable, we > either have to fix the WIDE implementation to work on Windows also; I don't have cycles for this, but will cheerily suggest that the WIDE problems didn't appear especially deep, just "the usual" careless brand of Unix+gcc+glibc specific coding. For example, HAVE_LONG_LONG is #define'd on Windows, but, just as in Python source, you can't *use* "long long" literally, you have to use the LONG_LONG macro instead. Then Windows doesn't have an offsetof() macro, or an snprintf() either. Etc. The code is in trouble exactly where it relies on platform-specific extensions to the std C language and library. Problems with those won't be unique to Windows, either, which is a deeper concern (but already well expressed by others). It would be nice if Python could contribute portability back to WIDE. That requires worker bees, though, and lots of x-platform testing. If it turns out we can't swing that, then support for this is premature, and we should wait, e.g., for WIDE to put more effort into porting their code. From just@letterror.com Mon Jun 25 07:55:17 2001 From: just@letterror.com (Just van Rossum) Date: Mon, 25 Jun 2001 08:55:17 +0200 Subject: [Python-Dev] os.path.normcase() in site.py Message-ID: <20010625085521-r01010600-9a6226c8@213.84.27.177> I noticed that these days __file__ attributes of modules are case normalized (ie. lowercased on case insensitive file systems), or at least the directory part. Then I noticed that this is caused by the fact that all sys.path entries are case normalized. It turns out that site.py does this, in a function called makepath(), added by Fred about 8 months ago. I think this is wrong: we should always try to *preserve* case. I see os.path.normcase() as a tool to be able to better compare two paths, but you shouldn't *store* paths this way. I for one am irritated when I see a path that doesn't have the proper case. The intention of makepath() in site.py seems good -- it turns all paths into absolute paths -- but is the normcase really necessary? *** Please CC follow-ups to me, as I'm not on python-dev. Just From martin@loewis.home.cs.tu-berlin.de Mon Jun 25 07:39:44 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Mon, 25 Jun 2001 08:39:44 +0200 Subject: [Python-Dev] Re: Problems with socketmodule (getnameinfo & getaddrinfo) In-Reply-To: <3B3666B9.335DA17E@per.dem.csiro.au> (message from Mark Favas on Mon, 25 Jun 2001 06:16:25 +0800) References: <200106242139.f5OLdjR22560@mira.informatik.hu-berlin.de> <3B3666B9.335DA17E@per.dem.csiro.au> Message-ID: <200106250639.f5P6die01246@mira.informatik.hu-berlin.de> > To get socketmodule.c to compile, I had to make a change to line 2963 > so that the declaration of inet_pton matched the previous declaration on > line 220 (changing char *src to const char *src). Still have problems > though, due to the use of snprintf in getnameinfo.c: Ok, they are printing a single number into a 512 byte buffer; that is safe even with sprintf only, so I have just removed the snprintf call. Can you please try again? Thanks for your reports, Martin
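[For readers following along: the C fallback Martin describes behaves roughly like this Python rendering. The names inet_pton_fallback and inet_ntop_fallback are invented for illustration; socket.inet_aton and socket.inet_ntoa are the long-standing IPv4-only helpers being delegated to.]

    import socket

    def inet_pton_fallback(af, ip):
        # AF_INET is delegated to the old IPv4 helper; every other
        # address family (AF_INET6 in particular) simply fails.
        if af == socket.AF_INET:
            return socket.inet_aton(ip)
        raise socket.error, "address family not supported"

    def inet_ntop_fallback(af, packed):
        if af == socket.AF_INET:
            return socket.inet_ntoa(packed)
        raise socket.error, "address family not supported"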
From thomas@xs4all.net Mon Jun 25 08:20:53 2001 From: thomas@xs4all.net (Thomas Wouters) Date: Mon, 25 Jun 2001 09:20:53 +0200 Subject: [Python-Dev] os.path.normcase() in site.py In-Reply-To: <20010625085521-r01010600-9a6226c8@213.84.27.177> References: <20010625085521-r01010600-9a6226c8@213.84.27.177> Message-ID: <20010625092053.S8098@xs4all.nl> On Mon, Jun 25, 2001 at 08:55:17AM +0200, Just van Rossum wrote: > *** Please CC follow-ups to me, as I'm not on python-dev. Is that by choice ? It seems rather... peculiar, to me, that you have checkin access but aren't on python-dev. You'll miss all those wonderful "Don't touch CVS, I'm building a release" and "Who put CVS in an unstable state?" messages. -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From tim.one@home.com Mon Jun 25 08:51:00 2001 From: tim.one@home.com (Tim Peters) Date: Mon, 25 Jun 2001 03:51:00 -0400 Subject: [Python-Dev] os.path.normcase() in site.py In-Reply-To: <20010625092053.S8098@xs4all.nl> Message-ID: [Just van Rossum] > *** Please CC follow-ups to me, as I'm not on python-dev. [Thomas Wouters] > Is that by choice ? It seems rather... peculiar, to me, that you have > checkin access but aren't on python-dev. Well, I suppose it's supposed to be a secret, but Guido and Just haven't talked in 17 years come Wednesday. IIRC, something about a bottle of wine and a toilet seat, and a small but energetic ferret. Just hacked his way into SourceForge access (those skills just run in the family, I guess), but every time he hacks onto Python-Dev Guido detects it and locks him out again. It's very sad, really -- but also wonderfully Dutch. at-least-that's-the-best-explanation-i-can-think-of-ly y'rs - tim From thomas@xs4all.net Mon Jun 25 09:35:38 2001 From: thomas@xs4all.net (Thomas Wouters) Date: Mon, 25 Jun 2001 10:35:38 +0200 Subject: [Python-Dev] os.path.normcase() in site.py In-Reply-To: References: Message-ID: <20010625103538.T8098@xs4all.nl> On Mon, Jun 25, 2001 at 03:51:00AM -0400, Tim Peters wrote: [ Tim explains about the century-old, horrid blood feud that cost the lives of many an innocent ferret, not to mention bottles of wine, caused by Just's future attempts to join python-dev -- damn that timemachine ] Okay... how about someone takes Guido out for dinner and feeds him way too many bottles of wine and ferrets to show him such things do not necessarily lead to blood feuds ? Maybe take along some psychotropic drugs and a halfway decent hypnotist for safety's measure. Meanwhile Barry subscribes Just to python-dev and you or someone else with the pickpocket skills to get at the keys for the time machine (come on, fess up, you all practiced) make sure Guido can't get at it, lest he try and make up with Just in the past in his 'suggestable' state... Better change the Mailman admin password too, just to be on the safe side. Or if that has no chance of a prayer in hell of working, I can give Just a secret xs4all.nl address (since he has an XS4ALL account nowadays, that shouldn't be a problem) and we just never tell Guido that py-dev@xs4all.nl is really Just ;) > It's very sad, really -- but also wonderfully Dutch. No, it would only be wonderfully Dutch if either brother was German or Belgian in some way, or of royal blood and married to the wrong type of christian sect (Protestant or Catholic -- I keep forgetting which is which.) -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread!
From tim.one@home.com Mon Jun 25 10:05:23 2001 From: tim.one@home.com (Tim Peters) Date: Mon, 25 Jun 2001 05:05:23 -0400 Subject: [Python-Dev] RE: [Python-iterators] Death by Leakage In-Reply-To: Message-ID: Here's a simpler leaker, amounting to an insanely convoluted way to generate the ints 1, 2, 3, ...:

    DO_NOT_LEAK = 1

    class LazyList:
        def __init__(self, g):
            self.sofar = []
            self.fetch = g.next

        def __getitem__(self, i):
            sofar, fetch = self.sofar, self.fetch
            while i >= len(sofar):
                sofar.append(fetch())
            return sofar[i]

        def clear(self):
            self.__dict__.clear()

    def plus1(g):
        for i in g:
            yield i + 1

    def genm23():
        yield 1
        for i in plus1(m23):
            yield i

    for i in range(10000):
        m23 = LazyList(genm23())
        [m23[i] for i in range(50)]
        if DO_NOT_LEAK:
            m23.clear()

Neil, it would help if genobjects had a memberlist so that the struct members were discoverable from Python code; that would also let me add appropriate methods to Cyclops.py to find cycles automatically. Anyway, m23 is a LazyList instance, where m23.fetch is genm23().next, i.e. m23.fetch is a bound method of the genm23() generator-iterator. So the frame for genm23 is reachable from m23.__dict__. That frame contains an anonymous (it's living in the frame's valuestack) generator-iterator thingie corresponding to the plus1(m23) call. *That* generator's frame in turn has m23 in its locals (m23 was an argument to plus1), and another iterator method referencing m23 in its valuestack (due to the "for i in g"). But m23 is the LazyList instance we started with, so there's a cycle, and clearing m23.__dict__ breaks it. gc doesn't chase generators or frames, so it can't clean this stuff up if we don't clear the dict. So this appears hopeless unless gc adds both generators and frames to its repertoire. OTOH, it's got to be rare -- maybe . Worth it? From loewis@informatik.hu-berlin.de Mon Jun 25 10:43:33 2001 From: loewis@informatik.hu-berlin.de (Martin von Loewis) Date: Mon, 25 Jun 2001 11:43:33 +0200 (MEST) Subject: [Python-Dev] make static Message-ID: <200106250943.LAA24576@pandora.informatik.hu-berlin.de> There is a bug report on SF that 'make static' fails for a Makefile.pre.in extension, see http://sourceforge.net/tracker/index.php?func=detail&aid=435446&group_id=5470&atid=105470 Is that process still supported? Unless I'm mistaken, this is complicated by the fact that Makefile.pre.in packages use the Makefile.pre.in that comes with the package, not the one that comes with the Python installation. Any insights welcome, Martin From jack@oratrix.nl Mon Jun 25 11:18:40 2001 From: jack@oratrix.nl (Jack Jansen) Date: Mon, 25 Jun 2001 12:18:40 +0200 Subject: [Python-Dev] Re: Problems with socketmodule (getnameinfo & getaddrinfo) In-Reply-To: Message by Mark Favas , Mon, 25 Jun 2001 06:16:25 +0800 , <3B3666B9.335DA17E@per.dem.csiro.au> Message-ID: <20010625101842.B6BC6303182@snelboot.oratrix.nl> I'm having a lot of problems with the new getaddrinfo stuff: no prototypes used in various routines, missing consts in routine declarations and then passing const strings to it, all routines seem to be globals (and with pretty dangerous names) even though they all look pretty static to me, etc. Could whoever put this in do a round of quality control on it, please?
-- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen@oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From jack@oratrix.nl Mon Jun 25 11:28:08 2001 From: jack@oratrix.nl (Jack Jansen) Date: Mon, 25 Jun 2001 12:28:08 +0200 Subject: [Python-Dev] os.path.normcase() in site.py In-Reply-To: Message by Just van Rossum , Mon, 25 Jun 2001 08:55:17 +0200 , <20010625085521-r01010600-9a6226c8@213.84.27.177> Message-ID: <20010625102809.42357303182@snelboot.oratrix.nl> > I noticed that these days __file__ attributes of modules are case normalized > (ie. lowercased on case insensitive file systems), or at least the directory > part. Then I noticed that this is caused by the fact that all sys.path entries > are case normalized. It turns out that site.py does this, in a function called > makepath(), added by Fred about 8 months ago. > > I think this is wrong: we should always try to *preserve* case. There is an added problem with the makepath() stuff that I hadn't reported here yet: it has broken MacPython on some non-western machines. Specifically I've had reports of people running a Japanese MacOS that things will break if they run Python from a pathname that has any non-7-bit-ascii characters in the name. Apparently normcase normalizes more than just ascii upper/lowercase letters. And aside from that I fully agree with Just: seeing a stacktrace with all lowercase filenames is _very_ disconcerting. I would disable the case-normalization for MacPython, except that I don't know whether it actually has a function. With MacPython's way of finding the initial sys.path contents we don't have the Windows-Python problem that we add the same directory 5 times (once in uppercase, once in lowercase, once in mixed case, once in mixed-case with / for \, etc:-), so if this is what it's trying to solve we can take it out easily. -- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen@oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From fredrik@pythonware.com Mon Jun 25 13:12:23 2001 From: fredrik@pythonware.com (Fredrik Lundh) Date: Mon, 25 Jun 2001 14:12:23 +0200 Subject: [Python-Dev] IPv6 and Windows References: <200106241748.f5OHm3001480@mira.informatik.hu-berlin.de> <20010624233537.R8098@xs4all.nl> <200106242156.f5OLum222759@mira.informatik.hu-berlin.de> Message-ID: <006101c0fd70$17a6b660$0900a8c0@spiff> martin wrote: > getaddrinfo offers protocol-independent address lookup. It is > necessary to use that API to support AF_INET and AF_INET6 > transparently in application code. itojun proposes to change a number > of standard library modules. 
> Please have a look at the actual patch > for details; the typical change will look like this (for httplib)
>
> diff -u -r1.35 httplib.py
> --- Lib/httplib.py	2001/06/01 16:25:38	1.35
> +++ Lib/httplib.py	2001/06/24 04:41:48
> @@ -357,10 +357,22 @@
>
>      def connect(self):
>          """Connect to the host and port specified in __init__."""
> -        self.sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
> -        if self.debuglevel > 0:
> -            print "connect: (%s, %s)" % (self.host, self.port)
> -        self.sock.connect((self.host, self.port))
> +        for res in socket.getaddrinfo(self.host, self.port, 0, socket.SOCK_STREAM):
> +            af, socktype, proto, canonname, sa = res
> +            try:
> +                self.sock = socket.socket(af, socktype, proto)
> +                if self.debuglevel > 0:
> +                    print "connect: (%s, %s)" % (self.host, self.port)
> +                self.sock.connect(sa)
> +            except socket.error, msg:
> +                if self.debuglevel > 0:
> +                    print 'connect fail:', (self.host, self.port)
> +                self.sock.close()
> +                self.sock = None
> +                continue
> +            break
> +        if not self.sock:
> +            raise socket.error, msg

instead of adding code like that to every single module, maybe we should add a convenience function to the socket module? (and make that function smart enough to work also if getaddrinfo isn't supported by the native platform...) From guido@digicool.com Mon Jun 25 14:40:10 2001 From: guido@digicool.com (Guido van Rossum) Date: Mon, 25 Jun 2001 09:40:10 -0400 Subject: [Python-Dev] make static In-Reply-To: Your message of "Mon, 25 Jun 2001 11:43:33 +0200." <200106250943.LAA24576@pandora.informatik.hu-berlin.de> References: <200106250943.LAA24576@pandora.informatik.hu-berlin.de> Message-ID: <200106251340.f5PDeAO07244@odiug.digicool.com> > There is a bug report on SF that 'make static' fails for a > Makefile.pre.in extension, see > > http://sourceforge.net/tracker/index.php?func=detail&aid=435446&group_id=5470&atid=105470 > > Is that process still supported? Unless I'm mistaken, this is > complicated by the fact that Makefile.pre.in packages use the > Makefile.pre.in that comes with the package, not the one that comes > with the Python installation. > > Any insights welcome, > > Martin As long as it works, it works. I don't think there's a reason to spend more than absolutely minimal time trying to keep it working though -- we're trying to encourage everybody to migrate towards distutils. So (without having seen the SF report) I'd say "tough luck". --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@digicool.com Mon Jun 25 14:40:47 2001 From: guido@digicool.com (Guido van Rossum) Date: Mon, 25 Jun 2001 09:40:47 -0400 Subject: [Python-Dev] os.path.normcase() in site.py In-Reply-To: Your message of "Mon, 25 Jun 2001 10:35:38 +0200." <20010625103538.T8098@xs4all.nl> References: <20010625103538.T8098@xs4all.nl> Message-ID: <200106251340.f5PDele07256@odiug.digicool.com> No need to get me drunk. Barry & I decided to change this policy weeks ago, but (in order to avoid a flurry of subscription requests from functional-language proponents) we decided to keep the policy change a secret. :-) Just can subscribe safely now. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@digicool.com Mon Jun 25 14:40:06 2001 From: guido@digicool.com (Guido van Rossum) Date: Mon, 25 Jun 2001 09:40:06 -0400 Subject: [Python-Dev] os.path.normcase() in site.py In-Reply-To: Your message of "Mon, 25 Jun 2001 12:28:08 +0200."
<20010625102809.42357303182@snelboot.oratrix.nl> References: <20010625102809.42357303182@snelboot.oratrix.nl> Message-ID: <200106251340.f5PDe6e07238@odiug.digicool.com> > > I noticed that these days __file__ attributes of modules are case > > normalized (ie. lowercased on case insensitive file systems), or > > at least the directory part. Then I noticed that this is caused by > > the fact that all sys.path entries are case normalized. It turns > > out that site.py does this, in a function called makepath(), added > > by Fred about 8 months ago. > > > > I think this is wrong: we should always try to *preserve* case. > > There is an added problem with the makepath() stuff that I hadn't > reported here yet: it has broken MacPython on some non-western > machines. Specifically I've had reports of people running a Japanese > MacOS that things will break if they run Python from a pathname that > has any non-7-bit-ascii characters in the name. Apparently normcase > normalizes more than just ascii upper/lowercase letters. > > And aside from that I fully agree with Just: seeing a stacktrace > with all lowercase filenames is _very_ disconcerting. > > I would disable the case-normalization for MacPython, except that I > don't know whether it actually has a function. With MacPython's way > of finding the initial sys.path contents we don't have the > Windows-Python problem that we add the same directory 5 times (once > in uppercase, once in lowercase, once in mixed case, once in > mixed-case with / for \, etc:-), so if this is what it's trying to > solve we can take it out easily. I can't think of any function besides the attempt to avoid duplicates. I think that even on Windows, retaining case makes sense. I think that there's a way to avoid duplicates without case-folding everything. (E.g. use a case-folding comparison instead.) I wonder if maybe path entries should be normpath'd though? I'll leave it to Fred, Jack or Just to fix this. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@digicool.com Mon Jun 25 14:41:46 2001 From: guido@digicool.com (Guido van Rossum) Date: Mon, 25 Jun 2001 09:41:46 -0400 Subject: [Python-Dev] IPv6 and Windows In-Reply-To: Your message of "Sun, 24 Jun 2001 19:48:03 +0200." <200106241748.f5OHm3001480@mira.informatik.hu-berlin.de> References: <200106241748.f5OHm3001480@mira.informatik.hu-berlin.de> Message-ID: <200106251341.f5PDfkg07283@odiug.digicool.com> > The problem is that the library patches (httplib, ftplib, etc) do use > getaddrinfo to find out how to contact a remote system, which is the > right thing to do IMO. So even if the IPv6 support can be activated > only if desired, getaddrinfo absolutely has to work. Yes, but in an IPv4-only environment it would be super trivial to implement, right? --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@digicool.com Mon Jun 25 14:42:18 2001 From: guido@digicool.com (Guido van Rossum) Date: Mon, 25 Jun 2001 09:42:18 -0400 Subject: [Python-Dev] IPv6 and Windows In-Reply-To: Your message of "Sun, 24 Jun 2001 20:08:13 +0200." 
<3B362C8D.D3AECE3C@lemburg.com> References: <200106240820.f5O8KVN01435@mira.informatik.hu-berlin.de> <3B35C9AB.2D1D2185@lemburg.com> <200106241700.f5OH0hm01021@mira.informatik.hu-berlin.de> <3B362C8D.D3AECE3C@lemburg.com> Message-ID: <200106251342.f5PDgI107298@odiug.digicool.com> > > # If you redistribute the SOFTWARE and/or your Source Modifications, > > # or any portion thereof as provided above, you agree: (i) to > > # distribute the SOFTWARE only in conjunction with, and as part of, > > # your Source Modifications which add significant functionality to the > > # SOFTWARE; (ii) to distribute the SOFTWARE and/or your Source > > # Modifications solely as part of your research and not in any > > # commercial product; (iii) the SOFTWARE and/or your Source > > # Modifications will not be distributed for profit; (iv) to retain all > > # branding, copyright and trademark notices included with the SOFTWARE > > # and include a copy of this EULA with any distribution of the > > # SOFTWARE, or any portion thereof; and (v) to indemnify, hold > > # harmless, and defend Microsoft from and against any claims or > > # lawsuits, including attorneys' fees, that arise or result from > > # the use or distribution of your Source Modifications. > > > > I don't know whether this is acceptable or not. > > Most likely not: there are lots of commercial Python users out there > who wouldn't like these clauses at all... we'd also lose the GPL > compatibility. Don't even *think* about using code with that license. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@digicool.com Mon Jun 25 14:43:04 2001 From: guido@digicool.com (Guido van Rossum) Date: Mon, 25 Jun 2001 09:43:04 -0400 Subject: [Python-Dev] os.path.normcase() in site.py In-Reply-To: Your message of "Mon, 25 Jun 2001 12:28:08 +0200." <20010625102809.42357303182@snelboot.oratrix.nl> References: <20010625102809.42357303182@snelboot.oratrix.nl> Message-ID: <200106251343.f5PDh4907304@odiug.digicool.com> > > I noticed that these days __file__ attributes of modules are case > > normalized (ie. lowercased on case insensitive file systems), or > > at least the directory part. Then I noticed that this is caused by > > the fact that all sys.path entries are case normalized. It turns > > out that site.py does this, in a function called makepath(), added > > by Fred about 8 months ago. > > > > I think this is wrong: we should always try to *preserve* case. > > There is an added problem with the makepath() stuff that I hadn't > reported here yet: it has broken MacPython on some non-western > machines. Specifically I've had reports of people running a Japanese > MacOS that things will break if they run Python from a pathname that > has any non-7-bit-ascii characters in the name. Apparently normcase > normalizes more than just ascii upper/lowercase letters. > > And aside from that I fully agree with Just: seeing a stacktrace > with all lowercase filenames is _very_ disconcerting. > > I would disable the case-normalization for MacPython, except that I > don't know whether it actually has a function. With MacPython's way > of finding the initial sys.path contents we don't have the > Windows-Python problem that we add the same directory 5 times (once > in uppercase, once in lowercase, once in mixed case, once in > mixed-case with / for \, etc:-), so if this is what it's trying to > solve we can take it out easily. I can't think of any function besides the attempt to avoid duplicates. I think that even on Windows, retaining case makes sense. 
I think that there's a way to avoid duplicates without case-folding everything. (E.g. use a case-folding comparison instead.) I wonder if maybe path entries should be normpath'd though? I'll leave it to Fred, Jack or Just to fix this. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@digicool.com Mon Jun 25 14:43:25 2001 From: guido@digicool.com (Guido van Rossum) Date: Mon, 25 Jun 2001 09:43:25 -0400 Subject: [Python-Dev] make static In-Reply-To: Your message of "Mon, 25 Jun 2001 11:43:33 +0200." <200106250943.LAA24576@pandora.informatik.hu-berlin.de> References: <200106250943.LAA24576@pandora.informatik.hu-berlin.de> Message-ID: <200106251343.f5PDhQ407309@odiug.digicool.com> > There is a bug report on SF that 'make static' fails for a > Makefile.pre.in extension, see > > http://sourceforge.net/tracker/index.php?func=detail&aid=435446&group_id=5470&atid=105470 > > Is that process still supported? Unless I'm mistaken, this is > complicated by the fact that Makefile.pre.in packages use the > Makefile.pre.in that comes with the package, not the one that comes > with the Python installation. > > Any insights welcome, > > Martin As long as it works, it works. I don't think there's a reason to spend more than absolutely minimal time trying to keep it working though -- we're trying to encourage everybody to migrate towards distutils. So (without having seen the SF report) I'd say "tough luck". --Guido van Rossum (home page: http://www.python.org/~guido/) From skip@pobox.com (Skip Montanaro) Mon Jun 25 14:50:31 2001 From: skip@pobox.com (Skip Montanaro) Date: Mon, 25 Jun 2001 08:50:31 -0500 Subject: [Python-Dev] xrange vs generators Message-ID: <15159.16807.480121.637386@beluga.mojam.com> With generators in the language, should xrange be deprecated? Skip From just@letterror.com Mon Jun 25 15:05:43 2001 From: just@letterror.com (Just van Rossum) Date: Mon, 25 Jun 2001 16:05:43 +0200 Subject: [Python-Dev] os.path.normcase() in site.py In-Reply-To: <200106251343.f5PDh4907304@odiug.digicool.com> Message-ID: <20010625160545-r01010600-e232a14e@213.84.27.177> Guido van Rossum wrote: > I can't think of any function besides the attempt to avoid duplicates. > > I think that even on Windows, retaining case makes sense. > > I think that there's a way to avoid duplicates without case-folding > everything. (E.g. use a case-folding comparison instead.) > > I wonder if maybe path entries should be normpath'd though? They are already; they go through abspath(), which calls normpath(). > I'll leave it to Fred, Jack or Just to fix this. If it were up to me, I'd simply remove the normcase() call from makepath(). Just From arigo@ulb.ac.be Mon Jun 25 14:08:52 2001 From: arigo@ulb.ac.be (Armin Rigo) Date: Mon, 25 Jun 2001 15:08:52 +0200 Subject: [Python-Dev] Python Specializing Compiler In-Reply-To: <200106221259.OAA02519@core.inf.ethz.ch> Message-ID: <4.3.1.0.20010625134824.00abde60@127.0.0.1> Hello everybody, A note about what I have in mind about Psyco... Type-sets are independent from memory representation. In other words, it is not because two variables can take the same set of values that the data is necessarily encoded in the same way in memory. In particular, I believe we won't need to change the way the current Python interpreter encodes data. For example, instances currently have a dictionary of attributes and no "fixed slots", but this is not a problem for Psyco, which can encode instances in better ways (e.g.
as a C struct) as long as it is only accessed by Psyco-compiled Python code and no "legacy" code. This approach also allows Psyco to completely remove the overhead of creating bound method objects and frame objects; both are generally temporary, and so during their whole lifetime they can be represented much more efficiently in memory. For frame objects it should be clear (we probably need no frame at all as long as no exception exits the current procedure, and even in this case it could be optimized). For method objects we use "memory sharing", a technique already applied in the current Psyco. More precisely, if some (immutable) data is found at some memory location (or machine register) and Python code says it should be duplicated, we need not duplicate it at all; we can just consider that the copy is at the same location as the original. For method objects it means the following: suppose you have an instance "xyz" and query its "foo()" method. Suppose that you can (at some time) be sure that, because of the class of "xyz", "xyz.foo" will always be the Python function "f". Then the method object's representation can be simplified: all it needs to store in memory is a pointer to "xyz", because "f" is a constant part. Now a single pointer to the "xyz" instance is exactly the same memory format as the original "xyz" variable, so that this particular representation of a bound method object can share the original "xyz" pointer. No actual machine code is produced; Psyco simply notes that both "xyz" and "xyz.foo" are represented at the same location, although "xyz" represents an instance with the given pointer, and "xyz.foo" represents the "f" function with its first argument bound to the given pointer. According to est@hyperreal.org, method and frame objects each represent 20% of the execution time... (Est, on which kind of machine did you get Psyco run the sample code 5 times faster !? It's only 2 times faster on a modern Pentium...) A bientôt, Armin. From arigo@ulb.ac.be Mon Jun 25 14:45:20 2001 From: arigo@ulb.ac.be (Armin Rigo) Date: Mon, 25 Jun 2001 15:45:20 +0200 Subject: [Python-Dev] Python Specializing Compiler In-Reply-To: <200106221259.OAA02519@core.inf.ethz.ch> Message-ID: <4.3.1.0.20010625150819.00aa5220@127.0.0.1> Hello, At 14:59 22.06.2001 +0200, Samuele Pedroni wrote: >*: some possible useful hooks would be: >- minimal profiling support in order to specialize only things called often >- feedback for dynamic changing of methods, class hierarchy, ... if we want >to optimize method lookup (which would make sense) >- a mixed fixed slots/dict layout for instances. There is one point that you didn't mention, which I believe is important: how to handle global/builtin variables. First, a few words about the current Python semantics. * I am sorry if what follows has already been discussed; I am raising the question again because it might be important for Psyco. If you feel this should better be a PEP please just tell me so. * Complete lexical scoping was recently added, implemented with "free" and "cell" variables. These are only used for functions defined inside of other functions; top-level functions use the opcode LOAD_GLOBAL for all non-local variables. LOAD_GLOBAL performs one or two dictionary look-ups (two if the variable is built-in). For simple built-ins like "len" this might be expensive (has someone measured such costs ?).
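[The lookup cost Armin asks about here is visible directly in the bytecode. A small experiment with the dis module -- the function names are arbitrary -- shows a builtin reference compiling to LOAD_GLOBAL, resolved through the globals dict and then __builtins__ on every call, while a "free" variable compiles to LOAD_DEREF, a direct cell fetch:]

    from __future__ import nested_scopes   # needed on 2.1; default later
    import dis

    def outer():
        n = 1
        def inner(x):
            # "len" is a builtin: LOAD_GLOBAL, one or two dict lookups
            # per call.  "n" is a free variable: LOAD_DEREF, no dicts.
            return len(x) + n
        return inner

    dis.dis(outer())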
I suggest generalizing the compile-time lexical scoping rules. Let's compile all functions' non-local variables (top-level and others) as "free" variables. This means the corresponding module's global variables must be "cell" variables. This is just what we would get if the module's code was one big function enclosing the definition of all the other functions. Next, the variables not defined in the module (the built-ins) are "free" variables of the module, and the built-in module provides "cell" variables for them. Remember that "free" and "cell" variables are linked together when the function (or module in this case) is defined (for functions, when "def" is executed; for modules, it would be at load-time). Benefit: not a single dictionary look-up any more; uniformity of treatment. Potential code break: global variables shadowing built-ins would behave like local variables shadowing globals, i.e. the mere presence of a global "xyz=..." would forever hide the "xyz" built-in from the module, even before the assignment or after a "del xyz". (cf. UnboundLocalError.) To think about: what the "global" keyword would mean in this context. Implementation problems: if we want to keep the module's dictionary of global variables (and we certainly do) it would require changes to the dictionary implementation (or the creation of a different kind of dictionary). One solution is to automatically dereference cell objects and raise exceptions upon reading empty cells. Another solution is to turn dictionaries into collections of objects that all behave like cell objects (so that if "d" is any dictionary, something like "d.ref(key)" would let us get a cell object which could be read or written later to actually get or set the value associated with "key", and "d[key]" would mean "d.ref(key).cell_ref"). Well, these are just proposals; they might not be a good solution. Why it is related to Psyco: the current treatment of globals/builtins makes it hard for Psyco to statically tell what function we are calling when it sees e.g. "len(a)" in the code. We would at least need some help from the interpreter; at least hooks called when the module's globals() dictionary change. The above proposal might provide a more uniform solution. Thanks for your attention. Armin. From guido@digicool.com Mon Jun 25 15:26:08 2001 From: guido@digicool.com (Guido van Rossum) Date: Mon, 25 Jun 2001 10:26:08 -0400 Subject: [Python-Dev] xrange vs generators In-Reply-To: Your message of "Mon, 25 Jun 2001 08:50:31 CDT." <15159.16807.480121.637386@beluga.mojam.com> References: <15159.16807.480121.637386@beluga.mojam.com> Message-ID: <200106251426.f5PEQ8907629@odiug.digicool.com> > With generators in the language, should xrange be deprecated? > > Skip No, but maybe xrange() should be changed to return an iterator. E.g. something like this:

    def xrange(start, stop, step):
        while start < stop:
            yield start
            start += step

but with the appropriate defaults, and reversal of the test if step < 0, and an error if step == 0, and type checks enforcing ints (or long ints!), and implemented in C. :-) Although xrange() objects currently support some sequence algebra, that is mostly bogus and I don't think anyone in their right mind uses it. --Guido van Rossum (home page: http://www.python.org/~guido/)
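[Filled out with the defaults and checks Guido lists, the sketch might read as below -- purely illustrative, renamed myxrange so it doesn't shadow the builtin, and using the 2.2-era future-import for generators. Note that, being a generator, its argument checks only fire on the first iteration rather than at call time.]

    from __future__ import generators

    def myxrange(start, stop=None, step=1):
        # A single argument means myxrange(stop), as with the builtin.
        if stop is None:
            start, stop = 0, start
        for x in (start, stop, step):
            if type(x) not in (type(0), type(0L)):
                raise TypeError, "integer arguments required"
        if step == 0:
            raise ValueError, "step must not be zero"
        if step > 0:
            while start < stop:
                yield start
                start = start + step
        else:
            while start > stop:
                yield start
                start = start + step

    # e.g. list(myxrange(10, 0, -3)) -> [10, 7, 4, 1]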
From thomas.heller@ion-tof.com Mon Jun 25 15:37:31 2001 From: thomas.heller@ion-tof.com (Thomas Heller) Date: Mon, 25 Jun 2001 16:37:31 +0200 Subject: [Python-Dev] xrange vs generators References: <15159.16807.480121.637386@beluga.mojam.com> <200106251426.f5PEQ8907629@odiug.digicool.com> Message-ID: <006301c0fd84$5dcefad0$e000a8c0@thomasnotebook> > > With generators in the language, should xrange be deprecated? > > > > Skip > > No, but maybe xrange() should be changed to return an iterator. > E.g. something like this:
>
>     def xrange(start, stop, step):
>         while start < stop:
>             yield start
>             start += step
>
> but with the appropriate defaults, and reversal of the test if step < > 0, and an error if step == 0, and type checks enforcing ints (or long > ints!), and implemented in C. :-) > > Although xrange() objects currently support some sequence algebra, > that is mostly bogus and I don't think anyone in their right mind uses > it. I _was_ using xrange as sets representing (potentially large) ranges of ints. Example:

    positive = xrange(1, sys.maxint)

    if num in positive:
        ...

I didn't follow the iterators discussion: would this continue to work? Thomas From esr@thyrsus.com Mon Jun 25 15:41:34 2001 From: esr@thyrsus.com (Eric S. Raymond) Date: Mon, 25 Jun 2001 10:41:34 -0400 Subject: [Python-Dev] xrange vs generators In-Reply-To: <200106251426.f5PEQ8907629@odiug.digicool.com>; from guido@digicool.com on Mon, Jun 25, 2001 at 10:26:08AM -0400 References: <15159.16807.480121.637386@beluga.mojam.com> <200106251426.f5PEQ8907629@odiug.digicool.com> Message-ID: <20010625104134.B30559@thyrsus.com> Guido van Rossum <guido@digicool.com>: > Although xrange() objects currently support some sequence algebra, > that is mostly bogus and I don't think anyone in their right mind uses > it. I agree. As long as we make those cases fail loudly, I see no objection to dropping support for them. -- Eric S. Raymond Americans have the will to resist because you have weapons. If you don't have a gun, freedom of speech has no power. -- Yoshimi Ishikawa, Japanese author, in the LA Times 15 Oct 1992 From barry@digicool.com Mon Jun 25 15:38:20 2001 From: barry@digicool.com (Barry A. Warsaw) Date: Mon, 25 Jun 2001 10:38:20 -0400 Subject: [Python-Dev] os.path.normcase() in site.py References: <20010625103538.T8098@xs4all.nl> Message-ID: <15159.19676.727068.217548@anthem.wooz.org> >>>>> "TW" == Thomas Wouters writes: TW> Okay... how about someone takes Guido out for dinner and feeds TW> him way too many bottles of wine and ferrets to show him such TW> things do not necessarily lead to blood feuds ? Maybe take TW> along some psychotropic drugs and a halfway decent hypnotist TW> for safety's measure. Don't forget the dentist, proctologist, and a trepanist. Actually, if you can find a holeologist it would be much more efficient (my cousin Neil, a.k.a. Dr. Finger, a.k.a. Dr Watumpka would be ideal, but he's studying in Dortmund these days). TW> Meanwhile Barry subscribes Just to python-dev I'd be glad to, and I won't even divulge the fact that python-dev is only ostensibly a closed, insular mailing list these days. TW> and you or someone else with the pickpocket skills to get at TW> the keys for the time machine No pickpocketing skill necessary. Guido leaves the keys in a small safebox magnetically adhered underneath the running boards. Just be sure to ground yourself first (learned the hard way)!
TW> (come on, fess up, you all practiced) make sure Guido can't TW> get at it, lest he try and make up with Just in the past in TW> his 'suggestable' state... Better change the Mailman admin TW> password too, just to be on the safe side. I've tried that many times, but I suspect Guido has a Pybot hermetically linked to the time machine which "instantly" recedes several seconds into the past each time I change it, only to change it back. TW> Or if that has no chance of a prayer in hell of working, I can TW> give Just a secret xs4all.nl address (since he has an XS4ALL TW> account nowadays, that shouldn't be a problem) and we just TW> never tell Guido that py-dev@xs4all.nl is really Just ;) You realize it's way too "late" for that, don't you? The time machine works just as well in the forward direction as in the past direction, and long before he left the comfy environs of Amsterdam to brave it out in the harsh, unforgiving wilderness of Washington, he mapped out every moment of young Wouters' life. Why do you think I've worn aluminum foil underwear for the past 30 years? Trust me, it's not for the feeling of freshness and confidence it provides (okay, only partially). >> It's very sad, really -- but also wonderfully Dutch. TW> No, it would only be wonderfully Dutch if either brother was TW> German or Belgian in some way, or of royal blood and married TW> to the wrong type of christian sect (Protestant or Catholic -- TW> I keep forgetting which is which.) It would also be wonderfully American, but only if Just had trivially wronged Guido years ago by eating one of his nabisco cookies or some such. -Barry From guido@digicool.com Mon Jun 25 15:47:50 2001 From: guido@digicool.com (Guido van Rossum) Date: Mon, 25 Jun 2001 10:47:50 -0400 Subject: [Python-Dev] xrange vs generators In-Reply-To: Your message of "Mon, 25 Jun 2001 16:37:31 +0200." <006301c0fd84$5dcefad0$e000a8c0@thomasnotebook> References: <15159.16807.480121.637386@beluga.mojam.com> <200106251426.f5PEQ8907629@odiug.digicool.com> <006301c0fd84$5dcefad0$e000a8c0@thomasnotebook> Message-ID: <200106251447.f5PEloH07777@odiug.digicool.com> [me] > > Although xrange() objects currently support some sequence algebra, > > that is mostly bogus and I don't think anyone in their right mind uses > > it. [theller] > I _was_ using xrange as sets representing (potentially large) > ranges of ints. > Example: > > positive = xrange(1, sys.maxint) > > if num in positive: > ... > > I didn't follow the iterators discussion: would this > continue to work? No, it would break. And I see another breakage too:

    r = xrange(10)
    for i in r:
        for j in r:
            print i, j

would not do the right thing if xrange() returned an iterator (because iterators can only be used once). This is too bad; I really wish that xrange() could die or be limited entirely to for loops. I wonder if we could put warnings on xrange() uses beyond the most basic...? --Guido van Rossum (home page: http://www.python.org/~guido/) From Samuele Pedroni Mon Jun 25 15:51:16 2001 From: Samuele Pedroni (Samuele Pedroni) Date: Mon, 25 Jun 2001 16:51:16 +0200 (MET DST) Subject: [Python-Dev] Python Specializing Compiler Message-ID: <200106251451.QAA17756@core.inf.ethz.ch> Hi. [Armin Rigo] ... > Why it is related to Psyco: the current treatment of globals/builtins makes > it hard for Psyco to statically tell what function we are calling when it > sees e.g. "len(a)" in the code. We would at least need some help from the > interpreter; at least hooks called when the module's globals() dictionary > change.
> The above proposal might provide a more uniform solution. FYI, a different proposal for opt. globals access by Jeremy Hylton. It seems it would break fewer things ... don't know whether it can be as useful for Psyco: http://mail.python.org/pipermail/python-dev/2001-May/014995.html In any case I think Psyco will need notification support from the interpreter about dynamic changes to things that Psyco honestly assumes to be invariant in order to achieve performance. regards, Samuele Pedroni. From thomas.heller@ion-tof.com Mon Jun 25 16:05:09 2001 From: thomas.heller@ion-tof.com (Thomas Heller) Date: Mon, 25 Jun 2001 17:05:09 +0200 Subject: [Python-Dev] xrange vs generators References: <15159.16807.480121.637386@beluga.mojam.com> <200106251426.f5PEQ8907629@odiug.digicool.com> <006301c0fd84$5dcefad0$e000a8c0@thomasnotebook> <200106251447.f5PEloH07777@odiug.digicool.com> Message-ID: <00e001c0fd88$3a532140$e000a8c0@thomasnotebook> > [theller] > > I _was_ using xrange as sets representing (potentially large) > > ranges of ints. > > Example: > > > > positive = xrange(1, sys.maxint) > > > > if num in positive: > > ... > > > > I didn't follow the iterators discussion: would this > > continue to work? > > No, it would break. Since there was an off-by-one bug for 'if num in xrange()' in Python 2.0, my code has already been rewritten. Thomas From Samuele Pedroni Mon Jun 25 16:04:45 2001 From: Samuele Pedroni (Samuele Pedroni) Date: Mon, 25 Jun 2001 17:04:45 +0200 (MET DST) Subject: [Python-Dev] Python Specializing Compiler Message-ID: <200106251504.RAA18642@core.inf.ethz.ch> Hi. [Armin Rigo] > In particular, I believe we won't need to change the way the current Python > interpreter encodes data. For example, instances currently have a > dictionary of attributes and no "fixed slots", but this is not a problem > for Psyco, which can encode instances in better ways (e.g. as a C struct) > as long as it is only accessed by Psyco-compiled Python code and no > "legacy" code. This makes sense, but I'm asking if it is affordable to have all code executed (if we aim for usage-transparency) through Psyco-compiled code (memory foot-print, compilation vs. execution trade-offs for rarely executed code). Otherwise in a mixed execution context we would pay for conversions. I can see how a dynamic compiler can deal with methods together with the interpreter that notifies when a dynamic change to hierarchy, method defs can potentially invalidate compiled code. I see more problems with instance data slots, because there are no strong hints in the code about which are the "official" slots of a class, and undisciplined code can treat instances just as dicts. regards, Samuele Pedroni. From fdrake@acm.org Mon Jun 25 16:13:31 2001 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Mon, 25 Jun 2001 11:13:31 -0400 (EDT) Subject: [Python-Dev] os.path.normcase() in site.py In-Reply-To: <200106251343.f5PDh4907304@odiug.digicool.com> References: <20010625102809.42357303182@snelboot.oratrix.nl> <200106251343.f5PDh4907304@odiug.digicool.com> Message-ID: <15159.21787.913782.751691@cj42289-a.reston1.va.home.com> Guido van Rossum writes: > I can't think of any function besides the attempt to avoid duplicates. There were two reasons for adding this code:

 1. Avoid duplicates (speeds imports if there are duplicates and the
    modules are found on an entry after the dupes).

 2. Avoid breakage when a script uses os.chdir(). This is probably
    unusual for large applications, but fairly common for little admin
    helper scripts.
> I think that even on Windows, retaining case makes sense. > > I think that there's a way to avoid duplicates without case-folding > everything. (E.g. use a case-folding comparison instead.) > > I wonder if maybe path entries should be normpath'd though? > > I'll leave it to Fred, Jack or Just to fix this. I certainly agree that this can be improved; if Jack or Just would like to assign it to me on SourceForge, I'd be glad to fix it. -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From tim@digicool.com Mon Jun 25 16:39:47 2001 From: tim@digicool.com (Tim Peters) Date: Mon, 25 Jun 2001 11:39:47 -0400 Subject: [Python-Dev] xrange vs generators In-Reply-To: <200106251447.f5PEloH07777@odiug.digicool.com> Message-ID: [Thomas Heller] > I _was_ using xrange as sets representing (potentially large) > ranges of ints. > Example: > > positive = xrange(1, sys.maxint) > > if num in positive: > ... > I didn't follow the iterators discussion: would this > continue to work? [Guido] > No, it would break. "x in y" works with any iterable y in 2.2, incl. generators. So e.g.

    >>> def xr(n):
    ...     i = 0
    ...     while i < n:
    ...         yield i
    ...         i += 1
    ...
    >>> 1 in xr(10)
    1
    >>> 9 in xr(10)
    1
    >>> 10 in xr(10)
    0
    >>>

However, there's no __contains__ method here, so in the last case it actually did 10 compares. 0 in xr(sys.maxint) is very quick, but I'm still waiting for -1 in xr(sys.maxint) to complete . > And I see another breakage too: This would also apply to Thomas's example of giving a name to an xrange object, if implemented via generator:

    >>> small = xr(5)
    >>> 2 in small
    1
    >>> 2 in small
    0
    >>>

> ... > This is too bad; I really wish that xrange() could die or be limited > entirely to for loops. I wonder if we could put warnings on xrange() > uses beyond the most basic...? Hmm. I'd rather not endure the resulting complaints without a strong rationale for deprecating it. One that strikes close to my heart: there's more code in 2.2 to support xrange than there is to support generators! But users don't care about that. From thomas@xs4all.net Mon Jun 25 16:42:12 2001 From: thomas@xs4all.net (Thomas Wouters) Date: Mon, 25 Jun 2001 17:42:12 +0200 Subject: [Python-Dev] xrange vs generators In-Reply-To: <200106251447.f5PEloH07777@odiug.digicool.com> References: <15159.16807.480121.637386@beluga.mojam.com> <200106251426.f5PEQ8907629@odiug.digicool.com> <006301c0fd84$5dcefad0$e000a8c0@thomasnotebook> <200106251447.f5PEloH07777@odiug.digicool.com> Message-ID: <20010625174211.U8098@xs4all.nl> On Mon, Jun 25, 2001 at 10:47:50AM -0400, Guido van Rossum wrote: [ xrange can't be changed into a generator ] > This is too bad; I really wish that xrange() could die or be limited > entirely to for loops. I wonder if we could put warnings on xrange() > uses beyond the most basic...? Why do we want to do this ? xrange() is still exactly what it was: an object that pretends to be a list of integers. Besides being useful for those who work a lot with ranges, it's a wonderful example of what you can do with Python (even if it isn't actually written in Python :-) I see less reason to deprecate xrange than to deprecate the gopherlib, wave/aifc/audiodev, mhlib, netrc and/or robotparser modules. -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread!
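[Both problems Tim demonstrates -- the generator being one-shot, and "in" walking the whole range -- go away if the object is a class that hands out a fresh generator per loop and answers membership by arithmetic, much like the fresh-iterator idea Guido floats in the next message. A sketch only; the class name is invented and just the step-1 case is handled:]

    from __future__ import generators
    import sys

    class IntRange:
        # Pretends to be xrange(start, stop) with step 1, but is
        # reusable and has a constant-time "in" test.
        def __init__(self, start, stop):
            self.start, self.stop = start, stop

        def __iter__(self):
            # A fresh iterator for every loop, so nested for-loops and
            # repeated membership tests each see the whole range.
            i = self.start
            while i < self.stop:
                yield i
                i = i + 1

        def __contains__(self, num):
            # No iteration at all: -1 in IntRange(1, sys.maxint)
            # answers immediately, unlike the plain generator.
            return self.start <= num < self.stop

    positive = IntRange(1, sys.maxint)
    print 5 in positive       # prints 1
    print -1 in positive      # prints 0, immediately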
References: Message-ID: <200106251607.f5PG7iq08192@odiug.digicool.com>

> Hmm. I'd rather not endure the resulting complaints without a
> strong rationale for deprecating it. One that strikes close to my
> heart: there's more code in 2.2 to support xrange than there is to
> support generators! But users don't care about that.

But I do, and historically this code has often been bug-ridden without anybody noticing -- so it's not like it's needed much.

I would suggest removing most of the fancy features of xrange(), in particular the slice, contains and repeat slots. A step further would be to remove getitem also, and add a tp_getiter slot instead -- returning not itself but a new iterator that iterates through the prescribed sequence.

We need a PEP for this. Anyone? Should be short and sweet.

--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido@digicool.com Mon Jun 25 17:11:10 2001 From: guido@digicool.com (Guido van Rossum) Date: Mon, 25 Jun 2001 12:11:10 -0400 Subject: [Python-Dev] xrange vs generators In-Reply-To: Your message of "Mon, 25 Jun 2001 17:42:12 +0200." <20010625174211.U8098@xs4all.nl> References: <15159.16807.480121.637386@beluga.mojam.com> <200106251426.f5PEQ8907629@odiug.digicool.com> <006301c0fd84$5dcefad0$e000a8c0@thomasnotebook> <200106251447.f5PEloH07777@odiug.digicool.com> <20010625174211.U8098@xs4all.nl> Message-ID: <200106251611.f5PGBA608205@odiug.digicool.com>

> [ xrange can't be changed into a generator ]
>
> > This is too bad; I really wish that xrange() could die or be limited
> > entirely to for loops. I wonder if we could put warnings on xrange()
> > uses beyond the most basic...?
>
> Why do we want to do this? xrange() is still exactly what it was: an object
> that pretends to be a list of integers. Besides being useful for those who
> work a lot with ranges, it's a wonderful example of what you can do with
> Python (even if it isn't actually written in Python :-)

There is exactly *one* idiomatic use of xrange():

    for i in xrange(...): ...

All other operations supported by the xrange object are very rarely used, and historically their implementation has had obvious bugs that no-one noticed for years.

> I see less reason to deprecate xrange than to deprecate the gopherlib,
> wave/aifc/audiodev, mhlib, netrc and/or robotparser modules.

Those are useful application-area libraries for some folks. The idiomatic xrange() object is useful too. But the advanced features of xrange() are an example of code bloat.

--Guido van Rossum (home page: http://www.python.org/~guido/)

From Greg.Wilson@baltimore.com Mon Jun 25 17:25:33 2001 From: Greg.Wilson@baltimore.com (Greg Wilson) Date: Mon, 25 Jun 2001 12:25:33 -0400 Subject: [Python-Dev] RE: Python-Dev digest, Vol 1 #1437 - 13 msgs Message-ID: <930BBCA4CEBBD411BE6500508BB3328F2E27F1@nsamcanms1.ca.baltimore.com>

> Guido:
> Since you have already obtained the same speedup with your approach, I
> think there's great promise. Count on sending in a paper for the next
> Python conference!

Greg: "Doctor Dobb's Journal" would also be interested in an article. Who knows --- it might even be done before the ones on stackless, garbage collection, Zope acquisition, and generators... :-)

Greg
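[Picking up Guido's stripped-down xrange suggestion above: a rough pure-Python sketch of the bare-minimum object -- illustrative only, not the C implementation, class name made up, and it assumes step > 0. Iteration comes for free from the default __getitem__-based sequence iterator:]

    class xrange2:
        # Bare-minimum sequence behavior: x[i], len(x), repr(x).
        def __init__(self, start, stop=None, step=1):
            if stop is None:
                start, stop = 0, start
            self.start, self.stop, self.step = start, stop, step
        def __len__(self):
            return max(0, (self.stop - self.start + self.step - 1) / self.step)
        def __getitem__(self, i):
            if not 0 <= i < len(self):
                raise IndexError, i
            return self.start + i * self.step
        def __repr__(self):
            return 'xrange2(%d, %d, %d)' % (self.start, self.stop, self.step)

    for i in xrange2(3):    # the one idiomatic use still works
        print i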
From just@letterror.com Mon Jun 25 17:47:30 2001 From: just@letterror.com (Just van Rossum) Date: Mon, 25 Jun 2001 18:47:30 +0200 Subject: [Python-Dev] os.path.normcase() in site.py In-Reply-To: <15159.21787.913782.751691@cj42289-a.reston1.va.home.com> Message-ID: <20010625184734-r01010600-dbd1c84a@213.84.27.177>

Guido van Rossum writes:
> I can't think of any function besides the attempt to avoid duplicates.

Fred L. Drake, Jr. wrote:
> There were two reasons for adding this code:
>
> 1. Avoid duplicates (speeds imports if there are duplicates and
>    the modules are found on an entry after the dupes).
>
> 2. Avoid breakage when a script uses os.chdir(). This is
>    probably unusual for large applications, but fairly common for
>    little admin helper scripts.

1) normcase(). Bad.
2) abspath(). Good.

I think #2 is a legitimate problem, but I'm not so sure of #1: is it really so common for sys.path to contain duplicates that it's worth worrying about at all?

> > I'll leave it to Fred, Jack or Just to fix this.
>
> I certainly agree that this can be improved; if Jack or Just would
> like to assign it to me on SourceForge, I'd be glad to fix it.

Here's my proposed fix:

Index: site.py
===================================================================
RCS file: /cvsroot/python/python/dist/src/Lib/site.py,v
retrieving revision 1.27
diff -c -3 -r1.27 site.py
*** site.py	2001/06/12 16:48:52	1.27
--- site.py	2001/06/25 16:42:33
***************
*** 67,73 ****
  def makepath(*paths):
      dir = os.path.join(*paths)
!     return os.path.normcase(os.path.abspath(dir))

  L = sys.modules.values()
  for m in L:
--- 67,73 ----
  def makepath(*paths):
      dir = os.path.join(*paths)
!     return os.path.abspath(dir)

  L = sys.modules.values()
  for m in L:

Just

From aahz@rahul.net Mon Jun 25 18:19:48 2001 From: aahz@rahul.net (Aahz Maruch) Date: Mon, 25 Jun 2001 10:19:48 -0700 (PDT) Subject: [Python-Dev] 2.1.1 vs. os.normcase() Message-ID: <20010625171948.D636399C80@waltz.rahul.net>

It's too late for 2.0.1, but should this bugfix go into 2.1.1?

(Just to be clear, this is the problem that Just reported with site.py calling os.normcase() in makepath().)

((I'm only asking about this bug in specific because we're getting down to the wire on 2.1.1 IIUC.))

-- --- Aahz (@pobox.com) Hugs and backrubs -- I break Rule 6 <*> http://www.rahul.net/aahz/ Androgynous poly kinky vanilla queer het Pythonista
I don't really mind a person having the last whine, but I do mind someone else having the last self-righteous whine.

From guido@digicool.com Mon Jun 25 19:06:02 2001 From: guido@digicool.com (Guido van Rossum) Date: Mon, 25 Jun 2001 14:06:02 -0400 Subject: [Python-Dev] 2.1.1 vs. os.normcase() In-Reply-To: Your message of "Mon, 25 Jun 2001 10:19:48 PDT."
<20010625171948.D636399C80@waltz.rahul.net> References: <20010625171948.D636399C80@waltz.rahul.net> Message-ID: <200106251806.f5PI62L08770@odiug.digicool.com>

> It's too late for 2.0.1, but should this bugfix go into 2.1.1?
>
> (Just to be clear, this is the problem that Just reported with site.py
> calling os.normcase() in makepath().)
>
> ((I'm only asking about this bug in specific because we're getting down
> to the wire on 2.1.1 IIUC.))

Unclear if it's purely a bugfix -- this could be considered a feature, but I don't know. What do others think?

--Guido van Rossum (home page: http://www.python.org/~guido/)

From tim@digicool.com Mon Jun 25 19:47:06 2001 From: tim@digicool.com (Tim Peters) Date: Mon, 25 Jun 2001 14:47:06 -0400 Subject: [Python-Dev] os.path.normcase() in site.py In-Reply-To: <20010625102809.42357303182@snelboot.oratrix.nl> Message-ID:

[Jack Jansen]
> ...
> With MacPython's way of finding the initial sys.path contents we
> don't have the Windows-Python problem that we add the same directory
> 5 times (once in uppercase, once in lowercase, once in mixed case,
> once in mixed-case with / for \, etc:-),

Happily, we don't have that problem on a stock Windows Python anymore:

C:\Python21>python
Python 2.1 (#15, Apr 16 2001, 18:25:49) [MSC 32 bit (Intel)] on win32
Type "copyright", "credits" or "license" for more information.
>>> import sys, pprint
>>> pprint.pprint(sys.path)
['',
 'c:\\python21',
 'c:\\python21\\dlls',
 'c:\\python21\\lib',
 'c:\\python21\\lib\\plat-win',
 'c:\\python21\\lib\\lib-tk']
>>>

OTOH, this is still Icky, because those don't match (wrt case) the names in the filesystem (e.g., just look at the initial prompt line: I was in Python21 when I ran this, not python21).

> so if this is what it's trying to solve we can take it out easily.

It's hard to believe Fred added code to solve a Windows problem <wink>; I don't know what it's trying to do.

From m.favas@per.dem.csiro.au Mon Jun 25 20:38:47 2001 From: m.favas@per.dem.csiro.au (Mark Favas) Date: Tue, 26 Jun 2001 03:38:47 +0800 Subject: [Python-Dev] Re: Problems with socketmodule (getnameinfo & getaddrinfo) References: <200106242139.f5OLdjR22560@mira.informatik.hu-berlin.de> <3B3666B9.335DA17E@per.dem.csiro.au> <200106250639.f5P6die01246@mira.informatik.hu-berlin.de> Message-ID: <3B379347.7E8D00EB@per.dem.csiro.au>

"Martin v. Loewis" wrote:
>
> > To get socketmodule.c to compile, I had to make a change to line 2963
> > so that the declaration of inet_pton matched the previous declaration on
> > line 220 (changing char *src to const char *src). Still have problems
> > though, due to the use of snprintf in getnameinfo.c:
>
> Ok, they are printing a single number into a 512 byte buffer; that is
> safe even with sprintf only, so I have just removed the snprintf call.
> Can you please try again?
>
> Thanks for your reports,
> Martin

No trouble... The current CVS compiles (with a warning), links, and runs. The warning given is:

cc: Warning: /home/gonzo1/mark/groucho1/mark/src/python/CVS/python/dist/src/Modules/getaddrinfo.c, line 407: In this statement, the referenced type of the pointer value "hostname" is const, but the referenced type of the target of this assignment is not.
(notconstqual)
    if (inet_pton(gai_afdl[i].a_af, hostname, pton)) {
------------------------------------------------^

which can be fixed by declaring the second argument to inet_pton as const char* instead of char* in the two occurrences of inet_pton in socketmodule.c

Cheers, Mark

-- Mark Favas - m.favas@per.dem.csiro.au CSIRO, Private Bag No 5, Wembley, Western Australia 6913, AUSTRALIA

From martin@loewis.home.cs.tu-berlin.de Tue Jun 26 00:08:00 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Tue, 26 Jun 2001 01:08:00 +0200 Subject: [Python-Dev] IPv6 and Windows In-Reply-To: <200106251341.f5PDfkg07283@odiug.digicool.com> (message from Guido van Rossum on Mon, 25 Jun 2001 09:41:46 -0400) References: <200106241748.f5OHm3001480@mira.informatik.hu-berlin.de> <200106251341.f5PDfkg07283@odiug.digicool.com> Message-ID: <200106252308.f5PN80701342@mira.informatik.hu-berlin.de>

> > The problem is that the library patches (httplib, ftplib, etc) do use
> > getaddrinfo to find out how to contact a remote system, which is the
> > right thing to do IMO. So even if the IPv6 support can be activated
> > only if desired, getaddrinfo absolutely has to work.
>
> Yes, but in an IPv4-only environment it would be super trivial to
> implement, right?

Right, and getaddrinfo.c/getnameinfo.c attempt such an implementation. They might attempt to get it "more right" than necessary, but still they are "pure C", in the sense that they don't rely on any libraries except for those available in a typical IPv4 sockets implementation.

At least that's the theory. It turns out that they've been using inet_pton and snprintf, which is probably because they have been mainly tested on BSD. I'm confident that we can reduce them to a "no funny library calls needed" minimum. If somebody wants to implement them anew from the ground up, only using what the socketmodule already uses, that would be fine as well. An actual review of the code for portability problems would also be helpful.

Regards, Martin

From greg@cosc.canterbury.ac.nz Tue Jun 26 05:32:05 2001 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Tue, 26 Jun 2001 16:32:05 +1200 (NZST) Subject: [Python-Dev] Python Specializing Compiler In-Reply-To: <200106251451.QAA17756@core.inf.ethz.ch> Message-ID: <200106260432.QAA04989@s454.cosc.canterbury.ac.nz>

Samuele Pedroni :
> a different proposal for opt. globals access
> by Jeremy Hylton. It seems it would break fewer things ...

I really like Jeremy's proposal. I've been having similar thoughts myself for quite a while.

Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+

From guido@digicool.com Tue Jun 26 15:57:37 2001 From: guido@digicool.com (Guido van Rossum) Date: Tue, 26 Jun 2001 10:57:37 -0400 Subject: [Python-Dev] Python Specializing Compiler In-Reply-To: Your message of "Tue, 26 Jun 2001 16:32:05 +1200." <200106260432.QAA04989@s454.cosc.canterbury.ac.nz> References: <200106260432.QAA04989@s454.cosc.canterbury.ac.nz> Message-ID: <200106261457.f5QEvbZ11007@odiug.digicool.com>

> Samuele Pedroni :
>
> > a different proposal for opt. globals access
> > by Jeremy Hylton. It seems it would break fewer things ...
>
> I really like Jeremy's proposal. I've been having similar
> thoughts myself for quite a while.
>
> Greg Ewing

Ditto.
Isn't this what I've been calling "low-hanging fruit" for ages? Apparently it's low but still out of reach. :-)

--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido@digicool.com Tue Jun 26 18:59:55 2001 From: guido@digicool.com (Guido van Rossum) Date: Tue, 26 Jun 2001 13:59:55 -0400 Subject: [Python-Dev] PEP 260: simplify xrange() Message-ID: <200106261759.f5QHxtH15045@odiug.digicool.com>

Here's another sweet and short PEP. What do folks think? Is xrange()'s complexity really worth having?

--Guido van Rossum (home page: http://www.python.org/~guido/)

PEP: 260
Title: Simplify xrange()
Version: $Revision: 1.1 $
Author: guido@python.org (Guido van Rossum)
Status: Draft
Type: Standards Track
Python-Version: 2.2
Created: 26-Jun-2001
Post-History: 26-Jun-2001

Abstract

    This PEP proposes to strip the xrange() object of some rarely
    used behavior like x[i:j] and x*n.

Problem

    The xrange() function has one idiomatic use:

        for i in xrange(...): ...

    However, the xrange() object has a bunch of rarely used behaviors
    that attempt to make it more sequence-like. These are so rarely
    used that historically they have had serious bugs (e.g. off-by-one
    errors) that went undetected for several releases.

    I claim that it's better to drop these unused features. This will
    simplify the implementation, testing, and documentation, and
    reduce maintenance and code size.

Proposed Solution

    I propose to strip the xrange() object to the bare minimum. The
    only retained sequence behaviors are x[i], len(x), and repr(x).
    In particular, these behaviors will be dropped:

        x[i:j] (slicing)
        x*n, n*x (sequence-repeat)
        cmp(x1, x2) (comparisons)
        i in x (containment test)
        x.tolist() method
        x.start, x.stop, x.step attributes

    By implementing a custom iterator type, we could speed up the
    common use, but this is optional (the default sequence iterator
    does just fine).

    I expect it will take at most an hour to rip it all out; another
    hour to reduce the test suite and documentation.

Scope

    This PEP only affects the xrange() built-in function.

Risks

    Somebody's code could be relying on the extended code, and this
    code would break. However, given that historically bugs in the
    extended code have gone undetected for so long, it's unlikely
    that much code is affected.

Copyright

    This document has been placed in the public domain.

Local Variables:
mode: indented-text
indent-tabs-mode: nil
End:

From fdrake@acm.org Tue Jun 26 21:01:41 2001 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Tue, 26 Jun 2001 16:01:41 -0400 (EDT) Subject: [Python-Dev] collecting timer resolution information... Message-ID: <15160.59941.295880.286167@cj42289-a.reston1.va.home.com>

I'd like people to run the attached C program and send the output to me. What this does is run the gettimeofday() and getrusage() functions until the time values change. The intent is to determine the quality of the available timing information. For example, on my Linux-Mandrake 7.2 installation with a stock 2.2.17 kernel, I get this:

timeofday: 1 (1 calls), rusage: 10000 (2465 calls)

Thanks!

-Fred

-- Fred L. Drake, Jr. PythonLabs at Digital Creations

From fdrake@acm.org Tue Jun 26 21:05:48 2001 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Tue, 26 Jun 2001 16:05:48 -0400 (EDT) Subject: [Python-Dev] collecting timer resolution information...
In-Reply-To: <15160.59941.295880.286167@cj42289-a.reston1.va.home.com> References: <15160.59941.295880.286167@cj42289-a.reston1.va.home.com> Message-ID: <15160.60188.806308.247566@cj42289-a.reston1.va.home.com>

--FghjlKmKF6
Content-Type: text/plain; charset=us-ascii
Content-Description: message body and .signature
Content-Transfer-Encoding: 7bit

Fred L. Drake, Jr. writes:
> I'd like people to run the attached C program and send the output to

OK, I've attached it this time. Sorry!

-Fred

-- Fred L. Drake, Jr. PythonLabs at Digital Creations

--FghjlKmKF6
Content-Type: text/plain
Content-Description: timer check program
Content-Disposition: inline; filename="observation.c"
Content-Transfer-Encoding: 7bit

#include <stdio.h>
#include <sys/time.h>
#include <sys/resource.h>

/*
 * This should be defined the same way Python defines it:
 *
 * Define if gettimeofday() does not have second (timezone) argument.
 * This is the case on Motorola V4 (R40V4.2)
 */
/* #undef GETTIMEOFDAY_NO_TZ */

#ifdef GETTIMEOFDAY_NO_TZ
#define GETTIMEOFDAY(ptv) gettimeofday(ptv)
#else
#define GETTIMEOFDAY(ptv) gettimeofday((ptv), (struct timezone *)NULL)
#endif

static void
calibrate(int report)
{
    unsigned long timeofday_diff = 0;
    unsigned long rusage_diff = 0;
    unsigned long timeofday_calls = 0;
    unsigned long rusage_calls = 0;
    struct rusage ru1, ru2;
    struct timeval tv1, tv2;

    GETTIMEOFDAY(&tv1);
    while (1) {
        GETTIMEOFDAY(&tv2);
        ++timeofday_calls;
        if (tv1.tv_sec != tv2.tv_sec || tv1.tv_usec != tv2.tv_usec)
            break;
    }
    if (tv1.tv_sec == tv2.tv_sec)
        timeofday_diff = tv2.tv_usec - tv1.tv_usec;
    else
        timeofday_diff = (1000000 - tv1.tv_usec) + tv2.tv_usec;

    getrusage(RUSAGE_SELF, &ru1);
    while (1) {
        getrusage(RUSAGE_SELF, &ru2);
        ++rusage_calls;
        if (ru1.ru_utime.tv_sec != ru2.ru_utime.tv_sec) {
            rusage_diff = ((1000000 - ru1.ru_utime.tv_usec)
                           + ru2.ru_utime.tv_usec);
            break;
        }
        else if (ru1.ru_utime.tv_usec != ru2.ru_utime.tv_usec) {
            rusage_diff = ru2.ru_utime.tv_usec - ru1.ru_utime.tv_usec;
            break;
        }
        else if (ru1.ru_stime.tv_sec != ru2.ru_stime.tv_sec) {
            rusage_diff = ((1000000 - ru1.ru_stime.tv_usec)
                           + ru2.ru_stime.tv_usec);
            break;
        }
        else if (ru1.ru_stime.tv_usec != ru2.ru_stime.tv_usec) {
            rusage_diff = ru2.ru_stime.tv_usec - ru1.ru_stime.tv_usec;
            break;
        }
    }
    if (report)
        printf("timeofday: %lu (%lu calls), rusage: %lu (%lu calls)\n",
               timeofday_diff, timeofday_calls, rusage_diff, rusage_calls);
}

int
main(int argc, char *argv[])
{
    calibrate(0);
    calibrate(0);
    calibrate(1);
    return 0;
}

--FghjlKmKF6--

From gward@python.net Tue Jun 26 21:10:09 2001 From: gward@python.net (Greg Ward) Date: Tue, 26 Jun 2001 16:10:09 -0400 Subject: [Python-Dev] make static In-Reply-To: <200106251340.f5PDeAO07244@odiug.digicool.com>; from guido@digicool.com on Mon, Jun 25, 2001 at 09:40:10AM -0400 References: <200106250943.LAA24576@pandora.informatik.hu-berlin.de> <200106251340.f5PDeAO07244@odiug.digicool.com> Message-ID: <20010626161009.B2820@gerg.ca>

On 25 June 2001, Guido van Rossum said:
> As long as it works, it works. I don't think there's a reason to
> spend more than absolutely minimal time trying to keep it working
> though -- we're trying to encourage everybody to migrate towards
> distutils. So (without having seen the SF report) I'd say "tough
> luck".

The catch is that I never got around to implementing statically building a new interpreter via the Distutils, so (for now) Makefile.pre.in is the only way to do this. ;-( (Unless someone added it to the Distutils while I wasn't looking, which wouldn't be hard since I haven't looked in, ummm, six months or so...)
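[For contrast with the Makefile.pre.in route Greg mentions: building an extension *dynamically* through the Distutils takes only a setup.py like the sketch below; the module name "spam" and the file spammodule.c are invented for illustration. Static linking of a new interpreter is the part that, as Greg notes, isn't implemented:]

    # setup.py -- illustrative only
    from distutils.core import setup, Extension

    setup(name="spam",
          version="1.0",
          ext_modules=[Extension("spam", ["spammodule.c"])])

Run it with "python setup.py build_ext".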
Greg

-- Greg Ward - just another /P(erl|ython)/ hacker gward@python.net http://starship.python.net/~gward/
"When I hear the word `culture', I reach for my gun." --Goebbels
"When I hear the word `Microsoft', *I* reach for *my* gun." --me

From arigo@ulb.ac.be Wed Jun 27 03:01:54 2001 From: arigo@ulb.ac.be (Armin Rigo) Date: Tue, 26 Jun 2001 22:01:54 -0400 Subject: [Python-Dev] Python Specializing Compiler Message-ID: <3B393E92.B0719A7A@ulb.ac.be>

Hi,

I am considering using GNU Lightning to produce code from the Psyco compiler. Has anyone already used it from a Python program? If so, you might already have done the necessary support module in C, and I might be interested in it! Otherwise, I'll start from scratch. Of course, comments about whether I should use GNU Lightning at all, or any other code-producing library (or even produce machine code "by hand"), are welcome.

Also, I hope to be able to continue with more fundamental work on Psyco very soon. One design decision I have to make now is about the way Psyco reads Python code. Currently, it "reverse-engineers" byte-code. Another solution would be to compile from the source code (possibly with the help of the 'Tools/Compiler/*' modules). The current solution, although not optimal, seems to make integration with the current interpreter easier. Indeed, based on recent discussions, I now believe that a realistic way to use Psyco would be to let the interpreter run normally while doing some kind of profiling, and work on time-critical routines only --- which at this point have already been compiled into byte-code and executed at least a few times.

Armin

From nas@python.ca Tue Jun 26 22:01:38 2001 From: nas@python.ca (Neil Schemenauer) Date: Tue, 26 Jun 2001 14:01:38 -0700 Subject: [Python-Dev] collecting timer resolution information... In-Reply-To: <15160.59941.295880.286167@cj42289-a.reston1.va.home.com>; from fdrake@acm.org on Tue, Jun 26, 2001 at 04:01:41PM -0400 References: <15160.59941.295880.286167@cj42289-a.reston1.va.home.com> Message-ID: <20010626140138.A2838@glacier.fnational.com>
Fred L. Drake, Jr. wrote:
> timeofday: 1 (1 calls), rusage: 10000 (2465 calls)

My hacked version of Linux 2.4 on an AMD-800 box:

timeofday: 1 (2 calls), rusage: 976 (1792 calls)

I don't quite understand the output. What does the 976 mean?

Neil

From fdrake@acm.org Tue Jun 26 22:23:53 2001 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Tue, 26 Jun 2001 17:23:53 -0400 (EDT) Subject: [Python-Dev] collecting timer resolution information... In-Reply-To: <20010626140138.A2838@glacier.fnational.com> References: <15160.59941.295880.286167@cj42289-a.reston1.va.home.com> <20010626140138.A2838@glacier.fnational.com> Message-ID: <15160.64873.213278.925715@cj42289-a.reston1.va.home.com>

Neil Schemenauer writes:
> My hacked version of Linux 2.4 on an AMD-800 box:
>
> timeofday: 1 (2 calls), rusage: 976 (1792 calls)
>
> I don't quite understand the output. What does the 976 mean?

The "1" and the "976" are the apparent resolution of the time values reported by those two calls, in microseconds. It looks like the HZ define in that header file you pointed out could be bumped a little higher. ;-)

-Fred

-- Fred L. Drake, Jr. PythonLabs at Digital Creations

From mark.favas@csiro.au Wed Jun 27 00:21:47 2001 From: mark.favas@csiro.au (Mark Favas) Date: Wed, 27 Jun 2001 07:21:47 +0800 Subject: [Python-Dev] latest unicode-related change causes failure in test_unicode & test_unicodedata Message-ID: <3B39190B.E7DA5B5D@csiro.au>

CVS of a short while ago, Tru64 Unix: "make test" gives two unicode-related failures:

test_unicode
test test_unicode crashed -- exceptions.UnicodeError: UTF-8 decoding error: illegal encoding

test_unicodedata
The actual stdout doesn't match the expected stdout. This much did match (between asterisk lines):
**********************************************************************
test_unicodedata
Testing Unicode Database...
Methods:
**********************************************************************
Then ...

We expected (repr): '6c7a7c02657b69d0fdd7a7d174f573194bba2e18'
But instead we got: '374108f225e0c1488f8389ce6333902830d299fb'
test test_unicodedata failed -- Writing: '374108f225e0c1488f8389ce6333902830d299fb', expected: '6c7a7c02657b69d0fdd7a7d174f573194bba2e18'

Running the tests manually, test_unicode fails, test_unicodedata doesn't fail, but doesn't match the expected output for Methods:

(test_unicode)
Testing Unicode contains method... done.
Testing Unicode formatting strings... done.
Testing builtin codecs...
Traceback (most recent call last):
  File "Lib/test/test_unicode.py", line 383, in ?
    verify(u'\ud800\udc02'.encode('utf-8') == \
  File "./Lib/test/test_support.py", line 95, in verify
    raise TestFailed(reason)
test_support.TestFailed: test failed

(test_unicodedata)
python Lib/test/test_unicodedata.py
Testing Unicode Database...
Methods: 374108f225e0c1488f8389ce6333902830d299fb
Functions: 41e1d4792185d6474a43c83ce4f593b1bdb01f8a
API: ok

-- Mark Favas - m.favas@per.dem.csiro.au CSIRO, Private Bag No 5, Wembley, Western Australia 6913, AUSTRALIA

From JamesL@Lugoj.Com Wed Jun 27 01:06:23 2001 From: JamesL@Lugoj.Com (James Logajan) Date: Tue, 26 Jun 2001 17:06:23 -0700 Subject: [Python-Dev] Re: PEP 260: simplify xrange() References: Message-ID: <3B39237F.1A7EF3F2@Lugoj.Com>

Guido van Rossum wrote:
> Here's another sweet and short PEP. What do folks think? Is
> xrange()'s complexity really worth having?

Are there still known bugs that will take some effort to repair? Is xrange constantly touched when changes are made elsewhere? If no to both, then I suggest don't fix what ain't broken; life is too short.
(Unless it is annoying you to distraction, then do the deed and get it over with.)

From tim.one@home.com Wed Jun 27 01:32:26 2001 From: tim.one@home.com (Tim Peters) Date: Tue, 26 Jun 2001 20:32:26 -0400 Subject: [Python-Dev] RE: PEP 260: simplify xrange() In-Reply-To: <3B39237F.1A7EF3F2@Lugoj.Com> Message-ID:

[James Logajan]
> Are there still known bugs that will take some effort to repair? Is
> xrange constantly touched when changes are made elsewhere? If no to
> both, then I suggest don't fix what ain't broken; life is too short.
> (Unless it is annoying you to distraction, then do the deed and get
> it over with.)

I think it's more the latter. I partly provoked this by bitterly pointing out that there's more code in the CVS tree devoted to supporting the single xrange() gimmick than Neil Schemenauer added to support the get-out-of-town more powerful new generators. Masses of crufty code nobody benefits from are a burden on the soul.

although-it-would-be-impolite-to-ask-we-why-still-ship-a-directory-full-of-crufty-old-irix5-demos-in-the-std-library-ly y'rs - tim

From tdelaney@avaya.com Wed Jun 27 01:36:25 2001 From: tdelaney@avaya.com (Delaney, Timothy) Date: Wed, 27 Jun 2001 10:36:25 +1000 Subject: [Python-Dev] RE: PEP 260: simplify xrange() Message-ID:

> Here's another sweet and short PEP. What do folks think? Is
> xrange()'s complexity really worth having?
>
> --Guido van Rossum (home page: http://www.python.org/~guido/)
>
> PEP: 260
> Title: Simplify xrange()
> Version: $Revision: 1.1 $
> Author: guido@python.org (Guido van Rossum)
> Status: Draft
> Type: Standards Track
> Python-Version: 2.2
> Created: 26-Jun-2001
> Post-History: 26-Jun-2001
>
> Abstract
>
>     This PEP proposes to strip the xrange() object from some rarely
>     used behavior like x[i:j] and x*n.
>
> Problem
>
>     The xrange() function has one idiomatic use:
>
>         for i in xrange(...): ...

If this is to be done, I would also propose that xrange() and range() be changed to allow passing in a straight-out sequence such as in the following code in order to get rid of the need for range(len(seq)):

    import __builtin__

    def range (start, stop=None, step=1, range=range):
        """"""
        start2 = start
        stop2 = stop
        if stop is None:
            stop2 = start
            start2 = 0
        try:
            return range(start2, stop2, step)
        except TypeError:
            assert stop is None
            return range(len(start))

    def xrange (start, stop=None, step=1, xrange=xrange):
        """"""
        start2 = start
        stop2 = stop
        if stop is None:
            stop2 = start
            start2 = 0
        try:
            return xrange(start2, stop2, step)
        except TypeError:
            assert stop is None
            return xrange(len(start))

    a = [5, 'a', 'Hello, world!']
    b = range(a)
    c = xrange(4, 6)
    d = xrange(b)
    e = range(c)

    print a
    print b
    print c
    print d
    print e
    print range(d, 2)

Tim Delaney

From gward@python.net Wed Jun 27 02:24:32 2001 From: gward@python.net (Greg Ward) Date: Tue, 26 Jun 2001 21:24:32 -0400 Subject: [Python-Dev] RE: PEP 260: simplify xrange() In-Reply-To: ; from tdelaney@avaya.com on Wed, Jun 27, 2001 at 10:36:25AM +1000 References: Message-ID: <20010626212432.A4003@gerg.ca>

On 27 June 2001, Delaney, Timothy said:
> If this is to be done, I would also propose that xrange() and range() be
> changed to allow passing in a straight-out sequence such as in the following
> code in order to get rid of the need for range(len(seq)):

I'm +1 on the face of it without stopping to consider any implications. ;-) Some bits of syntactic sugar are just too good to pass up. range(len(sequence)) is syntactic cod-liver oil.
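[A small aside on avoiding range(len(seq)) with the new generators instead -- a sketch only, under 2.2's "from __future__ import generators"; "indexed" is a made-up name, not a proposed builtin:]

    from __future__ import generators

    def indexed(seq):
        # Yield (index, item) pairs so loops needn't do range(len(seq)).
        i = 0
        for item in seq:
            yield i, item
            i = i + 1

    for i, ch in indexed("abc"):
        print i, ch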
Greg -- Greg Ward - programmer-at-big gward@python.net http://starship.python.net/~gward/ Blood is thicker than water, and much tastier. From nas@python.ca Wed Jun 27 02:28:29 2001 From: nas@python.ca (Neil Schemenauer) Date: Tue, 26 Jun 2001 18:28:29 -0700 Subject: [Python-Dev] collecting timer resolution information... In-Reply-To: <15160.64873.213278.925715@cj42289-a.reston1.va.home.com>; from fdrake@acm.org on Tue, Jun 26, 2001 at 05:23:53PM -0400 References: <15160.59941.295880.286167@cj42289-a.reston1.va.home.com> <20010626140138.A2838@glacier.fnational.com> <15160.64873.213278.925715@cj42289-a.reston1.va.home.com> Message-ID: <20010626182829.A3344@glacier.fnational.com> Fred L. Drake, Jr. wrote: > The "1" and the "976" are the appearant resolution of the time > values reported by those two calls, in microseconds. It looks like > the HZ define in that header file you pointed out could be bumped a > little higher. ;-) I've got it at 1024. >>> 976. / 10000 * 1024 99.942400000000006 I think yours is at the 100 default. Neil From fdrake@acm.org Wed Jun 27 03:14:00 2001 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Tue, 26 Jun 2001 22:14:00 -0400 (EDT) Subject: [Python-Dev] collecting timer resolution information... In-Reply-To: <20010626182829.A3344@glacier.fnational.com> References: <15160.59941.295880.286167@cj42289-a.reston1.va.home.com> <20010626140138.A2838@glacier.fnational.com> <15160.64873.213278.925715@cj42289-a.reston1.va.home.com> <20010626182829.A3344@glacier.fnational.com> Message-ID: <15161.16744.665259.229385@cj42289-a.reston1.va.home.com> Neil Schemenauer writes: > I've got it at 1024. > > >>> 976. / 10000 * 1024 > 99.942400000000006 > > I think yours is at the 100 default. That's correct. Yours could be bumped a bit (factor of 10? I'm not really sure where it would cause problems in practice, though I think I understand the general explanations I've seen), and mine could be bumped a good bit. But I intend to stick with a stock kernel since I expect most users will be using a stock kernel, and I don't have a pile of extra machines to play with. ;-( -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From greg@cosc.canterbury.ac.nz Wed Jun 27 03:37:21 2001 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Wed, 27 Jun 2001 14:37:21 +1200 (NZST) Subject: [Python-Dev] collecting timer resolution information... In-Reply-To: <15160.60188.806308.247566@cj42289-a.reston1.va.home.com> Message-ID: <200106270237.OAA05182@s454.cosc.canterbury.ac.nz> Here are the results from a few machines around here: s454% uname -a SunOS s454 5.7 Generic_106541-10 sun4m sparc SUNW,SPARCstation-4 s454% observation timeofday: 2 (1 calls), rusage: 10000 (22 calls) oma% uname -a SunOS oma 5.7 Generic sun4u sparc SUNW,Ultra-4 oma% observation timeofday: 1 (2 calls), rusage: 10000 (115 calls) pc250% uname -a SunOS pc250 5.8 Generic_108529-03 i86pc i386 i86pc pc250% observation timeofday: 1 (1 calls), rusage: 10000 (232 calls) Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. 
| greg@cosc.canterbury.ac.nz +--------------------------------------+ From JamesL@Lugoj.Com Wed Jun 27 03:42:20 2001 From: JamesL@Lugoj.Com (James Logajan) Date: Tue, 26 Jun 2001 19:42:20 -0700 Subject: [Python-Dev] Re: PEP 260: simplify xrange() References: Message-ID: <3B39480C.F4808C1F@Lugoj.Com> Tim Peters wrote: > [James Logajan] > > Are there still known bugs that will take some effort to repair? Is > > xrange constantly touched when changes are made elsewhere? If no to > > both, then I suggest don't fix what ain't broken; life is too short. > > (Unless it is annoying you to distraction, then do the deed and get > > it over with.) > > I think it's more the latter. I partly provoked this by bitterly pointing > out that there's more code in the CVS tree devoted to supporting the single > xrange() gimmick than Neil Schemenauer added to support the get-out-of-town > more powerful new generators. Masses of crufty code nobody benefits from > are a burden on the soul. Design mistakes one has made do tend to weigh on one's soul (speaking from more than two decades of programming experience) so I understand the primal urge to correct them when one can, and even when one shouldn't. So although I'm quite annoyed by all these new-fangled gimmicks being added to the language (i.e. Python generators being added to solve California's power problems) I have no problem with xrange being fenced in. (I find the very existence of the PEP process somewhat unsettling; there are now thousands of programmers trying to use the language. Why burden them with insuring their programs remain compatible with yet-another-damn-set-of-proposals every year? Or worse: trying to rewrite their code "more elegantly" using all the latest gimmicks. Why in my day, if you wanted to, say, save execution state, you figured out how to do it and didn't go crying to the language designer. Damn these young lazy programmers. Don't know how good they have it. Wouldn't know how to save their execution state if their lives depended on it. Harumph.) Speaking of "generators", I just want to say that I think that "generator" makes for lousy terminology. If I understand correctly, "generators" are coroutines that have peer-to-peer synchronized messaging (synchronizing and communicating at the "yield" points). To my mind, "generators" does not evoke that image at all. Assuming I understand it in my early senility.... > although-it-would-be-impolite-to-ask-we-why-still-ship-a-directory- > full-of-crufty-old-irix5-demos-in-the-std-library-ly Perhaps because the Irix community would be quite Irate if they were removed? From tim.one@home.com Wed Jun 27 05:38:15 2001 From: tim.one@home.com (Tim Peters) Date: Wed, 27 Jun 2001 00:38:15 -0400 Subject: [Python-Dev] RE: PEP 260: simplify xrange() In-Reply-To: <3B39480C.F4808C1F@Lugoj.Com> Message-ID: [James Logajan] > Design mistakes one has made do tend to weigh on one's soul (speaking > from more than two decades of programming experience) so I understand > the primal urge to correct them when one can, and even when one > shouldn't. Is this a case when one shouldn't? That is, is it a specific comment on PEP 260, or just a general venting here? > So although I'm quite annoyed by all these new-fangled gimmicks being > added to the language (i.e. Python generators being added to solve > California's power problems) I have no problem with xrange being fenced > in. OK. 
> (I find the very existence of the PEP process somewhat unsettling;
> there are now thousands of programmers trying to use the language. Why
> burden them with insuring their programs remain compatible with yet-
> another-damn-set-of-proposals every year?

You can ask the C, C++, Fortran, Perl, COBOL (etc, etc) folks that too, but I suspect it's a rhetorical question. I wish you could ask the Java committee, but they work in secret <wink>.

> Or worse: trying to rewrite their code "more elegantly" using all the
> latest gimmicks.

Use of new features isn't required by Guido, and neither is downloading new releases. If *you* waste your time doing that, we both know it's because you can't resist <0.5 wink>.

> ...
> Speaking of "generators", I just want to say that I think that
> "generator" makes for lousy terminology.

A generator, umm, *generates* a sequence of values. It's neither more specific nor more general than that, so we're pretty much limited to vaguely suggestive terms like "generator" and "iterator"; Python already used the latter word for something else. I'd be happy to call them pink flamingos.

> If I understand correctly, "generators" are coroutines

They're formally semi-coroutines; it's not symmetric.

> that have peer-to-peer synchronized messaging (synchronizing and
> communicating at the "yield" points).

Way too highfalutin' a view. Think of a generator as a resumable function, and you're not missing anything -- not even an implementation subtlety. They *are* resumable functions. A "yield" is just a "return", but with the twist that the function can resume executing after the "yield" again. If you also think of ordinary call/return as a peer-to-peer etc etc, then I suppose you're stuck with that view here too.

> To my mind, "generators" does not evoke that image at all.

Good, because that image was overblown beyond recognition <wink>.

>> although-it-would-be-impolite-to-ask-we-why-still-ship-a-directory-
>> full-of-crufty-old-irix5-demos-in-the-std-library-ly

> Perhaps because the Irix community would be quite Irate if they were
> removed?

Doubt it: the Irix5 library files haven't really been touched since 1993. For several years we've also shipped an Irix6 library with all the same stuff. But I suppose releasing a new OS was a symptom of SGI picking on its users too <wink>.

From tim.one@home.com Wed Jun 27 06:14:29 2001 From: tim.one@home.com (Tim Peters) Date: Wed, 27 Jun 2001 01:14:29 -0400 Subject: [Python-Dev] Unicode fallout: Windows _winreg project broken Message-ID:

The _winreg project no longer links:

Creating library ./_winreg_d.lib and object ./_winreg_d.exp
_winreg.obj : error LNK2001: unresolved external symbol __imp__PyUnicode_DecodeMBCS

The compilation of PyUnicode_DecodeMBCS in unicodeobject.c is in a

#if defined(MS_WIN32) && defined(HAVE_USABLE_WCHAR_T)

block. But the top of unicodeobject.h now wraps the enabling

# if defined(MS_WIN32) && !defined(USE_UCS4_STORAGE)
# define HAVE_USABLE_WCHAR_T
# define PY_UNICODE_TYPE wchar_t
# endif

block inside a

#ifndef PY_UNICODE_TYPE

block, and a change to PC/config.h:

#define PY_UNICODE_TYPE unsigned short

stops all that. IOW, HAVE_USABLE_WCHAR_T no longer gets #define'd on Windows, and that prevents unicodeobject.c from supplying routines _winreg.c calls.
leaving-it-to-an-expert-who-thinks-they-know-what-all-these-symbols-are-supposed-to-really-mean-ly y'rs - tim

From greg@cosc.canterbury.ac.nz Wed Jun 27 06:41:46 2001 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Wed, 27 Jun 2001 17:41:46 +1200 Subject: [Python-Dev] Help: Python 2.1: "Corrupt Installation Detected" Message-ID: <3B39721A.DED4E85A@cosc.canterbury.ac.nz>

I'm trying to install Python-2.1 on Windows, and I keep getting "Corrupt Installation Detected" when I run the installer.

From other postings I've seen about this message, it means that the installer itself is corrupted somehow. But everything I can think of doing to test its integrity says that it's okay. I can open the installer with WinZip and "Test" it, and it passes. I ftp'ed it to another machine which has Python installed (in binary mode) and ran md5sum.py on it, and the result matches the checksum advertised on the web page.

I also tried downloading Python-2.0.1.exe, but it says "Corrupt Installation Detected" as well.

Does anyone have any idea what is going on here?

-- Greg Ewing, Computer Science Dept, University of Canterbury, Christchurch, New Zealand
To get my email address, please visit my web page: http://www.cosc.canterbury.ac.nz/~greg

From tim.one@home.com Wed Jun 27 06:53:01 2001 From: tim.one@home.com (Tim Peters) Date: Wed, 27 Jun 2001 01:53:01 -0400 Subject: [Python-Dev] Help: Python 2.1: "Corrupt Installation Detected" In-Reply-To: <3B39721A.DED4E85A@cosc.canterbury.ac.nz> Message-ID:

[Greg Ewing]
> I'm trying to install Python-2.1 on Windows,
> and I keep getting "Corrupt Installation Detected"
> when I run the installer. [but no other evidence that
> it's actually corrupt]

You didn't say which flavor of Windows, but should have <wink>. Ditto what it is you're running (the PythonLabs distro? ActiveState's? PythonWare's?). Known causes for this from the PythonLabs installer include (across various flavors of Windows), in decreasing order of likelihood:

+ Trying to install while logged in to an account with insufficient permissions (try logging in as Administrator, if on a version of Windows where that makes sense).

+ Trying to install over a network. Copy the installer to a local disk first.

+ Conflicts with anti-virus software (disable it -- indeed, my Win9x life got much saner after I wiped Norton AntiVirus from my hard drive).

+ Conflicts with other running programs (like installer splash screens always say, close all other programs).

+ Insufficient memory, disk space, or magic low-level Windows resources.

+ There may or may not be a problem unique to French versions of Windows.

Any of those apply?

From martin@loewis.home.cs.tu-berlin.de Wed Jun 27 08:12:11 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Wed, 27 Jun 2001 09:12:11 +0200 Subject: [Python-Dev] Unicode fallout: Windows _winreg project broken Message-ID: <200106270712.f5R7CBh06458@mira.informatik.hu-berlin.de>

> IOW, HAVE_USABLE_WCHAR_T no longer gets #define'd on Windows, and
> that prevents unicodeobject.c from supplying routines _winreg.c
> calls.

The best thing, IMO, would be if PC/config.h defines everything available in config.h also. In this case, the proper defines would be

#define Py_USING_UNICODE
#define HAVE_USABLE_WCHAR_T
#define Py_UNICODE_SIZE 2
#define PY_UNICODE_TYPE wchar_t

If that approach is used, the defaulting in Include/unicodeobject.h could go away.
Alternatively, define only Py_USING_UNICODE of this in PC/config.h, and change the block in Include/unicodeobject.h to

/* Windows has a usable wchar_t type (unless we're using UCS-4) */
# ifdef MS_WIN32
#   ifdef USE_UCS4_STORAGE
#     define Py_UNICODE_SIZE 4
#     define PY_UNICODE_TYPE unsigned int
#   else
#     define Py_UNICODE_SIZE 2
#     define HAVE_USABLE_WCHAR_T
#     define PY_UNICODE_TYPE wchar_t
#   endif
# endif

Regards, Martin

From tim.one@home.com Wed Jun 27 08:39:38 2001 From: tim.one@home.com (Tim Peters) Date: Wed, 27 Jun 2001 03:39:38 -0400 Subject: [Python-Dev] New Unicode warnings Message-ID:

There are 3 functions now where the prototypes in unicodeobject.h don't match the definitions in unicodeobject.c. Like, in .h,

extern DL_IMPORT(Py_UNICODE) _PyUnicode_ToTitlecase(
    register const Py_UNICODE ch /* Unicode character */
    );

but in .c:

Py_UNICODE _PyUnicode_ToTitlecase(register Py_UNICODE ch)

That is, they disagree about const (a silly language idea if ever there was one <wink>). The others (I haven't checked these for the exact reason(s), but assume they're the same deal):

_PyUnicode_ToUppercase
_PyUnicode_ToLowercase

From Armin.Rigo@ima.unil.ch Wed Jun 27 10:01:18 2001 From: Armin.Rigo@ima.unil.ch (RIGO Armin) Date: Wed, 27 Jun 2001 11:01:18 +0200 (CEST) Subject: [Python-Dev] Python Specializing Compiler In-Reply-To: <3B391D88.305CCB4E@ActiveState.com> Message-ID:

On Tue, 26 Jun 2001, Paul Prescod wrote:
> Armin Rigo wrote:
> > I am considering using GNU Lightning to produce code from the Psyco
> > compiler. (...)
>
> Core Python has no GPLed components. I would hate to have you put in a
> bunch of work worthy of inclusion in core Python to see it rejected on
> those grounds.

Good remark. Anyone else have comments about this? Psyco would probably not be part of the core Python, but only an extension module; but your objection is nevertheless valid. Any alternatives?

I am considering a more theoretical approach, based on Tunes (http://tunes.org) as mentioned in Psyco's readme file, but this would take a lot more time -- although it might give much more impressive results.

Armin.

From PyChecker Wed Jun 27 12:48:00 2001 From: PyChecker (Neal Norwitz) Date: Wed, 27 Jun 2001 07:48:00 -0400 Subject: [Python-Dev] ANN: PyChecker version 0.6.1 Message-ID: <3B39C7F0.2CA171C5@metaslash.com>

A new version of PyChecker is available for your hacking pleasure. PyChecker is a tool for finding common bugs in python source code. It finds problems that are typically caught by a compiler for less dynamic languages, like C and C++.

Comments, criticisms, new ideas, and other feedback are welcome.
Here's the CHANGELOG:

* Check format strings: "%s %s %s" % (v1, v2, v3, v4) for arg counts
* Warn when format strings do: '%(var) %(var2)'
* Fix Local variable (xxx) not used, when have: "%(xxx)s" % locals()
* Warn when local variable (xxx) doesn't exist and have: "%(xxx)s" % locals()
* Install script in /usr/local/bin to invoke PyChecker
* Don't produce unused global warnings when using a module in parameters
* Don't produce unused global warnings when using a module in class variables
* Add check when using method as an attribute (if self.method and x == y:)
* Add check for right # of args to object construction
* Add check for right # of args to function calls in other modules
* Check for returning a value from __init__
* Fix using from XX import YY ; from XX import ZZ causing re-import warning
* Fix UNABLE TO IMPORT errors for files that don't end with a newline
* Support for checking consistent return values -- not complete; produces too many false positives (off by default, use -r/--returnvalues to enable)

PyChecker is available on Source Forge:
Web page: http://pychecker.sourceforge.net/
Project page: http://sourceforge.net/projects/pychecker/

Neal

-- pychecker@metaslash.com

From paulp@ActiveState.com Wed Jun 27 12:53:08 2001 From: paulp@ActiveState.com (Paul Prescod) Date: Wed, 27 Jun 2001 04:53:08 -0700 Subject: [Python-Dev] Python Specializing Compiler References: Message-ID: <3B39C924.E865177D@ActiveState.com>

RIGO Armin wrote:
>
> ...
> I am considering a more theoretical approach, based on Tunes
> (http://tunes.org) as mentioned in Psyco's readme file, but this would
> take a lot more time -- although it might give much more impressive
> results.

If you are thinking about incorporating some ideas from Tunes that's one thing. But if you want to use their code I would ask "what code?" I have heard about Tunes for several years now and not seen any visible forward progress.

See also: http://tunes.org/Tunes-FAQ-6.html#ss6.2

-- Take a recipe. Leave a recipe. Python Cookbook! http://www.ActiveState.com/pythoncookbook

From mark.favas@csiro.au Wed Jun 27 12:48:37 2001 From: mark.favas@csiro.au (Mark Favas) Date: Wed, 27 Jun 2001 19:48:37 +0800 Subject: [Python-Dev] More unicode blues... Message-ID: <3B39C815.E9CDF41B@csiro.au>

unicodectype.c now fails to compile, because ch is declared const, and then assigned to. Tim has (apparently) had similar problems, but in his case the compiler just gives a warning, rather than an error:

cc: Error: Objects/unicodectype.c, line 67: In this statement, "ch" has const-qualified type, but occurs in a context that requires a modifiable lvalue. (neednonconst)
    ch += ctype->title;
--------^
cc: Error: Objects/unicodectype.c, line 69: In this statement, "ch" has const-qualified type, but occurs in a context that requires a modifiable lvalue. (neednonconst)
    ch += ctype->upper;
--------^
cc: Error: Objects/unicodectype.c, line 74: In this statement, "ch" has const-qualified type, but occurs in a context that requires a modifiable lvalue. (neednonconst)
    ch -= 0x10000;
--------^

-- Mark Favas - m.favas@per.dem.csiro.au CSIRO, Private Bag No 5, Wembley, Western Australia 6913, AUSTRALIA
From mal@lemburg.com Wed Jun 27 13:10:57 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Wed, 27 Jun 2001 14:10:57 +0200 Subject: [Python-Dev] Unicode Maintenance Message-ID: <3B39CD51.406C28F0@lemburg.com>

Looking at the recent burst of checkins for the Unicode implementation, completely bypassing the standard SF procedure and possible comments I might have on the different approaches, I guess I've been ruled out as maintainer and designer of the Unicode implementation.

Well, I guess that's how things go. Was nice working for you guys, but no longer is... I'm tired of having to defend myself against meta-comments about the design, uncontrolled checkins and no true backup about my standing in all this from Guido.

Perhaps I am misunderstanding the role of a maintainer and implementation designer, but as it is, all respect for the work I've put into all this seems faded. That's the conclusion I draw from recent postings by Martin and Fredrik and their nightly "takeover".

Thanks,
-- Marc-Andre Lemburg CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting: http://www.egenix.com/
Python Software: http://www.lemburg.com/python/

From arigo@ulb.ac.be Wed Jun 27 13:18:43 2001 From: arigo@ulb.ac.be (Armin Rigo) Date: Wed, 27 Jun 2001 14:18:43 +0200 (CEST) Subject: [Python-Dev] Python Specializing Compiler In-Reply-To: <3B39C924.E865177D@ActiveState.com> Message-ID:

Hello Paul,

On Wed, 27 Jun 2001, Paul Prescod wrote:
> If you are thinking about incorporating some ideas from Tunes that's one
> thing. But if you want to use their code I would ask "what code?" I have
> heard about Tunes for several years now and not seen any visible forward
> progress.

Yes, I know this. I am myself a (recent) member of the Tunes project, and have made Tunes' goals mine.

Armin

From guido@digicool.com Wed Jun 27 15:32:23 2001 From: guido@digicool.com (Guido van Rossum) Date: Wed, 27 Jun 2001 10:32:23 -0400 Subject: [Python-Dev] Python Specializing Compiler In-Reply-To: Your message of "Wed, 27 Jun 2001 11:01:18 +0200." References: Message-ID: <200106271432.f5REWOn19377@odiug.digicool.com>

> Good remark. Anyone else have comments about this?

Not really, except to emphasize that inclusion of GPL'ed code in core Python is indeed a no-no.

--Guido van Rossum (home page: http://www.python.org/~guido/)

From fredrik@pythonware.com Wed Jun 27 15:48:02 2001 From: fredrik@pythonware.com (Fredrik Lundh) Date: Wed, 27 Jun 2001 16:48:02 +0200 Subject: [Python-Dev] New Unicode warnings References: Message-ID: <009e01c0ff19$e25b3f70$4ffa42d5@hagrid>

tim peters wrote:

> There are 3 functions now where the prototypes in unicodeobject.h don't
> match the definitions in unicodeobject.c. Like, in .h,
>
> extern DL_IMPORT(Py_UNICODE) _PyUnicode_ToTitlecase(
>     register const Py_UNICODE ch /* Unicode character */
>     );
>
> but in .c:
>
> Py_UNICODE _PyUnicode_ToTitlecase(register Py_UNICODE ch)

what's that "register" doing in a prototype? any reason we cannot just change the signature(s) to

    Py_UNICODE _PyUnicode_ToTitlecase(Py_UNICODE ch)

to make it look more like contemporary C code?
From fredrik@pythonware.com Wed Jun 27 15:49:31 2001 From: fredrik@pythonware.com (Fredrik Lundh) Date: Wed, 27 Jun 2001 16:49:31 +0200 Subject: [Python-Dev] Unicode fallout: Windows _winreg project broken References: <200106270712.f5R7CBh06458@mira.informatik.hu-berlin.de> Message-ID: <00a101c0ff19$e2a19740$4ffa42d5@hagrid>

martin wrote:
> > IOW, HAVE_USABLE_WCHAR_T no longer gets #define'd on Windows, and
> > that prevents unicodeobject.c from supplying routines _winreg.c
> > calls.
>
> The best thing, IMO, would be if PC/config.h defines everything
> available in config.h also. In this case, the proper defines would be
>
> #define Py_USING_UNICODE
> #define HAVE_USABLE_WCHAR_T
> #define Py_UNICODE_SIZE 2
> #define PY_UNICODE_TYPE wchar_t
>
> If that approach is used, the defaulting in Include/unicodeobject.h
> could go away.

my fault; I missed the HAVE_USABLE_WCHAR_T define when I tried to fix tim's fix. I'll fix it.

From guido@digicool.com Wed Jun 27 16:07:47 2001 From: guido@digicool.com (Guido van Rossum) Date: Wed, 27 Jun 2001 11:07:47 -0400 Subject: [Python-Dev] New Unicode warnings In-Reply-To: Your message of "Wed, 27 Jun 2001 16:48:02 +0200." <009e01c0ff19$e25b3f70$4ffa42d5@hagrid> References: <009e01c0ff19$e25b3f70$4ffa42d5@hagrid> Message-ID: <200106271507.f5RF7lq19494@odiug.digicool.com>

> tim peters wrote:
>
> > There are 3 functions now where the prototypes in unicodeobject.h don't
> > match the definitions in unicodeobject.c. Like, in .h,
> >
> > extern DL_IMPORT(Py_UNICODE) _PyUnicode_ToTitlecase(
> >     register const Py_UNICODE ch /* Unicode character */
> >     );
> >
> > but in .c:
> >
> > Py_UNICODE _PyUnicode_ToTitlecase(register Py_UNICODE ch)
>
> what's that "register" doing in a prototype?

Enjoying a day off?

> any reason we cannot just change the signature(s) to
>
>     Py_UNICODE _PyUnicode_ToTitlecase(Py_UNICODE ch)
>
> to make it look more like contemporary C code?

I cannot see how either register or const are going to make any difference in the prototype given that Py_UNICODE is a scalar type, so please just do it.

--Guido van Rossum (home page: http://www.python.org/~guido/)

From JamesL@Lugoj.Com Wed Jun 27 16:58:54 2001 From: JamesL@Lugoj.Com (James Logajan) Date: Wed, 27 Jun 2001 08:58:54 -0700 Subject: [Python-Dev] Re: PEP 260: simplify xrange() References: Message-ID: <3B3A02BE.21039365@Lugoj.Com>

Tim Peters wrote:
>
> [James Logajan]
> > Design mistakes one has made do tend to weigh on one's soul (speaking
> > from more than two decades of programming experience) so I understand
> > the primal urge to correct them when one can, and even when one
> > shouldn't.
>
> Is this a case when one shouldn't? That is, is it a specific comment on PEP
> 260, or just a general venting here?

Just a general bit of silly "<wink>" venting. Insert some non-zero fraction in the wink. I tried to insert some obvious absurdities to indicate I was not being very serious. (Yes, I know that one shouldn't try that in mixed company.)

From guido@digicool.com Wed Jun 27 17:11:49 2001 From: guido@digicool.com (Guido van Rossum) Date: Wed, 27 Jun 2001 12:11:49 -0400 Subject: [Python-Dev] Unicode Maintenance In-Reply-To: Your message of "Wed, 27 Jun 2001 14:10:57 +0200."
<3B39CD51.406C28F0@lemburg.com> References: <3B39CD51.406C28F0@lemburg.com> Message-ID: <200106271611.f5RGBn819631@odiug.digicool.com>

> Looking at the recent burst of checkins for the Unicode implementation
> completely bypassing the standard SF procedure and possible comments
> I might have on the different approaches, I guess I've been ruled out
> as maintainer and designer of the Unicode implementation.
>
> Well, I guess that's how things go. Was nice working for you guys,
> but no longer is... I'm tired of having to defend myself against
> meta-comments about the design, uncontrolled checkins and no true
> backup about my standing in all this from Guido.
>
> Perhaps I am misunderstanding the role of a maintainer and
> implementation designer, but as it is all respect for the work I've
> put into all this seems faded. That's the conclusion I draw from recent
> postings by Martin and Fredrik and their nightly "takeover".
>
> Thanks,
> --
> Marc-Andre Lemburg

[For those of us to whom Marc-Andre's complaint comes as a total surprise: there was a thread on i18n-sig about whether we should support Unicode surrogates, followed by a conclusion to skip surrogates and jump directly to optional support for UCS-4, followed by some checkins that enabled a configuration choice between UCS-2 and UCS-4, and code to make it work. As a side effect, surrogate support in the UCS-2 version actually improved slightly.]

Now, now, Marc-Andre. The only comments I recall from you on my "surrogates: just say no" post seemed favorable, except that you proposed to go all the way and make UCS-4 mandatory. I explained why I didn't want to go that far, and why I didn't believe your arguments against giving users a choice. I didn't hear back from you then, and I didn't think you could have much of a problem with my position.

Our process requires the use of the SF patch manager only for controversial changes. Based on your feedback, I didn't think there was anything controversial about the changes that Fredrik and Martin have made! (If there was, IMO it was temporarily breaking the Windows build and the test suite -- but that's all fixed now.)

I don't understand where you get the idea that we lost respect for your work! In fact, the fact that it was so easy to make the changes suggested to me that the original design was well suited to this particular change (as opposed to the surrogate support proposals, which all sounded like they would require a *lot* of changes).

I don't think that we have very strict roles in this community anyway. (My role as BDFL excluded -- that's why I get to write this response. :-) I'd say that Fredrik owns SRE, because he has asserted that ownership at various times: he's undone changes by others that broke the 1.5.2 support, for example. But the Unicode support in Python isn't owned by one person: many folks have contributed to that, including Fredrik, who designed and wrote the original Unicode string object implementation.

If you have specific comments about the changes made, please be specific. If you feel slighted by meta-comments, please also be specific. I don't think I've said anything derogatory about you or your design.

Paul Prescod offered to write a PEP on this issue. My cynical half believes that we'll never hear from him again, but my optimistic half hopes that he'll actually write one, so that we'll be able to discuss the various issues for the users with the users. I encourage you to co-author the PEP, since you have a lot of background knowledge about the issues.

BTW, I think that Misc/unicode.txt should be converted to a PEP, for the historic record. It was very much a PEP before the PEP process was invented. Barry, how much work would this be? No editing needed, just formatting, and assignment of a PEP number (the lower the better).

--Guido van Rossum (home page: http://www.python.org/~guido/)

From barry@digicool.com Wed Jun 27 17:24:30 2001 From: barry@digicool.com (Barry A. Warsaw) Date: Wed, 27 Jun 2001 12:24:30 -0400 Subject: [I18n-sig] Re: [Python-Dev] Unicode Maintenance References: <3B39CD51.406C28F0@lemburg.com> <200106271611.f5RGBn819631@odiug.digicool.com> Message-ID: <15162.2238.720508.508081@anthem.wooz.org>

>>>>> "GvR" == Guido van Rossum writes:

GvR> BTW, I think that Misc/unicode.txt should be converted to a
GvR> PEP, for the historic record. It was very much a PEP before
GvR> the PEP process was invented. Barry, how much work would
GvR> this be? No editing needed, just formatting, and assignment
GvR> of a PEP number (the lower the better).

Not much work at all, so I'll do this (and replace Misc/unicode.txt with a pointer to the PEP). Let's go with PEP 7, but stick it under the "Other Informational PEPs" category.

-Barry

From guido@digicool.com Wed Jun 27 17:36:05 2001 From: guido@digicool.com (Guido van Rossum) Date: Wed, 27 Jun 2001 12:36:05 -0400 Subject: [I18n-sig] Re: [Python-Dev] Unicode Maintenance In-Reply-To: Your message of "Wed, 27 Jun 2001 12:24:30 EDT." <15162.2238.720508.508081@anthem.wooz.org> References: <3B39CD51.406C28F0@lemburg.com> <200106271611.f5RGBn819631@odiug.digicool.com> <15162.2238.720508.508081@anthem.wooz.org> Message-ID: <200106271636.f5RGa5719660@odiug.digicool.com>

> GvR> BTW, I think that Misc/unicode.txt should be converted to a
> GvR> PEP, for the historic record. It was very much a PEP before
> GvR> the PEP process was invented. Barry, how much work would
> GvR> this be? No editing needed, just formatting, and assignment
> GvR> of a PEP number (the lower the better).
>
> Not much work at all, so I'll do this (and replace Misc/unicode.txt
> with a pointer to the PEP). Let's go with PEP 7, but stick it under
> the "Other Informational PEPs" category.
>
> -Barry

Rather than informational, how about "Standards Track - Accepted (or Final)"? That really matches the history best. I'd propose PEP number 100 -- the below-100 series is more for meta-PEPs.

--Guido van Rossum (home page: http://www.python.org/~guido/)

From barry@digicool.com Wed Jun 27 18:05:35 2001 From: barry@digicool.com (Barry A. Warsaw) Date: Wed, 27 Jun 2001 13:05:35 -0400 Subject: [I18n-sig] Re: [Python-Dev] Unicode Maintenance References: <3B39CD51.406C28F0@lemburg.com> <200106271611.f5RGBn819631@odiug.digicool.com> <15162.2238.720508.508081@anthem.wooz.org> <200106271636.f5RGa5719660@odiug.digicool.com> Message-ID: <15162.4703.741647.850696@anthem.wooz.org>

>>>>> "GvR" == Guido van Rossum writes:

GvR> Rather than informational, how about "Standard Track -
GvR> Accepted (or Final)"? That really matches the history best.
GvR> I'd propose PEP number 100 -- the below-100 series is more
GvR> for meta-PEPs.

Fine with me.

-Barry

From fdrake@acm.org Wed Jun 27 20:45:05 2001 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Wed, 27 Jun 2001 15:45:05 -0400 (EDT) Subject: [Python-Dev] New profiling interface Message-ID: <15162.14273.490573.156770@cj42289-a.reston1.va.home.com>

The new core interface I checked in allows profilers and tracers (debuggers, coverage tools) to be written in C. I still need to write documentation for it; that shouldn't be too far off though.

If anyone would like to have this available for Python 2.1.x, I have a version that I developed on the release20-maint branch. It can't be added to that branch since it's pretty clearly a new feature, but the patch is available at:

http://starship.python.net/crew/fdrake/patches/py21-profiling.patch

Enjoy!

-Fred

-- Fred L. Drake, Jr. PythonLabs at Digital Creations

From mark.favas@csiro.au Wed Jun 27 22:45:17 2001 From: mark.favas@csiro.au (Mark Favas) Date: Thu, 28 Jun 2001 05:45:17 +0800 Subject: [Python-Dev] unicode, "const"s and lvalues Message-ID: <3B3A53ED.A8EEE265@csiro.au>

Unreasonable as it may seem, my compiler really expects that entities declared as const's not be used in contexts where a modifiable lvalue is required. It gets all huffy, and refuses to continue compiling, even if I speak nicely (in unicode) to it. I'll file a bug report. On the code, not the compiler.

cc -c -O -Olimit 1500 -Dss_family=__ss_family -Dss_len=__ss_len -I. -I./Include -DHAVE_CONFIG_H -o Objects/unicodectype.o Objects/unicodectype.c
cc: Error: Objects/unicodectype.c, line 67: In this statement, "ch" has const-qualified type, but occurs in a context that requires a modifiable lvalue. (neednonconst)
    ch += ctype->title;
--------^
cc: Error: Objects/unicodectype.c, line 69: In this statement, "ch" has const-qualified type, but occurs in a context that requires a modifiable lvalue. (neednonconst)
    ch += ctype->upper;
--------^
cc: Error: Objects/unicodectype.c, line 74: In this statement, "ch" has const-qualified type, but occurs in a context that requires a modifiable lvalue. (neednonconst)
    ch -= 0x10000;
--------^
cc: Error: Objects/unicodectype.c, line 362: In this statement, "ch" has const-qualified type, but occurs in a context that requires a modifiable lvalue. (neednonconst)
    ch += ctype->upper;
----^
cc: Error: Objects/unicodectype.c, line 366: In this statement, "ch" has const-qualified type, but occurs in a context that requires a modifiable lvalue. (neednonconst)
    ch -= 0x10000;
--------^
cc: Error: Objects/unicodectype.c, line 378: In this statement, "ch" has const-qualified type, but occurs in a context that requires a modifiable lvalue. (neednonconst)
    ch += ctype->lower;
----^
cc: Error: Objects/unicodectype.c, line 382: In this statement, "ch" has const-qualified type, but occurs in a context that requires a modifiable lvalue. (neednonconst)
    ch -= 0x10000;
--------^

-- Mark Favas - m.favas@per.dem.csiro.au CSIRO, Private Bag No 5, Wembley, Western Australia 6913, AUSTRALIA

From guido@digicool.com Wed Jun 27 22:57:16 2001 From: guido@digicool.com (Guido van Rossum) Date: Wed, 27 Jun 2001 17:57:16 -0400 Subject: [Python-Dev] unicode, "const"s and lvalues In-Reply-To: Your message of "Thu, 28 Jun 2001 05:45:17 +0800." <3B3A53ED.A8EEE265@csiro.au> References: <3B3A53ED.A8EEE265@csiro.au> Message-ID: <200106272157.f5RLvGo20101@odiug.digicool.com>

> Unreasonable as it may seem, my compiler really expects that entities
> declared as const's not be used in contexts where a modifiable lvalue is
> required. It gets all huffy, and refuses to continue compiling, even if
> I speak nicely (in unicode) to it. I'll file a bug report. On the code,
> not the compiler.

VC++ also warns about this. I think the declaration of the Character Type APIs in unicodeobject.h really shouldn't include either register or const. Then their implementations should also lose the 'const'.
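[Given the UCS-2/UCS-4 configuration choice summarized earlier in this thread, a quick way to tell which flavor an interpreter was built with -- a sketch relying on sys.maxunicode, which is present from 2.2 on:]

    import sys

    # 0xFFFF on a narrow (UCS-2) build; 0x10FFFF on a wide (UCS-4) build.
    if sys.maxunicode == 0xFFFF:
        print "narrow (UCS-2) Py_UNICODE build"
    else:
        print "wide (UCS-4) Py_UNICODE build"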
BTW, I think that Misc/unicode.txt should be converted to a PEP, for the historic record. It was very much a PEP before the PEP process was invented. Barry, how much work would this be? No editing needed, just formatting, and assignment of a PEP number (the lower the better). --Guido van Rossum (home page: http://www.python.org/~guido/) From barry@digicool.com Wed Jun 27 17:24:30 2001 From: barry@digicool.com (Barry A. Warsaw) Date: Wed, 27 Jun 2001 12:24:30 -0400 Subject: [I18n-sig] Re: [Python-Dev] Unicode Maintenance References: <3B39CD51.406C28F0@lemburg.com> <200106271611.f5RGBn819631@odiug.digicool.com> Message-ID: <15162.2238.720508.508081@anthem.wooz.org> >>>>> "GvR" == Guido van Rossum writes: GvR> BTW, I think that Misc/unicode.txt should be converted to a GvR> PEP, for the historic record. It was very much a PEP before GvR> the PEP process was invented. Barry, how much work would GvR> this be? No editing needed, just formatting, and assignment GvR> of a PEP number (the lower the better). Not much work at all, so I'll do this (and replace Misc/unicode.txt with a pointer to the PEP). Let's go with PEP 7, but stick it under the "Other Informational PEPs" category. -Barry From guido@digicool.com Wed Jun 27 17:36:05 2001 From: guido@digicool.com (Guido van Rossum) Date: Wed, 27 Jun 2001 12:36:05 -0400 Subject: [I18n-sig] Re: [Python-Dev] Unicode Maintenance In-Reply-To: Your message of "Wed, 27 Jun 2001 12:24:30 EDT." <15162.2238.720508.508081@anthem.wooz.org> References: <3B39CD51.406C28F0@lemburg.com> <200106271611.f5RGBn819631@odiug.digicool.com> <15162.2238.720508.508081@anthem.wooz.org> Message-ID: <200106271636.f5RGa5719660@odiug.digicool.com> > GvR> BTW, I think that Misc/unicode.txt should be converted to a > GvR> PEP, for the historic record. It was very much a PEP before > GvR> the PEP process was invented. Barry, how much work would > GvR> this be? No editing needed, just formatting, and assignment > GvR> of a PEP number (the lower the better). > > Not much work at all, so I'll do this (and replace Misc/unicode.txt > with a pointer to the PEP). Let's go with PEP 7, but stick it under > the "Other Informational PEPs" category. > > -Barry Rather than informational, how about "Standard Track - Accepted (or Final)" ? That really matches the history best. I'd propose PEP number 100 -- the below-100 series is more for meta-PEPs. --Guido van Rossum (home page: http://www.python.org/~guido/) From barry@digicool.com Wed Jun 27 18:05:35 2001 From: barry@digicool.com (Barry A. Warsaw) Date: Wed, 27 Jun 2001 13:05:35 -0400 Subject: [I18n-sig] Re: [Python-Dev] Unicode Maintenance References: <3B39CD51.406C28F0@lemburg.com> <200106271611.f5RGBn819631@odiug.digicool.com> <15162.2238.720508.508081@anthem.wooz.org> <200106271636.f5RGa5719660@odiug.digicool.com> Message-ID: <15162.4703.741647.850696@anthem.wooz.org> >>>>> "GvR" == Guido van Rossum writes: GvR> Rather than informational, how about "Standard Track - GvR> Accepted (or Final)" ? That really matches the history best. GvR> I'd propose PEP number 100 -- the below-100 series is more GvR> for meta-PEPs. Fine with me. -Barry From fdrake@acm.org Wed Jun 27 20:45:05 2001 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Wed, 27 Jun 2001 15:45:05 -0400 (EDT) Subject: [Python-Dev] New profiling interface Message-ID: <15162.14273.490573.156770@cj42289-a.reston1.va.home.com> The new core interface I checked in allows profilers and tracers (debuggers, coverage tools) to be written in C. 
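The events involved are the same ones the Python-level hooks deliver today, so a pure-Python profiler is still the easiest way to see what such a tool receives (just a sketch of the existing sys.setprofile() interface, not of the new C entry points):

    import sys

    def profiler(frame, event, arg):
        # event is 'call', 'return' or 'exception';
        # arg depends on the event type
        print event, frame.f_code.co_name

    sys.setprofile(profiler)    # install the hook
    def f(): pass
    f()                         # triggers 'call' and 'return' events
    sys.setprofile(None)        # remove the hook

The point of the new C interface is to take that per-event Python function call out of the dispatch.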
I still need to write documentation for it; that shouldn't be too far off though. If anyone would like to have this available for Python 2.1.x, I have a version that I developed on the release20-maint branch. It can't be added to that branch since it's pretty clearly a new feature, but the patch is available at: http://starship.python.net/crew/fdrake/patches/py21-profiling.patch Enjoy! -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From mark.favas@csiro.au Wed Jun 27 22:45:17 2001 From: mark.favas@csiro.au (Mark Favas) Date: Thu, 28 Jun 2001 05:45:17 +0800 Subject: [Python-Dev] unicode, "const"s and lvalues Message-ID: <3B3A53ED.A8EEE265@csiro.au> Unreasonable as it may seem, my compiler really expects that entities declared as const's not be used in contexts where a modifiable lvalue is required. It gets all huffy, and refuses to continue compiling, even if I speak nicely (in unicode) to it. I'll file a bug report. On the code, not the compiler . cc -c -O -Olimit 1500 -Dss_family=__ss_family -Dss_len=__ss_len -I. -I./Include -DHAVE_CONFIG_H -o Objects/unicodectype.o Objects/unicodectype.c cc: Error: Objects/unicodectype.c, line 67: In this statement, "ch" has const-qualified type, but occurs in a context that requires a modifiable lvalue. (neednonconst) ch += ctype->title; --------^ cc: Error: Objects/unicodectype.c, line 69: In this statement, "ch" has const-qualified type, but occurs in a context that requires a modifiable lvalue. (neednonconst) ch += ctype->upper; --------^ cc: Error: Objects/unicodectype.c, line 74: In this statement, "ch" has const-qualified type, but occurs in a context that requires a modifiable lvalue. (neednonconst) ch -= 0x10000; --------^ cc: Error: Objects/unicodectype.c, line 362: In this statement, "ch" has const-qualified type, but occurs in a context that requires a modifiable lvalue. (neednonconst) ch += ctype->upper; ----^ cc: Error: Objects/unicodectype.c, line 366: In this statement, "ch" has const-qualified type, but occurs in a context that requires a modifiable lvalue. (neednonconst) ch -= 0x10000; --------^ cc: Error: Objects/unicodectype.c, line 378: In this statement, "ch" has const-qualified type, but occurs in a context that requires a modifiable lvalue. (neednonconst) ch += ctype->lower; ----^ cc: Error: Objects/unicodectype.c, line 382: In this statement, "ch" has const-qualified type, but occurs in a context that requires a modifiable lvalue. (neednonconst) ch -= 0x10000; --------^ make: *** [Objects/unicodectype.o] Error 1 -- Mark Favas - m.favas@per.dem.csiro.au CSIRO, Private Bag No 5, Wembley, Western Australia 6913, AUSTRALIA From guido@digicool.com Wed Jun 27 22:57:16 2001 From: guido@digicool.com (Guido van Rossum) Date: Wed, 27 Jun 2001 17:57:16 -0400 Subject: [Python-Dev] unicode, "const"s and lvalues In-Reply-To: Your message of "Thu, 28 Jun 2001 05:45:17 +0800." <3B3A53ED.A8EEE265@csiro.au> References: <3B3A53ED.A8EEE265@csiro.au> Message-ID: <200106272157.f5RLvGo20101@odiug.digicool.com> > Unreasonable as it may seem, my compiler really expects that entities > declared as const's not be used in contexts where a modifiable lvalue is > required. It gets all huffy, and refuses to continue compiling, even if > I speak nicely (in unicode) to it. I'll file a bug report. On the code, > not the compiler . VC++ also warns about this. I think the declaration of the Character Type APIs in unicodeobject.h really shouldn't include either register or char. Then their implementations should also lose the 'const'. 
--Guido van Rossum (home page: http://www.python.org/~guido/) From tim.one@home.com Wed Jun 27 22:58:34 2001 From: tim.one@home.com (Tim Peters) Date: Wed, 27 Jun 2001 17:58:34 -0400 Subject: [Python-Dev] unicode, "const"s and lvalues In-Reply-To: <3B3A53ED.A8EEE265@csiro.au> Message-ID: [Mark Favas] > Unreasonable as it may seem, my compiler really expects that entities > declared as const's not be used in contexts where a modifiable lvalue is > required. It gets all huffy, and refuses to continue compiling, even if > I speak nicely (in unicode) to it. I'll file a bug report. No real need, this was already brought up about 13 hours ago, although maybe that was only on the i18n-sig. I was left with the vague impression that Fredrik intended to fix it. If it's not fixed by tomorrow, you can make me feel guilty enough to fix it (I first reported it, so I guess it's my problem ). could've-been-yours!-ly y'rs - tim From fredrik@pythonware.com Wed Jun 27 23:42:14 2001 From: fredrik@pythonware.com (Fredrik Lundh) Date: Thu, 28 Jun 2001 00:42:14 +0200 Subject: [Python-Dev] unicode, "const"s and lvalues References: <3B3A53ED.A8EEE265@csiro.au> Message-ID: <00b701c0ff5a$6ab8f660$4ffa42d5@hagrid> mark wrote: > Unreasonable as it may seem, my compiler really expects that entities > declared as const's not be used in contexts where a modifiable lvalue is > required. it's fixed now, I think. (btw, unreasonable as it may seem, your mail server refuses to accept mail sent to your reply address, even if I speak nicely to it ;-) Cheers /F From fdrake@acm.org Thu Jun 28 03:44:54 2001 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Wed, 27 Jun 2001 22:44:54 -0400 (EDT) Subject: [Python-Dev] NIS on Linux, others? Message-ID: <15162.39462.75553.378146@cj42289-a.reston1.va.home.com> Is anyone here using NIS (Sun's old "Yellow Pages" service)? There's a bug for this on Linux that's been assigned to me for some time, but I don't have access to a network using NIS. Can anyone either confirm the bug or the fix? Or at least confirm that the suggested fix doesn't break the nis module on some other platform? (Testing this on a Sun SPARC box would be really nice!) I'd really appreciate some help on this one. The bug report is: http://sourceforge.net/tracker/index.php?func=detail&aid=233084&group_id=5470&atid=105470 Thanks! -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From thomas@xs4all.net Thu Jun 28 09:13:09 2001 From: thomas@xs4all.net (Thomas Wouters) Date: Thu, 28 Jun 2001 10:13:09 +0200 Subject: [Python-Dev] NIS on Linux, others? In-Reply-To: <15162.39462.75553.378146@cj42289-a.reston1.va.home.com> References: <15162.39462.75553.378146@cj42289-a.reston1.va.home.com> Message-ID: <20010628101309.X8098@xs4all.nl> On Wed, Jun 27, 2001 at 10:44:54PM -0400, Fred L. Drake, Jr. wrote: > Is anyone here using NIS (Sun's old "Yellow Pages" service)? > There's a bug for this on Linux that's been assigned to me for some > time, but I don't have access to a network using NIS. Can anyone > either confirm the bug or the fix? Or at least confirm that the > suggested fix doesn't break the nis module on some other platform? > (Testing this on a Sun SPARC box would be really nice!) > I'd really appreciate some help on this one. The bug report is: If noone else pops up, I'll setup a small NIS network at home to test it when my new computer arrives (a week or two.) We use NIS a lot at work, but not on Linux machines (the 16-bit uid limitation prevented us from using Linux for user-accessible machines for a long time.) 
-- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From mal@lemburg.com Thu Jun 28 10:04:07 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Thu, 28 Jun 2001 11:04:07 +0200 Subject: [Python-Dev] Unicode Maintenance References: <3B39CD51.406C28F0@lemburg.com> <200106271611.f5RGBn819631@odiug.digicool.com> Message-ID: <3B3AF307.6496AFB4@lemburg.com> Guido van Rossum wrote: > > > Looking at the recent burst of checkins for the Unicode implementation > > completely bypassing the standard SF procedure and possible comments > > I might have on the different approaches, I guess I've been ruled out > > as maintainer and designer of the Unicode implementation. > > > > Well, I guess that's how things go. Was nice working for you guys, > > but no longer is... I'm tired of having to defend myself against > > meta-comments about the design, uncontrolled checkins and no true > > backup about my standing in all this from Guido. > > > > Perhaps I am misunderstanding the role of a maintainer and > > implementation designer, but as it is all respect for the work I've > > put into all this seems faded. That's the conclusion I draw from recent > > postings by Martin and Fredrik and their nightly "takeover". > > > > Thanks, > > -- > > Marc-Andre Lemburg > > [For those of us to whom Marc-Andre's complaint comes as a total > surprise: there was a thread on i18n-sig about whether we should > support Unicode surrogates, followed by a conclusion to skip > surrogates and jump directly to optional support for UCS-4, followed > by some checkins that enabled a configuration choice between UCS-2 and > UCS-4, and code to make it work. As a side effect, surrogate support > in the UCS-2 version actually improved slightly.] > > Now, now, Marc-Andre. > > The only comments I recall from you on my "surrogates: just say no" > post seemed favorable, except that you proposed to to all the way and > make UCS-4 mandatory. I explained why I didn't want to go that far, > and why I didn't believe your arguments against giving users a choice. > I didn't hear back from you then, and I didn't think you could have > much of a problem with my position. > > Our process requires the use of the SF patch manager only for > controversial changes. Based on your feedback, I didn't think there > was anything controversial about the changes that Fredrik and Martin > have made! (If there was, IMO it was temporarily breaking the Windows > build and the test suite -- but that's all fixed now.) > > I don't understand where you get the idea that we lost respect for > your work! In fact, the fact that it was so easy to make the changes > suggested to me that the original design was well suited to this > particular change (as opposed to the surrugate support proposals, > which all sounded like they would require a *lot* of changes). > > I don't think that we have very strict roles in this community anyway. > (My role as BDFL excluded -- that's why I get to write this > response. :-) I'd say that Fredrik owns SRE, because he has asserted > that ownership at various times: he's undone changes by others that > broke the 1.5.2 support, for example. > > But the Unicode support in Python isn't owned by one person: many > folks have contributed to that, including Fredrik, who designed and > wrote the original Unicode string object implementation. > > If you have specific comments about the changes made, please be > specific. If you feel slighted by meta-comments, please also be > specific. 
I don't think I've said anything derogatory about you or > your design. You didn't get my point. I feel responsible for the Unicode implementation design and would like to see it become a continued success. In that sense and taking into account that I am the maintainer of all this stuff, I think it is very reasonable to ask me before making any significant changes to the implementation and also respect any comments I put forward. Currently, I have to watch the checkins list very closely to find out who changed what in the implementation and then to take actions only after the fact. Since I'm not supporting Unicode as my full-time job this is simply impossible. We have the SF manager and there is really no need to rush anything around here. If I am offline or too busy with other things for a day or two, then I want to see patches on SF and not find new versions of the implementation already checked in. This has worked just fine during the last year, so I can only explain the latest actions in this direction with an urge to bypass my comments and any discussion this might cause. Needless to say that quality control is not possible anymore. Conclusion: I am not going to continue this work if this does not change. Another problem for me is the continued hostility I feel on i18n against parts of the design and some of my decisions. I am not talking about your feedback and the feedback from many other people on the list which was excellent and of a high standard. But reading the postings of the last few months you will find notices of what I am referring to here (no, I don't want to be specific). If people don't respect my comments or decisions, then how can I defend the design and how can I stop endless discussions which simply don't lead anywhere? So either I am missing something or there is a need for a clear statement from you about my status in all this. If I don't have the right to comment on proposals and patches, possibly even rejecting them, then I simply don't see any ground for keeping the implementation in a state which I can maintain. And last but not least: The fun-factor has faded, which was the main motor driving me into working on Unicode in the first place. Nothing much you can do about this, though :-/ > Paul Prescod offered to write a PEP on this issue. My cynical half > believes that we'll never hear from him again, but my optimistic half > hopes that he'll actually write one, so that we'll be able to discuss > the various issues for the users with the users. I encourage you to > co-author the PEP, since you have a lot of background knowledge about > the issues. I guess your optimistic half won :-) I think Paul already did all the work, so I'll simply comment on what he wrote. > BTW, I think that Misc/unicode.txt should be converted to a PEP, for > the historic record. It was very much a PEP before the PEP process > was invented. Barry, how much work would this be? No editing needed, > just formatting, and assignment of a PEP number (the lower the better). Thanks for converting the text to PEP format, Barry.
Thanks for reading this far, -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From guido@digicool.com Thu Jun 28 13:25:14 2001 From: guido@digicool.com (Guido van Rossum) Date: Thu, 28 Jun 2001 08:25:14 -0400 Subject: [Python-Dev] Unicode Maintenance In-Reply-To: Your message of "Thu, 28 Jun 2001 11:04:07 +0200." <3B3AF307.6496AFB4@lemburg.com> References: <3B39CD51.406C28F0@lemburg.com> <200106271611.f5RGBn819631@odiug.digicool.com> <3B3AF307.6496AFB4@lemburg.com> Message-ID: <200106281225.f5SCPIr20874@odiug.digicool.com> Hi Marc-Andre, I'm dropping the i18n-sig from the distribution list. I hear you: > You didn't get my point. I feel responsible for the Unicode > implementation design and would like to see it become a continued > success. I'm sure we all share this goal! > In that sense and taking into account that I am the > maintainer of all this stuff, I think it is very reasonable to > ask me before making any significant changes to the implementation > and also respect any comments I put forward. I understand you feel that we've rushed this in without waiting for your comments. Given how close your implementation was, I still feel that the changes weren't that significant, but I understand that you get nervous. If Christian were to check in his speed hack changes to the guts of ceval.c I would be nervous too! (Heck, I got nervous when Eric checked in his library-wide string method changes without asking.) Next time I'll try to be more sensitive to situations that require your review before going forward. > Currently, I have to watch the checkins list very closely > to find out who changed what in the implementation and then to > take actions only after the fact. Since I'm not supporting Unicode > as my full-time job this is simply impossible. We have the SF manager > and there is really no need to rush anything around here. Hm, apart from the fact that you ought to be left in charge, I think that in this case the live checkins were a big win over the usual SF process. At least two people were making changes, sometimes to each other's code, and many others on at least three continents were checking out the changes on many different platforms and immediately reporting problems. We would definitely not have a patch as solid as the code that's now checked in, after two days of using SF! (We could've used a branch, but I've found that getting people to actually check out the branch is not easy.) So I think that the net result was favorable. Sometimes you just have to let people work in the spur of the moment to get the results of their best thinking, otherwise they lose interest or their train of thought. > If I am offline or too busy with other things for a day or two, > then I want to see patches on SF and not find new versions of > the implementation already checked in. That's still the general rule, but in our enthusiasm (and mine was definitely part of this!) we didn't want to wait. Also, I have to admit that I mistook your silence for consent -- I didn't think the main proposed changes (making the size of Py_UNICODE a config choice) were controversial at all, so I didn't realize you would have a problem with it. > This has worked just fine during the last year, so I can only explain > the latest actions in this direction with an urge to bypass my comments > and any discussion this might cause.
I think you're projecting your own stuff here. I honestly didn't think there was much disagreement on your part and thought we were doing you a favor by implementing the consensus. IMO, Martin and Fredrik are familiar enough with both the code and the issues to do a good job. > Needless to say that > quality control is not possible anymore. Unclear. Lots of other people looked over the changes in your absence. And CVS makes code review after it's checked in easy enough. (Hey, in many other open source projects that's the normal procedure once the rough characteristics of a feature have been agreed upon: check in first and review later!) > Conclusion: > I am not going to continue this work if this does not change. That would be sad, and I hope you will stay with us. We certainly don't plan to ignore your comments! > Another problem for me is the continued hostility I feel on i18n > against parts of the design and some of my decisions. I am > not talking about your feedback and the feedback from many other > people on the list which was excellent and of a high standard. > But reading the postings of the last few months you will > find notices of what I am referring to here (no, I don't want > to be specific). I don't know what to say about this, and obviously nobody has the time to go back and read the archives. I'm sure it's not you as a person that was attacked. If the design isn't perfect -- and hey, since Python is the 80 percent language, few things in it are quite perfect! -- then (positive) criticism is an attempt to help, to move it closer to perfection. If people have at times said "the Unicode support sucks", well, that may hurt. You can't always stay friends with everybody. I get flames occasionally for features in Python that folks don't like. I get used to them, and it doesn't affect my confidence any more. Be the same! But sometimes, after saying "it sucks", people make specific suggestions for improvements, and it's important to be open for those even from sources that use offending language. (Within reason, of course. I don't ask you to listen to somebody who is persistently hostile to you as a person.) > If people don't respect my comments or decisions, then how can > I defend the design and how can I stop endless discussions which > simply don't lead anywhere? So either I am missing something > or there is a need for a clear statement from you about > my status in all this. Do you really *want* to be the Unicode BDFL? Being something's BDFL is a full-time job, and you've indicated you're too busy. (Or is that temporary?) I see you as the original coder, which means that you know that section of the code better than anyone, and whenever there's a question that others can't answer about its design, implementation, or restrictions, I refer to you. But given that you've said you wouldn't be able to work much on it, I welcome contributions by others as long as they seem knowledgeable. > If I don't have the right to comment on proposals and patches, > possibly even rejecting them, then I simply don't see any > ground for keeping the implementation in a state which I can > maintain. Nobody said you couldn't comment, and you know that. When it comes to rejecting or accepting, I feel that I am still the final arbiter, even for Unicode, until I get hit by a bus.
Since I don't always understand the implementation or the issues, I'll of course defer to you in cases where I think I can't make the decision, but I do reserve the right to be convinced by others to override your judgement, occasionally, if there's a good reason. And when you're not responsive, I may try to channel you. (I'll try to be more explicit about that.) > And last but not least: The fun-factor has faded, which was > the main motor driving me into working on Unicode in the first > place. Nothing much you can do about this, though :-/ Yes, that happens to all of us at times. The fun factor goes up and down, and sometimes we must look for fun elsewhere for a while. Then the fun may come back where it appeared lost. Go on vacation, read a book, tackle a new project in a totally different area! Then come back and see if you can find some fun in the old stuff again. > > Paul Prescod offered to write a PEP on this issue. My cynical half > > believes that we'll never hear from him again, but my optimistic half > > hopes that he'll actually write one, so that we'll be able to discuss > > the various issues for the users with the users. I encourage you to > > co-author the PEP, since you have a lot of background knowledge about > > the issues. > > I guess your optimistic half won :-) I think Paul already did all the > work, so I'll simply comment on what he wrote. Your suggestions were very valuable. My opinion of Paul also went up a notch! > > BTW, I think that Misc/unicode.txt should be converted to a PEP, for > > the historic record. It was very much a PEP before the PEP process > > was invented. Barry, how much work would this be? No editing needed, > > just formatting, and assignment of a PEP number (the lower the better). > > Thanks for converting the text to PEP format, Barry. > > Thanks for reading this far, You're welcome, and likewise. Just one more thing, Marc-Andre. Please know that I respect your work very much even if we don't always agree. We would get by without you, but Python would be hurt if you turned your back on us. --Guido van Rossum (home page: http://www.python.org/~guido/) From arigo@ulb.ac.be Thu Jun 28 14:04:06 2001 From: arigo@ulb.ac.be (Armin Rigo) Date: Thu, 28 Jun 2001 15:04:06 +0200 (CEST) Subject: [Python-Dev] Python Specializing Compiler In-Reply-To: <3B393E92.B0719A7A@ulb.ac.be> Message-ID: On Tue, 26 Jun 2001, Armin Rigo wrote: > I am considering using GNU Lightning to produce code from the Psyco > compiler. I just found "vcode" (http://www.pdos.lcs.mit.edu/~engler/pldi96-abstract.html), which seems very interesting for portable JIT code generation. I am considering using it for Psyco. Does anyone have experience with vcode? Or any other comments? Armin. From gball@cfa.harvard.edu Thu Jun 28 16:26:36 2001 From: gball@cfa.harvard.edu (Greg Ball) Date: Thu, 28 Jun 2001 11:26:36 -0400 (EDT) Subject: [Python-Dev] NIS on Linux, others? Message-ID: Short version: I can confirm that bug under Linux, but the patch breaks the nis module on Solaris. Linux machine is: Linux malhar 2.2.16-3smp #1 SMP Mon Jun 19 17:37:04 EDT 2000 i686 unknown with a Python version from recent CVS. I see the reported bug and the suggested patch does fix the problem. Sparc box looks like this: SunOS cfa0 5.8 Generic_108528-06 sun4u sparc SUNW,Ultra-Enterprise using the Python 2.0 source tree. The nis module works out of the box, but applying the suggested patch breaks it: 'nis.error: No such key in map'.
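For anyone else who wants to try this on another platform, a minimal way to exercise the nis module from the interactive prompt (the map and key names here are only illustrative guesses -- substitute whatever your NIS server actually exports):

    import nis

    print nis.maps()                  # names of the maps the server exports
    print nis.cat('mail.aliases')     # dump one map as a dictionary
    print nis.match('postmaster', 'mail.aliases')   # single-key lookup

With the suggested patch applied on the Solaris box, the cat/match calls are where the 'nis.error: No such key in map' shows up.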
--Greg Ball From gregor@hoffleit.de Thu Jun 28 20:56:35 2001 From: gregor@hoffleit.de (Gregor Hoffleit) Date: Thu, 28 Jun 2001 21:56:35 +0200 Subject: [Python-Dev] MAGIC after 2001 ? Message-ID: <20010628215635.A5621@53b.hoffleit.de> Correct me, but AFAICS there are only 186 days left until Python's MAGIC scheme overflows: /* XXX Perhaps the magic number should be frozen and a version field added to the .pyc file header? */ /* New way to come up with the magic number: (YEAR-1995), MONTH, DAY */ #define MAGIC (60202 | ((long)'\r'<<16) | ((long)'\n'<<24)) I couldn't find this problem in the SF bug tracking system. Should I submit a new bug entry ? Gregor From jack@oratrix.nl Thu Jun 28 22:03:47 2001 From: jack@oratrix.nl (Jack Jansen) Date: Thu, 28 Jun 2001 23:03:47 +0200 Subject: [Python-Dev] Passing silly values to time.strftime Message-ID: <20010628210352.33157120260@oratrix.oratrix.nl> Just noted (that's Just-the-person, not me-just-noting:-) that on the Mac time.strftime() can blow up with an access violation if you pass silly values to it (such as 9 zeroes). Does anyone know enough of the ANSI standard to tell me how strftime should behave with out-of-range values? I.e. should I report this as a bug to MetroWerks or should we rig up time.strftime() to check that all the values are in range? -- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen@oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | ++++ see http://www.xs4all.nl/~tank/ ++++ From jack@oratrix.nl Thu Jun 28 22:12:45 2001 From: jack@oratrix.nl (Jack Jansen) Date: Thu, 28 Jun 2001 23:12:45 +0200 Subject: [Python-Dev] Passing silly values to time.strftime In-Reply-To: Message by Jack Jansen , Thu, 28 Jun 2001 23:03:47 +0200 , <20010628210352.33157120260@oratrix.oratrix.nl> Message-ID: <20010628211250.4A6BC120260@oratrix.oratrix.nl> Recently, Jack Jansen said: > Just noted (that's Just-the-person, not me-just-noting:-) that on the > Mac time.strftime() can blow up with an access violation if you pass > silly values to it (such as 9 zeroes). Following up to myself, after I just noticed (just-me-noticing, not Just-the-person this time) that all zeros is a legal C value: gettmarg() converts this all-zeroes tuple to (0, 0, 0, 0, -1, 100, 0, -1, 0) Fine with me, apparently Python wants to have human-understandable (1-based) monthnumbers and yeardaynumbers, but then I think it really should also check that the values are in-range. What do others think? -- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen@oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From Jason.Tishler@dothill.com Thu Jun 28 22:17:15 2001 From: Jason.Tishler@dothill.com (Jason Tishler) Date: Thu, 28 Jun 2001 17:17:15 -0400 Subject: [Python-Dev] Threaded Cygwin Python Import Problem Message-ID: <20010628171715.P488@dothill.com> --arHUdbnEaPHuuMIt Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Thanks to Rob Collins (implementer) and Greg Smith (profiler), Cygwin now provides enough pthreads support so that Cygwin Python builds OOTB *and* functions reasonably well even with threads enabled. Unfortunately, there are still a few issues that need to be resolved. The one that I would like to address in this posting prevents a threaded Cygwin Python from building the standard extension modules (without some kind of intervention). 
:,( Specifically, the build would frequently hang during the Distutils part when Cygwin Python is attempting to execvp a gcc process. See the first attachment, test.py, for a minimal Python script that exhibits the hang. See the second attachment, test.c, for a rewrite of test.py in C. Since test.c did not hang, I was able to conclude that this was not just a straight Cygwin problem. Further tracing uncovered that the hang occurs in _execvpe() (in os.py), when the child tries to import tempfile. If I apply the third attachment, os.py.patch, then the hang is avoided. Hence, it appears that importing a module (or specifically the tempfile module) in a threaded Cygwin Python child causes a hang. I saw the following comment in _execvpe(): # Process handling (fork, wait) under BeOS (up to 5.0) # doesn't interoperate reliably with the thread interlocking # that happens during an import. The actual error we need # is the same on BeOS for posix.open() et al., ENOENT. The above makes me think that possibly Cygwin is having a similar problem. Can anyone offer suggestions on how to further debug this problem? Thanks, Jason -- Jason Tishler Director, Software Engineering Phone: 732.264.8770 x235 Dot Hill Systems Corp. Fax: 732.264.8798 82 Bethany Road, Suite 7 Email: Jason.Tishler@dothill.com Hazlet, NJ 07730 USA WWW: http://www.dothill.com --arHUdbnEaPHuuMIt Content-Type: text/plain; charset=us-ascii Content-Disposition: attachment; filename="test.py" import os cmd = ['ls', '-l'] pid = os.fork() if pid == 0: print 'child execvp-ing' os.execvp(cmd[0], cmd) else: (pid, status) = os.waitpid(pid, 0) print 'status =', status print 'parent done' --arHUdbnEaPHuuMIt Content-Type: text/plain; charset=us-ascii Content-Disposition: attachment; filename="test.c" #include <stdio.h> #include <unistd.h> #include <sys/wait.h> /* header names reconstructed; the archive stripped the original #include targets */ char* const cmd[] = {"ls", "-l", 0}; int main() { int status; pid_t pid = fork(); if (pid == 0) { printf("child execvp-ing\n"); execvp(cmd[0], cmd); } else { waitpid(pid, &status, 0); printf("status = %d\n", status); printf("parent done\n"); } } --arHUdbnEaPHuuMIt Content-Type: text/plain; charset=us-ascii Content-Disposition: attachment; filename="os.py.patch" --- os.py.orig Thu Jun 28 16:14:28 2001 +++ os.py Thu Jun 28 16:30:12 2001 @@ -329,8 +329,9 @@ def _execvpe(file, args, env=None): try: unlink('/_#.# ## #.#') except error, _notfound: pass else: - import tempfile - t = tempfile.mktemp() + #import tempfile + #t = tempfile.mktemp() + t = '/mnt/c/TEMP/@279.3' # Exec a file that is guaranteed not to exist try: execv(t, ('blah',)) except error, _notfound: pass --arHUdbnEaPHuuMIt-- From tim@digicool.com Thu Jun 28 22:24:17 2001 From: tim@digicool.com (Tim Peters) Date: Thu, 28 Jun 2001 17:24:17 -0400 Subject: [Python-Dev] MAGIC after 2001 ? In-Reply-To: <20010628215635.A5621@53b.hoffleit.de> Message-ID: [Gregor Hoffleit] > Correct me, Can't: you're correct. > but AFAICS there are only 186 days left until Python's MAGIC scheme > overflows: > > /* XXX Perhaps the magic number should be frozen and a version field > added to the .pyc file header? */ > /* New way to come up with the magic number: (YEAR-1995), MONTH, DAY */ > #define MAGIC (60202 | ((long)'\r'<<16) | ((long)'\n'<<24)) > > I couldn't find this problem in the SF bug tracking system. Should I > submit a new bug entry ? Somebody should! It's a known problem, but the last crusade to redefine it ended up with 85% of a spec but no worker bees.
If that continues, note that it has no effect on whether existing Python releases will continue to run, it just means we can't release new versions -- but now that the licensing issue is settled, I think we'll just close down the project instead. fun-while-it-lasted-ly y'rs - tim From paulp@ActiveState.com Fri Jun 29 03:59:45 2001 From: paulp@ActiveState.com (Paul Prescod) Date: Thu, 28 Jun 2001 19:59:45 -0700 Subject: [Python-Dev] [Fwd: PEP: Support for "wide" Unicode characters] Message-ID: <3B3BEF21.63411C4C@ActiveState.com> Slow python-dev day...consider this exciting new proposal to deal with important new characters like the Japanese dentistry symbols and ecological symbols (but not Klingon) -------- Original Message -------- Subject: PEP: Support for "wide" Unicode characters Date: Thu, 28 Jun 2001 15:33:00 -0700 From: Paul Prescod Organization: ActiveState To: "python-list@python.org" PEP: 261 Title: Support for "wide" Unicode characters Version: $Revision: 1.3 $ Author: paulp@activestate.com (Paul Prescod) Status: Draft Type: Standards Track Created: 27-Jun-2001 Python-Version: 2.2 Post-History: 27-Jun-2001, 28-Jun-2001 Abstract Python 2.1 unicode characters can have ordinals only up to 2**16 - 1. These characters are known as Basic Multilingual Plane characters. There are now characters in Unicode that live on other "planes". The largest addressable character in Unicode has the ordinal 17 * 2**16 - 1 (0x10ffff). For readability, we will call this TOPCHAR and call characters in this range "wide characters". Glossary Character Used by itself, means the addressable units of a Python Unicode string. Code point If you imagine Unicode as a mapping from integers to characters, each integer represents a code point. Some are really used for characters. Some will someday be used for characters. Some are guaranteed never to be used for characters. Unicode character A code point defined in the Unicode standard whether it is already assigned or not. Identified by an integer. Code unit An integer representing a character in some encoding. Surrogate pair Two code units that represent a single Unicode character. Proposed Solution One solution would be to merely increase the maximum ordinal to a larger value. Unfortunately the only straightforward implementation of this idea is to increase the character code unit to 4 bytes. This has the effect of doubling the size of most Unicode strings. In order to avoid imposing this cost on every user, Python 2.2 will allow 4-byte Unicode characters as a build-time option. Users can choose whether they care about wide characters or prefer to preserve memory. The 4-byte option is called "wide Py_UNICODE". The 2-byte option is called "narrow Py_UNICODE". Most things will behave identically in the wide and narrow worlds. * unichr(i) for 0 <= i < 2**16 (0x10000) always returns a length-one string. * unichr(i) for 2**16 <= i <= TOPCHAR will return a length-one string representing the character on wide Python builds. On narrow builds it will raise ValueError. ISSUE: Python currently allows \U literals that cannot be represented as a single character. It generates two characters known as a "surrogate pair". Should this be disallowed on future narrow Python builds? ISSUE: Should Python allow the construction of characters that do not correspond to Unicode characters? Unassigned Unicode characters should obviously be legal (because they could be assigned at any time). But code points above TOPCHAR are guaranteed never to be used by Unicode.
Should we allow access to them anyhow? * ord() is always the inverse of unichr() * There is an integer value in the sys module that describes the largest ordinal for a Unicode character on the current interpreter. sys.maxunicode is 2**16-1 (0xffff) on narrow builds of Python and TOPCHAR on wide builds. ISSUE: Should there be distinct constants for accessing TOPCHAR and the real upper bound for the domain of unichr (if they differ)? There has also been a suggestion of sys.unicodewidth which can take the values 'wide' and 'narrow'. * codecs will be upgraded to support "wide characters" (represented directly in UCS-4, as surrogate pairs in UTF-16 and as multi-byte sequences in UTF-8). On narrow Python builds, the codecs will generate surrogate pairs, on wide Python builds they will generate a single character. This is the main part of the implementation left to be done. * there are no restrictions on constructing strings that use code points "reserved for surrogates" improperly. These are called "isolated surrogates". The codecs should disallow reading these but you could construct them using string literals or unichr(). unichr() is not restricted to values less than either TOPCHAR or sys.maxunicode. Implementation There is a new (experimental) define: #define PY_UNICODE_SIZE 2 There are new configure options: --enable-unicode=ucs2 configures a narrow Py_UNICODE, and uses wchar_t if it fits --enable-unicode=ucs4 configures a wide Py_UNICODE, and uses wchar_t if it fits --enable-unicode same as "=ucs2" The intention is that --disable-unicode, or --enable-unicode=no removes the Unicode type altogether; this is not yet implemented. Notes This PEP does NOT imply that people using Unicode need to use a 4-byte encoding. It only allows them to do so. For example, ASCII is still a legitimate (7-bit) Unicode-encoding. Rationale for Surrogate Creation Behaviour Python currently supports the construction of a surrogate pair for a large unicode literal character escape sequence. This is basically designed as a simple way to construct "wide characters" even in a narrow Python build. ISSUE: surrogates can be created this way but the user still needs to be careful about slicing, indexing, printing etc. Another option is to remove knowledge of surrogates from everything other than the codecs. Rejected Suggestions There were two primary solutions that were rejected. The first was more or less the status-quo. We could officially say that Python characters represent UTF-16 code units and require programmers to implement wide characters in their application logic. This is a heavy burden because emulating 32-bit characters is likely to be very inefficient if it is coded entirely in Python. Plus these abstracted pseudo-strings would not be legal as input to the regular expression engine. The other class of solution is to use some efficient storage internally but present an abstraction of wide characters to the programmer. Any of these would require a much more complex implementation than the accepted solution. For instance consider the impact on the regular expression engine. In theory, we could move to this implementation in the future without breaking Python code. A future Python could "emulate" wide Python semantics on narrow Python. Copyright This document has been placed in the public domain. Local Variables: mode: indented-text indent-tabs-mode: nil End: -- http://mail.python.org/mailman/listinfo/python-list From fdrake@acm.org Fri Jun 29 15:03:28 2001 From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Fri, 29 Jun 2001 10:03:28 -0400 (EDT) Subject: [Python-Dev] NIS on Linux, others? In-Reply-To: References: Message-ID: <15164.35504.255476.169687@cj42289-a.reston1.va.home.com> Greg Ball writes: > Short version: I can confirm that bug under linux, but the patch breaks > nis module on solaris. I'm presuming that these were using the same NIS server? I'm wondering if this may be an endianess-related problem. I don't understand enough about the NIS protocols to know what's going on in that module. -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From mal@egenix.com Fri Jun 29 15:51:04 2001 From: mal@egenix.com (M.-A. Lemburg) Date: Fri, 29 Jun 2001 16:51:04 +0200 Subject: [Python-Dev] Support for "wide" Unicode characters References: <3B3BEF21.63411C4C@ActiveState.com> Message-ID: <3B3C95D8.518E5175@egenix.com> Paul Prescod wrote: > > Slow python-dev day...consider this exiting new proposal to allow deal > with important new characters like the Japanese dentristy symbols and > ecological symbols (but not Klingon) More comments... > -------- Original Message -------- > Subject: PEP: Support for "wide" Unicode characters > Date: Thu, 28 Jun 2001 15:33:00 -0700 > From: Paul Prescod > Organization: ActiveState > To: "python-list@python.org" > > PEP: 261 > Title: Support for "wide" Unicode characters > Version: $Revision: 1.3 $ > Author: paulp@activestate.com (Paul Prescod) > Status: Draft > Type: Standards Track > Created: 27-Jun-2001 > Python-Version: 2.2 > Post-History: 27-Jun-2001, 28-Jun-2001 > > Abstract > > Python 2.1 unicode characters can have ordinals only up to 2**16-1. > These characters are known as Basic Multilinual Plane characters. > There are now characters in Unicode that live on other "planes". > The largest addressable character in Unicode has the ordinal 17 * > 2**16 - 1 (0x10ffff). For readability, we will call this TOPCHAR > and call characters in this range "wide characters". > > Glossary > > Character > > Used by itself, means the addressable units of a Python > Unicode string. > > Code point > > If you imagine Unicode as a mapping from integers to > characters, each integer represents a code point. Some are > really used for characters. Some will someday be used for > characters. Some are guaranteed never to be used for > characters. > > Unicode character > > A code point defined in the Unicode standard whether it is > already assigned or not. Identified by an integer. You're mixing terms here: being a character in Unicode is a property which is defined by the Unicode specs; not all code points are characters ! I'd suggest not to use the term character in this PEP at all; this is also what Mark Davis recommends in his paper on Unicode. That way people reading the PEP won't even start to confuse things since they will most likely have to read this glossary to understand what code point and code units are. Also, a link to the Unicode glossary would be a good thing. > Code unit > > An integer representing a character in some encoding. A code unit is the basic storage unit used by Unicode strings, e.g. u[0], not necessarily a character. > Surrogate pair > > Two code units that represnt a single Unicode character. Please add Unicode string A sequence of code units. and a note that on wide builds: code unit == code point. > Proposed Solution > > One solution would be to merely increase the maximum ordinal to a > larger value. Unfortunately the only straightforward > implementation of this idea is to increase the character code unit > to 4 bytes. 
This has the effect of doubling the size of most > Unicode strings. In order to avoid imposing this cost on every > user, Python 2.2 will allow 4-byte Unicode characters as a > build-time option. Users can choose whether they care about > wide characters or prefer to preserve memory. > > The 4-byte option is called "wide Py_UNICODE". The 2-byte option > is called "narrow Py_UNICODE". > > Most things will behave identically in the wide and narrow worlds. > > * unichr(i) for 0 <= i < 2**16 (0x10000) always returns a > length-one string. > > * unichr(i) for 2**16 <= i <= TOPCHAR will return a > length-one string representing the character on wide Python > builds. On narrow builds it will return ValueError. > > ISSUE: Python currently allows \U literals that cannot be > represented as a single character. It generates two > characters known as a "surrogate pair". Should this be > disallowed on future narrow Python builds? Why not make the codec used by Python to convert Unicode literals to Unicode strings an option just like the default encoding ? That way we could have a version of the unicode-escape codec which supports surrogates and one which doesn't. > ISSUE: Should Python allow the construction of characters > that do not correspond to Unicode characters? > Unassigned Unicode characters should obviously be legal > (because they could be assigned at any time). But > code points above TOPCHAR are guaranteed never to > be used by Unicode. Should we allow access to them > anyhow? I wouldn't count on that last point ;-) Please note that you are mixing terms: you don't construct characters, you construct code points. Whether the concatenation of these code points makes a valid Unicode character string is an issue which applications and codecs have to decide. > * ord() is always the inverse of unichr() > > * There is an integer value in the sys module that describes the > largest ordinal for a Unicode character on the current > interpreter. sys.maxunicode is 2**16-1 (0xffff) on narrow builds > of Python and TOPCHAR on wide builds. > > ISSUE: Should there be distinct constants for accessing > TOPCHAR and the real upper bound for the domain of > unichr (if they differ)? There has also been a > suggestion of sys.unicodewith which can take the > values 'wide' and 'narrow'. > > * codecs will be upgraded to support "wide characters" > (represented directly in UCS-4, as surrogate pairs in UTF-16 and > as multi-byte sequences in UTF-8). On narrow Python builds, the > codecs will generate surrogate pairs, on wide Python builds they > will generate a single character. This is the main part of the > implementation left to be done. > > * there are no restrictions on constructing strings that use > code points "reserved for surrogates" improperly. These are > called "isolated surrogates". The codecs should disallow reading > these but you could construct them using string literals or > unichr(). unichr() is not restricted to values less than either > TOPCHAR nor sys.maxunicode. > > Implementation > > There is a new (experimental) define: > > #define PY_UNICODE_SIZE 2 > > There is a new configure options: > > --enable-unicode=ucs2 configures a narrow Py_UNICODE, and uses > wchar_t if it fits > --enable-unicode=ucs4 configures a wide Py_UNICODE, and uses > whchar_t if it fits > --enable-unicode same as "=ucs2" > > The intention is that --disable-unicode, or --enable-unicode=no > removes the Unicode type altogether; this is not yet implemented. 
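To illustrate what the two configure options above buy you, here is a quick runtime check (a sketch against the proposed 2.2 API only -- sys.maxunicode and this unichr() behavior are what the PEP specifies, not what any released Python does yet):

    import sys

    if sys.maxunicode == 0xFFFF:
        print "narrow build: 2-byte code units"
    else:
        print "wide build: 4-byte code units"

    try:
        c = unichr(0x10000)     # first code point beyond the BMP
        print "unichr(0x10000) is a length-%d string" % len(c)
    except ValueError:
        print "narrow build: unichr() refuses code points above 0xFFFF"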
> > Notes > > This PEP does NOT imply that people using Unicode need to use a > 4-byte encoding. It only allows them to do so. For example, > ASCII is still a legitimate (7-bit) Unicode-encoding. > > Rationale for Surrogate Creation Behaviour > > Python currently supports the construction of a surrogate pair > for a large unicode literal character escape sequence. This is > basically designed as a simple way to construct "wide characters" > even in a narrow Python build. > > ISSUE: surrogates can be created this way but the user still > needs to be careful about slicing, indexing, printing > etc. Another option is to remove knowledge of > surrogates from everything other than the codecs. +1 on removing knowledge about surrogates from the Unicode implementation core (it's also the easiest: there is none :-) We should provide a new module which provides a few handy utilities though: functions which provide code point-, character-, word- and line- based indexing into Unicode strings. > Rejected Suggestions > > There were two primary solutions that were rejected. The first was > more or less the status-quo. We could officially say that Python > characters represent UTF-16 code units and require programmers to > implement wide characters in their application logic. This is a > heavy burden because emulating 32-bit characters is likely to be > very inefficient if it is coded entirely in Python. Plus these > abstracted pseudo-strings would not be legal as input to the > regular expression engine. > > The other class of solution is to use some efficient storage > internally but present an abstraction of wide characters > to the programmer. Any of these would require a much more complex > implementation than the accepted solution. For instance consider > the impact on the regular expression engine. In theory, we could > move to this implementation in the future without breaking Python > code. A future Python could "emulate" wide Python semantics on > narrow Python. > > Copyright > > This document has been placed in the public domain. -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From jepler@inetnebr.com Fri Jun 29 16:04:18 2001 From: jepler@inetnebr.com (Jeff Epler) Date: Fri, 29 Jun 2001 10:04:18 -0500 Subject: [Python-Dev] NIS on Linux, others? In-Reply-To: <15164.35504.255476.169687@cj42289-a.reston1.va.home.com>; from fdrake@acm.org on Fri, Jun 29, 2001 at 10:03:28AM -0400 References: <15164.35504.255476.169687@cj42289-a.reston1.va.home.com> Message-ID: <20010629100416.A24069@inetnebr.com> On Fri, Jun 29, 2001 at 10:03:28AM -0400, Fred L. Drake, Jr. wrote: > > Greg Ball writes: > > Short version: I can confirm that bug under linux, but the patch breaks > > nis module on solaris. > > I'm presuming that these were using the same NIS server? I'm > wondering if this may be an endianess-related problem. I don't > understand enough about the NIS protocols to know what's going on in > that module. It's my suspicion that it depends how the "aliases" map is built. The patch that "broke" things for the Linux systems includes the comment /* created with 'makedbm -a' */ which makes me suspect that it's dependant on the way the map is constructed. (I couldn't find an online makedbm manpage which documents a -a option) Endian issues should not exist, the protocol below NIS/YP takes care of this. 
Jeff From guido@digicool.com Fri Jun 29 16:24:56 2001 From: guido@digicool.com (Guido van Rossum) Date: Fri, 29 Jun 2001 11:24:56 -0400 Subject: [Python-Dev] Support for "wide" Unicode characters In-Reply-To: Your message of "Fri, 29 Jun 2001 16:51:04 +0200." <3B3C95D8.518E5175@egenix.com> References: <3B3BEF21.63411C4C@ActiveState.com> <3B3C95D8.518E5175@egenix.com> Message-ID: <200106291525.f5TFP0H29410@odiug.digicool.com> > I'd suggest not to use the term character in this PEP at all; > this is also what Mark Davis recommends in his paper on Unicode. I like this idea! I know that I *still* have a hard time not to think "C 'char' datatype, i.e. an 8-bit byte" when I read "character"... > Why not make the codec used by Python to convert Unicode > literals to Unicode strings an option just like the default > encoding ? > > That way we could have a version of the unicode-escape codec > which supports surrogates and one which doesn't. Smart idea, but how practical is this? Can you spec this out a bit more? > +1 on removing knowledge about surrogates from the Unicode > implementation core (it's also the easiest: there is none :-) Except for \U currently -- or is that not part of the implementation core? > We should provide a new module which provides a few handy > utilities though: functions which provide code point-, > character-, word- and line- based indexing into Unicode > strings. But its design is outside the scope of this PEP, I'd say. --Guido van Rossum (home page: http://www.python.org/~guido/) From paulp@ActiveState.com Sat Jun 30 02:16:25 2001 From: paulp@ActiveState.com (Paul Prescod) Date: Fri, 29 Jun 2001 18:16:25 -0700 Subject: [Python-Dev] Support for "wide" Unicode characters References: <3B3BEF21.63411C4C@ActiveState.com> <3B3C95D8.518E5175@egenix.com> Message-ID: <3B3D2869.5C1DDCF1@ActiveState.com> "M.-A. Lemburg" wrote: > >... > > I'd suggest not to use the term character in this PEP at all; > this is also what Mark Davis recommends in his paper on Unicode. That's fine, but Python does have a concept of character and I'm going to use the term character for discussing these. > Also, a link to the Unicode glossary would be a good thing. Funny how these little PEPs grow... >... > Why not make the codec used by Python to convert Unicode > literals to Unicode strings an option just like the default > encoding ? > > That way we could have a version of the unicode-escape codec > which supports surrogates and one which doesn't. Adding more and more knobs to tweak just adds up to Python code being non-portable from one machine to another. > > ISSUE: Should Python allow the construction of characters > > that do not correspond to Unicode characters? > > Unassigned Unicode characters should obviously be legal > > (because they could be assigned at any time). But > > code points above TOPCHAR are guaranteed never to > > be used by Unicode. Should we allow access to them > > anyhow? > > I wouldn't count on that last point ;-) > > Please note that you are mixing terms: you don't construct > characters, you construct code points. Whether the concatenation > of these code points makes a valid Unicode character string > is an issue which applications and codecs have to decide. unichr() does not construct code points. It constructs 1-char Python Unicode strings...also known as Python Unicode characters. > ... Whether the concatenation > of these code points makes a valid Unicode character string > is an issue which applications and codecs have to decide. 
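To make the code-unit pitfall concrete: on a narrow build a single character beyond the BMP is stored as two code units, and the string operations see the units, not the character (a sketch of current narrow-build behavior):

    s = u"\U00010000"      # one "wide character", two code units
    print len(s)           # 2 on a narrow build, 1 on a wide build
    print hex(ord(s[0]))   # 0xd800, the high surrogate
    print hex(ord(s[1]))   # 0xdc00, the low surrogate
    broken = s[:1]         # naive slicing strands an isolated surrogate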
The concatenation of true code points would *always* make a valid Unicode string, right? It's code units that cannot be blindly concatenated. >... > We should provide a new module which provides a few handy > utilities though: functions which provide code point-, > character-, word- and line- based indexing into Unicode > strings. Okay, I'll add: It has been proposed that there should be a module for working with UTF-16 strings in narrow Python builds through some sort of abstraction that handles surrogates for you. If someone wants to implement that, it will be another PEP. -- Take a recipe. Leave a recipe. Python Cookbook! http://www.ActiveState.com/pythoncookbook From mwh@python.net Sat Jun 30 10:32:34 2001 From: mwh@python.net (Michael Hudson) Date: 30 Jun 2001 10:32:34 +0100 Subject: [Python-Dev] Support for "wide" Unicode characters In-Reply-To: Paul Prescod's message of "Fri, 29 Jun 2001 18:16:25 -0700" References: <3B3BEF21.63411C4C@ActiveState.com> <3B3C95D8.518E5175@egenix.com> <3B3D2869.5C1DDCF1@ActiveState.com> Message-ID: Paul Prescod writes: > "M.-A. Lemburg" wrote: > > I'd suggest not to use the term character in this PEP at all; > > this is also what Mark Davis recommends in his paper on Unicode. > > That's fine, but Python does have a concept of character and I'm going > to use the term character for discussing these. As a Unicode Idiot (tm) can I please beg you to reconsider? There are so many possible meanings for "character" that I really think it's best to avoid the word altogether. Call Python characters "length 1 strings" or even "length 1 Python strings". [...] > > Please note that you are mixing terms: you don't construct > > characters, you construct code points. Whether the concatenation > > of these code points makes a valid Unicode character string > > is an issue which applications and codecs have to decide. > > unichr() does not construct code points. It constructs 1-char Python > Unicode strings This is what I think you should be saying. > ...also known as Python Unicode characters. Which I'm suggesting you forget! Cheers, M. -- I'm a keen cyclist and I stop at red lights. Those who don't need hitting with a great big slapping machine. -- Colin Davidson, cam.misc From paulp@ActiveState.com Sat Jun 30 12:28:28 2001 From: paulp@ActiveState.com (Paul Prescod) Date: Sat, 30 Jun 2001 04:28:28 -0700 Subject: [Python-Dev] Support for "wide" Unicode characters References: <3B3BEF21.63411C4C@ActiveState.com> <3B3C95D8.518E5175@egenix.com> <3B3D2869.5C1DDCF1@ActiveState.com> Message-ID: <3B3DB7DC.511A3D8@ActiveState.com> Michael Hudson wrote: > >... > > As a Unicode Idiot (tm) can I please beg you to reconsider? There are > so many possible meanings for "character" that I really think it's > best to avoid the word altogether. Call Python characters "length 1 > strings" or even "length 1 Python strings". Do you really feel that there are many possible meanings for the word "Python Unicode character?" This is a PEP: I have to assume a certain degree of common understanding. -- Take a recipe. Leave a recipe. Python Cookbook! http://www.ActiveState.com/pythoncookbook From mal@egenix.com Sat Jun 30 12:52:38 2001 From: mal@egenix.com (M.-A. Lemburg) Date: Sat, 30 Jun 2001 13:52:38 +0200 Subject: [Python-Dev] Support for "wide" Unicode characters References: <3B3BEF21.63411C4C@ActiveState.com> <3B3C95D8.518E5175@egenix.com> <3B3D2869.5C1DDCF1@ActiveState.com> Message-ID: <3B3DBD86.81F80D06@egenix.com> Paul Prescod wrote: > > "M.-A. Lemburg" wrote: > > > >... 
> > > > I'd suggest not to use the term character in this PEP at all; > > this is also what Mark Davis recommends in his paper on Unicode. > > That's fine, but Python does have a concept of character and I'm going > to use the term character for discussing these. The term "character" in Python should really only be used for the 8-bit strings. In Unicode a "character" can mean any of: """ Unfortunately the term character is vastly overloaded. At various times people can use it to mean any of these things: - An image on paper (glyph) - What an end-user thinks of as a character (grapheme) - What a character encoding standard encodes (code point) - A memory storage unit in a character encoding (code unit) Because of this, ironically, it is best to avoid the use of the term character entirely when discussing character encodings, and stick to the term code point. """ Taken from Mark Davis' paper: http://www-106.ibm.com/developerworks/unicode/library/utfencodingforms/ > > Also, a link to the Unicode glossary would be a good thing. > > Funny how these little PEPs grow... Is that a problem ? The Unicode glossary is very useful in providing a common base for understanding the different terms and tries very hard to avoid ambiguity in meaning. This discussion is partly caused by exactly these different understanding of the terms used in the PEP. I will update the Unicode PEP to the Unicode terminology too. > >... > > Why not make the codec used by Python to convert Unicode > > literals to Unicode strings an option just like the default > > encoding ? > > > > That way we could have a version of the unicode-escape codec > > which supports surrogates and one which doesn't. > > Adding more and more knobs to tweak just adds up to Python code being > non-portable from one machine to another. Not necessarily so; I'll write a more precise spec next week. The idea is to put the codec information into the Python source code, so that it is bound to the literals that way with the result of the Python source code being portable across platforms. Currently this is just an idea and still have to check how far this can go... > > > ISSUE: Should Python allow the construction of characters > > > that do not correspond to Unicode characters? > > > Unassigned Unicode characters should obviously be legal > > > (because they could be assigned at any time). But > > > code points above TOPCHAR are guaranteed never to > > > be used by Unicode. Should we allow access to them > > > anyhow? > > > > I wouldn't count on that last point ;-) > > > > Please note that you are mixing terms: you don't construct > > characters, you construct code points. Whether the concatenation > > of these code points makes a valid Unicode character string > > is an issue which applications and codecs have to decide. > > unichr() does not construct code points. It constructs 1-char Python > Unicode strings...also known as Python Unicode characters. > > > ... Whether the concatenation > > of these code points makes a valid Unicode character string > > is an issue which applications and codecs have to decide. > > The concatenation of true code points would *always* make a valid > Unicode string, right? It's code units that cannot be blindly > concatenated. Both wrong :-) U+D800 is a valid Unicode code point and can occur as code unit in both narrow and wide builds. Concatenating this with e.g. 
U+0020 will still make it a valid Unicode code point sequence (aka Unicode object), but not a valid Unicode character string (since the U+D800 is not a character). The same is true for e.g. U+FFFF. Note that the Unicode type should happily store these values, while the codecs complain. As a result and like I said above, dealing with these problems is left to the applications which use these Unicode objects. > >... > > We should provide a new module which provides a few handy > > utilities though: functions which provide code point-, > > character-, word- and line- based indexing into Unicode > > strings. > > Okay, I'll add: > > It has been proposed that there should be a module for working > with UTF-16 strings in narrow Python builds through some sort of > abstraction that handles surrogates for you. If someone wants > to implement that, it will be another PEP. Uhm, narrow builds don't support UTF-16... it's UCS-2 which is supported (basically: store everything in range(0x10000)); the codecs can map code points to surrogates, but it is solely their responsibility and the responsibility of the application using them to take care of dealing with surrogates. Also, the module will be useful for both narrow and wide builds, since the notion of an encoded character can involve multiple code points. In that sense Unicode is always a variable length encoding for characters and that's the application field of this module. Here's the adjusted text: It has been proposed that there should be a module for working with Unicode objects using character-, word- and line- based indexing. The details of the implementation is left to another PEP. -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From bckfnn@worldonline.dk Sat Jun 30 14:07:55 2001 From: bckfnn@worldonline.dk (Finn Bock) Date: Sat, 30 Jun 2001 13:07:55 GMT Subject: [Python-Dev] Corrupt Jython CVS (off topic). Message-ID: <3b3dccf6.26562024@mail.wanadoo.dk> A week ago I posted this on jython-dev, but no-one was able to give any advise on the best way to fix it. Maybe you can help. For some time now, our [jython] web CVS have not worked correctly: http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/jython/jython/org/python/core/ Finally I managed to track the problem to the Java2Accessibility.py,v file in the CVS repository. The "rlog" command cannot be executed on this file. >From the start of the Java2Accessibility.py,v: head 2.4; access; symbols Release_2_1alpha1:2.4 Release_2_0:2.2 Release_2_0rc1:2.2 Release_2_0beta2:2.2 Release_2_0beta1:2.2 Release_2_0alpha3:2.2 Release_2_0alpha2:2.2 Release_2_0alpha1:2.2 Release_1_1rc1:2.2 Release_1_1beta4:2.2 Release_1_1beta3:2.2 2.0:1.1.0.2; locks; strict; As an experiment, I tried to remove the strange "2.0:1.1.0.2;" line from the file and then I could run rlog on the file. Does anyone know if/how we can fix this? As a last resort I suppose I can attach my hand edited version to a SF support request where I ask them to copy my file to the CVS server. To this day I have never been very successful whenever I have tried to edit files in a CVS repository so I'm reluctant to do this. 
regards, finn From nhv@cape.com Sat Jun 30 14:16:48 2001 From: nhv@cape.com (Norman Vine) Date: Sat, 30 Jun 2001 09:16:48 -0400 Subject: [Python-Dev] RE: Threaded Cygwin Python Import Problem In-Reply-To: <20010628171715.P488@dothill.com> Message-ID: <015601c10166$eb79bb00$a300a8c0@nhv> Jason Tishler > >Thanks to Rob Collins (implementer) and Greg Smith (profiler), Cygwin now >provides enough pthreads support so that Cygwin Python builds OOTB *and* >functions reasonably well even with threads enabled. Unfortunately, >there are still a few issues that need to be resolved. > >The one that I would like to address in this posting prevents a threaded >Cygwin Python from building the standard extension modules (without some >kind of intervention). :,( Specifically, the build would frequently >hang during the Distutils part when Cygwin Python is attempting to execvp >a gcc process. > >See the first attachment, test.py, for a minimal Python script that >exhibits the hang. See the second attachment, test.c, for a rewrite >of test.py in C. Since test.c did not hang, I was able to conclude that >this was not just a straight Cygwin problem. > >Further tracing uncovered that the hang occurs in _execvpe() (in os.py), >when the child tries to import tempfile. If I apply the third >attachment, >os.py.patch, then the hang is avoided. Hence, it appears that importing a >module (or specifically the tempfile module) in a threaded Cygwin Python >child cause a hang. > >I saw the following comment in _execvpe(): > > # Process handling (fork, wait) under BeOS (up to 5.0) > # doesn't interoperate reliably with the thread interlocking > # that happens during an import. The actual error we need > # is the same on BeOS for posix.open() et al., ENOENT. > >The above makes me think that possibly Cygwin is having a >similar problem. > >Can anyone offer suggestions on how to further debug this problem? I was experiencing the same problems as Jason with Win2k sp1 and had used the same work-around successfully. < I believe Jason is working with NT 4.0 sp 5 > Curiously after applying the Win2k sp2 I no longer need to do this and the original Python code works fine. Leading me to believe that this may be but a symptom of a another Windows mystery. Regards Norman Vine From aahz@rahul.net Sat Jun 30 15:15:24 2001 From: aahz@rahul.net (Aahz Maruch) Date: Sat, 30 Jun 2001 07:15:24 -0700 (PDT) Subject: [Python-Dev] Support for "wide" Unicode characters In-Reply-To: <3B3DB7DC.511A3D8@ActiveState.com> from "Paul Prescod" at Jun 30, 2001 04:28:28 AM Message-ID: <20010630141524.E029999C80@waltz.rahul.net> Paul Prescod wrote: > Michael Hudson wrote: >> >>... >> >> As a Unicode Idiot (tm) can I please beg you to reconsider? There are >> so many possible meanings for "character" that I really think it's >> best to avoid the word altogether. Call Python characters "length 1 >> strings" or even "length 1 Python strings". > > Do you really feel that there are many possible meanings for the word > "Python Unicode character?" This is a PEP: I have to assume a certain > degree of common understanding. After reading Michael's and MA's arguments, I'm +1 on making the change they're requesting. But what really triggered my posting this was your use of the phrase "common understanding"; IME, Python's "explicit is better than implicit" rule is truly critical in documentation. Particularly if "character" has been deprecated in standard Unicode documentation, I think sticking to a common vocabulary makes more sense. 
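A short interpreter session makes the distinction this thread keeps circling concrete. This is a sketch assuming a narrow (UCS-2) build of CPython, where \U literals expand to surrogate pairs; on a wide build the length would be 1:

    >>> len(u"\U00010000")        # two code units (a surrogate pair)...
    2
    >>> u"\U00010000"[0]          # ...each of them itself a "length 1 string"
    u'\ud800'

And a sketch of the kind of surrogate-aware helper the proposed module could offer on a narrow build (the name codepoints is invented for illustration):

    def codepoints(u):
        """Return the true code points of u, pairing up surrogates."""
        result = []
        i, n = 0, len(u)
        while i < n:
            c = ord(u[i])
            if (0xD800 <= c < 0xDC00 and i + 1 < n
                and 0xDC00 <= ord(u[i + 1]) < 0xE000):
                # fold a high/low surrogate pair into one code point
                result.append(0x10000 + ((c - 0xD800) << 10)
                              + (ord(u[i + 1]) - 0xDC00))
                i = i + 2
            else:
                result.append(c)
                i = i + 1
        return result

    >>> codepoints(u"\U00010000" + u" ")
    [65536, 32]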
-- --- Aahz (@pobox.com) Hugs and backrubs -- I break Rule 6 <*> http://www.rahul.net/aahz/ Androgynous poly kinky vanilla queer het Pythonista I don't really mind a person having the last whine, but I do mind someone else having the last self-righteous whine. From Jason.Tishler@dothill.com Sat Jun 30 16:20:19 2001 From: Jason.Tishler@dothill.com (Jason Tishler) Date: Sat, 30 Jun 2001 11:20:19 -0400 Subject: [Python-Dev] Re: Threaded Cygwin Python Import Problem In-Reply-To: <015601c10166$eb79bb00$a300a8c0@nhv> Message-ID: <20010630112019.B626@dothill.com> Norman, On Sat, Jun 30, 2001 at 09:16:48AM -0400, Norman Vine wrote: > Jason Tishler > >The one that I would like to address in this posting prevents a threaded > >Cygwin Python from building the standard extension modules (without some > >kind of intervention). :,( Specifically, the build would frequently > >hang during the Distutils part when Cygwin Python is attempting to execvp > >a gcc process. > I was experiencing the same problems as Jason with Win2k sp1 and > had used the same work-around successfully. > < I believe Jason is working with NT 4.0 sp 5 > > > Curiously after applying the Win2k sp2 I no longer need to do this > and the original Python code works fine. > > Leading me to believe that this may be but a symptom of a another > Windows mystery. After further reflection, I feel that I have found another race/deadlock issue with the Cygwin's pthreads implementation. If I'm correct, this would explain why you experienced it intermittently with Windows 2000 SP1 and it is "gone" with SP2. Probably SP2 slows down your machine so much that the problem is not triggered. :,) I am going to reconfigure --with-pydebug and set THREADDEBUG. Hopefully, the hang will still be reproducible under these conditions. If so, then I will attempt to produce a minimal C test case for Rob to use to isolate and solve this problem. Thanks, Jason -- Jason Tishler Director, Software Engineering Phone: 732.264.8770 x235 Dot Hill Systems Corp. Fax: 732.264.8798 82 Bethany Road, Suite 7 Email: Jason.Tishler@dothill.com Hazlet, NJ 07730 USA WWW: http://www.dothill.com From guido@digicool.com Sat Jun 30 19:06:35 2001 From: guido@digicool.com (Guido van Rossum) Date: Sat, 30 Jun 2001 14:06:35 -0400 Subject: [Python-Dev] Corrupt Jython CVS (off topic). In-Reply-To: Your message of "Sat, 30 Jun 2001 13:07:55 GMT." <3b3dccf6.26562024@mail.wanadoo.dk> References: <3b3dccf6.26562024@mail.wanadoo.dk> Message-ID: <200106301806.f5UI6Zq30293@odiug.digicool.com> > A week ago I posted this on jython-dev, but no-one was able to give any > advise on the best way to fix it. Maybe you can help. > > > For some time now, our [jython] web CVS have not worked correctly: > > http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/jython/jython/org/python/core/ > > Finally I managed to track the problem to the Java2Accessibility.py,v > file in the CVS repository. The "rlog" command cannot be executed on > this file. > > >From the start of the Java2Accessibility.py,v: > > head 2.4; > access; > symbols > Release_2_1alpha1:2.4 > Release_2_0:2.2 > Release_2_0rc1:2.2 > Release_2_0beta2:2.2 > Release_2_0beta1:2.2 > Release_2_0alpha3:2.2 > Release_2_0alpha2:2.2 > Release_2_0alpha1:2.2 > Release_1_1rc1:2.2 > Release_1_1beta4:2.2 > Release_1_1beta3:2.2 > 2.0:1.1.0.2; > locks; strict; > > > As an experiment, I tried to remove the strange "2.0:1.1.0.2;" line from > the file and then I could run rlog on the file. Make sure to move the semicolon to the end of the previous line. 
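In other words, the tail of the repaired symbols block would look something like this (a sketch showing only the last entries: the bogus "2.0:1.1.0.2" line is gone and the semicolon terminating the symbols list has moved up):

    symbols
            ...
            Release_1_1beta4:2.2
            Release_1_1beta3:2.2;
    locks; strict;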
> Does anyone know if/how we can fix this? > > As a last resort I suppose I can attach my hand edited version to a SF > support request where I ask them to copy my file to the CVS server. To > this day I have never been very successful whenever I have tried to edit > files in a CVS repository so I'm reluctant to do this. > > regards, > finn Yes, I think a SF request should be the way to go. I don't know how this could have happened; the "2.0" is illegal as a symbolic tag name... --Guido van Rossum (home page: http://www.python.org/~guido/) From paulp@ActiveState.com Sat Jun 30 20:09:07 2001 From: paulp@ActiveState.com (Paul Prescod) Date: Sat, 30 Jun 2001 12:09:07 -0700 Subject: [Python-Dev] Support for "wide" Unicode characters References: <20010630141524.E029999C80@waltz.rahul.net> Message-ID: <3B3E23D3.69D591DD@ActiveState.com> Aahz Maruch wrote: > > > After reading Michael's and MA's arguments, I'm +1 on making the change > they're requesting. But what really triggered my posting this was your > use of the phrase "common understanding"; IME, Python's "explicit is > better than implicit" rule is truly critical in documentation. The spec starts of with an absolutely water tight definition of the term: "the addressable units of a Python Unicode string." I can't get more explicit than that. Expanding every usage of the word to "length 1 Python Unicode string" does not make the document more explicit any more than this is a "more explicit" equation than Ensteins: "The Energy is the mass of the object times the speed of light times two." > Particularly if "character" has been deprecated in standard Unicode > documentation, I think sticking to a common vocabulary makes more sense. "Character" is still a central term in all unicode documentation. Go to their web page and look. It's right on the front page. "Unicode provides a unique number for every character, no matter what the platform, no matter what the program, no matter what the language." But I'm not using it in the Unicode sense anyhow, so it doesn't matter. If ISO deprecates the use of the word integer in some standard will we stop talking about Python integers as integers? The addressable unit of a Python string is a character. If it is a Python Unicode String then it is a Python Unicode character. The term "Python Unicode character" is not going away: http://www.python.org/doc/current/tut/node5.html#SECTION005120000000000000000 I will be alot more concerned about this issue when someone reads the PEP and is actually confused by something as opposed to worrying that somebody might be confused by something. If I start using a bunch of technical terms and obfuscatory expansions, it will just dissuade people from reading the PEP. -- Take a recipe. Leave a recipe. Python Cookbook! http://www.ActiveState.com/pythoncookbook From DavidA@ActiveState.com Sat Jun 30 22:28:39 2001 From: DavidA@ActiveState.com (David Ascher) Date: Sat, 30 Jun 2001 14:28:39 -0700 Subject: [Python-Dev] Support for "wide" Unicode characters References: <20010630141524.E029999C80@waltz.rahul.net> <3B3E23D3.69D591DD@ActiveState.com> Message-ID: <3B3E4487.40054EAE@ActiveState.com> > "The Energy is the mass of the object times the speed of light times > two." Actually, it's "squared", not times two. 
At least in my universe =) --david-Unicode-idiot-much-to-Paul's-dismay-ascher From m.favas at per.dem.csiro.au Fri Jun 1 00:41:13 2001 From: m.favas at per.dem.csiro.au (Mark Favas) Date: Fri, 01 Jun 2001 06:41:13 +0800 Subject: [Python-Dev] One more dict trick Message-ID: <3B16C889.C01905BD@per.dem.csiro.au> Tried the patch (thanks, Tim!) - but I guess the things I'm running aren't too sensitive to dict speed . I see a slight speed-up, around 1-2%... Nice, elegant patch that should go places! Maybe the bio-informatics people on c.l.py (Andrew Dalke?) would be interested in trying it out? -- Mark Favas - m.favas at per.dem.csiro.au CSIRO, Private Bag No 5, Wembley, Western Australia 6913, AUSTRALIA From tim.one at home.com Fri Jun 1 02:24:01 2001 From: tim.one at home.com (Tim Peters) Date: Thu, 31 May 2001 20:24:01 -0400 Subject: [Python-Dev] One more dict trick In-Reply-To: Message-ID: Another version of the patch attached, a bit faster and with a large new comment block explaining it. It's looking good! As I hope the new comments make clear, nothing about this approach is "a mystery" -- there are explainable reasons for each fiddly bit. This gives me more confidence in it than in the previous approach, and, indeed, it turned out that when I *thought* "hmm! I bet this change would be a little faster!", it actually was . -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: dict.txt URL: From tim.one at home.com Fri Jun 1 03:32:30 2001 From: tim.one at home.com (Tim Peters) Date: Thu, 31 May 2001 21:32:30 -0400 Subject: [Python-Dev] One more dict trick In-Reply-To: <20010531044332.B5026@thyrsus.com> Message-ID: Heh. I was implementing 128-bit floats in software, for Cray, in about 1980. They didn't do it because they *wanted* to make the Cray boxes look like pigs . A 128-bit float type is simply necessary for some scientific work: not all problems are well-conditioned, and the "extra" bits can vanish fast. Went thru the same bit at KSR. Just yesterday Konrad Hinsen was worrying on c.l.py that his scripts that took 2 hours using native floats zoomed to 5 days when he started using GMP's arbitrary-precision float type *just* to get 100 bits of precision. When KSR died, the KSR-3 on the drawing board had 128-bit registers. I was never quite sure why the founders thought that would be a killer selling point, but it wasn't for floats. Down in the trenches we thought it would be mondo cool to have an address space so large that for the rest of our lives we'd never need to bother calling free() again <0.8 wink>. From tim.one at home.com Fri Jun 1 03:46:11 2001 From: tim.one at home.com (Tim Peters) Date: Thu, 31 May 2001 21:46:11 -0400 Subject: [Python-Dev] One more dict trick In-Reply-To: <20010531124533.J690@xs4all.nl> Message-ID: [Thomas Wouters] > Why ? Bumping register size doesn't mean Intel expects to use it all as > address space. They could be used for video-processing, Bingo. Common wisdom holds that vector machines are dead, but the truth is virtually *everyone* runs on a vector box now: Intel just renamed "vector" to "multimedia" (or AMD to "3D Now!"), and adopted a feeble (but ever-growing) subset of traditional vector machines' instruction sets. > or to represent a modest range of rationals , or to help core > 'net routers deal with those nasty IPv6 addresses. KSR's founders had in mind bit-level addressability of networks of machines spanning the globe. 
Were he to press the point, though, I'd have to agree with Eric that they didn't really *need* 128 bits for that modest goal. > I'm sure cryptomunchers would like bigger registers as well. Agencies we can't talk about would like them as big as they can get them. Each vector register in a Cray box actually consisted of 64 64-bit words, or 4K bits per register. Some "special" models were constructed where the vector FPU was thrown away and additional bit-fiddling units added in its place: they really treated the vector registers as giant bitstrings, and didn't want to burn 64 clock cycles just to do, e.g., "one" conceptual xor. > Oh wait... I get it! You were trying to get yourself in the > historybooks as the guy that said "64 bits ought to be enough for > everyone" :-) That would be foolish indeed! 128, though, now *that's* surely enough for at least a decade . From fdrake at acm.org Fri Jun 1 03:45:45 2001 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Thu, 31 May 2001 21:45:45 -0400 (EDT) Subject: [Python-Dev] One more dict trick In-Reply-To: References: <20010531044332.B5026@thyrsus.com> Message-ID: <15126.62409.909290.736779@cj42289-a.reston1.va.home.com> Tim Peters writes: > When KSR died, the KSR-3 on the drawing board had 128-bit registers. I was > never quite sure why the founders thought that would be a killer selling > point, but it wasn't for floats. Down in the trenches we thought it would > be mondo cool to have an address space so large that for the rest of our > lives we'd never need to bother calling free() again <0.8 wink>. And given what (little) I know about the memory architecture on those things, that actually would have be quite reasonable on that platform! -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From tim.one at home.com Fri Jun 1 04:23:47 2001 From: tim.one at home.com (Tim Peters) Date: Thu, 31 May 2001 22:23:47 -0400 Subject: [Python-Dev] FW: CP4E and Python newbies, it works! Message-ID: Good for the soul! -----Original Message----- From: python-list-admin at python.org [mailto:python-list-admin at python.org]On Behalf Of Ron Stephens [mailto:rdsteph at earthlink.net] Sent: Thursday, May 31, 2001 7:12 PM To: python-list at python.org Subject: CP4E and Python newbies, it works! I am a complete newbie, and with a very low programming IQ. Although I had programmed a little in college thirty years ago, in Basic, PL/1 and a very little assembler, and fooled around in later years on PC's at home with Basic, then tried PERL, then an effort at Java, they were all too much trouble to really use to program, given that it was a *hobby* that was supposed to be fun. After all, I have a demanding day job that has nothing to do with software, that requires extensive travel, and four kids, a wife, two dogs, and a cat. Java et al, by the time I had digested a couple of books and put in a lot of hours, was just no fun at all to program; and I had to look in the book every other line of code just to recall the syntax etc.; I could not keep it in my head. Now, four months into Python, after being attracted by reading a blurb about Guido van Rossum's Computer Programming for Everybody project, I am in awe of his achievement. I am having fun; and if I can do so then almost anyone can. I am really absent minded, lazy, and not good at detail. Yet I have done the following in four months, and I believe Python therefore has the potential to open up programming to a much wider audience for a lot of people, which is nice: 1. 
I have written a half dozen scripts that are meaningful to me in Python, more than I ever accomplished with any other language. 2. I am able to have fun by sitting down in the evening, or especially on a weekend, and just programming in Python. The syntax and keywords are gratifyingly just in my head, enough anyway that I can just program like I am having a conversation, and check the details later for errors etc. This is the most satisfying thing of all. 3. I find the debugger just works; magically, it helps me turn my scripts into actual working programs, simply by rather mindlessly following the road laid out for me by using the debugger. 4. I have pleasurably read more Python books from front cover to back than I care to admit. I must be enjoying myself ;-))) 5. I am exploring Jython, which is also pleasurable. After fooling around with Java a couple of years ago, it is really a kick to see jython generating such detailed Java code for me, just as if I had written it (but it would have taken me untold pain to actually do so in Java). Whether or not I actually end up using the java code so generated, I still am enjoying the sheer experience. 6. I have Zope and other things to look forward to. 7. I am able to enjoy the discussions on this newsgroup, even though they are over my head technically. I find them intriguing. Now, I may never actually accomplish anything truly useful by my programming. But I am happy. I hope that others, younger and brighter than myself, who have an interest in programming, but need the right stimulus to get going, will find Python and produce programs of real value. I think Guido van Rossum and his team should be very proud of what they are enabling. The CP4E idea is alive and well. My hat's off to Guido and the whole community which he has spawned, especially those on this newsgroup. I am humbled and honored to read your erudite technical discussions, as a voyeur of mysteries and wonders I can only dimly see on the horizon, but that nonetheless fill me with mental delight. Ron Stephens -- http://mail.python.org/mailman/listinfo/python-list From esr at thyrsus.com Fri Jun 1 05:51:48 2001 From: esr at thyrsus.com (Eric S. Raymond) Date: Thu, 31 May 2001 23:51:48 -0400 Subject: [Python-Dev] One more dict trick In-Reply-To: ; from tim.one@home.com on Thu, May 31, 2001 at 09:32:30PM -0400 References: <20010531044332.B5026@thyrsus.com> Message-ID: <20010531235148.B14591@thyrsus.com> Tim Peters : > A 128-bit float type is simply necessary for some > scientific work: not all problems are well-conditioned, and the "extra" > bits can vanish fast. Makes me wonder how competent your customers' numerical analysts were. Where the heck did they think they were getting data with that many digits of accuracy? (Note that I didn't say "precision"...) -- Eric S. Raymond Strict gun laws are about as effective as strict drug laws...It pains me to say this, but the NRA seems to be right: The cities and states that have the toughest gun laws have the most murder and mayhem. -- Mike Royko, Chicago Tribune From esr at thyrsus.com Fri Jun 1 05:54:33 2001 From: esr at thyrsus.com (Eric S. Raymond) Date: Thu, 31 May 2001 23:54:33 -0400 Subject: [Python-Dev] One more dict trick In-Reply-To: ; from tim.one@home.com on Thu, May 31, 2001 at 09:46:11PM -0400 References: <20010531124533.J690@xs4all.nl> Message-ID: <20010531235433.C14591@thyrsus.com> Tim Peters : > Agencies we can't talk about would like them as big as they can get them. 
> Each vector register in a Cray box actually consisted of 64 64-bit words, or > 4K bits per register. Some "special" models were constructed where the > vector FPU was thrown away and additional bit-fiddling units added in its > place: they really treated the vector registers as giant bitstrings, and > didn't want to burn 64 clock cycles just to do, e.g., "one" conceptual xor. You've got a point...but I don't think it's really economical to build that kind of hardware into general-purpose processors. You end up with a camel. You know, a horse designed by committee? -- Eric S. Raymond To make inexpensive guns impossible to get is to say that you're putting a money test on getting a gun. It's racism in its worst form. -- Roy Innis, president of the Congress of Racial Equality (CORE), 1988 From tim.one at home.com Fri Jun 1 08:58:08 2001 From: tim.one at home.com (Tim Peters) Date: Fri, 1 Jun 2001 02:58:08 -0400 Subject: [Python-Dev] One more dict trick In-Reply-To: <20010531235148.B14591@thyrsus.com> Message-ID: [Tim] > A 128-bit float type is simply necessary for some scientific work: not > all problems are well-conditioned, and the "extra" bits can vanish fast. [ESR] > Makes me wonder how competent your customers' numerical analysts were. > Where the heck did they think they were getting data with that many > digits of accuracy? (Note that I didn't say "precision"...) Not all scientific work consists of predicting the weather with inputs known to half a digit on a calm day . Knuth gives examples of ill-conditioned problems where resorting to unbounded rationals is faster than any known stable f.p. approach (stuck with limited precision) -- think, e.g., chaotic systems here, which includes parts of many hydrodynamics problems in real life. Some scientific work involves modeling ab initio across trillions of computations (and on a Cray box in particular, where addition didn't even bother to round, nor multiplication bother to compute the full product tree, the error bounds per operation were much worse than in a 754 world). You shouldn't overlook either that algorithms often needed massive rewriting to exploit vector and parallel architectures, and in a world where a supremely competent numerical analysis can take a month to verify the numerical robustness of a new algorithm covering two pages of Fortran, a million lines of massively reworked seat-of-the-pants modeling code couldn't be trusted at all without running it under many conditions in at least two precisions (it only takes one surprise catastrophic cancellation to destroy everything). A major oil company once threatened to sue Cray when their reservoir model produced wildly different results under a new release of the compiler. Some exceedingly sharp analysts worked on that one for a solid week. Turned out the new compiler evaluated a subexpression A*B*C by doing (B*C) first instead of (A*B), because it was faster in context (and fine to do so by Fortran's rules). It so happened A was very large, and B and C both small, and doing B*C first caused the whole product to underflow to zero where doing A*B first left a product of roughly C's magnitude. I can't imagine how they ever would have found this if they weren't able to recompile the code using twice the precision (which worked fine thanks to the larger dynamic range), then tracing to see where the runs diverged. 
Even then it took a week because this was 100s of thousands of lines of crufty Fortran than ran for hours on the world's then-fastest machine before delivering bogus results. BTW, if you think the bulk of the world's numeric production code has even been *seen* by a qualified numerical analyst, you should ride on planes more often . From tim.one at home.com Fri Jun 1 09:08:28 2001 From: tim.one at home.com (Tim Peters) Date: Fri, 1 Jun 2001 03:08:28 -0400 Subject: [Python-Dev] One more dict trick In-Reply-To: <20010531235433.C14591@thyrsus.com> Message-ID: [EAR] > You've got a point... Well, really, they do -- but they had a much more compelling point when the Cold War came with an unlimited budget. > but I don't think it's really economical to build that kind of > hardware into general-purpose processors. Economical? The marginal cost of adding even nutso new features in silicon now for mass-market chips is pretty close to zero. Indeed, if you're in the speech recog or 3D imaging games (i.e., things that still tax a PC), Intel comes around *begging* for new ideas to use up all their chip real estate. The only one I recall them turning down was a request from Dragon's founder to add an instruction that, given x and y, returned log(exp(x)+exp(y)). They were skeptical, and turned out even *we* didn't need it . > You end up with a camel. You know, a horse designed by committee? Yup! But that's the camel Intel rides to the bank, so it will probably grow more humps, on which to hang more bags of gold. From esr at thyrsus.com Fri Jun 1 09:23:16 2001 From: esr at thyrsus.com (Eric S. Raymond) Date: Fri, 1 Jun 2001 03:23:16 -0400 Subject: [Python-Dev] One more dict trick In-Reply-To: ; from tim.one@home.com on Fri, Jun 01, 2001 at 02:58:08AM -0400 References: <20010531235148.B14591@thyrsus.com> Message-ID: <20010601032316.A15635@thyrsus.com> Tim Peters : > Not all scientific work consists of predicting the weather with inputs known > to half a digit on a calm day . Knuth gives examples of > ill-conditioned problems where resorting to unbounded rationals is faster > than any known stable f.p. approach (stuck with limited precision) -- think, > e.g., chaotic systems here, which includes parts of many hydrodynamics > problems in real life. Hmmm...good answer. I still believe it's the case that real-world measurements max out below 48 bits or so of precision because the real world is a noisy, fuzzy place. But I can see that most of the algorithms for partial differential equationss would multiply those by very small or very large quantities repeatedly. The range-doubling trick for catching divergences is neat, too. So maybe there's a market for 128-bit floats after all. I'm still skeptical about how likely those applications are to influence the architecture of general-purpose processors. I saw a study once that said heavy-duty scientific floating point only accounts for about 2% of the computing market -- and I think it's significant that MMX instructions and so forth entered the Intel line to support *games*, not Navier-Stokes calculations. That 2% will have to get a lot bigger before I can see Intel doubling its word size again. It's not just the processor design; the word size has huge implications for buses, memory controllers, and the whole system architecture. -- Eric S. Raymond The United States is in no way founded upon the Christian religion -- George Washington & John Adams, in a diplomatic message to Malta. 
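The A*B*C evaluation-order surprise in Tim's oil-company story above is easy to reproduce with ordinary IEEE doubles. A sketch with made-up magnitudes (the real reservoir-model values are lost to history):

    A = 1e300     # very large
    B = 1e-200    # small
    C = 1e-200    # small

    print (A * B) * C    # 1e-100: the old compiler's grouping, fine
    print A * (B * C)    # 0.0:    B*C underflows to zero first

Both groupings are legal under Fortran's rules; only the rounding (here, the underflow) differs -- which is why rerunning at twice the precision, with its much larger exponent range, made the divergence findable.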
From pf at artcom-gmbh.de Fri Jun 1 09:22:50 2001 From: pf at artcom-gmbh.de (Peter Funk) Date: Fri, 1 Jun 2001 09:22:50 +0200 (MEST) Subject: [Python-Dev] precision thread (was One more dict trick) Message-ID: Eric: > > You end up with a camel. You know, a horse designed by committee? Tim: > Yup! But that's the camel Intel rides to the bank, so it will probably grow > more humps, on which to hang more bags of gold. cam*ls? Guido is only one week on vacation and soon heretical words show up here. ;-) sorry, couldn't resist, Peter From thomas at xs4all.net Fri Jun 1 09:28:01 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Fri, 1 Jun 2001 09:28:01 +0200 Subject: [Python-Dev] Damn... I think I might have just muffed a checkin In-Reply-To: <15126.34825.167026.520535@beluga.mojam.com>; from skip@pobox.com on Thu, May 31, 2001 at 01:06:01PM -0500 References: <15126.34825.167026.520535@beluga.mojam.com> Message-ID: <20010601092800.K690@xs4all.nl> On Thu, May 31, 2001 at 01:06:01PM -0500, Skip Montanaro wrote: > I just updated httplib.py to expand the list of names in its __all__ list. > I was operating on version 1.34. After the checkin I am looking at version > 1.34.2.1. I see that Lib/CVS/Tag exists in my directory tree and says > "release21-maint". Did I muff it? If so, how should I do an unmuff > operation? You had a sticky tag on the file, probably because you used '-rrelease21-maint' on a cvs checkout or update. Good thing it was release21-maint, though, and not some random other revision, or you would have created another branch :-) You can remove stickyness by using 'cvs update -A'. I personally just have two trees, ~/python/python-2.2 and ~/python/python-2.1.1, where the last one was checked out with -rrelease21-maint. -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From gmcm at hypernet.com Fri Jun 1 13:29:28 2001 From: gmcm at hypernet.com (Gordon McMillan) Date: Fri, 1 Jun 2001 07:29:28 -0400 Subject: [Python-Dev] One more dict trick In-Reply-To: References: <20010531235433.C14591@thyrsus.com> Message-ID: <3B174458.1998.46DEEE2B@localhost> [ESR] > > You end up with a camel. You know, a horse designed by > > committee? [Tim] > Yup! But that's the camel Intel rides to the bank, so it will > probably grow more humps, on which to hang more bags of gold. Been a camel a long time, too. x86 assembler is the, er, Perl of assemblers. - Gordon From mwh at python.net Fri Jun 1 13:54:40 2001 From: mwh at python.net (Michael Hudson) Date: 01 Jun 2001 12:54:40 +0100 Subject: [Python-Dev] another dict crasher Message-ID: Adapted from a report on comp.lang.python from Wolfgang Lipp:

class Child:
    def __init__(self, parent):
        self.__dict__['parent'] = parent
    def __getattr__(self, attr):
        self.parent.a = 1
        self.parent.b = 1
        self.parent.c = 1
        self.parent.d = 1
        self.parent.e = 1
        self.parent.f = 1
        self.parent.g = 1
        self.parent.h = 1
        self.parent.i = 1
        return getattr(self.parent, attr)

class Parent:
    def __init__(self):
        self.a = Child(self)

print Parent().__dict__

segfaults both 2.1 and current (well, maybe a day old) CVS. Haven't tried Tim's latest patch, but I don't believe that will make any difference. It's obvious what's happening; the dict's resizing inside the for loop in dict_repr and the ep pointer is dangling. By the time we've shaken all of these out of dictobject.c it's going to be pretty close to free-threading safe, I'd have thought. reentrancy-sucks-ly y'rs M.
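Why do nine extra attribute insertions suffice to trigger the resize Michael describes? A back-of-envelope check, sketched from the 2.1 resize rule (a dict grows once ma_fill*3 >= ma_size*2, with MINSIZE == 8):

    size, fill = 8, 1            # the parent dict holds just 'a' when printing starts
    for attr in "bcdefghi":      # what Child.__getattr__ stuffs in
        fill = fill + 1
        if fill * 3 >= size * 2:
            print "ma_table reallocated at attribute", attr
            break

This reports attribute 'f': by the fifth extra insertion ma_table has been freed and reallocated, so the cached ep points into freed memory.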
-- But since I'm not trying to impress anybody in The Software Big Top, I'd rather walk the wire using a big pole, a safety harness, a net, and with the wire not more than 3 feet off the ground. -- Grant Griffin, comp.lang.python From mwh at python.net Fri Jun 1 14:12:55 2001 From: mwh at python.net (Michael Hudson) Date: 01 Jun 2001 13:12:55 +0100 Subject: [Python-Dev] another dict crasher In-Reply-To: Michael Hudson's message of "01 Jun 2001 12:54:40 +0100" References: Message-ID: Michael Hudson writes: > Adapted from a report on comp.lang.python from Wolfgang Lipp: [snip] > segfaults both 2.1 and current (well, maybe a day old) CVS. Haven't > tried Tim's latest patch, but I don't believe that will make any > difference. > > It's obvious what's happening; the dict's resizing inside the > for loop in dict_repr and the ep pointer is dangling. Actually this crash was dict_print (I always forget about tp_print...). It's pretty easy to mend:

*** dictobject.c	Fri Jun  1 13:08:13 2001
--- dictobject.c-fixed	Fri Jun  1 12:59:07 2001
***************
*** 793,795 ****
  	any = 0;
! 	for (i = 0, ep = mp->ma_table; i < mp->ma_size; i++, ep++) {
  		if (ep->me_value != NULL) {
--- 793,796 ----
  	any = 0;
! 	for (i = 0; i < mp->ma_size; i++) {
! 		ep = &mp->ma_table[i];
  		if (ep->me_value != NULL) {
***************
*** 833,835 ****
  	any = 0;
! 	for (i = 0, ep = mp->ma_table; i < mp->ma_size && v; i++, ep++) {
  		if (ep->me_value != NULL) {
--- 834,837 ----
  	any = 0;
! 	for (i = 0; i < mp->ma_size && v; i++) {
! 		ep = &mp->ma_table[i];
  		if (ep->me_value != NULL) {

I'm not sure this stops still more Machiavellian behaviour from crashing the interpreter, and you can certainly get items being printed more than once or not at all. I'm not sure this last is a problem; if the user's being this contrary there's only so much we can do to help him or her. Cheers, M. -- I also feel it essential to note, [...], that Description Logics, non-Monotonic Logics, Default Logics and Circumscription Logics can all collectively go suck a cow. Thank you. -- http://advogato.org/person/Johnath/diary.html?start=4 From pedroni at inf.ethz.ch Fri Jun 1 14:49:11 2001 From: pedroni at inf.ethz.ch (Samuele Pedroni) Date: Fri, 1 Jun 2001 14:49:11 +0200 (MET DST) Subject: [Python-Dev] __xxxattr__ caching semantic Message-ID: <200106011249.OAA05837@core.inf.ethz.ch> Hi. What is the intended semantics wrt __xxxattr__ caching:

class X: pass

def cga(self, name):
    print name

def iga(name):
    print name

x = X()
x.__dict__['__getattr__'] = iga  # 1.
x.__getattr__ = iga              # 2.
X.__dict__['__getattr__'] = cga  # 3.
X.__getattr__ = cga              # 4.
x.a

According to the manual (http://www.python.org/doc/current/ref/customization.html) all of these variants should have no effect, so x.a should fail. In practice 4. works. Is that an implementation/manual mismatch? Is this intended? Is there code around using 4.? I'm asking because jython has differences/bugs in this respect. I imagine that 1.-4. should work for all other __magic__ methods (this should be fixed in jython for some methods); OTOH jython has such a restriction on __del__ too, and that one cannot be removed (it is not simply a matter of caching/not caching). regards, Samuele Pedroni.
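For what it's worth, the behaviour Samuele observes seems to fall out of a cache: classic classes store the __getattr__ hook on the class object itself (cl_getattr), and only an attribute assignment *to the class* refreshes it, so variant 4 is the only one the instance machinery ever sees. A sketch of the observed CPython 2.1 behaviour (the cl_getattr detail is an assumption from reading classobject.c, worth double-checking):

    class X: pass

    def cga(self, name):
        return name

    x = X()
    x.__getattr__ = cga                # variant 2: lives only in x.__dict__
    try:
        x.a
    except AttributeError:
        print "instance hook never consulted"

    X.__dict__['__getattr__'] = cga    # variant 3: skips the cache refresh
    try:
        x.a
    except AttributeError:
        print "class dict alone doesn't help"

    X.__getattr__ = cga                # variant 4: class assignment refreshes cl_getattr
    print x.a                          # prints: a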
From Greg.Wilson at baltimore.com Fri Jun 1 14:59:28 2001 From: Greg.Wilson at baltimore.com (Greg Wilson) Date: Fri, 1 Jun 2001 08:59:28 -0400 Subject: [Python-Dev] re: %b format Message-ID: <930BBCA4CEBBD411BE6500508BB3328F2E1E47@nsamcanms1.ca.baltimore.com> My thanks to everyone who commented on the idea of adding a binary format specifier to Python. I'll volunteer to draft the PEP --- volunteers for a co-author? Greg ----------------------------------------------------------------------------------------------------------------- The information contained in this message is confidential and is intended for the addressee(s) only. If you have received this message in error or there are any problems please notify the originator immediately. The unauthorized use, disclosure, copying or alteration of this message is strictly forbidden. Baltimore Technologies plc will not be liable for direct, special, indirect or consequential damages arising from alteration of the contents of this message by a third party or as a result of any virus being passed on. In addition, certain Marketing collateral may be added from time to time to promote Baltimore Technologies products, services, Global e-Security or appearance at trade shows and conferences. This footnote confirms that this email message has been swept by Baltimore MIMEsweeper for Content Security threats, including computer viruses. From tismer at tismer.com Fri Jun 1 15:56:26 2001 From: tismer at tismer.com (Christian Tismer) Date: Fri, 01 Jun 2001 15:56:26 +0200 Subject: [Python-Dev] One more dict trick References: Message-ID: <3B179F0A.CFA3B2C@tismer.com> Tim Peters wrote: > > Another version of the patch attached, a bit faster and with a large new > comment block explaining it. It's looking good! As I hope the new comments > make clear, nothing about this approach is "a mystery" -- there are > explainable reasons for each fiddly bit. This gives me more confidence in > it than in the previous approach, and, indeed, it turned out that when I > *thought* "hmm! I bet this change would be a little faster!", it actually > was . Thanks a lot for this nice patch. It looks like a real improvement. Also thanks for mentioning my division idea. Since all bits of the hash are eventually taken into account, this idea has somehow survived in an even more efficient solution, good end, file closed. (and good that I saved the time to check my patch in, lately :-) cheers - chris -- Christian Tismer :^) Mission Impossible 5oftware : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net/ 14163 Berlin : PGP key -> http://wwwkeys.pgp.net/ PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? http://www.stackless.com/ From pedroni at inf.ethz.ch Fri Jun 1 16:18:20 2001 From: pedroni at inf.ethz.ch (Samuele Pedroni) Date: Fri, 1 Jun 2001 16:18:20 +0200 (MET DST) Subject: [Python-Dev] Re: [Jython-dev] Using PyChecker in Jython Message-ID: <200106011418.QAA13570@core.inf.ethz.ch> Hi. [Neal Norwitz] > Hello! > > I have created a program PyChecker to perform Python source code checking. > (http://pychecker.sourceforge.net). > > PyChecker is implemented in C Python and does some "tricky" things. > It doesn't currently work in Jython due to the module dis (disassemble code) > not being available in Jython. > > Is there any fundamental problem with getting PyChecker to work under Jython? 
> > Here's a high-level overview of what PyChecker does: > > imp.find_module() > imp.load_module() > for each object in dir(module): > # object can be a class, function, imported module, etc. > for each instruction in disassembled byte code: > # handle each instruction appropriately > > This hides a lot of details, but I do lots of things like getting the code objects from the classes, methods, and > functions, look at the arguments > in functions, etc. > > Is it possible to make work in Jython? Easy? > > Thanks for any guidance, > Neal It would be great - really - but about easy? As easy as making PyChecker working on source code without using dis and without importing/executing modules and their top defs, I think there will be no dis support on jython side (we produce java bytecode and getting "back" to python vm bytecode would be very tricky, not very elegant, etc. ) any time soon . Seriously, two possible workaround hacks (they are also not very easy), this is just after small brainstorming and ignoring the concrete needs and code of PyChecker: +) more elegant one, but maybe still too difficult or requiring too much work: let PyChecker run under CPython even when checking jython code, jython code can compile down to py vm bytecode but then does not run: why? java classes imports and the jython specific builtin modules (not so many) So one needs to implement a sufficient amount of python (an import hook, etc) code that does the minimal partial evalution required and the required amount of loading&introspection on java, jython specific stuff in order to have the imports work and PyChecher feeded with the things it needs. This means dealing with the java class format, or a two passes approach: run the code under jython in order to gather the information needed to load it succesfully under python. If the top level code contains conditionals that depend on jython stuff this could be hard, but one can ignore that (at least for starting). Clearly the main PyChecker loop would require some adaptation, and maybe include some logic to check some jython specific stuff (subclassing from java, etc). *) let an adapted PyChecker run under jython, obtain someway the needed py vm bytecode stream from a source -> py vm bytecode compiler written in python (such a thing exists - if I remember well) . And similar ideas ... regards, Samuele Pedroni. From barry at digicool.com Fri Jun 1 16:43:59 2001 From: barry at digicool.com (Barry A. Warsaw) Date: Fri, 1 Jun 2001 10:43:59 -0400 Subject: [Python-Dev] Damn... I think I might have just muffed a checkin References: <15126.34825.167026.520535@beluga.mojam.com> <20010601092800.K690@xs4all.nl> Message-ID: <15127.43567.202950.192811@anthem.wooz.org> >>>>> "TW" == Thomas Wouters writes: TW> You can remove stickyness by using 'cvs update -A'. I TW> personally just have two trees, ~/python/python-2.2 and TW> ~/python/python-2.1.1, where the last one was checked out with TW> -rrelease21-maint. Very good advice for anybody playing with branches! -Barry From barry at digicool.com Fri Jun 1 17:12:33 2001 From: barry at digicool.com (Barry A. Warsaw) Date: Fri, 1 Jun 2001 11:12:33 -0400 Subject: [Python-Dev] another dict crasher References: Message-ID: <15127.45281.435849.822222@anthem.wooz.org> >>>>> "MH" == Michael Hudson writes: MH> segfaults both 2.1 and current (well, maybe a day old) CVS. MH> Haven't tried Tim's latest patch, but I don't believe that MH> will make any difference. That is highly, highly nasty. 
Sounds to me like there ought to be an emergency 2.1.1 patch made for this, bumping Thomas's work to 2.1.2 if necessary. And if we can trojan in the NAIPL (New And Improved Python License), I wouldn't mind. :) -Barry From jeremy at digicool.com Fri Jun 1 17:18:05 2001 From: jeremy at digicool.com (Jeremy Hylton) Date: Fri, 1 Jun 2001 11:18:05 -0400 (EDT) Subject: [Python-Dev] another dict crasher In-Reply-To: <15127.45281.435849.822222@anthem.wooz.org> References: <15127.45281.435849.822222@anthem.wooz.org> Message-ID: <15127.45613.947590.246269@slothrop.digicool.com> >>>>> "BAW" == Barry A Warsaw writes: >>>>> "MH" == Michael Hudson writes: MH> segfaults both 2.1 and current (well, maybe a day old) CVS. MH> Haven't tried Tim's latest patch, but I don't believe that will MH> make any difference. BAW> That is highly, highly nasty. Sounds to me like there ought to BAW> be an emergency 2.1.1 patch made for this, bumping Thomas's BAW> work to 2.1.2 if necessary. And if we can trojan in the NAIPL BAW> (New And Improved Python License), I wouldn't mind. :) We can release a critical patch for this bug, ala the CriticalPatches page for the Python 2.0 release. Jeremy From mwh at python.net Fri Jun 1 18:03:55 2001 From: mwh at python.net (Michael Hudson) Date: Fri, 1 Jun 2001 17:03:55 +0100 (BST) Subject: [Python-Dev] another dict crasher In-Reply-To: <15127.45281.435849.822222@anthem.wooz.org> Message-ID: On Fri, 1 Jun 2001, Barry A. Warsaw wrote: > > >>>>> "MH" == Michael Hudson writes: > > MH> segfaults both 2.1 and current (well, maybe a day old) CVS. > MH> Haven't tried Tim's latest patch, but I don't believe that > MH> will make any difference. > > That is highly, highly nasty. Yes. > Sounds to me like there ought to be an emergency 2.1.1 patch made for > this, bumping Thomas's work to 2.1.2 if necessary. Really? Two mild counterpoints: 1) It's *old*; 1.5.2 at least, and that's only because that's the oldest version I happen to have lying around. It's quite similar to the test_mutants oddness in some ways. 2) There's at least one other crasher in 2.1; the one in the compiler where a variable is referenced in a class and in a contained method. (I've actually run into that one). But a "fix these crashers" release seems reasonable if there's someone with the time to put it out (not me!). > And if we can trojan in the NAIPL (New And Improved Python > License), I wouldn't mind. :) Well me neither... Cheers, M. From skip at pobox.com Fri Jun 1 18:26:35 2001 From: skip at pobox.com (Skip Montanaro) Date: Fri, 1 Jun 2001 11:26:35 -0500 Subject: [Python-Dev] Damn... I think I might have just muffed a checkin In-Reply-To: <20010601092800.K690@xs4all.nl> References: <15126.34825.167026.520535@beluga.mojam.com> <20010601092800.K690@xs4all.nl> Message-ID: <15127.49723.186388.220648@beluga.mojam.com> Thomas> I personally just have two trees, ~/python/python-2.2 and Thomas> ~/python/python-2.1.1, where the last one was checked out with Thomas> -rrelease21-maint. Thanks, good advice. httplib.py has now been updated on both the head and release21-maint branches. Skip From loewis at informatik.hu-berlin.de Fri Jun 1 19:07:52 2001 From: loewis at informatik.hu-berlin.de (Martin von Loewis) Date: Fri, 1 Jun 2001 19:07:52 +0200 (MEST) Subject: [Python-Dev] METH_NOARGS calling convention Message-ID: <200106011707.TAA21329@pandora.informatik.hu-berlin.de> The patch http://sourceforge.net/tracker/?func=detail&atid=305470&aid=427190&group_id=5470 introduces two new calling conventions, METH_O and METH_NOARGS. 
The rationale for METH_O has been discussed already; the rationale for METH_NOARGS is that it allows a convient simplification (plus a marginal speed-up) of functions which do either PyArg_NoArgs(args) or PyArg_ParseTuple(args, ":function_name"). Now, one open issue is whether the METH_NOARGS functions should have a signature of PyObject * (*unaryfunc)(PyObject *); or of PyObject *(*PyCFunction)(PyObject *, PyObject *); which then would be called with a NULL second argument; the first argument would be self in either case. IMO, the advantage of passing the NULL argument is that NOARGS methods don't need to be cast into PyCFunction in the method table; the advantage of the second approach is that it is clearer in the function implementation. Any opinions which signature to use? Regards, Martin From mal at lemburg.com Fri Jun 1 19:18:21 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Fri, 01 Jun 2001 19:18:21 +0200 Subject: [Python-Dev] METH_NOARGS calling convention References: <200106011707.TAA21329@pandora.informatik.hu-berlin.de> Message-ID: <3B17CE5D.9D4CE8D4@lemburg.com> Martin von Loewis wrote: > > The patch > > http://sourceforge.net/tracker/?func=detail&atid=305470&aid=427190&group_id=5470 > > introduces two new calling conventions, METH_O and METH_NOARGS. The > rationale for METH_O has been discussed already; the rationale for > METH_NOARGS is that it allows a convient simplification (plus a > marginal speed-up) of functions which do either PyArg_NoArgs(args) or > PyArg_ParseTuple(args, ":function_name"). > > Now, one open issue is whether the METH_NOARGS functions should have > a signature of > > PyObject * (*unaryfunc)(PyObject *); > > or of > > PyObject *(*PyCFunction)(PyObject *, PyObject *); > > which then would be called with a NULL second argument; the first > argument would be self in either case. > > IMO, the advantage of passing the NULL argument is that NOARGS methods > don't need to be cast into PyCFunction in the method table; the > advantage of the second approach is that it is clearer in the function > implementation. > > Any opinions which signature to use? The second... I'm not sure how you will get extension writers who have to maintain packages for all three Python versions to ever change their code to use the new style calling scheme: there simply is no clean way to use the same code base unless you are willing to add tons of #ifdefs. -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From fdrake at acm.org Fri Jun 1 19:31:15 2001 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Fri, 1 Jun 2001 13:31:15 -0400 (EDT) Subject: [Python-Dev] METH_NOARGS calling convention In-Reply-To: <3B17CE5D.9D4CE8D4@lemburg.com> References: <200106011707.TAA21329@pandora.informatik.hu-berlin.de> <3B17CE5D.9D4CE8D4@lemburg.com> Message-ID: <15127.53603.87216.103262@cj42289-a.reston1.va.home.com> M.-A. Lemburg writes: > > Any opinions which signature to use? > > The second... Seconded. ;-) > I'm not sure how you will get extension writers who > have to maintain packages for all three Python versions to > ever change their code to use the new style calling scheme: > there simply is no clean way to use the same code base unless > you are willing to add tons of #ifdefs. You won't, and that's OK. 
Even if 3rd-party extensions never use it, there are plenty of functions/methods in the standard distribution which can use it, and I imagine those would be converted fairly quickly. -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From tismer at tismer.com Fri Jun 1 20:29:11 2001 From: tismer at tismer.com (Christian Tismer) Date: Fri, 01 Jun 2001 20:29:11 +0200 Subject: [Python-Dev] Marshal bug in 2.1? Message-ID: <3B17DEF7.3E7C6BC6@tismer.com> Hi friends, there is a script which generates encrypted passwords for Starship users. There is a series of marshal, zlib and base64 calls, which is reversed by the script. Is there a known bug in Marshal, or should I start the debugger now? The passwphrase for the attached script is "hey". cheers - chris -- Christian Tismer :^) Mission Impossible 5oftware : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net/ 14163 Berlin : PGP key -> http://wwwkeys.pgp.net/ PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? http://www.stackless.com/ -------------- next part -------------- import marshal,base64,zlib exec marshal.loads(zlib.decompress(base64.decodestring(""" eJytVM+PGzUUfs6PzWZYwapAqbbAuiyF6Yqsqt2iomq1HGkvuQQJaS+pM3YzbjP2yHY6CdrVHNr+ Exz5L/gn4MidC2f+Az5Pkq0QlFMnmTf2s+d73/vmPWeEq43b/wxT498mSXSOwbskGZ0zqm+QbNF5 i+o9km16idU21bdIdUh26GmLrCRWf0ayS8+6dN6l+oAU0XcP689JbZHcohfA6VF9mxQj1SbVi57r 2PAFqS7p7bVH9+kFkew1mDvA/JJUCziGEYs3AozS7ch1yIiSg7dwJfjxzCkRVFml4Q7ng8F6zgUv hfeVdZLzJ84WXJgln+rnyvCgFuEIbzoV5s54/g3PcuFEFpTzvMp1lnPhFM9sUc6DklwboEmF5UIb 7YPO8PJkHvhz5ZbcWDOYaaOE45VYrmI18N/n2sctXlvDMczmPthC/wjEJ9bxUrtFTOBt6OAPoqSH h4c85MqrdUaeT1SoFDIenJ0OmpyWdu5AxDllwmuB8GLC33gNzm7700EytBWfA3s0esiD5TM7hTAY +IBIuS6PymXIrTkyKiRYjKL5+MI607nXZsrVAjLPlpHmFck0m+lyYgWIOAXRC2UkNHowuJMII+Mm M10zv2K8QosojUvy0tmpE0WyomQLFfK4o7BIGgUhxWSmjhJ/F/U3CdVX/BHPRKyE2SwiA0mEVQgI g49agXtmIVMWbmWMOvi1yZexyfaovhmb7BnRJWsGjC7RXh/TBZqgFdsO3XCJJvuELtqkO3RB0cPq T5v5VmyTSwDt00WLdI/CduxQNGbc14pNGm2H+Ajgo7SLoEPfhz25e3x8cv/eyX0wYuADRjepAQpE ga3jIP514H2E4SiNZ8NQj2E1h2nmPposd80TYnrUDi3SaFdD/37c8O9q9bF7T2eimEhxtk8+Hj6N 0XEh7W+wC/m134qT4PANGpdRVYMtm4V5KdGijSM0DqmnygffwfCp1WaFIsq0s+EU/gt4Bfh/ZDdn wx75JJ6U7EN2je2y91izOh4XQpvxeOj3MStnSqC88f1RsqtSiMXKy9zB/8DvYs/jH/46fWR+q3+v fv3lz5/+eJUmm5ylzRr6eB5vBif/4LAOaUShxuOrdKJoTlRjbXDWNN6wCFeSvdYmbcR+U65RiW9R Dh/gufNOP+m3dnq7bIdtI9VrbJ/9DYOcdyU= """))) From tismer at tismer.com Fri Jun 1 20:47:02 2001 From: tismer at tismer.com (Christian Tismer) Date: Fri, 01 Jun 2001 20:47:02 +0200 Subject: [Python-Dev] Marshal bug in 2.1? References: <3B17DEF7.3E7C6BC6@tismer.com> Message-ID: <3B17E326.41D82CCE@tismer.com> Christian Tismer wrote: > > Hi friends, > > there is a script which generates encrypted passwords for > Starship users. There is a series of marshal, zlib and base64 > calls, which is reversed by the script. > > Is there a known bug in Marshal, or should I start the debugger now? > The passwphrase for the attached script is "hey". Aehmmm... can it be that code objects are no longer compatible between Python 2.0 and 2.1? sigh - ciao - chris -- Christian Tismer :^) Mission Impossible 5oftware : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net/ 14163 Berlin : PGP key -> http://wwwkeys.pgp.net/ PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? 
http://www.stackless.com/

From mwh at python.net Fri Jun 1 20:52:17 2001 From: mwh at python.net (Michael Hudson) Date: 01 Jun 2001 19:52:17 +0100 Subject: [Python-Dev] another dict crasher In-Reply-To: barry@digicool.com's message of "Fri, 1 Jun 2001 11:12:33 -0400" References: <15127.45281.435849.822222@anthem.wooz.org> Message-ID: Warning! VERY SICK CODE INDEED ahead! barry at digicool.com (Barry A. Warsaw) writes: > >>>>> "MH" == Michael Hudson writes: > > MH> segfaults both 2.1 and current (well, maybe a day old) CVS. > MH> Haven't tried Tim's latest patch, but I don't believe that > MH> will make any difference. > > That is highly, highly nasty. Not as nasty as this, though:

dict = {}

# let's force dict to malloc its table
for i in range(1,10):
    dict[i] = i

class Machiavelli:
    def __repr__(self):
        dict.clear()
        print # doesn't crash without this. don't know why
        return `"machiavelli"`
    def __hash__(self):
        return 0

dict[Machiavelli()] = Machiavelli()

print dict

gives, even with my posted patch to dictobject.c $ ./python crash2.py { Segmentation fault (core dumped) Any ideas what the above code should do? (Other than use the secret PSU website to hire a hitman and shoot whoever wrote the code, I mean). Cheers, M. -- Well, yes. I don't think I'd put something like "penchant for anal play" and "able to wield a buttplug" in a CV unless it was relevant to the gig being applied for... -- Matt McLeod, alt.sysadmin.recovery

From mal at lemburg.com Fri Jun 1 21:01:38 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Fri, 01 Jun 2001 21:01:38 +0200 Subject: [Python-Dev] Marshal bug in 2.1? References: <3B17DEF7.3E7C6BC6@tismer.com> <3B17E326.41D82CCE@tismer.com> Message-ID: <3B17E692.281A329B@lemburg.com> Christian Tismer wrote: > > Christian Tismer wrote: > > > > Hi friends, > > > > there is a script which generates encrypted passwords for > > Starship users. There is a series of marshal, zlib and base64 > > calls, which is reversed by the script. > > > > Is there a known bug in Marshal, or should I start the debugger now? > > The passphrase for the attached script is "hey". > > Aehmmm... can it be that code objects are no longer compatible > between Python 2.0 and 2.1? Yes, not surprisingly though... AFAIK the pyc format changed in every single version between 1.5.2 and 2.1. -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/

From tim.one at home.com Fri Jun 1 22:36:21 2001 From: tim.one at home.com (Tim Peters) Date: Fri, 1 Jun 2001 16:36:21 -0400 Subject: [Python-Dev] another dict crasher In-Reply-To: Message-ID: I suspect there are many ways to get the dict code to blow up, and always have been. I picked on dict compare a month or so ago mostly because nobody cares how fast that runs except in the == and != cases. Others are a real bitch; for example, the fundamental lookdict function caches dictentry *ep0 = mp->ma_table; at the start as if it were invariant -- but very unlikely sequences of collisions with identical hash codes combined with mutating comparisons can turn that into a bogus pointer. List objects used to have similar vulnerabilities during sorting (where comparison is the *norm*, not a one-in-a-billion freak occurrence), and no amount of slow-the-code paranoia sufficed to plug all conceivable holes. In the end we invented an internal "immutable list type", and replaced the list object's type pointer for the duration of the sort (you can still try to mutate a list during a sort, but all the mutating list methods are redirected to raise an exception when you do). The dict code has even more holes and in more places, but they're generally much harder to provoke, so they've gone unnoticed for 10 years. All in all, seemed like a good tradeoff to me .
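The "immutable list" trick is worth sketching, since the same idea keeps coming up for dicts. Roughly (a sketch of the shape of the trick, with hypothetical helper names; the real code lives in Objects/listobject.c):

/* The error-raising stand-in for every mutating list method while
   a sort is in progress. */
static PyObject *
immutable_list_op(PyObject *self, PyObject *args)
{
    PyErr_SetString(PyExc_TypeError,
                    "a list cannot be modified while it is being sorted");
    return NULL;
}

/* immutable_list_type: identical to PyList_Type except that every
   slot and method that could mutate the list points at an
   error-raising stand-in like the one above. */

static PyObject *
listsort(PyListObject *self, PyObject *compare)
{
    PyObject *result;

    self->ob_type = &immutable_list_type;  /* comparisons may run user code */
    result = do_the_sort(self, compare);   /* hypothetical helper */
    self->ob_type = &PyList_Type;          /* restore the real type */
    return result;
}

The beauty of the trick is that the error is raised at the mutation site, where the user can see what went wrong, instead of leaving the sort to crawl over freed memory.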
From tim.one at home.com Sat Jun 2 00:08:32 2001 From: tim.one at home.com (Tim Peters) Date: Fri, 1 Jun 2001 18:08:32 -0400 Subject: [Python-Dev] METH_NOARGS calling convention In-Reply-To: <200106011707.TAA21329@pandora.informatik.hu-berlin.de> Message-ID: Cool! [Martin von Loewis] > ... > Now, one open issue is whether the METH_NOARGS functions should have > a signature of > > PyObject * (*unaryfunc)(PyObject *); > > or of > > PyObject *(*PyCFunction)(PyObject *, PyObject *); > > which then would be called with a NULL second argument; the first > argument would be self in either case. > > IMO, the advantage of passing the NULL argument is that NOARGS methods > don't need to be cast into PyCFunction in the method table; the > advantage of the second approach is that it is clearer in the function > implementation. > > Any opinions on which signature to use? The one that makes sense : declare functions with the number of arguments they use. I don't care about needing to cast in the table: you do that once, but people read the *code* over and over, and an unused arg will be a mystery (or even a source of compiler warnings) every time you bump into one. The only way needing to cast could be "a problem" is if this remains an undocumented gimmick that developers have to reverse-engineer from staring at the (distributed all over the place) implementation. I like what the patch does, but I'd reject it just for continuing to leave this stuff Utterly Mysterious: please add comments saying what METH_NOARGS and METH_O *mean*: what's the point, why are these defined, how and when are you supposed to use them? That's where to explain the need to cast METH_NOARGS.

From thomas at xs4all.net Sat Jun 2 00:42:35 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Sat, 2 Jun 2001 00:42:35 +0200 Subject: [Python-Dev] another dict crasher In-Reply-To: <15127.45281.435849.822222@anthem.wooz.org>; from barry@digicool.com on Fri, Jun 01, 2001 at 11:12:33AM -0400 References: <15127.45281.435849.822222@anthem.wooz.org> Message-ID: <20010602004235.Q690@xs4all.nl> On Fri, Jun 01, 2001 at 11:12:33AM -0400, Barry A. Warsaw wrote: > > >>>>> "MH" == Michael Hudson writes: > MH> segfaults both 2.1 and current (well, maybe a day old) CVS. > MH> Haven't tried Tim's latest patch, but I don't believe that > MH> will make any difference. > That is highly, highly nasty. Sounds to me like there ought to be an > emergency 2.1.1 patch made for this, bumping Thomas's work to 2.1.2 if > necessary. Why bump 'my work' ? I'm just reviewing patches checked into the head. A fix for the above problems would fit in a patch release very nicely, and a release is a release. Besides, releasing 2.1.1 as 2.1 + dict fix would be a CVS nightmare. Unless you propose to keep it out of CVS, Barry ? :) > And if we can trojan in the NAIPL (New And Improved Python > License), I wouldn't mind. :) I'll channel Guido by saying he wouldn't even allow us to ship it with anything other than the PSF licence :) Gee-I'd-almost-think-you-had-a-need-for-an-FSF-suffered-licence-ly y'rs -- Thomas Wouters Hi!
I'm a .signature virus! copy me into your .signature file to help me spread!

From thomas at xs4all.net Sat Jun 2 00:47:16 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Sat, 2 Jun 2001 00:47:16 +0200 Subject: [Python-Dev] Marshal bug in 2.1? In-Reply-To: <3B17E692.281A329B@lemburg.com>; from mal@lemburg.com on Fri, Jun 01, 2001 at 09:01:38PM +0200 References: <3B17DEF7.3E7C6BC6@tismer.com> <3B17E326.41D82CCE@tismer.com> <3B17E692.281A329B@lemburg.com> Message-ID: <20010602004716.R690@xs4all.nl> On Fri, Jun 01, 2001 at 09:01:38PM +0200, M.-A. Lemburg wrote: > Yes, not surprisingly though... AFAIK the pyc format changed > in every single version between 1.5.2 and 2.1. Worse, it's changed several times between each release :) -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread!

From barry at digicool.com Sat Jun 2 01:12:30 2001 From: barry at digicool.com (Barry A. Warsaw) Date: Fri, 1 Jun 2001 19:12:30 -0400 Subject: [Python-Dev] another dict crasher References: <15127.45281.435849.822222@anthem.wooz.org> <20010602004235.Q690@xs4all.nl> Message-ID: <15128.8542.51241.192412@anthem.wooz.org> >>>>> "TW" == Thomas Wouters writes: >> That is highly, highly nasty. Sounds to me like there ought to >> be an emergency 2.1.1 patch made for this, bumping Thomas's >> work to 2.1.2 if necessary. TW> Why bump 'my work' ? I'm just reviewing patches checked into TW> the head. A fix for the above problems would fit in a patch TW> release very nicely, and a release is a release. Besides, TW> releasing 2.1.1 as 2.1 + dict fix would be a CVS TW> nightmare. Unless you propose to keep it out of CVS, Barry ? TW> :) Oh no! You know me, I like to release those maintenance releases early and often. :) Anyway, that's why /you're/ the 2.1.1 czar. >> And if we can trojan in the NAIPL (New And Improved Python >> License), I wouldn't mind. :) TW> I'll channel Guido by saying he wouldn't even allow us to ship TW> it with anything other than the PSF licence :) :) TW> Gee-I'd-almost-think-you-had-a-need-for-an-FSF-suffered-licence-ly TW> y'rs Where'd you get /that/ idea? :) -Barry

From mwh at python.net Sat Jun 2 01:20:26 2001 From: mwh at python.net (Michael Hudson) Date: 02 Jun 2001 00:20:26 +0100 Subject: [Python-Dev] another dict crasher In-Reply-To: "Tim Peters"'s message of "Fri, 1 Jun 2001 16:36:21 -0400" References: Message-ID: "Tim Peters" writes: > The dict code has even more holes and in more places, but they're > generally much harder to provoke, so they've gone unnoticed for 10 > years. All in all, seemed like a good tradeoff to me . Are you suggesting that we should just leave these crashers in? They're not *particularly* hard to provoke if you know the implementation - and I was inspired to look for them by someone's report of actually running into one. Cheers, M. -- Java sucks. [...] Java on TV set top boxes will suck so hard it might well inhale people from off their sofa until their heads get wedged in the card slots. --- Jon Rabone, ucam.chat

From tim.one at home.com Sat Jun 2 03:04:36 2001 From: tim.one at home.com (Tim Peters) Date: Fri, 1 Jun 2001 21:04:36 -0400 Subject: [Python-Dev] another dict crasher In-Reply-To: Message-ID: [Michael Hudson] > Are you suggesting that we should just leave these crashers in? > They're not *particularly* hard to provoke if you know the > implementation - and I was inspired to look for them by someone's > report of actually running into one.
I certainly don't object to fixing ones that bite innocent users, but there are also costs of several kinds. In this case, I couldn't care less how long printing a dict takes -- go for it. When adversarial abuse starts interfering with the speed of crucial operations, though, I'm simply not a "safety at any cost" person. Guido is much more of one, although the number of holes remaining in Python could plausibly fill Albert Hall . short-of-50-easy-ways-to-crash-win98-just-think-hard-about-each-"+"-in- the-code-base-ly y'rs - tim From gstein at lyra.org Sat Jun 2 07:52:03 2001 From: gstein at lyra.org (Greg Stein) Date: Fri, 1 Jun 2001 22:52:03 -0700 Subject: [Python-Dev] strop vs. string In-Reply-To: ; from tim.one@home.com on Sun, May 27, 2001 at 09:42:30PM -0400 References: <3B10D758.3741AC2F@lemburg.com> Message-ID: <20010601225203.R23560@lyra.org> On Sun, May 27, 2001 at 09:42:30PM -0400, Tim Peters wrote: >... > [Greg Ewing] > > I think it would be safe if: > > > > 1) it kept a reference to the underlying object, and > > That much it already does. > > > 2) it re-fetched the pointer and length info each time it was > > needed, using the underlying object's buffer interface. > > If after > > b = buffer(some_object) > > b.__getitem__ needed to refetch the info between > > b[i] > and > b[i+1] > > I expect it would be so slow even Greg wouldn't want it anymore. Huh? I don't think it would be all that slow. It is just a function call. And I don't think that the getitem slot is really used all that frequently (in a loop) for buffer type objects. I've been thinking that refetching the ptr/len is the right fix. Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein at lyra.org Sat Jun 2 07:54:23 2001 From: gstein at lyra.org (Greg Stein) Date: Fri, 1 Jun 2001 22:54:23 -0700 Subject: [Python-Dev] strop vs. string In-Reply-To: ; from tim.one@home.com on Sat, May 26, 2001 at 02:44:04AM -0400 References: <3B0ED784.FC53D01@lemburg.com> Message-ID: <20010601225423.S23560@lyra.org> On Sat, May 26, 2001 at 02:44:04AM -0400, Tim Peters wrote: > The buffer object has been neglected for years: is that because it's in > prime shape, or because nobody cares about it enough to maintain it? "Works for me" :-) Part of the neglect is also based on Guido's ambivalence. Part is that I haven't needed more from it. The day that I do, then I'll code it up :-) But that doesn't help the "generic" case, unfortunately. Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein at lyra.org Sat Jun 2 07:55:33 2001 From: gstein at lyra.org (Greg Stein) Date: Fri, 1 Jun 2001 22:55:33 -0700 Subject: [Python-Dev] strop vs. string In-Reply-To: <3B0FD023.C4588919@lemburg.com>; from mal@lemburg.com on Sat, May 26, 2001 at 05:47:47PM +0200 References: <3B0FD023.C4588919@lemburg.com> Message-ID: <20010601225533.T23560@lyra.org> On Sat, May 26, 2001 at 05:47:47PM +0200, M.-A. Lemburg wrote: >... > Even the idea of replacing the usage of strings as data buffers > with buffer object didn't get very far; common habits are simply > hard to break. That idea was shot down when Guido said that 'c' arrays should be the "official form of a data buffer." Cheers, -g -- Greg Stein, http://www.lyra.org/ From tim.one at home.com Sat Jun 2 08:13:49 2001 From: tim.one at home.com (Tim Peters) Date: Sat, 2 Jun 2001 02:13:49 -0400 Subject: [Python-Dev] another dict crasher In-Reply-To: Message-ID: [Michael Hudson] > Actually this crash was dict_print (I always forget about tp_print...). We all should . 
> It's pretty easy to mend: > > *** dictobject.c Fri Jun 1 13:08:13 2001 > --- dictobject.c-fixed Fri Jun 1 12:59:07 2001 > *************** > *** 793,795 **** > any = 0; > ! for (i = 0, ep = mp->ma_table; i < mp->ma_size; i++, ep++) { > if (ep->me_value != NULL) { > --- 793,796 ---- > any = 0; > ! for (i = 0; i < mp->ma_size; i++) { > ! ep = &mp->ma_table[i]; > if (ep->me_value != NULL) { > *************** > *** 833,835 **** > any = 0; > ! for (i = 0, ep = mp->ma_table; i < mp->ma_size && v; i++, ep++) { > if (ep->me_value != NULL) { > --- 834,837 ---- > any = 0; > ! for (i = 0; i < mp->ma_size && v; i++) { > ! ep = &mp->ma_table[i]; > if (ep->me_value != NULL) { > > I'm not sure this stops still more Machiavellian behaviour from > crashing the interpreter, Alas, it doesn't. You can't trust *anything* about a container you're iterating over across any call that may call back into Python. In these cases, the call to PyObject_Repr() can execute any code at all, including code that mutates the dict you're crawling over. In particular, calling PyObject_Repr() to format the key means the ep = &mp->ma_table[i] pointer may be trash by the time PyObject_Repr() is called again to format the value. See characterize() for the pain it takes to guard against everything, including encouraging comments like: if (cmp > 0 || i >= a->ma_size || a->ma_table[i].me_value == NULL) { /* Not the *smallest* a key; or maybe it is * but the compare shrunk the dict so we can't * find its associated value anymore; or * maybe it is but the compare deleted the * a[thiskey] entry. */ Py_DECREF(thiskey); continue; } It should really add "or maybe it just shuffled the dict around and the value at ma_table[i] is no longer associated with the key that *used* to be at ma_table[i], but since there's still *some* non-NULL pointer there we'll just pretend that didn't happen and press onward". > and you can certainly get items being printed more than once or not > at all. I'm not sure this last is a problem; Those don't matter: in a long tradition, we buy "safety" not only at the cost of bloating the code, but also by making the true behavior in case of mutation unpredictable & inexplicable. That's why I *really* liked the "immutable list" trick in list.sort(): even if we could have made the code bulletproof without it, we couldn't usefully explain what the heck it actually did. It's not Pythonic to blow up, but neither is it Pythonic to be incomprehensible. You simply can't win here. > if the user's being this contrary there's only so much we can > do to help him or her. I'd prefer a similar internal immutable-dict trick that raised an exception if the user was pushing Python into a corner where "blow up or do something baffling" were its only choices. That would render the original example illegal, of course. But would that be a bad thing? What *should* it mean when the user invokes an operation on a container and mutates the container during that operation? There's almost no chance that Jython does the same thing as CPython in all these cases, so it's effectively undefined behavior no matter how you plug the holes (short of raising an exception). From tim.one at home.com Sat Jun 2 08:34:43 2001 From: tim.one at home.com (Tim Peters) Date: Sat, 2 Jun 2001 02:34:43 -0400 Subject: [Python-Dev] strop vs. 
string In-Reply-To: <20010601225203.R23560@lyra.org> Message-ID: [Tim] > If after > > b = buffer(some_object) > > b.__getitem__ needed to refetch the info between > > b[i] > and > b[i+1] > > I expect it would be so slow even Greg wouldn't want it anymore. [Greg] > Huh? I don't think it would be all that slow. It is just a function > call. And I don't think that the getitem slot is really used all that > frequently (in a loop) for buffer type objects. I expect they index into the buffer memory directly then, right? Then for buffers obtained from mutable objects, any such loop is unsafe in the absence of the GIL, or even in its presence if the loop contains code that may call back into Python. > I've been thinking that refetching the ptr/len is the right fix. So is calling __getitem__ all the time then, unless you want to dance on the razor's edge. The idea that you can safely "borrow" memory from a mutable object without copying it is brittle. > Part of the neglect is also based on Guido's ambivalence. Part is > that I haven't needed more from it. The day that I do, then I'll > code it up :-) But that doesn't help the "generic" case, > unfortunately. I take that as "yes" to my "nobody cares about it enough to maintain it?". In that light, Guido's ambivalence is indeed surprising .

From mwh at python.net Sat Jun 2 09:09:07 2001 From: mwh at python.net (Michael Hudson) Date: 02 Jun 2001 08:09:07 +0100 Subject: [Python-Dev] another dict crasher In-Reply-To: "Tim Peters"'s message of "Sat, 2 Jun 2001 02:13:49 -0400" References: Message-ID: "Tim Peters" writes: > [Michael Hudson] > > Actually this crash was dict_print (I always forget about tp_print...). > > We all should . > > > It's pretty easy to mend: [snip] > > I'm not sure this stops still more Machiavellian behaviour from > > crashing the interpreter, > > Alas, it doesn't. No, that's what my "dict[Machiavelli()] = Machiavelli()" example was demonstrating. If no one beats me to it, I'll post a better fix to sf next week, complete with test-cases and suitably "encouraging" comments. I can't easily see other examples of the problem; there certainly might be things you could do with comparisons that could trigger crashes, but that code's so hairy that it's almost impossible for me to be sure. There are two ways of constructing a software design: one way is to make it so simple that there are obviously no deficiencies and the other way is to make it so complicated that there are no obvious deficiencies. -- C. A. R. Hoare > > and you can certainly get items being printed more than once or not > > at all. I'm not sure this last is a problem; > > Those don't matter: in a long tradition, we buy "safety" not only at the > cost of bloating the code, but also by making the true behavior in case of > mutation unpredictable & inexplicable. This is what I thought. [snip] > > if the user's being this contrary there's only so much we can > > do to help him or her. > > I'd prefer a similar internal immutable-dict trick that raised an exception > if the user was pushing Python into a corner where "blow up or do something > baffling" were its only choices. That would render the original example > illegal, of course. But would that be a bad thing? It's hard to see how. > What *should* it mean when the user invokes an operation on a > container and mutates the container during that operation? I don't think there's a meaning you can attach to this kind of behaviour.
The "immutable dict trick" looks better the more I think about it, but I guess that will have to wait until Guido gets back from the sun... Cheers, M. -- incidentally, asking why things are "left out of the language" is a good sign that the asker is fairly clueless. -- Erik Naggum, comp.lang.lisp

From gstein at lyra.org Sat Jun 2 09:40:05 2001 From: gstein at lyra.org (Greg Stein) Date: Sat, 2 Jun 2001 00:40:05 -0700 Subject: [Python-Dev] strop vs. string In-Reply-To: ; from tim.one@home.com on Sat, Jun 02, 2001 at 02:34:43AM -0400 References: <20010601225203.R23560@lyra.org> Message-ID: <20010602004005.F23560@lyra.org> On Sat, Jun 02, 2001 at 02:34:43AM -0400, Tim Peters wrote: > [Tim] > > If after > > > > b = buffer(some_object) > > > > b.__getitem__ needed to refetch the info between > > > > b[i] > > and > > b[i+1] > > > > I expect it would be so slow even Greg wouldn't want it anymore. > > [Greg] > > Huh? I don't think it would be all that slow. It is just a function > > call. And I don't think that the getitem slot is really used all that > > frequently (in a loop) for buffer type objects. > > I expect they index into the buffer memory directly then, right? Then for > buffers obtained from mutable objects, any such loop is unsafe in the > absence of the GIL, or even in its presence if the loop contains code that > may call back into Python. Most access is: fetch ptr/len, index into the memory. And yes: anything within that loop which could conceivably change the target object (especially a call into Python) could move that ptr. I was saying that, at the Python level, using a loop and doing b[i] into a buffer/string/unicode object would seem to be relatively rare. b[0] and stuff is reasonably common. > > I've been thinking that refetching the ptr/len is the right fix. > > So is calling __getitem__ all the time then, unless you want to dance on the > razor's edge. The idea that you can safely "borrow" memory from a mutable > object without copying it is brittle. Stay in C code and don't call into Python. It is safe then. The buffer API is exactly what you're saying: borrow a memory reference. The concept makes a lot of things possible that weren't before. The buffer object's storing of that reference was a mistake. > > Part of the neglect is also based on Guido's ambivalence. Part is > > that I haven't needed more from it. The day that I do, then I'll > > code it up :-) But that doesn't help the "generic" case, > > unfortunately. > > I take that as "yes" to my "nobody cares about it enough to maintain it?". > In that light, Guido's ambivalence is indeed surprising . Eh? I'll maintain the thing, but you're confusing that with adding more features into it. Different question. Cheers, -g -- Greg Stein, http://www.lyra.org/

From tim.one at home.com Sat Jun 2 10:17:39 2001 From: tim.one at home.com (Tim Peters) Date: Sat, 2 Jun 2001 04:17:39 -0400 Subject: [Python-Dev] another dict crasher In-Reply-To: Message-ID: [Michael Hudson] > ... > If no one beats me to it, I'll post a better fix to sf next week, > complete with test-cases and suitably "encouraging" comments. Ah, no need -- looks like I was doing that while you were writing this. Checked in already. So long as we're happy to settle for senseless results that simply don't blow up, the only other trick you really needed was to save away the value in a local vrbl and incref it across the key->string bit; then you don't have to worry about key->string deleting the value, or about the table entry it lived in going away (because you get the value from the (still-incref'ed) *local* vrbl later, not from the table again). > I can't easily see other examples of the problem; there certainly > might be things you could do with comparisons that could trigger > crashes, but that code's so hairy that it's almost impossible for me > to be sure. It's easy to be sure: any code that tries to remember anything about a dict (ditto any mutable object) across a "dangerous" call, other than the mere address of the object, is a place you *can* provoke a core dump. It may not be easy to provoke, and a given provoking test case may not fail across all platforms, or even every time you run it on a single platform, but it's "an obvious" hole all the same.
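In code, the pattern Tim is describing looks roughly like this (a fragment-style sketch with hypothetical helpers, not the checked-in diff):

PyObject *key, *value;
int status;

key = mp->ma_table[i].me_key;
value = mp->ma_table[i].me_value;
Py_INCREF(key);     /* our own references: the callback cannot */
Py_INCREF(value);   /* free these out from under us */

status = repr_the_key(key);    /* may run arbitrary Python code */
/* mp->ma_table may have been resized and freed by now: from here
   on, use the saved locals, never ma_table[i] again. */
if (status >= 0)
    status = repr_the_value(value);
Py_DECREF(key);
Py_DECREF(value);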
From tismer at tismer.com Sat Jun 2 11:49:35 2001 From: tismer at tismer.com (Christian Tismer) Date: Sat, 02 Jun 2001 11:49:35 +0200 Subject: [Python-Dev] Marshal bug in 2.1? References: <3B17DEF7.3E7C6BC6@tismer.com> <3B17E326.41D82CCE@tismer.com> <3B17E692.281A329B@lemburg.com> <20010602004716.R690@xs4all.nl> Message-ID: <3B18B6AE.88EA6926@tismer.com> Thomas Wouters wrote: > > On Fri, Jun 01, 2001 at 09:01:38PM +0200, M.-A. Lemburg wrote: > > > Yes, not surprisingly though... AFAIK the pyc format changed > > in every single version between 1.5.2 and 2.1. > > Worse, it's changed several times between each release :) But I didn't use .pyc at all, just a marshalled code object. There are no version headers or such. The same object worked in fact for Py 1.5.2 and 2.0, but no longer with 2.1 . I debugged the unmarshalling and saw what happened: The new code objects with their new scoping features were the problem. The new structures were simply added, and there is no way to skip these for older code objects, since there isn't any info. Some option for marshal to unmarshal old-style code objects would have helped. But then, I'm not sure if the opcodes are still assigned the same way in 2.1, or if there was some movement? This would kill it anyway. ciao - chris (now looking for another cheap way to do something invisible in Python without installing *anything* ) -- Christian Tismer :^) Mission Impossible 5oftware : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net/ 14163 Berlin : PGP key -> http://wwwkeys.pgp.net/ PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? http://www.stackless.com/

From mal at lemburg.com Sat Jun 2 13:09:13 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Sat, 02 Jun 2001 13:09:13 +0200 Subject: [Python-Dev] Marshal bug in 2.1? References: <3B17DEF7.3E7C6BC6@tismer.com> <3B17E326.41D82CCE@tismer.com> <3B17E692.281A329B@lemburg.com> <20010602004716.R690@xs4all.nl> <3B18B6AE.88EA6926@tismer.com> Message-ID: <3B18C958.598A9891@lemburg.com> Christian Tismer wrote: > > Thomas Wouters wrote: > > > > On Fri, Jun 01, 2001 at 09:01:38PM +0200, M.-A. Lemburg wrote: > > > > > Yes, not surprisingly though... AFAIK the pyc format changed > > > in every single version between 1.5.2 and 2.1. > > > > Worse, it's changed several times between each release :) > > But I didn't use .pyc at all, just a marshalled code object.
That's the point: the header in pyc files is meant to signal the incompatibility of the following code object. Perhaps we should move this version information into the marshal format of code objects themselves... > There are no version headers or such. > The same object worked in fact for Py 1.5.2 and 2.0, but no > longer with 2.1 . > I debugged the unmarshalling and saw what happened: > The new code objects with their new scoping features were > the problem. The new structures were simply added, and there > is no way to skip these for older code objects, since there > isn't any info. > Some option for marshal to unmarshal old-style code objects > would have helped. > But then, I'm not sure if the opcodes are still assigned > the same way in 2.1, or if there was some movement? This would > kill it anyway. AFAIK, the assignments did not change, but several opcodes were added in 2.1, so code compiled in 2.1 will not run in 2.0. > ciao - chris > > (now looking for another cheap way to do something invisible in > Python without installing *anything* ) Why don't you use freeze or py2exe or Gordon's installer for these one file executables ? Alternatively, you should check the Python version and make sure that it matches the one used for compiling the byte code. -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/

From mwh at python.net Sat Jun 2 13:40:56 2001 From: mwh at python.net (Michael Hudson) Date: 02 Jun 2001 12:40:56 +0100 Subject: [Python-Dev] another dict crasher In-Reply-To: "Tim Peters"'s message of "Sat, 2 Jun 2001 04:17:39 -0400" References: Message-ID: "Tim Peters" writes: > > I can't easily see other examples of the problem; there certainly > > might be things you could do with comparisons that could trigger > > crashes, but that code's so hairy that it's almost impossible for me > > to be sure. > > It's easy to be sure: any code that tries to remember anything about a dict > (ditto any mutable object) across a "dangerous" call, other than the mere > address of the object, is a place you *can* provoke a core dump. It may not > be easy to provoke, and a given provoking test case may not fail across all > platforms, or even every time you run it on a single platform, but it's "an > obvious" hole all the same. Ah, like this one:

dict = {}

# let's force dict to malloc its table
for i in range(1,10):
    dict[i] = i

class Machiavelli2:
    def __eq__(self, other):
        dict.clear()
        return 1
    def __hash__(self):
        return 0

dict[Machiavelli2()] = Machiavelli2()

print dict[Machiavelli2()]

I'll attach a patch, but it's another branch inside lookdict (though not lookdict_string which is I guess the really performance sensitive one). Cheers, M.
Index: dictobject.c
===================================================================
RCS file: /cvsroot/python/python/dist/src/Objects/dictobject.c,v
retrieving revision 2.100
diff -c -1 -r2.100 dictobject.c
*** dictobject.c	2001/06/02 08:27:39	2.100
--- dictobject.c	2001/06/02 11:36:47
***************
*** 273,274 ****
--- 273,281 ----
  	cmp = PyObject_RichCompareBool(ep->me_key, key, Py_EQ);
+ 	if (ep0 != mp->ma_table) {
+ 		PyErr_SetString(PyExc_RuntimeError,
+ 				"dict resized on comparison");
+ 		ep = mp->ma_table;
+ 		while (ep->me_value) ep++;
+ 		return ep;
+ 	}
  	if (cmp > 0) {
***************
*** 310,311 ****
--- 317,325 ----
  	cmp = PyObject_RichCompareBool(ep->me_key, key, Py_EQ);
+ 	if (ep0 != mp->ma_table) {
+ 		PyErr_SetString(PyExc_RuntimeError,
+ 				"dict resized on comparison");
+ 		ep = mp->ma_table;
+ 		while (ep->me_value) ep++;
+ 		return ep;
+ 	}
  	if (cmp > 0) {

Here's another test case to work out the second of those new if statements:

dict = {}

# let's force dict to malloc its table
for i in range(1,10):
    dict[i] = i

class Machiavelli3:
    def __init__(self, id):
        self.id = id
    def __eq__(self, other):
        if self.id == other.id:
            dict.clear()
            return 1
        else:
            return 0
    def __repr__(self):
        return "%s(%s)"%(self.__class__.__name__, self.id)
    def __hash__(self):
        return 0

dict[Machiavelli3(1)] = Machiavelli3(0)
dict[Machiavelli3(2)] = Machiavelli3(0)

print dict[Machiavelli3(2)]

-- M-x psych[TAB][RETURN] -- try it

From pedroni at inf.ethz.ch Sat Jun 2 20:58:55 2001 From: pedroni at inf.ethz.ch (Samuele Pedroni) Date: Sat, 2 Jun 2001 20:58:55 +0200 Subject: [Python-Dev] What should changing/setting __getattr__ (and similars) after classdef time do ? Message-ID: <004d01c0eb96$24b5f460$8a73fea9@newmexico> Hi. Is this a case that only the BDFL could know and pronounce on ... or I'm missing something ... Thanks for any feedback, Samuele Pedroni. ----- Original Message ----- From: Samuele Pedroni To: Sent: Friday, June 01, 2001 2:49 PM Subject: [Python-Dev] __xxxattr__ caching semantic > Hi. > > What is the intended semantic wrt __xxxattr__ caching:
>
> class X:
>     pass
>
> def cga(self,name):
>     print name
>
> def iga(name):
>     print name
>
> x=X()
> x.__dict__['__getattr__'] = iga # 1.
> x.__getattr__ = iga # 2.
> X.__dict__['__getattr__'] = cga # 3.
> X.__getattr__ = cga # 4.
> x.a
>
> for the manual > > http://www.python.org/doc/current/ref/customization.html > > with all the variants x.a should fail, they should have > no effect. In practice 4. works. > > Is that an implementation manual mismatch, is this intended, is there > code around using 4. ? > > I'm asking this because jython has differences/bugs in this respect. > > I imagine that 1.-4. should work for all other __magic__ methods > (this should be fixed in jython for some methods), > OTOH jython has such a restriction on __del__ too, and this one cannot > be removed (is not simply a matter of caching/non caching). > > regards, Samuele Pedroni. > > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev >

From tim.one at home.com Sun Jun 3 00:57:57 2001 From: tim.one at home.com (Tim Peters) Date: Sat, 2 Jun 2001 18:57:57 -0400 Subject: [Python-Dev] What should changing/setting __getattr__ (and similars) after classdef time do ? In-Reply-To: <004d01c0eb96$24b5f460$8a73fea9@newmexico> Message-ID: [Samuele Pedroni] > Is this a case that only the BDFL could know and pronounce on ... > or I'm missing something ...
The referenced URL http://www.python.org/doc/current/ref/customization.html appears irrelevant to me, so unsure what you're asking about. Perhaps http://www.python.org/doc/current/ref/attribute-access.html was intended? If so, the

    these methods are cached in the class object at class
    definition time; therefore, they cannot be changed after
    the class definition is executed.

there doesn't mean exactly what it says: it's trying to say that the __XXXattr__ methods *inherited from base classes* (if any) are cached in the class object at class definition time, so that changing them in the base classes later has no effect on the derived class. It should be clearer. A direct class setattr can still change them; indirect assignment via class.__dict__ is ineffective for the __dict__, __bases__, __name__, __getattr__, __setattr__ and __delattr__ class attributes (yes, you'll create a dict entry then, but class getattr doesn't look in the dict to get the value of these specific keys). Didn't understand the program snippet. Much of this is due to hoary optimizations and I agree is ill-documented. I hope Guido's current rework of all this stuff will leave the endcases more explainable. > ----- Original Message ----- > From: Samuele Pedroni > To: > Sent: Friday, June 01, 2001 2:49 PM > Subject: [Python-Dev] __xxxattr__ caching semantic > > > Hi. > > What is the intended semantic wrt __xxxattr__ caching:
> >
> > class X:
> >     pass
> >
> > def cga(self,name):
> >     print name
> >
> > def iga(name):
> >     print name
> >
> > x=X()
> > x.__dict__['__getattr__'] = iga # 1.
> > x.__getattr__ = iga # 2.
> > X.__dict__['__getattr__'] = cga # 3.
> > X.__getattr__ = cga # 4.
> > x.a
> >
> for the manual > > http://www.python.org/doc/current/ref/customization.html > > with all the variants x.a should fail, they should have > no effect. In practice 4. works. > > Is that an implementation manual mismatch, is this intended, is there > code around using 4. ? > > I'm asking this because jython has differences/bugs in this respect. > > I imagine that 1.-4. should work for all other __magic__ methods > (this should be fixed in jython for some methods), > OTOH jython has such a restriction on __del__ too, and this one cannot > be removed (is not simply a matter of caching/non caching).

From pedroni at inf.ethz.ch Sun Jun 3 01:46:42 2001 From: pedroni at inf.ethz.ch (Samuele Pedroni) Date: Sun, 3 Jun 2001 01:46:42 +0200 Subject: [Python-Dev] What should changing/setting __getattr__ (and similars) after classdef time do ? References: Message-ID: <001801c0ebbe$47b60a40$8a73fea9@newmexico> Hi. Thanks a lot for the answer, and sorry for the ill-formed question. [Tim Peters] > [Samuele Pedroni] > > Is this a case that only the BDFL could know and pronounce on ... > > or I'm missing something ... > > The referenced URL > > http://www.python.org/doc/current/ref/customization.html > > appears irrelevant to me, so unsure what you're asking about. Perhaps > > http://www.python.org/doc/current/ref/attribute-access.html > > was intended? If so, the > these methods are cached in the class object at class > definition time; therefore, they cannot be changed after > the class definition is executed. > > there doesn't mean exactly what it says: it's trying to say that the > __XXXattr__ methods *inherited from base classes* (if any) are cached in the > class object at class definition time, so that changing them in the base > classes later has no effect on the derived class. It should be clearer.
> A direct class setattr can still change them; indirect assignment via > class.__dict__ is ineffective for the __dict__, __bases__, __name__, > __getattr__, __setattr__ and __delattr__ class attributes (yes, you'll create > a dict entry then, but class getattr doesn't look in the dict to get the > value of these specific keys). > This matches what I understood reading CPython C code (yes I did that too ), and what the snippets were trying to point out. And I see the problem with derived classes too. > Didn't understand the program snippet. Sorry it is not one snippet, but the 4 variants should be considered independently. > > Much of this is due to hoary optimizations and I agree is ill-documented. I > hope Guido's current rework of all this stuff will leave the endcases more > explainable. That will be a lot of work for porting it to jython . In any case the manual is really not clear (euphemism ) about this. The point is that jython implements the letter of the manual, and even extends the caching opt to some other __magic__ methods. I wanted to know the intended behaviour in order to fix that in jython. regards Samuele Pedroni.

From tim.one at home.com Sun Jun 3 01:56:34 2001 From: tim.one at home.com (Tim Peters) Date: Sat, 2 Jun 2001 19:56:34 -0400 Subject: [Python-Dev] What should changing/setting __getattr__ (and similars) after classdef time do ? In-Reply-To: <001801c0ebbe$47b60a40$8a73fea9@newmexico> Message-ID: [Samuele Pedroni] > ... > The point is that jython implements the letter of the manual, and even > extends the caching opt to some other __magic__ methods. I wanted to > know the intended behaviour in order to fix that in jython. You got that one right the first time: this requires BDFL pronouncement! As semantically significant optimizations (the only reason for caching __getattr__, e.g.) creep into the code but the docs lag behind, it gets more and more unclear what's mandatory behavior and what's implementation-defined. This came up a couple weeks ago again in the context of what, exactly, rich comparisons are supposed to do in all cases. After poking holes in everything Guido wrote, he turned it around and told me to write up what I think it should say (which I have yet to do, as it's time-consuming and it appears some of the current CPython behavior is at least partly accidental -- but unclear exactly which parts). So don't be surprised if the same trick gets played on you ...

From tim.one at home.com Sun Jun 3 06:04:57 2001 From: tim.one at home.com (Tim Peters) Date: Sun, 3 Jun 2001 00:04:57 -0400 Subject: [Python-Dev] another dict crasher In-Reply-To: Message-ID: [Michael Hudson]
> Ah, like this one:
>
> dict = {}
>
> # let's force dict to malloc its table
> for i in range(1,10):
>     dict[i] = i
>
> class Machiavelli2:
>     def __eq__(self, other):
>         dict.clear()
>         return 1
>     def __hash__(self):
>         return 0
>
> dict[Machiavelli2()] = Machiavelli2()
>
> print dict[Machiavelli2()]

Told you it was easy . > I'll attach a patch, but it's another branch inside lookdict (though > not lookdict_string which is I guess the really performance sensitive > one). lookdict_string is crucial to Python's own performance. Dicts indexed by ints or class instances or ... are vital to other apps.
> Index: dictobject.c > =================================================================== > RCS file: /cvsroot/python/python/dist/src/Objects/dictobject.c,v > retrieving revision 2.100 > diff -c -1 -r2.100 dictobject.c > *** dictobject.c 2001/06/02 08:27:39 2.100 > --- dictobject.c 2001/06/02 11:36:47 > *************** > *** 273,274 **** > --- 273,281 ---- > cmp = > PyObject_RichCompareBool(ep->me_key, key, Py_EQ); > + if (ep0 != mp->ma_table) { > + PyErr_SetString(PyExc_RuntimeError, > + "dict resized on > comparison"); > + ep = mp->ma_table; > + while (ep->me_value) ep++; > + return ep; > + } > if (cmp > 0) { > *************** > *** 310,311 **** > --- 317,325 ---- > cmp = > PyObject_RichCompareBool(ep->me_key, key, Py_EQ); > + if (ep0 != mp->ma_table) { > + PyErr_SetString(PyExc_RuntimeError, > + "dict resized on > comparison"); > + ep = mp->ma_table; > + while (ep->me_value) ep++; > + return ep; > + } > if (cmp > 0) { Then we have other problems. Note the comment before lookdict: Exceptions are never reported by this function, and outstanding exceptions are maintained. The patched code doesn't preserve that. Looking for "the first" unused or dummy slot isn't good enough either, as surely the user has the right to expect that after, e.g., d[m] = 1, d[m] retrieves 1. That is, picking a reusable slot "at random" doesn't respect the *semantics* of dict operations ("just because" the dict resized doesn't mean the key they're looking for went away!). It would be better in this case to go back to the top and start over. However, then an adversarial user can construct a case that never terminates. Unclear what to do. From tim.one at home.com Sun Jun 3 09:55:43 2001 From: tim.one at home.com (Tim Peters) Date: Sun, 3 Jun 2001 03:55:43 -0400 Subject: [Python-Dev] strop vs. string In-Reply-To: <20010602004005.F23560@lyra.org> Message-ID: [Greg Stein] > ... > I was saying that, at the Python level, using a loop and doing b[i] into > a buffer/string/unicode object would seem to be relatively rare. b[0] > and stuff is reasonably common. Well, at the Python level buffer objects seem never to be used, probably because all the people who know about them don't advertise it because it's an easy way to provoke core dumps now. I don't have any real objection to any way anyone wants to fix that, just so long as it gets fixed. >> I take that as "yes" to my "nobody cares about it enough to >> maintain it?". In that light, Guido's ambivalence is indeed >> surprising . > Eh? I'll maintain the thing, but you're confusing that with adding more > features into it. Different question. I haven't asked for new features, just that what's already there get fixed: Python-level buffer objects are unsafe, the docs remain incomplete, there's random stuff like file.readinto() that's not documented at all (could be that's the only one -- it's certainly "discovered" on c.l.py often enough, though), and there are no buffer tests in the std test suite. The work to introduce the type wasn't completed, nobody works on it, and finishing work 3 years late doesn't count as "new feature" in my book . From gstein at lyra.org Sun Jun 3 11:10:36 2001 From: gstein at lyra.org (Greg Stein) Date: Sun, 3 Jun 2001 02:10:36 -0700 Subject: [Python-Dev] strop vs. string In-Reply-To: ; from tim.one@home.com on Sun, Jun 03, 2001 at 03:55:43AM -0400 References: <20010602004005.F23560@lyra.org> Message-ID: <20010603021036.U23560@lyra.org> On Sun, Jun 03, 2001 at 03:55:43AM -0400, Tim Peters wrote: > [Greg Stein] > > ... 
> > I was saying that, at the Python level, using a loop and doing b[i] into > > a buffer/string/unicode object would seem to be relatively rare. b[0] > > and stuff is reasonably common. > > Well, at the Python level buffer objects seem never to be used, probably I'm talking about string objects and unicode objects, too. The point is that b[i] loops don't have to be all that speedy because it isn't used often. > because all the people who know about them don't advertise it because it's > an easy way to provoke core dumps now. Easy? Depends on what you use them with. >... > >> I take that as "yes" to my "nobody cares about it enough to > >> maintain it?". In that light, Guido's ambivalence is indeed > >> surprising . > > > Eh? I'll maintain the thing, but you're confusing that with adding more > > features into it. Different question. > > I haven't asked for new features, just that what's already there get fixed: > Python-level buffer objects are unsafe, the docs remain incomplete, I'll fix the code. > there's > random stuff like file.readinto() that's not documented at all (could be > that's the only one -- it's certainly "discovered" on c.l.py often enough, > though), Find another goat to screw for that one. I don't know anything about it. Hmm... Using the "annotate" feature of ViewCVS, I see that Guido added it. Go blame him if you want to scream about that function and its lack of doc. > and there are no buffer tests in the std test suite. The work to > introduce the type wasn't completed, nobody works on it, and finishing work > 3 years late doesn't count as "new feature" in my book . Now you're just being bothersome. You want all that stuff, then feel free. I'll volunteer to do the code. You can go beat some heads, or find other volunteers. I'll do the code fixing just to placate you, and to get all this ranting about the buffer object to quiet down, but not because I'm joyful to do it. not-cheers, -g -- Greg Stein, http://www.lyra.org/ From dgoodger at bigfoot.com Sun Jun 3 16:39:42 2001 From: dgoodger at bigfoot.com (David Goodger) Date: Sun, 03 Jun 2001 10:39:42 -0400 Subject: [Python-Dev] new PEP candidates Message-ID: I have just posted three related PEP candidates to the Doc-SIG: - PEP: Docstring Processing System Framework http://mail.python.org/pipermail/doc-sig/2001-June/001855.html - PEP: DPS Generic Implementation Details http://mail.python.org/pipermail/doc-sig/2001-June/001856.html - PEP: Docstring Conventions http://mail.python.org/pipermail/doc-sig/2001-June/001857.html These are all part of the newly created Python Docstring Processing System project, http://docstring.sf.net. Barry: Please assign PEP numbers to these if possible. Once PEP numbers have been assigned, I will post to comp.lang.python. Thanks. A related project is the second draft of reStructuredText, a docstring markup syntax definition. The project is http://structuredtext.sf.net, and I've posted the following to Doc-SIG: - An Introduction to reStructuredText http://mail.python.org/pipermail/doc-sig/2001-June/001858.html - Problems With StructuredText http://mail.python.org/pipermail/doc-sig/2001-June/001859.html - reStructuredText Markup Specification http://mail.python.org/pipermail/doc-sig/2001-June/001860.html - Python Extensions to the reStructuredText Markup Specification http://mail.python.org/pipermail/doc-sig/2001-June/001861.html I am not seeking PEP status for reStructuredText at this time; I think it's one step too far removed from the Python language to warrant a PEP. 
If you think it *should* be a PEP, I will be happy to convert it. -- David Goodger dgoodger at bigfoot.com Open-source projects: - Python Docstring Processing System: http://docstring.sf.net - reStructuredText: http://structuredtext.sf.net - The Go Tools Project: http://gotools.sf.net

From mwh at python.net Sun Jun 3 23:47:48 2001 From: mwh at python.net (Michael Hudson) Date: 03 Jun 2001 22:47:48 +0100 Subject: [Python-Dev] another dict crasher In-Reply-To: "Tim Peters"'s message of "Sun, 3 Jun 2001 00:04:57 -0400" References: Message-ID: "Tim Peters" writes: > It would be better in this case to go back to the top and start > over. Yes. What you checked in is obviously better. I'll stick to being the bearer of bad tidings... > However, then an adversarial user can construct a case that never > terminates. I seem to have done this - it was odd, though - it only loops when I bump the dict to fairly enormous proportions for reasons I don't really (want to) understand. > Unclear what to do. Not worrying about it seems entirely reasonable - I now have sitting on my hard drive the weirdest way of spelling "while 1: pass" *I've* ever seen. and-I'll-stop-poking-holes-now-ly y'rs m. -- The rapid establishment of social ties, even of a fleeting nature, advance not only that goal but its standing in the uberconscious mesh of communal psychic, subjective, and algorithmic interbeing. But I fear I'm restating the obvious. -- Will Ware, comp.lang.python

From tim.one at home.com Mon Jun 4 01:03:31 2001 From: tim.one at home.com (Tim Peters) Date: Sun, 3 Jun 2001 19:03:31 -0400 Subject: [Python-Dev] another dict crasher In-Reply-To: Message-ID: [Tim] >> It would be better in this case to go back to the top and start >> over. [Michael Hudson] > Yes. What you checked in is obviously better. I'll stick to being > the bearer of bad tidings... Hey, if it's fun, do whatever you want! If you hadn't provoked me, I would have let it slide. Guido only cares about the end result . >> However, then an adversarial user can construct a case that never >> terminates. > I seem to have done this - it was odd, though - it only loops when I > bump the dict to fairly enormous proportions for reasons I don't > really (want to) understand. Pass it on. I deliberately "started over" via a recursive call instead of a goto so that an offending program would eventually die with a stack fault instead of just running forever. So if you're seeing something run forever, it may be a different problem. >> Unclear what to do. > Not worrying about it seems entirely reasonable I don't think anyone is happy leaving an exploitable hole in Python -- we endure enormous pain to plug those. Except, I guess, for buffer objects . I simply haven't thought of a good and efficient way to plug this one. Implementing an "internal immutable dict" type appeals to me, but it conflicts with the fact that the affected routines believe to the core of their souls that exceptions raised during comparisons are to be ignored -- and raising a "hey, you can't change the dict *now*!" exception doesn't do the user any good if they never see it. Would plug the hole, but an *innocent* user would never know why their program failed to work as (probably) expected.

From tim.one at home.com Mon Jun 4 02:38:53 2001 From: tim.one at home.com (Tim Peters) Date: Sun, 3 Jun 2001 20:38:53 -0400 Subject: [Python-Dev] strop vs.
string In-Reply-To: <20010603021036.U23560@lyra.org> Message-ID: [Tim] >> because all the people who know about them don't advertise it >> because it's an easy way to provoke core dumps now. [Greg Stein] > Easy? Depends on what you use them with. "Easy" and "depends" both, sure. I don't understand the argument: core dumps are always presumed to be errors in the Python implementation, not the user's fault. In this case, they are Python's fault by any accounting. On rare occasions we just give up and say "sorry, but we simply don't know a reasonable way to fix it -- but it's still Python's fault" (for example, see the dict thread this weekend). >> I haven't asked for new features, just that what's already there get >> fixed: Python-level buffer objects are unsafe > I'll fix the code. Thank you! >> the docs remain incomplete, there's random stuff like file.readinto() >> that's not documented at all (could be that's the only one -- it's >> certainly "discovered" on c.l.py often enough, though), > Find another goat to screw for that one. I don't know anything about it. > > Hmm... Using the "annotate" feature of ViewCVS, I see that Guido > added it. Go blame him if you want to scream about that function and > its lack of doc. I don't care who added it: I haven't asked anyone specific to do anything. I've been asking whether *anyone* cares enough to address the backlog of buffer maintenance work. I don't even know who dreamed up the buffer object -- although at this point I bet I can guess . >> and there are no buffer tests in the std test suite. The work to >> introduce the type wasn't completed, nobody works on it, and >> finishing work 3 years late doesn't count as "new feature" in my book > Now you're just being bothersome. You bet. It's the same list of things I gave in my first msg; nobody volunteered to do any work then, so I repeated them. > You want all that stuff, then feel free. "All that stuff" is the minimum now required of new features. Buffers got in before Guido got tougher about this stuff, but if they're worth having at all then surely they're worth bringing up to current standards. > I'll volunteer to do the code. You can go beat some heads, or find other > volunteers. Anyone else care to chip in? > I'll do the code fixing just to placate you, and to get all this ranting > about the buffer object to quiet down, but not because I'm joyful > to do it. OK, I feel guilty -- but if that's enough to make you feel joyful again, the psychology here is just sick .

From Barrett at stsci.edu Mon Jun 4 15:22:14 2001 From: Barrett at stsci.edu (Paul Barrett) Date: Mon, 04 Jun 2001 09:22:14 -0400 Subject: [Python-Dev] strop vs. string References: <3B1214B3.9A4C295D@lemburg.com> Message-ID: <3B1B8B86.68E99328@STScI.Edu> "M.-A. Lemburg" wrote: > > Tim Peters wrote: > > > > [Tim] > > > About combining strop and buffers and strings, don't forget > > > unicodeobject.c: that's got oodles of basically duplicate code too. > > > /F suggested dealing with the minor differences via maintaining one > > > code file that gets compiled multiple times w/ appropriate #defines. > > > > [MAL] > > > Hmm, that only saves us a few kB in source, but certainly not > > > in the object files. > > > > That's not the point. Manually duplicated code blocks always get out of > > synch, as people fix bugs in, or enhance, one of them but don't even know > > about the others. /F brought this up after I pissed away a few hours trying > > to repair one of these in all places, and he noted that strop.replace() and > > string.replace() are woefully inefficient anyway. > > Ok, so what we'd need is a bunch of generic low-level string > operations: one set for 8-bit and one for 16-bit code. > > Looking at unicodeobject.c it seems that the section "Helpers" would > be a good start, plus perhaps a few bits from the method implementations > refactored to form a low-level string template library. > > Perhaps we should move this code into > a file stringhelpers.h which then gets included by stringobject.c > and unicodeobject.c with appropriate #defines set up for > 8-bit strings and for Unicode. > > > > The better idea would be making the types subclass from a generic > > > abstract string object -- I just don't know how this will be > > > possible with Guido's type patches. We'll just have to wait, > > > I guess.
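The compile-it-twice idea is easy to picture. A minimal sketch, with invented macro names (the thread itself only names the stringhelpers.h file):

/* stringhelpers.h: generic code, written against a few macros. */
STRINGLIB_CHAR *
STRINGLIB(find_char)(STRINGLIB_CHAR *s, int n, STRINGLIB_CHAR ch)
{
    int i;
    for (i = 0; i < n; i++) {
        if (s[i] == ch)
            return s + i;
    }
    return NULL;
}

/* In stringobject.c: */
#define STRINGLIB_CHAR unsigned char
#define STRINGLIB(name) string_##name
#include "stringhelpers.h"

/* In unicodeobject.c: */
#define STRINGLIB_CHAR Py_UNICODE
#define STRINGLIB(name) unicode_##name
#include "stringhelpers.h"

Each object file gets its own copy of the helper, compiled for its own character type, but there is only one body of source to fix when a bug turns up.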
/F brought this up after I pissed away a few hours trying > > to repair one of these in all places, and he noted that strop.replace() and > > string.replace() are woefully inefficient anyway. > > Ok, so what we'd need is a bunch of generic low-level string > operations: one set for 8-bit and one for 16-bit code. > > Looking at unicodeobject.c it seems that the section "Helpers" would > be a good start, plus perhaps a few bits from the method implementations > refactored to form a low-level string template library. > > Perhaps we should move this code into > a file stringhelpers.h which then gets included by stringobject.c > and unicodeobject.c with appropriate #defines set up for > 8-bit strings and for Unicode. > > > > The better idea would be making the types subclass from a generic > > > abstract string object -- I just don't know how this will be > > > possible with Guido's type patches. We'll just have to wait, > > > I guess. From fdrake at acm.org Mon Jun 4 16:07:37 2001 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Mon, 4 Jun 2001 10:07:37 -0400 (EDT) Subject: [Python-Dev] strop vs. string In-Reply-To: <3B1B8B86.68E99328@STScI.Edu> References: <3B1214B3.9A4C295D@lemburg.com> <3B1B8B86.68E99328@STScI.Edu> Message-ID: <15131.38441.301314.46009@cj42289-a.reston1.va.home.com> Paul Barrett writes: > From the discussion so far, it appears that the buffer object is > intended solely to support string-like objects. I've seen no mention > of their use for binary data objects, such as multidimensional arrays > and matrices. Will the buffer object also support these objects? If > no, then I suggest it be renamed to one that is less generic and more > descriptive. In a development version of my bindings to a Type-1 font rasterizer, I exposed a buffer interface to the resulting image data. Unfortunately, that code was lost and I've not had time to work that up again. I *think* that sort of thing was part of the intended application for the buffer interface, but I was not one of the "movers & shakers" for it, so I'm not entirely sure. > On the otherhand, if yes, then I think the buffer C/API needs to be > reimplemented, because the current design/implementation falls far > short of what I would expect for a buffer object. First, it is overly > complex: the support for multiple buffers does not appear necessary. > Second, the dangling pointer issue has not been resolved. I suggest I agree. From the discussions I remember, I don't recall a clear explanation of the need for "segmented" buffers. But that may just be a failing of my recollection. > the addition of lock flag which indicates that the data is currently > inaccessible, ie. that data and/or data pointer is in the process of > being modified. > > I would suggest the following structure to be much more useful for > char and binary data: > > typedef struct { > char* rf_pointer; > int rf_length; > int rf_access; /* read, write, etc. */ > int rf_lock; /* data is in use */ > int rf_flags; /* type of data; char, binary, unicode, etc. */ > } PyBufferProcs; I'm not sure about the "rf_flags" field -- I see two aspects that you seem to be describing, and wouldn't call either use a "flag". There's data type (characters, anonymous binary data, image data, etc.), and element size (1 byte, 2 bytes, variable width). Those values may or may not be associated with the specific buffer or the type implementing the buffer (I'd go with the specific buffer just to allow buffer types that support different flavors). 
> If I find some time, I'll prepare a PEP to air these issues, since
> they are very important to those of us working on and with
> multidimensional arrays.  We find the current buffer API lacking.

PEPs are good; I'll look forward to seeing it!

  -Fred

--
Fred L. Drake, Jr.
PythonLabs at Digital Creations

From skip at pobox.com  Mon Jun  4 18:29:53 2001
From: skip at pobox.com (Skip Montanaro)
Date: Mon, 4 Jun 2001 11:29:53 -0500
Subject: [Python-Dev] readline not getting built when .so symlink doesn't exist
Message-ID: <15131.46977.861815.323386@beluga.mojam.com>

I recently upgraded to Mandrake 8.0.  I find that the readline module is
no longer getting built.  When building, it builds rgbimg followed
immediately by crypt.  Readline, which is tested for in between, is not
built.  Apparently, it can't find one of the libraries required to build
it.  On my system, both readline and termcap are in /lib.  Neither has a
static version available and neither has a plain .so file available.
The .so file always has a version number tacked onto the end:

    % ls -l /lib/libtermcap* /lib/libreadline*
    lrwxrwxrwx 1 root root     18 May 29 10:53 /lib/libreadline.so.4 -> libreadline.so.4.1
    -rwxr-xr-x 1 root root 152440 Mar 25 01:26 /lib/libreadline.so.4.1
    lrwxrwxrwx 1 root root     19 May 29 10:53 /lib/libtermcap.so.2 -> libtermcap.so.2.0.8
    -rwxr-xr-x 1 root root  11608 Mar 26 10:32 /lib/libtermcap.so.2.0.8

If I create the necessary .so symlinks it builds okay.

Perhaps this is a bug in Mandrake 8.0 (it wouldn't be the first one),
but if it is valid for shared libraries to be installed with only a
version-numbered .so file, then it seems to me that distutils ought to
handle that.  There are several programs in /usr/bin on my machine that
seem to be dynamically linked to libreadline.  In addition,
/usr/lib/python2.0/lib-dynload/readline.so exists, which suggests that
the .so without a version number is valid as far as ld is concerned.

Skip

From Greg.Wilson at baltimore.com  Mon Jun  4 19:33:29 2001
From: Greg.Wilson at baltimore.com (Greg Wilson)
Date: Mon, 4 Jun 2001 13:33:29 -0400
Subject: [Python-Dev] struct.getorder() ?
Message-ID: <930BBCA4CEBBD411BE6500508BB3328F2E1F1D@nsamcanms1.ca.baltimore.com>

The 'struct' module allows packing and unpacking orders to be specified,
but doesn't provide a hook to report on the order used by the machine
the script is running on.  As I'm likely going to be using this module
in future runs of my course, I'd like to add 'struct.getorder()', which
would return either "<" or ">" (the characters used to signal
little-endian and big-endian respectively).  Does this duplicate
something in some other standard module?  Does it seem like a sensible
idea?

Thanks
Greg
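(Something close to Greg's helper can be derived from the struct module
alone; a sketch, with the function name purely hypothetical:)

    import struct

    def getorder():
        # '=' means native byte order, so comparing a native pack
        # against an explicit little-endian pack reveals this machine's
        # order: '<' if little-endian, '>' if big-endian.
        if struct.pack('=H', 1) == struct.pack('<H', 1):
            return '<'
        return '>'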
From fdrake at acm.org  Mon Jun  4 19:42:28 2001
From: fdrake at acm.org (Fred L. Drake, Jr.)
Date: Mon, 4 Jun 2001 13:42:28 -0400 (EDT)
Subject: [Python-Dev] struct.getorder() ?
In-Reply-To: <930BBCA4CEBBD411BE6500508BB3328F2E1F1D@nsamcanms1.ca.baltimore.com>
References: <930BBCA4CEBBD411BE6500508BB3328F2E1F1D@nsamcanms1.ca.baltimore.com>
Message-ID: <15131.51332.73137.795543@cj42289-a.reston1.va.home.com>

Greg Wilson writes:
> The 'struct' module allows packing and unpacking
> orders to be specified, but doesn't provide a hook
> to report on the order used by the machine the

Python 2.0 introduced sys.byteorder; check it out:

    http://www.python.org/doc/current/lib/module-sys.html

  -Fred

--
Fred L. Drake, Jr.
PythonLabs at Digital Creations

From Greg.Wilson at baltimore.com  Mon Jun  4 19:41:45 2001
From: Greg.Wilson at baltimore.com (Greg Wilson)
Date: Mon, 4 Jun 2001 13:41:45 -0400
Subject: [Python-Dev] struct.getorder() ?
Message-ID: <930BBCA4CEBBD411BE6500508BB3328F2E1F1E@nsamcanms1.ca.baltimore.com>

> Python 2.0 introduced sys.byteorder; check it out:
> http://www.python.org/doc/current/lib/module-sys.html

Woo hoo!  Thanks, Fred --- should've guessed someone would be ahead of
me :-).

Greg

From barry at scottb.demon.co.uk  Mon Jun  4 20:00:05 2001
From: barry at scottb.demon.co.uk (Barry Scott)
Date: Mon, 4 Jun 2001 19:00:05 +0100
Subject: [Python-Dev] %b format? - 8088 like a PDP-11 I think not...
In-Reply-To: <20010530183833.B1654@thyrsus.com>
Message-ID: <000201c0ed20$2f295c30$060210ac@private>

Eric wrote:
> While I'm at it, I should note that the design of the 11 was ancestral
> to both the 8088 and 68000 microprocessors, and thus to essentially
> every new general-purpose computer designed in the last fifteen years.

The key to the PDP-11 and VAX was lots of registers all alike and rich
addressing modes for the instructions.

The 8088 is very far from this design; it owes its design more to the
4004 than the PDP-11.  However the 68000 is closer, but not as nice to
program, as there are too many special cases in its instruction set for
my liking.
Barry

From mwh at python.net  Mon Jun  4 20:05:10 2001
From: mwh at python.net (Michael Hudson)
Date: 04 Jun 2001 19:05:10 +0100
Subject: [Python-Dev] readline not getting built when .so symlink doesn't exist
In-Reply-To: Skip Montanaro's message of "Mon, 4 Jun 2001 11:29:53 -0500"
References: <15131.46977.861815.323386@beluga.mojam.com>
Message-ID: 

Skip Montanaro writes:

> I recently upgraded to Mandrake 8.0.  I find that the readline
> module is no longer getting built.  When building, it builds rgbimg
> followed immediately by crypt.  Readline, which is tested for in
> between, is not built.  Apparently, it can't find one of the
> libraries required to build it.  On my system, both readline and
> termcap are in /lib.  Neither has a static version available and
> neither has a plain .so file available.  The .so file always has a
> version number tacked onto the end:
>
> % ls -l /lib/libtermcap* /lib/libreadline*
> lrwxrwxrwx 1 root root     18 May 29 10:53 /lib/libreadline.so.4 -> libreadline.so.4.1
> -rwxr-xr-x 1 root root 152440 Mar 25 01:26 /lib/libreadline.so.4.1
> lrwxrwxrwx 1 root root     19 May 29 10:53 /lib/libtermcap.so.2 -> libtermcap.so.2.0.8
> -rwxr-xr-x 1 root root  11608 Mar 26 10:32 /lib/libtermcap.so.2.0.8
>
> If I create the necessary .so symlinks it builds okay.
>
> Perhaps this is a bug in Mandrake 8.0 (it wouldn't be the first
> one), but if it is valid for shared libraries to be installed with
> only a version-numbered .so file, then it seems to me that distutils
> ought to handle that.

Hmm.  Does compiling a proggie

    $ gcc foo.c -lreadline

work?  It doesn't here if I move libreadline.so & libreadline.a out of
the way.  If the C compiler isn't going to find readline, there ain't
much point distutils trying to find it...

> There are several programs in /usr/bin on my machine that seem to be
> dynamically linked to libreadline.

Those things will be directly linked to libreadline.so.whatever; I
believe the libfoo.so files are only for the (compile time) linker's
benefit.

> In addition, /usr/lib/python2.0/lib-dynload/readline.so exists,
> which suggests that the .so without a version number is valid as far
> as ld is concerned.

ld != ld.so.

Do you need a readline-devel package or something?

Cheers,
M.

--
  It's actually a corruption of "starling".  They used to be carried.
  Since they weighed a full pound (hence the name), they had to be
  carried by two starlings in tandem, with a line between them.
                -- Alan J Rosenthal explains "Pounds Sterling" on asr

From mwh at python.net  Mon Jun  4 21:01:10 2001
From: mwh at python.net (Michael Hudson)
Date: 04 Jun 2001 20:01:10 +0100
Subject: [Python-Dev] another dict crasher
In-Reply-To: "Tim Peters"'s message of "Sun, 3 Jun 2001 19:03:31 -0400"
References: 
Message-ID: 

"Tim Peters" writes:

> >> However, then an adversarial user can construct a case that never
> >> terminates.
>
> > I seem to have done this - it was odd, though - it only loops when I
> > bump the dict to fairly enormous proportions for reasons I don't
> > really (want to) understand.
>
> Pass it on.  I deliberately "started over" via a recursive call
> instead of a goto so that an offending program would eventually die
> with a stack fault instead of just running forever.  So if you're
> seeing something run forever, it may be a different problem.

I left it running overnight, and it terminated!  (with a KeyError).  I
can't say I really understand what's going on, but I'm in Exam Hell at
the moment (for the last time!  Yippee!), so don't have any spare cycles
to think about it hard.
Anyway, this is what I was running:

dict = {}

# let's force dict to malloc its table
for i in range(1, 10000):
    dict[i] = i

hashcode = 0

class Machiavelli2:
    def __eq__(self, other):
        global hashcode
        d2 = dict.copy()
        dict.clear()
        hashcode += 1
        for k, v in d2.items():
            dict[k] = v
        return 1

    def __hash__(self):
        return hashcode

dict[Machiavelli2()] = Machiavelli2()

print dict[Machiavelli2()]

If you thought my last test case was contrived, I look forward to you
finding adjectives for this one...

Cheers,
M.

--
  (ps: don't feed the lawyers: they just lose their fear of humans)
                                      -- Peter Wood, comp.lang.lisp

From barry at digicool.com  Mon Jun  4 21:42:34 2001
From: barry at digicool.com (Barry A. Warsaw)
Date: Mon, 4 Jun 2001 15:42:34 -0400
Subject: [Python-Dev] Status of 2.0.1?
Message-ID: <15131.58538.121723.671374@anthem.wooz.org>

I've just fixed two buglets in the regression test suite for Python
2.0.1 (release20-maint branch).  Now I get the following results from
regrtest:

    88 tests OK.
    20 tests skipped: test_al test_audioop test_cd test_cl test_dbm
        test_dl test_gl test_imageop test_imgfile test_largefile
        test_linuxaudiodev test_minidom test_nis test_pyexpat
        test_rgbimg test_sax test_sunaudiodev test_timing test_winreg
        test_winsound

Has anybody else tested out the 2.0.1 branch on anything?  I'm going to
run some quick tests with Mailman 2.0.x on Python 2.0.1 over the next
hour or so.

I'm just wondering what's left to do for this release, and how I can
help out.

-Barry

From esr at thyrsus.com  Mon Jun  4 22:11:14 2001
From: esr at thyrsus.com (Eric S. Raymond)
Date: Mon, 4 Jun 2001 16:11:14 -0400
Subject: [Python-Dev] %b format? - 8088 like a PDP-11 I think not...
In-Reply-To: <000201c0ed20$2f295c30$060210ac@private>; from barry@scottb.demon.co.uk on Mon, Jun 04, 2001 at 07:00:05PM +0100
References: <20010530183833.B1654@thyrsus.com> <000201c0ed20$2f295c30$060210ac@private>
Message-ID: <20010604161114.A20979@thyrsus.com>

Barry Scott :
> Eric wrote:
> > While I'm at it, I should note that the design of the 11 was ancestral
> > to both the 8088 and 68000 microprocessors, and thus to essentially
> > every new general-purpose computer designed in the last fifteen years.
>
> The key to the PDP-11 and VAX was lots of registers all alike and rich
> addressing modes for the instructions.
>
> The 8088 is very far from this design; it owes its design more to the
> 4004 than the PDP-11.

Yes, but the 4004 was designed as a sort of lobotomized imitation of the
65xx, which was descended from the 11.  Admittedly, in the chain of
transmission here were two stages of redesign so bad that the connection
got really tenuous.

--
		Eric S. Raymond

...Virtually never are murderers the ordinary, law-abiding people
against whom gun bans are aimed.  Almost without exception, murderers
are extreme aberrants with lifelong histories of crime, substance
abuse, psychopathology, mental retardation and/or irrational violence
against those around them, as well as other hazardous behavior, e.g.,
automobile and gun accidents."
	-- Don B. Kates, writing on statistical patterns in gun crime

From skip at pobox.com  Mon Jun  4 22:49:07 2001
From: skip at pobox.com (Skip Montanaro)
Date: Mon, 4 Jun 2001 15:49:07 -0500
Subject: [Python-Dev] readline not getting built when .so symlink doesn't exist
In-Reply-To: 
References: <15131.46977.861815.323386@beluga.mojam.com>
Message-ID: <15131.62531.595208.65994@beluga.mojam.com>

[my readline woes snipped]

    Michael> Hmm.  Does compiling a proggie
    Michael>
    Michael>     $ gcc foo.c -lreadline
    Michael>
    Michael> work?
    Michael> It doesn't here if I move libreadline.so & libreadline.a
    Michael> out of the way.

Yup, it does:

    beluga:tmp% cc -o foo foo.c -lreadline -ltermcap
    beluga:tmp% ./foo
    >>sdfsdfsdf
    sdfsdfsdf

(This after deleting both /lib/libreadline.so and /lib/libhistory.so.)
In this case, foo.c is

    #include <stdio.h>
    #include <readline/readline.h>
    #include <readline/history.h>

    main()
    {
        printf("%s\n", readline(">>"));
    }

    Michael> Do you need a readline-devel package or something?

Got that.  I just noticed that "rpm -q --whatprovides
/lib/libreadline.so" does list readline-devel as the provider.  I just
reinstalled it using --force.  Now the .so symlinks are there.  Go
figure...

Oh well, probably ought to drop it unless another Mandrake user
complains.  I'm really amazed at how many packages Mandrake chose *not*
to install even though I selected all the groups during install and was
installing into fresh / and /usr partitions.  I've been dribbling
various packages in bit-by-bit as I've discovered omissions.  In the
past I've also noticed files apparently not installed even though the
packages that were supposed to provide them were installed.

Skip

From guido at digicool.com  Mon Jun  4 23:03:35 2001
From: guido at digicool.com (Guido van Rossum)
Date: Mon, 04 Jun 2001 17:03:35 -0400
Subject: [Python-Dev] Re: What happened to Idle's extend.py?
In-Reply-To: Your message of "Tue, 29 May 2001 02:15:07 EDT."
References: 
Message-ID: <200106042103.RAA04077@cj20424-a.reston1.va.home.com>

> > Idle-0.3, shipped with Python 1.5.2, had an extend.py module that was
> > used to extend Idle.  We've used this extensively, building entire
> > "applications" as Idle extensions.
> >
> > Now that we're moving to Python 2.1, we find the same old directions
> > for extending Idle (in extend.txt), but there appears to be no
> > extend.py in Idle-0.8.
> >
> > Does anyone know how we can add extensions to Idle-0.8?

It's simpler than before.  Extensions are now loaded simply by being
named in config.txt (or any of the other custom configuration files).
For example, ZoomHeight.py is a very simple extension; it is loaded
because of the line

    [ZoomHeight]

somewhere in config.txt.

The interface for extensions is the same as before; ZoomHeight.py hasn't
changed since 1999.

I'll update extend.txt.  Can someone forward this to the original asker
of the question, or to the list where it was posted?

--Guido van Rossum (home page: http://www.python.org/~guido/)

From skip at pobox.com  Mon Jun  4 23:03:58 2001
From: skip at pobox.com (Skip Montanaro)
Date: Mon, 4 Jun 2001 16:03:58 -0500
Subject: [Python-Dev] %b format? - 8088 like a PDP-11 I think not...
In-Reply-To: <20010604161114.A20979@thyrsus.com>
References: <20010530183833.B1654@thyrsus.com> <000201c0ed20$2f295c30$060210ac@private> <20010604161114.A20979@thyrsus.com>
Message-ID: <15131.63422.695297.393477@beluga.mojam.com>

    Eric> Yes, but the 4004 was designed as a sort of lobotomized
    Eric> imitation of the 65xx, which was descended from the 11.

Really?  I was always under the impression the 4004 was considered the
first microprocessor.  The page below says that and gives a date of 1971
for it.  I have no idea if the author is correct, just that what he says
agrees with my memory.  He does seem to have an impressive collection of
old computer iron:

    http://www.piercefuller.com/collect/i4004/

I haven't found a statement about the origins of the 6502, but this page
suggests that commercial computers were being made from 8080's before
6502's:

    http://www.speer.org/2backup/pcbs_pch.html

Ah, wait a minute...
This page: http://www.geocities.com/SiliconValley/Byte/6508/6502/english/versoes.htm says the 6502 was descended from the 6800. I'm getting less and less convinced that the 4004 somehow descended from the 65xx family. (Maybe we should shift this thread to the always entertaining folks at comp.arch... ;-) Skip From esr at thyrsus.com Mon Jun 4 23:19:08 2001 From: esr at thyrsus.com (Eric S. Raymond) Date: Mon, 4 Jun 2001 17:19:08 -0400 Subject: [Python-Dev] %b format? - 8088 like a PDP-11 I think not... In-Reply-To: <15131.63422.695297.393477@beluga.mojam.com>; from skip@pobox.com on Mon, Jun 04, 2001 at 04:03:58PM -0500 References: <20010530183833.B1654@thyrsus.com> <000201c0ed20$2f295c30$060210ac@private> <20010604161114.A20979@thyrsus.com> <15131.63422.695297.393477@beluga.mojam.com> Message-ID: <20010604171908.A21831@thyrsus.com> Skip Montanaro : > Really? I was always under the impression the 4004 was considered the first > microprocessor. The page below says that and gives a date of 1971 for it. First sentence is widely believed, but there was an earlier micro called the Star-8 designed at Burroughs that has been almost completely forgotten. I only know about it because I worked there in 1980 with one of the people who designed it. I think I had a brain fart and it's the Z80 that was descended from the 6502. I was going by a remark in some old lecture notes. I've got a copy of the definitive reference on history of computer architecture and will check. -- Eric S. Raymond "Extremism in the defense of liberty is no vice; moderation in the pursuit of justice is no virtue." -- Barry Goldwater (actually written by Karl Hess) From mwh at python.net Mon Jun 4 23:55:34 2001 From: mwh at python.net (Michael Hudson) Date: 04 Jun 2001 22:55:34 +0100 Subject: [Python-Dev] readline not getting built when .so symlink doesn't exist In-Reply-To: Skip Montanaro's message of "Mon, 4 Jun 2001 15:49:07 -0500" References: <15131.46977.861815.323386@beluga.mojam.com> <15131.62531.595208.65994@beluga.mojam.com> Message-ID: Skip Montanaro writes: > [my readline woes snipped] > > Michael> Hmm. Does compiling a proggie > > Michael> $ gcc foo.c -lreadline > > Michael> work? It doesn't here if I move libreadline.so & libreadline.a > Michael> out of the way. > > Yup, it does: > > beluga:tmp% cc -o foo foo.c -lreadline -ltermcap > beluga:tmp% ./foo > >>sdfsdfsdf > sdfsdfsdf > > (This after deleting both /lib/libreadline.so and /lib/libhistory.so.) Odd. What does the output of $ gcc -o foo foo.c -lreadline -ltermcap -Wl,--verbose look like? In particular the bit at the end where you get things like: attempt to open /usr/lib/gcc-lib/i386-redhat-linux/2.95.1/libreadline.so failed attempt to open /usr/lib/gcc-lib/i386-redhat-linux/2.95.1/libreadline.a failed attempt to open /usr/i386-redhat-linux/lib/libreadline.so failed attempt to open /usr/i386-redhat-linux/lib/libreadline.a failed attempt to open /usr/bin/../lib/libreadline.so succeeded -lreadline (/usr/bin/../lib/libreadline.so) (this is more for my personal curiosity than any important reason). > Got that. I just noticed that "rpm -q --whatprovides /lib/libreadline.so" > does list readline-devel as the provider. I just reinstalled it using > --force. Now the .so symlinks are there. Go figure... No :-) > Oh well, probably ought to drop it unless another Mandrake user complains. Sounds reasonable. Cheers, M. -- After a heavy night I travelled on, my face toward home - the comma being by no means guaranteed. 
-- paraphrased from cam.misc From tim.one at home.com Mon Jun 4 23:58:48 2001 From: tim.one at home.com (Tim Peters) Date: Mon, 4 Jun 2001 17:58:48 -0400 Subject: [Python-Dev] Re: What happened to Idle's extend.py? In-Reply-To: <200106042103.RAA04077@cj20424-a.reston1.va.home.com> Message-ID: [Guido] > Can someone forward this to the original asker of the question, or to > the list where it was posted? Done. Thanks! From skip at pobox.com Tue Jun 5 03:01:01 2001 From: skip at pobox.com (Skip Montanaro) Date: Mon, 4 Jun 2001 20:01:01 -0500 Subject: [Python-Dev] readline not getting built when .so symlink doesn't exist In-Reply-To: References: <15131.46977.861815.323386@beluga.mojam.com> <15131.62531.595208.65994@beluga.mojam.com> Message-ID: <15132.12109.914981.110774@beluga.mojam.com> >> (This after deleting both /lib/libreadline.so and >> /lib/libhistory.so.) Michael> Odd. What does the output of Michael> $ gcc -o foo foo.c -lreadline -ltermcap -Wl,--verbose Michael> look like? Well, what it looks like is "Skip's a dunce...". Turns out there was a libreadline.so symlink /usr/lib also. It found that. When I deleted that it found /usr/lib/libreadline.a. Getting rid of that caused the link to (finally) fail. With just the version-based .so files cc apparently can't do the trick. Sorry to have wasted the bandwidth. Skip From skip at pobox.com Tue Jun 5 03:16:00 2001 From: skip at pobox.com (Skip Montanaro) Date: Mon, 4 Jun 2001 20:16:00 -0500 Subject: [Python-Dev] %b format? - 8088 like a PDP-11 I think not... In-Reply-To: <20010604171908.A21831@thyrsus.com> References: <20010530183833.B1654@thyrsus.com> <000201c0ed20$2f295c30$060210ac@private> <20010604161114.A20979@thyrsus.com> <15131.63422.695297.393477@beluga.mojam.com> <20010604171908.A21831@thyrsus.com> Message-ID: <15132.13008.429800.585157@beluga.mojam.com> Eric> Skip Montanaro : >> Really? I was always under the impression the 4004 was considered >> the first microprocessor. The page below says that and gives a date >> of 1971 for it. Eric> First sentence is widely believed, but there was an earlier micro Eric> called the Star-8 designed at Burroughs that has been almost Eric> completely forgotten. There was also a GE-8 (I think that was the name) developed at GE's R&D Center in the early 1970's timeframe - long before my time there. It was apparently very competitive with the other microprocessors produced about that time but never saw the light of day. I suspect that was at least due in part to the fact that GE built mainframes back then. Skip From tim.one at home.com Tue Jun 5 06:07:27 2001 From: tim.one at home.com (Tim Peters) Date: Tue, 5 Jun 2001 00:07:27 -0400 Subject: [Python-Dev] another dict crasher In-Reply-To: Message-ID: [Michael Hudson, taking a break from exams] > I left it running overnight, and it terminated! (with a KeyError). I > can't say I really understand what's going on, but I'm in Exam Hell at > the moment (for the last time! Yippee!), so don't have any spare > cycles to think about it hard. Good luck! I really shouldn't tell you this now, but the real reason people dread turning 30, 40, 50, 60-- and so on --is that every 10th birthday starting at 30 they test you *again*! On every course you ever took. It's grueling. The penalty for failure is severe: flunk just one review exam, and they pick a date at random over the following 10 years for you to die. No point fighting it, it's just civilization's nasty little secret. 
This is why life expectancy correlates with education, but it does
appear that the human limit for remembering both plane geometry and the
names of hundreds of dead psychopaths is about 120 years.

In the meantime, I built a test case to tickle stack overflow directly,
and it does so quickly:

class Yuck:
    def __init__(self):
        self.i = 0

    def make_dangerous(self):
        self.i = 1

    def __hash__(self):
        # direct to slot 4 in table of size 8; slot 12 when size 16
        return 4 + 8

    def __eq__(self, other):
        if self.i == 0:
            # leave dict alone
            pass
        elif self.i == 1:
            # fiddle to 16 slots
            self.__fill_dict(6)
            self.i = 2
        else:
            # fiddle to 8 slots
            self.__fill_dict(4)
            self.i = 1
        return 1

    def __fill_dict(self, n):
        self.i = 0
        dict.clear()
        for i in range(n):
            dict[i] = i
        dict[self] = "OK!"

y = Yuck()
dict = {y: "OK!"}
z = Yuck()
y.make_dangerous()
print dict[z]

It just arranges to move y to a different slot in a different-sized
table each time __eq__ is invoked, alternating between slot 4 in a
size-8 table and slot 12 in a size-16 table.

However, if I stick "print self.i" at the start of __eq__, it dies with
a KeyError instead!  That's why I'm mentioning it -- could be the same
misdirection you're seeing.  I can't account for the KeyError in any
rational way: under Windows, it's actually hitting a stack overflow in
the bowels of the system malloc() then.  Windows "recovers" from that
and presses on.  Everything that happens after appears to be an
accident.

win98-as-usual-ly y'rs  - tim

PS: You'll be tested on this, too.

From greg at cosc.canterbury.ac.nz  Tue Jun  5 07:00:30 2001
From: greg at cosc.canterbury.ac.nz (Greg Ewing)
Date: Tue, 05 Jun 2001 17:00:30 +1200 (NZST)
Subject: [Python-Dev] One more dict trick
In-Reply-To: <20010601032316.A15635@thyrsus.com>
Message-ID: <200106050500.RAA02362@s454.cosc.canterbury.ac.nz>

"Eric S. Raymond" :
> I think it's significant that MMX
> instructions and so forth entered the Intel line to support *games*,
> not Navier-Stokes calculations.

But when version 1.0 of FlashFlood! comes out, requiring high-quality
real-time hydrodynamics simulation, Navier-Stokes calculations will
suddenly become very important...

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,          | A citizen of NewZealandCorp, a       |
Christchurch, New Zealand          | wholly-owned subsidiary of USA Inc.  |
greg at cosc.canterbury.ac.nz        +--------------------------------------+

From tim.one at home.com  Tue Jun  5 07:18:50 2001
From: tim.one at home.com (Tim Peters)
Date: Tue, 5 Jun 2001 01:18:50 -0400
Subject: [Python-Dev] strop vs. string
In-Reply-To: <3B1B8B86.68E99328@STScI.Edu>
Message-ID: 

[Paul Barrett]
> From the discussion so far, it appears that the buffer object is
> intended solely to support string-like objects.

Unsure where that impression came from.  Since buffers wrap a slice "of
memory", they don't make much sense except where raw memory makes sense.
That includes the guts of strings, but also (in the core distribution)
memory-mapped files (the mmap module) and arrays (the array module),
which also support the buffer interface.

> I've seen no mention of their use for binary data objects,

I mentioned two above.  The use of buffers with mutable objects is
dangerous, though, because of the dangling-pointer problem, and Python
itself never uses buffers except for strings.
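(A minimal sketch of that dangling-pointer hazard, assuming 2.0-era
buffer semantics in which the buffer caches a pointer into the target's
raw storage at creation time:)

    >>> import array
    >>> a = array.array('c', "abcdef")
    >>> b = buffer(a)      # b borrows a pointer into a's raw storage
    >>> str(b)
    'abcdef'
    >>> del a[:]           # resizing a may free or move that storage...
    >>> str(b)             # ...so this can read freed memory, or crash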
Even arrays are stretching it; e.g.,

>>> import array
>>> a = array.array('i')
>>> a.append(2)
>>> a.append(3)
>>> a
array('i', [2, 3])
>>> b = buffer(a)
>>> len(b)
8
>>> [b[i] for i in range(len(b))]
['\x02', '\x00', '\x00', '\x00', '\x03', '\x00', '\x00', '\x00']
>>>

While of *some* conceivable use, that's not exactly destined to become
wildly popular.

> such as multidimensional arrays and matrices.

Since core Python has no such things, of course it doesn't use buffers
for those either.

> Will the buffer object also support these objects?

In what sense?  If you have an implementation of such things, and
believe that getting at raw memory slices is useful, sure -- fill in its
tp_as_buffer slot.

> ...
> On the other hand, if yes, then I think the buffer C/API needs to be
> reimplemented,

Or do you mean redesigned?

> because the current design/implementation falls far short of what I
> would expect for a buffer object.  First, it is overly complex: the
> support for multiple buffers does not appear necessary.

AFAICT it's entirely unused; everything in the core that supports the
buffer interface returns a segment count of 1, and the buffer object
itself appears to raise exceptions whenever it sees a reference to a
segment other than "the first".  I don't know why it's there.

> Second, the dangling pointer issue has not been resolved.

I expect Greg will fix that now.

> I suggest the addition of a lock flag which indicates that the data is
> currently inaccessible, ie. that data and/or data pointer is in the
> process of being modified.

To sell that (but please save it for the PEP) I expect you have to
provide some compelling uses for it.  The current uses have no need of
it.  In the absence of specific good uses, I'm afraid it just sounds
like another variant of "I can't prove segments *won't* be useful, so
let's toss them in too!".

> I would suggest the following structure to be much more useful for
> char and binary data:
>
> typedef struct {
>     char* rf_pointer;
>     int   rf_length;
>     int   rf_access;  /* read, write, etc. */
>     int   rf_lock;    /* data is in use */
>     int   rf_flags;   /* type of data; char, binary, unicode, etc. */
> } PyBufferProcs;
>
> But I'm guessing my proposal is way off base.

Depends on what you want to do.  You've only mentioned multidimensional
arrays, and the need for umpteen flavors of access control there, beyond
the current object's b_readonly flag, is simply unclear.  Also unclear
why you've dropped the current object's b_base pointer: without it, the
buffer has no way to get back to the object from which the memory is
borrowed, nor even a guarantee that the object won't die while the
buffer is still active.

If you do pursue this, please please please boost the rf_length field!
An int is too small to hold real-life sizes anymore, and "large files"
are becoming common even on 32-bit boxes.  Python needs to grow a wholly
supported way to pass 8-byte ints around (and it looks like I'll be
adding that to the struct module, possibly to the array module and
marshal too).

> If I find some time, I'll prepare a PEP to air these issues, since
> they are very important to those of us working on and with
> multidimensional arrays.  We find the current buffer API lacking.

A PEP is always a good idea.

From aahz at rahul.net  Tue Jun  5 07:41:28 2001
From: aahz at rahul.net (Aahz Maruch)
Date: Mon, 4 Jun 2001 22:41:28 -0700 (PDT)
Subject: [Python-Dev] strop vs.
 string
In-Reply-To:  from "Tim Peters" at Jun 05, 2001 01:18:50 AM
Message-ID: <20010605054129.933C199C83@waltz.rahul.net>

Tim Peters wrote:
>
> If you do pursue this, please please please boost the rf_length field!
> An int is too small to hold real-life sizes anymore, and "large files"
> are becoming common even on 32-bit boxes.  Python needs to grow a
> wholly supported way to pass 8-byte ints around (and it looks like
> I'll be adding that to the struct module, possibly to the array module
> and marshal too).

Hey!  Are you discriminating against 128-bit ints?

--
                      --- Aahz (@pobox.com)

Hugs and backrubs -- I break Rule 6       <*> http://www.rahul.net/aahz/
Androgynous poly kinky vanilla queer het Pythonista

I don't really mind a person having the last whine, but I do mind
someone else having the last self-righteous whine.

From tim.one at home.com  Tue Jun  5 07:53:26 2001
From: tim.one at home.com (Tim Peters)
Date: Tue, 5 Jun 2001 01:53:26 -0400
Subject: [Python-Dev] One more dict trick
In-Reply-To: <20010601032316.A15635@thyrsus.com>
Message-ID: 

[Eric S. Raymond]
> ...
> So maybe there's a market for 128-bit floats after all.

I think very small.  There's a much larger market for 128-bit float
*registers*, though -- in the "treat it as 2 64-bit, or 4 32-bit,
floats, and operate on them in parallel" sense.  That's the baby vector
register view, and is already happening.

> I'm still skeptical about how likely those applications are to
> influence the architecture of general-purpose processors.  I saw a
> study once that said heavy-duty scientific floating point only
> accounts for about 2% of the computing market -- and I think it's
> significant that MMX instructions and so forth entered the Intel
> line to support *games*, not Navier-Stokes calculations.

Heh.  I used to wonder about that, but not any more: games may have no
more than entertainment (sometimes disguised as education) in mind, but
what do the latest & greatest games do?  Strive to simulate physical
reality (sometimes with altered physical laws), just as closely as
possible.  Whether it's ray-tracing, effective motion-compression, or
N-body simulations, games are easily as demanding as what computational
chemists do.

A difference is that general-purpose *compilers* aren't being taught how
to use these "new" architectural gimmicks.  All that new hardware sits
unused unless you've got an app dipping into assembler, or into a
hand-coded utility library written in assembler.  The *general* market
for pure floating-point can barely support what's left of the
supercomputer industry anymore (btw, Cray never became a billion-dollar
company even in its heyday, and what's left of them gets passed around
for peanuts now).

> That 2% will have to get a lot bigger before I can see Intel doubling
> its word size again.  It's not just the processor design; the word
> size has huge implications for buses, memory controllers, and the
> whole system architecture.

Intel is just now getting its feet wet with 64-bit boxes.  That was old
news to me 20 years ago.  All I hope to see 20 years from now is that
somewhere along the way I got smart enough to drop computers and get a
real life.

by-then-the-whole-system-will-exist-in-the-superposition-of-a-
single-plutonium-atom's-states-anyway-ly y'rs  - tim

From tim.one at home.com  Tue Jun  5 07:55:48 2001
From: tim.one at home.com (Tim Peters)
Date: Tue, 5 Jun 2001 01:55:48 -0400
Subject: [Python-Dev] strop vs. string
In-Reply-To: <20010605054129.933C199C83@waltz.rahul.net>
Message-ID: 

[Aahz]
> Hey!
> Are you discriminating against 128-bit ints?

Nope!  I'm Guido's marketing guy: 128-bit ints will be the killer reason
you need to upgrade to Python 3000, when the time comes.  Python didn't
get to where it is by giving away all the good stuff early.

From MarkH at ActiveState.com  Tue Jun  5 09:10:53 2001
From: MarkH at ActiveState.com (Mark Hammond)
Date: Tue, 5 Jun 2001 17:10:53 +1000
Subject: [Python-Dev] strop vs. string
In-Reply-To: <3B1B8B86.68E99328@STScI.Edu>
Message-ID: 

> complex: the support for multiple buffers does not appear necessary.

I seem to recall Guido telling me once that this was implemented for
NumPy, specifically for some of their matrices.  Not being a user of
that package means that unfortunately I can not be any more specific...

I am confident Guido will recall the specific details...

Mark.

From mwh at python.net  Tue Jun  5 10:39:24 2001
From: mwh at python.net (Michael Hudson)
Date: Tue, 5 Jun 2001 09:39:24 +0100 (BST)
Subject: [Python-Dev] another dict crasher
In-Reply-To: 
Message-ID: 

Haven't run your example yet as my machine's not on at the moment.

On Tue, 5 Jun 2001, Tim Peters wrote:
> However, if I stick "print self.i" at the start of __eq__, it dies
> with a KeyError instead!  That's why I'm mentioning it -- could be the
> same misdirection you're seeing.  I can't account for the KeyError in
> any rational way: under Windows, it's actually hitting a stack
> overflow in the bowels of the system malloc() then.

Hmm.  It's quite likely that PyMem_Malloc (or whatever) crapping out and
returning NULL will get turned into a MemoryError, which will then get
turned into a KeyError, isn't it?  I could believe that malloc would set
up some fancy sigsegv-type handlers for memory management purposes which
then get called when it tramples all over the end of the stack.  But I'm
making this up as I go along...

> Windows "recovers" from that and presses on.  Everything that happens
> after appears to be an accident.
>
> win98-as-usual-ly y'rs - tim

Well, linux seems to be similarly inscrutable here.  One problem is that
this is a pig to run under the debugger - setting a breakpoint on
lookdict isn't a terribly interesting way to spend your time.  I suppose
you could just set the breakpoint on the recursive call...  later.

> PS: You'll be tested on this, too.

Oh, piss off.

Cheers,
M.

From guido at digicool.com  Tue Jun  5 11:07:34 2001
From: guido at digicool.com (Guido van Rossum)
Date: Tue, 05 Jun 2001 05:07:34 -0400
Subject: [Python-Dev] Happy event
Message-ID: <200106050907.FAA08198@cj20424-a.reston1.va.home.com>

I just wanted to send a note about a happy event in the Python family.
Jeremy Hylton and his wife became the proud parents of twin girls on
Sunday June 3rd.  Please join Pythonlabs and Digital Creations in
congratulating them, and wishing them much joy and luck.

Also, don't expect Jeremy to be too responsive to email for the next
6-8 weeks. :)

--Guido van Rossum (home page: http://www.python.org/~guido/)

From uche.ogbuji at fourthought.com  Tue Jun  5 14:28:45 2001
From: uche.ogbuji at fourthought.com (Uche Ogbuji)
Date: Tue, 05 Jun 2001 06:28:45 -0600
Subject: [Python-Dev] One more dict trick
In-Reply-To: Message from Greg Ewing of "Tue, 05 Jun 2001 17:00:30 +1200." <200106050500.RAA02362@s454.cosc.canterbury.ac.nz>
Message-ID: <200106051228.f55CSjk18336@localhost.local>

> "Eric S. Raymond" :
>
> > I think it's significant that MMX
> > instructions and so forth entered the Intel line to support *games*,
> > not Navier-Stokes calculations.
>
> But when version 1.0 of FlashFlood!
comes out, requiring > high-quality real-time hydrodynamics simulation, > Navier-Stokes calculations will suddenly become very > important... Shoot, I thought that was what Microsoft Hailstorm was all about. Path integrals about the atmospheric isobars, and all that... -- Uche Ogbuji Principal Consultant uche.ogbuji at fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA XML strategy, XML tools (http://4Suite.org), knowledge management From uche.ogbuji at fourthought.com Tue Jun 5 14:32:07 2001 From: uche.ogbuji at fourthought.com (Uche Ogbuji) Date: Tue, 05 Jun 2001 06:32:07 -0600 Subject: [Python-Dev] Happy event In-Reply-To: Message from Guido van Rossum of "Tue, 05 Jun 2001 05:07:34 EDT." <200106050907.FAA08198@cj20424-a.reston1.va.home.com> Message-ID: <200106051232.f55CW7618353@localhost.local> > I just wanted to send a note about a happy event in the Python family. > Jeremy Hylton and his wife became the proud parents of twin girls on > Sunday June 3rd. Please join Pythonlabs and Digital Creations in > congratulating them, and wishing them much joy and luck. > > Also, don't expect Jeremy to be too responsive to email for the next > 6-8 weeks. :) *twin* girls? Try 6-8 years. Congrats and felicits of the highest order, of course, Jeremy. -- Uche Ogbuji Principal Consultant uche.ogbuji at fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA XML strategy, XML tools (http://4Suite.org), knowledge management From Barrett at stsci.edu Tue Jun 5 14:53:46 2001 From: Barrett at stsci.edu (Paul Barrett) Date: Tue, 05 Jun 2001 08:53:46 -0400 Subject: [Python-Dev] Happy event References: <200106051232.f55CW7618353@localhost.local> Message-ID: <3B1CD65A.595E8CD@STScI.Edu> Uche Ogbuji wrote: > > > I just wanted to send a note about a happy event in the Python family. > > Jeremy Hylton and his wife became the proud parents of twin girls on > > Sunday June 3rd. Please join Pythonlabs and Digital Creations in > > congratulating them, and wishing them much joy and luck. > > > > Also, don't expect Jeremy to be too responsive to email for the next > > 6-8 weeks. :) > > *twin* girls? Try 6-8 years. > > Congrats and felicits of the highest order, of course, Jeremy. Actually girls are fine until about 13, after that I expect Jeremy won't be too responsive. Something about hormones and such. In any case, all the best, Jeremy! -- Paul Barrett, PhD Space Telescope Science Institute Phone: 410-338-4475 ESS/Science Software Group FAX: 410-338-4767 Baltimore, MD 21218 From aahz at rahul.net Tue Jun 5 16:41:10 2001 From: aahz at rahul.net (Aahz Maruch) Date: Tue, 5 Jun 2001 07:41:10 -0700 (PDT) Subject: [Python-Dev] Happy event In-Reply-To: <3B1CD65A.595E8CD@STScI.Edu> from "Paul Barrett" at Jun 05, 2001 08:53:46 AM Message-ID: <20010605144110.DD90C99C84@waltz.rahul.net> Paul Barrett wrote: > Uche Ogbuji wrote: >> Guido: >>> >>> Also, don't expect Jeremy to be too responsive to email for the next >>> 6-8 weeks. :) >> >> *twin* girls? Try 6-8 years. > > Actually girls are fine until about 13, after that I expect Jeremy > won't be too responsive. Something about hormones and such. Are you trying to imply that there's a difference between girls and boys? 
compressing-a-five-screen-rant-down-to-a-single-sentence-ly y'rs
--
                      --- Aahz (@pobox.com)

Hugs and backrubs -- I break Rule 6       <*> http://www.rahul.net/aahz/
Androgynous poly kinky vanilla queer het Pythonista

I don't really mind a person having the last whine, but I do mind
someone else having the last self-righteous whine.

From esr at thyrsus.com  Tue Jun  5 16:55:59 2001
From: esr at thyrsus.com (Eric S. Raymond)
Date: Tue, 5 Jun 2001 10:55:59 -0400
Subject: [Python-Dev] Happy event
In-Reply-To: <20010605144110.DD90C99C84@waltz.rahul.net>; from aahz@rahul.net on Tue, Jun 05, 2001 at 07:41:10AM -0700
References: <3B1CD65A.595E8CD@STScI.Edu> <20010605144110.DD90C99C84@waltz.rahul.net>
Message-ID: <20010605105559.A28963@thyrsus.com>

Aahz Maruch :
> Paul Barrett wrote:
> > Uche Ogbuji wrote:
> >> Guido:
> >>>
> >>> Also, don't expect Jeremy to be too responsive to email for the
> >>> next 6-8 weeks. :)
> >>
> >> *twin* girls?  Try 6-8 years.
> >
> > Actually girls are fine until about 13, after that I expect Jeremy
> > won't be too responsive.  Something about hormones and such.
>
> Are you trying to imply that there's a difference between girls and
> boys?

Of course there's a difference.  Girls, er, *mature* sooner.

Congratulations, Jeremy!
--
		Eric S. Raymond

If I were to select a jack-booted group of fascists who are perhaps as
large a danger to American society as I could pick today, I would pick
BATF [the Bureau of Alcohol, Tobacco, and Firearms].
	-- U.S. Representative John Dingell, 1980

From pedroni at inf.ethz.ch  Tue Jun  5 17:05:03 2001
From: pedroni at inf.ethz.ch (Samuele Pedroni)
Date: Tue, 5 Jun 2001 17:05:03 +0200 (MET DST)
Subject: [Python-Dev] Happy event
Message-ID: <200106051505.RAA24810@core.inf.ethz.ch>

> Subject: Re: [Python-Dev] Happy event
> To: Barrett at stsci.edu (Paul Barrett)
> Cc: python-dev at python.org
> From: aahz at rahul.net (Aahz Maruch)
> Date: Tue, 5 Jun 2001 07:41:10 -0700 (PDT)
>
> Paul Barrett wrote:
> > Uche Ogbuji wrote:
> >> Guido:
> >>>
> >>> Also, don't expect Jeremy to be too responsive to email for the
> >>> next 6-8 weeks. :)
> >>
> >> *twin* girls?  Try 6-8 years.
> >
> > Actually girls are fine until about 13, after that I expect Jeremy
> > won't be too responsive.  Something about hormones and such.
>
> Are you trying to imply that there's a difference between girls and
> boys?
>
> compressing-a-five-screen-rant-down-to-a-single-sentence-ly y'rs
> --

The simple fact that we are still moving from the previous bad habit of
considering them different to considering them equal already implies
differences.  A neutral viewpoint would be: the N/S ratio between
gender-physiological differences and the overall interpersonal
differences is very big, at least when considering the whole personality
and not single aspects.  There is no established truth; we are just
longing for equilibrium.  In the actual transition phase boys and girls
are under different kinds of cultural tensions related to
self-identification, etc., and this makes for differences.

regards, Samuele Pedroni.

From aahz at rahul.net  Tue Jun  5 17:17:38 2001
From: aahz at rahul.net (Aahz Maruch)
Date: Tue, 5 Jun 2001 08:17:38 -0700 (PDT)
Subject: [Python-Dev] Happy event
In-Reply-To: <20010605105559.A28963@thyrsus.com> from "Eric S.
Raymond" at Jun 05, 2001 10:55:59 AM Message-ID: <20010605151739.3864199C83@waltz.rahul.net> Eric S. Raymond wrote: > Aahz Maruch : >> >> Are you trying to imply that there's a difference between girls and >> boys? > > Of course there's a difference. Girls, er, *mature* sooner. Not legally. -- --- Aahz (@pobox.com) Hugs and backrubs -- I break Rule 6 <*> http://www.rahul.net/aahz/ Androgynous poly kinky vanilla queer het Pythonista I don't really mind a person having the last whine, but I do mind someone else having the last self-righteous whine. From esr at thyrsus.com Tue Jun 5 17:30:08 2001 From: esr at thyrsus.com (Eric S. Raymond) Date: Tue, 5 Jun 2001 11:30:08 -0400 Subject: [Python-Dev] Happy event In-Reply-To: <20010605151739.3864199C83@waltz.rahul.net>; from aahz@rahul.net on Tue, Jun 05, 2001 at 08:17:38AM -0700 References: <20010605105559.A28963@thyrsus.com> <20010605151739.3864199C83@waltz.rahul.net> Message-ID: <20010605113008.A29236@thyrsus.com> Aahz Maruch : > Eric S. Raymond wrote: > > Aahz Maruch : > >> > >> Are you trying to imply that there's a difference between girls and > >> boys? > > > > Of course there's a difference. Girls, er, *mature* sooner. > > Not legally. My point was that the hormone thing is likely to be an issue sooner with twin girls. Hey, Jeremy...fraternal or identical? -- Eric S. Raymond What is a magician but a practicing theorist? -- Obi-Wan Kenobi, 'Return of the Jedi' From guido at digicool.com Tue Jun 5 19:21:32 2001 From: guido at digicool.com (Guido van Rossum) Date: Tue, 05 Jun 2001 13:21:32 -0400 Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects? Message-ID: <200106051721.f55HLW729400@odiug.digicool.com> While thinking about metatypes, I had an interesting idea. In PEP 252 and 253 (which still need much work, please bear with me!) I describe making classes and types more similar to each other. In particular, you'll be able to subclass built-in object types in much the same way as you can subclass user-defined classes today. One nice property of classes is that a class is a factory function for its instances; in other words, if C is a class, C() returns a C instance. Now, for built-in types, it makes sense to do the same. In my current prototype, after "from types import *", DictType() returns an empty dictionary and ListType() returns an empty list. It would be nice take this much further: IntType() could return an integer, TupleType() could return a tuple, StringType() could return a string, and so on. These are immutable types, so to make this useful, these constructors need to take an argument to specify a specific value. What should the type of such an argument be? It's not very interesting to require that int(x) takes an integer argument! Most of the popular standard types already have a constructor function that's named after their type: int(), long(), float(), complex(), str(), unicode(), tuple(), list() We could make the constructor take the same argument(s) as the corresponding built-in function. Now invoke the Zen of Python: "There should be one-- and preferably only one --obvious way to do it." So why not make these built-in functions *be* the corresponding types? Then instead of >>> int you would see >>> int but otherwise the behavior would be identical. (Note that I don't require that a factory function returns a *new* object each time.) If we did this for all built-in types, we'd have to add maybe a dozen new built-in names -- I think that's no big deal and actually helps naming types. 
The types module, with its awkward names and usage, can be deprecated.

There are details to be worked out, e.g.

- Do we really want to have built-in names for code objects, traceback
  objects, and other figments of Python's internal workings?

- What should the argument to dict() be?  A list of (key, value) pairs,
  a list of alternating keys and values, or something else?

- What else?

Comments?

--Guido van Rossum (home page: http://www.python.org/~guido/)

From fredrik at pythonware.com  Tue Jun  5 19:34:35 2001
From: fredrik at pythonware.com (Fredrik Lundh)
Date: Tue, 5 Jun 2001 19:34:35 +0200
Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects?
References: <200106051721.f55HLW729400@odiug.digicool.com>
Message-ID: <001301c0ede5$cb804a10$e46940d5@hagrid>

guido wrote:
> Now invoke the Zen of Python: "There should be one-- and preferably
> only one --obvious way to do it."  So why not make these built-in
> functions *be* the corresponding types?

+1 from here.

> - Do we really want to have built-in names for code objects, traceback
>   objects, and other figments of Python's internal workings?

nope.

> - What should the argument to dict() be?  A list of (key, value)
>   pairs, a list of alternating keys and values, or something else?

how about supporting the following:

    d == dict(d.items())
    d == dict(d.keys(), d.values())

and also:

    d = dict(k=v, k=v, ...)

Cheers /F

From ping at lfw.org  Tue Jun  5 19:41:22 2001
From: ping at lfw.org (Ka-Ping Yee)
Date: Tue, 5 Jun 2001 12:41:22 -0500 (CDT)
Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects?
In-Reply-To: <200106051721.f55HLW729400@odiug.digicool.com>
Message-ID: 

On Tue, 5 Jun 2001, Guido van Rossum wrote:
> Now invoke the Zen of Python: "There should be one-- and preferably
> only one --obvious way to do it."  So why not make these built-in
> functions *be* the corresponding types?  Then instead of
>
>     >>> int
>     <built-in function int>
>
> you would see
>
>     >>> int
>     <type 'int'>

I'm all in favour of this.  In fact, i had the impression that you were
planning to do exactly this all along.  I seem to recall some
conversation about this a long time ago -- am i dreaming?

> If we did this for all built-in types, we'd have to add maybe a dozen
> new built-in names -- I think that's no big deal and actually helps
> naming types.  The types module, with its awkward names and usage, can
> be deprecated.

I would love this.

> - Do we really want to have built-in names for code objects, traceback
>   objects, and other figments of Python's internal workings?

Perhaps we would only provide built-in names for objects that are
commonly constructed.  For things like code objects that are never
user-constructed, their type objects could be set aside in a module.

> - What should the argument to dict() be?  A list of (key, value)
>   pairs, a list of alternating keys and values, or something else?

A list of (key, value) pairs.  It's the only sensible choice, given
that dict.items() is the obvious way to get all the information out of
a dictionary into a list.


-- ?!ng

From aahz at rahul.net  Tue Jun  5 19:40:27 2001
From: aahz at rahul.net (Aahz Maruch)
Date: Tue, 5 Jun 2001 10:40:27 -0700 (PDT)
Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects?
In-Reply-To: <200106051721.f55HLW729400@odiug.digicool.com> from "Guido van Rossum" at Jun 05, 2001 01:21:32 PM
Message-ID: <20010605174027.17A4199C83@waltz.rahul.net>

I'm +1 on the general concept; I think it will make explaining Python
easier in the long run.
I'm not competent to vote on the details, but I'll complain if something
seems too confused to me.

Currently in the Decimal class I'm working on, I can take any of the
following types in the constructor: Decimal, tuple, string, int, float.
I'm wondering whether that approach makes sense, that any "compatible"
type should be accepted in an explicit constructor.  So for your
question about dict(), perhaps any sequence/iterator type that returns
2-element sequences would be accepted.

--
                      --- Aahz (@pobox.com)

Hugs and backrubs -- I break Rule 6       <*> http://www.rahul.net/aahz/
Androgynous poly kinky vanilla queer het Pythonista

I don't really mind a person having the last whine, but I do mind
someone else having the last self-righteous whine.

From donb at abinitio.com  Tue Jun  5 19:50:34 2001
From: donb at abinitio.com (Donald Beaudry)
Date: Tue, 05 Jun 2001 13:50:34 -0400
Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects?
References: <200106051721.f55HLW729400@odiug.digicool.com>
Message-ID: <200106051750.NAA25458@localhost.localdomain>

Guido van Rossum wrote,
> Now invoke the Zen of Python: "There should be one-- and preferably
> only one --obvious way to do it."  So why not make these built-in
> functions *be* the corresponding types?

I like it!

> but otherwise the behavior would be identical.  (Note that I don't
> require that a factory function returns a *new* object each time.)

Of course... singletons (which would also break that requirement) are
quite useful.

> If we did this for all built-in types, we'd have to add maybe a dozen
> new built-in names -- I think that's no big deal and actually helps
> naming types.  The types module, with its awkward names and usage, can
> be deprecated.
>
> There are details to be worked out, e.g.
>
> - Do we really want to have built-in names for code objects, traceback
>   objects, and other figments of Python's internal workings?

I don't think so.  Having easy access to these things might be good,
but since they are implementation specific it might be best to
discourage their use by putting them somewhere more implementation
specific, like the new module or even sys.

> - What should the argument to dict() be?  A list of (key, value)
>   pairs, a list of alternating keys and values, or something else?

At a minimum, I'd like to see a list of key/value tuples.  I seem to
find myself reconstructing dicts from the .items() of other dicts.  For
'something else', I'd like to be able to pass keyword arguments to
initialize the new dict.  Going really crazy, I'd like to be able to
pass a dict as an argument to dict()... just another way to spell copy,
but combined with keywords, it would be more like copy followed by an
update.

> - What else?

Well, since you are asking ;) I haven't read the PEP, so perhaps I
shouldn't be commenting just yet, but...  I'd hope that the built-in
types are sub-classable from C as well as from Python.  This is most
interesting for types like instance, class, method, but I can imagine
reasons for doing it to tuple, list, dict, and even int.

> Comments?

Fantastic!

--
Donald Beaudry                                  Ab Initio Software Corp.
                                                201 Spring Street
donb at init.com                                Lexington, MA 02421
                  ...Will hack for sushi...

From mal at lemburg.com  Tue Jun  5 19:53:18 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Tue, 05 Jun 2001 19:53:18 +0200
Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects?
References: <200106051721.f55HLW729400@odiug.digicool.com>
Message-ID: <3B1D1C8E.B7770419@lemburg.com>

Guido van Rossum wrote:
>
> While thinking about metatypes, I had an interesting idea.
>
> In PEP 252 and 253 (which still need much work, please bear with me!)
> I describe making classes and types more similar to each other.  In
> particular, you'll be able to subclass built-in object types in much
> the same way as you can subclass user-defined classes today.  One nice
> property of classes is that a class is a factory function for its
> instances; in other words, if C is a class, C() returns a C instance.
>
> Now, for built-in types, it makes sense to do the same.  In my current
> prototype, after "from types import *", DictType() returns an empty
> dictionary and ListType() returns an empty list.  It would be nice to
> take this much further: IntType() could return an integer, TupleType()
> could return a tuple, StringType() could return a string, and so on.
> These are immutable types, so to make this useful, these constructors
> need to take an argument to specify a specific value.  What should the
> type of such an argument be?  It's not very interesting to require
> that int(x) takes an integer argument!
>
> Most of the popular standard types already have a constructor function
> that's named after their type:
>
>     int(), long(), float(), complex(), str(), unicode(), tuple(), list()
>
> We could make the constructor take the same argument(s) as the
> corresponding built-in function.
>
> Now invoke the Zen of Python: "There should be one-- and preferably
> only one --obvious way to do it."  So why not make these built-in
> functions *be* the corresponding types?  Then instead of
>
>     >>> int
>     <built-in function int>
>
> you would see
>
>     >>> int
>     <type 'int'>
>
> but otherwise the behavior would be identical.  (Note that I don't
> require that a factory function returns a *new* object each time.)

-1

While this looks cute, I think it would break a lot of introspection
code or other code which special-cases Python functions for some reason,
since type(int) would no longer return types.BuiltinFunctionType.

If you don't like the names, why not take the chance and create a new
module which then exposes the Python class hierarchy (much like we did
with the exceptions.py module before it was integrated as a C module) ?!

> If we did this for all built-in types, we'd have to add maybe a dozen
> new built-in names -- I think that's no big deal and actually helps
> naming types.  The types module, with its awkward names and usage, can
> be deprecated.
>
> There are details to be worked out, e.g.
>
> - Do we really want to have built-in names for code objects, traceback
>   objects, and other figments of Python's internal workings?

Not really.

> - What should the argument to dict() be?  A list of (key, value)
>   pairs, a list of alternating keys and values, or something else?

As function, I'd say: take either a sequence of tuples or another
dictionary as argument.  mxTools already has such a function, BTW.

--
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                       http://www.lemburg.com/python/

From skip at pobox.com  Tue Jun  5 20:12:09 2001
From: skip at pobox.com (Skip Montanaro)
Date: Tue, 5 Jun 2001 13:12:09 -0500
Subject: [Python-Dev] Shouldn't I be able to print Unicode objects?
Message-ID: <15133.8441.983687.572159@beluga.mojam.com>

Just catching up on a little c.l.py and I noticed the effbot's response to the Unicode degree inquiry. I tried to create and print one and got this:

% python
Python 2.1.1a1 (#9, Jun 4 2001, 11:32:33)
[GCC 2.96 20000731 (Linux-Mandrake 8.0 2.96-0.48mdk)] on linux2
Type "copyright", "credits" or "license" for more information.
>>> u"\N{DEGREE SIGN}"
u'\xb0'
>>> print u"\N{DEGREE SIGN}"
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeError: ASCII encoding error: ordinal not in range(128)

Shouldn't I be able to print arbitrary Unicode objects? What am I missing (this time)? Skip

From mwh at python.net Tue Jun 5 20:16:52 2001 From: mwh at python.net (Michael Hudson) Date: 05 Jun 2001 19:16:52 +0100 Subject: [Python-Dev] Shouldn't I be able to print Unicode objects? In-Reply-To: Skip Montanaro's message of "Tue, 5 Jun 2001 13:12:09 -0500" References: <15133.8441.983687.572159@beluga.mojam.com> Message-ID:

Skip Montanaro writes: > Just catching up on a little c.l.py and I noticed the effbot's response to > the Unicode degree inquiry. I tried to create and print one and got this: > > % python > Python 2.1.1a1 (#9, Jun 4 2001, 11:32:33) > [GCC 2.96 20000731 (Linux-Mandrake 8.0 2.96-0.48mdk)] on linux2 > Type "copyright", "credits" or "license" for more information. > >>> u"\N{DEGREE SIGN}" > u'\xb0' > >>> print u"\N{DEGREE SIGN}" > > Traceback (most recent call last): > File "<stdin>", line 1, in ? > UnicodeError: ASCII encoding error: ordinal not in range(128) > > Shouldn't I be able to print arbitrary Unicode objects? What am I missing > (this time)?

The encoding:

>>> print u"\N{DEGREE SIGN}".encode("latin1")
°

Cheers, Skippy's little helper. -- In case you're not a computer person, I should probably point out that "Real Soon Now" is a technical term meaning "sometime before the heat-death of the universe, maybe". -- Scott Fahlman

From guido at digicool.com Tue Jun 5 20:26:22 2001 From: guido at digicool.com (Guido van Rossum) Date: Tue, 05 Jun 2001 14:26:22 -0400 Subject: [Python-Dev] SourceForge Python Foundry needs help Message-ID: <200106051826.f55IQMS29540@odiug.digicool.com>

The Python Foundry at SF could use a hand. If you're interested in helping out, please write to Chuck Esterbrook, below! --Guido van Rossum (home page: http://www.python.org/~guido/)

------- Forwarded Message

Date: Tue, 05 Jun 2001 14:12:07 -0400 From: Chuck Esterbrook To: guido at python.org Subject: SourceForge Python Foundry

Hi Guido, I'm one of the admins of the SourceForge Python Foundry. In case you're not familiar with them, foundries are simply SF web portals centered around a particular topic. Admins can customize the HTML text and graphics and SourceForge stats are integrated on the side. I haven't had much time to give the Python Foundry the attention it deserves. I was wondering if you knew of anyone who had the inclination, time and energy to join the Foundry as an admin and expand it. If it becomes strong enough, we could possibly get it featured on the sidebar of the main SF page, which would then bring more attention to Python and its related projects. The foundry is at: http://sourceforge.net/foundry/python-foundry/ - -Chuck

------- End of Forwarded Message

From barry at digicool.com Tue Jun 5 20:31:12 2001 From: barry at digicool.com (Barry A. Warsaw) Date: Tue, 5 Jun 2001 14:31:12 -0400 Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects?
References: <200106051721.f55HLW729400@odiug.digicool.com> Message-ID: <15133.9584.871074.255497@anthem.wooz.org>

>>>>> "GvR" == Guido van Rossum writes:

GvR> Now invoke the Zen of Python: "There should be one-- and GvR> preferably only one --obvious way to do it." So why not make GvR> these built-in functions *be* the corresponding types? Then GvR> instead of

>> int
GvR> <built-in function int>

GvR> you would see

>> int
GvR> <type 'int'>

+1

GvR> but otherwise the behavior would be identical. (Note that I GvR> don't require that a factory function returns a *new* object GvR> each time.) GvR> If we did this for all built-in types, we'd have to add maybe GvR> a dozen new built-in names -- I think that's no big deal and GvR> actually helps naming types. The types module, with its GvR> awkward names and usage, can be deprecated.

I'm a little concerned about this, since the names that would be added are probably in common use as variable and/or argument names. I.e. At one point `list' was a very common identifier in Mailman, and I'm sure `dict' is used quite often still. I guess this would be okay as long as working code doesn't break because of it. OTOH, I've had fewer needs for a dict builtin (though not non-zero), and easily zero needs for traceback objects, code objects, etc.

GvR> There are details to be worked out, e.g. GvR> - Do we really want to have built-in names for code objects, GvR> traceback objects, and other figments of Python's internal GvR> workings?

I'd say no. However, we could probably C-ify the types module, a la the exceptions module, and that would be the logical place to put the type factories.

GvR> - What should the argument to dict() be? A list of (key, GvR> value) pairs, a list of alternating keys and values, or GvR> something else?

You definitely want to at least accept a sequence of key/value 2-tuples, so that d.items() can be retransformed into a dictionary object. -Barry

From guido at digicool.com Tue Jun 5 20:38:23 2001 From: guido at digicool.com (Guido van Rossum) Date: Tue, 05 Jun 2001 14:38:23 -0400 Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects? In-Reply-To: Your message of "Tue, 05 Jun 2001 14:31:12 EDT." <15133.9584.871074.255497@anthem.wooz.org> References: <200106051721.f55HLW729400@odiug.digicool.com> <15133.9584.871074.255497@anthem.wooz.org> Message-ID: <200106051838.f55IcNk29624@odiug.digicool.com>

> I'm a little concerned about this, since the names that would be added > are probably in common use as variable and/or argument names. I.e. At > one point `list' was a very common identifier in Mailman, and I'm sure > `dict' is used quite often still. I guess this would be okay as long > as working code doesn't break because of it.

It would be hard to see how this would break code, since built-ins are searched *after* all variables that the user defines. --Guido van Rossum (home page: http://www.python.org/~guido/)

From bckfnn at worldonline.dk Tue Jun 5 20:46:04 2001 From: bckfnn at worldonline.dk (Finn Bock) Date: Tue, 05 Jun 2001 18:46:04 GMT Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects? In-Reply-To: <200106051721.f55HLW729400@odiug.digicool.com> References: <200106051721.f55HLW729400@odiug.digicool.com> Message-ID: <3b1d2894.16564838@smtp.worldonline.dk>

[Guido] >Now invoke the Zen of Python: "There should be one-- and preferably >only one --obvious way to do it." So why not make these built-in >functions *be* the corresponding types?
Then instead of
>
>   >>> int
>   <built-in function int>
>
>you would see
>
>   >>> int
>   <type 'int'>
>
>but otherwise the behavior would be identical. (Note that I don't >require that a factory function returns a *new* object each time.)

I think that it will be difficult to avoid creating a new object under jython because calling a type already directly calls the type's java constructor.

>If we did this for all built-in types, we'd have to add maybe a dozen >new built-in names -- I think that's no big deal and actually helps >naming types. The types module, with its awkward names and usage, can >be deprecated. > >There are details to be worked out, e.g. > >- Do we really want to have built-in names for code objects, traceback > objects, and other figments of Python's internal workings? > >- What should the argument to dict() be? A list of (key, value) > pairs, a list of alternating keys and values, or something else?

Jython already interprets the arguments to the dict type as alternating key/values:

>>> from types import DictType as dict
>>> dict('a', 97, 'b', 98, 'c', 99)
{'b': 98, 'a': 97, 'c': 99}
>>>

This behaviour isn't documented on the python side so it can be changed. However, it is necessary to maintain this API on the java side and we have currently no way to prevent the type constructors from being visible and callable from python. Whatever is decided, I hope jython can keep the current semantics of its dict type. regards, finn

From fdrake at acm.org Tue Jun 5 21:11:58 2001 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Tue, 5 Jun 2001 15:11:58 -0400 (EDT) Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects? In-Reply-To: <3b1d2894.16564838@smtp.worldonline.dk> References: <200106051721.f55HLW729400@odiug.digicool.com> <3b1d2894.16564838@smtp.worldonline.dk> Message-ID: <15133.12030.538647.295809@cj42289-a.reston1.va.home.com>

Finn Bock writes:

> >>> from types import DictType as dict
> >>> dict('a', 97, 'b', 98, 'c', 99)
> {'b': 98, 'a': 97, 'c': 99}
> >>>
>
> This behaviour isn't documented on the python side so it can be changed.
> However, it is necessary to maintain this API on the java side and we
> have currently no way to prevent the type constructors from being
> visible and callable from python.

This should not be a problem: If dict() is called with one arg, the new semantics can be used, but with more than one arg, your existing semantics can be used. -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations

From skip at pobox.com Tue Jun 5 21:23:54 2001 From: skip at pobox.com (Skip Montanaro) Date: Tue, 5 Jun 2001 14:23:54 -0500 Subject: [Python-Dev] Shouldn't I be able to print Unicode objects? In-Reply-To: References: <15133.8441.983687.572159@beluga.mojam.com> Message-ID: <15133.12746.666351.127286@beluga.mojam.com>

Me> [what am I missing?]

Michael> The encoding:

>>> print u"\N{DEGREE SIGN}".encode("latin1")
°

Hmmm... I don't believe I've ever encountered an object in Python before that you couldn't simply print. Are Unicode objects unique in this respect? Seems like a bug (or at least a feature) to me. Skip

From mwh at python.net Tue Jun 5 21:31:33 2001 From: mwh at python.net (Michael Hudson) Date: 05 Jun 2001 20:31:33 +0100 Subject: [Python-Dev] Shouldn't I be able to print Unicode objects? In-Reply-To: Skip Montanaro's message of "Tue, 5 Jun 2001 14:23:54 -0500" References: <15133.8441.983687.572159@beluga.mojam.com> <15133.12746.666351.127286@beluga.mojam.com> Message-ID:

Skip Montanaro writes: > Me> [what am I missing?]
> > Michael> The encoding: > > >>> print u"\N{DEGREE SIGN}".encode("latin1") > ° > > Hmmm... I don't believe I've ever encountered an object in Python before > that you couldn't simply print. Are Unicode objects unique in this respect? > Seems like a bug (or at least a feature) to me.

Well, what would you have

>>> print u"\N{DEGREE SIGN}"

(or equivalently str(u"\N{DEGREE SIGN}") since we're eventually going to have to stuff an 8-bit string down stdout) do? I don't think

>>> print u"\N{DEGREE SIGN}"
u'\xb0'

is really an option. This is old news. It must have been discussed here before 1.6, I'd have thought. Cheers, M. -- 58. Fools ignore complexity. Pragmatists suffer it. Some can avoid it. Geniuses remove it. -- Alan Perlis, http://www.cs.yale.edu/homes/perlis-alan/quotes.html

From barry at digicool.com Tue Jun 5 21:46:54 2001 From: barry at digicool.com (Barry A. Warsaw) Date: Tue, 5 Jun 2001 15:46:54 -0400 Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects? References: <200106051721.f55HLW729400@odiug.digicool.com> <15133.9584.871074.255497@anthem.wooz.org> <200106051838.f55IcNk29624@odiug.digicool.com> Message-ID: <15133.14126.221568.235269@anthem.wooz.org>

>>>>> "GvR" == Guido van Rossum writes:

>> I'm a little concerned about this, since the names that would >> be added are probably in common use as variable and/or argument >> names. I.e. At one point `list' was a very common identifier >> in Mailman, and I'm sure `dict' is used quite often still. I >> guess this would be okay as long as working code doesn't break >> because of it.

GvR> It would be hard to see how this would break code, since GvR> built-ins are searched *after* all variables that the user GvR> defines.

Wasn't there talk about issuing warnings for locals shadowing built-ins (or was that globals?). If not, fergitaboutit. If so, that would fall under the category of "breaking". -Barry

From tim at digicool.com Tue Jun 5 21:56:59 2001 From: tim at digicool.com (Tim Peters) Date: Tue, 5 Jun 2001 15:56:59 -0400 Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects? Message-ID:

Just to reduce this to its most trivial point <wink>,

> - What should the argument to dict() be? A list of (key, value) > pairs, a list of alternating keys and values, or something else?

the middle one (perhaps generalized to "iterable object alternately producing keys and values") is most useful in practice. Perl gets a lot of mileage out of that, e.g. think of using re.findall() to build a list of mail-header field, value, field, value, ... thingies to feed to a dict. A list of (key, value) pairs is prettiest, but almost nothing *produces* such a list except for dict.items(); we don't need another way to spell dict.copy().

From guido at digicool.com Tue Jun 5 21:56:05 2001 From: guido at digicool.com (Guido van Rossum) Date: Tue, 05 Jun 2001 15:56:05 -0400 Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects? In-Reply-To: Your message of "Tue, 05 Jun 2001 15:46:54 EDT." <15133.14126.221568.235269@anthem.wooz.org> References: <200106051721.f55HLW729400@odiug.digicool.com> <15133.9584.871074.255497@anthem.wooz.org> <200106051838.f55IcNk29624@odiug.digicool.com> <15133.14126.221568.235269@anthem.wooz.org> Message-ID: <200106051956.f55Ju5130078@odiug.digicool.com>

> >>>>> "GvR" == Guido van Rossum writes: > > >> I'm a little concerned about this, since the names that would > >> be added are probably in common use as variable and/or argument > >> names. I.e.
At one point `list' was a very common identifier > >> in Mailman, and I'm sure `dict' is used quite often still. I > >> guess this would be okay as long as working code doesn't break > >> because of it. > > GvR> It would be hard to see how this would break code, since > GvR> built-ins are searched *after* all variables that the user > GvR> defines. > > Wasn't there talk about issuing warnings for locals shadowing > built-ins (or was that globals?). If not, fergitaboutit. If so, that > would fall under the category of "breaking". > > -Barry

You may be thinking of this:

>>> def f(int):
...     def g():
...         int
...
<stdin>:1: SyntaxWarning: local name 'int' in 'f' shadows use of 'int' as global in nested scope 'g'
>>>

This warns you when you override a built-in or global *and* you use that same name in a nested function. This code will mean something different in 2.2 anyway (g's reference to int will become a reference to f's int because of nested scopes). But this does not cause a warning:

>>> def g():
...     int = 12
...
>>>

Nor does this:

>>> int = 12
>>>

So we're safe. --Guido van Rossum (home page: http://www.python.org/~guido/)

From skip at pobox.com Tue Jun 5 22:01:47 2001 From: skip at pobox.com (Skip Montanaro) Date: Tue, 5 Jun 2001 15:01:47 -0500 Subject: [Python-Dev] Shouldn't I be able to print Unicode objects? In-Reply-To: References: <15133.8441.983687.572159@beluga.mojam.com> <15133.12746.666351.127286@beluga.mojam.com> Message-ID: <15133.15019.237484.605267@beluga.mojam.com>

Michael> Well, what would you have

>>>> print u"\N{DEGREE SIGN}"

Michael> (or equivalently

Michael> str(u"\N{DEGREE SIGN}")

Michael> since we're eventually going to have to stuff an 8-bit string Michael> down stdout) do?

How about if print calls the .encode("latin1") method for me when it gets an ASCII encoding error? If "latin1" isn't a reasonable default choice, it could pick an encoding based on the current locale.

Michael> I don't think

>>>> print u"\N{DEGREE SIGN}"
Michael> u'\xb0'

Michael> is really an option.

I agree. I'd like to see a little circle.

Michael> This is old news. It must have been discussed here before 1.6, Michael> I'd have thought.

Perhaps, but I suspect many people suffered from glazing over of the eyes reading all the messages exchanged about Unicode arcana. I know I did. Skip

From barry at digicool.com Tue Jun 5 22:01:29 2001 From: barry at digicool.com (Barry A. Warsaw) Date: Tue, 5 Jun 2001 16:01:29 -0400 Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects? References: <200106051721.f55HLW729400@odiug.digicool.com> <15133.9584.871074.255497@anthem.wooz.org> <200106051838.f55IcNk29624@odiug.digicool.com> <15133.14126.221568.235269@anthem.wooz.org> <200106051956.f55Ju5130078@odiug.digicool.com> Message-ID: <15133.15001.19308.108288@anthem.wooz.org>

>>>>> "GvR" == Guido van Rossum writes:

GvR> You may be thinking of this:

Yup.

GvR> So we're safe.

Cool! Count me as a solid +1 then. -Barry

From aahz at rahul.net Tue Jun 5 22:10:06 2001 From: aahz at rahul.net (Aahz Maruch) Date: Tue, 5 Jun 2001 13:10:06 -0700 (PDT) Subject: [Python-Dev] Shouldn't I be able to print Unicode objects? In-Reply-To: <15133.15019.237484.605267@beluga.mojam.com> from "Skip Montanaro" at Jun 05, 2001 03:01:47 PM Message-ID: <20010605201006.15CAD99C83@waltz.rahul.net>

Skip Montanaro wrote: > > Perhaps, but I suspect many people suffered from glazing over of the eyes > reading all the messages exchanged about Unicode arcana. I know I did.

Ditto.
-- --- Aahz (@pobox.com) Hugs and backrubs -- I break Rule 6 <*> http://www.rahul.net/aahz/ Androgynous poly kinky vanilla queer het Pythonista I don't really mind a person having the last whine, but I do mind someone else having the last self-righteous whine.

From mal at lemburg.com Tue Jun 5 22:14:39 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Tue, 05 Jun 2001 22:14:39 +0200 Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects? References: <200106051721.f55HLW729400@odiug.digicool.com> <3B1D1C8E.B7770419@lemburg.com> Message-ID: <3B1D3DAF.DAE727AE@lemburg.com>

> > [Guido] > > Now invoke the Zen of Python: "There should be one-- and preferably > > only one --obvious way to do it." So why not make these built-in > > functions *be* the corresponding types? Then instead of > > > > >>> int > > <built-in function int> > > > > you would see > > > > >>> int > > <type 'int'> > > > > but otherwise the behavior would be identical. (Note that I don't > > require that a factory function returns a *new* object each time.) > > -1 > > While this looks cute, I think it would break a lot of introspection > code or other code which special cases Python functions for > some reason since type(int) would no longer return > types.BuiltinFunctionType. > > If you don't like the names, why not take the chance and > create a new module which then exposes the Python class hierarchy > (much like we did with the exceptions.py module before it was > integrated as a C module)?!

Looks like I'm alone with my uncertain feeling about this move... oh well.

BTW, we should consider having more than one constructor for an object rather than trying to stuff all possible options and parameters into one overloaded super-constructor. I've done this in many of my mx extensions and have so far had great success with it (better programming error detection, better docs, more intuitive interfaces, etc.). In that sense, more than one way to do something will actually help clarify what the programmer really wanted. Just a thought... -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/

From mal at lemburg.com Tue Jun 5 22:16:02 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Tue, 05 Jun 2001 22:16:02 +0200 Subject: [Python-Dev] Shouldn't I be able to print Unicode objects? References: <15133.8441.983687.572159@beluga.mojam.com> <15133.12746.666351.127286@beluga.mojam.com> <15133.15019.237484.605267@beluga.mojam.com> Message-ID: <3B1D3E02.3C9AE1F4@lemburg.com>

Skip Montanaro wrote: > > Michael> Well, what would you have > > >>>> print u"\N{DEGREE SIGN}" > > Michael> (or equivalently > > Michael> str(u"\N{DEGREE SIGN}") > > Michael> since we're eventually going to have to stuff an 8-bit string > Michael> down stdout) do? > > How about if print calls the .encode("latin1") method for me when it gets an > ASCII encoding error? If "latin1" isn't a reasonable default choice, it > could pick an encoding based on the current locale.
Please see Lib/site.py for details on how to enable all these goodies -- it's all there, just disabled and meant for super-users only ;-) -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/

From guido at digicool.com Tue Jun 5 22:22:43 2001 From: guido at digicool.com (Guido van Rossum) Date: Tue, 05 Jun 2001 16:22:43 -0400 Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects? In-Reply-To: Your message of "Tue, 05 Jun 2001 22:14:39 +0200." <3B1D3DAF.DAE727AE@lemburg.com> References: <200106051721.f55HLW729400@odiug.digicool.com> <3B1D1C8E.B7770419@lemburg.com> <3B1D3DAF.DAE727AE@lemburg.com> Message-ID: <200106052022.f55KMhq30227@odiug.digicool.com>

> > -1 > > > > While this looks cute, I think it would break a lot of introspection > > code or other code which special cases Python functions for > > some reason since type(int) would no longer return > > types.BuiltinFunctionType. > > Looks like I'm alone with my uncertain feeling about this move... > oh well.

Well, I don't see how someone could be doing introspection on int and be confused when it's not a function -- either you (think you) know it's a function, so you use it as a function without introspecting it, and that continues to work; or you're open to all possibilities, and then you'll introspect it, and then you'll discover what it is.

> BTW, we should consider having more than one constructor for an > object rather than trying to stuff all possible options and parameters > into one overloaded super-constructor. I've done this in many of > my mx extensions and have so far had great success with it (better > programming error detection, better docs, more intuitive interfaces, > etc.). In that sense, more than one way to do something will > actually help clarify what the programmer really wanted. Just > a thought...

Yes, but the other ways are spelled as factory functions. Maybe, *maybe* the other factory functions could be class-methods, but don't hold your hopes high. --Guido van Rossum (home page: http://www.python.org/~guido/)

From martin at loewis.home.cs.tu-berlin.de Tue Jun 5 22:30:18 2001 From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Tue, 5 Jun 2001 22:30:18 +0200 Subject: [Python-Dev] Shouldn't I be able to print Unicode objects? Message-ID: <200106052030.f55KUIu02762@mira.informatik.hu-berlin.de>

> How about if print calls the .encode("latin1") method for me when it gets an > ASCII encoding error? If "latin1" isn't a reasonable default choice, it > could pick an encoding based on the current locale.

These are both bad ideas. First, there is no guarantee that your terminal is capable of displaying the circle at all. Maybe the typewriter connected to your computer doesn't even have a degree type. Further, maybe it does support displaying the degree sign, but then it likely fails for

>>> print u"\N{EURO SIGN}"

Or, worse, instead of displaying the EURO SIGN, it may just display the CURRENCY SIGN (since it may choose to use ISO-8859-15, but the terminal assumes ISO-8859-1). So unless you can come up with a really good way to find out what the terminal is capable of displaying (plus finding out how to make it display these things), I think Python is better off raising an exception than producing garbage output. In addition, what you see is the "default encoding", i.e.
it doesn't just apply to print; it also applies to all places where Unicode objects are converted into byte strings. Any default other than ASCII has been considered a bad idea by the authors of the Unicode support. IMO, the next-most reasonable default would have been UTF-8, *not* Latin-1, since UTF-8 can represent the EURO SIGN and every other character in Unicode. Most likely, your terminal will have difficulties producing a circle symbol when it gets the UTF-8 representation of the DEGREE SIGN, though. So the best thing is still to give it into the hands of the application author. As MAL points out, the administrator can give a different default encoding in site.py. Since the default default is ASCII, applications assuming that the default is ASCII won't break on your system. OTOH, applications developed on your system may then break elsewhere, since the default in site.py might be different. Regards, Martin
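[Along the lines Martin describes, an application that wants locale-driven output can do the conversion itself rather than relying on a site-wide default. A minimal sketch -- the helper name uprint and the 'replace' error policy are illustrative assumptions, not anything settled in this thread:]

    import locale

    def uprint(ustr):
        # Guess a charset from the user's locale; fall back to ASCII.
        # 'replace' substitutes '?' for unencodable characters instead
        # of raising UnicodeError.
        encoding = locale.getlocale()[1] or 'ascii'
        print ustr.encode(encoding, 'replace')

[For example, uprint(u"\N{DEGREE SIGN}") would print a degree sign in a Latin-1 locale and a '?' in the C locale, instead of raising an exception.]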
From sdm7g at Virginia.EDU Tue Jun 5 22:41:11 2001 From: sdm7g at Virginia.EDU (Steven D. Majewski) Date: Tue, 5 Jun 2001 16:41:11 -0400 (EDT) Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects? In-Reply-To: <200106051721.f55HLW729400@odiug.digicool.com> Message-ID:

On Tue, 5 Jun 2001, Guido van Rossum wrote:

> Now invoke the Zen of Python: "There should be one-- and preferably > only one --obvious way to do it." So why not make these built-in > functions *be* the corresponding types? Then instead of

+1

> - Do we really want to have built-in names for code objects, traceback > objects, and other figments of Python's internal workings?

I would say to put all of the common constructors in __builtin__, and all of the odd ducks can go into the new module.

> - What should the argument to dict() be? A list of (key, value) > pairs, a list of alternating keys and values, or something else?

A varargs list of (key,value) tuples would probably be most useful. Since most of these functions, before being classed as constructors, were considered coercion functions, I wouldn't be against having it try to do something sensible with a variety of args. -- sdm

From skip at pobox.com Tue Jun 5 22:47:17 2001 From: skip at pobox.com (Skip Montanaro) Date: Tue, 5 Jun 2001 15:47:17 -0500 Subject: [Python-Dev] Shouldn't I be able to print Unicode objects? In-Reply-To: <3B1D3E02.3C9AE1F4@lemburg.com> References: <15133.8441.983687.572159@beluga.mojam.com> <15133.12746.666351.127286@beluga.mojam.com> <15133.15019.237484.605267@beluga.mojam.com> <3B1D3E02.3C9AE1F4@lemburg.com> Message-ID: <15133.17749.390756.115544@beluga.mojam.com>

mal> Please see Lib/site.py for details on how to enable all these mal> goodies -- it's all there, just disabled and meant for super-users mal> only ;-)

Okay, I found the encoding section. I changed the encoding variable assignment to be

    encoding = "latin1"

and now the degree sign print works. What other side-effects will that have besides on printed representations? It appears I can create (but not see properly?) variable names containing latin1 characters:

>>> ümlaut = "ümlaut"
>>> print locals().keys()
['orca', 'dir', '__doc__', 'rlcompleter', 'missionb', 'version', 'dirpat', 'xmlrpclib', 'belugab', '__builtin__', 'beluga', 'readline', '__name__', 'orcab', 'addpath', 'Writer', 'atexit', 'sys', 'dolphinb', 'mission', 'pprint', 'dolphin', '__builtins__', 'mlaut', 'help']

I am having trouble printing some strings containing latin1 characters:

>>> print ümlaut
mlaut
>>> type("ümlaut")
<type 'string'>
>>> type(string.letters)
<type 'string'>
>>> print "ümlaut"
mlaut
>>> print string.letters
abcdefghijklmnopqrstuvwxyz?????????????????????????????????ABCDEFGHIJKLMNOPQRSTUVWXYZ??????????????????????????????
>>> print string.letters[55:]
????ABCDEFGHIJKLMNOPQRSTUVWXYZ??????????????????????????????

The above was pasted from Python running in a shell session in XEmacs, which is certainly latin1-aware. Why did I have trouble seeing the ü in some situations, but not in others? Are the ramifications of all this encoding stuff documented somewhere? Skip

From skip at pobox.com Tue Jun 5 22:56:58 2001 From: skip at pobox.com (Skip Montanaro) Date: Tue, 5 Jun 2001 15:56:58 -0500 Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects? References: <200106051721.f55HLW729400@odiug.digicool.com> Message-ID: <15133.18330.910736.249838@beluga.mojam.com>

Is the intent of using int and friends as constructors instead of just coercion functions that I should (eventually) be able to do this:

    class NonNegativeInt(int):
        def __init__(self, val):
            if int(val) < 0:
                raise ValueError, "Value must be >= 0"
            int.__init__(self, val)
            self.a = 47
        ...

? Skip

From tim at digicool.com Tue Jun 5 23:01:23 2001 From: tim at digicool.com (Tim Peters) Date: Tue, 5 Jun 2001 17:01:23 -0400 Subject: [Python-Dev] another dict crasher Message-ID:

[Tim's dict-crasher dies w/ a stack overflow, but with a KeyError when he sticks a print inside __eq__]

OK, I understand this now, at least on Windows. In PyObject_Print(),

#ifdef USE_STACKCHECK
	if (PyOS_CheckStack()) {
		PyErr_SetString(PyExc_MemoryError, "stack overflow");
		return -1;
	}
#endif

On Windows, PyOS_CheckStack() is

	__try {
		/* _alloca throws a stack overflow exception if there's
		   not enough space left on the stack */
		_alloca(PYOS_STACK_MARGIN * sizeof(void*));
		return 0;
	} __except (EXCEPTION_EXECUTE_HANDLER) {
		/* just ignore all errors */
	}
	return 1;

The _alloca dies, so the __except falls thru and PyOS_CheckStack returns 1. PyObject_Print sets the "stack overflow" error and returns -1. This winds its way thru the rich comparison attempt, until lookdict() sees it and says, Hmm. I can't compare this thing without raising error. So this can't be the key I'm looking for. First I'll clear the error. Hmm. Can't find it anywhere else in the dict either. Hmm. There were no errors pending at the time I got called, so I'll leave things that way and return "not found". At that point about 15,000 levels of recursion unwind, and KeyError gets raised. I don't believe PyOS_CheckStack() is implemented on Unixoid systems (just Windows and Macs), so some other accident must account for the KeyError on Linux. Remains unclear what to do about it; the idea that all errors raised by dict lookup comparisons are ignorable is sure a tempting target.
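[A tiny illustration of the behavior Tim describes, written against the 2.1 semantics sketched above. The class is hypothetical; the point is that the comparison error is swallowed inside lookdict and surfaces as a plain KeyError:]

    class Evil:
        # Every instance hashes alike, so lookups must fall back on
        # the (exploding) equality test.
        def __hash__(self):
            return 0
        def __eq__(self, other):
            raise RuntimeError, "error during dict lookup comparison"

    d = {Evil(): 1}
    try:
        d[Evil()]          # different instance, same hash slot
    except KeyError:
        # lookdict cleared the RuntimeError and reported "not found".
        print "comparison error was swallowed; got KeyError"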
From mal at lemburg.com Tue Jun 5 23:00:23 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Tue, 05 Jun 2001 23:00:23 +0200 Subject: [Python-Dev] Shouldn't I be able to print Unicode objects? References: <15133.8441.983687.572159@beluga.mojam.com> <15133.12746.666351.127286@beluga.mojam.com> <15133.15019.237484.605267@beluga.mojam.com> <3B1D3E02.3C9AE1F4@lemburg.com> <15133.17749.390756.115544@beluga.mojam.com> Message-ID: <3B1D4866.A40AAB1C@lemburg.com>

Skip Montanaro wrote: > > mal> Please see Lib/site.py for details on how to enable all these > mal> goodies -- it's all there, just disabled and meant for super-users > mal> only ;-) > > Okay, I found the encoding section. I changed the encoding variable > assignment to be > > encoding = "latin1" > > and now the degree sign print works. What other side-effects will that have > besides on printed representations? It appears I can create (but not see > properly?) variable names containing latin1 characters: > > >>> ümlaut = "ümlaut"

Huh ? That should not be possible ! Python literals are still ASCII.

>>> ümlaut = 'ümlaut'
  File "<stdin>", line 1
    ümlaut = 'ümlaut'
    ^
SyntaxError: invalid syntax

> >>> print locals().keys() > ['orca', 'dir', '__doc__', 'rlcompleter', 'missionb', 'version', 'dirpat', 'xmlrpclib', 'belugab', '__builtin__', 'beluga', 'readline', '__name__', 'orcab', 'addpath', 'Writer', 'atexit', 'sys', 'dolphinb', 'mission', 'pprint', 'dolphin', '__builtins__', 'mlaut', 'help'] > > I am having trouble printing some strings containing latin1 characters: > > >>> print ümlaut > mlaut > >>> type("ümlaut") > <type 'string'> > >>> type(string.letters) > <type 'string'> > >>> print "ümlaut" > mlaut > >>> print string.letters > abcdefghijklmnopqrstuvwxyz?????????????????????????????????ABCDEFGHIJKLMNOPQRSTUVWXYZ?????????????????????????????? > >>> print string.letters[55:] > ????ABCDEFGHIJKLMNOPQRSTUVWXYZ?????????????????????????????? > > The above was pasted from Python running in a shell session in XEmacs, which > is certainly latin1-aware. Why did I have trouble seeing the ü in some > situations, but not in others?

No idea what's going on there... the encoding parameter should not have any effect on printing normal 8-bit strings. It only defines the standard encoding used in coercion and auto-conversion from Unicode to 8-bit strings and vice-versa.

> Are the ramifications of all this encoding stuff documented somewhere?

The basic things can be found in Misc/unicode.txt, on the i18n sig page and some resources on the web. I'll give a talk in Bordeaux about Unicode too, which will probably provide some additional help as well. -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/

From guido at digicool.com Tue Jun 5 23:14:07 2001 From: guido at digicool.com (Guido van Rossum) Date: Tue, 05 Jun 2001 17:14:07 -0400 Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects? In-Reply-To: Your message of "Tue, 05 Jun 2001 16:59:01 EDT." References: Message-ID: <200106052114.f55LE7P30481@odiug.digicool.com>

> Is the intent of using int and friends as constructors instead of just
> coercion functions that I should (eventually) be able to do this:
>
>     class NonNegativeInt(int):
>         def __init__(self, val):
>             if int(val) < 0:
>                 raise ValueError, "Value must be >= 0"
>             int.__init__(self, val)
>             self.a = 47
>         ...
>
> ?

Yes, sort-of. The details will be slightly different. I'm not comfortable with letting a user-provided __init__() method change the value of self, so I am brooding on a work-around that separates allocation and one-time initialization from __init__(). Watch PEP 253. --Guido van Rossum (home page: http://www.python.org/~guido/)
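[To make the direction concrete: under the separation Guido describes, Skip's check would move out of __init__ and into the one-time allocation step. A sketch, assuming the __new__ spelling that PEP 253 was converging on -- nothing like this worked in released Pythons at the time:]

    class NonNegativeInt(int):
        # Assumption: a separate allocation hook (spelled __new__ here)
        # receives the value before the immutable int is created.
        def __new__(cls, val):
            if int(val) < 0:
                raise ValueError, "Value must be >= 0"
            return int.__new__(cls, val)

        def __init__(self, val):
            # Ordinary per-instance initialization can stay here.
            self.a = 47

[With that split, NonNegativeInt(5) behaves as an int, while NonNegativeInt(-1) raises ValueError before any int object is built.]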
From tim at digicool.com Tue Jun 5 23:16:03 2001 From: tim at digicool.com (Tim Peters) Date: Tue, 5 Jun 2001 17:16:03 -0400 Subject: [Python-Dev] Shouldn't I be able to print Unicode objects? Message-ID:

[MAL, to Skip] > Huh ? That should not be possible ! Python literals are still > ASCII. > > >>> ümlaut = 'ümlaut' > File "<stdin>", line 1 > ümlaut = 'ümlaut' > ^ > SyntaxError: invalid syntax

That was Guido's intent, and what the Ref Man says, but the tokenizer uses C's isalpha() so in reality it's locale-dependent. I think at least one German on Python-Dev has already threatened to kill him if he ever fixes this bug <wink>.

From gward at python.net Wed Jun 6 00:29:49 2001 From: gward at python.net (Greg Ward) Date: Tue, 5 Jun 2001 18:29:49 -0400 Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects? In-Reply-To: <200106051721.f55HLW729400@odiug.digicool.com>; from guido@digicool.com on Tue, Jun 05, 2001 at 01:21:32PM -0400 References: <200106051721.f55HLW729400@odiug.digicool.com> Message-ID: <20010605182949.A7545@gerg.ca>

On 05 June 2001, Guido van Rossum said: > Now invoke the Zen of Python: "There should be one-- and preferably > only one --obvious way to do it." So why not make these built-in > functions *be* the corresponding types? Then instead of

+1 from me too.

> If we did this for all built-in types, we'd have to add maybe a dozen > new built-in names -- I think that's no big deal and actually helps > naming types. The types module, with its awkward names and usage, can > be deprecated.

Cool!

> There are details to be worked out, e.g. > > - Do we really want to have built-in names for code objects, traceback > objects, and other figments of Python's internal workings?

Probably not, as long as they are accessible somewhere. I could live with either a C-ified 'types' module or shoving these into the 'new' module, although I think I prefer the latter slightly.

> - What should the argument to dict() be? A list of (key, value) > pairs, a list of alternating keys and values, or something else?

I love /F's suggestion

    dict(k=v, k=v, ...)

but that's icing on the cake -- cool feature, looks pretty, etc. (And *finally* Python will have all the syntactic sugar that Perl programmers like to have. ;-) I think the real answer should be

    dict(k, v, k, v)

like Jython. If both can be supported, that would be swell. Greg -- Greg Ward - Linux geek gward at python.net http://starship.python.net/~gward/ Does your DRESSING ROOM have enough ASPARAGUS?

From barry at digicool.com Wed Jun 6 00:45:00 2001 From: barry at digicool.com (Barry A. Warsaw) Date: Tue, 5 Jun 2001 18:45:00 -0400 Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects? References: <200106051721.f55HLW729400@odiug.digicool.com> <20010605182949.A7545@gerg.ca> Message-ID: <15133.24812.791796.557452@anthem.wooz.org>

>>>>> "GW" == Greg Ward writes:

GW> I love /F's suggestion

GW> dict(k=v, k=v, ...)

One problem with this syntax is that the `k's can only be valid Python identifiers, so you'd at least need /some/ other syntax to support construction with arbitrary hashable keys. -Barry

From fredrik at pythonware.com Wed Jun 6 00:57:43 2001 From: fredrik at pythonware.com (Fredrik Lundh) Date: Wed, 6 Jun 2001 00:57:43 +0200 Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects?
References: <200106051721.f55HLW729400@odiug.digicool.com> <20010605182949.A7545@gerg.ca> Message-ID: <011f01c0ee12$eeda9ba0$0900a8c0@spiff>

greg wrote: > > - What should the argument to dict() be? A list of (key, value) > > pairs, a list of alternating keys and values, or something else? > > I love /F's suggestion > > dict(k=v, k=v, ...) > > but that's icing on the cake -- cool feature, looks pretty, etc.

note that the python interpreter builds that dictionary for you if you use the METH_KEYWORDS flag...

> I think the real answer should be > > dict(k, v, k, v) > > like Jython.

given that Jython already gives a meaning to dict with more than one argument, I suggest:

    dict(d)                     # consistency
    dict(k, v, k, v, ...)       # jython compatibility
    dict(*[k, v, k, v, ...])    # convenience
    dict(k=v, k=v, ...)         # common pydiom

and maybe:

    dict(d.items())             # symmetry

> If both can be supported, that would be swell.

how about:

    if (PyTuple_GET_SIZE(args)) {
        assert PyDict_GET_SIZE(kw) == 0
        if (PyTuple_GET_SIZE(args) == 1) {
            args = PyTuple_GET_ITEM(args, 0);
            if (PyDict_Check(args))
                dict = args.copy()
            else if (PySequence_Check(args))
                dict = {}
                for k, v in args:
                    dict[k] = v
        } else {
            assert (PySequence_Size(args) & 1) == 0  # maybe
            dict = {}
            for i in range(0, len(args), 2):
                dict[args[i]] = args[i+1]
        }
    } else {
        assert PyDict_GET_SIZE(kw) > 0  # probably
        dict = kw
    }
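[For concreteness, the same dispatch can be written out in plain Python. A rough sketch -- the name makedict, the precedence of keywords, and the error checks are illustrative assumptions; no signature had been agreed on at this point in the thread:]

    def makedict(*args, **kw):
        d = {}
        if len(args) == 1:
            arg = args[0]
            if hasattr(arg, 'items'):       # another mapping: copy it
                for k, v in arg.items():
                    d[k] = v
            else:                           # sequence of (key, value) pairs
                for k, v in arg:
                    d[k] = v
        elif len(args) > 1:                 # jython style: k, v, k, v, ...
            if len(args) % 2:
                raise TypeError, "odd number of arguments"
            for i in range(0, len(args), 2):
                d[args[i]] = args[i+1]
        d.update(kw)                        # keyword arguments applied last
        return d

[So makedict({'a': 1}), makedict([('a', 1)]), makedict('a', 1, 'b', 2) and makedict(a=1, b=2) would all produce the obvious dictionaries.]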
Being able to say band = dict(geddy="bass", alex="guitar", neil="drums") would be good enough for me. And it's less mysterious than Perl's =>, which is just a magic comma that forces its LHS to be interpreted as a string. Weird. Greg -- Greg Ward - Linux geek gward at python.net http://starship.python.net/~gward/ If you and a friend are being chased by a lion, it is not necessary to outrun the lion. It is only necessary to outrun your friend. From mal at lemburg.com Wed Jun 6 10:03:13 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Wed, 06 Jun 2001 10:03:13 +0200 Subject: [Python-Dev] Shouldn't I be able to print Unicode objects? References: Message-ID: <3B1DE3C1.90BA3DD6@lemburg.com> Tim Peters wrote: > > [MAL, to Skip] > > Huh ? That should not be possible ! Python literals are still > > ASCII. > > > > >>> ?mlaut = '?mlaut' > > File "", line 1 > > ?mlaut = '?mlaut' > > ^ > > SyntaxError: invalid syntax > > That was Guido's intent, and what the Ref Man says, but the tokenizer uses > C's isalpha() so in reality it's locale-dependent. I think at least one > German on Python-Dev has already threatened to kill him if he ever fixes > this bug . Wasn't me for sure... even in the Unicode age, I believe that Python source code should maintain readability by not allowing all alpha(numeric) characters for use in identifiers (there are lots of them in Unicode). Shouldn't we fix the tokenizer to explicitely check for 'a'...'z' and 'A'...'Z' ?! (same for digits) ?! -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From jack at oratrix.nl Wed Jun 6 13:24:32 2001 From: jack at oratrix.nl (Jack Jansen) Date: Wed, 06 Jun 2001 13:24:32 +0200 Subject: [Python-Dev] %b format? - 8088 like a PDP-11 I think not... In-Reply-To: Message by "Eric S. Raymond" , Mon, 4 Jun 2001 17:19:08 -0400 , <20010604171908.A21831@thyrsus.com> Message-ID: <20010606112432.C4A43303181@snelboot.oratrix.nl> The early microcomputers (8008, 6800, 6502) are actually a lot more like the PDP-8 than the PDP-11: a single (or possibly double) accumulator register and a few special purpose registers hardwired to various instructions. The 68000, Z8000 and NS16032 were the first true successors of the PDP-11, sharing (to an extent) the unique characteristics of it's design with general purpose registers (with even SP and PC being general purpose registers with only very little magic attached to them) and an orthogonal design. The 68000 still had lots of little quirks in the instruction set, the latter two actually improved on the PDP-11 set (where a couple of instructions like XOR would only work with register-destination because it was added to the design in a stage where there weren't enough bits left in the instruction space, I guess). And the 8086 was just a souped-up 8080/8008: each register had a different function, no orthogonality, etc. Intel didn't get it "right" until the 386 32-bit instruction set (and even there some of the old baggage can still be seen). -- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From jack at oratrix.nl Wed Jun 6 13:39:56 2001 From: jack at oratrix.nl (Jack Jansen) Date: Wed, 06 Jun 2001 13:39:56 +0200 Subject: [Python-Dev] quick poll: could int, str, tuple etc. 
become type objects? In-Reply-To: Message by "Fredrik Lundh" , Tue, 5 Jun 2001 19:34:35 +0200 , <001301c0ede5$cb804a10$e46940d5@hagrid> Message-ID: <20010606113957.4A395303181@snelboot.oratrix.nl>

For the dictionary initializer I would definitely want to be able to give an object that adheres to the dictionary protocol, so that I can do things like

    import anydbm
    f = anydbm.open("foo", "r")
    incore = dict(f)

Hmm, I guess this goes for most types: list() and tuple() should take any iterable object, etc. The one question is what "dictionary protocol" means. Should it support items()? Is only x.keys()/x[] good enough? -- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm
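[Greg Ewing suggests the obvious fallback later in the thread: check for items(), and use keys() plus indexing otherwise. A sketch of that shape, with a hypothetical helper name -- whether dict() itself should do this was still an open question:]

    def from_mapping(m):
        # Prefer items() when the object has it; otherwise fall back
        # on keys() plus indexing, which is all dbm-style files offer.
        d = {}
        if hasattr(m, 'items'):
            for k, v in m.items():
                d[k] = v
        else:
            for k in m.keys():
                d[k] = m[k]
        return d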
From mal at lemburg.com Wed Jun 6 20:36:48 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Wed, 06 Jun 2001 20:36:48 +0200 Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects? References: <200106051721.f55HLW729400@odiug.digicool.com> <3B1D1C8E.B7770419@lemburg.com> <3B1D3DAF.DAE727AE@lemburg.com> <200106052022.f55KMhq30227@odiug.digicool.com> Message-ID: <3B1E7840.C93EA788@lemburg.com>

Guido van Rossum wrote: > > > > -1 > > > > > > While this looks cute, I think it would break a lot of introspection > > > code or other code which special cases Python functions for > > > some reason since type(int) would no longer return > > > types.BuiltinFunctionType. > > > > Looks like I'm alone with my uncertain feeling about this move... > > oh well. > > Well, I don't see how someone could be doing introspection on int and > be confused when it's not a function -- either you (think you) know > it's a function, so you use it as a function without introspecting it, > and that continues to work; or you're open to all possibilities, and > then you'll introspect it, and then you'll discover what it is.

Ok, let me put it another way: the point is that you are changing the type of very basic building parts in Python and that is likely to cause failures in places which will most likely be hard to find and fix. Besides, we don't really gain anything from replacing builtin functions with classes (to the contrary: we lose some, since we can no longer use the function call optimizations for builtins and have to go through all the generic call mechanism code instead). Also, have you considered the effects this has on restricted execution mode ? What will happen if someone replaces the builtins with special versions which hide some security relevant objects, e.g. open() is a prominent candidate for this. Why not put the type objects into a separate module instead of reusing the builtins ?

> > BTW, we should consider having more than one constructor for an > > object rather than trying to stuff all possible options and parameters > > into one overloaded super-constructor. I've done this in many of > > my mx extensions and have so far had great success with it (better > > programming error detection, better docs, more intuitive interfaces, > > etc.). In that sense, more than one way to do something will > > actually help clarify what the programmer really wanted. Just > > a thought... > > Yes, but the other ways are spelled as factory functions. Maybe, > *maybe* the other factory functions could be class-methods, but don't > hold your hopes high.

No... why make things complicated when simple functions work just fine as factories. Multiple constructors on a class would make subclassing a pain... -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/

From paulp at ActiveState.com Wed Jun 6 21:00:07 2001 From: paulp at ActiveState.com (Paul Prescod) Date: Wed, 06 Jun 2001 12:00:07 -0700 Subject: [Python-Dev] Shouldn't I be able to print Unicode objects? References: <15133.8441.983687.572159@beluga.mojam.com> <15133.12746.666351.127286@beluga.mojam.com> <15133.15019.237484.605267@beluga.mojam.com> <3B1D3E02.3C9AE1F4@lemburg.com> <15133.17749.390756.115544@beluga.mojam.com> Message-ID: <3B1E7DB7.408BC089@ActiveState.com>

Skip Montanaro wrote: > >... > > Okay, I found the encoding section. I changed the encoding variable > > assignment to be > > encoding = "latin1"

Danger, Will Robinson! You can now write software that will work great on your version of Python and will crash on everyone else's. You haven't just changed the behavior of "print" but of EVERY attempted automatic coercion from Unicode to an 8-bit string. -- Take a recipe. Leave a recipe. Python Cookbook! http://www.ActiveState.com/pythoncookbook

From tim.one at home.com Wed Jun 6 21:27:59 2001 From: tim.one at home.com (Tim Peters) Date: Wed, 6 Jun 2001 15:27:59 -0400 Subject: [Python-Dev] -U option? Message-ID:

http://sf.net/tracker/?func=detail&atid=105470&aid=430269&group_id=5470 python -U breaks import with 2.1

Anyone understand -U? Like, should it work, why is it there if it doesn't and isn't expected to, and are there docs for it beyond the "python -h" blurb? Last mention of it I found in c.l.py was

""" Date: Tue, 06 Feb 2001 16:09:46 +0100 From: "M.-A. Lemburg" Subject: Re: [Python-Dev] Pre-PEP: Python Character Model ... Well, with -U on, Python will compile "" into u"", ... last I tried, Python didn't even start up :-( ... """

An earlier msg (08 Sep 2000) said:

""" Note that many things fail when Python is started with -U... that switch was introduced to be able to get an idea of which parts of the standard fail to work in a mixed string/Unicode environment. """

If this is just an internal development switch, python -h probably shouldn't advertise it.

From barry at digicool.com Wed Jun 6 21:37:26 2001 From: barry at digicool.com (Barry A. Warsaw) Date: Wed, 6 Jun 2001 15:37:26 -0400 Subject: [Python-Dev] -U option? References: Message-ID: <15134.34422.62060.936788@anthem.wooz.org>

>>>>> "TP" == Tim Peters writes:

TP> Anyone understand -U? Like, should it work, why is it there TP> if it doesn't and isn't expected to, and are there docs for it TP> beyond the "python -h" blurb?

Nope, except that /for me/ an installed Python 2.1 seems to start up just fine with -U. My uninstalled (i.e. run from the source tree) 2.2a0 fails when given -U:

@anthem[[~/projects/python:1068]]% ./python
Python 2.2a0 (#4, Jun 6 2001, 13:03:36)
[GCC egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)] on linux2
Type "copyright", "credits" or "license" for more information.
>>> @anthem[[~/projects/python:1070]]% ./python -U -v # ./Lib/site.pyc matches ./Lib/site.py import site # precompiled from ./Lib/site.pyc # ./Lib/os.pyc matches ./Lib/os.py import os # precompiled from ./Lib/os.pyc import posix # builtin # ./Lib/posixpath.pyc matches ./Lib/posixpath.py import posixpath # precompiled from ./Lib/posixpath.pyc # ./Lib/stat.pyc matches ./Lib/stat.py import stat # precompiled from ./Lib/stat.pyc # ./Lib/UserDict.pyc matches ./Lib/UserDict.py import UserDict # precompiled from ./Lib/UserDict.pyc 'import site' failed; traceback: Traceback (most recent call last): File "./Lib/site.py", line 91, in ? from distutils.util import get_platform ImportError: No module named distutils.util Python 2.2a0 (#4, Jun 6 2001, 13:03:36) [GCC egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)] on linux2 Type "copyright", "credits" or "license" for more information. >>> # clear __builtin__._ # clear sys.path # clear sys.argv # clear sys.ps1 # clear sys.ps2 # clear sys.exitfunc # clear sys.exc_type # clear sys.exc_value # clear sys.exc_traceback # clear sys.last_type # clear sys.last_value # clear sys.last_traceback # restore sys.stdin # restore sys.stdout # restore sys.stderr # cleanup __main__ # cleanup[1] signal # cleanup[1] site # cleanup[1] posix # cleanup[1] exceptions # cleanup[2] stat # cleanup[2] posixpath # cleanup[2] UserDict # cleanup[2] os # cleanup sys # cleanup __builtin__ # cleanup ints: 1 unfreed int in 1 out of 3 blocks # cleanup floats -Barry From mal at lemburg.com Wed Jun 6 22:27:19 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Wed, 06 Jun 2001 22:27:19 +0200 Subject: [Python-Dev] -U option? References: Message-ID: <3B1E9227.7F67971E@lemburg.com> Tim Peters wrote: > > http://sf.net/tracker/?func=detail&atid=105470&aid=430269&group_id=5470 > python -U breaks import with 2.1 > > Anyone understand -U? Like, should it work, why is it there if it doesn't > and isn't expected to, and are there docs for it beyond the "python -h" > blurb? The -U option is there to be able to test drive Python into the Unicode age. As you and many others have noted, there's still a long way to go... > Last mention of it I found in c.l.py was > > """ > Date: Tue, 06 Feb 2001 16:09:46 +0100 > From: "M.-A. Lemburg" > Subject: Re: [Python-Dev] Pre-PEP: Python Character Model > > ... > Well, with -U on, Python will compile "" into u"", > ... > last I tried, Python didn't even start up :-( > ... > """ > > An earlier msg (08 Sep 2000) said: > > """ > Note that many thing fail when Python is started with -U... that > switch was introduced to be able to get an idea of which parts of > the standard fail to work in a mixed string/Unicode environment. > """ > > If this is just an internal development switch, python -h probably shouldn't > advertise it. -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From martin at loewis.home.cs.tu-berlin.de Wed Jun 6 22:34:30 2001 From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Wed, 6 Jun 2001 22:34:30 +0200 Subject: [Python-Dev] -U option? Message-ID: <200106062034.f56KYUI02246@mira.informatik.hu-berlin.de> [Tim] > Anyone understand -U? Like, shoulQd it work, why is it there if it > doesn't and isn't expected to, and are there docs for it beyond the > "python -h" blurb? I'm not surprised it doesn't work, but I think it could be made working in many cases. 
I also think it would be worthwhile making that work; in the process, many places will be taught to accept Unicode strings which currently don't.

[Barry] > Nope, except that /for me/ an installed Python 2.1 seems to start up > just fine with -U. [...]

Sure, but it won't work

martin at mira:~ > python -U                             [22:29]
Python 2.2a0 (#336, May 29 2001, 09:28:57)
[GCC 2.95.2 19991024 (release)] on linux2
Type "copyright", "credits" or "license" for more information.
>>> import string
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
ImportError: No module named string
>>> import sys
>>> sys.path
['', u'/usr/src/omni/lib/python', u'/usr/src/omni/lib/i586_linux_2.0_glibc2.1', u'/usr/ilu-2.0b1/lib', u'/home/martin', u'/usr/local/lib/python2.2', u'/usr/local/lib/python2.2/plat-linux2', u'/usr/local/lib/python2.2/lib-tk', u'/usr/local/lib/python2.2/lib-dynload', u'/usr/local/lib/python2.2/site-packages', u'/usr/local/lib/site-python']

The main problem (also with the SF bug report) seems to be that Unicode objects in sys.path are not accepted, but I think they should be. Regards, Martin

From tim.one at home.com Wed Jun 6 22:52:02 2001 From: tim.one at home.com (Tim Peters) Date: Wed, 6 Jun 2001 16:52:02 -0400 Subject: [Python-Dev] -U option? In-Reply-To: <3B1E9227.7F67971E@lemburg.com> Message-ID:

[MAL] > The -U option is there to be able to test drive Python into > the Unicode age. As you and many others have noted, there's > still a long way to go...

That's cool. My question is why we're advertising (via -h) an option that end users have no chance of using successfully.

From mal at lemburg.com Wed Jun 6 23:47:25 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Wed, 06 Jun 2001 23:47:25 +0200 Subject: [Python-Dev] -U option? References: Message-ID: <3B1EA4ED.38BEB1AA@lemburg.com>

Tim Peters wrote: > > [MAL] > > The -U option is there to be able to test drive Python into > > the Unicode age. As you and many others have noted, there's > > still a long way to go... > > That's cool. My question is why we're advertising (via -h) an option that > end users have no chance of using successfully.

I guess I just added the flag to the -h message without thinking much about it... it was added in some alpha release. Anyway, these bug reports will keep hitting us which is good in the sense that it'll eventually push Python into the Unicode arena. We could use some funding for this, though. -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/

From paulp at ActiveState.com Thu Jun 7 01:00:52 2001 From: paulp at ActiveState.com (Paul Prescod) Date: Wed, 06 Jun 2001 16:00:52 -0700 Subject: [Python-Dev] urllib2 Message-ID: <3B1EB624.563DABE0@ActiveState.com>

Tim asked me to look into the test_urllib2 failure. I notice that Guido's name is in the relevant RFC so I guess he's the real expert <0.5 wink>: http://www.faqs.org/rfcs/rfc1738.html Anyhow, there are a variety of problems. :( First, test_urllib2 says:
That begs the question of what IS the right way to construct file urls in a cross-platform manner. I would have thought that urllib.pathname2url was the way but I note that it isn't documented. Plus it is poorly named. A function that does this: """Convert a DOS path name to a file url. C:\foo\bar\spam.foo becomes ///C|/foo/bar/spam.foo """ is not really constructing a URL! And the semantics of the function on multiple platforms do not seem to me to be identical. On Windows it adds a bunch of leading slashes and mac and Unix seem not to. So you can't safely paste a "file:" or "file://" on the front. I don't know how widely pathname2url has been used even though it is undocumented....should we fix it and document it or write a new function? -- Take a recipe. Leave a recipe. Python Cookbook! http://www.ActiveState.com/pythoncookbook From barry at scottb.demon.co.uk Thu Jun 7 01:31:51 2001 From: barry at scottb.demon.co.uk (Barry Scott) Date: Thu, 7 Jun 2001 00:31:51 +0100 Subject: [Python-Dev] %b format? - 8088 like a PDP-11 I think not... In-Reply-To: <20010604161114.A20979@thyrsus.com> Message-ID: <000a01c0eee0$dcfe9250$060210ac@private> Eric, As others have pointed out, your timeline is wrong... Barry p.s. I'm ex-DEC and old enough to have seen the introduction of the 6502 (got mine at university for $25 inc postage to the U.K.), Z80 and VAX (worked on product for V1.0 of VMS). Also for my sins argued with Gordon Bell and Dave Cutler about CPU architecture. > -----Original Message----- > From: Eric S. Raymond [mailto:esr at thyrsus.com] > Sent: 04 June 2001 21:11 > To: Barry Scott > Cc: python-dev (E-mail) > Subject: Re: [Python-Dev] %b format? - 8088 like a PDP-11 I think not... > > > Barry Scott : > > Eric wrote: > > > While I'm at it, I should note that the design of the 11 was ancestral > > > to both the 8088 and 68000 microprocessors, and thus to essentially > > > every new general-purpose computer designed in the last fifteen years. > > > > The key to PDP-11 and VAX was lots of registers all alike and rich > > addressing modes for the instructions. > > > > The 8088 is very far from this design; it owes its design more to the > > 4004 than the PDP-11. > > Yes, but the 4004 was designed as a sort of lobotomized imitation of the 65xx, > which was descended from the 11. Admittedly, in the chain of transmission here > were two stages of redesign so bad that the connection got really tenuous. > -- > Eric S. Raymond > > ...Virtually never are murderers the ordinary, law-abiding people > against whom gun bans are aimed. Almost without exception, murderers > are extreme aberrants with lifelong histories of crime, substance > abuse, psychopathology, mental retardation and/or irrational violence > against those around them, as well as other hazardous behavior, e.g., > automobile and gun accidents." > -- Don B. Kates, writing on statistical patterns in gun crime > > From barry at scottb.demon.co.uk Thu Jun 7 01:57:11 2001 From: barry at scottb.demon.co.uk (Barry Scott) Date: Thu, 7 Jun 2001 00:57:11 +0100 Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects? In-Reply-To: <3B1E7840.C93EA788@lemburg.com> Message-ID: <000b01c0eee4$66f8a7e0$060210ac@private> Adding the atomic types of Python as classes I'm +1 on. Performance is a problem for the parser to handle. If you have not already done so I suggest that you look at what Microsoft .NET is doing in this area.
In .NET, for example, int is a class and they have the technology to define the interface to an int and optimize the performance of the non-derived cases. Barry From barry at scottb.demon.co.uk Thu Jun 7 02:03:54 2001 From: barry at scottb.demon.co.uk (Barry Scott) Date: Thu, 7 Jun 2001 01:03:54 +0100 Subject: [Python-Dev] Shouldn't I be able to print Unicode objects? In-Reply-To: <3B1DE3C1.90BA3DD6@lemburg.com> Message-ID: <001001c0eee5$571a8090$060210ac@private> > Shouldn't we fix the tokenizer to explicitly check for 'a'...'z' > and 'A'...'Z' ?! (same for digits) ?! If you embrace the world then NO. If America is your world then maybe. Barry From paulp at ActiveState.com Thu Jun 7 02:42:03 2001 From: paulp at ActiveState.com (Paul Prescod) Date: Wed, 06 Jun 2001 17:42:03 -0700 Subject: [Python-Dev] Shouldn't I be able to print Unicode objects? References: <001001c0eee5$571a8090$060210ac@private> Message-ID: <3B1ECDDB.F1E8B19D@ActiveState.com> Barry Scott wrote: > > > Shouldn't we fix the tokenizer to explicitly check for 'a'...'z' > > and 'A'...'Z' ?! (same for digits) ?! > > If you embrace the world then NO. If America is your world then maybe. Actually, if we were really going to embrace the world we'd need to handle more than a few European languages! -- Take a recipe. Leave a recipe. Python Cookbook! http://www.ActiveState.com/pythoncookbook From MarkH at ActiveState.com Thu Jun 7 03:09:51 2001 From: MarkH at ActiveState.com (Mark Hammond) Date: Thu, 7 Jun 2001 11:09:51 +1000 Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects? In-Reply-To: <000b01c0eee4$66f8a7e0$060210ac@private> Message-ID: > If you have not already done so I suggest that you look at > what Microsoft .NET is doing in this area. In .NET, for example, > int is a class and they have the technology to define the > interface to an int and optimize the performance of the > non-derived cases. Actually, that is not completely true. There is a "value type" and a class version. The value type is just the bits. The VM has instructions that work on the value type. As far as I am aware, you cannot use a derived class with these instructions. They also have the concept of "sealed" meaning they cannot be subclassed. Last time I looked, strings were an example of sealed classes. Mark. From greg at cosc.canterbury.ac.nz Thu Jun 7 04:16:00 2001 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Thu, 07 Jun 2001 14:16:00 +1200 (NZST) Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects? In-Reply-To: <20010606113957.4A395303181@snelboot.oratrix.nl> Message-ID: <200106070216.OAA02594@s454.cosc.canterbury.ac.nz> Jack Jansen : > Should it support > items()? Is only x.keys()/x[] good enough? Check for items(), and fall back on x.keys()/x[] if necessary. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg at cosc.canterbury.ac.nz +--------------------------------------+ From greg at cosc.canterbury.ac.nz Thu Jun 7 04:19:03 2001 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Thu, 07 Jun 2001 14:19:03 +1200 (NZST) Subject: [Python-Dev] Shouldn't I be able to print Unicode objects? In-Reply-To: <3B1ECDDB.F1E8B19D@ActiveState.com> Message-ID: <200106070219.OAA02597@s454.cosc.canterbury.ac.nz> > if we were really going to embrace the world we'd need to > handle more than a few European languages!
-1 on allowing Kanji in python identifiers. :-( I like to be able to at least imagine some sort of pronunciation for variable names! Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg at cosc.canterbury.ac.nz +--------------------------------------+ From greg at cosc.canterbury.ac.nz Thu Jun 7 04:22:33 2001 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Thu, 07 Jun 2001 14:22:33 +1200 (NZST) Subject: [Python-Dev] %b format? - 8088 like a PDP-11 I think not... Message-ID: <200106070222.OAA02600@s454.cosc.canterbury.ac.nz> Jack Jansen : > with even SP and PC being general purpose registers The PC is not a general purpose register in the 68000. I've heard that this was because DEC had a patent on the idea. > the latter two actually improved on the PDP-11 The 16032 was certainly extremely orthogonal. I wrote an assembler and a compiler for it once, and it was a joy after coming from the Z80! It wasn't quite perfect, though - its lack of a "top-of-stack-indirect" addressing mode was responsible for the one wart in my otherwise-beautiful code generation strategy. Also, it must have been the most CISCy instruction set the world has ever seen, with the possible exception of the VAX... Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg at cosc.canterbury.ac.nz +--------------------------------------+ From tim.one at home.com Thu Jun 7 06:54:42 2001 From: tim.one at home.com (Tim Peters) Date: Thu, 7 Jun 2001 00:54:42 -0400 Subject: [Python-Dev] urllib2 In-Reply-To: <3B1EB624.563DABE0@ActiveState.com> Message-ID: [Paul Prescod] > Tim asked me to look into test_urllib2 failure. Wow! I'm going to remember that. Have to ask people to do things more often . > notice that Guido's name is in the relevant RFC so I guess he's the > real expert <0.5 wink>: > > http://www.faqs.org/rfcs/rfc1738.html > > Anyhow, there are a variety of problems. :( I'm going to add one more. The spec says this is a file URL: fileurl = "file://" [ host | "localhost" ] "/" fpath But on Windows, urllib2.urlopen() throws up even on URLs like: file:///c:/bootlog.txt and file://localhost/c:/bootlog.txt AFAICT, those conform to the spec (the first with an empty host, the second with the special reserved hostname), Windows has no problem with either of them (heck, in Outlook I can click on them while I'm typing this email -- works fine), but urllib2 mangles them into (repr) '\\c:\\bootlog.txt', which Windows has no idea what to do with. Hard to see why it should, either. > First, test_urllib2 says: > > file_url = "file://%s" % urllib2.__file__ > > This is not going to construct a strictly standards conforming URL on > Windows but that form is still common enough and obvious enough that > maybe we should support it. Common among what? > So that's problem #1, we aren't compatible with mildly broken Windows > file URLs. I haven't found a sense in which Windows file URLs are broken. test_urllib2 creates bad URLs on Windows, and urllib2 itself transforms legit file URLs into broken ones on Windows, but both of those appear to be our (Python's) fault. Until std stuff works, worrying about extensions to the std seems premature. > Problem #2 is that the test program generates mildly broken URLs > on Windows. Yup. 
> That begs the question of what IS the right way to construct file urls > in a cross-platform manner. The spec seems vaguely clear to me on this point (it's vaguely unclear to me whether a colon is allowed in an fpath -- the text seems to say one thing but the BNF another). > I would have thought that urllib.pathname2url was the way but I note > that it isn't documented. Plus it is poorly named. A function that > does this: > > """Convert a DOS path name to a file url. > > C:\foo\bar\spam.foo > > becomes > > ///C|/foo/bar/spam.foo > """ > > is not really constructing a URL! Or anything else recognizable . > And the semantics of the function on multiple platforms do not seem > to me to be identical. On Windows it adds a bunch of leading slashes > and mac and Unix seem not to. So you can't safely paste a "file:" or > "file://" on the front. I don't know how widely pathname2url has been > used even though it is undocumented....should we fix it and document > it or write a new function? Maybe it's just time to write urllib3.py <0.8 wink>. no-conclusions-from-me-ly y'rs - tim From tim at digicool.com Thu Jun 7 07:16:37 2001 From: tim at digicool.com (Tim Peters) Date: Thu, 7 Jun 2001 01:16:37 -0400 Subject: [Python-Dev] Shouldn't I be able to print Unicode objects? In-Reply-To: <3B1DE3C1.90BA3DD6@lemburg.com> Message-ID: [M.-A. Lemburg] > Wasn't me for sure... even in the Unicode age, I believe that > Python source code should maintain readability by not allowing > all alpha(numeric) characters for use in identifiers (there are > lots of them in Unicode). > > Shouldn't we fix the tokenizer to explicitely check for 'a'...'z' > and 'A'...'Z' ?! (same for digits) ?! That's certain to break code, and it's certain that some of those whose code gets broken would scream very loudly about it. OTOH, nobody would come to its defense with a hearty "whew! I'm so glad *that* hole finally got plugged!". I'm sure it would cause less trouble to take away <> as an alternative spelling of != (except that Barry is actually close enough to strangle Guido a few days each week ). Is it worth the hassle? I don't know, but I'd *guess* Guido would rather endure the complaints for something more substantial (like, say, breaking 10 lines of an expert's obscure code that relies on int() being a builtin instead of a class ). From fredrik at pythonware.com Thu Jun 7 07:50:35 2001 From: fredrik at pythonware.com (Fredrik Lundh) Date: Thu, 7 Jun 2001 07:50:35 +0200 Subject: [Python-Dev] Shouldn't I be able to print Unicode objects? References: Message-ID: <004c01c0ef15$c97f51d0$4ffa42d5@hagrid> Tim Peters wrote:> > Shouldn't we fix the tokenizer to explicitely check for 'a'...'z' > > and 'A'...'Z' ?! (same for digits) ?! > > That's certain to break code, and it's certain that some of those whose code > gets broken would scream very loudly about it. I don't get it. If people use non-ascii characters, they're clearly not using Python. from the language reference: ... Python uses the 7-bit ASCII character set for program text and string literals. ... Identifiers (also referred to as names) are described by the following lexical definitions: identifier: (letter|"_") (letter|digit|"_")* letter: lowercase | uppercase lowercase: "a"..."z" uppercase: "A"..."Z" digit: "0"..."9" Identifiers are unlimited in length. Case is significant ... either change the specification, and break every single tool written by anyone who actually bothered to read the specification [1], or add a warning to 2.2. 
1) I assume the specification didn't exist when GvR wrote the first CPython implementation ;-) From tim.one at home.com Thu Jun 7 08:15:35 2001 From: tim.one at home.com (Tim Peters) Date: Thu, 7 Jun 2001 02:15:35 -0400 Subject: [Python-Dev] Shouldn't I be able to print Unicode objects? In-Reply-To: <004c01c0ef15$c97f51d0$4ffa42d5@hagrid> Message-ID: [/F] > I don't get it. If people use non-ascii characters, they're clearly not > using Python. from the language reference: My *first* reply in this thread said the lang ref required this. That doesn't mean people read the ref. IIRC, you were one of the most strident complainers about list.append(1, 2, 3) "breaking", so just rekindle that mindset but intensify it fueled by nationalism <0.5 wink>. > ... > either change the specification, and break every single tool written by > anyone who actually bothered to read the specification [1], or add a > warning to 2.2. This is up to Guido; doesn't affect my code one way or the other (and, yes, e.g., IDLE's parser follows the manual here). > ... > 1) I assume the specification didn't exist when GvR wrote the first > CPython implementation ;-) Thanks to the magic of CVS, you can see that the BNF for identifiers has remained unchanged since it was first checked in (Thu Nov 21 13:53:03 1991 rev 1.1 of ref1.tex). The problem is that locale was a new-fangled idea then, and I believe Guido simply didn't anticipate isalpha() and isalnum() would vary across non-EBCDIC platforms. From mal at lemburg.com Thu Jun 7 10:29:52 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Thu, 07 Jun 2001 10:29:52 +0200 Subject: [Python-Dev] Shouldn't I be able to print Unicode objects? References: <001001c0eee5$571a8090$060210ac@private> <3B1ECDDB.F1E8B19D@ActiveState.com> Message-ID: <3B1F3B80.DB8F4117@lemburg.com> Paul Prescod wrote: > > Barry Scott wrote: > > > > > Shouldn't we fix the tokenizer to explicitly check for 'a'...'z' > > > and 'A'...'Z' ?! (same for digits) ?! > > > > If you embrace the world then NO. If America is you world then maybe. > > Actually, if we were really going to embrace the world we'd need to > handle more than a few European languages! I was just suggesting to make the parser actually do what the language spec defines. And yes: I don't like non-ASCII identifiers (even though I live in Europe). This is just bound to cause trouble, e.g. people forgetting accents on characters, editors displaying code using wild approximations of what the code author intended to write, etc. -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From mal at lemburg.com Thu Jun 7 10:42:40 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Thu, 07 Jun 2001 10:42:40 +0200 Subject: [Python-Dev] Shouldn't I be able to print Unicode objects? References: Message-ID: <3B1F3E80.F8CC16D7@lemburg.com> Tim Peters wrote: > > [M.-A. Lemburg] > > Wasn't me for sure... even in the Unicode age, I believe that > > Python source code should maintain readability by not allowing > > all alpha(numeric) characters for use in identifiers (there are > > lots of them in Unicode). > > > > Shouldn't we fix the tokenizer to explicitely check for 'a'...'z' > > and 'A'...'Z' ?! (same for digits) ?! > > That's certain to break code, and it's certain that some of those whose code > gets broken would scream very loudly about it. OTOH, nobody would come to > its defense with a hearty "whew! 
I'm so glad *that* hole finally got > plugged!". I'm sure it would cause less trouble to take away <> as an > alternative spelling of != (except that Barry is actually close enough to > strangle Guido a few days each week ). Is it worth the hassle? I > don't know, but I'd *guess* Guido would rather endure the complaints for > something more substantial (like, say, breaking 10 lines of an expert's > obscure code that relies on int() being a builtin instead of a class > ). Ok, point taken... still, it's funny sometimes how pydevs are willing to break perfectly valid code in some areas while not considering pointing users to clean up invalid code in other areas. -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From thomas at xs4all.net Thu Jun 7 14:03:20 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Thu, 7 Jun 2001 14:03:20 +0200 Subject: [Python-Dev] Shouldn't I be able to print Unicode objects? In-Reply-To: <3B1F3E80.F8CC16D7@lemburg.com>; from mal@lemburg.com on Thu, Jun 07, 2001 at 10:42:40AM +0200 References: <3B1F3E80.F8CC16D7@lemburg.com> Message-ID: <20010607140320.Z690@xs4all.nl> On Thu, Jun 07, 2001 at 10:42:40AM +0200, M.-A. Lemburg wrote: > still, it's funny sometimes how pydevs are willing to break perfectly > valid code in some areas while not considering pointing users to clean up > invalid code in other areas. Well, I consider myself one of the more backward-oriented people on py-dev (or at least a vocal member of that sub-group ;) and I don't think changing int et al to be types/class-constructors is a problem. People who rely on int being a *function*, rather than being a callable, are either writing a python-specific script, a quick hack, or really, really know what they are getting into. I'm also not terribly worried about the use of non-ASCII characters in identifiers in Python, though a warning for the next one or two releases would be a good thing -- if anything, it should warn that that trick won't work for people with different locale settings! -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From mwh at python.net Thu Jun 7 14:54:55 2001 From: mwh at python.net (Michael Hudson) Date: Thu, 7 Jun 2001 13:54:55 +0100 (BST) Subject: [Python-Dev] python-dev summary 2001-05-24 - 2001-06-07 Message-ID: This is a summary of traffic on the python-dev mailing list between May 24 and Jun 7 (inclusive) 2001. It is intended to inform the wider Python community of ongoing developments. To comment, just post to python-list at python.org or comp.lang.python in the usual way. Give your posting a meaningful subject line, and if it's about a PEP, include the PEP number (e.g. Subject: PEP 201 - Lockstep iteration) All python-dev members are interested in seeing ideas discussed by the community, so don't hesitate to take a stance on a PEP if you have an opinion. This is the ninth summary written by Michael Hudson. 
Summaries are archived at: Posting distribution (with apologies to mbm) Number of articles in summary: 305 [bar chart of daily posting volume; the daily article counts, Thu 24 through Wed 06, were: 18, 14, 11, 14, 20, 19, 34, 35, 32, 14, 8, 20, 51, 15] Another busy-ish fortnight. I've been in Exam Hell(tm) and am writing this when hungover, so this summary might be a bit sketchier than normal. Apologies in advance. * strop vs. string * Greg Stein leapt up to defend the slated-to-be-deprecated strop module by pointing out that its functions work on any object that supports the buffer API, whereas the 1.6-era string.py only works with objects that sprout the right methods: The discussion quickly degenerated into the usual griping about the fact that the buffer API is flawed and undocumented and not really well understood by many people. * Special-casing "O" * As a followup to the discussion mentioned in the last summary, Martin von Loewis posted a patch to sf enabling functions written in C that expect zero or one object arguments to dispense with the time-wasting call to PyArg_ParseTuple: The first version of the patch was criticized for being overly general, and for not being general enough . It seems the forces of simplicity have won, but I don't think the patch has been checked in yet. * the late, unlamented, yearly list.append panic * Tim Peters posted that c.l.py has rediscovered the quadratic-time worst-case behavior of list.append(). And then ameliorated the worst-case behaviour. So that one was easy. * making dicts ... * You might think that, as dictionaries are so central to Python, their implementation would be bulletproof and one of the areas of the source least likely to change. This might be true *now*; Tim Peters seems to have spent most of the last fortnight implementing performance improvements one after the other and fixing core-dumping holes in the implementation pointed out by Michael Hudson. The first improvement was "using polynomial division instead of multiplication for generating the probe sequence, as a way to get all the bits of the hash code into play." If you don't understand what that means, ignore it, because Tim came up with a more radical rewrite: which seems to be a win, but sadly removes the shock of finding comments about Galois theory in dictobject.c... Most of the discussion in the thread following Tim's patch was about whether we need 128-bit floats or ints, which is another way of saying everyone liked it :-) This one hasn't been checked in either.
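[The rewrite is easier to picture in Python than in C. A sketch of the probe-sequence idea in Tim's patch -- the function and its n-probe interface are invented here for illustration, and h is assumed non-negative; the real thing is the lookup loop in dictobject.c:]

    def probe_slots(h, mask, n, PERTURB_SHIFT=5):
        """Return the first n table slots visited for hash h.

        Table sizes are powers of 2, so "i & mask" means "i % table_size".
        """
        slots = []
        i = h & mask
        perturb = h
        while len(slots) < n:
            slots.append(i & mask)
            # i = 5*i + 1 alone eventually visits every slot of the table;
            # folding in the shifted-down hash bits ("perturb") lets the
            # high bits of the hash break up collision clusters.
            i = 5*i + perturb + 1
            perturb >>= PERTURB_SHIFT
        return slots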
* ... and breaking dicts * Inspired by a post to comp.lang.python by Wolfgang Lipp and driven slightly insane by revision, Michael Hudson posted a short program that used a hole in the dict implementation to trigger a core dump: This got fixed, so he did it again: The cause of both problems was C code assuming things about dictionaries remained the same across calls to code that ended up executing arbitrary Python code, which could mutate the dict exactly as much as it pleased, which in turn caused pointers to dangle. This problem has a history in Python; the .sort() method on lists has to fight the same issues. These holes have been plugged, although it is still possible to crash Python with exceptionally contrived code: There's another approach, which is what the .sort() method uses:

    >>> list = range(10)
    >>> def c(x,y):
    ...     del list[:]
    ...     return cmp(x, y)
    ...
    >>> list.sort(c)
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
      File "<stdin>", line 2, in c
    TypeError: a list cannot be modified while it is being sorted

The .sort() method magically changes the type of the list being sorted to one that doesn't support mutation while it's sorting the list. This approach would have some merit for dictionaries too; for one thing, we could lose all the contrived code in dictobject.c protecting against this sort of silliness... * arbitrary radix formatting * Greg Wilson made a plea for the addition of a "%b" formatting operator to display integers in binary, e.g.:

    >>> print "%d %x %o %b"%(10,10,10,10)
    10 a 12 1010

There was general support for the idea, but Tim Peters and Greg Ewing pointed out that it would be neater to invent a general format code that would enable one to format an integer into an arbitrary base, so that

    >>> int("1111", 7)
    400

has an inverse at long last. But no-one could think of a spelling that wasn't in general use, and the discussion died :-(. * quick poll * Guido asked if anyone would object violently to the builtin conversion functions becoming type objects on the descr-branch: in analogy to class objects. There was general support and only a few concerns, and the changes have begun to hit descr-branch. I'm sure I'm not the only one who wishes they had the time to understand what is going on in there... Cheers, M.
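[The quick-poll change is easy to show; a sketch of the intended behaviour as it later appeared in 2.2, abbreviated:]

    >>> int                    # a type object, no longer a plain builtin function
    <type 'int'>
    >>> int("ff", 16)          # still works as the conversion function
    255
    >>> isinstance(255, int)   # and now doubles as a type test
    1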
From gmcm at hypernet.com Thu Jun 7 15:06:55 2001 From: gmcm at hypernet.com (Gordon McMillan) Date: Thu, 7 Jun 2001 09:06:55 -0400 Subject: [Python-Dev] urllib2 In-Reply-To: References: <3B1EB624.563DABE0@ActiveState.com> Message-ID: <3B1F442F.26920.1ECC32A9@localhost> [Tim & Paul on file URLs] [Tim] > But on Windows, urllib2.urlopen() throws up even on URLs like: > > file:///c:/bootlog.txt Curiously enough, url = "file:///" + urllib.quote_plus(fnm) seems to work on Windows. It even seems to work on Mac, if you first turn '/' into '%2f', then undo the double quoting (turn '%252f' back into '%2f' in the ensuing url). It even seems to work on Mac directory names with Unicode characters in them (though I haven't looked too closely, in fear of jinxing it). eye-of-newt-considered-helpful-ly y'rs - Gordon From pedroni at inf.ethz.ch Thu Jun 7 15:56:30 2001 From: pedroni at inf.ethz.ch (Samuele Pedroni) Date: Thu, 7 Jun 2001 15:56:30 +0200 (MET DST) Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects? Message-ID: <200106071356.PAA04511@core.inf.ethz.ch> Hi. [GvR] > > Is the intent of using int and friends as constructors instead of just > > coercion functions that I should (eventually) be able to do this: > > > > class NonNegativeInt(int): > >     def __init__(self, val): > >         if int(val) < 0: > >             raise ValueError, "Value must be >= 0" > >         int.__init__(self, val) > >         self.a = 47 > >         ... > > > > ? > > Yes, sort-of. The details will be slightly different. I'm not > comfortable with letting a user-provided __init__() method change the > value of self, so I am brooding on a work-around that separates > allocation and one-time initialization from __init__(). Watch PEP > 253. jython already vaguely supports this:

    from types import IntType as Int

    class NonNegInt(Int):
        def __init__(self, val, annot=None):
            if int(val) < 0: raise ValueError, "val<0"
            Int.__init__(self, val)
            self._annot = annot
        def neg(self):
            return -self
        def __add__(self, b):
            if type(b) is NonNegInt:
                return NonNegInt(Int.__add__(self, b))
            return Int.__add__(self, b)
        def annot(self):
            return self._annot

    Jython 2.0 on java1.3.0 (JIT: null)
    Type "copyright", "credits" or "license" for more information.
    >>> from NonNegInt import NonNegInt
    >>> x=NonNegInt(-2)
    Traceback (innermost last):
      File "<console>", line 1, in ?
      File "/home/pedroni/BOX/exp/NonNegInt.py", line 5, in __init__
    ValueError: val<0
    >>> x=NonNegInt(2)
    >>> y=NonNegInt(3,"foo")
    >>> y._annot
    Traceback (innermost last):
      File "<console>", line 1, in ?
    AttributeError: 'int' object has no attribute '_annot'
    >>> y.annot()
    Traceback (innermost last):
      File "<console>", line 1, in ?
      File "/home/pedroni/BOX/exp/NonNegInt.py", line 15, in annot
    AttributeError: 'int' object has no attribute '_annot'
    >>> x+y, type(x+y)
    (5, )
    >>> x.neg()
    -2
    >>> x+(-2),type(x+(-2))
    (0, )
    >>>

As one can see, the semantics are not without holes. The support for this is mainly a side-effect of the fact that internally jython objects are instances of java classes and jython allows subclassing java classes. I have no idea whether someone is already using this kind of stuff; I just remember that someone reported a bug concerning subclassing ListType, so ... By the way, int and long being types seems nice and elegant to me. A more general note FYI: I have read the PEP drafts about descrs and type as classes; I have not played with the descr-branch yet. I think that the descr and metaclasses stuff can help on the jython side to put a lot of things (dealing with java classes, subclassing from them, etc) in a more precise framework, polishing up many design aspects and the code. First, I suppose that backward compatibility on the jython side is not a real problem; these aspects are so under-documented that there are no promises about them. On the other hand, until we start coding things on the jython side (it's complex stuff and jython internals are already complex) it will be really difficult to make constructive comments on possible problems for jython, or toward a design that better fits both jython and CPython needs. Given that we are still working on jython 2.1, maybe we will be able to start working on jython 2.2 only late in the 2.2 release cycle, when things are more or less fixed and we can only do our best to re-implement them. regards Samuele Pedroni.
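[The work-around Guido alludes to above eventually surfaced as the separate __new__() constructor of PEP 253. Under that scheme, Samuele's example comes out roughly like this -- a sketch of the later spelling, not code that runs on 2.1 or on Jython 2.0:]

    class NonNegativeInt(int):
        def __new__(cls, val, annot=None):
            # Validate before the immutable int value is created, so a
            # user-provided method never changes the value of self.
            if int(val) < 0:
                raise ValueError, "Value must be >= 0"
            self = int.__new__(cls, val)
            self._annot = annot
            return self
        def annot(self):
            return self._annot

    # NonNegativeInt(5) + 3 == 8, while NonNegativeInt(-1) raises
    # ValueError before the instance ever exists.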
From Greg.Wilson at baltimore.com Thu Jun 7 18:03:44 2001 From: Greg.Wilson at baltimore.com (Greg Wilson) Date: Thu, 7 Jun 2001 12:03:44 -0400 Subject: [Python-Dev] re: %b format (no, really) Message-ID: <930BBCA4CEBBD411BE6500508BB3328F2E20EF@nsamcanms1.ca.baltimore.com> Prompted in part by the comment in Michael Hudson's python-dev summary about this discussion having died, I'd like to summarize:

1. Most people who commented felt that a base-2 format would be useful, if only for teaching and debugging. With regard to questions about byte order:

   A. Integer values are printed as base-2 numbers, so byte order is irrelevant.

   B. Floating-point numbers are printed as: [sign] [mantissa] [exponent] The mantissa and exponent are shown according to rule A.

2. Inventing a format for converting to arbitrary bases is dubious hypergeneralization (to borrow a phrase).

3. Implementation should mirror octal and hexadecimal support, e.g. a 'bin()' function to go with 'oct()' and 'hex()'.

4. The desirability or otherwise of a "%b" format specifier has nothing to do with the relative merits of any early microprocessor :-).

If no-one has strong objections, I'll put together a PEP on this basis. Thanks Greg From greg at cosc.canterbury.ac.nz Fri Jun 8 02:55:05 2001 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Fri, 08 Jun 2001 12:55:05 +1200 (NZST) Subject: [Python-Dev] re: %b format (no, really) In-Reply-To: <930BBCA4CEBBD411BE6500508BB3328F2E20EF@nsamcanms1.ca.baltimore.com> Message-ID: <200106080055.MAA02711@s454.cosc.canterbury.ac.nz> Greg Wilson : [good stuff about binary format support] > If no-one has strong objections, I'll put together a > PEP on this basis. Sounds okay to me. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg at cosc.canterbury.ac.nz +--------------------------------------+ From tim.one at home.com Fri Jun 8 03:39:53 2001 From: tim.one at home.com (Tim Peters) Date: Thu, 7 Jun 2001 21:39:53 -0400 Subject: [Python-Dev] Shouldn't I be able to print Unicode objects? In-Reply-To: <20010607140320.Z690@xs4all.nl> Message-ID: [Thomas Wouters] > ... > I'm also not terribly worried about the use of non-ASCII characters in > identifiers in Python, though a warning for the next one or two releases > would be a good thing -- if anything, it should warn that that trick > won't work for people with different locale settings! Fine by me!
Someone who cares enough to write the warning code and docs should just do so, although it may be wise to secure Guido's blessing first. From skip at pobox.com Fri Jun 8 16:51:27 2001 From: skip at pobox.com (Skip Montanaro) Date: Fri, 8 Jun 2001 09:51:27 -0500 Subject: [Python-Dev] sys.modules["__main__"] in Jython Message-ID: <15136.58991.72069.433197@beluga.mojam.com> Would someone with Jython experience check to see if it interprets sys.modules["__main__"] in the same manner as Python? I'm interested to see if doctest's normal usage can be simplified slightly. The doctest documentation states: In normal use, end each module M with: def _test(): import doctest, M # replace M with your module's name return doctest.testmod(M) # ditto if __name__ == "__main__": _test() I'm wondering if this works for Jython as well as Python: def _test(): import doctest, sys return doctest.testmod(sys.modules["__main__"]) if __name__ == "__main__": _test() If so, then I think doctest.testmod's signature can be changed to def testmod(m=None, name=None, globs=None, verbose=None, isprivate=None, report=1): with the following extra code added to the start of the function: if m is None: import sys m = sys.modules["__main__"] That way the most common doctest usage can be changed to def _test(): import doctest return doctest.testmod() if __name__ == "__main__": _test() (I ran into a problem with a module that had initialization code that barfed if executed more than once.) Of course, these changes are ultimately Tim's decision. I'm just trying to knock down various potential hurdles. Thx, Skip From guido at digicool.com Fri Jun 8 18:06:19 2001 From: guido at digicool.com (Guido van Rossum) Date: Fri, 08 Jun 2001 12:06:19 -0400 Subject: [Python-Dev] re: %b format (no, really) In-Reply-To: Your message of "Fri, 08 Jun 2001 12:01:37 EDT." References: Message-ID: <200106081606.f58G6Jj11829@odiug.digicool.com> > Prompted in part by the comment in Michael Hudson's > python-dev summary about this discussion having died, > I'd like to summarize: > > 1. Most people who commented felt that a base-2 format > would be useful, if only for teaching and debugging. > With regard to questions about byte order: > > A. Integer values are printed as base-2 numbers, so > byte order is irrelevant. > > B. Floating-point numbers are printed as: > > [sign] [mantissa] [exponent] > > The mantissa and exponent are shown according > to rule A. Why bother with floats at all? We can't print floats as hex either. If I were doing any kind of float-representation fiddling, I'd probably want to print it in hex anyway (I can read hex). But as I say, that's not for the general public. > 2. Inventing a format for converting to arbitrary > bases is dubious hypergeneralization (to borrow a > phrase). Agreed. > 3. Implementation should mirror octal and hexadecimal > support, e.g. a 'bin()' function to go with 'oct()' > and 'hex()'. > > 4. The desirability or otherwise of a "%b" format > specifier has nothing to do with the relative > merits of any early microprocessor :-). > > If no-one has strong objections, I'll put together a > PEP on this basis. Go for it. Or just submit a patch to SF -- this seems almost too small for a PEP to me. :-) --Guido van Rossum (home page: http://www.python.org/~guido/) From barry at digicool.com Fri Jun 8 18:10:50 2001 From: barry at digicool.com (Barry A. 
Warsaw) Date: Fri, 8 Jun 2001 12:10:50 -0400 Subject: [Python-Dev] re: %b format (no, really) References: <200106081606.f58G6Jj11829@odiug.digicool.com> Message-ID: <15136.63754.927103.77358@anthem.wooz.org> >>>>> "GvR" == Guido van Rossum writes: GvR> Go for it. Or just submit a patch to SF -- this seems almost GvR> too small for a PEP to me. :-) Since we all seem to agree, I'd agree. :) From Greg.Wilson at baltimore.com Fri Jun 8 18:14:14 2001 From: Greg.Wilson at baltimore.com (Greg Wilson) Date: Fri, 8 Jun 2001 12:14:14 -0400 Subject: [Python-Dev] re: %b format (no, really) Message-ID: <930BBCA4CEBBD411BE6500508BB3328F2E2179@nsamcanms1.ca.baltimore.com> > > Greg: > > B. Floating-point numbers are printed as: > > [sign] [mantissa] [exponent] > Guido: > Why bother with floats at all? For teaching purposes, which is what started me on this in the first place --- I would like an easy way to show people the bit patterns corresponding to basic types. > Guido: > Go for it. Or just submit a patch to SF -- this seems almost too > small for a PEP to me. :-) Thanks, Greg From esr at snark.thyrsus.com Fri Jun 8 18:23:34 2001 From: esr at snark.thyrsus.com (Eric S. Raymond) Date: Fri, 8 Jun 2001 12:23:34 -0400 Subject: [Python-Dev] Glowing endorsement of open source and Python Message-ID: <200106081623.f58GNYf22712@snark.thyrsus.com> It doesn't get much better than this: http://it.mycareer.com.au/news/2001/06/05/FFX9ZT7UENC.html -- Eric S. Raymond In the absence of any evidence tending to show that possession or use of a 'shotgun having a barrel of less than eighteen inches in length' at this time has some reasonable relationship to the preservation or efficiency of a well regulated militia, we cannot say that the Second Amendment guarantees the right to keep and bear such an instrument. [...] The Militia comprised all males physically capable of acting in concert for the common defense. -- Majority Supreme Court opinion in "U.S. vs. Miller" (1939) From mal at lemburg.com Fri Jun 8 19:08:53 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Fri, 08 Jun 2001 19:08:53 +0200 Subject: [Python-Dev] Glowing endorsement of open source and Python References: <200106081623.f58GNYf22712@snark.thyrsus.com> Message-ID: <3B2106A5.FD16D95C@lemburg.com> "Eric S. Raymond" wrote: > > It doesn't get much better than this: > > http://it.mycareer.com.au/news/2001/06/05/FFX9ZT7UENC.html I wonder what those MS Office XP ads are doing on that page...
-- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From tim.one at home.com Fri Jun 8 19:21:10 2001 From: tim.one at home.com (Tim Peters) Date: Fri, 8 Jun 2001 13:21:10 -0400 Subject: [Python-Dev] re: %b format (no, really) In-Reply-To: <930BBCA4CEBBD411BE6500508BB3328F2E2179@nsamcanms1.ca.baltimore.com> Message-ID: [Guido] > Why bother with floats at all? [Greg Wilson] > For teaching purposes, which is what started me on this > in the first place --- I would like an easy way to show > people the bit patterns corresponding to basic types. I'm confused by this: while for integers the bits correspond very clearly to what's stored in the machine, if you separate the mantissa and exponent for floats the result won't "look like" the storage at all. Please give an example first, like what do you intend to produce for

    print "%b" % 0.1
    print "%b" % -42e300

? You have to make decisions about whether or not to unbias the exponent for display (if you don't, it's incomprehensible; if you do, it's not really what's stored); whether or not to materialize the implicit most-significant mantissa bit in 754 normalized values (pretty much ditto); and what to do about Infs, NaNs, signed zeroes and denormal numbers. The kicker is that, to be truly useful for teaching floats, you need a way to select among all combinations of "yes" and "no" for each such decision. A single fixed set of answers will confound more than clarify; e.g., it's important to know what the "true exponent" is, but also to know what biased exponents look like inside the box. This is too much for %b -- write a float-format module instead. From Greg.Wilson at baltimore.com Fri Jun 8 19:34:13 2001 From: Greg.Wilson at baltimore.com (Greg Wilson) Date: Fri, 8 Jun 2001 13:34:13 -0400 Subject: [Python-Dev] RE: %b format (no, really) Message-ID: <930BBCA4CEBBD411BE6500508BB3328F2E2188@nsamcanms1.ca.baltimore.com> > [Guido] > > Why bother with floats at all? > > [Greg Wilson] > > For teaching purposes > [Tim Peters] > if you separate the mantissa and exponent > for floats the result won't "look like" the storage at all. > Please give an example first This is part of what was going to go into the PEP, along with what to do about character data (I've had a couple of emails from people who'd like to be able to look at 8-bit and Unicode characters as bit patterns). > This is too much for %b -- write a float-format module instead. How about a quick patch to do "%b" for int and long-int, and a PEP for a generic "format" module --- arbitrary radix, options for IEEE numbers, etc.? Any objections? Greg
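[The integer half really is small. A pure-Python sketch of the kind of helper a "format" module could offer -- the name to_base is invented here:]

    def to_base(n, base=2, digits="0123456789abcdefghijklmnopqrstuvwxyz"):
        """Format an integer in any base from 2 to 36."""
        if n < 0:
            return "-" + to_base(-n, base, digits)
        out = []
        while 1:
            n, r = divmod(n, base)
            out.append(digits[r])
            if n == 0:
                break
        out.reverse()
        return "".join(out)

    # to_base(10) == "1010" (the proposed "%b" % 10), and
    # int(to_base(n, b), b) == n, giving int("1111", 7) its inverse.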
From esr at thyrsus.com Fri Jun 8 19:44:40 2001 From: esr at thyrsus.com (Eric S. Raymond) Date: Fri, 8 Jun 2001 13:44:40 -0400 Subject: [Python-Dev] RE: %b format (no, really) In-Reply-To: <930BBCA4CEBBD411BE6500508BB3328F2E2188@nsamcanms1.ca.baltimore.com>; from Greg.Wilson@baltimore.com on Fri, Jun 08, 2001 at 01:34:13PM -0400 References: <930BBCA4CEBBD411BE6500508BB3328F2E2188@nsamcanms1.ca.baltimore.com> Message-ID: <20010608134440.A23160@thyrsus.com> Greg Wilson : > How about a quick patch to do "%b" for int and long-int, and a > PEP for a generic "format" module --- arbitrary radix, options > for IEEE numbers, etc.? Any objections? I like it. -- Eric S. Raymond The people cannot delegate to government the power to do anything which would be unlawful for them to do themselves. -- John Locke, "A Treatise Concerning Civil Government" From tim.one at home.com Fri Jun 8 19:51:50 2001 From: tim.one at home.com (Tim Peters) Date: Fri, 8 Jun 2001 13:51:50 -0400 Subject: [Python-Dev] RE: %b format (no, really) In-Reply-To: <930BBCA4CEBBD411BE6500508BB3328F2E2188@nsamcanms1.ca.baltimore.com> Message-ID: [Greg Wilson] > How about a quick patch to do "%b" for int and long-int, Don't know how quick it will be (it should cover type slots and bin() and __bin__ and 0b1101 notation too, right?), but +1 from me. That much is routinely requested. > and a PEP for a generic "format" module --- arbitrary radix, options > for IEEE numbers, etc.? Any objections? None here. From bckfnn at worldonline.dk Fri Jun 8 21:15:14 2001 From: bckfnn at worldonline.dk (Finn Bock) Date: Fri, 08 Jun 2001 19:15:14 GMT Subject: [Python-Dev] sys.modules["__main__"] in Jython In-Reply-To: <15136.58991.72069.433197@beluga.mojam.com> References: <15136.58991.72069.433197@beluga.mojam.com> Message-ID: <3b212431.21754982@smtp.worldonline.dk> [Skip] >Would someone with Jython experience check to see if it interprets >sys.modules["__main__"] in the same manner as Python? To me it seems like Jython defines sys.modules["__main__"] in the same way as CPython. >I'm wondering if this works for Jython as well as Python:
>
>    def _test():
>        import doctest, sys
>        return doctest.testmod(sys.modules["__main__"])
>
>    if __name__ == "__main__":
>        _test()
It works for Jython. regards, finn From thomas at xs4all.net Fri Jun 8 23:41:02 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Fri, 8 Jun 2001 23:41:02 +0200 Subject: [Python-Dev] Glowing endorsement of open source and Python In-Reply-To: <200106081623.f58GNYf22712@snark.thyrsus.com>; from esr@snark.thyrsus.com on Fri, Jun 08, 2001 at 12:23:34PM -0400 References: <200106081623.f58GNYf22712@snark.thyrsus.com> Message-ID: <20010608234102.B690@xs4all.nl> On Fri, Jun 08, 2001 at 12:23:34PM -0400, Eric S. Raymond wrote: > It doesn't get much better than this: > http://it.mycareer.com.au/news/2001/06/05/FFX9ZT7UENC.html It's a nice (and very flattering!) piece, but it's a tad buzzword heavy. "[Python] supports XML for e-commerce and mobile applications" ? Well, shit, so *that*'s what XML is for :) -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread!
From tim.one at home.com Sat Jun 9 00:02:06 2001 From: tim.one at home.com (Tim Peters) Date: Fri, 8 Jun 2001 18:02:06 -0400 Subject: [Python-Dev] sys.modules["__main__"] in Jython In-Reply-To: <3b212431.21754982@smtp.worldonline.dk> Message-ID: [Finn Bock] > To me it seems like Jython defines sys.modules["__main__"] in the same > way as CPython. Thank you, Finn! doctest has always avoided introspection tricks for which Jython doesn't work "exactly the same way" as CPython. However, in the past it achieved this by not paying any attention , then ripping out bad ideas when a Jython user reported failure. But now that it's in the std library, I want to proceed more carefully. Skip's idea is much more attractive now that you've confirmed it will work there too. From tim.one at home.com Sun Jun 10 03:10:53 2001 From: tim.one at home.com (Tim Peters) Date: Sat, 9 Jun 2001 21:10:53 -0400 Subject: [Python-Dev] Struct schizophrenia Message-ID: I'm adding "long long" integral types to struct (in native mode, "long long" or __int64 on platforms that have them; in standard mode, 64 bits). This is proving harder than it should be, because the code that's already there is schizophrenic across boundaries, so is failing as a base to build on (raises more questions than it answers). Like: >>> x = 256 >>> struct.pack("b", x) # complains about magnitude in native mode Traceback (most recent call last): File "", line 1, in ? struct.error: byte format requires -128<=number<=127 >>> struct.pack("=b", x) # but doesn't with native order + std align '\x00' >>> struct.pack(">> struct.pack(">> struct.pack("", line 1, in ? OverflowError: long int too large to convert >>> Much the same is true of other small int sizes: you can't predict what will happen without trying it; and once you get to ints, no range-checking is performed even in native mode. Surely this can't stand, but what do people *want*? My preference is to raise the same "byte format requires -128<=number<=127" exception in all these cases; OTOH, the code structure fights that, working with Python longs is clumsy in C, and there are other "undocumented features" here that may or may not be accidents: >>> struct.pack("B", 234.3) '\xea' >>> That is, did we *intend* to accept floats packed via integer typecodes? Feature or bug? In the other (unpack) direction, the docs say for 'I' (unsigned int): The "I" conversion code will convert to a Python long if the C int is the same size as a C long, which is typical on most modern systems. If a C int is smaller than a C long, an Python integer will be created instead. That's in a footnote. In another part, they say: For the "I" and "L" format characters, the return value is a Python long integer. The footnote is wrong -- but is the footnote what was intended (somebody went to a fair bit of work to write all the stuff )? From tim.one at home.com Sun Jun 10 06:25:51 2001 From: tim.one at home.com (Tim Peters) Date: Sun, 10 Jun 2001 00:25:51 -0400 Subject: [Python-Dev] FW: printing Python stack info from gdb Message-ID: Fwd'ing this Skip gem from c.l.py, primarily so I can find it again next time I'm thrashing on a Unix box. gdb clearly needs to adopt Python as its extension language. 
but-then-what-doesn't-ly y'rs - tim -----Original Message----- From: python-list-admin at python.org [mailto:python-list-admin at python.org]On Behalf Of Skip Montanaro Sent: Saturday, June 09, 2001 12:31 AM To: python-list at python.org Subject: printing Python stack info from gdb From tim.one at home.com Sun Jun 10 21:36:50 2001 From: tim.one at home.com (Tim Peters) Date: Sun, 10 Jun 2001 15:36:50 -0400 Subject: [Python-Dev] FW: list-display semantics? Message-ID: I opened a bug on this: If anyone's keen to play with the grammar, have at it! Everyone at PythonLabs would +1 it. -----Original Message----- From: python-list-admin at python.org [mailto:python-list-admin at python.org]On Behalf Of jainweiwu Sent: Sunday, June 10, 2001 2:30 PM To: python-list at python.org Subject: list-display semantics? Hi all: I tried the one-line command in a interaction mode: [x for x in [1, 2, 3], y for y in [4, 5, 6]] and the result surprised me, that is: [[1,2,3],[1,2,3],[1,2,3],9,9,9] Who can explain the behavior? Since I expected the result should be: [[1,4],[1,5],[1,6],[2,4],...] -- Pary All Rough Yet. parywu at seed.net.tw -- http://mail.python.org/mailman/listinfo/python-list From dan at cgsoftware.com Sun Jun 10 22:30:24 2001 From: dan at cgsoftware.com (Daniel Berlin) Date: 10 Jun 2001 16:30:24 -0400 Subject: [Python-Dev] FW: printing Python stack info from gdb In-Reply-To: ("Tim Peters"'s message of "Sun, 10 Jun 2001 00:25:51 -0400") References: Message-ID: <87n17grsbj.fsf@cgsoftware.com> "Tim Peters" writes: > Fwd'ing this Skip gem from c.l.py, primarily so I can find it again next > time I'm thrashing on a Unix box. gdb clearly needs to adopt Python as its > extension language. HP has patches to do this, actually. Works quite nicely. And trust me, i've tried to get them to do it more than once. As I pointed out to skip, if he can profile gdb and tell me where the slowness is, it's likely I can make it a ton faster. GDB could use major optimizations almost everywhere. And i've done quite a lot of them, they just haven't been reviewed/integrated yet. --Dan C++ support maintainer - GDB DWARF2 reader person - GDB Symbol table patch submitting weirdo - GDB etc > > but-then-what-doesn't-ly y'rs - tim > > -----Original Message----- > From: python-list-admin at python.org > [mailto:python-list-admin at python.org]On Behalf Of Skip Montanaro > Sent: Saturday, June 09, 2001 12:31 AM > To: python-list at python.org > Subject: printing Python stack info from gdb > > >>From time to time I've wanted to be able to print the Python stack from gdb. > Today I broke down and spent some time actually implementing something. > > set $__trimpath = 1 > define ppystack > set $__fr = 0 > select-frame $__fr > while !($pc > Py_Main && $pc < Py_GetArgcArgv) > if $pc > eval_code2 && $pc < set_exc_info > set $__fn = PyString_AsString(co->co_filename) > set $__n = PyString_AsString(co->co_name) > if $__n[0] == '?' 
> set $__n = "" > end > if $__trimpath > set $__f = strrchr($__fn, '/') > if $__f > set $__fn = $__f + 1 > end > end > printf "%s (%d): %s\n", $__fn, f->f_lineno, $__n > end > set $__fr = $__fr + 1 > select-frame $__fr > end > select-frame 0 > end > > Output looks like this (and dribbles out *quite slowly*): > > Text_Editor.py (147): apply_tag > Text_Editor.py (152): apply_tag_by_name > Script_GUI.py (302): push_help > Script_GUI.py (113): put_help > Script_GUI.py (119): focus_enter > Signal.py (34): handle_signal > Script_GUI.py (324): main > Script_GUI.py (338): > > If you don't want to trim the paths from the filenames, set $__trimpath to > 0. > > Warning: I've only tried this with a very recent CVS version of Python on a > PIII-based Linux system with an interpreter compiled using gcc. I rely on > the ordering of functions within the while loop to detect when to exit the > loop and when the frame I'm examining is an eval_code2 frame. I'm sure > there are plenty of people out there with more gdb experience than me. I > welcome any feedback on ways to improve this little bit of code. > > -- > Skip Montanaro (skip at pobox.com) > (847)971-7098 > > -- > http://mail.python.org/mailman/listinfo/python-list > > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev -- "I saw a man with a wooden leg, and a real foot. "-Steven Wright From greg at cosc.canterbury.ac.nz Mon Jun 11 04:44:54 2001 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Mon, 11 Jun 2001 14:44:54 +1200 (NZST) Subject: [Python-Dev] FW: list-display semantics? In-Reply-To: Message-ID: <200106110244.OAA03090@s454.cosc.canterbury.ac.nz> parywu at seed.net.tw: > [x for x in [1, 2, 3], y for y in [4, 5, 6]] > and the result surprised me, that is: > [[1,2,3],[1,2,3],[1,2,3],9,9,9] Did you by any chance execute that in an environment where y was previously bound to 9? It will be parsed as [x for x in ([1, 2, 3], y) for y in [4, 5, 6]] which should give a NameError if y is previously unbound, since it will try to evaluate ([1, 2, 3], y) before y is bound by the inner loop. But executing y = 9 beforehand will give the results you got. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg at cosc.canterbury.ac.nz +--------------------------------------+ From gstein at lyra.org Mon Jun 11 13:31:59 2001 From: gstein at lyra.org (Greg Stein) Date: Mon, 11 Jun 2001 04:31:59 -0700 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Objects object.c,2.124.4.11,2.124.4.12 In-Reply-To: ; from gvanrossum@users.sourceforge.net on Wed, Jun 06, 2001 at 07:34:15AM -0700 References: Message-ID: <20010611043158.E26210@lyra.org> On Wed, Jun 06, 2001 at 07:34:15AM -0700, Guido van Rossum wrote: > Update of /cvsroot/python/python/dist/src/Objects > In directory usw-pr-cvs1:/tmp/cvs-serv17474 > > Modified Files: > Tag: descr-branch > object.c > Log Message: > Add _PyObject_GetDictPtr() -- an internal API to get a pointer to > where __dict__ is stored in an object. The simplest case is to add > tp_dictoffset to the start of the object, but there are comlications: > tp_flags may tell us that tp_dictoffset is not defined, or the offset > may be negative: indexing from the end of the object, where > tp_itemsize may have to be taken into account. 
Why would you ever have a negative size in there? That seems like an unnecessary "feature". The offsets are easily set up by the compiler as positive values. (not even sure how you'd come up with a proper/valid negative value) Cheers, -g > > > Index: object.c > =================================================================== > RCS file: /cvsroot/python/python/dist/src/Objects/object.c,v > retrieving revision 2.124.4.11 > retrieving revision 2.124.4.12 > diff -C2 -r2.124.4.11 -r2.124.4.12 > *** object.c 2001/06/06 14:27:54 2.124.4.11 > --- object.c 2001/06/06 14:34:13 2.124.4.12 > *************** > *** 1074,1077 **** > --- 1074,1111 ---- > } > > + /* Helper to get a pointer to an object's __dict__ slot, if any */ > + > + PyObject ** > + _PyObject_GetDictPtr(PyObject *obj) > + { > + #define PTRSIZE (sizeof(PyObject *)) > + > + long dictoffset; > + PyTypeObject *tp = obj->ob_type; > + > + if (!(tp->tp_flags & Py_TPFLAGS_HAVE_CLASS)) > + return NULL; > + dictoffset = tp->tp_dictoffset; > + if (dictoffset == 0) > + return NULL; > + if (dictoffset < 0) { > + dictoffset += tp->tp_basicsize; > + assert(dictoffset > 0); /* Sanity check */ > + if (tp->tp_itemsize > 0) { > + int n = ((PyVarObject *)obj)->ob_size; > + if (n > 0) { > + dictoffset += tp->tp_itemsize * n; > + /* Round up, if necessary */ > + if (tp->tp_itemsize % PTRSIZE != 0) { > + dictoffset += PTRSIZE - 1; > + dictoffset /= PTRSIZE; > + dictoffset *= PTRSIZE; > + } > + } > + } > + } > + return (PyObject **) ((char *)obj + dictoffset); > + } > + > /* Generic GetAttr functions - put these in your tp_[gs]etattro slot */ > > *************** > *** 1082,1086 **** > PyObject *descr; > descrgetfunc f; > ! int dictoffset; > > if (tp->tp_dict == NULL) { > --- 1116,1120 ---- > PyObject *descr; > descrgetfunc f; > ! PyObject **dictptr; > > if (tp->tp_dict == NULL) { > *************** > *** 1097,1103 **** > } > > ! dictoffset = tp->tp_dictoffset; > ! if (dictoffset != 0) { > ! PyObject *dict = * (PyObject **) ((char *)obj + dictoffset); > if (dict != NULL) { > PyObject *res = PyDict_GetItem(dict, name); > --- 1131,1137 ---- > } > > ! dictptr = _PyObject_GetDictPtr(obj); > ! if (dictptr != NULL) { > ! PyObject *dict = *dictptr; > if (dict != NULL) { > PyObject *res = PyDict_GetItem(dict, name); > *************** > *** 1129,1133 **** > PyObject *descr; > descrsetfunc f; > ! int dictoffset; > > if (tp->tp_dict == NULL) { > --- 1163,1167 ---- > PyObject *descr; > descrsetfunc f; > ! PyObject **dictptr; > > if (tp->tp_dict == NULL) { > *************** > *** 1143,1149 **** > } > > ! dictoffset = tp->tp_dictoffset; > ! if (dictoffset != 0) { > ! PyObject **dictptr = (PyObject **) ((char *)obj + dictoffset); > PyObject *dict = *dictptr; > if (dict == NULL && value != NULL) { > --- 1177,1182 ---- > } > > ! dictptr = _PyObject_GetDictPtr(obj); > ! if (dictptr != NULL) { > PyObject *dict = *dictptr; > if (dict == NULL && value != NULL) { > > > _______________________________________________ > Python-checkins mailing list > Python-checkins at python.org > http://mail.python.org/mailman/listinfo/python-checkins -- Greg Stein, http://www.lyra.org/ From guido at digicool.com Mon Jun 11 14:57:18 2001 From: guido at digicool.com (Guido van Rossum) Date: Mon, 11 Jun 2001 08:57:18 -0400 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Objects object.c,2.124.4.11,2.124.4.12 In-Reply-To: Your message of "Mon, 11 Jun 2001 04:31:59 PDT." 
<20010611043158.E26210@lyra.org> References: <20010611043158.E26210@lyra.org> Message-ID: <200106111257.IAA03505@cj20424-a.reston1.va.home.com> > > Modified Files: > > Tag: descr-branch > > object.c > > Log Message: > > Add _PyObject_GetDictPtr() -- an internal API to get a pointer to > > where __dict__ is stored in an object. The simplest case is to add > > tp_dictoffset to the start of the object, but there are comlications: > > tp_flags may tell us that tp_dictoffset is not defined, or the offset > > may be negative: indexing from the end of the object, where > > tp_itemsize may have to be taken into account. > > Why would you ever have a negative size in there? That seems like an > unnecessary "feature". The offsets are easily set up by the compiler as > positive values. (not even sure how you'd come up with a proper/valid > negative value) When extending a type like tuple or string, the __dict__ has to be added to the end, after the last item, because we can't change the starting offset of the first item. This is not at a fixed offset from the start of the structure. --Guido van Rossum (home page: http://www.python.org/~guido/) From mal at lemburg.com Mon Jun 11 18:50:11 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Mon, 11 Jun 2001 18:50:11 +0200 Subject: [Python-Dev] Adding .decode() method to Unicode Message-ID: <3B24F6C3.C911C0BF@lemburg.com> I would like to add a .decode() method to Unicode objects and also enable the builtin unicode() to accept Unicode object as input. The .decode() method will work just like the .encode() method except that it interfaces to the decode API of the codec in question. While this may seem useless for the currently available encodings, it does have some use for codecs which recode Unicode to Unicode, e.g. codecs which do XML escaping or Unicode compression. Any objections ? -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From mal at lemburg.com Mon Jun 11 18:57:12 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Mon, 11 Jun 2001 18:57:12 +0200 Subject: [Python-Dev] Adding Asian codecs to the core Message-ID: <3B24F868.A3DFA649@lemburg.com> Tamito KAJIYAMA recently announced that he changed the licenses on his Japanese codecs from GPL to a BSD variant. This is great news since this would allow adding the codecs to the Python core which would certainly attract more users to Python in Asia. The codecs are available at: http://pseudo.grad.sccs.chukyo-u.ac.jp/~kajiyama/python/ The codecs are 280kB when compressed as .tar.gz file. Thoughts ? -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From aahz at rahul.net Mon Jun 11 19:42:30 2001 From: aahz at rahul.net (Aahz Maruch) Date: Mon, 11 Jun 2001 10:42:30 -0700 (PDT) Subject: [Python-Dev] Adding Asian codecs to the core In-Reply-To: <3B24F868.A3DFA649@lemburg.com> from "M.-A. Lemburg" at Jun 11, 2001 06:57:12 PM Message-ID: <20010611174230.0625E99C8D@waltz.rahul.net> M.-A. Lemburg wrote: > > Tamito KAJIYAMA recently announced that he changed the licenses > on his Japanese codecs from GPL to a BSD variant. This is great > news since this would allow adding the codecs to the Python core > which would certainly attract more users to Python in Asia. 
> > The codecs are 280kB when compressed as .tar.gz file. +0 I like the idea, am uncomfortable with that amount of space. -- --- Aahz (@pobox.com) Hugs and backrubs -- I break Rule 6 <*> http://www.rahul.net/aahz/ Androgynous poly kinky vanilla queer het Pythonista I don't really mind a person having the last whine, but I do mind someone else having the last self-righteous whine. From fdrake at cj42289-a.reston1.va.home.com Mon Jun 11 21:15:06 2001 From: fdrake at cj42289-a.reston1.va.home.com (Fred Drake) Date: Mon, 11 Jun 2001 15:15:06 -0400 (EDT) Subject: [Python-Dev] [development doc updates] Message-ID: <20010611191506.DC49F28923@cj42289-a.reston1.va.home.com> The development version of the documentation has been updated: http://python.sourceforge.net/devel-docs/ Substantial additional material on floating point arithmetic in the tutorial, written by Tim Peters to explain why FP can fail to reflect the decimal world presented to the user. Lots of additional updates and corrections. From guido at digicool.com Mon Jun 11 22:07:40 2001 From: guido at digicool.com (Guido van Rossum) Date: Mon, 11 Jun 2001 16:07:40 -0400 Subject: [Python-Dev] PEP 259: Omit printing newline after newline Message-ID: <200106112007.f5BK7eW22506@odiug.digicool.com> Please comment on the following. This came up a while ago in python-dev and I decided to follow through. I'm making this a PEP because of the risk of breaking code (which everybody on Python-dev seemed to think was acceptable). --Guido van Rossum (home page: http://www.python.org/~guido/) PEP: 259 Title: Omit printing newline after newline Version: $Revision: 1.1 $ Author: guido at python.org (Guido van Rossum) Status: Draft Type: Standards Track Python-Version: 2.2 Created: 11-Jun-2001 Post-History: 11-Jun-2001 Abstract Currently, the print statement always appends a newline, unless a trailing comma is used. This means that if we want to print data that already ends in a newline, we get two newlines, unless special precautions are taken. I propose to skip printing the newline when it follows a newline that came from data. In order to avoid having to add yet another magic variable to file objects, I propose to give the existing 'softspace' variable an extra meaning: a negative value will mean "the last data written ended in a newline so no space *or* newline is required." Problem When printing data that resembles the lines read from a file using a simple loop, double-spacing occurs unless special care is taken: >>> for line in open("/etc/passwd").readlines(): ... print line ... root:x:0:0:root:/root:/bin/bash bin:x:1:1:bin:/bin: daemon:x:2:2:daemon:/sbin: (etc.) >>> While there are easy work-arounds, this is often noticed only during testing and requires an extra edit-test roundtrip; the fixed code is uglier and harder to maintain. Proposed Solution In the PRINT_ITEM opcode in ceval.c, when a string object is printed, a check is already made that looks at the last character of that string. Currently, if that last character is a whitespace character other than space, the softspace flag is reset to zero; this suppresses the space between two items if the first item is a string ending in newline, tab, etc. (but not when it ends in a space). Otherwise the softspace flag is set to one. 
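To make the current rule concrete, here is a quick interactive illustration (not part of the PEP text itself; this is stock Python 2.x behavior as just described):

    >>> print "a", "b"      # softspace is set after "a", so a space is inserted
    a b
    >>> print "a\t", "b"    # the tab resets softspace: no space before "b"
    a	b
    >>> print "a ", "b"     # a trailing *space* does not reset it
    a  b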
The proposal changes this test slightly so that softspace is set to: -1 -- if the last object written is a string ending in a newline 0 -- if the last object written is a string ending in a whitespace character that's neither space nor newline 1 -- in all other cases (including the case when the last object written is an empty string or not a string) Then, the PRINT_NEWLINE opcode, printing of the newline is suppressed if the value of softspace is negative; in any case the softspace flag is reset to zero. Scope This only affects printing of 8-bit strings. It doesn't affect Unicode, although that could be considered a bug in the Unicode implementation. It doesn't affect other objects whose string representation happens to end in a newline character. Risks This change breaks some existing code. For example: print "Subject: PEP 259\n" print message_body In current Python, this produces a blank line separating the subject from the message body; with the proposed change, the body begins immediately below the subject. This is not very robust code anyway; it is better written as print "Subject: PEP 259" print print message_body In the test suite, only test_StringIO (which explicitly tests for this feature) breaks. Implementation A patch relative to current CVS is here: http://sourceforge.net/tracker/index.php?func=detail&aid=432183&group_id=5470&atid=305470 Copyright This document has been placed in the public domain. Local Variables: mode: indented-text indent-tabs-mode: nil End: From BPettersen at NAREX.com Mon Jun 11 22:20:38 2001 From: BPettersen at NAREX.com (Bjorn Pettersen) Date: Mon, 11 Jun 2001 14:20:38 -0600 Subject: [Python-Dev] RE: PEP 259: Omit printing newline after newline Message-ID: <6957F6A694B49A4096F7CFD0D900042F27D452@admin56.narex.com> > From: Guido van Rossum [mailto:guido at digicool.com] > > Subject: PEP 259: Omit printing newline after newline This would probably break most of the cgi scripts I did at my last job without giving any useful error message. But then again... why should I care ? -- bjorn From skip at pobox.com Mon Jun 11 22:20:33 2001 From: skip at pobox.com (Skip Montanaro) Date: Mon, 11 Jun 2001 15:20:33 -0500 Subject: [Python-Dev] Feedback on new floating point info in tutorial In-Reply-To: <20010611191506.DC49F28923@cj42289-a.reston1.va.home.com> References: <20010611191506.DC49F28923@cj42289-a.reston1.va.home.com> Message-ID: <15141.10257.487549.196538@beluga.mojam.com> Fred> Substantial additional material on floating point arithmetic in Fred> the tutorial, written by Tim Peters to explain why FP can fail to Fred> reflect the decimal world presented to the user. I took a quick look at that appendix. One thing that confused me a bit was that if 0.1 is approximated by something ever-so-slightly larger than 0.1, how is it that if you add ten of them together you wind up with a result that is ever-so-slightly less than 1.0? I didn't expect it to be exactly 1.0. 
Other floating point naifs may be confused in the same way:

>>> "%.55f" % 0.5
'0.5000000000000000000000000000000000000000000000000000000'
>>> "%.55f" % 0.1
'0.1000000000000000055511151231257827021181583404541015625'
>>> "%.55f" % (0.5+0.1)
'0.5999999999999999777955395074968691915273666381835937500'

I guess the explanation is that not only can't most decimals be represented exactly, but that summing the same approximation multiple times doesn't always skew the error in the same direction either:

>>> "%.55f" % (0.1+0.1+0.1+0.1+0.1+0.1+0.1+0.1)
'0.7999999999999999333866185224906075745820999145507812500'
>>> "%.55f" % (0.8)
'0.8000000000000000444089209850062616169452667236328125000'

IEEE-754-is-full-of-traps-for-the-unwary-ly y'rs,

Skip

From mal at lemburg.com Mon Jun 11 22:55:13 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Mon, 11 Jun 2001 22:55:13 +0200
Subject: [Python-Dev] PEP 259: Omit printing newline after newline
References: <200106112007.f5BK7eW22506@odiug.digicool.com>
Message-ID: <3B253031.AB1954CB@lemburg.com>

Guido van Rossum wrote:
>
> Please comment on the following. This came up a while ago in
> python-dev and I decided to follow through. I'm making this a PEP
> because of the risk of breaking code (which everybody on Python-dev
> seemed to think was acceptable).
>
> --Guido van Rossum (home page: http://www.python.org/~guido/)
>
> PEP: 259
> Title: Omit printing newline after newline
> ...
> Scope
>
> This only affects printing of 8-bit strings. It doesn't affect
> Unicode, although that could be considered a bug in the Unicode
> implementation. It doesn't affect other objects whose string
> representation happens to end in a newline character.

I guess I should fix the Unicode stuff ;-)

> Risks
>
> This change breaks some existing code. For example:
>
> print "Subject: PEP 259\n"
> print message_body
>
> In current Python, this produces a blank line separating the
> subject from the message body; with the proposed change, the body
> begins immediately below the subject. This is not very robust
> code anyway; it is better written as
>
> print "Subject: PEP 259"
> print
> print message_body
>
> In the test suite, only test_StringIO (which explicitly tests for
> this feature) breaks.

Hmm, I think the above is a very typical idiom for RFC822-style content and is used in CGI scripts a lot. I'm not sure whether this change is worth getting the CGI crowd upset...

Wouldn't it make sense to only use this technique in interactive mode?

-- Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting: http://www.egenix.com/
Python Software: http://www.lemburg.com/python/

From martin at loewis.home.cs.tu-berlin.de Tue Jun 12 00:00:54 2001
From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Tue, 12 Jun 2001 00:00:54 +0200
Subject: [Python-Dev] Adding .decode() method to Unicode
Message-ID: <200106112200.f5BM0sO01692@mira.informatik.hu-berlin.de>

> I would like to add a .decode() method to Unicode objects and also
> enable the builtin unicode() to accept Unicode object as input.

-1. What is this good for?

> While this may seem useless for the currently available encodings,
> it does have some use for codecs which recode Unicode to Unicode,
> e.g. codecs which do XML escaping or Unicode compression.

I still can see the value. If you think the codec API is good for such transformation, why not use it? I.e.
enc,dec,_,_ = codecs.lookup("compress-form-foo") s = dec(s) Furthermore, this seems like a form of hypergeneralization. If you have this, why not also add s = s.decode("capitalize") # instead of s.capitalize() i = s.decode("int") # instead of int(s) > Any objections ? Yes, I think this should not be added. Regards, Martin From paulp at ActiveState.com Tue Jun 12 01:38:55 2001 From: paulp at ActiveState.com (Paul Prescod) Date: Mon, 11 Jun 2001 16:38:55 -0700 Subject: [Python-Dev] Adding .decode() method to Unicode References: <200106112200.f5BM0sO01692@mira.informatik.hu-berlin.de> Message-ID: <3B25568F.B766E00D@ActiveState.com> "Martin v. Loewis" wrote: > >... > > I still can see the value. If you think the codec API is good for such > transformation, why not use it? I.e. > > enc,dec,_,_ = codecs.lookup("compress-form-foo") > s = dec(s) IMO, there is a huge usability difference between the above and mystr.decode("base64"). I think that we've done a good job of providing better ways to get at codecs than the codecs.lookup function. I don't see how this is any different. -- Take a recipe. Leave a recipe. Python Cookbook! http://www.ActiveState.com/pythoncookbook From greg at cosc.canterbury.ac.nz Tue Jun 12 01:51:55 2001 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Tue, 12 Jun 2001 11:51:55 +1200 (NZST) Subject: [Python-Dev] Feedback on new floating point info in tutorial In-Reply-To: <15141.10257.487549.196538@beluga.mojam.com> Message-ID: <200106112351.LAA03197@s454.cosc.canterbury.ac.nz> Skip Montanaro : > One thing that confused me a bit was > that if 0.1 is approximated by something ever-so-slightly larger than 0.1, > how is it that if you add ten of them together you wind up with a result > that is ever-so-slightly less than 1.0? I think what's happening is that the exact binary result of adding 0.1_plus_a_little to itself has one more bit than there is room for, so it gets shifted right and one bit falls off the end. The amount you lose when that happens a few times ends up outweighing the extra that you would expect. Whether it's worth trying to explain *that* in the tutorial I don't know! Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg at cosc.canterbury.ac.nz +--------------------------------------+ From greg at cosc.canterbury.ac.nz Tue Jun 12 02:00:33 2001 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Tue, 12 Jun 2001 12:00:33 +1200 (NZST) Subject: [Python-Dev] PEP 259: Omit printing newline after newline In-Reply-To: <200106112007.f5BK7eW22506@odiug.digicool.com> Message-ID: <200106120000.MAA03201@s454.cosc.canterbury.ac.nz> Guido: > I propose to skip printing the newline when it follows a newline > that came from data. -1 There's too much magic in the way print handles spaces and newlines already. Making it even more magical and inconsistent seems like exactly the wrong direction to be going in. If there are to be any changes to the way print works, I would prefer to see one that removes the need for the softspace flag altogether. The behaviour of a given print should not depend on state left behind by some previous one. Neither should it depend on whether the characters being printed come directly from a string or not. 
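A small script (illustrative only, not from the original message) makes the leftover-state point concrete:

    import sys

    print "x",                # leaves sys.stdout.softspace set behind
    sys.stdout.write("y\n")   # write() bypasses print and ignores softspace
    print "z"                 # this print starts by emitting a space: " z"

The final statement prints " z" with a leading space at the start of a fresh line, purely because of the softspace state left over from the first print two statements earlier.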
Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg at cosc.canterbury.ac.nz +--------------------------------------+ From tim.one at home.com Tue Jun 12 04:17:24 2001 From: tim.one at home.com (Tim Peters) Date: Mon, 11 Jun 2001 22:17:24 -0400 Subject: [Python-Dev] Feedback on new floating point info in tutorial In-Reply-To: <15141.10257.487549.196538@beluga.mojam.com> Message-ID: [Skip Montanaro, on the in-progess 2.2 Tutorial appendix] > I took a quick look at that appendix. One thing that confused me > a bit was that if 0.1 is approximated by something ever-so-slightly > larger than 0.1, how is it that if you add ten of them together you > wind up with a result that is ever-so-slightly less than 1.0? Good for you, Skip! In all the years I've been explaining this stuff, I only recall one other picking up on that immediately. I'm not writing a book here, though , and any intro numeric programming text emphasizes that n*x is a better bet than adding x together n times. >>> .1 * 10 1.0 >>> Greg Ewing put you on the right track, if you want to figure it out yourself (as Deep Throat said, "follow the bits, Skip -- follow the bits"). > I didn't expect it to be exactly 1.0. Other floating point naifs > may be confused in the same way: > > >>> "%.55f" % 0.5 > '0.5000000000000000000000000000000000000000000000000000000' > >>> "%.55f" % 0.1 > '0.1000000000000000055511151231257827021181583404541015625' > >>> "%.55f" % (0.5+0.1) > '0.5999999999999999777955395074968691915273666381835937500' Note that this output is platform-dependent. For example, the last on Windows is >>> "%.55f" % (0.5+0.1) '0.5999999999999999800000000000000000000000000000000000000' > ... > IEEE-754-is-full-of-traps-for-the-unwary-ly y'rs, All computer arithmetic is; and among binary fp systems, 754 has got to be the best-behaved there is. Know how many irksome bugs I've fixed in Python mucking with different sizes of integers across platforms, and what C does and doesn't guarantee about them? About 20x more than fp bugs. Of course there's 10000x as much integer code in Python too . god-created-the-integers-from-1-through-3-inclusive-and-that's-it-ly y'rs - tim From barry at digicool.com Tue Jun 12 05:00:52 2001 From: barry at digicool.com (Barry A. Warsaw) Date: Mon, 11 Jun 2001 23:00:52 -0400 Subject: [Python-Dev] PEP 259: Omit printing newline after newline References: <200106112007.f5BK7eW22506@odiug.digicool.com> <200106120000.MAA03201@s454.cosc.canterbury.ac.nz> Message-ID: <15141.34276.191510.708654@anthem.wooz.org> >>>>> "GE" == Greg Ewing writes: GE> There's too much magic in the way print handles spaces and GE> newlines already. Making it even more magical and inconsistent GE> seems like exactly the wrong direction to be going in. I tend to agree. I'm sometimes bitten by the double newlines, but as I think Andrew brought up in c.l.py, I'd rather see a way to tell readlines() to strip the newlines than to add more magic to print. print-has-all-the-magic-it-needs-now-<>-ly y'rs, -Barry From fredrik at pythonware.com Tue Jun 12 08:21:55 2001 From: fredrik at pythonware.com (Fredrik Lundh) Date: Tue, 12 Jun 2001 08:21:55 +0200 Subject: [Python-Dev] PEP 259: Omit printing newline after newline References: <200106112007.f5BK7eW22506@odiug.digicool.com> Message-ID: <004c01c0f307$fb0540c0$4ffa42d5@hagrid> guido wrote: > Please comment on the following. 
This came up a while ago in > python-dev and I decided to follow through. I'm making this a PEP > because of the risk of breaking code (which everybody on Python-dev > seemed to think was acceptable). when was this discussed on python-dev? From mal at lemburg.com Tue Jun 12 09:09:05 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Tue, 12 Jun 2001 09:09:05 +0200 Subject: [Python-Dev] Adding .decode() method to Unicode References: <200106112200.f5BM0sO01692@mira.informatik.hu-berlin.de> Message-ID: <3B25C011.125B6462@lemburg.com> "Martin v. Loewis" wrote: > > > I would like to add a .decode() method to Unicode objects and also > > enable the builtin unicode() to accept Unicode object as input. > > -1. What is this good for? See below :) > > While this may seem useless for the currently available encodings, > > it does have some use for codecs which recode Unicode to Unicode, > > e.g. codecs which do XML escaping or Unicode compression. > > I still can see the value. If you think the codec API is good for such > transformation, why not use it? I.e. > > enc,dec,_,_ = codecs.lookup("compress-form-foo") > s = dec(s) Sure and that's the point. I would like to add the .decode() method to make this just as simple as encoding Unicode to UTF-8. Note that strings already have this method: str.encode() str.decode() uni.encode() #uni.decode() # still missing > Furthermore, this seems like a form of hypergeneralization. If you > have this, why not also add > > s = s.decode("capitalize") # instead of s.capitalize() > i = s.decode("int") # instead of int(s) No, that's not the intention. One very useful application for this method is XML unescaping which turns numeric XML entities into Unicode chars. Others are Unicode decompression (using the Unicode compression algorithm) and certain forms of Unicode normalization. The key argument for these interfaces is that they provide an extensible transformation mechanism for string and binary data. > > Any objections ? > > Yes, I think this should not be added. -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From tim.one at home.com Tue Jun 12 09:29:02 2001 From: tim.one at home.com (Tim Peters) Date: Tue, 12 Jun 2001 03:29:02 -0400 Subject: [Python-Dev] PEP 259: Omit printing newline after newline In-Reply-To: <004c01c0f307$fb0540c0$4ffa42d5@hagrid> Message-ID: [/F] > when was this discussed on python-dev? It wasn't -- it actually came up on one of the SourceForge mailing lists ... ah, of course, tried to search but "Geocrawler is down for nightly database maintenance". They sure have long nights . I'm guessing it's the python-iterators list. It spun off of a thread where Guido was wondering whether one of the new ways to spell "iterate over a file" should return lines without trailing \n, so that e.g. for line in sys.stdin: print line wasn't a surprise. I opined it would be better to make all ways of iterating a file do the same thing, but change print instead. We both agreed that couldn't happen. But then I couldn't find any code it would break, only code of the form print line, where the "," was trying to suppress the extra newline, and that would continue to work the same way even if print were changed. The notion that legions of people are using print line as an obscure way to get double-spacing is taking me by surprise. Nobody on the iterators list had this objection. 
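For readers following along, the pattern in question looks like this (a minimal sketch; the file name is arbitrary):

    for line in open("/etc/passwd").readlines():
        print line        # line already ends in "\n"; print adds another

    for line in open("/etc/passwd").readlines():
        print line,       # the trailing comma suppresses print's own newline

Today the first loop double-spaces its output and the second is the usual workaround; under the proposed change, the first loop would print single-spaced and the second would keep behaving exactly as it does now.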
win-some-lose-some-lose-some-lose-some-lose-some-ly y'rs - tim From mal at lemburg.com Tue Jun 12 09:35:08 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Tue, 12 Jun 2001 09:35:08 +0200 Subject: [Python-Dev] Adding Asian codecs to the core References: <20010611174230.0625E99C8D@waltz.rahul.net> Message-ID: <3B25C62C.969B40B3@lemburg.com> Aahz Maruch wrote: > > M.-A. Lemburg wrote: > > > > Tamito KAJIYAMA recently announced that he changed the licenses > > on his Japanese codecs from GPL to a BSD variant. This is great > > news since this would allow adding the codecs to the Python core > > which would certainly attract more users to Python in Asia. > > > > The codecs are 280kB when compressed as .tar.gz file. > > +0 > > I like the idea, am uncomfortable with that amount of space. Tamito corrected me about the size (his file includes the .pyc byte code files): the correct size for the sources is 143kB -- almost half of what I initially wrote. If that should still be too much, there are probably some ways to further compress the size of the mapping tables which could be investigated. PS: Tamito is very thrilled about getting his codecs into the core and I am quite certain that he is also prepared to maintain them (I have put him on CC). -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From tim at digicool.com Tue Jun 12 09:37:55 2001 From: tim at digicool.com (Tim Peters) Date: Tue, 12 Jun 2001 03:37:55 -0400 Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/Include longobject.h,2.19,2.20 In-Reply-To: <3B25C116.3E65A32D@lemburg.com> Message-ID: [M.-A. Lemburg] > I have tried to compile longobject.c/h on a HP-UX box and am getting > warnings about MIN/MAX being redefined. Perhaps you should add > an #undef for these before the #define ?! I changed nothing relevant here. Are you certain this is a new problem? The MIN/MAX macros have been in longobject.c for a long time, and I didn't touch them. In any case, I'm not inclined to fiddle things on a box where I can't see a problem so can't know whether I'm fixing it or just creating new problems. If you can figure out why it's happening on that box, and it's a legit problem there, feel free to fix it. From SBrunning at trisystems.co.uk Tue Jun 12 10:25:19 2001 From: SBrunning at trisystems.co.uk (Simon Brunning) Date: Tue, 12 Jun 2001 09:25:19 +0100 Subject: [Python-Dev] RE: PEP 259: Omit printing newline after newline Message-ID: <31575A892FF6D1118F5800600846864D78BD25@intrepid> > From: Guido van Rossum [SMTP:guido at digicool.com] > In order to avoid having to add yet another magic variable to file > objects, I propose to give the existing 'softspace' variable an > extra meaning: a negative value will mean "the last data written > ended in a newline so no space *or* newline is required." Better another magic variable than a magic value for an old one, I think. Cheers, Simon Brunning TriSystems Ltd. sbrunning at trisystems.co.uk ----------------------------------------------------------------------- The information in this email is confidential and may be legally privileged. It is intended solely for the addressee. Access to this email by anyone else is unauthorised. If you are not the intended recipient, any disclosure, copying, distribution, or any action taken or omitted to be taken in reliance on it, is prohibited and may be unlawful. TriSystems Ltd. 
cannot accept liability for statements made which are clearly the senders own. From thomas at xs4all.net Tue Jun 12 10:33:30 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Tue, 12 Jun 2001 10:33:30 +0200 Subject: [Python-Dev] PEP 259: Omit printing newline after newline In-Reply-To: ; from tim.one@home.com on Tue, Jun 12, 2001 at 03:29:02AM -0400 References: <004c01c0f307$fb0540c0$4ffa42d5@hagrid> Message-ID: <20010612103330.D690@xs4all.nl> On Tue, Jun 12, 2001 at 03:29:02AM -0400, Tim Peters wrote: > [/F] > > when was this discussed on python-dev? > It wasn't -- it actually came up on one of the SourceForge mailing lists ... > I'm guessing it's the python-iterators list. I'm guessing the same thing, because I *did* see the proposal somewhere. I recall thinking 'that might work' but not much else, anyway. > The notion that legions of people are using > print line > as an obscure way to get double-spacing is taking me by surprise. Bah, humbug! (And you can quote me on that.) Backward compatibility is not an issue -- that's why we have future-imports and warning mechanisms. Import smart-print from future to get the new behaviour, and warn whenever print *would* *have* printed one newline less otherwise. Regardless, I'm -1 on this change. Not because of backward compatibility problem, but because of what GregE said. Let's not make print even more magically unpredictably confusing than it already is, with comma's that do something magical, softspace to control that magic, and shifting the print operator to the right :-) Why can't we use for line in file: print line, to print all lines in a file ? Softspace doesn't seem to add a space (though I had to write a testcase to make sure ;) and 'explicit is better than implicit'. I'd also prefer special syntax to control the softspace behaviour, like say: print "spam:", "ham" : "and" : "eggs" to print 'spamandeggs' without a space inbetween. Too late for that, I 'spose :) -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From martin at loewis.home.cs.tu-berlin.de Tue Jun 12 11:42:52 2001 From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Tue, 12 Jun 2001 11:42:52 +0200 Subject: [Python-Dev] Adding .decode() method to Unicode In-Reply-To: "mal@lemburg.com"'s message of Tue, 12 Jun 2001 09:09:05 +0200 Message-ID: <200106120942.f5C9gqT02418@mira.informatik.hu-berlin.de> > str.encode() > str.decode() > uni.encode() > #uni.decode() # still missing It's not missing. str.decode and uni.encode go through a single codec; that's easy. str.encode is somewhat more confusing, because it really is unicode(str).encode. Now, you are not proposing that uni.decode is str(uni).decode, are you? If not that, what else would it mean? And if it means something else, it is clearly not symmetric to str.encode, so it is not "missing". > One very useful application for this method is XML unescaping > which turns numeric XML entities into Unicode chars. Ok. Please show me how that would work. More precisely, please write a PEP describing the rationale for this feature, including use case examples and precise semantics of the proposed addition. > The key argument for these interfaces is that they provide > an extensible transformation mechanism for string and binary > data. That is too general for me to understand; I need to see detailed examples that solve real-world problems. Regards, Martin P.S. 
I don't think that unescaping XML characters entities into Unicode characters is a useful application in itself. This is normally done by the XML parser, which not only has to deal with character entities, but also with general entities and a lot of other markup. Very few people write XML parsers, and they are using the string methods and the sre module successfully (if the parser is written in Python - a C parser would do the unescaping before even passing the text to Python). From thomas at xs4all.net Tue Jun 12 12:02:03 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Tue, 12 Jun 2001 12:02:03 +0200 Subject: [Python-Dev] PEP 259: Omit printing newline after newline In-Reply-To: <20010612103330.D690@xs4all.nl>; from thomas@xs4all.net on Tue, Jun 12, 2001 at 10:33:30AM +0200 References: <004c01c0f307$fb0540c0$4ffa42d5@hagrid> <20010612103330.D690@xs4all.nl> Message-ID: <20010612120203.E690@xs4all.nl> On Tue, Jun 12, 2001 at 10:33:30AM +0200, Thomas Wouters wrote: > print "spam:", "ham" : "and" : "eggs" > to print 'spamandeggs' without a space inbetween. Err. I meant "hamandeggs" with no space inbetween. -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From mal at lemburg.com Tue Jun 12 12:13:21 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Tue, 12 Jun 2001 12:13:21 +0200 Subject: [Python-Dev] Adding .decode() method to Unicode References: <200106120942.f5C9gqT02418@mira.informatik.hu-berlin.de> Message-ID: <3B25EB41.807C2C51@lemburg.com> "Martin v. Loewis" wrote: > > > str.encode() > > str.decode() > > uni.encode() > > #uni.decode() # still missing > > It's not missing. str.decode and uni.encode go through a single codec; > that's easy. str.encode is somewhat more confusing, because it really > is unicode(str).encode. Now, you are not proposing that uni.decode is > str(uni).decode, are you? No. uni.decode() will (just like the other methods) directly interface to the codecs decoder -- there is no magic conversion involved. It is meant to be used by Unicode-Unicode codecs > If not that, what else would it mean? And if it means something else, > it is clearly not symmetric to str.encode, so it is not "missing". It is in the sense that strings support this method and Unicode currently doesn't. > > One very useful application for this method is XML unescaping > > which turns numeric XML entities into Unicode chars. > > Ok. Please show me how that would work. More precisely, please write a > PEP describing the rationale for this feature, including use case > examples and precise semantics of the proposed addition. There's no need for a PEP. This addition is much too simple to require a PEP on its own. As for use cases: I have already given a whole bunch of them (Unicode compression, normalization, escaping in various ways). Codecs are in no way constrained to only interface between strings and Unicode. There are many other possibilities for their usage out there. Just look at the latest checkins for a bunch of string-string codecs for examples of codecs which solve common real-life problems and do not interface to Unicode. > > The key argument for these interfaces is that they provide > > an extensible transformation mechanism for string and binary > > data. > > That is too general for me to understand; I need to see detailed > examples that solve real-world problems. > > Regards, > Martin > > P.S. I don't think that unescaping XML characters entities into > Unicode characters is a useful application in itself. 
This is normally > done by the XML parser, which not only has to deal with character > entities, but also with general entities and a lot of other markup. > Very few people write XML parsers, and they are using the string > methods and the sre module successfully (if the parser is written in > Python - a C parser would do the unescaping before even passing the > text to Python). True, but not all XML text out there is meant for XML parsers to read ;-). Preprocessing of e.g. XML text in Python is a rather common thing to do and this is what the direct codec access methods are meant for. -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From fredrik at pythonware.com Tue Jun 12 12:46:36 2001 From: fredrik at pythonware.com (Fredrik Lundh) Date: Tue, 12 Jun 2001 12:46:36 +0200 Subject: [Python-Dev] Adding .decode() method to Unicode References: <200106120942.f5C9gqT02418@mira.informatik.hu-berlin.de> <3B25EB41.807C2C51@lemburg.com> Message-ID: <00aa01c0f32c$f4a4b740$0900a8c0@spiff> mal wrote: > > Ok. Please show me how that would work. More precisely, please write a > > PEP describing the rationale for this feature, including use case > > examples and precise semantics of the proposed addition. > > There's no need for a PEP. This addition is much too simple > to require a PEP on its own. we'd been better off if you'd written a PEP before you started adding decode and encode stuff. what's currently implemented is ugly enough; adding more warts won't make it any prettier. -1 on anything except a PEP that covers *all* aspects of encode/decode (including things that are already implemented) From fredrik at pythonware.com Tue Jun 12 12:47:49 2001 From: fredrik at pythonware.com (Fredrik Lundh) Date: Tue, 12 Jun 2001 12:47:49 +0200 Subject: [Python-Dev] PEP 259: Omit printing newline after newline References: <004c01c0f307$fb0540c0$4ffa42d5@hagrid> <20010612103330.D690@xs4all.nl> <20010612120203.E690@xs4all.nl> Message-ID: <00ba01c0f32d$208d4160$0900a8c0@spiff> Thomas Wouters wrote: > > print "spam:", "ham" : "and" : "eggs" > > > to print 'spamandeggs' without a space inbetween. > > Err. I meant "hamandeggs" with no space inbetween. and "+" (or plain whitespace) instead of ":", right? From fredrik at pythonware.com Tue Jun 12 12:55:27 2001 From: fredrik at pythonware.com (Fredrik Lundh) Date: Tue, 12 Jun 2001 12:55:27 +0200 Subject: [Python-Dev] RE: PEP 259: Omit printing newline after newline References: <31575A892FF6D1118F5800600846864D78BD25@intrepid> Message-ID: <00c301c0f32e$31cd7ed0$0900a8c0@spiff> simon wrote: > > > In order to avoid having to add yet another magic variable to file > > objects, I propose to give the existing 'softspace' variable an > > extra meaning: a negative value will mean "the last data written > > ended in a newline so no space *or* newline is required." > > Better another magic variable than a magic value for an old one, I think. many file-like C types (e.g. cStringIO) already have special code to deal with a softspace integer attribute. From mal at lemburg.com Tue Jun 12 12:57:32 2001 From: mal at lemburg.com (M.-A. 
Lemburg) Date: Tue, 12 Jun 2001 12:57:32 +0200 Subject: [Python-Dev] Adding .decode() method to Unicode References: <200106120942.f5C9gqT02418@mira.informatik.hu-berlin.de> <3B25EB41.807C2C51@lemburg.com> <00aa01c0f32c$f4a4b740$0900a8c0@spiff> Message-ID: <3B25F59C.9AAF604A@lemburg.com> Fredrik Lundh wrote: > > mal wrote: > > > Ok. Please show me how that would work. More precisely, please write a > > > PEP describing the rationale for this feature, including use case > > > examples and precise semantics of the proposed addition. > > > > There's no need for a PEP. This addition is much too simple > > to require a PEP on its own. > > we'd been better off if you'd written a PEP before you started > adding decode and encode stuff. what's currently implemented > is ugly enough; adding more warts won't make it any prettier. Could you please be more specific about what is "ugly" in the current implementation ? The .encode/.decode methods are a direct interface to the codecs encoder and decoder APIs. I can't find anything ugly about this in general except maybe some of the constraints which were originally put into these interface on the grounds of using them for string/Unicode conversions -- I have already removed most of these and would like to clean this up completely before 2.2 gets out. > -1 on anything except a PEP that covers *all* aspects of > encode/decode (including things that are already implemented) Gee, Guido starts breaking code and nobody objects; I try to clean up some left-overs in the Unicode implementation and people start huge discussions about it. Something is backwards here... -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From martin at loewis.home.cs.tu-berlin.de Tue Jun 12 13:00:40 2001 From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Tue, 12 Jun 2001 13:00:40 +0200 Subject: [Python-Dev] Adding .decode() method to Unicode In-Reply-To: <3B25EB41.807C2C51@lemburg.com> (mal@lemburg.com) References: <200106120942.f5C9gqT02418@mira.informatik.hu-berlin.de> <3B25EB41.807C2C51@lemburg.com> Message-ID: <200106121100.f5CB0em03009@mira.informatik.hu-berlin.de> > > > str.encode() > > > str.decode() > > > uni.encode() > > > #uni.decode() # still missing > > > > It's not missing. str.decode and uni.encode go through a single codec; > > that's easy. str.encode is somewhat more confusing, because it really > > is unicode(str).encode. Now, you are not proposing that uni.decode is > > str(uni).decode, are you? > > No. uni.decode() will (just like the other methods) directly > interface to the codecs decoder -- there is no magic conversion > involved. It is meant to be used by Unicode-Unicode codecs When invoking "Hallo".encode("utf-8"), two conversions are executed: first the default decoding into Unicode, then the UTF-8 encoding. Of course, that is not the intended use (but then, is the intended use documented anywhere?): instead, people should write "Hallo".encode("base64") instead. This is an example I can understand, although I'm not sure why it is inherently better to write this instead of writing base64.encodestring("Hallo"). > > If not that, what else would it mean? And if it means something else, > > it is clearly not symmetric to str.encode, so it is not "missing". > > It is in the sense that strings support this method and Unicode > currently doesn't. 
The rationale for string.encode is weak: it argues that string->string conversions are frequent enough to justify this API, even though these conversions have nothing to do with coded character sets. So far, I can see *no* rationale for unicode.decode. > There's no need for a PEP. This addition is much too simple > to require a PEP on its own. PEP 1 says: # We intend PEPs to be the primary mechanisms for proposing new # features, for collecting community input on an issue, and for # documenting the design decisions that have gone into Python. The # PEP author is responsible for building consensus within the # community and documenting dissenting opinions. So we have a proposal for a new feature, and we have dissenting opinions. Who are you to decide that this additions is too simple to require a PEP on its own? > As for use cases: I have already given a whole bunch of them > (Unicode compression, normalization, escaping in various ways). I was asking for specific examples: Names of specific codecs that you want to implement, and application code fragments using these specific codecs. I don't know how to use Unicode compression if I had such this proposed feature, for example. I know what XML escaping is, and I cannot see how this feature would help. > True, but not all XML text out there is meant for XML parsers to > read ;-). Preprocessing of e.g. XML text in Python is a rather common > thing to do and this is what the direct codec access methods are > meant for. Can you give an example of an application which processes XML without a parser, but with converting character entities (preferably open-source, so I can study its code)? I wonder whether they get CDATA sections right... MAL, I really mean that: Please don't make claims that something is common or useful without giving an *exact* example. Regards, Martin P.S. This insistence on adding Unicode and string methods makes it appear as if the author of the codecs module now thinks that the API of it sucks. From thomas at xs4all.net Tue Jun 12 13:16:05 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Tue, 12 Jun 2001 13:16:05 +0200 Subject: [Python-Dev] PEP 259: Omit printing newline after newline In-Reply-To: <00ba01c0f32d$208d4160$0900a8c0@spiff> References: <004c01c0f307$fb0540c0$4ffa42d5@hagrid> <20010612103330.D690@xs4all.nl> <20010612120203.E690@xs4all.nl> <00ba01c0f32d$208d4160$0900a8c0@spiff> Message-ID: <20010612131605.Q22849@xs4all.nl> On Tue, Jun 12, 2001 at 12:47:49PM +0200, Fredrik Lundh wrote: > Thomas Wouters wrote: > > > print "spam:", "ham" : "and" : "eggs" > > > > > to print 'spamandeggs' without a space inbetween. > > Err. I meant "hamandeggs" with no space inbetween. > and "+" (or plain whitespace) instead of ":", right? Not really. That would only work for string-types. Print auto-converts, remember ? At least the ':' is unambiguous. -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From mal at lemburg.com Tue Jun 12 13:42:31 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Tue, 12 Jun 2001 13:42:31 +0200 Subject: [Python-Dev] Adding .decode() method to Unicode References: <200106120942.f5C9gqT02418@mira.informatik.hu-berlin.de> <3B25EB41.807C2C51@lemburg.com> <200106121100.f5CB0em03009@mira.informatik.hu-berlin.de> Message-ID: <3B260027.7DD33246@lemburg.com> "Martin v. Loewis" wrote: > > > > > str.encode() > > > > str.decode() > > > > uni.encode() > > > > #uni.decode() # still missing > > > > > > It's not missing. 
str.decode and uni.encode go through a single codec; > > > that's easy. str.encode is somewhat more confusing, because it really > > > is unicode(str).encode. Now, you are not proposing that uni.decode is > > > str(uni).decode, are you? > > > > No. uni.decode() will (just like the other methods) directly > > interface to the codecs decoder -- there is no magic conversion > > involved. It is meant to be used by Unicode-Unicode codecs > > When invoking "Hallo".encode("utf-8"), two conversions are executed: > first the default decoding into Unicode, then the UTF-8 encoding. Of > course, that is not the intended use (but then, is the intended use > documented anywhere?): instead, people should write > "Hallo".encode("base64") instead. This is an example I can understand, > although I'm not sure why it is inherently better to write this > instead of writing base64.encodestring("Hallo"). Please note that the conversion from string to Unicode is done by the codec, not the .encode() interface. > > > If not that, what else would it mean? And if it means something else, > > > it is clearly not symmetric to str.encode, so it is not "missing". > > > > It is in the sense that strings support this method and Unicode > > currently doesn't. > > The rationale for string.encode is weak: it argues that string->string > conversions are frequent enough to justify this API, even though these > conversions have nothing to do with coded character sets. You still don't get it: codecs can be used for much more than just character set conversion ! > So far, I can see *no* rationale for unicode.decode. > > > There's no need for a PEP. This addition is much too simple > > to require a PEP on its own. > > PEP 1 says: > > # We intend PEPs to be the primary mechanisms for proposing new > # features, for collecting community input on an issue, and for > # documenting the design decisions that have gone into Python. The > # PEP author is responsible for building consensus within the > # community and documenting dissenting opinions. > > So we have a proposal for a new feature, and we have dissenting > opinions. Who are you to decide that this additions is too simple to > require a PEP on its own? So you want a PEP for each and every small addition to in the core ?! (I am not talking about features which might break code !) > > As for use cases: I have already given a whole bunch of them > > (Unicode compression, normalization, escaping in various ways). > > I was asking for specific examples: Names of specific codecs that you > want to implement, and application code fragments using these specific > codecs. I don't know how to use Unicode compression if I had such this > proposed feature, for example. I know what XML escaping is, and I > cannot see how this feature would help. I think I have given enough examples in this thread already. See below for some more. > > True, but not all XML text out there is meant for XML parsers to > > read ;-). Preprocessing of e.g. XML text in Python is a rather common > > thing to do and this is what the direct codec access methods are > > meant for. > > Can you give an example of an application which processes XML without > a parser, but with converting character entities (preferably > open-source, so I can study its code)? I wonder whether they get CDATA > sections right... MAL, I really mean that: Please don't make claims > that something is common or useful without giving an *exact* example. 
Yes, I am using these feature in real code and no, I can't show it to you because it's closed source. XML is only one example where this would be useful, HTML is another text format which would benefit from it, URL encoding is yet another application. You basically find these applications in all situations where some form of escaping is needed. What I am trying to do here is simplify codec access and usage for the casual user. .encode() and .decode() are very intuitive ways to deal with data transformation, IMHO. > Regards, > Martin > > P.S. This insistence on adding Unicode and string methods makes it > appear as if the author of the codecs module now thinks that the API > of it sucks. No comment. -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From barry at digicool.com Tue Jun 12 16:22:26 2001 From: barry at digicool.com (Barry A. Warsaw) Date: Tue, 12 Jun 2001 10:22:26 -0400 Subject: [Python-Dev] Adding .decode() method to Unicode References: <200106120942.f5C9gqT02418@mira.informatik.hu-berlin.de> <3B25EB41.807C2C51@lemburg.com> Message-ID: <15142.9634.842402.241225@anthem.wooz.org> >>>>> "M" == M writes: M> Codecs are in no way constrained to only interface between M> strings and Unicode. There are many other possibilities for M> their usage out there. Just look at the latest checkins for a M> bunch of string-string codecs for examples of codecs which M> solve common real-life problems and do not interface to M> Unicode. Having just followed this thread tangentially, I do have to say it seems quite cool to be able to do something like the following in Python 2.2: >>> s = msg['from'] >>> parts = s.split('?') >>> if parts[2].lower() == 'q': ... name = parts[3].decode('quopri') ... elif parts[2].lower() == 'b': ... name = parts[3].decode('base64') ... -Barry From fredrik at pythonware.com Tue Jun 12 16:45:16 2001 From: fredrik at pythonware.com (Fredrik Lundh) Date: Tue, 12 Jun 2001 16:45:16 +0200 Subject: [Python-Dev] Adding .decode() method to Unicode References: <200106120942.f5C9gqT02418@mira.informatik.hu-berlin.de><3B25EB41.807C2C51@lemburg.com> <15142.9634.842402.241225@anthem.wooz.org> Message-ID: <004b01c0f34e$4d3e96c0$4ffa42d5@hagrid> barry wrote: > Having just followed this thread tangentially, I do have to say it > seems quite cool to be able to do something like the following in > Python 2.2: > > >>> s = msg['from'] > >>> parts = s.split('?') > >>> if parts[2].lower() == 'q': > ... name = parts[3].decode('quopri') > ... elif parts[2].lower() == 'b': > ... name = parts[3].decode('base64') uhuh? and how exactly is this cooler than being able to do something like the following: import quopri, base64 s = msg['from'] parts = s.split('?') if parts[2].lower() == 'q': name = quopri.decodestring(parts[3]) elif parts[2].lower() == 'b': name = base64.decodestring(parts[3]) (going through the codec registry is slower, and imports more modules, but what's so cool with that?) From barry at digicool.com Tue Jun 12 16:50:01 2001 From: barry at digicool.com (Barry A. 
Warsaw) Date: Tue, 12 Jun 2001 10:50:01 -0400 Subject: [Python-Dev] Adding .decode() method to Unicode References: <200106120942.f5C9gqT02418@mira.informatik.hu-berlin.de> <3B25EB41.807C2C51@lemburg.com> <15142.9634.842402.241225@anthem.wooz.org> <004b01c0f34e$4d3e96c0$4ffa42d5@hagrid> Message-ID: <15142.11289.16053.424966@anthem.wooz.org> >>>>> "FL" == Fredrik Lundh writes: FL> uhuh? and how exactly is this cooler than being able to do FL> something like the following: | import quopri, base64 | s = msg['from'] | parts = s.split('?') | if parts[2].lower() == 'q': | name = quopri.decodestring(parts[3]) | elif parts[2].lower() == 'b': | name = base64.decodestring(parts[3]) FL> (going through the codec registry is slower, and imports more FL> modules, but what's so cool with that?) -------------------- snip snip -------------------- Python 2.2a0 (#4, Jun 6 2001, 13:03:36) [GCC egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)] on linux2 Type "copyright", "credits" or "license" for more information. >>> import quopri >>> quopri.decodestring Traceback (most recent call last): File "", line 1, in ? AttributeError: 'quopri' module has no attribute 'decodestring' >>> quopri.encodestring Traceback (most recent call last): File "", line 1, in ? AttributeError: 'quopri' module has no attribute 'encodestring' -------------------- snip snip -------------------- Much cooler :) Okay, okay, so we /could/ add encodestring/decodestring to quopri.py, which isn't a bad idea. But it seems to me that the s.encode() s.decode() API is nicely universal for any supported encoding. but-what-do-i-know?-ly y'rs, -Barry From skip at pobox.com Tue Jun 12 17:32:11 2001 From: skip at pobox.com (Skip Montanaro) Date: Tue, 12 Jun 2001 10:32:11 -0500 Subject: [Python-Dev] Re: metaclasses -- aka Don Beaudry hook/hack In-Reply-To: References: Message-ID: <15142.13819.477491.993419@beluga.mojam.com> James> Before I head too deeply into Zope dependencies, I would be James> interested in knowing whether or not "type(MyClass) == James> types.ClassType" and "isinstance(myInstance,MyClass)" work for James> classes derived from ExtensionClass. Straight from the horse's mouth: >>> type(gtk.GtkButton) >>> type(gtk.GtkButton) == types.ClassType 0 >>> isinstance(gtk.GtkButton(), gtk.GtkButton) 1 James> (And if so, why do these work for C extension classes using the James> Don Beaudry hook but not for Python classes using the same hook?) You'll have to ask someone with more subject knowledge. (Don would probably be a good start. ;-) I've cc'd python-dev because the experts in this area are all there. -- Skip Montanaro (skip at pobox.com) (847)971-7098 From skip at pobox.com Tue Jun 12 17:53:24 2001 From: skip at pobox.com (Skip Montanaro) Date: Tue, 12 Jun 2001 10:53:24 -0500 Subject: [Python-Dev] PEP 259: Omit printing newline after newline In-Reply-To: References: <004c01c0f307$fb0540c0$4ffa42d5@hagrid> Message-ID: <15142.15092.57490.275201@beluga.mojam.com> Tim> The notion that legions of people are using Tim> print line Tim> as an obscure way to get double-spacing is taking me by surprise. Tim> Nobody on the iterators list had this objection. I suspect that most CGI scripts that didn't use any abstraction for HTTP responses suffer from this potential problem. I've been using one abstraction or another for quite awhile now, but I still have a few CGI scripts laying around that still use print to emit headers and bodies of HTTP responses. Skip From barry at digicool.com Tue Jun 12 18:06:53 2001 From: barry at digicool.com (Barry A. 
Warsaw) Date: Tue, 12 Jun 2001 12:06:53 -0400 Subject: [Python-Dev] PEP 259: Omit printing newline after newline References: <004c01c0f307$fb0540c0$4ffa42d5@hagrid> <15142.15092.57490.275201@beluga.mojam.com> Message-ID: <15142.15901.223641.151562@anthem.wooz.org> >>>>> "SM" == Skip Montanaro writes: SM> I suspect that most CGI scripts that didn't use any SM> abstraction for HTTP responses suffer from this potential SM> problem. I've been using one abstraction or another for quite SM> awhile now, but I still have a few CGI scripts laying around SM> that still use print to emit headers and bodies of HTTP SM> responses. Same here. From paulp at ActiveState.com Tue Jun 12 19:22:31 2001 From: paulp at ActiveState.com (Paul Prescod) Date: Tue, 12 Jun 2001 10:22:31 -0700 Subject: [Python-Dev] Adding .decode() method to Unicode References: <200106120942.f5C9gqT02418@mira.informatik.hu-berlin.de> <3B25EB41.807C2C51@lemburg.com> <15142.9634.842402.241225@anthem.wooz.org> Message-ID: <3B264FD7.86ACB034@ActiveState.com> "Barry A. Warsaw" wrote: > >... > > Having just followed this thread tangentially, I do have to say it > seems quite cool to be able to do something like the following in > Python 2.2: > > >>> s = msg['from'] > >>> parts = s.split('?') > >>> if parts[2].lower() == 'q': > ... name = parts[3].decode('quopri') > ... elif parts[2].lower() == 'b': > ... name = parts[3].decode('base64') > ... I think that the central point is that if code like the above is useful and supported then it needs to be the same for Unicode strings as for 8-bit strings. If the code above is NOT useful and should NOT be supported then we need to undo it before 2.2 ships. This unicode.decode argument is just a proxy for the real argument about the above. I don't feel strongly one way or another about this (ab?)use of the codecs concept, myself, but I do feel strongly that Unicode strings should behave as much as possible like 8-bit strings. -- Take a recipe. Leave a recipe. Python Cookbook! http://www.ActiveState.com/pythoncookbook From paulp at ActiveState.com Tue Jun 12 19:31:54 2001 From: paulp at ActiveState.com (Paul Prescod) Date: Tue, 12 Jun 2001 10:31:54 -0700 Subject: [Python-Dev] Adding .decode() method to Unicode References: <200106120942.f5C9gqT02418@mira.informatik.hu-berlin.de><3B25EB41.807C2C51@lemburg.com> <15142.9634.842402.241225@anthem.wooz.org> <004b01c0f34e$4d3e96c0$4ffa42d5@hagrid> Message-ID: <3B26520A.C579D00C@ActiveState.com> Fredrik Lundh wrote: > >... > > uhuh? and how exactly is this cooler than being able to do > something like the following: > > import quopri, base64 >... > > (going through the codec registry is slower, and imports more > modules, but what's so cool with that?) One argument in favor is that the base64 and quopri modules are not standardized today. In fact, Python has a huge problem with standardization of access paradigms in the standard library. We get the best standardization (i.e. of the "file interface") when we force module authors to conform to a standard in order to get some "extra feature" of the standard library. A counter argument is that the conflation of the concept of Unicode encoding/decoding and other forms of encoding/decoding could be confusing. MAL would not have to keep pointing out that "codecs are for more than Unicode encoding/decoding" if it was obvious. -- Take a recipe. Leave a recipe. Python Cookbook! http://www.ActiveState.com/pythoncookbook From barry at digicool.com Tue Jun 12 20:24:25 2001 From: barry at digicool.com (Barry A. 
Warsaw) Date: Tue, 12 Jun 2001 14:24:25 -0400 Subject: [Python-Dev] Adding .decode() method to Unicode References: <200106120942.f5C9gqT02418@mira.informatik.hu-berlin.de> <3B25EB41.807C2C51@lemburg.com> <15142.9634.842402.241225@anthem.wooz.org> <3B264FD7.86ACB034@ActiveState.com> Message-ID: <15142.24153.921774.610559@anthem.wooz.org> >>>>> "PP" == Paul Prescod writes: PP> I don't feel strongly one way or another about this (ab?)use PP> of the codecs concept, myself, but I do feel strongly that PP> Unicode strings should behave as much as possible like 8-bit PP> strings. I'd agree with both statements. time-to-add-{encode,decode}string()-to-quopri-ly y'rs, -Barry From martin at loewis.home.cs.tu-berlin.de Tue Jun 12 20:00:19 2001 From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Tue, 12 Jun 2001 20:00:19 +0200 Subject: [Python-Dev] Adding .decode() method to Unicode In-Reply-To: <3B260027.7DD33246@lemburg.com> (mal@lemburg.com) References: <200106120942.f5C9gqT02418@mira.informatik.hu-berlin.de> <3B25EB41.807C2C51@lemburg.com> <200106121100.f5CB0em03009@mira.informatik.hu-berlin.de> <3B260027.7DD33246@lemburg.com> Message-ID: <200106121800.f5CI0Jw00946@mira.informatik.hu-berlin.de> > > So we have a proposal for a new feature, and we have dissenting > > opinions. Who are you to decide that this additions is too simple to > > require a PEP on its own? > > So you want a PEP for each and every small addition to in the > core ?! (I am not talking about features which might break code !) No, additions that find immediate consent and come with complete patches (including documentation and test cases) don't need this overhead. Features that find resistance should go through the full process. > > I was asking for specific examples: Names of specific codecs that you > > want to implement, and application code fragments using these specific > > codecs. I don't know how to use Unicode compression if I had such this > > proposed feature, for example. I know what XML escaping is, and I > > cannot see how this feature would help. > > I think I have given enough examples in this thread already. See > below for some more. I haven't seen a single example involving actual Python code. > > > True, but not all XML text out there is meant for XML parsers to > > > read ;-). Preprocessing of e.g. XML text in Python is a rather common > > > thing to do and this is what the direct codec access methods are > > > meant for. > > > > Can you give an example of an application [...] > > Yes, I am using these feature in real code and no, I can't show it to > you because it's closed source. Not very convincing... If this is "a rather common thing to do", it shouldn't be hard to find examples in other people's code, shouldn't it? > XML is only one example where this would be useful, HTML is another > text format which would benefit from it, URL encoding is yet another > application. You basically find these applications in all situations > where some form of escaping is needed. These are all not specific examples. I'm still looking for a specific application that might use this feature, and specific codec names and implementations. Regards, Martin From martin at loewis.home.cs.tu-berlin.de Tue Jun 12 20:08:31 2001 From: martin at loewis.home.cs.tu-berlin.de (Martin v. 
Loewis) Date: Tue, 12 Jun 2001 20:08:31 +0200 Subject: [Python-Dev] Adding .decode() method to Unicode In-Reply-To: <15142.9634.842402.241225@anthem.wooz.org> (barry@digicool.com) References: <200106120942.f5C9gqT02418@mira.informatik.hu-berlin.de> <3B25EB41.807C2C51@lemburg.com> <15142.9634.842402.241225@anthem.wooz.org> Message-ID: <200106121808.f5CI8VO00950@mira.informatik.hu-berlin.de> > Having just followed this thread tangentially, I do have to say it > seems quite cool to be able to do something like the following in > Python 2.2: > > >>> s = msg['from'] > >>> parts = s.split('?') > >>> if parts[2].lower() == 'q': > ... name = parts[3].decode('quopri') > ... elif parts[2].lower() == 'b': > ... name = parts[3].decode('base64') > ... What is the type of parts[3] here? If it is a plain string, it is already possible: >>> 'SGVsbG8=\n'.decode("base64") 'Hello' I doubt you'd ever have a Unicode string that represents a base64-encoded byte string, and if you had, .decode would probably do the wrong thing: >>> import codecs >>> enc,dec,_,_ = codecs.lookup("base64") >>> dec(u'SGVsbG8=\n') ('Hello', 9) Note that this returns a byte string, not a Unicode string. Regards, Martin From martin at loewis.home.cs.tu-berlin.de Tue Jun 12 20:18:45 2001 From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Tue, 12 Jun 2001 20:18:45 +0200 Subject: [Python-Dev] Adding .decode() method to Unicode In-Reply-To: <3B264FD7.86ACB034@ActiveState.com> (message from Paul Prescod on Tue, 12 Jun 2001 10:22:31 -0700) References: <200106120942.f5C9gqT02418@mira.informatik.hu-berlin.de> <3B25EB41.807C2C51@lemburg.com> <15142.9634.842402.241225@anthem.wooz.org> <3B264FD7.86ACB034@ActiveState.com> Message-ID: <200106121818.f5CIIjj00983@mira.informatik.hu-berlin.de> > > Having just followed this thread tangentially, I do have to say it > > seems quite cool to be able to do something like the following in > > Python 2.2: > > > > >>> s = msg['from'] > > >>> parts = s.split('?') > > >>> if parts[2].lower() == 'q': > > ... name = parts[3].decode('quopri') > > ... elif parts[2].lower() == 'b': > > ... name = parts[3].decode('base64') > > ... > > I think that the central point is that if code like the above is useful > and supported then it needs to be the same for Unicode strings as for > 8-bit strings. Why is that? An encoding, by nature, is something that produces a byte sequence from some input. So you can only decode byte sequences, not character strings. > If the code above is NOT useful and should NOT be supported then we > need to undo it before 2.2 ships. This unicode.decode argument is > just a proxy for the real argument about the above. No, it isn't. The code is useful for byte strings, but not for Unicode strings. > I don't feel strongly one way or another about this (ab?)use of the > codecs concept, myself, but I do feel strongly that Unicode strings > should behave as much as possible like 8-bit strings. Not at all. Byte strings and character strings are as different as are byte strings and lists of DOM child nodes (i.e. the only common thing is that they are sequences). Regards, Martin From barry at digicool.com Tue Jun 12 20:35:10 2001 From: barry at digicool.com (Barry A. 
Warsaw)
Date: Tue, 12 Jun 2001 14:35:10 -0400
Subject: [Python-Dev] Adding .decode() method to Unicode
References: <200106120942.f5C9gqT02418@mira.informatik.hu-berlin.de>
	<3B25EB41.807C2C51@lemburg.com>
	<15142.9634.842402.241225@anthem.wooz.org>
	<200106121808.f5CI8VO00950@mira.informatik.hu-berlin.de>
Message-ID: <15142.24798.941322.762791@anthem.wooz.org>

>>>>> "MvL" == Martin v Loewis writes:

    MvL> What is the type of parts[3] here? If it is a plain string,
    MvL> it is already possible:

    >> 'SGVsbG8=\n'.decode("base64")
    MvL> 'Hello'

But only in Python 2.2a0 currently, right? And yes, the type is plain
string.

    MvL> I doubt you'd ever have a Unicode string that represents a
    MvL> base64-encoded byte string, and if you had, .decode would
    MvL> probably do the wrong thing:

    >> import codecs
    >> enc,dec,_,_ = codecs.lookup("base64")
    >> dec(u'SGVsbG8=\n')
    MvL> ('Hello', 9)

    MvL> Note that this returns a byte string, not a Unicode string.

I trust you on that. ;) I've only played with this tangentially since
this thread cropped up.

-Barry

From paulp at ActiveState.com  Tue Jun 12 20:51:25 2001
From: paulp at ActiveState.com (Paul Prescod)
Date: Tue, 12 Jun 2001 11:51:25 -0700
Subject: [Python-Dev] Adding .decode() method to Unicode
References: <200106120942.f5C9gqT02418@mira.informatik.hu-berlin.de>
	<3B25EB41.807C2C51@lemburg.com>
	<15142.9634.842402.241225@anthem.wooz.org>
	<3B264FD7.86ACB034@ActiveState.com>
	<200106121818.f5CIIjj00983@mira.informatik.hu-berlin.de>
Message-ID: <3B2664AD.B560D685@ActiveState.com>

"Martin v. Loewis" wrote:
>
>...
>
> Why is that? An encoding, by nature, is something that produces a byte
> sequence from some input. So you can only decode byte sequences, not
> character strings.

According to this logic, it is not logical to "encode" a Unicode string
into a base64'd Unicode string or "decode" a Unicode string from a
base64'd Unicode string. But I have seen circumstances where one XML
document is base64'd into another. In that circumstance, it would be
useful to say node.nodeValue.decode("base64").

Let me turn the argument around: what would be the *harm* in having
8-bit strings and Unicode strings behave similarly in this manner?

>...
> Not at all. Byte strings and character strings are as different as are
> byte strings and lists of DOM child nodes (i.e. the only common thing
> is that they are sequences).

8-bit strings are not purely byte strings. They are also "character
strings". That's why they have methods like "capitalize", "isalpha",
"lower", "swapcase", "title" and so forth. DOM nodes and byte strings
have virtually no methods in common.

We could argue angels on the head of a pin until the cows come home but
90% of all Python users think of 8-bit strings as strings of
characters. So arguments based on the idea that they are not "really"
character strings are wishful thinking.

--
Take a recipe. Leave a recipe.
Python Cookbook! http://www.ActiveState.com/pythoncookbook

From martin at loewis.home.cs.tu-berlin.de  Tue Jun 12 22:01:39 2001
From: martin at loewis.home.cs.tu-berlin.de (Martin v.
Loewis) Date: Tue, 12 Jun 2001 22:01:39 +0200 Subject: [Python-Dev] Adding .decode() method to Unicode In-Reply-To: <15142.24798.941322.762791@anthem.wooz.org> (barry@digicool.com) References: <200106120942.f5C9gqT02418@mira.informatik.hu-berlin.de> <3B25EB41.807C2C51@lemburg.com> <15142.9634.842402.241225@anthem.wooz.org> <200106121808.f5CI8VO00950@mira.informatik.hu-berlin.de> <15142.24798.941322.762791@anthem.wooz.org> Message-ID: <200106122001.f5CK1de01350@mira.informatik.hu-berlin.de> > MvL> What is the type of parts[3] here? If it is a plain string, > MvL> it is already possible: > > >> 'SGVsbG8=\n'.decode("base64") > MvL> 'Hello' > > But only in Python 2.2a0 currently, right? Exactly, since MAL's last patch. If people think that byte strings must behave exactly as Unicode strings, I'd rather prefer to back out this patch instead of adding unicode.decode. Personally, I think the status quo is fine and should not be changed. Regards, Martin From aahz at rahul.net Wed Jun 13 01:48:14 2001 From: aahz at rahul.net (Aahz Maruch) Date: Tue, 12 Jun 2001 16:48:14 -0700 (PDT) Subject: [Python-Dev] Adding Asian codecs to the core In-Reply-To: <3B25C62C.969B40B3@lemburg.com> from "M.-A. Lemburg" at Jun 12, 2001 09:35:08 AM Message-ID: <20010612234815.2C90599C82@waltz.rahul.net> M.-A. Lemburg wrote: > Aahz Maruch wrote: >> M.-A. Lemburg wrote: >>> >>> Tamito KAJIYAMA recently announced that he changed the licenses >>> on his Japanese codecs from GPL to a BSD variant. This is great >>> news since this would allow adding the codecs to the Python core >>> which would certainly attract more users to Python in Asia. >>> >>> The codecs are 280kB when compressed as .tar.gz file. >> >> +0 >> >> I like the idea, am uncomfortable with that amount of space. > > Tamito corrected me about the size (his file includes the .pyc > byte code files): the correct size for the sources is 143kB -- > almost half of what I initially wrote. That makes me +0.5, possibly a bit higher. -- --- Aahz (@pobox.com) Hugs and backrubs -- I break Rule 6 <*> http://www.rahul.net/aahz/ Androgynous poly kinky vanilla queer het Pythonista I don't really mind a person having the last whine, but I do mind someone else having the last self-righteous whine. From greg at cosc.canterbury.ac.nz Wed Jun 13 01:57:35 2001 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Wed, 13 Jun 2001 11:57:35 +1200 (NZST) Subject: [Python-Dev] PEP 259: Omit printing newline after newline In-Reply-To: <20010612103330.D690@xs4all.nl> Message-ID: <200106122357.LAA03316@s454.cosc.canterbury.ac.nz> Thomas Wouters : > I'd also prefer special syntax to control the softspace > behaviour... Too late for that, I 'spose Maybe not. I'd suggest spelling "don't add a newline or a space after this" as: print a, b, c... This could coexist with the current softspace behaviour, and the use of a trailing comma could be deprecated. After a suitable warning period, the softspace flag could then be removed. > print "spam:", "ham" : "and" : "eggs" > to print 'spamandeggs' without a space inbetween. I don't think it's so important to have a special syntax for that, since it can be accomplished in other ways without too much difficulty, e.g. print "%s: %s%s%s" % ("spam", "ham", "and", "eggs")... The main thing I'd like is to get rid of the statefulness of the current behaviour. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. 
| greg at cosc.canterbury.ac.nz       +--------------------------------------+

From greg at cosc.canterbury.ac.nz  Wed Jun 13 02:02:40 2001
From: greg at cosc.canterbury.ac.nz (Greg Ewing)
Date: Wed, 13 Jun 2001 12:02:40 +1200 (NZST)
Subject: [Python-Dev] Adding .decode() method to Unicode
In-Reply-To: <00aa01c0f32c$f4a4b740$0900a8c0@spiff>
Message-ID: <200106130002.MAA03319@s454.cosc.canterbury.ac.nz>

> -1 on anything except a PEP that covers *all* aspects of
> encode/decode (including things that are already implemented)

Particularly, it should clearly explain why we need a completely new
and separate namespace mechanism for these codec things, and provide
a firm rationale for deciding whether any proposed new form of
encoding or decoding should be placed in this namespace or the module
namespace.

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,          | A citizen of NewZealandCorp, a       |
Christchurch, New Zealand          | wholly-owned subsidiary of USA Inc.  |
greg at cosc.canterbury.ac.nz       +--------------------------------------+

From paulp at ActiveState.com  Wed Jun 13 02:32:17 2001
From: paulp at ActiveState.com (Paul Prescod)
Date: Tue, 12 Jun 2001 17:32:17 -0700
Subject: [Python-Dev] Adding Asian codecs to the core
References: <20010612234815.2C90599C82@waltz.rahul.net>
Message-ID: <3B26B491.CA8536BD@ActiveState.com>

Aahz Maruch wrote:
>
>....
> >
> > Tamito corrected me about the size (his file includes the .pyc
> > byte code files): the correct size for the sources is 143kB --
> > almost half of what I initially wrote.
>
> That makes me +0.5, possibly a bit higher.

We really shouldn't consider the Japanese without Chinese and Korean.
And those both seem *larger* than the Japanese. :(

What if we add them to CVS and formally maintain them as part of the
core but distribute them as a separate download?

--
Take a recipe. Leave a recipe.
Python Cookbook! http://www.ActiveState.com/pythoncookbook

From paulp at ActiveState.com  Wed Jun 13 04:25:23 2001
From: paulp at ActiveState.com (Paul Prescod)
Date: Tue, 12 Jun 2001 19:25:23 -0700
Subject: [Python-Dev] Pure Python strptime
Message-ID: <3B26CF13.2A337AC6@ActiveState.com>

Should this strptime implementation be added to the standard library?

http://aspn.activestate.com/ASPN/Python/Cookbook/Recipe/56036

--
Take a recipe. Leave a recipe.
Python Cookbook! http://www.ActiveState.com/pythoncookbook

From paulp at ActiveState.com  Wed Jun 13 04:41:53 2001
From: paulp at ActiveState.com (Paul Prescod)
Date: Tue, 12 Jun 2001 19:41:53 -0700
Subject: [Python-Dev] Adding .decode() method to Unicode
References: <200106130002.MAA03319@s454.cosc.canterbury.ac.nz>
Message-ID: <3B26D2F1.8840FB1A@ActiveState.com>

Greg Ewing wrote:
>
> > -1 on anything except a PEP that covers *all* aspects of
> > encode/decode (including things that are already implemented)
>
> Particularly, it should clearly explain why we need a
> completely new and separate namespace mechanism for these
> codec things,

I don't know whether MAL will write the PEP or not but the rationale
for a new namespace is trivial. The namespace exists and is maintained
by the Internet Assigned Numbers Authority (IANA). You can't work with
Unicode without working with names from this list:

http://www.iana.org/assignments/character-sets

MAL is basically extending it to include names from this list:

http://www.iana.org/assignments/transfer-encodings

and others.
> and provide a firm rationale for deciding > whether any proposed new form of encoding or decoding > should be placed in this namespace or the module namespace. *My* answer would be that any function that has strings (8-bit or Unicode) as both domain and range is potentially a codec. -- Take a recipe. Leave a recipe. Python Cookbook! http://www.ActiveState.com/pythoncookbook From greg at cosc.canterbury.ac.nz Wed Jun 13 06:45:36 2001 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Wed, 13 Jun 2001 16:45:36 +1200 (NZST) Subject: [Python-Dev] Adding .decode() method to Unicode In-Reply-To: <3B26D2F1.8840FB1A@ActiveState.com> Message-ID: <200106130445.QAA03370@s454.cosc.canterbury.ac.nz> Paul Prescod : > The namespace exists and is maintained by > the Internet Assigned Names Association. Hmmm... so, is the only reason that we're not using the module namespace the fact that these names can contain non-alphanumeric characters? Or is there more to it than that? Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg at cosc.canterbury.ac.nz +--------------------------------------+ From skip at pobox.com Wed Jun 13 07:09:38 2001 From: skip at pobox.com (Skip Montanaro) Date: Wed, 13 Jun 2001 00:09:38 -0500 Subject: [Python-Dev] Adding Asian codecs to the core In-Reply-To: <3B26B491.CA8536BD@ActiveState.com> References: <20010612234815.2C90599C82@waltz.rahul.net> <3B26B491.CA8536BD@ActiveState.com> Message-ID: <15142.62866.180570.158325@beluga.mojam.com> Paul> What if we add them to CVS and formally maintain them as part of Paul> the core but distribute them as a separate download? That seems to make sense to me. I suspect most Linux distributions (for example) bundle Python into multiple pieces already. My Mandrake system splits the core into (I think) four pieces. It also bundles several other RPMs for PIL, NumPy, Postgres and RPM. Adding another package for a set of codecs doesn't seem like a big deal. Skip From mal at lemburg.com Wed Jun 13 09:02:05 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Wed, 13 Jun 2001 09:02:05 +0200 Subject: [Python-Dev] Adding Asian codecs to the core References: <20010612234815.2C90599C82@waltz.rahul.net> Message-ID: <3B270FED.8E2A4ECB@lemburg.com> Aahz Maruch wrote: > > M.-A. Lemburg wrote: > > Aahz Maruch wrote: > >> M.-A. Lemburg wrote: > >>> > >>> Tamito KAJIYAMA recently announced that he changed the licenses > >>> on his Japanese codecs from GPL to a BSD variant. This is great > >>> news since this would allow adding the codecs to the Python core > >>> which would certainly attract more users to Python in Asia. > >>> > >>> The codecs are 280kB when compressed as .tar.gz file. > >> > >> +0 > >> > >> I like the idea, am uncomfortable with that amount of space. > > > > Tamito corrected me about the size (his file includes the .pyc > > byte code files): the correct size for the sources is 143kB -- > > almost half of what I initially wrote. > > That makes me +0.5, possibly a bit higher. We will be working on reducing the size of the mapping tables. Can't promise anything, but I believe that Tamito can squeeze them into under 100k using some compression technique (which one is yet to be determined ;). 
-- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From mal at lemburg.com Wed Jun 13 09:05:31 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Wed, 13 Jun 2001 09:05:31 +0200 Subject: [Python-Dev] Adding Asian codecs to the core References: <20010612234815.2C90599C82@waltz.rahul.net> <3B26B491.CA8536BD@ActiveState.com> Message-ID: <3B2710BB.CFD8215@lemburg.com> Paul Prescod wrote: > > Aahz Maruch wrote: > > > >.... > > > > > > Tamito corrected me about the size (his file includes the .pyc > > > byte code files): the correct size for the sources is 143kB -- > > > almost half of what I initially wrote. > > > > That makes me +0.5, possibly a bit higher. > > We really shouldn't consider the Japanese without Chinese and Korean. > And those both seem *larger* than the Japanese. :( Unfortunately, these aren't available under a usable (=non-GPL) license yet. > What if we add them to CVS and formally maintain them as part of the > core but distribute them as a separate download? Good idea. -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From mal at lemburg.com Wed Jun 13 09:17:14 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Wed, 13 Jun 2001 09:17:14 +0200 Subject: [Python-Dev] Adding .decode() method to Unicode References: <200106130002.MAA03319@s454.cosc.canterbury.ac.nz> <3B26D2F1.8840FB1A@ActiveState.com> Message-ID: <3B27137A.E7BFC4EC@lemburg.com> Paul Prescod wrote: > > Greg Ewing wrote: > > > > > -1 on anything except a PEP that covers *all* aspects of > > > encode/decode (including things that are already implemented) > > > > Particularly, it should clearly explain why we need a > > completely new and separate namespace mechanism for these > > codec things, > > I don't know whether MAL will write the PEP or not With the kind of attitude towards the proposed extensions which I am currently getting in this forum, I'd rather spend my time on something more useful. > but the rationale for > a new namespace is trivial. The namespace exists and is maintained by > the Internet Assigned Names Association. You can't work with Unicode > without working with names from this list: > > http://www.iana.org/assignments/character-sets > > MAL is basically exending it to include names from this list: > > http://www.iana.org/assignments/transfer-encodings > > and others. Right. Since these codecs live in the encoding package, I don't think we have a namespace problem here. Codecs which are hooked into the codec registry by the encoding package's search function will have to provide a getregentry() entry point. If this API is not available, the codec won't load. Since the encoding package's search function is using standard Python imports for loading the codecs, we can also benefit from a nice side-effect: codec names can use Python's dotted names (which then map to standard Python packages). This allows codec writers like Tamito to place their codecs into Python package thereby avoiding any conflict with other authors of codecs with similar names. > > and provide a firm rationale for deciding > > whether any proposed new form of encoding or decoding > > should be placed in this namespace or the module namespace. 
> > *My* answer would be that any function that has strings (8-bit or > Unicode) as both domain and range is potentially a codec. Right. (Hey, the first time *we* agree on something ;-) -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From mal at lemburg.com Wed Jun 13 14:53:50 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Wed, 13 Jun 2001 14:53:50 +0200 Subject: [Python-Dev] Weird message to stderr Message-ID: <3B27625E.F18046F7@lemburg.com> Running Python 2.1 using a .pyc file I get these weird messages printed to stderr: run_pyc_file: nested_scopes: 0 These originate in pythonrun.c: static PyObject * run_pyc_file(FILE *fp, char *filename, PyObject *globals, PyObject *locals, PyCompilerFlags *flags) { PyCodeObject *co; PyObject *v; long magic; long PyImport_GetMagicNumber(void); magic = PyMarshal_ReadLongFromFile(fp); if (magic != PyImport_GetMagicNumber()) { PyErr_SetString(PyExc_RuntimeError, "Bad magic number in .pyc file"); return NULL; } (void) PyMarshal_ReadLongFromFile(fp); v = PyMarshal_ReadLastObjectFromFile(fp); fclose(fp); if (v == NULL || !PyCode_Check(v)) { Py_XDECREF(v); PyErr_SetString(PyExc_RuntimeError, "Bad code object in .pyc file"); return NULL; } co = (PyCodeObject *)v; v = PyEval_EvalCode(co, globals, locals); if (v && flags) { if (co->co_flags & CO_NESTED) flags->cf_nested_scopes = 1; fprintf(stderr, "run_pyc_file: nested_scopes: %d\n", flags->cf_nested_scopes); } Py_DECREF(co); return v; } Is this is left over debug printf or should I be warned in some way ? -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From guido at digicool.com Wed Jun 13 16:41:37 2001 From: guido at digicool.com (Guido van Rossum) Date: Wed, 13 Jun 2001 10:41:37 -0400 Subject: [Python-Dev] Re: Adding .decode() method to Unicode In-Reply-To: Your message of "Tue, 12 Jun 2001 22:40:01 EDT." References: Message-ID: <200106131441.KAA16557@cj20424-a.reston1.va.home.com> Wow, this almost looks like a real flamefest. ("Flame" being defined as the presence of metacomments.) (In the following, s is an 8-bit string, u is a Unicode string, and e is an encoding name.) The original design of the encode() methods of string and Unicode objects (in 2.0 and 2.1) is asymmetric, and clearly geared towards Unicode codecs only: to decode an 8-bit string you *have* to use unicode(s, encoding) while to encode a Unicode string into a specific 8-bit encoding you *have* to use u.encode(e). 8-bit strings also have an encode() method: s.encode(e) is the same as unicode(s).encode(e). (This is useful since code that expects Unicode strings should also work when it is passed ASCII-encoded 8-bit strings.) I'd say there's no need for s.decode(e), since this can already be done with unicode(s, e) -- and to me that API looks better since it clearly states that the result is Unicode. We *could* have designed the encoding API similarly: str(u, e) is available, symmetric with unicode(s, e), and a logical extension of str(u) which uses the default encoding. But I accept the argument that u.encode(e) is better because it emphasizes the encoding action, and because it means no API changes to str(). 
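To make the asymmetry concrete, here is a minimal interactive sketch of
the two directions as they stand (assuming the default ASCII encoding;
the values are illustrative only):

    >>> s = "hello"                # 8-bit string
    >>> u = unicode(s, "ascii")    # decode: 8-bit string -> Unicode
    >>> u.encode("utf-8")          # encode: Unicode -> 8-bit string
    'hello'
    >>> s.encode("utf-8")          # same as unicode(s).encode("utf-8")
    'hello'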
I guess what I'm saying here is that 'str' does not give enough of a clue that an encoding action is going on, while 'unicode' *does* give a clue that a decoding action is being done: as soon as you read "Unicode" you think "Mmm, encodings..." -- but "str" is pretty neutral, so u.encode(e) is needed to give a clue. Marc-Andre proposes (and has partially checked in) changes that stretch the meaning of the encode() method, and add a decode() method, to be basically interfaces to anything you can do with the codecs module. The return type of encode() and decode() is now determined by the codec (formerly, encode() always returned an 8-bit string). Some new codecs have been added that do things like gzip and base64. Initially, I liked this, and even contributed a codec. But questions keep coming up. What is the problem being solved? True, the codecs module has a clumsy interface if you just want to invoke a codec on some data. But that can easily be remedied by adding convenience functions encode() and decode() to codecs.py -- which would have the added advantage that it would work for other datatypes that support the buffer interface, e.g. codecs.encode(myPILobject, "base64"). True, the "codec" pattern can be used for other encodings than Unicode. But it seems to me that the entire codecs architecture is rather strongly geared towards en/decoding Unicode, and it's not clear how well other codecs fit in this pattern (e.g. I noticed that all the non-Unicode codecs ignore the error handling parameter or assert that it is set to 'strict'). Is it really right that x.encode("gzip") and x.encode("utf-8") look similar, while the former requires an 8-bit string and the latter only makes sense if x is a Unicode string? Another (minor) issue is that Unicode encoding names are an IANA namespace. Is it wise to add our own names to this? I'm not forcing a decision here, but I do ask that we consider these issues before forging ahead with what might be a mistake. A PEP would be most helpful to focus the discussion. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at digicool.com Wed Jun 13 17:19:03 2001 From: guido at digicool.com (Guido van Rossum) Date: Wed, 13 Jun 2001 11:19:03 -0400 Subject: [Python-Dev] Releasing 2.0.1 Message-ID: <200106131519.LAA16978@cj20424-a.reston1.va.home.com> I think it's now or never with the 2.0.1 release. Moshe seems to have disappeared from the face of the earth. His last mail to me (May 23) suggested that it was good to go except for the SRE checkin and the NEWS file. I did the SRE checkin today (making it identical to what's in 2.1, per /F's recommendation) and added a note about that to the NEWS file -- I wouldn't know what else would be needed there. So I think it's good to go now. I can release a 2.0.1c1 this week (indicating a release candidate) and a final 2.0.1 next week. If you know a good reason why I should hold off on releasing this, or if you have a patch that absolutely should make it into 2.0.1, please let me know NOW! This project is way overdue. (Thomas is ready to release 2.1.1 as soon as this goes out, I believe. :-) --Guido van Rossum (home page: http://www.python.org/~guido/) From fredrik at pythonware.com Wed Jun 13 17:29:19 2001 From: fredrik at pythonware.com (Fredrik Lundh) Date: Wed, 13 Jun 2001 17:29:19 +0200 Subject: [Python-Dev] Releasing 2.0.1 References: <200106131519.LAA16978@cj20424-a.reston1.va.home.com> Message-ID: <023f01c0f41d$9dfb87b0$0900a8c0@spiff> guido wrote: > So I think it's good to go now. 
I can release a 2.0.1c1 this week > (indicating a release candidate) and a final 2.0.1 next week. +1 From skip at pobox.com Wed Jun 13 17:49:58 2001 From: skip at pobox.com (Skip Montanaro) Date: Wed, 13 Jun 2001 10:49:58 -0500 Subject: [Python-Dev] on announcing point releases Message-ID: <15143.35750.837420.376281@beluga.mojam.com> (Just thinking out loud) I wonder if it would help gain wider distribution for the point releases if explicit announcements were sent to the various Linux distributors so they could create updated packages (RPMs, debs, whatever) for their users. On a related note, I see one RedHat email address on python-dev (and one Debian address on python-list). Are there other Linux distributions that are heavy Python users (as opposed to simply packaging it up for inclusion)? If so, perhaps they should be invited to join python-dev. Skip From niemeyer at conectiva.com Wed Jun 13 17:54:08 2001 From: niemeyer at conectiva.com (Gustavo Niemeyer) Date: Wed, 13 Jun 2001 12:54:08 -0300 Subject: [Python-Dev] sre improvements Message-ID: <20010613125408.W13940@tux.distro.conectiva> I'm forwarding this to the dev list.. probably somebody here knows about this... -------------- Hi there!! I have looked into sre, and was wondering if somebody is working to implement more features in it. I'd like, for example, to see the (?(1)blah) operator, available in perl, working. Should I care about this? Should I write some code?? Anybody working in sre currently? Thanks! -- Gustavo Niemeyer [ 2AAC 7928 0FBF 0299 5EB5 60E2 2253 B29A 6664 3A0C ] From skip at pobox.com Wed Jun 13 18:03:58 2001 From: skip at pobox.com (Skip Montanaro) Date: Wed, 13 Jun 2001 11:03:58 -0500 Subject: [Python-Dev] sre improvements In-Reply-To: <20010613125408.W13940@tux.distro.conectiva> References: <20010613125408.W13940@tux.distro.conectiva> Message-ID: <15143.36590.447465.657241@beluga.mojam.com> Gustavo> I'd like, for example, to see the (?(1)blah) operator, Gustavo> available in perl, working. Gustavo, For the non-Perl-heads on the list, can you explain what the (?(1)blah) operator does? -- Skip Montanaro (skip at pobox.com) (847)971-7098 From gregor at mediasupervision.de Wed Jun 13 18:13:17 2001 From: gregor at mediasupervision.de (Gregor Hoffleit) Date: Wed, 13 Jun 2001 18:13:17 +0200 Subject: [Python-Dev] on announcing point releases In-Reply-To: <15143.35750.837420.376281@beluga.mojam.com>; from skip@pobox.com on Wed, Jun 13, 2001 at 10:49:58AM -0500 References: <15143.35750.837420.376281@beluga.mojam.com> Message-ID: <20010613181317.B30006@mediasupervision.de> On Wed, Jun 13, 2001 at 10:49:58AM -0500, Skip Montanaro wrote: > I wonder if it would help gain wider distribution for the point releases if > explicit announcements were sent to the various Linux distributors so they > could create updated packages (RPMs, debs, whatever) for their users. > > On a related note, I see one RedHat email address on python-dev (and one > Debian address on python-list). Are there other Linux distributions that > are heavy Python users (as opposed to simply packaging it up for inclusion)? > If so, perhaps they should be invited to join python-dev. Rest assured that Debian is present on python-dev as well, and nervously looking forward to the maintenance releases ;-) I hope 2.1.1 will make it out in time as well for our next release (being aware that 'before the next Debian release happens' is no very tight timeframe ;-). 
Gregor From guido at digicool.com Wed Jun 13 18:16:42 2001 From: guido at digicool.com (Guido van Rossum) Date: Wed, 13 Jun 2001 12:16:42 -0400 Subject: [Python-Dev] Re: PEP 259: Omit printing newline after newline Message-ID: <200106131616.MAA17468@cj20424-a.reston1.va.home.com> OK, OK, PEP 259 is dead. It seemed a nice idea at the time. :-) Alex and others, if you're serious about implementing print as __print__(), why don't you write a PEP? --Guido van Rossum (home page: http://www.python.org/~guido/) From fdrake at acm.org Wed Jun 13 18:21:20 2001 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Wed, 13 Jun 2001 12:21:20 -0400 (EDT) Subject: [Python-Dev] on announcing point releases In-Reply-To: <20010613181317.B30006@mediasupervision.de> References: <15143.35750.837420.376281@beluga.mojam.com> <20010613181317.B30006@mediasupervision.de> Message-ID: <15143.37632.758887.966026@cj42289-a.reston1.va.home.com> Gregor Hoffleit writes: > looking forward to the maintenance releases ;-) I hope 2.1.1 will make it > out in time as well for our next release (being aware that 'before the next Personally, I see no reason for Thomas to wait for the 2.0.1 release if he doesn't want to. -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From fredrik at pythonware.com Wed Jun 13 18:32:13 2001 From: fredrik at pythonware.com (Fredrik Lundh) Date: Wed, 13 Jun 2001 18:32:13 +0200 Subject: [Python-Dev] sre improvements References: <20010613125408.W13940@tux.distro.conectiva> <15143.36590.447465.657241@beluga.mojam.com> Message-ID: <007801c0f426$84d1f220$4ffa42d5@hagrid> skip wrote: > Gustavo> I'd like, for example, to see the (?(1)blah) operator, > Gustavo> available in perl, working. > > For the non-Perl-heads on the list, can you explain what the (?(1)blah) > operator does? conditionals: (?(cond)true) (?(cond)true|false) where cond is a group number (true if defined) or an assertion pattern, and true/false are patterns. (imo, whoever invented that needs help ;-) From akuchlin at mems-exchange.org Wed Jun 13 18:39:58 2001 From: akuchlin at mems-exchange.org (Andrew Kuchling) Date: Wed, 13 Jun 2001 12:39:58 -0400 Subject: [Python-Dev] sre improvements Message-ID: >For the non-Perl-heads on the list, can you explain what the (?(1)blah) >operator does? Conditionals. From http://www.perl.com/pub/doc/manual/html/pod/perlre.html, (...)(?(1)A|B) will match 'A' if group 1 matched, and B if it didn't. I'm not sure how "matched" is defined, as the Perl docs are vague; judging from the example, it means 'matched something of nonzero length'. Perl 5.6 introduced a bunch of new regex features, but I'm not sure how much we actually *care* about them; they're no doubt useful if regexes are the only tool you've got and you try to do full parsers using them, but they're also complicated to explain and will make the compiler messier. For example, lookaheads can also go into the conditional, not just an integer. (?i) now obeys the scoping from parens, and you can turn it off with (?-i). If Gustavo wants to implement these features and /F approves of his patches, then sure, put them in. But if either of those conditions fails, little will be lost. --amk From dmitry.antipov at auriga.ru Wed Jun 13 18:46:09 2001 From: dmitry.antipov at auriga.ru (dmitry.antipov at auriga.ru) Date: Wed, 13 Jun 2001 20:46:09 +0400 Subject: [Python-Dev] Why not Lisp-like list-related functions ? Message-ID: <3B2798D1.16F832A3@auriga.ru> Hello all, I'm new to Python but quite familiar with Lisp. 
So my question is about Python list-related functions. Why append(), extend(), sort(), reverse() etc. doesn't return a reference to it's own (modified) argument ? IMHO (I'm tweaking Python 2.1 to allow first example possible), >>> [3 + x * 2 for x in [5, 8, 9, 3].sort()].extend([6, 3, 8].reverse()) [9, 13, 19, 21, 8, 3, 6] >>> looks much better (and more "functional") than >>> x = [5, 8, 9, 3] >>> x.sort() >>> x = [3 + x * 2 for x in x] >>> y = [6, 3, 8] >>> y.reverse() >>> x.extend(y) >>> x [9, 13, 19, 21, 8, 3, 6] >>> Python designers and fans, please explain it to me :-). Any comments are welcome. Thanks and reply to me directly if possible, Dmitry Antipov From guido at digicool.com Wed Jun 13 19:01:34 2001 From: guido at digicool.com (Guido van Rossum) Date: Wed, 13 Jun 2001 13:01:34 -0400 Subject: [Python-Dev] Weird message to stderr Message-ID: <200106131701.NAA17619@cj20424-a.reston1.va.home.com> > Running Python 2.1 using a .pyc file I get these weird messages > printed to stderr: > > run_pyc_file: nested_scopes: 0 > > These originate in pythonrun.c: > > static PyObject * > run_pyc_file(FILE *fp, char *filename, PyObject *globals, PyObject *locals, > PyCompilerFlags *flags) > { [...] > if (v && flags) { > if (co->co_flags & CO_NESTED) > flags->cf_nested_scopes = 1; > fprintf(stderr, "run_pyc_file: nested_scopes: %d\n", > flags->cf_nested_scopes); > } > Py_DECREF(co); > return v; > } > > Is this is left over debug printf or should I be warned > in some way ? I'll channel Jeremy... Looks like a debug message -- this code isn't tested by the standard test suite. Feel free to get rid of the fprintf() statement (and no, you don't have to write a PEP for this :-). --Guido van Rossum (home page: http://www.python.org/~guido/) From fredrik at pythonware.com Wed Jun 13 19:06:52 2001 From: fredrik at pythonware.com (Fredrik Lundh) Date: Wed, 13 Jun 2001 19:06:52 +0200 Subject: [Python-Dev] Why not Lisp-like list-related functions ? References: <3B2798D1.16F832A3@auriga.ru> Message-ID: <012d01c0f42b$45453b30$4ffa42d5@hagrid> Dmitry wrote: > I'm new to Python but quite familiar with Lisp. So my question is > about Python list-related functions. Why append(), extend(), sort(), > reverse() etc. doesn't return a reference to it's own (modified) > argument ? doesn't Lisp have a FAQ? ;-) http://www.python.org/doc/FAQ.html#6.20 Q. Why doesn't list.sort() return the sorted list? ... basically, operations that modify an object generally don't return the object itself, to avoid mistakes like: for item in list.reverse(): print item # backwards ... for item in list.reverse(): print item # backwards, or? a slightly more pythonic way would be to add sorted, extended, reversed (etc) -- but that leads to method bloat. in addition, based on studying huge amounts of python code, I doubt cascading list operations would save the world that much typing... followups to python-list at python.org From paulp at ActiveState.com Wed Jun 13 19:22:09 2001 From: paulp at ActiveState.com (Paul Prescod) Date: Wed, 13 Jun 2001 10:22:09 -0700 Subject: [Python-Dev] Adding Asian codecs to the core References: <20010612234815.2C90599C82@waltz.rahul.net> <3B26B491.CA8536BD@ActiveState.com> <3B2710BB.CFD8215@lemburg.com> Message-ID: <3B27A141.6C69EC55@ActiveState.com> "M.-A. Lemburg" wrote: > >... > > > > We really shouldn't consider the Japanese without Chinese and Korean. > > And those both seem *larger* than the Japanese. :( > > Unfortunately, these aren't available under a usable (=non-GPL) > license yet. 
Frank Chen has agreed to make them available under a Python-style license. > > What if we add them to CVS and formally maintain them as part of the > > core but distribute them as a separate download? > > Good idea. All in favour? -- Take a recipe. Leave a recipe. Python Cookbook! http://www.ActiveState.com/pythoncookbook From aahz at rahul.net Wed Jun 13 19:32:24 2001 From: aahz at rahul.net (Aahz Maruch) Date: Wed, 13 Jun 2001 10:32:24 -0700 (PDT) Subject: [Python-Dev] Adding Asian codecs to the core In-Reply-To: <3B27A141.6C69EC55@ActiveState.com> from "Paul Prescod" at Jun 13, 2001 10:22:09 AM Message-ID: <20010613173224.0FFB999C87@waltz.rahul.net> >>> What if we add them to CVS and formally maintain them as part of the >>> core but distribute them as a separate download? >> >> Good idea. > > All in favour? +1 -- --- Aahz (@pobox.com) Hugs and backrubs -- I break Rule 6 <*> http://www.rahul.net/aahz/ Androgynous poly kinky vanilla queer het Pythonista I don't really mind a person having the last whine, but I do mind someone else having the last self-righteous whine. From gward at python.net Wed Jun 13 20:53:20 2001 From: gward at python.net (Greg Ward) Date: Wed, 13 Jun 2001 14:53:20 -0400 Subject: [Python-Dev] sre improvements In-Reply-To: <007801c0f426$84d1f220$4ffa42d5@hagrid>; from fredrik@pythonware.com on Wed, Jun 13, 2001 at 06:32:13PM +0200 References: <20010613125408.W13940@tux.distro.conectiva> <15143.36590.447465.657241@beluga.mojam.com> <007801c0f426$84d1f220$4ffa42d5@hagrid> Message-ID: <20010613145320.G5114@gerg.ca> On 13 June 2001, Fredrik Lundh said: > conditionals: > > (?(cond)true) > (?(cond)true|false) > > where cond is a group number (true if defined) or an assertion > pattern, and true/false are patterns. > > (imo, whoever invented that needs help ;-) I think I'd have to agree with /F on this one... somewhere around Perl 5.003 or 5.004, regexes in Perl went from being a powerful and really cool facility to being a massively overgrown language-within-a-language. I *tried* to use some of the fancy new features a few times out of curiosity, but could never get them to work. (At the time, I think I was a pretty sharp Perl programmer, although I've dulled since then.) Greg -- Greg Ward - Unix bigot gward at python.net http://starship.python.net/~gward/ No animals were harmed in transmitting this message. From jepler at inetnebr.com Wed Jun 13 18:09:58 2001 From: jepler at inetnebr.com (Jeff Epler) Date: Wed, 13 Jun 2001 11:09:58 -0500 Subject: [Python-Dev] sre improvements In-Reply-To: <15143.36590.447465.657241@beluga.mojam.com>; from skip@pobox.com on Wed, Jun 13, 2001 at 11:03:58AM -0500 References: <20010613125408.W13940@tux.distro.conectiva> <15143.36590.447465.657241@beluga.mojam.com> Message-ID: <20010613110957.C29405@inetnebr.com> On Wed, Jun 13, 2001 at 11:03:58AM -0500, Skip Montanaro wrote: > > Gustavo> I'd like, for example, to see the (?(1)blah) operator, > Gustavo> available in perl, working. > > Gustavo, > > For the non-Perl-heads on the list, can you explain what the (?(1)blah) > operator does? from perlre(1): (?(condition)yes-pattern) Conditional expression. (condition) should be either an integer in parentheses (which is valid if the corresponding pair of parentheses matched), or lookahead/lookbehind/evaluate zero- width assertion. Say, m{ ( \( )? [^()]+ (?(1) \) ) }x matches a chunk of non-parentheses, possibly included in parentheses themselves. 
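Transliterated into Python terms, usage might look like the following
(a hypothetical sketch only -- sre does not support (?(...)...) today,
so this just mirrors the Perl semantics quoted above):

    import re

    # The perlre example: a chunk of non-parentheses, possibly wrapped
    # in parentheses. The closing ")" is required only if the opening
    # "(" (group 1) actually matched.
    p = re.compile(r'(\()?[^()]+(?(1)\))')

    p.match('(stuff)')   # matches: group 1 defined, so ")" is required
    p.match('stuff')     # matches: group 1 undefined, ")" not needed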
Jeff From tim.one at home.com Thu Jun 14 08:12:48 2001 From: tim.one at home.com (Tim Peters) Date: Thu, 14 Jun 2001 02:12:48 -0400 Subject: [Python-Dev] Adding .decode() method to Unicode In-Reply-To: <3B2664AD.B560D685@ActiveState.com> Message-ID: [Paul Prescod] > ... > We could argue angels on the head of a pin until the cows come home but > 90% of all Python users think of 8-bit strings as strings of characters. Actually, if you count me, make that 92%. some-things-were-easier-when-python-had-50-users-and-i-was-two- of-them-ly y'rs - tim From paulp at ActiveState.com Thu Jun 14 09:30:19 2001 From: paulp at ActiveState.com (Paul Prescod) Date: Thu, 14 Jun 2001 00:30:19 -0700 Subject: [Python-Dev] sre improvements References: <20010613125408.W13940@tux.distro.conectiva> <15143.36590.447465.657241@beluga.mojam.com> <007801c0f426$84d1f220$4ffa42d5@hagrid> <20010613145320.G5114@gerg.ca> Message-ID: <3B28680B.A46CF171@ActiveState.com> Greg Ward wrote: > >... > > I think I'd have to agree with /F on this one... somewhere around Perl > 5.003 or 5.004, regexes in Perl went from being a powerful and really > cool facility to being a massively overgrown language-within-a-language. > I *tried* to use some of the fancy new features a few times out of > curiosity, but could never get them to work. (At the time, I think I > was a pretty sharp Perl programmer, although I've dulled since then.) I would rather see us try a new approach to regular expressions. I've seen a few proposals for more verbose-but-readable syntaxes. I think one was from Greg Ewing? And maybe one from Ping? For those of us who use regular expressions only once in a while (i.e. the lucky ones), the current syntax is a holy terror. Which characters are magical again? In what contexts? With how many levels of backslashing? Upper case W versus lower case W? Obviously we can never abandon the tried and true Perl5 RE module, but I think we could have another syntax on top. -- Take a recipe. Leave a recipe. Python Cookbook! http://www.ActiveState.com/pythoncookbook From arigo at ulb.ac.be Thu Jun 14 10:58:48 2001 From: arigo at ulb.ac.be (Armin Rigo) Date: Thu, 14 Jun 2001 10:58:48 +0200 (MET DST) Subject: [Python-Dev] Special-casing "O" Message-ID: Hello everybody, For comparison purposes, I implemented the idea of optimizing PyArg_ParseTuple calls by modifying the C code itself. Here is the result: http://homepages.ulb.ac.be/~arigo/pyarg_pp.tgz I did not upload this as a patch at SourceForge for several reasons. The most fundamental is that it raises bootstrapping issues: how can we compile the Python interpreter if we first have to run a Python script on the source files ? Fixing this would make the Makefiles significantly more complex. The other reason is that the METH_O solution is probably still faster, as it often completely avoids to build the 1-tuple of arguments. More serious performance tests might be needed, however. A bientot, Armin. From thomas at xs4all.net Thu Jun 14 13:10:01 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Thu, 14 Jun 2001 13:10:01 +0200 Subject: [Python-Dev] Releasing 2.0.1 In-Reply-To: <200106131519.LAA16978@cj20424-a.reston1.va.home.com> References: <200106131519.LAA16978@cj20424-a.reston1.va.home.com> Message-ID: <20010614131001.B1659@xs4all.nl> On Wed, Jun 13, 2001 at 11:19:03AM -0400, Guido van Rossum wrote: > So I think it's good to go now. I can release a 2.0.1c1 this week > (indicating a release candidate) and a final 2.0.1 next week. +1 here. 
> If you know a good reason why I should hold off on releasing this, or > if you have a patch that absolutely should make it into 2.0.1, please > let me know NOW! This project is way overdue. (Thomas is ready to > release 2.1.1 as soon as this goes out, I believe. :-) Well, not quite, but I can put in a couple of allnighters (I want to do a review of all log-messages since 2.1-final, to see if I missed any checkin messages, and I want to update the NEWS file with a list of bugs fixed) and have it ready in a week or two. I don't think 2.1.1 should be released *that* soon after 2.0.1 anyway. I noticed this in the LICENCE file, by the way: Python 2.1 is a derivative work of Python 1.6.1, as well as of Python 2.0. and 8. By copying, installing or otherwise using Python 2.1, Licensee agrees to be bound by the terms and conditions of this License Agreement. -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From guido at digicool.com Thu Jun 14 13:14:22 2001 From: guido at digicool.com (Guido van Rossum) Date: Thu, 14 Jun 2001 07:14:22 -0400 Subject: [Python-Dev] Re: Why not Lisp-like list-related functions ? Message-ID: <200106141114.HAA25430@cj20424-a.reston1.va.home.com> > Hello all, > > I'm new to Python but quite familiar with Lisp. So my question is > about Python list-related functions. Why append(), extend(), sort(), > reverse() etc. doesn't return a reference to it's own (modified) > argument ? IMHO (I'm tweaking Python 2.1 to allow first example > possible), > > >>> [3 + x * 2 for x in [5, 8, 9, 3].sort()].extend([6, 3, 8].reverse()) > [9, 13, 19, 21, 8, 3, 6] > >>> > > looks much better (and more "functional") than > > >>> x = [5, 8, 9, 3] > >>> x.sort() > >>> x = [3 + x * 2 for x in x] > >>> y = [6, 3, 8] > >>> y.reverse() > >>> x.extend(y) > >>> x > [9, 13, 19, 21, 8, 3, 6] > >>> > > Python designers and fans, please explain it to me :-). > Any comments are welcome. > > Thanks and reply to me directly if possible, > Dmitry Antipov Funny, to me your first form is much harder to read than your second. With the first form, I have to stop and think and look carefully at where the brackets are to see in which order the operations are executed, while in the second form it's obvious, because it's broken down in smaller chunks. So I guess that's the real reason: Python users have a procedural brain, not a functional brain, and we don't like Lispish code. Maybe we also have a smaller brain than the typical Lisper -- I would say, that would make us more normal, and if Python caters to people with a closer-to-average brain size, that would mean more people will be able to program in Python. History will decide... --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at digicool.com Thu Jun 14 13:31:16 2001 From: guido at digicool.com (Guido van Rossum) Date: Thu, 14 Jun 2001 07:31:16 -0400 Subject: [Python-Dev] Adding Asian codecs to the core Message-ID: <200106141131.HAA25522@cj20424-a.reston1.va.home.com> > > > What if we add them to CVS and formally maintain them as part of the > > > core but distribute them as a separate download? > > > > Good idea. > > All in favour? +1, as long as they're not in the CVS subtree that's normally extracted for a regular source distribution. I propose this location in the CVS tree: python/dist/encodings/... (So 'encodings' would be a sibling of 'src', which has been pretty lonely ever since I started using CVS. 
;-) --Guido van Rossum (home page: http://www.python.org/~guido/) From akuchlin at mems-exchange.org Thu Jun 14 17:19:28 2001 From: akuchlin at mems-exchange.org (Andrew Kuchling) Date: Thu, 14 Jun 2001 11:19:28 -0400 Subject: [Python-Dev] Re: Why not Lisp-like list-related functions ? In-Reply-To: <200106141114.HAA25430@cj20424-a.reston1.va.home.com>; from guido@digicool.com on Thu, Jun 14, 2001 at 07:14:22AM -0400 References: <200106141114.HAA25430@cj20424-a.reston1.va.home.com> Message-ID: <20010614111928.A4560@ute.cnri.reston.va.us> On Thu, Jun 14, 2001 at 07:14:22AM -0400, Guido van Rossum wrote: >Maybe we also have a smaller brain than the typical Lisper -- I would >say, that would make us more normal, and if Python caters to people >with a closer-to-average brain size, that would mean more people will >be able to program in Python. History will decide... I thought it already has, pretty much. --amk From tim at digicool.com Thu Jun 14 18:49:07 2001 From: tim at digicool.com (Tim Peters) Date: Thu, 14 Jun 2001 12:49:07 -0400 Subject: [Python-Dev] PEP 255: Simple Generators Message-ID: You can view an HTML version of PEP 255 here: http://python.sourceforge.net/peps/pep-0255.html Discussion should take place primarily on the Python Iterators list: mailto:python-iterators at lists.sourceforge.net If replying directly to this message, please remove (at least) Python-Dev and Python-Announce. PEP: 255 Title: Simple Generators Version: $Revision: 1.3 $ Author: nas at python.ca (Neil Schemenauer), tim.one at home.com (Tim Peters), magnus at hetland.org (Magnus Lie Hetland) Discussion-To: python-iterators at lists.sourceforge.net Status: Draft Type: Standards Track Requires: 234 Created: 18-May-2001 Python-Version: 2.2 Post-History: 14-Jun-2001 Abstract This PEP introduces the concept of generators to Python, as well as a new statement used in conjunction with them, the "yield" statement. Motivation When a producer function has a hard enough job that it requires maintaining state between values produced, most programming languages offer no pleasant and efficient solution beyond adding a callback function to the producer's argument list, to be called with each value produced. For example, tokenize.py in the standard library takes this approach: the caller must pass a "tokeneater" function to tokenize(), called whenever tokenize() finds the next token. This allows tokenize to be coded in a natural way, but programs calling tokenize are typically convoluted by the need to remember between callbacks which token(s) were seen last. The tokeneater function in tabnanny.py is a good example of that, maintaining a state machine in global variables, to remember across callbacks what it has already seen and what it hopes to see next. This was difficult to get working correctly, and is still difficult for people to understand. Unfortunately, that's typical of this approach. An alternative would have been for tokenize to produce an entire parse of the Python program at once, in a large list. Then tokenize clients could be written in a natural way, using local variables and local control flow (such as loops and nested if statements) to keep track of their state. 
But this isn't practical: programs can be very large, so no a priori bound can be placed on the memory needed to materialize the whole parse; and some tokenize clients only want to see whether something specific appears early in the program (e.g., a future statement, or, as is done in IDLE, just the first indented statement), and then parsing the whole program first is a severe waste of time. Another alternative would be to make tokenize an iterator[1], delivering the next token whenever its .next() method is invoked. This is pleasant for the caller in the same way a large list of results would be, but without the memory and "what if I want to get out early?" drawbacks. However, this shifts the burden on tokenize to remember *its* state between .next() invocations, and the reader need only glance at tokenize.tokenize_loop() to realize what a horrid chore that would be. Or picture a recursive algorithm for producing the nodes of a general tree structure: to cast that into an iterator framework requires removing the recursion manually and maintaining the state of the traversal by hand. A fourth option is to run the producer and consumer in separate threads. This allows both to maintain their states in natural ways, and so is pleasant for both. Indeed, Demo/threads/Generator.py in the Python source distribution provides a usable synchronized-communication class for doing that in a general way. This doesn't work on platforms without threads, though, and is very slow on platforms that do (compared to what is achievable without threads). A final option is to use the Stackless[2][3] variant implementation of Python instead, which supports lightweight coroutines. This has much the same programmatic benefits as the thread option, but is much more efficient. However, Stackless is a controversial rethinking of the Python core, and it may not be possible for Jython to implement the same semantics. This PEP isn't the place to debate that, so suffice it to say here that generators provide a useful subset of Stackless functionality in a way that fits easily into the current CPython implementation, and is believed to be relatively straightforward for other Python implementations. That exhausts the current alternatives. Some other high-level languages provide pleasant solutions, notably iterators in Sather[4], which were inspired by iterators in CLU; and generators in Icon[5], a novel language where every expression "is a generator". There are differences among these, but the basic idea is the same: provide a kind of function that can return an intermediate result ("the next value") to its caller, but maintaining the function's local state so that the function can be resumed again right where it left off. A very simple example: def fib(): a, b = 0, 1 while 1: yield b a, b = b, a+b When fib() is first invoked, it sets a to 0 and b to 1, then yields b back to its caller. The caller sees 1. When fib is resumed, from its point of view the yield statement is really the same as, say, a print statement: fib continues after the yield with all local state intact. a and b then become 1 and 1, and fib loops back to the yield, yielding 1 to its invoker. And so on. From fib's point of view it's just delivering a sequence of results, as if via callback. But from its caller's point of view, the fib invocation is an iterable object that can be resumed at will. 
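To see the caller's side concretely, here is a minimal sketch of driving fib() by hand; .next() is the generator-iterator method described under "Specification" below, and each call resumes fib exactly where it left off:

        g = fib()
        print g.next()   # prints 1
        print g.next()   # prints 1
        print g.next()   # prints 2
        print g.next()   # prints 3

No callback, materialized list, or thread is involved: each .next() call runs fib just until its next yield.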
As in the thread approach, this allows both sides to be coded in the most natural ways; but unlike the thread approach, this can be done efficiently and on all platforms. Indeed, resuming a generator should be no more expensive than a function call. The same kind of approach applies to many producer/consumer functions. For example, tokenize.py could yield the next token instead of invoking a callback function with it as argument, and tokenize clients could iterate over the tokens in a natural way: a Python generator is a kind of Python iterator[1], but of an especially powerful kind. Specification A new statement is introduced: yield_stmt: "yield" expression_list "yield" is a new keyword, so a future statement[8] is needed to phase this in. [XXX spell this out] The yield statement may only be used inside functions. A function that contains a yield statement is called a generator function. When a generator function is called, the actual arguments are bound to function-local formal argument names in the usual way, but no code in the body of the function is executed. Instead a generator-iterator object is returned; this conforms to the iterator protocol[6], so in particular can be used in for-loops in a natural way. Note that when the intent is clear from context, the unqualified name "generator" may be used to refer either to a generator-function or a generator- iterator. Each time the .next() method of a generator-iterator is invoked, the code in the body of the generator-function is executed until a yield or return statement (see below) is encountered, or until the end of the body is reached. If a yield statement is encountered, the state of the function is frozen, and the value of expression_list is returned to .next()'s caller. By "frozen" we mean that all local state is retained, including the current bindings of local variables, the instruction pointer, and the internal evaluation stack: enough information is saved so that the next time .next() is invoked, the function can proceed exactly as if the yield statement were just another external call. A generator function can also contain return statements of the form: "return" Note that an expression_list is not allowed on return statements in the body of a generator (although, of course, they may appear in the bodies of non-generator functions nested within the generator). When a return statement is encountered, nothing is returned, but a StopIteration exception is raised, signalling that the iterator is exhausted. The same is true if control flows off the end of the function. Note that return means "I'm done, and have nothing interesting to return", for both generator functions and non-generator functions. Example # A binary tree class. class Tree: def __init__(self, label, left=None, right=None): self.label = label self.left = left self.right = right def __repr__(self, level=0, indent=" "): s = level*indent + `self.label` if self.left: s = s + "\n" + self.left.__repr__(level+1, indent) if self.right: s = s + "\n" + self.right.__repr__(level+1, indent) return s def __iter__(self): return inorder(self) # Create a Tree from a list. def tree(list): n = len(list) if n == 0: return [] i = n / 2 return Tree(list[i], tree(list[:i]), tree(list[i+1:])) # A recursive generator that generates Tree leaves in in-order. def inorder(t): if t: for x in inorder(t.left): yield x yield t.label for x in inorder(t.right): yield x # Show it off: create a tree. t = tree("ABCDEFGHIJKLMNOPQRSTUVWXYZ") # Print the nodes of the tree in in-order. 
for x in t: print x, print # A non-recursive generator. def inorder(node): stack = [] while node: while node.left: stack.append(node) node = node.left yield node.label while not node.right: try: node = stack.pop() except IndexError: return yield node.label node = node.right # Exercise the non-recursive generator. for x in t: print x, print Q & A Q. Why a new keyword? Why not a builtin function instead? A. Control flow is much better expressed via keyword in Python, and yield is a control construct. It's also believed that efficient implementation in Jython requires that the compiler be able to determine potential suspension points at compile-time, and a new keyword makes that easy. Reference Implementation A preliminary patch against the CVS Python source is available[7]. Footnotes and References [1] PEP 234, http://python.sf.net/peps/pep-0234.html [2] http://www.stackless.com/ [3] PEP 219, http://python.sf.net/peps/pep-0219.html [4] "Iteration Abstraction in Sather" Murer , Omohundro, Stoutamire and Szyperski http://www.icsi.berkeley.edu/~sather/Publications/toplas.html [5] http://www.cs.arizona.edu/icon/ [6] The concept of iterators is described in PEP 234 http://python.sf.net/peps/pep-0234.html [7] http://python.ca/nas/python/generator.diff [8] http://python.sf.net/peps/pep-0236.html Copyright This document has been placed in the public domain. From guido at digicool.com Thu Jun 14 19:30:42 2001 From: guido at digicool.com (Guido van Rossum) Date: Thu, 14 Jun 2001 13:30:42 -0400 Subject: [Python-Dev] Python 2.0.1c1 - GPL-compatible release candidate Message-ID: <200106141730.f5EHUgX03621@odiug.digicool.com> With a sigh of relief I announce Python 2.0.1c1 -- the first Python release in a long time whose license is fully compatible with the GPL: http://www.python.org/2.0.1/ I thank Moshe Zadka who did almost all of the work to make this a useful bugfix release, and then went incommunicado for several weeks. (I hope you're OK, Moshe!) Note that this is a release candidate. We don't expect any problems, but we're being careful nevertheless. We're planning to do the final release of 2.0.1 a week from now; expect it to be identical to the release candidate except for some dotted i's and crossed t's. Python 2.0 users should be able to replace their 2.0 installation with the 2.0.1 release without any ill effects; apart from the license change, we've only fixed bugs that didn't require us to make feature changes. The SRE package (regular expression matching, used by the "re" module) was brought in line with the version distributed with Python 2.1; this is stable feature-wise but much improved bug-wise. For the full scoop, see the release notes on SourceForge: http://sourceforge.net/project/shownotes.php?release_id=39267 Python 2.1 users can ignore this release, unless they have an urgent need for a GPL-compatible Python version and are willing to downgrade. Rest assured that we're planning a bugfix release there too: I expect that Python 2.1.1 will be released within a month, with the same GPL-compatible license. (Right, Thomas?) We don't intend to build RPMs for 2.0.1. If someone else is interested in doing so, we can link to them. 
--Guido van Rossum (home page: http://www.python.org/~guido/) From fredrik at pythonware.com Thu Jun 14 13:46:25 2001 From: fredrik at pythonware.com (Fredrik Lundh) Date: Thu, 14 Jun 2001 13:46:25 +0200 Subject: [Python-Dev] recognizing \u escapes in regular expressions Message-ID: <02db01c0f4c7$a491c620$0900a8c0@spiff> during a late hacking pass, I was perplexed to realize that r"[\u0000-\uffff]" didn't match any unicode character, and reported it as bug #420011. but a few minutes later, I realized that SRE doesn't support \u and \U escapes at all -- and that the pattern u"[\u0000-\uffff]" works as expected. should I close the bug report, or turn it into a feature request? From fredrik at pythonware.com Thu Jun 14 13:52:26 2001 From: fredrik at pythonware.com (Fredrik Lundh) Date: Thu, 14 Jun 2001 13:52:26 +0200 Subject: [Python-Dev] Adding Asian codecs to the core References: <20010612234815.2C90599C82@waltz.rahul.net> <3B26B491.CA8536BD@ActiveState.com> <3B2710BB.CFD8215@lemburg.com> <3B27A141.6C69EC55@ActiveState.com> Message-ID: <02ef01c0f4c8$7bdc8520$0900a8c0@spiff> Paul wrote: > > > What if we add them to CVS and formally maintain them as part of the > > > core but distribute them as a separate download? > > > > Good idea. > > All in favour? +0.5 I still think adding them to the core is okay, but that's me. Cheers /F From gward at python.net Thu Jun 14 22:11:49 2001 From: gward at python.net (Greg Ward) Date: Thu, 14 Jun 2001 16:11:49 -0400 Subject: [Python-Dev] sre improvements In-Reply-To: <3B28680B.A46CF171@ActiveState.com>; from paulp@ActiveState.com on Thu, Jun 14, 2001 at 12:30:19AM -0700 References: <20010613125408.W13940@tux.distro.conectiva> <15143.36590.447465.657241@beluga.mojam.com> <007801c0f426$84d1f220$4ffa42d5@hagrid> <20010613145320.G5114@gerg.ca> <3B28680B.A46CF171@ActiveState.com> Message-ID: <20010614161149.C9884@gerg.ca> On 14 June 2001, Paul Prescod said: > I would rather see us try a new approach to regular expressions. I've > seen a few proposals for more verbose-but-readable syntaxes. I think one > was from Greg Ewing? And maybe one from Ping? I remember Ping's from a few years back. It was pretty cool, but awfully verbose. I *like* the compactness of the One True Regex Language (ie. the one implemented by Perl 5, PCRE, and SRE). > For those of us who use regular expressions only once in a while (i.e. > the lucky ones), the current syntax is a holy terror. Which characters > are magical again? In what contexts? With how many levels of > backslashing? Upper case W versus lower case W? Wow, you should try keeping grep vs. egrep vs. sed vs. awk (which version again?) vs. emacs straight. I generally don't bother: as soon as a problem gets too hairy for grep/sed/awk/etc., I whip out my trusty old friend "perl -e" and all is well again. Unless I'm already coding in Python of course, in which case I whip out my trusty old friend re.compile(), and everything just works. I guess I just have a good memory for line noise. > Obviously we can never abandon the tried and true Perl5 RE module, but I > think we could have another syntax on top. Yeah, I s'pose it could be useful. Yet another great teaching tool, at any rate. Greg -- Greg Ward - Python bigot gward at python.net http://starship.python.net/~gward/ Quick!! Act as if nothing has happened!
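An aside on the verbosity debate in this thread: sre already supports a middle ground via the re.VERBOSE compile flag, which lets whitespace and comments be embedded in the One True syntax. A small illustration, with a pattern made up for the example:

    import re

    # Compact form:
    compact = re.compile(r"(\d+)\.(\d*)")

    # Same pattern, annotated, via re.VERBOSE:
    verbose = re.compile(r"""
        (\d+)    # integral part
        \.       # literal decimal point
        (\d*)    # optional fractional digits
    """, re.VERBOSE)

    assert compact.match("3.14").groups() == verbose.match("3.14").groups()

This doesn't change which characters are magical, of course; it only makes larger patterns easier to annotate.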
From greg at cosc.canterbury.ac.nz Fri Jun 15 02:56:50 2001 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Fri, 15 Jun 2001 12:56:50 +1200 (NZST) Subject: [Python-Dev] sre improvements In-Reply-To: <20010614161149.C9884@gerg.ca> Message-ID: <200106150056.MAA03621@s454.cosc.canterbury.ac.nz> Paul Prescod: > I think one > was from Greg Ewing? And maybe one from Ping? I can't remember what my first proposal (many years ago now) was like, but you might like to look at what I'm using in my Plex module: http://www.cosc.canterbury.ac.nz/~greg/python/Plex Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg at cosc.canterbury.ac.nz +--------------------------------------+ From paulp at ActiveState.com Fri Jun 15 03:36:13 2001 From: paulp at ActiveState.com (Paul Prescod) Date: Thu, 14 Jun 2001 18:36:13 -0700 Subject: [Python-Dev] sre improvements References: <200106150056.MAA03621@s454.cosc.canterbury.ac.nz> Message-ID: <3B29668D.ADFB3C22@ActiveState.com> Greg Ewing wrote: > > Paul Prescod: > > > I think one > > was from Greg Ewing? And maybe one from Ping? > > I can't remember what my first proposal (many years ago > now) was like, but you might like to look at what I'm > using in my Plex module: > > http://www.cosc.canterbury.ac.nz/~greg/python/Plex I would be interested in *both* your regular expression library and your lexer for the Python standard library. But separately. Maybe we need two short PEPs that point to the documentation and suggest how the two packages could be integrated into the standard library. What do you think? -- Take a recipe. Leave a recipe. Python Cookbook! http://www.ActiveState.com/pythoncookbook From greg at cosc.canterbury.ac.nz Fri Jun 15 03:49:04 2001 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Fri, 15 Jun 2001 13:49:04 +1200 (NZST) Subject: [Python-Dev] sre improvements In-Reply-To: <3B29668D.ADFB3C22@ActiveState.com> Message-ID: <200106150149.NAA03631@s454.cosc.canterbury.ac.nz> > I would be interested in *both* your regular expression library and your > lexer for the Python standard library. But separately. Well, the regular expressions aren't really a separable part of Plex. I mentioned it as a possible source of ideas for anyone working on a new syntax for the regexp stuff. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg at cosc.canterbury.ac.nz +--------------------------------------+ From mal at lemburg.com Fri Jun 15 09:58:47 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Fri, 15 Jun 2001 09:58:47 +0200 Subject: [Python-Dev] Adding Asian codecs to the core References: <20010612234815.2C90599C82@waltz.rahul.net> <3B26B491.CA8536BD@ActiveState.com> <3B2710BB.CFD8215@lemburg.com> <3B27A141.6C69EC55@ActiveState.com> <02ef01c0f4c8$7bdc8520$0900a8c0@spiff> Message-ID: <3B29C037.FB1DB6B8@lemburg.com> Fredrik Lundh wrote: > > Paul wrote: > > > > > What if we add them to CVS and formally maintain them as part of the > > > > core but distribute them as a separate download? > > > > > > Good idea. > > > > All in favour? > > +0.5 > > I still think adding them to the core is okay, but that's me. What would be the threshold for doing so ? 
Tamito is actively working on reducing the table sizes of the codecs and after what I have seen you do on these sorts of tables I am pretty sure Tamito can turn these tables into shared libs which are smaller than 200k. -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From MarkH at ActiveState.com Fri Jun 15 10:05:26 2001 From: MarkH at ActiveState.com (Mark Hammond) Date: Fri, 15 Jun 2001 18:05:26 +1000 Subject: [Python-Dev] Adding Asian codecs to the core In-Reply-To: <3B29C037.FB1DB6B8@lemburg.com> Message-ID: > > I still think adding them to the core is okay, but that's me. > > What would be the threshold for doing so ? > > Tamito is actively working on reducing the table sizes of the the > codecs and after what I have seen you do on these sort of tables I > am pretty sure Tamito can turn these tables into shared libs which are > smaller than 200k. But isn't this set only one of the many possible Asian codecs? I would have no objection to one 200k module, but if we really wanted to handle "asian codecs" I believe this is only the start. For this reason, I would give a -0 to adding these to the core, and a +1 to adding them to the directory structure proposed by Guido. Mark. From guido at digicool.com Fri Jun 15 18:59:40 2001 From: guido at digicool.com (Guido van Rossum) Date: Fri, 15 Jun 2001 12:59:40 -0400 Subject: [Python-Dev] recognizing \u escapes in regular expressions Message-ID: <200106151659.MAA30396@cj20424-a.reston1.va.home.com> > during a late hacking pass, I was perplexed to realized that > r"[\u0000-\uffff]" didn't match any unicode character, and reported > it as bug #420011. > > but a few minutes later, I realized that SRE doesn't support \u and > \U escapes at all -- and that the pattern u"[\u0000-\uffff]" works > as expected. > > should I close the bug report, or turn it into a feature request? > > You meant ur"[\u0000-\uffff]", right? (It works the same -- Unicode raw strings still do \u expansion, although the rationale escapes me at the moment -- as does the rationale for why ru"..." is a syntax error...) Looks like a feature request to me. Since \000 and \x00 work in that context, \u0000 would be expected to work. And suppose someone uses u"[\u0000-\u005d]"... --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at digicool.com Fri Jun 15 21:00:26 2001 From: guido at digicool.com (Guido van Rossum) Date: Fri, 15 Jun 2001 15:00:26 -0400 Subject: [Python-Dev] gen-branch: CVS branch for Neil's generator patch Message-ID: <200106151900.PAA31935@cj20424-a.reston1.va.home.com> I've checked Neil's latest generator patch into a branch of the CVS tree. That makes it (hopefully) easier for folks to play with. Tim, can you update the PEP to point to this branch? (There's some boilerplate code about branches in PEP 252 or 253 that you could adapt.) I had to change the code in ceval.c because of recent conflicting changes there. The test suite runs (except test_inspect), but I'd appreciate it if someone (Neil?) could make sure that I didn't overlook anything. (I should probably check the CVS logs. :-) --Guido van Rossum (home page: http://www.python.org/~guido/) PS. If you saw a checkin of Grammar/Grammar in the *head* branch, that was a mistake, and I've already corrected it.
From paulp at ActiveState.com Fri Jun 15 21:19:08 2001 From: paulp at ActiveState.com (Paul Prescod) Date: Fri, 15 Jun 2001 12:19:08 -0700 Subject: [Python-Dev] Adding Asian codecs to the core References: <20010612234815.2C90599C82@waltz.rahul.net> <3B26B491.CA8536BD@ActiveState.com> <3B2710BB.CFD8215@lemburg.com> <3B27A141.6C69EC55@ActiveState.com> <02ef01c0f4c8$7bdc8520$0900a8c0@spiff> <3B29C037.FB1DB6B8@lemburg.com> Message-ID: <3B2A5FAC.C5089CC2@ActiveState.com> "M.-A. Lemburg" wrote: > >... > > What would be the threshold for doing so ? > > Tamito is actively working on reducing the table sizes of the the > codecs and after what I have seen you do on these sort of tables I > am pretty sure Tamito can turn these tables into shared libs which are > smaller than 200k. Don't forget Chinese (Taiwan and mainland) and Korean! I guess I don't see the big deal in making them separate downloads. We can use distutils to make them easy to install .exe's for Reference Python and PPM for ActivePython. -- Take a recipe. Leave a recipe. Python Cookbook! http://www.ActiveState.com/pythoncookbook From mal at lemburg.com Fri Jun 15 22:05:47 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Fri, 15 Jun 2001 22:05:47 +0200 Subject: [Python-Dev] Adding Asian codecs to the core References: <20010612234815.2C90599C82@waltz.rahul.net> <3B26B491.CA8536BD@ActiveState.com> <3B2710BB.CFD8215@lemburg.com> <3B27A141.6C69EC55@ActiveState.com> <02ef01c0f4c8$7bdc8520$0900a8c0@spiff> <3B29C037.FB1DB6B8@lemburg.com> <3B2A5FAC.C5089CC2@ActiveState.com> Message-ID: <3B2A6A9B.AC156262@lemburg.com> Paul Prescod wrote: > > "M.-A. Lemburg" wrote: > > > >... > > > > What would be the threshold for doing so ? > > > > Tamito is actively working on reducing the table sizes of the the > > codecs and after what I have seen you do on these sort of tables I > > am pretty sure Tamito can turn these tables into shared libs which are > > smaller than 200k. > > Don't forget Chinese (Taiwan and mainland) and Korean! > > I guess I don't see the big deal in making them separate downloads. We > can use distutils to make them easy to install .exe's for Reference > Python and PPM for ActivePython. Ok. BTW, how come www.python.org no longer provides precompiled (contributed) binaries for the various OSes out there ? The FTP server only has these for Python <= 1.5.2. -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From tim.one at home.com Fri Jun 15 23:39:42 2001 From: tim.one at home.com (Tim Peters) Date: Fri, 15 Jun 2001 17:39:42 -0400 Subject: [Python-Dev] gen-branch: CVS branch for Neil's generator patch In-Reply-To: <200106151900.PAA31935@cj20424-a.reston1.va.home.com> Message-ID: [Guido] > I've checked in Neil's latest generator patch into a branch of the CVS > tree. That makes it (hopefully) easier for folks to play with. It will for me, and I thank you. > Tim, can you update the PEP to point to this branch? Done. From martin at loewis.home.cs.tu-berlin.de Sat Jun 16 00:17:49 2001 From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Sat, 16 Jun 2001 00:17:49 +0200 Subject: [Python-Dev] recognizing \u escapes in regular expressions Message-ID: <200106152217.f5FMHnI01360@mira.informatik.hu-berlin.de> > should I close the bug report, or turn it into a feature request? I think the bug report can be closed. 
Myself, I found it sufficient that you can write normal \u escapes in strings, in particular as you can also use them in raw strings: >>> ur"Ha\u006Clo" u'Hallo' Perhaps not very intuitive, and perhaps even a bug (how do you put a backslash in front of a "u" in a raw unicode string), but useful in this context. Regards, Martin From guido at digicool.com Sat Jun 16 17:46:14 2001 From: guido at digicool.com (Guido van Rossum) Date: Sat, 16 Jun 2001 11:46:14 -0400 Subject: [Python-Dev] 2.0.1's GPL-compatibility is official! Message-ID: <200106161546.LAA05521@cj20424-a.reston1.va.home.com> Richard Stallman, Eben Moglen and the FSF agree: Python 2.0.1 is compatible with the GPL. They've updated the text about the Python license on http://www.gnu.org/philosophy/license-list.html, stating in particular: GPL-Compatible, Free Software Licenses [...] The License of Python 1.6a2 and earlier versions. This is a free software license and is compatible with the GNU GPL. Please note, however, that newer versions of Python are under other licenses (see below). The License of Python 2.0.1, 2.1.1, and newer versions. This is a free software license and is compatible with the GNU GPL. Please note, however, that intermediate versions of Python (1.6b1, through 2.0 and 2.1) are under a different license (see below). I would like to emphasize and clarify (again!) that Python is *not* released under the GPL, so if you think the GPL is a bad thing, you don't have to worry about Python being contaminated. The GPL compatibility is important for folks who distribute Python binaries: e.g. the new license makes it okay to release Python binaries linked with GNU readline and other GPL-covered libraries. We'll release the final release of 2.0.1 within a week; so far we've had only one bug reported in the release candidate. I expect that we won't have to wait long for 2.1.1, which will have the same GPL-compatible license as 2.0.1. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at digicool.com Sat Jun 16 18:10:27 2001 From: guido at digicool.com (Guido van Rossum) Date: Sat, 16 Jun 2001 12:10:27 -0400 Subject: [Python-Dev] contributed binaries (was: Adding Asian codecs...) Message-ID: <200106161610.MAA05684@cj20424-a.reston1.va.home.com> > BTW, how come www.python.org no longer provides precompiled > (contributed) binaries for the various OSes out there ? > The FTP server only has these for Python <= 1.5.2. There are some binaries for newer versions, mostly Linux RPMs, but these are in different places. I agree the FTP download area is a mess. I propose to give up on the FTP area and start over on the new Zope-based web server, if and when it's ready. Not enough people are helping out, so it's going slowly. --Guido van Rossum (home page: http://www.python.org/~guido/) From mal at lemburg.com Sat Jun 16 20:59:52 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Sat, 16 Jun 2001 20:59:52 +0200 Subject: [Python-Dev] recognizing \u escapes in regular expressions References: <200106152217.f5FMHnI01360@mira.informatik.hu-berlin.de> Message-ID: <3B2BACA7.CDA96737@lemburg.com> "Martin v. Loewis" wrote: > > > should I close the bug report, or turn it into a feature request? > > I think the bug report can be closed. 
Myself, I found it sufficient > that you can write normal \u escapes in strings, in particular as you > can also use them in raw strings: > > >>> ur"Ha\u006Clo" > u'Hallo' > > Perhaps not very intuitive, and perhaps even a bug (how do you put a > backslash in front of a "u" in a raw unicode string), but useful in > this context. >>> print ur"backslash in front of an 'u': \u005cu" backslash in front of an 'u': \u A double backslash is easier to have: >>> print ur"double backslash in front of an 'u': \\u" double backslash in front of an 'u': \\u Python uses C's convention for \uXXXX where \u is only interpreted as a Unicode escape if it is used with an odd number of backslashes in front of it. -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From tim.one at home.com Mon Jun 18 02:57:53 2001 From: tim.one at home.com (Tim Peters) Date: Sun, 17 Jun 2001 20:57:53 -0400 Subject: [Python-Dev] Re: Why not Lisp-like list-related functions ? In-Reply-To: <20010614111928.A4560@ute.cnri.reston.va.us> Message-ID: [Guido] > Maybe we also have a smaller brain than the typical Lisper -- I would > say, that would make us more normal, and if Python caters to people > with a closer-to-average brain size, that would mean more people will > be able to program in Python. History will decide... [Andrew Kuchling] > I thought it already has, pretty much. OK, I've kept quiet for days, but can't bear it any longer: Andrew, are you waiting for someone to *force* you to immortalize this exchange in your Python Quotes collection? If so, the PSU knows where you liv From mal at lemburg.com Mon Jun 18 12:14:04 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Mon, 18 Jun 2001 12:14:04 +0200 Subject: [Python-Dev] Adding Asian codecs to the core References: <200106141131.HAA25522@cj20424-a.reston1.va.home.com> Message-ID: <3B2DD46C.EEC20857@lemburg.com> Guido van Rossum wrote: > > > > > What if we add them to CVS and formally maintain them as part of the > > > > core but distribute them as a separate download? > > > > > > Good idea. > > > > All in favour? > > +1, as long as they're not in the CVS subtree that's normally > extracted for a regular source distribution. I propose this location > in the CVS tree: > > python/dist/encodings/... > > (So 'encodings' would be a sibling of 'src', which has been pretty > lonely ever since I started using CVS. ;-) Ok. When Tamito has completed his work on the codecs (he is currently reimplementing them in C), I'll check them in under the new directory. BTW, how should we ship these codecs ? I'd propose to provide a distutils setup.py file which wraps up all codecs under encodings and can be used to create a standard Python add-on "Python-X.X Encoding Add-on". The generated files should then ideally be published right next to the Python source/binary links on the python.org web-pages to achieve high visibility. -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From guido at digicool.com Mon Jun 18 14:25:35 2001 From: guido at digicool.com (Guido van Rossum) Date: Mon, 18 Jun 2001 08:25:35 -0400 Subject: [Python-Dev] Adding Asian codecs to the core In-Reply-To: Your message of "Mon, 18 Jun 2001 12:14:04 +0200."
<3B2DD46C.EEC20857@lemburg.com> References: <200106141131.HAA25522@cj20424-a.reston1.va.home.com> <3B2DD46C.EEC20857@lemburg.com> Message-ID: <200106181225.IAA15518@cj20424-a.reston1.va.home.com> > Ok. When Tamito has completed his work on the codecs (he is currently > reimplementing them in C), I'll check them in under the new directory. Excellent! > BTW, how should we ship these codecs ? > > I'd propose to provide a distutils setup.py file which wraps up > all codecs under encodings and can be used to create a standard > Python add-on "Python-X.X Encoding Add-on". Sounds like a good plan. > The generated files should then ideally be published right next > to the Python source/binary links on the python.org web-pages to > achieve high visibility. Sure, for some definition of "right next to" :-) --Guido van Rossum (home page: http://www.python.org/~guido/) From thomas at xs4all.net Mon Jun 18 16:35:12 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Mon, 18 Jun 2001 16:35:12 +0200 Subject: [Python-Dev] Moshe Message-ID: <20010618163512.D8098@xs4all.nl> Just FYI: Moshe has been sighted, alive and well. He's been caught up in personal matters, apparently. He apologized and said he'd mail python-dev with an update soonish. Don't-you-wish-you-lurked-on-#python-too-ly y'rs ;) -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From m.favas at per.dem.csiro.au Mon Jun 18 23:28:23 2001 From: m.favas at per.dem.csiro.au (Mark Favas) Date: Tue, 19 Jun 2001 05:28:23 +0800 Subject: [Python-Dev] Anyone else seeing test_struct fail? Message-ID: <3B2E7277.D6109E7E@per.dem.csiro.au> [Platform: Tru64 Unix, Compaq C compiler] The current CVS of 2.2a0 fails test_struct for me with: test test_struct failed -- pack('>i', -2147483649) did not raise error more extensively, trying std iI on -2147483649 == 0xffffffff7fffffff Traceback (most recent call last): File "Lib/test/test_struct.py", line 367, in ? t.run() File "Lib/test/test_struct.py", line 353, in run self.test_one(x) File "Lib/test/test_struct.py", line 269, in test_one any_err(pack, ">" + code, x) File "Lib/test/test_struct.py", line 38, in any_err raise TestFailed, "%s%s did not raise error" % ( test_support.TestFailed: pack('>i', -2147483649) did not raise error A 64-bit platform issue? Also, the current imap.py causes "make test" (test___all__ and test_sundry) to fail with: "exceptions.TabError: inconsistent use of tabs and spaces in indentation (imaplib.py, line 576)" - untested checkin ? -- Mark Favas - m.favas at per.dem.csiro.au CSIRO, Private Bag No 5, Wembley, Western Australia 6913, AUSTRALIA From tim at digicool.com Tue Jun 19 00:04:06 2001 From: tim at digicool.com (Tim Peters) Date: Mon, 18 Jun 2001 18:04:06 -0400 Subject: [Python-Dev] Anyone else seeing test_struct fail? In-Reply-To: <3B2E7277.D6109E7E@per.dem.csiro.au> Message-ID: [Mark Favas] > [Platform: Tru64 Unix, Compaq C compiler) > The current CVS of 2.2a0 fails test_struct for me with: > > test test_struct failed -- pack('>i', -2147483649) did not raise error > > more extensively, > trying std iI on -2147483649 == 0xffffffff7fffffff > Traceback (most recent call last): > File "Lib/test/test_struct.py", line 367, in ?
> t.run() > File "Lib/test/test_struct.py", line 353, in run > self.test_one(x) > File "Lib/test/test_struct.py", line 269, in test_one > any_err(pack, ">" + code, x) > File "Lib/test/test_struct.py", line 38, in any_err > raise TestFailed, "%s%s did not raise error" % ( > test_support.TestFailed: pack('>i', -2147483649) did not raise error > > A 64-bit platform issue? In test_struct.py, please change this line (right after "class IntTester"): BUGGY_RANGE_CHECK = "bBhHIL" to BUGGY_RANGE_CHECK = "bBhHiIlL" and try again. I suspect you're bumping into a pre-existing bug that simply wasn't checked before (and, yes, there's A Reason it *may* screw up on a 64-bit box but not a 32-bit one). Note that since in standard mode, "i" is considered to be a 4-byte int regardless of platform, we really *should* bitch about trying to pack -2147483649 under "i" (but we don't -- and in general no codes except the new q/Q reliably bitch about out-of-range errors in the standard modes). > Also, the current imap.py causes "make test" (test___all__ and > test_sundry) to fail with: "exceptions.TabError: inconsistent use of > tabs and spaces in indentation (imaplib.py, line 576)" - untested > checkin ? Leaving that to some loser who cares about whitespace . From m.favas at per.dem.csiro.au Tue Jun 19 00:11:37 2001 From: m.favas at per.dem.csiro.au (Mark Favas) Date: Tue, 19 Jun 2001 06:11:37 +0800 Subject: [Python-Dev] Anyone else seeing test_struct fail? References: Message-ID: <3B2E7C99.E9BEFC3C@per.dem.csiro.au> [Tim Peters suggests] > > [Mark Favas] > > [Platform: Tru64 Unix, Compaq C compiler) > > The current CVS of 2.2a0 fails test_struct for me with: > > > > test test_struct failed -- pack('>i', -2147483649) did not raise error > > In test_struct.py, please change this line (right after "class IntTester"): > > BUGGY_RANGE_CHECK = "bBhHIL" > > to > > BUGGY_RANGE_CHECK = "bBhHiIlL" > > and try again. Yep, passes with this change. > > Also, the current imap.py causes "make test" (test___all__ and > > test_sundry) to fail with: "exceptions.TabError: inconsistent use of > > tabs and spaces in indentation (imaplib.py, line 576)" - untested > > checkin ? > > Leaving that to some loser who cares about whitespace . Guess we'll have to advertise widely, then . -- Mark Favas - m.favas at per.dem.csiro.au CSIRO, Private Bag No 5, Wembley, Western Australia 6913, AUSTRALIA From barry at digicool.com Tue Jun 19 00:28:21 2001 From: barry at digicool.com (Barry A. Warsaw) Date: Mon, 18 Jun 2001 18:28:21 -0400 Subject: [Python-Dev] Bogosities in quopri module? Message-ID: <15150.32901.611349.524220@yyz.digicool.com> I've been playing a bit with the quopri module (trying to support RFC 2047 in mimelib), and I've run across a few bogosities that I'd like to fix. Fixing some of them could break code, so I wanted to see what people think first. First, quopri should have encodestring() and decodestring() functions which take a string and return a string. This would make it more consistent API-wise with e.g. base64. One difference is that quopri.encodestring() should probably take a default argument quotetabs (defaulted to 1) for passing to the encode() function. This shouldn't be very controversial; a quick sketch of these wrappers appears below. Second, I think there's a problem with encode(): it always tacks on an extra \n character, such that an encode->decode roundtrip is not idempotent. I propose fixing this so that encode() doesn't add the extra newline, but this can break code that expects that newline to be present.
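To make the encodestring()/decodestring() proposal above concrete, here is a minimal sketch of the wrappers, layered over the module's existing file-object-based encode() and decode() via StringIO. The names and the quotetabs default are the ones proposed here, not anything already in quopri:

    import quopri
    from StringIO import StringIO

    def encodestring(s, quotetabs=1):
        # Reuse the file-based codec by wrapping the strings.
        infp, outfp = StringIO(s), StringIO()
        quopri.encode(infp, outfp, quotetabs)
        return outfp.getvalue()

    def decodestring(s):
        infp, outfp = StringIO(s), StringIO()
        quopri.decode(infp, outfp)
        return outfp.getvalue()

With the extra-newline fix above, decodestring(encodestring(s)) == s would then hold.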
Third, I think that encode()'s quotetabs flag should also apply to spaces. RFC 1521 says that both ASCII tabs and spaces may be encoded, and I don't think it's worthwhile that there be a separate flag to independently choose to encode tabs or spaces. Lastly, if you buy the extra-newline solution above, then encode() has to be fixed w.r.t. trailing spaces and tabs. Currently, an encode->decode roundtrip for, e.g. "hello " returns "hello =\n", but what it should really return is "hello=20". Likewise "hello\t" should return "hello=09". The patches must take multiline strings into account though, so that it doesn't chomp newlines out of """hello great big world """ I haven't worked up a patch yet, but when I do I'll upload it to SF to get some feedback. I think there are a few other things in the module that could be cleaned up. I also plan to add a test_quopri.py. Comments? -Barry From see at my.signature Tue Jun 19 08:21:14 2001 From: see at my.signature (Greg Ewing) Date: Tue, 19 Jun 2001 18:21:14 +1200 Subject: [Python-Dev] Re: PEP 255: Simple Generators References: Message-ID: <3B2EEF5A.8FF0DAFB@cosc.canterbury.ac.nz> Something is bothering me about this. In fact, it's bothering me a LOT. In the following, will f() work as a generator-function: def f(): for i in range(5): g(i) def g(i): for j in range(10): yield i,j If I understand PEP255 correctly, this will *not* work. But it seems entirely reasonable to me that it *should* work. It *has* to work, otherwise how am I to write generators that are too complicated to fit into a single function? Someone please tell me I'm wrong about this! -- Greg Ewing, Computer Science Dept, University of Canterbury, Christchurch, New Zealand To get my email address, please visit my web page: http://www.cosc.canterbury.ac.nz/~greg From jepler at inetnebr.com Tue Jun 19 15:25:23 2001 From: jepler at inetnebr.com (Jeff Epler) Date: Tue, 19 Jun 2001 08:25:23 -0500 Subject: [Python-Dev] Re: PEP 255: Simple Generators In-Reply-To: <3B2EEF5A.8FF0DAFB@cosc.canterbury.ac.nz>; from see@my.signature on Tue, Jun 19, 2001 at 06:21:14PM +1200 References: <3B2EEF5A.8FF0DAFB@cosc.canterbury.ac.nz> Message-ID: <20010619082522.A12200@inetnebr.com> On Tue, Jun 19, 2001 at 06:21:14PM +1200, Greg Ewing wrote: > Something is bothering me about this. In fact, > it's bothering me a LOT. In the following, will > f() work as a generator-function: > > def f(): > for i in range(5): > g(i) > > def g(i): > for j in range(10): > yield i,j > > If I understand PEP255 correctly, this will *not* > work. But it seems entirely reasonable to me that > it *should* work. It *has* to work, otherwise how > am I to write generators that are too complicated > to fit into a single function? The following similar code seems to produce the results you have in mind. def f(): for i in range(5): #g(i) #yield g(i) for x in g(i): yield x def g(i): for j in range(10): yield i, j It would be nice to have a succinct way to say 'for dummy in iterator: yield dummy'. Maybe 'yield from iterator'? Then f would become: def f(): for i in range(5): yield from g(i) Jeff PS I noticed that the generator branch got merged into the trunk. Cool! From fdrake at acm.org Tue Jun 19 15:24:46 2001 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Tue, 19 Jun 2001 09:24:46 -0400 (EDT) Subject: [Python-Dev] Python & GCC 3.0 Message-ID: <15151.21150.214238.429130@cj42289-a.reston1.va.home.com> I built GCC 3.0 last night, and Python built and passed the regression tests. 
I've not done any further comparisons, but using --with-cxx=... failed; the C++ ABI changed and a new version of the C++ runtime is required before that will work. I didn't want to install that over my working installation, just in case. ;-) I'll report more as I find out more. -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From nas at python.ca Tue Jun 19 16:00:39 2001 From: nas at python.ca (Neil Schemenauer) Date: Tue, 19 Jun 2001 07:00:39 -0700 Subject: [Python-Dev] Re: PEP 255: Simple Generators In-Reply-To: <3B2EEF5A.8FF0DAFB@cosc.canterbury.ac.nz>; from see@my.signature on Tue, Jun 19, 2001 at 06:21:14PM +1200 References: <3B2EEF5A.8FF0DAFB@cosc.canterbury.ac.nz> Message-ID: <20010619070039.A13712@glacier.fnational.com> Greg Ewing wrote: > Something is bothering me about this. In fact, > it's bothering me a LOT. In the following, will > f() work as a generator-function: > > def f(): > for i in range(5): > g(i) > > def g(i): > for j in range(10): > yield i,j > > If I understand PEP255 correctly, this will *not* > work. No, it will not work. The title of PEP 255 is "Simple Generators". What you want will require something like stackless in order to get the C stack out of the way. That's a major change to the Python internals. To make your example work you need to do: def f(): for i in range(5): for j in g(i): yield j def g(i): for j in range(10): yield i,j Stackless may still be in Python's future but not for 2.2. Neil From barry at digicool.com Tue Jun 19 16:19:58 2001 From: barry at digicool.com (Barry A. Warsaw) Date: Tue, 19 Jun 2001 10:19:58 -0400 Subject: [Python-Dev] Python & GCC 3.0 References: <15151.21150.214238.429130@cj42289-a.reston1.va.home.com> Message-ID: <15151.24462.400930.295658@anthem.wooz.org> >>>>> "Fred" == Fred L Drake, Jr writes: Fred> I built GCC 3.0 last night, and Python built and passed Fred> the regression tests. Hey, you were actually able to download it!? :) I couldn't get an ftp connection for the longest time and finally gave up. It'd be interesting to see if there are any performance improvements, esp. on x86 boxen. -Barry From fdrake at acm.org Tue Jun 19 17:07:48 2001 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Tue, 19 Jun 2001 11:07:48 -0400 (EDT) Subject: [Python-Dev] Python & GCC 3.0 In-Reply-To: <15151.24462.400930.295658@anthem.wooz.org> References: <15151.21150.214238.429130@cj42289-a.reston1.va.home.com> <15151.24462.400930.295658@anthem.wooz.org> Message-ID: <15151.27332.418303.872171@cj42289-a.reston1.va.home.com> Barry A. Warsaw writes: > It'd be interesting to see if there are any performance > improvements, esp. on x86 boxen. GCC 2.95.3: cj42289-a(.../python/linux); time ./python -tt ../Lib/test/pystone.py Pystone(1.1) time for 10000 passes = 1.58 This machine benchmarks at 6329.11 pystones/second 1.66user 0.01system 0:03.40elapsed 49%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (280major+241minor)pagefaults 0swaps GCC 3.0: cj42289-a(.../python/linux); cd ../linux-gcc-3.0/ cj42289-a(.../python/linux-gcc-3.0); time ./python -tt ../Lib/test/pystone.py Pystone(1.1) time for 10000 passes = 1.65 This machine benchmarks at 6060.61 pystones/second 1.77user 0.01system 0:03.52elapsed 50%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (307major+239minor)pagefaults 0swaps There is a little variation with multiple runs, but it varies less than 5% from the numbers above. Bumping up the LOOPS constant in pystone.py changes the numbers a small bit, but the relationship remains constant.
This is on a Linux-Mandrake 7.2 installation with non-cooker updates installed, and still using the Linux 2.2 kernel: cj42289-a(.../python/linux-gcc-3.0); uname -a Linux cj42289-a.reston1.va.home.com 2.2.17-21mdk #1 Thu Oct 5 13:16:08 CEST 2000 i686 unknown -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From dan at cgsoftware.com Tue Jun 19 18:19:14 2001 From: dan at cgsoftware.com (Daniel Berlin) Date: 19 Jun 2001 12:19:14 -0400 Subject: [Python-Dev] Python & GCC 3.0 In-Reply-To: <15151.27332.418303.872171@cj42289-a.reston1.va.home.com> ("Fred L. Drake, Jr."'s message of "Tue, 19 Jun 2001 11:07:48 -0400 (EDT)") References: <15151.21150.214238.429130@cj42289-a.reston1.va.home.com> <15151.24462.400930.295658@anthem.wooz.org> <15151.27332.418303.872171@cj42289-a.reston1.va.home.com> Message-ID: <87vglsbfy5.fsf@cgsoftware.com> "Fred L. Drake, Jr." writes: > Barry A. Warsaw writes: > > It'd be interesting to see if there are any performance > > improvements, esp. on x86 boxen.
> > GCC 2.95.3: > > cj42289-a(.../python/linux); time ./python -tt ../Lib/test/pystone.py > Pystone(1.1) time for 10000 passes = 1.58 > This machine benchmarks at 6329.11 pystones/second > 1.66user 0.01system 0:03.40elapsed 49%CPU (0avgtext+0avgdata 0maxresident)k > 0inputs+0outputs (280major+241minor)pagefaults 0swaps > > GCC 3.0: > > cj42289-a(.../python/linux); cd ../linux-gcc-3.0/ > cj42289-a(.../python/linux-gcc-3.0); time ./python -tt ../Lib/test/pystone.py > Pystone(1.1) time for 10000 passes = 1.65 > This machine benchmarks at 6060.61 pystones/second > 1.77user 0.01system 0:03.52elapsed 50%CPU (0avgtext+0avgdata 0maxresident)k > 0inputs+0outputs (307major+239minor)pagefaults 0swaps > > There is a little variation with multiple run, but it varies less than > 5% from the numbers above. Bumping up the LOOPS constant in > pystone.py changes the numbers a small bit, but the relationship > remains constant. > > This is one a Linux-Mandrake 7.2 installation with non-cooker updates > installed, and still using the Linux 2.2 kernel: > > cj42289-a(.../python/linux-gcc-3.0); uname -a > Linux cj42289-a.reston1.va.home.com 2.2.17-21mdk #1 Thu Oct 5 13:16:08 CEST 2000 i686 unknown Note that if you really want to see a speedup for x86 boxes then you should take a look at PGCC, the Pentium GCC compiler group: http://www.goof.com/pcg/ You can then adjust the compiler to various x86 CPUs and take advantage of some special optimizations they have intergrated into 2.95.2.1. -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From skip at pobox.com Tue Jun 19 19:44:47 2001 From: skip at pobox.com (Skip Montanaro) Date: Tue, 19 Jun 2001 12:44:47 -0500 Subject: [Python-Dev] example of module interface to a varargs function? Message-ID: <15151.36751.406758.577420@beluga.mojam.com> I am trying to add a module interface to some of the bits missing from PyGtk2. Some functions I'm interested in have varargs signatures, e.g.: void gtk_binding_entry_add_signal (GtkBindingSet *binding_set, guint keyval, guint modifiers, const gchar *signal_name, guint n_args, ...) From fdrake at acm.org Tue Jun 19 21:04:18 2001 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Tue, 19 Jun 2001 15:04:18 -0400 (EDT) Subject: [Python-Dev] Python & GCC 3.0 In-Reply-To: <87vglsbfy5.fsf@cgsoftware.com> References: <15151.21150.214238.429130@cj42289-a.reston1.va.home.com> <15151.24462.400930.295658@anthem.wooz.org> <15151.27332.418303.872171@cj42289-a.reston1.va.home.com> <3B2F8413.77F40494@lemburg.com> <87vglsbfy5.fsf@cgsoftware.com> Message-ID: <15151.41522.200832.655534@cj42289-a.reston1.va.home.com> Daniel Berlin writes: > Except, I bet you didn't use one of the "optimize for a given cpu" > switches. No, I hadn't. My main interest was in the GCC team's claim that the generated code was faster. Compiling with "make OPT='-mcpu=i686 -O3'" did not make much difference at all. M.-A. Lemburg writes: > Note that if you really want to see a speedup for x86 boxes then > you should take a look at PGCC, the Pentium GCC compiler group: > > http://www.goof.com/pcg/ > > You can then adjust the compiler to various x86 CPUs and > take advantage of some special optimizations they have intergrated > into 2.95.2.1. If they have any improved optimizations for recent x86 chips, I'd like to see them folded into GCC. I'd hate to see another egcs-style split. 
It doesn't look like I can just download a single source package from them and wait 3 hours for it to build, so I won't plan on pursuing this further. -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From tim at digicool.com Tue Jun 19 21:14:10 2001 From: tim at digicool.com (Tim Peters) Date: Tue, 19 Jun 2001 15:14:10 -0400 Subject: [Python-Dev] Python & GCC 3.0 In-Reply-To: <15151.27332.418303.872171@cj42289-a.reston1.va.home.com> Message-ID: [Fred L. Drake, Jr.] > GCC 2.95.3: > This machine benchmarks at 6329.11 pystones/second > ... > GCC 3.0: > This machine benchmarks at 6060.61 pystones/second > ... > This is one a Linux-Mandrake 7.2 installation with non-cooker updates > installed, and still using the Linux 2.2 kernel: > > cj42289-a(.../python/linux-gcc-3.0); uname -a > Linux cj42289-a.reston1.va.home.com 2.2.17-21mdk #1 Thu Oct 5 > 13:16:08 CEST 2000 i686 unknown This is a good place to note that the single biggest "easy win" for pystone is to run it with -O (that is, Python's -O). Yields a 10% boost on Fred's box, and about 7% on MSVC6+Win2K. pystone is more sensitive to -O than most "real Python apps", probably because it's masses of very simple operations on scalar types -- no real classes, no dicts, no lists except to simulate fixed-size C arrays, lots of globals, and so on. The dynamic frequency of SET_LINENO is high, and the avg work per other opcode is low. OTOH, that's typical of *some* Python apps, and typical of *parts* of almost all Python apps. So it would be worth getting ridding of SET_LINENO even in non- -O runs. Note that SET_LINENO isn't needed to get correct line numbers in tracebacks (and hasn't been needed for years), it's "just" there to support tracing now. Vladimir had what looked to be a workable scheme for doing that a different way, and that would be a cool project for someone to revive (IMO -- Guido's may differ, but he's too busy to notice what we're doing ). From michel at digicool.com Tue Jun 19 21:12:14 2001 From: michel at digicool.com (Michel Pelletier) Date: Tue, 19 Jun 2001 12:12:14 -0700 (PDT) Subject: [Python-Dev] Anyone else seeing test_struct fail? In-Reply-To: <3B2E7277.D6109E7E@per.dem.csiro.au> Message-ID: On Tue, 19 Jun 2001, Mark Favas wrote: > Also, the current imap.py causes "make test" (test___all__ and > test_sundry) to fail with: "exceptions.TabError: inconsistent use of > tabs and spaces in indentation (imaplib.py, line 576)" - untested > checkin ? I submitted a patch right on this line the other day that Guido applied, but I tested it and niether test___all__ nor test_sundry fail for me today. -Michel From mal at lemburg.com Tue Jun 19 21:28:14 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Tue, 19 Jun 2001 21:28:14 +0200 Subject: [Python-Dev] Python & GCC 3.0 References: <15151.21150.214238.429130@cj42289-a.reston1.va.home.com> <15151.24462.400930.295658@anthem.wooz.org> <15151.27332.418303.872171@cj42289-a.reston1.va.home.com> <3B2F8413.77F40494@lemburg.com> <87vglsbfy5.fsf@cgsoftware.com> <15151.41522.200832.655534@cj42289-a.reston1.va.home.com> Message-ID: <3B2FA7CE.DD1602F7@lemburg.com> "Fred L. Drake, Jr." wrote: > > Daniel Berlin writes: > > Except, I bet you didn't use one of the "optimize for a given cpu" > > switches. > > No, I hadn't. My main interest was in the GCC team's claim that the > generated code was faster. Compiling with "make OPT='-mcpu=i686 -O3'" > did not make much difference at all. > > M.-A. 
Lemburg writes: > > Note that if you really want to see a speedup for x86 boxes then > > you should take a look at PGCC, the Pentium GCC compiler group: > > > > http://www.goof.com/pcg/ > > > > You can then adjust the compiler to various x86 CPUs and > > take advantage of some special optimizations they have intergrated > > into 2.95.2.1. If they have any improved optimizations for recent x86 chips, I'd like to see them folded into GCC. I'd hate to see another egcs-style split. It doesn't look like I can just download a single source package from them and wait 3 hours for it to build, so I won't plan on pursuing this further. -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From tim at digicool.com Tue Jun 19 21:14:10 2001 From: tim at digicool.com (Tim Peters) Date: Tue, 19 Jun 2001 15:14:10 -0400 Subject: [Python-Dev] Python & GCC 3.0 In-Reply-To: <15151.27332.418303.872171@cj42289-a.reston1.va.home.com> Message-ID: [Fred L. Drake, Jr.] > GCC 2.95.3: > This machine benchmarks at 6329.11 pystones/second > ... > GCC 3.0: > This machine benchmarks at 6060.61 pystones/second > ... > This is one a Linux-Mandrake 7.2 installation with non-cooker updates > installed, and still using the Linux 2.2 kernel: > > cj42289-a(.../python/linux-gcc-3.0); uname -a > Linux cj42289-a.reston1.va.home.com 2.2.17-21mdk #1 Thu Oct 5 > 13:16:08 CEST 2000 i686 unknown This is a good place to note that the single biggest "easy win" for pystone is to run it with -O (that is, Python's -O). Yields a 10% boost on Fred's box, and about 7% on MSVC6+Win2K. pystone is more sensitive to -O than most "real Python apps", probably because it's masses of very simple operations on scalar types -- no real classes, no dicts, no lists except to simulate fixed-size C arrays, lots of globals, and so on. The dynamic frequency of SET_LINENO is high, and the avg work per other opcode is low. OTOH, that's typical of *some* Python apps, and typical of *parts* of almost all Python apps. So it would be worth getting rid of SET_LINENO even in non- -O runs. Note that SET_LINENO isn't needed to get correct line numbers in tracebacks (and hasn't been needed for years), it's "just" there to support tracing now. Vladimir had what looked to be a workable scheme for doing that a different way, and that would be a cool project for someone to revive (IMO -- Guido's may differ, but he's too busy to notice what we're doing ). From michel at digicool.com Tue Jun 19 21:12:14 2001 From: michel at digicool.com (Michel Pelletier) Date: Tue, 19 Jun 2001 12:12:14 -0700 (PDT) Subject: [Python-Dev] Anyone else seeing test_struct fail? In-Reply-To: <3B2E7277.D6109E7E@per.dem.csiro.au> Message-ID: On Tue, 19 Jun 2001, Mark Favas wrote: > Also, the current imap.py causes "make test" (test___all__ and > test_sundry) to fail with: "exceptions.TabError: inconsistent use of > tabs and spaces in indentation (imaplib.py, line 576)" - untested > checkin ? I submitted a patch right on this line the other day that Guido applied, but I tested it and neither test___all__ nor test_sundry fail for me today. -Michel From mal at lemburg.com Tue Jun 19 21:28:14 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Tue, 19 Jun 2001 21:28:14 +0200 Subject: [Python-Dev] Python & GCC 3.0 References: <15151.21150.214238.429130@cj42289-a.reston1.va.home.com> <15151.24462.400930.295658@anthem.wooz.org> <15151.27332.418303.872171@cj42289-a.reston1.va.home.com> <3B2F8413.77F40494@lemburg.com> <87vglsbfy5.fsf@cgsoftware.com> <15151.41522.200832.655534@cj42289-a.reston1.va.home.com> Message-ID: <3B2FA7CE.DD1602F7@lemburg.com> "Fred L. Drake, Jr." wrote: > > Daniel Berlin writes: > > Except, I bet you didn't use one of the "optimize for a given cpu" > > switches. > > No, I hadn't. My main interest was in the GCC team's claim that the > generated code was faster. Compiling with "make OPT='-mcpu=i686 -O3'" > did not make much difference at all. > > M.-A. Lemburg writes: > > Note that if you really want to see a speedup for x86 boxes then > > you should take a look at PGCC, the Pentium GCC compiler group: > > > > http://www.goof.com/pcg/ > > > > You can then adjust the compiler to various x86 CPUs and > > take advantage of some special optimizations they have intergrated > > into 2.95.2.1. > > If they have any improved optimizations for recent x86 chips, I'd > like to see them folded into GCC. I'd hate to see another egcs-style > split. > It doesn't look like I can just download a single source package > from them and wait 3 hours for it to build, so I won't plan on > pursuing this further. Oh, it's fairly easy to get a pgcc compiler: all you have to do is apply their small set of patches to the gcc source before compiling it. And then you should set your OPT environment variable to e.g. OPT="-g -O3 -Wall -Wstrict-prototypes -mcpu=k6" This will cause the pgcc compiler to use these settings in pretty much all compiles you ever do without having to think about it every time. -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From tim at digicool.com Tue Jun 19 21:36:41 2001 From: tim at digicool.com (Tim Peters) Date: Tue, 19 Jun 2001 15:36:41 -0400 Subject: [Python-Dev] Anyone else seeing test_struct fail? In-Reply-To: Message-ID: [Michel Pelletier] > I submitted a patch right on this line the other day that Guido applied, > but I tested it and niether test___all__ nor test_sundry fail for me > today. Not to worry! I fixed all this stuff yesterday. imaplib.py had an ambiguous mix of hard tabs and spaces, which Guido "should have" caught before checking in, and that Python itself complained about when run with -tt (which is how Mark ran the test suite). There's no problem anymore. From nas at python.ca Tue Jun 19 22:37:18 2001 From: nas at python.ca (Neil Schemenauer) Date: Tue, 19 Jun 2001 13:37:18 -0700 Subject: [Python-Dev] Python & GCC 3.0 In-Reply-To: <15151.41522.200832.655534@cj42289-a.reston1.va.home.com>; from fdrake@acm.org on Tue, Jun 19, 2001 at 03:04:18PM -0400 References: <15151.21150.214238.429130@cj42289-a.reston1.va.home.com> <15151.24462.400930.295658@anthem.wooz.org> <15151.27332.418303.872171@cj42289-a.reston1.va.home.com> <3B2F8413.77F40494@lemburg.com> <87vglsbfy5.fsf@cgsoftware.com> <15151.41522.200832.655534@cj42289-a.reston1.va.home.com> Message-ID: <20010619133718.A14814@glacier.fnational.com> Fred L. Drake, Jr. wrote: > Compiling with "make OPT='-mcpu=i686 -O3'" did not make much > difference at all. Try OPT="-m486 -O2". That gave me the best results last time I played with this stuff. > If they have any improved optimizations for recent x86 chips, I'd > like to see them folded into GCC. I'd hate to see another egcs-style > split. Some people say you should avoid PGCC since it generates buggy code. I don't know if that's true or not. Neil From thomas at xs4all.net Tue Jun 19 23:04:46 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Tue, 19 Jun 2001 23:04:46 +0200 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Lib/test test_mailbox.py,1.5,1.6 In-Reply-To: Message-ID: <20010619230446.E8098@xs4all.nl> On Tue, Jun 19, 2001 at 01:20:07PM -0700, Jack Jansen wrote: > The test used int(time.time()) to get a random number, but this doesn't > work on the mac (where times are bigger than ints). Changed to > int(time.time()%1000000).
Doesn't int(time.time()%sys.maxint) make more sense? At least you won't be degrading the sequentiality of this particularly unrandom random number on platforms where ints really are big enough to hold times :) -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From loewis at informatik.hu-berlin.de Tue Jun 19 23:25:26 2001 From: loewis at informatik.hu-berlin.de (Martin von Loewis) Date: Tue, 19 Jun 2001 23:25:26 +0200 (MEST) Subject: [Python-Dev] example of module interface to a varargs function? Message-ID: <200106192125.XAA27631@pandora.informatik.hu-berlin.de> > The only place in the standard modules I saw that processed a truly > arbitrary number of arguments is the struct_pack method of the > struct module, and it doesn't use PyArg_Parse* to process them. Can > someone point me to an example of marshalling arbitrary numbers of > arguments then calling a varargs function? In a true varargs function, you cannot use PyArg_Parse*. Instead, you have to iterate over the argument tuple with PyTuple_GetItem, fetching one argument after another. Another example of such a function is builtin_max. > (I'll worry about calling gtk_binding_entry_add_signal after I > figure out how to marshal the args.) I'd worry about this first: In C, it is not possible to call a true varargs function in a portable way if the caller doesn't statically (i.e. in source code) know the number of arguments. Only the callee can be variable, not the caller. A slight exception is that you are allowed to pass va_list objects through from one function to another. However, that requires that the callee expects a va_list argument, i.e. is not a varargs function, plus there is no portable way to create a va_list object from scratch. If you absolutely need to call such a function, you can use the Cygnus libffi library, which, for a number of microprocessors and C ABIs, allows calling arbitrary function pointers. However, I'd rather recommend looking for alternatives to gtk_binding_entry_add_signal. E.g. gtk_binding_entry_add_signall accepts a GSList*, which is a chained list of arguments, instead of being varargs. This you can call in a C module - the other one is out of reach. Regards, Martin From skip at pobox.com Tue Jun 19 23:32:50 2001 From: skip at pobox.com (Skip Montanaro) Date: Tue, 19 Jun 2001 16:32:50 -0500 Subject: [Python-Dev] Python & GCC 3.0 In-Reply-To: <20010619133718.A14814@glacier.fnational.com> References: <15151.21150.214238.429130@cj42289-a.reston1.va.home.com> <15151.24462.400930.295658@anthem.wooz.org> <15151.27332.418303.872171@cj42289-a.reston1.va.home.com> <3B2F8413.77F40494@lemburg.com> <87vglsbfy5.fsf@cgsoftware.com> <15151.41522.200832.655534@cj42289-a.reston1.va.home.com> <20010619133718.A14814@glacier.fnational.com> Message-ID: <15151.50434.297860.277726@beluga.mojam.com> Neil> Some people say you should avoid PGCC since it generates buggy Neil> code. I don't know if that's true or not. If nothing else, PGCC almost certainly gets a lot less exercise than the mainstream GCC code. Given the statement in the PGCC FAQ that typical speedups are in the range of 5%: http://www.goof.com/pcg/pgcc-faq.html#SEC0119 it doesn't seem like it would be worth the effort to use it in any critical applications. Better to just wait for PGCC optimizations to trickle into GCC itself.
Skip From jack at oratrix.nl Tue Jun 19 23:56:43 2001 From: jack at oratrix.nl (Jack Jansen) Date: Tue, 19 Jun 2001 23:56:43 +0200 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Lib/test test_mailbox.py,1.5,1.6 In-Reply-To: Message by Thomas Wouters , Tue, 19 Jun 2001 23:04:46 +0200 , <20010619230446.E8098@xs4all.nl> Message-ID: <20010619215648.B2A7CE267B@oratrix.oratrix.nl> Recently, Thomas Wouters said: > On Tue, Jun 19, 2001 at 01:20:07PM -0700, Jack Jansen wrote: > > > The test used int(time.time()) to get a random number, but this doesn't > > work on the mac (where times are bigger than ints). Changed to > > int(time.time()%1000000). > > Doesn't int(time.time()%sys.maxint) make more sense ? At least you won't be > degrading the sequentiality of this particularly unrandom random number on > platforms where ints really are big enough to hold times :) I think the last sentence should be "... platforms where time before 1970 doesn't exist so they can fit it in a measly 32 bits":-) But anyway: I haven't a clue whether the sequentiality is important, it doesn't really seem to be from a quick glance. If you want to fix it: allez votre corridor. -- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From skip at pobox.com Wed Jun 20 00:01:13 2001 From: skip at pobox.com (Skip Montanaro) Date: Tue, 19 Jun 2001 17:01:13 -0500 Subject: [Python-Dev] Re: example of module interface to a varargs function? In-Reply-To: <200106192125.XAA27631@pandora.informatik.hu-berlin.de> References: <200106192125.XAA27631@pandora.informatik.hu-berlin.de> Message-ID: <15151.52137.623119.852524@beluga.mojam.com> >> The only place in the standard modules I saw that processed a truly >> arbitrary number of arguments is the struct_pack method of the struct >> module, and it doesn't use PyArg_Parse* to process them. Can someone >> point me to an example of marshalling arbitrary numbers of arguments >> then calling a varargs function? Martin> In a true varargs function, you cannot use PyArg_Parse*. Martin> Instead, you have to iterate over the argument tuple with Martin> PyTuple_GetItem, fetching one argument after another. I think it would be nice if PyArg_ParseTuple and friends took a "*" format character. It would only be useful at the end of a format string, but would allow the generic argument parsing machinery to be used for those arguments that precede it. The argument it writes into would be an int, which would represent the offset of the first argument not processed by PyArg_ParseTuple. Reusing my example: void gtk_binding_entry_add_signal (GtkBindingSet *binding_set, guint keyval, guint modifiers, const gchar *signal_name, guint n_args, ...) If I had a Python module wrapper function for this it might call PyArg_ParseTuple as PyArg_ParseTuple(args, "iis*", &keyval, &modifiers, &signal_name, &offset); Processing of the rest of the argument list would be the responsibility of the author and start at args[offset]. >> (I'll worry about calling gtk_binding_entry_add_signal after I figure >> out how to marshal the args.) Martin> I'd worry about this first: In C, it is not possible to call a Martin> true varargs function in a portable way if the caller doesn't Martin> statically (i.e. in source code) know the number of Martin> arguments. Only the callee can be variable, not the caller. Understood. 
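To make the pattern Martin describes concrete, a wrapper along these lines would work -- an untested sketch against the 2.x C API, with the wrapper name invented and the trailing-argument conversion left as a comment. Since no "*" format exists today, it parses the fixed arguments from a slice, then walks the rest with PyTuple_GetItem:

    static PyObject *
    add_signal_wrapper(PyObject *self, PyObject *args)  /* hypothetical name */
    {
        int keyval, modifiers, ok;
        char *signal_name;
        int i, n;
        PyObject *fixed;

        n = PyTuple_Size(args);
        /* Parse the fixed leading arguments from a slice of the tuple. */
        fixed = PyTuple_GetSlice(args, 0, 3);
        if (fixed == NULL)
            return NULL;
        ok = PyArg_ParseTuple(fixed, "iis", &keyval, &modifiers, &signal_name);
        Py_DECREF(fixed);
        if (!ok)
            return NULL;
        /* Walk the trailing arguments by hand, one at a time. */
        for (i = 3; i < n; i++) {
            PyObject *item = PyTuple_GetItem(args, i);  /* borrowed reference */
            /* ... convert item and chain it onto a GSList here ... */
        }
        Py_INCREF(Py_None);
        return Py_None;
    }

With a "*" format as proposed above, the GetSlice/DECREF dance would collapse into the single PyArg_ParseTuple call.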
It turns out that the function I used as an example is actually only called in a few distinct ways. I can analyze its var-arguments fairly easily and dispatch to the appropriate call to the underlying function. Martin> However, I'd rather recommend looking for alternatives to Martin> gtk_binding_entry_add_signal. E.g. gtk_binding_entry_add_signall Martin> accepts a GSList*, which is a chained list of arguments, instead Martin> of being varargs. This you can call in a C module - the other Martin> one is out of reach. Hmm... thanks, this does look like the correct solution. I failed to notice the distinction between the two functions when I first scanned the source code; the signall (two-els) version is never called outside of gtkbindings.c; the Gtk documentation in this area is, well, rather sparse, to say the least (nine comments over 1200 lines of code, the only two substantial ones of which are boilerplate at the top); and there is no reference manual documentation for any of the interesting functions. By comparison, the Python documentation looks as if Guido has employed a team of full-time tech writers for years. Way to go, Fred! Skip From nas at python.ca Wed Jun 20 00:12:49 2001 From: nas at python.ca (Neil Schemenauer) Date: Tue, 19 Jun 2001 15:12:49 -0700 Subject: [Python-Dev] OS timer and profiling Python code Message-ID: <20010619151249.A15126@glacier.fnational.com> On x86 hardware the Linux timer runs at 100 Hz by default. On modern hardware that is probably much too slow to accurately profile programs using the Python profiler. Changing the value in include/asm-i386/param.h from 100 to 1024 and recompiling the kernel made a huge difference for me. Perhaps we should include a note in the profiler documentation. I'm not sure if this affects gprof as well but I suspect it does. Neil From moshez at zadka.site.co.il Wed Jun 20 07:31:23 2001 From: moshez at zadka.site.co.il (Moshe Zadka) Date: Wed, 20 Jun 2001 08:31:23 +0300 Subject: [Python-Dev] Moshe In-Reply-To: <20010618163512.D8098@xs4all.nl> References: <20010618163512.D8098@xs4all.nl> Message-ID: On Mon, 18 Jun 2001 16:35:12 +0200, Thomas Wouters wrote: > Just FYI: Moshe has been sighted, alive and well. He's been caught up in > personal matters, apparently. He apologized and said he'd mail python-dev > with an update soonish. Yes, indeed, and soonish got sorta delayed too... Anyway, I am alive and well, and the bad guys will have to do better than 300m to get me in an explosion ;-) Anyway, I'm terribly sorry for disappearing - my personal life caught up with me and stuff. I'm now trying to catch up with everything. Thanks to whoever took 2.0.1 from where I left off and kept it going. -- "I'll be ex-DPL soon anyway so I'm |LUKE: Is Perl better than Python? looking for someplace else to grab power."|YODA: No...no... no. Quicker, -- Wichert Akkerman (on debian-private)| easier, more seductive. For public key, finger moshez at debian.org |http://www.{python,debian,gnu}.org From greg at cosc.canterbury.ac.nz Wed Jun 20 07:55:28 2001 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Wed, 20 Jun 2001 17:55:28 +1200 Subject: [Python-Dev] Suggested amendment to PEP 255 References: Message-ID: <3B303AD0.1884E173@cosc.canterbury.ac.nz> Tim Peters wrote: > > Who would this help? Seriously. There's nothing special about a generator > to a caller, except that it returns an object that implements the iterator > interface. What matters to the caller is irrelevant here.
We're talking about what matters to someone writing or reading the implementation. To those people, there is a VERY big difference between a regular function and a generator-function -- about as big as the difference between a class and a function! In fact, a generator-function is in many ways much more like a class than a function. Calling a generator-function doesn't execute any of the code in its body; instead, it creates an instance of the generator, much like calling a class creates an instance of the class. Calling them "generator classes" and "generator instances" would perhaps be more appropriate, and more suggestive of the way they actually behave. The more I think about this, the more I agree with those who say that overloading the function-definition syntax for defining generators is a bad idea. It seems to make about as much sense as saying that there shouldn't be any special syntax for defining a class -- the header of a class definition should look exactly like a function definition, and to tell the difference you have to look for some subtle clue further down. I suggest dropping the "def" altogether and using: generator foo(args): ... yield x ... Right from the word go, this says loudly and clearly that this thing is *not* a function, it's something else. If you haven't come across generators before, you go and look in the manual to find out what it means. There you're told something like Executing a generator statement creates a special callable object called a generator. Calling a generator creates a generator-instance, which is an iterator object... [...stuff about the "yield" statement...] I think this is going to be easier to document and lead to much less confusion than trying to explain the magic going on when you call something that looks for all the world like a function and it doesn't execute any of the code in it. Explicit is better than implicit! -- Greg Ewing, Computer Science Dept, University of Canterbury, Christchurch, New Zealand To get my email address, please visit my web page: http://www.cosc.canterbury.ac.nz/~greg From greg at cosc.canterbury.ac.nz Wed Jun 20 08:17:09 2001 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Wed, 20 Jun 2001 18:17:09 +1200 Subject: [Python-Dev] Re: PEP 255: Simple Generators References: Message-ID: <3B303FE5.735A5FDC@cosc.canterbury.ac.nz> Tim Peters wrote: > > This is like saying that functions returning integers should be declared > "defint" instead, or some such gibberish. Not the same thing. If a function returns an integer, somewhere in it or in something that it calls there is a piece of code that explicitly creates an integer. But under PEP 255, there is *nothing* anywhere in the code that you can point to and say "look, here is where the generator-iterator is created!" Instead, it happens implicitly at some point just after the generator-function is called, but before any of its code is executed. You could say that the same thing is true when you call a class object -- creation of the instance happens implicitly before __init__ is called. But there is no secret made of the fact that classes are not functions, and there is nothing in the syntax to lead you to believe that they behave like functions. In contrast, the proposed generator syntax makes generators look so nearly like functions that their actual behaviour, once you get your head around it, seems quite bizarre. I just think it's going to lead to a lot of confusion and misunderstanding, among newcomers especially. 
-- Greg Ewing, Computer Science Dept, University of Canterbury, Christchurch, New Zealand To get my email address, please visit my web page: http://www.cosc.canterbury.ac.nz/~greg From greg at cosc.canterbury.ac.nz Wed Jun 20 08:28:13 2001 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Wed, 20 Jun 2001 18:28:13 +1200 Subject: [Python-Dev] Re: PEP 255: Simple Generators References: <3B2EEF5A.8FF0DAFB@cosc.canterbury.ac.nz> Message-ID: <3B30427D.5A90DDE7@cosc.canterbury.ac.nz> Olaf Delgado Friedrichs wrote: > > If I understand correctly, this should work: > > def f(): > for i in range(5): > for x in g(i): > yield x > > def g(i): > for j in range(10): > yield i,j Yes, I realised that shortly afterwards. But I think we're going to get a lot of questions from newcomers who have tried to implicitly nest iterators and are very confused about why it doesn't work and what needs to be done to make it work. An explicit generator definition syntax would help here, I think. First of all, it would be a syntax error to use "yield" outside of a generator definition, so they would be forced to declare the inner one as a generator. Then, if they neglected to make the outer one a generator too, it would look like this: def f(): for i in range(5): g(i) generator g(i): for j in range(10): yield i,j from which it is glaringly obvious that f() is NOT a generator, and therefore can't be used as one. -- Greg Ewing, Computer Science Dept, University of Canterbury, Christchurch, New Zealand To get my email address, please visit my web page: http://www.cosc.canterbury.ac.nz/~greg From loewis at informatik.hu-berlin.de Wed Jun 20 12:27:30 2001 From: loewis at informatik.hu-berlin.de (Martin von Loewis) Date: Wed, 20 Jun 2001 12:27:30 +0200 (MEST) Subject: [Python-Dev] Re: example of module interface to a varargs function? In-Reply-To: <15151.52137.623119.852524@beluga.mojam.com> (message from Skip Montanaro on Tue, 19 Jun 2001 17:01:13 -0500) References: <200106192125.XAA27631@pandora.informatik.hu-berlin.de> <15151.52137.623119.852524@beluga.mojam.com> Message-ID: <200106201027.MAA06782@pandora.informatik.hu-berlin.de> > I think it would be nice if PyArg_ParseTuple and friends took a "*" format > character. It would only be useful at the end of a format string, but would > allow the generic argument parsing machinery to be used for those arguments > that precede it. Now I understand. Yes, that would be useful, but apparently was not required often enough so far to make somebody ask for it. Regards, Martin From aahz at rahul.net Wed Jun 20 15:00:08 2001 From: aahz at rahul.net (Aahz Maruch) Date: Wed, 20 Jun 2001 06:00:08 -0700 (PDT) Subject: [Python-Dev] Suggested amendment to PEP 255 In-Reply-To: <3B303AD0.1884E173@cosc.canterbury.ac.nz> from "Greg Ewing" at Jun 20, 2001 05:55:28 PM Message-ID: <20010620130008.7880D99C88@waltz.rahul.net> Greg Ewing wrote: > > I suggest dropping the "def" altogether and using: > > generator foo(args): > ... > yield x > ... +2 -- --- Aahz (@pobox.com) Hugs and backrubs -- I break Rule 6 <*> http://www.rahul.net/aahz/ Androgynous poly kinky vanilla queer het Pythonista I don't really mind a person having the last whine, but I do mind someone else having the last self-righteous whine. 
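For readers following the thread, the behavior Greg and Aahz are reacting to is easy to demonstrate with the PEP 255 reference implementation (a minimal sketch; 2.2-era spelling, so the iterator is advanced with .next()):

    def f():
        print "entered f"    # does NOT run when f() is called
        yield 1

    g = f()          # this only creates the generator-iterator;
                     # no code in f's body has executed yet
    print g.next()   # only now does the body run: "entered f", then 1

Whether that call-time behavior deserves a syntax more distinctive than def is exactly the question on the table.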
From nas at python.ca Wed Jun 20 16:28:20 2001 From: nas at python.ca (Neil Schemenauer) Date: Wed, 20 Jun 2001 07:28:20 -0700 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Python ceval.c,2.250,2.251 In-Reply-To: ; from tim_one@users.sourceforge.net on Tue, Jun 19, 2001 at 11:57:34PM -0700 References: Message-ID: <20010620072820.A16584@glacier.fnational.com> Tim Peters wrote: > gen_iternext(): repair subtle refcount problem. > NeilS, please check! This came from staring at your genbug.py, but I'm > not sure it plugs all possible holes. Without this, I caught a > frameobject refcount going negative, and it was also the cause (in debug > build) of _Py_ForgetReference's attempt to forget an object with already- > NULL _ob_prev and _ob_next pointers -- although I'm still not entirely > sure how! Doesn't this cause a memory leak? f_back is INCREFed in PyFrame_New. There are other problems lurking here as well. def f(): try: yield 1 finally: print "finally" def h(): g = f() g.next() while 1: h() The above code leaks memory like mad, with or without your change. Also, the finally clause is never executed although it probably should be. My feeling is that the reference counting of f_back should be done by ceval and not by the frame object. The problem with the finally clause is another ball of wax. I think it's fixable though. I'll look at it closer this evening. Neil From tim.one at home.com Wed Jun 20 16:28:19 2001 From: tim.one at home.com (Tim Peters) Date: Wed, 20 Jun 2001 10:28:19 -0400 Subject: [Python-Dev] Suggested amendment to PEP 255 In-Reply-To: <3B303AD0.1884E173@cosc.canterbury.ac.nz> Message-ID: [Greg Ewing] > ... Why is this on Python-Dev? The PEP announcement specifically asked for discussion to occur on the Iterators list, and specifically asked to keep it *off* of Python-Dev. I've been playing along with people who wanted to discuss it on c.l.py instead, as finite time allows, but no way does the discussion belong here. From arigo at ulb.ac.be Wed Jun 20 16:30:49 2001 From: arigo at ulb.ac.be (Armin Rigo) Date: Wed, 20 Jun 2001 16:30:49 +0200 (MET DST) Subject: [Python-Dev] Suggested amendment to PEP 255 In-Reply-To: <3B303AD0.1884E173@cosc.canterbury.ac.nz> Message-ID: Hi, On Wed, 20 Jun 2001, Greg Ewing wrote: > I suggest dropping the "def" altogether and using: > > generator foo(args): > ... > yield x > ... Nice idea. We might even think about dropping the 'yield' keyword altogether and using 'return' instead (although I'm not quite sure it is a good idea; I'm just suggesting it with a personal -0.5). A bientot, Armin. From tim.one at home.com Wed Jun 20 16:41:13 2001 From: tim.one at home.com (Tim Peters) Date: Wed, 20 Jun 2001 10:41:13 -0400 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Python ceval.c,2.250,2.251 In-Reply-To: <20010620072820.A16584@glacier.fnational.com> Message-ID: [Neil Schemenauer] > Doesn't this cause a memory leak? f_back is INCREFed in > PyFrame_New. There are other problems lurking here as well. > ... Our msgs crossed in the mail. Unfortunately, I have to get off email now and probably won't get on again before this evening. Tracebacks appear to be a potential problem too ... we'll-reinvent-stackless-before-this-is-over<0.9-wink>-ly y'rs - tim From barry at digicool.com Wed Jun 20 18:35:49 2001 From: barry at digicool.com (Barry A.
Warsaw) Date: Wed, 20 Jun 2001 12:35:49 -0400 Subject: [Python-Dev] Suggested amendment to PEP 255 References: <3B303AD0.1884E173@cosc.canterbury.ac.nz> Message-ID: <15152.53477.212348.243592@anthem.wooz.org> >>>>> "GE" == Greg Ewing writes: GE> What matters to the caller is irrelevant here. We're talking GE> about what matters to someone writing or reading the GE> implementation. To those people, there is a VERY big GE> difference between a regular function and a GE> generator-function -- about as big as the difference GE> between a class and a function! GE> In fact, a generator-function is in many ways much more GE> like a class than a function. Calling a generator-function GE> doesn't execute any of the code in its body; instead, it GE> creates an instance of the generator, much like calling GE> a class creates an instance of the class. Calling them GE> "generator classes" and "generator instances" would GE> perhaps be more appropriate, and more suggestive of the GE> way they actually behave. Thanks Greg, I think you've captured perfectly my discomfort with the proposal. I'm fine with return being "special" inside a generator, along with most of the other details of the pep. But it bugs me that the semantics of calling the thing created by `def' is different depending on some statement embedded deep in the body of the code. Think about it from a teaching perspective: You're taught that def creates a function, perhaps called foo. You know that calling foo starts execution at the first line in the function block. You know you can put a print statement on the first line and it will print something out when the function is called. You know that you can set a debugger break point at foo's first line and when you call the function, the debugger will leave you on that first line of code. But all that changes with a generator! My print statement isn't executed when I call the function... how weird! Hey, the debugger doesn't even break on the line when I call the function. Okay, maybe it's some /other/ foo my program is really calling. So let's hunt around for other possible foo's that my program might be calling. Hmm, no dice there. Now I'm really confused because I haven't gotten to the chapter that says "Now that you know all about functions, forget most of that if you find a yield statement in the body of the function, because it's a special kind of function called a generator. Calling such a special function doesn't execute any code; it just instantiates a built-in object called a generator object. To get any of the generator's code to execute, you have to call the generator object's next() method." Further, I print out the type of the object returned by calling foo and I see it's a <generator object>. Okay, so now let me search foo for a return statement. Because I know about functions, and I know that the returned object isn't None, I know that the function isn't falling off the end. So there must be a return statement that explicitly returns a generator object (whatever that is). Hmm, nope, there's just a bare return sitting there. That's damn confusing. I wonder what those yield statements are doing. Well, I look those up in my book's index and I see that's described in chapter 57, which I haven't gotten to yet. Besides, those yields clearly have integers after them, so that can't be it. So how the heck do I get a generator object by calling this function??? You'll counter that the "search for yield to find out if the function is special" is a simple rule that, once learned, is easily remembered.
I'll counter that it's harder for me to do an Isearch in XEmacs to find out what kind of thing foo is. :) To me, it's just bad mojo to have the behavior of the thing created by `def' determined by what's embedded in the body of the program. I don't buy the defint argument, because by searching for a return statement in the function, you can find out exactly what is being returned when the function is called. Not so with a generator. My vote is for a "generator" keyword to introduce the code block of a generator. Makes perfect sense to me, and it will be a strong indication to anybody reading my code that something special is going on. And something special /is/ going on! An informal poll of PythonLabs indicates a split on this subject, perhaps setting Jeremy up as a Sandra Day O'Connor swing vote. But who said this was a democracy anyway? :) somewhat-like-my-own-country-of-origin-ly y'rs, -Barry From tim at digicool.com Wed Jun 20 18:42:00 2001 From: tim at digicool.com (Tim Peters) Date: Wed, 20 Jun 2001 12:42:00 -0400 Subject: [Python-Dev] Suggested amendment to PEP 255 In-Reply-To: <15152.53477.212348.243592@anthem.wooz.org> Message-ID: Please keep this off Python-Dev. Paul Prescod has already fwd'ed Greg's msg to the Iterators list, and let's keep it there. From fredrik at pythonware.com Wed Jun 20 18:54:22 2001 From: fredrik at pythonware.com (Fredrik Lundh) Date: Wed, 20 Jun 2001 18:54:22 +0200 Subject: [Python-Dev] Suggested amendment to PEP 255 References: <3B303AD0.1884E173@cosc.canterbury.ac.nz> <15152.53477.212348.243592@anthem.wooz.org> Message-ID: <006d01c0f9a9$a879fcd0$4ffa42d5@hagrid> barry wrote: > My vote is for a "generator" keyword to introduce the code block of a > generator. Makes perfect sense to me, and it will be a strong > indication to anybody reading my code that something special is going > on. And something special /is/ going on! agreed. +1 on generator instead of def. (and +0 on suspend instead of yield, but that's me) Cheers /F From jeremy at alum.mit.edu Wed Jun 20 19:25:05 2001 From: jeremy at alum.mit.edu (Jeremy Hylton) Date: Wed, 20 Jun 2001 13:25:05 -0400 Subject: [Python-Dev] Suggested amendment to PEP 255 In-Reply-To: Message-ID: Why can't we discuss Python development on python-dev? please-take-replies-to-python-dev-meta-ly y'rs, Jeremy -----Original Message----- From: python-dev-admin at python.org [mailto:python-dev-admin at python.org]On Behalf Of Tim Peters Sent: Wednesday, June 20, 2001 12:42 PM To: Barry A. Warsaw Cc: python-dev at python.org Subject: RE: [Python-Dev] Suggested amendment to PEP 255 Please keep this off Python-Dev. Paul Prescod has already fwd'ed Greg's msg to the Iterators list, and let's keep it there. _______________________________________________ Python-Dev mailing list Python-Dev at python.org http://mail.python.org/mailman/listinfo/python-dev From tim at digicool.com Wed Jun 20 20:28:17 2001 From: tim at digicool.com (Tim Peters) Date: Wed, 20 Jun 2001 14:28:17 -0400 Subject: [Python-Dev] Suggested amendment to PEP 255 In-Reply-To: Message-ID: [Jeremy Hylton] > Why can't we discuss Python development on python-dev? You can, but without me in this case. The arguments aren't new (they were discussed on the Iterators list before the PEP was posted), and I don't have time to repeat them on (now three) different forums.
The PEP announcement clearly said discussion belonged on the Iterators list, specifically asked that it stay off of Python-Dev, and the PEP Discussion-To field (which I assume Barry filled in -- I did not) reads Discussion-To: python-iterators at lists.sourceforge.net If you want a coherent historic record (I do), that's where this belongs. From aahz at rahul.net Wed Jun 20 20:37:49 2001 From: aahz at rahul.net (Aahz Maruch) Date: Wed, 20 Jun 2001 11:37:49 -0700 (PDT) Subject: [Python-Dev] Suggested amendment to PEP 255 In-Reply-To: from "Jeremy Hylton" at Jun 20, 2001 01:25:05 PM Message-ID: <20010620183749.B419E99C82@waltz.rahul.net> Jeremy Hylton wrote: > > Why can't we discuss Python development on python-dev? I'm split on this issue. I understand why Tim wants to have the discussion corralled into a single place; it's also a moderate inconvenience to have to add another mailing list every time a "critical" issue comes up. I think the best compromise is to follow the rules currently in existence for the PEP process, and if one doesn't wish to subscribe to another mailing list, e-mail one's feedback to the PEP author directly and raise bloody hell if the next PEP revision doesn't include a mention of the feedback. -- --- Aahz (@pobox.com) Hugs and backrubs -- I break Rule 6 <*> http://www.rahul.net/aahz/ Androgynous poly kinky vanilla queer het Pythonista I don't really mind a person having the last whine, but I do mind someone else having the last self-righteous whine. From barry at digicool.com Wed Jun 20 21:07:00 2001 From: barry at digicool.com (Barry A. Warsaw) Date: Wed, 20 Jun 2001 15:07:00 -0400 Subject: [Python-Dev] Suggested amendment to PEP 255 References: Message-ID: <15152.62548.504923.152041@anthem.wooz.org> >>>>> "TP" == Tim Peters writes: TP> and the PEP Discussion-To field (which I assume Barry filled TP> in -- I did not) reads Not me. I believe it was in Magnus's original version of the PEP. But I do think that now that the code is in the main CVS trunk, it is appropriate to remove the Discussion-To: header and redirect comments back to python-dev. That may be difficult in practice however. -Barry From jack at oratrix.nl Wed Jun 20 23:52:16 2001 From: jack at oratrix.nl (Jack Jansen) Date: Wed, 20 Jun 2001 23:52:16 +0200 Subject: [Python-Dev] _PyTrace_init declaration Message-ID: <20010620215221.1697FE267B@oratrix.oratrix.nl> I'm getting "no prototype" warnings on _PyTrace_init, and inspection shows that this routine indeed doesn't show up in an include file. As it is used elsewhere (in sysmodule.c) shouldn't it be called PyTrace_init and have it's prototype declared somewhere? -- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | ++++ see http://www.xs4all.nl/~tank/ ++++ From tim.one at home.com Thu Jun 21 00:31:10 2001 From: tim.one at home.com (Tim Peters) Date: Wed, 20 Jun 2001 18:31:10 -0400 Subject: [Python-Dev] _PyTrace_init declaration In-Reply-To: <20010620215221.1697FE267B@oratrix.oratrix.nl> Message-ID: [Jack Jansen] > I'm getting "no prototype" warnings on _PyTrace_init, and inspection > shows that this routine indeed doesn't show up in an include file. As > it is used elsewhere (in sysmodule.c) shouldn't it be called > PyTrace_init and have it's prototype declared somewhere? It should indeed be declared in ceval.h (Fred?), but so long as it's part of the private API it should not lose the leading underscore. 
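Concretely, the declaration being asked for would look something like the following in Include/ceval.h. This is only a sketch: the int return type and (void) argument list are assumptions made here for illustration, not checked against the source -- the real prototype has to be copied from the definition in ceval.c:

    /* Private API: the leading underscore is kept on purpose.
       Signature assumed, not verified. */
    extern DL_IMPORT(int) _PyTrace_init(void);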
From thomas at xs4all.net Thu Jun 21 00:29:51 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Thu, 21 Jun 2001 00:29:51 +0200 Subject: [Python-Dev] _PyTrace_init declaration In-Reply-To: <20010620215221.1697FE267B@oratrix.oratrix.nl> References: <20010620215221.1697FE267B@oratrix.oratrix.nl> Message-ID: <20010621002951.H8098@xs4all.nl> On Wed, Jun 20, 2001 at 11:52:16PM +0200, Jack Jansen wrote: > I'm getting "no prototype" warnings on _PyTrace_init, and inspection > shows that this routine indeed doesn't show up in an include file. As > it is used elsewhere (in sysmodule.c) shouldn't it be called > PyTrace_init and have it's prototype declared somewhere? No, and yes. the _Py* functions are internal, but non-static (used in other files.) They should have a prototype declared somewhere, but they shouldn't be used outside of Python itself. It shouldn't be named 'PyTrace_init' unless it is a supported part of the API. -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From greg at cosc.canterbury.ac.nz Thu Jun 21 01:39:17 2001 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Thu, 21 Jun 2001 11:39:17 +1200 (NZST) Subject: [Python-Dev] Suggested amendment to PEP 255 In-Reply-To: Message-ID: <200106202339.LAA04351@s454.cosc.canterbury.ac.nz> > The PEP announcement specifically asked for > discussion to occur on the Iterators list Sorry, I missed that - I was paying more attention to the PEP itself than what the announcement said. Going now to subscribe to the iterators list forthwith. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg at cosc.canterbury.ac.nz +--------------------------------------+ From jeremy at alum.mit.edu Thu Jun 21 01:47:28 2001 From: jeremy at alum.mit.edu (Jeremy Hylton) Date: Wed, 20 Jun 2001 19:47:28 -0400 Subject: [Python-Dev] Suggested amendment to PEP 255 In-Reply-To: <15152.53477.212348.243592@anthem.wooz.org> Message-ID: > My vote is for a "generator" keyword to introduce the code block of a > generator. Makes perfect sense to me, and it will be a strong > indication to anybody reading my code that something special is going > on. And something special /is/ going on! > > An informal poll of PythonLabs indicates a split on this subject, > perhaps setting Jeremy up as a Sandra Day O'Conner swing vote. But > who said this was a democracy anyway? :) > > somewhat-like-my-own-country-of-origin-ly y'rs, > -Barry That's a nice analogy, Ruth Barry Ginsburg; a Supreme Court, which appoints the president, seems a closer fit to Python's dictatorship than some sort of democratic process. I wasn't present for the oral arguments, but I'm sure we all know how Tim Scalia voted and that Guido van Clarence Thomas agreed without comment. I assume, then, that Anthony Kennedy Jr. joined you, although he's often a swing vote, too. Can't wait to hear the report from Nina "Michael Hudson" Totenberg. I was originally happy with the use of def. It's not much of a stretch since the def statement defines a code block that has formal parameters and creates a new scope. I certainly wouldn't be upset if Python ended up using def to define a generator. I appreciate, though, that the definition of a generator may look an awful lot like a function. I can imagine a user reading a module, missing the yield statement, and trying to use the generator as a function. 
I can't imagine this would happen often. My limited experience with CLU suggests that iterators aren't going to be huge, unwieldy blocks where it's hard to see what the ultimate control flow is. If a confused user treats a generator as a regular function, he or she certainly can't expect it to return anything useful, since all the return statements are bare returns; the expected behavior would be some side-effect on global state, which seems both unlikely and unseemly for an iterator. I'm not sure how hard it will be to explain generators to new users. I expect you would teach functions and iteration via for loops, then explain that there is a special kind of function called a generator that can be used in a for loop. It uses a yield statement instead of a return statement to return values. Not all that hard. If we use a different keyword to introduce them, you'd probably explain them much the same way: A generator is a special kind of function that can be used in a for loop and is defined with generator instead of def. As other people have mentioned, Icon doesn't use special syntax to introduce generators. We might as well look at CLU, too, which took a different approach. You can view the CLU Reference Manual at: http://ncstrl.mit.edu/Dienst/UI/2.0/Describe/ncstrl.mit_lcs%2fMIT%2fLCS%2fTR-225 It uses "proc" to introduce a procedure and "iter" to introduce an iterator. See page 72 for the details: http://ncstrl.mit.edu/Dienst/UI/2.0/Page/ncstrl.mit_lcs%2fMIT%2fLCS%2fTR-225/72 It's a toss-up, then, between the historical antecedents Icon and CLU. I'd tend to favor a new keyword for generators, but could be talked out of that position. Jeremy From fdrake at acm.org Thu Jun 21 01:57:57 2001 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Wed, 20 Jun 2001 19:57:57 -0400 (EDT) Subject: [Python-Dev] _PyTrace_init declaration In-Reply-To: <20010620215221.1697FE267B@oratrix.oratrix.nl> References: <20010620215221.1697FE267B@oratrix.oratrix.nl> Message-ID: <15153.14469.903865.533713@cj42289-a.reston1.va.home.com> Jack Jansen writes: > I'm getting "no prototype" warnings on _PyTrace_init, and inspection > shows that this routine indeed doesn't show up in an include file. As > it is used elsewhere (in sysmodule.c) shouldn't it be called > PyTrace_init and have it's prototype declared somewhere? No. I thought I had a prototype for it just above the usage. Anyway, I'm re-working that code this week, so you can assign this to me in the bug tracker. Thanks! -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From guido at digicool.com Thu Jun 21 16:32:40 2001 From: guido at digicool.com (Guido van Rossum) Date: Thu, 21 Jun 2001 10:32:40 -0400 Subject: [Python-Dev] PEP 255 - BDFL Pronouncement: 'def' it stays Message-ID: <200106211432.f5LEWeA03163@odiug.digicool.com> I've thought long and hard and tried to read almost all the mail on this topic, and I cannot get myself to change my mind. No argument on either side is totally convincing, so I have consulted my language designer's intuition. It tells me that the syntax proposed in the PEP is exactly right - not too hot, not too cold. But, like the Oracle at Delphi in Greek mythology, it doesn't tell me why, so I don't have a rebuttal for the arguments against the PEP syntax. The best I can come up with (apart from agreeing with the rebuttals that Tim and others have already made) is "FUD". If this had been part of the language from day one, I very much doubt it would have made Andrew Kuchling's "Python Warts" page.
So I propose that Tim and others defending 'def' save their remaining breath, and I propose that Paul and others in favor of 'gen[erator]' start diverting their energy towards thinking about how to best teach generators the PEP syntax. Tim, please add a BDFL pronouncement to the PEP to end the argument. You can also summarize the arguments on either side, for posterity -- without trying to counter them. I found one useful comment on the PEP that isn't addressed and is orthogonal to the whole discussion: try/finally. When you have a try/finally around a yield statement, it is possible that the finally clause is not executed at all when the iterator is never resumed. I find this disturbing, and am tempted to propose that yield inside try/finally be disallowed (but yield inside try/except is still allowed). Another idea might be to somehow continue the frame with an exception at this point -- but I don't have a clue what exception would be appropriate (StopIteration isn't because it goes in the other direction) and I don't know what to do if the generator catches the exception and tries to yield again (maybe the exception should be raised again?). The continued execution of the frame would be part of the destructor for the generator-iterator object, so, like a __del__ method, any unhandled exceptions wouldn't be able to propagate out of it. PS I lost my personal archive of the last 18 hours of the iter mailing list, and the web archive is down, alas, so I'm writing this from memory. I *did* read most of the messages in my archive before I accidentally deleted it, though. ;-) --Guido van Rossum (home page: http://www.python.org/~guido/) From tdickenson at devmail.geminidataloggers.co.uk Thu Jun 21 17:02:54 2001 From: tdickenson at devmail.geminidataloggers.co.uk (Toby Dickenson) Date: Thu, 21 Jun 2001 16:02:54 +0100 Subject: [Python-Dev] Re: [Python-iterators] PEP 255 - BDFL Pronouncement: 'def' it stays In-Reply-To: <200106211432.f5LEWeA03163@odiug.digicool.com> References: <200106211432.f5LEWeA03163@odiug.digicool.com> Message-ID: On Thu, 21 Jun 2001 10:32:40 -0400, Guido van Rossum wrote: > Another idea might be to somehow continue the frame with an >exception at this point -- but I don't have a clue what exception >would be appropriate (StopIteration isn't because it goes in the other >direction) I'm sure any exception is appropriate there. What about restarting the frame as if the 'yield' had been followed by a 'return'? Toby Dickenson tdickenson at geminidataloggers.com From mwh at python.net Fri Jun 22 01:20:17 2001 From: mwh at python.net (Michael Hudson) Date: Fri, 22 Jun 2001 00:20:17 +0100 (BST) Subject: [Python-Dev] python-dev summary 2001-06-07 - 2001-06-21 Message-ID: This is a summary of traffic on the python-dev mailing list between June 7 and June 21 (inclusive) 2001. It is intended to inform the wider Python community of ongoing developments. To comment, just post to python-list at python.org or comp.lang.python in the usual way. Give your posting a meaningful subject line, and if it's about a PEP, include the PEP number (e.g. Subject: PEP 201 - Lockstep iteration) All python-dev members are interested in seeing ideas discussed by the community, so don't hesitate to take a stance on a PEP if you have an opinion. This is the tenth summary written by Michael Hudson.
Summaries are archived at: Posting distribution (with apologies to mbm) Number of articles in summary: 192 [ASCII bar chart of daily posting volume, June 7 through June 20: 19, 14, 1, 3, 14, 39, 26, 13, 9, 4, 1, 5, 23, 21 articles per day] Quiet fortnight. * Adding .decode() method to Unicode * Marc-Andre Lemburg asked for opinions on adding a .decode method to unicode objects: He certainly got them; the responses ranged from neutral to negative, and there was a surprising amount of hostility in the air. The problem (as ever in these matters) seems to be that Python currently uses the same type for 8-bit strings and gobs of arbitrary data. Guido came to the rescue and calmed everyone down: since when discussion has vanished again. * Adding Asian codecs to the core * Marc-Andre Lemburg announced that Tamito KAJIYAMA has decided to relicense his Japanese codecs with a BSD-style license, enabling them to be included in the core: This is clearly a good thing; the only quibble is that the encodings are by their nature rather large, so they will probably go into a separate directory in CVS (probably python/dist/encodings/) and not go into the source tarball released on python.org. * Omit printing newline after newline * As readers of comp.lang.python will have noticed, Guido posted: and retracted: PEP 259, a proposal for changing the behaviour of the print statement. * sre "improvements" * Gustavo Niemeyer asked if anyone planned to add the "(?(1)blah)" re operators to Python: but Python is not perl and there wasn't much support for making regular expressions more baffling than they already are. * Generators * In a discussion that slobbered across comp.lang.python, python-dev and the python-iterators list at sf (and belongs on the latter!) there was much talk of PEP 255, Simple Generators. Most was positive; the main dissent was from people that thought it was too hard to tell a generator from a regular function (at the source level). However Guido listened to Tim's repeated claims that this is insignificant once you've actually used generators once or twice and Pronounced "'def' it is": and noticed that there are still some issues wrt try/finally blocks. However, clever people seem to be thinking about it, so I'm sure the problem's days are numbered :-) I should also note that the gen-branch has been checked into the trunk of CVS. Woohoo! Cheers, M. From arigo at ulb.ac.be Fri Jun 22 13:00:34 2001 From: arigo at ulb.ac.be (Armin Rigo) Date: Fri, 22 Jun 2001 13:00:34 +0200 (MET DST) Subject: [Python-Dev] Python Specializing Compiler Message-ID: Hello everybody, I implemented a proof-of-concept version of a "Python compiler". It is not really a compiler. I know perfectly well that you cannot compile Python into something more efficient than a bunch of calls to PyObject_xxx.
Still, this very preliminary version runs the following function twice as fast as the python interpreter: def f(n): result = 0 i = 0 while i < n: result = result + i i = i + 1 return result From jepler at inetnebr.com Fri Jun 22 14:18:46 2001 From: jepler at inetnebr.com (Jeff Epler) Date: Fri, 22 Jun 2001 07:18:46 -0500 Subject: [Python-Dev] Python Specializing Compiler In-Reply-To: ; from arigo@ulb.ac.be on Fri, Jun 22, 2001 at 01:00:34PM +0200 References: Message-ID: <20010622071846.A7014@craie.housenet> On Fri, Jun 22, 2001 at 01:00:34PM +0200, Armin Rigo wrote: > Hello everybody, > > I implemented a proof-of-concept version of a "Python compiler". It is not > really a compiler. I know perfectly well that you cannot compile Python > into something more efficient than a bunch of calls to PyObject_xxx. > Still, this very preliminary version runs the following function twice as > fast as the python interpreter: I've implemented something similar, but didn't get such favorable results yet. I was concentrating more on implementing a type system and code to infer type information, and had spent less time on the code generation. (For instance, my system could determine the result type of subscript-type operations, and infer the types of lists over a loop, as in: l1 = [1,3.14159, "tubers"] l2 = [0]*3 for j in range(3): l2[j] = l1[j-3] # Type of l2 is HeterogeneousListType([IntType, FloatType, # StringType]) You could make it run forever on a pathological case like l = [] while 1: l = [l] with the fix being to "give up" after some number of iterations, and declare the unstable object (l) as having type "ObjectType", which is always correct but overbroad. My code is still available, but my motivation has faded somewhat and I haven't had the time to work on it recently in any case. It uses "GNU Lightning" for JIT code generation, rather than using an external compiler. (If I were to approach the problem again, I might discard the JIT code generator in favor of starting over again with the python2c compiler and adding type information) It can make judgements about sequences of calls, such as def f(): return g() when g is given the "solid" attribute, and the compilation process begins by hoisting the former global load of g into a constant load, something like def make_f(): local_g = g def f(): return local_g() return f f = make_f() What are you using to generate code? How would you compare the sophistication of your type inference system to the one I've outlined above? Jeff From Greg.Wilson at baltimore.com Fri Jun 22 14:34:17 2001 From: Greg.Wilson at baltimore.com (Greg Wilson) Date: Fri, 22 Jun 2001 08:34:17 -0400 Subject: [Python-Dev] ...und zen, ze world! Message-ID: <930BBCA4CEBBD411BE6500508BB3328F2E26E8@nsamcanms1.ca.baltimore.com> From pedroni at inf.ethz.ch Fri Jun 22 14:59:40 2001 From: pedroni at inf.ethz.ch (Samuele Pedroni) Date: Fri, 22 Jun 2001 14:59:40 +0200 (MET DST) Subject: [Python-Dev] Python Specializing Compiler Message-ID: <200106221259.OAA02519@core.inf.ethz.ch> Hi. Just after reading the README: it's very intriguing and interesting (if I remember well, this resembles the customization approach of the Self VM compiler). Ideally it could evolve into a loadable extension that then works together with the normal interp (unchanged up to offering some hooks*) in a transparent way for the user ... emitting native code for the major platforms or just specialized bytecodes. I will give a serious look at it. regards, Samuele Pedroni. *: some possible useful hooks would be: - minimal profiling support in order to specialize only things called often - feedback for dynamic changing of methods, class hierarchy, ... if we want to optimize method lookup (which would make sense) - a mixed fixed slots/dict layout for instances.
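As an aside, the "give up after some number of iterations" cutoff Jeff describes is easy to sketch in a few lines. This is only a toy model -- the names, the cutoff value, and the string-based "types" are all invented purely for illustration:

    MAX_ROUNDS = 10

    def widen(t, step):
        # Iterate the inference step; if it doesn't reach a fixed
        # point quickly enough, fall back to the catch-all type.
        for dummy in range(MAX_ROUNDS):
            new_t = step(t)
            if new_t == t:
                return t            # converged: inference is stable
            t = new_t
        return "ObjectType"         # unstable, e.g. l = [l] forever

    # The pathological case nests one list deeper per iteration,
    # so it never converges and widens out to the catch-all type:
    print widen("ListType([])", lambda t: "ListType([%s])" % t)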
From nas at python.ca Fri Jun 22 16:43:17 2001 From: nas at python.ca (Neil Schemenauer) Date: Fri, 22 Jun 2001 07:43:17 -0700 Subject: [Python-Dev] why not "return StopIteration"? Message-ID: <20010622074317.A22058@glacier.fnational.com> Is "raise StopIteration" an abuse of exceptions? Why can we not use "return StopIteration" to signal the end of an iterator? I've done a bit of hacking and the idea seems to work. One possible problem is that the StopIteration object in the builtin module could cause some confusing behavior. For example the code: for obj in __builtin__.__dict__.values(): print obj would not work as expected. This could be fixed in most cases by changing the tp_iternext protocol. Something like: int tp_iternext(PyObject *it, PyObject **item) where the return value is 1, 0, or -1. IOW, StopIteration would not have to come into the protocol if the object implemented tp_iternext. Neil From guido at digicool.com Fri Jun 22 18:19:34 2001 From: guido at digicool.com (Guido van Rossum) Date: Fri, 22 Jun 2001 12:19:34 -0400 Subject: [Python-Dev] why not "return StopIteration"? Message-ID: <200106221619.f5MGJY306866@odiug.digicool.com> This is treated extensively in the discussion section of the iterators-PEP; quoting: - It has been questioned whether an exception to signal the end of the iteration isn't too expensive. Several alternatives for the StopIteration exception have been proposed: a special value End to signal the end, a function end() to test whether the iterator is finished, even reusing the IndexError exception. - A special value has the problem that if a sequence ever contains that special value, a loop over that sequence will end prematurely without any warning. If the experience with null-terminated C strings hasn't taught us the problems this can cause, imagine the trouble a Python introspection tool would have iterating over a list of all built-in names, assuming that the special End value was a built-in name! - Calling an end() function would require two calls per iteration. Two calls is much more expensive than one call plus a test for an exception. Especially the time-critical for loop can test very cheaply for an exception. - Reusing IndexError can cause confusion because it can be a genuine error, which would be masked by ending the loop prematurely. I'm not sure why you are reopening this -- special terminating values are evil IMO. :-) --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at digicool.com Fri Jun 22 18:20:43 2001 From: guido at digicool.com (Guido van Rossum) Date: Fri, 22 Jun 2001 12:20:43 -0400 Subject: [Python-Dev] Python Specializing Compiler Message-ID: <200106221620.f5MGKib06875@odiug.digicool.com> Very cool, Armin! Did you announce this on c.l.py too? I wish I had time to look at this in more detail -- but please do go on developing it, and look at what others have tried... --Guido van Rossum (home page: http://www.python.org/~guido/) From barry at digicool.com Fri Jun 22 18:30:44 2001 From: barry at digicool.com (Barry A. Warsaw) Date: Fri, 22 Jun 2001 12:30:44 -0400 Subject: [Python-Dev] why not "return StopIteration"? References: <200106221619.f5MGJY306866@odiug.digicool.com> Message-ID: <15155.29364.416545.301534@anthem.wooz.org> >>>>> "GvR" == Guido van Rossum writes: | - Calling an end() function would require two calls per | iteration. Two calls is much more expensive than one call | plus a test for an exception. Especially the time-critical | for loop can test very cheaply for an exception.
Plus, if the exception is both raised and caught in C, it is never instantiated, so exception matching is a pointer compare. I know this isn't the case with user-defined iterators (since Python's raise semantics is to instantiate the exception), but it helps. -Barry From guido at digicool.com Fri Jun 22 19:12:20 2001 From: guido at digicool.com (Guido van Rossum) Date: Fri, 22 Jun 2001 13:12:20 -0400 Subject: [Python-Dev] Python 2.0.1 released! Message-ID: <200106221712.f5MHCLF07192@odiug.digicool.com> I'm happy to announce Python 2.0.1 -- the final release of the first Python version in a long time whose license is fully compatible with the GPL: http://www.python.org/2.0.1/ I thank Moshe Zadka who did almost all of the work to make this a useful bugfix release, and then went incommunicado for several weeks. (I hope you're OK, Moshe!) Compared to the release candidate, we've fixed a few typos in the license, tweaked the documentation a bit, and fixed an indentation error in statcache.py; other than that, the release candidate was perfect. :-) Python 2.0 users should be able to replace their 2.0 installation with the 2.0.1 release without any ill effects; apart from the license change, we've only fixed bugs that didn't require us to make feature changes. The SRE package (regular expression matching, used by the "re" module) was brought in line with the version distributed with Python 2.1; this is stable feature-wise but much improved bug-wise. For the full scoop, see the release notes on SourceForge: http://sourceforge.net/project/shownotes.php?release_id=40616 Python 2.1 users can ignore this release, unless they have an urgent need for a GPL-compatible Python version and are willing to downgrade. Rest assured that we're planning a bugfix release there too: I expect that Python 2.1.1 will be released within a month, with the same GPL-compatible license. (Right, Thomas?) We don't intend to build RPMs for 2.0.1. If someone else is interested in doing so, we can link to them. --Guido van Rossum (home page: http://www.python.org/~guido/) From tim.one at home.com Fri Jun 22 19:21:03 2001 From: tim.one at home.com (Tim Peters) Date: Fri, 22 Jun 2001 13:21:03 -0400 Subject: [Python-Dev] why not "return StopIteration"? In-Reply-To: <20010622074317.A22058@glacier.fnational.com> Message-ID: [Neil Schemenauer] > Is "raise StopIteration" an abuse of exceptions? I only care whether it works <wink>. It certainly came as a surprise to me, though, that I'm going to need to fiddle PEP 255 to explain that return in a generator isn't really equivalent to raise StopIteration (because a return in the try-part of a try/except should not trigger the except-part if the generator is pumped again). While a minor wart, it's a wart. If this stands, I'm going to look into changing gen_iternext() to determine whether eval_frame() finished by raising StopIteration, and mark the iterator as done if so. That is, force "return" and "raise StopIteration" to act the same inside generators, and to force "raise StopIteration" inside a generator to truly *mean* "I'm done" in all cases. This would also allow avoiding the proposed special-casing of generators at the tail end of eval_frame() (yes, I'm anal <0.9 wink>: since it's a problem unique to generators, this simply should not be eval_frame's problem to solve -- if generators create the problem, generators should pay to solve it). > Why can we not use "return StopIteration" to signal the end of an > iterator? Just explained why not yesterday, and you did two sentences later <wink>.
> .... > This could be fixed in most cases by changing the tp_iternext > protocol. Something like: > > int tp_iternext(PyObject *it, PyObject **item) > > where the return value is 1, 0, or -1. Meaning 13, 42, and 666 respectively <wink>? That is, one for "error", one for "OK, and item is the next value", and one for "no error but no next value either -- this iterator terminated normally"? That could work. At one point during the development of the iterator PEP, Guido had some code like that in the internals, on *top* of the exception business. It was clumsy then because redundant. At the level of Python code, how would a user spell "end of iteration"? Would iterators need to return a 2-tuple in all non-exception cases then, e.g. a (next_value, i_am_done_flag) pair? Or would Python-level iterators simply be unable to return StopIteration as a normal value? > IOW, StopIteration would not have to come into the protocol if the > object implemented tp_iternext. All iterable objects in 2.2 implement tp_iternext, although sometimes it's a Miranda tp_iternext (i.e., one created for an object that doesn't supply its own), so that shouldn't be a worry. All in all, I'm -0 on changing the exception approach -- it's worked very well so far. From thomas at xs4all.net Fri Jun 22 20:02:59 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Fri, 22 Jun 2001 20:02:59 +0200 Subject: [Python-Dev] why not "return StopIteration"? In-Reply-To: References: Message-ID: <20010622200259.N8098@xs4all.nl> On Fri, Jun 22, 2001 at 01:21:03PM -0400, Tim Peters wrote: > If this stands, I'm going to look into > changing gen_iternext() to determine whether eval_frame() finished by > raising StopIteration, and mark the iterator as done if so. That is, force > "return" and "raise StopIteration" to act the same inside generators, and to > force "raise StopIteration" inside a generator to truly *mean* "I'm done" in > all cases. This would also allow avoiding the proposed special-casing of > generators at the tail end of eval_frame() (yes, I'm anal <0.9 wink>: since > it's a problem unique to generators, this simply should not be eval_frame's > problem to solve -- if generators create the problem, generators should pay > to solve it). I don't get this. Currently, (unless Just checked in his patch) generators work in exactly that way: the compiler compiles 'return' into 'raise StopIteration' if it encounters it inside a generator, and into a regular return otherwise. Why would you ask for the patch Just provided, and then change it back? -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From tim.one at home.com Fri Jun 22 20:11:13 2001 From: tim.one at home.com (Tim Peters) Date: Fri, 22 Jun 2001 14:11:13 -0400 Subject: [Python-Dev] why not "return StopIteration"? In-Reply-To: <20010622200259.N8098@xs4all.nl> Message-ID: [Thomas Wouters] > I don't get this. Currently, (unless Just checked in his patch) > generators work in exactly that way: the compiler compiles 'return' > into 'raise StopIteration' if it encounters it inside a generator, > and into a regular return otherwise. Yes. The part about analyzing the return value inside gen_iternext() would be the only change from the status quo. > Why would you ask for the patch Just provided, and then change it back? I wouldn't. I asked *you* for a patch (which I haven't yet applied, but will) in a different area, but Just's patch was his own initiative.
I hesitated on that one for reasons beyond just lack of time to get to it, and I'm still reluctant to accept it. My msg sketched an alternative to that patch. Note that Just has also (very recently) sketched another alternative, but on the Iterators list instead. just-isn't-in-need-of-defense-because-he-isn't-being-abused-ly y'rs - tim From fdrake at beowolf.digicool.com Fri Jun 22 20:31:44 2001 From: fdrake at beowolf.digicool.com (Fred Drake) Date: Fri, 22 Jun 2001 14:31:44 -0400 (EDT) Subject: [Python-Dev] [maintenance doc updates] Message-ID: <20010622183144.C6A5428927@beowolf.digicool.com> The development version of the documentation has been updated: http://python.sourceforge.net/maint-docs/ Lots of smallish updates and corrections, moved the license statements to an appendix. From paulp at ActiveState.com Fri Jun 22 20:37:01 2001 From: paulp at ActiveState.com (Paul Prescod) Date: Fri, 22 Jun 2001 11:37:01 -0700 Subject: [Python-Dev] ...und zen, ze world! References: <930BBCA4CEBBD411BE6500508BB3328F2E26E8@nsamcanms1.ca.baltimore.com> Message-ID: <3B33904D.F821FE36@ActiveState.com> > > Interesting that there's as much Perl as assembly code, > and more Fortran than Python :-). The Fortran is basically one big package: LAPACK. A bunch of the Python is 4Suite. If we got Red Hat to ship Zope (or even Python 2.1!) we'd improve our numbers quite a bit. :) -- Take a recipe. Leave a recipe. Python Cookbook! http://www.ActiveState.com/pythoncookbook From esr at thyrsus.com Fri Jun 22 20:46:11 2001 From: esr at thyrsus.com (Eric S. Raymond) Date: Fri, 22 Jun 2001 14:46:11 -0400 Subject: [Python-Dev] ...und zen, ze world! In-Reply-To: <3B33904D.F821FE36@ActiveState.com>; from paulp@ActiveState.com on Fri, Jun 22, 2001 at 11:37:01AM -0700 References: <930BBCA4CEBBD411BE6500508BB3328F2E26E8@nsamcanms1.ca.baltimore.com> <3B33904D.F821FE36@ActiveState.com> Message-ID: <20010622144611.A15388@thyrsus.com> Paul Prescod : > > Interesting that there's as much Perl as assembly code, > > and more Fortran than Python :-). > > The Fortran is basically one big package: LAPACK. A bunch of the Python > is 4Suite. If we got Red Hat to ship Zope (or even Python 2.1!) we'd > improve our numbers quite a bit. :) I'm working on it. -- Eric S. Raymond The whole of the Bill [of Rights] is a declaration of the right of the people at large or considered as individuals... It establishes some rights of the individual as unalienable and which consequently, no majority has a right to deprive them of. -- Albert Gallatin, Oct 7 1789 From fdrake at beowolf.digicool.com Fri Jun 22 20:53:37 2001 From: fdrake at beowolf.digicool.com (Fred Drake) Date: Fri, 22 Jun 2001 14:53:37 -0400 (EDT) Subject: [Python-Dev] [development doc updates] Message-ID: <20010622185337.BE51228927@beowolf.digicool.com> The development version of the documentation has been updated: http://python.sourceforge.net/devel-docs/ Lots of smallish updates and corrections, moved the license statements to an appendix. This version includes some contributed changes to the documentation for the cmath module. To make the LaTeX to HTML conversion work, I have made the resulting HTML contain entity references for the "plus/minus" and "infinity" symbols (± and ∞, respectively). These may be problematic for some browsers. Please let me know how it looks on your browser by sending an email to python-docs at python.org. Be sure to state your browser name and version, and what operating system you are using. Thanks! 
http://python.sourceforge.net/devel-docs/lib/module-cmath.html

From nas at python.ca Fri Jun 22 22:13:14 2001
From: nas at python.ca (Neil Schemenauer)
Date: Fri, 22 Jun 2001 13:13:14 -0700
Subject: [Python-Dev] why not "return StopIteration"?
In-Reply-To: <200106221619.f5MGJY306866@odiug.digicool.com>; from guido@digicool.com on Fri, Jun 22, 2001 at 12:19:34PM -0400
References: <200106221619.f5MGJY306866@odiug.digicool.com>
Message-ID: <20010622131314.A22978@glacier.fnational.com>

Guido van Rossum wrote:
> This is treated extensively in the discussion section of the
> iterators-PEP

Ah. I don't remember reading that part or seeing the discussion. Sorry I brought it up.

Neil

From fdrake at beowolf.digicool.com Fri Jun 22 22:52:48 2001
From: fdrake at beowolf.digicool.com (Fred Drake)
Date: Fri, 22 Jun 2001 16:52:48 -0400 (EDT)
Subject: [Python-Dev] [development doc updates]
Message-ID: <20010622205248.6290128927@beowolf.digicool.com>

The development version of the documentation has been updated:

    http://python.sourceforge.net/devel-docs/

Changed the revised cmath documentation to use "j" as a suffix for complex literals instead of using "i" as a prefix; this is more similar to Python. Changed the font of the suffix to match that used elsewhere in the documentation.

This should be a little more readable, but does not change any potential browser compatibility issues, so I still need reports of compatibility or non-compatibility. See my preliminary report on the topic at:

    http://mail.python.org/pipermail/doc-sig/2001-June/001940.html

From arigo at ulb.ac.be Sat Jun 23 10:13:04 2001
From: arigo at ulb.ac.be (Armin Rigo)
Date: Sat, 23 Jun 2001 10:13:04 +0200 (MET DST)
Subject: [Python-Dev] Python Specializing Compiler
In-Reply-To: <20010622071846.A7014@craie.housenet>
Message-ID:

Hello Jeff,

On Fri, 22 Jun 2001, Jeff Epler wrote:
> What are you using to generate code?

I am generating pseudo-code, which is interpreted by a C module. (With real assembler code, it would of course be much faster, but it was just simpler for the moment.)

> How would you compare the
> sophistication of your type inference system to the one I've outlined
> above?

Yours is much more complete, but runs statically. Mine works at run-time. As explained in detail in the readme file, my plan is not to make a "compiler" in the usual sense. I actually have no type inference; I just collect at run time what types are used at what places, and generate (and possibly modify) the generated code according to that information. (More about it later.)

A bientot,

Armin.

From tim.one at home.com Sat Jun 23 11:17:54 2001
From: tim.one at home.com (Tim Peters)
Date: Sat, 23 Jun 2001 05:17:54 -0400
Subject: [Python-Dev] PEP 255: Simple Generators, Revised Posting
In-Reply-To: Message-ID:

Major revision: more details about exceptions, return vs StopIteration, and interactions with try/except/finally; more Q&A; and a BDFL Pronouncement. The reference implementation appears solid and works as described here in all respects, so I expect this will be the last major revision (and so also last full posting) of this PEP.

The output below is in ndiff format (see Tools/scripts/ndiff.py in your Python distribution). Just the new text can be seen in HTML form here:

    http://python.sf.net/peps/pep-0255.html

"Feature discussions" should take place primarily on the Python Iterators list:

    mailto:python-iterators at lists.sourceforge.net

Implementation discussions may wander in and out of Python-Dev too.
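(For readers unfamiliar with ndiff output: lines starting with "-" appear only in the old revision, lines starting with "+" only in the new one, and "?" guide lines point at intra-line changes. A tiny hedged illustration -- assuming a difflib that exposes ndiff(); otherwise Tools/scripts/ndiff.py produces the same markers:

    # Demonstrate the ndiff markers used in the PEP posting below.
    import difflib
    old = ["Post-History: 14-Jun-2001\n"]
    new = ["Post-History: 14-Jun-2001, 23-Jun-2001\n"]
    for line in difflib.ndiff(old, new):
        print line,
    # Output, roughly:
    # - Post-History: 14-Jun-2001
    # + Post-History: 14-Jun-2001, 23-Jun-2001
    # ?                          +++++++++++++
)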
PEP: 255 Title: Simple Generators - Version: $Revision: 1.3 $ ? ^ + Version: $Revision: 1.12 $ ? ^^ Author: nas at python.ca (Neil Schemenauer), tim.one at home.com (Tim Peters), magnus at hetland.org (Magnus Lie Hetland) Discussion-To: python-iterators at lists.sourceforge.net Status: Draft Type: Standards Track Requires: 234 Created: 18-May-2001 Python-Version: 2.2 - Post-History: 14-Jun-2001 + Post-History: 14-Jun-2001, 23-Jun-2001 ? +++++++++++++ Abstract This PEP introduces the concept of generators to Python, as well as a new statement used in conjunction with them, the "yield" statement. Motivation When a producer function has a hard enough job that it requires maintaining state between values produced, most programming languages offer no pleasant and efficient solution beyond adding a callback function to the producer's argument list, to be called with each value produced. For example, tokenize.py in the standard library takes this approach: the caller must pass a "tokeneater" function to tokenize(), called whenever tokenize() finds the next token. This allows tokenize to be coded in a natural way, but programs calling tokenize are typically convoluted by the need to remember between callbacks which token(s) were seen last. The tokeneater function in tabnanny.py is a good example of that, maintaining a state machine in global variables, to remember across callbacks what it has already seen and what it hopes to see next. This was difficult to get working correctly, and is still difficult for people to understand. Unfortunately, that's typical of this approach. An alternative would have been for tokenize to produce an entire parse of the Python program at once, in a large list. Then tokenize clients could be written in a natural way, using local variables and local control flow (such as loops and nested if statements) to keep track of their state. But this isn't practical: programs can be very large, so no a priori bound can be placed on the memory needed to materialize the whole parse; and some tokenize clients only want to see whether something specific appears early in the program (e.g., a future statement, or, as is done in IDLE, just the first indented statement), and then parsing the whole program first is a severe waste of time. Another alternative would be to make tokenize an iterator[1], delivering the next token whenever its .next() method is invoked. This is pleasant for the caller in the same way a large list of results would be, but without the memory and "what if I want to get out early?" drawbacks. However, this shifts the burden on tokenize to remember *its* state between .next() invocations, and the reader need only glance at tokenize.tokenize_loop() to realize what a horrid chore that would be. Or picture a recursive algorithm for producing the nodes of a general tree structure: to cast that into an iterator framework requires removing the recursion manually and maintaining the state of the traversal by hand. A fourth option is to run the producer and consumer in separate threads. This allows both to maintain their states in natural ways, and so is pleasant for both. Indeed, Demo/threads/Generator.py in the Python source distribution provides a usable synchronized-communication class for doing that in a general way. This doesn't work on platforms without threads, though, and is very slow on platforms that do (compared to what is achievable without threads). 
A final option is to use the Stackless[2][3] variant implementation of Python instead, which supports lightweight coroutines. This has much the same programmatic benefits as the thread option, but is much more efficient. However, Stackless is a controversial rethinking of the Python core, and it may not be possible for Jython to implement the same semantics. This PEP isn't the place to debate that, so suffice it to say here that generators provide a useful subset of Stackless functionality in a way that fits easily into the current CPython implementation, and is believed to be relatively straightforward for other Python implementations. That exhausts the current alternatives. Some other high-level languages provide pleasant solutions, notably iterators in Sather[4], which were inspired by iterators in CLU; and generators in Icon[5], a novel language where every expression "is a generator". There are differences among these, but the basic idea is the same: provide a kind of function that can return an intermediate result ("the next value") to its caller, but maintaining the function's local state so that the function can be resumed again right where it left off. A very simple example: def fib(): a, b = 0, 1 while 1: yield b a, b = b, a+b When fib() is first invoked, it sets a to 0 and b to 1, then yields b back to its caller. The caller sees 1. When fib is resumed, from its point of view the yield statement is really the same as, say, a print statement: fib continues after the yield with all local state intact. a and b then become 1 and 1, and fib loops back to the yield, yielding 1 to its invoker. And so on. From fib's point of view it's just delivering a sequence of results, as if via callback. But from its caller's point of view, the fib invocation is an iterable object that can be resumed at will. As in the thread approach, this allows both sides to be coded in the most natural ways; but unlike the thread approach, this can be done efficiently and on all platforms. Indeed, resuming a generator should be no more expensive than a function call. The same kind of approach applies to many producer/consumer functions. For example, tokenize.py could yield the next token instead of invoking a callback function with it as argument, and tokenize clients could iterate over the tokens in a natural way: a Python generator is a kind of Python iterator[1], but of an especially powerful kind. - Specification + Specification: Yield ? ++++++++ A new statement is introduced: yield_stmt: "yield" expression_list "yield" is a new keyword, so a future statement[8] is needed to phase - this in. [XXX spell this out] + this in. [XXX spell this out -- but new keywords have ripple effects + across tools too, and it's not clear this can be forced into the future + framework at all -- it's not even clear that Python's parser alone can + be taught to swing both ways based on a future stmt] The yield statement may only be used inside functions. A function that - contains a yield statement is called a generator function. + contains a yield statement is called a generator function. A generator ? +++++++++++++ + function is an ordinary function object in all respects, but has the + new CO_GENERATOR flag set in the code object's co_flags member. When a generator function is called, the actual arguments are bound to function-local formal argument names in the usual way, but no code in the body of the function is executed. 
Instead a generator-iterator object is returned; this conforms to the iterator protocol[6], so in particular can be used in for-loops in a natural way. Note that when the intent is clear from context, the unqualified name "generator" may be used to refer either to a generator-function or a generator- iterator. Each time the .next() method of a generator-iterator is invoked, the code in the body of the generator-function is executed until a yield or return statement (see below) is encountered, or until the end of the body is reached. If a yield statement is encountered, the state of the function is frozen, and the value of expression_list is returned to .next()'s caller. By "frozen" we mean that all local state is retained, including the current bindings of local variables, the instruction pointer, and the internal evaluation stack: enough information is saved so that the next time .next() is invoked, the function can proceed exactly as if the yield statement were just another external call. + Restriction: A yield statement is not allowed in the try clause of a + try/finally construct. The difficulty is that there's no guarantee + the generator will ever be resumed, hence no guarantee that the finally + block will ever get executed; that's too much a violation of finally's + purpose to bear. + + + Specification: Return + A generator function can also contain return statements of the form: "return" Note that an expression_list is not allowed on return statements in the body of a generator (although, of course, they may appear in the bodies of non-generator functions nested within the generator). - When a return statement is encountered, nothing is returned, but a + When a return statement is encountered, control proceeds as in any + function return, executing the appropriate finally clauses (if any - StopIteration exception is raised, signalling that the iterator is ? ------------ + exist). Then a StopIteration exception is raised, signalling that the ? ++++++++++++++++ - exhausted. The same is true if control flows off the end of the + iterator is exhausted. A StopIteration exception is also raised if + control flows off the end of the generator without an explict return. + - function. Note that return means "I'm done, and have nothing ? ----------- + Note that return means "I'm done, and have nothing interesting to ? +++++++++++++++ - interesting to return", for both generator functions and non-generator ? --------------- + return", for both generator functions and non-generator functions. ? +++++++++++ - functions. + + Note that return isn't always equivalent to raising StopIteration: the + difference lies in how enclosing try/except constructs are treated. + For example, + + >>> def f1(): + ... try: + ... return + ... except: + ... yield 1 + >>> print list(f1()) + [] + + because, as in any function, return simply exits, but + + >>> def f2(): + ... try: + ... raise StopIteration + ... except: + ... yield 42 + >>> print list(f2()) + [42] + + because StopIteration is captured by a bare "except", as is any + exception. + + + Specification: Generators and Exception Propagation + + If an unhandled exception-- including, but not limited to, + StopIteration --is raised by, or passes through, a generator function, + then the exception is passed on to the caller in the usual way, and + subsequent attempts to resume the generator function raise + StopIteration. In other words, an unhandled exception terminates a + generator's useful life. 
+ + Example (not idiomatic but to illustrate the point): + + >>> def f(): + ... return 1/0 + >>> def g(): + ... yield f() # the zero division exception propagates + ... yield 42 # and we'll never get here + >>> k = g() + >>> k.next() + Traceback (most recent call last): + File "", line 1, in ? + File "", line 2, in g + File "", line 2, in f + ZeroDivisionError: integer division or modulo by zero + >>> k.next() # and the generator cannot be resumed + Traceback (most recent call last): + File "", line 1, in ? + StopIteration + >>> + + + Specification: Try/Except/Finally + + As noted earlier, yield is not allowed in the try clause of a try/ + finally construct. A consequence is that generators should allocate + critical resources with great care. There is no restriction on yield + otherwise appearing in finally clauses, except clauses, or in the try + clause of a try/except construct: + + >>> def f(): + ... try: + ... yield 1 + ... try: + ... yield 2 + ... 1/0 + ... yield 3 # never get here + ... except ZeroDivisionError: + ... yield 4 + ... yield 5 + ... raise + ... except: + ... yield 6 + ... yield 7 # the "raise" above stops this + ... except: + ... yield 8 + ... yield 9 + ... try: + ... x = 12 + ... finally: + ... yield 10 + ... yield 11 + >>> print list(f()) + [1, 2, 4, 5, 8, 9, 10, 11] + >>> Example # A binary tree class. class Tree: def __init__(self, label, left=None, right=None): self.label = label self.left = left self.right = right def __repr__(self, level=0, indent=" "): s = level*indent + `self.label` if self.left: s = s + "\n" + self.left.__repr__(level+1, indent) if self.right: s = s + "\n" + self.right.__repr__(level+1, indent) return s def __iter__(self): return inorder(self) # Create a Tree from a list. def tree(list): n = len(list) if n == 0: return [] i = n / 2 return Tree(list[i], tree(list[:i]), tree(list[i+1:])) # A recursive generator that generates Tree leaves in in-order. def inorder(t): if t: for x in inorder(t.left): yield x yield t.label for x in inorder(t.right): yield x # Show it off: create a tree. t = tree("ABCDEFGHIJKLMNOPQRSTUVWXYZ") # Print the nodes of the tree in in-order. for x in t: print x, print # A non-recursive generator. def inorder(node): stack = [] while node: while node.left: stack.append(node) node = node.left yield node.label while not node.right: try: node = stack.pop() except IndexError: return yield node.label node = node.right # Exercise the non-recursive generator. for x in t: print x, print + Both output blocks display: + + A B C D E F G H I J K L M N O P Q R S T U V W X Y Z + Q & A + Q. Why not a new keyword instead of reusing "def"? + + A. See BDFL Pronouncements section below. + - Q. Why a new keyword? Why not a builtin function instead? + Q. Why a new keyword for "yield"? Why not a builtin function instead? ? ++++++++++++ A. Control flow is much better expressed via keyword in Python, and yield is a control construct. It's also believed that efficient implementation in Jython requires that the compiler be able to determine potential suspension points at compile-time, and a new - keyword makes that easy. + keyword makes that easy. The CPython referrence implementation also + exploits it heavily, to detect which functions *are* generator- + functions (although a new keyword in place of "def" would solve that + for CPython -- but people asking the "why a new keyword?" question + don't want any new keyword). + + Q: Then why not some other special syntax without a new keyword? 
For + example, one of these instead of "yield 3": + + return 3 and continue + return and continue 3 + return generating 3 + continue return 3 + return >> , 3 + from generator return 3 + return >> 3 + return << 3 + >> 3 + << 3 + + A: Did I miss one ? Out of hundreds of messages, I counted two + suggesting such an alternative, and extracted the above from them. + It would be nice not to need a new keyword, but nicer to make yield + very clear -- I don't want to have to *deduce* that a yield is + occurring from making sense of a previously senseless sequence of + keywords or operators. Still, if this attracts enough interest, + proponents should settle on a single consensus suggestion, and Guido + will Pronounce on it. + + Q. Why allow "return" at all? Why not force termination to be spelled + "raise StopIteration"? + + A. The mechanics of StopIteration are low-level details, much like the + mechanics of IndexError in Python 2.1: the implementation needs to + do *something* well-defined under the covers, and Python exposes + these mechanisms for advanced users. That's not an argument for + forcing everyone to work at that level, though. "return" means "I'm + done" in any kind of function, and that's easy to explain and to use. + Note that "return" isn't always equivalent to "raise StopIteration" + in try/except construct, either (see the "Specification: Return" + section). + + Q. Then why not allow an expression on "return" too? + + A. Perhaps we will someday. In Icon, "return expr" means both "I'm + done", and "but I have one final useful value to return too, and + this is it". At the start, and in the absence of compelling uses + for "return expr", it's simply cleaner to use "yield" exclusively + for delivering values. + + + BDFL Pronouncements + + Issue: Introduce another new keyword (say, "gen" or "generator") in + place of "def", or otherwise alter the syntax, to distinguish + generator-functions from non-generator functions. + + Con: In practice (how you think about them), generators *are* + functions, but with the twist that they're resumable. The mechanics of + how they're set up is a comparatively minor technical issue, and + introducing a new keyword would unhelpfully overemphasize the + mechanics of how generators get started (a vital but tiny part of a + generator's life). + + Pro: In reality (how you think about them), generator-functions are + actually factory functions that produce generator-iterators as if by + magic. In this respect they're radically different from non-generator + functions, acting more like a constructor than a function, so reusing + "def" is at best confusing. A "yield" statement buried in the body is + not enough warning that the semantics are so different. + + BDFL: "def" it stays. No argument on either side is totally + convincing, so I have consulted my language designer's intuition. It + tells me that the syntax proposed in the PEP is exactly right - not too + hot, not too cold. But, like the Oracle at Delphi in Greek mythology, + it doesn't tell me why, so I don't have a rebuttal for the arguments + against the PEP syntax. The best I can come up with (apart from + agreeing with the rebuttals ... already made) is "FUD". If this had + been part of the language from day one, I very much doubt it would have + made Andrew Kuchling's "Python Warts" page. Reference Implementation - A preliminary patch against the CVS Python source is available[7]. 
+ The current implementation, in a preliminary state (no docs and no
+ focused tests), is part of Python's CVS development tree[9].
+ Using this requires that you build Python from source.
+
+ This was derived from an earlier patch by Neil Schemenauer[7].

Footnotes and References

    [1] PEP 234, http://python.sf.net/peps/pep-0234.html
    [2] http://www.stackless.com/
    [3] PEP 219, http://python.sf.net/peps/pep-0219.html
    [4] "Iteration Abstraction in Sather" Murer, Omohundro, Stoutamire and Szyperski http://www.icsi.berkeley.edu/~sather/Publications/toplas.html
    [5] http://www.cs.arizona.edu/icon/
    [6] The concept of iterators is described in PEP 234 http://python.sf.net/peps/pep-0234.html
    [7] http://python.ca/nas/python/generator.diff
    [8] http://python.sf.net/peps/pep-0236.html
+   [9] To experiment with this implementation, check out Python from CVS
+       according to the instructions at http://sf.net/cvs/?group_id=5470

Copyright

    This document has been placed in the public domain.

Local Variables: mode: indented-text indent-tabs-mode: nil End:

From mal at lemburg.com Sat Jun 23 12:54:27 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Sat, 23 Jun 2001 12:54:27 +0200
Subject: [Python-Dev] Python Specializing Compiler
References: Message-ID: <3B347563.9BBEF858@lemburg.com>

Armin Rigo wrote:
>
> Hello Jeff,
>
> On Fri, 22 Jun 2001, Jeff Epler wrote:
> > What are you using to generate code?
>
> I am generating pseudo-code, which is interpreted by a C module. (With
> real assembler code, it would of course be much faster, but it was just
> simpler for the moment.)
>
> > How would you compare the
> > sophistication of your type inference system to the one I've outlined
> > above?
>
> Yours is much more complete, but runs statically. Mine works at run-time.
> As explained in detail in the readme file, my plan is not to make a
> "compiler" in the usual sense. I actually have no type inference; I just
> collect at run time what types are used at what places, and generate (and
> possibly modify) the generated code according to that information.

Sounds like you are using (re)compiling on-the-fly -- that would certainly be a very reasonable way to deal with Python's dynamic object world. It would also solve the problems of static compilers with type inference nicely. A very nice idea!

--
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting: http://www.egenix.com/
Python Software: http://www.lemburg.com/python/

From skip at pobox.com Sat Jun 23 16:11:03 2001
From: skip at pobox.com (Skip Montanaro)
Date: Sat, 23 Jun 2001 09:11:03 -0500
Subject: [Python-Dev] Python Specializing Compiler
In-Reply-To: <3B347563.9BBEF858@lemburg.com>
References: <3B347563.9BBEF858@lemburg.com>
Message-ID: <15156.41847.86431.594106@beluga.mojam.com>

    mal> Sounds like you are using (re)compiling on-the-fly ...

This is what the Self compiler did, though I don't know if its granularity was as fine as I understand psyco's is from reading its README file. It's been a while since I read through that stuff, but I seem to recall it would compile functions to machine code only if they were heavily executed. It also did a lot of type inferencing.

Skip

From guido at digicool.com Sat Jun 23 17:58:40 2001
From: guido at digicool.com (Guido van Rossum)
Date: Sat, 23 Jun 2001 11:58:40 -0400
Subject: [Python-Dev] Python Specializing Compiler
In-Reply-To: Your message of "Sat, 23 Jun 2001 10:13:04 +0200."
References: Message-ID: <20010623160024.QWCF14539.femail14.sdc1.sfba.home.com@cj20424-a.reston1.va.home.com>

> I am generating pseudo-code, which is interpreted by a C module. (With
> real assembler code, it would of course be much faster, but it was just
> simpler for the moment.)

This has great promise! Once you have an interpreter for some kind of pseudo-code, it's always possible to tweak the interpreter or the pseudo-code to make it faster. And you can make another jump to machine code to make it a lot faster.

There was a project (p2c or python2c) that tried to compile an entire Python program to C code that was mostly just calling the Python runtime C API functions. It also obtained about a factor of 2 in speed-up, but its problem was (if I recall) that even a small Python module translated into hundreds of thousands of lines of C -- think what that would do to locality. Since you have already obtained the same speedup with your approach, I think there's great promise. Count on sending in a paper for the next Python conference!

> > How would you compare the
> > sophistication of your type inference system to the one I've outlined
> > above?
>
> Yours is much more complete, but runs statically. Mine works at run-time.
> As explained in detail in the readme file, my plan is not to make a
> "compiler" in the usual sense. I actually have no type inference; I just
> collect at run time what types are used at what places, and generate (and
> possibly modify) the generated code according to that information.

Very cool: a Python JIT compiler.

> (More about it later.)

Can't wait!

--Guido van Rossum (home page: http://www.python.org/~guido/)

From fdrake at beowolf.digicool.com Sun Jun 24 04:41:04 2001
From: fdrake at beowolf.digicool.com (Fred Drake)
Date: Sat, 23 Jun 2001 22:41:04 -0400 (EDT)
Subject: [Python-Dev] [development doc updates]
Message-ID: <20010624024104.A757728927@beowolf.digicool.com>

The development version of the documentation has been updated:

    http://python.sourceforge.net/devel-docs/

A couple of small updates, including spelling the keywords correctly in the language reference. This version brings back the hyperlinked grammar productions I played around with earlier. They still need work, but they are somewhat better than plain text.

From m.favas at per.dem.csiro.au Sun Jun 24 06:25:27 2001
From: m.favas at per.dem.csiro.au (Mark Favas)
Date: Sun, 24 Jun 2001 12:25:27 +0800
Subject: [Python-Dev] Problems with socketmodule (getnameinfo & getaddrinfo)
Message-ID: <3B356BB7.9BE71569@per.dem.csiro.au>

Socketmodule at the moment has multiple problems after the changes to handle IPv6:

1:
socketmodule.c now #includes getnameinfo.c and getaddrinfo.c. These functions both use offsetof(), which is defined (on my system, at least) in stddef.h. The #include for this file is inside a #if 0 block.

2:
#including this file allows the compile to complete without error. However, there is no Makefile dependency on these two files, once socketmodule.o has been built. Changes to either of the get{name,addr}info.c files will not cause socketmodule to be rebuilt.

3:
The socket module still does not work, however, since it refers to an unresolved symbol inet_pton

>>> import socket
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/home/gonzo1/mark/groucho1/mark/src/python/CVS/python/dist/src/Lib/socket.py", line 41, in ?
from _socket import *
ImportError: Unresolved symbol in /home/gonzo1/mark/groucho1/mark/src/python/CVS/python/dist/src/build/lib.osf1-V4.0-alpha-2.2/_socket.so: inet_pton

inet_pton is called in two places in getaddrinfo.c... there's likely to be other platforms besides Tru64 Unix that do not have this function.

--
Mark Favas - m.favas at per.dem.csiro.au
CSIRO, Private Bag No 5, Wembley, Western Australia 6913, AUSTRALIA

From tim.one at home.com Sun Jun 24 06:48:32 2001
From: tim.one at home.com (Tim Peters)
Date: Sun, 24 Jun 2001 00:48:32 -0400
Subject: [Python-Dev] Problems with socketmodule (getnameinfo & getaddrinfo)
In-Reply-To: <3B356BB7.9BE71569@per.dem.csiro.au>
Message-ID:

[Mark Favas]
> Socketmodule at the moment has multiple problems after the changes to
> handle IPv6:
>
> 1:
> socketmodule.c now #includes getnameinfo.c and getaddrinfo.c. These
> functions both use offsetof(), which is defined (on my system, at least)
> in stddef.h. The #include for this file is inside a #if 0 block.
>
> 2:
> #including this file allows the compile to complete without error.
> However, there is no Makefile dependency on these two files, once
> socketmodule.o has been built. Changes to either of the
> get{name,addr}info.c files will not cause socketmodule to be rebuilt.
>
> 3:
> The socket module still does not work, however, since it refers to an
> unresolved symbol inet_pton
> >>> import socket
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
>   File
> "/home/gonzo1/mark/groucho1/mark/src/python/CVS/python/dist/src/Li
> b/socket.py",
> line 41, in ?
> from _socket import *
> ImportError: Unresolved symbol in
> /home/gonzo1/mark/groucho1/mark/src/python/CVS/python/dist/src/bui
> ld/lib.osf1-V4.0-alpha-2.2/_socket.so:
> inet_pton
>
> inet_pton is called in two places in getaddrinfo.c... there's likely to
> be other platforms besides Tru64 Unix that do not have this function.
If it's any consolation, the Windows build is in worse shape: socketmodule.c Modules\addrinfo.h(123) : error C2632: 'long' followed by 'long' is illegal Modules\addrinfo.h(125) : error C2632: 'long' followed by 'long' is illegal Modules\addrinfo.h(129) : error C2632: 'long' followed by 'long' is illegal Modules\addrinfo.h(129) : error C2632: 'long' followed by 'long' is illegal Modules\getaddrinfo.c(109) : warning C4013: 'offsetof' undefined; assuming extern returning int Modules\getaddrinfo.c(109) : error C2143: syntax error : missing ')' before 'type' Modules\getaddrinfo.c(109) : error C2099: initializer is not a constant Modules\getaddrinfo.c(109) : error C2059: syntax error : ')' Modules\getaddrinfo.c(111) : error C2059: syntax error : ',' Modules\getaddrinfo.c(407) : warning C4013: 'inet_pton' undefined; assuming extern returning int Modules\getaddrinfo.c(414) : warning C4013: 'IN_MULTICAST' undefined; assuming extern returning int Modules\getaddrinfo.c(414) : warning C4013: 'IN_EXPERIMENTAL' undefined; assuming extern returning int Modules\getaddrinfo.c(417) : error C2065: 'IN_LOOPBACKNET' : undeclared identifier Modules\getaddrinfo.c(417) : warning C4018: '==' : signed/unsigned mismatch Modules\getaddrinfo.c(531) : error C2373: 'WSAGetLastError' : redefinition; different type modifiers C:\VC98\INCLUDE\winsock.h(787) : see declaration of 'WSAGetLastError' Modules\getnameinfo.c(66) : error C2143: syntax error : missing ')' before 'type' Modules\getnameinfo.c(66) : error C2099: initializer is not a constant Modules\getnameinfo.c(66) : error C2059: syntax error : ')' Modules\getnameinfo.c(67) : error C2059: syntax error : ',' Modules\getnameinfo.c(133) : warning C4013: 'snprintf' undefined; assuming extern returning int Modules\getnameinfo.c(153) : warning C4018: '==' : signed/unsigned mismatch Modules\getnameinfo.c(167) : warning C4013: 'inet_ntop' undefined; assuming extern returning int Modules\getnameinfo.c(168) : warning C4047: '==' : 'int ' differs in levels of indirection from 'void *' Modules\getnameinfo.c(200) : warning C4047: '==' : 'int ' differs in levels of indirection from 'void *' Martin should revert the changes to socketmodule.c until this has a prayer of working. From est at hyperreal.org Sun Jun 24 07:38:06 2001 From: est at hyperreal.org (est at hyperreal.org) Date: Sat, 23 Jun 2001 22:38:06 -0700 (PDT) Subject: [Python-Dev] Python Specializing Compiler In-Reply-To: "from Armin Rigo at Jun 22, 2001 01:00:34 pm" Message-ID: <20010624053806.16277.qmail@hyperreal.org> Am I seeing things or does it actually speed up five to six times on my machine? Very exciting! timing specializing_call(, 2000)... result 1952145856 in 4.94 seconds timing specializing_call(, 2000)... result 1952145856 in 3.91 seconds timing f(2000,)... result 1952145856 in 25.17 seconds I wonder to what extent this approach can be applied to method calls. My analysis of my performance-bound Python apps convinces me that those are a major bottleneck for me. About a fifth of their time seems to go into creating the bound method object (reducable by caching them on the instance)..another fifth into allocating the memory for the frame object (ameliorated by pymalloc). As for the rest, I really don't know. E From martin at loewis.home.cs.tu-berlin.de Sun Jun 24 10:34:06 2001 From: martin at loewis.home.cs.tu-berlin.de (Martin v. 
Loewis)
Date: Sun, 24 Jun 2001 10:34:06 +0200
Subject: [Python-Dev] gethostbyname2
Message-ID: <200106240834.f5O8Y6t01609@mira.informatik.hu-berlin.de>

The IPv6 patch proposes to introduce a new socket function, socket.gethostbyname2(name, af). This becomes necessary as a name might have both an IPv4 and an IPv6 address.

One alternative for providing such an API is to give socket.gethostbyname an optional second argument (the address family). itojun's rationale for calling it gethostbyname2 is that it matches the C API, as defined in RFC 2133.

Which of these alternatives would you prefer?

Regards,
Martin

From martin at loewis.home.cs.tu-berlin.de Sun Jun 24 10:20:31 2001
From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Sun, 24 Jun 2001 10:20:31 +0200
Subject: [Python-Dev] IPv6 and Windows
Message-ID: <200106240820.f5O8KVN01435@mira.informatik.hu-berlin.de>

After integrating the first chunk of IPv6 changes, Tim Peters quickly found that they won't compile on Windows - even though this was the least-critical part of the patch.

Specifically, this code emulates the getaddrinfo and getnameinfo calls, which will be exposed to Python programs in a later patch. Therefore, it is essential that they are available on every system, either directly or through emulation.

For Windows, one option is to use the Microsoft-provided emulation, which is available from

    http://msdn.microsoft.com/downloads/sdks/platform/tpipv6.asp

To use this emulation, only the header files of the package are required; it is not necessary to actually install the IPv6 preview on the system. The MS emulation will try to load a few DLLs which are known to provide getaddrinfo. If no such DLL is found, the code in the header file falls back to an emulation. That way, the resulting socket.pyd would use the true API functions on installations that provide them, and the emulation on all other systems.

The only requirement for building Python is then that the header file from the technology preview is available on the build machine (tpipv6.h). It may be that the header file is also included in recent SDK releases, I haven't checked.

Is such a requirement acceptable for building the socket module on Windows?

Regards,
Martin

From m.favas at per.dem.csiro.au Sun Jun 24 10:58:42 2001
From: m.favas at per.dem.csiro.au (Mark Favas)
Date: Sun, 24 Jun 2001 16:58:42 +0800
Subject: [Python-Dev] IPv6 support
Message-ID: <3B35ABC2.11F3B261@per.dem.csiro.au>

IPv6 support may be nice, and even desirable. However, supporting IPv6 should not come at the cost of causing problems either in compilation or at runtime on those platforms that do not support IPv6 natively. Requiring additional preview code or non-standardly-supplied packages to be installed is fine if people _want_ to take advantage of the new IPv6 functionality, but _not_ fine if this IPv6 functionality is not required. IPv4 support should not require the installation of additional IPv6 packages. Well, that's my 2 cents' worth (even if that's only 1 cent US).

--
Mark Favas - m.favas at per.dem.csiro.au
CSIRO, Private Bag No 5, Wembley, Western Australia 6913, AUSTRALIA

From pf at artcom-gmbh.de Sun Jun 24 11:20:10 2001
From: pf at artcom-gmbh.de (Peter Funk)
Date: Sun, 24 Jun 2001 11:20:10 +0200 (MEST)
Subject: foobar2(), foobar3(), ... (was Re: [Python-Dev] gethostbyname2)
In-Reply-To: <200106240834.f5O8Y6t01609@mira.informatik.hu-berlin.de> from "Martin v. Loewis" at "Jun 24, 2001 10:34:06 am"
Message-ID:

Martin v.
Loewis:
> The IPv6 patch proposes to introduce a new socket function,
> socket.gethostbyname2(name, af). This becomes necessary as a name
> might have both an IPv4 and an IPv6 address.
>
> One alternative for providing such an API is to give socket.gethostbyname
> an optional second argument (the address family). itojun's rationale
> for calling it gethostbyname2 is that it matches the C API, as defined
> in RFC 2133.
>
> Which of these alternatives would you prefer?

IMO: The possibility to add new keyword arguments with default values is one of the major strengths Python has compared to other programming languages. Especially in the scenario where an existing mature API has to be enhanced later with added features: In such a situation I always prefer APIs with fewer functions (maybe with large lists of optional arguments) compared to APIs containing a bunch of functions or methods called 'popen2()', 'gethostbyname2()' and so on.

Regards, Peter
--
Peter Funk, Oldenburger Str.86, D-27777 Ganderkesee, Germany, Fax:+49 4222950260
office: +49 421 20419-0 (ArtCom GmbH, Grazer Str.8, D-28359 Bremen, Germany)

From tim.one at home.com Sun Jun 24 12:51:40 2001
From: tim.one at home.com (Tim Peters)
Date: Sun, 24 Jun 2001 06:51:40 -0400
Subject: [Python-Dev] IPv6 and Windows
In-Reply-To: <200106240820.f5O8KVN01435@mira.informatik.hu-berlin.de>
Message-ID:

[Martin v. Loewis]
> After integrating the first chunk of IPv6 changes, Tim Peters quickly
> found that they won't compile on Windows - even though this was the
> least-critical part of the patch.

Mark Favas also reported failure on a Unix box -- we can't leave the CVS tree in an unusable state, and Mark in particular provides uniquely valuable feedback from his collection of Platforms from Mars <wink>. I #ifdef'ed out the offending includes on Windows for now, but that doesn't help Mark.

> Specifically, this code emulates the getaddrinfo and getnameinfo
> calls, which will be exposed to Python programs in a later patch.
> Therefore, it is essential that they are available on every system,
> either directly or through emulation.
>
> For Windows, one option is to use the Microsoft-provided emulation,
> which is available from
>
> http://msdn.microsoft.com/downloads/sdks/platform/tpipv6.asp

It says it's unsupported preview software for Win2K only. Since even the first *real* release of anything from MS sucks, I wouldn't touch this unless I absolutely had to. But I don't have any cycles for this project anyway, so this:

> ...
> Is such a requirement acceptable for building the socket module on
> Windows?

will have to be addressed by someone who does. Is anyone, e.g., at ActiveState keen on this?

From mal at lemburg.com Sun Jun 24 13:06:19 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Sun, 24 Jun 2001 13:06:19 +0200
Subject: [Python-Dev] IPv6 and Windows
References: <200106240820.f5O8KVN01435@mira.informatik.hu-berlin.de>
Message-ID: <3B35C9AB.2D1D2185@lemburg.com>

"Martin v. Loewis" wrote:
>
> After integrating the first chunk of IPv6 changes, Tim Peters quickly
> found that they won't compile on Windows - even though this was the
> least-critical part of the patch.
>
> Specifically, this code emulates the getaddrinfo and getnameinfo
> calls, which will be exposed to Python programs in a later patch.
> Therefore, it is essential that they are available on every system,
> either directly or through emulation.
> > For Windows, one option is to use the Microsoft-provided emulation,
> > which is available from
> >
> > http://msdn.microsoft.com/downloads/sdks/platform/tpipv6.asp
> >
> > To use this emulation, only the header files of the package are
> > required; it is not necessary to actually install the IPv6 preview on
> > the system. The MS emulation will try to load a few DLLs which are
> > known to provide getaddrinfo. If no such DLL is found, the code in the
> > header file falls back to an emulation. That way, the resulting
> > socket.pyd would use the true API functions on installations that
> > provide them, and the emulation on all other systems.
> >
> > The only requirement for building Python is then that the header file
> > from the technology preview is available on the build machine
> > (tpipv6.h). It may be that the header file is also included in recent
> > SDK releases, I haven't checked.
> >
> > Is such a requirement acceptable for building the socket module on
> > Windows?

Isn't this the MS SDK that has the new "Open Source" license clause in it ?! If yes, I very much doubt that this approach would be feasible for Python...

http://msdn.microsoft.com/downloads/eula_mit.htm

Quote from a recent posting by Steven Majewski on c.l.p.:

"""
(c) Open Source. Recipient's license rights to the Software are conditioned upon Recipient (i) not distributing such Software, in whole or in part, in conjunction with Potentially Viral Software (as defined below); and (ii) not using Potentially Viral Software (e.g. tools) to develop Recipient software which includes the Software, in whole or in part. For purposes of the foregoing, Potentially Viral Software means software which is licensed pursuant to terms that: (x) create, or purport to create, obligations for Microsoft with respect to the Software or (y) grant, or purport to grant, to any third party any rights to or immunities under Microsoft's intellectual property or proprietary rights in the Software. By way of example but not limitation of the foregoing, Recipient shall not distribute the Software, in whole or in part, in conjunction with any Publicly Available Software. Publicly Available Software means each of (i) any software that contains, or is derived in any manner (in whole or in part) from, any software that is distributed as free software, open source software (e.g. Linux) or similar licensing or distribution models; and (ii) any software that requires as a condition of use, modification and/or distribution of such software that other software distributed with such software (A) be disclosed or distributed in source code form; (B) be licensed for the purpose of making derivative works; or (C) be redistributable at no charge. Publicly Available Software includes, without limitation, software licensed or distributed under any of the following licenses or distribution models, or licenses or distribution models similar to any of the following: (A) GNU's General Public License (GPL) or Lesser/Library GPL (LGPL), (B) The Artistic License (e.g., PERL), (C) the Mozilla Public License, (D) the Netscape Public License, (E) the Sun Community Source License (SCSL), and (F) the Sun Industry Standards License (SISL).
""" -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From guido at digicool.com Sun Jun 24 15:23:52 2001 From: guido at digicool.com (Guido van Rossum) Date: Sun, 24 Jun 2001 09:23:52 -0400 Subject: [Python-Dev] gethostbyname2 In-Reply-To: Your message of "Sun, 24 Jun 2001 10:34:06 +0200." <200106240834.f5O8Y6t01609@mira.informatik.hu-berlin.de> References: <200106240834.f5O8Y6t01609@mira.informatik.hu-berlin.de> Message-ID: <20010624132540.RTEI4013.femail3.sdc1.sfba.home.com@cj20424-a.reston1.va.home.com> > The IPv6 patch proposes to introduce a new socket function, > socket.gethostbyname2(name, af). This becomes necessary as a name > might have both an IPv4 and an IPv6 address. > > One alternative for providing such API is to get socket.gethostbyname > an optional second argument (the address family). itojun's rationale > for calling it gethostbyname2 is that the C API, as defined in RFC > 2133. > > Which of these alternatives would you prefer? Definitely an optional 2nd arg to gethostbyname() -- in C, you can't do tht, so they *had* to create a new function, but Python is more flexible. --Guido van Rossum (home page: http://www.python.org/~guido/) From DavidA at ActiveState.com Sun Jun 24 17:18:22 2001 From: DavidA at ActiveState.com (David Ascher) Date: Sun, 24 Jun 2001 08:18:22 -0700 Subject: [Python-Dev] IPv6 and Windows References: Message-ID: <3B3604BE.7E2F6C6E@ActiveState.com> Tim Peters wrote: > > Is such a requirement acceptable for building the socket module on > > Windows? > > will have to be addressed by someone who does. Is anyone, e.g., at > ActiveState keen on this? Not as far as I know. I haven't looked at the patches, but couldn't we have the IPv6 code be #ifdef'ed out, so that those who care about IPv6 can periodically test it while the various OS-level libraries are ramped up over the next months/years, but w/o disturbing the 'current' builds? --david From martin at loewis.home.cs.tu-berlin.de Sun Jun 24 19:00:43 2001 From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Sun, 24 Jun 2001 19:00:43 +0200 Subject: [Python-Dev] IPv6 and Windows In-Reply-To: <3B35C9AB.2D1D2185@lemburg.com> (mal@lemburg.com) References: <200106240820.f5O8KVN01435@mira.informatik.hu-berlin.de> <3B35C9AB.2D1D2185@lemburg.com> Message-ID: <200106241700.f5OH0hm01021@mira.informatik.hu-berlin.de> > > Is such a requirement acceptable for building the socket module on > > Windows? > > Isn't this the MS SDK that has the new "Open Source" license > clause in it ?! 
No, this has a different license text, which can be seen on http://msdn.microsoft.com/downloads/sdks/platform/tpipv6/download.asp On redistribution, it says # If you redistribute the SOFTWARE and/or your Source Modifications, # or any portion thereof as provided above, you agree: (i) to # distribute the SOFTWARE only in conjunction with, and as part of, # your Source Modifications which add significant functionality to the # SOFTWARE; (ii) to distribute the SOFTWARE and/or your Source # Modifications solely as part of your research and not in any # commercial product; (iii) the SOFTWARE and/or your Source # Modifications will not be distributed for profit; (iv) to retain all # branding, copyright and trademark notices included with the SOFTWARE # and include a copy of this EULA with any distribution of the # SOFTWARE, or any portion thereof; and (v) to indemnify, hold # harmless, and defend Microsoft from and against any claims or # lawsuits, including attorneys' fees, that arise or result from # the use or distribution of your Source Modifications. I don't know whether this is acceptable or not. Regards, Martin From mal at lemburg.com Sun Jun 24 20:08:13 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Sun, 24 Jun 2001 20:08:13 +0200 Subject: [Python-Dev] IPv6 and Windows References: <200106240820.f5O8KVN01435@mira.informatik.hu-berlin.de> <3B35C9AB.2D1D2185@lemburg.com> <200106241700.f5OH0hm01021@mira.informatik.hu-berlin.de> Message-ID: <3B362C8D.D3AECE3C@lemburg.com> "Martin v. Loewis" wrote: > > > > Is such a requirement acceptable for building the socket module on > > > Windows? > > > > Isn't this the MS SDK that has the new "Open Source" license > > clause in it ?! > > No, this has a different license text, which can be seen on > > http://msdn.microsoft.com/downloads/sdks/platform/tpipv6/download.asp > > On redistribution, it says > > # If you redistribute the SOFTWARE and/or your Source Modifications, > # or any portion thereof as provided above, you agree: (i) to > # distribute the SOFTWARE only in conjunction with, and as part of, > # your Source Modifications which add significant functionality to the > # SOFTWARE; (ii) to distribute the SOFTWARE and/or your Source > # Modifications solely as part of your research and not in any > # commercial product; (iii) the SOFTWARE and/or your Source > # Modifications will not be distributed for profit; (iv) to retain all > # branding, copyright and trademark notices included with the SOFTWARE > # and include a copy of this EULA with any distribution of the > # SOFTWARE, or any portion thereof; and (v) to indemnify, hold > # harmless, and defend Microsoft from and against any claims or > # lawsuits, including attorneys' fees, that arise or result from > # the use or distribution of your Source Modifications. > > I don't know whether this is acceptable or not. Most likely not: there are lots of commercial Python users out there who wouldn't like these clauses at all... we'd also lose the GPL compatibility. -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From martin at loewis.home.cs.tu-berlin.de Sun Jun 24 19:48:03 2001 From: martin at loewis.home.cs.tu-berlin.de (Martin v. 
Loewis) Date: Sun, 24 Jun 2001 19:48:03 +0200 Subject: [Python-Dev] IPv6 and Windows Message-ID: <200106241748.f5OHm3001480@mira.informatik.hu-berlin.de> > I haven't looked at the patches, but couldn't we have the IPv6 code > be #ifdef'ed out, so that those who care about IPv6 can periodically > test it while the various OS-level libraries are ramped up over the > next months/years, but w/o disturbing the 'current' builds? Not if we are going to introduce itojun's patch. In that patch, the IPv6 code *is* actually ifdef'ed out. It is getaddrinfo/getnameinfo that gives problems, which isn't IPv6 specific at all. The problem is that the library patches (httplib, ftplib, etc) do use getaddrinfo to find out how to contact a remote system, which is the right thing to do IMO. So even if the IPv6 support can be activated only if desired, getaddrinfo absolutely has to work. So the only question then is where we get an implementation of these functions if the system doesn't provide one. itojun has suggested the WIDE libraries; since they apparently don't compile on Windows, I've suggested the MS TP emulation. If the latter is not acceptable, we either have to fix the WIDE implementation to work on Windows also; As for the problems Mark reported: I think they can get fixed. Regards, Martin From thomas at xs4all.net Sun Jun 24 23:35:37 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Sun, 24 Jun 2001 23:35:37 +0200 Subject: [Python-Dev] IPv6 and Windows In-Reply-To: <200106241748.f5OHm3001480@mira.informatik.hu-berlin.de> References: <200106241748.f5OHm3001480@mira.informatik.hu-berlin.de> Message-ID: <20010624233537.R8098@xs4all.nl> On Sun, Jun 24, 2001 at 07:48:03PM +0200, Martin v. Loewis wrote: > The problem is that the library patches (httplib, ftplib, etc) do use > getaddrinfo to find out how to contact a remote system, which is the > right thing to do IMO. So even if the IPv6 support can be activated > only if desired, getaddrinfo absolutely has to work. Why ? Why can't those parts be 'if it exists'-ed out ? We do it for SSL support. I'm only comfortable with the IPv6 patch if it's optional, or can at least be disabled. I haven't looked at the patch, but why is getaddrinfo absolutely necessary, if the code works without it now, too ? > So the only question then is where we get an implementation of these > functions if the system doesn't provide one. itojun has suggested the > WIDE libraries; since they apparently don't compile on Windows, I've > suggested the MS TP emulation. If the latter is not acceptable, we > either have to fix the WIDE implementation to work on Windows also; > As for the problems Mark reported: I think they can get fixed. What about the zillion other 'obscure' ports ? OS/2 ? Palm ? MacOS 9 ;) If this patch can't be zero-impact-if-necessary, I'm a firm -1 on it. But I don't think it can't, it just takes more work. -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From martin at loewis.home.cs.tu-berlin.de Sun Jun 24 23:39:45 2001 From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Sun, 24 Jun 2001 23:39:45 +0200 Subject: [Python-Dev] Problems with socketmodule (getnameinfo & getaddrinfo) Message-ID: <200106242139.f5OLdjR22560@mira.informatik.hu-berlin.de> > 1: socketmodule.c now #includes getnameinfo.c and > getaddrinfo.c. These functions both use offsetof(), which is defined > (on my system, at least) in stddef.h. That should be fixed now. 
stddef.h is included in socketmodule.c; if it is not available or does not define offsetof, an additional definition is provided. > 2. [...] Changes to either of the get{name,addr}info.c files will > not cause socketmodule to be rebuilt. I don't know how to solve this one. If distutils builds the modules, makefile dependencies won't help. > 3. The socket module still does not work, however, since it refers > to an unresolved symbol inet_pton I took the simplest solution that I could think of, delegating inet_{pton,ntop} to inet_{ntoa,addr} for AF_INET, failing for all other address families (AF_INET6 in particular). I've verified that this code does the same as the builtin functions on my Linux system; please let me know whether it compiles for you. Regards, Martin From martin at loewis.home.cs.tu-berlin.de Sun Jun 24 23:56:48 2001 From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Sun, 24 Jun 2001 23:56:48 +0200 Subject: [Python-Dev] IPv6 and Windows In-Reply-To: <20010624233537.R8098@xs4all.nl> (message from Thomas Wouters on Sun, 24 Jun 2001 23:35:37 +0200) References: <200106241748.f5OHm3001480@mira.informatik.hu-berlin.de> <20010624233537.R8098@xs4all.nl> Message-ID: <200106242156.f5OLum222759@mira.informatik.hu-berlin.de> > Why ? Why can't those parts be 'if it exists'-ed out ? We do it for SSL > support. I'm only comfortable with the IPv6 patch if it's optional, or can > at least be disabled. I haven't looked at the patch, but why is getaddrinfo > absolutely necessary, if the code works without it now, too ? getaddrinfo offers protocol-independent address lookup. It is necessary to use that API to support AF_INET and AF_INET6 transparently in application code. itojun proposes to change a number of standard library modules. Please have a look at the actual patch for details; the typical change will look like this (for httplib) diff -u -r1.35 httplib.py --- Lib/httplib.py 2001/06/01 16:25:38 1.35 +++ Lib/httplib.py 2001/06/24 04:41:48 @@ -357,10 +357,22 @@ def connect(self): """Connect to the host and port specified in __init__.""" - self.sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM) - if self.debuglevel > 0: - print "connect: (%s, %s)" % (self.host, self.port) - self.sock.connect((self.host, self.port)) + for res in socket.getaddrinfo(self.host, self.port, 0, socket.SOCK_STREAM): + af, socktype, proto, canonname, sa = res + try: + self.sock = socket.socket(af, socktype, proto) + if self.debuglevel > 0: + print "connect: (%s, %s)" % (self.host, self.port) + self.sock.connect(sa) + except socket.error, msg: + if self.debuglevel > 0: + print 'connect fail:', (self.host, self.port) + self.sock.close() + self.sock = None + continue + break + if not self.sock: + raise socket.error, msg def close(self): """Close the connection to the HTTP server.""" As you can see, the modified code can simultaneously access both IPv4 and IPv6 hosts, and will pick whatever it can connect to best. Without getaddrinfo, httplib would continue to support IPv4 hosts only. The IPv6 support itself is absolutely optional. If it is not available, getaddrinfo will never return IPv6 addresses, or propose AF_INET6 as the address family. > What about the zillion other 'obscure' ports ? OS/2 ? Palm ? MacOS 9 ;) If > this patch can't be zero-impact-if-necessary, I'm a firm -1 on it. But I > don't think it can't, it just takes more work. Depends on what zero-impact-if-necessary means to you. The patch, as it stands, can be fixed to compile on all systems that are currently supported. 
It cannot be fixed to be taken completely out (unless you literally do
that: take it out).

I don't plan to fight for it too much. Please have a look at the code
itself, and try to cooperate on integrating it. Don't reject it outright
without having even looked at it. If I get strong rejections from
everybody, I'll just withdraw it and feel sorry for the time I've already
spent on it.

Regards,
Martin

From m.favas at per.dem.csiro.au  Mon Jun 25 00:16:25 2001
From: m.favas at per.dem.csiro.au (Mark Favas)
Date: Mon, 25 Jun 2001 06:16:25 +0800
Subject: [Python-Dev] Re: Problems with socketmodule (getnameinfo & getaddrinfo)
References: <200106242139.f5OLdjR22560@mira.informatik.hu-berlin.de>
Message-ID: <3B3666B9.335DA17E@per.dem.csiro.au>

[Martin v. Loewis]
>
> > 1: socketmodule.c now #includes getnameinfo.c and
> > getaddrinfo.c. These functions both use offsetof(), which is defined
> > (on my system, at least) in stddef.h.
>
> That should be fixed now. stddef.h is included in socketmodule.c; if
> it is not available or does not define offsetof, an additional
> definition is provided.

Yes, this is fine now...

> > 2. [...] Changes to either of the get{name,addr}info.c files will
> > not cause socketmodule to be rebuilt.
>
> I don't know how to solve this one. If distutils builds the modules,
> makefile dependencies won't help.
>
> > 3. The socket module still does not work, however, since it refers
> > to an unresolved symbol inet_pton
>
> I took the simplest solution that I could think of, delegating
> inet_{pton,ntop} to inet_{addr,ntoa} for AF_INET, failing for all
> other address families (AF_INET6 in particular). I've verified that
> this code does the same as the builtin functions on my Linux system;
> please let me know whether it compiles for you.
>

To get socketmodule.c to compile, I had to make a change to line 2963 so
that the declaration of inet_pton matched the previous declaration on line
220 (changing char *src to const char *src). Still have problems though,
due to the use of snprintf in getnameinfo.c:

Python 2.2a0 (#444, Jun 25 2001, 05:58:17) [C] on osf1V4
Type "copyright", "credits" or "license" for more information.
>>> import socket
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/home/gonzo1/mark/groucho1/mark/src/python/CVS/python/dist/src/Lib/socket.py", line 41, in ?
    from _socket import *
ImportError: Unresolved symbol in /home/gonzo1/mark/groucho1/mark/src/python/CVS/python/dist/src/build/lib.osf1-V4.0-alpha-2.2/_socket.so: snprintf

Cheers, Mark

-- 
Mark Favas - m.favas at per.dem.csiro.au
CSIRO, Private Bag No 5, Wembley, Western Australia 6913, AUSTRALIA

From tim.one at home.com  Mon Jun 25 07:02:30 2001
From: tim.one at home.com (Tim Peters)
Date: Mon, 25 Jun 2001 01:02:30 -0400
Subject: [Python-Dev] IPv6 and Windows
In-Reply-To: <3B35C9AB.2D1D2185@lemburg.com>
Message-ID: 

>> http://msdn.microsoft.com/downloads/sdks/platform/tpipv6.asp

[MAL]
> Isn't this the MS SDK that has the new "Open Source" license
> clause in it ?!

No. That was for the "Mobile Internet Toolkit"; no relation, AFAICT.

> If yes, I very much doubt that this approach
> would be feasible for Python...
>
> http://msdn.microsoft.com/downloads/eula_mit.htm

From tim.one at home.com  Mon Jun 25 07:14:17 2001
From: tim.one at home.com (Tim Peters)
Date: Mon, 25 Jun 2001 01:14:17 -0400
Subject: [Python-Dev] IPv6 and Windows
In-Reply-To: <200106241748.f5OHm3001480@mira.informatik.hu-berlin.de>
Message-ID: 

[Martin v. Loewis]
> ...
> So the only question then is where we get an implementation of these
> functions if the system doesn't provide one. itojun has suggested the
> WIDE libraries; since they apparently don't compile on Windows, I've
> suggested the MS TP emulation. If the latter is not acceptable, we
> will have to fix the WIDE implementation to work on Windows as well.

I don't have cycles for this, but will cheerily suggest that the WIDE
problems didn't appear especially deep, just "the usual" careless brand of
Unix+gcc+glibc specific coding. For example, HAVE_LONG_LONG is #define'd
on Windows, but, just as in Python source, you can't *use* "long long"
literally, you have to use the LONG_LONG macro instead. Then Windows
doesn't have an offsetof() macro, or an snprintf() either. Etc. The code
is in trouble exactly where it relies on platform-specific extensions to
the std C language and library. Problems with those won't be unique to
Windows, either, which is a deeper concern (but already well expressed by
others).

It would be nice if Python could contribute portability back to WIDE. That
requires worker bees, though, and lots of x-platform testing. If it turns
out we can't swing that, then support for this is premature, and we should
wait, e.g., for WIDE to put more effort into porting their code.

From just at letterror.com  Mon Jun 25 08:55:17 2001
From: just at letterror.com (Just van Rossum)
Date: Mon, 25 Jun 2001 08:55:17 +0200
Subject: [Python-Dev] os.path.normcase() in site.py
Message-ID: <20010625085521-r01010600-9a6226c8@213.84.27.177>

I noticed that these days __file__ attributes of modules are case
normalized (ie. lowercased on case insensitive file systems), or at least
the directory part. Then I noticed that this is caused by the fact that
all sys.path entries are case normalized. It turns out that site.py does
this, in a function called makepath(), added by Fred about 8 months ago.

I think this is wrong: we should always try to *preserve* case. I see
os.path.normcase() as a tool to be able to better compare two paths, but
you shouldn't *store* paths this way. I for one am irritated when I see a
path that doesn't have the proper case. The intention of makepath() in
site.py seems good -- it turns all paths into absolute paths -- but is the
normcase really necessary?

*** Please CC follow-ups to me, as I'm not on python-dev.

Just

From martin at loewis.home.cs.tu-berlin.de  Mon Jun 25 08:39:44 2001
From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Mon, 25 Jun 2001 08:39:44 +0200
Subject: [Python-Dev] Re: Problems with socketmodule (getnameinfo & getaddrinfo)
In-Reply-To: <3B3666B9.335DA17E@per.dem.csiro.au> (message from Mark Favas on Mon, 25 Jun 2001 06:16:25 +0800)
References: <200106242139.f5OLdjR22560@mira.informatik.hu-berlin.de> <3B3666B9.335DA17E@per.dem.csiro.au>
Message-ID: <200106250639.f5P6die01246@mira.informatik.hu-berlin.de>

> To get socketmodule.c to compile, I had to make a change to line 2963
> so that the declaration of inet_pton matched the previous declaration on
> line 220 (changing char *src to const char *src). Still have problems
> though, due to the use of snprintf in getnameinfo.c:

Ok, they are printing a single number into a 512 byte buffer; that is safe
even with sprintf only, so I have just removed the snprintf call. Can you
please try again?
Thanks for your reports,
Martin

From thomas at xs4all.net  Mon Jun 25 09:20:53 2001
From: thomas at xs4all.net (Thomas Wouters)
Date: Mon, 25 Jun 2001 09:20:53 +0200
Subject: [Python-Dev] os.path.normcase() in site.py
In-Reply-To: <20010625085521-r01010600-9a6226c8@213.84.27.177>
References: <20010625085521-r01010600-9a6226c8@213.84.27.177>
Message-ID: <20010625092053.S8098@xs4all.nl>

On Mon, Jun 25, 2001 at 08:55:17AM +0200, Just van Rossum wrote:

> *** Please CC follow-ups to me, as I'm not on python-dev.

Is that by choice ? It seems rather... peculiar, to me, that you have
checkin access but aren't on python-dev. You'll miss all those wonderful
"Don't touch CVS, I'm building a release" and "Who put CVS in an unstable
state?" messages.

-- 
Thomas Wouters

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!

From tim.one at home.com  Mon Jun 25 09:51:00 2001
From: tim.one at home.com (Tim Peters)
Date: Mon, 25 Jun 2001 03:51:00 -0400
Subject: [Python-Dev] os.path.normcase() in site.py
In-Reply-To: <20010625092053.S8098@xs4all.nl>
Message-ID: 

[Just van Rossum]
> *** Please CC follow-ups to me, as I'm not on python-dev.

[Thomas Wouters]
> Is that by choice ? It seems rather... peculiar, to me, that you have
> checkin access but aren't on python-dev.

Well, I suppose it's supposed to be a secret, but Guido and Just haven't
talked in 17 years come Wednesday. IIRC, something about a bottle of wine
and a toilet seat, and a small but energetic ferret. Just hacked his way
into SourceForge access (those skills just run in the family, I guess),
but every time he hacks onto Python-Dev Guido detects it and locks him out
again. It's very sad, really -- but also wonderfully Dutch.

at-least-that's-the-best-explanation-i-can-think-of-ly y'rs - tim

From thomas at xs4all.net  Mon Jun 25 10:35:38 2001
From: thomas at xs4all.net (Thomas Wouters)
Date: Mon, 25 Jun 2001 10:35:38 +0200
Subject: [Python-Dev] os.path.normcase() in site.py
In-Reply-To: 
References: 
Message-ID: <20010625103538.T8098@xs4all.nl>

On Mon, Jun 25, 2001 at 03:51:00AM -0400, Tim Peters wrote:

[ Tim explains about the century-old, horrid blood feud that cost the
lives of many an innocent ferret, not to mention bottles of wine, caused
by Just's future attempts to join python-dev -- damn that time machine ]

Okay... how about someone takes Guido out for dinner and feeds him way too
many bottles of wine and ferrets to show him such things do not
necessarily lead to blood feuds ? Maybe take along some psychotropic drugs
and a halfway decent hypnotist for good measure. Meanwhile Barry
subscribes Just to python-dev and you or someone else with the pickpocket
skills to get at the keys for the time machine (come on, fess up, you all
practiced) make sure Guido can't get at it, lest he try and make up with
Just in the past in his 'suggestible' state... Better change the Mailman
admin password too, just to be on the safe side.

Or if that has no chance of a prayer in hell of working, I can give Just a
secret xs4all.nl address (since he has an XS4ALL account nowadays, that
shouldn't be a problem) and we just never tell Guido that py-dev at
xs4all.nl is really Just ;)

> It's very sad, really -- but also wonderfully Dutch.

No, it would only be wonderfully Dutch if either brother was German or
Belgian in some way, or of royal blood and married to the wrong type of
christian sect (Protestant or Catholic -- I keep forgetting which is
which.)

-- 
Thomas Wouters

Hi! I'm a .signature virus!
copy me into your .signature file to help me spread!

From tim.one at home.com  Mon Jun 25 11:05:23 2001
From: tim.one at home.com (Tim Peters)
Date: Mon, 25 Jun 2001 05:05:23 -0400
Subject: [Python-Dev] RE: [Python-iterators] Death by Leakage
In-Reply-To: 
Message-ID: 

Here's a simpler leaker, amounting to an insanely convoluted way to
generate the ints 1, 2, 3, ...:

    DO_NOT_LEAK = 1

    class LazyList:
        def __init__(self, g):
            self.sofar = []
            self.fetch = g.next

        def __getitem__(self, i):
            sofar, fetch = self.sofar, self.fetch
            while i >= len(sofar):
                sofar.append(fetch())
            return sofar[i]

        def clear(self):
            self.__dict__.clear()

    def plus1(g):
        for i in g:
            yield i + 1

    def genm23():
        yield 1
        for i in plus1(m23):
            yield i

    for i in range(10000):
        m23 = LazyList(genm23())
        [m23[i] for i in range(50)]
        if DO_NOT_LEAK:
            m23.clear()

Neil, it would help if genobjects had a memberlist so that the struct
members were discoverable from Python code; that would also let me add
appropriate methods to Cyclops.py to find cycles automatically.

Anyway, m23 is a LazyList instance, where m23.fetch is genm23().next, i.e.
m23.fetch is a bound method of the genm23() generator-iterator. So the
frame for genm23 is reachable from m23.__dict__. That frame contains an
anonymous (it's living in the frame's valuestack) generator-iterator
thingie corresponding to the plus1(m23) call. *That* generator's frame in
turn has m23 in its locals (m23 was an argument to plus1), and another
iterator method referencing m23 in its valuestack (due to the "for i in
g"). But m23 is the LazyList instance we started with, so there's a cycle,
and clearing m23.__dict__ breaks it.

gc doesn't chase generators or frames, so it can't clean this stuff up if
we don't clear the dict. So this appears hopeless unless gc adds both
generators and frames to its repertoire. OTOH, it's got to be rare --
maybe <wink>. Worth it?

From loewis at informatik.hu-berlin.de  Mon Jun 25 11:43:33 2001
From: loewis at informatik.hu-berlin.de (Martin von Loewis)
Date: Mon, 25 Jun 2001 11:43:33 +0200 (MEST)
Subject: [Python-Dev] make static
Message-ID: <200106250943.LAA24576@pandora.informatik.hu-berlin.de>

There is a bug report on SF that 'make static' fails for a Makefile.pre.in
extension, see

http://sourceforge.net/tracker/index.php?func=detail&aid=435446&group_id=5470&atid=105470

Is that process still supported? Unless I'm mistaken, this is complicated
by the fact that Makefile.pre.in packages use the Makefile.pre.in that
comes with the package, not the one that comes with the Python
installation.

Any insights welcome,
Martin

From jack at oratrix.nl  Mon Jun 25 12:18:40 2001
From: jack at oratrix.nl (Jack Jansen)
Date: Mon, 25 Jun 2001 12:18:40 +0200
Subject: [Python-Dev] Re: Problems with socketmodule (getnameinfo & getaddrinfo)
In-Reply-To: Message by Mark Favas , Mon, 25 Jun 2001 06:16:25 +0800 , <3B3666B9.335DA17E@per.dem.csiro.au>
Message-ID: <20010625101842.B6BC6303182@snelboot.oratrix.nl>

I'm having a lot of problems with the new getaddrinfo stuff: no prototypes
used in various routines, missing consts in routine declarations (with
const strings then passed to them), all routines seem to be globals (and
with pretty dangerous names) even though they all look pretty static to
me, etc.

Could whoever put this in do a round of quality control on it, please?
-- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From jack at oratrix.nl Mon Jun 25 12:28:08 2001 From: jack at oratrix.nl (Jack Jansen) Date: Mon, 25 Jun 2001 12:28:08 +0200 Subject: [Python-Dev] os.path.normcase() in site.py In-Reply-To: Message by Just van Rossum , Mon, 25 Jun 2001 08:55:17 +0200 , <20010625085521-r01010600-9a6226c8@213.84.27.177> Message-ID: <20010625102809.42357303182@snelboot.oratrix.nl> > I noticed that these days __file__ attributes of modules are case normalized > (ie. lowercased on case insensitive file systems), or at least the directory > part. Then I noticed that this is caused by the fact that all sys.path entries > are case normalized. It turns out that site.py does this, in a function called > makepath(), added by Fred about 8 months ago. > > I think this is wrong: we should always try to *preserve* case. There is an added problem with the makepath() stuff that I hadn't reported here yet: it has broken MacPython on some non-western machines. Specifically I've had reports of people running a Japanese MacOS that things will break if they run Python from a pathname that has any non-7-bit-ascii characters in the name. Apparently normcase normalizes more than just ascii upper/lowercase letters. And aside from that I fully agree with Just: seeing a stacktrace with all lowercase filenames is _very_ disconcerting. I would disable the case-normalization for MacPython, except that I don't know whether it actually has a function. With MacPython's way of finding the initial sys.path contents we don't have the Windows-Python problem that we add the same directory 5 times (once in uppercase, once in lowercase, once in mixed case, once in mixed-case with / for \, etc:-), so if this is what it's trying to solve we can take it out easily. -- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From fredrik at pythonware.com Mon Jun 25 14:12:23 2001 From: fredrik at pythonware.com (Fredrik Lundh) Date: Mon, 25 Jun 2001 14:12:23 +0200 Subject: [Python-Dev] IPv6 and Windows References: <200106241748.f5OHm3001480@mira.informatik.hu-berlin.de> <20010624233537.R8098@xs4all.nl> <200106242156.f5OLum222759@mira.informatik.hu-berlin.de> Message-ID: <006101c0fd70$17a6b660$0900a8c0@spiff> martin wrote: > getaddrinfo offers protocol-independent address lookup. It is > necessary to use that API to support AF_INET and AF_INET6 > transparently in application code. itojun proposes to change a number > of standard library modules. 
Please have a look at the actual patch
> for details; the typical change will look like this (for httplib)
>
> diff -u -r1.35 httplib.py
> --- Lib/httplib.py	2001/06/01 16:25:38	1.35
> +++ Lib/httplib.py	2001/06/24 04:41:48
> @@ -357,10 +357,22 @@
>  
>      def connect(self):
>          """Connect to the host and port specified in __init__."""
> -        self.sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
> -        if self.debuglevel > 0:
> -            print "connect: (%s, %s)" % (self.host, self.port)
> -        self.sock.connect((self.host, self.port))
> +        for res in socket.getaddrinfo(self.host, self.port, 0, socket.SOCK_STREAM):
> +            af, socktype, proto, canonname, sa = res
> +            try:
> +                self.sock = socket.socket(af, socktype, proto)
> +                if self.debuglevel > 0:
> +                    print "connect: (%s, %s)" % (self.host, self.port)
> +                self.sock.connect(sa)
> +            except socket.error, msg:
> +                if self.debuglevel > 0:
> +                    print 'connect fail:', (self.host, self.port)
> +                self.sock.close()
> +                self.sock = None
> +                continue
> +            break
> +        if not self.sock:
> +            raise socket.error, msg

instead of adding code like that to every single module, maybe we should
add a convenience function to the socket module? (and make that function
smart enough to also work if getaddrinfo isn't supported by the native
platform...)

From guido at digicool.com  Mon Jun 25 15:40:10 2001
From: guido at digicool.com (Guido van Rossum)
Date: Mon, 25 Jun 2001 09:40:10 -0400
Subject: [Python-Dev] make static
In-Reply-To: Your message of "Mon, 25 Jun 2001 11:43:33 +0200." <200106250943.LAA24576@pandora.informatik.hu-berlin.de>
References: <200106250943.LAA24576@pandora.informatik.hu-berlin.de>
Message-ID: <200106251340.f5PDeAO07244@odiug.digicool.com>

> There is a bug report on SF that 'make static' fails for a
> Makefile.pre.in extension, see
>
> http://sourceforge.net/tracker/index.php?func=detail&aid=435446&group_id=5470&atid=105470
>
> Is that process still supported? Unless I'm mistaken, this is
> complicated by the fact that Makefile.pre.in packages use the
> Makefile.pre.in that comes with the package, not the one that comes
> with the Python installation.
>
> Any insights welcome,
>
> Martin

As long as it works, it works. I don't think there's a reason to spend
more than absolutely minimal time trying to keep it working though --
we're trying to encourage everybody to migrate towards distutils. So
(without having seen the SF report) I'd say "tough luck".

--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at digicool.com  Mon Jun 25 15:40:47 2001
From: guido at digicool.com (Guido van Rossum)
Date: Mon, 25 Jun 2001 09:40:47 -0400
Subject: [Python-Dev] os.path.normcase() in site.py
In-Reply-To: Your message of "Mon, 25 Jun 2001 10:35:38 +0200." <20010625103538.T8098@xs4all.nl>
References: <20010625103538.T8098@xs4all.nl>
Message-ID: <200106251340.f5PDele07256@odiug.digicool.com>

No need to get me drunk. Barry & I decided to change this policy weeks
ago, but (in order to avoid a flurry of subscription requests from
functional-language proponents) we decided to keep the policy change a
secret. :-)

Just can subscribe safely now.

--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at digicool.com  Mon Jun 25 15:40:06 2001
From: guido at digicool.com (Guido van Rossum)
Date: Mon, 25 Jun 2001 09:40:06 -0400
Subject: [Python-Dev] os.path.normcase() in site.py
In-Reply-To: Your message of "Mon, 25 Jun 2001 12:28:08 +0200."
<20010625102809.42357303182@snelboot.oratrix.nl> References: <20010625102809.42357303182@snelboot.oratrix.nl> Message-ID: <200106251340.f5PDe6e07238@odiug.digicool.com> > > I noticed that these days __file__ attributes of modules are case > > normalized (ie. lowercased on case insensitive file systems), or > > at least the directory part. Then I noticed that this is caused by > > the fact that all sys.path entries are case normalized. It turns > > out that site.py does this, in a function called makepath(), added > > by Fred about 8 months ago. > > > > I think this is wrong: we should always try to *preserve* case. > > There is an added problem with the makepath() stuff that I hadn't > reported here yet: it has broken MacPython on some non-western > machines. Specifically I've had reports of people running a Japanese > MacOS that things will break if they run Python from a pathname that > has any non-7-bit-ascii characters in the name. Apparently normcase > normalizes more than just ascii upper/lowercase letters. > > And aside from that I fully agree with Just: seeing a stacktrace > with all lowercase filenames is _very_ disconcerting. > > I would disable the case-normalization for MacPython, except that I > don't know whether it actually has a function. With MacPython's way > of finding the initial sys.path contents we don't have the > Windows-Python problem that we add the same directory 5 times (once > in uppercase, once in lowercase, once in mixed case, once in > mixed-case with / for \, etc:-), so if this is what it's trying to > solve we can take it out easily. I can't think of any function besides the attempt to avoid duplicates. I think that even on Windows, retaining case makes sense. I think that there's a way to avoid duplicates without case-folding everything. (E.g. use a case-folding comparison instead.) I wonder if maybe path entries should be normpath'd though? I'll leave it to Fred, Jack or Just to fix this. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at digicool.com Mon Jun 25 15:41:46 2001 From: guido at digicool.com (Guido van Rossum) Date: Mon, 25 Jun 2001 09:41:46 -0400 Subject: [Python-Dev] IPv6 and Windows In-Reply-To: Your message of "Sun, 24 Jun 2001 19:48:03 +0200." <200106241748.f5OHm3001480@mira.informatik.hu-berlin.de> References: <200106241748.f5OHm3001480@mira.informatik.hu-berlin.de> Message-ID: <200106251341.f5PDfkg07283@odiug.digicool.com> > The problem is that the library patches (httplib, ftplib, etc) do use > getaddrinfo to find out how to contact a remote system, which is the > right thing to do IMO. So even if the IPv6 support can be activated > only if desired, getaddrinfo absolutely has to work. Yes, but in an IPv4-only environment it would be super trivial to implement, right? --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at digicool.com Mon Jun 25 15:42:18 2001 From: guido at digicool.com (Guido van Rossum) Date: Mon, 25 Jun 2001 09:42:18 -0400 Subject: [Python-Dev] IPv6 and Windows In-Reply-To: Your message of "Sun, 24 Jun 2001 20:08:13 +0200." 
<3B362C8D.D3AECE3C@lemburg.com>
References: <200106240820.f5O8KVN01435@mira.informatik.hu-berlin.de> <3B35C9AB.2D1D2185@lemburg.com> <200106241700.f5OH0hm01021@mira.informatik.hu-berlin.de> <3B362C8D.D3AECE3C@lemburg.com>
Message-ID: <200106251342.f5PDgI107298@odiug.digicool.com>

> > # If you redistribute the SOFTWARE and/or your Source Modifications,
> > # or any portion thereof as provided above, you agree: (i) to
> > # distribute the SOFTWARE only in conjunction with, and as part of,
> > # your Source Modifications which add significant functionality to the
> > # SOFTWARE; (ii) to distribute the SOFTWARE and/or your Source
> > # Modifications solely as part of your research and not in any
> > # commercial product; (iii) the SOFTWARE and/or your Source
> > # Modifications will not be distributed for profit; (iv) to retain all
> > # branding, copyright and trademark notices included with the SOFTWARE
> > # and include a copy of this EULA with any distribution of the
> > # SOFTWARE, or any portion thereof; and (v) to indemnify, hold
> > # harmless, and defend Microsoft from and against any claims or
> > # lawsuits, including attorneys' fees, that arise or result from
> > # the use or distribution of your Source Modifications.
> >
> > I don't know whether this is acceptable or not.
>
> Most likely not: there are lots of commercial Python users out there
> who wouldn't like these clauses at all... we'd also lose the GPL
> compatibility.

Don't even *think* about using code with that license.

--Guido van Rossum (home page: http://www.python.org/~guido/)

From skip at pobox.com  Mon Jun 25 15:50:31 2001
From: skip at pobox.com (Skip Montanaro)
Date: Mon, 25 Jun 2001 08:50:31 -0500
Subject: [Python-Dev] xrange vs generators
Message-ID: <15159.16807.480121.637386@beluga.mojam.com>

With generators in the language, should xrange be deprecated?

Skip

From just at letterror.com  Mon Jun 25 16:05:43 2001
From: just at letterror.com (Just van Rossum)
Date: Mon, 25 Jun 2001 16:05:43 +0200
Subject: [Python-Dev] os.path.normcase() in site.py
In-Reply-To: <200106251343.f5PDh4907304@odiug.digicool.com>
Message-ID: <20010625160545-r01010600-e232a14e@213.84.27.177>

Guido van Rossum wrote:

> I can't think of any function besides the attempt to avoid duplicates.
>
> I think that even on Windows, retaining case makes sense.
>
> I think that there's a way to avoid duplicates without case-folding
> everything. (E.g. use a case-folding comparison instead.)
>
> I wonder if maybe path entries should be normpath'd though?

They are already, they already go through abspath(), which calls
normpath().

> I'll leave it to Fred, Jack or Just to fix this.

If it were up to me, I'd simply remove the normcase() call from makepath().

Just

From arigo at ulb.ac.be  Mon Jun 25 15:08:52 2001
From: arigo at ulb.ac.be (Armin Rigo)
Date: Mon, 25 Jun 2001 15:08:52 +0200
Subject: [Python-Dev] Python Specializing Compiler
In-Reply-To: <200106221259.OAA02519@core.inf.ethz.ch>
Message-ID: <4.3.1.0.20010625134824.00abde60@127.0.0.1>

Hello everybody,

A note about what I have in mind about Psyco...

Type-sets are independent of memory representation. In other words, the
fact that two variables can take the same set of values does not mean that
the data is necessarily encoded in the same way in memory. In particular,
I believe we won't need to change the way the current Python interpreter
encodes data. For example, instances currently have a dictionary of
attributes and no "fixed slots", but this is not a problem for Psyco,
which can encode instances in better ways (e.g.
as a C struct) as long as it is only accessed by Psyco-compiled Python
code and no "legacy" code.

This approach also allows Psyco to completely remove the overhead of
creating bound method objects and frame objects; both are generally
temporary, and so during their whole lifetime they can be represented much
more efficiently in memory. For frame objects it should be clear (we
probably need no frame at all as long as no exception exits the current
procedure, and even in this case it could be optimized). For method
objects we use "memory sharing", a technique already applied in the
current Psyco. More precisely, if some (immutable) data is found at some
memory location (or machine register) and Python code says it should be
duplicated, we need not duplicate it at all; we can just consider that the
copy is at the same location as the original.

For method objects it means the following: suppose you have an instance
"xyz" and query its "foo()" method. Suppose that you can (at some time) be
sure that, because of the class of "xyz", "xyz.foo" will always be the
Python function "f". Then the method object's representation can be
simplified: all it needs to store in memory is a pointer to "xyz", because
"f" is a constant part. Now a single pointer to the "xyz" instance is
exactly the same memory format as the original "xyz" variable, so that
this particular representation of a bound method object can share the
original "xyz" pointer. No actual machine code is produced; Psyco simply
notes that both "xyz" and "xyz.foo" are represented at the same location,
although "xyz" represents an instance with the given pointer, and
"xyz.foo" represents the "f" function with its first argument bound to the
given pointer.

According to est at hyperreal.org, method and frame objects each represent
20% of the execution time... (Est, on which kind of machine did you get
Psyco to run the sample code 5 times faster !? It's only 2 times faster on
a modern Pentium...)

A bientôt,

Armin.

From arigo at ulb.ac.be  Mon Jun 25 15:45:20 2001
From: arigo at ulb.ac.be (Armin Rigo)
Date: Mon, 25 Jun 2001 15:45:20 +0200
Subject: [Python-Dev] Python Specializing Compiler
In-Reply-To: <200106221259.OAA02519@core.inf.ethz.ch>
Message-ID: <4.3.1.0.20010625150819.00aa5220@127.0.0.1>

Hello,

At 14:59 22.06.2001 +0200, Samuele Pedroni wrote:
>*: some possible useful hooks would be:
>- minimal profiling support in order to specialize only things called often
>- feedback for dynamic changing of methods, class hierarchy, ... if we want
>to optimize method lookup (which would make sense)
>- a mixed fixed slots/dict layout for instances.

There is one point that you didn't mention, which I believe is important:
how to handle global/builtin variables. First, a few words about the
current Python semantics.

* I am sorry if what follows has already been discussed; I am raising the
question again because it might be important for Psyco. If you feel this
would be better as a PEP, please just tell me so. *

Complete lexical scoping was recently added, implemented with "free" and
"cell" variables. These are only used for functions defined inside of
other functions; top-level functions use the opcode LOAD_GLOBAL for all
non-local variables. LOAD_GLOBAL performs one or two dictionary look-ups
(two if the variable is built-in). For simple built-ins like "len" this
might be expensive (has someone measured such costs ?).

I suggest generalizing the compile-time lexical scoping rules.
Let's compile all functions' non-local variables (top-level and others) as
"free" variables. This means the corresponding module's global variables
must be "cell" variables. This is just what we would get if the module's
code was one big function enclosing the definition of all the other
functions. Next, the variables not defined in the module (the built-ins)
are "free" variables of the module, and the built-in module provides
"cell" variables for them. Remember that "free" and "cell" variables are
linked together when the function (or module in this case) is defined (for
functions, when "def" is executed; for modules, it would be at load-time).

Benefit: not a single dictionary look-up any more; uniformity of treatment.

Potential code break: global variables shadowing built-ins would behave
like local variables shadowing globals, i.e. the mere presence of a global
"xyz=..." would forever hide the "xyz" built-in from the module, even
before the assignment or after a "del xyz". (cf. UnboundLocalError.)

To think about: what the "global" keyword would mean in this context.

Implementation problems: if we want to keep the module's dictionary of
global variables (and we certainly do) it would require changes to the
dictionary implementation (or the creation of a different kind of
dictionary). One solution is to automatically dereference cell objects and
raise exceptions upon reading empty cells. Another solution is to turn
dictionaries into collections of objects that all behave like cell objects
(so that if "d" is any dictionary, something like "d.ref(key)" would let
us get a cell object which could be read or written later to actually get
or set the value associated with "key", and "d[key]" would mean
"d.ref(key).cell_ref"). Well, these are just proposals; they might not be
a good solution.

Why it is related to Psyco: the current treatment of globals/builtins
makes it hard for Psyco to statically tell what function we are calling
when it sees e.g. "len(a)" in the code. We would at least need some help
from the interpreter; at least hooks called when the module's globals()
dictionary changes. The above proposal might provide a more uniform
solution.

Thanks for your attention.

Armin.

From guido at digicool.com  Mon Jun 25 16:26:08 2001
From: guido at digicool.com (Guido van Rossum)
Date: Mon, 25 Jun 2001 10:26:08 -0400
Subject: [Python-Dev] xrange vs generators
In-Reply-To: Your message of "Mon, 25 Jun 2001 08:50:31 CDT." <15159.16807.480121.637386@beluga.mojam.com>
References: <15159.16807.480121.637386@beluga.mojam.com>
Message-ID: <200106251426.f5PEQ8907629@odiug.digicool.com>

> With generators in the language, should xrange be deprecated?
>
> Skip

No, but maybe xrange() should be changed to return an iterator. E.g.
something like this:

    def xrange(start, stop, step):
        while start < stop:
            yield start
            start += step

but with the appropriate defaults, and reversal of the test if step < 0,
and an error if step == 0, and type checks enforcing ints (or long ints!),
and implemented in C. :-)

Although xrange() objects currently support some sequence algebra, that is
mostly bogus and I don't think anyone in their right mind uses it.
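Spelled out with the defaults and the step handling, it might look roughly
like this (still just a sketch, with the type checks left out, and Python
standing in for the eventual C):

    from __future__ import generators

    def xrange(start, stop=None, step=1):
        # xrange(n) is shorthand for xrange(0, n), as with the builtin
        if stop is None:
            start, stop = 0, start
        if step == 0:
            raise ValueError("xrange() step must not be zero")
        if step > 0:
            while start < stop:
                yield start
                start += step
        else:
            # reversed test for a negative step
            while start > stop:
                yield start
                start += step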
--Guido van Rossum (home page: http://www.python.org/~guido/)

From thomas.heller at ion-tof.com  Mon Jun 25 16:37:31 2001
From: thomas.heller at ion-tof.com (Thomas Heller)
Date: Mon, 25 Jun 2001 16:37:31 +0200
Subject: [Python-Dev] xrange vs generators
References: <15159.16807.480121.637386@beluga.mojam.com> <200106251426.f5PEQ8907629@odiug.digicool.com>
Message-ID: <006301c0fd84$5dcefad0$e000a8c0@thomasnotebook>

> > With generators in the language, should xrange be deprecated?
> >
> > Skip
>
> No, but maybe xrange() should be changed to return an iterator.
> E.g. something like this:
>
>     def xrange(start, stop, step):
>         while start < stop:
>             yield start
>             start += step
>
> but with the appropriate defaults, and reversal of the test if step <
> 0, and an error if step == 0, and type checks enforcing ints (or long
> ints!), and implemented in C. :-)
>
> Although xrange() objects currently support some sequence algebra,
> that is mostly bogus and I don't think anyone in their right mind uses
> it.

I _was_ using xrange as sets representing (potentially large) ranges of
ints. Example:

    positive = xrange(1, sys.maxint)

    if num in positive:
        ...

I didn't follow the iterators discussion: would this continue to work?

Thomas

From esr at thyrsus.com  Mon Jun 25 16:41:34 2001
From: esr at thyrsus.com (Eric S. Raymond)
Date: Mon, 25 Jun 2001 10:41:34 -0400
Subject: [Python-Dev] xrange vs generators
In-Reply-To: <200106251426.f5PEQ8907629@odiug.digicool.com>; from guido@digicool.com on Mon, Jun 25, 2001 at 10:26:08AM -0400
References: <15159.16807.480121.637386@beluga.mojam.com> <200106251426.f5PEQ8907629@odiug.digicool.com>
Message-ID: <20010625104134.B30559@thyrsus.com>

Guido van Rossum :

> Although xrange() objects currently support some sequence algebra,
> that is mostly bogus and I don't think anyone in their right mind uses
> it.

I agree. As long as we make those cases fail loudly, I see no objection to
dropping support for them.

-- 
Eric S. Raymond

Americans have the will to resist because you have weapons. If you don't
have a gun, freedom of speech has no power.
	-- Yoshimi Ishikawa, Japanese author, in the LA Times 15 Oct 1992

From barry at digicool.com  Mon Jun 25 16:38:20 2001
From: barry at digicool.com (Barry A. Warsaw)
Date: Mon, 25 Jun 2001 10:38:20 -0400
Subject: [Python-Dev] os.path.normcase() in site.py
References: <20010625103538.T8098@xs4all.nl>
Message-ID: <15159.19676.727068.217548@anthem.wooz.org>

>>>>> "TW" == Thomas Wouters writes:

    TW> Okay... how about someone takes Guido out for dinner and feeds
    TW> him way too many bottles of wine and ferrets to show him such
    TW> things do not necessarily lead to blood feuds ? Maybe take
    TW> along some psychotropic drugs and a halfway decent hypnotist
    TW> for good measure.

Don't forget the dentist, proctologist, and a trepanist. Actually, if you
can find a holeologist it would be much more efficient (my cousin Neil,
a.k.a. Dr. Finger, a.k.a. Dr Watumpka would be ideal, but he's studying in
Dortmund these days).

    TW> Meanwhile Barry subscribes Just to python-dev

I'd be glad to, and I won't even divulge the fact that python-dev is only
ostensibly a closed, insular mailing list these days.

    TW> and you or someone else with the pickpocket skills to get at
    TW> the keys for the time machine

No pickpocketing skill necessary. Guido leaves the keys in a small safebox
magnetically adhered underneath the running boards. Just be sure to ground
yourself first (learned the hard way)!
    TW> (come on, fess up, you all practiced) make sure Guido can't
    TW> get at it, lest he try and make up with Just in the past in
    TW> his 'suggestible' state... Better change the Mailman admin
    TW> password too, just to be on the safe side.

I've tried that many times, but I suspect Guido has a Pybot hermetically
linked to the time machine which "instantly" recedes several seconds into
the past each time I change it, only to change it back.

    TW> Or if that has no chance of a prayer in hell of working, I can
    TW> give Just a secret xs4all.nl address (since he has an XS4ALL
    TW> account nowadays, that shouldn't be a problem) and we just
    TW> never tell Guido that py-dev at xs4all.nl is really Just ;)

You realize it's way too "late" for that, don't you? The time machine
works just as well in the forward direction as in the past direction, and
long before he left the comfy environs of Amsterdam to brave it out in the
harsh, unforgiving wilderness of Washington, he mapped out every moment of
young Wouters' life. Why do you think I've worn aluminum foil underwear
for the past 30 years? Trust me, it's not for the feeling of freshness and
confidence it provides (okay, only partially).

    >> It's very sad, really -- but also wonderfully Dutch.

    TW> No, it would only be wonderfully Dutch if either brother was
    TW> German or Belgian in some way, or of royal blood and married
    TW> to the wrong type of christian sect (Protestant or Catholic --
    TW> I keep forgetting which is which.)

It would also be wonderfully American, but only if Just had trivially
wronged Guido years ago by eating one of his nabisco cookies or some such.

-Barry

From guido at digicool.com  Mon Jun 25 16:47:50 2001
From: guido at digicool.com (Guido van Rossum)
Date: Mon, 25 Jun 2001 10:47:50 -0400
Subject: [Python-Dev] xrange vs generators
In-Reply-To: Your message of "Mon, 25 Jun 2001 16:37:31 +0200." <006301c0fd84$5dcefad0$e000a8c0@thomasnotebook>
References: <15159.16807.480121.637386@beluga.mojam.com> <200106251426.f5PEQ8907629@odiug.digicool.com> <006301c0fd84$5dcefad0$e000a8c0@thomasnotebook>
Message-ID: <200106251447.f5PEloH07777@odiug.digicool.com>

[me]
> > Although xrange() objects currently support some sequence algebra,
> > that is mostly bogus and I don't think anyone in their right mind uses
> > it.

[theller]
> I _was_ using xrange as sets representing (potentially large)
> ranges of ints.
> Example:
>
>     positive = xrange(1, sys.maxint)
>
>     if num in positive:
>         ...
>
> I didn't follow the iterators discussion: would this
> continue to work?

No, it would break. And I see another breakage too:

    r = xrange(10)
    for i in r:
        for j in r:
            print i, j

would not do the right thing if xrange() returned an iterator (because
iterators can only be used once).

This is too bad; I really wish that xrange() could die or be limited
entirely to for loops. I wonder if we could put warnings on xrange() uses
beyond the most basic...?

--Guido van Rossum (home page: http://www.python.org/~guido/)

From pedroni at inf.ethz.ch  Mon Jun 25 16:51:16 2001
From: pedroni at inf.ethz.ch (Samuele Pedroni)
Date: Mon, 25 Jun 2001 16:51:16 +0200 (MET DST)
Subject: [Python-Dev] Python Specializing Compiler
Message-ID: <200106251451.QAA17756@core.inf.ethz.ch>

Hi.

[Armin Rigo]
...
> Why it is related to Psyco: the current treatment of globals/builtins makes
> it hard for Psyco to statically tell what function we are calling when it
> sees e.g. "len(a)" in the code.
> We would at least need some help from the
> interpreter; at least hooks called when the module's globals() dictionary
> changes. The above proposal might provide a more uniform solution.

FYI, a different proposal for opt. globals access by Jeremy Hylton. It
seems it would break fewer things ... don't know whether it can be as
useful for Psyco:

http://mail.python.org/pipermail/python-dev/2001-May/014995.html

In any case I think Psyco will need notification support from the
interpreter about dynamic changes to things that Psyco honestly assumes to
be invariant in order to achieve performance.

regards, Samuele Pedroni.

From thomas.heller at ion-tof.com  Mon Jun 25 17:05:09 2001
From: thomas.heller at ion-tof.com (Thomas Heller)
Date: Mon, 25 Jun 2001 17:05:09 +0200
Subject: [Python-Dev] xrange vs generators
References: <15159.16807.480121.637386@beluga.mojam.com> <200106251426.f5PEQ8907629@odiug.digicool.com> <006301c0fd84$5dcefad0$e000a8c0@thomasnotebook> <200106251447.f5PEloH07777@odiug.digicool.com>
Message-ID: <00e001c0fd88$3a532140$e000a8c0@thomasnotebook>

> [theller]
> > I _was_ using xrange as sets representing (potentially large)
> > ranges of ints.
> > Example:
> >
> >     positive = xrange(1, sys.maxint)
> >
> >     if num in positive:
> >         ...
> >
> > I didn't follow the iterators discussion: would this
> > continue to work?
>
> No, it would break.

Since there was an off-by-one bug for 'if num in xrange()' in Python 2.0,
my code has already been rewritten.

Thomas

From pedroni at inf.ethz.ch  Mon Jun 25 17:04:45 2001
From: pedroni at inf.ethz.ch (Samuele Pedroni)
Date: Mon, 25 Jun 2001 17:04:45 +0200 (MET DST)
Subject: [Python-Dev] Python Specializing Compiler
Message-ID: <200106251504.RAA18642@core.inf.ethz.ch>

Hi.

[Armin Rigo]
> In particular, I believe we won't need to change the way the current Python
> interpreter encodes data. For example, instances currently have a
> dictionary of attributes and no "fixed slots", but this is not a problem
> for Psyco, which can encode instances in better ways (e.g. as a C struct)
> as long as it is only accessed by Psyco-compiled Python code and no
> "legacy" code.

This makes sense, but I'm asking whether it is affordable to have all code
executed through Psyco-compiled code (if we aim for usage-transparency),
given the memory footprint and the compilation vs. execution trade-offs
for rarely executed code. Otherwise, in a mixed execution context, we
would pay for conversions.

I can see how a dynamic compiler can deal with methods, together with an
interpreter that notifies it when a dynamic change to the class hierarchy
or to method definitions can potentially invalidate compiled code. I see
more problems with instance data slots, because there are no strong hints
in the code about which are the "official" slots of a class, and
undisciplined code can treat instances just as dicts.

regards, Samuele Pedroni.

From fdrake at acm.org  Mon Jun 25 17:13:31 2001
From: fdrake at acm.org (Fred L. Drake, Jr.)
Date: Mon, 25 Jun 2001 11:13:31 -0400 (EDT)
Subject: [Python-Dev] os.path.normcase() in site.py
In-Reply-To: <200106251343.f5PDh4907304@odiug.digicool.com>
References: <20010625102809.42357303182@snelboot.oratrix.nl> <200106251343.f5PDh4907304@odiug.digicool.com>
Message-ID: <15159.21787.913782.751691@cj42289-a.reston1.va.home.com>

Guido van Rossum writes:

> I can't think of any function besides the attempt to avoid duplicates.

There were two reasons for adding this code:

 1. Avoid duplicates (speeds imports if there are duplicates and
    the modules are found on an entry after the dupes).

 2.
    Avoid breakage when a script uses os.chdir(). This is probably
    unusual for large applications, but fairly common for little admin
    helper scripts.

> I think that even on Windows, retaining case makes sense.
>
> I think that there's a way to avoid duplicates without case-folding
> everything. (E.g. use a case-folding comparison instead.)
>
> I wonder if maybe path entries should be normpath'd though?
>
> I'll leave it to Fred, Jack or Just to fix this.

I certainly agree that this can be improved; if Jack or Just would like to
assign it to me on SourceForge, I'd be glad to fix it.

-Fred

-- 
Fred L. Drake, Jr.
PythonLabs at Digital Creations

From tim at digicool.com  Mon Jun 25 17:39:47 2001
From: tim at digicool.com (Tim Peters)
Date: Mon, 25 Jun 2001 11:39:47 -0400
Subject: [Python-Dev] xrange vs generators
In-Reply-To: <200106251447.f5PEloH07777@odiug.digicool.com>
Message-ID: 

[Thomas Heller]
> I _was_ using xrange as sets representing (potentially large)
> ranges of ints.
> Example:
>
>     positive = xrange(1, sys.maxint)
>
>     if num in positive:
>         ...
>
> I didn't follow the iterators discussion: would this
> continue to work?

[Guido]
> No, it would break.

"x in y" works with any iterable y in 2.2, incl. generators. So e.g.

    >>> def xr(n):
    ...     i = 0
    ...     while i < n:
    ...         yield i
    ...         i += 1
    ...
    >>> 1 in xr(10)
    1
    >>> 9 in xr(10)
    1
    >>> 10 in xr(10)
    0
    >>>

However, there's no __contains__ method here, so in the last case it
actually did 10 compares. 0 in xr(sys.maxint) is very quick, but I'm still
waiting for -1 in xr(sys.maxint) to complete <wink>.

> And I see another breakage too:

This would also apply to Thomas's example of giving a name to an xrange
object, if implemented via generator:

    >>> small = xr(5)
    >>> 2 in small
    1
    >>> 2 in small
    0
    >>>

> ...
> This is too bad; I really wish that xrange() could die or be limited
> entirely to for loops. I wonder if we could put warnings on xrange()
> uses beyond the most basic...?

Hmm. I'd rather not endure the resulting complaints without a strong
rationale for deprecating it. One that strikes close to my heart: there's
more code in 2.2 to support xrange than there is to support generators!
But users don't care about that.

From thomas at xs4all.net  Mon Jun 25 17:42:12 2001
From: thomas at xs4all.net (Thomas Wouters)
Date: Mon, 25 Jun 2001 17:42:12 +0200
Subject: [Python-Dev] xrange vs generators
In-Reply-To: <200106251447.f5PEloH07777@odiug.digicool.com>
References: <15159.16807.480121.637386@beluga.mojam.com> <200106251426.f5PEQ8907629@odiug.digicool.com> <006301c0fd84$5dcefad0$e000a8c0@thomasnotebook> <200106251447.f5PEloH07777@odiug.digicool.com>
Message-ID: <20010625174211.U8098@xs4all.nl>

On Mon, Jun 25, 2001 at 10:47:50AM -0400, Guido van Rossum wrote:

[ xrange can't be changed into a generator ]

> This is too bad; I really wish that xrange() could die or be limited
> entirely to for loops. I wonder if we could put warnings on xrange()
> uses beyond the most basic...?

Why do we want to do this ? xrange() is still exactly what it was: an
object that pretends to be a list of integers. Besides being useful for
those who work a lot with ranges, it's a wonderful example of what you can
do with Python (even if it isn't actually written in Python :-)

I see less reason to deprecate xrange than to deprecate the gopherlib,
wave/aifc/audiodev, mhlib, netrc and/or robotparser modules.

-- 
Thomas Wouters

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!
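[One way to reconcile Tim's one-shot iterator demo with Thomas's named
range use case -- a hypothetical sketch, not anything checked in -- is an
object that is iterable without itself being an iterator, so every for
loop and every "in" test gets a fresh generator:

    from __future__ import generators

    class XRange:
        # illustrative stand-in for an iterator-friendly xrange;
        # assumes step > 0 to keep the sketch short
        def __init__(self, start, stop, step=1):
            self.start, self.stop, self.step = start, stop, step

        def __iter__(self):
            # a brand-new generator on every call, so nested loops and
            # repeated membership tests see the full sequence each time
            i = self.start
            while i < self.stop:
                yield i
                i += self.step

    small = XRange(0, 5)
    print 2 in small   # 1
    print 2 in small   # still 1: a new iterator is created each time

Because __iter__ hands out a new generator per call, this behaves like the
old xrange for nested loops and repeated "in" tests, at the cost of
re-running the generator from the start each time.]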
From guido at digicool.com Mon Jun 25 18:07:44 2001 From: guido at digicool.com (Guido van Rossum) Date: Mon, 25 Jun 2001 12:07:44 -0400 Subject: [Python-Dev] xrange vs generators In-Reply-To: Your message of "Mon, 25 Jun 2001 11:39:47 EDT." References: Message-ID: <200106251607.f5PG7iq08192@odiug.digicool.com> > Hmm. I'd rather not endure the resulting complaints without a > strong rationale for deprecating it. One that strikes close to my > heart: there's more code in 2.2 to support xrange than there is to > support generators! But users don't care about that. But I do, and historically this code has often been bug-ridden without anybody noticing -- so it's not like it's needed much. I would suggest to remove most of the fancy features of xrange(), in particular the slice, contains and repeat slots. A step further would be to remove getitem also, and add a tp_getiter slot instead -- returning not itself but a new iterator that iterates through the prescribed sequence. We need a PEP for this. Anyone? Should be short and sweet. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at digicool.com Mon Jun 25 18:11:10 2001 From: guido at digicool.com (Guido van Rossum) Date: Mon, 25 Jun 2001 12:11:10 -0400 Subject: [Python-Dev] xrange vs generators In-Reply-To: Your message of "Mon, 25 Jun 2001 17:42:12 +0200." <20010625174211.U8098@xs4all.nl> References: <15159.16807.480121.637386@beluga.mojam.com> <200106251426.f5PEQ8907629@odiug.digicool.com> <006301c0fd84$5dcefad0$e000a8c0@thomasnotebook> <200106251447.f5PEloH07777@odiug.digicool.com> <20010625174211.U8098@xs4all.nl> Message-ID: <200106251611.f5PGBA608205@odiug.digicool.com> > [ xrange can't be changed into a generator ] > > > This is too bad; I really wish that xrange() could die or be limited > > entirely to for loops. I wonder if we could put warnings on xrange() > > uses beyond the most basic...? > > Why do we want to do this ? xrange() is still exactly what it was: an object > that pretends to be a list of integers. Besides being useful for those who > work a lot with ranges, it's a wondeful example on what you can do with > Python (even if it isn't actually written in Python :-) There is exactly *one* idiomatic use of xrange(): for i in xrange(...): ... All other operations supported by the xrange object are very rarely used, and historically their implementation has had obvious bugs that no-one noticed for years. > I see less reason to deprecate xrange than to deprecate the gopherlib, > wave/aifc/audiodev, mhlib, netrc and/or robotparser modules. Those are useful application-area libraries for some folks. The idiomatic xrange() object is useful too. But the advanced features of xrange() are an example of code bloat. --Guido van Rossum (home page: http://www.python.org/~guido/) From Greg.Wilson at baltimore.com Mon Jun 25 18:25:33 2001 From: Greg.Wilson at baltimore.com (Greg Wilson) Date: Mon, 25 Jun 2001 12:25:33 -0400 Subject: [Python-Dev] RE: Python-Dev digest, Vol 1 #1437 - 13 msgs Message-ID: <930BBCA4CEBBD411BE6500508BB3328F2E27F1@nsamcanms1.ca.baltimore.com> > Guido: > Since you have already obtained the same speedup with your approach, I > think there's great promise. Count on sending in a paper for the next > Python conference! Greg: "Doctor Dobb's Journal" would also be interested in an article. Who knows --- it might even be done before the ones on stackless, garbage collection, Zope acquisition, and generators... 
:-)

Greg

From just at letterror.com  Mon Jun 25 18:47:30 2001
From: just at letterror.com (Just van Rossum)
Date: Mon, 25 Jun 2001 18:47:30 +0200
Subject: [Python-Dev] os.path.normcase() in site.py
In-Reply-To: <15159.21787.913782.751691@cj42289-a.reston1.va.home.com>
Message-ID: <20010625184734-r01010600-dbd1c84a@213.84.27.177>

Guido van Rossum writes:
> I can't think of any function besides the attempt to avoid duplicates.

Fred L. Drake, Jr. wrote:
> There were two reasons for adding this code:
>
> 1. Avoid duplicates (speeds imports if there are duplicates and
>    the modules are found on an entry after the dupes).
>
> 2. Avoid breakage when a script uses os.chdir(). This is
>    probably unusual for large applications, but fairly common for
>    little admin helper scripts.

1) normcase(). Bad.
2) abspath(). Good.

I think #2 is a legitimate problem, but I'm not so sure of #1: is it
really so common for sys.path to contain duplicates, to worry about it at
all?

> > I'll leave it to Fred, Jack or Just to fix this.
>
> I certainly agree that this can be improved; if Jack or Just would
> like to assign it to me on SourceForge, I'd be glad to fix it.

Here's my proposed fix:

Index: site.py
===================================================================
RCS file: /cvsroot/python/python/dist/src/Lib/site.py,v
retrieving revision 1.27
diff -c -3 -r1.27 site.py
*** site.py	2001/06/12 16:48:52	1.27
--- site.py	2001/06/25 16:42:33
***************
*** 67,73 ****
  
  def makepath(*paths):
      dir = os.path.join(*paths)
!     return os.path.normcase(os.path.abspath(dir))
  
  L = sys.modules.values()
  for m in L:
--- 67,73 ----
  
  def makepath(*paths):
      dir = os.path.join(*paths)
!     return os.path.abspath(dir)
  
  L = sys.modules.values()
  for m in L:

Just

From aahz at rahul.net  Mon Jun 25 19:19:48 2001
From: aahz at rahul.net (Aahz Maruch)
Date: Mon, 25 Jun 2001 10:19:48 -0700 (PDT)
Subject: [Python-Dev] 2.1.1 vs. os.normcase()
Message-ID: <20010625171948.D636399C80@waltz.rahul.net>

It's too late for 2.0.1, but should this bugfix go into 2.1.1?

(Just to be clear, this is the problem that Just reported with site.py
calling os.normcase() in makepath().)

((I'm only asking about this bug in specific because we're getting down to
the wire on 2.1.1 IIUC.))

-- 
--- Aahz (@pobox.com)

Hugs and backrubs -- I break Rule 6 <*> http://www.rahul.net/aahz/
Androgynous poly kinky vanilla queer het Pythonista

I don't really mind a person having the last whine, but I do mind someone
else having the last self-righteous whine.
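[For reference, the case-preserving alternative Guido sketches earlier in
the thread (fold case only for comparison, store the original spelling)
might look something like this -- illustrative only, with an invented
helper name, not what site.py actually does:

    import os, sys

    def makepath(*paths):
        # absolute and normalized, but with the original case preserved
        return os.path.abspath(os.path.join(*paths))

    def dedup_syspath():
        # drop duplicate sys.path entries without case-folding what we keep
        seen = {}
        result = []
        for dir in sys.path:
            dir = makepath(dir)
            key = os.path.normcase(dir)  # folded form used only as a key
            if not seen.has_key(key):
                seen[key] = 1
                result.append(dir)       # keep the case-preserved spelling
        sys.path[:] = result

This keeps Fred's reason #1 (no duplicates) and reason #2 (absolute paths)
while addressing Just's complaint, since nothing lowercased is ever
stored.]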
From guido at digicool.com Mon Jun 25 20:06:02 2001 From: guido at digicool.com (Guido van Rossum) Date: Mon, 25 Jun 2001 14:06:02 -0400 Subject: [Python-Dev] 2.1.1 vs. os.normcase() In-Reply-To: Your message of "Mon, 25 Jun 2001 10:19:48 PDT." <20010625171948.D636399C80@waltz.rahul.net> References: <20010625171948.D636399C80@waltz.rahul.net> Message-ID: <200106251806.f5PI62L08770@odiug.digicool.com> > It's too late for 2.0.1, but should this bugfix go into 2.1.1? > > (Just to be clear, this is the problem that Just reported with site.py > calling os.normcase() in makepath().) > > ((I'm only asking about this bug in specific because we're getting down > to the wire on 2.1.1 IIUC.)) Unclear if it's purely a bugfix -- this could be considered a feature, but I don't know. What do others think? --Guido van Rossum (home page: http://www.python.org/~guido/) From tim at digicool.com Mon Jun 25 20:47:06 2001 From: tim at digicool.com (Tim Peters) Date: Mon, 25 Jun 2001 14:47:06 -0400 Subject: [Python-Dev] os.path.normcase() in site.py In-Reply-To: <20010625102809.42357303182@snelboot.oratrix.nl> Message-ID: [Jack Jansen] > ... > With MacPython's way of finding the initial sys.path contents we > don't have the Windows-Python problem that we add the same directory > 5 times (once in uppercase, once in lowercase, once in mixed case, > once in mixed-case with / for \, etc:-), Happily, we don't have that problem on a stock Windows Python anymore: C:\Python21>python Python 2.1 (#15, Apr 16 2001, 18:25:49) [MSC 32 bit (Intel)] on win32 Type "copyright", "credits" or "license" for more information. >>> import sys, pprint >>> pprint.pprint(sys.path) ['', 'c:\\python21', 'c:\\python21\\dlls', 'c:\\python21\\lib', 'c:\\python21\\lib\\plat-win', 'c:\\python21\\lib\\lib-tk'] >>> OTOH, this is still Icky, because those don't match (wrt case) the names in the filesystem (e.g., just look at the initial prompt line: I was in Python21 when I ran this, not python21). > so if this is what it's trying to solve we can take it out easily. It's hard to believe Fred added code to solve a Windows problem ; I don't know what it's trying to do. From m.favas at per.dem.csiro.au Mon Jun 25 21:38:47 2001 From: m.favas at per.dem.csiro.au (Mark Favas) Date: Tue, 26 Jun 2001 03:38:47 +0800 Subject: [Python-Dev] Re: Problems with socketmodule (getnameinfo & getaddrinfo) References: <200106242139.f5OLdjR22560@mira.informatik.hu-berlin.de> <3B3666B9.335DA17E@per.dem.csiro.au> <200106250639.f5P6die01246@mira.informatik.hu-berlin.de> Message-ID: <3B379347.7E8D00EB@per.dem.csiro.au> "Martin v. Loewis" wrote: > > > To get socketmodule.c to compile, I had to make a change to line 2963 > > so that the declaration of inet_pton matched the previous declaration on > > line 220 (changing char *src to const char *src). Still have problems > > though, due to the use of snprintf in getnameinfo.c: > > Ok, they are printing a single number into a 512 byte buffer; that is > safe even with sprintf only, so I have just removed the snprintf call. > Can you please try again? > > Thanks for your reports, > Martin No trouble... The current CVS compiles (with a warning), links, and runs. The warning given is: cc: Warning: /home/gonzo1/mark/groucho1/mark/src/python/CVS/python/dist/src/Modules/getaddrinfo.c, line 407: In this statement, the referenced type of the pointer value "hostname" is const, but the referenced type of the target of this assignment is not.
(notconstqual) if (inet_pton(gai_afdl[i].a_af, hostname, pton)) { ------------------------------------------------^ which can be fixed by declaring the second argument to inet_pton as const char* instead of char* in the two occurrences of inet_pton in socketmodule.c Cheers, Mark -- Mark Favas - m.favas at per.dem.csiro.au CSIRO, Private Bag No 5, Wembley, Western Australia 6913, AUSTRALIA From martin at loewis.home.cs.tu-berlin.de Tue Jun 26 01:08:00 2001 From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Tue, 26 Jun 2001 01:08:00 +0200 Subject: [Python-Dev] IPv6 and Windows In-Reply-To: <200106251341.f5PDfkg07283@odiug.digicool.com> (message from Guido van Rossum on Mon, 25 Jun 2001 09:41:46 -0400) References: <200106241748.f5OHm3001480@mira.informatik.hu-berlin.de> <200106251341.f5PDfkg07283@odiug.digicool.com> Message-ID: <200106252308.f5PN80701342@mira.informatik.hu-berlin.de> > > The problem is that the library patches (httplib, ftplib, etc) do use > > getaddrinfo to find out how to contact a remote system, which is the > > right thing to do IMO. So even if the IPv6 support can be activated > > only if desired, getaddrinfo absolutely has to work. > > Yes, but in an IPv4-only environment it would be super trivial to > implement, right? Right, and getaddrinfo.c/getnameinfo.c attempt such an implementation. They might attempt to get it "more right" than necessary, but still they are "pure C", in the sense that they don't rely on any libraries except for those available in a typical IPv4 sockets implementation. At least that's the theory. It turns out that they've been using inet_pton and snprintf, which is probably because they have been mainly tested on BSD. I'm confident that we can reduce them to a "no funny library calls needed" minimum. If somebody wants to implement them anew from the ground up, only using what the socketmodule already uses, that would be fine as well. An actual review of the code for portability problems would also be helpful. Regards, Martin From greg at cosc.canterbury.ac.nz Tue Jun 26 06:32:05 2001 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Tue, 26 Jun 2001 16:32:05 +1200 (NZST) Subject: [Python-Dev] Python Specializing Compiler In-Reply-To: <200106251451.QAA17756@core.inf.ethz.ch> Message-ID: <200106260432.QAA04989@s454.cosc.canterbury.ac.nz> Samuele Pedroni : > a different proposal for opt. globals access > by Jeremy Hylton. It seems, it would break fewer things ... I really like Jeremy's proposal. I've been having similar thoughts myself for quite a while. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg at cosc.canterbury.ac.nz +--------------------------------------+ From guido at digicool.com Tue Jun 26 16:57:37 2001 From: guido at digicool.com (Guido van Rossum) Date: Tue, 26 Jun 2001 10:57:37 -0400 Subject: [Python-Dev] Python Specializing Compiler In-Reply-To: Your message of "Tue, 26 Jun 2001 16:32:05 +1200." <200106260432.QAA04989@s454.cosc.canterbury.ac.nz> References: <200106260432.QAA04989@s454.cosc.canterbury.ac.nz> Message-ID: <200106261457.f5QEvbZ11007@odiug.digicool.com> > Samuele Pedroni : > > > a different proposal for opt. globals access > > by Jeremy Hylton. It seems, it would break fewer things ... > > I really like Jeremy's proposal. I've been having similar > thoughts myself for quite a while. > > Greg Ewing Ditto.
Isn't this what I've been calling "low-hanging fruit" for ages? Apparently it's low but still out of reach. :-) --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at digicool.com Tue Jun 26 19:59:55 2001 From: guido at digicool.com (Guido van Rossum) Date: Tue, 26 Jun 2001 13:59:55 -0400 Subject: [Python-Dev] PEP 260: simplify xrange() Message-ID: <200106261759.f5QHxtH15045@odiug.digicool.com> Here's another sweet and short PEP. What do folks think? Is xrange()'s complexity really worth having? --Guido van Rossum (home page: http://www.python.org/~guido/) PEP: 260 Title: Simplify xrange() Version: $Revision: 1.1 $ Author: guido at python.org (Guido van Rossum) Status: Draft Type: Standards Track Python-Version: 2.2 Created: 26-Jun-2001 Post-History: 26-Jun-2001 Abstract This PEP proposes to strip the xrange() object from some rarely used behavior like x[i:j] and x*n. Problem The xrange() function has one idiomatic use: for i in xrange(...): ... However, the xrange() object has a bunch of rarely used behaviors that attempt to make it more sequence-like. These are so rarely used that historically they have had serious bugs (e.g. off-by-one errors) that went undetected for several releases. I claim that it's better to drop these unused features. This will simplify the implementation, testing, and documentation, and reduce maintenance and code size. Proposed Solution I propose to strip the xrange() object to the bare minimum. The only retained sequence behaviors are x[i], len(x), and repr(x). In particular, these behaviors will be dropped: x[i:j] (slicing) x*n, n*x (sequence-repeat) cmp(x1, x2) (comparisons) i in x (containment test) x.tolist() method x.start, x.stop, x.step attributes By implementing a custom iterator type, we could speed up the common use, but this is optional (the default sequence iterator does just fine). I expect it will take at most an hour to rip it all out; another hour to reduce the test suite and documentation. Scope This PEP only affects the xrange() built-in function. Risks Somebody's code could be relying on the extended code, and this code would break. However, given that historically bugs in the extended code have gone undetected for so long, it's unlikely that much code is affected. Copyright This document has been placed in the public domain. Local Variables: mode: indented-text indent-tabs-mode: nil End: From fdrake at acm.org Tue Jun 26 22:01:41 2001 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Tue, 26 Jun 2001 16:01:41 -0400 (EDT) Subject: [Python-Dev] collecting timer resolution information... Message-ID: <15160.59941.295880.286167@cj42289-a.reston1.va.home.com> I'd like people to run the attached C program and send the output to me. What this does is run the gettimeofday() and getrusage() functions until the time values change. The intent is to determine the quality of the available timing information. For example, on my Linux-Mandrake 7.2 installation with a stock 2.2.17 kernel, I get this: timeofday: 1 (1 calls), rusage: 10000 (2465 calls) Thanks! -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations
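[Fred's observation.c did not survive in this archive, but the probe he describes is easy to approximate in Python. A sketch only, not Fred's program: it assumes a Unix box with the resource module available, and the interpreter's overhead will inflate the call counts relative to the C version:]

    import time, resource

    def step(clock):
        # Call clock() until the reported value changes; return the
        # step size in microseconds and the number of calls needed.
        t0 = clock()
        calls = 1
        while 1:
            t1 = clock()
            calls = calls + 1
            if t1 != t0:
                return (t1 - t0) * 1e6, calls

    def cputime():
        usage = resource.getrusage(resource.RUSAGE_SELF)
        return usage[0] + usage[1]   # user + system seconds

    print "timeofday: %.0f (%d calls), rusage: %.0f (%d calls)" \
          % (step(time.time) + step(cputime))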
From fdrake at acm.org Tue Jun 26 22:05:48 2001 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Tue, 26 Jun 2001 16:05:48 -0400 (EDT) Subject: [Python-Dev] collecting timer resolution information... In-Reply-To: <15160.59941.295880.286167@cj42289-a.reston1.va.home.com> References: <15160.59941.295880.286167@cj42289-a.reston1.va.home.com> Message-ID: <15160.60188.806308.247566@cj42289-a.reston1.va.home.com> Fred L. Drake, Jr. writes: > I'd like people to run the attached C program and send the output to OK, I've attached it this time. Sorry! -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From gward at python.net Tue Jun 26 22:10:09 2001 From: gward at python.net (Greg Ward) Date: Tue, 26 Jun 2001 16:10:09 -0400 Subject: [Python-Dev] make static In-Reply-To: <200106251340.f5PDeAO07244@odiug.digicool.com>; from guido@digicool.com on Mon, Jun 25, 2001 at 09:40:10AM -0400 References: <200106250943.LAA24576@pandora.informatik.hu-berlin.de> <200106251340.f5PDeAO07244@odiug.digicool.com> Message-ID: <20010626161009.B2820@gerg.ca> On 25 June 2001, Guido van Rossum said: > As long as it works, it works. I don't think there's a reason to > spend more than absolutely minimal time trying to keep it working > though -- we're trying to encourage everybody to migrate towards > distutils. So (without having seen the SF report) I'd say "tough > luck". The catch is that I never got around to implementing statically building a new interpreter via the Distutils, so (for now) Makefile.pre.in is the only way to do this. ;-( (Unless someone added it to the Distutils while I wasn't looking, which wouldn't be hard since I haven't looked in, ummm, six months or so...) Greg -- Greg Ward - just another /P(erl|ython)/ hacker gward at python.net http://starship.python.net/~gward/ "When I hear the word `culture', I reach for my gun." --Goebbels "When I hear the word `Microsoft', *I* reach for *my* gun." --me From arigo at ulb.ac.be Wed Jun 27 04:01:54 2001 From: arigo at ulb.ac.be (Armin Rigo) Date: Tue, 26 Jun 2001 22:01:54 -0400 Subject: [Python-Dev] Python Specializing Compiler Message-ID: <3B393E92.B0719A7A@ulb.ac.be> Hi, I am considering using GNU Lightning to produce code from the Psyco compiler. Has anyone already used it from a Python program ? If so, you might already have done the necessary support module in C, and I might be interested in it ! Otherwise, I'll start from scratch. Of course, comments about whether I should use GNU Lightning at all, or any other code-producing library (or even produce machine code "by hand"), are welcome. Also, I hope to be able to continue with more fundamental work on Psyco very soon. One design decision I have to make now is about the way Psyco reads Python code. Currently, it "reverse-engineers" byte-code. Another solution would be to compile from the source code (possibly with the help of the 'Tools/Compiler/*' modules). The current solution, although not optimal, seems to make integration with the current interpreter easier. Indeed, based on recent discussions, I now believe that a realistic way to use Psyco would be to let the interpreter run normally while doing some kind of profiling, and work on time-critical routines only --- which at this point have already been compiled into byte-code and executed at least a few times. Armin
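[A toy version of the profile-then-compile scheme Armin sketches: count how often each code object is entered via the standard profile hook, and flag the hot ones as candidates. The names and the threshold are invented for the example; a real system would hand the hot byte-code to the specializing compiler instead of printing it:]

    import sys

    counts = {}
    THRESHOLD = 100   # arbitrary cutoff for "time-critical"

    def hotspot_tracker(frame, event, arg):
        if event == 'call':
            code = frame.f_code
            n = counts.get(code, 0) + 1
            counts[code] = n
            if n == THRESHOLD:
                # already byte-compiled and executed many times --
                # Armin's criterion for a routine worth specializing
                print 'hot function:', code.co_name

    sys.setprofile(hotspot_tracker)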
From nas at python.ca Tue Jun 26 23:01:38 2001 From: nas at python.ca (Neil Schemenauer) Date: Tue, 26 Jun 2001 14:01:38 -0700 Subject: [Python-Dev] collecting timer resolution information... In-Reply-To: <15160.59941.295880.286167@cj42289-a.reston1.va.home.com>; from fdrake@acm.org on Tue, Jun 26, 2001 at 04:01:41PM -0400 References: <15160.59941.295880.286167@cj42289-a.reston1.va.home.com> Message-ID: <20010626140138.A2838@glacier.fnational.com> Fred L. Drake, Jr. wrote: > timeofday: 1 (1 calls), rusage: 10000 (2465 calls) My hacked version of Linux 2.4 on an AMD-800 box: timeofday: 1 (2 calls), rusage: 976 (1792 calls) I don't quite understand the output. What does the 976 mean? Neil From fdrake at acm.org Tue Jun 26 23:23:53 2001 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Tue, 26 Jun 2001 17:23:53 -0400 (EDT) Subject: [Python-Dev] collecting timer resolution information... In-Reply-To: <20010626140138.A2838@glacier.fnational.com> References: <15160.59941.295880.286167@cj42289-a.reston1.va.home.com> <20010626140138.A2838@glacier.fnational.com> Message-ID: <15160.64873.213278.925715@cj42289-a.reston1.va.home.com> Neil Schemenauer writes: > My hacked version of Linux 2.4 on an AMD-800 box: > > timeofday: 1 (2 calls), rusage: 976 (1792 calls) > > I don't quite understand the output. What does the 976 mean? The "1" and the "976" are the apparent resolution of the time values reported by those two calls, in microseconds. It looks like the HZ define in that header file you pointed out could be bumped a little higher. ;-) -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From mark.favas at csiro.au Wed Jun 27 01:21:47 2001 From: mark.favas at csiro.au (Mark Favas) Date: Wed, 27 Jun 2001 07:21:47 +0800 Subject: [Python-Dev] latest unicode-related change causes failure in test_unicode & test_unicodedata Message-ID: <3B39190B.E7DA5B5D@csiro.au> CVS of a short while ago, Tru64 Unix: "make test" gives two unicode-related failures: test_unicode test test_unicode crashed -- exceptions.UnicodeError: UTF-8 decoding error: illegal encoding test_unicodedata The actual stdout doesn't match the expected stdout. This much did match (between asterisk lines): ********************************************************************** test_unicodedata Testing Unicode Database... Methods: ********************************************************************** Then ...
We expected (repr): '6c7a7c02657b69d0fdd7a7d174f573194bba2e18' But instead we got: '374108f225e0c1488f8389ce6333902830d299fb' test test_unicodedata failed -- Writing: '374108f225e0c1488f8389ce6333902830d299fb', expected: '6c7a7c02657b69d0fdd7a7d174f573194bba2e18' Running the tests manually, test_unicode fails, test_unicodedata doesn't fail, but doesn't match the expected output for Methods: (test_unicode) Testing Unicode contains method... done. Testing Unicode formatting strings... done. Testing builtin codecs... Traceback (most recent call last): File "Lib/test/test_unicode.py", line 383, in ? verify(u'\ud800\udc02'.encode('utf-8') == \ File "./Lib/test/test_support.py", line 95, in verify raise TestFailed(reason) test_support.TestFailed: test failed (test_unicodedata) python Lib/test/test_unicodedata.py Testing Unicode Database... Methods: 374108f225e0c1488f8389ce6333902830d299fb Functions: 41e1d4792185d6474a43c83ce4f593b1bdb01f8a API: ok -- Mark Favas - m.favas at per.dem.csiro.au CSIRO, Private Bag No 5, Wembley, Western Australia 6913, AUSTRALIA From JamesL at Lugoj.Com Wed Jun 27 02:06:23 2001 From: JamesL at Lugoj.Com (James Logajan) Date: Tue, 26 Jun 2001 17:06:23 -0700 Subject: [Python-Dev] Re: PEP 260: simplify xrange() References: Message-ID: <3B39237F.1A7EF3F2@Lugoj.Com> Guido van Rossum wrote: > Here's another sweet and short PEP. What do folks think? Is > xrange()'s complexity really worth having? Are there still known bugs that will take some effort to repair? Is xrange constantly touched when changes are made elsewhere? If no to both, then I suggest don't fix what ain't broken; life is too short. (Unless it is annoying you to distraction, then do the deed and get it over with.) From tim.one at home.com Wed Jun 27 02:32:26 2001 From: tim.one at home.com (Tim Peters) Date: Tue, 26 Jun 2001 20:32:26 -0400 Subject: [Python-Dev] RE: PEP 260: simplify xrange() In-Reply-To: <3B39237F.1A7EF3F2@Lugoj.Com> Message-ID: [James Logajan] > Are there still known bugs that will take some effort to repair? Is > xrange constantly touched when changes are made elsewhere? If no to > both, then I suggest don't fix what ain't broken; life is too short. > (Unless it is annoying you to distraction, then do the deed and get > it over with.) I think it's more the latter. I partly provoked this by bitterly pointing out that there's more code in the CVS tree devoted to supporting the single xrange() gimmick than Neil Schemenauer added to support the get-out-of-town more powerful new generators. Masses of crufty code nobody benefits from are a burden on the soul. although-it-would-be-impolite-to-ask-we-why-still-ship-a-directory-full-of-crufty-old-irix5-demos-in-the-std-library-ly y'rs - tim From tdelaney at avaya.com Wed Jun 27 02:36:25 2001 From: tdelaney at avaya.com (Delaney, Timothy) Date: Wed, 27 Jun 2001 10:36:25 +1000 Subject: [Python-Dev] RE: PEP 260: simplify xrange() Message-ID: > Here's another sweet and short PEP. What do folks think? Is > xrange()'s complexity really worth having? > > --Guido van Rossum (home page: http://www.python.org/~guido/) > > PEP: 260 > Title: Simplify xrange() > Version: $Revision: 1.1 $ > Author: guido at python.org (Guido van Rossum) > Status: Draft > Type: Standards Track > Python-Version: 2.2 > Created: 26-Jun-2001 > Post-History: 26-Jun-2001 > > Abstract > > This PEP proposes to strip the xrange() object from some rarely > used behavior like x[i:j] and x*n. > > > Problem > > The xrange() function has one idiomatic use: > > for i in xrange(...): ...
If this is to be done, I would also propose that xrange() and range() be changed to allow passing in a straight-out sequence such as in the following code in order to get rid of the need for range(len(seq)): import __builtin__ def range (start, stop=None, step=1, range=range): """""" start2 = start stop2 = stop if stop is None: stop2 = start start2 = 0 try: return range(start2, stop2, step) except TypeError: assert stop is None return range(len(start)) def xrange (start, stop=None, step=1, xrange=xrange): """""" start2 = start stop2 = stop if stop is None: stop2 = start start2 = 0 try: return xrange(start2, stop2, step) except TypeError: assert stop is None return xrange(len(start)) a = [5, 'a', 'Hello, world!'] b = range(a) c = xrange(4, 6) d = xrange(b) e = range(c) print a print b print c print d print e print range(d, 2) Tim Delaney From gward at python.net Wed Jun 27 03:24:32 2001 From: gward at python.net (Greg Ward) Date: Tue, 26 Jun 2001 21:24:32 -0400 Subject: [Python-Dev] RE: PEP 260: simplify xrange() In-Reply-To: ; from tdelaney@avaya.com on Wed, Jun 27, 2001 at 10:36:25AM +1000 References: Message-ID: <20010626212432.A4003@gerg.ca> On 27 June 2001, Delaney, Timothy said: > If this is to be done, I would also propose that xrange() and range() be > changed to allow passing in a straight-out sequence such as in the following > code in order to get rid of the need for range(len(seq)): I'm +1 on the face of it without stopping to consider any implications. ;-) Some bits of syntactic sugar are just too good to pass up. range(len(sequence)) is syntactic cod-liver oil. Greg -- Greg Ward - programmer-at-big gward at python.net http://starship.python.net/~gward/ Blood is thicker than water, and much tastier. From nas at python.ca Wed Jun 27 03:28:29 2001 From: nas at python.ca (Neil Schemenauer) Date: Tue, 26 Jun 2001 18:28:29 -0700 Subject: [Python-Dev] collecting timer resolution information... In-Reply-To: <15160.64873.213278.925715@cj42289-a.reston1.va.home.com>; from fdrake@acm.org on Tue, Jun 26, 2001 at 05:23:53PM -0400 References: <15160.59941.295880.286167@cj42289-a.reston1.va.home.com> <20010626140138.A2838@glacier.fnational.com> <15160.64873.213278.925715@cj42289-a.reston1.va.home.com> Message-ID: <20010626182829.A3344@glacier.fnational.com> Fred L. Drake, Jr. wrote: > The "1" and the "976" are the apparent resolution of the time > values reported by those two calls, in microseconds. It looks like > the HZ define in that header file you pointed out could be bumped a > little higher. ;-) I've got it at 1024. >>> 976. / 10000 * 1024 99.942400000000006 I think yours is at the 100 default. Neil From fdrake at acm.org Wed Jun 27 04:14:00 2001 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Tue, 26 Jun 2001 22:14:00 -0400 (EDT) Subject: [Python-Dev] collecting timer resolution information... In-Reply-To: <20010626182829.A3344@glacier.fnational.com> References: <15160.59941.295880.286167@cj42289-a.reston1.va.home.com> <20010626140138.A2838@glacier.fnational.com> <15160.64873.213278.925715@cj42289-a.reston1.va.home.com> <20010626182829.A3344@glacier.fnational.com> Message-ID: <15161.16744.665259.229385@cj42289-a.reston1.va.home.com> Neil Schemenauer writes: > I've got it at 1024. > > >>> 976. / 10000 * 1024 > 99.942400000000006 > > I think yours is at the 100 default. That's correct. Yours could be bumped a bit (factor of 10?
I'm not really sure where it would cause problems in practice, though I think I understand the general explanations I've seen), and mine could be bumped a good bit. But I intend to stick with a stock kernel since I expect most users will be using a stock kernel, and I don't have a pile of extra machines to play with. ;-( -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From greg at cosc.canterbury.ac.nz Wed Jun 27 04:37:21 2001 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Wed, 27 Jun 2001 14:37:21 +1200 (NZST) Subject: [Python-Dev] collecting timer resolution information... In-Reply-To: <15160.60188.806308.247566@cj42289-a.reston1.va.home.com> Message-ID: <200106270237.OAA05182@s454.cosc.canterbury.ac.nz> Here are the results from a few machines around here: s454% uname -a SunOS s454 5.7 Generic_106541-10 sun4m sparc SUNW,SPARCstation-4 s454% observation timeofday: 2 (1 calls), rusage: 10000 (22 calls) oma% uname -a SunOS oma 5.7 Generic sun4u sparc SUNW,Ultra-4 oma% observation timeofday: 1 (2 calls), rusage: 10000 (115 calls) pc250% uname -a SunOS pc250 5.8 Generic_108529-03 i86pc i386 i86pc pc250% observation timeofday: 1 (1 calls), rusage: 10000 (232 calls) Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg at cosc.canterbury.ac.nz +--------------------------------------+ From JamesL at Lugoj.Com Wed Jun 27 04:42:20 2001 From: JamesL at Lugoj.Com (James Logajan) Date: Tue, 26 Jun 2001 19:42:20 -0700 Subject: [Python-Dev] Re: PEP 260: simplify xrange() References: Message-ID: <3B39480C.F4808C1F@Lugoj.Com> Tim Peters wrote: > [James Logajan] > > Are there still known bugs that will take some effort to repair? Is > > xrange constantly touched when changes are made elsewhere? If no to > > both, then I suggest don't fix what ain't broken; life is too short. > > (Unless it is annoying you to distraction, then do the deed and get > > it over with.) > > I think it's more the latter. I partly provoked this by bitterly pointing > out that there's more code in the CVS tree devoted to supporting the single > xrange() gimmick than Neil Schemenauer added to support the get-out-of-town > more powerful new generators. Masses of crufty code nobody benefits from > are a burden on the soul. Design mistakes one has made do tend to weigh on one's soul (speaking from more than two decades of programming experience) so I understand the primal urge to correct them when one can, and even when one shouldn't. So although I'm quite annoyed by all these new-fangled gimmicks being added to the language (i.e. Python generators being added to solve California's power problems) I have no problem with xrange being fenced in. (I find the very existence of the PEP process somewhat unsettling; there are now thousands of programmers trying to use the language. Why burden them with insuring their programs remain compatible with yet-another-damn-set-of-proposals every year? Or worse: trying to rewrite their code "more elegantly" using all the latest gimmicks. Why in my day, if you wanted to, say, save execution state, you figured out how to do it and didn't go crying to the language designer. Damn these young lazy programmers. Don't know how good they have it. Wouldn't know how to save their execution state if their lives depended on it. Harumph.) Speaking of "generators", I just want to say that I think that "generator" makes for lousy terminology. 
If I understand correctly, "generators" are coroutines that have peer-to-peer synchronized messaging (synchronizing and communicating at the "yield" points). To my mind, "generators" does not evoke that image at all. Assuming I understand it in my early senility.... > although-it-would-be-impolite-to-ask-we-why-still-ship-a-directory- > full-of-crufty-old-irix5-demos-in-the-std-library-ly Perhaps because the Irix community would be quite Irate if they were removed? From tim.one at home.com Wed Jun 27 06:38:15 2001 From: tim.one at home.com (Tim Peters) Date: Wed, 27 Jun 2001 00:38:15 -0400 Subject: [Python-Dev] RE: PEP 260: simplify xrange() In-Reply-To: <3B39480C.F4808C1F@Lugoj.Com> Message-ID: [James Logajan] > Design mistakes one has made do tend to weigh on one's soul (speaking > from more than two decades of programming experience) so I understand > the primal urge to correct them when one can, and even when one > shouldn't. Is this a case when one shouldn't? That is, is it a specific comment on PEP 260, or just a general venting here? > So although I'm quite annoyed by all these new-fangled gimmicks being > added to the language (i.e. Python generators being added to solve > California's power problems) I have no problem with xrange being fenced > in. OK. > (I find the very existence of the PEP process somewhat unsettling; > there are now thousands of programmers trying to use the language. Why > burden them with insuring their programs remain compatible with yet- > another-damn-set-of-proposals every year? You can ask the C, C++, Fortran, Perl, COBOL (etc, etc) folks that too, but I suspect it's a rhetorical question. I wish you could ask the Java committee, but they work in secret . > Or worse: trying to rewrite their code "more elegantly" using all the > latest gimmicks. Use of new features isn't required by Guido, and neither is downloading new releases. If *you* waste your time doing that, we both know it's because you can't resist <0.5 wink>. > ... > Speaking of "generators", I just want to say that I think that > "generator" makes for lousy terminology. A generator, umm, *generates* a sequence of values. It's neither more specific nor more general than that, so we're pretty much limited to vaguely suggestive terms like "generator" and "iterator"; Python already used the latter word for something else. I'd be happy to call them pink flamingos. > If I understand correctly, "generators" are coroutines They're formally semi-coroutines; it's not symmetric. > that have peer-to-peer synchronized messaging (synchronizing and > communicating at the "yield" points). Way too highfalutin' a view. Think of a generator as a resumable function, and you're not missing anything -- not even an implementation subtlety. They *are* resumable functions. A "yield" is just a "return", but with the twist that the function can resume executing after the "yield" again. If you also think of ordinary call/return as a peer-to-peer etc etc, then I suppose you're stuck with that view here too. > To my mind, "generators" does not evoke that image at all. Good, because that image was overblown beyond recognition . >> although-it-would-be-impolite-to-ask-we-why-still-ship-a-directory- >> full-of-crufty-old-irix5-demos-in-the-std-library-ly > Perhaps because the Irix community would be quite Irate if they were > removed? Doubt it: the Irix5 library files haven't really been touched since 1993. For several years we've also shipped an Irix6 library with all the same stuff. 
But I suppose releasing a new OS was a symptom of SGI picking on its users too . From tim.one at home.com Wed Jun 27 07:14:29 2001 From: tim.one at home.com (Tim Peters) Date: Wed, 27 Jun 2001 01:14:29 -0400 Subject: [Python-Dev] Unicode fallout: Windows _winreg project broken Message-ID: The _winreg project no longer links: Creating library ./_winreg_d.lib and object ./_winreg_d.exp _winreg.obj : error LNK2001: unresolved external symbol __imp__PyUnicode_DecodeMBCS The compilation of PyUnicode_DecodeMBCS in unicodeobject.c is in a #if defined(MS_WIN32) && defined(HAVE_USABLE_WCHAR_T) block. But the top of unicodeobject.h now wraps the enabling # if defined(MS_WIN32) && !defined(USE_UCS4_STORAGE) # define HAVE_USABLE_WCHAR_T # define PY_UNICODE_TYPE wchar_t # endif block inside a #ifndef PY_UNICODE_TYPE block, and a change to PC/config.h: #define PY_UNICODE_TYPE unsigned short stops all that. IOW, HAVE_USABLE_WCHAR_T no longer gets #define'd on Windows, and that prevents unicodeobject.c from supplying routines _winreg.c calls. leaving-it-to-an-expert-who-thinks-they-know-what-all-these-symbols-are-supposed-to-really-mean-ly y'rs - tim From greg at cosc.canterbury.ac.nz Wed Jun 27 07:41:46 2001 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Wed, 27 Jun 2001 17:41:46 +1200 Subject: [Python-Dev] Help: Python 2.1: "Corrupt Installation Detected" Message-ID: <3B39721A.DED4E85A@cosc.canterbury.ac.nz> I'm trying to install Python-2.1 on Windows, and I keep getting "Corrupt Installation Detected" when I run the installer. From tim.one at home.com Wed Jun 27 07:53:01 2001 From: tim.one at home.com (Tim Peters) Date: Wed, 27 Jun 2001 01:53:01 -0400 Subject: [Python-Dev] Help: Python 2.1: "Corrupt Installation Detected" In-Reply-To: <3B39721A.DED4E85A@cosc.canterbury.ac.nz> Message-ID: [Greg Ewing] > I'm trying to install Python-2.1 on Windows, > and I keep getting "Corrupt Installation Detected" > when I run the installer. [but no other evidence that > it's actually corrupt] You didn't say which flavor of Windows, but should have . Ditto what it is you're running (the PythonLabs distro? ActiveState's? PythonWare's?). Known causes for this from the PythonLabs installer include (across various flavors of Windows), in decreasing order of likelihood: + Trying to install while logged in to an account with insufficient permissions (try logging in as Administrator, if on a version of Windows where that makes sense). + Trying to install over a network. Copy the installer to a local disk first. + Conflicts with anti-virus software (disable it -- indeed, my Win9x Life got much saner after I wiped Norton AntiVirus from my hard drive). + Conflicts with other running programs (like installer splash screens always say, close all other programs). + Insufficient memory, disk space, or magic low-level Windows resources. + There may or may not be a problem unique to French versions of Windows. Any of those apply? From martin at loewis.home.cs.tu-berlin.de Wed Jun 27 09:12:11 2001 From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Wed, 27 Jun 2001 09:12:11 +0200 Subject: [Python-Dev] Unicode fallout: Windows _winreg project broken Message-ID: <200106270712.f5R7CBh06458@mira.informatik.hu-berlin.de> > IOW, HAVE_USABLE_WCHAR_T no longer gets #define'd on Windows, and > that prevents unicodeobject.c from supplying routines _winreg.c > calls. The best thing, IMO, would be if PC/config.h defines everything available in config.h also.
In this case, the proper defines would be #define Py_USING_UNICODE #define HAVE_USABLE_WCHAR_T #define Py_UNICODE_SIZE 2 #define PY_UNICODE_TYPE wchar_t If that approach is used, the defaulting in Include/unicodeobject.h could go away. Alternatively, define only Py_USING_UNICODE of this in PC/config.h, and change the block in Include/unicodeobject.h to /* Windows has a usable wchar_t type (unless we're using UCS-4) */ # ifdef MS_WIN32 # ifdef USE_UCS4_STORAGE # define Py_UNICODE_SIZE 4 # define PY_UNICODE_TYPE unsigned int # else # define Py_UNICODE_SIZE 2 # define HAVE_USABLE_WCHAR_T # define PY_UNICODE_TYPE wchar_t # endif # endif Regards, Martin From tim.one at home.com Wed Jun 27 09:39:38 2001 From: tim.one at home.com (Tim Peters) Date: Wed, 27 Jun 2001 03:39:38 -0400 Subject: [Python-Dev] New Unicode warnings Message-ID: There are 3 functions now where the prototypes in unicodeobject.h don't match the definitions in unicodeobject.c. Like, in .h, extern DL_IMPORT(Py_UNICODE) _PyUnicode_ToTitlecase( register const Py_UNICODE ch /* Unicode character */ ); but in .c: Py_UNICODE _PyUnicode_ToTitlecase(register Py_UNICODE ch) That is, they disagree about const (a silly language idea if ever there was one ). The others (I haven't check these for the exact reason(s), but assume they're the same deal): _PyUnicode_ToUppercase _PyUnicode_ToLowercase From Armin.Rigo at ima.unil.ch Wed Jun 27 11:01:18 2001 From: Armin.Rigo at ima.unil.ch (RIGO Armin) Date: Wed, 27 Jun 2001 11:01:18 +0200 (CEST) Subject: [Python-Dev] Python Specializing Compiler In-Reply-To: <3B391D88.305CCB4E@ActiveState.com> Message-ID: On Tue, 26 Jun 2001, Paul Prescod wrote: > Armin Rigo wrote: > > I am considering using GNU Lightning to produce code from the Psyco > > compiler. (...) > > Core Python has no GPLed components. I would hate to have you put in a > bunch of work worthy of inclusion in core Python to see it rejected on > those grounds. Good remark. Anyone else has comments about this ? Psyco would probably not be part of the core Python, but only an extension module; but your objection is nevertheless valid. Any alternatives ? I am considering a more theoretical approach, based on Tunes (http://tunes.org) as mentionned in Psyco's readme file, but this would take a lot more time -- althought it might give much more impressive results. Armin. From neal at metaslash.com Wed Jun 27 13:48:00 2001 From: neal at metaslash.com (Neal Norwitz) Date: Wed, 27 Jun 2001 07:48:00 -0400 Subject: [Python-Dev] ANN: PyChecker version 0.6.1 Message-ID: <3B39C7F0.2CA171C5@metaslash.com> A new version of PyChecker is available for your hacking pleasure. PyChecker is a tool for finding common bugs in python source code. It finds problems that are typically caught by a compiler for less dynamic languages, like C and C++. Comments, criticisms, new ideas, and other feedback is welcome. 
Here's the CHANGELOG: * Check format strings: "%s %s %s" % (v1, v2, v3, v4) for arg counts * Warn when format strings do: '%(var) %(var2)' * Fix Local variable (xxx) not used, when have: "%(xxx)s" % locals() * Warn when local variable (xxx) doesn't exist and have: "%(xxx)s" % locals() * Install script in /usr/local/bin to invoke PyChecker * Don't produce unused global warnings when using a module in parameters * Don't produce unused global warnings when using a module in class variables * Add check when using method as an attribute (if self.method and x == y:) * Add check for right # of args to object construction * Add check for right # of args to function calls in other modules * Check for returning a value from __init__ * Fix using from XX import YY ; from XX import ZZ causing re-import warning * Fix UNABLE TO IMPORT errors for files that don't end with a newline * Support for checking consistent return values -- not complete produces too many false positives (off by default, use -r/--returnvalues to enable) PyChecker is available on Source Forge: Web page: http://pychecker.sourceforge.net/ Project page: http://sourceforge.net/projects/pychecker/ Neal -- pychecker at metaslash.com From paulp at ActiveState.com Wed Jun 27 13:53:08 2001 From: paulp at ActiveState.com (Paul Prescod) Date: Wed, 27 Jun 2001 04:53:08 -0700 Subject: [Python-Dev] Python Specializing Compiler References: Message-ID: <3B39C924.E865177D@ActiveState.com> RIGO Armin wrote: > >... > > I am considering a more theoretical approach, based on Tunes > (http://tunes.org) as mentionned in Psyco's readme file, but this would > take a lot more time -- althought it might give much more impressive > results. If you are thinking about incorporating some ideas from Tunes that's one thing. But if you want to use their code I would ask "what code?" I have heard about Tunes for several years now and not seen any visible forward progress. See also: http://tunes.org/Tunes-FAQ-6.html#ss6.2 -- Take a recipe. Leave a recipe. Python Cookbook! http://www.ActiveState.com/pythoncookbook From mark.favas at csiro.au Wed Jun 27 13:48:37 2001 From: mark.favas at csiro.au (Mark Favas) Date: Wed, 27 Jun 2001 19:48:37 +0800 Subject: [Python-Dev] More unicode blues... Message-ID: <3B39C815.E9CDF41B@csiro.au> unicodectype.c now fails to compile, because ch is declared const, and then assigned to. Tim has (apparently) had similar problems, but in his case the compiler just gives a warning, rather than an error.: cc: Error: Objects/unicodectype.c, line 67: In this statement, "ch" has const-qualified type, but occurs in a context that requires a modifiable lvalue. (neednonconst) ch += ctype->title; --------^ cc: Error: Objects/unicodectype.c, line 69: In this statement, "ch" has const-qualified type, but occurs in a context that requires a modifiable lvalue. (neednonconst) ch += ctype->upper; --------^ cc: Error: Objects/unicodectype.c, line 74: In this statement, "ch" has const-qualified type, but occurs in a context that requires a modifiable lvalue. (neednonconst) ch -= 0x10000; --------^ -- Mark Favas - m.favas at per.dem.csiro.au CSIRO, Private Bag No 5, Wembley, Western Australia 6913, AUSTRALIA From mal at lemburg.com Wed Jun 27 14:10:57 2001 From: mal at lemburg.com (M.-A. 
Lemburg) Date: Wed, 27 Jun 2001 14:10:57 +0200 Subject: [Python-Dev] Unicode Maintenance Message-ID: <3B39CD51.406C28F0@lemburg.com> Looking at the recent burst of checkins for the Unicode implementation completely bypassing the standard SF procedure and possible comments I might have on the different approaches, I guess I've been ruled out as maintainer and designer of the Unicode implementation. Well, I guess that's how things go. Was nice working for you guys, but no longer is... I'm tired of having to defend myself against meta-comments about the design, uncontrolled checkins and no true backup about my standing in all this from Guido. Perhaps I am misunderstanding the role of a maintainer and implementation designer, but as it is all respect for the work I've put into all this seems faded. That's the conclusion I draw from recent postings by Martin and Fredrik and their nightly "takeover". Thanks, -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From arigo at ulb.ac.be Wed Jun 27 14:18:43 2001 From: arigo at ulb.ac.be (Armin Rigo) Date: Wed, 27 Jun 2001 14:18:43 +0200 (CEST) Subject: [Python-Dev] Python Specializing Compiler In-Reply-To: <3B39C924.E865177D@ActiveState.com> Message-ID: Hello Paul, On Wed, 27 Jun 2001, Paul Prescod wrote: > If you are thinking about incorporating some ideas from Tunes that's one > thing. But if you want to use their code I would ask "what code?" I have > heard about Tunes for several years now and not seen any visible forward > progress. Yes, I know this. I am myself a (recent) member of the Tunes project, and have made Tunes' goals mine. Armin From guido at digicool.com Wed Jun 27 16:32:23 2001 From: guido at digicool.com (Guido van Rossum) Date: Wed, 27 Jun 2001 10:32:23 -0400 Subject: [Python-Dev] Python Specializing Compiler In-Reply-To: Your message of "Wed, 27 Jun 2001 11:01:18 +0200." References: Message-ID: <200106271432.f5REWOn19377@odiug.digicool.com> > Good remark. Anyone else has comments about this ? Not really, except to emphasize that inclusion of GPL'ed code in core Python is indeed a no-no. --Guido van Rossum (home page: http://www.python.org/~guido/) From fredrik at pythonware.com Wed Jun 27 16:48:02 2001 From: fredrik at pythonware.com (Fredrik Lundh) Date: Wed, 27 Jun 2001 16:48:02 +0200 Subject: [Python-Dev] New Unicode warnings References: Message-ID: <009e01c0ff19$e25b3f70$4ffa42d5@hagrid> tim peters wrote: > There are 3 functions now where the prototypes in unicodeobject.h don't > match the definitions in unicodeobject.c. Like, in .h, > > extern DL_IMPORT(Py_UNICODE) _PyUnicode_ToTitlecase( > register const Py_UNICODE ch /* Unicode character */ > ); > > but in .c: > > Py_UNICODE _PyUnicode_ToTitlecase(register Py_UNICODE ch) what's that "register" doing in a prototype? any reason we cannot just change the signature(s) to Py_UNICODE _PyUnicode_ToTitlecase(Py_UNICODE ch) to make it look more like contemporary C code? 
From fredrik at pythonware.com Wed Jun 27 16:49:31 2001 From: fredrik at pythonware.com (Fredrik Lundh) Date: Wed, 27 Jun 2001 16:49:31 +0200 Subject: [Python-Dev] Unicode fallout: Windows _winreg project broken References: <200106270712.f5R7CBh06458@mira.informatik.hu-berlin.de> Message-ID: <00a101c0ff19$e2a19740$4ffa42d5@hagrid> martin wrote: > > IOW, HAVE_USABLE_WCHAR_T no longer gets #define'd on Windows, and > > that prevents unicodeobject.c from supplying routines _winreg.c > > calls. > > The best thing, IMO, would be if PC/config.h defines everything > available in config.h also. In this case, the proper defines would be > > #define Py_USING_UNICODE > #define HAVE_USABLE_WCHAR_T > #define Py_UNICODE_SIZE 2 > #define PY_UNICODE_TYPE wchar_t > > If that approach is used, the defaulting in Include/unicodeobject.h > could go away. my fault; I missed the HAVE_USABLE_WCHAR_T define when I tried to fix tim's fix. I'll fix it. From guido at digicool.com Wed Jun 27 17:07:47 2001 From: guido at digicool.com (Guido van Rossum) Date: Wed, 27 Jun 2001 11:07:47 -0400 Subject: [Python-Dev] New Unicode warnings In-Reply-To: Your message of "Wed, 27 Jun 2001 16:48:02 +0200." <009e01c0ff19$e25b3f70$4ffa42d5@hagrid> References: <009e01c0ff19$e25b3f70$4ffa42d5@hagrid> Message-ID: <200106271507.f5RF7lq19494@odiug.digicool.com> > tim peters wrote: > > > There are 3 functions now where the prototypes in unicodeobject.h don't > > match the definitions in unicodeobject.c. Like, in .h, > > > > extern DL_IMPORT(Py_UNICODE) _PyUnicode_ToTitlecase( > > register const Py_UNICODE ch /* Unicode character */ > > ); > > > > but in .c: > > > > Py_UNICODE _PyUnicode_ToTitlecase(register Py_UNICODE ch) > > what's that "register" doing in a prototype? Enjoying a day off? > any reason we cannot just change the signature(s) to > > Py_UNICODE _PyUnicode_ToTitlecase(Py_UNICODE ch) > > to make it look more like contemporary C code? > > I cannot see how either register or const are going to make any difference in the prototype given that Py_UNICODE is a scalar type, so please just do it. --Guido van Rossum (home page: http://www.python.org/~guido/) From JamesL at Lugoj.Com Wed Jun 27 17:58:54 2001 From: JamesL at Lugoj.Com (James Logajan) Date: Wed, 27 Jun 2001 08:58:54 -0700 Subject: [Python-Dev] Re: PEP 260: simplify xrange() References: Message-ID: <3B3A02BE.21039365@Lugoj.Com> Tim Peters wrote: > > [James Logajan] > > Design mistakes one has made do tend to weigh on one's soul (speaking > > from more than two decades of programming experience) so I understand > > the primal urge to correct them when one can, and even when one > > shouldn't. > > Is this a case when one shouldn't? That is, is it a specific comment on PEP > 260, or just a general venting here? Just a general bit of silly "" venting. Insert some non-zero fraction in the wink. I tried to insert some obvious absurdities to indicate I was not being very serious. (Yes, I know that one shouldn't try that in mixed company.) From guido at digicool.com Wed Jun 27 18:11:49 2001 From: guido at digicool.com (Guido van Rossum) Date: Wed, 27 Jun 2001 12:11:49 -0400 Subject: [Python-Dev] Unicode Maintenance In-Reply-To: Your message of "Wed, 27 Jun 2001 14:10:57 +0200." 
<3B39CD51.406C28F0@lemburg.com> References: <3B39CD51.406C28F0@lemburg.com> Message-ID: <200106271611.f5RGBn819631@odiug.digicool.com> > Looking at the recent burst of checkins for the Unicode implementation > completely bypassing the standard SF procedure and possible comments > I might have on the different approaches, I guess I've been ruled out > as maintainer and designer of the Unicode implementation. > > Well, I guess that's how things go. Was nice working for you guys, > but no longer is... I'm tired of having to defend myself against > meta-comments about the design, uncontrolled checkins and no true > backup about my standing in all this from Guido. > > Perhaps I am misunderstanding the role of a maintainer and > implementation designer, but as it is all respect for the work I've > put into all this seems faded. That's the conclusion I draw from recent > postings by Martin and Fredrik and their nightly "takeover". > > Thanks, > -- > Marc-Andre Lemburg [For those of us to whom Marc-Andre's complaint comes as a total surprise: there was a thread on i18n-sig about whether we should support Unicode surrogates, followed by a conclusion to skip surrogates and jump directly to optional support for UCS-4, followed by some checkins that enabled a configuration choice between UCS-2 and UCS-4, and code to make it work. As a side effect, surrogate support in the UCS-2 version actually improved slightly.] Now, now, Marc-Andre. The only comments I recall from you on my "surrogates: just say no" post seemed favorable, except that you proposed to go all the way and make UCS-4 mandatory. I explained why I didn't want to go that far, and why I didn't believe your arguments against giving users a choice. I didn't hear back from you then, and I didn't think you could have much of a problem with my position. Our process requires the use of the SF patch manager only for controversial changes. Based on your feedback, I didn't think there was anything controversial about the changes that Fredrik and Martin have made! (If there was, IMO it was temporarily breaking the Windows build and the test suite -- but that's all fixed now.) I don't understand where you get the idea that we lost respect for your work! In fact, the fact that it was so easy to make the changes suggested to me that the original design was well suited to this particular change (as opposed to the surrogate support proposals, which all sounded like they would require a *lot* of changes). I don't think that we have very strict roles in this community anyway. (My role as BDFL excluded -- that's why I get to write this response. :-) I'd say that Fredrik owns SRE, because he has asserted that ownership at various times: he's undone changes by others that broke the 1.5.2 support, for example. But the Unicode support in Python isn't owned by one person: many folks have contributed to that, including Fredrik, who designed and wrote the original Unicode string object implementation. If you have specific comments about the changes made, please be specific. If you feel slighted by meta-comments, please also be specific. I don't think I've said anything derogatory about you or your design. Paul Prescod offered to write a PEP on this issue. My cynical half believes that we'll never hear from him again, but my optimistic half hopes that he'll actually write one, so that we'll be able to discuss the various issues for the users with the users. I encourage you to co-author the PEP, since you have a lot of background knowledge about the issues.
BTW, I think that Misc/unicode.txt should be converted to a PEP, for the historic record. It was very much a PEP before the PEP process was invented. Barry, how much work would this be? No editing needed, just formatting, and assignment of a PEP number (the lower the better). --Guido van Rossum (home page: http://www.python.org/~guido/) From barry at digicool.com Wed Jun 27 18:24:30 2001 From: barry at digicool.com (Barry A. Warsaw) Date: Wed, 27 Jun 2001 12:24:30 -0400 Subject: [I18n-sig] Re: [Python-Dev] Unicode Maintenance References: <3B39CD51.406C28F0@lemburg.com> <200106271611.f5RGBn819631@odiug.digicool.com> Message-ID: <15162.2238.720508.508081@anthem.wooz.org> >>>>> "GvR" == Guido van Rossum writes: GvR> BTW, I think that Misc/unicode.txt should be converted to a GvR> PEP, for the historic record. It was very much a PEP before GvR> the PEP process was invented. Barry, how much work would GvR> this be? No editing needed, just formatting, and assignment GvR> of a PEP number (the lower the better). Not much work at all, so I'll do this (and replace Misc/unicode.txt with a pointer to the PEP). Let's go with PEP 7, but stick it under the "Other Informational PEPs" category. -Barry From guido at digicool.com Wed Jun 27 18:36:05 2001 From: guido at digicool.com (Guido van Rossum) Date: Wed, 27 Jun 2001 12:36:05 -0400 Subject: [I18n-sig] Re: [Python-Dev] Unicode Maintenance In-Reply-To: Your message of "Wed, 27 Jun 2001 12:24:30 EDT." <15162.2238.720508.508081@anthem.wooz.org> References: <3B39CD51.406C28F0@lemburg.com> <200106271611.f5RGBn819631@odiug.digicool.com> <15162.2238.720508.508081@anthem.wooz.org> Message-ID: <200106271636.f5RGa5719660@odiug.digicool.com> > GvR> BTW, I think that Misc/unicode.txt should be converted to a > GvR> PEP, for the historic record. It was very much a PEP before > GvR> the PEP process was invented. Barry, how much work would > GvR> this be? No editing needed, just formatting, and assignment > GvR> of a PEP number (the lower the better). > > Not much work at all, so I'll do this (and replace Misc/unicode.txt > with a pointer to the PEP). Let's go with PEP 7, but stick it under > the "Other Informational PEPs" category. > > -Barry Rather than informational, how about "Standard Track - Accepted (or Final)" ? That really matches the history best. I'd propose PEP number 100 -- the below-100 series is more for meta-PEPs. --Guido van Rossum (home page: http://www.python.org/~guido/) From barry at digicool.com Wed Jun 27 19:05:35 2001 From: barry at digicool.com (Barry A. Warsaw) Date: Wed, 27 Jun 2001 13:05:35 -0400 Subject: [I18n-sig] Re: [Python-Dev] Unicode Maintenance References: <3B39CD51.406C28F0@lemburg.com> <200106271611.f5RGBn819631@odiug.digicool.com> <15162.2238.720508.508081@anthem.wooz.org> <200106271636.f5RGa5719660@odiug.digicool.com> Message-ID: <15162.4703.741647.850696@anthem.wooz.org> >>>>> "GvR" == Guido van Rossum writes: GvR> Rather than informational, how about "Standard Track - GvR> Accepted (or Final)" ? That really matches the history best. GvR> I'd propose PEP number 100 -- the below-100 series is more GvR> for meta-PEPs. Fine with me. -Barry From fdrake at acm.org Wed Jun 27 21:45:05 2001 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Wed, 27 Jun 2001 15:45:05 -0400 (EDT) Subject: [Python-Dev] New profiling interface Message-ID: <15162.14273.490573.156770@cj42289-a.reston1.va.home.com> The new core interface I checked in allows profilers and tracers (debuggers, coverage tools) to be written in C. 
I still need to write documentation for it; that shouldn't be too far off though. If anyone would like to have this available for Python 2.1.x, I have a version that I developed on the release20-maint branch. It can't be added to that branch since it's pretty clearly a new feature, but the patch is available at: http://starship.python.net/crew/fdrake/patches/py21-profiling.patch Enjoy! -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From mark.favas at csiro.au Wed Jun 27 23:45:17 2001 From: mark.favas at csiro.au (Mark Favas) Date: Thu, 28 Jun 2001 05:45:17 +0800 Subject: [Python-Dev] unicode, "const"s and lvalues Message-ID: <3B3A53ED.A8EEE265@csiro.au> Unreasonable as it may seem, my compiler really expects that entities declared as const's not be used in contexts where a modifiable lvalue is required. It gets all huffy, and refuses to continue compiling, even if I speak nicely (in unicode) to it. I'll file a bug report. On the code, not the compiler . cc -c -O -Olimit 1500 -Dss_family=__ss_family -Dss_len=__ss_len -I. -I./Include -DHAVE_CONFIG_H -o Objects/unicodectype.o Objects/unicodectype.c cc: Error: Objects/unicodectype.c, line 67: In this statement, "ch" has const-qualified type, but occurs in a context that requires a modifiable lvalue. (neednonconst) ch += ctype->title; --------^ cc: Error: Objects/unicodectype.c, line 69: In this statement, "ch" has const-qualified type, but occurs in a context that requires a modifiable lvalue. (neednonconst) ch += ctype->upper; --------^ cc: Error: Objects/unicodectype.c, line 74: In this statement, "ch" has const-qualified type, but occurs in a context that requires a modifiable lvalue. (neednonconst) ch -= 0x10000; --------^ cc: Error: Objects/unicodectype.c, line 362: In this statement, "ch" has const-qualified type, but occurs in a context that requires a modifiable lvalue. (neednonconst) ch += ctype->upper; ----^ cc: Error: Objects/unicodectype.c, line 366: In this statement, "ch" has const-qualified type, but occurs in a context that requires a modifiable lvalue. (neednonconst) ch -= 0x10000; --------^ cc: Error: Objects/unicodectype.c, line 378: In this statement, "ch" has const-qualified type, but occurs in a context that requires a modifiable lvalue. (neednonconst) ch += ctype->lower; ----^ cc: Error: Objects/unicodectype.c, line 382: In this statement, "ch" has const-qualified type, but occurs in a context that requires a modifiable lvalue. (neednonconst) ch -= 0x10000; --------^ make: *** [Objects/unicodectype.o] Error 1 -- Mark Favas - m.favas at per.dem.csiro.au CSIRO, Private Bag No 5, Wembley, Western Australia 6913, AUSTRALIA From guido at digicool.com Wed Jun 27 23:57:16 2001 From: guido at digicool.com (Guido van Rossum) Date: Wed, 27 Jun 2001 17:57:16 -0400 Subject: [Python-Dev] unicode, "const"s and lvalues In-Reply-To: Your message of "Thu, 28 Jun 2001 05:45:17 +0800." <3B3A53ED.A8EEE265@csiro.au> References: <3B3A53ED.A8EEE265@csiro.au> Message-ID: <200106272157.f5RLvGo20101@odiug.digicool.com> > Unreasonable as it may seem, my compiler really expects that entities > declared as const's not be used in contexts where a modifiable lvalue is > required. It gets all huffy, and refuses to continue compiling, even if > I speak nicely (in unicode) to it. I'll file a bug report. On the code, > not the compiler . VC++ also warns about this. I think the declaration of the Character Type APIs in unicodeobject.h really shouldn't include either register or const. Then their implementations should also lose the 'const'.
From tim.one at home.com Wed Jun 27 23:58:34 2001
From: tim.one at home.com (Tim Peters)
Date: Wed, 27 Jun 2001 17:58:34 -0400
Subject: [Python-Dev] unicode, "const"s and lvalues
In-Reply-To: <3B3A53ED.A8EEE265@csiro.au>
Message-ID: 

[Mark Favas]
> Unreasonable as it may seem, my compiler really expects that entities
> declared as const's not be used in contexts where a modifiable lvalue is
> required. It gets all huffy, and refuses to continue compiling, even if
> I speak nicely (in unicode) to it. I'll file a bug report.

No real need, this was already brought up about 13 hours ago, although maybe that was only on the i18n-sig. I was left with the vague impression that Fredrik intended to fix it. If it's not fixed by tomorrow, you can make me feel guilty enough to fix it (I first reported it, so I guess it's my problem).

could've-been-yours!-ly y'rs - tim

From fredrik at pythonware.com Thu Jun 28 00:42:14 2001
From: fredrik at pythonware.com (Fredrik Lundh)
Date: Thu, 28 Jun 2001 00:42:14 +0200
Subject: [Python-Dev] unicode, "const"s and lvalues
References: <3B3A53ED.A8EEE265@csiro.au>
Message-ID: <00b701c0ff5a$6ab8f660$4ffa42d5@hagrid>

mark wrote:
> Unreasonable as it may seem, my compiler really expects that entities
> declared as const's not be used in contexts where a modifiable lvalue is
> required.

it's fixed now, I think.

(btw, unreasonable as it may seem, your mail server refuses to accept mail sent to your reply address, even if I speak nicely to it ;-)

Cheers /F

From fdrake at acm.org Thu Jun 28 04:44:54 2001
From: fdrake at acm.org (Fred L. Drake, Jr.)
Date: Wed, 27 Jun 2001 22:44:54 -0400 (EDT)
Subject: [Python-Dev] NIS on Linux, others?
Message-ID: <15162.39462.75553.378146@cj42289-a.reston1.va.home.com>

Is anyone here using NIS (Sun's old "Yellow Pages" service)? There's a bug for this on Linux that's been assigned to me for some time, but I don't have access to a network using NIS. Can anyone either confirm the bug or the fix? Or at least confirm that the suggested fix doesn't break the nis module on some other platform? (Testing this on a Sun SPARC box would be really nice!)

I'd really appreciate some help on this one. The bug report is:

    http://sourceforge.net/tracker/index.php?func=detail&aid=233084&group_id=5470&atid=105470

Thanks!

-Fred

--
Fred L. Drake, Jr.
PythonLabs at Digital Creations

From thomas at xs4all.net Thu Jun 28 10:13:09 2001
From: thomas at xs4all.net (Thomas Wouters)
Date: Thu, 28 Jun 2001 10:13:09 +0200
Subject: [Python-Dev] NIS on Linux, others?
In-Reply-To: <15162.39462.75553.378146@cj42289-a.reston1.va.home.com>
References: <15162.39462.75553.378146@cj42289-a.reston1.va.home.com>
Message-ID: <20010628101309.X8098@xs4all.nl>

On Wed, Jun 27, 2001 at 10:44:54PM -0400, Fred L. Drake, Jr. wrote:

> Is anyone here using NIS (Sun's old "Yellow Pages" service)?
> There's a bug for this on Linux that's been assigned to me for some
> time, but I don't have access to a network using NIS. Can anyone
> either confirm the bug or the fix? Or at least confirm that the
> suggested fix doesn't break the nis module on some other platform?
> (Testing this on a Sun SPARC box would be really nice!)

> I'd really appreciate some help on this one. The bug report is:

If no one else pops up, I'll set up a small NIS network at home to test it when my new computer arrives (a week or two.)
We use NIS a lot at work, but not on Linux machines (the 16-bit uid limitation prevented us from using Linux for user-accessible machines for a long time.)

--
Thomas Wouters 

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!

From mal at lemburg.com Thu Jun 28 11:04:07 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Thu, 28 Jun 2001 11:04:07 +0200
Subject: [Python-Dev] Unicode Maintenance
References: <3B39CD51.406C28F0@lemburg.com> <200106271611.f5RGBn819631@odiug.digicool.com>
Message-ID: <3B3AF307.6496AFB4@lemburg.com>

Guido van Rossum wrote:
>
> > Looking at the recent burst of checkins for the Unicode implementation
> > completely bypassing the standard SF procedure and possible comments
> > I might have on the different approaches, I guess I've been ruled out
> > as maintainer and designer of the Unicode implementation.
> >
> > Well, I guess that's how things go. Was nice working for you guys,
> > but no longer is... I'm tired of having to defend myself against
> > meta-comments about the design, uncontrolled checkins and no true
> > backup about my standing in all this from Guido.
> >
> > Perhaps I am misunderstanding the role of a maintainer and
> > implementation designer, but as it is all respect for the work I've
> > put into all this seems faded. That's the conclusion I draw from recent
> > postings by Martin and Fredrik and their nightly "takeover".
> >
> > Thanks,
> > --
> > Marc-Andre Lemburg
>
> [For those of us to whom Marc-Andre's complaint comes as a total
> surprise: there was a thread on i18n-sig about whether we should
> support Unicode surrogates, followed by a conclusion to skip
> surrogates and jump directly to optional support for UCS-4, followed
> by some checkins that enabled a configuration choice between UCS-2 and
> UCS-4, and code to make it work.  As a side effect, surrogate support
> in the UCS-2 version actually improved slightly.]
>
> Now, now, Marc-Andre.
>
> The only comments I recall from you on my "surrogates: just say no"
> post seemed favorable, except that you proposed to go all the way and
> make UCS-4 mandatory.  I explained why I didn't want to go that far,
> and why I didn't believe your arguments against giving users a choice.
> I didn't hear back from you then, and I didn't think you could have
> much of a problem with my position.
>
> Our process requires the use of the SF patch manager only for
> controversial changes.  Based on your feedback, I didn't think there
> was anything controversial about the changes that Fredrik and Martin
> have made!  (If there was, IMO it was temporarily breaking the Windows
> build and the test suite -- but that's all fixed now.)
>
> I don't understand where you get the idea that we lost respect for
> your work!  In fact, the fact that it was so easy to make the changes
> suggested to me that the original design was well suited to this
> particular change (as opposed to the surrogate support proposals,
> which all sounded like they would require a *lot* of changes).
>
> I don't think that we have very strict roles in this community anyway.
> (My role as BDFL excluded -- that's why I get to write this
> response. :-)  I'd say that Fredrik owns SRE, because he has asserted
> that ownership at various times: he's undone changes by others that
> broke the 1.5.2 support, for example.
>
> But the Unicode support in Python isn't owned by one person: many
> folks have contributed to that, including Fredrik, who designed and
> wrote the original Unicode string object implementation.
>
> If you have specific comments about the changes made, please be
> specific.  If you feel slighted by meta-comments, please also be
> specific.  I don't think I've said anything derogatory about you or
> your design.

You didn't get my point. I feel responsible for the Unicode implementation design and would like to see it become a continued success.

In that sense and taking into account that I am the maintainer of all this stuff, I think it is very reasonable to ask me before making any significant changes to the implementation and also respect any comments I put forward.

Currently, I have to watch the checkins list very closely to find out who changed what in the implementation and then to take actions only after the fact. Since I'm not supporting Unicode as my full-time job this is simply impossible. We have the SF manager and there is really no need to rush anything around here.

If I am offline or too busy with other things for a day or two, then I want to see patches on SF and not find new versions of the implementation already checked in.

This has worked just fine during the last year, so I can only explain the latest actions in this direction with an urge to bypass my comments and any discussion this might cause. Needless to say that quality control is not possible anymore.

Conclusion: I am not going to continue this work if this does not change.

Another problem for me is the continued hostility I feel on i18n against parts of the design and some of my decisions. I am not talking about your feedback and the feedback from many other people on the list which was excellent and to high standards. But reading the postings of the last few months you will find notices of what I am referring to here (no, I don't want to be specific).

If people don't respect my comments or decision, then how can I defend the design and how can I stop endless discussions which simply don't lead anywhere ? So either I am missing something or there is a need for a clear statement from you about my status in all this.

If I don't have the right to comment on proposals and patches, possibly even rejecting them, then I simply don't see any ground for keeping the implementation in a state which I can maintain.

And last but not least: The fun-factor has faded which was the main motor driving me into working on Unicode in the first place. Nothing much you can do about this, though :-/

> Paul Prescod offered to write a PEP on this issue.  My cynical half
> believes that we'll never hear from him again, but my optimistic half
> hopes that he'll actually write one, so that we'll be able to discuss
> the various issues for the users with the users.  I encourage you to
> co-author the PEP, since you have a lot of background knowledge about
> the issues.

I guess your optimistic half won :-) I think Paul already did all the work, so I'll simply comment on what he wrote.

> BTW, I think that Misc/unicode.txt should be converted to a PEP, for
> the historic record.  It was very much a PEP before the PEP process
> was invented.  Barry, how much work would this be?  No editing needed,
> just formatting, and assignment of a PEP number (the lower the better).

Thanks for converting the text to PEP format, Barry.
Thanks for reading this far,
--
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/

From guido at digicool.com Thu Jun 28 14:25:14 2001
From: guido at digicool.com (Guido van Rossum)
Date: Thu, 28 Jun 2001 08:25:14 -0400
Subject: [Python-Dev] Unicode Maintenance
In-Reply-To: Your message of "Thu, 28 Jun 2001 11:04:07 +0200." <3B3AF307.6496AFB4@lemburg.com>
References: <3B39CD51.406C28F0@lemburg.com> <200106271611.f5RGBn819631@odiug.digicool.com> <3B3AF307.6496AFB4@lemburg.com>
Message-ID: <200106281225.f5SCPIr20874@odiug.digicool.com>

Hi Marc-Andre,

I'm dropping the i18n-sig from the distribution list.

I hear you:

> You didn't get my point. I feel responsible for the Unicode
> implementation design and would like to see it become a continued
> success.

I'm sure we all share this goal!

> In that sense and taking into account that I am the
> maintainer of all this stuff, I think it is very reasonable to
> ask me before making any significant changes to the implementation
> and also respect any comments I put forward.

I understand you feel that we've rushed this in without waiting for your comments. Given how close your implementation was, I still feel that the changes weren't that significant, but I understand that you get nervous. If Christian were to check in his speed hack changes to the guts of ceval.c I would be nervous too! (Heck, I got nervous when Eric checked in his library-wide string method changes without asking.) Next time I'll try to be more sensitive to situations that require your review before going forward.

> Currently, I have to watch the checkins list very closely
> to find out who changed what in the implementation and then to
> take actions only after the fact. Since I'm not supporting Unicode
> as my full-time job this is simply impossible. We have the SF manager
> and there is really no need to rush anything around here.

Hm, apart from the fact that you ought to be left in charge, I think that in this case the live checkins were a big win over the usual SF process. At least two people were making changes, sometimes to each other's code, and many others on at least three continents were checking out the changes on many different platforms and immediately reporting problems. We would definitely not have a patch as solid as the code that's now checked in, after two days of using SF! (We could've used a branch, but I've found that getting people to actually check out the branch is not easy.)

So I think that the net result was favorable. Sometimes you just have to let people work in the spur of the moment to get the results of their best thinking, otherwise they lose interest or their train of thought.

> If I am offline or too busy with other things for a day or two,
> then I want to see patches on SF and not find new versions of
> the implementation already checked in.

That's still the general rule, but in our enthusiasm (and mine was definitely part of this!) we didn't want to wait. Also, I have to admit that I mistook your silence for consent -- I didn't think the main proposed changes (making the size of Py_UNICODE a config choice) were controversial at all, so I didn't realize you would have a problem with it.

> This has worked just fine during the last year, so I can only explain
> the latest actions in this direction with an urge to bypass my comments
> and any discussion this might cause.
I think you're projecting your own stuff here. I honestly didn't think there was much disagreement on your part and thought we were doing you a favor by implementing the consensus. IMO, Martin and Fredrik are familiar enough with both the code and the issues to do a good job.

> Needless to say that
> quality control is not possible anymore.

Unclear. Lots of other people looked over the changes in your absence. And CVS makes code review after it's checked in easy enough. (Hey, in many other open source projects that's the normal procedure once the rough characteristics of a feature have been agreed upon: check in first and review later!)

> Conclusion:
> I am not going to continue this work if this does not change.

That would be sad, and I hope you will stay with us. We certainly don't plan to ignore your comments!

> Another problem for me is the continued hostility I feel on i18n
> against parts of the design and some of my decisions. I am
> not talking about your feedback and the feedback from many other
> people on the list which was excellent and to high standards.
> But reading the postings of the last few months you will
> find notices of what I am referring to here (no, I don't want
> to be specific).

I don't know what to say about this, and obviously nobody has the time to go back and read the archives. I'm sure it's not you as a person that was attacked. If the design isn't perfect -- and hey, since Python is the 80 percent language, few things in it are quite perfect! -- then (positive) criticism is an attempt to help, to move it closer to perfection.

If people have at times said "the Unicode support sucks", well, that may hurt. You can't always stay friends with everybody. I get flames occasionally for features in Python that folks don't like. I get used to them, and it doesn't affect my confidence any more. Be the same!

But sometimes, after saying "it sucks", people make specific suggestions for improvements, and it's important to be open for those even from sources that use offending language. (Within reason, of course. I don't ask you to listen to somebody who is persistently hostile to you as a person.)

> If people don't respect my comments or decision, then how can
> I defend the design and how can I stop endless discussions which
> simply don't lead anywhere ? So either I am missing something
> or there is a need for a clear statement from you about
> my status in all this.

Do you really *want* to be the Unicode BDFL? Being something's BDFL is a full-time job, and you've indicated you're too busy. (Or is that temporary?)

I see you as the original coder, which means that you know that section of the code better than anyone, and whenever there's a question that others can't answer about its design, implementation, or restrictions, I refer to you.

But given that you've said you wouldn't be able to work much on it, I welcome contributions by others as long as they seem knowledgeable.

> If I don't have the right to comment on proposals and patches,
> possibly even rejecting them, then I simply don't see any
> ground for keeping the implementation in a state which I can
> maintain.

Nobody said you couldn't comment, and you know that. When it comes to rejecting or accepting, I feel that I am still the final arbiter, even for Unicode, until I get hit by a bus.
Since I don't always understand the implementation or the issues, I'll of course defer to you in cases where I think I can't make the decision, but I do reserve the right to be convinced by others to override your judgement, occasionally, if there's a good reason. And when you're not responsive, I may try to channel you. (I'll try to be more explicit about that.)

> And last but not least: The fun-factor has faded which was
> the main motor driving me into working on Unicode in the first
> place. Nothing much you can do about this, though :-/

Yes, that happens to all of us at times. The fun factor goes up and down, and sometimes we must look for fun elsewhere for a while. Then the fun may come back where it appeared lost. Go on vacation, read a book, tackle a new project in a totally different area! Then come back and see if you can find some fun in the old stuff again.

> > Paul Prescod offered to write a PEP on this issue.  My cynical half
> > believes that we'll never hear from him again, but my optimistic half
> > hopes that he'll actually write one, so that we'll be able to discuss
> > the various issues for the users with the users.  I encourage you to
> > co-author the PEP, since you have a lot of background knowledge about
> > the issues.
>
> I guess your optimistic half won :-) I think Paul already did all the
> work, so I'll simply comment on what he wrote.

Your suggestions were very valuable. My opinion of Paul also went up a notch!

> > BTW, I think that Misc/unicode.txt should be converted to a PEP, for
> > the historic record.  It was very much a PEP before the PEP process
> > was invented.  Barry, how much work would this be?  No editing needed,
> > just formatting, and assignment of a PEP number (the lower the better).
>
> Thanks for converting the text to PEP format, Barry.
>
> Thanks for reading this far,

You're welcome, and likewise. Just one more thing, Marc-Andre. Please know that I respect your work very much even if we don't always agree. We would get by without you, but Python would be hurt if you turned your back on us.

--Guido van Rossum (home page: http://www.python.org/~guido/)

From arigo at ulb.ac.be Thu Jun 28 15:04:06 2001
From: arigo at ulb.ac.be (Armin Rigo)
Date: Thu, 28 Jun 2001 15:04:06 +0200 (CEST)
Subject: [Python-Dev] Python Specializing Compiler
In-Reply-To: <3B393E92.B0719A7A@ulb.ac.be>
Message-ID: 

On Tue, 26 Jun 2001, Armin Rigo wrote:
> I am considering using GNU Lightning to produce code from the Psyco
> compiler.

I just found "vcode" (http://www.pdos.lcs.mit.edu/~engler/pldi96-abstract.html), which seems very interesting for portable JIT code generation. I am considering using it for Psyco.

Does anyone have experience with vcode? Or any other comments?

Armin.

From gball at cfa.harvard.edu Thu Jun 28 17:26:36 2001
From: gball at cfa.harvard.edu (Greg Ball)
Date: Thu, 28 Jun 2001 11:26:36 -0400 (EDT)
Subject: [Python-Dev] NIS on Linux, others?
Message-ID: 

Short version: I can confirm that bug under linux, but the patch breaks nis module on solaris.

Linux machine is:
Linux malhar 2.2.16-3smp #1 SMP Mon Jun 19 17:37:04 EDT 2000 i686 unknown
with python version from recent CVS. I see the reported bug and the suggested patch does fix the problem.

Sparc box looks like this:
SunOS cfa0 5.8 Generic_108528-06 sun4u sparc SUNW,Ultra-Enterprise
using python2.0 source tree. The nis module works out of the box, but applying the suggested patch breaks it: 'nis.error: No such key in map'.
--Greg Ball

From gregor at hoffleit.de Thu Jun 28 21:56:35 2001
From: gregor at hoffleit.de (Gregor Hoffleit)
Date: Thu, 28 Jun 2001 21:56:35 +0200
Subject: [Python-Dev] MAGIC after 2001 ?
Message-ID: <20010628215635.A5621@53b.hoffleit.de>

Correct me, but AFAICS there are only 186 days left until Python's MAGIC scheme overflows:

    /* XXX Perhaps the magic number should be frozen and a version field
       added to the .pyc file header? */
    /* New way to come up with the magic number: (YEAR-1995), MONTH, DAY */
    #define MAGIC (60202 | ((long)'\r'<<16) | ((long)'\n'<<24))

I couldn't find this problem in the SF bug tracking system. Should I submit a new bug entry ?

Gregor

From jack at oratrix.nl Thu Jun 28 23:03:47 2001
From: jack at oratrix.nl (Jack Jansen)
Date: Thu, 28 Jun 2001 23:03:47 +0200
Subject: [Python-Dev] Passing silly values to time.strftime
Message-ID: <20010628210352.33157120260@oratrix.oratrix.nl>

Just noted (that's Just-the-person, not me-just-noting:-) that on the Mac time.strftime() can blow up with an access violation if you pass silly values to it (such as 9 zeroes).

Does anyone know enough of the ANSI standard to tell me how strftime should behave with out-of-range values? I.e. should I report this as a bug to MetroWerks or should we rig up time.strftime() to check that all the values are in range?

--
Jack Jansen             | ++++ stop the execution of Mumia Abu-Jamal ++++
Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++
www.oratrix.nl/~jack    | ++++ see http://www.xs4all.nl/~tank/ ++++

From jack at oratrix.nl Thu Jun 28 23:12:45 2001
From: jack at oratrix.nl (Jack Jansen)
Date: Thu, 28 Jun 2001 23:12:45 +0200
Subject: [Python-Dev] Passing silly values to time.strftime
In-Reply-To: Message by Jack Jansen , Thu, 28 Jun 2001 23:03:47 +0200 , <20010628210352.33157120260@oratrix.oratrix.nl>
Message-ID: <20010628211250.4A6BC120260@oratrix.oratrix.nl>

Recently, Jack Jansen said:
> Just noted (that's Just-the-person, not me-just-noting:-) that on the
> Mac time.strftime() can blow up with an access violation if you pass
> silly values to it (such as 9 zeroes).

Following up to myself, after I just noticed (just-me-noticing, not Just-the-person this time) that all zeros is a legal C value: gettmarg() converts this all-zeroes tuple to

    (0, 0, 0, 0, -1, 100, 0, -1, 0)

Fine with me, apparently Python wants to have human-understandable (1-based) month numbers and yearday numbers, but then I think it really should also check that the values are in-range. What do others think?

--
Jack Jansen             | ++++ stop the execution of Mumia Abu-Jamal ++++
Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++
www.oratrix.nl/~jack    | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm
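[For reference, ANSI C makes no promises about strftime() when struct tm members are outside their normal ranges, so checking before the call is the portable option. A sketch of such a check -- a hypothetical helper, with bounds taken from the conventional struct tm field ranges:]

#include <time.h>

/* Return 1 if every field of *p is within its conventional range,
   0 otherwise.  Call this before handing the struct to strftime(). */
static int
tm_in_range(const struct tm *p)
{
    return p->tm_mon  >= 0 && p->tm_mon  <= 11
        && p->tm_mday >= 1 && p->tm_mday <= 31
        && p->tm_hour >= 0 && p->tm_hour <= 23
        && p->tm_min  >= 0 && p->tm_min  <= 59
        && p->tm_sec  >= 0 && p->tm_sec  <= 61   /* 61 allows leap seconds */
        && p->tm_wday >= 0 && p->tm_wday <= 6
        && p->tm_yday >= 0 && p->tm_yday <= 365;
}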
From Jason.Tishler at dothill.com Thu Jun 28 23:17:15 2001
From: Jason.Tishler at dothill.com (Jason Tishler)
Date: Thu, 28 Jun 2001 17:17:15 -0400
Subject: [Python-Dev] Threaded Cygwin Python Import Problem
Message-ID: <20010628171715.P488@dothill.com>

Thanks to Rob Collins (implementer) and Greg Smith (profiler), Cygwin now provides enough pthreads support so that Cygwin Python builds OOTB *and* functions reasonably well even with threads enabled. Unfortunately, there are still a few issues that need to be resolved.

The one that I would like to address in this posting prevents a threaded Cygwin Python from building the standard extension modules (without some kind of intervention). :,( Specifically, the build would frequently hang during the Distutils part when Cygwin Python is attempting to execvp a gcc process.

See the first attachment, test.py, for a minimal Python script that exhibits the hang. See the second attachment, test.c, for a rewrite of test.py in C. Since test.c did not hang, I was able to conclude that this was not just a straight Cygwin problem.

Further tracing uncovered that the hang occurs in _execvpe() (in os.py), when the child tries to import tempfile. If I apply the third attachment, os.py.patch, then the hang is avoided. Hence, it appears that importing a module (or specifically the tempfile module) in a threaded Cygwin Python child causes a hang.

I saw the following comment in _execvpe():

    # Process handling (fork, wait) under BeOS (up to 5.0)
    # doesn't interoperate reliably with the thread interlocking
    # that happens during an import.  The actual error we need
    # is the same on BeOS for posix.open() et al., ENOENT.

The above makes me think that possibly Cygwin is having a similar problem.

Can anyone offer suggestions on how to further debug this problem?

Thanks,
Jason

--
Jason Tishler
Director, Software Engineering       Phone: 732.264.8770 x235
Dot Hill Systems Corp.               Fax:   732.264.8798
82 Bethany Road, Suite 7             Email: Jason.Tishler at dothill.com
Hazlet, NJ 07730 USA                 WWW:   http://www.dothill.com

-------------- next part --------------
# Minimal reproduction: fork, then exec a trivial command in the child.
import os

cmd = ['ls', '-l']
pid = os.fork()
if pid == 0:
    print 'child execvp-ing'
    os.execvp(cmd[0], cmd)
else:
    (pid, status) = os.waitpid(pid, 0)
    print 'status =', status
    print 'parent done'
-------------- next part --------------
#include <stdio.h>
#include <unistd.h>
#include <sys/wait.h>

char* const cmd[] = {"ls", "-l", 0};

/* Same fork/exec/wait sequence as test.py, but in plain C. */
int main()
{
    int status;
    pid_t pid = fork();
    if (pid == 0) {
        printf("child execvp-ing\n");
        execvp(cmd[0], cmd);
    }
    else {
        waitpid(pid, &status, 0);
        printf("status = %d\n", status);
        printf("parent done\n");
    }
    return 0;
}
-------------- next part --------------
--- os.py.orig	Thu Jun 28 16:14:28 2001
+++ os.py	Thu Jun 28 16:30:12 2001
@@ -329,8 +329,9 @@ def _execvpe(file, args, env=None):
     try: unlink('/_#.# ## #.#')
     except error, _notfound: pass
     else:
-        import tempfile
-        t = tempfile.mktemp()
+        #import tempfile
+        #t = tempfile.mktemp()
+        t = '/mnt/c/TEMP/@279.3'
         # Exec a file that is guaranteed not to exist
         try: execv(t, ('blah',))
         except error, _notfound: pass

From tim at digicool.com Thu Jun 28 23:24:17 2001
From: tim at digicool.com (Tim Peters)
Date: Thu, 28 Jun 2001 17:24:17 -0400
Subject: [Python-Dev] MAGIC after 2001 ?
In-Reply-To: <20010628215635.A5621@53b.hoffleit.de>
Message-ID: 

[Gregor Hoffleit]
> Correct me,

Can't: you're correct.

> but AFAICS there are only 186 days left until Python's MAGIC scheme
> overflows:
>
>     /* XXX Perhaps the magic number should be frozen and a version field
>        added to the .pyc file header? */
>     /* New way to come up with the magic number: (YEAR-1995), MONTH, DAY */
>     #define MAGIC (60202 | ((long)'\r'<<16) | ((long)'\n'<<24))
>
> I couldn't find this problem in the SF bug tracking system. Should I
> submit a new bug entry ?

Somebody should! It's a known problem, but the last crusade to redefine it ended up with 85% of a spec but no worker bees. If that continues, note that it has no effect on whether existing Python releases will continue to run, it just means we can't release new versions -- but now that the licensing issue is settled, I think we'll just close down the project instead.

fun-while-it-lasted-ly y'rs - tim
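[The arithmetic behind the 186 days, reading the comment's scheme as decimal packing of (YEAR-1995, MONTH, DAY) into the low 16 bits of the word:]

#include <stdio.h>

int main(void)
{
    long feb_2001 = (2001 - 1995) * 10000L + 2 * 100 + 2;  /* 60202, today's MAGIC */
    long jan_2002 = (2002 - 1995) * 10000L + 1 * 100 + 1;  /* 70101 */

    /* 70101 > 0xffff (65535), so from 1 Jan 2002 the date part would
       spill into the '\r' byte packed above it. */
    printf("%ld fits in 16 bits? %s\n", feb_2001, feb_2001 <= 0xffffL ? "yes" : "no");
    printf("%ld fits in 16 bits? %s\n", jan_2002, jan_2002 <= 0xffffL ? "yes" : "no");
    return 0;
}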
From paulp at ActiveState.com Fri Jun 29 04:59:45 2001
From: paulp at ActiveState.com (Paul Prescod)
Date: Thu, 28 Jun 2001 19:59:45 -0700
Subject: [Python-Dev] [Fwd: PEP: Support for "wide" Unicode characters]
Message-ID: <3B3BEF21.63411C4C@ActiveState.com>

Slow python-dev day...consider this exciting new proposal to allow dealing with important new characters like the Japanese dentistry symbols and ecological symbols (but not Klingon)

-------- Original Message --------
Subject: PEP: Support for "wide" Unicode characters
Date: Thu, 28 Jun 2001 15:33:00 -0700
From: Paul Prescod
Organization: ActiveState
To: "python-list at python.org"

PEP: 261
Title: Support for "wide" Unicode characters
Version: $Revision: 1.3 $
Author: paulp at activestate.com (Paul Prescod)
Status: Draft
Type: Standards Track
Created: 27-Jun-2001
Python-Version: 2.2
Post-History: 27-Jun-2001, 28-Jun-2001

Abstract

    Python 2.1 unicode characters can have ordinals only up to
    2**16 - 1.  These characters are known as Basic Multilingual Plane
    characters.  There are now characters in Unicode that live on other
    "planes".  The largest addressable character in Unicode has the
    ordinal 17 * 2**16 - 1 (0x10ffff).  For readability, we will call
    this TOPCHAR and call characters in this range "wide characters".

Glossary

    Character
        Used by itself, means the addressable units of a Python
        Unicode string.

    Code point
        If you imagine Unicode as a mapping from integers to
        characters, each integer represents a code point.  Some are
        really used for characters.  Some will someday be used for
        characters.  Some are guaranteed never to be used for
        characters.

    Unicode character
        A code point defined in the Unicode standard whether it is
        already assigned or not.  Identified by an integer.

    Code unit
        An integer representing a character in some encoding.

    Surrogate pair
        Two code units that represent a single Unicode character.

Proposed Solution

    One solution would be to merely increase the maximum ordinal to a
    larger value.  Unfortunately the only straightforward
    implementation of this idea is to increase the character code unit
    to 4 bytes.  This has the effect of doubling the size of most
    Unicode strings.  In order to avoid imposing this cost on every
    user, Python 2.2 will allow 4-byte Unicode characters as a
    build-time option.  Users can choose whether they care about
    wide characters or prefer to preserve memory.

    The 4-byte option is called "wide Py_UNICODE".  The 2-byte option
    is called "narrow Py_UNICODE".

    Most things will behave identically in the wide and narrow worlds.

    * unichr(i) for 0 <= i < 2**16 (0x10000) always returns a
      length-one string.

    * unichr(i) for 2**16 <= i <= TOPCHAR will return a
      length-one string representing the character on wide Python
      builds.  On narrow builds it will raise ValueError.

      ISSUE: Python currently allows \U literals that cannot be
             represented as a single character.  It generates two
             characters known as a "surrogate pair".  Should this be
             disallowed on future narrow Python builds?

      ISSUE: Should Python allow the construction of characters
             that do not correspond to Unicode characters?
             Unassigned Unicode characters should obviously be legal
             (because they could be assigned at any time).  But
             code points above TOPCHAR are guaranteed never to
             be used by Unicode.  Should we allow access to them
             anyhow?

    * ord() is always the inverse of unichr()

    * There is an integer value in the sys module that describes the
      largest ordinal for a Unicode character on the current
      interpreter.  sys.maxunicode is 2**16-1 (0xffff) on narrow builds
      of Python and TOPCHAR on wide builds.

      ISSUE: Should there be distinct constants for accessing
             TOPCHAR and the real upper bound for the domain of
             unichr (if they differ)?  There has also been a
             suggestion of sys.unicodewidth which can take the
             values 'wide' and 'narrow'.

    * codecs will be upgraded to support "wide characters"
      (represented directly in UCS-4, as surrogate pairs in UTF-16 and
      as multi-byte sequences in UTF-8).  On narrow Python builds, the
      codecs will generate surrogate pairs, on wide Python builds they
      will generate a single character.  This is the main part of the
      implementation left to be done.

    * there are no restrictions on constructing strings that use
      code points "reserved for surrogates" improperly.  These are
      called "isolated surrogates".  The codecs should disallow reading
      these but you could construct them using string literals or
      unichr().  unichr() is not restricted to values less than either
      TOPCHAR or sys.maxunicode.

Implementation

    There is a new (experimental) define:

        #define PY_UNICODE_SIZE 2

    There are new configure options:

        --enable-unicode=ucs2 configures a narrow Py_UNICODE, and uses
                              wchar_t if it fits
        --enable-unicode=ucs4 configures a wide Py_UNICODE, and uses
                              wchar_t if it fits
        --enable-unicode     same as "=ucs2"

    The intention is that --disable-unicode, or --enable-unicode=no
    removes the Unicode type altogether; this is not yet implemented.

Notes

    This PEP does NOT imply that people using Unicode need to use a
    4-byte encoding.  It only allows them to do so.  For example,
    ASCII is still a legitimate (7-bit) Unicode-encoding.

Rationale for Surrogate Creation Behaviour

    Python currently supports the construction of a surrogate pair
    for a large unicode literal character escape sequence.  This is
    basically designed as a simple way to construct "wide characters"
    even in a narrow Python build.

    ISSUE: surrogates can be created this way but the user still
           needs to be careful about slicing, indexing, printing
           etc.  Another option is to remove knowledge of
           surrogates from everything other than the codecs.

Rejected Suggestions

    There were two primary solutions that were rejected.  The first was
    more or less the status-quo.  We could officially say that Python
    characters represent UTF-16 code units and require programmers to
    implement wide characters in their application logic.  This is a
    heavy burden because emulating 32-bit characters is likely to be
    very inefficient if it is coded entirely in Python.  Plus these
    abstracted pseudo-strings would not be legal as input to the
    regular expression engine.

    The other class of solution is to use some efficient storage
    internally but present an abstraction of wide characters
    to the programmer.  Any of these would require a much more complex
    implementation than the accepted solution.  For instance consider
    the impact on the regular expression engine.  In theory, we could
    move to this implementation in the future without breaking Python
    code.  A future Python could "emulate" wide Python semantics on
    narrow Python.

Copyright

    This document has been placed in the public domain.

Local Variables:
mode: indented-text
indent-tabs-mode: nil
End:

--
http://mail.python.org/mailman/listinfo/python-list
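[For concreteness, the standard UTF-16 arithmetic that the codecs described in the PEP above would apply on a narrow build to represent a code point above 0xffff as a surrogate pair; the type names here are illustrative stand-ins, not actual CPython declarations:]

typedef unsigned long  Py_UCS4;    /* illustrative stand-ins */
typedef unsigned short Py_UCS2;

/* Split a code point in [0x10000, 0x10FFFF] (TOPCHAR) into its
   UTF-16 high and low surrogate code units. */
static void
make_surrogate_pair(Py_UCS4 cp, Py_UCS2 *hi, Py_UCS2 *lo)
{
    cp -= 0x10000;
    *hi = (Py_UCS2)(0xD800 + (cp >> 10));     /* high (leading) surrogate */
    *lo = (Py_UCS2)(0xDC00 + (cp & 0x3FF));   /* low (trailing) surrogate */
}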
From fdrake at acm.org Fri Jun 29 16:03:28 2001
From: fdrake at acm.org (Fred L. Drake, Jr.)
Date: Fri, 29 Jun 2001 10:03:28 -0400 (EDT)
Subject: [Python-Dev] NIS on Linux, others?
In-Reply-To: References: Message-ID: <15164.35504.255476.169687@cj42289-a.reston1.va.home.com> Greg Ball writes: > Short version: I can confirm that bug under linux, but the patch breaks > nis module on solaris. I'm presuming that these were using the same NIS server? I'm wondering if this may be an endianess-related problem. I don't understand enough about the NIS protocols to know what's going on in that module. -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From mal at egenix.com Fri Jun 29 16:51:04 2001 From: mal at egenix.com (M.-A. Lemburg) Date: Fri, 29 Jun 2001 16:51:04 +0200 Subject: [Python-Dev] Support for "wide" Unicode characters References: <3B3BEF21.63411C4C@ActiveState.com> Message-ID: <3B3C95D8.518E5175@egenix.com> Paul Prescod wrote: > > Slow python-dev day...consider this exiting new proposal to allow deal > with important new characters like the Japanese dentristy symbols and > ecological symbols (but not Klingon) More comments... > -------- Original Message -------- > Subject: PEP: Support for "wide" Unicode characters > Date: Thu, 28 Jun 2001 15:33:00 -0700 > From: Paul Prescod > Organization: ActiveState > To: "python-list at python.org" > > PEP: 261 > Title: Support for "wide" Unicode characters > Version: $Revision: 1.3 $ > Author: paulp at activestate.com (Paul Prescod) > Status: Draft > Type: Standards Track > Created: 27-Jun-2001 > Python-Version: 2.2 > Post-History: 27-Jun-2001, 28-Jun-2001 > > Abstract > > Python 2.1 unicode characters can have ordinals only up to 2**16-1. > These characters are known as Basic Multilinual Plane characters. > There are now characters in Unicode that live on other "planes". > The largest addressable character in Unicode has the ordinal 17 * > 2**16 - 1 (0x10ffff). For readability, we will call this TOPCHAR > and call characters in this range "wide characters". > > Glossary > > Character > > Used by itself, means the addressable units of a Python > Unicode string. > > Code point > > If you imagine Unicode as a mapping from integers to > characters, each integer represents a code point. Some are > really used for characters. Some will someday be used for > characters. Some are guaranteed never to be used for > characters. > > Unicode character > > A code point defined in the Unicode standard whether it is > already assigned or not. Identified by an integer. You're mixing terms here: being a character in Unicode is a property which is defined by the Unicode specs; not all code points are characters ! I'd suggest not to use the term character in this PEP at all; this is also what Mark Davis recommends in his paper on Unicode. That way people reading the PEP won't even start to confuse things since they will most likely have to read this glossary to understand what code point and code units are. Also, a link to the Unicode glossary would be a good thing. > Code unit > > An integer representing a character in some encoding. A code unit is the basic storage unit used by Unicode strings, e.g. u[0], not necessarily a character. > Surrogate pair > > Two code units that represnt a single Unicode character. Please add Unicode string A sequence of code units. and a note that on wide builds: code unit == code point. > Proposed Solution > > One solution would be to merely increase the maximum ordinal to a > larger value. Unfortunately the only straightforward > implementation of this idea is to increase the character code unit > to 4 bytes. This has the effect of doubling the size of most > Unicode strings. 
In order to avoid imposing this cost on every > user, Python 2.2 will allow 4-byte Unicode characters as a > build-time option. Users can choose whether they care about > wide characters or prefer to preserve memory. > > The 4-byte option is called "wide Py_UNICODE". The 2-byte option > is called "narrow Py_UNICODE". > > Most things will behave identically in the wide and narrow worlds. > > * unichr(i) for 0 <= i < 2**16 (0x10000) always returns a > length-one string. > > * unichr(i) for 2**16 <= i <= TOPCHAR will return a > length-one string representing the character on wide Python > builds. On narrow builds it will return ValueError. > > ISSUE: Python currently allows \U literals that cannot be > represented as a single character. It generates two > characters known as a "surrogate pair". Should this be > disallowed on future narrow Python builds? Why not make the codec used by Python to convert Unicode literals to Unicode strings an option just like the default encoding ? That way we could have a version of the unicode-escape codec which supports surrogates and one which doesn't. > ISSUE: Should Python allow the construction of characters > that do not correspond to Unicode characters? > Unassigned Unicode characters should obviously be legal > (because they could be assigned at any time). But > code points above TOPCHAR are guaranteed never to > be used by Unicode. Should we allow access to them > anyhow? I wouldn't count on that last point ;-) Please note that you are mixing terms: you don't construct characters, you construct code points. Whether the concatenation of these code points makes a valid Unicode character string is an issue which applications and codecs have to decide. > * ord() is always the inverse of unichr() > > * There is an integer value in the sys module that describes the > largest ordinal for a Unicode character on the current > interpreter. sys.maxunicode is 2**16-1 (0xffff) on narrow builds > of Python and TOPCHAR on wide builds. > > ISSUE: Should there be distinct constants for accessing > TOPCHAR and the real upper bound for the domain of > unichr (if they differ)? There has also been a > suggestion of sys.unicodewith which can take the > values 'wide' and 'narrow'. > > * codecs will be upgraded to support "wide characters" > (represented directly in UCS-4, as surrogate pairs in UTF-16 and > as multi-byte sequences in UTF-8). On narrow Python builds, the > codecs will generate surrogate pairs, on wide Python builds they > will generate a single character. This is the main part of the > implementation left to be done. > > * there are no restrictions on constructing strings that use > code points "reserved for surrogates" improperly. These are > called "isolated surrogates". The codecs should disallow reading > these but you could construct them using string literals or > unichr(). unichr() is not restricted to values less than either > TOPCHAR nor sys.maxunicode. > > Implementation > > There is a new (experimental) define: > > #define PY_UNICODE_SIZE 2 > > There is a new configure options: > > --enable-unicode=ucs2 configures a narrow Py_UNICODE, and uses > wchar_t if it fits > --enable-unicode=ucs4 configures a wide Py_UNICODE, and uses > whchar_t if it fits > --enable-unicode same as "=ucs2" > > The intention is that --disable-unicode, or --enable-unicode=no > removes the Unicode type altogether; this is not yet implemented. > > Notes > > This PEP does NOT imply that people using Unicode need to use a > 4-byte encoding. It only allows them to do so. 
For example, > ASCII is still a legitimate (7-bit) Unicode-encoding. > > Rationale for Surrogate Creation Behaviour > > Python currently supports the construction of a surrogate pair > for a large unicode literal character escape sequence. This is > basically designed as a simple way to construct "wide characters" > even in a narrow Python build. > > ISSUE: surrogates can be created this way but the user still > needs to be careful about slicing, indexing, printing > etc. Another option is to remove knowledge of > surrogates from everything other than the codecs. +1 on removing knowledge about surrogates from the Unicode implementation core (it's also the easiest: there is none :-) We should provide a new module which provides a few handy utilities though: functions which provide code point-, character-, word- and line- based indexing into Unicode strings. > Rejected Suggestions > > There were two primary solutions that were rejected. The first was > more or less the status-quo. We could officially say that Python > characters represent UTF-16 code units and require programmers to > implement wide characters in their application logic. This is a > heavy burden because emulating 32-bit characters is likely to be > very inefficient if it is coded entirely in Python. Plus these > abstracted pseudo-strings would not be legal as input to the > regular expression engine. > > The other class of solution is to use some efficient storage > internally but present an abstraction of wide characters > to the programmer. Any of these would require a much more complex > implementation than the accepted solution. For instance consider > the impact on the regular expression engine. In theory, we could > move to this implementation in the future without breaking Python > code. A future Python could "emulate" wide Python semantics on > narrow Python. > > Copyright > > This document has been placed in the public domain. -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From jepler at inetnebr.com Fri Jun 29 17:04:18 2001 From: jepler at inetnebr.com (Jeff Epler) Date: Fri, 29 Jun 2001 10:04:18 -0500 Subject: [Python-Dev] NIS on Linux, others? In-Reply-To: <15164.35504.255476.169687@cj42289-a.reston1.va.home.com>; from fdrake@acm.org on Fri, Jun 29, 2001 at 10:03:28AM -0400 References: <15164.35504.255476.169687@cj42289-a.reston1.va.home.com> Message-ID: <20010629100416.A24069@inetnebr.com> On Fri, Jun 29, 2001 at 10:03:28AM -0400, Fred L. Drake, Jr. wrote: > > Greg Ball writes: > > Short version: I can confirm that bug under linux, but the patch breaks > > nis module on solaris. > > I'm presuming that these were using the same NIS server? I'm > wondering if this may be an endianess-related problem. I don't > understand enough about the NIS protocols to know what's going on in > that module. It's my suspicion that it depends how the "aliases" map is built. The patch that "broke" things for the Linux systems includes the comment /* created with 'makedbm -a' */ which makes me suspect that it's dependant on the way the map is constructed. (I couldn't find an online makedbm manpage which documents a -a option) Endian issues should not exist, the protocol below NIS/YP takes care of this. 
Jeff From guido at digicool.com Fri Jun 29 17:24:56 2001 From: guido at digicool.com (Guido van Rossum) Date: Fri, 29 Jun 2001 11:24:56 -0400 Subject: [Python-Dev] Support for "wide" Unicode characters In-Reply-To: Your message of "Fri, 29 Jun 2001 16:51:04 +0200." <3B3C95D8.518E5175@egenix.com> References: <3B3BEF21.63411C4C@ActiveState.com> <3B3C95D8.518E5175@egenix.com> Message-ID: <200106291525.f5TFP0H29410@odiug.digicool.com> > I'd suggest not to use the term character in this PEP at all; > this is also what Mark Davis recommends in his paper on Unicode. I like this idea! I know that I *still* have a hard time not to think "C 'char' datatype, i.e. an 8-bit byte" when I read "character"... > Why not make the codec used by Python to convert Unicode > literals to Unicode strings an option just like the default > encoding ? > > That way we could have a version of the unicode-escape codec > which supports surrogates and one which doesn't. Smart idea, but how practical is this? Can you spec this out a bit more? > +1 on removing knowledge about surrogates from the Unicode > implementation core (it's also the easiest: there is none :-) Except for \U currently -- or is that not part of the implementation core? > We should provide a new module which provides a few handy > utilities though: functions which provide code point-, > character-, word- and line- based indexing into Unicode > strings. But its design is outside the scope of this PEP, I'd say. --Guido van Rossum (home page: http://www.python.org/~guido/) From paulp at ActiveState.com Sat Jun 30 03:16:25 2001 From: paulp at ActiveState.com (Paul Prescod) Date: Fri, 29 Jun 2001 18:16:25 -0700 Subject: [Python-Dev] Support for "wide" Unicode characters References: <3B3BEF21.63411C4C@ActiveState.com> <3B3C95D8.518E5175@egenix.com> Message-ID: <3B3D2869.5C1DDCF1@ActiveState.com> "M.-A. Lemburg" wrote: > >... > > I'd suggest not to use the term character in this PEP at all; > this is also what Mark Davis recommends in his paper on Unicode. That's fine, but Python does have a concept of character and I'm going to use the term character for discussing these. > Also, a link to the Unicode glossary would be a good thing. Funny how these little PEPs grow... >... > Why not make the codec used by Python to convert Unicode > literals to Unicode strings an option just like the default > encoding ? > > That way we could have a version of the unicode-escape codec > which supports surrogates and one which doesn't. Adding more and more knobs to tweak just adds up to Python code being non-portable from one machine to another. > > ISSUE: Should Python allow the construction of characters > > that do not correspond to Unicode characters? > > Unassigned Unicode characters should obviously be legal > > (because they could be assigned at any time). But > > code points above TOPCHAR are guaranteed never to > > be used by Unicode. Should we allow access to them > > anyhow? > > I wouldn't count on that last point ;-) > > Please note that you are mixing terms: you don't construct > characters, you construct code points. Whether the concatenation > of these code points makes a valid Unicode character string > is an issue which applications and codecs have to decide. unichr() does not construct code points. It constructs 1-char Python Unicode strings...also known as Python Unicode characters. > ... Whether the concatenation > of these code points makes a valid Unicode character string > is an issue which applications and codecs have to decide. 
The concatenation of true code points would *always* make a valid Unicode string, right? It's code units that cannot be blindly concatenated. >... > We should provide a new module which provides a few handy > utilities though: functions which provide code point-, > character-, word- and line- based indexing into Unicode > strings. Okay, I'll add: It has been proposed that there should be a module for working with UTF-16 strings in narrow Python builds through some sort of abstraction that handles surrogates for you. If someone wants to implement that, it will be another PEP. -- Take a recipe. Leave a recipe. Python Cookbook! http://www.ActiveState.com/pythoncookbook From mwh at python.net Sat Jun 30 11:32:34 2001 From: mwh at python.net (Michael Hudson) Date: 30 Jun 2001 10:32:34 +0100 Subject: [Python-Dev] Support for "wide" Unicode characters In-Reply-To: Paul Prescod's message of "Fri, 29 Jun 2001 18:16:25 -0700" References: <3B3BEF21.63411C4C@ActiveState.com> <3B3C95D8.518E5175@egenix.com> <3B3D2869.5C1DDCF1@ActiveState.com> Message-ID: Paul Prescod writes: > "M.-A. Lemburg" wrote: > > I'd suggest not to use the term character in this PEP at all; > > this is also what Mark Davis recommends in his paper on Unicode. > > That's fine, but Python does have a concept of character and I'm going > to use the term character for discussing these. As a Unicode Idiot (tm) can I please beg you to reconsider? There are so many possible meanings for "character" that I really think it's best to avoid the word altogether. Call Python characters "length 1 strings" or even "length 1 Python strings". [...] > > Please note that you are mixing terms: you don't construct > > characters, you construct code points. Whether the concatenation > > of these code points makes a valid Unicode character string > > is an issue which applications and codecs have to decide. > > unichr() does not construct code points. It constructs 1-char Python > Unicode strings This is what I think you should be saying. > ...also known as Python Unicode characters. Which I'm suggesting you forget! Cheers, M. -- I'm a keen cyclist and I stop at red lights. Those who don't need hitting with a great big slapping machine. -- Colin Davidson, cam.misc From paulp at ActiveState.com Sat Jun 30 13:28:28 2001 From: paulp at ActiveState.com (Paul Prescod) Date: Sat, 30 Jun 2001 04:28:28 -0700 Subject: [Python-Dev] Support for "wide" Unicode characters References: <3B3BEF21.63411C4C@ActiveState.com> <3B3C95D8.518E5175@egenix.com> <3B3D2869.5C1DDCF1@ActiveState.com> Message-ID: <3B3DB7DC.511A3D8@ActiveState.com> Michael Hudson wrote: > >... > > As a Unicode Idiot (tm) can I please beg you to reconsider? There are > so many possible meanings for "character" that I really think it's > best to avoid the word altogether. Call Python characters "length 1 > strings" or even "length 1 Python strings". Do you really feel that there are many possible meanings for the word "Python Unicode character?" This is a PEP: I have to assume a certain degree of common understanding. -- Take a recipe. Leave a recipe. Python Cookbook! http://www.ActiveState.com/pythoncookbook From mal at egenix.com Sat Jun 30 13:52:38 2001 From: mal at egenix.com (M.-A. Lemburg) Date: Sat, 30 Jun 2001 13:52:38 +0200 Subject: [Python-Dev] Support for "wide" Unicode characters References: <3B3BEF21.63411C4C@ActiveState.com> <3B3C95D8.518E5175@egenix.com> <3B3D2869.5C1DDCF1@ActiveState.com> Message-ID: <3B3DBD86.81F80D06@egenix.com> Paul Prescod wrote: > > "M.-A. 
Lemburg" wrote:
>
> > >...
> >
> > I'd suggest not to use the term character in this PEP at all;
> > this is also what Mark Davis recommends in his paper on Unicode.
>
> That's fine, but Python does have a concept of character and I'm going
> to use the term character for discussing these.

The term "character" in Python should really only be used for the 8-bit strings. In Unicode a "character" can mean any of:

"""
Unfortunately the term character is vastly overloaded. At various times people can use it to mean any of these things:

- An image on paper (glyph)
- What an end-user thinks of as a character (grapheme)
- What a character encoding standard encodes (code point)
- A memory storage unit in a character encoding (code unit)

Because of this, ironically, it is best to avoid the use of the term character entirely when discussing character encodings, and stick to the term code point.
"""

Taken from Mark Davis' paper:

    http://www-106.ibm.com/developerworks/unicode/library/utfencodingforms/

> > Also, a link to the Unicode glossary would be a good thing.
>
> Funny how these little PEPs grow...

Is that a problem ? The Unicode glossary is very useful in providing a common base for understanding the different terms and tries very hard to avoid ambiguity in meaning. This discussion is partly caused by exactly these different understandings of the terms used in the PEP.

I will update the Unicode PEP to the Unicode terminology too.

> >...
> > Why not make the codec used by Python to convert Unicode
> > literals to Unicode strings an option just like the default
> > encoding ?
> >
> > That way we could have a version of the unicode-escape codec
> > which supports surrogates and one which doesn't.
>
> Adding more and more knobs to tweak just adds up to Python code being
> non-portable from one machine to another.

Not necessarily so; I'll write a more precise spec next week. The idea is to put the codec information into the Python source code, so that it is bound to the literals that way with the result of the Python source code being portable across platforms.

Currently this is just an idea and I still have to check how far this can go...

> > > ISSUE: Should Python allow the construction of characters
> > >        that do not correspond to Unicode characters?
> > >        Unassigned Unicode characters should obviously be legal
> > >        (because they could be assigned at any time). But
> > >        code points above TOPCHAR are guaranteed never to
> > >        be used by Unicode. Should we allow access to them
> > >        anyhow?
> >
> > I wouldn't count on that last point ;-)
> >
> > Please note that you are mixing terms: you don't construct
> > characters, you construct code points. Whether the concatenation
> > of these code points makes a valid Unicode character string
> > is an issue which applications and codecs have to decide.
>
> unichr() does not construct code points. It constructs 1-char Python
> Unicode strings...also known as Python Unicode characters.
>
> > ... Whether the concatenation
> > of these code points makes a valid Unicode character string
> > is an issue which applications and codecs have to decide.
>
> The concatenation of true code points would *always* make a valid
> Unicode string, right? It's code units that cannot be blindly
> concatenated.

Both wrong :-)

U+D800 is a valid Unicode code point and can occur as code unit in both narrow and wide builds. Concatenating this with e.g.
U+0020 will still make it a valid Unicode code point sequence (aka Unicode object), but not a valid Unicode character string (since the U+D800 is not a character). The same is true for e.g. U+FFFF.

Note that the Unicode type should happily store these values, while the codecs complain. As a result and like I said above, dealing with these problems is left to the applications which use these Unicode objects.

> >...
> > We should provide a new module which provides a few handy
> > utilities though: functions which provide code point-,
> > character-, word- and line- based indexing into Unicode
> > strings.
>
> Okay, I'll add:
>
>     It has been proposed that there should be a module for working
>     with UTF-16 strings in narrow Python builds through some sort of
>     abstraction that handles surrogates for you. If someone wants
>     to implement that, it will be another PEP.

Uhm, narrow builds don't support UTF-16... it's UCS-2 which is supported (basically: store everything in range(0x10000)); the codecs can map code points to surrogates, but it is solely their responsibility and the responsibility of the application using them to take care of dealing with surrogates.

Also, the module will be useful for both narrow and wide builds, since the notion of an encoded character can involve multiple code points. In that sense Unicode is always a variable length encoding for characters and that's the application field of this module.

Here's the adjusted text:

    It has been proposed that there should be a module for working
    with Unicode objects using character-, word- and line- based
    indexing. The details of the implementation are left to
    another PEP.

--
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/

From bckfnn at worldonline.dk Sat Jun 30 15:07:55 2001
From: bckfnn at worldonline.dk (Finn Bock)
Date: Sat, 30 Jun 2001 13:07:55 GMT
Subject: [Python-Dev] Corrupt Jython CVS (off topic).
Message-ID: <3b3dccf6.26562024@mail.wanadoo.dk>

A week ago I posted this on jython-dev, but no-one was able to give any advise on the best way to fix it. Maybe you can help.

For some time now, our [jython] web CVS have not worked correctly:

    http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/jython/jython/org/python/core/

Finally I managed to track the problem to the Java2Accessibility.py,v file in the CVS repository. The "rlog" command cannot be executed on this file.

From the start of the Java2Accessibility.py,v:

    head	2.4;
    access;
    symbols
    	Release_2_1alpha1:2.4
    	Release_2_0:2.2
    	Release_2_0rc1:2.2
    	Release_2_0beta2:2.2
    	Release_2_0beta1:2.2
    	Release_2_0alpha3:2.2
    	Release_2_0alpha2:2.2
    	Release_2_0alpha1:2.2
    	Release_1_1rc1:2.2
    	Release_1_1beta4:2.2
    	Release_1_1beta3:2.2
    	2.0:1.1.0.2;
    locks; strict;

As an experiment, I tried to remove the strange "2.0:1.1.0.2;" line from the file and then I could run rlog on the file.

Does anyone know if/how we can fix this?

As a last resort I suppose I can attach my hand edited version to a SF support request where I ask them to copy my file to the CVS server. To this day I have never been very successful whenever I have tried to edit files in a CVS repository so I'm reluctant to do this.

regards,
finn

From nhv at cape.com Sat Jun 30 15:16:48 2001
From: nhv at cape.com (Norman Vine)
Date: Sat, 30 Jun 2001 09:16:48 -0400
Subject: [Python-Dev] RE: Threaded Cygwin Python Import Problem
In-Reply-To: <20010628171715.P488@dothill.com>
Message-ID: <015601c10166$eb79bb00$a300a8c0@nhv>

Jason Tishler
>
>Thanks to Rob Collins (implementer) and Greg Smith (profiler), Cygwin now
>provides enough pthreads support so that Cygwin Python builds OOTB *and*
>functions reasonably well even with threads enabled.  Unfortunately,
>there are still a few issues that need to be resolved.
>
>The one that I would like to address in this posting prevents a threaded
>Cygwin Python from building the standard extension modules (without some
>kind of intervention).  :,(  Specifically, the build would frequently
>hang during the Distutils part when Cygwin Python is attempting to execvp
>a gcc process.
>
>See the first attachment, test.py, for a minimal Python script that
>exhibits the hang.
See the second attachment, test.c, for a rewrite >of test.py in C. Since test.c did not hang, I was able to conclude that >this was not just a straight Cygwin problem. > >Further tracing uncovered that the hang occurs in _execvpe() (in os.py), >when the child tries to import tempfile. If I apply the third >attachment, >os.py.patch, then the hang is avoided. Hence, it appears that importing a >module (or specifically the tempfile module) in a threaded Cygwin Python >child cause a hang. > >I saw the following comment in _execvpe(): > > # Process handling (fork, wait) under BeOS (up to 5.0) > # doesn't interoperate reliably with the thread interlocking > # that happens during an import. The actual error we need > # is the same on BeOS for posix.open() et al., ENOENT. > >The above makes me think that possibly Cygwin is having a >similar problem. > >Can anyone offer suggestions on how to further debug this problem? I was experiencing the same problems as Jason with Win2k sp1 and had used the same work-around successfully. < I believe Jason is working with NT 4.0 sp 5 > Curiously after applying the Win2k sp2 I no longer need to do this and the original Python code works fine. Leading me to believe that this may be but a symptom of a another Windows mystery. Regards Norman Vine From aahz at rahul.net Sat Jun 30 16:15:24 2001 From: aahz at rahul.net (Aahz Maruch) Date: Sat, 30 Jun 2001 07:15:24 -0700 (PDT) Subject: [Python-Dev] Support for "wide" Unicode characters In-Reply-To: <3B3DB7DC.511A3D8@ActiveState.com> from "Paul Prescod" at Jun 30, 2001 04:28:28 AM Message-ID: <20010630141524.E029999C80@waltz.rahul.net> Paul Prescod wrote: > Michael Hudson wrote: >> >>... >> >> As a Unicode Idiot (tm) can I please beg you to reconsider? There are >> so many possible meanings for "character" that I really think it's >> best to avoid the word altogether. Call Python characters "length 1 >> strings" or even "length 1 Python strings". > > Do you really feel that there are many possible meanings for the word > "Python Unicode character?" This is a PEP: I have to assume a certain > degree of common understanding. After reading Michael's and MA's arguments, I'm +1 on making the change they're requesting. But what really triggered my posting this was your use of the phrase "common understanding"; IME, Python's "explicit is better than implicit" rule is truly critical in documentation. Particularly if "character" has been deprecated in standard Unicode documentation, I think sticking to a common vocabulary makes more sense. -- --- Aahz (@pobox.com) Hugs and backrubs -- I break Rule 6 <*> http://www.rahul.net/aahz/ Androgynous poly kinky vanilla queer het Pythonista I don't really mind a person having the last whine, but I do mind someone else having the last self-righteous whine. From Jason.Tishler at dothill.com Sat Jun 30 17:20:19 2001 From: Jason.Tishler at dothill.com (Jason Tishler) Date: Sat, 30 Jun 2001 11:20:19 -0400 Subject: [Python-Dev] Re: Threaded Cygwin Python Import Problem In-Reply-To: <015601c10166$eb79bb00$a300a8c0@nhv> Message-ID: <20010630112019.B626@dothill.com> Norman, On Sat, Jun 30, 2001 at 09:16:48AM -0400, Norman Vine wrote: > Jason Tishler > >The one that I would like to address in this posting prevents a threaded > >Cygwin Python from building the standard extension modules (without some > >kind of intervention). :,( Specifically, the build would frequently > >hang during the Distutils part when Cygwin Python is attempting to execvp > >a gcc process. 
> I was experiencing the same problems as Jason with Win2k sp1 and
> had used the same work-around successfully.
> < I believe Jason is working with NT 4.0 sp 5 >
>
> Curiously after applying the Win2k sp2 I no longer need to do this
> and the original Python code works fine.
>
> Leading me to believe that this may be but a symptom of another
> Windows mystery.

After further reflection, I feel that I have found another
race/deadlock issue with Cygwin's pthreads implementation. If I'm
correct, this would explain why you experienced it intermittently
with Windows 2000 SP1 and it is "gone" with SP2. Probably SP2 slows
down your machine so much that the problem is not triggered. :,)

I am going to reconfigure --with-pydebug and set THREADDEBUG.
Hopefully, the hang will still be reproducible under these
conditions. If so, then I will attempt to produce a minimal C test
case for Rob to use to isolate and solve this problem.

Thanks,
Jason

--
Jason Tishler
Director, Software Engineering Phone: 732.264.8770 x235
Dot Hill Systems Corp. Fax: 732.264.8798
82 Bethany Road, Suite 7 Email: Jason.Tishler at dothill.com
Hazlet, NJ 07730 USA WWW: http://www.dothill.com

From guido at digicool.com Sat Jun 30 20:06:35 2001
From: guido at digicool.com (Guido van Rossum)
Date: Sat, 30 Jun 2001 14:06:35 -0400
Subject: [Python-Dev] Corrupt Jython CVS (off topic).
In-Reply-To: Your message of "Sat, 30 Jun 2001 13:07:55 GMT." <3b3dccf6.26562024@mail.wanadoo.dk>
References: <3b3dccf6.26562024@mail.wanadoo.dk>
Message-ID: <200106301806.f5UI6Zq30293@odiug.digicool.com>

> A week ago I posted this on jython-dev, but no-one was able to give
> any advice on the best way to fix it. Maybe you can help.
>
> For some time now, our [jython] web CVS has not worked correctly:
>
> http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/jython/jython/org/python/core/
>
> Finally I managed to track the problem to the Java2Accessibility.py,v
> file in the CVS repository. The "rlog" command cannot be executed on
> this file.
>
> From the start of the Java2Accessibility.py,v:
>
>     head 2.4;
>     access;
>     symbols
>         Release_2_1alpha1:2.4
>         Release_2_0:2.2
>         Release_2_0rc1:2.2
>         Release_2_0beta2:2.2
>         Release_2_0beta1:2.2
>         Release_2_0alpha3:2.2
>         Release_2_0alpha2:2.2
>         Release_2_0alpha1:2.2
>         Release_1_1rc1:2.2
>         Release_1_1beta4:2.2
>         Release_1_1beta3:2.2
>         2.0:1.1.0.2;
>     locks; strict;
>
> As an experiment, I tried to remove the strange "2.0:1.1.0.2;" line
> from the file and then I could run rlog on the file.

Make sure to move the semicolon to the end of the previous line.

> Does anyone know if/how we can fix this?
>
> As a last resort I suppose I can attach my hand edited version to a SF
> support request where I ask them to copy my file to the CVS server. To
> this day I have never been very successful whenever I have tried to edit
> files in a CVS repository so I'm reluctant to do this.
>
> regards,
> finn

Yes, I think a SF request should be the way to go. I don't know how
this could have happened; the "2.0" is illegal as a symbolic tag
name...

--Guido van Rossum (home page: http://www.python.org/~guido/)

From paulp at ActiveState.com Sat Jun 30 21:09:07 2001
From: paulp at ActiveState.com (Paul Prescod)
Date: Sat, 30 Jun 2001 12:09:07 -0700
Subject: [Python-Dev] Support for "wide" Unicode characters
References: <20010630141524.E029999C80@waltz.rahul.net>
Message-ID: <3B3E23D3.69D591DD@ActiveState.com>

Aahz Maruch wrote:
>
> After reading Michael's and MA's arguments, I'm +1 on making the
> change they're requesting.
> But what really triggered my posting this was your use of the phrase
> "common understanding"; IME, Python's "explicit is better than
> implicit" rule is truly critical in documentation.

The spec starts off with an absolutely watertight definition of the
term: "the addressable units of a Python Unicode string." I can't get
more explicit than that. Expanding every usage of the word to "length
1 Python Unicode string" does not make the document more explicit, any
more than this is a "more explicit" equation than Einstein's:

"The Energy is the mass of the object times the speed of light times
two."

> Particularly if "character" has been deprecated in standard Unicode
> documentation, I think sticking to a common vocabulary makes more sense.

"Character" is still a central term in all Unicode documentation. Go
to their web page and look. It's right on the front page. "Unicode
provides a unique number for every character, no matter what the
platform, no matter what the program, no matter what the language."
But I'm not using it in the Unicode sense anyhow, so it doesn't
matter. If ISO deprecates the use of the word integer in some standard
will we stop talking about Python integers as integers?

The addressable unit of a Python string is a character. If it is a
Python Unicode string then it is a Python Unicode character. The term
"Python Unicode character" is not going away:

http://www.python.org/doc/current/tut/node5.html#SECTION005120000000000000000

I will be a lot more concerned about this issue when someone reads the
PEP and is actually confused by something, as opposed to worrying that
somebody might be confused by something. If I start using a bunch of
technical terms and obfuscatory expansions, it will just dissuade
people from reading the PEP.

--
Take a recipe. Leave a recipe.
Python Cookbook! http://www.ActiveState.com/pythoncookbook

From DavidA at ActiveState.com Sat Jun 30 23:28:39 2001
From: DavidA at ActiveState.com (David Ascher)
Date: Sat, 30 Jun 2001 14:28:39 -0700
Subject: [Python-Dev] Support for "wide" Unicode characters
References: <20010630141524.E029999C80@waltz.rahul.net> <3B3E23D3.69D591DD@ActiveState.com>
Message-ID: <3B3E4487.40054EAE@ActiveState.com>

> "The Energy is the mass of the object times the speed of light times
> two."

Actually, it's "squared", not times two. At least in my universe =)

--david-Unicode-idiot-much-to-Paul's-dismay-ascher

From m.favas at per.dem.csiro.au Fri Jun 1 00:41:13 2001
From: m.favas at per.dem.csiro.au (Mark Favas)
Date: Fri, 01 Jun 2001 06:41:13 +0800
Subject: [Python-Dev] One more dict trick
Message-ID: <3B16C889.C01905BD@per.dem.csiro.au>

Tried the patch (thanks, Tim!) - but I guess the things I'm running
aren't too sensitive to dict speed . I see a slight speed-up,
around 1-2%... Nice, elegant patch that should go places! Maybe the
bio-informatics people on c.l.py (Andrew Dalke?) would be interested
in trying it out?

--
Mark Favas - m.favas at per.dem.csiro.au
CSIRO, Private Bag No 5, Wembley, Western Australia 6913, AUSTRALIA

From tim.one at home.com Fri Jun 1 02:24:01 2001
From: tim.one at home.com (Tim Peters)
Date: Thu, 31 May 2001 20:24:01 -0400
Subject: [Python-Dev] One more dict trick
In-Reply-To:
Message-ID:

Another version of the patch attached, a bit faster and with a large
new comment block explaining it. It's looking good! As I hope the new
comments make clear, nothing about this approach is "a mystery" --
there are explainable reasons for each fiddly bit.
This gives me more confidence in it than in the previous approach, and, indeed, it turned out that when I *thought* "hmm! I bet this change would be a little faster!", it actually was . -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: dict.txt URL: From tim.one at home.com Fri Jun 1 03:32:30 2001 From: tim.one at home.com (Tim Peters) Date: Thu, 31 May 2001 21:32:30 -0400 Subject: [Python-Dev] One more dict trick In-Reply-To: <20010531044332.B5026@thyrsus.com> Message-ID: Heh. I was implementing 128-bit floats in software, for Cray, in about 1980. They didn't do it because they *wanted* to make the Cray boxes look like pigs . A 128-bit float type is simply necessary for some scientific work: not all problems are well-conditioned, and the "extra" bits can vanish fast. Went thru the same bit at KSR. Just yesterday Konrad Hinsen was worrying on c.l.py that his scripts that took 2 hours using native floats zoomed to 5 days when he started using GMP's arbitrary-precision float type *just* to get 100 bits of precision. When KSR died, the KSR-3 on the drawing board had 128-bit registers. I was never quite sure why the founders thought that would be a killer selling point, but it wasn't for floats. Down in the trenches we thought it would be mondo cool to have an address space so large that for the rest of our lives we'd never need to bother calling free() again <0.8 wink>. From tim.one at home.com Fri Jun 1 03:46:11 2001 From: tim.one at home.com (Tim Peters) Date: Thu, 31 May 2001 21:46:11 -0400 Subject: [Python-Dev] One more dict trick In-Reply-To: <20010531124533.J690@xs4all.nl> Message-ID: [Thomas Wouters] > Why ? Bumping register size doesn't mean Intel expects to use it all as > address space. They could be used for video-processing, Bingo. Common wisdom holds that vector machines are dead, but the truth is virtually *everyone* runs on a vector box now: Intel just renamed "vector" to "multimedia" (or AMD to "3D Now!"), and adopted a feeble (but ever-growing) subset of traditional vector machines' instruction sets. > or to represent a modest range of rationals , or to help core > 'net routers deal with those nasty IPv6 addresses. KSR's founders had in mind bit-level addressability of networks of machines spanning the globe. Were he to press the point, though, I'd have to agree with Eric that they didn't really *need* 128 bits for that modest goal. > I'm sure cryptomunchers would like bigger registers as well. Agencies we can't talk about would like them as big as they can get them. Each vector register in a Cray box actually consisted of 64 64-bit words, or 4K bits per register. Some "special" models were constructed where the vector FPU was thrown away and additional bit-fiddling units added in its place: they really treated the vector registers as giant bitstrings, and didn't want to burn 64 clock cycles just to do, e.g., "one" conceptual xor. > Oh wait... I get it! You were trying to get yourself in the > historybooks as the guy that said "64 bits ought to be enough for > everyone" :-) That would be foolish indeed! 128, though, now *that's* surely enough for at least a decade . From fdrake at acm.org Fri Jun 1 03:45:45 2001 From: fdrake at acm.org (Fred L. Drake, Jr.) 
Date: Thu, 31 May 2001 21:45:45 -0400 (EDT) Subject: [Python-Dev] One more dict trick In-Reply-To: References: <20010531044332.B5026@thyrsus.com> Message-ID: <15126.62409.909290.736779@cj42289-a.reston1.va.home.com> Tim Peters writes: > When KSR died, the KSR-3 on the drawing board had 128-bit registers. I was > never quite sure why the founders thought that would be a killer selling > point, but it wasn't for floats. Down in the trenches we thought it would > be mondo cool to have an address space so large that for the rest of our > lives we'd never need to bother calling free() again <0.8 wink>. And given what (little) I know about the memory architecture on those things, that actually would have be quite reasonable on that platform! -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From tim.one at home.com Fri Jun 1 04:23:47 2001 From: tim.one at home.com (Tim Peters) Date: Thu, 31 May 2001 22:23:47 -0400 Subject: [Python-Dev] FW: CP4E and Python newbies, it works! Message-ID: Good for the soul! -----Original Message----- From: python-list-admin at python.org [mailto:python-list-admin at python.org]On Behalf Of Ron Stephens [mailto:rdsteph at earthlink.net] Sent: Thursday, May 31, 2001 7:12 PM To: python-list at python.org Subject: CP4E and Python newbies, it works! I am a complete newbie, and with a very low programming IQ. Although I had programmed a little in college thirty years ago, in Basic, PL/1 and a very little assembler, and fooled around in later years on PC's at home with Basic, then tried PERL, then an effort at Java, they were all too much trouble to really use to program, given that it was a *hobby* that was supposed to be fun. After all, I have a demanding day job that has nothing to do with software, that requires extensive travel, and four kids, a wife, two dogs, and a cat. Java et al, by the time I had digested a couple of books and put in a lot of hours, was just no fun at all to program; and I had to look in the book every other line of code just to recall the syntax etc.; I could not keep it in my head. Now, four months into Python, after being attracted by reading a blurb about Guido van Rossum's Computer Programming for Everybody project, I am in awe of his achievement. I am having fun; and if I can do so then almost anyone can. I am really absent minded, lazy, and not good at detail. Yet I have done the following in four months, and I believe Python therefore has the potential to open up programming to a much wider audience for a lot of people, which is nice: 1. I have written a half dozen scripts that are meaningful to me in Python, more than I ever accomplished with any other language. 2. I am able to have fun by sitting down in the evening, or especially on a weekend, and just programming in Python. The syntax and keywords are gratifyingly just in my head, enough anyway that I can just program like I am having a conversation, and check the details later for errors etc. This is the most satisfying thing of all. 3. I find the debugger just works; magically, it helps me turn my scripts into actual working programs, simply by rather mindlessly following the road laid out for me by using the debugger. 4. I have pleasurably read more Python books from front cover to back than I care to admit. I must be enjoying myself ;-))) 5. I am exploring Jython, which is also pleasurable. 
After fooling around with Java a couple of years ago, it is really a kick to see jython generating such detailed Java code for me, just as if I had written it (but it would have taken me untold pain to actually do so in Java). Whether or not I actually end up using the java code so generated, I still am enjoying the sheer experience. 6. I have Zope and other things to look forward to. 7. I am able to enjoy the discussions on this newsgroup, even though they are over my head technically. I find them intriguing. Now, I may never actually accomplish anything truly useful by my programming. But I am happy. I hope that others, younger and brighter than myself, who have an interest in programming, but need the right stimulus to get going, will find Python and produce programs of real value. I think Guido van Rossum and his team should be very proud of what they are enabling. The CP4E idea is alive and well. My hat's off to Guido and the whole community which he has spawned, especially those on this newsgroup. I am humbled and honored to read your erudite technical discussions, as a voyeur of mysteries and wonders I can only dimly see on the horizon, but that nonetheless fill me with mental delight. Ron Stephens -- http://mail.python.org/mailman/listinfo/python-list From esr at thyrsus.com Fri Jun 1 05:51:48 2001 From: esr at thyrsus.com (Eric S. Raymond) Date: Thu, 31 May 2001 23:51:48 -0400 Subject: [Python-Dev] One more dict trick In-Reply-To: ; from tim.one@home.com on Thu, May 31, 2001 at 09:32:30PM -0400 References: <20010531044332.B5026@thyrsus.com> Message-ID: <20010531235148.B14591@thyrsus.com> Tim Peters : > A 128-bit float type is simply necessary for some > scientific work: not all problems are well-conditioned, and the "extra" > bits can vanish fast. Makes me wonder how competent your customers' numerical analysts were. Where the heck did they think they were getting data with that many digits of accuracy? (Note that I didn't say "precision"...) -- Eric S. Raymond Strict gun laws are about as effective as strict drug laws...It pains me to say this, but the NRA seems to be right: The cities and states that have the toughest gun laws have the most murder and mayhem. -- Mike Royko, Chicago Tribune From esr at thyrsus.com Fri Jun 1 05:54:33 2001 From: esr at thyrsus.com (Eric S. Raymond) Date: Thu, 31 May 2001 23:54:33 -0400 Subject: [Python-Dev] One more dict trick In-Reply-To: ; from tim.one@home.com on Thu, May 31, 2001 at 09:46:11PM -0400 References: <20010531124533.J690@xs4all.nl> Message-ID: <20010531235433.C14591@thyrsus.com> Tim Peters : > Agencies we can't talk about would like them as big as they can get them. > Each vector register in a Cray box actually consisted of 64 64-bit words, or > 4K bits per register. Some "special" models were constructed where the > vector FPU was thrown away and additional bit-fiddling units added in its > place: they really treated the vector registers as giant bitstrings, and > didn't want to burn 64 clock cycles just to do, e.g., "one" conceptual xor. You've got a point...but I don't think it's really economical to build that kind of hardware into general-purpose processors. You end up with a camel. You know, a horse designed by committee? -- Eric S. Raymond To make inexpensive guns impossible to get is to say that you're putting a money test on getting a gun. It's racism in its worst form. 
-- Roy Innis, president of the Congress of Racial Equality (CORE), 1988

From tim.one at home.com Fri Jun 1 08:58:08 2001
From: tim.one at home.com (Tim Peters)
Date: Fri, 1 Jun 2001 02:58:08 -0400
Subject: [Python-Dev] One more dict trick
In-Reply-To: <20010531235148.B14591@thyrsus.com>
Message-ID:

[Tim]
> A 128-bit float type is simply necessary for some scientific work: not
> all problems are well-conditioned, and the "extra" bits can vanish fast.

[ESR]
> Makes me wonder how competent your customers' numerical analysts were.
> Where the heck did they think they were getting data with that many
> digits of accuracy? (Note that I didn't say "precision"...)

Not all scientific work consists of predicting the weather with inputs
known to half a digit on a calm day . Knuth gives examples of
ill-conditioned problems where resorting to unbounded rationals is
faster than any known stable f.p. approach (stuck with limited
precision) -- think, e.g., chaotic systems here, which includes parts
of many hydrodynamics problems in real life.

Some scientific work involves modeling ab initio across trillions of
computations (and on a Cray box in particular, where addition didn't
even bother to round, nor multiplication bother to compute the full
product tree, the error bounds per operation were much worse than in
a 754 world). You shouldn't overlook either that algorithms often
needed massive rewriting to exploit vector and parallel
architectures, and in a world where a supremely competent numerical
analyst can take a month to verify the numerical robustness of a new
algorithm covering two pages of Fortran, a million lines of massively
reworked seat-of-the-pants modeling code couldn't be trusted at all
without running it under many conditions in at least two precisions
(it only takes one surprise catastrophic cancellation to destroy
everything).

A major oil company once threatened to sue Cray when their reservoir
model produced wildly different results under a new release of the
compiler. Some exceedingly sharp analysts worked on that one for a
solid week. Turned out the new compiler evaluated a subexpression
A*B*C by doing (B*C) first instead of (A*B), because it was faster in
context (and fine to do so by Fortran's rules). It so happened A was
very large, and B and C both small, and doing B*C first caused the
whole product to underflow to zero where doing A*B first left a
product of roughly C's magnitude.

I can't imagine how they ever would have found this if they weren't
able to recompile the code using twice the precision (which worked
fine thanks to the larger dynamic range), then tracing to see where
the runs diverged. Even then it took a week because this was 100s of
thousands of lines of crufty Fortran that ran for hours on the
world's then-fastest machine before delivering bogus results.

BTW, if you think the bulk of the world's numeric production code has
even been *seen* by a qualified numerical analyst, you should ride on
planes more often .
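The underflow trap in that A*B*C story is easy to reproduce with any
IEEE-754 doubles; a sketch (the magnitudes are invented for
illustration, nothing from the actual reservoir model):

    >>> A, B, C = 1e200, 1e-200, 1e-200
    >>> A * (B * C)   # B*C is 1e-400: below the double range, so 0.0
    0.0
    >>> # (A * B) * C instead computes A*B ~= 1.0 first, so the final
    >>> # product survives at roughly C's magnitude (about 1e-200).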
From tim.one at home.com Fri Jun 1 09:08:28 2001
From: tim.one at home.com (Tim Peters)
Date: Fri, 1 Jun 2001 03:08:28 -0400
Subject: [Python-Dev] One more dict trick
In-Reply-To: <20010531235433.C14591@thyrsus.com>
Message-ID:

[ESR]
> You've got a point...

Well, really, they do -- but they had a much more compelling point
when the Cold War came with an unlimited budget.

> but I don't think it's really economical to build that kind of
> hardware into general-purpose processors.

Economical? The marginal cost of adding even nutso new features in
silicon now for mass-market chips is pretty close to zero. Indeed, if
you're in the speech recog or 3D imaging games (i.e., things that
still tax a PC), Intel comes around *begging* for new ideas to use up
all their chip real estate. The only one I recall them turning down
was a request from Dragon's founder to add an instruction that, given
x and y, returned log(exp(x)+exp(y)). They were skeptical, and turned
out even *we* didn't need it .

> You end up with a camel. You know, a horse designed by committee?

Yup! But that's the camel Intel rides to the bank, so it will
probably grow more humps, on which to hang more bags of gold.

From esr at thyrsus.com Fri Jun 1 09:23:16 2001
From: esr at thyrsus.com (Eric S. Raymond)
Date: Fri, 1 Jun 2001 03:23:16 -0400
Subject: [Python-Dev] One more dict trick
In-Reply-To: ; from tim.one@home.com on Fri, Jun 01, 2001 at 02:58:08AM -0400
References: <20010531235148.B14591@thyrsus.com>
Message-ID: <20010601032316.A15635@thyrsus.com>

Tim Peters :
> Not all scientific work consists of predicting the weather with inputs known
> to half a digit on a calm day . Knuth gives examples of
> ill-conditioned problems where resorting to unbounded rationals is faster
> than any known stable f.p. approach (stuck with limited precision) -- think,
> e.g., chaotic systems here, which includes parts of many hydrodynamics
> problems in real life.

Hmmm...good answer. I still believe it's the case that real-world
measurements max out below 48 bits or so of precision because the
real world is a noisy, fuzzy place. But I can see that most of the
algorithms for partial differential equations would multiply those by
very small or very large quantities repeatedly. The range-doubling
trick for catching divergences is neat, too. So maybe there's a
market for 128-bit floats after all.

I'm still skeptical about how likely those applications are to
influence the architecture of general-purpose processors. I saw a
study once that said heavy-duty scientific floating point only
accounts for about 2% of the computing market -- and I think it's
significant that MMX instructions and so forth entered the Intel line
to support *games*, not Navier-Stokes calculations. That 2% will have
to get a lot bigger before I can see Intel doubling its word size
again. It's not just the processor design; the word size has huge
implications for buses, memory controllers, and the whole system
architecture.

--
Eric S. Raymond

The United States is in no way founded upon the Christian religion
-- George Washington & John Adams, in a diplomatic message to Malta.

From pf at artcom-gmbh.de Fri Jun 1 09:22:50 2001
From: pf at artcom-gmbh.de (Peter Funk)
Date: Fri, 1 Jun 2001 09:22:50 +0200 (MEST)
Subject: [Python-Dev] precision thread (was One more dict trick)
Message-ID:

Eric:
> > You end up with a camel. You know, a horse designed by committee?

Tim:
> Yup! But that's the camel Intel rides to the bank, so it will probably grow
> more humps, on which to hang more bags of gold.

cam*ls? Guido is only one week on vacation and soon heretical words
show up here. ;-)

sorry, couldn't resist, Peter

From thomas at xs4all.net Fri Jun 1 09:28:01 2001
From: thomas at xs4all.net (Thomas Wouters)
Date: Fri, 1 Jun 2001 09:28:01 +0200
Subject: [Python-Dev] Damn...
I think I might have just muffed a checkin
In-Reply-To: <15126.34825.167026.520535@beluga.mojam.com>; from skip@pobox.com on Thu, May 31, 2001 at 01:06:01PM -0500
References: <15126.34825.167026.520535@beluga.mojam.com>
Message-ID: <20010601092800.K690@xs4all.nl>

On Thu, May 31, 2001 at 01:06:01PM -0500, Skip Montanaro wrote:

> I just updated httplib.py to expand the list of names in its __all__ list.
> I was operating on version 1.34. After the checkin I am looking at version
> 1.34.2.1. I see that Lib/CVS/Tag exists in my directory tree and says
> "release21-maint". Did I muff it? If so, how should I do an unmuff
> operation?

You had a sticky tag on the file, probably because you used
'-rrelease21-maint' on a cvs checkout or update. Good thing it was
release21-maint, though, and not some random other revision, or you
would have created another branch :-) You can remove stickyness by
using 'cvs update -A'. I personally just have two trees,
~/python/python-2.2 and ~/python/python-2.1.1, where the last one was
checked out with -rrelease21-maint.

--
Thomas Wouters

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!

From gmcm at hypernet.com Fri Jun 1 13:29:28 2001
From: gmcm at hypernet.com (Gordon McMillan)
Date: Fri, 1 Jun 2001 07:29:28 -0400
Subject: [Python-Dev] One more dict trick
In-Reply-To:
References: <20010531235433.C14591@thyrsus.com>
Message-ID: <3B174458.1998.46DEEE2B@localhost>

[ESR]
> > You end up with a camel. You know, a horse designed by
> > committee?

[Tim]
> Yup! But that's the camel Intel rides to the bank, so it will
> probably grow more humps, on which to hang more bags of gold.

Been a camel a long time, too. x86 assembler is the, er, Perl of
assemblers.

- Gordon

From mwh at python.net Fri Jun 1 13:54:40 2001
From: mwh at python.net (Michael Hudson)
Date: 01 Jun 2001 12:54:40 +0100
Subject: [Python-Dev] another dict crasher
Message-ID:

Adapted from a report on comp.lang.python from Wolfgang Lipp:

    class Child:
        def __init__(self, parent):
            self.__dict__['parent'] = parent
        def __getattr__(self, attr):
            self.parent.a = 1
            self.parent.b = 1
            self.parent.c = 1
            self.parent.d = 1
            self.parent.e = 1
            self.parent.f = 1
            self.parent.g = 1
            self.parent.h = 1
            self.parent.i = 1
            return getattr(self.parent, attr)

    class Parent:
        def __init__(self):
            self.a = Child(self)

    print Parent().__dict__

segfaults both 2.1 and current (well, maybe a day old) CVS. Haven't
tried Tim's latest patch, but I don't believe that will make any
difference.

It's obvious what's happening; the dict's resizing inside the for
loop in dict_repr and the ep pointer is dangling.

By the time we've shaken all of these out of dictobject.c it's going
to be pretty close to free-threading safe, I'd have thought.

reentrancy-sucks-ly y'rs
M.

--
But since I'm not trying to impress anybody in The Software Big Top,
I'd rather walk the wire using a big pole, a safety harness, a net,
and with the wire not more than 3 feet off the ground.
-- Grant Griffin, comp.lang.python

From mwh at python.net Fri Jun 1 14:12:55 2001
From: mwh at python.net (Michael Hudson)
Date: 01 Jun 2001 13:12:55 +0100
Subject: [Python-Dev] another dict crasher
In-Reply-To: Michael Hudson's message of "01 Jun 2001 12:54:40 +0100"
References:
Message-ID:

Michael Hudson writes:

> Adapted from a report on comp.lang.python from Wolfgang Lipp:

[snip]

> segfaults both 2.1 and current (well, maybe a day old) CVS. Haven't
> tried Tim's latest patch, but I don't believe that will make any
> difference.
> It's obvious what's happening; the dict's resizing inside the
> for loop in dict_repr and the ep pointer is dangling.

Actually this crash was dict_print (I always forget about
tp_print...). It's pretty easy to mend:

    *** dictobject.c	Fri Jun  1 13:08:13 2001
    --- dictobject.c-fixed	Fri Jun  1 12:59:07 2001
    ***************
    *** 793,795 ****
      	any = 0;
    ! 	for (i = 0, ep = mp->ma_table; i < mp->ma_size; i++, ep++) {
      		if (ep->me_value != NULL) {
    --- 793,796 ----
      	any = 0;
    ! 	for (i = 0; i < mp->ma_size; i++) {
    ! 		ep = &mp->ma_table[i];
      		if (ep->me_value != NULL) {
    ***************
    *** 833,835 ****
      	any = 0;
    ! 	for (i = 0, ep = mp->ma_table; i < mp->ma_size && v; i++, ep++) {
      		if (ep->me_value != NULL) {
    --- 834,837 ----
      	any = 0;
    ! 	for (i = 0; i < mp->ma_size && v; i++) {
    ! 		ep = &mp->ma_table[i];
      		if (ep->me_value != NULL) {

I'm not sure this stops still more Machiavellian behaviour from
crashing the interpreter, and you can certainly get items being
printed more than once or not at all. I'm not sure this last is a
problem; if the user's being this contrary there's only so much we
can do to help him or her.

Cheers,
M.

--
I also feel it essential to note, [...], that Description Logics,
non-Monotonic Logics, Default Logics and Circumscription Logics can
all collectively go suck a cow. Thank you.
-- http://advogato.org/person/Johnath/diary.html?start=4

From pedroni at inf.ethz.ch Fri Jun 1 14:49:11 2001
From: pedroni at inf.ethz.ch (Samuele Pedroni)
Date: Fri, 1 Jun 2001 14:49:11 +0200 (MET DST)
Subject: [Python-Dev] __xxxattr__ caching semantic
Message-ID: <200106011249.OAA05837@core.inf.ethz.ch>

Hi.

What is the intended semantic wrt __xxxattr__ caching:

    class X:
        pass

    def cga(self, name):
        print name

    def iga(name):
        print name

    x = X()
    x.__dict__['__getattr__'] = iga   # 1.
    x.__getattr__ = iga               # 2.
    X.__dict__['__getattr__'] = cga   # 3.
    X.__getattr__ = cga               # 4.
    x.a

According to the manual
http://www.python.org/doc/current/ref/customization.html
x.a should fail with all the variants; they should have no effect.
In practice 4. works. Is that an implementation/manual mismatch, is
this intended, is there code around using 4.?

I'm asking this because jython has differences/bugs in this respect.
I imagine that 1.-4. should work for all other __magic__ methods
(this should be fixed in jython for some methods), OTOH jython has
such a restriction on __del__ too, and this one cannot be removed
(is not simply a matter of caching/non caching).

regards, Samuele Pedroni.

From Greg.Wilson at baltimore.com Fri Jun 1 14:59:28 2001
From: Greg.Wilson at baltimore.com (Greg Wilson)
Date: Fri, 1 Jun 2001 08:59:28 -0400
Subject: [Python-Dev] re: %b format
Message-ID: <930BBCA4CEBBD411BE6500508BB3328F2E1E47@nsamcanms1.ca.baltimore.com>

My thanks to everyone who commented on the idea of adding a binary
format specifier to Python. I'll volunteer to draft the PEP ---
volunteers for a co-author?

Greg
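For reference, a sketch of the conversion such a %b specifier might
perform, written as a plain Python function (the helper name and the
non-negative-only restriction are invented here for illustration):

    def binstr(n):
        # peel off binary digits, least significant first
        if n < 0:
            raise ValueError, "sketch handles non-negative values only"
        digits = []
        while 1:
            digits.append(chr(ord('0') + (n & 1)))
            n = n >> 1
            if n == 0:
                break
        digits.reverse()
        return ''.join(digits)

    # e.g. binstr(10) == '1010', binstr(0) == '0'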
From tismer at tismer.com Fri Jun 1 15:56:26 2001
From: tismer at tismer.com (Christian Tismer)
Date: Fri, 01 Jun 2001 15:56:26 +0200
Subject: [Python-Dev] One more dict trick
References:
Message-ID: <3B179F0A.CFA3B2C@tismer.com>

Tim Peters wrote:
>
> Another version of the patch attached, a bit faster and with a large new
> comment block explaining it. It's looking good! As I hope the new comments
> make clear, nothing about this approach is "a mystery" -- there are
> explainable reasons for each fiddly bit. This gives me more confidence in
> it than in the previous approach, and, indeed, it turned out that when I
> *thought* "hmm! I bet this change would be a little faster!", it actually
> was .

Thanks a lot for this nice patch. It looks like a real improvement.

Also thanks for mentioning my division idea. Since all bits of the
hash are eventually taken into account, this idea has somehow
survived in an even more efficient solution, good end, file closed.
(and good that I saved the time to check my patch in, lately :-)

cheers - chris

--
Christian Tismer :^)
Mission Impossible 5oftware : Have a break! Take a ride on Python's
Kaunstr. 26 : *Starship* http://starship.python.net/
14163 Berlin : PGP key -> http://wwwkeys.pgp.net/
PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF
where do you want to jump today? http://www.stackless.com/

From pedroni at inf.ethz.ch Fri Jun 1 16:18:20 2001
From: pedroni at inf.ethz.ch (Samuele Pedroni)
Date: Fri, 1 Jun 2001 16:18:20 +0200 (MET DST)
Subject: [Python-Dev] Re: [Jython-dev] Using PyChecker in Jython
Message-ID: <200106011418.QAA13570@core.inf.ethz.ch>

Hi.

[Neal Norwitz]
> Hello!
>
> I have created a program PyChecker to perform Python source code checking.
> (http://pychecker.sourceforge.net).
>
> PyChecker is implemented in C Python and does some "tricky" things.
> It doesn't currently work in Jython due to the module dis (disassemble code)
> not being available in Jython.
>
> Is there any fundamental problem with getting PyChecker to work under Jython?
>
> Here's a high-level overview of what PyChecker does:
>
>     imp.find_module()
>     imp.load_module()
>     for each object in dir(module):
>         # object can be a class, function, imported module, etc.
>         for each instruction in disassembled byte code:
>             # handle each instruction appropriately
>
> This hides a lot of details, but I do lots of things like getting the
> code objects from the classes, methods, and functions, look at the
> arguments in functions, etc.
>
> Is it possible to make it work in Jython? Easy?
>
> Thanks for any guidance,
> Neal

It would be great - really - but about easy? As easy as making
PyChecker work on source code without using dis and without
importing/executing modules and their top defs. I think there will be
no dis support on jython side (we produce java bytecode and getting
"back" to python vm bytecode would be very tricky, not very elegant,
etc.) any time soon .
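For concreteness, a minimal sketch of the CPython-only bytecode walk
that Neal's overview implies (Python 2.x; dis.opname and
dis.HAVE_ARGUMENT are real, the function name is invented, and
EXTENDED_ARG is ignored for brevity):

    import dis

    def walk_ops(code):
        # step through the raw bytecode string of a code object
        bytes = code.co_code
        i, n = 0, len(bytes)
        while i < n:
            op = ord(bytes[i])
            if op >= dis.HAVE_ARGUMENT:
                arg = ord(bytes[i+1]) + 256 * ord(bytes[i+2])
                i = i + 3
            else:
                arg = None
                i = i + 1
            print dis.opname[op], arg

It is exactly this kind of co_code poking that has no Jython
equivalent, since Jython compiles straight to JVM bytecode.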
Seriously, two possible workaround hacks (they are also not very
easy); this is just after small brainstorming and ignoring the
concrete needs and code of PyChecker:

+) the more elegant one, but maybe still too difficult or requiring
too much work: let PyChecker run under CPython even when checking
jython code. Jython code can compile down to py vm bytecode but then
does not run: why? java class imports and the jython specific builtin
modules (not so many). So one needs to implement a sufficient amount
of python code (an import hook, etc) that does the minimal partial
evaluation required and the required amount of loading&introspection
on java, jython specific stuff in order to have the imports work and
PyChecker fed with the things it needs. This means dealing with the
java class format, or a two-pass approach: run the code under jython
in order to gather the information needed to load it successfully
under python. If the top level code contains conditionals that depend
on jython stuff this could be hard, but one can ignore that (at least
for starting). Clearly the main PyChecker loop would require some
adaptation, and maybe include some logic to check some jython
specific stuff (subclassing from java, etc).

*) let an adapted PyChecker run under jython, and obtain the needed
py vm bytecode stream from a source -> py vm bytecode compiler
written in python (such a thing exists - if I remember well).

And similar ideas ...

regards, Samuele Pedroni.

From barry at digicool.com Fri Jun 1 16:43:59 2001
From: barry at digicool.com (Barry A. Warsaw)
Date: Fri, 1 Jun 2001 10:43:59 -0400
Subject: [Python-Dev] Damn... I think I might have just muffed a checkin
References: <15126.34825.167026.520535@beluga.mojam.com> <20010601092800.K690@xs4all.nl>
Message-ID: <15127.43567.202950.192811@anthem.wooz.org>

>>>>> "TW" == Thomas Wouters writes:

  TW> You can remove stickyness by using 'cvs update -A'. I
  TW> personally just have two trees, ~/python/python-2.2 and
  TW> ~/python/python-2.1.1, where the last one was checked out with
  TW> -rrelease21-maint.

Very good advice for anybody playing with branches!

-Barry

From barry at digicool.com Fri Jun 1 17:12:33 2001
From: barry at digicool.com (Barry A. Warsaw)
Date: Fri, 1 Jun 2001 11:12:33 -0400
Subject: [Python-Dev] another dict crasher
References:
Message-ID: <15127.45281.435849.822222@anthem.wooz.org>

>>>>> "MH" == Michael Hudson writes:

  MH> segfaults both 2.1 and current (well, maybe a day old) CVS.
  MH> Haven't tried Tim's latest patch, but I don't believe that
  MH> will make any difference.

That is highly, highly nasty. Sounds to me like there ought to be an
emergency 2.1.1 patch made for this, bumping Thomas's work to 2.1.2
if necessary. And if we can trojan in the NAIPL (New And Improved
Python License), I wouldn't mind. :)

-Barry

From jeremy at digicool.com Fri Jun 1 17:18:05 2001
From: jeremy at digicool.com (Jeremy Hylton)
Date: Fri, 1 Jun 2001 11:18:05 -0400 (EDT)
Subject: [Python-Dev] another dict crasher
In-Reply-To: <15127.45281.435849.822222@anthem.wooz.org>
References: <15127.45281.435849.822222@anthem.wooz.org>
Message-ID: <15127.45613.947590.246269@slothrop.digicool.com>

>>>>> "BAW" == Barry A Warsaw writes:
>>>>> "MH" == Michael Hudson writes:

  MH> segfaults both 2.1 and current (well, maybe a day old) CVS.
  MH> Haven't tried Tim's latest patch, but I don't believe that will
  MH> make any difference.

  BAW> That is highly, highly nasty.
  BAW> Sounds to me like there ought to be an emergency 2.1.1 patch
  BAW> made for this, bumping Thomas's work to 2.1.2 if necessary.
  BAW> And if we can trojan in the NAIPL (New And Improved Python
  BAW> License), I wouldn't mind. :)

We can release a critical patch for this bug, a la the
CriticalPatches page for the Python 2.0 release.

Jeremy

From mwh at python.net Fri Jun 1 18:03:55 2001
From: mwh at python.net (Michael Hudson)
Date: Fri, 1 Jun 2001 17:03:55 +0100 (BST)
Subject: [Python-Dev] another dict crasher
In-Reply-To: <15127.45281.435849.822222@anthem.wooz.org>
Message-ID:

On Fri, 1 Jun 2001, Barry A. Warsaw wrote:
>
> >>>>> "MH" == Michael Hudson writes:
>
> MH> segfaults both 2.1 and current (well, maybe a day old) CVS.
> MH> Haven't tried Tim's latest patch, but I don't believe that
> MH> will make any difference.
>
> That is highly, highly nasty.

Yes.

> Sounds to me like there ought to be an emergency 2.1.1 patch made for
> this, bumping Thomas's work to 2.1.2 if necessary.

Really? Two mild counterpoints:

1) It's *old*; 1.5.2 at least, and that's only because that's the
oldest version I happen to have lying around. It's quite similar to
the test_mutants oddness in some ways.

2) There's at least one other crasher in 2.1; the one in the compiler
where a variable is referenced in a class and in a contained method.
(I've actually run into that one).

But a "fix these crashers" release seems reasonable if there's
someone with the time to put it out (not me!).

> And if we can trojan in the NAIPL (New And Improved Python
> License), I wouldn't mind. :)

Well me neither...

Cheers,
M.

From skip at pobox.com Fri Jun 1 18:26:35 2001
From: skip at pobox.com (Skip Montanaro)
Date: Fri, 1 Jun 2001 11:26:35 -0500
Subject: [Python-Dev] Damn... I think I might have just muffed a checkin
In-Reply-To: <20010601092800.K690@xs4all.nl>
References: <15126.34825.167026.520535@beluga.mojam.com> <20010601092800.K690@xs4all.nl>
Message-ID: <15127.49723.186388.220648@beluga.mojam.com>

  Thomas> I personally just have two trees, ~/python/python-2.2 and
  Thomas> ~/python/python-2.1.1, where the last one was checked out with
  Thomas> -rrelease21-maint.

Thanks, good advice. httplib.py has now been updated on both the head
and release21-maint branches.

Skip

From loewis at informatik.hu-berlin.de Fri Jun 1 19:07:52 2001
From: loewis at informatik.hu-berlin.de (Martin von Loewis)
Date: Fri, 1 Jun 2001 19:07:52 +0200 (MEST)
Subject: [Python-Dev] METH_NOARGS calling convention
Message-ID: <200106011707.TAA21329@pandora.informatik.hu-berlin.de>

The patch

http://sourceforge.net/tracker/?func=detail&atid=305470&aid=427190&group_id=5470

introduces two new calling conventions, METH_O and METH_NOARGS. The
rationale for METH_O has been discussed already; the rationale for
METH_NOARGS is that it allows a convenient simplification (plus a
marginal speed-up) of functions which do either PyArg_NoArgs(args) or
PyArg_ParseTuple(args, ":function_name").

Now, one open issue is whether the METH_NOARGS functions should have
a signature of

    PyObject * (*unaryfunc)(PyObject *);

or of

    PyObject *(*PyCFunction)(PyObject *, PyObject *);

which then would be called with a NULL second argument; the first
argument would be self in either case.

IMO, the advantage of passing the NULL argument is that NOARGS
methods don't need to be cast into PyCFunction in the method table;
the advantage of the first approach is that it is clearer in the
function implementation.

Any opinions which signature to use?

Regards,
Martin
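To make the tradeoff concrete, a sketch of how each candidate
signature reads in an extension module (the module and function names
are invented for illustration; this is not code from the patch):

    #include "Python.h"

    /* Option 1: self-only signature -- clearer in the function body,
       but needs a cast in the method table. */
    static PyObject *
    thing_clear(PyObject *self)
    {
            Py_INCREF(Py_None);
            return Py_None;
    }

    /* Option 2: full PyCFunction signature -- no cast needed, but
       'args' is always NULL here and otherwise unused. */
    static PyObject *
    thing_reset(PyObject *self, PyObject *args)
    {
            Py_INCREF(Py_None);
            return Py_None;
    }

    static PyMethodDef thing_methods[] = {
            {"clear", (PyCFunction)thing_clear, METH_NOARGS},
            {"reset", thing_reset,              METH_NOARGS},
            {NULL, NULL}
    };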
From mal at lemburg.com Fri Jun 1 19:18:21 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Fri, 01 Jun 2001 19:18:21 +0200
Subject: [Python-Dev] METH_NOARGS calling convention
References: <200106011707.TAA21329@pandora.informatik.hu-berlin.de>
Message-ID: <3B17CE5D.9D4CE8D4@lemburg.com>

Martin von Loewis wrote:
>
> The patch
>
> http://sourceforge.net/tracker/?func=detail&atid=305470&aid=427190&group_id=5470
>
> introduces two new calling conventions, METH_O and METH_NOARGS. The
> rationale for METH_O has been discussed already; the rationale for
> METH_NOARGS is that it allows a convenient simplification (plus a
> marginal speed-up) of functions which do either PyArg_NoArgs(args) or
> PyArg_ParseTuple(args, ":function_name").
>
> Now, one open issue is whether the METH_NOARGS functions should have
> a signature of
>
>     PyObject * (*unaryfunc)(PyObject *);
>
> or of
>
>     PyObject *(*PyCFunction)(PyObject *, PyObject *);
>
> which then would be called with a NULL second argument; the first
> argument would be self in either case.
>
> IMO, the advantage of passing the NULL argument is that NOARGS methods
> don't need to be cast into PyCFunction in the method table; the
> advantage of the first approach is that it is clearer in the function
> implementation.
>
> Any opinions which signature to use?

The second... I'm not sure how you will get extension writers who
have to maintain packages for all three Python versions to ever
change their code to use the new style calling scheme: there simply
is no clean way to use the same code base unless you are willing to
add tons of #ifdefs.

--
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting: http://www.egenix.com/
Python Software: http://www.lemburg.com/python/

From fdrake at acm.org Fri Jun 1 19:31:15 2001
From: fdrake at acm.org (Fred L. Drake, Jr.)
Date: Fri, 1 Jun 2001 13:31:15 -0400 (EDT)
Subject: [Python-Dev] METH_NOARGS calling convention
In-Reply-To: <3B17CE5D.9D4CE8D4@lemburg.com>
References: <200106011707.TAA21329@pandora.informatik.hu-berlin.de> <3B17CE5D.9D4CE8D4@lemburg.com>
Message-ID: <15127.53603.87216.103262@cj42289-a.reston1.va.home.com>

M.-A. Lemburg writes:
> > Any opinions which signature to use?
>
> The second...

Seconded. ;-)

> I'm not sure how you will get extension writers who
> have to maintain packages for all three Python versions to
> ever change their code to use the new style calling scheme:
> there simply is no clean way to use the same code base unless
> you are willing to add tons of #ifdefs.

You won't, and that's OK. Even if 3rd-party extensions never use it,
there are plenty of functions/methods in the standard distribution
which can use it, and I imagine those would be converted fairly
quickly.

-Fred

--
Fred L. Drake, Jr.
PythonLabs at Digital Creations

From tismer at tismer.com Fri Jun 1 20:29:11 2001
From: tismer at tismer.com (Christian Tismer)
Date: Fri, 01 Jun 2001 20:29:11 +0200
Subject: [Python-Dev] Marshal bug in 2.1?
Message-ID: <3B17DEF7.3E7C6BC6@tismer.com>

Hi friends,

there is a script which generates encrypted passwords for Starship
users. There is a series of marshal, zlib and base64 calls, which is
reversed by the script.

Is there a known bug in Marshal, or should I start the debugger now?
The passphrase for the attached script is "hey".

cheers - chris

--
Christian Tismer :^)
Mission Impossible 5oftware : Have a break! Take a ride on Python's
Kaunstr. 26 : *Starship* http://starship.python.net/
14163 Berlin : PGP key -> http://wwwkeys.pgp.net/
PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF
where do you want to jump today? http://www.stackless.com/

-------------- next part --------------
import marshal,base64,zlib
exec marshal.loads(zlib.decompress(base64.decodestring("""
eJytVM+PGzUUfs6PzWZYwapAqbbAuiyF6Yqsqt2iomq1HGkvuQQJaS+pM3YzbjP2yHY6CdrVHNr+
Exz5L/gn4MidC2f+Az5Pkq0QlFMnmTf2s+d73/vmPWeEq43b/wxT498mSXSOwbskGZ0zqm+QbNF5
i+o9km16idU21bdIdUh26GmLrCRWf0ayS8+6dN6l+oAU0XcP689JbZHcohfA6VF9mxQj1SbVi57r
2PAFqS7p7bVH9+kFkew1mDvA/JJUCziGEYs3AozS7ch1yIiSg7dwJfjxzCkRVFml4Q7ng8F6zgUv
hfeVdZLzJ84WXJgln+rnyvCgFuEIbzoV5s54/g3PcuFEFpTzvMp1lnPhFM9sUc6DklwboEmF5UIb
7YPO8PJkHvhz5ZbcWDOYaaOE45VYrmI18N/n2sctXlvDMczmPthC/wjEJ9bxUrtFTOBt6OAPoqSH
h4c85MqrdUaeT1SoFDIenJ0OmpyWdu5AxDllwmuB8GLC33gNzm7700EytBWfA3s0esiD5TM7hTAY
+IBIuS6PymXIrTkyKiRYjKL5+MI607nXZsrVAjLPlpHmFck0m+lyYgWIOAXRC2UkNHowuJMII+Mm
M10zv2K8QosojUvy0tmpE0WyomQLFfK4o7BIGgUhxWSmjhJ/F/U3CdVX/BHPRKyE2SwiA0mEVQgI
g49agXtmIVMWbmWMOvi1yZexyfaovhmb7BnRJWsGjC7RXh/TBZqgFdsO3XCJJvuELtqkO3RB0cPq
T5v5VmyTSwDt00WLdI/CduxQNGbc14pNGm2H+Ajgo7SLoEPfhz25e3x8cv/eyX0wYuADRjepAQpE
ga3jIP514H2E4SiNZ8NQj2E1h2nmPposd80TYnrUDi3SaFdD/37c8O9q9bF7T2eimEhxtk8+Hj6N
0XEh7W+wC/m134qT4PANGpdRVYMtm4V5KdGijSM0DqmnygffwfCp1WaFIsq0s+EU/gt4Bfh/ZDdn
wx75JJ6U7EN2je2y91izOh4XQpvxeOj3MStnSqC88f1RsqtSiMXKy9zB/8DvYs/jH/46fWR+q3+v
fv3lz5/+eJUmm5ylzRr6eB5vBif/4LAOaUShxuOrdKJoTlRjbXDWNN6wCFeSvdYmbcR+U65RiW9R
Dh/gufNOP+m3dnq7bIdtI9VrbJ/9DYOcdyU=
""")))

From tismer at tismer.com Fri Jun 1 20:47:02 2001
From: tismer at tismer.com (Christian Tismer)
Date: Fri, 01 Jun 2001 20:47:02 +0200
Subject: [Python-Dev] Marshal bug in 2.1?
References: <3B17DEF7.3E7C6BC6@tismer.com>
Message-ID: <3B17E326.41D82CCE@tismer.com>

Christian Tismer wrote:
>
> Hi friends,
>
> there is a script which generates encrypted passwords for
> Starship users. There is a series of marshal, zlib and base64
> calls, which is reversed by the script.
>
> Is there a known bug in Marshal, or should I start the debugger now?
> The passphrase for the attached script is "hey".

Aehmmm... can it be that code objects are no longer compatible
between Python 2.0 and 2.1?

sigh - ciao - chris

--
Christian Tismer :^)
Mission Impossible 5oftware : Have a break! Take a ride on Python's
Kaunstr. 26 : *Starship* http://starship.python.net/
14163 Berlin : PGP key -> http://wwwkeys.pgp.net/
PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF
where do you want to jump today? http://www.stackless.com/

From mwh at python.net Fri Jun 1 20:52:17 2001
From: mwh at python.net (Michael Hudson)
Date: 01 Jun 2001 19:52:17 +0100
Subject: [Python-Dev] another dict crasher
In-Reply-To: barry@digicool.com's message of "Fri, 1 Jun 2001 11:12:33 -0400"
References: <15127.45281.435849.822222@anthem.wooz.org>
Message-ID:

Warning! VERY SICK CODE INDEED ahead!

barry at digicool.com (Barry A. Warsaw) writes:

> >>>>> "MH" == Michael Hudson writes:
>
> MH> segfaults both 2.1 and current (well, maybe a day old) CVS.
> MH> Haven't tried Tim's latest patch, but I don't believe that
> MH> will make any difference.
>
> That is highly, highly nasty.

Not as nasty as this, though:

    dict = {}

    # let's force dict to malloc its table
    for i in range(1,10):
        dict[i] = i

    class Machiavelli:
        def __repr__(self):
            dict.clear()
            print # doesn't crash without this.
                  # don't know why
            return `"machiavelli"`
        def __hash__(self):
            return 0

    dict[Machiavelli()] = Machiavelli()

    print dict

gives, even with my posted patch to dictobject.c:

    $ ./python crash2.py
    {
    Segmentation fault (core dumped)

Any ideas what the above code should do? (Other than use the secret
PSU website to hire a hitman and shoot whoever wrote the code, I
mean).

Cheers,
M.

--
Well, yes. I don't think I'd put something like "penchant for anal
play" and "able to wield a buttplug" in a CV unless it was relevant
to the gig being applied for...
-- Matt McLeod, alt.sysadmin.recovery

From mal at lemburg.com Fri Jun 1 21:01:38 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Fri, 01 Jun 2001 21:01:38 +0200
Subject: [Python-Dev] Marshal bug in 2.1?
References: <3B17DEF7.3E7C6BC6@tismer.com> <3B17E326.41D82CCE@tismer.com>
Message-ID: <3B17E692.281A329B@lemburg.com>

Christian Tismer wrote:
>
> Christian Tismer wrote:
> >
> > there is a script which generates encrypted passwords for
> > Starship users. There is a series of marshal, zlib and base64
> > calls, which is reversed by the script.
> >
> > Is there a known bug in Marshal, or should I start the debugger now?
> > The passphrase for the attached script is "hey".
>
> Aehmmm... can it be that code objects are no longer compatible
> between Python 2.0 and 2.1?

Yes, not surprisingly though... AFAIK the pyc format changed in every
single version between 1.5.2 and 2.1.

--
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting: http://www.egenix.com/
Python Software: http://www.lemburg.com/python/

From tim.one at home.com Fri Jun 1 22:36:21 2001
From: tim.one at home.com (Tim Peters)
Date: Fri, 1 Jun 2001 16:36:21 -0400
Subject: [Python-Dev] another dict crasher
In-Reply-To:
Message-ID:

I suspect there are many ways to get the dict code to blow up, and
always have been. I picked on dict compare a month or so ago mostly
because nobody cares how fast that runs except in the == and !=
cases. Others are a real bitch; for example, the fundamental lookdict
function caches

    dictentry *ep0 = mp->ma_table;

at the start as if it were invariant -- but very unlikely sequences
of collisions with identical hash codes combined with mutating
comparisons can turn that into a bogus pointer.

List objects used to have similar vulnerabilities during sorting
(where comparison is the *norm*, not a one-in-a-billion freak
occurrence), and no amount of slow-the-code paranoia sufficed to plug
all conceivable holes. In the end we invented an internal "immutable
list type", and replace the list object's type pointer for the
duration of the sort (you can still try to mutate a list during a
sort, but all the mutating list methods are redirected to raise an
exception when you do).

The dict code has even more holes and in more places, but they're
generally much harder to provoke, so they've gone unnoticed for 10
years. All in all, seemed like a good tradeoff to me .

From tim.one at home.com Sat Jun 2 00:08:32 2001
From: tim.one at home.com (Tim Peters)
Date: Fri, 1 Jun 2001 18:08:32 -0400
Subject: [Python-Dev] METH_NOARGS calling convention
In-Reply-To: <200106011707.TAA21329@pandora.informatik.hu-berlin.de>
Message-ID:

Cool!

[Martin von Loewis]
> ...
> Now, one open issue is whether the METH_NOARGS functions should have
> a signature of
>
>     PyObject * (*unaryfunc)(PyObject *);
>
> or of
>
>     PyObject *(*PyCFunction)(PyObject *, PyObject *);
>
> which then would be called with a NULL second argument; the first
> argument would be self in either case.
>
> IMO, the advantage of passing the NULL argument is that NOARGS methods
> don't need to be cast into PyCFunction in the method table; the
> advantage of the first approach is that it is clearer in the function
> implementation.
>
> Any opinions which signature to use?

The one that makes sense : declare functions with the number of
arguments they use. I don't care about needing to cast in the table:
you do that once, but people read the *code* over and over, and an
unused arg will be a mystery (or even a source of compiler warnings)
every time you bump into one.

The only way needing to cast could be "a problem" is if this remains
an undocumented gimmick that developers have to reverse-engineer from
staring at the (distributed all over the place) implementation. I
like what the patch does, but I'd reject it just for continuing to
leave this stuff Utterly Mysterious: please add comments saying what
METH_NOARGS and METH_O *mean*: what's the point, why are these
defined, how and when are you supposed to use them? That's where to
explain the need to cast METH_NOARGS.

From thomas at xs4all.net Sat Jun 2 00:42:35 2001
From: thomas at xs4all.net (Thomas Wouters)
Date: Sat, 2 Jun 2001 00:42:35 +0200
Subject: [Python-Dev] another dict crasher
In-Reply-To: <15127.45281.435849.822222@anthem.wooz.org>; from barry@digicool.com on Fri, Jun 01, 2001 at 11:12:33AM -0400
References: <15127.45281.435849.822222@anthem.wooz.org>
Message-ID: <20010602004235.Q690@xs4all.nl>

On Fri, Jun 01, 2001 at 11:12:33AM -0400, Barry A. Warsaw wrote:
>
> >>>>> "MH" == Michael Hudson writes:
> MH> segfaults both 2.1 and current (well, maybe a day old) CVS.
> MH> Haven't tried Tim's latest patch, but I don't believe that
> MH> will make any difference.

> That is highly, highly nasty. Sounds to me like there ought to be an
> emergency 2.1.1 patch made for this, bumping Thomas's work to 2.1.2 if
> necessary.

Why bump 'my work' ? I'm just reviewing patches checked into the
head. A fix for the above problems would fit in a patch release very
nicely, and a release is a release. Besides, releasing 2.1.1 as 2.1 +
dict fix would be a CVS nightmare. Unless you propose to keep it out
of CVS, Barry ? :)

> And if we can trojan in the NAIPL (New And Improved Python
> License), I wouldn't mind. :)

I'll channel Guido by saying he wouldn't even allow us to ship it
with anything other than the PSF licence :)

Gee-I'd-almost-think-you-had-a-need-for-an-FSF-suffered-licence-ly y'rs

--
Thomas Wouters

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!

From thomas at xs4all.net Sat Jun 2 00:47:16 2001
From: thomas at xs4all.net (Thomas Wouters)
Date: Sat, 2 Jun 2001 00:47:16 +0200
Subject: [Python-Dev] Marshal bug in 2.1?
In-Reply-To: <3B17E692.281A329B@lemburg.com>; from mal@lemburg.com on Fri, Jun 01, 2001 at 09:01:38PM +0200
References: <3B17DEF7.3E7C6BC6@tismer.com> <3B17E326.41D82CCE@tismer.com> <3B17E692.281A329B@lemburg.com>
Message-ID: <20010602004716.R690@xs4all.nl>

On Fri, Jun 01, 2001 at 09:01:38PM +0200, M.-A. Lemburg wrote:

> Yes, not surprisingly though... AFAIK the pyc format changed
> in every single version between 1.5.2 and 2.1.
Worse, it's changed several times between each release :) -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From barry at digicool.com Sat Jun 2 01:12:30 2001 From: barry at digicool.com (Barry A. Warsaw) Date: Fri, 1 Jun 2001 19:12:30 -0400 Subject: [Python-Dev] another dict crasher References: <15127.45281.435849.822222@anthem.wooz.org> <20010602004235.Q690@xs4all.nl> Message-ID: <15128.8542.51241.192412@anthem.wooz.org> >>>>> "TW" == Thomas Wouters writes: >> That is highly, highly nasty. Sounds to me like there ought to >> be an emergency 2.1.1 patch made for this, bumping Thomas's >> work to 2.1.2 if necessary. TW> Why bump 'my work' ? I'm just reviewing patches checked into TW> the head. A fix for the above problems would fit in a patch TW> release very nicely, and a release is a release. Besides, TW> releasing 2.1.1 as 2.1 + dict fix would be a CVS TW> nightmare. Unless you propose to keep it out of CVS, Barry ? TW> :) Oh no! You know me, I like to release those maintenance releases early and often. :) Anyway, that's why /you're/ the 2.1.1 czar. >> And if we can trojan in the NAIPL (New And Improved Python >> License), I wouldn't mind. :) TW> I'll channel Guido by saying he wouldn't even allow us to ship TW> it with anything other than the PSF licence :) :) TW> Gee-I'd-almost-think-you-had-a-need-for-an-FSF-suffered-licence-ly TW> y'rs Where'd you get /that/ idea? :) -Barry From mwh at python.net Sat Jun 2 01:20:26 2001 From: mwh at python.net (Michael Hudson) Date: 02 Jun 2001 00:20:26 +0100 Subject: [Python-Dev] another dict crasher In-Reply-To: "Tim Peters"'s message of "Fri, 1 Jun 2001 16:36:21 -0400" References: Message-ID: "Tim Peters" writes: > The dict code has even more holes and in more places, but they're > generally much harder to provoke, so they've gone unnoticed for 10 > years. All in all, seemed like a good tradeoff to me . Are you suggesting that we should just leave these crashers in? They're not *particularly* hard to provoke if you know the implementation - and I was inspired to look for them by someone's report of actually running into one. Cheers, M. -- Java sucks. [...] Java on TV set top boxes will suck so hard it might well inhale people from off their sofa until their heads get wedged in the card slots. --- Jon Rabone, ucam.chat From tim.one at home.com Sat Jun 2 03:04:36 2001 From: tim.one at home.com (Tim Peters) Date: Fri, 1 Jun 2001 21:04:36 -0400 Subject: [Python-Dev] another dict crasher In-Reply-To: Message-ID: [Michael Hudson] > Are you suggesting that we should just leave these crashers in? > They're not *particularly* hard to provoke if you know the > implementation - and I was inspired to look for them by someone's > report of actually running into one. I certainly don't object to fixing ones that bite innocent users, but there are also costs of several kinds. In this case, I couldn't care less how long printing a dict takes -- go for it. When adversarial abuse starts interfering with the speed of crucial operations, though, I'm simply not a "safety at any cost" person. Guido is much more of one, although the number of holes remaining in Python could plausibly fill Albert Hall . short-of-50-easy-ways-to-crash-win98-just-think-hard-about-each-"+"-in- the-code-base-ly y'rs - tim From gstein at lyra.org Sat Jun 2 07:52:03 2001 From: gstein at lyra.org (Greg Stein) Date: Fri, 1 Jun 2001 22:52:03 -0700 Subject: [Python-Dev] strop vs. 
string
In-Reply-To: ; from tim.one@home.com on Sun, May 27, 2001 at 09:42:30PM -0400
References: <3B10D758.3741AC2F@lemburg.com>
Message-ID: <20010601225203.R23560@lyra.org>

On Sun, May 27, 2001 at 09:42:30PM -0400, Tim Peters wrote:
>...
> [Greg Ewing]
> > I think it would be safe if:
> >
> > 1) it kept a reference to the underlying object, and
>
> That much it already does.
>
> > 2) it re-fetched the pointer and length info each time it was
> > needed, using the underlying object's buffer interface.
>
> If after
>
>     b = buffer(some_object)
>
> b.__getitem__ needed to refetch the info between
>
>     b[i]
> and
>     b[i+1]
>
> I expect it would be so slow even Greg wouldn't want it anymore.

Huh? I don't think it would be all that slow. It is just a function
call. And I don't think that the getitem slot is really used all that
frequently (in a loop) for buffer type objects.

I've been thinking that refetching the ptr/len is the right fix.

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/

From gstein at lyra.org  Sat Jun  2 07:54:23 2001
From: gstein at lyra.org (Greg Stein)
Date: Fri, 1 Jun 2001 22:54:23 -0700
Subject: [Python-Dev] strop vs. string
In-Reply-To: ; from tim.one@home.com on Sat, May 26, 2001 at 02:44:04AM -0400
References: <3B0ED784.FC53D01@lemburg.com>
Message-ID: <20010601225423.S23560@lyra.org>

On Sat, May 26, 2001 at 02:44:04AM -0400, Tim Peters wrote:
> The buffer object has been neglected for years: is that because it's in
> prime shape, or because nobody cares about it enough to maintain it?

"Works for me" :-)

Part of the neglect is also based on Guido's ambivalence. Part is that I
haven't needed more from it. The day that I do, then I'll code it up :-)
But that doesn't help the "generic" case, unfortunately.

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/

From gstein at lyra.org  Sat Jun  2 07:55:33 2001
From: gstein at lyra.org (Greg Stein)
Date: Fri, 1 Jun 2001 22:55:33 -0700
Subject: [Python-Dev] strop vs. string
In-Reply-To: <3B0FD023.C4588919@lemburg.com>; from mal@lemburg.com on Sat, May 26, 2001 at 05:47:47PM +0200
References: <3B0FD023.C4588919@lemburg.com>
Message-ID: <20010601225533.T23560@lyra.org>

On Sat, May 26, 2001 at 05:47:47PM +0200, M.-A. Lemburg wrote:
>...
> Even the idea of replacing the usage of strings as data buffers
> with buffer object didn't get very far; common habits are simply
> hard to break.

That idea was shot down when Guido said that 'c' arrays should be the
"official form of a data buffer."

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/

From tim.one at home.com  Sat Jun  2 08:13:49 2001
From: tim.one at home.com (Tim Peters)
Date: Sat, 2 Jun 2001 02:13:49 -0400
Subject: [Python-Dev] another dict crasher
In-Reply-To: 
Message-ID: 

[Michael Hudson]
> Actually this crash was dict_print (I always forget about tp_print...).

We all should .

> It's pretty easy to mend:
>
> *** dictobject.c	Fri Jun  1 13:08:13 2001
> --- dictobject.c-fixed	Fri Jun  1 12:59:07 2001
> ***************
> *** 793,795 ****
>   	any = 0;
> ! 	for (i = 0, ep = mp->ma_table; i < mp->ma_size; i++, ep++) {
>   		if (ep->me_value != NULL) {
> --- 793,796 ----
>   	any = 0;
> ! 	for (i = 0; i < mp->ma_size; i++) {
> ! 		ep = &mp->ma_table[i];
>   		if (ep->me_value != NULL) {
> ***************
> *** 833,835 ****
>   	any = 0;
> ! 	for (i = 0, ep = mp->ma_table; i < mp->ma_size && v; i++, ep++) {
>   		if (ep->me_value != NULL) {
> --- 834,837 ----
>   	any = 0;
> ! 	for (i = 0; i < mp->ma_size && v; i++) {
> ! 		ep = &mp->ma_table[i];
>   		if (ep->me_value != NULL) {
>
> I'm not sure this stops still more Machiavellian behaviour from
> crashing the interpreter,

Alas, it doesn't.  You can't trust *anything* about a container you're
iterating over across any call that may call back into Python.  In these
cases, the call to PyObject_Repr() can execute any code at all, including
code that mutates the dict you're crawling over.  In particular, calling
PyObject_Repr() to format the key means the

    ep = &mp->ma_table[i]

pointer may be trash by the time PyObject_Repr() is called again to
format the value.

See characterize() for the pain it takes to guard against everything,
including encouraging comments like:

    if (cmp > 0 ||
        i >= a->ma_size ||
        a->ma_table[i].me_value == NULL)
    {
        /* Not the *smallest* a key; or maybe it is
         * but the compare shrunk the dict so we can't
         * find its associated value anymore; or
         * maybe it is but the compare deleted the
         * a[thiskey] entry.
         */
        Py_DECREF(thiskey);
        continue;
    }

It should really add "or maybe it just shuffled the dict around and the
value at ma_table[i] is no longer associated with the key that *used* to
be at ma_table[i], but since there's still *some* non-NULL pointer there
we'll just pretend that didn't happen and press onward".

> and you can certainly get items being printed more than once or not
> at all.  I'm not sure this last is a problem;

Those don't matter:  in a long tradition, we buy "safety" not only at the
cost of bloating the code, but also by making the true behavior in case
of mutation unpredictable & inexplicable.  That's why I *really* liked
the "immutable list" trick in list.sort():  even if we could have made
the code bulletproof without it, we couldn't usefully explain what the
heck it actually did.  It's not Pythonic to blow up, but neither is it
Pythonic to be incomprehensible.  You simply can't win here.

> if the user's being this contrary there's only so much we can
> do to help him or her.

I'd prefer a similar internal immutable-dict trick that raised an
exception if the user was pushing Python into a corner where "blow up or
do something baffling" were its only choices.  That would render the
original example illegal, of course.  But would that be a bad thing?
What *should* it mean when the user invokes an operation on a container
and mutates the container during that operation?  There's almost no
chance that Jython does the same thing as CPython in all these cases, so
it's effectively undefined behavior no matter how you plug the holes
(short of raising an exception).

From tim.one at home.com  Sat Jun  2 08:34:43 2001
From: tim.one at home.com (Tim Peters)
Date: Sat, 2 Jun 2001 02:34:43 -0400
Subject: [Python-Dev] strop vs. string
In-Reply-To: <20010601225203.R23560@lyra.org>
Message-ID: 

[Tim]
> If after
>
>     b = buffer(some_object)
>
> b.__getitem__ needed to refetch the info between
>
>     b[i]
> and
>     b[i+1]
>
> I expect it would be so slow even Greg wouldn't want it anymore.

[Greg]
> Huh? I don't think it would be all that slow. It is just a function
> call. And I don't think that the getitem slot is really used all that
> frequently (in a loop) for buffer type objects.

I expect they index into the buffer memory directly then, right?  Then
for buffers obtained from mutable objects, any such loop is unsafe in the
absence of the GIL, or even in its presence if the loop contains code
that may call back into Python.

> I've been thinking that refetching the ptr/len is the right fix.
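(A minimal Python-level sketch of the hazard under discussion, assuming
the 2.x buffer() builtin and the array module; whether the last line
prints garbage or dumps core depends on what the platform allocator does
with the old block:

    import array

    a = array.array('c', 'x' * 100)
    b = buffer(a)               # the buffer caches a's raw pointer here
    print b[0]                  # 'x' -- fine so far
    a.fromstring('y' * 100000)  # growing the array reallocs its memory
    print b[0]                  # the cached pointer is now stale

Refetching the ptr/len from the underlying object on each access would
close exactly this window.)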
So is calling __getitem__ all the time then, unless you want to dance on the razor's edge. The idea that you can safely "borrow" memory from a mutable object without copying it is brittle. > Part of the neglect is also based on Guido's ambivalence. Part is > that I haven't needed more from it. The day that I do, then I'll > code it up :-) But that doesn't help the "generic" case, > unfortunately. I take that as "yes" to my "nobody cares about it enough to maintain it?". In that light, Guido's ambivalence is indeed surprising . From mwh at python.net Sat Jun 2 09:09:07 2001 From: mwh at python.net (Michael Hudson) Date: 02 Jun 2001 08:09:07 +0100 Subject: [Python-Dev] another dict crasher In-Reply-To: "Tim Peters"'s message of "Sat, 2 Jun 2001 02:13:49 -0400" References: Message-ID: "Tim Peters" writes: > [Michael Hudson] > > Actually this crash was dict_print (I always forget about tp_print...). > > We all should . > > > It's pretty easy to mend: [snip] > > I'm not sure this stops still more Machiavellian behaviour from > > crashing the interpreter, > > Alas, it doesn't. No, that's what my "dict[Machiavelli()] = Machiavelli()" example was demonstrating. If noone beats me to it, I'll post a better fix to sf next week, complete with test-cases and suitably "encouraging" comments. I can't easily see other examples of the problem; there certainly might be things you could do with comparisons that could trigger crashes, but that code's so hairy that it's almost impossible for me to be sure. There are two ways of constructing a software design: one way is to make it so simple that there are obviously no deficiencies and the other way is to make it so complicated that there are no obvious deficiencies. -- C. A. R. Hoare > > and you can certainly get items being printed more than once or not > > at all. I'm not sure this last is a problem; > > Those don't matter: in a long tradition, we buy "safety" not only at the > cost of bloating the code, but also by making the true behavior in case of > mutation unpredictable & inexplicable. This is what I thought. [snip] > > if the user's being this contrary there's only so much we can > > do to help him or her. > > I'd prefer a similar internal immutable-dict trick that raised an exception > if the user was pushing Python into a corner where "blow up or do something > baffling" were its only choices. That would render the original example > illegal, of course. But would that be a bad thing? It's hard to see how. > What *should* it mean when the user invokes an operation on a > container and mutates the container during that operation? I don't think there's a meaning you can attach to this kind of behaviour. The "immutable dict trick" looks better the more I think about it, but I guess that will have to wait until Guido gets back from the sun... Cheers, M. -- incidentally, asking why things are "left out of the language" is a good sign that the asker is fairly clueless. -- Erik Naggum, comp.lang.lisp From gstein at lyra.org Sat Jun 2 09:40:05 2001 From: gstein at lyra.org (Greg Stein) Date: Sat, 2 Jun 2001 00:40:05 -0700 Subject: [Python-Dev] strop vs. 
string In-Reply-To: ; from tim.one@home.com on Sat, Jun 02, 2001 at 02:34:43AM -0400 References: <20010601225203.R23560@lyra.org> Message-ID: <20010602004005.F23560@lyra.org> On Sat, Jun 02, 2001 at 02:34:43AM -0400, Tim Peters wrote: > [Tim] > > If after > > > > b = buffer(some_object) > > > > b.__getitem__ needed to refetch the info between > > > > b[i] > > and > > b[i+1] > > > > I expect it would be so slow even Greg wouldn't want it anymore. > > [Greg] > > Huh? I don't think it would be all that slow. It is just a function > > call. And I don't think that the getitem slot is really used all that > > frequently (in a loop) for buffer type objects. > > I expect they index into the buffer memory directly then, right? Then for > buffers obtained from mutable objects, any such loop is unsafe in the > absence of the GIL, or even in its presence if the loop contains code that > may call back into Python. Most access is: fetch ptr/len, index into the memory. And yes: anything within that loop which could conceivably change the target object (especially a call into Python) could move that ptr. I was saying that, at the Python level, using a loop and doing b[i] into a buffer/string/unicode object would seem to be relatively rare. b[0] and stuff is reasonably common. > > I've been thinking that refetching the ptr/len is the right fix. > > So is calling __getitem__ all the time then, unless you want to dance on the > razor's edge. The idea that you can safely "borrow" memory from a mutable > object without copying it is brittle. Stay in C code and don't call into Python. It is safe then. The buffer API is exactly what you're saying: borrow a memory reference. The concept makes a lot of things possible that weren't before. The buffer object's storing of that reference was a mistake. > > Part of the neglect is also based on Guido's ambivalence. Part is > > that I haven't needed more from it. The day that I do, then I'll > > code it up :-) But that doesn't help the "generic" case, > > unfortunately. > > I take that as "yes" to my "nobody cares about it enough to maintain it?". > In that light, Guido's ambivalence is indeed surprising . Eh? I'll maintain the thing, but you're confusing that with adding more features into it. Different question. Cheers, -g -- Greg Stein, http://www.lyra.org/ From tim.one at home.com Sat Jun 2 10:17:39 2001 From: tim.one at home.com (Tim Peters) Date: Sat, 2 Jun 2001 04:17:39 -0400 Subject: [Python-Dev] another dict crasher In-Reply-To: Message-ID: [Michael Hudson] > ... > If noone beats me to it, I'll post a better fix to sf next week, > complete with test-cases and suitably "encouraging" comments. Ah, no need -- looks like I was doing that while you were writing this. Checked in already. So long as we're happy to settle for senseless results that simply don't blow up, the only other trick you really needed was to save away the value in a local vrbl and incref it across the key->string bit; then you don't have to worry about key->string deleting the value, or about the table entry it lived in going away (because you get the value from the (still-incref'ed) *local* vrbl later, not from the table again). > I can't easily see other examples of the problem; there certainly > might be things you could do with comparisons that could trigger > crashes, but that code's so hairy that it's almost impossible for me > to be sure. 
It's easy to be sure:  any code that tries to remember anything about a
dict (ditto any mutable object) across a "dangerous" call, other than the
mere address of the object, is a place you *can* provoke a core dump.  It
may not be easy to provoke, and a given provoking test case may not fail
across all platforms, or even every time you run it on a single platform,
but it's "an obvious" hole all the same.

From tismer at tismer.com  Sat Jun  2 11:49:35 2001
From: tismer at tismer.com (Christian Tismer)
Date: Sat, 02 Jun 2001 11:49:35 +0200
Subject: [Python-Dev] Marshal bug in 2.1?
References: <3B17DEF7.3E7C6BC6@tismer.com> <3B17E326.41D82CCE@tismer.com> <3B17E692.281A329B@lemburg.com> <20010602004716.R690@xs4all.nl>
Message-ID: <3B18B6AE.88EA6926@tismer.com>

Thomas Wouters wrote:
> 
> On Fri, Jun 01, 2001 at 09:01:38PM +0200, M.-A. Lemburg wrote:
> 
> > Yes, not surprisingly though... AFAIK the pyc format changed
> > in every single version between 1.5.2 and 2.1.
> 
> Worse, it's changed several times between each release :)

But I didn't use .pyc at all, just a marshalled code object.
There are no version headers or such.
The same object worked in fact for Py 1.5.2 and 2.0, but no
longer with 2.1 .
I debugged the unmarshalling and saw what happened:
The new code objects with their new scoping features were
the problem. The new structures were simply added, and there
is no way to skip these for older code objects, since there
isn't any info.
Some option for marshal to unmarshal old-style code objects
would have helped.
But then, I'm not sure if the opcodes are still assigned
the same way in 2.1, or if there was some movement? This would
kill it anyway.

ciao - chris

(now looking for another cheap way to do something invisible in
Python without installing *anything* )

-- 
Christian Tismer             :^)   Mission Impossible 5oftware
Kaunstr. 26                  :     Have a break! Take a ride on Python's
14163 Berlin                 :     *Starship* http://starship.python.net/
PGP key -> http://wwwkeys.pgp.net/
PGP Fingerprint       E182 71C7 1A9D 66E9  9D15 D3CC D4D7 93E2 1FAE F6DF
     where do you want to jump today?   http://www.stackless.com/

From mal at lemburg.com  Sat Jun  2 13:09:13 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Sat, 02 Jun 2001 13:09:13 +0200
Subject: [Python-Dev] Marshal bug in 2.1?
References: <3B17DEF7.3E7C6BC6@tismer.com> <3B17E326.41D82CCE@tismer.com> <3B17E692.281A329B@lemburg.com> <20010602004716.R690@xs4all.nl> <3B18B6AE.88EA6926@tismer.com>
Message-ID: <3B18C958.598A9891@lemburg.com>

Christian Tismer wrote:
> 
> Thomas Wouters wrote:
> > 
> > On Fri, Jun 01, 2001 at 09:01:38PM +0200, M.-A. Lemburg wrote:
> > 
> > > Yes, not surprisingly though... AFAIK the pyc format changed
> > > in every single version between 1.5.2 and 2.1.
> > 
> > Worse, it's changed several times between each release :)
> 
> But I didn't use .pyc at all, just a marshalled code object.

That's the point:  the header in pyc files is meant to signal the
incompatibility of the following code object.  Perhaps we should move
this version information into the marshal format of code objects
themselves...

> There are no version headers or such.
> The same object worked in fact for Py 1.5.2 and 2.0, but no
> longer with 2.1 .
> I debugged the unmarshalling and saw what happened:
> The new code objects with their new scoping features were
> the problem. The new structures were simply added, and there
> is no way to skip these for older code objects, since there
> isn't any info.
> Some option for marshal to unmarshal old-style code objects
> would have helped.
> But then, I'm not sure if the opcodes are still assigned
> the same way in 2.1, or if there was some movement? This would
> kill it anyway.

AFAIK, the assignments did not change, but several opcodes were added
in 2.1, so code compiled in 2.1 will not run in 2.0.

> ciao - chris
> 
> (now looking for another cheap way to do something invisible in
> Python without installing *anything* )

Why don't you use freeze or py2exe or Gordon's installer for these
one-file executables ?

Alternatively, you should check the Python version and make sure that
it matches the one used for compiling the byte code.

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/

From mwh at python.net  Sat Jun  2 13:40:56 2001
From: mwh at python.net (Michael Hudson)
Date: 02 Jun 2001 12:40:56 +0100
Subject: [Python-Dev] another dict crasher
In-Reply-To: "Tim Peters"'s message of "Sat, 2 Jun 2001 04:17:39 -0400"
References: 
Message-ID: 

"Tim Peters" writes:

> > I can't easily see other examples of the problem; there certainly
> > might be things you could do with comparisons that could trigger
> > crashes, but that code's so hairy that it's almost impossible for me
> > to be sure.
> 
> It's easy to be sure:  any code that tries to remember anything about a dict
> (ditto any mutable object) across a "dangerous" call, other than the mere
> address of the object, is a place you *can* provoke a core dump.  It may not
> be easy to provoke, and a given provoking test case may not fail across all
> platforms, or even every time you run it on a single platform, but it's "an
> obvious" hole all the same.

Ah, like this one:

dict = {}

# let's force dict to malloc its table
for i in range(1,10):
    dict[i] = i

class Machiavelli2:
    def __eq__(self, other):
        dict.clear()
        return 1
    def __hash__(self):
        return 0

dict[Machiavelli2()] = Machiavelli2()

print dict[Machiavelli2()]

I'll attach a patch, but it's another branch inside lookdict (though not
lookdict_string which is I guess the really performance sensitive one).

Cheers,
M.
Index: dictobject.c
===================================================================
RCS file: /cvsroot/python/python/dist/src/Objects/dictobject.c,v
retrieving revision 2.100
diff -c -1 -r2.100 dictobject.c
*** dictobject.c	2001/06/02 08:27:39	2.100
--- dictobject.c	2001/06/02 11:36:47
***************
*** 273,274 ****
--- 273,281 ----
  			cmp = PyObject_RichCompareBool(ep->me_key, key, Py_EQ);
+ 			if (ep0 != mp->ma_table) {
+ 				PyErr_SetString(PyExc_RuntimeError,
+ 						"dict resized on comparison");
+ 				ep = mp->ma_table;
+ 				while (ep->me_value) ep++;
+ 				return ep;
+ 			}
  			if (cmp > 0) {
***************
*** 310,311 ****
--- 317,325 ----
  			cmp = PyObject_RichCompareBool(ep->me_key, key, Py_EQ);
+ 			if (ep0 != mp->ma_table) {
+ 				PyErr_SetString(PyExc_RuntimeError,
+ 						"dict resized on comparison");
+ 				ep = mp->ma_table;
+ 				while (ep->me_value) ep++;
+ 				return ep;
+ 			}
  			if (cmp > 0) {

Here's another test case to work out the second of those new if
statements:

dict = {}

# let's force dict to malloc its table
for i in range(1,10):
    dict[i] = i

class Machiavelli3:
    def __init__(self, id):
        self.id = id
    def __eq__(self, other):
        if self.id == other.id:
            dict.clear()
            return 1
        else:
            return 0
    def __repr__(self):
        return "%s(%s)"%(self.__class__.__name__, self.id)
    def __hash__(self):
        return 0

dict[Machiavelli3(1)] = Machiavelli3(0)
dict[Machiavelli3(2)] = Machiavelli3(0)

print dict[Machiavelli3(2)]

-- 
  M-x psych[TAB][RETURN]
    -- try it

From pedroni at inf.ethz.ch  Sat Jun  2 20:58:55 2001
From: pedroni at inf.ethz.ch (Samuele Pedroni)
Date: Sat, 2 Jun 2001 20:58:55 +0200
Subject: [Python-Dev] What should changing/setting __getattr__ (and similars) after classdef time do ?
Message-ID: <004d01c0eb96$24b5f460$8a73fea9@newmexico>

Hi.

Is this a case that only the BDFL could know and pronounce on ...
or I'm missing something ...

Thanks for any feedback, Samuele Pedroni.

----- Original Message -----
From: Samuele Pedroni
To: 
Sent: Friday, June 01, 2001 2:49 PM
Subject: [Python-Dev] __xxxattr__ caching semantic

> Hi.
> 
> What is the intended semantic wrt __xxxattr__ caching:
> 
> class X:
>     pass
> 
> def cga(self,name):
>     print name
> 
> def iga(name):
>     print name
> 
> x=X()
> x.__dict__['__getattr__'] = iga # 1.
> x.__getattr__ = iga # 2.
> X.__dict__['__getattr__'] = cga # 3.
> X.__getattr__ = cga # 4.
> x.a
> 
> for the manual
> 
> http://www.python.org/doc/current/ref/customization.html
> 
> with all the variants x.a should fail; they should have
> no effect.  In practice 4. works.
> 
> Is that an implementation/manual mismatch, is this intended, is there
> code around using 4. ?
> 
> I'm asking this because jython has differences/bugs in this respect?
> 
> I imagine that 1.-4. should work for all other __magic__ methods
> (this should be fixed in jython for some methods),
> OTOH jython has such a restriction on __del__ too, and this one cannot
> be removed (is not simply a matter of caching/non caching).
> 
> regards, Samuele Pedroni.
> 
> 
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at python.org
> http://mail.python.org/mailman/listinfo/python-dev

From tim.one at home.com  Sun Jun  3 00:57:57 2001
From: tim.one at home.com (Tim Peters)
Date: Sat, 2 Jun 2001 18:57:57 -0400
Subject: [Python-Dev] What should changing/setting __getattr__ (and similars) after classdef time do ?
In-Reply-To: <004d01c0eb96$24b5f460$8a73fea9@newmexico>
Message-ID: 

[Samuele Pedroni]
> Is this a case that only the BDFL could know and pronounce on ...
> or I'm missing something ...
The referenced URL

    http://www.python.org/doc/current/ref/customization.html

appears irrelevant to me, so unsure what you're asking about.  Perhaps

    http://www.python.org/doc/current/ref/attribute-access.html

was intended?  If so, the

    these methods are cached in the class object at class
    definition time; therefore, they cannot be changed after
    the class definition is executed.

there doesn't mean exactly what it says:  it's trying to say that the
__XXXattr__ methods *inherited from base classes* (if any) are cached in
the class object at class definition time, so that changing them in the
base classes later has no effect on the derived class.  It should be
clearer.

A direct class setattr can still change them; indirect assignment via
class.__dict__ is ineffective for the __dict__, __bases__, __name__,
__getattr__, __setattr__ and __delattr__ class attributes (yes, you'll
create a dict entry then, but class getattr doesn't look in the dict to
get the value of these specific keys).

Didn't understand the program snippet.

Much of this is due to hoary optimizations and I agree is ill-documented.
I hope Guido's current rework of all this stuff will leave the endcases
more explainable.

> ----- Original Message -----
> From: Samuele Pedroni
> To: 
> Sent: Friday, June 01, 2001 2:49 PM
> Subject: [Python-Dev] __xxxattr__ caching semantic
> 
> Hi.
> 
> What is the intended semantic wrt __xxxattr__ caching:
> 
> class X:
>     pass
> 
> def cga(self,name):
>     print name
> 
> def iga(name):
>     print name
> 
> x=X()
> x.__dict__['__getattr__'] = iga # 1.
> x.__getattr__ = iga # 2.
> X.__dict__['__getattr__'] = cga # 3.
> X.__getattr__ = cga # 4.
> x.a
> 
> for the manual
> 
> http://www.python.org/doc/current/ref/customization.html
> 
> with all the variants x.a should fail; they should have
> no effect.  In practice 4. works.
> 
> Is that an implementation/manual mismatch, is this intended, is there
> code around using 4. ?
> 
> I'm asking this because jython has differences/bugs in this respect?
> 
> I imagine that 1.-4. should work for all other __magic__ methods
> (this should be fixed in jython for some methods),
> OTOH jython has such a restriction on __del__ too, and this one cannot
> be removed (is not simply a matter of caching/non caching).

From pedroni at inf.ethz.ch  Sun Jun  3 01:46:42 2001
From: pedroni at inf.ethz.ch (Samuele Pedroni)
Date: Sun, 3 Jun 2001 01:46:42 +0200
Subject: [Python-Dev] What should changing/setting __getattr__ (and similars) after classdef time do ?
References: 
Message-ID: <001801c0ebbe$47b60a40$8a73fea9@newmexico>

Hi.  Thanks a lot for the answer, and sorry for the ill-formed question.

[Tim Peters]
> [Samuele Pedroni]
> > Is this a case that only the BDFL could know and pronounce on ...
> > or I'm missing something ...
> 
> The referenced URL
> 
>     http://www.python.org/doc/current/ref/customization.html
> 
> appears irrelevant to me, so unsure what you're asking about.  Perhaps
> 
>     http://www.python.org/doc/current/ref/attribute-access.html
> 
> was intended?  If so, the

Yes, pilot error with browser and copy&paste; I intended the latter.

> these methods are cached in the class object at class
> definition time; therefore, they cannot be changed after
> the class definition is executed.
> 
> there doesn't mean exactly what it says:  it's trying to say that the
> __XXXattr__ methods *inherited from base classes* (if any) are cached in the
> class object at class definition time, so that changing them in the base
> classes later has no effect on the derived class.  It should be clearer.
> 
> A direct class setattr can still change them; indirect assignment via
> class.__dict__ is ineffective for the __dict__, __bases__, __name__,
> __getattr__, __setattr__ and __delattr__ class attributes (yes, you'll create
> a dict entry then, but class getattr doesn't look in the dict to get the
> value of these specific keys).

This matches what I understood reading CPython C code (yes I did that
too ), and what the snippets were trying to point out.  And I see the
problem with derived classes too.

> Didn't understand the program snippet.

Sorry, it is not one snippet; the 4 variants should be considered
independently.

> Much of this is due to hoary optimizations and I agree is ill-documented.  I
> hope Guido's current rework of all this stuff will leave the endcases more
> explainable.

That will be a lot of work when porting it to jython .  In any case the
manual is really not clear (euphemism ).

The point is that jython implements the letter of the manual, and even
extends the caching opt to some other __magic__ methods.  I wanted to
know the intended behaviour in order to fix that in jython.

regards Samuele Pedroni.

From tim.one at home.com  Sun Jun  3 01:56:34 2001
From: tim.one at home.com (Tim Peters)
Date: Sat, 2 Jun 2001 19:56:34 -0400
Subject: [Python-Dev] What should changing/setting __getattr__ (and similars) after classdef time do ?
In-Reply-To: <001801c0ebbe$47b60a40$8a73fea9@newmexico>
Message-ID: 

[Samuele Pedroni]
> ...
> The point is that jython implements the letter of the manual, and even
> extends the caching opt to some other __magic__ methods.  I wanted to
> know the intended behaviour in order to fix that in jython.

You got that one right the first time:  this requires BDFL pronouncement!
As semantically significant optimizations (the only reason for caching
__getattr__, e.g.) creep into the code but the docs lag behind, it gets
more and more unclear what's mandatory behavior and what's
implementation-defined.

This came up a couple weeks ago again in the context of what, exactly,
rich comparisons are supposed to do in all cases.  After poking holes in
everything Guido wrote, he turned it around and told me to write up what
I think it should say (which I have yet to do, as it's time-consuming and
it appears some of the current CPython behavior is at least partly
accidental -- but unclear exactly which parts).  So don't be surprised if
the same trick gets played on you ...

From tim.one at home.com  Sun Jun  3 06:04:57 2001
From: tim.one at home.com (Tim Peters)
Date: Sun, 3 Jun 2001 00:04:57 -0400
Subject: [Python-Dev] another dict crasher
In-Reply-To: 
Message-ID: 

[Michael Hudson]
> Ah, like this one:
>
> dict = {}
>
> # let's force dict to malloc its table
> for i in range(1,10):
>     dict[i] = i
>
> class Machiavelli2:
>     def __eq__(self, other):
>         dict.clear()
>         return 1
>     def __hash__(self):
>         return 0
>
> dict[Machiavelli2()] = Machiavelli2()
>
> print dict[Machiavelli2()]

Told you it was easy .

> I'll attach a patch, but it's another branch inside lookdict (though
> not lookdict_string which is I guess the really performance sensitive
> one).

lookdict_string is crucial to Python's own performance.  Dicts indexed by
ints or class instances or ... are vital to other apps.
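(A crude harness for seeing that specialization from the Python level,
assuming the 2.x time module; the bench helper is made up for
illustration, the absolute numbers are meaningless, and the difference
may be modest -- only the ratio is interesting:

    import time

    def bench(keys, n=100000):
        d = {}
        for k in keys:
            d[k] = 0
        t = time.clock()
        for i in xrange(n):
            for k in keys:
                d[k]            # lookup only; result is discarded
        return time.clock() - t

    print "str keys:", bench(["alpha", "beta", "gamma"])
    print "int keys:", bench([1, 2, 3])

All-string-keyed dicts get the specialized lookdict_string probe; a
single non-string key drops the dict back to the general lookdict.)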
> Index: dictobject.c
> ===================================================================
> RCS file: /cvsroot/python/python/dist/src/Objects/dictobject.c,v
> retrieving revision 2.100
> diff -c -1 -r2.100 dictobject.c
> *** dictobject.c	2001/06/02 08:27:39	2.100
> --- dictobject.c	2001/06/02 11:36:47
> ***************
> *** 273,274 ****
> --- 273,281 ----
>   			cmp = PyObject_RichCompareBool(ep->me_key, key, Py_EQ);
> + 			if (ep0 != mp->ma_table) {
> + 				PyErr_SetString(PyExc_RuntimeError,
> + 						"dict resized on comparison");
> + 				ep = mp->ma_table;
> + 				while (ep->me_value) ep++;
> + 				return ep;
> + 			}
>   			if (cmp > 0) {
> ***************
> *** 310,311 ****
> --- 317,325 ----
>   			cmp = PyObject_RichCompareBool(ep->me_key, key, Py_EQ);
> + 			if (ep0 != mp->ma_table) {
> + 				PyErr_SetString(PyExc_RuntimeError,
> + 						"dict resized on comparison");
> + 				ep = mp->ma_table;
> + 				while (ep->me_value) ep++;
> + 				return ep;
> + 			}
>   			if (cmp > 0) {

Then we have other problems.  Note the comment before lookdict:

    Exceptions are never reported by this function,
    and outstanding exceptions are maintained.

The patched code doesn't preserve that.

Looking for "the first" unused or dummy slot isn't good enough either, as
surely the user has the right to expect that after, e.g., d[m] = 1, d[m]
retrieves 1.  That is, picking a reusable slot "at random" doesn't
respect the *semantics* of dict operations ("just because" the dict
resized doesn't mean the key they're looking for went away!).  It would
be better in this case to go back to the top and start over.  However,
then an adversarial user can construct a case that never terminates.
Unclear what to do.

From tim.one at home.com  Sun Jun  3 09:55:43 2001
From: tim.one at home.com (Tim Peters)
Date: Sun, 3 Jun 2001 03:55:43 -0400
Subject: [Python-Dev] strop vs. string
In-Reply-To: <20010602004005.F23560@lyra.org>
Message-ID: 

[Greg Stein]
> ...
> I was saying that, at the Python level, using a loop and doing b[i] into
> a buffer/string/unicode object would seem to be relatively rare. b[0]
> and stuff is reasonably common.

Well, at the Python level buffer objects seem never to be used, probably
because all the people who know about them don't advertise it because
it's an easy way to provoke core dumps now.  I don't have any real
objection to any way anyone wants to fix that, just so long as it gets
fixed.

>> I take that as "yes" to my "nobody cares about it enough to
>> maintain it?".  In that light, Guido's ambivalence is indeed
>> surprising .

> Eh? I'll maintain the thing, but you're confusing that with adding more
> features into it. Different question.

I haven't asked for new features, just that what's already there get
fixed:  Python-level buffer objects are unsafe, the docs remain
incomplete, there's random stuff like file.readinto() that's not
documented at all (could be that's the only one -- it's certainly
"discovered" on c.l.py often enough, though), and there are no buffer
tests in the std test suite.  The work to introduce the type wasn't
completed, nobody works on it, and finishing work 3 years late doesn't
count as "new feature" in my book .

From gstein at lyra.org  Sun Jun  3 11:10:36 2001
From: gstein at lyra.org (Greg Stein)
Date: Sun, 3 Jun 2001 02:10:36 -0700
Subject: [Python-Dev] strop vs.
> > I was saying that, at the Python level, using a loop and doing b[i] into > > a buffer/string/unicode object would seem to be relatively rare. b[0] > > and stuff is reasonably common. > > Well, at the Python level buffer objects seem never to be used, probably I'm talking about string objects and unicode objects, too. The point is that b[i] loops don't have to be all that speedy because it isn't used often. > because all the people who know about them don't advertise it because it's > an easy way to provoke core dumps now. Easy? Depends on what you use them with. >... > >> I take that as "yes" to my "nobody cares about it enough to > >> maintain it?". In that light, Guido's ambivalence is indeed > >> surprising . > > > Eh? I'll maintain the thing, but you're confusing that with adding more > > features into it. Different question. > > I haven't asked for new features, just that what's already there get fixed: > Python-level buffer objects are unsafe, the docs remain incomplete, I'll fix the code. > there's > random stuff like file.readinto() that's not documented at all (could be > that's the only one -- it's certainly "discovered" on c.l.py often enough, > though), Find another goat to screw for that one. I don't know anything about it. Hmm... Using the "annotate" feature of ViewCVS, I see that Guido added it. Go blame him if you want to scream about that function and its lack of doc. > and there are no buffer tests in the std test suite. The work to > introduce the type wasn't completed, nobody works on it, and finishing work > 3 years late doesn't count as "new feature" in my book . Now you're just being bothersome. You want all that stuff, then feel free. I'll volunteer to do the code. You can go beat some heads, or find other volunteers. I'll do the code fixing just to placate you, and to get all this ranting about the buffer object to quiet down, but not because I'm joyful to do it. not-cheers, -g -- Greg Stein, http://www.lyra.org/ From dgoodger at bigfoot.com Sun Jun 3 16:39:42 2001 From: dgoodger at bigfoot.com (David Goodger) Date: Sun, 03 Jun 2001 10:39:42 -0400 Subject: [Python-Dev] new PEP candidates Message-ID: I have just posted three related PEP candidates to the Doc-SIG: - PEP: Docstring Processing System Framework http://mail.python.org/pipermail/doc-sig/2001-June/001855.html - PEP: DPS Generic Implementation Details http://mail.python.org/pipermail/doc-sig/2001-June/001856.html - PEP: Docstring Conventions http://mail.python.org/pipermail/doc-sig/2001-June/001857.html These are all part of the newly created Python Docstring Processing System project, http://docstring.sf.net. Barry: Please assign PEP numbers to these if possible. Once PEP numbers have been assigned, I will post to comp.lang.python. Thanks. A related project is the second draft of reStructuredText, a docstring markup syntax definition. The project is http://structuredtext.sf.net, and I've posted the following to Doc-SIG: - An Introduction to reStructuredText http://mail.python.org/pipermail/doc-sig/2001-June/001858.html - Problems With StructuredText http://mail.python.org/pipermail/doc-sig/2001-June/001859.html - reStructuredText Markup Specification http://mail.python.org/pipermail/doc-sig/2001-June/001860.html - Python Extensions to the reStructuredText Markup Specification http://mail.python.org/pipermail/doc-sig/2001-June/001861.html I am not seeking PEP status for reStructuredText at this time; I think it's one step too far removed from the Python language to warrant a PEP. 
If you think it *should* be a PEP, I will be happy to convert it. -- David Goodger dgoodger at bigfoot.com Open-source projects: - Python Docstring Processing System: http://docstring.sf.net - reStructuredText: http://structuredtext.sf.net - The Go Tools Project: http://gotools.sf.net From mwh at python.net Sun Jun 3 23:47:48 2001 From: mwh at python.net (Michael Hudson) Date: 03 Jun 2001 22:47:48 +0100 Subject: [Python-Dev] another dict crasher In-Reply-To: "Tim Peters"'s message of "Sun, 3 Jun 2001 00:04:57 -0400" References: Message-ID: "Tim Peters" writes: > It would be better in this case to go back to the top and start > over. Yes. What you checked in is obviously better. I'll stick to being the bearer of bad tidings... > However, then an adversarial user can construct a case that never > terminates. I seem to have done this - it was odd, though - it only loops when I bump the dict to fairly enormous preportions for reasons I don't really (want to) understand. > Unclear what to do. Not worrying about it seems entirely reasonable - I now have sitting on my hard drive the wierdest way of spelling "while 1: pass" *I've* ever seen. and-I'll-stop-poking-holes-now-ly y'rs m. -- The rapid establishment of social ties, even of a fleeting nature, advance not only that goal but its standing in the uberconscious mesh of communal psychic, subjective, and algorithmic interbeing. But I fear I'm restating the obvious. -- Will Ware, comp.lang.python From tim.one at home.com Mon Jun 4 01:03:31 2001 From: tim.one at home.com (Tim Peters) Date: Sun, 3 Jun 2001 19:03:31 -0400 Subject: [Python-Dev] another dict crasher In-Reply-To: Message-ID: [Tim] >> It would be better in this case to go back to the top and start >> over. [Michael Hudson] > Yes. What you checked in is obviously better. I'll stick to being > the bearer of bad tidings... Hey, if it's fun, do whatever what you want! If you hadn't provoked me, I would have let it slide. Guido only cares about the end result . >> However, then an adversarial user can construct a case that never >> terminates. > I seem to have done this - it was odd, though - it only loops when I > bump the dict to fairly enormous preportions for reasons I don't > really (want to) understand. Pass it on. I deliberately "started over" via a recursive call instead of a goto so that an offending program would eventually die with a stack fault instead of just running forever. So if you're seeing something run forever, it may be a different problem. >> Unclear what to do. > Not worrying about it seems entirely reasonable I don't think anyone is happy leaving an exploitable hole in Python -- we endure enormous pain to plug those. Except, I guess, for buffer objects . I simply haven't thought of a good and efficient way to plug this one. Implementing an "internal immutable dict" type appeals to me, but it conflicts with that the affected routines believe to the core of their souls that exceptions raised during comparisons are to be ignored -- and raising a "hey, you can't change the dict *now*!" exception doesn't do the user any good if they never see it. Would plug the hole, but an *innocent* user would never know why their program failed to work as (probably) expected. From tim.one at home.com Mon Jun 4 02:38:53 2001 From: tim.one at home.com (Tim Peters) Date: Sun, 3 Jun 2001 20:38:53 -0400 Subject: [Python-Dev] strop vs. 
string In-Reply-To: <20010603021036.U23560@lyra.org> Message-ID: [Tim] >> because all the people who know about them don't advertise it >> because it's an easy way to provoke core dumps now. [Greg Stein] > Easy? Depends on what you use them with. "Easy" and "depends" both, sure. I don't understand the argument: core dumps are always presumed to be errors in the Python implementation, not the users's fault. In this case, they are Python's fault by any accounting. On rare occasions we just give up and say "sorry, but we simply don't know a reasonable way fix it -- but it's still Python's fault" (for example, see the dict thread this weekend). >> I haven't asked for new features, just that what's already there get >> fixed: Python-level buffer objects are unsafe > I'll fix the code. Thank you! >> the docs remain incomplete, there's random stuff like file.readinto() >> that's not documented at all (could be that's the only one -- it's >> certainly "discovered" on c.l.py often enough, though), > Find another goat to screw for that one. I don't know anything about it. > > Hmm... Using the "annotate" feature of ViewCVS, I see that Guido > added it. Go blame him if you want to scream about that function and > its lack of doc. I don't care who added it: I haven't asked anyone specific to do anything. I've been asking whether *anyone* cares enough to address the backlog of buffer maintenance work. I don't even know who dreamed up the buffer object -- although at this point I bet I can guess . >> and there are no buffer tests in the std test suite. The work to >> introduce the type wasn't completed, nobody works on it, and >> finishing work 3 years late doesn't count as "new feature" in my book > Now you're just being bothersome. You bet. It's the same list of things I gave in my first msg; nobody volunteered to do any work then, so I repeated them. > You want all that stuff, then feel free. "All that stuff" is the minimum now required of new features. Buffers got in before Guido got tougher about this stuff, but if they're worth having at all then surely they're worth bringing up to current standards. > I'll volunteer to do the code. You can go beat some heads, or find other > volunteers. Anyone else care to chip in? > I'll do the code fixing just to placate you, and to get all this ranting > about the buffer object to quiet down, but not because I'm joyful > to do it. OK, I feel guitly -- but if that's enough to make you feel joyful again, the psychology here is just sick . From Barrett at stsci.edu Mon Jun 4 15:22:14 2001 From: Barrett at stsci.edu (Paul Barrett) Date: Mon, 04 Jun 2001 09:22:14 -0400 Subject: [Python-Dev] strop vs. string References: <3B1214B3.9A4C295D@lemburg.com> Message-ID: <3B1B8B86.68E99328@STScI.Edu> "M.-A. Lemburg" wrote: > > Tim Peters wrote: > > > > [Tim] > > > About combining strop and buffers and strings, don't forget > > > unicodeobject.c: that's got oodles of basically duplicate code too. > > > /F suggested dealing with the minor differences via maintaining one > > > code file that gets compiled multiple times w/ appropriate #defines. > > > > [MAL] > > > Hmm, that only saves us a few kB in source, but certainly not > > > in the object files. > > > > That's not the point. Manually duplicated code blocks always get out of > > synch, as people fix bugs in, or enhance, one of them but don't even know > > about the others. 
/F brought this up after I pissed away a few hours trying > > to repair one of these in all places, and he noted that strop.replace() and > > string.replace() are woefully inefficient anyway. > > Ok, so what we'd need is a bunch of generic low-level string > operations: one set for 8-bit and one for 16-bit code. > > Looking at unicodeobject.c it seems that the section "Helpers" would > be a good start, plus perhaps a few bits from the method implementations > refactored to form a low-level string template library. > > Perhaps we should move this code into > a file stringhelpers.h which then gets included by stringobject.c > and unicodeobject.c with appropriate #defines set up for > 8-bit strings and for Unicode. > > > > The better idea would be making the types subclass from a generic > > > abstract string object -- I just don't know how this will be > > > possible with Guido's type patches. We'll just have to wait, > > > I guess. From fdrake at acm.org Mon Jun 4 16:07:37 2001 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Mon, 4 Jun 2001 10:07:37 -0400 (EDT) Subject: [Python-Dev] strop vs. string In-Reply-To: <3B1B8B86.68E99328@STScI.Edu> References: <3B1214B3.9A4C295D@lemburg.com> <3B1B8B86.68E99328@STScI.Edu> Message-ID: <15131.38441.301314.46009@cj42289-a.reston1.va.home.com> Paul Barrett writes: > From the discussion so far, it appears that the buffer object is > intended solely to support string-like objects. I've seen no mention > of their use for binary data objects, such as multidimensional arrays > and matrices. Will the buffer object also support these objects? If > no, then I suggest it be renamed to one that is less generic and more > descriptive. In a development version of my bindings to a Type-1 font rasterizer, I exposed a buffer interface to the resulting image data. Unfortunately, that code was lost and I've not had time to work that up again. I *think* that sort of thing was part of the intended application for the buffer interface, but I was not one of the "movers & shakers" for it, so I'm not entirely sure. > On the otherhand, if yes, then I think the buffer C/API needs to be > reimplemented, because the current design/implementation falls far > short of what I would expect for a buffer object. First, it is overly > complex: the support for multiple buffers does not appear necessary. > Second, the dangling pointer issue has not been resolved. I suggest I agree. From the discussions I remember, I don't recall a clear explanation of the need for "segmented" buffers. But that may just be a failing of my recollection. > the addition of lock flag which indicates that the data is currently > inaccessible, ie. that data and/or data pointer is in the process of > being modified. > > I would suggest the following structure to be much more useful for > char and binary data: > > typedef struct { > char* rf_pointer; > int rf_length; > int rf_access; /* read, write, etc. */ > int rf_lock; /* data is in use */ > int rf_flags; /* type of data; char, binary, unicode, etc. */ > } PyBufferProcs; I'm not sure about the "rf_flags" field -- I see two aspects that you seem to be describing, and wouldn't call either use a "flag". There's data type (characters, anonymous binary data, image data, etc.), and element size (1 byte, 2 bytes, variable width). Those values may or may not be associated with the specific buffer or the type implementing the buffer (I'd go with the specific buffer just to allow buffer types that support different flavors). 
> If I find some time, I'll prepare a PEP to air these issues, since > they are very important to those of us working on and with > multidimensional arrays. We find the current buffer API lacking. PEPs are good; I'll look forward to seeing it! -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From skip at pobox.com Mon Jun 4 18:29:53 2001 From: skip at pobox.com (Skip Montanaro) Date: Mon, 4 Jun 2001 11:29:53 -0500 Subject: [Python-Dev] readline not getting built when .so symlink doesn't exist Message-ID: <15131.46977.861815.323386@beluga.mojam.com> I recently upgraded to Mandrake 8.0. I find that the readline module is no longer getting built. When building, it builds rgbimb followed immediately by crypt. Readline, which is tested for in between, is not built. Apparently, it can't find one of the libraries required to build it. On my system, both readline and termcap are in /lib. Neither has a static version available and neither as a plain .so file available. The .so file always has a version number tacked onto the end: % ls -l /lib/libtermcap* /lib/libreadline* lrwxrwxrwx 1 root root 18 May 29 10:53 /lib/libreadline.so.4 -> libreadline.so.4.1 -rwxr-xr-x 1 root root 152440 Mar 25 01:26 /lib/libreadline.so.4.1 lrwxrwxrwx 1 root root 19 May 29 10:53 /lib/libtermcap.so.2 -> libtermcap.so.2.0.8 -rwxr-xr-x 1 root root 11608 Mar 26 10:32 /lib/libtermcap.so.2.0.8 If I create the necessary .so symlinks it builds okay. Perhaps this is a bug in Mandrake 8.0 (it wouldn't be the first one), but if it is valid for shared libraries to be installed with only a version-numbered .so file, then it seems to me that distutils ought to handle that. There are several programs in /usr/bin on my machine that seem to be dynamically linked to libreadline. In addition, /usr/lib/python2.0/lib-dynload/readline.so exists, which suggests that the .so-without version number is valid as far as ld is concerned. Skip From Greg.Wilson at baltimore.com Mon Jun 4 19:33:29 2001 From: Greg.Wilson at baltimore.com (Greg Wilson) Date: Mon, 4 Jun 2001 13:33:29 -0400 Subject: [Python-Dev] struct.getorder() ? Message-ID: <930BBCA4CEBBD411BE6500508BB3328F2E1F1D@nsamcanms1.ca.baltimore.com> The 'struct' module allows packing and unpacking orders to be specified, but doesn't provide a hook to report on the order used by the machine the script is running on. As I'm likely going to be using this module in future runs of my course, I'd like to add 'struct.getorder()', which would return either "<" or ">" (the characters used to signal little-endian and big-endian respectively). Does this duplicate something in some other standard module? Does it seem like a sensible idea? Thanks Greg ----------------------------------------------------------------------------------------------------------------- The information contained in this message is confidential and is intended for the addressee(s) only. If you have received this message in error or there are any problems please notify the originator immediately. The unauthorized use, disclosure, copying or alteration of this message is strictly forbidden. Baltimore Technologies plc will not be liable for direct, special, indirect or consequential damages arising from alteration of the contents of this message by a third party or as a result of any virus being passed on. In addition, certain Marketing collateral may be added from time to time to promote Baltimore Technologies products, services, Global e-Security or appearance at trade shows and conferences. 
This footnote confirms that this email message has been swept by
Baltimore MIMEsweeper for Content Security threats, including computer
viruses.

From fdrake at acm.org  Mon Jun  4 19:42:28 2001
From: fdrake at acm.org (Fred L. Drake, Jr.)
Date: Mon, 4 Jun 2001 13:42:28 -0400 (EDT)
Subject: [Python-Dev] struct.getorder() ?
In-Reply-To: <930BBCA4CEBBD411BE6500508BB3328F2E1F1D@nsamcanms1.ca.baltimore.com>
References: <930BBCA4CEBBD411BE6500508BB3328F2E1F1D@nsamcanms1.ca.baltimore.com>
Message-ID: <15131.51332.73137.795543@cj42289-a.reston1.va.home.com>

Greg Wilson writes:
> The 'struct' module allows packing and unpacking
> orders to be specified, but doesn't provide a hook
> to report on the order used by the machine the

Python 2.0 introduced sys.byteorder; check it out:

    http://www.python.org/doc/current/lib/module-sys.html

-Fred

-- 
Fred L. Drake, Jr.
PythonLabs at Digital Creations

From Greg.Wilson at baltimore.com  Mon Jun  4 19:41:45 2001
From: Greg.Wilson at baltimore.com (Greg Wilson)
Date: Mon, 4 Jun 2001 13:41:45 -0400
Subject: [Python-Dev] struct.getorder() ?
Message-ID: <930BBCA4CEBBD411BE6500508BB3328F2E1F1E@nsamcanms1.ca.baltimore.com>

> Python 2.0 introduced sys.byteorder; check it out:
> http://www.python.org/doc/current/lib/module-sys.html

Woo hoo!  Thanks, Fred --- should've guessed someone would be ahead of
me :-).

Greg

From barry at scottb.demon.co.uk  Mon Jun  4 20:00:05 2001
From: barry at scottb.demon.co.uk (Barry Scott)
Date: Mon, 4 Jun 2001 19:00:05 +0100
Subject: [Python-Dev] %b format? - 8088 like a PDP-11 I think not...
In-Reply-To: <20010530183833.B1654@thyrsus.com>
Message-ID: <000201c0ed20$2f295c30$060210ac@private>

Eric wrote:
> While I'm at it, I should note that the design of the 11 was ancestral
> to both the 8088 and 68000 microprocessors, and thus to essentially
> every new general-purpose computer designed in the last fifteen years.

The key to the PDP-11 and VAX was lots of registers all alike and rich
addressing modes for the instructions.

The 8088 is very far from this design; it owes its design more to the
4004 than to the PDP-11.  However, the 68000 is closer, but not as nice
to program, as there are too many special cases in its instruction set
for my liking.
BArry From mwh at python.net Mon Jun 4 20:05:10 2001 From: mwh at python.net (Michael Hudson) Date: 04 Jun 2001 19:05:10 +0100 Subject: [Python-Dev] readline not getting built when .so symlink doesn't exist In-Reply-To: Skip Montanaro's message of "Mon, 4 Jun 2001 11:29:53 -0500" References: <15131.46977.861815.323386@beluga.mojam.com> Message-ID: Skip Montanaro writes: > I recently upgraded to Mandrake 8.0. I find that the readline > module is no longer getting built. When building, it builds rgbimb > followed immediately by crypt. Readline, which is tested for in > between, is not built. Apparently, it can't find one of the > libraries required to build it. On my system, both readline and > termcap are in /lib. Neither has a static version available and > neither as a plain .so file available. The .so file always has a > version number tacked onto the end: > > % ls -l /lib/libtermcap* /lib/libreadline* > lrwxrwxrwx 1 root root 18 May 29 10:53 /lib/libreadline.so.4 -> libreadline.so.4.1 > -rwxr-xr-x 1 root root 152440 Mar 25 01:26 /lib/libreadline.so.4.1 > lrwxrwxrwx 1 root root 19 May 29 10:53 /lib/libtermcap.so.2 -> libtermcap.so.2.0.8 > -rwxr-xr-x 1 root root 11608 Mar 26 10:32 /lib/libtermcap.so.2.0.8 > > If I create the necessary .so symlinks it builds okay. > > Perhaps this is a bug in Mandrake 8.0 (it wouldn't be the first > one), but if it is valid for shared libraries to be installed with > only a version-numbered .so file, then it seems to me that distutils > ought to handle that. Hmm. Does compiling a proggie $ gcc foo.c -lreadline work? It doesn't here if I move libreadline.so & libreadline.a out of the way. If the C compiler isn't going to find readline, there ain't much point distutils trying to find it... > There are several programs in /usr/bin on my machine that seem to be > dynamically linked to libreadline. Those things will be directly linked to libreadline.so.whatever; I believe the libfoo.so files are only for the (compile time) linker's benefit. > In addition, /usr/lib/python2.0/lib-dynload/readline.so exists, > which suggests that the .so-without version number is valid as far > as ld is concerned. ld != ld.so. Do you need a readline-devel package or something? Cheers, M. -- It's actually a corruption of "starling". They used to be carried. Since they weighed a full pound (hence the name), they had to be carried by two starlings in tandem, with a line between them. -- Alan J Rosenthal explains "Pounds Sterling" on asr From mwh at python.net Mon Jun 4 21:01:10 2001 From: mwh at python.net (Michael Hudson) Date: 04 Jun 2001 20:01:10 +0100 Subject: [Python-Dev] another dict crasher In-Reply-To: "Tim Peters"'s message of "Sun, 3 Jun 2001 19:03:31 -0400" References: Message-ID: "Tim Peters" writes: > >> However, then an adversarial user can construct a case that never > >> terminates. > > > I seem to have done this - it was odd, though - it only loops when I > > bump the dict to fairly enormous preportions for reasons I don't > > really (want to) understand. > > Pass it on. I deliberately "started over" via a recursive call instead of a > goto so that an offending program would eventually die with a stack fault > instead of just running forever. So if you're seeing something run forever, > it may be a different problem. I left it running overnight, and it terminated! (with a KeyError). I can't say I really understand what's going on, but I'm in Exam Hell at the moment (for the last time! Yippee!), so don't have any spare cycles to think about it hard. 
Anyway, this is what I was running: dict = {} # let's force dict to malloc its table for i in range(1,10000): dict[i] = i hashcode = 0 class Machiavelli2: def __eq__(self, other): global hashcode d2 = dict.copy() dict.clear() hashcode += 1 for k,v in d2.items(): dict[k] = v return 1 def __hash__(self): return hashcode dict[Machiavelli2()] = Machiavelli2() print dict[Machiavelli2()] If you thought my last test case was contrived, I look forward to you finding adjectives for this one... Cheers, M. -- (ps: don't feed the lawyers: they just lose their fear of humans) -- Peter Wood, comp.lang.lisp From barry at digicool.com Mon Jun 4 21:42:34 2001 From: barry at digicool.com (Barry A. Warsaw) Date: Mon, 4 Jun 2001 15:42:34 -0400 Subject: [Python-Dev] Status of 2.0.1? Message-ID: <15131.58538.121723.671374@anthem.wooz.org> I've just fixed two buglets in the regression test suite for Python 2.0.1 (release20-maint branch). Now I get the following results from regrtest: 88 tests OK. 20 tests skipped: test_al test_audioop test_cd test_cl test_dbm test_dl test_gl test_imageop test_imgfile test_largefile test_linuxaudiodev test_minidom test_nis test_pyexpat test_rgbimg test_sax test_sunaudiodev test_timing test_winreg test_winsound Has anybody else tested out the 2.0.1 branch on anything? I'm going to run some quick tests with Mailman 2.0.x on Python 2.0.1 over the next hour or so. I'm just wondering what's left to do for this release, and how I can help out. -Barry From esr at thyrsus.com Mon Jun 4 22:11:14 2001 From: esr at thyrsus.com (Eric S. Raymond) Date: Mon, 4 Jun 2001 16:11:14 -0400 Subject: [Python-Dev] %b format? - 8088 like a PDP-11 I think not... In-Reply-To: <000201c0ed20$2f295c30$060210ac@private>; from barry@scottb.demon.co.uk on Mon, Jun 04, 2001 at 07:00:05PM +0100 References: <20010530183833.B1654@thyrsus.com> <000201c0ed20$2f295c30$060210ac@private> Message-ID: <20010604161114.A20979@thyrsus.com> Barry Scott : > Eric wrote: > > While I'm at it, I should note that the design of the 11 was ancestral > > to both the 8088 and 68000 microprocessors, and thus to essentially > > every new general-purpose computer designed in the last fifteen years. > > The key to PDP-11 and VAX was lots of registers all a like and rich > addressing modes for the instructions. > > The 8088 is very far from this design, its owes its design more to > 4004 then the PDP-11. Yes, but the 4004 was designed as a sort of lobotomized imitation of the 65xx, which was descended from the 11. Admiitedly, in the chain of transmission here were two stages of redesign so bad that the connection got really tenuous. -- Eric S. Raymond ...Virtually never are murderers the ordinary, law-abiding people against whom gun bans are aimed. Almost without exception, murderers are extreme aberrants with lifelong histories of crime, substance abuse, psychopathology, mental retardation and/or irrational violence against those around them, as well as other hazardous behavior, e.g., automobile and gun accidents." -- Don B. Kates, writing on statistical patterns in gun crime From skip at pobox.com Mon Jun 4 22:49:07 2001 From: skip at pobox.com (Skip Montanaro) Date: Mon, 4 Jun 2001 15:49:07 -0500 Subject: [Python-Dev] readline not getting built when .so symlink doesn't exist In-Reply-To: References: <15131.46977.861815.323386@beluga.mojam.com> Message-ID: <15131.62531.595208.65994@beluga.mojam.com> [my readline woes snipped] Michael> Hmm. Does compiling a proggie Michael> $ gcc foo.c -lreadline Michael> work? 
It doesn't here if I move libreadline.so & libreadline.a Michael> out of the way. Yup, it does: beluga:tmp% cc -o foo foo.c -lreadline -ltermcap beluga:tmp% ./foo >>sdfsdfsdf sdfsdfsdf (This after deleting both /lib/libreadline.so and /lib/libhistory.so.) In this case, foo.c is #include #include #include main() { printf("%s\n", readline(">>" )); } Michael> Do you need a readline-devel package or something? Got that. I just noticed that "rpm -q --whatprovides /lib/libreadline.so" does list readline-devel as the provider. I just reinstalled it using --force. Now the .so symlinks are there. Go figure... Oh well, probably ought to drop it unless another Mandrake user complains. I'm really amazed at how many packages Mandrake chose *not* to install even though I selected all the groups during install and was installing into fresh / and /usr partitions. I've been dribbling various packages in bit-by-bit as I've discovered omissions. In the past I've also noticed files apparently not installed even though the packages that were supposed to provide them were installed. Skip From guido at digicool.com Mon Jun 4 23:03:35 2001 From: guido at digicool.com (Guido van Rossum) Date: Mon, 04 Jun 2001 17:03:35 -0400 Subject: [Python-Dev] Re: What happened to Idle's extend.py? In-Reply-To: Your message of "Tue, 29 May 2001 02:15:07 EDT." References: Message-ID: <200106042103.RAA04077@cj20424-a.reston1.va.home.com> > > Idle-0.3, shipped with Python 1.5.2 had an extend.py module that was > > used to extend Idle. We've used this extensively, building entire > > "applications" as Idle extensions. > > > > Now that we're moving to Python 2.1, we find the same old directions > > for extending Idle (in extend.txt), but there appears to be no > > extend.py in Idle-0.8. > > > > Does anyone know how we can add extensions to Idle-0.8? It's simpler than before. Extensions are now loaded simply by being named in config.txt (or any of the other custom configuration files). For example, ZoomHeight.py is a very simple extension; it is loaded because of the line [ZoomHeight] somewhere in config.txt. The interface for extensions is the same as before; ZoomHeight.py hasn't changed since 1999. I'll update extend.txt. Can someone forward this to the original asker of the question, or to the list where it was posted? --Guido van Rossum (home page: http://www.python.org/~guido/) From skip at pobox.com Mon Jun 4 23:03:58 2001 From: skip at pobox.com (Skip Montanaro) Date: Mon, 4 Jun 2001 16:03:58 -0500 Subject: [Python-Dev] %b format? - 8088 like a PDP-11 I think not... In-Reply-To: <20010604161114.A20979@thyrsus.com> References: <20010530183833.B1654@thyrsus.com> <000201c0ed20$2f295c30$060210ac@private> <20010604161114.A20979@thyrsus.com> Message-ID: <15131.63422.695297.393477@beluga.mojam.com> Eric> Yes, but the 4004 was designed as a sort of lobotomized imitation Eric> of the 65xx, which was descended from the 11. Really? I was always under the impression the 4004 was considered the first microprocessor. The page below says that and gives a date of 1971 for it. I have no idea if the author is correct, just that what he says agrees with my memory. He does seem to have an impressive collection of old computer iron: http://www.piercefuller.com/collect/i4004/ I haven't found a statement about the origins of the 6502, but this page suggests that commercial computers were being made from 8080's before 6502's: http://www.speer.org/2backup/pcbs_pch.html Ah, wait a minute... 
This page: http://www.geocities.com/SiliconValley/Byte/6508/6502/english/versoes.htm says the 6502 was descended from the 6800. I'm getting less and less convinced that the 4004 somehow descended from the 65xx family. (Maybe we should shift this thread to the always entertaining folks at comp.arch... ;-) Skip From esr at thyrsus.com Mon Jun 4 23:19:08 2001 From: esr at thyrsus.com (Eric S. Raymond) Date: Mon, 4 Jun 2001 17:19:08 -0400 Subject: [Python-Dev] %b format? - 8088 like a PDP-11 I think not... In-Reply-To: <15131.63422.695297.393477@beluga.mojam.com>; from skip@pobox.com on Mon, Jun 04, 2001 at 04:03:58PM -0500 References: <20010530183833.B1654@thyrsus.com> <000201c0ed20$2f295c30$060210ac@private> <20010604161114.A20979@thyrsus.com> <15131.63422.695297.393477@beluga.mojam.com> Message-ID: <20010604171908.A21831@thyrsus.com> Skip Montanaro : > Really? I was always under the impression the 4004 was considered the first > microprocessor. The page below says that and gives a date of 1971 for it. First sentence is widely believed, but there was an earlier micro called the Star-8 designed at Burroughs that has been almost completely forgotten. I only know about it because I worked there in 1980 with one of the people who designed it. I think I had a brain fart and it's the Z80 that was descended from the 6502. I was going by a remark in some old lecture notes. I've got a copy of the definitive reference on history of computer architecture and will check. -- Eric S. Raymond "Extremism in the defense of liberty is no vice; moderation in the pursuit of justice is no virtue." -- Barry Goldwater (actually written by Karl Hess) From mwh at python.net Mon Jun 4 23:55:34 2001 From: mwh at python.net (Michael Hudson) Date: 04 Jun 2001 22:55:34 +0100 Subject: [Python-Dev] readline not getting built when .so symlink doesn't exist In-Reply-To: Skip Montanaro's message of "Mon, 4 Jun 2001 15:49:07 -0500" References: <15131.46977.861815.323386@beluga.mojam.com> <15131.62531.595208.65994@beluga.mojam.com> Message-ID: Skip Montanaro writes: > [my readline woes snipped] > > Michael> Hmm. Does compiling a proggie > > Michael> $ gcc foo.c -lreadline > > Michael> work? It doesn't here if I move libreadline.so & libreadline.a > Michael> out of the way. > > Yup, it does: > > beluga:tmp% cc -o foo foo.c -lreadline -ltermcap > beluga:tmp% ./foo > >>sdfsdfsdf > sdfsdfsdf > > (This after deleting both /lib/libreadline.so and /lib/libhistory.so.) Odd. What does the output of $ gcc -o foo foo.c -lreadline -ltermcap -Wl,--verbose look like? In particular the bit at the end where you get things like: attempt to open /usr/lib/gcc-lib/i386-redhat-linux/2.95.1/libreadline.so failed attempt to open /usr/lib/gcc-lib/i386-redhat-linux/2.95.1/libreadline.a failed attempt to open /usr/i386-redhat-linux/lib/libreadline.so failed attempt to open /usr/i386-redhat-linux/lib/libreadline.a failed attempt to open /usr/bin/../lib/libreadline.so succeeded -lreadline (/usr/bin/../lib/libreadline.so) (this is more for my personal curiosity than any important reason). > Got that. I just noticed that "rpm -q --whatprovides /lib/libreadline.so" > does list readline-devel as the provider. I just reinstalled it using > --force. Now the .so symlinks are there. Go figure... No :-) > Oh well, probably ought to drop it unless another Mandrake user complains. Sounds reasonable. Cheers, M. -- After a heavy night I travelled on, my face toward home - the comma being by no means guaranteed. 
-- paraphrased from cam.misc From tim.one at home.com Mon Jun 4 23:58:48 2001 From: tim.one at home.com (Tim Peters) Date: Mon, 4 Jun 2001 17:58:48 -0400 Subject: [Python-Dev] Re: What happened to Idle's extend.py? In-Reply-To: <200106042103.RAA04077@cj20424-a.reston1.va.home.com> Message-ID: [Guido] > Can someone forward this to the original asker of the question, or to > the list where it was posted? Done. Thanks! From skip at pobox.com Tue Jun 5 03:01:01 2001 From: skip at pobox.com (Skip Montanaro) Date: Mon, 4 Jun 2001 20:01:01 -0500 Subject: [Python-Dev] readline not getting built when .so symlink doesn't exist In-Reply-To: References: <15131.46977.861815.323386@beluga.mojam.com> <15131.62531.595208.65994@beluga.mojam.com> Message-ID: <15132.12109.914981.110774@beluga.mojam.com> >> (This after deleting both /lib/libreadline.so and >> /lib/libhistory.so.) Michael> Odd. What does the output of Michael> $ gcc -o foo foo.c -lreadline -ltermcap -Wl,--verbose Michael> look like? Well, what it looks like is "Skip's a dunce...". Turns out there was a libreadline.so symlink /usr/lib also. It found that. When I deleted that it found /usr/lib/libreadline.a. Getting rid of that caused the link to (finally) fail. With just the version-based .so files cc apparently can't do the trick. Sorry to have wasted the bandwidth. Skip From skip at pobox.com Tue Jun 5 03:16:00 2001 From: skip at pobox.com (Skip Montanaro) Date: Mon, 4 Jun 2001 20:16:00 -0500 Subject: [Python-Dev] %b format? - 8088 like a PDP-11 I think not... In-Reply-To: <20010604171908.A21831@thyrsus.com> References: <20010530183833.B1654@thyrsus.com> <000201c0ed20$2f295c30$060210ac@private> <20010604161114.A20979@thyrsus.com> <15131.63422.695297.393477@beluga.mojam.com> <20010604171908.A21831@thyrsus.com> Message-ID: <15132.13008.429800.585157@beluga.mojam.com> Eric> Skip Montanaro : >> Really? I was always under the impression the 4004 was considered >> the first microprocessor. The page below says that and gives a date >> of 1971 for it. Eric> First sentence is widely believed, but there was an earlier micro Eric> called the Star-8 designed at Burroughs that has been almost Eric> completely forgotten. There was also a GE-8 (I think that was the name) developed at GE's R&D Center in the early 1970's timeframe - long before my time there. It was apparently very competitive with the other microprocessors produced about that time but never saw the light of day. I suspect that was at least due in part to the fact that GE built mainframes back then. Skip From tim.one at home.com Tue Jun 5 06:07:27 2001 From: tim.one at home.com (Tim Peters) Date: Tue, 5 Jun 2001 00:07:27 -0400 Subject: [Python-Dev] another dict crasher In-Reply-To: Message-ID: [Michael Hudson, taking a break from exams] > I left it running overnight, and it terminated! (with a KeyError). I > can't say I really understand what's going on, but I'm in Exam Hell at > the moment (for the last time! Yippee!), so don't have any spare > cycles to think about it hard. Good luck! I really shouldn't tell you this now, but the real reason people dread turning 30, 40, 50, 60-- and so on --is that every 10th birthday starting at 30 they test you *again*! On every course you ever took. It's grueling. The penalty for failure is severe: flunk just one review exam, and they pick a date at random over the following 10 years for you to die. No point fighting it, it's just civilization's nasty little secret. 
This is why life expectancy correlates with education, but it does appear that the human limit for remembering both plane geometry and the names of hundreds of dead psychopaths is about 120 years. In the meantime, I built a test case to tickle stack overflow directly, and it does so quickly:

class Yuck:
    def __init__(self):
        self.i = 0

    def make_dangerous(self):
        self.i = 1

    def __hash__(self):
        # direct to slot 4 in table of size 8; slot 12 when size 16
        return 4 + 8

    def __eq__(self, other):
        if self.i == 0:
            # leave dict alone
            pass
        elif self.i == 1:
            # fiddle to 16 slots
            self.__fill_dict(6)
            self.i = 2
        else:
            # fiddle to 8 slots
            self.__fill_dict(4)
            self.i = 1
        return 1

    def __fill_dict(self, n):
        self.i = 0
        dict.clear()
        for i in range(n):
            dict[i] = i
        dict[self] = "OK!"

y = Yuck()
dict = {y: "OK!"}

z = Yuck()
y.make_dangerous()
print dict[z]

It just arranges to move y to a different slot in a different-sized table each time __eq__ is invoked, alternating between slot 4 in a size-8 table and slot 12 in a size-16 table. However, if I stick "print self.i" at the start of __eq__, it dies with a KeyError instead! That's why I'm mentioning it -- could be the same misdirection you're seeing. I can't account for the KeyError in any rational way: under Windows, it's actually hitting a stack overflow in the bowels of the system malloc() then. Windows "recovers" from that and presses on. Everything that happens after appears to be an accident. win98-as-usual-ly y'rs - tim PS: You'll be tested on this, too. From greg at cosc.canterbury.ac.nz Tue Jun 5 07:00:30 2001 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Tue, 05 Jun 2001 17:00:30 +1200 (NZST) Subject: [Python-Dev] One more dict trick In-Reply-To: <20010601032316.A15635@thyrsus.com> Message-ID: <200106050500.RAA02362@s454.cosc.canterbury.ac.nz> "Eric S. Raymond" : > I think it's significant that MMX > instructions and so forth entered the Intel line to support *games*, > not Navier-Stokes calculations. But when version 1.0 of FlashFlood! comes out, requiring high-quality real-time hydrodynamics simulation, Navier-Stokes calculations will suddenly become very important... Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg at cosc.canterbury.ac.nz +--------------------------------------+ From tim.one at home.com Tue Jun 5 07:18:50 2001 From: tim.one at home.com (Tim Peters) Date: Tue, 5 Jun 2001 01:18:50 -0400 Subject: [Python-Dev] strop vs. string In-Reply-To: <3B1B8B86.68E99328@STScI.Edu> Message-ID: [Paul Barrett] > From the discussion so far, it appears that the buffer object is > intended solely to support string-like objects. Unsure where that impression came from. Since buffers wrap a slice "of memory", they don't make much sense except where raw memory makes sense. That includes the guts of strings, but also (in the core distribution) memory-mapped files (the mmap module) and arrays (the array module), which also support the buffer interface. > I've seen no mention of their use for binary data objects, I mentioned two above. The use of buffers with mutable objects is dangerous, though, because of the dangling-pointer problem, and Python itself never uses buffers except for strings. Even arrays are stretching it; e.g.,

>>> import array
>>> a = array.array('i')
>>> a.append(2)
>>> a.append(3)
>>> a
array('i', [2, 3])
>>> b = buffer(a)
>>> len(b)
8
>>> [b[i] for i in range(len(b))]
['\x02', '\x00', '\x00', '\x00', '\x03', '\x00', '\x00', '\x00']
>>>

While of *some* conceivable use, that's not exactly destined to become wildly popular. > such as multidimensional arrays and matrices. Since core Python has no such things, of course it doesn't use buffers for those either. > Will the buffer object also support these objects? In what sense? If you have an implementation of such things, and believe that getting at raw memory slices is useful, sure -- fill in its tp_as_buffer slot. > ... > On the other hand, if yes, then I think the buffer C/API needs to be > reimplemented, Or do you mean redesigned? > because the current design/implementation falls far short of what I > would expect for a buffer object. First, it is overly complex: the > support for multiple buffers does not appear necessary. AFAICT it's entirely unused; everything in the core that supports the buffer interface returns a segment count of 1, and the buffer object itself appears to raise exceptions whenever it sees a reference to a segment other than "the first". I don't know why it's there. > Second, the dangling pointer issue has not been resolved. I expect Greg will fix that now. > I suggest the addition of a lock flag which indicates that the data is > currently inaccessible, ie. that data and/or data pointer is in the > process of being modified. To sell that (but please save it for the PEP) I expect you have to provide some compelling uses for it. The current uses have no need of it. In the absence of specific good uses, I'm afraid it just sounds like another variant of "I can't prove segments *won't* be useful, so let's toss them in too!". > I would suggest the following structure to be much more useful for > char and binary data:
>
> typedef struct {
>     char* rf_pointer;
>     int rf_length;
>     int rf_access;   /* read, write, etc. */
>     int rf_lock;     /* data is in use */
>     int rf_flags;    /* type of data; char, binary, unicode, etc. */
> } PyBufferProcs;
>
> But I'm guessing my proposal is way off base. Depends on what you want to do. You've only mentioned multidimensional arrays, and the need for umpteen flavors of access control there, beyond the current object's b_readonly flag, is simply unclear. Also unclear why you've dropped the current object's b_base pointer: without it, the buffer has no way to get back to the object from which the memory is borrowed, nor even a guarantee that the object won't die while the buffer is still active. If you do pursue this, please please please boost the rf_length field! An int is too small to hold real-life sizes anymore, and "large files" are becoming common even on 32-bit boxes. Python needs to grow a wholly supported way to pass 8-byte ints around (and it looks like I'll be adding that to the struct module, possibly to the array module and marshal too). > If I find some time, I'll prepare a PEP to air these issues, since > they are very important to those of us working on and with > multidimensional arrays. We find the current buffer API lacking. A PEP is always a good idea.
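To make the dangling-pointer hazard Tim mentions concrete, here is a small illustration (added here as a sketch; it is not code from the thread, and the exact symptoms depend on platform and malloc). A buffer borrows a pointer into a mutable object's memory when it is created, so growing the object can leave the buffer pointing at freed memory:

    import array

    a = array.array('i', [2, 3])
    b = buffer(a)            # b holds a pointer into a's internal memory

    for i in range(10000):
        a.append(i)          # appends may realloc() and move that memory

    # b may still read through the old pointer; what this prints
    # (or whether it crashes) is anybody's guess.
    print [b[i] for i in range(8)]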
From aahz at rahul.net Tue Jun 5 07:41:28 2001 From: aahz at rahul.net (Aahz Maruch) Date: Mon, 4 Jun 2001 22:41:28 -0700 (PDT) Subject: [Python-Dev] strop vs. string In-Reply-To: from "Tim Peters" at Jun 05, 2001 01:18:50 AM Message-ID: <20010605054129.933C199C83@waltz.rahul.net> Tim Peters wrote: > > If you do pursue this, please please please boost the rf_length field! An > int is too small to hold real-life sizes anymore, and "large files" are > becoming common even on 32-bit boxes. Python needs to grow a wholly > supported way to pass 8-byte ints around (and it looks like I'll be adding > that to the struct module, possibly to the array module and marshal too). Hey! Are you discriminating against 128-bit ints? -- --- Aahz (@pobox.com) Hugs and backrubs -- I break Rule 6 <*> http://www.rahul.net/aahz/ Androgynous poly kinky vanilla queer het Pythonista I don't really mind a person having the last whine, but I do mind someone else having the last self-righteous whine. From tim.one at home.com Tue Jun 5 07:53:26 2001 From: tim.one at home.com (Tim Peters) Date: Tue, 5 Jun 2001 01:53:26 -0400 Subject: [Python-Dev] One more dict trick In-Reply-To: <20010601032316.A15635@thyrsus.com> Message-ID: [Eric S. Raymond] > ... > So maybe there's a market for 128-bit floats after all. I think very small. There's a much larger market for 128-bit float *registers*, though -- in the "treat it as 2 64-bit, or 4 32-bit, floats, and operate on them in parallel" sense. That's the baby vector register view, and is already happening. > I'm still skeptical about how likely those applications are to > influence the architecture of general-purpose processors. I saw a > study once that said heavy-duty scientific floating point only > accounts for about 2% of the computing market -- and I think it's > significant that MMX instructions and so forth entered the Intel > line to support *games*, not Navier-Stokes calculations. Heh. I used to wonder about that, but not any more: games may have no more than entertainment (sometimes disguised as education) in mind, but what do the latest & greatest games do? Strive to simulate physical reality (sometimes with altered physical laws), just as closely as possible. Whether it's ray-tracing, effective motion-compression, or N-body simulations, games are easily as demanding as what computational chemists do. A difference is that general-purpose *compilers* aren't being taught how to use these "new" architectural gimmicks. All that new hardware sits unused unless you've got an app dipping into assembler, or into a hand-coded utility library written in assembler. The *general* market for pure floating-point can barely support what's left of the supercomputer industry anymore (btw, Cray never became a billion-dollar company even in its heyday, and what's left of them gets passed around for peanuts now). > That 2% will have to get a lot bigger before I can see Intel doubling > its word size again. It's not just the processor design; the word size > has huge implications for buses, memory controllers, and the whole > system architecture. Intel is just now getting its feet wet with 64-bit boxes. That was old news to me 20 years ago. All I hope to see 20 years from now is that somewhere along the way I got smart enough to drop computers and get a real life. by-then-the-whole-system-will-exist-in-the-superposition-of-a- single-plutonium-atom's-states-anyway-ly y'rs - tim From tim.one at home.com Tue Jun 5 07:55:48 2001 From: tim.one at home.com (Tim Peters) Date: Tue, 5 Jun 2001 01:55:48 -0400 Subject: [Python-Dev] strop vs. string In-Reply-To: <20010605054129.933C199C83@waltz.rahul.net> Message-ID: [Aahz] > Hey! Are you discriminating against 128-bit ints? Nope! I'm Guido's marketing guy: 128-bit ints will be the killer reason you need to upgrade to Python 3000, when the time comes. Python didn't get to where it is by giving away all the good stuff early. From MarkH at ActiveState.com Tue Jun 5 09:10:53 2001 From: MarkH at ActiveState.com (Mark Hammond) Date: Tue, 5 Jun 2001 17:10:53 +1000 Subject: [Python-Dev] strop vs. string In-Reply-To: <3B1B8B86.68E99328@STScI.Edu> Message-ID: > complex: the support for multiple buffers does not appear necessary. I seem to recall Guido telling me once that this was implemented for NumPy, specifically for some of their matrices. Not being a user of that package means that unfortunately I cannot be any more specific... I am confident Guido will recall the specific details... Mark. From mwh at python.net Tue Jun 5 10:39:24 2001 From: mwh at python.net (Michael Hudson) Date: Tue, 5 Jun 2001 09:39:24 +0100 (BST) Subject: [Python-Dev] another dict crasher In-Reply-To: Message-ID: Haven't run your example yet as my machine's not on at the moment. On Tue, 5 Jun 2001, Tim Peters wrote: > However, if I stick "print self.i" at the start of __eq__, it dies > with a KeyError instead! That's why I'm mentioning it -- could be the > same misdirection you're seeing. I can't account for the KeyError in > any rational way: under Windows, it's actually hitting a stack > overflow in the bowels of the system malloc() then. Hmm. It's quite likely that PyMem_Malloc (or whatever) crapping out and returning NULL will get turned into a MemoryError, which will then get turned into a KeyError, isn't it? I could believe that malloc would set up some fancy sigsegv-type handlers for memory management purposes which then get called when it tramples all over the end of the stack. But I'm making this up as I go along... > Windows "recovers" from that and presses on. Everything that happens > after appears to be an accident. > > win98-as-usual-ly y'rs - tim Well, linux seems to be similarly inscrutable here. One problem is that this is a pig to run under the debugger - setting a breakpoint on lookdict isn't a terribly interesting way to spend your time. I suppose you could just set the breakpoint on the recursive call... later. > PS: You'll be tested on this, too. Oh, piss off. Cheers, M. From guido at digicool.com Tue Jun 5 11:07:34 2001 From: guido at digicool.com (Guido van Rossum) Date: Tue, 05 Jun 2001 05:07:34 -0400 Subject: [Python-Dev] Happy event Message-ID: <200106050907.FAA08198@cj20424-a.reston1.va.home.com> I just wanted to send a note about a happy event in the Python family. Jeremy Hylton and his wife became the proud parents of twin girls on Sunday June 3rd. Please join Pythonlabs and Digital Creations in congratulating them, and wishing them much joy and luck. Also, don't expect Jeremy to be too responsive to email for the next 6-8 weeks. :) --Guido van Rossum (home page: http://www.python.org/~guido/) From uche.ogbuji at fourthought.com Tue Jun 5 14:28:45 2001 From: uche.ogbuji at fourthought.com (Uche Ogbuji) Date: Tue, 05 Jun 2001 06:28:45 -0600 Subject: [Python-Dev] One more dict trick In-Reply-To: Message from Greg Ewing of "Tue, 05 Jun 2001 17:00:30 +1200." <200106050500.RAA02362@s454.cosc.canterbury.ac.nz> Message-ID: <200106051228.f55CSjk18336@localhost.local> > "Eric S. Raymond" : > > > I think it's significant that MMX > > instructions and so forth entered the Intel line to support *games*, > > not Navier-Stokes calculations. > > But when version 1.0 of FlashFlood!
comes out, requiring > high-quality real-time hydrodynamics simulation, > Navier-Stokes calculations will suddenly become very > important... Shoot, I thought that was what Microsoft Hailstorm was all about. Path integrals about the atmospheric isobars, and all that... -- Uche Ogbuji Principal Consultant uche.ogbuji at fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA XML strategy, XML tools (http://4Suite.org), knowledge management From uche.ogbuji at fourthought.com Tue Jun 5 14:32:07 2001 From: uche.ogbuji at fourthought.com (Uche Ogbuji) Date: Tue, 05 Jun 2001 06:32:07 -0600 Subject: [Python-Dev] Happy event In-Reply-To: Message from Guido van Rossum of "Tue, 05 Jun 2001 05:07:34 EDT." <200106050907.FAA08198@cj20424-a.reston1.va.home.com> Message-ID: <200106051232.f55CW7618353@localhost.local> > I just wanted to send a note about a happy event in the Python family. > Jeremy Hylton and his wife became the proud parents of twin girls on > Sunday June 3rd. Please join Pythonlabs and Digital Creations in > congratulating them, and wishing them much joy and luck. > > Also, don't expect Jeremy to be too responsive to email for the next > 6-8 weeks. :) *twin* girls? Try 6-8 years. Congrats and felicits of the highest order, of course, Jeremy. -- Uche Ogbuji Principal Consultant uche.ogbuji at fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA XML strategy, XML tools (http://4Suite.org), knowledge management From Barrett at stsci.edu Tue Jun 5 14:53:46 2001 From: Barrett at stsci.edu (Paul Barrett) Date: Tue, 05 Jun 2001 08:53:46 -0400 Subject: [Python-Dev] Happy event References: <200106051232.f55CW7618353@localhost.local> Message-ID: <3B1CD65A.595E8CD@STScI.Edu> Uche Ogbuji wrote: > > > I just wanted to send a note about a happy event in the Python family. > > Jeremy Hylton and his wife became the proud parents of twin girls on > > Sunday June 3rd. Please join Pythonlabs and Digital Creations in > > congratulating them, and wishing them much joy and luck. > > > > Also, don't expect Jeremy to be too responsive to email for the next > > 6-8 weeks. :) > > *twin* girls? Try 6-8 years. > > Congrats and felicits of the highest order, of course, Jeremy. Actually girls are fine until about 13, after that I expect Jeremy won't be too responsive. Something about hormones and such. In any case, all the best, Jeremy! -- Paul Barrett, PhD Space Telescope Science Institute Phone: 410-338-4475 ESS/Science Software Group FAX: 410-338-4767 Baltimore, MD 21218 From aahz at rahul.net Tue Jun 5 16:41:10 2001 From: aahz at rahul.net (Aahz Maruch) Date: Tue, 5 Jun 2001 07:41:10 -0700 (PDT) Subject: [Python-Dev] Happy event In-Reply-To: <3B1CD65A.595E8CD@STScI.Edu> from "Paul Barrett" at Jun 05, 2001 08:53:46 AM Message-ID: <20010605144110.DD90C99C84@waltz.rahul.net> Paul Barrett wrote: > Uche Ogbuji wrote: >> Guido: >>> >>> Also, don't expect Jeremy to be too responsive to email for the next >>> 6-8 weeks. :) >> >> *twin* girls? Try 6-8 years. > > Actually girls are fine until about 13, after that I expect Jeremy > won't be too responsive. Something about hormones and such. Are you trying to imply that there's a difference between girls and boys? 
compressing-a-five-screen-rant-down-to-a-single-sentence-ly y'rs -- --- Aahz (@pobox.com) Hugs and backrubs -- I break Rule 6 <*> http://www.rahul.net/aahz/ Androgynous poly kinky vanilla queer het Pythonista I don't really mind a person having the last whine, but I do mind someone else having the last self-righteous whine. From esr at thyrsus.com Tue Jun 5 16:55:59 2001 From: esr at thyrsus.com (Eric S. Raymond) Date: Tue, 5 Jun 2001 10:55:59 -0400 Subject: [Python-Dev] Happy event In-Reply-To: <20010605144110.DD90C99C84@waltz.rahul.net>; from aahz@rahul.net on Tue, Jun 05, 2001 at 07:41:10AM -0700 References: <3B1CD65A.595E8CD@STScI.Edu> <20010605144110.DD90C99C84@waltz.rahul.net> Message-ID: <20010605105559.A28963@thyrsus.com> Aahz Maruch : > Paul Barrett wrote: > > Uche Ogbuji wrote: > >> Guido: > >>> > >>> Also, don't expect Jeremy to be too responsive to email for the next > >>> 6-8 weeks. :) > >> > >> *twin* girls? Try 6-8 years. > > > > Actually girls are fine until about 13, after that I expect Jeremy > > won't be too responsive. Something about hormones and such. > > Are you trying to imply that there's a difference between girls and > boys? Of course there's a difference. Girls, er, *mature* sooner. Congratulations, Jeremy! -- Eric S. Raymond If I were to select a jack-booted group of fascists who are perhaps as large a danger to American society as I could pick today, I would pick BATF [the Bureau of Alcohol, Tobacco, and Firearms]. -- U.S. Representative John Dingell, 1980 From pedroni at inf.ethz.ch Tue Jun 5 17:05:03 2001 From: pedroni at inf.ethz.ch (Samuele Pedroni) Date: Tue, 5 Jun 2001 17:05:03 +0200 (MET DST) Subject: [Python-Dev] Happy event Message-ID: <200106051505.RAA24810@core.inf.ethz.ch> > Subject: Re: [Python-Dev] Happy event > To: Barrett at stsci.edu (Paul Barrett) > Cc: python-dev at python.org > MIME-Version: 1.0 > Content-Transfer-Encoding: 7bit > From: aahz at rahul.net (Aahz Maruch) > X-BeenThere: python-dev at python.org > X-Mailman-Version: 2.0.5 (101270) > List-Help: > List-Post: > List-Subscribe: , > List-Id: Python core developers > List-Unsubscribe: , > List-Archive: > Date: Tue, 5 Jun 2001 07:41:10 -0700 (PDT) > > Paul Barrett wrote: > > Uche Ogbuji wrote: > >> Guido: > >>> > >>> Also, don't expect Jeremy to be too responsive to email for the next > >>> 6-8 weeks. :) > >> > >> *twin* girls? Try 6-8 years. > > > > Actually girls are fine until about 13, after that I expect Jeremy > > won't be too responsive. Something about hormones and such. > > Are you trying to imply that there's a difference between girls and > boys? > > compressing-a-five-screen-rant-down-to-a-single-sentence-ly y'rs > -- The simple fact that we are still moving from the previous bad habit of considering them different to considering them equal itself produces differences. A neutral view-point would be: the N/S ratio between gender-physiological differences and the overall interpersonal differences is very big, at least when considering the whole personality and not single aspects. There is no established truth; we are just longing for equilibrium: in the actual transition phase boys and girls are under different kinds of cultural tensions related to self-identification, etc. ... this creates differences. regards, Samuele Pedroni. From aahz at rahul.net Tue Jun 5 17:17:38 2001 From: aahz at rahul.net (Aahz Maruch) Date: Tue, 5 Jun 2001 08:17:38 -0700 (PDT) Subject: [Python-Dev] Happy event In-Reply-To: <20010605105559.A28963@thyrsus.com> from "Eric S. Raymond" at Jun 05, 2001 10:55:59 AM Message-ID: <20010605151739.3864199C83@waltz.rahul.net> Eric S. Raymond wrote: > Aahz Maruch : >> >> Are you trying to imply that there's a difference between girls and >> boys? > > Of course there's a difference. Girls, er, *mature* sooner. Not legally. -- --- Aahz (@pobox.com) Hugs and backrubs -- I break Rule 6 <*> http://www.rahul.net/aahz/ Androgynous poly kinky vanilla queer het Pythonista I don't really mind a person having the last whine, but I do mind someone else having the last self-righteous whine. From esr at thyrsus.com Tue Jun 5 17:30:08 2001 From: esr at thyrsus.com (Eric S. Raymond) Date: Tue, 5 Jun 2001 11:30:08 -0400 Subject: [Python-Dev] Happy event In-Reply-To: <20010605151739.3864199C83@waltz.rahul.net>; from aahz@rahul.net on Tue, Jun 05, 2001 at 08:17:38AM -0700 References: <20010605105559.A28963@thyrsus.com> <20010605151739.3864199C83@waltz.rahul.net> Message-ID: <20010605113008.A29236@thyrsus.com> Aahz Maruch : > Eric S. Raymond wrote: > > Aahz Maruch : > >> > >> Are you trying to imply that there's a difference between girls and > >> boys? > > > > Of course there's a difference. Girls, er, *mature* sooner. > > Not legally. My point was that the hormone thing is likely to be an issue sooner with twin girls. Hey, Jeremy...fraternal or identical? -- Eric S. Raymond What is a magician but a practicing theorist? -- Obi-Wan Kenobi, 'Return of the Jedi' From guido at digicool.com Tue Jun 5 19:21:32 2001 From: guido at digicool.com (Guido van Rossum) Date: Tue, 05 Jun 2001 13:21:32 -0400 Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects? Message-ID: <200106051721.f55HLW729400@odiug.digicool.com> While thinking about metatypes, I had an interesting idea. In PEP 252 and 253 (which still need much work, please bear with me!) I describe making classes and types more similar to each other. In particular, you'll be able to subclass built-in object types in much the same way as you can subclass user-defined classes today. One nice property of classes is that a class is a factory function for its instances; in other words, if C is a class, C() returns a C instance. Now, for built-in types, it makes sense to do the same. In my current prototype, after "from types import *", DictType() returns an empty dictionary and ListType() returns an empty list. It would be nice to take this much further: IntType() could return an integer, TupleType() could return a tuple, StringType() could return a string, and so on. These are immutable types, so to make this useful, these constructors need to take an argument to specify a specific value. What should the type of such an argument be? It's not very interesting to require that int(x) takes an integer argument! Most of the popular standard types already have a constructor function that's named after their type: int(), long(), float(), complex(), str(), unicode(), tuple(), list() We could make the constructor take the same argument(s) as the corresponding built-in function. Now invoke the Zen of Python: "There should be one-- and preferably only one --obvious way to do it." So why not make these built-in functions *be* the corresponding types? Then instead of

    >>> int
    <built-in function int>

you would see

    >>> int
    <type 'int'>

but otherwise the behavior would be identical. (Note that I don't require that a factory function returns a *new* object each time.) If we did this for all built-in types, we'd have to add maybe a dozen new built-in names -- I think that's no big deal and actually helps naming types.
The types module, with its awkward names and usage, can be deprecated. There are details to be worked out, e.g.

- Do we really want to have built-in names for code objects, traceback objects, and other figments of Python's internal workings?

- What should the argument to dict() be? A list of (key, value) pairs, a list of alternating keys and values, or something else?

- What else?

Comments? --Guido van Rossum (home page: http://www.python.org/~guido/) From fredrik at pythonware.com Tue Jun 5 19:34:35 2001 From: fredrik at pythonware.com (Fredrik Lundh) Date: Tue, 5 Jun 2001 19:34:35 +0200 Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects? References: <200106051721.f55HLW729400@odiug.digicool.com> Message-ID: <001301c0ede5$cb804a10$e46940d5@hagrid> guido wrote: > Now invoke the Zen of Python: "There should be one-- and preferably > only one --obvious way to do it." So why not make these built-in > functions *be* the corresponding types? +1 from here. > - Do we really want to have built-in names for code objects, traceback > objects, and other figments of Python's internal workings? nope. > - What should the argument to dict() be? A list of (key, value) > pairs, a list of alternating keys and values, or something else? how about supporting the following:

    d == dict(d.items())
    d == dict(d.keys(), d.values())

and also:

    d = dict(k=v, k=v, ...)

Cheers /F From ping at lfw.org Tue Jun 5 19:41:22 2001 From: ping at lfw.org (Ka-Ping Yee) Date: Tue, 5 Jun 2001 12:41:22 -0500 (CDT) Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects? In-Reply-To: <200106051721.f55HLW729400@odiug.digicool.com> Message-ID: On Tue, 5 Jun 2001, Guido van Rossum wrote: > Now invoke the Zen of Python: "There should be one-- and preferably > only one --obvious way to do it." So why not make these built-in > functions *be* the corresponding types? Then instead of > > >>> int > <built-in function int> > > you would see > > >>> int > <type 'int'> I'm all in favour of this. In fact, i had the impression that you were planning to do exactly this all along. I seem to recall some conversation about this a long time ago -- am i dreaming? > If we did this for all built-in types, we'd have to add maybe a dozen > new built-in names -- I think that's no big deal and actually helps > naming types. The types module, with its awkward names and usage, can > be deprecated. I would love this. > - Do we really want to have built-in names for code objects, traceback > objects, and other figments of Python's internal workings? Perhaps we would only provide built-in names for objects that are commonly constructed. For things like code objects that are never user-constructed, their type objects could be set aside in a module. > - What should the argument to dict() be? A list of (key, value) > pairs, a list of alternating keys and values, or something else? A list of (key, value) pairs. It's the only sensible choice, given that dict.items() is the obvious way to get all the information out of a dictionary into a list. -- ?!ng
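A sketch of the items() round trip Ping describes (an added illustration, not code from the thread; dict() itself took no such argument at the time, so a helper stands in for it):

    def dict_from_items(items):
        # build a dictionary from a list of (key, value) pairs
        d = {}
        for k, v in items:
            d[k] = v
        return d

    d = {'a': 97, 'b': 98}
    assert dict_from_items(d.items()) == d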
From aahz at rahul.net Tue Jun 5 19:40:27 2001 From: aahz at rahul.net (Aahz Maruch) Date: Tue, 5 Jun 2001 10:40:27 -0700 (PDT) Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects? In-Reply-To: <200106051721.f55HLW729400@odiug.digicool.com> from "Guido van Rossum" at Jun 05, 2001 01:21:32 PM Message-ID: <20010605174027.17A4199C83@waltz.rahul.net> I'm +1 on the general concept; I think it will make explaining Python easier in the long run. I'm not competent to vote on the details, but I'll complain if something seems too confused to me. Currently in the Decimal class I'm working on, I can take any of the following types in the constructor: Decimal, tuple, string, int, float. I'm wondering whether that approach makes sense, that any "compatible" type should be accepted in an explicit constructor. So for your question about dict(), perhaps any sequence/iterator type that returns 2-element sequences would be accepted. -- --- Aahz (@pobox.com) Hugs and backrubs -- I break Rule 6 <*> http://www.rahul.net/aahz/ Androgynous poly kinky vanilla queer het Pythonista I don't really mind a person having the last whine, but I do mind someone else having the last self-righteous whine. From donb at abinitio.com Tue Jun 5 19:50:34 2001 From: donb at abinitio.com (Donald Beaudry) Date: Tue, 05 Jun 2001 13:50:34 -0400 Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects? References: <200106051721.f55HLW729400@odiug.digicool.com> Message-ID: <200106051750.NAA25458@localhost.localdomain> Guido van Rossum wrote, > Now invoke the Zen of Python: "There should be one-- and preferably > only one --obvious way to do it." So why not make these built-in > functions *be* the corresponding types? I like it! > but otherwise the behavior would be identical. (Note that I don't > require that a factory function returns a *new* object each time.) Of course... singletons (which would also break that requirement) are quite useful. > If we did this for all built-in types, we'd have to add maybe a dozen > new built-in names -- I think that's no big deal and actually helps > naming types. The types module, with its awkward names and usage, can > be deprecated. > > There are details to be worked out, e.g. > > - Do we really want to have built-in names for code objects, traceback > objects, and other figments of Python's internal workings? I don't think so. Having easy access to these things might be good, but since they are implementation specific it might be best to discourage their use by putting them somewhere more implementation specific, like the new module or even sys. > - What should the argument to dict() be? A list of (key, value) > pairs, a list of alternating keys and values, or something else? At a minimum, I'd like to see a list of key/value tuples. I seem to find myself reconstructing dicts from the .items() of other dicts. For 'something else', I'd like to be able to pass keyword arguments to initialize the new dict. Going really crazy, I'd like to be able to pass a dict as an argument to dict()... just another way to spell copy, but combined with keywords, it would be more like copy followed by an update. > - What else? Well, since you are asking ;) I haven't read the PEP, so perhaps I shouldn't be commenting just yet, but... I'd hope that the built-in types are sub-classable from C as well as from Python. This is most interesting for types like instance, class, method, but I can imagine reasons for doing it to tuple, list, dict, and even int. > Comments? Fantastic! -- Donald Beaudry Ab Initio Software Corp. 201 Spring Street donb at init.com Lexington, MA 02421 ...Will hack for sushi... From mal at lemburg.com Tue Jun 5 19:53:18 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Tue, 05 Jun 2001 19:53:18 +0200 Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects? References: <200106051721.f55HLW729400@odiug.digicool.com> Message-ID: <3B1D1C8E.B7770419@lemburg.com> Guido van Rossum wrote: > > While thinking about metatypes, I had an interesting idea. > > In PEP 252 and 253 (which still need much work, please bear with me!) > I describe making classes and types more similar to each other. In > particular, you'll be able to subclass built-in object types in much > the same way as you can subclass user-defined classes today. One nice > property of classes is that a class is a factory function for its > instances; in other words, if C is a class, C() returns a C instance. > > Now, for built-in types, it makes sense to do the same. In my current > prototype, after "from types import *", DictType() returns an empty > dictionary and ListType() returns an empty list. It would be nice > to take this much further: IntType() could return an integer, TupleType() > could return a tuple, StringType() could return a string, and so on. > These are immutable types, so to make this useful, these constructors > need to take an argument to specify a specific value. What should the > type of such an argument be? It's not very interesting to require > that int(x) takes an integer argument! > > Most of the popular standard types already have a constructor function > that's named after their type: > > int(), long(), float(), complex(), str(), unicode(), tuple(), list() > > We could make the constructor take the same argument(s) as the > corresponding built-in function. > > Now invoke the Zen of Python: "There should be one-- and preferably > only one --obvious way to do it." So why not make these built-in > functions *be* the corresponding types? Then instead of
>
> >>> int
> <built-in function int>
>
> you would see
>
> >>> int
> <type 'int'>
>
> but otherwise the behavior would be identical. (Note that I don't > require that a factory function returns a *new* object each time.) -1 While this looks cute, I think it would break a lot of introspection code or other code which special cases Python functions for some reason since type(int) would no longer return types.BuiltinFunctionType. If you don't like the names, why not take the chance and create a new module which then exposes the Python class hierarchy (much like we did with the exceptions.py module before it was integrated as a C module) ?! > If we did this for all built-in types, we'd have to add maybe a dozen > new built-in names -- I think that's no big deal and actually helps > naming types. The types module, with its awkward names and usage, can > be deprecated. > > There are details to be worked out, e.g. > > - Do we really want to have built-in names for code objects, traceback > objects, and other figments of Python's internal workings? Not really. > - What should the argument to dict() be? A list of (key, value) > pairs, a list of alternating keys and values, or something else? As a function, I'd say: take either a sequence of tuples or another dictionary as argument. mxTools already has such a function, BTW. -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/
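A concrete rendering of the introspection breakage M.-A. is pointing at, under Python 2.1 semantics (an added, hypothetical illustration, not code from the thread):

    import types

    # Today this prints 1: int is a built-in function.
    print type(int) is types.BuiltinFunctionType

    # Under the proposal, type(int) would be the type-object type
    # instead, so any special-casing keyed on BuiltinFunctionType
    # would silently stop matching.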
From skip at pobox.com Tue Jun 5 20:12:09 2001 From: skip at pobox.com (Skip Montanaro) Date: Tue, 5 Jun 2001 13:12:09 -0500 Subject: [Python-Dev] Shouldn't I be able to print Unicode objects? Message-ID: <15133.8441.983687.572159@beluga.mojam.com> Just catching up on a little c.l.py and I noticed the effbot's response to the Unicode degree inquiry. I tried to create and print one and got this:

% python
Python 2.1.1a1 (#9, Jun 4 2001, 11:32:33)
[GCC 2.96 20000731 (Linux-Mandrake 8.0 2.96-0.48mdk)] on linux2
Type "copyright", "credits" or "license" for more information.
>>> u"\N{DEGREE SIGN}"
u'\xb0'
>>> print u"\N{DEGREE SIGN}"
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeError: ASCII encoding error: ordinal not in range(128)

Shouldn't I be able to print arbitrary Unicode objects? What am I missing (this time)? Skip From mwh at python.net Tue Jun 5 20:16:52 2001 From: mwh at python.net (Michael Hudson) Date: 05 Jun 2001 19:16:52 +0100 Subject: [Python-Dev] Shouldn't I be able to print Unicode objects? In-Reply-To: Skip Montanaro's message of "Tue, 5 Jun 2001 13:12:09 -0500" References: <15133.8441.983687.572159@beluga.mojam.com> Message-ID: Skip Montanaro writes: > Just catching up on a little c.l.py and I noticed the effbot's response to > the Unicode degree inquiry. I tried to create and print one and got this: > > % python > Python 2.1.1a1 (#9, Jun 4 2001, 11:32:33) > [GCC 2.96 20000731 (Linux-Mandrake 8.0 2.96-0.48mdk)] on linux2 > Type "copyright", "credits" or "license" for more information. > >>> u"\N{DEGREE SIGN}" > u'\xb0' > >>> print u"\N{DEGREE SIGN}" > > Traceback (most recent call last): > File "<stdin>", line 1, in ? > UnicodeError: ASCII encoding error: ordinal not in range(128) > > Shouldn't I be able to print arbitrary Unicode objects? What am I missing > (this time)? The encoding:

>>> print u"\N{DEGREE SIGN}".encode("latin1")
?

Cheers, Skippy's little helper. -- In case you're not a computer person, I should probably point out that "Real Soon Now" is a technical term meaning "sometime before the heat-death of the universe, maybe". -- Scott Fahlman From guido at digicool.com Tue Jun 5 20:26:22 2001 From: guido at digicool.com (Guido van Rossum) Date: Tue, 05 Jun 2001 14:26:22 -0400 Subject: [Python-Dev] SourceForge Python Foundry needs help Message-ID: <200106051826.f55IQMS29540@odiug.digicool.com> The Python Foundry at SF could use a hand. If you're interested in helping out, please write to Chuck Esterbrook, below! --Guido van Rossum (home page: http://www.python.org/~guido/) ------- Forwarded Message Date: Tue, 05 Jun 2001 14:12:07 -0400 From: Chuck Esterbrook To: guido at python.org Subject: SourceForge Python Foundry Hi Guido, I'm one of the admins of the SourceForge Python Foundry. In case you're not familiar with them, foundries are simply SF web portals centered around a particular topic. Admins can customize the HTML text and graphics and SourceForge stats are integrated on the side. I haven't had much time to give the Python Foundry the attention it deserves. I was wondering if you knew of anyone who had the inclination, time and energy to join the Foundry as an admin and expand it. If it becomes strong enough, we could possibly get it featured on the sidebar of the main SF page, which would then bring more attention to Python and its related projects. The foundry is at: http://sourceforge.net/foundry/python-foundry/ - -Chuck ------- End of Forwarded Message From barry at digicool.com Tue Jun 5 20:31:12 2001 From: barry at digicool.com (Barry A. Warsaw) Date: Tue, 5 Jun 2001 14:31:12 -0400 Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects?
References: <200106051721.f55HLW729400@odiug.digicool.com> Message-ID: <15133.9584.871074.255497@anthem.wooz.org> >>>>> "GvR" == Guido van Rossum writes: GvR> Now invoke the Zen of Python: "There should be one-- and GvR> preferably only one --obvious way to do it." So why not make GvR> these built-in functions *be* the corresponding types? Then GvR> instead of >> int GvR> GvR> you would see >> int GvR> +1 GvR> but otherwise the behavior would be identical. (Note that I GvR> don't require that a factory function returns a *new* object GvR> each time.) GvR> If we did this for all built-in types, we'd have to add maybe GvR> a dozen new built-in names -- I think that's no big deal and GvR> actually helps naming types. The types module, with its GvR> awkward names and usage, can be deprecated. I'm a little concerned about this, since the names that would be added are probably in common use as variable and/or argument names. I.e. At one point `list' was a very common identifier in Mailman, and I'm sure `dict' is used quite often still. I guess this would be okay as long as working code doesn't break because of it. OTOH, I've had fewer needs for a dict builtin (though not non-zero), and easily zero needs for traceback objects, code objects, etc. GvR> There are details to be worked out, e.g. GvR> - Do we really want to have built-in names for code objects, GvR> traceback objects, and other figments of Python's internal GvR> workings? I'd say no. However, we could probably C-ify the types module, a la, the exceptions module, and that would be the logical place to put the type factories. GvR> - What should the argument to dict() be? A list of (key, GvR> value) pairs, a list of alternating keys and values, or GvR> something else? You definitely want to at least accept a sequence of key/value 2-tuples, so that d.items() can be retransformed into a dictionary object. -Barry From guido at digicool.com Tue Jun 5 20:38:23 2001 From: guido at digicool.com (Guido van Rossum) Date: Tue, 05 Jun 2001 14:38:23 -0400 Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects? In-Reply-To: Your message of "Tue, 05 Jun 2001 14:31:12 EDT." <15133.9584.871074.255497@anthem.wooz.org> References: <200106051721.f55HLW729400@odiug.digicool.com> <15133.9584.871074.255497@anthem.wooz.org> Message-ID: <200106051838.f55IcNk29624@odiug.digicool.com> > I'm a little concerned about this, since the names that would be added > are probably in common use as variable and/or argument names. I.e. At > one point `list' was a very common identifier in Mailman, and I'm sure > `dict' is used quite often still. I guess this would be okay as long > as working code doesn't break because of it. It would be hard to see how this would break code, since built-ins are searched *after* all variables that the user defines. --Guido van Rossum (home page: http://www.python.org/~guido/) From bckfnn at worldonline.dk Tue Jun 5 20:46:04 2001 From: bckfnn at worldonline.dk (Finn Bock) Date: Tue, 05 Jun 2001 18:46:04 GMT Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects? In-Reply-To: <200106051721.f55HLW729400@odiug.digicool.com> References: <200106051721.f55HLW729400@odiug.digicool.com> Message-ID: <3b1d2894.16564838@smtp.worldonline.dk> [Guido] >Now invoke the Zen of Python: "There should be one-- and preferably >only one --obvious way to do it." So why not make these built-in >functions *be* the corresponding types? 
Then instead of
>
> >>> int
> <built-in function int>
>
>you would see
>
> >>> int
> <type 'int'>
>
>but otherwise the behavior would be identical.  (Note that I don't
>require that a factory function returns a *new* object each time.)

I think that it will be difficult to avoid creating a new object under jython because calling a type already directly calls the type's java constructor.

>If we did this for all built-in types, we'd have to add maybe a dozen
>new built-in names -- I think that's no big deal and actually helps
>naming types.  The types module, with its awkward names and usage, can
>be deprecated.
>
>There are details to be worked out, e.g.
>
>- Do we really want to have built-in names for code objects, traceback
>  objects, and other figments of Python's internal workings?
>
>- What should the argument to dict() be?  A list of (key, value)
>  pairs, a list of alternating keys and values, or something else?

Jython already interprets the arguments to the dict type as alternating key/values:

>>> from types import DictType as dict
>>> dict('a', 97, 'b', 98, 'c', 99)
{'b': 98, 'a': 97, 'c': 99}
>>>

This behaviour isn't documented on the python side so it can be changed. However, it is necessary to maintain this API on the java side and we have currently no way to prevent the type constructors from being visible and callable from python. Whatever is decided, I hope jython can keep the current semantics of its dict type.

regards, finn

From fdrake at acm.org Tue Jun 5 21:11:58 2001
From: fdrake at acm.org (Fred L. Drake, Jr.)
Date: Tue, 5 Jun 2001 15:11:58 -0400 (EDT)
Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects?
In-Reply-To: <3b1d2894.16564838@smtp.worldonline.dk>
References: <200106051721.f55HLW729400@odiug.digicool.com> <3b1d2894.16564838@smtp.worldonline.dk>
Message-ID: <15133.12030.538647.295809@cj42289-a.reston1.va.home.com>

Finn Bock writes:
> >>> from types import DictType as dict
> >>> dict('a', 97, 'b', 98, 'c', 99)
> {'b': 98, 'a': 97, 'c': 99}
> >>>
>
> This behaviour isn't documented on the python side so it can be changed.
> However, it is necessary to maintain this API on the java side and we
> have currently no way to prevent the type constructors from being
> visible and callable from python.

This should not be a problem: If dict() is called with one arg, the new semantics can be used, but with an odd number of args, your existing semantics can be used.

-Fred

-- Fred L. Drake, Jr. PythonLabs at Digital Creations

From skip at pobox.com Tue Jun 5 21:23:54 2001
From: skip at pobox.com (Skip Montanaro)
Date: Tue, 5 Jun 2001 14:23:54 -0500
Subject: [Python-Dev] Shouldn't I be able to print Unicode objects?
In-Reply-To:
References: <15133.8441.983687.572159@beluga.mojam.com>
Message-ID: <15133.12746.666351.127286@beluga.mojam.com>

    Me> [what am I missing?]

    Michael> The encoding:

    >>> print u"\N{DEGREE SIGN}".encode("latin1")
    °

Hmmm... I don't believe I've ever encountered an object in Python before that you couldn't simply print. Are Unicode objects unique in this respect? Seems like a bug (or at least a feature) to me.

Skip

From mwh at python.net Tue Jun 5 21:31:33 2001
From: mwh at python.net (Michael Hudson)
Date: 05 Jun 2001 20:31:33 +0100
Subject: [Python-Dev] Shouldn't I be able to print Unicode objects?
In-Reply-To: Skip Montanaro's message of "Tue, 5 Jun 2001 14:23:54 -0500"
References: <15133.8441.983687.572159@beluga.mojam.com> <15133.12746.666351.127286@beluga.mojam.com>
Message-ID:

Skip Montanaro writes:

>     Me> [what am I missing?]
>
>     Michael> The encoding:
>
>     >>> print u"\N{DEGREE SIGN}".encode("latin1")
>     °
>
> Hmmm... I don't believe I've ever encountered an object in Python before
> that you couldn't simply print.  Are Unicode objects unique in this respect?
> Seems like a bug (or at least a feature) to me.

Well, what would you have

>>> print u"\N{DEGREE SIGN}"

(or equivalently

str(u"\N{DEGREE SIGN}")

since we're eventually going to have to stuff an 8-bit string down stdout) do? I don't think

>>> print u"\N{DEGREE SIGN}"
u'\xb0'

is really an option.

This is old news. It must have been discussed here before 1.6, I'd have thought.

Cheers, M.

-- 58. Fools ignore complexity. Pragmatists suffer it. Some can avoid it. Geniuses remove it. -- Alan Perlis, http://www.cs.yale.edu/homes/perlis-alan/quotes.html

From barry at digicool.com Tue Jun 5 21:46:54 2001
From: barry at digicool.com (Barry A. Warsaw)
Date: Tue, 5 Jun 2001 15:46:54 -0400
Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects?
References: <200106051721.f55HLW729400@odiug.digicool.com> <15133.9584.871074.255497@anthem.wooz.org> <200106051838.f55IcNk29624@odiug.digicool.com>
Message-ID: <15133.14126.221568.235269@anthem.wooz.org>

>>>>> "GvR" == Guido van Rossum writes:

    >> I'm a little concerned about this, since the names that would
    >> be added are probably in common use as variable and/or argument
    >> names.  I.e. At one point `list' was a very common identifier
    >> in Mailman, and I'm sure `dict' is used quite often still.  I
    >> guess this would be okay as long as working code doesn't break
    >> because of it.

    GvR> It would be hard to see how this would break code, since
    GvR> built-ins are searched *after* all variables that the user
    GvR> defines.

Wasn't there talk about issuing warnings for locals shadowing built-ins (or was that globals?). If not, fergitaboutit. If so, that would fall under the category of "breaking".

-Barry

From tim at digicool.com Tue Jun 5 21:56:59 2001
From: tim at digicool.com (Tim Peters)
Date: Tue, 5 Jun 2001 15:56:59 -0400
Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects?
Message-ID:

Just to reduce this to its most trivial point <wink>,

> - What should the argument to dict() be?  A list of (key, value)
>   pairs, a list of alternating keys and values, or something else?

the middle one (perhaps generalized to "iterable object alternately producing keys and values") is most useful in practice. Perl gets a lot of mileage out of that, e.g. think of using re.findall() to build a list of mail-header field, value, field, value, ... thingies to feed to a dict. A list of (key, value) pairs is prettiest, but almost nothing *produces* such a list except for dict.items(); we don't need another way to spell dict.copy().

From guido at digicool.com Tue Jun 5 21:56:05 2001
From: guido at digicool.com (Guido van Rossum)
Date: Tue, 05 Jun 2001 15:56:05 -0400
Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects?
In-Reply-To: Your message of "Tue, 05 Jun 2001 15:46:54 EDT." <15133.14126.221568.235269@anthem.wooz.org>
References: <200106051721.f55HLW729400@odiug.digicool.com> <15133.9584.871074.255497@anthem.wooz.org> <200106051838.f55IcNk29624@odiug.digicool.com> <15133.14126.221568.235269@anthem.wooz.org>
Message-ID: <200106051956.f55Ju5130078@odiug.digicool.com>

> >>>>> "GvR" == Guido van Rossum writes:
>
>     >> I'm a little concerned about this, since the names that would
>     >> be added are probably in common use as variable and/or argument
>     >> names.  I.e.
At one point `list' was a very common identifier
>     >> in Mailman, and I'm sure `dict' is used quite often still.  I
>     >> guess this would be okay as long as working code doesn't break
>     >> because of it.
>
>     GvR> It would be hard to see how this would break code, since
>     GvR> built-ins are searched *after* all variables that the user
>     GvR> defines.
>
> Wasn't there talk about issuing warnings for locals shadowing
> built-ins (or was that globals?).  If not, fergitaboutit.  If so, that
> would fall under the category of "breaking".
>
> -Barry

You may be thinking of this:

>>> def f(int):
...     def g():
...         int
...
<stdin>:1: SyntaxWarning: local name 'int' in 'f' shadows use of 'int' as global in nested scope 'g'
>>>

This warns you when you override a built-in or global *and* you use that same name in a nested function. This code will mean something different in 2.2 anyway (g's reference to int will become a reference to f's int because of nested scopes).

But this does not cause a warning:

>>> def g():
...     int = 12
...
>>>

Nor does this:

>>> int = 12
>>>

So we're safe.

--Guido van Rossum (home page: http://www.python.org/~guido/)

From skip at pobox.com Tue Jun 5 22:01:47 2001
From: skip at pobox.com (Skip Montanaro)
Date: Tue, 5 Jun 2001 15:01:47 -0500
Subject: [Python-Dev] Shouldn't I be able to print Unicode objects?
In-Reply-To:
References: <15133.8441.983687.572159@beluga.mojam.com> <15133.12746.666351.127286@beluga.mojam.com>
Message-ID: <15133.15019.237484.605267@beluga.mojam.com>

    Michael> Well, what would you have

    >>>> print u"\N{DEGREE SIGN}"

    Michael> (or equivalently

    Michael> str(u"\N{DEGREE SIGN}")

    Michael> since we're eventually going to have to stuff an 8-bit string
    Michael> down stdout) do?

How about if print calls the .encode("latin1") method for me when it gets an ASCII encoding error? If "latin1" isn't a reasonable default choice, it could pick an encoding based on the current locale.

    Michael> I don't think

    >>>> print u"\N{DEGREE SIGN}"
    Michael> u'\xb0'

    Michael> is really an option.

I agree. I'd like to see a little circle.

    Michael> This is old news.  It must have been discussed here before 1.6,
    Michael> I'd have thought.

Perhaps, but I suspect many people suffered from glazing over of the eyes reading all the messages exchanged about Unicode arcana. I know I did.

Skip

From barry at digicool.com Tue Jun 5 22:01:29 2001
From: barry at digicool.com (Barry A. Warsaw)
Date: Tue, 5 Jun 2001 16:01:29 -0400
Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects?
References: <200106051721.f55HLW729400@odiug.digicool.com> <15133.9584.871074.255497@anthem.wooz.org> <200106051838.f55IcNk29624@odiug.digicool.com> <15133.14126.221568.235269@anthem.wooz.org> <200106051956.f55Ju5130078@odiug.digicool.com>
Message-ID: <15133.15001.19308.108288@anthem.wooz.org>

>>>>> "GvR" == Guido van Rossum writes:

    GvR> You may be thinking of this:

Yup.

    GvR> So we're safe.

Cool! Count me as a solid +1 then.

-Barry

From aahz at rahul.net Tue Jun 5 22:10:06 2001
From: aahz at rahul.net (Aahz Maruch)
Date: Tue, 5 Jun 2001 13:10:06 -0700 (PDT)
Subject: [Python-Dev] Shouldn't I be able to print Unicode objects?
In-Reply-To: <15133.15019.237484.605267@beluga.mojam.com> from "Skip Montanaro" at Jun 05, 2001 03:01:47 PM
Message-ID: <20010605201006.15CAD99C83@waltz.rahul.net>

Skip Montanaro wrote:
>
> Perhaps, but I suspect many people suffered from glazing over of the eyes
> reading all the messages exchanged about Unicode arcana.  I know I did.

Ditto.
-- --- Aahz (@pobox.com) Hugs and backrubs -- I break Rule 6 <*> http://www.rahul.net/aahz/ Androgynous poly kinky vanilla queer het Pythonista I don't really mind a person having the last whine, but I do mind someone else having the last self-righteous whine.

From mal at lemburg.com Tue Jun 5 22:14:39 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Tue, 05 Jun 2001 22:14:39 +0200
Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects?
References: <200106051721.f55HLW729400@odiug.digicool.com> <3B1D1C8E.B7770419@lemburg.com>
Message-ID: <3B1D3DAF.DAE727AE@lemburg.com>

> > [Guido]
> > Now invoke the Zen of Python: "There should be one-- and preferably
> > only one --obvious way to do it."  So why not make these built-in
> > functions *be* the corresponding types?  Then instead of
> >
> >     >>> int
> >     <built-in function int>
> >
> > you would see
> >
> >     >>> int
> >     <type 'int'>
> >
> > but otherwise the behavior would be identical.  (Note that I don't
> > require that a factory function returns a *new* object each time.)
>
> -1
>
> While this looks cute, I think it would break a lot of introspection
> code or other code which special cases Python functions for
> some reason since type(int) would no longer return
> types.BuiltinFunctionType.
>
> If you don't like the names, why not take the chance and
> create a new module which then exposes the Python class hierarchy
> (much like we did with the exceptions.py module before it was
> integrated as C module) ?!

Looks like I'm alone with my uncertain feeling about this move... oh well.

BTW, we should consider having more than one constructor for an object rather than trying to stuff all possible options and parameters into one overloaded super-constructor. I've done this in many of my mx extensions and have so far had great success with it (better programming error detection, better docs, more intuitive interfaces, etc.). In that sense, more than one way to do something will actually help clarify what the programmer really wanted. Just a thought...

-- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/

From mal at lemburg.com Tue Jun 5 22:16:02 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Tue, 05 Jun 2001 22:16:02 +0200
Subject: [Python-Dev] Shouldn't I be able to print Unicode objects?
References: <15133.8441.983687.572159@beluga.mojam.com> <15133.12746.666351.127286@beluga.mojam.com> <15133.15019.237484.605267@beluga.mojam.com>
Message-ID: <3B1D3E02.3C9AE1F4@lemburg.com>

Skip Montanaro wrote:
>
>     Michael> Well, what would you have
>
>     >>>> print u"\N{DEGREE SIGN}"
>
>     Michael> (or equivalently
>
>     Michael> str(u"\N{DEGREE SIGN}")
>
>     Michael> since we're eventually going to have to stuff an 8-bit string
>     Michael> down stdout) do?
>
> How about if print calls the .encode("latin1") method for me when it gets an
> ASCII encoding error?  If "latin1" isn't a reasonable default choice, it
> could pick an encoding based on the current locale.
Please see Lib/site.py for details on how to enable all these goodies -- it's all there, just disabled and meant for super-users only ;-)

-- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/

From guido at digicool.com Tue Jun 5 22:22:43 2001
From: guido at digicool.com (Guido van Rossum)
Date: Tue, 05 Jun 2001 16:22:43 -0400
Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects?
In-Reply-To: Your message of "Tue, 05 Jun 2001 22:14:39 +0200." <3B1D3DAF.DAE727AE@lemburg.com>
References: <200106051721.f55HLW729400@odiug.digicool.com> <3B1D1C8E.B7770419@lemburg.com> <3B1D3DAF.DAE727AE@lemburg.com>
Message-ID: <200106052022.f55KMhq30227@odiug.digicool.com>

> > -1
> >
> > While this looks cute, I think it would break a lot of introspection
> > code or other code which special cases Python functions for
> > some reason since type(int) would no longer return
> > types.BuiltinFunctionType.
>
> Looks like I'm alone with my uncertain feeling about this move...
> oh well.

Well, I don't see how someone could be doing introspection on int and be confused when it's not a function -- either you (think you) know it's a function, so you use it as a function without introspecting it, and that continues to work; or you're open to all possibilities, and then you'll introspect it, and then you'll discover what it is.

> BTW, we should consider having more than one constructor for an
> object rather than trying to stuff all possible options and parameters
> into one overloaded super-constructor.  I've done this in many of
> my mx extensions and have so far had great success with it (better
> programming error detection, better docs, more intuitive interfaces,
> etc.).  In that sense, more than one way to do something will
> actually help clarify what the programmer really wanted.  Just
> a thought...

Yes, but the other ways are spelled as factory functions. Maybe, *maybe* the other factory functions could be class-methods, but don't hold your hopes high.

--Guido van Rossum (home page: http://www.python.org/~guido/)

From martin at loewis.home.cs.tu-berlin.de Tue Jun 5 22:30:18 2001
From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Tue, 5 Jun 2001 22:30:18 +0200
Subject: [Python-Dev] Shouldn't I be able to print Unicode objects?
Message-ID: <200106052030.f55KUIu02762@mira.informatik.hu-berlin.de>

> How about if print calls the .encode("latin1") method for me when it gets an
> ASCII encoding error?  If "latin1" isn't a reasonable default choice, it
> could pick an encoding based on the current locale.

These are both bad ideas. First, there is no guarantee that your terminal is capable of displaying the circle at all. Maybe the typewriter connected to your computer doesn't even have a degree type. Further, maybe it does support displaying the degree sign, but then it likely fails for

>>> print u"\N{EURO SIGN}"

Or, worse, instead of displaying the EURO SIGN, it may just display the CURRENCY SIGN (since it may choose to use ISO-8859-15, but the terminal assumes ISO-8859-1). So unless you can come up with a really good way to find out what the terminal is capable of displaying (plus finding out how to make it display these things), I think Python is better off raising an exception than producing garbage output.

In addition, what you see is the "default encoding", i.e.
it doesn't just apply to print; it also applies to all places where Unicode objects are converted into byte strings. Assuming any default other than ASCII has been considered a bad idea by the authors of the Unicode support. IMO, the next-most reasonable default would have been UTF-8, *not* Latin-1, since UTF-8 can represent the EURO SIGN and every other character in Unicode. Most likely, your terminal will have difficulties producing a circle symbol when it gets the UTF-8 representation of the DEGREE SIGN, though. So the best thing is still to leave it in the hands of the application author.

As MAL points out, the administrator can give a different default encoding in site.py. Since the default default is ASCII, applications assuming that the default is ASCII won't break on your system. OTOH, applications developed on your system may then break elsewhere, since the default in site.py might be different.

Regards, Martin

From sdm7g at Virginia.EDU Tue Jun 5 22:41:11 2001
From: sdm7g at Virginia.EDU (Steven D. Majewski)
Date: Tue, 5 Jun 2001 16:41:11 -0400 (EDT)
Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects?
In-Reply-To: <200106051721.f55HLW729400@odiug.digicool.com>
Message-ID:

On Tue, 5 Jun 2001, Guido van Rossum wrote:

> Now invoke the Zen of Python: "There should be one-- and preferably
> only one --obvious way to do it."  So why not make these built-in
> functions *be* the corresponding types?  Then instead of

+1

> - Do we really want to have built-in names for code objects, traceback
>   objects, and other figments of Python's internal workings?

I would say to put all of the common constructors in __builtin__, and all of the odd ducks can go into the new module.

> - What should the argument to dict() be?  A list of (key, value)
>   pairs, a list of alternating keys and values, or something else?

A varargs list of (key,value) tuples would probably be most useful. Since most of these functions, before being classed as constructors, were considered coercion functions, I wouldn't be against having it try to do something sensible with a variety of args.

-- sdm

From skip at pobox.com Tue Jun 5 22:47:17 2001
From: skip at pobox.com (Skip Montanaro)
Date: Tue, 5 Jun 2001 15:47:17 -0500
Subject: [Python-Dev] Shouldn't I be able to print Unicode objects?
In-Reply-To: <3B1D3E02.3C9AE1F4@lemburg.com>
References: <15133.8441.983687.572159@beluga.mojam.com> <15133.12746.666351.127286@beluga.mojam.com> <15133.15019.237484.605267@beluga.mojam.com> <3B1D3E02.3C9AE1F4@lemburg.com>
Message-ID: <15133.17749.390756.115544@beluga.mojam.com>

    mal> Please see Lib/site.py for details on how to enable all these
    mal> goodies -- it's all there, just disabled and meant for super-users
    mal> only ;-)

Okay, I found the encoding section. I changed the encoding variable assignment to be

    encoding = "latin1"

and now the degree sign print works. What other side-effects will that have besides on printed representations? It appears I can create (but not see properly?)
variable names containing latin1 characters:

    >>> ümlaut = "ümlaut"
    >>> print locals().keys()
    ['orca', 'dir', '__doc__', 'rlcompleter', 'missionb', 'version', 'dirpat', 'xmlrpclib', 'belugab', '__builtin__', 'beluga', 'readline', '__name__', 'orcab', 'addpath', 'Writer', 'atexit', 'sys', 'dolphinb', 'mission', 'pprint', 'dolphin', '__builtins__', 'mlaut', 'help']

I am having trouble printing some strings containing latin1 characters:

    >>> print ümlaut
    mlaut
    >>> type("ümlaut")
    <type 'string'>
    >>> type(string.letters)
    <type 'string'>
    >>> print "ümlaut"
    mlaut
    >>> print string.letters
    abcdefghijklmnopqrstuvwxyz?????????????????????????????????ABCDEFGHIJKLMNOPQRSTUVWXYZ??????????????????????????????
    >>> print string.letters[55:]
    ????ABCDEFGHIJKLMNOPQRSTUVWXYZ??????????????????????????????

The above was pasted from Python running in a shell session in XEmacs, which is certainly latin1-aware. Why did I have trouble seeing the ü in some situations, but not in others? Are the ramifications of all this encoding stuff documented somewhere?

Skip

From skip at pobox.com Tue Jun 5 22:56:58 2001
From: skip at pobox.com (Skip Montanaro)
Date: Tue, 5 Jun 2001 15:56:58 -0500
Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects?
References: <200106051721.f55HLW729400@odiug.digicool.com>
Message-ID: <15133.18330.910736.249838@beluga.mojam.com>

Is the intent of using int and friends as constructors instead of just coercion functions that I should (eventually) be able to do this:

    class NonNegativeInt(int):
        def __init__(self, val):
            if int(val) < 0:
                raise ValueError, "Value must be >= 0"
            int.__init__(self, val)
            self.a = 47
        ...

?

Skip

From tim at digicool.com Tue Jun 5 23:01:23 2001
From: tim at digicool.com (Tim Peters)
Date: Tue, 5 Jun 2001 17:01:23 -0400
Subject: [Python-Dev] another dict crasher
Message-ID:

[Tim's dict-crasher dies w/ a stack overflow, but with a KeyError when he sticks a print inside __eq__]

OK, I understand this now, at least on Windows. In PyObject_Print(),

    #ifdef USE_STACKCHECK
            if (PyOS_CheckStack()) {
                    PyErr_SetString(PyExc_MemoryError, "stack overflow");
                    return -1;
            }
    #endif

On Windows, PyOS_CheckStack() is

    __try {
            /* _alloca throws a stack overflow exception if there's
               not enough space left on the stack */
            _alloca(PYOS_STACK_MARGIN * sizeof(void*));
            return 0;
    } __except (EXCEPTION_EXECUTE_HANDLER) {
            /* just ignore all errors */
    }
    return 1;

The _alloca dies, so the __except falls thru and PyOS_CheckStack returns 1. PyObject_Print sets the "stack overflow" error and returns -1. This winds its way thru the rich comparison attempt, until lookdict() sees it and says, Hmm. I can't compare this thing without raising an error. So this can't be the key I'm looking for. First I'll clear the error. Hmm. Can't find it anywhere else in the dict either. Hmm. There were no errors pending at the time I got called, so I'll leave things that way and return "not found". At that point about 15,000 levels of recursion unwind, and KeyError gets raised.

I don't believe PyOS_CheckStack() is implemented on Unixoid systems (just Windows and Macs), so some other accident must account for the KeyError on Linux. Remains unclear what to do about it; the idea that all errors raised by dict lookup comparisons are ignorable is sure a tempting target.

From mal at lemburg.com Tue Jun 5 23:00:23 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Tue, 05 Jun 2001 23:00:23 +0200
Subject: [Python-Dev] Shouldn't I be able to print Unicode objects?
References: <15133.8441.983687.572159@beluga.mojam.com> <15133.12746.666351.127286@beluga.mojam.com> <15133.15019.237484.605267@beluga.mojam.com> <3B1D3E02.3C9AE1F4@lemburg.com> <15133.17749.390756.115544@beluga.mojam.com>
Message-ID: <3B1D4866.A40AAB1C@lemburg.com>

Skip Montanaro wrote:
>
>     mal> Please see Lib/site.py for details on how to enable all these
>     mal> goodies -- it's all there, just disabled and meant for super-users
>     mal> only ;-)
>
> Okay, I found the encoding section.  I changed the encoding variable
> assignment to be
>
>     encoding = "latin1"
>
> and now the degree sign print works.  What other side-effects will that have
> besides on printed representations?  It appears I can create (but not see
> properly?) variable names containing latin1 characters:
>
>     >>> ümlaut = "ümlaut"

Huh ? That should not be possible ! Python literals are still ASCII.

    >>> ümlaut = 'ümlaut'
      File "<stdin>", line 1
        ümlaut = 'ümlaut'
        ^
    SyntaxError: invalid syntax

>     >>> print locals().keys()
>     ['orca', 'dir', '__doc__', 'rlcompleter', 'missionb', 'version', 'dirpat', 'xmlrpclib', 'belugab', '__builtin__', 'beluga', 'readline', '__name__', 'orcab', 'addpath', 'Writer', 'atexit', 'sys', 'dolphinb', 'mission', 'pprint', 'dolphin', '__builtins__', 'mlaut', 'help']
>
> I am having trouble printing some strings containing latin1 characters:
>
>     >>> print ümlaut
>     mlaut
>     >>> type("ümlaut")
>     <type 'string'>
>     >>> type(string.letters)
>     <type 'string'>
>     >>> print "ümlaut"
>     mlaut
>     >>> print string.letters
>     abcdefghijklmnopqrstuvwxyz?????????????????????????????????ABCDEFGHIJKLMNOPQRSTUVWXYZ??????????????????????????????
>     >>> print string.letters[55:]
>     ????ABCDEFGHIJKLMNOPQRSTUVWXYZ??????????????????????????????
>
> The above was pasted from Python running in a shell session in XEmacs, which
> is certainly latin1-aware.  Why did I have trouble seeing the ü in some
> situations, but not in others?

No idea what's going on there... the encoding parameter should not have any effect on printing normal 8-bit strings. It only defines the standard encoding used in coercion and auto-conversion from Unicode to 8-bit strings and vice-versa.

> Are the ramifications of all this encoding stuff documented somewhere?

The basic things can be found in Misc/unicode.txt, on the i18n sig page and some resources on the web. I'll give a talk in Bordeaux about Unicode too, which will probably provide some additional help as well.

-- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/

From guido at digicool.com Tue Jun 5 23:14:07 2001
From: guido at digicool.com (Guido van Rossum)
Date: Tue, 05 Jun 2001 17:14:07 -0400
Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects?
In-Reply-To: Your message of "Tue, 05 Jun 2001 16:59:01 EDT."
References:
Message-ID: <200106052114.f55LE7P30481@odiug.digicool.com>

> Is the intent of using int and friends as constructors instead of just
> coercion functions that I should (eventually) be able to do this:
>
>     class NonNegativeInt(int):
>         def __init__(self, val):
>             if int(val) < 0:
>                 raise ValueError, "Value must be >= 0"
>             int.__init__(self, val)
>             self.a = 47
>         ...
>
> ?

Yes, sort-of. The details will be slightly different. I'm not comfortable with letting a user-provided __init__() method change the value of self, so I am brooding on a work-around that separates allocation and one-time initialization from __init__(). Watch PEP 253.
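Very roughly, a sketch of the direction -- the separate allocation hook is spelled __new__ below, but that name and its exact signature are provisional assumptions, not yet anything the PEP promises:

    # Provisional sketch only.  __new__ (an assumed spelling) fixes the
    # int's value before any user-provided __init__() runs, so __init__
    # can no longer change the value of self.
    class NonNegativeInt(int):
        def __new__(cls, val):
            if int(val) < 0:
                raise ValueError, "Value must be >= 0"
            return int.__new__(cls, val)
        def __init__(self, val):
            self.a = 47

The validation moves into the allocator; by the time __init__ sees the object, its value is immutable.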
--Guido van Rossum (home page: http://www.python.org/~guido/)

From tim at digicool.com Tue Jun 5 23:16:03 2001
From: tim at digicool.com (Tim Peters)
Date: Tue, 5 Jun 2001 17:16:03 -0400
Subject: [Python-Dev] Shouldn't I be able to print Unicode objects?
Message-ID:

[MAL, to Skip]
> Huh ? That should not be possible ! Python literals are still
> ASCII.
>
>     >>> ümlaut = 'ümlaut'
>       File "<stdin>", line 1
>         ümlaut = 'ümlaut'
>         ^
>     SyntaxError: invalid syntax

That was Guido's intent, and what the Ref Man says, but the tokenizer uses C's isalpha() so in reality it's locale-dependent. I think at least one German on Python-Dev has already threatened to kill him if he ever fixes this bug <wink>.

From gward at python.net Wed Jun 6 00:29:49 2001
From: gward at python.net (Greg Ward)
Date: Tue, 5 Jun 2001 18:29:49 -0400
Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects?
In-Reply-To: <200106051721.f55HLW729400@odiug.digicool.com>; from guido@digicool.com on Tue, Jun 05, 2001 at 01:21:32PM -0400
References: <200106051721.f55HLW729400@odiug.digicool.com>
Message-ID: <20010605182949.A7545@gerg.ca>

On 05 June 2001, Guido van Rossum said:
> Now invoke the Zen of Python: "There should be one-- and preferably
> only one --obvious way to do it."  So why not make these built-in
> functions *be* the corresponding types?  Then instead of

+1 from me too.

> If we did this for all built-in types, we'd have to add maybe a dozen
> new built-in names -- I think that's no big deal and actually helps
> naming types.  The types module, with its awkward names and usage, can
> be deprecated.

Cool!

> There are details to be worked out, e.g.
>
> - Do we really want to have built-in names for code objects, traceback
>   objects, and other figments of Python's internal workings?

Probably not, as long as they are accessible somewhere. I could live with either a C-ified 'types' module or shoving these into the 'new' module, although I think I prefer the latter slightly.

> - What should the argument to dict() be?  A list of (key, value)
>   pairs, a list of alternating keys and values, or something else?

I love /F's suggestion

    dict(k=v, k=v, ...)

but that's icing on the cake -- cool feature, looks pretty, etc. (And *finally* Python will have all the syntactic sugar that Perl programmers like to have. ;-) I think the real answer should be

    dict(k, v, k, v)

like Jython. If both can be supported, that would be swell.

Greg

-- Greg Ward - Linux geek gward at python.net http://starship.python.net/~gward/ Does your DRESSING ROOM have enough ASPARAGUS?

From barry at digicool.com Wed Jun 6 00:45:00 2001
From: barry at digicool.com (Barry A. Warsaw)
Date: Tue, 5 Jun 2001 18:45:00 -0400
Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects?
References: <200106051721.f55HLW729400@odiug.digicool.com> <20010605182949.A7545@gerg.ca>
Message-ID: <15133.24812.791796.557452@anthem.wooz.org>

>>>>> "GW" == Greg Ward writes:

    GW> I love /F's suggestion

    GW> dict(k=v, k=v, ...)

One problem with this syntax is that the `k's can only be valid Python identifiers, so you'd at least need /some/ other syntax to support construction with arbitrary hashable keys.

-Barry

From fredrik at pythonware.com Wed Jun 6 00:57:43 2001
From: fredrik at pythonware.com (Fredrik Lundh)
Date: Wed, 6 Jun 2001 00:57:43 +0200
Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects?
References: <200106051721.f55HLW729400@odiug.digicool.com> <20010605182949.A7545@gerg.ca>
Message-ID: <011f01c0ee12$eeda9ba0$0900a8c0@spiff>

greg wrote:
> > - What should the argument to dict() be?  A list of (key, value)
> >   pairs, a list of alternating keys and values, or something else?
>
> I love /F's suggestion
>
>     dict(k=v, k=v, ...)
>
> but that's icing on the cake -- cool feature, looks pretty, etc.

note that the python interpreter builds that dictionary for you if you use the METH_KEYWORDS flag...

> I think the real answer should be
>
>     dict(k, v, k, v)
>
> like Jython.

given that Jython already gives a meaning to dict with more than one argument, I suggest:

    dict(d)                  # consistency
    dict(k, v, k, v, ...)    # jython compatibility
    dict(*[k, v, k, v, ...]) # convenience
    dict(k=v, k=v, ...)      # common pydiom

and maybe:

    dict(d.items())          # symmetry

> If both can be supported, that would be swell.

how about:

    if (PyTuple_GET_SIZE(args)) {
        assert PyDict_GET_SIZE(kw) == 0
        if (PyTuple_GET_SIZE(args) == 1) {
            args = PyTuple_GET_ITEM(args, 0);
            if (PyDict_Check(args))
                dict = args.copy()
            else if (PySequence_Check(args))
                dict = {}
                for k, v in args:
                    dict[k] = v
        } else {
            assert (PySequence_Size(args) & 1) == 0 # maybe
            dict = {}
            for i in range(0, len(args), 2):
                dict[args[i]] = args[i+1]
        }
    } else {
        assert PyDict_GET_SIZE(kw) > 0 # probably
        dict = kw
    }

From MarkH at ActiveState.com Wed Jun 6 01:13:27 2001
From: MarkH at ActiveState.com (Mark Hammond)
Date: Wed, 6 Jun 2001 09:13:27 +1000
Subject: [Python-Dev] Happy event
In-Reply-To: <20010605144110.DD90C99C84@waltz.rahul.net>
Message-ID:

[Paul]
> > Actually girls are fine until about 13, after that I expect Jeremy
> > won't be too responsive.  Something about hormones and such.

As a father of a 14 year old girl, I can relate to that!!

[Aahz]
> Are you trying to imply that there's a difference between girls and
> boys?

It would seem a safe assumption that you are not a parent of a teenager. :)

Mark.

From gward at python.net Wed Jun 6 03:03:33 2001
From: gward at python.net (Greg Ward)
Date: Tue, 5 Jun 2001 21:03:33 -0400
Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects?
In-Reply-To: <011f01c0ee12$eeda9ba0$0900a8c0@spiff>; from fredrik@pythonware.com on Wed, Jun 06, 2001 at 12:57:43AM +0200
References: <200106051721.f55HLW729400@odiug.digicool.com> <20010605182949.A7545@gerg.ca> <011f01c0ee12$eeda9ba0$0900a8c0@spiff>
Message-ID: <20010605210333.B7687@gerg.ca>

On 06 June 2001, Fredrik Lundh said:
> given that Jython already gives a meaning to dict with more
> than one argument, I suggest:
>
>     dict(d)                  # consistency
>     dict(k, v, k, v, ...)    # jython compatibility
>     dict(*[k, v, k, v, ...]) # convenience
>     dict(k=v, k=v, ...)      # common pydiom

Yikes. I still think that #2 is the "essential" spelling. I think Tim was speaking of #1 when he said we don't need another way to spell copy() -- I'm inclined to agree. I think the fact that you can say int(3) or str("foo") is not a strong argument in favour of dict({...}), because of mutability, because of the overhead of dicts, because we already have the copy module, maybe other factors as well.

> and maybe:
>
>     dict(d.items())          # symmetry

I think this is massive overloading. Two interfaces to a single function ought to be enough. I for one have long wished for syntactic sugar like Perl's => operator, which lets you do this:

    %band = (geddy => "bass", alex => "guitar", neil => "drums");

...and keyword arg syntax is really the natural thing here.
Being able to say

    band = dict(geddy="bass", alex="guitar", neil="drums")

would be good enough for me. And it's less mysterious than Perl's =>, which is just a magic comma that forces its LHS to be interpreted as a string. Weird.

Greg

-- Greg Ward - Linux geek gward at python.net http://starship.python.net/~gward/ If you and a friend are being chased by a lion, it is not necessary to outrun the lion. It is only necessary to outrun your friend.

From mal at lemburg.com Wed Jun 6 10:03:13 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Wed, 06 Jun 2001 10:03:13 +0200
Subject: [Python-Dev] Shouldn't I be able to print Unicode objects?
References:
Message-ID: <3B1DE3C1.90BA3DD6@lemburg.com>

Tim Peters wrote:
>
> [MAL, to Skip]
> > Huh ? That should not be possible ! Python literals are still
> > ASCII.
> >
> >     >>> ümlaut = 'ümlaut'
> >       File "<stdin>", line 1
> >         ümlaut = 'ümlaut'
> >         ^
> >     SyntaxError: invalid syntax
>
> That was Guido's intent, and what the Ref Man says, but the tokenizer uses
> C's isalpha() so in reality it's locale-dependent.  I think at least one
> German on Python-Dev has already threatened to kill him if he ever fixes
> this bug <wink>.

Wasn't me for sure... even in the Unicode age, I believe that Python source code should maintain readability by not allowing all alpha(numeric) characters for use in identifiers (there are lots of them in Unicode).

Shouldn't we fix the tokenizer to explicitly check for 'a'...'z' and 'A'...'Z' ?! (same for digits) ?!

-- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/

From jack at oratrix.nl Wed Jun 6 13:24:32 2001
From: jack at oratrix.nl (Jack Jansen)
Date: Wed, 06 Jun 2001 13:24:32 +0200
Subject: [Python-Dev] %b format? - 8088 like a PDP-11 I think not...
In-Reply-To: Message by "Eric S. Raymond" , Mon, 4 Jun 2001 17:19:08 -0400 , <20010604171908.A21831@thyrsus.com>
Message-ID: <20010606112432.C4A43303181@snelboot.oratrix.nl>

The early microcomputers (8008, 6800, 6502) are actually a lot more like the PDP-8 than the PDP-11: a single (or possibly double) accumulator register and a few special purpose registers hardwired to various instructions.

The 68000, Z8000 and NS16032 were the first true successors of the PDP-11, sharing (to an extent) the unique characteristics of its design with general purpose registers (with even SP and PC being general purpose registers with only very little magic attached to them) and an orthogonal design. The 68000 still had lots of little quirks in the instruction set, the latter two actually improved on the PDP-11 set (where a couple of instructions like XOR would only work with register-destination because it was added to the design in a stage where there weren't enough bits left in the instruction space, I guess).

And the 8086 was just a souped-up 8080/8008: each register had a different function, no orthogonality, etc. Intel didn't get it "right" until the 386 32-bit instruction set (and even there some of the old baggage can still be seen).

-- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm

From jack at oratrix.nl Wed Jun 6 13:39:56 2001
From: jack at oratrix.nl (Jack Jansen)
Date: Wed, 06 Jun 2001 13:39:56 +0200
Subject: [Python-Dev] quick poll: could int, str, tuple etc.
become type objects?
In-Reply-To: Message by "Fredrik Lundh" , Tue, 5 Jun 2001 19:34:35 +0200 , <001301c0ede5$cb804a10$e46940d5@hagrid>
Message-ID: <20010606113957.4A395303181@snelboot.oratrix.nl>

For the dictionary initializer I would definitely want to be able to give an object that adheres to the dictionary protocol, so that I can do things like

    import anydbm
    f = anydbm.open("foo", "r")
    incore = dict(f)

Hmm, I guess this goes for most types: list() and tuple() should take any iterable object, etc.

The one question is what "dictionary protocol" means. Should it support items()? Is only x.keys()/x[] good enough?

-- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm

From mal at lemburg.com Wed Jun 6 20:36:48 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Wed, 06 Jun 2001 20:36:48 +0200
Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects?
References: <200106051721.f55HLW729400@odiug.digicool.com> <3B1D1C8E.B7770419@lemburg.com> <3B1D3DAF.DAE727AE@lemburg.com> <200106052022.f55KMhq30227@odiug.digicool.com>
Message-ID: <3B1E7840.C93EA788@lemburg.com>

Guido van Rossum wrote:
>
> > > -1
> > >
> > > While this looks cute, I think it would break a lot of introspection
> > > code or other code which special cases Python functions for
> > > some reason since type(int) would no longer return
> > > types.BuiltinFunctionType.
> >
> > Looks like I'm alone with my uncertain feeling about this move...
> > oh well.
>
> Well, I don't see how someone could be doing introspection on int and
> be confused when it's not a function -- either you (think you) know
> it's a function, so you use it as a function without introspecting it,
> and that continues to work; or you're open to all possibilities, and
> then you'll introspect it, and then you'll discover what it is.

Ok, let's put it in another way: The point is that you are changing the type of very basic building parts in Python and that is likely to cause failure in places which will most likely be hard to find and fix. Besides we don't really gain anything from replacing builtin functions with classes (to the contrary: we lose some, since we can no longer use the function call optimizations for builtins and have to go through all the generic call mechanism code instead).

Also, have you considered the effects this has on restricted execution mode ? What will happen if someone replaces the builtins with special versions which hide some security relevant objects, e.g. open() is a prominent candidate for this.

Why not put the type objects into a separate module instead of reusing the builtins ?

> > BTW, we should consider having more than one contructor for an
> > object rather than trying to stuff all possible options and parameters
> > into one overloaded super-constructor.  I've done this in many of
> > my mx extensions and have so far had great success with it (better
> > programming error detection, better docs, more intuitive interfaces,
> > etc.).  In that sense, more than one way to do something will
> > actually help clarify what the programmer really wanted.  Just
> > a thought...
>
> Yes, but the other ways are spelled as factory functions.  Maybe,
> *maybe* the other factory functions could be class-methods, but don't
> hold your hopes high.

No... why make things complicated when simple functions work just fine as factories.
Multiple constructors on a class would make subclassing a pain...

-- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/

From paulp at ActiveState.com Wed Jun 6 21:00:07 2001
From: paulp at ActiveState.com (Paul Prescod)
Date: Wed, 06 Jun 2001 12:00:07 -0700
Subject: [Python-Dev] Shouldn't I be able to print Unicode objects?
References: <15133.8441.983687.572159@beluga.mojam.com> <15133.12746.666351.127286@beluga.mojam.com> <15133.15019.237484.605267@beluga.mojam.com> <3B1D3E02.3C9AE1F4@lemburg.com> <15133.17749.390756.115544@beluga.mojam.com>
Message-ID: <3B1E7DB7.408BC089@ActiveState.com>

Skip Montanaro wrote:
>
>...
>
> Okay, I found the encoding section.  I changed the encoding variable
> assignment to be
>
>     encoding = "latin1"

Danger, Will Robinson! You can now write software that will work great on your version of Python and will crash on everyone else's. You haven't just changed the behavior of "print" but of EVERY attempted automatic coercion from Unicode to an 8-bit string.

-- Take a recipe. Leave a recipe. Python Cookbook! http://www.ActiveState.com/pythoncookbook

From tim.one at home.com Wed Jun 6 21:27:59 2001
From: tim.one at home.com (Tim Peters)
Date: Wed, 6 Jun 2001 15:27:59 -0400
Subject: [Python-Dev] -U option?
Message-ID:

http://sf.net/tracker/?func=detail&atid=105470&aid=430269&group_id=5470
python -U breaks import with 2.1

Anyone understand -U? Like, should it work, why is it there if it doesn't and isn't expected to, and are there docs for it beyond the "python -h" blurb? Last mention of it I found in c.l.py was

"""
Date: Tue, 06 Feb 2001 16:09:46 +0100
From: "M.-A. Lemburg"
Subject: Re: [Python-Dev] Pre-PEP: Python Character Model

...
Well, with -U on, Python will compile "" into u"",
...
last I tried, Python didn't even start up :-(
...
"""

An earlier msg (08 Sep 2000) said:

"""
Note that many things fail when Python is started with -U... that
switch was introduced to be able to get an idea of which parts of
the standard fail to work in a mixed string/Unicode environment.
"""

If this is just an internal development switch, python -h probably shouldn't advertise it.

From barry at digicool.com Wed Jun 6 21:37:26 2001
From: barry at digicool.com (Barry A. Warsaw)
Date: Wed, 6 Jun 2001 15:37:26 -0400
Subject: [Python-Dev] -U option?
References:
Message-ID: <15134.34422.62060.936788@anthem.wooz.org>

>>>>> "TP" == Tim Peters writes:

    TP> Anyone understand -U?  Like, should it work, why is it there
    TP> if it doesn't and isn't expected to, and are there docs for it
    TP> beyond the "python -h" blurb?

Nope, except that /for me/ an installed Python 2.1 seems to start up just fine with -U. My uninstalled (i.e. run from the source tree) 2.2a0 fails when given -U:

@anthem[[~/projects/python:1068]]% ./python
Python 2.2a0 (#4, Jun 6 2001, 13:03:36)
[GCC egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)] on linux2
Type "copyright", "credits" or "license" for more information.
>>>
@anthem[[~/projects/python:1069]]% ./python -U
'import site' failed; use -v for traceback
Python 2.2a0 (#4, Jun 6 2001, 13:03:36)
[GCC egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)] on linux2
Type "copyright", "credits" or "license" for more information.
>>>
@anthem[[~/projects/python:1070]]% ./python -U -v
# ./Lib/site.pyc matches ./Lib/site.py
import site # precompiled from ./Lib/site.pyc
# ./Lib/os.pyc matches ./Lib/os.py
import os # precompiled from ./Lib/os.pyc
import posix # builtin
# ./Lib/posixpath.pyc matches ./Lib/posixpath.py
import posixpath # precompiled from ./Lib/posixpath.pyc
# ./Lib/stat.pyc matches ./Lib/stat.py
import stat # precompiled from ./Lib/stat.pyc
# ./Lib/UserDict.pyc matches ./Lib/UserDict.py
import UserDict # precompiled from ./Lib/UserDict.pyc
'import site' failed; traceback:
Traceback (most recent call last):
  File "./Lib/site.py", line 91, in ?
    from distutils.util import get_platform
ImportError: No module named distutils.util
Python 2.2a0 (#4, Jun 6 2001, 13:03:36)
[GCC egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)] on linux2
Type "copyright", "credits" or "license" for more information.
>>>
# clear __builtin__._
# clear sys.path
# clear sys.argv
# clear sys.ps1
# clear sys.ps2
# clear sys.exitfunc
# clear sys.exc_type
# clear sys.exc_value
# clear sys.exc_traceback
# clear sys.last_type
# clear sys.last_value
# clear sys.last_traceback
# restore sys.stdin
# restore sys.stdout
# restore sys.stderr
# cleanup __main__
# cleanup[1] signal
# cleanup[1] site
# cleanup[1] posix
# cleanup[1] exceptions
# cleanup[2] stat
# cleanup[2] posixpath
# cleanup[2] UserDict
# cleanup[2] os
# cleanup sys
# cleanup __builtin__
# cleanup ints: 1 unfreed int in 1 out of 3 blocks
# cleanup floats

-Barry

From mal at lemburg.com Wed Jun 6 22:27:19 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Wed, 06 Jun 2001 22:27:19 +0200
Subject: [Python-Dev] -U option?
References:
Message-ID: <3B1E9227.7F67971E@lemburg.com>

Tim Peters wrote:
>
> http://sf.net/tracker/?func=detail&atid=105470&aid=430269&group_id=5470
> python -U breaks import with 2.1
>
> Anyone understand -U?  Like, should it work, why is it there if it doesn't
> and isn't expected to, and are there docs for it beyond the "python -h"
> blurb?

The -U option is there to be able to test drive Python into the Unicode age. As you and many others have noted, there's still a long way to go...

> Last mention of it I found in c.l.py was
>
> """
> Date: Tue, 06 Feb 2001 16:09:46 +0100
> From: "M.-A. Lemburg"
> Subject: Re: [Python-Dev] Pre-PEP: Python Character Model
>
> ...
> Well, with -U on, Python will compile "" into u"",
> ...
> last I tried, Python didn't even start up :-(
> ...
> """
>
> An earlier msg (08 Sep 2000) said:
>
> """
> Note that many things fail when Python is started with -U... that
> switch was introduced to be able to get an idea of which parts of
> the standard fail to work in a mixed string/Unicode environment.
> """
>
> If this is just an internal development switch, python -h probably shouldn't
> advertise it.

-- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/

From martin at loewis.home.cs.tu-berlin.de Wed Jun 6 22:34:30 2001
From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Wed, 6 Jun 2001 22:34:30 +0200
Subject: [Python-Dev] -U option?
Message-ID: <200106062034.f56KYUI02246@mira.informatik.hu-berlin.de>

[Tim]
> Anyone understand -U?  Like, should it work, why is it there if it
> doesn't and isn't expected to, and are there docs for it beyond the
> "python -h" blurb?

I'm not surprised it doesn't work, but I think it could be made to work in many cases.
I also think it would be worthwhile making that work; in the process, many places will be taught to accept Unicode strings which currently don't.

[Barry]
> Nope, except that /for me/ an installed Python 2.1 seems to start up
> just fine with -U.  [...]

Sure, but it won't work

martin at mira:~ > python -U                                    [22:29]
Python 2.2a0 (#336, May 29 2001, 09:28:57)
[GCC 2.95.2 19991024 (release)] on linux2
Type "copyright", "credits" or "license" for more information.
>>> import string
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
ImportError: No module named string
>>> import sys
>>> sys.path
['', u'/usr/src/omni/lib/python', u'/usr/src/omni/lib/i586_linux_2.0_glibc2.1', u'/usr/ilu-2.0b1/lib', u'/home/martin', u'/usr/local/lib/python2.2', u'/usr/local/lib/python2.2/plat-linux2', u'/usr/local/lib/python2.2/lib-tk', u'/usr/local/lib/python2.2/lib-dynload', u'/usr/local/lib/python2.2/site-packages', u'/usr/local/lib/site-python']

The main problem (also with the SF bug report) seems to be that Unicode objects in sys.path are not accepted, but I think they should.

Regards, Martin

From tim.one at home.com Wed Jun 6 22:52:02 2001
From: tim.one at home.com (Tim Peters)
Date: Wed, 6 Jun 2001 16:52:02 -0400
Subject: [Python-Dev] -U option?
In-Reply-To: <3B1E9227.7F67971E@lemburg.com>
Message-ID:

[MAL]
> The -U option is there to be able to test drive Python into
> the Unicode age. As you and many others have noted, there's
> still a long way to go...

That's cool. My question is why we're advertising (via -h) an option that end users have no chance of using successfully.

From mal at lemburg.com Wed Jun 6 23:47:25 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Wed, 06 Jun 2001 23:47:25 +0200
Subject: [Python-Dev] -U option?
References:
Message-ID: <3B1EA4ED.38BEB1AA@lemburg.com>

Tim Peters wrote:
>
> [MAL]
> > The -U option is there to be able to test drive Python into
> > the Unicode age. As you and many others have noted, there's
> > still a long way to go...
>
> That's cool.  My question is why we're advertising (via -h) an option that
> end users have no chance of using successfully.

I guess I just added the flag to the -h message without thinking much about it... it was added in some alpha release. Anyway, these bug reports will keep hitting us which is good in the sense that it'll eventually push Python into the Unicode arena. We could use some funding for this, though.

-- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/

From paulp at ActiveState.com Thu Jun 7 01:00:52 2001
From: paulp at ActiveState.com (Paul Prescod)
Date: Wed, 06 Jun 2001 16:00:52 -0700
Subject: [Python-Dev] urllib2
Message-ID: <3B1EB624.563DABE0@ActiveState.com>

Tim asked me to look into test_urllib2 failure. I notice that Guido's name is in the relevant RFC so I guess he's the real expert <0.5 wink>:

http://www.faqs.org/rfcs/rfc1738.html

Anyhow, there are a variety of problems. :(

First, test_urllib2 says:

    file_url = "file://%s" % urllib2.__file__

This is not going to construct a strictly standards conforming URL on Windows but that form is still common enough and obvious enough that maybe we should support it. So that's problem #1, we aren't compatible with mildly broken Windows file URLs.

Problem #2 is that the test program generates mildly broken URLs on Windows.
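For concreteness, here's the sort of helper I have in mind -- a rough sketch only; the function name and the exact quoting rules below are mine, not anything currently in the library:

    # Hypothetical sketch: build an RFC 1738 file URL from a native
    # path.  urllib.pathname2url covers some of this ground, but (see
    # below) its output differs across platforms.
    import os, urllib

    def filename_to_url(path):
        path = os.path.abspath(path)
        if os.sep != '/':
            # normalize the separators, e.g. C:\spam.py -> C:/spam.py
            path = '/'.join(path.split(os.sep))
        if not path.startswith('/'):
            path = '/' + path            # C:/spam.py -> /C:/spam.py
        return 'file://' + urllib.quote(path, '/:')

On Unix this would give something like file:///usr/local/lib/python2.1/urllib2.py, and on Windows something like file:///C:/Python21/Lib/urllib2.py, which is the form Windows itself understands.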
That begs the question of what IS the right way to construct file urls in a cross-platform manner. I would have thought that urllib.pathname2url was the way but I note that it isn't documented. Plus it is poorly named. A function that does this:

    """Convert a DOS path name to a file url.

            C:\foo\bar\spam.foo

                    becomes

            ///C|/foo/bar/spam.foo
    """

is not really constructing a URL! And the semantics of the function on multiple platforms do not seem to me to be identical. On Windows it adds a bunch of leading slashes and mac and Unix seem not to. So you can't safely paste a "file:" or "file://" on the front. I don't know how widely pathname2url has been used even though it is undocumented... should we fix it and document it or write a new function?

-- Take a recipe. Leave a recipe. Python Cookbook! http://www.ActiveState.com/pythoncookbook

From barry at scottb.demon.co.uk Thu Jun 7 01:31:51 2001
From: barry at scottb.demon.co.uk (Barry Scott)
Date: Thu, 7 Jun 2001 00:31:51 +0100
Subject: [Python-Dev] %b format? - 8088 like a PDP-11 I think not...
In-Reply-To: <20010604161114.A20979@thyrsus.com>
Message-ID: <000a01c0eee0$dcfe9250$060210ac@private>

Eric,

As others have pointed out, your time line is wrong...

Barry

p.s. I'm ex-DEC and old enough to have seen the introduction of the 6502 (got mine at university for $25 inc postage to the U.K.), Z80 and VAX (worked on product for V1.0 of VMS). Also for my sins argued with Gordon Bell and Dave Cutler about CPU architecture.

> -----Original Message-----
> From: Eric S. Raymond [mailto:esr at thyrsus.com]
> Sent: 04 June 2001 21:11
> To: Barry Scott
> Cc: python-dev (E-mail)
> Subject: Re: [Python-Dev] %b format? - 8088 like a PDP-11 I think not...
>
>
> Barry Scott :
> > Eric wrote:
> > > While I'm at it, I should note that the design of the 11 was ancestral
> > > to both the 8088 and 68000 microprocessors, and thus to essentially
> > > every new general-purpose computer designed in the last fifteen years.
> >
> > The key to PDP-11 and VAX was lots of registers all alike and rich
> > addressing modes for the instructions.
> >
> > The 8088 is very far from this design, it owes its design more to
> > 4004 than the PDP-11.
>
> Yes, but the 4004 was designed as a sort of lobotomized imitation of the 65xx,
> which was descended from the 11.  Admittedly, in the chain of transmission here
> were two stages of redesign so bad that the connection got really tenuous.
> --
> Eric S. Raymond
>
> ...Virtually never are murderers the ordinary, law-abiding people
> against whom gun bans are aimed.  Almost without exception, murderers
> are extreme aberrants with lifelong histories of crime, substance
> abuse, psychopathology, mental retardation and/or irrational violence
> against those around them, as well as other hazardous behavior, e.g.,
> automobile and gun accidents."
>         -- Don B. Kates, writing on statistical patterns in gun crime
>
>

From barry at scottb.demon.co.uk Thu Jun 7 01:57:11 2001
From: barry at scottb.demon.co.uk (Barry Scott)
Date: Thu, 7 Jun 2001 00:57:11 +0100
Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects?
In-Reply-To: <3B1E7840.C93EA788@lemburg.com>
Message-ID: <000b01c0eee4$66f8a7e0$060210ac@private>

Adding the atomic types of python as classes I'm +1 on. Performance is a problem for the parser to handle.

If you have not already done so I suggest that you look at what Microsoft .NET is doing in this area.
In .NET, for example, int is a class and they have the technology to define the interface to an int and optimize the performance of the non-derived cases.

Barry

From barry at scottb.demon.co.uk Thu Jun 7 02:03:54 2001
From: barry at scottb.demon.co.uk (Barry Scott)
Date: Thu, 7 Jun 2001 01:03:54 +0100
Subject: [Python-Dev] Shouldn't I be able to print Unicode objects?
In-Reply-To: <3B1DE3C1.90BA3DD6@lemburg.com>
Message-ID: <001001c0eee5$571a8090$060210ac@private>

> Shouldn't we fix the tokenizer to explicitly check for 'a'...'z'
> and 'A'...'Z' ?! (same for digits) ?!

If you embrace the world then NO. If America is your world then maybe.

Barry

From paulp at ActiveState.com Thu Jun 7 02:42:03 2001
From: paulp at ActiveState.com (Paul Prescod)
Date: Wed, 06 Jun 2001 17:42:03 -0700
Subject: [Python-Dev] Shouldn't I be able to print Unicode objects?
References: <001001c0eee5$571a8090$060210ac@private>
Message-ID: <3B1ECDDB.F1E8B19D@ActiveState.com>

Barry Scott wrote:
>
> > Shouldn't we fix the tokenizer to explicitly check for 'a'...'z'
> > and 'A'...'Z' ?! (same for digits) ?!
>
> If you embrace the world then NO. If America is your world then maybe.

Actually, if we were really going to embrace the world we'd need to handle more than a few European languages!

-- Take a recipe. Leave a recipe. Python Cookbook! http://www.ActiveState.com/pythoncookbook

From MarkH at ActiveState.com Thu Jun 7 03:09:51 2001
From: MarkH at ActiveState.com (Mark Hammond)
Date: Thu, 7 Jun 2001 11:09:51 +1000
Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects?
In-Reply-To: <000b01c0eee4$66f8a7e0$060210ac@private>
Message-ID:

> If you have not already done so I suggest that you look at
> what Microsoft .NET is doing in this area.  In .NET, for example,
> int is a class and they have the technology to define the
> interface to an int and optimize the performance of the non-derived
> cases.

Actually, that is not completely true. There is a "value type" and a class version. The value type is just the bits. The VM has instructions that work on the value type. As far as I am aware, you can not use a derived class with these instructions. They also have the concept of "sealed" meaning they can not be subclassed. Last time I looked, strings were an example of sealed classes.

Mark.

From greg at cosc.canterbury.ac.nz Thu Jun 7 04:16:00 2001
From: greg at cosc.canterbury.ac.nz (Greg Ewing)
Date: Thu, 07 Jun 2001 14:16:00 +1200 (NZST)
Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects?
In-Reply-To: <20010606113957.4A395303181@snelboot.oratrix.nl>
Message-ID: <200106070216.OAA02594@s454.cosc.canterbury.ac.nz>

Jack Jansen :
> Should it support
> items()? Is only x.keys()/x[] good enough?

Check for items(), and fall back on x.keys()/x[] if necessary.

Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg at cosc.canterbury.ac.nz +--------------------------------------+

From greg at cosc.canterbury.ac.nz Thu Jun 7 04:19:03 2001
From: greg at cosc.canterbury.ac.nz (Greg Ewing)
Date: Thu, 07 Jun 2001 14:19:03 +1200 (NZST)
Subject: [Python-Dev] Shouldn't I be able to print Unicode objects?
In-Reply-To: <3B1ECDDB.F1E8B19D@ActiveState.com>
Message-ID: <200106070219.OAA02597@s454.cosc.canterbury.ac.nz>

> if we were really going to embrace the world we'd need to
> handle more than a few European languages!
-1 on allowing Kanji in Python identifiers. :-( I like to be able to at least imagine some sort of pronunciation for variable names! Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg at cosc.canterbury.ac.nz +--------------------------------------+ From greg at cosc.canterbury.ac.nz Thu Jun 7 04:22:33 2001 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Thu, 07 Jun 2001 14:22:33 +1200 (NZST) Subject: [Python-Dev] %b format? - 8088 like a PDP-11 I think not... Message-ID: <200106070222.OAA02600@s454.cosc.canterbury.ac.nz> Jack Jansen : > with even SP and PC being general purpose registers The PC is not a general-purpose register in the 68000. I've heard that this was because DEC had a patent on the idea. > the latter two actually improved on the PDP-11 The 16032 was certainly extremely orthogonal. I wrote an assembler and a compiler for it once, and it was a joy after coming from the Z80! It wasn't quite perfect, though - its lack of a "top-of-stack-indirect" addressing mode was responsible for the one wart in my otherwise-beautiful code generation strategy. Also, it must have been the most CISCy instruction set the world has ever seen, with the possible exception of the VAX... Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg at cosc.canterbury.ac.nz +--------------------------------------+ From tim.one at home.com Thu Jun 7 06:54:42 2001 From: tim.one at home.com (Tim Peters) Date: Thu, 7 Jun 2001 00:54:42 -0400 Subject: [Python-Dev] urllib2 In-Reply-To: <3B1EB624.563DABE0@ActiveState.com> Message-ID: [Paul Prescod] > Tim asked me to look into test_urllib2 failure. Wow! I'm going to remember that. Have to ask people to do things more often. > notice that Guido's name is in the relevant RFC so I guess he's the > real expert <0.5 wink>: > > http://www.faqs.org/rfcs/rfc1738.html > > Anyhow, there are a variety of problems. :( I'm going to add one more. The spec says this is a file URL:

    fileurl = "file://" [ host | "localhost" ] "/" fpath

But on Windows, urllib2.urlopen() throws up even on URLs like:

    file:///c:/bootlog.txt

and

    file://localhost/c:/bootlog.txt

AFAICT, those conform to the spec (the first with an empty host, the second with the special reserved hostname). Windows has no problem with either of them (heck, in Outlook I can click on them while I'm typing this email -- works fine), but urllib2 mangles them into (repr) '\\c:\\bootlog.txt', which Windows has no idea what to do with. Hard to see why it should, either. > First, test_urllib2 says: > > file_url = "file://%s" % urllib2.__file__ > > This is not going to construct a strictly standards-conforming URL on > Windows but that form is still common enough and obvious enough that > maybe we should support it. Common among what? > So that's problem #1, we aren't compatible with mildly broken Windows > file URLs. I haven't found a sense in which Windows file URLs are broken. test_urllib2 creates bad URLs on Windows, and urllib2 itself transforms legit file URLs into broken ones on Windows, but both of those appear to be our (Python's) fault. Until std stuff works, worrying about extensions to the std seems premature. > Problem #2 is that the test program generates mildly broken URLs > on Windows. Yup.
> That begs the question of what IS the right way to construct file URLs > in a cross-platform manner. The spec seems vaguely clear to me on this point (it's vaguely unclear to me whether a colon is allowed in an fpath -- the text seems to say one thing but the BNF another). > I would have thought that urllib.pathname2url was the way but I note > that it isn't documented. Plus it is poorly named. A function that > does this: > > """Convert a DOS path name to a file url. > > C:\foo\bar\spam.foo > > becomes > > ///C|/foo/bar/spam.foo > """ > > is not really constructing a URL! Or anything else recognizable. > And the semantics of the function on multiple platforms do not seem > to me to be identical. On Windows it adds a bunch of leading slashes > but Mac and Unix seem not to. So you can't safely paste a "file:" or > "file://" on the front. I don't know how widely pathname2url has been > used even though it is undocumented... should we fix it and document > it or write a new function? Maybe it's just time to write urllib3.py <0.8 wink>. no-conclusions-from-me-ly y'rs - tim From tim at digicool.com Thu Jun 7 07:16:37 2001 From: tim at digicool.com (Tim Peters) Date: Thu, 7 Jun 2001 01:16:37 -0400 Subject: [Python-Dev] Shouldn't I be able to print Unicode objects? In-Reply-To: <3B1DE3C1.90BA3DD6@lemburg.com> Message-ID: [M.-A. Lemburg] > Wasn't me for sure... even in the Unicode age, I believe that > Python source code should maintain readability by not allowing > all alpha(numeric) characters for use in identifiers (there are > lots of them in Unicode). > > Shouldn't we fix the tokenizer to explicitly check for 'a'...'z' > and 'A'...'Z' ?! (same for digits) ?! That's certain to break code, and it's certain that some of those whose code gets broken would scream very loudly about it. OTOH, nobody would come to its defense with a hearty "whew! I'm so glad *that* hole finally got plugged!". I'm sure it would cause less trouble to take away <> as an alternative spelling of != (except that Barry is actually close enough to strangle Guido a few days each week). Is it worth the hassle? I don't know, but I'd *guess* Guido would rather endure the complaints for something more substantial (like, say, breaking 10 lines of an expert's obscure code that relies on int() being a builtin instead of a class). From fredrik at pythonware.com Thu Jun 7 07:50:35 2001 From: fredrik at pythonware.com (Fredrik Lundh) Date: Thu, 7 Jun 2001 07:50:35 +0200 Subject: [Python-Dev] Shouldn't I be able to print Unicode objects? References: Message-ID: <004c01c0ef15$c97f51d0$4ffa42d5@hagrid> Tim Peters wrote: > > Shouldn't we fix the tokenizer to explicitly check for 'a'...'z' > > and 'A'...'Z' ?! (same for digits) ?! > > That's certain to break code, and it's certain that some of those whose code > gets broken would scream very loudly about it. I don't get it. If people use non-ASCII characters, they're clearly not using Python. From the language reference: ... Python uses the 7-bit ASCII character set for program text and string literals. ... Identifiers (also referred to as names) are described by the following lexical definitions:

    identifier: (letter|"_") (letter|digit|"_")*
    letter: lowercase | uppercase
    lowercase: "a"..."z"
    uppercase: "A"..."Z"
    digit: "0"..."9"

Identifiers are unlimited in length. Case is significant ... either change the specification, and break every single tool written by anyone who actually bothered to read the specification [1], or add a warning to 2.2.
1) I assume the specification didn't exist when GvR wrote the first CPython implementation ;-) From tim.one at home.com Thu Jun 7 08:15:35 2001 From: tim.one at home.com (Tim Peters) Date: Thu, 7 Jun 2001 02:15:35 -0400 Subject: [Python-Dev] Shouldn't I be able to print Unicode objects? In-Reply-To: <004c01c0ef15$c97f51d0$4ffa42d5@hagrid> Message-ID: [/F] > I don't get it. If people use non-ASCII characters, they're clearly not > using Python. From the language reference: My *first* reply in this thread said the lang ref required this. That doesn't mean people read the ref. IIRC, you were one of the most strident complainers about list.append(1, 2, 3) "breaking", so just rekindle that mindset but intensify it fueled by nationalism <0.5 wink>. > ... > either change the specification, and break every single tool written by > anyone who actually bothered to read the specification [1], or add a > warning to 2.2. This is up to Guido; doesn't affect my code one way or the other (and, yes, e.g., IDLE's parser follows the manual here). > ... > 1) I assume the specification didn't exist when GvR wrote the first > CPython implementation ;-) Thanks to the magic of CVS, you can see that the BNF for identifiers has remained unchanged since it was first checked in (Thu Nov 21 13:53:03 1991, rev 1.1 of ref1.tex). The problem is that locale was a new-fangled idea then, and I believe Guido simply didn't anticipate that isalpha() and isalnum() would vary across non-EBCDIC platforms. From mal at lemburg.com Thu Jun 7 10:29:52 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Thu, 07 Jun 2001 10:29:52 +0200 Subject: [Python-Dev] Shouldn't I be able to print Unicode objects? References: <001001c0eee5$571a8090$060210ac@private> <3B1ECDDB.F1E8B19D@ActiveState.com> Message-ID: <3B1F3B80.DB8F4117@lemburg.com> Paul Prescod wrote: > > Barry Scott wrote: > > > > > Shouldn't we fix the tokenizer to explicitly check for 'a'...'z' > > > and 'A'...'Z' ?! (same for digits) ?! > > > > If you embrace the world then NO. If America is your world then maybe. > > Actually, if we were really going to embrace the world we'd need to > handle more than a few European languages! I was just suggesting that we make the parser actually do what the language spec defines. And yes: I don't like non-ASCII identifiers (even though I live in Europe). This is just bound to cause trouble, e.g. people forgetting accents on characters, editors displaying code using wild approximations of what the code author intended to write, etc. -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From mal at lemburg.com Thu Jun 7 10:42:40 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Thu, 07 Jun 2001 10:42:40 +0200 Subject: [Python-Dev] Shouldn't I be able to print Unicode objects? References: Message-ID: <3B1F3E80.F8CC16D7@lemburg.com> Tim Peters wrote: > > [M.-A. Lemburg] > > Wasn't me for sure... even in the Unicode age, I believe that > > Python source code should maintain readability by not allowing > > all alpha(numeric) characters for use in identifiers (there are > > lots of them in Unicode). > > > > Shouldn't we fix the tokenizer to explicitly check for 'a'...'z' > > and 'A'...'Z' ?! (same for digits) ?! > > That's certain to break code, and it's certain that some of those whose code > gets broken would scream very loudly about it. OTOH, nobody would come to > its defense with a hearty "whew!
I'm so glad *that* hole finally got > plugged!". I'm sure it would cause less trouble to take away <> as an > alternative spelling of != (except that Barry is actually close enough to > strangle Guido a few days each week). Is it worth the hassle? I > don't know, but I'd *guess* Guido would rather endure the complaints for > something more substantial (like, say, breaking 10 lines of an expert's > obscure code that relies on int() being a builtin instead of a class). OK, point taken... still, it's funny sometimes how pydevs are willing to break perfectly valid code in some areas while being reluctant to ask users to clean up invalid code in others. -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From thomas at xs4all.net Thu Jun 7 14:03:20 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Thu, 7 Jun 2001 14:03:20 +0200 Subject: [Python-Dev] Shouldn't I be able to print Unicode objects? In-Reply-To: <3B1F3E80.F8CC16D7@lemburg.com>; from mal@lemburg.com on Thu, Jun 07, 2001 at 10:42:40AM +0200 References: <3B1F3E80.F8CC16D7@lemburg.com> Message-ID: <20010607140320.Z690@xs4all.nl> On Thu, Jun 07, 2001 at 10:42:40AM +0200, M.-A. Lemburg wrote: > still, it's funny sometimes how pydevs are willing to break perfectly > valid code in some areas while being reluctant to ask users to clean up > invalid code in others. Well, I consider myself one of the more backward-oriented people on py-dev (or at least a vocal member of that sub-group ;) and I don't think changing int et al to be types/class-constructors is a problem. People who rely on int being a *function*, rather than being a callable, are either writing a Python-specific script, a quick hack, or really, really know what they are getting into. I'm also not terribly worried about the use of non-ASCII characters in identifiers in Python, though a warning for the next one or two releases would be a good thing -- if anything, it should warn that that trick won't work for people with different locale settings! -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From mwh at python.net Thu Jun 7 14:54:55 2001 From: mwh at python.net (Michael Hudson) Date: Thu, 7 Jun 2001 13:54:55 +0100 (BST) Subject: [Python-Dev] python-dev summary 2001-05-24 - 2001-06-07 Message-ID: This is a summary of traffic on the python-dev mailing list between May 24 and Jun 7 (inclusive) 2001. It is intended to inform the wider Python community of ongoing developments. To comment, just post to python-list at python.org or comp.lang.python in the usual way. Give your posting a meaningful subject line, and if it's about a PEP, include the PEP number (e.g. Subject: PEP 201 - Lockstep iteration). All python-dev members are interested in seeing ideas discussed by the community, so don't hesitate to take a stance on a PEP if you have an opinion. This is the ninth summary written by Michael Hudson.
Summaries are archived at: Posting distribution (with apologies to mbm) Number of articles in summary: 305

    [ASCII bar chart of posts per day; the daily counts were:]
    Thu 24: 18   Fri 25: 14   Sat 26: 11   Sun 27: 14   Mon 28: 20
    Tue 29: 19   Wed 30: 34   Thu 31: 35   Fri 01: 32   Sat 02: 14
    Sun 03:  8   Mon 04: 20   Tue 05: 51   Wed 06: 15

Another busy-ish fortnight. I've been in Exam Hell(tm) and am writing this when hungover, so this summary might be a bit sketchier than normal. Apologies in advance. * strop vs. string * Greg Stein leapt up to defend the slated-to-be-deprecated strop module by pointing out that its functions work on any object that supports the buffer API, whereas the 1.6-era string.py only works with objects that sprout the right methods: The discussion quickly degenerated into the usual griping about the fact that the buffer API is flawed and undocumented and not really well understood by many people. * Special-casing "O" * As a followup to the discussion mentioned in the last summary, Martin von Loewis posted a patch to sf enabling functions written in C that expect zero or one object arguments to dispense with the time-wasting call to PyArg_ParseTuple: The first version of the patch was criticized for being overly general, and for not being general enough. It seems the forces of simplicity have won, but I don't think the patch has been checked in yet. * the late, unlamented, yearly list.append panic * Tim Peters posted that c.l.py has rediscovered the quadratic-time worst-case behavior of list.append(). And then ameliorated the worst-case behaviour. So that one was easy. * making dicts ... * You might think that, as dictionaries are so central to Python, their implementation would be bulletproof and one of the areas of the source least likely to change. This might be true *now*; Tim Peters seems to have spent most of the last fortnight implementing performance improvements one after the other and fixing core-dumping holes in the implementation pointed out by Michael Hudson. The first improvement was "using polynomial division instead of multiplication for generating the probe sequence, as a way to get all the bits of the hash code into play." If you don't understand what that means, ignore it, because Tim came up with a more radical rewrite: which seems to be a win, but sadly removes the shock of finding comments about Galois theory in dictobject.c... Most of the discussion in the thread following Tim's patch was about whether we need 128-bit floats or ints, which is another way of saying everyone liked it :-) This one hasn't been checked in either. * ...
and breaking dicts * Inspired by a post to comp.lang.python by Wolfgang Lipp and driven slightly insane by revision, Michael Hudson posted a short program that used a hole in the dict implementation to trigger a core dump: This got fixed, so he did it again: The cause of both problems was C code assuming things about dictionaries remained the same across calls to code that ended up executing arbitrary Python code, which could mutate the dict exactly as much as it pleased, which in turn caused pointers to dangle. This problem has a history in Python; the .sort() method on lists has to fight the same issues. These holes have been plugged, although it is still possible to crash Python with exceptionally contrived code: There's another approach, which is what the .sort() method uses:

>>> list = range(10)
>>> def c(x,y):
...     del list[:]
...     return cmp(x, y)
...
>>> list.sort(c)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "<stdin>", line 2, in c
TypeError: a list cannot be modified while it is being sorted

The .sort() method magically changes the type of the list being sorted to one that doesn't support mutation while it's sorting the list. This approach would have some merit to use with dictionaries too; for one thing we could lose all the contrived code in dictobject.c protecting against this sort of silliness... * arbitrary radix formatting * Greg Wilson made a plea for the addition of a "%b" formatting operator to display integers in binary, e.g.:

>>> print "%d %x %o %b"%(10,10,10,10)
10 a 12 1010

There was general support for the idea, but Tim Peters and Greg Ewing pointed out that it would be neater to invent a general format code that would enable one to format an integer into an arbitrary base, so that

>>> int("1111", 7)
400

has an inverse at long last. But no-one could think of a spelling that wasn't in general use, and the discussion died :-(. * quick poll * Guido asked if anyone would object violently to the builtin conversion functions becoming type objects on the descr-branch, in analogy to class objects: There was general support and only a few concerns, and the changes have begun to hit descr-branch. I'm sure I'm not the only one who wishes they had the time to understand what is going on in there... Cheers, M. From gmcm at hypernet.com Thu Jun 7 15:06:55 2001 From: gmcm at hypernet.com (Gordon McMillan) Date: Thu, 7 Jun 2001 09:06:55 -0400 Subject: [Python-Dev] urllib2 In-Reply-To: References: <3B1EB624.563DABE0@ActiveState.com> Message-ID: <3B1F442F.26920.1ECC32A9@localhost> [Tim & Paul on file URLs] [Tim] > But on Windows, urllib2.urlopen() throws up even on URLs like: > > file:///c:/bootlog.txt Curiously enough,

    url = "file:///" + urllib.quote_plus(fnm)

seems to work on Windows. It even seems to work on the Mac, if you first turn '/' into '%2f', then undo the double quoting (turn '%252f' back into '%2f' in the ensuing url). It even seems to work on Mac directory names with Unicode characters in them (though I haven't looked too closely, for fear of jinxing it). eye-of-newt-considered-helpful-ly y'rs - Gordon From pedroni at inf.ethz.ch Thu Jun 7 15:56:30 2001 From: pedroni at inf.ethz.ch (Samuele Pedroni) Date: Thu, 7 Jun 2001 15:56:30 +0200 (MET DST) Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects? Message-ID: <200106071356.PAA04511@core.inf.ethz.ch> Hi.
[GvR]
> > Is the intent of using int and friends as constructors instead of just
> > coercion functions that I should (eventually) be able to do this:
> >
> >     class NonNegativeInt(int):
> >         def __init__(self, val):
> >             if int(val) < 0:
> >                 raise ValueError, "Value must be >= 0"
> >             int.__init__(self, val)
> >             self.a = 47
> >             ...
> >
> > ?
>
> Yes, sort-of.  The details will be slightly different.  I'm not
> comfortable with letting a user-provided __init__() method change the
> value of self, so I am brooding on a work-around that separates
> allocation and one-time initialization from __init__().  Watch PEP
> 253.

jython already vaguely supports this:

    from types import IntType as Int

    class NonNegInt(Int):
        def __init__(self,val,annot=None):
            if int(val)<0: raise ValueError,"val<0"
            Int.__init__(self,val)
            self._annot = annot
        def neg(self):
            return -self
        def __add__(self,b):
            if type(b) is NonNegInt:
                return NonNegInt(Int.__add__(self,b))
            return Int.__add__(self,b)
        def annot(self):
            return self._annot

Jython 2.0 on java1.3.0 (JIT: null)
Type "copyright", "credits" or "license" for more information.
>>> from NonNegInt import NonNegInt
>>> x=NonNegInt(-2)
Traceback (innermost last):
  File "<console>", line 1, in ?
  File "/home/pedroni/BOX/exp/NonNegInt.py", line 5, in __init__
ValueError: val<0
>>> x=NonNegInt(2)
>>> y=NonNegInt(3,"foo")
>>> y._annot
Traceback (innermost last):
  File "<console>", line 1, in ?
AttributeError: 'int' object has no attribute '_annot'
>>> y.annot()
Traceback (innermost last):
  File "<console>", line 1, in ?
  File "/home/pedroni/BOX/exp/NonNegInt.py", line 15, in annot
AttributeError: 'int' object has no attribute '_annot'
>>> x+y, type(x+y)
(5, )
>>> x.neg()
-2
>>> x+(-2),type(x+(-2))
(0, )
>>>

As one can see, the semantics are not without holes. The support for this is mainly a side-effect of the fact that internally jython objects are instances of java classes and jython allows subclassing java classes. I have no idea whether someone is already using this kind of stuff; I just remember that someone reported a bug concerning subclassing ListType, so ... By the way, int and long being types seems nice and elegant to me. A more general note, FYI: I have read the PEP drafts about descrs and type as classes; I have not played with the descr-branch yet. I think that the descr and metaclasses stuff can help on the jython side to put a lot of things (dealing with java classes, subclassing from them, etc) in a more precise framework, polishing up many design aspects and the code. First, I suppose that backward compatibility on the jython side is not a real problem; these aspects are so under-documented that there are no promises about them. On the other hand, until we start coding things on the jython side (it's complex stuff and jython internals are already complex) it will be really difficult to make constructive comments on possible problems for jython, or to work toward a design that better fits both jython and CPython needs. Given that we are still working on jython 2.1, maybe we will be able to start working on jython 2.2 only late in the 2.2 release cycle, when things are already settled and we can only do our best to re-implement them. regards Samuele Pedroni.
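[For comparison with the Jython session above, this is roughly the shape Guido's allocation/initialization split would give the same idea in CPython. It is only a sketch against the draft PEP 253 design; int.__new__ and the exact spelling here are assumptions, not the final API:]

    class NonNegativeInt(int):
        # Validate in __new__, which performs allocation, so the
        # immutable int value never changes after the object exists
        # (assumed PEP 253 behavior; see the draft PEP for details).
        def __new__(cls, val=0):
            val = int(val)
            if val < 0:
                raise ValueError, "value must be >= 0"
            return int.__new__(cls, val)

Used interactively, such a subclass would behave like an int that refuses to be negative:

    >>> NonNegativeInt(5) + 3
    8
    >>> NonNegativeInt(-2)
    Traceback (most recent call last):
      ...
    ValueError: value must be >= 0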
From Greg.Wilson at baltimore.com Thu Jun 7 18:03:44 2001 From: Greg.Wilson at baltimore.com (Greg Wilson) Date: Thu, 7 Jun 2001 12:03:44 -0400 Subject: [Python-Dev] re: %b format? (no, really) Message-ID: <930BBCA4CEBBD411BE6500508BB3328F2E20EF@nsamcanms1.ca.baltimore.com> Prompted in part by the comment in Michael Hudson's python-dev summary about this discussion having died, I'd like to summarize:

1. Most people who commented felt that a base-2 format would be useful, if only for teaching and debugging. With regard to questions about byte order:

   A. Integer values are printed as base-2 numbers, so byte order is irrelevant.

   B. Floating-point numbers are printed as:

      [sign] [mantissa] [exponent]

      The mantissa and exponent are shown according to rule A.

2. Inventing a format for converting to arbitrary bases is dubious hypergeneralization (to borrow a phrase).

3. Implementation should mirror octal and hexadecimal support, e.g. a 'bin()' function to go with 'oct()' and 'hex()'.

4. The desirability or otherwise of a "%b" format specifier has nothing to do with the relative merits of any early microprocessor :-).

If no-one has strong objections, I'll put together a PEP on this basis. Thanks Greg From greg at cosc.canterbury.ac.nz Fri Jun 8 02:55:05 2001 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Fri, 08 Jun 2001 12:55:05 +1200 (NZST) Subject: [Python-Dev] re: %b format (no, really) In-Reply-To: <930BBCA4CEBBD411BE6500508BB3328F2E20EF@nsamcanms1.ca.baltimore.com> Message-ID: <200106080055.MAA02711@s454.cosc.canterbury.ac.nz> Greg Wilson : [good stuff about binary format support] > If no-one has strong objections, I'll put together a > PEP on this basis. Sounds okay to me. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg at cosc.canterbury.ac.nz +--------------------------------------+ From tim.one at home.com Fri Jun 8 03:39:53 2001 From: tim.one at home.com (Tim Peters) Date: Thu, 7 Jun 2001 21:39:53 -0400 Subject: [Python-Dev] Shouldn't I be able to print Unicode objects? In-Reply-To: <20010607140320.Z690@xs4all.nl> Message-ID: [Thomas Wouters] > ... > I'm also not terribly worried about the use of non-ASCII characters in > identifiers in Python, though a warning for the next one or two releases > would be a good thing -- if anything, it should warn that that trick > won't work for people with different locale settings! Fine by me!
Someone who cares enough to write the warning code and docs should just do so, although it may be wise to secure Guido's blessing first. From skip at pobox.com Fri Jun 8 16:51:27 2001 From: skip at pobox.com (Skip Montanaro) Date: Fri, 8 Jun 2001 09:51:27 -0500 Subject: [Python-Dev] sys.modules["__main__"] in Jython Message-ID: <15136.58991.72069.433197@beluga.mojam.com> Would someone with Jython experience check to see if it interprets sys.modules["__main__"] in the same manner as Python? I'm interested to see if doctest's normal usage can be simplified slightly. The doctest documentation states: In normal use, end each module M with:

    def _test():
        import doctest, M           # replace M with your module's name
        return doctest.testmod(M)   # ditto

    if __name__ == "__main__":
        _test()

I'm wondering if this works for Jython as well as Python:

    def _test():
        import doctest, sys
        return doctest.testmod(sys.modules["__main__"])

    if __name__ == "__main__":
        _test()

If so, then I think doctest.testmod's signature can be changed to

    def testmod(m=None, name=None, globs=None, verbose=None,
                isprivate=None, report=1):

with the following extra code added to the start of the function:

    if m is None:
        import sys
        m = sys.modules["__main__"]

That way the most common doctest usage can be changed to

    def _test():
        import doctest
        return doctest.testmod()

    if __name__ == "__main__":
        _test()

(I ran into a problem with a module that had initialization code that barfed if executed more than once.) Of course, these changes are ultimately Tim's decision. I'm just trying to knock down various potential hurdles. Thx, Skip From guido at digicool.com Fri Jun 8 18:06:19 2001 From: guido at digicool.com (Guido van Rossum) Date: Fri, 08 Jun 2001 12:06:19 -0400 Subject: [Python-Dev] re: %b format (no, really) In-Reply-To: Your message of "Fri, 08 Jun 2001 12:01:37 EDT." References: Message-ID: <200106081606.f58G6Jj11829@odiug.digicool.com>

> Prompted in part by the comment in Michael Hudson's
> python-dev summary about this discussion having died,
> I'd like to summarize:
>
> 1. Most people who commented felt that a base-2 format
>    would be useful, if only for teaching and debugging.
>    With regard to questions about byte order:
>
>    A. Integer values are printed as base-2 numbers, so
>       byte order is irrelevant.
>
>    B. Floating-point numbers are printed as:
>
>       [sign] [mantissa] [exponent]
>
>       The mantissa and exponent are shown according
>       to rule A.

Why bother with floats at all? We can't print floats as hex either. If I were doing any kind of float-representation fiddling, I'd probably want to print it in hex anyway (I can read hex). But as I say, that's not for the general public.

> 2. Inventing a format for converting to arbitrary
>    bases is dubious hypergeneralization (to borrow a
>    phrase).

Agreed.

> 3. Implementation should mirror octal and hexadecimal
>    support, e.g. a 'bin()' function to go with 'oct()'
>    and 'hex()'.
>
> 4. The desirability or otherwise of a "%b" format
>    specifier has nothing to do with the relative
>    merits of any early microprocessor :-).
>
> If no-one has strong objections, I'll put together a
> PEP on this basis.

Go for it. Or just submit a patch to SF -- this seems almost too small for a PEP to me. :-) --Guido van Rossum (home page: http://www.python.org/~guido/) From barry at digicool.com Fri Jun 8 18:10:50 2001 From: barry at digicool.com (Barry A.
Warsaw) Date: Fri, 8 Jun 2001 12:10:50 -0400 Subject: [Python-Dev] re: %b format (no, really) References: <200106081606.f58G6Jj11829@odiug.digicool.com> Message-ID: <15136.63754.927103.77358@anthem.wooz.org> >>>>> "GvR" == Guido van Rossum writes: GvR> Go for it. Or just submit a patch to SF -- this seems almost GvR> too small for a PEP to me. :-) Since we all seem to agree, I'd agree. :) From Greg.Wilson at baltimore.com Fri Jun 8 18:14:14 2001 From: Greg.Wilson at baltimore.com (Greg Wilson) Date: Fri, 8 Jun 2001 12:14:14 -0400 Subject: [Python-Dev] re: %b format (no, really) Message-ID: <930BBCA4CEBBD411BE6500508BB3328F2E2179@nsamcanms1.ca.baltimore.com> > > Greg: > > B. Floating-point numbers are printed as: > > [sign] [mantissa] [exponent] > Guido: > Why bother with floats at all? For teaching purposes, which is what started me on this in the first place --- I would like an easy way to show people the bit patterns corresponding to basic types. > Guido: > Go for it. Or just submit a patch to SF -- this seems almost too > small for a PEP to me. :-) Thanks, Greg From esr at snark.thyrsus.com Fri Jun 8 18:23:34 2001 From: esr at snark.thyrsus.com (Eric S. Raymond) Date: Fri, 8 Jun 2001 12:23:34 -0400 Subject: [Python-Dev] Glowing endorsement of open source and Python Message-ID: <200106081623.f58GNYf22712@snark.thyrsus.com> It doesn't get much better than this: http://it.mycareer.com.au/news/2001/06/05/FFX9ZT7UENC.html -- Eric S. Raymond In the absence of any evidence tending to show that possession or use of a 'shotgun having a barrel of less than eighteen inches in length' at this time has some reasonable relationship to the preservation or efficiency of a well regulated militia, we cannot say that the Second Amendment guarantees the right to keep and bear such an instrument. [...] The Militia comprised all males physically capable of acting in concert for the common defense. -- Majority Supreme Court opinion in "U.S. vs. Miller" (1939) From mal at lemburg.com Fri Jun 8 19:08:53 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Fri, 08 Jun 2001 19:08:53 +0200 Subject: [Python-Dev] Glowing endorsement of open source and Python References: <200106081623.f58GNYf22712@snark.thyrsus.com> Message-ID: <3B2106A5.FD16D95C@lemburg.com> "Eric S. Raymond" wrote: > > It doesn't get much better than this: > > http://it.mycareer.com.au/news/2001/06/05/FFX9ZT7UENC.html I wonder what those MS Office XP ads are doing on that page...
-- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From tim.one at home.com Fri Jun 8 19:21:10 2001 From: tim.one at home.com (Tim Peters) Date: Fri, 8 Jun 2001 13:21:10 -0400 Subject: [Python-Dev] re: %b format (no, really) In-Reply-To: <930BBCA4CEBBD411BE6500508BB3328F2E2179@nsamcanms1.ca.baltimore.com> Message-ID: [Guido] > Why bother with floats at all? [Greg Wilson] > For teaching purposes, which is what started me on this > in the first place --- I would like an easy way to show > people the bit patterns corresponding to basic types. I'm confused by this: while for integers the bits correspond very clearly to what's stored in the machine, if you separate the mantissa and exponent for floats the result won't "look like" the storage at all. Please give an example first, like what do you intend to produce for

    print "%b" % 0.1
    print "%b" % -42e300

? You have to make decisions about whether or not to unbias the exponent for display (if you don't, it's incomprehensible; if you do, it's not really what's stored); whether or not to materialize the implicit most-significant mantissa bit in 754 normalized values (pretty much ditto); and what to do about Infs, NaNs, signed zeroes and denormal numbers. The kicker is that, to be truly useful for teaching floats, you need a way to select among all combinations of "yes" and "no" for each such decision. A single fixed set of answers will confound more than clarify; e.g., it's important to know what the "true exponent" is, but also to know what biased exponents look like inside the box. This is too much for %b -- write a float-format module instead. From Greg.Wilson at baltimore.com Fri Jun 8 19:34:13 2001 From: Greg.Wilson at baltimore.com (Greg Wilson) Date: Fri, 8 Jun 2001 13:34:13 -0400 Subject: [Python-Dev] RE: %b format (no, really) Message-ID: <930BBCA4CEBBD411BE6500508BB3328F2E2188@nsamcanms1.ca.baltimore.com> > [Guido] > > Why bother with floats at all? > > [Greg Wilson] > > For teaching purposes > [Tim Peters] > if you separate the mantissa and exponent > for floats the result won't "look like" the storage at all. > Please give an example first This is part of what was going to go into the PEP, along with what to do about character data (I've had a couple of emails from people who'd like to be able to look at 8-bit and Unicode characters as bit patterns). > This is too much for %b -- write a float-format module instead. How about a quick patch to do "%b" for int and long-int, and a PEP for a generic "format" module --- arbitrary radix, options for IEEE numbers, etc.? Any objections? Greg
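[A sketch of the kind of helper such a generic "format" module might grow. The name to_base and its exact behavior are hypothetical, chosen so that it inverts the existing int(s, base) builtin; there is no bin() builtin at this point:]

    def to_base(n, base):
        """Render the integer n as a digit string in the given base.

        Supports 2 <= base <= 36; digits past 9 use lowercase letters.
        """
        digits = "0123456789abcdefghijklmnopqrstuvwxyz"
        if not 2 <= base <= 36:
            raise ValueError, "base must be in 2..36"
        if n < 0:
            return "-" + to_base(-n, base)
        # Peel off the last digit and recurse on the remaining quotient.
        q, r = divmod(n, base)
        if q:
            return to_base(q, base) + digits[r]
        return digits[r]

    >>> to_base(10, 2)      # what a "%b" format would print for 10
    '1010'
    >>> int(to_base(400, 7), 7)
    400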
From esr at thyrsus.com Fri Jun 8 19:44:40 2001 From: esr at thyrsus.com (Eric S. Raymond) Date: Fri, 8 Jun 2001 13:44:40 -0400 Subject: [Python-Dev] RE: %b format (no, really) In-Reply-To: <930BBCA4CEBBD411BE6500508BB3328F2E2188@nsamcanms1.ca.baltimore.com>; from Greg.Wilson@baltimore.com on Fri, Jun 08, 2001 at 01:34:13PM -0400 References: <930BBCA4CEBBD411BE6500508BB3328F2E2188@nsamcanms1.ca.baltimore.com> Message-ID: <20010608134440.A23160@thyrsus.com> Greg Wilson : > How about a quick patch to do "%b" for int and long-int, and a > PEP for a generic "format" module --- arbitrary radix, options > for IEEE numbers, etc.? Any objections? I like it. -- Eric S. Raymond The people cannot delegate to government the power to do anything which would be unlawful for them to do themselves. -- John Locke, "A Treatise Concerning Civil Government" From tim.one at home.com Fri Jun 8 19:51:50 2001 From: tim.one at home.com (Tim Peters) Date: Fri, 8 Jun 2001 13:51:50 -0400 Subject: [Python-Dev] RE: %b format (no, really) In-Reply-To: <930BBCA4CEBBD411BE6500508BB3328F2E2188@nsamcanms1.ca.baltimore.com> Message-ID: [Greg Wilson] > How about a quick patch to do "%b" for int and long-int, Don't know how quick it will be (it should cover type slots and bin() and __bin__ and 0b1101 notation too, right?), but +1 from me. That much is routinely requested. > and a PEP for a generic "format" module --- arbitrary radix, options > for IEEE numbers, etc.? Any objections? None here. From bckfnn at worldonline.dk Fri Jun 8 21:15:14 2001 From: bckfnn at worldonline.dk (Finn Bock) Date: Fri, 08 Jun 2001 19:15:14 GMT Subject: [Python-Dev] sys.modules["__main__"] in Jython In-Reply-To: <15136.58991.72069.433197@beluga.mojam.com> References: <15136.58991.72069.433197@beluga.mojam.com> Message-ID: <3b212431.21754982@smtp.worldonline.dk> [Skip] >Would someone with Jython experience check to see if it interprets >sys.modules["__main__"] in the same manner as Python? To me it seems like Jython defines sys.modules["__main__"] in the same way as CPython. >I'm wondering if this works for Jython as well as Python: >
>    def _test():
>        import doctest, sys
>        return doctest.testmod(sys.modules["__main__"])
>
>    if __name__ == "__main__":
>        _test()
>
It works for Jython. regards, finn From thomas at xs4all.net Fri Jun 8 23:41:02 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Fri, 8 Jun 2001 23:41:02 +0200 Subject: [Python-Dev] Glowing endorsement of open source and Python In-Reply-To: <200106081623.f58GNYf22712@snark.thyrsus.com>; from esr@snark.thyrsus.com on Fri, Jun 08, 2001 at 12:23:34PM -0400 References: <200106081623.f58GNYf22712@snark.thyrsus.com> Message-ID: <20010608234102.B690@xs4all.nl> On Fri, Jun 08, 2001 at 12:23:34PM -0400, Eric S. Raymond wrote: > It doesn't get much better than this: > http://it.mycareer.com.au/news/2001/06/05/FFX9ZT7UENC.html It's a nice (and very flattering!) piece, but it's a tad buzzword heavy. "[Python] supports XML for e-commerce and mobile applications" ? Well, shit, so *that*'s what XML is for :) -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread!
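[Picking up the sys.modules["__main__"] thread: with the default-argument testmod Skip proposes, which Finn's test above suggests is safe for Jython too, a module's self-test boilerplate would shrink to something like this. A sketch only -- the no-argument testmod() behavior is still just a proposal:]

    def square(x):
        """Return x squared.

        >>> square(3)
        9
        """
        return x * x

    def _test():
        import doctest
        # Proposed: testmod() defaults to sys.modules["__main__"].
        return doctest.testmod()

    if __name__ == "__main__":
        _test()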
From tim.one at home.com Sat Jun 9 00:02:06 2001 From: tim.one at home.com (Tim Peters) Date: Fri, 8 Jun 2001 18:02:06 -0400 Subject: [Python-Dev] sys.modules["__main__"] in Jython In-Reply-To: <3b212431.21754982@smtp.worldonline.dk> Message-ID: [Finn Bock] > To me it seems like Jython defines sys.modules["__main__"] in the same > way as CPython. Thank you, Finn! doctest has always avoided introspection tricks for which Jython doesn't work "exactly the same way" as CPython. However, in the past it achieved this by not paying any attention, then ripping out bad ideas when a Jython user reported failure. But now that it's in the std library, I want to proceed more carefully. Skip's idea is much more attractive now that you've confirmed it will work there too. From tim.one at home.com Sun Jun 10 03:10:53 2001 From: tim.one at home.com (Tim Peters) Date: Sat, 9 Jun 2001 21:10:53 -0400 Subject: [Python-Dev] Struct schizophrenia Message-ID: I'm adding "long long" integral types to struct (in native mode, "long long" or __int64 on platforms that have them; in standard mode, 64 bits). This is proving harder than it should be, because the code that's already there is schizophrenic across boundaries, so is failing as a base to build on (raises more questions than it answers). Like:

>>> x = 256
>>> struct.pack("b", x)    # complains about magnitude in native mode
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
struct.error: byte format requires -128<=number<=127
>>> struct.pack("=b", x)   # but doesn't with native order + std align
'\x00'
>>> struct.pack("<b", x)   # nor with explicit std mode
'\x00'
>>> struct.pack("<B", x)   # or unsigned
'\x00'
>>> struct.pack("<b", 10L**20)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
OverflowError: long int too large to convert
>>>

Much the same is true of other small int sizes: you can't predict what will happen without trying it; and once you get to ints, no range-checking is performed even in native mode. Surely this can't stand, but what do people *want*? My preference is to raise the same "byte format requires -128<=number<=127" exception in all these cases; OTOH, the code structure fights that, working with Python longs is clumsy in C, and there are other "undocumented features" here that may or may not be accidents:

>>> struct.pack("B", 234.3)
'\xea'
>>>

That is, did we *intend* to accept floats packed via integer typecodes? Feature or bug? In the other (unpack) direction, the docs say for 'I' (unsigned int):

    The "I" conversion code will convert to a Python long if the C int
    is the same size as a C long, which is typical on most modern
    systems.  If a C int is smaller than a C long, a Python integer
    will be created instead.

That's in a footnote. In another part, they say:

    For the "I" and "L" format characters, the return value is a
    Python long integer.

The footnote is wrong -- but is the footnote what was intended (somebody went to a fair bit of work to write all the stuff)? From tim.one at home.com Sun Jun 10 06:25:51 2001 From: tim.one at home.com (Tim Peters) Date: Sun, 10 Jun 2001 00:25:51 -0400 Subject: [Python-Dev] FW: printing Python stack info from gdb Message-ID: Fwd'ing this Skip gem from c.l.py, primarily so I can find it again next time I'm thrashing on a Unix box. gdb clearly needs to adopt Python as its extension language.
but-then-what-doesn't-ly y'rs - tim -----Original Message----- From: python-list-admin at python.org [mailto:python-list-admin at python.org]On Behalf Of Skip Montanaro Sent: Saturday, June 09, 2001 12:31 AM To: python-list at python.org Subject: printing Python stack info from gdb From tim.one at home.com Sun Jun 10 21:36:50 2001 From: tim.one at home.com (Tim Peters) Date: Sun, 10 Jun 2001 15:36:50 -0400 Subject: [Python-Dev] FW: list-display semantics? Message-ID: I opened a bug on this: If anyone's keen to play with the grammar, have at it! Everyone at PythonLabs would +1 it. -----Original Message----- From: python-list-admin at python.org [mailto:python-list-admin at python.org]On Behalf Of jainweiwu Sent: Sunday, June 10, 2001 2:30 PM To: python-list at python.org Subject: list-display semantics? Hi all: I tried this one-line command in interactive mode:

    [x for x in [1, 2, 3], y for y in [4, 5, 6]]

and the result surprised me, that is:

    [[1,2,3],[1,2,3],[1,2,3],9,9,9]

Who can explain the behavior? Since I expected the result should be:

    [[1,4],[1,5],[1,6],[2,4],...]

-- Pary All Rough Yet. parywu at seed.net.tw -- http://mail.python.org/mailman/listinfo/python-list From dan at cgsoftware.com Sun Jun 10 22:30:24 2001 From: dan at cgsoftware.com (Daniel Berlin) Date: 10 Jun 2001 16:30:24 -0400 Subject: [Python-Dev] FW: printing Python stack info from gdb In-Reply-To: ("Tim Peters"'s message of "Sun, 10 Jun 2001 00:25:51 -0400") References: Message-ID: <87n17grsbj.fsf@cgsoftware.com> "Tim Peters" writes: > Fwd'ing this Skip gem from c.l.py, primarily so I can find it again next > time I'm thrashing on a Unix box. gdb clearly needs to adopt Python as its > extension language. HP has patches to do this, actually. Works quite nicely. And trust me, I've tried to get them to do it more than once. As I pointed out to Skip, if he can profile gdb and tell me where the slowness is, it's likely I can make it a ton faster. GDB could use major optimizations almost everywhere. And I've done quite a lot of them; they just haven't been reviewed/integrated yet. --Dan C++ support maintainer - GDB DWARF2 reader person - GDB Symbol table patch submitting weirdo - GDB etc

> but-then-what-doesn't-ly y'rs - tim
>
> -----Original Message-----
> From: python-list-admin at python.org
> [mailto:python-list-admin at python.org]On Behalf Of Skip Montanaro
> Sent: Saturday, June 09, 2001 12:31 AM
> To: python-list at python.org
> Subject: printing Python stack info from gdb
>
> From time to time I've wanted to be able to print the Python stack from gdb.
> Today I broke down and spent some time actually implementing something.
>
>     set $__trimpath = 1
>     define ppystack
>       set $__fr = 0
>       select-frame $__fr
>       while !($pc > Py_Main && $pc < Py_GetArgcArgv)
>         if $pc > eval_code2 && $pc < set_exc_info
>           set $__fn = PyString_AsString(co->co_filename)
>           set $__n = PyString_AsString(co->co_name)
>           if $__n[0] == '?'
>             set $__n = ""
>           end
>           if $__trimpath
>             set $__f = strrchr($__fn, '/')
>             if $__f
>               set $__fn = $__f + 1
>             end
>           end
>           printf "%s (%d): %s\n", $__fn, f->f_lineno, $__n
>         end
>         set $__fr = $__fr + 1
>         select-frame $__fr
>       end
>       select-frame 0
>     end
>
> Output looks like this (and dribbles out *quite slowly*):
>
>     Text_Editor.py (147): apply_tag
>     Text_Editor.py (152): apply_tag_by_name
>     Script_GUI.py (302): push_help
>     Script_GUI.py (113): put_help
>     Script_GUI.py (119): focus_enter
>     Signal.py (34): handle_signal
>     Script_GUI.py (324): main
>     Script_GUI.py (338):
>
> If you don't want to trim the paths from the filenames, set $__trimpath to
> 0.
>
> Warning: I've only tried this with a very recent CVS version of Python on a
> PIII-based Linux system with an interpreter compiled using gcc.  I rely on
> the ordering of functions within the while loop to detect when to exit the
> loop and when the frame I'm examining is an eval_code2 frame.  I'm sure
> there are plenty of people out there with more gdb experience than me.  I
> welcome any feedback on ways to improve this little bit of code.
>
> --
> Skip Montanaro (skip at pobox.com)
> (847)971-7098
>
> --
> http://mail.python.org/mailman/listinfo/python-list
>
>
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at python.org
> http://mail.python.org/mailman/listinfo/python-dev

-- "I saw a man with a wooden leg, and a real foot. "-Steven Wright From greg at cosc.canterbury.ac.nz Mon Jun 11 04:44:54 2001 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Mon, 11 Jun 2001 14:44:54 +1200 (NZST) Subject: [Python-Dev] FW: list-display semantics? In-Reply-To: Message-ID: <200106110244.OAA03090@s454.cosc.canterbury.ac.nz> parywu at seed.net.tw: > [x for x in [1, 2, 3], y for y in [4, 5, 6]] > and the result surprised me, that is: > [[1,2,3],[1,2,3],[1,2,3],9,9,9] Did you by any chance execute that in an environment where y was previously bound to 9? It will be parsed as

    [x for x in ([1, 2, 3], y) for y in [4, 5, 6]]

which should give a NameError if y is previously unbound, since it will try to evaluate ([1, 2, 3], y) before y is bound by the inner loop. But executing y = 9 beforehand will give the results you got. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg at cosc.canterbury.ac.nz +--------------------------------------+ From gstein at lyra.org Mon Jun 11 13:31:59 2001 From: gstein at lyra.org (Greg Stein) Date: Mon, 11 Jun 2001 04:31:59 -0700 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Objects object.c,2.124.4.11,2.124.4.12 In-Reply-To: ; from gvanrossum@users.sourceforge.net on Wed, Jun 06, 2001 at 07:34:15AM -0700 References: Message-ID: <20010611043158.E26210@lyra.org> On Wed, Jun 06, 2001 at 07:34:15AM -0700, Guido van Rossum wrote: > Update of /cvsroot/python/python/dist/src/Objects > In directory usw-pr-cvs1:/tmp/cvs-serv17474 > > Modified Files: > Tag: descr-branch > object.c > Log Message: > Add _PyObject_GetDictPtr() -- an internal API to get a pointer to > where __dict__ is stored in an object. The simplest case is to add > tp_dictoffset to the start of the object, but there are complications: > tp_flags may tell us that tp_dictoffset is not defined, or the offset > may be negative: indexing from the end of the object, where > tp_itemsize may have to be taken into account.
Why would you ever have a negative size in there? That seems like an unnecessary "feature". The offsets are easily set up by the compiler as positive values. (not even sure how you'd come up with a proper/valid negative value) Cheers, -g

> Index: object.c
> ===================================================================
> RCS file: /cvsroot/python/python/dist/src/Objects/object.c,v
> retrieving revision 2.124.4.11
> retrieving revision 2.124.4.12
> diff -C2 -r2.124.4.11 -r2.124.4.12
> *** object.c	2001/06/06 14:27:54	2.124.4.11
> --- object.c	2001/06/06 14:34:13	2.124.4.12
> ***************
> *** 1074,1077 ****
> --- 1074,1111 ----
>   }
>
> + /* Helper to get a pointer to an object's __dict__ slot, if any */
> +
> + PyObject **
> + _PyObject_GetDictPtr(PyObject *obj)
> + {
> + #define PTRSIZE (sizeof(PyObject *))
> +
> +     long dictoffset;
> +     PyTypeObject *tp = obj->ob_type;
> +
> +     if (!(tp->tp_flags & Py_TPFLAGS_HAVE_CLASS))
> +         return NULL;
> +     dictoffset = tp->tp_dictoffset;
> +     if (dictoffset == 0)
> +         return NULL;
> +     if (dictoffset < 0) {
> +         dictoffset += tp->tp_basicsize;
> +         assert(dictoffset > 0); /* Sanity check */
> +         if (tp->tp_itemsize > 0) {
> +             int n = ((PyVarObject *)obj)->ob_size;
> +             if (n > 0) {
> +                 dictoffset += tp->tp_itemsize * n;
> +                 /* Round up, if necessary */
> +                 if (tp->tp_itemsize % PTRSIZE != 0) {
> +                     dictoffset += PTRSIZE - 1;
> +                     dictoffset /= PTRSIZE;
> +                     dictoffset *= PTRSIZE;
> +                 }
> +             }
> +         }
> +     }
> +     return (PyObject **) ((char *)obj + dictoffset);
> + }
> +
>   /* Generic GetAttr functions - put these in your tp_[gs]etattro slot */
>
> ***************
> *** 1082,1086 ****
>       PyObject *descr;
>       descrgetfunc f;
> !     int dictoffset;
>
>       if (tp->tp_dict == NULL) {
> --- 1116,1120 ----
>       PyObject *descr;
>       descrgetfunc f;
> !     PyObject **dictptr;
>
>       if (tp->tp_dict == NULL) {
> ***************
> *** 1097,1103 ****
>       }
>
> !     dictoffset = tp->tp_dictoffset;
> !     if (dictoffset != 0) {
> !         PyObject *dict = * (PyObject **) ((char *)obj + dictoffset);
>           if (dict != NULL) {
>               PyObject *res = PyDict_GetItem(dict, name);
> --- 1131,1137 ----
>       }
>
> !     dictptr = _PyObject_GetDictPtr(obj);
> !     if (dictptr != NULL) {
> !         PyObject *dict = *dictptr;
>           if (dict != NULL) {
>               PyObject *res = PyDict_GetItem(dict, name);
> ***************
> *** 1129,1133 ****
>       PyObject *descr;
>       descrsetfunc f;
> !     int dictoffset;
>
>       if (tp->tp_dict == NULL) {
> --- 1163,1167 ----
>       PyObject *descr;
>       descrsetfunc f;
> !     PyObject **dictptr;
>
>       if (tp->tp_dict == NULL) {
> ***************
> *** 1143,1149 ****
>       }
>
> !     dictoffset = tp->tp_dictoffset;
> !     if (dictoffset != 0) {
> !         PyObject **dictptr = (PyObject **) ((char *)obj + dictoffset);
>           PyObject *dict = *dictptr;
>           if (dict == NULL && value != NULL) {
> --- 1177,1182 ----
>       }
>
> !     dictptr = _PyObject_GetDictPtr(obj);
> !     if (dictptr != NULL) {
>           PyObject *dict = *dictptr;
>           if (dict == NULL && value != NULL) {
>
>
> _______________________________________________
> Python-checkins mailing list
> Python-checkins at python.org
> http://mail.python.org/mailman/listinfo/python-checkins

-- Greg Stein, http://www.lyra.org/ From guido at digicool.com Mon Jun 11 14:57:18 2001 From: guido at digicool.com (Guido van Rossum) Date: Mon, 11 Jun 2001 08:57:18 -0400 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Objects object.c,2.124.4.11,2.124.4.12 In-Reply-To: Your message of "Mon, 11 Jun 2001 04:31:59 PDT."
<20010611043158.E26210@lyra.org> References: <20010611043158.E26210@lyra.org> Message-ID: <200106111257.IAA03505@cj20424-a.reston1.va.home.com> > > Modified Files: > > Tag: descr-branch > > object.c > > Log Message: > > Add _PyObject_GetDictPtr() -- an internal API to get a pointer to > > where __dict__ is stored in an object. The simplest case is to add > > tp_dictoffset to the start of the object, but there are complications: > > tp_flags may tell us that tp_dictoffset is not defined, or the offset > > may be negative: indexing from the end of the object, where > > tp_itemsize may have to be taken into account. > > Why would you ever have a negative size in there? That seems like an > unnecessary "feature". The offsets are easily set up by the compiler as > positive values. (not even sure how you'd come up with a proper/valid > negative value) When extending a type like tuple or string, the __dict__ has to be added to the end, after the last item, because we can't change the starting offset of the first item. This is not at a fixed offset from the start of the structure. --Guido van Rossum (home page: http://www.python.org/~guido/) From mal at lemburg.com Mon Jun 11 18:50:11 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Mon, 11 Jun 2001 18:50:11 +0200 Subject: [Python-Dev] Adding .decode() method to Unicode Message-ID: <3B24F6C3.C911C0BF@lemburg.com> I would like to add a .decode() method to Unicode objects and also enable the builtin unicode() to accept Unicode objects as input. The .decode() method will work just like the .encode() method except that it interfaces to the decode API of the codec in question. While this may seem useless for the currently available encodings, it does have some use for codecs which recode Unicode to Unicode, e.g. codecs which do XML escaping or Unicode compression. Any objections? -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From mal at lemburg.com Mon Jun 11 18:57:12 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Mon, 11 Jun 2001 18:57:12 +0200 Subject: [Python-Dev] Adding Asian codecs to the core Message-ID: <3B24F868.A3DFA649@lemburg.com> Tamito KAJIYAMA recently announced that he changed the licenses on his Japanese codecs from GPL to a BSD variant. This is great news since this would allow adding the codecs to the Python core, which would certainly attract more users to Python in Asia. The codecs are available at: http://pseudo.grad.sccs.chukyo-u.ac.jp/~kajiyama/python/ The codecs are 280kB when compressed as a .tar.gz file. Thoughts? -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From aahz at rahul.net Mon Jun 11 19:42:30 2001 From: aahz at rahul.net (Aahz Maruch) Date: Mon, 11 Jun 2001 10:42:30 -0700 (PDT) Subject: [Python-Dev] Adding Asian codecs to the core In-Reply-To: <3B24F868.A3DFA649@lemburg.com> from "M.-A. Lemburg" at Jun 11, 2001 06:57:12 PM Message-ID: <20010611174230.0625E99C8D@waltz.rahul.net> M.-A. Lemburg wrote: > > Tamito KAJIYAMA recently announced that he changed the licenses > on his Japanese codecs from GPL to a BSD variant. This is great > news since this would allow adding the codecs to the Python core > which would certainly attract more users to Python in Asia.
>
> The codecs are 280kB when compressed as .tar.gz file.

+0

I like the idea, but am uncomfortable with that amount of space.

-- 
                      --- Aahz (@pobox.com)

Hugs and backrubs -- I break Rule 6  <*>  http://www.rahul.net/aahz/
Androgynous poly kinky vanilla queer het Pythonista

I don't really mind a person having the last whine, but I do mind
someone else having the last self-righteous whine.

From fdrake at cj42289-a.reston1.va.home.com  Mon Jun 11 21:15:06 2001
From: fdrake at cj42289-a.reston1.va.home.com (Fred Drake)
Date: Mon, 11 Jun 2001 15:15:06 -0400 (EDT)
Subject: [Python-Dev] [development doc updates]
Message-ID: <20010611191506.DC49F28923@cj42289-a.reston1.va.home.com>

The development version of the documentation has been updated:

    http://python.sourceforge.net/devel-docs/

Substantial additional material on floating point arithmetic in the
tutorial, written by Tim Peters to explain why FP can fail to reflect
the decimal world presented to the user.

Lots of additional updates and corrections.

From guido at digicool.com  Mon Jun 11 22:07:40 2001
From: guido at digicool.com (Guido van Rossum)
Date: Mon, 11 Jun 2001 16:07:40 -0400
Subject: [Python-Dev] PEP 259: Omit printing newline after newline
Message-ID: <200106112007.f5BK7eW22506@odiug.digicool.com>

Please comment on the following.  This came up a while ago in
python-dev and I decided to follow through.  I'm making this a PEP
because of the risk of breaking code (which everybody on Python-dev
seemed to think was acceptable).

--Guido van Rossum (home page: http://www.python.org/~guido/)

PEP: 259
Title: Omit printing newline after newline
Version: $Revision: 1.1 $
Author: guido at python.org (Guido van Rossum)
Status: Draft
Type: Standards Track
Python-Version: 2.2
Created: 11-Jun-2001
Post-History: 11-Jun-2001

Abstract

    Currently, the print statement always appends a newline, unless a
    trailing comma is used.  This means that if we want to print data
    that already ends in a newline, we get two newlines, unless
    special precautions are taken.

    I propose to skip printing the newline when it follows a newline
    that came from data.

    In order to avoid having to add yet another magic variable to
    file objects, I propose to give the existing 'softspace' variable
    an extra meaning: a negative value will mean "the last data
    written ended in a newline so no space *or* newline is required."

Problem

    When printing data that resembles the lines read from a file
    using a simple loop, double-spacing occurs unless special care is
    taken:

        >>> for line in open("/etc/passwd").readlines():
        ...     print line
        ...
        root:x:0:0:root:/root:/bin/bash

        bin:x:1:1:bin:/bin:

        daemon:x:2:2:daemon:/sbin:

        (etc.)

        >>>

    While there are easy work-arounds, this is often noticed only
    during testing and requires an extra edit-test roundtrip; the
    fixed code is uglier and harder to maintain.

Proposed Solution

    In the PRINT_ITEM opcode in ceval.c, when a string object is
    printed, a check is already made that looks at the last character
    of that string.  Currently, if that last character is a whitespace
    character other than space, the softspace flag is reset to zero;
    this suppresses the space between two items if the first item is
    a string ending in newline, tab, etc. (but not when it ends in a
    space).  Otherwise the softspace flag is set to one.
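    To make the current test concrete, it can be sketched in pure
    Python (an illustration only -- the real logic lives in C in
    ceval.c, and softspace_after is a made-up name):

        import string

        def softspace_after(item):
            # softspace value left behind by "print item," under the
            # *current* rules described above
            if type(item) is type("") and item:
                last = item[-1]
                if last != ' ' and last in string.whitespace:
                    return 0    # newline, tab, ...: suppress next space
            return 1            # all other cases: a space is owed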
    The proposal changes this test slightly so that softspace is set
    to:

    -1 -- if the last object written is a string ending in a newline

     0 -- if the last object written is a string ending in a
          whitespace character that's neither space nor newline

     1 -- in all other cases (including the case when the last object
          written is an empty string or not a string)

    Then, in the PRINT_NEWLINE opcode, printing of the newline is
    suppressed if the value of softspace is negative; in any case the
    softspace flag is reset to zero.

Scope

    This only affects printing of 8-bit strings.  It doesn't affect
    Unicode, although that could be considered a bug in the Unicode
    implementation.  It doesn't affect other objects whose string
    representation happens to end in a newline character.

Risks

    This change breaks some existing code.  For example:

        print "Subject: PEP 259\n"
        print message_body

    In current Python, this produces a blank line separating the
    subject from the message body; with the proposed change, the body
    begins immediately below the subject.  This is not very robust
    code anyway; it is better written as

        print "Subject: PEP 259"
        print
        print message_body

    In the test suite, only test_StringIO (which explicitly tests for
    this feature) breaks.

Implementation

    A patch relative to current CVS is here:

        http://sourceforge.net/tracker/index.php?func=detail&aid=432183&group_id=5470&atid=305470

Copyright

    This document has been placed in the public domain.

Local Variables:
mode: indented-text
indent-tabs-mode: nil
End:

From BPettersen at NAREX.com  Mon Jun 11 22:20:38 2001
From: BPettersen at NAREX.com (Bjorn Pettersen)
Date: Mon, 11 Jun 2001 14:20:38 -0600
Subject: [Python-Dev] RE: PEP 259: Omit printing newline after newline
Message-ID: <6957F6A694B49A4096F7CFD0D900042F27D452@admin56.narex.com>

> From: Guido van Rossum [mailto:guido at digicool.com]
> 
> Subject: PEP 259: Omit printing newline after newline

This would probably break most of the cgi scripts I did at my last job
without giving any useful error message.  But then again... why should
I care ?

-- bjorn

From skip at pobox.com  Mon Jun 11 22:20:33 2001
From: skip at pobox.com (Skip Montanaro)
Date: Mon, 11 Jun 2001 15:20:33 -0500
Subject: [Python-Dev] Feedback on new floating point info in tutorial
In-Reply-To: <20010611191506.DC49F28923@cj42289-a.reston1.va.home.com>
References: <20010611191506.DC49F28923@cj42289-a.reston1.va.home.com>
Message-ID: <15141.10257.487549.196538@beluga.mojam.com>

    Fred> Substantial additional material on floating point arithmetic in
    Fred> the tutorial, written by Tim Peters to explain why FP can fail to
    Fred> reflect the decimal world presented to the user.

I took a quick look at that appendix.  One thing that confused me a bit was
that if 0.1 is approximated by something ever-so-slightly larger than 0.1,
how is it that if you add ten of them together you wind up with a result
that is ever-so-slightly less than 1.0?  I didn't expect it to be exactly
1.0.
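On a typical IEEE-754 platform the effect is easy to reproduce
interactively (exact results could differ on non-754 hardware):
repeated addition drifts just below 1.0, while a single multiplication
rounds to exactly 1.0:

    >>> s = 0.0
    >>> for i in range(10):
    ...     s = s + 0.1
    ...
    >>> s == 1.0
    0
    >>> s < 1.0
    1
    >>> 0.1 * 10 == 1.0
    1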
Other floating point naifs may be confused in the same way:

    >>> "%.55f" % 0.5
    '0.5000000000000000000000000000000000000000000000000000000'
    >>> "%.55f" % 0.1
    '0.1000000000000000055511151231257827021181583404541015625'
    >>> "%.55f" % (0.5+0.1)
    '0.5999999999999999777955395074968691915273666381835937500'

I guess the explanation is that not only can't most decimals be
represented exactly, but that summing the same approximation multiple
times doesn't always skew the error in the same direction either:

    >>> "%.55f" % (0.1+0.1+0.1+0.1+0.1+0.1+0.1+0.1)
    '0.7999999999999999333866185224906075745820999145507812500'
    >>> "%.55f" % (0.8)
    '0.8000000000000000444089209850062616169452667236328125000'

IEEE-754-is-full-of-traps-for-the-unwary-ly y'rs,

Skip

From mal at lemburg.com  Mon Jun 11 22:55:13 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Mon, 11 Jun 2001 22:55:13 +0200
Subject: [Python-Dev] PEP 259: Omit printing newline after newline
References: <200106112007.f5BK7eW22506@odiug.digicool.com>
Message-ID: <3B253031.AB1954CB@lemburg.com>

Guido van Rossum wrote:
> 
> Please comment on the following.  This came up a while ago in
> python-dev and I decided to follow through.  I'm making this a PEP
> because of the risk of breaking code (which everybody on Python-dev
> seemed to think was acceptable).
> 
> --Guido van Rossum (home page: http://www.python.org/~guido/)
> 
> PEP: 259
> Title: Omit printing newline after newline
> ...
> Scope
> 
>     This only affects printing of 8-bit strings.  It doesn't affect
>     Unicode, although that could be considered a bug in the Unicode
>     implementation.  It doesn't affect other objects whose string
>     representation happens to end in a newline character.

I guess I should fix the Unicode stuff ;-)

> Risks
> 
>     This change breaks some existing code.  For example:
> 
>         print "Subject: PEP 259\n"
>         print message_body
> 
>     In current Python, this produces a blank line separating the
>     subject from the message body; with the proposed change, the body
>     begins immediately below the subject.  This is not very robust
>     code anyway; it is better written as
> 
>         print "Subject: PEP 259"
>         print
>         print message_body
> 
>     In the test suite, only test_StringIO (which explicitly tests for
>     this feature) breaks.

Hmm, I think the above is a very typical idiom for RFC822-style
content and is used in CGI scripts a lot. I'm not sure whether this
change is worth getting the CGI crowd upset...

Wouldn't it make sense to only use this technique in interactive
mode ?

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/

From martin at loewis.home.cs.tu-berlin.de  Tue Jun 12 00:00:54 2001
From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Tue, 12 Jun 2001 00:00:54 +0200
Subject: [Python-Dev] Adding .decode() method to Unicode
Message-ID: <200106112200.f5BM0sO01692@mira.informatik.hu-berlin.de>

> I would like to add a .decode() method to Unicode objects and also
> enable the builtin unicode() to accept Unicode objects as input.

-1. What is this good for?

> While this may seem useless for the currently available encodings,
> it does have some use for codecs which recode Unicode to Unicode,
> e.g. codecs which do XML escaping or Unicode compression.

I still can't see the value. If you think the codec API is good for such
transformations, why not use it? I.e.
enc,dec,_,_ = codecs.lookup("compress-form-foo") s = dec(s) Furthermore, this seems like a form of hypergeneralization. If you have this, why not also add s = s.decode("capitalize") # instead of s.capitalize() i = s.decode("int") # instead of int(s) > Any objections ? Yes, I think this should not be added. Regards, Martin From paulp at ActiveState.com Tue Jun 12 01:38:55 2001 From: paulp at ActiveState.com (Paul Prescod) Date: Mon, 11 Jun 2001 16:38:55 -0700 Subject: [Python-Dev] Adding .decode() method to Unicode References: <200106112200.f5BM0sO01692@mira.informatik.hu-berlin.de> Message-ID: <3B25568F.B766E00D@ActiveState.com> "Martin v. Loewis" wrote: > >... > > I still can see the value. If you think the codec API is good for such > transformation, why not use it? I.e. > > enc,dec,_,_ = codecs.lookup("compress-form-foo") > s = dec(s) IMO, there is a huge usability difference between the above and mystr.decode("base64"). I think that we've done a good job of providing better ways to get at codecs than the codecs.lookup function. I don't see how this is any different. -- Take a recipe. Leave a recipe. Python Cookbook! http://www.ActiveState.com/pythoncookbook From greg at cosc.canterbury.ac.nz Tue Jun 12 01:51:55 2001 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Tue, 12 Jun 2001 11:51:55 +1200 (NZST) Subject: [Python-Dev] Feedback on new floating point info in tutorial In-Reply-To: <15141.10257.487549.196538@beluga.mojam.com> Message-ID: <200106112351.LAA03197@s454.cosc.canterbury.ac.nz> Skip Montanaro : > One thing that confused me a bit was > that if 0.1 is approximated by something ever-so-slightly larger than 0.1, > how is it that if you add ten of them together you wind up with a result > that is ever-so-slightly less than 1.0? I think what's happening is that the exact binary result of adding 0.1_plus_a_little to itself has one more bit than there is room for, so it gets shifted right and one bit falls off the end. The amount you lose when that happens a few times ends up outweighing the extra that you would expect. Whether it's worth trying to explain *that* in the tutorial I don't know! Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg at cosc.canterbury.ac.nz +--------------------------------------+ From greg at cosc.canterbury.ac.nz Tue Jun 12 02:00:33 2001 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Tue, 12 Jun 2001 12:00:33 +1200 (NZST) Subject: [Python-Dev] PEP 259: Omit printing newline after newline In-Reply-To: <200106112007.f5BK7eW22506@odiug.digicool.com> Message-ID: <200106120000.MAA03201@s454.cosc.canterbury.ac.nz> Guido: > I propose to skip printing the newline when it follows a newline > that came from data. -1 There's too much magic in the way print handles spaces and newlines already. Making it even more magical and inconsistent seems like exactly the wrong direction to be going in. If there are to be any changes to the way print works, I would prefer to see one that removes the need for the softspace flag altogether. The behaviour of a given print should not depend on state left behind by some previous one. Neither should it depend on whether the characters being printed come directly from a string or not. 
Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg at cosc.canterbury.ac.nz +--------------------------------------+ From tim.one at home.com Tue Jun 12 04:17:24 2001 From: tim.one at home.com (Tim Peters) Date: Mon, 11 Jun 2001 22:17:24 -0400 Subject: [Python-Dev] Feedback on new floating point info in tutorial In-Reply-To: <15141.10257.487549.196538@beluga.mojam.com> Message-ID: [Skip Montanaro, on the in-progess 2.2 Tutorial appendix] > I took a quick look at that appendix. One thing that confused me > a bit was that if 0.1 is approximated by something ever-so-slightly > larger than 0.1, how is it that if you add ten of them together you > wind up with a result that is ever-so-slightly less than 1.0? Good for you, Skip! In all the years I've been explaining this stuff, I only recall one other picking up on that immediately. I'm not writing a book here, though , and any intro numeric programming text emphasizes that n*x is a better bet than adding x together n times. >>> .1 * 10 1.0 >>> Greg Ewing put you on the right track, if you want to figure it out yourself (as Deep Throat said, "follow the bits, Skip -- follow the bits"). > I didn't expect it to be exactly 1.0. Other floating point naifs > may be confused in the same way: > > >>> "%.55f" % 0.5 > '0.5000000000000000000000000000000000000000000000000000000' > >>> "%.55f" % 0.1 > '0.1000000000000000055511151231257827021181583404541015625' > >>> "%.55f" % (0.5+0.1) > '0.5999999999999999777955395074968691915273666381835937500' Note that this output is platform-dependent. For example, the last on Windows is >>> "%.55f" % (0.5+0.1) '0.5999999999999999800000000000000000000000000000000000000' > ... > IEEE-754-is-full-of-traps-for-the-unwary-ly y'rs, All computer arithmetic is; and among binary fp systems, 754 has got to be the best-behaved there is. Know how many irksome bugs I've fixed in Python mucking with different sizes of integers across platforms, and what C does and doesn't guarantee about them? About 20x more than fp bugs. Of course there's 10000x as much integer code in Python too . god-created-the-integers-from-1-through-3-inclusive-and-that's-it-ly y'rs - tim From barry at digicool.com Tue Jun 12 05:00:52 2001 From: barry at digicool.com (Barry A. Warsaw) Date: Mon, 11 Jun 2001 23:00:52 -0400 Subject: [Python-Dev] PEP 259: Omit printing newline after newline References: <200106112007.f5BK7eW22506@odiug.digicool.com> <200106120000.MAA03201@s454.cosc.canterbury.ac.nz> Message-ID: <15141.34276.191510.708654@anthem.wooz.org> >>>>> "GE" == Greg Ewing writes: GE> There's too much magic in the way print handles spaces and GE> newlines already. Making it even more magical and inconsistent GE> seems like exactly the wrong direction to be going in. I tend to agree. I'm sometimes bitten by the double newlines, but as I think Andrew brought up in c.l.py, I'd rather see a way to tell readlines() to strip the newlines than to add more magic to print. print-has-all-the-magic-it-needs-now-<>-ly y'rs, -Barry From fredrik at pythonware.com Tue Jun 12 08:21:55 2001 From: fredrik at pythonware.com (Fredrik Lundh) Date: Tue, 12 Jun 2001 08:21:55 +0200 Subject: [Python-Dev] PEP 259: Omit printing newline after newline References: <200106112007.f5BK7eW22506@odiug.digicool.com> Message-ID: <004c01c0f307$fb0540c0$4ffa42d5@hagrid> guido wrote: > Please comment on the following. 
This came up a while ago in
> python-dev and I decided to follow through.  I'm making this a PEP
> because of the risk of breaking code (which everybody on Python-dev
> seemed to think was acceptable).

when was this discussed on python-dev?

From mal at lemburg.com  Tue Jun 12 09:09:05 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Tue, 12 Jun 2001 09:09:05 +0200
Subject: [Python-Dev] Adding .decode() method to Unicode
References: <200106112200.f5BM0sO01692@mira.informatik.hu-berlin.de>
Message-ID: <3B25C011.125B6462@lemburg.com>

"Martin v. Loewis" wrote:
> 
> > I would like to add a .decode() method to Unicode objects and also
> > enable the builtin unicode() to accept Unicode objects as input.
> 
> -1. What is this good for?

See below :)

> > While this may seem useless for the currently available encodings,
> > it does have some use for codecs which recode Unicode to Unicode,
> > e.g. codecs which do XML escaping or Unicode compression.
> 
> I still can't see the value. If you think the codec API is good for such
> transformations, why not use it? I.e.
> 
> enc,dec,_,_ = codecs.lookup("compress-form-foo")
> s = dec(s)

Sure, and that's the point. I would like to add the .decode() method
to make this just as simple as encoding Unicode to UTF-8.

Note that strings already have this method:

    str.encode()
    str.decode()
    uni.encode()
    #uni.decode() # still missing

> Furthermore, this seems like a form of hypergeneralization. If you
> have this, why not also add
> 
> s = s.decode("capitalize") # instead of s.capitalize()
> i = s.decode("int") # instead of int(s)

No, that's not the intention.

One very useful application for this method is XML unescaping
which turns numeric XML entities into Unicode chars. Others are
Unicode decompression (using the Unicode compression algorithm) and
certain forms of Unicode normalization.

The key argument for these interfaces is that they provide
an extensible transformation mechanism for string and binary
data.

> > Any objections ?
> 
> Yes, I think this should not be added.

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/

From tim.one at home.com  Tue Jun 12 09:29:02 2001
From: tim.one at home.com (Tim Peters)
Date: Tue, 12 Jun 2001 03:29:02 -0400
Subject: [Python-Dev] PEP 259: Omit printing newline after newline
In-Reply-To: <004c01c0f307$fb0540c0$4ffa42d5@hagrid>
Message-ID: 

[/F]
> when was this discussed on python-dev?

It wasn't -- it actually came up on one of the SourceForge mailing lists
... ah, of course, tried to search but "Geocrawler is down for nightly
database maintenance".  They sure have long nights.  I'm guessing it's
the python-iterators list.  It spun off of a thread where Guido was
wondering whether one of the new ways to spell "iterate over a file"
should return lines without trailing \n, so that e.g.

    for line in sys.stdin:
        print line

wasn't a surprise.  I opined it would be better to make all ways of
iterating a file do the same thing, but change print instead.  We both
agreed that couldn't happen.  But then I couldn't find any code it would
break, only code of the form

    print line,

where the "," was trying to suppress the extra newline, and that would
continue to work the same way even if print were changed.  The notion
that legions of people are using

    print line

as an obscure way to get double-spacing is taking me by surprise.
Nobody on the iterators list had this objection.
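For reference, the idiom in question under the current rules (nothing
below is proposed behaviour; /etc/passwd is just the thread's running
example):

    for line in open("/etc/passwd").readlines():
        print line      # double-spaced: `line` already ends in '\n'

    for line in open("/etc/passwd").readlines():
        print line,     # the comma suppresses print's newline; since the
                        # data ends in '\n', softspace adds no stray space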
win-some-lose-some-lose-some-lose-some-lose-some-ly y'rs - tim From mal at lemburg.com Tue Jun 12 09:35:08 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Tue, 12 Jun 2001 09:35:08 +0200 Subject: [Python-Dev] Adding Asian codecs to the core References: <20010611174230.0625E99C8D@waltz.rahul.net> Message-ID: <3B25C62C.969B40B3@lemburg.com> Aahz Maruch wrote: > > M.-A. Lemburg wrote: > > > > Tamito KAJIYAMA recently announced that he changed the licenses > > on his Japanese codecs from GPL to a BSD variant. This is great > > news since this would allow adding the codecs to the Python core > > which would certainly attract more users to Python in Asia. > > > > The codecs are 280kB when compressed as .tar.gz file. > > +0 > > I like the idea, am uncomfortable with that amount of space. Tamito corrected me about the size (his file includes the .pyc byte code files): the correct size for the sources is 143kB -- almost half of what I initially wrote. If that should still be too much, there are probably some ways to further compress the size of the mapping tables which could be investigated. PS: Tamito is very thrilled about getting his codecs into the core and I am quite certain that he is also prepared to maintain them (I have put him on CC). -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From tim at digicool.com Tue Jun 12 09:37:55 2001 From: tim at digicool.com (Tim Peters) Date: Tue, 12 Jun 2001 03:37:55 -0400 Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/Include longobject.h,2.19,2.20 In-Reply-To: <3B25C116.3E65A32D@lemburg.com> Message-ID: [M.-A. Lemburg] > I have tried to compile longobject.c/h on a HP-UX box and am getting > warnings about MIN/MAX being redefined. Perhaps you should add > an #undef for these before the #define ?! I changed nothing relevant here. Are you certain this is a new problem? The MIN/MAX macros have been in longobject.c for a long time, and I didn't touch them. In any case, I'm not inclined to fiddle things on a box where I can't see a problem so can't know whether I'm fixing it or just creating new problems. If you can figure out why it's happening on that box, and it's a legit problem there, feel free to fix it. From SBrunning at trisystems.co.uk Tue Jun 12 10:25:19 2001 From: SBrunning at trisystems.co.uk (Simon Brunning) Date: Tue, 12 Jun 2001 09:25:19 +0100 Subject: [Python-Dev] RE: PEP 259: Omit printing newline after newline Message-ID: <31575A892FF6D1118F5800600846864D78BD25@intrepid> > From: Guido van Rossum [SMTP:guido at digicool.com] > In order to avoid having to add yet another magic variable to file > objects, I propose to give the existing 'softspace' variable an > extra meaning: a negative value will mean "the last data written > ended in a newline so no space *or* newline is required." Better another magic variable than a magic value for an old one, I think. Cheers, Simon Brunning TriSystems Ltd. sbrunning at trisystems.co.uk ----------------------------------------------------------------------- The information in this email is confidential and may be legally privileged. It is intended solely for the addressee. Access to this email by anyone else is unauthorised. If you are not the intended recipient, any disclosure, copying, distribution, or any action taken or omitted to be taken in reliance on it, is prohibited and may be unlawful. TriSystems Ltd. 
cannot accept liability for statements made which are clearly the senders own. From thomas at xs4all.net Tue Jun 12 10:33:30 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Tue, 12 Jun 2001 10:33:30 +0200 Subject: [Python-Dev] PEP 259: Omit printing newline after newline In-Reply-To: ; from tim.one@home.com on Tue, Jun 12, 2001 at 03:29:02AM -0400 References: <004c01c0f307$fb0540c0$4ffa42d5@hagrid> Message-ID: <20010612103330.D690@xs4all.nl> On Tue, Jun 12, 2001 at 03:29:02AM -0400, Tim Peters wrote: > [/F] > > when was this discussed on python-dev? > It wasn't -- it actually came up on one of the SourceForge mailing lists ... > I'm guessing it's the python-iterators list. I'm guessing the same thing, because I *did* see the proposal somewhere. I recall thinking 'that might work' but not much else, anyway. > The notion that legions of people are using > print line > as an obscure way to get double-spacing is taking me by surprise. Bah, humbug! (And you can quote me on that.) Backward compatibility is not an issue -- that's why we have future-imports and warning mechanisms. Import smart-print from future to get the new behaviour, and warn whenever print *would* *have* printed one newline less otherwise. Regardless, I'm -1 on this change. Not because of backward compatibility problem, but because of what GregE said. Let's not make print even more magically unpredictably confusing than it already is, with comma's that do something magical, softspace to control that magic, and shifting the print operator to the right :-) Why can't we use for line in file: print line, to print all lines in a file ? Softspace doesn't seem to add a space (though I had to write a testcase to make sure ;) and 'explicit is better than implicit'. I'd also prefer special syntax to control the softspace behaviour, like say: print "spam:", "ham" : "and" : "eggs" to print 'spamandeggs' without a space inbetween. Too late for that, I 'spose :) -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From martin at loewis.home.cs.tu-berlin.de Tue Jun 12 11:42:52 2001 From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Tue, 12 Jun 2001 11:42:52 +0200 Subject: [Python-Dev] Adding .decode() method to Unicode In-Reply-To: "mal@lemburg.com"'s message of Tue, 12 Jun 2001 09:09:05 +0200 Message-ID: <200106120942.f5C9gqT02418@mira.informatik.hu-berlin.de> > str.encode() > str.decode() > uni.encode() > #uni.decode() # still missing It's not missing. str.decode and uni.encode go through a single codec; that's easy. str.encode is somewhat more confusing, because it really is unicode(str).encode. Now, you are not proposing that uni.decode is str(uni).decode, are you? If not that, what else would it mean? And if it means something else, it is clearly not symmetric to str.encode, so it is not "missing". > One very useful application for this method is XML unescaping > which turns numeric XML entities into Unicode chars. Ok. Please show me how that would work. More precisely, please write a PEP describing the rationale for this feature, including use case examples and precise semantics of the proposed addition. > The key argument for these interfaces is that they provide > an extensible transformation mechanism for string and binary > data. That is too general for me to understand; I need to see detailed examples that solve real-world problems. Regards, Martin P.S. 
I don't think that unescaping XML characters entities into Unicode characters is a useful application in itself. This is normally done by the XML parser, which not only has to deal with character entities, but also with general entities and a lot of other markup. Very few people write XML parsers, and they are using the string methods and the sre module successfully (if the parser is written in Python - a C parser would do the unescaping before even passing the text to Python). From thomas at xs4all.net Tue Jun 12 12:02:03 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Tue, 12 Jun 2001 12:02:03 +0200 Subject: [Python-Dev] PEP 259: Omit printing newline after newline In-Reply-To: <20010612103330.D690@xs4all.nl>; from thomas@xs4all.net on Tue, Jun 12, 2001 at 10:33:30AM +0200 References: <004c01c0f307$fb0540c0$4ffa42d5@hagrid> <20010612103330.D690@xs4all.nl> Message-ID: <20010612120203.E690@xs4all.nl> On Tue, Jun 12, 2001 at 10:33:30AM +0200, Thomas Wouters wrote: > print "spam:", "ham" : "and" : "eggs" > to print 'spamandeggs' without a space inbetween. Err. I meant "hamandeggs" with no space inbetween. -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From mal at lemburg.com Tue Jun 12 12:13:21 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Tue, 12 Jun 2001 12:13:21 +0200 Subject: [Python-Dev] Adding .decode() method to Unicode References: <200106120942.f5C9gqT02418@mira.informatik.hu-berlin.de> Message-ID: <3B25EB41.807C2C51@lemburg.com> "Martin v. Loewis" wrote: > > > str.encode() > > str.decode() > > uni.encode() > > #uni.decode() # still missing > > It's not missing. str.decode and uni.encode go through a single codec; > that's easy. str.encode is somewhat more confusing, because it really > is unicode(str).encode. Now, you are not proposing that uni.decode is > str(uni).decode, are you? No. uni.decode() will (just like the other methods) directly interface to the codecs decoder -- there is no magic conversion involved. It is meant to be used by Unicode-Unicode codecs > If not that, what else would it mean? And if it means something else, > it is clearly not symmetric to str.encode, so it is not "missing". It is in the sense that strings support this method and Unicode currently doesn't. > > One very useful application for this method is XML unescaping > > which turns numeric XML entities into Unicode chars. > > Ok. Please show me how that would work. More precisely, please write a > PEP describing the rationale for this feature, including use case > examples and precise semantics of the proposed addition. There's no need for a PEP. This addition is much too simple to require a PEP on its own. As for use cases: I have already given a whole bunch of them (Unicode compression, normalization, escaping in various ways). Codecs are in no way constrained to only interface between strings and Unicode. There are many other possibilities for their usage out there. Just look at the latest checkins for a bunch of string-string codecs for examples of codecs which solve common real-life problems and do not interface to Unicode. > > The key argument for these interfaces is that they provide > > an extensible transformation mechanism for string and binary > > data. > > That is too general for me to understand; I need to see detailed > examples that solve real-world problems. > > Regards, > Martin > > P.S. I don't think that unescaping XML characters entities into > Unicode characters is a useful application in itself. 
This is normally > done by the XML parser, which not only has to deal with character > entities, but also with general entities and a lot of other markup. > Very few people write XML parsers, and they are using the string > methods and the sre module successfully (if the parser is written in > Python - a C parser would do the unescaping before even passing the > text to Python). True, but not all XML text out there is meant for XML parsers to read ;-). Preprocessing of e.g. XML text in Python is a rather common thing to do and this is what the direct codec access methods are meant for. -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From fredrik at pythonware.com Tue Jun 12 12:46:36 2001 From: fredrik at pythonware.com (Fredrik Lundh) Date: Tue, 12 Jun 2001 12:46:36 +0200 Subject: [Python-Dev] Adding .decode() method to Unicode References: <200106120942.f5C9gqT02418@mira.informatik.hu-berlin.de> <3B25EB41.807C2C51@lemburg.com> Message-ID: <00aa01c0f32c$f4a4b740$0900a8c0@spiff> mal wrote: > > Ok. Please show me how that would work. More precisely, please write a > > PEP describing the rationale for this feature, including use case > > examples and precise semantics of the proposed addition. > > There's no need for a PEP. This addition is much too simple > to require a PEP on its own. we'd been better off if you'd written a PEP before you started adding decode and encode stuff. what's currently implemented is ugly enough; adding more warts won't make it any prettier. -1 on anything except a PEP that covers *all* aspects of encode/decode (including things that are already implemented) From fredrik at pythonware.com Tue Jun 12 12:47:49 2001 From: fredrik at pythonware.com (Fredrik Lundh) Date: Tue, 12 Jun 2001 12:47:49 +0200 Subject: [Python-Dev] PEP 259: Omit printing newline after newline References: <004c01c0f307$fb0540c0$4ffa42d5@hagrid> <20010612103330.D690@xs4all.nl> <20010612120203.E690@xs4all.nl> Message-ID: <00ba01c0f32d$208d4160$0900a8c0@spiff> Thomas Wouters wrote: > > print "spam:", "ham" : "and" : "eggs" > > > to print 'spamandeggs' without a space inbetween. > > Err. I meant "hamandeggs" with no space inbetween. and "+" (or plain whitespace) instead of ":", right? From fredrik at pythonware.com Tue Jun 12 12:55:27 2001 From: fredrik at pythonware.com (Fredrik Lundh) Date: Tue, 12 Jun 2001 12:55:27 +0200 Subject: [Python-Dev] RE: PEP 259: Omit printing newline after newline References: <31575A892FF6D1118F5800600846864D78BD25@intrepid> Message-ID: <00c301c0f32e$31cd7ed0$0900a8c0@spiff> simon wrote: > > > In order to avoid having to add yet another magic variable to file > > objects, I propose to give the existing 'softspace' variable an > > extra meaning: a negative value will mean "the last data written > > ended in a newline so no space *or* newline is required." > > Better another magic variable than a magic value for an old one, I think. many file-like C types (e.g. cStringIO) already have special code to deal with a softspace integer attribute. From mal at lemburg.com Tue Jun 12 12:57:32 2001 From: mal at lemburg.com (M.-A. 
Lemburg) Date: Tue, 12 Jun 2001 12:57:32 +0200 Subject: [Python-Dev] Adding .decode() method to Unicode References: <200106120942.f5C9gqT02418@mira.informatik.hu-berlin.de> <3B25EB41.807C2C51@lemburg.com> <00aa01c0f32c$f4a4b740$0900a8c0@spiff> Message-ID: <3B25F59C.9AAF604A@lemburg.com> Fredrik Lundh wrote: > > mal wrote: > > > Ok. Please show me how that would work. More precisely, please write a > > > PEP describing the rationale for this feature, including use case > > > examples and precise semantics of the proposed addition. > > > > There's no need for a PEP. This addition is much too simple > > to require a PEP on its own. > > we'd been better off if you'd written a PEP before you started > adding decode and encode stuff. what's currently implemented > is ugly enough; adding more warts won't make it any prettier. Could you please be more specific about what is "ugly" in the current implementation ? The .encode/.decode methods are a direct interface to the codecs encoder and decoder APIs. I can't find anything ugly about this in general except maybe some of the constraints which were originally put into these interface on the grounds of using them for string/Unicode conversions -- I have already removed most of these and would like to clean this up completely before 2.2 gets out. > -1 on anything except a PEP that covers *all* aspects of > encode/decode (including things that are already implemented) Gee, Guido starts breaking code and nobody objects; I try to clean up some left-overs in the Unicode implementation and people start huge discussions about it. Something is backwards here... -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From martin at loewis.home.cs.tu-berlin.de Tue Jun 12 13:00:40 2001 From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Tue, 12 Jun 2001 13:00:40 +0200 Subject: [Python-Dev] Adding .decode() method to Unicode In-Reply-To: <3B25EB41.807C2C51@lemburg.com> (mal@lemburg.com) References: <200106120942.f5C9gqT02418@mira.informatik.hu-berlin.de> <3B25EB41.807C2C51@lemburg.com> Message-ID: <200106121100.f5CB0em03009@mira.informatik.hu-berlin.de> > > > str.encode() > > > str.decode() > > > uni.encode() > > > #uni.decode() # still missing > > > > It's not missing. str.decode and uni.encode go through a single codec; > > that's easy. str.encode is somewhat more confusing, because it really > > is unicode(str).encode. Now, you are not proposing that uni.decode is > > str(uni).decode, are you? > > No. uni.decode() will (just like the other methods) directly > interface to the codecs decoder -- there is no magic conversion > involved. It is meant to be used by Unicode-Unicode codecs When invoking "Hallo".encode("utf-8"), two conversions are executed: first the default decoding into Unicode, then the UTF-8 encoding. Of course, that is not the intended use (but then, is the intended use documented anywhere?): instead, people should write "Hallo".encode("base64") instead. This is an example I can understand, although I'm not sure why it is inherently better to write this instead of writing base64.encodestring("Hallo"). > > If not that, what else would it mean? And if it means something else, > > it is clearly not symmetric to str.encode, so it is not "missing". > > It is in the sense that strings support this method and Unicode > currently doesn't. 
The rationale for string.encode is weak: it argues that string->string conversions are frequent enough to justify this API, even though these conversions have nothing to do with coded character sets. So far, I can see *no* rationale for unicode.decode. > There's no need for a PEP. This addition is much too simple > to require a PEP on its own. PEP 1 says: # We intend PEPs to be the primary mechanisms for proposing new # features, for collecting community input on an issue, and for # documenting the design decisions that have gone into Python. The # PEP author is responsible for building consensus within the # community and documenting dissenting opinions. So we have a proposal for a new feature, and we have dissenting opinions. Who are you to decide that this additions is too simple to require a PEP on its own? > As for use cases: I have already given a whole bunch of them > (Unicode compression, normalization, escaping in various ways). I was asking for specific examples: Names of specific codecs that you want to implement, and application code fragments using these specific codecs. I don't know how to use Unicode compression if I had such this proposed feature, for example. I know what XML escaping is, and I cannot see how this feature would help. > True, but not all XML text out there is meant for XML parsers to > read ;-). Preprocessing of e.g. XML text in Python is a rather common > thing to do and this is what the direct codec access methods are > meant for. Can you give an example of an application which processes XML without a parser, but with converting character entities (preferably open-source, so I can study its code)? I wonder whether they get CDATA sections right... MAL, I really mean that: Please don't make claims that something is common or useful without giving an *exact* example. Regards, Martin P.S. This insistence on adding Unicode and string methods makes it appear as if the author of the codecs module now thinks that the API of it sucks. From thomas at xs4all.net Tue Jun 12 13:16:05 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Tue, 12 Jun 2001 13:16:05 +0200 Subject: [Python-Dev] PEP 259: Omit printing newline after newline In-Reply-To: <00ba01c0f32d$208d4160$0900a8c0@spiff> References: <004c01c0f307$fb0540c0$4ffa42d5@hagrid> <20010612103330.D690@xs4all.nl> <20010612120203.E690@xs4all.nl> <00ba01c0f32d$208d4160$0900a8c0@spiff> Message-ID: <20010612131605.Q22849@xs4all.nl> On Tue, Jun 12, 2001 at 12:47:49PM +0200, Fredrik Lundh wrote: > Thomas Wouters wrote: > > > print "spam:", "ham" : "and" : "eggs" > > > > > to print 'spamandeggs' without a space inbetween. > > Err. I meant "hamandeggs" with no space inbetween. > and "+" (or plain whitespace) instead of ":", right? Not really. That would only work for string-types. Print auto-converts, remember ? At least the ':' is unambiguous. -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From mal at lemburg.com Tue Jun 12 13:42:31 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Tue, 12 Jun 2001 13:42:31 +0200 Subject: [Python-Dev] Adding .decode() method to Unicode References: <200106120942.f5C9gqT02418@mira.informatik.hu-berlin.de> <3B25EB41.807C2C51@lemburg.com> <200106121100.f5CB0em03009@mira.informatik.hu-berlin.de> Message-ID: <3B260027.7DD33246@lemburg.com> "Martin v. Loewis" wrote: > > > > > str.encode() > > > > str.decode() > > > > uni.encode() > > > > #uni.decode() # still missing > > > > > > It's not missing. 
str.decode and uni.encode go through a single codec; > > > that's easy. str.encode is somewhat more confusing, because it really > > > is unicode(str).encode. Now, you are not proposing that uni.decode is > > > str(uni).decode, are you? > > > > No. uni.decode() will (just like the other methods) directly > > interface to the codecs decoder -- there is no magic conversion > > involved. It is meant to be used by Unicode-Unicode codecs > > When invoking "Hallo".encode("utf-8"), two conversions are executed: > first the default decoding into Unicode, then the UTF-8 encoding. Of > course, that is not the intended use (but then, is the intended use > documented anywhere?): instead, people should write > "Hallo".encode("base64") instead. This is an example I can understand, > although I'm not sure why it is inherently better to write this > instead of writing base64.encodestring("Hallo"). Please note that the conversion from string to Unicode is done by the codec, not the .encode() interface. > > > If not that, what else would it mean? And if it means something else, > > > it is clearly not symmetric to str.encode, so it is not "missing". > > > > It is in the sense that strings support this method and Unicode > > currently doesn't. > > The rationale for string.encode is weak: it argues that string->string > conversions are frequent enough to justify this API, even though these > conversions have nothing to do with coded character sets. You still don't get it: codecs can be used for much more than just character set conversion ! > So far, I can see *no* rationale for unicode.decode. > > > There's no need for a PEP. This addition is much too simple > > to require a PEP on its own. > > PEP 1 says: > > # We intend PEPs to be the primary mechanisms for proposing new > # features, for collecting community input on an issue, and for > # documenting the design decisions that have gone into Python. The > # PEP author is responsible for building consensus within the > # community and documenting dissenting opinions. > > So we have a proposal for a new feature, and we have dissenting > opinions. Who are you to decide that this additions is too simple to > require a PEP on its own? So you want a PEP for each and every small addition to in the core ?! (I am not talking about features which might break code !) > > As for use cases: I have already given a whole bunch of them > > (Unicode compression, normalization, escaping in various ways). > > I was asking for specific examples: Names of specific codecs that you > want to implement, and application code fragments using these specific > codecs. I don't know how to use Unicode compression if I had such this > proposed feature, for example. I know what XML escaping is, and I > cannot see how this feature would help. I think I have given enough examples in this thread already. See below for some more. > > True, but not all XML text out there is meant for XML parsers to > > read ;-). Preprocessing of e.g. XML text in Python is a rather common > > thing to do and this is what the direct codec access methods are > > meant for. > > Can you give an example of an application which processes XML without > a parser, but with converting character entities (preferably > open-source, so I can study its code)? I wonder whether they get CDATA > sections right... MAL, I really mean that: Please don't make claims > that something is common or useful without giving an *exact* example. 
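For concreteness, the string-to-string codecs mentioned above can be
exercised directly on 2.2a-era builds; a minimal sketch, assuming the
codec alias 'base64' resolves to the new base64_codec:

    # 8-bit string -> 8-bit string, through the codec machinery:
    data = "Hello"
    b64 = data.encode("base64")         # 'SGVsbG8=\n'
    assert b64.decode("base64") == data

    # the same transformation without the codec registry:
    import base64
    assert base64.decodestring(b64) == data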
Yes, I am using this feature in real code and no, I can't show it to
you because it's closed source.

XML is only one example where this would be useful, HTML is another
text format which would benefit from it, URL encoding is yet another
application. You basically find these applications in all situations
where some form of escaping is needed.

What I am trying to do here is simplify codec access and usage
for the casual user. .encode() and .decode() are very intuitive
ways to deal with data transformation, IMHO.

> Regards,
> Martin
> 
> P.S. This insistence on adding Unicode and string methods makes it
> appear as if the author of the codecs module now thinks that the API
> of it sucks.

No comment.

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/

From barry at digicool.com  Tue Jun 12 16:22:26 2001
From: barry at digicool.com (Barry A. Warsaw)
Date: Tue, 12 Jun 2001 10:22:26 -0400
Subject: [Python-Dev] Adding .decode() method to Unicode
References: <200106120942.f5C9gqT02418@mira.informatik.hu-berlin.de>
	<3B25EB41.807C2C51@lemburg.com>
Message-ID: <15142.9634.842402.241225@anthem.wooz.org>

>>>>> "M" == M  writes:

    M> Codecs are in no way constrained to only interface between
    M> strings and Unicode. There are many other possibilities for
    M> their usage out there. Just look at the latest checkins for a
    M> bunch of string-string codecs for examples of codecs which
    M> solve common real-life problems and do not interface to
    M> Unicode.

Having just followed this thread tangentially, I do have to say it
seems quite cool to be able to do something like the following in
Python 2.2:

    >>> s = msg['from']
    >>> parts = s.split('?')
    >>> if parts[2].lower() == 'q':
    ...     name = parts[3].decode('quopri')
    ... elif parts[2].lower() == 'b':
    ...     name = parts[3].decode('base64')
    ...

-Barry

From fredrik at pythonware.com  Tue Jun 12 16:45:16 2001
From: fredrik at pythonware.com (Fredrik Lundh)
Date: Tue, 12 Jun 2001 16:45:16 +0200
Subject: [Python-Dev] Adding .decode() method to Unicode
References: <200106120942.f5C9gqT02418@mira.informatik.hu-berlin.de> <3B25EB41.807C2C51@lemburg.com> <15142.9634.842402.241225@anthem.wooz.org>
Message-ID: <004b01c0f34e$4d3e96c0$4ffa42d5@hagrid>

barry wrote:

> Having just followed this thread tangentially, I do have to say it
> seems quite cool to be able to do something like the following in
> Python 2.2:
> 
> >>> s = msg['from']
> >>> parts = s.split('?')
> >>> if parts[2].lower() == 'q':
> ...     name = parts[3].decode('quopri')
> ... elif parts[2].lower() == 'b':
> ...     name = parts[3].decode('base64')

uhuh?  and how exactly is this cooler than being able to do
something like the following:

    import quopri, base64

    s = msg['from']
    parts = s.split('?')
    if parts[2].lower() == 'q':
        name = quopri.decodestring(parts[3])
    elif parts[2].lower() == 'b':
        name = base64.decodestring(parts[3])

(going through the codec registry is slower, and imports more
modules, but what's so cool with that?)

From barry at digicool.com  Tue Jun 12 16:50:01 2001
From: barry at digicool.com (Barry A.
Warsaw) Date: Tue, 12 Jun 2001 10:50:01 -0400 Subject: [Python-Dev] Adding .decode() method to Unicode References: <200106120942.f5C9gqT02418@mira.informatik.hu-berlin.de> <3B25EB41.807C2C51@lemburg.com> <15142.9634.842402.241225@anthem.wooz.org> <004b01c0f34e$4d3e96c0$4ffa42d5@hagrid> Message-ID: <15142.11289.16053.424966@anthem.wooz.org> >>>>> "FL" == Fredrik Lundh writes: FL> uhuh? and how exactly is this cooler than being able to do FL> something like the following: | import quopri, base64 | s = msg['from'] | parts = s.split('?') | if parts[2].lower() == 'q': | name = quopri.decodestring(parts[3]) | elif parts[2].lower() == 'b': | name = base64.decodestring(parts[3]) FL> (going through the codec registry is slower, and imports more FL> modules, but what's so cool with that?) -------------------- snip snip -------------------- Python 2.2a0 (#4, Jun 6 2001, 13:03:36) [GCC egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)] on linux2 Type "copyright", "credits" or "license" for more information. >>> import quopri >>> quopri.decodestring Traceback (most recent call last): File "", line 1, in ? AttributeError: 'quopri' module has no attribute 'decodestring' >>> quopri.encodestring Traceback (most recent call last): File "", line 1, in ? AttributeError: 'quopri' module has no attribute 'encodestring' -------------------- snip snip -------------------- Much cooler :) Okay, okay, so we /could/ add encodestring/decodestring to quopri.py, which isn't a bad idea. But it seems to me that the s.encode() s.decode() API is nicely universal for any supported encoding. but-what-do-i-know?-ly y'rs, -Barry From skip at pobox.com Tue Jun 12 17:32:11 2001 From: skip at pobox.com (Skip Montanaro) Date: Tue, 12 Jun 2001 10:32:11 -0500 Subject: [Python-Dev] Re: metaclasses -- aka Don Beaudry hook/hack In-Reply-To: References: Message-ID: <15142.13819.477491.993419@beluga.mojam.com> James> Before I head too deeply into Zope dependencies, I would be James> interested in knowing whether or not "type(MyClass) == James> types.ClassType" and "isinstance(myInstance,MyClass)" work for James> classes derived from ExtensionClass. Straight from the horse's mouth: >>> type(gtk.GtkButton) >>> type(gtk.GtkButton) == types.ClassType 0 >>> isinstance(gtk.GtkButton(), gtk.GtkButton) 1 James> (And if so, why do these work for C extension classes using the James> Don Beaudry hook but not for Python classes using the same hook?) You'll have to ask someone with more subject knowledge. (Don would probably be a good start. ;-) I've cc'd python-dev because the experts in this area are all there. -- Skip Montanaro (skip at pobox.com) (847)971-7098 From skip at pobox.com Tue Jun 12 17:53:24 2001 From: skip at pobox.com (Skip Montanaro) Date: Tue, 12 Jun 2001 10:53:24 -0500 Subject: [Python-Dev] PEP 259: Omit printing newline after newline In-Reply-To: References: <004c01c0f307$fb0540c0$4ffa42d5@hagrid> Message-ID: <15142.15092.57490.275201@beluga.mojam.com> Tim> The notion that legions of people are using Tim> print line Tim> as an obscure way to get double-spacing is taking me by surprise. Tim> Nobody on the iterators list had this objection. I suspect that most CGI scripts that didn't use any abstraction for HTTP responses suffer from this potential problem. I've been using one abstraction or another for quite awhile now, but I still have a few CGI scripts laying around that still use print to emit headers and bodies of HTTP responses. Skip From barry at digicool.com Tue Jun 12 18:06:53 2001 From: barry at digicool.com (Barry A. 
Warsaw) Date: Tue, 12 Jun 2001 12:06:53 -0400 Subject: [Python-Dev] PEP 259: Omit printing newline after newline References: <004c01c0f307$fb0540c0$4ffa42d5@hagrid> <15142.15092.57490.275201@beluga.mojam.com> Message-ID: <15142.15901.223641.151562@anthem.wooz.org> >>>>> "SM" == Skip Montanaro writes: SM> I suspect that most CGI scripts that didn't use any SM> abstraction for HTTP responses suffer from this potential SM> problem. I've been using one abstraction or another for quite SM> awhile now, but I still have a few CGI scripts laying around SM> that still use print to emit headers and bodies of HTTP SM> responses. Same here. From paulp at ActiveState.com Tue Jun 12 19:22:31 2001 From: paulp at ActiveState.com (Paul Prescod) Date: Tue, 12 Jun 2001 10:22:31 -0700 Subject: [Python-Dev] Adding .decode() method to Unicode References: <200106120942.f5C9gqT02418@mira.informatik.hu-berlin.de> <3B25EB41.807C2C51@lemburg.com> <15142.9634.842402.241225@anthem.wooz.org> Message-ID: <3B264FD7.86ACB034@ActiveState.com> "Barry A. Warsaw" wrote: > >... > > Having just followed this thread tangentially, I do have to say it > seems quite cool to be able to do something like the following in > Python 2.2: > > >>> s = msg['from'] > >>> parts = s.split('?') > >>> if parts[2].lower() == 'q': > ... name = parts[3].decode('quopri') > ... elif parts[2].lower() == 'b': > ... name = parts[3].decode('base64') > ... I think that the central point is that if code like the above is useful and supported then it needs to be the same for Unicode strings as for 8-bit strings. If the code above is NOT useful and should NOT be supported then we need to undo it before 2.2 ships. This unicode.decode argument is just a proxy for the real argument about the above. I don't feel strongly one way or another about this (ab?)use of the codecs concept, myself, but I do feel strongly that Unicode strings should behave as much as possible like 8-bit strings. -- Take a recipe. Leave a recipe. Python Cookbook! http://www.ActiveState.com/pythoncookbook From paulp at ActiveState.com Tue Jun 12 19:31:54 2001 From: paulp at ActiveState.com (Paul Prescod) Date: Tue, 12 Jun 2001 10:31:54 -0700 Subject: [Python-Dev] Adding .decode() method to Unicode References: <200106120942.f5C9gqT02418@mira.informatik.hu-berlin.de><3B25EB41.807C2C51@lemburg.com> <15142.9634.842402.241225@anthem.wooz.org> <004b01c0f34e$4d3e96c0$4ffa42d5@hagrid> Message-ID: <3B26520A.C579D00C@ActiveState.com> Fredrik Lundh wrote: > >... > > uhuh? and how exactly is this cooler than being able to do > something like the following: > > import quopri, base64 >... > > (going through the codec registry is slower, and imports more > modules, but what's so cool with that?) One argument in favor is that the base64 and quopri modules are not standardized today. In fact, Python has a huge problem with standardization of access paradigms in the standard library. We get the best standardization (i.e. of the "file interface") when we force module authors to conform to a standard in order to get some "extra feature" of the standard library. A counter argument is that the conflation of the concept of Unicode encoding/decoding and other forms of encoding/decoding could be confusing. MAL would not have to keep pointing out that "codecs are for more than Unicode encoding/decoding" if it was obvious. -- Take a recipe. Leave a recipe. Python Cookbook! http://www.ActiveState.com/pythoncookbook From barry at digicool.com Tue Jun 12 20:24:25 2001 From: barry at digicool.com (Barry A. 
Warsaw) Date: Tue, 12 Jun 2001 14:24:25 -0400 Subject: [Python-Dev] Adding .decode() method to Unicode References: <200106120942.f5C9gqT02418@mira.informatik.hu-berlin.de> <3B25EB41.807C2C51@lemburg.com> <15142.9634.842402.241225@anthem.wooz.org> <3B264FD7.86ACB034@ActiveState.com> Message-ID: <15142.24153.921774.610559@anthem.wooz.org> >>>>> "PP" == Paul Prescod writes: PP> I don't feel strongly one way or another about this (ab?)use PP> of the codecs concept, myself, but I do feel strongly that PP> Unicode strings should behave as much as possible like 8-bit PP> strings. I'd agree with both statements. time-to-add-{encode,decode}string()-to-quopri-ly y'rs, -Barry From martin at loewis.home.cs.tu-berlin.de Tue Jun 12 20:00:19 2001 From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Tue, 12 Jun 2001 20:00:19 +0200 Subject: [Python-Dev] Adding .decode() method to Unicode In-Reply-To: <3B260027.7DD33246@lemburg.com> (mal@lemburg.com) References: <200106120942.f5C9gqT02418@mira.informatik.hu-berlin.de> <3B25EB41.807C2C51@lemburg.com> <200106121100.f5CB0em03009@mira.informatik.hu-berlin.de> <3B260027.7DD33246@lemburg.com> Message-ID: <200106121800.f5CI0Jw00946@mira.informatik.hu-berlin.de> > > So we have a proposal for a new feature, and we have dissenting > > opinions. Who are you to decide that this additions is too simple to > > require a PEP on its own? > > So you want a PEP for each and every small addition to in the > core ?! (I am not talking about features which might break code !) No, additions that find immediate consent and come with complete patches (including documentation and test cases) don't need this overhead. Features that find resistance should go through the full process. > > I was asking for specific examples: Names of specific codecs that you > > want to implement, and application code fragments using these specific > > codecs. I don't know how to use Unicode compression if I had such this > > proposed feature, for example. I know what XML escaping is, and I > > cannot see how this feature would help. > > I think I have given enough examples in this thread already. See > below for some more. I haven't seen a single example involving actual Python code. > > > True, but not all XML text out there is meant for XML parsers to > > > read ;-). Preprocessing of e.g. XML text in Python is a rather common > > > thing to do and this is what the direct codec access methods are > > > meant for. > > > > Can you give an example of an application [...] > > Yes, I am using these feature in real code and no, I can't show it to > you because it's closed source. Not very convincing... If this is "a rather common thing to do", it shouldn't be hard to find examples in other people's code, shouldn't it? > XML is only one example where this would be useful, HTML is another > text format which would benefit from it, URL encoding is yet another > application. You basically find these applications in all situations > where some form of escaping is needed. These are all not specific examples. I'm still looking for a specific application that might use this feature, and specific codec names and implementations. Regards, Martin From martin at loewis.home.cs.tu-berlin.de Tue Jun 12 20:08:31 2001 From: martin at loewis.home.cs.tu-berlin.de (Martin v. 
Loewis) Date: Tue, 12 Jun 2001 20:08:31 +0200 Subject: [Python-Dev] Adding .decode() method to Unicode In-Reply-To: <15142.9634.842402.241225@anthem.wooz.org> (barry@digicool.com) References: <200106120942.f5C9gqT02418@mira.informatik.hu-berlin.de> <3B25EB41.807C2C51@lemburg.com> <15142.9634.842402.241225@anthem.wooz.org> Message-ID: <200106121808.f5CI8VO00950@mira.informatik.hu-berlin.de> > Having just followed this thread tangentially, I do have to say it > seems quite cool to be able to do something like the following in > Python 2.2: > > >>> s = msg['from'] > >>> parts = s.split('?') > >>> if parts[2].lower() == 'q': > ... name = parts[3].decode('quopri') > ... elif parts[2].lower() == 'b': > ... name = parts[3].decode('base64') > ... What is the type of parts[3] here? If it is a plain string, it is already possible: >>> 'SGVsbG8=\n'.decode("base64") 'Hello' I doubt you'd ever have a Unicode string that represents a base64-encoded byte string, and if you had, .decode would probably do the wrong thing: >>> import codecs >>> enc,dec,_,_ = codecs.lookup("base64") >>> dec(u'SGVsbG8=\n') ('Hello', 9) Note that this returns a byte string, not a Unicode string. Regards, Martin From martin at loewis.home.cs.tu-berlin.de Tue Jun 12 20:18:45 2001 From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Tue, 12 Jun 2001 20:18:45 +0200 Subject: [Python-Dev] Adding .decode() method to Unicode In-Reply-To: <3B264FD7.86ACB034@ActiveState.com> (message from Paul Prescod on Tue, 12 Jun 2001 10:22:31 -0700) References: <200106120942.f5C9gqT02418@mira.informatik.hu-berlin.de> <3B25EB41.807C2C51@lemburg.com> <15142.9634.842402.241225@anthem.wooz.org> <3B264FD7.86ACB034@ActiveState.com> Message-ID: <200106121818.f5CIIjj00983@mira.informatik.hu-berlin.de> > > Having just followed this thread tangentially, I do have to say it > > seems quite cool to be able to do something like the following in > > Python 2.2: > > > > >>> s = msg['from'] > > >>> parts = s.split('?') > > >>> if parts[2].lower() == 'q': > > ... name = parts[3].decode('quopri') > > ... elif parts[2].lower() == 'b': > > ... name = parts[3].decode('base64') > > ... > > I think that the central point is that if code like the above is useful > and supported then it needs to be the same for Unicode strings as for > 8-bit strings. Why is that? An encoding, by nature, is something that produces a byte sequence from some input. So you can only decode byte sequences, not character strings. > If the code above is NOT useful and should NOT be supported then we > need to undo it before 2.2 ships. This unicode.decode argument is > just a proxy for the real argument about the above. No, it isn't. The code is useful for byte strings, but not for Unicode strings. > I don't feel strongly one way or another about this (ab?)use of the > codecs concept, myself, but I do feel strongly that Unicode strings > should behave as much as possible like 8-bit strings. Not at all. Byte strings and character strings are as different as are byte strings and lists of DOM child nodes (i.e. the only common thing is that they are sequences). Regards, Martin From barry at digicool.com Tue Jun 12 20:35:10 2001 From: barry at digicool.com (Barry A. 
Warsaw) Date: Tue, 12 Jun 2001 14:35:10 -0400 Subject: [Python-Dev] Adding .decode() method to Unicode References: <200106120942.f5C9gqT02418@mira.informatik.hu-berlin.de> <3B25EB41.807C2C51@lemburg.com> <15142.9634.842402.241225@anthem.wooz.org> <200106121808.f5CI8VO00950@mira.informatik.hu-berlin.de> Message-ID: <15142.24798.941322.762791@anthem.wooz.org> >>>>> "MvL" == Martin v Loewis writes: MvL> What is the type of parts[3] here? If it is a plain string, MvL> it is already possible: >> 'SGVsbG8=\n'.decode("base64") MvL> 'Hello' But only in Python 2.2a0 currently, right? And yes, the type is plain string. MvL> I doubt you'd ever have a Unicode string that represents a MvL> base64-encoded byte string, and if you had, .decode would MvL> probably do the wrong thing: >> import codecs enc,dec,_,_ = codecs.lookup("base64") >> dec(u'SGVsbG8=\n') MvL> ('Hello', 9) MvL> Note that this returns a byte string, not a Unicode string. I trust you on that. ;) I've only played with this tangentially since this thread cropped up. -Barry From paulp at ActiveState.com Tue Jun 12 20:51:25 2001 From: paulp at ActiveState.com (Paul Prescod) Date: Tue, 12 Jun 2001 11:51:25 -0700 Subject: [Python-Dev] Adding .decode() method to Unicode References: <200106120942.f5C9gqT02418@mira.informatik.hu-berlin.de> <3B25EB41.807C2C51@lemburg.com> <15142.9634.842402.241225@anthem.wooz.org> <3B264FD7.86ACB034@ActiveState.com> <200106121818.f5CIIjj00983@mira.informatik.hu-berlin.de> Message-ID: <3B2664AD.B560D685@ActiveState.com> "Martin v. Loewis" wrote: > >... > > Why is that? An encoding, by nature, is something that produces a byte > sequence from some input. So you can only decode byte sequences, not > character strings. According to this logic, it is not logical to "encode" a Unicode string into a base64'd Unicode string or "decode" a Unicode string from a base64'd Unicode string. But I have seen circumstances where one XML document is base64'd into another. In that circumstance, it would be useful to say node.nodeValue.decode("base64"). Let me turn the argument around? What would the *harm* in having 8-bit strings and Unicode strings behave similarly in this manner? >... > Not at all. Byte strings and character strings are as different as are > byte strings and lists of DOM child nodes (i.e. the only common thing > is that they are sequences). 8-bit strings are not purely byte strings. They are also "character strings". That's why they have methods like "capitalize", "isalpha", "lower", "swapcase", "title" and so forth. DOM nodes and byte strings have virtually no methods in common. We could argue angels on the head of a pin until the cows come home but 90% of all Python users think of 8-bit strings as strings of characters. So arguments based on the idea that they are not "really" character strings are wishful thinking. -- Take a recipe. Leave a recipe. Python Cookbook! http://www.ActiveState.com/pythoncookbook From martin at loewis.home.cs.tu-berlin.de Tue Jun 12 22:01:39 2001 From: martin at loewis.home.cs.tu-berlin.de (Martin v. 
Loewis) Date: Tue, 12 Jun 2001 22:01:39 +0200 Subject: [Python-Dev] Adding .decode() method to Unicode In-Reply-To: <15142.24798.941322.762791@anthem.wooz.org> (barry@digicool.com) References: <200106120942.f5C9gqT02418@mira.informatik.hu-berlin.de> <3B25EB41.807C2C51@lemburg.com> <15142.9634.842402.241225@anthem.wooz.org> <200106121808.f5CI8VO00950@mira.informatik.hu-berlin.de> <15142.24798.941322.762791@anthem.wooz.org> Message-ID: <200106122001.f5CK1de01350@mira.informatik.hu-berlin.de> > MvL> What is the type of parts[3] here? If it is a plain string, > MvL> it is already possible: > > >> 'SGVsbG8=\n'.decode("base64") > MvL> 'Hello' > > But only in Python 2.2a0 currently, right? Exactly, since MAL's last patch. If people think that byte strings must behave exactly as Unicode strings, I'd rather prefer to back out this patch instead of adding unicode.decode. Personally, I think the status quo is fine and should not be changed. Regards, Martin From aahz at rahul.net Wed Jun 13 01:48:14 2001 From: aahz at rahul.net (Aahz Maruch) Date: Tue, 12 Jun 2001 16:48:14 -0700 (PDT) Subject: [Python-Dev] Adding Asian codecs to the core In-Reply-To: <3B25C62C.969B40B3@lemburg.com> from "M.-A. Lemburg" at Jun 12, 2001 09:35:08 AM Message-ID: <20010612234815.2C90599C82@waltz.rahul.net> M.-A. Lemburg wrote: > Aahz Maruch wrote: >> M.-A. Lemburg wrote: >>> >>> Tamito KAJIYAMA recently announced that he changed the licenses >>> on his Japanese codecs from GPL to a BSD variant. This is great >>> news since this would allow adding the codecs to the Python core >>> which would certainly attract more users to Python in Asia. >>> >>> The codecs are 280kB when compressed as .tar.gz file. >> >> +0 >> >> I like the idea, am uncomfortable with that amount of space. > > Tamito corrected me about the size (his file includes the .pyc > byte code files): the correct size for the sources is 143kB -- > almost half of what I initially wrote. That makes me +0.5, possibly a bit higher. -- --- Aahz (@pobox.com) Hugs and backrubs -- I break Rule 6 <*> http://www.rahul.net/aahz/ Androgynous poly kinky vanilla queer het Pythonista I don't really mind a person having the last whine, but I do mind someone else having the last self-righteous whine. From greg at cosc.canterbury.ac.nz Wed Jun 13 01:57:35 2001 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Wed, 13 Jun 2001 11:57:35 +1200 (NZST) Subject: [Python-Dev] PEP 259: Omit printing newline after newline In-Reply-To: <20010612103330.D690@xs4all.nl> Message-ID: <200106122357.LAA03316@s454.cosc.canterbury.ac.nz> Thomas Wouters : > I'd also prefer special syntax to control the softspace > behaviour... Too late for that, I 'spose Maybe not. I'd suggest spelling "don't add a newline or a space after this" as: print a, b, c... This could coexist with the current softspace behaviour, and the use of a trailing comma could be deprecated. After a suitable warning period, the softspace flag could then be removed. > print "spam:", "ham" : "and" : "eggs" > to print 'spamandeggs' without a space inbetween. I don't think it's so important to have a special syntax for that, since it can be accomplished in other ways without too much difficulty, e.g. print "%s: %s%s%s" % ("spam", "ham", "and", "eggs")... The main thing I'd like is to get rid of the statefulness of the current behaviour. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. 
| greg at cosc.canterbury.ac.nz +--------------------------------------+ From greg at cosc.canterbury.ac.nz Wed Jun 13 02:02:40 2001 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Wed, 13 Jun 2001 12:02:40 +1200 (NZST) Subject: [Python-Dev] Adding .decode() method to Unicode In-Reply-To: <00aa01c0f32c$f4a4b740$0900a8c0@spiff> Message-ID: <200106130002.MAA03319@s454.cosc.canterbury.ac.nz> > -1 on anything except a PEP that covers *all* aspects of > encode/decode (including things that are already implemented) Particularly, it should clearly explain why we need a completely new and separate namespace mechanism for these codec things, and provide a firm rationale for deciding whether any proposed new form of encoding or decoding should be placed in this namespace or the module namespace. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg at cosc.canterbury.ac.nz +--------------------------------------+ From paulp at ActiveState.com Wed Jun 13 02:32:17 2001 From: paulp at ActiveState.com (Paul Prescod) Date: Tue, 12 Jun 2001 17:32:17 -0700 Subject: [Python-Dev] Adding Asian codecs to the core References: <20010612234815.2C90599C82@waltz.rahul.net> Message-ID: <3B26B491.CA8536BD@ActiveState.com> Aahz Maruch wrote: > >.... > > > > Tamito corrected me about the size (his file includes the .pyc > > byte code files): the correct size for the sources is 143kB -- > > almost half of what I initially wrote. > > That makes me +0.5, possibly a bit higher. We really shouldn't consider the Japanese without Chinese and Korean. And those both seem *larger* than the Japanese. :( What if we add them to CVS and formally maintain them as part of the core but distribute them as a separate download? -- Take a recipe. Leave a recipe. Python Cookbook! http://www.ActiveState.com/pythoncookbook From paulp at ActiveState.com Wed Jun 13 04:25:23 2001 From: paulp at ActiveState.com (Paul Prescod) Date: Tue, 12 Jun 2001 19:25:23 -0700 Subject: [Python-Dev] Pure Python strptime Message-ID: <3B26CF13.2A337AC6@ActiveState.com> Should this strptime implementation be added to the standard library? http://aspn.activestate.com/ASPN/Python/Cookbook/Recipe/56036 -- Take a recipe. Leave a recipe. Python Cookbook! http://www.ActiveState.com/pythoncookbook From paulp at ActiveState.com Wed Jun 13 04:41:53 2001 From: paulp at ActiveState.com (Paul Prescod) Date: Tue, 12 Jun 2001 19:41:53 -0700 Subject: [Python-Dev] Adding .decode() method to Unicode References: <200106130002.MAA03319@s454.cosc.canterbury.ac.nz> Message-ID: <3B26D2F1.8840FB1A@ActiveState.com> Greg Ewing wrote: > > > -1 on anything except a PEP that covers *all* aspects of > > encode/decode (including things that are already implemented) > > Particularly, it should clearly explain why we need a > completely new and separate namespace mechanism for these > codec things, I don't know whether MAL will write the PEP or not but the rationale for a new namespace is trivial. The namespace exists and is maintained by the Internet Assigned Names Association. You can't work with Unicode without working with names from this list: http://www.iana.org/assignments/character-sets MAL is basically exending it to include names from this list: http://www.iana.org/assignments/transfer-encodings and others. 
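(For concreteness, here is the existing registry machinery resolving one
of those IANA names -- a minimal interactive sketch against the Python 2.1
codecs API, nothing new:)

>>> import codecs
>>> enc, dec, reader, writer = codecs.lookup('iso-8859-1')
>>> enc(u'caf\xe9')    # encoder returns (8-bit string, chars consumed)
('caf\xe9', 4)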
> and provide a firm rationale for deciding > whether any proposed new form of encoding or decoding > should be placed in this namespace or the module namespace. *My* answer would be that any function that has strings (8-bit or Unicode) as both domain and range is potentially a codec. -- Take a recipe. Leave a recipe. Python Cookbook! http://www.ActiveState.com/pythoncookbook From greg at cosc.canterbury.ac.nz Wed Jun 13 06:45:36 2001 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Wed, 13 Jun 2001 16:45:36 +1200 (NZST) Subject: [Python-Dev] Adding .decode() method to Unicode In-Reply-To: <3B26D2F1.8840FB1A@ActiveState.com> Message-ID: <200106130445.QAA03370@s454.cosc.canterbury.ac.nz> Paul Prescod : > The namespace exists and is maintained by > the Internet Assigned Names Association. Hmmm... so, is the only reason that we're not using the module namespace the fact that these names can contain non-alphanumeric characters? Or is there more to it than that? Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg at cosc.canterbury.ac.nz +--------------------------------------+ From skip at pobox.com Wed Jun 13 07:09:38 2001 From: skip at pobox.com (Skip Montanaro) Date: Wed, 13 Jun 2001 00:09:38 -0500 Subject: [Python-Dev] Adding Asian codecs to the core In-Reply-To: <3B26B491.CA8536BD@ActiveState.com> References: <20010612234815.2C90599C82@waltz.rahul.net> <3B26B491.CA8536BD@ActiveState.com> Message-ID: <15142.62866.180570.158325@beluga.mojam.com> Paul> What if we add them to CVS and formally maintain them as part of Paul> the core but distribute them as a separate download? That seems to make sense to me. I suspect most Linux distributions (for example) bundle Python into multiple pieces already. My Mandrake system splits the core into (I think) four pieces. It also bundles several other RPMs for PIL, NumPy, Postgres and RPM. Adding another package for a set of codecs doesn't seem like a big deal. Skip From mal at lemburg.com Wed Jun 13 09:02:05 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Wed, 13 Jun 2001 09:02:05 +0200 Subject: [Python-Dev] Adding Asian codecs to the core References: <20010612234815.2C90599C82@waltz.rahul.net> Message-ID: <3B270FED.8E2A4ECB@lemburg.com> Aahz Maruch wrote: > > M.-A. Lemburg wrote: > > Aahz Maruch wrote: > >> M.-A. Lemburg wrote: > >>> > >>> Tamito KAJIYAMA recently announced that he changed the licenses > >>> on his Japanese codecs from GPL to a BSD variant. This is great > >>> news since this would allow adding the codecs to the Python core > >>> which would certainly attract more users to Python in Asia. > >>> > >>> The codecs are 280kB when compressed as .tar.gz file. > >> > >> +0 > >> > >> I like the idea, am uncomfortable with that amount of space. > > > > Tamito corrected me about the size (his file includes the .pyc > > byte code files): the correct size for the sources is 143kB -- > > almost half of what I initially wrote. > > That makes me +0.5, possibly a bit higher. We will be working on reducing the size of the mapping tables. Can't promise anything, but I believe that Tamito can squeeze them into under 100k using some compression technique (which one is yet to be determined ;). 
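(Purely by way of illustration -- one way a mapping table might be
squeezed, sketched here with zlib over a marshalled dict; whether Tamito's
codecs will use anything like this is still an open question:)

    import marshal, zlib

    # Tiny stand-in for a real charset mapping (source code point -> Unicode).
    table = {0x82A0: 0x3042, 0x82A2: 0x3044, 0x82A4: 0x3046}

    packed = zlib.compress(marshal.dumps(table))       # ship this with the codec
    unpacked = marshal.loads(zlib.decompress(packed))  # rebuild at import time
    assert unpacked == table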
-- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From mal at lemburg.com Wed Jun 13 09:05:31 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Wed, 13 Jun 2001 09:05:31 +0200 Subject: [Python-Dev] Adding Asian codecs to the core References: <20010612234815.2C90599C82@waltz.rahul.net> <3B26B491.CA8536BD@ActiveState.com> Message-ID: <3B2710BB.CFD8215@lemburg.com> Paul Prescod wrote: > > Aahz Maruch wrote: > > > >.... > > > > > > Tamito corrected me about the size (his file includes the .pyc > > > byte code files): the correct size for the sources is 143kB -- > > > almost half of what I initially wrote. > > > > That makes me +0.5, possibly a bit higher. > > We really shouldn't consider the Japanese without Chinese and Korean. > And those both seem *larger* than the Japanese. :( Unfortunately, these aren't available under a usable (=non-GPL) license yet. > What if we add them to CVS and formally maintain them as part of the > core but distribute them as a separate download? Good idea. -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From mal at lemburg.com Wed Jun 13 09:17:14 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Wed, 13 Jun 2001 09:17:14 +0200 Subject: [Python-Dev] Adding .decode() method to Unicode References: <200106130002.MAA03319@s454.cosc.canterbury.ac.nz> <3B26D2F1.8840FB1A@ActiveState.com> Message-ID: <3B27137A.E7BFC4EC@lemburg.com> Paul Prescod wrote: > > Greg Ewing wrote: > > > > > -1 on anything except a PEP that covers *all* aspects of > > > encode/decode (including things that are already implemented) > > > > Particularly, it should clearly explain why we need a > > completely new and separate namespace mechanism for these > > codec things, > > I don't know whether MAL will write the PEP or not With the kind of attitude towards the proposed extensions which I am currently getting in this forum, I'd rather spend my time on something more useful. > but the rationale for > a new namespace is trivial. The namespace exists and is maintained by > the Internet Assigned Names Association. You can't work with Unicode > without working with names from this list: > > http://www.iana.org/assignments/character-sets > > MAL is basically exending it to include names from this list: > > http://www.iana.org/assignments/transfer-encodings > > and others. Right. Since these codecs live in the encoding package, I don't think we have a namespace problem here. Codecs which are hooked into the codec registry by the encoding package's search function will have to provide a getregentry() entry point. If this API is not available, the codec won't load. Since the encoding package's search function is using standard Python imports for loading the codecs, we can also benefit from a nice side-effect: codec names can use Python's dotted names (which then map to standard Python packages). This allows codec writers like Tamito to place their codecs into Python package thereby avoiding any conflict with other authors of codecs with similar names. > > and provide a firm rationale for deciding > > whether any proposed new form of encoding or decoding > > should be placed in this namespace or the module namespace. 
> > *My* answer would be that any function that has strings (8-bit or > Unicode) as both domain and range is potentially a codec. Right. (Hey, the first time *we* agree on something ;-) -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From mal at lemburg.com Wed Jun 13 14:53:50 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Wed, 13 Jun 2001 14:53:50 +0200 Subject: [Python-Dev] Weird message to stderr Message-ID: <3B27625E.F18046F7@lemburg.com> Running Python 2.1 using a .pyc file I get these weird messages printed to stderr: run_pyc_file: nested_scopes: 0 These originate in pythonrun.c: static PyObject * run_pyc_file(FILE *fp, char *filename, PyObject *globals, PyObject *locals, PyCompilerFlags *flags) { PyCodeObject *co; PyObject *v; long magic; long PyImport_GetMagicNumber(void); magic = PyMarshal_ReadLongFromFile(fp); if (magic != PyImport_GetMagicNumber()) { PyErr_SetString(PyExc_RuntimeError, "Bad magic number in .pyc file"); return NULL; } (void) PyMarshal_ReadLongFromFile(fp); v = PyMarshal_ReadLastObjectFromFile(fp); fclose(fp); if (v == NULL || !PyCode_Check(v)) { Py_XDECREF(v); PyErr_SetString(PyExc_RuntimeError, "Bad code object in .pyc file"); return NULL; } co = (PyCodeObject *)v; v = PyEval_EvalCode(co, globals, locals); if (v && flags) { if (co->co_flags & CO_NESTED) flags->cf_nested_scopes = 1; fprintf(stderr, "run_pyc_file: nested_scopes: %d\n", flags->cf_nested_scopes); } Py_DECREF(co); return v; } Is this is left over debug printf or should I be warned in some way ? -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From guido at digicool.com Wed Jun 13 16:41:37 2001 From: guido at digicool.com (Guido van Rossum) Date: Wed, 13 Jun 2001 10:41:37 -0400 Subject: [Python-Dev] Re: Adding .decode() method to Unicode In-Reply-To: Your message of "Tue, 12 Jun 2001 22:40:01 EDT." References: Message-ID: <200106131441.KAA16557@cj20424-a.reston1.va.home.com> Wow, this almost looks like a real flamefest. ("Flame" being defined as the presence of metacomments.) (In the following, s is an 8-bit string, u is a Unicode string, and e is an encoding name.) The original design of the encode() methods of string and Unicode objects (in 2.0 and 2.1) is asymmetric, and clearly geared towards Unicode codecs only: to decode an 8-bit string you *have* to use unicode(s, encoding) while to encode a Unicode string into a specific 8-bit encoding you *have* to use u.encode(e). 8-bit strings also have an encode() method: s.encode(e) is the same as unicode(s).encode(e). (This is useful since code that expects Unicode strings should also work when it is passed ASCII-encoded 8-bit strings.) I'd say there's no need for s.decode(e), since this can already be done with unicode(s, e) -- and to me that API looks better since it clearly states that the result is Unicode. We *could* have designed the encoding API similarly: str(u, e) is available, symmetric with unicode(s, e), and a logical extension of str(u) which uses the default encoding. But I accept the argument that u.encode(e) is better because it emphasizes the encoding action, and because it means no API changes to str(). 
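(To make the asymmetry concrete, a minimal interactive sketch of the API
described above; the Latin-1/UTF-8 byte values are only illustrative:)

>>> unicode('caf\xe9', 'latin-1')   # decode: 8-bit string -> Unicode
u'caf\xe9'
>>> u'caf\xe9'.encode('utf-8')      # encode: Unicode -> 8-bit string
'caf\xc3\xa9'
>>> 'abc'.encode('utf-8')           # same as unicode('abc').encode('utf-8')
'abc'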
I guess what I'm saying here is that 'str' does not give enough of a clue that an encoding action is going on, while 'unicode' *does* give a clue that a decoding action is being done: as soon as you read "Unicode" you think "Mmm, encodings..." -- but "str" is pretty neutral, so u.encode(e) is needed to give a clue. Marc-Andre proposes (and has partially checked in) changes that stretch the meaning of the encode() method, and add a decode() method, to be basically interfaces to anything you can do with the codecs module. The return type of encode() and decode() is now determined by the codec (formerly, encode() always returned an 8-bit string). Some new codecs have been added that do things like gzip and base64. Initially, I liked this, and even contributed a codec. But questions keep coming up. What is the problem being solved? True, the codecs module has a clumsy interface if you just want to invoke a codec on some data. But that can easily be remedied by adding convenience functions encode() and decode() to codecs.py -- which would have the added advantage that it would work for other datatypes that support the buffer interface, e.g. codecs.encode(myPILobject, "base64"). True, the "codec" pattern can be used for other encodings than Unicode. But it seems to me that the entire codecs architecture is rather strongly geared towards en/decoding Unicode, and it's not clear how well other codecs fit in this pattern (e.g. I noticed that all the non-Unicode codecs ignore the error handling parameter or assert that it is set to 'strict'). Is it really right that x.encode("gzip") and x.encode("utf-8") look similar, while the former requires an 8-bit string and the latter only makes sense if x is a Unicode string? Another (minor) issue is that Unicode encoding names are an IANA namespace. Is it wise to add our own names to this? I'm not forcing a decision here, but I do ask that we consider these issues before forging ahead with what might be a mistake. A PEP would be most helpful to focus the discussion. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at digicool.com Wed Jun 13 17:19:03 2001 From: guido at digicool.com (Guido van Rossum) Date: Wed, 13 Jun 2001 11:19:03 -0400 Subject: [Python-Dev] Releasing 2.0.1 Message-ID: <200106131519.LAA16978@cj20424-a.reston1.va.home.com> I think it's now or never with the 2.0.1 release. Moshe seems to have disappeared from the face of the earth. His last mail to me (May 23) suggested that it was good to go except for the SRE checkin and the NEWS file. I did the SRE checkin today (making it identical to what's in 2.1, per /F's recommendation) and added a note about that to the NEWS file -- I wouldn't know what else would be needed there. So I think it's good to go now. I can release a 2.0.1c1 this week (indicating a release candidate) and a final 2.0.1 next week. If you know a good reason why I should hold off on releasing this, or if you have a patch that absolutely should make it into 2.0.1, please let me know NOW! This project is way overdue. (Thomas is ready to release 2.1.1 as soon as this goes out, I believe. :-) --Guido van Rossum (home page: http://www.python.org/~guido/) From fredrik at pythonware.com Wed Jun 13 17:29:19 2001 From: fredrik at pythonware.com (Fredrik Lundh) Date: Wed, 13 Jun 2001 17:29:19 +0200 Subject: [Python-Dev] Releasing 2.0.1 References: <200106131519.LAA16978@cj20424-a.reston1.va.home.com> Message-ID: <023f01c0f41d$9dfb87b0$0900a8c0@spiff> guido wrote: > So I think it's good to go now. 
I can release a 2.0.1c1 this week > (indicating a release candidate) and a final 2.0.1 next week. +1 From skip at pobox.com Wed Jun 13 17:49:58 2001 From: skip at pobox.com (Skip Montanaro) Date: Wed, 13 Jun 2001 10:49:58 -0500 Subject: [Python-Dev] on announcing point releases Message-ID: <15143.35750.837420.376281@beluga.mojam.com> (Just thinking out loud) I wonder if it would help gain wider distribution for the point releases if explicit announcements were sent to the various Linux distributors so they could create updated packages (RPMs, debs, whatever) for their users. On a related note, I see one RedHat email address on python-dev (and one Debian address on python-list). Are there other Linux distributions that are heavy Python users (as opposed to simply packaging it up for inclusion)? If so, perhaps they should be invited to join python-dev. Skip From niemeyer at conectiva.com Wed Jun 13 17:54:08 2001 From: niemeyer at conectiva.com (Gustavo Niemeyer) Date: Wed, 13 Jun 2001 12:54:08 -0300 Subject: [Python-Dev] sre improvements Message-ID: <20010613125408.W13940@tux.distro.conectiva> I'm forwarding this to the dev list.. probably somebody here knows about this... -------------- Hi there!! I have looked into sre, and was wondering if somebody is working to implement more features in it. I'd like, for example, to see the (?(1)blah) operator, available in perl, working. Should I care about this? Should I write some code?? Anybody working in sre currently? Thanks! -- Gustavo Niemeyer [ 2AAC 7928 0FBF 0299 5EB5 60E2 2253 B29A 6664 3A0C ] From skip at pobox.com Wed Jun 13 18:03:58 2001 From: skip at pobox.com (Skip Montanaro) Date: Wed, 13 Jun 2001 11:03:58 -0500 Subject: [Python-Dev] sre improvements In-Reply-To: <20010613125408.W13940@tux.distro.conectiva> References: <20010613125408.W13940@tux.distro.conectiva> Message-ID: <15143.36590.447465.657241@beluga.mojam.com> Gustavo> I'd like, for example, to see the (?(1)blah) operator, Gustavo> available in perl, working. Gustavo, For the non-Perl-heads on the list, can you explain what the (?(1)blah) operator does? -- Skip Montanaro (skip at pobox.com) (847)971-7098 From gregor at mediasupervision.de Wed Jun 13 18:13:17 2001 From: gregor at mediasupervision.de (Gregor Hoffleit) Date: Wed, 13 Jun 2001 18:13:17 +0200 Subject: [Python-Dev] on announcing point releases In-Reply-To: <15143.35750.837420.376281@beluga.mojam.com>; from skip@pobox.com on Wed, Jun 13, 2001 at 10:49:58AM -0500 References: <15143.35750.837420.376281@beluga.mojam.com> Message-ID: <20010613181317.B30006@mediasupervision.de> On Wed, Jun 13, 2001 at 10:49:58AM -0500, Skip Montanaro wrote: > I wonder if it would help gain wider distribution for the point releases if > explicit announcements were sent to the various Linux distributors so they > could create updated packages (RPMs, debs, whatever) for their users. > > On a related note, I see one RedHat email address on python-dev (and one > Debian address on python-list). Are there other Linux distributions that > are heavy Python users (as opposed to simply packaging it up for inclusion)? > If so, perhaps they should be invited to join python-dev. Rest assured that Debian is present on python-dev as well, and nervously looking forward to the maintenance releases ;-) I hope 2.1.1 will make it out in time as well for our next release (being aware that 'before the next Debian release happens' is no very tight timeframe ;-). 
Gregor From guido at digicool.com Wed Jun 13 18:16:42 2001 From: guido at digicool.com (Guido van Rossum) Date: Wed, 13 Jun 2001 12:16:42 -0400 Subject: [Python-Dev] Re: PEP 259: Omit printing newline after newline Message-ID: <200106131616.MAA17468@cj20424-a.reston1.va.home.com> OK, OK, PEP 259 is dead. It seemed a nice idea at the time. :-) Alex and others, if you're serious about implementing print as __print__(), why don't you write a PEP? --Guido van Rossum (home page: http://www.python.org/~guido/) From fdrake at acm.org Wed Jun 13 18:21:20 2001 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Wed, 13 Jun 2001 12:21:20 -0400 (EDT) Subject: [Python-Dev] on announcing point releases In-Reply-To: <20010613181317.B30006@mediasupervision.de> References: <15143.35750.837420.376281@beluga.mojam.com> <20010613181317.B30006@mediasupervision.de> Message-ID: <15143.37632.758887.966026@cj42289-a.reston1.va.home.com> Gregor Hoffleit writes: > looking forward to the maintenance releases ;-) I hope 2.1.1 will make it > out in time as well for our next release (being aware that 'before the next Personally, I see no reason for Thomas to wait for the 2.0.1 release if he doesn't want to. -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From fredrik at pythonware.com Wed Jun 13 18:32:13 2001 From: fredrik at pythonware.com (Fredrik Lundh) Date: Wed, 13 Jun 2001 18:32:13 +0200 Subject: [Python-Dev] sre improvements References: <20010613125408.W13940@tux.distro.conectiva> <15143.36590.447465.657241@beluga.mojam.com> Message-ID: <007801c0f426$84d1f220$4ffa42d5@hagrid> skip wrote: > Gustavo> I'd like, for example, to see the (?(1)blah) operator, > Gustavo> available in perl, working. > > For the non-Perl-heads on the list, can you explain what the (?(1)blah) > operator does? conditionals: (?(cond)true) (?(cond)true|false) where cond is a group number (true if defined) or an assertion pattern, and true/false are patterns. (imo, whoever invented that needs help ;-) From akuchlin at mems-exchange.org Wed Jun 13 18:39:58 2001 From: akuchlin at mems-exchange.org (Andrew Kuchling) Date: Wed, 13 Jun 2001 12:39:58 -0400 Subject: [Python-Dev] sre improvements Message-ID: >For the non-Perl-heads on the list, can you explain what the (?(1)blah) >operator does? Conditionals. From http://www.perl.com/pub/doc/manual/html/pod/perlre.html, (...)(?(1)A|B) will match 'A' if group 1 matched, and B if it didn't. I'm not sure how "matched" is defined, as the Perl docs are vague; judging from the example, it means 'matched something of nonzero length'. Perl 5.6 introduced a bunch of new regex features, but I'm not sure how much we actually *care* about them; they're no doubt useful if regexes are the only tool you've got and you try to do full parsers using them, but they're also complicated to explain and will make the compiler messier. For example, lookaheads can also go into the conditional, not just an integer. (?i) now obeys the scoping from parens, and you can turn it off with (?-i). If Gustavo wants to implement these features and /F approves of his patches, then sure, put them in. But if either of those conditions fails, little will be lost. --amk From dmitry.antipov at auriga.ru Wed Jun 13 18:46:09 2001 From: dmitry.antipov at auriga.ru (dmitry.antipov at auriga.ru) Date: Wed, 13 Jun 2001 20:46:09 +0400 Subject: [Python-Dev] Why not Lisp-like list-related functions ? Message-ID: <3B2798D1.16F832A3@auriga.ru> Hello all, I'm new to Python but quite familiar with Lisp. 
So my question is about Python list-related functions. Why append(), extend(), sort(), reverse() etc. doesn't return a reference to it's own (modified) argument ? IMHO (I'm tweaking Python 2.1 to allow first example possible), >>> [3 + x * 2 for x in [5, 8, 9, 3].sort()].extend([6, 3, 8].reverse()) [9, 13, 19, 21, 8, 3, 6] >>> looks much better (and more "functional") than >>> x = [5, 8, 9, 3] >>> x.sort() >>> x = [3 + x * 2 for x in x] >>> y = [6, 3, 8] >>> y.reverse() >>> x.extend(y) >>> x [9, 13, 19, 21, 8, 3, 6] >>> Python designers and fans, please explain it to me :-). Any comments are welcome. Thanks and reply to me directly if possible, Dmitry Antipov From guido at digicool.com Wed Jun 13 19:01:34 2001 From: guido at digicool.com (Guido van Rossum) Date: Wed, 13 Jun 2001 13:01:34 -0400 Subject: [Python-Dev] Weird message to stderr Message-ID: <200106131701.NAA17619@cj20424-a.reston1.va.home.com> > Running Python 2.1 using a .pyc file I get these weird messages > printed to stderr: > > run_pyc_file: nested_scopes: 0 > > These originate in pythonrun.c: > > static PyObject * > run_pyc_file(FILE *fp, char *filename, PyObject *globals, PyObject *locals, > PyCompilerFlags *flags) > { [...] > if (v && flags) { > if (co->co_flags & CO_NESTED) > flags->cf_nested_scopes = 1; > fprintf(stderr, "run_pyc_file: nested_scopes: %d\n", > flags->cf_nested_scopes); > } > Py_DECREF(co); > return v; > } > > Is this is left over debug printf or should I be warned > in some way ? I'll channel Jeremy... Looks like a debug message -- this code isn't tested by the standard test suite. Feel free to get rid of the fprintf() statement (and no, you don't have to write a PEP for this :-). --Guido van Rossum (home page: http://www.python.org/~guido/) From fredrik at pythonware.com Wed Jun 13 19:06:52 2001 From: fredrik at pythonware.com (Fredrik Lundh) Date: Wed, 13 Jun 2001 19:06:52 +0200 Subject: [Python-Dev] Why not Lisp-like list-related functions ? References: <3B2798D1.16F832A3@auriga.ru> Message-ID: <012d01c0f42b$45453b30$4ffa42d5@hagrid> Dmitry wrote: > I'm new to Python but quite familiar with Lisp. So my question is > about Python list-related functions. Why append(), extend(), sort(), > reverse() etc. doesn't return a reference to it's own (modified) > argument ? doesn't Lisp have a FAQ? ;-) http://www.python.org/doc/FAQ.html#6.20 Q. Why doesn't list.sort() return the sorted list? ... basically, operations that modify an object generally don't return the object itself, to avoid mistakes like: for item in list.reverse(): print item # backwards ... for item in list.reverse(): print item # backwards, or? a slightly more pythonic way would be to add sorted, extended, reversed (etc) -- but that leads to method bloat. in addition, based on studying huge amounts of python code, I doubt cascading list operations would save the world that much typing... followups to python-list at python.org From paulp at ActiveState.com Wed Jun 13 19:22:09 2001 From: paulp at ActiveState.com (Paul Prescod) Date: Wed, 13 Jun 2001 10:22:09 -0700 Subject: [Python-Dev] Adding Asian codecs to the core References: <20010612234815.2C90599C82@waltz.rahul.net> <3B26B491.CA8536BD@ActiveState.com> <3B2710BB.CFD8215@lemburg.com> Message-ID: <3B27A141.6C69EC55@ActiveState.com> "M.-A. Lemburg" wrote: > >... > > > > We really shouldn't consider the Japanese without Chinese and Korean. > > And those both seem *larger* than the Japanese. :( > > Unfortunately, these aren't available under a usable (=non-GPL) > license yet. 
Frank Chen has agreed to make them available under a Python-style license. > > What if we add them to CVS and formally maintain them as part of the > > core but distribute them as a separate download? > > Good idea. All in favour? -- Take a recipe. Leave a recipe. Python Cookbook! http://www.ActiveState.com/pythoncookbook From aahz at rahul.net Wed Jun 13 19:32:24 2001 From: aahz at rahul.net (Aahz Maruch) Date: Wed, 13 Jun 2001 10:32:24 -0700 (PDT) Subject: [Python-Dev] Adding Asian codecs to the core In-Reply-To: <3B27A141.6C69EC55@ActiveState.com> from "Paul Prescod" at Jun 13, 2001 10:22:09 AM Message-ID: <20010613173224.0FFB999C87@waltz.rahul.net> >>> What if we add them to CVS and formally maintain them as part of the >>> core but distribute them as a separate download? >> >> Good idea. > > All in favour? +1 -- --- Aahz (@pobox.com) Hugs and backrubs -- I break Rule 6 <*> http://www.rahul.net/aahz/ Androgynous poly kinky vanilla queer het Pythonista I don't really mind a person having the last whine, but I do mind someone else having the last self-righteous whine. From gward at python.net Wed Jun 13 20:53:20 2001 From: gward at python.net (Greg Ward) Date: Wed, 13 Jun 2001 14:53:20 -0400 Subject: [Python-Dev] sre improvements In-Reply-To: <007801c0f426$84d1f220$4ffa42d5@hagrid>; from fredrik@pythonware.com on Wed, Jun 13, 2001 at 06:32:13PM +0200 References: <20010613125408.W13940@tux.distro.conectiva> <15143.36590.447465.657241@beluga.mojam.com> <007801c0f426$84d1f220$4ffa42d5@hagrid> Message-ID: <20010613145320.G5114@gerg.ca> On 13 June 2001, Fredrik Lundh said: > conditionals: > > (?(cond)true) > (?(cond)true|false) > > where cond is a group number (true if defined) or an assertion > pattern, and true/false are patterns. > > (imo, whoever invented that needs help ;-) I think I'd have to agree with /F on this one... somewhere around Perl 5.003 or 5.004, regexes in Perl went from being a powerful and really cool facility to being a massively overgrown language-within-a-language. I *tried* to use some of the fancy new features a few times out of curiosity, but could never get them to work. (At the time, I think I was a pretty sharp Perl programmer, although I've dulled since then.) Greg -- Greg Ward - Unix bigot gward at python.net http://starship.python.net/~gward/ No animals were harmed in transmitting this message. From jepler at inetnebr.com Wed Jun 13 18:09:58 2001 From: jepler at inetnebr.com (Jeff Epler) Date: Wed, 13 Jun 2001 11:09:58 -0500 Subject: [Python-Dev] sre improvements In-Reply-To: <15143.36590.447465.657241@beluga.mojam.com>; from skip@pobox.com on Wed, Jun 13, 2001 at 11:03:58AM -0500 References: <20010613125408.W13940@tux.distro.conectiva> <15143.36590.447465.657241@beluga.mojam.com> Message-ID: <20010613110957.C29405@inetnebr.com> On Wed, Jun 13, 2001 at 11:03:58AM -0500, Skip Montanaro wrote: > > Gustavo> I'd like, for example, to see the (?(1)blah) operator, > Gustavo> available in perl, working. > > Gustavo, > > For the non-Perl-heads on the list, can you explain what the (?(1)blah) > operator does? from perlre(1): (?(condition)yes-pattern) Conditional expression. (condition) should be either an integer in parentheses (which is valid if the corresponding pair of parentheses matched), or lookahead/lookbehind/evaluate zero- width assertion. Say, m{ ( \( )? [^()]+ (?(1) \) ) }x matches a chunk of non-parentheses, possibly included in parentheses themselves. 
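(For reference, a rough Python counterpart of that example -- since sre
has no (?(1)...) conditionals, the two cases are simply spelled out as
alternatives, parenthesized form first:)

    import re

    # "(chunk)" or a bare chunk of non-parentheses; listing the
    # parenthesized alternative first makes it win when both could match.
    chunk = re.compile(r"\([^()]+\)|[^()]+")

    print chunk.findall("(abc) def (gh)")   # ['(abc)', ' def ', '(gh)']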
Jeff From tim.one at home.com Thu Jun 14 08:12:48 2001 From: tim.one at home.com (Tim Peters) Date: Thu, 14 Jun 2001 02:12:48 -0400 Subject: [Python-Dev] Adding .decode() method to Unicode In-Reply-To: <3B2664AD.B560D685@ActiveState.com> Message-ID: [Paul Prescod] > ... > We could argue angels on the head of a pin until the cows come home but > 90% of all Python users think of 8-bit strings as strings of characters. Actually, if you count me, make that 92%. some-things-were-easier-when-python-had-50-users-and-i-was-two- of-them-ly y'rs - tim From paulp at ActiveState.com Thu Jun 14 09:30:19 2001 From: paulp at ActiveState.com (Paul Prescod) Date: Thu, 14 Jun 2001 00:30:19 -0700 Subject: [Python-Dev] sre improvements References: <20010613125408.W13940@tux.distro.conectiva> <15143.36590.447465.657241@beluga.mojam.com> <007801c0f426$84d1f220$4ffa42d5@hagrid> <20010613145320.G5114@gerg.ca> Message-ID: <3B28680B.A46CF171@ActiveState.com> Greg Ward wrote: > >... > > I think I'd have to agree with /F on this one... somewhere around Perl > 5.003 or 5.004, regexes in Perl went from being a powerful and really > cool facility to being a massively overgrown language-within-a-language. > I *tried* to use some of the fancy new features a few times out of > curiosity, but could never get them to work. (At the time, I think I > was a pretty sharp Perl programmer, although I've dulled since then.) I would rather see us try a new approach to regular expressions. I've seen a few proposals for more verbose-but-readable syntaxes. I think one was from Greg Ewing? And maybe one from Ping? For those of us who use regular expressions only once in a while (i.e. the lucky ones), the current syntax is a holy terror. Which characters are magical again? In what contexts? With how many levels of backslashing? Upper case W versus lower case W? Obviously we can never abandon the tried and true Perl5 RE module, but I think we could have another syntax on top. -- Take a recipe. Leave a recipe. Python Cookbook! http://www.ActiveState.com/pythoncookbook From arigo at ulb.ac.be Thu Jun 14 10:58:48 2001 From: arigo at ulb.ac.be (Armin Rigo) Date: Thu, 14 Jun 2001 10:58:48 +0200 (MET DST) Subject: [Python-Dev] Special-casing "O" Message-ID: Hello everybody, For comparison purposes, I implemented the idea of optimizing PyArg_ParseTuple calls by modifying the C code itself. Here is the result: http://homepages.ulb.ac.be/~arigo/pyarg_pp.tgz I did not upload this as a patch at SourceForge for several reasons. The most fundamental is that it raises bootstrapping issues: how can we compile the Python interpreter if we first have to run a Python script on the source files ? Fixing this would make the Makefiles significantly more complex. The other reason is that the METH_O solution is probably still faster, as it often completely avoids to build the 1-tuple of arguments. More serious performance tests might be needed, however. A bientot, Armin. From thomas at xs4all.net Thu Jun 14 13:10:01 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Thu, 14 Jun 2001 13:10:01 +0200 Subject: [Python-Dev] Releasing 2.0.1 In-Reply-To: <200106131519.LAA16978@cj20424-a.reston1.va.home.com> References: <200106131519.LAA16978@cj20424-a.reston1.va.home.com> Message-ID: <20010614131001.B1659@xs4all.nl> On Wed, Jun 13, 2001 at 11:19:03AM -0400, Guido van Rossum wrote: > So I think it's good to go now. I can release a 2.0.1c1 this week > (indicating a release candidate) and a final 2.0.1 next week. +1 here. 
> If you know a good reason why I should hold off on releasing this, or > if you have a patch that absolutely should make it into 2.0.1, please > let me know NOW! This project is way overdue. (Thomas is ready to > release 2.1.1 as soon as this goes out, I believe. :-) Well, not quite, but I can put in a couple of allnighters (I want to do a review of all log-messages since 2.1-final, to see if I missed any checkin messages, and I want to update the NEWS file with a list of bugs fixed) and have it ready in a week or two. I don't think 2.1.1 should be released *that* soon after 2.0.1 anyway. I noticed this in the LICENCE file, by the way: Python 2.1 is a derivative work of Python 1.6.1, as well as of Python 2.0. and 8. By copying, installing or otherwise using Python 2.1, Licensee agrees to be bound by the terms and conditions of this License Agreement. -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From guido at digicool.com Thu Jun 14 13:14:22 2001 From: guido at digicool.com (Guido van Rossum) Date: Thu, 14 Jun 2001 07:14:22 -0400 Subject: [Python-Dev] Re: Why not Lisp-like list-related functions ? Message-ID: <200106141114.HAA25430@cj20424-a.reston1.va.home.com> > Hello all, > > I'm new to Python but quite familiar with Lisp. So my question is > about Python list-related functions. Why append(), extend(), sort(), > reverse() etc. doesn't return a reference to it's own (modified) > argument ? IMHO (I'm tweaking Python 2.1 to allow first example > possible), > > >>> [3 + x * 2 for x in [5, 8, 9, 3].sort()].extend([6, 3, 8].reverse()) > [9, 13, 19, 21, 8, 3, 6] > >>> > > looks much better (and more "functional") than > > >>> x = [5, 8, 9, 3] > >>> x.sort() > >>> x = [3 + x * 2 for x in x] > >>> y = [6, 3, 8] > >>> y.reverse() > >>> x.extend(y) > >>> x > [9, 13, 19, 21, 8, 3, 6] > >>> > > Python designers and fans, please explain it to me :-). > Any comments are welcome. > > Thanks and reply to me directly if possible, > Dmitry Antipov Funny, to me your first form is much harder to read than your second. With the first form, I have to stop and think and look carefully at where the brackets are to see in which order the operations are executed, while in the second form it's obvious, because it's broken down in smaller chunks. So I guess that's the real reason: Python users have a procedural brain, not a functional brain, and we don't like Lispish code. Maybe we also have a smaller brain than the typical Lisper -- I would say, that would make us more normal, and if Python caters to people with a closer-to-average brain size, that would mean more people will be able to program in Python. History will decide... --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at digicool.com Thu Jun 14 13:31:16 2001 From: guido at digicool.com (Guido van Rossum) Date: Thu, 14 Jun 2001 07:31:16 -0400 Subject: [Python-Dev] Adding Asian codecs to the core Message-ID: <200106141131.HAA25522@cj20424-a.reston1.va.home.com> > > > What if we add them to CVS and formally maintain them as part of the > > > core but distribute them as a separate download? > > > > Good idea. > > All in favour? +1, as long as they're not in the CVS subtree that's normally extracted for a regular source distribution. I propose this location in the CVS tree: python/dist/encodings/... (So 'encodings' would be a sibling of 'src', which has been pretty lonely ever since I started using CVS. 
;-) --Guido van Rossum (home page: http://www.python.org/~guido/) From akuchlin at mems-exchange.org Thu Jun 14 17:19:28 2001 From: akuchlin at mems-exchange.org (Andrew Kuchling) Date: Thu, 14 Jun 2001 11:19:28 -0400 Subject: [Python-Dev] Re: Why not Lisp-like list-related functions ? In-Reply-To: <200106141114.HAA25430@cj20424-a.reston1.va.home.com>; from guido@digicool.com on Thu, Jun 14, 2001 at 07:14:22AM -0400 References: <200106141114.HAA25430@cj20424-a.reston1.va.home.com> Message-ID: <20010614111928.A4560@ute.cnri.reston.va.us> On Thu, Jun 14, 2001 at 07:14:22AM -0400, Guido van Rossum wrote: >Maybe we also have a smaller brain than the typical Lisper -- I would >say, that would make us more normal, and if Python caters to people >with a closer-to-average brain size, that would mean more people will >be able to program in Python. History will decide... I thought it already has, pretty much. --amk From tim at digicool.com Thu Jun 14 18:49:07 2001 From: tim at digicool.com (Tim Peters) Date: Thu, 14 Jun 2001 12:49:07 -0400 Subject: [Python-Dev] PEP 255: Simple Generators Message-ID: You can view an HTML version of PEP 255 here: http://python.sourceforge.net/peps/pep-0255.html Discussion should take place primarily on the Python Iterators list: mailto:python-iterators at lists.sourceforge.net If replying directly to this message, please remove (at least) Python-Dev and Python-Announce. PEP: 255 Title: Simple Generators Version: $Revision: 1.3 $ Author: nas at python.ca (Neil Schemenauer), tim.one at home.com (Tim Peters), magnus at hetland.org (Magnus Lie Hetland) Discussion-To: python-iterators at lists.sourceforge.net Status: Draft Type: Standards Track Requires: 234 Created: 18-May-2001 Python-Version: 2.2 Post-History: 14-Jun-2001 Abstract This PEP introduces the concept of generators to Python, as well as a new statement used in conjunction with them, the "yield" statement. Motivation When a producer function has a hard enough job that it requires maintaining state between values produced, most programming languages offer no pleasant and efficient solution beyond adding a callback function to the producer's argument list, to be called with each value produced. For example, tokenize.py in the standard library takes this approach: the caller must pass a "tokeneater" function to tokenize(), called whenever tokenize() finds the next token. This allows tokenize to be coded in a natural way, but programs calling tokenize are typically convoluted by the need to remember between callbacks which token(s) were seen last. The tokeneater function in tabnanny.py is a good example of that, maintaining a state machine in global variables, to remember across callbacks what it has already seen and what it hopes to see next. This was difficult to get working correctly, and is still difficult for people to understand. Unfortunately, that's typical of this approach. An alternative would have been for tokenize to produce an entire parse of the Python program at once, in a large list. Then tokenize clients could be written in a natural way, using local variables and local control flow (such as loops and nested if statements) to keep track of their state. 
But this isn't practical: programs can be very large, so no a priori bound can be placed on the memory needed to materialize the whole parse; and some tokenize clients only want to see whether something specific appears early in the program (e.g., a future statement, or, as is done in IDLE, just the first indented statement), and then parsing the whole program first is a severe waste of time. Another alternative would be to make tokenize an iterator[1], delivering the next token whenever its .next() method is invoked. This is pleasant for the caller in the same way a large list of results would be, but without the memory and "what if I want to get out early?" drawbacks. However, this shifts the burden on tokenize to remember *its* state between .next() invocations, and the reader need only glance at tokenize.tokenize_loop() to realize what a horrid chore that would be. Or picture a recursive algorithm for producing the nodes of a general tree structure: to cast that into an iterator framework requires removing the recursion manually and maintaining the state of the traversal by hand. A fourth option is to run the producer and consumer in separate threads. This allows both to maintain their states in natural ways, and so is pleasant for both. Indeed, Demo/threads/Generator.py in the Python source distribution provides a usable synchronized-communication class for doing that in a general way. This doesn't work on platforms without threads, though, and is very slow on platforms that do (compared to what is achievable without threads). A final option is to use the Stackless[2][3] variant implementation of Python instead, which supports lightweight coroutines. This has much the same programmatic benefits as the thread option, but is much more efficient. However, Stackless is a controversial rethinking of the Python core, and it may not be possible for Jython to implement the same semantics. This PEP isn't the place to debate that, so suffice it to say here that generators provide a useful subset of Stackless functionality in a way that fits easily into the current CPython implementation, and is believed to be relatively straightforward for other Python implementations. That exhausts the current alternatives. Some other high-level languages provide pleasant solutions, notably iterators in Sather[4], which were inspired by iterators in CLU; and generators in Icon[5], a novel language where every expression "is a generator". There are differences among these, but the basic idea is the same: provide a kind of function that can return an intermediate result ("the next value") to its caller, but maintaining the function's local state so that the function can be resumed again right where it left off. A very simple example: def fib(): a, b = 0, 1 while 1: yield b a, b = b, a+b When fib() is first invoked, it sets a to 0 and b to 1, then yields b back to its caller. The caller sees 1. When fib is resumed, from its point of view the yield statement is really the same as, say, a print statement: fib continues after the yield with all local state intact. a and b then become 1 and 1, and fib loops back to the yield, yielding 1 to its invoker. And so on. From fib's point of view it's just delivering a sequence of results, as if via callback. But from its caller's point of view, the fib invocation is an iterable object that can be resumed at will. 
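(A minimal sketch of the caller's side, spelling out the .next()
protocol the PEP builds on:)

    g = fib()
    for i in range(5):
        print g.next(),   # prints: 1 1 2 3 5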
As in the thread approach, this allows both sides to be coded in the most natural ways; but unlike the thread approach, this can be done efficiently and on all platforms. Indeed, resuming a generator should be no more expensive than a function call. The same kind of approach applies to many producer/consumer functions. For example, tokenize.py could yield the next token instead of invoking a callback function with it as argument, and tokenize clients could iterate over the tokens in a natural way: a Python generator is a kind of Python iterator[1], but of an especially powerful kind. Specification A new statement is introduced: yield_stmt: "yield" expression_list "yield" is a new keyword, so a future statement[8] is needed to phase this in. [XXX spell this out] The yield statement may only be used inside functions. A function that contains a yield statement is called a generator function. When a generator function is called, the actual arguments are bound to function-local formal argument names in the usual way, but no code in the body of the function is executed. Instead a generator-iterator object is returned; this conforms to the iterator protocol[6], so in particular can be used in for-loops in a natural way. Note that when the intent is clear from context, the unqualified name "generator" may be used to refer either to a generator-function or a generator- iterator. Each time the .next() method of a generator-iterator is invoked, the code in the body of the generator-function is executed until a yield or return statement (see below) is encountered, or until the end of the body is reached. If a yield statement is encountered, the state of the function is frozen, and the value of expression_list is returned to .next()'s caller. By "frozen" we mean that all local state is retained, including the current bindings of local variables, the instruction pointer, and the internal evaluation stack: enough information is saved so that the next time .next() is invoked, the function can proceed exactly as if the yield statement were just another external call. A generator function can also contain return statements of the form: "return" Note that an expression_list is not allowed on return statements in the body of a generator (although, of course, they may appear in the bodies of non-generator functions nested within the generator). When a return statement is encountered, nothing is returned, but a StopIteration exception is raised, signalling that the iterator is exhausted. The same is true if control flows off the end of the function. Note that return means "I'm done, and have nothing interesting to return", for both generator functions and non-generator functions. Example # A binary tree class. class Tree: def __init__(self, label, left=None, right=None): self.label = label self.left = left self.right = right def __repr__(self, level=0, indent=" "): s = level*indent + `self.label` if self.left: s = s + "\n" + self.left.__repr__(level+1, indent) if self.right: s = s + "\n" + self.right.__repr__(level+1, indent) return s def __iter__(self): return inorder(self) # Create a Tree from a list. def tree(list): n = len(list) if n == 0: return [] i = n / 2 return Tree(list[i], tree(list[:i]), tree(list[i+1:])) # A recursive generator that generates Tree leaves in in-order. def inorder(t): if t: for x in inorder(t.left): yield x yield t.label for x in inorder(t.right): yield x # Show it off: create a tree. t = tree("ABCDEFGHIJKLMNOPQRSTUVWXYZ") # Print the nodes of the tree in in-order. 
    for x in t:
        print x,
    print

    # A non-recursive generator.
    def inorder(node):
        stack = []
        while node:
            while node.left:
                stack.append(node)
                node = node.left
            yield node.label
            while not node.right:
                try:
                    node = stack.pop()
                except IndexError:
                    return
                yield node.label
            node = node.right

    # Exercise the non-recursive generator.
    for x in t:
        print x,
    print

Q & A

Q. Why a new keyword? Why not a builtin function instead?

A. Control flow is much better expressed via keyword in Python, and yield is a control construct. It's also believed that efficient implementation in Jython requires that the compiler be able to determine potential suspension points at compile-time, and a new keyword makes that easy.

Reference Implementation

A preliminary patch against the CVS Python source is available[7].

Footnotes and References

    [1] PEP 234, http://python.sf.net/peps/pep-0234.html
    [2] http://www.stackless.com/
    [3] PEP 219, http://python.sf.net/peps/pep-0219.html
    [4] "Iteration Abstraction in Sather", Murer, Omohundro, Stoutamire
        and Szyperski,
        http://www.icsi.berkeley.edu/~sather/Publications/toplas.html
    [5] http://www.cs.arizona.edu/icon/
    [6] The concept of iterators is described in PEP 234,
        http://python.sf.net/peps/pep-0234.html
    [7] http://python.ca/nas/python/generator.diff
    [8] PEP 236, http://python.sf.net/peps/pep-0236.html

Copyright

This document has been placed in the public domain.


From guido at digicool.com Thu Jun 14 19:30:42 2001
From: guido at digicool.com (Guido van Rossum)
Date: Thu, 14 Jun 2001 13:30:42 -0400
Subject: [Python-Dev] Python 2.0.1c1 - GPL-compatible release candidate
Message-ID: <200106141730.f5EHUgX03621@odiug.digicool.com>

With a sigh of relief I announce Python 2.0.1c1 -- the first Python release in a long time whose license is fully compatible with the GPL:

    http://www.python.org/2.0.1/

I thank Moshe Zadka who did almost all of the work to make this a useful bugfix release, and then went incommunicado for several weeks. (I hope you're OK, Moshe!)

Note that this is a release candidate. We don't expect any problems, but we're being careful nevertheless. We're planning to do the final release of 2.0.1 a week from now; expect it to be identical to the release candidate except for some dotted i's and crossed t's.

Python 2.0 users should be able to replace their 2.0 installation with the 2.0.1 release without any ill effects; apart from the license change, we've only fixed bugs that didn't require us to make feature changes. The SRE package (regular expression matching, used by the "re" module) was brought in line with the version distributed with Python 2.1; this is stable feature-wise but much improved bug-wise.

For the full scoop, see the release notes on SourceForge:

    http://sourceforge.net/project/shownotes.php?release_id=39267

Python 2.1 users can ignore this release, unless they have an urgent need for a GPL-compatible Python version and are willing to downgrade. Rest assured that we're planning a bugfix release there too: I expect that Python 2.1.1 will be released within a month, with the same GPL-compatible license. (Right, Thomas?)

We don't intend to build RPMs for 2.0.1. If someone else is interested in doing so, we can link to them.
--Guido van Rossum (home page: http://www.python.org/~guido/)


From fredrik at pythonware.com Thu Jun 14 13:46:25 2001
From: fredrik at pythonware.com (Fredrik Lundh)
Date: Thu, 14 Jun 2001 13:46:25 +0200
Subject: [Python-Dev] recognizing \u escapes in regular expressions
Message-ID: <02db01c0f4c7$a491c620$0900a8c0@spiff>

during a late hacking pass, I was perplexed to realize that r"[\u0000-\uffff]" didn't match any unicode character, and reported it as bug #420011. but a few minutes later, I realized that SRE doesn't support \u and \U escapes at all -- and that the pattern u"[\u0000-\uffff]" works as expected.

should I close the bug report, or turn it into a feature request?


From fredrik at pythonware.com Thu Jun 14 13:52:26 2001
From: fredrik at pythonware.com (Fredrik Lundh)
Date: Thu, 14 Jun 2001 13:52:26 +0200
Subject: [Python-Dev] Adding Asian codecs to the core
References: <20010612234815.2C90599C82@waltz.rahul.net> <3B26B491.CA8536BD@ActiveState.com> <3B2710BB.CFD8215@lemburg.com> <3B27A141.6C69EC55@ActiveState.com>
Message-ID: <02ef01c0f4c8$7bdc8520$0900a8c0@spiff>

Paul wrote:

> > > What if we add them to CVS and formally maintain them as part of the
> > > core but distribute them as a separate download?
> >
> > Good idea.
>
> All in favour?

+0.5

I still think adding them to the core is okay, but that's me.

Cheers /F


From gward at python.net Thu Jun 14 22:11:49 2001
From: gward at python.net (Greg Ward)
Date: Thu, 14 Jun 2001 16:11:49 -0400
Subject: [Python-Dev] sre improvements
In-Reply-To: <3B28680B.A46CF171@ActiveState.com>; from paulp@ActiveState.com on Thu, Jun 14, 2001 at 12:30:19AM -0700
References: <20010613125408.W13940@tux.distro.conectiva> <15143.36590.447465.657241@beluga.mojam.com> <007801c0f426$84d1f220$4ffa42d5@hagrid> <20010613145320.G5114@gerg.ca> <3B28680B.A46CF171@ActiveState.com>
Message-ID: <20010614161149.C9884@gerg.ca>

On 14 June 2001, Paul Prescod said:

> I would rather see us try a new approach to regular expressions. I've
> seen a few proposals for more verbose-but-readable syntaxes. I think one
> was from Greg Ewing? And maybe one from Ping?

I remember Ping's from a few years back. It was pretty cool, but awfully verbose. I *like* the compactness of the One True Regex Language (ie. the one implemented by Perl 5, PCRE, and SRE).

> For those of us who use regular expressions only once in a while (i.e.
> the lucky ones), the current syntax is a holy terror. Which characters
> are magical again? In what contexts? With how many levels of
> backslashing? Upper case W versus lower case W?

Wow, you should try keeping grep vs. egrep vs. sed vs. awk (which version again?) vs. emacs straight. I generally don't bother: as soon as a problem gets too hairy for grep/sed/awk/etc., I whip out my trusty old friend "perl -e" and all is well again. Unless I'm already coding in Python of course, in which case I whip out my trusty old friend re.compile(), and everything just works. I guess I just have a good memory for line noise.

> Obviously we can never abandon the tried and true Perl5 RE module, but I
> think we could have another syntax on top.

Yeah, I s'pose it could be useful. Yet another great teaching tool, at any rate.

        Greg

--
Greg Ward - Python bigot                                gward at python.net
http://starship.python.net/~gward/
Quick!!  Act as if nothing has happened!
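[Editorial aside for readers of this thread: the stock re module already offers a small step in the "readable" direction via the re.VERBOSE flag, which ignores unescaped whitespace in the pattern and allows # comments. This is not any of the proposals discussed above, just a minimal sketch of what the existing API can do:]

    import re

    # Match a simple floating-point literal, documented part by part.
    float_re = re.compile(r"""
        [-+]?             # optional sign
        \d+               # integer part
        (\.\d*)?          # optional fractional part
        ([eE][-+]?\d+)?   # optional exponent
    """, re.VERBOSE)

    print float_re.match("-12.5e3").group(0)   # prints -12.5e3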
From greg at cosc.canterbury.ac.nz Fri Jun 15 02:56:50 2001
From: greg at cosc.canterbury.ac.nz (Greg Ewing)
Date: Fri, 15 Jun 2001 12:56:50 +1200 (NZST)
Subject: [Python-Dev] sre improvements
In-Reply-To: <20010614161149.C9884@gerg.ca>
Message-ID: <200106150056.MAA03621@s454.cosc.canterbury.ac.nz>

Paul Prescod:

> I think one
> was from Greg Ewing? And maybe one from Ping?

I can't remember what my first proposal (many years ago now) was like, but you might like to look at what I'm using in my Plex module:

    http://www.cosc.canterbury.ac.nz/~greg/python/Plex

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,          | A citizen of NewZealandCorp, a        |
Christchurch, New Zealand          | wholly-owned subsidiary of USA Inc.   |
greg at cosc.canterbury.ac.nz        +--------------------------------------+


From paulp at ActiveState.com Fri Jun 15 03:36:13 2001
From: paulp at ActiveState.com (Paul Prescod)
Date: Thu, 14 Jun 2001 18:36:13 -0700
Subject: [Python-Dev] sre improvements
References: <200106150056.MAA03621@s454.cosc.canterbury.ac.nz>
Message-ID: <3B29668D.ADFB3C22@ActiveState.com>

Greg Ewing wrote:
>
> Paul Prescod:
>
> > I think one
> > was from Greg Ewing? And maybe one from Ping?
>
> I can't remember what my first proposal (many years ago
> now) was like, but you might like to look at what I'm
> using in my Plex module:
>
> http://www.cosc.canterbury.ac.nz/~greg/python/Plex

I would be interested in *both* your regular expression library and your lexer for the Python standard library. But separately. Maybe we need two short PEPs that point to the documentation and suggest how the two packages could be integrated into the standard library. What do you think?

--
Take a recipe. Leave a recipe.
Python Cookbook! http://www.ActiveState.com/pythoncookbook


From greg at cosc.canterbury.ac.nz Fri Jun 15 03:49:04 2001
From: greg at cosc.canterbury.ac.nz (Greg Ewing)
Date: Fri, 15 Jun 2001 13:49:04 +1200 (NZST)
Subject: [Python-Dev] sre improvements
In-Reply-To: <3B29668D.ADFB3C22@ActiveState.com>
Message-ID: <200106150149.NAA03631@s454.cosc.canterbury.ac.nz>

> I would be interested in *both* your regular expression library and your
> lexer for the Python standard library. But separately.

Well, the regular expressions aren't really a separable part of Plex. I mentioned it as a possible source of ideas for anyone working on a new syntax for the regexp stuff.

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,          | A citizen of NewZealandCorp, a        |
Christchurch, New Zealand          | wholly-owned subsidiary of USA Inc.   |
greg at cosc.canterbury.ac.nz        +--------------------------------------+


From mal at lemburg.com Fri Jun 15 09:58:47 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Fri, 15 Jun 2001 09:58:47 +0200
Subject: [Python-Dev] Adding Asian codecs to the core
References: <20010612234815.2C90599C82@waltz.rahul.net> <3B26B491.CA8536BD@ActiveState.com> <3B2710BB.CFD8215@lemburg.com> <3B27A141.6C69EC55@ActiveState.com> <02ef01c0f4c8$7bdc8520$0900a8c0@spiff>
Message-ID: <3B29C037.FB1DB6B8@lemburg.com>

Fredrik Lundh wrote:
>
> Paul wrote:
> > > > What if we add them to CVS and formally maintain them as part of the
> > > > core but distribute them as a separate download?
> > >
> > > Good idea.
> >
> > All in favour?
>
> +0.5
>
> I still think adding them to the core is okay, but that's me.

What would be the threshold for doing so ?
Tamito is actively working on reducing the table sizes of the codecs and after what I have seen you do on these sort of tables I am pretty sure Tamito can turn these tables into shared libs which are smaller than 200k.

--
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting: http://www.egenix.com/
Python Software: http://www.lemburg.com/python/


From MarkH at ActiveState.com Fri Jun 15 10:05:26 2001
From: MarkH at ActiveState.com (Mark Hammond)
Date: Fri, 15 Jun 2001 18:05:26 +1000
Subject: [Python-Dev] Adding Asian codecs to the core
In-Reply-To: <3B29C037.FB1DB6B8@lemburg.com>
Message-ID: 

> > I still think adding them to the core is okay, but that's me.
>
> What would be the threshold for doing so ?
>
> Tamito is actively working on reducing the table sizes of the codecs
> and after what I have seen you do on these sort of tables I am pretty
> sure Tamito can turn these tables into shared libs which are smaller
> than 200k.

But isn't this set only one of the many possible Asian codecs? I would have no objection to one 200k module, but if we really wanted to handle "asian codecs" I believe this is only the start. For this reason, I would give a -0 to adding these to the core, and a +1 to adding them to the directory structure proposed by Guido.

Mark.


From guido at digicool.com Fri Jun 15 18:59:40 2001
From: guido at digicool.com (Guido van Rossum)
Date: Fri, 15 Jun 2001 12:59:40 -0400
Subject: [Python-Dev] recognizing \u escapes in regular expressions
Message-ID: <200106151659.MAA30396@cj20424-a.reston1.va.home.com>

> during a late hacking pass, I was perplexed to realize that
> r"[\u0000-\uffff]" didn't match any unicode character, and reported
> it as bug #420011.
>
> but a few minutes later, I realized that SRE doesn't support \u and
> \U escapes at all -- and that the pattern u"[\u0000-\uffff]" works
> as expected.
>
> should I close the bug report, or turn it into a feature request?

You meant ur"[\u0000-\uffff]", right? (It works the same -- Unicode raw strings still do \u expansion, although the rationale escapes me at the moment -- as does the rationale for why ru"..." is a syntax error...)

Looks like a feature request to me. Since \000 and \x00 work in that context, \u0000 would be expected to work. And suppose someone uses u"[\u0000-\u005d]"...

--Guido van Rossum (home page: http://www.python.org/~guido/)


From guido at digicool.com Fri Jun 15 21:00:26 2001
From: guido at digicool.com (Guido van Rossum)
Date: Fri, 15 Jun 2001 15:00:26 -0400
Subject: [Python-Dev] gen-branch: CVS branch for Neil's generator patch
Message-ID: <200106151900.PAA31935@cj20424-a.reston1.va.home.com>

I've checked in Neil's latest generator patch into a branch of the CVS tree. That makes it (hopefully) easier for folks to play with.

Tim, can you update the PEP to point to this branch? (There's some boilerplate code about branches in PEP 252 or 253 that you could adapt.)

I had to change the code in ceval.c because of recent conflicting changes there. The test suite runs (except test_inspect), but I'd appreciate it if someone (Neil?) could make sure that I didn't overlook anything. (I should probably check the CVS logs. :-)

--Guido van Rossum (home page: http://www.python.org/~guido/)

PS. If you saw a checkin of Grammar/Grammar in the *head* branch, that was a mistake, and I've already corrected it.
From paulp at ActiveState.com Fri Jun 15 21:19:08 2001
From: paulp at ActiveState.com (Paul Prescod)
Date: Fri, 15 Jun 2001 12:19:08 -0700
Subject: [Python-Dev] Adding Asian codecs to the core
References: <20010612234815.2C90599C82@waltz.rahul.net> <3B26B491.CA8536BD@ActiveState.com> <3B2710BB.CFD8215@lemburg.com> <3B27A141.6C69EC55@ActiveState.com> <02ef01c0f4c8$7bdc8520$0900a8c0@spiff> <3B29C037.FB1DB6B8@lemburg.com>
Message-ID: <3B2A5FAC.C5089CC2@ActiveState.com>

"M.-A. Lemburg" wrote:
>
>...
>
> What would be the threshold for doing so ?
>
> Tamito is actively working on reducing the table sizes of the codecs
> and after what I have seen you do on these sort of tables I am pretty
> sure Tamito can turn these tables into shared libs which are smaller
> than 200k.

Don't forget Chinese (Taiwan and mainland) and Korean! I guess I don't see the big deal in making them separate downloads. We can use distutils to make them easy to install .exe's for Reference Python and PPM for ActivePython.

--
Take a recipe. Leave a recipe.
Python Cookbook! http://www.ActiveState.com/pythoncookbook


From mal at lemburg.com Fri Jun 15 22:05:47 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Fri, 15 Jun 2001 22:05:47 +0200
Subject: [Python-Dev] Adding Asian codecs to the core
References: <20010612234815.2C90599C82@waltz.rahul.net> <3B26B491.CA8536BD@ActiveState.com> <3B2710BB.CFD8215@lemburg.com> <3B27A141.6C69EC55@ActiveState.com> <02ef01c0f4c8$7bdc8520$0900a8c0@spiff> <3B29C037.FB1DB6B8@lemburg.com> <3B2A5FAC.C5089CC2@ActiveState.com>
Message-ID: <3B2A6A9B.AC156262@lemburg.com>

Paul Prescod wrote:
>
> Don't forget Chinese (Taiwan and mainland) and Korean!
>
> I guess I don't see the big deal in making them separate downloads. We
> can use distutils to make them easy to install .exe's for Reference
> Python and PPM for ActivePython.

Ok.

BTW, how come www.python.org no longer provides precompiled (contributed) binaries for the various OSes out there ? The FTP server only has these for Python <= 1.5.2.

--
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting: http://www.egenix.com/
Python Software: http://www.lemburg.com/python/


From tim.one at home.com Fri Jun 15 23:39:42 2001
From: tim.one at home.com (Tim Peters)
Date: Fri, 15 Jun 2001 17:39:42 -0400
Subject: [Python-Dev] gen-branch: CVS branch for Neil's generator patch
In-Reply-To: <200106151900.PAA31935@cj20424-a.reston1.va.home.com>
Message-ID: 

[Guido]
> I've checked in Neil's latest generator patch into a branch of the CVS
> tree. That makes it (hopefully) easier for folks to play with.

It will for me, and I thank you.

> Tim, can you update the PEP to point to this branch?

Done.


From martin at loewis.home.cs.tu-berlin.de Sat Jun 16 00:17:49 2001
From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Sat, 16 Jun 2001 00:17:49 +0200
Subject: [Python-Dev] recognizing \u escapes in regular expressions
Message-ID: <200106152217.f5FMHnI01360@mira.informatik.hu-berlin.de>

> should I close the bug report, or turn it into a feature request?

I think the bug report can be closed.
Myself, I found it sufficient that you can write normal \u escapes in strings, in particular as you can also use them in raw strings:

    >>> ur"Ha\u006Clo"
    u'Hallo'

Perhaps not very intuitive, and perhaps even a bug (how do you put a backslash in front of a "u" in a raw unicode string), but useful in this context.

Regards,
Martin


From guido at digicool.com Sat Jun 16 17:46:14 2001
From: guido at digicool.com (Guido van Rossum)
Date: Sat, 16 Jun 2001 11:46:14 -0400
Subject: [Python-Dev] 2.0.1's GPL-compatibility is official!
Message-ID: <200106161546.LAA05521@cj20424-a.reston1.va.home.com>

Richard Stallman, Eben Moglen and the FSF agree: Python 2.0.1 is compatible with the GPL. They've updated the text about the Python license on http://www.gnu.org/philosophy/license-list.html, stating in particular:

    GPL-Compatible, Free Software Licenses

    [...]

    The License of Python 1.6a2 and earlier versions.
        This is a free software license and is compatible with the
        GNU GPL. Please note, however, that newer versions of Python
        are under other licenses (see below).

    The License of Python 2.0.1, 2.1.1, and newer versions.
        This is a free software license and is compatible with the
        GNU GPL. Please note, however, that intermediate versions of
        Python (1.6b1, through 2.0 and 2.1) are under a different
        license (see below).

I would like to emphasize and clarify (again!) that Python is *not* released under the GPL, so if you think the GPL is a bad thing, you don't have to worry about Python being contaminated. The GPL compatibility is important for folks who distribute Python binaries: e.g. the new license makes it okay to release Python binaries linked with GNU readline and other GPL-covered libraries.

We'll release the final release of 2.0.1 within a week; so far we've had only one bug reported in the release candidate. I expect that we won't have to wait long for 2.1.1, which will have the same GPL-compatible license as 2.0.1.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From guido at digicool.com Sat Jun 16 18:10:27 2001
From: guido at digicool.com (Guido van Rossum)
Date: Sat, 16 Jun 2001 12:10:27 -0400
Subject: [Python-Dev] contributed binaries (was: Adding Asian codecs...)
Message-ID: <200106161610.MAA05684@cj20424-a.reston1.va.home.com>

> BTW, how come www.python.org no longer provides precompiled
> (contributed) binaries for the various OSes out there ?
> The FTP server only has these for Python <= 1.5.2.

There are some binaries for newer versions, mostly Linux RPMs, but these are in different places. I agree the FTP download area is a mess. I propose to give up on the FTP area and start over on the new Zope-based web server, if and when it's ready. Not enough people are helping out, so it's going slowly.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From mal at lemburg.com Sat Jun 16 20:59:52 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Sat, 16 Jun 2001 20:59:52 +0200
Subject: [Python-Dev] recognizing \u escapes in regular expressions
References: <200106152217.f5FMHnI01360@mira.informatik.hu-berlin.de>
Message-ID: <3B2BACA7.CDA96737@lemburg.com>

"Martin v. Loewis" wrote:
>
> > should I close the bug report, or turn it into a feature request?
>
> I think the bug report can be closed.
> Myself, I found it sufficient
> that you can write normal \u escapes in strings, in particular as you
> can also use them in raw strings:
>
>     >>> ur"Ha\u006Clo"
>     u'Hallo'
>
> Perhaps not very intuitive, and perhaps even a bug (how do you put a
> backslash in front of a "u" in a raw unicode string), but useful in
> this context.

    >>> print ur"backslash in front of an 'u': \u005cu"
    backslash in front of an 'u': \u

A double backslash is easier to have:

    >>> print ur"double backslash in front of an 'u': \\u"
    double backslash in front of an 'u': \\u

Python uses C's convention for \uXXXX where \u is only interpreted as a Unicode escape if it is used with an odd number of backslashes in front of it.

--
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting: http://www.egenix.com/
Python Software: http://www.lemburg.com/python/


From tim.one at home.com Mon Jun 18 02:57:53 2001
From: tim.one at home.com (Tim Peters)
Date: Sun, 17 Jun 2001 20:57:53 -0400
Subject: [Python-Dev] Re: Why not Lisp-like list-related functions ?
In-Reply-To: <20010614111928.A4560@ute.cnri.reston.va.us>
Message-ID: 

[Guido]
> Maybe we also have a smaller brain than the typical Lisper -- I would
> say, that would make us more normal, and if Python caters to people
> with a closer-to-average brain size, that would mean more people will
> be able to program in Python. History will decide...

[Andrew Kuchling]
> I thought it already has, pretty much.

OK, I've kept quiet for days, but can't bear it any longer: Andrew, are you waiting for someone to *force* you to immortalize this exchange in your Python Quotes collection? If so, the PSU knows where you liv


From mal at lemburg.com Mon Jun 18 12:14:04 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Mon, 18 Jun 2001 12:14:04 +0200
Subject: [Python-Dev] Adding Asian codecs to the core
References: <200106141131.HAA25522@cj20424-a.reston1.va.home.com>
Message-ID: <3B2DD46C.EEC20857@lemburg.com>

Guido van Rossum wrote:
>
> > > > What if we add them to CVS and formally maintain them as part of the
> > > > core but distribute them as a separate download?
> > >
> > > Good idea.
> >
> > All in favour?
>
> +1, as long as they're not in the CVS subtree that's normally
> extracted for a regular source distribution. I propose this location
> in the CVS tree:
>
>     python/dist/encodings/...
>
> (So 'encodings' would be a sibling of 'src', which has been pretty
> lonely ever since I started using CVS. ;-)

Ok. When Tamito has completed his work on the codecs (he is currently reimplementing them in C), I'll check them in under the new directory.

BTW, how should we ship these codecs ?

I'd propose to provide a distutils setup.py file which wraps up all codecs under encodings and can be used to create a standard Python add-on "Python-X.X Encoding Add-on".

The generated files should then ideally be published right next to the Python source/binary links on the python.org web-pages to achieve high visibility.

--
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting: http://www.egenix.com/
Python Software: http://www.lemburg.com/python/
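[Editorial aside: for readers unfamiliar with distutils, the setup.py wrapper proposed above could be as small as the following sketch. The distribution name, version and package layout here are invented for illustration; the real ones would depend on how the codecs land in CVS.]

    # setup.py -- hypothetical packaging for the encodings add-on
    from distutils.core import setup

    setup(name="Python-Encodings",           # illustrative name only
          version="1.0",
          description="Asian codecs add-on for Python",
          packages=["encodings.japanese"],   # assumed package layout
         )

[Running "python setup.py sdist" would then build a source archive, and on Windows "python setup.py bdist_wininst" would produce the kind of easy-to-install .exe Paul mentions.]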
From guido at digicool.com Mon Jun 18 14:25:35 2001
From: guido at digicool.com (Guido van Rossum)
Date: Mon, 18 Jun 2001 08:25:35 -0400
Subject: [Python-Dev] Adding Asian codecs to the core
In-Reply-To: Your message of "Mon, 18 Jun 2001 12:14:04 +0200." <3B2DD46C.EEC20857@lemburg.com>
References: <200106141131.HAA25522@cj20424-a.reston1.va.home.com> <3B2DD46C.EEC20857@lemburg.com>
Message-ID: <200106181225.IAA15518@cj20424-a.reston1.va.home.com>

> Ok. When Tamito has completed his work on the codecs (he is currently
> reimplementing them in C), I'll check them in under the new directory.

Excellent!

> BTW, how should we ship these codecs ?
>
> I'd propose to provide a distutils setup.py file which wraps up
> all codecs under encodings and can be used to create a standard
> Python add-on "Python-X.X Encoding Add-on".

Sounds like a good plan.

> The generated files should then ideally be published right next
> to the Python source/binary links on the python.org web-pages to
> achieve high visibility.

Sure, for some definition of "right next to" :-)

--Guido van Rossum (home page: http://www.python.org/~guido/)


From thomas at xs4all.net Mon Jun 18 16:35:12 2001
From: thomas at xs4all.net (Thomas Wouters)
Date: Mon, 18 Jun 2001 16:35:12 +0200
Subject: [Python-Dev] Moshe
Message-ID: <20010618163512.D8098@xs4all.nl>

Just FYI: Moshe has been sighted, alive and well. He's been caught up in personal matters, apparently. He apologized and said he'd mail python-dev with an update soonish.

Don't-you-wish-you-lurked-on-#python-too-ly y'rs ;)

--
Thomas Wouters

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!


From m.favas at per.dem.csiro.au Mon Jun 18 23:28:23 2001
From: m.favas at per.dem.csiro.au (Mark Favas)
Date: Tue, 19 Jun 2001 05:28:23 +0800
Subject: [Python-Dev] Anyone else seeing test_struct fail?
Message-ID: <3B2E7277.D6109E7E@per.dem.csiro.au>

[Platform: Tru64 Unix, Compaq C compiler]

The current CVS of 2.2a0 fails test_struct for me with:

    test test_struct failed -- pack('>i', -2147483649) did not raise error

more extensively,

    trying std iI on -2147483649 == 0xffffffff7fffffff
    Traceback (most recent call last):
      File "Lib/test/test_struct.py", line 367, in ?
        t.run()
      File "Lib/test/test_struct.py", line 353, in run
        self.test_one(x)
      File "Lib/test/test_struct.py", line 269, in test_one
        any_err(pack, ">" + code, x)
      File "Lib/test/test_struct.py", line 38, in any_err
        raise TestFailed, "%s%s did not raise error" % (
    test_support.TestFailed: pack('>i', -2147483649) did not raise error

A 64-bit platform issue?

Also, the current imap.py causes "make test" (test___all__ and test_sundry) to fail with: "exceptions.TabError: inconsistent use of tabs and spaces in indentation (imaplib.py, line 576)" - untested checkin ?

--
Mark Favas - m.favas at per.dem.csiro.au
CSIRO, Private Bag No 5, Wembley, Western Australia 6913, AUSTRALIA


From tim at digicool.com Tue Jun 19 00:04:06 2001
From: tim at digicool.com (Tim Peters)
Date: Mon, 18 Jun 2001 18:04:06 -0400
Subject: [Python-Dev] Anyone else seeing test_struct fail?
In-Reply-To: <3B2E7277.D6109E7E@per.dem.csiro.au>
Message-ID: 

[Mark Favas]
> [Platform: Tru64 Unix, Compaq C compiler]
> The current CVS of 2.2a0 fails test_struct for me with:
>
> test test_struct failed -- pack('>i', -2147483649) did not raise error
>
> more extensively,
> trying std iI on -2147483649 == 0xffffffff7fffffff
> Traceback (most recent call last):
>   File "Lib/test/test_struct.py", line 367, in ?
>     t.run()
>   File "Lib/test/test_struct.py", line 353, in run
>     self.test_one(x)
>   File "Lib/test/test_struct.py", line 269, in test_one
>     any_err(pack, ">" + code, x)
>   File "Lib/test/test_struct.py", line 38, in any_err
>     raise TestFailed, "%s%s did not raise error" % (
> test_support.TestFailed: pack('>i', -2147483649) did not raise error
>
> A 64-bit platform issue?

In test_struct.py, please change this line (right after "class IntTester"):

    BUGGY_RANGE_CHECK = "bBhHIL"

to

    BUGGY_RANGE_CHECK = "bBhHiIlL"

and try again. I suspect you're bumping into a pre-existing bug that simply wasn't checked before (and, yes, there's A Reason it *may* screw up on a 64-bit box but not a 32-bit one). Note that since in standard mode, "i" is considered to be a 4-byte int regardless of platform, we really *should* bitch about trying to pack -2147483649 under "i" (but we don't -- and in general no codes except the new q/Q reliably bitch about out-of-range errors in the standard modes).

> Also, the current imap.py causes "make test" (test___all__ and
> test_sundry) to fail with: "exceptions.TabError: inconsistent use of
> tabs and spaces in indentation (imaplib.py, line 576)" - untested
> checkin ?

Leaving that to some loser who cares about whitespace .


From m.favas at per.dem.csiro.au Tue Jun 19 00:11:37 2001
From: m.favas at per.dem.csiro.au (Mark Favas)
Date: Tue, 19 Jun 2001 06:11:37 +0800
Subject: [Python-Dev] Anyone else seeing test_struct fail?
References: 
Message-ID: <3B2E7C99.E9BEFC3C@per.dem.csiro.au>

[Tim Peters suggests]
>
> [Mark Favas]
> > [Platform: Tru64 Unix, Compaq C compiler]
> > The current CVS of 2.2a0 fails test_struct for me with:
> >
> > test test_struct failed -- pack('>i', -2147483649) did not raise error
>
> In test_struct.py, please change this line (right after "class IntTester"):
>
>     BUGGY_RANGE_CHECK = "bBhHIL"
>
> to
>
>     BUGGY_RANGE_CHECK = "bBhHiIlL"
>
> and try again.

Yep, passes with this change.

> > Also, the current imap.py causes "make test" (test___all__ and
> > test_sundry) to fail with: "exceptions.TabError: inconsistent use of
> > tabs and spaces in indentation (imaplib.py, line 576)" - untested
> > checkin ?
>
> Leaving that to some loser who cares about whitespace .

Guess we'll have to advertise widely, then .

--
Mark Favas - m.favas at per.dem.csiro.au
CSIRO, Private Bag No 5, Wembley, Western Australia 6913, AUSTRALIA


From barry at digicool.com Tue Jun 19 00:28:21 2001
From: barry at digicool.com (Barry A. Warsaw)
Date: Mon, 18 Jun 2001 18:28:21 -0400
Subject: [Python-Dev] Bogosities in quopri module?
Message-ID: <15150.32901.611349.524220@yyz.digicool.com>

I've been playing a bit with the quopri module (trying to support RFC 2047 in mimelib), and I've run across a few bogosities that I'd like to fix. Fixing some of them could break code, so I wanted to see what people think first.

First, quopri should have encodestring() and decodestring() functions which take a string and return a string. This would make it more consistent API-wise with e.g. base64. One difference is that quopri.encodestring() should probably take a default argument quotetabs (defaulted to 1) for passing to the encode() function. This shouldn't be very controversial.

I think there are two problems with encode(). First, it always tacks on an extra \n character, such that an encode->decode roundtrip is not idempotent. I propose fixing this so that encode() doesn't add the extra newline, but this can break code that expects that newline to be present.
Second, I think that encode()'s quotetabs flag should also apply to spaces. RFC 1521 says that both ASCII tabs and spaces may be encoded, and I don't think it's worthwhile that there be a separate flag to independently choose to encode tabs or spaces.

Lastly, if you buy the extra-newline solution above, then encode() has to be fixed w.r.t. trailing spaces and tabs. Currently, an encode->decode roundtrip for, e.g. "hello " returns "hello =\n", but what it should really return is "hello=20". Likewise "hello\t" should return "hello=09". The patches must take multiline strings into account though, so that it doesn't chomp newlines out of

    """hello
    great
    big
    world
    """

I haven't worked up a patch yet, but when I do I'll upload it to SF to get some feedback. I think there are a few other things in the module that could be cleaned up. I also plan to add a test_quopri.py.

Comments?

-Barry
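[Editorial aside: for concreteness, the string-to-string wrappers proposed above could be layered on the existing file-to-file quopri.encode()/quopri.decode() along these lines. This is a minimal sketch of the idea, not the eventual patch; the function names and the quotetabs default are taken from the message above.]

    import quopri
    from StringIO import StringIO

    def encodestring(s, quotetabs=1):
        # Wrap the file-based API: read from one in-memory file,
        # write the quoted-printable result to another.
        infp, outfp = StringIO(s), StringIO()
        quopri.encode(infp, outfp, quotetabs)
        return outfp.getvalue()

    def decodestring(s):
        infp, outfp = StringIO(s), StringIO()
        quopri.decode(infp, outfp)
        return outfp.getvalue()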
From see at my.signature Tue Jun 19 08:21:14 2001
From: see at my.signature (Greg Ewing)
Date: Tue, 19 Jun 2001 18:21:14 +1200
Subject: [Python-Dev] Re: PEP 255: Simple Generators
References: 
Message-ID: <3B2EEF5A.8FF0DAFB@cosc.canterbury.ac.nz>

Something is bothering me about this. In fact, it's bothering me a LOT. In the following, will f() work as a generator-function:

    def f():
        for i in range(5):
            g(i)

    def g(i):
        for j in range(10):
            yield i,j

If I understand PEP 255 correctly, this will *not* work. But it seems entirely reasonable to me that it *should* work. It *has* to work, otherwise how am I to write generators that are too complicated to fit into a single function? Someone please tell me I'm wrong about this!

--
Greg Ewing, Computer Science Dept, University of Canterbury,
Christchurch, New Zealand
To get my email address, please visit my web page:
http://www.cosc.canterbury.ac.nz/~greg


From jepler at inetnebr.com Tue Jun 19 15:25:23 2001
From: jepler at inetnebr.com (Jeff Epler)
Date: Tue, 19 Jun 2001 08:25:23 -0500
Subject: [Python-Dev] Re: PEP 255: Simple Generators
In-Reply-To: <3B2EEF5A.8FF0DAFB@cosc.canterbury.ac.nz>; from see@my.signature on Tue, Jun 19, 2001 at 06:21:14PM +1200
References: <3B2EEF5A.8FF0DAFB@cosc.canterbury.ac.nz>
Message-ID: <20010619082522.A12200@inetnebr.com>

On Tue, Jun 19, 2001 at 06:21:14PM +1200, Greg Ewing wrote:
> Something is bothering me about this. In fact,
> it's bothering me a LOT. In the following, will
> f() work as a generator-function:
>
>     def f():
>         for i in range(5):
>             g(i)
>
>     def g(i):
>         for j in range(10):
>             yield i,j
>
> If I understand PEP255 correctly, this will *not*
> work. But it seems entirely reasonable to me that
> it *should* work. It *has* to work, otherwise how
> am I to write generators that are too complicated
> to fit into a single function?

The following similar code seems to produce the results you have in mind.

    def f():
        for i in range(5):
            #g(i)
            #yield g(i)
            for x in g(i):
                yield x

    def g(i):
        for j in range(10):
            yield i, j

It would be nice to have a succinct way to say 'for dummy in iterator: yield dummy'. Maybe 'yield from iterator'? Then f would become:

    def f():
        for i in range(5):
            yield from g(i)

Jeff

PS I noticed that the generator branch got merged into the trunk. Cool!


From fdrake at acm.org Tue Jun 19 15:24:46 2001
From: fdrake at acm.org (Fred L. Drake, Jr.)
Date: Tue, 19 Jun 2001 09:24:46 -0400 (EDT)
Subject: [Python-Dev] Python & GCC 3.0
Message-ID: <15151.21150.214238.429130@cj42289-a.reston1.va.home.com>

I built GCC 3.0 last night, and Python built and passed the regression tests. I've not done any further comparisons, but using --with-cxx=... failed; the C++ ABI changed and a new version of the C++ runtime is required before that will work. I didn't want to install that over my working installation, just in case. ;-)

I'll report more as I find out more.

-Fred

--
Fred L. Drake, Jr.
PythonLabs at Digital Creations


From nas at python.ca Tue Jun 19 16:00:39 2001
From: nas at python.ca (Neil Schemenauer)
Date: Tue, 19 Jun 2001 07:00:39 -0700
Subject: [Python-Dev] Re: PEP 255: Simple Generators
In-Reply-To: <3B2EEF5A.8FF0DAFB@cosc.canterbury.ac.nz>; from see@my.signature on Tue, Jun 19, 2001 at 06:21:14PM +1200
References: <3B2EEF5A.8FF0DAFB@cosc.canterbury.ac.nz>
Message-ID: <20010619070039.A13712@glacier.fnational.com>

Greg Ewing wrote:
> Something is bothering me about this. In fact,
> it's bothering me a LOT. In the following, will
> f() work as a generator-function:
>
>     def f():
>         for i in range(5):
>             g(i)
>
>     def g(i):
>         for j in range(10):
>             yield i,j
>
> If I understand PEP255 correctly, this will *not*
> work.

No, it will not work. The title of PEP 255 is "Simple Generators". What you want will require something like stackless in order to get the C stack out of the way. That's a major change to the Python internals. To make your example work you need to do:

    def f():
        for i in range(5):
            for j in g(i):
                yield j

    def g(i):
        for j in range(10):
            yield i,j

Stackless may still be in Python's future but not for 2.2.

Neil


From barry at digicool.com Tue Jun 19 16:19:58 2001
From: barry at digicool.com (Barry A. Warsaw)
Date: Tue, 19 Jun 2001 10:19:58 -0400
Subject: [Python-Dev] Python & GCC 3.0
References: <15151.21150.214238.429130@cj42289-a.reston1.va.home.com>
Message-ID: <15151.24462.400930.295658@anthem.wooz.org>

>>>>> "Fred" == Fred L Drake, Jr writes:

    Fred> I built GCC 3.0 last night, and Python built and passed
    Fred> the regression tests.

Hey, you were actually able to download it!? :) I couldn't get an ftp connection for the longest time and finally gave up.

It'd be interesting to see if there are any performance improvements, esp. on x86 boxen.

-Barry


From fdrake at acm.org Tue Jun 19 17:07:48 2001
From: fdrake at acm.org (Fred L. Drake, Jr.)
Date: Tue, 19 Jun 2001 11:07:48 -0400 (EDT)
Subject: [Python-Dev] Python & GCC 3.0
In-Reply-To: <15151.24462.400930.295658@anthem.wooz.org>
References: <15151.21150.214238.429130@cj42289-a.reston1.va.home.com> <15151.24462.400930.295658@anthem.wooz.org>
Message-ID: <15151.27332.418303.872171@cj42289-a.reston1.va.home.com>

Barry A. Warsaw writes:
> It'd be interesting to see if there are any performance
> improvements, esp. on x86 boxen.

GCC 2.95.3:

    cj42289-a(.../python/linux); time ./python -tt ../Lib/test/pystone.py
    Pystone(1.1) time for 10000 passes = 1.58
    This machine benchmarks at 6329.11 pystones/second
    1.66user 0.01system 0:03.40elapsed 49%CPU (0avgtext+0avgdata 0maxresident)k
    0inputs+0outputs (280major+241minor)pagefaults 0swaps

GCC 3.0:

    cj42289-a(.../python/linux); cd ../linux-gcc-3.0/
    cj42289-a(.../python/linux-gcc-3.0); time ./python -tt ../Lib/test/pystone.py
    Pystone(1.1) time for 10000 passes = 1.65
    This machine benchmarks at 6060.61 pystones/second
    1.77user 0.01system 0:03.52elapsed 50%CPU (0avgtext+0avgdata 0maxresident)k
    0inputs+0outputs (307major+239minor)pagefaults 0swaps

There is a little variation across multiple runs, but it varies less than 5% from the numbers above. Bumping up the LOOPS constant in pystone.py changes the numbers a small bit, but the relationship remains constant.
This is on a Linux-Mandrake 7.2 installation with non-cooker updates installed, and still using the Linux 2.2 kernel:

    cj42289-a(.../python/linux-gcc-3.0); uname -a
    Linux cj42289-a.reston1.va.home.com 2.2.17-21mdk #1 Thu Oct 5 13:16:08 CEST 2000 i686 unknown

-Fred

--
Fred L. Drake, Jr.
PythonLabs at Digital Creations


From dan at cgsoftware.com Tue Jun 19 18:19:14 2001
From: dan at cgsoftware.com (Daniel Berlin)
Date: 19 Jun 2001 12:19:14 -0400
Subject: [Python-Dev] Python & GCC 3.0
In-Reply-To: <15151.27332.418303.872171@cj42289-a.reston1.va.home.com> ("Fred L. Drake, Jr."'s message of "Tue, 19 Jun 2001 11:07:48 -0400 (EDT)")
References: <15151.21150.214238.429130@cj42289-a.reston1.va.home.com> <15151.24462.400930.295658@anthem.wooz.org> <15151.27332.418303.872171@cj42289-a.reston1.va.home.com>
Message-ID: <87vglsbfy5.fsf@cgsoftware.com>

"Fred L. Drake, Jr." writes:
> Barry A. Warsaw writes:
> > It'd be interesting to see if there are any performance
> > improvements, esp. on x86 boxen.

Except, I bet you didn't use one of the "optimize for a given cpu" switches. Try adding -mpentiumpro -march=pentiumpro to your compiler flags. Otherwise, it's scheduling for a 386. And the old x86 backend wasn't all that bad at scheduling for the 386. Hell, I'm not that bad at scheduling for a 386. :)

--Dan

> GCC 2.95.3:
>
> cj42289-a(.../python/linux); time ./python -tt ../Lib/test/pystone.py
> Pystone(1.1) time for 10000 passes = 1.58
> This machine benchmarks at 6329.11 pystones/second
> 1.66user 0.01system 0:03.40elapsed 49%CPU (0avgtext+0avgdata 0maxresident)k
> 0inputs+0outputs (280major+241minor)pagefaults 0swaps
>
> GCC 3.0:
>
> cj42289-a(.../python/linux); cd ../linux-gcc-3.0/
> cj42289-a(.../python/linux-gcc-3.0); time ./python -tt ../Lib/test/pystone.py
> Pystone(1.1) time for 10000 passes = 1.65
> This machine benchmarks at 6060.61 pystones/second
> 1.77user 0.01system 0:03.52elapsed 50%CPU (0avgtext+0avgdata 0maxresident)k
> 0inputs+0outputs (307major+239minor)pagefaults 0swaps
>
> There is a little variation with multiple runs, but it varies less than
> 5% from the numbers above. Bumping up the LOOPS constant in
> pystone.py changes the numbers a small bit, but the relationship
> remains constant.
>
> -Fred

--
"If all the nations in the world are in debt, where did all the money go?" -Steven Wright


From mal at lemburg.com Tue Jun 19 18:55:47 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Tue, 19 Jun 2001 18:55:47 +0200
Subject: [Python-Dev] Python & GCC 3.0
References: <15151.21150.214238.429130@cj42289-a.reston1.va.home.com> <15151.24462.400930.295658@anthem.wooz.org> <15151.27332.418303.872171@cj42289-a.reston1.va.home.com>
Message-ID: <3B2F8413.77F40494@lemburg.com>

"Fred L. Drake, Jr." wrote:
>
> Barry A. Warsaw writes:
> > It'd be interesting to see if there are any performance
> > improvements, esp. on x86 boxen.
> GCC 2.95.3:
>
> cj42289-a(.../python/linux); time ./python -tt ../Lib/test/pystone.py
> Pystone(1.1) time for 10000 passes = 1.58
> This machine benchmarks at 6329.11 pystones/second
> 1.66user 0.01system 0:03.40elapsed 49%CPU (0avgtext+0avgdata 0maxresident)k
> 0inputs+0outputs (280major+241minor)pagefaults 0swaps
>
> GCC 3.0:
>
> cj42289-a(.../python/linux); cd ../linux-gcc-3.0/
> cj42289-a(.../python/linux-gcc-3.0); time ./python -tt ../Lib/test/pystone.py
> Pystone(1.1) time for 10000 passes = 1.65
> This machine benchmarks at 6060.61 pystones/second
> 1.77user 0.01system 0:03.52elapsed 50%CPU (0avgtext+0avgdata 0maxresident)k
> 0inputs+0outputs (307major+239minor)pagefaults 0swaps
>
> There is a little variation with multiple runs, but it varies less than
> 5% from the numbers above. Bumping up the LOOPS constant in
> pystone.py changes the numbers a small bit, but the relationship
> remains constant.
>
> This is on a Linux-Mandrake 7.2 installation with non-cooker updates
> installed, and still using the Linux 2.2 kernel:
>
> cj42289-a(.../python/linux-gcc-3.0); uname -a
> Linux cj42289-a.reston1.va.home.com 2.2.17-21mdk #1 Thu Oct 5 13:16:08 CEST 2000 i686 unknown

Note that if you really want to see a speedup for x86 boxes then you should take a look at PGCC, the Pentium GCC compiler group:

    http://www.goof.com/pcg/

You can then adjust the compiler to various x86 CPUs and take advantage of some special optimizations they have integrated into 2.95.2.1.

--
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting: http://www.egenix.com/
Python Software: http://www.lemburg.com/python/


From skip at pobox.com Tue Jun 19 19:44:47 2001
From: skip at pobox.com (Skip Montanaro)
Date: Tue, 19 Jun 2001 12:44:47 -0500
Subject: [Python-Dev] example of module interface to a varargs function?
Message-ID: <15151.36751.406758.577420@beluga.mojam.com>

I am trying to add a module interface to some of the bits missing from PyGtk2. Some functions I'm interested in have varargs signatures, e.g.:

    void gtk_binding_entry_add_signal (GtkBindingSet  *binding_set,
                                       guint           keyval,
                                       guint           modifiers,
                                       const gchar    *signal_name,
                                       guint           n_args,
                                       ...)


From fdrake at acm.org Tue Jun 19 21:04:18 2001
From: fdrake at acm.org (Fred L. Drake, Jr.)
Date: Tue, 19 Jun 2001 15:04:18 -0400 (EDT)
Subject: [Python-Dev] Python & GCC 3.0
In-Reply-To: <87vglsbfy5.fsf@cgsoftware.com>
References: <15151.21150.214238.429130@cj42289-a.reston1.va.home.com> <15151.24462.400930.295658@anthem.wooz.org> <15151.27332.418303.872171@cj42289-a.reston1.va.home.com> <3B2F8413.77F40494@lemburg.com> <87vglsbfy5.fsf@cgsoftware.com>
Message-ID: <15151.41522.200832.655534@cj42289-a.reston1.va.home.com>

Daniel Berlin writes:
> Except, I bet you didn't use one of the "optimize for a given cpu"
> switches.

No, I hadn't. My main interest was in the GCC team's claim that the generated code was faster. Compiling with "make OPT='-mcpu=i686 -O3'" did not make much difference at all.

M.-A. Lemburg writes:
> Note that if you really want to see a speedup for x86 boxes then
> you should take a look at PGCC, the Pentium GCC compiler group:
>
>     http://www.goof.com/pcg/
>
> You can then adjust the compiler to various x86 CPUs and
> take advantage of some special optimizations they have integrated
> into 2.95.2.1.

If they have any improved optimizations for recent x86 chips, I'd like to see them folded into GCC. I'd hate to see another egcs-style split.
It doesn't look like I can just download a single source package from them and wait 3 hours for it to build, so I won't plan on pursuing this further.

-Fred

--
Fred L. Drake, Jr.
PythonLabs at Digital Creations


From tim at digicool.com Tue Jun 19 21:14:10 2001
From: tim at digicool.com (Tim Peters)
Date: Tue, 19 Jun 2001 15:14:10 -0400
Subject: [Python-Dev] Python & GCC 3.0
In-Reply-To: <15151.27332.418303.872171@cj42289-a.reston1.va.home.com>
Message-ID: 

[Fred L. Drake, Jr.]
> GCC 2.95.3:
> This machine benchmarks at 6329.11 pystones/second
> ...
> GCC 3.0:
> This machine benchmarks at 6060.61 pystones/second
> ...
> This is on a Linux-Mandrake 7.2 installation with non-cooker updates
> installed, and still using the Linux 2.2 kernel:
>
> cj42289-a(.../python/linux-gcc-3.0); uname -a
> Linux cj42289-a.reston1.va.home.com 2.2.17-21mdk #1 Thu Oct 5
> 13:16:08 CEST 2000 i686 unknown

This is a good place to note that the single biggest "easy win" for pystone is to run it with -O (that is, Python's -O). Yields a 10% boost on Fred's box, and about 7% on MSVC6+Win2K.

pystone is more sensitive to -O than most "real Python apps", probably because it's masses of very simple operations on scalar types -- no real classes, no dicts, no lists except to simulate fixed-size C arrays, lots of globals, and so on. The dynamic frequency of SET_LINENO is high, and the avg work per other opcode is low. OTOH, that's typical of *some* Python apps, and typical of *parts* of almost all Python apps. So it would be worth getting rid of SET_LINENO even in non- -O runs.

Note that SET_LINENO isn't needed to get correct line numbers in tracebacks (and hasn't been needed for years), it's "just" there to support tracing now. Vladimir had what looked to be a workable scheme for doing that a different way, and that would be a cool project for someone to revive (IMO -- Guido's may differ, but he's too busy to notice what we're doing ).


From michel at digicool.com Tue Jun 19 21:12:14 2001
From: michel at digicool.com (Michel Pelletier)
Date: Tue, 19 Jun 2001 12:12:14 -0700 (PDT)
Subject: [Python-Dev] Anyone else seeing test_struct fail?
In-Reply-To: <3B2E7277.D6109E7E@per.dem.csiro.au>
Message-ID: 

On Tue, 19 Jun 2001, Mark Favas wrote:

> Also, the current imap.py causes "make test" (test___all__ and
> test_sundry) to fail with: "exceptions.TabError: inconsistent use of
> tabs and spaces in indentation (imaplib.py, line 576)" - untested
> checkin ?

I submitted a patch right on this line the other day that Guido applied, but I tested it and neither test___all__ nor test_sundry fail for me today.

-Michel


From mal at lemburg.com Tue Jun 19 21:28:14 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Tue, 19 Jun 2001 21:28:14 +0200
Subject: [Python-Dev] Python & GCC 3.0
References: <15151.21150.214238.429130@cj42289-a.reston1.va.home.com> <15151.24462.400930.295658@anthem.wooz.org> <15151.27332.418303.872171@cj42289-a.reston1.va.home.com> <3B2F8413.77F40494@lemburg.com> <87vglsbfy5.fsf@cgsoftware.com> <15151.41522.200832.655534@cj42289-a.reston1.va.home.com>
Message-ID: <3B2FA7CE.DD1602F7@lemburg.com>

"Fred L. Drake, Jr." wrote:
>
> Daniel Berlin writes:
> > Except, I bet you didn't use one of the "optimize for a given cpu"
> > switches.
>
> No, I hadn't. My main interest was in the GCC team's claim that the
> generated code was faster. Compiling with "make OPT='-mcpu=i686 -O3'"
> did not make much difference at all.
>
> M.-A.
> Lemburg writes:
> > Note that if you really want to see a speedup for x86 boxes then
> > you should take a look at PGCC, the Pentium GCC compiler group:
> >
> >     http://www.goof.com/pcg/
> >
> > You can then adjust the compiler to various x86 CPUs and
> > take advantage of some special optimizations they have integrated
> > into 2.95.2.1.
>
> If they have any improved optimizations for recent x86 chips, I'd
> like to see them folded into GCC. I'd hate to see another egcs-style
> split.
> It doesn't look like I can just download a single source package
> from them and wait 3 hours for it to build, so I won't plan on
> pursuing this further.

Oh, it's fairly easy to get a pgcc compiler: all you have to do is apply their small set of patches to the gcc source before compiling it. And then you should set your OPT environment variable to e.g.

    OPT="-g -O3 -Wall -Wstrict-prototypes -mcpu=k6"

This will cause the pgcc compiler to use these settings in pretty much all compiles you ever do without having to think about it every time.

--
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting: http://www.egenix.com/
Python Software: http://www.lemburg.com/python/


From tim at digicool.com Tue Jun 19 21:36:41 2001
From: tim at digicool.com (Tim Peters)
Date: Tue, 19 Jun 2001 15:36:41 -0400
Subject: [Python-Dev] Anyone else seeing test_struct fail?
In-Reply-To: 
Message-ID: 

[Michel Pelletier]
> I submitted a patch right on this line the other day that Guido applied,
> but I tested it and neither test___all__ nor test_sundry fail for me
> today.

Not to worry! I fixed all this stuff yesterday. imaplib.py had an ambiguous mix of hard tabs and spaces, which Guido "should have" caught before checking in, and that Python itself complained about when run with -tt (which is how Mark ran the test suite). There's no problem anymore.


From nas at python.ca Tue Jun 19 22:37:18 2001
From: nas at python.ca (Neil Schemenauer)
Date: Tue, 19 Jun 2001 13:37:18 -0700
Subject: [Python-Dev] Python & GCC 3.0
In-Reply-To: <15151.41522.200832.655534@cj42289-a.reston1.va.home.com>; from fdrake@acm.org on Tue, Jun 19, 2001 at 03:04:18PM -0400
References: <15151.21150.214238.429130@cj42289-a.reston1.va.home.com> <15151.24462.400930.295658@anthem.wooz.org> <15151.27332.418303.872171@cj42289-a.reston1.va.home.com> <3B2F8413.77F40494@lemburg.com> <87vglsbfy5.fsf@cgsoftware.com> <15151.41522.200832.655534@cj42289-a.reston1.va.home.com>
Message-ID: <20010619133718.A14814@glacier.fnational.com>

Fred L. Drake, Jr. wrote:
> Compiling with "make OPT='-mcpu=i686 -O3'" did not make much
> difference at all.

Try OPT="-m486 -O2". That gave me the best results last time I played with this stuff.

> If they have any improved optimizations for recent x86 chips, I'd
> like to see them folded into GCC. I'd hate to see another egcs-style
> split.

Some people say you should avoid PGCC since it generates buggy code. I don't know if that's true or not.

Neil


From thomas at xs4all.net Tue Jun 19 23:04:46 2001
From: thomas at xs4all.net (Thomas Wouters)
Date: Tue, 19 Jun 2001 23:04:46 +0200
Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Lib/test test_mailbox.py,1.5,1.6
In-Reply-To: 
Message-ID: <20010619230446.E8098@xs4all.nl>

On Tue, Jun 19, 2001 at 01:20:07PM -0700, Jack Jansen wrote:

> The test used int(time.time()) to get a random number, but this doesn't
> work on the mac (where times are bigger than ints). Changed to
> int(time.time()%1000000).
Doesn't int(time.time()%sys.maxint) make more sense? At least you won't be degrading the sequentiality of this particularly unrandom random number on platforms where ints really are big enough to hold times :)

--
Thomas Wouters

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!


From loewis at informatik.hu-berlin.de Tue Jun 19 23:25:26 2001
From: loewis at informatik.hu-berlin.de (Martin von Loewis)
Date: Tue, 19 Jun 2001 23:25:26 +0200 (MEST)
Subject: [Python-Dev] example of module interface to a varargs function?
Message-ID: <200106192125.XAA27631@pandora.informatik.hu-berlin.de>

> The only place in the standard modules I saw that processed a truly
> arbitrary number of arguments is the struct_pack method of the
> struct module, and it doesn't use PyArg_Parse* to process them. Can
> someone point me to an example of marshalling arbitrary numbers of
> arguments then calling a varargs function?

In a true varargs function, you cannot use PyArg_Parse*. Instead, you have to iterate over the argument tuple with PyTuple_GetItem, fetching one argument after another. Another example of such a function is builtin_max.

> (I'll worry about calling gtk_binding_entry_add_signal after I
> figure out how to marshal the args.)

I'd worry about this first: In C, it is not possible to call a true varargs function in a portable way if the caller doesn't statically (i.e. in source code) know the number of arguments. Only the callee can be variable, not the caller. A slight exception is that you are allowed to pass-through va_list objects from one function to another. However, that requires that the callee expects a va_list argument, i.e. is not a varargs function, plus there is no portable way to create a va_list object from scratch.

If you absolutely need to call such a function, you can use the Cygnus libffi function, which, for a certain number of microprocessors and C ABIs, allows to call arbitrary function pointers. However, I'd rather recommend to look for alternatives to gtk_binding_entry_add_signal. E.g. gtk_binding_entry_add_signall accepts a GSList*, which is a chained list of arguments, instead of being varargs. This you can call in a C module - the other one is out of reach.

Regards,
Martin


From skip at pobox.com Tue Jun 19 23:32:50 2001
From: skip at pobox.com (Skip Montanaro)
Date: Tue, 19 Jun 2001 16:32:50 -0500
Subject: [Python-Dev] Python & GCC 3.0
In-Reply-To: <20010619133718.A14814@glacier.fnational.com>
References: <15151.21150.214238.429130@cj42289-a.reston1.va.home.com> <15151.24462.400930.295658@anthem.wooz.org> <15151.27332.418303.872171@cj42289-a.reston1.va.home.com> <3B2F8413.77F40494@lemburg.com> <87vglsbfy5.fsf@cgsoftware.com> <15151.41522.200832.655534@cj42289-a.reston1.va.home.com> <20010619133718.A14814@glacier.fnational.com>
Message-ID: <15151.50434.297860.277726@beluga.mojam.com>

    Neil> Some people say you should avoid PGCC since it generates buggy
    Neil> code. I don't know if that's true or not.

If nothing else, PGCC almost certainly gets a lot less exercise than the mainstream GCC code. Given the statement in the PGCC FAQ that typical speedups are in the range of 5%:

    http://www.goof.com/pcg/pgcc-faq.html#SEC0119

it doesn't seem like it would be worth the effort to use it in any critical applications. Better to just wait for PGCC optimizations to trickle into GCC itself.
Skip


From jack at oratrix.nl Tue Jun 19 23:56:43 2001
From: jack at oratrix.nl (Jack Jansen)
Date: Tue, 19 Jun 2001 23:56:43 +0200
Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Lib/test test_mailbox.py,1.5,1.6
In-Reply-To: Message by Thomas Wouters, Tue, 19 Jun 2001 23:04:46 +0200, <20010619230446.E8098@xs4all.nl>
Message-ID: <20010619215648.B2A7CE267B@oratrix.oratrix.nl>

Recently, Thomas Wouters said:

> On Tue, Jun 19, 2001 at 01:20:07PM -0700, Jack Jansen wrote:
>
> > The test used int(time.time()) to get a random number, but this doesn't
> > work on the mac (where times are bigger than ints). Changed to
> > int(time.time()%1000000).
>
> Doesn't int(time.time()%sys.maxint) make more sense ? At least you won't be
> degrading the sequentiality of this particularly unrandom random number on
> platforms where ints really are big enough to hold times :)

I think the last sentence should be "... platforms where time before 1970 doesn't exist so they can fit it in a measly 32 bits":-)

But anyway: I haven't a clue whether the sequentiality is important, it doesn't really seem to be from a quick glance. If you want to fix it: allez votre corridor.

--
Jack Jansen             | ++++ stop the execution of Mumia Abu-Jamal ++++
Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++
www.oratrix.nl/~jack    | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm


From skip at pobox.com Wed Jun 20 00:01:13 2001
From: skip at pobox.com (Skip Montanaro)
Date: Tue, 19 Jun 2001 17:01:13 -0500
Subject: [Python-Dev] Re: example of module interface to a varargs function?
In-Reply-To: <200106192125.XAA27631@pandora.informatik.hu-berlin.de>
References: <200106192125.XAA27631@pandora.informatik.hu-berlin.de>
Message-ID: <15151.52137.623119.852524@beluga.mojam.com>

    >> The only place in the standard modules I saw that processed a truly
    >> arbitrary number of arguments is the struct_pack method of the struct
    >> module, and it doesn't use PyArg_Parse* to process them. Can someone
    >> point me to an example of marshalling arbitrary numbers of arguments
    >> then calling a varargs function?

    Martin> In a true varargs function, you cannot use PyArg_Parse*.
    Martin> Instead, you have to iterate over the argument tuple with
    Martin> PyTuple_GetItem, fetching one argument after another.

I think it would be nice if PyArg_ParseTuple and friends took a "*" format character. It would only be useful at the end of a format string, but would allow the generic argument parsing machinery to be used for those arguments that precede it. The argument it writes into would be an int, which would represent the offset of the first argument not processed by PyArg_ParseTuple. Reusing my example:

    void gtk_binding_entry_add_signal (GtkBindingSet  *binding_set,
                                       guint           keyval,
                                       guint           modifiers,
                                       const gchar    *signal_name,
                                       guint           n_args,
                                       ...)

If I had a Python module wrapper function for this it might call PyArg_ParseTuple as

    PyArg_ParseTuple(args, "iis*", &keyval, &modifiers, &signal_name, &offset);

Processing of the rest of the argument list would be the responsibility of the author and start at args[offset].

    >> (I'll worry about calling gtk_binding_entry_add_signal after I figure
    >> out how to marshal the args.)

    Martin> I'd worry about this first: In C, it is not possible to call a
    Martin> true varargs function in a portable way if the caller doesn't
    Martin> statically (i.e. in source code) know the number of
    Martin> arguments. Only the callee can be variable, not the caller.

Understood.
It turns out that the function I used as an example is actually only called in a few distinct ways. I can analyze its var-arguments fairly easily and dispatch to the appropriate call to the underlying function.

    Martin> However, I'd rather recommend looking for alternatives to
    Martin> gtk_binding_entry_add_signal. E.g. gtk_binding_entry_add_signall
    Martin> accepts a GSList*, which is a chained list of arguments, instead
    Martin> of being varargs. This you can call in a C module - the other
    Martin> one is out of reach.

Hmm... thanks, this does look like the correct solution. I failed to notice the distinction between the two functions when I first scanned the source code: the signall (two-els) version is never called outside of gtkbindings.c, the Gtk documentation in this area is, well, rather sparse, to say the least (nine comments over 1200 lines of code, the only two substantial ones of which are boilerplate at the top), and there is no reference manual documentation for any of the interesting functions. By comparison, the Python documentation looks as if Guido has employed a team of full-time tech writers for years. Way to go, Fred! Skip From nas at python.ca Wed Jun 20 00:12:49 2001 From: nas at python.ca (Neil Schemenauer) Date: Tue, 19 Jun 2001 15:12:49 -0700 Subject: [Python-Dev] OS timer and profiling Python code Message-ID: <20010619151249.A15126@glacier.fnational.com> On x86 hardware the Linux timer runs at 100 Hz by default. On modern hardware that is probably much too slow to accurately profile programs using the Python profiler. Changing the value in include/asm-i386/param.h from 100 to 1024 and recompiling the kernel made a huge difference for me. Perhaps we should include a note in the profiler documentation. I'm not sure if this affects gprof as well but I suspect it does. Neil From moshez at zadka.site.co.il Wed Jun 20 07:31:23 2001 From: moshez at zadka.site.co.il (Moshe Zadka) Date: Wed, 20 Jun 2001 08:31:23 +0300 Subject: [Python-Dev] Moshe In-Reply-To: <20010618163512.D8098@xs4all.nl> References: <20010618163512.D8098@xs4all.nl> Message-ID: On Mon, 18 Jun 2001 16:35:12 +0200, Thomas Wouters wrote: > Just FYI: Moshe has been sighted, alive and well. He's been caught up in > personal matters, apparently. He apologized and said he'd mail python-dev > with an update soonish. Yes, indeed, and soonish got sorta delayed too... Anyway, I am alive and well, and the bad guys will have to do better than 300m to get me in an explosion ;-) Anyway, I'm terribly sorry for disappearing - my personal life caught up with me and stuff. I'm now trying to catch up with everything. Thanks to whoever took 2.0.1 from where I left off and kept it going. -- "I'll be ex-DPL soon anyway so I'm |LUKE: Is Perl better than Python? looking for someplace else to grab power."|YODA: No...no... no. Quicker, -- Wichert Akkerman (on debian-private)| easier, more seductive. For public key, finger moshez at debian.org |http://www.{python,debian,gnu}.org From greg at cosc.canterbury.ac.nz Wed Jun 20 07:55:28 2001 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Wed, 20 Jun 2001 17:55:28 +1200 Subject: [Python-Dev] Suggested amendment to PEP 255 References: Message-ID: <3B303AD0.1884E173@cosc.canterbury.ac.nz> Tim Peters wrote: > > Who would this help? Seriously. There's nothing special about a generator > to a caller, except that it returns an object that implements the iterator > interface. What matters to the caller is irrelevant here.
We're talking about what matters to someone writing or reading the implementation. To those people, there is a VERY big difference between a regular function and a generator-function -- about as big as the difference between a class and a function! In fact, a generator-function is in many ways much more like a class than a function. Calling a generator-function doesn't execute any of the code in its body; instead, it creates an instance of the generator, much like calling a class creates an instance of the class. Calling them "generator classes" and "generator instances" would perhaps be more appropriate, and more suggestive of the way they actually behave. The more I think about this, the more I agree with those who say that overloading the function-definition syntax for defining generators is a bad idea. It seems to make about as much sense as saying that there shouldn't be any special syntax for defining a class -- the header of a class definition should look exactly like a function definition, and to tell the difference you have to look for some subtle clue further down. I suggest dropping the "def" altogether and using:

    generator foo(args):
        ...
        yield x
        ...

Right from the word go, this says loudly and clearly that this thing is *not* a function, it's something else. If you haven't come across generators before, you go and look in the manual to find out what it means. There you're told something like

    Executing a generator statement creates a special callable object
    called a generator. Calling a generator creates a generator-instance,
    which is an iterator object...

    [...stuff about the "yield" statement...]

I think this is going to be easier to document and lead to much less confusion than trying to explain the magic going on when you call something that looks for all the world like a function and it doesn't execute any of the code in it. Explicit is better than implicit! -- Greg Ewing, Computer Science Dept, University of Canterbury, Christchurch, New Zealand To get my email address, please visit my web page: http://www.cosc.canterbury.ac.nz/~greg From greg at cosc.canterbury.ac.nz Wed Jun 20 08:17:09 2001 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Wed, 20 Jun 2001 18:17:09 +1200 Subject: [Python-Dev] Re: PEP 255: Simple Generators References: Message-ID: <3B303FE5.735A5FDC@cosc.canterbury.ac.nz> Tim Peters wrote: > > This is like saying that functions returning integers should be declared > "defint" instead, or some such gibberish. Not the same thing. If a function returns an integer, somewhere in it or in something that it calls there is a piece of code that explicitly creates an integer. But under PEP 255, there is *nothing* anywhere in the code that you can point to and say "look, here is where the generator-iterator is created!" Instead, it happens implicitly at some point just after the generator-function is called, but before any of its code is executed. You could say that the same thing is true when you call a class object -- creation of the instance happens implicitly before __init__ is called. But there is no secret made of the fact that classes are not functions, and there is nothing in the syntax to lead you to believe that they behave like functions. In contrast, the proposed generator syntax makes generators look so nearly like functions that their actual behaviour, once you get your head around it, seems quite bizarre. I just think it's going to lead to a lot of confusion and misunderstanding, among newcomers especially. -- Greg Ewing, Computer Science Dept, University of Canterbury, Christchurch, New Zealand To get my email address, please visit my web page: http://www.cosc.canterbury.ac.nz/~greg
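Greg's "much more like a class" point is easy to demonstrate. The following toy snippet (mine, not Greg's, and using the PEP 255 def/yield spelling rather than his proposed syntax) shows that calling a generator-function executes none of its body:

    def g():
        print "body entered"    # NOT printed at call time
        yield 1

    it = g()         # no output: this only creates a generator-iterator
    print it.next()  # only now does the body run:
                     # prints "body entered", then 1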
From greg at cosc.canterbury.ac.nz Wed Jun 20 08:28:13 2001 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Wed, 20 Jun 2001 18:28:13 +1200 Subject: [Python-Dev] Re: PEP 255: Simple Generators References: <3B2EEF5A.8FF0DAFB@cosc.canterbury.ac.nz> Message-ID: <3B30427D.5A90DDE7@cosc.canterbury.ac.nz> Olaf Delgado Friedrichs wrote:
>
> If I understand correctly, this should work:
>
>    def f():
>        for i in range(5):
>            for x in g(i):
>                yield x
>
>    def g(i):
>        for j in range(10):
>            yield i,j

Yes, I realised that shortly afterwards. But I think we're going to get a lot of questions from newcomers who have tried to implicitly nest iterators and are very confused about why it doesn't work and what needs to be done to make it work. An explicit generator definition syntax would help here, I think. First of all, it would be a syntax error to use "yield" outside of a generator definition, so they would be forced to declare the inner one as a generator. Then, if they neglected to make the outer one a generator too, it would look like this:

    def f():
        for i in range(5):
            g(i)

    generator g(i):
        for j in range(10):
            yield i,j

from which it is glaringly obvious that f() is NOT a generator, and therefore can't be used as one. -- Greg Ewing, Computer Science Dept, University of Canterbury, Christchurch, New Zealand To get my email address, please visit my web page: http://www.cosc.canterbury.ac.nz/~greg From loewis at informatik.hu-berlin.de Wed Jun 20 12:27:30 2001 From: loewis at informatik.hu-berlin.de (Martin von Loewis) Date: Wed, 20 Jun 2001 12:27:30 +0200 (MEST) Subject: [Python-Dev] Re: example of module interface to a varargs function? In-Reply-To: <15151.52137.623119.852524@beluga.mojam.com> (message from Skip Montanaro on Tue, 19 Jun 2001 17:01:13 -0500) References: <200106192125.XAA27631@pandora.informatik.hu-berlin.de> <15151.52137.623119.852524@beluga.mojam.com> Message-ID: <200106201027.MAA06782@pandora.informatik.hu-berlin.de> > I think it would be nice if PyArg_ParseTuple and friends took a "*" format > character. It would only be useful at the end of a format string, but would > allow the generic argument parsing machinery to be used for those arguments > that precede it. Now I understand. Yes, that would be useful, but apparently was not required often enough so far to make somebody ask for it. Regards, Martin From aahz at rahul.net Wed Jun 20 15:00:08 2001 From: aahz at rahul.net (Aahz Maruch) Date: Wed, 20 Jun 2001 06:00:08 -0700 (PDT) Subject: [Python-Dev] Suggested amendment to PEP 255 In-Reply-To: <3B303AD0.1884E173@cosc.canterbury.ac.nz> from "Greg Ewing" at Jun 20, 2001 05:55:28 PM Message-ID: <20010620130008.7880D99C88@waltz.rahul.net> Greg Ewing wrote:
>
> I suggest dropping the "def" altogether and using:
>
>    generator foo(args):
>        ...
>        yield x
>        ...

+2 -- --- Aahz (@pobox.com) Hugs and backrubs -- I break Rule 6 <*> http://www.rahul.net/aahz/ Androgynous poly kinky vanilla queer het Pythonista I don't really mind a person having the last whine, but I do mind someone else having the last self-righteous whine.
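Returning to Olaf's nesting example above: a runnable restatement (not from the thread) that makes the delegation rule explicit. The outer function must itself be a generator, and it must re-yield the inner generator's values by hand; generators do not nest implicitly:

    def g(i):
        for j in range(3):
            yield i, j

    def f():
        for i in range(2):
            # explicit re-yield; "yield g(i)" would yield the
            # generator object itself, not its values
            for x in g(i):
                yield x

    print list(f())  # [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2)]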
From nas at python.ca Wed Jun 20 16:28:20 2001 From: nas at python.ca (Neil Schemenauer) Date: Wed, 20 Jun 2001 07:28:20 -0700 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Python ceval.c,2.250,2.251 In-Reply-To: ; from tim_one@users.sourceforge.net on Tue, Jun 19, 2001 at 11:57:34PM -0700 References: Message-ID: <20010620072820.A16584@glacier.fnational.com> Tim Peters wrote: > gen_iternext(): repair subtle refcount problem. > NeilS, please check! This came from staring at your genbug.py, but I'm > not sure it plugs all possible holes. Without this, I caught a > frameobject refcount going negative, and it was also the cause (in debug > build) of _Py_ForgetReference's attempt to forget an object with already- > NULL _ob_prev and _ob_next pointers -- although I'm still not entirely > sure how! Doesn't this cause a memory leak? f_back is INCREFed in PyFrame_New. There are other problems lurking here as well.

    def f():
        try:
            yield 1
        finally:
            print "finally"

    def h():
        g = f()
        g.next()

    while 1:
        h()

The above code leaks memory like mad, with or without your change. Also, the finally clause is never executed although it probably should be. My feeling is that the reference counting of f_back should be done by ceval and not by the frame object. The problem with the finally clause is another ball of wax. I think it's fixable though. I'll look at it closer this evening. Neil From tim.one at home.com Wed Jun 20 16:28:19 2001 From: tim.one at home.com (Tim Peters) Date: Wed, 20 Jun 2001 10:28:19 -0400 Subject: [Python-Dev] Suggested amendment to PEP 255 In-Reply-To: <3B303AD0.1884E173@cosc.canterbury.ac.nz> Message-ID: [Greg Ewing] > ... Why is this on Python-Dev? The PEP announcement specifically asked for discussion to occur on the Iterators list, and specifically asked to keep it *off* of Python-Dev. I've been playing along with people who wanted to discuss it on c.l.py instead, as finite time allows, but no way does the discussion belong here. From arigo at ulb.ac.be Wed Jun 20 16:30:49 2001 From: arigo at ulb.ac.be (Armin Rigo) Date: Wed, 20 Jun 2001 16:30:49 +0200 (MET DST) Subject: [Python-Dev] Suggested amendment to PEP 255 In-Reply-To: <3B303AD0.1884E173@cosc.canterbury.ac.nz> Message-ID: Hi, On Wed, 20 Jun 2001, Greg Ewing wrote: > I suggest dropping the "def" altogether and using:
>
>    generator foo(args):
>        ...
>        yield x
>        ...

Nice idea. We might even think about dropping the 'yield' keyword altogether and using 'return' instead (although I'm not quite sure it is a good idea; I'm just suggesting it with a personal -0.5). A bientot, Armin. From tim.one at home.com Wed Jun 20 16:41:13 2001 From: tim.one at home.com (Tim Peters) Date: Wed, 20 Jun 2001 10:41:13 -0400 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Python ceval.c,2.250,2.251 In-Reply-To: <20010620072820.A16584@glacier.fnational.com> Message-ID: [Neil Schemenauer] > Doesn't this cause a memory leak? f_back is INCREFed in > PyFrame_New. There are other problems lurking here as well. > ... Our msgs crossed in the mail. Unfortunately, I have to get off email now and probably won't get on again before this evening. Tracebacks appear to be a potential problem too ... we'll-reinvent-stackless-before-this-is-over<0.9-wink>-ly y'rs - tim
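To make the try/finally half of Neil's report concrete, here is a small sketch (mine, not Neil's exact test) of a generator abandoned while suspended inside a try block; whether and when the finally clause runs is precisely what is at issue in this thread:

    def f():
        try:
            yield 1
            yield 2
        finally:
            print "finally"

    g = f()
    print g.next()  # prints 1; f is now suspended inside the try block
    del g           # abandoned before exhaustion -- nothing forces the
                    # finally clause to run at this point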
From barry at digicool.com Wed Jun 20 18:35:49 2001 From: barry at digicool.com (Barry A. Warsaw) Date: Wed, 20 Jun 2001 12:35:49 -0400 Subject: [Python-Dev] Suggested amendment to PEP 255 References: <3B303AD0.1884E173@cosc.canterbury.ac.nz> Message-ID: <15152.53477.212348.243592@anthem.wooz.org>

    >>>>> "GE" == Greg Ewing writes:

    GE> What matters to the caller is irrelevant here. We're talking
    GE> about what matters to someone writing or reading the
    GE> implementation. To those people, there is a VERY big
    GE> difference between a regular function and a
    GE> generator-function -- about as big as the difference
    GE> between a class and a function!

    GE> In fact, a generator-function is in many ways much more
    GE> like a class than a function. Calling a generator-function
    GE> doesn't execute any of the code in its body; instead, it
    GE> creates an instance of the generator, much like calling
    GE> a class creates an instance of the class. Calling them
    GE> "generator classes" and "generator instances" would
    GE> perhaps be more appropriate, and more suggestive of the
    GE> way they actually behave.

Thanks Greg, I think you've captured perfectly my discomfort with the proposal. I'm fine with return being "special" inside a generator, along with most of the other details of the pep. But it bugs me that the semantics of calling the thing created by `def' is different depending on some statement embedded deep in the body of the code. Think about it from a teaching perspective: You're taught that def creates a function, perhaps called foo. You know that calling foo starts execution at the first line in the function block. You know you can put a print statement on the first line and it will print something out when the function is called. You know that you can set a debugger break point at foo's first line and when you call the function, the debugger will leave you on that first line of code. But all that changes with a generator! My print statement isn't executed when I call the function... how weird! Hey, the debugger doesn't even break on the line when I call the function. Okay, maybe it's some /other/ foo my program is really calling. So let's hunt around for other possible foo's that my program might be calling. Hmm, no dice there. Now I'm really confused because I haven't gotten to the chapter that says "Now that you know all about functions, forget most of that if you find a yield statement in the body of the function, because it's a special kind of function called a generator. Calling such a special function doesn't execute any code, it just instantiates a built-in object called a generator object. To get any of the generator's code to execute, you have to call the generator object's next() method." Further, I print out the type of the object returned by calling foo and I see it's a <generator object>. Okay, so now let me search foo for a return statement. Because I know about functions, and I know that the returned object isn't None, I know that the function isn't falling off the end. So there must be a return statement that explicitly returns a generator object (whatever that is). Hmm, nope, there's just a bare return sitting there. That's damn confusing. I wonder what those yield statements are doing. Well, I look those up in my book's index and I see that's described in chapter 57, which I haven't gotten to yet. Besides, those yields clearly have integers after them, so that can't be it. So how the heck do I get a generator object by calling this function??? You'll counter that the "search for yield to find out if the function is special" is a simple rule, once learned is easily remembered.
I'll counter that it's harder for me to do an Isearch in XEmacs to find out what kind of thing foo is. :) To me, it's just bad mojo to have the behavior of the thing created by `def' determined by what's embedded in the body of the program. I don't buy the defint argument, because by searching for a return statement in the function, you can find out exactly what is being returned when the function is called. Not so with a generator. My vote is for a "generator" keyword to introduce the code block of a generator. Makes perfect sense to me, and it will be a strong indication to anybody reading my code that something special is going on. And something special /is/ going on! An informal poll of PythonLabs indicates a split on this subject, perhaps setting Jeremy up as a Sandra Day O'Conner swing vote. But who said this was a democracy anyway? :) somewhat-like-my-own-country-of-origin-ly y'rs, -Barry From tim at digicool.com Wed Jun 20 18:42:00 2001 From: tim at digicool.com (Tim Peters) Date: Wed, 20 Jun 2001 12:42:00 -0400 Subject: [Python-Dev] Suggested amendment to PEP 255 In-Reply-To: <15152.53477.212348.243592@anthem.wooz.org> Message-ID: Please keep this off Python-Dev. Paul Prescod has already fwd'ed Greg's msg to the Iterators list, and let's keep it there. From fredrik at pythonware.com Wed Jun 20 18:54:22 2001 From: fredrik at pythonware.com (Fredrik Lundh) Date: Wed, 20 Jun 2001 18:54:22 +0200 Subject: [Python-Dev] Suggested amendment to PEP 255 References: <3B303AD0.1884E173@cosc.canterbury.ac.nz> <15152.53477.212348.243592@anthem.wooz.org> Message-ID: <006d01c0f9a9$a879fcd0$4ffa42d5@hagrid> barry wrote: > My vote is for a "generator" keyword to introduce the code block of a > generator. Makes perfect sense to me, and it will be a strong > indication to anybody reading my code that something special is going > on. And something special /is/ going on! agreed. +1 on generator instead of def. (and +0 on suspend instead of yield, but that's me) Cheers /F From jeremy at alum.mit.edu Wed Jun 20 19:25:05 2001 From: jeremy at alum.mit.edu (Jeremy Hylton) Date: Wed, 20 Jun 2001 13:25:05 -0400 Subject: [Python-Dev] Suggested amendment to PEP 255 In-Reply-To: Message-ID: Why can't we discuss Python development on python-dev? please-take-replies-to-python-dev-meta-ly y'rs, Jeremy -----Original Message----- From: python-dev-admin at python.org [mailto:python-dev-admin at python.org]On Behalf Of Tim Peters Sent: Wednesday, June 20, 2001 12:42 PM To: Barry A. Warsaw Cc: python-dev at python.org Subject: RE: [Python-Dev] Suggested amendment to PEP 255 Please keep this off Python-Dev. Paul Prescod has already fwd'ed Greg's msg to the Iterators list, and let's keep it there. _______________________________________________ Python-Dev mailing list Python-Dev at python.org http://mail.python.org/mailman/listinfo/python-dev From tim at digicool.com Wed Jun 20 20:28:17 2001 From: tim at digicool.com (Tim Peters) Date: Wed, 20 Jun 2001 14:28:17 -0400 Subject: [Python-Dev] Suggested amendment to PEP 255 In-Reply-To: Message-ID: [Jeremy Hylton] > Why can't we discuss Python development on python-dev? You can, but without me in this case. The arguments aren't new (they were discussed on the Iterators list before the PEP was posted), and I don't have time to repeat them on (now three) different forums. 
The PEP announcement clearly said discussion belonged on the Iterators list, specifically asked that it stay off of Python-Dev, and the PEP Discussion-To field (which I assume Barry filled in -- I did not) reads Discussion-To: python-iterators at lists.sourceforge.net If you want a coherent historic record (I do), that's where this belongs. From aahz at rahul.net Wed Jun 20 20:37:49 2001 From: aahz at rahul.net (Aahz Maruch) Date: Wed, 20 Jun 2001 11:37:49 -0700 (PDT) Subject: [Python-Dev] Suggested amendment to PEP 255 In-Reply-To: from "Jeremy Hylton" at Jun 20, 2001 01:25:05 PM Message-ID: <20010620183749.B419E99C82@waltz.rahul.net> Jeremy Hylton wrote: > > Why can't we discuss Python development on python-dev? I'm split on this issue. I understand why Tim wants to have the discussion corralled into a single place; it's also a moderate inconvenience to have to add another mailing list every time a "critical" issue comes up. I think the best compromise is to follow the rules currently in existence for the PEP process, and if one doesn't wish to subscribe to another mailing list, e-mail one's feedback to the PEP author directly and raise bloody hell if the next PEP revision doesn't include a mention of the feedback. -- --- Aahz (@pobox.com) Hugs and backrubs -- I break Rule 6 <*> http://www.rahul.net/aahz/ Androgynous poly kinky vanilla queer het Pythonista I don't really mind a person having the last whine, but I do mind someone else having the last self-righteous whine. From barry at digicool.com Wed Jun 20 21:07:00 2001 From: barry at digicool.com (Barry A. Warsaw) Date: Wed, 20 Jun 2001 15:07:00 -0400 Subject: [Python-Dev] Suggested amendment to PEP 255 References: Message-ID: <15152.62548.504923.152041@anthem.wooz.org> >>>>> "TP" == Tim Peters writes: TP> and the PEP Discussion-To field (which I assume Barry filled TP> in -- I did not) reads Not me. I believe it was in Magnus's original version of the PEP. But I do think that now that the code is in the main CVS trunk, it is appropriate to remove the Discussion-To: header and redirect comments back to python-dev. That may be difficult in practice however. -Barry From jack at oratrix.nl Wed Jun 20 23:52:16 2001 From: jack at oratrix.nl (Jack Jansen) Date: Wed, 20 Jun 2001 23:52:16 +0200 Subject: [Python-Dev] _PyTrace_init declaration Message-ID: <20010620215221.1697FE267B@oratrix.oratrix.nl> I'm getting "no prototype" warnings on _PyTrace_init, and inspection shows that this routine indeed doesn't show up in an include file. As it is used elsewhere (in sysmodule.c) shouldn't it be called PyTrace_init and have it's prototype declared somewhere? -- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | ++++ see http://www.xs4all.nl/~tank/ ++++ From tim.one at home.com Thu Jun 21 00:31:10 2001 From: tim.one at home.com (Tim Peters) Date: Wed, 20 Jun 2001 18:31:10 -0400 Subject: [Python-Dev] _PyTrace_init declaration In-Reply-To: <20010620215221.1697FE267B@oratrix.oratrix.nl> Message-ID: [Jack Jansen] > I'm getting "no prototype" warnings on _PyTrace_init, and inspection > shows that this routine indeed doesn't show up in an include file. As > it is used elsewhere (in sysmodule.c) shouldn't it be called > PyTrace_init and have it's prototype declared somewhere? It should indeed be declared in ceval.h (Fred?), but so long as it's part of the private API it should not lose the leading underscore. 
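A minimal sketch of what Tim is suggesting -- the exact header and signature are assumptions, not the actual patch:

    /* In Include/ceval.h: keep the leading underscore (private API),
       but give the function a prototype both ceval.c and sysmodule.c
       can see.  The argument list shown here is a guess, for
       illustration only. */
    extern DL_IMPORT(int) _PyTrace_init(void);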
From thomas at xs4all.net Thu Jun 21 00:29:51 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Thu, 21 Jun 2001 00:29:51 +0200 Subject: [Python-Dev] _PyTrace_init declaration In-Reply-To: <20010620215221.1697FE267B@oratrix.oratrix.nl> References: <20010620215221.1697FE267B@oratrix.oratrix.nl> Message-ID: <20010621002951.H8098@xs4all.nl> On Wed, Jun 20, 2001 at 11:52:16PM +0200, Jack Jansen wrote: > I'm getting "no prototype" warnings on _PyTrace_init, and inspection > shows that this routine indeed doesn't show up in an include file. As > it is used elsewhere (in sysmodule.c) shouldn't it be called > PyTrace_init and have it's prototype declared somewhere? No, and yes. the _Py* functions are internal, but non-static (used in other files.) They should have a prototype declared somewhere, but they shouldn't be used outside of Python itself. It shouldn't be named 'PyTrace_init' unless it is a supported part of the API. -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From greg at cosc.canterbury.ac.nz Thu Jun 21 01:39:17 2001 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Thu, 21 Jun 2001 11:39:17 +1200 (NZST) Subject: [Python-Dev] Suggested amendment to PEP 255 In-Reply-To: Message-ID: <200106202339.LAA04351@s454.cosc.canterbury.ac.nz> > The PEP announcement specifically asked for > discussion to occur on the Iterators list Sorry, I missed that - I was paying more attention to the PEP itself than what the announcement said. Going now to subscribe to the iterators list forthwith. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg at cosc.canterbury.ac.nz +--------------------------------------+ From jeremy at alum.mit.edu Thu Jun 21 01:47:28 2001 From: jeremy at alum.mit.edu (Jeremy Hylton) Date: Wed, 20 Jun 2001 19:47:28 -0400 Subject: [Python-Dev] Suggested amendment to PEP 255 In-Reply-To: <15152.53477.212348.243592@anthem.wooz.org> Message-ID: > My vote is for a "generator" keyword to introduce the code block of a > generator. Makes perfect sense to me, and it will be a strong > indication to anybody reading my code that something special is going > on. And something special /is/ going on! > > An informal poll of PythonLabs indicates a split on this subject, > perhaps setting Jeremy up as a Sandra Day O'Conner swing vote. But > who said this was a democracy anyway? :) > > somewhat-like-my-own-country-of-origin-ly y'rs, > -Barry That's a nice analogy, Ruth Barry Ginsburg; a Supreme Court, which appoints the president, seems a closer fit to Python's dictatorship than some sort of democratic process. I wasn't present for the oral arguments, but I'm sure we all know how Tim Scalia voted and that Guido van Clarence Thomas agreed without comment. I assume, then, that Anthony Kennedy Jr. joined you, although he's often a swing vote, too. Can't wait to hear the report from Nina "Michael Hudson" Totenberg. I was originally happy with the use of def. It's not much of a stretch since the def statement defines a code block that has formal parameters and creates a new scope. I certainly wouldn't be upset if Python ended up using def to define a generator. I appreciate, though, that the definition of a generator may look an awful lot like a function. I can imagine a user reading a module, missing the yield statement, and trying to use the generator as a function. 
I can't imagine this would happen often. My limited experience with CLU suggests that iterators aren't going to be huge, unwieldy blocks where it's hard to see what the ultimate control flow is. If a confused user treats a generator as a regular function, he or she certainly can't expect it to return anything useful, since all the return statements are bare returns; the expected behavior would be some side-effect on global state, which seems both unlikely and unseemly for an iterator. I'm not sure how hard it will be to explain generators to new users. I expect you would teach functions and iterations via for loop, then explain that there is a special kind of function called a generator that can be used in a for loop. It uses a yield statement instead of a return statement to return values. Not all that hard. If we use a different keyword to introduce them, you'd probably explain them much the same way: A generator is a special kind of function that can be used in a for loop and is defined with generator instead of def. As other people have mentioned, Icon doesn't use special syntax to introduce generators. We might as well look at CLU, too, where a different approach was taken. You can view the CLU Reference Manual at: http://ncstrl.mit.edu/Dienst/UI/2.0/Describe/ncstrl.mit_lcs%2fMIT%2fLCS%2fTR-225 It uses "proc" to introduce a procedure and "iter" to introduce an iterator. See page 72 for the details: http://ncstrl.mit.edu/Dienst/UI/2.0/Page/ncstrl.mit_lcs%2fMIT%2fLCS%2fTR-225/72 It's a toss-up, then, between the historical antecedents Icon and CLU. I'd tend to favor a new keyword for generators, but could be talked out of that position. Jeremy From fdrake at acm.org Thu Jun 21 01:57:57 2001 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Wed, 20 Jun 2001 19:57:57 -0400 (EDT) Subject: [Python-Dev] _PyTrace_init declaration In-Reply-To: <20010620215221.1697FE267B@oratrix.oratrix.nl> References: <20010620215221.1697FE267B@oratrix.oratrix.nl> Message-ID: <15153.14469.903865.533713@cj42289-a.reston1.va.home.com> Jack Jansen writes: > I'm getting "no prototype" warnings on _PyTrace_init, and inspection > shows that this routine indeed doesn't show up in an include file. As > it is used elsewhere (in sysmodule.c) shouldn't it be called > PyTrace_init and have it's prototype declared somewhere? No. I thought I had a prototype for it just above the usage. Anyway, I'm re-working that code this week, so you can assign this to me in the bug tracker. Thanks! -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From guido at digicool.com Thu Jun 21 16:32:40 2001 From: guido at digicool.com (Guido van Rossum) Date: Thu, 21 Jun 2001 10:32:40 -0400 Subject: [Python-Dev] PEP 255 - BDFL Pronouncement: 'def' it stays Message-ID: <200106211432.f5LEWeA03163@odiug.digicool.com> I've thought long and hard and tried to read almost all the mail on this topic, and I cannot get myself to change my mind. No argument on either side is totally convincing, so I have consulted my language designer's intuition. It tells me that the syntax proposed in the PEP is exactly right - not too hot, not too cold. But, like the Oracle at Delphi in Greek mythology, it doesn't tell me why, so I don't have a rebuttal for the arguments against the PEP syntax. The best I can come up with (apart from agreeing with the rebuttals that Tim and others have already made) is "FUD". If this had been part of the language from day one, I very much doubt it would have made Andrew Kuchling's "Python Warts" page.
So I propose that Tim and others defending 'def' save their remaining breath, and I propose that Paul and others in favor of 'gen[erator]' start diverting their energy towards thinking about how to best teach generators the PEP syntax. Tim, please add a BDFL pronouncement to the PEP to end the argument. You can also summarize the arguments on either side, for posterity -- without trying to counter them. I found one useful comment on the PEP that isn't addressed and is orthogonal to the whole discussion: try/finally. When you have a try/finally around a yield statement, it is possible that the finally clause is not executed at all when the iterator is never resumed. I find this disturbing, and am tempted to propose that yield inside try/finally be disallowed (but yield inside try/except is still allowed). Another idea might be to somehow continue the frame with an exception at this point -- but I don't have a clue what exception would be appropriate (StopIteration isn't because it goes in the other direction) and I don't know what to do if the generator catches the exception and tries to yield again (maybe the exception should be raised again?). The continued execution of the frame would be part of the destructor for the generator-iterator object, so, like a __del__ method, any unhandled exceptions wouldn't be able to propagate out of it. PS I lost my personal archive of the last 18 hours of the iter mailing list, and the web archive is down, alas, so I'm writing this from memory. I *did* read most of the messages in my archive before I accidentally deleted it, though. ;-) --Guido van Rossum (home page: http://www.python.org/~guido/) From tdickenson at devmail.geminidataloggers.co.uk Thu Jun 21 17:02:54 2001 From: tdickenson at devmail.geminidataloggers.co.uk (Toby Dickenson) Date: Thu, 21 Jun 2001 16:02:54 +0100 Subject: [Python-Dev] Re: [Python-iterators] PEP 255 - BDFL Pronouncement: 'def' it stays In-Reply-To: <200106211432.f5LEWeA03163@odiug.digicool.com> References: <200106211432.f5LEWeA03163@odiug.digicool.com> Message-ID: On Thu, 21 Jun 2001 10:32:40 -0400, Guido van Rossum wrote: > Another idea might be to somehow continue the frame with an > exception at this point -- but I don't have a clue what exception > would be appropriate (StopIteration isn't because it goes in the other > direction) I'm sure any exception is appropriate there. What about restarting the frame as if the 'yield' had been followed by a 'return'? Toby Dickenson tdickenson at geminidataloggers.com From mwh at python.net Fri Jun 22 01:20:17 2001 From: mwh at python.net (Michael Hudson) Date: Fri, 22 Jun 2001 00:20:17 +0100 (BST) Subject: [Python-Dev] python-dev summary 2001-06-07 - 2001-06-21 Message-ID: This is a summary of traffic on the python-dev mailing list between June 7 and June 21 (inclusive) 2001. It is intended to inform the wider Python community of ongoing developments. To comment, just post to python-list at python.org or comp.lang.python in the usual way. Give your posting a meaningful subject line, and if it's about a PEP, include the PEP number (e.g. Subject: PEP 201 - Lockstep iteration) All python-dev members are interested in seeing ideas discussed by the community, so don't hesitate to take a stance on a PEP if you have an opinion. This is the tenth summary written by Michael Hudson.
Summaries are archived at:

 Posting distribution (with apologies to mbm)

 Number of articles in summary: 192

    |                     [|]
    |                     [|]
 30 |                     [|]
    |                     [|]
    |                     [|]
    |                     [|]
    |                     [|]
    |                     [|] [|]
 20 |                     [|] [|]
    |                     [|] [|]                     [|]
    |                     [|] [|]                     [|] [|]
    | [|]                 [|] [|]                     [|] [|]
    | [|]                 [|] [|]                     [|] [|]
    | [|] [|]         [|] [|] [|]                     [|] [|]
 10 | [|] [|]         [|] [|] [|] [|]                 [|] [|]
    | [|] [|]         [|] [|] [|] [|]                 [|] [|]
    | [|] [|]         [|] [|] [|] [|] [|]             [|] [|]
    | [|] [|]         [|] [|] [|] [|] [|]             [|] [|]
    | [|] [|]         [|] [|] [|] [|] [|] [|]     [|] [|] [|]
    | [|] [|]     [|] [|] [|] [|] [|] [|] [|]     [|] [|] [|]
  0 +-019-014-001-003-014-039-026-013-009-004-001-005-023-021
    Thu 07| Sat 09| Mon 11| Wed 13| Fri 15| Sun 17| Tue 19|
        Fri 08  Sun 10  Tue 12  Thu 14  Sat 16  Mon 18  Wed 20

Quiet fortnight.

    * Adding .decode() method to Unicode *

Marc-Andre Lemburg asked for opinions on adding a .decode method to unicode objects: He certainly got them; the responses ranged from neutral to negative, and there was a surprising amount of hostility in the air. The problem (as ever in these matters) seems to be that Python currently uses the same type for 8-bit strings and gobs of arbitrary data. Guido came to the rescue and calmed everyone down: since when discussion has vanished again.

    * Adding Asian codecs to the core *

Marc-Andre Lemburg announced that Tamito KAJIYAMA has decided to relicense his Japanese codecs with a BSD-style license, enabling them to be included in the core: This is clearly a good thing; the only quibble is that the encodings are by their nature rather large, so they will probably go into a separate directory in CVS (probably python/dist/encodings/) and not go into the source tarball released on python.org.

    * Omit printing newline after newline *

As readers of comp.lang.python will have noticed, Guido posted: and retracted: PEP 259, a proposal for changing the behaviour of the print statement.

    * sre "improvements" *

Gustavo Niemeyer asked if anyone planned to add the "(?(1)blah)" re operators to Python: but Python is not perl and there wasn't much support for making regular expressions more baffling than they already are.

    * Generators *

In a discussion that slobbered across comp.lang.python, python-dev and the python-iterators list at sf (and belongs on the latter!) there was much talk of PEP 255, Simple Generators. Most was positive; the main dissent was from people that thought it was too hard to tell a generator from a regular function (at the source level). However Guido listened to Tim's repeated claims that this is insignificant once you've actually used generators once or twice and Pronounced "'def' it is": and noticed that there are still some issues wrt try/finally blocks. However, clever people seem to be thinking about it, so I'm sure the problem's days are numbered :-) I should also note that the gen-branch has been checked into the trunk of CVS. Woohoo! Cheers, M. From arigo at ulb.ac.be Fri Jun 22 13:00:34 2001 From: arigo at ulb.ac.be (Armin Rigo) Date: Fri, 22 Jun 2001 13:00:34 +0200 (MET DST) Subject: [Python-Dev] Python Specializing Compiler Message-ID: Hello everybody, I implemented a proof-of-concept version of a "Python compiler". It is not really a compiler. I know perfectly well that you cannot compile Python into something more efficient than a bunch of calls to PyObject_xxx.
Still, this very preliminary version runs the following function twice as fast as the python interpreter:

    def f(n):
        result = 0
        i = 0
        while i<n:

From: Jeff Epler Subject: Re: [Python-Dev] Python Specializing Compiler In-Reply-To: ; from arigo@ulb.ac.be on Fri, Jun 22, 2001 at 01:00:34PM +0200 References: Message-ID: <20010622071846.A7014@craie.housenet> On Fri, Jun 22, 2001 at 01:00:34PM +0200, Armin Rigo wrote: > Hello everybody, > > I implemented a proof-of-concept version of a "Python compiler". It is not > really a compiler. I know perfectly well that you cannot compile Python > into something more efficient than a bunch of calls to PyObject_xxx. > Still, this very preliminary version runs the following function twice as > fast as the python interpreter: I've implemented something similar, but didn't get such favorable results yet. I was concentrating more on implementing a type system and code to infer type information, and had spent less time on the code generation. (For instance, my system could determine the result type of subscript-type operations, and infer the types of lists over a loop, as in:

    l1 = [1,3.14159, "tubers"]
    l2 = [0]*3
    for j in range(3):
        l2[j] = l1[j-3]
    # Type of l2 is HeterogeneousListType([IntType, FloatType,
    # StringType])

You could make it run forever on a pathological case like

    l = []
    while 1:
        l = [l]

with the fix being to "give up" after some number of iterations, and declare the unstable object (l) as having type "ObjectType", which is always correct but overbroad.) My code is still available, but my motivation has faded somewhat and I haven't had the time to work on it recently in any case. It uses "GNU Lightning" for JIT code generation, rather than using an external compiler. (If I were to approach the problem again, I might discard the JIT code generator in favor of starting over again with the python2c compiler and adding type information) It can make judgements about sequences of calls, such as

    def f():
        return g()

when g is given the "solid" attribute, and the compilation process begins by hoisting the former global load of g into a constant load, something like

    def make_f():
        local_g = g
        def f():
            return local_g()
        return f
    f = make_f()

What are you using to generate code? How would you compare the sophistication of your type inference system to the one I've outlined above? Jeff From Greg.Wilson at baltimore.com Fri Jun 22 14:34:17 2001 From: Greg.Wilson at baltimore.com (Greg Wilson) Date: Fri, 22 Jun 2001 08:34:17 -0400 Subject: [Python-Dev] ...und zen, ze world! Message-ID: <930BBCA4CEBBD411BE6500508BB3328F2E26E8@nsamcanms1.ca.baltimore.com> From pedroni at inf.ethz.ch Fri Jun 22 14:59:40 2001 From: pedroni at inf.ethz.ch (Samuele Pedroni) Date: Fri, 22 Jun 2001 14:59:40 +0200 (MET DST) Subject: [Python-Dev] Python Specializing Compiler Message-ID: <200106221259.OAA02519@core.inf.ethz.ch> Hi. Just after reading the README, it's very intriguing and interesting (if I remember well, this resembles the customization approach of the Self VM compiler); ideally it could evolve into a loadable extension that then works together with the normal interp (unchanged up to offering some hooks*) in a transparent way for the user, emitting native code for the major platforms or just specialized bytecodes. I will give a serious look at it. regards, Samuele Pedroni. *: some possible useful hooks would be:
- minimal profiling support in order to specialize only things called often
- feedback for dynamic changing of methods, class hierarchy, ... if we want to optimize method lookup (which would make sense)
- a mixed fixed slots/dict layout for instances.
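As a rough illustration of the run-time feedback idea Armin and Samuele are discussing (profile to find hot call sites, then specialize on the types actually observed), here is a toy sketch in pure Python. It is conceptual only; none of these names come from Armin's code, and the "specialization" step is a stub:

    def make_specialized(generic, t):
        # Stand-in for emitting type-specialized code; the real work
        # would generate code free of generic PyObject_xxx calls for
        # arguments of type t.  Returning the generic function keeps
        # this sketch runnable.
        return generic

    class Specializer:
        # Dispatch to a type-specialized variant once a call site has
        # seen the same argument type often enough.
        THRESHOLD = 100

        def __init__(self, generic):
            self.generic = generic
            self.counts = {}        # type -> number of calls seen
            self.special = {}       # type -> specialized variant

        def __call__(self, x):
            t = type(x)
            self.counts[t] = self.counts.get(t, 0) + 1
            f = self.special.get(t)
            if f is None and self.counts[t] >= self.THRESHOLD:
                f = self.special[t] = make_specialized(self.generic, t)
            return (f or self.generic)(x)

    def double(x):
        return x + x

    double = Specializer(double)   # now collects type feedback per call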
From nas at python.ca Fri Jun 22 16:43:17 2001 From: nas at python.ca (Neil Schemenauer) Date: Fri, 22 Jun 2001 07:43:17 -0700 Subject: [Python-Dev] why not "return StopIteration"? Message-ID: <20010622074317.A22058@glacier.fnational.com> Is "raise StopIteration" an abuse of exceptions? Why can we not use "return StopIteration" to signal the end of an iterator? I've done a bit of hacking and the idea seems to work. One possible problem is that the StopIteration object in the builtin module could cause some confusing behavior. For example the code:

    for obj in __builtin__.__dict__.values():
        print obj

would not work as expected. This could be fixed in most cases by changing the tp_iternext protocol. Something like:

    int tp_iternext(PyObject *it, PyObject **item)

where the return value is 1, 0, or -1. IOW, StopIteration would not have to come into the protocol if the object implemented tp_iternext. Neil From guido at digicool.com Fri Jun 22 18:19:34 2001 From: guido at digicool.com (Guido van Rossum) Date: Fri, 22 Jun 2001 12:19:34 -0400 Subject: [Python-Dev] why not "return StopIteration"? Message-ID: <200106221619.f5MGJY306866@odiug.digicool.com> This is treated extensively in the discussion section of the iterators-PEP; quoting:

- It has been questioned whether an exception to signal the end of the iteration isn't too expensive. Several alternatives for the StopIteration exception have been proposed: a special value End to signal the end, a function end() to test whether the iterator is finished, even reusing the IndexError exception.

- A special value has the problem that if a sequence ever contains that special value, a loop over that sequence will end prematurely without any warning. If the experience with null-terminated C strings hasn't taught us the problems this can cause, imagine the trouble a Python introspection tool would have iterating over a list of all built-in names, assuming that the special End value was a built-in name!

- Calling an end() function would require two calls per iteration. Two calls is much more expensive than one call plus a test for an exception. Especially the time-critical for loop can test very cheaply for an exception.

- Reusing IndexError can cause confusion because it can be a genuine error, which would be masked by ending the loop prematurely.

I'm not sure why you are reopening this -- special terminating values are evil IMO. :-) --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at digicool.com Fri Jun 22 18:20:43 2001 From: guido at digicool.com (Guido van Rossum) Date: Fri, 22 Jun 2001 12:20:43 -0400 Subject: [Python-Dev] Python Specializing Compiler Message-ID: <200106221620.f5MGKib06875@odiug.digicool.com> Very cool, Armin! Did you announce this on c.l.py too? I wish I had time to look at this in more detail -- but please do go on developing it, and look at what others have tried... --Guido van Rossum (home page: http://www.python.org/~guido/) From barry at digicool.com Fri Jun 22 18:30:44 2001 From: barry at digicool.com (Barry A. Warsaw) Date: Fri, 22 Jun 2001 12:30:44 -0400 Subject: [Python-Dev] why not "return StopIteration"? References: <200106221619.f5MGJY306866@odiug.digicool.com> Message-ID: <15155.29364.416545.301534@anthem.wooz.org>

    >>>>> "GvR" == Guido van Rossum writes:

    | - Calling an end() function would require two calls per
    | iteration. Two calls is much more expensive than one call
    | plus a test for an exception. Especially the time-critical
    | for loop can test very cheaply for an exception.
Plus, if the exception is both raised and caught in C, it is never instantiated, so exception matching is a pointer compare. I know this isn't the case with user defined iterators (since Python's raise semantics is to instantiate the exception), but it helps. -Barry From guido at digicool.com Fri Jun 22 19:12:20 2001 From: guido at digicool.com (Guido van Rossum) Date: Fri, 22 Jun 2001 13:12:20 -0400 Subject: [Python-Dev] Python 2.0.1 released! Message-ID: <200106221712.f5MHCLF07192@odiug.digicool.com> I'm happy to announce Python 2.0.1 -- the final release of the first Python version in a long time whose license is fully compatible with the GPL: http://www.python.org/2.0.1/ I thank Moshe Zadka who did almost all of the work to make this a useful bugfix release, and then went incommunicado for several weeks. (I hope you're OK, Moshe!) Compared to the release candidate, we've fixed a few typos in the license, tweaked the documentation a bit, and fixed an indentation error in statcache.py; other than that, the release candidate was perfect. :-) Python 2.0 users should be able to replace their 2.0 installation with the 2.0.1 release without any ill effects; apart from the license change, we've only fixed bugs that didn't require us to make feature changes. The SRE package (regular expression matching, used by the "re" module) was brought in line with the version distributed with Python 2.1; this is stable feature-wise but much improved bug-wise. For the full scoop, see the release notes on SourceForge: http://sourceforge.net/project/shownotes.php?release_id=40616 Python 2.1 users can ignore this release, unless they have an urgent need for a GPL-compatible Python version and are willing to downgrade. Rest assured that we're planning a bugfix release there too: I expect that Python 2.1.1 will be released within a month, with the same GPL-compatible license. (Right, Thomas?) We don't intend to build RPMs for 2.0.1. If someone else is interested in doing so, we can link to them. --Guido van Rossum (home page: http://www.python.org/~guido/) From tim.one at home.com Fri Jun 22 19:21:03 2001 From: tim.one at home.com (Tim Peters) Date: Fri, 22 Jun 2001 13:21:03 -0400 Subject: [Python-Dev] why not "return StopIteration"? In-Reply-To: <20010622074317.A22058@glacier.fnational.com> Message-ID: [Neil Schemenauer] > Is "raise StopIteration" an abuse of exceptions? I only care whether it works . It certainly came as a surprise to me, though, that I'm going to need to fiddle PEP 255 to explain that return in a generator isn't really equivalent to raise StopIteration (because a return in the try-part of a try/except should not trigger the except-part if the generator is pumped again). While a minor wart, it's a wart. If this stands, I'm going to look into changing gen_iternext() to determine whether eval_frame() finished by raising StopIteration, and mark the iterator as done if so. That is, force "return" and "raise StopIteration" to act the same inside generators, and to force "raise StopIteration" inside a generator to truly *mean* "I'm done" in all cases. This would also allow to avoid the proposed special-casing of generators at the tail end of eval_frame() (yes, I'm anal <0.9 wink>: since it's a problem unique to generators, this simply should not be eval_frame's problem to solve -- if generators create the problem, generators should pay to solve it). > Why can we not use "return StopIteration" to signal the end of an > iterator? Just explained why not yesterday, and you did two sentences later . 
> ....
> This could be fixed in most cases by changing the tp_iternext
> protocol.  Something like:
>
>     int tp_iternext(PyObject *it, PyObject **item)
>
> where the return value is 1, 0, or -1.

Meaning 13, 42, and 666 respectively? That is, one for "error", one for "OK, and item is the next value", and one for "no error but no next value either -- this iterator terminated normally"? That could work. At one point during the development of the iterator PEP, Guido had some code like that in the internals, on *top* of the exception business. It was clumsy then because redundant. At the level of Python code, how would a user spell "end of iteration"? Would iterators need to return a two-tuple in all non-exception cases then, e.g. a (next_value, i_am_done_flag) pair? Or would Python-level iterators simply be unable to return StopIteration as a normal value?

> IOW, StopIteration would not have to come into the protocol if the
> object implemented tp_iternext.

All iterable objects in 2.2 implement tp_iternext, although sometimes it's a Miranda tp_iternext (i.e., one created for an object that doesn't supply its own), so that shouldn't be a worry. All in all, I'm -0 on changing the exception approach -- it's worked very well so far. From thomas at xs4all.net Fri Jun 22 20:02:59 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Fri, 22 Jun 2001 20:02:59 +0200 Subject: [Python-Dev] why not "return StopIteration"? In-Reply-To: References: Message-ID: <20010622200259.N8098@xs4all.nl> On Fri, Jun 22, 2001 at 01:21:03PM -0400, Tim Peters wrote: > If this stands, I'm going to look into > changing gen_iternext() to determine whether eval_frame() finished by > raising StopIteration, and mark the iterator as done if so. That is, force > "return" and "raise StopIteration" to act the same inside generators, and to > force "raise StopIteration" inside a generator to truly *mean* "I'm done" in > all cases. This would also allow to avoid the proposed special-casing of > generators at the tail end of eval_frame() (yes, I'm anal <0.9 wink>: since > it's a problem unique to generators, this simply should not be eval_frame's > problem to solve -- if generators create the problem, generators should pay > to solve it). I don't get this. Currently, (unless Just checked in his patch) generators work in exactly that way: the compiler compiles 'return' into 'raise StopIteration' if it encounters it inside a generator, and into a regular return otherwise. Why would you ask for the patch Just provided, and then change it back ? -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From tim.one at home.com Fri Jun 22 20:11:13 2001 From: tim.one at home.com (Tim Peters) Date: Fri, 22 Jun 2001 14:11:13 -0400 Subject: [Python-Dev] why not "return StopIteration"? In-Reply-To: <20010622200259.N8098@xs4all.nl> Message-ID: [Thomas Wouters] > I don't get this. Currently, (unless Just checked in his patch) > generators work in exactly that way: the compiler compiles 'return' > into 'raise StopIteration' if it encounters it inside a generator, > and into a regular return otherwise. Yes. The part about analyzing the return value inside gen_iternext() would be the only change from the status quo. > Why would you ask for the patch Just provided, and then change it back ? I wouldn't. I asked *you* for a patch (which I haven't yet applied, but will) in a different area, but Just's patch was his own initiative.
I hesitated on that one for reasons beyond just lack of time to get to it, and I'm still reluctant to accept it. My msg sketched an alternative to that patch. Note that Just has also (very recently) sketched another alternative, but on the Iterators list instead. just-isn't-in-need-of-defense-because-he-isn't-being-abused-ly y'rs - tim From fdrake at beowolf.digicool.com Fri Jun 22 20:31:44 2001 From: fdrake at beowolf.digicool.com (Fred Drake) Date: Fri, 22 Jun 2001 14:31:44 -0400 (EDT) Subject: [Python-Dev] [maintenance doc updates] Message-ID: <20010622183144.C6A5428927@beowolf.digicool.com> The development version of the documentation has been updated: http://python.sourceforge.net/maint-docs/ Lots of smallish updates and corrections, moved the license statements to an appendix. From paulp at ActiveState.com Fri Jun 22 20:37:01 2001 From: paulp at ActiveState.com (Paul Prescod) Date: Fri, 22 Jun 2001 11:37:01 -0700 Subject: [Python-Dev] ...und zen, ze world! References: <930BBCA4CEBBD411BE6500508BB3328F2E26E8@nsamcanms1.ca.baltimore.com> Message-ID: <3B33904D.F821FE36@ActiveState.com> > > Interesting that there's as much Perl as assembly code, > and more Fortran than Python :-). The Fortran is basically one big package: LAPACK. A bunch of the Python is 4Suite. If we got Red Hat to ship Zope (or even Python 2.1!) we'd improve our numbers quite a bit. :) -- Take a recipe. Leave a recipe. Python Cookbook! http://www.ActiveState.com/pythoncookbook From esr at thyrsus.com Fri Jun 22 20:46:11 2001 From: esr at thyrsus.com (Eric S. Raymond) Date: Fri, 22 Jun 2001 14:46:11 -0400 Subject: [Python-Dev] ...und zen, ze world! In-Reply-To: <3B33904D.F821FE36@ActiveState.com>; from paulp@ActiveState.com on Fri, Jun 22, 2001 at 11:37:01AM -0700 References: <930BBCA4CEBBD411BE6500508BB3328F2E26E8@nsamcanms1.ca.baltimore.com> <3B33904D.F821FE36@ActiveState.com> Message-ID: <20010622144611.A15388@thyrsus.com> Paul Prescod : > > Interesting that there's as much Perl as assembly code, > > and more Fortran than Python :-). > > The Fortran is basically one big package: LAPACK. A bunch of the Python > is 4Suite. If we got Red Hat to ship Zope (or even Python 2.1!) we'd > improve our numbers quite a bit. :) I'm working on it. -- Eric S. Raymond The whole of the Bill [of Rights] is a declaration of the right of the people at large or considered as individuals... It establishes some rights of the individual as unalienable and which consequently, no majority has a right to deprive them of. -- Albert Gallatin, Oct 7 1789 From fdrake at beowolf.digicool.com Fri Jun 22 20:53:37 2001 From: fdrake at beowolf.digicool.com (Fred Drake) Date: Fri, 22 Jun 2001 14:53:37 -0400 (EDT) Subject: [Python-Dev] [development doc updates] Message-ID: <20010622185337.BE51228927@beowolf.digicool.com> The development version of the documentation has been updated: http://python.sourceforge.net/devel-docs/ Lots of smallish updates and corrections, moved the license statements to an appendix. This version includes some contributed changes to the documentation for the cmath module. To make the LaTeX to HTML conversion work, I have made the resulting HTML contain entity references for the "plus/minus" and "infinity" symbols (± and ∞, respectively). These may be problematic for some browsers. Please let me know how it looks on your browser by sending an email to python-docs at python.org. Be sure to state your browser name and version, and what operating system you are using. Thanks! 
http://python.sourceforge.net/devel-docs/lib/module-cmath.html From nas at python.ca Fri Jun 22 22:13:14 2001 From: nas at python.ca (Neil Schemenauer) Date: Fri, 22 Jun 2001 13:13:14 -0700 Subject: [Python-Dev] why not "return StopIteration"? In-Reply-To: <200106221619.f5MGJY306866@odiug.digicool.com>; from guido@digicool.com on Fri, Jun 22, 2001 at 12:19:34PM -0400 References: <200106221619.f5MGJY306866@odiug.digicool.com> Message-ID: <20010622131314.A22978@glacier.fnational.com> Guido van Rossum wrote: > This is treated extensively in the discussion section of the > iterators-PEP Ah. I don't remember reading that part or seeing the discussion. Sorry I brought it up. Neil From fdrake at beowolf.digicool.com Fri Jun 22 22:52:48 2001 From: fdrake at beowolf.digicool.com (Fred Drake) Date: Fri, 22 Jun 2001 16:52:48 -0400 (EDT) Subject: [Python-Dev] [development doc updates] Message-ID: <20010622205248.6290128927@beowolf.digicool.com> The development version of the documentation has been updated: http://python.sourceforge.net/devel-docs/ Changed the revised cmath documentation to use "j" as a suffix for complex literals instead of using "i" as a prefix; this is more similar to Python. Changed the font of the suffix to match that used elsewhere in the documentation. This should be a little more readable, but does not change any potential browser compatibility issues, so I still need reports of compatibility or non-compatibility. See my prelimiary report on the topic at: http://mail.python.org/pipermail/doc-sig/2001-June/001940.html From arigo at ulb.ac.be Sat Jun 23 10:13:04 2001 From: arigo at ulb.ac.be (Armin Rigo) Date: Sat, 23 Jun 2001 10:13:04 +0200 (MET DST) Subject: [Python-Dev] Python Specializing Compiler In-Reply-To: <20010622071846.A7014@craie.housenet> Message-ID: Hello Jeff, On Fri, 22 Jun 2001, Jeff Epler wrote: > What are you using to generate code? I am generating pseudo-code, which is interpreted by a C module. (With real assembler code, it would of course be much faster, but it was just simpler for the moment.) > How would you compare the > sophistication of your type inference system to the one I've outlined > above? Yours is much more complete, but runs statically. Mine works at run-time. As explained in detail in the readme file, my plan is not to make a "compiler" in the usual sense. I actually have no type inferences; I just collect at run time what types are used at what places, and generate (and possibly modify) the generated code according to that information. (More about it later.) A bientot, Armin. From tim.one at home.com Sat Jun 23 11:17:54 2001 From: tim.one at home.com (Tim Peters) Date: Sat, 23 Jun 2001 05:17:54 -0400 Subject: [Python-Dev] PEP 255: Simple Generators, Revised Posting In-Reply-To: Message-ID: Major revision: more details about exceptions, return vs StopIteration, and interactions with try/except/finally; more Q&A; and a BDFL Pronouncement. The reference implementation appears solid and works as described here in all respects, so I expect this will be the last major revision (and so also last full posting) of this PEP. The output below is in ndiff format (see Tools/scripts/ndiff.py in your Python distribution). Just the new text can be seen in HTML form here: http://python.sf.net/peps/pep-0255.html "Feature discussions" should take place primarily on the Python Iterators list: mailto:python-iterators at lists.sourceforge.net Implementation discussions may wander in and out of Python-Dev too. 
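As an aside (not part of Tim's posting): the ndiff format used below can also be produced with the standard difflib module, which implements the same algorithm as Tools/scripts/ndiff.py. A small sketch, using the PEP's own Post-History change as the input:

    import difflib
    a = ["Post-History: 14-Jun-2001\n"]
    b = ["Post-History: 14-Jun-2001, 23-Jun-2001\n"]
    for line in difflib.ndiff(a, b):
        print line,   # emits "- ", "+ " and "? " guide lines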
PEP: 255 Title: Simple Generators - Version: $Revision: 1.3 $ ? ^ + Version: $Revision: 1.12 $ ? ^^ Author: nas at python.ca (Neil Schemenauer), tim.one at home.com (Tim Peters), magnus at hetland.org (Magnus Lie Hetland) Discussion-To: python-iterators at lists.sourceforge.net Status: Draft Type: Standards Track Requires: 234 Created: 18-May-2001 Python-Version: 2.2 - Post-History: 14-Jun-2001 + Post-History: 14-Jun-2001, 23-Jun-2001 ? +++++++++++++ Abstract This PEP introduces the concept of generators to Python, as well as a new statement used in conjunction with them, the "yield" statement. Motivation When a producer function has a hard enough job that it requires maintaining state between values produced, most programming languages offer no pleasant and efficient solution beyond adding a callback function to the producer's argument list, to be called with each value produced. For example, tokenize.py in the standard library takes this approach: the caller must pass a "tokeneater" function to tokenize(), called whenever tokenize() finds the next token. This allows tokenize to be coded in a natural way, but programs calling tokenize are typically convoluted by the need to remember between callbacks which token(s) were seen last. The tokeneater function in tabnanny.py is a good example of that, maintaining a state machine in global variables, to remember across callbacks what it has already seen and what it hopes to see next. This was difficult to get working correctly, and is still difficult for people to understand. Unfortunately, that's typical of this approach. An alternative would have been for tokenize to produce an entire parse of the Python program at once, in a large list. Then tokenize clients could be written in a natural way, using local variables and local control flow (such as loops and nested if statements) to keep track of their state. But this isn't practical: programs can be very large, so no a priori bound can be placed on the memory needed to materialize the whole parse; and some tokenize clients only want to see whether something specific appears early in the program (e.g., a future statement, or, as is done in IDLE, just the first indented statement), and then parsing the whole program first is a severe waste of time. Another alternative would be to make tokenize an iterator[1], delivering the next token whenever its .next() method is invoked. This is pleasant for the caller in the same way a large list of results would be, but without the memory and "what if I want to get out early?" drawbacks. However, this shifts the burden on tokenize to remember *its* state between .next() invocations, and the reader need only glance at tokenize.tokenize_loop() to realize what a horrid chore that would be. Or picture a recursive algorithm for producing the nodes of a general tree structure: to cast that into an iterator framework requires removing the recursion manually and maintaining the state of the traversal by hand. A fourth option is to run the producer and consumer in separate threads. This allows both to maintain their states in natural ways, and so is pleasant for both. Indeed, Demo/threads/Generator.py in the Python source distribution provides a usable synchronized-communication class for doing that in a general way. This doesn't work on platforms without threads, though, and is very slow on platforms that do (compared to what is achievable without threads). 
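To make the thread option concrete, here is a rough sketch of that approach, using the standard Queue module for the synchronized hand-off (a sketch only, with details invented for illustration; Demo/threads/Generator.py does this more generally):

        import Queue, threading

        def produce(queue):
            # the producer keeps its state in plain locals...
            a, b = 0, 1
            while 1:
                queue.put(b)     # ...and blocks here until the consumer is ready
                a, b = b, a+b

        queue = Queue.Queue(1)   # hand off one value at a time
        worker = threading.Thread(target=produce, args=(queue,))
        worker.setDaemon(1)      # don't keep the interpreter alive at exit
        worker.start()
        for i in range(6):
            print queue.get(),   # prints: 1 1 2 3 5 8

Both sides read naturally, but every value delivered pays for two thread switches, which is the cost referred to above.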
A final option is to use the Stackless[2][3] variant implementation of Python instead, which supports lightweight coroutines. This has much the same programmatic benefits as the thread option, but is much more efficient. However, Stackless is a controversial rethinking of the Python core, and it may not be possible for Jython to implement the same semantics. This PEP isn't the place to debate that, so suffice it to say here that generators provide a useful subset of Stackless functionality in a way that fits easily into the current CPython implementation, and is believed to be relatively straightforward for other Python implementations. That exhausts the current alternatives. Some other high-level languages provide pleasant solutions, notably iterators in Sather[4], which were inspired by iterators in CLU; and generators in Icon[5], a novel language where every expression "is a generator". There are differences among these, but the basic idea is the same: provide a kind of function that can return an intermediate result ("the next value") to its caller, but maintaining the function's local state so that the function can be resumed again right where it left off. A very simple example: def fib(): a, b = 0, 1 while 1: yield b a, b = b, a+b When fib() is first invoked, it sets a to 0 and b to 1, then yields b back to its caller. The caller sees 1. When fib is resumed, from its point of view the yield statement is really the same as, say, a print statement: fib continues after the yield with all local state intact. a and b then become 1 and 1, and fib loops back to the yield, yielding 1 to its invoker. And so on. From fib's point of view it's just delivering a sequence of results, as if via callback. But from its caller's point of view, the fib invocation is an iterable object that can be resumed at will. As in the thread approach, this allows both sides to be coded in the most natural ways; but unlike the thread approach, this can be done efficiently and on all platforms. Indeed, resuming a generator should be no more expensive than a function call. The same kind of approach applies to many producer/consumer functions. For example, tokenize.py could yield the next token instead of invoking a callback function with it as argument, and tokenize clients could iterate over the tokens in a natural way: a Python generator is a kind of Python iterator[1], but of an especially powerful kind. - Specification + Specification: Yield ? ++++++++ A new statement is introduced: yield_stmt: "yield" expression_list "yield" is a new keyword, so a future statement[8] is needed to phase - this in. [XXX spell this out] + this in. [XXX spell this out -- but new keywords have ripple effects + across tools too, and it's not clear this can be forced into the future + framework at all -- it's not even clear that Python's parser alone can + be taught to swing both ways based on a future stmt] The yield statement may only be used inside functions. A function that - contains a yield statement is called a generator function. + contains a yield statement is called a generator function. A generator ? +++++++++++++ + function is an ordinary function object in all respects, but has the + new CO_GENERATOR flag set in the code object's co_flags member. When a generator function is called, the actual arguments are bound to function-local formal argument names in the usual way, but no code in the body of the function is executed. 
Instead a generator-iterator object is returned; this conforms to the iterator protocol[6], so in particular can be used in for-loops in a natural way. Note that when the intent is clear from context, the unqualified name "generator" may be used to refer either to a generator-function or a generator- iterator. Each time the .next() method of a generator-iterator is invoked, the code in the body of the generator-function is executed until a yield or return statement (see below) is encountered, or until the end of the body is reached. If a yield statement is encountered, the state of the function is frozen, and the value of expression_list is returned to .next()'s caller. By "frozen" we mean that all local state is retained, including the current bindings of local variables, the instruction pointer, and the internal evaluation stack: enough information is saved so that the next time .next() is invoked, the function can proceed exactly as if the yield statement were just another external call. + Restriction: A yield statement is not allowed in the try clause of a + try/finally construct. The difficulty is that there's no guarantee + the generator will ever be resumed, hence no guarantee that the finally + block will ever get executed; that's too much a violation of finally's + purpose to bear. + + + Specification: Return + A generator function can also contain return statements of the form: "return" Note that an expression_list is not allowed on return statements in the body of a generator (although, of course, they may appear in the bodies of non-generator functions nested within the generator). - When a return statement is encountered, nothing is returned, but a + When a return statement is encountered, control proceeds as in any + function return, executing the appropriate finally clauses (if any - StopIteration exception is raised, signalling that the iterator is ? ------------ + exist). Then a StopIteration exception is raised, signalling that the ? ++++++++++++++++ - exhausted. The same is true if control flows off the end of the + iterator is exhausted. A StopIteration exception is also raised if + control flows off the end of the generator without an explicit return. + - function. Note that return means "I'm done, and have nothing ? ----------- + Note that return means "I'm done, and have nothing interesting to ? +++++++++++++++ - interesting to return", for both generator functions and non-generator ? --------------- + return", for both generator functions and non-generator functions. ? +++++++++++ - functions. + + Note that return isn't always equivalent to raising StopIteration: the + difference lies in how enclosing try/except constructs are treated. + For example,
+
+     >>> def f1():
+     ...     try:
+     ...         return
+     ...     except:
+     ...         yield 1
+     >>> print list(f1())
+     []
+
+ because, as in any function, return simply exits, but
+
+     >>> def f2():
+     ...     try:
+     ...         raise StopIteration
+     ...     except:
+     ...         yield 42
+     >>> print list(f2())
+     [42]
+
+ because StopIteration is captured by a bare "except", as is any + exception. + + + Specification: Generators and Exception Propagation + + If an unhandled exception-- including, but not limited to, + StopIteration --is raised by, or passes through, a generator function, + then the exception is passed on to the caller in the usual way, and + subsequent attempts to resume the generator function raise + StopIteration. In other words, an unhandled exception terminates a + generator's useful life.
+ + Example (not idiomatic but to illustrate the point):
+
+     >>> def f():
+     ...     return 1/0
+     >>> def g():
+     ...     yield f()  # the zero division exception propagates
+     ...     yield 42   # and we'll never get here
+     >>> k = g()
+     >>> k.next()
+     Traceback (most recent call last):
+       File "<stdin>", line 1, in ?
+       File "<stdin>", line 2, in g
+       File "<stdin>", line 2, in f
+     ZeroDivisionError: integer division or modulo by zero
+     >>> k.next()  # and the generator cannot be resumed
+     Traceback (most recent call last):
+       File "<stdin>", line 1, in ?
+     StopIteration
+     >>>
+
+ + Specification: Try/Except/Finally + + As noted earlier, yield is not allowed in the try clause of a try/ + finally construct. A consequence is that generators should allocate + critical resources with great care. There is no restriction on yield + otherwise appearing in finally clauses, except clauses, or in the try + clause of a try/except construct:
+
+     >>> def f():
+     ...     try:
+     ...         yield 1
+     ...         try:
+     ...             yield 2
+     ...             1/0
+     ...             yield 3  # never get here
+     ...         except ZeroDivisionError:
+     ...             yield 4
+     ...             yield 5
+     ...             raise
+     ...         except:
+     ...             yield 6
+     ...         yield 7     # the "raise" above stops this
+     ...     except:
+     ...         yield 8
+     ...     yield 9
+     ...     try:
+     ...         x = 12
+     ...     finally:
+     ...         yield 10
+     ...     yield 11
+     >>> print list(f())
+     [1, 2, 4, 5, 8, 9, 10, 11]
+     >>>
Example

    # A binary tree class.
    class Tree:

        def __init__(self, label, left=None, right=None):
            self.label = label
            self.left = left
            self.right = right

        def __repr__(self, level=0, indent="    "):
            s = level*indent + `self.label`
            if self.left:
                s = s + "\n" + self.left.__repr__(level+1, indent)
            if self.right:
                s = s + "\n" + self.right.__repr__(level+1, indent)
            return s

        def __iter__(self):
            return inorder(self)

    # Create a Tree from a list.
    def tree(list):
        n = len(list)
        if n == 0:
            return []
        i = n / 2
        return Tree(list[i], tree(list[:i]), tree(list[i+1:]))

    # A recursive generator that generates Tree leaves in in-order.
    def inorder(t):
        if t:
            for x in inorder(t.left):
                yield x
            yield t.label
            for x in inorder(t.right):
                yield x

    # Show it off: create a tree.
    t = tree("ABCDEFGHIJKLMNOPQRSTUVWXYZ")
    # Print the nodes of the tree in in-order.
    for x in t:
        print x,
    print

    # A non-recursive generator.
    def inorder(node):
        stack = []
        while node:
            while node.left:
                stack.append(node)
                node = node.left
            yield node.label
            while not node.right:
                try:
                    node = stack.pop()
                except IndexError:
                    return
                yield node.label
            node = node.right

    # Exercise the non-recursive generator.
    for x in t:
        print x,
    print

+ Both output blocks display: + + A B C D E F G H I J K L M N O P Q R S T U V W X Y Z + Q & A + Q. Why not a new keyword instead of reusing "def"? + + A. See BDFL Pronouncements section below. + - Q. Why a new keyword? Why not a builtin function instead? + Q. Why a new keyword for "yield"? Why not a builtin function instead? ? ++++++++++++ A. Control flow is much better expressed via keyword in Python, and yield is a control construct. It's also believed that efficient implementation in Jython requires that the compiler be able to determine potential suspension points at compile-time, and a new - keyword makes that easy. + keyword makes that easy. The CPython reference implementation also + exploits it heavily, to detect which functions *are* generator- + functions (although a new keyword in place of "def" would solve that + for CPython -- but people asking the "why a new keyword?" question + don't want any new keyword). + + Q: Then why not some other special syntax without a new keyword?
For + example, one of these instead of "yield 3": + + return 3 and continue + return and continue 3 + return generating 3 + continue return 3 + return >> , 3 + from generator return 3 + return >> 3 + return << 3 + >> 3 + << 3 + + A: Did I miss one ? Out of hundreds of messages, I counted two + suggesting such an alternative, and extracted the above from them. + It would be nice not to need a new keyword, but nicer to make yield + very clear -- I don't want to have to *deduce* that a yield is + occurring from making sense of a previously senseless sequence of + keywords or operators. Still, if this attracts enough interest, + proponents should settle on a single consensus suggestion, and Guido + will Pronounce on it. + + Q. Why allow "return" at all? Why not force termination to be spelled + "raise StopIteration"? + + A. The mechanics of StopIteration are low-level details, much like the + mechanics of IndexError in Python 2.1: the implementation needs to + do *something* well-defined under the covers, and Python exposes + these mechanisms for advanced users. That's not an argument for + forcing everyone to work at that level, though. "return" means "I'm + done" in any kind of function, and that's easy to explain and to use. + Note that "return" isn't always equivalent to "raise StopIteration" + in try/except construct, either (see the "Specification: Return" + section). + + Q. Then why not allow an expression on "return" too? + + A. Perhaps we will someday. In Icon, "return expr" means both "I'm + done", and "but I have one final useful value to return too, and + this is it". At the start, and in the absence of compelling uses + for "return expr", it's simply cleaner to use "yield" exclusively + for delivering values. + + + BDFL Pronouncements + + Issue: Introduce another new keyword (say, "gen" or "generator") in + place of "def", or otherwise alter the syntax, to distinguish + generator-functions from non-generator functions. + + Con: In practice (how you think about them), generators *are* + functions, but with the twist that they're resumable. The mechanics of + how they're set up is a comparatively minor technical issue, and + introducing a new keyword would unhelpfully overemphasize the + mechanics of how generators get started (a vital but tiny part of a + generator's life). + + Pro: In reality (how you think about them), generator-functions are + actually factory functions that produce generator-iterators as if by + magic. In this respect they're radically different from non-generator + functions, acting more like a constructor than a function, so reusing + "def" is at best confusing. A "yield" statement buried in the body is + not enough warning that the semantics are so different. + + BDFL: "def" it stays. No argument on either side is totally + convincing, so I have consulted my language designer's intuition. It + tells me that the syntax proposed in the PEP is exactly right - not too + hot, not too cold. But, like the Oracle at Delphi in Greek mythology, + it doesn't tell me why, so I don't have a rebuttal for the arguments + against the PEP syntax. The best I can come up with (apart from + agreeing with the rebuttals ... already made) is "FUD". If this had + been part of the language from day one, I very much doubt it would have + made Andrew Kuchling's "Python Warts" page. Reference Implementation - A preliminary patch against the CVS Python source is available[7]. 
+ The current implementation, in a preliminary state (no docs and no + focused tests), is part of Python's CVS development tree[9]. + Using this requires that you build Python from source. + + This was derived from an earlier patch by Neil Schemenauer[7]. Footnotes and References [1] PEP 234, http://python.sf.net/peps/pep-0234.html [2] http://www.stackless.com/ [3] PEP 219, http://python.sf.net/peps/pep-0219.html [4] "Iteration Abstraction in Sather" Murer, Omohundro, Stoutamire and Szyperski http://www.icsi.berkeley.edu/~sather/Publications/toplas.html [5] http://www.cs.arizona.edu/icon/ [6] The concept of iterators is described in PEP 234 http://python.sf.net/peps/pep-0234.html [7] http://python.ca/nas/python/generator.diff [8] http://python.sf.net/peps/pep-0236.html + [9] To experiment with this implementation, check out Python from CVS + according to the instructions at + http://sf.net/cvs/?group_id=5470 Copyright This document has been placed in the public domain. Local Variables: mode: indented-text indent-tabs-mode: nil End: From mal at lemburg.com Sat Jun 23 12:54:27 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Sat, 23 Jun 2001 12:54:27 +0200 Subject: [Python-Dev] Python Specializing Compiler References: Message-ID: <3B347563.9BBEF858@lemburg.com> Armin Rigo wrote: > > Hello Jeff, > > On Fri, 22 Jun 2001, Jeff Epler wrote: > > What are you using to generate code? > > I am generating pseudo-code, which is interpreted by a C module. (With > real assembler code, it would of course be much faster, but it was just > simpler for the moment.) > > > How would you compare the > > sophistication of your type inference system to the one I've outlined > > above? > > Yours is much more complete, but runs statically. Mine works at run-time. > As explained in detail in the readme file, my plan is not to make a > "compiler" in the usual sense. I actually have no type inference; I just > collect at run time what types are used at what places, and generate (and > possibly modify) the generated code according to that information. Sounds like you are using (re)compiling on-the-fly -- that would certainly be a very reasonable way to deal with Python's dynamic object world. It would also solve the problems of static compilers with type inference nicely. A very nice idea! -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From skip at pobox.com Sat Jun 23 16:11:03 2001 From: skip at pobox.com (Skip Montanaro) Date: Sat, 23 Jun 2001 09:11:03 -0500 Subject: [Python-Dev] Python Specializing Compiler In-Reply-To: <3B347563.9BBEF858@lemburg.com> References: <3B347563.9BBEF858@lemburg.com> Message-ID: <15156.41847.86431.594106@beluga.mojam.com> mal> Sounds like you are using (re)compiling on-the-fly ... This is what the Self compiler did, though I don't know if its granularity was as fine as I understand psyco's is from reading its README file. It's been a while since I read through that stuff, but I seem to recall it would compile functions to machine code only if they were heavily executed. It also did a lot of type inferencing. Skip From guido at digicool.com Sat Jun 23 17:58:40 2001 From: guido at digicool.com (Guido van Rossum) Date: Sat, 23 Jun 2001 11:58:40 -0400 Subject: [Python-Dev] Python Specializing Compiler In-Reply-To: Your message of "Sat, 23 Jun 2001 10:13:04 +0200."
References: Message-ID: <20010623160024.QWCF14539.femail14.sdc1.sfba.home.com@cj20424-a.reston1.va.home.com> > I am generating pseudo-code, which is interpreted by a C module. (With > real assembler code, it would of course be much faster, but it was just > simpler for the moment.) This has great promise! Once you have an interpreter for some kind of pseudo-code, it's always possible to tweak the interpreter or the pseudo-code to make it faster. And you can make another jump to machine code to make it a lot faster. There was a project (p2c or python2c) that tried to compile an entire Python program to C code that was mostly just calling the Python runtime C API functions. It also obtained about a factor of 2 in speed-up, but its problem was (if I recall) that even a small Python module translated into hundreds of thousands of lines of C -- think what that would do to locality. Since you have already obtained the same speedup with your approach, I think there's great promise. Count on sending in a paper for the next Python conference! > > How would you compare the > > sophistication of your type inference system to the one I've outlined > > above? > > Yours is much more complete, but runs statically. Mine works at run-time. > As explained in detail in the readme file, my plan is not to make a > "compiler" in the usual sense. I actually have no type inference; I just > collect at run time what types are used at what places, and generate (and > possibly modify) the generated code according to that information. Very cool: a Python JIT compiler. > (More about it later.) Can't wait! --Guido van Rossum (home page: http://www.python.org/~guido/)
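For a feel of what Armin is describing -- in toy form only; the names below are invented here, and this is not psyco's actual machinery -- a call site can be dispatched to a version specialized for the argument types actually observed at run time:

    from __future__ import nested_scopes  # needed on Python 2.1

    def make_specializing(func, compile_for):
        # compile_for(func, types) stands in for the run-time code
        # generator; a do-nothing stand-in is: lambda f, types: f
        cache = {}
        def call(*args):
            key = tuple(map(type, args))
            impl = cache.get(key)
            if impl is None:
                # first time this type signature shows up: specialize once
                impl = cache[key] = compile_for(func, key)
            return impl(*args)
        return call

The point of the sketch is only the dispatch-by-observed-types idea: specialization happens lazily, per type signature, and the cache can later be invalidated or regenerated as new types appear.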
From fdrake at beowolf.digicool.com Sun Jun 24 04:41:04 2001 From: fdrake at beowolf.digicool.com (Fred Drake) Date: Sat, 23 Jun 2001 22:41:04 -0400 (EDT) Subject: [Python-Dev] [development doc updates] Message-ID: <20010624024104.A757728927@beowolf.digicool.com> The development version of the documentation has been updated: http://python.sourceforge.net/devel-docs/ A couple of small updates, including spelling the keywords correctly in the language reference. This version brings back the hyperlinked grammar productions I played around with earlier. They still need work, but they are somewhat better than plain text. From m.favas at per.dem.csiro.au Sun Jun 24 06:25:27 2001 From: m.favas at per.dem.csiro.au (Mark Favas) Date: Sun, 24 Jun 2001 12:25:27 +0800 Subject: [Python-Dev] Problems with socketmodule (getnameinfo & getaddrinfo) Message-ID: <3B356BB7.9BE71569@per.dem.csiro.au> Socketmodule at the moment has multiple problems after the changes to handle IPv6: 1: socketmodule.c now #includes getnameinfo.c and getaddrinfo.c. These functions both use offsetof(), which is defined (on my system, at least) in stddef.h. The #include for this file is inside a #if 0 block. 2: #including this file allows the compile to complete without error. However, there is no Makefile dependency on these two files, once socketmodule.o has been built. Changes to either of the get{name,addr}info.c files will not cause socketmodule to be rebuilt. 3: The socket module still does not work, however, since it refers to an unresolved symbol inet_pton

    >>> import socket
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
      File "/home/gonzo1/mark/groucho1/mark/src/python/CVS/python/dist/src/Lib/socket.py", line 41, in ?
        from _socket import *
    ImportError: Unresolved symbol in /home/gonzo1/mark/groucho1/mark/src/python/CVS/python/dist/src/build/lib.osf1-V4.0-alpha-2.2/_socket.so: inet_pton

inet_pton is called in two places in getaddrinfo.c... there's likely to be other platforms besides Tru64 Unix that do not have this function. -- Mark Favas - m.favas at per.dem.csiro.au CSIRO, Private Bag No 5, Wembley, Western Australia 6913, AUSTRALIA From tim.one at home.com Sun Jun 24 06:48:32 2001 From: tim.one at home.com (Tim Peters) Date: Sun, 24 Jun 2001 00:48:32 -0400 Subject: [Python-Dev] Problems with socketmodule (getnameinfo & getaddrinfo) In-Reply-To: <3B356BB7.9BE71569@per.dem.csiro.au> Message-ID: [Mark Favas] > Socketmodule at the moment has multiple problems after the changes to > handle IPv6: > > 1: > socketmodule.c now #includes getnameinfo.c and getaddrinfo.c. These > functions both use offsetof(), which is defined (on my system, at least) > in stddef.h. The #include for this file is inside a #if 0 block. > > 2: > #including this file allows the compile to complete without error. > However, there is no Makefile dependency on these two files, once > socketmodule.o has been built. Changes to either of the > get{name,addr}info.c files will not cause socketmodule to be rebuilt. > > 3: > The socket module still does not work, however, since it refers to an > unresolved symbol inet_pton > >>> import socket > Traceback (most recent call last): > File "<stdin>", line 1, in ? > File > "/home/gonzo1/mark/groucho1/mark/src/python/CVS/python/dist/src/Li > b/socket.py", > line 41, in ? > from _socket import * > ImportError: Unresolved symbol in > /home/gonzo1/mark/groucho1/mark/src/python/CVS/python/dist/src/bui > ld/lib.osf1-V4.0-alpha-2.2/_socket.so: > inet_pton > > inet_pton is called in two places in getaddrinfo.c... there's likely to > be other platforms besides Tru64 Unix that do not have this function.
If it's any consolation, the Windows build is in worse shape:

    socketmodule.c
    Modules\addrinfo.h(123) : error C2632: 'long' followed by 'long' is illegal
    Modules\addrinfo.h(125) : error C2632: 'long' followed by 'long' is illegal
    Modules\addrinfo.h(129) : error C2632: 'long' followed by 'long' is illegal
    Modules\addrinfo.h(129) : error C2632: 'long' followed by 'long' is illegal
    Modules\getaddrinfo.c(109) : warning C4013: 'offsetof' undefined; assuming extern returning int
    Modules\getaddrinfo.c(109) : error C2143: syntax error : missing ')' before 'type'
    Modules\getaddrinfo.c(109) : error C2099: initializer is not a constant
    Modules\getaddrinfo.c(109) : error C2059: syntax error : ')'
    Modules\getaddrinfo.c(111) : error C2059: syntax error : ','
    Modules\getaddrinfo.c(407) : warning C4013: 'inet_pton' undefined; assuming extern returning int
    Modules\getaddrinfo.c(414) : warning C4013: 'IN_MULTICAST' undefined; assuming extern returning int
    Modules\getaddrinfo.c(414) : warning C4013: 'IN_EXPERIMENTAL' undefined; assuming extern returning int
    Modules\getaddrinfo.c(417) : error C2065: 'IN_LOOPBACKNET' : undeclared identifier
    Modules\getaddrinfo.c(417) : warning C4018: '==' : signed/unsigned mismatch
    Modules\getaddrinfo.c(531) : error C2373: 'WSAGetLastError' : redefinition; different type modifiers
    C:\VC98\INCLUDE\winsock.h(787) : see declaration of 'WSAGetLastError'
    Modules\getnameinfo.c(66) : error C2143: syntax error : missing ')' before 'type'
    Modules\getnameinfo.c(66) : error C2099: initializer is not a constant
    Modules\getnameinfo.c(66) : error C2059: syntax error : ')'
    Modules\getnameinfo.c(67) : error C2059: syntax error : ','
    Modules\getnameinfo.c(133) : warning C4013: 'snprintf' undefined; assuming extern returning int
    Modules\getnameinfo.c(153) : warning C4018: '==' : signed/unsigned mismatch
    Modules\getnameinfo.c(167) : warning C4013: 'inet_ntop' undefined; assuming extern returning int
    Modules\getnameinfo.c(168) : warning C4047: '==' : 'int ' differs in levels of indirection from 'void *'
    Modules\getnameinfo.c(200) : warning C4047: '==' : 'int ' differs in levels of indirection from 'void *'

Martin should revert the changes to socketmodule.c until this has a prayer of working. From est at hyperreal.org Sun Jun 24 07:38:06 2001 From: est at hyperreal.org (est at hyperreal.org) Date: Sat, 23 Jun 2001 22:38:06 -0700 (PDT) Subject: [Python-Dev] Python Specializing Compiler In-Reply-To: "from Armin Rigo at Jun 22, 2001 01:00:34 pm" Message-ID: <20010624053806.16277.qmail@hyperreal.org> Am I seeing things or does it actually speed up five to six times on my machine? Very exciting!

    timing specializing_call(, 2000)... result 1952145856 in 4.94 seconds
    timing specializing_call(, 2000)... result 1952145856 in 3.91 seconds
    timing f(2000,)... result 1952145856 in 25.17 seconds

I wonder to what extent this approach can be applied to method calls. My analysis of my performance-bound Python apps convinces me that those are a major bottleneck for me. About a fifth of their time seems to go into creating the bound method object (reducible by caching them on the instance)... another fifth into allocating the memory for the frame object (ameliorated by pymalloc). As for the rest, I really don't know. E
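The bound-method cost est mentions is easy to see in a sketch (the class and loops below are invented purely for illustration): each attribute access obj.visit builds a fresh bound method object, and hoisting it out of the loop is the caching idea.

    class Walker:
        def visit(self, item):
            return item

    def tally_slow(obj, items):
        n = 0
        for item in items:
            n = n + obj.visit(item)   # creates a bound method per iteration
        return n

    def tally_fast(obj, items):
        visit = obj.visit             # create the bound method once
        n = 0
        for item in items:
            n = n + visit(item)
        return n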
From martin at loewis.home.cs.tu-berlin.de Sun Jun 24 10:34:06 2001 From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Sun, 24 Jun 2001 10:34:06 +0200 Subject: [Python-Dev] gethostbyname2 Message-ID: <200106240834.f5O8Y6t01609@mira.informatik.hu-berlin.de> The IPv6 patch proposes to introduce a new socket function, socket.gethostbyname2(name, af). This becomes necessary as a name might have both an IPv4 and an IPv6 address. One alternative for providing such an API is to give socket.gethostbyname an optional second argument (the address family). itojun's rationale for calling it gethostbyname2 is that it matches the C API, as defined in RFC 2133. Which of these alternatives would you prefer? Regards, Martin
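For illustration only, the optional-argument flavor could be emulated in pure Python on top of the getaddrinfo call the patch introduces (the real change would be in C; the wrapper here is just a sketch):

    import socket

    def gethostbyname(name, af=socket.AF_INET):
        # getaddrinfo returns (family, socktype, proto, canonname,
        # sockaddr) tuples; take the first address of the requested family
        res = socket.getaddrinfo(name, None, af, socket.SOCK_STREAM)
        return res[0][4][0]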
From martin at loewis.home.cs.tu-berlin.de Sun Jun 24 10:20:31 2001 From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Sun, 24 Jun 2001 10:20:31 +0200 Subject: [Python-Dev] IPv6 and Windows Message-ID: <200106240820.f5O8KVN01435@mira.informatik.hu-berlin.de> After integrating the first chunk of IPv6 changes, Tim Peters quickly found that they won't compile on Windows - even though this was the least-critical part of the patch. Specifically, this code emulates the getaddrinfo and getnameinfo calls, which will be exposed to Python programs in a later patch. Therefore, it is essential that they are available on every system, either directly or through emulation. For Windows, one option is to use the Microsoft-provided emulation, which is available from http://msdn.microsoft.com/downloads/sdks/platform/tpipv6.asp To use this emulation, only the header files of the package are required; it is not necessary to actually install the IPv6 preview on the system. The MS emulation will try to load a few DLLs which are known to provide getaddrinfo. If none of these DLLs is found, the code in the header file falls back to an emulation. That way, the resulting socket.pyd would use the true API functions on installations that provide them, and the emulation on all other systems. The only requirement for building Python is then that the header file from the technology preview is available on the build machine (tpipv6.h). It may be that the header file is also included in recent SDK releases; I haven't checked. Is such a requirement acceptable for building the socket module on Windows? Regards, Martin From m.favas at per.dem.csiro.au Sun Jun 24 10:58:42 2001 From: m.favas at per.dem.csiro.au (Mark Favas) Date: Sun, 24 Jun 2001 16:58:42 +0800 Subject: [Python-Dev] IPv6 support Message-ID: <3B35ABC2.11F3B261@per.dem.csiro.au> IPv6 support may be nice, and even desirable. However, supporting IPv6 should not come at the cost of causing problems either in compilation or at runtime on those platforms that do not support IPv6 natively. Requiring additional preview code or non-standardly-supplied packages to be installed is fine if people _want_ to take advantage of the new IPv6 functionality, but _not_ fine if this IPv6 functionality is not required. IPv4 support should not require the installation of additional IPv6 packages. Well, that's my 2 cents' worth (even if that's only 1 cent US). -- Mark Favas - m.favas at per.dem.csiro.au CSIRO, Private Bag No 5, Wembley, Western Australia 6913, AUSTRALIA From pf at artcom-gmbh.de Sun Jun 24 11:20:10 2001 From: pf at artcom-gmbh.de (Peter Funk) Date: Sun, 24 Jun 2001 11:20:10 +0200 (MEST) Subject: foobar2(), foobar3(), ... (was Re: [Python-Dev] gethostbyname2) In-Reply-To: <200106240834.f5O8Y6t01609@mira.informatik.hu-berlin.de> from "Martin v. Loewis" at "Jun 24, 2001 10:34:06 am" Message-ID: Martin v. Loewis: > The IPv6 patch proposes to introduce a new socket function, > socket.gethostbyname2(name, af). This becomes necessary as a name > might have both an IPv4 and an IPv6 address. > > One alternative for providing such an API is to give socket.gethostbyname > an optional second argument (the address family). itojun's rationale > for calling it gethostbyname2 is that it matches the C API, as defined > in RFC 2133. > > Which of these alternatives would you prefer? IMO: The possibility to add new keyword arguments with default values is one of the major strengths Python has compared to other programming languages. Especially in the scenario where an existing mature API has to be enhanced later with added features: In such a situation I always prefer APIs with fewer functions (maybe with large lists of optional arguments) compared to APIs containing a bunch of functions or methods called 'popen2()', 'gethostbyname2()' and so on. Regards, Peter -- Peter Funk, Oldenburger Str.86, D-27777 Ganderkesee, Germany, Fax:+49 4222950260 office: +49 421 20419-0 (ArtCom GmbH, Grazer Str.8, D-28359 Bremen, Germany) From tim.one at home.com Sun Jun 24 12:51:40 2001 From: tim.one at home.com (Tim Peters) Date: Sun, 24 Jun 2001 06:51:40 -0400 Subject: [Python-Dev] IPv6 and Windows In-Reply-To: <200106240820.f5O8KVN01435@mira.informatik.hu-berlin.de> Message-ID: [Martin v. Loewis] > After integrating the first chunk of IPv6 changes, Tim Peters quickly > found that they won't compile on Windows - even though this was the > least-critical part of the patch. Mark Favas also reported failure on a Unix box -- we can't leave the CVS tree in an unusable state, and Mark in particular provides uniquely valuable feedback from his collection of Platforms from Mars. I #ifdef'ed out the offending includes on Windows for now, but that doesn't help Mark. > Specifically, this code emulates the getaddrinfo and getnameinfo > calls, which will be exposed to Python programs in a later patch. > Therefore, it is essential that they are available on every system, > either directly or through emulation. > > For Windows, one option is to use the Microsoft-provided emulation, > which is available from > > http://msdn.microsoft.com/downloads/sdks/platform/tpipv6.asp It says it's unsupported preview software for Win2K only. Since even the first *real* release of anything from MS sucks, I wouldn't touch this unless I absolutely had to. But I don't have any cycles for this project anyway, so this: > ... > Is such a requirement acceptable for building the socket module on > Windows? will have to be addressed by someone who does. Is anyone, e.g., at ActiveState keen on this? From mal at lemburg.com Sun Jun 24 13:06:19 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Sun, 24 Jun 2001 13:06:19 +0200 Subject: [Python-Dev] IPv6 and Windows References: <200106240820.f5O8KVN01435@mira.informatik.hu-berlin.de> Message-ID: <3B35C9AB.2D1D2185@lemburg.com> "Martin v. Loewis" wrote: > > After integrating the first chunk of IPv6 changes, Tim Peters quickly > found that they won't compile on Windows - even though this was the > least-critical part of the patch. > > Specifically, this code emulates the getaddrinfo and getnameinfo > calls, which will be exposed to Python programs in a later patch. > Therefore, it is essential that they are available on every system, > either directly or through emulation.
> > For Windows, one option is to use the Microsoft-provided emulation, > which is available from > > http://msdn.microsoft.com/downloads/sdks/platform/tpipv6.asp > > To use this emulation, only the header files of the package are > required; it is not necessary to actually install the IPv6 preview on > the system. The MS emulation will try to load a few DLLs which are > known to provide getaddrinfo. If none of these DLLs is found, the code in the > header file falls back to an emulation. That way, the resulting > socket.pyd would use the true API functions on installations that > provide them, and the emulation on all other systems. > > The only requirement for building Python is then that the header file > from the technology preview is available on the build machine > (tpipv6.h). It may be that the header file is also included in recent > SDK releases; I haven't checked. > > Is such a requirement acceptable for building the socket module on > Windows? Isn't this the MS SDK that has the new "Open Source" license clause in it?! If yes, I very much doubt that this approach would be feasible for Python... http://msdn.microsoft.com/downloads/eula_mit.htm Quote from a recent posting by Steven Majewski on c.l.p.: """ (c) Open Source. Recipients license rights to the Software are conditioned upon Recipient (i) not distributing such Software, in whole or in part, in conjunction with Potentially Viral Software (as defined below); and (ii) not using Potentially Viral Software (e.g. tools) to develop Recipient software which includes the Software, in whole or in part. For purposes of the foregoing, Potentially Viral Software means software which is licensed pursuant to terms that: (x) create, or purport to create, obligations for Microsoft with respect to the Software or (y) grant, or purport to grant, to any third party any rights to or immunities under Microsofts intellectual property or proprietary rights in the Software. By way of example but not limitation of the foregoing, Recipient shall not distribute the Software, in whole or in part, in conjunction with any Publicly Available Software. Publicly Available Software means each of (i) any software that contains, or is derived in any manner (in whole or in part) from, any software that is distributed as free software, open source software (e.g. Linux) or similar licensing or distribution models; and (ii) any software that requires as a condition of use, modification and/or distribution of such software that other software distributed with such software (A) be disclosed or distributed in source code form; (B) be licensed for the purpose of making derivative works; or (C) be redistributable at no charge. Publicly Available Software includes, without limitation, software licensed or distributed under any of the following licenses or distribution models, or licenses or distribution models similar to any of the following: (A) GNUs General Public License (GPL) or Lesser/Library GPL (LGPL), (B) The Artistic License (e.g., PERL), (C) the Mozilla Public License, (D) the Netscape Public License, (E) the Sun Community Source License (SCSL), and (F) the Sun Industry Standards License (SISL).
""" -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From guido at digicool.com Sun Jun 24 15:23:52 2001 From: guido at digicool.com (Guido van Rossum) Date: Sun, 24 Jun 2001 09:23:52 -0400 Subject: [Python-Dev] gethostbyname2 In-Reply-To: Your message of "Sun, 24 Jun 2001 10:34:06 +0200." <200106240834.f5O8Y6t01609@mira.informatik.hu-berlin.de> References: <200106240834.f5O8Y6t01609@mira.informatik.hu-berlin.de> Message-ID: <20010624132540.RTEI4013.femail3.sdc1.sfba.home.com@cj20424-a.reston1.va.home.com> > The IPv6 patch proposes to introduce a new socket function, > socket.gethostbyname2(name, af). This becomes necessary as a name > might have both an IPv4 and an IPv6 address. > > One alternative for providing such API is to get socket.gethostbyname > an optional second argument (the address family). itojun's rationale > for calling it gethostbyname2 is that the C API, as defined in RFC > 2133. > > Which of these alternatives would you prefer? Definitely an optional 2nd arg to gethostbyname() -- in C, you can't do tht, so they *had* to create a new function, but Python is more flexible. --Guido van Rossum (home page: http://www.python.org/~guido/) From DavidA at ActiveState.com Sun Jun 24 17:18:22 2001 From: DavidA at ActiveState.com (David Ascher) Date: Sun, 24 Jun 2001 08:18:22 -0700 Subject: [Python-Dev] IPv6 and Windows References: Message-ID: <3B3604BE.7E2F6C6E@ActiveState.com> Tim Peters wrote: > > Is such a requirement acceptable for building the socket module on > > Windows? > > will have to be addressed by someone who does. Is anyone, e.g., at > ActiveState keen on this? Not as far as I know. I haven't looked at the patches, but couldn't we have the IPv6 code be #ifdef'ed out, so that those who care about IPv6 can periodically test it while the various OS-level libraries are ramped up over the next months/years, but w/o disturbing the 'current' builds? --david From martin at loewis.home.cs.tu-berlin.de Sun Jun 24 19:00:43 2001 From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Sun, 24 Jun 2001 19:00:43 +0200 Subject: [Python-Dev] IPv6 and Windows In-Reply-To: <3B35C9AB.2D1D2185@lemburg.com> (mal@lemburg.com) References: <200106240820.f5O8KVN01435@mira.informatik.hu-berlin.de> <3B35C9AB.2D1D2185@lemburg.com> Message-ID: <200106241700.f5OH0hm01021@mira.informatik.hu-berlin.de> > > Is such a requirement acceptable for building the socket module on > > Windows? > > Isn't this the MS SDK that has the new "Open Source" license > clause in it ?! 
No, this has a different license text, which can be seen on http://msdn.microsoft.com/downloads/sdks/platform/tpipv6/download.asp On redistribution, it says # If you redistribute the SOFTWARE and/or your Source Modifications, # or any portion thereof as provided above, you agree: (i) to # distribute the SOFTWARE only in conjunction with, and as part of, # your Source Modifications which add significant functionality to the # SOFTWARE; (ii) to distribute the SOFTWARE and/or your Source # Modifications solely as part of your research and not in any # commercial product; (iii) the SOFTWARE and/or your Source # Modifications will not be distributed for profit; (iv) to retain all # branding, copyright and trademark notices included with the SOFTWARE # and include a copy of this EULA with any distribution of the # SOFTWARE, or any portion thereof; and (v) to indemnify, hold # harmless, and defend Microsoft from and against any claims or # lawsuits, including attorneys' fees, that arise or result from # the use or distribution of your Source Modifications. I don't know whether this is acceptable or not. Regards, Martin From mal at lemburg.com Sun Jun 24 20:08:13 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Sun, 24 Jun 2001 20:08:13 +0200 Subject: [Python-Dev] IPv6 and Windows References: <200106240820.f5O8KVN01435@mira.informatik.hu-berlin.de> <3B35C9AB.2D1D2185@lemburg.com> <200106241700.f5OH0hm01021@mira.informatik.hu-berlin.de> Message-ID: <3B362C8D.D3AECE3C@lemburg.com> "Martin v. Loewis" wrote: > > > > Is such a requirement acceptable for building the socket module on > > > Windows? > > > > Isn't this the MS SDK that has the new "Open Source" license > > clause in it ?! > > No, this has a different license text, which can be seen on > > http://msdn.microsoft.com/downloads/sdks/platform/tpipv6/download.asp > > On redistribution, it says > > # If you redistribute the SOFTWARE and/or your Source Modifications, > # or any portion thereof as provided above, you agree: (i) to > # distribute the SOFTWARE only in conjunction with, and as part of, > # your Source Modifications which add significant functionality to the > # SOFTWARE; (ii) to distribute the SOFTWARE and/or your Source > # Modifications solely as part of your research and not in any > # commercial product; (iii) the SOFTWARE and/or your Source > # Modifications will not be distributed for profit; (iv) to retain all > # branding, copyright and trademark notices included with the SOFTWARE > # and include a copy of this EULA with any distribution of the > # SOFTWARE, or any portion thereof; and (v) to indemnify, hold > # harmless, and defend Microsoft from and against any claims or > # lawsuits, including attorneys' fees, that arise or result from > # the use or distribution of your Source Modifications. > > I don't know whether this is acceptable or not. Most likely not: there are lots of commercial Python users out there who wouldn't like these clauses at all... we'd also lose the GPL compatibility. -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From martin at loewis.home.cs.tu-berlin.de Sun Jun 24 19:48:03 2001 From: martin at loewis.home.cs.tu-berlin.de (Martin v. 
Loewis) Date: Sun, 24 Jun 2001 19:48:03 +0200 Subject: [Python-Dev] IPv6 and Windows Message-ID: <200106241748.f5OHm3001480@mira.informatik.hu-berlin.de> > I haven't looked at the patches, but couldn't we have the IPv6 code > be #ifdef'ed out, so that those who care about IPv6 can periodically > test it while the various OS-level libraries are ramped up over the > next months/years, but w/o disturbing the 'current' builds? Not if we are going to introduce itojun's patch. In that patch, the IPv6 code *is* actually ifdef'ed out. It is getaddrinfo/getnameinfo that gives problems, which isn't IPv6 specific at all. The problem is that the library patches (httplib, ftplib, etc) do use getaddrinfo to find out how to contact a remote system, which is the right thing to do IMO. So even if the IPv6 support can be activated only if desired, getaddrinfo absolutely has to work. So the only question then is where we get an implementation of these functions if the system doesn't provide one. itojun has suggested the WIDE libraries; since they apparently don't compile on Windows, I've suggested the MS TP emulation. If the latter is not acceptable, we have to fix the WIDE implementation to work on Windows also. As for the problems Mark reported: I think they can get fixed. Regards, Martin From thomas at xs4all.net Sun Jun 24 23:35:37 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Sun, 24 Jun 2001 23:35:37 +0200 Subject: [Python-Dev] IPv6 and Windows In-Reply-To: <200106241748.f5OHm3001480@mira.informatik.hu-berlin.de> References: <200106241748.f5OHm3001480@mira.informatik.hu-berlin.de> Message-ID: <20010624233537.R8098@xs4all.nl> On Sun, Jun 24, 2001 at 07:48:03PM +0200, Martin v. Loewis wrote: > The problem is that the library patches (httplib, ftplib, etc) do use > getaddrinfo to find out how to contact a remote system, which is the > right thing to do IMO. So even if the IPv6 support can be activated > only if desired, getaddrinfo absolutely has to work. Why? Why can't those parts be 'if it exists'-ed out? We do it for SSL support. I'm only comfortable with the IPv6 patch if it's optional, or can at least be disabled. I haven't looked at the patch, but why is getaddrinfo absolutely necessary, if the code works without it now, too? > So the only question then is where we get an implementation of these > functions if the system doesn't provide one. itojun has suggested the > WIDE libraries; since they apparently don't compile on Windows, I've > suggested the MS TP emulation. If the latter is not acceptable, we > have to fix the WIDE implementation to work on Windows also. > As for the problems Mark reported: I think they can get fixed. What about the zillion other 'obscure' ports? OS/2? Palm? MacOS 9 ;) If this patch can't be zero-impact-if-necessary, I'm a firm -1 on it. But I don't think it can't, it just takes more work. -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From martin at loewis.home.cs.tu-berlin.de Sun Jun 24 23:39:45 2001 From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Sun, 24 Jun 2001 23:39:45 +0200 Subject: [Python-Dev] Problems with socketmodule (getnameinfo & getaddrinfo) Message-ID: <200106242139.f5OLdjR22560@mira.informatik.hu-berlin.de> > 1: socketmodule.c now #includes getnameinfo.c and > getaddrinfo.c. These functions both use offsetof(), which is defined > (on my system, at least) in stddef.h. That should be fixed now.
stddef.h is included in socketmodule.c; if it is not available or does not define offsetof, an additional definition is provided. > 2. [...] Changes to either of the get{name,addr}info.c files will > not cause socketmodule to be rebuilt. I don't know how to solve this one. If distutils builds the modules, makefile dependencies won't help. > 3. The socket module still does not work, however, since it refers > to an unresolved symbol inet_pton I took the simplest solution that I could think of, delegating inet_{pton,ntop} to inet_{ntoa,addr} for AF_INET, failing for all other address families (AF_INET6 in particular). I've verified that this code does the same as the builtin functions on my Linux system; please let me know whether it compiles for you. Regards, Martin
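Expressed as a Python sketch for clarity (the actual fallback is C code in socketmodule.c, and the function names here are invented), the delegation works like this:

    import socket

    def inet_pton_fallback(af, text):
        if af == socket.AF_INET:
            return socket.inet_aton(text)    # packed 32-bit address
        raise socket.error, "address family not supported"

    def inet_ntop_fallback(af, packed):
        if af == socket.AF_INET:
            return socket.inet_ntoa(packed)  # dotted-quad string
        raise socket.error, "address family not supported"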
From martin at loewis.home.cs.tu-berlin.de Sun Jun 24 23:56:48 2001 From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Sun, 24 Jun 2001 23:56:48 +0200 Subject: [Python-Dev] IPv6 and Windows In-Reply-To: <20010624233537.R8098@xs4all.nl> (message from Thomas Wouters on Sun, 24 Jun 2001 23:35:37 +0200) References: <200106241748.f5OHm3001480@mira.informatik.hu-berlin.de> <20010624233537.R8098@xs4all.nl> Message-ID: <200106242156.f5OLum222759@mira.informatik.hu-berlin.de> > Why? Why can't those parts be 'if it exists'-ed out? We do it for SSL > support. I'm only comfortable with the IPv6 patch if it's optional, or can > at least be disabled. I haven't looked at the patch, but why is getaddrinfo > absolutely necessary, if the code works without it now, too? getaddrinfo offers protocol-independent address lookup. It is necessary to use that API to support AF_INET and AF_INET6 transparently in application code. itojun proposes to change a number of standard library modules. Please have a look at the actual patch for details; the typical change will look like this (for httplib):

    diff -u -r1.35 httplib.py
    --- Lib/httplib.py	2001/06/01 16:25:38	1.35
    +++ Lib/httplib.py	2001/06/24 04:41:48
    @@ -357,10 +357,22 @@
         def connect(self):
             """Connect to the host and port specified in __init__."""
    -        self.sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    -        if self.debuglevel > 0:
    -            print "connect: (%s, %s)" % (self.host, self.port)
    -        self.sock.connect((self.host, self.port))
    +        for res in socket.getaddrinfo(self.host, self.port, 0, socket.SOCK_STREAM):
    +            af, socktype, proto, canonname, sa = res
    +            try:
    +                self.sock = socket.socket(af, socktype, proto)
    +                if self.debuglevel > 0:
    +                    print "connect: (%s, %s)" % (self.host, self.port)
    +                self.sock.connect(sa)
    +            except socket.error, msg:
    +                if self.debuglevel > 0:
    +                    print 'connect fail:', (self.host, self.port)
    +                self.sock.close()
    +                self.sock = None
    +                continue
    +            break
    +        if not self.sock:
    +            raise socket.error, msg

         def close(self):
             """Close the connection to the HTTP server."""

As you can see, the modified code can simultaneously access both IPv4 and IPv6 hosts, and will pick whatever it can connect to best. Without getaddrinfo, httplib would continue to support IPv4 hosts only. The IPv6 support itself is absolutely optional. If it is not available, getaddrinfo will never return IPv6 addresses, or propose AF_INET6 as the address family. > What about the zillion other 'obscure' ports? OS/2? Palm? MacOS 9 ;) If > this patch can't be zero-impact-if-necessary, I'm a firm -1 on it. But I > don't think it can't, it just takes more work. Depends on what zero-impact-if-necessary means to you. The patch, as it stands, can be fixed to compile on all systems that are currently supported. It cannot be fixed to be taken completely out (unless you literally do that: take it out). I don't plan to fight for it too much. Please have a look at the code itself, and try to cooperate on integrating it. Don't reject it outright without having even looked at it. If I get strong rejections from everybody, I'll just withdraw it and feel sorry for the time I've already spent with it. Regards, Martin From m.favas at per.dem.csiro.au Mon Jun 25 00:16:25 2001 From: m.favas at per.dem.csiro.au (Mark Favas) Date: Mon, 25 Jun 2001 06:16:25 +0800 Subject: [Python-Dev] Re: Problems with socketmodule (getnameinfo & getaddrinfo) References: <200106242139.f5OLdjR22560@mira.informatik.hu-berlin.de> Message-ID: <3B3666B9.335DA17E@per.dem.csiro.au> [Martin v. Loewis] > > > 1: socketmodule.c now #includes getnameinfo.c and > > getaddrinfo.c. These functions both use offsetof(), which is defined > > (on my system, at least) in stddef.h. > > That should be fixed now. stddef.h is included in socketmodule.c; if > it is not available or does not define offsetof, an additional > definition is provided. Yes, this is fine now... > > > 2. [...] Changes to either of the get{name,addr}info.c files will > > not cause socketmodule to be rebuilt. > > I don't know how to solve this one. If distutils builds the modules, > makefile dependencies won't help. > > > 3. The socket module still does not work, however, since it refers > > to an unresolved symbol inet_pton > > I took the simplest solution that I could think of, delegating > inet_{pton,ntop} to inet_{ntoa,addr} for AF_INET, failing for all > other address families (AF_INET6 in particular). I've verified that > this code does the same as the builtin functions on my Linux system; > please let me know whether it compiles for you. To get socketmodule.c to compile, I had to make a change to line 2963 so that the declaration of inet_pton matched the previous declaration on line 220 (changing char *src to const char *src). Still have problems though, due to the use of snprintf in getnameinfo.c:

    Python 2.2a0 (#444, Jun 25 2001, 05:58:17) [C] on osf1V4
    Type "copyright", "credits" or "license" for more information.
    >>> import socket
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
      File "/home/gonzo1/mark/groucho1/mark/src/python/CVS/python/dist/src/Lib/socket.py", line 41, in ?
        from _socket import *
    ImportError: Unresolved symbol in /home/gonzo1/mark/groucho1/mark/src/python/CVS/python/dist/src/build/lib.osf1-V4.0-alpha-2.2/_socket.so: snprintf

Cheers, Mark -- Mark Favas - m.favas at per.dem.csiro.au CSIRO, Private Bag No 5, Wembley, Western Australia 6913, AUSTRALIA From tim.one at home.com Mon Jun 25 07:02:30 2001 From: tim.one at home.com (Tim Peters) Date: Mon, 25 Jun 2001 01:02:30 -0400 Subject: [Python-Dev] IPv6 and Windows In-Reply-To: <3B35C9AB.2D1D2185@lemburg.com> Message-ID: >> http://msdn.microsoft.com/downloads/sdks/platform/tpipv6.asp [MAL] > Isn't this the MS SDK that has the new "Open Source" license > clause in it?! No. That was for the "Mobile Internet Toolkit"; no relation, AFAICT. > If yes, I very much doubt that this approach > would be feasible for Python... > > http://msdn.microsoft.com/downloads/eula_mit.htm From tim.one at home.com Mon Jun 25 07:14:17 2001 From: tim.one at home.com (Tim Peters) Date: Mon, 25 Jun 2001 01:14:17 -0400 Subject: [Python-Dev] IPv6 and Windows In-Reply-To: <200106241748.f5OHm3001480@mira.informatik.hu-berlin.de> Message-ID: [Martin v. Loewis] > ...
> So the only question then is where we get an implementation of these > functions if the system doesn't provide one. itojun has suggested the > WIDE libraries; since they apparently don't compile on Windows, I've > suggested the MS TP emulation. If the latter is not acceptable, we > have to fix the WIDE implementation to work on Windows also. I don't have cycles for this, but will cheerily suggest that the WIDE problems didn't appear especially deep, just "the usual" careless brand of Unix+gcc+glibc specific coding. For example, HAVE_LONG_LONG is #define'd on Windows, but, just as in Python source, you can't *use* "long long" literally, you have to use the LONG_LONG macro instead. Then Windows doesn't have an offsetof() macro, or an snprintf() either. Etc. The code is in trouble exactly where it relies on platform-specific extensions to the std C language and library. Problems with those won't be unique to Windows, either, which is a deeper concern (but already well expressed by others). It would be nice if Python could contribute portability back to WIDE. That requires worker bees, though, and lots of x-platform testing. If it turns out we can't swing that, then support for this is premature, and we should wait, e.g., for WIDE to put more effort into porting their code. From just at letterror.com Mon Jun 25 08:55:17 2001 From: just at letterror.com (Just van Rossum) Date: Mon, 25 Jun 2001 08:55:17 +0200 Subject: [Python-Dev] os.path.normcase() in site.py Message-ID: <20010625085521-r01010600-9a6226c8@213.84.27.177> I noticed that these days __file__ attributes of modules are case normalized (i.e. lowercased on case-insensitive file systems), or at least the directory part. Then I noticed that this is caused by the fact that all sys.path entries are case normalized. It turns out that site.py does this, in a function called makepath(), added by Fred about 8 months ago. I think this is wrong: we should always try to *preserve* case. I see os.path.normcase() as a tool to be able to better compare two paths, but you shouldn't *store* paths this way. I for one am irritated when I see a path that doesn't have the proper case. The intention of makepath() in site.py seems good -- it turns all paths into absolute paths -- but is the normcase really necessary? *** Please CC follow-ups to me, as I'm not on python-dev. Just
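The distinction Just is drawing, as a small sketch (the helper name is invented here): normalize case when comparing paths, but store and display them exactly as given.

    import os.path

    def samepath(a, b):
        # fold case only for the comparison...
        return (os.path.normcase(os.path.abspath(a)) ==
                os.path.normcase(os.path.abspath(b)))

    # ...but when *storing* a path, e.g. on sys.path, keep its case:
    # sys.path.append(os.path.abspath(entry))   # no normcase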
Thanks for your reports, Martin

From thomas at xs4all.net Mon Jun 25 09:20:53 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Mon, 25 Jun 2001 09:20:53 +0200 Subject: [Python-Dev] os.path.normcase() in site.py In-Reply-To: <20010625085521-r01010600-9a6226c8@213.84.27.177> References: <20010625085521-r01010600-9a6226c8@213.84.27.177> Message-ID: <20010625092053.S8098@xs4all.nl> On Mon, Jun 25, 2001 at 08:55:17AM +0200, Just van Rossum wrote: > *** Please CC follow-ups to me, as I'm not on python-dev. Is that by choice ? It seems rather... peculiar, to me, that you have checkin access but aren't on python-dev. You'll miss all those wonderful "Don't touch CVS, I'm building a release" and "Who put CVS in an unstable state?" messages. -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread!

From tim.one at home.com Mon Jun 25 09:51:00 2001 From: tim.one at home.com (Tim Peters) Date: Mon, 25 Jun 2001 03:51:00 -0400 Subject: [Python-Dev] os.path.normcase() in site.py In-Reply-To: <20010625092053.S8098@xs4all.nl> Message-ID: [Just van Rossum] > *** Please CC follow-ups to me, as I'm not on python-dev. [Thomas Wouters] > Is that by choice ? It seems rather... peculiar, to me, that you have > checkin access but aren't on python-dev. Well, I suppose it's supposed to be a secret, but Guido and Just haven't talked in 17 years come Wednesday. IIRC, something about a bottle of wine and a toilet seat, and a small but energetic ferret. Just hacked his way into SourceForge access (those skills just run in the family, I guess), but every time he hacks onto Python-Dev Guido detects it and locks him out again. It's very sad, really -- but also wonderfully Dutch. at-least-that's-the-best-explanation-i-can-think-of-ly y'rs - tim

From thomas at xs4all.net Mon Jun 25 10:35:38 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Mon, 25 Jun 2001 10:35:38 +0200 Subject: [Python-Dev] os.path.normcase() in site.py In-Reply-To: References: Message-ID: <20010625103538.T8098@xs4all.nl> On Mon, Jun 25, 2001 at 03:51:00AM -0400, Tim Peters wrote: [ Tim explains about the century-old, horrid blood feud that cost the lives of many an innocent ferret, not to mention bottles of wine, caused by Just's future attempts to join python-dev -- damn that timemachine ] Okay... how about someone takes Guido out for dinner and feeds him way too many bottles of wine and ferrets to show him such things do not necessarily lead to blood feuds ? Maybe take along some psychotropic drugs and a halfway decent hypnotist for safety's measure. Meanwhile Barry subscribes Just to python-dev and you or someone else with the pickpocket skills to get at the keys for the time machine (come on, fess up, you all practiced) make sure Guido can't get at it, lest he try and make up with Just in the past in his 'suggestable' state... Better change the Mailman admin password too, just to be on the safe side. Or if that has no chance of a prayer in hell of working, I can give Just a secret xs4all.nl address (since he has an XS4ALL account nowadays, that shouldn't be a problem) and we just never tell Guido that py-dev at xs4all.nl is really Just ;) > It's very sad, really -- but also wonderfully Dutch. No, it would only be wonderfully Dutch if either brother was German or Belgian in some way, or of royal blood and married to the wrong type of christian sect (Protestant or Catholic -- I keep forgetting which is which.) -- Thomas Wouters Hi! I'm a .signature virus!
copy me into your .signature file to help me spread!

From tim.one at home.com Mon Jun 25 11:05:23 2001 From: tim.one at home.com (Tim Peters) Date: Mon, 25 Jun 2001 05:05:23 -0400 Subject: [Python-Dev] RE: [Python-iterators] Death by Leakage In-Reply-To: Message-ID: Here's a simpler leaker, amounting to an insanely convoluted way to generate the ints 1, 2, 3, ...:

    DO_NOT_LEAK = 1

    class LazyList:
        def __init__(self, g):
            self.sofar = []
            self.fetch = g.next

        def __getitem__(self, i):
            sofar, fetch = self.sofar, self.fetch
            while i >= len(sofar):
                sofar.append(fetch())
            return sofar[i]

        def clear(self):
            self.__dict__.clear()

    def plus1(g):
        for i in g:
            yield i + 1

    def genm23():
        yield 1
        for i in plus1(m23):
            yield i

    for i in range(10000):
        m23 = LazyList(genm23())
        [m23[i] for i in range(50)]
        if DO_NOT_LEAK:
            m23.clear()

Neil, it would help if genobjects had a memberlist so that the struct members were discoverable from Python code; that would also let me add appropriate methods to Cyclops.py to find cycles automatically. Anyway, m23 is a LazyList instance, where m23.fetch is genm23().next, i.e. m23.fetch is a bound method of the genm23() generator-iterator. So the frame for genm23 is reachable from m23.__dict__. That frame contains an anonymous (it's living in the frame's valuestack) generator-iterator thingie corresponding to the plus1(m23) call. *That* generator's frame in turn has m23 in its locals (m23 was an argument to plus1), and another iterator method referencing m23 in its valuestack (due to the "for i in g"). But m23 is the LazyList instance we started with, so there's a cycle, and clearing m23.__dict__ breaks it. gc doesn't chase generators or frames, so it can't clean this stuff up if we don't clear the dict. So this appears hopeless unless gc adds both generators and frames to its repertoire. OTOH, it's got to be rare -- maybe . Worth it?

From loewis at informatik.hu-berlin.de Mon Jun 25 11:43:33 2001 From: loewis at informatik.hu-berlin.de (Martin von Loewis) Date: Mon, 25 Jun 2001 11:43:33 +0200 (MEST) Subject: [Python-Dev] make static Message-ID: <200106250943.LAA24576@pandora.informatik.hu-berlin.de> There is a bug report on SF that 'make static' fails for a Makefile.pre.in extension, see http://sourceforge.net/tracker/index.php?func=detail&aid=435446&group_id=5470&atid=105470 Is that process still supported? Unless I'm mistaken, this is complicated by the fact that Makefile.pre.in packages use the Makefile.pre.in that comes with the package, not the one that comes with the Python installation. Any insights welcome, Martin

From jack at oratrix.nl Mon Jun 25 12:18:40 2001 From: jack at oratrix.nl (Jack Jansen) Date: Mon, 25 Jun 2001 12:18:40 +0200 Subject: [Python-Dev] Re: Problems with socketmodule (getnameinfo & getaddrinfo) In-Reply-To: Message by Mark Favas , Mon, 25 Jun 2001 06:16:25 +0800 , <3B3666B9.335DA17E@per.dem.csiro.au> Message-ID: <20010625101842.B6BC6303182@snelboot.oratrix.nl> I'm having a lot of problems with the new getaddrinfo stuff: no prototypes used in various routines, missing consts in routine declarations and then passing const strings to it, all routines seem to be globals (and with pretty dangerous names) even though they all look pretty static to me, etc. Could whoever put this in do a round of quality control on it, please?
-- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From jack at oratrix.nl Mon Jun 25 12:28:08 2001 From: jack at oratrix.nl (Jack Jansen) Date: Mon, 25 Jun 2001 12:28:08 +0200 Subject: [Python-Dev] os.path.normcase() in site.py In-Reply-To: Message by Just van Rossum , Mon, 25 Jun 2001 08:55:17 +0200 , <20010625085521-r01010600-9a6226c8@213.84.27.177> Message-ID: <20010625102809.42357303182@snelboot.oratrix.nl> > I noticed that these days __file__ attributes of modules are case normalized > (ie. lowercased on case insensitive file systems), or at least the directory > part. Then I noticed that this is caused by the fact that all sys.path entries > are case normalized. It turns out that site.py does this, in a function called > makepath(), added by Fred about 8 months ago. > > I think this is wrong: we should always try to *preserve* case. There is an added problem with the makepath() stuff that I hadn't reported here yet: it has broken MacPython on some non-western machines. Specifically I've had reports of people running a Japanese MacOS that things will break if they run Python from a pathname that has any non-7-bit-ascii characters in the name. Apparently normcase normalizes more than just ascii upper/lowercase letters. And aside from that I fully agree with Just: seeing a stacktrace with all lowercase filenames is _very_ disconcerting. I would disable the case-normalization for MacPython, except that I don't know whether it actually has a function. With MacPython's way of finding the initial sys.path contents we don't have the Windows-Python problem that we add the same directory 5 times (once in uppercase, once in lowercase, once in mixed case, once in mixed-case with / for \, etc:-), so if this is what it's trying to solve we can take it out easily. -- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From fredrik at pythonware.com Mon Jun 25 14:12:23 2001 From: fredrik at pythonware.com (Fredrik Lundh) Date: Mon, 25 Jun 2001 14:12:23 +0200 Subject: [Python-Dev] IPv6 and Windows References: <200106241748.f5OHm3001480@mira.informatik.hu-berlin.de> <20010624233537.R8098@xs4all.nl> <200106242156.f5OLum222759@mira.informatik.hu-berlin.de> Message-ID: <006101c0fd70$17a6b660$0900a8c0@spiff> martin wrote: > getaddrinfo offers protocol-independent address lookup. It is > necessary to use that API to support AF_INET and AF_INET6 > transparently in application code. itojun proposes to change a number > of standard library modules. 
Please have a look at the actual patch > for details; the typical change will look like this (for httplib)

> diff -u -r1.35 httplib.py
> --- Lib/httplib.py 2001/06/01 16:25:38 1.35
> +++ Lib/httplib.py 2001/06/24 04:41:48
> @@ -357,10 +357,22 @@
>
>      def connect(self):
>          """Connect to the host and port specified in __init__."""
> -        self.sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
> -        if self.debuglevel > 0:
> -            print "connect: (%s, %s)" % (self.host, self.port)
> -        self.sock.connect((self.host, self.port))
> +        for res in socket.getaddrinfo(self.host, self.port, 0, socket.SOCK_STREAM):
> +            af, socktype, proto, canonname, sa = res
> +            try:
> +                self.sock = socket.socket(af, socktype, proto)
> +                if self.debuglevel > 0:
> +                    print "connect: (%s, %s)" % (self.host, self.port)
> +                self.sock.connect(sa)
> +            except socket.error, msg:
> +                if self.debuglevel > 0:
> +                    print 'connect fail:', (self.host, self.port)
> +                self.sock.close()
> +                self.sock = None
> +                continue
> +            break
> +        if not self.sock:
> +            raise socket.error, msg

instead of adding code like that to every single module, maybe we should add a convenience function to the socket module? (and make that function smart enough to work also if getaddrinfo isn't supported by the native platform...)

From guido at digicool.com Mon Jun 25 15:40:10 2001 From: guido at digicool.com (Guido van Rossum) Date: Mon, 25 Jun 2001 09:40:10 -0400 Subject: [Python-Dev] make static In-Reply-To: Your message of "Mon, 25 Jun 2001 11:43:33 +0200." <200106250943.LAA24576@pandora.informatik.hu-berlin.de> References: <200106250943.LAA24576@pandora.informatik.hu-berlin.de> Message-ID: <200106251340.f5PDeAO07244@odiug.digicool.com> > There is a bug report on SF that 'make static' fails for a > Makefile.pre.in extension, see > > http://sourceforge.net/tracker/index.php?func=detail&aid=435446&group_id=5470&atid=105470 > > Is that process still supported? Unless I'm mistaken, this is > complicated by the fact that Makefile.pre.in packages use the > Makefile.pre.in that comes with the package, not the one that comes > with the Python installation. > > Any insights welcome, > > Martin As long as it works, it works. I don't think there's a reason to spend more than absolutely minimal time trying to keep it working though -- we're trying to encourage everybody to migrate towards distutils. So (without having seen the SF report) I'd say "tough luck". --Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at digicool.com Mon Jun 25 15:40:47 2001 From: guido at digicool.com (Guido van Rossum) Date: Mon, 25 Jun 2001 09:40:47 -0400 Subject: [Python-Dev] os.path.normcase() in site.py In-Reply-To: Your message of "Mon, 25 Jun 2001 10:35:38 +0200." <20010625103538.T8098@xs4all.nl> References: <20010625103538.T8098@xs4all.nl> Message-ID: <200106251340.f5PDele07256@odiug.digicool.com> No need to get me drunk. Barry & I decided to change this policy weeks ago, but (in order to avoid a flurry of subscription requests from functional-language proponents) we decided to keep the policy change a secret. :-) Just can subscribe safely now. --Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at digicool.com Mon Jun 25 15:40:06 2001 From: guido at digicool.com (Guido van Rossum) Date: Mon, 25 Jun 2001 09:40:06 -0400 Subject: [Python-Dev] os.path.normcase() in site.py In-Reply-To: Your message of "Mon, 25 Jun 2001 12:28:08 +0200."
<20010625102809.42357303182@snelboot.oratrix.nl> References: <20010625102809.42357303182@snelboot.oratrix.nl> Message-ID: <200106251340.f5PDe6e07238@odiug.digicool.com> > > I noticed that these days __file__ attributes of modules are case > > normalized (ie. lowercased on case insensitive file systems), or > > at least the directory part. Then I noticed that this is caused by > > the fact that all sys.path entries are case normalized. It turns > > out that site.py does this, in a function called makepath(), added > > by Fred about 8 months ago. > > > > I think this is wrong: we should always try to *preserve* case. > > There is an added problem with the makepath() stuff that I hadn't > reported here yet: it has broken MacPython on some non-western > machines. Specifically I've had reports of people running a Japanese > MacOS that things will break if they run Python from a pathname that > has any non-7-bit-ascii characters in the name. Apparently normcase > normalizes more than just ascii upper/lowercase letters. > > And aside from that I fully agree with Just: seeing a stacktrace > with all lowercase filenames is _very_ disconcerting. > > I would disable the case-normalization for MacPython, except that I > don't know whether it actually has a function. With MacPython's way > of finding the initial sys.path contents we don't have the > Windows-Python problem that we add the same directory 5 times (once > in uppercase, once in lowercase, once in mixed case, once in > mixed-case with / for \, etc:-), so if this is what it's trying to > solve we can take it out easily. I can't think of any function besides the attempt to avoid duplicates. I think that even on Windows, retaining case makes sense. I think that there's a way to avoid duplicates without case-folding everything. (E.g. use a case-folding comparison instead.) I wonder if maybe path entries should be normpath'd though? I'll leave it to Fred, Jack or Just to fix this. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at digicool.com Mon Jun 25 15:41:46 2001 From: guido at digicool.com (Guido van Rossum) Date: Mon, 25 Jun 2001 09:41:46 -0400 Subject: [Python-Dev] IPv6 and Windows In-Reply-To: Your message of "Sun, 24 Jun 2001 19:48:03 +0200." <200106241748.f5OHm3001480@mira.informatik.hu-berlin.de> References: <200106241748.f5OHm3001480@mira.informatik.hu-berlin.de> Message-ID: <200106251341.f5PDfkg07283@odiug.digicool.com> > The problem is that the library patches (httplib, ftplib, etc) do use > getaddrinfo to find out how to contact a remote system, which is the > right thing to do IMO. So even if the IPv6 support can be activated > only if desired, getaddrinfo absolutely has to work. Yes, but in an IPv4-only environment it would be super trivial to implement, right? --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at digicool.com Mon Jun 25 15:42:18 2001 From: guido at digicool.com (Guido van Rossum) Date: Mon, 25 Jun 2001 09:42:18 -0400 Subject: [Python-Dev] IPv6 and Windows In-Reply-To: Your message of "Sun, 24 Jun 2001 20:08:13 +0200." 
<3B362C8D.D3AECE3C@lemburg.com> References: <200106240820.f5O8KVN01435@mira.informatik.hu-berlin.de> <3B35C9AB.2D1D2185@lemburg.com> <200106241700.f5OH0hm01021@mira.informatik.hu-berlin.de> <3B362C8D.D3AECE3C@lemburg.com> Message-ID: <200106251342.f5PDgI107298@odiug.digicool.com>

> > # If you redistribute the SOFTWARE and/or your Source Modifications,
> > # or any portion thereof as provided above, you agree: (i) to
> > # distribute the SOFTWARE only in conjunction with, and as part of,
> > # your Source Modifications which add significant functionality to the
> > # SOFTWARE; (ii) to distribute the SOFTWARE and/or your Source
> > # Modifications solely as part of your research and not in any
> > # commercial product; (iii) the SOFTWARE and/or your Source
> > # Modifications will not be distributed for profit; (iv) to retain all
> > # branding, copyright and trademark notices included with the SOFTWARE
> > # and include a copy of this EULA with any distribution of the
> > # SOFTWARE, or any portion thereof; and (v) to indemnify, hold
> > # harmless, and defend Microsoft from and against any claims or
> > # lawsuits, including attorneys' fees, that arise or result from
> > # the use or distribution of your Source Modifications.
> >
> > I don't know whether this is acceptable or not.
>
> Most likely not: there are lots of commercial Python users out there
> who wouldn't like these clauses at all... we'd also lose the GPL
> compatibility.

Don't even *think* about using code with that license. --Guido van Rossum (home page: http://www.python.org/~guido/)
From skip at pobox.com Mon Jun 25 15:50:31 2001 From: skip at pobox.com (Skip Montanaro) Date: Mon, 25 Jun 2001 08:50:31 -0500 Subject: [Python-Dev] xrange vs generators Message-ID: <15159.16807.480121.637386@beluga.mojam.com> With generators in the language, should xrange be deprecated? Skip

From just at letterror.com Mon Jun 25 16:05:43 2001 From: just at letterror.com (Just van Rossum) Date: Mon, 25 Jun 2001 16:05:43 +0200 Subject: [Python-Dev] os.path.normcase() in site.py In-Reply-To: <200106251343.f5PDh4907304@odiug.digicool.com> Message-ID: <20010625160545-r01010600-e232a14e@213.84.27.177> Guido van Rossum wrote: > I can't think of any function besides the attempt to avoid duplicates. > > I think that even on Windows, retaining case makes sense. > > I think that there's a way to avoid duplicates without case-folding > everything. (E.g. use a case-folding comparison instead.) > > I wonder if maybe path entries should be normpath'd though? They are already, they already go through abspath(), which calls normpath(). > I'll leave it to Fred, Jack or Just to fix this. If it were up to me, I'd simply remove the normcase() call from makepath(). Just

From arigo at ulb.ac.be Mon Jun 25 15:08:52 2001 From: arigo at ulb.ac.be (Armin Rigo) Date: Mon, 25 Jun 2001 15:08:52 +0200 Subject: [Python-Dev] Python Specializing Compiler In-Reply-To: <200106221259.OAA02519@core.inf.ethz.ch> Message-ID: <4.3.1.0.20010625134824.00abde60@127.0.0.1> Hello everybody, A note about what I have in mind about Psyco... Type-sets are independent from memory representation. In other words, just because two variables can take the same set of values does not mean the data is necessarily encoded in the same way in memory. In particular, I believe we won't need to change the way the current Python interpreter encodes data. For example, instances currently have a dictionary of attributes and no "fixed slots", but this is not a problem for Psyco, which can encode instances in better ways (e.g.
as a C struct) as long as it is only accessed by Psyco-compiled Python code and no "legacy" code. This approach also allows Psyco to completely remove the overhead of creating bound method objects and frame objects; both are generally temporary, and so during their whole lifetime they can be represented much more efficiently in memory. For frame objects it should be clear (we probably need no frame at all as long as no exception exits the current procedure, and even in this case it could be optimized). For method objects we use "memory sharing", a technique already applied in the current Psyco. More precisely, if some (immutable) data is found at some memory location (or machine register) and Python code says it should be duplicated, we need not duplicate it at all; we can just consider that the copy is at the same location as the original. For method objects it means the following: suppose you have an instance "xyz" and query its "foo()" method. Suppose that you can (at some time) be sure that, because of the class of "xyz", "xyz.foo" will always be the Python function "f". Then the method object's representation can be simplified: all it needs to store in memory is a pointer to "xyz", because "f" is a constant part. Now a single pointer to the "xyz" instance is exactly the same memory format as the original "xyz" variable, so that this particular representation of a bound method object can share the original "xyz" pointer. No actual machine code is produced; Psyco simply notes that both "xyz" and "xyz.foo" are represented at the same location, although "xyz" represents an instance with the given pointer, and "xyz.foo" represents the "f" function with its first argument bound to the given pointer. According to est at hyperreal.org, method and frame objects each represent 20% of the execution time... (Est, on which kind of machine did you get Psyco to run the sample code 5 times faster !? It's only 2 times faster on a modern Pentium...) A bientôt, Armin.

From arigo at ulb.ac.be Mon Jun 25 15:45:20 2001 From: arigo at ulb.ac.be (Armin Rigo) Date: Mon, 25 Jun 2001 15:45:20 +0200 Subject: [Python-Dev] Python Specializing Compiler In-Reply-To: <200106221259.OAA02519@core.inf.ethz.ch> Message-ID: <4.3.1.0.20010625150819.00aa5220@127.0.0.1> Hello, At 14:59 22.06.2001 +0200, Samuele Pedroni wrote: >*: some possible useful hooks would be: >- minimal profiling support in order to specialize only things called often >- feedback for dynamic changing of methods, class hierarchy, ... if we want >to optimize method lookup (which would make sense) >- a mixed fixed slots/dict layout for instances. There is one point that you didn't mention, which I believe is important: how to handle global/builtin variables. First, a few words about the current Python semantics. * I am sorry if what follows has already been discussed; I am raising the question again because it might be important for Psyco. If you feel this should better be a PEP please just tell me so. * Complete lexical scoping was recently added, implemented with "free" and "cell" variables. These are only used for functions defined inside of other functions; top-level functions use the opcode LOAD_GLOBAL for all non-local variables. LOAD_GLOBAL performs one or two dictionary look-ups (two if the variable is built-in). For simple built-ins like "len" this might be expensive (has someone measured such costs ?). I suggest generalizing the compile-time lexical scoping rules.
Let's compile all functions' non-local variables (top-level and others) as "free" variables. This means the corresponding module's global variables must be "cell" variables. This is just what we would get if the module's code was one big function enclosing the definition of all the other functions. Next, the variables not defined in the module (the built-ins) are "free" variables of the module, and the built-in module provides "cell" variables for them. Remember that "free" and "cell" variables are linked together when the function (or module in this case) is defined (for functions, when "def" is executed; for modules, it would be at load-time). Benefit: not a single dictionary look-up any more; uniformity of treatment. Potential code break: global variables shadowing built-ins would behave like local variables shadowing globals, i.e. the mere presence of a global "xyz=..." would forever hide the "xyz" built-in from the module, even before the assignment or after a "del xyz". (cf. UnboundLocalError.) To think about: what the "global" keyword would mean in this context. Implementation problems: if we want to keep the module's dictionary of global variables (and we certainly do) it would require changes to the dictionary implementation (or the creation of a different kind of dictionary). One solution is to automatically dereference cell objects and raise exceptions upon reading empty cells. Another solution is to turn dictionaries into collections of objects that all behave like cell objects (so that if "d" is any dictionary, something like "d.ref(key)" would let us get a cell object which could be read or written later to actually get or set the value associated to "key", and "d[key]" would mean "d.ref(key).cell_ref"). Well, these are just proposals; they might not be a good solution. Why it is related to Psyco: the current treatment of globals/builtins makes it hard for Psyco to statically tell what function we are calling when it sees e.g. "len(a)" in the code. We would at least need some help from the interpreter; at least hooks called when the module's globals() dictionary change. The above proposal might provide a more uniform solution. Thanks for your attention. Armin.

From guido at digicool.com Mon Jun 25 16:26:08 2001 From: guido at digicool.com (Guido van Rossum) Date: Mon, 25 Jun 2001 10:26:08 -0400 Subject: [Python-Dev] xrange vs generators In-Reply-To: Your message of "Mon, 25 Jun 2001 08:50:31 CDT." <15159.16807.480121.637386@beluga.mojam.com> References: <15159.16807.480121.637386@beluga.mojam.com> Message-ID: <200106251426.f5PEQ8907629@odiug.digicool.com> > With generators in the language, should xrange be deprecated? > > Skip No, but maybe xrange() should be changed to return an iterator. E.g. something like this:

    def xrange(start, stop, step):
        while start < stop:
            yield start
            start += step

but with the appropriate defaults, and reversal of the test if step < 0, and an error if step == 0, and type checks enforcing ints (or long ints!), and implemented in C. :-) Although xrange() objects currently support some sequence algebra, that is mostly bogus and I don't think anyone in their right mind uses it.
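Spelled out with the defaults and the reversed test filled in -- still just a sketch of the idea, where the name and every specific choice (defaults, error message) are guesses rather than the eventual implementation -- it might look like:

    def xrange_iter(start, stop=None, step=1):
        # Mirror range()'s argument juggling, so that a one-argument
        # call means xrange_iter(0, n, 1).  Illustrative only.
        if stop is None:
            start, stop = 0, start
        if step == 0:
            raise ValueError("step must not be zero")
        if step > 0:
            while start < stop:
                yield start
                start += step
        else:
            while start > stop:
                yield start
                start += step

With that, list(xrange_iter(10, 0, -3)) would yield 10, 7, 4, 1 -- the same values range(10, 0, -3) produces.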
--Guido van Rossum (home page: http://www.python.org/~guido/)

From thomas.heller at ion-tof.com Mon Jun 25 16:37:31 2001 From: thomas.heller at ion-tof.com (Thomas Heller) Date: Mon, 25 Jun 2001 16:37:31 +0200 Subject: [Python-Dev] xrange vs generators References: <15159.16807.480121.637386@beluga.mojam.com> <200106251426.f5PEQ8907629@odiug.digicool.com> Message-ID: <006301c0fd84$5dcefad0$e000a8c0@thomasnotebook>

> > With generators in the language, should xrange be deprecated?
> >
> > Skip
>
> No, but maybe xrange() should be changed to return an iterator.
> E.g. something like this:
>
>     def xrange(start, stop, step):
>         while start < stop:
>             yield start
>             start += step
>
> but with the appropriate defaults, and reversal of the test if step <
> 0, and an error if step == 0, and type checks enforcing ints (or long
> ints!), and implemented in C. :-)
>
> Although xrange() objects currently support some sequence algebra,
> that is mostly bogus and I don't think anyone in their right mind uses
> it.

I _was_ using xrange as sets representing (potentially large) ranges of ints. Example:

    positive = xrange(1, sys.maxint)

    if num in positive:
        ...

I didn't follow the iterators discussion: would this continue to work? Thomas

From esr at thyrsus.com Mon Jun 25 16:41:34 2001 From: esr at thyrsus.com (Eric S. Raymond) Date: Mon, 25 Jun 2001 10:41:34 -0400 Subject: [Python-Dev] xrange vs generators In-Reply-To: <200106251426.f5PEQ8907629@odiug.digicool.com>; from guido@digicool.com on Mon, Jun 25, 2001 at 10:26:08AM -0400 References: <15159.16807.480121.637386@beluga.mojam.com> <200106251426.f5PEQ8907629@odiug.digicool.com> Message-ID: <20010625104134.B30559@thyrsus.com> Guido van Rossum : > Although xrange() objects currently support some sequence algebra, > that is mostly bogus and I don't think anyone in their right mind uses > it. I agree. As long as we make those cases fail loudly, I see no objection to dropping support for them. -- Eric S. Raymond Americans have the will to resist because you have weapons. If you don't have a gun, freedom of speech has no power. -- Yoshimi Ishikawa, Japanese author, in the LA Times 15 Oct 1992

From barry at digicool.com Mon Jun 25 16:38:20 2001 From: barry at digicool.com (Barry A. Warsaw) Date: Mon, 25 Jun 2001 10:38:20 -0400 Subject: [Python-Dev] os.path.normcase() in site.py References: <20010625103538.T8098@xs4all.nl> Message-ID: <15159.19676.727068.217548@anthem.wooz.org> >>>>> "TW" == Thomas Wouters writes: TW> Okay... how about someone takes Guido out for dinner and feeds TW> him way too many bottles of wine and ferrets to show him such TW> things do not necessarily lead to blood feuds ? Maybe take TW> along some psychotropic drugs and a halfway decent hypnotist TW> for safety's measure. Don't forget the dentist, proctologist, and a trepanist. Actually, if you can find a holeologist it would be much more efficient (my cousin Neil, a.k.a. Dr. Finger, a.k.a. Dr Watumpka would be ideal, but he's studying in Dortmund these days). TW> Meanwhile Barry subscribes Just to python-dev I'd be glad to, and I won't even divulge the fact that python-dev is only ostensibly a closed, insular mailing list these days. TW> and you or someone else with the pickpocket skills to get at TW> the keys for the time machine No pickpocketing skill necessary. Guido leaves the keys in a small safebox magnetically adhered underneath the running boards. Just be sure to ground yourself first (learned the hard way)!
TW> (come on, fess up, you all practiced) make sure Guido can't TW> get at it, lest he try and make up with Just in the past in TW> his 'suggestable' state... Better change the Mailman admin TW> password too, just to be on the safe side. I've tried that many times, but I suspect Guido has a Pybot thermetically linked to the time machine which "instantly" recedes several seconds into the past each time I change it, only to change it back. TW> Or if that has no chance of a prayer in hell of working, I can TW> give Just a secret xs4all.nl address (since he has an XS4ALL TW> account nowadays, that shouldn't be a problem) and we just TW> never tell Guido that py-dev at xs4all.nl is really Just ;) You realize it's way too "late" for that, don't you? The time machine works just as well in the forward direction as in the past direction, and long before he left the comfy environs of Amsterdam to brave it out in the harsh, unforgiving wilderness of Washington, he mapped out every moment of young Wouters' life. Why do you think I've worn aluminum foil underwear for the past 30 years? Trust me, it's not for the feeling of freshness and confidence it provides (okay, only partially). >> It's very sad, really -- but also wonderfully Dutch. TW> No, it would only be wonderfully Dutch if either brother was TW> German or Belgian in some way, or of royal blood and married TW> to the wrong type of christian sect (Protestant or Catholic -- TW> I keep forgetting which is which.) It would also be wonderfully American, but only if Just had trivially wronged Guido years ago by eating one of his nabisco cookies or some such. -Barry

From guido at digicool.com Mon Jun 25 16:47:50 2001 From: guido at digicool.com (Guido van Rossum) Date: Mon, 25 Jun 2001 10:47:50 -0400 Subject: [Python-Dev] xrange vs generators In-Reply-To: Your message of "Mon, 25 Jun 2001 16:37:31 +0200." <006301c0fd84$5dcefad0$e000a8c0@thomasnotebook> References: <15159.16807.480121.637386@beluga.mojam.com> <200106251426.f5PEQ8907629@odiug.digicool.com> <006301c0fd84$5dcefad0$e000a8c0@thomasnotebook> Message-ID: <200106251447.f5PEloH07777@odiug.digicool.com> [me] > > Although xrange() objects currently support some sequence algebra, > > that is mostly bogus and I don't think anyone in their right mind uses > > it. [theller] > I _was_ using xrange as sets representing (potentially large) > ranges of ints. > Example: > > positive = xrange(1, sys.maxint) > > if num in positive: > ... > > I didn't follow the iterators discussion: would this > continue to work? No, it would break. And I see another breakage too:

    r = xrange(10)
    for i in r:
        for j in r:
            print i, j

would not do the right thing if xrange() returned an iterator (because iterators can only be used once). This is too bad; I really wish that xrange() could die or be limited entirely to for loops. I wonder if we could put warnings on xrange() uses beyond the most basic...? --Guido van Rossum (home page: http://www.python.org/~guido/)

From pedroni at inf.ethz.ch Mon Jun 25 16:51:16 2001 From: pedroni at inf.ethz.ch (Samuele Pedroni) Date: Mon, 25 Jun 2001 16:51:16 +0200 (MET DST) Subject: [Python-Dev] Python Specializing Compiler Message-ID: <200106251451.QAA17756@core.inf.ethz.ch> Hi. [Armin Rigo] ... > Why it is related to Psyco: the current treatment of globals/builtins makes > it hard for Psyco to statically tell what function we are calling when it > sees e.g. "len(a)" in the code.
> We would at least need some help from the > interpreter; at least hooks called when the module's globals() dictionary > change. The above proposal might provide a more uniform solution. FYI, a different proposal for opt. globals access by Jeremy Hylton. It seems it would break fewer things ... don't know whether it can be as useful for Psyco: http://mail.python.org/pipermail/python-dev/2001-May/014995.html In any case I think Psyco will need notification support from the interpreter about dynamic changes to things that Psyco honestly assumes to be invariant in order to achieve performance. regards, Samuele Pedroni.

From thomas.heller at ion-tof.com Mon Jun 25 17:05:09 2001 From: thomas.heller at ion-tof.com (Thomas Heller) Date: Mon, 25 Jun 2001 17:05:09 +0200 Subject: [Python-Dev] xrange vs generators References: <15159.16807.480121.637386@beluga.mojam.com> <200106251426.f5PEQ8907629@odiug.digicool.com> <006301c0fd84$5dcefad0$e000a8c0@thomasnotebook> <200106251447.f5PEloH07777@odiug.digicool.com> Message-ID: <00e001c0fd88$3a532140$e000a8c0@thomasnotebook> > [theller] > > I _was_ using xrange as sets representing (potentially large) > > ranges of ints. > > Example: > > > > positive = xrange(1, sys.maxint) > > > > if num in positive: > > ... > > > > I didn't follow the iterators discussion: would this > > continue to work? > > No, it would break. Since there was an off-by-one bug for 'if num in xrange()' in Python 2.0 my code has already been rewritten. Thomas

From pedroni at inf.ethz.ch Mon Jun 25 17:04:45 2001 From: pedroni at inf.ethz.ch (Samuele Pedroni) Date: Mon, 25 Jun 2001 17:04:45 +0200 (MET DST) Subject: [Python-Dev] Python Specializing Compiler Message-ID: <200106251504.RAA18642@core.inf.ethz.ch> Hi. [Armin Rigo] > In particular, I believe we won't need to change the way the current Python > interpreter encodes data. For example, instances currently have a > dictionary of attributes and no "fixed slots", but this is not a problem > for Psyco, which can encode instances in better ways (e.g. as a C struct) > as long as it is only accessed by Psyco-compiled Python code and no > "legacy" code. This makes sense, but I'm asking if it is affordable to have all code executed (if we aim for usage-transparency) through Psyco-compiled code (memory foot-print, compilation vs. execution trade-offs for rarely executed code). Otherwise, in a mixed execution context, we would pay for conversions. I can see how a dynamic compiler can deal with methods, together with the interpreter notifying it when a dynamic change to the hierarchy or method definitions can potentially invalidate compiled code. I see more problems with instance data slots, because there are no strong hints in the code about which are the "official" slots of a class, and undisciplined code can treat instances just as dicts. regards, Samuele Pedroni.

From fdrake at acm.org Mon Jun 25 17:13:31 2001 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Mon, 25 Jun 2001 11:13:31 -0400 (EDT) Subject: [Python-Dev] os.path.normcase() in site.py In-Reply-To: <200106251343.f5PDh4907304@odiug.digicool.com> References: <20010625102809.42357303182@snelboot.oratrix.nl> <200106251343.f5PDh4907304@odiug.digicool.com> Message-ID: <15159.21787.913782.751691@cj42289-a.reston1.va.home.com> Guido van Rossum writes: > I can't think of any function besides the attempt to avoid duplicates. There were two reasons for adding this code:

 1. Avoid duplicates (speeds imports if there are duplicates and
    the modules are found on an entry after the dupes).
 2. Avoid breakage when a script uses os.chdir(). This is
    probably unusual for large applications, but fairly common for
    little admin helper scripts.

> I think that even on Windows, retaining case makes sense. > > I think that there's a way to avoid duplicates without case-folding > everything. (E.g. use a case-folding comparison instead.) > > I wonder if maybe path entries should be normpath'd though? > > I'll leave it to Fred, Jack or Just to fix this. I certainly agree that this can be improved; if Jack or Just would like to assign it to me on SourceForge, I'd be glad to fix it. -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations

From tim at digicool.com Mon Jun 25 17:39:47 2001 From: tim at digicool.com (Tim Peters) Date: Mon, 25 Jun 2001 11:39:47 -0400 Subject: [Python-Dev] xrange vs generators In-Reply-To: <200106251447.f5PEloH07777@odiug.digicool.com> Message-ID: [Thomas Heller] > I _was_ using xrange as sets representing (potentially large) > ranges of ints. > Example: > > positive = xrange(1, sys.maxint) > > if num in positive: > ... > I didn't follow the iterators discussion: would this > continue to work? [Guido] > No, it would break. "x in y" works with any iterable y in 2.2, incl. generators. So e.g.

    >>> def xr(n):
    ...     i = 0
    ...     while i < n:
    ...         yield i
    ...         i += 1
    ...
    >>> 1 in xr(10)
    1
    >>> 9 in xr(10)
    1
    >>> 10 in xr(10)
    0
    >>>

However, there's no __contains__ method here, so in the last case it actually did 10 compares. 0 in xr(sys.maxint) is very quick, but I'm still waiting for -1 in xr(sys.maxint) to complete . > And I see another breakage too: This would also apply to Thomas's example of giving a name to an xrange object, if implemented via generator:

    >>> small = xr(5)
    >>> 2 in small
    1
    >>> 2 in small
    0
    >>>

> ... > This is too bad; I really wish that xrange() could die or be limited > entirely to for loops. I wonder if we could put warnings on xrange() > uses beyond the most basic...? Hmm. I'd rather not endure the resulting complaints without a strong rationale for deprecating it. One that strikes close to my heart: there's more code in 2.2 to support xrange than there is to support generators! But users don't care about that.

From thomas at xs4all.net Mon Jun 25 17:42:12 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Mon, 25 Jun 2001 17:42:12 +0200 Subject: [Python-Dev] xrange vs generators In-Reply-To: <200106251447.f5PEloH07777@odiug.digicool.com> References: <15159.16807.480121.637386@beluga.mojam.com> <200106251426.f5PEQ8907629@odiug.digicool.com> <006301c0fd84$5dcefad0$e000a8c0@thomasnotebook> <200106251447.f5PEloH07777@odiug.digicool.com> Message-ID: <20010625174211.U8098@xs4all.nl> On Mon, Jun 25, 2001 at 10:47:50AM -0400, Guido van Rossum wrote: [ xrange can't be changed into a generator ] > This is too bad; I really wish that xrange() could die or be limited > entirely to for loops. I wonder if we could put warnings on xrange() > uses beyond the most basic...? Why do we want to do this ? xrange() is still exactly what it was: an object that pretends to be a list of integers. Besides being useful for those who work a lot with ranges, it's a wonderful example of what you can do with Python (even if it isn't actually written in Python :-) I see less reason to deprecate xrange than to deprecate the gopherlib, wave/aifc/audiodev, mhlib, netrc and/or robotparser modules. -- Thomas Wouters Hi! I'm a .signature virus!
From guido at digicool.com Mon Jun 25 18:07:44 2001 From: guido at digicool.com (Guido van Rossum) Date: Mon, 25 Jun 2001 12:07:44 -0400 Subject: [Python-Dev] xrange vs generators In-Reply-To: Your message of "Mon, 25 Jun 2001 11:39:47 EDT." References: Message-ID: <200106251607.f5PG7iq08192@odiug.digicool.com> > Hmm. I'd rather not endure the resulting complaints without a > strong rationale for deprecating it. One that strikes close to my > heart: there's more code in 2.2 to support xrange than there is to > support generators! But users don't care about that. But I do, and historically this code has often been bug-ridden without anybody noticing -- so it's not like it's needed much. I would suggest to remove most of the fancy features of xrange(), in particular the slice, contains and repeat slots. A step further would be to remove getitem also, and add a tp_getiter slot instead -- returning not itself but a new iterator that iterates through the prescribed sequence. We need a PEP for this. Anyone? Should be short and sweet. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at digicool.com Mon Jun 25 18:11:10 2001 From: guido at digicool.com (Guido van Rossum) Date: Mon, 25 Jun 2001 12:11:10 -0400 Subject: [Python-Dev] xrange vs generators In-Reply-To: Your message of "Mon, 25 Jun 2001 17:42:12 +0200." <20010625174211.U8098@xs4all.nl> References: <15159.16807.480121.637386@beluga.mojam.com> <200106251426.f5PEQ8907629@odiug.digicool.com> <006301c0fd84$5dcefad0$e000a8c0@thomasnotebook> <200106251447.f5PEloH07777@odiug.digicool.com> <20010625174211.U8098@xs4all.nl> Message-ID: <200106251611.f5PGBA608205@odiug.digicool.com> > [ xrange can't be changed into a generator ] > > > This is too bad; I really wish that xrange() could die or be limited > > entirely to for loops. I wonder if we could put warnings on xrange() > > uses beyond the most basic...? > > Why do we want to do this ? xrange() is still exactly what it was: an object > that pretends to be a list of integers. Besides being useful for those who > work a lot with ranges, it's a wondeful example on what you can do with > Python (even if it isn't actually written in Python :-) There is exactly *one* idiomatic use of xrange(): for i in xrange(...): ... All other operations supported by the xrange object are very rarely used, and historically their implementation has had obvious bugs that no-one noticed for years. > I see less reason to deprecate xrange than to deprecate the gopherlib, > wave/aifc/audiodev, mhlib, netrc and/or robotparser modules. Those are useful application-area libraries for some folks. The idiomatic xrange() object is useful too. But the advanced features of xrange() are an example of code bloat. --Guido van Rossum (home page: http://www.python.org/~guido/) From Greg.Wilson at baltimore.com Mon Jun 25 18:25:33 2001 From: Greg.Wilson at baltimore.com (Greg Wilson) Date: Mon, 25 Jun 2001 12:25:33 -0400 Subject: [Python-Dev] RE: Python-Dev digest, Vol 1 #1437 - 13 msgs Message-ID: <930BBCA4CEBBD411BE6500508BB3328F2E27F1@nsamcanms1.ca.baltimore.com> > Guido: > Since you have already obtained the same speedup with your approach, I > think there's great promise. Count on sending in a paper for the next > Python conference! Greg: "Doctor Dobb's Journal" would also be interested in an article. Who knows --- it might even be done before the ones on stackless, garbage collection, Zope acquisition, and generators... 
:-) Greg

From just at letterror.com Mon Jun 25 18:47:30 2001 From: just at letterror.com (Just van Rossum) Date: Mon, 25 Jun 2001 18:47:30 +0200 Subject: [Python-Dev] os.path.normcase() in site.py In-Reply-To: <15159.21787.913782.751691@cj42289-a.reston1.va.home.com> Message-ID: <20010625184734-r01010600-dbd1c84a@213.84.27.177> Guido van Rossum writes: > I can't think of any function besides the attempt to avoid duplicates. Fred L. Drake, Jr. wrote: > There were two reasons for adding this code: > > 1. Avoid duplicates (speeds imports if there are duplicates and > the modules are found on an entry after the dupes). > > 2. Avoid breakage when a script uses os.chdir(). This is > probably unusual for large applications, but fairly common for > little admin helper scripts. 1) normcase(). Bad. 2) abspath(). Good. I think #2 is a legitimate problem, but I'm not so sure of #1: is it really so common for sys.path to contain duplicates, to worry about it at all? > > I'll leave it to Fred, Jack or Just to fix this. > > I certainly agree that this can be improved; if Jack or Just would > like to assign it to me on SourceForge, I'd be glad to fix it. Here's my proposed fix:

    Index: site.py
    ===================================================================
    RCS file: /cvsroot/python/python/dist/src/Lib/site.py,v
    retrieving revision 1.27
    diff -c -3 -r1.27 site.py
    *** site.py	2001/06/12 16:48:52	1.27
    --- site.py	2001/06/25 16:42:33
    ***************
    *** 67,73 ****

      def makepath(*paths):
          dir = os.path.join(*paths)
    !     return os.path.normcase(os.path.abspath(dir))

      L = sys.modules.values()
      for m in L:
    --- 67,73 ----

      def makepath(*paths):
          dir = os.path.join(*paths)
    !     return os.path.abspath(dir)

      L = sys.modules.values()
      for m in L:

Just

From aahz at rahul.net Mon Jun 25 19:19:48 2001 From: aahz at rahul.net (Aahz Maruch) Date: Mon, 25 Jun 2001 10:19:48 -0700 (PDT) Subject: [Python-Dev] 2.1.1 vs. os.normcase() Message-ID: <20010625171948.D636399C80@waltz.rahul.net> It's too late for 2.0.1, but should this bugfix go into 2.1.1? (Just to be clear, this is the problem that Just reported with site.py calling os.normcase() in makepath().) ((I'm only asking about this bug in specific because we're getting down to the wire on 2.1.1 IIUC.)) -- --- Aahz (@pobox.com) Hugs and backrubs -- I break Rule 6 <*> http://www.rahul.net/aahz/ Androgynous poly kinky vanilla queer het Pythonista I don't really mind a person having the last whine, but I do mind someone else having the last self-righteous whine.
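A sketch of where the thread seems to be heading -- absolutize every entry but keep its case, and use the case-folded form only as a comparison key for duplicate detection. The helper names and the pair-returning makepath() variant below are illustrative assumptions, not the fix that was checked in:

    import os, sys

    def makepath(*paths):
        # Absolutize (fixes the os.chdir() breakage Fred mentions) but
        # keep the case the filesystem reported; fold case only to
        # build a comparison key.
        dir = os.path.abspath(os.path.join(*paths))
        return dir, os.path.normcase(dir)

    def dedup_path(entries):
        # Drop duplicates case-insensitively while preserving the
        # original spelling of each surviving entry.
        seen = []
        result = []
        for entry in entries:
            entry, key = makepath(entry)
            if key not in seen:
                seen.append(key)
                result.append(entry)
        return result

    sys.path[:] = dedup_path(sys.path)

The point of returning the pair is that the folded string is used once for the duplicate test and then thrown away, so sys.path keeps the case the filesystem gave it.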
From guido at digicool.com Mon Jun 25 20:06:02 2001 From: guido at digicool.com (Guido van Rossum) Date: Mon, 25 Jun 2001 14:06:02 -0400 Subject: [Python-Dev] 2.1.1 vs. os.normcase() In-Reply-To: Your message of "Mon, 25 Jun 2001 10:19:48 PDT." <20010625171948.D636399C80@waltz.rahul.net> References: <20010625171948.D636399C80@waltz.rahul.net> Message-ID: <200106251806.f5PI62L08770@odiug.digicool.com> > It's too late for 2.0.1, but should this bugfix go into 2.1.1? > > (Just to be clear, this is the problem that Just reported with site.py > calling os.normcase() in makepath().) > > ((I'm only asking about this bug in specific because we're getting down > to the wire on 2.1.1 IIUC.)) Unclear if it's purely a bugfix -- this could be considered a feature, but I don't know. What do others think? --Guido van Rossum (home page: http://www.python.org/~guido/)

From tim at digicool.com Mon Jun 25 20:47:06 2001 From: tim at digicool.com (Tim Peters) Date: Mon, 25 Jun 2001 14:47:06 -0400 Subject: [Python-Dev] os.path.normcase() in site.py In-Reply-To: <20010625102809.42357303182@snelboot.oratrix.nl> Message-ID: [Jack Jansen] > ... > With MacPython's way of finding the initial sys.path contents we > don't have the Windows-Python problem that we add the same directory > 5 times (once in uppercase, once in lowercase, once in mixed case, > once in mixed-case with / for \, etc:-), Happily, we don't have that problem on a stock Windows Python anymore:

    C:\Python21>python
    Python 2.1 (#15, Apr 16 2001, 18:25:49) [MSC 32 bit (Intel)] on win32
    Type "copyright", "credits" or "license" for more information.
    >>> import sys, pprint
    >>> pprint.pprint(sys.path)
    ['',
     'c:\\python21',
     'c:\\python21\\dlls',
     'c:\\python21\\lib',
     'c:\\python21\\lib\\plat-win',
     'c:\\python21\\lib\\lib-tk']
    >>>

OTOH, this is still Icky, because those don't match (wrt case) the names in the filesystem (e.g., just look at the initial prompt line: I was in Python21 when I ran this, not python21). > so if this is what it's trying to solve we can take it out easily. It's hard to believe Fred added code to solve a Windows problem; I don't know what it's trying to do.

From m.favas at per.dem.csiro.au Mon Jun 25 21:38:47 2001 From: m.favas at per.dem.csiro.au (Mark Favas) Date: Tue, 26 Jun 2001 03:38:47 +0800 Subject: [Python-Dev] Re: Problems with socketmodule (getnameinfo & getaddrinfo) References: <200106242139.f5OLdjR22560@mira.informatik.hu-berlin.de> <3B3666B9.335DA17E@per.dem.csiro.au> <200106250639.f5P6die01246@mira.informatik.hu-berlin.de> Message-ID: <3B379347.7E8D00EB@per.dem.csiro.au> "Martin v. Loewis" wrote: > > > To get socketmodule.c to compile, I had to make a change to line 2963 > > so that the declaration of inet_pton matched the previous declaration on > > line 220 (changing char *src to const char *src). Still have problems > > though, due to the use of snprintf in getnameinfo.c: > > Ok, they are printing a single number into a 512 byte buffer; that is > safe even with sprintf only, so I have just removed the snprintf call. > Can you please try again? > > Thanks for your reports, > Martin No trouble... The current CVS compiles (with a warning), links, and runs. The warning given is: cc: Warning: /home/gonzo1/mark/groucho1/mark/src/python/CVS/python/dist/src/Modules/getaddrinfo.c, line 407: In this statement, the referenced type of the pointer value "hostname" is const, but the referenced type of the target of this assignment is not.
(notconstqual)
        if (inet_pton(gai_afdl[i].a_af, hostname, pton)) {
------------------------------------------------^

which can be fixed by declaring the second argument to inet_pton as const char* instead of char* in the two occurrences of inet_pton in socketmodule.c Cheers, Mark -- Mark Favas - m.favas at per.dem.csiro.au CSIRO, Private Bag No 5, Wembley, Western Australia 6913, AUSTRALIA

From martin at loewis.home.cs.tu-berlin.de Tue Jun 26 01:08:00 2001 From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Tue, 26 Jun 2001 01:08:00 +0200 Subject: [Python-Dev] IPv6 and Windows In-Reply-To: <200106251341.f5PDfkg07283@odiug.digicool.com> (message from Guido van Rossum on Mon, 25 Jun 2001 09:41:46 -0400) References: <200106241748.f5OHm3001480@mira.informatik.hu-berlin.de> <200106251341.f5PDfkg07283@odiug.digicool.com> Message-ID: <200106252308.f5PN80701342@mira.informatik.hu-berlin.de> > > The problem is that the library patches (httplib, ftplib, etc) do use > > getaddrinfo to find out how to contact a remote system, which is the > > right thing to do IMO. So even if the IPv6 support can be activated > > only if desired, getaddrinfo absolutely has to work. > > Yes, but in an IPv4-only environment it would be super trivial to > implement, right? Right, and getaddrinfo.c/getnameinfo.c attempt such an implementation. They might attempt to get it "more right" than necessary, but still they are "pure C", in the sense that they don't rely on any libraries except for those available in a typical IPv4 sockets implementation. At least that's the theory. It turns out that they've been using inet_pton and snprintf, which is probably because they have been mainly tested on BSD. I'm confident that we can reduce them to a "no funny library calls needed" minimum. If somebody wants to implement them anew from the ground up, only using what the socketmodule already uses, that would be fine as well. An actual review of the code for portability problems would also be helpful. Regards, Martin

From greg at cosc.canterbury.ac.nz Tue Jun 26 06:32:05 2001 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Tue, 26 Jun 2001 16:32:05 +1200 (NZST) Subject: [Python-Dev] Python Specializing Compiler In-Reply-To: <200106251451.QAA17756@core.inf.ethz.ch> Message-ID: <200106260432.QAA04989@s454.cosc.canterbury.ac.nz> Samuele Pedroni : > a different proposal for opt. globals access > by Jeremy Hylton. It seems, it would break fewer things ... I really like Jeremy's proposal. I've been having similar thoughts myself for quite a while. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg at cosc.canterbury.ac.nz +--------------------------------------+

From guido at digicool.com Tue Jun 26 16:57:37 2001 From: guido at digicool.com (Guido van Rossum) Date: Tue, 26 Jun 2001 10:57:37 -0400 Subject: [Python-Dev] Python Specializing Compiler In-Reply-To: Your message of "Tue, 26 Jun 2001 16:32:05 +1200." <200106260432.QAA04989@s454.cosc.canterbury.ac.nz> References: <200106260432.QAA04989@s454.cosc.canterbury.ac.nz> Message-ID: <200106261457.f5QEvbZ11007@odiug.digicool.com> > Samuele Pedroni : > > > a different proposal for opt. globals access > > by Jeremy Hylton. It seems, it would break fewer things ... > > I really like Jeremy's proposal. I've been having similar > thoughts myself for quite a while. > > Greg Ewing Ditto.
Isn't this what I've been calling "low-hanging fruit" for ages? Apparently it's low but still out of reach. :-) --Guido van Rossum (home page: http://www.python.org/~guido/)
From guido at digicool.com Tue Jun 26 19:59:55 2001 From: guido at digicool.com (Guido van Rossum) Date: Tue, 26 Jun 2001 13:59:55 -0400 Subject: [Python-Dev] PEP 260: simplify xrange() Message-ID: <200106261759.f5QHxtH15045@odiug.digicool.com> Here's another sweet and short PEP. What do folks think? Is xrange()'s complexity really worth having? --Guido van Rossum (home page: http://www.python.org/~guido/) PEP: 260 Title: Simplify xrange() Version: $Revision: 1.1 $ Author: guido at python.org (Guido van Rossum) Status: Draft Type: Standards Track Python-Version: 2.2 Created: 26-Jun-2001 Post-History: 26-Jun-2001 Abstract This PEP proposes to strip the xrange() object from some rarely used behavior like x[i:j] and x*n. Problem The xrange() function has one idiomatic use: for i in xrange(...): ... However, the xrange() object has a bunch of rarely used behaviors that attempt to make it more sequence-like. These are so rarely used that historically they have had serious bugs (e.g. off-by-one errors) that went undetected for several releases. I claim that it's better to drop these unused features. This will simplify the implementation, testing, and documentation, and reduce maintenance and code size. Proposed Solution I propose to strip the xrange() object to the bare minimum. The only retained sequence behaviors are x[i], len(x), and repr(x). In particular, these behaviors will be dropped: x[i:j] (slicing) x*n, n*x (sequence-repeat) cmp(x1, x2) (comparisons) i in x (containment test) x.tolist() method x.start, x.stop, x.step attributes By implementing a custom iterator type, we could speed up the common use, but this is optional (the default sequence iterator does just fine). I expect it will take at most an hour to rip it all out; another hour to reduce the test suite and documentation. Scope This PEP only affects the xrange() built-in function. Risks Somebody's code could be relying on the extended code, and this code would break. However, given that historically bugs in the extended code have gone undetected for so long, it's unlikely that much code is affected. Copyright This document has been placed in the public domain. Local Variables: mode: indented-text indent-tabs-mode: nil End:
From fdrake at acm.org Tue Jun 26 22:01:41 2001 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Tue, 26 Jun 2001 16:01:41 -0400 (EDT) Subject: [Python-Dev] collecting timer resolution information... Message-ID: <15160.59941.295880.286167@cj42289-a.reston1.va.home.com> I'd like people to run the attached C program and send the output to me. What this does is run the gettimeofday() and getrusage() functions until the time values change. The intent is to determine the quality of the available timing information. For example, on my Linux-Mandrake 7.2 installation with a stock 2.2.17 kernel, I get this: timeofday: 1 (1 calls), rusage: 10000 (2465 calls) Thanks! -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations
From fdrake at acm.org Tue Jun 26 22:05:48 2001 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Tue, 26 Jun 2001 16:05:48 -0400 (EDT) Subject: [Python-Dev] collecting timer resolution information... In-Reply-To: <15160.59941.295880.286167@cj42289-a.reston1.va.home.com> References: <15160.59941.295880.286167@cj42289-a.reston1.va.home.com> Message-ID: <15160.60188.806308.247566@cj42289-a.reston1.va.home.com> Fred L.
Drake, Jr. writes: > I'd like people to run the attached C program and send the output to OK, I've attached it this time. Sorry! -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: observation.c URL:
From gward at python.net Tue Jun 26 22:10:09 2001 From: gward at python.net (Greg Ward) Date: Tue, 26 Jun 2001 16:10:09 -0400 Subject: [Python-Dev] make static In-Reply-To: <200106251340.f5PDeAO07244@odiug.digicool.com>; from guido@digicool.com on Mon, Jun 25, 2001 at 09:40:10AM -0400 References: <200106250943.LAA24576@pandora.informatik.hu-berlin.de> <200106251340.f5PDeAO07244@odiug.digicool.com> Message-ID: <20010626161009.B2820@gerg.ca> On 25 June 2001, Guido van Rossum said: > As long as it works, it works. I don't think there's a reason to > spend more than absolutely minimal time trying to keep it working > though -- we're trying to encourage everybody to migrate towards > distutils. So (without having seen the SF report) I'd say "tough > luck". The catch is that I never got around to implementing statically building a new interpreter via the Distutils, so (for now) Makefile.pre.in is the only way to do this. ;-( (Unless someone added it to the Distutils while I wasn't looking, which wouldn't be hard since I haven't looked in, ummm, six months or so...) Greg -- Greg Ward - just another /P(erl|ython)/ hacker gward at python.net http://starship.python.net/~gward/ "When I hear the word `culture', I reach for my gun." --Goebbels "When I hear the word `Microsoft', *I* reach for *my* gun." --me
From arigo at ulb.ac.be Wed Jun 27 04:01:54 2001 From: arigo at ulb.ac.be (Armin Rigo) Date: Tue, 26 Jun 2001 22:01:54 -0400 Subject: [Python-Dev] Python Specializing Compiler Message-ID: <3B393E92.B0719A7A@ulb.ac.be> Hi, I am considering using GNU Lightning to produce code from the Psyco compiler. Has anyone already used it from a Python program ? If so, you might already have done the necessary support module in C, and I might be interested in it ! Otherwise, I'll start from scratch. Of course, comments about whether I should use GNU Lightning at all, or any other code-producing library (or even produce machine code "by hand"), are welcome. Also, I hope to be able to continue with more fundamental work on Psyco very soon. One design decision I have to make now is about the way Psyco reads Python code. Currently, it "reverse-engineers" byte-code. Another solution would be to compile from the source code (possibly with the help of the 'Tools/Compiler/*' modules). The current solution, although not optimal, seems to make integration with the current interpreter easier. Indeed, based on recent discussions, I now believe that a realistic way to use Psyco would be to let the interpreter run normally while doing some kind of profiling, and work on time-critical routines only --- which at this point have already been compiled into byte-code and executed at least a few times. Armin
From nas at python.ca Tue Jun 26 23:01:38 2001 From: nas at python.ca (Neil Schemenauer) Date: Tue, 26 Jun 2001 14:01:38 -0700 Subject: [Python-Dev] collecting timer resolution information... In-Reply-To: <15160.59941.295880.286167@cj42289-a.reston1.va.home.com>; from fdrake@acm.org on Tue, Jun 26, 2001 at 04:01:41PM -0400 References: <15160.59941.295880.286167@cj42289-a.reston1.va.home.com> Message-ID: <20010626140138.A2838@glacier.fnational.com> Fred L. Drake, Jr. wrote: > timeofday: 1 (1 calls), rusage: 10000 (2465 calls) My hacked version of Linux 2.4 on an AMD-800 box: timeofday: 1 (2 calls), rusage: 976 (1792 calls) I don't quite understand the output. What does the 976 mean? Neil
From fdrake at acm.org Tue Jun 26 23:23:53 2001 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Tue, 26 Jun 2001 17:23:53 -0400 (EDT) Subject: [Python-Dev] collecting timer resolution information... In-Reply-To: <20010626140138.A2838@glacier.fnational.com> References: <15160.59941.295880.286167@cj42289-a.reston1.va.home.com> <20010626140138.A2838@glacier.fnational.com> Message-ID: <15160.64873.213278.925715@cj42289-a.reston1.va.home.com> Neil Schemenauer writes: > My hacked version of Linux 2.4 on an AMD-800 box: > > timeofday: 1 (2 calls), rusage: 976 (1792 calls) > > I don't quite understand the output. What does the 976 mean? The "1" and the "976" are the apparent resolution of the time values reported by those two calls, in microseconds. It looks like the HZ define in that header file you pointed out could be bumped a little higher. ;-) -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations
From mark.favas at csiro.au Wed Jun 27 01:21:47 2001 From: mark.favas at csiro.au (Mark Favas) Date: Wed, 27 Jun 2001 07:21:47 +0800 Subject: [Python-Dev] latest unicode-related change causes failure in test_unicode & test_unicodedata Message-ID: <3B39190B.E7DA5B5D@csiro.au> CVS of a short while ago, Tru64 Unix: "make test" gives two unicode-related failures: test_unicode test test_unicode crashed -- exceptions.UnicodeError: UTF-8 decoding error: illegal encoding test_unicodedata The actual stdout doesn't match the expected stdout. This much did match (between asterisk lines): ********************************************************************** test_unicodedata Testing Unicode Database... Methods: ********************************************************************** Then ...
We expected (repr): '6c7a7c02657b69d0fdd7a7d174f573194bba2e18' But instead we got: '374108f225e0c1488f8389ce6333902830d299fb' test test_unicodedata failed -- Writing: '374108f225e0c1488f8389ce6333902830d299fb', expected: '6c7a7c02657b69d0fdd7a7d174f573194bba2e18' Running the tests manually, test_unicode fails, test_unicodedata doesn't fail, but doesn't match the expected output for Methods: (test_unicode) Testing Unicode contains method... done. Testing Unicode formatting strings... done. Testing builtin codecs... Traceback (most recent call last): File "Lib/test/test_unicode.py", line 383, in ? verify(u'\ud800\udc02'.encode('utf-8') == \ File "./Lib/test/test_support.py", line 95, in verify raise TestFailed(reason) test_support.TestFailed: test failed (test_unicodedata) python Lib/test/test_unicodedata.py Testing Unicode Database... Methods: 374108f225e0c1488f8389ce6333902830d299fb Functions: 41e1d4792185d6474a43c83ce4f593b1bdb01f8a API: ok -- Mark Favas - m.favas at per.dem.csiro.au CSIRO, Private Bag No 5, Wembley, Western Australia 6913, AUSTRALIA From JamesL at Lugoj.Com Wed Jun 27 02:06:23 2001 From: JamesL at Lugoj.Com (James Logajan) Date: Tue, 26 Jun 2001 17:06:23 -0700 Subject: [Python-Dev] Re: PEP 260: simplify xrange() References: Message-ID: <3B39237F.1A7EF3F2@Lugoj.Com> Guido van Rossum wrote: > Here's another sweet and short PEP. What do folks think? Is > xrange()'s complexity really worth having? Are there still known bugs that will take some effort to repair? Is xrange constantly touched when changes are made elsewhere? If no to both, then I suggest don't fix what ain't broken; life is too short. (Unless it is annoying you to distraction, then do the deed and get it over with.) From tim.one at home.com Wed Jun 27 02:32:26 2001 From: tim.one at home.com (Tim Peters) Date: Tue, 26 Jun 2001 20:32:26 -0400 Subject: [Python-Dev] RE: PEP 260: simplify xrange() In-Reply-To: <3B39237F.1A7EF3F2@Lugoj.Com> Message-ID: [James Logajan] > Are there still known bugs that will take some effort to repair? Is > xrange constantly touched when changes are made elsewhere? If no to > both, then I suggest don't fix what ain't broken; life is too short. > (Unless it is annoying you to distraction, then do the deed and get > it over with.) I think it's more the latter. I partly provoked this by bitterly pointing out that there's more code in the CVS tree devoted to supporting the single xrange() gimmick than Neil Schemenauer added to support the get-out-of-town more powerful new generators. Masses of crufty code nobody benefits from are a burden on the soul. although-it-would-be-impolite-to-ask-we-why-still-ship-a-directory- full-of-crufty-old-irix5-demos-in-the-std-library-ly y'rs - tim From tdelaney at avaya.com Wed Jun 27 02:36:25 2001 From: tdelaney at avaya.com (Delaney, Timothy) Date: Wed, 27 Jun 2001 10:36:25 +1000 Subject: [Python-Dev] RE: PEP 260: simplify xrange() Message-ID: > Here's another sweet and short PEP. What do folks think? Is > xrange()'s complexity really worth having? > > --Guido van Rossum (home page: http://www.python.org/~guido/) > > PEP: 260 > Title: Simplify xrange() > Version: $Revision: 1.1 $ > Author: guido at python.org (Guido van Rossum) > Status: Draft > Type: Standards Track > Python-Version: 2.2 > Created: 26-Jun-2001 > Post-History: 26-Jun-2001 > > Abstract > > This PEP proposes to strip the xrange() object from some rarely > used behavior like x[i:j] and x*n. > > > Problem > > The xrange() function has one idiomatic use: > > for i in xrange(...): ... 
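To make the quoted Problem section concrete, here is a sketch of what goes and what stays under the PEP (2.1 behavior; the variable name is illustrative):

    x = xrange(10)
    print x[3], len(x), repr(x)     # retained: indexing, len() and repr()
    print x[2:5], x * 2, 7 in x     # slicing, repeat, containment: to be dropped
    print x.tolist(), x.start      # tolist() and start/stop/step: to be dropped
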
If this is to be done, I would also propose that xrange() and range() be changed to allow passing in a straight-out sequence such as in the following code in order to get rid of the need for range(len(seq)): import __builtin__ def range (start, stop=None, step=1, range=range): """""" start2 = start stop2 = stop if stop is None: stop2 = start start2 = 0 try: return range(start2, stop2, step) except TypeError: assert stop is None return range(len(start)) def xrange (start, stop=None, step=1, xrange=xrange): """""" start2 = start stop2 = stop if stop is None: stop2 = start start2 = 0 try: return xrange(start2, stop2, step) except TypeError: assert stop is None return xrange(len(start)) a = [5, 'a', 'Hello, world!'] b = range(a) c = xrange(4, 6) d = xrange(b) e = range(c) print a print b print c print d print e print range(d, 2) Tim Delaney From gward at python.net Wed Jun 27 03:24:32 2001 From: gward at python.net (Greg Ward) Date: Tue, 26 Jun 2001 21:24:32 -0400 Subject: [Python-Dev] RE: PEP 260: simplify xrange() In-Reply-To: ; from tdelaney@avaya.com on Wed, Jun 27, 2001 at 10:36:25AM +1000 References: Message-ID: <20010626212432.A4003@gerg.ca> On 27 June 2001, Delaney, Timothy said: > If this is to be done, I would also propose that xrange() and range() be > changed to allow passing in a straight-out sequence such as in the following > code in order to get rid of the need for range(len(seq)): I'm +1 on the face of it without stopping to consider any implications. ;-) Some bits of syntactic sugar as just too good to pass up. range(len(sequence)) is syntactic cod-liver oil. Greg -- Greg Ward - programmer-at-big gward at python.net http://starship.python.net/~gward/ Blood is thicker than water, and much tastier. From nas at python.ca Wed Jun 27 03:28:29 2001 From: nas at python.ca (Neil Schemenauer) Date: Tue, 26 Jun 2001 18:28:29 -0700 Subject: [Python-Dev] collecting timer resolution information... In-Reply-To: <15160.64873.213278.925715@cj42289-a.reston1.va.home.com>; from fdrake@acm.org on Tue, Jun 26, 2001 at 05:23:53PM -0400 References: <15160.59941.295880.286167@cj42289-a.reston1.va.home.com> <20010626140138.A2838@glacier.fnational.com> <15160.64873.213278.925715@cj42289-a.reston1.va.home.com> Message-ID: <20010626182829.A3344@glacier.fnational.com> Fred L. Drake, Jr. wrote: > The "1" and the "976" are the appearant resolution of the time > values reported by those two calls, in microseconds. It looks like > the HZ define in that header file you pointed out could be bumped a > little higher. ;-) I've got it at 1024. >>> 976. / 10000 * 1024 99.942400000000006 I think yours is at the 100 default. Neil From fdrake at acm.org Wed Jun 27 04:14:00 2001 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Tue, 26 Jun 2001 22:14:00 -0400 (EDT) Subject: [Python-Dev] collecting timer resolution information... In-Reply-To: <20010626182829.A3344@glacier.fnational.com> References: <15160.59941.295880.286167@cj42289-a.reston1.va.home.com> <20010626140138.A2838@glacier.fnational.com> <15160.64873.213278.925715@cj42289-a.reston1.va.home.com> <20010626182829.A3344@glacier.fnational.com> Message-ID: <15161.16744.665259.229385@cj42289-a.reston1.va.home.com> Neil Schemenauer writes: > I've got it at 1024. > > >>> 976. / 10000 * 1024 > 99.942400000000006 > > I think yours is at the 100 default. That's correct. Yours could be bumped a bit (factor of 10? 
I'm not really sure where it would cause problems in practice, though I think I understand the general explanations I've seen), and mine could be bumped a good bit. But I intend to stick with a stock kernel since I expect most users will be using a stock kernel, and I don't have a pile of extra machines to play with. ;-( -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From greg at cosc.canterbury.ac.nz Wed Jun 27 04:37:21 2001 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Wed, 27 Jun 2001 14:37:21 +1200 (NZST) Subject: [Python-Dev] collecting timer resolution information... In-Reply-To: <15160.60188.806308.247566@cj42289-a.reston1.va.home.com> Message-ID: <200106270237.OAA05182@s454.cosc.canterbury.ac.nz> Here are the results from a few machines around here: s454% uname -a SunOS s454 5.7 Generic_106541-10 sun4m sparc SUNW,SPARCstation-4 s454% observation timeofday: 2 (1 calls), rusage: 10000 (22 calls) oma% uname -a SunOS oma 5.7 Generic sun4u sparc SUNW,Ultra-4 oma% observation timeofday: 1 (2 calls), rusage: 10000 (115 calls) pc250% uname -a SunOS pc250 5.8 Generic_108529-03 i86pc i386 i86pc pc250% observation timeofday: 1 (1 calls), rusage: 10000 (232 calls) Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg at cosc.canterbury.ac.nz +--------------------------------------+ From JamesL at Lugoj.Com Wed Jun 27 04:42:20 2001 From: JamesL at Lugoj.Com (James Logajan) Date: Tue, 26 Jun 2001 19:42:20 -0700 Subject: [Python-Dev] Re: PEP 260: simplify xrange() References: Message-ID: <3B39480C.F4808C1F@Lugoj.Com> Tim Peters wrote: > [James Logajan] > > Are there still known bugs that will take some effort to repair? Is > > xrange constantly touched when changes are made elsewhere? If no to > > both, then I suggest don't fix what ain't broken; life is too short. > > (Unless it is annoying you to distraction, then do the deed and get > > it over with.) > > I think it's more the latter. I partly provoked this by bitterly pointing > out that there's more code in the CVS tree devoted to supporting the single > xrange() gimmick than Neil Schemenauer added to support the get-out-of-town > more powerful new generators. Masses of crufty code nobody benefits from > are a burden on the soul. Design mistakes one has made do tend to weigh on one's soul (speaking from more than two decades of programming experience) so I understand the primal urge to correct them when one can, and even when one shouldn't. So although I'm quite annoyed by all these new-fangled gimmicks being added to the language (i.e. Python generators being added to solve California's power problems) I have no problem with xrange being fenced in. (I find the very existence of the PEP process somewhat unsettling; there are now thousands of programmers trying to use the language. Why burden them with insuring their programs remain compatible with yet-another-damn-set-of-proposals every year? Or worse: trying to rewrite their code "more elegantly" using all the latest gimmicks. Why in my day, if you wanted to, say, save execution state, you figured out how to do it and didn't go crying to the language designer. Damn these young lazy programmers. Don't know how good they have it. Wouldn't know how to save their execution state if their lives depended on it. Harumph.) Speaking of "generators", I just want to say that I think that "generator" makes for lousy terminology. 
If I understand correctly, "generators" are coroutines that have peer-to-peer synchronized messaging (synchronizing and communicating at the "yield" points). To my mind, "generators" does not evoke that image at all. Assuming I understand it in my early senility.... > although-it-would-be-impolite-to-ask-we-why-still-ship-a-directory- > full-of-crufty-old-irix5-demos-in-the-std-library-ly Perhaps because the Irix community would be quite Irate if they were removed? From tim.one at home.com Wed Jun 27 06:38:15 2001 From: tim.one at home.com (Tim Peters) Date: Wed, 27 Jun 2001 00:38:15 -0400 Subject: [Python-Dev] RE: PEP 260: simplify xrange() In-Reply-To: <3B39480C.F4808C1F@Lugoj.Com> Message-ID: [James Logajan] > Design mistakes one has made do tend to weigh on one's soul (speaking > from more than two decades of programming experience) so I understand > the primal urge to correct them when one can, and even when one > shouldn't. Is this a case when one shouldn't? That is, is it a specific comment on PEP 260, or just a general venting here? > So although I'm quite annoyed by all these new-fangled gimmicks being > added to the language (i.e. Python generators being added to solve > California's power problems) I have no problem with xrange being fenced > in. OK. > (I find the very existence of the PEP process somewhat unsettling; > there are now thousands of programmers trying to use the language. Why > burden them with insuring their programs remain compatible with yet- > another-damn-set-of-proposals every year? You can ask the C, C++, Fortran, Perl, COBOL (etc, etc) folks that too, but I suspect it's a rhetorical question. I wish you could ask the Java committee, but they work in secret . > Or worse: trying to rewrite their code "more elegantly" using all the > latest gimmicks. Use of new features isn't required by Guido, and neither is downloading new releases. If *you* waste your time doing that, we both know it's because you can't resist <0.5 wink>. > ... > Speaking of "generators", I just want to say that I think that > "generator" makes for lousy terminology. A generator, umm, *generates* a sequence of values. It's neither more specific nor more general than that, so we're pretty much limited to vaguely suggestive terms like "generator" and "iterator"; Python already used the latter word for something else. I'd be happy to call them pink flamingos. > If I understand correctly, "generators" are coroutines They're formally semi-coroutines; it's not symmetric. > that have peer-to-peer synchronized messaging (synchronizing and > communicating at the "yield" points). Way too highfalutin' a view. Think of a generator as a resumable function, and you're not missing anything -- not even an implementation subtlety. They *are* resumable functions. A "yield" is just a "return", but with the twist that the function can resume executing after the "yield" again. If you also think of ordinary call/return as a peer-to-peer etc etc, then I suppose you're stuck with that view here too. > To my mind, "generators" does not evoke that image at all. Good, because that image was overblown beyond recognition . >> although-it-would-be-impolite-to-ask-we-why-still-ship-a-directory- >> full-of-crufty-old-irix5-demos-in-the-std-library-ly > Perhaps because the Irix community would be quite Irate if they were > removed? Doubt it: the Irix5 library files haven't really been touched since 1993. For several years we've also shipped an Irix6 library with all the same stuff. 
But I suppose releasing a new OS was a symptom of SGI picking on its users too . From tim.one at home.com Wed Jun 27 07:14:29 2001 From: tim.one at home.com (Tim Peters) Date: Wed, 27 Jun 2001 01:14:29 -0400 Subject: [Python-Dev] Unicode fallout: Windows _winreg project broken Message-ID: The _winreg project no longer links: Creating library ./_winreg_d.lib and object ./_winreg_d.exp _winreg.obj : error LNK2001: unresolved external symbol __imp__PyUnicode_DecodeMBCS The compilation of PyUnicode_DecodeMBCS in unicodeobject.c is in a #if defined(MS_WIN32) && defined(HAVE_USABLE_WCHAR_T) block. But the top of unicodeobject.h now wraps the enabling # if defined(MS_WIN32) && !defined(USE_UCS4_STORAGE) # define HAVE_USABLE_WCHAR_T # define PY_UNICODE_TYPE wchar_t # endif block inside a #ifndef PY_UNICODE_TYPE block, and a change to PC/config.h: #define PY_UNICODE_TYPE unsigned short stops all that. IOW, HAVE_USABLE_WCHAR_T no longer gets #define'd on Windows, and that prevents unicodeobject.c from supplying routines _winreg.c calls. leaving-it-to-an-expert-who-thinks-they-know-what-all-these-symbols- are-supposed-to-really-mean-ly y'rs - tim From greg at cosc.canterbury.ac.nz Wed Jun 27 07:41:46 2001 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Wed, 27 Jun 2001 17:41:46 +1200 Subject: [Python-Dev] Help: Python 2.1: "Corrupt Installation Detected" Message-ID: <3B39721A.DED4E85A@cosc.canterbury.ac.nz> I'm trying to install Python-2.1 on Windows, and I keep getting "Corrupt Installation Detected" when I run the installer. From tim.one at home.com Wed Jun 27 07:53:01 2001 From: tim.one at home.com (Tim Peters) Date: Wed, 27 Jun 2001 01:53:01 -0400 Subject: [Python-Dev] Help: Python 2.1: "Corrupt Installation Detected" In-Reply-To: <3B39721A.DED4E85A@cosc.canterbury.ac.nz> Message-ID: [Greg Ewing] > I'm trying to install Python-2.1 on Windows, > and I keep getting "Corrupt Installation Detected" > when I run the installer. [but no other evidence that > it's actually corrupt] You didn't say which flavor of Windows, but should have . Ditto what it is you're running (the PythonLabs distro? ActiveState's? PythonWare's?). Known causes for this from the PythonLabs installer include (across various flavors of Windows), in decreasing order of likelihood: + Trying to install while logged in to an account with insufficient permissions (try logging in as Adminstrator, if on a version of Windows where that makes sense). + Trying to install over a network. Copy the installer to a local disk first. + Conflicts with anti-virus software (disable it -- indeed, my Win9x Life got much saner after I wiped Norton AntiVirus from my hard drive). + Conflicts with other running programs (like installer splash screens always say, close all other programs). + Insufficient memory, disk space, or magic low-level Windows resources. + There may or may not be a problem unique to French versions of Windows. Any of those apply? From martin at loewis.home.cs.tu-berlin.de Wed Jun 27 09:12:11 2001 From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Wed, 27 Jun 2001 09:12:11 +0200 Subject: [Python-Dev] Unicode fallout: Windows _winreg project broken Message-ID: <200106270712.f5R7CBh06458@mira.informatik.hu-berlin.de> > IOW, HAVE_USABLE_WCHAR_T no longer gets #define'd on Windows, and > that prevents unicodeobject.c from supplying routines _winreg.c > calls. The best thing, IMO, would be if PC/config.h defines everything available in config.h also. 
In this case, the proper defines would be #define Py_USING_UNICODE #define HAVE_USABLE_WCHAR_T #define Py_UNICODE_SIZE 2 #define PY_UNICODE_TYPE wchar_t If that approach is used, the defaulting in Include/unicodeobject.h could go away. Alternatively, define only Py_USING_UNICODE of this in PC/config.h, and change the block in Include/unicodeobject.h to /* Windows has a usable wchar_t type (unless we're using UCS-4) */ # ifdef MS_WIN32 # ifdef USE_UCS4_STORAGE # define Py_UNICODE_SIZE 4 # define PY_UNICODE_TYPE unsigned int # else # define Py_UNICODE_SIZE 2 # define HAVE_USABLE_WCHAR_T # define PY_UNICODE_TYPE wchar_t # endif # endif Regards, Martin
From tim.one at home.com Wed Jun 27 09:39:38 2001 From: tim.one at home.com (Tim Peters) Date: Wed, 27 Jun 2001 03:39:38 -0400 Subject: [Python-Dev] New Unicode warnings Message-ID: There are 3 functions now where the prototypes in unicodeobject.h don't match the definitions in unicodeobject.c. Like, in .h, extern DL_IMPORT(Py_UNICODE) _PyUnicode_ToTitlecase( register const Py_UNICODE ch /* Unicode character */ ); but in .c: Py_UNICODE _PyUnicode_ToTitlecase(register Py_UNICODE ch) That is, they disagree about const (a silly language idea if ever there was one ). The others (I haven't checked these for the exact reason(s), but assume they're the same deal): _PyUnicode_ToUppercase _PyUnicode_ToLowercase
From Armin.Rigo at ima.unil.ch Wed Jun 27 11:01:18 2001 From: Armin.Rigo at ima.unil.ch (RIGO Armin) Date: Wed, 27 Jun 2001 11:01:18 +0200 (CEST) Subject: [Python-Dev] Python Specializing Compiler In-Reply-To: <3B391D88.305CCB4E@ActiveState.com> Message-ID: On Tue, 26 Jun 2001, Paul Prescod wrote: > Armin Rigo wrote: > > I am considering using GNU Lightning to produce code from the Psyco > > compiler. (...) > > Core Python has no GPLed components. I would hate to have you put in a > bunch of work worthy of inclusion in core Python to see it rejected on > those grounds. Good remark. Anyone else has comments about this ? Psyco would probably not be part of the core Python, but only an extension module; but your objection is nevertheless valid. Any alternatives ? I am considering a more theoretical approach, based on Tunes (http://tunes.org) as mentioned in Psyco's readme file, but this would take a lot more time -- although it might give much more impressive results. Armin.
From neal at metaslash.com Wed Jun 27 13:48:00 2001 From: neal at metaslash.com (Neal Norwitz) Date: Wed, 27 Jun 2001 07:48:00 -0400 Subject: [Python-Dev] ANN: PyChecker version 0.6.1 Message-ID: <3B39C7F0.2CA171C5@metaslash.com> A new version of PyChecker is available for your hacking pleasure. PyChecker is a tool for finding common bugs in python source code. It finds problems that are typically caught by a compiler for less dynamic languages, like C and C++. Comments, criticisms, new ideas, and other feedback is welcome.
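For instance, the new format-string checks flag code along these lines (a made-up fragment, not from any real module; all names are hypothetical):

    v1, v2 = 'a', 'b'
    print '%s %s %s' % (v1, v2)           # three specifiers, two args: arg-count warning
    user, host = 'guido', 'python.org'
    print '%(user) %(host)s' % locals()   # '%(user)' is missing its conversion type
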
Here's the CHANGELOG: * Check format strings: "%s %s %s" % (v1, v2, v3, v4) for arg counts * Warn when format strings do: '%(var) %(var2)' * Fix Local variable (xxx) not used, when have: "%(xxx)s" % locals() * Warn when local variable (xxx) doesn't exist and have: "%(xxx)s" % locals() * Install script in /usr/local/bin to invoke PyChecker * Don't produce unused global warnings when using a module in parameters * Don't produce unused global warnings when using a module in class variables * Add check when using method as an attribute (if self.method and x == y:) * Add check for right # of args to object construction * Add check for right # of args to function calls in other modules * Check for returning a value from __init__ * Fix using from XX import YY ; from XX import ZZ causing re-import warning * Fix UNABLE TO IMPORT errors for files that don't end with a newline * Support for checking consistent return values -- not complete produces too many false positives (off by default, use -r/--returnvalues to enable) PyChecker is available on Source Forge: Web page: http://pychecker.sourceforge.net/ Project page: http://sourceforge.net/projects/pychecker/ Neal -- pychecker at metaslash.com From paulp at ActiveState.com Wed Jun 27 13:53:08 2001 From: paulp at ActiveState.com (Paul Prescod) Date: Wed, 27 Jun 2001 04:53:08 -0700 Subject: [Python-Dev] Python Specializing Compiler References: Message-ID: <3B39C924.E865177D@ActiveState.com> RIGO Armin wrote: > >... > > I am considering a more theoretical approach, based on Tunes > (http://tunes.org) as mentionned in Psyco's readme file, but this would > take a lot more time -- althought it might give much more impressive > results. If you are thinking about incorporating some ideas from Tunes that's one thing. But if you want to use their code I would ask "what code?" I have heard about Tunes for several years now and not seen any visible forward progress. See also: http://tunes.org/Tunes-FAQ-6.html#ss6.2 -- Take a recipe. Leave a recipe. Python Cookbook! http://www.ActiveState.com/pythoncookbook From mark.favas at csiro.au Wed Jun 27 13:48:37 2001 From: mark.favas at csiro.au (Mark Favas) Date: Wed, 27 Jun 2001 19:48:37 +0800 Subject: [Python-Dev] More unicode blues... Message-ID: <3B39C815.E9CDF41B@csiro.au> unicodectype.c now fails to compile, because ch is declared const, and then assigned to. Tim has (apparently) had similar problems, but in his case the compiler just gives a warning, rather than an error.: cc: Error: Objects/unicodectype.c, line 67: In this statement, "ch" has const-qualified type, but occurs in a context that requires a modifiable lvalue. (neednonconst) ch += ctype->title; --------^ cc: Error: Objects/unicodectype.c, line 69: In this statement, "ch" has const-qualified type, but occurs in a context that requires a modifiable lvalue. (neednonconst) ch += ctype->upper; --------^ cc: Error: Objects/unicodectype.c, line 74: In this statement, "ch" has const-qualified type, but occurs in a context that requires a modifiable lvalue. (neednonconst) ch -= 0x10000; --------^ -- Mark Favas - m.favas at per.dem.csiro.au CSIRO, Private Bag No 5, Wembley, Western Australia 6913, AUSTRALIA From mal at lemburg.com Wed Jun 27 14:10:57 2001 From: mal at lemburg.com (M.-A. 
Lemburg) Date: Wed, 27 Jun 2001 14:10:57 +0200 Subject: [Python-Dev] Unicode Maintenance Message-ID: <3B39CD51.406C28F0@lemburg.com> Looking at the recent burst of checkins for the Unicode implementation completely bypassing the standard SF procedure and possible comments I might have on the different approaches, I guess I've been ruled out as maintainer and designer of the Unicode implementation. Well, I guess that's how things go. Was nice working for you guys, but no longer is... I'm tired of having to defend myself against meta-comments about the design, uncontrolled checkins and no true backup about my standing in all this from Guido. Perhaps I am misunderstanding the role of a maintainer and implementation designer, but as it is all respect for the work I've put into all this seems faded. That's the conclusion I draw from recent postings by Martin and Fredrik and their nightly "takeover". Thanks, -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From arigo at ulb.ac.be Wed Jun 27 14:18:43 2001 From: arigo at ulb.ac.be (Armin Rigo) Date: Wed, 27 Jun 2001 14:18:43 +0200 (CEST) Subject: [Python-Dev] Python Specializing Compiler In-Reply-To: <3B39C924.E865177D@ActiveState.com> Message-ID: Hello Paul, On Wed, 27 Jun 2001, Paul Prescod wrote: > If you are thinking about incorporating some ideas from Tunes that's one > thing. But if you want to use their code I would ask "what code?" I have > heard about Tunes for several years now and not seen any visible forward > progress. Yes, I know this. I am myself a (recent) member of the Tunes project, and have made Tunes' goals mine. Armin From guido at digicool.com Wed Jun 27 16:32:23 2001 From: guido at digicool.com (Guido van Rossum) Date: Wed, 27 Jun 2001 10:32:23 -0400 Subject: [Python-Dev] Python Specializing Compiler In-Reply-To: Your message of "Wed, 27 Jun 2001 11:01:18 +0200." References: Message-ID: <200106271432.f5REWOn19377@odiug.digicool.com> > Good remark. Anyone else has comments about this ? Not really, except to emphasize that inclusion of GPL'ed code in core Python is indeed a no-no. --Guido van Rossum (home page: http://www.python.org/~guido/) From fredrik at pythonware.com Wed Jun 27 16:48:02 2001 From: fredrik at pythonware.com (Fredrik Lundh) Date: Wed, 27 Jun 2001 16:48:02 +0200 Subject: [Python-Dev] New Unicode warnings References: Message-ID: <009e01c0ff19$e25b3f70$4ffa42d5@hagrid> tim peters wrote: > There are 3 functions now where the prototypes in unicodeobject.h don't > match the definitions in unicodeobject.c. Like, in .h, > > extern DL_IMPORT(Py_UNICODE) _PyUnicode_ToTitlecase( > register const Py_UNICODE ch /* Unicode character */ > ); > > but in .c: > > Py_UNICODE _PyUnicode_ToTitlecase(register Py_UNICODE ch) what's that "register" doing in a prototype? any reason we cannot just change the signature(s) to Py_UNICODE _PyUnicode_ToTitlecase(Py_UNICODE ch) to make it look more like contemporary C code? 
From fredrik at pythonware.com Wed Jun 27 16:49:31 2001 From: fredrik at pythonware.com (Fredrik Lundh) Date: Wed, 27 Jun 2001 16:49:31 +0200 Subject: [Python-Dev] Unicode fallout: Windows _winreg project broken References: <200106270712.f5R7CBh06458@mira.informatik.hu-berlin.de> Message-ID: <00a101c0ff19$e2a19740$4ffa42d5@hagrid> martin wrote: > > IOW, HAVE_USABLE_WCHAR_T no longer gets #define'd on Windows, and > > that prevents unicodeobject.c from supplying routines _winreg.c > > calls. > > The best thing, IMO, would be if PC/config.h defines everything > available in config.h also. In this case, the proper defines would be > > #define Py_USING_UNICODE > #define HAVE_USABLE_WCHAR_T > #define Py_UNICODE_SIZE 2 > #define PY_UNICODE_TYPE wchar_t > > If that approach is used, the defaulting in Include/unicodeobject.h > could go away. my fault; I missed the HAVE_USABLE_WCHAR_T define when I tried to fix tim's fix. I'll fix it. From guido at digicool.com Wed Jun 27 17:07:47 2001 From: guido at digicool.com (Guido van Rossum) Date: Wed, 27 Jun 2001 11:07:47 -0400 Subject: [Python-Dev] New Unicode warnings In-Reply-To: Your message of "Wed, 27 Jun 2001 16:48:02 +0200." <009e01c0ff19$e25b3f70$4ffa42d5@hagrid> References: <009e01c0ff19$e25b3f70$4ffa42d5@hagrid> Message-ID: <200106271507.f5RF7lq19494@odiug.digicool.com> > tim peters wrote: > > > There are 3 functions now where the prototypes in unicodeobject.h don't > > match the definitions in unicodeobject.c. Like, in .h, > > > > extern DL_IMPORT(Py_UNICODE) _PyUnicode_ToTitlecase( > > register const Py_UNICODE ch /* Unicode character */ > > ); > > > > but in .c: > > > > Py_UNICODE _PyUnicode_ToTitlecase(register Py_UNICODE ch) > > what's that "register" doing in a prototype? Enjoying a day off? > any reason we cannot just change the signature(s) to > > Py_UNICODE _PyUnicode_ToTitlecase(Py_UNICODE ch) > > to make it look more like contemporary C code? > > I cannot see how either register or const are going to make any difference in the prototype given that Py_UNICODE is a scalar type, so please just do it. --Guido van Rossum (home page: http://www.python.org/~guido/) From JamesL at Lugoj.Com Wed Jun 27 17:58:54 2001 From: JamesL at Lugoj.Com (James Logajan) Date: Wed, 27 Jun 2001 08:58:54 -0700 Subject: [Python-Dev] Re: PEP 260: simplify xrange() References: Message-ID: <3B3A02BE.21039365@Lugoj.Com> Tim Peters wrote: > > [James Logajan] > > Design mistakes one has made do tend to weigh on one's soul (speaking > > from more than two decades of programming experience) so I understand > > the primal urge to correct them when one can, and even when one > > shouldn't. > > Is this a case when one shouldn't? That is, is it a specific comment on PEP > 260, or just a general venting here? Just a general bit of silly "" venting. Insert some non-zero fraction in the wink. I tried to insert some obvious absurdities to indicate I was not being very serious. (Yes, I know that one shouldn't try that in mixed company.) From guido at digicool.com Wed Jun 27 18:11:49 2001 From: guido at digicool.com (Guido van Rossum) Date: Wed, 27 Jun 2001 12:11:49 -0400 Subject: [Python-Dev] Unicode Maintenance In-Reply-To: Your message of "Wed, 27 Jun 2001 14:10:57 +0200." 
<3B39CD51.406C28F0@lemburg.com> References: <3B39CD51.406C28F0@lemburg.com> Message-ID: <200106271611.f5RGBn819631@odiug.digicool.com> > Looking at the recent burst of checkins for the Unicode implementation > completely bypassing the standard SF procedure and possible comments > I might have on the different approaches, I guess I've been ruled out > as maintainer and designer of the Unicode implementation. > > Well, I guess that's how things go. Was nice working for you guys, > but no longer is... I'm tired of having to defend myself against > meta-comments about the design, uncontrolled checkins and no true > backup about my standing in all this from Guido. > > Perhaps I am misunderstanding the role of a maintainer and > implementation designer, but as it is all respect for the work I've > put into all this seems faded. That's the conclusion I draw from recent > postings by Martin and Fredrik and their nightly "takeover". > > Thanks, > -- > Marc-Andre Lemburg [For those of us to whom Marc-Andre's complaint comes as a total surprise: there was a thread on i18n-sig about whether we should support Unicode surrogates, followed by a conclusion to skip surrogates and jump directly to optional support for UCS-4, followed by some checkins that enabled a configuration choice between UCS-2 and UCS-4, and code to make it work. As a side effect, surrogate support in the UCS-2 version actually improved slightly.] Now, now, Marc-Andre. The only comments I recall from you on my "surrogates: just say no" post seemed favorable, except that you proposed to to all the way and make UCS-4 mandatory. I explained why I didn't want to go that far, and why I didn't believe your arguments against giving users a choice. I didn't hear back from you then, and I didn't think you could have much of a problem with my position. Our process requires the use of the SF patch manager only for controversial changes. Based on your feedback, I didn't think there was anything controversial about the changes that Fredrik and Martin have made! (If there was, IMO it was temporarily breaking the Windows build and the test suite -- but that's all fixed now.) I don't understand where you get the idea that we lost respect for your work! In fact, the fact that it was so easy to make the changes suggested to me that the original design was well suited to this particular change (as opposed to the surrugate support proposals, which all sounded like they would require a *lot* of changes). I don't think that we have very strict roles in this community anyway. (My role as BDFL excluded -- that's why I get to write this response. :-) I'd say that Fredrik owns SRE, because he has asserted that ownership at various times: he's undone changes by others that broke the 1.5.2 support, for example. But the Unicode support in Python isn't owned by one person: many folks have contributed to that, including Fredrik, who designed and wrote the original Unicode string object implementation. If you have specific comments about the changes made, please be specific. If you feel slighted by meta-comments, please also be specific. I don't think I've said anything derogatory about you or your design. Paul Prescod offered to write a PEP on this issue. My cynical half believes that we'll never hear from him again, but my optimistic half hopes that he'll actually write one, so that we'll be able to discuss the various issues for the users with the users. I encourage you to co-author the PEP, since you have a lot of background knowledge about the issues. 
BTW, I think that Misc/unicode.txt should be converted to a PEP, for the historic record. It was very much a PEP before the PEP process was invented. Barry, how much work would this be? No editing needed, just formatting, and assignment of a PEP number (the lower the better). --Guido van Rossum (home page: http://www.python.org/~guido/) From barry at digicool.com Wed Jun 27 18:24:30 2001 From: barry at digicool.com (Barry A. Warsaw) Date: Wed, 27 Jun 2001 12:24:30 -0400 Subject: [I18n-sig] Re: [Python-Dev] Unicode Maintenance References: <3B39CD51.406C28F0@lemburg.com> <200106271611.f5RGBn819631@odiug.digicool.com> Message-ID: <15162.2238.720508.508081@anthem.wooz.org> >>>>> "GvR" == Guido van Rossum writes: GvR> BTW, I think that Misc/unicode.txt should be converted to a GvR> PEP, for the historic record. It was very much a PEP before GvR> the PEP process was invented. Barry, how much work would GvR> this be? No editing needed, just formatting, and assignment GvR> of a PEP number (the lower the better). Not much work at all, so I'll do this (and replace Misc/unicode.txt with a pointer to the PEP). Let's go with PEP 7, but stick it under the "Other Informational PEPs" category. -Barry From guido at digicool.com Wed Jun 27 18:36:05 2001 From: guido at digicool.com (Guido van Rossum) Date: Wed, 27 Jun 2001 12:36:05 -0400 Subject: [I18n-sig] Re: [Python-Dev] Unicode Maintenance In-Reply-To: Your message of "Wed, 27 Jun 2001 12:24:30 EDT." <15162.2238.720508.508081@anthem.wooz.org> References: <3B39CD51.406C28F0@lemburg.com> <200106271611.f5RGBn819631@odiug.digicool.com> <15162.2238.720508.508081@anthem.wooz.org> Message-ID: <200106271636.f5RGa5719660@odiug.digicool.com> > GvR> BTW, I think that Misc/unicode.txt should be converted to a > GvR> PEP, for the historic record. It was very much a PEP before > GvR> the PEP process was invented. Barry, how much work would > GvR> this be? No editing needed, just formatting, and assignment > GvR> of a PEP number (the lower the better). > > Not much work at all, so I'll do this (and replace Misc/unicode.txt > with a pointer to the PEP). Let's go with PEP 7, but stick it under > the "Other Informational PEPs" category. > > -Barry Rather than informational, how about "Standard Track - Accepted (or Final)" ? That really matches the history best. I'd propose PEP number 100 -- the below-100 series is more for meta-PEPs. --Guido van Rossum (home page: http://www.python.org/~guido/) From barry at digicool.com Wed Jun 27 19:05:35 2001 From: barry at digicool.com (Barry A. Warsaw) Date: Wed, 27 Jun 2001 13:05:35 -0400 Subject: [I18n-sig] Re: [Python-Dev] Unicode Maintenance References: <3B39CD51.406C28F0@lemburg.com> <200106271611.f5RGBn819631@odiug.digicool.com> <15162.2238.720508.508081@anthem.wooz.org> <200106271636.f5RGa5719660@odiug.digicool.com> Message-ID: <15162.4703.741647.850696@anthem.wooz.org> >>>>> "GvR" == Guido van Rossum writes: GvR> Rather than informational, how about "Standard Track - GvR> Accepted (or Final)" ? That really matches the history best. GvR> I'd propose PEP number 100 -- the below-100 series is more GvR> for meta-PEPs. Fine with me. -Barry From fdrake at acm.org Wed Jun 27 21:45:05 2001 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Wed, 27 Jun 2001 15:45:05 -0400 (EDT) Subject: [Python-Dev] New profiling interface Message-ID: <15162.14273.490573.156770@cj42289-a.reston1.va.home.com> The new core interface I checked in allows profilers and tracers (debuggers, coverage tools) to be written in C. 
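For comparison, the Python-level hook that the C interface presumably mirrors -- a toy profile function, with illustrative names:

    import sys

    def profiler(frame, event, arg):
        # called on 'call', 'return' and 'exception' events, the same
        # kinds of events a C-level profiler or tracer gets to see
        print event, frame.f_code.co_name

    sys.setprofile(profiler)
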
I still need to write documentation for it; that shouldn't be too far off though. If anyone would like to have this available for Python 2.1.x, I have a version that I developed on the release20-maint branch. It can't be added to that branch since it's pretty clearly a new feature, but the patch is available at: http://starship.python.net/crew/fdrake/patches/py21-profiling.patch Enjoy! -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations
From mark.favas at csiro.au Wed Jun 27 23:45:17 2001 From: mark.favas at csiro.au (Mark Favas) Date: Thu, 28 Jun 2001 05:45:17 +0800 Subject: [Python-Dev] unicode, "const"s and lvalues Message-ID: <3B3A53ED.A8EEE265@csiro.au> Unreasonable as it may seem, my compiler really expects that entities declared as const's not be used in contexts where a modifiable lvalue is required. It gets all huffy, and refuses to continue compiling, even if I speak nicely (in unicode) to it. I'll file a bug report. On the code, not the compiler . cc -c -O -Olimit 1500 -Dss_family=__ss_family -Dss_len=__ss_len -I. -I./Include -DHAVE_CONFIG_H -o Objects/unicodectype.o Objects/unicodectype.c cc: Error: Objects/unicodectype.c, line 67: In this statement, "ch" has const-qualified type, but occurs in a context that requires a modifiable lvalue. (neednonconst) ch += ctype->title; --------^ cc: Error: Objects/unicodectype.c, line 69: In this statement, "ch" has const-qualified type, but occurs in a context that requires a modifiable lvalue. (neednonconst) ch += ctype->upper; --------^ cc: Error: Objects/unicodectype.c, line 74: In this statement, "ch" has const-qualified type, but occurs in a context that requires a modifiable lvalue. (neednonconst) ch -= 0x10000; --------^ cc: Error: Objects/unicodectype.c, line 362: In this statement, "ch" has const-qualified type, but occurs in a context that requires a modifiable lvalue. (neednonconst) ch += ctype->upper; ----^ cc: Error: Objects/unicodectype.c, line 366: In this statement, "ch" has const-qualified type, but occurs in a context that requires a modifiable lvalue. (neednonconst) ch -= 0x10000; --------^ cc: Error: Objects/unicodectype.c, line 378: In this statement, "ch" has const-qualified type, but occurs in a context that requires a modifiable lvalue. (neednonconst) ch += ctype->lower; ----^ cc: Error: Objects/unicodectype.c, line 382: In this statement, "ch" has const-qualified type, but occurs in a context that requires a modifiable lvalue. (neednonconst) ch -= 0x10000; --------^ make: *** [Objects/unicodectype.o] Error 1 -- Mark Favas - m.favas at per.dem.csiro.au CSIRO, Private Bag No 5, Wembley, Western Australia 6913, AUSTRALIA
From guido at digicool.com Wed Jun 27 23:57:16 2001 From: guido at digicool.com (Guido van Rossum) Date: Wed, 27 Jun 2001 17:57:16 -0400 Subject: [Python-Dev] unicode, "const"s and lvalues In-Reply-To: Your message of "Thu, 28 Jun 2001 05:45:17 +0800." <3B3A53ED.A8EEE265@csiro.au> References: <3B3A53ED.A8EEE265@csiro.au> Message-ID: <200106272157.f5RLvGo20101@odiug.digicool.com> > Unreasonable as it may seem, my compiler really expects that entities > declared as const's not be used in contexts where a modifiable lvalue is > required. It gets all huffy, and refuses to continue compiling, even if > I speak nicely (in unicode) to it. I'll file a bug report. On the code, > not the compiler . VC++ also warns about this. I think the declaration of the Character Type APIs in unicodeobject.h really shouldn't include either register or const. Then their implementations should also lose the 'const'.
--Guido van Rossum (home page: http://www.python.org/~guido/) From tim.one at home.com Wed Jun 27 23:58:34 2001 From: tim.one at home.com (Tim Peters) Date: Wed, 27 Jun 2001 17:58:34 -0400 Subject: [Python-Dev] unicode, "const"s and lvalues In-Reply-To: <3B3A53ED.A8EEE265@csiro.au> Message-ID: [Mark Favas] > Unreasonable as it may seem, my compiler really expects that entities > declared as const's not be used in contexts where a modifiable lvalue is > required. It gets all huffy, and refuses to continue compiling, even if > I speak nicely (in unicode) to it. I'll file a bug report. No real need, this was already brought up about 13 hours ago, although maybe that was only on the i18n-sig. I was left with the vague impression that Fredrik intended to fix it. If it's not fixed by tomorrow, you can make me feel guilty enough to fix it (I first reported it, so I guess it's my problem ). could've-been-yours!-ly y'rs - tim From fredrik at pythonware.com Thu Jun 28 00:42:14 2001 From: fredrik at pythonware.com (Fredrik Lundh) Date: Thu, 28 Jun 2001 00:42:14 +0200 Subject: [Python-Dev] unicode, "const"s and lvalues References: <3B3A53ED.A8EEE265@csiro.au> Message-ID: <00b701c0ff5a$6ab8f660$4ffa42d5@hagrid> mark wrote: > Unreasonable as it may seem, my compiler really expects that entities > declared as const's not be used in contexts where a modifiable lvalue is > required. it's fixed now, I think. (btw, unreasonable as it may seem, your mail server refuses to accept mail sent to your reply address, even if I speak nicely to it ;-) Cheers /F From fdrake at acm.org Thu Jun 28 04:44:54 2001 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Wed, 27 Jun 2001 22:44:54 -0400 (EDT) Subject: [Python-Dev] NIS on Linux, others? Message-ID: <15162.39462.75553.378146@cj42289-a.reston1.va.home.com> Is anyone here using NIS (Sun's old "Yellow Pages" service)? There's a bug for this on Linux that's been assigned to me for some time, but I don't have access to a network using NIS. Can anyone either confirm the bug or the fix? Or at least confirm that the suggested fix doesn't break the nis module on some other platform? (Testing this on a Sun SPARC box would be really nice!) I'd really appreciate some help on this one. The bug report is: http://sourceforge.net/tracker/index.php?func=detail&aid=233084&group_id=5470&atid=105470 Thanks! -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From thomas at xs4all.net Thu Jun 28 10:13:09 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Thu, 28 Jun 2001 10:13:09 +0200 Subject: [Python-Dev] NIS on Linux, others? In-Reply-To: <15162.39462.75553.378146@cj42289-a.reston1.va.home.com> References: <15162.39462.75553.378146@cj42289-a.reston1.va.home.com> Message-ID: <20010628101309.X8098@xs4all.nl> On Wed, Jun 27, 2001 at 10:44:54PM -0400, Fred L. Drake, Jr. wrote: > Is anyone here using NIS (Sun's old "Yellow Pages" service)? > There's a bug for this on Linux that's been assigned to me for some > time, but I don't have access to a network using NIS. Can anyone > either confirm the bug or the fix? Or at least confirm that the > suggested fix doesn't break the nis module on some other platform? > (Testing this on a Sun SPARC box would be really nice!) > I'd really appreciate some help on this one. The bug report is: If noone else pops up, I'll setup a small NIS network at home to test it when my new computer arrives (a week or two.) 
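In the meantime, a quick smoke test along these lines should exercise the reported codepath (the map and key names are site-specific guesses):

    import nis
    print nis.maps()                    # list the maps served for this domain
    print nis.match('guido', 'passwd')  # look up one key in one map
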
We use NIS a lot at work, but not on Linux machines (the 16-bit uid limitation prevented us from using Linux for user-accessible machines for a long time.) -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From mal at lemburg.com Thu Jun 28 11:04:07 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Thu, 28 Jun 2001 11:04:07 +0200 Subject: [Python-Dev] Unicode Maintenance References: <3B39CD51.406C28F0@lemburg.com> <200106271611.f5RGBn819631@odiug.digicool.com> Message-ID: <3B3AF307.6496AFB4@lemburg.com> Guido van Rossum wrote: > > > Looking at the recent burst of checkins for the Unicode implementation > > completely bypassing the standard SF procedure and possible comments > > I might have on the different approaches, I guess I've been ruled out > > as maintainer and designer of the Unicode implementation. > > > > Well, I guess that's how things go. Was nice working for you guys, > > but no longer is... I'm tired of having to defend myself against > > meta-comments about the design, uncontrolled checkins and no true > > backup about my standing in all this from Guido. > > > > Perhaps I am misunderstanding the role of a maintainer and > > implementation designer, but as it is all respect for the work I've > > put into all this seems faded. That's the conclusion I draw from recent > > postings by Martin and Fredrik and their nightly "takeover". > > > > Thanks, > > -- > > Marc-Andre Lemburg > > [For those of us to whom Marc-Andre's complaint comes as a total > surprise: there was a thread on i18n-sig about whether we should > support Unicode surrogates, followed by a conclusion to skip > surrogates and jump directly to optional support for UCS-4, followed > by some checkins that enabled a configuration choice between UCS-2 and > UCS-4, and code to make it work. As a side effect, surrogate support > in the UCS-2 version actually improved slightly.] > > Now, now, Marc-Andre. > > The only comments I recall from you on my "surrogates: just say no" > post seemed favorable, except that you proposed to to all the way and > make UCS-4 mandatory. I explained why I didn't want to go that far, > and why I didn't believe your arguments against giving users a choice. > I didn't hear back from you then, and I didn't think you could have > much of a problem with my position. > > Our process requires the use of the SF patch manager only for > controversial changes. Based on your feedback, I didn't think there > was anything controversial about the changes that Fredrik and Martin > have made! (If there was, IMO it was temporarily breaking the Windows > build and the test suite -- but that's all fixed now.) > > I don't understand where you get the idea that we lost respect for > your work! In fact, the fact that it was so easy to make the changes > suggested to me that the original design was well suited to this > particular change (as opposed to the surrugate support proposals, > which all sounded like they would require a *lot* of changes). > > I don't think that we have very strict roles in this community anyway. > (My role as BDFL excluded -- that's why I get to write this > response. :-) I'd say that Fredrik owns SRE, because he has asserted > that ownership at various times: he's undone changes by others that > broke the 1.5.2 support, for example. > > But the Unicode support in Python isn't owned by one person: many > folks have contributed to that, including Fredrik, who designed and > wrote the original Unicode string object implementation. 
> > If you have specific comments about the changes made, please be > specific. If you feel slighted by meta-comments, please also be > specific. I don't think I've said anything derogatory about you or > your design. You didn't get my point. I feel responsible for the Unicode implementation design and would like to see it become a continued success. In that sense and taking into account that I am the maintainer of all this stuff, I think it is very reasonable to ask me before making any significant changes to the implementation and also to respect any comments I put forward. Currently, I have to watch the checkins list very closely to find out who changed what in the implementation and then to take actions only after the fact. Since I'm not supporting Unicode as my full-time job this is simply impossible. We have the SF manager and there is really no need to rush anything around here. If I am offline or too busy with other things for a day or two, then I want to see patches on SF and not find new versions of the implementation already checked in. This has worked just fine during the last year, so I can only explain the latest actions in this direction with an urge to bypass my comments and any discussion this might cause. Needless to say, quality control is not possible anymore. Conclusion: I am not going to continue this work if this does not change. Another problem for me is the continued hostility I feel on i18n against parts of the design and some of my decisions. I am not talking about your feedback and the feedback from many other people on the list which was excellent and of a high standard. But reading the postings of the last few months you will find notices of what I am referring to here (no, I don't want to be specific). If people don't respect my comments or decisions, then how can I defend the design and how can I stop endless discussions which simply don't lead anywhere? So either I am missing something or there is a need for a clear statement from you about my status in all this. If I don't have the right to comment on proposals and patches, possibly even rejecting them, then I simply don't see any ground for keeping the implementation in a state which I can maintain. And last but not least: The fun-factor has faded which was the main motor driving me into working on Unicode in the first place. Nothing much you can do about this, though :-/ > Paul Prescod offered to write a PEP on this issue. My cynical half > believes that we'll never hear from him again, but my optimistic half > hopes that he'll actually write one, so that we'll be able to discuss > the various issues for the users with the users. I encourage you to > co-author the PEP, since you have a lot of background knowledge about > the issues. I guess your optimistic half won :-) I think Paul already did all the work, so I'll simply comment on what he wrote. > BTW, I think that Misc/unicode.txt should be converted to a PEP, for > the historic record. It was very much a PEP before the PEP process > was invented. Barry, how much work would this be? No editing needed, > just formatting, and assignment of a PEP number (the lower the better). Thanks for converting the text to PEP format, Barry.
Thanks for reading this far, -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From guido at digicool.com Thu Jun 28 14:25:14 2001 From: guido at digicool.com (Guido van Rossum) Date: Thu, 28 Jun 2001 08:25:14 -0400 Subject: [Python-Dev] Unicode Maintenance In-Reply-To: Your message of "Thu, 28 Jun 2001 11:04:07 +0200." <3B3AF307.6496AFB4@lemburg.com> References: <3B39CD51.406C28F0@lemburg.com> <200106271611.f5RGBn819631@odiug.digicool.com> <3B3AF307.6496AFB4@lemburg.com> Message-ID: <200106281225.f5SCPIr20874@odiug.digicool.com> Hi Marc-Andre, I'm dropping the i18n-sig from the distribution list. I hear you: > You didn't get my point. I feel responsible for the Unicode > implementation design and would like to see it become a continued > success. I'm sure we all share this goal! > In that sense and taking into account that I am the > maintainer of all this stuff, I think it is very reasonable to > ask me before making any significant changes to the implementation > and also to respect any comments I put forward. I understand you feel that we've rushed this in without waiting for your comments. Given how close your implementation was, I still feel that the changes weren't that significant, but I understand that you get nervous. If Christian were to check in his speed hack changes to the guts of ceval.c I would be nervous too! (Heck, I got nervous when Eric checked in his library-wide string method changes without asking.) Next time I'll try to be more sensitive to situations that require your review before going forward. > Currently, I have to watch the checkins list very closely > to find out who changed what in the implementation and then to > take actions only after the fact. Since I'm not supporting Unicode > as my full-time job this is simply impossible. We have the SF manager > and there is really no need to rush anything around here. Hm, apart from the fact that you ought to be left in charge, I think that in this case the live checkins were a big win over the usual SF process. At least two people were making changes, sometimes to each other's code, and many others on at least three continents were checking out the changes on many different platforms and immediately reporting problems. We would definitely not have a patch as solid as the code that's now checked in, after two days of using SF! (We could've used a branch, but I've found that getting people to actually check out the branch is not easy.) So I think that the net result was favorable. Sometimes you just have to let people work in the spur of the moment to get the results of their best thinking, otherwise they lose interest or their train of thought. > If I am offline or too busy with other things for a day or two, > then I want to see patches on SF and not find new versions of > the implementation already checked in. That's still the general rule, but in our enthusiasm (and mine was definitely part of this!) we didn't want to wait. Also, I have to admit that I mistook your silence for consent -- I didn't think the main proposed changes (making the size of Py_UNICODE a config choice) were controversial at all, so I didn't realize you would have a problem with it. > This has worked just fine during the last year, so I can only explain > the latest actions in this direction with an urge to bypass my comments > and any discussion this might cause.
I think you're projecting your own stuff here. I honestly didn't think there was much disagreement on your part and thought we were doing you a favor by implementing the consensus. IMO, Martin and Fredrik are familiar enough with both the code and the issues to do a good job. > Needless to say, > quality control is not possible anymore. Unclear. Lots of other people looked over the changes in your absence. And CVS makes code review after it's checked in easy enough. (Hey, in many other open source projects that's the normal procedure once the rough characteristics of a feature have been agreed upon: check in first and review later!) > Conclusion: > I am not going to continue this work if this does not change. That would be sad, and I hope you will stay with us. We certainly don't plan to ignore your comments! > Another problem for me is the continued hostility I feel on i18n > against parts of the design and some of my decisions. I am > not talking about your feedback and the feedback from many other > people on the list which was excellent and of a high standard. > But reading the postings of the last few months you will > find notices of what I am referring to here (no, I don't want > to be specific). I don't know what to say about this, and obviously nobody has the time to go back and read the archives. I'm sure it's not you as a person that was attacked. If the design isn't perfect -- and hey, since Python is the 80 percent language, few things in it are quite perfect! -- then (positive) criticism is an attempt to help, to move it closer to perfection. If people have at times said "the Unicode support sucks", well, that may hurt. You can't always stay friends with everybody. I get flames occasionally for features in Python that folks don't like. I get used to them, and it doesn't affect my confidence any more. Be the same! But sometimes, after saying "it sucks", people make specific suggestions for improvements, and it's important to be open for those even from sources that use offending language. (Within reason, of course. I don't ask you to listen to somebody who is persistently hostile to you as a person.) > If people don't respect my comments or decisions, then how can > I defend the design and how can I stop endless discussions which > simply don't lead anywhere? So either I am missing something > or there is a need for a clear statement from you about > my status in all this. Do you really *want* to be the Unicode BDFL? Being something's BDFL is a full-time job, and you've indicated you're too busy. (Or is that temporary?) I see you as the original coder, which means that you know that section of the code better than anyone, and whenever there's a question that others can't answer about its design, implementation, or restrictions, I refer to you. But given that you've said you wouldn't be able to work much on it, I welcome contributions by others as long as they seem knowledgeable. > If I don't have the right to comment on proposals and patches, > possibly even rejecting them, then I simply don't see any > ground for keeping the implementation in a state which I can > maintain. Nobody said you couldn't comment, and you know that. When it comes to rejecting or accepting, I feel that I am still the final arbiter, even for Unicode, until I get hit by a bus.
Since I don't always understand the implementation or the issues, I'll of course defer to you in cases where I think I can't make the decision, but I do reserve the right to be convinced by others to override your judgement, occasionally, if there's a good reason. And when you're not responsive, I may try to channel you. (I'll try to be more explicit about that.) > And last but not least: The fun-factor has faded which was > the main motor driving me into working on Unicode in the first > place. Nothing much you can do about this, though :-/ Yes, that happens to all of us at times. The fun factor goes up and down, and sometimes we must look for fun elsewhere for a while. Then the fun may come back where it appeared lost. Go on vacation, read a book, tackle a new project in a totally different area! Then come back and see if you can find some fun in the old stuff again. > > Paul Prescod offered to write a PEP on this issue. My cynical half > > believes that we'll never hear from him again, but my optimistic half > > hopes that he'll actually write one, so that we'll be able to discuss > > the various issues for the users with the users. I encourage you to > > co-author the PEP, since you have a lot of background knowledge about > > the issues. > > I guess your optimistic half won :-) I think Paul already did all the > work, so I'll simply comment on what he wrote. Your suggestions were very valuable. My opinion of Paul also went up a notch! > > BTW, I think that Misc/unicode.txt should be converted to a PEP, for > > the historic record. It was very much a PEP before the PEP process > > was invented. Barry, how much work would this be? No editing needed, > > just formatting, and assignment of a PEP number (the lower the better). > > Thanks for converting the text to PEP format, Barry. > > Thanks for reading this far, You're welcome, and likewise. Just one more thing, Marc-Andre. Please know that I respect your work very much even if we don't always agree. We would get by without you, but Python would be hurt if you turned your back on us. --Guido van Rossum (home page: http://www.python.org/~guido/) From arigo at ulb.ac.be Thu Jun 28 15:04:06 2001 From: arigo at ulb.ac.be (Armin Rigo) Date: Thu, 28 Jun 2001 15:04:06 +0200 (CEST) Subject: [Python-Dev] Python Specializing Compiler In-Reply-To: <3B393E92.B0719A7A@ulb.ac.be> Message-ID: On Tue, 26 Jun 2001, Armin Rigo wrote: > I am considering using GNU Lightning to produce code from the Psyco > compiler. I just found "vcode" (http://www.pdos.lcs.mit.edu/~engler/pldi96-abstract.html), which seems very interesting for portable JIT code generation. I am considering using it for Psyco. Does anyone have experience with vcode? Or any other comments? Armin. From gball at cfa.harvard.edu Thu Jun 28 17:26:36 2001 From: gball at cfa.harvard.edu (Greg Ball) Date: Thu, 28 Jun 2001 11:26:36 -0400 (EDT) Subject: [Python-Dev] NIS on Linux, others? Message-ID: Short version: I can confirm that bug under Linux, but the patch breaks the nis module on Solaris. Linux machine is: Linux malhar 2.2.16-3smp #1 SMP Mon Jun 19 17:37:04 EDT 2000 i686 unknown with a Python version from recent CVS. I see the reported bug and the suggested patch does fix the problem. Sparc box looks like this: SunOS cfa0 5.8 Generic_108528-06 sun4u sparc SUNW,Ultra-Enterprise using the Python 2.0 source tree. The nis module works out of the box, but applying the suggested patch breaks it: 'nis.error: No such key in map'.
--Greg Ball From gregor at hoffleit.de Thu Jun 28 21:56:35 2001 From: gregor at hoffleit.de (Gregor Hoffleit) Date: Thu, 28 Jun 2001 21:56:35 +0200 Subject: [Python-Dev] MAGIC after 2001 ? Message-ID: <20010628215635.A5621@53b.hoffleit.de> Correct me, but AFAICS there are only 186 days left until Python's MAGIC scheme overflows:

/* XXX Perhaps the magic number should be frozen and a version field
   added to the .pyc file header? */
/* New way to come up with the magic number: (YEAR-1995), MONTH, DAY */
#define MAGIC (60202 | ((long)'\r'<<16) | ((long)'\n'<<24))

I couldn't find this problem in the SF bug tracking system. Should I submit a new bug entry? Gregor From jack at oratrix.nl Thu Jun 28 23:03:47 2001 From: jack at oratrix.nl (Jack Jansen) Date: Thu, 28 Jun 2001 23:03:47 +0200 Subject: [Python-Dev] Passing silly values to time.strftime Message-ID: <20010628210352.33157120260@oratrix.oratrix.nl> Just noted (that's Just-the-person, not me-just-noting:-) that on the Mac time.strftime() can blow up with an access violation if you pass silly values to it (such as 9 zeroes). Does anyone know enough of the ANSI standard to tell me how strftime should behave with out-of-range values? I.e. should I report this as a bug to MetroWerks or should we rig up time.strftime() to check that all the values are in range? -- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | ++++ see http://www.xs4all.nl/~tank/ ++++ From jack at oratrix.nl Thu Jun 28 23:12:45 2001 From: jack at oratrix.nl (Jack Jansen) Date: Thu, 28 Jun 2001 23:12:45 +0200 Subject: [Python-Dev] Passing silly values to time.strftime In-Reply-To: Message by Jack Jansen , Thu, 28 Jun 2001 23:03:47 +0200 , <20010628210352.33157120260@oratrix.oratrix.nl> Message-ID: <20010628211250.4A6BC120260@oratrix.oratrix.nl> Recently, Jack Jansen said: > Just noted (that's Just-the-person, not me-just-noting:-) that on the > Mac time.strftime() can blow up with an access violation if you pass > silly values to it (such as 9 zeroes). Following up to myself, after I just noticed (just-me-noticing, not Just-the-person this time) that all zeros is a legal C value: gettmarg() converts this all-zeroes tuple to (0, 0, 0, 0, -1, 100, 0, -1, 0) Fine with me, apparently Python wants to have human-understandable (1-based) month numbers and yearday numbers, but then I think it really should also check that the values are in-range. What do others think? -- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm
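[A rough sketch, in Python, of the kind of range check being discussed here -- illustrative only; the bounds below are assumptions read off the time-tuple layout, not what timemodule.c actually does:]

def check_time_tuple(t):
    # (year, month, day, hour, minute, second, weekday, yearday, isdst)
    bounds = [(1900, 9999), (1, 12), (1, 31), (0, 23), (0, 59),
              (0, 61), (0, 6), (1, 366), (-1, 1)]
    for value, (low, high) in zip(t, bounds):
        if not low <= value <= high:
            raise ValueError, "time tuple field out of range"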
From Jason.Tishler at dothill.com Thu Jun 28 23:17:15 2001 From: Jason.Tishler at dothill.com (Jason Tishler) Date: Thu, 28 Jun 2001 17:17:15 -0400 Subject: [Python-Dev] Threaded Cygwin Python Import Problem Message-ID: <20010628171715.P488@dothill.com> Thanks to Rob Collins (implementer) and Greg Smith (profiler), Cygwin now provides enough pthreads support so that Cygwin Python builds OOTB *and* functions reasonably well even with threads enabled. Unfortunately, there are still a few issues that need to be resolved. The one that I would like to address in this posting prevents a threaded Cygwin Python from building the standard extension modules (without some kind of intervention). :,( Specifically, the build would frequently hang during the Distutils part when Cygwin Python is attempting to execvp a gcc process. See the first attachment, test.py, for a minimal Python script that exhibits the hang. See the second attachment, test.c, for a rewrite of test.py in C. Since test.c did not hang, I was able to conclude that this was not just a straight Cygwin problem. Further tracing uncovered that the hang occurs in _execvpe() (in os.py), when the child tries to import tempfile. If I apply the third attachment, os.py.patch, then the hang is avoided. Hence, it appears that importing a module (or specifically the tempfile module) in a threaded Cygwin Python child causes a hang. I saw the following comment in _execvpe():

    # Process handling (fork, wait) under BeOS (up to 5.0)
    # doesn't interoperate reliably with the thread interlocking
    # that happens during an import. The actual error we need
    # is the same on BeOS for posix.open() et al., ENOENT.

The above makes me think that possibly Cygwin is having a similar problem. Can anyone offer suggestions on how to further debug this problem? Thanks, Jason -- Jason Tishler Director, Software Engineering Phone: 732.264.8770 x235 Dot Hill Systems Corp. Fax: 732.264.8798 82 Bethany Road, Suite 7 Email: Jason.Tishler at dothill.com Hazlet, NJ 07730 USA WWW: http://www.dothill.com
-------------- next part --------------
import os

cmd = ['ls', '-l']
pid = os.fork()
if pid == 0:
    print 'child execvp-ing'
    os.execvp(cmd[0], cmd)
else:
    (pid, status) = os.waitpid(pid, 0)
    print 'status =', status
    print 'parent done'
-------------- next part --------------
#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

char* const cmd[] = {"ls", "-l", 0};

int main()
{
    int status;
    pid_t pid = fork();
    if (pid == 0) {
        printf("child execvp-ing\n");
        execvp(cmd[0], cmd);
    }
    else {
        waitpid(pid, &status, 0);
        printf("status = %d\n", status);
        printf("parent done\n");
    }
    return 0;
}
-------------- next part --------------
--- os.py.orig	Thu Jun 28 16:14:28 2001
+++ os.py	Thu Jun 28 16:30:12 2001
@@ -329,8 +329,9 @@ def _execvpe(file, args, env=None):
         try: unlink('/_#.# ## #.#')
         except error, _notfound: pass
     else:
-        import tempfile
-        t = tempfile.mktemp()
+        #import tempfile
+        #t = tempfile.mktemp()
+        t = '/mnt/c/TEMP/@279.3'
         # Exec a file that is guaranteed not to exist
         try: execv(t, ('blah',))
         except error, _notfound: pass
From tim at digicool.com Thu Jun 28 23:24:17 2001 From: tim at digicool.com (Tim Peters) Date: Thu, 28 Jun 2001 17:24:17 -0400 Subject: [Python-Dev] MAGIC after 2001 ? In-Reply-To: <20010628215635.A5621@53b.hoffleit.de> Message-ID: [Gregor Hoffleit] > Correct me, Can't: you're correct. > but AFAICS there are only 186 days left until Python's MAGIC scheme > overflows:
>
> /* XXX Perhaps the magic number should be frozen and a version field
>    added to the .pyc file header? */
> /* New way to come up with the magic number: (YEAR-1995), MONTH, DAY */
> #define MAGIC (60202 | ((long)'\r'<<16) | ((long)'\n'<<24))
>
> I couldn't find this problem in the SF bug tracking system. Should I > submit a new bug entry? Somebody should! It's a known problem, but the last crusade to redefine it ended up with 85% of a spec but no worker bees. If that continues, note that it has no effect on whether existing Python releases will continue to run, it just means we can't release new versions -- but now that the licensing issue is settled, I think we'll just close down the project instead . fun-while-it-lasted-ly y'rs - tim
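[A quick check of Gregor's arithmetic, sketched in Python -- the packing scheme is the one described in the comment quoted above; everything else is plain arithmetic:]

def magic(year, month, day):
    # (YEAR-1995), MONTH, DAY packed as a 5-digit decimal number that
    # must fit in the low 16 bits of the magic word
    date_part = (year - 1995) * 10000 + month * 100 + day
    return date_part | (ord('\r') << 16) | (ord('\n') << 24)

# 2001-12-31 gives 61231, which still fits in 16 bits (max 65535)...
print magic(2001, 12, 31) & 0xffff
# ...but 2002-01-01 gives 70101 > 65535, clobbering the '\r' byte --
# and 2001-12-31 is exactly 186 days after the date of Gregor's posting.
print (2002 - 1995) * 10000 + 1 * 100 + 1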
From paulp at ActiveState.com Fri Jun 29 04:59:45 2001 From: paulp at ActiveState.com (Paul Prescod) Date: Thu, 28 Jun 2001 19:59:45 -0700 Subject: [Python-Dev] [Fwd: PEP: Support for "wide" Unicode characters] Message-ID: <3B3BEF21.63411C4C@ActiveState.com> Slow python-dev day...consider this exciting new proposal to deal with important new characters like the Japanese dentistry symbols and ecological symbols (but not Klingon)

-------- Original Message --------
Subject: PEP: Support for "wide" Unicode characters
Date: Thu, 28 Jun 2001 15:33:00 -0700
From: Paul Prescod
Organization: ActiveState
To: "python-list at python.org"

PEP: 261
Title: Support for "wide" Unicode characters
Version: $Revision: 1.3 $
Author: paulp at activestate.com (Paul Prescod)
Status: Draft
Type: Standards Track
Created: 27-Jun-2001
Python-Version: 2.2
Post-History: 27-Jun-2001, 28-Jun-2001

Abstract

    Python 2.1 unicode characters can have ordinals only up to
    2**16 - 1. These characters are known as Basic Multilingual Plane
    characters. There are now characters in Unicode that live on other
    "planes". The largest addressable character in Unicode has the
    ordinal 17 * 2**16 - 1 (0x10ffff). For readability, we will call
    this TOPCHAR and call characters in this range "wide characters".

Glossary

    Character
        Used by itself, means the addressable units of a Python
        Unicode string.

    Code point
        If you imagine Unicode as a mapping from integers to
        characters, each integer represents a code point. Some are
        really used for characters. Some will someday be used for
        characters. Some are guaranteed never to be used for
        characters.

    Unicode character
        A code point defined in the Unicode standard whether it is
        already assigned or not. Identified by an integer.

    Code unit
        An integer representing a character in some encoding.

    Surrogate pair
        Two code units that represent a single Unicode character.

Proposed Solution

    One solution would be to merely increase the maximum ordinal to a
    larger value. Unfortunately the only straightforward
    implementation of this idea is to increase the character code
    unit to 4 bytes. This has the effect of doubling the size of most
    Unicode strings. In order to avoid imposing this cost on every
    user, Python 2.2 will allow 4-byte Unicode characters as a
    build-time option. Users can choose whether they care about wide
    characters or prefer to preserve memory.

    The 4-byte option is called "wide Py_UNICODE". The 2-byte option
    is called "narrow Py_UNICODE".

    Most things will behave identically in the wide and narrow worlds.

    * unichr(i) for 0 <= i < 2**16 (0x10000) always returns a
      length-one string.

    * unichr(i) for 2**16 <= i <= TOPCHAR will return a length-one
      string representing the character on wide Python builds. On
      narrow builds it will raise ValueError.

      ISSUE: Python currently allows \U literals that cannot be
             represented as a single character. It generates two
             characters known as a "surrogate pair". Should this be
             disallowed on future narrow Python builds?

      ISSUE: Should Python allow the construction of characters that
             do not correspond to Unicode characters? Unassigned
             Unicode characters should obviously be legal (because
             they could be assigned at any time). But code points
             above TOPCHAR are guaranteed never to be used by
             Unicode. Should we allow access to them anyhow?

    * ord() is always the inverse of unichr()

    * There is an integer value in the sys module that describes the
      largest ordinal for a Unicode character on the current
      interpreter. sys.maxunicode is 2**16-1 (0xffff) on narrow
      builds of Python and TOPCHAR on wide builds.

      ISSUE: Should there be distinct constants for accessing TOPCHAR
             and the real upper bound for the domain of unichr (if
             they differ)? There has also been a suggestion of
             sys.unicodewidth, which can take the values 'wide' and
             'narrow'.

    * codecs will be upgraded to support "wide characters"
      (represented directly in UCS-4, as surrogate pairs in UTF-16
      and as multi-byte sequences in UTF-8). On narrow Python builds,
      the codecs will generate surrogate pairs, on wide Python builds
      they will generate a single character. This is the main part of
      the implementation left to be done.

    * there are no restrictions on constructing strings that use code
      points "reserved for surrogates" improperly. These are called
      "isolated surrogates". The codecs should disallow reading these
      but you could construct them using string literals or unichr().
      unichr() is not restricted to values less than either TOPCHAR
      or sys.maxunicode.

Implementation

    There is a new (experimental) define:

        #define PY_UNICODE_SIZE 2

    There are new configure options:

        --enable-unicode=ucs2 configures a narrow Py_UNICODE, and
                              uses wchar_t if it fits
        --enable-unicode=ucs4 configures a wide Py_UNICODE, and uses
                              wchar_t if it fits
        --enable-unicode     same as "=ucs2"

    The intention is that --disable-unicode, or --enable-unicode=no
    removes the Unicode type altogether; this is not yet implemented.

Notes

    This PEP does NOT imply that people using Unicode need to use a
    4-byte encoding. It only allows them to do so. For example,
    ASCII is still a legitimate (7-bit) Unicode-encoding.

Rationale for Surrogate Creation Behaviour

    Python currently supports the construction of a surrogate pair
    for a large unicode literal character escape sequence. This is
    basically designed as a simple way to construct "wide characters"
    even in a narrow Python build.

    ISSUE: surrogates can be created this way but the user still
           needs to be careful about slicing, indexing, printing
           etc. Another option is to remove knowledge of surrogates
           from everything other than the codecs.

Rejected Suggestions

    There were two primary solutions that were rejected. The first
    was more or less the status-quo. We could officially say that
    Python characters represent UTF-16 code units and require
    programmers to implement wide characters in their application
    logic. This is a heavy burden because emulating 32-bit characters
    is likely to be very inefficient if it is coded entirely in
    Python. Plus these abstracted pseudo-strings would not be legal
    as input to the regular expression engine.

    The other class of solution is to use some efficient storage
    internally but present an abstraction of wide characters to the
    programmer. Any of these would require a much more complex
    implementation than the accepted solution. For instance consider
    the impact on the regular expression engine. In theory, we could
    move to this implementation in the future without breaking Python
    code. A future Python could "emulate" wide Python semantics on
    narrow Python.

Copyright

    This document has been placed in the public domain.

Local Variables:
mode: indented-text
indent-tabs-mode: nil
End:

-- http://mail.python.org/mailman/listinfo/python-list From fdrake at acm.org Fri Jun 29 16:03:28 2001 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Fri, 29 Jun 2001 10:03:28 -0400 (EDT) Subject: [Python-Dev] NIS on Linux, others?
In-Reply-To: References: Message-ID: <15164.35504.255476.169687@cj42289-a.reston1.va.home.com> Greg Ball writes: > Short version: I can confirm that bug under Linux, but the patch breaks > the nis module on Solaris. I'm presuming that these were using the same NIS server? I'm wondering if this may be an endianness-related problem. I don't understand enough about the NIS protocols to know what's going on in that module. -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From mal at egenix.com Fri Jun 29 16:51:04 2001 From: mal at egenix.com (M.-A. Lemburg) Date: Fri, 29 Jun 2001 16:51:04 +0200 Subject: [Python-Dev] Support for "wide" Unicode characters References: <3B3BEF21.63411C4C@ActiveState.com> Message-ID: <3B3C95D8.518E5175@egenix.com> Paul Prescod wrote: > > Slow python-dev day...consider this exciting new proposal to deal > with important new characters like the Japanese dentistry symbols and > ecological symbols (but not Klingon) More comments... > -------- Original Message -------- > Subject: PEP: Support for "wide" Unicode characters > Date: Thu, 28 Jun 2001 15:33:00 -0700 > From: Paul Prescod > Organization: ActiveState > To: "python-list at python.org" > > PEP: 261 > Title: Support for "wide" Unicode characters > Version: $Revision: 1.3 $ > Author: paulp at activestate.com (Paul Prescod) > Status: Draft > Type: Standards Track > Created: 27-Jun-2001 > Python-Version: 2.2 > Post-History: 27-Jun-2001, 28-Jun-2001 > > Abstract > > Python 2.1 unicode characters can have ordinals only up to 2**16 - 1. > These characters are known as Basic Multilingual Plane characters. > There are now characters in Unicode that live on other "planes". > The largest addressable character in Unicode has the ordinal 17 * > 2**16 - 1 (0x10ffff). For readability, we will call this TOPCHAR > and call characters in this range "wide characters". > > Glossary > > Character > > Used by itself, means the addressable units of a Python > Unicode string. > > Code point > > If you imagine Unicode as a mapping from integers to > characters, each integer represents a code point. Some are > really used for characters. Some will someday be used for > characters. Some are guaranteed never to be used for > characters. > > Unicode character > > A code point defined in the Unicode standard whether it is > already assigned or not. Identified by an integer. You're mixing terms here: being a character in Unicode is a property which is defined by the Unicode specs; not all code points are characters! I'd suggest not to use the term character in this PEP at all; this is also what Mark Davis recommends in his paper on Unicode. That way people reading the PEP won't even start to confuse things since they will most likely have to read this glossary to understand what code points and code units are. Also, a link to the Unicode glossary would be a good thing. > Code unit > > An integer representing a character in some encoding. A code unit is the basic storage unit used by Unicode strings, e.g. u[0], not necessarily a character. > Surrogate pair > > Two code units that represent a single Unicode character. Please add

    Unicode string
        A sequence of code units.

and a note that on wide builds: code unit == code point. > Proposed Solution > > One solution would be to merely increase the maximum ordinal to a > larger value. Unfortunately the only straightforward > implementation of this idea is to increase the character code unit > to 4 bytes. This has the effect of doubling the size of most > Unicode strings. In order to avoid imposing this cost on every > user, Python 2.2 will allow 4-byte Unicode characters as a > build-time option. Users can choose whether they care about > wide characters or prefer to preserve memory. > > The 4-byte option is called "wide Py_UNICODE". The 2-byte option > is called "narrow Py_UNICODE". > > Most things will behave identically in the wide and narrow worlds. > > * unichr(i) for 0 <= i < 2**16 (0x10000) always returns a > length-one string. > > * unichr(i) for 2**16 <= i <= TOPCHAR will return a > length-one string representing the character on wide Python > builds. On narrow builds it will raise ValueError. > > ISSUE: Python currently allows \U literals that cannot be > represented as a single character. It generates two > characters known as a "surrogate pair". Should this be > disallowed on future narrow Python builds? Why not make the codec used by Python to convert Unicode literals to Unicode strings an option just like the default encoding? That way we could have a version of the unicode-escape codec which supports surrogates and one which doesn't. > ISSUE: Should Python allow the construction of characters > that do not correspond to Unicode characters? > Unassigned Unicode characters should obviously be legal > (because they could be assigned at any time). But > code points above TOPCHAR are guaranteed never to > be used by Unicode. Should we allow access to them > anyhow? I wouldn't count on that last point ;-) Please note that you are mixing terms: you don't construct characters, you construct code points. Whether the concatenation of these code points makes a valid Unicode character string is an issue which applications and codecs have to decide. > * ord() is always the inverse of unichr() > > * There is an integer value in the sys module that describes the > largest ordinal for a Unicode character on the current > interpreter. sys.maxunicode is 2**16-1 (0xffff) on narrow builds > of Python and TOPCHAR on wide builds. > > ISSUE: Should there be distinct constants for accessing > TOPCHAR and the real upper bound for the domain of > unichr (if they differ)? There has also been a > suggestion of sys.unicodewidth, which can take the > values 'wide' and 'narrow'. > > * codecs will be upgraded to support "wide characters" > (represented directly in UCS-4, as surrogate pairs in UTF-16 and > as multi-byte sequences in UTF-8). On narrow Python builds, the > codecs will generate surrogate pairs, on wide Python builds they > will generate a single character. This is the main part of the > implementation left to be done. > > * there are no restrictions on constructing strings that use > code points "reserved for surrogates" improperly. These are > called "isolated surrogates". The codecs should disallow reading > these but you could construct them using string literals or > unichr(). unichr() is not restricted to values less than either > TOPCHAR or sys.maxunicode. > > Implementation > > There is a new (experimental) define: > > #define PY_UNICODE_SIZE 2 > > There are new configure options: > > --enable-unicode=ucs2 configures a narrow Py_UNICODE, and uses > wchar_t if it fits > --enable-unicode=ucs4 configures a wide Py_UNICODE, and uses > wchar_t if it fits > --enable-unicode same as "=ucs2" > > The intention is that --disable-unicode, or --enable-unicode=no > removes the Unicode type altogether; this is not yet implemented. > > Notes > > This PEP does NOT imply that people using Unicode need to use a > 4-byte encoding. It only allows them to do so. For example, > ASCII is still a legitimate (7-bit) Unicode-encoding. > > Rationale for Surrogate Creation Behaviour > > Python currently supports the construction of a surrogate pair > for a large unicode literal character escape sequence. This is > basically designed as a simple way to construct "wide characters" > even in a narrow Python build. > > ISSUE: surrogates can be created this way but the user still > needs to be careful about slicing, indexing, printing > etc. Another option is to remove knowledge of > surrogates from everything other than the codecs. +1 on removing knowledge about surrogates from the Unicode implementation core (it's also the easiest: there is none :-) We should provide a new module which provides a few handy utilities though: functions which provide code point-, character-, word- and line- based indexing into Unicode strings. > Rejected Suggestions > > There were two primary solutions that were rejected. The first was > more or less the status-quo. We could officially say that Python > characters represent UTF-16 code units and require programmers to > implement wide characters in their application logic. This is a > heavy burden because emulating 32-bit characters is likely to be > very inefficient if it is coded entirely in Python. Plus these > abstracted pseudo-strings would not be legal as input to the > regular expression engine. > > The other class of solution is to use some efficient storage > internally but present an abstraction of wide characters > to the programmer. Any of these would require a much more complex > implementation than the accepted solution. For instance consider > the impact on the regular expression engine. In theory, we could > move to this implementation in the future without breaking Python > code. A future Python could "emulate" wide Python semantics on > narrow Python. > > Copyright > > This document has been placed in the public domain. -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From jepler at inetnebr.com Fri Jun 29 17:04:18 2001 From: jepler at inetnebr.com (Jeff Epler) Date: Fri, 29 Jun 2001 10:04:18 -0500 Subject: [Python-Dev] NIS on Linux, others? In-Reply-To: <15164.35504.255476.169687@cj42289-a.reston1.va.home.com>; from fdrake@acm.org on Fri, Jun 29, 2001 at 10:03:28AM -0400 References: <15164.35504.255476.169687@cj42289-a.reston1.va.home.com> Message-ID: <20010629100416.A24069@inetnebr.com> On Fri, Jun 29, 2001 at 10:03:28AM -0400, Fred L. Drake, Jr. wrote: > > Greg Ball writes: > > Short version: I can confirm that bug under Linux, but the patch breaks > > the nis module on Solaris. > > I'm presuming that these were using the same NIS server? I'm > wondering if this may be an endianness-related problem. I don't > understand enough about the NIS protocols to know what's going on in > that module. It's my suspicion that it depends on how the "aliases" map is built. The patch that "broke" things for the Linux systems includes the comment /* created with 'makedbm -a' */ which makes me suspect that it's dependent on the way the map is constructed. (I couldn't find an online makedbm manpage which documents a -a option) Endian issues should not exist, the protocol below NIS/YP takes care of this.
Jeff From guido at digicool.com Fri Jun 29 17:24:56 2001 From: guido at digicool.com (Guido van Rossum) Date: Fri, 29 Jun 2001 11:24:56 -0400 Subject: [Python-Dev] Support for "wide" Unicode characters In-Reply-To: Your message of "Fri, 29 Jun 2001 16:51:04 +0200." <3B3C95D8.518E5175@egenix.com> References: <3B3BEF21.63411C4C@ActiveState.com> <3B3C95D8.518E5175@egenix.com> Message-ID: <200106291525.f5TFP0H29410@odiug.digicool.com> > I'd suggest not to use the term character in this PEP at all; > this is also what Mark Davis recommends in his paper on Unicode. I like this idea! I know that I *still* have a hard time not to think "C 'char' datatype, i.e. an 8-bit byte" when I read "character"... > Why not make the codec used by Python to convert Unicode > literals to Unicode strings an option just like the default > encoding ? > > That way we could have a version of the unicode-escape codec > which supports surrogates and one which doesn't. Smart idea, but how practical is this? Can you spec this out a bit more? > +1 on removing knowledge about surrogates from the Unicode > implementation core (it's also the easiest: there is none :-) Except for \U currently -- or is that not part of the implementation core? > We should provide a new module which provides a few handy > utilities though: functions which provide code point-, > character-, word- and line- based indexing into Unicode > strings. But its design is outside the scope of this PEP, I'd say. --Guido van Rossum (home page: http://www.python.org/~guido/) From paulp at ActiveState.com Sat Jun 30 03:16:25 2001 From: paulp at ActiveState.com (Paul Prescod) Date: Fri, 29 Jun 2001 18:16:25 -0700 Subject: [Python-Dev] Support for "wide" Unicode characters References: <3B3BEF21.63411C4C@ActiveState.com> <3B3C95D8.518E5175@egenix.com> Message-ID: <3B3D2869.5C1DDCF1@ActiveState.com> "M.-A. Lemburg" wrote: > >... > > I'd suggest not to use the term character in this PEP at all; > this is also what Mark Davis recommends in his paper on Unicode. That's fine, but Python does have a concept of character and I'm going to use the term character for discussing these. > Also, a link to the Unicode glossary would be a good thing. Funny how these little PEPs grow... >... > Why not make the codec used by Python to convert Unicode > literals to Unicode strings an option just like the default > encoding ? > > That way we could have a version of the unicode-escape codec > which supports surrogates and one which doesn't. Adding more and more knobs to tweak just adds up to Python code being non-portable from one machine to another. > > ISSUE: Should Python allow the construction of characters > > that do not correspond to Unicode characters? > > Unassigned Unicode characters should obviously be legal > > (because they could be assigned at any time). But > > code points above TOPCHAR are guaranteed never to > > be used by Unicode. Should we allow access to them > > anyhow? > > I wouldn't count on that last point ;-) > > Please note that you are mixing terms: you don't construct > characters, you construct code points. Whether the concatenation > of these code points makes a valid Unicode character string > is an issue which applications and codecs have to decide. unichr() does not construct code points. It constructs 1-char Python Unicode strings...also known as Python Unicode characters. > ... Whether the concatenation > of these code points makes a valid Unicode character string > is an issue which applications and codecs have to decide. 
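[A side note to make the code point / code unit distinction concrete: a sketch of the standard UTF-16 arithmetic in Python -- illustrative only, not code from either message, and the helper name is made up:]

def utf16_code_units(code_point):
    # Code points up to U+FFFF fit in a single 16-bit code unit.
    if code_point < 0x10000:
        return (code_point,)
    # Anything larger is split across a surrogate pair of code units.
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)   # lead (high) surrogate
    low = 0xDC00 + (offset & 0x3FF)  # trail (low) surrogate
    return (high, low)

# U+10000 becomes (0xD800, 0xDC00) on a narrow build; slicing between
# the two units leaves an isolated surrogate, which is why code units
# cannot be concatenated blindly.
print [hex(u) for u in utf16_code_units(0x10000)]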
The concatenation of true code points would *always* make a valid Unicode string, right? It's code units that cannot be blindly concatenated. >... > We should provide a new module which provides a few handy > utilities though: functions which provide code point-, > character-, word- and line- based indexing into Unicode > strings. Okay, I'll add: It has been proposed that there should be a module for working with UTF-16 strings in narrow Python builds through some sort of abstraction that handles surrogates for you. If someone wants to implement that, it will be another PEP. -- Take a recipe. Leave a recipe. Python Cookbook! http://www.ActiveState.com/pythoncookbook From mwh at python.net Sat Jun 30 11:32:34 2001 From: mwh at python.net (Michael Hudson) Date: 30 Jun 2001 10:32:34 +0100 Subject: [Python-Dev] Support for "wide" Unicode characters In-Reply-To: Paul Prescod's message of "Fri, 29 Jun 2001 18:16:25 -0700" References: <3B3BEF21.63411C4C@ActiveState.com> <3B3C95D8.518E5175@egenix.com> <3B3D2869.5C1DDCF1@ActiveState.com> Message-ID: Paul Prescod writes: > "M.-A. Lemburg" wrote: > > I'd suggest not to use the term character in this PEP at all; > > this is also what Mark Davis recommends in his paper on Unicode. > > That's fine, but Python does have a concept of character and I'm going > to use the term character for discussing these. As a Unicode Idiot (tm) can I please beg you to reconsider? There are so many possible meanings for "character" that I really think it's best to avoid the word altogether. Call Python characters "length 1 strings" or even "length 1 Python strings". [...] > > Please note that you are mixing terms: you don't construct > > characters, you construct code points. Whether the concatenation > > of these code points makes a valid Unicode character string > > is an issue which applications and codecs have to decide. > > unichr() does not construct code points. It constructs 1-char Python > Unicode strings This is what I think you should be saying. > ...also known as Python Unicode characters. Which I'm suggesting you forget! Cheers, M. -- I'm a keen cyclist and I stop at red lights. Those who don't need hitting with a great big slapping machine. -- Colin Davidson, cam.misc From paulp at ActiveState.com Sat Jun 30 13:28:28 2001 From: paulp at ActiveState.com (Paul Prescod) Date: Sat, 30 Jun 2001 04:28:28 -0700 Subject: [Python-Dev] Support for "wide" Unicode characters References: <3B3BEF21.63411C4C@ActiveState.com> <3B3C95D8.518E5175@egenix.com> <3B3D2869.5C1DDCF1@ActiveState.com> Message-ID: <3B3DB7DC.511A3D8@ActiveState.com> Michael Hudson wrote: > >... > > As a Unicode Idiot (tm) can I please beg you to reconsider? There are > so many possible meanings for "character" that I really think it's > best to avoid the word altogether. Call Python characters "length 1 > strings" or even "length 1 Python strings". Do you really feel that there are many possible meanings for the word "Python Unicode character?" This is a PEP: I have to assume a certain degree of common understanding. -- Take a recipe. Leave a recipe. Python Cookbook! http://www.ActiveState.com/pythoncookbook From mal at egenix.com Sat Jun 30 13:52:38 2001 From: mal at egenix.com (M.-A. Lemburg) Date: Sat, 30 Jun 2001 13:52:38 +0200 Subject: [Python-Dev] Support for "wide" Unicode characters References: <3B3BEF21.63411C4C@ActiveState.com> <3B3C95D8.518E5175@egenix.com> <3B3D2869.5C1DDCF1@ActiveState.com> Message-ID: <3B3DBD86.81F80D06@egenix.com> Paul Prescod wrote: > > "M.-A. 
Lemburg" wrote: > > > >... > > > > I'd suggest not to use the term character in this PEP at all; > > this is also what Mark Davis recommends in his paper on Unicode. > > That's fine, but Python does have a concept of character and I'm going > to use the term character for discussing these. The term "character" in Python should really only be used for the 8-bit strings. In Unicode a "character" can mean any of: """ Unfortunately the term character is vastly overloaded. At various times people can use it to mean any of these things: - An image on paper (glyph) - What an end-user thinks of as a character (grapheme) - What a character encoding standard encodes (code point) - A memory storage unit in a character encoding (code unit) Because of this, ironically, it is best to avoid the use of the term character entirely when discussing character encodings, and stick to the term code point. """ Taken from Mark Davis' paper: http://www-106.ibm.com/developerworks/unicode/library/utfencodingforms/ > > Also, a link to the Unicode glossary would be a good thing. > > Funny how these little PEPs grow... Is that a problem ? The Unicode glossary is very useful in providing a common base for understanding the different terms and tries very hard to avoid ambiguity in meaning. This discussion is partly caused by exactly these different understanding of the terms used in the PEP. I will update the Unicode PEP to the Unicode terminology too. > >... > > Why not make the codec used by Python to convert Unicode > > literals to Unicode strings an option just like the default > > encoding ? > > > > That way we could have a version of the unicode-escape codec > > which supports surrogates and one which doesn't. > > Adding more and more knobs to tweak just adds up to Python code being > non-portable from one machine to another. Not necessarily so; I'll write a more precise spec next week. The idea is to put the codec information into the Python source code, so that it is bound to the literals that way with the result of the Python source code being portable across platforms. Currently this is just an idea and still have to check how far this can go... > > > ISSUE: Should Python allow the construction of characters > > > that do not correspond to Unicode characters? > > > Unassigned Unicode characters should obviously be legal > > > (because they could be assigned at any time). But > > > code points above TOPCHAR are guaranteed never to > > > be used by Unicode. Should we allow access to them > > > anyhow? > > > > I wouldn't count on that last point ;-) > > > > Please note that you are mixing terms: you don't construct > > characters, you construct code points. Whether the concatenation > > of these code points makes a valid Unicode character string > > is an issue which applications and codecs have to decide. > > unichr() does not construct code points. It constructs 1-char Python > Unicode strings...also known as Python Unicode characters. > > > ... Whether the concatenation > > of these code points makes a valid Unicode character string > > is an issue which applications and codecs have to decide. > > The concatenation of true code points would *always* make a valid > Unicode string, right? It's code units that cannot be blindly > concatenated. Both wrong :-) U+D800 is a valid Unicode code point and can occur as code unit in both narrow and wide builds. Concatenating this with e.g. 
U+0020 will still make it a valid Unicode code point sequence (aka Unicode object), but not a valid Unicode character string (since the U+D800 is not a character). The same is true for e.g. U+FFFF. Note that the Unicode type should happily store these values, while the codecs complain. As a result and like I said above, dealing with these problems is left to the applications which use these Unicode objects. > >... > > We should provide a new module which provides a few handy > > utilities though: functions which provide code point-, > > character-, word- and line- based indexing into Unicode > > strings. > > Okay, I'll add: > > It has been proposed that there should be a module for working > with UTF-16 strings in narrow Python builds through some sort of > abstraction that handles surrogates for you. If someone wants > to implement that, it will be another PEP. Uhm, narrow builds don't support UTF-16... it's UCS-2 which is supported (basically: store everything in range(0x10000)); the codecs can map code points to surrogates, but it is solely their responsibility and the responsibility of the application using them to take care of dealing with surrogates. Also, the module will be useful for both narrow and wide builds, since the notion of an encoded character can involve multiple code points. In that sense Unicode is always a variable length encoding for characters and that's the application field of this module. Here's the adjusted text: It has been proposed that there should be a module for working with Unicode objects using character-, word- and line- based indexing. The details of the implementation is left to another PEP. -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From bckfnn at worldonline.dk Sat Jun 30 15:07:55 2001 From: bckfnn at worldonline.dk (Finn Bock) Date: Sat, 30 Jun 2001 13:07:55 GMT Subject: [Python-Dev] Corrupt Jython CVS (off topic). Message-ID: <3b3dccf6.26562024@mail.wanadoo.dk> A week ago I posted this on jython-dev, but no-one was able to give any advise on the best way to fix it. Maybe you can help. For some time now, our [jython] web CVS have not worked correctly: http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/jython/jython/org/python/core/ Finally I managed to track the problem to the Java2Accessibility.py,v file in the CVS repository. The "rlog" command cannot be executed on this file. From nhv at cape.com Sat Jun 30 15:16:48 2001 From: nhv at cape.com (Norman Vine) Date: Sat, 30 Jun 2001 09:16:48 -0400 Subject: [Python-Dev] RE: Threaded Cygwin Python Import Problem In-Reply-To: <20010628171715.P488@dothill.com> Message-ID: <015601c10166$eb79bb00$a300a8c0@nhv> Jason Tishler > >Thanks to Rob Collins (implementer) and Greg Smith (profiler), Cygwin now >provides enough pthreads support so that Cygwin Python builds OOTB *and* >functions reasonably well even with threads enabled. Unfortunately, >there are still a few issues that need to be resolved. > >The one that I would like to address in this posting prevents a threaded >Cygwin Python from building the standard extension modules (without some >kind of intervention). :,( Specifically, the build would frequently >hang during the Distutils part when Cygwin Python is attempting to execvp >a gcc process. > >See the first attachment, test.py, for a minimal Python script that >exhibits the hang. 
See the second attachment, test.c, for a rewrite >of test.py in C. Since test.c did not hang, I was able to conclude that >this was not just a straight Cygwin problem. > >Further tracing uncovered that the hang occurs in _execvpe() (in os.py), >when the child tries to import tempfile. If I apply the third >attachment, >os.py.patch, then the hang is avoided. Hence, it appears that importing a >module (or specifically the tempfile module) in a threaded Cygwin Python >child causes a hang. > >I saw the following comment in _execvpe(): > > # Process handling (fork, wait) under BeOS (up to 5.0) > # doesn't interoperate reliably with the thread interlocking > # that happens during an import. The actual error we need > # is the same on BeOS for posix.open() et al., ENOENT. > >The above makes me think that possibly Cygwin is having a >similar problem. > >Can anyone offer suggestions on how to further debug this problem? I was experiencing the same problems as Jason with Win2k sp1 and had used the same work-around successfully. < I believe Jason is working with NT 4.0 sp 5 > Curiously, after applying the Win2k sp2, I no longer need to do this and the original Python code works fine. Leading me to believe that this may be but a symptom of another Windows mystery. Regards Norman Vine From aahz at rahul.net Sat Jun 30 16:15:24 2001 From: aahz at rahul.net (Aahz Maruch) Date: Sat, 30 Jun 2001 07:15:24 -0700 (PDT) Subject: [Python-Dev] Support for "wide" Unicode characters In-Reply-To: <3B3DB7DC.511A3D8@ActiveState.com> from "Paul Prescod" at Jun 30, 2001 04:28:28 AM Message-ID: <20010630141524.E029999C80@waltz.rahul.net> Paul Prescod wrote: > Michael Hudson wrote: >> >>... >> >> As a Unicode Idiot (tm) can I please beg you to reconsider? There are >> so many possible meanings for "character" that I really think it's >> best to avoid the word altogether. Call Python characters "length 1 >> strings" or even "length 1 Python strings". > > Do you really feel that there are many possible meanings for the word > "Python Unicode character?" This is a PEP: I have to assume a certain > degree of common understanding. After reading Michael's and MA's arguments, I'm +1 on making the change they're requesting. But what really triggered my posting this was your use of the phrase "common understanding"; IME, Python's "explicit is better than implicit" rule is truly critical in documentation. Particularly if "character" has been deprecated in standard Unicode documentation, I think sticking to a common vocabulary makes more sense. -- --- Aahz (@pobox.com) Hugs and backrubs -- I break Rule 6 <*> http://www.rahul.net/aahz/ Androgynous poly kinky vanilla queer het Pythonista I don't really mind a person having the last whine, but I do mind someone else having the last self-righteous whine. From Jason.Tishler at dothill.com Sat Jun 30 17:20:19 2001 From: Jason.Tishler at dothill.com (Jason Tishler) Date: Sat, 30 Jun 2001 11:20:19 -0400 Subject: [Python-Dev] Re: Threaded Cygwin Python Import Problem In-Reply-To: <015601c10166$eb79bb00$a300a8c0@nhv> Message-ID: <20010630112019.B626@dothill.com> Norman, On Sat, Jun 30, 2001 at 09:16:48AM -0400, Norman Vine wrote: > Jason Tishler > >The one that I would like to address in this posting prevents a threaded > >Cygwin Python from building the standard extension modules (without some > >kind of intervention). :,( Specifically, the build would frequently > >hang during the Distutils part when Cygwin Python is attempting to execvp > >a gcc process.
> I was experiencing the same problems as Jason with Win2k sp1 and > had used the same work-around successfully. > < I believe Jason is working with NT 4.0 sp 5 > > > Curiously, after applying the Win2k sp2, I no longer need to do this > and the original Python code works fine. > > Leading me to believe that this may be but a symptom of another > Windows mystery. After further reflection, I feel that I have found another race/deadlock issue with Cygwin's pthreads implementation. If I'm correct, this would explain why you experienced it intermittently with Windows 2000 SP1 and why it is "gone" with SP2. Probably SP2 slows down your machine so much that the problem is not triggered. :,) I am going to reconfigure --with-pydebug and set THREADDEBUG. Hopefully, the hang will still be reproducible under these conditions. If so, then I will attempt to produce a minimal C test case for Rob to use to isolate and solve this problem. Thanks, Jason -- Jason Tishler Director, Software Engineering Phone: 732.264.8770 x235 Dot Hill Systems Corp. Fax: 732.264.8798 82 Bethany Road, Suite 7 Email: Jason.Tishler at dothill.com Hazlet, NJ 07730 USA WWW: http://www.dothill.com From guido at digicool.com Sat Jun 30 20:06:35 2001 From: guido at digicool.com (Guido van Rossum) Date: Sat, 30 Jun 2001 14:06:35 -0400 Subject: [Python-Dev] Corrupt Jython CVS (off topic). In-Reply-To: Your message of "Sat, 30 Jun 2001 13:07:55 GMT." <3b3dccf6.26562024@mail.wanadoo.dk> References: <3b3dccf6.26562024@mail.wanadoo.dk> Message-ID: <200106301806.f5UI6Zq30293@odiug.digicool.com> > A week ago I posted this on jython-dev, but no one was able to give any > advice on the best way to fix it. Maybe you can help. > > > For some time now, our [jython] web CVS has not worked correctly: > > http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/jython/jython/org/python/core/ > > Finally I managed to track the problem to the Java2Accessibility.py,v > file in the CVS repository. The "rlog" command cannot be executed on > this file. > > From the start of the Java2Accessibility.py,v:
>
> head 2.4;
> access;
> symbols
>     Release_2_1alpha1:2.4
>     Release_2_0:2.2
>     Release_2_0rc1:2.2
>     Release_2_0beta2:2.2
>     Release_2_0beta1:2.2
>     Release_2_0alpha3:2.2
>     Release_2_0alpha2:2.2
>     Release_2_0alpha1:2.2
>     Release_1_1rc1:2.2
>     Release_1_1beta4:2.2
>     Release_1_1beta3:2.2
>     2.0:1.1.0.2;
> locks; strict;
>
> As an experiment, I tried to remove the strange "2.0:1.1.0.2;" line from > the file and then I could run rlog on the file. Make sure to move the semicolon to the end of the previous line. > Does anyone know if/how we can fix this? > > As a last resort I suppose I can attach my hand-edited version to a SF > support request where I ask them to copy my file to the CVS server. To > this day I have never been very successful whenever I have tried to edit > files in a CVS repository so I'm reluctant to do this. > > regards, > finn Yes, I think a SF request should be the way to go. I don't know how this could have happened; the "2.0" is illegal as a symbolic tag name... --Guido van Rossum (home page: http://www.python.org/~guido/) From paulp at ActiveState.com Sat Jun 30 21:09:07 2001 From: paulp at ActiveState.com (Paul Prescod) Date: Sat, 30 Jun 2001 12:09:07 -0700 Subject: [Python-Dev] Support for "wide" Unicode characters References: <20010630141524.E029999C80@waltz.rahul.net> Message-ID: <3B3E23D3.69D591DD@ActiveState.com> Aahz Maruch wrote: > > > After reading Michael's and MA's arguments, I'm +1 on making the change > they're requesting.
But what really triggered my posting this was your > use of the phrase "common understanding"; IME, Python's "explicit is > better than implicit" rule is truly critical in documentation. The spec starts off with an absolutely watertight definition of the term: "the addressable units of a Python Unicode string." I can't get more explicit than that. Expanding every usage of the word to "length 1 Python Unicode string" does not make the document more explicit any more than this is a "more explicit" equation than Einstein's: "The Energy is the mass of the object times the speed of light times two." > Particularly if "character" has been deprecated in standard Unicode > documentation, I think sticking to a common vocabulary makes more sense. "Character" is still a central term in all Unicode documentation. Go to their web page and look. It's right on the front page. "Unicode provides a unique number for every character, no matter what the platform, no matter what the program, no matter what the language." But I'm not using it in the Unicode sense anyhow, so it doesn't matter. If ISO deprecates the use of the word integer in some standard, will we stop talking about Python integers as integers? The addressable unit of a Python string is a character. If it is a Python Unicode String then it is a Python Unicode character. The term "Python Unicode character" is not going away: http://www.python.org/doc/current/tut/node5.html#SECTION005120000000000000000 I will be a lot more concerned about this issue when someone reads the PEP and is actually confused by something as opposed to worrying that somebody might be confused by something. If I start using a bunch of technical terms and obfuscatory expansions, it will just dissuade people from reading the PEP. -- Take a recipe. Leave a recipe. Python Cookbook! http://www.ActiveState.com/pythoncookbook From DavidA at ActiveState.com Sat Jun 30 23:28:39 2001 From: DavidA at ActiveState.com (David Ascher) Date: Sat, 30 Jun 2001 14:28:39 -0700 Subject: [Python-Dev] Support for "wide" Unicode characters References: <20010630141524.E029999C80@waltz.rahul.net> <3B3E23D3.69D591DD@ActiveState.com> Message-ID: <3B3E4487.40054EAE@ActiveState.com> > "The Energy is the mass of the object times the speed of light times > two." Actually, it's "squared", not times two. At least in my universe =) --david-Unicode-idiot-much-to-Paul's-dismay-ascher