From tim.one@home.com Fri Jun 1 01:24:01 2001
From: tim.one@home.com (Tim Peters)
Date: Thu, 31 May 2001 20:24:01 -0400
Subject: [Python-Dev] One more dict trick
In-Reply-To:
Message-ID:

This is a multi-part message in MIME format.

------=_NextPart_000_0005_01C0EA0F.A145F760
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: 7bit

Another version of the patch attached, a bit faster and with a large new
comment block explaining it.  It's looking good!  As I hope the new comments
make clear, nothing about this approach is "a mystery" -- there are
explainable reasons for each fiddly bit.  This gives me more confidence in
it than in the previous approach, and, indeed, it turned out that when I
*thought* "hmm!  I bet this change would be a little faster!", it actually
was.

------=_NextPart_000_0005_01C0EA0F.A145F760
Content-Type: text/plain; name="dict.txt"
Content-Transfer-Encoding: quoted-printable
Content-Disposition: attachment; filename="dict.txt"

Index: Objects/dictobject.c
===================================================================
RCS file: /cvsroot/python/python/dist/src/Objects/dictobject.c,v
retrieving revision 2.96
diff -c -r2.96 dictobject.c
*** Objects/dictobject.c	2001/05/27 07:39:22	2.96
--- Objects/dictobject.c	2001/06/01 00:17:07
***************
*** 12,123 ****
   */
  #define MINSIZE 8

! /* define this out if you don't want conversion statistics on exit */
  #undef SHOW_CONVERSION_COUNTS

  /*
! Table of irreducible polynomials to efficiently cycle through
! GF(2^n)-{0}, 2<=n<=30.  A table size is always a power of 2.
! For a table size of 2**i, the polys entry is 2**i + j for some j in 1 thru
! 2**i-1 inclusive.  The polys[] entries here happen to add in the smallest j
! values "that work".  Work means this:  given any integer k in 1 thru 2**i-1
! inclusive, a poly works if & only if repeating this code:
!     print k
!     k <<= 1
!     if k >= 2**i:
!         k ^= poly
! prints every integer in 1 thru 2**i-1 inclusive exactly once before printing
! k a second time.  Theory can be used to find such polys efficiently, but the
! operational defn. of "works" is sufficient to find them in reasonable time
! via brute force program (hint: any poly that has an even number of 1 bits
! cannot work; ditto any poly with low bit 0; exploit those).
!
! Some major subtleties:  Most hash schemes depend on having a "good" hash
! function, in the sense of simulating randomness.  Python doesn't:  some of
! its hash functions are trivial, such as hash(i) == i for ints i (excepting
! i == -1, because -1 is the "error occurred" return value from tp_hash).
!
! This isn't necessarily bad!  To the contrary, that our hash tables are powers
! of 2 in size, and that we take the low-order bits as the initial table index,
! means that there are no collisions at all for dicts indexed by a contiguous
! range of ints.  This is "better than random" behavior, and that's very
! desirable.
!
! On the other hand, when collisions occur, the tendency to fill contiguous
! slices of the hash table makes a good collision resolution strategy crucial;
! e.g., linear probing is right out.
!
! Reimer Behrends contributed the idea of using a polynomial-based approach,
! using repeated multiplication by x in GF(2**n) where a polynomial is chosen
! such that x is a primitive root.  This visits every table location exactly
! once, and the sequence of locations probed is highly non-linear.
!
! The same is also largely true of quadratic probing for power-of-2 tables, of
! the specific
!
!     (i + comb(1, 2)) mod size
!     (i + comb(2, 2)) mod size
!     (i + comb(3, 2)) mod size
!     (i + comb(4, 2)) mod size
!     ...
!     (i + comb(j, 2)) mod size
!
! flavor.  The polynomial approach "scrambles" the probe indices better, but
! more importantly allows us to get *some* additional bits of the hash code
! into play via computing the initial increment, thus giving a weak form of
! double hashing.  Quadratic probing cannot be extended that way (the first
! probe offset must be 1, the second 3, the third 6, etc).
!
! Christian Tismer later contributed the idea of using polynomial division
! instead of multiplication.  The problem is that the multiplicative method
! can't get *all* the bits of the hash code into play without expensive
! computations that slow down the initial index and/or initial increment
! computation.  For a set of keys like [i << 16 for i in range(20000)], under
! the multiplicative method the initial index and increment were the same for
! all keys, so every key followed exactly the same probe sequence, and so
! this degenerated into a (very slow) linear search.  The division method uses
! all the bits of the hash code naturally in the increment, although it *may*
! visit locations more than once until such time as all the high bits of the
! increment have been shifted away.  It's also impossible to tell in advance
! whether incr is congruent to 0 modulo poly, so each iteration of the loop has
! to guard against incr becoming 0.  These are minor costs, as we usually don't
! get into the probe loop, and when we do we usually get out on its first
! iteration.
  */

- static long polys[] = {
- 	/* 4 + 3, */	/* first active entry if MINSIZE == 4 */
- 	8 + 3,		/* first active entry if MINSIZE == 8 */
- 	16 + 3,
- 	32 + 5,
- 	64 + 3,
- 	128 + 3,
- 	256 + 29,
- 	512 + 17,
- 	1024 + 9,
- 	2048 + 5,
- 	4096 + 83,
- 	8192 + 27,
- 	16384 + 43,
- 	32768 + 3,
- 	65536 + 45,
- 	131072 + 9,
- 	262144 + 39,
- 	524288 + 39,
- 	1048576 + 9,
- 	2097152 + 5,
- 	4194304 + 3,
- 	8388608 + 33,
- 	16777216 + 27,
- 	33554432 + 9,
- 	67108864 + 71,
- 	134217728 + 39,
- 	268435456 + 9,
- 	536870912 + 5,
- 	1073741824 + 83
- 	/* 2147483648 + 9 -- if we ever boost this to unsigned long */
- };
-
  /* Object used as dummy key to fill deleted entries */
  static PyObject *dummy; /* Initialized by first call to newdictobject() */

--- 12,117 ----
   */
  #define MINSIZE 8

! /* Define this out if you don't want conversion statistics on exit. */
  #undef SHOW_CONVERSION_COUNTS

+ /* See large comment block below.  This must be >= 1. */
+ #define PERTURB_SHIFT 5
+
  /*
! Major subtleties ahead:  Most hash schemes depend on having a "good" hash
! function, in the sense of simulating randomness.  Python doesn't:  its most
! important hash functions (for strings and ints) are very regular in common
! cases:
!
!     >>> map(hash, (0, 1, 2, 3))
!     [0, 1, 2, 3]
!     >>> map(hash, ("namea", "nameb", "namec", "named"))
!     [-1658398457, -1658398460, -1658398459, -1658398462]
!     >>>
!
! This isn't necessarily bad!  To the contrary, in a table of size 2**i, taking
! the low-order i bits as the initial table index is extremely fast, and there
! are no collisions at all for dicts indexed by a contiguous range of ints.
! The same is approximately true when keys are "consecutive" strings.  So this
! gives better-than-random behavior in common cases, and that's very desirable.
!
! OTOH, when collisions occur, the tendency to fill contiguous slices of the
! hash table makes a good collision resolution strategy crucial.  Taking only
! the last i bits of the hash code is also vulnerable:  for example, consider
! [i << 16 for i in range(20000)] as a set of keys.  Since ints are their own
! hash codes, and this fits in a dict of size 2**15, the last 15 bits of every
! hash code are all 0:  they *all* map to the same table index.
!
! But catering to unusual cases should not slow the usual ones, so we just take
! the last i bits anyway.  It's up to collision resolution to do the rest.  If
! we *usually* find the key we're looking for on the first try (and, it turns
! out, we usually do -- the table load factor is kept under 2/3, so the odds
! are solidly in our favor), then it makes best sense to keep the initial index
! computation dirt cheap.
!
! The first half of collision resolution is to visit table indices via this
! recurrence:
!
!     j = ((5*j) + 1) mod 2**i
!
! For any initial j in range(2**i), repeating that 2**i times generates each
! int in range(2**i) exactly once (see any text on random-number generation for
! proof).  By itself, this doesn't help much:  like linear probing (setting
! j += 1, or j -= 1, on each loop trip), it scans the table entries in a fixed
! order.  This would be bad, except that's not the only thing we do, and it's
! actually *good* in the common cases where hash keys are consecutive.  In an
! example that's really too small to make this entirely clear, for a table of
! size 2**3 the order of indices is:
!
!     0 -> 1 -> 6 -> 7 -> 4 -> 5 -> 2 -> 3 -> 0 [and here it's repeating]
!
! If two things come in at index 5, the first place we look after is index 2,
! not 6, so if another comes in at index 6 the collision at 5 didn't hurt it.
! Linear probing is deadly in this case because there the fixed probe order
! is the *same* as the order consecutive keys are likely to arrive.  But it's
! extremely unlikely hash codes will follow a 5*j+1 recurrence by accident,
! and certain that consecutive hash codes do not.
!
! The other half of the strategy is to get the other bits of the hash code
! into play.  This is done by initializing a (unsigned) vrbl "perturb" to the
! full hash code, and changing the recurrence to:
!
!     j = (5*j) + 1 + perturb;
!     perturb >>= PERTURB_SHIFT;
!     use j % 2**i as the next table index;
!
! Now the probe sequence depends (eventually) on every bit in the hash code,
! and the pseudo-scrambling property of recurring on 5*j+1 is more valuable,
! because it quickly magnifies small differences in the bits that didn't affect
! the initial index.  Note that because perturb is unsigned, if the recurrence
! is executed often enough perturb eventually becomes and remains 0.  At that
! point (very rarely reached) the recurrence is on (just) 5*j+1 again, and
! that's certain to find an empty slot eventually (since it generates every int
! in range(2**i), and we make sure there's always at least one empty slot).
!
! Selecting a good value for PERTURB_SHIFT is a balancing act.  You want it
! small so that the high bits of the hash code continue to affect the probe
! sequence across iterations; but you want it large so that in really bad cases
! the high-order hash bits have an effect on early iterations.  5 was "the
! best" in minimizing total collisions across experiments Tim Peters ran (on
! both normal and pathological cases), but 4 and 6 weren't significantly worse.
!
! Historical:  Reimer Behrends contributed the idea of using a polynomial-based
! approach, using repeated multiplication by x in GF(2**n) where an irreducible
! polynomial for each table size was chosen such that x was a primitive root.
! Christian Tismer later extended that to use division by x instead, as an
! efficient way to get the high bits of the hash code into play.  This scheme
! also gave excellent collision statistics, but was more expensive:  two
! if-tests were required inside the loop; computing "the next" index took about
! the same number of operations but without as much potential parallelism
! (e.g., computing 5*j can go on at the same time as computing 1+perturb in the
! above, and then shifting perturb can be done while the table index is being
! masked); and the dictobject struct required a member to hold the table's
! polynomial.  In Tim's experiments the current scheme ran faster, and with
! less code and memory.
  */

  /* Object used as dummy key to fill deleted entries */
  static PyObject *dummy; /* Initialized by first call to newdictobject() */

***************
*** 168,174 ****
  	int ma_fill;  /* # Active + # Dummy */
  	int ma_used;  /* # Active */
  	int ma_size;  /* total # slots in ma_table */
- 	int ma_poly;  /* appopriate entry from polys vector */
  	/* ma_table points to ma_smalltable for small tables, else to
  	 * additional malloc'ed memory.  ma_table is never NULL!  This rule
  	 * saves repeated runtime null-tests in the workhorse getitem and
--- 162,167 ----
***************
*** 202,209 ****
  	(mp)->ma_table = (mp)->ma_smalltable; \
  	(mp)->ma_size = MINSIZE; \
  	(mp)->ma_used = (mp)->ma_fill = 0; \
- 	(mp)->ma_poly = polys[0]; \
- 	assert(MINSIZE < (mp)->ma_poly && (mp)->ma_poly < MINSIZE*2); \
      } while(0)

  PyObject *
--- 195,200 ----
***************
*** 235,262 ****
  This is based on Algorithm D from Knuth Vol. 3, Sec. 6.4.
  Open addressing is preferred over chaining since the link overhead for
  chaining would be substantial (100% with typical malloc overhead).
- However, instead of going through the table at constant steps, we cycle
- through the values of GF(2^n).  This avoids modulo computations, being
- much cheaper on RISC machines, without leading to clustering.
-
- The initial probe index is computed as hash mod the table size.
- Subsequent probe indices use the values of x^i in GF(2^n)-{0} as an offset,
- where x is a root.  The initial offset is derived from hash, too.

  All arithmetic on hash should ignore overflow.

! (This version is due to Reimer Behrends, some ideas are also due to
! Jyrki Alakuijala and Vladimir Marangozov.)

  This function must never return NULL; failures are indicated by returning
  a dictentry* for which the me_value field is NULL.  Exceptions are never
  reported by this function, and outstanding exceptions are maintained.
  */
  static dictentry *
  lookdict(dictobject *mp, PyObject *key, register long hash)
  {
  	register int i;
! 	register unsigned int incr;
  	register dictentry *freeslot;
  	register unsigned int mask = mp->ma_size-1;
  	dictentry *ep0 = mp->ma_table;
--- 226,268 ----
  This is based on Algorithm D from Knuth Vol. 3, Sec. 6.4.
  Open addressing is preferred over chaining since the link overhead for
  chaining would be substantial (100% with typical malloc overhead).

+ The initial probe index is computed as hash mod the table size.  Subsequent
+ probe indices are computed as explained earlier.
+
  All arithmetic on hash should ignore overflow.

! (The details in this version are due to Tim Peters, building on many past
! contributions by Reimer Behrends, Jyrki Alakuijala, Vladimir Marangozov and
! Christian Tismer).

  This function must never return NULL; failures are indicated by returning
  a dictentry* for which the me_value field is NULL.  Exceptions are never
  reported by this function, and outstanding exceptions are maintained.
  */
+
+ /* #define DUMP_HASH_STUFF */
+ #ifdef DUMP_HASH_STUFF
+ static int nEntry = 0, nCollide = 0, nTrip = 0;
+ #define BUMP_ENTRY ++nEntry
+ #define BUMP_COLLIDE ++nCollide
+ #define BUMP_TRIP ++nTrip
+ #define PRINT_HASH_STUFF \
+ 	if ((nEntry & 0x1ff) == 0) \
+ 		fprintf(stderr, "%d %d %d\n", nEntry, nCollide, nTrip)
+
+ #else
+ #define BUMP_ENTRY
+ #define BUMP_COLLIDE
+ #define BUMP_TRIP
+ #define PRINT_HASH_STUFF
+ #endif
+
  static dictentry *
  lookdict(dictobject *mp, PyObject *key, register long hash)
  {
  	register int i;
! 	register unsigned int perturb;
  	register dictentry *freeslot;
  	register unsigned int mask = mp->ma_size-1;
  	dictentry *ep0 = mp->ma_table;
***************
*** 265,273 ****
  	register int checked_error = 0;
  	register int cmp;
  	PyObject *err_type, *err_value, *err_tb;
! 	/* We must come up with (i, incr) such that 0 <= i < ma_size
! 	   and 0 < incr < ma_size and both are a function of hash.
! 	   i is the initial table index and incr the initial probe offset. */
  	i = hash & mask;
  	ep = &ep0[i];
  	if (ep->me_key == NULL || ep->me_key == key)
--- 271,277 ----
  	register int checked_error = 0;
  	register int cmp;
  	PyObject *err_type, *err_value, *err_tb;
! 	BUMP_ENTRY;
  	i = hash & mask;
  	ep = &ep0[i];
  	if (ep->me_key == NULL || ep->me_key == key)
***************
*** 294,309 ****
  		}
  		freeslot = NULL;
  	}
! 	/* Derive incr from hash, just to make it more arbitrary. Note that
! 	   incr must not be 0, or we will get into an infinite loop.*/
! 	incr = hash ^ ((unsigned long)hash >> 3);
!
  	/* In the loop, me_key == dummy is by far (factor of 100s) the
  	   least likely outcome, so test for that last. */
! 	for (;;) {
! 		if (!incr)
! 			incr = 1;	/* and incr will never be 0 again */
! 		ep = &ep0[(i + incr) & mask];
  		if (ep->me_key == NULL) {
  			if (restore_error)
  				PyErr_Restore(err_type, err_value, err_tb);
--- 298,310 ----
  		}
  		freeslot = NULL;
  	}
! 	BUMP_COLLIDE;
  	/* In the loop, me_key == dummy is by far (factor of 100s) the
  	   least likely outcome, so test for that last. */
! 	for (perturb = hash; ; perturb >>= PERTURB_SHIFT) {
! 		BUMP_TRIP;
! 		i = (i << 2) + i + perturb + 1;
! 		ep = &ep0[i & mask];
  		if (ep->me_key == NULL) {
  			if (restore_error)
  				PyErr_Restore(err_type, err_value, err_tb);
***************
*** 335,344 ****
  		}
  		else if (ep->me_key == dummy && freeslot == NULL)
  			freeslot = ep;
- 		/* Cycle through GF(2**n). */
- 		if (incr & 1)
- 			incr ^= mp->ma_poly; /* clears the lowest bit */
- 		incr >>= 1;
  	}
  }

--- 336,341 ----
***************
*** 356,362 ****
  lookdict_string(dictobject *mp, PyObject *key, register long hash)
  {
  	register int i;
! 	register unsigned int incr;
  	register dictentry *freeslot;
  	register unsigned int mask = mp->ma_size-1;
  	dictentry *ep0 = mp->ma_table;
--- 353,359 ----
  lookdict_string(dictobject *mp, PyObject *key, register long hash)
  {
  	register int i;
! 	register unsigned int perturb;
  	register dictentry *freeslot;
  	register unsigned int mask = mp->ma_size-1;
  	dictentry *ep0 = mp->ma_table;
***************
*** 370,377 ****
  		mp->ma_lookup = lookdict;
  		return lookdict(mp, key, hash);
  	}
! 	/* We must come up with (i, incr) such that 0 <= i < ma_size
! 	   and 0 < incr < ma_size and both are a function of hash */
  	i = hash & mask;
  	ep = &ep0[i];
  	if (ep->me_key == NULL || ep->me_key == key)
--- 367,374 ----
  		mp->ma_lookup = lookdict;
  		return lookdict(mp, key, hash);
  	}
! 	BUMP_ENTRY;
! 	PRINT_HASH_STUFF;
  	i = hash & mask;
  	ep = &ep0[i];
  	if (ep->me_key == NULL || ep->me_key == key)
***************
*** 385,400 ****
  		}
  		freeslot = NULL;
  	}
! 	/* Derive incr from hash, just to make it more arbitrary. Note that
! 	   incr must not be 0, or we will get into an infinite loop.*/
! 	incr = hash ^ ((unsigned long)hash >> 3);
!
  	/* In the loop, me_key == dummy is by far (factor of 100s) the
  	   least likely outcome, so test for that last. */
! 	for (;;) {
! 		if (!incr)
! 			incr = 1;	/* and incr will never be 0 again */
! 		ep = &ep0[(i + incr) & mask];
  		if (ep->me_key == NULL)
  			return freeslot == NULL ? ep : freeslot;
  		if (ep->me_key == key
--- 382,394 ----
  		}
  		freeslot = NULL;
  	}
! 	BUMP_COLLIDE;
  	/* In the loop, me_key == dummy is by far (factor of 100s) the
  	   least likely outcome, so test for that last. */
! 	for (perturb = hash; ; perturb >>= PERTURB_SHIFT) {
! 		BUMP_TRIP;
! 		i = (i << 2) + i + perturb + 1;
! 		ep = &ep0[i & mask];
  		if (ep->me_key == NULL)
  			return freeslot == NULL ? ep : freeslot;
  		if (ep->me_key == key
***************
*** 404,413 ****
  			return ep;
  		if (ep->me_key == dummy && freeslot == NULL)
  			freeslot = ep;
- 		/* Cycle through GF(2**n). */
- 		if (incr & 1)
- 			incr ^= mp->ma_poly; /* clears the lowest bit */
- 		incr >>= 1;
  	}
  }

--- 398,403 ----
***************
*** 448,454 ****
  static int
  dictresize(dictobject *mp, int minused)
  {
! 	int newsize, newpoly;
  	dictentry *oldtable, *newtable, *ep;
  	int i;
  	int is_oldtable_malloced;
--- 438,444 ----
  static int
  dictresize(dictobject *mp, int minused)
  {
! 	int newsize;
  	dictentry *oldtable, *newtable, *ep;
  	int i;
  	int is_oldtable_malloced;
***************
*** 456,475 ****

  	assert(minused >= 0);

! 	/* Find the smallest table size > minused, and its poly[] entry. */
! 	newpoly = 0;
! 	newsize = MINSIZE;
! 	for (i = 0; i < sizeof(polys)/sizeof(polys[0]); ++i) {
! 		if (newsize > minused) {
! 			newpoly = polys[i];
! 			break;
! 		}
! 		newsize <<= 1;
! 		if (newsize < 0)   /* overflow */
! 			break;
! 	}
! 	if (newpoly == 0) {
! 		/* Ran out of polynomials or newsize overflowed. */
  		PyErr_NoMemory();
  		return -1;
  	}
--- 446,457 ----

  	assert(minused >= 0);

! 	/* Find the smallest table size > minused. */
! 	for (newsize = MINSIZE;
! 	     newsize <= minused && newsize >= 0;
! 	     newsize <<= 1)
! 		;
! 	if (newsize < 0) {
  		PyErr_NoMemory();
  		return -1;
  	}
***************
*** 511,517 ****
  	mp->ma_table = newtable;
  	mp->ma_size = newsize;
  	memset(newtable, 0, sizeof(dictentry) * newsize);
- 	mp->ma_poly = newpoly;
  	mp->ma_used = 0;
  	i = mp->ma_fill;
  	mp->ma_fill = 0;
--- 493,498 ----
***************
*** 1255,1261 ****
  	if (a->ma_used != b->ma_used)
  		/* can't be equal if # of entries differ */
  		return 0;
! 	
  	/* Same # of entries -- check all of 'em.  Exit early on any diff. */
  	for (i = 0; i < a->ma_size; i++) {
  		PyObject *aval = a->ma_table[i].me_value;
--- 1236,1242 ----
  	if (a->ma_used != b->ma_used)
  		/* can't be equal if # of entries differ */
  		return 0;
!
  	/* Same # of entries -- check all of 'em.  Exit early on any diff. */
  	for (i = 0; i < a->ma_size; i++) {
  		PyObject *aval = a->ma_table[i].me_value;

------=_NextPart_000_0005_01C0EA0F.A145F760--
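To make the comment blocks in the attachment concrete, here is a small Python model of both probe schemes -- a sketch only, not part of the patch; poly_works() and probe_order() are invented names, and the constants mirror the ones in the code:

    # Sketch of the two probe schemes described in the dict.txt comments.
    # poly_works() is the brute-force "works" test from the old comment;
    # probe_order() follows the new j = 5*j + 1 + perturb recurrence.
    PERTURB_SHIFT = 5

    def poly_works(poly, i):
        # 1 iff doubling-and-reducing by poly cycles through every
        # integer in 1 .. 2**i-1 exactly once (old GF(2**n) scheme).
        seen = {}
        k = 1
        for trip in range(2**i - 1):
            if seen.has_key(k):
                return 0
            seen[k] = 1
            k = k << 1
            if k >= 2**i:
                k = k ^ poly
        return k == 1   # back at the start after visiting everything

    def probe_order(h, i, n):
        # First n table slots visited for hash code h, table size 2**i.
        mask = 2**i - 1
        j = h & mask
        slots = [j]
        perturb = h
        while len(slots) < n:
            j = 5*j + 1 + perturb
            perturb = perturb >> PERTURB_SHIFT
            slots.append(j & mask)
        return slots

    print poly_works(8 + 3, 3)     # 1: the first active polys[] entry
    print probe_order(0, 3, 9)     # [0, 1, 6, 7, 4, 5, 2, 3, 0]
    print probe_order(42 << 16, 15, 5)  # high hash bits perturb the walk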
*/ for (i =3D 0; i < a->ma_size; i++) { PyObject *aval =3D a->ma_table[i].me_value; --- 1236,1242 ---- if (a->ma_used !=3D b->ma_used) /* can't be equal if # of entries differ */ return 0; !=20 /* Same # of entries -- check all of 'em. Exit early on any diff. */ for (i =3D 0; i < a->ma_size; i++) { PyObject *aval =3D a->ma_table[i].me_value; ------=_NextPart_000_0005_01C0EA0F.A145F760-- From tim.one@home.com Fri Jun 1 02:32:30 2001 From: tim.one@home.com (Tim Peters) Date: Thu, 31 May 2001 21:32:30 -0400 Subject: [Python-Dev] One more dict trick In-Reply-To: <20010531044332.B5026@thyrsus.com> Message-ID: Heh. I was implementing 128-bit floats in software, for Cray, in about 1980. They didn't do it because they *wanted* to make the Cray boxes look like pigs . A 128-bit float type is simply necessary for some scientific work: not all problems are well-conditioned, and the "extra" bits can vanish fast. Went thru the same bit at KSR. Just yesterday Konrad Hinsen was worrying on c.l.py that his scripts that took 2 hours using native floats zoomed to 5 days when he started using GMP's arbitrary-precision float type *just* to get 100 bits of precision. When KSR died, the KSR-3 on the drawing board had 128-bit registers. I was never quite sure why the founders thought that would be a killer selling point, but it wasn't for floats. Down in the trenches we thought it would be mondo cool to have an address space so large that for the rest of our lives we'd never need to bother calling free() again <0.8 wink>. From tim.one@home.com Fri Jun 1 02:46:11 2001 From: tim.one@home.com (Tim Peters) Date: Thu, 31 May 2001 21:46:11 -0400 Subject: [Python-Dev] One more dict trick In-Reply-To: <20010531124533.J690@xs4all.nl> Message-ID: [Thomas Wouters] > Why ? Bumping register size doesn't mean Intel expects to use it all as > address space. They could be used for video-processing, Bingo. Common wisdom holds that vector machines are dead, but the truth is virtually *everyone* runs on a vector box now: Intel just renamed "vector" to "multimedia" (or AMD to "3D Now!"), and adopted a feeble (but ever-growing) subset of traditional vector machines' instruction sets. > or to represent a modest range of rationals , or to help core > 'net routers deal with those nasty IPv6 addresses. KSR's founders had in mind bit-level addressability of networks of machines spanning the globe. Were he to press the point, though, I'd have to agree with Eric that they didn't really *need* 128 bits for that modest goal. > I'm sure cryptomunchers would like bigger registers as well. Agencies we can't talk about would like them as big as they can get them. Each vector register in a Cray box actually consisted of 64 64-bit words, or 4K bits per register. Some "special" models were constructed where the vector FPU was thrown away and additional bit-fiddling units added in its place: they really treated the vector registers as giant bitstrings, and didn't want to burn 64 clock cycles just to do, e.g., "one" conceptual xor. > Oh wait... I get it! You were trying to get yourself in the > historybooks as the guy that said "64 bits ought to be enough for > everyone" :-) That would be foolish indeed! 128, though, now *that's* surely enough for at least a decade . From fdrake@acm.org Fri Jun 1 02:45:45 2001 From: fdrake@acm.org (Fred L. Drake, Jr.) 
From fdrake@acm.org Fri Jun 1 02:45:45 2001
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Thu, 31 May 2001 21:45:45 -0400 (EDT)
Subject: [Python-Dev] One more dict trick
In-Reply-To:
References: <20010531044332.B5026@thyrsus.com>
Message-ID: <15126.62409.909290.736779@cj42289-a.reston1.va.home.com>

Tim Peters writes:
 > When KSR died, the KSR-3 on the drawing board had 128-bit registers.  I was
 > never quite sure why the founders thought that would be a killer selling
 > point, but it wasn't for floats.  Down in the trenches we thought it would
 > be mondo cool to have an address space so large that for the rest of our
 > lives we'd never need to bother calling free() again <0.8 wink>.

  And given what (little) I know about the memory architecture on those
things, that actually would have been quite reasonable on that platform!

  -Fred

--
Fred L. Drake, Jr.
PythonLabs at Digital Creations

From tim.one@home.com Fri Jun 1 03:23:47 2001
From: tim.one@home.com (Tim Peters)
Date: Thu, 31 May 2001 22:23:47 -0400
Subject: [Python-Dev] FW: CP4E and Python newbies, it works!
Message-ID:

Good for the soul!

-----Original Message-----
From: python-list-admin@python.org [mailto:python-list-admin@python.org]
On Behalf Of Ron Stephens [mailto:rdsteph@earthlink.net]
Sent: Thursday, May 31, 2001 7:12 PM
To: python-list@python.org
Subject: CP4E and Python newbies, it works!

I am a complete newbie, and with a very low programming IQ.  Although I had
programmed a little in college thirty years ago, in Basic, PL/1 and a very
little assembler, and fooled around in later years on PC's at home with
Basic, then tried PERL, then an effort at Java, they were all too much
trouble to really use to program, given that it was a *hobby* that was
supposed to be fun.  After all, I have a demanding day job that has nothing
to do with software, that requires extensive travel, and four kids, a wife,
two dogs, and a cat.  Java et al, by the time I had digested a couple of
books and put in a lot of hours, was just no fun at all to program; and I
had to look in the book every other line of code just to recall the syntax
etc.; I could not keep it in my head.

Now, four months into Python, after being attracted by reading a blurb
about Guido van Rossum's Computer Programming for Everybody project, I am
in awe of his achievement.  I am having fun; and if I can do so then almost
anyone can.  I am really absent minded, lazy, and not good at detail.  Yet
I have done the following in four months, and I believe Python therefore
has the potential to open up programming to a much wider audience for a lot
of people, which is nice:

1. I have written a half dozen scripts that are meaningful to me in Python,
more than I ever accomplished with any other language.

2. I am able to have fun by sitting down in the evening, or especially on a
weekend, and just programming in Python.  The syntax and keywords are
gratifyingly just in my head, enough anyway that I can just program like I
am having a conversation, and check the details later for errors etc.  This
is the most satisfying thing of all.

3. I find the debugger just works; magically, it helps me turn my scripts
into actual working programs, simply by rather mindlessly following the
road laid out for me by using the debugger.

4. I have pleasurably read more Python books from front cover to back than
I care to admit.  I must be enjoying myself ;-)))

5. I am exploring Jython, which is also pleasurable.
After fooling around with Java a couple of years ago, it is really a kick
to see jython generating such detailed Java code for me, just as if I had
written it (but it would have taken me untold pain to actually do so in
Java).  Whether or not I actually end up using the java code so generated,
I still am enjoying the sheer experience.

6. I have Zope and other things to look forward to.

7. I am able to enjoy the discussions on this newsgroup, even though they
are over my head technically.  I find them intriguing.

Now, I may never actually accomplish anything truly useful by my
programming.  But I am happy.  I hope that others, younger and brighter
than myself, who have an interest in programming, but need the right
stimulus to get going, will find Python and produce programs of real value.

I think Guido van Rossum and his team should be very proud of what they are
enabling.  The CP4E idea is alive and well.  My hat's off to Guido and the
whole community which he has spawned, especially those on this newsgroup.
I am humbled and honored to read your erudite technical discussions, as a
voyeur of mysteries and wonders I can only dimly see on the horizon, but
that nonetheless fill me with mental delight.

Ron Stephens

--
http://mail.python.org/mailman/listinfo/python-list

From esr@thyrsus.com Fri Jun 1 04:51:48 2001
From: esr@thyrsus.com (Eric S. Raymond)
Date: Thu, 31 May 2001 23:51:48 -0400
Subject: [Python-Dev] One more dict trick
In-Reply-To: ; from tim.one@home.com on Thu, May 31, 2001 at 09:32:30PM -0400
References: <20010531044332.B5026@thyrsus.com>
Message-ID: <20010531235148.B14591@thyrsus.com>

Tim Peters:
> A 128-bit float type is simply necessary for some
> scientific work: not all problems are well-conditioned, and the "extra"
> bits can vanish fast.

Makes me wonder how competent your customers' numerical analysts were.
Where the heck did they think they were getting data with that many
digits of accuracy?  (Note that I didn't say "precision"...)
--
		Eric S. Raymond

Strict gun laws are about as effective as strict drug laws...It pains
me to say this, but the NRA seems to be right: The cities and states
that have the toughest gun laws have the most murder and mayhem.
	-- Mike Royko, Chicago Tribune

From esr@thyrsus.com Fri Jun 1 04:54:33 2001
From: esr@thyrsus.com (Eric S. Raymond)
Date: Thu, 31 May 2001 23:54:33 -0400
Subject: [Python-Dev] One more dict trick
In-Reply-To: ; from tim.one@home.com on Thu, May 31, 2001 at 09:46:11PM -0400
References: <20010531124533.J690@xs4all.nl>
Message-ID: <20010531235433.C14591@thyrsus.com>

Tim Peters:
> Agencies we can't talk about would like them as big as they can get them.
> Each vector register in a Cray box actually consisted of 64 64-bit words, or
> 4K bits per register.  Some "special" models were constructed where the
> vector FPU was thrown away and additional bit-fiddling units added in its
> place: they really treated the vector registers as giant bitstrings, and
> didn't want to burn 64 clock cycles just to do, e.g., "one" conceptual xor.

You've got a point...but I don't think it's really economical to build
that kind of hardware into general-purpose processors.  You end up with
a camel.  You know, a horse designed by committee?
--
		Eric S. Raymond

To make inexpensive guns impossible to get is to say that you're putting
a money test on getting a gun.  It's racism in its worst form.
	-- Roy Innis, president of the Congress of Racial Equality (CORE), 1988

From tim.one@home.com Fri Jun 1 07:58:08 2001
From: tim.one@home.com (Tim Peters)
Date: Fri, 1 Jun 2001 02:58:08 -0400
Subject: [Python-Dev] One more dict trick
In-Reply-To: <20010531235148.B14591@thyrsus.com>
Message-ID:

[Tim]
> A 128-bit float type is simply necessary for some scientific work: not
> all problems are well-conditioned, and the "extra" bits can vanish fast.

[ESR]
> Makes me wonder how competent your customers' numerical analysts were.
> Where the heck did they think they were getting data with that many
> digits of accuracy?  (Note that I didn't say "precision"...)

Not all scientific work consists of predicting the weather with inputs
known to half a digit on a calm day.  Knuth gives examples of
ill-conditioned problems where resorting to unbounded rationals is faster
than any known stable f.p. approach (stuck with limited precision) --
think, e.g., chaotic systems here, which includes parts of many
hydrodynamics problems in real life.

Some scientific work involves modeling ab initio across trillions of
computations (and on a Cray box in particular, where addition didn't even
bother to round, nor multiplication bother to compute the full product
tree, the error bounds per operation were much worse than in a 754 world).
You shouldn't overlook either that algorithms often needed massive
rewriting to exploit vector and parallel architectures, and in a world
where a supremely competent numerical analysis can take a month to verify
the numerical robustness of a new algorithm covering two pages of Fortran,
a million lines of massively reworked seat-of-the-pants modeling code
couldn't be trusted at all without running it under many conditions in at
least two precisions (it only takes one surprise catastrophic cancellation
to destroy everything).

A major oil company once threatened to sue Cray when their reservoir model
produced wildly different results under a new release of the compiler.
Some exceedingly sharp analysts worked on that one for a solid week.
Turned out the new compiler evaluated a subexpression A*B*C by doing (B*C)
first instead of (A*B), because it was faster in context (and fine to do
so by Fortran's rules).  It so happened A was very large, and B and C both
small, and doing B*C first caused the whole product to underflow to zero
where doing A*B first left a product of roughly C's magnitude.

I can't imagine how they ever would have found this if they weren't able
to recompile the code using twice the precision (which worked fine thanks
to the larger dynamic range), then tracing to see where the runs diverged.
Even then it took a week because this was 100s of thousands of lines of
crufty Fortran that ran for hours on the world's then-fastest machine
before delivering bogus results.

BTW, if you think the bulk of the world's numeric production code has even
been *seen* by a qualified numerical analyst, you should ride on planes
more often.
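The A*B*C association effect Tim describes is easy to reproduce with
ordinary IEEE-754 doubles; a tiny sketch with made-up magnitudes (nothing
to do with the actual reservoir-model numbers):

    # Made-up magnitudes illustrating the A*B*C story above; any
    # IEEE-754 double implementation behaves this way.
    A = 1e300    # very large
    B = 1e-300   # small
    C = 1e-160   # small
    print (A*B)*C    # 1e-160, roughly C's magnitude
    print A*(B*C)    # 0.0 -- B*C underflows to zero first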
From tim.one@home.com Fri Jun 1 08:08:28 2001
From: tim.one@home.com (Tim Peters)
Date: Fri, 1 Jun 2001 03:08:28 -0400
Subject: [Python-Dev] One more dict trick
In-Reply-To: <20010531235433.C14591@thyrsus.com>
Message-ID:

[EAR]
> You've got a point...

Well, really, they do -- but they had a much more compelling point when
the Cold War came with an unlimited budget.

> but I don't think it's really economical to build that kind of
> hardware into general-purpose processors.

Economical?  The marginal cost of adding even nutso new features in
silicon now for mass-market chips is pretty close to zero.  Indeed, if
you're in the speech recog or 3D imaging games (i.e., things that still
tax a PC), Intel comes around *begging* for new ideas to use up all their
chip real estate.  The only one I recall them turning down was a request
from Dragon's founder to add an instruction that, given x and y, returned
log(exp(x)+exp(y)).  They were skeptical, and turned out even *we* didn't
need it.

> You end up with a camel.  You know, a horse designed by committee?

Yup!  But that's the camel Intel rides to the bank, so it will probably
grow more humps, on which to hang more bags of gold.

From esr@thyrsus.com Fri Jun 1 08:23:16 2001
From: esr@thyrsus.com (Eric S. Raymond)
Date: Fri, 1 Jun 2001 03:23:16 -0400
Subject: [Python-Dev] One more dict trick
In-Reply-To: ; from tim.one@home.com on Fri, Jun 01, 2001 at 02:58:08AM -0400
References: <20010531235148.B14591@thyrsus.com>
Message-ID: <20010601032316.A15635@thyrsus.com>

Tim Peters:
> Not all scientific work consists of predicting the weather with inputs known
> to half a digit on a calm day.  Knuth gives examples of
> ill-conditioned problems where resorting to unbounded rationals is faster
> than any known stable f.p. approach (stuck with limited precision) -- think,
> e.g., chaotic systems here, which includes parts of many hydrodynamics
> problems in real life.

Hmmm...good answer.  I still believe it's the case that real-world
measurements max out below 48 bits or so of precision because the real
world is a noisy, fuzzy place.  But I can see that most of the algorithms
for partial differential equations would multiply those by very small or
very large quantities repeatedly.  The range-doubling trick for catching
divergences is neat, too.  So maybe there's a market for 128-bit floats
after all.

I'm still skeptical about how likely those applications are to influence
the architecture of general-purpose processors.  I saw a study once that
said heavy-duty scientific floating point only accounts for about 2% of
the computing market -- and I think it's significant that MMX instructions
and so forth entered the Intel line to support *games*, not Navier-Stokes
calculations.  That 2% will have to get a lot bigger before I can see
Intel doubling its word size again.  It's not just the processor design;
the word size has huge implications for buses, memory controllers, and the
whole system architecture.
--
		Eric S. Raymond

The United States is in no way founded upon the Christian religion
	-- George Washington & John Adams, in a diplomatic message to Malta.

From pf@artcom-gmbh.de Fri Jun 1 08:22:50 2001
From: pf@artcom-gmbh.de (Peter Funk)
Date: Fri, 1 Jun 2001 09:22:50 +0200 (MEST)
Subject: [Python-Dev] precision thread (was One more dict trick)
Message-ID:

Eric:
> > You end up with a camel.  You know, a horse designed by committee?

Tim:
> Yup!  But that's the camel Intel rides to the bank, so it will probably grow
> more humps, on which to hang more bags of gold.

cam*ls?  Guido is only one week on vacation and soon heretical words
show up here. ;-)

sorry, couldn't resist, Peter
From thomas@xs4all.net Fri Jun 1 08:28:01 2001
From: thomas@xs4all.net (Thomas Wouters)
Date: Fri, 1 Jun 2001 09:28:01 +0200
Subject: [Python-Dev] Damn... I think I might have just muffed a checkin
In-Reply-To: <15126.34825.167026.520535@beluga.mojam.com>; from skip@pobox.com on Thu, May 31, 2001 at 01:06:01PM -0500
References: <15126.34825.167026.520535@beluga.mojam.com>
Message-ID: <20010601092800.K690@xs4all.nl>

On Thu, May 31, 2001 at 01:06:01PM -0500, Skip Montanaro wrote:

> I just updated httplib.py to expand the list of names in its __all__ list.
> I was operating on version 1.34.  After the checkin I am looking at version
> 1.34.2.1.  I see that Lib/CVS/Tag exists in my directory tree and says
> "release21-maint".  Did I muff it?  If so, how should I do an unmuff
> operation?

You had a sticky tag on the file, probably because you used
'-rrelease21-maint' on a cvs checkout or update.  Good thing it was
release21-maint, though, and not some random other revision, or you would
have created another branch :-)  You can remove stickyness by using
'cvs update -A'.  I personally just have two trees, ~/python/python-2.2
and ~/python/python-2.1.1, where the last one was checked out with
-rrelease21-maint.

--
Thomas Wouters

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!

From gmcm@hypernet.com Fri Jun 1 12:29:28 2001
From: gmcm@hypernet.com (Gordon McMillan)
Date: Fri, 1 Jun 2001 07:29:28 -0400
Subject: [Python-Dev] One more dict trick
In-Reply-To:
References: <20010531235433.C14591@thyrsus.com>
Message-ID: <3B174458.1998.46DEEE2B@localhost>

[ESR]
> > You end up with a camel.  You know, a horse designed by
> > committee?

[Tim]
> Yup!  But that's the camel Intel rides to the bank, so it will
> probably grow more humps, on which to hang more bags of gold.

Been a camel a long time, too.  x86 assembler is the, er, Perl of
assemblers.

- Gordon

From mwh@python.net Fri Jun 1 12:54:40 2001
From: mwh@python.net (Michael Hudson)
Date: 01 Jun 2001 12:54:40 +0100
Subject: [Python-Dev] another dict crasher
Message-ID:

Adapted from a report on comp.lang.python from Wolfgang Lipp:

    class Child:
        def __init__(self, parent):
            self.__dict__['parent'] = parent
        def __getattr__(self, attr):
            self.parent.a = 1
            self.parent.b = 1
            self.parent.c = 1
            self.parent.d = 1
            self.parent.e = 1
            self.parent.f = 1
            self.parent.g = 1
            self.parent.h = 1
            self.parent.i = 1
            return getattr(self.parent, attr)

    class Parent:
        def __init__(self):
            self.a = Child(self)

    print Parent().__dict__

segfaults both 2.1 and current (well, maybe a day old) CVS.  Haven't
tried Tim's latest patch, but I don't believe that will make any
difference.

It's obvious what's happening; the dict's resizing inside the for loop
in dict_repr and the ep pointer is dangling.

By the time we've shaken all of these out of dictobject.c it's going to
be pretty close to free-threading safe, I'd have thought.

reentrancy-sucks-ly y'rs
M.

--
  But since I'm not trying to impress anybody in The Software Big Top,
  I'd rather walk the wire using a big pole, a safety harness, a net,
  and with the wire not more than 3 feet off the ground.
                                   -- Grant Griffin, comp.lang.python
From mwh@python.net Fri Jun 1 13:12:55 2001
From: mwh@python.net (Michael Hudson)
Date: 01 Jun 2001 13:12:55 +0100
Subject: [Python-Dev] another dict crasher
In-Reply-To: Michael Hudson's message of "01 Jun 2001 12:54:40 +0100"
References:
Message-ID:

Michael Hudson writes:

> Adapted from a report on comp.lang.python from Wolfgang Lipp:
[snip]
> segfaults both 2.1 and current (well, maybe a day old) CVS.  Haven't
> tried Tim's latest patch, but I don't believe that will make any
> difference.
>
> It's obvious what's happening; the dict's resizing inside the
> for loop in dict_repr and the ep pointer is dangling.

Actually this crash was dict_print (I always forget about tp_print...).

It's pretty easy to mend:

*** dictobject.c	Fri Jun  1 13:08:13 2001
--- dictobject.c-fixed	Fri Jun  1 12:59:07 2001
***************
*** 793,795 ****
  	any = 0;
! 	for (i = 0, ep = mp->ma_table; i < mp->ma_size; i++, ep++) {
  		if (ep->me_value != NULL) {
--- 793,796 ----
  	any = 0;
! 	for (i = 0; i < mp->ma_size; i++) {
! 		ep = &mp->ma_table[i];
  		if (ep->me_value != NULL) {
***************
*** 833,835 ****
  	any = 0;
! 	for (i = 0, ep = mp->ma_table; i < mp->ma_size && v; i++, ep++) {
  		if (ep->me_value != NULL) {
--- 834,837 ----
  	any = 0;
! 	for (i = 0; i < mp->ma_size && v; i++) {
! 		ep = &mp->ma_table[i];
  		if (ep->me_value != NULL) {

I'm not sure this stops still more Machiavellian behaviour from crashing
the interpreter, and you can certainly get items being printed more than
once or not at all.  I'm not sure this last is a problem; if the user's
being this contrary there's only so much we can do to help him or her.

Cheers,
M.

--
  I also feel it essential to note, [...], that Description Logics,
  non-Monotonic Logics, Default Logics and Circumscription Logics can
  all collectively go suck a cow.  Thank you.
              -- http://advogato.org/person/Johnath/diary.html?start=4

From Samuele Pedroni Fri Jun 1 13:49:11 2001
From: Samuele Pedroni (Samuele Pedroni)
Date: Fri, 1 Jun 2001 14:49:11 +0200 (MET DST)
Subject: [Python-Dev] __xxxattr__ caching semantic
Message-ID: <200106011249.OAA05837@core.inf.ethz.ch>

Hi.

What is the intended semantics wrt __xxxattr__ caching:

    class X:
        pass

    def cga(self, name):
        print name

    def iga(name):
        print name

    x = X()
    x.__dict__['__getattr__'] = iga  # 1.
    x.__getattr__ = iga              # 2.
    X.__dict__['__getattr__'] = cga  # 3.
    X.__getattr__ = cga              # 4.
    x.a

According to the manual
(http://www.python.org/doc/current/ref/customization.html) all the
variants should fail: x.a should raise an error, and they should have
no effect.  In practice 4. works.  Is that an implementation/manual
mismatch?  Is this intended?  Is there code around using 4.?

I'm asking this because jython has differences/bugs in this respect.

I imagine that 1.-4. should work for all other __magic__ methods (this
should be fixed in jython for some methods), OTOH jython has such a
restriction on __del__ too, and this one cannot be removed (it is not
simply a matter of caching/non caching).

regards, Samuele Pedroni.

From Greg.Wilson@baltimore.com Fri Jun 1 13:59:28 2001
From: Greg.Wilson@baltimore.com (Greg Wilson)
Date: Fri, 1 Jun 2001 08:59:28 -0400
Subject: [Python-Dev] re: %b format
Message-ID: <930BBCA4CEBBD411BE6500508BB3328F2E1E47@nsamcanms1.ca.baltimore.com>

My thanks to everyone who commented on the idea of adding a binary
format specifier to Python.  I'll volunteer to draft the PEP ---
volunteers for a co-author?

Greg

-----------------------------------------------------------------------------------------------------------------
The information contained in this message is confidential and is
intended for the addressee(s) only.  If you have received this message
in error or there are any problems please notify the originator
immediately.  The unauthorized use, disclosure, copying or alteration
of this message is strictly forbidden.  Baltimore Technologies plc will
not be liable for direct, special, indirect or consequential damages
arising from alteration of the contents of this message by a third
party or as a result of any virus being passed on.

In addition, certain Marketing collateral may be added from time to
time to promote Baltimore Technologies products, services, Global
e-Security or appearance at trade shows and conferences.

This footnote confirms that this email message has been swept by
Baltimore MIMEsweeper for Content Security threats, including computer
viruses.
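For a rough sense of what a binary conversion could produce -- a sketch
only, since the actual semantics would be for the PEP to pin down, and
to_bin() here is just an invented stand-in for the proposed %b:

    # Hypothetical model of what '%b' might emit; not the PEP's spec.
    def to_bin(n):
        if n < 0:
            return '-' + to_bin(-n)
        digits = ''
        while n:
            digits = str(n & 1) + digits
            n = n >> 1
        return digits or '0'

    print to_bin(10)    # 1010
    print to_bin(0)     # 0
    print to_bin(-6)    # -110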
From tismer@tismer.com Fri Jun 1 14:56:26 2001
From: tismer@tismer.com (Christian Tismer)
Date: Fri, 01 Jun 2001 15:56:26 +0200
Subject: [Python-Dev] One more dict trick
References:
Message-ID: <3B179F0A.CFA3B2C@tismer.com>

Tim Peters wrote:
>
> Another version of the patch attached, a bit faster and with a large new
> comment block explaining it.  It's looking good!  As I hope the new comments
> make clear, nothing about this approach is "a mystery" -- there are
> explainable reasons for each fiddly bit.  This gives me more confidence in
> it than in the previous approach, and, indeed, it turned out that when I
> *thought* "hmm!  I bet this change would be a little faster!", it actually
> was.

Thanks a lot for this nice patch.  It looks like a real improvement.
Also thanks for mentioning my division idea.  Since all bits of the
hash are eventually taken into account, this idea has somehow survived
in an even more efficient solution, good end, file closed.
(and good that I saved the time to check my patch in, lately :-)

cheers - chris

--
Christian Tismer             :^)
Mission Impossible 5oftware  :    Have a break! Take a ride on Python's
Kaunstr. 26                  :    *Starship* http://starship.python.net/
14163 Berlin                 :    PGP key -> http://wwwkeys.pgp.net/
PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF
     where do you want to jump today?   http://www.stackless.com/

From Samuele Pedroni Fri Jun 1 15:18:20 2001
From: Samuele Pedroni (Samuele Pedroni)
Date: Fri, 1 Jun 2001 16:18:20 +0200 (MET DST)
Subject: [Python-Dev] Re: [Jython-dev] Using PyChecker in Jython
Message-ID: <200106011418.QAA13570@core.inf.ethz.ch>

Hi.

[Neal Norwitz]
> Hello!
>
> I have created a program PyChecker to perform Python source code checking.
> (http://pychecker.sourceforge.net).
>
> PyChecker is implemented in C Python and does some "tricky" things.
> It doesn't currently work in Jython due to the module dis (disassemble
> code) not being available in Jython.
>
> Is there any fundamental problem with getting PyChecker to work under
> Jython?
>
> Here's a high-level overview of what PyChecker does:
>
>     imp.find_module()
>     imp.load_module()
>     for each object in dir(module):
>         # object can be a class, function, imported module, etc.
>         for each instruction in disassembled byte code:
>             # handle each instruction appropriately
>
> This hides a lot of details, but I do lots of things like getting the
> code objects from the classes, methods, and functions, look at the
> arguments in functions, etc.
>
> Is it possible to make work in Jython?  Easy?
>
> Thanks for any guidance,
> Neal

It would be great -- really -- but about easy?  As easy as making
PyChecker work on source code without using dis and without
importing/executing modules and their top defs.  I think there will be
no dis support on the jython side (we produce java bytecode, and
getting "back" to python vm bytecode would be very tricky, not very
elegant, etc.) any time soon.

Seriously, two possible workaround hacks (they are also not very easy);
this is just after small brainstorming and ignoring the concrete needs
and code of PyChecker:

+) the more elegant one, but maybe still too difficult or requiring too
much work: let PyChecker run under CPython even when checking jython
code.  jython code can compile down to py vm bytecode but then does not
run: why?  java class imports and the jython-specific builtin modules
(not so many).  So one needs to implement a sufficient amount of python
code (an import hook, etc.) that does the minimal partial evaluation
required and the required amount of loading & introspection on java and
jython-specific stuff in order to have the imports work and PyChecker
fed with the things it needs.  This means dealing with the java class
format, or a two-pass approach: run the code under jython in order to
gather the information needed to load it successfully under python.
If the top level code contains conditionals that depend on jython stuff
this could be hard, but one can ignore that (at least for starting).
Clearly the main PyChecker loop would require some adaptation, and
maybe include some logic to check some jython-specific stuff
(subclassing from java, etc).

*) let an adapted PyChecker run under jython, and obtain someway the
needed py vm bytecode stream from a source -> py vm bytecode compiler
written in python (such a thing exists -- if I remember well).

And similar ideas ...

regards, Samuele Pedroni.
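The loop Neal sketches takes only a few lines on top of CPython's dis
module -- which is exactly the piece Jython lacks.  An illustrative
sketch, not PyChecker's actual source (the FunctionType filter and the
check() name are invented here):

    # Rough model of the PyChecker loop described above.
    import dis, imp, types

    def check(name):
        file, path, desc = imp.find_module(name)
        module = imp.load_module(name, file, path, desc)
        for attr in dir(module):
            obj = getattr(module, attr)
            if type(obj) is types.FunctionType:
                print 'disassembling', attr
                dis.dis(obj.func_code)   # "handle each instruction" here

    check('string')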
From barry@digicool.com Fri Jun 1 15:43:59 2001
From: barry@digicool.com (Barry A. Warsaw)
Date: Fri, 1 Jun 2001 10:43:59 -0400
Subject: [Python-Dev] Damn... I think I might have just muffed a checkin
References: <15126.34825.167026.520535@beluga.mojam.com>
	<20010601092800.K690@xs4all.nl>
Message-ID: <15127.43567.202950.192811@anthem.wooz.org>

>>>>> "TW" == Thomas Wouters writes:

    TW> You can remove stickyness by using 'cvs update -A'.  I
    TW> personally just have two trees, ~/python/python-2.2 and
    TW> ~/python/python-2.1.1, where the last one was checked out with
    TW> -rrelease21-maint.

Very good advice for anybody playing with branches!

-Barry

From barry@digicool.com Fri Jun 1 16:12:33 2001
From: barry@digicool.com (Barry A. Warsaw)
Date: Fri, 1 Jun 2001 11:12:33 -0400
Subject: [Python-Dev] another dict crasher
References:
Message-ID: <15127.45281.435849.822222@anthem.wooz.org>

>>>>> "MH" == Michael Hudson writes:

    MH> segfaults both 2.1 and current (well, maybe a day old) CVS.
    MH> Haven't tried Tim's latest patch, but I don't believe that
    MH> will make any difference.

That is highly, highly nasty.  Sounds to me like there ought to be an
emergency 2.1.1 patch made for this, bumping Thomas's work to 2.1.2 if
necessary.  And if we can trojan in the NAIPL (New And Improved Python
License), I wouldn't mind.  :)

-Barry

From jeremy@digicool.com Fri Jun 1 16:18:05 2001
From: jeremy@digicool.com (Jeremy Hylton)
Date: Fri, 1 Jun 2001 11:18:05 -0400 (EDT)
Subject: [Python-Dev] another dict crasher
In-Reply-To: <15127.45281.435849.822222@anthem.wooz.org>
References: <15127.45281.435849.822222@anthem.wooz.org>
Message-ID: <15127.45613.947590.246269@slothrop.digicool.com>

>>>>> "BAW" == Barry A Warsaw writes:
>>>>> "MH" == Michael Hudson writes:

    MH> segfaults both 2.1 and current (well, maybe a day old) CVS.
    MH> Haven't tried Tim's latest patch, but I don't believe that will
    MH> make any difference.

    BAW> That is highly, highly nasty.  Sounds to me like there ought to
    BAW> be an emergency 2.1.1 patch made for this, bumping Thomas's
    BAW> work to 2.1.2 if necessary.  And if we can trojan in the NAIPL
    BAW> (New And Improved Python License), I wouldn't mind.  :)
We can release a critical patch for this bug, ala the CriticalPatches
page for the Python 2.0 release.

Jeremy

From mwh@python.net Fri Jun 1 17:03:55 2001
From: mwh@python.net (Michael Hudson)
Date: Fri, 1 Jun 2001 17:03:55 +0100 (BST)
Subject: [Python-Dev] another dict crasher
In-Reply-To: <15127.45281.435849.822222@anthem.wooz.org>
Message-ID:

On Fri, 1 Jun 2001, Barry A. Warsaw wrote:
>
> >>>>> "MH" == Michael Hudson writes:
>
>     MH> segfaults both 2.1 and current (well, maybe a day old) CVS.
>     MH> Haven't tried Tim's latest patch, but I don't believe that
>     MH> will make any difference.
>
> That is highly, highly nasty.

Yes.

> Sounds to me like there ought to be an emergency 2.1.1 patch made for
> this, bumping Thomas's work to 2.1.2 if necessary.

Really?  Two mild counterpoints:

1) It's *old*; 1.5.2 at least, and that's only because that's the
   oldest version I happen to have lying around.  It's quite similar
   to the test_mutants oddness in some ways.

2) There's at least one other crasher in 2.1; the one in the compiler
   where a variable is referenced in a class and in a contained
   method.  (I've actually run into that one).

But a "fix these crashers" release seems reasonable if there's someone
with the time to put it out (not me!).

> And if we can trojan in the NAIPL (New And Improved Python
> License), I wouldn't mind. :)

Well me neither...

Cheers,
M.

From skip@pobox.com (Skip Montanaro) Fri Jun 1 17:26:35 2001
From: skip@pobox.com (Skip Montanaro) (Skip Montanaro)
Date: Fri, 1 Jun 2001 11:26:35 -0500
Subject: [Python-Dev] Damn... I think I might have just muffed a checkin
In-Reply-To: <20010601092800.K690@xs4all.nl>
References: <15126.34825.167026.520535@beluga.mojam.com>
	<20010601092800.K690@xs4all.nl>
Message-ID: <15127.49723.186388.220648@beluga.mojam.com>

    Thomas> I personally just have two trees, ~/python/python-2.2 and
    Thomas> ~/python/python-2.1.1, where the last one was checked out with
    Thomas> -rrelease21-maint.

Thanks, good advice.  httplib.py has now been updated on both the head
and release21-maint branches.

Skip

From loewis@informatik.hu-berlin.de Fri Jun 1 18:07:52 2001
From: loewis@informatik.hu-berlin.de (Martin von Loewis)
Date: Fri, 1 Jun 2001 19:07:52 +0200 (MEST)
Subject: [Python-Dev] METH_NOARGS calling convention
Message-ID: <200106011707.TAA21329@pandora.informatik.hu-berlin.de>

The patch

http://sourceforge.net/tracker/?func=detail&atid=305470&aid=427190&group_id=5470

introduces two new calling conventions, METH_O and METH_NOARGS.  The
rationale for METH_O has been discussed already; the rationale for
METH_NOARGS is that it allows a convenient simplification (plus a
marginal speed-up) of functions which do either PyArg_NoArgs(args) or
PyArg_ParseTuple(args, ":function_name").

Now, one open issue is whether the METH_NOARGS functions should have a
signature of

    PyObject * (*unaryfunc)(PyObject *);

or of

    PyObject * (*PyCFunction)(PyObject *, PyObject *);

which then would be called with a NULL second argument; the first
argument would be self in either case.

IMO, the advantage of passing the NULL argument is that NOARGS methods
don't need to be cast into PyCFunction in the method table; the
advantage of the unaryfunc signature is that it is clearer in the
function implementation.

Any opinions which signature to use?

Regards,
Martin
From mal@lemburg.com Fri Jun 1 18:18:21 2001
From: mal@lemburg.com (M.-A. Lemburg)
Date: Fri, 01 Jun 2001 19:18:21 +0200
Subject: [Python-Dev] METH_NOARGS calling convention
References: <200106011707.TAA21329@pandora.informatik.hu-berlin.de>
Message-ID: <3B17CE5D.9D4CE8D4@lemburg.com>

Martin von Loewis wrote:
>
> The patch
>
> http://sourceforge.net/tracker/?func=detail&atid=305470&aid=427190&group_id=5470
>
> introduces two new calling conventions, METH_O and METH_NOARGS.  The
> rationale for METH_O has been discussed already; the rationale for
> METH_NOARGS is that it allows a convenient simplification (plus a
> marginal speed-up) of functions which do either PyArg_NoArgs(args) or
> PyArg_ParseTuple(args, ":function_name").
>
> Now, one open issue is whether the METH_NOARGS functions should have
> a signature of
>
>     PyObject * (*unaryfunc)(PyObject *);
>
> or of
>
>     PyObject * (*PyCFunction)(PyObject *, PyObject *);
>
> which then would be called with a NULL second argument; the first
> argument would be self in either case.
>
> IMO, the advantage of passing the NULL argument is that NOARGS methods
> don't need to be cast into PyCFunction in the method table; the
> advantage of the unaryfunc signature is that it is clearer in the
> function implementation.
>
> Any opinions which signature to use?

The second... I'm not sure how you will get extension writers who
have to maintain packages for all three Python versions to ever change
their code to use the new style calling scheme: there simply is no
clean way to use the same code base unless you are willing to add tons
of #ifdefs.

--
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                       http://www.lemburg.com/python/

From fdrake@acm.org Fri Jun 1 18:31:15 2001
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Fri, 1 Jun 2001 13:31:15 -0400 (EDT)
Subject: [Python-Dev] METH_NOARGS calling convention
In-Reply-To: <3B17CE5D.9D4CE8D4@lemburg.com>
References: <200106011707.TAA21329@pandora.informatik.hu-berlin.de>
	<3B17CE5D.9D4CE8D4@lemburg.com>
Message-ID: <15127.53603.87216.103262@cj42289-a.reston1.va.home.com>

M.-A. Lemburg writes:
> > Any opinions which signature to use?
>
> The second...

Seconded.  ;-)

> I'm not sure how you will get extension writers who
> have to maintain packages for all three Python versions to
> ever change their code to use the new style calling scheme:
> there simply is no clean way to use the same code base unless
> you are willing to add tons of #ifdefs.

You won't, and that's OK.  Even if 3rd-party extensions never use it,
there are plenty of functions/methods in the standard distribution
which can use it, and I imagine those would be converted fairly
quickly.

-Fred

--
Fred L. Drake, Jr.
PythonLabs at Digital Creations

From tismer@tismer.com Fri Jun 1 19:29:11 2001
From: tismer@tismer.com (Christian Tismer)
Date: Fri, 01 Jun 2001 20:29:11 +0200
Subject: [Python-Dev] Marshal bug in 2.1?
Message-ID: <3B17DEF7.3E7C6BC6@tismer.com>

This is a multi-part message in MIME format.
--------------6AB95E65519E7075E373B33F
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit

Hi friends,

there is a script which generates encrypted passwords for Starship
users.  There is a series of marshal, zlib and base64 calls, which is
reversed by the script.

Is there a known bug in Marshal, or should I start the debugger now?
The passphrase for the attached script is "hey".

cheers - chris

--
Christian Tismer             :^)
Mission Impossible 5oftware  :    Have a break!
Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net/ 14163 Berlin : PGP key -> http://wwwkeys.pgp.net/ PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? http://www.stackless.com/ --------------6AB95E65519E7075E373B33F Content-Type: text/plain; charset=us-ascii; name="letmein.py" Content-Transfer-Encoding: 7bit Content-Disposition: inline; filename="letmein.py" import marshal,base64,zlib exec marshal.loads(zlib.decompress(base64.decodestring(""" eJytVM+PGzUUfs6PzWZYwapAqbbAuiyF6Yqsqt2iomq1HGkvuQQJaS+pM3YzbjP2yHY6CdrVHNr+ Exz5L/gn4MidC2f+Az5Pkq0QlFMnmTf2s+d73/vmPWeEq43b/wxT498mSXSOwbskGZ0zqm+QbNF5 i+o9km16idU21bdIdUh26GmLrCRWf0ayS8+6dN6l+oAU0XcP689JbZHcohfA6VF9mxQj1SbVi57r 2PAFqS7p7bVH9+kFkew1mDvA/JJUCziGEYs3AozS7ch1yIiSg7dwJfjxzCkRVFml4Q7ng8F6zgUv hfeVdZLzJ84WXJgln+rnyvCgFuEIbzoV5s54/g3PcuFEFpTzvMp1lnPhFM9sUc6DklwboEmF5UIb 7YPO8PJkHvhz5ZbcWDOYaaOE45VYrmI18N/n2sctXlvDMczmPthC/wjEJ9bxUrtFTOBt6OAPoqSH h4c85MqrdUaeT1SoFDIenJ0OmpyWdu5AxDllwmuB8GLC33gNzm7700EytBWfA3s0esiD5TM7hTAY +IBIuS6PymXIrTkyKiRYjKL5+MI607nXZsrVAjLPlpHmFck0m+lyYgWIOAXRC2UkNHowuJMII+Mm M10zv2K8QosojUvy0tmpE0WyomQLFfK4o7BIGgUhxWSmjhJ/F/U3CdVX/BHPRKyE2SwiA0mEVQgI g49agXtmIVMWbmWMOvi1yZexyfaovhmb7BnRJWsGjC7RXh/TBZqgFdsO3XCJJvuELtqkO3RB0cPq T5v5VmyTSwDt00WLdI/CduxQNGbc14pNGm2H+Ajgo7SLoEPfhz25e3x8cv/eyX0wYuADRjepAQpE ga3jIP514H2E4SiNZ8NQj2E1h2nmPposd80TYnrUDi3SaFdD/37c8O9q9bF7T2eimEhxtk8+Hj6N 0XEh7W+wC/m134qT4PANGpdRVYMtm4V5KdGijSM0DqmnygffwfCp1WaFIsq0s+EU/gt4Bfh/ZDdn wx75JJ6U7EN2je2y91izOh4XQpvxeOj3MStnSqC88f1RsqtSiMXKy9zB/8DvYs/jH/46fWR+q3+v fv3lz5/+eJUmm5ylzRr6eB5vBif/4LAOaUShxuOrdKJoTlRjbXDWNN6wCFeSvdYmbcR+U65RiW9R Dh/gufNOP+m3dnq7bIdtI9VrbJ/9DYOcdyU= """))) --------------6AB95E65519E7075E373B33F-- From tismer@tismer.com Fri Jun 1 19:47:02 2001 From: tismer@tismer.com (Christian Tismer) Date: Fri, 01 Jun 2001 20:47:02 +0200 Subject: [Python-Dev] Marshal bug in 2.1? References: <3B17DEF7.3E7C6BC6@tismer.com> Message-ID: <3B17E326.41D82CCE@tismer.com> Christian Tismer wrote: > > Hi friends, > > there is a script which generates encrypted passwords for > Starship users. There is a series of marshal, zlib and base64 > calls, which is reversed by the script. > > Is there a known bug in Marshal, or should I start the debugger now? > The passwphrase for the attached script is "hey". Aehmmm... can it be that code objects are no longer compatible between Python 2.0 and 2.1? sigh - ciao - chris -- Christian Tismer :^) Mission Impossible 5oftware : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net/ 14163 Berlin : PGP key -> http://wwwkeys.pgp.net/ PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? http://www.stackless.com/ From mwh@python.net Fri Jun 1 19:52:17 2001 From: mwh@python.net (Michael Hudson) Date: 01 Jun 2001 19:52:17 +0100 Subject: [Python-Dev] another dict crasher In-Reply-To: barry@digicool.com's message of "Fri, 1 Jun 2001 11:12:33 -0400" References: <15127.45281.435849.822222@anthem.wooz.org> Message-ID: Warning! VERY SICK CODE INDEED ahead! barry@digicool.com (Barry A. Warsaw) writes: > >>>>> "MH" == Michael Hudson writes: > > MH> segfaults both 2.1 and current (well, maybe a day old) CVS. > MH> Haven't tried Tim's latest patch, but I don't believe that > MH> will make any difference. > > That is highly, highly nasty. 
Not as nasty as this, though: dict = {} # let's force dict to malloc its table for i in range(1,10): dict[i] = i class Machiavelli: def __repr__(self): dict.clear() print # doesn't crash without this. don't know why return `"machiavelli"` def __hash__(self): return 0 dict[Machiavelli()] = Machiavelli() print dict gives, even with my posted patch to dictobject.c $ ./python crash2.py { Segmentation fault (core dumped) Any ideas what the above code should do? (Other than use the secret PSU website to hire a hitman and shoot whoever wrote the code, I mean). Cheers, M. -- Well, yes. I don't think I'd put something like "penchant for anal play" and "able to wield a buttplug" in a CV unless it was relevant to the gig being applied for... -- Matt McLeod, alt.sysadmin.recovery From mal@lemburg.com Fri Jun 1 20:01:38 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Fri, 01 Jun 2001 21:01:38 +0200 Subject: [Python-Dev] Marshal bug in 2.1? References: <3B17DEF7.3E7C6BC6@tismer.com> <3B17E326.41D82CCE@tismer.com> Message-ID: <3B17E692.281A329B@lemburg.com> Christian Tismer wrote: > > Christian Tismer wrote: > > > > Hi friends, > > > > there is a script which generates encrypted passwords for > > Starship users. There is a series of marshal, zlib and base64 > > calls, which is reversed by the script. > > > > Is there a known bug in Marshal, or should I start the debugger now? > > The passphrase for the attached script is "hey". > > Aehmmm... can it be that code objects are no longer compatible > between Python 2.0 and 2.1? Yes, not surprisingly though... AFAIK the pyc format changed in every single version between 1.5.2 and 2.1. -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From tim.one@home.com Fri Jun 1 21:36:21 2001 From: tim.one@home.com (Tim Peters) Date: Fri, 1 Jun 2001 16:36:21 -0400 Subject: [Python-Dev] another dict crasher In-Reply-To: Message-ID: I suspect there are many ways to get the dict code to blow up, and always have been. I picked on dict compare a month or so ago mostly because nobody cares how fast that runs except in the == and != cases. Others are a real bitch; for example, the fundamental lookdict function caches dictentry *ep0 = mp->ma_table; at the start as if it were invariant -- but very unlikely sequences of collisions with identical hash codes combined with mutating comparisons can turn that into a bogus pointer. List objects used to have similar vulnerabilities during sorting (where comparison is the *norm*, not a one-in-a-billion freak occurrence), and no amount of slow-the-code paranoia sufficed to plug all conceivable holes. In the end we invented an internal "immutable list type", and replace the list object's type pointer for the duration of the sort (you can still try to mutate a list during a sort, but all the mutating list methods are redirected to raise an exception when you do). The dict code has even more holes and in more places, but they're generally much harder to provoke, so they've gone unnoticed for 10 years. All in all, seemed like a good tradeoff to me .
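[A rough sketch of the "immutable list" trick Tim describes; the actual implementation lives in CPython's Objects/listobject.c, and this abbreviates it heavily -- the helper name is invented and error paths are omitted:]

static PyObject *
listsort_sketch(PyListObject *self)
{
	self->ob_type = &immutable_list_type;	/* mutating methods now raise */
	/* ... do the sort; comparisons may run arbitrary Python code,
	   but they can no longer resize the list out from under us ... */
	self->ob_type = &PyList_Type;		/* restore the real type */
	Py_INCREF(Py_None);
	return Py_None;
}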
From tim.one@home.com Fri Jun 1 23:08:32 2001 From: tim.one@home.com (Tim Peters) Date: Fri, 1 Jun 2001 18:08:32 -0400 Subject: [Python-Dev] METH_NOARGS calling convention In-Reply-To: <200106011707.TAA21329@pandora.informatik.hu-berlin.de> Message-ID: Cool! [Martin von Loewis] > ... > Now, one open issue is whether the METH_NOARGS functions should have > a signature of > > PyObject * (*unaryfunc)(PyObject *); > > or of > > PyObject *(*PyCFunction)(PyObject *, PyObject *); > > which then would be called with a NULL second argument; the first > argument would be self in either case. > > IMO, the advantage of passing the NULL argument is that NOARGS methods > don't need to be cast into PyCFunction in the method table; the > advantage of the second approach is that it is clearer in the function > implementation. > > Any opinions which signature to use? The one that makes sense : declare functions with the number of arguments they use. I don't care about needing to cast in the table: you do that once, but people read the *code* over and over, and an unused arg will be a mystery (or even a source of compiler warnings) every time you bump into one. The only way needing to cast could be "a problem" is if this remains an undocumented gimmick that developers have to reverse-engineer from staring at the (distributed all over the place) implementation. I like what the patch does, but I'd reject it just for continuing to leave this stuff Utterly Mysterious: please add comments saying what METH_NOARGS and METH_O *mean*: what's the point, why are these defined, how and when are you supposed to use them? That's where to explain the need to cast METH_NOARGS. From thomas@xs4all.net Fri Jun 1 23:42:35 2001 From: thomas@xs4all.net (Thomas Wouters) Date: Sat, 2 Jun 2001 00:42:35 +0200 Subject: [Python-Dev] another dict crasher In-Reply-To: <15127.45281.435849.822222@anthem.wooz.org>; from barry@digicool.com on Fri, Jun 01, 2001 at 11:12:33AM -0400 References: <15127.45281.435849.822222@anthem.wooz.org> Message-ID: <20010602004235.Q690@xs4all.nl> On Fri, Jun 01, 2001 at 11:12:33AM -0400, Barry A. Warsaw wrote: > > >>>>> "MH" == Michael Hudson writes: > MH> segfaults both 2.1 and current (well, maybe a day old) CVS. > MH> Haven't tried Tim's latest patch, but I don't believe that > MH> will make any difference. > That is highly, highly nasty. Sounds to me like there ought to be an > emergency 2.1.1 patch made for this, bumping Thomas's work to 2.1.2 if > necessary. Why bump 'my work'? I'm just reviewing patches checked into the head. A fix for the above problems would fit in a patch release very nicely, and a release is a release. Besides, releasing 2.1.1 as 2.1 + dict fix would be a CVS nightmare. Unless you propose to keep it out of CVS, Barry? :) > And if we can trojan in the NAIPL (New And Improved Python > License), I wouldn't mind. :) I'll channel Guido by saying he wouldn't even allow us to ship it with anything other than the PSF licence :) Gee-I'd-almost-think-you-had-a-need-for-an-FSF-suffered-licence-ly y'rs -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From thomas@xs4all.net Fri Jun 1 23:47:16 2001 From: thomas@xs4all.net (Thomas Wouters) Date: Sat, 2 Jun 2001 00:47:16 +0200 Subject: [Python-Dev] Marshal bug in 2.1? In-Reply-To: <3B17E692.281A329B@lemburg.com>; from mal@lemburg.com on Fri, Jun 01, 2001 at 09:01:38PM +0200 References: <3B17DEF7.3E7C6BC6@tismer.com> <3B17E326.41D82CCE@tismer.com> <3B17E692.281A329B@lemburg.com> Message-ID: <20010602004716.R690@xs4all.nl> On Fri, Jun 01, 2001 at 09:01:38PM +0200, M.-A. Lemburg wrote: > Yes, not surprisingly though... AFAIK the pyc format changed > in every single version between 1.5.2 and 2.1. Worse, it's changed several times between each release :) -- Thomas Wouters Hi!
I'm a .signature virus! copy me into your .signature file to help me spread! From barry@digicool.com Sat Jun 2 00:12:30 2001 From: barry@digicool.com (Barry A. Warsaw) Date: Fri, 1 Jun 2001 19:12:30 -0400 Subject: [Python-Dev] another dict crasher References: <15127.45281.435849.822222@anthem.wooz.org> <20010602004235.Q690@xs4all.nl> Message-ID: <15128.8542.51241.192412@anthem.wooz.org> >>>>> "TW" == Thomas Wouters writes: >> That is highly, highly nasty. Sounds to me like there ought to >> be an emergency 2.1.1 patch made for this, bumping Thomas's >> work to 2.1.2 if necessary. TW> Why bump 'my work' ? I'm just reviewing patches checked into TW> the head. A fix for the above problems would fit in a patch TW> release very nicely, and a release is a release. Besides, TW> releasing 2.1.1 as 2.1 + dict fix would be a CVS TW> nightmare. Unless you propose to keep it out of CVS, Barry ? TW> :) Oh no! You know me, I like to release those maintenance releases early and often. :) Anyway, that's why /you're/ the 2.1.1 czar. >> And if we can trojan in the NAIPL (New And Improved Python >> License), I wouldn't mind. :) TW> I'll channel Guido by saying he wouldn't even allow us to ship TW> it with anything other than the PSF licence :) :) TW> Gee-I'd-almost-think-you-had-a-need-for-an-FSF-suffered-licence-ly TW> y'rs Where'd you get /that/ idea? :) -Barry From mwh@python.net Sat Jun 2 00:20:26 2001 From: mwh@python.net (Michael Hudson) Date: 02 Jun 2001 00:20:26 +0100 Subject: [Python-Dev] another dict crasher In-Reply-To: "Tim Peters"'s message of "Fri, 1 Jun 2001 16:36:21 -0400" References: Message-ID: "Tim Peters" writes: > The dict code has even more holes and in more places, but they're > generally much harder to provoke, so they've gone unnoticed for 10 > years. All in all, seemed like a good tradeoff to me . Are you suggesting that we should just leave these crashers in? They're not *particularly* hard to provoke if you know the implementation - and I was inspired to look for them by someone's report of actually running into one. Cheers, M. -- Java sucks. [...] Java on TV set top boxes will suck so hard it might well inhale people from off their sofa until their heads get wedged in the card slots. --- Jon Rabone, ucam.chat From tim.one@home.com Sat Jun 2 02:04:36 2001 From: tim.one@home.com (Tim Peters) Date: Fri, 1 Jun 2001 21:04:36 -0400 Subject: [Python-Dev] another dict crasher In-Reply-To: Message-ID: [Michael Hudson] > Are you suggesting that we should just leave these crashers in? > They're not *particularly* hard to provoke if you know the > implementation - and I was inspired to look for them by someone's > report of actually running into one. I certainly don't object to fixing ones that bite innocent users, but there are also costs of several kinds. In this case, I couldn't care less how long printing a dict takes -- go for it. When adversarial abuse starts interfering with the speed of crucial operations, though, I'm simply not a "safety at any cost" person. Guido is much more of one, although the number of holes remaining in Python could plausibly fill Albert Hall . short-of-50-easy-ways-to-crash-win98-just-think-hard-about-each-"+"-in- the-code-base-ly y'rs - tim From gstein@lyra.org Sat Jun 2 06:52:03 2001 From: gstein@lyra.org (Greg Stein) Date: Fri, 1 Jun 2001 22:52:03 -0700 Subject: [Python-Dev] strop vs. 
string In-Reply-To: ; from tim.one@home.com on Sun, May 27, 2001 at 09:42:30PM -0400 References: <3B10D758.3741AC2F@lemburg.com> Message-ID: <20010601225203.R23560@lyra.org> On Sun, May 27, 2001 at 09:42:30PM -0400, Tim Peters wrote: >... > [Greg Ewing] > > I think it would be safe if: > > > > 1) it kept a reference to the underlying object, and > > That much it already does. > > > 2) it re-fetched the pointer and length info each time it was > > needed, using the underlying object's buffer interface. > > If after > > b = buffer(some_object) > > b.__getitem__ needed to refetch the info between > > b[i] > and > b[i+1] > > I expect it would be so slow even Greg wouldn't want it anymore. Huh? I don't think it would be all that slow. It is just a function call. And I don't think that the getitem slot is really used all that frequently (in a loop) for buffer type objects. I've been thinking that refetching the ptr/len is the right fix. Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein@lyra.org Sat Jun 2 06:54:23 2001 From: gstein@lyra.org (Greg Stein) Date: Fri, 1 Jun 2001 22:54:23 -0700 Subject: [Python-Dev] strop vs. string In-Reply-To: ; from tim.one@home.com on Sat, May 26, 2001 at 02:44:04AM -0400 References: <3B0ED784.FC53D01@lemburg.com> Message-ID: <20010601225423.S23560@lyra.org> On Sat, May 26, 2001 at 02:44:04AM -0400, Tim Peters wrote: > The buffer object has been neglected for years: is that because it's in > prime shape, or because nobody cares about it enough to maintain it? "Works for me" :-) Part of the neglect is also based on Guido's ambivalence. Part is that I haven't needed more from it. The day that I do, then I'll code it up :-) But that doesn't help the "generic" case, unfortunately. Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein@lyra.org Sat Jun 2 06:55:33 2001 From: gstein@lyra.org (Greg Stein) Date: Fri, 1 Jun 2001 22:55:33 -0700 Subject: [Python-Dev] strop vs. string In-Reply-To: <3B0FD023.C4588919@lemburg.com>; from mal@lemburg.com on Sat, May 26, 2001 at 05:47:47PM +0200 References: <3B0FD023.C4588919@lemburg.com> Message-ID: <20010601225533.T23560@lyra.org> On Sat, May 26, 2001 at 05:47:47PM +0200, M.-A. Lemburg wrote: >... > Even the idea of replacing the usage of strings as data buffers > with buffer object didn't get very far; common habits are simply > hard to break. That idea was shot down when Guido said that 'c' arrays should be the "official form of a data buffer." Cheers, -g -- Greg Stein, http://www.lyra.org/ From tim.one@home.com Sat Jun 2 07:13:49 2001 From: tim.one@home.com (Tim Peters) Date: Sat, 2 Jun 2001 02:13:49 -0400 Subject: [Python-Dev] another dict crasher In-Reply-To: Message-ID: [Michael Hudson] > Actually this crash was dict_print (I always forget about tp_print...). We all should . > It's pretty easy to mend: > > *** dictobject.c Fri Jun 1 13:08:13 2001 > --- dictobject.c-fixed Fri Jun 1 12:59:07 2001 > *************** > *** 793,795 **** > any = 0; > ! for (i = 0, ep = mp->ma_table; i < mp->ma_size; i++, ep++) { > if (ep->me_value != NULL) { > --- 793,796 ---- > any = 0; > ! for (i = 0; i < mp->ma_size; i++) { > ! ep = &mp->ma_table[i]; > if (ep->me_value != NULL) { > *************** > *** 833,835 **** > any = 0; > ! for (i = 0, ep = mp->ma_table; i < mp->ma_size && v; i++, ep++) { > if (ep->me_value != NULL) { > --- 834,837 ---- > any = 0; > ! for (i = 0; i < mp->ma_size && v; i++) { > ! 
ep = &mp->ma_table[i]; > if (ep->me_value != NULL) { > > I'm not sure this stops still more Machiavellian behaviour from > crashing the interpreter, Alas, it doesn't. You can't trust *anything* about a container you're iterating over across any call that may call back into Python. In these cases, the call to PyObject_Repr() can execute any code at all, including code that mutates the dict you're crawling over. In particular, calling PyObject_Repr() to format the key means the ep = &mp->ma_table[i] pointer may be trash by the time PyObject_Repr() is called again to format the value. See characterize() for the pain it takes to guard against everything, including encouraging comments like: if (cmp > 0 || i >= a->ma_size || a->ma_table[i].me_value == NULL) { /* Not the *smallest* a key; or maybe it is * but the compare shrunk the dict so we can't * find its associated value anymore; or * maybe it is but the compare deleted the * a[thiskey] entry. */ Py_DECREF(thiskey); continue; } It should really add "or maybe it just shuffled the dict around and the value at ma_table[i] is no longer associated with the key that *used* to be at ma_table[i], but since there's still *some* non-NULL pointer there we'll just pretend that didn't happen and press onward". > and you can certainly get items being printed more than once or not > at all. I'm not sure this last is a problem; Those don't matter: in a long tradition, we buy "safety" not only at the cost of bloating the code, but also by making the true behavior in case of mutation unpredictable & inexplicable. That's why I *really* liked the "immutable list" trick in list.sort(): even if we could have made the code bulletproof without it, we couldn't usefully explain what the heck it actually did. It's not Pythonic to blow up, but neither is it Pythonic to be incomprehensible. You simply can't win here. > if the user's being this contrary there's only so much we can > do to help him or her. I'd prefer a similar internal immutable-dict trick that raised an exception if the user was pushing Python into a corner where "blow up or do something baffling" were its only choices. That would render the original example illegal, of course. But would that be a bad thing? What *should* it mean when the user invokes an operation on a container and mutates the container during that operation? There's almost no chance that Jython does the same thing as CPython in all these cases, so it's effectively undefined behavior no matter how you plug the holes (short of raising an exception). From tim.one@home.com Sat Jun 2 07:34:43 2001 From: tim.one@home.com (Tim Peters) Date: Sat, 2 Jun 2001 02:34:43 -0400 Subject: [Python-Dev] strop vs. string In-Reply-To: <20010601225203.R23560@lyra.org> Message-ID: [Tim] > If after > > b = buffer(some_object) > > b.__getitem__ needed to refetch the info between > > b[i] > and > b[i+1] > > I expect it would be so slow even Greg wouldn't want it anymore. [Greg] > Huh? I don't think it would be all that slow. It is just a function > call. And I don't think that the getitem slot is really used all that > frequently (in a loop) for buffer type objects. I expect they index into the buffer memory directly then, right? Then for buffers obtained from mutable objects, any such loop is unsafe in the absence of the GIL, or even in its presence if the loop contains code that may call back into Python. > I've been thinking that refetching the ptr/len is the right fix. 
So is calling __getitem__ all the time then, unless you want to dance on the razor's edge. The idea that you can safely "borrow" memory from a mutable object without copying it is brittle. > Part of the neglect is also based on Guido's ambivalence. Part is > that I haven't needed more from it. The day that I do, then I'll > code it up :-) But that doesn't help the "generic" case, > unfortunately. I take that as "yes" to my "nobody cares about it enough to maintain it?". In that light, Guido's ambivalence is indeed surprising . From mwh@python.net Sat Jun 2 08:09:07 2001 From: mwh@python.net (Michael Hudson) Date: 02 Jun 2001 08:09:07 +0100 Subject: [Python-Dev] another dict crasher In-Reply-To: "Tim Peters"'s message of "Sat, 2 Jun 2001 02:13:49 -0400" References: Message-ID: "Tim Peters" writes: > [Michael Hudson] > > Actually this crash was dict_print (I always forget about tp_print...). > > We all should . > > > It's pretty easy to mend: [snip] > > I'm not sure this stops still more Machiavellian behaviour from > > crashing the interpreter, > > Alas, it doesn't. No, that's what my "dict[Machiavelli()] = Machiavelli()" example was demonstrating. If noone beats me to it, I'll post a better fix to sf next week, complete with test-cases and suitably "encouraging" comments. I can't easily see other examples of the problem; there certainly might be things you could do with comparisons that could trigger crashes, but that code's so hairy that it's almost impossible for me to be sure. There are two ways of constructing a software design: one way is to make it so simple that there are obviously no deficiencies and the other way is to make it so complicated that there are no obvious deficiencies. -- C. A. R. Hoare > > and you can certainly get items being printed more than once or not > > at all. I'm not sure this last is a problem; > > Those don't matter: in a long tradition, we buy "safety" not only at the > cost of bloating the code, but also by making the true behavior in case of > mutation unpredictable & inexplicable. This is what I thought. [snip] > > if the user's being this contrary there's only so much we can > > do to help him or her. > > I'd prefer a similar internal immutable-dict trick that raised an exception > if the user was pushing Python into a corner where "blow up or do something > baffling" were its only choices. That would render the original example > illegal, of course. But would that be a bad thing? It's hard to see how. > What *should* it mean when the user invokes an operation on a > container and mutates the container during that operation? I don't think there's a meaning you can attach to this kind of behaviour. The "immutable dict trick" looks better the more I think about it, but I guess that will have to wait until Guido gets back from the sun... Cheers, M. -- incidentally, asking why things are "left out of the language" is a good sign that the asker is fairly clueless. -- Erik Naggum, comp.lang.lisp From gstein@lyra.org Sat Jun 2 08:40:05 2001 From: gstein@lyra.org (Greg Stein) Date: Sat, 2 Jun 2001 00:40:05 -0700 Subject: [Python-Dev] strop vs. 
string In-Reply-To: ; from tim.one@home.com on Sat, Jun 02, 2001 at 02:34:43AM -0400 References: <20010601225203.R23560@lyra.org> Message-ID: <20010602004005.F23560@lyra.org> On Sat, Jun 02, 2001 at 02:34:43AM -0400, Tim Peters wrote: > [Tim] > > If after > > > > b = buffer(some_object) > > > > b.__getitem__ needed to refetch the info between > > > > b[i] > > and > > b[i+1] > > > > I expect it would be so slow even Greg wouldn't want it anymore. > > [Greg] > > Huh? I don't think it would be all that slow. It is just a function > > call. And I don't think that the getitem slot is really used all that > > frequently (in a loop) for buffer type objects. > > I expect they index into the buffer memory directly then, right? Then for > buffers obtained from mutable objects, any such loop is unsafe in the > absence of the GIL, or even in its presence if the loop contains code that > may call back into Python. Most access is: fetch ptr/len, index into the memory. And yes: anything within that loop which could conceivably change the target object (especially a call into Python) could move that ptr. I was saying that, at the Python level, using a loop and doing b[i] into a buffer/string/unicode object would seem to be relatively rare. b[0] and stuff is reasonably common. > > I've been thinking that refetching the ptr/len is the right fix. > > So is calling __getitem__ all the time then, unless you want to dance on the > razor's edge. The idea that you can safely "borrow" memory from a mutable > object without copying it is brittle. Stay in C code and don't call into Python. It is safe then. The buffer API is exactly what you're saying: borrow a memory reference. The concept makes a lot of things possible that weren't before. The buffer object's storing of that reference was a mistake. > > Part of the neglect is also based on Guido's ambivalence. Part is > > that I haven't needed more from it. The day that I do, then I'll > > code it up :-) But that doesn't help the "generic" case, > > unfortunately. > > I take that as "yes" to my "nobody cares about it enough to maintain it?". > In that light, Guido's ambivalence is indeed surprising . Eh? I'll maintain the thing, but you're confusing that with adding more features into it. Different question. Cheers, -g -- Greg Stein, http://www.lyra.org/
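[For concreteness, a sketch of the "refetch" approach Greg favors, against the PyBufferProcs API of this era; the helper name is invented and error handling is minimal. The point is only that the pointer/length are asked of the base object again on every access, instead of being cached when the buffer object is created:]

static int
get_buffer_byte(PyObject *base, int i, char *out)
{
	PyBufferProcs *pb = base->ob_type->tp_as_buffer;
	void *ptr;
	int len;

	if (pb == NULL || pb->bf_getreadbuffer == NULL) {
		PyErr_SetString(PyExc_TypeError,
				"base object has no read buffer");
		return -1;
	}
	/* refetch: the base object may have resized or moved its memory
	   since the last access */
	len = (*pb->bf_getreadbuffer)(base, 0, &ptr);
	if (len < 0)
		return -1;	/* error already set by the base object */
	if (i < 0 || i >= len) {
		PyErr_SetString(PyExc_IndexError,
				"buffer index out of range");
		return -1;
	}
	*out = ((char *)ptr)[i];
	return 0;
}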
From tim.one@home.com Sat Jun 2 09:17:39 2001 From: tim.one@home.com (Tim Peters) Date: Sat, 2 Jun 2001 04:17:39 -0400 Subject: [Python-Dev] another dict crasher In-Reply-To: Message-ID: [Michael Hudson] > ... > If no one beats me to it, I'll post a better fix to sf next week, > complete with test-cases and suitably "encouraging" comments. Ah, no need -- looks like I was doing that while you were writing this. Checked in already. So long as we're happy to settle for senseless results that simply don't blow up, the only other trick you really needed was to save away the value in a local vrbl and incref it across the key->string bit; then you don't have to worry about key->string deleting the value, or about the table entry it lived in going away (because you get the value from the (still-incref'ed) *local* vrbl later, not from the table again). > I can't easily see other examples of the problem; there certainly > might be things you could do with comparisons that could trigger > crashes, but that code's so hairy that it's almost impossible for me > to be sure. It's easy to be sure: any code that tries to remember anything about a dict (ditto any mutable object) across a "dangerous" call, other than the mere address of the object, is a place you *can* provoke a core dump. It may not be easy to provoke, and a given provoking test case may not fail across all platforms, or even every time you run it on a single platform, but it's "an obvious" hole all the same. From tismer@tismer.com Sat Jun 2 10:49:35 2001 From: tismer@tismer.com (Christian Tismer) Date: Sat, 02 Jun 2001 11:49:35 +0200 Subject: [Python-Dev] Marshal bug in 2.1? References: <3B17DEF7.3E7C6BC6@tismer.com> <3B17E326.41D82CCE@tismer.com> <3B17E692.281A329B@lemburg.com> <20010602004716.R690@xs4all.nl> Message-ID: <3B18B6AE.88EA6926@tismer.com> Thomas Wouters wrote: > > On Fri, Jun 01, 2001 at 09:01:38PM +0200, M.-A. Lemburg wrote: > > > Yes, not surprisingly though... AFAIK the pyc format changed > > in every single version between 1.5.2 and 2.1. > > Worse, it's changed several times between each release :) But I didn't use .pyc at all, just a marshalled code object. There are no version headers or such. The same object worked in fact for Py 1.5.2 and 2.0, but no longer with 2.1. I debugged the unmarshalling and saw what happened: The new code objects with their new scoping features were the problem. The new structures were simply added, and there is no way to skip these for older code objects, since there isn't any info. Some option for marshal to unmarshal old-style code objects would have helped. But then, I'm not sure if the opcodes are still assigned the same way in 2.1, or if there was some movement? This would kill it anyway. ciao - chris (now looking for another cheap way to do something invisible in Python without installing *anything* ) -- Christian Tismer :^) Mission Impossible 5oftware : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net/ 14163 Berlin : PGP key -> http://wwwkeys.pgp.net/ PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? http://www.stackless.com/
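[One cheap way to get such a version header by hand -- a sketch only, not what the Starship script did -- is to tag the blob with the interpreter's pyc magic number, which changes whenever the code-object format or the opcodes do:]

import imp, marshal

MAGIC = imp.get_magic()   # 4-byte magic string, bumped on format changes

def dump_code(code):
    # tag the marshalled code object with the producing interpreter's magic
    return MAGIC + marshal.dumps(code)

def load_code(data):
    if data[:len(MAGIC)] != MAGIC:
        raise ValueError("code was marshalled by an incompatible Python")
    return marshal.loads(data[len(MAGIC):])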
From mal@lemburg.com Sat Jun 2 12:09:13 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Sat, 02 Jun 2001 13:09:13 +0200 Subject: [Python-Dev] Marshal bug in 2.1? References: <3B17DEF7.3E7C6BC6@tismer.com> <3B17E326.41D82CCE@tismer.com> <3B17E692.281A329B@lemburg.com> <20010602004716.R690@xs4all.nl> <3B18B6AE.88EA6926@tismer.com> Message-ID: <3B18C958.598A9891@lemburg.com> Christian Tismer wrote: > > Thomas Wouters wrote: > > > > On Fri, Jun 01, 2001 at 09:01:38PM +0200, M.-A. Lemburg wrote: > > > > > Yes, not surprisingly though... AFAIK the pyc format changed > > > in every single version between 1.5.2 and 2.1. > > > > Worse, it's changed several times between each release :) > > But I didn't use .pyc at all, just a marshalled code object. That's the point: the header in pyc files is meant to signal the incompatibility of the following code object. Perhaps we should move this version information into the marshal format of code objects themselves... > There are no version headers or such. > The same object worked in fact for Py 1.5.2 and 2.0, but no > longer with 2.1. > I debugged the unmarshalling and saw what happened: > The new code objects with their new scoping features were > the problem. The new structures were simply added, and there > is no way to skip these for older code objects, since there > isn't any info. > Some option for marshal to unmarshal old-style code objects > would have helped. > But then, I'm not sure if the opcodes are still assigned > the same way in 2.1, or if there was some movement? This would > kill it anyway. AFAIK, the assignments did not change, but several opcodes were added in 2.1, so code compiled in 2.1 will not run in 2.0. > ciao - chris > > (now looking for another cheap way to do something invisible in > Python without installing *anything* ) Why don't you use freeze or py2exe or Gordon's installer for these one-file executables? Alternatively, you should check the Python version and make sure that it matches the one used for compiling the byte code. -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From mwh@python.net Sat Jun 2 12:40:56 2001 From: mwh@python.net (Michael Hudson) Date: 02 Jun 2001 12:40:56 +0100 Subject: [Python-Dev] another dict crasher In-Reply-To: "Tim Peters"'s message of "Sat, 2 Jun 2001 04:17:39 -0400" References: Message-ID: "Tim Peters" writes: > > I can't easily see other examples of the problem; there certainly > > might be things you could do with comparisons that could trigger > > crashes, but that code's so hairy that it's almost impossible for me > > to be sure. > > It's easy to be sure: any code that tries to remember anything about a dict > (ditto any mutable object) across a "dangerous" call, other than the mere > address of the object, is a place you *can* provoke a core dump. It may not > be easy to provoke, and a given provoking test case may not fail across all > platforms, or even every time you run it on a single platform, but it's "an > obvious" hole all the same. Ah, like this one: dict = {} # let's force dict to malloc its table for i in range(1,10): dict[i] = i class Machiavelli2: def __eq__(self, other): dict.clear() return 1 def __hash__(self): return 0 dict[Machiavelli2()] = Machiavelli2() print dict[Machiavelli2()] I'll attach a patch, but it's another branch inside lookdict (though not lookdict_string which is I guess the really performance sensitive one). Cheers, M.
Index: dictobject.c =================================================================== RCS file: /cvsroot/python/python/dist/src/Objects/dictobject.c,v retrieving revision 2.100 diff -c -1 -r2.100 dictobject.c *** dictobject.c 2001/06/02 08:27:39 2.100 --- dictobject.c 2001/06/02 11:36:47 *************** *** 273,274 **** --- 273,281 ---- cmp = PyObject_RichCompareBool(ep->me_key, key, Py_EQ); + if (ep0 != mp->ma_table) { + PyErr_SetString(PyExc_RuntimeError, + "dict resized on comparison"); + ep = mp->ma_table; + while (ep->me_value) ep++; + return ep; + } if (cmp > 0) { *************** *** 310,311 **** --- 317,325 ---- cmp = PyObject_RichCompareBool(ep->me_key, key, Py_EQ); + if (ep0 != mp->ma_table) { + PyErr_SetString(PyExc_RuntimeError, + "dict resized on comparison"); + ep = mp->ma_table; + while (ep->me_value) ep++; + return ep; + } if (cmp > 0) { Here's another test case to work out the second of those new if statements: dict = {} # let's force dict to malloc its table for i in range(1,10): dict[i] = i class Machiavelli3: def __init__(self, id): self.id = id def __eq__(self, other): if self.id == other.id: dict.clear() return 1 else: return 0 def __repr__(self): return "%s(%s)"%(self.__class__.__name__, self.id) def __hash__(self): return 0 dict[Machiavelli3(1)] = Machiavelli3(0) dict[Machiavelli3(2)] = Machiavelli3(0) print dict[Machiavelli3(2)] -- M-x psych[TAB][RETURN] -- try it From pedroni@inf.ethz.ch Sat Jun 2 19:58:55 2001 From: pedroni@inf.ethz.ch (Samuele Pedroni) Date: Sat, 2 Jun 2001 20:58:55 +0200 Subject: [Python-Dev] What should changing/setting __getattr__ (and similars) after classdef time do ? Message-ID: <004d01c0eb96$24b5f460$8a73fea9@newmexico> Hi. Is this a case that only the BDFL could know and pronounce on ... or I'm missing somenthing ... Thanks for any feedback, Samuele Pedroni. ----- Original Message ----- From: Samuele Pedroni To: Sent: Friday, June 01, 2001 2:49 PM Subject: [Python-Dev] __xxxattr__ caching semantic > Hi. > > What is the intendend semantic wrt to __xxxattr__ caching: > > class X: > pass > > def cga(self,name): > print name > > def iga(name): > print name > > x=X() > x.__dict__['__getattr__'] = iga # 1. > x.__getattr__ = iga # 2. > X.__dict__['__getattr__'] = cga # 3. > X.__getattr__ = cga # 4. > x.a > > for the manual > > http://www.python.org/doc/current/ref/customization.html > > with all the variants x.a should fail, they should have > no effect. In practice 4. work. > > Is that an implementation manual mismatch, is this indented, is there > code around using 4. ? > > I'm asking this because jython has differences/bugs in this respect? > > I imagine that 1.-4. should work for all other __magic__ methods > (this should be fixed in jython for some methods), > OTOH jython has such a restriction on __del__ too, and this one cannot > be removed (is not simply a matter of caching/non caching). > > regards, Samuele Pedroni. > > > _______________________________________________ > Python-Dev mailing list > Python-Dev@python.org > http://mail.python.org/mailman/listinfo/python-dev > From tim.one@home.com Sat Jun 2 23:57:57 2001 From: tim.one@home.com (Tim Peters) Date: Sat, 2 Jun 2001 18:57:57 -0400 Subject: [Python-Dev] What should changing/setting __getattr__ (and similars) after classdef time do ? In-Reply-To: <004d01c0eb96$24b5f460$8a73fea9@newmexico> Message-ID: [Samuele Pedroni] > Is this a case that only the BDFL could know and pronounce on ... > or I'm missing somenthing ... 
The referenced URL http://www.python.org/doc/current/ref/customization.html appears irrelevant to me, so unsure what you're asking about. Perhaps http://www.python.org/doc/current/ref/attribute-access.html was intended? If so, the "these methods are cached in the class object at class definition time; therefore, they cannot be changed after the class definition is executed." there doesn't mean exactly what it says: it's trying to say that the __XXXattr__ methods *inherited from base classes* (if any) are cached in the class object at class definition time, so that changing them in the base classes later has no effect on the derived class. It should be clearer. A direct class setattr can still change them; indirect assignment via class.__dict__ is ineffective for the __dict__, __bases__, __name__, __getattr__, __setattr__ and __delattr__ class attributes (yes, you'll create a dict entry then, but class getattr doesn't look in the dict to get the value of these specific keys). Didn't understand the program snippet. Much of this is due to hoary optimizations and I agree is ill-documented. I hope Guido's current rework of all this stuff will leave the end cases more explainable. > ----- Original Message ----- > From: Samuele Pedroni > To: > Sent: Friday, June 01, 2001 2:49 PM > Subject: [Python-Dev] __xxxattr__ caching semantic > > > Hi. > > What is the intended semantic wrt __xxxattr__ caching: > > class X: > pass > > def cga(self,name): > print name > > def iga(name): > print name > > x=X() > x.__dict__['__getattr__'] = iga # 1. > x.__getattr__ = iga # 2. > X.__dict__['__getattr__'] = cga # 3. > X.__getattr__ = cga # 4. > x.a > > for the manual > > http://www.python.org/doc/current/ref/customization.html > > with all the variants x.a should fail, they should have > no effect. In practice 4. works. > > Is that an implementation/manual mismatch, is this intended, is there > code around using 4.? > > I'm asking this because jython has differences/bugs in this respect. > > I imagine that 1.-4. should work for all other __magic__ methods > (this should be fixed in jython for some methods), > OTOH jython has such a restriction on __del__ too, and this one cannot > be removed (it is not simply a matter of caching/non caching). From pedroni@inf.ethz.ch Sun Jun 3 00:46:42 2001 From: pedroni@inf.ethz.ch (Samuele Pedroni) Date: Sun, 3 Jun 2001 01:46:42 +0200 Subject: [Python-Dev] What should changing/setting __getattr__ (and similars) after classdef time do ? References: Message-ID: <001801c0ebbe$47b60a40$8a73fea9@newmexico> Hi. Thanks a lot for the answer, and sorry for the ill-formed question. [Tim Peters] > [Samuele Pedroni] > > Is this a case that only the BDFL could know and pronounce on ... > > or I'm missing something ... > > The referenced URL > > http://www.python.org/doc/current/ref/customization.html > > appears irrelevant to me, so unsure what you're asking about. Perhaps > > http://www.python.org/doc/current/ref/attribute-access.html > > was intended? If so, the Yes, pilot error with browser and copy&paste; I intended the latter. > these methods are cached in the class object at class > definition time; therefore, they cannot be changed after > the class definition is executed. > > there doesn't mean exactly what it says: it's trying to say that the > __XXXattr__ methods *inherited from base classes* (if any) are cached in the > class object at class definition time, so that changing them in the base > classes later has no effect on the derived class. It should be clearer.
> > A direct class setattr can still change them; indirect assignment via > class.__dict__ is ineffective for the __dict__, __bases__, __name__, > __getattr__, __setattr__ and __delattr__ class attributes (yes, you'll create > a dict entry then, but class getattr doesn't look in the dict to get the > value of these specific keys). > This matches what I understood reading the CPython C code (yes, I did that too ), and what the snippets were trying to point out. And I see the problem with derived classes too. > Didn't understand the program snippet. Sorry, it is not one snippet; the 4 variants should be considered independently. > > Much of this is due to hoary optimizations and I agree is ill-documented. I > hope Guido's current rework of all this stuff will leave the end cases more > explainable. That will be a lot of work for porting it to jython . In any case the manual is really not clear (euphemism ) about this. The point is that jython implements the letter of the manual, and even extends the caching opt to some other __magic__ methods. I wanted to know the intended behaviour in order to fix that in jython. regards Samuele Pedroni. From tim.one@home.com Sun Jun 3 00:56:34 2001 From: tim.one@home.com (Tim Peters) Date: Sat, 2 Jun 2001 19:56:34 -0400 Subject: [Python-Dev] What should changing/setting __getattr__ (and similars) after classdef time do ? In-Reply-To: <001801c0ebbe$47b60a40$8a73fea9@newmexico> Message-ID: [Samuele Pedroni] > ... > The point is that jython implements the letter of the manual, and even > extends the caching opt to some other __magic__ methods. I wanted to > know the intended behaviour in order to fix that in jython. You got that one right the first time: this requires BDFL pronouncement! As semantically significant optimizations (the only reason for caching __getattr__, e.g.) creep into the code but the docs lag behind, it gets more and more unclear what's mandatory behavior and what's implementation-defined. This came up a couple weeks ago again in the context of what, exactly, rich comparisons are supposed to do in all cases. After poking holes in everything Guido wrote, he turned it around and told me to write up what I think it should say (which I have yet to do, as it's time-consuming and it appears some of the current CPython behavior is at least partly accidental -- but unclear exactly which parts). So don't be surprised if the same trick gets played on you ... From tim.one@home.com Sun Jun 3 05:04:57 2001 From: tim.one@home.com (Tim Peters) Date: Sun, 3 Jun 2001 00:04:57 -0400 Subject: [Python-Dev] another dict crasher In-Reply-To: Message-ID: [Michael Hudson] > Ah, like this one: > > dict = {} > > # let's force dict to malloc its table > for i in range(1,10): > dict[i] = i > > class Machiavelli2: > def __eq__(self, other): > dict.clear() > return 1 > def __hash__(self): > return 0 > > dict[Machiavelli2()] = Machiavelli2() > > print dict[Machiavelli2()] Told you it was easy . > I'll attach a patch, but it's another branch inside lookdict (though > not lookdict_string which is I guess the really performance sensitive > one). lookdict_string is crucial to Python's own performance. Dicts indexed by ints or class instances or ... are vital to other apps.
> Index: dictobject.c > =================================================================== > RCS file: /cvsroot/python/python/dist/src/Objects/dictobject.c,v > retrieving revision 2.100 > diff -c -1 -r2.100 dictobject.c > *** dictobject.c 2001/06/02 08:27:39 2.100 > --- dictobject.c 2001/06/02 11:36:47 > *************** > *** 273,274 **** > --- 273,281 ---- > cmp = > PyObject_RichCompareBool(ep->me_key, key, Py_EQ); > + if (ep0 != mp->ma_table) { > + PyErr_SetString(PyExc_RuntimeError, > + "dict resized on > comparison"); > + ep = mp->ma_table; > + while (ep->me_value) ep++; > + return ep; > + } > if (cmp > 0) { > *************** > *** 310,311 **** > --- 317,325 ---- > cmp = > PyObject_RichCompareBool(ep->me_key, key, Py_EQ); > + if (ep0 != mp->ma_table) { > + PyErr_SetString(PyExc_RuntimeError, > + "dict resized on > comparison"); > + ep = mp->ma_table; > + while (ep->me_value) ep++; > + return ep; > + } > if (cmp > 0) { Then we have other problems. Note the comment before lookdict: Exceptions are never reported by this function, and outstanding exceptions are maintained. The patched code doesn't preserve that. Looking for "the first" unused or dummy slot isn't good enough either, as surely the user has the right to expect that after, e.g., d[m] = 1, d[m] retrieves 1. That is, picking a reusable slot "at random" doesn't respect the *semantics* of dict operations ("just because" the dict resized doesn't mean the key they're looking for went away!). It would be better in this case to go back to the top and start over. However, then an adversarial user can construct a case that never terminates. Unclear what to do. From tim.one@home.com Sun Jun 3 08:55:43 2001 From: tim.one@home.com (Tim Peters) Date: Sun, 3 Jun 2001 03:55:43 -0400 Subject: [Python-Dev] strop vs. string In-Reply-To: <20010602004005.F23560@lyra.org> Message-ID: [Greg Stein] > ... > I was saying that, at the Python level, using a loop and doing b[i] into > a buffer/string/unicode object would seem to be relatively rare. b[0] > and stuff is reasonably common. Well, at the Python level buffer objects seem never to be used, probably because all the people who know about them don't advertise it because it's an easy way to provoke core dumps now. I don't have any real objection to any way anyone wants to fix that, just so long as it gets fixed. >> I take that as "yes" to my "nobody cares about it enough to >> maintain it?". In that light, Guido's ambivalence is indeed >> surprising . > Eh? I'll maintain the thing, but you're confusing that with adding more > features into it. Different question. I haven't asked for new features, just that what's already there get fixed: Python-level buffer objects are unsafe, the docs remain incomplete, there's random stuff like file.readinto() that's not documented at all (could be that's the only one -- it's certainly "discovered" on c.l.py often enough, though), and there are no buffer tests in the std test suite. The work to introduce the type wasn't completed, nobody works on it, and finishing work 3 years late doesn't count as "new feature" in my book . From gstein@lyra.org Sun Jun 3 10:10:36 2001 From: gstein@lyra.org (Greg Stein) Date: Sun, 3 Jun 2001 02:10:36 -0700 Subject: [Python-Dev] strop vs. string In-Reply-To: ; from tim.one@home.com on Sun, Jun 03, 2001 at 03:55:43AM -0400 References: <20010602004005.F23560@lyra.org> Message-ID: <20010603021036.U23560@lyra.org> On Sun, Jun 03, 2001 at 03:55:43AM -0400, Tim Peters wrote: > [Greg Stein] > > ... 
> > I was saying that, at the Python level, using a loop and doing b[i] into > > a buffer/string/unicode object would seem to be relatively rare. b[0] > > and stuff is reasonably common. > > Well, at the Python level buffer objects seem never to be used, probably I'm talking about string objects and unicode objects, too. The point is that b[i] loops don't have to be all that speedy because it isn't used often. > because all the people who know about them don't advertise it because it's > an easy way to provoke core dumps now. Easy? Depends on what you use them with. >... > >> I take that as "yes" to my "nobody cares about it enough to > >> maintain it?". In that light, Guido's ambivalence is indeed > >> surprising . > > > Eh? I'll maintain the thing, but you're confusing that with adding more > > features into it. Different question. > > I haven't asked for new features, just that what's already there get fixed: > Python-level buffer objects are unsafe, the docs remain incomplete, I'll fix the code. > there's > random stuff like file.readinto() that's not documented at all (could be > that's the only one -- it's certainly "discovered" on c.l.py often enough, > though), Find another goat to screw for that one. I don't know anything about it. Hmm... Using the "annotate" feature of ViewCVS, I see that Guido added it. Go blame him if you want to scream about that function and its lack of doc. > and there are no buffer tests in the std test suite. The work to > introduce the type wasn't completed, nobody works on it, and finishing work > 3 years late doesn't count as "new feature" in my book . Now you're just being bothersome. You want all that stuff, then feel free. I'll volunteer to do the code. You can go beat some heads, or find other volunteers. I'll do the code fixing just to placate you, and to get all this ranting about the buffer object to quiet down, but not because I'm joyful to do it. not-cheers, -g -- Greg Stein, http://www.lyra.org/ From dgoodger@bigfoot.com Sun Jun 3 15:39:42 2001 From: dgoodger@bigfoot.com (David Goodger) Date: Sun, 03 Jun 2001 10:39:42 -0400 Subject: [Python-Dev] new PEP candidates Message-ID: I have just posted three related PEP candidates to the Doc-SIG: - PEP: Docstring Processing System Framework http://mail.python.org/pipermail/doc-sig/2001-June/001855.html - PEP: DPS Generic Implementation Details http://mail.python.org/pipermail/doc-sig/2001-June/001856.html - PEP: Docstring Conventions http://mail.python.org/pipermail/doc-sig/2001-June/001857.html These are all part of the newly created Python Docstring Processing System project, http://docstring.sf.net. Barry: Please assign PEP numbers to these if possible. Once PEP numbers have been assigned, I will post to comp.lang.python. Thanks. A related project is the second draft of reStructuredText, a docstring markup syntax definition. The project is http://structuredtext.sf.net, and I've posted the following to Doc-SIG: - An Introduction to reStructuredText http://mail.python.org/pipermail/doc-sig/2001-June/001858.html - Problems With StructuredText http://mail.python.org/pipermail/doc-sig/2001-June/001859.html - reStructuredText Markup Specification http://mail.python.org/pipermail/doc-sig/2001-June/001860.html - Python Extensions to the reStructuredText Markup Specification http://mail.python.org/pipermail/doc-sig/2001-June/001861.html I am not seeking PEP status for reStructuredText at this time; I think it's one step too far removed from the Python language to warrant a PEP. 
If you think it *should* be a PEP, I will be happy to convert it. -- David Goodger dgoodger@bigfoot.com Open-source projects: - Python Docstring Processing System: http://docstring.sf.net - reStructuredText: http://structuredtext.sf.net - The Go Tools Project: http://gotools.sf.net From mwh@python.net Sun Jun 3 22:47:48 2001 From: mwh@python.net (Michael Hudson) Date: 03 Jun 2001 22:47:48 +0100 Subject: [Python-Dev] another dict crasher In-Reply-To: "Tim Peters"'s message of "Sun, 3 Jun 2001 00:04:57 -0400" References: Message-ID: "Tim Peters" writes: > It would be better in this case to go back to the top and start > over. Yes. What you checked in is obviously better. I'll stick to being the bearer of bad tidings... > However, then an adversarial user can construct a case that never > terminates. I seem to have done this - it was odd, though - it only loops when I bump the dict to fairly enormous proportions for reasons I don't really (want to) understand. > Unclear what to do. Not worrying about it seems entirely reasonable - I now have sitting on my hard drive the weirdest way of spelling "while 1: pass" *I've* ever seen. and-I'll-stop-poking-holes-now-ly y'rs m. -- The rapid establishment of social ties, even of a fleeting nature, advance not only that goal but its standing in the uberconscious mesh of communal psychic, subjective, and algorithmic interbeing. But I fear I'm restating the obvious. -- Will Ware, comp.lang.python From tim.one@home.com Mon Jun 4 00:03:31 2001 From: tim.one@home.com (Tim Peters) Date: Sun, 3 Jun 2001 19:03:31 -0400 Subject: [Python-Dev] another dict crasher In-Reply-To: Message-ID: [Tim] >> It would be better in this case to go back to the top and start >> over. [Michael Hudson] > Yes. What you checked in is obviously better. I'll stick to being > the bearer of bad tidings... Hey, if it's fun, do whatever you want! If you hadn't provoked me, I would have let it slide. Guido only cares about the end result . >> However, then an adversarial user can construct a case that never >> terminates. > I seem to have done this - it was odd, though - it only loops when I > bump the dict to fairly enormous proportions for reasons I don't > really (want to) understand. Pass it on. I deliberately "started over" via a recursive call instead of a goto so that an offending program would eventually die with a stack fault instead of just running forever. So if you're seeing something run forever, it may be a different problem. >> Unclear what to do. > Not worrying about it seems entirely reasonable I don't think anyone is happy leaving an exploitable hole in Python -- we endure enormous pain to plug those. Except, I guess, for buffer objects . I simply haven't thought of a good and efficient way to plug this one. Implementing an "internal immutable dict" type appeals to me, but it conflicts with the fact that the affected routines believe to the core of their souls that exceptions raised during comparisons are to be ignored -- and raising a "hey, you can't change the dict *now*!" exception doesn't do the user any good if they never see it. Would plug the hole, but an *innocent* user would never know why their program failed to work as (probably) expected. From tim.one@home.com Mon Jun 4 01:38:53 2001 From: tim.one@home.com (Tim Peters) Date: Sun, 3 Jun 2001 20:38:53 -0400 Subject: [Python-Dev] strop vs.
string In-Reply-To: <20010603021036.U23560@lyra.org> Message-ID: [Tim] >> because all the people who know about them don't advertise it >> because it's an easy way to provoke core dumps now. [Greg Stein] > Easy? Depends on what you use them with. "Easy" and "depends" both, sure. I don't understand the argument: core dumps are always presumed to be errors in the Python implementation, not the users's fault. In this case, they are Python's fault by any accounting. On rare occasions we just give up and say "sorry, but we simply don't know a reasonable way fix it -- but it's still Python's fault" (for example, see the dict thread this weekend). >> I haven't asked for new features, just that what's already there get >> fixed: Python-level buffer objects are unsafe > I'll fix the code. Thank you! >> the docs remain incomplete, there's random stuff like file.readinto() >> that's not documented at all (could be that's the only one -- it's >> certainly "discovered" on c.l.py often enough, though), > Find another goat to screw for that one. I don't know anything about it. > > Hmm... Using the "annotate" feature of ViewCVS, I see that Guido > added it. Go blame him if you want to scream about that function and > its lack of doc. I don't care who added it: I haven't asked anyone specific to do anything. I've been asking whether *anyone* cares enough to address the backlog of buffer maintenance work. I don't even know who dreamed up the buffer object -- although at this point I bet I can guess . >> and there are no buffer tests in the std test suite. The work to >> introduce the type wasn't completed, nobody works on it, and >> finishing work 3 years late doesn't count as "new feature" in my book > Now you're just being bothersome. You bet. It's the same list of things I gave in my first msg; nobody volunteered to do any work then, so I repeated them. > You want all that stuff, then feel free. "All that stuff" is the minimum now required of new features. Buffers got in before Guido got tougher about this stuff, but if they're worth having at all then surely they're worth bringing up to current standards. > I'll volunteer to do the code. You can go beat some heads, or find other > volunteers. Anyone else care to chip in? > I'll do the code fixing just to placate you, and to get all this ranting > about the buffer object to quiet down, but not because I'm joyful > to do it. OK, I feel guitly -- but if that's enough to make you feel joyful again, the psychology here is just sick . From Barrett@stsci.edu Mon Jun 4 14:22:14 2001 From: Barrett@stsci.edu (Paul Barrett) Date: Mon, 04 Jun 2001 09:22:14 -0400 Subject: [Python-Dev] strop vs. string References: <3B1214B3.9A4C295D@lemburg.com> Message-ID: <3B1B8B86.68E99328@STScI.Edu> "M.-A. Lemburg" wrote: > > Tim Peters wrote: > > > > [Tim] > > > About combining strop and buffers and strings, don't forget > > > unicodeobject.c: that's got oodles of basically duplicate code too. > > > /F suggested dealing with the minor differences via maintaining one > > > code file that gets compiled multiple times w/ appropriate #defines. > > > > [MAL] > > > Hmm, that only saves us a few kB in source, but certainly not > > > in the object files. > > > > That's not the point. Manually duplicated code blocks always get out of > > synch, as people fix bugs in, or enhance, one of them but don't even know > > about the others. 
> > /F brought this up after I pissed away a few hours trying > > to repair one of these in all places, and he noted that strop.replace() and > > string.replace() are woefully inefficient anyway. > > Ok, so what we'd need is a bunch of generic low-level string > operations: one set for 8-bit and one for 16-bit code. > > Looking at unicodeobject.c it seems that the section "Helpers" would > be a good start, plus perhaps a few bits from the method implementations > refactored to form a low-level string template library. > > Perhaps we should move this code into > a file stringhelpers.h which then gets included by stringobject.c > and unicodeobject.c with appropriate #defines set up for > 8-bit strings and for Unicode. > > > > The better idea would be making the types subclass from a generic > > > abstract string object -- I just don't know how this will be > > > possible with Guido's type patches. We'll just have to wait, > > > I guess.

From the discussion so far, it appears that the buffer object is intended solely to support string-like objects. I've seen no mention of their use for binary data objects, such as multidimensional arrays and matrices. Will the buffer object also support these objects? If no, then I suggest it be renamed to one that is less generic and more descriptive. On the other hand, if yes, then I think the buffer C/API needs to be reimplemented, because the current design/implementation falls far short of what I would expect for a buffer object. First, it is overly complex: the support for multiple buffers does not appear necessary. Second, the dangling pointer issue has not been resolved. I suggest the addition of a lock flag which indicates that the data is currently inaccessible, i.e. that the data and/or data pointer is in the process of being modified. I would suggest the following structure to be much more useful for char and binary data:

typedef struct {
    char* rf_pointer;
    int   rf_length;
    int   rf_access;  /* read, write, etc. */
    int   rf_lock;    /* data is in use */
    int   rf_flags;   /* type of data; char, binary, unicode, etc. */
} PyBufferProcs;

But I'm guessing my proposal is way off base. If I find some time, I'll prepare a PEP to air these issues, since they are very important to those of us working on and with multidimensional arrays. We find the current buffer API lacking. -- Paul Barrett, PhD Space Telescope Science Institute Phone: 410-338-4475 ESS/Science Software Group FAX: 410-338-4767 Baltimore, MD 21218

From fdrake@acm.org Mon Jun 4 15:07:37 2001 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Mon, 4 Jun 2001 10:07:37 -0400 (EDT) Subject: [Python-Dev] strop vs. string In-Reply-To: <3B1B8B86.68E99328@STScI.Edu> References: <3B1214B3.9A4C295D@lemburg.com> <3B1B8B86.68E99328@STScI.Edu> Message-ID: <15131.38441.301314.46009@cj42289-a.reston1.va.home.com> Paul Barrett writes: > From the discussion so far, it appears that the buffer object is > intended solely to support string-like objects. I've seen no mention > of their use for binary data objects, such as multidimensional arrays > and matrices. Will the buffer object also support these objects? If > no, then I suggest it be renamed to one that is less generic and more > descriptive. In a development version of my bindings to a Type-1 font rasterizer, I exposed a buffer interface to the resulting image data. Unfortunately, that code was lost and I've not had time to work that up again.
I *think* that sort of thing was part of the intended application for the buffer interface, but I was not one of the "movers & shakers" for it, so I'm not entirely sure. > On the other hand, if yes, then I think the buffer C/API needs to be > reimplemented, because the current design/implementation falls far > short of what I would expect for a buffer object. First, it is overly > complex: the support for multiple buffers does not appear necessary. > Second, the dangling pointer issue has not been resolved. I suggest I agree. From the discussions I remember, I don't recall a clear explanation of the need for "segmented" buffers. But that may just be a failing of my recollection. > the addition of a lock flag which indicates that the data is currently > inaccessible, i.e. that the data and/or data pointer is in the process of > being modified. > > I would suggest the following structure to be much more useful for > char and binary data:
>
> typedef struct {
>     char* rf_pointer;
>     int   rf_length;
>     int   rf_access;  /* read, write, etc. */
>     int   rf_lock;    /* data is in use */
>     int   rf_flags;   /* type of data; char, binary, unicode, etc. */
> } PyBufferProcs;
I'm not sure about the "rf_flags" field -- I see two aspects that you seem to be describing, and wouldn't call either use a "flag". There's data type (characters, anonymous binary data, image data, etc.), and element size (1 byte, 2 bytes, variable width). Those values may or may not be associated with the specific buffer or the type implementing the buffer (I'd go with the specific buffer just to allow buffer types that support different flavors). > If I find some time, I'll prepare a PEP to air these issues, since > they are very important to those of us working on and with > multidimensional arrays. We find the current buffer API lacking. PEPs are good; I'll look forward to seeing it! -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations

From skip@pobox.com (Skip Montanaro) Mon Jun 4 17:29:53 2001 From: skip@pobox.com (Skip Montanaro) (Skip Montanaro) Date: Mon, 4 Jun 2001 11:29:53 -0500 Subject: [Python-Dev] readline not getting built when .so symlink doesn't exist Message-ID: <15131.46977.861815.323386@beluga.mojam.com> I recently upgraded to Mandrake 8.0. I find that the readline module is no longer getting built. When building, it builds rgbimg followed immediately by crypt. Readline, which is tested for in between, is not built. Apparently, it can't find one of the libraries required to build it. On my system, both readline and termcap are in /lib. Neither has a static version available and neither has a plain .so file available. The .so file always has a version number tacked onto the end:

% ls -l /lib/libtermcap* /lib/libreadline*
lrwxrwxrwx 1 root root     18 May 29 10:53 /lib/libreadline.so.4 -> libreadline.so.4.1
-rwxr-xr-x 1 root root 152440 Mar 25 01:26 /lib/libreadline.so.4.1
lrwxrwxrwx 1 root root     19 May 29 10:53 /lib/libtermcap.so.2 -> libtermcap.so.2.0.8
-rwxr-xr-x 1 root root  11608 Mar 26 10:32 /lib/libtermcap.so.2.0.8

If I create the necessary .so symlinks it builds okay. Perhaps this is a bug in Mandrake 8.0 (it wouldn't be the first one), but if it is valid for shared libraries to be installed with only a version-numbered .so file, then it seems to me that distutils ought to handle that. There are several programs in /usr/bin on my machine that seem to be dynamically linked to libreadline.
In addition, /usr/lib/python2.0/lib-dynload/readline.so exists, which suggests that the .so without a version number is valid as far as ld is concerned. Skip

From Greg.Wilson@baltimore.com Mon Jun 4 18:33:29 2001 From: Greg.Wilson@baltimore.com (Greg Wilson) Date: Mon, 4 Jun 2001 13:33:29 -0400 Subject: [Python-Dev] struct.getorder() ? Message-ID: <930BBCA4CEBBD411BE6500508BB3328F2E1F1D@nsamcanms1.ca.baltimore.com> The 'struct' module allows packing and unpacking orders to be specified, but doesn't provide a hook to report on the order used by the machine the script is running on. As I'm likely going to be using this module in future runs of my course, I'd like to add 'struct.getorder()', which would return either "<" or ">" (the characters used to signal little-endian and big-endian respectively). Does this duplicate something in some other standard module? Does it seem like a sensible idea? Thanks Greg

From fdrake@acm.org Mon Jun 4 18:42:28 2001 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Mon, 4 Jun 2001 13:42:28 -0400 (EDT) Subject: [Python-Dev] struct.getorder() ? In-Reply-To: <930BBCA4CEBBD411BE6500508BB3328F2E1F1D@nsamcanms1.ca.baltimore.com> References: <930BBCA4CEBBD411BE6500508BB3328F2E1F1D@nsamcanms1.ca.baltimore.com> Message-ID: <15131.51332.73137.795543@cj42289-a.reston1.va.home.com> Greg Wilson writes: > The 'struct' module allows packing and unpacking > orders to be specified, but doesn't provide a hook > to report on the order used by the machine the Python 2.0 introduced sys.byteorder; check it out: http://www.python.org/doc/current/lib/module-sys.html -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations

From Greg.Wilson@baltimore.com Mon Jun 4 18:41:45 2001 From: Greg.Wilson@baltimore.com (Greg Wilson) Date: Mon, 4 Jun 2001 13:41:45 -0400 Subject: [Python-Dev] struct.getorder() ? Message-ID: <930BBCA4CEBBD411BE6500508BB3328F2E1F1E@nsamcanms1.ca.baltimore.com> > Python 2.0 introduced sys.byteorder; check it out: > http://www.python.org/doc/current/lib/module-sys.html Woo hoo! Thanks, Fred --- should've guessed someone would be ahead of me :-). Greg

From barry@scottb.demon.co.uk Mon Jun 4 19:00:05 2001 From: barry@scottb.demon.co.uk (Barry Scott) Date: Mon, 4 Jun 2001 19:00:05 +0100 Subject: [Python-Dev] %b format? - 8088 like a PDP-11 I think not... In-Reply-To: <20010530183833.B1654@thyrsus.com> Message-ID: <000201c0ed20$2f295c30$060210ac@private> Eric wrote: > While I'm at it, I should note that the design of the 11 was ancestral > to both the 8088 and 68000 microprocessors, and thus to essentially > every new general-purpose computer designed in the last fifteen years. The key to the PDP-11 and VAX was lots of registers all alike and rich addressing modes for the instructions. The 8088 is very far from this design; it owes its design more to the 4004 than the PDP-11. However, the 68000 is closer, but not as nice to program, as there are too many special cases in its instruction set for my liking. Barry

From mwh@python.net Mon Jun 4 19:05:10 2001 From: mwh@python.net (Michael Hudson) Date: 04 Jun 2001 19:05:10 +0100 Subject: [Python-Dev] readline not getting built when .so symlink doesn't exist In-Reply-To: Skip Montanaro's message of "Mon, 4 Jun 2001 11:29:53 -0500" References: <15131.46977.861815.323386@beluga.mojam.com> Message-ID: Skip Montanaro writes: > I recently upgraded to Mandrake 8.0. I find that the readline > module is no longer getting built. When building, it builds rgbimg > followed immediately by crypt. Readline, which is tested for in > between, is not built. Apparently, it can't find one of the > libraries required to build it. On my system, both readline and > termcap are in /lib. Neither has a static version available and > neither has a plain .so file available. The .so file always has a > version number tacked onto the end:
>
> % ls -l /lib/libtermcap* /lib/libreadline*
> lrwxrwxrwx 1 root root     18 May 29 10:53 /lib/libreadline.so.4 -> libreadline.so.4.1
> -rwxr-xr-x 1 root root 152440 Mar 25 01:26 /lib/libreadline.so.4.1
> lrwxrwxrwx 1 root root     19 May 29 10:53 /lib/libtermcap.so.2 -> libtermcap.so.2.0.8
> -rwxr-xr-x 1 root root  11608 Mar 26 10:32 /lib/libtermcap.so.2.0.8
>
> If I create the necessary .so symlinks it builds okay.
>
> Perhaps this is a bug in Mandrake 8.0 (it wouldn't be the first
> one), but if it is valid for shared libraries to be installed with
> only a version-numbered .so file, then it seems to me that distutils
> ought to handle that.
Hmm. Does compiling a proggie

$ gcc foo.c -lreadline

work? It doesn't here if I move libreadline.so & libreadline.a out of the way. If the C compiler isn't going to find readline, there ain't much point distutils trying to find it... > There are several programs in /usr/bin on my machine that seem to be > dynamically linked to libreadline. Those things will be directly linked to libreadline.so.whatever; I believe the libfoo.so files are only for the (compile time) linker's benefit.
> In addition, /usr/lib/python2.0/lib-dynload/readline.so exists, > which suggests that the .so without a version number is valid as far > as ld is concerned. ld != ld.so. Do you need a readline-devel package or something? Cheers, M. -- It's actually a corruption of "starling". They used to be carried. Since they weighed a full pound (hence the name), they had to be carried by two starlings in tandem, with a line between them. -- Alan J Rosenthal explains "Pounds Sterling" on asr

From mwh@python.net Mon Jun 4 20:01:10 2001 From: mwh@python.net (Michael Hudson) Date: 04 Jun 2001 20:01:10 +0100 Subject: [Python-Dev] another dict crasher In-Reply-To: "Tim Peters"'s message of "Sun, 3 Jun 2001 19:03:31 -0400" References: Message-ID: "Tim Peters" writes: > >> However, then an adversarial user can construct a case that never > >> terminates. > > > I seem to have done this - it was odd, though - it only loops when I > > bump the dict to fairly enormous proportions for reasons I don't > > really (want to) understand. > > Pass it on. I deliberately "started over" via a recursive call instead of a > goto so that an offending program would eventually die with a stack fault > instead of just running forever. So if you're seeing something run forever, > it may be a different problem. I left it running overnight, and it terminated! (with a KeyError). I can't say I really understand what's going on, but I'm in Exam Hell at the moment (for the last time! Yippee!), so don't have any spare cycles to think about it hard. Anyway, this is what I was running:

dict = {}

# let's force dict to malloc its table
for i in range(1,10000):
    dict[i] = i

hashcode = 0

class Machiavelli2:
    def __eq__(self, other):
        global hashcode
        d2 = dict.copy()
        dict.clear()
        hashcode += 1
        for k,v in d2.items():
            dict[k] = v
        return 1
    def __hash__(self):
        return hashcode

dict[Machiavelli2()] = Machiavelli2()

print dict[Machiavelli2()]

If you thought my last test case was contrived, I look forward to you finding adjectives for this one... Cheers, M. -- (ps: don't feed the lawyers: they just lose their fear of humans) -- Peter Wood, comp.lang.lisp

From barry@digicool.com Mon Jun 4 20:42:34 2001 From: barry@digicool.com (Barry A. Warsaw) Date: Mon, 4 Jun 2001 15:42:34 -0400 Subject: [Python-Dev] Status of 2.0.1? Message-ID: <15131.58538.121723.671374@anthem.wooz.org> I've just fixed two buglets in the regression test suite for Python 2.0.1 (release20-maint branch). Now I get the following results from regrtest:

88 tests OK.
20 tests skipped: test_al test_audioop test_cd test_cl test_dbm test_dl test_gl test_imageop test_imgfile test_largefile test_linuxaudiodev test_minidom test_nis test_pyexpat test_rgbimg test_sax test_sunaudiodev test_timing test_winreg test_winsound

Has anybody else tested out the 2.0.1 branch on anything? I'm going to run some quick tests with Mailman 2.0.x on Python 2.0.1 over the next hour or so. I'm just wondering what's left to do for this release, and how I can help out. -Barry

From esr@thyrsus.com Mon Jun 4 21:11:14 2001 From: esr@thyrsus.com (Eric S. Raymond) Date: Mon, 4 Jun 2001 16:11:14 -0400 Subject: [Python-Dev] %b format? - 8088 like a PDP-11 I think not...
In-Reply-To: <000201c0ed20$2f295c30$060210ac@private>; from barry@scottb.demon.co.uk on Mon, Jun 04, 2001 at 07:00:05PM +0100 References: <20010530183833.B1654@thyrsus.com> <000201c0ed20$2f295c30$060210ac@private> Message-ID: <20010604161114.A20979@thyrsus.com> Barry Scott : > Eric wrote: > > While I'm at it, I should note that the design of the 11 was ancestral > > to both the 8088 and 68000 microprocessors, and thus to essentially > > every new general-purpose computer designed in the last fifteen years. > > The key to the PDP-11 and VAX was lots of registers all alike and rich > addressing modes for the instructions. > > The 8088 is very far from this design; it owes its design more to > the 4004 than the PDP-11. Yes, but the 4004 was designed as a sort of lobotomized imitation of the 65xx, which was descended from the 11. Admittedly, in the chain of transmission here were two stages of redesign so bad that the connection got really tenuous. -- Eric S. Raymond ...Virtually never are murderers the ordinary, law-abiding people against whom gun bans are aimed. Almost without exception, murderers are extreme aberrants with lifelong histories of crime, substance abuse, psychopathology, mental retardation and/or irrational violence against those around them, as well as other hazardous behavior, e.g., automobile and gun accidents." -- Don B. Kates, writing on statistical patterns in gun crime

From skip@pobox.com (Skip Montanaro) Mon Jun 4 21:49:07 2001 From: skip@pobox.com (Skip Montanaro) (Skip Montanaro) Date: Mon, 4 Jun 2001 15:49:07 -0500 Subject: [Python-Dev] readline not getting built when .so symlink doesn't exist In-Reply-To: References: <15131.46977.861815.323386@beluga.mojam.com> Message-ID: <15131.62531.595208.65994@beluga.mojam.com> [my readline woes snipped] Michael> Hmm. Does compiling a proggie Michael> $ gcc foo.c -lreadline Michael> work? It doesn't here if I move libreadline.so & libreadline.a Michael> out of the way. Yup, it does:

beluga:tmp% cc -o foo foo.c -lreadline -ltermcap
beluga:tmp% ./foo
>>sdfsdfsdf
sdfsdfsdf

(This after deleting both /lib/libreadline.so and /lib/libhistory.so.) In this case, foo.c is

#include <stdio.h>
#include <readline/readline.h>
#include <readline/history.h>

main()
{
    printf("%s\n", readline(">>"));
}

Michael> Do you need a readline-devel package or something? Got that. I just noticed that "rpm -q --whatprovides /lib/libreadline.so" does list readline-devel as the provider. I just reinstalled it using --force. Now the .so symlinks are there. Go figure... Oh well, probably ought to drop it unless another Mandrake user complains. I'm really amazed at how many packages Mandrake chose *not* to install even though I selected all the groups during install and was installing into fresh / and /usr partitions. I've been dribbling various packages in bit-by-bit as I've discovered omissions. In the past I've also noticed files apparently not installed even though the packages that were supposed to provide them were installed. Skip

From guido@digicool.com Mon Jun 4 22:03:35 2001 From: guido@digicool.com (Guido van Rossum) Date: Mon, 04 Jun 2001 17:03:35 -0400 Subject: [Python-Dev] Re: What happened to Idle's extend.py? In-Reply-To: Your message of "Tue, 29 May 2001 02:15:07 EDT." References: Message-ID: <200106042103.RAA04077@cj20424-a.reston1.va.home.com> > > Idle-0.3, shipped with Python 1.5.2 had an extend.py module that was > > used to extend Idle. We've used this extensively, building entire > > "applications" as Idle extensions.
> > > > Now that we're moving to Python 2.1, we find the same old directions > > for extending Idle (in extend.txt), but there appears to be no > > extend.py in Idle-0.8. > > > > Does anyone know how we can add extensions to Idle-0.8? It's simpler than before. Extensions are now loaded simply by being named in config.txt (or any of the other custom configuration files). For example, ZoomHeight.py is a very simple extension; it is loaded because of the line [ZoomHeight] somewhere in config.txt. The interface for extensions is the same as before; ZoomHeight.py hasn't changed since 1999. I'll update extend.txt. Can someone forward this to the original asker of the question, or to the list where it was posted? --Guido van Rossum (home page: http://www.python.org/~guido/) From skip@pobox.com (Skip Montanaro) Mon Jun 4 22:03:58 2001 From: skip@pobox.com (Skip Montanaro) (Skip Montanaro) Date: Mon, 4 Jun 2001 16:03:58 -0500 Subject: [Python-Dev] %b format? - 8088 like a PDP-11 I think not... In-Reply-To: <20010604161114.A20979@thyrsus.com> References: <20010530183833.B1654@thyrsus.com> <000201c0ed20$2f295c30$060210ac@private> <20010604161114.A20979@thyrsus.com> Message-ID: <15131.63422.695297.393477@beluga.mojam.com> Eric> Yes, but the 4004 was designed as a sort of lobotomized imitation Eric> of the 65xx, which was descended from the 11. Really? I was always under the impression the 4004 was considered the first microprocessor. The page below says that and gives a date of 1971 for it. I have no idea if the author is correct, just that what he says agrees with my memory. He does seem to have an impressive collection of old computer iron: http://www.piercefuller.com/collect/i4004/ I haven't found a statement about the origins of the 6502, but this page suggests that commercial computers were being made from 8080's before 6502's: http://www.speer.org/2backup/pcbs_pch.html Ah, wait a minute... This page: http://www.geocities.com/SiliconValley/Byte/6508/6502/english/versoes.htm says the 6502 was descended from the 6800. I'm getting less and less convinced that the 4004 somehow descended from the 65xx family. (Maybe we should shift this thread to the always entertaining folks at comp.arch... ;-) Skip From esr@thyrsus.com Mon Jun 4 22:19:08 2001 From: esr@thyrsus.com (Eric S. Raymond) Date: Mon, 4 Jun 2001 17:19:08 -0400 Subject: [Python-Dev] %b format? - 8088 like a PDP-11 I think not... In-Reply-To: <15131.63422.695297.393477@beluga.mojam.com>; from skip@pobox.com on Mon, Jun 04, 2001 at 04:03:58PM -0500 References: <20010530183833.B1654@thyrsus.com> <000201c0ed20$2f295c30$060210ac@private> <20010604161114.A20979@thyrsus.com> <15131.63422.695297.393477@beluga.mojam.com> Message-ID: <20010604171908.A21831@thyrsus.com> Skip Montanaro : > Really? I was always under the impression the 4004 was considered the first > microprocessor. The page below says that and gives a date of 1971 for it. First sentence is widely believed, but there was an earlier micro called the Star-8 designed at Burroughs that has been almost completely forgotten. I only know about it because I worked there in 1980 with one of the people who designed it. I think I had a brain fart and it's the Z80 that was descended from the 6502. I was going by a remark in some old lecture notes. I've got a copy of the definitive reference on history of computer architecture and will check. -- Eric S. Raymond "Extremism in the defense of liberty is no vice; moderation in the pursuit of justice is no virtue." 
-- Barry Goldwater (actually written by Karl Hess) From mwh@python.net Mon Jun 4 22:55:34 2001 From: mwh@python.net (Michael Hudson) Date: 04 Jun 2001 22:55:34 +0100 Subject: [Python-Dev] readline not getting built when .so symlink doesn't exist In-Reply-To: Skip Montanaro's message of "Mon, 4 Jun 2001 15:49:07 -0500" References: <15131.46977.861815.323386@beluga.mojam.com> <15131.62531.595208.65994@beluga.mojam.com> Message-ID: Skip Montanaro writes: > [my readline woes snipped] > > Michael> Hmm. Does compiling a proggie > > Michael> $ gcc foo.c -lreadline > > Michael> work? It doesn't here if I move libreadline.so & libreadline.a > Michael> out of the way. > > Yup, it does: > > beluga:tmp% cc -o foo foo.c -lreadline -ltermcap > beluga:tmp% ./foo > >>sdfsdfsdf > sdfsdfsdf > > (This after deleting both /lib/libreadline.so and /lib/libhistory.so.) Odd. What does the output of $ gcc -o foo foo.c -lreadline -ltermcap -Wl,--verbose look like? In particular the bit at the end where you get things like: attempt to open /usr/lib/gcc-lib/i386-redhat-linux/2.95.1/libreadline.so failed attempt to open /usr/lib/gcc-lib/i386-redhat-linux/2.95.1/libreadline.a failed attempt to open /usr/i386-redhat-linux/lib/libreadline.so failed attempt to open /usr/i386-redhat-linux/lib/libreadline.a failed attempt to open /usr/bin/../lib/libreadline.so succeeded -lreadline (/usr/bin/../lib/libreadline.so) (this is more for my personal curiosity than any important reason). > Got that. I just noticed that "rpm -q --whatprovides /lib/libreadline.so" > does list readline-devel as the provider. I just reinstalled it using > --force. Now the .so symlinks are there. Go figure... No :-) > Oh well, probably ought to drop it unless another Mandrake user complains. Sounds reasonable. Cheers, M. -- After a heavy night I travelled on, my face toward home - the comma being by no means guaranteed. -- paraphrased from cam.misc From tim.one@home.com Mon Jun 4 22:58:48 2001 From: tim.one@home.com (Tim Peters) Date: Mon, 4 Jun 2001 17:58:48 -0400 Subject: [Python-Dev] Re: What happened to Idle's extend.py? In-Reply-To: <200106042103.RAA04077@cj20424-a.reston1.va.home.com> Message-ID: [Guido] > Can someone forward this to the original asker of the question, or to > the list where it was posted? Done. Thanks! From skip@pobox.com (Skip Montanaro) Tue Jun 5 02:01:01 2001 From: skip@pobox.com (Skip Montanaro) (Skip Montanaro) Date: Mon, 4 Jun 2001 20:01:01 -0500 Subject: [Python-Dev] readline not getting built when .so symlink doesn't exist In-Reply-To: References: <15131.46977.861815.323386@beluga.mojam.com> <15131.62531.595208.65994@beluga.mojam.com> Message-ID: <15132.12109.914981.110774@beluga.mojam.com> >> (This after deleting both /lib/libreadline.so and >> /lib/libhistory.so.) Michael> Odd. What does the output of Michael> $ gcc -o foo foo.c -lreadline -ltermcap -Wl,--verbose Michael> look like? Well, what it looks like is "Skip's a dunce...". Turns out there was a libreadline.so symlink /usr/lib also. It found that. When I deleted that it found /usr/lib/libreadline.a. Getting rid of that caused the link to (finally) fail. With just the version-based .so files cc apparently can't do the trick. Sorry to have wasted the bandwidth. Skip From skip@pobox.com (Skip Montanaro) Tue Jun 5 02:16:00 2001 From: skip@pobox.com (Skip Montanaro) (Skip Montanaro) Date: Mon, 4 Jun 2001 20:16:00 -0500 Subject: [Python-Dev] %b format? - 8088 like a PDP-11 I think not... 
In-Reply-To: <20010604171908.A21831@thyrsus.com> References: <20010530183833.B1654@thyrsus.com> <000201c0ed20$2f295c30$060210ac@private> <20010604161114.A20979@thyrsus.com> <15131.63422.695297.393477@beluga.mojam.com> <20010604171908.A21831@thyrsus.com> Message-ID: <15132.13008.429800.585157@beluga.mojam.com> Eric> Skip Montanaro : >> Really? I was always under the impression the 4004 was considered >> the first microprocessor. The page below says that and gives a date >> of 1971 for it. Eric> First sentence is widely believed, but there was an earlier micro Eric> called the Star-8 designed at Burroughs that has been almost Eric> completely forgotten. There was also a GE-8 (I think that was the name) developed at GE's R&D Center in the early 1970's timeframe - long before my time there. It was apparently very competitive with the other microprocessors produced about that time but never saw the light of day. I suspect that was at least due in part to the fact that GE built mainframes back then. Skip

From tim.one@home.com Tue Jun 5 05:07:27 2001 From: tim.one@home.com (Tim Peters) Date: Tue, 5 Jun 2001 00:07:27 -0400 Subject: [Python-Dev] another dict crasher In-Reply-To: Message-ID: [Michael Hudson, taking a break from exams] > I left it running overnight, and it terminated! (with a KeyError). I > can't say I really understand what's going on, but I'm in Exam Hell at > the moment (for the last time! Yippee!), so don't have any spare > cycles to think about it hard. Good luck! I really shouldn't tell you this now, but the real reason people dread turning 30, 40, 50, 60-- and so on --is that every 10th birthday starting at 30 they test you *again*! On every course you ever took. It's grueling. The penalty for failure is severe: flunk just one review exam, and they pick a date at random over the following 10 years for you to die. No point fighting it, it's just civilization's nasty little secret. This is why life expectancy correlates with education, but it does appear that the human limit for remembering both plane geometry and the names of hundreds of dead psychopaths is about 120 years. In the meantime, I built a test case to tickle stack overflow directly, and it does so quickly:

class Yuck:
    def __init__(self):
        self.i = 0

    def make_dangerous(self):
        self.i = 1

    def __hash__(self):
        # direct to slot 4 in table of size 8; slot 12 when size 16
        return 4 + 8

    def __eq__(self, other):
        if self.i == 0:
            # leave dict alone
            pass
        elif self.i == 1:
            # fiddle to 16 slots
            self.__fill_dict(6)
            self.i = 2
        else:
            # fiddle to 8 slots
            self.__fill_dict(4)
            self.i = 1
        return 1

    def __fill_dict(self, n):
        self.i = 0
        dict.clear()
        for i in range(n):
            dict[i] = i
        dict[self] = "OK!"

y = Yuck()
dict = {y: "OK!"}

z = Yuck()
y.make_dangerous()
print dict[z]

It just arranges to move y to a different slot in a different-sized table each time __eq__ is invoked, alternating between slot 4 in a size-8 table and slot 12 in a size-16 table. However, if I stick "print self.i" at the start of __eq__, it dies with a KeyError instead! That's why I'm mentioning it -- could be the same misdirection you're seeing. I can't account for the KeyError in any rational way: under Windows, it's actually hitting a stack overflow in the bowels of the system malloc() then. Windows "recovers" from that and presses on. Everything that happens after appears to be an accident. win98-as-usual-ly y'rs - tim PS: You'll be tested on this, too.
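(To make the mechanism being discussed concrete, here is a toy pure-Python model of the probe loop. Everything in it is invented for illustration -- ToyDict, the linear probing, a Python list standing in for the C slot table; the real logic is lookdict() in Objects/dictobject.c and differs in many details:)

class ToyDict:
    def __init__(self, size=8):
        self.table = [None] * size   # each slot: None or a (hash, key, value) tuple
        self.used = 0

    def _grow(self):
        # Resizing replaces self.table with a fresh list, which is exactly
        # what an in-flight lookup checks for.  Reinserting never calls a
        # user __eq__, because we only probe for empty slots here.
        old = self.table
        self.table = [None] * (2 * len(old))
        for slot in old:
            if slot is not None:
                i = slot[0] % len(self.table)
                while self.table[i] is not None:
                    i = (i + 1) % len(self.table)
                self.table[i] = slot

    def _lookup(self, key, h):
        table = self.table           # remember which table we started probing
        n = len(table)
        i = h % n
        while 1:
            slot = table[i]
            if slot is None:
                return i, None       # empty slot: key is absent
            if slot[0] == h:
                equal = (slot[1] == key)   # may run arbitrary user __eq__!
                if self.table is not table:
                    # The comparison resized the dict out from under us:
                    # start the whole lookup over.  Doing it with a
                    # recursive call rather than a goto means a hostile
                    # __eq__ eventually dies with a stack fault instead
                    # of running forever.
                    return self._lookup(key, h)
                if equal:
                    return i, slot
            i = (i + 1) % n          # toy linear probing; CPython does better

    def __setitem__(self, key, value):
        i, slot = self._lookup(key, hash(key))
        self.table[i] = (hash(key), key, value)
        if slot is None:
            self.used += 1
            if 2 * self.used > len(self.table):
                self._grow()

    def __getitem__(self, key):
        i, slot = self._lookup(key, hash(key))
        if slot is None:
            raise KeyError, key
        return slot[2]

if __name__ == "__main__":
    d = ToyDict()
    for i in range(10):
        d[i] = i * i
    print d[7]   # prints 49

Fed a key whose __eq__ keeps forcing resizes, in the spirit of the Yuck class above, this model just recurses in _lookup() until the interpreter runs out of stack, which is the failure mode Tim says he chose deliberately.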
From greg@cosc.canterbury.ac.nz Tue Jun 5 06:00:30 2001 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Tue, 05 Jun 2001 17:00:30 +1200 (NZST) Subject: [Python-Dev] One more dict trick In-Reply-To: <20010601032316.A15635@thyrsus.com> Message-ID: <200106050500.RAA02362@s454.cosc.canterbury.ac.nz> "Eric S. Raymond" : > I think it's significant that MMX > instructions and so forth entered the Intel line to support *games*, > not Navier-Stokes calculations. But when version 1.0 of FlashFlood! comes out, requiring high-quality real-time hydrodynamics simulation, Navier-Stokes calculations will suddenly become very important... Greg Ewing, Computer Science Dept, University of Canterbury, Christchurch, New Zealand | A citizen of NewZealandCorp, a wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz

From tim.one@home.com Tue Jun 5 06:18:50 2001 From: tim.one@home.com (Tim Peters) Date: Tue, 5 Jun 2001 01:18:50 -0400 Subject: [Python-Dev] strop vs. string In-Reply-To: <3B1B8B86.68E99328@STScI.Edu> Message-ID: [Paul Barrett] > From the discussion so far, it appears that the buffer object is > intended solely to support string-like objects. Unsure where that impression came from. Since buffers wrap a slice "of memory", they don't make much sense except where raw memory makes sense. That includes the guts of strings, but also (in the core distribution) memory-mapped files (the mmap module) and arrays (the array module), which also support the buffer interface. > I've seen no mention of their use for binary data objects, I mentioned two above. The use of buffers with mutable objects is dangerous, though, because of the dangling-pointer problem, and Python itself never uses buffers except for strings. Even arrays are stretching it; e.g.,

>>> import array
>>> a = array.array('i')
>>> a.append(2)
>>> a.append(3)
>>> a
array('i', [2, 3])
>>> b = buffer(a)
>>> len(b)
8
>>> [b[i] for i in range(len(b))]
['\x02', '\x00', '\x00', '\x00', '\x03', '\x00', '\x00', '\x00']
>>>

While of *some* conceivable use, that's not exactly destined to become wildly popular. > such as multidimensional arrays and matrices. Since core Python has no such things, of course it doesn't use buffers for those either. > Will the buffer object also support these objects? In what sense? If you have an implementation of such things, and believe that getting at raw memory slices is useful, sure -- fill in its tp_as_buffer slot. > ... > On the other hand, if yes, then I think the buffer C/API needs to be > reimplemented, Or do you mean redesigned? > because the current design/implementation falls far > short of what I would expect for a buffer object. First, it is overly > complex: the support for multiple buffers does not appear necessary. AFAICT it's entirely unused; everything in the core that supports the buffer interface returns a segment count of 1, and the buffer object itself appears to raise exceptions whenever it sees a reference to a segment other than "the first". I don't know why it's there. > Second, the dangling pointer issue has not been resolved. I expect Greg will fix that now. > I suggest the addition of a lock flag which indicates that the data is > currently inaccessible, i.e. that the data and/or data pointer is in the > process of being modified. To sell that (but please save it for the PEP) I expect you have to provide some compelling uses for it. The current uses have no need of it.
In the absence of specific good uses, I'm afraid it just sounds like another variant of "I can't prove segments *won't* be useful, so let's toss them in too!". > I would suggest the following structure to be much more useful for > char and binary data: > > typedef struct { > char* rf_pointer; > int rf_length; > int rf_access; /* read, write, etc. */ > int rf_lock; /* data is in use */ > int rf_flags; /* type of data; char, binary, unicode, etc. */ > } PyBufferProcs; > > But I'm guessing my proposal is way off base. Depends on what you want to do. You've only mentioned multidimensional arrays, and the need for umpteen flavors of access control there, beyond the current object's b_readonly flag, is simply unclear. Also unclear why you've dropped the current object's b_base pointer: without it, the buffer has no way to get back to the object from which the memory is borrowed, nor even a guarantee that the object won't die while the buffer is still active. If you do pursue this, please please please boost the rf_length field! An int is too small to hold real-life sizes anymore, and "large files" are becoming common even on 32-bit boxes. Python needs to grow a wholly supported way to pass 8-byte ints around (and it looks like I'll be adding that to the struct module, possibly to the array module and marshal too). > If I find some time, I'll prepare a PEP to air these issues, since > they are very important to those of us working on and with > multidimensional arrays. We find the current buffer API lacking. A PEP is always a good idea. From aahz@rahul.net Tue Jun 5 06:41:28 2001 From: aahz@rahul.net (Aahz Maruch) Date: Mon, 4 Jun 2001 22:41:28 -0700 (PDT) Subject: [Python-Dev] strop vs. string In-Reply-To: from "Tim Peters" at Jun 05, 2001 01:18:50 AM Message-ID: <20010605054129.933C199C83@waltz.rahul.net> Tim Peters wrote: > > If you do pursue this, please please please boost the rf_length field! An > int is too small to hold real-life sizes anymore, and "large files" are > becoming common even on 32-bit boxes. Python needs to grow a wholly > supported way to pass 8-byte ints around (and it looks like I'll be adding > that to the struct module, possibly to the array module and marshal too). Hey! Are you discriminating against 128-bit ints? -- --- Aahz (@pobox.com) Hugs and backrubs -- I break Rule 6 <*> http://www.rahul.net/aahz/ Androgynous poly kinky vanilla queer het Pythonista I don't really mind a person having the last whine, but I do mind someone else having the last self-righteous whine. From tim.one@home.com Tue Jun 5 06:53:26 2001 From: tim.one@home.com (Tim Peters) Date: Tue, 5 Jun 2001 01:53:26 -0400 Subject: [Python-Dev] One more dict trick In-Reply-To: <20010601032316.A15635@thyrsus.com> Message-ID: [Eric S. Raymond] > ... > So maybe there's a market for 128-bit floats after all. I think very small. There's a much larger market for 128-bit float *registers*, though -- in the "treat it as 2 64-bit, or 4 32-bit, floats, and operate on them in parallel" sense. That's the baby vector register view, and is already happening. > I'm still skeptical about how likely those applications are to > influence the architecture of general-purpose processors. I saw a > study once that said heavy-duty scientific floating point only > accounts for about 2% of the computing market -- and I think it's > significant that MMX instructions and so forth entered the Intel > line to support *games*, not Navier-Stokes calculations. Heh. 
I used to wonder about that, but not any more: games may have no more than entertainment (sometimes disguised as education ) in mind, but what do the latest & greatest games do? Strive to simulate physical reality (sometimes with altered physical laws), just as closely as possible. Whether it's ray-tracing, effective motion-compression, or N-body simulations, games are easily as demanding as what computational chemists do. A difference is that general-purpose *compilers* aren't being taught how to use these "new" architectural gimmicks. All that new hardware sits unused unless you've got an app dipping into assembler, or into a hand-coded utility library written in assembler. The *general* market for pure floating-point can barely support what's left of the supercomputer industry anymore (btw, Cray never became a billion-dollar company even in its heyday, and what's left of them gets passed around for peanuts now). > That 2% will have to get a lot bigger before I can see Intel doubling > its word size again. It's not just the processor design; the word size > has huge implications for buses, memory controllers, and the whole > system architecture. Intel is just now getting its foot wet with with 64-bit boxes. That was old news to me 20 years ago. All I hope to see 20 years from now is that somewhere along the way I got smart enough to drop computers and get a real life . by-then-the-whole-system-will-exist-in-the-superposition-of-a- single-plutonium-atom's-states-anyway-ly y'rs - tim From tim.one@home.com Tue Jun 5 06:55:48 2001 From: tim.one@home.com (Tim Peters) Date: Tue, 5 Jun 2001 01:55:48 -0400 Subject: [Python-Dev] strop vs. string In-Reply-To: <20010605054129.933C199C83@waltz.rahul.net> Message-ID: [Aahz] > Hey! Are you discriminating against 128-bit ints? Nope! I'm Guido's marketing guy: 128-bit ints will be the killer reason you need to upgrade to Python 3000, when the time comes. Python didn't get to where it is by giving away all the good stuff early . From MarkH@ActiveState.com Tue Jun 5 08:10:53 2001 From: MarkH@ActiveState.com (Mark Hammond) Date: Tue, 5 Jun 2001 17:10:53 +1000 Subject: [Python-Dev] strop vs. string In-Reply-To: <3B1B8B86.68E99328@STScI.Edu> Message-ID: > complex: the support for multiple buffers does not appear necessary. I seem to recall Guido telling me once that this was implemented for NumPy, specifically for some of their matrices. Not being a user of that package means that unfortunately I can not be any more specific... I am confident Guido will recall the specific details... Mark. From mwh@python.net Tue Jun 5 09:39:24 2001 From: mwh@python.net (Michael Hudson) Date: Tue, 5 Jun 2001 09:39:24 +0100 (BST) Subject: [Python-Dev] another dict crasher In-Reply-To: Message-ID: Haven't run your example yet as my machine's not on at the moment. On Tue, 5 Jun 2001, Tim Peters wrote: > However, if I stick "print self.i" at the start of __eq__, it dies > with a KeyError instead! That's why I'm mentioning it -- could be the > same misdirection you're seeing. I can't account for the KeyError in > any rational way: under Windows, it's actually hitting a stack > overflow in the bowels of the system malloc() then. Hmm. It's quite likely that PyMem_Malloc (or whatever) crapping out and returning NULL will get turned into a MemoryError, which will then get turned into a KeyError, isn't it? I could believe that malloc would set up some fancy sigsegv-type handlers for memory management purposes which then get called when it tramples all over the end of the stack. 
But I'm making this up as I go along... > Windows "recovers" from that and presses on. Everything that happens > after appears to be an accident. > > win98-as-usual-ly y'rs - tim Well, linux seems to be similarly inscrutable here. One problem is that this is a pig to run under the debugger - setting a breakpoint on lookdict isn't terribly interesting way to spend your time. I suppose you could just set the breakpoint on the recursive call... later. > PS: You'll be tested on this, too . Oh, piss off . Cheers, M. From guido@digicool.com Tue Jun 5 10:07:34 2001 From: guido@digicool.com (Guido van Rossum) Date: Tue, 05 Jun 2001 05:07:34 -0400 Subject: [Python-Dev] Happy event Message-ID: <200106050907.FAA08198@cj20424-a.reston1.va.home.com> I just wanted to send a note about a happy event in the Python family. Jeremy Hylton and his wife became the proud parents of twin girls on Sunday June 3rd. Please join Pythonlabs and Digital Creations in congratulating them, and wishing them much joy and luck. Also, don't expect Jeremy to be too responsive to email for the next 6-8 weeks. :) --Guido van Rossum (home page: http://www.python.org/~guido/) From uche.ogbuji@fourthought.com Tue Jun 5 13:28:45 2001 From: uche.ogbuji@fourthought.com (Uche Ogbuji) Date: Tue, 05 Jun 2001 06:28:45 -0600 Subject: [Python-Dev] One more dict trick In-Reply-To: Message from Greg Ewing of "Tue, 05 Jun 2001 17:00:30 +1200." <200106050500.RAA02362@s454.cosc.canterbury.ac.nz> Message-ID: <200106051228.f55CSjk18336@localhost.local> > "Eric S. Raymond" : > > > I think it's significant that MMX > > instructions and so forth entered the Intel line to support *games*, > > not Navier-Stokes calculations. > > But when version 1.0 of FlashFlood! comes out, requiring > high-quality real-time hydrodynamics simulation, > Navier-Stokes calculations will suddenly become very > important... Shoot, I thought that was what Microsoft Hailstorm was all about. Path integrals about the atmospheric isobars, and all that... -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA XML strategy, XML tools (http://4Suite.org), knowledge management From uche.ogbuji@fourthought.com Tue Jun 5 13:32:07 2001 From: uche.ogbuji@fourthought.com (Uche Ogbuji) Date: Tue, 05 Jun 2001 06:32:07 -0600 Subject: [Python-Dev] Happy event In-Reply-To: Message from Guido van Rossum of "Tue, 05 Jun 2001 05:07:34 EDT." <200106050907.FAA08198@cj20424-a.reston1.va.home.com> Message-ID: <200106051232.f55CW7618353@localhost.local> > I just wanted to send a note about a happy event in the Python family. > Jeremy Hylton and his wife became the proud parents of twin girls on > Sunday June 3rd. Please join Pythonlabs and Digital Creations in > congratulating them, and wishing them much joy and luck. > > Also, don't expect Jeremy to be too responsive to email for the next > 6-8 weeks. :) *twin* girls? Try 6-8 years. Congrats and felicits of the highest order, of course, Jeremy. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. 
C, Boulder, CO 80301-2537, USA XML strategy, XML tools (http://4Suite.org), knowledge management From Barrett@stsci.edu Tue Jun 5 13:53:46 2001 From: Barrett@stsci.edu (Paul Barrett) Date: Tue, 05 Jun 2001 08:53:46 -0400 Subject: [Python-Dev] Happy event References: <200106051232.f55CW7618353@localhost.local> Message-ID: <3B1CD65A.595E8CD@STScI.Edu> Uche Ogbuji wrote: > > > I just wanted to send a note about a happy event in the Python family. > > Jeremy Hylton and his wife became the proud parents of twin girls on > > Sunday June 3rd. Please join Pythonlabs and Digital Creations in > > congratulating them, and wishing them much joy and luck. > > > > Also, don't expect Jeremy to be too responsive to email for the next > > 6-8 weeks. :) > > *twin* girls? Try 6-8 years. > > Congrats and felicits of the highest order, of course, Jeremy. Actually girls are fine until about 13, after that I expect Jeremy won't be too responsive. Something about hormones and such. In any case, all the best, Jeremy! -- Paul Barrett, PhD Space Telescope Science Institute Phone: 410-338-4475 ESS/Science Software Group FAX: 410-338-4767 Baltimore, MD 21218 From aahz@rahul.net Tue Jun 5 15:41:10 2001 From: aahz@rahul.net (Aahz Maruch) Date: Tue, 5 Jun 2001 07:41:10 -0700 (PDT) Subject: [Python-Dev] Happy event In-Reply-To: <3B1CD65A.595E8CD@STScI.Edu> from "Paul Barrett" at Jun 05, 2001 08:53:46 AM Message-ID: <20010605144110.DD90C99C84@waltz.rahul.net> Paul Barrett wrote: > Uche Ogbuji wrote: >> Guido: >>> >>> Also, don't expect Jeremy to be too responsive to email for the next >>> 6-8 weeks. :) >> >> *twin* girls? Try 6-8 years. > > Actually girls are fine until about 13, after that I expect Jeremy > won't be too responsive. Something about hormones and such. Are you trying to imply that there's a difference between girls and boys? compressing-a-five-screen-rant-down-to-a-single-sentence-ly y'rs -- --- Aahz (@pobox.com) Hugs and backrubs -- I break Rule 6 <*> http://www.rahul.net/aahz/ Androgynous poly kinky vanilla queer het Pythonista I don't really mind a person having the last whine, but I do mind someone else having the last self-righteous whine. From esr@thyrsus.com Tue Jun 5 15:55:59 2001 From: esr@thyrsus.com (Eric S. Raymond) Date: Tue, 5 Jun 2001 10:55:59 -0400 Subject: [Python-Dev] Happy event In-Reply-To: <20010605144110.DD90C99C84@waltz.rahul.net>; from aahz@rahul.net on Tue, Jun 05, 2001 at 07:41:10AM -0700 References: <3B1CD65A.595E8CD@STScI.Edu> <20010605144110.DD90C99C84@waltz.rahul.net> Message-ID: <20010605105559.A28963@thyrsus.com> Aahz Maruch : > Paul Barrett wrote: > > Uche Ogbuji wrote: > >> Guido: > >>> > >>> Also, don't expect Jeremy to be too responsive to email for the next > >>> 6-8 weeks. :) > >> > >> *twin* girls? Try 6-8 years. > > > > Actually girls are fine until about 13, after that I expect Jeremy > > won't be too responsive. Something about hormones and such. > > Are you trying to imply that there's a difference between girls and > boys? Of course there's a difference. Girls, er, *mature* sooner. Congratulations, Jeremy! -- Eric S. Raymond If I were to select a jack-booted group of fascists who are perhaps as large a danger to American society as I could pick today, I would pick BATF [the Bureau of Alcohol, Tobacco, and Firearms]. -- U.S. 
Representative John Dingell, 1980

From Samuele Pedroni Tue Jun 5 16:05:03 2001 From: Samuele Pedroni (Samuele Pedroni) Date: Tue, 5 Jun 2001 17:05:03 +0200 (MET DST) Subject: [Python-Dev] Happy event Message-ID: <200106051505.RAA24810@core.inf.ethz.ch> > Subject: Re: [Python-Dev] Happy event > From: aahz@rahul.net (Aahz Maruch) > Date: Tue, 5 Jun 2001 07:41:10 -0700 (PDT) > > Paul Barrett wrote: > > Uche Ogbuji wrote: > >> Guido: > >>> > >>> Also, don't expect Jeremy to be too responsive to email for the next > >>> 6-8 weeks. :) > >> > >> *twin* girls? Try 6-8 years. > > > > Actually girls are fine until about 13, after that I expect Jeremy > > won't be too responsive. Something about hormones and such. > > Are you trying to imply that there's a difference between girls and > boys? > > compressing-a-five-screen-rant-down-to-a-single-sentence-ly y'rs > -- The simple fact that we are still moving from the previous bad habit of considering them different to considering them equal itself implies and evolves differences. A neutral viewpoint would be: the N/S ratio between gender-physiological differences and the overall interpersonal differences is very big, at least when considering the whole personality and not single aspects. There is no established truth; we are just longing for equilibrium: in the actual transition phase boys and girls are under different kinds of cultural tensions related to self-identification, etc., and this makes for differences. regards, Samuele Pedroni.

From aahz@rahul.net Tue Jun 5 16:17:38 2001 From: aahz@rahul.net (Aahz Maruch) Date: Tue, 5 Jun 2001 08:17:38 -0700 (PDT) Subject: [Python-Dev] Happy event In-Reply-To: <20010605105559.A28963@thyrsus.com> from "Eric S. Raymond" at Jun 05, 2001 10:55:59 AM Message-ID: <20010605151739.3864199C83@waltz.rahul.net> Eric S. Raymond wrote: > Aahz Maruch : >> >> Are you trying to imply that there's a difference between girls and >> boys? > > Of course there's a difference. Girls, er, *mature* sooner. Not legally. -- --- Aahz (@pobox.com) Hugs and backrubs -- I break Rule 6 <*> http://www.rahul.net/aahz/ Androgynous poly kinky vanilla queer het Pythonista I don't really mind a person having the last whine, but I do mind someone else having the last self-righteous whine.

From esr@thyrsus.com Tue Jun 5 16:30:08 2001 From: esr@thyrsus.com (Eric S. Raymond) Date: Tue, 5 Jun 2001 11:30:08 -0400 Subject: [Python-Dev] Happy event In-Reply-To: <20010605151739.3864199C83@waltz.rahul.net>; from aahz@rahul.net on Tue, Jun 05, 2001 at 08:17:38AM -0700 References: <20010605105559.A28963@thyrsus.com> <20010605151739.3864199C83@waltz.rahul.net> Message-ID: <20010605113008.A29236@thyrsus.com> Aahz Maruch : > Eric S. Raymond wrote: > > Aahz Maruch : > >> > >> Are you trying to imply that there's a difference between girls and > >> boys? > > > > Of course there's a difference. Girls, er, *mature* sooner. > > Not legally. My point was that the hormone thing is likely to be an issue sooner with twin girls. Hey, Jeremy...fraternal or identical? -- Eric S. Raymond What is a magician but a practicing theorist?
-- Obi-Wan Kenobi, 'Return of the Jedi'

From guido@digicool.com Tue Jun 5 18:21:32 2001 From: guido@digicool.com (Guido van Rossum) Date: Tue, 05 Jun 2001 13:21:32 -0400 Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects? Message-ID: <200106051721.f55HLW729400@odiug.digicool.com> While thinking about metatypes, I had an interesting idea. In PEP 252 and 253 (which still need much work, please bear with me!) I describe making classes and types more similar to each other. In particular, you'll be able to subclass built-in object types in much the same way as you can subclass user-defined classes today. One nice property of classes is that a class is a factory function for its instances; in other words, if C is a class, C() returns a C instance. Now, for built-in types, it makes sense to do the same. In my current prototype, after "from types import *", DictType() returns an empty dictionary and ListType() returns an empty list. It would be nice to take this much further: IntType() could return an integer, TupleType() could return a tuple, StringType() could return a string, and so on. These are immutable types, so to make this useful, these constructors need to take an argument to specify a specific value. What should the type of such an argument be? It's not very interesting to require that int(x) takes an integer argument! Most of the popular standard types already have a constructor function that's named after their type:

int(), long(), float(), complex(), str(), unicode(), tuple(), list()

We could make the constructor take the same argument(s) as the corresponding built-in function. Now invoke the Zen of Python: "There should be one-- and preferably only one --obvious way to do it." So why not make these built-in functions *be* the corresponding types? Then instead of

>>> int
<built-in function int>

you would see

>>> int
<type 'int'>

but otherwise the behavior would be identical. (Note that I don't require that a factory function returns a *new* object each time.) If we did this for all built-in types, we'd have to add maybe a dozen new built-in names -- I think that's no big deal and actually helps naming types. The types module, with its awkward names and usage, can be deprecated. There are details to be worked out, e.g.

- Do we really want to have built-in names for code objects, traceback objects, and other figments of Python's internal workings?

- What should the argument to dict() be? A list of (key, value) pairs, a list of alternating keys and values, or something else?

- What else?

Comments? --Guido van Rossum (home page: http://www.python.org/~guido/)

From fredrik@pythonware.com Tue Jun 5 18:34:35 2001 From: fredrik@pythonware.com (Fredrik Lundh) Date: Tue, 5 Jun 2001 19:34:35 +0200 Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects? References: <200106051721.f55HLW729400@odiug.digicool.com> Message-ID: <001301c0ede5$cb804a10$e46940d5@hagrid> guido wrote: > Now invoke the Zen of Python: "There should be one-- and preferably > only one --obvious way to do it." So why not make these built-in > functions *be* the corresponding types? +1 from here. > - Do we really want to have built-in names for code objects, traceback > objects, and other figments of Python's internal workings? nope. > - What should the argument to dict() be? A list of (key, value) > pairs, a list of alternating keys and values, or something else? how about supporting the following:

d == dict(d.items())
d == dict(d.keys(), d.values())

and also:

d = dict(k=v, k=v, ...)
Cheers /F

From ping@lfw.org Tue Jun 5 18:41:22 2001 From: ping@lfw.org (Ka-Ping Yee) Date: Tue, 5 Jun 2001 12:41:22 -0500 (CDT) Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects? In-Reply-To: <200106051721.f55HLW729400@odiug.digicool.com> Message-ID: On Tue, 5 Jun 2001, Guido van Rossum wrote: > Now invoke the Zen of Python: "There should be one-- and preferably > only one --obvious way to do it." So why not make these built-in > functions *be* the corresponding types? Then instead of
>
> >>> int
> <built-in function int>
>
> you would see
>
> >>> int
> <type 'int'>
I'm all in favour of this. In fact, i had the impression that you were planning to do exactly this all along. I seem to recall some conversation about this a long time ago -- am i dreaming? > If we did this for all built-in types, we'd have to add maybe a dozen > new built-in names -- I think that's no big deal and actually helps > naming types. The types module, with its awkward names and usage, can > be deprecated. I would love this. > - Do we really want to have built-in names for code objects, traceback > objects, and other figments of Python's internal workings? Perhaps we would only provide built-in names for objects that are commonly constructed. For things like code objects that are never user-constructed, their type objects could be set aside in a module. > - What should the argument to dict() be? A list of (key, value) > pairs, a list of alternating keys and values, or something else? A list of (key, value) pairs. It's the only sensible choice, given that dict.items() is the obvious way to get all the information out of a dictionary into a list. -- ?!ng

From aahz@rahul.net Tue Jun 5 18:40:27 2001 From: aahz@rahul.net (Aahz Maruch) Date: Tue, 5 Jun 2001 10:40:27 -0700 (PDT) Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects? In-Reply-To: <200106051721.f55HLW729400@odiug.digicool.com> from "Guido van Rossum" at Jun 05, 2001 01:21:32 PM Message-ID: <20010605174027.17A4199C83@waltz.rahul.net> I'm +1 on the general concept; I think it will make explaining Python easier in the long run. I'm not competent to vote on the details, but I'll complain if something seems too confused to me. Currently in the Decimal class I'm working on, I can take any of the following types in the constructor: Decimal, tuple, string, int, float. I'm wondering whether that approach makes sense, that any "compatible" type should be accepted in an explicit constructor. So for your question about dict(), perhaps any sequence/iterator type that returns 2-element sequences would be accepted. -- --- Aahz (@pobox.com) Hugs and backrubs -- I break Rule 6 <*> http://www.rahul.net/aahz/ Androgynous poly kinky vanilla queer het Pythonista I don't really mind a person having the last whine, but I do mind someone else having the last self-righteous whine.

From Donald Beaudry Tue Jun 5 18:50:34 2001 From: Donald Beaudry (Donald Beaudry) Date: Tue, 05 Jun 2001 13:50:34 -0400 Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects? References: <200106051721.f55HLW729400@odiug.digicool.com> Message-ID: <200106051750.NAA25458@localhost.localdomain> Guido van Rossum wrote, > Now invoke the Zen of Python: "There should be one-- and preferably > only one --obvious way to do it." So why not make these built-in > functions *be* the corresponding types? I like it! > but otherwise the behavior would be identical. (Note that I don't > require that a factory function returns a *new* object each time.) Of course...
singletons (which would also break that requirement) are quite useful. > If we did this for all built-in types, we'd have to add maybe a dozen > new built-in names -- I think that's no big deal and actually helps > naming types. The types module, with its awkward names and usage, can > be deprecated. > > There are details to be worked out, e.g. > > - Do we really want to have built-in names for code objects, traceback > objects, and other figments of Python's internal workings? I dont think so. Having easy access to these things might be good but since they are implementation specific it might be best to discourage their use by putting them somewhere more implementation specific, like the newmodule or even sys. > - What should the argument to dict() be? A list of (key, value) > pairs, a list of alternating keys and values, or something else? At a minimum, I'd like to see a list of key/value tuples. I seem to find myself reconstructing dicts from the .items() of other dicts. For 'something else', I'd like to be able to pass keyword arguments to initialize the new dict. Going really crazy, I'd like to be able to pass a dict as an argument to dict()... just another way to spell copy, but combined with keywords, it would be more like copy followed by an update. > - What else? Well, since you are asking ;) I havnt read the PEP, so perhaps I shouldnt be commenting just yet, but. I'd hope that the built-in types are sub-classable from C as well as from Python. This is most interesting for types like instance, class, method, but I can imagine reasons for doing it to tuple, list, dict, and even int. > Comments? Fantastic! -- Donald Beaudry Ab Initio Software Corp. 201 Spring Street donb@init.com Lexington, MA 02421 ...Will hack for sushi... From mal@lemburg.com Tue Jun 5 18:53:18 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Tue, 05 Jun 2001 19:53:18 +0200 Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects? References: <200106051721.f55HLW729400@odiug.digicool.com> Message-ID: <3B1D1C8E.B7770419@lemburg.com> Guido van Rossum wrote: > > While thinking about metatypes, I had an interesting idea. > > In PEP 252 and 253 (which still need much work, please bear with me!) > I describe making classes and types more similar to each other. In > particular, you'll be able to subclass built-in object types in much > the same way as you can subclass user-defined classes today. One nice > property of classes is that a class is a factory function for its > instances; in other words, if C is a class, C() returns a C instance. > > Now, for built-in types, it makes sense to do the same. In my current > prototype, after "from types import *", DictType() returns an empty > dictionary and ListType() returns an empty list. It would be nice > take this much further: IntType() could return an integer, TupleType() > could return a tuple, StringType() could return a string, and so on. > These are immutable types, so to make this useful, these constructors > need to take an argument to specify a specific value. What should the > type of such an argument be? It's not very interesting to require > that int(x) takes an integer argument! > > Most of the popular standard types already have a constructor function > that's named after their type: > > int(), long(), float(), complex(), str(), unicode(), tuple(), list() > > We could make the constructor take the same argument(s) as the > corresponding built-in function. 
> > Now invoke the Zen of Python: "There should be one-- and preferably > > only one --obvious way to do it." So why not make these built-in > > functions *be* the corresponding types? Then instead of > > > > >>> int > > > > > > you would see > > > > >>> int > > > > > > but otherwise the behavior would be identical. (Note that I don't > > require that a factory function returns a *new* object each time.) -1 While this looks cute, I think it would break a lot of introspection code or other code which special cases Python functions for some reason since type(int) would no longer return types.BuiltinFunctionType. If you don't like the names, why not take the chance and create a new module which then exposes the Python class hierarchy (much like we did with the exceptions.py module before it was integrated as a C module) ?! > If we did this for all built-in types, we'd have to add maybe a dozen > new built-in names -- I think that's no big deal and actually helps > naming types. The types module, with its awkward names and usage, can > be deprecated. > > There are details to be worked out, e.g. > > - Do we really want to have built-in names for code objects, traceback > objects, and other figments of Python's internal workings? Not really. > - What should the argument to dict() be? A list of (key, value) > pairs, a list of alternating keys and values, or something else? As a function, I'd say: take either a sequence of tuples or another dictionary as argument. mxTools already has such a function, BTW. -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From skip@pobox.com (Skip Montanaro) Tue Jun 5 19:12:09 2001 From: skip@pobox.com (Skip Montanaro) (Skip Montanaro) Date: Tue, 5 Jun 2001 13:12:09 -0500 Subject: [Python-Dev] Shouldn't I be able to print Unicode objects? Message-ID: <15133.8441.983687.572159@beluga.mojam.com> Just catching up on a little c.l.py and I noticed the effbot's response to the Unicode degree inquiry. I tried to create and print one and got this: % python Python 2.1.1a1 (#9, Jun 4 2001, 11:32:33) [GCC 2.96 20000731 (Linux-Mandrake 8.0 2.96-0.48mdk)] on linux2 Type "copyright", "credits" or "license" for more information. >>> u"\N{DEGREE SIGN}" u'\xb0' >>> print u"\N{DEGREE SIGN}" Traceback (most recent call last): File "<stdin>", line 1, in ? UnicodeError: ASCII encoding error: ordinal not in range(128) Shouldn't I be able to print arbitrary Unicode objects? What am I missing (this time)? Skip From mwh@python.net Tue Jun 5 19:16:52 2001 From: mwh@python.net (Michael Hudson) Date: 05 Jun 2001 19:16:52 +0100 Subject: [Python-Dev] Shouldn't I be able to print Unicode objects? In-Reply-To: Skip Montanaro's message of "Tue, 5 Jun 2001 13:12:09 -0500" References: <15133.8441.983687.572159@beluga.mojam.com> Message-ID: Skip Montanaro writes: > Just catching up on a little c.l.py and I noticed the effbot's response to > the Unicode degree inquiry. I tried to create and print one and got this: > > % python > Python 2.1.1a1 (#9, Jun 4 2001, 11:32:33) > [GCC 2.96 20000731 (Linux-Mandrake 8.0 2.96-0.48mdk)] on linux2 > Type "copyright", "credits" or "license" for more information. > >>> u"\N{DEGREE SIGN}" > u'\xb0' > >>> print u"\N{DEGREE SIGN}" > > Traceback (most recent call last): > File "<stdin>", line 1, in ? > UnicodeError: ASCII encoding error: ordinal not in range(128) > > Shouldn't I be able to print arbitrary Unicode objects?
What am I missing > (this time)? The encoding: >>> print u"\N{DEGREE SIGN}".encode("latin1") ° Cheers, Skippy's little helper. -- In case you're not a computer person, I should probably point out that "Real Soon Now" is a technical term meaning "sometime before the heat-death of the universe, maybe". -- Scott Fahlman From guido@digicool.com Tue Jun 5 19:26:22 2001 From: guido@digicool.com (Guido van Rossum) Date: Tue, 05 Jun 2001 14:26:22 -0400 Subject: [Python-Dev] SourceForget Python Foundry needs help Message-ID: <200106051826.f55IQMS29540@odiug.digicool.com> The Python Foundry at SF could use a hand. If you're interested in helping out, please write to Chuck Esterbrook, below! --Guido van Rossum (home page: http://www.python.org/~guido/) ------- Forwarded Message Date: Tue, 05 Jun 2001 14:12:07 -0400 From: Chuck Esterbrook To: guido@python.org Subject: SourceForget Python Foundry Hi Guido, I'm one of the admins of the SourceForge Python Foundry. In case you're not familiar with them, foundries are simply SF web portals centered around a particular topic. Admins can customize the HTML text and graphics and SourceForge stats are integrated on the side. I haven't had much time to give the Python Foundry the attention it deserves. I was wondering if you knew of anyone who had the inclination, time and energy to join the Foundry as an admin and expand it. If it becomes strong enough, we could possibly get it featured on the sidebar of the main SF page, which would then bring more attention to Python and its related projects. The foundry is at: http://sourceforge.net/foundry/python-foundry/ - -Chuck ------- End of Forwarded Message From barry@digicool.com Tue Jun 5 19:31:12 2001 From: barry@digicool.com (Barry A. Warsaw) Date: Tue, 5 Jun 2001 14:31:12 -0400 Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects? References: <200106051721.f55HLW729400@odiug.digicool.com> Message-ID: <15133.9584.871074.255497@anthem.wooz.org> >>>>> "GvR" == Guido van Rossum writes: GvR> Now invoke the Zen of Python: "There should be one-- and GvR> preferably only one --obvious way to do it." So why not make GvR> these built-in functions *be* the corresponding types? Then GvR> instead of >> int GvR> GvR> you would see >> int GvR> +1 GvR> but otherwise the behavior would be identical. (Note that I GvR> don't require that a factory function returns a *new* object GvR> each time.) GvR> If we did this for all built-in types, we'd have to add maybe GvR> a dozen new built-in names -- I think that's no big deal and GvR> actually helps naming types. The types module, with its GvR> awkward names and usage, can be deprecated. I'm a little concerned about this, since the names that would be added are probably in common use as variable and/or argument names. I.e. At one point `list' was a very common identifier in Mailman, and I'm sure `dict' is used quite often still. I guess this would be okay as long as working code doesn't break because of it. OTOH, I've had fewer needs for a dict builtin (though not non-zero), and easily zero needs for traceback objects, code objects, etc. GvR> There are details to be worked out, e.g. GvR> - Do we really want to have built-in names for code objects, GvR> traceback objects, and other figments of Python's internal GvR> workings? I'd say no. However, we could probably C-ify the types module, a la, the exceptions module, and that would be the logical place to put the type factories. GvR> - What should the argument to dict() be? 
A list of (key, GvR> value) pairs, a list of alternating keys and values, or GvR> something else? You definitely want to at least accept a sequence of key/value 2-tuples, so that d.items() can be retransformed into a dictionary object. -Barry From guido@digicool.com Tue Jun 5 19:38:23 2001 From: guido@digicool.com (Guido van Rossum) Date: Tue, 05 Jun 2001 14:38:23 -0400 Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects? In-Reply-To: Your message of "Tue, 05 Jun 2001 14:31:12 EDT." <15133.9584.871074.255497@anthem.wooz.org> References: <200106051721.f55HLW729400@odiug.digicool.com> <15133.9584.871074.255497@anthem.wooz.org> Message-ID: <200106051838.f55IcNk29624@odiug.digicool.com> > I'm a little concerned about this, since the names that would be added > are probably in common use as variable and/or argument names. I.e. At > one point `list' was a very common identifier in Mailman, and I'm sure > `dict' is used quite often still. I guess this would be okay as long > as working code doesn't break because of it. It would be hard to see how this would break code, since built-ins are searched *after* all variables that the user defines. --Guido van Rossum (home page: http://www.python.org/~guido/) From bckfnn@worldonline.dk Tue Jun 5 19:46:04 2001 From: bckfnn@worldonline.dk (Finn Bock) Date: Tue, 05 Jun 2001 18:46:04 GMT Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects? In-Reply-To: <200106051721.f55HLW729400@odiug.digicool.com> References: <200106051721.f55HLW729400@odiug.digicool.com> Message-ID: <3b1d2894.16564838@smtp.worldonline.dk> [Guido] >Now invoke the Zen of Python: "There should be one-- and preferably >only one --obvious way to do it." So why not make these built-in >functions *be* the corresponding types? Then instead of > > >>> int > > >you would see > > >>> int > > >but otherwise the behavior would be identical. (Note that I don't >require that a factory function returns a *new* object each time.) I think that it will be difficult to avoid creating a new object under jython because calling a type already directly calls the type's java constructor. >If we did this for all built-in types, we'd have to add maybe a dozen >new built-in names -- I think that's no big deal and actually helps >naming types. The types module, with its awkward names and usage, can >be deprecated. > >There are details to be worked out, e.g. > >- Do we really want to have built-in names for code objects, traceback > objects, and other figments of Python's internal workings? > >- What should the argument to dict() be? A list of (key, value) > pairs, a list of alternating keys and values, or something else? Jython already interprets the arguments to the dict type as alternating key/values: >>> from types import DictType as dict >>> dict('a', 97, 'b', 98, 'c', 99) {'b': 98, 'a': 97, 'c': 99} >>> This behaviour isn't documented on the python side so it can be changed. However, it is necessary to maintain this API on the java side and we have currently no way to prevent the type constructors from being visible and callable from python. Whatever is decided, I hope jython can keep the current semantics of its dict type. regards, finn From fdrake@acm.org Tue Jun 5 20:11:58 2001 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Tue, 5 Jun 2001 15:11:58 -0400 (EDT) Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects?
In-Reply-To: <3b1d2894.16564838@smtp.worldonline.dk> References: <200106051721.f55HLW729400@odiug.digicool.com> <3b1d2894.16564838@smtp.worldonline.dk> Message-ID: <15133.12030.538647.295809@cj42289-a.reston1.va.home.com> Finn Bock writes: > >>> from types import DictType as dict > >>> dict('a', 97, 'b', 98, 'c', 99) > {'b': 98, 'a': 97, 'c': 99} > >>> > > This behaviour isn't documented on the python side so it can be changed. > However, it is necessary to maintain this API on the java side and we > have currently no way to prevent the type constructors from being > visible and callable from python. This should not be a problem: If dict() is called with one arg, the new semantics can be used, but with an odd number of args, your existing semantics can be used. -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From skip@pobox.com (Skip Montanaro) Tue Jun 5 20:23:54 2001 From: skip@pobox.com (Skip Montanaro) (Skip Montanaro) Date: Tue, 5 Jun 2001 14:23:54 -0500 Subject: [Python-Dev] Shouldn't I be able to print Unicode objects? In-Reply-To: References: <15133.8441.983687.572159@beluga.mojam.com> Message-ID: <15133.12746.666351.127286@beluga.mojam.com> Me> [what am I missing?] Michael> The encoding: >>> print u"\N{DEGREE SIGN}".encode("latin1") ° Hmmm... I don't believe I've ever encountered an object in Python before that you couldn't simply print. Are Unicode objects unique in this respect? Seems like a bug (or at least a feature) to me. Skip From mwh@python.net Tue Jun 5 20:31:33 2001 From: mwh@python.net (Michael Hudson) Date: 05 Jun 2001 20:31:33 +0100 Subject: [Python-Dev] Shouldn't I be able to print Unicode objects? In-Reply-To: Skip Montanaro's message of "Tue, 5 Jun 2001 14:23:54 -0500" References: <15133.8441.983687.572159@beluga.mojam.com> <15133.12746.666351.127286@beluga.mojam.com> Message-ID: Skip Montanaro writes: > Me> [what am I missing?] > > Michael> The encoding: > > >>> print u"\N{DEGREE SIGN}".encode("latin1") > ° > > Hmmm... I don't believe I've ever encountered an object in Python before > that you couldn't simply print. Are Unicode objects unique in this respect? > Seems like a bug (or at least a feature) to me. Well, what would you have >>> print u"\N{DEGREE SIGN}" (or equivalently str(u"\N{DEGREE SIGN}") since we're eventually going to have to stuff an 8-bit string down stdout) do? I don't think >>> print u"\N{DEGREE SIGN}" u'\xb0' is really an option. This is old news. It must have been discussed here before 1.6, I'd have thought. Cheers, M. -- 58. Fools ignore complexity. Pragmatists suffer it. Some can avoid it. Geniuses remove it. -- Alan Perlis, http://www.cs.yale.edu/homes/perlis-alan/quotes.html From barry@digicool.com Tue Jun 5 20:46:54 2001 From: barry@digicool.com (Barry A. Warsaw) Date: Tue, 5 Jun 2001 15:46:54 -0400 Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects? References: <200106051721.f55HLW729400@odiug.digicool.com> <15133.9584.871074.255497@anthem.wooz.org> <200106051838.f55IcNk29624@odiug.digicool.com> Message-ID: <15133.14126.221568.235269@anthem.wooz.org> >>>>> "GvR" == Guido van Rossum writes: >> I'm a little concerned about this, since the names that would >> be added are probably in common use as variable and/or argument >> names. I.e. At one point `list' was a very common identifier >> in Mailman, and I'm sure `dict' is used quite often still. I >> guess this would be okay as long as working code doesn't break >> because of it.
GvR> It would be hard to see how this would break code, since GvR> built-ins are searched *after* all variables that the user GvR> defines. Wasn't there talk about issuing warnings for locals shadowing built-ins (or was that globals?). If not, fergitaboutit. If so, that would fall under the category of "breaking". -Barry From tim@digicool.com Tue Jun 5 20:56:59 2001 From: tim@digicool.com (Tim Peters) Date: Tue, 5 Jun 2001 15:56:59 -0400 Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects? Message-ID: Just to reduce this to its most trivial point , > - What should the argument to dict() be? A list of (key, value) > pairs, a list of alternating keys and values, or something else? the middle one (perhaps generalized to "iterable object alternately producing keys and values") is most useful in practice. Perl gets a lot of mileage of that, e.g. think of using re.findall() to build a list of mail-header field, value, field, value, ... thingies to feed to a dict. A list of (key, value) pairs is prettiest, but almost nothing *produces* such a list except for dict.items(); we don't need another way to spell dict.copy(). From guido@digicool.com Tue Jun 5 20:56:05 2001 From: guido@digicool.com (Guido van Rossum) Date: Tue, 05 Jun 2001 15:56:05 -0400 Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects? In-Reply-To: Your message of "Tue, 05 Jun 2001 15:46:54 EDT." <15133.14126.221568.235269@anthem.wooz.org> References: <200106051721.f55HLW729400@odiug.digicool.com> <15133.9584.871074.255497@anthem.wooz.org> <200106051838.f55IcNk29624@odiug.digicool.com> <15133.14126.221568.235269@anthem.wooz.org> Message-ID: <200106051956.f55Ju5130078@odiug.digicool.com> > >>>>> "GvR" == Guido van Rossum writes: > > >> I'm a little concerned about this, since the names that would > >> be added are probably in common use as variable and/or argument > >> names. I.e. At one point `list' was a very common identifier > >> in Mailman, and I'm sure `dict' is used quite often still. I > >> guess this would be okay as long as working code doesn't break > >> because of it. > > GvR> It would be hard to see how this would break code, since > GvR> built-ins are searched *after* all variables that the user > GvR> defines. > > Wasn't there talk about issuing warnings for locals shadowing > built-ins (or was that globals?). If not, fergitaboutit. If so, that > would fall under the category of "breaking". > > -Barry You may be thinking of this: >>> def f(int): def g(): int :1: SyntaxWarning: local name 'int' in 'f' shadows use of 'int' as global in nested scope 'g' >>> This warns you when you override a built-in or global *and* you use that same name in a nested function. This code will mean something different in 2.2 anyway (g's reference to int will become a reference to f's int because of nested scopes). But this does not cause a warning: >>> def g(): int = 12 >>> Nor does this: >>> int = 12 >>> So we're safe. --Guido van Rossum (home page: http://www.python.org/~guido/) From skip@pobox.com (Skip Montanaro) Tue Jun 5 21:01:47 2001 From: skip@pobox.com (Skip Montanaro) (Skip Montanaro) Date: Tue, 5 Jun 2001 15:01:47 -0500 Subject: [Python-Dev] Shouldn't I be able to print Unicode objects? 
In-Reply-To: References: <15133.8441.983687.572159@beluga.mojam.com> <15133.12746.666351.127286@beluga.mojam.com> Message-ID: <15133.15019.237484.605267@beluga.mojam.com> Michael> Well, what would you have >>>> print u"\N{DEGREE SIGN}" Michael> (or equivalently Michael> str(u"\N{DEGREE SIGN}") Michael> since we're eventually going to have to stuff an 8-bit string Michael> down stdout) do? How about if print calls the .encode("latin1") method for me when it gets an ASCII encoding error? If "latin1" isn't a reasonable default choice, it could pick an encoding based on the current locale. Michael> I don't think >>>> print u"\N{DEGREE SIGN}" Michael> u'\xb0' Michael> is really an option. I agree. I'd like to see a little circle. Michael> This is old news. It must have been discussed here before 1.6, Michael> I'd have thought. Perhaps, but I suspect many people suffered from glazing over of the eyes reading all the messages exchanged about Unicode arcana. I know I did. Skip From barry@digicool.com Tue Jun 5 21:01:29 2001 From: barry@digicool.com (Barry A. Warsaw) Date: Tue, 5 Jun 2001 16:01:29 -0400 Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects? References: <200106051721.f55HLW729400@odiug.digicool.com> <15133.9584.871074.255497@anthem.wooz.org> <200106051838.f55IcNk29624@odiug.digicool.com> <15133.14126.221568.235269@anthem.wooz.org> <200106051956.f55Ju5130078@odiug.digicool.com> Message-ID: <15133.15001.19308.108288@anthem.wooz.org> >>>>> "GvR" == Guido van Rossum writes: GvR> You may be thinking of this: Yup. GvR> So we're safe. Cool! Count me as a solid +1 then. -Barry From aahz@rahul.net Tue Jun 5 21:10:06 2001 From: aahz@rahul.net (Aahz Maruch) Date: Tue, 5 Jun 2001 13:10:06 -0700 (PDT) Subject: [Python-Dev] Shouldn't I be able to print Unicode objects? In-Reply-To: <15133.15019.237484.605267@beluga.mojam.com> from "Skip Montanaro" at Jun 05, 2001 03:01:47 PM Message-ID: <20010605201006.15CAD99C83@waltz.rahul.net> Skip Montanaro wrote: > > Perhaps, but I suspect many people suffered from glazing over of the eyes > reading all the messages exchanged about Unicode arcana. I know I did. Ditto. -- --- Aahz (@pobox.com) Hugs and backrubs -- I break Rule 6 <*> http://www.rahul.net/aahz/ Androgynous poly kinky vanilla queer het Pythonista I don't really mind a person having the last whine, but I do mind someone else having the last self-righteous whine. From mal@lemburg.com Tue Jun 5 21:14:39 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Tue, 05 Jun 2001 22:14:39 +0200 Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects? References: <200106051721.f55HLW729400@odiug.digicool.com> <3B1D1C8E.B7770419@lemburg.com> Message-ID: <3B1D3DAF.DAE727AE@lemburg.com> > > [Guido] > > Now invoke the Zen of Python: "There should be one-- and preferably > > only one --obvious way to do it." So why not make these built-in > > functions *be* the corresponding types? Then instead of > > > > >>> int > > > > > > you would see > > > > >>> int > > > > > > but otherwise the behavior would be identical. (Note that I don't > > require that a factory function returns a *new* object each time.) > > -1 > > While this looks cute, I think it would break a lot of introspection > > code or other code which special cases Python functions for > > some reason since type(int) would no longer return > > types.BuiltinFunctionType.
> > If you don't like the names, why not take the change and > create a new module which then exposes the Python class hierarchy > (much like we did with the exceptions.py module before it was > intregrated as C module) ?! Looks like I'm alone with my uncertain feeling about this move... oh well. BTW, we should consider having more than one contructor for an object rather than trying to stuff all possible options and parameters into one overloaded super-constructor. I've done this in many of my mx extensions and have so far had great success with it (better programming error detection, better docs, more intuitive interfaces, etc.). In that sense, more than one way to do something will actually help clarify what the programmer really wanted. Just a thought... -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From mal@lemburg.com Tue Jun 5 21:16:02 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Tue, 05 Jun 2001 22:16:02 +0200 Subject: [Python-Dev] Shouldn't I be able to print Unicode objects? References: <15133.8441.983687.572159@beluga.mojam.com> <15133.12746.666351.127286@beluga.mojam.com> <15133.15019.237484.605267@beluga.mojam.com> Message-ID: <3B1D3E02.3C9AE1F4@lemburg.com> Skip Montanaro wrote: > > Michael> Well, what would you have > > >>>> print u"\N{DEGREE SIGN}" > > Michael> (or equivalently > > Michael> str(u"\N{DEGREE SIGN}") > > Michael> since we're eventually going to have to stuff an 8-bit string > Michael> down stdout) do? > > How about if print calls the .encode("latin1") method for me it gets an > ASCII encoding error? If "latin1" isn't a reasonable default choice, it > could pick an encoding based on the current locale. Please see Lib/site.py for details on how to enable all these goodies -- it's all there, just disabled and meant for super-users only ;-) -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From guido@digicool.com Tue Jun 5 21:22:43 2001 From: guido@digicool.com (Guido van Rossum) Date: Tue, 05 Jun 2001 16:22:43 -0400 Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects? In-Reply-To: Your message of "Tue, 05 Jun 2001 22:14:39 +0200." <3B1D3DAF.DAE727AE@lemburg.com> References: <200106051721.f55HLW729400@odiug.digicool.com> <3B1D1C8E.B7770419@lemburg.com> <3B1D3DAF.DAE727AE@lemburg.com> Message-ID: <200106052022.f55KMhq30227@odiug.digicool.com> > > -1 > > > > While this looks cute, I think it would break a lot of introspection > > code or other code which special cases Python functions for > > some reason since type(int) would no longer return > > types.BuiltinFunctionType. > > Looks like I'm alone with my uncertain feeling about this move... > oh well. Well, I don't see how someone could be doing introspection on int and be confused when it's not a function -- either you (think you) know it's a function, so you use it as a function without introspecting it, and that continues to work; or you're open to all possibilities, and then you'll introspect it, and then you'll discover what it is. > BTW, we should consider having more than one contructor for an > object rather than trying to stuff all possible options and parameters > into one overloaded super-constructor. 
I've done this in many of > my mx extensions and have so far had great success with it (better > programming error detection, better docs, more intuitive interfaces, > etc.). In that sense, more than one way to do something will > actually help clarify what the programmer really wanted. Just > a thought... Yes, but the other ways are spelled as factory functions. Maybe, *maybe* the other factory functions could be class-methods, but don't hold your hopes high. --Guido van Rossum (home page: http://www.python.org/~guido/) From martin@loewis.home.cs.tu-berlin.de Tue Jun 5 21:30:18 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Tue, 5 Jun 2001 22:30:18 +0200 Subject: [Python-Dev] Shouldn't I be able to print Unicode objects? Message-ID: <200106052030.f55KUIu02762@mira.informatik.hu-berlin.de> > How about if print calls the .encode("latin1") method for me it gets an > ASCII encoding error? If "latin1" isn't a reasonable default choice, it > could pick an encoding based on the current locale. These are both bad ideas. First, there is no guarantee that your terminal is capable of displaying the circle at all. Maybe the typewriter connected to your computer doesn't even have a degree type. Further, maybe it does support displaying the degree sign, but then it likely fails for >>> print u"\N{EURO SIGN}" Or, worse, instead of displaying the EURO SIGN, it may just display the CURRENCY SIGN (since it may chose to use ISO-8859-15, but the terminal assumes ISO-8859-1). So unless you can come up with a really good way to find out what the terminal is capable of displaying (plus finding out how to make it display these things), I think Python is better off raising an exception than producing garbage output. In addition, what you see is the "default encoding", i.e. it doesn't just apply to print; it also applies to all places where Unicode objects are converted into byte strings. Assuming any default other than ASCII has been considered as a bad idea by the authors of the Unicode support. IMO, the next-most reasonable default would have been UTF-8, *not* Latin-1, since UTF-8 can represent the EURO SIGN and every other character in Unicode. Most likely, you terminal will have difficulties producing a circle symbol when it gets the UTF-8 representation of the DEGREE SIGN, though. So the best thing is still to give it into the hands of the application author. As MAL points out, the administrator can give a different default encoding in site.py. Since the default default is ASCII, applications assuming that the default is ASCII won't break on your system. OTOH, applications developed on your system may then break elsewhere, since the default in site.py might be different. Regards, Martin From sdm7g@Virginia.EDU Tue Jun 5 21:41:11 2001 From: sdm7g@Virginia.EDU (Steven D. Majewski) Date: Tue, 5 Jun 2001 16:41:11 -0400 (EDT) Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects? In-Reply-To: <200106051721.f55HLW729400@odiug.digicool.com> Message-ID: On Tue, 5 Jun 2001, Guido van Rossum wrote: > Now invoke the Zen of Python: "There should be one-- and preferably > only one --obvious way to do it." So why not make these built-in > functions *be* the corresponding types? Then instead of +1 > - Do we really want to have built-in names for code objects, traceback > objects, and other figments of Python's internal workings? I would say to put all of the common constructors in __builtin__, and all of the odd ducks can go into the new module. 
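A sketch of how that split already looks in 2.1, using only the existing types module (just an illustration; nothing here depends on the proposal):

    >>> import types
    >>> codeobj = compile("0", "<string>", "eval")
    >>> type(codeobj) is types.CodeType   # an "odd duck", via the module
    1
    >>> int("42"), str(42), tuple([1, 2]) # common constructors, already builtins
    (42, '42', (1, 2))

Under the proposal the common names would additionally *be* the type objects, while code, traceback and friends would stay tucked away in a module.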
> - What should the argument to dict() be? A list of (key, value) > pairs, a list of alternating keys and values, or something else? A varargs list of (key,value) tuples would probably be most useful. Since most of these functions, before being classed as constructors, were considered coercion functions, I wouldn't be against having it try to do something sensible with a variety of args. -- sdm From skip@pobox.com (Skip Montanaro) Tue Jun 5 21:47:17 2001 From: skip@pobox.com (Skip Montanaro) (Skip Montanaro) Date: Tue, 5 Jun 2001 15:47:17 -0500 Subject: [Python-Dev] Shouldn't I be able to print Unicode objects? In-Reply-To: <3B1D3E02.3C9AE1F4@lemburg.com> References: <15133.8441.983687.572159@beluga.mojam.com> <15133.12746.666351.127286@beluga.mojam.com> <15133.15019.237484.605267@beluga.mojam.com> <3B1D3E02.3C9AE1F4@lemburg.com> Message-ID: <15133.17749.390756.115544@beluga.mojam.com> mal> Please see Lib/site.py for details on how to enable all these mal> goodies -- it's all there, just disabled and meant for super-users mal> only ;-) Okay, I found the encoding section. I changed the encoding variable assignment to be encoding = "latin1" and now the degree sign print works. What other side-effects will that have besides on printed representations? It appears I can create (but not see properly?) variable names containing latin1 characters: >>> ümlaut = "ümlaut" >>> print locals().keys() ['orca', 'dir', '__doc__', 'rlcompleter', 'missionb', 'version', 'dirpat', 'xmlrpclib', 'belugab', '__builtin__', 'beluga', 'readline', '__name__', 'orcab', 'addpath', 'Writer', 'atexit', 'sys', 'dolphinb', 'mission', 'pprint', 'dolphin', '__builtins__', 'mlaut', 'help'] I am having trouble printing some strings containing latin1 characters: >>> print ümlaut mlaut >>> type("ümlaut") >>> type(string.letters) >>> print "ümlaut" mlaut >>> print string.letters abcdefghijklmnopqrstuvwxyzµßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþÿABCDEFGHIJKLMNOPQRSTUVWXYZÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞ >>> print string.letters[55:] üýþÿABCDEFGHIJKLMNOPQRSTUVWXYZÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞ The above was pasted from Python running in a shell session in XEmacs, which is certainly latin1-aware. Why did I have trouble seeing the ü in some situations, but not in others? Are the ramifications of all this encoding stuff documented somewhere? Skip From skip@pobox.com (Skip Montanaro) Tue Jun 5 21:56:58 2001 From: skip@pobox.com (Skip Montanaro) (Skip Montanaro) Date: Tue, 5 Jun 2001 15:56:58 -0500 Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects? References: <200106051721.f55HLW729400@odiug.digicool.com> Message-ID: <15133.18330.910736.249838@beluga.mojam.com> Is the intent of using int and friends as constructors instead of just coercion functions that I should (eventually) be able to do this: class NonNegativeInt(int): def __init__(self, val): if int(val) < 0: raise ValueError, "Value must be >= 0" int.__init__(self, val) self.a = 47 ... ? Skip From tim@digicool.com Tue Jun 5 22:01:23 2001 From: tim@digicool.com (Tim Peters) Date: Tue, 5 Jun 2001 17:01:23 -0400 Subject: [Python-Dev] another dict crasher Message-ID: [Tim's dict-crasher dies w/ a stack overflow, but with a KeyError when he sticks a print inside __eq__] OK, I understand this now, at least on Windows.
In PyObject_Print(), #ifdef USE_STACKCHECK if (PyOS_CheckStack()) { PyErr_SetString(PyExc_MemoryError, "stack overflow"); return -1; } #endif On Windows, PyOs_CheckStack() is __try { /* _alloca throws a stack overflow exception if there's not enough space left on the stack */ _alloca(PYOS_STACK_MARGIN * sizeof(void*)); return 0; } __except (EXCEPTION_EXECUTE_HANDLER) { /* just ignore all errors */ } return 1; The _alloca dies, so the __except falls thru and PyOs_CheckStack returns 1. PyObject_Print sets the "stack overflow" error and returns -1. This winds its way thru the rich comparison attempt, until lookdict() sees it and says, Hmm. I can't compare this thing without raising error. So this can't be the key I'm looking for. First I'll clear the error. Hmm. Can't find it anywhere else in the dict either. Hmm. There were no errors pending at the time I got called, so I'll leave things that way and return "not found". At that point about 15,000 levels of recursion unwind, and KeyError gets raised. I don't believe PyOS_CheckStack() is implemented on Unixoid systems (just Windows and Macs), so some other accident must account for the KeyError on Linux. Remains unclear what to do about it; the idea that all errors raised by dict lookup comparisons are ignorable is sure a tempting target. From mal@lemburg.com Tue Jun 5 22:00:23 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Tue, 05 Jun 2001 23:00:23 +0200 Subject: [Python-Dev] Shouldn't I be able to print Unicode objects? References: <15133.8441.983687.572159@beluga.mojam.com> <15133.12746.666351.127286@beluga.mojam.com> <15133.15019.237484.605267@beluga.mojam.com> <3B1D3E02.3C9AE1F4@lemburg.com> <15133.17749.390756.115544@beluga.mojam.com> Message-ID: <3B1D4866.A40AAB1C@lemburg.com> Skip Montanaro wrote: > > mal> Please see Lib/site.py for details on how to enable all these > mal> goodies -- it's all there, just disabled and meant for super-users > mal> only ;-) > > Okay, I found the encoding section. I changed the encoding variable > assignment to be > > encoding = "latin1" > > and now the degree sign print works. What other side-effects will that have > besides on printed representations? It appears I can create (but not see > properly?) variable names containing latin1 characters: > > >>> ümlaut = "ümlaut" Huh ? That should not be possible ! Python literals are still ASCII. >>> ümlaut = 'ümlaut' File "", line 1 ümlaut = 'ümlaut' ^ SyntaxError: invalid syntax > >>> print locals().keys() > ['orca', 'dir', '__doc__', 'rlcompleter', 'missionb', 'version', 'dirpat', 'xmlrpclib', 'belugab', '__builtin__', 'beluga', 'readline', '__name__', 'orcab', 'addpath', 'Writer', 'atexit', 'sys', 'dolphinb', 'mission', 'pprint', 'dolphin', '__builtins__', 'mlaut', 'help'] > > I am having trouble printing some strings containing latin1 characters: > > >>> print ümlaut > mlaut > >>> type("ümlaut") > > >>> type(string.letters) > > >>> print "ümlaut" > mlaut > >>> print string.letters > abcdefghijklmnopqrstuvwxyzµßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþÿABCDEFGHIJKLMNOPQRSTUVWXYZÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞ > >>> print string.letters[55:] > üýþÿABCDEFGHIJKLMNOPQRSTUVWXYZÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞ > > The above was pasted from Python running in a shell session in XEmacs, which > is certainly latin1-aware. Why did I have trouble seeing the ü in some > situations, but not in others? No idea what's going on there... the encoding parameter should not have any effect on printing normal 8-bit strings. 
It only defines the standard encoding used in coercion and auto-conversion from Unicode to 8-bit strings and vice-versa. > Are the ramifications of all this encoding stuff documented somewhere? The basic things can be found in Misc/unicode.txt, on the i18n sig page and some resources on the web. I'll give a talk in Bordeaux about Unicode too, which will probably provide some additional help as well. -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From guido@digicool.com Tue Jun 5 22:14:07 2001 From: guido@digicool.com (Guido van Rossum) Date: Tue, 05 Jun 2001 17:14:07 -0400 Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects? In-Reply-To: Your message of "Tue, 05 Jun 2001 16:59:01 EDT." References: Message-ID: <200106052114.f55LE7P30481@odiug.digicool.com> > Is the intent of using int and friends as constructors instead of just > coercion functions that I should (eventually) be able to do this: > > class NonNegativeInt(int): > def __init__(self, val): > if int(val) < 0: > raise ValueError, "Value must be >= 0" > int.__init__(self, val) > self.a = 47 > ... > > ? Yes, sort-of. The details will be slightly different. I'm not comfortable with letting a user-provided __init__() method change the value of self, so I am brooding on a work-around that separates allocation and one-time initialization from __init__(). Watch PEP 253. --Guido van Rossum (home page: http://www.python.org/~guido/) From tim@digicool.com Tue Jun 5 22:16:03 2001 From: tim@digicool.com (Tim Peters) Date: Tue, 5 Jun 2001 17:16:03 -0400 Subject: [Python-Dev] Shouldn't I be able to print Unicode objects? Message-ID: [MAL, to Skip] > Huh ? That should not be possible ! Python literals are still > ASCII. > > >>> ümlaut = 'ümlaut' > File "", line 1 > ümlaut = 'ümlaut' > ^ > SyntaxError: invalid syntax That was Guido's intent, and what the Ref Man says, but the tokenizer uses C's isalpha() so in reality it's locale-dependent. I think at least one German on Python-Dev has already threatened to kill him if he ever fixes this bug . From gward@python.net Tue Jun 5 23:29:49 2001 From: gward@python.net (Greg Ward) Date: Tue, 5 Jun 2001 18:29:49 -0400 Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects? In-Reply-To: <200106051721.f55HLW729400@odiug.digicool.com>; from guido@digicool.com on Tue, Jun 05, 2001 at 01:21:32PM -0400 References: <200106051721.f55HLW729400@odiug.digicool.com> Message-ID: <20010605182949.A7545@gerg.ca> On 05 June 2001, Guido van Rossum said: > Now invoke the Zen of Python: "There should be one-- and preferably > only one --obvious way to do it." So why not make these built-in > functions *be* the corresponding types? Then instead of +1 from me too. > If we did this for all built-in types, we'd have to add maybe a dozen > new built-in names -- I think that's no big deal and actually helps > naming types. The types module, with its awkward names and usage, can > be deprecated. Cool! > There are details to be worked out, e.g. > > - Do we really want to have built-in names for code objects, traceback > objects, and other figments of Python's internal workings? Probably not, as long as they are accessible somewhere. I could live with either a C-ified 'types' module or shoving these into the 'new' module, although I think I prefer the latter slightly. > - What should the argument to dict() be? 
A list of (key, value) > pairs, a list of alternating keys and values, or something else? I love /F's suggestion dict(k=v, k=v, ...) but that's icing on the cake -- cool feature, looks pretty, etc. (And *finally* Python will have all the syntactic sugar that Perl programmers like to have. ;-) I think the real answer should be dict(k, v, k, v) like Jython. If both can be supported, that would be swell. Greg -- Greg Ward - Linux geek gward@python.net http://starship.python.net/~gward/ Does your DRESSING ROOM have enough ASPARAGUS? From barry@digicool.com Tue Jun 5 23:45:00 2001 From: barry@digicool.com (Barry A. Warsaw) Date: Tue, 5 Jun 2001 18:45:00 -0400 Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects? References: <200106051721.f55HLW729400@odiug.digicool.com> <20010605182949.A7545@gerg.ca> Message-ID: <15133.24812.791796.557452@anthem.wooz.org> >>>>> "GW" == Greg Ward writes: GW> I love /F's suggestion GW> dict(k=v, k=v, ...) One problem with this syntax is that the `k's can only be valid Python identifiers, so you'd at least need /some/ other syntax to support construction with arbitrary hashable keys. -Barry From fredrik@pythonware.com Tue Jun 5 23:57:43 2001 From: fredrik@pythonware.com (Fredrik Lundh) Date: Wed, 6 Jun 2001 00:57:43 +0200 Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects? References: <200106051721.f55HLW729400@odiug.digicool.com> <20010605182949.A7545@gerg.ca> Message-ID: <011f01c0ee12$eeda9ba0$0900a8c0@spiff> greg wrote: > > - What should the argument to dict() be? A list of (key, value) > > pairs, a list of alternating keys and values, or something else? > > I love /F's suggestion > > dict(k=v, k=v, ...) > > but that's icing on the cake -- cool feature, looks pretty, etc. note that the python interpreter builds that dictionary for you if you use the METH_KEYWORDS flag... > I think the real answer should be > > dict(k, v, k, v) > > like Jython. given that Jython already gives a meaning to dict with more than one argument, I suggest: dict(d) # consistency dict(k, v, k, v, ...) # jython compatibility dict(*[k, v, k, v, ...]) # convenience dict(k=v, k=v, ...) # common pydiom and maybe: dict(d.items()) # symmetry > If both can be supported, that would be swell. how about: if (PyTuple_GET_SIZE(args)) { assert PyDict_GET_SIZE(kw) == 0 if (PyTuple_GET_SIZE(args) == 1) { args = PyTuple_GET_ITEM(args, 0); if (PyDict_Check(args)) dict = args.copy() else if (PySequence_Check(args)) dict = {} for k, v in args: dict[k] = v } else { assert (PySequence_Size(args) & 1) == 0 # maybe dict = {} for i in range(0, len(args), 2): dict[args[i]] = args[i+1] } } else { assert PyDict_GET_SIZE(kw) > 0 # probably dict = kw } From MarkH@ActiveState.com Wed Jun 6 00:13:27 2001 From: MarkH@ActiveState.com (Mark Hammond) Date: Wed, 6 Jun 2001 09:13:27 +1000 Subject: [Python-Dev] Happy event In-Reply-To: <20010605144110.DD90C99C84@waltz.rahul.net> Message-ID: [Paul] > > Actually girls are fine until about 13, after that I expect Jeremy > > won't be too responsive. Something about hormones and such. As a father of a 14 year old girl, I can relate to that!! [Aahz] > Are you trying to imply that there's a difference between girls and > boys? It would seem a safe assumption that you are not a parent of a teenager. :) Mark. From gward@python.net Wed Jun 6 02:03:33 2001 From: gward@python.net (Greg Ward) Date: Tue, 5 Jun 2001 21:03:33 -0400 Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects?
In-Reply-To: <011f01c0ee12$eeda9ba0$0900a8c0@spiff>; from fredrik@pythonware.com on Wed, Jun 06, 2001 at 12:57:43AM +0200 References: <200106051721.f55HLW729400@odiug.digicool.com> <20010605182949.A7545@gerg.ca> <011f01c0ee12$eeda9ba0$0900a8c0@spiff> Message-ID: <20010605210333.B7687@gerg.ca> On 06 June 2001, Fredrik Lundh said: > given that Jython already gives a meaning to dict with more > than one argument, I suggest: > > dict(d) # consistency > dict(k, v, k, v, ...) # jython compatibility > dict(*[k, v, k, v, ...]) # convenience > dict(k=v, k=v, ...) # common pydiom Yikes. I still think that #2 is the "essential" spelling. I think Tim was speaking of #1 when he said we don't need another way to spell copy() -- I'm inclined to agree. I think the fact that you can say int(3) or str("foo") are not strong arguments in favour of dict({...}), because of mutability, because of the overhead of dicts, because we already have the copy module, maybe other factors as well. > and maybe: > > dict(d.items()) # symmetry I think this is massive overloading. Two interfaces to a single function ought to be enough. I for one have long wished for syntactic sugar like Perl's => operator, which lets you do this: %band = { geddy => "bass", alex => "guitar", neil => "drums" } ...and keyword arg syntax is really the natural thing here. Being able to say band = dict(geddy="bass", alex="guitar", neil="drums") would be good enough for me. And it's less mysterious than Perl's =>, which is just a magic comma that forces its LHS to be interpreted as a string. Weird. Greg -- Greg Ward - Linux geek gward@python.net http://starship.python.net/~gward/ If you and a friend are being chased by a lion, it is not necessary to outrun the lion. It is only necessary to outrun your friend. From mal@lemburg.com Wed Jun 6 09:03:13 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Wed, 06 Jun 2001 10:03:13 +0200 Subject: [Python-Dev] Shouldn't I be able to print Unicode objects? References: Message-ID: <3B1DE3C1.90BA3DD6@lemburg.com> Tim Peters wrote: > > [MAL, to Skip] > > Huh ? That should not be possible ! Python literals are still > > ASCII. > > > > >>> ümlaut = 'ümlaut' > > File "", line 1 > > ümlaut = 'ümlaut' > > ^ > > SyntaxError: invalid syntax > > That was Guido's intent, and what the Ref Man says, but the tokenizer uses > C's isalpha() so in reality it's locale-dependent. I think at least one > German on Python-Dev has already threatened to kill him if he ever fixes > this bug . Wasn't me for sure... even in the Unicode age, I believe that Python source code should maintain readability by not allowing all alpha(numeric) characters for use in identifiers (there are lots of them in Unicode). Shouldn't we fix the tokenizer to explicitely check for 'a'...'z' and 'A'...'Z' ?! (same for digits) ?! -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From jack@oratrix.nl Wed Jun 6 12:24:32 2001 From: jack@oratrix.nl (Jack Jansen) Date: Wed, 06 Jun 2001 13:24:32 +0200 Subject: [Python-Dev] %b format? - 8088 like a PDP-11 I think not... In-Reply-To: Message by "Eric S. 
Raymond" , Mon, 4 Jun 2001 17:19:08 -0400 , <20010604171908.A21831@thyrsus.com> Message-ID: <20010606112432.C4A43303181@snelboot.oratrix.nl> The early microcomputers (8008, 6800, 6502) are actually a lot more like the PDP-8 than the PDP-11: a single (or possibly double) accumulator register and a few special purpose registers hardwired to various instructions. The 68000, Z8000 and NS16032 were the first true successors of the PDP-11, sharing (to an extent) the unique characteristics of it's design with general purpose registers (with even SP and PC being general purpose registers with only very little magic attached to them) and an orthogonal design. The 68000 still had lots of little quirks in the instruction set, the latter two actually improved on the PDP-11 set (where a couple of instructions like XOR would only work with register-destination because it was added to the design in a stage where there weren't enough bits left in the instruction space, I guess). And the 8086 was just a souped-up 8080/8008: each register had a different function, no orthogonality, etc. Intel didn't get it "right" until the 386 32-bit instruction set (and even there some of the old baggage can still be seen). -- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen@oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From jack@oratrix.nl Wed Jun 6 12:39:56 2001 From: jack@oratrix.nl (Jack Jansen) Date: Wed, 06 Jun 2001 13:39:56 +0200 Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects? In-Reply-To: Message by "Fredrik Lundh" , Tue, 5 Jun 2001 19:34:35 +0200 , <001301c0ede5$cb804a10$e46940d5@hagrid> Message-ID: <20010606113957.4A395303181@snelboot.oratrix.nl> For the dictionary initializer I would definitely want to be able to give an object that adheres to the dictionary protocol, so that I can to things like import anydbm f = anydbm.open("foo", "r") incore = dict(f) Hmm, I guess this goes for most types: list() and tuple() should take any iterable object, etc. The one question is what "dictionary protocol" mean. Should it support items()? Is only x.keys()/x[] good enough? -- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen@oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From mal@lemburg.com Wed Jun 6 19:36:48 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Wed, 06 Jun 2001 20:36:48 +0200 Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects? References: <200106051721.f55HLW729400@odiug.digicool.com> <3B1D1C8E.B7770419@lemburg.com> <3B1D3DAF.DAE727AE@lemburg.com> <200106052022.f55KMhq30227@odiug.digicool.com> Message-ID: <3B1E7840.C93EA788@lemburg.com> Guido van Rossum wrote: > > > > -1 > > > > > > While this looks cute, I think it would break a lot of introspection > > > code or other code which special cases Python functions for > > > some reason since type(int) would no longer return > > > types.BuiltinFunctionType. > > > > Looks like I'm alone with my uncertain feeling about this move... > > oh well. 
> > Well, I don't see how someone could be doing introspection on int and > be confused when it's not a function -- either you (think you) know > it's a function, so you use it as a function without introspecting it, > and that continues to work; or you're open to all possibilities, and > then you'll introspect it, and then you'll discover what it is. Ok, let's put it another way: The point is that you are changing the type of very basic building parts in Python and that is likely to cause failure in places which will most likely be hard to find and fix. Besides, we don't really gain anything from replacing builtin functions with classes (to the contrary: we lose some, since we can no longer use the function call optimizations for builtins and have to go through all the generic call mechanism code instead). Also, have you considered the effects this has on restricted execution mode ? What will happen if someone replaces the builtins with special versions which hide some security relevant objects, e.g. open() is a prominent candidate for this. Why not put the type objects into a separate module instead of reusing the builtins ? > > BTW, we should consider having more than one contructor for an > > object rather than trying to stuff all possible options and parameters > > into one overloaded super-constructor. I've done this in many of > > my mx extensions and have so far had great success with it (better > > programming error detection, better docs, more intuitive interfaces, > > etc.). In that sense, more than one way to do something will > > actually help clarify what the programmer really wanted. Just > > a thought... > > Yes, but the other ways are spelled as factory functions. Maybe, > *maybe* the other factory functions could be class-methods, but don't > hold your hopes high. No... why make things complicated when simple functions work just fine as factories. Multiple constructors on a class would make subclassing a pain... -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From paulp@ActiveState.com Wed Jun 6 20:00:07 2001 From: paulp@ActiveState.com (Paul Prescod) Date: Wed, 06 Jun 2001 12:00:07 -0700 Subject: [Python-Dev] Shouldn't I be able to print Unicode objects? References: <15133.8441.983687.572159@beluga.mojam.com> <15133.12746.666351.127286@beluga.mojam.com> <15133.15019.237484.605267@beluga.mojam.com> <3B1D3E02.3C9AE1F4@lemburg.com> <15133.17749.390756.115544@beluga.mojam.com> Message-ID: <3B1E7DB7.408BC089@ActiveState.com> Skip Montanaro wrote: > >... > > Okay, I found the encoding section. I changed the encoding variable > > assignment to be > > encoding = "latin1" Danger, Will Robinson! You can now write software that will work great on your version of Python and will crash on everyone else's. You haven't just changed the behavior of "print" but of EVERY attempted automatic coercion from Unicode to an 8-bit string. -- Take a recipe. Leave a recipe. Python Cookbook! http://www.ActiveState.com/pythoncookbook From tim.one@home.com Wed Jun 6 20:27:59 2001 From: tim.one@home.com (Tim Peters) Date: Wed, 6 Jun 2001 15:27:59 -0400 Subject: [Python-Dev] -U option? Message-ID: http://sf.net/tracker/?func=detail&atid=105470&aid=430269&group_id=5470 python -U breaks import with 2.1 Anyone understand -U?
Like, should it work, why is it there if it doesn't and isn't expected to, and are there docs for it beyond the "python -h" blurb? Last mention of it I found in c.l.py was """ Date: Tue, 06 Feb 2001 16:09:46 +0100 From: "M.-A. Lemburg" Subject: Re: [Python-Dev] Pre-PEP: Python Character Model ... Well, with -U on, Python will compile "" into u"", ... last I tried, Python didn't even start up :-( ... """ An earlier msg (08 Sep 2000) said: """ Note that many thing fail when Python is started with -U... that switch was introduced to be able to get an idea of which parts of the standard fail to work in a mixed string/Unicode environment. """ If this is just an internal development switch, python -h probably shouldn't advertise it. From barry@digicool.com Wed Jun 6 20:37:26 2001 From: barry@digicool.com (Barry A. Warsaw) Date: Wed, 6 Jun 2001 15:37:26 -0400 Subject: [Python-Dev] -U option? References: Message-ID: <15134.34422.62060.936788@anthem.wooz.org> >>>>> "TP" == Tim Peters writes: TP> Anyone understand -U? Like, should it work, why is it there TP> if it doesn't and isn't expected to, and are there docs for it TP> beyond the "python -h" blurb? Nope, except that /for me/ an installed Python 2.1 seems to start up just fine with -U. My uninstalled (i.e. run from the source tree) 2.2a0 fails when given -U: @anthem[[~/projects/python:1068]]% ./python Python 2.2a0 (#4, Jun 6 2001, 13:03:36) [GCC egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)] on linux2 Type "copyright", "credits" or "license" for more information. >>> @anthem[[~/projects/python:1069]]% ./python -U 'import site' failed; use -v for traceback Python 2.2a0 (#4, Jun 6 2001, 13:03:36) [GCC egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)] on linux2 Type "copyright", "credits" or "license" for more information. >>> @anthem[[~/projects/python:1070]]% ./python -U -v # ./Lib/site.pyc matches ./Lib/site.py import site # precompiled from ./Lib/site.pyc # ./Lib/os.pyc matches ./Lib/os.py import os # precompiled from ./Lib/os.pyc import posix # builtin # ./Lib/posixpath.pyc matches ./Lib/posixpath.py import posixpath # precompiled from ./Lib/posixpath.pyc # ./Lib/stat.pyc matches ./Lib/stat.py import stat # precompiled from ./Lib/stat.pyc # ./Lib/UserDict.pyc matches ./Lib/UserDict.py import UserDict # precompiled from ./Lib/UserDict.pyc 'import site' failed; traceback: Traceback (most recent call last): File "./Lib/site.py", line 91, in ? from distutils.util import get_platform ImportError: No module named distutils.util Python 2.2a0 (#4, Jun 6 2001, 13:03:36) [GCC egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)] on linux2 Type "copyright", "credits" or "license" for more information. >>> # clear __builtin__._ # clear sys.path # clear sys.argv # clear sys.ps1 # clear sys.ps2 # clear sys.exitfunc # clear sys.exc_type # clear sys.exc_value # clear sys.exc_traceback # clear sys.last_type # clear sys.last_value # clear sys.last_traceback # restore sys.stdin # restore sys.stdout # restore sys.stderr # cleanup __main__ # cleanup[1] signal # cleanup[1] site # cleanup[1] posix # cleanup[1] exceptions # cleanup[2] stat # cleanup[2] posixpath # cleanup[2] UserDict # cleanup[2] os # cleanup sys # cleanup __builtin__ # cleanup ints: 1 unfreed int in 1 out of 3 blocks # cleanup floats -Barry From mal@lemburg.com Wed Jun 6 21:27:19 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Wed, 06 Jun 2001 22:27:19 +0200 Subject: [Python-Dev] -U option? 
References: Message-ID: <3B1E9227.7F67971E@lemburg.com> Tim Peters wrote:
>
> http://sf.net/tracker/?func=detail&atid=105470&aid=430269&group_id=5470
> python -U breaks import with 2.1
>
> Anyone understand -U? Like, should it work, why is it there if it doesn't
> and isn't expected to, and are there docs for it beyond the "python -h"
> blurb?

The -U option is there to be able to test drive Python into the Unicode age. As you and many others have noted, there's still a long way to go...

> Last mention of it I found in c.l.py was
>
> """
> Date: Tue, 06 Feb 2001 16:09:46 +0100
> From: "M.-A. Lemburg"
> Subject: Re: [Python-Dev] Pre-PEP: Python Character Model
>
> ...
> Well, with -U on, Python will compile "" into u"",
> ...
> last I tried, Python didn't even start up :-(
> ...
> """
>
> An earlier msg (08 Sep 2000) said:
>
> """
> Note that many thing fail when Python is started with -U... that
> switch was introduced to be able to get an idea of which parts of
> the standard fail to work in a mixed string/Unicode environment.
> """
>
> If this is just an internal development switch, python -h probably shouldn't
> advertise it.

-- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From martin@loewis.home.cs.tu-berlin.de Wed Jun 6 21:34:30 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Wed, 6 Jun 2001 22:34:30 +0200 Subject: [Python-Dev] -U option? Message-ID: <200106062034.f56KYUI02246@mira.informatik.hu-berlin.de> [Tim]

> Anyone understand -U? Like, should it work, why is it there if it
> doesn't and isn't expected to, and are there docs for it beyond the
> "python -h" blurb?

I'm not surprised it doesn't work, but I think it could be made to work in many cases. I also think it would be worthwhile making that work; in the process, many places will be taught to accept Unicode strings which currently don't. [Barry]

> Nope, except that /for me/ an installed Python 2.1 seems to start up
> just fine with -U. [...]

Sure, but it won't work:

martin@mira:~ > python -U                                          [22:29]
Python 2.2a0 (#336, May 29 2001, 09:28:57)
[GCC 2.95.2 19991024 (release)] on linux2
Type "copyright", "credits" or "license" for more information.
>>> import string
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
ImportError: No module named string
>>> import sys
>>> sys.path
['', u'/usr/src/omni/lib/python', u'/usr/src/omni/lib/i586_linux_2.0_glibc2.1',
u'/usr/ilu-2.0b1/lib', u'/home/martin', u'/usr/local/lib/python2.2',
u'/usr/local/lib/python2.2/plat-linux2', u'/usr/local/lib/python2.2/lib-tk',
u'/usr/local/lib/python2.2/lib-dynload',
u'/usr/local/lib/python2.2/site-packages', u'/usr/local/lib/site-python']

The main problem (also with the SF bug report) seems to be that Unicode objects in sys.path are not accepted, but I think they should be. Regards, Martin From tim.one@home.com Wed Jun 6 21:52:02 2001 From: tim.one@home.com (Tim Peters) Date: Wed, 6 Jun 2001 16:52:02 -0400 Subject: [Python-Dev] -U option? In-Reply-To: <3B1E9227.7F67971E@lemburg.com> Message-ID: [MAL]

> The -U option is there to be able to test drive Python into
> the Unicode age. As you and many others have noted, there's
> still a long way to go...

That's cool. My question is why we're advertising (via -h) an option that end users have no chance of using successfully. From mal@lemburg.com Wed Jun 6 22:47:25 2001 From: mal@lemburg.com (M.-A.
Lemburg) Date: Wed, 06 Jun 2001 23:47:25 +0200 Subject: [Python-Dev] -U option? References: Message-ID: <3B1EA4ED.38BEB1AA@lemburg.com> Tim Peters wrote:
>
> [MAL]
> > The -U option is there to be able to test drive Python into
> > the Unicode age. As you and many others have noted, there's
> > still a long way to go...
>
> That's cool. My question is why we're advertising (via -h) an option that
> end users have no chance of using successfully.

I guess I just added the flag to the -h message without thinking much about it... it was added in some alpha release. Anyway, these bug reports will keep hitting us, which is good in the sense that it'll eventually push Python into the Unicode arena. We could use some funding for this, though. -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From paulp@ActiveState.com Thu Jun 7 00:00:52 2001 From: paulp@ActiveState.com (Paul Prescod) Date: Wed, 06 Jun 2001 16:00:52 -0700 Subject: [Python-Dev] urllib2 Message-ID: <3B1EB624.563DABE0@ActiveState.com> Tim asked me to look into the test_urllib2 failure. I notice that Guido's name is in the relevant RFC so I guess he's the real expert <0.5 wink>: http://www.faqs.org/rfcs/rfc1738.html Anyhow, there are a variety of problems. :( First, test_urllib2 says:

    file_url = "file://%s" % urllib2.__file__

This is not going to construct a strictly standards-conforming URL on Windows, but that form is still common enough and obvious enough that maybe we should support it. So that's problem #1: we aren't compatible with mildly broken Windows file URLs. Problem #2 is that the test program generates mildly broken URLs on Windows. That raises the question of what IS the right way to construct file URLs in a cross-platform manner. I would have thought that urllib.pathname2url was the way, but I note that it isn't documented. Plus it is poorly named. A function that does this:

    """Convert a DOS path name to a file url.

    C:\foo\bar\spam.foo

    becomes

    ///C|/foo/bar/spam.foo
    """

is not really constructing a URL! And the semantics of the function on multiple platforms do not seem to me to be identical. On Windows it adds a bunch of leading slashes, and Mac and Unix seem not to. So you can't safely paste a "file:" or "file://" on the front. I don't know how widely pathname2url has been used even though it is undocumented... should we fix it and document it, or write a new function? -- Take a recipe. Leave a recipe. Python Cookbook! http://www.ActiveState.com/pythoncookbook From barry@scottb.demon.co.uk Thu Jun 7 00:31:51 2001 From: barry@scottb.demon.co.uk (Barry Scott) Date: Thu, 7 Jun 2001 00:31:51 +0100 Subject: [Python-Dev] %b format? - 8088 like a PDP-11 I think not... In-Reply-To: <20010604161114.A20979@thyrsus.com> Message-ID: <000a01c0eee0$dcfe9250$060210ac@private> Eric, As others have pointed out, your timeline is wrong... Barry p.s. I'm ex-DEC and old enough to have seen the introduction of the 6502 (got mine at university for $25 inc. postage to the U.K.), Z80 and VAX (worked on product for V1.0 of VMS). Also for my sins argued with Gordon Bell and Dave Cutler about CPU architecture.

> -----Original Message-----
> From: Eric S. Raymond [mailto:esr@thyrsus.com]
> Sent: 04 June 2001 21:11
> To: Barry Scott
> Cc: python-dev (E-mail)
> Subject: Re: [Python-Dev] %b format? - 8088 like a PDP-11 I think not...
> > > Barry Scott :
> > Eric wrote:
> > > While I'm at it, I should note that the design of the 11 was ancestral
> > > to both the 8088 and 68000 microprocessors, and thus to essentially
> > > every new general-purpose computer designed in the last fifteen years.
> >
> > The key to PDP-11 and VAX was lots of registers all alike and rich
> > addressing modes for the instructions.
> >
> > The 8088 is very far from this design; it owes its design more to the
> > 4004 than the PDP-11.
>
> Yes, but the 4004 was designed as a sort of lobotomized imitation of the 65xx,
> which was descended from the 11. Admittedly, in the chain of transmission here
> were two stages of redesign so bad that the connection got really tenuous.
> --
> Eric S. Raymond
>
> ...Virtually never are murderers the ordinary, law-abiding people
> against whom gun bans are aimed. Almost without exception, murderers
> are extreme aberrants with lifelong histories of crime, substance
> abuse, psychopathology, mental retardation and/or irrational violence
> against those around them, as well as other hazardous behavior, e.g.,
> automobile and gun accidents."
> -- Don B. Kates, writing on statistical patterns in gun crime

From barry@scottb.demon.co.uk Thu Jun 7 00:57:11 2001 From: barry@scottb.demon.co.uk (Barry Scott) Date: Thu, 7 Jun 2001 00:57:11 +0100 Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects? In-Reply-To: <3B1E7840.C93EA788@lemburg.com> Message-ID: <000b01c0eee4$66f8a7e0$060210ac@private> Adding the atomic types of Python as classes I'm +1 on. Performance is a problem for the parser to handle. If you have not already done so, I suggest that you look at what Microsoft .NET is doing in this area. In .NET, for example, int is a class and they have the technology to define the interface to an int and optimize the performance of the non-derived cases. Barry From barry@scottb.demon.co.uk Thu Jun 7 01:03:54 2001 From: barry@scottb.demon.co.uk (Barry Scott) Date: Thu, 7 Jun 2001 01:03:54 +0100 Subject: [Python-Dev] Shouldn't I be able to print Unicode objects? In-Reply-To: <3B1DE3C1.90BA3DD6@lemburg.com> Message-ID: <001001c0eee5$571a8090$060210ac@private>

> Shouldn't we fix the tokenizer to explicitly check for 'a'...'z'
> and 'A'...'Z' ?! (same for digits) ?!

If you embrace the world then NO. If America is your world then maybe. Barry From paulp@ActiveState.com Thu Jun 7 01:42:03 2001 From: paulp@ActiveState.com (Paul Prescod) Date: Wed, 06 Jun 2001 17:42:03 -0700 Subject: [Python-Dev] Shouldn't I be able to print Unicode objects? References: <001001c0eee5$571a8090$060210ac@private> Message-ID: <3B1ECDDB.F1E8B19D@ActiveState.com> Barry Scott wrote:

> > > Shouldn't we fix the tokenizer to explicitly check for 'a'...'z'
> > > and 'A'...'Z' ?! (same for digits) ?!
> >
> > If you embrace the world then NO. If America is your world then maybe.

Actually, if we were really going to embrace the world we'd need to handle more than a few European languages! -- Take a recipe. Leave a recipe. Python Cookbook! http://www.ActiveState.com/pythoncookbook From MarkH@ActiveState.com Thu Jun 7 02:09:51 2001 From: MarkH@ActiveState.com (Mark Hammond) Date: Thu, 7 Jun 2001 11:09:51 +1000 Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects? In-Reply-To: <000b01c0eee4$66f8a7e0$060210ac@private> Message-ID:

> If you have not already done so, I suggest that you look at
> what Microsoft .NET is doing in this area.
> In .NET, for example, int is a class and they have the technology to
> define the interface to an int and optimize the performance of the
> non-derived cases.

Actually, that is not completely true. There is a "value type" and a class version. The value type is just the bits. The VM has instructions that work on the value type. As far as I am aware, you cannot use a derived class with these instructions. They also have the concept of "sealed", meaning they cannot be subclassed. Last time I looked, strings were an example of sealed classes. Mark. From greg@cosc.canterbury.ac.nz Thu Jun 7 03:16:00 2001 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Thu, 07 Jun 2001 14:16:00 +1200 (NZST) Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects? In-Reply-To: <20010606113957.4A395303181@snelboot.oratrix.nl> Message-ID: <200106070216.OAA02594@s454.cosc.canterbury.ac.nz> Jack Jansen :

> Should it support
> items()? Is only x.keys()/x[] good enough?

Check for items(), and fall back on x.keys()/x[] if necessary. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From greg@cosc.canterbury.ac.nz Thu Jun 7 03:19:03 2001 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Thu, 07 Jun 2001 14:19:03 +1200 (NZST) Subject: [Python-Dev] Shouldn't I be able to print Unicode objects? In-Reply-To: <3B1ECDDB.F1E8B19D@ActiveState.com> Message-ID: <200106070219.OAA02597@s454.cosc.canterbury.ac.nz>

> if we were really going to embrace the world we'd need to
> handle more than a few European languages!

-1 on allowing Kanji in Python identifiers. :-( I like to be able to at least imagine some sort of pronunciation for variable names! Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From greg@cosc.canterbury.ac.nz Thu Jun 7 03:22:33 2001 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Thu, 07 Jun 2001 14:22:33 +1200 (NZST) Subject: [Python-Dev] %b format? - 8088 like a PDP-11 I think not... Message-ID: <200106070222.OAA02600@s454.cosc.canterbury.ac.nz> Jack Jansen :

> with even SP and PC being general purpose registers

The PC is not a general-purpose register in the 68000. I've heard that this was because DEC had a patent on the idea.

> the latter two actually improved on the PDP-11

The 16032 was certainly extremely orthogonal. I wrote an assembler and a compiler for it once, and it was a joy after coming from the Z80! It wasn't quite perfect, though - its lack of a "top-of-stack-indirect" addressing mode was responsible for the one wart in my otherwise-beautiful code generation strategy. Also, it must have been the most CISCy instruction set the world has ever seen, with the possible exception of the VAX... Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc.
| greg@cosc.canterbury.ac.nz +--------------------------------------+ From tim.one@home.com Thu Jun 7 05:54:42 2001 From: tim.one@home.com (Tim Peters) Date: Thu, 7 Jun 2001 00:54:42 -0400 Subject: [Python-Dev] urllib2 In-Reply-To: <3B1EB624.563DABE0@ActiveState.com> Message-ID: [Paul Prescod] > Tim asked me to look into test_urllib2 failure. Wow! I'm going to remember that. Have to ask people to do things more often . > notice that Guido's name is in the relevant RFC so I guess he's the > real expert <0.5 wink>: > > http://www.faqs.org/rfcs/rfc1738.html > > Anyhow, there are a variety of problems. :( I'm going to add one more. The spec says this is a file URL: fileurl = "file://" [ host | "localhost" ] "/" fpath But on Windows, urllib2.urlopen() throws up even on URLs like: file:///c:/bootlog.txt and file://localhost/c:/bootlog.txt AFAICT, those conform to the spec (the first with an empty host, the second with the special reserved hostname), Windows has no problem with either of them (heck, in Outlook I can click on them while I'm typing this email -- works fine), but urllib2 mangles them into (repr) '\\c:\\bootlog.txt', which Windows has no idea what to do with. Hard to see why it should, either. > First, test_urllib2 says: > > file_url = "file://%s" % urllib2.__file__ > > This is not going to construct a strictly standards conforming URL on > Windows but that form is still common enough and obvious enough that > maybe we should support it. Common among what? > So that's problem #1, we aren't compatible with mildly broken Windows > file URLs. I haven't found a sense in which Windows file URLs are broken. test_urllib2 creates bad URLs on Windows, and urllib2 itself transforms legit file URLs into broken ones on Windows, but both of those appear to be our (Python's) fault. Until std stuff works, worrying about extensions to the std seems premature. > Problem #2 is that the test program generates mildly broken URLs > on Windows. Yup. > That begs the question of what IS the right way to construct file urls > in a cross-platform manner. The spec seems vaguely clear to me on this point (it's vaguely unclear to me whether a colon is allowed in an fpath -- the text seems to say one thing but the BNF another). > I would have thought that urllib.pathname2url was the way but I note > that it isn't documented. Plus it is poorly named. A function that > does this: > > """Convert a DOS path name to a file url. > > C:\foo\bar\spam.foo > > becomes > > ///C|/foo/bar/spam.foo > """ > > is not really constructing a URL! Or anything else recognizable . > And the semantics of the function on multiple platforms do not seem > to me to be identical. On Windows it adds a bunch of leading slashes > and mac and Unix seem not to. So you can't safely paste a "file:" or > "file://" on the front. I don't know how widely pathname2url has been > used even though it is undocumented....should we fix it and document > it or write a new function? Maybe it's just time to write urllib3.py <0.8 wink>. no-conclusions-from-me-ly y'rs - tim From tim@digicool.com Thu Jun 7 06:16:37 2001 From: tim@digicool.com (Tim Peters) Date: Thu, 7 Jun 2001 01:16:37 -0400 Subject: [Python-Dev] Shouldn't I be able to print Unicode objects? In-Reply-To: <3B1DE3C1.90BA3DD6@lemburg.com> Message-ID: [M.-A. Lemburg] > Wasn't me for sure... even in the Unicode age, I believe that > Python source code should maintain readability by not allowing > all alpha(numeric) characters for use in identifiers (there are > lots of them in Unicode). 
> > Shouldn't we fix the tokenizer to explicitly check for 'a'...'z'
> > and 'A'...'Z' ?! (same for digits) ?!

That's certain to break code, and it's certain that some of those whose code gets broken would scream very loudly about it. OTOH, nobody would come to its defense with a hearty "whew! I'm so glad *that* hole finally got plugged!". I'm sure it would cause less trouble to take away <> as an alternative spelling of != (except that Barry is actually close enough to strangle Guido a few days each week <wink>). Is it worth the hassle? I don't know, but I'd *guess* Guido would rather endure the complaints for something more substantial (like, say, breaking 10 lines of an expert's obscure code that relies on int() being a builtin instead of a class <wink>). From fredrik@pythonware.com Thu Jun 7 06:50:35 2001 From: fredrik@pythonware.com (Fredrik Lundh) Date: Thu, 7 Jun 2001 07:50:35 +0200 Subject: [Python-Dev] Shouldn't I be able to print Unicode objects? References: Message-ID: <004c01c0ef15$c97f51d0$4ffa42d5@hagrid> Tim Peters wrote:

> > Shouldn't we fix the tokenizer to explicitly check for 'a'...'z'
> > and 'A'...'Z' ?! (same for digits) ?!
>
> That's certain to break code, and it's certain that some of those whose code
> gets broken would scream very loudly about it.

I don't get it. If people use non-ascii characters, they're clearly not using Python. from the language reference:

    ... Python uses the 7-bit ASCII character set for program text and
    string literals. ...

    Identifiers (also referred to as names) are described by the
    following lexical definitions:

        identifier: (letter|"_") (letter|digit|"_")*
        letter: lowercase | uppercase
        lowercase: "a"..."z"
        uppercase: "A"..."Z"
        digit: "0"..."9"

    Identifiers are unlimited in length. Case is significant ...

either change the specification, and break every single tool written by anyone who actually bothered to read the specification [1], or add a warning to 2.2. 1) I assume the specification didn't exist when GvR wrote the first CPython implementation ;-) From tim.one@home.com Thu Jun 7 07:15:35 2001 From: tim.one@home.com (Tim Peters) Date: Thu, 7 Jun 2001 02:15:35 -0400 Subject: [Python-Dev] Shouldn't I be able to print Unicode objects? In-Reply-To: <004c01c0ef15$c97f51d0$4ffa42d5@hagrid> Message-ID: [/F]

> I don't get it. If people use non-ascii characters, they're clearly not
> using Python. from the language reference:

My *first* reply in this thread said the lang ref required this. That doesn't mean people read the ref. IIRC, you were one of the most strident complainers about list.append(1, 2, 3) "breaking", so just rekindle that mindset but intensify it fueled by nationalism <0.5 wink>.

> ...
> either change the specification, and break every single tool written by
> anyone who actually bothered to read the specification [1], or add a
> warning to 2.2.

This is up to Guido; doesn't affect my code one way or the other (and, yes, e.g., IDLE's parser follows the manual here).

> ...
> 1) I assume the specification didn't exist when GvR wrote the first
> CPython implementation ;-)

Thanks to the magic of CVS, you can see that the BNF for identifiers has remained unchanged since it was first checked in (Thu Nov 21 13:53:03 1991 rev 1.1 of ref1.tex). The problem is that locale was a new-fangled idea then, and I believe Guido simply didn't anticipate isalpha() and isalnum() would vary across non-EBCDIC platforms. From mal@lemburg.com Thu Jun 7 09:29:52 2001 From: mal@lemburg.com (M.-A.
Lemburg) Date: Thu, 07 Jun 2001 10:29:52 +0200 Subject: [Python-Dev] Shouldn't I be able to print Unicode objects? References: <001001c0eee5$571a8090$060210ac@private> <3B1ECDDB.F1E8B19D@ActiveState.com> Message-ID: <3B1F3B80.DB8F4117@lemburg.com> Paul Prescod wrote: > > Barry Scott wrote: > > > > > Shouldn't we fix the tokenizer to explicitly check for 'a'...'z' > > > and 'A'...'Z' ?! (same for digits) ?! > > > > If you embrace the world then NO. If America is you world then maybe. > > Actually, if we were really going to embrace the world we'd need to > handle more than a few European languages! I was just suggesting to make the parser actually do what the language spec defines. And yes: I don't like non-ASCII identifiers (even though I live in Europe). This is just bound to cause trouble, e.g. people forgetting accents on characters, editors displaying code using wild approximations of what the code author intended to write, etc. -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From mal@lemburg.com Thu Jun 7 09:42:40 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Thu, 07 Jun 2001 10:42:40 +0200 Subject: [Python-Dev] Shouldn't I be able to print Unicode objects? References: Message-ID: <3B1F3E80.F8CC16D7@lemburg.com> Tim Peters wrote: > > [M.-A. Lemburg] > > Wasn't me for sure... even in the Unicode age, I believe that > > Python source code should maintain readability by not allowing > > all alpha(numeric) characters for use in identifiers (there are > > lots of them in Unicode). > > > > Shouldn't we fix the tokenizer to explicitely check for 'a'...'z' > > and 'A'...'Z' ?! (same for digits) ?! > > That's certain to break code, and it's certain that some of those whose code > gets broken would scream very loudly about it. OTOH, nobody would come to > its defense with a hearty "whew! I'm so glad *that* hole finally got > plugged!". I'm sure it would cause less trouble to take away <> as an > alternative spelling of != (except that Barry is actually close enough to > strangle Guido a few days each week ). Is it worth the hassle? I > don't know, but I'd *guess* Guido would rather endure the complaints for > something more substantial (like, say, breaking 10 lines of an expert's > obscure code that relies on int() being a builtin instead of a class > ). Ok, point taken... still, it's funny sometimes how pydevs are willing to break perfectly valid code in some areas while not considering pointing users to clean up invalid code in other areas. -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From thomas@xs4all.net Thu Jun 7 13:03:20 2001 From: thomas@xs4all.net (Thomas Wouters) Date: Thu, 7 Jun 2001 14:03:20 +0200 Subject: [Python-Dev] Shouldn't I be able to print Unicode objects? In-Reply-To: <3B1F3E80.F8CC16D7@lemburg.com>; from mal@lemburg.com on Thu, Jun 07, 2001 at 10:42:40AM +0200 References: <3B1F3E80.F8CC16D7@lemburg.com> Message-ID: <20010607140320.Z690@xs4all.nl> On Thu, Jun 07, 2001 at 10:42:40AM +0200, M.-A. Lemburg wrote: > still, it's funny sometimes how pydevs are willing to break perfectly > valid code in some areas while not considering pointing users to clean up > invalid code in other areas. 
Well, I consider myself one of the more backward-oriented people on py-dev (or at least a vocal member of that sub-group ;) and I don't think changing int et al to be types/class-constructors is a problem. People who rely on int being a *function*, rather than being a callable, are either writing a Python-specific script or a quick hack, or really, really know what they are getting into. I'm also not terribly worried about the use of non-ASCII characters in identifiers in Python, though a warning for the next one or two releases would be a good thing -- if anything, it should warn that that trick won't work for people with different locale settings! -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From mwh@python.net Thu Jun 7 13:54:55 2001 From: mwh@python.net (Michael Hudson) Date: Thu, 7 Jun 2001 13:54:55 +0100 (BST) Subject: [Python-Dev] python-dev summary 2001-05-24 - 2001-06-07 Message-ID: This is a summary of traffic on the python-dev mailing list between May 24 and Jun 7 (inclusive) 2001. It is intended to inform the wider Python community of ongoing developments. To comment, just post to python-list@python.org or comp.lang.python in the usual way. Give your posting a meaningful subject line, and if it's about a PEP, include the PEP number (e.g. Subject: PEP 201 - Lockstep iteration) All python-dev members are interested in seeing ideas discussed by the community, so don't hesitate to take a stance on a PEP if you have an opinion. This is the ninth summary written by Michael Hudson. Summaries are archived at:

Posting distribution (with apologies to mbm)

Number of articles in summary: 305

[ASCII bar chart of daily posting volume; the daily counts, left to right, were: Thu 24: 18, Fri 25: 14, Sat 26: 11, Sun 27: 14, Mon 28: 20, Tue 29: 19, Wed 30: 34, Thu 31: 35, Fri 01: 32, Sat 02: 14, Sun 03: 8, Mon 04: 20, Tue 05: 51, Wed 06: 15]

Another busy-ish fortnight. I've been in Exam Hell(tm) and am writing this while hungover, so this summary might be a bit sketchier than normal. Apologies in advance.

* strop vs. string *

Greg Stein leapt up to defend the slated-to-be-deprecated strop module by pointing out that its functions work on any object that supports the buffer API, whereas the 1.6-era string.py only works with objects that sprout the right methods. The discussion quickly degenerated into the usual griping about the fact that the buffer API is flawed and undocumented and not really well understood by many people.

* Special-casing "O" *

As a followup to the discussion mentioned in the last summary, Martin von Loewis posted a patch to SF enabling functions written in C that expect zero or one object arguments to dispense with the time-wasting call to PyArg_ParseTuple. The first version of the patch was criticized for being overly general, and for not being general enough <wink>. It seems the forces of simplicity have won, but I don't think the patch has been checked in yet.

* the late, unlamented, yearly list.append panic *

Tim Peters posted that c.l.py has rediscovered the quadratic-time worst-case behavior of list.append() (a rough sketch of the effect follows below).
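A minimal timing sketch of that worst case (not from the thread itself; the function name fill is made up, absolute numbers are illustrative only, and it assumes a 2.x-era CPython):

    import time

    def fill(n):
        # Append n items one at a time; with naive resizing each
        # append may realloc-and-copy the whole list, giving O(n**2)
        # total work instead of amortized O(n).
        lst = []
        t0 = time.clock()
        for i in xrange(n):
            lst.append(i)
        return time.clock() - t0

    # If appends are amortized O(1), doubling n should roughly double
    # the time; the quadratic worst case shows up as much worse.
    for n in (50000, 100000, 200000):
        print n, fill(n)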
And then ameliorated the worst-case behaviour. So that one was easy.

* making dicts ... *

You might think that, as dictionaries are so central to Python, their implementation would be bulletproof and one of the areas of the source least likely to change. This might be true *now*; Tim Peters seems to have spent most of the last fortnight implementing performance improvements one after the other and fixing core-dumping holes in the implementation pointed out by Michael Hudson. The first improvement was to use "polynomial division instead of multiplication for generating the probe sequence, as a way to get all the bits of the hash code into play." If you don't understand what that means, ignore it, because Tim came up with a more radical rewrite, which seems to be a win but sadly removes the shock of finding comments about Galois theory in dictobject.c... Most of the discussion in the thread following Tim's patch was about whether we need 128-bit floats or ints, which is another way of saying everyone liked it :-) This one hasn't been checked in either.

* ... and breaking dicts *

Inspired by a post to comp.lang.python by Wolfgang Lipp and driven slightly insane by revision, Michael Hudson posted a short program that used a hole in the dict implementation to trigger a core dump. This got fixed, so he did it again. The cause of both problems was C code assuming things about dictionaries remained the same across calls to code that ended up executing arbitrary Python code, which could mutate the dict exactly as much as it pleased, which in turn caused pointers to dangle. This problem has a history in Python; the .sort() method on lists has to fight the same issues. These holes have been plugged, although it is still possible to crash Python with exceptionally contrived code. There's another approach, which is what the .sort() method uses:

>>> list = range(10)
>>> def c(x,y):
...     del list[:]
...     return cmp(x, y)
...
>>> list.sort(c)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "<stdin>", line 2, in c
TypeError: a list cannot be modified while it is being sorted

The .sort() method magically changes the type of the list being sorted to one that doesn't support mutation while it's sorting the list. This approach would have some merit for dictionaries too; for one thing, we could lose all the contrived code in dictobject.c protecting against this sort of silliness...

* arbitrary radix formatting *

Greg Wilson made a plea for the addition of a "%b" formatting operator to display integers in binary, e.g.:

>>> print "%d %x %o %b"%(10,10,10,10)
10 a 12 1010

There was general support for the idea, but Tim Peters and Greg Ewing pointed out that it would be neater to invent a general format code that would enable one to format an integer into an arbitrary base, so that

>>> int("1111", 7)
400

has an inverse at long last. But no-one could think of a spelling that wasn't in general use, and the discussion died :-(.

* quick poll *

Guido asked if anyone would object violently to the builtin conversion functions becoming type objects on the descr-branch, in analogy to class objects. There was general support and only a few concerns, and the changes have begun to hit descr-branch. I'm sure I'm not the only one who wishes they had the time to understand what is going on in there... Cheers, M.
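To make the dict-breaking pattern in the summary concrete, here is a small illustrative sketch (not code from the thread; the class name Boom is made up, and on an interpreter with the holes plugged this merely behaves oddly rather than dumping core):

    # A key whose __hash__ mutates the dict it is being looked up in.
    # C code that caches pointers into the table across the hash call
    # is exactly the kind of code that used to end up dangling.
    class Boom:
        def __init__(self, d):
            self.d = d
        def __hash__(self):
            self.d.clear()   # mutate the dict mid-lookup
            return 0

    d = {}
    for i in range(10):
        d[i] = i
    print d.get(Boom(d))     # probes d while clearing it; prints None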
From gmcm@hypernet.com Thu Jun 7 14:06:55 2001 From: gmcm@hypernet.com (Gordon McMillan) Date: Thu, 7 Jun 2001 09:06:55 -0400 Subject: [Python-Dev] urllib2 In-Reply-To: References: <3B1EB624.563DABE0@ActiveState.com> Message-ID: <3B1F442F.26920.1ECC32A9@localhost> [Tim & Paul on file URLs] [Tim]

> But on Windows, urllib2.urlopen() throws up even on URLs like:
>
>     file:///c:/bootlog.txt

Curiously enough,

    url = "file:///" + urllib.quote_plus(fnm)

seems to work on Windows. It even seems to work on Mac, if you first turn '/' into '%2f', then undo the double quoting (turn '%252f' back into '%2f' in the ensuing url). It even seems to work on Mac directory names with Unicode characters in them (though I haven't looked too closely, in fear of jinxing it). eye-of-newt-considered-helpful-ly y'rs - Gordon From Samuele Pedroni Thu Jun 7 14:56:30 2001 From: Samuele Pedroni (Samuele Pedroni) Date: Thu, 7 Jun 2001 15:56:30 +0200 (MET DST) Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects? Message-ID: <200106071356.PAA04511@core.inf.ethz.ch> Hi. [GvR]

> > Is the intent of using int and friends as constructors instead of just
> > coercion functions that I should (eventually) be able to do this:
> >
> >     class NonNegativeInt(int):
> >         def __init__(self, val):
> >             if int(val) < 0:
> >                 raise ValueError, "Value must be >= 0"
> >             int.__init__(self, val)
> >             self.a = 47
> >         ...
> >
> > ?
>
> Yes, sort-of. The details will be slightly different. I'm not
> comfortable with letting a user-provided __init__() method change the
> value of self, so I am brooding on a work-around that separates
> allocation and one-time initialization from __init__(). Watch PEP
> 253.

jython already supports vaguely this:

    from types import IntType as Int

    class NonNegInt(Int):
        def __init__(self,val,annot=None):
            if int(val)<0:
                raise ValueError,"val<0"
            Int.__init__(self,val)
            self._annot = annot
        def neg(self):
            return -self
        def __add__(self,b):
            if type(b) is NonNegInt:
                return NonNegInt(Int.__add__(self,b))
            return Int.__add__(self,b)
        def annot(self):
            return self._annot

Jython 2.0 on java1.3.0 (JIT: null)
Type "copyright", "credits" or "license" for more information.
>>> from NonNegInt import NonNegInt
>>> x=NonNegInt(-2)
Traceback (innermost last):
  File "<console>", line 1, in ?
  File "/home/pedroni/BOX/exp/NonNegInt.py", line 5, in __init__
ValueError: val<0
>>> x=NonNegInt(2)
>>> y=NonNegInt(3,"foo")
>>> y._annot
Traceback (innermost last):
  File "<console>", line 1, in ?
AttributeError: 'int' object has no attribute '_annot'
>>> y.annot()
Traceback (innermost last):
  File "<console>", line 1, in ?
  File "/home/pedroni/BOX/exp/NonNegInt.py", line 15, in annot
AttributeError: 'int' object has no attribute '_annot'
>>> x+y, type(x+y)
(5, )
>>> x.neg()
-2
>>> x+(-2),type(x+(-2))
(0, )
>>>

As one can see, the semantics are not without holes. The support for this is mainly a side-effect of the fact that internally jython objects are instances of java classes and jython allows subclassing of java classes. I have no idea whether someone is already using this kind of stuff; I just remember that someone reported a bug concerning subclassing ListType, so ... By the way, int and long being types seems nice and elegant to me. A more general note FYI: I have read the PEP drafts about descrs and type as classes; I have not played with the descr-branch yet.
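For comparison, under the allocation/initialization split Guido alludes to above, the CPython shape of the same idea might be sketched like this (an illustration only, assuming the __new__-style hook that PEP 253 was brooding on, not code from the thread):

    class NonNegInt(int):
        # Validation happens in __new__ because the int's value is
        # fixed at allocation time; __init__ is too late to change it.
        def __new__(cls, val, annot=None):
            if int(val) < 0:
                raise ValueError, "val<0"
            return int.__new__(cls, val)
        def __init__(self, val, annot=None):
            self._annot = annot   # per-instance state is fine here
        def annot(self):
            return self._annot

    x = NonNegInt(3, "foo")
    print x + 1, x.annot()    # 4 foo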
I think that the descr and metaclasses stuff can help on the jython side to put a lot of things (dealing with java classes, subclassing from them, etc.) in a more precise framework, polishing up many design aspects and the code. First, I suppose that backward compatibility on the jython side is not a real problem; these aspects are so under-documented that there are no promises about them. On the other hand, until we start coding things on the jython side (it's complex stuff and jython internals are already complex) it will be really difficult to make constructive comments on possible problems for jython, or toward a design that better fits both jython and CPython needs. Given that we are still working on jython 2.1, maybe we will be able to start working on jython 2.2 only late in the 2.2 release cycle, when things are somehow fixed and we can only do our best to re-implement them. regards Samuele Pedroni. From Greg.Wilson@baltimore.com Thu Jun 7 17:03:44 2001 From: Greg.Wilson@baltimore.com (Greg Wilson) Date: Thu, 7 Jun 2001 12:03:44 -0400 Subject: [Python-Dev] re: %b format (no, really) Message-ID: <930BBCA4CEBBD411BE6500508BB3328F2E20EF@nsamcanms1.ca.baltimore.com> Prompted in part by the comment in Michael Hudson's python-dev summary about this discussion having died, I'd like to summarize:

1. Most people who commented felt that a base-2 format
   would be useful, if only for teaching and debugging.
   With regard to questions about byte order:

   A. Integer values are printed as base-2 numbers, so
      byte order is irrelevant.

   B. Floating-point numbers are printed as:

      [sign] [mantissa] [exponent]

      The mantissa and exponent are shown according
      to rule A.

2. Inventing a format for converting to arbitrary
   bases is dubious hypergeneralization (to borrow a
   phrase).

3. Implementation should mirror octal and hexadecimal
   support, e.g. a 'bin()' function to go with 'oct()'
   and 'hex()'.

4. The desirability or otherwise of a "%b" format
   specifier has nothing to do with the relative
   merits of any early microprocessor :-).

If no-one has strong objections, I'll put together a PEP on this basis. Thanks Greg ----------------------------------------------------------------------------------------------------------------- The information contained in this message is confidential and is intended for the addressee(s) only. If you have received this message in error or there are any problems please notify the originator immediately. The unauthorized use, disclosure, copying or alteration of this message is strictly forbidden. Baltimore Technologies plc will not be liable for direct, special, indirect or consequential damages arising from alteration of the contents of this message by a third party or as a result of any virus being passed on. In addition, certain Marketing collateral may be added from time to time to promote Baltimore Technologies products, services, Global e-Security or appearance at trade shows and conferences. This footnote confirms that this email message has been swept by Baltimore MIMEsweeper for Content Security threats, including computer viruses.
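As a concrete illustration of item 3 above, a bin() mirroring oct() and hex() can be sketched in pure Python (an illustration only -- bin() was not a builtin at the time, and the '0b' prefix simply mirrors the 0b1101 notation floated later in this thread):

    def bin(n):
        # binary analogue of oct()/hex(); handles 0 and negatives,
        # and works for plain ints and longs alike
        if n < 0:
            return '-' + bin(-n)
        digits = []
        while 1:
            digits.append("01"[int(n & 1)])
            n = n >> 1
            if n == 0:
                break
        digits.reverse()
        return '0b' + ''.join(digits)

    print bin(10)    # 0b1010
    print bin(-5)    # -0b101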
From greg@cosc.canterbury.ac.nz Fri Jun 8 01:55:05 2001 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Fri, 08 Jun 2001 12:55:05 +1200 (NZST) Subject: [Python-Dev] re: %b format (no, really) In-Reply-To: <930BBCA4CEBBD411BE6500508BB3328F2E20EF@nsamcanms1.ca.baltimore.com> Message-ID: <200106080055.MAA02711@s454.cosc.canterbury.ac.nz> Greg Wilson : [good stuff about binary format support]

> If no-one has strong objections, I'll put together a
> PEP on this basis.

Sounds okay to me. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From tim.one@home.com Fri Jun 8 02:39:53 2001 From: tim.one@home.com (Tim Peters) Date: Thu, 7 Jun 2001 21:39:53 -0400 Subject: [Python-Dev] Shouldn't I be able to print Unicode objects? In-Reply-To: <20010607140320.Z690@xs4all.nl> Message-ID: [Thomas Wouters]

> ...
> I'm also not terribly worried about the use of non-ASCII characters in
> identifiers in Python, though a warning for the next one or two releases
> would be a good thing -- if anything, it should warn that that trick
> won't work for people with different locale settings!

Fine by me! Someone who cares enough to write the warning code and docs should just do so, although it may be wise to secure Guido's blessing first. From skip@pobox.com (Skip Montanaro) Fri Jun 8 15:51:27 2001 From: skip@pobox.com (Skip Montanaro) Date: Fri, 8 Jun 2001 09:51:27 -0500 Subject: [Python-Dev] sys.modules["__main__"] in Jython Message-ID: <15136.58991.72069.433197@beluga.mojam.com> Would someone with Jython experience check to see if it interprets sys.modules["__main__"] in the same manner as Python? I'm interested to see if doctest's normal usage can be simplified slightly. The doctest documentation states: In normal use, end each module M with:

    def _test():
        import doctest, M           # replace M with your module's name
        return doctest.testmod(M)   # ditto

    if __name__ == "__main__":
        _test()

I'm wondering if this works for Jython as well as Python:

    def _test():
        import doctest, sys
        return doctest.testmod(sys.modules["__main__"])

    if __name__ == "__main__":
        _test()

If so, then I think doctest.testmod's signature can be changed to

    def testmod(m=None, name=None, globs=None, verbose=None,
                isprivate=None, report=1):

with the following extra code added to the start of the function:

    if m is None:
        import sys
        m = sys.modules["__main__"]

That way the most common doctest usage can be changed to

    def _test():
        import doctest
        return doctest.testmod()

    if __name__ == "__main__":
        _test()

(I ran into a problem with a module that had initialization code that barfed if executed more than once.) Of course, these changes are ultimately Tim's decision. I'm just trying to knock down various potential hurdles. Thx, Skip From guido@digicool.com Fri Jun 8 17:06:19 2001 From: guido@digicool.com (Guido van Rossum) Date: Fri, 08 Jun 2001 12:06:19 -0400 Subject: [Python-Dev] re: %b format (no, really) In-Reply-To: Your message of "Fri, 08 Jun 2001 12:01:37 EDT." References: Message-ID: <200106081606.f58G6Jj11829@odiug.digicool.com>

> Prompted in part by the comment in Michael Hudson's
> python-dev summary about this discussion having died,
> I'd like to summarize:
>
> 1. Most people who commented felt that a base-2 format
>    would be useful, if only for teaching and debugging.
>    With regard to questions about byte order:
>
> A.
Integer values are printed as base-2 numbers, so > byte order is irrelevant. > > B. Floating-point numbers are printed as: > > [sign] [mantissa] [exponent] > > The mantissa and exponent are shown according > to rule A. Why bother with floats at all? We can't print floats as hex either. If I were doing any kind of float-representation fiddling, I'd probably want to print it in hex anyway (I can read hex). But as I say, that's not for the general public. > 2. Inventing a format for converting to arbitrary > bases is dubious hypergeneralization (to borrow a > phrase). Agreed. > 3. Implementation should mirror octal and hexadecimal > support, e.g. a 'bin()' function to go with 'oct()' > and 'hex()'. > > 4. The desirability or otherwise of a "%b" format > specifier has nothing to do with the relative > merits of any early microprocessor :-). > > If no-one has strong objections, I'll put together a > PEP on this basis. Go for it. Or just submit a patch to SF -- this seems almost too small for a PEP to me. :-) --Guido van Rossum (home page: http://www.python.org/~guido/) From barry@digicool.com Fri Jun 8 17:10:50 2001 From: barry@digicool.com (Barry A. Warsaw) Date: Fri, 8 Jun 2001 12:10:50 -0400 Subject: [Python-Dev] re: %b format (no, really) References: <200106081606.f58G6Jj11829@odiug.digicool.com> Message-ID: <15136.63754.927103.77358@anthem.wooz.org> >>>>> "GvR" == Guido van Rossum writes: GvR> Go for it. Or just submit a patch to SF -- this seems almost GvR> too small for a PEP to me. :-) Since we all seem to agree, I'd agree. :) From Greg.Wilson@baltimore.com Fri Jun 8 17:14:14 2001 From: Greg.Wilson@baltimore.com (Greg Wilson) Date: Fri, 8 Jun 2001 12:14:14 -0400 Subject: [Python-Dev] re: %b format (no, really) Message-ID: <930BBCA4CEBBD411BE6500508BB3328F2E2179@nsamcanms1.ca.baltimore.com> > > Greg: > > B. Floating-point numbers are printed as: > > [sign] [mantissa] [exponent] > Guido: > Why bother with floats at all? For teaching purposes, which is what started me on this in the first place --- I would like an easy way to show people the bit patterns corresponding to basic types. > Guido: > Go for it. Or just submit a patch to SF -- this seems almost too > small for a PEP to me. :-) Thanks, Greg ----------------------------------------------------------------------------------------------------------------- The information contained in this message is confidential and is intended for the addressee(s) only. If you have received this message in error or there are any problems please notify the originator immediately. The unauthorized use, disclosure, copying or alteration of this message is strictly forbidden. Baltimore Technologies plc will not be liable for direct, special, indirect or consequential damages arising from alteration of the contents of this message by a third party or as a result of any virus being passed on. In addition, certain Marketing collateral may be added from time to time to promote Baltimore Technologies products, services, Global e-Security or appearance at trade shows and conferences. This footnote confirms that this email message has been swept by Baltimore MIMEsweeper for Content Security threats, including computer viruses. From esr@snark.thyrsus.com Fri Jun 8 17:23:34 2001 From: esr@snark.thyrsus.com (Eric S. 
Raymond) Date: Fri, 8 Jun 2001 12:23:34 -0400 Subject: [Python-Dev] Glowing endorsement of open source and Python Message-ID: <200106081623.f58GNYf22712@snark.thyrsus.com> It doesn't get much better than this: http://it.mycareer.com.au/news/2001/06/05/FFX9ZT7UENC.html -- Eric S. Raymond In the absence of any evidence tending to show that possession or use of a 'shotgun having a barrel of less than eighteen inches in length' at this time has some reasonable relationship to the preservation or efficiency of a well regulated militia, we cannot say that the Second Amendment guarantees the right to keep and bear such an instrument. [...] The Militia comprised all males physically capable of acting in concert for the common defense. -- Majority Supreme Court opinion in "U.S. vs. Miller" (1939) From mal@lemburg.com Fri Jun 8 18:08:53 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Fri, 08 Jun 2001 19:08:53 +0200 Subject: [Python-Dev] Glowing endorsement of open source and Python References: <200106081623.f58GNYf22712@snark.thyrsus.com> Message-ID: <3B2106A5.FD16D95C@lemburg.com> "Eric S. Raymond" wrote: > > It doesn't get much better than this: > > http://it.mycareer.com.au/news/2001/06/05/FFX9ZT7UENC.html I wonder what those MS Office XP ads are doing on that page... -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From tim.one@home.com Fri Jun 8 18:21:10 2001 From: tim.one@home.com (Tim Peters) Date: Fri, 8 Jun 2001 13:21:10 -0400 Subject: [Python-Dev] re: %b format (no, really) In-Reply-To: <930BBCA4CEBBD411BE6500508BB3328F2E2179@nsamcanms1.ca.baltimore.com> Message-ID: [Guido] > Why bother with floats at all? [Greg Wilson] > For teaching purposes, which is what started me on this > in the first place --- I would like an easy way to show > people the bit patterns corresponding to basic types. I'm confused by this: while for integers the bits correspond very clearly to what's stored in the machine, if you separate the mantissa and exponent for floats the result won't "look like" the storage at all. Please give an example first, like what do you intend to produce for print "%b" % 0.1 print "%b" % -42e300 ? You have to make decisions about whether or not to unbias the exponent for display (if you don't, it's incomprehensible; if you do, it's not really what's stored); whether or not to materialize the implicit most-significant mantissa bit in 754 normalized values (pretty much ditto); and what to do about Infs, NaNs, signed zeroes and denormal numbers. The kicker is that, to be truly useful for teaching floats, you need a way to select among all combinations of "yes" and "no" for each such decision. A single fixed set of answers will confound more than clarify; e.g., it's important to know what the "true exponent" is, but also to know what biased exponents look like inside the box. This is too much for %b -- write a float-format module instead. From Greg.Wilson@baltimore.com Fri Jun 8 18:34:13 2001 From: Greg.Wilson@baltimore.com (Greg Wilson) Date: Fri, 8 Jun 2001 13:34:13 -0400 Subject: [Python-Dev] RE: %b format (no, really) Message-ID: <930BBCA4CEBBD411BE6500508BB3328F2E2188@nsamcanms1.ca.baltimore.com> > [Guido] > > Why bother with floats at all? > > [Greg Wilson] > > For teaching purposes > [Tim Peters] > if you separate the mantissa and exponent > for floats the result won't "look like" the storage at all. 
> Please give an example first This is part of what was going to go into the PEP, along with what to do about character data (I've had a couple of emails from people who'd like to be able to look at 8-bit and Unicode characters as bit patterns). > This is too much for %b -- write a float-format module instead. How about a quick patch to do "%b" for int and long-int, and a PEP for a generic "format" module --- arbitrary radix, options for IEEE numbers, etc.? Any objections? Greg ----------------------------------------------------------------------------------------------------------------- The information contained in this message is confidential and is intended for the addressee(s) only. If you have received this message in error or there are any problems please notify the originator immediately. The unauthorized use, disclosure, copying or alteration of this message is strictly forbidden. Baltimore Technologies plc will not be liable for direct, special, indirect or consequential damages arising from alteration of the contents of this message by a third party or as a result of any virus being passed on. In addition, certain Marketing collateral may be added from time to time to promote Baltimore Technologies products, services, Global e-Security or appearance at trade shows and conferences. This footnote confirms that this email message has been swept by Baltimore MIMEsweeper for Content Security threats, including computer viruses. From esr@thyrsus.com Fri Jun 8 18:44:40 2001 From: esr@thyrsus.com (Eric S. Raymond) Date: Fri, 8 Jun 2001 13:44:40 -0400 Subject: [Python-Dev] RE: %b format (no, really) In-Reply-To: <930BBCA4CEBBD411BE6500508BB3328F2E2188@nsamcanms1.ca.baltimore.com>; from Greg.Wilson@baltimore.com on Fri, Jun 08, 2001 at 01:34:13PM -0400 References: <930BBCA4CEBBD411BE6500508BB3328F2E2188@nsamcanms1.ca.baltimore.com> Message-ID: <20010608134440.A23160@thyrsus.com> Greg Wilson : > How about a quick patch to do "%b" for int and long-int, and a > PEP for a generic "format" module --- arbitrary radix, options > for IEEE numbers, etc.? Any objections? I like it. -- Eric S. Raymond The people cannot delegate to government the power to do anything which would be unlawful for them to do themselves. -- John Locke, "A Treatise Concerning Civil Government" From tim.one@home.com Fri Jun 8 18:51:50 2001 From: tim.one@home.com (Tim Peters) Date: Fri, 8 Jun 2001 13:51:50 -0400 Subject: [Python-Dev] RE: %b format (no, really) In-Reply-To: <930BBCA4CEBBD411BE6500508BB3328F2E2188@nsamcanms1.ca.baltimore.com> Message-ID: [Greg Wilson] > How about a quick patch to do "%b" for int and long-int, Don't know how quick it will be (it should cover type slots and bin() and __bin__ and 0b1101 notation too, right?), but +1 from me. That much is routinely requested. > and a PEP for a generic "format" module --- arbitrary radix, options > for IEEE numbers, etc.? Any objections? None here. From bckfnn@worldonline.dk Fri Jun 8 20:15:14 2001 From: bckfnn@worldonline.dk (Finn Bock) Date: Fri, 08 Jun 2001 19:15:14 GMT Subject: [Python-Dev] sys.modules["__main__"] in Jython In-Reply-To: <15136.58991.72069.433197@beluga.mojam.com> References: <15136.58991.72069.433197@beluga.mojam.com> Message-ID: <3b212431.21754982@smtp.worldonline.dk> [Skip] >Would someone with Jython experience check to see if it interprets >sys.modules["__main__"] in the same manner as Python? To me it seems like Jython defines sys.modules["__main__"] in the same way as CPython. 
>I'm wondering if this works for Jython as well as Python:
>
>    def _test():
>        import doctest, sys
>        return doctest.testmod(sys.modules["__main__"])
>
>    if __name__ == "__main__":
>        _test()

It works for Jython. regards, finn From thomas@xs4all.net Fri Jun 8 22:41:02 2001 From: thomas@xs4all.net (Thomas Wouters) Date: Fri, 8 Jun 2001 23:41:02 +0200 Subject: [Python-Dev] Glowing endorsement of open source and Python In-Reply-To: <200106081623.f58GNYf22712@snark.thyrsus.com>; from esr@snark.thyrsus.com on Fri, Jun 08, 2001 at 12:23:34PM -0400 References: <200106081623.f58GNYf22712@snark.thyrsus.com> Message-ID: <20010608234102.B690@xs4all.nl> On Fri, Jun 08, 2001 at 12:23:34PM -0400, Eric S. Raymond wrote:

> It doesn't get much better than this:
> http://it.mycareer.com.au/news/2001/06/05/FFX9ZT7UENC.html

It's a nice (and very flattering!) piece, but it's a tad buzzword heavy. "[Python] supports XML for e-commerce and mobile applications" ? Well, shit, so *that*'s what XML is for :) -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From tim.one@home.com Fri Jun 8 23:02:06 2001 From: tim.one@home.com (Tim Peters) Date: Fri, 8 Jun 2001 18:02:06 -0400 Subject: [Python-Dev] sys.modules["__main__"] in Jython In-Reply-To: <3b212431.21754982@smtp.worldonline.dk> Message-ID: [Finn Bock]

> To me it seems like Jython defines sys.modules["__main__"] in the same
> way as CPython.

Thank you, Finn! doctest has always avoided introspection tricks for which Jython doesn't work "exactly the same way" as CPython. However, in the past it achieved this by not paying any attention <wink>, then ripping out bad ideas when a Jython user reported failure. But now that it's in the std library, I want to proceed more carefully. Skip's idea is much more attractive now that you've confirmed it will work there too. From tim.one@home.com Sun Jun 10 02:10:53 2001 From: tim.one@home.com (Tim Peters) Date: Sat, 9 Jun 2001 21:10:53 -0400 Subject: [Python-Dev] Struct schizophrenia Message-ID: I'm adding "long long" integral types to struct (in native mode, "long long" or __int64 on platforms that have them; in standard mode, 64 bits). This is proving harder than it should be, because the code that's already there is schizophrenic across boundaries, so is failing as a base to build on (raises more questions than it answers). Like:

>>> x = 256
>>> struct.pack("b", x)  # complains about magnitude in native mode
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
struct.error: byte format requires -128<=number<=127
>>> struct.pack("=b", x) # but doesn't with native order + std align
'\x00'
>>> struct.pack("<b", x)
'\x00'
>>> struct.pack("<h", x)
'\x00\x01'
>>> struct.pack("<b", 2**64)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
OverflowError: long int too large to convert
>>>

Much the same is true of other small int sizes: you can't predict what will happen without trying it; and once you get to ints, no range-checking is performed even in native mode. Surely this can't stand, but what do people *want*? My preference is to raise the same "byte format requires -128<=number<=127" exception in all these cases; OTOH, the code structure fights that, working with Python longs is clumsy in C, and there are other "undocumented features" here that may or may not be accidents:

>>> struct.pack("B", 234.3)
'\xea'
>>>

That is, did we *intend* to accept floats packed via integer typecodes? Feature or bug?
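A small probe script makes the inconsistencies above easy to tabulate on any given build (an illustration, not part of the original mail; exact messages and results vary by Python version and platform):

    import struct

    # Try one out-of-range value against several codes and mode prefixes.
    for fmt in ("b", "=b", "<b", ">b", "B", "h", "i"):
        try:
            print fmt, repr(struct.pack(fmt, 256))
        except struct.error, why:
            print fmt, "struct.error:", why
        except OverflowError, why:
            print fmt, "OverflowError:", why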
In the other (unpack) direction, the docs say for 'I' (unsigned int):

    The "I" conversion code will convert to a Python long if the C int
    is the same size as a C long, which is typical on most modern
    systems. If a C int is smaller than a C long, a Python integer
    will be created instead.

That's in a footnote. In another part, they say:

    For the "I" and "L" format characters, the return value is a
    Python long integer.

The footnote is wrong -- but is the footnote what was intended (somebody went to a fair bit of work to write all the stuff <wink>)? From tim.one@home.com Sun Jun 10 05:25:51 2001 From: tim.one@home.com (Tim Peters) Date: Sun, 10 Jun 2001 00:25:51 -0400 Subject: [Python-Dev] FW: printing Python stack info from gdb Message-ID: Fwd'ing this Skip gem from c.l.py, primarily so I can find it again next time I'm thrashing on a Unix box. gdb clearly needs to adopt Python as its extension language. but-then-what-doesn't-ly y'rs - tim -----Original Message----- From: python-list-admin@python.org [mailto:python-list-admin@python.org]On Behalf Of Skip Montanaro Sent: Saturday, June 09, 2001 12:31 AM To: python-list@python.org Subject: printing Python stack info from gdb

From time to time I've wanted to be able to print the Python stack from gdb. Today I broke down and spent some time actually implementing something.

    set $__trimpath = 1
    define ppystack
      set $__fr = 0
      select-frame $__fr
      while !($pc > Py_Main && $pc < Py_GetArgcArgv)
        if $pc > eval_code2 && $pc < set_exc_info
          set $__fn = PyString_AsString(co->co_filename)
          set $__n = PyString_AsString(co->co_name)
          if $__n[0] == '?'
            set $__n = ""
          end
          if $__trimpath
            set $__f = strrchr($__fn, '/')
            if $__f
              set $__fn = $__f + 1
            end
          end
          printf "%s (%d): %s\n", $__fn, f->f_lineno, $__n
        end
        set $__fr = $__fr + 1
        select-frame $__fr
      end
      select-frame 0
    end

Output looks like this (and dribbles out *quite slowly*):

    Text_Editor.py (147): apply_tag
    Text_Editor.py (152): apply_tag_by_name
    Script_GUI.py (302): push_help
    Script_GUI.py (113): put_help
    Script_GUI.py (119): focus_enter
    Signal.py (34): handle_signal
    Script_GUI.py (324): main
    Script_GUI.py (338):

If you don't want to trim the paths from the filenames, set $__trimpath to 0. Warning: I've only tried this with a very recent CVS version of Python on a PIII-based Linux system with an interpreter compiled using gcc. I rely on the ordering of functions within the while loop to detect when to exit the loop and when the frame I'm examining is an eval_code2 frame. I'm sure there are plenty of people out there with more gdb experience than me. I welcome any feedback on ways to improve this little bit of code. -- Skip Montanaro (skip@pobox.com) (847)971-7098 -- http://mail.python.org/mailman/listinfo/python-list From tim.one@home.com Sun Jun 10 20:36:50 2001 From: tim.one@home.com (Tim Peters) Date: Sun, 10 Jun 2001 15:36:50 -0400 Subject: [Python-Dev] FW: list-display semantics? Message-ID: I opened a bug on this. If anyone's keen to play with the grammar, have at it! Everyone at PythonLabs would +1 it. -----Original Message----- From: python-list-admin@python.org [mailto:python-list-admin@python.org]On Behalf Of jainweiwu Sent: Sunday, June 10, 2001 2:30 PM To: python-list@python.org Subject: list-display semantics? Hi all: I tried this one-line command in interactive mode:

    [x for x in [1, 2, 3], y for y in [4, 5, 6]]

and the result surprised me, that is:

    [[1,2,3],[1,2,3],[1,2,3],9,9,9]

Who can explain the behavior? Since I expected the result to be:

    [[1,4],[1,5],[1,6],[2,4],...]

-- Pary All Rough Yet.
parywu@seed.net.tw -- http://mail.python.org/mailman/listinfo/python-list From dan@cgsoftware.com Sun Jun 10 21:30:24 2001 From: dan@cgsoftware.com (Daniel Berlin) Date: 10 Jun 2001 16:30:24 -0400 Subject: [Python-Dev] FW: printing Python stack info from gdb In-Reply-To: ("Tim Peters"'s message of "Sun, 10 Jun 2001 00:25:51 -0400") References: Message-ID: <87n17grsbj.fsf@cgsoftware.com> "Tim Peters" writes: > Fwd'ing this Skip gem from c.l.py, primarily so I can find it again next > time I'm thrashing on a Unix box. gdb clearly needs to adopt Python as its > extension language. HP has patches to do this, actually. Works quite nicely. And trust me, i've tried to get them to do it more than once. As I pointed out to skip, if he can profile gdb and tell me where the slowness is, it's likely I can make it a ton faster. GDB could use major optimizations almost everywhere. And i've done quite a lot of them, they just haven't been reviewed/integrated yet. --Dan C++ support maintainer - GDB DWARF2 reader person - GDB Symbol table patch submitting weirdo - GDB etc > > but-then-what-doesn't-ly y'rs - tim > > -----Original Message----- > From: python-list-admin@python.org > [mailto:python-list-admin@python.org]On Behalf Of Skip Montanaro > Sent: Saturday, June 09, 2001 12:31 AM > To: python-list@python.org > Subject: printing Python stack info from gdb > > >>From time to time I've wanted to be able to print the Python stack from gdb. > Today I broke down and spent some time actually implementing something. > > set $__trimpath = 1 > define ppystack > set $__fr = 0 > select-frame $__fr > while !($pc > Py_Main && $pc < Py_GetArgcArgv) > if $pc > eval_code2 && $pc < set_exc_info > set $__fn = PyString_AsString(co->co_filename) > set $__n = PyString_AsString(co->co_name) > if $__n[0] == '?' > set $__n = "" > end > if $__trimpath > set $__f = strrchr($__fn, '/') > if $__f > set $__fn = $__f + 1 > end > end > printf "%s (%d): %s\n", $__fn, f->f_lineno, $__n > end > set $__fr = $__fr + 1 > select-frame $__fr > end > select-frame 0 > end > > Output looks like this (and dribbles out *quite slowly*): > > Text_Editor.py (147): apply_tag > Text_Editor.py (152): apply_tag_by_name > Script_GUI.py (302): push_help > Script_GUI.py (113): put_help > Script_GUI.py (119): focus_enter > Signal.py (34): handle_signal > Script_GUI.py (324): main > Script_GUI.py (338): > > If you don't want to trim the paths from the filenames, set $__trimpath to > 0. > > Warning: I've only tried this with a very recent CVS version of Python on a > PIII-based Linux system with an interpreter compiled using gcc. I rely on > the ordering of functions within the while loop to detect when to exit the > loop and when the frame I'm examining is an eval_code2 frame. I'm sure > there are plenty of people out there with more gdb experience than me. I > welcome any feedback on ways to improve this little bit of code. > > -- > Skip Montanaro (skip@pobox.com) > (847)971-7098 > > -- > http://mail.python.org/mailman/listinfo/python-list > > > _______________________________________________ > Python-Dev mailing list > Python-Dev@python.org > http://mail.python.org/mailman/listinfo/python-dev -- "I saw a man with a wooden leg, and a real foot. "-Steven Wright From greg@cosc.canterbury.ac.nz Mon Jun 11 03:44:54 2001 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Mon, 11 Jun 2001 14:44:54 +1200 (NZST) Subject: [Python-Dev] FW: list-display semantics? 
In-Reply-To: Message-ID: <200106110244.OAA03090@s454.cosc.canterbury.ac.nz> parywu@seed.net.tw: > [x for x in [1, 2, 3], y for y in [4, 5, 6]] > and the result surprised me, that is: > [[1,2,3],[1,2,3],[1,2,3],9,9,9] Did you by any chance execute that in an environment where y was previously bound to 9? It will be parsed as [x for x in ([1, 2, 3], y) for y in [4, 5, 6]] which should give a NameError if y is previously unbound, since it will try to evaluate ([1, 2, 3], y) before y is bound by the inner loop. But executing y = 9 beforehand will give the results you got. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From gstein@lyra.org Mon Jun 11 12:31:59 2001 From: gstein@lyra.org (Greg Stein) Date: Mon, 11 Jun 2001 04:31:59 -0700 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Objects object.c,2.124.4.11,2.124.4.12 In-Reply-To: ; from gvanrossum@users.sourceforge.net on Wed, Jun 06, 2001 at 07:34:15AM -0700 References: Message-ID: <20010611043158.E26210@lyra.org> On Wed, Jun 06, 2001 at 07:34:15AM -0700, Guido van Rossum wrote: > Update of /cvsroot/python/python/dist/src/Objects > In directory usw-pr-cvs1:/tmp/cvs-serv17474 > > Modified Files: > Tag: descr-branch > object.c > Log Message: > Add _PyObject_GetDictPtr() -- an internal API to get a pointer to > where __dict__ is stored in an object. The simplest case is to add > tp_dictoffset to the start of the object, but there are comlications: > tp_flags may tell us that tp_dictoffset is not defined, or the offset > may be negative: indexing from the end of the object, where > tp_itemsize may have to be taken into account. Why would you ever have a negative size in there? That seems like an unnecessary "feature". The offsets are easily set up by the compiler as positive values. (not even sure how you'd come up with a proper/valid negative value) Cheers, -g > > > Index: object.c > =================================================================== > RCS file: /cvsroot/python/python/dist/src/Objects/object.c,v > retrieving revision 2.124.4.11 > retrieving revision 2.124.4.12 > diff -C2 -r2.124.4.11 -r2.124.4.12 > *** object.c 2001/06/06 14:27:54 2.124.4.11 > --- object.c 2001/06/06 14:34:13 2.124.4.12 > *************** > *** 1074,1077 **** > --- 1074,1111 ---- > } > > + /* Helper to get a pointer to an object's __dict__ slot, if any */ > + > + PyObject ** > + _PyObject_GetDictPtr(PyObject *obj) > + { > + #define PTRSIZE (sizeof(PyObject *)) > + > + long dictoffset; > + PyTypeObject *tp = obj->ob_type; > + > + if (!(tp->tp_flags & Py_TPFLAGS_HAVE_CLASS)) > + return NULL; > + dictoffset = tp->tp_dictoffset; > + if (dictoffset == 0) > + return NULL; > + if (dictoffset < 0) { > + dictoffset += tp->tp_basicsize; > + assert(dictoffset > 0); /* Sanity check */ > + if (tp->tp_itemsize > 0) { > + int n = ((PyVarObject *)obj)->ob_size; > + if (n > 0) { > + dictoffset += tp->tp_itemsize * n; > + /* Round up, if necessary */ > + if (tp->tp_itemsize % PTRSIZE != 0) { > + dictoffset += PTRSIZE - 1; > + dictoffset /= PTRSIZE; > + dictoffset *= PTRSIZE; > + } > + } > + } > + } > + return (PyObject **) ((char *)obj + dictoffset); > + } > + > /* Generic GetAttr functions - put these in your tp_[gs]etattro slot */ > > *************** > *** 1082,1086 **** > PyObject *descr; > descrgetfunc f; > ! 
int dictoffset; > > if (tp->tp_dict == NULL) { > --- 1116,1120 ---- > PyObject *descr; > descrgetfunc f; > ! PyObject **dictptr; > > if (tp->tp_dict == NULL) { > *************** > *** 1097,1103 **** > } > > ! dictoffset = tp->tp_dictoffset; > ! if (dictoffset != 0) { > ! PyObject *dict = * (PyObject **) ((char *)obj + dictoffset); > if (dict != NULL) { > PyObject *res = PyDict_GetItem(dict, name); > --- 1131,1137 ---- > } > > ! dictptr = _PyObject_GetDictPtr(obj); > ! if (dictptr != NULL) { > ! PyObject *dict = *dictptr; > if (dict != NULL) { > PyObject *res = PyDict_GetItem(dict, name); > *************** > *** 1129,1133 **** > PyObject *descr; > descrsetfunc f; > ! int dictoffset; > > if (tp->tp_dict == NULL) { > --- 1163,1167 ---- > PyObject *descr; > descrsetfunc f; > ! PyObject **dictptr; > > if (tp->tp_dict == NULL) { > *************** > *** 1143,1149 **** > } > > ! dictoffset = tp->tp_dictoffset; > ! if (dictoffset != 0) { > ! PyObject **dictptr = (PyObject **) ((char *)obj + dictoffset); > PyObject *dict = *dictptr; > if (dict == NULL && value != NULL) { > --- 1177,1182 ---- > } > > ! dictptr = _PyObject_GetDictPtr(obj); > ! if (dictptr != NULL) { > PyObject *dict = *dictptr; > if (dict == NULL && value != NULL) { > > > _______________________________________________ > Python-checkins mailing list > Python-checkins@python.org > http://mail.python.org/mailman/listinfo/python-checkins -- Greg Stein, http://www.lyra.org/ From guido@digicool.com Mon Jun 11 13:57:18 2001 From: guido@digicool.com (Guido van Rossum) Date: Mon, 11 Jun 2001 08:57:18 -0400 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Objects object.c,2.124.4.11,2.124.4.12 In-Reply-To: Your message of "Mon, 11 Jun 2001 04:31:59 PDT." <20010611043158.E26210@lyra.org> References: <20010611043158.E26210@lyra.org> Message-ID: <200106111257.IAA03505@cj20424-a.reston1.va.home.com> > > Modified Files: > > Tag: descr-branch > > object.c > > Log Message: > > Add _PyObject_GetDictPtr() -- an internal API to get a pointer to > > where __dict__ is stored in an object. The simplest case is to add > > tp_dictoffset to the start of the object, but there are comlications: > > tp_flags may tell us that tp_dictoffset is not defined, or the offset > > may be negative: indexing from the end of the object, where > > tp_itemsize may have to be taken into account. > > Why would you ever have a negative size in there? That seems like an > unnecessary "feature". The offsets are easily set up by the compiler as > positive values. (not even sure how you'd come up with a proper/valid > negative value) When extending a type like tuple or string, the __dict__ has to be added to the end, after the last item, because we can't change the starting offset of the first item. This is not at a fixed offset from the start of the structure. --Guido van Rossum (home page: http://www.python.org/~guido/) From mal@lemburg.com Mon Jun 11 17:50:11 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Mon, 11 Jun 2001 18:50:11 +0200 Subject: [Python-Dev] Adding .decode() method to Unicode Message-ID: <3B24F6C3.C911C0BF@lemburg.com> I would like to add a .decode() method to Unicode objects and also enable the builtin unicode() to accept Unicode object as input. The .decode() method will work just like the .encode() method except that it interfaces to the decode API of the codec in question. While this may seem useless for the currently available encodings, it does have some use for codecs which recode Unicode to Unicode, e.g. 
codecs which do XML escaping or Unicode compression. Any objections ? -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From mal@lemburg.com Mon Jun 11 17:57:12 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Mon, 11 Jun 2001 18:57:12 +0200 Subject: [Python-Dev] Adding Asian codecs to the core Message-ID: <3B24F868.A3DFA649@lemburg.com> Tamito KAJIYAMA recently announced that he changed the licenses on his Japanese codecs from GPL to a BSD variant. This is great news since this would allow adding the codecs to the Python core which would certainly attract more users to Python in Asia. The codecs are available at: http://pseudo.grad.sccs.chukyo-u.ac.jp/~kajiyama/python/ The codecs are 280kB when compressed as .tar.gz file. Thoughts ? -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From aahz@rahul.net Mon Jun 11 18:42:30 2001 From: aahz@rahul.net (Aahz Maruch) Date: Mon, 11 Jun 2001 10:42:30 -0700 (PDT) Subject: [Python-Dev] Adding Asian codecs to the core In-Reply-To: <3B24F868.A3DFA649@lemburg.com> from "M.-A. Lemburg" at Jun 11, 2001 06:57:12 PM Message-ID: <20010611174230.0625E99C8D@waltz.rahul.net> M.-A. Lemburg wrote: > > Tamito KAJIYAMA recently announced that he changed the licenses > on his Japanese codecs from GPL to a BSD variant. This is great > news since this would allow adding the codecs to the Python core > which would certainly attract more users to Python in Asia. > > The codecs are 280kB when compressed as .tar.gz file. +0 I like the idea, am uncomfortable with that amount of space. -- --- Aahz (@pobox.com) Hugs and backrubs -- I break Rule 6 <*> http://www.rahul.net/aahz/ Androgynous poly kinky vanilla queer het Pythonista I don't really mind a person having the last whine, but I do mind someone else having the last self-righteous whine. From fdrake@cj42289-a.reston1.va.home.com Mon Jun 11 20:15:06 2001 From: fdrake@cj42289-a.reston1.va.home.com (Fred Drake) Date: Mon, 11 Jun 2001 15:15:06 -0400 (EDT) Subject: [Python-Dev] [development doc updates] Message-ID: <20010611191506.DC49F28923@cj42289-a.reston1.va.home.com> The development version of the documentation has been updated: http://python.sourceforge.net/devel-docs/ Substantial additional material on floating point arithmetic in the tutorial, written by Tim Peters to explain why FP can fail to reflect the decimal world presented to the user. Lots of additional updates and corrections. From guido@digicool.com Mon Jun 11 21:07:40 2001 From: guido@digicool.com (Guido van Rossum) Date: Mon, 11 Jun 2001 16:07:40 -0400 Subject: [Python-Dev] PEP 259: Omit printing newline after newline Message-ID: <200106112007.f5BK7eW22506@odiug.digicool.com> Please comment on the following. This came up a while ago in python-dev and I decided to follow through. I'm making this a PEP because of the risk of breaking code (which everybody on Python-dev seemed to think was acceptable). 
--Guido van Rossum (home page: http://www.python.org/~guido/)

PEP: 259
Title: Omit printing newline after newline
Version: $Revision: 1.1 $
Author: guido@python.org (Guido van Rossum)
Status: Draft
Type: Standards Track
Python-Version: 2.2
Created: 11-Jun-2001
Post-History: 11-Jun-2001


Abstract

    Currently, the print statement always appends a newline, unless a
    trailing comma is used.  This means that if we want to print data
    that already ends in a newline, we get two newlines, unless
    special precautions are taken.

    I propose to skip printing the newline when it follows a newline
    that came from data.

    In order to avoid having to add yet another magic variable to
    file objects, I propose to give the existing 'softspace' variable
    an extra meaning: a negative value will mean "the last data
    written ended in a newline so no space *or* newline is required."


Problem

    When printing data that resembles the lines read from a file
    using a simple loop, double-spacing occurs unless special care is
    taken:

        >>> for line in open("/etc/passwd").readlines():
        ...     print line
        ...
        root:x:0:0:root:/root:/bin/bash

        bin:x:1:1:bin:/bin:

        daemon:x:2:2:daemon:/sbin:

        (etc.)

        >>>

    While there are easy work-arounds, this is often noticed only
    during testing and requires an extra edit-test roundtrip; the
    fixed code is uglier and harder to maintain.


Proposed Solution

    In the PRINT_ITEM opcode in ceval.c, when a string object is
    printed, a check is already made that looks at the last character
    of that string.  Currently, if that last character is a
    whitespace character other than space, the softspace flag is
    reset to zero; this suppresses the space between two items if the
    first item is a string ending in newline, tab, etc. (but not when
    it ends in a space).  Otherwise the softspace flag is set to one.

    The proposal changes this test slightly so that softspace is set
    to:

    -1 -- if the last object written is a string ending in a newline

     0 -- if the last object written is a string ending in a
          whitespace character that's neither space nor newline

     1 -- in all other cases (including the case when the last object
          written is an empty string or not a string)

    Then, in the PRINT_NEWLINE opcode, printing of the newline is
    suppressed if the value of softspace is negative; in any case the
    softspace flag is reset to zero.


Scope

    This only affects printing of 8-bit strings.  It doesn't affect
    Unicode, although that could be considered a bug in the Unicode
    implementation.  It doesn't affect other objects whose string
    representation happens to end in a newline character.


Risks

    This change breaks some existing code.  For example:

        print "Subject: PEP 259\n"
        print message_body

    In current Python, this produces a blank line separating the
    subject from the message body; with the proposed change, the body
    begins immediately below the subject.  This is not very robust
    code anyway; it is better written as

        print "Subject: PEP 259"
        print
        print message_body

    In the test suite, only test_StringIO (which explicitly tests for
    this feature) breaks.


Implementation

    A patch relative to current CVS is here:

        http://sourceforge.net/tracker/index.php?func=detail&aid=432183&group_id=5470&atid=305470


Copyright

    This document has been placed in the public domain.
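To see the proposed semantics in one place, here is a rough Python model
of the PRINT_ITEM/PRINT_NEWLINE behavior described above (an illustrative
sketch only: the real change lives in ceval.c, and the real opcodes
inspect only genuine string objects, where this model string-converts
everything):

    def print_item(f, item):
        # A positive softspace means the previous item asked for a
        # separating space.
        if getattr(f, "softspace", 0) > 0:
            f.write(" ")
        s = str(item)
        f.write(s)
        if s and s[-1] == "\n":
            f.softspace = -1   # suppress the next space *and* the newline
        elif s and s[-1] in "\t\v\f\r":
            f.softspace = 0    # suppress the next space, keep the newline
        else:
            f.softspace = 1    # all other cases: space and newline wanted

    def print_newline(f):
        # The newline is suppressed only when softspace went negative.
        if getattr(f, "softspace", 0) >= 0:
            f.write("\n")
        f.softspace = 0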
Local Variables: mode: indented-text indent-tabs-mode: nil End: From BPettersen@NAREX.com Mon Jun 11 21:20:38 2001 From: BPettersen@NAREX.com (Bjorn Pettersen) Date: Mon, 11 Jun 2001 14:20:38 -0600 Subject: [Python-Dev] RE: PEP 259: Omit printing newline after newline Message-ID: <6957F6A694B49A4096F7CFD0D900042F27D452@admin56.narex.com> > From: Guido van Rossum [mailto:guido@digicool.com] > > Subject: PEP 259: Omit printing newline after newline This would probably break most of the cgi scripts I did at my last job without giving any useful error message. But then again... why should I care ? -- bjorn From skip@pobox.com (Skip Montanaro) Mon Jun 11 21:20:33 2001 From: skip@pobox.com (Skip Montanaro) (Skip Montanaro) Date: Mon, 11 Jun 2001 15:20:33 -0500 Subject: [Python-Dev] Feedback on new floating point info in tutorial In-Reply-To: <20010611191506.DC49F28923@cj42289-a.reston1.va.home.com> References: <20010611191506.DC49F28923@cj42289-a.reston1.va.home.com> Message-ID: <15141.10257.487549.196538@beluga.mojam.com> Fred> Substantial additional material on floating point arithmetic in Fred> the tutorial, written by Tim Peters to explain why FP can fail to Fred> reflect the decimal world presented to the user. I took a quick look at that appendix. One thing that confused me a bit was that if 0.1 is approximated by something ever-so-slightly larger than 0.1, how is it that if you add ten of them together you wind up with a result that is ever-so-slightly less than 1.0? I didn't expect it to be exactly 1.0. Other floating point naifs may be confused in the same way: >>> "%.55f" % 0.5 '0.5000000000000000000000000000000000000000000000000000000' >>> "%.55f" % 0.1 '0.1000000000000000055511151231257827021181583404541015625' >>> "%.55f" % (0.5+0.1) '0.5999999999999999777955395074968691915273666381835937500' I guess the explanation is that not only can't most decimals be represented exactly, but that summing the same approximation multiple times doesn't always skew the error in the same direction either: >>> "%.55f" % (0.1+0.1+0.1+0.1+0.1+0.1+0.1+0.1) '0.7999999999999999333866185224906075745820999145507812500' >>> "%.55f" % (0.8) '0.8000000000000000444089209850062616169452667236328125000' IEEE-754-is-full-of-traps-for-the-unwary-ly y'rs, Skip From mal@lemburg.com Mon Jun 11 21:55:13 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Mon, 11 Jun 2001 22:55:13 +0200 Subject: [Python-Dev] PEP 259: Omit printing newline after newline References: <200106112007.f5BK7eW22506@odiug.digicool.com> Message-ID: <3B253031.AB1954CB@lemburg.com> Guido van Rossum wrote: > > Please comment on the following. This came up a while ago in > python-dev and I decided to follow through. I'm making this a PEP > because of the risk of breaking code (which everybody on Python-dev > seemed to think was acceptable). > > --Guido van Rossum (home page: http://www.python.org/~guido/) > > PEP: 259 > Title: Omit printing newline after newline > ... > Scope > > This only affects printing of 8-bit strings. It doesn't affect > Unicode, although that could be considered a bug in the Unicode > implementation. It doesn't affect other objects whose string > representation happens to end in a newline character. I guess I should fix the Unicode stuff ;-) > Risks > > This change breaks some existing code. For example: > > print "Subject: PEP 259\n" > print message_body > > In current Python, this produces a blank line separating the > subject from the message body; with the proposed change, the body > begins immediately below the subject. 
This is not very robust > code anyway; it is better written as > > print "Subject: PEP 259" > print > print message_body > > In the test suite, only test_StringIO (which explicitly tests for > this feature) breaks. Hmm, I think the above is a very typical idiom for RFC822 style content and used in CGI scripts a lot. I'm not sure whether this change is worth getting the CGI crowd upset... Wouldn't it make sense to only use this technique in inter- active mode ? -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From martin@loewis.home.cs.tu-berlin.de Mon Jun 11 23:00:54 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Tue, 12 Jun 2001 00:00:54 +0200 Subject: [Python-Dev] Adding .decode() method to Unicode Message-ID: <200106112200.f5BM0sO01692@mira.informatik.hu-berlin.de> > I would like to add a .decode() method to Unicode objects and also > enable the builtin unicode() to accept Unicode object as input. -1. What is this good for? > While this may seem useless for the currently available encodings, > it does have some use for codecs which recode Unicode to Unicode, > e.g. codecs which do XML escaping or Unicode compression. I still can see the value. If you think the codec API is good for such transformation, why not use it? I.e. enc,dec,_,_ = codecs.lookup("compress-form-foo") s = dec(s) Furthermore, this seems like a form of hypergeneralization. If you have this, why not also add s = s.decode("capitalize") # instead of s.capitalize() i = s.decode("int") # instead of int(s) > Any objections ? Yes, I think this should not be added. Regards, Martin From paulp@ActiveState.com Tue Jun 12 00:38:55 2001 From: paulp@ActiveState.com (Paul Prescod) Date: Mon, 11 Jun 2001 16:38:55 -0700 Subject: [Python-Dev] Adding .decode() method to Unicode References: <200106112200.f5BM0sO01692@mira.informatik.hu-berlin.de> Message-ID: <3B25568F.B766E00D@ActiveState.com> "Martin v. Loewis" wrote: > >... > > I still can see the value. If you think the codec API is good for such > transformation, why not use it? I.e. > > enc,dec,_,_ = codecs.lookup("compress-form-foo") > s = dec(s) IMO, there is a huge usability difference between the above and mystr.decode("base64"). I think that we've done a good job of providing better ways to get at codecs than the codecs.lookup function. I don't see how this is any different. -- Take a recipe. Leave a recipe. Python Cookbook! http://www.ActiveState.com/pythoncookbook From greg@cosc.canterbury.ac.nz Tue Jun 12 00:51:55 2001 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Tue, 12 Jun 2001 11:51:55 +1200 (NZST) Subject: [Python-Dev] Feedback on new floating point info in tutorial In-Reply-To: <15141.10257.487549.196538@beluga.mojam.com> Message-ID: <200106112351.LAA03197@s454.cosc.canterbury.ac.nz> Skip Montanaro : > One thing that confused me a bit was > that if 0.1 is approximated by something ever-so-slightly larger than 0.1, > how is it that if you add ten of them together you wind up with a result > that is ever-so-slightly less than 1.0? I think what's happening is that the exact binary result of adding 0.1_plus_a_little to itself has one more bit than there is room for, so it gets shifted right and one bit falls off the end. The amount you lose when that happens a few times ends up outweighing the extra that you would expect. 
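A short session makes the effect visible (the exact digits are platform
dependent, as Tim points out elsewhere in this thread, but on a 754 box
the repeated sum comes out a hair short while the multiply rounds just
once):

    s = 0.0
    for i in range(10):
        s = s + 0.1             # ten additions, ten chances to round
    print "%.20f" % s           # 0.99999999999999988898 here
    print "%.20f" % (0.1 * 10)  # 1.00000000000000000000
    print s == 1.0              # prints 0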
Whether it's worth trying to explain *that* in the tutorial I don't know! Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From greg@cosc.canterbury.ac.nz Tue Jun 12 01:00:33 2001 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Tue, 12 Jun 2001 12:00:33 +1200 (NZST) Subject: [Python-Dev] PEP 259: Omit printing newline after newline In-Reply-To: <200106112007.f5BK7eW22506@odiug.digicool.com> Message-ID: <200106120000.MAA03201@s454.cosc.canterbury.ac.nz> Guido: > I propose to skip printing the newline when it follows a newline > that came from data. -1 There's too much magic in the way print handles spaces and newlines already. Making it even more magical and inconsistent seems like exactly the wrong direction to be going in. If there are to be any changes to the way print works, I would prefer to see one that removes the need for the softspace flag altogether. The behaviour of a given print should not depend on state left behind by some previous one. Neither should it depend on whether the characters being printed come directly from a string or not. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From tim.one@home.com Tue Jun 12 03:17:24 2001 From: tim.one@home.com (Tim Peters) Date: Mon, 11 Jun 2001 22:17:24 -0400 Subject: [Python-Dev] Feedback on new floating point info in tutorial In-Reply-To: <15141.10257.487549.196538@beluga.mojam.com> Message-ID: [Skip Montanaro, on the in-progess 2.2 Tutorial appendix] > I took a quick look at that appendix. One thing that confused me > a bit was that if 0.1 is approximated by something ever-so-slightly > larger than 0.1, how is it that if you add ten of them together you > wind up with a result that is ever-so-slightly less than 1.0? Good for you, Skip! In all the years I've been explaining this stuff, I only recall one other picking up on that immediately. I'm not writing a book here, though , and any intro numeric programming text emphasizes that n*x is a better bet than adding x together n times. >>> .1 * 10 1.0 >>> Greg Ewing put you on the right track, if you want to figure it out yourself (as Deep Throat said, "follow the bits, Skip -- follow the bits"). > I didn't expect it to be exactly 1.0. Other floating point naifs > may be confused in the same way: > > >>> "%.55f" % 0.5 > '0.5000000000000000000000000000000000000000000000000000000' > >>> "%.55f" % 0.1 > '0.1000000000000000055511151231257827021181583404541015625' > >>> "%.55f" % (0.5+0.1) > '0.5999999999999999777955395074968691915273666381835937500' Note that this output is platform-dependent. For example, the last on Windows is >>> "%.55f" % (0.5+0.1) '0.5999999999999999800000000000000000000000000000000000000' > ... > IEEE-754-is-full-of-traps-for-the-unwary-ly y'rs, All computer arithmetic is; and among binary fp systems, 754 has got to be the best-behaved there is. Know how many irksome bugs I've fixed in Python mucking with different sizes of integers across platforms, and what C does and doesn't guarantee about them? About 20x more than fp bugs. Of course there's 10000x as much integer code in Python too . 
god-created-the-integers-from-1-through-3-inclusive-and-that's-it-ly y'rs - tim From barry@digicool.com Tue Jun 12 04:00:52 2001 From: barry@digicool.com (Barry A. Warsaw) Date: Mon, 11 Jun 2001 23:00:52 -0400 Subject: [Python-Dev] PEP 259: Omit printing newline after newline References: <200106112007.f5BK7eW22506@odiug.digicool.com> <200106120000.MAA03201@s454.cosc.canterbury.ac.nz> Message-ID: <15141.34276.191510.708654@anthem.wooz.org> >>>>> "GE" == Greg Ewing writes: GE> There's too much magic in the way print handles spaces and GE> newlines already. Making it even more magical and inconsistent GE> seems like exactly the wrong direction to be going in. I tend to agree. I'm sometimes bitten by the double newlines, but as I think Andrew brought up in c.l.py, I'd rather see a way to tell readlines() to strip the newlines than to add more magic to print. print-has-all-the-magic-it-needs-now-<>-ly y'rs, -Barry From fredrik@pythonware.com Tue Jun 12 07:21:55 2001 From: fredrik@pythonware.com (Fredrik Lundh) Date: Tue, 12 Jun 2001 08:21:55 +0200 Subject: [Python-Dev] PEP 259: Omit printing newline after newline References: <200106112007.f5BK7eW22506@odiug.digicool.com> Message-ID: <004c01c0f307$fb0540c0$4ffa42d5@hagrid> guido wrote: > Please comment on the following. This came up a while ago in > python-dev and I decided to follow through. I'm making this a PEP > because of the risk of breaking code (which everybody on Python-dev > seemed to think was acceptable). when was this discussed on python-dev? From mal@lemburg.com Tue Jun 12 08:09:05 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Tue, 12 Jun 2001 09:09:05 +0200 Subject: [Python-Dev] Adding .decode() method to Unicode References: <200106112200.f5BM0sO01692@mira.informatik.hu-berlin.de> Message-ID: <3B25C011.125B6462@lemburg.com> "Martin v. Loewis" wrote: > > > I would like to add a .decode() method to Unicode objects and also > > enable the builtin unicode() to accept Unicode object as input. > > -1. What is this good for? See below :) > > While this may seem useless for the currently available encodings, > > it does have some use for codecs which recode Unicode to Unicode, > > e.g. codecs which do XML escaping or Unicode compression. > > I still can see the value. If you think the codec API is good for such > transformation, why not use it? I.e. > > enc,dec,_,_ = codecs.lookup("compress-form-foo") > s = dec(s) Sure and that's the point. I would like to add the .decode() method to make this just as simple as encoding Unicode to UTF-8. Note that strings already have this method: str.encode() str.decode() uni.encode() #uni.decode() # still missing > Furthermore, this seems like a form of hypergeneralization. If you > have this, why not also add > > s = s.decode("capitalize") # instead of s.capitalize() > i = s.decode("int") # instead of int(s) No, that's not the intention. One very useful application for this method is XML unescaping which turns numeric XML entities into Unicode chars. Others are Unicode decompression (using the Unicode compression algorithm) and certain forms of Unicode normalization. The key argument for these interfaces is that they provide an extensible transformation mechanism for string and binary data. > > Any objections ? > > Yes, I think this should not be added. 
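For concreteness, the sort of transformation chaining meant here looks
like this (a sketch which assumes the string-to-string codecs from the
recent checkins, "zlib" and "base64", are registered under those names):

    data = "some fairly repetitive text " * 4

    # Each step is just another registered codec.
    packed = data.encode("zlib").encode("base64")
    print packed

    # Decoding peels the layers off again, symmetrically.
    print repr(packed.decode("base64").decode("zlib"))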
-- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From tim.one@home.com Tue Jun 12 08:29:02 2001 From: tim.one@home.com (Tim Peters) Date: Tue, 12 Jun 2001 03:29:02 -0400 Subject: [Python-Dev] PEP 259: Omit printing newline after newline In-Reply-To: <004c01c0f307$fb0540c0$4ffa42d5@hagrid> Message-ID: [/F] > when was this discussed on python-dev? It wasn't -- it actually came up on one of the SourceForge mailing lists ... ah, of course, tried to search but "Geocrawler is down for nightly database maintenance". They sure have long nights . I'm guessing it's the python-iterators list. It spun off of a thread where Guido was wondering whether one of the new ways to spell "iterate over a file" should return lines without trailing \n, so that e.g. for line in sys.stdin: print line wasn't a surprise. I opined it would be better to make all ways of iterating a file do the same thing, but change print instead. We both agreed that couldn't happen. But then I couldn't find any code it would break, only code of the form print line, where the "," was trying to suppress the extra newline, and that would continue to work the same way even if print were changed. The notion that legions of people are using print line as an obscure way to get double-spacing is taking me by surprise. Nobody on the iterators list had this objection. win-some-lose-some-lose-some-lose-some-lose-some-ly y'rs - tim From mal@lemburg.com Tue Jun 12 08:35:08 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Tue, 12 Jun 2001 09:35:08 +0200 Subject: [Python-Dev] Adding Asian codecs to the core References: <20010611174230.0625E99C8D@waltz.rahul.net> Message-ID: <3B25C62C.969B40B3@lemburg.com> Aahz Maruch wrote: > > M.-A. Lemburg wrote: > > > > Tamito KAJIYAMA recently announced that he changed the licenses > > on his Japanese codecs from GPL to a BSD variant. This is great > > news since this would allow adding the codecs to the Python core > > which would certainly attract more users to Python in Asia. > > > > The codecs are 280kB when compressed as .tar.gz file. > > +0 > > I like the idea, am uncomfortable with that amount of space. Tamito corrected me about the size (his file includes the .pyc byte code files): the correct size for the sources is 143kB -- almost half of what I initially wrote. If that should still be too much, there are probably some ways to further compress the size of the mapping tables which could be investigated. PS: Tamito is very thrilled about getting his codecs into the core and I am quite certain that he is also prepared to maintain them (I have put him on CC). -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From tim@digicool.com Tue Jun 12 08:37:55 2001 From: tim@digicool.com (Tim Peters) Date: Tue, 12 Jun 2001 03:37:55 -0400 Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/Include longobject.h,2.19,2.20 In-Reply-To: <3B25C116.3E65A32D@lemburg.com> Message-ID: [M.-A. Lemburg] > I have tried to compile longobject.c/h on a HP-UX box and am getting > warnings about MIN/MAX being redefined. Perhaps you should add > an #undef for these before the #define ?! I changed nothing relevant here. Are you certain this is a new problem? 
The MIN/MAX macros have been in longobject.c for a long time, and I didn't touch them. In any case, I'm not inclined to fiddle things on a box where I can't see a problem so can't know whether I'm fixing it or just creating new problems. If you can figure out why it's happening on that box, and it's a legit problem there, feel free to fix it. From SBrunning@trisystems.co.uk Tue Jun 12 09:25:19 2001 From: SBrunning@trisystems.co.uk (Simon Brunning) Date: Tue, 12 Jun 2001 09:25:19 +0100 Subject: [Python-Dev] RE: PEP 259: Omit printing newline after newline Message-ID: <31575A892FF6D1118F5800600846864D78BD25@intrepid> > From: Guido van Rossum [SMTP:guido@digicool.com] > In order to avoid having to add yet another magic variable to file > objects, I propose to give the existing 'softspace' variable an > extra meaning: a negative value will mean "the last data written > ended in a newline so no space *or* newline is required." Better another magic variable than a magic value for an old one, I think. Cheers, Simon Brunning TriSystems Ltd. sbrunning@trisystems.co.uk ----------------------------------------------------------------------- The information in this email is confidential and may be legally privileged. It is intended solely for the addressee. Access to this email by anyone else is unauthorised. If you are not the intended recipient, any disclosure, copying, distribution, or any action taken or omitted to be taken in reliance on it, is prohibited and may be unlawful. TriSystems Ltd. cannot accept liability for statements made which are clearly the senders own. From thomas@xs4all.net Tue Jun 12 09:33:30 2001 From: thomas@xs4all.net (Thomas Wouters) Date: Tue, 12 Jun 2001 10:33:30 +0200 Subject: [Python-Dev] PEP 259: Omit printing newline after newline In-Reply-To: ; from tim.one@home.com on Tue, Jun 12, 2001 at 03:29:02AM -0400 References: <004c01c0f307$fb0540c0$4ffa42d5@hagrid> Message-ID: <20010612103330.D690@xs4all.nl> On Tue, Jun 12, 2001 at 03:29:02AM -0400, Tim Peters wrote: > [/F] > > when was this discussed on python-dev? > It wasn't -- it actually came up on one of the SourceForge mailing lists ... > I'm guessing it's the python-iterators list. I'm guessing the same thing, because I *did* see the proposal somewhere. I recall thinking 'that might work' but not much else, anyway. > The notion that legions of people are using > print line > as an obscure way to get double-spacing is taking me by surprise. Bah, humbug! (And you can quote me on that.) Backward compatibility is not an issue -- that's why we have future-imports and warning mechanisms. Import smart-print from future to get the new behaviour, and warn whenever print *would* *have* printed one newline less otherwise. Regardless, I'm -1 on this change. Not because of backward compatibility problem, but because of what GregE said. Let's not make print even more magically unpredictably confusing than it already is, with comma's that do something magical, softspace to control that magic, and shifting the print operator to the right :-) Why can't we use for line in file: print line, to print all lines in a file ? Softspace doesn't seem to add a space (though I had to write a testcase to make sure ;) and 'explicit is better than implicit'. I'd also prefer special syntax to control the softspace behaviour, like say: print "spam:", "ham" : "and" : "eggs" to print 'spamandeggs' without a space inbetween. Too late for that, I 'spose :) -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! 
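For the record, the single-spacing idiom Thomas asks about does work
today, for the reason he half-trusts: after a string ending in a newline,
the softspace flag is already suppressed, so the trailing comma adds
neither a blank line nor a stray leading space (a quick check, assuming
there is an /etc/passwd to read):

    for line in open("/etc/passwd").readlines():
        # The comma suppresses print's own newline; softspace stays off
        # because line already ends in '\n'.
        print line,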
From martin@loewis.home.cs.tu-berlin.de Tue Jun 12 10:42:52 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Tue, 12 Jun 2001 11:42:52 +0200 Subject: [Python-Dev] Adding .decode() method to Unicode In-Reply-To: "mal@lemburg.com"'s message of Tue, 12 Jun 2001 09:09:05 +0200 Message-ID: <200106120942.f5C9gqT02418@mira.informatik.hu-berlin.de> > str.encode() > str.decode() > uni.encode() > #uni.decode() # still missing It's not missing. str.decode and uni.encode go through a single codec; that's easy. str.encode is somewhat more confusing, because it really is unicode(str).encode. Now, you are not proposing that uni.decode is str(uni).decode, are you? If not that, what else would it mean? And if it means something else, it is clearly not symmetric to str.encode, so it is not "missing". > One very useful application for this method is XML unescaping > which turns numeric XML entities into Unicode chars. Ok. Please show me how that would work. More precisely, please write a PEP describing the rationale for this feature, including use case examples and precise semantics of the proposed addition. > The key argument for these interfaces is that they provide > an extensible transformation mechanism for string and binary > data. That is too general for me to understand; I need to see detailed examples that solve real-world problems. Regards, Martin P.S. I don't think that unescaping XML characters entities into Unicode characters is a useful application in itself. This is normally done by the XML parser, which not only has to deal with character entities, but also with general entities and a lot of other markup. Very few people write XML parsers, and they are using the string methods and the sre module successfully (if the parser is written in Python - a C parser would do the unescaping before even passing the text to Python). From thomas@xs4all.net Tue Jun 12 11:02:03 2001 From: thomas@xs4all.net (Thomas Wouters) Date: Tue, 12 Jun 2001 12:02:03 +0200 Subject: [Python-Dev] PEP 259: Omit printing newline after newline In-Reply-To: <20010612103330.D690@xs4all.nl>; from thomas@xs4all.net on Tue, Jun 12, 2001 at 10:33:30AM +0200 References: <004c01c0f307$fb0540c0$4ffa42d5@hagrid> <20010612103330.D690@xs4all.nl> Message-ID: <20010612120203.E690@xs4all.nl> On Tue, Jun 12, 2001 at 10:33:30AM +0200, Thomas Wouters wrote: > print "spam:", "ham" : "and" : "eggs" > to print 'spamandeggs' without a space inbetween. Err. I meant "hamandeggs" with no space inbetween. -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From mal@lemburg.com Tue Jun 12 11:13:21 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Tue, 12 Jun 2001 12:13:21 +0200 Subject: [Python-Dev] Adding .decode() method to Unicode References: <200106120942.f5C9gqT02418@mira.informatik.hu-berlin.de> Message-ID: <3B25EB41.807C2C51@lemburg.com> "Martin v. Loewis" wrote: > > > str.encode() > > str.decode() > > uni.encode() > > #uni.decode() # still missing > > It's not missing. str.decode and uni.encode go through a single codec; > that's easy. str.encode is somewhat more confusing, because it really > is unicode(str).encode. Now, you are not proposing that uni.decode is > str(uni).decode, are you? No. uni.decode() will (just like the other methods) directly interface to the codecs decoder -- there is no magic conversion involved. It is meant to be used by Unicode-Unicode codecs > If not that, what else would it mean? 
And if it means something else, > it is clearly not symmetric to str.encode, so it is not "missing". It is in the sense that strings support this method and Unicode currently doesn't. > > One very useful application for this method is XML unescaping > > which turns numeric XML entities into Unicode chars. > > Ok. Please show me how that would work. More precisely, please write a > PEP describing the rationale for this feature, including use case > examples and precise semantics of the proposed addition. There's no need for a PEP. This addition is much too simple to require a PEP on its own. As for use cases: I have already given a whole bunch of them (Unicode compression, normalization, escaping in various ways). Codecs are in no way constrained to only interface between strings and Unicode. There are many other possibilities for their usage out there. Just look at the latest checkins for a bunch of string-string codecs for examples of codecs which solve common real-life problems and do not interface to Unicode. > > The key argument for these interfaces is that they provide > > an extensible transformation mechanism for string and binary > > data. > > That is too general for me to understand; I need to see detailed > examples that solve real-world problems. > > Regards, > Martin > > P.S. I don't think that unescaping XML characters entities into > Unicode characters is a useful application in itself. This is normally > done by the XML parser, which not only has to deal with character > entities, but also with general entities and a lot of other markup. > Very few people write XML parsers, and they are using the string > methods and the sre module successfully (if the parser is written in > Python - a C parser would do the unescaping before even passing the > text to Python). True, but not all XML text out there is meant for XML parsers to read ;-). Preprocessing of e.g. XML text in Python is a rather common thing to do and this is what the direct codec access methods are meant for. -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From fredrik@pythonware.com Tue Jun 12 11:46:36 2001 From: fredrik@pythonware.com (Fredrik Lundh) Date: Tue, 12 Jun 2001 12:46:36 +0200 Subject: [Python-Dev] Adding .decode() method to Unicode References: <200106120942.f5C9gqT02418@mira.informatik.hu-berlin.de> <3B25EB41.807C2C51@lemburg.com> Message-ID: <00aa01c0f32c$f4a4b740$0900a8c0@spiff> mal wrote: > > Ok. Please show me how that would work. More precisely, please write a > > PEP describing the rationale for this feature, including use case > > examples and precise semantics of the proposed addition. > > There's no need for a PEP. This addition is much too simple > to require a PEP on its own. we'd been better off if you'd written a PEP before you started adding decode and encode stuff. what's currently implemented is ugly enough; adding more warts won't make it any prettier. 
-1 on anything except a PEP that covers *all* aspects of encode/decode (including things that are already implemented) From fredrik@pythonware.com Tue Jun 12 11:47:49 2001 From: fredrik@pythonware.com (Fredrik Lundh) Date: Tue, 12 Jun 2001 12:47:49 +0200 Subject: [Python-Dev] PEP 259: Omit printing newline after newline References: <004c01c0f307$fb0540c0$4ffa42d5@hagrid> <20010612103330.D690@xs4all.nl> <20010612120203.E690@xs4all.nl> Message-ID: <00ba01c0f32d$208d4160$0900a8c0@spiff> Thomas Wouters wrote: > > print "spam:", "ham" : "and" : "eggs" > > > to print 'spamandeggs' without a space inbetween. > > Err. I meant "hamandeggs" with no space inbetween. and "+" (or plain whitespace) instead of ":", right? From fredrik@pythonware.com Tue Jun 12 11:55:27 2001 From: fredrik@pythonware.com (Fredrik Lundh) Date: Tue, 12 Jun 2001 12:55:27 +0200 Subject: [Python-Dev] RE: PEP 259: Omit printing newline after newline References: <31575A892FF6D1118F5800600846864D78BD25@intrepid> Message-ID: <00c301c0f32e$31cd7ed0$0900a8c0@spiff> simon wrote: > > > In order to avoid having to add yet another magic variable to file > > objects, I propose to give the existing 'softspace' variable an > > extra meaning: a negative value will mean "the last data written > > ended in a newline so no space *or* newline is required." > > Better another magic variable than a magic value for an old one, I think. many file-like C types (e.g. cStringIO) already have special code to deal with a softspace integer attribute. From mal@lemburg.com Tue Jun 12 11:57:32 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Tue, 12 Jun 2001 12:57:32 +0200 Subject: [Python-Dev] Adding .decode() method to Unicode References: <200106120942.f5C9gqT02418@mira.informatik.hu-berlin.de> <3B25EB41.807C2C51@lemburg.com> <00aa01c0f32c$f4a4b740$0900a8c0@spiff> Message-ID: <3B25F59C.9AAF604A@lemburg.com> Fredrik Lundh wrote: > > mal wrote: > > > Ok. Please show me how that would work. More precisely, please write a > > > PEP describing the rationale for this feature, including use case > > > examples and precise semantics of the proposed addition. > > > > There's no need for a PEP. This addition is much too simple > > to require a PEP on its own. > > we'd been better off if you'd written a PEP before you started > adding decode and encode stuff. what's currently implemented > is ugly enough; adding more warts won't make it any prettier. Could you please be more specific about what is "ugly" in the current implementation ? The .encode/.decode methods are a direct interface to the codecs encoder and decoder APIs. I can't find anything ugly about this in general except maybe some of the constraints which were originally put into these interface on the grounds of using them for string/Unicode conversions -- I have already removed most of these and would like to clean this up completely before 2.2 gets out. > -1 on anything except a PEP that covers *all* aspects of > encode/decode (including things that are already implemented) Gee, Guido starts breaking code and nobody objects; I try to clean up some left-overs in the Unicode implementation and people start huge discussions about it. Something is backwards here... -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From martin@loewis.home.cs.tu-berlin.de Tue Jun 12 12:00:40 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. 
Loewis) Date: Tue, 12 Jun 2001 13:00:40 +0200 Subject: [Python-Dev] Adding .decode() method to Unicode In-Reply-To: <3B25EB41.807C2C51@lemburg.com> (mal@lemburg.com) References: <200106120942.f5C9gqT02418@mira.informatik.hu-berlin.de> <3B25EB41.807C2C51@lemburg.com> Message-ID: <200106121100.f5CB0em03009@mira.informatik.hu-berlin.de> > > > str.encode() > > > str.decode() > > > uni.encode() > > > #uni.decode() # still missing > > > > It's not missing. str.decode and uni.encode go through a single codec; > > that's easy. str.encode is somewhat more confusing, because it really > > is unicode(str).encode. Now, you are not proposing that uni.decode is > > str(uni).decode, are you? > > No. uni.decode() will (just like the other methods) directly > interface to the codecs decoder -- there is no magic conversion > involved. It is meant to be used by Unicode-Unicode codecs When invoking "Hallo".encode("utf-8"), two conversions are executed: first the default decoding into Unicode, then the UTF-8 encoding. Of course, that is not the intended use (but then, is the intended use documented anywhere?): instead, people should write "Hallo".encode("base64") instead. This is an example I can understand, although I'm not sure why it is inherently better to write this instead of writing base64.encodestring("Hallo"). > > If not that, what else would it mean? And if it means something else, > > it is clearly not symmetric to str.encode, so it is not "missing". > > It is in the sense that strings support this method and Unicode > currently doesn't. The rationale for string.encode is weak: it argues that string->string conversions are frequent enough to justify this API, even though these conversions have nothing to do with coded character sets. So far, I can see *no* rationale for unicode.decode. > There's no need for a PEP. This addition is much too simple > to require a PEP on its own. PEP 1 says: # We intend PEPs to be the primary mechanisms for proposing new # features, for collecting community input on an issue, and for # documenting the design decisions that have gone into Python. The # PEP author is responsible for building consensus within the # community and documenting dissenting opinions. So we have a proposal for a new feature, and we have dissenting opinions. Who are you to decide that this additions is too simple to require a PEP on its own? > As for use cases: I have already given a whole bunch of them > (Unicode compression, normalization, escaping in various ways). I was asking for specific examples: Names of specific codecs that you want to implement, and application code fragments using these specific codecs. I don't know how to use Unicode compression if I had such this proposed feature, for example. I know what XML escaping is, and I cannot see how this feature would help. > True, but not all XML text out there is meant for XML parsers to > read ;-). Preprocessing of e.g. XML text in Python is a rather common > thing to do and this is what the direct codec access methods are > meant for. Can you give an example of an application which processes XML without a parser, but with converting character entities (preferably open-source, so I can study its code)? I wonder whether they get CDATA sections right... MAL, I really mean that: Please don't make claims that something is common or useful without giving an *exact* example. Regards, Martin P.S. This insistence on adding Unicode and string methods makes it appear as if the author of the codecs module now thinks that the API of it sucks. 
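For readers keeping score, the three spellings under debate look like
this side by side (a sketch; it assumes the just-checked-in base64 codec
is registered under the name "base64", and note that the raw callables
from codecs.lookup() return an (output, length consumed) pair, which the
bare lookup idiom earlier in the thread glosses over):

    import codecs
    import base64

    s = "Hallo"

    a = base64.encodestring(s)            # the module API
    b = s.encode("base64")                # the string-method shortcut

    enc, dec, _, _ = codecs.lookup("base64")
    c, consumed = enc(s)                  # codec callables return a pair

    print a == b == c                     # all three spell the same thing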
From thomas@xs4all.net Tue Jun 12 12:16:05 2001 From: thomas@xs4all.net (Thomas Wouters) Date: Tue, 12 Jun 2001 13:16:05 +0200 Subject: [Python-Dev] PEP 259: Omit printing newline after newline In-Reply-To: <00ba01c0f32d$208d4160$0900a8c0@spiff> References: <004c01c0f307$fb0540c0$4ffa42d5@hagrid> <20010612103330.D690@xs4all.nl> <20010612120203.E690@xs4all.nl> <00ba01c0f32d$208d4160$0900a8c0@spiff> Message-ID: <20010612131605.Q22849@xs4all.nl> On Tue, Jun 12, 2001 at 12:47:49PM +0200, Fredrik Lundh wrote: > Thomas Wouters wrote: > > > print "spam:", "ham" : "and" : "eggs" > > > > > to print 'spamandeggs' without a space inbetween. > > Err. I meant "hamandeggs" with no space inbetween. > and "+" (or plain whitespace) instead of ":", right? Not really. That would only work for string-types. Print auto-converts, remember ? At least the ':' is unambiguous. -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From mal@lemburg.com Tue Jun 12 12:42:31 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Tue, 12 Jun 2001 13:42:31 +0200 Subject: [Python-Dev] Adding .decode() method to Unicode References: <200106120942.f5C9gqT02418@mira.informatik.hu-berlin.de> <3B25EB41.807C2C51@lemburg.com> <200106121100.f5CB0em03009@mira.informatik.hu-berlin.de> Message-ID: <3B260027.7DD33246@lemburg.com> "Martin v. Loewis" wrote: > > > > > str.encode() > > > > str.decode() > > > > uni.encode() > > > > #uni.decode() # still missing > > > > > > It's not missing. str.decode and uni.encode go through a single codec; > > > that's easy. str.encode is somewhat more confusing, because it really > > > is unicode(str).encode. Now, you are not proposing that uni.decode is > > > str(uni).decode, are you? > > > > No. uni.decode() will (just like the other methods) directly > > interface to the codecs decoder -- there is no magic conversion > > involved. It is meant to be used by Unicode-Unicode codecs > > When invoking "Hallo".encode("utf-8"), two conversions are executed: > first the default decoding into Unicode, then the UTF-8 encoding. Of > course, that is not the intended use (but then, is the intended use > documented anywhere?): instead, people should write > "Hallo".encode("base64") instead. This is an example I can understand, > although I'm not sure why it is inherently better to write this > instead of writing base64.encodestring("Hallo"). Please note that the conversion from string to Unicode is done by the codec, not the .encode() interface. > > > If not that, what else would it mean? And if it means something else, > > > it is clearly not symmetric to str.encode, so it is not "missing". > > > > It is in the sense that strings support this method and Unicode > > currently doesn't. > > The rationale for string.encode is weak: it argues that string->string > conversions are frequent enough to justify this API, even though these > conversions have nothing to do with coded character sets. You still don't get it: codecs can be used for much more than just character set conversion ! > So far, I can see *no* rationale for unicode.decode. > > > There's no need for a PEP. This addition is much too simple > > to require a PEP on its own. > > PEP 1 says: > > # We intend PEPs to be the primary mechanisms for proposing new > # features, for collecting community input on an issue, and for > # documenting the design decisions that have gone into Python. The > # PEP author is responsible for building consensus within the > # community and documenting dissenting opinions. 
> > So we have a proposal for a new feature, and we have dissenting > opinions. Who are you to decide that this additions is too simple to > require a PEP on its own? So you want a PEP for each and every small addition to in the core ?! (I am not talking about features which might break code !) > > As for use cases: I have already given a whole bunch of them > > (Unicode compression, normalization, escaping in various ways). > > I was asking for specific examples: Names of specific codecs that you > want to implement, and application code fragments using these specific > codecs. I don't know how to use Unicode compression if I had such this > proposed feature, for example. I know what XML escaping is, and I > cannot see how this feature would help. I think I have given enough examples in this thread already. See below for some more. > > True, but not all XML text out there is meant for XML parsers to > > read ;-). Preprocessing of e.g. XML text in Python is a rather common > > thing to do and this is what the direct codec access methods are > > meant for. > > Can you give an example of an application which processes XML without > a parser, but with converting character entities (preferably > open-source, so I can study its code)? I wonder whether they get CDATA > sections right... MAL, I really mean that: Please don't make claims > that something is common or useful without giving an *exact* example. Yes, I am using these feature in real code and no, I can't show it to you because it's closed source. XML is only one example where this would be useful, HTML is another text format which would benefit from it, URL encoding is yet another application. You basically find these applications in all situations where some form of escaping is needed. What I am trying to do here is simplify codec access and usage for the casual user. .encode() and .decode() are very intuitive ways to deal with data transformation, IMHO. > Regards, > Martin > > P.S. This insistence on adding Unicode and string methods makes it > appear as if the author of the codecs module now thinks that the API > of it sucks. No comment. -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From barry@digicool.com Tue Jun 12 15:22:26 2001 From: barry@digicool.com (Barry A. Warsaw) Date: Tue, 12 Jun 2001 10:22:26 -0400 Subject: [Python-Dev] Adding .decode() method to Unicode References: <200106120942.f5C9gqT02418@mira.informatik.hu-berlin.de> <3B25EB41.807C2C51@lemburg.com> Message-ID: <15142.9634.842402.241225@anthem.wooz.org> >>>>> "M" == M writes: M> Codecs are in no way constrained to only interface between M> strings and Unicode. There are many other possibilities for M> their usage out there. Just look at the latest checkins for a M> bunch of string-string codecs for examples of codecs which M> solve common real-life problems and do not interface to M> Unicode. Having just followed this thread tangentially, I do have to say it seems quite cool to be able to do something like the following in Python 2.2: >>> s = msg['from'] >>> parts = s.split('?') >>> if parts[2].lower() == 'q': ... name = parts[3].decode('quopri') ... elif parts[2].lower() == 'b': ... name = parts[3].decode('base64') ... 
-Barry

From fredrik@pythonware.com Tue Jun 12 15:45:16 2001
From: fredrik@pythonware.com (Fredrik Lundh)
Date: Tue, 12 Jun 2001 16:45:16 +0200
Subject: [Python-Dev] Adding .decode() method to Unicode
References: <200106120942.f5C9gqT02418@mira.informatik.hu-berlin.de>
 <3B25EB41.807C2C51@lemburg.com> <15142.9634.842402.241225@anthem.wooz.org>
Message-ID: <004b01c0f34e$4d3e96c0$4ffa42d5@hagrid>

barry wrote:

> Having just followed this thread tangentially, I do have to say it
> seems quite cool to be able to do something like the following in
> Python 2.2:
>
> >>> s = msg['from']
> >>> parts = s.split('?')
> >>> if parts[2].lower() == 'q':
> ...     name = parts[3].decode('quopri')
> ... elif parts[2].lower() == 'b':
> ...     name = parts[3].decode('base64')

uhuh? and how exactly is this cooler than being able to do something like the following:

import quopri, base64

s = msg['from']
parts = s.split('?')
if parts[2].lower() == 'q':
    name = quopri.decodestring(parts[3])
elif parts[2].lower() == 'b':
    name = base64.decodestring(parts[3])

(going through the codec registry is slower, and imports more modules, but what's so cool with that?)

From barry@digicool.com Tue Jun 12 15:50:01 2001
From: barry@digicool.com (Barry A. Warsaw)
Date: Tue, 12 Jun 2001 10:50:01 -0400
Subject: [Python-Dev] Adding .decode() method to Unicode
References: <200106120942.f5C9gqT02418@mira.informatik.hu-berlin.de>
 <3B25EB41.807C2C51@lemburg.com> <15142.9634.842402.241225@anthem.wooz.org>
 <004b01c0f34e$4d3e96c0$4ffa42d5@hagrid>
Message-ID: <15142.11289.16053.424966@anthem.wooz.org>

>>>>> "FL" == Fredrik Lundh writes:

FL> uhuh? and how exactly is this cooler than being able to do
FL> something like the following:

| import quopri, base64
| s = msg['from']
| parts = s.split('?')
| if parts[2].lower() == 'q':
|     name = quopri.decodestring(parts[3])
| elif parts[2].lower() == 'b':
|     name = base64.decodestring(parts[3])

FL> (going through the codec registry is slower, and imports more
FL> modules, but what's so cool with that?)

-------------------- snip snip --------------------
Python 2.2a0 (#4, Jun 6 2001, 13:03:36)
[GCC egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)] on linux2
Type "copyright", "credits" or "license" for more information.
>>> import quopri
>>> quopri.decodestring
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
AttributeError: 'quopri' module has no attribute 'decodestring'
>>> quopri.encodestring
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
AttributeError: 'quopri' module has no attribute 'encodestring'
-------------------- snip snip --------------------

Much cooler :) Okay, okay, so we /could/ add encodestring/decodestring to quopri.py, which isn't a bad idea. But it seems to me that the s.encode() s.decode() API is nicely universal for any supported encoding.

but-what-do-i-know?-ly y'rs,
-Barry

From skip@pobox.com (Skip Montanaro) Tue Jun 12 16:32:11 2001
From: skip@pobox.com (Skip Montanaro)
Date: Tue, 12 Jun 2001 10:32:11 -0500
Subject: [Python-Dev] Re: metaclasses -- aka Don Beaudry hook/hack
In-Reply-To:
References:
Message-ID: <15142.13819.477491.993419@beluga.mojam.com>

James> Before I head too deeply into Zope dependencies, I would be
James> interested in knowing whether or not "type(MyClass) ==
James> types.ClassType" and "isinstance(myInstance,MyClass)" work for
James> classes derived from ExtensionClass.
Straight from the horse's mouth: >>> type(gtk.GtkButton) >>> type(gtk.GtkButton) == types.ClassType 0 >>> isinstance(gtk.GtkButton(), gtk.GtkButton) 1 James> (And if so, why do these work for C extension classes using the James> Don Beaudry hook but not for Python classes using the same hook?) You'll have to ask someone with more subject knowledge. (Don would probably be a good start. ;-) I've cc'd python-dev because the experts in this area are all there. -- Skip Montanaro (skip@pobox.com) (847)971-7098 From skip@pobox.com (Skip Montanaro) Tue Jun 12 16:53:24 2001 From: skip@pobox.com (Skip Montanaro) (Skip Montanaro) Date: Tue, 12 Jun 2001 10:53:24 -0500 Subject: [Python-Dev] PEP 259: Omit printing newline after newline In-Reply-To: References: <004c01c0f307$fb0540c0$4ffa42d5@hagrid> Message-ID: <15142.15092.57490.275201@beluga.mojam.com> Tim> The notion that legions of people are using Tim> print line Tim> as an obscure way to get double-spacing is taking me by surprise. Tim> Nobody on the iterators list had this objection. I suspect that most CGI scripts that didn't use any abstraction for HTTP responses suffer from this potential problem. I've been using one abstraction or another for quite awhile now, but I still have a few CGI scripts laying around that still use print to emit headers and bodies of HTTP responses. Skip From barry@digicool.com Tue Jun 12 17:06:53 2001 From: barry@digicool.com (Barry A. Warsaw) Date: Tue, 12 Jun 2001 12:06:53 -0400 Subject: [Python-Dev] PEP 259: Omit printing newline after newline References: <004c01c0f307$fb0540c0$4ffa42d5@hagrid> <15142.15092.57490.275201@beluga.mojam.com> Message-ID: <15142.15901.223641.151562@anthem.wooz.org> >>>>> "SM" == Skip Montanaro writes: SM> I suspect that most CGI scripts that didn't use any SM> abstraction for HTTP responses suffer from this potential SM> problem. I've been using one abstraction or another for quite SM> awhile now, but I still have a few CGI scripts laying around SM> that still use print to emit headers and bodies of HTTP SM> responses. Same here. From paulp@ActiveState.com Tue Jun 12 18:22:31 2001 From: paulp@ActiveState.com (Paul Prescod) Date: Tue, 12 Jun 2001 10:22:31 -0700 Subject: [Python-Dev] Adding .decode() method to Unicode References: <200106120942.f5C9gqT02418@mira.informatik.hu-berlin.de> <3B25EB41.807C2C51@lemburg.com> <15142.9634.842402.241225@anthem.wooz.org> Message-ID: <3B264FD7.86ACB034@ActiveState.com> "Barry A. Warsaw" wrote: > >... > > Having just followed this thread tangentially, I do have to say it > seems quite cool to be able to do something like the following in > Python 2.2: > > >>> s = msg['from'] > >>> parts = s.split('?') > >>> if parts[2].lower() == 'q': > ... name = parts[3].decode('quopri') > ... elif parts[2].lower() == 'b': > ... name = parts[3].decode('base64') > ... I think that the central point is that if code like the above is useful and supported then it needs to be the same for Unicode strings as for 8-bit strings. If the code above is NOT useful and should NOT be supported then we need to undo it before 2.2 ships. This unicode.decode argument is just a proxy for the real argument about the above. I don't feel strongly one way or another about this (ab?)use of the codecs concept, myself, but I do feel strongly that Unicode strings should behave as much as possible like 8-bit strings. -- Take a recipe. Leave a recipe. Python Cookbook! 
http://www.ActiveState.com/pythoncookbook

From paulp@ActiveState.com Tue Jun 12 18:31:54 2001
From: paulp@ActiveState.com (Paul Prescod)
Date: Tue, 12 Jun 2001 10:31:54 -0700
Subject: [Python-Dev] Adding .decode() method to Unicode
References: <200106120942.f5C9gqT02418@mira.informatik.hu-berlin.de>
 <3B25EB41.807C2C51@lemburg.com> <15142.9634.842402.241225@anthem.wooz.org>
 <004b01c0f34e$4d3e96c0$4ffa42d5@hagrid>
Message-ID: <3B26520A.C579D00C@ActiveState.com>

Fredrik Lundh wrote:
>
>...
>
> uhuh? and how exactly is this cooler than being able to do
> something like the following:
>
> import quopri, base64
>...
>
> (going through the codec registry is slower, and imports more
> modules, but what's so cool with that?)

One argument in favor is that the base64 and quopri modules are not standardized today. In fact, Python has a huge problem with standardization of access paradigms in the standard library. We get the best standardization (i.e. of the "file interface") when we force module authors to conform to a standard in order to get some "extra feature" of the standard library.

A counter argument is that the conflation of the concept of Unicode encoding/decoding and other forms of encoding/decoding could be confusing. MAL would not have to keep pointing out that "codecs are for more than Unicode encoding/decoding" if it was obvious.

--
Take a recipe. Leave a recipe.
Python Cookbook! http://www.ActiveState.com/pythoncookbook

From barry@digicool.com Tue Jun 12 19:24:25 2001
From: barry@digicool.com (Barry A. Warsaw)
Date: Tue, 12 Jun 2001 14:24:25 -0400
Subject: [Python-Dev] Adding .decode() method to Unicode
References: <200106120942.f5C9gqT02418@mira.informatik.hu-berlin.de>
 <3B25EB41.807C2C51@lemburg.com> <15142.9634.842402.241225@anthem.wooz.org>
 <3B264FD7.86ACB034@ActiveState.com>
Message-ID: <15142.24153.921774.610559@anthem.wooz.org>

>>>>> "PP" == Paul Prescod writes:

PP> I don't feel strongly one way or another about this (ab?)use
PP> of the codecs concept, myself, but I do feel strongly that
PP> Unicode strings should behave as much as possible like 8-bit
PP> strings.

I'd agree with both statements.

time-to-add-{encode,decode}string()-to-quopri-ly y'rs,
-Barry

From martin@loewis.home.cs.tu-berlin.de Tue Jun 12 19:00:19 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Tue, 12 Jun 2001 20:00:19 +0200
Subject: [Python-Dev] Adding .decode() method to Unicode
In-Reply-To: <3B260027.7DD33246@lemburg.com> (mal@lemburg.com)
References: <200106120942.f5C9gqT02418@mira.informatik.hu-berlin.de>
 <3B25EB41.807C2C51@lemburg.com> <200106121100.f5CB0em03009@mira.informatik.hu-berlin.de>
 <3B260027.7DD33246@lemburg.com>
Message-ID: <200106121800.f5CI0Jw00946@mira.informatik.hu-berlin.de>

> > So we have a proposal for a new feature, and we have dissenting
> > opinions. Who are you to decide that this addition is too simple to
> > require a PEP on its own?
>
> So you want a PEP for each and every small addition to the
> core ?! (I am not talking about features which might break code !)

No, additions that find immediate consent and come with complete patches (including documentation and test cases) don't need this overhead. Features that find resistance should go through the full process.

> > I was asking for specific examples: Names of specific codecs that you
> > want to implement, and application code fragments using these specific
> > codecs. I don't know how I would use Unicode compression if I had this
> > proposed feature, for example.
I know what XML escaping is, and I > > cannot see how this feature would help. > > I think I have given enough examples in this thread already. See > below for some more. I haven't seen a single example involving actual Python code. > > > True, but not all XML text out there is meant for XML parsers to > > > read ;-). Preprocessing of e.g. XML text in Python is a rather common > > > thing to do and this is what the direct codec access methods are > > > meant for. > > > > Can you give an example of an application [...] > > Yes, I am using these feature in real code and no, I can't show it to > you because it's closed source. Not very convincing... If this is "a rather common thing to do", it shouldn't be hard to find examples in other people's code, shouldn't it? > XML is only one example where this would be useful, HTML is another > text format which would benefit from it, URL encoding is yet another > application. You basically find these applications in all situations > where some form of escaping is needed. These are all not specific examples. I'm still looking for a specific application that might use this feature, and specific codec names and implementations. Regards, Martin From martin@loewis.home.cs.tu-berlin.de Tue Jun 12 19:08:31 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Tue, 12 Jun 2001 20:08:31 +0200 Subject: [Python-Dev] Adding .decode() method to Unicode In-Reply-To: <15142.9634.842402.241225@anthem.wooz.org> (barry@digicool.com) References: <200106120942.f5C9gqT02418@mira.informatik.hu-berlin.de> <3B25EB41.807C2C51@lemburg.com> <15142.9634.842402.241225@anthem.wooz.org> Message-ID: <200106121808.f5CI8VO00950@mira.informatik.hu-berlin.de> > Having just followed this thread tangentially, I do have to say it > seems quite cool to be able to do something like the following in > Python 2.2: > > >>> s = msg['from'] > >>> parts = s.split('?') > >>> if parts[2].lower() == 'q': > ... name = parts[3].decode('quopri') > ... elif parts[2].lower() == 'b': > ... name = parts[3].decode('base64') > ... What is the type of parts[3] here? If it is a plain string, it is already possible: >>> 'SGVsbG8=\n'.decode("base64") 'Hello' I doubt you'd ever have a Unicode string that represents a base64-encoded byte string, and if you had, .decode would probably do the wrong thing: >>> import codecs >>> enc,dec,_,_ = codecs.lookup("base64") >>> dec(u'SGVsbG8=\n') ('Hello', 9) Note that this returns a byte string, not a Unicode string. Regards, Martin From martin@loewis.home.cs.tu-berlin.de Tue Jun 12 19:18:45 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Tue, 12 Jun 2001 20:18:45 +0200 Subject: [Python-Dev] Adding .decode() method to Unicode In-Reply-To: <3B264FD7.86ACB034@ActiveState.com> (message from Paul Prescod on Tue, 12 Jun 2001 10:22:31 -0700) References: <200106120942.f5C9gqT02418@mira.informatik.hu-berlin.de> <3B25EB41.807C2C51@lemburg.com> <15142.9634.842402.241225@anthem.wooz.org> <3B264FD7.86ACB034@ActiveState.com> Message-ID: <200106121818.f5CIIjj00983@mira.informatik.hu-berlin.de> > > Having just followed this thread tangentially, I do have to say it > > seems quite cool to be able to do something like the following in > > Python 2.2: > > > > >>> s = msg['from'] > > >>> parts = s.split('?') > > >>> if parts[2].lower() == 'q': > > ... name = parts[3].decode('quopri') > > ... elif parts[2].lower() == 'b': > > ... name = parts[3].decode('base64') > > ... 
> > I think that the central point is that if code like the above is useful > and supported then it needs to be the same for Unicode strings as for > 8-bit strings. Why is that? An encoding, by nature, is something that produces a byte sequence from some input. So you can only decode byte sequences, not character strings. > If the code above is NOT useful and should NOT be supported then we > need to undo it before 2.2 ships. This unicode.decode argument is > just a proxy for the real argument about the above. No, it isn't. The code is useful for byte strings, but not for Unicode strings. > I don't feel strongly one way or another about this (ab?)use of the > codecs concept, myself, but I do feel strongly that Unicode strings > should behave as much as possible like 8-bit strings. Not at all. Byte strings and character strings are as different as are byte strings and lists of DOM child nodes (i.e. the only common thing is that they are sequences). Regards, Martin From barry@digicool.com Tue Jun 12 19:35:10 2001 From: barry@digicool.com (Barry A. Warsaw) Date: Tue, 12 Jun 2001 14:35:10 -0400 Subject: [Python-Dev] Adding .decode() method to Unicode References: <200106120942.f5C9gqT02418@mira.informatik.hu-berlin.de> <3B25EB41.807C2C51@lemburg.com> <15142.9634.842402.241225@anthem.wooz.org> <200106121808.f5CI8VO00950@mira.informatik.hu-berlin.de> Message-ID: <15142.24798.941322.762791@anthem.wooz.org> >>>>> "MvL" == Martin v Loewis writes: MvL> What is the type of parts[3] here? If it is a plain string, MvL> it is already possible: >> 'SGVsbG8=\n'.decode("base64") MvL> 'Hello' But only in Python 2.2a0 currently, right? And yes, the type is plain string. MvL> I doubt you'd ever have a Unicode string that represents a MvL> base64-encoded byte string, and if you had, .decode would MvL> probably do the wrong thing: >> import codecs enc,dec,_,_ = codecs.lookup("base64") >> dec(u'SGVsbG8=\n') MvL> ('Hello', 9) MvL> Note that this returns a byte string, not a Unicode string. I trust you on that. ;) I've only played with this tangentially since this thread cropped up. -Barry From paulp@ActiveState.com Tue Jun 12 19:51:25 2001 From: paulp@ActiveState.com (Paul Prescod) Date: Tue, 12 Jun 2001 11:51:25 -0700 Subject: [Python-Dev] Adding .decode() method to Unicode References: <200106120942.f5C9gqT02418@mira.informatik.hu-berlin.de> <3B25EB41.807C2C51@lemburg.com> <15142.9634.842402.241225@anthem.wooz.org> <3B264FD7.86ACB034@ActiveState.com> <200106121818.f5CIIjj00983@mira.informatik.hu-berlin.de> Message-ID: <3B2664AD.B560D685@ActiveState.com> "Martin v. Loewis" wrote: > >... > > Why is that? An encoding, by nature, is something that produces a byte > sequence from some input. So you can only decode byte sequences, not > character strings. According to this logic, it is not logical to "encode" a Unicode string into a base64'd Unicode string or "decode" a Unicode string from a base64'd Unicode string. But I have seen circumstances where one XML document is base64'd into another. In that circumstance, it would be useful to say node.nodeValue.decode("base64"). Let me turn the argument around? What would the *harm* in having 8-bit strings and Unicode strings behave similarly in this manner? >... > Not at all. Byte strings and character strings are as different as are > byte strings and lists of DOM child nodes (i.e. the only common thing > is that they are sequences). 8-bit strings are not purely byte strings. They are also "character strings". 
That's why they have methods like "capitalize", "isalpha", "lower", "swapcase", "title" and so forth. DOM nodes and byte strings have virtually no methods in common. We could argue angels on the head of a pin until the cows come home but 90% of all Python users think of 8-bit strings as strings of characters. So arguments based on the idea that they are not "really" character strings are wishful thinking. -- Take a recipe. Leave a recipe. Python Cookbook! http://www.ActiveState.com/pythoncookbook From martin@loewis.home.cs.tu-berlin.de Tue Jun 12 21:01:39 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Tue, 12 Jun 2001 22:01:39 +0200 Subject: [Python-Dev] Adding .decode() method to Unicode In-Reply-To: <15142.24798.941322.762791@anthem.wooz.org> (barry@digicool.com) References: <200106120942.f5C9gqT02418@mira.informatik.hu-berlin.de> <3B25EB41.807C2C51@lemburg.com> <15142.9634.842402.241225@anthem.wooz.org> <200106121808.f5CI8VO00950@mira.informatik.hu-berlin.de> <15142.24798.941322.762791@anthem.wooz.org> Message-ID: <200106122001.f5CK1de01350@mira.informatik.hu-berlin.de> > MvL> What is the type of parts[3] here? If it is a plain string, > MvL> it is already possible: > > >> 'SGVsbG8=\n'.decode("base64") > MvL> 'Hello' > > But only in Python 2.2a0 currently, right? Exactly, since MAL's last patch. If people think that byte strings must behave exactly as Unicode strings, I'd rather prefer to back out this patch instead of adding unicode.decode. Personally, I think the status quo is fine and should not be changed. Regards, Martin From aahz@rahul.net Wed Jun 13 00:48:14 2001 From: aahz@rahul.net (Aahz Maruch) Date: Tue, 12 Jun 2001 16:48:14 -0700 (PDT) Subject: [Python-Dev] Adding Asian codecs to the core In-Reply-To: <3B25C62C.969B40B3@lemburg.com> from "M.-A. Lemburg" at Jun 12, 2001 09:35:08 AM Message-ID: <20010612234815.2C90599C82@waltz.rahul.net> M.-A. Lemburg wrote: > Aahz Maruch wrote: >> M.-A. Lemburg wrote: >>> >>> Tamito KAJIYAMA recently announced that he changed the licenses >>> on his Japanese codecs from GPL to a BSD variant. This is great >>> news since this would allow adding the codecs to the Python core >>> which would certainly attract more users to Python in Asia. >>> >>> The codecs are 280kB when compressed as .tar.gz file. >> >> +0 >> >> I like the idea, am uncomfortable with that amount of space. > > Tamito corrected me about the size (his file includes the .pyc > byte code files): the correct size for the sources is 143kB -- > almost half of what I initially wrote. That makes me +0.5, possibly a bit higher. -- --- Aahz (@pobox.com) Hugs and backrubs -- I break Rule 6 <*> http://www.rahul.net/aahz/ Androgynous poly kinky vanilla queer het Pythonista I don't really mind a person having the last whine, but I do mind someone else having the last self-righteous whine. From greg@cosc.canterbury.ac.nz Wed Jun 13 00:57:35 2001 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Wed, 13 Jun 2001 11:57:35 +1200 (NZST) Subject: [Python-Dev] PEP 259: Omit printing newline after newline In-Reply-To: <20010612103330.D690@xs4all.nl> Message-ID: <200106122357.LAA03316@s454.cosc.canterbury.ac.nz> Thomas Wouters : > I'd also prefer special syntax to control the softspace > behaviour... Too late for that, I 'spose Maybe not. I'd suggest spelling "don't add a newline or a space after this" as: print a, b, c... This could coexist with the current softspace behaviour, and the use of a trailing comma could be deprecated. 
After a suitable warning period, the softspace flag could then be removed. > print "spam:", "ham" : "and" : "eggs" > to print 'spamandeggs' without a space inbetween. I don't think it's so important to have a special syntax for that, since it can be accomplished in other ways without too much difficulty, e.g. print "%s: %s%s%s" % ("spam", "ham", "and", "eggs")... The main thing I'd like is to get rid of the statefulness of the current behaviour. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From greg@cosc.canterbury.ac.nz Wed Jun 13 01:02:40 2001 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Wed, 13 Jun 2001 12:02:40 +1200 (NZST) Subject: [Python-Dev] Adding .decode() method to Unicode In-Reply-To: <00aa01c0f32c$f4a4b740$0900a8c0@spiff> Message-ID: <200106130002.MAA03319@s454.cosc.canterbury.ac.nz> > -1 on anything except a PEP that covers *all* aspects of > encode/decode (including things that are already implemented) Particularly, it should clearly explain why we need a completely new and separate namespace mechanism for these codec things, and provide a firm rationale for deciding whether any proposed new form of encoding or decoding should be placed in this namespace or the module namespace. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From paulp@ActiveState.com Wed Jun 13 01:32:17 2001 From: paulp@ActiveState.com (Paul Prescod) Date: Tue, 12 Jun 2001 17:32:17 -0700 Subject: [Python-Dev] Adding Asian codecs to the core References: <20010612234815.2C90599C82@waltz.rahul.net> Message-ID: <3B26B491.CA8536BD@ActiveState.com> Aahz Maruch wrote: > >.... > > > > Tamito corrected me about the size (his file includes the .pyc > > byte code files): the correct size for the sources is 143kB -- > > almost half of what I initially wrote. > > That makes me +0.5, possibly a bit higher. We really shouldn't consider the Japanese without Chinese and Korean. And those both seem *larger* than the Japanese. :( What if we add them to CVS and formally maintain them as part of the core but distribute them as a separate download? -- Take a recipe. Leave a recipe. Python Cookbook! http://www.ActiveState.com/pythoncookbook From paulp@ActiveState.com Wed Jun 13 03:25:23 2001 From: paulp@ActiveState.com (Paul Prescod) Date: Tue, 12 Jun 2001 19:25:23 -0700 Subject: [Python-Dev] Pure Python strptime Message-ID: <3B26CF13.2A337AC6@ActiveState.com> Should this strptime implementation be added to the standard library? http://aspn.activestate.com/ASPN/Python/Cookbook/Recipe/56036 -- Take a recipe. Leave a recipe. Python Cookbook! 
http://www.ActiveState.com/pythoncookbook From paulp@ActiveState.com Wed Jun 13 03:41:53 2001 From: paulp@ActiveState.com (Paul Prescod) Date: Tue, 12 Jun 2001 19:41:53 -0700 Subject: [Python-Dev] Adding .decode() method to Unicode References: <200106130002.MAA03319@s454.cosc.canterbury.ac.nz> Message-ID: <3B26D2F1.8840FB1A@ActiveState.com> Greg Ewing wrote: > > > -1 on anything except a PEP that covers *all* aspects of > > encode/decode (including things that are already implemented) > > Particularly, it should clearly explain why we need a > completely new and separate namespace mechanism for these > codec things, I don't know whether MAL will write the PEP or not but the rationale for a new namespace is trivial. The namespace exists and is maintained by the Internet Assigned Names Association. You can't work with Unicode without working with names from this list: http://www.iana.org/assignments/character-sets MAL is basically exending it to include names from this list: http://www.iana.org/assignments/transfer-encodings and others. > and provide a firm rationale for deciding > whether any proposed new form of encoding or decoding > should be placed in this namespace or the module namespace. *My* answer would be that any function that has strings (8-bit or Unicode) as both domain and range is potentially a codec. -- Take a recipe. Leave a recipe. Python Cookbook! http://www.ActiveState.com/pythoncookbook From greg@cosc.canterbury.ac.nz Wed Jun 13 05:45:36 2001 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Wed, 13 Jun 2001 16:45:36 +1200 (NZST) Subject: [Python-Dev] Adding .decode() method to Unicode In-Reply-To: <3B26D2F1.8840FB1A@ActiveState.com> Message-ID: <200106130445.QAA03370@s454.cosc.canterbury.ac.nz> Paul Prescod : > The namespace exists and is maintained by > the Internet Assigned Names Association. Hmmm... so, is the only reason that we're not using the module namespace the fact that these names can contain non-alphanumeric characters? Or is there more to it than that? Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From skip@pobox.com (Skip Montanaro) Wed Jun 13 06:09:38 2001 From: skip@pobox.com (Skip Montanaro) (Skip Montanaro) Date: Wed, 13 Jun 2001 00:09:38 -0500 Subject: [Python-Dev] Adding Asian codecs to the core In-Reply-To: <3B26B491.CA8536BD@ActiveState.com> References: <20010612234815.2C90599C82@waltz.rahul.net> <3B26B491.CA8536BD@ActiveState.com> Message-ID: <15142.62866.180570.158325@beluga.mojam.com> Paul> What if we add them to CVS and formally maintain them as part of Paul> the core but distribute them as a separate download? That seems to make sense to me. I suspect most Linux distributions (for example) bundle Python into multiple pieces already. My Mandrake system splits the core into (I think) four pieces. It also bundles several other RPMs for PIL, NumPy, Postgres and RPM. Adding another package for a set of codecs doesn't seem like a big deal. Skip From mal@lemburg.com Wed Jun 13 08:02:05 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Wed, 13 Jun 2001 09:02:05 +0200 Subject: [Python-Dev] Adding Asian codecs to the core References: <20010612234815.2C90599C82@waltz.rahul.net> Message-ID: <3B270FED.8E2A4ECB@lemburg.com> Aahz Maruch wrote: > > M.-A. Lemburg wrote: > > Aahz Maruch wrote: > >> M.-A. 
Lemburg wrote: > >>> > >>> Tamito KAJIYAMA recently announced that he changed the licenses > >>> on his Japanese codecs from GPL to a BSD variant. This is great > >>> news since this would allow adding the codecs to the Python core > >>> which would certainly attract more users to Python in Asia. > >>> > >>> The codecs are 280kB when compressed as .tar.gz file. > >> > >> +0 > >> > >> I like the idea, am uncomfortable with that amount of space. > > > > Tamito corrected me about the size (his file includes the .pyc > > byte code files): the correct size for the sources is 143kB -- > > almost half of what I initially wrote. > > That makes me +0.5, possibly a bit higher. We will be working on reducing the size of the mapping tables. Can't promise anything, but I believe that Tamito can squeeze them into under 100k using some compression technique (which one is yet to be determined ;). -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From mal@lemburg.com Wed Jun 13 08:05:31 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Wed, 13 Jun 2001 09:05:31 +0200 Subject: [Python-Dev] Adding Asian codecs to the core References: <20010612234815.2C90599C82@waltz.rahul.net> <3B26B491.CA8536BD@ActiveState.com> Message-ID: <3B2710BB.CFD8215@lemburg.com> Paul Prescod wrote: > > Aahz Maruch wrote: > > > >.... > > > > > > Tamito corrected me about the size (his file includes the .pyc > > > byte code files): the correct size for the sources is 143kB -- > > > almost half of what I initially wrote. > > > > That makes me +0.5, possibly a bit higher. > > We really shouldn't consider the Japanese without Chinese and Korean. > And those both seem *larger* than the Japanese. :( Unfortunately, these aren't available under a usable (=non-GPL) license yet. > What if we add them to CVS and formally maintain them as part of the > core but distribute them as a separate download? Good idea. -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From mal@lemburg.com Wed Jun 13 08:17:14 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Wed, 13 Jun 2001 09:17:14 +0200 Subject: [Python-Dev] Adding .decode() method to Unicode References: <200106130002.MAA03319@s454.cosc.canterbury.ac.nz> <3B26D2F1.8840FB1A@ActiveState.com> Message-ID: <3B27137A.E7BFC4EC@lemburg.com> Paul Prescod wrote: > > Greg Ewing wrote: > > > > > -1 on anything except a PEP that covers *all* aspects of > > > encode/decode (including things that are already implemented) > > > > Particularly, it should clearly explain why we need a > > completely new and separate namespace mechanism for these > > codec things, > > I don't know whether MAL will write the PEP or not With the kind of attitude towards the proposed extensions which I am currently getting in this forum, I'd rather spend my time on something more useful. > but the rationale for > a new namespace is trivial. The namespace exists and is maintained by > the Internet Assigned Names Association. You can't work with Unicode > without working with names from this list: > > http://www.iana.org/assignments/character-sets > > MAL is basically exending it to include names from this list: > > http://www.iana.org/assignments/transfer-encodings > > and others. Right. 
Since these codecs live in the encoding package, I don't think we have a namespace problem here. Codecs which are hooked into the codec registry by the encoding package's search function will have to provide a getregentry() entry point. If this API is not available, the codec won't load.

Since the encoding package's search function is using standard Python imports for loading the codecs, we can also benefit from a nice side-effect: codec names can use Python's dotted names (which then map to standard Python packages). This allows codec writers like Tamito to place their codecs into a Python package, thereby avoiding any conflict with other authors of codecs with similar names.

> > and provide a firm rationale for deciding
> > whether any proposed new form of encoding or decoding
> > should be placed in this namespace or the module namespace.
>
> *My* answer would be that any function that has strings (8-bit or
> Unicode) as both domain and range is potentially a codec.

Right. (Hey, the first time *we* agree on something ;-)

--
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting: http://www.egenix.com/
Python Software: http://www.lemburg.com/python/

From mal@lemburg.com Wed Jun 13 13:53:50 2001
From: mal@lemburg.com (M.-A. Lemburg)
Date: Wed, 13 Jun 2001 14:53:50 +0200
Subject: [Python-Dev] Weird message to stderr
Message-ID: <3B27625E.F18046F7@lemburg.com>

Running Python 2.1 using a .pyc file I get these weird messages printed to stderr:

run_pyc_file: nested_scopes: 0

These originate in pythonrun.c:

static PyObject *
run_pyc_file(FILE *fp, char *filename, PyObject *globals, PyObject *locals,
             PyCompilerFlags *flags)
{
    PyCodeObject *co;
    PyObject *v;
    long magic;
    long PyImport_GetMagicNumber(void);

    magic = PyMarshal_ReadLongFromFile(fp);
    if (magic != PyImport_GetMagicNumber()) {
        PyErr_SetString(PyExc_RuntimeError,
                        "Bad magic number in .pyc file");
        return NULL;
    }
    (void) PyMarshal_ReadLongFromFile(fp);
    v = PyMarshal_ReadLastObjectFromFile(fp);
    fclose(fp);
    if (v == NULL || !PyCode_Check(v)) {
        Py_XDECREF(v);
        PyErr_SetString(PyExc_RuntimeError,
                        "Bad code object in .pyc file");
        return NULL;
    }
    co = (PyCodeObject *)v;
    v = PyEval_EvalCode(co, globals, locals);
    if (v && flags) {
        if (co->co_flags & CO_NESTED)
            flags->cf_nested_scopes = 1;
        fprintf(stderr, "run_pyc_file: nested_scopes: %d\n",
                flags->cf_nested_scopes);
    }
    Py_DECREF(co);
    return v;
}

Is this a left-over debug printf or should I be warned in some way ?

--
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting: http://www.egenix.com/
Python Software: http://www.lemburg.com/python/

From guido@digicool.com Wed Jun 13 15:41:37 2001
From: guido@digicool.com (Guido van Rossum)
Date: Wed, 13 Jun 2001 10:41:37 -0400
Subject: [Python-Dev] Re: Adding .decode() method to Unicode
In-Reply-To: Your message of "Tue, 12 Jun 2001 22:40:01 EDT."
References:
Message-ID: <200106131441.KAA16557@cj20424-a.reston1.va.home.com>

Wow, this almost looks like a real flamefest. ("Flame" being defined as the presence of metacomments.)

(In the following, s is an 8-bit string, u is a Unicode string, and e is an encoding name.)
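A quick sketch of the calls under discussion, for concreteness ('latin-1' is just a stand-in for any 8-bit encoding, and the data is made up):

s = 'Andr\xe9'                    # an 8-bit string
u = unicode(s, 'latin-1')         # decode: 8-bit string -> Unicode
assert u.encode('latin-1') == s   # encode: Unicode -> 8-bit string
# s.encode(e) exists too; it is unicode(s).encode(e) in disguise,
# using the default (ASCII) codec for the implicit decoding step:
assert 'abc'.encode('latin-1') == unicode('abc').encode('latin-1')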
The original design of the encode() methods of string and Unicode objects (in 2.0 and 2.1) is asymmetric, and clearly geared towards Unicode codecs only: to decode an 8-bit string you *have* to use unicode(s, encoding) while to encode a Unicode string into a specific 8-bit encoding you *have* to use u.encode(e). 8-bit strings also have an encode() method: s.encode(e) is the same as unicode(s).encode(e). (This is useful since code that expects Unicode strings should also work when it is passed ASCII-encoded 8-bit strings.) I'd say there's no need for s.decode(e), since this can already be done with unicode(s, e) -- and to me that API looks better since it clearly states that the result is Unicode. We *could* have designed the encoding API similarly: str(u, e) is available, symmetric with unicode(s, e), and a logical extension of str(u) which uses the default encoding. But I accept the argument that u.encode(e) is better because it emphasizes the encoding action, and because it means no API changes to str(). I guess what I'm saying here is that 'str' does not give enough of a clue that an encoding action is going on, while 'unicode' *does* give a clue that a decoding action is being done: as soon as you read "Unicode" you think "Mmm, encodings..." -- but "str" is pretty neutral, so u.encode(e) is needed to give a clue. Marc-Andre proposes (and has partially checked in) changes that stretch the meaning of the encode() method, and add a decode() method, to be basically interfaces to anything you can do with the codecs module. The return type of encode() and decode() is now determined by the codec (formerly, encode() always returned an 8-bit string). Some new codecs have been added that do things like gzip and base64. Initially, I liked this, and even contributed a codec. But questions keep coming up. What is the problem being solved? True, the codecs module has a clumsy interface if you just want to invoke a codec on some data. But that can easily be remedied by adding convenience functions encode() and decode() to codecs.py -- which would have the added advantage that it would work for other datatypes that support the buffer interface, e.g. codecs.encode(myPILobject, "base64"). True, the "codec" pattern can be used for other encodings than Unicode. But it seems to me that the entire codecs architecture is rather strongly geared towards en/decoding Unicode, and it's not clear how well other codecs fit in this pattern (e.g. I noticed that all the non-Unicode codecs ignore the error handling parameter or assert that it is set to 'strict'). Is it really right that x.encode("gzip") and x.encode("utf-8") look similar, while the former requires an 8-bit string and the latter only makes sense if x is a Unicode string? Another (minor) issue is that Unicode encoding names are an IANA namespace. Is it wise to add our own names to this? I'm not forcing a decision here, but I do ask that we consider these issues before forging ahead with what might be a mistake. A PEP would be most helpful to focus the discussion. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@digicool.com Wed Jun 13 16:19:03 2001 From: guido@digicool.com (Guido van Rossum) Date: Wed, 13 Jun 2001 11:19:03 -0400 Subject: [Python-Dev] Releasing 2.0.1 Message-ID: <200106131519.LAA16978@cj20424-a.reston1.va.home.com> I think it's now or never with the 2.0.1 release. Moshe seems to have disappeared from the face of the earth. 
His last mail to me (May 23) suggested that it was good to go except for the SRE checkin and the NEWS file. I did the SRE checkin today (making it identical to what's in 2.1, per /F's recommendation) and added a note about that to the NEWS file -- I wouldn't know what else would be needed there. So I think it's good to go now. I can release a 2.0.1c1 this week (indicating a release candidate) and a final 2.0.1 next week. If you know a good reason why I should hold off on releasing this, or if you have a patch that absolutely should make it into 2.0.1, please let me know NOW! This project is way overdue. (Thomas is ready to release 2.1.1 as soon as this goes out, I believe. :-) --Guido van Rossum (home page: http://www.python.org/~guido/) From fredrik@pythonware.com Wed Jun 13 16:29:19 2001 From: fredrik@pythonware.com (Fredrik Lundh) Date: Wed, 13 Jun 2001 17:29:19 +0200 Subject: [Python-Dev] Releasing 2.0.1 References: <200106131519.LAA16978@cj20424-a.reston1.va.home.com> Message-ID: <023f01c0f41d$9dfb87b0$0900a8c0@spiff> guido wrote: > So I think it's good to go now. I can release a 2.0.1c1 this week > (indicating a release candidate) and a final 2.0.1 next week. +1 From skip@pobox.com (Skip Montanaro) Wed Jun 13 16:49:58 2001 From: skip@pobox.com (Skip Montanaro) (Skip Montanaro) Date: Wed, 13 Jun 2001 10:49:58 -0500 Subject: [Python-Dev] on announcing point releases Message-ID: <15143.35750.837420.376281@beluga.mojam.com> (Just thinking out loud) I wonder if it would help gain wider distribution for the point releases if explicit announcements were sent to the various Linux distributors so they could create updated packages (RPMs, debs, whatever) for their users. On a related note, I see one RedHat email address on python-dev (and one Debian address on python-list). Are there other Linux distributions that are heavy Python users (as opposed to simply packaging it up for inclusion)? If so, perhaps they should be invited to join python-dev. Skip From niemeyer@conectiva.com Wed Jun 13 16:54:08 2001 From: niemeyer@conectiva.com (Gustavo Niemeyer) Date: Wed, 13 Jun 2001 12:54:08 -0300 Subject: [Python-Dev] sre improvements Message-ID: <20010613125408.W13940@tux.distro.conectiva> I'm forwarding this to the dev list.. probably somebody here knows about this... -------------- Hi there!! I have looked into sre, and was wondering if somebody is working to implement more features in it. I'd like, for example, to see the (?(1)blah) operator, available in perl, working. Should I care about this? Should I write some code?? Anybody working in sre currently? Thanks! -- Gustavo Niemeyer [ 2AAC 7928 0FBF 0299 5EB5 60E2 2253 B29A 6664 3A0C ] From skip@pobox.com (Skip Montanaro) Wed Jun 13 17:03:58 2001 From: skip@pobox.com (Skip Montanaro) (Skip Montanaro) Date: Wed, 13 Jun 2001 11:03:58 -0500 Subject: [Python-Dev] sre improvements In-Reply-To: <20010613125408.W13940@tux.distro.conectiva> References: <20010613125408.W13940@tux.distro.conectiva> Message-ID: <15143.36590.447465.657241@beluga.mojam.com> Gustavo> I'd like, for example, to see the (?(1)blah) operator, Gustavo> available in perl, working. Gustavo, For the non-Perl-heads on the list, can you explain what the (?(1)blah) operator does? 
-- Skip Montanaro (skip@pobox.com) (847)971-7098 From gregor@hoffleit.de Wed Jun 13 17:13:17 2001 From: gregor@hoffleit.de (Gregor Hoffleit) Date: Wed, 13 Jun 2001 18:13:17 +0200 Subject: [Python-Dev] on announcing point releases In-Reply-To: <15143.35750.837420.376281@beluga.mojam.com>; from skip@pobox.com on Wed, Jun 13, 2001 at 10:49:58AM -0500 References: <15143.35750.837420.376281@beluga.mojam.com> Message-ID: <20010613181317.B30006@mediasupervision.de> On Wed, Jun 13, 2001 at 10:49:58AM -0500, Skip Montanaro wrote: > I wonder if it would help gain wider distribution for the point releases if > explicit announcements were sent to the various Linux distributors so they > could create updated packages (RPMs, debs, whatever) for their users. > > On a related note, I see one RedHat email address on python-dev (and one > Debian address on python-list). Are there other Linux distributions that > are heavy Python users (as opposed to simply packaging it up for inclusion)? > If so, perhaps they should be invited to join python-dev. Rest assured that Debian is present on python-dev as well, and nervously looking forward to the maintenance releases ;-) I hope 2.1.1 will make it out in time as well for our next release (being aware that 'before the next Debian release happens' is no very tight timeframe ;-). Gregor From guido@digicool.com Wed Jun 13 17:16:42 2001 From: guido@digicool.com (Guido van Rossum) Date: Wed, 13 Jun 2001 12:16:42 -0400 Subject: [Python-Dev] Re: PEP 259: Omit printing newline after newline Message-ID: <200106131616.MAA17468@cj20424-a.reston1.va.home.com> OK, OK, PEP 259 is dead. It seemed a nice idea at the time. :-) Alex and others, if you're serious about implementing print as __print__(), why don't you write a PEP? --Guido van Rossum (home page: http://www.python.org/~guido/) From fdrake@acm.org Wed Jun 13 17:21:20 2001 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Wed, 13 Jun 2001 12:21:20 -0400 (EDT) Subject: [Python-Dev] on announcing point releases In-Reply-To: <20010613181317.B30006@mediasupervision.de> References: <15143.35750.837420.376281@beluga.mojam.com> <20010613181317.B30006@mediasupervision.de> Message-ID: <15143.37632.758887.966026@cj42289-a.reston1.va.home.com> Gregor Hoffleit writes: > looking forward to the maintenance releases ;-) I hope 2.1.1 will make it > out in time as well for our next release (being aware that 'before the next Personally, I see no reason for Thomas to wait for the 2.0.1 release if he doesn't want to. -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From fredrik@pythonware.com Wed Jun 13 17:32:13 2001 From: fredrik@pythonware.com (Fredrik Lundh) Date: Wed, 13 Jun 2001 18:32:13 +0200 Subject: [Python-Dev] sre improvements References: <20010613125408.W13940@tux.distro.conectiva> <15143.36590.447465.657241@beluga.mojam.com> Message-ID: <007801c0f426$84d1f220$4ffa42d5@hagrid> skip wrote: > Gustavo> I'd like, for example, to see the (?(1)blah) operator, > Gustavo> available in perl, working. > > For the non-Perl-heads on the list, can you explain what the (?(1)blah) > operator does? conditionals: (?(cond)true) (?(cond)true|false) where cond is a group number (true if defined) or an assertion pattern, and true/false are patterns. 
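sre doesn't implement these today, but for simple cases an ordinary alternation gets the same effect -- a sketch, using the example from perlre of a chunk of non-parens optionally wrapped in parens:

import re

# Perl: m{ ( \( )? [^()]+ (?(1) \) ) }x
# a rough conditional-free equivalent:
p = re.compile(r"\([^()]+\)|[^()]+")

assert p.match("(hello)").group() == "(hello)"
assert p.match("hello").group() == "hello"
assert p.match("(hello") is None    # unbalanced "(" is rejected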
(imo, whoever invented that needs help ;-) From akuchlin@mems-exchange.org Wed Jun 13 17:39:58 2001 From: akuchlin@mems-exchange.org (Andrew Kuchling) Date: Wed, 13 Jun 2001 12:39:58 -0400 Subject: [Python-Dev] sre improvements Message-ID: >For the non-Perl-heads on the list, can you explain what the (?(1)blah) >operator does? Conditionals. From http://www.perl.com/pub/doc/manual/html/pod/perlre.html, (...)(?(1)A|B) will match 'A' if group 1 matched, and B if it didn't. I'm not sure how "matched" is defined, as the Perl docs are vague; judging from the example, it means 'matched something of nonzero length'. Perl 5.6 introduced a bunch of new regex features, but I'm not sure how much we actually *care* about them; they're no doubt useful if regexes are the only tool you've got and you try to do full parsers using them, but they're also complicated to explain and will make the compiler messier. For example, lookaheads can also go into the conditional, not just an integer. (?i) now obeys the scoping from parens, and you can turn it off with (?-i). If Gustavo wants to implement these features and /F approves of his patches, then sure, put them in. But if either of those conditions fails, little will be lost. --amk From dmitry.antipov@auriga.ru Wed Jun 13 17:46:09 2001 From: dmitry.antipov@auriga.ru (dmitry.antipov@auriga.ru) Date: Wed, 13 Jun 2001 20:46:09 +0400 Subject: [Python-Dev] Why not Lisp-like list-related functions ? Message-ID: <3B2798D1.16F832A3@auriga.ru> Hello all, I'm new to Python but quite familiar with Lisp. So my question is about Python list-related functions. Why append(), extend(), sort(), reverse() etc. doesn't return a reference to it's own (modified) argument ? IMHO (I'm tweaking Python 2.1 to allow first example possible), >>> [3 + x * 2 for x in [5, 8, 9, 3].sort()].extend([6, 3, 8].reverse()) [9, 13, 19, 21, 8, 3, 6] >>> looks much better (and more "functional") than >>> x = [5, 8, 9, 3] >>> x.sort() >>> x = [3 + x * 2 for x in x] >>> y = [6, 3, 8] >>> y.reverse() >>> x.extend(y) >>> x [9, 13, 19, 21, 8, 3, 6] >>> Python designers and fans, please explain it to me :-). Any comments are welcome. Thanks and reply to me directly if possible, Dmitry Antipov From guido@digicool.com Wed Jun 13 18:01:34 2001 From: guido@digicool.com (Guido van Rossum) Date: Wed, 13 Jun 2001 13:01:34 -0400 Subject: [Python-Dev] Weird message to stderr Message-ID: <200106131701.NAA17619@cj20424-a.reston1.va.home.com> > Running Python 2.1 using a .pyc file I get these weird messages > printed to stderr: > > run_pyc_file: nested_scopes: 0 > > These originate in pythonrun.c: > > static PyObject * > run_pyc_file(FILE *fp, char *filename, PyObject *globals, PyObject *locals, > PyCompilerFlags *flags) > { [...] > if (v && flags) { > if (co->co_flags & CO_NESTED) > flags->cf_nested_scopes = 1; > fprintf(stderr, "run_pyc_file: nested_scopes: %d\n", > flags->cf_nested_scopes); > } > Py_DECREF(co); > return v; > } > > Is this is left over debug printf or should I be warned > in some way ? I'll channel Jeremy... Looks like a debug message -- this code isn't tested by the standard test suite. Feel free to get rid of the fprintf() statement (and no, you don't have to write a PEP for this :-). --Guido van Rossum (home page: http://www.python.org/~guido/) From fredrik@pythonware.com Wed Jun 13 18:06:52 2001 From: fredrik@pythonware.com (Fredrik Lundh) Date: Wed, 13 Jun 2001 19:06:52 +0200 Subject: [Python-Dev] Why not Lisp-like list-related functions ? 
References: <3B2798D1.16F832A3@auriga.ru> Message-ID: <012d01c0f42b$45453b30$4ffa42d5@hagrid> Dmitry wrote: > I'm new to Python but quite familiar with Lisp. So my question is > about Python list-related functions. Why append(), extend(), sort(), > reverse() etc. doesn't return a reference to it's own (modified) > argument ? doesn't Lisp have a FAQ? ;-) http://www.python.org/doc/FAQ.html#6.20 Q. Why doesn't list.sort() return the sorted list? ... basically, operations that modify an object generally don't return the object itself, to avoid mistakes like: for item in list.reverse(): print item # backwards ... for item in list.reverse(): print item # backwards, or? a slightly more pythonic way would be to add sorted, extended, reversed (etc) -- but that leads to method bloat. in addition, based on studying huge amounts of python code, I doubt cascading list operations would save the world that much typing... followups to python-list@python.org From paulp@ActiveState.com Wed Jun 13 18:22:09 2001 From: paulp@ActiveState.com (Paul Prescod) Date: Wed, 13 Jun 2001 10:22:09 -0700 Subject: [Python-Dev] Adding Asian codecs to the core References: <20010612234815.2C90599C82@waltz.rahul.net> <3B26B491.CA8536BD@ActiveState.com> <3B2710BB.CFD8215@lemburg.com> Message-ID: <3B27A141.6C69EC55@ActiveState.com> "M.-A. Lemburg" wrote: > >... > > > > We really shouldn't consider the Japanese without Chinese and Korean. > > And those both seem *larger* than the Japanese. :( > > Unfortunately, these aren't available under a usable (=non-GPL) > license yet. Frank Chen has agreed to make them available under a Python-style license. > > What if we add them to CVS and formally maintain them as part of the > > core but distribute them as a separate download? > > Good idea. All in favour? -- Take a recipe. Leave a recipe. Python Cookbook! http://www.ActiveState.com/pythoncookbook From aahz@rahul.net Wed Jun 13 18:32:24 2001 From: aahz@rahul.net (Aahz Maruch) Date: Wed, 13 Jun 2001 10:32:24 -0700 (PDT) Subject: [Python-Dev] Adding Asian codecs to the core In-Reply-To: <3B27A141.6C69EC55@ActiveState.com> from "Paul Prescod" at Jun 13, 2001 10:22:09 AM Message-ID: <20010613173224.0FFB999C87@waltz.rahul.net> >>> What if we add them to CVS and formally maintain them as part of the >>> core but distribute them as a separate download? >> >> Good idea. > > All in favour? +1 -- --- Aahz (@pobox.com) Hugs and backrubs -- I break Rule 6 <*> http://www.rahul.net/aahz/ Androgynous poly kinky vanilla queer het Pythonista I don't really mind a person having the last whine, but I do mind someone else having the last self-righteous whine. From gward@python.net Wed Jun 13 19:53:20 2001 From: gward@python.net (Greg Ward) Date: Wed, 13 Jun 2001 14:53:20 -0400 Subject: [Python-Dev] sre improvements In-Reply-To: <007801c0f426$84d1f220$4ffa42d5@hagrid>; from fredrik@pythonware.com on Wed, Jun 13, 2001 at 06:32:13PM +0200 References: <20010613125408.W13940@tux.distro.conectiva> <15143.36590.447465.657241@beluga.mojam.com> <007801c0f426$84d1f220$4ffa42d5@hagrid> Message-ID: <20010613145320.G5114@gerg.ca> On 13 June 2001, Fredrik Lundh said: > conditionals: > > (?(cond)true) > (?(cond)true|false) > > where cond is a group number (true if defined) or an assertion > pattern, and true/false are patterns. > > (imo, whoever invented that needs help ;-) I think I'd have to agree with /F on this one... 
somewhere around Perl 5.003 or 5.004, regexes in Perl went from being a powerful and really cool facility to being a massively overgrown language-within-a-language. I *tried* to use some of the fancy new features a few times out of curiosity, but could never get them to work. (At the time, I think I was a pretty sharp Perl programmer, although I've dulled since then.) Greg -- Greg Ward - Unix bigot gward@python.net http://starship.python.net/~gward/ No animals were harmed in transmitting this message. From jepler@inetnebr.com Wed Jun 13 17:09:58 2001 From: jepler@inetnebr.com (Jeff Epler) Date: Wed, 13 Jun 2001 11:09:58 -0500 Subject: [Python-Dev] sre improvements In-Reply-To: <15143.36590.447465.657241@beluga.mojam.com>; from skip@pobox.com on Wed, Jun 13, 2001 at 11:03:58AM -0500 References: <20010613125408.W13940@tux.distro.conectiva> <15143.36590.447465.657241@beluga.mojam.com> Message-ID: <20010613110957.C29405@inetnebr.com> On Wed, Jun 13, 2001 at 11:03:58AM -0500, Skip Montanaro wrote: > > Gustavo> I'd like, for example, to see the (?(1)blah) operator, > Gustavo> available in perl, working. > > Gustavo, > > For the non-Perl-heads on the list, can you explain what the (?(1)blah) > operator does? from perlre(1): (?(condition)yes-pattern) Conditional expression. (condition) should be either an integer in parentheses (which is valid if the corresponding pair of parentheses matched), or lookahead/lookbehind/evaluate zero- width assertion. Say, m{ ( \( )? [^()]+ (?(1) \) ) }x matches a chunk of non-parentheses, possibly included in parentheses themselves. Jeff From tim.one@home.com Thu Jun 14 07:12:48 2001 From: tim.one@home.com (Tim Peters) Date: Thu, 14 Jun 2001 02:12:48 -0400 Subject: [Python-Dev] Adding .decode() method to Unicode In-Reply-To: <3B2664AD.B560D685@ActiveState.com> Message-ID: [Paul Prescod] > ... > We could argue angels on the head of a pin until the cows come home but > 90% of all Python users think of 8-bit strings as strings of characters. Actually, if you count me, make that 92%. some-things-were-easier-when-python-had-50-users-and-i-was-two- of-them-ly y'rs - tim From paulp@ActiveState.com Thu Jun 14 08:30:19 2001 From: paulp@ActiveState.com (Paul Prescod) Date: Thu, 14 Jun 2001 00:30:19 -0700 Subject: [Python-Dev] sre improvements References: <20010613125408.W13940@tux.distro.conectiva> <15143.36590.447465.657241@beluga.mojam.com> <007801c0f426$84d1f220$4ffa42d5@hagrid> <20010613145320.G5114@gerg.ca> Message-ID: <3B28680B.A46CF171@ActiveState.com> Greg Ward wrote: > >... > > I think I'd have to agree with /F on this one... somewhere around Perl > 5.003 or 5.004, regexes in Perl went from being a powerful and really > cool facility to being a massively overgrown language-within-a-language. > I *tried* to use some of the fancy new features a few times out of > curiosity, but could never get them to work. (At the time, I think I > was a pretty sharp Perl programmer, although I've dulled since then.) I would rather see us try a new approach to regular expressions. I've seen a few proposals for more verbose-but-readable syntaxes. I think one was from Greg Ewing? And maybe one from Ping? For those of us who use regular expressions only once in a while (i.e. the lucky ones), the current syntax is a holy terror. Which characters are magical again? In what contexts? With how many levels of backslashing? Upper case W versus lower case W? Obviously we can never abandon the tried and true Perl5 RE module, but I think we could have another syntax on top. -- Take a recipe. 
Leave a recipe. Python Cookbook! http://www.ActiveState.com/pythoncookbook From arigo@ulb.ac.be Thu Jun 14 09:58:48 2001 From: arigo@ulb.ac.be (Armin Rigo) Date: Thu, 14 Jun 2001 10:58:48 +0200 (MET DST) Subject: [Python-Dev] Special-casing "O" Message-ID: Hello everybody, For comparison purposes, I implemented the idea of optimizing PyArg_ParseTuple calls by modifying the C code itself. Here is the result: http://homepages.ulb.ac.be/~arigo/pyarg_pp.tgz I did not upload this as a patch at SourceForge for several reasons. The most fundamental is that it raises bootstrapping issues: how can we compile the Python interpreter if we first have to run a Python script on the source files ? Fixing this would make the Makefiles significantly more complex. The other reason is that the METH_O solution is probably still faster, as it often completely avoids to build the 1-tuple of arguments. More serious performance tests might be needed, however. A bientot, Armin. From thomas@xs4all.net Thu Jun 14 12:10:01 2001 From: thomas@xs4all.net (Thomas Wouters) Date: Thu, 14 Jun 2001 13:10:01 +0200 Subject: [Python-Dev] Releasing 2.0.1 In-Reply-To: <200106131519.LAA16978@cj20424-a.reston1.va.home.com> References: <200106131519.LAA16978@cj20424-a.reston1.va.home.com> Message-ID: <20010614131001.B1659@xs4all.nl> On Wed, Jun 13, 2001 at 11:19:03AM -0400, Guido van Rossum wrote: > So I think it's good to go now. I can release a 2.0.1c1 this week > (indicating a release candidate) and a final 2.0.1 next week. +1 here. > If you know a good reason why I should hold off on releasing this, or > if you have a patch that absolutely should make it into 2.0.1, please > let me know NOW! This project is way overdue. (Thomas is ready to > release 2.1.1 as soon as this goes out, I believe. :-) Well, not quite, but I can put in a couple of allnighters (I want to do a review of all log-messages since 2.1-final, to see if I missed any checkin messages, and I want to update the NEWS file with a list of bugs fixed) and have it ready in a week or two. I don't think 2.1.1 should be released *that* soon after 2.0.1 anyway. I noticed this in the LICENCE file, by the way: Python 2.1 is a derivative work of Python 1.6.1, as well as of Python 2.0. and 8. By copying, installing or otherwise using Python 2.1, Licensee agrees to be bound by the terms and conditions of this License Agreement. -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From guido@digicool.com Thu Jun 14 12:14:22 2001 From: guido@digicool.com (Guido van Rossum) Date: Thu, 14 Jun 2001 07:14:22 -0400 Subject: [Python-Dev] Re: Why not Lisp-like list-related functions ? Message-ID: <200106141114.HAA25430@cj20424-a.reston1.va.home.com> > Hello all, > > I'm new to Python but quite familiar with Lisp. So my question is > about Python list-related functions. Why append(), extend(), sort(), > reverse() etc. doesn't return a reference to it's own (modified) > argument ? IMHO (I'm tweaking Python 2.1 to allow first example > possible), > > >>> [3 + x * 2 for x in [5, 8, 9, 3].sort()].extend([6, 3, 8].reverse()) > [9, 13, 19, 21, 8, 3, 6] > >>> > > looks much better (and more "functional") than > > >>> x = [5, 8, 9, 3] > >>> x.sort() > >>> x = [3 + x * 2 for x in x] > >>> y = [6, 3, 8] > >>> y.reverse() > >>> x.extend(y) > >>> x > [9, 13, 19, 21, 8, 3, 6] > >>> > > Python designers and fans, please explain it to me :-). > Any comments are welcome. 
> > Thanks and reply to me directly if possible, > Dmitry Antipov Funny, to me your first form is much harder to read than your second. With the first form, I have to stop and think and look carefully at where the brackets are to see in which order the operations are executed, while in the second form it's obvious, because it's broken down in smaller chunks. So I guess that's the real reason: Python users have a procedural brain, not a functional brain, and we don't like Lispish code. Maybe we also have a smaller brain than the typical Lisper -- I would say, that would make us more normal, and if Python caters to people with a closer-to-average brain size, that would mean more people will be able to program in Python. History will decide... --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@digicool.com Thu Jun 14 12:31:16 2001 From: guido@digicool.com (Guido van Rossum) Date: Thu, 14 Jun 2001 07:31:16 -0400 Subject: [Python-Dev] Adding Asian codecs to the core Message-ID: <200106141131.HAA25522@cj20424-a.reston1.va.home.com> > > > What if we add them to CVS and formally maintain them as part of the > > > core but distribute them as a separate download? > > > > Good idea. > > All in favour? +1, as long as they're not in the CVS subtree that's normally extracted for a regular source distribution. I propose this location in the CVS tree: python/dist/encodings/... (So 'encodings' would be a sibling of 'src', which has been pretty lonely ever since I started using CVS. ;-) --Guido van Rossum (home page: http://www.python.org/~guido/) From akuchlin@mems-exchange.org Thu Jun 14 16:19:28 2001 From: akuchlin@mems-exchange.org (Andrew Kuchling) Date: Thu, 14 Jun 2001 11:19:28 -0400 Subject: [Python-Dev] Re: Why not Lisp-like list-related functions ? In-Reply-To: <200106141114.HAA25430@cj20424-a.reston1.va.home.com>; from guido@digicool.com on Thu, Jun 14, 2001 at 07:14:22AM -0400 References: <200106141114.HAA25430@cj20424-a.reston1.va.home.com> Message-ID: <20010614111928.A4560@ute.cnri.reston.va.us> On Thu, Jun 14, 2001 at 07:14:22AM -0400, Guido van Rossum wrote: >Maybe we also have a smaller brain than the typical Lisper -- I would >say, that would make us more normal, and if Python caters to people >with a closer-to-average brain size, that would mean more people will >be able to program in Python. History will decide... I thought it already has, pretty much. --amk From tim@digicool.com Thu Jun 14 17:49:07 2001 From: tim@digicool.com (Tim Peters) Date: Thu, 14 Jun 2001 12:49:07 -0400 Subject: [Python-Dev] PEP 255: Simple Generators Message-ID: You can view an HTML version of PEP 255 here: http://python.sourceforge.net/peps/pep-0255.html Discussion should take place primarily on the Python Iterators list: mailto:python-iterators@lists.sourceforge.net If replying directly to this message, please remove (at least) Python-Dev and Python-Announce. PEP: 255 Title: Simple Generators Version: $Revision: 1.3 $ Author: nas@python.ca (Neil Schemenauer), tim.one@home.com (Tim Peters), magnus@hetland.org (Magnus Lie Hetland) Discussion-To: python-iterators@lists.sourceforge.net Status: Draft Type: Standards Track Requires: 234 Created: 18-May-2001 Python-Version: 2.2 Post-History: 14-Jun-2001 Abstract This PEP introduces the concept of generators to Python, as well as a new statement used in conjunction with them, the "yield" statement. 
Motivation When a producer function has a hard enough job that it requires maintaining state between values produced, most programming languages offer no pleasant and efficient solution beyond adding a callback function to the producer's argument list, to be called with each value produced. For example, tokenize.py in the standard library takes this approach: the caller must pass a "tokeneater" function to tokenize(), called whenever tokenize() finds the next token. This allows tokenize to be coded in a natural way, but programs calling tokenize are typically convoluted by the need to remember between callbacks which token(s) were seen last. The tokeneater function in tabnanny.py is a good example of that, maintaining a state machine in global variables, to remember across callbacks what it has already seen and what it hopes to see next. This was difficult to get working correctly, and is still difficult for people to understand. Unfortunately, that's typical of this approach. An alternative would have been for tokenize to produce an entire parse of the Python program at once, in a large list. Then tokenize clients could be written in a natural way, using local variables and local control flow (such as loops and nested if statements) to keep track of their state. But this isn't practical: programs can be very large, so no a priori bound can be placed on the memory needed to materialize the whole parse; and some tokenize clients only want to see whether something specific appears early in the program (e.g., a future statement, or, as is done in IDLE, just the first indented statement), and then parsing the whole program first is a severe waste of time. Another alternative would be to make tokenize an iterator[1], delivering the next token whenever its .next() method is invoked. This is pleasant for the caller in the same way a large list of results would be, but without the memory and "what if I want to get out early?" drawbacks. However, this shifts the burden on tokenize to remember *its* state between .next() invocations, and the reader need only glance at tokenize.tokenize_loop() to realize what a horrid chore that would be. Or picture a recursive algorithm for producing the nodes of a general tree structure: to cast that into an iterator framework requires removing the recursion manually and maintaining the state of the traversal by hand. A fourth option is to run the producer and consumer in separate threads. This allows both to maintain their states in natural ways, and so is pleasant for both. Indeed, Demo/threads/Generator.py in the Python source distribution provides a usable synchronized-communication class for doing that in a general way. This doesn't work on platforms without threads, though, and is very slow on platforms that do (compared to what is achievable without threads). A final option is to use the Stackless[2][3] variant implementation of Python instead, which supports lightweight coroutines. This has much the same programmatic benefits as the thread option, but is much more efficient. However, Stackless is a controversial rethinking of the Python core, and it may not be possible for Jython to implement the same semantics. This PEP isn't the place to debate that, so suffice it to say here that generators provide a useful subset of Stackless functionality in a way that fits easily into the current CPython implementation, and is believed to be relatively straightforward for other Python implementations. That exhausts the current alternatives. 
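To make the iterator alternative's burden concrete before moving on, consider what even a trivial producer looks like when its state has to be kept between .next() invocations by hand (an illustrative sketch, not text from the PEP; compare it with the fib() generator shown below):

    # Illustrative sketch: a Fibonacci producer recast as a
    # hand-written iterator.  Every piece of state must be hoisted
    # into instance attributes and maintained by hand across
    # .next() calls.
    class Fib:
        def __init__(self):
            self.a, self.b = 0, 1

        def __iter__(self):
            return self

        def next(self):
            value = self.b
            self.a, self.b = self.b, self.a + self.b
            return value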
Some other high-level languages provide pleasant solutions, notably iterators in Sather[4], which were inspired by iterators in CLU; and generators in Icon[5], a novel language where every expression "is a generator". There are differences among these, but the basic idea is the same: provide a kind of function that can return an intermediate result ("the next value") to its caller, but maintaining the function's local state so that the function can be resumed again right where it left off.

A very simple example:

    def fib():
        a, b = 0, 1
        while 1:
            yield b
            a, b = b, a+b

When fib() is first invoked, it sets a to 0 and b to 1, then yields b back to its caller. The caller sees 1. When fib is resumed, from its point of view the yield statement is really the same as, say, a print statement: fib continues after the yield with all local state intact. a and b then become 1 and 1, and fib loops back to the yield, yielding 1 to its invoker. And so on. From fib's point of view it's just delivering a sequence of results, as if via callback. But from its caller's point of view, the fib invocation is an iterable object that can be resumed at will. As in the thread approach, this allows both sides to be coded in the most natural ways; but unlike the thread approach, this can be done efficiently and on all platforms. Indeed, resuming a generator should be no more expensive than a function call.

The same kind of approach applies to many producer/consumer functions. For example, tokenize.py could yield the next token instead of invoking a callback function with it as argument, and tokenize clients could iterate over the tokens in a natural way: a Python generator is a kind of Python iterator[1], but of an especially powerful kind.

Specification

A new statement is introduced:

    yield_stmt: "yield" expression_list

"yield" is a new keyword, so a future statement[8] is needed to phase this in. [XXX spell this out]

The yield statement may only be used inside functions. A function that contains a yield statement is called a generator function.

When a generator function is called, the actual arguments are bound to function-local formal argument names in the usual way, but no code in the body of the function is executed. Instead a generator-iterator object is returned; this conforms to the iterator protocol[6], so in particular can be used in for-loops in a natural way. Note that when the intent is clear from context, the unqualified name "generator" may be used to refer either to a generator-function or a generator-iterator.

Each time the .next() method of a generator-iterator is invoked, the code in the body of the generator-function is executed until a yield or return statement (see below) is encountered, or until the end of the body is reached. If a yield statement is encountered, the state of the function is frozen, and the value of expression_list is returned to .next()'s caller. By "frozen" we mean that all local state is retained, including the current bindings of local variables, the instruction pointer, and the internal evaluation stack: enough information is saved so that the next time .next() is invoked, the function can proceed exactly as if the yield statement were just another external call.

A generator function can also contain return statements of the form:

    "return"

Note that an expression_list is not allowed on return statements in the body of a generator (although, of course, they may appear in the bodies of non-generator functions nested within the generator).

When a return statement is encountered, nothing is returned, but a StopIteration exception is raised, signalling that the iterator is exhausted. The same is true if control flows off the end of the function. Note that return means "I'm done, and have nothing interesting to return", for both generator functions and non-generator functions.

Example

    # A binary tree class.
    class Tree:

        def __init__(self, label, left=None, right=None):
            self.label = label
            self.left = left
            self.right = right

        def __repr__(self, level=0, indent="    "):
            s = level*indent + `self.label`
            if self.left:
                s = s + "\n" + self.left.__repr__(level+1, indent)
            if self.right:
                s = s + "\n" + self.right.__repr__(level+1, indent)
            return s

        def __iter__(self):
            return inorder(self)

    # Create a Tree from a list.
    def tree(list):
        n = len(list)
        if n == 0:
            return []
        i = n / 2
        return Tree(list[i], tree(list[:i]), tree(list[i+1:]))

    # A recursive generator that generates Tree leaves in in-order.
    def inorder(t):
        if t:
            for x in inorder(t.left):
                yield x
            yield t.label
            for x in inorder(t.right):
                yield x

    # Show it off: create a tree.
    t = tree("ABCDEFGHIJKLMNOPQRSTUVWXYZ")
    # Print the nodes of the tree in in-order.
    for x in t:
        print x,
    print

    # A non-recursive generator.
    def inorder(node):
        stack = []
        while node:
            while node.left:
                stack.append(node)
                node = node.left
            yield node.label
            while not node.right:
                try:
                    node = stack.pop()
                except IndexError:
                    return
                yield node.label
            node = node.right

    # Exercise the non-recursive generator.
    for x in t:
        print x,
    print

Q & A

    Q. Why a new keyword? Why not a builtin function instead?

    A. Control flow is much better expressed via keyword in Python,
       and yield is a control construct. It's also believed that
       efficient implementation in Jython requires that the compiler
       be able to determine potential suspension points at
       compile-time, and a new keyword makes that easy.

Reference Implementation

    A preliminary patch against the CVS Python source is available[7].

Footnotes and References

    [1] PEP 234, http://python.sf.net/peps/pep-0234.html
    [2] http://www.stackless.com/
    [3] PEP 219, http://python.sf.net/peps/pep-0219.html
    [4] "Iteration Abstraction in Sather", Murer, Omohundro, Stoutamire and Szyperski,
        http://www.icsi.berkeley.edu/~sather/Publications/toplas.html
    [5] http://www.cs.arizona.edu/icon/
    [6] The concept of iterators is described in PEP 234,
        http://python.sf.net/peps/pep-0234.html
    [7] http://python.ca/nas/python/generator.diff
    [8] http://python.sf.net/peps/pep-0236.html

Copyright

    This document has been placed in the public domain.

From guido@digicool.com Thu Jun 14 18:30:42 2001
From: guido@digicool.com (Guido van Rossum)
Date: Thu, 14 Jun 2001 13:30:42 -0400
Subject: [Python-Dev] Python 2.0.1c1 - GPL-compatible release candidate
Message-ID: <200106141730.f5EHUgX03621@odiug.digicool.com>

With a sigh of relief I announce Python 2.0.1c1 -- the first Python release in a long time whose license is fully compatible with the GPL:

    http://www.python.org/2.0.1/

I thank Moshe Zadka who did almost all of the work to make this a useful bugfix release, and then went incommunicado for several weeks. (I hope you're OK, Moshe!)

Note that this is a release candidate. We don't expect any problems, but we're being careful nevertheless. We're planning to do the final release of 2.0.1 a week from now; expect it to be identical to the release candidate except for some dotted i's and crossed t's.
Python 2.0 users should be able to replace their 2.0 installation with the 2.0.1 release without any ill effects; apart from the license change, we've only fixed bugs that didn't require us to make feature changes. The SRE package (regular expression matching, used by the "re" module) was brought in line with the version distributed with Python 2.1; this is stable feature-wise but much improved bug-wise. For the full scoop, see the release notes on SourceForge: http://sourceforge.net/project/shownotes.php?release_id=39267 Python 2.1 users can ignore this release, unless they have an urgent need for a GPL-compatible Python version and are willing to downgrade. Rest assured that we're planning a bugfix release there too: I expect that Python 2.1.1 will be released within a month, with the same GPL-compatible license. (Right, Thomas?) We don't intend to build RPMs for 2.0.1. If someone else is interested in doing so, we can link to them. --Guido van Rossum (home page: http://www.python.org/~guido/) From fredrik@pythonware.com Thu Jun 14 12:46:25 2001 From: fredrik@pythonware.com (Fredrik Lundh) Date: Thu, 14 Jun 2001 13:46:25 +0200 Subject: [Python-Dev] recognizing \u escapes in regular expressions Message-ID: <02db01c0f4c7$a491c620$0900a8c0@spiff> during a late hacking pass, I was perplexed to realized that r"[\u0000-\uffff]" didn't match any unicode character, and reported it as bug #420011. but a few minutes later, I realized that SRE doesn't support \u and \U escapes at all -- and that the pattern u"[\u0000-\uffff]" works as expected. should I close the bug report, or turn it into a feature request? From fredrik@pythonware.com Thu Jun 14 12:52:26 2001 From: fredrik@pythonware.com (Fredrik Lundh) Date: Thu, 14 Jun 2001 13:52:26 +0200 Subject: [Python-Dev] Adding Asian codecs to the core References: <20010612234815.2C90599C82@waltz.rahul.net> <3B26B491.CA8536BD@ActiveState.com> <3B2710BB.CFD8215@lemburg.com> <3B27A141.6C69EC55@ActiveState.com> Message-ID: <02ef01c0f4c8$7bdc8520$0900a8c0@spiff> Paul wrote: > > > What if we add them to CVS and formally maintain them as part of the > > > core but distribute them as a separate download? > > > > Good idea. > > All in favour? +0.5 I still think adding them to the core is okay, but that's me. Cheers /F From gward@python.net Thu Jun 14 21:11:49 2001 From: gward@python.net (Greg Ward) Date: Thu, 14 Jun 2001 16:11:49 -0400 Subject: [Python-Dev] sre improvements In-Reply-To: <3B28680B.A46CF171@ActiveState.com>; from paulp@ActiveState.com on Thu, Jun 14, 2001 at 12:30:19AM -0700 References: <20010613125408.W13940@tux.distro.conectiva> <15143.36590.447465.657241@beluga.mojam.com> <007801c0f426$84d1f220$4ffa42d5@hagrid> <20010613145320.G5114@gerg.ca> <3B28680B.A46CF171@ActiveState.com> Message-ID: <20010614161149.C9884@gerg.ca> On 14 June 2001, Paul Prescod said: > I would rather see us try a new approach to regular expressions. I've > seen a few proposals for more verbose-but-readable syntaxes. I think one > was from Greg Ewing? And maybe one from Ping? I remember Ping's from a few year's back. It was pretty cool, but awfully verbose. I *like* the compactness of the One True Regex Language (ie. the one implemented by Perl 5, PCRE, and SRE). > For those of us who use regular expressions only once in a while (i.e. > the lucky ones), the current syntax is a holy terror. Which characters > are magical again? In what contexts? With how many levels of > backslashing? Upper case W versus lower case W? Wow, you should try keeping grep vs. egrep vs. sed vs. 
awk (which version again?) vs. emacs straight. I generally don't bother: as soon as a problem gets too hairy for grep/sed/awk/etc., I whip out my trusty old friend "perl -e" and all is well again. Unless I'm already coding in Python of course, in which case I whip out my trusty old friend re.compile(), and everything just works. I guess I just have a good memory for line noise. > Obviously we can never abandon the tried and true Perl5 RE module, but I > think we could have another syntax on top. Yeah, I s'pose it could be useful. Yet another great teaching tool, at any rate. Greg -- Greg Ward - Python bigot gward@python.net http://starship.python.net/~gward/ Quick!! Act as if nothing has happened! From greg@cosc.canterbury.ac.nz Fri Jun 15 01:56:50 2001 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Fri, 15 Jun 2001 12:56:50 +1200 (NZST) Subject: [Python-Dev] sre improvements In-Reply-To: <20010614161149.C9884@gerg.ca> Message-ID: <200106150056.MAA03621@s454.cosc.canterbury.ac.nz> Paul Prescod: > I think one > was from Greg Ewing? And maybe one from Ping? I can't remember what my first proposal (many years ago now) was like, but you might like to look at what I'm using in my Plex module: http://www.cosc.canterbury.ac.nz/~greg/python/Plex Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From paulp@ActiveState.com Fri Jun 15 02:36:13 2001 From: paulp@ActiveState.com (Paul Prescod) Date: Thu, 14 Jun 2001 18:36:13 -0700 Subject: [Python-Dev] sre improvements References: <200106150056.MAA03621@s454.cosc.canterbury.ac.nz> Message-ID: <3B29668D.ADFB3C22@ActiveState.com> Greg Ewing wrote: > > Paul Prescod: > > > I think one > > was from Greg Ewing? And maybe one from Ping? > > I can't remember what my first proposal (many years ago > now) was like, but you might like to look at what I'm > using in my Plex module: > > http://www.cosc.canterbury.ac.nz/~greg/python/Plex I would be interested in *both* your regular expression library and your lexer for the Python standard library. But separately. Maybe we need two short PEPs that point to the documentation and suggest how the two packages could be integrated into the standard library. What do you think? -- Take a recipe. Leave a recipe. Python Cookbook! http://www.ActiveState.com/pythoncookbook From greg@cosc.canterbury.ac.nz Fri Jun 15 02:49:04 2001 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Fri, 15 Jun 2001 13:49:04 +1200 (NZST) Subject: [Python-Dev] sre improvements In-Reply-To: <3B29668D.ADFB3C22@ActiveState.com> Message-ID: <200106150149.NAA03631@s454.cosc.canterbury.ac.nz> > I would be interested in *both* your regular expression library and your > lexer for the Python standard library. But separately. Well, the regular expressions aren't really a separable part of Plex. I mentioned it as a possible source of ideas for anyone working on a new syntax for the regexp stuff. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From mal@lemburg.com Fri Jun 15 08:58:47 2001 From: mal@lemburg.com (M.-A. 
Lemburg) Date: Fri, 15 Jun 2001 09:58:47 +0200 Subject: [Python-Dev] Adding Asian codecs to the core References: <20010612234815.2C90599C82@waltz.rahul.net> <3B26B491.CA8536BD@ActiveState.com> <3B2710BB.CFD8215@lemburg.com> <3B27A141.6C69EC55@ActiveState.com> <02ef01c0f4c8$7bdc8520$0900a8c0@spiff> Message-ID: <3B29C037.FB1DB6B8@lemburg.com> Fredrik Lundh wrote: > > Paul wrote: > > > > > What if we add them to CVS and formally maintain them as part of the > > > > core but distribute them as a separate download? > > > > > > Good idea. > > > > All in favour? > > +0.5 > > I still think adding them to the core is okay, but that's me. What would be the threshold for doing so ? Tamito is actively working on reducing the table sizes of the the codecs and after what I have seen you do on these sort of tables I am pretty sure Tamito can turn these tables into shared libs which are smaller than 200k. -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From MarkH@ActiveState.com Fri Jun 15 09:05:26 2001 From: MarkH@ActiveState.com (Mark Hammond) Date: Fri, 15 Jun 2001 18:05:26 +1000 Subject: [Python-Dev] Adding Asian codecs to the core In-Reply-To: <3B29C037.FB1DB6B8@lemburg.com> Message-ID: > > I still think adding them to the core is okay, but that's me. > > What would be the threshold for doing so ? > > Tamito is actively working on reducing the table sizes of the the > codecs and after what I have seen you do on these sort of tables I > am pretty sure Tamito can turn these tables into shared libs which are > smaller than 200k. But isn't this set only one of the many possible Asian codecs? I would have no objection to one 200k module, but if we really wanted to handle "asian codecs" I believe this is only the start. For this reason, I would give a -0 to adding these to the core, and a +1 to adding them to the directory structure proposed by Guido. Mark. From guido@digicool.com Fri Jun 15 17:59:40 2001 From: guido@digicool.com (Guido van Rossum) Date: Fri, 15 Jun 2001 12:59:40 -0400 Subject: [Python-Dev] recognizing \u escapes in regular expressions Message-ID: <200106151659.MAA30396@cj20424-a.reston1.va.home.com> > during a late hacking pass, I was perplexed to realized that > r"[\u0000-\uffff]" didn't match any unicode character, and reported > it as bug #420011. > > but a few minutes later, I realized that SRE doesn't support \u and > \U escapes at all -- and that the pattern u"[\u0000-\uffff]" works > as expected. > > should I close the bug report, or turn it into a feature request? > > You meant ur"[\u0000-\uffff]", right? (It works the same -- Unicode raw strings still do \u expansion, although the rationale escapes me at the moment -- as does the rationale for why ru"..." is a syntax error...) Looks like a feature request to me. Since \000 and \x00 work in that context, \u0000 would be expected to work. And suppose someone uses u"[\u0000-\u005d]"... --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@digicool.com Fri Jun 15 20:00:26 2001 From: guido@digicool.com (Guido van Rossum) Date: Fri, 15 Jun 2001 15:00:26 -0400 Subject: [Python-Dev] gen-branch: CVS branch for Neil's generator patch Message-ID: <200106151900.PAA31935@cj20424-a.reston1.va.home.com> I've checked in Neil's latest generator patch into a branch of the CVS tree. That makes it (hopefully) easier for folks to play with. 
Tim, can you update the PEP to point to this branch? (There's some boilerplate code about branches in PEP 252 or 253 that you could adapt.) I had to change the code in ceval.c because of recent conflicting changes there. The test suite runs (except test_inspect), but I'd appreciate it if someone (Neil?) could make sure that I didn't overlook anything. (I should probably check the CVS logs. :-) --Guido van Rossum (home page: http://www.python.org/~guido/) PS. If you saw a checkin of Grammar/Grammar in the *head* branch, that was a mistake, and I've already corrected it. From paulp@ActiveState.com Fri Jun 15 20:19:08 2001 From: paulp@ActiveState.com (Paul Prescod) Date: Fri, 15 Jun 2001 12:19:08 -0700 Subject: [Python-Dev] Adding Asian codecs to the core References: <20010612234815.2C90599C82@waltz.rahul.net> <3B26B491.CA8536BD@ActiveState.com> <3B2710BB.CFD8215@lemburg.com> <3B27A141.6C69EC55@ActiveState.com> <02ef01c0f4c8$7bdc8520$0900a8c0@spiff> <3B29C037.FB1DB6B8@lemburg.com> Message-ID: <3B2A5FAC.C5089CC2@ActiveState.com> "M.-A. Lemburg" wrote: > >... > > What would be the threshold for doing so ? > > Tamito is actively working on reducing the table sizes of the the > codecs and after what I have seen you do on these sort of tables I > am pretty sure Tamito can turn these tables into shared libs which are > smaller than 200k. Don't forget Chinese (Taiwan and mainland) and Korean! I guess I don't see the big deal in making them separate downloads. We can use distutils to make them easy to install .exe's for Reference Python and PPM for ActivePython. -- Take a recipe. Leave a recipe. Python Cookbook! http://www.ActiveState.com/pythoncookbook From mal@lemburg.com Fri Jun 15 21:05:47 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Fri, 15 Jun 2001 22:05:47 +0200 Subject: [Python-Dev] Adding Asian codecs to the core References: <20010612234815.2C90599C82@waltz.rahul.net> <3B26B491.CA8536BD@ActiveState.com> <3B2710BB.CFD8215@lemburg.com> <3B27A141.6C69EC55@ActiveState.com> <02ef01c0f4c8$7bdc8520$0900a8c0@spiff> <3B29C037.FB1DB6B8@lemburg.com> <3B2A5FAC.C5089CC2@ActiveState.com> Message-ID: <3B2A6A9B.AC156262@lemburg.com> Paul Prescod wrote: > > "M.-A. Lemburg" wrote: > > > >... > > > > What would be the threshold for doing so ? > > > > Tamito is actively working on reducing the table sizes of the the > > codecs and after what I have seen you do on these sort of tables I > > am pretty sure Tamito can turn these tables into shared libs which are > > smaller than 200k. > > Don't forget Chinese (Taiwan and mainland) and Korean! > > I guess I don't see the big deal in making them separate downloads. We > can use distutils to make them easy to install .exe's for Reference > Python and PPM for ActivePython. Ok. BTW, how come www.python.org no longer provides precompiled (contributed) binaries for the various OSes out there ? The FTP server only has these for Python <= 1.5.2. -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From tim.one@home.com Fri Jun 15 22:39:42 2001 From: tim.one@home.com (Tim Peters) Date: Fri, 15 Jun 2001 17:39:42 -0400 Subject: [Python-Dev] gen-branch: CVS branch for Neil's generator patch In-Reply-To: <200106151900.PAA31935@cj20424-a.reston1.va.home.com> Message-ID: [Guido] > I've checked in Neil's latest generator patch into a branch of the CVS > tree. That makes it (hopefully) easier for folks to play with. 
It will for me, and I thank you. > Tim, can you update the PEP to point to this branch? Done. From martin@loewis.home.cs.tu-berlin.de Fri Jun 15 23:17:49 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Sat, 16 Jun 2001 00:17:49 +0200 Subject: [Python-Dev] recognizing \u escapes in regular expressions Message-ID: <200106152217.f5FMHnI01360@mira.informatik.hu-berlin.de> > should I close the bug report, or turn it into a feature request? I think the bug report can be closed. Myself, I found it sufficient that you can write normal \u escapes in strings, in particular as you can also use them in raw strings: >>> ur"Ha\u006Clo" u'Hallo' Perhaps not very intuitive, and perhaps even a bug (how do you put a backslash in front of a "u" in a raw unicode string), but useful in this context. Regards, Martin From guido@digicool.com Sat Jun 16 16:46:14 2001 From: guido@digicool.com (Guido van Rossum) Date: Sat, 16 Jun 2001 11:46:14 -0400 Subject: [Python-Dev] 2.0.1's GPL-compatibility is official! Message-ID: <200106161546.LAA05521@cj20424-a.reston1.va.home.com> Richard Stallman, Eben Moglen and the FSF agree: Python 2.0.1 is compatible with the GPL. They've updated the text about the Python license on http://www.gnu.org/philosophy/license-list.html, stating in particular: GPL-Compatible, Free Software Licenses [...] The License of Python 1.6a2 and earlier versions. This is a free software license and is compatible with the GNU GPL. Please note, however, that newer versions of Python are under other licenses (see below). The License of Python 2.0.1, 2.1.1, and newer versions. This is a free software license and is compatible with the GNU GPL. Please note, however, that intermediate versions of Python (1.6b1, through 2.0 and 2.1) are under a different license (see below). I would like to emphasize and clarify (again!) that Python is *not* released under the GPL, so if you think the GPL is a bad thing, you don't have to worry about Python being contaminated. The GPL compatibility is important for folks who distribute Python binaries: e.g. the new license makes it okay to release Python binaries linked with GNU readline and other GPL-covered libraries. We'll release the final release of 2.0.1 within a week; so far we've had only one bug reported in the release candidate. I expect that we won't have to wait long for 2.1.1, which will have the same GPL-compatible license as 2.0.1. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@digicool.com Sat Jun 16 17:10:27 2001 From: guido@digicool.com (Guido van Rossum) Date: Sat, 16 Jun 2001 12:10:27 -0400 Subject: [Python-Dev] contributed binaries (was: Adding Asian codecs...) Message-ID: <200106161610.MAA05684@cj20424-a.reston1.va.home.com> > BTW, how come www.python.org no longer provides precompiled > (contributed) binaries for the various OSes out there ? > The FTP server only has these for Python <= 1.5.2. There are some binaries for newer versions, mostly Linux RPMs, but these are in different places. I agree the FTP download area is a mess. I propose to give up on the FTP area and start over on the new Zope-based web server, if and when it's ready. Not enough people are helping out, so it's going slowly. --Guido van Rossum (home page: http://www.python.org/~guido/) From mal@lemburg.com Sat Jun 16 19:59:52 2001 From: mal@lemburg.com (M.-A. 
Lemburg) Date: Sat, 16 Jun 2001 20:59:52 +0200 Subject: [Python-Dev] recognizing \u escapes in regular expressions References: <200106152217.f5FMHnI01360@mira.informatik.hu-berlin.de> Message-ID: <3B2BACA7.CDA96737@lemburg.com> "Martin v. Loewis" wrote: > > > should I close the bug report, or turn it into a feature request? > > I think the bug report can be closed. Myself, I found it sufficient > that you can write normal \u escapes in strings, in particular as you > can also use them in raw strings: > > >>> ur"Ha\u006Clo" > u'Hallo' > > Perhaps not very intuitive, and perhaps even a bug (how do you put a > backslash in front of a "u" in a raw unicode string), but useful in > this context. >>> print ur"backslash in front of an 'u': \u005cu" backslash in front of an 'u': \u A double backslash is easier to have: >>> print ur"double backslash in front of an 'u': \\u" double backslash in front of an 'u': \\u Python uses C's convention for \uXXXX where \u is only interpreted as Unicode escape of it is used with an odd number of backslashes in front of it. -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From tim.one@home.com Mon Jun 18 01:57:53 2001 From: tim.one@home.com (Tim Peters) Date: Sun, 17 Jun 2001 20:57:53 -0400 Subject: [Python-Dev] Re: Why not Lisp-like list-related functions ? In-Reply-To: <20010614111928.A4560@ute.cnri.reston.va.us> Message-ID: [Guido] > Maybe we also have a smaller brain than the typical Lisper -- I would > say, that would make us more normal, and if Python caters to people > with a closer-to-average brain size, that would mean more people will > be able to program in Python. History will decide... [Andrew Kuchling] > I thought it already has, pretty much. OK, I've kept quiet for days, but can't bear it any longer: Andrew, are you waiting for someone to *force* you to immortalize this exchange in your Python Quotes collection? If so, the PSU knows where you liv From mal@lemburg.com Mon Jun 18 11:14:04 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Mon, 18 Jun 2001 12:14:04 +0200 Subject: [Python-Dev] Adding Asian codecs to the core References: <200106141131.HAA25522@cj20424-a.reston1.va.home.com> Message-ID: <3B2DD46C.EEC20857@lemburg.com> Guido van Rossum wrote: > > > > > What if we add them to CVS and formally maintain them as part of the > > > > core but distribute them as a separate download? > > > > > > Good idea. > > > > All in favour? > > +1, as long as they're not in the CVS subtree that's normally > extracted for a regular source distribution. I propose this location > in the CVS tree: > > python/dist/encodings/... > > (So 'encodings' would be a sibling of 'src', which has been pretty > lonely ever since I started using CVS. ;-) Ok. When Tamito has completed his work on the codecs (he is currently reimplementing them in C), I'll check them in under the new directory. BTW, how should we ship these codecs ? I'd propose to provide a distutils setup.py file which wraps up all codecs under encodings and can be used to create a standard Python add-on "Python-X.X Encoding Add-on". The generated files should then ideally be published right next to the Python source/binary links on the python.org web-pages to achieve high visibility. 
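For concreteness, a minimal setup.py for such an add-on might look something like this (the distribution and package names below are invented placeholders; the real layout is up to Tamito):

    # Sketch of a distutils script for a separately shipped
    # encodings add-on.  "encodings.japanese" is a placeholder
    # package name, not the actual codec layout.
    from distutils.core import setup

    setup(name="Python-encodings-addon",
          version="1.0",
          description="Extra codecs for the standard encodings package",
          packages=["encodings.japanese"],
         )

Running "python setup.py sdist" (or bdist_wininst on Windows) over that would produce the downloadable archives to publish there.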
-- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From guido@digicool.com Mon Jun 18 13:25:35 2001 From: guido@digicool.com (Guido van Rossum) Date: Mon, 18 Jun 2001 08:25:35 -0400 Subject: [Python-Dev] Adding Asian codecs to the core In-Reply-To: Your message of "Mon, 18 Jun 2001 12:14:04 +0200." <3B2DD46C.EEC20857@lemburg.com> References: <200106141131.HAA25522@cj20424-a.reston1.va.home.com> <3B2DD46C.EEC20857@lemburg.com> Message-ID: <200106181225.IAA15518@cj20424-a.reston1.va.home.com> > Ok. When Tamito has completed his work on the codecs (he is currently > reimplementing them in C), I'll check them in under the new directory. Excellent! > BTW, how should we ship these codecs ? > > I'd propose to provide a distutils setup.py file which wraps up > all codecs under encodings and can be used to create a standard > Python add-on "Python-X.X Encoding Add-on". Sounds like a good plan. > The generated files should then ideally be published right next > to the Python source/binary links on the python.org web-pages to > achieve high visibility. Sure, for some defininition of "right next to" :-) --Guido van Rossum (home page: http://www.python.org/~guido/) From thomas@xs4all.net Mon Jun 18 15:35:12 2001 From: thomas@xs4all.net (Thomas Wouters) Date: Mon, 18 Jun 2001 16:35:12 +0200 Subject: [Python-Dev] Moshe Message-ID: <20010618163512.D8098@xs4all.nl> Just FYI: Moshe has been sighted, alive and well. He's been caught up in personal matters, apparently. He apologized and said he'd mail python-dev with an update soonish. Don't-you-wish-you-lurked-on-#python-too-ly y'rs ;) -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From m.favas@per.dem.csiro.au Mon Jun 18 22:28:23 2001 From: m.favas@per.dem.csiro.au (Mark Favas) Date: Tue, 19 Jun 2001 05:28:23 +0800 Subject: [Python-Dev] Anyone else seeing test_struct fail? Message-ID: <3B2E7277.D6109E7E@per.dem.csiro.au> [Platform: Tru64 Unix, Compaq C compiler) The current CVS of 2.2a0 fails test_struct for me with: test test_struct failed -- pack('>i', -2147483649) did not raise error more extensively, trying std iI on -2147483649 == 0xffffffff7fffffff Traceback (most recent call last): File "Lib/test/test_struct.py", line 367, in ? t.run() File "Lib/test/test_struct.py", line 353, in run self.test_one(x) File "Lib/test/test_struct.py", line 269, in test_one any_err(pack, ">" + code, x) File "Lib/test/test_struct.py", line 38, in any_err raise TestFailed, "%s%s did not raise error" % ( test_support.TestFailed: pack('>i', -2147483649) did not raise error A 64-bit platform issue? Also, the current imap.py causes "make test" (test___all__ and test_sundry) to fail with: "exceptions.TabError: inconsistent use of tabs and spaces in indentation (imaplib.py, line 576)" - untested checkin ? -- Mark Favas - m.favas@per.dem.csiro.au CSIRO, Private Bag No 5, Wembley, Western Australia 6913, AUSTRALIA From tim@digicool.com Mon Jun 18 23:04:06 2001 From: tim@digicool.com (Tim Peters) Date: Mon, 18 Jun 2001 18:04:06 -0400 Subject: [Python-Dev] Anyone else seeing test_struct fail? 
In-Reply-To: <3B2E7277.D6109E7E@per.dem.csiro.au> Message-ID: [Mark Favas] > [Platform: Tru64 Unix, Compaq C compiler) > The current CVS of 2.2a0 fails test_struct for me with: > > test test_struct failed -- pack('>i', -2147483649) did not raise error > > more extensively, > trying std iI on -2147483649 == 0xffffffff7fffffff > Traceback (most recent call last): > File "Lib/test/test_struct.py", line 367, in ? > t.run() > File "Lib/test/test_struct.py", line 353, in run > self.test_one(x) > File "Lib/test/test_struct.py", line 269, in test_one > any_err(pack, ">" + code, x) > File "Lib/test/test_struct.py", line 38, in any_err > raise TestFailed, "%s%s did not raise error" % ( > test_support.TestFailed: pack('>i', -2147483649) did not raise error > > A 64-bit platform issue? In test_struct.py, please change this line (right after "class IntTester"): BUGGY_RANGE_CHECK = "bBhHIL" to BUGGY_RANGE_CHECK = "bBhHiIlL" and try again. I suspect you're bumping into a pre-existing bug that simply wasn't checked before (and, yes, there's A Reason it *may* screw up on a 64-bit box but not a 32-bit one). Note that since in standard mode, "i" is considered to be a 4-byte int regardless of platform, we really *should* bitch about trying to pack -2147483649 under "i" (but we don't -- and in general no codes except the new q/Q reliably bitch about out-of-range errors in the standard modes). > Also, the current imap.py causes "make test" (test___all__ and > test_sundry) to fail with: "exceptions.TabError: inconsistent use of > tabs and spaces in indentation (imaplib.py, line 576)" - untested > checkin ? Leaving that to some loser who cares about whitespace . From m.favas@per.dem.csiro.au Mon Jun 18 23:11:37 2001 From: m.favas@per.dem.csiro.au (Mark Favas) Date: Tue, 19 Jun 2001 06:11:37 +0800 Subject: [Python-Dev] Anyone else seeing test_struct fail? References: Message-ID: <3B2E7C99.E9BEFC3C@per.dem.csiro.au> [Tim Peters suggests] > > [Mark Favas] > > [Platform: Tru64 Unix, Compaq C compiler) > > The current CVS of 2.2a0 fails test_struct for me with: > > > > test test_struct failed -- pack('>i', -2147483649) did not raise error > > In test_struct.py, please change this line (right after "class IntTester"): > > BUGGY_RANGE_CHECK = "bBhHIL" > > to > > BUGGY_RANGE_CHECK = "bBhHiIlL" > > and try again. Yep, passes with this change. > > Also, the current imap.py causes "make test" (test___all__ and > > test_sundry) to fail with: "exceptions.TabError: inconsistent use of > > tabs and spaces in indentation (imaplib.py, line 576)" - untested > > checkin ? > > Leaving that to some loser who cares about whitespace . Guess we'll have to advertise widely, then . -- Mark Favas - m.favas@per.dem.csiro.au CSIRO, Private Bag No 5, Wembley, Western Australia 6913, AUSTRALIA From barry@digicool.com Mon Jun 18 23:28:21 2001 From: barry@digicool.com (Barry A. Warsaw) Date: Mon, 18 Jun 2001 18:28:21 -0400 Subject: [Python-Dev] Bogosities in quopri module? Message-ID: <15150.32901.611349.524220@yyz.digicool.com> I've been playing a bit with the quopri module (trying to support RFC 2047 in mimelib), and I've run across a few bogosities that I'd like to fix. Fixing some of them could break code, so I wanted to see what people think first. First, quopri should have encodestring() and decodestring() functions which take a string and return a string. This would make it more consistent API-wise with e.g. base64. 
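Concretely, the two could be thin wrappers around the existing file-based encode() and decode() (a sketch of the proposed API built on StringIO, not a final patch):

    # Possible shape for the proposed string interface; the names
    # follow base64's, and the bodies just wrap encode()/decode().
    from StringIO import StringIO
    import quopri

    def encodestring(s, quotetabs=1):
        infp, outfp = StringIO(s), StringIO()
        quopri.encode(infp, outfp, quotetabs)
        return outfp.getvalue()

    def decodestring(s):
        infp, outfp = StringIO(s), StringIO()
        quopri.decode(infp, outfp)
        return outfp.getvalue()

The quotetabs default of 1 in the sketch anticipates the argument discussed next.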
One difference is that quopri.encodestring() should probably take a default argument quotetabs (defaulted to 1) for passing to the encode() function. This shouldn't be very controversial.

Second, I think there are a couple of problems with encode() itself. One is that it always tacks on an extra \n character, such that an encode->decode roundtrip is not idempotent. I propose fixing this so that encode() doesn't add the extra newline, but this can break code that expects that newline to be present.

Third, I think that encode()'s quotetabs flag should also apply to spaces. RFC 1521 says that both ASCII tabs and spaces may be encoded, and I don't think it's worthwhile that there be a separate flag to independently choose to encode tabs or spaces.

Lastly, if you buy the extra-newline solution above, then encode() has to be fixed w.r.t. trailing spaces and tabs. Currently, an encode->decode roundtrip for, e.g. "hello " returns "hello =\n", but what it should really return is "hello=20". Likewise "hello\t" should return "hello=09". The patches must take multiline strings into account though, so that they don't chomp newlines out of

    """hello
    great
    big
    world
    """

I haven't worked up a patch yet, but when I do I'll upload it to SF to get some feedback. I think there are a few other things in the module that could be cleaned up. I also plan to add a test_quopri.py.

Comments?
-Barry

From see@my.signature Tue Jun 19 07:21:14 2001
From: see@my.signature (Greg Ewing)
Date: Tue, 19 Jun 2001 18:21:14 +1200
Subject: [Python-Dev] Re: PEP 255: Simple Generators
References:
Message-ID: <3B2EEF5A.8FF0DAFB@cosc.canterbury.ac.nz>

Something is bothering me about this. In fact, it's bothering me a LOT. In the following, will f() work as a generator-function:

    def f():
        for i in range(5):
            g(i)

    def g(i):
        for j in range(10):
            yield i,j

If I understand PEP 255 correctly, this will *not* work. But it seems entirely reasonable to me that it *should* work. It *has* to work, otherwise how am I to write generators that are too complicated to fit into a single function?

Someone please tell me I'm wrong about this!

--
Greg Ewing, Computer Science Dept,
University of Canterbury,
Christchurch, New Zealand
To get my email address, please visit my web page:
http://www.cosc.canterbury.ac.nz/~greg

From jepler@inetnebr.com Tue Jun 19 14:25:23 2001
From: jepler@inetnebr.com (Jeff Epler)
Date: Tue, 19 Jun 2001 08:25:23 -0500
Subject: [Python-Dev] Re: PEP 255: Simple Generators
In-Reply-To: <3B2EEF5A.8FF0DAFB@cosc.canterbury.ac.nz>; from see@my.signature on Tue, Jun 19, 2001 at 06:21:14PM +1200
References: <3B2EEF5A.8FF0DAFB@cosc.canterbury.ac.nz>
Message-ID: <20010619082522.A12200@inetnebr.com>

On Tue, Jun 19, 2001 at 06:21:14PM +1200, Greg Ewing wrote:
> Something is bothering me about this. In fact,
> it's bothering me a LOT. In the following, will
> f() work as a generator-function:
>
>     def f():
>         for i in range(5):
>             g(i)
>
>     def g(i):
>         for j in range(10):
>             yield i,j
>
> If I understand PEP 255 correctly, this will *not*
> work. But it seems entirely reasonable to me that
> it *should* work. It *has* to work, otherwise how
> am I to write generators that are too complicated
> to fit into a single function?

The following similar code seems to produce the results you have in mind.

    def f():
        for i in range(5):
            #g(i)
            #yield g(i)
            for x in g(i):
                yield x

    def g(i):
        for j in range(10):
            yield i, j

It would be nice to have a succinct way to say 'for dummy in iterator: yield dummy'. Maybe 'yield from iterator'? Then f would become:

    def f():
        for i in range(5):
            yield from g(i)

Jeff

PS I noticed that the generator branch got merged into the trunk. Cool!

From fdrake@acm.org Tue Jun 19 14:24:46 2001
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Tue, 19 Jun 2001 09:24:46 -0400 (EDT)
Subject: [Python-Dev] Python & GCC 3.0
Message-ID: <15151.21150.214238.429130@cj42289-a.reston1.va.home.com>

I built GCC 3.0 last night, and Python built and passed the regression tests. I've not done any further comparisons, but using --with-cxx=... failed; the C++ ABI changed and a new version of the C++ runtime is required before that will work. I didn't want to install that over my working installation, just in case. ;-) I'll report more as I find out more.

-Fred

--
Fred L. Drake, Jr.
PythonLabs at Digital Creations

From nas@python.ca Tue Jun 19 15:00:39 2001
From: nas@python.ca (Neil Schemenauer)
Date: Tue, 19 Jun 2001 07:00:39 -0700
Subject: [Python-Dev] Re: PEP 255: Simple Generators
In-Reply-To: <3B2EEF5A.8FF0DAFB@cosc.canterbury.ac.nz>; from see@my.signature on Tue, Jun 19, 2001 at 06:21:14PM +1200
References: <3B2EEF5A.8FF0DAFB@cosc.canterbury.ac.nz>
Message-ID: <20010619070039.A13712@glacier.fnational.com>

Greg Ewing wrote:
> Something is bothering me about this. In fact,
> it's bothering me a LOT. In the following, will
> f() work as a generator-function:
>
>     def f():
>         for i in range(5):
>             g(i)
>
>     def g(i):
>         for j in range(10):
>             yield i,j
>
> If I understand PEP 255 correctly, this will *not*
> work.

No, it will not work. The title of PEP 255 is "Simple Generators". What you want will require something like Stackless in order to get the C stack out of the way. That's a major change to the Python internals. To make your example work you need to do:

    def f():
        for i in range(5):
            for j in g(i):
                yield j

    def g(i):
        for j in range(10):
            yield i,j

Stackless may still be in Python's future, but not for 2.2.

Neil

From barry@digicool.com Tue Jun 19 15:19:58 2001
From: barry@digicool.com (Barry A. Warsaw)
Date: Tue, 19 Jun 2001 10:19:58 -0400
Subject: [Python-Dev] Python & GCC 3.0
References: <15151.21150.214238.429130@cj42289-a.reston1.va.home.com>
Message-ID: <15151.24462.400930.295658@anthem.wooz.org>

>>>>> "Fred" == Fred L Drake, Jr writes:

    Fred> I built GCC 3.0 last night, and Python built and passed
    Fred> the regression tests.

Hey, you were actually able to download it!? :) I couldn't get an ftp connection for the longest time and finally gave up.

It'd be interesting to see if there are any performance improvements, esp. on x86 boxen.

-Barry

From fdrake@acm.org Tue Jun 19 16:07:48 2001
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Tue, 19 Jun 2001 11:07:48 -0400 (EDT)
Subject: [Python-Dev] Python & GCC 3.0
In-Reply-To: <15151.24462.400930.295658@anthem.wooz.org>
References: <15151.21150.214238.429130@cj42289-a.reston1.va.home.com> <15151.24462.400930.295658@anthem.wooz.org>
Message-ID: <15151.27332.418303.872171@cj42289-a.reston1.va.home.com>

Barry A. Warsaw writes:
> It'd be interesting to see if there are any performance
> improvements, esp. on x86 boxen.
GCC 2.95.3:

cj42289-a(.../python/linux); time ./python -tt ../Lib/test/pystone.py
Pystone(1.1) time for 10000 passes = 1.58
This machine benchmarks at 6329.11 pystones/second
1.66user 0.01system 0:03.40elapsed 49%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (280major+241minor)pagefaults 0swaps

GCC 3.0:

cj42289-a(.../python/linux); cd ../linux-gcc-3.0/
cj42289-a(.../python/linux-gcc-3.0); time ./python -tt ../Lib/test/pystone.py
Pystone(1.1) time for 10000 passes = 1.65
This machine benchmarks at 6060.61 pystones/second
1.77user 0.01system 0:03.52elapsed 50%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (307major+239minor)pagefaults 0swaps

There is a little variation with multiple runs, but it varies less than 5% from the numbers above. Bumping up the LOOPS constant in pystone.py changes the numbers a small bit, but the relationship remains constant.

This is on a Linux-Mandrake 7.2 installation with non-cooker updates installed, and still using the Linux 2.2 kernel:

cj42289-a(.../python/linux-gcc-3.0); uname -a
Linux cj42289-a.reston1.va.home.com 2.2.17-21mdk #1 Thu Oct 5 13:16:08 CEST 2000 i686 unknown

-Fred

--
Fred L. Drake, Jr.
PythonLabs at Digital Creations

From dan@cgsoftware.com Tue Jun 19 17:19:14 2001
From: dan@cgsoftware.com (Daniel Berlin)
Date: 19 Jun 2001 12:19:14 -0400
Subject: [Python-Dev] Python & GCC 3.0
In-Reply-To: <15151.27332.418303.872171@cj42289-a.reston1.va.home.com> ("Fred L. Drake, Jr."'s message of "Tue, 19 Jun 2001 11:07:48 -0400 (EDT)")
References: <15151.21150.214238.429130@cj42289-a.reston1.va.home.com> <15151.24462.400930.295658@anthem.wooz.org> <15151.27332.418303.872171@cj42289-a.reston1.va.home.com>
Message-ID: <87vglsbfy5.fsf@cgsoftware.com>

"Fred L. Drake, Jr." writes:

> Barry A. Warsaw writes:
> > It'd be interesting to see if there are any performance
> > improvements, esp. on x86 boxen.

Except, I bet you didn't use one of the "optimize for a given cpu" switches. Try adding -mpentiumpro -march=pentiumpro to your compiler flags. Otherwise, it's scheduling for a 386. And the old x86 backend wasn't all that bad at scheduling for the 386. Hell, i'm not that bad at scheduling for a 386. :)

--Dan

>
> GCC 2.95.3:
>
> cj42289-a(.../python/linux); time ./python -tt ../Lib/test/pystone.py
> Pystone(1.1) time for 10000 passes = 1.58
> This machine benchmarks at 6329.11 pystones/second
> 1.66user 0.01system 0:03.40elapsed 49%CPU (0avgtext+0avgdata 0maxresident)k
> 0inputs+0outputs (280major+241minor)pagefaults 0swaps
>
> GCC 3.0:
>
> cj42289-a(.../python/linux); cd ../linux-gcc-3.0/
> cj42289-a(.../python/linux-gcc-3.0); time ./python -tt ../Lib/test/pystone.py
> Pystone(1.1) time for 10000 passes = 1.65
> This machine benchmarks at 6060.61 pystones/second
> 1.77user 0.01system 0:03.52elapsed 50%CPU (0avgtext+0avgdata 0maxresident)k
> 0inputs+0outputs (307major+239minor)pagefaults 0swaps
>
> There is a little variation with multiple runs, but it varies less than
> 5% from the numbers above. Bumping up the LOOPS constant in
> pystone.py changes the numbers a small bit, but the relationship
> remains constant.
>
> This is on a Linux-Mandrake 7.2 installation with non-cooker updates
> installed, and still using the Linux 2.2 kernel:
>
> cj42289-a(.../python/linux-gcc-3.0); uname -a
> Linux cj42289-a.reston1.va.home.com 2.2.17-21mdk #1 Thu Oct 5 13:16:08 CEST 2000 i686 unknown
>
> -Fred
>
> --
> Fred L. Drake, Jr.
> PythonLabs at Digital Creations > > > _______________________________________________ > Python-Dev mailing list > Python-Dev@python.org > http://mail.python.org/mailman/listinfo/python-dev -- "If all the nations in the world are in debt, where did all the money go? "-Steven Wright From mal@lemburg.com Tue Jun 19 17:55:47 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Tue, 19 Jun 2001 18:55:47 +0200 Subject: [Python-Dev] Python & GCC 3.0 References: <15151.21150.214238.429130@cj42289-a.reston1.va.home.com> <15151.24462.400930.295658@anthem.wooz.org> <15151.27332.418303.872171@cj42289-a.reston1.va.home.com> Message-ID: <3B2F8413.77F40494@lemburg.com> "Fred L. Drake, Jr." wrote: > > Barry A. Warsaw writes: > > It'd be interesting to see if there are any performance > > improvements, esp. on x86 boxen. > > GCC 2.95.3: > > cj42289-a(.../python/linux); time ./python -tt ../Lib/test/pystone.py > Pystone(1.1) time for 10000 passes = 1.58 > This machine benchmarks at 6329.11 pystones/second > 1.66user 0.01system 0:03.40elapsed 49%CPU (0avgtext+0avgdata 0maxresident)k > 0inputs+0outputs (280major+241minor)pagefaults 0swaps > > GCC 3.0: > > cj42289-a(.../python/linux); cd ../linux-gcc-3.0/ > cj42289-a(.../python/linux-gcc-3.0); time ./python -tt ../Lib/test/pystone.py > Pystone(1.1) time for 10000 passes = 1.65 > This machine benchmarks at 6060.61 pystones/second > 1.77user 0.01system 0:03.52elapsed 50%CPU (0avgtext+0avgdata 0maxresident)k > 0inputs+0outputs (307major+239minor)pagefaults 0swaps > > There is a little variation with multiple run, but it varies less than > 5% from the numbers above. Bumping up the LOOPS constant in > pystone.py changes the numbers a small bit, but the relationship > remains constant. > > This is one a Linux-Mandrake 7.2 installation with non-cooker updates > installed, and still using the Linux 2.2 kernel: > > cj42289-a(.../python/linux-gcc-3.0); uname -a > Linux cj42289-a.reston1.va.home.com 2.2.17-21mdk #1 Thu Oct 5 13:16:08 CEST 2000 i686 unknown Note that if you really want to see a speedup for x86 boxes then you should take a look at PGCC, the Pentium GCC compiler group: http://www.goof.com/pcg/ You can then adjust the compiler to various x86 CPUs and take advantage of some special optimizations they have intergrated into 2.95.2.1. -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From skip@pobox.com (Skip Montanaro) Tue Jun 19 18:44:47 2001 From: skip@pobox.com (Skip Montanaro) (Skip Montanaro) Date: Tue, 19 Jun 2001 12:44:47 -0500 Subject: [Python-Dev] example of module interface to a varargs function? Message-ID: <15151.36751.406758.577420@beluga.mojam.com> I am trying to add a module interface to some of the bits missing from PyGtk2. Some functions I'm interested in have varargs signatures, e.g.: void gtk_binding_entry_add_signal (GtkBindingSet *binding_set, guint keyval, guint modifiers, const gchar *signal_name, guint n_args, ...) >From Python, this would be called something like bs = gtk.GtkBindingSet("somename") bs.add_signal(gtk.GDK.K_Up, 0, "scroll_vertical", gtk.TYPE_ENUM, gtk.SCROLL_STEP_BACKWARD, gtk.TYPE_FLOAT, 0.0) with n_args inferred from the number of arguments following the "scroll_vertical" parameter. I'm a bit stumped how to handle this with the PyArg_Parse* routines. (I'll worry about calling gtk_binding_entry_add_signal after I figure out how to marshal the args.) 
The only place in the standard modules I saw that processed a truly arbitrary number of arguments is the struct_pack method of the struct module, and it doesn't use PyArg_Parse* to process them. Can someone point me to an example of marshalling arbitrary numbers of arguments then calling a varargs function? Thx, Skip From fdrake@acm.org Tue Jun 19 20:04:18 2001 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Tue, 19 Jun 2001 15:04:18 -0400 (EDT) Subject: [Python-Dev] Python & GCC 3.0 In-Reply-To: <87vglsbfy5.fsf@cgsoftware.com> References: <15151.21150.214238.429130@cj42289-a.reston1.va.home.com> <15151.24462.400930.295658@anthem.wooz.org> <15151.27332.418303.872171@cj42289-a.reston1.va.home.com> <3B2F8413.77F40494@lemburg.com> <87vglsbfy5.fsf@cgsoftware.com> Message-ID: <15151.41522.200832.655534@cj42289-a.reston1.va.home.com> Daniel Berlin writes: > Except, I bet you didn't use one of the "optimize for a given cpu" > switches. No, I hadn't. My main interest was in the GCC team's claim that the generated code was faster. Compiling with "make OPT='-mcpu=i686 -O3'" did not make much difference at all. M.-A. Lemburg writes: > Note that if you really want to see a speedup for x86 boxes then > you should take a look at PGCC, the Pentium GCC compiler group: > > http://www.goof.com/pcg/ > > You can then adjust the compiler to various x86 CPUs and > take advantage of some special optimizations they have intergrated > into 2.95.2.1. If they have any improved optimizations for recent x86 chips, I'd like to see them folded into GCC. I'd hate to see another egcs-style split. It doesn't look like I can just download a single source package from them and wait 3 hours for it to build, so I won't plan on pursuing this further. -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From tim@digicool.com Tue Jun 19 20:14:10 2001 From: tim@digicool.com (Tim Peters) Date: Tue, 19 Jun 2001 15:14:10 -0400 Subject: [Python-Dev] Python & GCC 3.0 In-Reply-To: <15151.27332.418303.872171@cj42289-a.reston1.va.home.com> Message-ID: [Fred L. Drake, Jr.] > GCC 2.95.3: > This machine benchmarks at 6329.11 pystones/second > ... > GCC 3.0: > This machine benchmarks at 6060.61 pystones/second > ... > This is one a Linux-Mandrake 7.2 installation with non-cooker updates > installed, and still using the Linux 2.2 kernel: > > cj42289-a(.../python/linux-gcc-3.0); uname -a > Linux cj42289-a.reston1.va.home.com 2.2.17-21mdk #1 Thu Oct 5 > 13:16:08 CEST 2000 i686 unknown This is a good place to note that the single biggest "easy win" for pystone is to run it with -O (that is, Python's -O). Yields a 10% boost on Fred's box, and about 7% on MSVC6+Win2K. pystone is more sensitive to -O than most "real Python apps", probably because it's masses of very simple operations on scalar types -- no real classes, no dicts, no lists except to simulate fixed-size C arrays, lots of globals, and so on. The dynamic frequency of SET_LINENO is high, and the avg work per other opcode is low. OTOH, that's typical of *some* Python apps, and typical of *parts* of almost all Python apps. So it would be worth getting ridding of SET_LINENO even in non- -O runs. Note that SET_LINENO isn't needed to get correct line numbers in tracebacks (and hasn't been needed for years), it's "just" there to support tracing now. Vladimir had what looked to be a workable scheme for doing that a different way, and that would be a cool project for someone to revive (IMO -- Guido's may differ, but he's too busy to notice what we're doing ). 
From michel@digicool.com Tue Jun 19 20:12:14 2001 From: michel@digicool.com (Michel Pelletier) Date: Tue, 19 Jun 2001 12:12:14 -0700 (PDT) Subject: [Python-Dev] Anyone else seeing test_struct fail? In-Reply-To: <3B2E7277.D6109E7E@per.dem.csiro.au> Message-ID:

On Tue, 19 Jun 2001, Mark Favas wrote: > Also, the current imap.py causes "make test" (test___all__ and > test_sundry) to fail with: "exceptions.TabError: inconsistent use of > tabs and spaces in indentation (imaplib.py, line 576)" - untested > checkin ? I submitted a patch right on this line the other day that Guido applied, but I tested it and neither test___all__ nor test_sundry fails for me today. -Michel

From mal@lemburg.com Tue Jun 19 20:28:14 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Tue, 19 Jun 2001 21:28:14 +0200 Subject: [Python-Dev] Python & GCC 3.0 References: <15151.21150.214238.429130@cj42289-a.reston1.va.home.com> <15151.24462.400930.295658@anthem.wooz.org> <15151.27332.418303.872171@cj42289-a.reston1.va.home.com> <3B2F8413.77F40494@lemburg.com> <87vglsbfy5.fsf@cgsoftware.com> <15151.41522.200832.655534@cj42289-a.reston1.va.home.com> Message-ID: <3B2FA7CE.DD1602F7@lemburg.com>

"Fred L. Drake, Jr." wrote: > > Daniel Berlin writes: > > Except, I bet you didn't use one of the "optimize for a given cpu" > > switches. > > No, I hadn't. My main interest was in the GCC team's claim that the > generated code was faster. Compiling with "make OPT='-mcpu=i686 -O3'" > did not make much difference at all. > > M.-A. Lemburg writes: > > Note that if you really want to see a speedup for x86 boxes then > > you should take a look at PGCC, the Pentium GCC compiler group: > > > > http://www.goof.com/pcg/ > > > > You can then adjust the compiler to various x86 CPUs and > > take advantage of some special optimizations they have integrated > > into 2.95.2.1. > > If they have any improved optimizations for recent x86 chips, I'd > like to see them folded into GCC. I'd hate to see another egcs-style > split. > It doesn't look like I can just download a single source package > from them and wait 3 hours for it to build, so I won't plan on > pursuing this further. Oh, it's fairly easy to get a pgcc compiler: all you have to do is apply their small set of patches to the gcc source before compiling it. And then you should set your OPT environment variable to e.g. OPT="-g -O3 -Wall -Wstrict-prototypes -mcpu=k6". This will cause the pgcc compiler to use these settings in pretty much all compiles you ever do without having to think about it every time. -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/

From tim@digicool.com Tue Jun 19 20:36:41 2001 From: tim@digicool.com (Tim Peters) Date: Tue, 19 Jun 2001 15:36:41 -0400 Subject: [Python-Dev] Anyone else seeing test_struct fail? In-Reply-To: Message-ID:

[Michel Pelletier] > I submitted a patch right on this line the other day that Guido applied, > but I tested it and neither test___all__ nor test_sundry fails for me > today. Not to worry! I fixed all this stuff yesterday. imaplib.py had an ambiguous mix of hard tabs and spaces, which Guido "should have" caught before checking in, and that Python itself complained about when run with -tt (which is how Mark ran the test suite). There's no problem anymore.
From nas@python.ca Tue Jun 19 21:37:18 2001 From: nas@python.ca (Neil Schemenauer) Date: Tue, 19 Jun 2001 13:37:18 -0700 Subject: [Python-Dev] Python & GCC 3.0 In-Reply-To: <15151.41522.200832.655534@cj42289-a.reston1.va.home.com>; from fdrake@acm.org on Tue, Jun 19, 2001 at 03:04:18PM -0400 References: <15151.21150.214238.429130@cj42289-a.reston1.va.home.com> <15151.24462.400930.295658@anthem.wooz.org> <15151.27332.418303.872171@cj42289-a.reston1.va.home.com> <3B2F8413.77F40494@lemburg.com> <87vglsbfy5.fsf@cgsoftware.com> <15151.41522.200832.655534@cj42289-a.reston1.va.home.com> Message-ID: <20010619133718.A14814@glacier.fnational.com> Fred L. Drake, Jr. wrote: > Compiling with "make OPT='-mcpu=i686 -O3'" did not make much > difference at all. Try OPT="-m486 -O2". That gave me the best results last time I played with this stuff. > If they have any improved optimizations for recent x86 chips, I'd > like to see them folded into GCC. I'd hate to see another egcs-style > split. Some people say you should avoid PGCC since it generates buggy code. I don't know if that's true or not. Neil From thomas@xs4all.net Tue Jun 19 22:04:46 2001 From: thomas@xs4all.net (Thomas Wouters) Date: Tue, 19 Jun 2001 23:04:46 +0200 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Lib/test test_mailbox.py,1.5,1.6 In-Reply-To: Message-ID: <20010619230446.E8098@xs4all.nl> On Tue, Jun 19, 2001 at 01:20:07PM -0700, Jack Jansen wrote: > The test used int(time.time()) to get a random number, but this doesn't > work on the mac (where times are bigger than ints). Changed to > int(time.time()%1000000). Doesn't int(time.time()%sys.maxint) make more sense ? At least you won't be degrading the sequentiality of this particularly unrandom random number on platforms where ints really are big enough to hold times :) -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From loewis@informatik.hu-berlin.de Tue Jun 19 22:25:26 2001 From: loewis@informatik.hu-berlin.de (Martin von Loewis) Date: Tue, 19 Jun 2001 23:25:26 +0200 (MEST) Subject: [Python-Dev] example of module interface to a varargs function? Message-ID: <200106192125.XAA27631@pandora.informatik.hu-berlin.de> > The only place in the standard modules I saw that processed a truly > arbitrary number of arguments is the struct_pack method of the > struct module, and it doesn't use PyArg_Parse* to process them. Can > someone point me to an example of marshalling arbitrary numbers of > arguments then calling a varargs function? In a true varargs function, you cannot use PyArg_Parse*. Instead, you have to iterate over the argument tuple with PyTuple_GetItem, fetching one argument after another. Another example of such a function is builtin_max. > (I'll worry about calling gtk_binding_entry_add_signal after I > figure out how to marshal the args.) I'd worry about this first: In C, it is not possible to call a true varargs function in a portable way if the caller doesn't statically (i.e. in source code) know the number of arguments. Only the callee can be variable, not the caller. A slight exception is that you are allowed to pass-through va_list objects from one function to another. However, that requires that the callee expects a va_list argument, i.e. is not a varargs function, plus there is no portable way to create a va_list object from scratch. 
If you absolutely need to call such a function, you can use the Cygnus libffi library, which, for a number of microprocessors and C ABIs, allows calling arbitrary function pointers. However, I'd rather recommend looking for alternatives to gtk_binding_entry_add_signal. E.g. gtk_binding_entry_add_signall accepts a GSList*, which is a chained list of arguments, instead of being varargs. This you can call in a C module - the other one is out of reach. Regards, Martin

From skip@pobox.com (Skip Montanaro) Tue Jun 19 22:32:50 2001 From: skip@pobox.com (Skip Montanaro) (Skip Montanaro) Date: Tue, 19 Jun 2001 16:32:50 -0500 Subject: [Python-Dev] Python & GCC 3.0 In-Reply-To: <20010619133718.A14814@glacier.fnational.com> References: <15151.21150.214238.429130@cj42289-a.reston1.va.home.com> <15151.24462.400930.295658@anthem.wooz.org> <15151.27332.418303.872171@cj42289-a.reston1.va.home.com> <3B2F8413.77F40494@lemburg.com> <87vglsbfy5.fsf@cgsoftware.com> <15151.41522.200832.655534@cj42289-a.reston1.va.home.com> <20010619133718.A14814@glacier.fnational.com> Message-ID: <15151.50434.297860.277726@beluga.mojam.com>

Neil> Some people say you should avoid PGCC since it generates buggy Neil> code. I don't know if that's true or not. If nothing else, PGCC almost certainly gets a lot less exercise than the mainstream GCC code. Given the statement in the PGCC FAQ that typical speedups are in the range of 5%: http://www.goof.com/pcg/pgcc-faq.html#SEC0119 it doesn't seem like it would be worth the effort to use it in any critical applications. Better to just wait for PGCC optimizations to trickle into GCC itself. Skip

From jack@oratrix.nl Tue Jun 19 22:56:43 2001 From: jack@oratrix.nl (Jack Jansen) Date: Tue, 19 Jun 2001 23:56:43 +0200 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Lib/test test_mailbox.py,1.5,1.6 In-Reply-To: Message by Thomas Wouters , Tue, 19 Jun 2001 23:04:46 +0200 , <20010619230446.E8098@xs4all.nl> Message-ID: <20010619215648.B2A7CE267B@oratrix.oratrix.nl>

Recently, Thomas Wouters said: > On Tue, Jun 19, 2001 at 01:20:07PM -0700, Jack Jansen wrote: > > > The test used int(time.time()) to get a random number, but this doesn't > > work on the mac (where times are bigger than ints). Changed to > > int(time.time()%1000000). > > Doesn't int(time.time()%sys.maxint) make more sense ? At least you won't be > degrading the sequentiality of this particularly unrandom random number on > platforms where ints really are big enough to hold times :) I think the last sentence should be "... platforms where time before 1970 doesn't exist so they can fit it in a measly 32 bits":-) But anyway: I haven't a clue whether the sequentiality is important, it doesn't really seem to be from a quick glance. If you want to fix it: allez votre corridor. -- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen@oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm

From skip@pobox.com (Skip Montanaro) Tue Jun 19 23:01:13 2001 From: skip@pobox.com (Skip Montanaro) (Skip Montanaro) Date: Tue, 19 Jun 2001 17:01:13 -0500 Subject: [Python-Dev] Re: example of module interface to a varargs function?
In-Reply-To: <200106192125.XAA27631@pandora.informatik.hu-berlin.de> References: <200106192125.XAA27631@pandora.informatik.hu-berlin.de> Message-ID: <15151.52137.623119.852524@beluga.mojam.com>

>> The only place in the standard modules I saw that processed a truly >> arbitrary number of arguments is the struct_pack method of the struct >> module, and it doesn't use PyArg_Parse* to process them. Can someone >> point me to an example of marshalling arbitrary numbers of arguments >> then calling a varargs function? Martin> In a true varargs function, you cannot use PyArg_Parse*. Martin> Instead, you have to iterate over the argument tuple with Martin> PyTuple_GetItem, fetching one argument after another. I think it would be nice if PyArg_ParseTuple and friends took a "*" format character. It would only be useful at the end of a format string, but would allow the generic argument parsing machinery to be used for those arguments that precede it. The argument it writes into would be an int, which would represent the offset of the first argument not processed by PyArg_ParseTuple. Reusing my example:

    void gtk_binding_entry_add_signal (GtkBindingSet *binding_set,
                                       guint keyval,
                                       guint modifiers,
                                       const gchar *signal_name,
                                       guint n_args,
                                       ...)

If I had a Python module wrapper function for this it might call PyArg_ParseTuple as

    PyArg_ParseTuple(args, "iis*", &keyval, &modifiers, &signal_name, &offset);

Processing of the rest of the argument list would be the responsibility of the author and start at args[offset]. >> (I'll worry about calling gtk_binding_entry_add_signal after I figure >> out how to marshal the args.) Martin> I'd worry about this first: In C, it is not possible to call a Martin> true varargs function in a portable way if the caller doesn't Martin> statically (i.e. in source code) know the number of Martin> arguments. Only the callee can be variable, not the caller. Understood. It turns out that the function I used as an example is actually only called in a few distinct ways. I can analyze its var-arguments fairly easily and dispatch to the appropriate call to the underlying function. Martin> However, I'd rather recommend looking for alternatives to Martin> gtk_binding_entry_add_signal. E.g. gtk_binding_entry_add_signall Martin> accepts a GSList*, which is a chained list of arguments, instead Martin> of being varargs. This you can call in a C module - the other Martin> one is out of reach. Hmm... thanks, this does look like the correct solution. I failed to notice the distinction between the two functions when I first scanned the source code: the signall (two-els) version is never called outside of gtkbindings.c, the Gtk documentation in this area is, well, rather sparse, to say the least (nine comments over 1200 lines of code, the only two substantial ones of which are boilerplate at the top), and there is no reference manual documentation for any of the interesting functions. By comparison, the Python documentation looks as if Guido has employed a team of full-time tech writers for years. Way to go, Fred! Skip

From nas@python.ca Tue Jun 19 23:12:49 2001 From: nas@python.ca (Neil Schemenauer) Date: Tue, 19 Jun 2001 15:12:49 -0700 Subject: [Python-Dev] OS timer and profiling Python code Message-ID: <20010619151249.A15126@glacier.fnational.com>

On x86 hardware the Linux timer runs at 100 Hz by default. On modern hardware that is probably much too slow to accurately profile programs using the Python profiler.
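One quick way to see the granularity Neil is talking about from Python itself -- a sketch added for illustration, not part of the original message -- is to busy-wait until the process's CPU clock visibly advances:

    import os

    def cpu_tick():
        # Spin until user CPU time moves; the difference is the
        # kernel timer's granularity (1/HZ, i.e. 0.01s at 100 Hz).
        t0 = os.times()[0]
        while 1:
            t1 = os.times()[0]
            if t1 != t0:
                return t1 - t0

    print "CPU timer granularity: %g seconds" % cpu_tick()

Anything the profiler times that runs for much less than this value is essentially invisible to it.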
Changing the value in include/asm-i386/param.h from 100 to 1024 and recompiling the kernel made a huge difference for me. Perhaps we should include a note in the profiler documentation. I'm not sure if this affects gprof as well but I suspect it does. Neil

From moshez@zadka.site.co.il Wed Jun 20 06:31:23 2001 From: moshez@zadka.site.co.il (Moshe Zadka) Date: Wed, 20 Jun 2001 08:31:23 +0300 Subject: [Python-Dev] Moshe In-Reply-To: <20010618163512.D8098@xs4all.nl> References: <20010618163512.D8098@xs4all.nl> Message-ID:

On Mon, 18 Jun 2001 16:35:12 +0200, Thomas Wouters wrote: > Just FYI: Moshe has been sighted, alive and well. He's been caught up in > personal matters, apparently. He apologized and said he'd mail python-dev > with an update soonish. Yes, indeed, and soonish got sorta delayed too... Anyway, I am alive and well, and the bad guys will have to do better than 300m to get me in an explosion ;-) Anyway, I'm terribly sorry for disappearing - my personal life caught up with me and stuff. I'm now trying to catch up with everything. Thanks to whoever took 2.0.1 from where I left off and kept it going. -- "I'll be ex-DPL soon anyway so I'm |LUKE: Is Perl better than Python? looking for someplace else to grab power."|YODA: No...no... no. Quicker, -- Wichert Akkerman (on debian-private)| easier, more seductive. For public key, finger moshez@debian.org |http://www.{python,debian,gnu}.org

From greg@cosc.canterbury.ac.nz Wed Jun 20 06:55:28 2001 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Wed, 20 Jun 2001 17:55:28 +1200 Subject: [Python-Dev] Suggested amendment to PEP 255 References: Message-ID: <3B303AD0.1884E173@cosc.canterbury.ac.nz>

Tim Peters wrote: > > Who would this help? Seriously. There's nothing special about a generator > to a caller, except that it returns an object that implements the iterator > interface. What matters to the caller is irrelevant here. We're talking about what matters to someone writing or reading the implementation. To those people, there is a VERY big difference between a regular function and a generator-function -- about as big as the difference between a class and a function! In fact, a generator-function is in many ways much more like a class than a function. Calling a generator-function doesn't execute any of the code in its body; instead, it creates an instance of the generator, much like calling a class creates an instance of the class. Calling them "generator classes" and "generator instances" would perhaps be more appropriate, and more suggestive of the way they actually behave. The more I think about this, the more I agree with those who say that overloading the function-definition syntax for defining generators is a bad idea. It seems to make about as much sense as saying that there shouldn't be any special syntax for defining a class -- the header of a class definition should look exactly like a function definition, and to tell the difference you have to look for some subtle clue further down. I suggest dropping the "def" altogether and using:

    generator foo(args):
        ...
        yield x
        ...

Right from the word go, this says loudly and clearly that this thing is *not* a function, it's something else. If you haven't come across generators before, you go and look in the manual to find out what it means. There you're told something like

    Executing a generator statement creates a special callable object
    called a generator. Calling a generator creates a generator-instance,
    which is an iterator object...

    [...stuff about the "yield" statement...]
I think this is going to be easier to document and lead to much less confusion than trying to explain the magic going on when you call something that looks for all the world like a function and it doesn't execute any of the code in it. Explicit is better than implicit! -- Greg Ewing, Computer Science Dept, University of Canterbury, Christchurch, New Zealand To get my email address, please visit my web page: http://www.cosc.canterbury.ac.nz/~greg

From greg@cosc.canterbury.ac.nz Wed Jun 20 07:17:09 2001 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Wed, 20 Jun 2001 18:17:09 +1200 Subject: [Python-Dev] Re: PEP 255: Simple Generators References: Message-ID: <3B303FE5.735A5FDC@cosc.canterbury.ac.nz>

Tim Peters wrote: > > This is like saying that functions returning integers should be declared > "defint" instead, or some such gibberish. Not the same thing. If a function returns an integer, somewhere in it or in something that it calls there is a piece of code that explicitly creates an integer. But under PEP 255, there is *nothing* anywhere in the code that you can point to and say "look, here is where the generator-iterator is created!" Instead, it happens implicitly at some point just after the generator-function is called, but before any of its code is executed. You could say that the same thing is true when you call a class object -- creation of the instance happens implicitly before __init__ is called. But there is no secret made of the fact that classes are not functions, and there is nothing in the syntax to lead you to believe that they behave like functions. In contrast, the proposed generator syntax makes generators look so nearly like functions that their actual behaviour, once you get your head around it, seems quite bizarre. I just think it's going to lead to a lot of confusion and misunderstanding, among newcomers especially. -- Greg Ewing, Computer Science Dept, University of Canterbury, Christchurch, New Zealand To get my email address, please visit my web page: http://www.cosc.canterbury.ac.nz/~greg

From greg@cosc.canterbury.ac.nz Wed Jun 20 07:28:13 2001 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Wed, 20 Jun 2001 18:28:13 +1200 Subject: [Python-Dev] Re: PEP 255: Simple Generators References: <3B2EEF5A.8FF0DAFB@cosc.canterbury.ac.nz> Message-ID: <3B30427D.5A90DDE7@cosc.canterbury.ac.nz>

Olaf Delgado Friedrichs wrote:
>
> If I understand correctly, this should work:
>
> def f():
>     for i in range(5):
>         for x in g(i):
>             yield x
>
> def g(i):
>     for j in range(10):
>         yield i,j

Yes, I realised that shortly afterwards. But I think we're going to get a lot of questions from newcomers who have tried to implicitly nest iterators and are very confused about why it doesn't work and what needs to be done to make it work. An explicit generator definition syntax would help here, I think. First of all, it would be a syntax error to use "yield" outside of a generator definition, so they would be forced to declare the inner one as a generator. Then, if they neglected to make the outer one a generator too, it would look like this:

    def f():
        for i in range(5):
            g(i)

    generator g(i):
        for j in range(10):
            yield i,j

from which it is glaringly obvious that f() is NOT a generator, and therefore can't be used as one.
-- Greg Ewing, Computer Science Dept, University of Canterbury, Christchurch, New Zealand To get my email address, please visit my web page: http://www.cosc.canterbury.ac.nz/~greg

From loewis@informatik.hu-berlin.de Wed Jun 20 11:27:30 2001 From: loewis@informatik.hu-berlin.de (Martin von Loewis) Date: Wed, 20 Jun 2001 12:27:30 +0200 (MEST) Subject: [Python-Dev] Re: example of module interface to a varargs function? In-Reply-To: <15151.52137.623119.852524@beluga.mojam.com> (message from Skip Montanaro on Tue, 19 Jun 2001 17:01:13 -0500) References: <200106192125.XAA27631@pandora.informatik.hu-berlin.de> <15151.52137.623119.852524@beluga.mojam.com> Message-ID: <200106201027.MAA06782@pandora.informatik.hu-berlin.de>

> I think it would be nice if PyArg_ParseTuple and friends took a "*" format > character. It would only be useful at the end of a format string, but would > allow the generic argument parsing machinery to be used for those arguments > that precede it. Now I understand. Yes, that would be useful, but apparently it hasn't been needed often enough so far for somebody to ask for it. Regards, Martin

From aahz@rahul.net Wed Jun 20 14:00:08 2001 From: aahz@rahul.net (Aahz Maruch) Date: Wed, 20 Jun 2001 06:00:08 -0700 (PDT) Subject: [Python-Dev] Suggested amendment to PEP 255 In-Reply-To: <3B303AD0.1884E173@cosc.canterbury.ac.nz> from "Greg Ewing" at Jun 20, 2001 05:55:28 PM Message-ID: <20010620130008.7880D99C88@waltz.rahul.net>

Greg Ewing wrote: > > I suggest dropping the "def" altogether and using: > > generator foo(args): > ... > yield x > ... +2 -- --- Aahz (@pobox.com) Hugs and backrubs -- I break Rule 6 <*> http://www.rahul.net/aahz/ Androgynous poly kinky vanilla queer het Pythonista I don't really mind a person having the last whine, but I do mind someone else having the last self-righteous whine.

From nas@python.ca Wed Jun 20 15:28:20 2001 From: nas@python.ca (Neil Schemenauer) Date: Wed, 20 Jun 2001 07:28:20 -0700 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Python ceval.c,2.250,2.251 In-Reply-To: ; from tim_one@users.sourceforge.net on Tue, Jun 19, 2001 at 11:57:34PM -0700 References: Message-ID: <20010620072820.A16584@glacier.fnational.com>

Tim Peters wrote: > gen_iternext(): repair subtle refcount problem. > NeilS, please check! This came from staring at your genbug.py, but I'm > not sure it plugs all possible holes. Without this, I caught a > frameobject refcount going negative, and it was also the cause (in debug > build) of _Py_ForgetReference's attempt to forget an object with already- > NULL _ob_prev and _ob_next pointers -- although I'm still not entirely > sure how! Doesn't this cause a memory leak? f_back is INCREFed in PyFrame_New. There are other problems lurking here as well.

    def f():
        try:
            yield 1
        finally:
            print "finally"

    def h():
        g = f()
        g.next()

    while 1:
        h()

The above code leaks memory like mad, with or without your change. Also, the finally clause is never executed although it probably should be. My feeling is that the reference counting of f_back should be done by ceval and not by the frame object. The problem with the finally clause is another ball of wax. I think it's fixable though. I'll look at it closer this evening. Neil

From tim.one@home.com Wed Jun 20 15:28:19 2001 From: tim.one@home.com (Tim Peters) Date: Wed, 20 Jun 2001 10:28:19 -0400 Subject: [Python-Dev] Suggested amendment to PEP 255 In-Reply-To: <3B303AD0.1884E173@cosc.canterbury.ac.nz> Message-ID:

[Greg Ewing] > ... Why is this on Python-Dev?
The PEP announcement specifically asked for discussion to occur on the Iterators list, and specifically asked to keep it *off* of Python-Dev. I've been playing along with people who wanted to discuss it on c.l.py instead, as finite time allows, but no way does the discussion belong here.

From arigo@ulb.ac.be Wed Jun 20 15:30:49 2001 From: arigo@ulb.ac.be (Armin Rigo) Date: Wed, 20 Jun 2001 16:30:49 +0200 (MET DST) Subject: [Python-Dev] Suggested amendment to PEP 255 In-Reply-To: <3B303AD0.1884E173@cosc.canterbury.ac.nz> Message-ID:

Hi, On Wed, 20 Jun 2001, Greg Ewing wrote: > I suggest dropping the "def" altogether and using: > > generator foo(args): > ... > yield x > ... Nice idea. We might even think about dropping the 'yield' keyword altogether and using 'return' instead (although I'm not quite sure it is a good idea; I'm just suggesting it with a personal -0.5). A bientot, Armin.

From tim.one@home.com Wed Jun 20 15:41:13 2001 From: tim.one@home.com (Tim Peters) Date: Wed, 20 Jun 2001 10:41:13 -0400 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Python ceval.c,2.250,2.251 In-Reply-To: <20010620072820.A16584@glacier.fnational.com> Message-ID:

[Neil Schemenauer] > Doesn't this cause a memory leak? f_back is INCREFed in > PyFrame_New. There are other problems lurking here as well. > ... Our msgs crossed in the mail. Unfortunately, I have to get off email now and probably won't get on again before this evening. Tracebacks appear to be a potential problem too ... we'll-reinvent-stackless-before-this-is-over<0.9-wink>-ly y'rs - tim

From barry@digicool.com Wed Jun 20 17:35:49 2001 From: barry@digicool.com (Barry A. Warsaw) Date: Wed, 20 Jun 2001 12:35:49 -0400 Subject: [Python-Dev] Suggested amendment to PEP 255 References: <3B303AD0.1884E173@cosc.canterbury.ac.nz> Message-ID: <15152.53477.212348.243592@anthem.wooz.org>

>>>>> "GE" == Greg Ewing writes: GE> What matters to the caller is irrelevant here. We're talking GE> about what matters to someone writing or reading the GE> implementation. To those people, there is a VERY big GE> difference between a regular function and a GE> generator-function -- about as big as the difference GE> between a class and a function! GE> In fact, a generator-function is in many ways much more GE> like a class than a function. Calling a generator-function GE> doesn't execute any of the code in its body; instead, it GE> creates an instance of the generator, much like calling GE> a class creates an instance of the class. Calling them GE> "generator classes" and "generator instances" would GE> perhaps be more appropriate, and more suggestive of the GE> way they actually behave. Thanks Greg, I think you've captured perfectly my discomfort with the proposal. I'm fine with return being "special" inside a generator, along with most of the other details of the pep. But it bugs me that the semantics of calling the thing created by `def' is different depending on some statement embedded deep in the body of the code. Think about it from a teaching perspective: You're taught that def creates a function, perhaps called foo. You know that calling foo starts execution at the first line in the function block. You know you can put a print statement on the first line and it will print something out when the function is called. You know that you can set a debugger break point at foo's first line and when you call the function, the debugger will leave you on that first line of code. But all that changes with a generator!
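A minimal sketch of the behavior Barry is describing (using the `from __future__ import generators` form the gen-branch requires; the session is illustrative, not from the original message):

    from __future__ import generators

    def foo():
        print "first line of foo"   # a newbie expects this on call
        yield 1

    g = foo()        # nothing is printed: the body hasn't started
    print type(g)    # not an int, not None -- a generator-iterator
    print g.next()   # only now does "first line of foo" appear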
My print statement isn't executed when I call the function... how weird! Hey, the debugger doesn't even break on the line when I call the function. Okay, maybe it's some /other/ foo my program is really calling. So let's hunt around for other possible foo's that my program might be calling. Hmm, no dice there. Now I'm really confused because I haven't gotten to the chapter that says "Now that you know all about functions, forget most of that if you find a yield statement in the body of the function, because it's a special kind of function called a generator. Calling such a special function doesn't execute any code, it just instantiates a built-in object called a generator object. To get any of the generator's code to execute, you have to call the generator object's next() method." Further, I print out the type of the object returned by calling foo and I see it's a <generator object>. Okay, so now let me search foo for a return statement. Because I know about functions, and I know that the returned object isn't None, I know that the function isn't falling off the end. So there must be a return statement that explicitly returns a generator object (whatever that is). Hmm, nope, there's just a bare return sitting there. That's damn confusing. I wonder what those yield statements are doing. Well, I look those up in my book's index and I see that's described in chapter 57, which I haven't gotten to yet. Besides, those yields clearly have integers after them, so that can't be it. So how the heck do I get a generator object by calling this function??? You'll counter that the "search for yield to find out if the function is special" is a simple rule that, once learned, is easily remembered. I'll counter that it's harder for me to do an Isearch in XEmacs to find out what kind of thing foo is. :) To me, it's just bad mojo to have the behavior of the thing created by `def' determined by what's embedded in the body of the program. I don't buy the defint argument, because by searching for a return statement in the function, you can find out exactly what is being returned when the function is called. Not so with a generator. My vote is for a "generator" keyword to introduce the code block of a generator. Makes perfect sense to me, and it will be a strong indication to anybody reading my code that something special is going on. And something special /is/ going on! An informal poll of PythonLabs indicates a split on this subject, perhaps setting Jeremy up as a Sandra Day O'Connor swing vote. But who said this was a democracy anyway? :) somewhat-like-my-own-country-of-origin-ly y'rs, -Barry

From tim@digicool.com Wed Jun 20 17:42:00 2001 From: tim@digicool.com (Tim Peters) Date: Wed, 20 Jun 2001 12:42:00 -0400 Subject: [Python-Dev] Suggested amendment to PEP 255 In-Reply-To: <15152.53477.212348.243592@anthem.wooz.org> Message-ID:

Please keep this off Python-Dev. Paul Prescod has already fwd'ed Greg's msg to the Iterators list, and let's keep it there.

From fredrik@pythonware.com Wed Jun 20 17:54:22 2001 From: fredrik@pythonware.com (Fredrik Lundh) Date: Wed, 20 Jun 2001 18:54:22 +0200 Subject: [Python-Dev] Suggested amendment to PEP 255 References: <3B303AD0.1884E173@cosc.canterbury.ac.nz> <15152.53477.212348.243592@anthem.wooz.org> Message-ID: <006d01c0f9a9$a879fcd0$4ffa42d5@hagrid>

barry wrote: > My vote is for a "generator" keyword to introduce the code block of a > generator. Makes perfect sense to me, and it will be a strong > indication to anybody reading my code that something special is going > on.
> And something special /is/ going on!

agreed. +1 on generator instead of def. (and +0 on suspend instead of yield, but that's me) Cheers /F

From jeremy@alum.mit.edu Wed Jun 20 18:25:05 2001 From: jeremy@alum.mit.edu (Jeremy Hylton) Date: Wed, 20 Jun 2001 13:25:05 -0400 Subject: [Python-Dev] Suggested amendment to PEP 255 In-Reply-To: Message-ID:

Why can't we discuss Python development on python-dev? please-take-replies-to-python-dev-meta-ly y'rs, Jeremy -----Original Message----- From: python-dev-admin@python.org [mailto:python-dev-admin@python.org]On Behalf Of Tim Peters Sent: Wednesday, June 20, 2001 12:42 PM To: Barry A. Warsaw Cc: python-dev@python.org Subject: RE: [Python-Dev] Suggested amendment to PEP 255 Please keep this off Python-Dev. Paul Prescod has already fwd'ed Greg's msg to the Iterators list, and let's keep it there. _______________________________________________ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev

From tim@digicool.com Wed Jun 20 19:28:17 2001 From: tim@digicool.com (Tim Peters) Date: Wed, 20 Jun 2001 14:28:17 -0400 Subject: [Python-Dev] Suggested amendment to PEP 255 In-Reply-To: Message-ID:

[Jeremy Hylton] > Why can't we discuss Python development on python-dev? You can, but without me in this case. The arguments aren't new (they were discussed on the Iterators list before the PEP was posted), and I don't have time to repeat them on (now three) different forums. The PEP announcement clearly said discussion belonged on the Iterators list, specifically asked that it stay off of Python-Dev, and the PEP Discussion-To field (which I assume Barry filled in -- I did not) reads Discussion-To: python-iterators@lists.sourceforge.net If you want a coherent historic record (I do), that's where this belongs.

From aahz@rahul.net Wed Jun 20 19:37:49 2001 From: aahz@rahul.net (Aahz Maruch) Date: Wed, 20 Jun 2001 11:37:49 -0700 (PDT) Subject: [Python-Dev] Suggested amendment to PEP 255 In-Reply-To: from "Jeremy Hylton" at Jun 20, 2001 01:25:05 PM Message-ID: <20010620183749.B419E99C82@waltz.rahul.net>

Jeremy Hylton wrote: > > Why can't we discuss Python development on python-dev? I'm split on this issue. I understand why Tim wants to have the discussion corralled into a single place; it's also a moderate inconvenience to have to add another mailing list every time a "critical" issue comes up. I think the best compromise is to follow the rules currently in existence for the PEP process, and if one doesn't wish to subscribe to another mailing list, e-mail one's feedback to the PEP author directly and raise bloody hell if the next PEP revision doesn't include a mention of the feedback. -- --- Aahz (@pobox.com) Hugs and backrubs -- I break Rule 6 <*> http://www.rahul.net/aahz/ Androgynous poly kinky vanilla queer het Pythonista I don't really mind a person having the last whine, but I do mind someone else having the last self-righteous whine.

From barry@digicool.com Wed Jun 20 20:07:00 2001 From: barry@digicool.com (Barry A. Warsaw) Date: Wed, 20 Jun 2001 15:07:00 -0400 Subject: [Python-Dev] Suggested amendment to PEP 255 References: Message-ID: <15152.62548.504923.152041@anthem.wooz.org>

>>>>> "TP" == Tim Peters writes: TP> and the PEP Discussion-To field (which I assume Barry filled TP> in -- I did not) reads Not me. I believe it was in Magnus's original version of the PEP.
But I do think that now that the code is in the main CVS trunk, it is appropriate to remove the Discussion-To: header and redirect comments back to python-dev. That may be difficult in practice, however. -Barry

From jack@oratrix.nl Wed Jun 20 22:52:16 2001 From: jack@oratrix.nl (Jack Jansen) Date: Wed, 20 Jun 2001 23:52:16 +0200 Subject: [Python-Dev] _PyTrace_init declaration Message-ID: <20010620215221.1697FE267B@oratrix.oratrix.nl>

I'm getting "no prototype" warnings on _PyTrace_init, and inspection shows that this routine indeed doesn't show up in an include file. As it is used elsewhere (in sysmodule.c) shouldn't it be called PyTrace_init and have its prototype declared somewhere? -- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen@oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | ++++ see http://www.xs4all.nl/~tank/ ++++

From tim.one@home.com Wed Jun 20 23:31:10 2001 From: tim.one@home.com (Tim Peters) Date: Wed, 20 Jun 2001 18:31:10 -0400 Subject: [Python-Dev] _PyTrace_init declaration In-Reply-To: <20010620215221.1697FE267B@oratrix.oratrix.nl> Message-ID:

[Jack Jansen] > I'm getting "no prototype" warnings on _PyTrace_init, and inspection > shows that this routine indeed doesn't show up in an include file. As > it is used elsewhere (in sysmodule.c) shouldn't it be called > PyTrace_init and have its prototype declared somewhere? It should indeed be declared in ceval.h (Fred?), but so long as it's part of the private API it should not lose the leading underscore.

From thomas@xs4all.net Wed Jun 20 23:29:51 2001 From: thomas@xs4all.net (Thomas Wouters) Date: Thu, 21 Jun 2001 00:29:51 +0200 Subject: [Python-Dev] _PyTrace_init declaration In-Reply-To: <20010620215221.1697FE267B@oratrix.oratrix.nl> References: <20010620215221.1697FE267B@oratrix.oratrix.nl> Message-ID: <20010621002951.H8098@xs4all.nl>

On Wed, Jun 20, 2001 at 11:52:16PM +0200, Jack Jansen wrote: > I'm getting "no prototype" warnings on _PyTrace_init, and inspection > shows that this routine indeed doesn't show up in an include file. As > it is used elsewhere (in sysmodule.c) shouldn't it be called > PyTrace_init and have its prototype declared somewhere? No, and yes. The _Py* functions are internal, but non-static (used in other files). They should have a prototype declared somewhere, but they shouldn't be used outside of Python itself. It shouldn't be named 'PyTrace_init' unless it is a supported part of the API. -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread!

From greg@cosc.canterbury.ac.nz Thu Jun 21 00:39:17 2001 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Thu, 21 Jun 2001 11:39:17 +1200 (NZST) Subject: [Python-Dev] Suggested amendment to PEP 255 In-Reply-To: Message-ID: <200106202339.LAA04351@s454.cosc.canterbury.ac.nz>

> The PEP announcement specifically asked for > discussion to occur on the Iterators list Sorry, I missed that - I was paying more attention to the PEP itself than what the announcement said. Going now to subscribe to the iterators list forthwith. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc.
| greg@cosc.canterbury.ac.nz +--------------------------------------+

From jeremy@alum.mit.edu Thu Jun 21 00:47:28 2001 From: jeremy@alum.mit.edu (Jeremy Hylton) Date: Wed, 20 Jun 2001 19:47:28 -0400 Subject: [Python-Dev] Suggested amendment to PEP 255 In-Reply-To: <15152.53477.212348.243592@anthem.wooz.org> Message-ID:

> My vote is for a "generator" keyword to introduce the code block of a > generator. Makes perfect sense to me, and it will be a strong > indication to anybody reading my code that something special is going > on. And something special /is/ going on! > > An informal poll of PythonLabs indicates a split on this subject, > perhaps setting Jeremy up as a Sandra Day O'Conner swing vote. But > who said this was a democracy anyway? :) > > somewhat-like-my-own-country-of-origin-ly y'rs, > -Barry That's a nice analogy, Ruth Barry Ginsburg; a Supreme Court, which appoints the president, seems a closer fit to Python's dictatorship than some sort of democratic process. I wasn't present for the oral arguments, but I'm sure we all know how Tim Scalia voted and that Guido van Clarence Thomas agreed without comment. I assume, then, that Anthony Kennedy Jr. joined you, although he's often a swing vote, too. Can't wait to hear the report from Nina "Michael Hudson" Totenberg. I was originally happy with the use of def. It's not much of a stretch since the def statement defines a code block that has formal parameters and creates a new scope. I certainly wouldn't be upset if Python ended up using def to define a generator. I appreciate, though, that the definition of a generator may look an awful lot like a function. I can imagine a user reading a module, missing the yield statement, and trying to use the generator as a function. I can't imagine this would happen often. My limited experience with CLU suggests that iterators aren't going to be huge, unwieldy blocks where it's hard to see what the ultimate control flow is. If a confused user treats a generator as a regular function, he or she certainly can't expect it to return anything useful, since all the return statements are bare returns; the expected behavior would be some side-effect on global state, which seems both unlikely and unseemly for an iterator. I'm not sure how hard it will be to explain generators to new users. I expect you would teach functions and iteration via for loops, then explain that there is a special kind of function called a generator that can be used in a for loop. It uses a yield statement instead of a return statement to return values. Not all that hard. If we use a different keyword to introduce them, you'd probably explain them much the same way: A generator is a special kind of function that can be used in a for loop and is defined with generator instead of def. As other people have mentioned, Icon doesn't use special syntax to introduce generators. We might as well look at CLU, too, which took a different approach. You can view the CLU Reference Manual at: http://ncstrl.mit.edu/Dienst/UI/2.0/Describe/ncstrl.mit_lcs%2fMIT%2fLCS%2fTR-225 It uses "proc" to introduce a procedure and "iter" to introduce an iterator. See page 72 for the details: http://ncstrl.mit.edu/Dienst/UI/2.0/Page/ncstrl.mit_lcs%2fMIT%2fLCS%2fTR-225/72 It's a toss-up, then, between the historical antecedents Icon and CLU. I'd tend to favor a new keyword for generators, but could be talked out of that position. Jeremy

From fdrake@acm.org Thu Jun 21 00:57:57 2001 From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Wed, 20 Jun 2001 19:57:57 -0400 (EDT) Subject: [Python-Dev] _PyTrace_init declaration In-Reply-To: <20010620215221.1697FE267B@oratrix.oratrix.nl> References: <20010620215221.1697FE267B@oratrix.oratrix.nl> Message-ID: <15153.14469.903865.533713@cj42289-a.reston1.va.home.com>

Jack Jansen writes: > I'm getting "no prototype" warnings on _PyTrace_init, and inspection > shows that this routine indeed doesn't show up in an include file. As > it is used elsewhere (in sysmodule.c) shouldn't it be called > PyTrace_init and have its prototype declared somewhere? No. I thought I had a prototype for it just above the usage. Anyway, I'm re-working that code this week, so you can assign this to me in the bug tracker. Thanks! -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations

From guido@digicool.com Thu Jun 21 15:32:40 2001 From: guido@digicool.com (Guido van Rossum) Date: Thu, 21 Jun 2001 10:32:40 -0400 Subject: [Python-Dev] PEP 255 - BDFL Pronouncement: 'def' it stays Message-ID: <200106211432.f5LEWeA03163@odiug.digicool.com>

I've thought long and hard and tried to read almost all the mail on this topic, and I cannot get myself to change my mind. No argument on either side is totally convincing, so I have consulted my language designer's intuition. It tells me that the syntax proposed in the PEP is exactly right - not too hot, not too cold. But, like the Oracle at Delphi in Greek mythology, it doesn't tell me why, so I don't have a rebuttal for the arguments against the PEP syntax. The best I can come up with (apart from agreeing with the rebuttals that Tim and others have already made) is "FUD". If this had been part of the language from day one, I very much doubt it would have made Andrew Kuchling's "Python Warts" page. So I propose that Tim and others defending 'def' save their remaining breath, and I propose that Paul and others in favor of 'gen[erator]' start diverting their energy towards thinking about how to best teach generators given the PEP syntax. Tim, please add a BDFL pronouncement to the PEP to end the argument. You can also summarize the arguments on either side, for posterity -- without trying to counter them. I found one useful comment on the PEP that isn't addressed and is orthogonal to the whole discussion: try/finally. When you have a try/finally around a yield statement, it is possible that the finally clause is not executed at all when the iterator is never resumed. I find this disturbing, and am tempted to propose that yield inside try/finally be disallowed (but yield inside try/except is still allowed). Another idea might be to somehow continue the frame with an exception at this point -- but I don't have a clue what exception would be appropriate (StopIteration isn't because it goes in the other direction) and I don't know what to do if the generator catches exception and tries to yield again (maybe the exception should be raised again?). The continued execution of the frame would be part of the destructor for the generator-iterator object, so, like a __del__ method, any unhandled exceptions wouldn't be able to propagate out of it. PS I lost my personal archive of the last 18 hours of the iter mailing list, and the web archive is down, alas, so I'm writing this from memory. I *did* read most of the messages in my archive before I accidentally deleted it, though.
;-) --Guido van Rossum (home page: http://www.python.org/~guido/)

From tdickenson@geminidataloggers.com Thu Jun 21 16:02:54 2001 From: tdickenson@geminidataloggers.com (Toby Dickenson) Date: Thu, 21 Jun 2001 16:02:54 +0100 Subject: [Python-Dev] Re: [Python-iterators] PEP 255 - BDFL Pronouncement: 'def' it stays In-Reply-To: <200106211432.f5LEWeA03163@odiug.digicool.com> References: <200106211432.f5LEWeA03163@odiug.digicool.com> Message-ID:

On Thu, 21 Jun 2001 10:32:40 -0400, Guido van Rossum wrote: > Another idea might be to somehow continue the frame with an > exception at this point -- but I don't have a clue what exception > would be appropriate (StopIteration isn't because it goes in the other > direction) I'm sure any exception is appropriate there. What about restarting the frame as if the 'yield' had been followed by a 'return'? Toby Dickenson tdickenson@geminidataloggers.com

From mwh@python.net Fri Jun 22 00:20:17 2001 From: mwh@python.net (Michael Hudson) Date: Fri, 22 Jun 2001 00:20:17 +0100 (BST) Subject: [Python-Dev] python-dev summary 2001-06-07 - 2001-06-21 Message-ID:

This is a summary of traffic on the python-dev mailing list between June 7 and June 21 (inclusive) 2001. It is intended to inform the wider Python community of ongoing developments. To comment, just post to python-list@python.org or comp.lang.python in the usual way. Give your posting a meaningful subject line, and if it's about a PEP, include the PEP number (e.g. Subject: PEP 201 - Lockstep iteration) All python-dev members are interested in seeing ideas discussed by the community, so don't hesitate to take a stance on a PEP if you have an opinion. This is the tenth summary written by Michael Hudson. Summaries are archived at:

Posting distribution (with apologies to mbm)

Number of articles in summary: 192

[ASCII bar chart of daily posting volume omitted; the daily counts were:]

     0 +-019-014-001-003-014-039-026-013-009-004-001-005-023-021
        Thu 07| Sat 09| Mon 11| Wed 13| Fri 15| Sun 17| Tue 19|
           Fri 08  Sun 10  Tue 12  Thu 14  Sat 16  Mon 18  Wed 20

Quiet fortnight.

* Adding .decode() method to Unicode * Marc-Andre Lemburg asked for opinions on adding a .decode method to unicode objects: He certainly got them; the responses ranged from neutral to negative, and there was a surprising amount of hostility in the air. The problem (as ever in these matters) seems to be that Python currently uses the same type for 8-bit strings and gobs of arbitrary data. Guido came to the rescue and calmed everyone down: since when discussion has vanished again.

* Adding Asian codecs to the core * Marc-Andre Lemburg announced that Tamito KAJIYAMA has decided to relicense his Japanese codecs with a BSD-style license, enabling them to be included in the core: This is clearly a good thing; the only quibble is that the encodings are by their nature rather large, so they will probably go into a separate directory in CVS (probably python/dist/encodings/) and not go into the source tarball released on python.org.

* Omit printing newline after newline * As readers of comp.lang.python will have noticed, Guido posted: and retracted: PEP 259, a proposal for changing the behaviour of the print statement.
* sre "improvements" * Gustavo Niemeyer asked if anyone planned to add the "(?(1)blah)" re operators to Python: but Python is not perl and there wasn't much support for making regular expressions more baffling than they already are. * Generators * In a discussion that slobbered across comp.lang.python, python-dev and the python-iterators list at sf (and belongs on the latter!) there was much talk of PEP 255, Simple Generators. Most was positive; the main dissent was from people that thought it was too hard to tell a generator from a regular function (at the source level). However Guido listened to Tim's repeated claims that this is insignificant once you've actually used generators once or twice and Pronounced "'def' it is": and noticed that there are still some issues wrt try/finally blocks. However, clever people seem to be thinking about it, so I'm sure the problem's days are numbered :-) I should also note that the gen-branch has been checked into the trunk of CVS. Woohoo! Cheers, M. From arigo@ulb.ac.be Fri Jun 22 12:00:34 2001 From: arigo@ulb.ac.be (Armin Rigo) Date: Fri, 22 Jun 2001 13:00:34 +0200 (MET DST) Subject: [Python-Dev] Python Specializing Compiler Message-ID: Hello everybody, I implemented a proof-of-concept version of a "Python compiler". It is not really a compiler. I know perfectly well that you cannot compile Python into something more efficient than a bunch of calls to PyObject_xxx. Still, this very preliminary version runs the following function twice as fast as the python interpreter: def f(n): result = 0 i = 0 while i; from arigo@ulb.ac.be on Fri, Jun 22, 2001 at 01:00:34PM +0200 References: Message-ID: <20010622071846.A7014@craie.housenet> On Fri, Jun 22, 2001 at 01:00:34PM +0200, Armin Rigo wrote: > Hello everybody, > > I implemented a proof-of-concept version of a "Python compiler". It is not > really a compiler. I know perfectly well that you cannot compile Python > into something more efficient than a bunch of calls to PyObject_xxx. > Still, this very preliminary version runs the following function twice as > fast as the python interpreter: I've implemented something similar, but didn't get such favorable results yet. I was concentrating more on implementing a type system and code to infer type information, and had spent less time on the code generation. (For instance, my system could determine the result type of subscript-type operations, and infer the types of lists over a loop, as in: l1 = [1,3.14159, "tubers"] l2 = [0]*3 for j in range(3): l2[j] = l1[j-3] # Type of l2 is HeterogeneousListType([IntType, FloatType, # StringType]) You could make it run forever on a pathological case like l = [] while 1: l = [l] with the fix being to "give up" after some number of iterations, and declare the unstable object (l) as having type "ObjectType", which is always correct but overbroad. My code is still available, but my motivation has faded somewhat and I haven't had the time to work on it recently in any case. It uses "GNU Lightning" for JIT code generation, rather than using an external compiler. 
(If I were to approach the problem again, I might discard the JIT code generator in favor of starting over again with the python2c compiler and adding type information) It can make judgements about sequences of calls, such as

    def f():
        return g()

when g is given the "solid" attribute, and the compilation process begins by hoisting the former global load of g into a constant load, something like

    def make_f():
        local_g = g
        def f():
            return local_g()
        return f
    f = make_f()

What are you using to generate code? How would you compare the sophistication of your type inference system to the one I've outlined above? Jeff

From Greg.Wilson@baltimore.com Fri Jun 22 13:34:17 2001 From: Greg.Wilson@baltimore.com (Greg Wilson) Date: Fri, 22 Jun 2001 08:34:17 -0400 Subject: [Python-Dev] ...und zen, ze world! Message-ID: <930BBCA4CEBBD411BE6500508BB3328F2E26E8@nsamcanms1.ca.baltimore.com>

From David Wheeler's paper estimating the size of stuff in the Red Hat 7.1 distribution: http://www.dwheeler.com/sloc/redhat71-v1/redhat71sloc.html

    Language                 SLOC      (%)
    C                    21461450  (71.18%)
    C++                   4575907  (15.18%)
    Shell (Bourne-like)    793238  ( 2.63%)
    Lisp                   722430  ( 2.40%)
    Assembly               565536  ( 1.88%)
    Perl                   562900  ( 1.87%)
    Fortran                493297  ( 1.64%)
    Python                 285050  ( 0.95%)
    Tcl                    213014  ( 0.71%)
    Java                   147285  ( 0.49%)
    yacc/bison             122325  ( 0.41%)
    Expect                 103701  ( 0.34%)
    lex/flex                41967  ( 0.14%)
    awk/gawk                17431  ( 0.06%)
    Objective-C             14645  ( 0.05%)
    Ada                     13200  ( 0.04%)
    C shell                 10753  ( 0.04%)
    Pascal                   4045  ( 0.01%)
    sed                      3940  ( 0.01%)

Interesting that there's as much Perl as assembly code, and more Fortran than Python :-). Thanks, Greg

From Samuele Pedroni Fri Jun 22 13:59:40 2001 From: Samuele Pedroni (Samuele Pedroni) Date: Fri, 22 Jun 2001 14:59:40 +0200 (MET DST) Subject: [Python-Dev] Python Specializing Compiler Message-ID: <200106221259.OAA02519@core.inf.ethz.ch>

Hi. Just after reading the README, it's very intriguing and interesting (if I remember well, this resembles the customization approach of the Self VM compiler). Ideally it could evolve into a loadable extension that then works together with the normal interp (unchanged up to offering some hooks*) in a transparent way for the user ... emitting native code for the major platforms or just specialized bytecodes. I will take a serious look at it. regards, Samuele Pedroni.

*: some possible useful hooks would be:
- minimal profiling support in order to specialize only things called often
- feedback for dynamic changing of methods, class hierarchy, ...
  if we want to optimize method lookup (which would make sense)
- a mixed fixed slots/dict layout for instances.

From nas@python.ca Fri Jun 22 15:43:17 2001 From: nas@python.ca (Neil Schemenauer) Date: Fri, 22 Jun 2001 07:43:17 -0700 Subject: [Python-Dev] why not "return StopIteration"? Message-ID: <20010622074317.A22058@glacier.fnational.com>

Is "raise StopIteration" an abuse of exceptions? Why can we not use "return StopIteration" to signal the end of an iterator? I've done a bit of hacking and the idea seems to work. One possible problem is that the StopIteration object in the builtin module could cause some confusing behavior. For example the code:

    for obj in __builtin__.__dict__.values():
        print obj

would not work as expected. This could be fixed in most cases by changing the tp_iternext protocol. Something like:

    int tp_iternext(PyObject *it, PyObject **item)

where the return value is 1, 0, or -1. IOW, StopIteration would not have to come into the protocol if the object implemented tp_iternext. Neil

From guido@digicool.com Fri Jun 22 17:19:34 2001 From: guido@digicool.com (Guido van Rossum) Date: Fri, 22 Jun 2001 12:19:34 -0400 Subject: [Python-Dev] why not "return StopIteration"? Message-ID: <200106221619.f5MGJY306866@odiug.digicool.com>

This is treated extensively in the discussion section of the iterators-PEP; quoting:

- It has been questioned whether an exception to signal the end of the iteration isn't too expensive. Several alternatives for the StopIteration exception have been proposed: a special value End to signal the end, a function end() to test whether the iterator is finished, even reusing the IndexError exception.

- A special value has the problem that if a sequence ever contains that special value, a loop over that sequence will end prematurely without any warning. If the experience with null-terminated C strings hasn't taught us the problems this can cause, imagine the trouble a Python introspection tool would have iterating over a list of all built-in names, assuming that the special End value was a built-in name!

- Calling an end() function would require two calls per iteration. Two calls is much more expensive than one call plus a test for an exception. Especially the time-critical for loop can test very cheaply for an exception.

- Reusing IndexError can cause confusion because it can be a genuine error, which would be masked by ending the loop prematurely.

I'm not sure why you are reopening this -- special terminating values are evil IMO. :-) --Guido van Rossum (home page: http://www.python.org/~guido/)

From guido@digicool.com Fri Jun 22 17:20:43 2001 From: guido@digicool.com (Guido van Rossum) Date: Fri, 22 Jun 2001 12:20:43 -0400 Subject: [Python-Dev] Python Specializing Compiler Message-ID: <200106221620.f5MGKib06875@odiug.digicool.com>

Very cool, Armin! Did you announce this on c.l.py too? I wish I had time to look at this in more detail -- but please do go on developing it, and look at what others have tried... --Guido van Rossum (home page: http://www.python.org/~guido/)

From barry@digicool.com Fri Jun 22 17:30:44 2001 From: barry@digicool.com (Barry A. Warsaw) Date: Fri, 22 Jun 2001 12:30:44 -0400 Subject: [Python-Dev] why not "return StopIteration"? References: <200106221619.f5MGJY306866@odiug.digicool.com> Message-ID: <15155.29364.416545.301534@anthem.wooz.org>

>>>>> "GvR" == Guido van Rossum writes: | - Calling an end() function would require two calls per | iteration. Two calls is much more expensive than one call | plus a test for an exception.
From guido@digicool.com  Fri Jun 22 17:20:43 2001
From: guido@digicool.com (Guido van Rossum)
Date: Fri, 22 Jun 2001 12:20:43 -0400
Subject: [Python-Dev] Python Specializing Compiler
Message-ID: <200106221620.f5MGKib06875@odiug.digicool.com>

Very cool, Armin!  Did you announce this on c.l.py too?  I wish I had
time to look at this in more detail -- but please do go on developing
it, and look at what others have tried...

--Guido van Rossum (home page: http://www.python.org/~guido/)

From barry@digicool.com  Fri Jun 22 17:30:44 2001
From: barry@digicool.com (Barry A. Warsaw)
Date: Fri, 22 Jun 2001 12:30:44 -0400
Subject: [Python-Dev] why not "return StopIteration"?
References: <200106221619.f5MGJY306866@odiug.digicool.com>
Message-ID: <15155.29364.416545.301534@anthem.wooz.org>

>>>>> "GvR" == Guido van Rossum writes:

    | - Calling an end() function would require two calls per
    | iteration.  Two calls is much more expensive than one call
    | plus a test for an exception.  Especially the time-critical
    | for loop can test very cheaply for an exception.

Plus, if the exception is both raised and caught in C, it is never
instantiated, so exception matching is a pointer compare.  I know this
isn't the case with user-defined iterators (since Python's raise
semantics is to instantiate the exception), but it helps.

-Barry

From guido@digicool.com  Fri Jun 22 18:12:20 2001
From: guido@digicool.com (Guido van Rossum)
Date: Fri, 22 Jun 2001 13:12:20 -0400
Subject: [Python-Dev] Python 2.0.1 released!
Message-ID: <200106221712.f5MHCLF07192@odiug.digicool.com>

I'm happy to announce Python 2.0.1 -- the final release of the first
Python version in a long time whose license is fully compatible with the
GPL:

    http://www.python.org/2.0.1/

I thank Moshe Zadka who did almost all of the work to make this a useful
bugfix release, and then went incommunicado for several weeks.  (I hope
you're OK, Moshe!)

Compared to the release candidate, we've fixed a few typos in the
license, tweaked the documentation a bit, and fixed an indentation error
in statcache.py; other than that, the release candidate was perfect. :-)

Python 2.0 users should be able to replace their 2.0 installation with
the 2.0.1 release without any ill effects; apart from the license
change, we've only fixed bugs that didn't require us to make feature
changes.  The SRE package (regular expression matching, used by the "re"
module) was brought in line with the version distributed with Python
2.1; this is stable feature-wise but much improved bug-wise.

For the full scoop, see the release notes on SourceForge:

    http://sourceforge.net/project/shownotes.php?release_id=40616

Python 2.1 users can ignore this release, unless they have an urgent
need for a GPL-compatible Python version and are willing to downgrade.
Rest assured that we're planning a bugfix release there too: I expect
that Python 2.1.1 will be released within a month, with the same
GPL-compatible license.  (Right, Thomas?)

We don't intend to build RPMs for 2.0.1.  If someone else is interested
in doing so, we can link to them.

--Guido van Rossum (home page: http://www.python.org/~guido/)

From tim.one@home.com  Fri Jun 22 18:21:03 2001
From: tim.one@home.com (Tim Peters)
Date: Fri, 22 Jun 2001 13:21:03 -0400
Subject: [Python-Dev] why not "return StopIteration"?
In-Reply-To: <20010622074317.A22058@glacier.fnational.com>
Message-ID: 

[Neil Schemenauer]
> Is "raise StopIteration" an abuse of exceptions?

I only care whether it works .  It certainly came as a surprise to me,
though, that I'm going to need to fiddle PEP 255 to explain that return
in a generator isn't really equivalent to raise StopIteration (because a
return in the try-part of a try/except should not trigger the
except-part if the generator is pumped again).  While a minor wart, it's
a wart.

If this stands, I'm going to look into changing gen_iternext() to
determine whether eval_frame() finished by raising StopIteration, and
mark the iterator as done if so.  That is, force "return" and "raise
StopIteration" to act the same inside generators, and to force "raise
StopIteration" inside a generator to truly *mean* "I'm done" in all
cases.  This would also allow us to avoid the proposed special-casing of
generators at the tail end of eval_frame() (yes, I'm anal <0.9 wink>:
since it's a problem unique to generators, this simply should not be
eval_frame's problem to solve -- if generators create the problem,
generators should pay to solve it).

> Why can we not use "return StopIteration" to signal the end of an
> iterator?
Just explained why not yesterday, and you did two sentences later .

> ....
> This could be fixed in most cases by changing the tp_iternext
> protocol.  Something like:
>
>     int tp_iternext(PyObject *it, PyObject **item)
>
> where the return value is 1, 0, or -1.

Meaning 13, 42, and 666 respectively ?  That is, one for "error", one
for "OK, and item is the next value", and one for "no error but no next
value either -- this iterator terminated normally"?  That could work.
At one point during the development of the iterator PEP, Guido had some
code like that in the internals, on *top* of the exception business.  It
was clumsy then because redundant.

At the level of Python code, how would a user spell "end of iteration"?
Would iterators need to return a 2-tuple in all non-exception cases
then, e.g. a (next_value, i_am_done_flag) pair?  Or would Python-level
iterators simply be unable to return StopIteration as a normal value?

> IOW, StopIteration would not have to come into the protocol if the
> object implemented tp_iternext.

All iterable objects in 2.2 implement tp_iternext, although sometimes
it's a Miranda tp_iternext (i.e., one created for an object that doesn't
supply its own), so that shouldn't be a worry.

All in all, I'm -0 on changing the exception approach -- it's worked
very well so far.
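For concreteness, one hypothetical spelling of the flag-based idea at
the Python level -- a sketch with an invented method name, not anything
actually proposed in the PEP:

    class Countdown:
        # Toy iterator using an invented (done, value) pair protocol.
        def __init__(self, n):
            self.n = n
        def next2(self):            # hypothetical name, not .next()
            if self.n <= 0:
                return (1, None)    # done; no value
            self.n = self.n - 1
            return (0, self.n + 1)  # not done; here is the next value

    it = Countdown(3)
    while 1:
        done, value = it.next2()
        if done:
            break
        print value                 # prints 3, then 2, then 1

StopIteration could then be returned as an ordinary value without
ambiguity, but every client pays for tuple packing, unpacking and a flag
test on every pass -- exactly the per-iteration overhead the
exception-based protocol avoids in the common case.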
From thomas@xs4all.net  Fri Jun 22 19:02:59 2001
From: thomas@xs4all.net (Thomas Wouters)
Date: Fri, 22 Jun 2001 20:02:59 +0200
Subject: [Python-Dev] why not "return StopIteration"?
In-Reply-To: 
References: 
Message-ID: <20010622200259.N8098@xs4all.nl>

On Fri, Jun 22, 2001 at 01:21:03PM -0400, Tim Peters wrote:

> If this stands, I'm going to look into changing gen_iternext() to
> determine whether eval_frame() finished by raising StopIteration, and
> mark the iterator as done if so.  That is, force "return" and "raise
> StopIteration" to act the same inside generators, and to force "raise
> StopIteration" inside a generator to truly *mean* "I'm done" in all
> cases.  This would also allow us to avoid the proposed special-casing
> of generators at the tail end of eval_frame() (yes, I'm anal <0.9
> wink>: since it's a problem unique to generators, this simply should
> not be eval_frame's problem to solve -- if generators create the
> problem, generators should pay to solve it).

I don't get this.  Currently, (unless Just checked in his patch)
generators work in exactly that way: the compiler compiles 'return' into
'raise StopIteration' if it encounters it inside a generator, and into a
regular return otherwise.  Why would you ask for the patch Just
provided, and then change it back ?

--
Thomas Wouters 

Hi! I'm a .signature virus! copy me into your .signature file to help me
spread!

From tim.one@home.com  Fri Jun 22 19:11:13 2001
From: tim.one@home.com (Tim Peters)
Date: Fri, 22 Jun 2001 14:11:13 -0400
Subject: [Python-Dev] why not "return StopIteration"?
In-Reply-To: <20010622200259.N8098@xs4all.nl>
Message-ID: 

[Thomas Wouters]
> I don't get this.  Currently, (unless Just checked in his patch)
> generators work in exactly that way: the compiler compiles 'return'
> into 'raise StopIteration' if it encounters it inside a generator,
> and into a regular return otherwise.

Yes.  The part about analyzing the return value inside gen_iternext()
would be the only change from the status quo.

> Why would you ask for the patch Just provided, and then change it back ?

I wouldn't.  I asked *you* for a patch (which I haven't yet applied, but
will) in a different area, but Just's patch was his own initiative.
I hesitated on that one for reasons beyond just lack of time to get to
it, and I'm still reluctant to accept it.  My msg sketched an
alternative to that patch.  Note that Just has also (very recently)
sketched another alternative, but on the Iterators list instead.

just-isn't-in-need-of-defense-because-he-isn't-being-abused-ly y'rs
- tim

From fdrake@beowolf.digicool.com  Fri Jun 22 19:31:44 2001
From: fdrake@beowolf.digicool.com (Fred Drake)
Date: Fri, 22 Jun 2001 14:31:44 -0400 (EDT)
Subject: [Python-Dev] [maintenance doc updates]
Message-ID: <20010622183144.C6A5428927@beowolf.digicool.com>

The development version of the documentation has been updated:

    http://python.sourceforge.net/maint-docs/

Lots of smallish updates and corrections, moved the license statements
to an appendix.

From paulp@ActiveState.com  Fri Jun 22 19:37:01 2001
From: paulp@ActiveState.com (Paul Prescod)
Date: Fri, 22 Jun 2001 11:37:01 -0700
Subject: [Python-Dev] ...und zen, ze world!
References: <930BBCA4CEBBD411BE6500508BB3328F2E26E8@nsamcanms1.ca.baltimore.com>
Message-ID: <3B33904D.F821FE36@ActiveState.com>

> Interesting that there's as much Perl as assembly code,
> and more Fortran than Python :-).

The Fortran is basically one big package: LAPACK.  A bunch of the Python
is 4Suite.  If we got Red Hat to ship Zope (or even Python 2.1!) we'd
improve our numbers quite a bit. :)

--
Take a recipe.  Leave a recipe.
Python Cookbook!  http://www.ActiveState.com/pythoncookbook

From esr@thyrsus.com  Fri Jun 22 19:46:11 2001
From: esr@thyrsus.com (Eric S. Raymond)
Date: Fri, 22 Jun 2001 14:46:11 -0400
Subject: [Python-Dev] ...und zen, ze world!
In-Reply-To: <3B33904D.F821FE36@ActiveState.com>; from paulp@ActiveState.com on Fri, Jun 22, 2001 at 11:37:01AM -0700
References: <930BBCA4CEBBD411BE6500508BB3328F2E26E8@nsamcanms1.ca.baltimore.com> <3B33904D.F821FE36@ActiveState.com>
Message-ID: <20010622144611.A15388@thyrsus.com>

Paul Prescod :
> > Interesting that there's as much Perl as assembly code,
> > and more Fortran than Python :-).
>
> The Fortran is basically one big package: LAPACK.  A bunch of the
> Python is 4Suite.  If we got Red Hat to ship Zope (or even Python
> 2.1!) we'd improve our numbers quite a bit. :)

I'm working on it.
--
		Eric S. Raymond

The whole of the Bill [of Rights] is a declaration of the right of the
people at large or considered as individuals...  It establishes some
rights of the individual as unalienable and which consequently, no
majority has a right to deprive them of.
	-- Albert Gallatin, Oct 7 1789

From fdrake@beowolf.digicool.com  Fri Jun 22 19:53:37 2001
From: fdrake@beowolf.digicool.com (Fred Drake)
Date: Fri, 22 Jun 2001 14:53:37 -0400 (EDT)
Subject: [Python-Dev] [development doc updates]
Message-ID: <20010622185337.BE51228927@beowolf.digicool.com>

The development version of the documentation has been updated:

    http://python.sourceforge.net/devel-docs/

Lots of smallish updates and corrections, moved the license statements
to an appendix.

This version includes some contributed changes to the documentation for
the cmath module.  To make the LaTeX to HTML conversion work, I have
made the resulting HTML contain entity references for the "plus/minus"
and "infinity" symbols (± and ∞, respectively).  These may be
problematic for some browsers.  Please let me know how it looks on your
browser by sending an email to python-docs@python.org.  Be sure to state
your browser name and version, and what operating system you are using.
Thanks!
    http://python.sourceforge.net/devel-docs/lib/module-cmath.html

From nas@python.ca  Fri Jun 22 21:13:14 2001
From: nas@python.ca (Neil Schemenauer)
Date: Fri, 22 Jun 2001 13:13:14 -0700
Subject: [Python-Dev] why not "return StopIteration"?
In-Reply-To: <200106221619.f5MGJY306866@odiug.digicool.com>; from guido@digicool.com on Fri, Jun 22, 2001 at 12:19:34PM -0400
References: <200106221619.f5MGJY306866@odiug.digicool.com>
Message-ID: <20010622131314.A22978@glacier.fnational.com>

Guido van Rossum wrote:
> This is treated extensively in the discussion section of the
> iterators-PEP

Ah.  I don't remember reading that part or seeing the discussion.
Sorry I brought it up.

Neil

From fdrake@beowolf.digicool.com  Fri Jun 22 21:52:48 2001
From: fdrake@beowolf.digicool.com (Fred Drake)
Date: Fri, 22 Jun 2001 16:52:48 -0400 (EDT)
Subject: [Python-Dev] [development doc updates]
Message-ID: <20010622205248.6290128927@beowolf.digicool.com>

The development version of the documentation has been updated:

    http://python.sourceforge.net/devel-docs/

Changed the revised cmath documentation to use "j" as a suffix for
complex literals instead of using "i" as a prefix; this is more similar
to Python.  Changed the font of the suffix to match that used elsewhere
in the documentation.

This should be a little more readable, but does not change any potential
browser compatibility issues, so I still need reports of compatibility
or non-compatibility.  See my preliminary report on the topic at:

    http://mail.python.org/pipermail/doc-sig/2001-June/001940.html

From arigo@ulb.ac.be  Sat Jun 23 09:13:04 2001
From: arigo@ulb.ac.be (Armin Rigo)
Date: Sat, 23 Jun 2001 10:13:04 +0200 (MET DST)
Subject: [Python-Dev] Python Specializing Compiler
In-Reply-To: <20010622071846.A7014@craie.housenet>
Message-ID: 

Hello Jeff,

On Fri, 22 Jun 2001, Jeff Epler wrote:
> What are you using to generate code?

I am generating pseudo-code, which is interpreted by a C module.  (With
real assembler code, it would of course be much faster, but it was just
simpler for the moment.)

> How would you compare the
> sophistication of your type inference system to the one I've outlined
> above?

Yours is much more complete, but runs statically.  Mine works at
run-time.  As explained in detail in the readme file, my plan is not to
make a "compiler" in the usual sense.  I actually have no type
inferences; I just collect at run time what types are used at what
places, and generate (and possibly modify) the generated code according
to that information.  (More about it later.)

A bientot,

Armin.

From tim.one@home.com  Sat Jun 23 10:17:54 2001
From: tim.one@home.com (Tim Peters)
Date: Sat, 23 Jun 2001 05:17:54 -0400
Subject: [Python-Dev] PEP 255: Simple Generators, Revised Posting
In-Reply-To: 
Message-ID: 

Major revision: more details about exceptions, return vs StopIteration,
and interactions with try/except/finally; more Q&A; and a BDFL
Pronouncement.  The reference implementation appears solid and works as
described here in all respects, so I expect this will be the last major
revision (and so also last full posting) of this PEP.

The output below is in ndiff format (see Tools/scripts/ndiff.py in your
Python distribution).  Just the new text can be seen in HTML form here:

    http://python.sf.net/peps/pep-0255.html

"Feature discussions" should take place primarily on the Python
Iterators list:

    mailto:python-iterators@lists.sourceforge.net

Implementation discussions may wander in and out of Python-Dev too.

PEP: 255 Title: Simple Generators - Version: $Revision: 1.3 $ ?
^ + Version: $Revision: 1.12 $ ? ^^ Author: nas@python.ca (Neil Schemenauer), tim.one@home.com (Tim Peters), magnus@hetland.org (Magnus Lie Hetland) Discussion-To: python-iterators@lists.sourceforge.net Status: Draft Type: Standards Track Requires: 234 Created: 18-May-2001 Python-Version: 2.2 - Post-History: 14-Jun-2001 + Post-History: 14-Jun-2001, 23-Jun-2001 ? +++++++++++++ Abstract This PEP introduces the concept of generators to Python, as well as a new statement used in conjunction with them, the "yield" statement. Motivation When a producer function has a hard enough job that it requires maintaining state between values produced, most programming languages offer no pleasant and efficient solution beyond adding a callback function to the producer's argument list, to be called with each value produced. For example, tokenize.py in the standard library takes this approach: the caller must pass a "tokeneater" function to tokenize(), called whenever tokenize() finds the next token. This allows tokenize to be coded in a natural way, but programs calling tokenize are typically convoluted by the need to remember between callbacks which token(s) were seen last. The tokeneater function in tabnanny.py is a good example of that, maintaining a state machine in global variables, to remember across callbacks what it has already seen and what it hopes to see next. This was difficult to get working correctly, and is still difficult for people to understand. Unfortunately, that's typical of this approach. An alternative would have been for tokenize to produce an entire parse of the Python program at once, in a large list. Then tokenize clients could be written in a natural way, using local variables and local control flow (such as loops and nested if statements) to keep track of their state. But this isn't practical: programs can be very large, so no a priori bound can be placed on the memory needed to materialize the whole parse; and some tokenize clients only want to see whether something specific appears early in the program (e.g., a future statement, or, as is done in IDLE, just the first indented statement), and then parsing the whole program first is a severe waste of time. Another alternative would be to make tokenize an iterator[1], delivering the next token whenever its .next() method is invoked. This is pleasant for the caller in the same way a large list of results would be, but without the memory and "what if I want to get out early?" drawbacks. However, this shifts the burden on tokenize to remember *its* state between .next() invocations, and the reader need only glance at tokenize.tokenize_loop() to realize what a horrid chore that would be. Or picture a recursive algorithm for producing the nodes of a general tree structure: to cast that into an iterator framework requires removing the recursion manually and maintaining the state of the traversal by hand. A fourth option is to run the producer and consumer in separate threads. This allows both to maintain their states in natural ways, and so is pleasant for both. Indeed, Demo/threads/Generator.py in the Python source distribution provides a usable synchronized-communication class for doing that in a general way. This doesn't work on platforms without threads, though, and is very slow on platforms that do (compared to what is achievable without threads). A final option is to use the Stackless[2][3] variant implementation of Python instead, which supports lightweight coroutines. 
This has much the same programmatic benefits as the thread option, but is much more efficient. However, Stackless is a controversial rethinking of the Python core, and it may not be possible for Jython to implement the same semantics. This PEP isn't the place to debate that, so suffice it to say here that generators provide a useful subset of Stackless functionality in a way that fits easily into the current CPython implementation, and is believed to be relatively straightforward for other Python implementations. That exhausts the current alternatives. Some other high-level languages provide pleasant solutions, notably iterators in Sather[4], which were inspired by iterators in CLU; and generators in Icon[5], a novel language where every expression "is a generator". There are differences among these, but the basic idea is the same: provide a kind of function that can return an intermediate result ("the next value") to its caller, but maintaining the function's local state so that the function can be resumed again right where it left off. A very simple example: def fib(): a, b = 0, 1 while 1: yield b a, b = b, a+b When fib() is first invoked, it sets a to 0 and b to 1, then yields b back to its caller. The caller sees 1. When fib is resumed, from its point of view the yield statement is really the same as, say, a print statement: fib continues after the yield with all local state intact. a and b then become 1 and 1, and fib loops back to the yield, yielding 1 to its invoker. And so on. From fib's point of view it's just delivering a sequence of results, as if via callback. But from its caller's point of view, the fib invocation is an iterable object that can be resumed at will. As in the thread approach, this allows both sides to be coded in the most natural ways; but unlike the thread approach, this can be done efficiently and on all platforms. Indeed, resuming a generator should be no more expensive than a function call. The same kind of approach applies to many producer/consumer functions. For example, tokenize.py could yield the next token instead of invoking a callback function with it as argument, and tokenize clients could iterate over the tokens in a natural way: a Python generator is a kind of Python iterator[1], but of an especially powerful kind. - Specification + Specification: Yield ? ++++++++ A new statement is introduced: yield_stmt: "yield" expression_list "yield" is a new keyword, so a future statement[8] is needed to phase - this in. [XXX spell this out] + this in. [XXX spell this out -- but new keywords have ripple effects + across tools too, and it's not clear this can be forced into the future + framework at all -- it's not even clear that Python's parser alone can + be taught to swing both ways based on a future stmt] The yield statement may only be used inside functions. A function that - contains a yield statement is called a generator function. + contains a yield statement is called a generator function. A generator ? +++++++++++++ + function is an ordinary function object in all respects, but has the + new CO_GENERATOR flag set in the code object's co_flags member. When a generator function is called, the actual arguments are bound to function-local formal argument names in the usual way, but no code in the body of the function is executed. Instead a generator-iterator object is returned; this conforms to the iterator protocol[6], so in particular can be used in for-loops in a natural way. 
Note that when the intent is clear from context, the unqualified name "generator" may be used to refer either to a generator-function or a generator- iterator. Each time the .next() method of a generator-iterator is invoked, the code in the body of the generator-function is executed until a yield or return statement (see below) is encountered, or until the end of the body is reached. If a yield statement is encountered, the state of the function is frozen, and the value of expression_list is returned to .next()'s caller. By "frozen" we mean that all local state is retained, including the current bindings of local variables, the instruction pointer, and the internal evaluation stack: enough information is saved so that the next time .next() is invoked, the function can proceed exactly as if the yield statement were just another external call. + Restriction: A yield statement is not allowed in the try clause of a + try/finally construct. The difficulty is that there's no guarantee + the generator will ever be resumed, hence no guarantee that the finally + block will ever get executed; that's too much a violation of finally's + purpose to bear. + + + Specification: Return + A generator function can also contain return statements of the form: "return" Note that an expression_list is not allowed on return statements in the body of a generator (although, of course, they may appear in the bodies of non-generator functions nested within the generator). - When a return statement is encountered, nothing is returned, but a + When a return statement is encountered, control proceeds as in any + function return, executing the appropriate finally clauses (if any - StopIteration exception is raised, signalling that the iterator is ? ------------ + exist). Then a StopIteration exception is raised, signalling that the ? ++++++++++++++++ - exhausted. The same is true if control flows off the end of the + iterator is exhausted. A StopIteration exception is also raised if + control flows off the end of the generator without an explict return. + - function. Note that return means "I'm done, and have nothing ? ----------- + Note that return means "I'm done, and have nothing interesting to ? +++++++++++++++ - interesting to return", for both generator functions and non-generator ? --------------- + return", for both generator functions and non-generator functions. ? +++++++++++ - functions. + + Note that return isn't always equivalent to raising StopIteration: the + difference lies in how enclosing try/except constructs are treated. + For example, + + >>> def f1(): + ... try: + ... return + ... except: + ... yield 1 + >>> print list(f1()) + [] + + because, as in any function, return simply exits, but + + >>> def f2(): + ... try: + ... raise StopIteration + ... except: + ... yield 42 + >>> print list(f2()) + [42] + + because StopIteration is captured by a bare "except", as is any + exception. + + + Specification: Generators and Exception Propagation + + If an unhandled exception-- including, but not limited to, + StopIteration --is raised by, or passes through, a generator function, + then the exception is passed on to the caller in the usual way, and + subsequent attempts to resume the generator function raise + StopIteration. In other words, an unhandled exception terminates a + generator's useful life. + + Example (not idiomatic but to illustrate the point): + + >>> def f(): + ... return 1/0 + >>> def g(): + ... yield f() # the zero division exception propagates + ... 
yield 42 # and we'll never get here + >>> k = g() + >>> k.next() + Traceback (most recent call last): + File "", line 1, in ? + File "", line 2, in g + File "", line 2, in f + ZeroDivisionError: integer division or modulo by zero + >>> k.next() # and the generator cannot be resumed + Traceback (most recent call last): + File "", line 1, in ? + StopIteration + >>> + + + Specification: Try/Except/Finally + + As noted earlier, yield is not allowed in the try clause of a try/ + finally construct. A consequence is that generators should allocate + critical resources with great care. There is no restriction on yield + otherwise appearing in finally clauses, except clauses, or in the try + clause of a try/except construct: + + >>> def f(): + ... try: + ... yield 1 + ... try: + ... yield 2 + ... 1/0 + ... yield 3 # never get here + ... except ZeroDivisionError: + ... yield 4 + ... yield 5 + ... raise + ... except: + ... yield 6 + ... yield 7 # the "raise" above stops this + ... except: + ... yield 8 + ... yield 9 + ... try: + ... x = 12 + ... finally: + ... yield 10 + ... yield 11 + >>> print list(f()) + [1, 2, 4, 5, 8, 9, 10, 11] + >>> Example # A binary tree class. class Tree: def __init__(self, label, left=None, right=None): self.label = label self.left = left self.right = right def __repr__(self, level=0, indent=" "): s = level*indent + `self.label` if self.left: s = s + "\n" + self.left.__repr__(level+1, indent) if self.right: s = s + "\n" + self.right.__repr__(level+1, indent) return s def __iter__(self): return inorder(self) # Create a Tree from a list. def tree(list): n = len(list) if n == 0: return [] i = n / 2 return Tree(list[i], tree(list[:i]), tree(list[i+1:])) # A recursive generator that generates Tree leaves in in-order. def inorder(t): if t: for x in inorder(t.left): yield x yield t.label for x in inorder(t.right): yield x # Show it off: create a tree. t = tree("ABCDEFGHIJKLMNOPQRSTUVWXYZ") # Print the nodes of the tree in in-order. for x in t: print x, print # A non-recursive generator. def inorder(node): stack = [] while node: while node.left: stack.append(node) node = node.left yield node.label while not node.right: try: node = stack.pop() except IndexError: return yield node.label node = node.right # Exercise the non-recursive generator. for x in t: print x, print + Both output blocks display: + + A B C D E F G H I J K L M N O P Q R S T U V W X Y Z + Q & A + Q. Why not a new keyword instead of reusing "def"? + + A. See BDFL Pronouncements section below. + - Q. Why a new keyword? Why not a builtin function instead? + Q. Why a new keyword for "yield"? Why not a builtin function instead? ? ++++++++++++ A. Control flow is much better expressed via keyword in Python, and yield is a control construct. It's also believed that efficient implementation in Jython requires that the compiler be able to determine potential suspension points at compile-time, and a new - keyword makes that easy. + keyword makes that easy. The CPython referrence implementation also + exploits it heavily, to detect which functions *are* generator- + functions (although a new keyword in place of "def" would solve that + for CPython -- but people asking the "why a new keyword?" question + don't want any new keyword). + + Q: Then why not some other special syntax without a new keyword? 
For + example, one of these instead of "yield 3": + + return 3 and continue + return and continue 3 + return generating 3 + continue return 3 + return >> , 3 + from generator return 3 + return >> 3 + return << 3 + >> 3 + << 3 + + A: Did I miss one ? Out of hundreds of messages, I counted two + suggesting such an alternative, and extracted the above from them. + It would be nice not to need a new keyword, but nicer to make yield + very clear -- I don't want to have to *deduce* that a yield is + occurring from making sense of a previously senseless sequence of + keywords or operators. Still, if this attracts enough interest, + proponents should settle on a single consensus suggestion, and Guido + will Pronounce on it. + + Q. Why allow "return" at all? Why not force termination to be spelled + "raise StopIteration"? + + A. The mechanics of StopIteration are low-level details, much like the + mechanics of IndexError in Python 2.1: the implementation needs to + do *something* well-defined under the covers, and Python exposes + these mechanisms for advanced users. That's not an argument for + forcing everyone to work at that level, though. "return" means "I'm + done" in any kind of function, and that's easy to explain and to use. + Note that "return" isn't always equivalent to "raise StopIteration" + in try/except construct, either (see the "Specification: Return" + section). + + Q. Then why not allow an expression on "return" too? + + A. Perhaps we will someday. In Icon, "return expr" means both "I'm + done", and "but I have one final useful value to return too, and + this is it". At the start, and in the absence of compelling uses + for "return expr", it's simply cleaner to use "yield" exclusively + for delivering values. + + + BDFL Pronouncements + + Issue: Introduce another new keyword (say, "gen" or "generator") in + place of "def", or otherwise alter the syntax, to distinguish + generator-functions from non-generator functions. + + Con: In practice (how you think about them), generators *are* + functions, but with the twist that they're resumable. The mechanics of + how they're set up is a comparatively minor technical issue, and + introducing a new keyword would unhelpfully overemphasize the + mechanics of how generators get started (a vital but tiny part of a + generator's life). + + Pro: In reality (how you think about them), generator-functions are + actually factory functions that produce generator-iterators as if by + magic. In this respect they're radically different from non-generator + functions, acting more like a constructor than a function, so reusing + "def" is at best confusing. A "yield" statement buried in the body is + not enough warning that the semantics are so different. + + BDFL: "def" it stays. No argument on either side is totally + convincing, so I have consulted my language designer's intuition. It + tells me that the syntax proposed in the PEP is exactly right - not too + hot, not too cold. But, like the Oracle at Delphi in Greek mythology, + it doesn't tell me why, so I don't have a rebuttal for the arguments + against the PEP syntax. The best I can come up with (apart from + agreeing with the rebuttals ... already made) is "FUD". If this had + been part of the language from day one, I very much doubt it would have + made Andrew Kuchling's "Python Warts" page. Reference Implementation - A preliminary patch against the CVS Python source is available[7]. 
+ The current implementation, in a preliminary state (no docs and no
+ focused tests), is part of Python's CVS development tree[9].
+ Using this requires that you build Python from source.
+
+ This was derived from an earlier patch by Neil Schemenauer[7].

  Footnotes and References

    [1] PEP 234, http://python.sf.net/peps/pep-0234.html
    [2] http://www.stackless.com/
    [3] PEP 219, http://python.sf.net/peps/pep-0219.html
    [4] "Iteration Abstraction in Sather"
        Murer, Omohundro, Stoutamire and Szyperski
        http://www.icsi.berkeley.edu/~sather/Publications/toplas.html
    [5] http://www.cs.arizona.edu/icon/
    [6] The concept of iterators is described in PEP 234
        http://python.sf.net/peps/pep-0234.html
    [7] http://python.ca/nas/python/generator.diff
    [8] http://python.sf.net/peps/pep-0236.html
+   [9] To experiment with this implementation, check out Python from CVS
+       according to the instructions at
+       http://sf.net/cvs/?group_id=5470

  Copyright

    This document has been placed in the public domain.

Local Variables:
mode: indented-text
indent-tabs-mode: nil
End:

From mal@lemburg.com  Sat Jun 23 11:54:27 2001
From: mal@lemburg.com (M.-A. Lemburg)
Date: Sat, 23 Jun 2001 12:54:27 +0200
Subject: [Python-Dev] Python Specializing Compiler
References: 
Message-ID: <3B347563.9BBEF858@lemburg.com>

Armin Rigo wrote:
>
> Hello Jeff,
>
> On Fri, 22 Jun 2001, Jeff Epler wrote:
> > What are you using to generate code?
>
> I am generating pseudo-code, which is interpreted by a C module.  (With
> real assembler code, it would of course be much faster, but it was just
> simpler for the moment.)
>
> > How would you compare the
> > sophistication of your type inference system to the one I've outlined
> > above?
>
> Yours is much more complete, but runs statically.  Mine works at
> run-time.  As explained in detail in the readme file, my plan is not to
> make a "compiler" in the usual sense.  I actually have no type
> inferences; I just collect at run time what types are used at what
> places, and generate (and possibly modify) the generated code according
> to that information.

Sounds like you are using (re)compiling on-the-fly -- that would
certainly be a very reasonable way to deal with Python's dynamic object
world.  It would also solve the problems of static compilers with type
inference nicely.

A very nice idea!

--
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/

From skip@pobox.com (Skip Montanaro)  Sat Jun 23 15:11:03 2001
From: skip@pobox.com (Skip Montanaro) (Skip Montanaro)
Date: Sat, 23 Jun 2001 09:11:03 -0500
Subject: [Python-Dev] Python Specializing Compiler
In-Reply-To: <3B347563.9BBEF858@lemburg.com>
References: <3B347563.9BBEF858@lemburg.com>
Message-ID: <15156.41847.86431.594106@beluga.mojam.com>

    mal> Sounds like you are using (re)compiling on-the-fly ...

This is what the Self compiler did, though I don't know if its
granularity was as fine as I understand psyco's is from reading its
README file.  It's been awhile since I read through that stuff, but I
seem to recall it would compile functions to machine code only if they
were heavily executed.  It also did a lot of type inferencing.

Skip
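The run-time scheme Armin describes can be imitated in pure Python.
What follows is a minimal sketch of type-feedback specialization --
hypothetical code, nothing like psyco's actual pseudo-code machinery --
that records argument types at a call site and switches to a specialized
version once the site looks hot and monomorphic:

    def generic_add(x, y):
        return x + y

    def int_add(x, y):
        # Pretend this is generated, int-only code.
        return int(x) + int(y)

    class Specializer:
        # Wrap a function; count calls whose argument types match
        # `types`, and dispatch to `specialized` after `threshold` hits.
        def __init__(self, func, specialized, types, threshold=100):
            self.func = func
            self.specialized = specialized
            self.types = types
            self.threshold = threshold
            self.hits = 0
        def __call__(self, *args):
            if map(type, args) == self.types:
                self.hits = self.hits + 1
                if self.hits >= self.threshold:
                    return self.specialized(*args)  # hot and monomorphic
            return self.func(*args)

    add = Specializer(generic_add, int_add, [type(0), type(0)])
    for i in xrange(1000):
        add(i, i)    # after 100 int/int calls, int_add takes over

Real specialization pays off only when the specialized version avoids
work the generic one must do; the wrapper here is pure overhead and is
meant only to show the bookkeeping.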
From guido@digicool.com  Sat Jun 23 16:58:40 2001
From: guido@digicool.com (Guido van Rossum)
Date: Sat, 23 Jun 2001 11:58:40 -0400
Subject: [Python-Dev] Python Specializing Compiler
In-Reply-To: Your message of "Sat, 23 Jun 2001 10:13:04 +0200."
References: 
Message-ID: <20010623160024.QWCF14539.femail14.sdc1.sfba.home.com@cj20424-a.reston1.va.home.com>

> I am generating pseudo-code, which is interpreted by a C module.  (With
> real assembler code, it would of course be much faster, but it was just
> simpler for the moment.)

This has great promise!  Once you have an interpreter for some kind of
pseudo-code, it's always possible to tweak the interpreter or the
pseudo-code to make it faster.  And you can make another jump to machine
code to make it a lot faster.

There was a project (p2c or python2c) that tried to compile an entire
Python program to C code that was mostly just calling the Python runtime
C API functions.  It also obtained about a factor of 2 in speed-up, but
its problem was (if I recall) that even a small Python module translated
into hundreds of thousands of lines of C -- think what that would do to
locality.  Since you have already obtained the same speedup with your
approach, I think there's great promise.  Count on sending in a paper
for the next Python conference!

> > How would you compare the
> > sophistication of your type inference system to the one I've outlined
> > above?
>
> Yours is much more complete, but runs statically.  Mine works at
> run-time.  As explained in detail in the readme file, my plan is not to
> make a "compiler" in the usual sense.  I actually have no type
> inferences; I just collect at run time what types are used at what
> places, and generate (and possibly modify) the generated code according
> to that information.

Very cool: a Python JIT compiler.

> (More about it later.)

Can't wait!

--Guido van Rossum (home page: http://www.python.org/~guido/)

From fdrake@beowolf.digicool.com  Sun Jun 24 03:41:04 2001
From: fdrake@beowolf.digicool.com (Fred Drake)
Date: Sat, 23 Jun 2001 22:41:04 -0400 (EDT)
Subject: [Python-Dev] [development doc updates]
Message-ID: <20010624024104.A757728927@beowolf.digicool.com>

The development version of the documentation has been updated:

    http://python.sourceforge.net/devel-docs/

A couple of small updates, including spelling the keywords correctly in
the language reference.  This version brings back the hyperlinked
grammar productions I played around with earlier.  They still need work,
but they are somewhat better than plain text.
From m.favas@per.dem.csiro.au  Sun Jun 24 05:25:27 2001
From: m.favas@per.dem.csiro.au (Mark Favas)
Date: Sun, 24 Jun 2001 12:25:27 +0800
Subject: [Python-Dev] Problems with socketmodule (getnameinfo & getaddrinfo)
Message-ID: <3B356BB7.9BE71569@per.dem.csiro.au>

Socketmodule at the moment has multiple problems after the changes to
handle IPv6:

1: socketmodule.c now #includes getnameinfo.c and getaddrinfo.c.  These
functions both use offsetof(), which is defined (on my system, at least)
in stddef.h.  The #include for this file is inside a #if 0 block.

2: #including this file allows the compile to complete without error.
However, there is no Makefile dependency on these two files, once
socketmodule.o has been built.  Changes to either of the
get{name,addr}info.c files will not cause socketmodule to be rebuilt.

3: The socket module still does not work, however, since it refers to an
unresolved symbol inet_pton:

    >>> import socket
    Traceback (most recent call last):
      File "", line 1, in ?
      File "/home/gonzo1/mark/groucho1/mark/src/python/CVS/python/dist/src/Lib/socket.py", line 41, in ?
        from _socket import *
    ImportError: Unresolved symbol in /home/gonzo1/mark/groucho1/mark/src/python/CVS/python/dist/src/build/lib.osf1-V4.0-alpha-2.2/_socket.so: inet_pton

inet_pton is called in two places in getaddrinfo.c... there's likely to
be other platforms besides Tru64 Unix that do not have this function.

--
Mark Favas  -  m.favas@per.dem.csiro.au
CSIRO, Private Bag No 5, Wembley, Western Australia 6913, AUSTRALIA

From tim.one@home.com  Sun Jun 24 05:48:32 2001
From: tim.one@home.com (Tim Peters)
Date: Sun, 24 Jun 2001 00:48:32 -0400
Subject: [Python-Dev] Problems with socketmodule (getnameinfo & getaddrinfo)
In-Reply-To: <3B356BB7.9BE71569@per.dem.csiro.au>
Message-ID: 

[Mark Favas]
> Socketmodule at the moment has multiple problems after the changes to
> handle IPv6:
>
> 1: socketmodule.c now #includes getnameinfo.c and getaddrinfo.c.  These
> functions both use offsetof(), which is defined (on my system, at
> least) in stddef.h.  The #include for this file is inside a #if 0
> block.
>
> 2: #including this file allows the compile to complete without error.
> However, there is no Makefile dependency on these two files, once
> socketmodule.o has been built.  Changes to either of the
> get{name,addr}info.c files will not cause socketmodule to be rebuilt.
>
> 3: The socket module still does not work, however, since it refers to
> an unresolved symbol inet_pton:
>
>     >>> import socket
>     Traceback (most recent call last):
>       File "", line 1, in ?
>       File "/home/gonzo1/mark/groucho1/mark/src/python/CVS/python/dist/src/Lib/socket.py", line 41, in ?
>         from _socket import *
>     ImportError: Unresolved symbol in /home/gonzo1/mark/groucho1/mark/src/python/CVS/python/dist/src/build/lib.osf1-V4.0-alpha-2.2/_socket.so: inet_pton
>
> inet_pton is called in two places in getaddrinfo.c... there's likely to
> be other platforms besides Tru64 Unix that do not have this function.
If it's any consolation, the Windows build is in worse shape:

    socketmodule.c
    Modules\addrinfo.h(123) : error C2632: 'long' followed by 'long' is illegal
    Modules\addrinfo.h(125) : error C2632: 'long' followed by 'long' is illegal
    Modules\addrinfo.h(129) : error C2632: 'long' followed by 'long' is illegal
    Modules\addrinfo.h(129) : error C2632: 'long' followed by 'long' is illegal
    Modules\getaddrinfo.c(109) : warning C4013: 'offsetof' undefined; assuming extern returning int
    Modules\getaddrinfo.c(109) : error C2143: syntax error : missing ')' before 'type'
    Modules\getaddrinfo.c(109) : error C2099: initializer is not a constant
    Modules\getaddrinfo.c(109) : error C2059: syntax error : ')'
    Modules\getaddrinfo.c(111) : error C2059: syntax error : ','
    Modules\getaddrinfo.c(407) : warning C4013: 'inet_pton' undefined; assuming extern returning int
    Modules\getaddrinfo.c(414) : warning C4013: 'IN_MULTICAST' undefined; assuming extern returning int
    Modules\getaddrinfo.c(414) : warning C4013: 'IN_EXPERIMENTAL' undefined; assuming extern returning int
    Modules\getaddrinfo.c(417) : error C2065: 'IN_LOOPBACKNET' : undeclared identifier
    Modules\getaddrinfo.c(417) : warning C4018: '==' : signed/unsigned mismatch
    Modules\getaddrinfo.c(531) : error C2373: 'WSAGetLastError' : redefinition; different type modifiers
            C:\VC98\INCLUDE\winsock.h(787) : see declaration of 'WSAGetLastError'
    Modules\getnameinfo.c(66) : error C2143: syntax error : missing ')' before 'type'
    Modules\getnameinfo.c(66) : error C2099: initializer is not a constant
    Modules\getnameinfo.c(66) : error C2059: syntax error : ')'
    Modules\getnameinfo.c(67) : error C2059: syntax error : ','
    Modules\getnameinfo.c(133) : warning C4013: 'snprintf' undefined; assuming extern returning int
    Modules\getnameinfo.c(153) : warning C4018: '==' : signed/unsigned mismatch
    Modules\getnameinfo.c(167) : warning C4013: 'inet_ntop' undefined; assuming extern returning int
    Modules\getnameinfo.c(168) : warning C4047: '==' : 'int ' differs in levels of indirection from 'void *'
    Modules\getnameinfo.c(200) : warning C4047: '==' : 'int ' differs in levels of indirection from 'void *'

Martin should revert the changes to socketmodule.c until this has a
prayer of working.

From est@hyperreal.org  Sun Jun 24 06:38:06 2001
From: est@hyperreal.org (est@hyperreal.org)
Date: Sat, 23 Jun 2001 22:38:06 -0700 (PDT)
Subject: [Python-Dev] Python Specializing Compiler
In-Reply-To: "from Armin Rigo at Jun 22, 2001 01:00:34 pm"
Message-ID: <20010624053806.16277.qmail@hyperreal.org>

Am I seeing things or does it actually speed up five to six times on my
machine?  Very exciting!

    timing specializing_call(, 2000)...
    result 1952145856 in 4.94 seconds
    timing specializing_call(, 2000)...
    result 1952145856 in 3.91 seconds
    timing f(2000,)...
    result 1952145856 in 25.17 seconds

I wonder to what extent this approach can be applied to method calls.
My analysis of my performance-bound Python apps convinces me that those
are a major bottleneck for me.  About a fifth of their time seems to go
into creating the bound method object (reducible by caching them on the
instance)... another fifth into allocating the memory for the frame
object (ameliorated by pymalloc).  As for the rest, I really don't know.

E

From martin@loewis.home.cs.tu-berlin.de  Sun Jun 24 09:34:06 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v.
Loewis)
Date: Sun, 24 Jun 2001 10:34:06 +0200
Subject: [Python-Dev] gethostbyname2
Message-ID: <200106240834.f5O8Y6t01609@mira.informatik.hu-berlin.de>

The IPv6 patch proposes to introduce a new socket function,
socket.gethostbyname2(name, af).  This becomes necessary as a name might
have both an IPv4 and an IPv6 address.

One alternative for providing such an API is to give
socket.gethostbyname an optional second argument (the address family).
itojun's rationale for calling it gethostbyname2 is that it matches the
C API, as defined in RFC 2133.

Which of these alternatives would you prefer?

Regards,
Martin
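For concreteness, the optional-argument spelling might look like the
following sketch (hypothetical code, not itojun's patch; it assumes the
getaddrinfo support from the patch is available):

    import socket

    def gethostbyname(name, af=socket.AF_INET):
        # Existing callers are unaffected: af defaults to AF_INET.
        # IPv6-aware callers pass socket.AF_INET6 explicitly.
        addrinfo = socket.getaddrinfo(name, None, af)
        family, socktype, proto, canonname, sockaddr = addrinfo[0]
        return sockaddr[0]    # the address string, as gethostbyname returns

    # gethostbyname("www.python.org")                    # IPv4, as today
    # gethostbyname("www.python.org", socket.AF_INET6)   # explicit family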
From martin@loewis.home.cs.tu-berlin.de  Sun Jun 24 09:20:31 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Sun, 24 Jun 2001 10:20:31 +0200
Subject: [Python-Dev] IPv6 and Windows
Message-ID: <200106240820.f5O8KVN01435@mira.informatik.hu-berlin.de>

After integrating the first chunk of IPv6 changes, Tim Peters quickly
found that they won't compile on Windows - even though this was the
least-critical part of the patch.

Specifically, this code emulates the getaddrinfo and getnameinfo calls,
which will be exposed to Python programs in a later patch.  Therefore,
it is essential that they are available on every system, either directly
or through emulation.

For Windows, one option is to use the Microsoft-provided emulation,
which is available from

    http://msdn.microsoft.com/downloads/sdks/platform/tpipv6.asp

To use this emulation, only the header files of the package are
required; it is not necessary to actually install the IPv6 preview on
the system.  The MS emulation will try to load a few DLLs which are
known to provide getaddrinfo.  If neither DLL is found, the code in the
header file falls back to an emulation.  That way, the resulting
socket.pyd would use the true API function on installations that provide
them, and the emulation on all other systems.

The only requirement for building Python is then that the header file
from the technology preview is available on the build machine
(tpipv6.h).  It may be that the header file is also included in recent
SDK releases, I haven't checked.

Is such a requirement acceptable for building the socket module on
Windows?

Regards,
Martin

From m.favas@per.dem.csiro.au  Sun Jun 24 09:58:42 2001
From: m.favas@per.dem.csiro.au (Mark Favas)
Date: Sun, 24 Jun 2001 16:58:42 +0800
Subject: [Python-Dev] IPv6 support
Message-ID: <3B35ABC2.11F3B261@per.dem.csiro.au>

IPv6 support may be nice, and even desirable.  However, supporting IPv6
should not come at the cost of causing problems either in compilation or
at runtime on those platforms that do not support IPv6 natively.
Requiring additional preview code or non-standardly-supplied packages to
be installed is fine if people _want_ to take advantage of the new IPv6
functionality, but _not_ fine if this IPv6 functionality is not
required.  IPv4 support should not require the installation of
additional IPv6 packages.  Well, that's my 2 cents' worth (even if
that's only 1 cent US ).

--
Mark Favas  -  m.favas@per.dem.csiro.au
CSIRO, Private Bag No 5, Wembley, Western Australia 6913, AUSTRALIA

From pf@artcom-gmbh.de  Sun Jun 24 10:20:10 2001
From: pf@artcom-gmbh.de (Peter Funk)
Date: Sun, 24 Jun 2001 11:20:10 +0200 (MEST)
Subject: foobar2(), foobar3(), ... (was Re: [Python-Dev] gethostbyname2)
In-Reply-To: <200106240834.f5O8Y6t01609@mira.informatik.hu-berlin.de> from "Martin v. Loewis" at "Jun 24, 2001 10:34:06 am"
Message-ID: 

Martin v. Loewis:
> The IPv6 patch proposes to introduce a new socket function,
> socket.gethostbyname2(name, af).  This becomes necessary as a name
> might have both an IPv4 and an IPv6 address.
>
> One alternative for providing such an API is to give
> socket.gethostbyname an optional second argument (the address family).
> itojun's rationale for calling it gethostbyname2 is that it matches
> the C API, as defined in RFC 2133.
>
> Which of these alternatives would you prefer?

IMO: the possibility to add new keyword arguments with default values is
one of the major strengths Python has compared to other programming
languages, especially in the scenario where an existing mature API later
has to be enhanced with added features.  In such a situation I always
prefer APIs with fewer functions (maybe with large lists of optional
arguments) over APIs containing a bunch of functions or methods called
'popen2()', 'gethostbyname2()' and so on.

Regards, Peter
--
Peter Funk, Oldenburger Str.86, D-27777 Ganderkesee, Germany, Fax:+49 4222950260
office: +49 421 20419-0 (ArtCom GmbH, Grazer Str.8, D-28359 Bremen, Germany)

From tim.one@home.com  Sun Jun 24 11:51:40 2001
From: tim.one@home.com (Tim Peters)
Date: Sun, 24 Jun 2001 06:51:40 -0400
Subject: [Python-Dev] IPv6 and Windows
In-Reply-To: <200106240820.f5O8KVN01435@mira.informatik.hu-berlin.de>
Message-ID: 

[Martin v. Loewis]
> After integrating the first chunk of IPv6 changes, Tim Peters quickly
> found that they won't compile on Windows - even though this was the
> least-critical part of the patch.

Mark Favas also reported failure on a Unix box -- we can't leave the CVS
tree in an unusable state, and Mark in particular provides uniquely
valuable feedback from his collection of Platforms from Mars .  I
#ifdef'ed out the offending includes on Windows for now, but that
doesn't help Mark.

> Specifically, this code emulates the getaddrinfo and getnameinfo
> calls, which will be exposed to Python programs in a later patch.
> Therefore, it is essential that they are available on every system,
> either directly or through emulation.
>
> For Windows, one option is to use the Microsoft-provided emulation,
> which is available from
>
>     http://msdn.microsoft.com/downloads/sdks/platform/tpipv6.asp

It says it's unsupported preview software for Win2K only.  Since even
the first *real* release of anything from MS sucks, I wouldn't touch
this unless I absolutely had to.  But I don't have any cycles for this
project anyway, so this:

> ...
> Is such a requirement acceptable for building the socket module on
> Windows?

will have to be addressed by someone who does.  Is anyone, e.g., at
ActiveState keen on this?

From mal@lemburg.com  Sun Jun 24 12:06:19 2001
From: mal@lemburg.com (M.-A. Lemburg)
Date: Sun, 24 Jun 2001 13:06:19 +0200
Subject: [Python-Dev] IPv6 and Windows
References: <200106240820.f5O8KVN01435@mira.informatik.hu-berlin.de>
Message-ID: <3B35C9AB.2D1D2185@lemburg.com>

"Martin v. Loewis" wrote:
>
> After integrating the first chunk of IPv6 changes, Tim Peters quickly
> found that they won't compile on Windows - even though this was the
> least-critical part of the patch.
>
> Specifically, this code emulates the getaddrinfo and getnameinfo
> calls, which will be exposed to Python programs in a later patch.
> Therefore, it is essential that they are available on every system,
> either directly or through emulation.
> > For Windows, one option is to use the Microsoft-provided emulation, > which is available from > > http://msdn.microsoft.com/downloads/sdks/platform/tpipv6.asp > > To use this emulation, only the header files of the package are > required; it is not necessary to actually install the IPv6 preview on > the system. The MS emulation will try to load a few DLLs which are > known to provide getaddrinfo. If neither DLL is found, the code in the > header file falls back to an emulation. That way, the resulting > socket.pyd would use the true API function on installations that > provide them, and the emulation on all other systems. > > The only requirement for building Python is then that the header file > from the technology preview is available on the build machine > (tpipv6.h). It may be that the header file is also included in recent > SDK releases, I haven't checked. > > Is such a requirement acceptable for building the socket module on > Windows? Isn't this the MS SDK that has the new "Open Source" license clause in it ?! If yes, I very much doubt that this approach would be feasable for Python... http://msdn.microsoft.com/downloads/eula_mit.htm Quote from a recent posting by Steven Majewski on c.l.p.: """ (c) Open Source. Recipients license rights to the Software are conditioned upon Recipient (i) not distributing such Software, in whole or in part, in conjunction with Potentially Viral Software (as defined below); and (ii) not using Potentially Viral Software (e.g. tools) to develop Recipient software which includes the Software, in whole or in part. For purposes of the foregoing, Potentially Viral Software means software which is licensed pursuant to terms that: (x) create, or purport to create, obligations for Microsoft with respect to the Software or (y) grant, or purport to grant, to any third party any rights to or immunities under Microsofts intellectual property or proprietary rights in the Software. By way of example but not limitation of the foregoing, Recipient shall not distribute the Software, in whole or in part, in conjunction with any Publicly Available Software. Publicly Available Software means each of (i) any software that contains, or is derived in any manner (in whole or in part) from, any software that is distributed as free software, open source software (e.g. Linux) or similar licensing or distribution models; and (ii) any software that requires as a condition of use, modification and/or distribution of such software that other software distributed with such software (A) be disclosed or distributed in source code form; (B) be licensed for the purpose of making derivative works; or (C) be redistributable at no charge. Publicly Available Software includes, without limitation, software licensed or distributed under any of the following licenses or distribution models, or licenses or distribution models similar to any of the following: (A) GNUs General Public License (GPL) or Lesser/Library GPL (LGPL), (B) The Artistic License (e.g., PERL), (C) the Mozilla Public License, (D) the Netscape Public License, (E) the Sun Community Source License (SCSL), and (F) the Sun Industry Standards License (SISL). 
""" -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From guido@digicool.com Sun Jun 24 14:23:52 2001 From: guido@digicool.com (Guido van Rossum) Date: Sun, 24 Jun 2001 09:23:52 -0400 Subject: [Python-Dev] gethostbyname2 In-Reply-To: Your message of "Sun, 24 Jun 2001 10:34:06 +0200." <200106240834.f5O8Y6t01609@mira.informatik.hu-berlin.de> References: <200106240834.f5O8Y6t01609@mira.informatik.hu-berlin.de> Message-ID: <20010624132540.RTEI4013.femail3.sdc1.sfba.home.com@cj20424-a.reston1.va.home.com> > The IPv6 patch proposes to introduce a new socket function, > socket.gethostbyname2(name, af). This becomes necessary as a name > might have both an IPv4 and an IPv6 address. > > One alternative for providing such API is to get socket.gethostbyname > an optional second argument (the address family). itojun's rationale > for calling it gethostbyname2 is that the C API, as defined in RFC > 2133. > > Which of these alternatives would you prefer? Definitely an optional 2nd arg to gethostbyname() -- in C, you can't do tht, so they *had* to create a new function, but Python is more flexible. --Guido van Rossum (home page: http://www.python.org/~guido/) From DavidA@ActiveState.com Sun Jun 24 16:18:22 2001 From: DavidA@ActiveState.com (David Ascher) Date: Sun, 24 Jun 2001 08:18:22 -0700 Subject: [Python-Dev] IPv6 and Windows References: Message-ID: <3B3604BE.7E2F6C6E@ActiveState.com> Tim Peters wrote: > > Is such a requirement acceptable for building the socket module on > > Windows? > > will have to be addressed by someone who does. Is anyone, e.g., at > ActiveState keen on this? Not as far as I know. I haven't looked at the patches, but couldn't we have the IPv6 code be #ifdef'ed out, so that those who care about IPv6 can periodically test it while the various OS-level libraries are ramped up over the next months/years, but w/o disturbing the 'current' builds? --david From martin@loewis.home.cs.tu-berlin.de Sun Jun 24 18:00:43 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Sun, 24 Jun 2001 19:00:43 +0200 Subject: [Python-Dev] IPv6 and Windows In-Reply-To: <3B35C9AB.2D1D2185@lemburg.com> (mal@lemburg.com) References: <200106240820.f5O8KVN01435@mira.informatik.hu-berlin.de> <3B35C9AB.2D1D2185@lemburg.com> Message-ID: <200106241700.f5OH0hm01021@mira.informatik.hu-berlin.de> > > Is such a requirement acceptable for building the socket module on > > Windows? > > Isn't this the MS SDK that has the new "Open Source" license > clause in it ?! 
No, this has a different license text, which can be seen on http://msdn.microsoft.com/downloads/sdks/platform/tpipv6/download.asp On redistribution, it says # If you redistribute the SOFTWARE and/or your Source Modifications, # or any portion thereof as provided above, you agree: (i) to # distribute the SOFTWARE only in conjunction with, and as part of, # your Source Modifications which add significant functionality to the # SOFTWARE; (ii) to distribute the SOFTWARE and/or your Source # Modifications solely as part of your research and not in any # commercial product; (iii) the SOFTWARE and/or your Source # Modifications will not be distributed for profit; (iv) to retain all # branding, copyright and trademark notices included with the SOFTWARE # and include a copy of this EULA with any distribution of the # SOFTWARE, or any portion thereof; and (v) to indemnify, hold # harmless, and defend Microsoft from and against any claims or # lawsuits, including attorneys' fees, that arise or result from # the use or distribution of your Source Modifications. I don't know whether this is acceptable or not. Regards, Martin From mal@lemburg.com Sun Jun 24 19:08:13 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Sun, 24 Jun 2001 20:08:13 +0200 Subject: [Python-Dev] IPv6 and Windows References: <200106240820.f5O8KVN01435@mira.informatik.hu-berlin.de> <3B35C9AB.2D1D2185@lemburg.com> <200106241700.f5OH0hm01021@mira.informatik.hu-berlin.de> Message-ID: <3B362C8D.D3AECE3C@lemburg.com> "Martin v. Loewis" wrote: > > > > Is such a requirement acceptable for building the socket module on > > > Windows? > > > > Isn't this the MS SDK that has the new "Open Source" license > > clause in it ?! > > No, this has a different license text, which can be seen on > > http://msdn.microsoft.com/downloads/sdks/platform/tpipv6/download.asp > > On redistribution, it says > > # If you redistribute the SOFTWARE and/or your Source Modifications, > # or any portion thereof as provided above, you agree: (i) to > # distribute the SOFTWARE only in conjunction with, and as part of, > # your Source Modifications which add significant functionality to the > # SOFTWARE; (ii) to distribute the SOFTWARE and/or your Source > # Modifications solely as part of your research and not in any > # commercial product; (iii) the SOFTWARE and/or your Source > # Modifications will not be distributed for profit; (iv) to retain all > # branding, copyright and trademark notices included with the SOFTWARE > # and include a copy of this EULA with any distribution of the > # SOFTWARE, or any portion thereof; and (v) to indemnify, hold > # harmless, and defend Microsoft from and against any claims or > # lawsuits, including attorneys' fees, that arise or result from > # the use or distribution of your Source Modifications. > > I don't know whether this is acceptable or not. Most likely not: there are lots of commercial Python users out there who wouldn't like these clauses at all... we'd also lose the GPL compatibility. -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From martin@loewis.home.cs.tu-berlin.de Sun Jun 24 18:48:03 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. 
Loewis) Date: Sun, 24 Jun 2001 19:48:03 +0200 Subject: [Python-Dev] IPv6 and Windows Message-ID: <200106241748.f5OHm3001480@mira.informatik.hu-berlin.de> > I haven't looked at the patches, but couldn't we have the IPv6 code > be #ifdef'ed out, so that those who care about IPv6 can periodically > test it while the various OS-level libraries are ramped up over the > next months/years, but w/o disturbing the 'current' builds? Not if we are going to introduce itojun's patch. In that patch, the IPv6 code *is* actually ifdef'ed out. It is getaddrinfo/getnameinfo that gives problems, which isn't IPv6 specific at all. The problem is that the library patches (httplib, ftplib, etc) do use getaddrinfo to find out how to contact a remote system, which is the right thing to do IMO. So even if the IPv6 support can be activated only if desired, getaddrinfo absolutely has to work. So the only question then is where we get an implementation of these functions if the system doesn't provide one. itojun has suggested the WIDE libraries; since they apparently don't compile on Windows, I've suggested the MS TP emulation. If the latter is not acceptable, we either have to fix the WIDE implementation to work on Windows also; As for the problems Mark reported: I think they can get fixed. Regards, Martin From thomas@xs4all.net Sun Jun 24 22:35:37 2001 From: thomas@xs4all.net (Thomas Wouters) Date: Sun, 24 Jun 2001 23:35:37 +0200 Subject: [Python-Dev] IPv6 and Windows In-Reply-To: <200106241748.f5OHm3001480@mira.informatik.hu-berlin.de> References: <200106241748.f5OHm3001480@mira.informatik.hu-berlin.de> Message-ID: <20010624233537.R8098@xs4all.nl> On Sun, Jun 24, 2001 at 07:48:03PM +0200, Martin v. Loewis wrote: > The problem is that the library patches (httplib, ftplib, etc) do use > getaddrinfo to find out how to contact a remote system, which is the > right thing to do IMO. So even if the IPv6 support can be activated > only if desired, getaddrinfo absolutely has to work. Why ? Why can't those parts be 'if it exists'-ed out ? We do it for SSL support. I'm only comfortable with the IPv6 patch if it's optional, or can at least be disabled. I haven't looked at the patch, but why is getaddrinfo absolutely necessary, if the code works without it now, too ? > So the only question then is where we get an implementation of these > functions if the system doesn't provide one. itojun has suggested the > WIDE libraries; since they apparently don't compile on Windows, I've > suggested the MS TP emulation. If the latter is not acceptable, we > either have to fix the WIDE implementation to work on Windows also; > As for the problems Mark reported: I think they can get fixed. What about the zillion other 'obscure' ports ? OS/2 ? Palm ? MacOS 9 ;) If this patch can't be zero-impact-if-necessary, I'm a firm -1 on it. But I don't think it can't, it just takes more work. -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From martin@loewis.home.cs.tu-berlin.de Sun Jun 24 22:39:45 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Sun, 24 Jun 2001 23:39:45 +0200 Subject: [Python-Dev] Problems with socketmodule (getnameinfo & getaddrinfo) Message-ID: <200106242139.f5OLdjR22560@mira.informatik.hu-berlin.de> > 1: socketmodule.c now #includes getnameinfo.c and > getaddrinfo.c. These functions both use offsetof(), which is defined > (on my system, at least) in stddef.h. That should be fixed now. 
stddef.h is included in socketmodule.c; if it is not available or does not define offsetof, an additional definition is provided. > 2. [...] Changes to either of the get{name,addr}info.c files will > not cause socketmodule to be rebuilt. I don't know how to solve this one. If distutils builds the modules, makefile dependencies won't help. > 3. The socket module still does not work, however, since it refers > to an unresolved symbol inet_pton I took the simplest solution that I could think of, delegating inet_{pton,ntop} to inet_{ntoa,addr} for AF_INET, failing for all other address families (AF_INET6 in particular). I've verified that this code does the same as the builtin functions on my Linux system; please let me know whether it compiles for you. Regards, Martin From martin@loewis.home.cs.tu-berlin.de Sun Jun 24 22:56:48 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Sun, 24 Jun 2001 23:56:48 +0200 Subject: [Python-Dev] IPv6 and Windows In-Reply-To: <20010624233537.R8098@xs4all.nl> (message from Thomas Wouters on Sun, 24 Jun 2001 23:35:37 +0200) References: <200106241748.f5OHm3001480@mira.informatik.hu-berlin.de> <20010624233537.R8098@xs4all.nl> Message-ID: <200106242156.f5OLum222759@mira.informatik.hu-berlin.de> > Why ? Why can't those parts be 'if it exists'-ed out ? We do it for SSL > support. I'm only comfortable with the IPv6 patch if it's optional, or can > at least be disabled. I haven't looked at the patch, but why is getaddrinfo > absolutely necessary, if the code works without it now, too ? getaddrinfo offers protocol-independent address lookup. It is necessary to use that API to support AF_INET and AF_INET6 transparently in application code. itojun proposes to change a number of standard library modules. Please have a look at the actual patch for details; the typical change will look like this (for httplib):

diff -u -r1.35 httplib.py
--- Lib/httplib.py	2001/06/01 16:25:38	1.35
+++ Lib/httplib.py	2001/06/24 04:41:48
@@ -357,10 +357,22 @@
 
     def connect(self):
         """Connect to the host and port specified in __init__."""
-        self.sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
-        if self.debuglevel > 0:
-            print "connect: (%s, %s)" % (self.host, self.port)
-        self.sock.connect((self.host, self.port))
+        for res in socket.getaddrinfo(self.host, self.port, 0, socket.SOCK_STREAM):
+            af, socktype, proto, canonname, sa = res
+            try:
+                self.sock = socket.socket(af, socktype, proto)
+                if self.debuglevel > 0:
+                    print "connect: (%s, %s)" % (self.host, self.port)
+                self.sock.connect(sa)
+            except socket.error, msg:
+                if self.debuglevel > 0:
+                    print 'connect fail:', (self.host, self.port)
+                self.sock.close()
+                self.sock = None
+                continue
+            break
+        if not self.sock:
+            raise socket.error, msg
 
     def close(self):
         """Close the connection to the HTTP server."""

As you can see, the modified code can simultaneously access both IPv4 and IPv6 hosts, and will pick whatever it can connect to best. Without getaddrinfo, httplib would continue to support IPv4 hosts only. The IPv6 support itself is absolutely optional. If it is not available, getaddrinfo will never return IPv6 addresses, or propose AF_INET6 as the address family. > What about the zillion other 'obscure' ports ? OS/2 ? Palm ? MacOS 9 ;) If > this patch can't be zero-impact-if-necessary, I'm a firm -1 on it. But I > don't think it can't, it just takes more work. Depends on what zero-impact-if-necessary means to you. The patch, as it stands, can be fixed to compile on all systems that are currently supported. It cannot be fixed to be taken completely out (unless you literally do that: take it out). I don't plan to fight for it too much. Please have a look at the code itself, and try to cooperate on integrating it. Don't reject it outright without having even looked at it. If I get strong rejections from everybody, I'll just withdraw it and feel sorry for the time I've already spent with it. Regards, Martin
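[An aside on the pattern above: the same loop could live in one place instead of being repeated per module -- essentially the "convenience function" idea Fredrik Lundh raises later in this thread. The sketch below is illustrative only: the name connect_to_host is invented here, it assumes the socket.getaddrinfo exposed by itojun's patch, and, like the patch's own loop, it assumes getaddrinfo returns at least one candidate.]

    import socket

    def connect_to_host(host, port):
        # Try each candidate address in turn; on an IPv4-only system
        # getaddrinfo never proposes AF_INET6 entries, so nothing here
        # depends on IPv6 being available.
        sock = None
        for res in socket.getaddrinfo(host, port, 0, socket.SOCK_STREAM):
            af, socktype, proto, canonname, sa = res
            try:
                sock = socket.socket(af, socktype, proto)
                sock.connect(sa)
            except socket.error, msg:
                if sock:
                    sock.close()
                sock = None
                continue
            break
        if not sock:
            # msg was bound by the except clause above, as in the patch
            raise socket.error, msg
        return sock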
From m.favas@per.dem.csiro.au Sun Jun 24 23:16:25 2001 From: m.favas@per.dem.csiro.au (Mark Favas) Date: Mon, 25 Jun 2001 06:16:25 +0800 Subject: [Python-Dev] Re: Problems with socketmodule (getnameinfo & getaddrinfo) References: <200106242139.f5OLdjR22560@mira.informatik.hu-berlin.de> Message-ID: <3B3666B9.335DA17E@per.dem.csiro.au> [Martin v. Loewis] > > > 1: socketmodule.c now #includes getnameinfo.c and > > getaddrinfo.c. These functions both use offsetof(), which is defined > > (on my system, at least) in stddef.h. > > That should be fixed now. stddef.h is included in socketmodule.c; if > it is not available or does not define offsetof, an additional > definition is provided. Yes, this is fine now... > > > 2. [...] Changes to either of the get{name,addr}info.c files will > > not cause socketmodule to be rebuilt. > > I don't know how to solve this one. If distutils builds the modules, > makefile dependencies won't help. > > > 3. The socket module still does not work, however, since it refers > > to an unresolved symbol inet_pton > > I took the simplest solution that I could think of, delegating > inet_{pton,ntop} to inet_{ntoa,addr} for AF_INET, failing for all > other address families (AF_INET6 in particular). I've verified that > this code does the same as the builtin functions on my Linux system; > please let me know whether it compiles for you. > To get socketmodule.c to compile, I had to make a change to line 2963 so that the declaration of inet_pton matched the previous declaration on line 220 (changing char *src to const char *src). Still have problems though, due to the use of snprintf in getnameinfo.c:

Python 2.2a0 (#444, Jun 25 2001, 05:58:17) [C] on osf1V4
Type "copyright", "credits" or "license" for more information.
>>> import socket
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/home/gonzo1/mark/groucho1/mark/src/python/CVS/python/dist/src/Lib/socket.py", line 41, in ?
    from _socket import *
ImportError: Unresolved symbol in /home/gonzo1/mark/groucho1/mark/src/python/CVS/python/dist/src/build/lib.osf1-V4.0-alpha-2.2/_socket.so: snprintf

Cheers, Mark -- Mark Favas - m.favas@per.dem.csiro.au CSIRO, Private Bag No 5, Wembley, Western Australia 6913, AUSTRALIA From tim.one@home.com Mon Jun 25 06:02:30 2001 From: tim.one@home.com (Tim Peters) Date: Mon, 25 Jun 2001 01:02:30 -0400 Subject: [Python-Dev] IPv6 and Windows In-Reply-To: <3B35C9AB.2D1D2185@lemburg.com> Message-ID: >> http://msdn.microsoft.com/downloads/sdks/platform/tpipv6.asp [MAL] > Isn't this the MS SDK that has the new "Open Source" license > clause in it ?! No. That was for the "Mobile Internet Toolkit" toolkit; no relation, AFAICT. > If yes, I very much doubt that this approach > would be feasible for Python... > > http://msdn.microsoft.com/downloads/eula_mit.htm From tim.one@home.com Mon Jun 25 06:14:17 2001 From: tim.one@home.com (Tim Peters) Date: Mon, 25 Jun 2001 01:14:17 -0400 Subject: [Python-Dev] IPv6 and Windows In-Reply-To: <200106241748.f5OHm3001480@mira.informatik.hu-berlin.de> Message-ID: [Martin v. Loewis] > ...
> So the only question then is where we get an implementation of these > functions if the system doesn't provide one. itojun has suggested the > WIDE libraries; since they apparently don't compile on Windows, I've > suggested the MS TP emulation. If the latter is not acceptable, we > either have to fix the WIDE implementation to work on Windows also; I don't have cycles for this, but will cheerily suggest that the WIDE problems didn't appear especially deep, just "the usual" careless brand of Unix+gcc+glibc specific coding. For example, HAVE_LONG_LONG is #define'd on Windows, but, just as in Python source, you can't *use* "long long" literally, you have to use the LONG_LONG macro instead. Then Windows doesn't have an offsetof() macro, or an snprintf() either. Etc. The code is in trouble exactly where it relies on platform-specific extensions to the std C language and library. Problems with those won't be unique to Windows, either, which is a deeper concern (but already well expressed by others). It would be nice if Python could contribute portability back to WIDE. That requires worker bees, though, and lots of x-platform testing. If it turns out we can't swing that, then support for this is premature, and we should wait, e.g., for WIDE to put more effort into porting their code. From just@letterror.com Mon Jun 25 07:55:17 2001 From: just@letterror.com (Just van Rossum) Date: Mon, 25 Jun 2001 08:55:17 +0200 Subject: [Python-Dev] os.path.normcase() in site.py Message-ID: <20010625085521-r01010600-9a6226c8@213.84.27.177> I noticed that these days __file__ attributes of modules are case normalized (ie. lowercased on case insensitive file systems), or at least the directory part. Then I noticed that this is caused by the fact that all sys.path entries are case normalized. It turns out that site.py does this, in a function called makepath(), added by Fred about 8 months ago. I think this is wrong: we should always try to *preserve* case. I see os.path.normcase() as a tool to be able to better compare two paths, but you shouldn't *store* paths this way. I for one am irritated when I see a path that doesn't have the proper case. The intention of makepath() in site.py seems good -- it turns all paths into absolute paths -- but is the normcase really necessary? *** Please CC follow-ups to me, as I'm not on python-dev. Just From martin@loewis.home.cs.tu-berlin.de Mon Jun 25 07:39:44 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Mon, 25 Jun 2001 08:39:44 +0200 Subject: [Python-Dev] Re: Problems with socketmodule (getnameinfo & getaddrinfo) In-Reply-To: <3B3666B9.335DA17E@per.dem.csiro.au> (message from Mark Favas on Mon, 25 Jun 2001 06:16:25 +0800) References: <200106242139.f5OLdjR22560@mira.informatik.hu-berlin.de> <3B3666B9.335DA17E@per.dem.csiro.au> Message-ID: <200106250639.f5P6die01246@mira.informatik.hu-berlin.de> > To get socketmodule.c to compile, I had to make a change to line 2963 > so that the declaration of inet_pton matched the previous declaration on > line 220 (changing char *src to const char *src). Still have problems > though, due to the use of snprintf in getnameinfo.c: Ok, they are printing a single number into a 512 byte buffer; that is safe even with sprintf only, so I have just removed the snprintf call. Can you please try again? Thanks for your reports, Martin
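[For readers following along: the C fallback Martin describes behaves roughly like this Python rendering. The names inet_pton_fallback and inet_ntop_fallback are invented for illustration; socket.inet_aton and socket.inet_ntoa are the long-standing IPv4-only helpers being delegated to.]

    import socket

    def inet_pton_fallback(af, ip):
        # AF_INET is delegated to the old IPv4 helper; every other
        # address family (AF_INET6 in particular) simply fails.
        if af == socket.AF_INET:
            return socket.inet_aton(ip)
        raise socket.error, "address family not supported"

    def inet_ntop_fallback(af, packed):
        if af == socket.AF_INET:
            return socket.inet_ntoa(packed)
        raise socket.error, "address family not supported"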
From thomas@xs4all.net Mon Jun 25 08:20:53 2001 From: thomas@xs4all.net (Thomas Wouters) Date: Mon, 25 Jun 2001 09:20:53 +0200 Subject: [Python-Dev] os.path.normcase() in site.py In-Reply-To: <20010625085521-r01010600-9a6226c8@213.84.27.177> References: <20010625085521-r01010600-9a6226c8@213.84.27.177> Message-ID: <20010625092053.S8098@xs4all.nl> On Mon, Jun 25, 2001 at 08:55:17AM +0200, Just van Rossum wrote: > *** Please CC follow-ups to me, as I'm not on python-dev. Is that by choice ? It seems rather... peculiar, to me, that you have checkin access but aren't on python-dev. You'll miss all those wonderful "Don't touch CVS, I'm building a release" and "Who put CVS in an unstable state?" messages. -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From tim.one@home.com Mon Jun 25 08:51:00 2001 From: tim.one@home.com (Tim Peters) Date: Mon, 25 Jun 2001 03:51:00 -0400 Subject: [Python-Dev] os.path.normcase() in site.py In-Reply-To: <20010625092053.S8098@xs4all.nl> Message-ID: [Just van Rossum] > *** Please CC follow-ups to me, as I'm not on python-dev. [Thomas Wouters] > Is that by choice ? It seems rather... peculiar, to me, that you have > checkin access but aren't on python-dev. Well, I suppose it's supposed to be a secret, but Guido and Just haven't talked in 17 years come Wednesday. IIRC, something about a bottle of wine and a toilet seat, and a small but energetic ferret. Just hacked his way into SourceForge access (those skills just run in the family, I guess), but every time he hacks onto Python-Dev Guido detects it and locks him out again. It's very sad, really -- but also wonderfully Dutch. at-least-that's-the-best-explanation-i-can-think-of-ly y'rs - tim From thomas@xs4all.net Mon Jun 25 09:35:38 2001 From: thomas@xs4all.net (Thomas Wouters) Date: Mon, 25 Jun 2001 10:35:38 +0200 Subject: [Python-Dev] os.path.normcase() in site.py In-Reply-To: References: Message-ID: <20010625103538.T8098@xs4all.nl> On Mon, Jun 25, 2001 at 03:51:00AM -0400, Tim Peters wrote: [ Tim explains about the century-old, horrid blood feud that cost the lives of many an innocent ferret, not to mention bottles of wine, caused by Just's future attempts to join python-dev -- damn that timemachine ] Okay... how about someone takes Guido out for dinner and feeds him way too many bottles of wine and ferrets to show him such things do not necessarily lead to blood feuds ? Maybe take along some psychotropic drugs and a halfway decent hypnotist for safety's measure. Meanwhile Barry subscribes Just to python-dev and you or someone else with the pickpocket skills to get at the keys for the time machine (come on, fess up, you all practiced) make sure Guido can't get at it, lest he try and make up with Just in the past in his 'suggestable' state... Better change the Mailman admin password too, just to be on the safe side. Or if that has no chance of a prayer in hell of working, I can give Just a secret xs4all.nl address (since he has an XS4ALL account nowadays, that shouldn't be a problem) and we just never tell Guido that py-dev@xs4all.nl is really Just ;) > It's very sad, really -- but also wonderfully Dutch. No, it would only be wonderfully Dutch if either brother was German or Belgian in some way, or of royal blood and married to the wrong type of christian sect (Protestant or Catholic -- I keep forgetting which is which.) -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread!
From tim.one@home.com Mon Jun 25 10:05:23 2001 From: tim.one@home.com (Tim Peters) Date: Mon, 25 Jun 2001 05:05:23 -0400 Subject: [Python-Dev] RE: [Python-iterators] Death by Leakage In-Reply-To: Message-ID: Here's a simpler leaker, amounting to an insanely convoluted way to generate the ints 1, 2, 3, ...:

    DO_NOT_LEAK = 1

    class LazyList:
        def __init__(self, g):
            self.sofar = []
            self.fetch = g.next

        def __getitem__(self, i):
            sofar, fetch = self.sofar, self.fetch
            while i >= len(sofar):
                sofar.append(fetch())
            return sofar[i]

        def clear(self):
            self.__dict__.clear()

    def plus1(g):
        for i in g:
            yield i + 1

    def genm23():
        yield 1
        for i in plus1(m23):
            yield i

    for i in range(10000):
        m23 = LazyList(genm23())
        [m23[i] for i in range(50)]
        if DO_NOT_LEAK:
            m23.clear()

Neil, it would help if genobjects had a memberlist so that the struct members were discoverable from Python code; that would also let me add appropriate methods to Cyclops.py to find cycles automatically. Anyway, m23 is a LazyList instance, where m23.fetch is genm23().next, i.e. m23.fetch is a bound method of the genm23() generator-iterator. So the frame for genm23 is reachable from m23.__dict__. That frame contains an anonymous (it's living in the frame's valuestack) generator-iterator thingie corresponding to the plus1(m23) call. *That* generator's frame in turn has m23 in its locals (m23 was an argument to plus1), and another iterator method referencing m23 in its valuestack (due to the "for i in g"). But m23 is the LazyList instance we started with, so there's a cycle, and clearing m23.__dict__ breaks it. gc doesn't chase generators or frames, so it can't clean this stuff up if we don't clear the dict. So this appears hopeless unless gc adds both generators and frames to its repertoire. OTOH, it's got to be rare -- maybe . Worth it? From loewis@informatik.hu-berlin.de Mon Jun 25 10:43:33 2001 From: loewis@informatik.hu-berlin.de (Martin von Loewis) Date: Mon, 25 Jun 2001 11:43:33 +0200 (MEST) Subject: [Python-Dev] make static Message-ID: <200106250943.LAA24576@pandora.informatik.hu-berlin.de> There is a bug report on SF that 'make static' fails for a Makefile.pre.in extension, see http://sourceforge.net/tracker/index.php?func=detail&aid=435446&group_id=5470&atid=105470 Is that process still supported? Unless I'm mistaken, this is complicated by the fact that Makefile.pre.in packages use the Makefile.pre.in that comes with the package, not the one that comes with the Python installation. Any insights welcome, Martin From jack@oratrix.nl Mon Jun 25 11:18:40 2001 From: jack@oratrix.nl (Jack Jansen) Date: Mon, 25 Jun 2001 12:18:40 +0200 Subject: [Python-Dev] Re: Problems with socketmodule (getnameinfo & getaddrinfo) In-Reply-To: Message by Mark Favas , Mon, 25 Jun 2001 06:16:25 +0800 , <3B3666B9.335DA17E@per.dem.csiro.au> Message-ID: <20010625101842.B6BC6303182@snelboot.oratrix.nl> I'm having a lot of problems with the new getaddrinfo stuff: no prototypes used in various routines, missing consts in routine declarations and then passing const strings to it, all routines seem to be globals (and with pretty dangerous names) even though they all look pretty static to me, etc. Could whoever put this in do a round of quality control on it, please?
-- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen@oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From jack@oratrix.nl Mon Jun 25 11:28:08 2001 From: jack@oratrix.nl (Jack Jansen) Date: Mon, 25 Jun 2001 12:28:08 +0200 Subject: [Python-Dev] os.path.normcase() in site.py In-Reply-To: Message by Just van Rossum , Mon, 25 Jun 2001 08:55:17 +0200 , <20010625085521-r01010600-9a6226c8@213.84.27.177> Message-ID: <20010625102809.42357303182@snelboot.oratrix.nl> > I noticed that these days __file__ attributes of modules are case normalized > (ie. lowercased on case insensitive file systems), or at least the directory > part. Then I noticed that this is caused by the fact that all sys.path entries > are case normalized. It turns out that site.py does this, in a function called > makepath(), added by Fred about 8 months ago. > > I think this is wrong: we should always try to *preserve* case. There is an added problem with the makepath() stuff that I hadn't reported here yet: it has broken MacPython on some non-western machines. Specifically I've had reports of people running a Japanese MacOS that things will break if they run Python from a pathname that has any non-7-bit-ascii characters in the name. Apparently normcase normalizes more than just ascii upper/lowercase letters. And aside from that I fully agree with Just: seeing a stacktrace with all lowercase filenames is _very_ disconcerting. I would disable the case-normalization for MacPython, except that I don't know whether it actually has a function. With MacPython's way of finding the initial sys.path contents we don't have the Windows-Python problem that we add the same directory 5 times (once in uppercase, once in lowercase, once in mixed case, once in mixed-case with / for \, etc:-), so if this is what it's trying to solve we can take it out easily. -- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen@oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From fredrik@pythonware.com Mon Jun 25 13:12:23 2001 From: fredrik@pythonware.com (Fredrik Lundh) Date: Mon, 25 Jun 2001 14:12:23 +0200 Subject: [Python-Dev] IPv6 and Windows References: <200106241748.f5OHm3001480@mira.informatik.hu-berlin.de> <20010624233537.R8098@xs4all.nl> <200106242156.f5OLum222759@mira.informatik.hu-berlin.de> Message-ID: <006101c0fd70$17a6b660$0900a8c0@spiff> martin wrote: > getaddrinfo offers protocol-independent address lookup. It is > necessary to use that API to support AF_INET and AF_INET6 > transparently in application code. itojun proposes to change a number > of standard library modules. 
> Please have a look at the actual patch > for details; the typical change will look like this (for httplib)
>
> diff -u -r1.35 httplib.py
> --- Lib/httplib.py	2001/06/01 16:25:38	1.35
> +++ Lib/httplib.py	2001/06/24 04:41:48
> @@ -357,10 +357,22 @@
>
>      def connect(self):
>          """Connect to the host and port specified in __init__."""
> -        self.sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
> -        if self.debuglevel > 0:
> -            print "connect: (%s, %s)" % (self.host, self.port)
> -        self.sock.connect((self.host, self.port))
> +        for res in socket.getaddrinfo(self.host, self.port, 0, socket.SOCK_STREAM):
> +            af, socktype, proto, canonname, sa = res
> +            try:
> +                self.sock = socket.socket(af, socktype, proto)
> +                if self.debuglevel > 0:
> +                    print "connect: (%s, %s)" % (self.host, self.port)
> +                self.sock.connect(sa)
> +            except socket.error, msg:
> +                if self.debuglevel > 0:
> +                    print 'connect fail:', (self.host, self.port)
> +                self.sock.close()
> +                self.sock = None
> +                continue
> +            break
> +        if not self.sock:
> +            raise socket.error, msg

instead of adding code like that to every single module, maybe we should add a convenience function to the socket module? (and make that function smart enough to work also if getaddrinfo isn't supported by the native platform...) From guido@digicool.com Mon Jun 25 14:40:10 2001 From: guido@digicool.com (Guido van Rossum) Date: Mon, 25 Jun 2001 09:40:10 -0400 Subject: [Python-Dev] make static In-Reply-To: Your message of "Mon, 25 Jun 2001 11:43:33 +0200." <200106250943.LAA24576@pandora.informatik.hu-berlin.de> References: <200106250943.LAA24576@pandora.informatik.hu-berlin.de> Message-ID: <200106251340.f5PDeAO07244@odiug.digicool.com> > There is a bug report on SF that 'make static' fails for a > Makefile.pre.in extension, see > > http://sourceforge.net/tracker/index.php?func=detail&aid=435446&group_id=5470&atid=105470 > > Is that process still supported? Unless I'm mistaken, this is > complicated by the fact that Makefile.pre.in packages use the > Makefile.pre.in that comes with the package, not the one that comes > with the Python installation. > > Any insights welcome, > > Martin As long as it works, it works. I don't think there's a reason to spend more than absolutely minimal time trying to keep it working though -- we're trying to encourage everybody to migrate towards distutils. So (without having seen the SF report) I'd say "tough luck". --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@digicool.com Mon Jun 25 14:40:47 2001 From: guido@digicool.com (Guido van Rossum) Date: Mon, 25 Jun 2001 09:40:47 -0400 Subject: [Python-Dev] os.path.normcase() in site.py In-Reply-To: Your message of "Mon, 25 Jun 2001 10:35:38 +0200." <20010625103538.T8098@xs4all.nl> References: <20010625103538.T8098@xs4all.nl> Message-ID: <200106251340.f5PDele07256@odiug.digicool.com> No need to get me drunk. Barry & I decided to change this policy weeks ago, but (in order to avoid a flurry of subscription requests from functional-language proponents) we decided to keep the policy change a secret. :-) Just can subscribe safely now. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@digicool.com Mon Jun 25 14:40:06 2001 From: guido@digicool.com (Guido van Rossum) Date: Mon, 25 Jun 2001 09:40:06 -0400 Subject: [Python-Dev] os.path.normcase() in site.py In-Reply-To: Your message of "Mon, 25 Jun 2001 12:28:08 +0200."
<20010625102809.42357303182@snelboot.oratrix.nl> References: <20010625102809.42357303182@snelboot.oratrix.nl> Message-ID: <200106251340.f5PDe6e07238@odiug.digicool.com> > > I noticed that these days __file__ attributes of modules are case > > normalized (ie. lowercased on case insensitive file systems), or > > at least the directory part. Then I noticed that this is caused by > > the fact that all sys.path entries are case normalized. It turns > > out that site.py does this, in a function called makepath(), added > > by Fred about 8 months ago. > > > > I think this is wrong: we should always try to *preserve* case. > > There is an added problem with the makepath() stuff that I hadn't > reported here yet: it has broken MacPython on some non-western > machines. Specifically I've had reports of people running a Japanese > MacOS that things will break if they run Python from a pathname that > has any non-7-bit-ascii characters in the name. Apparently normcase > normalizes more than just ascii upper/lowercase letters. > > And aside from that I fully agree with Just: seeing a stacktrace > with all lowercase filenames is _very_ disconcerting. > > I would disable the case-normalization for MacPython, except that I > don't know whether it actually has a function. With MacPython's way > of finding the initial sys.path contents we don't have the > Windows-Python problem that we add the same directory 5 times (once > in uppercase, once in lowercase, once in mixed case, once in > mixed-case with / for \, etc:-), so if this is what it's trying to > solve we can take it out easily. I can't think of any function besides the attempt to avoid duplicates. I think that even on Windows, retaining case makes sense. I think that there's a way to avoid duplicates without case-folding everything. (E.g. use a case-folding comparison instead.) I wonder if maybe path entries should be normpath'd though? I'll leave it to Fred, Jack or Just to fix this. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@digicool.com Mon Jun 25 14:41:46 2001 From: guido@digicool.com (Guido van Rossum) Date: Mon, 25 Jun 2001 09:41:46 -0400 Subject: [Python-Dev] IPv6 and Windows In-Reply-To: Your message of "Sun, 24 Jun 2001 19:48:03 +0200." <200106241748.f5OHm3001480@mira.informatik.hu-berlin.de> References: <200106241748.f5OHm3001480@mira.informatik.hu-berlin.de> Message-ID: <200106251341.f5PDfkg07283@odiug.digicool.com> > The problem is that the library patches (httplib, ftplib, etc) do use > getaddrinfo to find out how to contact a remote system, which is the > right thing to do IMO. So even if the IPv6 support can be activated > only if desired, getaddrinfo absolutely has to work. Yes, but in an IPv4-only environment it would be super trivial to implement, right? --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@digicool.com Mon Jun 25 14:42:18 2001 From: guido@digicool.com (Guido van Rossum) Date: Mon, 25 Jun 2001 09:42:18 -0400 Subject: [Python-Dev] IPv6 and Windows In-Reply-To: Your message of "Sun, 24 Jun 2001 20:08:13 +0200." 
<3B362C8D.D3AECE3C@lemburg.com> References: <200106240820.f5O8KVN01435@mira.informatik.hu-berlin.de> <3B35C9AB.2D1D2185@lemburg.com> <200106241700.f5OH0hm01021@mira.informatik.hu-berlin.de> <3B362C8D.D3AECE3C@lemburg.com> Message-ID: <200106251342.f5PDgI107298@odiug.digicool.com> > > # If you redistribute the SOFTWARE and/or your Source Modifications, > > # or any portion thereof as provided above, you agree: (i) to > > # distribute the SOFTWARE only in conjunction with, and as part of, > > # your Source Modifications which add significant functionality to the > > # SOFTWARE; (ii) to distribute the SOFTWARE and/or your Source > > # Modifications solely as part of your research and not in any > > # commercial product; (iii) the SOFTWARE and/or your Source > > # Modifications will not be distributed for profit; (iv) to retain all > > # branding, copyright and trademark notices included with the SOFTWARE > > # and include a copy of this EULA with any distribution of the > > # SOFTWARE, or any portion thereof; and (v) to indemnify, hold > > # harmless, and defend Microsoft from and against any claims or > > # lawsuits, including attorneys' fees, that arise or result from > > # the use or distribution of your Source Modifications. > > > > I don't know whether this is acceptable or not. > > Most likely not: there are lots of commercial Python users out there > who wouldn't like these clauses at all... we'd also lose the GPL > compatibility. Don't even *think* about using code with that license. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@digicool.com Mon Jun 25 14:43:04 2001 From: guido@digicool.com (Guido van Rossum) Date: Mon, 25 Jun 2001 09:43:04 -0400 Subject: [Python-Dev] os.path.normcase() in site.py In-Reply-To: Your message of "Mon, 25 Jun 2001 12:28:08 +0200." <20010625102809.42357303182@snelboot.oratrix.nl> References: <20010625102809.42357303182@snelboot.oratrix.nl> Message-ID: <200106251343.f5PDh4907304@odiug.digicool.com> > > I noticed that these days __file__ attributes of modules are case > > normalized (ie. lowercased on case insensitive file systems), or > > at least the directory part. Then I noticed that this is caused by > > the fact that all sys.path entries are case normalized. It turns > > out that site.py does this, in a function called makepath(), added > > by Fred about 8 months ago. > > > > I think this is wrong: we should always try to *preserve* case. > > There is an added problem with the makepath() stuff that I hadn't > reported here yet: it has broken MacPython on some non-western > machines. Specifically I've had reports of people running a Japanese > MacOS that things will break if they run Python from a pathname that > has any non-7-bit-ascii characters in the name. Apparently normcase > normalizes more than just ascii upper/lowercase letters. > > And aside from that I fully agree with Just: seeing a stacktrace > with all lowercase filenames is _very_ disconcerting. > > I would disable the case-normalization for MacPython, except that I > don't know whether it actually has a function. With MacPython's way > of finding the initial sys.path contents we don't have the > Windows-Python problem that we add the same directory 5 times (once > in uppercase, once in lowercase, once in mixed case, once in > mixed-case with / for \, etc:-), so if this is what it's trying to > solve we can take it out easily. I can't think of any function besides the attempt to avoid duplicates. I think that even on Windows, retaining case makes sense. 
I think that there's a way to avoid duplicates without case-folding everything. (E.g. use a case-folding comparison instead.) I wonder if maybe path entries should be normpath'd though? I'll leave it to Fred, Jack or Just to fix this. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@digicool.com Mon Jun 25 14:43:25 2001 From: guido@digicool.com (Guido van Rossum) Date: Mon, 25 Jun 2001 09:43:25 -0400 Subject: [Python-Dev] make static In-Reply-To: Your message of "Mon, 25 Jun 2001 11:43:33 +0200." <200106250943.LAA24576@pandora.informatik.hu-berlin.de> References: <200106250943.LAA24576@pandora.informatik.hu-berlin.de> Message-ID: <200106251343.f5PDhQ407309@odiug.digicool.com> > There is a bug report on SF that 'make static' fails for a > Makefile.pre.in extension, see > > http://sourceforge.net/tracker/index.php?func=detail&aid=435446&group_id=5470&atid=105470 > > Is that process still supported? Unless I'm mistaken, this is > complicated by the fact that Makefile.pre.in packages use the > Makefile.pre.in that comes with the package, not the one that comes > with the Python installation. > > Any insights welcome, > > Martin As long as it works, it works. I don't think there's a reason to spend more than absolutely minimal time trying to keep it working though -- we're trying to encourage everybody to migrate towards distutils. So (without having seen the SF report) I'd say "tough luck". --Guido van Rossum (home page: http://www.python.org/~guido/) From skip@pobox.com (Skip Montanaro) Mon Jun 25 14:50:31 2001 From: skip@pobox.com (Skip Montanaro) Date: Mon, 25 Jun 2001 08:50:31 -0500 Subject: [Python-Dev] xrange vs generators Message-ID: <15159.16807.480121.637386@beluga.mojam.com> With generators in the language, should xrange be deprecated? Skip From just@letterror.com Mon Jun 25 15:05:43 2001 From: just@letterror.com (Just van Rossum) Date: Mon, 25 Jun 2001 16:05:43 +0200 Subject: [Python-Dev] os.path.normcase() in site.py In-Reply-To: <200106251343.f5PDh4907304@odiug.digicool.com> Message-ID: <20010625160545-r01010600-e232a14e@213.84.27.177> Guido van Rossum wrote: > I can't think of any function besides the attempt to avoid duplicates. > > I think that even on Windows, retaining case makes sense. > > I think that there's a way to avoid duplicates without case-folding > everything. (E.g. use a case-folding comparison instead.) > > I wonder if maybe path entries should be normpath'd though? They are already; they go through abspath(), which calls normpath(). > I'll leave it to Fred, Jack or Just to fix this. If it were up to me, I'd simply remove the normcase() call from makepath(). Just From arigo@ulb.ac.be Mon Jun 25 14:08:52 2001 From: arigo@ulb.ac.be (Armin Rigo) Date: Mon, 25 Jun 2001 15:08:52 +0200 Subject: [Python-Dev] Python Specializing Compiler In-Reply-To: <200106221259.OAA02519@core.inf.ethz.ch> Message-ID: <4.3.1.0.20010625134824.00abde60@127.0.0.1> Hello everybody, A note about what I have in mind about Psyco... Type-sets are independent from memory representation. In other words, it is not because two variables can take the same set of values that the data is necessarily encoded in the same way in memory. In particular, I believe we won't need to change the way the current Python interpreter encodes data. For example, instances currently have a dictionary of attributes and no "fixed slots", but this is not a problem for Psyco, which can encode instances in better ways (e.g.
as a C struct) as long as it is only accessed by Psyco-compiled Python code and no "legacy" code. This approach also allows Psyco to completely remove the overhead of creating bound method objects and frame objects; both are generally temporary, and so during their whole lifetime they can be represented much more efficiently in memory. For frame objects it should be clear (we probably need no frame at all as long as no exception exits the current procedure, and even in this case it could be optimized). For method objects we use "memory sharing", a technique already applied in the current Psyco. More precisely, if some (immutable) data is found at some memory location (or machine register) and Python code says it should be duplicated, we need not duplicate it at all; we can just consider that the copy is at the same location as the original. For method objects it means the following: suppose you have an instance "xyz" and query its "foo()" method. Suppose that you can (at some time) be sure that, because of the class of "xyz", "xyz.foo" will always be the Python function "f". Then the method object's representation can be simplified: all it needs to store in memory is a pointer to "xyz", because "f" is a constant part. Now a single pointer to the "xyz" instance is exactly the same memory format as the original "xyz" variable, so that this particular representation of a bound method object can share the original "xyz" pointer. No actual machine code is produced; Psyco simply notes that both "xyz" and "xyz.foo" are represented at the same location, although "xyz" represents an instance with the given pointer, and "xyz.foo" represents the "f" function with its first argument bound to the given pointer. According to est@hyperreal.org, method and frame objects each represent 20% of the execution time... (Est, on which kind of machine did you get Psyco run the sample code 5 times faster !? It's only 2 times faster on a modern Pentium...) A bientôt, Armin. From arigo@ulb.ac.be Mon Jun 25 14:45:20 2001 From: arigo@ulb.ac.be (Armin Rigo) Date: Mon, 25 Jun 2001 15:45:20 +0200 Subject: [Python-Dev] Python Specializing Compiler In-Reply-To: <200106221259.OAA02519@core.inf.ethz.ch> Message-ID: <4.3.1.0.20010625150819.00aa5220@127.0.0.1> Hello, At 14:59 22.06.2001 +0200, Samuele Pedroni wrote: >*: some possible useful hooks would be: >- minimal profiling support in order to specialize only things called often >- feedback for dynamic changing of methods, class hierarchy, ... if we want >to optimize method lookup (which would make sense) >- a mixed fixed slots/dict layout for instances. There is one point that you didn't mention, which I believe is important: how to handle global/builtin variables. First, a few words about the current Python semantics. * I am sorry if what follows has already been discussed; I am raising the question again because it might be important for Psyco. If you feel this should better be a PEP please just tell me so. * Complete lexical scoping was recently added, implemented with "free" and "cell" variables. These are only used for functions defined inside of other functions; top-level functions use the opcode LOAD_GLOBAL for all non-local variables. LOAD_GLOBAL performs one or two dictionary look-ups (two if the variable is built-in). For simple built-ins like "len" this might be expensive (has someone measured such costs ?).
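[The lookup cost Armin asks about here is visible directly in the bytecode. A small experiment with the dis module -- the function names are arbitrary -- shows a builtin reference compiling to LOAD_GLOBAL, resolved through the globals dict and then __builtins__ on every call, while a "free" variable compiles to LOAD_DEREF, a direct cell fetch:]

    from __future__ import nested_scopes   # needed on 2.1; default later
    import dis

    def outer():
        n = 1
        def inner(x):
            # "len" is a builtin: LOAD_GLOBAL, one or two dict lookups
            # per call.  "n" is a free variable: LOAD_DEREF, no dicts.
            return len(x) + n
        return inner

    dis.dis(outer())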
I suggest generalizing the compile-time lexical scoping rules. Let's compile all functions' non-local variables (top-level and others) as "free" variables. This means the corresponding module's global variables must be "cell" variables. This is just what we would get if the module's code was one big function enclosing the definition of all the other functions. Next, the variables not defined in the module (the built-ins) are "free" variables of the module, and the built-in module provides "cell" variables for them. Remember that "free" and "cell" variables are linked together when the function (or module in this case) is defined (for functions, when "def" is executed; for modules, it would be at load-time). Benefit: not a single dictionary look-up any more; uniformity of treatment. Potential code break: global variables shadowing built-ins would behave like local variables shadowing globals, i.e. the mere presence of a global "xyz=..." would forever hide the "xyz" built-in from the module, even before the assignment or after a "del xyz". (cf. UnboundLocalError.) To think about: what the "global" keyword would mean in this context. Implementation problems: if we want to keep the module's dictionary of global variables (and we certainly do) it would require changes to the dictionary implementation (or the creation of a different kind of dictionary). One solution is to automatically dereference cell objects and raise exceptions upon reading empty cells. Another solution is to turn dictionaries into collections of objects that all behave like cell objects (so that if "d" is any dictionary, something like "d.ref(key)" would let us get a cell object which could be read or written later to actually get or set the value associated with "key", and "d[key]" would mean "d.ref(key).cell_ref"). Well, these are just proposals; they might not be a good solution. Why it is related to Psyco: the current treatment of globals/builtins makes it hard for Psyco to statically tell what function we are calling when it sees e.g. "len(a)" in the code. We would at least need some help from the interpreter; at least hooks called when the module's globals() dictionary change. The above proposal might provide a more uniform solution. Thanks for your attention. Armin. From guido@digicool.com Mon Jun 25 15:26:08 2001 From: guido@digicool.com (Guido van Rossum) Date: Mon, 25 Jun 2001 10:26:08 -0400 Subject: [Python-Dev] xrange vs generators In-Reply-To: Your message of "Mon, 25 Jun 2001 08:50:31 CDT." <15159.16807.480121.637386@beluga.mojam.com> References: <15159.16807.480121.637386@beluga.mojam.com> Message-ID: <200106251426.f5PEQ8907629@odiug.digicool.com> > With generators in the language, should xrange be deprecated? > > Skip No, but maybe xrange() should be changed to return an iterator. E.g. something like this:

    def xrange(start, stop, step):
        while start < stop:
            yield start
            start += step

but with the appropriate defaults, and reversal of the test if step < 0, and an error if step == 0, and type checks enforcing ints (or long ints!), and implemented in C. :-) Although xrange() objects currently support some sequence algebra, that is mostly bogus and I don't think anyone in their right mind uses it. --Guido van Rossum (home page: http://www.python.org/~guido/)
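[Filled out with the defaults and checks Guido lists, the sketch might read as below -- purely illustrative, renamed myxrange so it doesn't shadow the builtin, and using the 2.2-era future-import for generators. Note that, being a generator, its argument checks only fire on the first iteration rather than at call time.]

    from __future__ import generators

    def myxrange(start, stop=None, step=1):
        # A single argument means myxrange(stop), as with the builtin.
        if stop is None:
            start, stop = 0, start
        for x in (start, stop, step):
            if type(x) not in (type(0), type(0L)):
                raise TypeError, "integer arguments required"
        if step == 0:
            raise ValueError, "step must not be zero"
        if step > 0:
            while start < stop:
                yield start
                start = start + step
        else:
            while start > stop:
                yield start
                start = start + step

    # e.g. list(myxrange(10, 0, -3)) -> [10, 7, 4, 1]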
From thomas.heller@ion-tof.com Mon Jun 25 15:37:31 2001 From: thomas.heller@ion-tof.com (Thomas Heller) Date: Mon, 25 Jun 2001 16:37:31 +0200 Subject: [Python-Dev] xrange vs generators References: <15159.16807.480121.637386@beluga.mojam.com> <200106251426.f5PEQ8907629@odiug.digicool.com> Message-ID: <006301c0fd84$5dcefad0$e000a8c0@thomasnotebook> > > With generators in the language, should xrange be deprecated? > > > > Skip > > No, but maybe xrange() should be changed to return an iterator. > E.g. something like this:
>
>     def xrange(start, stop, step):
>         while start < stop:
>             yield start
>             start += step
>
> but with the appropriate defaults, and reversal of the test if step < > 0, and an error if step == 0, and type checks enforcing ints (or long > ints!), and implemented in C. :-) > > Although xrange() objects currently support some sequence algebra, > that is mostly bogus and I don't think anyone in their right mind uses > it. I _was_ using xrange as sets representing (potentially large) ranges of ints. Example:

    positive = xrange(1, sys.maxint)

    if num in positive:
        ...

I didn't follow the iterators discussion: would this continue to work? Thomas From esr@thyrsus.com Mon Jun 25 15:41:34 2001 From: esr@thyrsus.com (Eric S. Raymond) Date: Mon, 25 Jun 2001 10:41:34 -0400 Subject: [Python-Dev] xrange vs generators In-Reply-To: <200106251426.f5PEQ8907629@odiug.digicool.com>; from guido@digicool.com on Mon, Jun 25, 2001 at 10:26:08AM -0400 References: <15159.16807.480121.637386@beluga.mojam.com> <200106251426.f5PEQ8907629@odiug.digicool.com> Message-ID: <20010625104134.B30559@thyrsus.com> Guido van Rossum <guido@digicool.com>: > Although xrange() objects currently support some sequence algebra, > that is mostly bogus and I don't think anyone in their right mind uses > it. I agree. As long as we make those cases fail loudly, I see no objection to dropping support for them. -- Eric S. Raymond Americans have the will to resist because you have weapons. If you don't have a gun, freedom of speech has no power. -- Yoshimi Ishikawa, Japanese author, in the LA Times 15 Oct 1992 From barry@digicool.com Mon Jun 25 15:38:20 2001 From: barry@digicool.com (Barry A. Warsaw) Date: Mon, 25 Jun 2001 10:38:20 -0400 Subject: [Python-Dev] os.path.normcase() in site.py References: <20010625103538.T8098@xs4all.nl> Message-ID: <15159.19676.727068.217548@anthem.wooz.org> >>>>> "TW" == Thomas Wouters writes: TW> Okay... how about someone takes Guido out for dinner and feeds TW> him way too many bottles of wine and ferrets to show him such TW> things do not necessarily lead to blood feuds ? Maybe take TW> along some psychotropic drugs and a halfway decent hypnotist TW> for safety's measure. Don't forget the dentist, proctologist, and a trepanist. Actually, if you can find a holeologist it would be much more efficient (my cousin Neil, a.k.a. Dr. Finger, a.k.a. Dr Watumpka would be ideal, but he's studying in Dortmund these days). TW> Meanwhile Barry subscribes Just to python-dev I'd be glad to, and I won't even divulge the fact that python-dev is only ostensibly a closed, insular mailing list these days. TW> and you or someone else with the pickpocket skills to get at TW> the keys for the time machine No pickpocketing skill necessary. Guido leaves the keys in a small safebox magnetically adhered underneath the running boards. Just be sure to ground yourself first (learned the hard way)!
TW> (come on, fess up, you all practiced) make sure Guido can't TW> get at it, lest he try and make up with Just in the past in TW> his 'suggestable' state... Better change the Mailman admin TW> password too, just to be on the safe side. I've tried that many times, but I suspect Guido has a Pybot hermetically linked to the time machine which "instantly" recedes several seconds into the past each time I change it, only to change it back. TW> Or if that has no chance of a prayer in hell of working, I can TW> give Just a secret xs4all.nl address (since he has an XS4ALL TW> account nowadays, that shouldn't be a problem) and we just TW> never tell Guido that py-dev@xs4all.nl is really Just ;) You realize it's way too "late" for that, don't you? The time machine works just as well in the forward direction as in the past direction, and long before he left the comfy environs of Amsterdam to brave it out in the harsh, unforgiving wilderness of Washington, he mapped out every moment of young Wouters' life. Why do you think I've worn aluminum foil underwear for the past 30 years? Trust me, it's not for the feeling of freshness and confidence it provides (okay, only partially). >> It's very sad, really -- but also wonderfully Dutch. TW> No, it would only be wonderfully Dutch if either brother was TW> German or Belgian in some way, or of royal blood and married TW> to the wrong type of christian sect (Protestant or Catholic -- TW> I keep forgetting which is which.) It would also be wonderfully American, but only if Just had trivially wronged Guido years ago by eating one of his nabisco cookies or some such. -Barry From guido@digicool.com Mon Jun 25 15:47:50 2001 From: guido@digicool.com (Guido van Rossum) Date: Mon, 25 Jun 2001 10:47:50 -0400 Subject: [Python-Dev] xrange vs generators In-Reply-To: Your message of "Mon, 25 Jun 2001 16:37:31 +0200." <006301c0fd84$5dcefad0$e000a8c0@thomasnotebook> References: <15159.16807.480121.637386@beluga.mojam.com> <200106251426.f5PEQ8907629@odiug.digicool.com> <006301c0fd84$5dcefad0$e000a8c0@thomasnotebook> Message-ID: <200106251447.f5PEloH07777@odiug.digicool.com> [me] > > Although xrange() objects currently support some sequence algebra, > > that is mostly bogus and I don't think anyone in their right mind uses > > it. [theller] > I _was_ using xrange as sets representing (potentially large) > ranges of ints. > Example: > > positive = xrange(1, sys.maxint) > > if num in positive: > ... > > I didn't follow the iterators discussion: would this > continue to work? No, it would break. And I see another breakage too:

    r = xrange(10)
    for i in r:
        for j in r:
            print i, j

would not do the right thing if xrange() returned an iterator (because iterators can only be used once). This is too bad; I really wish that xrange() could die or be limited entirely to for loops. I wonder if we could put warnings on xrange() uses beyond the most basic...? --Guido van Rossum (home page: http://www.python.org/~guido/) From Samuele Pedroni Mon Jun 25 15:51:16 2001 From: Samuele Pedroni (Samuele Pedroni) Date: Mon, 25 Jun 2001 16:51:16 +0200 (MET DST) Subject: [Python-Dev] Python Specializing Compiler Message-ID: <200106251451.QAA17756@core.inf.ethz.ch> Hi. [Armin Rigo] ... > Why it is related to Psyco: the current treatment of globals/builtins makes > it hard for Psyco to statically tell what function we are calling when it > sees e.g. "len(a)" in the code. We would at least need some help from the > interpreter; at least hooks called when the module's globals() dictionary > change.
> The above proposal might provide a more uniform solution. FYI, a different proposal for opt. globals access by Jeremy Hylton. It seems it would break fewer things ... don't know whether it can be as useful for Psyco: http://mail.python.org/pipermail/python-dev/2001-May/014995.html In any case I think Psyco will need notification support from the interpreter about dynamic changes to things that Psyco honestly assumes to be invariant in order to achieve performance. regards, Samuele Pedroni. From thomas.heller@ion-tof.com Mon Jun 25 16:05:09 2001 From: thomas.heller@ion-tof.com (Thomas Heller) Date: Mon, 25 Jun 2001 17:05:09 +0200 Subject: [Python-Dev] xrange vs generators References: <15159.16807.480121.637386@beluga.mojam.com> <200106251426.f5PEQ8907629@odiug.digicool.com> <006301c0fd84$5dcefad0$e000a8c0@thomasnotebook> <200106251447.f5PEloH07777@odiug.digicool.com> Message-ID: <00e001c0fd88$3a532140$e000a8c0@thomasnotebook> > [theller] > > I _was_ using xrange as sets representing (potentially large) > > ranges of ints. > > Example: > > > > positive = xrange(1, sys.maxint) > > > > if num in positive: > > ... > > > > I didn't follow the iterators discussion: would this > > continue to work? > > No, it would break. Since there was an off-by-one bug for 'if num in xrange()' in Python 2.0, my code has already been rewritten. Thomas From Samuele Pedroni Mon Jun 25 16:04:45 2001 From: Samuele Pedroni (Samuele Pedroni) Date: Mon, 25 Jun 2001 17:04:45 +0200 (MET DST) Subject: [Python-Dev] Python Specializing Compiler Message-ID: <200106251504.RAA18642@core.inf.ethz.ch> Hi. [Armin Rigo] > In particular, I believe we won't need to change the way the current Python > interpreter encodes data. For example, instances currently have a > dictionary of attributes and no "fixed slots", but this is not a problem > for Psyco, which can encode instances in better ways (e.g. as a C struct) > as long as it is only accessed by Psyco-compiled Python code and no > "legacy" code. This makes sense, but I'm asking if it is affordable to have all code executed (if we aim for usage-transparency) through Psyco-compiled code (memory foot-print, compilation vs. execution trade-offs for rarely executed code). Otherwise in a mixed execution context we would pay for conversions. I can see how a dynamic compiler can deal with methods together with the interpreter that notifies when a dynamic change to hierarchy, method defs can potentially invalidate compiled code. I see more problems with instance data slots, because there are no strong hints in the code about which are the "official" slots of a class, and undisciplined code can treat instances just as dicts. regards, Samuele Pedroni. From fdrake@acm.org Mon Jun 25 16:13:31 2001 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Mon, 25 Jun 2001 11:13:31 -0400 (EDT) Subject: [Python-Dev] os.path.normcase() in site.py In-Reply-To: <200106251343.f5PDh4907304@odiug.digicool.com> References: <20010625102809.42357303182@snelboot.oratrix.nl> <200106251343.f5PDh4907304@odiug.digicool.com> Message-ID: <15159.21787.913782.751691@cj42289-a.reston1.va.home.com> Guido van Rossum writes: > I can't think of any function besides the attempt to avoid duplicates. There were two reasons for adding this code:

 1. Avoid duplicates (speeds imports if there are duplicates and the
    modules are found on an entry after the dupes).

 2. Avoid breakage when a script uses os.chdir(). This is probably
    unusual for large applications, but fairly common for little admin
    helper scripts.
> I think that even on Windows, retaining case makes sense. > > I think that there's a way to avoid duplicates without case-folding > everything. (E.g. use a case-folding comparison instead.) > > I wonder if maybe path entries should be normpath'd though? > > I'll leave it to Fred, Jack or Just to fix this. I certainly agree that this can be improved; if Jack or Just would like to assign it to me on SourceForge, I'd be glad to fix it. -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From tim@digicool.com Mon Jun 25 16:39:47 2001 From: tim@digicool.com (Tim Peters) Date: Mon, 25 Jun 2001 11:39:47 -0400 Subject: [Python-Dev] xrange vs generators In-Reply-To: <200106251447.f5PEloH07777@odiug.digicool.com> Message-ID: [Thomas Heller] > I _was_ using xrange as sets representing (potentially large) > ranges of ints. > Example: > > positive = xrange(1, sys.maxint) > > if num in positive: > ... > I didn't follow the iterators discussion: would this > continue to work? [Guido] > No, it would break. "x in y" works with any iterable y in 2.2, incl. generators. So e.g.

    >>> def xr(n):
    ...     i = 0
    ...     while i < n:
    ...         yield i
    ...         i += 1
    ...
    >>> 1 in xr(10)
    1
    >>> 9 in xr(10)
    1
    >>> 10 in xr(10)
    0
    >>>

However, there's no __contains__ method here, so in the last case it actually did 10 compares. 0 in xr(sys.maxint) is very quick, but I'm still waiting for -1 in xr(sys.maxint) to complete . > And I see another breakage too: This would also apply to Thomas's example of giving a name to an xrange object, if implemented via generator:

    >>> small = xr(5)
    >>> 2 in small
    1
    >>> 2 in small
    0
    >>>

> ... > This is too bad; I really wish that xrange() could die or be limited > entirely to for loops. I wonder if we could put warnings on xrange() > uses beyond the most basic...? Hmm. I'd rather not endure the resulting complaints without a strong rationale for deprecating it. One that strikes close to my heart: there's more code in 2.2 to support xrange than there is to support generators! But users don't care about that. From thomas@xs4all.net Mon Jun 25 16:42:12 2001 From: thomas@xs4all.net (Thomas Wouters) Date: Mon, 25 Jun 2001 17:42:12 +0200 Subject: [Python-Dev] xrange vs generators In-Reply-To: <200106251447.f5PEloH07777@odiug.digicool.com> References: <15159.16807.480121.637386@beluga.mojam.com> <200106251426.f5PEQ8907629@odiug.digicool.com> <006301c0fd84$5dcefad0$e000a8c0@thomasnotebook> <200106251447.f5PEloH07777@odiug.digicool.com> Message-ID: <20010625174211.U8098@xs4all.nl> On Mon, Jun 25, 2001 at 10:47:50AM -0400, Guido van Rossum wrote: [ xrange can't be changed into a generator ] > This is too bad; I really wish that xrange() could die or be limited > entirely to for loops. I wonder if we could put warnings on xrange() > uses beyond the most basic...? Why do we want to do this ? xrange() is still exactly what it was: an object that pretends to be a list of integers. Besides being useful for those who work a lot with ranges, it's a wonderful example of what you can do with Python (even if it isn't actually written in Python :-) I see less reason to deprecate xrange than to deprecate the gopherlib, wave/aifc/audiodev, mhlib, netrc and/or robotparser modules. -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread!
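[Both problems Tim demonstrates -- the generator being one-shot, and "in" walking the whole range -- go away if the object is a class that hands out a fresh generator per loop and answers membership by arithmetic, much like the fresh-iterator idea Guido floats in the next message. A sketch only; the class name is invented and just the step-1 case is handled:]

    from __future__ import generators
    import sys

    class IntRange:
        # Pretends to be xrange(start, stop) with step 1, but is
        # reusable and has a constant-time "in" test.
        def __init__(self, start, stop):
            self.start, self.stop = start, stop

        def __iter__(self):
            # A fresh iterator for every loop, so nested for-loops and
            # repeated membership tests each see the whole range.
            i = self.start
            while i < self.stop:
                yield i
                i = i + 1

        def __contains__(self, num):
            # No iteration at all: -1 in IntRange(1, sys.maxint)
            # answers immediately, unlike the plain generator.
            return self.start <= num < self.stop

    positive = IntRange(1, sys.maxint)
    print 5 in positive       # prints 1
    print -1 in positive      # prints 0, immediately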
References: Message-ID: <200106251607.f5PG7iq08192@odiug.digicool.com>

> Hmm. I'd rather not endure the resulting complaints without a
> strong rationale for deprecating it. One that strikes close to my
> heart: there's more code in 2.2 to support xrange than there is to
> support generators! But users don't care about that.

But I do, and historically this code has often been bug-ridden without anybody noticing -- so it's not like it's needed much.

I would suggest removing most of the fancy features of xrange(), in particular the slice, contains and repeat slots. A step further would be to remove getitem also, and add a tp_getiter slot instead -- returning not itself but a new iterator that iterates through the prescribed sequence.

We need a PEP for this. Anyone? Should be short and sweet.

--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido@digicool.com Mon Jun 25 17:11:10 2001 From: guido@digicool.com (Guido van Rossum) Date: Mon, 25 Jun 2001 12:11:10 -0400 Subject: [Python-Dev] xrange vs generators In-Reply-To: Your message of "Mon, 25 Jun 2001 17:42:12 +0200." <20010625174211.U8098@xs4all.nl> References: <15159.16807.480121.637386@beluga.mojam.com> <200106251426.f5PEQ8907629@odiug.digicool.com> <006301c0fd84$5dcefad0$e000a8c0@thomasnotebook> <200106251447.f5PEloH07777@odiug.digicool.com> <20010625174211.U8098@xs4all.nl> Message-ID: <200106251611.f5PGBA608205@odiug.digicool.com>

> [ xrange can't be changed into a generator ]
>
> > This is too bad; I really wish that xrange() could die or be limited
> > entirely to for loops. I wonder if we could put warnings on xrange()
> > uses beyond the most basic...?
>
> Why do we want to do this? xrange() is still exactly what it was: an object
> that pretends to be a list of integers. Besides being useful for those who
> work a lot with ranges, it's a wonderful example of what you can do with
> Python (even if it isn't actually written in Python :-)

There is exactly *one* idiomatic use of xrange():

    for i in xrange(...): ...

All other operations supported by the xrange object are very rarely used, and historically their implementation has had obvious bugs that no-one noticed for years.

> I see less reason to deprecate xrange than to deprecate the gopherlib,
> wave/aifc/audiodev, mhlib, netrc and/or robotparser modules.

Those are useful application-area libraries for some folks. The idiomatic xrange() object is useful too. But the advanced features of xrange() are an example of code bloat.

--Guido van Rossum (home page: http://www.python.org/~guido/)

From Greg.Wilson@baltimore.com Mon Jun 25 17:25:33 2001 From: Greg.Wilson@baltimore.com (Greg Wilson) Date: Mon, 25 Jun 2001 12:25:33 -0400 Subject: [Python-Dev] RE: Python-Dev digest, Vol 1 #1437 - 13 msgs Message-ID: <930BBCA4CEBBD411BE6500508BB3328F2E27F1@nsamcanms1.ca.baltimore.com>

> Guido:
> Since you have already obtained the same speedup with your approach, I
> think there's great promise. Count on sending in a paper for the next
> Python conference!

Greg: "Doctor Dobb's Journal" would also be interested in an article. Who knows --- it might even be done before the ones on stackless, garbage collection, Zope acquisition, and generators... :-)

Greg
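[Picking up Guido's stripped-down xrange suggestion above: a rough pure-Python sketch of the bare-minimum object -- illustrative only, not the C implementation, class name made up, and it assumes step > 0. Iteration comes for free from the default __getitem__-based sequence iterator:]

    class xrange2:
        # Bare-minimum sequence behavior: x[i], len(x), repr(x).
        def __init__(self, start, stop=None, step=1):
            if stop is None:
                start, stop = 0, start
            self.start, self.stop, self.step = start, stop, step
        def __len__(self):
            return max(0, (self.stop - self.start + self.step - 1) / self.step)
        def __getitem__(self, i):
            if not 0 <= i < len(self):
                raise IndexError, i
            return self.start + i * self.step
        def __repr__(self):
            return 'xrange2(%d, %d, %d)' % (self.start, self.stop, self.step)

    for i in xrange2(3):    # the one idiomatic use still works
        print i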
From just@letterror.com Mon Jun 25 17:47:30 2001 From: just@letterror.com (Just van Rossum) Date: Mon, 25 Jun 2001 18:47:30 +0200 Subject: [Python-Dev] os.path.normcase() in site.py In-Reply-To: <15159.21787.913782.751691@cj42289-a.reston1.va.home.com> Message-ID: <20010625184734-r01010600-dbd1c84a@213.84.27.177>

Guido van Rossum writes:
> I can't think of any function besides the attempt to avoid duplicates.

Fred L. Drake, Jr. wrote:
> There were two reasons for adding this code:
>
> 1. Avoid duplicates (speeds imports if there are duplicates and
>    the modules are found on an entry after the dupes).
>
> 2. Avoid breakage when a script uses os.chdir(). This is
>    probably unusual for large applications, but fairly common for
>    little admin helper scripts.

1) normcase(). Bad.
2) abspath(). Good.

I think #2 is a legitimate problem, but I'm not so sure of #1: is it really so common for sys.path to contain duplicates that it's worth worrying about at all?

> > I'll leave it to Fred, Jack or Just to fix this.
>
> I certainly agree that this can be improved; if Jack or Just would
> like to assign it to me on SourceForge, I'd be glad to fix it.

Here's my proposed fix:

Index: site.py
===================================================================
RCS file: /cvsroot/python/python/dist/src/Lib/site.py,v
retrieving revision 1.27
diff -c -3 -r1.27 site.py
*** site.py	2001/06/12 16:48:52	1.27
--- site.py	2001/06/25 16:42:33
***************
*** 67,73 ****
  def makepath(*paths):
      dir = os.path.join(*paths)
!     return os.path.normcase(os.path.abspath(dir))

  L = sys.modules.values()
  for m in L:
--- 67,73 ----
  def makepath(*paths):
      dir = os.path.join(*paths)
!     return os.path.abspath(dir)

  L = sys.modules.values()
  for m in L:

Just

From aahz@rahul.net Mon Jun 25 18:19:48 2001 From: aahz@rahul.net (Aahz Maruch) Date: Mon, 25 Jun 2001 10:19:48 -0700 (PDT) Subject: [Python-Dev] 2.1.1 vs. os.normcase() Message-ID: <20010625171948.D636399C80@waltz.rahul.net>

It's too late for 2.0.1, but should this bugfix go into 2.1.1?

(Just to be clear, this is the problem that Just reported with site.py calling os.normcase() in makepath().)

((I'm only asking about this bug in specific because we're getting down to the wire on 2.1.1 IIUC.))

-- --- Aahz (@pobox.com) Hugs and backrubs -- I break Rule 6 <*> http://www.rahul.net/aahz/ Androgynous poly kinky vanilla queer het Pythonista
I don't really mind a person having the last whine, but I do mind someone else having the last self-righteous whine.

From guido@digicool.com Mon Jun 25 19:06:02 2001 From: guido@digicool.com (Guido van Rossum) Date: Mon, 25 Jun 2001 14:06:02 -0400 Subject: [Python-Dev] 2.1.1 vs. os.normcase() In-Reply-To: Your message of "Mon, 25 Jun 2001 10:19:48 PDT."
<20010625171948.D636399C80@waltz.rahul.net> References: <20010625171948.D636399C80@waltz.rahul.net> Message-ID: <200106251806.f5PI62L08770@odiug.digicool.com>

> It's too late for 2.0.1, but should this bugfix go into 2.1.1?
>
> (Just to be clear, this is the problem that Just reported with site.py
> calling os.normcase() in makepath().)
>
> ((I'm only asking about this bug in specific because we're getting down
> to the wire on 2.1.1 IIUC.))

Unclear if it's purely a bugfix -- this could be considered a feature, but I don't know. What do others think?

--Guido van Rossum (home page: http://www.python.org/~guido/)

From tim@digicool.com Mon Jun 25 19:47:06 2001 From: tim@digicool.com (Tim Peters) Date: Mon, 25 Jun 2001 14:47:06 -0400 Subject: [Python-Dev] os.path.normcase() in site.py In-Reply-To: <20010625102809.42357303182@snelboot.oratrix.nl> Message-ID:

[Jack Jansen]
> ...
> With MacPython's way of finding the initial sys.path contents we
> don't have the Windows-Python problem that we add the same directory
> 5 times (once in uppercase, once in lowercase, once in mixed case,
> once in mixed-case with / for \, etc:-),

Happily, we don't have that problem on a stock Windows Python anymore:

C:\Python21>python
Python 2.1 (#15, Apr 16 2001, 18:25:49) [MSC 32 bit (Intel)] on win32
Type "copyright", "credits" or "license" for more information.
>>> import sys, pprint
>>> pprint.pprint(sys.path)
['',
 'c:\\python21',
 'c:\\python21\\dlls',
 'c:\\python21\\lib',
 'c:\\python21\\lib\\plat-win',
 'c:\\python21\\lib\\lib-tk']
>>>

OTOH, this is still Icky, because those don't match (wrt case) the names in the filesystem (e.g., just look at the initial prompt line: I was in Python21 when I ran this, not python21).

> so if this is what it's trying to solve we can take it out easily.

It's hard to believe Fred added code to solve a Windows problem <wink>; I don't know what it's trying to do.

From m.favas@per.dem.csiro.au Mon Jun 25 20:38:47 2001 From: m.favas@per.dem.csiro.au (Mark Favas) Date: Tue, 26 Jun 2001 03:38:47 +0800 Subject: [Python-Dev] Re: Problems with socketmodule (getnameinfo & getaddrinfo) References: <200106242139.f5OLdjR22560@mira.informatik.hu-berlin.de> <3B3666B9.335DA17E@per.dem.csiro.au> <200106250639.f5P6die01246@mira.informatik.hu-berlin.de> Message-ID: <3B379347.7E8D00EB@per.dem.csiro.au>

"Martin v. Loewis" wrote:
>
> > To get socketmodule.c to compile, I had to make a change to line 2963
> > so that the declaration of inet_pton matched the previous declaration on
> > line 220 (changing char *src to const char *src). Still have problems
> > though, due to the use of snprintf in getnameinfo.c:
>
> Ok, they are printing a single number into a 512 byte buffer; that is
> safe even with sprintf only, so I have just removed the snprintf call.
> Can you please try again?
>
> Thanks for your reports,
> Martin

No trouble... The current CVS compiles (with a warning), links, and runs. The warning given is:

cc: Warning: /home/gonzo1/mark/groucho1/mark/src/python/CVS/python/dist/src/Modules/getaddrinfo.c, line 407: In this statement, the referenced type of the pointer value "hostname" is const, but the referenced type of the target of this assignment is not.
(notconstqual)
    if (inet_pton(gai_afdl[i].a_af, hostname, pton)) {
------------------------------------------------^

which can be fixed by declaring the second argument to inet_pton as const char* instead of char* in the two occurrences of inet_pton in socketmodule.c

Cheers, Mark

-- Mark Favas - m.favas@per.dem.csiro.au CSIRO, Private Bag No 5, Wembley, Western Australia 6913, AUSTRALIA

From martin@loewis.home.cs.tu-berlin.de Tue Jun 26 00:08:00 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Tue, 26 Jun 2001 01:08:00 +0200 Subject: [Python-Dev] IPv6 and Windows In-Reply-To: <200106251341.f5PDfkg07283@odiug.digicool.com> (message from Guido van Rossum on Mon, 25 Jun 2001 09:41:46 -0400) References: <200106241748.f5OHm3001480@mira.informatik.hu-berlin.de> <200106251341.f5PDfkg07283@odiug.digicool.com> Message-ID: <200106252308.f5PN80701342@mira.informatik.hu-berlin.de>

> > The problem is that the library patches (httplib, ftplib, etc) do use
> > getaddrinfo to find out how to contact a remote system, which is the
> > right thing to do IMO. So even if the IPv6 support can be activated
> > only if desired, getaddrinfo absolutely has to work.
>
> Yes, but in an IPv4-only environment it would be super trivial to
> implement, right?

Right, and getaddrinfo.c/getnameinfo.c attempt such an implementation. They might attempt to get it "more right" than necessary, but still they are "pure C", in the sense that they don't rely on any libraries except for those available in a typical IPv4 sockets implementation.

At least that's the theory. It turns out that they've been using inet_pton and snprintf, which is probably because they have been mainly tested on BSD. I'm confident that we can reduce them to a "no funny library calls needed" minimum. If somebody wants to implement them anew from the ground up, only using what the socketmodule already uses, that would be fine as well. An actual review of the code for portability problems would also be helpful.

Regards, Martin

From greg@cosc.canterbury.ac.nz Tue Jun 26 05:32:05 2001 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Tue, 26 Jun 2001 16:32:05 +1200 (NZST) Subject: [Python-Dev] Python Specializing Compiler In-Reply-To: <200106251451.QAA17756@core.inf.ethz.ch> Message-ID: <200106260432.QAA04989@s454.cosc.canterbury.ac.nz>

Samuele Pedroni :
> a different proposal for opt. globals access
> by Jeremy Hylton. It seems it would break fewer things ...

I really like Jeremy's proposal. I've been having similar thoughts myself for quite a while.

Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+

From guido@digicool.com Tue Jun 26 15:57:37 2001 From: guido@digicool.com (Guido van Rossum) Date: Tue, 26 Jun 2001 10:57:37 -0400 Subject: [Python-Dev] Python Specializing Compiler In-Reply-To: Your message of "Tue, 26 Jun 2001 16:32:05 +1200." <200106260432.QAA04989@s454.cosc.canterbury.ac.nz> References: <200106260432.QAA04989@s454.cosc.canterbury.ac.nz> Message-ID: <200106261457.f5QEvbZ11007@odiug.digicool.com>

> Samuele Pedroni :
>
> > a different proposal for opt. globals access
> > by Jeremy Hylton. It seems it would break fewer things ...
>
> I really like Jeremy's proposal. I've been having similar
> thoughts myself for quite a while.
>
> Greg Ewing

Ditto.
Isn't this what I've been calling "low-hanging fruit" for ages? Apparently it's low but still out of reach. :-)

--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido@digicool.com Tue Jun 26 18:59:55 2001 From: guido@digicool.com (Guido van Rossum) Date: Tue, 26 Jun 2001 13:59:55 -0400 Subject: [Python-Dev] PEP 260: simplify xrange() Message-ID: <200106261759.f5QHxtH15045@odiug.digicool.com>

Here's another sweet and short PEP. What do folks think? Is xrange()'s complexity really worth having?

--Guido van Rossum (home page: http://www.python.org/~guido/)

PEP: 260
Title: Simplify xrange()
Version: $Revision: 1.1 $
Author: guido@python.org (Guido van Rossum)
Status: Draft
Type: Standards Track
Python-Version: 2.2
Created: 26-Jun-2001
Post-History: 26-Jun-2001

Abstract

    This PEP proposes to strip the xrange() object of some rarely
    used behavior like x[i:j] and x*n.

Problem

    The xrange() function has one idiomatic use:

        for i in xrange(...): ...

    However, the xrange() object has a bunch of rarely used behaviors
    that attempt to make it more sequence-like. These are so rarely
    used that historically they have had serious bugs (e.g. off-by-one
    errors) that went undetected for several releases.

    I claim that it's better to drop these unused features. This will
    simplify the implementation, testing, and documentation, and
    reduce maintenance and code size.

Proposed Solution

    I propose to strip the xrange() object to the bare minimum. The
    only retained sequence behaviors are x[i], len(x), and repr(x).
    In particular, these behaviors will be dropped:

        x[i:j] (slicing)
        x*n, n*x (sequence-repeat)
        cmp(x1, x2) (comparisons)
        i in x (containment test)
        x.tolist() method
        x.start, x.stop, x.step attributes

    By implementing a custom iterator type, we could speed up the
    common use, but this is optional (the default sequence iterator
    does just fine).

    I expect it will take at most an hour to rip it all out; another
    hour to reduce the test suite and documentation.

Scope

    This PEP only affects the xrange() built-in function.

Risks

    Somebody's code could be relying on the extended code, and this
    code would break. However, given that historically bugs in the
    extended code have gone undetected for so long, it's unlikely
    that much code is affected.

Copyright

    This document has been placed in the public domain.

Local Variables:
mode: indented-text
indent-tabs-mode: nil
End:

From fdrake@acm.org Tue Jun 26 21:01:41 2001 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Tue, 26 Jun 2001 16:01:41 -0400 (EDT) Subject: [Python-Dev] collecting timer resolution information... Message-ID: <15160.59941.295880.286167@cj42289-a.reston1.va.home.com>

I'd like people to run the attached C program and send the output to me. What this does is run the gettimeofday() and getrusage() functions until the time values change. The intent is to determine the quality of the available timing information. For example, on my Linux-Mandrake 7.2 installation with a stock 2.2.17 kernel, I get this:

timeofday: 1 (1 calls), rusage: 10000 (2465 calls)

Thanks!

-Fred

-- Fred L. Drake, Jr. PythonLabs at Digital Creations

From fdrake@acm.org Tue Jun 26 21:05:48 2001 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Tue, 26 Jun 2001 16:05:48 -0400 (EDT) Subject: [Python-Dev] collecting timer resolution information...
In-Reply-To: <15160.59941.295880.286167@cj42289-a.reston1.va.home.com> References: <15160.59941.295880.286167@cj42289-a.reston1.va.home.com> Message-ID: <15160.60188.806308.247566@cj42289-a.reston1.va.home.com>

--FghjlKmKF6
Content-Type: text/plain; charset=us-ascii
Content-Description: message body and .signature
Content-Transfer-Encoding: 7bit

Fred L. Drake, Jr. writes:
> I'd like people to run the attached C program and send the output to

OK, I've attached it this time. Sorry!

-Fred

-- Fred L. Drake, Jr. PythonLabs at Digital Creations

--FghjlKmKF6
Content-Type: text/plain
Content-Description: timer check program
Content-Disposition: inline; filename="observation.c"
Content-Transfer-Encoding: 7bit

#include <stdio.h>
#include <sys/time.h>
#include <sys/resource.h>

/*
 * This should be defined the same way Python defines it:
 *
 * Define if gettimeofday() does not have second (timezone) argument.
 * This is the case on Motorola V4 (R40V4.2)
 */
/* #undef GETTIMEOFDAY_NO_TZ */

#ifdef GETTIMEOFDAY_NO_TZ
#define GETTIMEOFDAY(ptv) gettimeofday(ptv)
#else
#define GETTIMEOFDAY(ptv) gettimeofday((ptv), (struct timezone *)NULL)
#endif

static void
calibrate(int report)
{
    unsigned long timeofday_diff = 0;
    unsigned long rusage_diff = 0;
    unsigned long timeofday_calls = 0;
    unsigned long rusage_calls = 0;
    struct rusage ru1, ru2;
    struct timeval tv1, tv2;

    GETTIMEOFDAY(&tv1);
    while (1) {
        GETTIMEOFDAY(&tv2);
        ++timeofday_calls;
        if (tv1.tv_sec != tv2.tv_sec || tv1.tv_usec != tv2.tv_usec)
            break;
    }
    if (tv1.tv_sec == tv2.tv_sec)
        timeofday_diff = tv2.tv_usec - tv1.tv_usec;
    else
        timeofday_diff = (1000000 - tv1.tv_usec) + tv2.tv_usec;

    getrusage(RUSAGE_SELF, &ru1);
    while (1) {
        getrusage(RUSAGE_SELF, &ru2);
        ++rusage_calls;
        if (ru1.ru_utime.tv_sec != ru2.ru_utime.tv_sec) {
            rusage_diff = ((1000000 - ru1.ru_utime.tv_usec)
                           + ru2.ru_utime.tv_usec);
            break;
        }
        else if (ru1.ru_utime.tv_usec != ru2.ru_utime.tv_usec) {
            rusage_diff = ru2.ru_utime.tv_usec - ru1.ru_utime.tv_usec;
            break;
        }
        else if (ru1.ru_stime.tv_sec != ru2.ru_stime.tv_sec) {
            rusage_diff = ((1000000 - ru1.ru_stime.tv_usec)
                           + ru2.ru_stime.tv_usec);
            break;
        }
        else if (ru1.ru_stime.tv_usec != ru2.ru_stime.tv_usec) {
            rusage_diff = ru2.ru_stime.tv_usec - ru1.ru_stime.tv_usec;
            break;
        }
    }
    if (report)
        printf("timeofday: %lu (%lu calls), rusage: %lu (%lu calls)\n",
               timeofday_diff, timeofday_calls, rusage_diff, rusage_calls);
}

int
main(int argc, char *argv[])
{
    calibrate(0);
    calibrate(0);
    calibrate(1);
    return 0;
}

--FghjlKmKF6--

From gward@python.net Tue Jun 26 21:10:09 2001 From: gward@python.net (Greg Ward) Date: Tue, 26 Jun 2001 16:10:09 -0400 Subject: [Python-Dev] make static In-Reply-To: <200106251340.f5PDeAO07244@odiug.digicool.com>; from guido@digicool.com on Mon, Jun 25, 2001 at 09:40:10AM -0400 References: <200106250943.LAA24576@pandora.informatik.hu-berlin.de> <200106251340.f5PDeAO07244@odiug.digicool.com> Message-ID: <20010626161009.B2820@gerg.ca>

On 25 June 2001, Guido van Rossum said:
> As long as it works, it works. I don't think there's a reason to
> spend more than absolutely minimal time trying to keep it working
> though -- we're trying to encourage everybody to migrate towards
> distutils. So (without having seen the SF report) I'd say "tough
> luck".

The catch is that I never got around to implementing statically building a new interpreter via the Distutils, so (for now) Makefile.pre.in is the only way to do this. ;-( (Unless someone added it to the Distutils while I wasn't looking, which wouldn't be hard since I haven't looked in, ummm, six months or so...)
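[For contrast with the Makefile.pre.in route Greg mentions: building an extension *dynamically* through the Distutils takes only a setup.py like the sketch below; the module name "spam" and the file spammodule.c are invented for illustration. Static linking of a new interpreter is the part that, as Greg notes, isn't implemented:]

    # setup.py -- illustrative only
    from distutils.core import setup, Extension

    setup(name="spam",
          version="1.0",
          ext_modules=[Extension("spam", ["spammodule.c"])])

Run it with "python setup.py build_ext".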
Greg

-- Greg Ward - just another /P(erl|ython)/ hacker gward@python.net http://starship.python.net/~gward/
"When I hear the word `culture', I reach for my gun." --Goebbels
"When I hear the word `Microsoft', *I* reach for *my* gun." --me

From arigo@ulb.ac.be Wed Jun 27 03:01:54 2001 From: arigo@ulb.ac.be (Armin Rigo) Date: Tue, 26 Jun 2001 22:01:54 -0400 Subject: [Python-Dev] Python Specializing Compiler Message-ID: <3B393E92.B0719A7A@ulb.ac.be>

Hi,

I am considering using GNU Lightning to produce code from the Psyco compiler. Has anyone already used it from a Python program? If so, you might already have done the necessary support module in C, and I might be interested in it! Otherwise, I'll start from scratch. Of course, comments about whether I should use GNU Lightning at all, or any other code-producing library (or even produce machine code "by hand"), are welcome.

Also, I hope to be able to continue with more fundamental work on Psyco very soon. One design decision I have to make now is about the way Psyco reads Python code. Currently, it "reverse-engineers" byte-code. Another solution would be to compile from the source code (possibly with the help of the 'Tools/Compiler/*' modules). The current solution, although not optimal, seems to make integration with the current interpreter easier. Indeed, based on recent discussions, I now believe that a realistic way to use Psyco would be to let the interpreter run normally while doing some kind of profiling, and work on time-critical routines only --- which at this point have already been compiled into byte-code and executed at least a few times.

Armin

From nas@python.ca Tue Jun 26 22:01:38 2001 From: nas@python.ca (Neil Schemenauer) Date: Tue, 26 Jun 2001 14:01:38 -0700 Subject: [Python-Dev] collecting timer resolution information... In-Reply-To: <15160.59941.295880.286167@cj42289-a.reston1.va.home.com>; from fdrake@acm.org on Tue, Jun 26, 2001 at 04:01:41PM -0400 References: <15160.59941.295880.286167@cj42289-a.reston1.va.home.com> Message-ID: <20010626140138.A2838@glacier.fnational.com>
Fred L. Drake, Jr. wrote:
> timeofday: 1 (1 calls), rusage: 10000 (2465 calls)

My hacked version of Linux 2.4 on an AMD-800 box:

timeofday: 1 (2 calls), rusage: 976 (1792 calls)

I don't quite understand the output. What does the 976 mean?

Neil

From fdrake@acm.org Tue Jun 26 22:23:53 2001 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Tue, 26 Jun 2001 17:23:53 -0400 (EDT) Subject: [Python-Dev] collecting timer resolution information... In-Reply-To: <20010626140138.A2838@glacier.fnational.com> References: <15160.59941.295880.286167@cj42289-a.reston1.va.home.com> <20010626140138.A2838@glacier.fnational.com> Message-ID: <15160.64873.213278.925715@cj42289-a.reston1.va.home.com>

Neil Schemenauer writes:
> My hacked version of Linux 2.4 on an AMD-800 box:
>
> timeofday: 1 (2 calls), rusage: 976 (1792 calls)
>
> I don't quite understand the output. What does the 976 mean?

The "1" and the "976" are the apparent resolution of the time values reported by those two calls, in microseconds. It looks like the HZ define in that header file you pointed out could be bumped a little higher. ;-)

-Fred

-- Fred L. Drake, Jr. PythonLabs at Digital Creations

From mark.favas@csiro.au Wed Jun 27 00:21:47 2001 From: mark.favas@csiro.au (Mark Favas) Date: Wed, 27 Jun 2001 07:21:47 +0800 Subject: [Python-Dev] latest unicode-related change causes failure in test_unicode & test_unicodedata Message-ID: <3B39190B.E7DA5B5D@csiro.au>

CVS of a short while ago, Tru64 Unix: "make test" gives two unicode-related failures:

test_unicode
test test_unicode crashed -- exceptions.UnicodeError: UTF-8 decoding error: illegal encoding

test_unicodedata
The actual stdout doesn't match the expected stdout. This much did match (between asterisk lines):
**********************************************************************
test_unicodedata
Testing Unicode Database...
Methods:
**********************************************************************
Then ...

We expected (repr): '6c7a7c02657b69d0fdd7a7d174f573194bba2e18'
But instead we got: '374108f225e0c1488f8389ce6333902830d299fb'
test test_unicodedata failed -- Writing: '374108f225e0c1488f8389ce6333902830d299fb', expected: '6c7a7c02657b69d0fdd7a7d174f573194bba2e18'

Running the tests manually, test_unicode fails, test_unicodedata doesn't fail, but doesn't match the expected output for Methods:

(test_unicode)
Testing Unicode contains method... done.
Testing Unicode formatting strings... done.
Testing builtin codecs...
Traceback (most recent call last):
  File "Lib/test/test_unicode.py", line 383, in ?
    verify(u'\ud800\udc02'.encode('utf-8') == \
  File "./Lib/test/test_support.py", line 95, in verify
    raise TestFailed(reason)
test_support.TestFailed: test failed

(test_unicodedata)
python Lib/test/test_unicodedata.py
Testing Unicode Database...
Methods: 374108f225e0c1488f8389ce6333902830d299fb
Functions: 41e1d4792185d6474a43c83ce4f593b1bdb01f8a
API: ok

-- Mark Favas - m.favas@per.dem.csiro.au CSIRO, Private Bag No 5, Wembley, Western Australia 6913, AUSTRALIA

From JamesL@Lugoj.Com Wed Jun 27 01:06:23 2001 From: JamesL@Lugoj.Com (James Logajan) Date: Tue, 26 Jun 2001 17:06:23 -0700 Subject: [Python-Dev] Re: PEP 260: simplify xrange() References: Message-ID: <3B39237F.1A7EF3F2@Lugoj.Com>

Guido van Rossum wrote:
> Here's another sweet and short PEP. What do folks think? Is
> xrange()'s complexity really worth having?

Are there still known bugs that will take some effort to repair? Is xrange constantly touched when changes are made elsewhere? If no to both, then I suggest don't fix what ain't broken; life is too short.
(Unless it is annoying you to distraction, then do the deed and get it over with.)

From tim.one@home.com Wed Jun 27 01:32:26 2001 From: tim.one@home.com (Tim Peters) Date: Tue, 26 Jun 2001 20:32:26 -0400 Subject: [Python-Dev] RE: PEP 260: simplify xrange() In-Reply-To: <3B39237F.1A7EF3F2@Lugoj.Com> Message-ID:

[James Logajan]
> Are there still known bugs that will take some effort to repair? Is
> xrange constantly touched when changes are made elsewhere? If no to
> both, then I suggest don't fix what ain't broken; life is too short.
> (Unless it is annoying you to distraction, then do the deed and get
> it over with.)

I think it's more the latter. I partly provoked this by bitterly pointing out that there's more code in the CVS tree devoted to supporting the single xrange() gimmick than Neil Schemenauer added to support the get-out-of-town more powerful new generators. Masses of crufty code nobody benefits from are a burden on the soul.

although-it-would-be-impolite-to-ask-we-why-still-ship-a-directory-full-of-crufty-old-irix5-demos-in-the-std-library-ly y'rs - tim

From tdelaney@avaya.com Wed Jun 27 01:36:25 2001 From: tdelaney@avaya.com (Delaney, Timothy) Date: Wed, 27 Jun 2001 10:36:25 +1000 Subject: [Python-Dev] RE: PEP 260: simplify xrange() Message-ID:

> Here's another sweet and short PEP. What do folks think? Is
> xrange()'s complexity really worth having?
>
> --Guido van Rossum (home page: http://www.python.org/~guido/)
>
> PEP: 260
> Title: Simplify xrange()
> Version: $Revision: 1.1 $
> Author: guido@python.org (Guido van Rossum)
> Status: Draft
> Type: Standards Track
> Python-Version: 2.2
> Created: 26-Jun-2001
> Post-History: 26-Jun-2001
>
> Abstract
>
>     This PEP proposes to strip the xrange() object from some rarely
>     used behavior like x[i:j] and x*n.
>
> Problem
>
>     The xrange() function has one idiomatic use:
>
>         for i in xrange(...): ...

If this is to be done, I would also propose that xrange() and range() be changed to allow passing in a straight-out sequence such as in the following code in order to get rid of the need for range(len(seq)):

    import __builtin__

    def range (start, stop=None, step=1, range=range):
        """"""
        start2 = start
        stop2 = stop
        if stop is None:
            stop2 = start
            start2 = 0
        try:
            return range(start2, stop2, step)
        except TypeError:
            assert stop is None
            return range(len(start))

    def xrange (start, stop=None, step=1, xrange=xrange):
        """"""
        start2 = start
        stop2 = stop
        if stop is None:
            stop2 = start
            start2 = 0
        try:
            return xrange(start2, stop2, step)
        except TypeError:
            assert stop is None
            return xrange(len(start))

    a = [5, 'a', 'Hello, world!']
    b = range(a)
    c = xrange(4, 6)
    d = xrange(b)
    e = range(c)

    print a
    print b
    print c
    print d
    print e
    print range(d, 2)

Tim Delaney

From gward@python.net Wed Jun 27 02:24:32 2001 From: gward@python.net (Greg Ward) Date: Tue, 26 Jun 2001 21:24:32 -0400 Subject: [Python-Dev] RE: PEP 260: simplify xrange() In-Reply-To: ; from tdelaney@avaya.com on Wed, Jun 27, 2001 at 10:36:25AM +1000 References: Message-ID: <20010626212432.A4003@gerg.ca>

On 27 June 2001, Delaney, Timothy said:
> If this is to be done, I would also propose that xrange() and range() be
> changed to allow passing in a straight-out sequence such as in the following
> code in order to get rid of the need for range(len(seq)):

I'm +1 on the face of it without stopping to consider any implications. ;-) Some bits of syntactic sugar are just too good to pass up. range(len(sequence)) is syntactic cod-liver oil.
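[A small aside on avoiding range(len(seq)) with the new generators instead -- a sketch only, under 2.2's "from __future__ import generators"; "indexed" is a made-up name, not a proposed builtin:]

    from __future__ import generators

    def indexed(seq):
        # Yield (index, item) pairs so loops needn't do range(len(seq)).
        i = 0
        for item in seq:
            yield i, item
            i = i + 1

    for i, ch in indexed("abc"):
        print i, ch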
Greg -- Greg Ward - programmer-at-big gward@python.net http://starship.python.net/~gward/ Blood is thicker than water, and much tastier. From nas@python.ca Wed Jun 27 02:28:29 2001 From: nas@python.ca (Neil Schemenauer) Date: Tue, 26 Jun 2001 18:28:29 -0700 Subject: [Python-Dev] collecting timer resolution information... In-Reply-To: <15160.64873.213278.925715@cj42289-a.reston1.va.home.com>; from fdrake@acm.org on Tue, Jun 26, 2001 at 05:23:53PM -0400 References: <15160.59941.295880.286167@cj42289-a.reston1.va.home.com> <20010626140138.A2838@glacier.fnational.com> <15160.64873.213278.925715@cj42289-a.reston1.va.home.com> Message-ID: <20010626182829.A3344@glacier.fnational.com> Fred L. Drake, Jr. wrote: > The "1" and the "976" are the appearant resolution of the time > values reported by those two calls, in microseconds. It looks like > the HZ define in that header file you pointed out could be bumped a > little higher. ;-) I've got it at 1024. >>> 976. / 10000 * 1024 99.942400000000006 I think yours is at the 100 default. Neil From fdrake@acm.org Wed Jun 27 03:14:00 2001 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Tue, 26 Jun 2001 22:14:00 -0400 (EDT) Subject: [Python-Dev] collecting timer resolution information... In-Reply-To: <20010626182829.A3344@glacier.fnational.com> References: <15160.59941.295880.286167@cj42289-a.reston1.va.home.com> <20010626140138.A2838@glacier.fnational.com> <15160.64873.213278.925715@cj42289-a.reston1.va.home.com> <20010626182829.A3344@glacier.fnational.com> Message-ID: <15161.16744.665259.229385@cj42289-a.reston1.va.home.com> Neil Schemenauer writes: > I've got it at 1024. > > >>> 976. / 10000 * 1024 > 99.942400000000006 > > I think yours is at the 100 default. That's correct. Yours could be bumped a bit (factor of 10? I'm not really sure where it would cause problems in practice, though I think I understand the general explanations I've seen), and mine could be bumped a good bit. But I intend to stick with a stock kernel since I expect most users will be using a stock kernel, and I don't have a pile of extra machines to play with. ;-( -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From greg@cosc.canterbury.ac.nz Wed Jun 27 03:37:21 2001 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Wed, 27 Jun 2001 14:37:21 +1200 (NZST) Subject: [Python-Dev] collecting timer resolution information... In-Reply-To: <15160.60188.806308.247566@cj42289-a.reston1.va.home.com> Message-ID: <200106270237.OAA05182@s454.cosc.canterbury.ac.nz> Here are the results from a few machines around here: s454% uname -a SunOS s454 5.7 Generic_106541-10 sun4m sparc SUNW,SPARCstation-4 s454% observation timeofday: 2 (1 calls), rusage: 10000 (22 calls) oma% uname -a SunOS oma 5.7 Generic sun4u sparc SUNW,Ultra-4 oma% observation timeofday: 1 (2 calls), rusage: 10000 (115 calls) pc250% uname -a SunOS pc250 5.8 Generic_108529-03 i86pc i386 i86pc pc250% observation timeofday: 1 (1 calls), rusage: 10000 (232 calls) Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. 
| greg@cosc.canterbury.ac.nz +--------------------------------------+ From JamesL@Lugoj.Com Wed Jun 27 03:42:20 2001 From: JamesL@Lugoj.Com (James Logajan) Date: Tue, 26 Jun 2001 19:42:20 -0700 Subject: [Python-Dev] Re: PEP 260: simplify xrange() References: Message-ID: <3B39480C.F4808C1F@Lugoj.Com> Tim Peters wrote: > [James Logajan] > > Are there still known bugs that will take some effort to repair? Is > > xrange constantly touched when changes are made elsewhere? If no to > > both, then I suggest don't fix what ain't broken; life is too short. > > (Unless it is annoying you to distraction, then do the deed and get > > it over with.) > > I think it's more the latter. I partly provoked this by bitterly pointing > out that there's more code in the CVS tree devoted to supporting the single > xrange() gimmick than Neil Schemenauer added to support the get-out-of-town > more powerful new generators. Masses of crufty code nobody benefits from > are a burden on the soul. Design mistakes one has made do tend to weigh on one's soul (speaking from more than two decades of programming experience) so I understand the primal urge to correct them when one can, and even when one shouldn't. So although I'm quite annoyed by all these new-fangled gimmicks being added to the language (i.e. Python generators being added to solve California's power problems) I have no problem with xrange being fenced in. (I find the very existence of the PEP process somewhat unsettling; there are now thousands of programmers trying to use the language. Why burden them with insuring their programs remain compatible with yet-another-damn-set-of-proposals every year? Or worse: trying to rewrite their code "more elegantly" using all the latest gimmicks. Why in my day, if you wanted to, say, save execution state, you figured out how to do it and didn't go crying to the language designer. Damn these young lazy programmers. Don't know how good they have it. Wouldn't know how to save their execution state if their lives depended on it. Harumph.) Speaking of "generators", I just want to say that I think that "generator" makes for lousy terminology. If I understand correctly, "generators" are coroutines that have peer-to-peer synchronized messaging (synchronizing and communicating at the "yield" points). To my mind, "generators" does not evoke that image at all. Assuming I understand it in my early senility.... > although-it-would-be-impolite-to-ask-we-why-still-ship-a-directory- > full-of-crufty-old-irix5-demos-in-the-std-library-ly Perhaps because the Irix community would be quite Irate if they were removed? From tim.one@home.com Wed Jun 27 05:38:15 2001 From: tim.one@home.com (Tim Peters) Date: Wed, 27 Jun 2001 00:38:15 -0400 Subject: [Python-Dev] RE: PEP 260: simplify xrange() In-Reply-To: <3B39480C.F4808C1F@Lugoj.Com> Message-ID: [James Logajan] > Design mistakes one has made do tend to weigh on one's soul (speaking > from more than two decades of programming experience) so I understand > the primal urge to correct them when one can, and even when one > shouldn't. Is this a case when one shouldn't? That is, is it a specific comment on PEP 260, or just a general venting here? > So although I'm quite annoyed by all these new-fangled gimmicks being > added to the language (i.e. Python generators being added to solve > California's power problems) I have no problem with xrange being fenced > in. OK. 
> (I find the very existence of the PEP process somewhat unsettling;
> there are now thousands of programmers trying to use the language. Why
> burden them with insuring their programs remain compatible with yet-
> another-damn-set-of-proposals every year?

You can ask the C, C++, Fortran, Perl, COBOL (etc, etc) folks that too, but I suspect it's a rhetorical question. I wish you could ask the Java committee, but they work in secret <wink>.

> Or worse: trying to rewrite their code "more elegantly" using all the
> latest gimmicks.

Use of new features isn't required by Guido, and neither is downloading new releases. If *you* waste your time doing that, we both know it's because you can't resist <0.5 wink>.

> ...
> Speaking of "generators", I just want to say that I think that
> "generator" makes for lousy terminology.

A generator, umm, *generates* a sequence of values. It's neither more specific nor more general than that, so we're pretty much limited to vaguely suggestive terms like "generator" and "iterator"; Python already used the latter word for something else. I'd be happy to call them pink flamingos.

> If I understand correctly, "generators" are coroutines

They're formally semi-coroutines; it's not symmetric.

> that have peer-to-peer synchronized messaging (synchronizing and
> communicating at the "yield" points).

Way too highfalutin' a view. Think of a generator as a resumable function, and you're not missing anything -- not even an implementation subtlety. They *are* resumable functions. A "yield" is just a "return", but with the twist that the function can resume executing after the "yield" again. If you also think of ordinary call/return as a peer-to-peer etc etc, then I suppose you're stuck with that view here too.

> To my mind, "generators" does not evoke that image at all.

Good, because that image was overblown beyond recognition <wink>.

>> although-it-would-be-impolite-to-ask-we-why-still-ship-a-directory-
>> full-of-crufty-old-irix5-demos-in-the-std-library-ly

> Perhaps because the Irix community would be quite Irate if they were
> removed?

Doubt it: the Irix5 library files haven't really been touched since 1993. For several years we've also shipped an Irix6 library with all the same stuff. But I suppose releasing a new OS was a symptom of SGI picking on its users too <wink>.

From tim.one@home.com Wed Jun 27 06:14:29 2001 From: tim.one@home.com (Tim Peters) Date: Wed, 27 Jun 2001 01:14:29 -0400 Subject: [Python-Dev] Unicode fallout: Windows _winreg project broken Message-ID:

The _winreg project no longer links:

Creating library ./_winreg_d.lib and object ./_winreg_d.exp
_winreg.obj : error LNK2001: unresolved external symbol __imp__PyUnicode_DecodeMBCS

The compilation of PyUnicode_DecodeMBCS in unicodeobject.c is in a

#if defined(MS_WIN32) && defined(HAVE_USABLE_WCHAR_T)

block. But the top of unicodeobject.h now wraps the enabling

# if defined(MS_WIN32) && !defined(USE_UCS4_STORAGE)
# define HAVE_USABLE_WCHAR_T
# define PY_UNICODE_TYPE wchar_t
# endif

block inside a

#ifndef PY_UNICODE_TYPE

block, and a change to PC/config.h:

#define PY_UNICODE_TYPE unsigned short

stops all that. IOW, HAVE_USABLE_WCHAR_T no longer gets #define'd on Windows, and that prevents unicodeobject.c from supplying routines _winreg.c calls.
leaving-it-to-an-expert-who-thinks-they-know-what-all-these-symbols-are-supposed-to-really-mean-ly y'rs - tim

From greg@cosc.canterbury.ac.nz Wed Jun 27 06:41:46 2001 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Wed, 27 Jun 2001 17:41:46 +1200 Subject: [Python-Dev] Help: Python 2.1: "Corrupt Installation Detected" Message-ID: <3B39721A.DED4E85A@cosc.canterbury.ac.nz>

I'm trying to install Python-2.1 on Windows, and I keep getting "Corrupt Installation Detected" when I run the installer.

From other postings I've seen about this message, it means that the installer itself is corrupted somehow. But everything I can think of doing to test its integrity says that it's okay. I can open the installer with WinZip and "Test" it, and it passes. I ftp'ed it to another machine which has Python installed (in binary mode) and ran md5sum.py on it, and the result matches the checksum advertised on the web page.

I also tried downloading Python-2.0.1.exe, but it says "Corrupt Installation Detected" as well.

Does anyone have any idea what is going on here?

-- Greg Ewing, Computer Science Dept, University of Canterbury, Christchurch, New Zealand
To get my email address, please visit my web page: http://www.cosc.canterbury.ac.nz/~greg

From tim.one@home.com Wed Jun 27 06:53:01 2001 From: tim.one@home.com (Tim Peters) Date: Wed, 27 Jun 2001 01:53:01 -0400 Subject: [Python-Dev] Help: Python 2.1: "Corrupt Installation Detected" In-Reply-To: <3B39721A.DED4E85A@cosc.canterbury.ac.nz> Message-ID:

[Greg Ewing]
> I'm trying to install Python-2.1 on Windows,
> and I keep getting "Corrupt Installation Detected"
> when I run the installer. [but no other evidence that
> it's actually corrupt]

You didn't say which flavor of Windows, but should have <wink>. Ditto what it is you're running (the PythonLabs distro? ActiveState's? PythonWare's?). Known causes for this from the PythonLabs installer include (across various flavors of Windows), in decreasing order of likelihood:

+ Trying to install while logged in to an account with insufficient permissions (try logging in as Administrator, if on a version of Windows where that makes sense).

+ Trying to install over a network. Copy the installer to a local disk first.

+ Conflicts with anti-virus software (disable it -- indeed, my Win9x life got much saner after I wiped Norton AntiVirus from my hard drive).

+ Conflicts with other running programs (like installer splash screens always say, close all other programs).

+ Insufficient memory, disk space, or magic low-level Windows resources.

+ There may or may not be a problem unique to French versions of Windows.

Any of those apply?

From martin@loewis.home.cs.tu-berlin.de Wed Jun 27 08:12:11 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Wed, 27 Jun 2001 09:12:11 +0200 Subject: [Python-Dev] Unicode fallout: Windows _winreg project broken Message-ID: <200106270712.f5R7CBh06458@mira.informatik.hu-berlin.de>

> IOW, HAVE_USABLE_WCHAR_T no longer gets #define'd on Windows, and
> that prevents unicodeobject.c from supplying routines _winreg.c
> calls.

The best thing, IMO, would be if PC/config.h defines everything available in config.h also. In this case, the proper defines would be

#define Py_USING_UNICODE
#define HAVE_USABLE_WCHAR_T
#define Py_UNICODE_SIZE 2
#define PY_UNICODE_TYPE wchar_t

If that approach is used, the defaulting in Include/unicodeobject.h could go away.
Alternatively, define only Py_USING_UNICODE of this in PC/config.h, and change the block in Include/unicodeobject.h to

/* Windows has a usable wchar_t type (unless we're using UCS-4) */
# ifdef MS_WIN32
#   ifdef USE_UCS4_STORAGE
#     define Py_UNICODE_SIZE 4
#     define PY_UNICODE_TYPE unsigned int
#   else
#     define Py_UNICODE_SIZE 2
#     define HAVE_USABLE_WCHAR_T
#     define PY_UNICODE_TYPE wchar_t
#   endif
# endif

Regards, Martin

From tim.one@home.com Wed Jun 27 08:39:38 2001 From: tim.one@home.com (Tim Peters) Date: Wed, 27 Jun 2001 03:39:38 -0400 Subject: [Python-Dev] New Unicode warnings Message-ID:

There are 3 functions now where the prototypes in unicodeobject.h don't match the definitions in unicodeobject.c. Like, in .h,

extern DL_IMPORT(Py_UNICODE) _PyUnicode_ToTitlecase(
    register const Py_UNICODE ch /* Unicode character */
    );

but in .c:

Py_UNICODE _PyUnicode_ToTitlecase(register Py_UNICODE ch)

That is, they disagree about const (a silly language idea if ever there was one <wink>). The others (I haven't checked these for the exact reason(s), but assume they're the same deal):

_PyUnicode_ToUppercase
_PyUnicode_ToLowercase

From Armin.Rigo@ima.unil.ch Wed Jun 27 10:01:18 2001 From: Armin.Rigo@ima.unil.ch (RIGO Armin) Date: Wed, 27 Jun 2001 11:01:18 +0200 (CEST) Subject: [Python-Dev] Python Specializing Compiler In-Reply-To: <3B391D88.305CCB4E@ActiveState.com> Message-ID:

On Tue, 26 Jun 2001, Paul Prescod wrote:
> Armin Rigo wrote:
> > I am considering using GNU Lightning to produce code from the Psyco
> > compiler. (...)
>
> Core Python has no GPLed components. I would hate to have you put in a
> bunch of work worthy of inclusion in core Python to see it rejected on
> those grounds.

Good remark. Anyone else have comments about this? Psyco would probably not be part of the core Python, but only an extension module; but your objection is nevertheless valid. Any alternatives?

I am considering a more theoretical approach, based on Tunes (http://tunes.org) as mentioned in Psyco's readme file, but this would take a lot more time -- although it might give much more impressive results.

Armin.

From PyChecker Wed Jun 27 12:48:00 2001 From: PyChecker (Neal Norwitz) Date: Wed, 27 Jun 2001 07:48:00 -0400 Subject: [Python-Dev] ANN: PyChecker version 0.6.1 Message-ID: <3B39C7F0.2CA171C5@metaslash.com>

A new version of PyChecker is available for your hacking pleasure. PyChecker is a tool for finding common bugs in python source code. It finds problems that are typically caught by a compiler for less dynamic languages, like C and C++.

Comments, criticisms, new ideas, and other feedback are welcome.
Here's the CHANGELOG:

* Check format strings: "%s %s %s" % (v1, v2, v3, v4) for arg counts
* Warn when format strings do: '%(var) %(var2)'
* Fix Local variable (xxx) not used, when have: "%(xxx)s" % locals()
* Warn when local variable (xxx) doesn't exist and have: "%(xxx)s" % locals()
* Install script in /usr/local/bin to invoke PyChecker
* Don't produce unused global warnings when using a module in parameters
* Don't produce unused global warnings when using a module in class variables
* Add check when using method as an attribute (if self.method and x == y:)
* Add check for right # of args to object construction
* Add check for right # of args to function calls in other modules
* Check for returning a value from __init__
* Fix using from XX import YY ; from XX import ZZ causing re-import warning
* Fix UNABLE TO IMPORT errors for files that don't end with a newline
* Support for checking consistent return values -- not complete; produces too many false positives (off by default, use -r/--returnvalues to enable)

PyChecker is available on Source Forge:
Web page: http://pychecker.sourceforge.net/
Project page: http://sourceforge.net/projects/pychecker/

Neal

-- pychecker@metaslash.com

From paulp@ActiveState.com Wed Jun 27 12:53:08 2001 From: paulp@ActiveState.com (Paul Prescod) Date: Wed, 27 Jun 2001 04:53:08 -0700 Subject: [Python-Dev] Python Specializing Compiler References: Message-ID: <3B39C924.E865177D@ActiveState.com>

RIGO Armin wrote:
>
> ...
> I am considering a more theoretical approach, based on Tunes
> (http://tunes.org) as mentioned in Psyco's readme file, but this would
> take a lot more time -- although it might give much more impressive
> results.

If you are thinking about incorporating some ideas from Tunes that's one thing. But if you want to use their code I would ask "what code?" I have heard about Tunes for several years now and not seen any visible forward progress.

See also: http://tunes.org/Tunes-FAQ-6.html#ss6.2

-- Take a recipe. Leave a recipe. Python Cookbook! http://www.ActiveState.com/pythoncookbook

From mark.favas@csiro.au Wed Jun 27 12:48:37 2001 From: mark.favas@csiro.au (Mark Favas) Date: Wed, 27 Jun 2001 19:48:37 +0800 Subject: [Python-Dev] More unicode blues... Message-ID: <3B39C815.E9CDF41B@csiro.au>

unicodectype.c now fails to compile, because ch is declared const, and then assigned to. Tim has (apparently) had similar problems, but in his case the compiler just gives a warning, rather than an error:

cc: Error: Objects/unicodectype.c, line 67: In this statement, "ch" has const-qualified type, but occurs in a context that requires a modifiable lvalue. (neednonconst)
    ch += ctype->title;
--------^
cc: Error: Objects/unicodectype.c, line 69: In this statement, "ch" has const-qualified type, but occurs in a context that requires a modifiable lvalue. (neednonconst)
    ch += ctype->upper;
--------^
cc: Error: Objects/unicodectype.c, line 74: In this statement, "ch" has const-qualified type, but occurs in a context that requires a modifiable lvalue. (neednonconst)
    ch -= 0x10000;
--------^

-- Mark Favas - m.favas@per.dem.csiro.au CSIRO, Private Bag No 5, Wembley, Western Australia 6913, AUSTRALIA
From mal@lemburg.com Wed Jun 27 13:10:57 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Wed, 27 Jun 2001 14:10:57 +0200 Subject: [Python-Dev] Unicode Maintenance Message-ID: <3B39CD51.406C28F0@lemburg.com>

Looking at the recent burst of checkins for the Unicode implementation, completely bypassing the standard SF procedure and possible comments I might have on the different approaches, I guess I've been ruled out as maintainer and designer of the Unicode implementation.

Well, I guess that's how things go. Was nice working for you guys, but no longer is... I'm tired of having to defend myself against meta-comments about the design, uncontrolled checkins and no true backup about my standing in all this from Guido.

Perhaps I am misunderstanding the role of a maintainer and implementation designer, but as it is, all respect for the work I've put into all this seems faded. That's the conclusion I draw from recent postings by Martin and Fredrik and their nightly "takeover".

Thanks,
-- Marc-Andre Lemburg CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting: http://www.egenix.com/
Python Software: http://www.lemburg.com/python/

From arigo@ulb.ac.be Wed Jun 27 13:18:43 2001 From: arigo@ulb.ac.be (Armin Rigo) Date: Wed, 27 Jun 2001 14:18:43 +0200 (CEST) Subject: [Python-Dev] Python Specializing Compiler In-Reply-To: <3B39C924.E865177D@ActiveState.com> Message-ID:

Hello Paul,

On Wed, 27 Jun 2001, Paul Prescod wrote:
> If you are thinking about incorporating some ideas from Tunes that's one
> thing. But if you want to use their code I would ask "what code?" I have
> heard about Tunes for several years now and not seen any visible forward
> progress.

Yes, I know this. I am myself a (recent) member of the Tunes project, and have made Tunes' goals mine.

Armin

From guido@digicool.com Wed Jun 27 15:32:23 2001 From: guido@digicool.com (Guido van Rossum) Date: Wed, 27 Jun 2001 10:32:23 -0400 Subject: [Python-Dev] Python Specializing Compiler In-Reply-To: Your message of "Wed, 27 Jun 2001 11:01:18 +0200." References: Message-ID: <200106271432.f5REWOn19377@odiug.digicool.com>

> Good remark. Anyone else have comments about this?

Not really, except to emphasize that inclusion of GPL'ed code in core Python is indeed a no-no.

--Guido van Rossum (home page: http://www.python.org/~guido/)

From fredrik@pythonware.com Wed Jun 27 15:48:02 2001 From: fredrik@pythonware.com (Fredrik Lundh) Date: Wed, 27 Jun 2001 16:48:02 +0200 Subject: [Python-Dev] New Unicode warnings References: Message-ID: <009e01c0ff19$e25b3f70$4ffa42d5@hagrid>

tim peters wrote:

> There are 3 functions now where the prototypes in unicodeobject.h don't
> match the definitions in unicodeobject.c. Like, in .h,
>
> extern DL_IMPORT(Py_UNICODE) _PyUnicode_ToTitlecase(
>     register const Py_UNICODE ch /* Unicode character */
>     );
>
> but in .c:
>
> Py_UNICODE _PyUnicode_ToTitlecase(register Py_UNICODE ch)

what's that "register" doing in a prototype? any reason we cannot just change the signature(s) to

    Py_UNICODE _PyUnicode_ToTitlecase(Py_UNICODE ch)

to make it look more like contemporary C code?
From fredrik@pythonware.com Wed Jun 27 15:49:31 2001 From: fredrik@pythonware.com (Fredrik Lundh) Date: Wed, 27 Jun 2001 16:49:31 +0200 Subject: [Python-Dev] Unicode fallout: Windows _winreg project broken References: <200106270712.f5R7CBh06458@mira.informatik.hu-berlin.de> Message-ID: <00a101c0ff19$e2a19740$4ffa42d5@hagrid>

martin wrote:
> > IOW, HAVE_USABLE_WCHAR_T no longer gets #define'd on Windows, and
> > that prevents unicodeobject.c from supplying routines _winreg.c
> > calls.
>
> The best thing, IMO, would be if PC/config.h defines everything
> available in config.h also. In this case, the proper defines would be
>
> #define Py_USING_UNICODE
> #define HAVE_USABLE_WCHAR_T
> #define Py_UNICODE_SIZE 2
> #define PY_UNICODE_TYPE wchar_t
>
> If that approach is used, the defaulting in Include/unicodeobject.h
> could go away.

my fault; I missed the HAVE_USABLE_WCHAR_T define when I tried to fix tim's fix. I'll fix it.

From guido@digicool.com Wed Jun 27 16:07:47 2001 From: guido@digicool.com (Guido van Rossum) Date: Wed, 27 Jun 2001 11:07:47 -0400 Subject: [Python-Dev] New Unicode warnings In-Reply-To: Your message of "Wed, 27 Jun 2001 16:48:02 +0200." <009e01c0ff19$e25b3f70$4ffa42d5@hagrid> References: <009e01c0ff19$e25b3f70$4ffa42d5@hagrid> Message-ID: <200106271507.f5RF7lq19494@odiug.digicool.com>

> tim peters wrote:
>
> > There are 3 functions now where the prototypes in unicodeobject.h don't
> > match the definitions in unicodeobject.c. Like, in .h,
> >
> > extern DL_IMPORT(Py_UNICODE) _PyUnicode_ToTitlecase(
> >     register const Py_UNICODE ch /* Unicode character */
> >     );
> >
> > but in .c:
> >
> > Py_UNICODE _PyUnicode_ToTitlecase(register Py_UNICODE ch)
>
> what's that "register" doing in a prototype?

Enjoying a day off?

> any reason we cannot just change the signature(s) to
>
>     Py_UNICODE _PyUnicode_ToTitlecase(Py_UNICODE ch)
>
> to make it look more like contemporary C code?

I cannot see how either register or const are going to make any difference in the prototype given that Py_UNICODE is a scalar type, so please just do it.

--Guido van Rossum (home page: http://www.python.org/~guido/)

From JamesL@Lugoj.Com Wed Jun 27 16:58:54 2001 From: JamesL@Lugoj.Com (James Logajan) Date: Wed, 27 Jun 2001 08:58:54 -0700 Subject: [Python-Dev] Re: PEP 260: simplify xrange() References: Message-ID: <3B3A02BE.21039365@Lugoj.Com>

Tim Peters wrote:
>
> [James Logajan]
> > Design mistakes one has made do tend to weigh on one's soul (speaking
> > from more than two decades of programming experience) so I understand
> > the primal urge to correct them when one can, and even when one
> > shouldn't.
>
> Is this a case when one shouldn't? That is, is it a specific comment on PEP
> 260, or just a general venting here?

Just a general bit of silly "<wink>" venting. Insert some non-zero fraction in the wink. I tried to insert some obvious absurdities to indicate I was not being very serious. (Yes, I know that one shouldn't try that in mixed company.)

From guido@digicool.com Wed Jun 27 17:11:49 2001 From: guido@digicool.com (Guido van Rossum) Date: Wed, 27 Jun 2001 12:11:49 -0400 Subject: [Python-Dev] Unicode Maintenance In-Reply-To: Your message of "Wed, 27 Jun 2001 14:10:57 +0200."
<3B39CD51.406C28F0@lemburg.com> References: <3B39CD51.406C28F0@lemburg.com> Message-ID: <200106271611.f5RGBn819631@odiug.digicool.com>

> Looking at the recent burst of checkins for the Unicode implementation
> completely bypassing the standard SF procedure and possible comments
> I might have on the different approaches, I guess I've been ruled out
> as maintainer and designer of the Unicode implementation.
>
> Well, I guess that's how things go. Was nice working for you guys,
> but no longer is... I'm tired of having to defend myself against
> meta-comments about the design, uncontrolled checkins and no true
> backup about my standing in all this from Guido.
>
> Perhaps I am misunderstanding the role of a maintainer and
> implementation designer, but as it is all respect for the work I've
> put into all this seems faded. That's the conclusion I draw from recent
> postings by Martin and Fredrik and their nightly "takeover".
>
> Thanks,
> --
> Marc-Andre Lemburg

[For those of us to whom Marc-Andre's complaint comes as a total surprise: there was a thread on i18n-sig about whether we should support Unicode surrogates, followed by a conclusion to skip surrogates and jump directly to optional support for UCS-4, followed by some checkins that enabled a configuration choice between UCS-2 and UCS-4, and code to make it work. As a side effect, surrogate support in the UCS-2 version actually improved slightly.]

Now, now, Marc-Andre. The only comments I recall from you on my "surrogates: just say no" post seemed favorable, except that you proposed to go all the way and make UCS-4 mandatory. I explained why I didn't want to go that far, and why I didn't believe your arguments against giving users a choice. I didn't hear back from you then, and I didn't think you could have much of a problem with my position.

Our process requires the use of the SF patch manager only for controversial changes. Based on your feedback, I didn't think there was anything controversial about the changes that Fredrik and Martin have made! (If there was, IMO it was temporarily breaking the Windows build and the test suite -- but that's all fixed now.)

I don't understand where you get the idea that we lost respect for your work! In fact, the fact that it was so easy to make the changes suggested to me that the original design was well suited to this particular change (as opposed to the surrogate support proposals, which all sounded like they would require a *lot* of changes).

I don't think that we have very strict roles in this community anyway. (My role as BDFL excluded -- that's why I get to write this response. :-) I'd say that Fredrik owns SRE, because he has asserted that ownership at various times: he's undone changes by others that broke the 1.5.2 support, for example. But the Unicode support in Python isn't owned by one person: many folks have contributed to that, including Fredrik, who designed and wrote the original Unicode string object implementation.

If you have specific comments about the changes made, please be specific. If you feel slighted by meta-comments, please also be specific. I don't think I've said anything derogatory about you or your design.

Paul Prescod offered to write a PEP on this issue. My cynical half believes that we'll never hear from him again, but my optimistic half hopes that he'll actually write one, so that we'll be able to discuss the various issues for the users with the users. I encourage you to co-author the PEP, since you have a lot of background knowledge about the issues.

BTW, I think that Misc/unicode.txt should be converted to a PEP, for the historic record. It was very much a PEP before the PEP process was invented. Barry, how much work would this be? No editing needed, just formatting, and assignment of a PEP number (the lower the better).

--Guido van Rossum (home page: http://www.python.org/~guido/)

From barry@digicool.com Wed Jun 27 17:24:30 2001 From: barry@digicool.com (Barry A. Warsaw) Date: Wed, 27 Jun 2001 12:24:30 -0400 Subject: [I18n-sig] Re: [Python-Dev] Unicode Maintenance References: <3B39CD51.406C28F0@lemburg.com> <200106271611.f5RGBn819631@odiug.digicool.com> Message-ID: <15162.2238.720508.508081@anthem.wooz.org>

>>>>> "GvR" == Guido van Rossum writes:

GvR> BTW, I think that Misc/unicode.txt should be converted to a
GvR> PEP, for the historic record. It was very much a PEP before
GvR> the PEP process was invented. Barry, how much work would
GvR> this be? No editing needed, just formatting, and assignment
GvR> of a PEP number (the lower the better).

Not much work at all, so I'll do this (and replace Misc/unicode.txt with a pointer to the PEP). Let's go with PEP 7, but stick it under the "Other Informational PEPs" category.

-Barry

From guido@digicool.com Wed Jun 27 17:36:05 2001 From: guido@digicool.com (Guido van Rossum) Date: Wed, 27 Jun 2001 12:36:05 -0400 Subject: [I18n-sig] Re: [Python-Dev] Unicode Maintenance In-Reply-To: Your message of "Wed, 27 Jun 2001 12:24:30 EDT." <15162.2238.720508.508081@anthem.wooz.org> References: <3B39CD51.406C28F0@lemburg.com> <200106271611.f5RGBn819631@odiug.digicool.com> <15162.2238.720508.508081@anthem.wooz.org> Message-ID: <200106271636.f5RGa5719660@odiug.digicool.com>

> GvR> BTW, I think that Misc/unicode.txt should be converted to a
> GvR> PEP, for the historic record. It was very much a PEP before
> GvR> the PEP process was invented. Barry, how much work would
> GvR> this be? No editing needed, just formatting, and assignment
> GvR> of a PEP number (the lower the better).
>
> Not much work at all, so I'll do this (and replace Misc/unicode.txt
> with a pointer to the PEP). Let's go with PEP 7, but stick it under
> the "Other Informational PEPs" category.
>
> -Barry

Rather than informational, how about "Standards Track - Accepted (or Final)"? That really matches the history best. I'd propose PEP number 100 -- the below-100 series is more for meta-PEPs.

--Guido van Rossum (home page: http://www.python.org/~guido/)

From barry@digicool.com Wed Jun 27 18:05:35 2001 From: barry@digicool.com (Barry A. Warsaw) Date: Wed, 27 Jun 2001 13:05:35 -0400 Subject: [I18n-sig] Re: [Python-Dev] Unicode Maintenance References: <3B39CD51.406C28F0@lemburg.com> <200106271611.f5RGBn819631@odiug.digicool.com> <15162.2238.720508.508081@anthem.wooz.org> <200106271636.f5RGa5719660@odiug.digicool.com> Message-ID: <15162.4703.741647.850696@anthem.wooz.org>

>>>>> "GvR" == Guido van Rossum writes:

GvR> Rather than informational, how about "Standard Track -
GvR> Accepted (or Final)"? That really matches the history best.
GvR> I'd propose PEP number 100 -- the below-100 series is more
GvR> for meta-PEPs.

Fine with me.

-Barry

From fdrake@acm.org Wed Jun 27 20:45:05 2001 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Wed, 27 Jun 2001 15:45:05 -0400 (EDT) Subject: [Python-Dev] New profiling interface Message-ID: <15162.14273.490573.156770@cj42289-a.reston1.va.home.com>

The new core interface I checked in allows profilers and tracers (debuggers, coverage tools) to be written in C. I still need to write documentation for it; that shouldn't be too far off though.

If anyone would like to have this available for Python 2.1.x, I have a version that I developed on the release20-maint branch. It can't be added to that branch since it's pretty clearly a new feature, but the patch is available at:

http://starship.python.net/crew/fdrake/patches/py21-profiling.patch

Enjoy!

-Fred

-- Fred L. Drake, Jr. PythonLabs at Digital Creations

From mark.favas@csiro.au Wed Jun 27 22:45:17 2001 From: mark.favas@csiro.au (Mark Favas) Date: Thu, 28 Jun 2001 05:45:17 +0800 Subject: [Python-Dev] unicode, "const"s and lvalues Message-ID: <3B3A53ED.A8EEE265@csiro.au>

Unreasonable as it may seem, my compiler really expects that entities declared as const's not be used in contexts where a modifiable lvalue is required. It gets all huffy, and refuses to continue compiling, even if I speak nicely (in unicode) to it. I'll file a bug report. On the code, not the compiler.

cc -c -O -Olimit 1500 -Dss_family=__ss_family -Dss_len=__ss_len -I. -I./Include -DHAVE_CONFIG_H -o Objects/unicodectype.o Objects/unicodectype.c
cc: Error: Objects/unicodectype.c, line 67: In this statement, "ch" has const-qualified type, but occurs in a context that requires a modifiable lvalue. (neednonconst)
    ch += ctype->title;
--------^
cc: Error: Objects/unicodectype.c, line 69: In this statement, "ch" has const-qualified type, but occurs in a context that requires a modifiable lvalue. (neednonconst)
    ch += ctype->upper;
--------^
cc: Error: Objects/unicodectype.c, line 74: In this statement, "ch" has const-qualified type, but occurs in a context that requires a modifiable lvalue. (neednonconst)
    ch -= 0x10000;
--------^
cc: Error: Objects/unicodectype.c, line 362: In this statement, "ch" has const-qualified type, but occurs in a context that requires a modifiable lvalue. (neednonconst)
    ch += ctype->upper;
----^
cc: Error: Objects/unicodectype.c, line 366: In this statement, "ch" has const-qualified type, but occurs in a context that requires a modifiable lvalue. (neednonconst)
    ch -= 0x10000;
--------^
cc: Error: Objects/unicodectype.c, line 378: In this statement, "ch" has const-qualified type, but occurs in a context that requires a modifiable lvalue. (neednonconst)
    ch += ctype->lower;
----^
cc: Error: Objects/unicodectype.c, line 382: In this statement, "ch" has const-qualified type, but occurs in a context that requires a modifiable lvalue. (neednonconst)
    ch -= 0x10000;
--------^

-- Mark Favas - m.favas@per.dem.csiro.au CSIRO, Private Bag No 5, Wembley, Western Australia 6913, AUSTRALIA

From guido@digicool.com Wed Jun 27 22:57:16 2001 From: guido@digicool.com (Guido van Rossum) Date: Wed, 27 Jun 2001 17:57:16 -0400 Subject: [Python-Dev] unicode, "const"s and lvalues In-Reply-To: Your message of "Thu, 28 Jun 2001 05:45:17 +0800." <3B3A53ED.A8EEE265@csiro.au> References: <3B3A53ED.A8EEE265@csiro.au> Message-ID: <200106272157.f5RLvGo20101@odiug.digicool.com>

> Unreasonable as it may seem, my compiler really expects that entities
> declared as const's not be used in contexts where a modifiable lvalue is
> required. It gets all huffy, and refuses to continue compiling, even if
> I speak nicely (in unicode) to it. I'll file a bug report. On the code,
> not the compiler.

VC++ also warns about this. I think the declaration of the Character Type APIs in unicodeobject.h really shouldn't include either register or const. Then their implementations should also lose the 'const'.
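[Given the UCS-2/UCS-4 configuration choice summarized earlier in this thread, a quick way to tell which flavor an interpreter was built with -- a sketch relying on sys.maxunicode, which is present from 2.2 on:]

    import sys

    # 0xFFFF on a narrow (UCS-2) build; 0x10FFFF on a wide (UCS-4) build.
    if sys.maxunicode == 0xFFFF:
        print "narrow (UCS-2) Py_UNICODE build"
    else:
        print "wide (UCS-4) Py_UNICODE build"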
BTW, I think that Misc/unicode.txt should be converted to a PEP, for the historic record. It was very much a PEP before the PEP process was invented. Barry, how much work would this be? No editing needed, just formatting, and assignment of a PEP number (the lower the better). --Guido van Rossum (home page: http://www.python.org/~guido/) From barry@digicool.com Wed Jun 27 17:24:30 2001 From: barry@digicool.com (Barry A. Warsaw) Date: Wed, 27 Jun 2001 12:24:30 -0400 Subject: [I18n-sig] Re: [Python-Dev] Unicode Maintenance References: <3B39CD51.406C28F0@lemburg.com> <200106271611.f5RGBn819631@odiug.digicool.com> Message-ID: <15162.2238.720508.508081@anthem.wooz.org> >>>>> "GvR" == Guido van Rossum writes: GvR> BTW, I think that Misc/unicode.txt should be converted to a GvR> PEP, for the historic record. It was very much a PEP before GvR> the PEP process was invented. Barry, how much work would GvR> this be? No editing needed, just formatting, and assignment GvR> of a PEP number (the lower the better). Not much work at all, so I'll do this (and replace Misc/unicode.txt with a pointer to the PEP). Let's go with PEP 7, but stick it under the "Other Informational PEPs" category. -Barry From guido@digicool.com Wed Jun 27 17:36:05 2001 From: guido@digicool.com (Guido van Rossum) Date: Wed, 27 Jun 2001 12:36:05 -0400 Subject: [I18n-sig] Re: [Python-Dev] Unicode Maintenance In-Reply-To: Your message of "Wed, 27 Jun 2001 12:24:30 EDT." <15162.2238.720508.508081@anthem.wooz.org> References: <3B39CD51.406C28F0@lemburg.com> <200106271611.f5RGBn819631@odiug.digicool.com> <15162.2238.720508.508081@anthem.wooz.org> Message-ID: <200106271636.f5RGa5719660@odiug.digicool.com> > GvR> BTW, I think that Misc/unicode.txt should be converted to a > GvR> PEP, for the historic record. It was very much a PEP before > GvR> the PEP process was invented. Barry, how much work would > GvR> this be? No editing needed, just formatting, and assignment > GvR> of a PEP number (the lower the better). > > Not much work at all, so I'll do this (and replace Misc/unicode.txt > with a pointer to the PEP). Let's go with PEP 7, but stick it under > the "Other Informational PEPs" category. > > -Barry Rather than informational, how about "Standard Track - Accepted (or Final)" ? That really matches the history best. I'd propose PEP number 100 -- the below-100 series is more for meta-PEPs. --Guido van Rossum (home page: http://www.python.org/~guido/) From barry@digicool.com Wed Jun 27 18:05:35 2001 From: barry@digicool.com (Barry A. Warsaw) Date: Wed, 27 Jun 2001 13:05:35 -0400 Subject: [I18n-sig] Re: [Python-Dev] Unicode Maintenance References: <3B39CD51.406C28F0@lemburg.com> <200106271611.f5RGBn819631@odiug.digicool.com> <15162.2238.720508.508081@anthem.wooz.org> <200106271636.f5RGa5719660@odiug.digicool.com> Message-ID: <15162.4703.741647.850696@anthem.wooz.org> >>>>> "GvR" == Guido van Rossum writes: GvR> Rather than informational, how about "Standard Track - GvR> Accepted (or Final)" ? That really matches the history best. GvR> I'd propose PEP number 100 -- the below-100 series is more GvR> for meta-PEPs. Fine with me. -Barry From fdrake@acm.org Wed Jun 27 20:45:05 2001 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Wed, 27 Jun 2001 15:45:05 -0400 (EDT) Subject: [Python-Dev] New profiling interface Message-ID: <15162.14273.490573.156770@cj42289-a.reston1.va.home.com> The new core interface I checked in allows profilers and tracers (debuggers, coverage tools) to be written in C. 
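The events involved are the same ones the Python-level hooks deliver today, so a pure-Python profiler is still the easiest way to see what such a tool receives (just a sketch of the existing sys.setprofile() interface, not of the new C entry points):

    import sys

    def profiler(frame, event, arg):
        # event is 'call', 'return' or 'exception';
        # arg depends on the event type
        print event, frame.f_code.co_name

    sys.setprofile(profiler)    # install the hook
    def f(): pass
    f()                         # triggers 'call' and 'return' events
    sys.setprofile(None)        # remove the hook

The point of the new C interface is to take that per-event Python function call out of the dispatch.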
I still need to write documentation for it; that shouldn't be too far off though. If anyone would like to have this available for Python 2.1.x, I have a version that I developed on the release20-maint branch. It can't be added to that branch since it's pretty clearly a new feature, but the patch is available at: http://starship.python.net/crew/fdrake/patches/py21-profiling.patch Enjoy! -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From mark.favas@csiro.au Wed Jun 27 22:45:17 2001 From: mark.favas@csiro.au (Mark Favas) Date: Thu, 28 Jun 2001 05:45:17 +0800 Subject: [Python-Dev] unicode, "const"s and lvalues Message-ID: <3B3A53ED.A8EEE265@csiro.au> Unreasonable as it may seem, my compiler really expects that entities declared as const's not be used in contexts where a modifiable lvalue is required. It gets all huffy, and refuses to continue compiling, even if I speak nicely (in unicode) to it. I'll file a bug report. On the code, not the compiler . cc -c -O -Olimit 1500 -Dss_family=__ss_family -Dss_len=__ss_len -I. -I./Include -DHAVE_CONFIG_H -o Objects/unicodectype.o Objects/unicodectype.c cc: Error: Objects/unicodectype.c, line 67: In this statement, "ch" has const-qualified type, but occurs in a context that requires a modifiable lvalue. (neednonconst) ch += ctype->title; --------^ cc: Error: Objects/unicodectype.c, line 69: In this statement, "ch" has const-qualified type, but occurs in a context that requires a modifiable lvalue. (neednonconst) ch += ctype->upper; --------^ cc: Error: Objects/unicodectype.c, line 74: In this statement, "ch" has const-qualified type, but occurs in a context that requires a modifiable lvalue. (neednonconst) ch -= 0x10000; --------^ cc: Error: Objects/unicodectype.c, line 362: In this statement, "ch" has const-qualified type, but occurs in a context that requires a modifiable lvalue. (neednonconst) ch += ctype->upper; ----^ cc: Error: Objects/unicodectype.c, line 366: In this statement, "ch" has const-qualified type, but occurs in a context that requires a modifiable lvalue. (neednonconst) ch -= 0x10000; --------^ cc: Error: Objects/unicodectype.c, line 378: In this statement, "ch" has const-qualified type, but occurs in a context that requires a modifiable lvalue. (neednonconst) ch += ctype->lower; ----^ cc: Error: Objects/unicodectype.c, line 382: In this statement, "ch" has const-qualified type, but occurs in a context that requires a modifiable lvalue. (neednonconst) ch -= 0x10000; --------^ make: *** [Objects/unicodectype.o] Error 1 -- Mark Favas - m.favas@per.dem.csiro.au CSIRO, Private Bag No 5, Wembley, Western Australia 6913, AUSTRALIA From guido@digicool.com Wed Jun 27 22:57:16 2001 From: guido@digicool.com (Guido van Rossum) Date: Wed, 27 Jun 2001 17:57:16 -0400 Subject: [Python-Dev] unicode, "const"s and lvalues In-Reply-To: Your message of "Thu, 28 Jun 2001 05:45:17 +0800." <3B3A53ED.A8EEE265@csiro.au> References: <3B3A53ED.A8EEE265@csiro.au> Message-ID: <200106272157.f5RLvGo20101@odiug.digicool.com> > Unreasonable as it may seem, my compiler really expects that entities > declared as const's not be used in contexts where a modifiable lvalue is > required. It gets all huffy, and refuses to continue compiling, even if > I speak nicely (in unicode) to it. I'll file a bug report. On the code, > not the compiler . VC++ also warns about this. I think the declaration of the Character Type APIs in unicodeobject.h really shouldn't include either register or char. Then their implementations should also lose the 'const'. 
--Guido van Rossum (home page: http://www.python.org/~guido/) From tim.one@home.com Wed Jun 27 22:58:34 2001 From: tim.one@home.com (Tim Peters) Date: Wed, 27 Jun 2001 17:58:34 -0400 Subject: [Python-Dev] unicode, "const"s and lvalues In-Reply-To: <3B3A53ED.A8EEE265@csiro.au> Message-ID: [Mark Favas] > Unreasonable as it may seem, my compiler really expects that entities > declared as const's not be used in contexts where a modifiable lvalue is > required. It gets all huffy, and refuses to continue compiling, even if > I speak nicely (in unicode) to it. I'll file a bug report. No real need, this was already brought up about 13 hours ago, although maybe that was only on the i18n-sig. I was left with the vague impression that Fredrik intended to fix it. If it's not fixed by tomorrow, you can make me feel guilty enough to fix it (I first reported it, so I guess it's my problem ). could've-been-yours!-ly y'rs - tim From fredrik@pythonware.com Wed Jun 27 23:42:14 2001 From: fredrik@pythonware.com (Fredrik Lundh) Date: Thu, 28 Jun 2001 00:42:14 +0200 Subject: [Python-Dev] unicode, "const"s and lvalues References: <3B3A53ED.A8EEE265@csiro.au> Message-ID: <00b701c0ff5a$6ab8f660$4ffa42d5@hagrid> mark wrote: > Unreasonable as it may seem, my compiler really expects that entities > declared as const's not be used in contexts where a modifiable lvalue is > required. it's fixed now, I think. (btw, unreasonable as it may seem, your mail server refuses to accept mail sent to your reply address, even if I speak nicely to it ;-) Cheers /F From fdrake@acm.org Thu Jun 28 03:44:54 2001 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Wed, 27 Jun 2001 22:44:54 -0400 (EDT) Subject: [Python-Dev] NIS on Linux, others? Message-ID: <15162.39462.75553.378146@cj42289-a.reston1.va.home.com> Is anyone here using NIS (Sun's old "Yellow Pages" service)? There's a bug for this on Linux that's been assigned to me for some time, but I don't have access to a network using NIS. Can anyone either confirm the bug or the fix? Or at least confirm that the suggested fix doesn't break the nis module on some other platform? (Testing this on a Sun SPARC box would be really nice!) I'd really appreciate some help on this one. The bug report is: http://sourceforge.net/tracker/index.php?func=detail&aid=233084&group_id=5470&atid=105470 Thanks! -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From thomas@xs4all.net Thu Jun 28 09:13:09 2001 From: thomas@xs4all.net (Thomas Wouters) Date: Thu, 28 Jun 2001 10:13:09 +0200 Subject: [Python-Dev] NIS on Linux, others? In-Reply-To: <15162.39462.75553.378146@cj42289-a.reston1.va.home.com> References: <15162.39462.75553.378146@cj42289-a.reston1.va.home.com> Message-ID: <20010628101309.X8098@xs4all.nl> On Wed, Jun 27, 2001 at 10:44:54PM -0400, Fred L. Drake, Jr. wrote: > Is anyone here using NIS (Sun's old "Yellow Pages" service)? > There's a bug for this on Linux that's been assigned to me for some > time, but I don't have access to a network using NIS. Can anyone > either confirm the bug or the fix? Or at least confirm that the > suggested fix doesn't break the nis module on some other platform? > (Testing this on a Sun SPARC box would be really nice!) > I'd really appreciate some help on this one. The bug report is: If noone else pops up, I'll setup a small NIS network at home to test it when my new computer arrives (a week or two.) We use NIS a lot at work, but not on Linux machines (the 16-bit uid limitation prevented us from using Linux for user-accessible machines for a long time.) 
-- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From mal@lemburg.com Thu Jun 28 10:04:07 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Thu, 28 Jun 2001 11:04:07 +0200 Subject: [Python-Dev] Unicode Maintenance References: <3B39CD51.406C28F0@lemburg.com> <200106271611.f5RGBn819631@odiug.digicool.com> Message-ID: <3B3AF307.6496AFB4@lemburg.com> Guido van Rossum wrote: > > > Looking at the recent burst of checkins for the Unicode implementation > > completely bypassing the standard SF procedure and possible comments > > I might have on the different approaches, I guess I've been ruled out > > as maintainer and designer of the Unicode implementation. > > > > Well, I guess that's how things go. Was nice working for you guys, > > but no longer is... I'm tired of having to defend myself against > > meta-comments about the design, uncontrolled checkins and no true > > backup about my standing in all this from Guido. > > > > Perhaps I am misunderstanding the role of a maintainer and > > implementation designer, but as it is all respect for the work I've > > put into all this seems faded. That's the conclusion I draw from recent > > postings by Martin and Fredrik and their nightly "takeover". > > > > Thanks, > > -- > > Marc-Andre Lemburg > > [For those of us to whom Marc-Andre's complaint comes as a total > surprise: there was a thread on i18n-sig about whether we should > support Unicode surrogates, followed by a conclusion to skip > surrogates and jump directly to optional support for UCS-4, followed > by some checkins that enabled a configuration choice between UCS-2 and > UCS-4, and code to make it work. As a side effect, surrogate support > in the UCS-2 version actually improved slightly.] > > Now, now, Marc-Andre. > > The only comments I recall from you on my "surrogates: just say no" > post seemed favorable, except that you proposed to to all the way and > make UCS-4 mandatory. I explained why I didn't want to go that far, > and why I didn't believe your arguments against giving users a choice. > I didn't hear back from you then, and I didn't think you could have > much of a problem with my position. > > Our process requires the use of the SF patch manager only for > controversial changes. Based on your feedback, I didn't think there > was anything controversial about the changes that Fredrik and Martin > have made! (If there was, IMO it was temporarily breaking the Windows > build and the test suite -- but that's all fixed now.) > > I don't understand where you get the idea that we lost respect for > your work! In fact, the fact that it was so easy to make the changes > suggested to me that the original design was well suited to this > particular change (as opposed to the surrugate support proposals, > which all sounded like they would require a *lot* of changes). > > I don't think that we have very strict roles in this community anyway. > (My role as BDFL excluded -- that's why I get to write this > response. :-) I'd say that Fredrik owns SRE, because he has asserted > that ownership at various times: he's undone changes by others that > broke the 1.5.2 support, for example. > > But the Unicode support in Python isn't owned by one person: many > folks have contributed to that, including Fredrik, who designed and > wrote the original Unicode string object implementation. > > If you have specific comments about the changes made, please be > specific. If you feel slighted by meta-comments, please also be > specific. 
I don't think I've said anything derogatory about you or > your design. You didn't get my point. I feel responsible for the Unicode implementation design and would like to see it become a continued success. In that sense and taking into account that I am the maintainer of all this stuff, I think it is very reasonable to ask me before making any significant changes to the implementation and also respect any comments I put forward. Currently, I have to watch the checkins list very closely to find out who changed what in the implementation and then to take actions only after the fact. Since I'm not supporting Unicode as my full-time job this is simply impossible. We have the SF manager and there is really no need to rush anything around here. If I am offline or too busy with other things for a day or two, then I want to see patches on SF and not find new versions of the implementation already checked in. This has worked just fine during the last year, so I can only explain the latest actions in this direction with an urge to bypass my comments and any discussion this might cause. Needless to say that quality control is not possible anymore. Conclusion: I am not going to continue this work if this does not change. Another problem for me is the continued hostility I feel on i18n against parts of the design and some of my decisions. I am not talking about your feedback and the feedback from many other people on the list which was excellent and of a high standard. But reading the postings of the last few months you will find notices of what I am referring to here (no, I don't want to be specific). If people don't respect my comments or decisions, then how can I defend the design and how can I stop endless discussions which simply don't lead anywhere? So either I am missing something or there is a need for a clear statement from you about my status in all this. If I don't have the right to comment on proposals and patches, possibly even rejecting them, then I simply don't see any ground for keeping the implementation in a state which I can maintain. And last but not least: The fun-factor has faded, which was the main motor driving me into working on Unicode in the first place. Nothing much you can do about this, though :-/ > Paul Prescod offered to write a PEP on this issue. My cynical half > believes that we'll never hear from him again, but my optimistic half > hopes that he'll actually write one, so that we'll be able to discuss > the various issues for the users with the users. I encourage you to > co-author the PEP, since you have a lot of background knowledge about > the issues. I guess your optimistic half won :-) I think Paul already did all the work, so I'll simply comment on what he wrote. > BTW, I think that Misc/unicode.txt should be converted to a PEP, for > the historic record. It was very much a PEP before the PEP process > was invented. Barry, how much work would this be? No editing needed, > just formatting, and assignment of a PEP number (the lower the better). Thanks for converting the text to PEP format, Barry.
Thanks for reading this far, -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From guido@digicool.com Thu Jun 28 13:25:14 2001 From: guido@digicool.com (Guido van Rossum) Date: Thu, 28 Jun 2001 08:25:14 -0400 Subject: [Python-Dev] Unicode Maintenance In-Reply-To: Your message of "Thu, 28 Jun 2001 11:04:07 +0200." <3B3AF307.6496AFB4@lemburg.com> References: <3B39CD51.406C28F0@lemburg.com> <200106271611.f5RGBn819631@odiug.digicool.com> <3B3AF307.6496AFB4@lemburg.com> Message-ID: <200106281225.f5SCPIr20874@odiug.digicool.com> Hi Marc-Andre, I'm dropping the i18n-sig from the distribution list. I hear you: > You didn't get my point. I feel responsible for the Unicode > implementation design and would like to see it become a continued > success. I'm sure we all share this goal! > In that sense and taking into account that I am the > maintainer of all this stuff, I think it is very reasonable to > ask me before making any significant changes to the implementation > and also respect any comments I put forward. I understand you feel that we've rushed this in without waiting for your comments. Given how close your implementation was, I still feel that the changes weren't that significant, but I understand that you get nervous. If Christian were to check in his speed hack changes to the guts of ceval.c I would be nervous too! (Heck, I got nervous when Eric checked in his library-wide string method changes without asking.) Next time I'll try to be more sensitive to situations that require your review before going forward. > Currently, I have to watch the checkins list very closely > to find out who changed what in the implementation and then to > take actions only after the fact. Since I'm not supporting Unicode > as my full-time job this is simply impossible. We have the SF manager > and there is really no need to rush anything around here. Hm, apart from the fact that you ought to be left in charge, I think that in this case the live checkins were a big win over the usual SF process. At least two people were making changes, sometimes to each other's code, and many others on at least three continents were checking out the changes on many different platforms and immediately reporting problems. We would definitely not have a patch as solid as the code that's now checked in, after two days of using SF! (We could've used a branch, but I've found that getting people to actually check out the branch is not easy.) So I think that the net result was favorable. Sometimes you just have to let people work in the spur of the moment to get the results of their best thinking, otherwise they lose interest or their train of thought. > If I am offline or too busy with other things for a day or two, > then I want to see patches on SF and not find new versions of > the implementation already checked in. That's still the general rule, but in our enthusiasm (and mine was definitely part of this!) we didn't want to wait. Also, I have to admit that I mistook your silence for consent -- I didn't think the main proposed changes (making the size of Py_UNICODE a config choice) were controversial at all, so I didn't realize you would have a problem with it. > This has worked just fine during the last year, so I can only explain > the latest actions in this direction with an urge to bypass my comments > and any discussion this might cause.
I think you're projecting your own stuff here. I honestly didn't think there was much disagreement on your part and thought we were doing you a favor by implementing the consensus. IMO, Martin and Fredrik are familiar enough with both the code and the issues to do a good job. > Needless to say that > quality control is not possible anymore. Unclear. Lots of other people looked over the changes in your absence. And CVS makes code review after it's checked in easy enough. (Hey, in many other open source projects that's the normal procedure once the rough characteristics of a feature have been agreed upon: check in first and review later!) > Conclusion: > I am not going to continue this work if this does not change. That would be sad, and I hope you will stay with us. We certainly don't plan to ignore your comments! > Another problem for me is the continued hostility I feel on i18n > against parts of the design and some of my decisions. I am > not talking about your feedback and the feedback from many other > people on the list which was excellent and of a high standard. > But reading the postings of the last few months you will > find notices of what I am referring to here (no, I don't want > to be specific). I don't know what to say about this, and obviously nobody has the time to go back and read the archives. I'm sure it's not you as a person that was attacked. If the design isn't perfect -- and hey, since Python is the 80 percent language, few things in it are quite perfect! -- then (positive) criticism is an attempt to help, to move it closer to perfection. If people have at times said "the Unicode support sucks", well, that may hurt. You can't always stay friends with everybody. I get flames occasionally for features in Python that folks don't like. I get used to them, and it doesn't affect my confidence any more. Be the same! But sometimes, after saying "it sucks", people make specific suggestions for improvements, and it's important to be open for those even from sources that use offending language. (Within reason, of course. I don't ask you to listen to somebody who is persistently hostile to you as a person.) > If people don't respect my comments or decisions, then how can > I defend the design and how can I stop endless discussions which > simply don't lead anywhere? So either I am missing something > or there is a need for a clear statement from you about > my status in all this. Do you really *want* to be the Unicode BDFL? Being something's BDFL is a full-time job, and you've indicated you're too busy. (Or is that temporary?) I see you as the original coder, which means that you know that section of the code better than anyone, and whenever there's a question that others can't answer about its design, implementation, or restrictions, I refer to you. But given that you've said you wouldn't be able to work much on it, I welcome contributions by others as long as they seem knowledgeable. > If I don't have the right to comment on proposals and patches, > possibly even rejecting them, then I simply don't see any > ground for keeping the implementation in a state which I can > maintain. Nobody said you couldn't comment, and you know that. When it comes to rejecting or accepting, I feel that I am still the final arbiter, even for Unicode, until I get hit by a bus.
Since I don't always understand the implementation or the issues, I'll of course defer to you in cases where I think I can't make the decision, but I do reserve the right to be convinced by others to override your judgement, occasionally, if there's a good reason. And when you're not responsive, I may try to channel you. (I'll try to be more explicit about that.) > And last but not least: The fun-factor has faded, which was > the main motor driving me into working on Unicode in the first > place. Nothing much you can do about this, though :-/ Yes, that happens to all of us at times. The fun factor goes up and down, and sometimes we must look for fun elsewhere for a while. Then the fun may come back where it appeared lost. Go on vacation, read a book, tackle a new project in a totally different area! Then come back and see if you can find some fun in the old stuff again. > > Paul Prescod offered to write a PEP on this issue. My cynical half > > believes that we'll never hear from him again, but my optimistic half > > hopes that he'll actually write one, so that we'll be able to discuss > > the various issues for the users with the users. I encourage you to > > co-author the PEP, since you have a lot of background knowledge about > > the issues. > > I guess your optimistic half won :-) I think Paul already did all the > work, so I'll simply comment on what he wrote. Your suggestions were very valuable. My opinion of Paul also went up a notch! > > BTW, I think that Misc/unicode.txt should be converted to a PEP, for > > the historic record. It was very much a PEP before the PEP process > > was invented. Barry, how much work would this be? No editing needed, > > just formatting, and assignment of a PEP number (the lower the better). > > Thanks for converting the text to PEP format, Barry. > > Thanks for reading this far, You're welcome, and likewise. Just one more thing, Marc-Andre. Please know that I respect your work very much even if we don't always agree. We would get by without you, but Python would be hurt if you turned your back on us. --Guido van Rossum (home page: http://www.python.org/~guido/) From arigo@ulb.ac.be Thu Jun 28 14:04:06 2001 From: arigo@ulb.ac.be (Armin Rigo) Date: Thu, 28 Jun 2001 15:04:06 +0200 (CEST) Subject: [Python-Dev] Python Specializing Compiler In-Reply-To: <3B393E92.B0719A7A@ulb.ac.be> Message-ID: On Tue, 26 Jun 2001, Armin Rigo wrote: > I am considering using GNU Lightning to produce code from the Psyco > compiler. I just found "vcode" (http://www.pdos.lcs.mit.edu/~engler/pldi96-abstract.html), which seems very interesting for portable JIT code generation. I am considering using it for Psyco. Does anyone have experience with vcode? Or any other comments? Armin. From gball@cfa.harvard.edu Thu Jun 28 16:26:36 2001 From: gball@cfa.harvard.edu (Greg Ball) Date: Thu, 28 Jun 2001 11:26:36 -0400 (EDT) Subject: [Python-Dev] NIS on Linux, others? Message-ID: Short version: I can confirm that bug under Linux, but the patch breaks the nis module on Solaris. Linux machine is: Linux malhar 2.2.16-3smp #1 SMP Mon Jun 19 17:37:04 EDT 2000 i686 unknown with a Python version from recent CVS. I see the reported bug and the suggested patch does fix the problem. Sparc box looks like this: SunOS cfa0 5.8 Generic_108528-06 sun4u sparc SUNW,Ultra-Enterprise using the Python 2.0 source tree. The nis module works out of the box, but applying the suggested patch breaks it: 'nis.error: No such key in map'.
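For anyone else who wants to try this on another platform, a minimal way to exercise the nis module from the interactive prompt (the map and key names here are only illustrative guesses -- substitute whatever your NIS server actually exports):

    import nis

    print nis.maps()                  # names of the maps the server exports
    print nis.cat('mail.aliases')     # dump one map as a dictionary
    print nis.match('postmaster', 'mail.aliases')   # single-key lookup

With the suggested patch applied on the Solaris box, the cat/match calls are where the 'nis.error: No such key in map' shows up.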
--Greg Ball From gregor@hoffleit.de Thu Jun 28 20:56:35 2001 From: gregor@hoffleit.de (Gregor Hoffleit) Date: Thu, 28 Jun 2001 21:56:35 +0200 Subject: [Python-Dev] MAGIC after 2001 ? Message-ID: <20010628215635.A5621@53b.hoffleit.de> Correct me, but AFAICS there are only 186 days left until Python's MAGIC scheme overflows: /* XXX Perhaps the magic number should be frozen and a version field added to the .pyc file header? */ /* New way to come up with the magic number: (YEAR-1995), MONTH, DAY */ #define MAGIC (60202 | ((long)'\r'<<16) | ((long)'\n'<<24)) I couldn't find this problem in the SF bug tracking system. Should I submit a new bug entry ? Gregor From jack@oratrix.nl Thu Jun 28 22:03:47 2001 From: jack@oratrix.nl (Jack Jansen) Date: Thu, 28 Jun 2001 23:03:47 +0200 Subject: [Python-Dev] Passing silly values to time.strftime Message-ID: <20010628210352.33157120260@oratrix.oratrix.nl> Just noted (that's Just-the-person, not me-just-noting:-) that on the Mac time.strftime() can blow up with an access violation if you pass silly values to it (such as 9 zeroes). Does anyone know enough of the ANSI standard to tell me how strftime should behave with out-of-range values? I.e. should I report this as a bug to MetroWerks or should we rig up time.strftime() to check that all the values are in range? -- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen@oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | ++++ see http://www.xs4all.nl/~tank/ ++++ From jack@oratrix.nl Thu Jun 28 22:12:45 2001 From: jack@oratrix.nl (Jack Jansen) Date: Thu, 28 Jun 2001 23:12:45 +0200 Subject: [Python-Dev] Passing silly values to time.strftime In-Reply-To: Message by Jack Jansen , Thu, 28 Jun 2001 23:03:47 +0200 , <20010628210352.33157120260@oratrix.oratrix.nl> Message-ID: <20010628211250.4A6BC120260@oratrix.oratrix.nl> Recently, Jack Jansen said: > Just noted (that's Just-the-person, not me-just-noting:-) that on the > Mac time.strftime() can blow up with an access violation if you pass > silly values to it (such as 9 zeroes). Following up to myself, after I just noticed (just-me-noticing, not Just-the-person this time) that all zeros is a legal C value: gettmarg() converts this all-zeroes tuple to (0, 0, 0, 0, -1, 100, 0, -1, 0) Fine with me, apparently Python wants to have human-understandable (1-based) monthnumbers and yeardaynumbers, but then I think it really should also check that the values are in-range. What do others think? -- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen@oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From Jason.Tishler@dothill.com Thu Jun 28 22:17:15 2001 From: Jason.Tishler@dothill.com (Jason Tishler) Date: Thu, 28 Jun 2001 17:17:15 -0400 Subject: [Python-Dev] Threaded Cygwin Python Import Problem Message-ID: <20010628171715.P488@dothill.com> --arHUdbnEaPHuuMIt Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Thanks to Rob Collins (implementer) and Greg Smith (profiler), Cygwin now provides enough pthreads support so that Cygwin Python builds OOTB *and* functions reasonably well even with threads enabled. Unfortunately, there are still a few issues that need to be resolved. The one that I would like to address in this posting prevents a threaded Cygwin Python from building the standard extension modules (without some kind of intervention). 
:,( Specifically, the build would frequently hang during the Distutils part when Cygwin Python is attempting to execvp a gcc process. See the first attachment, test.py, for a minimal Python script that exhibits the hang. See the second attachment, test.c, for a rewrite of test.py in C. Since test.c did not hang, I was able to conclude that this was not just a straight Cygwin problem. Further tracing uncovered that the hang occurs in _execvpe() (in os.py), when the child tries to import tempfile. If I apply the third attachment, os.py.patch, then the hang is avoided. Hence, it appears that importing a module (or specifically the tempfile module) in a threaded Cygwin Python child causes a hang. I saw the following comment in _execvpe(): # Process handling (fork, wait) under BeOS (up to 5.0) # doesn't interoperate reliably with the thread interlocking # that happens during an import. The actual error we need # is the same on BeOS for posix.open() et al., ENOENT. The above makes me think that possibly Cygwin is having a similar problem. Can anyone offer suggestions on how to further debug this problem? Thanks, Jason -- Jason Tishler Director, Software Engineering Phone: 732.264.8770 x235 Dot Hill Systems Corp. Fax: 732.264.8798 82 Bethany Road, Suite 7 Email: Jason.Tishler@dothill.com Hazlet, NJ 07730 USA WWW: http://www.dothill.com --arHUdbnEaPHuuMIt Content-Type: text/plain; charset=us-ascii Content-Disposition: attachment; filename="test.py" import os cmd = ['ls', '-l'] pid = os.fork() if pid == 0: print 'child execvp-ing' os.execvp(cmd[0], cmd) else: (pid, status) = os.waitpid(pid, 0) print 'status =', status print 'parent done' --arHUdbnEaPHuuMIt Content-Type: text/plain; charset=us-ascii Content-Disposition: attachment; filename="test.c" #include <stdio.h> #include <unistd.h> #include <sys/wait.h> /* header names reconstructed; the archive stripped the original #include targets */ char* const cmd[] = {"ls", "-l", 0}; int main() { int status; pid_t pid = fork(); if (pid == 0) { printf("child execvp-ing\n"); execvp(cmd[0], cmd); } else { waitpid(pid, &status, 0); printf("status = %d\n", status); printf("parent done\n"); } } --arHUdbnEaPHuuMIt Content-Type: text/plain; charset=us-ascii Content-Disposition: attachment; filename="os.py.patch" --- os.py.orig Thu Jun 28 16:14:28 2001 +++ os.py Thu Jun 28 16:30:12 2001 @@ -329,8 +329,9 @@ def _execvpe(file, args, env=None): try: unlink('/_#.# ## #.#') except error, _notfound: pass else: - import tempfile - t = tempfile.mktemp() + #import tempfile + #t = tempfile.mktemp() + t = '/mnt/c/TEMP/@279.3' # Exec a file that is guaranteed not to exist try: execv(t, ('blah',)) except error, _notfound: pass --arHUdbnEaPHuuMIt-- From tim@digicool.com Thu Jun 28 22:24:17 2001 From: tim@digicool.com (Tim Peters) Date: Thu, 28 Jun 2001 17:24:17 -0400 Subject: [Python-Dev] MAGIC after 2001 ? In-Reply-To: <20010628215635.A5621@53b.hoffleit.de> Message-ID: [Gregor Hoffleit] > Correct me, Can't: you're correct. > but AFAICS there are only 186 days left until Python's MAGIC scheme > overflows: > > /* XXX Perhaps the magic number should be frozen and a version field > added to the .pyc file header? */ > /* New way to come up with the magic number: (YEAR-1995), MONTH, DAY */ > #define MAGIC (60202 | ((long)'\r'<<16) | ((long)'\n'<<24)) > > I couldn't find this problem in the SF bug tracking system. Should I > submit a new bug entry ? Somebody should! It's a known problem, but the last crusade to redefine it ended up with 85% of a spec but no worker bees.
If that continues, note that it has no effect on whether existing Python releases will continue to run, it just means we can't release new versions -- but now that the licensing issue is settled, I think we'll just close down the project instead. fun-while-it-lasted-ly y'rs - tim From paulp@ActiveState.com Fri Jun 29 03:59:45 2001 From: paulp@ActiveState.com (Paul Prescod) Date: Thu, 28 Jun 2001 19:59:45 -0700 Subject: [Python-Dev] [Fwd: PEP: Support for "wide" Unicode characters] Message-ID: <3B3BEF21.63411C4C@ActiveState.com> Slow python-dev day...consider this exciting new proposal to deal with important new characters like the Japanese dentistry symbols and ecological symbols (but not Klingon) -------- Original Message -------- Subject: PEP: Support for "wide" Unicode characters Date: Thu, 28 Jun 2001 15:33:00 -0700 From: Paul Prescod Organization: ActiveState To: "python-list@python.org" PEP: 261 Title: Support for "wide" Unicode characters Version: $Revision: 1.3 $ Author: paulp@activestate.com (Paul Prescod) Status: Draft Type: Standards Track Created: 27-Jun-2001 Python-Version: 2.2 Post-History: 27-Jun-2001, 28-Jun-2001 Abstract Python 2.1 unicode characters can have ordinals only up to 2**16 - 1. These characters are known as Basic Multilingual Plane characters. There are now characters in Unicode that live on other "planes". The largest addressable character in Unicode has the ordinal 17 * 2**16 - 1 (0x10ffff). For readability, we will call this TOPCHAR and call characters in this range "wide characters". Glossary Character Used by itself, means the addressable units of a Python Unicode string. Code point If you imagine Unicode as a mapping from integers to characters, each integer represents a code point. Some are really used for characters. Some will someday be used for characters. Some are guaranteed never to be used for characters. Unicode character A code point defined in the Unicode standard whether it is already assigned or not. Identified by an integer. Code unit An integer representing a character in some encoding. Surrogate pair Two code units that represent a single Unicode character. Proposed Solution One solution would be to merely increase the maximum ordinal to a larger value. Unfortunately the only straightforward implementation of this idea is to increase the character code unit to 4 bytes. This has the effect of doubling the size of most Unicode strings. In order to avoid imposing this cost on every user, Python 2.2 will allow 4-byte Unicode characters as a build-time option. Users can choose whether they care about wide characters or prefer to preserve memory. The 4-byte option is called "wide Py_UNICODE". The 2-byte option is called "narrow Py_UNICODE". Most things will behave identically in the wide and narrow worlds. * unichr(i) for 0 <= i < 2**16 (0x10000) always returns a length-one string. * unichr(i) for 2**16 <= i <= TOPCHAR will return a length-one string representing the character on wide Python builds. On narrow builds it will raise ValueError. ISSUE: Python currently allows \U literals that cannot be represented as a single character. It generates two characters known as a "surrogate pair". Should this be disallowed on future narrow Python builds? ISSUE: Should Python allow the construction of characters that do not correspond to Unicode characters? Unassigned Unicode characters should obviously be legal (because they could be assigned at any time). But code points above TOPCHAR are guaranteed never to be used by Unicode.
Should we allow access to them anyhow? * ord() is always the inverse of unichr() * There is an integer value in the sys module that describes the largest ordinal for a Unicode character on the current interpreter. sys.maxunicode is 2**16-1 (0xffff) on narrow builds of Python and TOPCHAR on wide builds. ISSUE: Should there be distinct constants for accessing TOPCHAR and the real upper bound for the domain of unichr (if they differ)? There has also been a suggestion of sys.unicodewidth which can take the values 'wide' and 'narrow'. * codecs will be upgraded to support "wide characters" (represented directly in UCS-4, as surrogate pairs in UTF-16 and as multi-byte sequences in UTF-8). On narrow Python builds, the codecs will generate surrogate pairs, on wide Python builds they will generate a single character. This is the main part of the implementation left to be done. * there are no restrictions on constructing strings that use code points "reserved for surrogates" improperly. These are called "isolated surrogates". The codecs should disallow reading these but you could construct them using string literals or unichr(). unichr() is not restricted to values less than either TOPCHAR or sys.maxunicode. Implementation There is a new (experimental) define: #define PY_UNICODE_SIZE 2 There are new configure options: --enable-unicode=ucs2 configures a narrow Py_UNICODE, and uses wchar_t if it fits --enable-unicode=ucs4 configures a wide Py_UNICODE, and uses wchar_t if it fits --enable-unicode same as "=ucs2" The intention is that --disable-unicode, or --enable-unicode=no removes the Unicode type altogether; this is not yet implemented. Notes This PEP does NOT imply that people using Unicode need to use a 4-byte encoding. It only allows them to do so. For example, ASCII is still a legitimate (7-bit) Unicode-encoding. Rationale for Surrogate Creation Behaviour Python currently supports the construction of a surrogate pair for a large unicode literal character escape sequence. This is basically designed as a simple way to construct "wide characters" even in a narrow Python build. ISSUE: surrogates can be created this way but the user still needs to be careful about slicing, indexing, printing etc. Another option is to remove knowledge of surrogates from everything other than the codecs. Rejected Suggestions There were two primary solutions that were rejected. The first was more or less the status-quo. We could officially say that Python characters represent UTF-16 code units and require programmers to implement wide characters in their application logic. This is a heavy burden because emulating 32-bit characters is likely to be very inefficient if it is coded entirely in Python. Plus these abstracted pseudo-strings would not be legal as input to the regular expression engine. The other class of solution is to use some efficient storage internally but present an abstraction of wide characters to the programmer. Any of these would require a much more complex implementation than the accepted solution. For instance consider the impact on the regular expression engine. In theory, we could move to this implementation in the future without breaking Python code. A future Python could "emulate" wide Python semantics on narrow Python. Copyright This document has been placed in the public domain. Local Variables: mode: indented-text indent-tabs-mode: nil End: -- http://mail.python.org/mailman/listinfo/python-list From fdrake@acm.org Fri Jun 29 15:03:28 2001 From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Fri, 29 Jun 2001 10:03:28 -0400 (EDT) Subject: [Python-Dev] NIS on Linux, others? In-Reply-To: References: Message-ID: <15164.35504.255476.169687@cj42289-a.reston1.va.home.com> Greg Ball writes: > Short version: I can confirm that bug under linux, but the patch breaks > nis module on solaris. I'm presuming that these were using the same NIS server? I'm wondering if this may be an endianess-related problem. I don't understand enough about the NIS protocols to know what's going on in that module. -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From mal@egenix.com Fri Jun 29 15:51:04 2001 From: mal@egenix.com (M.-A. Lemburg) Date: Fri, 29 Jun 2001 16:51:04 +0200 Subject: [Python-Dev] Support for "wide" Unicode characters References: <3B3BEF21.63411C4C@ActiveState.com> Message-ID: <3B3C95D8.518E5175@egenix.com> Paul Prescod wrote: > > Slow python-dev day...consider this exiting new proposal to allow deal > with important new characters like the Japanese dentristy symbols and > ecological symbols (but not Klingon) More comments... > -------- Original Message -------- > Subject: PEP: Support for "wide" Unicode characters > Date: Thu, 28 Jun 2001 15:33:00 -0700 > From: Paul Prescod > Organization: ActiveState > To: "python-list@python.org" > > PEP: 261 > Title: Support for "wide" Unicode characters > Version: $Revision: 1.3 $ > Author: paulp@activestate.com (Paul Prescod) > Status: Draft > Type: Standards Track > Created: 27-Jun-2001 > Python-Version: 2.2 > Post-History: 27-Jun-2001, 28-Jun-2001 > > Abstract > > Python 2.1 unicode characters can have ordinals only up to 2**16-1. > These characters are known as Basic Multilinual Plane characters. > There are now characters in Unicode that live on other "planes". > The largest addressable character in Unicode has the ordinal 17 * > 2**16 - 1 (0x10ffff). For readability, we will call this TOPCHAR > and call characters in this range "wide characters". > > Glossary > > Character > > Used by itself, means the addressable units of a Python > Unicode string. > > Code point > > If you imagine Unicode as a mapping from integers to > characters, each integer represents a code point. Some are > really used for characters. Some will someday be used for > characters. Some are guaranteed never to be used for > characters. > > Unicode character > > A code point defined in the Unicode standard whether it is > already assigned or not. Identified by an integer. You're mixing terms here: being a character in Unicode is a property which is defined by the Unicode specs; not all code points are characters ! I'd suggest not to use the term character in this PEP at all; this is also what Mark Davis recommends in his paper on Unicode. That way people reading the PEP won't even start to confuse things since they will most likely have to read this glossary to understand what code point and code units are. Also, a link to the Unicode glossary would be a good thing. > Code unit > > An integer representing a character in some encoding. A code unit is the basic storage unit used by Unicode strings, e.g. u[0], not necessarily a character. > Surrogate pair > > Two code units that represnt a single Unicode character. Please add Unicode string A sequence of code units. and a note that on wide builds: code unit == code point. > Proposed Solution > > One solution would be to merely increase the maximum ordinal to a > larger value. Unfortunately the only straightforward > implementation of this idea is to increase the character code unit > to 4 bytes. 
This has the effect of doubling the size of most > Unicode strings. In order to avoid imposing this cost on every > user, Python 2.2 will allow 4-byte Unicode characters as a > build-time option. Users can choose whether they care about > wide characters or prefer to preserve memory. > > The 4-byte option is called "wide Py_UNICODE". The 2-byte option > is called "narrow Py_UNICODE". > > Most things will behave identically in the wide and narrow worlds. > > * unichr(i) for 0 <= i < 2**16 (0x10000) always returns a > length-one string. > > * unichr(i) for 2**16 <= i <= TOPCHAR will return a > length-one string representing the character on wide Python > builds. On narrow builds it will return ValueError. > > ISSUE: Python currently allows \U literals that cannot be > represented as a single character. It generates two > characters known as a "surrogate pair". Should this be > disallowed on future narrow Python builds? Why not make the codec used by Python to convert Unicode literals to Unicode strings an option just like the default encoding ? That way we could have a version of the unicode-escape codec which supports surrogates and one which doesn't. > ISSUE: Should Python allow the construction of characters > that do not correspond to Unicode characters? > Unassigned Unicode characters should obviously be legal > (because they could be assigned at any time). But > code points above TOPCHAR are guaranteed never to > be used by Unicode. Should we allow access to them > anyhow? I wouldn't count on that last point ;-) Please note that you are mixing terms: you don't construct characters, you construct code points. Whether the concatenation of these code points makes a valid Unicode character string is an issue which applications and codecs have to decide. > * ord() is always the inverse of unichr() > > * There is an integer value in the sys module that describes the > largest ordinal for a Unicode character on the current > interpreter. sys.maxunicode is 2**16-1 (0xffff) on narrow builds > of Python and TOPCHAR on wide builds. > > ISSUE: Should there be distinct constants for accessing > TOPCHAR and the real upper bound for the domain of > unichr (if they differ)? There has also been a > suggestion of sys.unicodewith which can take the > values 'wide' and 'narrow'. > > * codecs will be upgraded to support "wide characters" > (represented directly in UCS-4, as surrogate pairs in UTF-16 and > as multi-byte sequences in UTF-8). On narrow Python builds, the > codecs will generate surrogate pairs, on wide Python builds they > will generate a single character. This is the main part of the > implementation left to be done. > > * there are no restrictions on constructing strings that use > code points "reserved for surrogates" improperly. These are > called "isolated surrogates". The codecs should disallow reading > these but you could construct them using string literals or > unichr(). unichr() is not restricted to values less than either > TOPCHAR nor sys.maxunicode. > > Implementation > > There is a new (experimental) define: > > #define PY_UNICODE_SIZE 2 > > There is a new configure options: > > --enable-unicode=ucs2 configures a narrow Py_UNICODE, and uses > wchar_t if it fits > --enable-unicode=ucs4 configures a wide Py_UNICODE, and uses > whchar_t if it fits > --enable-unicode same as "=ucs2" > > The intention is that --disable-unicode, or --enable-unicode=no > removes the Unicode type altogether; this is not yet implemented. 
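To illustrate what the two configure options above buy you, here is a quick runtime check (a sketch against the proposed 2.2 API only -- sys.maxunicode and this unichr() behavior are what the PEP specifies, not what any released Python does yet):

    import sys

    if sys.maxunicode == 0xFFFF:
        print "narrow build: 2-byte code units"
    else:
        print "wide build: 4-byte code units"

    try:
        c = unichr(0x10000)     # first code point beyond the BMP
        print "unichr(0x10000) is a length-%d string" % len(c)
    except ValueError:
        print "narrow build: unichr() refuses code points above 0xFFFF"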
> > Notes > > This PEP does NOT imply that people using Unicode need to use a > 4-byte encoding. It only allows them to do so. For example, > ASCII is still a legitimate (7-bit) Unicode-encoding. > > Rationale for Surrogate Creation Behaviour > > Python currently supports the construction of a surrogate pair > for a large unicode literal character escape sequence. This is > basically designed as a simple way to construct "wide characters" > even in a narrow Python build. > > ISSUE: surrogates can be created this way but the user still > needs to be careful about slicing, indexing, printing > etc. Another option is to remove knowledge of > surrogates from everything other than the codecs. +1 on removing knowledge about surrogates from the Unicode implementation core (it's also the easiest: there is none :-) We should provide a new module which provides a few handy utilities though: functions which provide code point-, character-, word- and line- based indexing into Unicode strings. > Rejected Suggestions > > There were two primary solutions that were rejected. The first was > more or less the status-quo. We could officially say that Python > characters represent UTF-16 code units and require programmers to > implement wide characters in their application logic. This is a > heavy burden because emulating 32-bit characters is likely to be > very inefficient if it is coded entirely in Python. Plus these > abstracted pseudo-strings would not be legal as input to the > regular expression engine. > > The other class of solution is to use some efficient storage > internally but present an abstraction of wide characters > to the programmer. Any of these would require a much more complex > implementation than the accepted solution. For instance consider > the impact on the regular expression engine. In theory, we could > move to this implementation in the future without breaking Python > code. A future Python could "emulate" wide Python semantics on > narrow Python. > > Copyright > > This document has been placed in the public domain. -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From jepler@inetnebr.com Fri Jun 29 16:04:18 2001 From: jepler@inetnebr.com (Jeff Epler) Date: Fri, 29 Jun 2001 10:04:18 -0500 Subject: [Python-Dev] NIS on Linux, others? In-Reply-To: <15164.35504.255476.169687@cj42289-a.reston1.va.home.com>; from fdrake@acm.org on Fri, Jun 29, 2001 at 10:03:28AM -0400 References: <15164.35504.255476.169687@cj42289-a.reston1.va.home.com> Message-ID: <20010629100416.A24069@inetnebr.com> On Fri, Jun 29, 2001 at 10:03:28AM -0400, Fred L. Drake, Jr. wrote: > > Greg Ball writes: > > Short version: I can confirm that bug under linux, but the patch breaks > > nis module on solaris. > > I'm presuming that these were using the same NIS server? I'm > wondering if this may be an endianess-related problem. I don't > understand enough about the NIS protocols to know what's going on in > that module. It's my suspicion that it depends how the "aliases" map is built. The patch that "broke" things for the Linux systems includes the comment /* created with 'makedbm -a' */ which makes me suspect that it's dependant on the way the map is constructed. (I couldn't find an online makedbm manpage which documents a -a option) Endian issues should not exist, the protocol below NIS/YP takes care of this. 
Jeff From guido@digicool.com Fri Jun 29 16:24:56 2001 From: guido@digicool.com (Guido van Rossum) Date: Fri, 29 Jun 2001 11:24:56 -0400 Subject: [Python-Dev] Support for "wide" Unicode characters In-Reply-To: Your message of "Fri, 29 Jun 2001 16:51:04 +0200." <3B3C95D8.518E5175@egenix.com> References: <3B3BEF21.63411C4C@ActiveState.com> <3B3C95D8.518E5175@egenix.com> Message-ID: <200106291525.f5TFP0H29410@odiug.digicool.com> > I'd suggest not to use the term character in this PEP at all; > this is also what Mark Davis recommends in his paper on Unicode. I like this idea! I know that I *still* have a hard time not to think "C 'char' datatype, i.e. an 8-bit byte" when I read "character"... > Why not make the codec used by Python to convert Unicode > literals to Unicode strings an option just like the default > encoding ? > > That way we could have a version of the unicode-escape codec > which supports surrogates and one which doesn't. Smart idea, but how practical is this? Can you spec this out a bit more? > +1 on removing knowledge about surrogates from the Unicode > implementation core (it's also the easiest: there is none :-) Except for \U currently -- or is that not part of the implementation core? > We should provide a new module which provides a few handy > utilities though: functions which provide code point-, > character-, word- and line- based indexing into Unicode > strings. But its design is outside the scope of this PEP, I'd say. --Guido van Rossum (home page: http://www.python.org/~guido/) From paulp@ActiveState.com Sat Jun 30 02:16:25 2001 From: paulp@ActiveState.com (Paul Prescod) Date: Fri, 29 Jun 2001 18:16:25 -0700 Subject: [Python-Dev] Support for "wide" Unicode characters References: <3B3BEF21.63411C4C@ActiveState.com> <3B3C95D8.518E5175@egenix.com> Message-ID: <3B3D2869.5C1DDCF1@ActiveState.com> "M.-A. Lemburg" wrote: > >... > > I'd suggest not to use the term character in this PEP at all; > this is also what Mark Davis recommends in his paper on Unicode. That's fine, but Python does have a concept of character and I'm going to use the term character for discussing these. > Also, a link to the Unicode glossary would be a good thing. Funny how these little PEPs grow... >... > Why not make the codec used by Python to convert Unicode > literals to Unicode strings an option just like the default > encoding ? > > That way we could have a version of the unicode-escape codec > which supports surrogates and one which doesn't. Adding more and more knobs to tweak just adds up to Python code being non-portable from one machine to another. > > ISSUE: Should Python allow the construction of characters > > that do not correspond to Unicode characters? > > Unassigned Unicode characters should obviously be legal > > (because they could be assigned at any time). But > > code points above TOPCHAR are guaranteed never to > > be used by Unicode. Should we allow access to them > > anyhow? > > I wouldn't count on that last point ;-) > > Please note that you are mixing terms: you don't construct > characters, you construct code points. Whether the concatenation > of these code points makes a valid Unicode character string > is an issue which applications and codecs have to decide. unichr() does not construct code points. It constructs 1-char Python Unicode strings...also known as Python Unicode characters. > ... Whether the concatenation > of these code points makes a valid Unicode character string > is an issue which applications and codecs have to decide. 
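To make the code-unit pitfall concrete: on a narrow build a single character beyond the BMP is stored as two code units, and the string operations see the units, not the character (a sketch of current narrow-build behavior):

    s = u"\U00010000"      # one "wide character", two code units
    print len(s)           # 2 on a narrow build, 1 on a wide build
    print hex(ord(s[0]))   # 0xd800, the high surrogate
    print hex(ord(s[1]))   # 0xdc00, the low surrogate
    broken = s[:1]         # naive slicing strands an isolated surrogate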
The concatenation of true code points would *always* make a valid Unicode string, right? It's code units that cannot be blindly concatenated. >... > We should provide a new module which provides a few handy > utilities though: functions which provide code point-, > character-, word- and line- based indexing into Unicode > strings. Okay, I'll add: It has been proposed that there should be a module for working with UTF-16 strings in narrow Python builds through some sort of abstraction that handles surrogates for you. If someone wants to implement that, it will be another PEP. -- Take a recipe. Leave a recipe. Python Cookbook! http://www.ActiveState.com/pythoncookbook From mwh@python.net Sat Jun 30 10:32:34 2001 From: mwh@python.net (Michael Hudson) Date: 30 Jun 2001 10:32:34 +0100 Subject: [Python-Dev] Support for "wide" Unicode characters In-Reply-To: Paul Prescod's message of "Fri, 29 Jun 2001 18:16:25 -0700" References: <3B3BEF21.63411C4C@ActiveState.com> <3B3C95D8.518E5175@egenix.com> <3B3D2869.5C1DDCF1@ActiveState.com> Message-ID: Paul Prescod writes: > "M.-A. Lemburg" wrote: > > I'd suggest not to use the term character in this PEP at all; > > this is also what Mark Davis recommends in his paper on Unicode. > > That's fine, but Python does have a concept of character and I'm going > to use the term character for discussing these. As a Unicode Idiot (tm) can I please beg you to reconsider? There are so many possible meanings for "character" that I really think it's best to avoid the word altogether. Call Python characters "length 1 strings" or even "length 1 Python strings". [...] > > Please note that you are mixing terms: you don't construct > > characters, you construct code points. Whether the concatenation > > of these code points makes a valid Unicode character string > > is an issue which applications and codecs have to decide. > > unichr() does not construct code points. It constructs 1-char Python > Unicode strings This is what I think you should be saying. > ...also known as Python Unicode characters. Which I'm suggesting you forget! Cheers, M. -- I'm a keen cyclist and I stop at red lights. Those who don't need hitting with a great big slapping machine. -- Colin Davidson, cam.misc From paulp@ActiveState.com Sat Jun 30 12:28:28 2001 From: paulp@ActiveState.com (Paul Prescod) Date: Sat, 30 Jun 2001 04:28:28 -0700 Subject: [Python-Dev] Support for "wide" Unicode characters References: <3B3BEF21.63411C4C@ActiveState.com> <3B3C95D8.518E5175@egenix.com> <3B3D2869.5C1DDCF1@ActiveState.com> Message-ID: <3B3DB7DC.511A3D8@ActiveState.com> Michael Hudson wrote: > >... > > As a Unicode Idiot (tm) can I please beg you to reconsider? There are > so many possible meanings for "character" that I really think it's > best to avoid the word altogether. Call Python characters "length 1 > strings" or even "length 1 Python strings". Do you really feel that there are many possible meanings for the word "Python Unicode character?" This is a PEP: I have to assume a certain degree of common understanding. -- Take a recipe. Leave a recipe. Python Cookbook! http://www.ActiveState.com/pythoncookbook From mal@egenix.com Sat Jun 30 12:52:38 2001 From: mal@egenix.com (M.-A. Lemburg) Date: Sat, 30 Jun 2001 13:52:38 +0200 Subject: [Python-Dev] Support for "wide" Unicode characters References: <3B3BEF21.63411C4C@ActiveState.com> <3B3C95D8.518E5175@egenix.com> <3B3D2869.5C1DDCF1@ActiveState.com> Message-ID: <3B3DBD86.81F80D06@egenix.com> Paul Prescod wrote: > > "M.-A. Lemburg" wrote: > > > >... 
> > > > I'd suggest not to use the term character in this PEP at all; > > this is also what Mark Davis recommends in his paper on Unicode. > > That's fine, but Python does have a concept of character and I'm going > to use the term character for discussing these. The term "character" in Python should really only be used for the 8-bit strings. In Unicode a "character" can mean any of: """ Unfortunately the term character is vastly overloaded. At various times people can use it to mean any of these things: - An image on paper (glyph) - What an end-user thinks of as a character (grapheme) - What a character encoding standard encodes (code point) - A memory storage unit in a character encoding (code unit) Because of this, ironically, it is best to avoid the use of the term character entirely when discussing character encodings, and stick to the term code point. """ Taken from Mark Davis' paper: http://www-106.ibm.com/developerworks/unicode/library/utfencodingforms/ > > Also, a link to the Unicode glossary would be a good thing. > > Funny how these little PEPs grow... Is that a problem ? The Unicode glossary is very useful in providing a common base for understanding the different terms and tries very hard to avoid ambiguity in meaning. This discussion is partly caused by exactly these different understanding of the terms used in the PEP. I will update the Unicode PEP to the Unicode terminology too. > >... > > Why not make the codec used by Python to convert Unicode > > literals to Unicode strings an option just like the default > > encoding ? > > > > That way we could have a version of the unicode-escape codec > > which supports surrogates and one which doesn't. > > Adding more and more knobs to tweak just adds up to Python code being > non-portable from one machine to another. Not necessarily so; I'll write a more precise spec next week. The idea is to put the codec information into the Python source code, so that it is bound to the literals that way with the result of the Python source code being portable across platforms. Currently this is just an idea and still have to check how far this can go... > > > ISSUE: Should Python allow the construction of characters > > > that do not correspond to Unicode characters? > > > Unassigned Unicode characters should obviously be legal > > > (because they could be assigned at any time). But > > > code points above TOPCHAR are guaranteed never to > > > be used by Unicode. Should we allow access to them > > > anyhow? > > > > I wouldn't count on that last point ;-) > > > > Please note that you are mixing terms: you don't construct > > characters, you construct code points. Whether the concatenation > > of these code points makes a valid Unicode character string > > is an issue which applications and codecs have to decide. > > unichr() does not construct code points. It constructs 1-char Python > Unicode strings...also known as Python Unicode characters. > > > ... Whether the concatenation > > of these code points makes a valid Unicode character string > > is an issue which applications and codecs have to decide. > > The concatenation of true code points would *always* make a valid > Unicode string, right? It's code units that cannot be blindly > concatenated. Both wrong :-) U+D800 is a valid Unicode code point and can occur as code unit in both narrow and wide builds. Concatenating this with e.g. 
U+0020 will still make it a valid Unicode code point sequence (aka Unicode object), but not a valid Unicode character string (since the U+D800 is not a character). The same is true for e.g. U+FFFF. Note that the Unicode type should happily store these values, while the codecs complain. As a result and like I said above, dealing with these problems is left to the applications which use these Unicode objects. > >... > > We should provide a new module which provides a few handy > > utilities though: functions which provide code point-, > > character-, word- and line- based indexing into Unicode > > strings. > > Okay, I'll add: > > It has been proposed that there should be a module for working > with UTF-16 strings in narrow Python builds through some sort of > abstraction that handles surrogates for you. If someone wants > to implement that, it will be another PEP. Uhm, narrow builds don't support UTF-16... it's UCS-2 which is supported (basically: store everything in range(0x10000)); the codecs can map code points to surrogates, but it is solely their responsibility and the responsibility of the application using them to take care of dealing with surrogates. Also, the module will be useful for both narrow and wide builds, since the notion of an encoded character can involve multiple code points. In that sense Unicode is always a variable length encoding for characters and that's the application field of this module. Here's the adjusted text: It has been proposed that there should be a module for working with Unicode objects using character-, word- and line- based indexing. The details of the implementation is left to another PEP. -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From bckfnn@worldonline.dk Sat Jun 30 14:07:55 2001 From: bckfnn@worldonline.dk (Finn Bock) Date: Sat, 30 Jun 2001 13:07:55 GMT Subject: [Python-Dev] Corrupt Jython CVS (off topic). Message-ID: <3b3dccf6.26562024@mail.wanadoo.dk> A week ago I posted this on jython-dev, but no-one was able to give any advise on the best way to fix it. Maybe you can help. For some time now, our [jython] web CVS have not worked correctly: http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/jython/jython/org/python/core/ Finally I managed to track the problem to the Java2Accessibility.py,v file in the CVS repository. The "rlog" command cannot be executed on this file. >From the start of the Java2Accessibility.py,v: head 2.4; access; symbols Release_2_1alpha1:2.4 Release_2_0:2.2 Release_2_0rc1:2.2 Release_2_0beta2:2.2 Release_2_0beta1:2.2 Release_2_0alpha3:2.2 Release_2_0alpha2:2.2 Release_2_0alpha1:2.2 Release_1_1rc1:2.2 Release_1_1beta4:2.2 Release_1_1beta3:2.2 2.0:1.1.0.2; locks; strict; As an experiment, I tried to remove the strange "2.0:1.1.0.2;" line from the file and then I could run rlog on the file. Does anyone know if/how we can fix this? As a last resort I suppose I can attach my hand edited version to a SF support request where I ask them to copy my file to the CVS server. To this day I have never been very successful whenever I have tried to edit files in a CVS repository so I'm reluctant to do this. 
regards, finn From nhv@cape.com Sat Jun 30 14:16:48 2001 From: nhv@cape.com (Norman Vine) Date: Sat, 30 Jun 2001 09:16:48 -0400 Subject: [Python-Dev] RE: Threaded Cygwin Python Import Problem In-Reply-To: <20010628171715.P488@dothill.com> Message-ID: <015601c10166$eb79bb00$a300a8c0@nhv> Jason Tishler > >Thanks to Rob Collins (implementer) and Greg Smith (profiler), Cygwin now >provides enough pthreads support so that Cygwin Python builds OOTB *and* >functions reasonably well even with threads enabled. Unfortunately, >there are still a few issues that need to be resolved. > >The one that I would like to address in this posting prevents a threaded >Cygwin Python from building the standard extension modules (without some >kind of intervention). :,( Specifically, the build would frequently >hang during the Distutils part when Cygwin Python is attempting to execvp >a gcc process. > >See the first attachment, test.py, for a minimal Python script that >exhibits the hang. See the second attachment, test.c, for a rewrite >of test.py in C. Since test.c did not hang, I was able to conclude that >this was not just a straight Cygwin problem. > >Further tracing uncovered that the hang occurs in _execvpe() (in os.py), >when the child tries to import tempfile. If I apply the third >attachment, >os.py.patch, then the hang is avoided. Hence, it appears that importing a >module (or specifically the tempfile module) in a threaded Cygwin Python >child cause a hang. > >I saw the following comment in _execvpe(): > > # Process handling (fork, wait) under BeOS (up to 5.0) > # doesn't interoperate reliably with the thread interlocking > # that happens during an import. The actual error we need > # is the same on BeOS for posix.open() et al., ENOENT. > >The above makes me think that possibly Cygwin is having a >similar problem. > >Can anyone offer suggestions on how to further debug this problem? I was experiencing the same problems as Jason with Win2k sp1 and had used the same work-around successfully. < I believe Jason is working with NT 4.0 sp 5 > Curiously after applying the Win2k sp2 I no longer need to do this and the original Python code works fine. Leading me to believe that this may be but a symptom of a another Windows mystery. Regards Norman Vine From aahz@rahul.net Sat Jun 30 15:15:24 2001 From: aahz@rahul.net (Aahz Maruch) Date: Sat, 30 Jun 2001 07:15:24 -0700 (PDT) Subject: [Python-Dev] Support for "wide" Unicode characters In-Reply-To: <3B3DB7DC.511A3D8@ActiveState.com> from "Paul Prescod" at Jun 30, 2001 04:28:28 AM Message-ID: <20010630141524.E029999C80@waltz.rahul.net> Paul Prescod wrote: > Michael Hudson wrote: >> >>... >> >> As a Unicode Idiot (tm) can I please beg you to reconsider? There are >> so many possible meanings for "character" that I really think it's >> best to avoid the word altogether. Call Python characters "length 1 >> strings" or even "length 1 Python strings". > > Do you really feel that there are many possible meanings for the word > "Python Unicode character?" This is a PEP: I have to assume a certain > degree of common understanding. After reading Michael's and MA's arguments, I'm +1 on making the change they're requesting. But what really triggered my posting this was your use of the phrase "common understanding"; IME, Python's "explicit is better than implicit" rule is truly critical in documentation. Particularly if "character" has been deprecated in standard Unicode documentation, I think sticking to a common vocabulary makes more sense. 
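A short interpreter session makes the distinction this thread keeps circling concrete. This is a sketch assuming a narrow (UCS-2) build of CPython, where \U literals expand to surrogate pairs; on a wide build the length would be 1:

    >>> len(u"\U00010000")        # two code units (a surrogate pair)...
    2
    >>> u"\U00010000"[0]          # ...each of them itself a "length 1 string"
    u'\ud800'

And a sketch of the kind of surrogate-aware helper the proposed module could offer on a narrow build (the name codepoints is invented for illustration):

    def codepoints(u):
        """Return the true code points of u, pairing up surrogates."""
        result = []
        i, n = 0, len(u)
        while i < n:
            c = ord(u[i])
            if (0xD800 <= c < 0xDC00 and i + 1 < n
                and 0xDC00 <= ord(u[i + 1]) < 0xE000):
                # fold a high/low surrogate pair into one code point
                result.append(0x10000 + ((c - 0xD800) << 10)
                              + (ord(u[i + 1]) - 0xDC00))
                i = i + 2
            else:
                result.append(c)
                i = i + 1
        return result

    >>> codepoints(u"\U00010000" + u" ")
    [65536, 32]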
-- --- Aahz (@pobox.com) Hugs and backrubs -- I break Rule 6 <*> http://www.rahul.net/aahz/ Androgynous poly kinky vanilla queer het Pythonista I don't really mind a person having the last whine, but I do mind someone else having the last self-righteous whine. From Jason.Tishler@dothill.com Sat Jun 30 16:20:19 2001 From: Jason.Tishler@dothill.com (Jason Tishler) Date: Sat, 30 Jun 2001 11:20:19 -0400 Subject: [Python-Dev] Re: Threaded Cygwin Python Import Problem In-Reply-To: <015601c10166$eb79bb00$a300a8c0@nhv> Message-ID: <20010630112019.B626@dothill.com> Norman, On Sat, Jun 30, 2001 at 09:16:48AM -0400, Norman Vine wrote: > Jason Tishler > >The one that I would like to address in this posting prevents a threaded > >Cygwin Python from building the standard extension modules (without some > >kind of intervention). :,( Specifically, the build would frequently > >hang during the Distutils part when Cygwin Python is attempting to execvp > >a gcc process. > I was experiencing the same problems as Jason with Win2k sp1 and > had used the same work-around successfully. > < I believe Jason is working with NT 4.0 sp 5 > > > Curiously after applying the Win2k sp2 I no longer need to do this > and the original Python code works fine. > > Leading me to believe that this may be but a symptom of a another > Windows mystery. After further reflection, I feel that I have found another race/deadlock issue with the Cygwin's pthreads implementation. If I'm correct, this would explain why you experienced it intermittently with Windows 2000 SP1 and it is "gone" with SP2. Probably SP2 slows down your machine so much that the problem is not triggered. :,) I am going to reconfigure --with-pydebug and set THREADDEBUG. Hopefully, the hang will still be reproducible under these conditions. If so, then I will attempt to produce a minimal C test case for Rob to use to isolate and solve this problem. Thanks, Jason -- Jason Tishler Director, Software Engineering Phone: 732.264.8770 x235 Dot Hill Systems Corp. Fax: 732.264.8798 82 Bethany Road, Suite 7 Email: Jason.Tishler@dothill.com Hazlet, NJ 07730 USA WWW: http://www.dothill.com From guido@digicool.com Sat Jun 30 19:06:35 2001 From: guido@digicool.com (Guido van Rossum) Date: Sat, 30 Jun 2001 14:06:35 -0400 Subject: [Python-Dev] Corrupt Jython CVS (off topic). In-Reply-To: Your message of "Sat, 30 Jun 2001 13:07:55 GMT." <3b3dccf6.26562024@mail.wanadoo.dk> References: <3b3dccf6.26562024@mail.wanadoo.dk> Message-ID: <200106301806.f5UI6Zq30293@odiug.digicool.com> > A week ago I posted this on jython-dev, but no-one was able to give any > advise on the best way to fix it. Maybe you can help. > > > For some time now, our [jython] web CVS have not worked correctly: > > http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/jython/jython/org/python/core/ > > Finally I managed to track the problem to the Java2Accessibility.py,v > file in the CVS repository. The "rlog" command cannot be executed on > this file. > > >From the start of the Java2Accessibility.py,v: > > head 2.4; > access; > symbols > Release_2_1alpha1:2.4 > Release_2_0:2.2 > Release_2_0rc1:2.2 > Release_2_0beta2:2.2 > Release_2_0beta1:2.2 > Release_2_0alpha3:2.2 > Release_2_0alpha2:2.2 > Release_2_0alpha1:2.2 > Release_1_1rc1:2.2 > Release_1_1beta4:2.2 > Release_1_1beta3:2.2 > 2.0:1.1.0.2; > locks; strict; > > > As an experiment, I tried to remove the strange "2.0:1.1.0.2;" line from > the file and then I could run rlog on the file. Make sure to move the semicolon to the end of the previous line. 
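In other words, the tail of the repaired symbols block would look something like this (a sketch showing only the last entries: the bogus "2.0:1.1.0.2" line is gone and the semicolon terminating the symbols list has moved up):

    symbols
            ...
            Release_1_1beta4:2.2
            Release_1_1beta3:2.2;
    locks; strict;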
> Does anyone know if/how we can fix this? > > As a last resort I suppose I can attach my hand edited version to a SF > support request where I ask them to copy my file to the CVS server. To > this day I have never been very successful whenever I have tried to edit > files in a CVS repository so I'm reluctant to do this. > > regards, > finn Yes, I think a SF request should be the way to go. I don't know how this could have happened; the "2.0" is illegal as a symbolic tag name... --Guido van Rossum (home page: http://www.python.org/~guido/) From paulp@ActiveState.com Sat Jun 30 20:09:07 2001 From: paulp@ActiveState.com (Paul Prescod) Date: Sat, 30 Jun 2001 12:09:07 -0700 Subject: [Python-Dev] Support for "wide" Unicode characters References: <20010630141524.E029999C80@waltz.rahul.net> Message-ID: <3B3E23D3.69D591DD@ActiveState.com> Aahz Maruch wrote: > > > After reading Michael's and MA's arguments, I'm +1 on making the change > they're requesting. But what really triggered my posting this was your > use of the phrase "common understanding"; IME, Python's "explicit is > better than implicit" rule is truly critical in documentation. The spec starts of with an absolutely water tight definition of the term: "the addressable units of a Python Unicode string." I can't get more explicit than that. Expanding every usage of the word to "length 1 Python Unicode string" does not make the document more explicit any more than this is a "more explicit" equation than Ensteins: "The Energy is the mass of the object times the speed of light times two." > Particularly if "character" has been deprecated in standard Unicode > documentation, I think sticking to a common vocabulary makes more sense. "Character" is still a central term in all unicode documentation. Go to their web page and look. It's right on the front page. "Unicode provides a unique number for every character, no matter what the platform, no matter what the program, no matter what the language." But I'm not using it in the Unicode sense anyhow, so it doesn't matter. If ISO deprecates the use of the word integer in some standard will we stop talking about Python integers as integers? The addressable unit of a Python string is a character. If it is a Python Unicode String then it is a Python Unicode character. The term "Python Unicode character" is not going away: http://www.python.org/doc/current/tut/node5.html#SECTION005120000000000000000 I will be alot more concerned about this issue when someone reads the PEP and is actually confused by something as opposed to worrying that somebody might be confused by something. If I start using a bunch of technical terms and obfuscatory expansions, it will just dissuade people from reading the PEP. -- Take a recipe. Leave a recipe. Python Cookbook! http://www.ActiveState.com/pythoncookbook From DavidA@ActiveState.com Sat Jun 30 22:28:39 2001 From: DavidA@ActiveState.com (David Ascher) Date: Sat, 30 Jun 2001 14:28:39 -0700 Subject: [Python-Dev] Support for "wide" Unicode characters References: <20010630141524.E029999C80@waltz.rahul.net> <3B3E23D3.69D591DD@ActiveState.com> Message-ID: <3B3E4487.40054EAE@ActiveState.com> > "The Energy is the mass of the object times the speed of light times > two." Actually, it's "squared", not times two. 
At least in my universe =) --david-Unicode-idiot-much-to-Paul's-dismay-ascher From m.favas at per.dem.csiro.au Fri Jun 1 00:41:13 2001 From: m.favas at per.dem.csiro.au (Mark Favas) Date: Fri, 01 Jun 2001 06:41:13 +0800 Subject: [Python-Dev] One more dict trick Message-ID: <3B16C889.C01905BD@per.dem.csiro.au> Tried the patch (thanks, Tim!) - but I guess the things I'm running aren't too sensitive to dict speed . I see a slight speed-up, around 1-2%... Nice, elegant patch that should go places! Maybe the bio-informatics people on c.l.py (Andrew Dalke?) would be interested in trying it out? -- Mark Favas - m.favas at per.dem.csiro.au CSIRO, Private Bag No 5, Wembley, Western Australia 6913, AUSTRALIA From tim.one at home.com Fri Jun 1 02:24:01 2001 From: tim.one at home.com (Tim Peters) Date: Thu, 31 May 2001 20:24:01 -0400 Subject: [Python-Dev] One more dict trick In-Reply-To: Message-ID: Another version of the patch attached, a bit faster and with a large new comment block explaining it. It's looking good! As I hope the new comments make clear, nothing about this approach is "a mystery" -- there are explainable reasons for each fiddly bit. This gives me more confidence in it than in the previous approach, and, indeed, it turned out that when I *thought* "hmm! I bet this change would be a little faster!", it actually was . -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: dict.txt URL: From tim.one at home.com Fri Jun 1 03:32:30 2001 From: tim.one at home.com (Tim Peters) Date: Thu, 31 May 2001 21:32:30 -0400 Subject: [Python-Dev] One more dict trick In-Reply-To: <20010531044332.B5026@thyrsus.com> Message-ID: Heh. I was implementing 128-bit floats in software, for Cray, in about 1980. They didn't do it because they *wanted* to make the Cray boxes look like pigs . A 128-bit float type is simply necessary for some scientific work: not all problems are well-conditioned, and the "extra" bits can vanish fast. Went thru the same bit at KSR. Just yesterday Konrad Hinsen was worrying on c.l.py that his scripts that took 2 hours using native floats zoomed to 5 days when he started using GMP's arbitrary-precision float type *just* to get 100 bits of precision. When KSR died, the KSR-3 on the drawing board had 128-bit registers. I was never quite sure why the founders thought that would be a killer selling point, but it wasn't for floats. Down in the trenches we thought it would be mondo cool to have an address space so large that for the rest of our lives we'd never need to bother calling free() again <0.8 wink>. From tim.one at home.com Fri Jun 1 03:46:11 2001 From: tim.one at home.com (Tim Peters) Date: Thu, 31 May 2001 21:46:11 -0400 Subject: [Python-Dev] One more dict trick In-Reply-To: <20010531124533.J690@xs4all.nl> Message-ID: [Thomas Wouters] > Why ? Bumping register size doesn't mean Intel expects to use it all as > address space. They could be used for video-processing, Bingo. Common wisdom holds that vector machines are dead, but the truth is virtually *everyone* runs on a vector box now: Intel just renamed "vector" to "multimedia" (or AMD to "3D Now!"), and adopted a feeble (but ever-growing) subset of traditional vector machines' instruction sets. > or to represent a modest range of rationals , or to help core > 'net routers deal with those nasty IPv6 addresses. KSR's founders had in mind bit-level addressability of networks of machines spanning the globe. 
Were he to press the point, though, I'd have to agree with Eric that they didn't really *need* 128 bits for that modest goal. > I'm sure cryptomunchers would like bigger registers as well. Agencies we can't talk about would like them as big as they can get them. Each vector register in a Cray box actually consisted of 64 64-bit words, or 4K bits per register. Some "special" models were constructed where the vector FPU was thrown away and additional bit-fiddling units added in its place: they really treated the vector registers as giant bitstrings, and didn't want to burn 64 clock cycles just to do, e.g., "one" conceptual xor. > Oh wait... I get it! You were trying to get yourself in the > historybooks as the guy that said "64 bits ought to be enough for > everyone" :-) That would be foolish indeed! 128, though, now *that's* surely enough for at least a decade . From fdrake at acm.org Fri Jun 1 03:45:45 2001 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Thu, 31 May 2001 21:45:45 -0400 (EDT) Subject: [Python-Dev] One more dict trick In-Reply-To: References: <20010531044332.B5026@thyrsus.com> Message-ID: <15126.62409.909290.736779@cj42289-a.reston1.va.home.com> Tim Peters writes: > When KSR died, the KSR-3 on the drawing board had 128-bit registers. I was > never quite sure why the founders thought that would be a killer selling > point, but it wasn't for floats. Down in the trenches we thought it would > be mondo cool to have an address space so large that for the rest of our > lives we'd never need to bother calling free() again <0.8 wink>. And given what (little) I know about the memory architecture on those things, that actually would have be quite reasonable on that platform! -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From tim.one at home.com Fri Jun 1 04:23:47 2001 From: tim.one at home.com (Tim Peters) Date: Thu, 31 May 2001 22:23:47 -0400 Subject: [Python-Dev] FW: CP4E and Python newbies, it works! Message-ID: Good for the soul! -----Original Message----- From: python-list-admin at python.org [mailto:python-list-admin at python.org]On Behalf Of Ron Stephens [mailto:rdsteph at earthlink.net] Sent: Thursday, May 31, 2001 7:12 PM To: python-list at python.org Subject: CP4E and Python newbies, it works! I am a complete newbie, and with a very low programming IQ. Although I had programmed a little in college thirty years ago, in Basic, PL/1 and a very little assembler, and fooled around in later years on PC's at home with Basic, then tried PERL, then an effort at Java, they were all too much trouble to really use to program, given that it was a *hobby* that was supposed to be fun. After all, I have a demanding day job that has nothing to do with software, that requires extensive travel, and four kids, a wife, two dogs, and a cat. Java et al, by the time I had digested a couple of books and put in a lot of hours, was just no fun at all to program; and I had to look in the book every other line of code just to recall the syntax etc.; I could not keep it in my head. Now, four months into Python, after being attracted by reading a blurb about Guido van Rossum's Computer Programming for Everybody project, I am in awe of his achievement. I am having fun; and if I can do so then almost anyone can. I am really absent minded, lazy, and not good at detail. Yet I have done the following in four months, and I believe Python therefore has the potential to open up programming to a much wider audience for a lot of people, which is nice: 1. 
I have written a half dozen scripts that are meaningful to me in Python, more than I ever accomplished with any other language. 2. I am able to have fun by sitting down in the evening, or especially on a weekend, and just programming in Python. The syntax and keywords are gratifyingly just in my head, enough anyway that I can just program like I am having a conversation, and check the details later for errors etc. This is the most satisfying thing of all. 3. I find the debugger just works; magically, it helps me turn my scripts into actual working programs, simply by rather mindlessly following the road laid out for me by using the debugger. 4. I have pleasurably read more Python books from front cover to back than I care to admit. I must be enjoying myself ;-))) 5. I am exploring Jython, which is also pleasurable. After fooling around with Java a couple of years ago, it is really a kick to see jython generating such detailed Java code for me, just as if I had written it (but it would have taken me untold pain to actually do so in Java). Whether or not I actually end up using the java code so generated, I still am enjoying the sheer experience. 6. I have Zope and other things to look forward to. 7. I am able to enjoy the discussions on this newsgroup, even though they are over my head technically. I find them intriguing. Now, I may never actually accomplish anything truly useful by my programming. But I am happy. I hope that others, younger and brighter than myself, who have an interest in programming, but need the right stimulus to get going, will find Python and produce programs of real value. I think Guido van Rossum and his team should be very proud of what they are enabling. The CP4E idea is alive and well. My hat's off to Guido and the whole community which he has spawned, especially those on this newsgroup. I am humbled and honored to read your erudite technical discussions, as a voyeur of mysteries and wonders I can only dimly see on the horizon, but that nonetheless fill me with mental delight. Ron Stephens -- http://mail.python.org/mailman/listinfo/python-list From esr at thyrsus.com Fri Jun 1 05:51:48 2001 From: esr at thyrsus.com (Eric S. Raymond) Date: Thu, 31 May 2001 23:51:48 -0400 Subject: [Python-Dev] One more dict trick In-Reply-To: ; from tim.one@home.com on Thu, May 31, 2001 at 09:32:30PM -0400 References: <20010531044332.B5026@thyrsus.com> Message-ID: <20010531235148.B14591@thyrsus.com> Tim Peters : > A 128-bit float type is simply necessary for some > scientific work: not all problems are well-conditioned, and the "extra" > bits can vanish fast. Makes me wonder how competent your customers' numerical analysts were. Where the heck did they think they were getting data with that many digits of accuracy? (Note that I didn't say "precision"...) -- Eric S. Raymond Strict gun laws are about as effective as strict drug laws...It pains me to say this, but the NRA seems to be right: The cities and states that have the toughest gun laws have the most murder and mayhem. -- Mike Royko, Chicago Tribune From esr at thyrsus.com Fri Jun 1 05:54:33 2001 From: esr at thyrsus.com (Eric S. Raymond) Date: Thu, 31 May 2001 23:54:33 -0400 Subject: [Python-Dev] One more dict trick In-Reply-To: ; from tim.one@home.com on Thu, May 31, 2001 at 09:46:11PM -0400 References: <20010531124533.J690@xs4all.nl> Message-ID: <20010531235433.C14591@thyrsus.com> Tim Peters : > Agencies we can't talk about would like them as big as they can get them. 
> Each vector register in a Cray box actually consisted of 64 64-bit words, or > 4K bits per register. Some "special" models were constructed where the > vector FPU was thrown away and additional bit-fiddling units added in its > place: they really treated the vector registers as giant bitstrings, and > didn't want to burn 64 clock cycles just to do, e.g., "one" conceptual xor. You've got a point...but I don't think it's really economical to build that kind of hardware into general-purpose processors. You end up with a camel. You know, a horse designed by committee? -- Eric S. Raymond To make inexpensive guns impossible to get is to say that you're putting a money test on getting a gun. It's racism in its worst form. -- Roy Innis, president of the Congress of Racial Equality (CORE), 1988 From tim.one at home.com Fri Jun 1 08:58:08 2001 From: tim.one at home.com (Tim Peters) Date: Fri, 1 Jun 2001 02:58:08 -0400 Subject: [Python-Dev] One more dict trick In-Reply-To: <20010531235148.B14591@thyrsus.com> Message-ID: [Tim] > A 128-bit float type is simply necessary for some scientific work: not > all problems are well-conditioned, and the "extra" bits can vanish fast. [ESR] > Makes me wonder how competent your customers' numerical analysts were. > Where the heck did they think they were getting data with that many > digits of accuracy? (Note that I didn't say "precision"...) Not all scientific work consists of predicting the weather with inputs known to half a digit on a calm day . Knuth gives examples of ill-conditioned problems where resorting to unbounded rationals is faster than any known stable f.p. approach (stuck with limited precision) -- think, e.g., chaotic systems here, which includes parts of many hydrodynamics problems in real life. Some scientific work involves modeling ab initio across trillions of computations (and on a Cray box in particular, where addition didn't even bother to round, nor multiplication bother to compute the full product tree, the error bounds per operation were much worse than in a 754 world). You shouldn't overlook either that algorithms often needed massive rewriting to exploit vector and parallel architectures, and in a world where a supremely competent numerical analysis can take a month to verify the numerical robustness of a new algorithm covering two pages of Fortran, a million lines of massively reworked seat-of-the-pants modeling code couldn't be trusted at all without running it under many conditions in at least two precisions (it only takes one surprise catastrophic cancellation to destroy everything). A major oil company once threatened to sue Cray when their reservoir model produced wildly different results under a new release of the compiler. Some exceedingly sharp analysts worked on that one for a solid week. Turned out the new compiler evaluated a subexpression A*B*C by doing (B*C) first instead of (A*B), because it was faster in context (and fine to do so by Fortran's rules). It so happened A was very large, and B and C both small, and doing B*C first caused the whole product to underflow to zero where doing A*B first left a product of roughly C's magnitude. I can't imagine how they ever would have found this if they weren't able to recompile the code using twice the precision (which worked fine thanks to the larger dynamic range), then tracing to see where the runs diverged. 
Even then it took a week because this was 100s of thousands of lines of crufty Fortran than ran for hours on the world's then-fastest machine before delivering bogus results. BTW, if you think the bulk of the world's numeric production code has even been *seen* by a qualified numerical analyst, you should ride on planes more often . From tim.one at home.com Fri Jun 1 09:08:28 2001 From: tim.one at home.com (Tim Peters) Date: Fri, 1 Jun 2001 03:08:28 -0400 Subject: [Python-Dev] One more dict trick In-Reply-To: <20010531235433.C14591@thyrsus.com> Message-ID: [EAR] > You've got a point... Well, really, they do -- but they had a much more compelling point when the Cold War came with an unlimited budget. > but I don't think it's really economical to build that kind of > hardware into general-purpose processors. Economical? The marginal cost of adding even nutso new features in silicon now for mass-market chips is pretty close to zero. Indeed, if you're in the speech recog or 3D imaging games (i.e., things that still tax a PC), Intel comes around *begging* for new ideas to use up all their chip real estate. The only one I recall them turning down was a request from Dragon's founder to add an instruction that, given x and y, returned log(exp(x)+exp(y)). They were skeptical, and turned out even *we* didn't need it . > You end up with a camel. You know, a horse designed by committee? Yup! But that's the camel Intel rides to the bank, so it will probably grow more humps, on which to hang more bags of gold. From esr at thyrsus.com Fri Jun 1 09:23:16 2001 From: esr at thyrsus.com (Eric S. Raymond) Date: Fri, 1 Jun 2001 03:23:16 -0400 Subject: [Python-Dev] One more dict trick In-Reply-To: ; from tim.one@home.com on Fri, Jun 01, 2001 at 02:58:08AM -0400 References: <20010531235148.B14591@thyrsus.com> Message-ID: <20010601032316.A15635@thyrsus.com> Tim Peters : > Not all scientific work consists of predicting the weather with inputs known > to half a digit on a calm day . Knuth gives examples of > ill-conditioned problems where resorting to unbounded rationals is faster > than any known stable f.p. approach (stuck with limited precision) -- think, > e.g., chaotic systems here, which includes parts of many hydrodynamics > problems in real life. Hmmm...good answer. I still believe it's the case that real-world measurements max out below 48 bits or so of precision because the real world is a noisy, fuzzy place. But I can see that most of the algorithms for partial differential equationss would multiply those by very small or very large quantities repeatedly. The range-doubling trick for catching divergences is neat, too. So maybe there's a market for 128-bit floats after all. I'm still skeptical about how likely those applications are to influence the architecture of general-purpose processors. I saw a study once that said heavy-duty scientific floating point only accounts for about 2% of the computing market -- and I think it's significant that MMX instructions and so forth entered the Intel line to support *games*, not Navier-Stokes calculations. That 2% will have to get a lot bigger before I can see Intel doubling its word size again. It's not just the processor design; the word size has huge implications for buses, memory controllers, and the whole system architecture. -- Eric S. Raymond The United States is in no way founded upon the Christian religion -- George Washington & John Adams, in a diplomatic message to Malta. 
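The A*B*C evaluation-order surprise in Tim's oil-company story above is easy to reproduce with ordinary IEEE doubles. A sketch with made-up magnitudes (the real reservoir-model values are lost to history):

    A = 1e300     # very large
    B = 1e-200    # small
    C = 1e-200    # small

    print (A * B) * C    # 1e-100: the old compiler's grouping, fine
    print A * (B * C)    # 0.0:    B*C underflows to zero first

Both groupings are legal under Fortran's rules; only the rounding (here, the underflow) differs -- which is why rerunning at twice the precision, with its much larger exponent range, made the divergence findable.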
From pf at artcom-gmbh.de Fri Jun 1 09:22:50 2001 From: pf at artcom-gmbh.de (Peter Funk) Date: Fri, 1 Jun 2001 09:22:50 +0200 (MEST) Subject: [Python-Dev] precision thread (was One more dict trick) Message-ID: Eric: > > You end up with a camel. You know, a horse designed by committee? Tim: > Yup! But that's the camel Intel rides to the bank, so it will probably grow > more humps, on which to hang more bags of gold. cam*ls? Guido is only one week on vacation and soon heretical words show up here. ;-) sorry, couldn't resist, Peter From thomas at xs4all.net Fri Jun 1 09:28:01 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Fri, 1 Jun 2001 09:28:01 +0200 Subject: [Python-Dev] Damn... I think I might have just muffed a checkin In-Reply-To: <15126.34825.167026.520535@beluga.mojam.com>; from skip@pobox.com on Thu, May 31, 2001 at 01:06:01PM -0500 References: <15126.34825.167026.520535@beluga.mojam.com> Message-ID: <20010601092800.K690@xs4all.nl> On Thu, May 31, 2001 at 01:06:01PM -0500, Skip Montanaro wrote: > I just updated httplib.py to expand the list of names in its __all__ list. > I was operating on version 1.34. After the checkin I am looking at version > 1.34.2.1. I see that Lib/CVS/Tag exists in my directory tree and says > "release21-maint". Did I muff it? If so, how should I do an unmuff > operation? You had a sticky tag on the file, probably because you used '-rrelease21-maint' on a cvs checkout or update. Good thing it was release21-maint, though, and not some random other revision, or you would have created another branch :-) You can remove stickyness by using 'cvs update -A'. I personally just have two trees, ~/python/python-2.2 and ~/python/python-2.1.1, where the last one was checked out with -rrelease21-maint. -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From gmcm at hypernet.com Fri Jun 1 13:29:28 2001 From: gmcm at hypernet.com (Gordon McMillan) Date: Fri, 1 Jun 2001 07:29:28 -0400 Subject: [Python-Dev] One more dict trick In-Reply-To: References: <20010531235433.C14591@thyrsus.com> Message-ID: <3B174458.1998.46DEEE2B@localhost> [ESR] > > You end up with a camel. You know, a horse designed by > > committee? [Tim] > Yup! But that's the camel Intel rides to the bank, so it will > probably grow more humps, on which to hang more bags of gold. Been a camel a long time, too. x86 assembler is the, er, Perl of assemblers. - Gordon From mwh at python.net Fri Jun 1 13:54:40 2001 From: mwh at python.net (Michael Hudson) Date: 01 Jun 2001 12:54:40 +0100 Subject: [Python-Dev] another dict crasher Message-ID: Adapted from a report on comp.lang.python from Wolfgang Lipp:

class Child:
    def __init__(self, parent):
        self.__dict__['parent'] = parent
    def __getattr__(self, attr):
        self.parent.a = 1
        self.parent.b = 1
        self.parent.c = 1
        self.parent.d = 1
        self.parent.e = 1
        self.parent.f = 1
        self.parent.g = 1
        self.parent.h = 1
        self.parent.i = 1
        return getattr(self.parent, attr)

class Parent:
    def __init__(self):
        self.a = Child(self)

print Parent().__dict__

segfaults both 2.1 and current (well, maybe a day old) CVS. Haven't tried Tim's latest patch, but I don't believe that will make any difference. It's obvious what's happening; the dict's resizing inside the for loop in dict_repr and the ep pointer is dangling. By the time we've shaken all of these out of dictobject.c it's going to be pretty close to free-threading safe, I'd have thought. reentrancy-sucks-ly y'rs M.
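Why do nine extra attribute insertions suffice to trigger the resize Michael describes? A back-of-envelope check, sketched from the 2.1 resize rule (a dict grows once ma_fill*3 >= ma_size*2, with MINSIZE == 8):

    size, fill = 8, 1            # the parent dict holds just 'a' when printing starts
    for attr in "bcdefghi":      # what Child.__getattr__ stuffs in
        fill = fill + 1
        if fill * 3 >= size * 2:
            print "ma_table reallocated at attribute", attr
            break

This reports attribute 'f': by the fifth extra insertion ma_table has been freed and reallocated, so the cached ep points into freed memory.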
-- But since I'm not trying to impress anybody in The Software Big Top, I'd rather walk the wire using a big pole, a safety harness, a net, and with the wire not more than 3 feet off the ground. -- Grant Griffin, comp.lang.python From mwh at python.net Fri Jun 1 14:12:55 2001 From: mwh at python.net (Michael Hudson) Date: 01 Jun 2001 13:12:55 +0100 Subject: [Python-Dev] another dict crasher In-Reply-To: Michael Hudson's message of "01 Jun 2001 12:54:40 +0100" References: Message-ID: Michael Hudson writes: > Adapted from a report on comp.lang.python from Wolfgang Lipp: [snip] > segfaults both 2.1 and current (well, maybe a day old) CVS. Haven't > tried Tim's latest patch, but I don't believe that will make any > difference. > > It's obvious what's happening; the dict's resizing inside the > for loop in dict_repr and the ep pointer is dangling. Actually this crash was dict_print (I always forget about tp_print...). It's pretty easy to mend:

*** dictobject.c	Fri Jun  1 13:08:13 2001
--- dictobject.c-fixed	Fri Jun  1 12:59:07 2001
***************
*** 793,795 ****
  	any = 0;
! 	for (i = 0, ep = mp->ma_table; i < mp->ma_size; i++, ep++) {
  		if (ep->me_value != NULL) {
--- 793,796 ----
  	any = 0;
! 	for (i = 0; i < mp->ma_size; i++) {
! 		ep = &mp->ma_table[i];
  		if (ep->me_value != NULL) {
***************
*** 833,835 ****
  	any = 0;
! 	for (i = 0, ep = mp->ma_table; i < mp->ma_size && v; i++, ep++) {
  		if (ep->me_value != NULL) {
--- 834,837 ----
  	any = 0;
! 	for (i = 0; i < mp->ma_size && v; i++) {
! 		ep = &mp->ma_table[i];
  		if (ep->me_value != NULL) {

I'm not sure this stops still more Machiavellian behaviour from crashing the interpreter, and you can certainly get items being printed more than once or not at all. I'm not sure this last is a problem; if the user's being this contrary there's only so much we can do to help him or her. Cheers, M. -- I also feel it essential to note, [...], that Description Logics, non-Monotonic Logics, Default Logics and Circumscription Logics can all collectively go suck a cow. Thank you. -- http://advogato.org/person/Johnath/diary.html?start=4 From pedroni at inf.ethz.ch Fri Jun 1 14:49:11 2001 From: pedroni at inf.ethz.ch (Samuele Pedroni) Date: Fri, 1 Jun 2001 14:49:11 +0200 (MET DST) Subject: [Python-Dev] __xxxattr__ caching semantic Message-ID: <200106011249.OAA05837@core.inf.ethz.ch> Hi. What is the intended semantics wrt __xxxattr__ caching:

class X: pass

def cga(self, name):
    print name

def iga(name):
    print name

x = X()
x.__dict__['__getattr__'] = iga  # 1.
x.__getattr__ = iga              # 2.
X.__dict__['__getattr__'] = cga  # 3.
X.__getattr__ = cga              # 4.
x.a

According to the manual (http://www.python.org/doc/current/ref/customization.html) all of these variants should have no effect, so x.a should fail. In practice 4. works. Is that an implementation/manual mismatch? Is this intended? Is there code around using 4.? I'm asking because jython has differences/bugs in this respect. I imagine that 1.-4. should work for all other __magic__ methods (this should be fixed in jython for some methods); OTOH jython has such a restriction on __del__ too, and that one cannot be removed (it is not simply a matter of caching/not caching). regards, Samuele Pedroni.
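For what it's worth, the behaviour Samuele observes seems to fall out of a cache: classic classes store the __getattr__ hook on the class object itself (cl_getattr), and only an attribute assignment *to the class* refreshes it, so variant 4 is the only one the instance machinery ever sees. A sketch of the observed CPython 2.1 behaviour (the cl_getattr detail is an assumption from reading classobject.c, worth double-checking):

    class X: pass

    def cga(self, name):
        return name

    x = X()
    x.__getattr__ = cga                # variant 2: lives only in x.__dict__
    try:
        x.a
    except AttributeError:
        print "instance hook never consulted"

    X.__dict__['__getattr__'] = cga    # variant 3: skips the cache refresh
    try:
        x.a
    except AttributeError:
        print "class dict alone doesn't help"

    X.__getattr__ = cga                # variant 4: class assignment refreshes cl_getattr
    print x.a                          # prints: a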
From Greg.Wilson at baltimore.com Fri Jun 1 14:59:28 2001 From: Greg.Wilson at baltimore.com (Greg Wilson) Date: Fri, 1 Jun 2001 08:59:28 -0400 Subject: [Python-Dev] re: %b format Message-ID: <930BBCA4CEBBD411BE6500508BB3328F2E1E47@nsamcanms1.ca.baltimore.com> My thanks to everyone who commented on the idea of adding a binary format specifier to Python. I'll volunteer to draft the PEP --- volunteers for a co-author? Greg ----------------------------------------------------------------------------------------------------------------- The information contained in this message is confidential and is intended for the addressee(s) only. If you have received this message in error or there are any problems please notify the originator immediately. The unauthorized use, disclosure, copying or alteration of this message is strictly forbidden. Baltimore Technologies plc will not be liable for direct, special, indirect or consequential damages arising from alteration of the contents of this message by a third party or as a result of any virus being passed on. In addition, certain Marketing collateral may be added from time to time to promote Baltimore Technologies products, services, Global e-Security or appearance at trade shows and conferences. This footnote confirms that this email message has been swept by Baltimore MIMEsweeper for Content Security threats, including computer viruses. From tismer at tismer.com Fri Jun 1 15:56:26 2001 From: tismer at tismer.com (Christian Tismer) Date: Fri, 01 Jun 2001 15:56:26 +0200 Subject: [Python-Dev] One more dict trick References: Message-ID: <3B179F0A.CFA3B2C@tismer.com> Tim Peters wrote: > > Another version of the patch attached, a bit faster and with a large new > comment block explaining it. It's looking good! As I hope the new comments > make clear, nothing about this approach is "a mystery" -- there are > explainable reasons for each fiddly bit. This gives me more confidence in > it than in the previous approach, and, indeed, it turned out that when I > *thought* "hmm! I bet this change would be a little faster!", it actually > was . Thanks a lot for this nice patch. It looks like a real improvement. Also thanks for mentioning my division idea. Since all bits of the hash are eventually taken into account, this idea has somehow survived in an even more efficient solution, good end, file closed. (and good that I saved the time to check my patch in, lately :-) cheers - chris -- Christian Tismer :^) Mission Impossible 5oftware : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net/ 14163 Berlin : PGP key -> http://wwwkeys.pgp.net/ PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? http://www.stackless.com/ From pedroni at inf.ethz.ch Fri Jun 1 16:18:20 2001 From: pedroni at inf.ethz.ch (Samuele Pedroni) Date: Fri, 1 Jun 2001 16:18:20 +0200 (MET DST) Subject: [Python-Dev] Re: [Jython-dev] Using PyChecker in Jython Message-ID: <200106011418.QAA13570@core.inf.ethz.ch> Hi. [Neal Norwitz] > Hello! > > I have created a program PyChecker to perform Python source code checking. > (http://pychecker.sourceforge.net). > > PyChecker is implemented in C Python and does some "tricky" things. > It doesn't currently work in Jython due to the module dis (disassemble code) > not being available in Jython. > > Is there any fundamental problem with getting PyChecker to work under Jython? 
> > Here's a high-level overview of what PyChecker does: > > imp.find_module() > imp.load_module() > for each object in dir(module): > # object can be a class, function, imported module, etc. > for each instruction in disassembled byte code: > # handle each instruction appropriately > > This hides a lot of details, but I do lots of things like getting the code objects from the classes, methods, and > functions, look at the arguments > in functions, etc. > > Is it possible to make work in Jython? Easy? > > Thanks for any guidance, > Neal It would be great - really - but about easy? As easy as making PyChecker working on source code without using dis and without importing/executing modules and their top defs, I think there will be no dis support on jython side (we produce java bytecode and getting "back" to python vm bytecode would be very tricky, not very elegant, etc. ) any time soon . Seriously, two possible workaround hacks (they are also not very easy), this is just after small brainstorming and ignoring the concrete needs and code of PyChecker: +) more elegant one, but maybe still too difficult or requiring too much work: let PyChecker run under CPython even when checking jython code, jython code can compile down to py vm bytecode but then does not run: why? java classes imports and the jython specific builtin modules (not so many) So one needs to implement a sufficient amount of python (an import hook, etc) code that does the minimal partial evalution required and the required amount of loading&introspection on java, jython specific stuff in order to have the imports work and PyChecher feeded with the things it needs. This means dealing with the java class format, or a two passes approach: run the code under jython in order to gather the information needed to load it succesfully under python. If the top level code contains conditionals that depend on jython stuff this could be hard, but one can ignore that (at least for starting). Clearly the main PyChecker loop would require some adaptation, and maybe include some logic to check some jython specific stuff (subclassing from java, etc). *) let an adapted PyChecker run under jython, obtain someway the needed py vm bytecode stream from a source -> py vm bytecode compiler written in python (such a thing exists - if I remember well) . And similar ideas ... regards, Samuele Pedroni. From barry at digicool.com Fri Jun 1 16:43:59 2001 From: barry at digicool.com (Barry A. Warsaw) Date: Fri, 1 Jun 2001 10:43:59 -0400 Subject: [Python-Dev] Damn... I think I might have just muffed a checkin References: <15126.34825.167026.520535@beluga.mojam.com> <20010601092800.K690@xs4all.nl> Message-ID: <15127.43567.202950.192811@anthem.wooz.org> >>>>> "TW" == Thomas Wouters writes: TW> You can remove stickyness by using 'cvs update -A'. I TW> personally just have two trees, ~/python/python-2.2 and TW> ~/python/python-2.1.1, where the last one was checked out with TW> -rrelease21-maint. Very good advice for anybody playing with branches! -Barry From barry at digicool.com Fri Jun 1 17:12:33 2001 From: barry at digicool.com (Barry A. Warsaw) Date: Fri, 1 Jun 2001 11:12:33 -0400 Subject: [Python-Dev] another dict crasher References: Message-ID: <15127.45281.435849.822222@anthem.wooz.org> >>>>> "MH" == Michael Hudson writes: MH> segfaults both 2.1 and current (well, maybe a day old) CVS. MH> Haven't tried Tim's latest patch, but I don't believe that MH> will make any difference. That is highly, highly nasty. 
Sounds to me like there ought to be an emergency 2.1.1 patch made for this, bumping Thomas's work to 2.1.2 if necessary. And if we can trojan in the NAIPL (New And Improved Python License), I wouldn't mind. :) -Barry From jeremy at digicool.com Fri Jun 1 17:18:05 2001 From: jeremy at digicool.com (Jeremy Hylton) Date: Fri, 1 Jun 2001 11:18:05 -0400 (EDT) Subject: [Python-Dev] another dict crasher In-Reply-To: <15127.45281.435849.822222@anthem.wooz.org> References: <15127.45281.435849.822222@anthem.wooz.org> Message-ID: <15127.45613.947590.246269@slothrop.digicool.com> >>>>> "BAW" == Barry A Warsaw writes: >>>>> "MH" == Michael Hudson writes: MH> segfaults both 2.1 and current (well, maybe a day old) CVS. MH> Haven't tried Tim's latest patch, but I don't believe that will MH> make any difference. BAW> That is highly, highly nasty. Sounds to me like there ought to BAW> be an emergency 2.1.1 patch made for this, bumping Thomas's BAW> work to 2.1.2 if necessary. And if we can trojan in the NAIPL BAW> (New And Improved Python License), I wouldn't mind. :) We can release a critical patch for this bug, ala the CriticalPatches page for the Python 2.0 release. Jeremy From mwh at python.net Fri Jun 1 18:03:55 2001 From: mwh at python.net (Michael Hudson) Date: Fri, 1 Jun 2001 17:03:55 +0100 (BST) Subject: [Python-Dev] another dict crasher In-Reply-To: <15127.45281.435849.822222@anthem.wooz.org> Message-ID: On Fri, 1 Jun 2001, Barry A. Warsaw wrote: > > >>>>> "MH" == Michael Hudson writes: > > MH> segfaults both 2.1 and current (well, maybe a day old) CVS. > MH> Haven't tried Tim's latest patch, but I don't believe that > MH> will make any difference. > > That is highly, highly nasty. Yes. > Sounds to me like there ought to be an emergency 2.1.1 patch made for > this, bumping Thomas's work to 2.1.2 if necessary. Really? Two mild counterpoints: 1) It's *old*; 1.5.2 at least, and that's only because that's the oldest version I happen to have lying around. It's quite similar to the test_mutants oddness in some ways. 2) There's at least one other crasher in 2.1; the one in the compiler where a variable is referenced in a class and in a contained method. (I've actually run into that one). But a "fix these crashers" release seems reasonable if there's someone with the time to put it out (not me!). > And if we can trojan in the NAIPL (New And Improved Python > License), I wouldn't mind. :) Well me neither... Cheers, M. From skip at pobox.com Fri Jun 1 18:26:35 2001 From: skip at pobox.com (Skip Montanaro) Date: Fri, 1 Jun 2001 11:26:35 -0500 Subject: [Python-Dev] Damn... I think I might have just muffed a checkin In-Reply-To: <20010601092800.K690@xs4all.nl> References: <15126.34825.167026.520535@beluga.mojam.com> <20010601092800.K690@xs4all.nl> Message-ID: <15127.49723.186388.220648@beluga.mojam.com> Thomas> I personally just have two trees, ~/python/python-2.2 and Thomas> ~/python/python-2.1.1, where the last one was checked out with Thomas> -rrelease21-maint. Thanks, good advice. httplib.py has now been updated on both the head and release21-maint branches. Skip From loewis at informatik.hu-berlin.de Fri Jun 1 19:07:52 2001 From: loewis at informatik.hu-berlin.de (Martin von Loewis) Date: Fri, 1 Jun 2001 19:07:52 +0200 (MEST) Subject: [Python-Dev] METH_NOARGS calling convention Message-ID: <200106011707.TAA21329@pandora.informatik.hu-berlin.de> The patch http://sourceforge.net/tracker/?func=detail&atid=305470&aid=427190&group_id=5470 introduces two new calling conventions, METH_O and METH_NOARGS. 
The rationale for METH_O has been discussed already; the rationale for METH_NOARGS is that it allows a convient simplification (plus a marginal speed-up) of functions which do either PyArg_NoArgs(args) or PyArg_ParseTuple(args, ":function_name"). Now, one open issue is whether the METH_NOARGS functions should have a signature of PyObject * (*unaryfunc)(PyObject *); or of PyObject *(*PyCFunction)(PyObject *, PyObject *); which then would be called with a NULL second argument; the first argument would be self in either case. IMO, the advantage of passing the NULL argument is that NOARGS methods don't need to be cast into PyCFunction in the method table; the advantage of the second approach is that it is clearer in the function implementation. Any opinions which signature to use? Regards, Martin From mal at lemburg.com Fri Jun 1 19:18:21 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Fri, 01 Jun 2001 19:18:21 +0200 Subject: [Python-Dev] METH_NOARGS calling convention References: <200106011707.TAA21329@pandora.informatik.hu-berlin.de> Message-ID: <3B17CE5D.9D4CE8D4@lemburg.com> Martin von Loewis wrote: > > The patch > > http://sourceforge.net/tracker/?func=detail&atid=305470&aid=427190&group_id=5470 > > introduces two new calling conventions, METH_O and METH_NOARGS. The > rationale for METH_O has been discussed already; the rationale for > METH_NOARGS is that it allows a convient simplification (plus a > marginal speed-up) of functions which do either PyArg_NoArgs(args) or > PyArg_ParseTuple(args, ":function_name"). > > Now, one open issue is whether the METH_NOARGS functions should have > a signature of > > PyObject * (*unaryfunc)(PyObject *); > > or of > > PyObject *(*PyCFunction)(PyObject *, PyObject *); > > which then would be called with a NULL second argument; the first > argument would be self in either case. > > IMO, the advantage of passing the NULL argument is that NOARGS methods > don't need to be cast into PyCFunction in the method table; the > advantage of the second approach is that it is clearer in the function > implementation. > > Any opinions which signature to use? The second... I'm not sure how you will get extension writers who have to maintain packages for all three Python versions to ever change their code to use the new style calling scheme: there simply is no clean way to use the same code base unless you are willing to add tons of #ifdefs. -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From fdrake at acm.org Fri Jun 1 19:31:15 2001 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Fri, 1 Jun 2001 13:31:15 -0400 (EDT) Subject: [Python-Dev] METH_NOARGS calling convention In-Reply-To: <3B17CE5D.9D4CE8D4@lemburg.com> References: <200106011707.TAA21329@pandora.informatik.hu-berlin.de> <3B17CE5D.9D4CE8D4@lemburg.com> Message-ID: <15127.53603.87216.103262@cj42289-a.reston1.va.home.com> M.-A. Lemburg writes: > > Any opinions which signature to use? > > The second... Seconded. ;-) > I'm not sure how you will get extension writers who > have to maintain packages for all three Python versions to > ever change their code to use the new style calling scheme: > there simply is no clean way to use the same code base unless > you are willing to add tons of #ifdefs. You won't, and that's OK. 
Even if 3rd-party extensions never use it, there are plenty of functions/methods in the standard distribution which can use it, and I imagine those would be converted fairly quickly. -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From tismer at tismer.com Fri Jun 1 20:29:11 2001 From: tismer at tismer.com (Christian Tismer) Date: Fri, 01 Jun 2001 20:29:11 +0200 Subject: [Python-Dev] Marshal bug in 2.1? Message-ID: <3B17DEF7.3E7C6BC6@tismer.com> Hi friends, there is a script which generates encrypted passwords for Starship users. There is a series of marshal, zlib and base64 calls, which is reversed by the script. Is there a known bug in Marshal, or should I start the debugger now? The passwphrase for the attached script is "hey". cheers - chris -- Christian Tismer :^) Mission Impossible 5oftware : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net/ 14163 Berlin : PGP key -> http://wwwkeys.pgp.net/ PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? http://www.stackless.com/ -------------- next part -------------- import marshal,base64,zlib exec marshal.loads(zlib.decompress(base64.decodestring(""" eJytVM+PGzUUfs6PzWZYwapAqbbAuiyF6Yqsqt2iomq1HGkvuQQJaS+pM3YzbjP2yHY6CdrVHNr+ Exz5L/gn4MidC2f+Az5Pkq0QlFMnmTf2s+d73/vmPWeEq43b/wxT498mSXSOwbskGZ0zqm+QbNF5 i+o9km16idU21bdIdUh26GmLrCRWf0ayS8+6dN6l+oAU0XcP689JbZHcohfA6VF9mxQj1SbVi57r 2PAFqS7p7bVH9+kFkew1mDvA/JJUCziGEYs3AozS7ch1yIiSg7dwJfjxzCkRVFml4Q7ng8F6zgUv hfeVdZLzJ84WXJgln+rnyvCgFuEIbzoV5s54/g3PcuFEFpTzvMp1lnPhFM9sUc6DklwboEmF5UIb 7YPO8PJkHvhz5ZbcWDOYaaOE45VYrmI18N/n2sctXlvDMczmPthC/wjEJ9bxUrtFTOBt6OAPoqSH h4c85MqrdUaeT1SoFDIenJ0OmpyWdu5AxDllwmuB8GLC33gNzm7700EytBWfA3s0esiD5TM7hTAY +IBIuS6PymXIrTkyKiRYjKL5+MI607nXZsrVAjLPlpHmFck0m+lyYgWIOAXRC2UkNHowuJMII+Mm M10zv2K8QosojUvy0tmpE0WyomQLFfK4o7BIGgUhxWSmjhJ/F/U3CdVX/BHPRKyE2SwiA0mEVQgI g49agXtmIVMWbmWMOvi1yZexyfaovhmb7BnRJWsGjC7RXh/TBZqgFdsO3XCJJvuELtqkO3RB0cPq T5v5VmyTSwDt00WLdI/CduxQNGbc14pNGm2H+Ajgo7SLoEPfhz25e3x8cv/eyX0wYuADRjepAQpE ga3jIP514H2E4SiNZ8NQj2E1h2nmPposd80TYnrUDi3SaFdD/37c8O9q9bF7T2eimEhxtk8+Hj6N 0XEh7W+wC/m134qT4PANGpdRVYMtm4V5KdGijSM0DqmnygffwfCp1WaFIsq0s+EU/gt4Bfh/ZDdn wx75JJ6U7EN2je2y91izOh4XQpvxeOj3MStnSqC88f1RsqtSiMXKy9zB/8DvYs/jH/46fWR+q3+v fv3lz5/+eJUmm5ylzRr6eB5vBif/4LAOaUShxuOrdKJoTlRjbXDWNN6wCFeSvdYmbcR+U65RiW9R Dh/gufNOP+m3dnq7bIdtI9VrbJ/9DYOcdyU= """))) From tismer at tismer.com Fri Jun 1 20:47:02 2001 From: tismer at tismer.com (Christian Tismer) Date: Fri, 01 Jun 2001 20:47:02 +0200 Subject: [Python-Dev] Marshal bug in 2.1? References: <3B17DEF7.3E7C6BC6@tismer.com> Message-ID: <3B17E326.41D82CCE@tismer.com> Christian Tismer wrote: > > Hi friends, > > there is a script which generates encrypted passwords for > Starship users. There is a series of marshal, zlib and base64 > calls, which is reversed by the script. > > Is there a known bug in Marshal, or should I start the debugger now? > The passwphrase for the attached script is "hey". Aehmmm... can it be that code objects are no longer compatible between Python 2.0 and 2.1? sigh - ciao - chris -- Christian Tismer :^) Mission Impossible 5oftware : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net/ 14163 Berlin : PGP key -> http://wwwkeys.pgp.net/ PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? 
http://www.stackless.com/

From mwh at python.net Fri Jun 1 20:52:17 2001 From: mwh at python.net (Michael Hudson) Date: 01 Jun 2001 19:52:17 +0100 Subject: [Python-Dev] another dict crasher In-Reply-To: barry@digicool.com's message of "Fri, 1 Jun 2001 11:12:33 -0400" References: <15127.45281.435849.822222@anthem.wooz.org> Message-ID: Warning! VERY SICK CODE INDEED ahead! barry at digicool.com (Barry A. Warsaw) writes: > >>>>> "MH" == Michael Hudson writes: > > MH> segfaults both 2.1 and current (well, maybe a day old) CVS. > MH> Haven't tried Tim's latest patch, but I don't believe that > MH> will make any difference. > > That is highly, highly nasty. Not as nasty as this, though:

dict = {}

# let's force dict to malloc its table
for i in range(1,10):
    dict[i] = i

class Machiavelli:
    def __repr__(self):
        dict.clear()
        print # doesn't crash without this. don't know why
        return `"machiavelli"`
    def __hash__(self):
        return 0

dict[Machiavelli()] = Machiavelli()

print dict

gives, even with my posted patch to dictobject.c $ ./python crash2.py { Segmentation fault (core dumped) Any ideas what the above code should do? (Other than use the secret PSU website to hire a hitman and shoot whoever wrote the code, I mean). Cheers, M. -- Well, yes. I don't think I'd put something like "penchant for anal play" and "able to wield a buttplug" in a CV unless it was relevant to the gig being applied for... -- Matt McLeod, alt.sysadmin.recovery

From mal at lemburg.com Fri Jun 1 21:01:38 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Fri, 01 Jun 2001 21:01:38 +0200 Subject: [Python-Dev] Marshal bug in 2.1? References: <3B17DEF7.3E7C6BC6@tismer.com> <3B17E326.41D82CCE@tismer.com> Message-ID: <3B17E692.281A329B@lemburg.com> Christian Tismer wrote: > > Christian Tismer wrote: > > > > Hi friends, > > > > there is a script which generates encrypted passwords for > > Starship users. There is a series of marshal, zlib and base64 > > calls, which is reversed by the script. > > > > Is there a known bug in Marshal, or should I start the debugger now? > > The passphrase for the attached script is "hey". > > Aehmmm... can it be that code objects are no longer compatible > between Python 2.0 and 2.1? Yes, not surprisingly though... AFAIK the pyc format changed in every single version between 1.5.2 and 2.1. -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/

From tim.one at home.com Fri Jun 1 22:36:21 2001 From: tim.one at home.com (Tim Peters) Date: Fri, 1 Jun 2001 16:36:21 -0400 Subject: [Python-Dev] another dict crasher In-Reply-To: Message-ID: I suspect there are many ways to get the dict code to blow up, and always have been. I picked on dict compare a month or so ago mostly because nobody cares how fast that runs except in the == and != cases. Others are a real bitch; for example, the fundamental lookdict function caches dictentry *ep0 = mp->ma_table; at the start as if it were invariant -- but very unlikely sequences of collisions with identical hash codes combined with mutating comparisons can turn that into a bogus pointer. List objects used to have similar vulnerabilities during sorting (where comparison is the *norm*, not a one-in-a-billion freak occurrence), and no amount of slow-the-code paranoia sufficed to plug all conceivable holes. In the end we invented an internal "immutable list type", and replaced the list object's type pointer for the duration of the sort (you can still try to mutate a list during a sort, but all the mutating list methods are redirected to raise an exception when you do). The dict code has even more holes and in more places, but they're generally much harder to provoke, so they've gone unnoticed for 10 years. All in all, seemed like a good tradeoff to me .
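The "immutable list" trick is worth sketching, since the same idea keeps coming up for dicts. Roughly (a sketch of the shape of the trick, with hypothetical helper names; the real code lives in Objects/listobject.c):

/* The error-raising stand-in for every mutating list method while
   a sort is in progress. */
static PyObject *
immutable_list_op(PyObject *self, PyObject *args)
{
    PyErr_SetString(PyExc_TypeError,
                    "a list cannot be modified while it is being sorted");
    return NULL;
}

/* immutable_list_type: identical to PyList_Type except that every
   slot and method that could mutate the list points at an
   error-raising stand-in like the one above. */

static PyObject *
listsort(PyListObject *self, PyObject *compare)
{
    PyObject *result;

    self->ob_type = &immutable_list_type;  /* comparisons may run user code */
    result = do_the_sort(self, compare);   /* hypothetical helper */
    self->ob_type = &PyList_Type;          /* restore the real type */
    return result;
}

The beauty of the trick is that the error is raised at the mutation site, where the user can see what went wrong, instead of leaving the sort to crawl over freed memory.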
From tim.one at home.com Sat Jun 2 00:08:32 2001 From: tim.one at home.com (Tim Peters) Date: Fri, 1 Jun 2001 18:08:32 -0400 Subject: [Python-Dev] METH_NOARGS calling convention In-Reply-To: <200106011707.TAA21329@pandora.informatik.hu-berlin.de> Message-ID: Cool! [Martin von Loewis] > ... > Now, one open issue is whether the METH_NOARGS functions should have > a signature of > > PyObject * (*unaryfunc)(PyObject *); > > or of > > PyObject *(*PyCFunction)(PyObject *, PyObject *); > > which then would be called with a NULL second argument; the first > argument would be self in either case. > > IMO, the advantage of passing the NULL argument is that NOARGS methods > don't need to be cast into PyCFunction in the method table; the > advantage of the second approach is that it is clearer in the function > implementation. > > Any opinions on which signature to use? The one that makes sense : declare functions with the number of arguments they use. I don't care about needing to cast in the table: you do that once, but people read the *code* over and over, and an unused arg will be a mystery (or even a source of compiler warnings) every time you bump into one. The only way needing to cast could be "a problem" is if this remains an undocumented gimmick that developers have to reverse-engineer from staring at the (distributed all over the place) implementation. I like what the patch does, but I'd reject it just for continuing to leave this stuff Utterly Mysterious: please add comments saying what METH_NOARGS and METH_O *mean*: what's the point, why are these defined, how and when are you supposed to use them? That's where to explain the need to cast METH_NOARGS.

From thomas at xs4all.net Sat Jun 2 00:42:35 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Sat, 2 Jun 2001 00:42:35 +0200 Subject: [Python-Dev] another dict crasher In-Reply-To: <15127.45281.435849.822222@anthem.wooz.org>; from barry@digicool.com on Fri, Jun 01, 2001 at 11:12:33AM -0400 References: <15127.45281.435849.822222@anthem.wooz.org> Message-ID: <20010602004235.Q690@xs4all.nl> On Fri, Jun 01, 2001 at 11:12:33AM -0400, Barry A. Warsaw wrote: > > >>>>> "MH" == Michael Hudson writes: > MH> segfaults both 2.1 and current (well, maybe a day old) CVS. > MH> Haven't tried Tim's latest patch, but I don't believe that > MH> will make any difference. > That is highly, highly nasty. Sounds to me like there ought to be an > emergency 2.1.1 patch made for this, bumping Thomas's work to 2.1.2 if > necessary. Why bump 'my work' ? I'm just reviewing patches checked into the head. A fix for the above problems would fit in a patch release very nicely, and a release is a release. Besides, releasing 2.1.1 as 2.1 + dict fix would be a CVS nightmare. Unless you propose to keep it out of CVS, Barry ? :) > And if we can trojan in the NAIPL (New And Improved Python > License), I wouldn't mind. :) I'll channel Guido by saying he wouldn't even allow us to ship it with anything other than the PSF licence :) Gee-I'd-almost-think-you-had-a-need-for-an-FSF-suffered-licence-ly y'rs -- Thomas Wouters Hi!
I'm a .signature virus! copy me into your .signature file to help me spread!

From thomas at xs4all.net Sat Jun 2 00:47:16 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Sat, 2 Jun 2001 00:47:16 +0200 Subject: [Python-Dev] Marshal bug in 2.1? In-Reply-To: <3B17E692.281A329B@lemburg.com>; from mal@lemburg.com on Fri, Jun 01, 2001 at 09:01:38PM +0200 References: <3B17DEF7.3E7C6BC6@tismer.com> <3B17E326.41D82CCE@tismer.com> <3B17E692.281A329B@lemburg.com> Message-ID: <20010602004716.R690@xs4all.nl> On Fri, Jun 01, 2001 at 09:01:38PM +0200, M.-A. Lemburg wrote: > Yes, not surprisingly though... AFAIK the pyc format changed > in every single version between 1.5.2 and 2.1. Worse, it's changed several times between each release :) -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread!

From barry at digicool.com Sat Jun 2 01:12:30 2001 From: barry at digicool.com (Barry A. Warsaw) Date: Fri, 1 Jun 2001 19:12:30 -0400 Subject: [Python-Dev] another dict crasher References: <15127.45281.435849.822222@anthem.wooz.org> <20010602004235.Q690@xs4all.nl> Message-ID: <15128.8542.51241.192412@anthem.wooz.org> >>>>> "TW" == Thomas Wouters writes: >> That is highly, highly nasty. Sounds to me like there ought to >> be an emergency 2.1.1 patch made for this, bumping Thomas's >> work to 2.1.2 if necessary. TW> Why bump 'my work' ? I'm just reviewing patches checked into TW> the head. A fix for the above problems would fit in a patch TW> release very nicely, and a release is a release. Besides, TW> releasing 2.1.1 as 2.1 + dict fix would be a CVS TW> nightmare. Unless you propose to keep it out of CVS, Barry ? TW> :) Oh no! You know me, I like to release those maintenance releases early and often. :) Anyway, that's why /you're/ the 2.1.1 czar. >> And if we can trojan in the NAIPL (New And Improved Python >> License), I wouldn't mind. :) TW> I'll channel Guido by saying he wouldn't even allow us to ship TW> it with anything other than the PSF licence :) :) TW> Gee-I'd-almost-think-you-had-a-need-for-an-FSF-suffered-licence-ly TW> y'rs Where'd you get /that/ idea? :) -Barry

From mwh at python.net Sat Jun 2 01:20:26 2001 From: mwh at python.net (Michael Hudson) Date: 02 Jun 2001 00:20:26 +0100 Subject: [Python-Dev] another dict crasher In-Reply-To: "Tim Peters"'s message of "Fri, 1 Jun 2001 16:36:21 -0400" References: Message-ID: "Tim Peters" writes: > The dict code has even more holes and in more places, but they're > generally much harder to provoke, so they've gone unnoticed for 10 > years. All in all, seemed like a good tradeoff to me . Are you suggesting that we should just leave these crashers in? They're not *particularly* hard to provoke if you know the implementation - and I was inspired to look for them by someone's report of actually running into one. Cheers, M. -- Java sucks. [...] Java on TV set top boxes will suck so hard it might well inhale people from off their sofa until their heads get wedged in the card slots. --- Jon Rabone, ucam.chat

From tim.one at home.com Sat Jun 2 03:04:36 2001 From: tim.one at home.com (Tim Peters) Date: Fri, 1 Jun 2001 21:04:36 -0400 Subject: [Python-Dev] another dict crasher In-Reply-To: Message-ID: [Michael Hudson] > Are you suggesting that we should just leave these crashers in? > They're not *particularly* hard to provoke if you know the > implementation - and I was inspired to look for them by someone's > report of actually running into one.
I certainly don't object to fixing ones that bite innocent users, but there are also costs of several kinds. In this case, I couldn't care less how long printing a dict takes -- go for it. When adversarial abuse starts interfering with the speed of crucial operations, though, I'm simply not a "safety at any cost" person. Guido is much more of one, although the number of holes remaining in Python could plausibly fill Albert Hall . short-of-50-easy-ways-to-crash-win98-just-think-hard-about-each-"+"-in- the-code-base-ly y'rs - tim From gstein at lyra.org Sat Jun 2 07:52:03 2001 From: gstein at lyra.org (Greg Stein) Date: Fri, 1 Jun 2001 22:52:03 -0700 Subject: [Python-Dev] strop vs. string In-Reply-To: ; from tim.one@home.com on Sun, May 27, 2001 at 09:42:30PM -0400 References: <3B10D758.3741AC2F@lemburg.com> Message-ID: <20010601225203.R23560@lyra.org> On Sun, May 27, 2001 at 09:42:30PM -0400, Tim Peters wrote: >... > [Greg Ewing] > > I think it would be safe if: > > > > 1) it kept a reference to the underlying object, and > > That much it already does. > > > 2) it re-fetched the pointer and length info each time it was > > needed, using the underlying object's buffer interface. > > If after > > b = buffer(some_object) > > b.__getitem__ needed to refetch the info between > > b[i] > and > b[i+1] > > I expect it would be so slow even Greg wouldn't want it anymore. Huh? I don't think it would be all that slow. It is just a function call. And I don't think that the getitem slot is really used all that frequently (in a loop) for buffer type objects. I've been thinking that refetching the ptr/len is the right fix. Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein at lyra.org Sat Jun 2 07:54:23 2001 From: gstein at lyra.org (Greg Stein) Date: Fri, 1 Jun 2001 22:54:23 -0700 Subject: [Python-Dev] strop vs. string In-Reply-To: ; from tim.one@home.com on Sat, May 26, 2001 at 02:44:04AM -0400 References: <3B0ED784.FC53D01@lemburg.com> Message-ID: <20010601225423.S23560@lyra.org> On Sat, May 26, 2001 at 02:44:04AM -0400, Tim Peters wrote: > The buffer object has been neglected for years: is that because it's in > prime shape, or because nobody cares about it enough to maintain it? "Works for me" :-) Part of the neglect is also based on Guido's ambivalence. Part is that I haven't needed more from it. The day that I do, then I'll code it up :-) But that doesn't help the "generic" case, unfortunately. Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein at lyra.org Sat Jun 2 07:55:33 2001 From: gstein at lyra.org (Greg Stein) Date: Fri, 1 Jun 2001 22:55:33 -0700 Subject: [Python-Dev] strop vs. string In-Reply-To: <3B0FD023.C4588919@lemburg.com>; from mal@lemburg.com on Sat, May 26, 2001 at 05:47:47PM +0200 References: <3B0FD023.C4588919@lemburg.com> Message-ID: <20010601225533.T23560@lyra.org> On Sat, May 26, 2001 at 05:47:47PM +0200, M.-A. Lemburg wrote: >... > Even the idea of replacing the usage of strings as data buffers > with buffer object didn't get very far; common habits are simply > hard to break. That idea was shot down when Guido said that 'c' arrays should be the "official form of a data buffer." Cheers, -g -- Greg Stein, http://www.lyra.org/ From tim.one at home.com Sat Jun 2 08:13:49 2001 From: tim.one at home.com (Tim Peters) Date: Sat, 2 Jun 2001 02:13:49 -0400 Subject: [Python-Dev] another dict crasher In-Reply-To: Message-ID: [Michael Hudson] > Actually this crash was dict_print (I always forget about tp_print...). We all should . 
> It's pretty easy to mend: > > *** dictobject.c Fri Jun 1 13:08:13 2001 > --- dictobject.c-fixed Fri Jun 1 12:59:07 2001 > *************** > *** 793,795 **** > any = 0; > ! for (i = 0, ep = mp->ma_table; i < mp->ma_size; i++, ep++) { > if (ep->me_value != NULL) { > --- 793,796 ---- > any = 0; > ! for (i = 0; i < mp->ma_size; i++) { > ! ep = &mp->ma_table[i]; > if (ep->me_value != NULL) { > *************** > *** 833,835 **** > any = 0; > ! for (i = 0, ep = mp->ma_table; i < mp->ma_size && v; i++, ep++) { > if (ep->me_value != NULL) { > --- 834,837 ---- > any = 0; > ! for (i = 0; i < mp->ma_size && v; i++) { > ! ep = &mp->ma_table[i]; > if (ep->me_value != NULL) { > > I'm not sure this stops still more Machiavellian behaviour from > crashing the interpreter, Alas, it doesn't. You can't trust *anything* about a container you're iterating over across any call that may call back into Python. In these cases, the call to PyObject_Repr() can execute any code at all, including code that mutates the dict you're crawling over. In particular, calling PyObject_Repr() to format the key means the ep = &mp->ma_table[i] pointer may be trash by the time PyObject_Repr() is called again to format the value. See characterize() for the pain it takes to guard against everything, including encouraging comments like: if (cmp > 0 || i >= a->ma_size || a->ma_table[i].me_value == NULL) { /* Not the *smallest* a key; or maybe it is * but the compare shrunk the dict so we can't * find its associated value anymore; or * maybe it is but the compare deleted the * a[thiskey] entry. */ Py_DECREF(thiskey); continue; } It should really add "or maybe it just shuffled the dict around and the value at ma_table[i] is no longer associated with the key that *used* to be at ma_table[i], but since there's still *some* non-NULL pointer there we'll just pretend that didn't happen and press onward". > and you can certainly get items being printed more than once or not > at all. I'm not sure this last is a problem; Those don't matter: in a long tradition, we buy "safety" not only at the cost of bloating the code, but also by making the true behavior in case of mutation unpredictable & inexplicable. That's why I *really* liked the "immutable list" trick in list.sort(): even if we could have made the code bulletproof without it, we couldn't usefully explain what the heck it actually did. It's not Pythonic to blow up, but neither is it Pythonic to be incomprehensible. You simply can't win here. > if the user's being this contrary there's only so much we can > do to help him or her. I'd prefer a similar internal immutable-dict trick that raised an exception if the user was pushing Python into a corner where "blow up or do something baffling" were its only choices. That would render the original example illegal, of course. But would that be a bad thing? What *should* it mean when the user invokes an operation on a container and mutates the container during that operation? There's almost no chance that Jython does the same thing as CPython in all these cases, so it's effectively undefined behavior no matter how you plug the holes (short of raising an exception). From tim.one at home.com Sat Jun 2 08:34:43 2001 From: tim.one at home.com (Tim Peters) Date: Sat, 2 Jun 2001 02:34:43 -0400 Subject: [Python-Dev] strop vs. 
string In-Reply-To: <20010601225203.R23560@lyra.org> Message-ID: [Tim] > If after > > b = buffer(some_object) > > b.__getitem__ needed to refetch the info between > > b[i] > and > b[i+1] > > I expect it would be so slow even Greg wouldn't want it anymore. [Greg] > Huh? I don't think it would be all that slow. It is just a function > call. And I don't think that the getitem slot is really used all that > frequently (in a loop) for buffer type objects. I expect they index into the buffer memory directly then, right? Then for buffers obtained from mutable objects, any such loop is unsafe in the absence of the GIL, or even in its presence if the loop contains code that may call back into Python. > I've been thinking that refetching the ptr/len is the right fix. So is calling __getitem__ all the time then, unless you want to dance on the razor's edge. The idea that you can safely "borrow" memory from a mutable object without copying it is brittle. > Part of the neglect is also based on Guido's ambivalence. Part is > that I haven't needed more from it. The day that I do, then I'll > code it up :-) But that doesn't help the "generic" case, > unfortunately. I take that as "yes" to my "nobody cares about it enough to maintain it?". In that light, Guido's ambivalence is indeed surprising .

From mwh at python.net Sat Jun 2 09:09:07 2001 From: mwh at python.net (Michael Hudson) Date: 02 Jun 2001 08:09:07 +0100 Subject: [Python-Dev] another dict crasher In-Reply-To: "Tim Peters"'s message of "Sat, 2 Jun 2001 02:13:49 -0400" References: Message-ID: "Tim Peters" writes: > [Michael Hudson] > > Actually this crash was dict_print (I always forget about tp_print...). > > We all should . > > > It's pretty easy to mend: [snip] > > I'm not sure this stops still more Machiavellian behaviour from > > crashing the interpreter, > > Alas, it doesn't. No, that's what my "dict[Machiavelli()] = Machiavelli()" example was demonstrating. If no one beats me to it, I'll post a better fix to sf next week, complete with test-cases and suitably "encouraging" comments. I can't easily see other examples of the problem; there certainly might be things you could do with comparisons that could trigger crashes, but that code's so hairy that it's almost impossible for me to be sure. There are two ways of constructing a software design: one way is to make it so simple that there are obviously no deficiencies and the other way is to make it so complicated that there are no obvious deficiencies. -- C. A. R. Hoare > > and you can certainly get items being printed more than once or not > > at all. I'm not sure this last is a problem; > > Those don't matter: in a long tradition, we buy "safety" not only at the > cost of bloating the code, but also by making the true behavior in case of > mutation unpredictable & inexplicable. This is what I thought. [snip] > > if the user's being this contrary there's only so much we can > > do to help him or her. > > I'd prefer a similar internal immutable-dict trick that raised an exception > if the user was pushing Python into a corner where "blow up or do something > baffling" were its only choices. That would render the original example > illegal, of course. But would that be a bad thing? It's hard to see how. > What *should* it mean when the user invokes an operation on a > container and mutates the container during that operation? I don't think there's a meaning you can attach to this kind of behaviour.
The "immutable dict trick" looks better the more I think about it, but I guess that will have to wait until Guido gets back from the sun... Cheers, M. -- incidentally, asking why things are "left out of the language" is a good sign that the asker is fairly clueless. -- Erik Naggum, comp.lang.lisp

From gstein at lyra.org Sat Jun 2 09:40:05 2001 From: gstein at lyra.org (Greg Stein) Date: Sat, 2 Jun 2001 00:40:05 -0700 Subject: [Python-Dev] strop vs. string In-Reply-To: ; from tim.one@home.com on Sat, Jun 02, 2001 at 02:34:43AM -0400 References: <20010601225203.R23560@lyra.org> Message-ID: <20010602004005.F23560@lyra.org> On Sat, Jun 02, 2001 at 02:34:43AM -0400, Tim Peters wrote: > [Tim] > > If after > > > > b = buffer(some_object) > > > > b.__getitem__ needed to refetch the info between > > > > b[i] > > and > > b[i+1] > > > > I expect it would be so slow even Greg wouldn't want it anymore. > > [Greg] > > Huh? I don't think it would be all that slow. It is just a function > > call. And I don't think that the getitem slot is really used all that > > frequently (in a loop) for buffer type objects. > > I expect they index into the buffer memory directly then, right? Then for > buffers obtained from mutable objects, any such loop is unsafe in the > absence of the GIL, or even in its presence if the loop contains code that > may call back into Python. Most access is: fetch ptr/len, index into the memory. And yes: anything within that loop which could conceivably change the target object (especially a call into Python) could move that ptr. I was saying that, at the Python level, using a loop and doing b[i] into a buffer/string/unicode object would seem to be relatively rare. b[0] and stuff is reasonably common. > > I've been thinking that refetching the ptr/len is the right fix. > > So is calling __getitem__ all the time then, unless you want to dance on the > razor's edge. The idea that you can safely "borrow" memory from a mutable > object without copying it is brittle. Stay in C code and don't call into Python. It is safe then. The buffer API is exactly what you're saying: borrow a memory reference. The concept makes a lot of things possible that weren't before. The buffer object's storing of that reference was a mistake. > > Part of the neglect is also based on Guido's ambivalence. Part is > > that I haven't needed more from it. The day that I do, then I'll > > code it up :-) But that doesn't help the "generic" case, > > unfortunately. > > I take that as "yes" to my "nobody cares about it enough to maintain it?". > In that light, Guido's ambivalence is indeed surprising . Eh? I'll maintain the thing, but you're confusing that with adding more features into it. Different question. Cheers, -g -- Greg Stein, http://www.lyra.org/

From tim.one at home.com Sat Jun 2 10:17:39 2001 From: tim.one at home.com (Tim Peters) Date: Sat, 2 Jun 2001 04:17:39 -0400 Subject: [Python-Dev] another dict crasher In-Reply-To: Message-ID: [Michael Hudson] > ... > If no one beats me to it, I'll post a better fix to sf next week, > complete with test-cases and suitably "encouraging" comments. Ah, no need -- looks like I was doing that while you were writing this. Checked in already. So long as we're happy to settle for senseless results that simply don't blow up, the only other trick you really needed was to save away the value in a local vrbl and incref it across the key->string bit; then you don't have to worry about key->string deleting the value, or about the table entry it lived in going away (because you get the value from the (still-incref'ed) *local* vrbl later, not from the table again). > I can't easily see other examples of the problem; there certainly > might be things you could do with comparisons that could trigger > crashes, but that code's so hairy that it's almost impossible for me > to be sure. It's easy to be sure: any code that tries to remember anything about a dict (ditto any mutable object) across a "dangerous" call, other than the mere address of the object, is a place you *can* provoke a core dump. It may not be easy to provoke, and a given provoking test case may not fail across all platforms, or even every time you run it on a single platform, but it's "an obvious" hole all the same.
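In code, the pattern Tim is describing looks roughly like this (a fragment-style sketch with hypothetical helpers, not the checked-in diff):

PyObject *key, *value;
int status;

key = mp->ma_table[i].me_key;
value = mp->ma_table[i].me_value;
Py_INCREF(key);     /* our own references: the callback cannot */
Py_INCREF(value);   /* free these out from under us */

status = repr_the_key(key);    /* may run arbitrary Python code */
/* mp->ma_table may have been resized and freed by now: from here
   on, use the saved locals, never ma_table[i] again. */
if (status >= 0)
    status = repr_the_value(value);
Py_DECREF(key);
Py_DECREF(value);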
From tismer at tismer.com Sat Jun 2 11:49:35 2001 From: tismer at tismer.com (Christian Tismer) Date: Sat, 02 Jun 2001 11:49:35 +0200 Subject: [Python-Dev] Marshal bug in 2.1? References: <3B17DEF7.3E7C6BC6@tismer.com> <3B17E326.41D82CCE@tismer.com> <3B17E692.281A329B@lemburg.com> <20010602004716.R690@xs4all.nl> Message-ID: <3B18B6AE.88EA6926@tismer.com> Thomas Wouters wrote: > > On Fri, Jun 01, 2001 at 09:01:38PM +0200, M.-A. Lemburg wrote: > > > Yes, not surprisingly though... AFAIK the pyc format changed > > in every single version between 1.5.2 and 2.1. > > Worse, it's changed several times between each release :) But I didn't use .pyc at all, just a marshalled code object. There are no version headers or such. The same object worked in fact for Py 1.5.2 and 2.0, but no longer with 2.1 . I debugged the unmarshalling and saw what happened: The new code objects with their new scoping features were the problem. The new structures were simply added, and there is no way to skip these for older code objects, since there isn't any info. Some option for marshal to unmarshal old-style code objects would have helped. But then, I'm not sure if the opcodes are still assigned the same way in 2.1, or if there was some movement? This would kill it anyway. ciao - chris (now looking for another cheap way to do something invisible in Python without installing *anything* ) -- Christian Tismer :^) Mission Impossible 5oftware : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net/ 14163 Berlin : PGP key -> http://wwwkeys.pgp.net/ PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? http://www.stackless.com/

From mal at lemburg.com Sat Jun 2 13:09:13 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Sat, 02 Jun 2001 13:09:13 +0200 Subject: [Python-Dev] Marshal bug in 2.1? References: <3B17DEF7.3E7C6BC6@tismer.com> <3B17E326.41D82CCE@tismer.com> <3B17E692.281A329B@lemburg.com> <20010602004716.R690@xs4all.nl> <3B18B6AE.88EA6926@tismer.com> Message-ID: <3B18C958.598A9891@lemburg.com> Christian Tismer wrote: > > Thomas Wouters wrote: > > > > On Fri, Jun 01, 2001 at 09:01:38PM +0200, M.-A. Lemburg wrote: > > > > > Yes, not surprisingly though... AFAIK the pyc format changed > > > in every single version between 1.5.2 and 2.1. > > > > Worse, it's changed several times between each release :) > > But I didn't use .pyc at all, just a marshalled code object.
That's the point: the header in pyc files is meant to signal the incompatibility of the following code object. Perhaps we should move this version information into the marshal format of code objects themselves... > There are no version headers or such. > The same object worked in fact for Py 1.5.2 and 2.0, but no > longer with 2.1 . > I debugged the unmarshalling and saw what happened: > The new code objects with their new scoping features were > the problem. The new structures were simply added, and there > is no way to skip these for older code objects, since there > isn't any info. > Some option for marshal to unmarshal old-style code objects > would have helped. > But then, I'm not sure if the opcodes are still assigned > the same way in 2.1, or if there was some movement? This would > kill it anyway. AFAIK, the assignments did not change, but several opcodes were added in 2.1, so code compiled in 2.1 will not run in 2.0. > ciao - chris > > (now looking for another cheap way to do something invisible in > Python without installing *anything* ) Why don't you use freeze or py2exe or Gordon's installer for these one file executables ? Alternatively, you should check the Python version and make sure that it matches the one used for compiling the byte code. -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/

From mwh at python.net Sat Jun 2 13:40:56 2001 From: mwh at python.net (Michael Hudson) Date: 02 Jun 2001 12:40:56 +0100 Subject: [Python-Dev] another dict crasher In-Reply-To: "Tim Peters"'s message of "Sat, 2 Jun 2001 04:17:39 -0400" References: Message-ID: "Tim Peters" writes: > > I can't easily see other examples of the problem; there certainly > > might be things you could do with comparisons that could trigger > > crashes, but that code's so hairy that it's almost impossible for me > > to be sure. > > It's easy to be sure: any code that tries to remember anything about a dict > (ditto any mutable object) across a "dangerous" call, other than the mere > address of the object, is a place you *can* provoke a core dump. It may not > be easy to provoke, and a given provoking test case may not fail across all > platforms, or even every time you run it on a single platform, but it's "an > obvious" hole all the same. Ah, like this one:

dict = {}

# let's force dict to malloc its table
for i in range(1,10):
    dict[i] = i

class Machiavelli2:
    def __eq__(self, other):
        dict.clear()
        return 1
    def __hash__(self):
        return 0

dict[Machiavelli2()] = Machiavelli2()

print dict[Machiavelli2()]

I'll attach a patch, but it's another branch inside lookdict (though not lookdict_string which is I guess the really performance sensitive one). Cheers, M.
Index: dictobject.c
===================================================================
RCS file: /cvsroot/python/python/dist/src/Objects/dictobject.c,v
retrieving revision 2.100
diff -c -1 -r2.100 dictobject.c
*** dictobject.c	2001/06/02 08:27:39	2.100
--- dictobject.c	2001/06/02 11:36:47
***************
*** 273,274 ****
--- 273,281 ----
  	cmp = PyObject_RichCompareBool(ep->me_key, key, Py_EQ);
+ 	if (ep0 != mp->ma_table) {
+ 		PyErr_SetString(PyExc_RuntimeError,
+ 				"dict resized on comparison");
+ 		ep = mp->ma_table;
+ 		while (ep->me_value) ep++;
+ 		return ep;
+ 	}
  	if (cmp > 0) {
***************
*** 310,311 ****
--- 317,325 ----
  	cmp = PyObject_RichCompareBool(ep->me_key, key, Py_EQ);
+ 	if (ep0 != mp->ma_table) {
+ 		PyErr_SetString(PyExc_RuntimeError,
+ 				"dict resized on comparison");
+ 		ep = mp->ma_table;
+ 		while (ep->me_value) ep++;
+ 		return ep;
+ 	}
  	if (cmp > 0) {

Here's another test case to work out the second of those new if statements:

dict = {}

# let's force dict to malloc its table
for i in range(1,10):
    dict[i] = i

class Machiavelli3:
    def __init__(self, id):
        self.id = id
    def __eq__(self, other):
        if self.id == other.id:
            dict.clear()
            return 1
        else:
            return 0
    def __repr__(self):
        return "%s(%s)"%(self.__class__.__name__, self.id)
    def __hash__(self):
        return 0

dict[Machiavelli3(1)] = Machiavelli3(0)
dict[Machiavelli3(2)] = Machiavelli3(0)

print dict[Machiavelli3(2)]

-- M-x psych[TAB][RETURN] -- try it

From pedroni at inf.ethz.ch Sat Jun 2 20:58:55 2001 From: pedroni at inf.ethz.ch (Samuele Pedroni) Date: Sat, 2 Jun 2001 20:58:55 +0200 Subject: [Python-Dev] What should changing/setting __getattr__ (and similars) after classdef time do ? Message-ID: <004d01c0eb96$24b5f460$8a73fea9@newmexico> Hi. Is this a case that only the BDFL could know and pronounce on ... or I'm missing something ... Thanks for any feedback, Samuele Pedroni. ----- Original Message ----- From: Samuele Pedroni To: Sent: Friday, June 01, 2001 2:49 PM Subject: [Python-Dev] __xxxattr__ caching semantic > Hi. > > What is the intended semantic wrt __xxxattr__ caching:
>
> class X:
>     pass
>
> def cga(self,name):
>     print name
>
> def iga(name):
>     print name
>
> x=X()
> x.__dict__['__getattr__'] = iga # 1.
> x.__getattr__ = iga # 2.
> X.__dict__['__getattr__'] = cga # 3.
> X.__getattr__ = cga # 4.
> x.a
>
> for the manual > > http://www.python.org/doc/current/ref/customization.html > > with all the variants x.a should fail, they should have > no effect. In practice 4. works. > > Is that an implementation manual mismatch, is this intended, is there > code around using 4. ? > > I'm asking this because jython has differences/bugs in this respect. > > I imagine that 1.-4. should work for all other __magic__ methods > (this should be fixed in jython for some methods), > OTOH jython has such a restriction on __del__ too, and this one cannot > be removed (is not simply a matter of caching/non caching). > > regards, Samuele Pedroni. > > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev >

From tim.one at home.com Sun Jun 3 00:57:57 2001 From: tim.one at home.com (Tim Peters) Date: Sat, 2 Jun 2001 18:57:57 -0400 Subject: [Python-Dev] What should changing/setting __getattr__ (and similars) after classdef time do ? In-Reply-To: <004d01c0eb96$24b5f460$8a73fea9@newmexico> Message-ID: [Samuele Pedroni] > Is this a case that only the BDFL could know and pronounce on ... > or I'm missing something ...
The referenced URL http://www.python.org/doc/current/ref/customization.html appears irrelevant to me, so unsure what you're asking about. Perhaps http://www.python.org/doc/current/ref/attribute-access.html was intended? If so, the

    these methods are cached in the class object at class
    definition time; therefore, they cannot be changed after
    the class definition is executed.

there doesn't mean exactly what it says: it's trying to say that the __XXXattr__ methods *inherited from base classes* (if any) are cached in the class object at class definition time, so that changing them in the base classes later has no effect on the derived class. It should be clearer. A direct class setattr can still change them; indirect assignment via class.__dict__ is ineffective for the __dict__, __bases__, __name__, __getattr__, __setattr__ and __delattr__ class attributes (yes, you'll create a dict entry then, but class getattr doesn't look in the dict to get the value of these specific keys). Didn't understand the program snippet. Much of this is due to hoary optimizations and I agree is ill-documented. I hope Guido's current rework of all this stuff will leave the endcases more explainable. > ----- Original Message ----- > From: Samuele Pedroni > To: > Sent: Friday, June 01, 2001 2:49 PM > Subject: [Python-Dev] __xxxattr__ caching semantic > > > Hi. > > What is the intended semantic wrt __xxxattr__ caching:
> >
> > class X:
> >     pass
> >
> > def cga(self,name):
> >     print name
> >
> > def iga(name):
> >     print name
> >
> > x=X()
> > x.__dict__['__getattr__'] = iga # 1.
> > x.__getattr__ = iga # 2.
> > X.__dict__['__getattr__'] = cga # 3.
> > X.__getattr__ = cga # 4.
> > x.a
> >
> for the manual > > http://www.python.org/doc/current/ref/customization.html > > with all the variants x.a should fail, they should have > no effect. In practice 4. works. > > Is that an implementation manual mismatch, is this intended, is there > code around using 4. ? > > I'm asking this because jython has differences/bugs in this respect. > > I imagine that 1.-4. should work for all other __magic__ methods > (this should be fixed in jython for some methods), > OTOH jython has such a restriction on __del__ too, and this one cannot > be removed (is not simply a matter of caching/non caching).

From pedroni at inf.ethz.ch Sun Jun 3 01:46:42 2001 From: pedroni at inf.ethz.ch (Samuele Pedroni) Date: Sun, 3 Jun 2001 01:46:42 +0200 Subject: [Python-Dev] What should changing/setting __getattr__ (and similars) after classdef time do ? References: Message-ID: <001801c0ebbe$47b60a40$8a73fea9@newmexico> Hi. Thanks a lot for the answer, and sorry for the ill-formed question. [Tim Peters] > [Samuele Pedroni] > > Is this a case that only the BDFL could know and pronounce on ... > > or I'm missing something ... > > The referenced URL > > http://www.python.org/doc/current/ref/customization.html > > appears irrelevant to me, so unsure what you're asking about. Perhaps > > http://www.python.org/doc/current/ref/attribute-access.html > > was intended? If so, the > these methods are cached in the class object at class > definition time; therefore, they cannot be changed after > the class definition is executed. > > there doesn't mean exactly what it says: it's trying to say that the > __XXXattr__ methods *inherited from base classes* (if any) are cached in the > class object at class definition time, so that changing them in the base > classes later has no effect on the derived class. It should be clearer.
> A direct class setattr can still change them; indirect assignment via > class.__dict__ is ineffective for the __dict__, __bases__, __name__, > __getattr__, __setattr__ and __delattr__ class attributes (yes, you'll create > a dict entry then, but class getattr doesn't look in the dict to get the > value of these specific keys). > This matches what I understood reading CPython C code (yes I did that too ), and what the snippets were trying to point out. And I see the problem with derived classes too. > Didn't understand the program snippet. Sorry it is not one snippet, but the 4 variants should be considered independently. > > Much of this is due to hoary optimizations and I agree is ill-documented. I > hope Guido's current rework of all this stuff will leave the endcases more > explainable. That will be a lot of work for porting it to jython . In any case the manual is really not clear (euphemism ) about this. The point is that jython implements the letter of the manual, and even extends the caching opt to some other __magic__ methods. I wanted to know the intended behaviour in order to fix that in jython. regards Samuele Pedroni.

From tim.one at home.com Sun Jun 3 01:56:34 2001 From: tim.one at home.com (Tim Peters) Date: Sat, 2 Jun 2001 19:56:34 -0400 Subject: [Python-Dev] What should changing/setting __getattr__ (and similars) after classdef time do ? In-Reply-To: <001801c0ebbe$47b60a40$8a73fea9@newmexico> Message-ID: [Samuele Pedroni] > ... > The point is that jython implements the letter of the manual, and even > extends the caching opt to some other __magic__ methods. I wanted to > know the intended behaviour in order to fix that in jython. You got that one right the first time: this requires BDFL pronouncement! As semantically significant optimizations (the only reason for caching __getattr__, e.g.) creep into the code but the docs lag behind, it gets more and more unclear what's mandatory behavior and what's implementation-defined. This came up a couple weeks ago again in the context of what, exactly, rich comparisons are supposed to do in all cases. After poking holes in everything Guido wrote, he turned it around and told me to write up what I think it should say (which I have yet to do, as it's time-consuming and it appears some of the current CPython behavior is at least partly accidental -- but unclear exactly which parts). So don't be surprised if the same trick gets played on you ...

From tim.one at home.com Sun Jun 3 06:04:57 2001 From: tim.one at home.com (Tim Peters) Date: Sun, 3 Jun 2001 00:04:57 -0400 Subject: [Python-Dev] another dict crasher In-Reply-To: Message-ID: [Michael Hudson]
> Ah, like this one:
>
> dict = {}
>
> # let's force dict to malloc its table
> for i in range(1,10):
>     dict[i] = i
>
> class Machiavelli2:
>     def __eq__(self, other):
>         dict.clear()
>         return 1
>     def __hash__(self):
>         return 0
>
> dict[Machiavelli2()] = Machiavelli2()
>
> print dict[Machiavelli2()]

Told you it was easy . > I'll attach a patch, but it's another branch inside lookdict (though > not lookdict_string which is I guess the really performance sensitive > one). lookdict_string is crucial to Python's own performance. Dicts indexed by ints or class instances or ... are vital to other apps.
> Index: dictobject.c > =================================================================== > RCS file: /cvsroot/python/python/dist/src/Objects/dictobject.c,v > retrieving revision 2.100 > diff -c -1 -r2.100 dictobject.c > *** dictobject.c 2001/06/02 08:27:39 2.100 > --- dictobject.c 2001/06/02 11:36:47 > *************** > *** 273,274 **** > --- 273,281 ---- > cmp = > PyObject_RichCompareBool(ep->me_key, key, Py_EQ); > + if (ep0 != mp->ma_table) { > + PyErr_SetString(PyExc_RuntimeError, > + "dict resized on > comparison"); > + ep = mp->ma_table; > + while (ep->me_value) ep++; > + return ep; > + } > if (cmp > 0) { > *************** > *** 310,311 **** > --- 317,325 ---- > cmp = > PyObject_RichCompareBool(ep->me_key, key, Py_EQ); > + if (ep0 != mp->ma_table) { > + PyErr_SetString(PyExc_RuntimeError, > + "dict resized on > comparison"); > + ep = mp->ma_table; > + while (ep->me_value) ep++; > + return ep; > + } > if (cmp > 0) { Then we have other problems. Note the comment before lookdict: Exceptions are never reported by this function, and outstanding exceptions are maintained. The patched code doesn't preserve that. Looking for "the first" unused or dummy slot isn't good enough either, as surely the user has the right to expect that after, e.g., d[m] = 1, d[m] retrieves 1. That is, picking a reusable slot "at random" doesn't respect the *semantics* of dict operations ("just because" the dict resized doesn't mean the key they're looking for went away!). It would be better in this case to go back to the top and start over. However, then an adversarial user can construct a case that never terminates. Unclear what to do. From tim.one at home.com Sun Jun 3 09:55:43 2001 From: tim.one at home.com (Tim Peters) Date: Sun, 3 Jun 2001 03:55:43 -0400 Subject: [Python-Dev] strop vs. string In-Reply-To: <20010602004005.F23560@lyra.org> Message-ID: [Greg Stein] > ... > I was saying that, at the Python level, using a loop and doing b[i] into > a buffer/string/unicode object would seem to be relatively rare. b[0] > and stuff is reasonably common. Well, at the Python level buffer objects seem never to be used, probably because all the people who know about them don't advertise it because it's an easy way to provoke core dumps now. I don't have any real objection to any way anyone wants to fix that, just so long as it gets fixed. >> I take that as "yes" to my "nobody cares about it enough to >> maintain it?". In that light, Guido's ambivalence is indeed >> surprising . > Eh? I'll maintain the thing, but you're confusing that with adding more > features into it. Different question. I haven't asked for new features, just that what's already there get fixed: Python-level buffer objects are unsafe, the docs remain incomplete, there's random stuff like file.readinto() that's not documented at all (could be that's the only one -- it's certainly "discovered" on c.l.py often enough, though), and there are no buffer tests in the std test suite. The work to introduce the type wasn't completed, nobody works on it, and finishing work 3 years late doesn't count as "new feature" in my book . From gstein at lyra.org Sun Jun 3 11:10:36 2001 From: gstein at lyra.org (Greg Stein) Date: Sun, 3 Jun 2001 02:10:36 -0700 Subject: [Python-Dev] strop vs. string In-Reply-To: ; from tim.one@home.com on Sun, Jun 03, 2001 at 03:55:43AM -0400 References: <20010602004005.F23560@lyra.org> Message-ID: <20010603021036.U23560@lyra.org> On Sun, Jun 03, 2001 at 03:55:43AM -0400, Tim Peters wrote: > [Greg Stein] > > ... 
> > I was saying that, at the Python level, using a loop and doing b[i] into > > a buffer/string/unicode object would seem to be relatively rare. b[0] > > and stuff is reasonably common. > > Well, at the Python level buffer objects seem never to be used, probably I'm talking about string objects and unicode objects, too. The point is that b[i] loops don't have to be all that speedy because it isn't used often. > because all the people who know about them don't advertise it because it's > an easy way to provoke core dumps now. Easy? Depends on what you use them with. >... > >> I take that as "yes" to my "nobody cares about it enough to > >> maintain it?". In that light, Guido's ambivalence is indeed > >> surprising . > > > Eh? I'll maintain the thing, but you're confusing that with adding more > > features into it. Different question. > > I haven't asked for new features, just that what's already there get fixed: > Python-level buffer objects are unsafe, the docs remain incomplete, I'll fix the code. > there's > random stuff like file.readinto() that's not documented at all (could be > that's the only one -- it's certainly "discovered" on c.l.py often enough, > though), Find another goat to screw for that one. I don't know anything about it. Hmm... Using the "annotate" feature of ViewCVS, I see that Guido added it. Go blame him if you want to scream about that function and its lack of doc. > and there are no buffer tests in the std test suite. The work to > introduce the type wasn't completed, nobody works on it, and finishing work > 3 years late doesn't count as "new feature" in my book . Now you're just being bothersome. You want all that stuff, then feel free. I'll volunteer to do the code. You can go beat some heads, or find other volunteers. I'll do the code fixing just to placate you, and to get all this ranting about the buffer object to quiet down, but not because I'm joyful to do it. not-cheers, -g -- Greg Stein, http://www.lyra.org/ From dgoodger at bigfoot.com Sun Jun 3 16:39:42 2001 From: dgoodger at bigfoot.com (David Goodger) Date: Sun, 03 Jun 2001 10:39:42 -0400 Subject: [Python-Dev] new PEP candidates Message-ID: I have just posted three related PEP candidates to the Doc-SIG: - PEP: Docstring Processing System Framework http://mail.python.org/pipermail/doc-sig/2001-June/001855.html - PEP: DPS Generic Implementation Details http://mail.python.org/pipermail/doc-sig/2001-June/001856.html - PEP: Docstring Conventions http://mail.python.org/pipermail/doc-sig/2001-June/001857.html These are all part of the newly created Python Docstring Processing System project, http://docstring.sf.net. Barry: Please assign PEP numbers to these if possible. Once PEP numbers have been assigned, I will post to comp.lang.python. Thanks. A related project is the second draft of reStructuredText, a docstring markup syntax definition. The project is http://structuredtext.sf.net, and I've posted the following to Doc-SIG: - An Introduction to reStructuredText http://mail.python.org/pipermail/doc-sig/2001-June/001858.html - Problems With StructuredText http://mail.python.org/pipermail/doc-sig/2001-June/001859.html - reStructuredText Markup Specification http://mail.python.org/pipermail/doc-sig/2001-June/001860.html - Python Extensions to the reStructuredText Markup Specification http://mail.python.org/pipermail/doc-sig/2001-June/001861.html I am not seeking PEP status for reStructuredText at this time; I think it's one step too far removed from the Python language to warrant a PEP. 
If you think it *should* be a PEP, I will be happy to convert it. -- David Goodger dgoodger at bigfoot.com Open-source projects: - Python Docstring Processing System: http://docstring.sf.net - reStructuredText: http://structuredtext.sf.net - The Go Tools Project: http://gotools.sf.net

From mwh at python.net Sun Jun 3 23:47:48 2001 From: mwh at python.net (Michael Hudson) Date: 03 Jun 2001 22:47:48 +0100 Subject: [Python-Dev] another dict crasher In-Reply-To: "Tim Peters"'s message of "Sun, 3 Jun 2001 00:04:57 -0400" References: Message-ID: "Tim Peters" writes: > It would be better in this case to go back to the top and start > over. Yes. What you checked in is obviously better. I'll stick to being the bearer of bad tidings... > However, then an adversarial user can construct a case that never > terminates. I seem to have done this - it was odd, though - it only loops when I bump the dict to fairly enormous proportions for reasons I don't really (want to) understand. > Unclear what to do. Not worrying about it seems entirely reasonable - I now have sitting on my hard drive the weirdest way of spelling "while 1: pass" *I've* ever seen. and-I'll-stop-poking-holes-now-ly y'rs m. -- The rapid establishment of social ties, even of a fleeting nature, advance not only that goal but its standing in the uberconscious mesh of communal psychic, subjective, and algorithmic interbeing. But I fear I'm restating the obvious. -- Will Ware, comp.lang.python

From tim.one at home.com Mon Jun 4 01:03:31 2001 From: tim.one at home.com (Tim Peters) Date: Sun, 3 Jun 2001 19:03:31 -0400 Subject: [Python-Dev] another dict crasher In-Reply-To: Message-ID: [Tim] >> It would be better in this case to go back to the top and start >> over. [Michael Hudson] > Yes. What you checked in is obviously better. I'll stick to being > the bearer of bad tidings... Hey, if it's fun, do whatever you want! If you hadn't provoked me, I would have let it slide. Guido only cares about the end result . >> However, then an adversarial user can construct a case that never >> terminates. > I seem to have done this - it was odd, though - it only loops when I > bump the dict to fairly enormous proportions for reasons I don't > really (want to) understand. Pass it on. I deliberately "started over" via a recursive call instead of a goto so that an offending program would eventually die with a stack fault instead of just running forever. So if you're seeing something run forever, it may be a different problem. >> Unclear what to do. > Not worrying about it seems entirely reasonable I don't think anyone is happy leaving an exploitable hole in Python -- we endure enormous pain to plug those. Except, I guess, for buffer objects . I simply haven't thought of a good and efficient way to plug this one. Implementing an "internal immutable dict" type appeals to me, but it conflicts with the fact that the affected routines believe to the core of their souls that exceptions raised during comparisons are to be ignored -- and raising a "hey, you can't change the dict *now*!" exception doesn't do the user any good if they never see it. Would plug the hole, but an *innocent* user would never know why their program failed to work as (probably) expected.

From tim.one at home.com Mon Jun 4 02:38:53 2001 From: tim.one at home.com (Tim Peters) Date: Sun, 3 Jun 2001 20:38:53 -0400 Subject: [Python-Dev] strop vs.
string In-Reply-To: <20010603021036.U23560@lyra.org> Message-ID: [Tim] >> because all the people who know about them don't advertise it >> because it's an easy way to provoke core dumps now. [Greg Stein] > Easy? Depends on what you use them with. "Easy" and "depends" both, sure. I don't understand the argument: core dumps are always presumed to be errors in the Python implementation, not the user's fault. In this case, they are Python's fault by any accounting. On rare occasions we just give up and say "sorry, but we simply don't know a reasonable way to fix it -- but it's still Python's fault" (for example, see the dict thread this weekend). >> I haven't asked for new features, just that what's already there get >> fixed: Python-level buffer objects are unsafe > I'll fix the code. Thank you! >> the docs remain incomplete, there's random stuff like file.readinto() >> that's not documented at all (could be that's the only one -- it's >> certainly "discovered" on c.l.py often enough, though), > Find another goat to screw for that one. I don't know anything about it. > > Hmm... Using the "annotate" feature of ViewCVS, I see that Guido > added it. Go blame him if you want to scream about that function and > its lack of doc. I don't care who added it: I haven't asked anyone specific to do anything. I've been asking whether *anyone* cares enough to address the backlog of buffer maintenance work. I don't even know who dreamed up the buffer object -- although at this point I bet I can guess . >> and there are no buffer tests in the std test suite. The work to >> introduce the type wasn't completed, nobody works on it, and >> finishing work 3 years late doesn't count as "new feature" in my book > Now you're just being bothersome. You bet. It's the same list of things I gave in my first msg; nobody volunteered to do any work then, so I repeated them. > You want all that stuff, then feel free. "All that stuff" is the minimum now required of new features. Buffers got in before Guido got tougher about this stuff, but if they're worth having at all then surely they're worth bringing up to current standards. > I'll volunteer to do the code. You can go beat some heads, or find other > volunteers. Anyone else care to chip in? > I'll do the code fixing just to placate you, and to get all this ranting > about the buffer object to quiet down, but not because I'm joyful > to do it. OK, I feel guilty -- but if that's enough to make you feel joyful again, the psychology here is just sick .

From Barrett at stsci.edu Mon Jun 4 15:22:14 2001 From: Barrett at stsci.edu (Paul Barrett) Date: Mon, 04 Jun 2001 09:22:14 -0400 Subject: [Python-Dev] strop vs. string References: <3B1214B3.9A4C295D@lemburg.com> Message-ID: <3B1B8B86.68E99328@STScI.Edu> "M.-A. Lemburg" wrote: > > Tim Peters wrote: > > > > [Tim] > > > About combining strop and buffers and strings, don't forget > > > unicodeobject.c: that's got oodles of basically duplicate code too. > > > /F suggested dealing with the minor differences via maintaining one > > > code file that gets compiled multiple times w/ appropriate #defines. > > > > [MAL] > > > Hmm, that only saves us a few kB in source, but certainly not > > > in the object files. > > > > That's not the point. Manually duplicated code blocks always get out of > > synch, as people fix bugs in, or enhance, one of them but don't even know > > about the others. /F brought this up after I pissed away a few hours trying > > to repair one of these in all places, and he noted that strop.replace() and > > string.replace() are woefully inefficient anyway. > > Ok, so what we'd need is a bunch of generic low-level string > operations: one set for 8-bit and one for 16-bit code. > > Looking at unicodeobject.c it seems that the section "Helpers" would > be a good start, plus perhaps a few bits from the method implementations > refactored to form a low-level string template library. > > Perhaps we should move this code into > a file stringhelpers.h which then gets included by stringobject.c > and unicodeobject.c with appropriate #defines set up for > 8-bit strings and for Unicode. > > > > The better idea would be making the types subclass from a generic > > > abstract string object -- I just don't know how this will be > > > possible with Guido's type patches. We'll just have to wait, > > > I guess.
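The compile-it-twice idea is easy to picture. A minimal sketch, with invented macro names (the thread itself only names the stringhelpers.h file):

/* stringhelpers.h: generic code, written against a few macros. */
STRINGLIB_CHAR *
STRINGLIB(find_char)(STRINGLIB_CHAR *s, int n, STRINGLIB_CHAR ch)
{
    int i;
    for (i = 0; i < n; i++) {
        if (s[i] == ch)
            return s + i;
    }
    return NULL;
}

/* In stringobject.c: */
#define STRINGLIB_CHAR unsigned char
#define STRINGLIB(name) string_##name
#include "stringhelpers.h"

/* In unicodeobject.c: */
#define STRINGLIB_CHAR Py_UNICODE
#define STRINGLIB(name) unicode_##name
#include "stringhelpers.h"

Each object file gets its own copy of the helper, compiled for its own character type, but there is only one body of source to fix when a bug turns up.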
/F brought this up after I pissed away a few hours trying > > to repair one of these in all places, and he noted that strop.replace() and > > string.replace() are woefully inefficient anyway. > > Ok, so what we'd need is a bunch of generic low-level string > operations: one set for 8-bit and one for 16-bit code. > > Looking at unicodeobject.c it seems that the section "Helpers" would > be a good start, plus perhaps a few bits from the method implementations > refactored to form a low-level string template library. > > Perhaps we should move this code into > a file stringhelpers.h which then gets included by stringobject.c > and unicodeobject.c with appropriate #defines set up for > 8-bit strings and for Unicode. > > > > The better idea would be making the types subclass from a generic > > > abstract string object -- I just don't know how this will be > > > possible with Guido's type patches. We'll just have to wait, > > > I guess. From fdrake at acm.org Mon Jun 4 16:07:37 2001 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Mon, 4 Jun 2001 10:07:37 -0400 (EDT) Subject: [Python-Dev] strop vs. string In-Reply-To: <3B1B8B86.68E99328@STScI.Edu> References: <3B1214B3.9A4C295D@lemburg.com> <3B1B8B86.68E99328@STScI.Edu> Message-ID: <15131.38441.301314.46009@cj42289-a.reston1.va.home.com> Paul Barrett writes: > From the discussion so far, it appears that the buffer object is > intended solely to support string-like objects. I've seen no mention > of their use for binary data objects, such as multidimensional arrays > and matrices. Will the buffer object also support these objects? If > no, then I suggest it be renamed to one that is less generic and more > descriptive. In a development version of my bindings to a Type-1 font rasterizer, I exposed a buffer interface to the resulting image data. Unfortunately, that code was lost and I've not had time to work that up again. I *think* that sort of thing was part of the intended application for the buffer interface, but I was not one of the "movers & shakers" for it, so I'm not entirely sure. > On the otherhand, if yes, then I think the buffer C/API needs to be > reimplemented, because the current design/implementation falls far > short of what I would expect for a buffer object. First, it is overly > complex: the support for multiple buffers does not appear necessary. > Second, the dangling pointer issue has not been resolved. I suggest I agree. From the discussions I remember, I don't recall a clear explanation of the need for "segmented" buffers. But that may just be a failing of my recollection. > the addition of lock flag which indicates that the data is currently > inaccessible, ie. that data and/or data pointer is in the process of > being modified. > > I would suggest the following structure to be much more useful for > char and binary data: > > typedef struct { > char* rf_pointer; > int rf_length; > int rf_access; /* read, write, etc. */ > int rf_lock; /* data is in use */ > int rf_flags; /* type of data; char, binary, unicode, etc. */ > } PyBufferProcs; I'm not sure about the "rf_flags" field -- I see two aspects that you seem to be describing, and wouldn't call either use a "flag". There's data type (characters, anonymous binary data, image data, etc.), and element size (1 byte, 2 bytes, variable width). Those values may or may not be associated with the specific buffer or the type implementing the buffer (I'd go with the specific buffer just to allow buffer types that support different flavors). 
> If I find some time, I'll prepare a PEP to air these issues, since
> they are very important to those of us working on and with
> multidimensional arrays.  We find the current buffer API lacking.

PEPs are good; I'll look forward to seeing it!

  -Fred

--
Fred L. Drake, Jr.
PythonLabs at Digital Creations

From skip at pobox.com  Mon Jun  4 18:29:53 2001
From: skip at pobox.com (Skip Montanaro)
Date: Mon, 4 Jun 2001 11:29:53 -0500
Subject: [Python-Dev] readline not getting built when .so symlink doesn't exist
Message-ID: <15131.46977.861815.323386@beluga.mojam.com>

I recently upgraded to Mandrake 8.0.  I find that the readline module is
no longer getting built.  When building, it builds rgbimg followed
immediately by crypt.  Readline, which is tested for in between, is not
built.  Apparently, it can't find one of the libraries required to build
it.  On my system, both readline and termcap are in /lib.  Neither has a
static version available and neither has a plain .so file available.
The .so file always has a version number tacked onto the end:

    % ls -l /lib/libtermcap* /lib/libreadline*
    lrwxrwxrwx 1 root root     18 May 29 10:53 /lib/libreadline.so.4 -> libreadline.so.4.1
    -rwxr-xr-x 1 root root 152440 Mar 25 01:26 /lib/libreadline.so.4.1
    lrwxrwxrwx 1 root root     19 May 29 10:53 /lib/libtermcap.so.2 -> libtermcap.so.2.0.8
    -rwxr-xr-x 1 root root  11608 Mar 26 10:32 /lib/libtermcap.so.2.0.8

If I create the necessary .so symlinks it builds okay.

Perhaps this is a bug in Mandrake 8.0 (it wouldn't be the first one),
but if it is valid for shared libraries to be installed with only a
version-numbered .so file, then it seems to me that distutils ought to
handle that.  There are several programs in /usr/bin on my machine that
seem to be dynamically linked to libreadline.  In addition,
/usr/lib/python2.0/lib-dynload/readline.so exists, which suggests that
the .so without a version number is valid as far as ld is concerned.

Skip

From Greg.Wilson at baltimore.com  Mon Jun  4 19:33:29 2001
From: Greg.Wilson at baltimore.com (Greg Wilson)
Date: Mon, 4 Jun 2001 13:33:29 -0400
Subject: [Python-Dev] struct.getorder() ?
Message-ID: <930BBCA4CEBBD411BE6500508BB3328F2E1F1D@nsamcanms1.ca.baltimore.com>

The 'struct' module allows packing and unpacking orders to be specified,
but doesn't provide a hook to report on the order used by the machine
the script is running on.  As I'm likely going to be using this module
in future runs of my course, I'd like to add 'struct.getorder()', which
would return either "<" or ">" (the characters used to signal
little-endian and big-endian respectively).  Does this duplicate
something in some other standard module?  Does it seem like a sensible
idea?

Thanks
Greg
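(Something close to Greg's helper can be derived from the struct module
alone; a sketch, with the function name purely hypothetical:)

    import struct

    def getorder():
        # '=' means native byte order, so comparing a native pack
        # against an explicit little-endian pack reveals this machine's
        # order: '<' if little-endian, '>' if big-endian.
        if struct.pack('=H', 1) == struct.pack('<H', 1):
            return '<'
        return '>'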
From fdrake at acm.org  Mon Jun  4 19:42:28 2001
From: fdrake at acm.org (Fred L. Drake, Jr.)
Date: Mon, 4 Jun 2001 13:42:28 -0400 (EDT)
Subject: [Python-Dev] struct.getorder() ?
In-Reply-To: <930BBCA4CEBBD411BE6500508BB3328F2E1F1D@nsamcanms1.ca.baltimore.com>
References: <930BBCA4CEBBD411BE6500508BB3328F2E1F1D@nsamcanms1.ca.baltimore.com>
Message-ID: <15131.51332.73137.795543@cj42289-a.reston1.va.home.com>

Greg Wilson writes:
> The 'struct' module allows packing and unpacking
> orders to be specified, but doesn't provide a hook
> to report on the order used by the machine the

Python 2.0 introduced sys.byteorder; check it out:

    http://www.python.org/doc/current/lib/module-sys.html

  -Fred

--
Fred L. Drake, Jr.
PythonLabs at Digital Creations

From Greg.Wilson at baltimore.com  Mon Jun  4 19:41:45 2001
From: Greg.Wilson at baltimore.com (Greg Wilson)
Date: Mon, 4 Jun 2001 13:41:45 -0400
Subject: [Python-Dev] struct.getorder() ?
Message-ID: <930BBCA4CEBBD411BE6500508BB3328F2E1F1E@nsamcanms1.ca.baltimore.com>

> Python 2.0 introduced sys.byteorder; check it out:
> http://www.python.org/doc/current/lib/module-sys.html

Woo hoo!  Thanks, Fred --- should've guessed someone would be ahead of
me :-).

Greg

From barry at scottb.demon.co.uk  Mon Jun  4 20:00:05 2001
From: barry at scottb.demon.co.uk (Barry Scott)
Date: Mon, 4 Jun 2001 19:00:05 +0100
Subject: [Python-Dev] %b format? - 8088 like a PDP-11 I think not...
In-Reply-To: <20010530183833.B1654@thyrsus.com>
Message-ID: <000201c0ed20$2f295c30$060210ac@private>

Eric wrote:
> While I'm at it, I should note that the design of the 11 was ancestral
> to both the 8088 and 68000 microprocessors, and thus to essentially
> every new general-purpose computer designed in the last fifteen years.

The key to the PDP-11 and VAX was lots of registers all alike and rich
addressing modes for the instructions.

The 8088 is very far from this design; it owes its design more to the
4004 than the PDP-11.  However the 68000 is closer, but not as nice to
program, as there are too many special cases in its instruction set for
my liking.
Barry

From mwh at python.net  Mon Jun  4 20:05:10 2001
From: mwh at python.net (Michael Hudson)
Date: 04 Jun 2001 19:05:10 +0100
Subject: [Python-Dev] readline not getting built when .so symlink doesn't exist
In-Reply-To: Skip Montanaro's message of "Mon, 4 Jun 2001 11:29:53 -0500"
References: <15131.46977.861815.323386@beluga.mojam.com>
Message-ID: 

Skip Montanaro writes:

> I recently upgraded to Mandrake 8.0.  I find that the readline
> module is no longer getting built.  When building, it builds rgbimg
> followed immediately by crypt.  Readline, which is tested for in
> between, is not built.  Apparently, it can't find one of the
> libraries required to build it.  On my system, both readline and
> termcap are in /lib.  Neither has a static version available and
> neither has a plain .so file available.  The .so file always has a
> version number tacked onto the end:
>
> % ls -l /lib/libtermcap* /lib/libreadline*
> lrwxrwxrwx 1 root root     18 May 29 10:53 /lib/libreadline.so.4 -> libreadline.so.4.1
> -rwxr-xr-x 1 root root 152440 Mar 25 01:26 /lib/libreadline.so.4.1
> lrwxrwxrwx 1 root root     19 May 29 10:53 /lib/libtermcap.so.2 -> libtermcap.so.2.0.8
> -rwxr-xr-x 1 root root  11608 Mar 26 10:32 /lib/libtermcap.so.2.0.8
>
> If I create the necessary .so symlinks it builds okay.
>
> Perhaps this is a bug in Mandrake 8.0 (it wouldn't be the first
> one), but if it is valid for shared libraries to be installed with
> only a version-numbered .so file, then it seems to me that distutils
> ought to handle that.

Hmm.  Does compiling a proggie

    $ gcc foo.c -lreadline

work?  It doesn't here if I move libreadline.so & libreadline.a out of
the way.  If the C compiler isn't going to find readline, there ain't
much point distutils trying to find it...

> There are several programs in /usr/bin on my machine that seem to be
> dynamically linked to libreadline.

Those things will be directly linked to libreadline.so.whatever; I
believe the libfoo.so files are only for the (compile time) linker's
benefit.

> In addition, /usr/lib/python2.0/lib-dynload/readline.so exists,
> which suggests that the .so without a version number is valid as far
> as ld is concerned.

ld != ld.so.

Do you need a readline-devel package or something?

Cheers,
M.

--
  It's actually a corruption of "starling".  They used to be carried.
  Since they weighed a full pound (hence the name), they had to be
  carried by two starlings in tandem, with a line between them.
                -- Alan J Rosenthal explains "Pounds Sterling" on asr

From mwh at python.net  Mon Jun  4 21:01:10 2001
From: mwh at python.net (Michael Hudson)
Date: 04 Jun 2001 20:01:10 +0100
Subject: [Python-Dev] another dict crasher
In-Reply-To: "Tim Peters"'s message of "Sun, 3 Jun 2001 19:03:31 -0400"
References: 
Message-ID: 

"Tim Peters" writes:

> >> However, then an adversarial user can construct a case that never
> >> terminates.
>
> > I seem to have done this - it was odd, though - it only loops when I
> > bump the dict to fairly enormous proportions for reasons I don't
> > really (want to) understand.
>
> Pass it on.  I deliberately "started over" via a recursive call
> instead of a goto so that an offending program would eventually die
> with a stack fault instead of just running forever.  So if you're
> seeing something run forever, it may be a different problem.

I left it running overnight, and it terminated!  (with a KeyError).  I
can't say I really understand what's going on, but I'm in Exam Hell at
the moment (for the last time!  Yippee!), so don't have any spare cycles
to think about it hard.
Anyway, this is what I was running:

dict = {}

# let's force dict to malloc its table
for i in range(1, 10000):
    dict[i] = i

hashcode = 0

class Machiavelli2:
    def __eq__(self, other):
        global hashcode
        d2 = dict.copy()
        dict.clear()
        hashcode += 1
        for k, v in d2.items():
            dict[k] = v
        return 1

    def __hash__(self):
        return hashcode

dict[Machiavelli2()] = Machiavelli2()

print dict[Machiavelli2()]

If you thought my last test case was contrived, I look forward to you
finding adjectives for this one...

Cheers,
M.

--
  (ps: don't feed the lawyers: they just lose their fear of humans)
                                      -- Peter Wood, comp.lang.lisp

From barry at digicool.com  Mon Jun  4 21:42:34 2001
From: barry at digicool.com (Barry A. Warsaw)
Date: Mon, 4 Jun 2001 15:42:34 -0400
Subject: [Python-Dev] Status of 2.0.1?
Message-ID: <15131.58538.121723.671374@anthem.wooz.org>

I've just fixed two buglets in the regression test suite for Python
2.0.1 (release20-maint branch).  Now I get the following results from
regrtest:

    88 tests OK.
    20 tests skipped: test_al test_audioop test_cd test_cl test_dbm
        test_dl test_gl test_imageop test_imgfile test_largefile
        test_linuxaudiodev test_minidom test_nis test_pyexpat
        test_rgbimg test_sax test_sunaudiodev test_timing test_winreg
        test_winsound

Has anybody else tested out the 2.0.1 branch on anything?  I'm going to
run some quick tests with Mailman 2.0.x on Python 2.0.1 over the next
hour or so.

I'm just wondering what's left to do for this release, and how I can
help out.

-Barry

From esr at thyrsus.com  Mon Jun  4 22:11:14 2001
From: esr at thyrsus.com (Eric S. Raymond)
Date: Mon, 4 Jun 2001 16:11:14 -0400
Subject: [Python-Dev] %b format? - 8088 like a PDP-11 I think not...
In-Reply-To: <000201c0ed20$2f295c30$060210ac@private>; from barry@scottb.demon.co.uk on Mon, Jun 04, 2001 at 07:00:05PM +0100
References: <20010530183833.B1654@thyrsus.com> <000201c0ed20$2f295c30$060210ac@private>
Message-ID: <20010604161114.A20979@thyrsus.com>

Barry Scott :
> Eric wrote:
> > While I'm at it, I should note that the design of the 11 was ancestral
> > to both the 8088 and 68000 microprocessors, and thus to essentially
> > every new general-purpose computer designed in the last fifteen years.
>
> The key to the PDP-11 and VAX was lots of registers all alike and rich
> addressing modes for the instructions.
>
> The 8088 is very far from this design; it owes its design more to the
> 4004 than the PDP-11.

Yes, but the 4004 was designed as a sort of lobotomized imitation of the
65xx, which was descended from the 11.  Admittedly, in the chain of
transmission here were two stages of redesign so bad that the connection
got really tenuous.

--
		Eric S. Raymond

...Virtually never are murderers the ordinary, law-abiding people
against whom gun bans are aimed.  Almost without exception, murderers
are extreme aberrants with lifelong histories of crime, substance
abuse, psychopathology, mental retardation and/or irrational violence
against those around them, as well as other hazardous behavior, e.g.,
automobile and gun accidents."
	-- Don B. Kates, writing on statistical patterns in gun crime

From skip at pobox.com  Mon Jun  4 22:49:07 2001
From: skip at pobox.com (Skip Montanaro)
Date: Mon, 4 Jun 2001 15:49:07 -0500
Subject: [Python-Dev] readline not getting built when .so symlink doesn't exist
In-Reply-To: 
References: <15131.46977.861815.323386@beluga.mojam.com>
Message-ID: <15131.62531.595208.65994@beluga.mojam.com>

[my readline woes snipped]

    Michael> Hmm.  Does compiling a proggie
    Michael>
    Michael>     $ gcc foo.c -lreadline
    Michael>
    Michael> work?
    Michael> It doesn't here if I move libreadline.so & libreadline.a
    Michael> out of the way.

Yup, it does:

    beluga:tmp% cc -o foo foo.c -lreadline -ltermcap
    beluga:tmp% ./foo
    >>sdfsdfsdf
    sdfsdfsdf

(This after deleting both /lib/libreadline.so and /lib/libhistory.so.)
In this case, foo.c is

    #include <stdio.h>
    #include <readline/readline.h>
    #include <readline/history.h>

    main()
    {
        printf("%s\n", readline(">>"));
    }

    Michael> Do you need a readline-devel package or something?

Got that.  I just noticed that "rpm -q --whatprovides
/lib/libreadline.so" does list readline-devel as the provider.  I just
reinstalled it using --force.  Now the .so symlinks are there.  Go
figure...

Oh well, probably ought to drop it unless another Mandrake user
complains.  I'm really amazed at how many packages Mandrake chose *not*
to install even though I selected all the groups during install and was
installing into fresh / and /usr partitions.  I've been dribbling
various packages in bit-by-bit as I've discovered omissions.  In the
past I've also noticed files apparently not installed even though the
packages that were supposed to provide them were installed.

Skip

From guido at digicool.com  Mon Jun  4 23:03:35 2001
From: guido at digicool.com (Guido van Rossum)
Date: Mon, 04 Jun 2001 17:03:35 -0400
Subject: [Python-Dev] Re: What happened to Idle's extend.py?
In-Reply-To: Your message of "Tue, 29 May 2001 02:15:07 EDT."
References: 
Message-ID: <200106042103.RAA04077@cj20424-a.reston1.va.home.com>

> > Idle-0.3, shipped with Python 1.5.2, had an extend.py module that was
> > used to extend Idle.  We've used this extensively, building entire
> > "applications" as Idle extensions.
> >
> > Now that we're moving to Python 2.1, we find the same old directions
> > for extending Idle (in extend.txt), but there appears to be no
> > extend.py in Idle-0.8.
> >
> > Does anyone know how we can add extensions to Idle-0.8?

It's simpler than before.  Extensions are now loaded simply by being
named in config.txt (or any of the other custom configuration files).
For example, ZoomHeight.py is a very simple extension; it is loaded
because of the line

    [ZoomHeight]

somewhere in config.txt.

The interface for extensions is the same as before; ZoomHeight.py hasn't
changed since 1999.

I'll update extend.txt.  Can someone forward this to the original asker
of the question, or to the list where it was posted?

--Guido van Rossum (home page: http://www.python.org/~guido/)

From skip at pobox.com  Mon Jun  4 23:03:58 2001
From: skip at pobox.com (Skip Montanaro)
Date: Mon, 4 Jun 2001 16:03:58 -0500
Subject: [Python-Dev] %b format? - 8088 like a PDP-11 I think not...
In-Reply-To: <20010604161114.A20979@thyrsus.com>
References: <20010530183833.B1654@thyrsus.com> <000201c0ed20$2f295c30$060210ac@private> <20010604161114.A20979@thyrsus.com>
Message-ID: <15131.63422.695297.393477@beluga.mojam.com>

    Eric> Yes, but the 4004 was designed as a sort of lobotomized
    Eric> imitation of the 65xx, which was descended from the 11.

Really?  I was always under the impression the 4004 was considered the
first microprocessor.  The page below says that and gives a date of 1971
for it.  I have no idea if the author is correct, just that what he says
agrees with my memory.  He does seem to have an impressive collection of
old computer iron:

    http://www.piercefuller.com/collect/i4004/

I haven't found a statement about the origins of the 6502, but this page
suggests that commercial computers were being made from 8080's before
6502's:

    http://www.speer.org/2backup/pcbs_pch.html

Ah, wait a minute...
This page: http://www.geocities.com/SiliconValley/Byte/6508/6502/english/versoes.htm says the 6502 was descended from the 6800. I'm getting less and less convinced that the 4004 somehow descended from the 65xx family. (Maybe we should shift this thread to the always entertaining folks at comp.arch... ;-) Skip From esr at thyrsus.com Mon Jun 4 23:19:08 2001 From: esr at thyrsus.com (Eric S. Raymond) Date: Mon, 4 Jun 2001 17:19:08 -0400 Subject: [Python-Dev] %b format? - 8088 like a PDP-11 I think not... In-Reply-To: <15131.63422.695297.393477@beluga.mojam.com>; from skip@pobox.com on Mon, Jun 04, 2001 at 04:03:58PM -0500 References: <20010530183833.B1654@thyrsus.com> <000201c0ed20$2f295c30$060210ac@private> <20010604161114.A20979@thyrsus.com> <15131.63422.695297.393477@beluga.mojam.com> Message-ID: <20010604171908.A21831@thyrsus.com> Skip Montanaro : > Really? I was always under the impression the 4004 was considered the first > microprocessor. The page below says that and gives a date of 1971 for it. First sentence is widely believed, but there was an earlier micro called the Star-8 designed at Burroughs that has been almost completely forgotten. I only know about it because I worked there in 1980 with one of the people who designed it. I think I had a brain fart and it's the Z80 that was descended from the 6502. I was going by a remark in some old lecture notes. I've got a copy of the definitive reference on history of computer architecture and will check. -- Eric S. Raymond "Extremism in the defense of liberty is no vice; moderation in the pursuit of justice is no virtue." -- Barry Goldwater (actually written by Karl Hess) From mwh at python.net Mon Jun 4 23:55:34 2001 From: mwh at python.net (Michael Hudson) Date: 04 Jun 2001 22:55:34 +0100 Subject: [Python-Dev] readline not getting built when .so symlink doesn't exist In-Reply-To: Skip Montanaro's message of "Mon, 4 Jun 2001 15:49:07 -0500" References: <15131.46977.861815.323386@beluga.mojam.com> <15131.62531.595208.65994@beluga.mojam.com> Message-ID: Skip Montanaro writes: > [my readline woes snipped] > > Michael> Hmm. Does compiling a proggie > > Michael> $ gcc foo.c -lreadline > > Michael> work? It doesn't here if I move libreadline.so & libreadline.a > Michael> out of the way. > > Yup, it does: > > beluga:tmp% cc -o foo foo.c -lreadline -ltermcap > beluga:tmp% ./foo > >>sdfsdfsdf > sdfsdfsdf > > (This after deleting both /lib/libreadline.so and /lib/libhistory.so.) Odd. What does the output of $ gcc -o foo foo.c -lreadline -ltermcap -Wl,--verbose look like? In particular the bit at the end where you get things like: attempt to open /usr/lib/gcc-lib/i386-redhat-linux/2.95.1/libreadline.so failed attempt to open /usr/lib/gcc-lib/i386-redhat-linux/2.95.1/libreadline.a failed attempt to open /usr/i386-redhat-linux/lib/libreadline.so failed attempt to open /usr/i386-redhat-linux/lib/libreadline.a failed attempt to open /usr/bin/../lib/libreadline.so succeeded -lreadline (/usr/bin/../lib/libreadline.so) (this is more for my personal curiosity than any important reason). > Got that. I just noticed that "rpm -q --whatprovides /lib/libreadline.so" > does list readline-devel as the provider. I just reinstalled it using > --force. Now the .so symlinks are there. Go figure... No :-) > Oh well, probably ought to drop it unless another Mandrake user complains. Sounds reasonable. Cheers, M. -- After a heavy night I travelled on, my face toward home - the comma being by no means guaranteed. 
-- paraphrased from cam.misc From tim.one at home.com Mon Jun 4 23:58:48 2001 From: tim.one at home.com (Tim Peters) Date: Mon, 4 Jun 2001 17:58:48 -0400 Subject: [Python-Dev] Re: What happened to Idle's extend.py? In-Reply-To: <200106042103.RAA04077@cj20424-a.reston1.va.home.com> Message-ID: [Guido] > Can someone forward this to the original asker of the question, or to > the list where it was posted? Done. Thanks! From skip at pobox.com Tue Jun 5 03:01:01 2001 From: skip at pobox.com (Skip Montanaro) Date: Mon, 4 Jun 2001 20:01:01 -0500 Subject: [Python-Dev] readline not getting built when .so symlink doesn't exist In-Reply-To: References: <15131.46977.861815.323386@beluga.mojam.com> <15131.62531.595208.65994@beluga.mojam.com> Message-ID: <15132.12109.914981.110774@beluga.mojam.com> >> (This after deleting both /lib/libreadline.so and >> /lib/libhistory.so.) Michael> Odd. What does the output of Michael> $ gcc -o foo foo.c -lreadline -ltermcap -Wl,--verbose Michael> look like? Well, what it looks like is "Skip's a dunce...". Turns out there was a libreadline.so symlink /usr/lib also. It found that. When I deleted that it found /usr/lib/libreadline.a. Getting rid of that caused the link to (finally) fail. With just the version-based .so files cc apparently can't do the trick. Sorry to have wasted the bandwidth. Skip From skip at pobox.com Tue Jun 5 03:16:00 2001 From: skip at pobox.com (Skip Montanaro) Date: Mon, 4 Jun 2001 20:16:00 -0500 Subject: [Python-Dev] %b format? - 8088 like a PDP-11 I think not... In-Reply-To: <20010604171908.A21831@thyrsus.com> References: <20010530183833.B1654@thyrsus.com> <000201c0ed20$2f295c30$060210ac@private> <20010604161114.A20979@thyrsus.com> <15131.63422.695297.393477@beluga.mojam.com> <20010604171908.A21831@thyrsus.com> Message-ID: <15132.13008.429800.585157@beluga.mojam.com> Eric> Skip Montanaro : >> Really? I was always under the impression the 4004 was considered >> the first microprocessor. The page below says that and gives a date >> of 1971 for it. Eric> First sentence is widely believed, but there was an earlier micro Eric> called the Star-8 designed at Burroughs that has been almost Eric> completely forgotten. There was also a GE-8 (I think that was the name) developed at GE's R&D Center in the early 1970's timeframe - long before my time there. It was apparently very competitive with the other microprocessors produced about that time but never saw the light of day. I suspect that was at least due in part to the fact that GE built mainframes back then. Skip From tim.one at home.com Tue Jun 5 06:07:27 2001 From: tim.one at home.com (Tim Peters) Date: Tue, 5 Jun 2001 00:07:27 -0400 Subject: [Python-Dev] another dict crasher In-Reply-To: Message-ID: [Michael Hudson, taking a break from exams] > I left it running overnight, and it terminated! (with a KeyError). I > can't say I really understand what's going on, but I'm in Exam Hell at > the moment (for the last time! Yippee!), so don't have any spare > cycles to think about it hard. Good luck! I really shouldn't tell you this now, but the real reason people dread turning 30, 40, 50, 60-- and so on --is that every 10th birthday starting at 30 they test you *again*! On every course you ever took. It's grueling. The penalty for failure is severe: flunk just one review exam, and they pick a date at random over the following 10 years for you to die. No point fighting it, it's just civilization's nasty little secret. 
This is why life expectancy correlates with education, but it does
appear that the human limit for remembering both plane geometry and the
names of hundreds of dead psychopaths is about 120 years.

In the meantime, I built a test case to tickle stack overflow directly,
and it does so quickly:

class Yuck:
    def __init__(self):
        self.i = 0

    def make_dangerous(self):
        self.i = 1

    def __hash__(self):
        # direct to slot 4 in table of size 8; slot 12 when size 16
        return 4 + 8

    def __eq__(self, other):
        if self.i == 0:
            # leave dict alone
            pass
        elif self.i == 1:
            # fiddle to 16 slots
            self.__fill_dict(6)
            self.i = 2
        else:
            # fiddle to 8 slots
            self.__fill_dict(4)
            self.i = 1
        return 1

    def __fill_dict(self, n):
        self.i = 0
        dict.clear()
        for i in range(n):
            dict[i] = i
        dict[self] = "OK!"

y = Yuck()
dict = {y: "OK!"}
z = Yuck()
y.make_dangerous()
print dict[z]

It just arranges to move y to a different slot in a different-sized
table each time __eq__ is invoked, alternating between slot 4 in a
size-8 table and slot 12 in a size-16 table.

However, if I stick "print self.i" at the start of __eq__, it dies with
a KeyError instead!  That's why I'm mentioning it -- could be the same
misdirection you're seeing.  I can't account for the KeyError in any
rational way: under Windows, it's actually hitting a stack overflow in
the bowels of the system malloc() then.  Windows "recovers" from that
and presses on.  Everything that happens after appears to be an
accident.

win98-as-usual-ly y'rs  - tim

PS: You'll be tested on this, too.

From greg at cosc.canterbury.ac.nz  Tue Jun  5 07:00:30 2001
From: greg at cosc.canterbury.ac.nz (Greg Ewing)
Date: Tue, 05 Jun 2001 17:00:30 +1200 (NZST)
Subject: [Python-Dev] One more dict trick
In-Reply-To: <20010601032316.A15635@thyrsus.com>
Message-ID: <200106050500.RAA02362@s454.cosc.canterbury.ac.nz>

"Eric S. Raymond" :
> I think it's significant that MMX
> instructions and so forth entered the Intel line to support *games*,
> not Navier-Stokes calculations.

But when version 1.0 of FlashFlood! comes out, requiring high-quality
real-time hydrodynamics simulation, Navier-Stokes calculations will
suddenly become very important...

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,          | A citizen of NewZealandCorp, a       |
Christchurch, New Zealand          | wholly-owned subsidiary of USA Inc.  |
greg at cosc.canterbury.ac.nz        +--------------------------------------+

From tim.one at home.com  Tue Jun  5 07:18:50 2001
From: tim.one at home.com (Tim Peters)
Date: Tue, 5 Jun 2001 01:18:50 -0400
Subject: [Python-Dev] strop vs. string
In-Reply-To: <3B1B8B86.68E99328@STScI.Edu>
Message-ID: 

[Paul Barrett]
> From the discussion so far, it appears that the buffer object is
> intended solely to support string-like objects.

Unsure where that impression came from.  Since buffers wrap a slice "of
memory", they don't make much sense except where raw memory makes sense.
That includes the guts of strings, but also (in the core distribution)
memory-mapped files (the mmap module) and arrays (the array module),
which also support the buffer interface.

> I've seen no mention of their use for binary data objects,

I mentioned two above.  The use of buffers with mutable objects is
dangerous, though, because of the dangling-pointer problem, and Python
itself never uses buffers except for strings.
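(A minimal sketch of that dangling-pointer hazard, assuming 2.0-era
buffer semantics in which the buffer caches a pointer into the target's
raw storage at creation time:)

    >>> import array
    >>> a = array.array('c', "abcdef")
    >>> b = buffer(a)      # b borrows a pointer into a's raw storage
    >>> str(b)
    'abcdef'
    >>> del a[:]           # resizing a may free or move that storage...
    >>> str(b)             # ...so this can read freed memory, or crash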
Even arrays are stretching it; e.g.,

>>> import array
>>> a = array.array('i')
>>> a.append(2)
>>> a.append(3)
>>> a
array('i', [2, 3])
>>> b = buffer(a)
>>> len(b)
8
>>> [b[i] for i in range(len(b))]
['\x02', '\x00', '\x00', '\x00', '\x03', '\x00', '\x00', '\x00']
>>>

While of *some* conceivable use, that's not exactly destined to become
wildly popular.

> such as multidimensional arrays and matrices.

Since core Python has no such things, of course it doesn't use buffers
for those either.

> Will the buffer object also support these objects?

In what sense?  If you have an implementation of such things, and
believe that getting at raw memory slices is useful, sure -- fill in its
tp_as_buffer slot.

> ...
> On the other hand, if yes, then I think the buffer C/API needs to be
> reimplemented,

Or do you mean redesigned?

> because the current design/implementation falls far short of what I
> would expect for a buffer object.  First, it is overly complex: the
> support for multiple buffers does not appear necessary.

AFAICT it's entirely unused; everything in the core that supports the
buffer interface returns a segment count of 1, and the buffer object
itself appears to raise exceptions whenever it sees a reference to a
segment other than "the first".  I don't know why it's there.

> Second, the dangling pointer issue has not been resolved.

I expect Greg will fix that now.

> I suggest the addition of a lock flag which indicates that the data is
> currently inaccessible, ie. that data and/or data pointer is in the
> process of being modified.

To sell that (but please save it for the PEP) I expect you have to
provide some compelling uses for it.  The current uses have no need of
it.  In the absence of specific good uses, I'm afraid it just sounds
like another variant of "I can't prove segments *won't* be useful, so
let's toss them in too!".

> I would suggest the following structure to be much more useful for
> char and binary data:
>
> typedef struct {
>     char* rf_pointer;
>     int   rf_length;
>     int   rf_access;  /* read, write, etc. */
>     int   rf_lock;    /* data is in use */
>     int   rf_flags;   /* type of data; char, binary, unicode, etc. */
> } PyBufferProcs;
>
> But I'm guessing my proposal is way off base.

Depends on what you want to do.  You've only mentioned multidimensional
arrays, and the need for umpteen flavors of access control there, beyond
the current object's b_readonly flag, is simply unclear.  Also unclear
why you've dropped the current object's b_base pointer: without it, the
buffer has no way to get back to the object from which the memory is
borrowed, nor even a guarantee that the object won't die while the
buffer is still active.

If you do pursue this, please please please boost the rf_length field!
An int is too small to hold real-life sizes anymore, and "large files"
are becoming common even on 32-bit boxes.  Python needs to grow a wholly
supported way to pass 8-byte ints around (and it looks like I'll be
adding that to the struct module, possibly to the array module and
marshal too).

> If I find some time, I'll prepare a PEP to air these issues, since
> they are very important to those of us working on and with
> multidimensional arrays.  We find the current buffer API lacking.

A PEP is always a good idea.

From aahz at rahul.net  Tue Jun  5 07:41:28 2001
From: aahz at rahul.net (Aahz Maruch)
Date: Mon, 4 Jun 2001 22:41:28 -0700 (PDT)
Subject: [Python-Dev] strop vs.
 string
In-Reply-To:  from "Tim Peters" at Jun 05, 2001 01:18:50 AM
Message-ID: <20010605054129.933C199C83@waltz.rahul.net>

Tim Peters wrote:
>
> If you do pursue this, please please please boost the rf_length field!
> An int is too small to hold real-life sizes anymore, and "large files"
> are becoming common even on 32-bit boxes.  Python needs to grow a
> wholly supported way to pass 8-byte ints around (and it looks like
> I'll be adding that to the struct module, possibly to the array module
> and marshal too).

Hey!  Are you discriminating against 128-bit ints?

--
                      --- Aahz (@pobox.com)

Hugs and backrubs -- I break Rule 6       <*> http://www.rahul.net/aahz/
Androgynous poly kinky vanilla queer het Pythonista

I don't really mind a person having the last whine, but I do mind
someone else having the last self-righteous whine.

From tim.one at home.com  Tue Jun  5 07:53:26 2001
From: tim.one at home.com (Tim Peters)
Date: Tue, 5 Jun 2001 01:53:26 -0400
Subject: [Python-Dev] One more dict trick
In-Reply-To: <20010601032316.A15635@thyrsus.com>
Message-ID: 

[Eric S. Raymond]
> ...
> So maybe there's a market for 128-bit floats after all.

I think very small.  There's a much larger market for 128-bit float
*registers*, though -- in the "treat it as 2 64-bit, or 4 32-bit,
floats, and operate on them in parallel" sense.  That's the baby vector
register view, and is already happening.

> I'm still skeptical about how likely those applications are to
> influence the architecture of general-purpose processors.  I saw a
> study once that said heavy-duty scientific floating point only
> accounts for about 2% of the computing market -- and I think it's
> significant that MMX instructions and so forth entered the Intel
> line to support *games*, not Navier-Stokes calculations.

Heh.  I used to wonder about that, but not any more: games may have no
more than entertainment (sometimes disguised as education) in mind, but
what do the latest & greatest games do?  Strive to simulate physical
reality (sometimes with altered physical laws), just as closely as
possible.  Whether it's ray-tracing, effective motion-compression, or
N-body simulations, games are easily as demanding as what computational
chemists do.

A difference is that general-purpose *compilers* aren't being taught how
to use these "new" architectural gimmicks.  All that new hardware sits
unused unless you've got an app dipping into assembler, or into a
hand-coded utility library written in assembler.  The *general* market
for pure floating-point can barely support what's left of the
supercomputer industry anymore (btw, Cray never became a billion-dollar
company even in its heyday, and what's left of them gets passed around
for peanuts now).

> That 2% will have to get a lot bigger before I can see Intel doubling
> its word size again.  It's not just the processor design; the word
> size has huge implications for buses, memory controllers, and the
> whole system architecture.

Intel is just now getting its feet wet with 64-bit boxes.  That was old
news to me 20 years ago.  All I hope to see 20 years from now is that
somewhere along the way I got smart enough to drop computers and get a
real life.

by-then-the-whole-system-will-exist-in-the-superposition-of-a-
single-plutonium-atom's-states-anyway-ly y'rs  - tim

From tim.one at home.com  Tue Jun  5 07:55:48 2001
From: tim.one at home.com (Tim Peters)
Date: Tue, 5 Jun 2001 01:55:48 -0400
Subject: [Python-Dev] strop vs. string
In-Reply-To: <20010605054129.933C199C83@waltz.rahul.net>
Message-ID: 

[Aahz]
> Hey!
> Are you discriminating against 128-bit ints?

Nope!  I'm Guido's marketing guy: 128-bit ints will be the killer reason
you need to upgrade to Python 3000, when the time comes.  Python didn't
get to where it is by giving away all the good stuff early.

From MarkH at ActiveState.com  Tue Jun  5 09:10:53 2001
From: MarkH at ActiveState.com (Mark Hammond)
Date: Tue, 5 Jun 2001 17:10:53 +1000
Subject: [Python-Dev] strop vs. string
In-Reply-To: <3B1B8B86.68E99328@STScI.Edu>
Message-ID: 

> complex: the support for multiple buffers does not appear necessary.

I seem to recall Guido telling me once that this was implemented for
NumPy, specifically for some of their matrices.  Not being a user of
that package means that unfortunately I can not be any more specific...

I am confident Guido will recall the specific details...

Mark.

From mwh at python.net  Tue Jun  5 10:39:24 2001
From: mwh at python.net (Michael Hudson)
Date: Tue, 5 Jun 2001 09:39:24 +0100 (BST)
Subject: [Python-Dev] another dict crasher
In-Reply-To: 
Message-ID: 

Haven't run your example yet as my machine's not on at the moment.

On Tue, 5 Jun 2001, Tim Peters wrote:
> However, if I stick "print self.i" at the start of __eq__, it dies
> with a KeyError instead!  That's why I'm mentioning it -- could be the
> same misdirection you're seeing.  I can't account for the KeyError in
> any rational way: under Windows, it's actually hitting a stack
> overflow in the bowels of the system malloc() then.

Hmm.  It's quite likely that PyMem_Malloc (or whatever) crapping out and
returning NULL will get turned into a MemoryError, which will then get
turned into a KeyError, isn't it?  I could believe that malloc would set
up some fancy sigsegv-type handlers for memory management purposes which
then get called when it tramples all over the end of the stack.  But I'm
making this up as I go along...

> Windows "recovers" from that and presses on.  Everything that happens
> after appears to be an accident.
>
> win98-as-usual-ly y'rs - tim

Well, linux seems to be similarly inscrutable here.  One problem is that
this is a pig to run under the debugger - setting a breakpoint on
lookdict isn't a terribly interesting way to spend your time.  I suppose
you could just set the breakpoint on the recursive call...  later.

> PS: You'll be tested on this, too.

Oh, piss off.

Cheers,
M.

From guido at digicool.com  Tue Jun  5 11:07:34 2001
From: guido at digicool.com (Guido van Rossum)
Date: Tue, 05 Jun 2001 05:07:34 -0400
Subject: [Python-Dev] Happy event
Message-ID: <200106050907.FAA08198@cj20424-a.reston1.va.home.com>

I just wanted to send a note about a happy event in the Python family.
Jeremy Hylton and his wife became the proud parents of twin girls on
Sunday June 3rd.  Please join Pythonlabs and Digital Creations in
congratulating them, and wishing them much joy and luck.

Also, don't expect Jeremy to be too responsive to email for the next
6-8 weeks. :)

--Guido van Rossum (home page: http://www.python.org/~guido/)

From uche.ogbuji at fourthought.com  Tue Jun  5 14:28:45 2001
From: uche.ogbuji at fourthought.com (Uche Ogbuji)
Date: Tue, 05 Jun 2001 06:28:45 -0600
Subject: [Python-Dev] One more dict trick
In-Reply-To: Message from Greg Ewing of "Tue, 05 Jun 2001 17:00:30 +1200." <200106050500.RAA02362@s454.cosc.canterbury.ac.nz>
Message-ID: <200106051228.f55CSjk18336@localhost.local>

> "Eric S. Raymond" :
>
> > I think it's significant that MMX
> > instructions and so forth entered the Intel line to support *games*,
> > not Navier-Stokes calculations.
>
> But when version 1.0 of FlashFlood!
comes out, requiring > high-quality real-time hydrodynamics simulation, > Navier-Stokes calculations will suddenly become very > important... Shoot, I thought that was what Microsoft Hailstorm was all about. Path integrals about the atmospheric isobars, and all that... -- Uche Ogbuji Principal Consultant uche.ogbuji at fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA XML strategy, XML tools (http://4Suite.org), knowledge management From uche.ogbuji at fourthought.com Tue Jun 5 14:32:07 2001 From: uche.ogbuji at fourthought.com (Uche Ogbuji) Date: Tue, 05 Jun 2001 06:32:07 -0600 Subject: [Python-Dev] Happy event In-Reply-To: Message from Guido van Rossum of "Tue, 05 Jun 2001 05:07:34 EDT." <200106050907.FAA08198@cj20424-a.reston1.va.home.com> Message-ID: <200106051232.f55CW7618353@localhost.local> > I just wanted to send a note about a happy event in the Python family. > Jeremy Hylton and his wife became the proud parents of twin girls on > Sunday June 3rd. Please join Pythonlabs and Digital Creations in > congratulating them, and wishing them much joy and luck. > > Also, don't expect Jeremy to be too responsive to email for the next > 6-8 weeks. :) *twin* girls? Try 6-8 years. Congrats and felicits of the highest order, of course, Jeremy. -- Uche Ogbuji Principal Consultant uche.ogbuji at fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA XML strategy, XML tools (http://4Suite.org), knowledge management From Barrett at stsci.edu Tue Jun 5 14:53:46 2001 From: Barrett at stsci.edu (Paul Barrett) Date: Tue, 05 Jun 2001 08:53:46 -0400 Subject: [Python-Dev] Happy event References: <200106051232.f55CW7618353@localhost.local> Message-ID: <3B1CD65A.595E8CD@STScI.Edu> Uche Ogbuji wrote: > > > I just wanted to send a note about a happy event in the Python family. > > Jeremy Hylton and his wife became the proud parents of twin girls on > > Sunday June 3rd. Please join Pythonlabs and Digital Creations in > > congratulating them, and wishing them much joy and luck. > > > > Also, don't expect Jeremy to be too responsive to email for the next > > 6-8 weeks. :) > > *twin* girls? Try 6-8 years. > > Congrats and felicits of the highest order, of course, Jeremy. Actually girls are fine until about 13, after that I expect Jeremy won't be too responsive. Something about hormones and such. In any case, all the best, Jeremy! -- Paul Barrett, PhD Space Telescope Science Institute Phone: 410-338-4475 ESS/Science Software Group FAX: 410-338-4767 Baltimore, MD 21218 From aahz at rahul.net Tue Jun 5 16:41:10 2001 From: aahz at rahul.net (Aahz Maruch) Date: Tue, 5 Jun 2001 07:41:10 -0700 (PDT) Subject: [Python-Dev] Happy event In-Reply-To: <3B1CD65A.595E8CD@STScI.Edu> from "Paul Barrett" at Jun 05, 2001 08:53:46 AM Message-ID: <20010605144110.DD90C99C84@waltz.rahul.net> Paul Barrett wrote: > Uche Ogbuji wrote: >> Guido: >>> >>> Also, don't expect Jeremy to be too responsive to email for the next >>> 6-8 weeks. :) >> >> *twin* girls? Try 6-8 years. > > Actually girls are fine until about 13, after that I expect Jeremy > won't be too responsive. Something about hormones and such. Are you trying to imply that there's a difference between girls and boys? 
compressing-a-five-screen-rant-down-to-a-single-sentence-ly y'rs
--
                      --- Aahz (@pobox.com)

Hugs and backrubs -- I break Rule 6       <*> http://www.rahul.net/aahz/
Androgynous poly kinky vanilla queer het Pythonista

I don't really mind a person having the last whine, but I do mind
someone else having the last self-righteous whine.

From esr at thyrsus.com  Tue Jun  5 16:55:59 2001
From: esr at thyrsus.com (Eric S. Raymond)
Date: Tue, 5 Jun 2001 10:55:59 -0400
Subject: [Python-Dev] Happy event
In-Reply-To: <20010605144110.DD90C99C84@waltz.rahul.net>; from aahz@rahul.net on Tue, Jun 05, 2001 at 07:41:10AM -0700
References: <3B1CD65A.595E8CD@STScI.Edu> <20010605144110.DD90C99C84@waltz.rahul.net>
Message-ID: <20010605105559.A28963@thyrsus.com>

Aahz Maruch :
> Paul Barrett wrote:
> > Uche Ogbuji wrote:
> >> Guido:
> >>>
> >>> Also, don't expect Jeremy to be too responsive to email for the
> >>> next 6-8 weeks. :)
> >>
> >> *twin* girls?  Try 6-8 years.
> >
> > Actually girls are fine until about 13, after that I expect Jeremy
> > won't be too responsive.  Something about hormones and such.
>
> Are you trying to imply that there's a difference between girls and
> boys?

Of course there's a difference.  Girls, er, *mature* sooner.

Congratulations, Jeremy!
--
		Eric S. Raymond

If I were to select a jack-booted group of fascists who are perhaps as
large a danger to American society as I could pick today, I would pick
BATF [the Bureau of Alcohol, Tobacco, and Firearms].
	-- U.S. Representative John Dingell, 1980

From pedroni at inf.ethz.ch  Tue Jun  5 17:05:03 2001
From: pedroni at inf.ethz.ch (Samuele Pedroni)
Date: Tue, 5 Jun 2001 17:05:03 +0200 (MET DST)
Subject: [Python-Dev] Happy event
Message-ID: <200106051505.RAA24810@core.inf.ethz.ch>

> Subject: Re: [Python-Dev] Happy event
> To: Barrett at stsci.edu (Paul Barrett)
> Cc: python-dev at python.org
> From: aahz at rahul.net (Aahz Maruch)
> Date: Tue, 5 Jun 2001 07:41:10 -0700 (PDT)
>
> Paul Barrett wrote:
> > Uche Ogbuji wrote:
> >> Guido:
> >>>
> >>> Also, don't expect Jeremy to be too responsive to email for the
> >>> next 6-8 weeks. :)
> >>
> >> *twin* girls?  Try 6-8 years.
> >
> > Actually girls are fine until about 13, after that I expect Jeremy
> > won't be too responsive.  Something about hormones and such.
>
> Are you trying to imply that there's a difference between girls and
> boys?
>
> compressing-a-five-screen-rant-down-to-a-single-sentence-ly y'rs
> --

The simple fact that we are still moving from the previous bad habit of
considering them different to considering them equal already implies
differences.  A neutral viewpoint would be: the N/S ratio between
gender-physiological differences and the overall interpersonal
differences is very big, at least when considering the whole personality
and not single aspects.  There is no established truth; we are just
longing for equilibrium.  In the actual transition phase boys and girls
are under different kinds of cultural tensions related to
self-identification, etc., and this makes for differences.

regards, Samuele Pedroni.

From aahz at rahul.net  Tue Jun  5 17:17:38 2001
From: aahz at rahul.net (Aahz Maruch)
Date: Tue, 5 Jun 2001 08:17:38 -0700 (PDT)
Subject: [Python-Dev] Happy event
In-Reply-To: <20010605105559.A28963@thyrsus.com> from "Eric S.
Raymond" at Jun 05, 2001 10:55:59 AM Message-ID: <20010605151739.3864199C83@waltz.rahul.net> Eric S. Raymond wrote: > Aahz Maruch : >> >> Are you trying to imply that there's a difference between girls and >> boys? > > Of course there's a difference. Girls, er, *mature* sooner. Not legally. -- --- Aahz (@pobox.com) Hugs and backrubs -- I break Rule 6 <*> http://www.rahul.net/aahz/ Androgynous poly kinky vanilla queer het Pythonista I don't really mind a person having the last whine, but I do mind someone else having the last self-righteous whine. From esr at thyrsus.com Tue Jun 5 17:30:08 2001 From: esr at thyrsus.com (Eric S. Raymond) Date: Tue, 5 Jun 2001 11:30:08 -0400 Subject: [Python-Dev] Happy event In-Reply-To: <20010605151739.3864199C83@waltz.rahul.net>; from aahz@rahul.net on Tue, Jun 05, 2001 at 08:17:38AM -0700 References: <20010605105559.A28963@thyrsus.com> <20010605151739.3864199C83@waltz.rahul.net> Message-ID: <20010605113008.A29236@thyrsus.com> Aahz Maruch : > Eric S. Raymond wrote: > > Aahz Maruch : > >> > >> Are you trying to imply that there's a difference between girls and > >> boys? > > > > Of course there's a difference. Girls, er, *mature* sooner. > > Not legally. My point was that the hormone thing is likely to be an issue sooner with twin girls. Hey, Jeremy...fraternal or identical? -- Eric S. Raymond What is a magician but a practicing theorist? -- Obi-Wan Kenobi, 'Return of the Jedi' From guido at digicool.com Tue Jun 5 19:21:32 2001 From: guido at digicool.com (Guido van Rossum) Date: Tue, 05 Jun 2001 13:21:32 -0400 Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects? Message-ID: <200106051721.f55HLW729400@odiug.digicool.com> While thinking about metatypes, I had an interesting idea. In PEP 252 and 253 (which still need much work, please bear with me!) I describe making classes and types more similar to each other. In particular, you'll be able to subclass built-in object types in much the same way as you can subclass user-defined classes today. One nice property of classes is that a class is a factory function for its instances; in other words, if C is a class, C() returns a C instance. Now, for built-in types, it makes sense to do the same. In my current prototype, after "from types import *", DictType() returns an empty dictionary and ListType() returns an empty list. It would be nice take this much further: IntType() could return an integer, TupleType() could return a tuple, StringType() could return a string, and so on. These are immutable types, so to make this useful, these constructors need to take an argument to specify a specific value. What should the type of such an argument be? It's not very interesting to require that int(x) takes an integer argument! Most of the popular standard types already have a constructor function that's named after their type: int(), long(), float(), complex(), str(), unicode(), tuple(), list() We could make the constructor take the same argument(s) as the corresponding built-in function. Now invoke the Zen of Python: "There should be one-- and preferably only one --obvious way to do it." So why not make these built-in functions *be* the corresponding types? Then instead of >>> int you would see >>> int but otherwise the behavior would be identical. (Note that I don't require that a factory function returns a *new* object each time.) If we did this for all built-in types, we'd have to add maybe a dozen new built-in names -- I think that's no big deal and actually helps naming types. 
The types module, with its awkward names and usage, can be deprecated.

There are details to be worked out, e.g.

- Do we really want to have built-in names for code objects, traceback
  objects, and other figments of Python's internal workings?

- What should the argument to dict() be?  A list of (key, value) pairs,
  a list of alternating keys and values, or something else?

- What else?

Comments?

--Guido van Rossum (home page: http://www.python.org/~guido/)

From fredrik at pythonware.com  Tue Jun  5 19:34:35 2001
From: fredrik at pythonware.com (Fredrik Lundh)
Date: Tue, 5 Jun 2001 19:34:35 +0200
Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects?
References: <200106051721.f55HLW729400@odiug.digicool.com>
Message-ID: <001301c0ede5$cb804a10$e46940d5@hagrid>

guido wrote:
> Now invoke the Zen of Python: "There should be one-- and preferably
> only one --obvious way to do it."  So why not make these built-in
> functions *be* the corresponding types?

+1 from here.

> - Do we really want to have built-in names for code objects, traceback
>   objects, and other figments of Python's internal workings?

nope.

> - What should the argument to dict() be?  A list of (key, value)
>   pairs, a list of alternating keys and values, or something else?

how about supporting the following:

    d == dict(d.items())
    d == dict(d.keys(), d.values())

and also:

    d = dict(k=v, k=v, ...)

Cheers /F

From ping at lfw.org  Tue Jun  5 19:41:22 2001
From: ping at lfw.org (Ka-Ping Yee)
Date: Tue, 5 Jun 2001 12:41:22 -0500 (CDT)
Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects?
In-Reply-To: <200106051721.f55HLW729400@odiug.digicool.com>
Message-ID: 

On Tue, 5 Jun 2001, Guido van Rossum wrote:
> Now invoke the Zen of Python: "There should be one-- and preferably
> only one --obvious way to do it."  So why not make these built-in
> functions *be* the corresponding types?  Then instead of
>
>     >>> int
>     <built-in function int>
>
> you would see
>
>     >>> int
>     <type 'int'>

I'm all in favour of this.  In fact, i had the impression that you were
planning to do exactly this all along.  I seem to recall some
conversation about this a long time ago -- am i dreaming?

> If we did this for all built-in types, we'd have to add maybe a dozen
> new built-in names -- I think that's no big deal and actually helps
> naming types.  The types module, with its awkward names and usage, can
> be deprecated.

I would love this.

> - Do we really want to have built-in names for code objects, traceback
>   objects, and other figments of Python's internal workings?

Perhaps we would only provide built-in names for objects that are
commonly constructed.  For things like code objects that are never
user-constructed, their type objects could be set aside in a module.

> - What should the argument to dict() be?  A list of (key, value)
>   pairs, a list of alternating keys and values, or something else?

A list of (key, value) pairs.  It's the only sensible choice, given
that dict.items() is the obvious way to get all the information out of
a dictionary into a list.


-- ?!ng

From aahz at rahul.net  Tue Jun  5 19:40:27 2001
From: aahz at rahul.net (Aahz Maruch)
Date: Tue, 5 Jun 2001 10:40:27 -0700 (PDT)
Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects?
In-Reply-To: <200106051721.f55HLW729400@odiug.digicool.com> from "Guido van Rossum" at Jun 05, 2001 01:21:32 PM
Message-ID: <20010605174027.17A4199C83@waltz.rahul.net>

I'm +1 on the general concept; I think it will make explaining Python
easier in the long run.
I'm not competent to vote on the details, but I'll complain if something
seems too confused to me.

Currently in the Decimal class I'm working on, I can take any of the
following types in the constructor: Decimal, tuple, string, int, float.
I'm wondering whether that approach makes sense, that any "compatible"
type should be accepted in an explicit constructor.  So for your
question about dict(), perhaps any sequence/iterator type that returns
2-element sequences would be accepted.

--
                      --- Aahz (@pobox.com)

Hugs and backrubs -- I break Rule 6       <*> http://www.rahul.net/aahz/
Androgynous poly kinky vanilla queer het Pythonista

I don't really mind a person having the last whine, but I do mind
someone else having the last self-righteous whine.

From donb at abinitio.com  Tue Jun  5 19:50:34 2001
From: donb at abinitio.com (Donald Beaudry)
Date: Tue, 05 Jun 2001 13:50:34 -0400
Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects?
References: <200106051721.f55HLW729400@odiug.digicool.com>
Message-ID: <200106051750.NAA25458@localhost.localdomain>

Guido van Rossum wrote,
> Now invoke the Zen of Python: "There should be one-- and preferably
> only one --obvious way to do it."  So why not make these built-in
> functions *be* the corresponding types?

I like it!

> but otherwise the behavior would be identical.  (Note that I don't
> require that a factory function returns a *new* object each time.)

Of course... singletons (which would also break that requirement) are
quite useful.

> If we did this for all built-in types, we'd have to add maybe a dozen
> new built-in names -- I think that's no big deal and actually helps
> naming types.  The types module, with its awkward names and usage, can
> be deprecated.
>
> There are details to be worked out, e.g.
>
> - Do we really want to have built-in names for code objects, traceback
>   objects, and other figments of Python's internal workings?

I don't think so.  Having easy access to these things might be good,
but since they are implementation specific it might be best to
discourage their use by putting them somewhere more implementation
specific, like the new module or even sys.

> - What should the argument to dict() be?  A list of (key, value)
>   pairs, a list of alternating keys and values, or something else?

At a minimum, I'd like to see a list of key/value tuples.  I seem to
find myself reconstructing dicts from the .items() of other dicts.  For
'something else', I'd like to be able to pass keyword arguments to
initialize the new dict.  Going really crazy, I'd like to be able to
pass a dict as an argument to dict()... just another way to spell copy,
but combined with keywords, it would be more like copy followed by an
update.

> - What else?

Well, since you are asking ;) I haven't read the PEP, so perhaps I
shouldn't be commenting just yet, but...  I'd hope that the built-in
types are sub-classable from C as well as from Python.  This is most
interesting for types like instance, class, method, but I can imagine
reasons for doing it to tuple, list, dict, and even int.

> Comments?

Fantastic!

--
Donald Beaudry                                  Ab Initio Software Corp.
                                                201 Spring Street
donb at init.com                                Lexington, MA 02421
                  ...Will hack for sushi...

From mal at lemburg.com  Tue Jun  5 19:53:18 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Tue, 05 Jun 2001 19:53:18 +0200
Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects?
References: <200106051721.f55HLW729400@odiug.digicool.com>
Message-ID: <3B1D1C8E.B7770419@lemburg.com>

Guido van Rossum wrote:
>
> While thinking about metatypes, I had an interesting idea.
>
> In PEP 252 and 253 (which still need much work, please bear with me!)
> I describe making classes and types more similar to each other.  In
> particular, you'll be able to subclass built-in object types in much
> the same way as you can subclass user-defined classes today.  One nice
> property of classes is that a class is a factory function for its
> instances; in other words, if C is a class, C() returns a C instance.
>
> Now, for built-in types, it makes sense to do the same.  In my current
> prototype, after "from types import *", DictType() returns an empty
> dictionary and ListType() returns an empty list.  It would be nice to
> take this much further: IntType() could return an integer, TupleType()
> could return a tuple, StringType() could return a string, and so on.
> These are immutable types, so to make this useful, these constructors
> need to take an argument to specify a specific value.  What should the
> type of such an argument be?  It's not very interesting to require
> that int(x) takes an integer argument!
>
> Most of the popular standard types already have a constructor function
> that's named after their type:
>
>     int(), long(), float(), complex(), str(), unicode(), tuple(), list()
>
> We could make the constructor take the same argument(s) as the
> corresponding built-in function.
>
> Now invoke the Zen of Python: "There should be one-- and preferably
> only one --obvious way to do it."  So why not make these built-in
> functions *be* the corresponding types?  Then instead of
>
>     >>> int
>     <built-in function int>
>
> you would see
>
>     >>> int
>     <type 'int'>
>
> but otherwise the behavior would be identical.  (Note that I don't
> require that a factory function returns a *new* object each time.)

-1

While this looks cute, I think it would break a lot of introspection
code or other code which special-cases Python functions for some reason,
since type(int) would no longer return types.BuiltinFunctionType.

If you don't like the names, why not take the chance and create a new
module which then exposes the Python class hierarchy (much like we did
with the exceptions.py module before it was integrated as a C module) ?!

> If we did this for all built-in types, we'd have to add maybe a dozen
> new built-in names -- I think that's no big deal and actually helps
> naming types.  The types module, with its awkward names and usage, can
> be deprecated.
>
> There are details to be worked out, e.g.
>
> - Do we really want to have built-in names for code objects, traceback
>   objects, and other figments of Python's internal workings?

Not really.

> - What should the argument to dict() be?  A list of (key, value)
>   pairs, a list of alternating keys and values, or something else?

As function, I'd say: take either a sequence of tuples or another
dictionary as argument.  mxTools already has such a function, BTW.

--
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                       http://www.lemburg.com/python/

From skip at pobox.com  Tue Jun  5 20:12:09 2001
From: skip at pobox.com (Skip Montanaro)
Date: Tue, 5 Jun 2001 13:12:09 -0500
Subject: [Python-Dev] Shouldn't I be able to print Unicode objects?
Message-ID: <15133.8441.983687.572159@beluga.mojam.com>

Just catching up on a little c.l.py and I noticed the effbot's response to the Unicode degree inquiry. I tried to create and print one and got this:

% python
Python 2.1.1a1 (#9, Jun 4 2001, 11:32:33)
[GCC 2.96 20000731 (Linux-Mandrake 8.0 2.96-0.48mdk)] on linux2
Type "copyright", "credits" or "license" for more information.
>>> u"\N{DEGREE SIGN}"
u'\xb0'
>>> print u"\N{DEGREE SIGN}"
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeError: ASCII encoding error: ordinal not in range(128)

Shouldn't I be able to print arbitrary Unicode objects? What am I missing (this time)? Skip

From mwh at python.net Tue Jun 5 20:16:52 2001 From: mwh at python.net (Michael Hudson) Date: 05 Jun 2001 19:16:52 +0100 Subject: [Python-Dev] Shouldn't I be able to print Unicode objects? In-Reply-To: Skip Montanaro's message of "Tue, 5 Jun 2001 13:12:09 -0500" References: <15133.8441.983687.572159@beluga.mojam.com> Message-ID:

Skip Montanaro writes: > Just catching up on a little c.l.py and I noticed the effbot's response to > the Unicode degree inquiry. I tried to create and print one and got this: > > % python > Python 2.1.1a1 (#9, Jun 4 2001, 11:32:33) > [GCC 2.96 20000731 (Linux-Mandrake 8.0 2.96-0.48mdk)] on linux2 > Type "copyright", "credits" or "license" for more information. > >>> u"\N{DEGREE SIGN}" > u'\xb0' > >>> print u"\N{DEGREE SIGN}" > > Traceback (most recent call last): > File "<stdin>", line 1, in ? > UnicodeError: ASCII encoding error: ordinal not in range(128) > > Shouldn't I be able to print arbitrary Unicode objects? What am I missing > (this time)?

The encoding:

>>> print u"\N{DEGREE SIGN}".encode("latin1")
°

Cheers, Skippy's little helper. -- In case you're not a computer person, I should probably point out that "Real Soon Now" is a technical term meaning "sometime before the heat-death of the universe, maybe". -- Scott Fahlman

From guido at digicool.com Tue Jun 5 20:26:22 2001 From: guido at digicool.com (Guido van Rossum) Date: Tue, 05 Jun 2001 14:26:22 -0400 Subject: [Python-Dev] SourceForge Python Foundry needs help Message-ID: <200106051826.f55IQMS29540@odiug.digicool.com>

The Python Foundry at SF could use a hand. If you're interested in helping out, please write to Chuck Esterbrook, below! --Guido van Rossum (home page: http://www.python.org/~guido/)

------- Forwarded Message

Date: Tue, 05 Jun 2001 14:12:07 -0400 From: Chuck Esterbrook To: guido at python.org Subject: SourceForge Python Foundry

Hi Guido, I'm one of the admins of the SourceForge Python Foundry. In case you're not familiar with them, foundries are simply SF web portals centered around a particular topic. Admins can customize the HTML text and graphics and SourceForge stats are integrated on the side. I haven't had much time to give the Python Foundry the attention it deserves. I was wondering if you knew of anyone who had the inclination, time and energy to join the Foundry as an admin and expand it. If it becomes strong enough, we could possibly get it featured on the sidebar of the main SF page, which would then bring more attention to Python and its related projects. The foundry is at: http://sourceforge.net/foundry/python-foundry/ - -Chuck

------- End of Forwarded Message

From barry at digicool.com Tue Jun 5 20:31:12 2001 From: barry at digicool.com (Barry A. Warsaw) Date: Tue, 5 Jun 2001 14:31:12 -0400 Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects?
References: <200106051721.f55HLW729400@odiug.digicool.com> Message-ID: <15133.9584.871074.255497@anthem.wooz.org>

>>>>> "GvR" == Guido van Rossum writes:

GvR> Now invoke the Zen of Python: "There should be one-- and GvR> preferably only one --obvious way to do it." So why not make GvR> these built-in functions *be* the corresponding types? Then GvR> instead of

>> int
GvR> <built-in function int>

GvR> you would see

>> int
GvR> <type 'int'>

+1

GvR> but otherwise the behavior would be identical. (Note that I GvR> don't require that a factory function returns a *new* object GvR> each time.) GvR> If we did this for all built-in types, we'd have to add maybe GvR> a dozen new built-in names -- I think that's no big deal and GvR> actually helps naming types. The types module, with its GvR> awkward names and usage, can be deprecated.

I'm a little concerned about this, since the names that would be added are probably in common use as variable and/or argument names. I.e. At one point `list' was a very common identifier in Mailman, and I'm sure `dict' is used quite often still. I guess this would be okay as long as working code doesn't break because of it. OTOH, I've had fewer needs for a dict builtin (though not non-zero), and easily zero needs for traceback objects, code objects, etc.

GvR> There are details to be worked out, e.g. GvR> - Do we really want to have built-in names for code objects, GvR> traceback objects, and other figments of Python's internal GvR> workings?

I'd say no. However, we could probably C-ify the types module, a la the exceptions module, and that would be the logical place to put the type factories.

GvR> - What should the argument to dict() be? A list of (key, GvR> value) pairs, a list of alternating keys and values, or GvR> something else?

You definitely want to at least accept a sequence of key/value 2-tuples, so that d.items() can be retransformed into a dictionary object. -Barry

From guido at digicool.com Tue Jun 5 20:38:23 2001 From: guido at digicool.com (Guido van Rossum) Date: Tue, 05 Jun 2001 14:38:23 -0400 Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects? In-Reply-To: Your message of "Tue, 05 Jun 2001 14:31:12 EDT." <15133.9584.871074.255497@anthem.wooz.org> References: <200106051721.f55HLW729400@odiug.digicool.com> <15133.9584.871074.255497@anthem.wooz.org> Message-ID: <200106051838.f55IcNk29624@odiug.digicool.com>

> I'm a little concerned about this, since the names that would be added > are probably in common use as variable and/or argument names. I.e. At > one point `list' was a very common identifier in Mailman, and I'm sure > `dict' is used quite often still. I guess this would be okay as long > as working code doesn't break because of it.

It would be hard to see how this would break code, since built-ins are searched *after* all variables that the user defines. --Guido van Rossum (home page: http://www.python.org/~guido/)

From bckfnn at worldonline.dk Tue Jun 5 20:46:04 2001 From: bckfnn at worldonline.dk (Finn Bock) Date: Tue, 05 Jun 2001 18:46:04 GMT Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects? In-Reply-To: <200106051721.f55HLW729400@odiug.digicool.com> References: <200106051721.f55HLW729400@odiug.digicool.com> Message-ID: <3b1d2894.16564838@smtp.worldonline.dk>

[Guido] >Now invoke the Zen of Python: "There should be one-- and preferably >only one --obvious way to do it." So why not make these built-in >functions *be* the corresponding types?
Then instead of
>
>   >>> int
>   <built-in function int>
>
>you would see
>
>   >>> int
>   <type 'int'>
>
>but otherwise the behavior would be identical. (Note that I don't >require that a factory function returns a *new* object each time.)

I think that it will be difficult to avoid creating a new object under jython because calling a type already directly calls the type's java constructor.

>If we did this for all built-in types, we'd have to add maybe a dozen >new built-in names -- I think that's no big deal and actually helps >naming types. The types module, with its awkward names and usage, can >be deprecated. > >There are details to be worked out, e.g. > >- Do we really want to have built-in names for code objects, traceback > objects, and other figments of Python's internal workings? > >- What should the argument to dict() be? A list of (key, value) > pairs, a list of alternating keys and values, or something else?

Jython already interprets the arguments to the dict type as alternating key/values:

>>> from types import DictType as dict
>>> dict('a', 97, 'b', 98, 'c', 99)
{'b': 98, 'a': 97, 'c': 99}
>>>

This behaviour isn't documented on the python side so it can be changed. However, it is necessary to maintain this API on the java side and we have currently no way to prevent the type constructors from being visible and callable from python. Whatever is decided, I hope jython can keep the current semantics of its dict type. regards, finn

From fdrake at acm.org Tue Jun 5 21:11:58 2001 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Tue, 5 Jun 2001 15:11:58 -0400 (EDT) Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects? In-Reply-To: <3b1d2894.16564838@smtp.worldonline.dk> References: <200106051721.f55HLW729400@odiug.digicool.com> <3b1d2894.16564838@smtp.worldonline.dk> Message-ID: <15133.12030.538647.295809@cj42289-a.reston1.va.home.com>

Finn Bock writes:

> >>> from types import DictType as dict
> >>> dict('a', 97, 'b', 98, 'c', 99)
> {'b': 98, 'a': 97, 'c': 99}
> >>>
>
> This behaviour isn't documented on the python side so it can be changed.
> However, it is necessary to maintain this API on the java side and we
> have currently no way to prevent the type constructors from being
> visible and callable from python.

This should not be a problem: If dict() is called with one arg, the new semantics can be used, but with more than one arg, your existing semantics can be used. -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations

From skip at pobox.com Tue Jun 5 21:23:54 2001 From: skip at pobox.com (Skip Montanaro) Date: Tue, 5 Jun 2001 14:23:54 -0500 Subject: [Python-Dev] Shouldn't I be able to print Unicode objects? In-Reply-To: References: <15133.8441.983687.572159@beluga.mojam.com> Message-ID: <15133.12746.666351.127286@beluga.mojam.com>

Me> [what am I missing?]

Michael> The encoding:

>>> print u"\N{DEGREE SIGN}".encode("latin1")
°

Hmmm... I don't believe I've ever encountered an object in Python before that you couldn't simply print. Are Unicode objects unique in this respect? Seems like a bug (or at least a feature) to me. Skip

From mwh at python.net Tue Jun 5 21:31:33 2001 From: mwh at python.net (Michael Hudson) Date: 05 Jun 2001 20:31:33 +0100 Subject: [Python-Dev] Shouldn't I be able to print Unicode objects? In-Reply-To: Skip Montanaro's message of "Tue, 5 Jun 2001 14:23:54 -0500" References: <15133.8441.983687.572159@beluga.mojam.com> <15133.12746.666351.127286@beluga.mojam.com> Message-ID:

Skip Montanaro writes: > Me> [what am I missing?]
> > Michael> The encoding: > > >>> print u"\N{DEGREE SIGN}".encode("latin1") > ° > > Hmmm... I don't believe I've ever encountered an object in Python before > that you couldn't simply print. Are Unicode objects unique in this respect? > Seems like a bug (or at least a feature) to me.

Well, what would you have

>>> print u"\N{DEGREE SIGN}"

(or equivalently str(u"\N{DEGREE SIGN}") since we're eventually going to have to stuff an 8-bit string down stdout) do? I don't think

>>> print u"\N{DEGREE SIGN}"
u'\xb0'

is really an option. This is old news. It must have been discussed here before 1.6, I'd have thought. Cheers, M. -- 58. Fools ignore complexity. Pragmatists suffer it. Some can avoid it. Geniuses remove it. -- Alan Perlis, http://www.cs.yale.edu/homes/perlis-alan/quotes.html

From barry at digicool.com Tue Jun 5 21:46:54 2001 From: barry at digicool.com (Barry A. Warsaw) Date: Tue, 5 Jun 2001 15:46:54 -0400 Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects? References: <200106051721.f55HLW729400@odiug.digicool.com> <15133.9584.871074.255497@anthem.wooz.org> <200106051838.f55IcNk29624@odiug.digicool.com> Message-ID: <15133.14126.221568.235269@anthem.wooz.org>

>>>>> "GvR" == Guido van Rossum writes:

>> I'm a little concerned about this, since the names that would >> be added are probably in common use as variable and/or argument >> names. I.e. At one point `list' was a very common identifier >> in Mailman, and I'm sure `dict' is used quite often still. I >> guess this would be okay as long as working code doesn't break >> because of it.

GvR> It would be hard to see how this would break code, since GvR> built-ins are searched *after* all variables that the user GvR> defines.

Wasn't there talk about issuing warnings for locals shadowing built-ins (or was that globals?). If not, fergitaboutit. If so, that would fall under the category of "breaking". -Barry

From tim at digicool.com Tue Jun 5 21:56:59 2001 From: tim at digicool.com (Tim Peters) Date: Tue, 5 Jun 2001 15:56:59 -0400 Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects? Message-ID:

Just to reduce this to its most trivial point <wink>,

> - What should the argument to dict() be? A list of (key, value) > pairs, a list of alternating keys and values, or something else?

the middle one (perhaps generalized to "iterable object alternately producing keys and values") is most useful in practice. Perl gets a lot of mileage out of that, e.g. think of using re.findall() to build a list of mail-header field, value, field, value, ... thingies to feed to a dict. A list of (key, value) pairs is prettiest, but almost nothing *produces* such a list except for dict.items(); we don't need another way to spell dict.copy().

From guido at digicool.com Tue Jun 5 21:56:05 2001 From: guido at digicool.com (Guido van Rossum) Date: Tue, 05 Jun 2001 15:56:05 -0400 Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects? In-Reply-To: Your message of "Tue, 05 Jun 2001 15:46:54 EDT." <15133.14126.221568.235269@anthem.wooz.org> References: <200106051721.f55HLW729400@odiug.digicool.com> <15133.9584.871074.255497@anthem.wooz.org> <200106051838.f55IcNk29624@odiug.digicool.com> <15133.14126.221568.235269@anthem.wooz.org> Message-ID: <200106051956.f55Ju5130078@odiug.digicool.com>

> >>>>> "GvR" == Guido van Rossum writes: > > >> I'm a little concerned about this, since the names that would > >> be added are probably in common use as variable and/or argument > >> names. I.e.
At one point `list' was a very common identifier > >> in Mailman, and I'm sure `dict' is used quite often still. I > >> guess this would be okay as long as working code doesn't break > >> because of it. > > GvR> It would be hard to see how this would break code, since > GvR> built-ins are searched *after* all variables that the user > GvR> defines. > > Wasn't there talk about issuing warnings for locals shadowing > built-ins (or was that globals?). If not, fergitaboutit. If so, that > would fall under the category of "breaking". > > -Barry

You may be thinking of this:

>>> def f(int):
...     def g():
...         int
...
<stdin>:1: SyntaxWarning: local name 'int' in 'f' shadows use of 'int' as global in nested scope 'g'
>>>

This warns you when you override a built-in or global *and* you use that same name in a nested function. This code will mean something different in 2.2 anyway (g's reference to int will become a reference to f's int because of nested scopes). But this does not cause a warning:

>>> def g():
...     int = 12
...
>>>

Nor does this:

>>> int = 12
>>>

So we're safe. --Guido van Rossum (home page: http://www.python.org/~guido/)

From skip at pobox.com Tue Jun 5 22:01:47 2001 From: skip at pobox.com (Skip Montanaro) Date: Tue, 5 Jun 2001 15:01:47 -0500 Subject: [Python-Dev] Shouldn't I be able to print Unicode objects? In-Reply-To: References: <15133.8441.983687.572159@beluga.mojam.com> <15133.12746.666351.127286@beluga.mojam.com> Message-ID: <15133.15019.237484.605267@beluga.mojam.com>

Michael> Well, what would you have

>>>> print u"\N{DEGREE SIGN}"

Michael> (or equivalently

Michael> str(u"\N{DEGREE SIGN}")

Michael> since we're eventually going to have to stuff an 8-bit string Michael> down stdout) do?

How about if print calls the .encode("latin1") method for me when it gets an ASCII encoding error? If "latin1" isn't a reasonable default choice, it could pick an encoding based on the current locale.

Michael> I don't think

>>>> print u"\N{DEGREE SIGN}"
Michael> u'\xb0'

Michael> is really an option.

I agree. I'd like to see a little circle.

Michael> This is old news. It must have been discussed here before 1.6, Michael> I'd have thought.

Perhaps, but I suspect many people suffered from glazing over of the eyes reading all the messages exchanged about Unicode arcana. I know I did. Skip

From barry at digicool.com Tue Jun 5 22:01:29 2001 From: barry at digicool.com (Barry A. Warsaw) Date: Tue, 5 Jun 2001 16:01:29 -0400 Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects? References: <200106051721.f55HLW729400@odiug.digicool.com> <15133.9584.871074.255497@anthem.wooz.org> <200106051838.f55IcNk29624@odiug.digicool.com> <15133.14126.221568.235269@anthem.wooz.org> <200106051956.f55Ju5130078@odiug.digicool.com> Message-ID: <15133.15001.19308.108288@anthem.wooz.org>

>>>>> "GvR" == Guido van Rossum writes:

GvR> You may be thinking of this:

Yup.

GvR> So we're safe.

Cool! Count me as a solid +1 then. -Barry

From aahz at rahul.net Tue Jun 5 22:10:06 2001 From: aahz at rahul.net (Aahz Maruch) Date: Tue, 5 Jun 2001 13:10:06 -0700 (PDT) Subject: [Python-Dev] Shouldn't I be able to print Unicode objects? In-Reply-To: <15133.15019.237484.605267@beluga.mojam.com> from "Skip Montanaro" at Jun 05, 2001 03:01:47 PM Message-ID: <20010605201006.15CAD99C83@waltz.rahul.net>

Skip Montanaro wrote: > > Perhaps, but I suspect many people suffered from glazing over of the eyes > reading all the messages exchanged about Unicode arcana. I know I did.

Ditto.
-- --- Aahz (@pobox.com) Hugs and backrubs -- I break Rule 6 <*> http://www.rahul.net/aahz/ Androgynous poly kinky vanilla queer het Pythonista I don't really mind a person having the last whine, but I do mind someone else having the last self-righteous whine.

From mal at lemburg.com Tue Jun 5 22:14:39 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Tue, 05 Jun 2001 22:14:39 +0200 Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects? References: <200106051721.f55HLW729400@odiug.digicool.com> <3B1D1C8E.B7770419@lemburg.com> Message-ID: <3B1D3DAF.DAE727AE@lemburg.com>

> > [Guido] > > Now invoke the Zen of Python: "There should be one-- and preferably > > only one --obvious way to do it." So why not make these built-in > > functions *be* the corresponding types? Then instead of > > > > >>> int > > <built-in function int> > > > > you would see > > > > >>> int > > <type 'int'> > > > > but otherwise the behavior would be identical. (Note that I don't > > require that a factory function returns a *new* object each time.) > > -1 > > While this looks cute, I think it would break a lot of introspection > code or other code which special cases Python functions for > some reason since type(int) would no longer return > types.BuiltinFunctionType. > > If you don't like the names, why not take the chance and > create a new module which then exposes the Python class hierarchy > (much like we did with the exceptions.py module before it was > integrated as a C module)?!

Looks like I'm alone with my uncertain feeling about this move... oh well.

BTW, we should consider having more than one constructor for an object rather than trying to stuff all possible options and parameters into one overloaded super-constructor. I've done this in many of my mx extensions and have so far had great success with it (better programming error detection, better docs, more intuitive interfaces, etc.). In that sense, more than one way to do something will actually help clarify what the programmer really wanted. Just a thought... -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/

From mal at lemburg.com Tue Jun 5 22:16:02 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Tue, 05 Jun 2001 22:16:02 +0200 Subject: [Python-Dev] Shouldn't I be able to print Unicode objects? References: <15133.8441.983687.572159@beluga.mojam.com> <15133.12746.666351.127286@beluga.mojam.com> <15133.15019.237484.605267@beluga.mojam.com> Message-ID: <3B1D3E02.3C9AE1F4@lemburg.com>

Skip Montanaro wrote: > > Michael> Well, what would you have > > >>>> print u"\N{DEGREE SIGN}" > > Michael> (or equivalently > > Michael> str(u"\N{DEGREE SIGN}") > > Michael> since we're eventually going to have to stuff an 8-bit string > Michael> down stdout) do? > > How about if print calls the .encode("latin1") method for me when it gets an > ASCII encoding error? If "latin1" isn't a reasonable default choice, it > could pick an encoding based on the current locale.
Please see Lib/site.py for details on how to enable all these goodies -- it's all there, just disabled and meant for super-users only ;-) -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/

From guido at digicool.com Tue Jun 5 22:22:43 2001 From: guido at digicool.com (Guido van Rossum) Date: Tue, 05 Jun 2001 16:22:43 -0400 Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects? In-Reply-To: Your message of "Tue, 05 Jun 2001 22:14:39 +0200." <3B1D3DAF.DAE727AE@lemburg.com> References: <200106051721.f55HLW729400@odiug.digicool.com> <3B1D1C8E.B7770419@lemburg.com> <3B1D3DAF.DAE727AE@lemburg.com> Message-ID: <200106052022.f55KMhq30227@odiug.digicool.com>

> > -1 > > > > While this looks cute, I think it would break a lot of introspection > > code or other code which special cases Python functions for > > some reason since type(int) would no longer return > > types.BuiltinFunctionType. > > Looks like I'm alone with my uncertain feeling about this move... > oh well.

Well, I don't see how someone could be doing introspection on int and be confused when it's not a function -- either you (think you) know it's a function, so you use it as a function without introspecting it, and that continues to work; or you're open to all possibilities, and then you'll introspect it, and then you'll discover what it is.

> BTW, we should consider having more than one constructor for an > object rather than trying to stuff all possible options and parameters > into one overloaded super-constructor. I've done this in many of > my mx extensions and have so far had great success with it (better > programming error detection, better docs, more intuitive interfaces, > etc.). In that sense, more than one way to do something will > actually help clarify what the programmer really wanted. Just > a thought...

Yes, but the other ways are spelled as factory functions. Maybe, *maybe* the other factory functions could be class-methods, but don't hold your hopes high. --Guido van Rossum (home page: http://www.python.org/~guido/)

From martin at loewis.home.cs.tu-berlin.de Tue Jun 5 22:30:18 2001 From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Tue, 5 Jun 2001 22:30:18 +0200 Subject: [Python-Dev] Shouldn't I be able to print Unicode objects? Message-ID: <200106052030.f55KUIu02762@mira.informatik.hu-berlin.de>

> How about if print calls the .encode("latin1") method for me when it gets an > ASCII encoding error? If "latin1" isn't a reasonable default choice, it > could pick an encoding based on the current locale.

These are both bad ideas. First, there is no guarantee that your terminal is capable of displaying the circle at all. Maybe the typewriter connected to your computer doesn't even have a degree type. Further, maybe it does support displaying the degree sign, but then it likely fails for

>>> print u"\N{EURO SIGN}"

Or, worse, instead of displaying the EURO SIGN, it may just display the CURRENCY SIGN (since it may choose to use ISO-8859-15, but the terminal assumes ISO-8859-1). So unless you can come up with a really good way to find out what the terminal is capable of displaying (plus finding out how to make it display these things), I think Python is better off raising an exception than producing garbage output. In addition, what you see is the "default encoding", i.e.
it doesn't just apply to print; it also applies to all places where Unicode objects are converted into byte strings. Any default other than ASCII has been considered a bad idea by the authors of the Unicode support. IMO, the next-most reasonable default would have been UTF-8, *not* Latin-1, since UTF-8 can represent the EURO SIGN and every other character in Unicode. Most likely, your terminal will have difficulties producing a circle symbol when it gets the UTF-8 representation of the DEGREE SIGN, though. So the best thing is still to give it into the hands of the application author. As MAL points out, the administrator can give a different default encoding in site.py. Since the default default is ASCII, applications assuming that the default is ASCII won't break on your system. OTOH, applications developed on your system may then break elsewhere, since the default in site.py might be different. Regards, Martin
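[Along the lines Martin describes, an application that wants locale-driven output can do the conversion itself rather than relying on a site-wide default. A minimal sketch -- the helper name uprint and the 'replace' error policy are illustrative assumptions, not anything settled in this thread:]

    import locale

    def uprint(ustr):
        # Guess a charset from the user's locale; fall back to ASCII.
        # 'replace' substitutes '?' for unencodable characters instead
        # of raising UnicodeError.
        encoding = locale.getlocale()[1] or 'ascii'
        print ustr.encode(encoding, 'replace')

[For example, uprint(u"\N{DEGREE SIGN}") would print a degree sign in a Latin-1 locale and a '?' in the C locale, instead of raising an exception.]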
From sdm7g at Virginia.EDU Tue Jun 5 22:41:11 2001 From: sdm7g at Virginia.EDU (Steven D. Majewski) Date: Tue, 5 Jun 2001 16:41:11 -0400 (EDT) Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects? In-Reply-To: <200106051721.f55HLW729400@odiug.digicool.com> Message-ID:

On Tue, 5 Jun 2001, Guido van Rossum wrote:

> Now invoke the Zen of Python: "There should be one-- and preferably > only one --obvious way to do it." So why not make these built-in > functions *be* the corresponding types? Then instead of

+1

> - Do we really want to have built-in names for code objects, traceback > objects, and other figments of Python's internal workings?

I would say to put all of the common constructors in __builtin__, and all of the odd ducks can go into the new module.

> - What should the argument to dict() be? A list of (key, value) > pairs, a list of alternating keys and values, or something else?

A varargs list of (key,value) tuples would probably be most useful. Since most of these functions, before being classed as constructors, were considered coercion functions, I wouldn't be against having it try to do something sensible with a variety of args. -- sdm

From skip at pobox.com Tue Jun 5 22:47:17 2001 From: skip at pobox.com (Skip Montanaro) Date: Tue, 5 Jun 2001 15:47:17 -0500 Subject: [Python-Dev] Shouldn't I be able to print Unicode objects? In-Reply-To: <3B1D3E02.3C9AE1F4@lemburg.com> References: <15133.8441.983687.572159@beluga.mojam.com> <15133.12746.666351.127286@beluga.mojam.com> <15133.15019.237484.605267@beluga.mojam.com> <3B1D3E02.3C9AE1F4@lemburg.com> Message-ID: <15133.17749.390756.115544@beluga.mojam.com>

mal> Please see Lib/site.py for details on how to enable all these mal> goodies -- it's all there, just disabled and meant for super-users mal> only ;-)

Okay, I found the encoding section. I changed the encoding variable assignment to be

    encoding = "latin1"

and now the degree sign print works. What other side-effects will that have besides on printed representations? It appears I can create (but not see properly?) variable names containing latin1 characters:

>>> ümlaut = "ümlaut"
>>> print locals().keys()
['orca', 'dir', '__doc__', 'rlcompleter', 'missionb', 'version', 'dirpat', 'xmlrpclib', 'belugab', '__builtin__', 'beluga', 'readline', '__name__', 'orcab', 'addpath', 'Writer', 'atexit', 'sys', 'dolphinb', 'mission', 'pprint', 'dolphin', '__builtins__', 'mlaut', 'help']

I am having trouble printing some strings containing latin1 characters:

>>> print ümlaut
mlaut
>>> type("ümlaut")
<type 'string'>
>>> type(string.letters)
<type 'string'>
>>> print "ümlaut"
mlaut
>>> print string.letters
abcdefghijklmnopqrstuvwxyz?????????????????????????????????ABCDEFGHIJKLMNOPQRSTUVWXYZ??????????????????????????????
>>> print string.letters[55:]
????ABCDEFGHIJKLMNOPQRSTUVWXYZ??????????????????????????????

The above was pasted from Python running in a shell session in XEmacs, which is certainly latin1-aware. Why did I have trouble seeing the ü in some situations, but not in others? Are the ramifications of all this encoding stuff documented somewhere? Skip

From skip at pobox.com Tue Jun 5 22:56:58 2001 From: skip at pobox.com (Skip Montanaro) Date: Tue, 5 Jun 2001 15:56:58 -0500 Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects? References: <200106051721.f55HLW729400@odiug.digicool.com> Message-ID: <15133.18330.910736.249838@beluga.mojam.com>

Is the intent of using int and friends as constructors instead of just coercion functions that I should (eventually) be able to do this:

    class NonNegativeInt(int):
        def __init__(self, val):
            if int(val) < 0:
                raise ValueError, "Value must be >= 0"
            int.__init__(self, val)
            self.a = 47
        ...

? Skip

From tim at digicool.com Tue Jun 5 23:01:23 2001 From: tim at digicool.com (Tim Peters) Date: Tue, 5 Jun 2001 17:01:23 -0400 Subject: [Python-Dev] another dict crasher Message-ID:

[Tim's dict-crasher dies w/ a stack overflow, but with a KeyError when he sticks a print inside __eq__]

OK, I understand this now, at least on Windows. In PyObject_Print(),

#ifdef USE_STACKCHECK
	if (PyOS_CheckStack()) {
		PyErr_SetString(PyExc_MemoryError, "stack overflow");
		return -1;
	}
#endif

On Windows, PyOS_CheckStack() is

	__try {
		/* _alloca throws a stack overflow exception if there's
		   not enough space left on the stack */
		_alloca(PYOS_STACK_MARGIN * sizeof(void*));
		return 0;
	} __except (EXCEPTION_EXECUTE_HANDLER) {
		/* just ignore all errors */
	}
	return 1;

The _alloca dies, so the __except falls thru and PyOS_CheckStack returns 1. PyObject_Print sets the "stack overflow" error and returns -1. This winds its way thru the rich comparison attempt, until lookdict() sees it and says, Hmm. I can't compare this thing without raising error. So this can't be the key I'm looking for. First I'll clear the error. Hmm. Can't find it anywhere else in the dict either. Hmm. There were no errors pending at the time I got called, so I'll leave things that way and return "not found". At that point about 15,000 levels of recursion unwind, and KeyError gets raised. I don't believe PyOS_CheckStack() is implemented on Unixoid systems (just Windows and Macs), so some other accident must account for the KeyError on Linux. Remains unclear what to do about it; the idea that all errors raised by dict lookup comparisons are ignorable is sure a tempting target.
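[A tiny illustration of the behavior Tim describes, written against the 2.1 semantics sketched above. The class is hypothetical; the point is that the comparison error is swallowed inside lookdict and surfaces as a plain KeyError:]

    class Evil:
        # Every instance hashes alike, so lookups must fall back on
        # the (exploding) equality test.
        def __hash__(self):
            return 0
        def __eq__(self, other):
            raise RuntimeError, "error during dict lookup comparison"

    d = {Evil(): 1}
    try:
        d[Evil()]          # different instance, same hash slot
    except KeyError:
        # lookdict cleared the RuntimeError and reported "not found".
        print "comparison error was swallowed; got KeyError"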
From mal at lemburg.com Tue Jun 5 23:00:23 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Tue, 05 Jun 2001 23:00:23 +0200 Subject: [Python-Dev] Shouldn't I be able to print Unicode objects? References: <15133.8441.983687.572159@beluga.mojam.com> <15133.12746.666351.127286@beluga.mojam.com> <15133.15019.237484.605267@beluga.mojam.com> <3B1D3E02.3C9AE1F4@lemburg.com> <15133.17749.390756.115544@beluga.mojam.com> Message-ID: <3B1D4866.A40AAB1C@lemburg.com>

Skip Montanaro wrote: > > mal> Please see Lib/site.py for details on how to enable all these > mal> goodies -- it's all there, just disabled and meant for super-users > mal> only ;-) > > Okay, I found the encoding section. I changed the encoding variable > assignment to be > > encoding = "latin1" > > and now the degree sign print works. What other side-effects will that have > besides on printed representations? It appears I can create (but not see > properly?) variable names containing latin1 characters: > > >>> ümlaut = "ümlaut"

Huh ? That should not be possible ! Python literals are still ASCII.

>>> ümlaut = 'ümlaut'
  File "<stdin>", line 1
    ümlaut = 'ümlaut'
    ^
SyntaxError: invalid syntax

> >>> print locals().keys() > ['orca', 'dir', '__doc__', 'rlcompleter', 'missionb', 'version', 'dirpat', 'xmlrpclib', 'belugab', '__builtin__', 'beluga', 'readline', '__name__', 'orcab', 'addpath', 'Writer', 'atexit', 'sys', 'dolphinb', 'mission', 'pprint', 'dolphin', '__builtins__', 'mlaut', 'help'] > > I am having trouble printing some strings containing latin1 characters: > > >>> print ümlaut > mlaut > >>> type("ümlaut") > <type 'string'> > >>> type(string.letters) > <type 'string'> > >>> print "ümlaut" > mlaut > >>> print string.letters > abcdefghijklmnopqrstuvwxyz?????????????????????????????????ABCDEFGHIJKLMNOPQRSTUVWXYZ?????????????????????????????? > >>> print string.letters[55:] > ????ABCDEFGHIJKLMNOPQRSTUVWXYZ?????????????????????????????? > > The above was pasted from Python running in a shell session in XEmacs, which > is certainly latin1-aware. Why did I have trouble seeing the ü in some > situations, but not in others?

No idea what's going on there... the encoding parameter should not have any effect on printing normal 8-bit strings. It only defines the standard encoding used in coercion and auto-conversion from Unicode to 8-bit strings and vice-versa.

> Are the ramifications of all this encoding stuff documented somewhere?

The basic things can be found in Misc/unicode.txt, on the i18n sig page and some resources on the web. I'll give a talk in Bordeaux about Unicode too, which will probably provide some additional help as well. -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/

From guido at digicool.com Tue Jun 5 23:14:07 2001 From: guido at digicool.com (Guido van Rossum) Date: Tue, 05 Jun 2001 17:14:07 -0400 Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects? In-Reply-To: Your message of "Tue, 05 Jun 2001 16:59:01 EDT." References: Message-ID: <200106052114.f55LE7P30481@odiug.digicool.com>

> Is the intent of using int and friends as constructors instead of just
> coercion functions that I should (eventually) be able to do this:
>
>     class NonNegativeInt(int):
>         def __init__(self, val):
>             if int(val) < 0:
>                 raise ValueError, "Value must be >= 0"
>             int.__init__(self, val)
>             self.a = 47
>         ...
>
> ?

Yes, sort-of. The details will be slightly different. I'm not comfortable with letting a user-provided __init__() method change the value of self, so I am brooding on a work-around that separates allocation and one-time initialization from __init__(). Watch PEP 253. --Guido van Rossum (home page: http://www.python.org/~guido/)
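[To make the direction concrete: under the separation Guido describes, Skip's check would move out of __init__ and into the one-time allocation step. A sketch, assuming the __new__ spelling that PEP 253 was converging on -- nothing like this worked in released Pythons at the time:]

    class NonNegativeInt(int):
        # Assumption: a separate allocation hook (spelled __new__ here)
        # receives the value before the immutable int is created.
        def __new__(cls, val):
            if int(val) < 0:
                raise ValueError, "Value must be >= 0"
            return int.__new__(cls, val)

        def __init__(self, val):
            # Ordinary per-instance initialization can stay here.
            self.a = 47

[With that split, NonNegativeInt(5) behaves as an int, while NonNegativeInt(-1) raises ValueError before any int object is built.]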
From tim at digicool.com Tue Jun 5 23:16:03 2001 From: tim at digicool.com (Tim Peters) Date: Tue, 5 Jun 2001 17:16:03 -0400 Subject: [Python-Dev] Shouldn't I be able to print Unicode objects? Message-ID:

[MAL, to Skip] > Huh ? That should not be possible ! Python literals are still > ASCII. > > >>> ümlaut = 'ümlaut' > File "<stdin>", line 1 > ümlaut = 'ümlaut' > ^ > SyntaxError: invalid syntax

That was Guido's intent, and what the Ref Man says, but the tokenizer uses C's isalpha() so in reality it's locale-dependent. I think at least one German on Python-Dev has already threatened to kill him if he ever fixes this bug <wink>.

From gward at python.net Wed Jun 6 00:29:49 2001 From: gward at python.net (Greg Ward) Date: Tue, 5 Jun 2001 18:29:49 -0400 Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects? In-Reply-To: <200106051721.f55HLW729400@odiug.digicool.com>; from guido@digicool.com on Tue, Jun 05, 2001 at 01:21:32PM -0400 References: <200106051721.f55HLW729400@odiug.digicool.com> Message-ID: <20010605182949.A7545@gerg.ca>

On 05 June 2001, Guido van Rossum said: > Now invoke the Zen of Python: "There should be one-- and preferably > only one --obvious way to do it." So why not make these built-in > functions *be* the corresponding types? Then instead of

+1 from me too.

> If we did this for all built-in types, we'd have to add maybe a dozen > new built-in names -- I think that's no big deal and actually helps > naming types. The types module, with its awkward names and usage, can > be deprecated.

Cool!

> There are details to be worked out, e.g. > > - Do we really want to have built-in names for code objects, traceback > objects, and other figments of Python's internal workings?

Probably not, as long as they are accessible somewhere. I could live with either a C-ified 'types' module or shoving these into the 'new' module, although I think I prefer the latter slightly.

> - What should the argument to dict() be? A list of (key, value) > pairs, a list of alternating keys and values, or something else?

I love /F's suggestion

    dict(k=v, k=v, ...)

but that's icing on the cake -- cool feature, looks pretty, etc. (And *finally* Python will have all the syntactic sugar that Perl programmers like to have. ;-) I think the real answer should be

    dict(k, v, k, v)

like Jython. If both can be supported, that would be swell. Greg -- Greg Ward - Linux geek gward at python.net http://starship.python.net/~gward/ Does your DRESSING ROOM have enough ASPARAGUS?

From barry at digicool.com Wed Jun 6 00:45:00 2001 From: barry at digicool.com (Barry A. Warsaw) Date: Tue, 5 Jun 2001 18:45:00 -0400 Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects? References: <200106051721.f55HLW729400@odiug.digicool.com> <20010605182949.A7545@gerg.ca> Message-ID: <15133.24812.791796.557452@anthem.wooz.org>

>>>>> "GW" == Greg Ward writes:

GW> I love /F's suggestion

GW> dict(k=v, k=v, ...)

One problem with this syntax is that the `k's can only be valid Python identifiers, so you'd at least need /some/ other syntax to support construction with arbitrary hashable keys. -Barry

From fredrik at pythonware.com Wed Jun 6 00:57:43 2001 From: fredrik at pythonware.com (Fredrik Lundh) Date: Wed, 6 Jun 2001 00:57:43 +0200 Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects?
References: <200106051721.f55HLW729400@odiug.digicool.com> <20010605182949.A7545@gerg.ca> Message-ID: <011f01c0ee12$eeda9ba0$0900a8c0@spiff>

greg wrote: > > - What should the argument to dict() be? A list of (key, value) > > pairs, a list of alternating keys and values, or something else? > > I love /F's suggestion > > dict(k=v, k=v, ...) > > but that's icing on the cake -- cool feature, looks pretty, etc.

note that the python interpreter builds that dictionary for you if you use the METH_KEYWORDS flag...

> I think the real answer should be > > dict(k, v, k, v) > > like Jython.

given that Jython already gives a meaning to dict with more than one argument, I suggest:

    dict(d)                     # consistency
    dict(k, v, k, v, ...)       # jython compatibility
    dict(*[k, v, k, v, ...])    # convenience
    dict(k=v, k=v, ...)         # common pydiom

and maybe:

    dict(d.items())             # symmetry

> If both can be supported, that would be swell.

how about:

    if (PyTuple_GET_SIZE(args)) {
        assert PyDict_GET_SIZE(kw) == 0
        if (PyTuple_GET_SIZE(args) == 1) {
            args = PyTuple_GET_ITEM(args, 0);
            if (PyDict_Check(args))
                dict = args.copy()
            else if (PySequence_Check(args))
                dict = {}
                for k, v in args:
                    dict[k] = v
        } else {
            assert (PySequence_Size(args) & 1) == 0  # maybe
            dict = {}
            for i in range(0, len(args), 2):
                dict[args[i]] = args[i+1]
        }
    } else {
        assert PyDict_GET_SIZE(kw) > 0  # probably
        dict = kw
    }
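[For concreteness, the same dispatch can be written out in plain Python. A rough sketch -- the name makedict, the precedence of keywords, and the error checks are illustrative assumptions; no signature had been agreed on at this point in the thread:]

    def makedict(*args, **kw):
        d = {}
        if len(args) == 1:
            arg = args[0]
            if hasattr(arg, 'items'):       # another mapping: copy it
                for k, v in arg.items():
                    d[k] = v
            else:                           # sequence of (key, value) pairs
                for k, v in arg:
                    d[k] = v
        elif len(args) > 1:                 # jython style: k, v, k, v, ...
            if len(args) % 2:
                raise TypeError, "odd number of arguments"
            for i in range(0, len(args), 2):
                d[args[i]] = args[i+1]
        d.update(kw)                        # keyword arguments applied last
        return d

[So makedict({'a': 1}), makedict([('a', 1)]), makedict('a', 1, 'b', 2) and makedict(a=1, b=2) would all produce the obvious dictionaries.]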
Being able to say band = dict(geddy="bass", alex="guitar", neil="drums") would be good enough for me. And it's less mysterious than Perl's =>, which is just a magic comma that forces its LHS to be interpreted as a string. Weird. Greg -- Greg Ward - Linux geek gward at python.net http://starship.python.net/~gward/ If you and a friend are being chased by a lion, it is not necessary to outrun the lion. It is only necessary to outrun your friend. From mal at lemburg.com Wed Jun 6 10:03:13 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Wed, 06 Jun 2001 10:03:13 +0200 Subject: [Python-Dev] Shouldn't I be able to print Unicode objects? References: Message-ID: <3B1DE3C1.90BA3DD6@lemburg.com> Tim Peters wrote: > > [MAL, to Skip] > > Huh ? That should not be possible ! Python literals are still > > ASCII. > > > > >>> ?mlaut = '?mlaut' > > File "", line 1 > > ?mlaut = '?mlaut' > > ^ > > SyntaxError: invalid syntax > > That was Guido's intent, and what the Ref Man says, but the tokenizer uses > C's isalpha() so in reality it's locale-dependent. I think at least one > German on Python-Dev has already threatened to kill him if he ever fixes > this bug . Wasn't me for sure... even in the Unicode age, I believe that Python source code should maintain readability by not allowing all alpha(numeric) characters for use in identifiers (there are lots of them in Unicode). Shouldn't we fix the tokenizer to explicitely check for 'a'...'z' and 'A'...'Z' ?! (same for digits) ?! -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From jack at oratrix.nl Wed Jun 6 13:24:32 2001 From: jack at oratrix.nl (Jack Jansen) Date: Wed, 06 Jun 2001 13:24:32 +0200 Subject: [Python-Dev] %b format? - 8088 like a PDP-11 I think not... In-Reply-To: Message by "Eric S. Raymond" , Mon, 4 Jun 2001 17:19:08 -0400 , <20010604171908.A21831@thyrsus.com> Message-ID: <20010606112432.C4A43303181@snelboot.oratrix.nl> The early microcomputers (8008, 6800, 6502) are actually a lot more like the PDP-8 than the PDP-11: a single (or possibly double) accumulator register and a few special purpose registers hardwired to various instructions. The 68000, Z8000 and NS16032 were the first true successors of the PDP-11, sharing (to an extent) the unique characteristics of it's design with general purpose registers (with even SP and PC being general purpose registers with only very little magic attached to them) and an orthogonal design. The 68000 still had lots of little quirks in the instruction set, the latter two actually improved on the PDP-11 set (where a couple of instructions like XOR would only work with register-destination because it was added to the design in a stage where there weren't enough bits left in the instruction space, I guess). And the 8086 was just a souped-up 8080/8008: each register had a different function, no orthogonality, etc. Intel didn't get it "right" until the 386 32-bit instruction set (and even there some of the old baggage can still be seen). -- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From jack at oratrix.nl Wed Jun 6 13:39:56 2001 From: jack at oratrix.nl (Jack Jansen) Date: Wed, 06 Jun 2001 13:39:56 +0200 Subject: [Python-Dev] quick poll: could int, str, tuple etc. 
become type objects? In-Reply-To: Message by "Fredrik Lundh" , Tue, 5 Jun 2001 19:34:35 +0200 , <001301c0ede5$cb804a10$e46940d5@hagrid> Message-ID: <20010606113957.4A395303181@snelboot.oratrix.nl>

For the dictionary initializer I would definitely want to be able to give an object that adheres to the dictionary protocol, so that I can do things like

    import anydbm
    f = anydbm.open("foo", "r")
    incore = dict(f)

Hmm, I guess this goes for most types: list() and tuple() should take any iterable object, etc. The one question is what "dictionary protocol" means. Should it support items()? Is only x.keys()/x[] good enough? -- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm
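[Greg Ewing suggests the obvious fallback later in the thread: check for items(), and use keys() plus indexing otherwise. A sketch of that shape, with a hypothetical helper name -- whether dict() itself should do this was still an open question:]

    def from_mapping(m):
        # Prefer items() when the object has it; otherwise fall back
        # on keys() plus indexing, which is all dbm-style files offer.
        d = {}
        if hasattr(m, 'items'):
            for k, v in m.items():
                d[k] = v
        else:
            for k in m.keys():
                d[k] = m[k]
        return d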
From mal at lemburg.com Wed Jun 6 20:36:48 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Wed, 06 Jun 2001 20:36:48 +0200 Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects? References: <200106051721.f55HLW729400@odiug.digicool.com> <3B1D1C8E.B7770419@lemburg.com> <3B1D3DAF.DAE727AE@lemburg.com> <200106052022.f55KMhq30227@odiug.digicool.com> Message-ID: <3B1E7840.C93EA788@lemburg.com>

Guido van Rossum wrote: > > > > -1 > > > > > > While this looks cute, I think it would break a lot of introspection > > > code or other code which special cases Python functions for > > > some reason since type(int) would no longer return > > > types.BuiltinFunctionType. > > > > Looks like I'm alone with my uncertain feeling about this move... > > oh well. > > Well, I don't see how someone could be doing introspection on int and > be confused when it's not a function -- either you (think you) know > it's a function, so you use it as a function without introspecting it, > and that continues to work; or you're open to all possibilities, and > then you'll introspect it, and then you'll discover what it is.

Ok, let me put it another way: the point is that you are changing the type of very basic building parts in Python and that is likely to cause failures in places which will most likely be hard to find and fix. Besides, we don't really gain anything from replacing builtin functions with classes (to the contrary: we lose some, since we can no longer use the function call optimizations for builtins and have to go through all the generic call mechanism code instead). Also, have you considered the effects this has on restricted execution mode ? What will happen if someone replaces the builtins with special versions which hide some security relevant objects, e.g. open() is a prominent candidate for this. Why not put the type objects into a separate module instead of reusing the builtins ?

> > BTW, we should consider having more than one constructor for an > > object rather than trying to stuff all possible options and parameters > > into one overloaded super-constructor. I've done this in many of > > my mx extensions and have so far had great success with it (better > > programming error detection, better docs, more intuitive interfaces, > > etc.). In that sense, more than one way to do something will > > actually help clarify what the programmer really wanted. Just > > a thought... > > Yes, but the other ways are spelled as factory functions. Maybe, > *maybe* the other factory functions could be class-methods, but don't > hold your hopes high.

No... why make things complicated when simple functions work just fine as factories. Multiple constructors on a class would make subclassing a pain... -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/

From paulp at ActiveState.com Wed Jun 6 21:00:07 2001 From: paulp at ActiveState.com (Paul Prescod) Date: Wed, 06 Jun 2001 12:00:07 -0700 Subject: [Python-Dev] Shouldn't I be able to print Unicode objects? References: <15133.8441.983687.572159@beluga.mojam.com> <15133.12746.666351.127286@beluga.mojam.com> <15133.15019.237484.605267@beluga.mojam.com> <3B1D3E02.3C9AE1F4@lemburg.com> <15133.17749.390756.115544@beluga.mojam.com> Message-ID: <3B1E7DB7.408BC089@ActiveState.com>

Skip Montanaro wrote: > >... > > Okay, I found the encoding section. I changed the encoding variable > > assignment to be > > encoding = "latin1"

Danger, Will Robinson! You can now write software that will work great on your version of Python and will crash on everyone else's. You haven't just changed the behavior of "print" but of EVERY attempted automatic coercion from Unicode to an 8-bit string. -- Take a recipe. Leave a recipe. Python Cookbook! http://www.ActiveState.com/pythoncookbook

From tim.one at home.com Wed Jun 6 21:27:59 2001 From: tim.one at home.com (Tim Peters) Date: Wed, 6 Jun 2001 15:27:59 -0400 Subject: [Python-Dev] -U option? Message-ID:

http://sf.net/tracker/?func=detail&atid=105470&aid=430269&group_id=5470 python -U breaks import with 2.1

Anyone understand -U? Like, should it work, why is it there if it doesn't and isn't expected to, and are there docs for it beyond the "python -h" blurb? Last mention of it I found in c.l.py was

""" Date: Tue, 06 Feb 2001 16:09:46 +0100 From: "M.-A. Lemburg" Subject: Re: [Python-Dev] Pre-PEP: Python Character Model ... Well, with -U on, Python will compile "" into u"", ... last I tried, Python didn't even start up :-( ... """

An earlier msg (08 Sep 2000) said:

""" Note that many things fail when Python is started with -U... that switch was introduced to be able to get an idea of which parts of the standard fail to work in a mixed string/Unicode environment. """

If this is just an internal development switch, python -h probably shouldn't advertise it.

From barry at digicool.com Wed Jun 6 21:37:26 2001 From: barry at digicool.com (Barry A. Warsaw) Date: Wed, 6 Jun 2001 15:37:26 -0400 Subject: [Python-Dev] -U option? References: Message-ID: <15134.34422.62060.936788@anthem.wooz.org>

>>>>> "TP" == Tim Peters writes:

TP> Anyone understand -U? Like, should it work, why is it there TP> if it doesn't and isn't expected to, and are there docs for it TP> beyond the "python -h" blurb?

Nope, except that /for me/ an installed Python 2.1 seems to start up just fine with -U. My uninstalled (i.e. run from the source tree) 2.2a0 fails when given -U:

@anthem[[~/projects/python:1068]]% ./python
Python 2.2a0 (#4, Jun 6 2001, 13:03:36)
[GCC egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)] on linux2
Type "copyright", "credits" or "license" for more information.
>>> @anthem[[~/projects/python:1070]]% ./python -U -v # ./Lib/site.pyc matches ./Lib/site.py import site # precompiled from ./Lib/site.pyc # ./Lib/os.pyc matches ./Lib/os.py import os # precompiled from ./Lib/os.pyc import posix # builtin # ./Lib/posixpath.pyc matches ./Lib/posixpath.py import posixpath # precompiled from ./Lib/posixpath.pyc # ./Lib/stat.pyc matches ./Lib/stat.py import stat # precompiled from ./Lib/stat.pyc # ./Lib/UserDict.pyc matches ./Lib/UserDict.py import UserDict # precompiled from ./Lib/UserDict.pyc 'import site' failed; traceback: Traceback (most recent call last): File "./Lib/site.py", line 91, in ? from distutils.util import get_platform ImportError: No module named distutils.util Python 2.2a0 (#4, Jun 6 2001, 13:03:36) [GCC egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)] on linux2 Type "copyright", "credits" or "license" for more information. >>> # clear __builtin__._ # clear sys.path # clear sys.argv # clear sys.ps1 # clear sys.ps2 # clear sys.exitfunc # clear sys.exc_type # clear sys.exc_value # clear sys.exc_traceback # clear sys.last_type # clear sys.last_value # clear sys.last_traceback # restore sys.stdin # restore sys.stdout # restore sys.stderr # cleanup __main__ # cleanup[1] signal # cleanup[1] site # cleanup[1] posix # cleanup[1] exceptions # cleanup[2] stat # cleanup[2] posixpath # cleanup[2] UserDict # cleanup[2] os # cleanup sys # cleanup __builtin__ # cleanup ints: 1 unfreed int in 1 out of 3 blocks # cleanup floats -Barry From mal at lemburg.com Wed Jun 6 22:27:19 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Wed, 06 Jun 2001 22:27:19 +0200 Subject: [Python-Dev] -U option? References: Message-ID: <3B1E9227.7F67971E@lemburg.com> Tim Peters wrote: > > http://sf.net/tracker/?func=detail&atid=105470&aid=430269&group_id=5470 > python -U breaks import with 2.1 > > Anyone understand -U? Like, should it work, why is it there if it doesn't > and isn't expected to, and are there docs for it beyond the "python -h" > blurb? The -U option is there to be able to test drive Python into the Unicode age. As you and many others have noted, there's still a long way to go... > Last mention of it I found in c.l.py was > > """ > Date: Tue, 06 Feb 2001 16:09:46 +0100 > From: "M.-A. Lemburg" > Subject: Re: [Python-Dev] Pre-PEP: Python Character Model > > ... > Well, with -U on, Python will compile "" into u"", > ... > last I tried, Python didn't even start up :-( > ... > """ > > An earlier msg (08 Sep 2000) said: > > """ > Note that many thing fail when Python is started with -U... that > switch was introduced to be able to get an idea of which parts of > the standard fail to work in a mixed string/Unicode environment. > """ > > If this is just an internal development switch, python -h probably shouldn't > advertise it. -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From martin at loewis.home.cs.tu-berlin.de Wed Jun 6 22:34:30 2001 From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Wed, 6 Jun 2001 22:34:30 +0200 Subject: [Python-Dev] -U option? Message-ID: <200106062034.f56KYUI02246@mira.informatik.hu-berlin.de> [Tim] > Anyone understand -U? Like, shoulQd it work, why is it there if it > doesn't and isn't expected to, and are there docs for it beyond the > "python -h" blurb? I'm not surprised it doesn't work, but I think it could be made working in many cases. 
I also think it would be worthwhile making that work; in the process, many places will be taught to accept Unicode strings which currently don't.

[Barry] > Nope, except that /for me/ an installed Python 2.1 seems to start up > just fine with -U. [...]

Sure, but it won't work

martin at mira:~ > python -U                             [22:29]
Python 2.2a0 (#336, May 29 2001, 09:28:57)
[GCC 2.95.2 19991024 (release)] on linux2
Type "copyright", "credits" or "license" for more information.
>>> import string
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
ImportError: No module named string
>>> import sys
>>> sys.path
['', u'/usr/src/omni/lib/python', u'/usr/src/omni/lib/i586_linux_2.0_glibc2.1', u'/usr/ilu-2.0b1/lib', u'/home/martin', u'/usr/local/lib/python2.2', u'/usr/local/lib/python2.2/plat-linux2', u'/usr/local/lib/python2.2/lib-tk', u'/usr/local/lib/python2.2/lib-dynload', u'/usr/local/lib/python2.2/site-packages', u'/usr/local/lib/site-python']

The main problem (also with the SF bug report) seems to be that Unicode objects in sys.path are not accepted, but I think they should be. Regards, Martin

From tim.one at home.com Wed Jun 6 22:52:02 2001 From: tim.one at home.com (Tim Peters) Date: Wed, 6 Jun 2001 16:52:02 -0400 Subject: [Python-Dev] -U option? In-Reply-To: <3B1E9227.7F67971E@lemburg.com> Message-ID:

[MAL] > The -U option is there to be able to test drive Python into > the Unicode age. As you and many others have noted, there's > still a long way to go...

That's cool. My question is why we're advertising (via -h) an option that end users have no chance of using successfully.

From mal at lemburg.com Wed Jun 6 23:47:25 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Wed, 06 Jun 2001 23:47:25 +0200 Subject: [Python-Dev] -U option? References: Message-ID: <3B1EA4ED.38BEB1AA@lemburg.com>

Tim Peters wrote: > > [MAL] > > The -U option is there to be able to test drive Python into > > the Unicode age. As you and many others have noted, there's > > still a long way to go... > > That's cool. My question is why we're advertising (via -h) an option that > end users have no chance of using successfully.

I guess I just added the flag to the -h message without thinking much about it... it was added in some alpha release. Anyway, these bug reports will keep hitting us which is good in the sense that it'll eventually push Python into the Unicode arena. We could use some funding for this, though. -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/

From paulp at ActiveState.com Thu Jun 7 01:00:52 2001 From: paulp at ActiveState.com (Paul Prescod) Date: Wed, 06 Jun 2001 16:00:52 -0700 Subject: [Python-Dev] urllib2 Message-ID: <3B1EB624.563DABE0@ActiveState.com>

Tim asked me to look into the test_urllib2 failure. I notice that Guido's name is in the relevant RFC so I guess he's the real expert <0.5 wink>: http://www.faqs.org/rfcs/rfc1738.html Anyhow, there are a variety of problems. :( First, test_urllib2 says:
That begs the question of what IS the right way to construct file urls in a cross-platform manner. I would have thought that urllib.pathname2url was the way but I note that it isn't documented. Plus it is poorly named. A function that does this: """Convert a DOS path name to a file url. C:\foo\bar\spam.foo becomes ///C|/foo/bar/spam.foo """ is not really constructing a URL! And the semantics of the function on multiple platforms do not seem to me to be identical. On Windows it adds a bunch of leading slashes and mac and Unix seem not to. So you can't safely paste a "file:" or "file://" on the front. I don't know how widely pathname2url has been used even though it is undocumented....should we fix it and document it or write a new function? -- Take a recipe. Leave a recipe. Python Cookbook! http://www.ActiveState.com/pythoncookbook From barry at scottb.demon.co.uk Thu Jun 7 01:31:51 2001 From: barry at scottb.demon.co.uk (Barry Scott) Date: Thu, 7 Jun 2001 00:31:51 +0100 Subject: [Python-Dev] %b format? - 8088 like a PDP-11 I think not... In-Reply-To: <20010604161114.A20979@thyrsus.com> Message-ID: <000a01c0eee0$dcfe9250$060210ac@private> Eric, As others have pointed out, your timeline is wrong... Barry p.s. I'm ex-DEC and old enough to have seen the introduction of the 6502 (got mine at university for $25 inc postage to the U.K.), Z80 and VAX (worked on product for V1.0 of VMS). Also for my sins argued with Gordon Bell and Dave Cutler about CPU architecture. > -----Original Message----- > From: Eric S. Raymond [mailto:esr at thyrsus.com] > Sent: 04 June 2001 21:11 > To: Barry Scott > Cc: python-dev (E-mail) > Subject: Re: [Python-Dev] %b format? - 8088 like a PDP-11 I think not... > > > Barry Scott : > > Eric wrote: > > > While I'm at it, I should note that the design of the 11 was ancestral > > > to both the 8088 and 68000 microprocessors, and thus to essentially > > > every new general-purpose computer designed in the last fifteen years. > > > > The key to PDP-11 and VAX was lots of registers all alike and rich > > addressing modes for the instructions. > > > > The 8088 is very far from this design; it owes its design more to the > > 4004 than the PDP-11. > > Yes, but the 4004 was designed as a sort of lobotomized imitation of the 65xx, > which was descended from the 11. Admittedly, in the chain of transmission here > were two stages of redesign so bad that the connection got really tenuous. > -- > Eric S. Raymond > > ...Virtually never are murderers the ordinary, law-abiding people > against whom gun bans are aimed. Almost without exception, murderers > are extreme aberrants with lifelong histories of crime, substance > abuse, psychopathology, mental retardation and/or irrational violence > against those around them, as well as other hazardous behavior, e.g., > automobile and gun accidents." > -- Don B. Kates, writing on statistical patterns in gun crime > > From barry at scottb.demon.co.uk Thu Jun 7 01:57:11 2001 From: barry at scottb.demon.co.uk (Barry Scott) Date: Thu, 7 Jun 2001 00:57:11 +0100 Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects? In-Reply-To: <3B1E7840.C93EA788@lemburg.com> Message-ID: <000b01c0eee4$66f8a7e0$060210ac@private> Adding the atomic types of Python as classes I'm +1 on. Performance is a problem for the parser to handle. If you have not already done so I suggest that you look at what Microsoft .NET is doing in this area.
In .NET, for example, int is a class and they have the technology to define the interface to an int and optimize the performance of the non-derived cases. Barry From barry at scottb.demon.co.uk Thu Jun 7 02:03:54 2001 From: barry at scottb.demon.co.uk (Barry Scott) Date: Thu, 7 Jun 2001 01:03:54 +0100 Subject: [Python-Dev] Shouldn't I be able to print Unicode objects? In-Reply-To: <3B1DE3C1.90BA3DD6@lemburg.com> Message-ID: <001001c0eee5$571a8090$060210ac@private> > Shouldn't we fix the tokenizer to explicitly check for 'a'...'z' > and 'A'...'Z' ?! (same for digits) ?! If you embrace the world then NO. If America is your world then maybe. Barry From paulp at ActiveState.com Thu Jun 7 02:42:03 2001 From: paulp at ActiveState.com (Paul Prescod) Date: Wed, 06 Jun 2001 17:42:03 -0700 Subject: [Python-Dev] Shouldn't I be able to print Unicode objects? References: <001001c0eee5$571a8090$060210ac@private> Message-ID: <3B1ECDDB.F1E8B19D@ActiveState.com> Barry Scott wrote: > > > Shouldn't we fix the tokenizer to explicitly check for 'a'...'z' > > and 'A'...'Z' ?! (same for digits) ?! > > If you embrace the world then NO. If America is your world then maybe. Actually, if we were really going to embrace the world we'd need to handle more than a few European languages! -- Take a recipe. Leave a recipe. Python Cookbook! http://www.ActiveState.com/pythoncookbook From MarkH at ActiveState.com Thu Jun 7 03:09:51 2001 From: MarkH at ActiveState.com (Mark Hammond) Date: Thu, 7 Jun 2001 11:09:51 +1000 Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects? In-Reply-To: <000b01c0eee4$66f8a7e0$060210ac@private> Message-ID: > If you have not already done so I suggest that you look at > what Microsoft .NET is doing in this area. In .NET, for example, > int is a class and they have the technology to define the > interface to an int and optimize the performance of the > non-derived cases. Actually, that is not completely true. There is a "value type" and a class version. The value type is just the bits. The VM has instructions that work on the value type. As far as I am aware, you cannot use a derived class with these instructions. They also have the concept of "sealed" meaning they cannot be subclassed. Last time I looked, strings were an example of sealed classes. Mark. From greg at cosc.canterbury.ac.nz Thu Jun 7 04:16:00 2001 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Thu, 07 Jun 2001 14:16:00 +1200 (NZST) Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects? In-Reply-To: <20010606113957.4A395303181@snelboot.oratrix.nl> Message-ID: <200106070216.OAA02594@s454.cosc.canterbury.ac.nz> Jack Jansen : > Should it support > items()? Is only x.keys()/x[] good enough? Check for items(), and fall back on x.keys()/x[] if necessary. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg at cosc.canterbury.ac.nz +--------------------------------------+ From greg at cosc.canterbury.ac.nz Thu Jun 7 04:19:03 2001 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Thu, 07 Jun 2001 14:19:03 +1200 (NZST) Subject: [Python-Dev] Shouldn't I be able to print Unicode objects? In-Reply-To: <3B1ECDDB.F1E8B19D@ActiveState.com> Message-ID: <200106070219.OAA02597@s454.cosc.canterbury.ac.nz> > if we were really going to embrace the world we'd need to > handle more than a few European languages!
-1 on allowing Kanji in python identifiers. :-( I like to be able to at least imagine some sort of pronunciation for variable names! Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg at cosc.canterbury.ac.nz +--------------------------------------+ From greg at cosc.canterbury.ac.nz Thu Jun 7 04:22:33 2001 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Thu, 07 Jun 2001 14:22:33 +1200 (NZST) Subject: [Python-Dev] %b format? - 8088 like a PDP-11 I think not... Message-ID: <200106070222.OAA02600@s454.cosc.canterbury.ac.nz> Jack Jansen : > with even SP and PC being general purpose registers The PC is not a general purpose register in the 68000. I've heard that this was because DEC had a patent on the idea. > the latter two actually improved on the PDP-11 The 16032 was certainly extremely orthogonal. I wrote an assembler and a compiler for it once, and it was a joy after coming from the Z80! It wasn't quite perfect, though - its lack of a "top-of-stack-indirect" addressing mode was responsible for the one wart in my otherwise-beautiful code generation strategy. Also, it must have been the most CISCy instruction set the world has ever seen, with the possible exception of the VAX... Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg at cosc.canterbury.ac.nz +--------------------------------------+ From tim.one at home.com Thu Jun 7 06:54:42 2001 From: tim.one at home.com (Tim Peters) Date: Thu, 7 Jun 2001 00:54:42 -0400 Subject: [Python-Dev] urllib2 In-Reply-To: <3B1EB624.563DABE0@ActiveState.com> Message-ID: [Paul Prescod] > Tim asked me to look into test_urllib2 failure. Wow! I'm going to remember that. Have to ask people to do things more often . > notice that Guido's name is in the relevant RFC so I guess he's the > real expert <0.5 wink>: > > http://www.faqs.org/rfcs/rfc1738.html > > Anyhow, there are a variety of problems. :( I'm going to add one more. The spec says this is a file URL: fileurl = "file://" [ host | "localhost" ] "/" fpath But on Windows, urllib2.urlopen() throws up even on URLs like: file:///c:/bootlog.txt and file://localhost/c:/bootlog.txt AFAICT, those conform to the spec (the first with an empty host, the second with the special reserved hostname), Windows has no problem with either of them (heck, in Outlook I can click on them while I'm typing this email -- works fine), but urllib2 mangles them into (repr) '\\c:\\bootlog.txt', which Windows has no idea what to do with. Hard to see why it should, either. > First, test_urllib2 says: > > file_url = "file://%s" % urllib2.__file__ > > This is not going to construct a strictly standards conforming URL on > Windows but that form is still common enough and obvious enough that > maybe we should support it. Common among what? > So that's problem #1, we aren't compatible with mildly broken Windows > file URLs. I haven't found a sense in which Windows file URLs are broken. test_urllib2 creates bad URLs on Windows, and urllib2 itself transforms legit file URLs into broken ones on Windows, but both of those appear to be our (Python's) fault. Until std stuff works, worrying about extensions to the std seems premature. > Problem #2 is that the test program generates mildly broken URLs > on Windows. Yup. 
> That begs the question of what IS the right way to construct file urls > in a cross-platform manner. The spec seems vaguely clear to me on this point (it's vaguely unclear to me whether a colon is allowed in an fpath -- the text seems to say one thing but the BNF another). > I would have thought that urllib.pathname2url was the way but I note > that it isn't documented. Plus it is poorly named. A function that > does this: > > """Convert a DOS path name to a file url. > > C:\foo\bar\spam.foo > > becomes > > ///C|/foo/bar/spam.foo > """ > > is not really constructing a URL! Or anything else recognizable . > And the semantics of the function on multiple platforms do not seem > to me to be identical. On Windows it adds a bunch of leading slashes > and mac and Unix seem not to. So you can't safely paste a "file:" or > "file://" on the front. I don't know how widely pathname2url has been > used even though it is undocumented....should we fix it and document > it or write a new function? Maybe it's just time to write urllib3.py <0.8 wink>. no-conclusions-from-me-ly y'rs - tim From tim at digicool.com Thu Jun 7 07:16:37 2001 From: tim at digicool.com (Tim Peters) Date: Thu, 7 Jun 2001 01:16:37 -0400 Subject: [Python-Dev] Shouldn't I be able to print Unicode objects? In-Reply-To: <3B1DE3C1.90BA3DD6@lemburg.com> Message-ID: [M.-A. Lemburg] > Wasn't me for sure... even in the Unicode age, I believe that > Python source code should maintain readability by not allowing > all alpha(numeric) characters for use in identifiers (there are > lots of them in Unicode). > > Shouldn't we fix the tokenizer to explicitely check for 'a'...'z' > and 'A'...'Z' ?! (same for digits) ?! That's certain to break code, and it's certain that some of those whose code gets broken would scream very loudly about it. OTOH, nobody would come to its defense with a hearty "whew! I'm so glad *that* hole finally got plugged!". I'm sure it would cause less trouble to take away <> as an alternative spelling of != (except that Barry is actually close enough to strangle Guido a few days each week ). Is it worth the hassle? I don't know, but I'd *guess* Guido would rather endure the complaints for something more substantial (like, say, breaking 10 lines of an expert's obscure code that relies on int() being a builtin instead of a class ). From fredrik at pythonware.com Thu Jun 7 07:50:35 2001 From: fredrik at pythonware.com (Fredrik Lundh) Date: Thu, 7 Jun 2001 07:50:35 +0200 Subject: [Python-Dev] Shouldn't I be able to print Unicode objects? References: Message-ID: <004c01c0ef15$c97f51d0$4ffa42d5@hagrid> Tim Peters wrote:> > Shouldn't we fix the tokenizer to explicitely check for 'a'...'z' > > and 'A'...'Z' ?! (same for digits) ?! > > That's certain to break code, and it's certain that some of those whose code > gets broken would scream very loudly about it. I don't get it. If people use non-ascii characters, they're clearly not using Python. from the language reference: ... Python uses the 7-bit ASCII character set for program text and string literals. ... Identifiers (also referred to as names) are described by the following lexical definitions: identifier: (letter|"_") (letter|digit|"_")* letter: lowercase | uppercase lowercase: "a"..."z" uppercase: "A"..."Z" digit: "0"..."9" Identifiers are unlimited in length. Case is significant ... either change the specification, and break every single tool written by anyone who actually bothered to read the specification [1], or add a warning to 2.2. 
1) I assume the specification didn't exist when GvR wrote the first CPython implementation ;-) From tim.one at home.com Thu Jun 7 08:15:35 2001 From: tim.one at home.com (Tim Peters) Date: Thu, 7 Jun 2001 02:15:35 -0400 Subject: [Python-Dev] Shouldn't I be able to print Unicode objects? In-Reply-To: <004c01c0ef15$c97f51d0$4ffa42d5@hagrid> Message-ID: [/F] > I don't get it. If people use non-ascii characters, they're clearly not > using Python. from the language reference: My *first* reply in this thread said the lang ref required this. That doesn't mean people read the ref. IIRC, you were one of the most strident complainers about list.append(1, 2, 3) "breaking", so just rekindle that mindset but intensify it fueled by nationalism <0.5 wink>. > ... > either change the specification, and break every single tool written by > anyone who actually bothered to read the specification [1], or add a > warning to 2.2. This is up to Guido; doesn't affect my code one way or the other (and, yes, e.g., IDLE's parser follows the manual here). > ... > 1) I assume the specification didn't exist when GvR wrote the first > CPython implementation ;-) Thanks to the magic of CVS, you can see that the BNF for identifiers has remained unchanged since it was first checked in (Thu Nov 21 13:53:03 1991 rev 1.1 of ref1.tex). The problem is that locale was a new-fangled idea then, and I believe Guido simply didn't anticipate isalpha() and isalnum() would vary across non-EBCDIC platforms. From mal at lemburg.com Thu Jun 7 10:29:52 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Thu, 07 Jun 2001 10:29:52 +0200 Subject: [Python-Dev] Shouldn't I be able to print Unicode objects? References: <001001c0eee5$571a8090$060210ac@private> <3B1ECDDB.F1E8B19D@ActiveState.com> Message-ID: <3B1F3B80.DB8F4117@lemburg.com> Paul Prescod wrote: > > Barry Scott wrote: > > > > > Shouldn't we fix the tokenizer to explicitly check for 'a'...'z' > > > and 'A'...'Z' ?! (same for digits) ?! > > > > If you embrace the world then NO. If America is you world then maybe. > > Actually, if we were really going to embrace the world we'd need to > handle more than a few European languages! I was just suggesting to make the parser actually do what the language spec defines. And yes: I don't like non-ASCII identifiers (even though I live in Europe). This is just bound to cause trouble, e.g. people forgetting accents on characters, editors displaying code using wild approximations of what the code author intended to write, etc. -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From mal at lemburg.com Thu Jun 7 10:42:40 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Thu, 07 Jun 2001 10:42:40 +0200 Subject: [Python-Dev] Shouldn't I be able to print Unicode objects? References: Message-ID: <3B1F3E80.F8CC16D7@lemburg.com> Tim Peters wrote: > > [M.-A. Lemburg] > > Wasn't me for sure... even in the Unicode age, I believe that > > Python source code should maintain readability by not allowing > > all alpha(numeric) characters for use in identifiers (there are > > lots of them in Unicode). > > > > Shouldn't we fix the tokenizer to explicitely check for 'a'...'z' > > and 'A'...'Z' ?! (same for digits) ?! > > That's certain to break code, and it's certain that some of those whose code > gets broken would scream very loudly about it. OTOH, nobody would come to > its defense with a hearty "whew! 
I'm so glad *that* hole finally got > plugged!". I'm sure it would cause less trouble to take away <> as an > alternative spelling of != (except that Barry is actually close enough to > strangle Guido a few days each week ). Is it worth the hassle? I > don't know, but I'd *guess* Guido would rather endure the complaints for > something more substantial (like, say, breaking 10 lines of an expert's > obscure code that relies on int() being a builtin instead of a class > ). Ok, point taken... still, it's funny sometimes how pydevs are willing to break perfectly valid code in some areas while not considering pointing users to clean up invalid code in other areas. -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From thomas at xs4all.net Thu Jun 7 14:03:20 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Thu, 7 Jun 2001 14:03:20 +0200 Subject: [Python-Dev] Shouldn't I be able to print Unicode objects? In-Reply-To: <3B1F3E80.F8CC16D7@lemburg.com>; from mal@lemburg.com on Thu, Jun 07, 2001 at 10:42:40AM +0200 References: <3B1F3E80.F8CC16D7@lemburg.com> Message-ID: <20010607140320.Z690@xs4all.nl> On Thu, Jun 07, 2001 at 10:42:40AM +0200, M.-A. Lemburg wrote: > still, it's funny sometimes how pydevs are willing to break perfectly > valid code in some areas while not considering pointing users to clean up > invalid code in other areas. Well, I consider myself one of the more backward-oriented people on py-dev (or at least a vocal member of that sub-group ;) and I don't think changing int et al to be types/class-constructors is a problem. People who rely on int being a *function*, rather than being a callable, are either writing a python-specific script, a quick hack, or really, really know what they are getting into. I'm also not terribly worried about the use of non-ASCII characters in identifiers in Python, though a warning for the next one or two releases would be a good thing -- if anything, it should warn that that trick won't work for people with different locale settings! -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From mwh at python.net Thu Jun 7 14:54:55 2001 From: mwh at python.net (Michael Hudson) Date: Thu, 7 Jun 2001 13:54:55 +0100 (BST) Subject: [Python-Dev] python-dev summary 2001-05-24 - 2001-06-07 Message-ID: This is a summary of traffic on the python-dev mailing list between May 24 and Jun 7 (inclusive) 2001. It is intended to inform the wider Python community of ongoing developments. To comment, just post to python-list at python.org or comp.lang.python in the usual way. Give your posting a meaningful subject line, and if it's about a PEP, include the PEP number (e.g. Subject: PEP 201 - Lockstep iteration) All python-dev members are interested in seeing ideas discussed by the community, so don't hesitate to take a stance on a PEP if you have an opinion. This is the ninth summary written by Michael Hudson. 
Summaries are archived at: Posting distribution (with apologies to mbm) Number of articles in summary: 305 [bar chart of daily posting volume; the daily article counts, Thu 24 through Wed 06, were: 18, 14, 11, 14, 20, 19, 34, 35, 32, 14, 8, 20, 51, 15] Another busy-ish fortnight. I've been in Exam Hell(tm) and am writing this when hungover, so this summary might be a bit sketchier than normal. Apologies in advance. * strop vs. string * Greg Stein leapt up to defend the slated-to-be-deprecated strop module by pointing out that its functions work on any object that supports the buffer API, whereas the 1.6-era string.py only works with objects that sprout the right methods: The discussion quickly degenerated into the usual griping about the fact that the buffer API is flawed and undocumented and not really well understood by many people. * Special-casing "O" * As a followup to the discussion mentioned in the last summary, Martin von Loewis posted a patch to sf enabling functions written in C that expect zero or one object arguments to dispense with the time-wasting call to PyArg_ParseTuple: The first version of the patch was criticized for being overly general, and for not being general enough . It seems the forces of simplicity have won, but I don't think the patch has been checked in yet. * the late, unlamented, yearly list.append panic * Tim Peters posted that c.l.py has rediscovered the quadratic-time worst-case behavior of list.append(). And then ameliorated the worst-case behaviour. So that one was easy. * making dicts ... * You might think that, as dictionaries are so central to Python, their implementation would be bulletproof and one of the areas of the source least likely to change. This might be true *now*; Tim Peters seems to have spent most of the last fortnight implementing performance improvements one after the other and fixing core-dumping holes in the implementation pointed out by Michael Hudson. The first improvement was "using polynomial division instead of multiplication for generating the probe sequence, as a way to get all the bits of the hash code into play." If you don't understand what that means, ignore it, because Tim came up with a more radical rewrite: which seems to be a win, but sadly removes the shock of finding comments about Galois theory in dictobject.c... Most of the discussion in the thread following Tim's patch was about whether we need 128-bit floats or ints, which is another way of saying everyone liked it :-) This one hasn't been checked in either.
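[The rewrite is easier to picture in Python than in C. A sketch of the probe-sequence idea in Tim's patch -- the function and its n-probe interface are invented here for illustration, and h is assumed non-negative; the real thing is the lookup loop in dictobject.c:]

    def probe_slots(h, mask, n, PERTURB_SHIFT=5):
        """Return the first n table slots visited for hash h.

        Table sizes are powers of 2, so "i & mask" means "i % table_size".
        """
        slots = []
        i = h & mask
        perturb = h
        while len(slots) < n:
            slots.append(i & mask)
            # i = 5*i + 1 alone eventually visits every slot of the table;
            # folding in the shifted-down hash bits ("perturb") lets the
            # high bits of the hash break up collision clusters.
            i = 5*i + perturb + 1
            perturb >>= PERTURB_SHIFT
        return slots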
* ... and breaking dicts * Inspired by a post to comp.lang.python by Wolfgang Lipp and driven slightly insane by revision, Michael Hudson posted a short program that used a hole in the dict implementation to trigger a core dump: This got fixed, so he did it again: The cause of both problems was C code assuming things about dictionaries remained the same across calls to code that ended up executing arbitrary Python code, which could mutate the dict exactly as much as it pleased, which in turn caused pointers to dangle. This problem has a history in Python; the .sort() method on lists has to fight the same issues. These holes have been plugged, although it is still possible to crash Python with exceptionally contrived code: There's another approach, which is what the .sort() method uses:

    >>> list = range(10)
    >>> def c(x,y):
    ...     del list[:]
    ...     return cmp(x, y)
    ...
    >>> list.sort(c)
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
      File "<stdin>", line 2, in c
    TypeError: a list cannot be modified while it is being sorted

The .sort() method magically changes the type of the list being sorted to one that doesn't support mutation while it's sorting the list. This approach would have some merit for dictionaries too; for one thing, we could lose all the contrived code in dictobject.c protecting against this sort of silliness... * arbitrary radix formatting * Greg Wilson made a plea for the addition of a "%b" formatting operator to display integers in binary, e.g.:

    >>> print "%d %x %o %b"%(10,10,10,10)
    10 a 12 1010

There was general support for the idea, but Tim Peters and Greg Ewing pointed out that it would be neater to invent a general format code that would enable one to format an integer into an arbitrary base, so that

    >>> int("1111", 7)
    400

has an inverse at long last. But no-one could think of a spelling that wasn't in general use, and the discussion died :-(. * quick poll * Guido asked if anyone would object violently to the builtin conversion functions becoming type objects on the descr-branch: in analogy to class objects. There was general support and only a few concerns, and the changes have begun to hit descr-branch. I'm sure I'm not the only one who wishes they had the time to understand what is going on in there... Cheers, M.
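[The quick-poll change is easy to show; a sketch of the intended behaviour as it later appeared in 2.2, abbreviated:]

    >>> int                    # a type object, no longer a plain builtin function
    <type 'int'>
    >>> int("ff", 16)          # still works as the conversion function
    255
    >>> isinstance(255, int)   # and now doubles as a type test
    1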
From gmcm at hypernet.com Thu Jun 7 15:06:55 2001 From: gmcm at hypernet.com (Gordon McMillan) Date: Thu, 7 Jun 2001 09:06:55 -0400 Subject: [Python-Dev] urllib2 In-Reply-To: References: <3B1EB624.563DABE0@ActiveState.com> Message-ID: <3B1F442F.26920.1ECC32A9@localhost> [Tim & Paul on file URLs] [Tim] > But on Windows, urllib2.urlopen() throws up even on URLs like: > > file:///c:/bootlog.txt Curiously enough, url = "file:///" + urllib.quote_plus(fnm) seems to work on Windows. It even seems to work on Mac, if you first turn '/' into '%2f', then undo the double quoting (turn '%252f' back into '%2f' in the ensuing url). It even seems to work on Mac directory names with Unicode characters in them (though I haven't looked too closely, in fear of jinxing it). eye-of-newt-considered-helpful-ly y'rs - Gordon From pedroni at inf.ethz.ch Thu Jun 7 15:56:30 2001 From: pedroni at inf.ethz.ch (Samuele Pedroni) Date: Thu, 7 Jun 2001 15:56:30 +0200 (MET DST) Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects? Message-ID: <200106071356.PAA04511@core.inf.ethz.ch> Hi. [GvR] > > Is the intent of using int and friends as constructors instead of just > > coercion functions that I should (eventually) be able to do this: > > > > class NonNegativeInt(int): > >     def __init__(self, val): > >         if int(val) < 0: > >             raise ValueError, "Value must be >= 0" > >         int.__init__(self, val) > >         self.a = 47 > >         ... > > > > ? > > Yes, sort-of. The details will be slightly different. I'm not > comfortable with letting a user-provided __init__() method change the > value of self, so I am brooding on a work-around that separates > allocation and one-time initialization from __init__(). Watch PEP > 253. jython already vaguely supports this:

    from types import IntType as Int

    class NonNegInt(Int):
        def __init__(self, val, annot=None):
            if int(val) < 0: raise ValueError, "val<0"
            Int.__init__(self, val)
            self._annot = annot
        def neg(self):
            return -self
        def __add__(self, b):
            if type(b) is NonNegInt:
                return NonNegInt(Int.__add__(self, b))
            return Int.__add__(self, b)
        def annot(self):
            return self._annot

    Jython 2.0 on java1.3.0 (JIT: null)
    Type "copyright", "credits" or "license" for more information.
    >>> from NonNegInt import NonNegInt
    >>> x=NonNegInt(-2)
    Traceback (innermost last):
      File "<console>", line 1, in ?
      File "/home/pedroni/BOX/exp/NonNegInt.py", line 5, in __init__
    ValueError: val<0
    >>> x=NonNegInt(2)
    >>> y=NonNegInt(3,"foo")
    >>> y._annot
    Traceback (innermost last):
      File "<console>", line 1, in ?
    AttributeError: 'int' object has no attribute '_annot'
    >>> y.annot()
    Traceback (innermost last):
      File "<console>", line 1, in ?
      File "/home/pedroni/BOX/exp/NonNegInt.py", line 15, in annot
    AttributeError: 'int' object has no attribute '_annot'
    >>> x+y, type(x+y)
    (5, )
    >>> x.neg()
    -2
    >>> x+(-2),type(x+(-2))
    (0, )
    >>>

As one can see, the semantics are not without holes. The support for this is mainly a side-effect of the fact that internally jython objects are instances of java classes and jython allows subclassing java classes. I have no idea whether someone is already using this kind of stuff; I just remember that someone reported a bug concerning subclassing ListType, so ... By the way, int and long being types seems nice and elegant to me. A more general note FYI: I have read the PEP drafts about descrs and type as classes; I have not played with the descr-branch yet. I think that the descr and metaclasses stuff can help on the jython side to put a lot of things (dealing with java classes, subclassing from them, etc) in a more precise framework, polishing up many design aspects and the code. First, I suppose that backward compatibility on the jython side is not a real problem; these aspects are so under-documented that there are no promises about them. On the other hand, until we start coding things on the jython side (it's complex stuff and jython internals are already complex) it will be really difficult to make constructive comments on possible problems for jython, or toward a design that better fits both jython and CPython needs. Given that we are still working on jython 2.1, maybe we will be able to start working on jython 2.2 only late in the 2.2 release cycle, when things are more or less fixed and we can only do our best to re-implement them. regards Samuele Pedroni.
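[The work-around Guido alludes to above eventually surfaced as the separate __new__() constructor of PEP 253. Under that scheme, Samuele's example comes out roughly like this -- a sketch of the later spelling, not code that runs on 2.1 or on Jython 2.0:]

    class NonNegativeInt(int):
        def __new__(cls, val, annot=None):
            # Validate before the immutable int value is created, so a
            # user-provided method never changes the value of self.
            if int(val) < 0:
                raise ValueError, "Value must be >= 0"
            self = int.__new__(cls, val)
            self._annot = annot
            return self
        def annot(self):
            return self._annot

    # NonNegativeInt(5) + 3 == 8, while NonNegativeInt(-1) raises
    # ValueError before the instance ever exists.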
From Greg.Wilson at baltimore.com Thu Jun 7 18:03:44 2001 From: Greg.Wilson at baltimore.com (Greg Wilson) Date: Thu, 7 Jun 2001 12:03:44 -0400 Subject: [Python-Dev] re: %b format (no, really) Message-ID: <930BBCA4CEBBD411BE6500508BB3328F2E20EF@nsamcanms1.ca.baltimore.com> Prompted in part by the comment in Michael Hudson's python-dev summary about this discussion having died, I'd like to summarize:

1. Most people who commented felt that a base-2 format would be useful, if only for teaching and debugging. With regard to questions about byte order:

   A. Integer values are printed as base-2 numbers, so byte order is irrelevant.

   B. Floating-point numbers are printed as: [sign] [mantissa] [exponent] The mantissa and exponent are shown according to rule A.

2. Inventing a format for converting to arbitrary bases is dubious hypergeneralization (to borrow a phrase).

3. Implementation should mirror octal and hexadecimal support, e.g. a 'bin()' function to go with 'oct()' and 'hex()'.

4. The desirability or otherwise of a "%b" format specifier has nothing to do with the relative merits of any early microprocessor :-).

If no-one has strong objections, I'll put together a PEP on this basis. Thanks Greg From greg at cosc.canterbury.ac.nz Fri Jun 8 02:55:05 2001 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Fri, 08 Jun 2001 12:55:05 +1200 (NZST) Subject: [Python-Dev] re: %b format (no, really) In-Reply-To: <930BBCA4CEBBD411BE6500508BB3328F2E20EF@nsamcanms1.ca.baltimore.com> Message-ID: <200106080055.MAA02711@s454.cosc.canterbury.ac.nz> Greg Wilson : [good stuff about binary format support] > If no-one has strong objections, I'll put together a > PEP on this basis. Sounds okay to me. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg at cosc.canterbury.ac.nz +--------------------------------------+ From tim.one at home.com Fri Jun 8 03:39:53 2001 From: tim.one at home.com (Tim Peters) Date: Thu, 7 Jun 2001 21:39:53 -0400 Subject: [Python-Dev] Shouldn't I be able to print Unicode objects? In-Reply-To: <20010607140320.Z690@xs4all.nl> Message-ID: [Thomas Wouters] > ... > I'm also not terribly worried about the use of non-ASCII characters in > identifiers in Python, though a warning for the next one or two releases > would be a good thing -- if anything, it should warn that that trick > won't work for people with different locale settings! Fine by me!
Someone who cares enough to write the warning code and docs should just do so, although it may be wise to secure Guido's blessing first. From skip at pobox.com Fri Jun 8 16:51:27 2001 From: skip at pobox.com (Skip Montanaro) Date: Fri, 8 Jun 2001 09:51:27 -0500 Subject: [Python-Dev] sys.modules["__main__"] in Jython Message-ID: <15136.58991.72069.433197@beluga.mojam.com> Would someone with Jython experience check to see if it interprets sys.modules["__main__"] in the same manner as Python? I'm interested to see if doctest's normal usage can be simplified slightly. The doctest documentation states: In normal use, end each module M with: def _test(): import doctest, M # replace M with your module's name return doctest.testmod(M) # ditto if __name__ == "__main__": _test() I'm wondering if this works for Jython as well as Python: def _test(): import doctest, sys return doctest.testmod(sys.modules["__main__"]) if __name__ == "__main__": _test() If so, then I think doctest.testmod's signature can be changed to def testmod(m=None, name=None, globs=None, verbose=None, isprivate=None, report=1): with the following extra code added to the start of the function: if m is None: import sys m = sys.modules["__main__"] That way the most common doctest usage can be changed to def _test(): import doctest return doctest.testmod() if __name__ == "__main__": _test() (I ran into a problem with a module that had initialization code that barfed if executed more than once.) Of course, these changes are ultimately Tim's decision. I'm just trying to knock down various potential hurdles. Thx, Skip From guido at digicool.com Fri Jun 8 18:06:19 2001 From: guido at digicool.com (Guido van Rossum) Date: Fri, 08 Jun 2001 12:06:19 -0400 Subject: [Python-Dev] re: %b format (no, really) In-Reply-To: Your message of "Fri, 08 Jun 2001 12:01:37 EDT." References: Message-ID: <200106081606.f58G6Jj11829@odiug.digicool.com> > Prompted in part by the comment in Michael Hudson's > python-dev summary about this discussion having died, > I'd like to summarize: > > 1. Most people who commented felt that a base-2 format > would be useful, if only for teaching and debugging. > With regard to questions about byte order: > > A. Integer values are printed as base-2 numbers, so > byte order is irrelevant. > > B. Floating-point numbers are printed as: > > [sign] [mantissa] [exponent] > > The mantissa and exponent are shown according > to rule A. Why bother with floats at all? We can't print floats as hex either. If I were doing any kind of float-representation fiddling, I'd probably want to print it in hex anyway (I can read hex). But as I say, that's not for the general public. > 2. Inventing a format for converting to arbitrary > bases is dubious hypergeneralization (to borrow a > phrase). Agreed. > 3. Implementation should mirror octal and hexadecimal > support, e.g. a 'bin()' function to go with 'oct()' > and 'hex()'. > > 4. The desirability or otherwise of a "%b" format > specifier has nothing to do with the relative > merits of any early microprocessor :-). > > If no-one has strong objections, I'll put together a > PEP on this basis. Go for it. Or just submit a patch to SF -- this seems almost too small for a PEP to me. :-) --Guido van Rossum (home page: http://www.python.org/~guido/) From barry at digicool.com Fri Jun 8 18:10:50 2001 From: barry at digicool.com (Barry A. 
Warsaw) Date: Fri, 8 Jun 2001 12:10:50 -0400 Subject: [Python-Dev] re: %b format (no, really) References: <200106081606.f58G6Jj11829@odiug.digicool.com> Message-ID: <15136.63754.927103.77358@anthem.wooz.org> >>>>> "GvR" == Guido van Rossum writes: GvR> Go for it. Or just submit a patch to SF -- this seems almost GvR> too small for a PEP to me. :-) Since we all seem to agree, I'd agree. :) From Greg.Wilson at baltimore.com Fri Jun 8 18:14:14 2001 From: Greg.Wilson at baltimore.com (Greg Wilson) Date: Fri, 8 Jun 2001 12:14:14 -0400 Subject: [Python-Dev] re: %b format (no, really) Message-ID: <930BBCA4CEBBD411BE6500508BB3328F2E2179@nsamcanms1.ca.baltimore.com> > > Greg: > > B. Floating-point numbers are printed as: > > [sign] [mantissa] [exponent] > Guido: > Why bother with floats at all? For teaching purposes, which is what started me on this in the first place --- I would like an easy way to show people the bit patterns corresponding to basic types. > Guido: > Go for it. Or just submit a patch to SF -- this seems almost too > small for a PEP to me. :-) Thanks, Greg From esr at snark.thyrsus.com Fri Jun 8 18:23:34 2001 From: esr at snark.thyrsus.com (Eric S. Raymond) Date: Fri, 8 Jun 2001 12:23:34 -0400 Subject: [Python-Dev] Glowing endorsement of open source and Python Message-ID: <200106081623.f58GNYf22712@snark.thyrsus.com> It doesn't get much better than this: http://it.mycareer.com.au/news/2001/06/05/FFX9ZT7UENC.html -- Eric S. Raymond In the absence of any evidence tending to show that possession or use of a 'shotgun having a barrel of less than eighteen inches in length' at this time has some reasonable relationship to the preservation or efficiency of a well regulated militia, we cannot say that the Second Amendment guarantees the right to keep and bear such an instrument. [...] The Militia comprised all males physically capable of acting in concert for the common defense. -- Majority Supreme Court opinion in "U.S. vs. Miller" (1939) From mal at lemburg.com Fri Jun 8 19:08:53 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Fri, 08 Jun 2001 19:08:53 +0200 Subject: [Python-Dev] Glowing endorsement of open source and Python References: <200106081623.f58GNYf22712@snark.thyrsus.com> Message-ID: <3B2106A5.FD16D95C@lemburg.com> "Eric S. Raymond" wrote: > > It doesn't get much better than this: > > http://it.mycareer.com.au/news/2001/06/05/FFX9ZT7UENC.html I wonder what those MS Office XP ads are doing on that page...
-- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From tim.one at home.com Fri Jun 8 19:21:10 2001 From: tim.one at home.com (Tim Peters) Date: Fri, 8 Jun 2001 13:21:10 -0400 Subject: [Python-Dev] re: %b format (no, really) In-Reply-To: <930BBCA4CEBBD411BE6500508BB3328F2E2179@nsamcanms1.ca.baltimore.com> Message-ID: [Guido] > Why bother with floats at all? [Greg Wilson] > For teaching purposes, which is what started me on this > in the first place --- I would like an easy way to show > people the bit patterns corresponding to basic types. I'm confused by this: while for integers the bits correspond very clearly to what's stored in the machine, if you separate the mantissa and exponent for floats the result won't "look like" the storage at all. Please give an example first, like what do you intend to produce for

    print "%b" % 0.1
    print "%b" % -42e300

? You have to make decisions about whether or not to unbias the exponent for display (if you don't, it's incomprehensible; if you do, it's not really what's stored); whether or not to materialize the implicit most-significant mantissa bit in 754 normalized values (pretty much ditto); and what to do about Infs, NaNs, signed zeroes and denormal numbers. The kicker is that, to be truly useful for teaching floats, you need a way to select among all combinations of "yes" and "no" for each such decision. A single fixed set of answers will confound more than clarify; e.g., it's important to know what the "true exponent" is, but also to know what biased exponents look like inside the box. This is too much for %b -- write a float-format module instead. From Greg.Wilson at baltimore.com Fri Jun 8 19:34:13 2001 From: Greg.Wilson at baltimore.com (Greg Wilson) Date: Fri, 8 Jun 2001 13:34:13 -0400 Subject: [Python-Dev] RE: %b format (no, really) Message-ID: <930BBCA4CEBBD411BE6500508BB3328F2E2188@nsamcanms1.ca.baltimore.com> > [Guido] > > Why bother with floats at all? > > [Greg Wilson] > > For teaching purposes > [Tim Peters] > if you separate the mantissa and exponent > for floats the result won't "look like" the storage at all. > Please give an example first This is part of what was going to go into the PEP, along with what to do about character data (I've had a couple of emails from people who'd like to be able to look at 8-bit and Unicode characters as bit patterns). > This is too much for %b -- write a float-format module instead. How about a quick patch to do "%b" for int and long-int, and a PEP for a generic "format" module --- arbitrary radix, options for IEEE numbers, etc.? Any objections? Greg
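[The integer half really is small. A pure-Python sketch of the kind of helper a "format" module could offer -- the name to_base is invented here:]

    def to_base(n, base=2, digits="0123456789abcdefghijklmnopqrstuvwxyz"):
        """Format an integer in any base from 2 to 36."""
        if n < 0:
            return "-" + to_base(-n, base, digits)
        out = []
        while 1:
            n, r = divmod(n, base)
            out.append(digits[r])
            if n == 0:
                break
        out.reverse()
        return "".join(out)

    # to_base(10) == "1010" (the proposed "%b" % 10), and
    # int(to_base(n, b), b) == n, giving int("1111", 7) its inverse.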
From esr at thyrsus.com Fri Jun 8 19:44:40 2001 From: esr at thyrsus.com (Eric S. Raymond) Date: Fri, 8 Jun 2001 13:44:40 -0400 Subject: [Python-Dev] RE: %b format (no, really) In-Reply-To: <930BBCA4CEBBD411BE6500508BB3328F2E2188@nsamcanms1.ca.baltimore.com>; from Greg.Wilson@baltimore.com on Fri, Jun 08, 2001 at 01:34:13PM -0400 References: <930BBCA4CEBBD411BE6500508BB3328F2E2188@nsamcanms1.ca.baltimore.com> Message-ID: <20010608134440.A23160@thyrsus.com> Greg Wilson : > How about a quick patch to do "%b" for int and long-int, and a > PEP for a generic "format" module --- arbitrary radix, options > for IEEE numbers, etc.? Any objections? I like it. -- Eric S. Raymond The people cannot delegate to government the power to do anything which would be unlawful for them to do themselves. -- John Locke, "A Treatise Concerning Civil Government" From tim.one at home.com Fri Jun 8 19:51:50 2001 From: tim.one at home.com (Tim Peters) Date: Fri, 8 Jun 2001 13:51:50 -0400 Subject: [Python-Dev] RE: %b format (no, really) In-Reply-To: <930BBCA4CEBBD411BE6500508BB3328F2E2188@nsamcanms1.ca.baltimore.com> Message-ID: [Greg Wilson] > How about a quick patch to do "%b" for int and long-int, Don't know how quick it will be (it should cover type slots and bin() and __bin__ and 0b1101 notation too, right?), but +1 from me. That much is routinely requested. > and a PEP for a generic "format" module --- arbitrary radix, options > for IEEE numbers, etc.? Any objections? None here. From bckfnn at worldonline.dk Fri Jun 8 21:15:14 2001 From: bckfnn at worldonline.dk (Finn Bock) Date: Fri, 08 Jun 2001 19:15:14 GMT Subject: [Python-Dev] sys.modules["__main__"] in Jython In-Reply-To: <15136.58991.72069.433197@beluga.mojam.com> References: <15136.58991.72069.433197@beluga.mojam.com> Message-ID: <3b212431.21754982@smtp.worldonline.dk> [Skip] >Would someone with Jython experience check to see if it interprets >sys.modules["__main__"] in the same manner as Python? To me it seems like Jython defines sys.modules["__main__"] in the same way as CPython. >I'm wondering if this works for Jython as well as Python:
>
>    def _test():
>        import doctest, sys
>        return doctest.testmod(sys.modules["__main__"])
>
>    if __name__ == "__main__":
>        _test()
It works for Jython. regards, finn From thomas at xs4all.net Fri Jun 8 23:41:02 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Fri, 8 Jun 2001 23:41:02 +0200 Subject: [Python-Dev] Glowing endorsement of open source and Python In-Reply-To: <200106081623.f58GNYf22712@snark.thyrsus.com>; from esr@snark.thyrsus.com on Fri, Jun 08, 2001 at 12:23:34PM -0400 References: <200106081623.f58GNYf22712@snark.thyrsus.com> Message-ID: <20010608234102.B690@xs4all.nl> On Fri, Jun 08, 2001 at 12:23:34PM -0400, Eric S. Raymond wrote: > It doesn't get much better than this: > http://it.mycareer.com.au/news/2001/06/05/FFX9ZT7UENC.html It's a nice (and very flattering!) piece, but it's a tad buzzword heavy. "[Python] supports XML for e-commerce and mobile applications" ? Well, shit, so *that*'s what XML is for :) -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread!
From tim.one at home.com Sat Jun 9 00:02:06 2001 From: tim.one at home.com (Tim Peters) Date: Fri, 8 Jun 2001 18:02:06 -0400 Subject: [Python-Dev] sys.modules["__main__"] in Jython In-Reply-To: <3b212431.21754982@smtp.worldonline.dk> Message-ID: [Finn Bock] > To me it seems like Jython defines sys.modules["__main__"] in the same > way as CPython. Thank you, Finn! doctest has always avoided introspection tricks for which Jython doesn't work "exactly the same way" as CPython. However, in the past it achieved this by not paying any attention , then ripping out bad ideas when a Jython user reported failure. But now that it's in the std library, I want to proceed more carefully. Skip's idea is much more attractive now that you've confirmed it will work there too. From tim.one at home.com Sun Jun 10 03:10:53 2001 From: tim.one at home.com (Tim Peters) Date: Sat, 9 Jun 2001 21:10:53 -0400 Subject: [Python-Dev] Struct schizophrenia Message-ID: I'm adding "long long" integral types to struct (in native mode, "long long" or __int64 on platforms that have them; in standard mode, 64 bits). This is proving harder than it should be, because the code that's already there is schizophrenic across boundaries, so is failing as a base to build on (raises more questions than it answers). Like: >>> x = 256 >>> struct.pack("b", x) # complains about magnitude in native mode Traceback (most recent call last): File "", line 1, in ? struct.error: byte format requires -128<=number<=127 >>> struct.pack("=b", x) # but doesn't with native order + std align '\x00' >>> struct.pack(">> struct.pack(">> struct.pack("", line 1, in ? OverflowError: long int too large to convert >>> Much the same is true of other small int sizes: you can't predict what will happen without trying it; and once you get to ints, no range-checking is performed even in native mode. Surely this can't stand, but what do people *want*? My preference is to raise the same "byte format requires -128<=number<=127" exception in all these cases; OTOH, the code structure fights that, working with Python longs is clumsy in C, and there are other "undocumented features" here that may or may not be accidents: >>> struct.pack("B", 234.3) '\xea' >>> That is, did we *intend* to accept floats packed via integer typecodes? Feature or bug? In the other (unpack) direction, the docs say for 'I' (unsigned int): The "I" conversion code will convert to a Python long if the C int is the same size as a C long, which is typical on most modern systems. If a C int is smaller than a C long, an Python integer will be created instead. That's in a footnote. In another part, they say: For the "I" and "L" format characters, the return value is a Python long integer. The footnote is wrong -- but is the footnote what was intended (somebody went to a fair bit of work to write all the stuff )? From tim.one at home.com Sun Jun 10 06:25:51 2001 From: tim.one at home.com (Tim Peters) Date: Sun, 10 Jun 2001 00:25:51 -0400 Subject: [Python-Dev] FW: printing Python stack info from gdb Message-ID: Fwd'ing this Skip gem from c.l.py, primarily so I can find it again next time I'm thrashing on a Unix box. gdb clearly needs to adopt Python as its extension language. 
but-then-what-doesn't-ly y'rs - tim -----Original Message----- From: python-list-admin at python.org [mailto:python-list-admin at python.org]On Behalf Of Skip Montanaro Sent: Saturday, June 09, 2001 12:31 AM To: python-list at python.org Subject: printing Python stack info from gdb From tim.one at home.com Sun Jun 10 21:36:50 2001 From: tim.one at home.com (Tim Peters) Date: Sun, 10 Jun 2001 15:36:50 -0400 Subject: [Python-Dev] FW: list-display semantics? Message-ID: I opened a bug on this: If anyone's keen to play with the grammar, have at it! Everyone at PythonLabs would +1 it. -----Original Message----- From: python-list-admin at python.org [mailto:python-list-admin at python.org]On Behalf Of jainweiwu Sent: Sunday, June 10, 2001 2:30 PM To: python-list at python.org Subject: list-display semantics? Hi all: I tried the one-line command in a interaction mode: [x for x in [1, 2, 3], y for y in [4, 5, 6]] and the result surprised me, that is: [[1,2,3],[1,2,3],[1,2,3],9,9,9] Who can explain the behavior? Since I expected the result should be: [[1,4],[1,5],[1,6],[2,4],...] -- Pary All Rough Yet. parywu at seed.net.tw -- http://mail.python.org/mailman/listinfo/python-list From dan at cgsoftware.com Sun Jun 10 22:30:24 2001 From: dan at cgsoftware.com (Daniel Berlin) Date: 10 Jun 2001 16:30:24 -0400 Subject: [Python-Dev] FW: printing Python stack info from gdb In-Reply-To: ("Tim Peters"'s message of "Sun, 10 Jun 2001 00:25:51 -0400") References: Message-ID: <87n17grsbj.fsf@cgsoftware.com> "Tim Peters" writes: > Fwd'ing this Skip gem from c.l.py, primarily so I can find it again next > time I'm thrashing on a Unix box. gdb clearly needs to adopt Python as its > extension language. HP has patches to do this, actually. Works quite nicely. And trust me, i've tried to get them to do it more than once. As I pointed out to skip, if he can profile gdb and tell me where the slowness is, it's likely I can make it a ton faster. GDB could use major optimizations almost everywhere. And i've done quite a lot of them, they just haven't been reviewed/integrated yet. --Dan C++ support maintainer - GDB DWARF2 reader person - GDB Symbol table patch submitting weirdo - GDB etc > > but-then-what-doesn't-ly y'rs - tim > > -----Original Message----- > From: python-list-admin at python.org > [mailto:python-list-admin at python.org]On Behalf Of Skip Montanaro > Sent: Saturday, June 09, 2001 12:31 AM > To: python-list at python.org > Subject: printing Python stack info from gdb > > >>From time to time I've wanted to be able to print the Python stack from gdb. > Today I broke down and spent some time actually implementing something. > > set $__trimpath = 1 > define ppystack > set $__fr = 0 > select-frame $__fr > while !($pc > Py_Main && $pc < Py_GetArgcArgv) > if $pc > eval_code2 && $pc < set_exc_info > set $__fn = PyString_AsString(co->co_filename) > set $__n = PyString_AsString(co->co_name) > if $__n[0] == '?' 
> set $__n = "" > end > if $__trimpath > set $__f = strrchr($__fn, '/') > if $__f > set $__fn = $__f + 1 > end > end > printf "%s (%d): %s\n", $__fn, f->f_lineno, $__n > end > set $__fr = $__fr + 1 > select-frame $__fr > end > select-frame 0 > end > > Output looks like this (and dribbles out *quite slowly*): > > Text_Editor.py (147): apply_tag > Text_Editor.py (152): apply_tag_by_name > Script_GUI.py (302): push_help > Script_GUI.py (113): put_help > Script_GUI.py (119): focus_enter > Signal.py (34): handle_signal > Script_GUI.py (324): main > Script_GUI.py (338): > > If you don't want to trim the paths from the filenames, set $__trimpath to > 0. > > Warning: I've only tried this with a very recent CVS version of Python on a > PIII-based Linux system with an interpreter compiled using gcc. I rely on > the ordering of functions within the while loop to detect when to exit the > loop and when the frame I'm examining is an eval_code2 frame. I'm sure > there are plenty of people out there with more gdb experience than me. I > welcome any feedback on ways to improve this little bit of code. > > -- > Skip Montanaro (skip at pobox.com) > (847)971-7098 > > -- > http://mail.python.org/mailman/listinfo/python-list > > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev -- "I saw a man with a wooden leg, and a real foot. "-Steven Wright From greg at cosc.canterbury.ac.nz Mon Jun 11 04:44:54 2001 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Mon, 11 Jun 2001 14:44:54 +1200 (NZST) Subject: [Python-Dev] FW: list-display semantics? In-Reply-To: Message-ID: <200106110244.OAA03090@s454.cosc.canterbury.ac.nz> parywu at seed.net.tw: > [x for x in [1, 2, 3], y for y in [4, 5, 6]] > and the result surprised me, that is: > [[1,2,3],[1,2,3],[1,2,3],9,9,9] Did you by any chance execute that in an environment where y was previously bound to 9? It will be parsed as [x for x in ([1, 2, 3], y) for y in [4, 5, 6]] which should give a NameError if y is previously unbound, since it will try to evaluate ([1, 2, 3], y) before y is bound by the inner loop. But executing y = 9 beforehand will give the results you got. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg at cosc.canterbury.ac.nz +--------------------------------------+ From gstein at lyra.org Mon Jun 11 13:31:59 2001 From: gstein at lyra.org (Greg Stein) Date: Mon, 11 Jun 2001 04:31:59 -0700 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Objects object.c,2.124.4.11,2.124.4.12 In-Reply-To: ; from gvanrossum@users.sourceforge.net on Wed, Jun 06, 2001 at 07:34:15AM -0700 References: Message-ID: <20010611043158.E26210@lyra.org> On Wed, Jun 06, 2001 at 07:34:15AM -0700, Guido van Rossum wrote: > Update of /cvsroot/python/python/dist/src/Objects > In directory usw-pr-cvs1:/tmp/cvs-serv17474 > > Modified Files: > Tag: descr-branch > object.c > Log Message: > Add _PyObject_GetDictPtr() -- an internal API to get a pointer to > where __dict__ is stored in an object. The simplest case is to add > tp_dictoffset to the start of the object, but there are comlications: > tp_flags may tell us that tp_dictoffset is not defined, or the offset > may be negative: indexing from the end of the object, where > tp_itemsize may have to be taken into account. 
Why would you ever have a negative size in there? That seems like an unnecessary "feature". The offsets are easily set up by the compiler as positive values. (not even sure how you'd come up with a proper/valid negative value) Cheers, -g > > > Index: object.c > =================================================================== > RCS file: /cvsroot/python/python/dist/src/Objects/object.c,v > retrieving revision 2.124.4.11 > retrieving revision 2.124.4.12 > diff -C2 -r2.124.4.11 -r2.124.4.12 > *** object.c 2001/06/06 14:27:54 2.124.4.11 > --- object.c 2001/06/06 14:34:13 2.124.4.12 > *************** > *** 1074,1077 **** > --- 1074,1111 ---- > } > > + /* Helper to get a pointer to an object's __dict__ slot, if any */ > + > + PyObject ** > + _PyObject_GetDictPtr(PyObject *obj) > + { > + #define PTRSIZE (sizeof(PyObject *)) > + > + long dictoffset; > + PyTypeObject *tp = obj->ob_type; > + > + if (!(tp->tp_flags & Py_TPFLAGS_HAVE_CLASS)) > + return NULL; > + dictoffset = tp->tp_dictoffset; > + if (dictoffset == 0) > + return NULL; > + if (dictoffset < 0) { > + dictoffset += tp->tp_basicsize; > + assert(dictoffset > 0); /* Sanity check */ > + if (tp->tp_itemsize > 0) { > + int n = ((PyVarObject *)obj)->ob_size; > + if (n > 0) { > + dictoffset += tp->tp_itemsize * n; > + /* Round up, if necessary */ > + if (tp->tp_itemsize % PTRSIZE != 0) { > + dictoffset += PTRSIZE - 1; > + dictoffset /= PTRSIZE; > + dictoffset *= PTRSIZE; > + } > + } > + } > + } > + return (PyObject **) ((char *)obj + dictoffset); > + } > + > /* Generic GetAttr functions - put these in your tp_[gs]etattro slot */ > > *************** > *** 1082,1086 **** > PyObject *descr; > descrgetfunc f; > ! int dictoffset; > > if (tp->tp_dict == NULL) { > --- 1116,1120 ---- > PyObject *descr; > descrgetfunc f; > ! PyObject **dictptr; > > if (tp->tp_dict == NULL) { > *************** > *** 1097,1103 **** > } > > ! dictoffset = tp->tp_dictoffset; > ! if (dictoffset != 0) { > ! PyObject *dict = * (PyObject **) ((char *)obj + dictoffset); > if (dict != NULL) { > PyObject *res = PyDict_GetItem(dict, name); > --- 1131,1137 ---- > } > > ! dictptr = _PyObject_GetDictPtr(obj); > ! if (dictptr != NULL) { > ! PyObject *dict = *dictptr; > if (dict != NULL) { > PyObject *res = PyDict_GetItem(dict, name); > *************** > *** 1129,1133 **** > PyObject *descr; > descrsetfunc f; > ! int dictoffset; > > if (tp->tp_dict == NULL) { > --- 1163,1167 ---- > PyObject *descr; > descrsetfunc f; > ! PyObject **dictptr; > > if (tp->tp_dict == NULL) { > *************** > *** 1143,1149 **** > } > > ! dictoffset = tp->tp_dictoffset; > ! if (dictoffset != 0) { > ! PyObject **dictptr = (PyObject **) ((char *)obj + dictoffset); > PyObject *dict = *dictptr; > if (dict == NULL && value != NULL) { > --- 1177,1182 ---- > } > > ! dictptr = _PyObject_GetDictPtr(obj); > ! if (dictptr != NULL) { > PyObject *dict = *dictptr; > if (dict == NULL && value != NULL) { > > > _______________________________________________ > Python-checkins mailing list > Python-checkins at python.org > http://mail.python.org/mailman/listinfo/python-checkins -- Greg Stein, http://www.lyra.org/ From guido at digicool.com Mon Jun 11 14:57:18 2001 From: guido at digicool.com (Guido van Rossum) Date: Mon, 11 Jun 2001 08:57:18 -0400 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Objects object.c,2.124.4.11,2.124.4.12 In-Reply-To: Your message of "Mon, 11 Jun 2001 04:31:59 PDT." 
<20010611043158.E26210@lyra.org> References: <20010611043158.E26210@lyra.org> Message-ID: <200106111257.IAA03505@cj20424-a.reston1.va.home.com> > > Modified Files: > > Tag: descr-branch > > object.c > > Log Message: > > Add _PyObject_GetDictPtr() -- an internal API to get a pointer to > > where __dict__ is stored in an object. The simplest case is to add > > tp_dictoffset to the start of the object, but there are comlications: > > tp_flags may tell us that tp_dictoffset is not defined, or the offset > > may be negative: indexing from the end of the object, where > > tp_itemsize may have to be taken into account. > > Why would you ever have a negative size in there? That seems like an > unnecessary "feature". The offsets are easily set up by the compiler as > positive values. (not even sure how you'd come up with a proper/valid > negative value) When extending a type like tuple or string, the __dict__ has to be added to the end, after the last item, because we can't change the starting offset of the first item. This is not at a fixed offset from the start of the structure. --Guido van Rossum (home page: http://www.python.org/~guido/) From mal at lemburg.com Mon Jun 11 18:50:11 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Mon, 11 Jun 2001 18:50:11 +0200 Subject: [Python-Dev] Adding .decode() method to Unicode Message-ID: <3B24F6C3.C911C0BF@lemburg.com> I would like to add a .decode() method to Unicode objects and also enable the builtin unicode() to accept Unicode object as input. The .decode() method will work just like the .encode() method except that it interfaces to the decode API of the codec in question. While this may seem useless for the currently available encodings, it does have some use for codecs which recode Unicode to Unicode, e.g. codecs which do XML escaping or Unicode compression. Any objections ? -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From mal at lemburg.com Mon Jun 11 18:57:12 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Mon, 11 Jun 2001 18:57:12 +0200 Subject: [Python-Dev] Adding Asian codecs to the core Message-ID: <3B24F868.A3DFA649@lemburg.com> Tamito KAJIYAMA recently announced that he changed the licenses on his Japanese codecs from GPL to a BSD variant. This is great news since this would allow adding the codecs to the Python core which would certainly attract more users to Python in Asia. The codecs are available at: http://pseudo.grad.sccs.chukyo-u.ac.jp/~kajiyama/python/ The codecs are 280kB when compressed as .tar.gz file. Thoughts ? -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From aahz at rahul.net Mon Jun 11 19:42:30 2001 From: aahz at rahul.net (Aahz Maruch) Date: Mon, 11 Jun 2001 10:42:30 -0700 (PDT) Subject: [Python-Dev] Adding Asian codecs to the core In-Reply-To: <3B24F868.A3DFA649@lemburg.com> from "M.-A. Lemburg" at Jun 11, 2001 06:57:12 PM Message-ID: <20010611174230.0625E99C8D@waltz.rahul.net> M.-A. Lemburg wrote: > > Tamito KAJIYAMA recently announced that he changed the licenses > on his Japanese codecs from GPL to a BSD variant. This is great > news since this would allow adding the codecs to the Python core > which would certainly attract more users to Python in Asia. 
> > The codecs are 280kB when compressed as .tar.gz file. +0 I like the idea, am uncomfortable with that amount of space. -- --- Aahz (@pobox.com) Hugs and backrubs -- I break Rule 6 <*> http://www.rahul.net/aahz/ Androgynous poly kinky vanilla queer het Pythonista I don't really mind a person having the last whine, but I do mind someone else having the last self-righteous whine. From fdrake at cj42289-a.reston1.va.home.com Mon Jun 11 21:15:06 2001 From: fdrake at cj42289-a.reston1.va.home.com (Fred Drake) Date: Mon, 11 Jun 2001 15:15:06 -0400 (EDT) Subject: [Python-Dev] [development doc updates] Message-ID: <20010611191506.DC49F28923@cj42289-a.reston1.va.home.com> The development version of the documentation has been updated: http://python.sourceforge.net/devel-docs/ Substantial additional material on floating point arithmetic in the tutorial, written by Tim Peters to explain why FP can fail to reflect the decimal world presented to the user. Lots of additional updates and corrections. From guido at digicool.com Mon Jun 11 22:07:40 2001 From: guido at digicool.com (Guido van Rossum) Date: Mon, 11 Jun 2001 16:07:40 -0400 Subject: [Python-Dev] PEP 259: Omit printing newline after newline Message-ID: <200106112007.f5BK7eW22506@odiug.digicool.com> Please comment on the following. This came up a while ago in python-dev and I decided to follow through. I'm making this a PEP because of the risk of breaking code (which everybody on Python-dev seemed to think was acceptable). --Guido van Rossum (home page: http://www.python.org/~guido/) PEP: 259 Title: Omit printing newline after newline Version: $Revision: 1.1 $ Author: guido at python.org (Guido van Rossum) Status: Draft Type: Standards Track Python-Version: 2.2 Created: 11-Jun-2001 Post-History: 11-Jun-2001 Abstract Currently, the print statement always appends a newline, unless a trailing comma is used. This means that if we want to print data that already ends in a newline, we get two newlines, unless special precautions are taken. I propose to skip printing the newline when it follows a newline that came from data. In order to avoid having to add yet another magic variable to file objects, I propose to give the existing 'softspace' variable an extra meaning: a negative value will mean "the last data written ended in a newline so no space *or* newline is required." Problem When printing data that resembles the lines read from a file using a simple loop, double-spacing occurs unless special care is taken: >>> for line in open("/etc/passwd").readlines(): ... print line ... root:x:0:0:root:/root:/bin/bash bin:x:1:1:bin:/bin: daemon:x:2:2:daemon:/sbin: (etc.) >>> While there are easy work-arounds, this is often noticed only during testing and requires an extra edit-test roundtrip; the fixed code is uglier and harder to maintain. Proposed Solution In the PRINT_ITEM opcode in ceval.c, when a string object is printed, a check is already made that looks at the last character of that string. Currently, if that last character is a whitespace character other than space, the softspace flag is reset to zero; this suppresses the space between two items if the first item is a string ending in newline, tab, etc. (but not when it ends in a space). Otherwise the softspace flag is set to one. 
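To make the current rule concrete, here is a quick interactive illustration (not part of the PEP text itself; this is stock Python 2.x behavior as just described):

    >>> print "a", "b"      # softspace is set after "a", so a space is inserted
    a b
    >>> print "a\t", "b"    # the tab resets softspace: no space before "b"
    a	b
    >>> print "a ", "b"     # a trailing *space* does not reset it
    a  b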
The proposal changes this test slightly so that softspace is set to: -1 -- if the last object written is a string ending in a newline 0 -- if the last object written is a string ending in a whitespace character that's neither space nor newline 1 -- in all other cases (including the case when the last object written is an empty string or not a string) Then, the PRINT_NEWLINE opcode, printing of the newline is suppressed if the value of softspace is negative; in any case the softspace flag is reset to zero. Scope This only affects printing of 8-bit strings. It doesn't affect Unicode, although that could be considered a bug in the Unicode implementation. It doesn't affect other objects whose string representation happens to end in a newline character. Risks This change breaks some existing code. For example: print "Subject: PEP 259\n" print message_body In current Python, this produces a blank line separating the subject from the message body; with the proposed change, the body begins immediately below the subject. This is not very robust code anyway; it is better written as print "Subject: PEP 259" print print message_body In the test suite, only test_StringIO (which explicitly tests for this feature) breaks. Implementation A patch relative to current CVS is here: http://sourceforge.net/tracker/index.php?func=detail&aid=432183&group_id=5470&atid=305470 Copyright This document has been placed in the public domain. Local Variables: mode: indented-text indent-tabs-mode: nil End: From BPettersen at NAREX.com Mon Jun 11 22:20:38 2001 From: BPettersen at NAREX.com (Bjorn Pettersen) Date: Mon, 11 Jun 2001 14:20:38 -0600 Subject: [Python-Dev] RE: PEP 259: Omit printing newline after newline Message-ID: <6957F6A694B49A4096F7CFD0D900042F27D452@admin56.narex.com> > From: Guido van Rossum [mailto:guido at digicool.com] > > Subject: PEP 259: Omit printing newline after newline This would probably break most of the cgi scripts I did at my last job without giving any useful error message. But then again... why should I care ? -- bjorn From skip at pobox.com Mon Jun 11 22:20:33 2001 From: skip at pobox.com (Skip Montanaro) Date: Mon, 11 Jun 2001 15:20:33 -0500 Subject: [Python-Dev] Feedback on new floating point info in tutorial In-Reply-To: <20010611191506.DC49F28923@cj42289-a.reston1.va.home.com> References: <20010611191506.DC49F28923@cj42289-a.reston1.va.home.com> Message-ID: <15141.10257.487549.196538@beluga.mojam.com> Fred> Substantial additional material on floating point arithmetic in Fred> the tutorial, written by Tim Peters to explain why FP can fail to Fred> reflect the decimal world presented to the user. I took a quick look at that appendix. One thing that confused me a bit was that if 0.1 is approximated by something ever-so-slightly larger than 0.1, how is it that if you add ten of them together you wind up with a result that is ever-so-slightly less than 1.0? I didn't expect it to be exactly 1.0. 
Other floating point naifs may be confused in the same way:

>>> "%.55f" % 0.5
'0.5000000000000000000000000000000000000000000000000000000'
>>> "%.55f" % 0.1
'0.1000000000000000055511151231257827021181583404541015625'
>>> "%.55f" % (0.5+0.1)
'0.5999999999999999777955395074968691915273666381835937500'

I guess the explanation is that not only can't most decimals be represented exactly, but that summing the same approximation multiple times doesn't always skew the error in the same direction either:

>>> "%.55f" % (0.1+0.1+0.1+0.1+0.1+0.1+0.1+0.1)
'0.7999999999999999333866185224906075745820999145507812500'
>>> "%.55f" % (0.8)
'0.8000000000000000444089209850062616169452667236328125000'

IEEE-754-is-full-of-traps-for-the-unwary-ly y'rs,

Skip

From mal at lemburg.com Mon Jun 11 22:55:13 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Mon, 11 Jun 2001 22:55:13 +0200
Subject: [Python-Dev] PEP 259: Omit printing newline after newline
References: <200106112007.f5BK7eW22506@odiug.digicool.com>
Message-ID: <3B253031.AB1954CB@lemburg.com>

Guido van Rossum wrote:
>
> Please comment on the following. This came up a while ago in
> python-dev and I decided to follow through. I'm making this a PEP
> because of the risk of breaking code (which everybody on Python-dev
> seemed to think was acceptable).
>
> --Guido van Rossum (home page: http://www.python.org/~guido/)
>
> PEP: 259
> Title: Omit printing newline after newline
> ...
> Scope
>
> This only affects printing of 8-bit strings. It doesn't affect
> Unicode, although that could be considered a bug in the Unicode
> implementation. It doesn't affect other objects whose string
> representation happens to end in a newline character.

I guess I should fix the Unicode stuff ;-)

> Risks
>
> This change breaks some existing code. For example:
>
> print "Subject: PEP 259\n"
> print message_body
>
> In current Python, this produces a blank line separating the
> subject from the message body; with the proposed change, the body
> begins immediately below the subject. This is not very robust
> code anyway; it is better written as
>
> print "Subject: PEP 259"
> print
> print message_body
>
> In the test suite, only test_StringIO (which explicitly tests for
> this feature) breaks.

Hmm, I think the above is a very typical idiom for RFC822-style content and is used in CGI scripts a lot. I'm not sure whether this change is worth getting the CGI crowd upset...

Wouldn't it make sense to only use this technique in interactive mode?

-- Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting: http://www.egenix.com/
Python Software: http://www.lemburg.com/python/

From martin at loewis.home.cs.tu-berlin.de Tue Jun 12 00:00:54 2001
From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Tue, 12 Jun 2001 00:00:54 +0200
Subject: [Python-Dev] Adding .decode() method to Unicode
Message-ID: <200106112200.f5BM0sO01692@mira.informatik.hu-berlin.de>

> I would like to add a .decode() method to Unicode objects and also
> enable the builtin unicode() to accept Unicode object as input.

-1. What is this good for?

> While this may seem useless for the currently available encodings,
> it does have some use for codecs which recode Unicode to Unicode,
> e.g. codecs which do XML escaping or Unicode compression.

I still can see the value. If you think the codec API is good for such transformation, why not use it? I.e.
enc,dec,_,_ = codecs.lookup("compress-form-foo") s = dec(s) Furthermore, this seems like a form of hypergeneralization. If you have this, why not also add s = s.decode("capitalize") # instead of s.capitalize() i = s.decode("int") # instead of int(s) > Any objections ? Yes, I think this should not be added. Regards, Martin From paulp at ActiveState.com Tue Jun 12 01:38:55 2001 From: paulp at ActiveState.com (Paul Prescod) Date: Mon, 11 Jun 2001 16:38:55 -0700 Subject: [Python-Dev] Adding .decode() method to Unicode References: <200106112200.f5BM0sO01692@mira.informatik.hu-berlin.de> Message-ID: <3B25568F.B766E00D@ActiveState.com> "Martin v. Loewis" wrote: > >... > > I still can see the value. If you think the codec API is good for such > transformation, why not use it? I.e. > > enc,dec,_,_ = codecs.lookup("compress-form-foo") > s = dec(s) IMO, there is a huge usability difference between the above and mystr.decode("base64"). I think that we've done a good job of providing better ways to get at codecs than the codecs.lookup function. I don't see how this is any different. -- Take a recipe. Leave a recipe. Python Cookbook! http://www.ActiveState.com/pythoncookbook From greg at cosc.canterbury.ac.nz Tue Jun 12 01:51:55 2001 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Tue, 12 Jun 2001 11:51:55 +1200 (NZST) Subject: [Python-Dev] Feedback on new floating point info in tutorial In-Reply-To: <15141.10257.487549.196538@beluga.mojam.com> Message-ID: <200106112351.LAA03197@s454.cosc.canterbury.ac.nz> Skip Montanaro : > One thing that confused me a bit was > that if 0.1 is approximated by something ever-so-slightly larger than 0.1, > how is it that if you add ten of them together you wind up with a result > that is ever-so-slightly less than 1.0? I think what's happening is that the exact binary result of adding 0.1_plus_a_little to itself has one more bit than there is room for, so it gets shifted right and one bit falls off the end. The amount you lose when that happens a few times ends up outweighing the extra that you would expect. Whether it's worth trying to explain *that* in the tutorial I don't know! Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg at cosc.canterbury.ac.nz +--------------------------------------+ From greg at cosc.canterbury.ac.nz Tue Jun 12 02:00:33 2001 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Tue, 12 Jun 2001 12:00:33 +1200 (NZST) Subject: [Python-Dev] PEP 259: Omit printing newline after newline In-Reply-To: <200106112007.f5BK7eW22506@odiug.digicool.com> Message-ID: <200106120000.MAA03201@s454.cosc.canterbury.ac.nz> Guido: > I propose to skip printing the newline when it follows a newline > that came from data. -1 There's too much magic in the way print handles spaces and newlines already. Making it even more magical and inconsistent seems like exactly the wrong direction to be going in. If there are to be any changes to the way print works, I would prefer to see one that removes the need for the softspace flag altogether. The behaviour of a given print should not depend on state left behind by some previous one. Neither should it depend on whether the characters being printed come directly from a string or not. 
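A small script (illustrative only, not from the original message) makes the leftover-state point concrete:

    import sys

    print "x",                # leaves sys.stdout.softspace set behind
    sys.stdout.write("y\n")   # write() bypasses print and ignores softspace
    print "z"                 # this print starts by emitting a space: " z"

The final statement prints " z" with a leading space at the start of a fresh line, purely because of the softspace state left over from the first print two statements earlier.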
Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg at cosc.canterbury.ac.nz +--------------------------------------+ From tim.one at home.com Tue Jun 12 04:17:24 2001 From: tim.one at home.com (Tim Peters) Date: Mon, 11 Jun 2001 22:17:24 -0400 Subject: [Python-Dev] Feedback on new floating point info in tutorial In-Reply-To: <15141.10257.487549.196538@beluga.mojam.com> Message-ID: [Skip Montanaro, on the in-progess 2.2 Tutorial appendix] > I took a quick look at that appendix. One thing that confused me > a bit was that if 0.1 is approximated by something ever-so-slightly > larger than 0.1, how is it that if you add ten of them together you > wind up with a result that is ever-so-slightly less than 1.0? Good for you, Skip! In all the years I've been explaining this stuff, I only recall one other picking up on that immediately. I'm not writing a book here, though , and any intro numeric programming text emphasizes that n*x is a better bet than adding x together n times. >>> .1 * 10 1.0 >>> Greg Ewing put you on the right track, if you want to figure it out yourself (as Deep Throat said, "follow the bits, Skip -- follow the bits"). > I didn't expect it to be exactly 1.0. Other floating point naifs > may be confused in the same way: > > >>> "%.55f" % 0.5 > '0.5000000000000000000000000000000000000000000000000000000' > >>> "%.55f" % 0.1 > '0.1000000000000000055511151231257827021181583404541015625' > >>> "%.55f" % (0.5+0.1) > '0.5999999999999999777955395074968691915273666381835937500' Note that this output is platform-dependent. For example, the last on Windows is >>> "%.55f" % (0.5+0.1) '0.5999999999999999800000000000000000000000000000000000000' > ... > IEEE-754-is-full-of-traps-for-the-unwary-ly y'rs, All computer arithmetic is; and among binary fp systems, 754 has got to be the best-behaved there is. Know how many irksome bugs I've fixed in Python mucking with different sizes of integers across platforms, and what C does and doesn't guarantee about them? About 20x more than fp bugs. Of course there's 10000x as much integer code in Python too . god-created-the-integers-from-1-through-3-inclusive-and-that's-it-ly y'rs - tim From barry at digicool.com Tue Jun 12 05:00:52 2001 From: barry at digicool.com (Barry A. Warsaw) Date: Mon, 11 Jun 2001 23:00:52 -0400 Subject: [Python-Dev] PEP 259: Omit printing newline after newline References: <200106112007.f5BK7eW22506@odiug.digicool.com> <200106120000.MAA03201@s454.cosc.canterbury.ac.nz> Message-ID: <15141.34276.191510.708654@anthem.wooz.org> >>>>> "GE" == Greg Ewing writes: GE> There's too much magic in the way print handles spaces and GE> newlines already. Making it even more magical and inconsistent GE> seems like exactly the wrong direction to be going in. I tend to agree. I'm sometimes bitten by the double newlines, but as I think Andrew brought up in c.l.py, I'd rather see a way to tell readlines() to strip the newlines than to add more magic to print. print-has-all-the-magic-it-needs-now-<>-ly y'rs, -Barry From fredrik at pythonware.com Tue Jun 12 08:21:55 2001 From: fredrik at pythonware.com (Fredrik Lundh) Date: Tue, 12 Jun 2001 08:21:55 +0200 Subject: [Python-Dev] PEP 259: Omit printing newline after newline References: <200106112007.f5BK7eW22506@odiug.digicool.com> Message-ID: <004c01c0f307$fb0540c0$4ffa42d5@hagrid> guido wrote: > Please comment on the following. 
This came up a while ago in > python-dev and I decided to follow through. I'm making this a PEP > because of the risk of breaking code (which everybody on Python-dev > seemed to think was acceptable). when was this discussed on python-dev? From mal at lemburg.com Tue Jun 12 09:09:05 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Tue, 12 Jun 2001 09:09:05 +0200 Subject: [Python-Dev] Adding .decode() method to Unicode References: <200106112200.f5BM0sO01692@mira.informatik.hu-berlin.de> Message-ID: <3B25C011.125B6462@lemburg.com> "Martin v. Loewis" wrote: > > > I would like to add a .decode() method to Unicode objects and also > > enable the builtin unicode() to accept Unicode object as input. > > -1. What is this good for? See below :) > > While this may seem useless for the currently available encodings, > > it does have some use for codecs which recode Unicode to Unicode, > > e.g. codecs which do XML escaping or Unicode compression. > > I still can see the value. If you think the codec API is good for such > transformation, why not use it? I.e. > > enc,dec,_,_ = codecs.lookup("compress-form-foo") > s = dec(s) Sure and that's the point. I would like to add the .decode() method to make this just as simple as encoding Unicode to UTF-8. Note that strings already have this method: str.encode() str.decode() uni.encode() #uni.decode() # still missing > Furthermore, this seems like a form of hypergeneralization. If you > have this, why not also add > > s = s.decode("capitalize") # instead of s.capitalize() > i = s.decode("int") # instead of int(s) No, that's not the intention. One very useful application for this method is XML unescaping which turns numeric XML entities into Unicode chars. Others are Unicode decompression (using the Unicode compression algorithm) and certain forms of Unicode normalization. The key argument for these interfaces is that they provide an extensible transformation mechanism for string and binary data. > > Any objections ? > > Yes, I think this should not be added. -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From tim.one at home.com Tue Jun 12 09:29:02 2001 From: tim.one at home.com (Tim Peters) Date: Tue, 12 Jun 2001 03:29:02 -0400 Subject: [Python-Dev] PEP 259: Omit printing newline after newline In-Reply-To: <004c01c0f307$fb0540c0$4ffa42d5@hagrid> Message-ID: [/F] > when was this discussed on python-dev? It wasn't -- it actually came up on one of the SourceForge mailing lists ... ah, of course, tried to search but "Geocrawler is down for nightly database maintenance". They sure have long nights . I'm guessing it's the python-iterators list. It spun off of a thread where Guido was wondering whether one of the new ways to spell "iterate over a file" should return lines without trailing \n, so that e.g. for line in sys.stdin: print line wasn't a surprise. I opined it would be better to make all ways of iterating a file do the same thing, but change print instead. We both agreed that couldn't happen. But then I couldn't find any code it would break, only code of the form print line, where the "," was trying to suppress the extra newline, and that would continue to work the same way even if print were changed. The notion that legions of people are using print line as an obscure way to get double-spacing is taking me by surprise. Nobody on the iterators list had this objection. 
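For readers following along, the pattern in question looks like this (a minimal sketch; the file name is arbitrary):

    for line in open("/etc/passwd").readlines():
        print line        # line already ends in "\n"; print adds another

    for line in open("/etc/passwd").readlines():
        print line,       # the trailing comma suppresses print's own newline

Today the first loop double-spaces its output and the second is the usual workaround; under the proposed change, the first loop would print single-spaced and the second would keep behaving exactly as it does now.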
win-some-lose-some-lose-some-lose-some-lose-some-ly y'rs - tim From mal at lemburg.com Tue Jun 12 09:35:08 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Tue, 12 Jun 2001 09:35:08 +0200 Subject: [Python-Dev] Adding Asian codecs to the core References: <20010611174230.0625E99C8D@waltz.rahul.net> Message-ID: <3B25C62C.969B40B3@lemburg.com> Aahz Maruch wrote: > > M.-A. Lemburg wrote: > > > > Tamito KAJIYAMA recently announced that he changed the licenses > > on his Japanese codecs from GPL to a BSD variant. This is great > > news since this would allow adding the codecs to the Python core > > which would certainly attract more users to Python in Asia. > > > > The codecs are 280kB when compressed as .tar.gz file. > > +0 > > I like the idea, am uncomfortable with that amount of space. Tamito corrected me about the size (his file includes the .pyc byte code files): the correct size for the sources is 143kB -- almost half of what I initially wrote. If that should still be too much, there are probably some ways to further compress the size of the mapping tables which could be investigated. PS: Tamito is very thrilled about getting his codecs into the core and I am quite certain that he is also prepared to maintain them (I have put him on CC). -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From tim at digicool.com Tue Jun 12 09:37:55 2001 From: tim at digicool.com (Tim Peters) Date: Tue, 12 Jun 2001 03:37:55 -0400 Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/Include longobject.h,2.19,2.20 In-Reply-To: <3B25C116.3E65A32D@lemburg.com> Message-ID: [M.-A. Lemburg] > I have tried to compile longobject.c/h on a HP-UX box and am getting > warnings about MIN/MAX being redefined. Perhaps you should add > an #undef for these before the #define ?! I changed nothing relevant here. Are you certain this is a new problem? The MIN/MAX macros have been in longobject.c for a long time, and I didn't touch them. In any case, I'm not inclined to fiddle things on a box where I can't see a problem so can't know whether I'm fixing it or just creating new problems. If you can figure out why it's happening on that box, and it's a legit problem there, feel free to fix it. From SBrunning at trisystems.co.uk Tue Jun 12 10:25:19 2001 From: SBrunning at trisystems.co.uk (Simon Brunning) Date: Tue, 12 Jun 2001 09:25:19 +0100 Subject: [Python-Dev] RE: PEP 259: Omit printing newline after newline Message-ID: <31575A892FF6D1118F5800600846864D78BD25@intrepid> > From: Guido van Rossum [SMTP:guido at digicool.com] > In order to avoid having to add yet another magic variable to file > objects, I propose to give the existing 'softspace' variable an > extra meaning: a negative value will mean "the last data written > ended in a newline so no space *or* newline is required." Better another magic variable than a magic value for an old one, I think. Cheers, Simon Brunning TriSystems Ltd. sbrunning at trisystems.co.uk ----------------------------------------------------------------------- The information in this email is confidential and may be legally privileged. It is intended solely for the addressee. Access to this email by anyone else is unauthorised. If you are not the intended recipient, any disclosure, copying, distribution, or any action taken or omitted to be taken in reliance on it, is prohibited and may be unlawful. TriSystems Ltd. 
cannot accept liability for statements made which are clearly the senders own. From thomas at xs4all.net Tue Jun 12 10:33:30 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Tue, 12 Jun 2001 10:33:30 +0200 Subject: [Python-Dev] PEP 259: Omit printing newline after newline In-Reply-To: ; from tim.one@home.com on Tue, Jun 12, 2001 at 03:29:02AM -0400 References: <004c01c0f307$fb0540c0$4ffa42d5@hagrid> Message-ID: <20010612103330.D690@xs4all.nl> On Tue, Jun 12, 2001 at 03:29:02AM -0400, Tim Peters wrote: > [/F] > > when was this discussed on python-dev? > It wasn't -- it actually came up on one of the SourceForge mailing lists ... > I'm guessing it's the python-iterators list. I'm guessing the same thing, because I *did* see the proposal somewhere. I recall thinking 'that might work' but not much else, anyway. > The notion that legions of people are using > print line > as an obscure way to get double-spacing is taking me by surprise. Bah, humbug! (And you can quote me on that.) Backward compatibility is not an issue -- that's why we have future-imports and warning mechanisms. Import smart-print from future to get the new behaviour, and warn whenever print *would* *have* printed one newline less otherwise. Regardless, I'm -1 on this change. Not because of backward compatibility problem, but because of what GregE said. Let's not make print even more magically unpredictably confusing than it already is, with comma's that do something magical, softspace to control that magic, and shifting the print operator to the right :-) Why can't we use for line in file: print line, to print all lines in a file ? Softspace doesn't seem to add a space (though I had to write a testcase to make sure ;) and 'explicit is better than implicit'. I'd also prefer special syntax to control the softspace behaviour, like say: print "spam:", "ham" : "and" : "eggs" to print 'spamandeggs' without a space inbetween. Too late for that, I 'spose :) -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From martin at loewis.home.cs.tu-berlin.de Tue Jun 12 11:42:52 2001 From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Tue, 12 Jun 2001 11:42:52 +0200 Subject: [Python-Dev] Adding .decode() method to Unicode In-Reply-To: "mal@lemburg.com"'s message of Tue, 12 Jun 2001 09:09:05 +0200 Message-ID: <200106120942.f5C9gqT02418@mira.informatik.hu-berlin.de> > str.encode() > str.decode() > uni.encode() > #uni.decode() # still missing It's not missing. str.decode and uni.encode go through a single codec; that's easy. str.encode is somewhat more confusing, because it really is unicode(str).encode. Now, you are not proposing that uni.decode is str(uni).decode, are you? If not that, what else would it mean? And if it means something else, it is clearly not symmetric to str.encode, so it is not "missing". > One very useful application for this method is XML unescaping > which turns numeric XML entities into Unicode chars. Ok. Please show me how that would work. More precisely, please write a PEP describing the rationale for this feature, including use case examples and precise semantics of the proposed addition. > The key argument for these interfaces is that they provide > an extensible transformation mechanism for string and binary > data. That is too general for me to understand; I need to see detailed examples that solve real-world problems. Regards, Martin P.S. 
I don't think that unescaping XML characters entities into Unicode characters is a useful application in itself. This is normally done by the XML parser, which not only has to deal with character entities, but also with general entities and a lot of other markup. Very few people write XML parsers, and they are using the string methods and the sre module successfully (if the parser is written in Python - a C parser would do the unescaping before even passing the text to Python). From thomas at xs4all.net Tue Jun 12 12:02:03 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Tue, 12 Jun 2001 12:02:03 +0200 Subject: [Python-Dev] PEP 259: Omit printing newline after newline In-Reply-To: <20010612103330.D690@xs4all.nl>; from thomas@xs4all.net on Tue, Jun 12, 2001 at 10:33:30AM +0200 References: <004c01c0f307$fb0540c0$4ffa42d5@hagrid> <20010612103330.D690@xs4all.nl> Message-ID: <20010612120203.E690@xs4all.nl> On Tue, Jun 12, 2001 at 10:33:30AM +0200, Thomas Wouters wrote: > print "spam:", "ham" : "and" : "eggs" > to print 'spamandeggs' without a space inbetween. Err. I meant "hamandeggs" with no space inbetween. -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From mal at lemburg.com Tue Jun 12 12:13:21 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Tue, 12 Jun 2001 12:13:21 +0200 Subject: [Python-Dev] Adding .decode() method to Unicode References: <200106120942.f5C9gqT02418@mira.informatik.hu-berlin.de> Message-ID: <3B25EB41.807C2C51@lemburg.com> "Martin v. Loewis" wrote: > > > str.encode() > > str.decode() > > uni.encode() > > #uni.decode() # still missing > > It's not missing. str.decode and uni.encode go through a single codec; > that's easy. str.encode is somewhat more confusing, because it really > is unicode(str).encode. Now, you are not proposing that uni.decode is > str(uni).decode, are you? No. uni.decode() will (just like the other methods) directly interface to the codecs decoder -- there is no magic conversion involved. It is meant to be used by Unicode-Unicode codecs > If not that, what else would it mean? And if it means something else, > it is clearly not symmetric to str.encode, so it is not "missing". It is in the sense that strings support this method and Unicode currently doesn't. > > One very useful application for this method is XML unescaping > > which turns numeric XML entities into Unicode chars. > > Ok. Please show me how that would work. More precisely, please write a > PEP describing the rationale for this feature, including use case > examples and precise semantics of the proposed addition. There's no need for a PEP. This addition is much too simple to require a PEP on its own. As for use cases: I have already given a whole bunch of them (Unicode compression, normalization, escaping in various ways). Codecs are in no way constrained to only interface between strings and Unicode. There are many other possibilities for their usage out there. Just look at the latest checkins for a bunch of string-string codecs for examples of codecs which solve common real-life problems and do not interface to Unicode. > > The key argument for these interfaces is that they provide > > an extensible transformation mechanism for string and binary > > data. > > That is too general for me to understand; I need to see detailed > examples that solve real-world problems. > > Regards, > Martin > > P.S. I don't think that unescaping XML characters entities into > Unicode characters is a useful application in itself. 
This is normally > done by the XML parser, which not only has to deal with character > entities, but also with general entities and a lot of other markup. > Very few people write XML parsers, and they are using the string > methods and the sre module successfully (if the parser is written in > Python - a C parser would do the unescaping before even passing the > text to Python). True, but not all XML text out there is meant for XML parsers to read ;-). Preprocessing of e.g. XML text in Python is a rather common thing to do and this is what the direct codec access methods are meant for. -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From fredrik at pythonware.com Tue Jun 12 12:46:36 2001 From: fredrik at pythonware.com (Fredrik Lundh) Date: Tue, 12 Jun 2001 12:46:36 +0200 Subject: [Python-Dev] Adding .decode() method to Unicode References: <200106120942.f5C9gqT02418@mira.informatik.hu-berlin.de> <3B25EB41.807C2C51@lemburg.com> Message-ID: <00aa01c0f32c$f4a4b740$0900a8c0@spiff> mal wrote: > > Ok. Please show me how that would work. More precisely, please write a > > PEP describing the rationale for this feature, including use case > > examples and precise semantics of the proposed addition. > > There's no need for a PEP. This addition is much too simple > to require a PEP on its own. we'd been better off if you'd written a PEP before you started adding decode and encode stuff. what's currently implemented is ugly enough; adding more warts won't make it any prettier. -1 on anything except a PEP that covers *all* aspects of encode/decode (including things that are already implemented) From fredrik at pythonware.com Tue Jun 12 12:47:49 2001 From: fredrik at pythonware.com (Fredrik Lundh) Date: Tue, 12 Jun 2001 12:47:49 +0200 Subject: [Python-Dev] PEP 259: Omit printing newline after newline References: <004c01c0f307$fb0540c0$4ffa42d5@hagrid> <20010612103330.D690@xs4all.nl> <20010612120203.E690@xs4all.nl> Message-ID: <00ba01c0f32d$208d4160$0900a8c0@spiff> Thomas Wouters wrote: > > print "spam:", "ham" : "and" : "eggs" > > > to print 'spamandeggs' without a space inbetween. > > Err. I meant "hamandeggs" with no space inbetween. and "+" (or plain whitespace) instead of ":", right? From fredrik at pythonware.com Tue Jun 12 12:55:27 2001 From: fredrik at pythonware.com (Fredrik Lundh) Date: Tue, 12 Jun 2001 12:55:27 +0200 Subject: [Python-Dev] RE: PEP 259: Omit printing newline after newline References: <31575A892FF6D1118F5800600846864D78BD25@intrepid> Message-ID: <00c301c0f32e$31cd7ed0$0900a8c0@spiff> simon wrote: > > > In order to avoid having to add yet another magic variable to file > > objects, I propose to give the existing 'softspace' variable an > > extra meaning: a negative value will mean "the last data written > > ended in a newline so no space *or* newline is required." > > Better another magic variable than a magic value for an old one, I think. many file-like C types (e.g. cStringIO) already have special code to deal with a softspace integer attribute. From mal at lemburg.com Tue Jun 12 12:57:32 2001 From: mal at lemburg.com (M.-A. 
Lemburg) Date: Tue, 12 Jun 2001 12:57:32 +0200 Subject: [Python-Dev] Adding .decode() method to Unicode References: <200106120942.f5C9gqT02418@mira.informatik.hu-berlin.de> <3B25EB41.807C2C51@lemburg.com> <00aa01c0f32c$f4a4b740$0900a8c0@spiff> Message-ID: <3B25F59C.9AAF604A@lemburg.com> Fredrik Lundh wrote: > > mal wrote: > > > Ok. Please show me how that would work. More precisely, please write a > > > PEP describing the rationale for this feature, including use case > > > examples and precise semantics of the proposed addition. > > > > There's no need for a PEP. This addition is much too simple > > to require a PEP on its own. > > we'd been better off if you'd written a PEP before you started > adding decode and encode stuff. what's currently implemented > is ugly enough; adding more warts won't make it any prettier. Could you please be more specific about what is "ugly" in the current implementation ? The .encode/.decode methods are a direct interface to the codecs encoder and decoder APIs. I can't find anything ugly about this in general except maybe some of the constraints which were originally put into these interface on the grounds of using them for string/Unicode conversions -- I have already removed most of these and would like to clean this up completely before 2.2 gets out. > -1 on anything except a PEP that covers *all* aspects of > encode/decode (including things that are already implemented) Gee, Guido starts breaking code and nobody objects; I try to clean up some left-overs in the Unicode implementation and people start huge discussions about it. Something is backwards here... -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From martin at loewis.home.cs.tu-berlin.de Tue Jun 12 13:00:40 2001 From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Tue, 12 Jun 2001 13:00:40 +0200 Subject: [Python-Dev] Adding .decode() method to Unicode In-Reply-To: <3B25EB41.807C2C51@lemburg.com> (mal@lemburg.com) References: <200106120942.f5C9gqT02418@mira.informatik.hu-berlin.de> <3B25EB41.807C2C51@lemburg.com> Message-ID: <200106121100.f5CB0em03009@mira.informatik.hu-berlin.de> > > > str.encode() > > > str.decode() > > > uni.encode() > > > #uni.decode() # still missing > > > > It's not missing. str.decode and uni.encode go through a single codec; > > that's easy. str.encode is somewhat more confusing, because it really > > is unicode(str).encode. Now, you are not proposing that uni.decode is > > str(uni).decode, are you? > > No. uni.decode() will (just like the other methods) directly > interface to the codecs decoder -- there is no magic conversion > involved. It is meant to be used by Unicode-Unicode codecs When invoking "Hallo".encode("utf-8"), two conversions are executed: first the default decoding into Unicode, then the UTF-8 encoding. Of course, that is not the intended use (but then, is the intended use documented anywhere?): instead, people should write "Hallo".encode("base64") instead. This is an example I can understand, although I'm not sure why it is inherently better to write this instead of writing base64.encodestring("Hallo"). > > If not that, what else would it mean? And if it means something else, > > it is clearly not symmetric to str.encode, so it is not "missing". > > It is in the sense that strings support this method and Unicode > currently doesn't. 
The rationale for string.encode is weak: it argues that string->string conversions are frequent enough to justify this API, even though these conversions have nothing to do with coded character sets. So far, I can see *no* rationale for unicode.decode. > There's no need for a PEP. This addition is much too simple > to require a PEP on its own. PEP 1 says: # We intend PEPs to be the primary mechanisms for proposing new # features, for collecting community input on an issue, and for # documenting the design decisions that have gone into Python. The # PEP author is responsible for building consensus within the # community and documenting dissenting opinions. So we have a proposal for a new feature, and we have dissenting opinions. Who are you to decide that this additions is too simple to require a PEP on its own? > As for use cases: I have already given a whole bunch of them > (Unicode compression, normalization, escaping in various ways). I was asking for specific examples: Names of specific codecs that you want to implement, and application code fragments using these specific codecs. I don't know how to use Unicode compression if I had such this proposed feature, for example. I know what XML escaping is, and I cannot see how this feature would help. > True, but not all XML text out there is meant for XML parsers to > read ;-). Preprocessing of e.g. XML text in Python is a rather common > thing to do and this is what the direct codec access methods are > meant for. Can you give an example of an application which processes XML without a parser, but with converting character entities (preferably open-source, so I can study its code)? I wonder whether they get CDATA sections right... MAL, I really mean that: Please don't make claims that something is common or useful without giving an *exact* example. Regards, Martin P.S. This insistence on adding Unicode and string methods makes it appear as if the author of the codecs module now thinks that the API of it sucks. From thomas at xs4all.net Tue Jun 12 13:16:05 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Tue, 12 Jun 2001 13:16:05 +0200 Subject: [Python-Dev] PEP 259: Omit printing newline after newline In-Reply-To: <00ba01c0f32d$208d4160$0900a8c0@spiff> References: <004c01c0f307$fb0540c0$4ffa42d5@hagrid> <20010612103330.D690@xs4all.nl> <20010612120203.E690@xs4all.nl> <00ba01c0f32d$208d4160$0900a8c0@spiff> Message-ID: <20010612131605.Q22849@xs4all.nl> On Tue, Jun 12, 2001 at 12:47:49PM +0200, Fredrik Lundh wrote: > Thomas Wouters wrote: > > > print "spam:", "ham" : "and" : "eggs" > > > > > to print 'spamandeggs' without a space inbetween. > > Err. I meant "hamandeggs" with no space inbetween. > and "+" (or plain whitespace) instead of ":", right? Not really. That would only work for string-types. Print auto-converts, remember ? At least the ':' is unambiguous. -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From mal at lemburg.com Tue Jun 12 13:42:31 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Tue, 12 Jun 2001 13:42:31 +0200 Subject: [Python-Dev] Adding .decode() method to Unicode References: <200106120942.f5C9gqT02418@mira.informatik.hu-berlin.de> <3B25EB41.807C2C51@lemburg.com> <200106121100.f5CB0em03009@mira.informatik.hu-berlin.de> Message-ID: <3B260027.7DD33246@lemburg.com> "Martin v. Loewis" wrote: > > > > > str.encode() > > > > str.decode() > > > > uni.encode() > > > > #uni.decode() # still missing > > > > > > It's not missing. 
str.decode and uni.encode go through a single codec; > > > that's easy. str.encode is somewhat more confusing, because it really > > > is unicode(str).encode. Now, you are not proposing that uni.decode is > > > str(uni).decode, are you? > > > > No. uni.decode() will (just like the other methods) directly > > interface to the codecs decoder -- there is no magic conversion > > involved. It is meant to be used by Unicode-Unicode codecs > > When invoking "Hallo".encode("utf-8"), two conversions are executed: > first the default decoding into Unicode, then the UTF-8 encoding. Of > course, that is not the intended use (but then, is the intended use > documented anywhere?): instead, people should write > "Hallo".encode("base64") instead. This is an example I can understand, > although I'm not sure why it is inherently better to write this > instead of writing base64.encodestring("Hallo"). Please note that the conversion from string to Unicode is done by the codec, not the .encode() interface. > > > If not that, what else would it mean? And if it means something else, > > > it is clearly not symmetric to str.encode, so it is not "missing". > > > > It is in the sense that strings support this method and Unicode > > currently doesn't. > > The rationale for string.encode is weak: it argues that string->string > conversions are frequent enough to justify this API, even though these > conversions have nothing to do with coded character sets. You still don't get it: codecs can be used for much more than just character set conversion ! > So far, I can see *no* rationale for unicode.decode. > > > There's no need for a PEP. This addition is much too simple > > to require a PEP on its own. > > PEP 1 says: > > # We intend PEPs to be the primary mechanisms for proposing new > # features, for collecting community input on an issue, and for > # documenting the design decisions that have gone into Python. The > # PEP author is responsible for building consensus within the > # community and documenting dissenting opinions. > > So we have a proposal for a new feature, and we have dissenting > opinions. Who are you to decide that this additions is too simple to > require a PEP on its own? So you want a PEP for each and every small addition to in the core ?! (I am not talking about features which might break code !) > > As for use cases: I have already given a whole bunch of them > > (Unicode compression, normalization, escaping in various ways). > > I was asking for specific examples: Names of specific codecs that you > want to implement, and application code fragments using these specific > codecs. I don't know how to use Unicode compression if I had such this > proposed feature, for example. I know what XML escaping is, and I > cannot see how this feature would help. I think I have given enough examples in this thread already. See below for some more. > > True, but not all XML text out there is meant for XML parsers to > > read ;-). Preprocessing of e.g. XML text in Python is a rather common > > thing to do and this is what the direct codec access methods are > > meant for. > > Can you give an example of an application which processes XML without > a parser, but with converting character entities (preferably > open-source, so I can study its code)? I wonder whether they get CDATA > sections right... MAL, I really mean that: Please don't make claims > that something is common or useful without giving an *exact* example. 
Yes, I am using these feature in real code and no, I can't show it to you because it's closed source. XML is only one example where this would be useful, HTML is another text format which would benefit from it, URL encoding is yet another application. You basically find these applications in all situations where some form of escaping is needed. What I am trying to do here is simplify codec access and usage for the casual user. .encode() and .decode() are very intuitive ways to deal with data transformation, IMHO. > Regards, > Martin > > P.S. This insistence on adding Unicode and string methods makes it > appear as if the author of the codecs module now thinks that the API > of it sucks. No comment. -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From barry at digicool.com Tue Jun 12 16:22:26 2001 From: barry at digicool.com (Barry A. Warsaw) Date: Tue, 12 Jun 2001 10:22:26 -0400 Subject: [Python-Dev] Adding .decode() method to Unicode References: <200106120942.f5C9gqT02418@mira.informatik.hu-berlin.de> <3B25EB41.807C2C51@lemburg.com> Message-ID: <15142.9634.842402.241225@anthem.wooz.org> >>>>> "M" == M writes: M> Codecs are in no way constrained to only interface between M> strings and Unicode. There are many other possibilities for M> their usage out there. Just look at the latest checkins for a M> bunch of string-string codecs for examples of codecs which M> solve common real-life problems and do not interface to M> Unicode. Having just followed this thread tangentially, I do have to say it seems quite cool to be able to do something like the following in Python 2.2: >>> s = msg['from'] >>> parts = s.split('?') >>> if parts[2].lower() == 'q': ... name = parts[3].decode('quopri') ... elif parts[2].lower() == 'b': ... name = parts[3].decode('base64') ... -Barry From fredrik at pythonware.com Tue Jun 12 16:45:16 2001 From: fredrik at pythonware.com (Fredrik Lundh) Date: Tue, 12 Jun 2001 16:45:16 +0200 Subject: [Python-Dev] Adding .decode() method to Unicode References: <200106120942.f5C9gqT02418@mira.informatik.hu-berlin.de><3B25EB41.807C2C51@lemburg.com> <15142.9634.842402.241225@anthem.wooz.org> Message-ID: <004b01c0f34e$4d3e96c0$4ffa42d5@hagrid> barry wrote: > Having just followed this thread tangentially, I do have to say it > seems quite cool to be able to do something like the following in > Python 2.2: > > >>> s = msg['from'] > >>> parts = s.split('?') > >>> if parts[2].lower() == 'q': > ... name = parts[3].decode('quopri') > ... elif parts[2].lower() == 'b': > ... name = parts[3].decode('base64') uhuh? and how exactly is this cooler than being able to do something like the following: import quopri, base64 s = msg['from'] parts = s.split('?') if parts[2].lower() == 'q': name = quopri.decodestring(parts[3]) elif parts[2].lower() == 'b': name = base64.decodestring(parts[3]) (going through the codec registry is slower, and imports more modules, but what's so cool with that?) From barry at digicool.com Tue Jun 12 16:50:01 2001 From: barry at digicool.com (Barry A. 
Warsaw) Date: Tue, 12 Jun 2001 10:50:01 -0400 Subject: [Python-Dev] Adding .decode() method to Unicode References: <200106120942.f5C9gqT02418@mira.informatik.hu-berlin.de> <3B25EB41.807C2C51@lemburg.com> <15142.9634.842402.241225@anthem.wooz.org> <004b01c0f34e$4d3e96c0$4ffa42d5@hagrid> Message-ID: <15142.11289.16053.424966@anthem.wooz.org> >>>>> "FL" == Fredrik Lundh writes: FL> uhuh? and how exactly is this cooler than being able to do FL> something like the following: | import quopri, base64 | s = msg['from'] | parts = s.split('?') | if parts[2].lower() == 'q': | name = quopri.decodestring(parts[3]) | elif parts[2].lower() == 'b': | name = base64.decodestring(parts[3]) FL> (going through the codec registry is slower, and imports more FL> modules, but what's so cool with that?) -------------------- snip snip -------------------- Python 2.2a0 (#4, Jun 6 2001, 13:03:36) [GCC egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)] on linux2 Type "copyright", "credits" or "license" for more information. >>> import quopri >>> quopri.decodestring Traceback (most recent call last): File "", line 1, in ? AttributeError: 'quopri' module has no attribute 'decodestring' >>> quopri.encodestring Traceback (most recent call last): File "", line 1, in ? AttributeError: 'quopri' module has no attribute 'encodestring' -------------------- snip snip -------------------- Much cooler :) Okay, okay, so we /could/ add encodestring/decodestring to quopri.py, which isn't a bad idea. But it seems to me that the s.encode() s.decode() API is nicely universal for any supported encoding. but-what-do-i-know?-ly y'rs, -Barry From skip at pobox.com Tue Jun 12 17:32:11 2001 From: skip at pobox.com (Skip Montanaro) Date: Tue, 12 Jun 2001 10:32:11 -0500 Subject: [Python-Dev] Re: metaclasses -- aka Don Beaudry hook/hack In-Reply-To: References: Message-ID: <15142.13819.477491.993419@beluga.mojam.com> James> Before I head too deeply into Zope dependencies, I would be James> interested in knowing whether or not "type(MyClass) == James> types.ClassType" and "isinstance(myInstance,MyClass)" work for James> classes derived from ExtensionClass. Straight from the horse's mouth: >>> type(gtk.GtkButton) >>> type(gtk.GtkButton) == types.ClassType 0 >>> isinstance(gtk.GtkButton(), gtk.GtkButton) 1 James> (And if so, why do these work for C extension classes using the James> Don Beaudry hook but not for Python classes using the same hook?) You'll have to ask someone with more subject knowledge. (Don would probably be a good start. ;-) I've cc'd python-dev because the experts in this area are all there. -- Skip Montanaro (skip at pobox.com) (847)971-7098 From skip at pobox.com Tue Jun 12 17:53:24 2001 From: skip at pobox.com (Skip Montanaro) Date: Tue, 12 Jun 2001 10:53:24 -0500 Subject: [Python-Dev] PEP 259: Omit printing newline after newline In-Reply-To: References: <004c01c0f307$fb0540c0$4ffa42d5@hagrid> Message-ID: <15142.15092.57490.275201@beluga.mojam.com> Tim> The notion that legions of people are using Tim> print line Tim> as an obscure way to get double-spacing is taking me by surprise. Tim> Nobody on the iterators list had this objection. I suspect that most CGI scripts that didn't use any abstraction for HTTP responses suffer from this potential problem. I've been using one abstraction or another for quite awhile now, but I still have a few CGI scripts laying around that still use print to emit headers and bodies of HTTP responses. Skip From barry at digicool.com Tue Jun 12 18:06:53 2001 From: barry at digicool.com (Barry A. 
Warsaw) Date: Tue, 12 Jun 2001 12:06:53 -0400 Subject: [Python-Dev] PEP 259: Omit printing newline after newline References: <004c01c0f307$fb0540c0$4ffa42d5@hagrid> <15142.15092.57490.275201@beluga.mojam.com> Message-ID: <15142.15901.223641.151562@anthem.wooz.org> >>>>> "SM" == Skip Montanaro writes: SM> I suspect that most CGI scripts that didn't use any SM> abstraction for HTTP responses suffer from this potential SM> problem. I've been using one abstraction or another for quite SM> awhile now, but I still have a few CGI scripts laying around SM> that still use print to emit headers and bodies of HTTP SM> responses. Same here. From paulp at ActiveState.com Tue Jun 12 19:22:31 2001 From: paulp at ActiveState.com (Paul Prescod) Date: Tue, 12 Jun 2001 10:22:31 -0700 Subject: [Python-Dev] Adding .decode() method to Unicode References: <200106120942.f5C9gqT02418@mira.informatik.hu-berlin.de> <3B25EB41.807C2C51@lemburg.com> <15142.9634.842402.241225@anthem.wooz.org> Message-ID: <3B264FD7.86ACB034@ActiveState.com> "Barry A. Warsaw" wrote: > >... > > Having just followed this thread tangentially, I do have to say it > seems quite cool to be able to do something like the following in > Python 2.2: > > >>> s = msg['from'] > >>> parts = s.split('?') > >>> if parts[2].lower() == 'q': > ... name = parts[3].decode('quopri') > ... elif parts[2].lower() == 'b': > ... name = parts[3].decode('base64') > ... I think that the central point is that if code like the above is useful and supported then it needs to be the same for Unicode strings as for 8-bit strings. If the code above is NOT useful and should NOT be supported then we need to undo it before 2.2 ships. This unicode.decode argument is just a proxy for the real argument about the above. I don't feel strongly one way or another about this (ab?)use of the codecs concept, myself, but I do feel strongly that Unicode strings should behave as much as possible like 8-bit strings. -- Take a recipe. Leave a recipe. Python Cookbook! http://www.ActiveState.com/pythoncookbook From paulp at ActiveState.com Tue Jun 12 19:31:54 2001 From: paulp at ActiveState.com (Paul Prescod) Date: Tue, 12 Jun 2001 10:31:54 -0700 Subject: [Python-Dev] Adding .decode() method to Unicode References: <200106120942.f5C9gqT02418@mira.informatik.hu-berlin.de><3B25EB41.807C2C51@lemburg.com> <15142.9634.842402.241225@anthem.wooz.org> <004b01c0f34e$4d3e96c0$4ffa42d5@hagrid> Message-ID: <3B26520A.C579D00C@ActiveState.com> Fredrik Lundh wrote: > >... > > uhuh? and how exactly is this cooler than being able to do > something like the following: > > import quopri, base64 >... > > (going through the codec registry is slower, and imports more > modules, but what's so cool with that?) One argument in favor is that the base64 and quopri modules are not standardized today. In fact, Python has a huge problem with standardization of access paradigms in the standard library. We get the best standardization (i.e. of the "file interface") when we force module authors to conform to a standard in order to get some "extra feature" of the standard library. A counter argument is that the conflation of the concept of Unicode encoding/decoding and other forms of encoding/decoding could be confusing. MAL would not have to keep pointing out that "codecs are for more than Unicode encoding/decoding" if it was obvious. -- Take a recipe. Leave a recipe. Python Cookbook! http://www.ActiveState.com/pythoncookbook From barry at digicool.com Tue Jun 12 20:24:25 2001 From: barry at digicool.com (Barry A. 
Warsaw) Date: Tue, 12 Jun 2001 14:24:25 -0400 Subject: [Python-Dev] Adding .decode() method to Unicode References: <200106120942.f5C9gqT02418@mira.informatik.hu-berlin.de> <3B25EB41.807C2C51@lemburg.com> <15142.9634.842402.241225@anthem.wooz.org> <3B264FD7.86ACB034@ActiveState.com> Message-ID: <15142.24153.921774.610559@anthem.wooz.org> >>>>> "PP" == Paul Prescod writes: PP> I don't feel strongly one way or another about this (ab?)use PP> of the codecs concept, myself, but I do feel strongly that PP> Unicode strings should behave as much as possible like 8-bit PP> strings. I'd agree with both statements. time-to-add-{encode,decode}string()-to-quopri-ly y'rs, -Barry From martin at loewis.home.cs.tu-berlin.de Tue Jun 12 20:00:19 2001 From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Tue, 12 Jun 2001 20:00:19 +0200 Subject: [Python-Dev] Adding .decode() method to Unicode In-Reply-To: <3B260027.7DD33246@lemburg.com> (mal@lemburg.com) References: <200106120942.f5C9gqT02418@mira.informatik.hu-berlin.de> <3B25EB41.807C2C51@lemburg.com> <200106121100.f5CB0em03009@mira.informatik.hu-berlin.de> <3B260027.7DD33246@lemburg.com> Message-ID: <200106121800.f5CI0Jw00946@mira.informatik.hu-berlin.de> > > So we have a proposal for a new feature, and we have dissenting > > opinions. Who are you to decide that this additions is too simple to > > require a PEP on its own? > > So you want a PEP for each and every small addition to in the > core ?! (I am not talking about features which might break code !) No, additions that find immediate consent and come with complete patches (including documentation and test cases) don't need this overhead. Features that find resistance should go through the full process. > > I was asking for specific examples: Names of specific codecs that you > > want to implement, and application code fragments using these specific > > codecs. I don't know how to use Unicode compression if I had such this > > proposed feature, for example. I know what XML escaping is, and I > > cannot see how this feature would help. > > I think I have given enough examples in this thread already. See > below for some more. I haven't seen a single example involving actual Python code. > > > True, but not all XML text out there is meant for XML parsers to > > > read ;-). Preprocessing of e.g. XML text in Python is a rather common > > > thing to do and this is what the direct codec access methods are > > > meant for. > > > > Can you give an example of an application [...] > > Yes, I am using these feature in real code and no, I can't show it to > you because it's closed source. Not very convincing... If this is "a rather common thing to do", it shouldn't be hard to find examples in other people's code, shouldn't it? > XML is only one example where this would be useful, HTML is another > text format which would benefit from it, URL encoding is yet another > application. You basically find these applications in all situations > where some form of escaping is needed. These are all not specific examples. I'm still looking for a specific application that might use this feature, and specific codec names and implementations. Regards, Martin From martin at loewis.home.cs.tu-berlin.de Tue Jun 12 20:08:31 2001 From: martin at loewis.home.cs.tu-berlin.de (Martin v. 
Loewis) Date: Tue, 12 Jun 2001 20:08:31 +0200 Subject: [Python-Dev] Adding .decode() method to Unicode In-Reply-To: <15142.9634.842402.241225@anthem.wooz.org> (barry@digicool.com) References: <200106120942.f5C9gqT02418@mira.informatik.hu-berlin.de> <3B25EB41.807C2C51@lemburg.com> <15142.9634.842402.241225@anthem.wooz.org> Message-ID: <200106121808.f5CI8VO00950@mira.informatik.hu-berlin.de> > Having just followed this thread tangentially, I do have to say it > seems quite cool to be able to do something like the following in > Python 2.2: > > >>> s = msg['from'] > >>> parts = s.split('?') > >>> if parts[2].lower() == 'q': > ... name = parts[3].decode('quopri') > ... elif parts[2].lower() == 'b': > ... name = parts[3].decode('base64') > ... What is the type of parts[3] here? If it is a plain string, it is already possible: >>> 'SGVsbG8=\n'.decode("base64") 'Hello' I doubt you'd ever have a Unicode string that represents a base64-encoded byte string, and if you had, .decode would probably do the wrong thing: >>> import codecs >>> enc,dec,_,_ = codecs.lookup("base64") >>> dec(u'SGVsbG8=\n') ('Hello', 9) Note that this returns a byte string, not a Unicode string. Regards, Martin From martin at loewis.home.cs.tu-berlin.de Tue Jun 12 20:18:45 2001 From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Tue, 12 Jun 2001 20:18:45 +0200 Subject: [Python-Dev] Adding .decode() method to Unicode In-Reply-To: <3B264FD7.86ACB034@ActiveState.com> (message from Paul Prescod on Tue, 12 Jun 2001 10:22:31 -0700) References: <200106120942.f5C9gqT02418@mira.informatik.hu-berlin.de> <3B25EB41.807C2C51@lemburg.com> <15142.9634.842402.241225@anthem.wooz.org> <3B264FD7.86ACB034@ActiveState.com> Message-ID: <200106121818.f5CIIjj00983@mira.informatik.hu-berlin.de> > > Having just followed this thread tangentially, I do have to say it > > seems quite cool to be able to do something like the following in > > Python 2.2: > > > > >>> s = msg['from'] > > >>> parts = s.split('?') > > >>> if parts[2].lower() == 'q': > > ... name = parts[3].decode('quopri') > > ... elif parts[2].lower() == 'b': > > ... name = parts[3].decode('base64') > > ... > > I think that the central point is that if code like the above is useful > and supported then it needs to be the same for Unicode strings as for > 8-bit strings. Why is that? An encoding, by nature, is something that produces a byte sequence from some input. So you can only decode byte sequences, not character strings. > If the code above is NOT useful and should NOT be supported then we > need to undo it before 2.2 ships. This unicode.decode argument is > just a proxy for the real argument about the above. No, it isn't. The code is useful for byte strings, but not for Unicode strings. > I don't feel strongly one way or another about this (ab?)use of the > codecs concept, myself, but I do feel strongly that Unicode strings > should behave as much as possible like 8-bit strings. Not at all. Byte strings and character strings are as different as are byte strings and lists of DOM child nodes (i.e. the only common thing is that they are sequences). Regards, Martin From barry at digicool.com Tue Jun 12 20:35:10 2001 From: barry at digicool.com (Barry A. 
Warsaw)
Date: Tue, 12 Jun 2001 14:35:10 -0400
Subject: [Python-Dev] Adding .decode() method to Unicode
References: <200106120942.f5C9gqT02418@mira.informatik.hu-berlin.de>
	<3B25EB41.807C2C51@lemburg.com>
	<15142.9634.842402.241225@anthem.wooz.org>
	<200106121808.f5CI8VO00950@mira.informatik.hu-berlin.de>
Message-ID: <15142.24798.941322.762791@anthem.wooz.org>

>>>>> "MvL" == Martin v Loewis writes:

    MvL> What is the type of parts[3] here? If it is a plain string,
    MvL> it is already possible:

    >> 'SGVsbG8=\n'.decode("base64")
    MvL> 'Hello'

But only in Python 2.2a0 currently, right? And yes, the type is plain
string.

    MvL> I doubt you'd ever have a Unicode string that represents a
    MvL> base64-encoded byte string, and if you had, .decode would
    MvL> probably do the wrong thing:

    >> import codecs
    >> enc,dec,_,_ = codecs.lookup("base64")
    >> dec(u'SGVsbG8=\n')
    MvL> ('Hello', 9)

    MvL> Note that this returns a byte string, not a Unicode string.

I trust you on that. ;) I've only played with this tangentially since
this thread cropped up.

-Barry

From paulp at ActiveState.com  Tue Jun 12 20:51:25 2001
From: paulp at ActiveState.com (Paul Prescod)
Date: Tue, 12 Jun 2001 11:51:25 -0700
Subject: [Python-Dev] Adding .decode() method to Unicode
References: <200106120942.f5C9gqT02418@mira.informatik.hu-berlin.de>
	<3B25EB41.807C2C51@lemburg.com>
	<15142.9634.842402.241225@anthem.wooz.org>
	<3B264FD7.86ACB034@ActiveState.com>
	<200106121818.f5CIIjj00983@mira.informatik.hu-berlin.de>
Message-ID: <3B2664AD.B560D685@ActiveState.com>

"Martin v. Loewis" wrote:
>
>...
>
> Why is that? An encoding, by nature, is something that produces a byte
> sequence from some input. So you can only decode byte sequences, not
> character strings.

According to this logic, it is not logical to "encode" a Unicode string
into a base64'd Unicode string or "decode" a Unicode string from a
base64'd Unicode string. But I have seen circumstances where one XML
document is base64'd into another. In that circumstance, it would be
useful to say node.nodeValue.decode("base64").

Let me turn the argument around: what would be the *harm* in having
8-bit strings and Unicode strings behave similarly in this manner?

>...
> Not at all. Byte strings and character strings are as different as are
> byte strings and lists of DOM child nodes (i.e. the only common thing
> is that they are sequences).

8-bit strings are not purely byte strings. They are also "character
strings". That's why they have methods like "capitalize", "isalpha",
"lower", "swapcase", "title" and so forth. DOM nodes and byte strings
have virtually no methods in common.

We could argue angels on the head of a pin until the cows come home but
90% of all Python users think of 8-bit strings as strings of
characters. So arguments based on the idea that they are not "really"
character strings are wishful thinking.

--
Take a recipe. Leave a recipe.
Python Cookbook! http://www.ActiveState.com/pythoncookbook

From martin at loewis.home.cs.tu-berlin.de  Tue Jun 12 22:01:39 2001
From: martin at loewis.home.cs.tu-berlin.de (Martin v.
Loewis) Date: Tue, 12 Jun 2001 22:01:39 +0200 Subject: [Python-Dev] Adding .decode() method to Unicode In-Reply-To: <15142.24798.941322.762791@anthem.wooz.org> (barry@digicool.com) References: <200106120942.f5C9gqT02418@mira.informatik.hu-berlin.de> <3B25EB41.807C2C51@lemburg.com> <15142.9634.842402.241225@anthem.wooz.org> <200106121808.f5CI8VO00950@mira.informatik.hu-berlin.de> <15142.24798.941322.762791@anthem.wooz.org> Message-ID: <200106122001.f5CK1de01350@mira.informatik.hu-berlin.de> > MvL> What is the type of parts[3] here? If it is a plain string, > MvL> it is already possible: > > >> 'SGVsbG8=\n'.decode("base64") > MvL> 'Hello' > > But only in Python 2.2a0 currently, right? Exactly, since MAL's last patch. If people think that byte strings must behave exactly as Unicode strings, I'd rather prefer to back out this patch instead of adding unicode.decode. Personally, I think the status quo is fine and should not be changed. Regards, Martin From aahz at rahul.net Wed Jun 13 01:48:14 2001 From: aahz at rahul.net (Aahz Maruch) Date: Tue, 12 Jun 2001 16:48:14 -0700 (PDT) Subject: [Python-Dev] Adding Asian codecs to the core In-Reply-To: <3B25C62C.969B40B3@lemburg.com> from "M.-A. Lemburg" at Jun 12, 2001 09:35:08 AM Message-ID: <20010612234815.2C90599C82@waltz.rahul.net> M.-A. Lemburg wrote: > Aahz Maruch wrote: >> M.-A. Lemburg wrote: >>> >>> Tamito KAJIYAMA recently announced that he changed the licenses >>> on his Japanese codecs from GPL to a BSD variant. This is great >>> news since this would allow adding the codecs to the Python core >>> which would certainly attract more users to Python in Asia. >>> >>> The codecs are 280kB when compressed as .tar.gz file. >> >> +0 >> >> I like the idea, am uncomfortable with that amount of space. > > Tamito corrected me about the size (his file includes the .pyc > byte code files): the correct size for the sources is 143kB -- > almost half of what I initially wrote. That makes me +0.5, possibly a bit higher. -- --- Aahz (@pobox.com) Hugs and backrubs -- I break Rule 6 <*> http://www.rahul.net/aahz/ Androgynous poly kinky vanilla queer het Pythonista I don't really mind a person having the last whine, but I do mind someone else having the last self-righteous whine. From greg at cosc.canterbury.ac.nz Wed Jun 13 01:57:35 2001 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Wed, 13 Jun 2001 11:57:35 +1200 (NZST) Subject: [Python-Dev] PEP 259: Omit printing newline after newline In-Reply-To: <20010612103330.D690@xs4all.nl> Message-ID: <200106122357.LAA03316@s454.cosc.canterbury.ac.nz> Thomas Wouters : > I'd also prefer special syntax to control the softspace > behaviour... Too late for that, I 'spose Maybe not. I'd suggest spelling "don't add a newline or a space after this" as: print a, b, c... This could coexist with the current softspace behaviour, and the use of a trailing comma could be deprecated. After a suitable warning period, the softspace flag could then be removed. > print "spam:", "ham" : "and" : "eggs" > to print 'spamandeggs' without a space inbetween. I don't think it's so important to have a special syntax for that, since it can be accomplished in other ways without too much difficulty, e.g. print "%s: %s%s%s" % ("spam", "ham", "and", "eggs")... The main thing I'd like is to get rid of the statefulness of the current behaviour. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. 
| greg at cosc.canterbury.ac.nz       +--------------------------------------+

From greg at cosc.canterbury.ac.nz  Wed Jun 13 02:02:40 2001
From: greg at cosc.canterbury.ac.nz (Greg Ewing)
Date: Wed, 13 Jun 2001 12:02:40 +1200 (NZST)
Subject: [Python-Dev] Adding .decode() method to Unicode
In-Reply-To: <00aa01c0f32c$f4a4b740$0900a8c0@spiff>
Message-ID: <200106130002.MAA03319@s454.cosc.canterbury.ac.nz>

> -1 on anything except a PEP that covers *all* aspects of
> encode/decode (including things that are already implemented)

Particularly, it should clearly explain why we need a completely new
and separate namespace mechanism for these codec things, and provide
a firm rationale for deciding whether any proposed new form of
encoding or decoding should be placed in this namespace or the module
namespace.

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,          | A citizen of NewZealandCorp, a       |
Christchurch, New Zealand          | wholly-owned subsidiary of USA Inc.  |
greg at cosc.canterbury.ac.nz       +--------------------------------------+

From paulp at ActiveState.com  Wed Jun 13 02:32:17 2001
From: paulp at ActiveState.com (Paul Prescod)
Date: Tue, 12 Jun 2001 17:32:17 -0700
Subject: [Python-Dev] Adding Asian codecs to the core
References: <20010612234815.2C90599C82@waltz.rahul.net>
Message-ID: <3B26B491.CA8536BD@ActiveState.com>

Aahz Maruch wrote:
>
>....
> >
> > Tamito corrected me about the size (his file includes the .pyc
> > byte code files): the correct size for the sources is 143kB --
> > almost half of what I initially wrote.
>
> That makes me +0.5, possibly a bit higher.

We really shouldn't consider the Japanese without Chinese and Korean.
And those both seem *larger* than the Japanese. :(

What if we add them to CVS and formally maintain them as part of the
core but distribute them as a separate download?

--
Take a recipe. Leave a recipe.
Python Cookbook! http://www.ActiveState.com/pythoncookbook

From paulp at ActiveState.com  Wed Jun 13 04:25:23 2001
From: paulp at ActiveState.com (Paul Prescod)
Date: Tue, 12 Jun 2001 19:25:23 -0700
Subject: [Python-Dev] Pure Python strptime
Message-ID: <3B26CF13.2A337AC6@ActiveState.com>

Should this strptime implementation be added to the standard library?

http://aspn.activestate.com/ASPN/Python/Cookbook/Recipe/56036

--
Take a recipe. Leave a recipe.
Python Cookbook! http://www.ActiveState.com/pythoncookbook

From paulp at ActiveState.com  Wed Jun 13 04:41:53 2001
From: paulp at ActiveState.com (Paul Prescod)
Date: Tue, 12 Jun 2001 19:41:53 -0700
Subject: [Python-Dev] Adding .decode() method to Unicode
References: <200106130002.MAA03319@s454.cosc.canterbury.ac.nz>
Message-ID: <3B26D2F1.8840FB1A@ActiveState.com>

Greg Ewing wrote:
>
> > -1 on anything except a PEP that covers *all* aspects of
> > encode/decode (including things that are already implemented)
>
> Particularly, it should clearly explain why we need a
> completely new and separate namespace mechanism for these
> codec things,

I don't know whether MAL will write the PEP or not but the rationale
for a new namespace is trivial. The namespace exists and is maintained
by the Internet Assigned Numbers Authority (IANA). You can't work with
Unicode without working with names from this list:

http://www.iana.org/assignments/character-sets

MAL is basically extending it to include names from this list:

http://www.iana.org/assignments/transfer-encodings

and others.
> and provide a firm rationale for deciding > whether any proposed new form of encoding or decoding > should be placed in this namespace or the module namespace. *My* answer would be that any function that has strings (8-bit or Unicode) as both domain and range is potentially a codec. -- Take a recipe. Leave a recipe. Python Cookbook! http://www.ActiveState.com/pythoncookbook From greg at cosc.canterbury.ac.nz Wed Jun 13 06:45:36 2001 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Wed, 13 Jun 2001 16:45:36 +1200 (NZST) Subject: [Python-Dev] Adding .decode() method to Unicode In-Reply-To: <3B26D2F1.8840FB1A@ActiveState.com> Message-ID: <200106130445.QAA03370@s454.cosc.canterbury.ac.nz> Paul Prescod : > The namespace exists and is maintained by > the Internet Assigned Names Association. Hmmm... so, is the only reason that we're not using the module namespace the fact that these names can contain non-alphanumeric characters? Or is there more to it than that? Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg at cosc.canterbury.ac.nz +--------------------------------------+ From skip at pobox.com Wed Jun 13 07:09:38 2001 From: skip at pobox.com (Skip Montanaro) Date: Wed, 13 Jun 2001 00:09:38 -0500 Subject: [Python-Dev] Adding Asian codecs to the core In-Reply-To: <3B26B491.CA8536BD@ActiveState.com> References: <20010612234815.2C90599C82@waltz.rahul.net> <3B26B491.CA8536BD@ActiveState.com> Message-ID: <15142.62866.180570.158325@beluga.mojam.com> Paul> What if we add them to CVS and formally maintain them as part of Paul> the core but distribute them as a separate download? That seems to make sense to me. I suspect most Linux distributions (for example) bundle Python into multiple pieces already. My Mandrake system splits the core into (I think) four pieces. It also bundles several other RPMs for PIL, NumPy, Postgres and RPM. Adding another package for a set of codecs doesn't seem like a big deal. Skip From mal at lemburg.com Wed Jun 13 09:02:05 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Wed, 13 Jun 2001 09:02:05 +0200 Subject: [Python-Dev] Adding Asian codecs to the core References: <20010612234815.2C90599C82@waltz.rahul.net> Message-ID: <3B270FED.8E2A4ECB@lemburg.com> Aahz Maruch wrote: > > M.-A. Lemburg wrote: > > Aahz Maruch wrote: > >> M.-A. Lemburg wrote: > >>> > >>> Tamito KAJIYAMA recently announced that he changed the licenses > >>> on his Japanese codecs from GPL to a BSD variant. This is great > >>> news since this would allow adding the codecs to the Python core > >>> which would certainly attract more users to Python in Asia. > >>> > >>> The codecs are 280kB when compressed as .tar.gz file. > >> > >> +0 > >> > >> I like the idea, am uncomfortable with that amount of space. > > > > Tamito corrected me about the size (his file includes the .pyc > > byte code files): the correct size for the sources is 143kB -- > > almost half of what I initially wrote. > > That makes me +0.5, possibly a bit higher. We will be working on reducing the size of the mapping tables. Can't promise anything, but I believe that Tamito can squeeze them into under 100k using some compression technique (which one is yet to be determined ;). 
-- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From mal at lemburg.com Wed Jun 13 09:05:31 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Wed, 13 Jun 2001 09:05:31 +0200 Subject: [Python-Dev] Adding Asian codecs to the core References: <20010612234815.2C90599C82@waltz.rahul.net> <3B26B491.CA8536BD@ActiveState.com> Message-ID: <3B2710BB.CFD8215@lemburg.com> Paul Prescod wrote: > > Aahz Maruch wrote: > > > >.... > > > > > > Tamito corrected me about the size (his file includes the .pyc > > > byte code files): the correct size for the sources is 143kB -- > > > almost half of what I initially wrote. > > > > That makes me +0.5, possibly a bit higher. > > We really shouldn't consider the Japanese without Chinese and Korean. > And those both seem *larger* than the Japanese. :( Unfortunately, these aren't available under a usable (=non-GPL) license yet. > What if we add them to CVS and formally maintain them as part of the > core but distribute them as a separate download? Good idea. -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From mal at lemburg.com Wed Jun 13 09:17:14 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Wed, 13 Jun 2001 09:17:14 +0200 Subject: [Python-Dev] Adding .decode() method to Unicode References: <200106130002.MAA03319@s454.cosc.canterbury.ac.nz> <3B26D2F1.8840FB1A@ActiveState.com> Message-ID: <3B27137A.E7BFC4EC@lemburg.com> Paul Prescod wrote: > > Greg Ewing wrote: > > > > > -1 on anything except a PEP that covers *all* aspects of > > > encode/decode (including things that are already implemented) > > > > Particularly, it should clearly explain why we need a > > completely new and separate namespace mechanism for these > > codec things, > > I don't know whether MAL will write the PEP or not With the kind of attitude towards the proposed extensions which I am currently getting in this forum, I'd rather spend my time on something more useful. > but the rationale for > a new namespace is trivial. The namespace exists and is maintained by > the Internet Assigned Names Association. You can't work with Unicode > without working with names from this list: > > http://www.iana.org/assignments/character-sets > > MAL is basically exending it to include names from this list: > > http://www.iana.org/assignments/transfer-encodings > > and others. Right. Since these codecs live in the encoding package, I don't think we have a namespace problem here. Codecs which are hooked into the codec registry by the encoding package's search function will have to provide a getregentry() entry point. If this API is not available, the codec won't load. Since the encoding package's search function is using standard Python imports for loading the codecs, we can also benefit from a nice side-effect: codec names can use Python's dotted names (which then map to standard Python packages). This allows codec writers like Tamito to place their codecs into Python package thereby avoiding any conflict with other authors of codecs with similar names. > > and provide a firm rationale for deciding > > whether any proposed new form of encoding or decoding > > should be placed in this namespace or the module namespace. 
> > *My* answer would be that any function that has strings (8-bit or > Unicode) as both domain and range is potentially a codec. Right. (Hey, the first time *we* agree on something ;-) -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From mal at lemburg.com Wed Jun 13 14:53:50 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Wed, 13 Jun 2001 14:53:50 +0200 Subject: [Python-Dev] Weird message to stderr Message-ID: <3B27625E.F18046F7@lemburg.com> Running Python 2.1 using a .pyc file I get these weird messages printed to stderr: run_pyc_file: nested_scopes: 0 These originate in pythonrun.c: static PyObject * run_pyc_file(FILE *fp, char *filename, PyObject *globals, PyObject *locals, PyCompilerFlags *flags) { PyCodeObject *co; PyObject *v; long magic; long PyImport_GetMagicNumber(void); magic = PyMarshal_ReadLongFromFile(fp); if (magic != PyImport_GetMagicNumber()) { PyErr_SetString(PyExc_RuntimeError, "Bad magic number in .pyc file"); return NULL; } (void) PyMarshal_ReadLongFromFile(fp); v = PyMarshal_ReadLastObjectFromFile(fp); fclose(fp); if (v == NULL || !PyCode_Check(v)) { Py_XDECREF(v); PyErr_SetString(PyExc_RuntimeError, "Bad code object in .pyc file"); return NULL; } co = (PyCodeObject *)v; v = PyEval_EvalCode(co, globals, locals); if (v && flags) { if (co->co_flags & CO_NESTED) flags->cf_nested_scopes = 1; fprintf(stderr, "run_pyc_file: nested_scopes: %d\n", flags->cf_nested_scopes); } Py_DECREF(co); return v; } Is this is left over debug printf or should I be warned in some way ? -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From guido at digicool.com Wed Jun 13 16:41:37 2001 From: guido at digicool.com (Guido van Rossum) Date: Wed, 13 Jun 2001 10:41:37 -0400 Subject: [Python-Dev] Re: Adding .decode() method to Unicode In-Reply-To: Your message of "Tue, 12 Jun 2001 22:40:01 EDT." References: Message-ID: <200106131441.KAA16557@cj20424-a.reston1.va.home.com> Wow, this almost looks like a real flamefest. ("Flame" being defined as the presence of metacomments.) (In the following, s is an 8-bit string, u is a Unicode string, and e is an encoding name.) The original design of the encode() methods of string and Unicode objects (in 2.0 and 2.1) is asymmetric, and clearly geared towards Unicode codecs only: to decode an 8-bit string you *have* to use unicode(s, encoding) while to encode a Unicode string into a specific 8-bit encoding you *have* to use u.encode(e). 8-bit strings also have an encode() method: s.encode(e) is the same as unicode(s).encode(e). (This is useful since code that expects Unicode strings should also work when it is passed ASCII-encoded 8-bit strings.) I'd say there's no need for s.decode(e), since this can already be done with unicode(s, e) -- and to me that API looks better since it clearly states that the result is Unicode. We *could* have designed the encoding API similarly: str(u, e) is available, symmetric with unicode(s, e), and a logical extension of str(u) which uses the default encoding. But I accept the argument that u.encode(e) is better because it emphasizes the encoding action, and because it means no API changes to str(). 
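To make the asymmetry concrete, here is a minimal interactive sketch of
the two directions as they stand (assuming the default ASCII encoding;
the values are illustrative only):

    >>> s = "hello"                # 8-bit string
    >>> u = unicode(s, "ascii")    # decode: 8-bit string -> Unicode
    >>> u.encode("utf-8")          # encode: Unicode -> 8-bit string
    'hello'
    >>> s.encode("utf-8")          # same as unicode(s).encode("utf-8")
    'hello'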
I guess what I'm saying here is that 'str' does not give enough of a clue that an encoding action is going on, while 'unicode' *does* give a clue that a decoding action is being done: as soon as you read "Unicode" you think "Mmm, encodings..." -- but "str" is pretty neutral, so u.encode(e) is needed to give a clue. Marc-Andre proposes (and has partially checked in) changes that stretch the meaning of the encode() method, and add a decode() method, to be basically interfaces to anything you can do with the codecs module. The return type of encode() and decode() is now determined by the codec (formerly, encode() always returned an 8-bit string). Some new codecs have been added that do things like gzip and base64. Initially, I liked this, and even contributed a codec. But questions keep coming up. What is the problem being solved? True, the codecs module has a clumsy interface if you just want to invoke a codec on some data. But that can easily be remedied by adding convenience functions encode() and decode() to codecs.py -- which would have the added advantage that it would work for other datatypes that support the buffer interface, e.g. codecs.encode(myPILobject, "base64"). True, the "codec" pattern can be used for other encodings than Unicode. But it seems to me that the entire codecs architecture is rather strongly geared towards en/decoding Unicode, and it's not clear how well other codecs fit in this pattern (e.g. I noticed that all the non-Unicode codecs ignore the error handling parameter or assert that it is set to 'strict'). Is it really right that x.encode("gzip") and x.encode("utf-8") look similar, while the former requires an 8-bit string and the latter only makes sense if x is a Unicode string? Another (minor) issue is that Unicode encoding names are an IANA namespace. Is it wise to add our own names to this? I'm not forcing a decision here, but I do ask that we consider these issues before forging ahead with what might be a mistake. A PEP would be most helpful to focus the discussion. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at digicool.com Wed Jun 13 17:19:03 2001 From: guido at digicool.com (Guido van Rossum) Date: Wed, 13 Jun 2001 11:19:03 -0400 Subject: [Python-Dev] Releasing 2.0.1 Message-ID: <200106131519.LAA16978@cj20424-a.reston1.va.home.com> I think it's now or never with the 2.0.1 release. Moshe seems to have disappeared from the face of the earth. His last mail to me (May 23) suggested that it was good to go except for the SRE checkin and the NEWS file. I did the SRE checkin today (making it identical to what's in 2.1, per /F's recommendation) and added a note about that to the NEWS file -- I wouldn't know what else would be needed there. So I think it's good to go now. I can release a 2.0.1c1 this week (indicating a release candidate) and a final 2.0.1 next week. If you know a good reason why I should hold off on releasing this, or if you have a patch that absolutely should make it into 2.0.1, please let me know NOW! This project is way overdue. (Thomas is ready to release 2.1.1 as soon as this goes out, I believe. :-) --Guido van Rossum (home page: http://www.python.org/~guido/) From fredrik at pythonware.com Wed Jun 13 17:29:19 2001 From: fredrik at pythonware.com (Fredrik Lundh) Date: Wed, 13 Jun 2001 17:29:19 +0200 Subject: [Python-Dev] Releasing 2.0.1 References: <200106131519.LAA16978@cj20424-a.reston1.va.home.com> Message-ID: <023f01c0f41d$9dfb87b0$0900a8c0@spiff> guido wrote: > So I think it's good to go now. 
I can release a 2.0.1c1 this week > (indicating a release candidate) and a final 2.0.1 next week. +1 From skip at pobox.com Wed Jun 13 17:49:58 2001 From: skip at pobox.com (Skip Montanaro) Date: Wed, 13 Jun 2001 10:49:58 -0500 Subject: [Python-Dev] on announcing point releases Message-ID: <15143.35750.837420.376281@beluga.mojam.com> (Just thinking out loud) I wonder if it would help gain wider distribution for the point releases if explicit announcements were sent to the various Linux distributors so they could create updated packages (RPMs, debs, whatever) for their users. On a related note, I see one RedHat email address on python-dev (and one Debian address on python-list). Are there other Linux distributions that are heavy Python users (as opposed to simply packaging it up for inclusion)? If so, perhaps they should be invited to join python-dev. Skip From niemeyer at conectiva.com Wed Jun 13 17:54:08 2001 From: niemeyer at conectiva.com (Gustavo Niemeyer) Date: Wed, 13 Jun 2001 12:54:08 -0300 Subject: [Python-Dev] sre improvements Message-ID: <20010613125408.W13940@tux.distro.conectiva> I'm forwarding this to the dev list.. probably somebody here knows about this... -------------- Hi there!! I have looked into sre, and was wondering if somebody is working to implement more features in it. I'd like, for example, to see the (?(1)blah) operator, available in perl, working. Should I care about this? Should I write some code?? Anybody working in sre currently? Thanks! -- Gustavo Niemeyer [ 2AAC 7928 0FBF 0299 5EB5 60E2 2253 B29A 6664 3A0C ] From skip at pobox.com Wed Jun 13 18:03:58 2001 From: skip at pobox.com (Skip Montanaro) Date: Wed, 13 Jun 2001 11:03:58 -0500 Subject: [Python-Dev] sre improvements In-Reply-To: <20010613125408.W13940@tux.distro.conectiva> References: <20010613125408.W13940@tux.distro.conectiva> Message-ID: <15143.36590.447465.657241@beluga.mojam.com> Gustavo> I'd like, for example, to see the (?(1)blah) operator, Gustavo> available in perl, working. Gustavo, For the non-Perl-heads on the list, can you explain what the (?(1)blah) operator does? -- Skip Montanaro (skip at pobox.com) (847)971-7098 From gregor at mediasupervision.de Wed Jun 13 18:13:17 2001 From: gregor at mediasupervision.de (Gregor Hoffleit) Date: Wed, 13 Jun 2001 18:13:17 +0200 Subject: [Python-Dev] on announcing point releases In-Reply-To: <15143.35750.837420.376281@beluga.mojam.com>; from skip@pobox.com on Wed, Jun 13, 2001 at 10:49:58AM -0500 References: <15143.35750.837420.376281@beluga.mojam.com> Message-ID: <20010613181317.B30006@mediasupervision.de> On Wed, Jun 13, 2001 at 10:49:58AM -0500, Skip Montanaro wrote: > I wonder if it would help gain wider distribution for the point releases if > explicit announcements were sent to the various Linux distributors so they > could create updated packages (RPMs, debs, whatever) for their users. > > On a related note, I see one RedHat email address on python-dev (and one > Debian address on python-list). Are there other Linux distributions that > are heavy Python users (as opposed to simply packaging it up for inclusion)? > If so, perhaps they should be invited to join python-dev. Rest assured that Debian is present on python-dev as well, and nervously looking forward to the maintenance releases ;-) I hope 2.1.1 will make it out in time as well for our next release (being aware that 'before the next Debian release happens' is no very tight timeframe ;-). 
Gregor From guido at digicool.com Wed Jun 13 18:16:42 2001 From: guido at digicool.com (Guido van Rossum) Date: Wed, 13 Jun 2001 12:16:42 -0400 Subject: [Python-Dev] Re: PEP 259: Omit printing newline after newline Message-ID: <200106131616.MAA17468@cj20424-a.reston1.va.home.com> OK, OK, PEP 259 is dead. It seemed a nice idea at the time. :-) Alex and others, if you're serious about implementing print as __print__(), why don't you write a PEP? --Guido van Rossum (home page: http://www.python.org/~guido/) From fdrake at acm.org Wed Jun 13 18:21:20 2001 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Wed, 13 Jun 2001 12:21:20 -0400 (EDT) Subject: [Python-Dev] on announcing point releases In-Reply-To: <20010613181317.B30006@mediasupervision.de> References: <15143.35750.837420.376281@beluga.mojam.com> <20010613181317.B30006@mediasupervision.de> Message-ID: <15143.37632.758887.966026@cj42289-a.reston1.va.home.com> Gregor Hoffleit writes: > looking forward to the maintenance releases ;-) I hope 2.1.1 will make it > out in time as well for our next release (being aware that 'before the next Personally, I see no reason for Thomas to wait for the 2.0.1 release if he doesn't want to. -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From fredrik at pythonware.com Wed Jun 13 18:32:13 2001 From: fredrik at pythonware.com (Fredrik Lundh) Date: Wed, 13 Jun 2001 18:32:13 +0200 Subject: [Python-Dev] sre improvements References: <20010613125408.W13940@tux.distro.conectiva> <15143.36590.447465.657241@beluga.mojam.com> Message-ID: <007801c0f426$84d1f220$4ffa42d5@hagrid> skip wrote: > Gustavo> I'd like, for example, to see the (?(1)blah) operator, > Gustavo> available in perl, working. > > For the non-Perl-heads on the list, can you explain what the (?(1)blah) > operator does? conditionals: (?(cond)true) (?(cond)true|false) where cond is a group number (true if defined) or an assertion pattern, and true/false are patterns. (imo, whoever invented that needs help ;-) From akuchlin at mems-exchange.org Wed Jun 13 18:39:58 2001 From: akuchlin at mems-exchange.org (Andrew Kuchling) Date: Wed, 13 Jun 2001 12:39:58 -0400 Subject: [Python-Dev] sre improvements Message-ID: >For the non-Perl-heads on the list, can you explain what the (?(1)blah) >operator does? Conditionals. From http://www.perl.com/pub/doc/manual/html/pod/perlre.html, (...)(?(1)A|B) will match 'A' if group 1 matched, and B if it didn't. I'm not sure how "matched" is defined, as the Perl docs are vague; judging from the example, it means 'matched something of nonzero length'. Perl 5.6 introduced a bunch of new regex features, but I'm not sure how much we actually *care* about them; they're no doubt useful if regexes are the only tool you've got and you try to do full parsers using them, but they're also complicated to explain and will make the compiler messier. For example, lookaheads can also go into the conditional, not just an integer. (?i) now obeys the scoping from parens, and you can turn it off with (?-i). If Gustavo wants to implement these features and /F approves of his patches, then sure, put them in. But if either of those conditions fails, little will be lost. --amk From dmitry.antipov at auriga.ru Wed Jun 13 18:46:09 2001 From: dmitry.antipov at auriga.ru (dmitry.antipov at auriga.ru) Date: Wed, 13 Jun 2001 20:46:09 +0400 Subject: [Python-Dev] Why not Lisp-like list-related functions ? Message-ID: <3B2798D1.16F832A3@auriga.ru> Hello all, I'm new to Python but quite familiar with Lisp. 
So my question is about Python list-related functions. Why append(), extend(), sort(), reverse() etc. doesn't return a reference to it's own (modified) argument ? IMHO (I'm tweaking Python 2.1 to allow first example possible), >>> [3 + x * 2 for x in [5, 8, 9, 3].sort()].extend([6, 3, 8].reverse()) [9, 13, 19, 21, 8, 3, 6] >>> looks much better (and more "functional") than >>> x = [5, 8, 9, 3] >>> x.sort() >>> x = [3 + x * 2 for x in x] >>> y = [6, 3, 8] >>> y.reverse() >>> x.extend(y) >>> x [9, 13, 19, 21, 8, 3, 6] >>> Python designers and fans, please explain it to me :-). Any comments are welcome. Thanks and reply to me directly if possible, Dmitry Antipov From guido at digicool.com Wed Jun 13 19:01:34 2001 From: guido at digicool.com (Guido van Rossum) Date: Wed, 13 Jun 2001 13:01:34 -0400 Subject: [Python-Dev] Weird message to stderr Message-ID: <200106131701.NAA17619@cj20424-a.reston1.va.home.com> > Running Python 2.1 using a .pyc file I get these weird messages > printed to stderr: > > run_pyc_file: nested_scopes: 0 > > These originate in pythonrun.c: > > static PyObject * > run_pyc_file(FILE *fp, char *filename, PyObject *globals, PyObject *locals, > PyCompilerFlags *flags) > { [...] > if (v && flags) { > if (co->co_flags & CO_NESTED) > flags->cf_nested_scopes = 1; > fprintf(stderr, "run_pyc_file: nested_scopes: %d\n", > flags->cf_nested_scopes); > } > Py_DECREF(co); > return v; > } > > Is this is left over debug printf or should I be warned > in some way ? I'll channel Jeremy... Looks like a debug message -- this code isn't tested by the standard test suite. Feel free to get rid of the fprintf() statement (and no, you don't have to write a PEP for this :-). --Guido van Rossum (home page: http://www.python.org/~guido/) From fredrik at pythonware.com Wed Jun 13 19:06:52 2001 From: fredrik at pythonware.com (Fredrik Lundh) Date: Wed, 13 Jun 2001 19:06:52 +0200 Subject: [Python-Dev] Why not Lisp-like list-related functions ? References: <3B2798D1.16F832A3@auriga.ru> Message-ID: <012d01c0f42b$45453b30$4ffa42d5@hagrid> Dmitry wrote: > I'm new to Python but quite familiar with Lisp. So my question is > about Python list-related functions. Why append(), extend(), sort(), > reverse() etc. doesn't return a reference to it's own (modified) > argument ? doesn't Lisp have a FAQ? ;-) http://www.python.org/doc/FAQ.html#6.20 Q. Why doesn't list.sort() return the sorted list? ... basically, operations that modify an object generally don't return the object itself, to avoid mistakes like: for item in list.reverse(): print item # backwards ... for item in list.reverse(): print item # backwards, or? a slightly more pythonic way would be to add sorted, extended, reversed (etc) -- but that leads to method bloat. in addition, based on studying huge amounts of python code, I doubt cascading list operations would save the world that much typing... followups to python-list at python.org From paulp at ActiveState.com Wed Jun 13 19:22:09 2001 From: paulp at ActiveState.com (Paul Prescod) Date: Wed, 13 Jun 2001 10:22:09 -0700 Subject: [Python-Dev] Adding Asian codecs to the core References: <20010612234815.2C90599C82@waltz.rahul.net> <3B26B491.CA8536BD@ActiveState.com> <3B2710BB.CFD8215@lemburg.com> Message-ID: <3B27A141.6C69EC55@ActiveState.com> "M.-A. Lemburg" wrote: > >... > > > > We really shouldn't consider the Japanese without Chinese and Korean. > > And those both seem *larger* than the Japanese. :( > > Unfortunately, these aren't available under a usable (=non-GPL) > license yet. 
Frank Chen has agreed to make them available under a Python-style license. > > What if we add them to CVS and formally maintain them as part of the > > core but distribute them as a separate download? > > Good idea. All in favour? -- Take a recipe. Leave a recipe. Python Cookbook! http://www.ActiveState.com/pythoncookbook From aahz at rahul.net Wed Jun 13 19:32:24 2001 From: aahz at rahul.net (Aahz Maruch) Date: Wed, 13 Jun 2001 10:32:24 -0700 (PDT) Subject: [Python-Dev] Adding Asian codecs to the core In-Reply-To: <3B27A141.6C69EC55@ActiveState.com> from "Paul Prescod" at Jun 13, 2001 10:22:09 AM Message-ID: <20010613173224.0FFB999C87@waltz.rahul.net> >>> What if we add them to CVS and formally maintain them as part of the >>> core but distribute them as a separate download? >> >> Good idea. > > All in favour? +1 -- --- Aahz (@pobox.com) Hugs and backrubs -- I break Rule 6 <*> http://www.rahul.net/aahz/ Androgynous poly kinky vanilla queer het Pythonista I don't really mind a person having the last whine, but I do mind someone else having the last self-righteous whine. From gward at python.net Wed Jun 13 20:53:20 2001 From: gward at python.net (Greg Ward) Date: Wed, 13 Jun 2001 14:53:20 -0400 Subject: [Python-Dev] sre improvements In-Reply-To: <007801c0f426$84d1f220$4ffa42d5@hagrid>; from fredrik@pythonware.com on Wed, Jun 13, 2001 at 06:32:13PM +0200 References: <20010613125408.W13940@tux.distro.conectiva> <15143.36590.447465.657241@beluga.mojam.com> <007801c0f426$84d1f220$4ffa42d5@hagrid> Message-ID: <20010613145320.G5114@gerg.ca> On 13 June 2001, Fredrik Lundh said: > conditionals: > > (?(cond)true) > (?(cond)true|false) > > where cond is a group number (true if defined) or an assertion > pattern, and true/false are patterns. > > (imo, whoever invented that needs help ;-) I think I'd have to agree with /F on this one... somewhere around Perl 5.003 or 5.004, regexes in Perl went from being a powerful and really cool facility to being a massively overgrown language-within-a-language. I *tried* to use some of the fancy new features a few times out of curiosity, but could never get them to work. (At the time, I think I was a pretty sharp Perl programmer, although I've dulled since then.) Greg -- Greg Ward - Unix bigot gward at python.net http://starship.python.net/~gward/ No animals were harmed in transmitting this message. From jepler at inetnebr.com Wed Jun 13 18:09:58 2001 From: jepler at inetnebr.com (Jeff Epler) Date: Wed, 13 Jun 2001 11:09:58 -0500 Subject: [Python-Dev] sre improvements In-Reply-To: <15143.36590.447465.657241@beluga.mojam.com>; from skip@pobox.com on Wed, Jun 13, 2001 at 11:03:58AM -0500 References: <20010613125408.W13940@tux.distro.conectiva> <15143.36590.447465.657241@beluga.mojam.com> Message-ID: <20010613110957.C29405@inetnebr.com> On Wed, Jun 13, 2001 at 11:03:58AM -0500, Skip Montanaro wrote: > > Gustavo> I'd like, for example, to see the (?(1)blah) operator, > Gustavo> available in perl, working. > > Gustavo, > > For the non-Perl-heads on the list, can you explain what the (?(1)blah) > operator does? from perlre(1): (?(condition)yes-pattern) Conditional expression. (condition) should be either an integer in parentheses (which is valid if the corresponding pair of parentheses matched), or lookahead/lookbehind/evaluate zero- width assertion. Say, m{ ( \( )? [^()]+ (?(1) \) ) }x matches a chunk of non-parentheses, possibly included in parentheses themselves. 
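Transliterated into Python terms, usage might look like the following
(a hypothetical sketch only -- sre does not support (?(...)...) today,
so this just mirrors the Perl semantics quoted above):

    import re

    # The perlre example: a chunk of non-parentheses, possibly wrapped
    # in parentheses. The closing ")" is required only if the opening
    # "(" (group 1) actually matched.
    p = re.compile(r'(\()?[^()]+(?(1)\))')

    p.match('(stuff)')   # matches: group 1 defined, so ")" is required
    p.match('stuff')     # matches: group 1 undefined, ")" not needed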
Jeff From tim.one at home.com Thu Jun 14 08:12:48 2001 From: tim.one at home.com (Tim Peters) Date: Thu, 14 Jun 2001 02:12:48 -0400 Subject: [Python-Dev] Adding .decode() method to Unicode In-Reply-To: <3B2664AD.B560D685@ActiveState.com> Message-ID: [Paul Prescod] > ... > We could argue angels on the head of a pin until the cows come home but > 90% of all Python users think of 8-bit strings as strings of characters. Actually, if you count me, make that 92%. some-things-were-easier-when-python-had-50-users-and-i-was-two- of-them-ly y'rs - tim From paulp at ActiveState.com Thu Jun 14 09:30:19 2001 From: paulp at ActiveState.com (Paul Prescod) Date: Thu, 14 Jun 2001 00:30:19 -0700 Subject: [Python-Dev] sre improvements References: <20010613125408.W13940@tux.distro.conectiva> <15143.36590.447465.657241@beluga.mojam.com> <007801c0f426$84d1f220$4ffa42d5@hagrid> <20010613145320.G5114@gerg.ca> Message-ID: <3B28680B.A46CF171@ActiveState.com> Greg Ward wrote: > >... > > I think I'd have to agree with /F on this one... somewhere around Perl > 5.003 or 5.004, regexes in Perl went from being a powerful and really > cool facility to being a massively overgrown language-within-a-language. > I *tried* to use some of the fancy new features a few times out of > curiosity, but could never get them to work. (At the time, I think I > was a pretty sharp Perl programmer, although I've dulled since then.) I would rather see us try a new approach to regular expressions. I've seen a few proposals for more verbose-but-readable syntaxes. I think one was from Greg Ewing? And maybe one from Ping? For those of us who use regular expressions only once in a while (i.e. the lucky ones), the current syntax is a holy terror. Which characters are magical again? In what contexts? With how many levels of backslashing? Upper case W versus lower case W? Obviously we can never abandon the tried and true Perl5 RE module, but I think we could have another syntax on top. -- Take a recipe. Leave a recipe. Python Cookbook! http://www.ActiveState.com/pythoncookbook From arigo at ulb.ac.be Thu Jun 14 10:58:48 2001 From: arigo at ulb.ac.be (Armin Rigo) Date: Thu, 14 Jun 2001 10:58:48 +0200 (MET DST) Subject: [Python-Dev] Special-casing "O" Message-ID: Hello everybody, For comparison purposes, I implemented the idea of optimizing PyArg_ParseTuple calls by modifying the C code itself. Here is the result: http://homepages.ulb.ac.be/~arigo/pyarg_pp.tgz I did not upload this as a patch at SourceForge for several reasons. The most fundamental is that it raises bootstrapping issues: how can we compile the Python interpreter if we first have to run a Python script on the source files ? Fixing this would make the Makefiles significantly more complex. The other reason is that the METH_O solution is probably still faster, as it often completely avoids to build the 1-tuple of arguments. More serious performance tests might be needed, however. A bientot, Armin. From thomas at xs4all.net Thu Jun 14 13:10:01 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Thu, 14 Jun 2001 13:10:01 +0200 Subject: [Python-Dev] Releasing 2.0.1 In-Reply-To: <200106131519.LAA16978@cj20424-a.reston1.va.home.com> References: <200106131519.LAA16978@cj20424-a.reston1.va.home.com> Message-ID: <20010614131001.B1659@xs4all.nl> On Wed, Jun 13, 2001 at 11:19:03AM -0400, Guido van Rossum wrote: > So I think it's good to go now. I can release a 2.0.1c1 this week > (indicating a release candidate) and a final 2.0.1 next week. +1 here. 
> If you know a good reason why I should hold off on releasing this, or > if you have a patch that absolutely should make it into 2.0.1, please > let me know NOW! This project is way overdue. (Thomas is ready to > release 2.1.1 as soon as this goes out, I believe. :-) Well, not quite, but I can put in a couple of allnighters (I want to do a review of all log-messages since 2.1-final, to see if I missed any checkin messages, and I want to update the NEWS file with a list of bugs fixed) and have it ready in a week or two. I don't think 2.1.1 should be released *that* soon after 2.0.1 anyway. I noticed this in the LICENCE file, by the way: Python 2.1 is a derivative work of Python 1.6.1, as well as of Python 2.0. and 8. By copying, installing or otherwise using Python 2.1, Licensee agrees to be bound by the terms and conditions of this License Agreement. -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From guido at digicool.com Thu Jun 14 13:14:22 2001 From: guido at digicool.com (Guido van Rossum) Date: Thu, 14 Jun 2001 07:14:22 -0400 Subject: [Python-Dev] Re: Why not Lisp-like list-related functions ? Message-ID: <200106141114.HAA25430@cj20424-a.reston1.va.home.com> > Hello all, > > I'm new to Python but quite familiar with Lisp. So my question is > about Python list-related functions. Why append(), extend(), sort(), > reverse() etc. doesn't return a reference to it's own (modified) > argument ? IMHO (I'm tweaking Python 2.1 to allow first example > possible), > > >>> [3 + x * 2 for x in [5, 8, 9, 3].sort()].extend([6, 3, 8].reverse()) > [9, 13, 19, 21, 8, 3, 6] > >>> > > looks much better (and more "functional") than > > >>> x = [5, 8, 9, 3] > >>> x.sort() > >>> x = [3 + x * 2 for x in x] > >>> y = [6, 3, 8] > >>> y.reverse() > >>> x.extend(y) > >>> x > [9, 13, 19, 21, 8, 3, 6] > >>> > > Python designers and fans, please explain it to me :-). > Any comments are welcome. > > Thanks and reply to me directly if possible, > Dmitry Antipov Funny, to me your first form is much harder to read than your second. With the first form, I have to stop and think and look carefully at where the brackets are to see in which order the operations are executed, while in the second form it's obvious, because it's broken down in smaller chunks. So I guess that's the real reason: Python users have a procedural brain, not a functional brain, and we don't like Lispish code. Maybe we also have a smaller brain than the typical Lisper -- I would say, that would make us more normal, and if Python caters to people with a closer-to-average brain size, that would mean more people will be able to program in Python. History will decide... --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at digicool.com Thu Jun 14 13:31:16 2001 From: guido at digicool.com (Guido van Rossum) Date: Thu, 14 Jun 2001 07:31:16 -0400 Subject: [Python-Dev] Adding Asian codecs to the core Message-ID: <200106141131.HAA25522@cj20424-a.reston1.va.home.com> > > > What if we add them to CVS and formally maintain them as part of the > > > core but distribute them as a separate download? > > > > Good idea. > > All in favour? +1, as long as they're not in the CVS subtree that's normally extracted for a regular source distribution. I propose this location in the CVS tree: python/dist/encodings/... (So 'encodings' would be a sibling of 'src', which has been pretty lonely ever since I started using CVS. 
;-) --Guido van Rossum (home page: http://www.python.org/~guido/) From akuchlin at mems-exchange.org Thu Jun 14 17:19:28 2001 From: akuchlin at mems-exchange.org (Andrew Kuchling) Date: Thu, 14 Jun 2001 11:19:28 -0400 Subject: [Python-Dev] Re: Why not Lisp-like list-related functions ? In-Reply-To: <200106141114.HAA25430@cj20424-a.reston1.va.home.com>; from guido@digicool.com on Thu, Jun 14, 2001 at 07:14:22AM -0400 References: <200106141114.HAA25430@cj20424-a.reston1.va.home.com> Message-ID: <20010614111928.A4560@ute.cnri.reston.va.us> On Thu, Jun 14, 2001 at 07:14:22AM -0400, Guido van Rossum wrote: >Maybe we also have a smaller brain than the typical Lisper -- I would >say, that would make us more normal, and if Python caters to people >with a closer-to-average brain size, that would mean more people will >be able to program in Python. History will decide... I thought it already has, pretty much. --amk From tim at digicool.com Thu Jun 14 18:49:07 2001 From: tim at digicool.com (Tim Peters) Date: Thu, 14 Jun 2001 12:49:07 -0400 Subject: [Python-Dev] PEP 255: Simple Generators Message-ID: You can view an HTML version of PEP 255 here: http://python.sourceforge.net/peps/pep-0255.html Discussion should take place primarily on the Python Iterators list: mailto:python-iterators at lists.sourceforge.net If replying directly to this message, please remove (at least) Python-Dev and Python-Announce. PEP: 255 Title: Simple Generators Version: $Revision: 1.3 $ Author: nas at python.ca (Neil Schemenauer), tim.one at home.com (Tim Peters), magnus at hetland.org (Magnus Lie Hetland) Discussion-To: python-iterators at lists.sourceforge.net Status: Draft Type: Standards Track Requires: 234 Created: 18-May-2001 Python-Version: 2.2 Post-History: 14-Jun-2001 Abstract This PEP introduces the concept of generators to Python, as well as a new statement used in conjunction with them, the "yield" statement. Motivation When a producer function has a hard enough job that it requires maintaining state between values produced, most programming languages offer no pleasant and efficient solution beyond adding a callback function to the producer's argument list, to be called with each value produced. For example, tokenize.py in the standard library takes this approach: the caller must pass a "tokeneater" function to tokenize(), called whenever tokenize() finds the next token. This allows tokenize to be coded in a natural way, but programs calling tokenize are typically convoluted by the need to remember between callbacks which token(s) were seen last. The tokeneater function in tabnanny.py is a good example of that, maintaining a state machine in global variables, to remember across callbacks what it has already seen and what it hopes to see next. This was difficult to get working correctly, and is still difficult for people to understand. Unfortunately, that's typical of this approach. An alternative would have been for tokenize to produce an entire parse of the Python program at once, in a large list. Then tokenize clients could be written in a natural way, using local variables and local control flow (such as loops and nested if statements) to keep track of their state. 
But this isn't practical: programs can be very large, so no a priori bound can be placed on the memory needed to materialize the whole parse; and some tokenize clients only want to see whether something specific appears early in the program (e.g., a future statement, or, as is done in IDLE, just the first indented statement), and then parsing the whole program first is a severe waste of time. Another alternative would be to make tokenize an iterator[1], delivering the next token whenever its .next() method is invoked. This is pleasant for the caller in the same way a large list of results would be, but without the memory and "what if I want to get out early?" drawbacks. However, this shifts the burden on tokenize to remember *its* state between .next() invocations, and the reader need only glance at tokenize.tokenize_loop() to realize what a horrid chore that would be. Or picture a recursive algorithm for producing the nodes of a general tree structure: to cast that into an iterator framework requires removing the recursion manually and maintaining the state of the traversal by hand. A fourth option is to run the producer and consumer in separate threads. This allows both to maintain their states in natural ways, and so is pleasant for both. Indeed, Demo/threads/Generator.py in the Python source distribution provides a usable synchronized-communication class for doing that in a general way. This doesn't work on platforms without threads, though, and is very slow on platforms that do (compared to what is achievable without threads). A final option is to use the Stackless[2][3] variant implementation of Python instead, which supports lightweight coroutines. This has much the same programmatic benefits as the thread option, but is much more efficient. However, Stackless is a controversial rethinking of the Python core, and it may not be possible for Jython to implement the same semantics. This PEP isn't the place to debate that, so suffice it to say here that generators provide a useful subset of Stackless functionality in a way that fits easily into the current CPython implementation, and is believed to be relatively straightforward for other Python implementations. That exhausts the current alternatives. Some other high-level languages provide pleasant solutions, notably iterators in Sather[4], which were inspired by iterators in CLU; and generators in Icon[5], a novel language where every expression "is a generator". There are differences among these, but the basic idea is the same: provide a kind of function that can return an intermediate result ("the next value") to its caller, but maintaining the function's local state so that the function can be resumed again right where it left off. A very simple example: def fib(): a, b = 0, 1 while 1: yield b a, b = b, a+b When fib() is first invoked, it sets a to 0 and b to 1, then yields b back to its caller. The caller sees 1. When fib is resumed, from its point of view the yield statement is really the same as, say, a print statement: fib continues after the yield with all local state intact. a and b then become 1 and 1, and fib loops back to the yield, yielding 1 to its invoker. And so on. From fib's point of view it's just delivering a sequence of results, as if via callback. But from its caller's point of view, the fib invocation is an iterable object that can be resumed at will. 
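To see the caller's side concretely, here is a minimal sketch of driving fib() by hand; .next() is the generator-iterator method described under "Specification" below, and each call resumes fib exactly where it left off:

        g = fib()
        print g.next()   # prints 1
        print g.next()   # prints 1
        print g.next()   # prints 2
        print g.next()   # prints 3

No callback, materialized list, or thread is involved: each .next() call runs fib just until its next yield.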
As in the thread approach, this allows both sides to be coded in the most natural ways; but unlike the thread approach, this can be done efficiently and on all platforms. Indeed, resuming a generator should be no more expensive than a function call. The same kind of approach applies to many producer/consumer functions. For example, tokenize.py could yield the next token instead of invoking a callback function with it as argument, and tokenize clients could iterate over the tokens in a natural way: a Python generator is a kind of Python iterator[1], but of an especially powerful kind. Specification A new statement is introduced: yield_stmt: "yield" expression_list "yield" is a new keyword, so a future statement[8] is needed to phase this in. [XXX spell this out] The yield statement may only be used inside functions. A function that contains a yield statement is called a generator function. When a generator function is called, the actual arguments are bound to function-local formal argument names in the usual way, but no code in the body of the function is executed. Instead a generator-iterator object is returned; this conforms to the iterator protocol[6], so in particular can be used in for-loops in a natural way. Note that when the intent is clear from context, the unqualified name "generator" may be used to refer either to a generator-function or a generator- iterator. Each time the .next() method of a generator-iterator is invoked, the code in the body of the generator-function is executed until a yield or return statement (see below) is encountered, or until the end of the body is reached. If a yield statement is encountered, the state of the function is frozen, and the value of expression_list is returned to .next()'s caller. By "frozen" we mean that all local state is retained, including the current bindings of local variables, the instruction pointer, and the internal evaluation stack: enough information is saved so that the next time .next() is invoked, the function can proceed exactly as if the yield statement were just another external call. A generator function can also contain return statements of the form: "return" Note that an expression_list is not allowed on return statements in the body of a generator (although, of course, they may appear in the bodies of non-generator functions nested within the generator). When a return statement is encountered, nothing is returned, but a StopIteration exception is raised, signalling that the iterator is exhausted. The same is true if control flows off the end of the function. Note that return means "I'm done, and have nothing interesting to return", for both generator functions and non-generator functions. Example # A binary tree class. class Tree: def __init__(self, label, left=None, right=None): self.label = label self.left = left self.right = right def __repr__(self, level=0, indent=" "): s = level*indent + `self.label` if self.left: s = s + "\n" + self.left.__repr__(level+1, indent) if self.right: s = s + "\n" + self.right.__repr__(level+1, indent) return s def __iter__(self): return inorder(self) # Create a Tree from a list. def tree(list): n = len(list) if n == 0: return [] i = n / 2 return Tree(list[i], tree(list[:i]), tree(list[i+1:])) # A recursive generator that generates Tree leaves in in-order. def inorder(t): if t: for x in inorder(t.left): yield x yield t.label for x in inorder(t.right): yield x # Show it off: create a tree. t = tree("ABCDEFGHIJKLMNOPQRSTUVWXYZ") # Print the nodes of the tree in in-order. 
for x in t: print x, print # A non-recursive generator. def inorder(node): stack = [] while node: while node.left: stack.append(node) node = node.left yield node.label while not node.right: try: node = stack.pop() except IndexError: return yield node.label node = node.right # Exercise the non-recursive generator. for x in t: print x, print Q & A Q. Why a new keyword? Why not a builtin function instead? A. Control flow is much better expressed via keyword in Python, and yield is a control construct. It's also believed that efficient implementation in Jython requires that the compiler be able to determine potential suspension points at compile-time, and a new keyword makes that easy. Reference Implementation A preliminary patch against the CVS Python source is available[7]. Footnotes and References [1] PEP 234, http://python.sf.net/peps/pep-0234.html [2] http://www.stackless.com/ [3] PEP 219, http://python.sf.net/peps/pep-0219.html [4] "Iteration Abstraction in Sather" Murer , Omohundro, Stoutamire and Szyperski http://www.icsi.berkeley.edu/~sather/Publications/toplas.html [5] http://www.cs.arizona.edu/icon/ [6] The concept of iterators is described in PEP 234 http://python.sf.net/peps/pep-0234.html [7] http://python.ca/nas/python/generator.diff [8] http://python.sf.net/peps/pep-0236.html Copyright This document has been placed in the public domain. From guido at digicool.com Thu Jun 14 19:30:42 2001 From: guido at digicool.com (Guido van Rossum) Date: Thu, 14 Jun 2001 13:30:42 -0400 Subject: [Python-Dev] Python 2.0.1c1 - GPL-compatible release candidate Message-ID: <200106141730.f5EHUgX03621@odiug.digicool.com> With a sigh of relief I announce Python 2.0.1c1 -- the first Python release in a long time whose license is fully compatible with the GPL: http://www.python.org/2.0.1/ I thank Moshe Zadka who did almost all of the work to make this a useful bugfix release, and then went incommunicado for several weeks. (I hope you're OK, Moshe!) Note that this is a release candidate. We don't expect any problems, but we're being careful nevertheless. We're planning to do the final release of 2.0.1 a week from now; expect it to be identical to the release candidate except for some dotted i's and crossed t's. Python 2.0 users should be able to replace their 2.0 installation with the 2.0.1 release without any ill effects; apart from the license change, we've only fixed bugs that didn't require us to make feature changes. The SRE package (regular expression matching, used by the "re" module) was brought in line with the version distributed with Python 2.1; this is stable feature-wise but much improved bug-wise. For the full scoop, see the release notes on SourceForge: http://sourceforge.net/project/shownotes.php?release_id=39267 Python 2.1 users can ignore this release, unless they have an urgent need for a GPL-compatible Python version and are willing to downgrade. Rest assured that we're planning a bugfix release there too: I expect that Python 2.1.1 will be released within a month, with the same GPL-compatible license. (Right, Thomas?) We don't intend to build RPMs for 2.0.1. If someone else is interested in doing so, we can link to them. 
--Guido van Rossum (home page: http://www.python.org/~guido/) From fredrik at pythonware.com Thu Jun 14 13:46:25 2001 From: fredrik at pythonware.com (Fredrik Lundh) Date: Thu, 14 Jun 2001 13:46:25 +0200 Subject: [Python-Dev] recognizing \u escapes in regular expressions Message-ID: <02db01c0f4c7$a491c620$0900a8c0@spiff> during a late hacking pass, I was perplexed to realize that r"[\u0000-\uffff]" didn't match any unicode character, and reported it as bug #420011. but a few minutes later, I realized that SRE doesn't support \u and \U escapes at all -- and that the pattern u"[\u0000-\uffff]" works as expected. should I close the bug report, or turn it into a feature request? From fredrik at pythonware.com Thu Jun 14 13:52:26 2001 From: fredrik at pythonware.com (Fredrik Lundh) Date: Thu, 14 Jun 2001 13:52:26 +0200 Subject: [Python-Dev] Adding Asian codecs to the core References: <20010612234815.2C90599C82@waltz.rahul.net> <3B26B491.CA8536BD@ActiveState.com> <3B2710BB.CFD8215@lemburg.com> <3B27A141.6C69EC55@ActiveState.com> Message-ID: <02ef01c0f4c8$7bdc8520$0900a8c0@spiff> Paul wrote: > > > What if we add them to CVS and formally maintain them as part of the > > > core but distribute them as a separate download? > > > > Good idea. > > All in favour? +0.5 I still think adding them to the core is okay, but that's me. Cheers /F From gward at python.net Thu Jun 14 22:11:49 2001 From: gward at python.net (Greg Ward) Date: Thu, 14 Jun 2001 16:11:49 -0400 Subject: [Python-Dev] sre improvements In-Reply-To: <3B28680B.A46CF171@ActiveState.com>; from paulp@ActiveState.com on Thu, Jun 14, 2001 at 12:30:19AM -0700 References: <20010613125408.W13940@tux.distro.conectiva> <15143.36590.447465.657241@beluga.mojam.com> <007801c0f426$84d1f220$4ffa42d5@hagrid> <20010613145320.G5114@gerg.ca> <3B28680B.A46CF171@ActiveState.com> Message-ID: <20010614161149.C9884@gerg.ca> On 14 June 2001, Paul Prescod said: > I would rather see us try a new approach to regular expressions. I've > seen a few proposals for more verbose-but-readable syntaxes. I think one > was from Greg Ewing? And maybe one from Ping? I remember Ping's from a few years back. It was pretty cool, but awfully verbose. I *like* the compactness of the One True Regex Language (ie. the one implemented by Perl 5, PCRE, and SRE). > For those of us who use regular expressions only once in a while (i.e. > the lucky ones), the current syntax is a holy terror. Which characters > are magical again? In what contexts? With how many levels of > backslashing? Upper case W versus lower case W? Wow, you should try keeping grep vs. egrep vs. sed vs. awk (which version again?) vs. emacs straight. I generally don't bother: as soon as a problem gets too hairy for grep/sed/awk/etc., I whip out my trusty old friend "perl -e" and all is well again. Unless I'm already coding in Python of course, in which case I whip out my trusty old friend re.compile(), and everything just works. I guess I just have a good memory for line noise. > Obviously we can never abandon the tried and true Perl5 RE module, but I > think we could have another syntax on top. Yeah, I s'pose it could be useful. Yet another great teaching tool, at any rate. Greg -- Greg Ward - Python bigot gward at python.net http://starship.python.net/~gward/ Quick!! Act as if nothing has happened!
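An aside on the verbosity debate in this thread: sre already supports a middle ground via the re.VERBOSE compile flag, which lets whitespace and comments be embedded in the One True syntax. A small illustration, with a pattern made up for the example:

    import re

    # Compact form:
    compact = re.compile(r"(\d+)\.(\d*)")

    # Same pattern, annotated, via re.VERBOSE:
    verbose = re.compile(r"""
        (\d+)    # integral part
        \.       # literal decimal point
        (\d*)    # optional fractional digits
    """, re.VERBOSE)

    assert compact.match("3.14").groups() == verbose.match("3.14").groups()

This doesn't change which characters are magical, of course; it only makes larger patterns easier to annotate.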
From greg at cosc.canterbury.ac.nz Fri Jun 15 02:56:50 2001 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Fri, 15 Jun 2001 12:56:50 +1200 (NZST) Subject: [Python-Dev] sre improvements In-Reply-To: <20010614161149.C9884@gerg.ca> Message-ID: <200106150056.MAA03621@s454.cosc.canterbury.ac.nz> Paul Prescod: > I think one > was from Greg Ewing? And maybe one from Ping? I can't remember what my first proposal (many years ago now) was like, but you might like to look at what I'm using in my Plex module: http://www.cosc.canterbury.ac.nz/~greg/python/Plex Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg at cosc.canterbury.ac.nz +--------------------------------------+ From paulp at ActiveState.com Fri Jun 15 03:36:13 2001 From: paulp at ActiveState.com (Paul Prescod) Date: Thu, 14 Jun 2001 18:36:13 -0700 Subject: [Python-Dev] sre improvements References: <200106150056.MAA03621@s454.cosc.canterbury.ac.nz> Message-ID: <3B29668D.ADFB3C22@ActiveState.com> Greg Ewing wrote: > > Paul Prescod: > > > I think one > > was from Greg Ewing? And maybe one from Ping? > > I can't remember what my first proposal (many years ago > now) was like, but you might like to look at what I'm > using in my Plex module: > > http://www.cosc.canterbury.ac.nz/~greg/python/Plex I would be interested in *both* your regular expression library and your lexer for the Python standard library. But separately. Maybe we need two short PEPs that point to the documentation and suggest how the two packages could be integrated into the standard library. What do you think? -- Take a recipe. Leave a recipe. Python Cookbook! http://www.ActiveState.com/pythoncookbook From greg at cosc.canterbury.ac.nz Fri Jun 15 03:49:04 2001 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Fri, 15 Jun 2001 13:49:04 +1200 (NZST) Subject: [Python-Dev] sre improvements In-Reply-To: <3B29668D.ADFB3C22@ActiveState.com> Message-ID: <200106150149.NAA03631@s454.cosc.canterbury.ac.nz> > I would be interested in *both* your regular expression library and your > lexer for the Python standard library. But separately. Well, the regular expressions aren't really a separable part of Plex. I mentioned it as a possible source of ideas for anyone working on a new syntax for the regexp stuff. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg at cosc.canterbury.ac.nz +--------------------------------------+ From mal at lemburg.com Fri Jun 15 09:58:47 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Fri, 15 Jun 2001 09:58:47 +0200 Subject: [Python-Dev] Adding Asian codecs to the core References: <20010612234815.2C90599C82@waltz.rahul.net> <3B26B491.CA8536BD@ActiveState.com> <3B2710BB.CFD8215@lemburg.com> <3B27A141.6C69EC55@ActiveState.com> <02ef01c0f4c8$7bdc8520$0900a8c0@spiff> Message-ID: <3B29C037.FB1DB6B8@lemburg.com> Fredrik Lundh wrote: > > Paul wrote: > > > > > What if we add them to CVS and formally maintain them as part of the > > > > core but distribute them as a separate download? > > > > > > Good idea. > > > > All in favour? > > +0.5 > > I still think adding them to the core is okay, but that's me. What would be the threshold for doing so ? 
Tamito is actively working on reducing the table sizes of the codecs and after what I have seen you do on these sorts of tables I am pretty sure Tamito can turn these tables into shared libs which are smaller than 200k. -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From MarkH at ActiveState.com Fri Jun 15 10:05:26 2001 From: MarkH at ActiveState.com (Mark Hammond) Date: Fri, 15 Jun 2001 18:05:26 +1000 Subject: [Python-Dev] Adding Asian codecs to the core In-Reply-To: <3B29C037.FB1DB6B8@lemburg.com> Message-ID: > > I still think adding them to the core is okay, but that's me. > > What would be the threshold for doing so ? > > Tamito is actively working on reducing the table sizes of the the > codecs and after what I have seen you do on these sort of tables I > am pretty sure Tamito can turn these tables into shared libs which are > smaller than 200k. But isn't this set only one of the many possible Asian codecs? I would have no objection to one 200k module, but if we really wanted to handle "asian codecs" I believe this is only the start. For this reason, I would give a -0 to adding these to the core, and a +1 to adding them to the directory structure proposed by Guido. Mark. From guido at digicool.com Fri Jun 15 18:59:40 2001 From: guido at digicool.com (Guido van Rossum) Date: Fri, 15 Jun 2001 12:59:40 -0400 Subject: [Python-Dev] recognizing \u escapes in regular expressions Message-ID: <200106151659.MAA30396@cj20424-a.reston1.va.home.com> > during a late hacking pass, I was perplexed to realized that > r"[\u0000-\uffff]" didn't match any unicode character, and reported > it as bug #420011. > > but a few minutes later, I realized that SRE doesn't support \u and > \U escapes at all -- and that the pattern u"[\u0000-\uffff]" works > as expected. > > should I close the bug report, or turn it into a feature request? > > You meant ur"[\u0000-\uffff]", right? (It works the same -- Unicode raw strings still do \u expansion, although the rationale escapes me at the moment -- as does the rationale for why ru"..." is a syntax error...) Looks like a feature request to me. Since \000 and \x00 work in that context, \u0000 would be expected to work. And suppose someone uses u"[\u0000-\u005d]"... --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at digicool.com Fri Jun 15 21:00:26 2001 From: guido at digicool.com (Guido van Rossum) Date: Fri, 15 Jun 2001 15:00:26 -0400 Subject: [Python-Dev] gen-branch: CVS branch for Neil's generator patch Message-ID: <200106151900.PAA31935@cj20424-a.reston1.va.home.com> I've checked Neil's latest generator patch into a branch of the CVS tree. That makes it (hopefully) easier for folks to play with. Tim, can you update the PEP to point to this branch? (There's some boilerplate code about branches in PEP 252 or 253 that you could adapt.) I had to change the code in ceval.c because of recent conflicting changes there. The test suite runs (except test_inspect), but I'd appreciate it if someone (Neil?) could make sure that I didn't overlook anything. (I should probably check the CVS logs. :-) --Guido van Rossum (home page: http://www.python.org/~guido/) PS. If you saw a checkin of Grammar/Grammar in the *head* branch, that was a mistake, and I've already corrected it.
From paulp at ActiveState.com Fri Jun 15 21:19:08 2001 From: paulp at ActiveState.com (Paul Prescod) Date: Fri, 15 Jun 2001 12:19:08 -0700 Subject: [Python-Dev] Adding Asian codecs to the core References: <20010612234815.2C90599C82@waltz.rahul.net> <3B26B491.CA8536BD@ActiveState.com> <3B2710BB.CFD8215@lemburg.com> <3B27A141.6C69EC55@ActiveState.com> <02ef01c0f4c8$7bdc8520$0900a8c0@spiff> <3B29C037.FB1DB6B8@lemburg.com> Message-ID: <3B2A5FAC.C5089CC2@ActiveState.com> "M.-A. Lemburg" wrote: > >... > > What would be the threshold for doing so ? > > Tamito is actively working on reducing the table sizes of the the > codecs and after what I have seen you do on these sort of tables I > am pretty sure Tamito can turn these tables into shared libs which are > smaller than 200k. Don't forget Chinese (Taiwan and mainland) and Korean! I guess I don't see the big deal in making them separate downloads. We can use distutils to make them easy to install .exe's for Reference Python and PPM for ActivePython. -- Take a recipe. Leave a recipe. Python Cookbook! http://www.ActiveState.com/pythoncookbook From mal at lemburg.com Fri Jun 15 22:05:47 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Fri, 15 Jun 2001 22:05:47 +0200 Subject: [Python-Dev] Adding Asian codecs to the core References: <20010612234815.2C90599C82@waltz.rahul.net> <3B26B491.CA8536BD@ActiveState.com> <3B2710BB.CFD8215@lemburg.com> <3B27A141.6C69EC55@ActiveState.com> <02ef01c0f4c8$7bdc8520$0900a8c0@spiff> <3B29C037.FB1DB6B8@lemburg.com> <3B2A5FAC.C5089CC2@ActiveState.com> Message-ID: <3B2A6A9B.AC156262@lemburg.com> Paul Prescod wrote: > > "M.-A. Lemburg" wrote: > > > >... > > > > What would be the threshold for doing so ? > > > > Tamito is actively working on reducing the table sizes of the the > > codecs and after what I have seen you do on these sort of tables I > > am pretty sure Tamito can turn these tables into shared libs which are > > smaller than 200k. > > Don't forget Chinese (Taiwan and mainland) and Korean! > > I guess I don't see the big deal in making them separate downloads. We > can use distutils to make them easy to install .exe's for Reference > Python and PPM for ActivePython. Ok. BTW, how come www.python.org no longer provides precompiled (contributed) binaries for the various OSes out there ? The FTP server only has these for Python <= 1.5.2. -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From tim.one at home.com Fri Jun 15 23:39:42 2001 From: tim.one at home.com (Tim Peters) Date: Fri, 15 Jun 2001 17:39:42 -0400 Subject: [Python-Dev] gen-branch: CVS branch for Neil's generator patch In-Reply-To: <200106151900.PAA31935@cj20424-a.reston1.va.home.com> Message-ID: [Guido] > I've checked in Neil's latest generator patch into a branch of the CVS > tree. That makes it (hopefully) easier for folks to play with. It will for me, and I thank you. > Tim, can you update the PEP to point to this branch? Done. From martin at loewis.home.cs.tu-berlin.de Sat Jun 16 00:17:49 2001 From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Sat, 16 Jun 2001 00:17:49 +0200 Subject: [Python-Dev] recognizing \u escapes in regular expressions Message-ID: <200106152217.f5FMHnI01360@mira.informatik.hu-berlin.de> > should I close the bug report, or turn it into a feature request? I think the bug report can be closed. 
Myself, I found it sufficient that you can write normal \u escapes in strings, in particular as you can also use them in raw strings: >>> ur"Ha\u006Clo" u'Hallo' Perhaps not very intuitive, and perhaps even a bug (how do you put a backslash in front of a "u" in a raw unicode string), but useful in this context. Regards, Martin From guido at digicool.com Sat Jun 16 17:46:14 2001 From: guido at digicool.com (Guido van Rossum) Date: Sat, 16 Jun 2001 11:46:14 -0400 Subject: [Python-Dev] 2.0.1's GPL-compatibility is official! Message-ID: <200106161546.LAA05521@cj20424-a.reston1.va.home.com> Richard Stallman, Eben Moglen and the FSF agree: Python 2.0.1 is compatible with the GPL. They've updated the text about the Python license on http://www.gnu.org/philosophy/license-list.html, stating in particular: GPL-Compatible, Free Software Licenses [...] The License of Python 1.6a2 and earlier versions. This is a free software license and is compatible with the GNU GPL. Please note, however, that newer versions of Python are under other licenses (see below). The License of Python 2.0.1, 2.1.1, and newer versions. This is a free software license and is compatible with the GNU GPL. Please note, however, that intermediate versions of Python (1.6b1, through 2.0 and 2.1) are under a different license (see below). I would like to emphasize and clarify (again!) that Python is *not* released under the GPL, so if you think the GPL is a bad thing, you don't have to worry about Python being contaminated. The GPL compatibility is important for folks who distribute Python binaries: e.g. the new license makes it okay to release Python binaries linked with GNU readline and other GPL-covered libraries. We'll release the final release of 2.0.1 within a week; so far we've had only one bug reported in the release candidate. I expect that we won't have to wait long for 2.1.1, which will have the same GPL-compatible license as 2.0.1. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at digicool.com Sat Jun 16 18:10:27 2001 From: guido at digicool.com (Guido van Rossum) Date: Sat, 16 Jun 2001 12:10:27 -0400 Subject: [Python-Dev] contributed binaries (was: Adding Asian codecs...) Message-ID: <200106161610.MAA05684@cj20424-a.reston1.va.home.com> > BTW, how come www.python.org no longer provides precompiled > (contributed) binaries for the various OSes out there ? > The FTP server only has these for Python <= 1.5.2. There are some binaries for newer versions, mostly Linux RPMs, but these are in different places. I agree the FTP download area is a mess. I propose to give up on the FTP area and start over on the new Zope-based web server, if and when it's ready. Not enough people are helping out, so it's going slowly. --Guido van Rossum (home page: http://www.python.org/~guido/) From mal at lemburg.com Sat Jun 16 20:59:52 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Sat, 16 Jun 2001 20:59:52 +0200 Subject: [Python-Dev] recognizing \u escapes in regular expressions References: <200106152217.f5FMHnI01360@mira.informatik.hu-berlin.de> Message-ID: <3B2BACA7.CDA96737@lemburg.com> "Martin v. Loewis" wrote: > > > should I close the bug report, or turn it into a feature request? > > I think the bug report can be closed. 
Myself, I found it sufficient > that you can write normal \u escapes in strings, in particular as you > can also use them in raw strings: > > >>> ur"Ha\u006Clo" > u'Hallo' > > Perhaps not very intuitive, and perhaps even a bug (how do you put a > backslash in front of a "u" in a raw unicode string), but useful in > this context. >>> print ur"backslash in front of an 'u': \u005cu" backslash in front of an 'u': \u A double backslash is easier to have: >>> print ur"double backslash in front of an 'u': \\u" double backslash in front of an 'u': \\u Python uses C's convention for \uXXXX where \u is only interpreted as a Unicode escape if it is used with an odd number of backslashes in front of it. -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From tim.one at home.com Mon Jun 18 02:57:53 2001 From: tim.one at home.com (Tim Peters) Date: Sun, 17 Jun 2001 20:57:53 -0400 Subject: [Python-Dev] Re: Why not Lisp-like list-related functions ? In-Reply-To: <20010614111928.A4560@ute.cnri.reston.va.us> Message-ID: [Guido] > Maybe we also have a smaller brain than the typical Lisper -- I would > say, that would make us more normal, and if Python caters to people > with a closer-to-average brain size, that would mean more people will > be able to program in Python. History will decide... [Andrew Kuchling] > I thought it already has, pretty much. OK, I've kept quiet for days, but can't bear it any longer: Andrew, are you waiting for someone to *force* you to immortalize this exchange in your Python Quotes collection? If so, the PSU knows where you liv From mal at lemburg.com Mon Jun 18 12:14:04 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Mon, 18 Jun 2001 12:14:04 +0200 Subject: [Python-Dev] Adding Asian codecs to the core References: <200106141131.HAA25522@cj20424-a.reston1.va.home.com> Message-ID: <3B2DD46C.EEC20857@lemburg.com> Guido van Rossum wrote: > > > > > What if we add them to CVS and formally maintain them as part of the > > > > core but distribute them as a separate download? > > > > > > Good idea. > > > > All in favour? > > +1, as long as they're not in the CVS subtree that's normally > extracted for a regular source distribution. I propose this location > in the CVS tree: > > python/dist/encodings/... > > (So 'encodings' would be a sibling of 'src', which has been pretty > lonely ever since I started using CVS. ;-) Ok. When Tamito has completed his work on the codecs (he is currently reimplementing them in C), I'll check them in under the new directory. BTW, how should we ship these codecs ? I'd propose to provide a distutils setup.py file which wraps up all codecs under encodings and can be used to create a standard Python add-on "Python-X.X Encoding Add-on". The generated files should then ideally be published right next to the Python source/binary links on the python.org web-pages to achieve high visibility. -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From guido at digicool.com Mon Jun 18 14:25:35 2001 From: guido at digicool.com (Guido van Rossum) Date: Mon, 18 Jun 2001 08:25:35 -0400 Subject: [Python-Dev] Adding Asian codecs to the core In-Reply-To: Your message of "Mon, 18 Jun 2001 12:14:04 +0200."
<3B2DD46C.EEC20857@lemburg.com> References: <200106141131.HAA25522@cj20424-a.reston1.va.home.com> <3B2DD46C.EEC20857@lemburg.com> Message-ID: <200106181225.IAA15518@cj20424-a.reston1.va.home.com> > Ok. When Tamito has completed his work on the codecs (he is currently > reimplementing them in C), I'll check them in under the new directory. Excellent! > BTW, how should we ship these codecs ? > > I'd propose to provide a distutils setup.py file which wraps up > all codecs under encodings and can be used to create a standard > Python add-on "Python-X.X Encoding Add-on". Sounds like a good plan. > The generated files should then ideally be published right next > to the Python source/binary links on the python.org web-pages to > achieve high visibility. Sure, for some definition of "right next to" :-) --Guido van Rossum (home page: http://www.python.org/~guido/) From thomas at xs4all.net Mon Jun 18 16:35:12 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Mon, 18 Jun 2001 16:35:12 +0200 Subject: [Python-Dev] Moshe Message-ID: <20010618163512.D8098@xs4all.nl> Just FYI: Moshe has been sighted, alive and well. He's been caught up in personal matters, apparently. He apologized and said he'd mail python-dev with an update soonish. Don't-you-wish-you-lurked-on-#python-too-ly y'rs ;) -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From m.favas at per.dem.csiro.au Mon Jun 18 23:28:23 2001 From: m.favas at per.dem.csiro.au (Mark Favas) Date: Tue, 19 Jun 2001 05:28:23 +0800 Subject: [Python-Dev] Anyone else seeing test_struct fail? Message-ID: <3B2E7277.D6109E7E@per.dem.csiro.au> [Platform: Tru64 Unix, Compaq C compiler] The current CVS of 2.2a0 fails test_struct for me with: test test_struct failed -- pack('>i', -2147483649) did not raise error more extensively, trying std iI on -2147483649 == 0xffffffff7fffffff Traceback (most recent call last): File "Lib/test/test_struct.py", line 367, in ? t.run() File "Lib/test/test_struct.py", line 353, in run self.test_one(x) File "Lib/test/test_struct.py", line 269, in test_one any_err(pack, ">" + code, x) File "Lib/test/test_struct.py", line 38, in any_err raise TestFailed, "%s%s did not raise error" % ( test_support.TestFailed: pack('>i', -2147483649) did not raise error A 64-bit platform issue? Also, the current imap.py causes "make test" (test___all__ and test_sundry) to fail with: "exceptions.TabError: inconsistent use of tabs and spaces in indentation (imaplib.py, line 576)" - untested checkin ? -- Mark Favas - m.favas at per.dem.csiro.au CSIRO, Private Bag No 5, Wembley, Western Australia 6913, AUSTRALIA From tim at digicool.com Tue Jun 19 00:04:06 2001 From: tim at digicool.com (Tim Peters) Date: Mon, 18 Jun 2001 18:04:06 -0400 Subject: [Python-Dev] Anyone else seeing test_struct fail? In-Reply-To: <3B2E7277.D6109E7E@per.dem.csiro.au> Message-ID: [Mark Favas] > [Platform: Tru64 Unix, Compaq C compiler) > The current CVS of 2.2a0 fails test_struct for me with: > > test test_struct failed -- pack('>i', -2147483649) did not raise error > > more extensively, > trying std iI on -2147483649 == 0xffffffff7fffffff > Traceback (most recent call last): > File "Lib/test/test_struct.py", line 367, in ?
> t.run() > File "Lib/test/test_struct.py", line 353, in run > self.test_one(x) > File "Lib/test/test_struct.py", line 269, in test_one > any_err(pack, ">" + code, x) > File "Lib/test/test_struct.py", line 38, in any_err > raise TestFailed, "%s%s did not raise error" % ( > test_support.TestFailed: pack('>i', -2147483649) did not raise error > > A 64-bit platform issue? In test_struct.py, please change this line (right after "class IntTester"): BUGGY_RANGE_CHECK = "bBhHIL" to BUGGY_RANGE_CHECK = "bBhHiIlL" and try again. I suspect you're bumping into a pre-existing bug that simply wasn't checked before (and, yes, there's A Reason it *may* screw up on a 64-bit box but not a 32-bit one). Note that since in standard mode, "i" is considered to be a 4-byte int regardless of platform, we really *should* bitch about trying to pack -2147483649 under "i" (but we don't -- and in general no codes except the new q/Q reliably bitch about out-of-range errors in the standard modes). > Also, the current imap.py causes "make test" (test___all__ and > test_sundry) to fail with: "exceptions.TabError: inconsistent use of > tabs and spaces in indentation (imaplib.py, line 576)" - untested > checkin ? Leaving that to some loser who cares about whitespace . From m.favas at per.dem.csiro.au Tue Jun 19 00:11:37 2001 From: m.favas at per.dem.csiro.au (Mark Favas) Date: Tue, 19 Jun 2001 06:11:37 +0800 Subject: [Python-Dev] Anyone else seeing test_struct fail? References: Message-ID: <3B2E7C99.E9BEFC3C@per.dem.csiro.au> [Tim Peters suggests] > > [Mark Favas] > > [Platform: Tru64 Unix, Compaq C compiler) > > The current CVS of 2.2a0 fails test_struct for me with: > > > > test test_struct failed -- pack('>i', -2147483649) did not raise error > > In test_struct.py, please change this line (right after "class IntTester"): > > BUGGY_RANGE_CHECK = "bBhHIL" > > to > > BUGGY_RANGE_CHECK = "bBhHiIlL" > > and try again. Yep, passes with this change. > > Also, the current imap.py causes "make test" (test___all__ and > > test_sundry) to fail with: "exceptions.TabError: inconsistent use of > > tabs and spaces in indentation (imaplib.py, line 576)" - untested > > checkin ? > > Leaving that to some loser who cares about whitespace . Guess we'll have to advertise widely, then . -- Mark Favas - m.favas at per.dem.csiro.au CSIRO, Private Bag No 5, Wembley, Western Australia 6913, AUSTRALIA From barry at digicool.com Tue Jun 19 00:28:21 2001 From: barry at digicool.com (Barry A. Warsaw) Date: Mon, 18 Jun 2001 18:28:21 -0400 Subject: [Python-Dev] Bogosities in quopri module? Message-ID: <15150.32901.611349.524220@yyz.digicool.com> I've been playing a bit with the quopri module (trying to support RFC 2047 in mimelib), and I've run across a few bogosities that I'd like to fix. Fixing some of them could break code, so I wanted to see what people think first. First, quopri should have encodestring() and decodestring() functions which take a string and return a string. This would make it more consistent API-wise with e.g. base64. One difference is that quopri.encodestring() should probably take a default argument quotetabs (defaulted to 1) for passing to the encode() function. This shouldn't be very controversial; a quick sketch of these wrappers appears below. Second, I think there's a problem with encode(): it always tacks on an extra \n character, such that an encode->decode roundtrip is not idempotent. I propose fixing this so that encode() doesn't add the extra newline, but this can break code that expects that newline to be present.
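To make the encodestring()/decodestring() proposal above concrete, here is a minimal sketch of the wrappers, layered over the module's existing file-object-based encode() and decode() via StringIO. The names and the quotetabs default are the ones proposed here, not anything already in quopri:

    import quopri
    from StringIO import StringIO

    def encodestring(s, quotetabs=1):
        # Reuse the file-based codec by wrapping the strings.
        infp, outfp = StringIO(s), StringIO()
        quopri.encode(infp, outfp, quotetabs)
        return outfp.getvalue()

    def decodestring(s):
        infp, outfp = StringIO(s), StringIO()
        quopri.decode(infp, outfp)
        return outfp.getvalue()

With the extra-newline fix above, decodestring(encodestring(s)) == s would then hold.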
Third, I think that encode()'s quotetabs flag should also apply to spaces. RFC 1521 says that both ASCII tabs and spaces may be encoded, and I don't think it's worthwhile that there be a separate flag to independently choose to encode tabs or spaces. Lastly, if you buy the extra-newline solution above, then encode() has to be fixed w.r.t. trailing spaces and tabs. Currently, an encode->decode roundtrip for, e.g. "hello " returns "hello =\n", but what it should really return is "hello=20". Likewise "hello\t" should return "hello=09". The patches must take multiline strings into account though, so that it doesn't chomp newlines out of """hello great big world """ I haven't worked up a patch yet, but when I do I'll upload it to SF to get some feedback. I think there are a few other things in the module that could be cleaned up. I also plan to add a test_quopri.py. Comments? -Barry From see at my.signature Tue Jun 19 08:21:14 2001 From: see at my.signature (Greg Ewing) Date: Tue, 19 Jun 2001 18:21:14 +1200 Subject: [Python-Dev] Re: PEP 255: Simple Generators References: Message-ID: <3B2EEF5A.8FF0DAFB@cosc.canterbury.ac.nz> Something is bothering me about this. In fact, it's bothering me a LOT. In the following, will f() work as a generator-function: def f(): for i in range(5): g(i) def g(i): for j in range(10): yield i,j If I understand PEP255 correctly, this will *not* work. But it seems entirely reasonable to me that it *should* work. It *has* to work, otherwise how am I to write generators that are too complicated to fit into a single function? Someone please tell me I'm wrong about this! -- Greg Ewing, Computer Science Dept, University of Canterbury, Christchurch, New Zealand To get my email address, please visit my web page: http://www.cosc.canterbury.ac.nz/~greg From jepler at inetnebr.com Tue Jun 19 15:25:23 2001 From: jepler at inetnebr.com (Jeff Epler) Date: Tue, 19 Jun 2001 08:25:23 -0500 Subject: [Python-Dev] Re: PEP 255: Simple Generators In-Reply-To: <3B2EEF5A.8FF0DAFB@cosc.canterbury.ac.nz>; from see@my.signature on Tue, Jun 19, 2001 at 06:21:14PM +1200 References: <3B2EEF5A.8FF0DAFB@cosc.canterbury.ac.nz> Message-ID: <20010619082522.A12200@inetnebr.com> On Tue, Jun 19, 2001 at 06:21:14PM +1200, Greg Ewing wrote: > Something is bothering me about this. In fact, > it's bothering me a LOT. In the following, will > f() work as a generator-function: > > def f(): > for i in range(5): > g(i) > > def g(i): > for j in range(10): > yield i,j > > If I understand PEP255 correctly, this will *not* > work. But it seems entirely reasonable to me that > it *should* work. It *has* to work, otherwise how > am I to write generators that are too complicated > to fit into a single function? The following similar code seems to produce the results you have in mind. def f(): for i in range(5): #g(i) #yield g(i) for x in g(i): yield x def g(i): for j in range(10): yield i, j It would be nice to have a succinct way to say 'for dummy in iterator: yield dummy'. Maybe 'yield from iterator'? Then f would become: def f(): for i in range(5): yield from g(i) Jeff PS I noticed that the generator branch got merged into the trunk. Cool! From fdrake at acm.org Tue Jun 19 15:24:46 2001 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Tue, 19 Jun 2001 09:24:46 -0400 (EDT) Subject: [Python-Dev] Python & GCC 3.0 Message-ID: <15151.21150.214238.429130@cj42289-a.reston1.va.home.com> I built GCC 3.0 last night, and Python built and passed the regression tests. 
I've not done any further comparisons, but using --with-cxx=... failed; the C++ ABI changed and a new version of the C++ runtime is required before that will work. I didn't want to install that over my working installation, just in case. ;-) I'll report more as I find out more. -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From nas at python.ca Tue Jun 19 16:00:39 2001 From: nas at python.ca (Neil Schemenauer) Date: Tue, 19 Jun 2001 07:00:39 -0700 Subject: [Python-Dev] Re: PEP 255: Simple Generators In-Reply-To: <3B2EEF5A.8FF0DAFB@cosc.canterbury.ac.nz>; from see@my.signature on Tue, Jun 19, 2001 at 06:21:14PM +1200 References: <3B2EEF5A.8FF0DAFB@cosc.canterbury.ac.nz> Message-ID: <20010619070039.A13712@glacier.fnational.com> Greg Ewing wrote: > Something is bothering me about this. In fact, > it's bothering me a LOT. In the following, will > f() work as a generator-function: > > def f(): > for i in range(5): > g(i) > > def g(i): > for j in range(10): > yield i,j > > If I understand PEP255 correctly, this will *not* > work. No, it will not work. The title of PEP 255 is "Simple Generators". What you want will require something like stackless in order to get the C stack out of the way. That's a major change to the Python internals. To make your example work you need to do: def f(): for i in range(5): for j in g(i): yield j def g(i): for j in range(10): yield i,j Stackless may still be in Python's future but not for 2.2. Neil From barry at digicool.com Tue Jun 19 16:19:58 2001 From: barry at digicool.com (Barry A. Warsaw) Date: Tue, 19 Jun 2001 10:19:58 -0400 Subject: [Python-Dev] Python & GCC 3.0 References: <15151.21150.214238.429130@cj42289-a.reston1.va.home.com> Message-ID: <15151.24462.400930.295658@anthem.wooz.org> >>>>> "Fred" == Fred L Drake, Jr writes: Fred> I built GCC 3.0 last night, and Python built and passed Fred> the regression tests. Hey, you were actually able to download it!? :) I couldn't get an ftp connection for the longest time and finally gave up. It'd be interesting to see if there are any performance improvements, esp. on x86 boxen. -Barry From fdrake at acm.org Tue Jun 19 17:07:48 2001 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Tue, 19 Jun 2001 11:07:48 -0400 (EDT) Subject: [Python-Dev] Python & GCC 3.0 In-Reply-To: <15151.24462.400930.295658@anthem.wooz.org> References: <15151.21150.214238.429130@cj42289-a.reston1.va.home.com> <15151.24462.400930.295658@anthem.wooz.org> Message-ID: <15151.27332.418303.872171@cj42289-a.reston1.va.home.com> Barry A. Warsaw writes: > It'd be interesting to see if there are any performance > improvements, esp. on x86 boxen. GCC 2.95.3: cj42289-a(.../python/linux); time ./python -tt ../Lib/test/pystone.py Pystone(1.1) time for 10000 passes = 1.58 This machine benchmarks at 6329.11 pystones/second 1.66user 0.01system 0:03.40elapsed 49%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (280major+241minor)pagefaults 0swaps GCC 3.0: cj42289-a(.../python/linux); cd ../linux-gcc-3.0/ cj42289-a(.../python/linux-gcc-3.0); time ./python -tt ../Lib/test/pystone.py Pystone(1.1) time for 10000 passes = 1.65 This machine benchmarks at 6060.61 pystones/second 1.77user 0.01system 0:03.52elapsed 50%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (307major+239minor)pagefaults 0swaps There is a little variation with multiple runs, but it varies less than 5% from the numbers above. Bumping up the LOOPS constant in pystone.py changes the numbers a small bit, but the relationship remains constant.
This is on a Linux-Mandrake 7.2 installation with non-cooker updates installed, and still using the Linux 2.2 kernel: cj42289-a(.../python/linux-gcc-3.0); uname -a Linux cj42289-a.reston1.va.home.com 2.2.17-21mdk #1 Thu Oct 5 13:16:08 CEST 2000 i686 unknown -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From dan at cgsoftware.com Tue Jun 19 18:19:14 2001 From: dan at cgsoftware.com (Daniel Berlin) Date: 19 Jun 2001 12:19:14 -0400 Subject: [Python-Dev] Python & GCC 3.0 In-Reply-To: <15151.27332.418303.872171@cj42289-a.reston1.va.home.com> ("Fred L. Drake, Jr."'s message of "Tue, 19 Jun 2001 11:07:48 -0400 (EDT)") References: <15151.21150.214238.429130@cj42289-a.reston1.va.home.com> <15151.24462.400930.295658@anthem.wooz.org> <15151.27332.418303.872171@cj42289-a.reston1.va.home.com> Message-ID: <87vglsbfy5.fsf@cgsoftware.com> "Fred L. Drake, Jr." writes: > Barry A. Warsaw writes: > > It'd be interesting to see if there are any performance > > improvements, esp. on x86 boxen.
> > GCC 2.95.3: > > cj42289-a(.../python/linux); time ./python -tt ../Lib/test/pystone.py > Pystone(1.1) time for 10000 passes = 1.58 > This machine benchmarks at 6329.11 pystones/second > 1.66user 0.01system 0:03.40elapsed 49%CPU (0avgtext+0avgdata 0maxresident)k > 0inputs+0outputs (280major+241minor)pagefaults 0swaps > > GCC 3.0: > > cj42289-a(.../python/linux); cd ../linux-gcc-3.0/ > cj42289-a(.../python/linux-gcc-3.0); time ./python -tt ../Lib/test/pystone.py > Pystone(1.1) time for 10000 passes = 1.65 > This machine benchmarks at 6060.61 pystones/second > 1.77user 0.01system 0:03.52elapsed 50%CPU (0avgtext+0avgdata 0maxresident)k > 0inputs+0outputs (307major+239minor)pagefaults 0swaps > > There is a little variation with multiple run, but it varies less than > 5% from the numbers above. Bumping up the LOOPS constant in > pystone.py changes the numbers a small bit, but the relationship > remains constant. > > This is one a Linux-Mandrake 7.2 installation with non-cooker updates > installed, and still using the Linux 2.2 kernel: > > cj42289-a(.../python/linux-gcc-3.0); uname -a > Linux cj42289-a.reston1.va.home.com 2.2.17-21mdk #1 Thu Oct 5 13:16:08 CEST 2000 i686 unknown Note that if you really want to see a speedup for x86 boxes then you should take a look at PGCC, the Pentium GCC compiler group: http://www.goof.com/pcg/ You can then adjust the compiler to various x86 CPUs and take advantage of some special optimizations they have intergrated into 2.95.2.1. -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From skip at pobox.com Tue Jun 19 19:44:47 2001 From: skip at pobox.com (Skip Montanaro) Date: Tue, 19 Jun 2001 12:44:47 -0500 Subject: [Python-Dev] example of module interface to a varargs function? Message-ID: <15151.36751.406758.577420@beluga.mojam.com> I am trying to add a module interface to some of the bits missing from PyGtk2. Some functions I'm interested in have varargs signatures, e.g.: void gtk_binding_entry_add_signal (GtkBindingSet *binding_set, guint keyval, guint modifiers, const gchar *signal_name, guint n_args, ...) From fdrake at acm.org Tue Jun 19 21:04:18 2001 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Tue, 19 Jun 2001 15:04:18 -0400 (EDT) Subject: [Python-Dev] Python & GCC 3.0 In-Reply-To: <87vglsbfy5.fsf@cgsoftware.com> References: <15151.21150.214238.429130@cj42289-a.reston1.va.home.com> <15151.24462.400930.295658@anthem.wooz.org> <15151.27332.418303.872171@cj42289-a.reston1.va.home.com> <3B2F8413.77F40494@lemburg.com> <87vglsbfy5.fsf@cgsoftware.com> Message-ID: <15151.41522.200832.655534@cj42289-a.reston1.va.home.com> Daniel Berlin writes: > Except, I bet you didn't use one of the "optimize for a given cpu" > switches. No, I hadn't. My main interest was in the GCC team's claim that the generated code was faster. Compiling with "make OPT='-mcpu=i686 -O3'" did not make much difference at all. M.-A. Lemburg writes: > Note that if you really want to see a speedup for x86 boxes then > you should take a look at PGCC, the Pentium GCC compiler group: > > http://www.goof.com/pcg/ > > You can then adjust the compiler to various x86 CPUs and > take advantage of some special optimizations they have intergrated > into 2.95.2.1. If they have any improved optimizations for recent x86 chips, I'd like to see them folded into GCC. I'd hate to see another egcs-style split. 
It doesn't look like I can just download a single source package from them and wait 3 hours for it to build, so I won't plan on pursuing this further. -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From tim at digicool.com Tue Jun 19 21:14:10 2001 From: tim at digicool.com (Tim Peters) Date: Tue, 19 Jun 2001 15:14:10 -0400 Subject: [Python-Dev] Python & GCC 3.0 In-Reply-To: <15151.27332.418303.872171@cj42289-a.reston1.va.home.com> Message-ID: [Fred L. Drake, Jr.] > GCC 2.95.3: > This machine benchmarks at 6329.11 pystones/second > ... > GCC 3.0: > This machine benchmarks at 6060.61 pystones/second > ... > This is one a Linux-Mandrake 7.2 installation with non-cooker updates > installed, and still using the Linux 2.2 kernel: > > cj42289-a(.../python/linux-gcc-3.0); uname -a > Linux cj42289-a.reston1.va.home.com 2.2.17-21mdk #1 Thu Oct 5 > 13:16:08 CEST 2000 i686 unknown This is a good place to note that the single biggest "easy win" for pystone is to run it with -O (that is, Python's -O). Yields a 10% boost on Fred's box, and about 7% on MSVC6+Win2K. pystone is more sensitive to -O than most "real Python apps", probably because it's masses of very simple operations on scalar types -- no real classes, no dicts, no lists except to simulate fixed-size C arrays, lots of globals, and so on. The dynamic frequency of SET_LINENO is high, and the avg work per other opcode is low. OTOH, that's typical of *some* Python apps, and typical of *parts* of almost all Python apps. So it would be worth getting ridding of SET_LINENO even in non- -O runs. Note that SET_LINENO isn't needed to get correct line numbers in tracebacks (and hasn't been needed for years), it's "just" there to support tracing now. Vladimir had what looked to be a workable scheme for doing that a different way, and that would be a cool project for someone to revive (IMO -- Guido's may differ, but he's too busy to notice what we're doing ). From michel at digicool.com Tue Jun 19 21:12:14 2001 From: michel at digicool.com (Michel Pelletier) Date: Tue, 19 Jun 2001 12:12:14 -0700 (PDT) Subject: [Python-Dev] Anyone else seeing test_struct fail? In-Reply-To: <3B2E7277.D6109E7E@per.dem.csiro.au> Message-ID: On Tue, 19 Jun 2001, Mark Favas wrote: > Also, the current imap.py causes "make test" (test___all__ and > test_sundry) to fail with: "exceptions.TabError: inconsistent use of > tabs and spaces in indentation (imaplib.py, line 576)" - untested > checkin ? I submitted a patch right on this line the other day that Guido applied, but I tested it and niether test___all__ nor test_sundry fail for me today. -Michel From mal at lemburg.com Tue Jun 19 21:28:14 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Tue, 19 Jun 2001 21:28:14 +0200 Subject: [Python-Dev] Python & GCC 3.0 References: <15151.21150.214238.429130@cj42289-a.reston1.va.home.com> <15151.24462.400930.295658@anthem.wooz.org> <15151.27332.418303.872171@cj42289-a.reston1.va.home.com> <3B2F8413.77F40494@lemburg.com> <87vglsbfy5.fsf@cgsoftware.com> <15151.41522.200832.655534@cj42289-a.reston1.va.home.com> Message-ID: <3B2FA7CE.DD1602F7@lemburg.com> "Fred L. Drake, Jr." wrote: > > Daniel Berlin writes: > > Except, I bet you didn't use one of the "optimize for a given cpu" > > switches. > > No, I hadn't. My main interest was in the GCC team's claim that the > generated code was faster. Compiling with "make OPT='-mcpu=i686 -O3'" > did not make much difference at all. > > M.-A. 
Lemburg writes: > > Note that if you really want to see a speedup for x86 boxes then > > you should take a look at PGCC, the Pentium GCC compiler group: > > > > http://www.goof.com/pcg/ > > > > You can then adjust the compiler to various x86 CPUs and > > take advantage of some special optimizations they have intergrated > > into 2.95.2.1. If they have any improved optimizations for recent x86 chips, I'd like to see them folded into GCC. I'd hate to see another egcs-style split. It doesn't look like I can just download a single source package from them and wait 3 hours for it to build, so I won't plan on pursuing this further. -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From tim at digicool.com Tue Jun 19 21:14:10 2001 From: tim at digicool.com (Tim Peters) Date: Tue, 19 Jun 2001 15:14:10 -0400 Subject: [Python-Dev] Python & GCC 3.0 In-Reply-To: <15151.27332.418303.872171@cj42289-a.reston1.va.home.com> Message-ID: [Fred L. Drake, Jr.] > GCC 2.95.3: > This machine benchmarks at 6329.11 pystones/second > ... > GCC 3.0: > This machine benchmarks at 6060.61 pystones/second > ... > This is one a Linux-Mandrake 7.2 installation with non-cooker updates > installed, and still using the Linux 2.2 kernel: > > cj42289-a(.../python/linux-gcc-3.0); uname -a > Linux cj42289-a.reston1.va.home.com 2.2.17-21mdk #1 Thu Oct 5 > 13:16:08 CEST 2000 i686 unknown This is a good place to note that the single biggest "easy win" for pystone is to run it with -O (that is, Python's -O). Yields a 10% boost on Fred's box, and about 7% on MSVC6+Win2K. pystone is more sensitive to -O than most "real Python apps", probably because it's masses of very simple operations on scalar types -- no real classes, no dicts, no lists except to simulate fixed-size C arrays, lots of globals, and so on. The dynamic frequency of SET_LINENO is high, and the avg work per other opcode is low. OTOH, that's typical of *some* Python apps, and typical of *parts* of almost all Python apps. So it would be worth getting rid of SET_LINENO even in non- -O runs. Note that SET_LINENO isn't needed to get correct line numbers in tracebacks (and hasn't been needed for years), it's "just" there to support tracing now. Vladimir had what looked to be a workable scheme for doing that a different way, and that would be a cool project for someone to revive (IMO -- Guido's may differ, but he's too busy to notice what we're doing ). From michel at digicool.com Tue Jun 19 21:12:14 2001 From: michel at digicool.com (Michel Pelletier) Date: Tue, 19 Jun 2001 12:12:14 -0700 (PDT) Subject: [Python-Dev] Anyone else seeing test_struct fail? In-Reply-To: <3B2E7277.D6109E7E@per.dem.csiro.au> Message-ID: On Tue, 19 Jun 2001, Mark Favas wrote: > Also, the current imap.py causes "make test" (test___all__ and > test_sundry) to fail with: "exceptions.TabError: inconsistent use of > tabs and spaces in indentation (imaplib.py, line 576)" - untested > checkin ? I submitted a patch right on this line the other day that Guido applied, but I tested it and neither test___all__ nor test_sundry fail for me today. -Michel From mal at lemburg.com Tue Jun 19 21:28:14 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Tue, 19 Jun 2001 21:28:14 +0200 Subject: [Python-Dev] Python & GCC 3.0 References: <15151.21150.214238.429130@cj42289-a.reston1.va.home.com> <15151.24462.400930.295658@anthem.wooz.org> <15151.27332.418303.872171@cj42289-a.reston1.va.home.com> <3B2F8413.77F40494@lemburg.com> <87vglsbfy5.fsf@cgsoftware.com> <15151.41522.200832.655534@cj42289-a.reston1.va.home.com> Message-ID: <3B2FA7CE.DD1602F7@lemburg.com> "Fred L. Drake, Jr." wrote: > > Daniel Berlin writes: > > Except, I bet you didn't use one of the "optimize for a given cpu" > > switches. > > No, I hadn't. My main interest was in the GCC team's claim that the > generated code was faster. Compiling with "make OPT='-mcpu=i686 -O3'" > did not make much difference at all. > > M.-A. Lemburg writes: > > Note that if you really want to see a speedup for x86 boxes then > > you should take a look at PGCC, the Pentium GCC compiler group: > > > > http://www.goof.com/pcg/ > > > > You can then adjust the compiler to various x86 CPUs and > > take advantage of some special optimizations they have intergrated > > into 2.95.2.1. > > If they have any improved optimizations for recent x86 chips, I'd > like to see them folded into GCC. I'd hate to see another egcs-style > split. > It doesn't look like I can just download a single source package > from them and wait 3 hours for it to build, so I won't plan on > pursuing this further. Oh, it's fairly easy to get a pgcc compiler: all you have to do is apply their small set of patches to the gcc source before compiling it. And then you should set your OPT environment variable to e.g. OPT="-g -O3 -Wall -Wstrict-prototypes -mcpu=k6" This will cause the pgcc compiler to use these settings in pretty much all compiles you ever do without having to think about it every time. -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From tim at digicool.com Tue Jun 19 21:36:41 2001 From: tim at digicool.com (Tim Peters) Date: Tue, 19 Jun 2001 15:36:41 -0400 Subject: [Python-Dev] Anyone else seeing test_struct fail? In-Reply-To: Message-ID: [Michel Pelletier] > I submitted a patch right on this line the other day that Guido applied, > but I tested it and niether test___all__ nor test_sundry fail for me > today. Not to worry! I fixed all this stuff yesterday. imaplib.py had an ambiguous mix of hard tabs and spaces, which Guido "should have" caught before checking in, and that Python itself complained about when run with -tt (which is how Mark ran the test suite). There's no problem anymore. From nas at python.ca Tue Jun 19 22:37:18 2001 From: nas at python.ca (Neil Schemenauer) Date: Tue, 19 Jun 2001 13:37:18 -0700 Subject: [Python-Dev] Python & GCC 3.0 In-Reply-To: <15151.41522.200832.655534@cj42289-a.reston1.va.home.com>; from fdrake@acm.org on Tue, Jun 19, 2001 at 03:04:18PM -0400 References: <15151.21150.214238.429130@cj42289-a.reston1.va.home.com> <15151.24462.400930.295658@anthem.wooz.org> <15151.27332.418303.872171@cj42289-a.reston1.va.home.com> <3B2F8413.77F40494@lemburg.com> <87vglsbfy5.fsf@cgsoftware.com> <15151.41522.200832.655534@cj42289-a.reston1.va.home.com> Message-ID: <20010619133718.A14814@glacier.fnational.com> Fred L. Drake, Jr. wrote: > Compiling with "make OPT='-mcpu=i686 -O3'" did not make much > difference at all. Try OPT="-m486 -O2". That gave me the best results last time I played with this stuff. > If they have any improved optimizations for recent x86 chips, I'd > like to see them folded into GCC. I'd hate to see another egcs-style > split. Some people say you should avoid PGCC since it generates buggy code. I don't know if that's true or not. Neil From thomas at xs4all.net Tue Jun 19 23:04:46 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Tue, 19 Jun 2001 23:04:46 +0200 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Lib/test test_mailbox.py,1.5,1.6 In-Reply-To: Message-ID: <20010619230446.E8098@xs4all.nl> On Tue, Jun 19, 2001 at 01:20:07PM -0700, Jack Jansen wrote: > The test used int(time.time()) to get a random number, but this doesn't > work on the mac (where times are bigger than ints). Changed to > int(time.time()%1000000).
Doesn't int(time.time()%sys.maxint) make more sense? At least you won't be degrading the sequentiality of this particularly unrandom random number on platforms where ints really are big enough to hold times :) -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From loewis at informatik.hu-berlin.de Tue Jun 19 23:25:26 2001 From: loewis at informatik.hu-berlin.de (Martin von Loewis) Date: Tue, 19 Jun 2001 23:25:26 +0200 (MEST) Subject: [Python-Dev] example of module interface to a varargs function? Message-ID: <200106192125.XAA27631@pandora.informatik.hu-berlin.de> > The only place in the standard modules I saw that processed a truly > arbitrary number of arguments is the struct_pack method of the > struct module, and it doesn't use PyArg_Parse* to process them. Can > someone point me to an example of marshalling arbitrary numbers of > arguments then calling a varargs function? In a true varargs function, you cannot use PyArg_Parse*. Instead, you have to iterate over the argument tuple with PyTuple_GetItem, fetching one argument after another. Another example of such a function is builtin_max. > (I'll worry about calling gtk_binding_entry_add_signal after I > figure out how to marshal the args.) I'd worry about this first: In C, it is not possible to call a true varargs function in a portable way if the caller doesn't statically (i.e. in source code) know the number of arguments. Only the callee can be variable, not the caller. A slight exception is that you are allowed to pass va_list objects through from one function to another. However, that requires that the callee expects a va_list argument, i.e. is not a varargs function, plus there is no portable way to create a va_list object from scratch. If you absolutely need to call such a function, you can use the Cygnus libffi library, which, for a number of microprocessors and C ABIs, allows calling arbitrary function pointers. However, I'd rather recommend looking for alternatives to gtk_binding_entry_add_signal. E.g. gtk_binding_entry_add_signall accepts a GSList*, which is a chained list of arguments, instead of being varargs. This you can call in a C module - the other one is out of reach. Regards, Martin From skip at pobox.com Tue Jun 19 23:32:50 2001 From: skip at pobox.com (Skip Montanaro) Date: Tue, 19 Jun 2001 16:32:50 -0500 Subject: [Python-Dev] Python & GCC 3.0 In-Reply-To: <20010619133718.A14814@glacier.fnational.com> References: <15151.21150.214238.429130@cj42289-a.reston1.va.home.com> <15151.24462.400930.295658@anthem.wooz.org> <15151.27332.418303.872171@cj42289-a.reston1.va.home.com> <3B2F8413.77F40494@lemburg.com> <87vglsbfy5.fsf@cgsoftware.com> <15151.41522.200832.655534@cj42289-a.reston1.va.home.com> <20010619133718.A14814@glacier.fnational.com> Message-ID: <15151.50434.297860.277726@beluga.mojam.com> Neil> Some people say you should avoid PGCC since it generates buggy Neil> code. I don't know if that's true or not. If nothing else, PGCC almost certainly gets a lot less exercise than the mainstream GCC code. Given the statement in the PGCC FAQ that typical speedups are in the range of 5%: http://www.goof.com/pcg/pgcc-faq.html#SEC0119 it doesn't seem like it would be worth the effort to use it in any critical applications. Better to just wait for PGCC optimizations to trickle into GCC itself.
Skip From jack at oratrix.nl Tue Jun 19 23:56:43 2001 From: jack at oratrix.nl (Jack Jansen) Date: Tue, 19 Jun 2001 23:56:43 +0200 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Lib/test test_mailbox.py,1.5,1.6 In-Reply-To: Message by Thomas Wouters , Tue, 19 Jun 2001 23:04:46 +0200 , <20010619230446.E8098@xs4all.nl> Message-ID: <20010619215648.B2A7CE267B@oratrix.oratrix.nl> Recently, Thomas Wouters said: > On Tue, Jun 19, 2001 at 01:20:07PM -0700, Jack Jansen wrote: > > > The test used int(time.time()) to get a random number, but this doesn't > > work on the mac (where times are bigger than ints). Changed to > > int(time.time()%1000000). > > Doesn't int(time.time()%sys.maxint) make more sense ? At least you won't be > degrading the sequentiality of this particularly unrandom random number on > platforms where ints really are big enough to hold times :) I think the last sentence should be "... platforms where time before 1970 doesn't exist so they can fit it in a measly 32 bits":-) But anyway: I haven't a clue whether the sequentiality is important, it doesn't really seem to be from a quick glance. If you want to fix it: allez votre corridor. -- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From skip at pobox.com Wed Jun 20 00:01:13 2001 From: skip at pobox.com (Skip Montanaro) Date: Tue, 19 Jun 2001 17:01:13 -0500 Subject: [Python-Dev] Re: example of module interface to a varargs function? In-Reply-To: <200106192125.XAA27631@pandora.informatik.hu-berlin.de> References: <200106192125.XAA27631@pandora.informatik.hu-berlin.de> Message-ID: <15151.52137.623119.852524@beluga.mojam.com> >> The only place in the standard modules I saw that processed a truly >> arbitrary number of arguments is the struct_pack method of the struct >> module, and it doesn't use PyArg_Parse* to process them. Can someone >> point me to an example of marshalling arbitrary numbers of arguments >> then calling a varargs function? Martin> In a true varargs function, you cannot use PyArg_Parse*. Martin> Instead, you have to iterate over the argument tuple with Martin> PyTuple_GetItem, fetching one argument after another. I think it would be nice if PyArg_ParseTuple and friends took a "*" format character. It would only be useful at the end of a format string, but would allow the generic argument parsing machinery to be used for those arguments that precede it. The argument it writes into would be an int, which would represent the offset of the first argument not processed by PyArg_ParseTuple. Reusing my example: void gtk_binding_entry_add_signal (GtkBindingSet *binding_set, guint keyval, guint modifiers, const gchar *signal_name, guint n_args, ...) If I had a Python module wrapper function for this it might call PyArg_ParseTuple as PyArg_ParseTuple(args, "iis*", &keyval, &modifiers, &signal_name, &offset); Processing of the rest of the argument list would be the responsibility of the author and start at args[offset]. >> (I'll worry about calling gtk_binding_entry_add_signal after I figure >> out how to marshal the args.) Martin> I'd worry about this first: In C, it is not possible to call a Martin> true varargs function in a portable way if the caller doesn't Martin> statically (i.e. in source code) know the number of Martin> arguments. Only the callee can be variable, not the caller. Understood. 
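To make the pattern Martin describes concrete, a wrapper along these lines would work -- an untested sketch against the 2.x C API, with the wrapper name invented and the trailing-argument conversion left as a comment. Since no "*" format exists today, it parses the fixed arguments from a slice, then walks the rest with PyTuple_GetItem:

    static PyObject *
    add_signal_wrapper(PyObject *self, PyObject *args)  /* hypothetical name */
    {
        int keyval, modifiers, ok;
        char *signal_name;
        int i, n;
        PyObject *fixed;

        n = PyTuple_Size(args);
        /* Parse the fixed leading arguments from a slice of the tuple. */
        fixed = PyTuple_GetSlice(args, 0, 3);
        if (fixed == NULL)
            return NULL;
        ok = PyArg_ParseTuple(fixed, "iis", &keyval, &modifiers, &signal_name);
        Py_DECREF(fixed);
        if (!ok)
            return NULL;
        /* Walk the trailing arguments by hand, one at a time. */
        for (i = 3; i < n; i++) {
            PyObject *item = PyTuple_GetItem(args, i);  /* borrowed reference */
            /* ... convert item and chain it onto a GSList here ... */
        }
        Py_INCREF(Py_None);
        return Py_None;
    }

With a "*" format as proposed above, the GetSlice/DECREF dance would collapse into the single PyArg_ParseTuple call.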
It turns out that the function I used as an example is actually only called in a few distinct ways. I can analyze its var-arguments fairly easily and dispatch to the appropriate call to the underlying function. Martin> However, I'd rather recommend looking for alternatives to Martin> gtk_binding_entry_add_signal. E.g. gtk_binding_entry_add_signall Martin> accepts a GSList*, which is a chained list of arguments, instead Martin> of being varargs. This you can call in a C module - the other Martin> one is out of reach. Hmm... thanks, this does look like the correct solution. I failed to notice the distinction between the two functions when I first scanned the source code; the signall (two-els) version is never called outside of gtkbindings.c; the Gtk documentation in this area is, well, rather sparse, to say the least (nine comments over 1200 lines of code, the only two substantial ones of which are boilerplate at the top); and there is no reference manual documentation for any of the interesting functions. By comparison, the Python documentation looks as if Guido has employed a team of full-time tech writers for years. Way to go, Fred! Skip From nas at python.ca Wed Jun 20 00:12:49 2001 From: nas at python.ca (Neil Schemenauer) Date: Tue, 19 Jun 2001 15:12:49 -0700 Subject: [Python-Dev] OS timer and profiling Python code Message-ID: <20010619151249.A15126@glacier.fnational.com> On x86 hardware the Linux timer runs at 100 Hz by default. On modern hardware that is probably much too slow to accurately profile programs using the Python profiler. Changing the value in include/asm-i386/param.h from 100 to 1024 and recompiling the kernel made a huge difference for me. Perhaps we should include a note in the profiler documentation. I'm not sure if this affects gprof as well but I suspect it does. Neil From moshez at zadka.site.co.il Wed Jun 20 07:31:23 2001 From: moshez at zadka.site.co.il (Moshe Zadka) Date: Wed, 20 Jun 2001 08:31:23 +0300 Subject: [Python-Dev] Moshe In-Reply-To: <20010618163512.D8098@xs4all.nl> References: <20010618163512.D8098@xs4all.nl> Message-ID: On Mon, 18 Jun 2001 16:35:12 +0200, Thomas Wouters wrote: > Just FYI: Moshe has been sighted, alive and well. He's been caught up in > personal matters, apparently. He apologized and said he'd mail python-dev > with an update soonish. Yes, indeed, and soonish got sorta delayed too... Anyway, I am alive and well, and the bad guys will have to do better than 300m to get me in an explosion ;-) Anyway, I'm terribly sorry for disappearing - my personal life caught up with me and stuff. I'm now trying to catch up with everything. Thanks to whoever took 2.0.1 from where I left off and kept it going. -- "I'll be ex-DPL soon anyway so I'm |LUKE: Is Perl better than Python? looking for someplace else to grab power."|YODA: No...no... no. Quicker, -- Wichert Akkerman (on debian-private)| easier, more seductive. For public key, finger moshez at debian.org |http://www.{python,debian,gnu}.org From greg at cosc.canterbury.ac.nz Wed Jun 20 07:55:28 2001 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Wed, 20 Jun 2001 17:55:28 +1200 Subject: [Python-Dev] Suggested amendment to PEP 255 References: Message-ID: <3B303AD0.1884E173@cosc.canterbury.ac.nz> Tim Peters wrote: > > Who would this help? Seriously. There's nothing special about a generator > to a caller, except that it returns an object that implements the iterator > interface. What matters to the caller is irrelevant here.
We're talking about what matters to someone writing or reading the implementation. To those people, there is a VERY big difference between a regular function and a generator-function -- about as big as the difference between a class and a function! In fact, a generator-function is in many ways much more like a class than a function. Calling a generator-function doesn't execute any of the code in its body; instead, it creates an instance of the generator, much like calling a class creates an instance of the class. Calling them "generator classes" and "generator instances" would perhaps be more appropriate, and more suggestive of the way they actually behave. The more I think about this, the more I agree with those who say that overloading the function-definition syntax for defining generators is a bad idea. It seems to make about as much sense as saying that there shouldn't be any special syntax for defining a class -- the header of a class definition should look exactly like a function definition, and to tell the difference you have to look for some subtle clue further down. I suggest dropping the "def" altogether and using: generator foo(args): ... yield x ... Right from the word go, this says loudly and clearly that this thing is *not* a function, it's something else. If you haven't come across generators before, you go and look in the manual to find out what it means. There you're told something like Executing a generator statement creates a special callable object called a generator. Calling a generator creates a generator-instance, which is an iterator object... [...stuff about the "yield" statement...] I think this is going to be easier to document and lead to much less confusion than trying to explain the magic going on when you call something that looks for all the world like a function and it doesn't execute any of the code in it. Explicit is better than implicit! -- Greg Ewing, Computer Science Dept, University of Canterbury, Christchurch, New Zealand To get my email address, please visit my web page: http://www.cosc.canterbury.ac.nz/~greg From greg at cosc.canterbury.ac.nz Wed Jun 20 08:17:09 2001 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Wed, 20 Jun 2001 18:17:09 +1200 Subject: [Python-Dev] Re: PEP 255: Simple Generators References: Message-ID: <3B303FE5.735A5FDC@cosc.canterbury.ac.nz> Tim Peters wrote: > > This is like saying that functions returning integers should be declared > "defint" instead, or some such gibberish. Not the same thing. If a function returns an integer, somewhere in it or in something that it calls there is a piece of code that explicitly creates an integer. But under PEP 255, there is *nothing* anywhere in the code that you can point to and say "look, here is where the generator-iterator is created!" Instead, it happens implicitly at some point just after the generator-function is called, but before any of its code is executed. You could say that the same thing is true when you call a class object -- creation of the instance happens implicitly before __init__ is called. But there is no secret made of the fact that classes are not functions, and there is nothing in the syntax to lead you to believe that they behave like functions. In contrast, the proposed generator syntax makes generators look so nearly like functions that their actual behaviour, once you get your head around it, seems quite bizarre. I just think it's going to lead to a lot of confusion and misunderstanding, among newcomers especially. 
-- Greg Ewing, Computer Science Dept, University of Canterbury, Christchurch, New Zealand To get my email address, please visit my web page: http://www.cosc.canterbury.ac.nz/~greg From greg at cosc.canterbury.ac.nz Wed Jun 20 08:28:13 2001 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Wed, 20 Jun 2001 18:28:13 +1200 Subject: [Python-Dev] Re: PEP 255: Simple Generators References: <3B2EEF5A.8FF0DAFB@cosc.canterbury.ac.nz> Message-ID: <3B30427D.5A90DDE7@cosc.canterbury.ac.nz> Olaf Delgado Friedrichs wrote: > > If I understand correctly, this should work: > > def f(): > for i in range(5): > for x in g(i): > yield x > > def g(i): > for j in range(10): > yield i,j Yes, I realised that shortly afterwards. But I think we're going to get a lot of questions from newcomers who have tried to implicitly nest iterators and are very confused about why it doesn't work and what needs to be done to make it work. An explicit generator definition syntax would help here, I think. First of all, it would be a syntax error to use "yield" outside of a generator definition, so they would be forced to declare the inner one as a generator. Then, if they neglected to make the outer one a generator too, it would look like this: def f(): for i in range(5): g(i) generator g(i): for j in range(10): yield i,j from which it is glaringly obvious that f() is NOT a generator, and therefore can't be used as one. -- Greg Ewing, Computer Science Dept, University of Canterbury, Christchurch, New Zealand To get my email address, please visit my web page: http://www.cosc.canterbury.ac.nz/~greg From loewis at informatik.hu-berlin.de Wed Jun 20 12:27:30 2001 From: loewis at informatik.hu-berlin.de (Martin von Loewis) Date: Wed, 20 Jun 2001 12:27:30 +0200 (MEST) Subject: [Python-Dev] Re: example of module interface to a varargs function? In-Reply-To: <15151.52137.623119.852524@beluga.mojam.com> (message from Skip Montanaro on Tue, 19 Jun 2001 17:01:13 -0500) References: <200106192125.XAA27631@pandora.informatik.hu-berlin.de> <15151.52137.623119.852524@beluga.mojam.com> Message-ID: <200106201027.MAA06782@pandora.informatik.hu-berlin.de> > I think it would be nice if PyArg_ParseTuple and friends took a "*" format > character. It would only be useful at the end of a format string, but would > allow the generic argument parsing machinery to be used for those arguments > that precede it. Now I understand. Yes, that would be useful, but apparently was not required often enough so far to make somebody ask for it. Regards, Martin From aahz at rahul.net Wed Jun 20 15:00:08 2001 From: aahz at rahul.net (Aahz Maruch) Date: Wed, 20 Jun 2001 06:00:08 -0700 (PDT) Subject: [Python-Dev] Suggested amendment to PEP 255 In-Reply-To: <3B303AD0.1884E173@cosc.canterbury.ac.nz> from "Greg Ewing" at Jun 20, 2001 05:55:28 PM Message-ID: <20010620130008.7880D99C88@waltz.rahul.net> Greg Ewing wrote: > > I suggest dropping the "def" altogether and using: > > generator foo(args): > ... > yield x > ... +2 -- --- Aahz (@pobox.com) Hugs and backrubs -- I break Rule 6 <*> http://www.rahul.net/aahz/ Androgynous poly kinky vanilla queer het Pythonista I don't really mind a person having the last whine, but I do mind someone else having the last self-righteous whine. 
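For readers following the thread, the behavior Greg and Aahz are reacting to is easy to demonstrate with the PEP 255 reference implementation (a minimal sketch; 2.2-era spelling, so the iterator is advanced with .next()):

    def f():
        print "entered f"    # does NOT run when f() is called
        yield 1

    g = f()          # this only creates the generator-iterator;
                     # no code in f's body has executed yet
    print g.next()   # only now does the body run: "entered f", then 1

Whether that call-time behavior deserves a syntax more distinctive than def is exactly the question on the table.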
From nas at python.ca Wed Jun 20 16:28:20 2001 From: nas at python.ca (Neil Schemenauer) Date: Wed, 20 Jun 2001 07:28:20 -0700 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Python ceval.c,2.250,2.251 In-Reply-To: ; from tim_one@users.sourceforge.net on Tue, Jun 19, 2001 at 11:57:34PM -0700 References: Message-ID: <20010620072820.A16584@glacier.fnational.com> Tim Peters wrote: > gen_iternext(): repair subtle refcount problem. > NeilS, please check! This came from staring at your genbug.py, but I'm > not sure it plugs all possible holes. Without this, I caught a > frameobject refcount going negative, and it was also the cause (in debug > build) of _Py_ForgetReference's attempt to forget an object with already- > NULL _ob_prev and _ob_next pointers -- although I'm still not entirely > sure how! Doesn't this cause a memory leak? f_back is INCREFed in PyFrame_New. There are other problems lurking here as well. def f(): try: yield 1 finally: print "finally" def h(): g = f() g.next() while 1: h() The above code leaks memory like mad, with or without your change. Also, the finally clause is never executed although it probably should be. My feeling is that the reference counting of f_back should be done by ceval and not by the frame object. The problem with the finally clause is another ball of wax. I think it's fixable though. I'll look at it closer this evening. Neil From tim.one at home.com Wed Jun 20 16:28:19 2001 From: tim.one at home.com (Tim Peters) Date: Wed, 20 Jun 2001 10:28:19 -0400 Subject: [Python-Dev] Suggested amendment to PEP 255 In-Reply-To: <3B303AD0.1884E173@cosc.canterbury.ac.nz> Message-ID: [Greg Ewing] > ... Why is this on Python-Dev? The PEP announcement specifically asked for discussion to occur on the Iterators list, and specifically asked to keep it *off* of Python-Dev. I've been playing along with people who wanted to discuss it on c.l.py instead, as finite time allows, but no way does the discussion belong here. From arigo at ulb.ac.be Wed Jun 20 16:30:49 2001 From: arigo at ulb.ac.be (Armin Rigo) Date: Wed, 20 Jun 2001 16:30:49 +0200 (MET DST) Subject: [Python-Dev] Suggested amendment to PEP 255 In-Reply-To: <3B303AD0.1884E173@cosc.canterbury.ac.nz> Message-ID: Hi, On Wed, 20 Jun 2001, Greg Ewing wrote: > I suggest dropping the "def" altogether and using: > > generator foo(args): > ... > yield x > ... Nice idea. We might even think about dropping the 'yield' keyword altogether and using 'return' instead (although I'm not quite sure it is a good idea; I'm just suggesting it with a personal -0.5). A bientot, Armin. From tim.one at home.com Wed Jun 20 16:41:13 2001 From: tim.one at home.com (Tim Peters) Date: Wed, 20 Jun 2001 10:41:13 -0400 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Python ceval.c,2.250,2.251 In-Reply-To: <20010620072820.A16584@glacier.fnational.com> Message-ID: [Neil Schemenauer] > Doesn't this cause a memory leak? f_back is INCREFed in > PyFrame_New. There are other problems lurking here as well. > ... Our msgs crossed in the mail. Unfortunately, I have to get off email now and probably won't get on again before this evening. Tracebacks appear to be a potential problem too ... we'll-reinvent-stackless-before-this-is-over<0.9-wink>-ly y'rs - tim From barry at digicool.com Wed Jun 20 18:35:49 2001 From: barry at digicool.com (Barry A.
Warsaw) Date: Wed, 20 Jun 2001 12:35:49 -0400 Subject: [Python-Dev] Suggested amendment to PEP 255 References: <3B303AD0.1884E173@cosc.canterbury.ac.nz> Message-ID: <15152.53477.212348.243592@anthem.wooz.org> >>>>> "GE" == Greg Ewing writes: GE> What matters to the caller is irrelevant here. We're talking GE> about what matters to someone writing or reading the GE> implementation. To those people, there is a VERY big GE> difference between a regular function and a GE> generator-function -- about as big as the difference GE> between a class and a function! GE> In fact, a generator-function is in many ways much more GE> like a class than a function. Calling a generator-function GE> doesn't execute any of the code in its body; instead, it GE> creates an instance of the generator, much like calling GE> a class creates an instance of the class. Calling them GE> "generator classes" and "generator instances" would GE> perhaps be more appropriate, and more suggestive of the GE> way they actually behave. Thanks Greg, I think you've captured perfectly my discomfort with the proposal. I'm fine with return being "special" inside a generator, along with most of the other details of the pep. But it bugs me that the semantics of calling the thing created by `def' is different depending on some statement embedded deep in the body of the code. Think about it from a teaching perspective: You're taught that def creates a function, perhaps called foo. You know that calling foo starts execution at the first line in the function block. You know you can put a print statement on the first line and it will print something out when the function is called. You know that you can set a debugger break point at foo's first line and when you call the function, the debugger will leave you on that first line of code. But all that changes with a generator! My print statement isn't executed when I call the function... how weird! Hey, the debugger doesn't even break on the line when I call the function. Okay, maybe it's some /other/ foo my program is really calling. So let's hunt around for other possible foo's that my program might be calling. Hmm, no dice there. Now I'm really confused because I haven't gotten to the chapter that says "Now that you know all about functions, forget most of that if you find a yield statement in the body of the function, because it's a special kind of function called a generator. Calling such a special function doesn't execute any code; it just instantiates a built-in object called a generator object. To get any of the generator's code to execute, you have to call the generator object's next() method." Further, I print out the type of the object returned by calling foo and I see it's a <generator object>. Okay, so now let me search foo for a return statement. Because I know about functions, and I know that the returned object isn't None, I know that the function isn't falling off the end. So there must be a return statement that explicitly returns a generator object (whatever that is). Hmm, nope, there's just a bare return sitting there. That's damn confusing. I wonder what those yield statements are doing. Well, I look those up in my book's index and I see that's described in chapter 57, which I haven't gotten to yet. Besides, those yields clearly have integers after them, so that can't be it. So how the heck do I get a generator object by calling this function??? You'll counter that the "search for yield to find out if the function is special" is a simple rule that, once learned, is easily remembered.
I'll counter that it's harder for me to do an Isearch in XEmacs to find out what kind of thing foo is. :) To me, it's just bad mojo to have the behavior of the thing created by `def' determined by what's embedded in the body of the program. I don't buy the defint argument, because by searching for a return statement in the function, you can find out exactly what is being returned when the function is called. Not so with a generator. My vote is for a "generator" keyword to introduce the code block of a generator. Makes perfect sense to me, and it will be a strong indication to anybody reading my code that something special is going on. And something special /is/ going on! An informal poll of PythonLabs indicates a split on this subject, perhaps setting Jeremy up as a Sandra Day O'Connor swing vote. But who said this was a democracy anyway? :) somewhat-like-my-own-country-of-origin-ly y'rs, -Barry From tim at digicool.com Wed Jun 20 18:42:00 2001 From: tim at digicool.com (Tim Peters) Date: Wed, 20 Jun 2001 12:42:00 -0400 Subject: [Python-Dev] Suggested amendment to PEP 255 In-Reply-To: <15152.53477.212348.243592@anthem.wooz.org> Message-ID: Please keep this off Python-Dev. Paul Prescod has already fwd'ed Greg's msg to the Iterators list, and let's keep it there. From fredrik at pythonware.com Wed Jun 20 18:54:22 2001 From: fredrik at pythonware.com (Fredrik Lundh) Date: Wed, 20 Jun 2001 18:54:22 +0200 Subject: [Python-Dev] Suggested amendment to PEP 255 References: <3B303AD0.1884E173@cosc.canterbury.ac.nz> <15152.53477.212348.243592@anthem.wooz.org> Message-ID: <006d01c0f9a9$a879fcd0$4ffa42d5@hagrid> barry wrote: > My vote is for a "generator" keyword to introduce the code block of a > generator. Makes perfect sense to me, and it will be a strong > indication to anybody reading my code that something special is going > on. And something special /is/ going on! agreed. +1 on generator instead of def. (and +0 on suspend instead of yield, but that's me) Cheers /F From jeremy at alum.mit.edu Wed Jun 20 19:25:05 2001 From: jeremy at alum.mit.edu (Jeremy Hylton) Date: Wed, 20 Jun 2001 13:25:05 -0400 Subject: [Python-Dev] Suggested amendment to PEP 255 In-Reply-To: Message-ID: Why can't we discuss Python development on python-dev? please-take-replies-to-python-dev-meta-ly y'rs, Jeremy -----Original Message----- From: python-dev-admin at python.org [mailto:python-dev-admin at python.org]On Behalf Of Tim Peters Sent: Wednesday, June 20, 2001 12:42 PM To: Barry A. Warsaw Cc: python-dev at python.org Subject: RE: [Python-Dev] Suggested amendment to PEP 255 Please keep this off Python-Dev. Paul Prescod has already fwd'ed Greg's msg to the Iterators list, and let's keep it there. _______________________________________________ Python-Dev mailing list Python-Dev at python.org http://mail.python.org/mailman/listinfo/python-dev From tim at digicool.com Wed Jun 20 20:28:17 2001 From: tim at digicool.com (Tim Peters) Date: Wed, 20 Jun 2001 14:28:17 -0400 Subject: [Python-Dev] Suggested amendment to PEP 255 In-Reply-To: Message-ID: [Jeremy Hylton] > Why can't we discuss Python development on python-dev? You can, but without me in this case. The arguments aren't new (they were discussed on the Iterators list before the PEP was posted), and I don't have time to repeat them on (now three) different forums.
The PEP announcement clearly said discussion belonged on the Iterators list, specifically asked that it stay off of Python-Dev, and the PEP Discussion-To field (which I assume Barry filled in -- I did not) reads Discussion-To: python-iterators at lists.sourceforge.net If you want a coherent historic record (I do), that's where this belongs. From aahz at rahul.net Wed Jun 20 20:37:49 2001 From: aahz at rahul.net (Aahz Maruch) Date: Wed, 20 Jun 2001 11:37:49 -0700 (PDT) Subject: [Python-Dev] Suggested amendment to PEP 255 In-Reply-To: from "Jeremy Hylton" at Jun 20, 2001 01:25:05 PM Message-ID: <20010620183749.B419E99C82@waltz.rahul.net> Jeremy Hylton wrote: > > Why can't we discuss Python development on python-dev? I'm split on this issue. I understand why Tim wants to have the discussion corralled into a single place; it's also a moderate inconvenience to have to add another mailing list every time a "critical" issue comes up. I think the best compromise is to follow the rules currently in existence for the PEP process, and if one doesn't wish to subscribe to another mailing list, e-mail one's feedback to the PEP author directly and raise bloody hell if the next PEP revision doesn't include a mention of the feedback. -- --- Aahz (@pobox.com) Hugs and backrubs -- I break Rule 6 <*> http://www.rahul.net/aahz/ Androgynous poly kinky vanilla queer het Pythonista I don't really mind a person having the last whine, but I do mind someone else having the last self-righteous whine. From barry at digicool.com Wed Jun 20 21:07:00 2001 From: barry at digicool.com (Barry A. Warsaw) Date: Wed, 20 Jun 2001 15:07:00 -0400 Subject: [Python-Dev] Suggested amendment to PEP 255 References: Message-ID: <15152.62548.504923.152041@anthem.wooz.org> >>>>> "TP" == Tim Peters writes: TP> and the PEP Discussion-To field (which I assume Barry filled TP> in -- I did not) reads Not me. I believe it was in Magnus's original version of the PEP. But I do think that now that the code is in the main CVS trunk, it is appropriate to remove the Discussion-To: header and redirect comments back to python-dev. That may be difficult in practice however. -Barry From jack at oratrix.nl Wed Jun 20 23:52:16 2001 From: jack at oratrix.nl (Jack Jansen) Date: Wed, 20 Jun 2001 23:52:16 +0200 Subject: [Python-Dev] _PyTrace_init declaration Message-ID: <20010620215221.1697FE267B@oratrix.oratrix.nl> I'm getting "no prototype" warnings on _PyTrace_init, and inspection shows that this routine indeed doesn't show up in an include file. As it is used elsewhere (in sysmodule.c) shouldn't it be called PyTrace_init and have it's prototype declared somewhere? -- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | ++++ see http://www.xs4all.nl/~tank/ ++++ From tim.one at home.com Thu Jun 21 00:31:10 2001 From: tim.one at home.com (Tim Peters) Date: Wed, 20 Jun 2001 18:31:10 -0400 Subject: [Python-Dev] _PyTrace_init declaration In-Reply-To: <20010620215221.1697FE267B@oratrix.oratrix.nl> Message-ID: [Jack Jansen] > I'm getting "no prototype" warnings on _PyTrace_init, and inspection > shows that this routine indeed doesn't show up in an include file. As > it is used elsewhere (in sysmodule.c) shouldn't it be called > PyTrace_init and have it's prototype declared somewhere? It should indeed be declared in ceval.h (Fred?), but so long as it's part of the private API it should not lose the leading underscore. 
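Concretely, the declaration being asked for would look something like the following in Include/ceval.h. This is only a sketch: the int return type and (void) argument list are assumptions made here for illustration, not checked against the source -- the real prototype has to be copied from the definition in ceval.c:

    /* Private API: the leading underscore is kept on purpose.
       Signature assumed, not verified. */
    extern DL_IMPORT(int) _PyTrace_init(void);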
From thomas at xs4all.net Thu Jun 21 00:29:51 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Thu, 21 Jun 2001 00:29:51 +0200 Subject: [Python-Dev] _PyTrace_init declaration In-Reply-To: <20010620215221.1697FE267B@oratrix.oratrix.nl> References: <20010620215221.1697FE267B@oratrix.oratrix.nl> Message-ID: <20010621002951.H8098@xs4all.nl> On Wed, Jun 20, 2001 at 11:52:16PM +0200, Jack Jansen wrote: > I'm getting "no prototype" warnings on _PyTrace_init, and inspection > shows that this routine indeed doesn't show up in an include file. As > it is used elsewhere (in sysmodule.c) shouldn't it be called > PyTrace_init and have it's prototype declared somewhere? No, and yes. the _Py* functions are internal, but non-static (used in other files.) They should have a prototype declared somewhere, but they shouldn't be used outside of Python itself. It shouldn't be named 'PyTrace_init' unless it is a supported part of the API. -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From greg at cosc.canterbury.ac.nz Thu Jun 21 01:39:17 2001 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Thu, 21 Jun 2001 11:39:17 +1200 (NZST) Subject: [Python-Dev] Suggested amendment to PEP 255 In-Reply-To: Message-ID: <200106202339.LAA04351@s454.cosc.canterbury.ac.nz> > The PEP announcement specifically asked for > discussion to occur on the Iterators list Sorry, I missed that - I was paying more attention to the PEP itself than what the announcement said. Going now to subscribe to the iterators list forthwith. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg at cosc.canterbury.ac.nz +--------------------------------------+ From jeremy at alum.mit.edu Thu Jun 21 01:47:28 2001 From: jeremy at alum.mit.edu (Jeremy Hylton) Date: Wed, 20 Jun 2001 19:47:28 -0400 Subject: [Python-Dev] Suggested amendment to PEP 255 In-Reply-To: <15152.53477.212348.243592@anthem.wooz.org> Message-ID: > My vote is for a "generator" keyword to introduce the code block of a > generator. Makes perfect sense to me, and it will be a strong > indication to anybody reading my code that something special is going > on. And something special /is/ going on! > > An informal poll of PythonLabs indicates a split on this subject, > perhaps setting Jeremy up as a Sandra Day O'Conner swing vote. But > who said this was a democracy anyway? :) > > somewhat-like-my-own-country-of-origin-ly y'rs, > -Barry That's a nice analogy, Ruth Barry Ginsburg; a Supreme Court, which appoints the president, seems a closer fit to Python's dictatorship than some sort of democratic process. I wasn't present for the oral arguments, but I'm sure we all know how Tim Scalia voted and that Guido van Clarence Thomas agreed without comment. I assume, then, that Anthony Kennedy Jr. joined you, although he's often a swing vote, too. Can't wait to hear the report from Nina "Michael Hudson" Totenberg. I was originally happy with the use of def. It's not much of a stretch since the def statement defines a code block that has formal parameters and creates a new scope. I certainly wouldn't be upset if Python ended up using def to define a generator. I appreciate, though, that the definition of a generator may look an awful lot like a function. I can imagine a user reading a module, missing the yield statement, and trying to use the generator as a function. 
I can't imagine this would happen often. My limited experience with CLU suggests that iterators aren't going to be huge, unwieldy blocks where it's hard to see what the ultimate control flow is. If a confused user treats a generator as a regular function, he or she certainly can't expect it to return anything useful, since all the return statements are bare returns; the expected behavior would be some side-effect on global state, which seems both unlikely and unseemly for an iterator. I'm not sure how hard it will be to explain generators to new users. I expect you would teach functions and iteration via for loops, then explain that there is a special kind of function called a generator that can be used in a for loop. It uses a yield statement instead of a return statement to return values. Not all that hard. If we use a different keyword to introduce them, you'd probably explain them much the same way: A generator is a special kind of function that can be used in a for loop and is defined with generator instead of def. As other people have mentioned, Icon doesn't use special syntax to introduce generators. We might as well look at CLU, too, which took a different approach. You can view the CLU Reference Manual at: http://ncstrl.mit.edu/Dienst/UI/2.0/Describe/ncstrl.mit_lcs%2fMIT%2fLCS%2fTR-225 It uses "proc" to introduce a procedure and "iter" to introduce an iterator. See page 72 for the details: http://ncstrl.mit.edu/Dienst/UI/2.0/Page/ncstrl.mit_lcs%2fMIT%2fLCS%2fTR-225/72 It's a toss-up, then, between the historical antecedents Icon and CLU. I'd tend to favor a new keyword for generators, but could be talked out of that position. Jeremy From fdrake at acm.org Thu Jun 21 01:57:57 2001 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Wed, 20 Jun 2001 19:57:57 -0400 (EDT) Subject: [Python-Dev] _PyTrace_init declaration In-Reply-To: <20010620215221.1697FE267B@oratrix.oratrix.nl> References: <20010620215221.1697FE267B@oratrix.oratrix.nl> Message-ID: <15153.14469.903865.533713@cj42289-a.reston1.va.home.com> Jack Jansen writes: > I'm getting "no prototype" warnings on _PyTrace_init, and inspection > shows that this routine indeed doesn't show up in an include file. As > it is used elsewhere (in sysmodule.c) shouldn't it be called > PyTrace_init and have it's prototype declared somewhere? No. I thought I had a prototype for it just above the usage. Anyway, I'm re-working that code this week, so you can assign this to me in the bug tracker. Thanks! -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From guido at digicool.com Thu Jun 21 16:32:40 2001 From: guido at digicool.com (Guido van Rossum) Date: Thu, 21 Jun 2001 10:32:40 -0400 Subject: [Python-Dev] PEP 255 - BDFL Pronouncement: 'def' it stays Message-ID: <200106211432.f5LEWeA03163@odiug.digicool.com> I've thought long and hard and tried to read almost all the mail on this topic, and I cannot get myself to change my mind. No argument on either side is totally convincing, so I have consulted my language designer's intuition. It tells me that the syntax proposed in the PEP is exactly right - not too hot, not too cold. But, like the Oracle at Delphi in Greek mythology, it doesn't tell me why, so I don't have a rebuttal for the arguments against the PEP syntax. The best I can come up with (apart from agreeing with the rebuttals that Tim and others have already made) is "FUD". If this had been part of the language from day one, I very much doubt it would have made Andrew Kuchling's "Python Warts" page.
So I propose that Tim and others defending 'def' save their remaining breath, and I propose that Paul and others in favor of 'gen[erator]' start diverting their energy towards thinking about how to best teach generators the PEP syntax. Tim, please add a BDFL pronouncement to the PEP to end the argument. You can also summarize the arguments on either side, for posterity -- without trying to counter them. I found one useful comment on the PEP that isn't addressed and is orthogonal to the whole discussion: try/finally. When you have a try/finally around a yield statement, it is possible that the finally clause is not executed at all when the iterator is never resumed. I find this disturbing, and am tempted to propose that yield inside try/finally be disallowed (but yield inside try/except is still allowed). Another idea might be to somehow continue the frame with an exception at this point -- but I don't have a clue what exception would be appropriate (StopIteration isn't because it goes in the other direction) and I don't know what to do if the generator catches the exception and tries to yield again (maybe the exception should be raised again?). The continued execution of the frame would be part of the destructor for the generator-iterator object, so, like a __del__ method, any unhandled exceptions wouldn't be able to propagate out of it. PS I lost my personal archive of the last 18 hours of the iter mailing list, and the web archive is down, alas, so I'm writing this from memory. I *did* read most of the messages in my archive before I accidentally deleted it, though. ;-) --Guido van Rossum (home page: http://www.python.org/~guido/) From tdickenson at devmail.geminidataloggers.co.uk Thu Jun 21 17:02:54 2001 From: tdickenson at devmail.geminidataloggers.co.uk (Toby Dickenson) Date: Thu, 21 Jun 2001 16:02:54 +0100 Subject: [Python-Dev] Re: [Python-iterators] PEP 255 - BDFL Pronouncement: 'def' it stays In-Reply-To: <200106211432.f5LEWeA03163@odiug.digicool.com> References: <200106211432.f5LEWeA03163@odiug.digicool.com> Message-ID: On Thu, 21 Jun 2001 10:32:40 -0400, Guido van Rossum wrote: > Another idea might be to somehow continue the frame with an >exception at this point -- but I don't have a clue what exception >would be appropriate (StopIteration isn't because it goes in the other >direction) I'm sure any exception is appropriate there. What about restarting the frame as if the 'yield' had been followed by a 'return'? Toby Dickenson tdickenson at geminidataloggers.com From mwh at python.net Fri Jun 22 01:20:17 2001 From: mwh at python.net (Michael Hudson) Date: Fri, 22 Jun 2001 00:20:17 +0100 (BST) Subject: [Python-Dev] python-dev summary 2001-06-07 - 2001-06-21 Message-ID: This is a summary of traffic on the python-dev mailing list between June 7 and June 21 (inclusive) 2001. It is intended to inform the wider Python community of ongoing developments. To comment, just post to python-list at python.org or comp.lang.python in the usual way. Give your posting a meaningful subject line, and if it's about a PEP, include the PEP number (e.g. Subject: PEP 201 - Lockstep iteration) All python-dev members are interested in seeing ideas discussed by the community, so don't hesitate to take a stance on a PEP if you have an opinion. This is the tenth summary written by Michael Hudson.
Summaries are archived at: Posting distribution (with apologies to mbm) Number of articles in summary: 192 [ASCII bar chart of daily posting volume, June 7 through June 20: 19, 14, 1, 3, 14, 39, 26, 13, 9, 4, 1, 5, 23, 21 articles per day] Quiet fortnight. * Adding .decode() method to Unicode * Marc-Andre Lemburg asked for opinions on adding a .decode method to unicode objects: He certainly got them; the responses ranged from neutral to negative, and there was a surprising amount of hostility in the air. The problem (as ever in these matters) seems to be that Python currently uses the same type for 8-bit strings and gobs of arbitrary data. Guido came to the rescue and calmed everyone down: since when discussion has vanished again. * Adding Asian codecs to the core * Marc-Andre Lemburg announced that Tamito KAJIYAMA has decided to relicense his Japanese codecs with a BSD-style license, enabling them to be included in the core: This is clearly a good thing; the only quibble is that the encodings are by their nature rather large, so they will probably go into a separate directory in CVS (probably python/dist/encodings/) and not go into the source tarball released on python.org. * Omit printing newline after newline * As readers of comp.lang.python will have noticed, Guido posted: and retracted: PEP 259, a proposal for changing the behaviour of the print statement. * sre "improvements" * Gustavo Niemeyer asked if anyone planned to add the "(?(1)blah)" re operators to Python: but Python is not perl and there wasn't much support for making regular expressions more baffling than they already are. * Generators * In a discussion that slobbered across comp.lang.python, python-dev and the python-iterators list at sf (and belongs on the latter!) there was much talk of PEP 255, Simple Generators. Most was positive; the main dissent was from people that thought it was too hard to tell a generator from a regular function (at the source level). However Guido listened to Tim's repeated claims that this is insignificant once you've actually used generators once or twice and Pronounced "'def' it is": and noticed that there are still some issues wrt try/finally blocks. However, clever people seem to be thinking about it, so I'm sure the problem's days are numbered :-) I should also note that the gen-branch has been checked into the trunk of CVS. Woohoo! Cheers, M. From arigo at ulb.ac.be Fri Jun 22 13:00:34 2001 From: arigo at ulb.ac.be (Armin Rigo) Date: Fri, 22 Jun 2001 13:00:34 +0200 (MET DST) Subject: [Python-Dev] Python Specializing Compiler Message-ID: Hello everybody, I implemented a proof-of-concept version of a "Python compiler". It is not really a compiler. I know perfectly well that you cannot compile Python into something more efficient than a bunch of calls to PyObject_xxx.
Still, this very preliminary version runs the following function twice as fast as the python interpreter: def f(n): result = 0 i = 0 while i < n: result = result + i i = i + 1 return result From jepler at inetnebr.com Fri Jun 22 14:18:46 2001 From: jepler at inetnebr.com (Jeff Epler) Date: Fri, 22 Jun 2001 07:18:46 -0500 Subject: [Python-Dev] Python Specializing Compiler In-Reply-To: ; from arigo@ulb.ac.be on Fri, Jun 22, 2001 at 01:00:34PM +0200 References: Message-ID: <20010622071846.A7014@craie.housenet> On Fri, Jun 22, 2001 at 01:00:34PM +0200, Armin Rigo wrote: > Hello everybody, > > I implemented a proof-of-concept version of a "Python compiler". It is not > really a compiler. I know perfectly well that you cannot compile Python > into something more efficient than a bunch of calls to PyObject_xxx. > Still, this very preliminary version runs the following function twice as > fast as the python interpreter: I've implemented something similar, but didn't get such favorable results yet. I was concentrating more on implementing a type system and code to infer type information, and had spent less time on the code generation. (For instance, my system could determine the result type of subscript-type operations, and infer the types of lists over a loop, as in: l1 = [1,3.14159, "tubers"] l2 = [0]*3 for j in range(3): l2[j] = l1[j-3] # Type of l2 is HeterogeneousListType([IntType, FloatType, # StringType]) You could make it run forever on a pathological case like l = [] while 1: l = [l] with the fix being to "give up" after some number of iterations, and declare the unstable object (l) as having type "ObjectType", which is always correct but overbroad. My code is still available, but my motivation has faded somewhat and I haven't had the time to work on it recently in any case. It uses "GNU Lightning" for JIT code generation, rather than using an external compiler. (If I were to approach the problem again, I might discard the JIT code generator in favor of starting over again with the python2c compiler and adding type information) It can make judgements about sequences of calls, such as def f(): return g() when g is given the "solid" attribute, and the compilation process begins by hoisting the former global load of g into a constant load, something like def make_f(): local_g = g def f(): return local_g() return f f = make_f() What are you using to generate code? How would you compare the sophistication of your type inference system to the one I've outlined above? Jeff From Greg.Wilson at baltimore.com Fri Jun 22 14:34:17 2001 From: Greg.Wilson at baltimore.com (Greg Wilson) Date: Fri, 22 Jun 2001 08:34:17 -0400 Subject: [Python-Dev] ...und zen, ze world! Message-ID: <930BBCA4CEBBD411BE6500508BB3328F2E26E8@nsamcanms1.ca.baltimore.com> From pedroni at inf.ethz.ch Fri Jun 22 14:59:40 2001 From: pedroni at inf.ethz.ch (Samuele Pedroni) Date: Fri, 22 Jun 2001 14:59:40 +0200 (MET DST) Subject: [Python-Dev] Python Specializing Compiler Message-ID: <200106221259.OAA02519@core.inf.ethz.ch> Hi. Just after reading the README: it's very intriguing and interesting (if I remember well, this resembles the customization approach of the Self VM compiler). Ideally it could evolve into a loadable extension that then works together with the normal interp (unchanged up to offering some hooks*) in a transparent way for the user ... emitting native code for the major platforms or just specialized bytecodes. I will give a serious look at it. regards, Samuele Pedroni. *: some possible useful hooks would be: - minimal profiling support in order to specialize only things called often - feedback for dynamic changing of methods, class hierarchy, ... if we want to optimize method lookup (which would make sense) - a mixed fixed slots/dict layout for instances.
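As an aside, the "give up after some number of iterations" cutoff Jeff describes is easy to sketch in a few lines. This is only a toy model -- the names, the cutoff value, and the string-based "types" are all invented purely for illustration:

    MAX_ROUNDS = 10

    def widen(t, step):
        # Iterate the inference step; if it doesn't reach a fixed
        # point quickly enough, fall back to the catch-all type.
        for dummy in range(MAX_ROUNDS):
            new_t = step(t)
            if new_t == t:
                return t            # converged: inference is stable
            t = new_t
        return "ObjectType"         # unstable, e.g. l = [l] forever

    # The pathological case nests one list deeper per iteration,
    # so it never converges and widens out to the catch-all type:
    print widen("ListType([])", lambda t: "ListType([%s])" % t)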
From nas at python.ca Fri Jun 22 16:43:17 2001 From: nas at python.ca (Neil Schemenauer) Date: Fri, 22 Jun 2001 07:43:17 -0700 Subject: [Python-Dev] why not "return StopIteration"? Message-ID: <20010622074317.A22058@glacier.fnational.com> Is "raise StopIteration" an abuse of exceptions? Why can we not use "return StopIteration" to signal the end of an iterator? I've done a bit of hacking and the idea seems to work. One possible problem is that the StopIteration object in the builtin module could cause some confusing behavior. For example the code: for obj in __builtin__.__dict__.values(): print obj would not work as expected. This could be fixed in most cases by changing the tp_iternext protocol. Something like: int tp_iternext(PyObject *it, PyObject **item) where the return value is 1, 0, or -1. IOW, StopIteration would not have to come into the protocol if the object implemented tp_iternext. Neil From guido at digicool.com Fri Jun 22 18:19:34 2001 From: guido at digicool.com (Guido van Rossum) Date: Fri, 22 Jun 2001 12:19:34 -0400 Subject: [Python-Dev] why not "return StopIteration"? Message-ID: <200106221619.f5MGJY306866@odiug.digicool.com> This is treated extensively in the discussion section of the iterators-PEP; quoting: - It has been questioned whether an exception to signal the end of the iteration isn't too expensive. Several alternatives for the StopIteration exception have been proposed: a special value End to signal the end, a function end() to test whether the iterator is finished, even reusing the IndexError exception. - A special value has the problem that if a sequence ever contains that special value, a loop over that sequence will end prematurely without any warning. If the experience with null-terminated C strings hasn't taught us the problems this can cause, imagine the trouble a Python introspection tool would have iterating over a list of all built-in names, assuming that the special End value was a built-in name! - Calling an end() function would require two calls per iteration. Two calls is much more expensive than one call plus a test for an exception. Especially the time-critical for loop can test very cheaply for an exception. - Reusing IndexError can cause confusion because it can be a genuine error, which would be masked by ending the loop prematurely. I'm not sure why you are reopening this -- special terminating values are evil IMO. :-) --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at digicool.com Fri Jun 22 18:20:43 2001 From: guido at digicool.com (Guido van Rossum) Date: Fri, 22 Jun 2001 12:20:43 -0400 Subject: [Python-Dev] Python Specializing Compiler Message-ID: <200106221620.f5MGKib06875@odiug.digicool.com> Very cool, Armin! Did you announce this on c.l.py too? I wish I had time to look at this in more detail -- but please do go on developing it, and look at what others have tried... --Guido van Rossum (home page: http://www.python.org/~guido/) From barry at digicool.com Fri Jun 22 18:30:44 2001 From: barry at digicool.com (Barry A. Warsaw) Date: Fri, 22 Jun 2001 12:30:44 -0400 Subject: [Python-Dev] why not "return StopIteration"? References: <200106221619.f5MGJY306866@odiug.digicool.com> Message-ID: <15155.29364.416545.301534@anthem.wooz.org> >>>>> "GvR" == Guido van Rossum writes: | - Calling an end() function would require two calls per | iteration. Two calls is much more expensive than one call | plus a test for an exception. Especially the time-critical | for loop can test very cheaply for an exception.
Plus, if the exception is both raised and caught in C, it is never instantiated, so exception matching is a pointer compare. I know this isn't the case with user-defined iterators (since Python's raise semantics is to instantiate the exception), but it helps. -Barry From guido at digicool.com Fri Jun 22 19:12:20 2001 From: guido at digicool.com (Guido van Rossum) Date: Fri, 22 Jun 2001 13:12:20 -0400 Subject: [Python-Dev] Python 2.0.1 released! Message-ID: <200106221712.f5MHCLF07192@odiug.digicool.com> I'm happy to announce Python 2.0.1 -- the final release of the first Python version in a long time whose license is fully compatible with the GPL: http://www.python.org/2.0.1/ I thank Moshe Zadka who did almost all of the work to make this a useful bugfix release, and then went incommunicado for several weeks. (I hope you're OK, Moshe!) Compared to the release candidate, we've fixed a few typos in the license, tweaked the documentation a bit, and fixed an indentation error in statcache.py; other than that, the release candidate was perfect. :-) Python 2.0 users should be able to replace their 2.0 installation with the 2.0.1 release without any ill effects; apart from the license change, we've only fixed bugs that didn't require us to make feature changes. The SRE package (regular expression matching, used by the "re" module) was brought in line with the version distributed with Python 2.1; this is stable feature-wise but much improved bug-wise. For the full scoop, see the release notes on SourceForge: http://sourceforge.net/project/shownotes.php?release_id=40616 Python 2.1 users can ignore this release, unless they have an urgent need for a GPL-compatible Python version and are willing to downgrade. Rest assured that we're planning a bugfix release there too: I expect that Python 2.1.1 will be released within a month, with the same GPL-compatible license. (Right, Thomas?) We don't intend to build RPMs for 2.0.1. If someone else is interested in doing so, we can link to them. --Guido van Rossum (home page: http://www.python.org/~guido/) From tim.one at home.com Fri Jun 22 19:21:03 2001 From: tim.one at home.com (Tim Peters) Date: Fri, 22 Jun 2001 13:21:03 -0400 Subject: [Python-Dev] why not "return StopIteration"? In-Reply-To: <20010622074317.A22058@glacier.fnational.com> Message-ID: [Neil Schemenauer] > Is "raise StopIteration" an abuse of exceptions? I only care whether it works <wink>. It certainly came as a surprise to me, though, that I'm going to need to fiddle PEP 255 to explain that return in a generator isn't really equivalent to raise StopIteration (because a return in the try-part of a try/except should not trigger the except-part if the generator is pumped again). While a minor wart, it's a wart. If this stands, I'm going to look into changing gen_iternext() to determine whether eval_frame() finished by raising StopIteration, and mark the iterator as done if so. That is, force "return" and "raise StopIteration" to act the same inside generators, and to force "raise StopIteration" inside a generator to truly *mean* "I'm done" in all cases. This would also allow avoiding the proposed special-casing of generators at the tail end of eval_frame() (yes, I'm anal <0.9 wink>: since it's a problem unique to generators, this simply should not be eval_frame's problem to solve -- if generators create the problem, generators should pay to solve it). > Why can we not use "return StopIteration" to signal the end of an > iterator? Just explained why not yesterday, and you did two sentences later <wink>.
> .... > This could be fixed in most cases by changing the tp_iternext > protocol. Something like: > > int tp_iternext(PyObject *it, PyObject **item) > > where the return value is 1, 0, or -1. Meaning 13, 42, and 666 respectively <wink>? That is, one for "error", one for "OK, and item is the next value", and one for "no error but no next value either -- this iterator terminated normally"? That could work. At one point during the development of the iterator PEP, Guido had some code like that in the internals, on *top* of the exception business. It was clumsy then because redundant. At the level of Python code, how would a user spell "end of iteration"? Would iterators need to return a 2-tuple in all non-exception cases then, e.g. a (next_value, i_am_done_flag) pair? Or would Python-level iterators simply be unable to return StopIteration as a normal value? > IOW, StopIteration would not have to come into the protocol if the > object implemented tp_iternext. All iterable objects in 2.2 implement tp_iternext, although sometimes it's a Miranda tp_iternext (i.e., one created for an object that doesn't supply its own), so that shouldn't be a worry. All in all, I'm -0 on changing the exception approach -- it's worked very well so far. From thomas at xs4all.net Fri Jun 22 20:02:59 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Fri, 22 Jun 2001 20:02:59 +0200 Subject: [Python-Dev] why not "return StopIteration"? In-Reply-To: References: Message-ID: <20010622200259.N8098@xs4all.nl> On Fri, Jun 22, 2001 at 01:21:03PM -0400, Tim Peters wrote: > If this stands, I'm going to look into > changing gen_iternext() to determine whether eval_frame() finished by > raising StopIteration, and mark the iterator as done if so. That is, force > "return" and "raise StopIteration" to act the same inside generators, and to > force "raise StopIteration" inside a generator to truly *mean* "I'm done" in > all cases. This would also allow avoiding the proposed special-casing of > generators at the tail end of eval_frame() (yes, I'm anal <0.9 wink>: since > it's a problem unique to generators, this simply should not be eval_frame's > problem to solve -- if generators create the problem, generators should pay > to solve it). I don't get this. Currently, (unless Just checked in his patch) generators work in exactly that way: the compiler compiles 'return' into 'raise StopIteration' if it encounters it inside a generator, and into a regular return otherwise. Why would you ask for the patch Just provided, and then change it back? -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From tim.one at home.com Fri Jun 22 20:11:13 2001 From: tim.one at home.com (Tim Peters) Date: Fri, 22 Jun 2001 14:11:13 -0400 Subject: [Python-Dev] why not "return StopIteration"? In-Reply-To: <20010622200259.N8098@xs4all.nl> Message-ID: [Thomas Wouters] > I don't get this. Currently, (unless Just checked in his patch) > generators work in exactly that way: the compiler compiles 'return' > into 'raise StopIteration' if it encounters it inside a generator, > and into a regular return otherwise. Yes. The part about analyzing the return value inside gen_iternext() would be the only change from the status quo. > Why would you ask for the patch Just provided, and then change it back? I wouldn't. I asked *you* for a patch (which I haven't yet applied, but will) in a different area, but Just's patch was his own initiative.
I hesitated on that one for reasons beyond just lack of time to get to it, and I'm still reluctant to accept it. My msg sketched an alternative to that patch. Note that Just has also (very recently) sketched another alternative, but on the Iterators list instead. just-isn't-in-need-of-defense-because-he-isn't-being-abused-ly y'rs - tim From fdrake at beowolf.digicool.com Fri Jun 22 20:31:44 2001 From: fdrake at beowolf.digicool.com (Fred Drake) Date: Fri, 22 Jun 2001 14:31:44 -0400 (EDT) Subject: [Python-Dev] [maintenance doc updates] Message-ID: <20010622183144.C6A5428927@beowolf.digicool.com> The development version of the documentation has been updated: http://python.sourceforge.net/maint-docs/ Lots of smallish updates and corrections, moved the license statements to an appendix. From paulp at ActiveState.com Fri Jun 22 20:37:01 2001 From: paulp at ActiveState.com (Paul Prescod) Date: Fri, 22 Jun 2001 11:37:01 -0700 Subject: [Python-Dev] ...und zen, ze world! References: <930BBCA4CEBBD411BE6500508BB3328F2E26E8@nsamcanms1.ca.baltimore.com> Message-ID: <3B33904D.F821FE36@ActiveState.com> > > Interesting that there's as much Perl as assembly code, > and more Fortran than Python :-). The Fortran is basically one big package: LAPACK. A bunch of the Python is 4Suite. If we got Red Hat to ship Zope (or even Python 2.1!) we'd improve our numbers quite a bit. :) -- Take a recipe. Leave a recipe. Python Cookbook! http://www.ActiveState.com/pythoncookbook From esr at thyrsus.com Fri Jun 22 20:46:11 2001 From: esr at thyrsus.com (Eric S. Raymond) Date: Fri, 22 Jun 2001 14:46:11 -0400 Subject: [Python-Dev] ...und zen, ze world! In-Reply-To: <3B33904D.F821FE36@ActiveState.com>; from paulp@ActiveState.com on Fri, Jun 22, 2001 at 11:37:01AM -0700 References: <930BBCA4CEBBD411BE6500508BB3328F2E26E8@nsamcanms1.ca.baltimore.com> <3B33904D.F821FE36@ActiveState.com> Message-ID: <20010622144611.A15388@thyrsus.com> Paul Prescod : > > Interesting that there's as much Perl as assembly code, > > and more Fortran than Python :-). > > The Fortran is basically one big package: LAPACK. A bunch of the Python > is 4Suite. If we got Red Hat to ship Zope (or even Python 2.1!) we'd > improve our numbers quite a bit. :) I'm working on it. -- Eric S. Raymond The whole of the Bill [of Rights] is a declaration of the right of the people at large or considered as individuals... It establishes some rights of the individual as unalienable and which consequently, no majority has a right to deprive them of. -- Albert Gallatin, Oct 7 1789 From fdrake at beowolf.digicool.com Fri Jun 22 20:53:37 2001 From: fdrake at beowolf.digicool.com (Fred Drake) Date: Fri, 22 Jun 2001 14:53:37 -0400 (EDT) Subject: [Python-Dev] [development doc updates] Message-ID: <20010622185337.BE51228927@beowolf.digicool.com> The development version of the documentation has been updated: http://python.sourceforge.net/devel-docs/ Lots of smallish updates and corrections, moved the license statements to an appendix. This version includes some contributed changes to the documentation for the cmath module. To make the LaTeX to HTML conversion work, I have made the resulting HTML contain entity references for the "plus/minus" and "infinity" symbols (± and ∞, respectively). These may be problematic for some browsers. Please let me know how it looks on your browser by sending an email to python-docs at python.org. Be sure to state your browser name and version, and what operating system you are using. Thanks! 
http://python.sourceforge.net/devel-docs/lib/module-cmath.html

From nas at python.ca Fri Jun 22 22:13:14 2001
From: nas at python.ca (Neil Schemenauer)
Date: Fri, 22 Jun 2001 13:13:14 -0700
Subject: [Python-Dev] why not "return StopIteration"?
In-Reply-To: <200106221619.f5MGJY306866@odiug.digicool.com>; from guido@digicool.com on Fri, Jun 22, 2001 at 12:19:34PM -0400
References: <200106221619.f5MGJY306866@odiug.digicool.com>
Message-ID: <20010622131314.A22978@glacier.fnational.com>

Guido van Rossum wrote:
> This is treated extensively in the discussion section of the
> iterators-PEP

Ah. I don't remember reading that part or seeing the discussion. Sorry I brought it up.

Neil

From fdrake at beowolf.digicool.com Fri Jun 22 22:52:48 2001
From: fdrake at beowolf.digicool.com (Fred Drake)
Date: Fri, 22 Jun 2001 16:52:48 -0400 (EDT)
Subject: [Python-Dev] [development doc updates]
Message-ID: <20010622205248.6290128927@beowolf.digicool.com>

The development version of the documentation has been updated:

    http://python.sourceforge.net/devel-docs/

Changed the revised cmath documentation to use "j" as a suffix for complex literals instead of using "i" as a prefix; this is more similar to Python. Changed the font of the suffix to match that used elsewhere in the documentation.

This should be a little more readable, but does not change any potential browser compatibility issues, so I still need reports of compatibility or non-compatibility. See my preliminary report on the topic at:

    http://mail.python.org/pipermail/doc-sig/2001-June/001940.html

From arigo at ulb.ac.be Sat Jun 23 10:13:04 2001
From: arigo at ulb.ac.be (Armin Rigo)
Date: Sat, 23 Jun 2001 10:13:04 +0200 (MET DST)
Subject: [Python-Dev] Python Specializing Compiler
In-Reply-To: <20010622071846.A7014@craie.housenet>
Message-ID:

Hello Jeff,

On Fri, 22 Jun 2001, Jeff Epler wrote:
> What are you using to generate code?

I am generating pseudo-code, which is interpreted by a C module. (With real assembler code, it would of course be much faster, but it was just simpler for the moment.)

> How would you compare the
> sophistication of your type inference system to the one I've outlined
> above?

Yours is much more complete, but runs statically. Mine works at run-time. As explained in detail in the readme file, my plan is not to make a "compiler" in the usual sense. I actually have no type inference; I just collect at run time what types are used at what places, and generate (and possibly modify) the generated code according to that information. (More about it later.)

A bientot,

Armin.

From tim.one at home.com Sat Jun 23 11:17:54 2001
From: tim.one at home.com (Tim Peters)
Date: Sat, 23 Jun 2001 05:17:54 -0400
Subject: [Python-Dev] PEP 255: Simple Generators, Revised Posting
In-Reply-To: Message-ID:

Major revision: more details about exceptions, return vs StopIteration, and interactions with try/except/finally; more Q&A; and a BDFL Pronouncement. The reference implementation appears solid and works as described here in all respects, so I expect this will be the last major revision (and so also last full posting) of this PEP.

The output below is in ndiff format (see Tools/scripts/ndiff.py in your Python distribution). Just the new text can be seen in HTML form here:

    http://python.sf.net/peps/pep-0255.html

"Feature discussions" should take place primarily on the Python Iterators list:

    mailto:python-iterators at lists.sourceforge.net

Implementation discussions may wander in and out of Python-Dev too.
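(For readers unfamiliar with ndiff output: lines starting with "-" appear only in the old revision, lines starting with "+" only in the new one, and "?" guide lines point at intra-line changes. A tiny hedged illustration -- assuming a difflib that exposes ndiff(); otherwise Tools/scripts/ndiff.py produces the same markers:

    # Demonstrate the ndiff markers used in the PEP posting below.
    import difflib
    old = ["Post-History: 14-Jun-2001\n"]
    new = ["Post-History: 14-Jun-2001, 23-Jun-2001\n"]
    for line in difflib.ndiff(old, new):
        print line,
    # Output, roughly:
    # - Post-History: 14-Jun-2001
    # + Post-History: 14-Jun-2001, 23-Jun-2001
    # ?                          +++++++++++++
)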
PEP: 255 Title: Simple Generators - Version: $Revision: 1.3 $ ? ^ + Version: $Revision: 1.12 $ ? ^^ Author: nas at python.ca (Neil Schemenauer), tim.one at home.com (Tim Peters), magnus at hetland.org (Magnus Lie Hetland) Discussion-To: python-iterators at lists.sourceforge.net Status: Draft Type: Standards Track Requires: 234 Created: 18-May-2001 Python-Version: 2.2 - Post-History: 14-Jun-2001 + Post-History: 14-Jun-2001, 23-Jun-2001 ? +++++++++++++ Abstract This PEP introduces the concept of generators to Python, as well as a new statement used in conjunction with them, the "yield" statement. Motivation When a producer function has a hard enough job that it requires maintaining state between values produced, most programming languages offer no pleasant and efficient solution beyond adding a callback function to the producer's argument list, to be called with each value produced. For example, tokenize.py in the standard library takes this approach: the caller must pass a "tokeneater" function to tokenize(), called whenever tokenize() finds the next token. This allows tokenize to be coded in a natural way, but programs calling tokenize are typically convoluted by the need to remember between callbacks which token(s) were seen last. The tokeneater function in tabnanny.py is a good example of that, maintaining a state machine in global variables, to remember across callbacks what it has already seen and what it hopes to see next. This was difficult to get working correctly, and is still difficult for people to understand. Unfortunately, that's typical of this approach. An alternative would have been for tokenize to produce an entire parse of the Python program at once, in a large list. Then tokenize clients could be written in a natural way, using local variables and local control flow (such as loops and nested if statements) to keep track of their state. But this isn't practical: programs can be very large, so no a priori bound can be placed on the memory needed to materialize the whole parse; and some tokenize clients only want to see whether something specific appears early in the program (e.g., a future statement, or, as is done in IDLE, just the first indented statement), and then parsing the whole program first is a severe waste of time. Another alternative would be to make tokenize an iterator[1], delivering the next token whenever its .next() method is invoked. This is pleasant for the caller in the same way a large list of results would be, but without the memory and "what if I want to get out early?" drawbacks. However, this shifts the burden on tokenize to remember *its* state between .next() invocations, and the reader need only glance at tokenize.tokenize_loop() to realize what a horrid chore that would be. Or picture a recursive algorithm for producing the nodes of a general tree structure: to cast that into an iterator framework requires removing the recursion manually and maintaining the state of the traversal by hand. A fourth option is to run the producer and consumer in separate threads. This allows both to maintain their states in natural ways, and so is pleasant for both. Indeed, Demo/threads/Generator.py in the Python source distribution provides a usable synchronized-communication class for doing that in a general way. This doesn't work on platforms without threads, though, and is very slow on platforms that do (compared to what is achievable without threads). 
A final option is to use the Stackless[2][3] variant implementation of Python instead, which supports lightweight coroutines. This has much the same programmatic benefits as the thread option, but is much more efficient. However, Stackless is a controversial rethinking of the Python core, and it may not be possible for Jython to implement the same semantics. This PEP isn't the place to debate that, so suffice it to say here that generators provide a useful subset of Stackless functionality in a way that fits easily into the current CPython implementation, and is believed to be relatively straightforward for other Python implementations. That exhausts the current alternatives. Some other high-level languages provide pleasant solutions, notably iterators in Sather[4], which were inspired by iterators in CLU; and generators in Icon[5], a novel language where every expression "is a generator". There are differences among these, but the basic idea is the same: provide a kind of function that can return an intermediate result ("the next value") to its caller, but maintaining the function's local state so that the function can be resumed again right where it left off. A very simple example: def fib(): a, b = 0, 1 while 1: yield b a, b = b, a+b When fib() is first invoked, it sets a to 0 and b to 1, then yields b back to its caller. The caller sees 1. When fib is resumed, from its point of view the yield statement is really the same as, say, a print statement: fib continues after the yield with all local state intact. a and b then become 1 and 1, and fib loops back to the yield, yielding 1 to its invoker. And so on. From fib's point of view it's just delivering a sequence of results, as if via callback. But from its caller's point of view, the fib invocation is an iterable object that can be resumed at will. As in the thread approach, this allows both sides to be coded in the most natural ways; but unlike the thread approach, this can be done efficiently and on all platforms. Indeed, resuming a generator should be no more expensive than a function call. The same kind of approach applies to many producer/consumer functions. For example, tokenize.py could yield the next token instead of invoking a callback function with it as argument, and tokenize clients could iterate over the tokens in a natural way: a Python generator is a kind of Python iterator[1], but of an especially powerful kind. - Specification + Specification: Yield ? ++++++++ A new statement is introduced: yield_stmt: "yield" expression_list "yield" is a new keyword, so a future statement[8] is needed to phase - this in. [XXX spell this out] + this in. [XXX spell this out -- but new keywords have ripple effects + across tools too, and it's not clear this can be forced into the future + framework at all -- it's not even clear that Python's parser alone can + be taught to swing both ways based on a future stmt] The yield statement may only be used inside functions. A function that - contains a yield statement is called a generator function. + contains a yield statement is called a generator function. A generator ? +++++++++++++ + function is an ordinary function object in all respects, but has the + new CO_GENERATOR flag set in the code object's co_flags member. When a generator function is called, the actual arguments are bound to function-local formal argument names in the usual way, but no code in the body of the function is executed. 
Instead a generator-iterator object is returned; this conforms to the iterator protocol[6], so in particular can be used in for-loops in a natural way. Note that when the intent is clear from context, the unqualified name "generator" may be used to refer either to a generator-function or a generator- iterator. Each time the .next() method of a generator-iterator is invoked, the code in the body of the generator-function is executed until a yield or return statement (see below) is encountered, or until the end of the body is reached. If a yield statement is encountered, the state of the function is frozen, and the value of expression_list is returned to .next()'s caller. By "frozen" we mean that all local state is retained, including the current bindings of local variables, the instruction pointer, and the internal evaluation stack: enough information is saved so that the next time .next() is invoked, the function can proceed exactly as if the yield statement were just another external call. + Restriction: A yield statement is not allowed in the try clause of a + try/finally construct. The difficulty is that there's no guarantee + the generator will ever be resumed, hence no guarantee that the finally + block will ever get executed; that's too much a violation of finally's + purpose to bear. + + + Specification: Return + A generator function can also contain return statements of the form: "return" Note that an expression_list is not allowed on return statements in the body of a generator (although, of course, they may appear in the bodies of non-generator functions nested within the generator). - When a return statement is encountered, nothing is returned, but a + When a return statement is encountered, control proceeds as in any + function return, executing the appropriate finally clauses (if any - StopIteration exception is raised, signalling that the iterator is ? ------------ + exist). Then a StopIteration exception is raised, signalling that the ? ++++++++++++++++ - exhausted. The same is true if control flows off the end of the + iterator is exhausted. A StopIteration exception is also raised if + control flows off the end of the generator without an explict return. + - function. Note that return means "I'm done, and have nothing ? ----------- + Note that return means "I'm done, and have nothing interesting to ? +++++++++++++++ - interesting to return", for both generator functions and non-generator ? --------------- + return", for both generator functions and non-generator functions. ? +++++++++++ - functions. + + Note that return isn't always equivalent to raising StopIteration: the + difference lies in how enclosing try/except constructs are treated. + For example, + + >>> def f1(): + ... try: + ... return + ... except: + ... yield 1 + >>> print list(f1()) + [] + + because, as in any function, return simply exits, but + + >>> def f2(): + ... try: + ... raise StopIteration + ... except: + ... yield 42 + >>> print list(f2()) + [42] + + because StopIteration is captured by a bare "except", as is any + exception. + + + Specification: Generators and Exception Propagation + + If an unhandled exception-- including, but not limited to, + StopIteration --is raised by, or passes through, a generator function, + then the exception is passed on to the caller in the usual way, and + subsequent attempts to resume the generator function raise + StopIteration. In other words, an unhandled exception terminates a + generator's useful life. 
+ + Example (not idiomatic but to illustrate the point): + + >>> def f(): + ... return 1/0 + >>> def g(): + ... yield f() # the zero division exception propagates + ... yield 42 # and we'll never get here + >>> k = g() + >>> k.next() + Traceback (most recent call last): + File "", line 1, in ? + File "", line 2, in g + File "", line 2, in f + ZeroDivisionError: integer division or modulo by zero + >>> k.next() # and the generator cannot be resumed + Traceback (most recent call last): + File "", line 1, in ? + StopIteration + >>> + + + Specification: Try/Except/Finally + + As noted earlier, yield is not allowed in the try clause of a try/ + finally construct. A consequence is that generators should allocate + critical resources with great care. There is no restriction on yield + otherwise appearing in finally clauses, except clauses, or in the try + clause of a try/except construct: + + >>> def f(): + ... try: + ... yield 1 + ... try: + ... yield 2 + ... 1/0 + ... yield 3 # never get here + ... except ZeroDivisionError: + ... yield 4 + ... yield 5 + ... raise + ... except: + ... yield 6 + ... yield 7 # the "raise" above stops this + ... except: + ... yield 8 + ... yield 9 + ... try: + ... x = 12 + ... finally: + ... yield 10 + ... yield 11 + >>> print list(f()) + [1, 2, 4, 5, 8, 9, 10, 11] + >>> Example # A binary tree class. class Tree: def __init__(self, label, left=None, right=None): self.label = label self.left = left self.right = right def __repr__(self, level=0, indent=" "): s = level*indent + `self.label` if self.left: s = s + "\n" + self.left.__repr__(level+1, indent) if self.right: s = s + "\n" + self.right.__repr__(level+1, indent) return s def __iter__(self): return inorder(self) # Create a Tree from a list. def tree(list): n = len(list) if n == 0: return [] i = n / 2 return Tree(list[i], tree(list[:i]), tree(list[i+1:])) # A recursive generator that generates Tree leaves in in-order. def inorder(t): if t: for x in inorder(t.left): yield x yield t.label for x in inorder(t.right): yield x # Show it off: create a tree. t = tree("ABCDEFGHIJKLMNOPQRSTUVWXYZ") # Print the nodes of the tree in in-order. for x in t: print x, print # A non-recursive generator. def inorder(node): stack = [] while node: while node.left: stack.append(node) node = node.left yield node.label while not node.right: try: node = stack.pop() except IndexError: return yield node.label node = node.right # Exercise the non-recursive generator. for x in t: print x, print + Both output blocks display: + + A B C D E F G H I J K L M N O P Q R S T U V W X Y Z + Q & A + Q. Why not a new keyword instead of reusing "def"? + + A. See BDFL Pronouncements section below. + - Q. Why a new keyword? Why not a builtin function instead? + Q. Why a new keyword for "yield"? Why not a builtin function instead? ? ++++++++++++ A. Control flow is much better expressed via keyword in Python, and yield is a control construct. It's also believed that efficient implementation in Jython requires that the compiler be able to determine potential suspension points at compile-time, and a new - keyword makes that easy. + keyword makes that easy. The CPython referrence implementation also + exploits it heavily, to detect which functions *are* generator- + functions (although a new keyword in place of "def" would solve that + for CPython -- but people asking the "why a new keyword?" question + don't want any new keyword). + + Q: Then why not some other special syntax without a new keyword? 
For + example, one of these instead of "yield 3": + + return 3 and continue + return and continue 3 + return generating 3 + continue return 3 + return >> , 3 + from generator return 3 + return >> 3 + return << 3 + >> 3 + << 3 + + A: Did I miss one ? Out of hundreds of messages, I counted two + suggesting such an alternative, and extracted the above from them. + It would be nice not to need a new keyword, but nicer to make yield + very clear -- I don't want to have to *deduce* that a yield is + occurring from making sense of a previously senseless sequence of + keywords or operators. Still, if this attracts enough interest, + proponents should settle on a single consensus suggestion, and Guido + will Pronounce on it. + + Q. Why allow "return" at all? Why not force termination to be spelled + "raise StopIteration"? + + A. The mechanics of StopIteration are low-level details, much like the + mechanics of IndexError in Python 2.1: the implementation needs to + do *something* well-defined under the covers, and Python exposes + these mechanisms for advanced users. That's not an argument for + forcing everyone to work at that level, though. "return" means "I'm + done" in any kind of function, and that's easy to explain and to use. + Note that "return" isn't always equivalent to "raise StopIteration" + in try/except construct, either (see the "Specification: Return" + section). + + Q. Then why not allow an expression on "return" too? + + A. Perhaps we will someday. In Icon, "return expr" means both "I'm + done", and "but I have one final useful value to return too, and + this is it". At the start, and in the absence of compelling uses + for "return expr", it's simply cleaner to use "yield" exclusively + for delivering values. + + + BDFL Pronouncements + + Issue: Introduce another new keyword (say, "gen" or "generator") in + place of "def", or otherwise alter the syntax, to distinguish + generator-functions from non-generator functions. + + Con: In practice (how you think about them), generators *are* + functions, but with the twist that they're resumable. The mechanics of + how they're set up is a comparatively minor technical issue, and + introducing a new keyword would unhelpfully overemphasize the + mechanics of how generators get started (a vital but tiny part of a + generator's life). + + Pro: In reality (how you think about them), generator-functions are + actually factory functions that produce generator-iterators as if by + magic. In this respect they're radically different from non-generator + functions, acting more like a constructor than a function, so reusing + "def" is at best confusing. A "yield" statement buried in the body is + not enough warning that the semantics are so different. + + BDFL: "def" it stays. No argument on either side is totally + convincing, so I have consulted my language designer's intuition. It + tells me that the syntax proposed in the PEP is exactly right - not too + hot, not too cold. But, like the Oracle at Delphi in Greek mythology, + it doesn't tell me why, so I don't have a rebuttal for the arguments + against the PEP syntax. The best I can come up with (apart from + agreeing with the rebuttals ... already made) is "FUD". If this had + been part of the language from day one, I very much doubt it would have + made Andrew Kuchling's "Python Warts" page. Reference Implementation - A preliminary patch against the CVS Python source is available[7]. 
+ The current implementation, in a preliminary state (no docs and no
+ focused tests), is part of Python's CVS development tree[9].
+ Using this requires that you build Python from source.
+
+ This was derived from an earlier patch by Neil Schemenauer[7].

Footnotes and References

    [1] PEP 234, http://python.sf.net/peps/pep-0234.html
    [2] http://www.stackless.com/
    [3] PEP 219, http://python.sf.net/peps/pep-0219.html
    [4] "Iteration Abstraction in Sather" Murer, Omohundro, Stoutamire and Szyperski http://www.icsi.berkeley.edu/~sather/Publications/toplas.html
    [5] http://www.cs.arizona.edu/icon/
    [6] The concept of iterators is described in PEP 234 http://python.sf.net/peps/pep-0234.html
    [7] http://python.ca/nas/python/generator.diff
    [8] http://python.sf.net/peps/pep-0236.html
+   [9] To experiment with this implementation, check out Python from CVS
+       according to the instructions at http://sf.net/cvs/?group_id=5470

Copyright

    This document has been placed in the public domain.

Local Variables: mode: indented-text indent-tabs-mode: nil End:

From mal at lemburg.com Sat Jun 23 12:54:27 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Sat, 23 Jun 2001 12:54:27 +0200
Subject: [Python-Dev] Python Specializing Compiler
References: Message-ID: <3B347563.9BBEF858@lemburg.com>

Armin Rigo wrote:
>
> Hello Jeff,
>
> On Fri, 22 Jun 2001, Jeff Epler wrote:
> > What are you using to generate code?
>
> I am generating pseudo-code, which is interpreted by a C module. (With
> real assembler code, it would of course be much faster, but it was just
> simpler for the moment.)
>
> > How would you compare the
> > sophistication of your type inference system to the one I've outlined
> > above?
>
> Yours is much more complete, but runs statically. Mine works at run-time.
> As explained in detail in the readme file, my plan is not to make a
> "compiler" in the usual sense. I actually have no type inference; I just
> collect at run time what types are used at what places, and generate (and
> possibly modify) the generated code according to that information.

Sounds like you are using (re)compiling on-the-fly -- that would certainly be a very reasonable way to deal with Python's dynamic object world. It would also solve the problems of static compilers with type inference nicely. A very nice idea!

--
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting: http://www.egenix.com/
Python Software: http://www.lemburg.com/python/

From skip at pobox.com Sat Jun 23 16:11:03 2001
From: skip at pobox.com (Skip Montanaro)
Date: Sat, 23 Jun 2001 09:11:03 -0500
Subject: [Python-Dev] Python Specializing Compiler
In-Reply-To: <3B347563.9BBEF858@lemburg.com>
References: <3B347563.9BBEF858@lemburg.com>
Message-ID: <15156.41847.86431.594106@beluga.mojam.com>

    mal> Sounds like you are using (re)compiling on-the-fly ...

This is what the Self compiler did, though I don't know if its granularity was as fine as I understand psyco's is from reading its README file. It's been a while since I read through that stuff, but I seem to recall it would compile functions to machine code only if they were heavily executed. It also did a lot of type inferencing.

Skip

From guido at digicool.com Sat Jun 23 17:58:40 2001
From: guido at digicool.com (Guido van Rossum)
Date: Sat, 23 Jun 2001 11:58:40 -0400
Subject: [Python-Dev] Python Specializing Compiler
In-Reply-To: Your message of "Sat, 23 Jun 2001 10:13:04 +0200."
References: Message-ID: <20010623160024.QWCF14539.femail14.sdc1.sfba.home.com@cj20424-a.reston1.va.home.com>

> I am generating pseudo-code, which is interpreted by a C module. (With
> real assembler code, it would of course be much faster, but it was just
> simpler for the moment.)

This has great promise! Once you have an interpreter for some kind of pseudo-code, it's always possible to tweak the interpreter or the pseudo-code to make it faster. And you can make another jump to machine code to make it a lot faster.

There was a project (p2c or python2c) that tried to compile an entire Python program to C code that was mostly just calling the Python runtime C API functions. It also obtained about a factor of 2 in speed-up, but its problem was (if I recall) that even a small Python module translated into hundreds of thousands of lines of C -- think what that would do to locality. Since you have already obtained the same speedup with your approach, I think there's great promise. Count on sending in a paper for the next Python conference!

> > How would you compare the
> > sophistication of your type inference system to the one I've outlined
> > above?
>
> Yours is much more complete, but runs statically. Mine works at run-time.
> As explained in detail in the readme file, my plan is not to make a
> "compiler" in the usual sense. I actually have no type inference; I just
> collect at run time what types are used at what places, and generate (and
> possibly modify) the generated code according to that information.

Very cool: a Python JIT compiler.

> (More about it later.)

Can't wait!

--Guido van Rossum (home page: http://www.python.org/~guido/)

From fdrake at beowolf.digicool.com Sun Jun 24 04:41:04 2001
From: fdrake at beowolf.digicool.com (Fred Drake)
Date: Sat, 23 Jun 2001 22:41:04 -0400 (EDT)
Subject: [Python-Dev] [development doc updates]
Message-ID: <20010624024104.A757728927@beowolf.digicool.com>

The development version of the documentation has been updated:

    http://python.sourceforge.net/devel-docs/

A couple of small updates, including spelling the keywords correctly in the language reference. This version brings back the hyperlinked grammar productions I played around with earlier. They still need work, but they are somewhat better than plain text.

From m.favas at per.dem.csiro.au Sun Jun 24 06:25:27 2001
From: m.favas at per.dem.csiro.au (Mark Favas)
Date: Sun, 24 Jun 2001 12:25:27 +0800
Subject: [Python-Dev] Problems with socketmodule (getnameinfo & getaddrinfo)
Message-ID: <3B356BB7.9BE71569@per.dem.csiro.au>

Socketmodule at the moment has multiple problems after the changes to handle IPv6:

1:
socketmodule.c now #includes getnameinfo.c and getaddrinfo.c. These functions both use offsetof(), which is defined (on my system, at least) in stddef.h. The #include for this file is inside a #if 0 block.

2:
#including this file allows the compile to complete without error. However, there is no Makefile dependency on these two files, once socketmodule.o has been built. Changes to either of the get{name,addr}info.c files will not cause socketmodule to be rebuilt.

3:
The socket module still does not work, however, since it refers to an unresolved symbol inet_pton

>>> import socket
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/home/gonzo1/mark/groucho1/mark/src/python/CVS/python/dist/src/Lib/socket.py", line 41, in ?
from _socket import *
ImportError: Unresolved symbol in /home/gonzo1/mark/groucho1/mark/src/python/CVS/python/dist/src/build/lib.osf1-V4.0-alpha-2.2/_socket.so: inet_pton

inet_pton is called in two places in getaddrinfo.c... there's likely to be other platforms besides Tru64 Unix that do not have this function.

--
Mark Favas - m.favas at per.dem.csiro.au
CSIRO, Private Bag No 5, Wembley, Western Australia 6913, AUSTRALIA

From tim.one at home.com Sun Jun 24 06:48:32 2001
From: tim.one at home.com (Tim Peters)
Date: Sun, 24 Jun 2001 00:48:32 -0400
Subject: [Python-Dev] Problems with socketmodule (getnameinfo & getaddrinfo)
In-Reply-To: <3B356BB7.9BE71569@per.dem.csiro.au>
Message-ID:

[Mark Favas]
> Socketmodule at the moment has multiple problems after the changes to
> handle IPv6:
>
> 1:
> socketmodule.c now #includes getnameinfo.c and getaddrinfo.c. These
> functions both use offsetof(), which is defined (on my system, at least)
> in stddef.h. The #include for this file is inside a #if 0 block.
>
> 2:
> #including this file allows the compile to complete without error.
> However, there is no Makefile dependency on these two files, once
> socketmodule.o has been built. Changes to either of the
> get{name,addr}info.c files will not cause socketmodule to be rebuilt.
>
> 3:
> The socket module still does not work, however, since it refers to an
> unresolved symbol inet_pton
> >>> import socket
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
>   File
> "/home/gonzo1/mark/groucho1/mark/src/python/CVS/python/dist/src/Li
> b/socket.py",
> line 41, in ?
> from _socket import *
> ImportError: Unresolved symbol in
> /home/gonzo1/mark/groucho1/mark/src/python/CVS/python/dist/src/bui
> ld/lib.osf1-V4.0-alpha-2.2/_socket.so:
> inet_pton
>
> inet_pton is called in two places in getaddrinfo.c... there's likely to
> be other platforms besides Tru64 Unix that do not have this function.
If it's any consolation, the Windows build is in worse shape: socketmodule.c Modules\addrinfo.h(123) : error C2632: 'long' followed by 'long' is illegal Modules\addrinfo.h(125) : error C2632: 'long' followed by 'long' is illegal Modules\addrinfo.h(129) : error C2632: 'long' followed by 'long' is illegal Modules\addrinfo.h(129) : error C2632: 'long' followed by 'long' is illegal Modules\getaddrinfo.c(109) : warning C4013: 'offsetof' undefined; assuming extern returning int Modules\getaddrinfo.c(109) : error C2143: syntax error : missing ')' before 'type' Modules\getaddrinfo.c(109) : error C2099: initializer is not a constant Modules\getaddrinfo.c(109) : error C2059: syntax error : ')' Modules\getaddrinfo.c(111) : error C2059: syntax error : ',' Modules\getaddrinfo.c(407) : warning C4013: 'inet_pton' undefined; assuming extern returning int Modules\getaddrinfo.c(414) : warning C4013: 'IN_MULTICAST' undefined; assuming extern returning int Modules\getaddrinfo.c(414) : warning C4013: 'IN_EXPERIMENTAL' undefined; assuming extern returning int Modules\getaddrinfo.c(417) : error C2065: 'IN_LOOPBACKNET' : undeclared identifier Modules\getaddrinfo.c(417) : warning C4018: '==' : signed/unsigned mismatch Modules\getaddrinfo.c(531) : error C2373: 'WSAGetLastError' : redefinition; different type modifiers C:\VC98\INCLUDE\winsock.h(787) : see declaration of 'WSAGetLastError' Modules\getnameinfo.c(66) : error C2143: syntax error : missing ')' before 'type' Modules\getnameinfo.c(66) : error C2099: initializer is not a constant Modules\getnameinfo.c(66) : error C2059: syntax error : ')' Modules\getnameinfo.c(67) : error C2059: syntax error : ',' Modules\getnameinfo.c(133) : warning C4013: 'snprintf' undefined; assuming extern returning int Modules\getnameinfo.c(153) : warning C4018: '==' : signed/unsigned mismatch Modules\getnameinfo.c(167) : warning C4013: 'inet_ntop' undefined; assuming extern returning int Modules\getnameinfo.c(168) : warning C4047: '==' : 'int ' differs in levels of indirection from 'void *' Modules\getnameinfo.c(200) : warning C4047: '==' : 'int ' differs in levels of indirection from 'void *' Martin should revert the changes to socketmodule.c until this has a prayer of working. From est at hyperreal.org Sun Jun 24 07:38:06 2001 From: est at hyperreal.org (est at hyperreal.org) Date: Sat, 23 Jun 2001 22:38:06 -0700 (PDT) Subject: [Python-Dev] Python Specializing Compiler In-Reply-To: "from Armin Rigo at Jun 22, 2001 01:00:34 pm" Message-ID: <20010624053806.16277.qmail@hyperreal.org> Am I seeing things or does it actually speed up five to six times on my machine? Very exciting! timing specializing_call(, 2000)... result 1952145856 in 4.94 seconds timing specializing_call(, 2000)... result 1952145856 in 3.91 seconds timing f(2000,)... result 1952145856 in 25.17 seconds I wonder to what extent this approach can be applied to method calls. My analysis of my performance-bound Python apps convinces me that those are a major bottleneck for me. About a fifth of their time seems to go into creating the bound method object (reducable by caching them on the instance)..another fifth into allocating the memory for the frame object (ameliorated by pymalloc). As for the rest, I really don't know. E From martin at loewis.home.cs.tu-berlin.de Sun Jun 24 10:34:06 2001 From: martin at loewis.home.cs.tu-berlin.de (Martin v. 
Loewis)
Date: Sun, 24 Jun 2001 10:34:06 +0200
Subject: [Python-Dev] gethostbyname2
Message-ID: <200106240834.f5O8Y6t01609@mira.informatik.hu-berlin.de>

The IPv6 patch proposes to introduce a new socket function, socket.gethostbyname2(name, af). This becomes necessary as a name might have both an IPv4 and an IPv6 address.

One alternative for providing such an API is to give socket.gethostbyname an optional second argument (the address family). itojun's rationale for calling it gethostbyname2 is that it matches the C API, as defined in RFC 2133.

Which of these alternatives would you prefer?

Regards,
Martin

From martin at loewis.home.cs.tu-berlin.de Sun Jun 24 10:20:31 2001
From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Sun, 24 Jun 2001 10:20:31 +0200
Subject: [Python-Dev] IPv6 and Windows
Message-ID: <200106240820.f5O8KVN01435@mira.informatik.hu-berlin.de>

After integrating the first chunk of IPv6 changes, Tim Peters quickly found that they won't compile on Windows - even though this was the least-critical part of the patch.

Specifically, this code emulates the getaddrinfo and getnameinfo calls, which will be exposed to Python programs in a later patch. Therefore, it is essential that they are available on every system, either directly or through emulation.

For Windows, one option is to use the Microsoft-provided emulation, which is available from

    http://msdn.microsoft.com/downloads/sdks/platform/tpipv6.asp

To use this emulation, only the header files of the package are required; it is not necessary to actually install the IPv6 preview on the system. The MS emulation will try to load a few DLLs which are known to provide getaddrinfo. If no such DLL is found, the code in the header file falls back to an emulation. That way, the resulting socket.pyd would use the true API functions on installations that provide them, and the emulation on all other systems.

The only requirement for building Python is then that the header file from the technology preview is available on the build machine (tpipv6.h). It may be that the header file is also included in recent SDK releases, I haven't checked.

Is such a requirement acceptable for building the socket module on Windows?

Regards,
Martin

From m.favas at per.dem.csiro.au Sun Jun 24 10:58:42 2001
From: m.favas at per.dem.csiro.au (Mark Favas)
Date: Sun, 24 Jun 2001 16:58:42 +0800
Subject: [Python-Dev] IPv6 support
Message-ID: <3B35ABC2.11F3B261@per.dem.csiro.au>

IPv6 support may be nice, and even desirable. However, supporting IPv6 should not come at the cost of causing problems either in compilation or at runtime on those platforms that do not support IPv6 natively. Requiring additional preview code or non-standardly-supplied packages to be installed is fine if people _want_ to take advantage of the new IPv6 functionality, but _not_ fine if this IPv6 functionality is not required. IPv4 support should not require the installation of additional IPv6 packages. Well, that's my 2 cents' worth (even if that's only 1 cent US).

--
Mark Favas - m.favas at per.dem.csiro.au
CSIRO, Private Bag No 5, Wembley, Western Australia 6913, AUSTRALIA

From pf at artcom-gmbh.de Sun Jun 24 11:20:10 2001
From: pf at artcom-gmbh.de (Peter Funk)
Date: Sun, 24 Jun 2001 11:20:10 +0200 (MEST)
Subject: foobar2(), foobar3(), ... (was Re: [Python-Dev] gethostbyname2)
In-Reply-To: <200106240834.f5O8Y6t01609@mira.informatik.hu-berlin.de> from "Martin v. Loewis" at "Jun 24, 2001 10:34:06 am"
Message-ID:

Martin v.
Loewis:
> The IPv6 patch proposes to introduce a new socket function,
> socket.gethostbyname2(name, af). This becomes necessary as a name
> might have both an IPv4 and an IPv6 address.
>
> One alternative for providing such an API is to give socket.gethostbyname
> an optional second argument (the address family). itojun's rationale
> for calling it gethostbyname2 is that it matches the C API, as defined
> in RFC 2133.
>
> Which of these alternatives would you prefer?

IMO: The possibility to add new keyword arguments with default values is one of the major strengths Python has compared to other programming languages. Especially in the scenario where an existing mature API has to be enhanced later with added features: In such a situation I always prefer APIs with fewer functions (maybe with large lists of optional arguments) compared to APIs containing a bunch of functions or methods called 'popen2()', 'gethostbyname2()' and so on.

Regards, Peter
--
Peter Funk, Oldenburger Str.86, D-27777 Ganderkesee, Germany, Fax:+49 4222950260
office: +49 421 20419-0 (ArtCom GmbH, Grazer Str.8, D-28359 Bremen, Germany)

From tim.one at home.com Sun Jun 24 12:51:40 2001
From: tim.one at home.com (Tim Peters)
Date: Sun, 24 Jun 2001 06:51:40 -0400
Subject: [Python-Dev] IPv6 and Windows
In-Reply-To: <200106240820.f5O8KVN01435@mira.informatik.hu-berlin.de>
Message-ID:

[Martin v. Loewis]
> After integrating the first chunk of IPv6 changes, Tim Peters quickly
> found that they won't compile on Windows - even though this was the
> least-critical part of the patch.

Mark Favas also reported failure on a Unix box -- we can't leave the CVS tree in an unusable state, and Mark in particular provides uniquely valuable feedback from his collection of Platforms from Mars <wink>. I #ifdef'ed out the offending includes on Windows for now, but that doesn't help Mark.

> Specifically, this code emulates the getaddrinfo and getnameinfo
> calls, which will be exposed to Python programs in a later patch.
> Therefore, it is essential that they are available on every system,
> either directly or through emulation.
>
> For Windows, one option is to use the Microsoft-provided emulation,
> which is available from
>
> http://msdn.microsoft.com/downloads/sdks/platform/tpipv6.asp

It says it's unsupported preview software for Win2K only. Since even the first *real* release of anything from MS sucks, I wouldn't touch this unless I absolutely had to. But I don't have any cycles for this project anyway, so this:

> ...
> Is such a requirement acceptable for building the socket module on
> Windows?

will have to be addressed by someone who does. Is anyone, e.g., at ActiveState keen on this?

From mal at lemburg.com Sun Jun 24 13:06:19 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Sun, 24 Jun 2001 13:06:19 +0200
Subject: [Python-Dev] IPv6 and Windows
References: <200106240820.f5O8KVN01435@mira.informatik.hu-berlin.de>
Message-ID: <3B35C9AB.2D1D2185@lemburg.com>

"Martin v. Loewis" wrote:
>
> After integrating the first chunk of IPv6 changes, Tim Peters quickly
> found that they won't compile on Windows - even though this was the
> least-critical part of the patch.
>
> Specifically, this code emulates the getaddrinfo and getnameinfo
> calls, which will be exposed to Python programs in a later patch.
> Therefore, it is essential that they are available on every system,
> either directly or through emulation.
> > For Windows, one option is to use the Microsoft-provided emulation,
> > which is available from
> >
> > http://msdn.microsoft.com/downloads/sdks/platform/tpipv6.asp
> >
> > To use this emulation, only the header files of the package are
> > required; it is not necessary to actually install the IPv6 preview on
> > the system. The MS emulation will try to load a few DLLs which are
> > known to provide getaddrinfo. If no such DLL is found, the code in the
> > header file falls back to an emulation. That way, the resulting
> > socket.pyd would use the true API functions on installations that
> > provide them, and the emulation on all other systems.
> >
> > The only requirement for building Python is then that the header file
> > from the technology preview is available on the build machine
> > (tpipv6.h). It may be that the header file is also included in recent
> > SDK releases, I haven't checked.
> >
> > Is such a requirement acceptable for building the socket module on
> > Windows?

Isn't this the MS SDK that has the new "Open Source" license clause in it ?! If yes, I very much doubt that this approach would be feasible for Python...

http://msdn.microsoft.com/downloads/eula_mit.htm

Quote from a recent posting by Steven Majewski on c.l.p.:

"""
(c) Open Source. Recipient's license rights to the Software are conditioned upon Recipient (i) not distributing such Software, in whole or in part, in conjunction with Potentially Viral Software (as defined below); and (ii) not using Potentially Viral Software (e.g. tools) to develop Recipient software which includes the Software, in whole or in part. For purposes of the foregoing, Potentially Viral Software means software which is licensed pursuant to terms that: (x) create, or purport to create, obligations for Microsoft with respect to the Software or (y) grant, or purport to grant, to any third party any rights to or immunities under Microsoft's intellectual property or proprietary rights in the Software. By way of example but not limitation of the foregoing, Recipient shall not distribute the Software, in whole or in part, in conjunction with any Publicly Available Software. Publicly Available Software means each of (i) any software that contains, or is derived in any manner (in whole or in part) from, any software that is distributed as free software, open source software (e.g. Linux) or similar licensing or distribution models; and (ii) any software that requires as a condition of use, modification and/or distribution of such software that other software distributed with such software (A) be disclosed or distributed in source code form; (B) be licensed for the purpose of making derivative works; or (C) be redistributable at no charge. Publicly Available Software includes, without limitation, software licensed or distributed under any of the following licenses or distribution models, or licenses or distribution models similar to any of the following: (A) GNU's General Public License (GPL) or Lesser/Library GPL (LGPL), (B) The Artistic License (e.g., PERL), (C) the Mozilla Public License, (D) the Netscape Public License, (E) the Sun Community Source License (SCSL), and (F) the Sun Industry Standards License (SISL).
""" -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From guido at digicool.com Sun Jun 24 15:23:52 2001 From: guido at digicool.com (Guido van Rossum) Date: Sun, 24 Jun 2001 09:23:52 -0400 Subject: [Python-Dev] gethostbyname2 In-Reply-To: Your message of "Sun, 24 Jun 2001 10:34:06 +0200." <200106240834.f5O8Y6t01609@mira.informatik.hu-berlin.de> References: <200106240834.f5O8Y6t01609@mira.informatik.hu-berlin.de> Message-ID: <20010624132540.RTEI4013.femail3.sdc1.sfba.home.com@cj20424-a.reston1.va.home.com> > The IPv6 patch proposes to introduce a new socket function, > socket.gethostbyname2(name, af). This becomes necessary as a name > might have both an IPv4 and an IPv6 address. > > One alternative for providing such API is to get socket.gethostbyname > an optional second argument (the address family). itojun's rationale > for calling it gethostbyname2 is that the C API, as defined in RFC > 2133. > > Which of these alternatives would you prefer? Definitely an optional 2nd arg to gethostbyname() -- in C, you can't do tht, so they *had* to create a new function, but Python is more flexible. --Guido van Rossum (home page: http://www.python.org/~guido/) From DavidA at ActiveState.com Sun Jun 24 17:18:22 2001 From: DavidA at ActiveState.com (David Ascher) Date: Sun, 24 Jun 2001 08:18:22 -0700 Subject: [Python-Dev] IPv6 and Windows References: Message-ID: <3B3604BE.7E2F6C6E@ActiveState.com> Tim Peters wrote: > > Is such a requirement acceptable for building the socket module on > > Windows? > > will have to be addressed by someone who does. Is anyone, e.g., at > ActiveState keen on this? Not as far as I know. I haven't looked at the patches, but couldn't we have the IPv6 code be #ifdef'ed out, so that those who care about IPv6 can periodically test it while the various OS-level libraries are ramped up over the next months/years, but w/o disturbing the 'current' builds? --david From martin at loewis.home.cs.tu-berlin.de Sun Jun 24 19:00:43 2001 From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Sun, 24 Jun 2001 19:00:43 +0200 Subject: [Python-Dev] IPv6 and Windows In-Reply-To: <3B35C9AB.2D1D2185@lemburg.com> (mal@lemburg.com) References: <200106240820.f5O8KVN01435@mira.informatik.hu-berlin.de> <3B35C9AB.2D1D2185@lemburg.com> Message-ID: <200106241700.f5OH0hm01021@mira.informatik.hu-berlin.de> > > Is such a requirement acceptable for building the socket module on > > Windows? > > Isn't this the MS SDK that has the new "Open Source" license > clause in it ?! 
No, this has a different license text, which can be seen on http://msdn.microsoft.com/downloads/sdks/platform/tpipv6/download.asp On redistribution, it says # If you redistribute the SOFTWARE and/or your Source Modifications, # or any portion thereof as provided above, you agree: (i) to # distribute the SOFTWARE only in conjunction with, and as part of, # your Source Modifications which add significant functionality to the # SOFTWARE; (ii) to distribute the SOFTWARE and/or your Source # Modifications solely as part of your research and not in any # commercial product; (iii) the SOFTWARE and/or your Source # Modifications will not be distributed for profit; (iv) to retain all # branding, copyright and trademark notices included with the SOFTWARE # and include a copy of this EULA with any distribution of the # SOFTWARE, or any portion thereof; and (v) to indemnify, hold # harmless, and defend Microsoft from and against any claims or # lawsuits, including attorneys' fees, that arise or result from # the use or distribution of your Source Modifications. I don't know whether this is acceptable or not. Regards, Martin From mal at lemburg.com Sun Jun 24 20:08:13 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Sun, 24 Jun 2001 20:08:13 +0200 Subject: [Python-Dev] IPv6 and Windows References: <200106240820.f5O8KVN01435@mira.informatik.hu-berlin.de> <3B35C9AB.2D1D2185@lemburg.com> <200106241700.f5OH0hm01021@mira.informatik.hu-berlin.de> Message-ID: <3B362C8D.D3AECE3C@lemburg.com> "Martin v. Loewis" wrote: > > > > Is such a requirement acceptable for building the socket module on > > > Windows? > > > > Isn't this the MS SDK that has the new "Open Source" license > > clause in it ?! > > No, this has a different license text, which can be seen on > > http://msdn.microsoft.com/downloads/sdks/platform/tpipv6/download.asp > > On redistribution, it says > > # If you redistribute the SOFTWARE and/or your Source Modifications, > # or any portion thereof as provided above, you agree: (i) to > # distribute the SOFTWARE only in conjunction with, and as part of, > # your Source Modifications which add significant functionality to the > # SOFTWARE; (ii) to distribute the SOFTWARE and/or your Source > # Modifications solely as part of your research and not in any > # commercial product; (iii) the SOFTWARE and/or your Source > # Modifications will not be distributed for profit; (iv) to retain all > # branding, copyright and trademark notices included with the SOFTWARE > # and include a copy of this EULA with any distribution of the > # SOFTWARE, or any portion thereof; and (v) to indemnify, hold > # harmless, and defend Microsoft from and against any claims or > # lawsuits, including attorneys' fees, that arise or result from > # the use or distribution of your Source Modifications. > > I don't know whether this is acceptable or not. Most likely not: there are lots of commercial Python users out there who wouldn't like these clauses at all... we'd also lose the GPL compatibility. -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From martin at loewis.home.cs.tu-berlin.de Sun Jun 24 19:48:03 2001 From: martin at loewis.home.cs.tu-berlin.de (Martin v. 
Loewis) Date: Sun, 24 Jun 2001 19:48:03 +0200 Subject: [Python-Dev] IPv6 and Windows Message-ID: <200106241748.f5OHm3001480@mira.informatik.hu-berlin.de> > I haven't looked at the patches, but couldn't we have the IPv6 code > be #ifdef'ed out, so that those who care about IPv6 can periodically > test it while the various OS-level libraries are ramped up over the > next months/years, but w/o disturbing the 'current' builds? Not if we are going to introduce itojun's patch. In that patch, the IPv6 code *is* actually ifdef'ed out. It is getaddrinfo/getnameinfo that gives problems, which isn't IPv6 specific at all. The problem is that the library patches (httplib, ftplib, etc) do use getaddrinfo to find out how to contact a remote system, which is the right thing to do IMO. So even if the IPv6 support can be activated only if desired, getaddrinfo absolutely has to work. So the only question then is where we get an implementation of these functions if the system doesn't provide one. itojun has suggested the WIDE libraries; since they apparently don't compile on Windows, I've suggested the MS TP emulation. If the latter is not acceptable, we either have to fix the WIDE implementation to work on Windows also; As for the problems Mark reported: I think they can get fixed. Regards, Martin From thomas at xs4all.net Sun Jun 24 23:35:37 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Sun, 24 Jun 2001 23:35:37 +0200 Subject: [Python-Dev] IPv6 and Windows In-Reply-To: <200106241748.f5OHm3001480@mira.informatik.hu-berlin.de> References: <200106241748.f5OHm3001480@mira.informatik.hu-berlin.de> Message-ID: <20010624233537.R8098@xs4all.nl> On Sun, Jun 24, 2001 at 07:48:03PM +0200, Martin v. Loewis wrote: > The problem is that the library patches (httplib, ftplib, etc) do use > getaddrinfo to find out how to contact a remote system, which is the > right thing to do IMO. So even if the IPv6 support can be activated > only if desired, getaddrinfo absolutely has to work. Why ? Why can't those parts be 'if it exists'-ed out ? We do it for SSL support. I'm only comfortable with the IPv6 patch if it's optional, or can at least be disabled. I haven't looked at the patch, but why is getaddrinfo absolutely necessary, if the code works without it now, too ? > So the only question then is where we get an implementation of these > functions if the system doesn't provide one. itojun has suggested the > WIDE libraries; since they apparently don't compile on Windows, I've > suggested the MS TP emulation. If the latter is not acceptable, we > either have to fix the WIDE implementation to work on Windows also; > As for the problems Mark reported: I think they can get fixed. What about the zillion other 'obscure' ports ? OS/2 ? Palm ? MacOS 9 ;) If this patch can't be zero-impact-if-necessary, I'm a firm -1 on it. But I don't think it can't, it just takes more work. -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From martin at loewis.home.cs.tu-berlin.de Sun Jun 24 23:39:45 2001 From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Sun, 24 Jun 2001 23:39:45 +0200 Subject: [Python-Dev] Problems with socketmodule (getnameinfo & getaddrinfo) Message-ID: <200106242139.f5OLdjR22560@mira.informatik.hu-berlin.de> > 1: socketmodule.c now #includes getnameinfo.c and > getaddrinfo.c. These functions both use offsetof(), which is defined > (on my system, at least) in stddef.h. That should be fixed now. 
stddef.h is included in socketmodule.c; if it is not available or does not define offsetof, an additional definition is provided. > 2. [...] Changes to either of the get{name,addr}info.c files will > not cause socketmodule to be rebuilt. I don't know how to solve this one. If distutils builds the modules, makefile dependencies won't help. > 3. The socket module still does not work, however, since it refers > to an unresolved symbol inet_pton I took the simplest solution that I could think of, delegating inet_{pton,ntop} to inet_{ntoa,addr} for AF_INET, failing for all other address families (AF_INET6 in particular). I've verified that this code does the same as the builtin functions on my Linux system; please let me know whether it compiles for you. Regards, Martin From martin at loewis.home.cs.tu-berlin.de Sun Jun 24 23:56:48 2001 From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Sun, 24 Jun 2001 23:56:48 +0200 Subject: [Python-Dev] IPv6 and Windows In-Reply-To: <20010624233537.R8098@xs4all.nl> (message from Thomas Wouters on Sun, 24 Jun 2001 23:35:37 +0200) References: <200106241748.f5OHm3001480@mira.informatik.hu-berlin.de> <20010624233537.R8098@xs4all.nl> Message-ID: <200106242156.f5OLum222759@mira.informatik.hu-berlin.de> > Why ? Why can't those parts be 'if it exists'-ed out ? We do it for SSL > support. I'm only comfortable with the IPv6 patch if it's optional, or can > at least be disabled. I haven't looked at the patch, but why is getaddrinfo > absolutely necessary, if the code works without it now, too ? getaddrinfo offers protocol-independent address lookup. It is necessary to use that API to support AF_INET and AF_INET6 transparently in application code. itojun proposes to change a number of standard library modules. Please have a look at the actual patch for details; the typical change will look like this (for httplib) diff -u -r1.35 httplib.py --- Lib/httplib.py 2001/06/01 16:25:38 1.35 +++ Lib/httplib.py 2001/06/24 04:41:48 @@ -357,10 +357,22 @@ def connect(self): """Connect to the host and port specified in __init__.""" - self.sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM) - if self.debuglevel > 0: - print "connect: (%s, %s)" % (self.host, self.port) - self.sock.connect((self.host, self.port)) + for res in socket.getaddrinfo(self.host, self.port, 0, socket.SOCK_STREAM): + af, socktype, proto, canonname, sa = res + try: + self.sock = socket.socket(af, socktype, proto) + if self.debuglevel > 0: + print "connect: (%s, %s)" % (self.host, self.port) + self.sock.connect(sa) + except socket.error, msg: + if self.debuglevel > 0: + print 'connect fail:', (self.host, self.port) + self.sock.close() + self.sock = None + continue + break + if not self.sock: + raise socket.error, msg def close(self): """Close the connection to the HTTP server.""" As you can see, the modified code can simultaneously access both IPv4 and IPv6 hosts, and will pick whatever it can connect to best. Without getaddrinfo, httplib would continue to support IPv4 hosts only. The IPv6 support itself is absolutely optional. If it is not available, getaddrinfo will never return IPv6 addresses, or propose AF_INET6 as the address family. > What about the zillion other 'obscure' ports ? OS/2 ? Palm ? MacOS 9 ;) If > this patch can't be zero-impact-if-necessary, I'm a firm -1 on it. But I > don't think it can't, it just takes more work. Depends on what zero-impact-if-necessary means to you. The patch, as it stands, can be fixed to compile on all systems that are currently supported. 
It cannot be fixed to be taken completely out (unless you literally do
that: take it out).

I don't plan to fight for it too much. Please have a look at the code
itself, and try to cooperate on integrating it. Don't reject it outright
without having even looked at it. If I get strong rejections from
everybody, I'll just withdraw it and feel sorry for the time I've already
spent on it.

Regards,
Martin

From m.favas at per.dem.csiro.au  Mon Jun 25 00:16:25 2001
From: m.favas at per.dem.csiro.au (Mark Favas)
Date: Mon, 25 Jun 2001 06:16:25 +0800
Subject: [Python-Dev] Re: Problems with socketmodule (getnameinfo & getaddrinfo)
References: <200106242139.f5OLdjR22560@mira.informatik.hu-berlin.de>
Message-ID: <3B3666B9.335DA17E@per.dem.csiro.au>

[Martin v. Loewis]
>
> > 1: socketmodule.c now #includes getnameinfo.c and
> > getaddrinfo.c. These functions both use offsetof(), which is defined
> > (on my system, at least) in stddef.h.
>
> That should be fixed now. stddef.h is included in socketmodule.c; if
> it is not available or does not define offsetof, an additional
> definition is provided.

Yes, this is fine now...

> > 2. [...] Changes to either of the get{name,addr}info.c files will
> > not cause socketmodule to be rebuilt.
>
> I don't know how to solve this one. If distutils builds the modules,
> makefile dependencies won't help.
>
> > 3. The socket module still does not work, however, since it refers
> > to an unresolved symbol inet_pton
>
> I took the simplest solution that I could think of, delegating
> inet_{pton,ntop} to inet_{addr,ntoa} for AF_INET, failing for all
> other address families (AF_INET6 in particular). I've verified that
> this code does the same as the builtin functions on my Linux system;
> please let me know whether it compiles for you.
>

To get socketmodule.c to compile, I had to make a change to line 2963 so
that the declaration of inet_pton matched the previous declaration on line
220 (changing char *src to const char *src). Still have problems though,
due to the use of snprintf in getnameinfo.c:

Python 2.2a0 (#444, Jun 25 2001, 05:58:17) [C] on osf1V4
Type "copyright", "credits" or "license" for more information.
>>> import socket
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/home/gonzo1/mark/groucho1/mark/src/python/CVS/python/dist/src/Lib/socket.py", line 41, in ?
    from _socket import *
ImportError: Unresolved symbol in /home/gonzo1/mark/groucho1/mark/src/python/CVS/python/dist/src/build/lib.osf1-V4.0-alpha-2.2/_socket.so: snprintf

Cheers, Mark

-- 
Mark Favas - m.favas at per.dem.csiro.au
CSIRO, Private Bag No 5, Wembley, Western Australia 6913, AUSTRALIA

From tim.one at home.com  Mon Jun 25 07:02:30 2001
From: tim.one at home.com (Tim Peters)
Date: Mon, 25 Jun 2001 01:02:30 -0400
Subject: [Python-Dev] IPv6 and Windows
In-Reply-To: <3B35C9AB.2D1D2185@lemburg.com>
Message-ID: 

>> http://msdn.microsoft.com/downloads/sdks/platform/tpipv6.asp

[MAL]
> Isn't this the MS SDK that has the new "Open Source" license
> clause in it ?!

No. That was for the "Mobile Internet Toolkit"; no relation, AFAICT.

> If yes, I very much doubt that this approach
> would be feasible for Python...
>
> http://msdn.microsoft.com/downloads/eula_mit.htm

From tim.one at home.com  Mon Jun 25 07:14:17 2001
From: tim.one at home.com (Tim Peters)
Date: Mon, 25 Jun 2001 01:14:17 -0400
Subject: [Python-Dev] IPv6 and Windows
In-Reply-To: <200106241748.f5OHm3001480@mira.informatik.hu-berlin.de>
Message-ID: 

[Martin v. Loewis]
> ...
> So the only question then is where we get an implementation of these
> functions if the system doesn't provide one. itojun has suggested the
> WIDE libraries; since they apparently don't compile on Windows, I've
> suggested the MS TP emulation. If the latter is not acceptable, we
> will have to fix the WIDE implementation to work on Windows as well.

I don't have cycles for this, but will cheerily suggest that the WIDE
problems didn't appear especially deep, just "the usual" careless brand of
Unix+gcc+glibc specific coding. For example, HAVE_LONG_LONG is #define'd
on Windows, but, just as in Python source, you can't *use* "long long"
literally, you have to use the LONG_LONG macro instead. Then Windows
doesn't have an offsetof() macro, or an snprintf() either. Etc. The code
is in trouble exactly where it relies on platform-specific extensions to
the std C language and library. Problems with those won't be unique to
Windows, either, which is a deeper concern (but already well expressed by
others).

It would be nice if Python could contribute portability back to WIDE. That
requires worker bees, though, and lots of x-platform testing. If it turns
out we can't swing that, then support for this is premature, and we should
wait, e.g., for WIDE to put more effort into porting their code.

From just at letterror.com  Mon Jun 25 08:55:17 2001
From: just at letterror.com (Just van Rossum)
Date: Mon, 25 Jun 2001 08:55:17 +0200
Subject: [Python-Dev] os.path.normcase() in site.py
Message-ID: <20010625085521-r01010600-9a6226c8@213.84.27.177>

I noticed that these days __file__ attributes of modules are case
normalized (ie. lowercased on case insensitive file systems), or at least
the directory part. Then I noticed that this is caused by the fact that
all sys.path entries are case normalized. It turns out that site.py does
this, in a function called makepath(), added by Fred about 8 months ago.

I think this is wrong: we should always try to *preserve* case. I see
os.path.normcase() as a tool to be able to better compare two paths, but
you shouldn't *store* paths this way. I for one am irritated when I see a
path that doesn't have the proper case. The intention of makepath() in
site.py seems good -- it turns all paths into absolute paths -- but is the
normcase really necessary?

*** Please CC follow-ups to me, as I'm not on python-dev.

Just

From martin at loewis.home.cs.tu-berlin.de  Mon Jun 25 08:39:44 2001
From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Mon, 25 Jun 2001 08:39:44 +0200
Subject: [Python-Dev] Re: Problems with socketmodule (getnameinfo & getaddrinfo)
In-Reply-To: <3B3666B9.335DA17E@per.dem.csiro.au> (message from Mark Favas on Mon, 25 Jun 2001 06:16:25 +0800)
References: <200106242139.f5OLdjR22560@mira.informatik.hu-berlin.de> <3B3666B9.335DA17E@per.dem.csiro.au>
Message-ID: <200106250639.f5P6die01246@mira.informatik.hu-berlin.de>

> To get socketmodule.c to compile, I had to make a change to line 2963
> so that the declaration of inet_pton matched the previous declaration on
> line 220 (changing char *src to const char *src). Still have problems
> though, due to the use of snprintf in getnameinfo.c:

Ok, they are printing a single number into a 512 byte buffer; that is safe
even with sprintf only, so I have just removed the snprintf call. Can you
please try again?
Thanks for your reports,
Martin

From thomas at xs4all.net  Mon Jun 25 09:20:53 2001
From: thomas at xs4all.net (Thomas Wouters)
Date: Mon, 25 Jun 2001 09:20:53 +0200
Subject: [Python-Dev] os.path.normcase() in site.py
In-Reply-To: <20010625085521-r01010600-9a6226c8@213.84.27.177>
References: <20010625085521-r01010600-9a6226c8@213.84.27.177>
Message-ID: <20010625092053.S8098@xs4all.nl>

On Mon, Jun 25, 2001 at 08:55:17AM +0200, Just van Rossum wrote:

> *** Please CC follow-ups to me, as I'm not on python-dev.

Is that by choice ? It seems rather... peculiar, to me, that you have
checkin access but aren't on python-dev. You'll miss all those wonderful
"Don't touch CVS, I'm building a release" and "Who put CVS in an unstable
state?" messages.

-- 
Thomas Wouters

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!

From tim.one at home.com  Mon Jun 25 09:51:00 2001
From: tim.one at home.com (Tim Peters)
Date: Mon, 25 Jun 2001 03:51:00 -0400
Subject: [Python-Dev] os.path.normcase() in site.py
In-Reply-To: <20010625092053.S8098@xs4all.nl>
Message-ID: 

[Just van Rossum]
> *** Please CC follow-ups to me, as I'm not on python-dev.

[Thomas Wouters]
> Is that by choice ? It seems rather... peculiar, to me, that you have
> checkin access but aren't on python-dev.

Well, I suppose it's supposed to be a secret, but Guido and Just haven't
talked in 17 years come Wednesday. IIRC, something about a bottle of wine
and a toilet seat, and a small but energetic ferret. Just hacked his way
into SourceForge access (those skills just run in the family, I guess),
but every time he hacks onto Python-Dev Guido detects it and locks him out
again. It's very sad, really -- but also wonderfully Dutch.

at-least-that's-the-best-explanation-i-can-think-of-ly y'rs - tim

From thomas at xs4all.net  Mon Jun 25 10:35:38 2001
From: thomas at xs4all.net (Thomas Wouters)
Date: Mon, 25 Jun 2001 10:35:38 +0200
Subject: [Python-Dev] os.path.normcase() in site.py
In-Reply-To: 
References: 
Message-ID: <20010625103538.T8098@xs4all.nl>

On Mon, Jun 25, 2001 at 03:51:00AM -0400, Tim Peters wrote:

[ Tim explains about the century-old, horrid blood feud that cost the
lives of many an innocent ferret, not to mention bottles of wine, caused
by Just's future attempts to join python-dev -- damn that time machine ]

Okay... how about someone takes Guido out for dinner and feeds him way too
many bottles of wine and ferrets to show him such things do not
necessarily lead to blood feuds ? Maybe take along some psychotropic drugs
and a halfway decent hypnotist for good measure. Meanwhile Barry
subscribes Just to python-dev and you or someone else with the pickpocket
skills to get at the keys for the time machine (come on, fess up, you all
practiced) make sure Guido can't get at it, lest he try and make up with
Just in the past in his 'suggestible' state... Better change the Mailman
admin password too, just to be on the safe side.

Or if that has no chance of a prayer in hell of working, I can give Just a
secret xs4all.nl address (since he has an XS4ALL account nowadays, that
shouldn't be a problem) and we just never tell Guido that py-dev at
xs4all.nl is really Just ;)

> It's very sad, really -- but also wonderfully Dutch.

No, it would only be wonderfully Dutch if either brother was German or
Belgian in some way, or of royal blood and married to the wrong type of
christian sect (Protestant or Catholic -- I keep forgetting which is
which.)

-- 
Thomas Wouters

Hi! I'm a .signature virus!
copy me into your .signature file to help me spread!

From tim.one at home.com  Mon Jun 25 11:05:23 2001
From: tim.one at home.com (Tim Peters)
Date: Mon, 25 Jun 2001 05:05:23 -0400
Subject: [Python-Dev] RE: [Python-iterators] Death by Leakage
In-Reply-To: 
Message-ID: 

Here's a simpler leaker, amounting to an insanely convoluted way to
generate the ints 1, 2, 3, ...:

    DO_NOT_LEAK = 1

    class LazyList:
        def __init__(self, g):
            self.sofar = []
            self.fetch = g.next

        def __getitem__(self, i):
            sofar, fetch = self.sofar, self.fetch
            while i >= len(sofar):
                sofar.append(fetch())
            return sofar[i]

        def clear(self):
            self.__dict__.clear()

    def plus1(g):
        for i in g:
            yield i + 1

    def genm23():
        yield 1
        for i in plus1(m23):
            yield i

    for i in range(10000):
        m23 = LazyList(genm23())
        [m23[i] for i in range(50)]
        if DO_NOT_LEAK:
            m23.clear()

Neil, it would help if genobjects had a memberlist so that the struct
members were discoverable from Python code; that would also let me add
appropriate methods to Cyclops.py to find cycles automatically.

Anyway, m23 is a LazyList instance, where m23.fetch is genm23().next, i.e.
m23.fetch is a bound method of the genm23() generator-iterator. So the
frame for genm23 is reachable from m23.__dict__. That frame contains an
anonymous (it's living in the frame's valuestack) generator-iterator
thingie corresponding to the plus1(m23) call. *That* generator's frame in
turn has m23 in its locals (m23 was an argument to plus1), and another
iterator method referencing m23 in its valuestack (due to the "for i in
g"). But m23 is the LazyList instance we started with, so there's a cycle,
and clearing m23.__dict__ breaks it.

gc doesn't chase generators or frames, so it can't clean this stuff up if
we don't clear the dict. So this appears hopeless unless gc adds both
generators and frames to its repertoire. OTOH, it's got to be rare --
maybe <wink>. Worth it?

From loewis at informatik.hu-berlin.de  Mon Jun 25 11:43:33 2001
From: loewis at informatik.hu-berlin.de (Martin von Loewis)
Date: Mon, 25 Jun 2001 11:43:33 +0200 (MEST)
Subject: [Python-Dev] make static
Message-ID: <200106250943.LAA24576@pandora.informatik.hu-berlin.de>

There is a bug report on SF that 'make static' fails for a Makefile.pre.in
extension, see

http://sourceforge.net/tracker/index.php?func=detail&aid=435446&group_id=5470&atid=105470

Is that process still supported? Unless I'm mistaken, this is complicated
by the fact that Makefile.pre.in packages use the Makefile.pre.in that
comes with the package, not the one that comes with the Python
installation.

Any insights welcome,
Martin

From jack at oratrix.nl  Mon Jun 25 12:18:40 2001
From: jack at oratrix.nl (Jack Jansen)
Date: Mon, 25 Jun 2001 12:18:40 +0200
Subject: [Python-Dev] Re: Problems with socketmodule (getnameinfo & getaddrinfo)
In-Reply-To: Message by Mark Favas , Mon, 25 Jun 2001 06:16:25 +0800 , <3B3666B9.335DA17E@per.dem.csiro.au>
Message-ID: <20010625101842.B6BC6303182@snelboot.oratrix.nl>

I'm having a lot of problems with the new getaddrinfo stuff: no prototypes
used in various routines, missing consts in routine declarations (with
const strings then passed to them), all routines seem to be globals (and
with pretty dangerous names) even though they all look pretty static to
me, etc.

Could whoever put this in do a round of quality control on it, please?
-- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From jack at oratrix.nl Mon Jun 25 12:28:08 2001 From: jack at oratrix.nl (Jack Jansen) Date: Mon, 25 Jun 2001 12:28:08 +0200 Subject: [Python-Dev] os.path.normcase() in site.py In-Reply-To: Message by Just van Rossum , Mon, 25 Jun 2001 08:55:17 +0200 , <20010625085521-r01010600-9a6226c8@213.84.27.177> Message-ID: <20010625102809.42357303182@snelboot.oratrix.nl> > I noticed that these days __file__ attributes of modules are case normalized > (ie. lowercased on case insensitive file systems), or at least the directory > part. Then I noticed that this is caused by the fact that all sys.path entries > are case normalized. It turns out that site.py does this, in a function called > makepath(), added by Fred about 8 months ago. > > I think this is wrong: we should always try to *preserve* case. There is an added problem with the makepath() stuff that I hadn't reported here yet: it has broken MacPython on some non-western machines. Specifically I've had reports of people running a Japanese MacOS that things will break if they run Python from a pathname that has any non-7-bit-ascii characters in the name. Apparently normcase normalizes more than just ascii upper/lowercase letters. And aside from that I fully agree with Just: seeing a stacktrace with all lowercase filenames is _very_ disconcerting. I would disable the case-normalization for MacPython, except that I don't know whether it actually has a function. With MacPython's way of finding the initial sys.path contents we don't have the Windows-Python problem that we add the same directory 5 times (once in uppercase, once in lowercase, once in mixed case, once in mixed-case with / for \, etc:-), so if this is what it's trying to solve we can take it out easily. -- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From fredrik at pythonware.com Mon Jun 25 14:12:23 2001 From: fredrik at pythonware.com (Fredrik Lundh) Date: Mon, 25 Jun 2001 14:12:23 +0200 Subject: [Python-Dev] IPv6 and Windows References: <200106241748.f5OHm3001480@mira.informatik.hu-berlin.de> <20010624233537.R8098@xs4all.nl> <200106242156.f5OLum222759@mira.informatik.hu-berlin.de> Message-ID: <006101c0fd70$17a6b660$0900a8c0@spiff> martin wrote: > getaddrinfo offers protocol-independent address lookup. It is > necessary to use that API to support AF_INET and AF_INET6 > transparently in application code. itojun proposes to change a number > of standard library modules. 
Please have a look at the actual patch
> for details; the typical change will look like this (for httplib)
>
> diff -u -r1.35 httplib.py
> --- Lib/httplib.py	2001/06/01 16:25:38	1.35
> +++ Lib/httplib.py	2001/06/24 04:41:48
> @@ -357,10 +357,22 @@
>  
>      def connect(self):
>          """Connect to the host and port specified in __init__."""
> -        self.sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
> -        if self.debuglevel > 0:
> -            print "connect: (%s, %s)" % (self.host, self.port)
> -        self.sock.connect((self.host, self.port))
> +        for res in socket.getaddrinfo(self.host, self.port, 0, socket.SOCK_STREAM):
> +            af, socktype, proto, canonname, sa = res
> +            try:
> +                self.sock = socket.socket(af, socktype, proto)
> +                if self.debuglevel > 0:
> +                    print "connect: (%s, %s)" % (self.host, self.port)
> +                self.sock.connect(sa)
> +            except socket.error, msg:
> +                if self.debuglevel > 0:
> +                    print 'connect fail:', (self.host, self.port)
> +                self.sock.close()
> +                self.sock = None
> +                continue
> +            break
> +        if not self.sock:
> +            raise socket.error, msg

instead of adding code like that to every single module, maybe we should
add a convenience function to the socket module? (and make that function
smart enough to also work if getaddrinfo isn't supported by the native
platform...)

From guido at digicool.com  Mon Jun 25 15:40:10 2001
From: guido at digicool.com (Guido van Rossum)
Date: Mon, 25 Jun 2001 09:40:10 -0400
Subject: [Python-Dev] make static
In-Reply-To: Your message of "Mon, 25 Jun 2001 11:43:33 +0200." <200106250943.LAA24576@pandora.informatik.hu-berlin.de>
References: <200106250943.LAA24576@pandora.informatik.hu-berlin.de>
Message-ID: <200106251340.f5PDeAO07244@odiug.digicool.com>

> There is a bug report on SF that 'make static' fails for a
> Makefile.pre.in extension, see
>
> http://sourceforge.net/tracker/index.php?func=detail&aid=435446&group_id=5470&atid=105470
>
> Is that process still supported? Unless I'm mistaken, this is
> complicated by the fact that Makefile.pre.in packages use the
> Makefile.pre.in that comes with the package, not the one that comes
> with the Python installation.
>
> Any insights welcome,
>
> Martin

As long as it works, it works. I don't think there's a reason to spend
more than absolutely minimal time trying to keep it working though --
we're trying to encourage everybody to migrate towards distutils. So
(without having seen the SF report) I'd say "tough luck".

--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at digicool.com  Mon Jun 25 15:40:47 2001
From: guido at digicool.com (Guido van Rossum)
Date: Mon, 25 Jun 2001 09:40:47 -0400
Subject: [Python-Dev] os.path.normcase() in site.py
In-Reply-To: Your message of "Mon, 25 Jun 2001 10:35:38 +0200." <20010625103538.T8098@xs4all.nl>
References: <20010625103538.T8098@xs4all.nl>
Message-ID: <200106251340.f5PDele07256@odiug.digicool.com>

No need to get me drunk. Barry & I decided to change this policy weeks
ago, but (in order to avoid a flurry of subscription requests from
functional-language proponents) we decided to keep the policy change a
secret. :-)

Just can subscribe safely now.

--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at digicool.com  Mon Jun 25 15:40:06 2001
From: guido at digicool.com (Guido van Rossum)
Date: Mon, 25 Jun 2001 09:40:06 -0400
Subject: [Python-Dev] os.path.normcase() in site.py
In-Reply-To: Your message of "Mon, 25 Jun 2001 12:28:08 +0200."
<20010625102809.42357303182@snelboot.oratrix.nl> References: <20010625102809.42357303182@snelboot.oratrix.nl> Message-ID: <200106251340.f5PDe6e07238@odiug.digicool.com> > > I noticed that these days __file__ attributes of modules are case > > normalized (ie. lowercased on case insensitive file systems), or > > at least the directory part. Then I noticed that this is caused by > > the fact that all sys.path entries are case normalized. It turns > > out that site.py does this, in a function called makepath(), added > > by Fred about 8 months ago. > > > > I think this is wrong: we should always try to *preserve* case. > > There is an added problem with the makepath() stuff that I hadn't > reported here yet: it has broken MacPython on some non-western > machines. Specifically I've had reports of people running a Japanese > MacOS that things will break if they run Python from a pathname that > has any non-7-bit-ascii characters in the name. Apparently normcase > normalizes more than just ascii upper/lowercase letters. > > And aside from that I fully agree with Just: seeing a stacktrace > with all lowercase filenames is _very_ disconcerting. > > I would disable the case-normalization for MacPython, except that I > don't know whether it actually has a function. With MacPython's way > of finding the initial sys.path contents we don't have the > Windows-Python problem that we add the same directory 5 times (once > in uppercase, once in lowercase, once in mixed case, once in > mixed-case with / for \, etc:-), so if this is what it's trying to > solve we can take it out easily. I can't think of any function besides the attempt to avoid duplicates. I think that even on Windows, retaining case makes sense. I think that there's a way to avoid duplicates without case-folding everything. (E.g. use a case-folding comparison instead.) I wonder if maybe path entries should be normpath'd though? I'll leave it to Fred, Jack or Just to fix this. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at digicool.com Mon Jun 25 15:41:46 2001 From: guido at digicool.com (Guido van Rossum) Date: Mon, 25 Jun 2001 09:41:46 -0400 Subject: [Python-Dev] IPv6 and Windows In-Reply-To: Your message of "Sun, 24 Jun 2001 19:48:03 +0200." <200106241748.f5OHm3001480@mira.informatik.hu-berlin.de> References: <200106241748.f5OHm3001480@mira.informatik.hu-berlin.de> Message-ID: <200106251341.f5PDfkg07283@odiug.digicool.com> > The problem is that the library patches (httplib, ftplib, etc) do use > getaddrinfo to find out how to contact a remote system, which is the > right thing to do IMO. So even if the IPv6 support can be activated > only if desired, getaddrinfo absolutely has to work. Yes, but in an IPv4-only environment it would be super trivial to implement, right? --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at digicool.com Mon Jun 25 15:42:18 2001 From: guido at digicool.com (Guido van Rossum) Date: Mon, 25 Jun 2001 09:42:18 -0400 Subject: [Python-Dev] IPv6 and Windows In-Reply-To: Your message of "Sun, 24 Jun 2001 20:08:13 +0200." 
<3B362C8D.D3AECE3C@lemburg.com>
References: <200106240820.f5O8KVN01435@mira.informatik.hu-berlin.de> <3B35C9AB.2D1D2185@lemburg.com> <200106241700.f5OH0hm01021@mira.informatik.hu-berlin.de> <3B362C8D.D3AECE3C@lemburg.com>
Message-ID: <200106251342.f5PDgI107298@odiug.digicool.com>

> > # If you redistribute the SOFTWARE and/or your Source Modifications,
> > # or any portion thereof as provided above, you agree: (i) to
> > # distribute the SOFTWARE only in conjunction with, and as part of,
> > # your Source Modifications which add significant functionality to the
> > # SOFTWARE; (ii) to distribute the SOFTWARE and/or your Source
> > # Modifications solely as part of your research and not in any
> > # commercial product; (iii) the SOFTWARE and/or your Source
> > # Modifications will not be distributed for profit; (iv) to retain all
> > # branding, copyright and trademark notices included with the SOFTWARE
> > # and include a copy of this EULA with any distribution of the
> > # SOFTWARE, or any portion thereof; and (v) to indemnify, hold
> > # harmless, and defend Microsoft from and against any claims or
> > # lawsuits, including attorneys' fees, that arise or result from
> > # the use or distribution of your Source Modifications.
> >
> > I don't know whether this is acceptable or not.
>
> Most likely not: there are lots of commercial Python users out there
> who wouldn't like these clauses at all... we'd also lose the GPL
> compatibility.

Don't even *think* about using code with that license.

--Guido van Rossum (home page: http://www.python.org/~guido/)

From skip at pobox.com  Mon Jun 25 15:50:31 2001
From: skip at pobox.com (Skip Montanaro)
Date: Mon, 25 Jun 2001 08:50:31 -0500
Subject: [Python-Dev] xrange vs generators
Message-ID: <15159.16807.480121.637386@beluga.mojam.com>

With generators in the language, should xrange be deprecated?

Skip

From just at letterror.com  Mon Jun 25 16:05:43 2001
From: just at letterror.com (Just van Rossum)
Date: Mon, 25 Jun 2001 16:05:43 +0200
Subject: [Python-Dev] os.path.normcase() in site.py
In-Reply-To: <200106251343.f5PDh4907304@odiug.digicool.com>
Message-ID: <20010625160545-r01010600-e232a14e@213.84.27.177>

Guido van Rossum wrote:

> I can't think of any function besides the attempt to avoid duplicates.
>
> I think that even on Windows, retaining case makes sense.
>
> I think that there's a way to avoid duplicates without case-folding
> everything. (E.g. use a case-folding comparison instead.)
>
> I wonder if maybe path entries should be normpath'd though?

They are already, they already go through abspath(), which calls
normpath().

> I'll leave it to Fred, Jack or Just to fix this.

If it were up to me, I'd simply remove the normcase() call from makepath().

Just

From arigo at ulb.ac.be  Mon Jun 25 15:08:52 2001
From: arigo at ulb.ac.be (Armin Rigo)
Date: Mon, 25 Jun 2001 15:08:52 +0200
Subject: [Python-Dev] Python Specializing Compiler
In-Reply-To: <200106221259.OAA02519@core.inf.ethz.ch>
Message-ID: <4.3.1.0.20010625134824.00abde60@127.0.0.1>

Hello everybody,

A note about what I have in mind about Psyco...

Type-sets are independent of memory representation. In other words, the
fact that two variables can take the same set of values does not mean that
the data is necessarily encoded in the same way in memory. In particular,
I believe we won't need to change the way the current Python interpreter
encodes data. For example, instances currently have a dictionary of
attributes and no "fixed slots", but this is not a problem for Psyco,
which can encode instances in better ways (e.g.
as a C struct) as long as it is only accessed by Psyco-compiled Python
code and no "legacy" code.

This approach also allows Psyco to completely remove the overhead of
creating bound method objects and frame objects; both are generally
temporary, and so during their whole lifetime they can be represented much
more efficiently in memory. For frame objects it should be clear (we
probably need no frame at all as long as no exception exits the current
procedure, and even in this case it could be optimized). For method
objects we use "memory sharing", a technique already applied in the
current Psyco. More precisely, if some (immutable) data is found at some
memory location (or machine register) and Python code says it should be
duplicated, we need not duplicate it at all; we can just consider that the
copy is at the same location as the original.

For method objects it means the following: suppose you have an instance
"xyz" and query its "foo()" method. Suppose that you can (at some time) be
sure that, because of the class of "xyz", "xyz.foo" will always be the
Python function "f". Then the method object's representation can be
simplified: all it needs to store in memory is a pointer to "xyz", because
"f" is a constant part. Now a single pointer to the "xyz" instance is
exactly the same memory format as the original "xyz" variable, so that
this particular representation of a bound method object can share the
original "xyz" pointer. No actual machine code is produced; Psyco simply
notes that both "xyz" and "xyz.foo" are represented at the same location,
although "xyz" represents an instance with the given pointer, and
"xyz.foo" represents the "f" function with its first argument bound to the
given pointer.

According to est at hyperreal.org, method and frame objects each represent
20% of the execution time... (Est, on which kind of machine did you get
Psyco to run the sample code 5 times faster !? It's only 2 times faster on
a modern Pentium...)

A bientôt,

Armin.

From arigo at ulb.ac.be  Mon Jun 25 15:45:20 2001
From: arigo at ulb.ac.be (Armin Rigo)
Date: Mon, 25 Jun 2001 15:45:20 +0200
Subject: [Python-Dev] Python Specializing Compiler
In-Reply-To: <200106221259.OAA02519@core.inf.ethz.ch>
Message-ID: <4.3.1.0.20010625150819.00aa5220@127.0.0.1>

Hello,

At 14:59 22.06.2001 +0200, Samuele Pedroni wrote:
>*: some possible useful hooks would be:
>- minimal profiling support in order to specialize only things called often
>- feedback for dynamic changing of methods, class hierarchy, ... if we want
>to optimize method lookup (which would make sense)
>- a mixed fixed slots/dict layout for instances.

There is one point that you didn't mention, which I believe is important:
how to handle global/builtin variables. First, a few words about the
current Python semantics.

* I am sorry if what follows has already been discussed; I am raising the
question again because it might be important for Psyco. If you feel this
would be better as a PEP, please just tell me so. *

Complete lexical scoping was recently added, implemented with "free" and
"cell" variables. These are only used for functions defined inside of
other functions; top-level functions use the opcode LOAD_GLOBAL for all
non-local variables. LOAD_GLOBAL performs one or two dictionary look-ups
(two if the variable is built-in). For simple built-ins like "len" this
might be expensive (has someone measured such costs ?).

I suggest generalizing the compile-time lexical scoping rules.
Let's compile all functions' non-local variables (top-level and others) as
"free" variables. This means the corresponding module's global variables
must be "cell" variables. This is just what we would get if the module's
code was one big function enclosing the definition of all the other
functions. Next, the variables not defined in the module (the built-ins)
are "free" variables of the module, and the built-in module provides
"cell" variables for them. Remember that "free" and "cell" variables are
linked together when the function (or module in this case) is defined (for
functions, when "def" is executed; for modules, it would be at load-time).

Benefit: not a single dictionary look-up any more; uniformity of treatment.

Potential code break: global variables shadowing built-ins would behave
like local variables shadowing globals, i.e. the mere presence of a global
"xyz=..." would forever hide the "xyz" built-in from the module, even
before the assignment or after a "del xyz". (cf. UnboundLocalError.)

To think about: what the "global" keyword would mean in this context.

Implementation problems: if we want to keep the module's dictionary of
global variables (and we certainly do) it would require changes to the
dictionary implementation (or the creation of a different kind of
dictionary). One solution is to automatically dereference cell objects and
raise exceptions upon reading empty cells. Another solution is to turn
dictionaries into collections of objects that all behave like cell objects
(so that if "d" is any dictionary, something like "d.ref(key)" would let
us get a cell object which could be read or written later to actually get
or set the value associated with "key", and "d[key]" would mean
"d.ref(key).cell_ref"). Well, these are just proposals; they might not be
a good solution.

Why it is related to Psyco: the current treatment of globals/builtins
makes it hard for Psyco to statically tell what function we are calling
when it sees e.g. "len(a)" in the code. We would at least need some help
from the interpreter; at least hooks called when the module's globals()
dictionary changes. The above proposal might provide a more uniform
solution.

Thanks for your attention.

Armin.

From guido at digicool.com  Mon Jun 25 16:26:08 2001
From: guido at digicool.com (Guido van Rossum)
Date: Mon, 25 Jun 2001 10:26:08 -0400
Subject: [Python-Dev] xrange vs generators
In-Reply-To: Your message of "Mon, 25 Jun 2001 08:50:31 CDT." <15159.16807.480121.637386@beluga.mojam.com>
References: <15159.16807.480121.637386@beluga.mojam.com>
Message-ID: <200106251426.f5PEQ8907629@odiug.digicool.com>

> With generators in the language, should xrange be deprecated?
>
> Skip

No, but maybe xrange() should be changed to return an iterator. E.g.
something like this:

    def xrange(start, stop, step):
        while start < stop:
            yield start
            start += step

but with the appropriate defaults, and reversal of the test if step < 0,
and an error if step == 0, and type checks enforcing ints (or long ints!),
and implemented in C. :-)

Although xrange() objects currently support some sequence algebra, that is
mostly bogus and I don't think anyone in their right mind uses it.
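Spelled out with the defaults and the step handling, it might look roughly
like this (still just a sketch, with the type checks left out, and Python
standing in for the eventual C):

    from __future__ import generators

    def xrange(start, stop=None, step=1):
        # xrange(n) is shorthand for xrange(0, n), as with the builtin
        if stop is None:
            start, stop = 0, start
        if step == 0:
            raise ValueError("xrange() step must not be zero")
        if step > 0:
            while start < stop:
                yield start
                start += step
        else:
            # reversed test for a negative step
            while start > stop:
                yield start
                start += step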
--Guido van Rossum (home page: http://www.python.org/~guido/)

From thomas.heller at ion-tof.com  Mon Jun 25 16:37:31 2001
From: thomas.heller at ion-tof.com (Thomas Heller)
Date: Mon, 25 Jun 2001 16:37:31 +0200
Subject: [Python-Dev] xrange vs generators
References: <15159.16807.480121.637386@beluga.mojam.com> <200106251426.f5PEQ8907629@odiug.digicool.com>
Message-ID: <006301c0fd84$5dcefad0$e000a8c0@thomasnotebook>

> > With generators in the language, should xrange be deprecated?
> >
> > Skip
>
> No, but maybe xrange() should be changed to return an iterator.
> E.g. something like this:
>
>     def xrange(start, stop, step):
>         while start < stop:
>             yield start
>             start += step
>
> but with the appropriate defaults, and reversal of the test if step <
> 0, and an error if step == 0, and type checks enforcing ints (or long
> ints!), and implemented in C. :-)
>
> Although xrange() objects currently support some sequence algebra,
> that is mostly bogus and I don't think anyone in their right mind uses
> it.

I _was_ using xrange as sets representing (potentially large) ranges of
ints. Example:

    positive = xrange(1, sys.maxint)

    if num in positive:
        ...

I didn't follow the iterators discussion: would this continue to work?

Thomas

From esr at thyrsus.com  Mon Jun 25 16:41:34 2001
From: esr at thyrsus.com (Eric S. Raymond)
Date: Mon, 25 Jun 2001 10:41:34 -0400
Subject: [Python-Dev] xrange vs generators
In-Reply-To: <200106251426.f5PEQ8907629@odiug.digicool.com>; from guido@digicool.com on Mon, Jun 25, 2001 at 10:26:08AM -0400
References: <15159.16807.480121.637386@beluga.mojam.com> <200106251426.f5PEQ8907629@odiug.digicool.com>
Message-ID: <20010625104134.B30559@thyrsus.com>

Guido van Rossum :

> Although xrange() objects currently support some sequence algebra,
> that is mostly bogus and I don't think anyone in their right mind uses
> it.

I agree. As long as we make those cases fail loudly, I see no objection to
dropping support for them.

-- 
Eric S. Raymond

Americans have the will to resist because you have weapons. If you don't
have a gun, freedom of speech has no power.
	-- Yoshimi Ishikawa, Japanese author, in the LA Times 15 Oct 1992

From barry at digicool.com  Mon Jun 25 16:38:20 2001
From: barry at digicool.com (Barry A. Warsaw)
Date: Mon, 25 Jun 2001 10:38:20 -0400
Subject: [Python-Dev] os.path.normcase() in site.py
References: <20010625103538.T8098@xs4all.nl>
Message-ID: <15159.19676.727068.217548@anthem.wooz.org>

>>>>> "TW" == Thomas Wouters writes:

    TW> Okay... how about someone takes Guido out for dinner and feeds
    TW> him way too many bottles of wine and ferrets to show him such
    TW> things do not necessarily lead to blood feuds ? Maybe take
    TW> along some psychotropic drugs and a halfway decent hypnotist
    TW> for good measure.

Don't forget the dentist, proctologist, and a trepanist. Actually, if you
can find a holeologist it would be much more efficient (my cousin Neil,
a.k.a. Dr. Finger, a.k.a. Dr Watumpka would be ideal, but he's studying in
Dortmund these days).

    TW> Meanwhile Barry subscribes Just to python-dev

I'd be glad to, and I won't even divulge the fact that python-dev is only
ostensibly a closed, insular mailing list these days.

    TW> and you or someone else with the pickpocket skills to get at
    TW> the keys for the time machine

No pickpocketing skill necessary. Guido leaves the keys in a small safebox
magnetically adhered underneath the running boards. Just be sure to ground
yourself first (learned the hard way)!
    TW> (come on, fess up, you all practiced) make sure Guido can't
    TW> get at it, lest he try and make up with Just in the past in
    TW> his 'suggestible' state... Better change the Mailman admin
    TW> password too, just to be on the safe side.

I've tried that many times, but I suspect Guido has a Pybot hermetically
linked to the time machine which "instantly" recedes several seconds into
the past each time I change it, only to change it back.

    TW> Or if that has no chance of a prayer in hell of working, I can
    TW> give Just a secret xs4all.nl address (since he has an XS4ALL
    TW> account nowadays, that shouldn't be a problem) and we just
    TW> never tell Guido that py-dev at xs4all.nl is really Just ;)

You realize it's way too "late" for that, don't you? The time machine
works just as well in the forward direction as in the past direction, and
long before he left the comfy environs of Amsterdam to brave it out in the
harsh, unforgiving wilderness of Washington, he mapped out every moment of
young Wouters' life. Why do you think I've worn aluminum foil underwear
for the past 30 years? Trust me, it's not for the feeling of freshness and
confidence it provides (okay, only partially).

    >> It's very sad, really -- but also wonderfully Dutch.

    TW> No, it would only be wonderfully Dutch if either brother was
    TW> German or Belgian in some way, or of royal blood and married
    TW> to the wrong type of christian sect (Protestant or Catholic --
    TW> I keep forgetting which is which.)

It would also be wonderfully American, but only if Just had trivially
wronged Guido years ago by eating one of his nabisco cookies or some such.

-Barry

From guido at digicool.com  Mon Jun 25 16:47:50 2001
From: guido at digicool.com (Guido van Rossum)
Date: Mon, 25 Jun 2001 10:47:50 -0400
Subject: [Python-Dev] xrange vs generators
In-Reply-To: Your message of "Mon, 25 Jun 2001 16:37:31 +0200." <006301c0fd84$5dcefad0$e000a8c0@thomasnotebook>
References: <15159.16807.480121.637386@beluga.mojam.com> <200106251426.f5PEQ8907629@odiug.digicool.com> <006301c0fd84$5dcefad0$e000a8c0@thomasnotebook>
Message-ID: <200106251447.f5PEloH07777@odiug.digicool.com>

[me]
> > Although xrange() objects currently support some sequence algebra,
> > that is mostly bogus and I don't think anyone in their right mind uses
> > it.

[theller]
> I _was_ using xrange as sets representing (potentially large)
> ranges of ints.
> Example:
>
>     positive = xrange(1, sys.maxint)
>
>     if num in positive:
>         ...
>
> I didn't follow the iterators discussion: would this
> continue to work?

No, it would break. And I see another breakage too:

    r = xrange(10)
    for i in r:
        for j in r:
            print i, j

would not do the right thing if xrange() returned an iterator (because
iterators can only be used once).

This is too bad; I really wish that xrange() could die or be limited
entirely to for loops. I wonder if we could put warnings on xrange() uses
beyond the most basic...?

--Guido van Rossum (home page: http://www.python.org/~guido/)

From pedroni at inf.ethz.ch  Mon Jun 25 16:51:16 2001
From: pedroni at inf.ethz.ch (Samuele Pedroni)
Date: Mon, 25 Jun 2001 16:51:16 +0200 (MET DST)
Subject: [Python-Dev] Python Specializing Compiler
Message-ID: <200106251451.QAA17756@core.inf.ethz.ch>

Hi.

[Armin Rigo]
...
> Why it is related to Psyco: the current treatment of globals/builtins makes
> it hard for Psyco to statically tell what function we are calling when it
> sees e.g. "len(a)" in the code.
> We would at least need some help from the
> interpreter; at least hooks called when the module's globals() dictionary
> changes. The above proposal might provide a more uniform solution.

FYI, a different proposal for opt. globals access by Jeremy Hylton. It
seems it would break fewer things ... don't know whether it can be as
useful for Psyco:

http://mail.python.org/pipermail/python-dev/2001-May/014995.html

In any case I think Psyco will need notification support from the
interpreter about dynamic changes to things that Psyco honestly assumes to
be invariant in order to achieve performance.

regards, Samuele Pedroni.

From thomas.heller at ion-tof.com  Mon Jun 25 17:05:09 2001
From: thomas.heller at ion-tof.com (Thomas Heller)
Date: Mon, 25 Jun 2001 17:05:09 +0200
Subject: [Python-Dev] xrange vs generators
References: <15159.16807.480121.637386@beluga.mojam.com> <200106251426.f5PEQ8907629@odiug.digicool.com> <006301c0fd84$5dcefad0$e000a8c0@thomasnotebook> <200106251447.f5PEloH07777@odiug.digicool.com>
Message-ID: <00e001c0fd88$3a532140$e000a8c0@thomasnotebook>

> [theller]
> > I _was_ using xrange as sets representing (potentially large)
> > ranges of ints.
> > Example:
> >
> >     positive = xrange(1, sys.maxint)
> >
> >     if num in positive:
> >         ...
> >
> > I didn't follow the iterators discussion: would this
> > continue to work?
>
> No, it would break.

Since there was an off-by-one bug for 'if num in xrange()' in Python 2.0,
my code has already been rewritten.

Thomas

From pedroni at inf.ethz.ch  Mon Jun 25 17:04:45 2001
From: pedroni at inf.ethz.ch (Samuele Pedroni)
Date: Mon, 25 Jun 2001 17:04:45 +0200 (MET DST)
Subject: [Python-Dev] Python Specializing Compiler
Message-ID: <200106251504.RAA18642@core.inf.ethz.ch>

Hi.

[Armin Rigo]
> In particular, I believe we won't need to change the way the current Python
> interpreter encodes data. For example, instances currently have a
> dictionary of attributes and no "fixed slots", but this is not a problem
> for Psyco, which can encode instances in better ways (e.g. as a C struct)
> as long as it is only accessed by Psyco-compiled Python code and no
> "legacy" code.

This makes sense, but I'm asking whether it is affordable to have all code
executed through Psyco-compiled code (if we aim for usage-transparency),
given the memory footprint and the compilation vs. execution trade-offs
for rarely executed code. Otherwise, in a mixed execution context, we
would pay for conversions.

I can see how a dynamic compiler can deal with methods, together with an
interpreter that notifies it when a dynamic change to the class hierarchy
or to method definitions can potentially invalidate compiled code. I see
more problems with instance data slots, because there are no strong hints
in the code about which are the "official" slots of a class, and
undisciplined code can treat instances just as dicts.

regards, Samuele Pedroni.

From fdrake at acm.org  Mon Jun 25 17:13:31 2001
From: fdrake at acm.org (Fred L. Drake, Jr.)
Date: Mon, 25 Jun 2001 11:13:31 -0400 (EDT)
Subject: [Python-Dev] os.path.normcase() in site.py
In-Reply-To: <200106251343.f5PDh4907304@odiug.digicool.com>
References: <20010625102809.42357303182@snelboot.oratrix.nl> <200106251343.f5PDh4907304@odiug.digicool.com>
Message-ID: <15159.21787.913782.751691@cj42289-a.reston1.va.home.com>

Guido van Rossum writes:

> I can't think of any function besides the attempt to avoid duplicates.

There were two reasons for adding this code:

 1. Avoid duplicates (speeds imports if there are duplicates and
    the modules are found on an entry after the dupes).

 2.
    Avoid breakage when a script uses os.chdir(). This is probably
    unusual for large applications, but fairly common for little admin
    helper scripts.

> I think that even on Windows, retaining case makes sense.
>
> I think that there's a way to avoid duplicates without case-folding
> everything. (E.g. use a case-folding comparison instead.)
>
> I wonder if maybe path entries should be normpath'd though?
>
> I'll leave it to Fred, Jack or Just to fix this.

I certainly agree that this can be improved; if Jack or Just would like to
assign it to me on SourceForge, I'd be glad to fix it.

-Fred

-- 
Fred L. Drake, Jr.
PythonLabs at Digital Creations

From tim at digicool.com  Mon Jun 25 17:39:47 2001
From: tim at digicool.com (Tim Peters)
Date: Mon, 25 Jun 2001 11:39:47 -0400
Subject: [Python-Dev] xrange vs generators
In-Reply-To: <200106251447.f5PEloH07777@odiug.digicool.com>
Message-ID: 

[Thomas Heller]
> I _was_ using xrange as sets representing (potentially large)
> ranges of ints.
> Example:
>
>     positive = xrange(1, sys.maxint)
>
>     if num in positive:
>         ...
>
> I didn't follow the iterators discussion: would this
> continue to work?

[Guido]
> No, it would break.

"x in y" works with any iterable y in 2.2, incl. generators. So e.g.

    >>> def xr(n):
    ...     i = 0
    ...     while i < n:
    ...         yield i
    ...         i += 1
    ...
    >>> 1 in xr(10)
    1
    >>> 9 in xr(10)
    1
    >>> 10 in xr(10)
    0
    >>>

However, there's no __contains__ method here, so in the last case it
actually did 10 compares. 0 in xr(sys.maxint) is very quick, but I'm still
waiting for -1 in xr(sys.maxint) to complete <wink>.

> And I see another breakage too:

This would also apply to Thomas's example of giving a name to an xrange
object, if implemented via generator:

    >>> small = xr(5)
    >>> 2 in small
    1
    >>> 2 in small
    0
    >>>

> ...
> This is too bad; I really wish that xrange() could die or be limited
> entirely to for loops. I wonder if we could put warnings on xrange()
> uses beyond the most basic...?

Hmm. I'd rather not endure the resulting complaints without a strong
rationale for deprecating it. One that strikes close to my heart: there's
more code in 2.2 to support xrange than there is to support generators!
But users don't care about that.

From thomas at xs4all.net  Mon Jun 25 17:42:12 2001
From: thomas at xs4all.net (Thomas Wouters)
Date: Mon, 25 Jun 2001 17:42:12 +0200
Subject: [Python-Dev] xrange vs generators
In-Reply-To: <200106251447.f5PEloH07777@odiug.digicool.com>
References: <15159.16807.480121.637386@beluga.mojam.com> <200106251426.f5PEQ8907629@odiug.digicool.com> <006301c0fd84$5dcefad0$e000a8c0@thomasnotebook> <200106251447.f5PEloH07777@odiug.digicool.com>
Message-ID: <20010625174211.U8098@xs4all.nl>

On Mon, Jun 25, 2001 at 10:47:50AM -0400, Guido van Rossum wrote:

[ xrange can't be changed into a generator ]

> This is too bad; I really wish that xrange() could die or be limited
> entirely to for loops. I wonder if we could put warnings on xrange()
> uses beyond the most basic...?

Why do we want to do this ? xrange() is still exactly what it was: an
object that pretends to be a list of integers. Besides being useful for
those who work a lot with ranges, it's a wonderful example of what you can
do with Python (even if it isn't actually written in Python :-)

I see less reason to deprecate xrange than to deprecate the gopherlib,
wave/aifc/audiodev, mhlib, netrc and/or robotparser modules.

-- 
Thomas Wouters

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!
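[One way to reconcile Tim's one-shot iterator demo with Thomas's named
range use case -- a hypothetical sketch, not anything checked in -- is an
object that is iterable without itself being an iterator, so every for
loop and every "in" test gets a fresh generator:

    from __future__ import generators

    class XRange:
        # illustrative stand-in for an iterator-friendly xrange;
        # assumes step > 0 to keep the sketch short
        def __init__(self, start, stop, step=1):
            self.start, self.stop, self.step = start, stop, step

        def __iter__(self):
            # a brand-new generator on every call, so nested loops and
            # repeated membership tests see the full sequence each time
            i = self.start
            while i < self.stop:
                yield i
                i += self.step

    small = XRange(0, 5)
    print 2 in small   # 1
    print 2 in small   # still 1: a new iterator is created each time

Because __iter__ hands out a new generator per call, this behaves like the
old xrange for nested loops and repeated "in" tests, at the cost of
re-running the generator from the start each time.]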
From guido at digicool.com Mon Jun 25 18:07:44 2001 From: guido at digicool.com (Guido van Rossum) Date: Mon, 25 Jun 2001 12:07:44 -0400 Subject: [Python-Dev] xrange vs generators In-Reply-To: Your message of "Mon, 25 Jun 2001 11:39:47 EDT." References: Message-ID: <200106251607.f5PG7iq08192@odiug.digicool.com> > Hmm. I'd rather not endure the resulting complaints without a > strong rationale for deprecating it. One that strikes close to my > heart: there's more code in 2.2 to support xrange than there is to > support generators! But users don't care about that. But I do, and historically this code has often been bug-ridden without anybody noticing -- so it's not like it's needed much. I would suggest to remove most of the fancy features of xrange(), in particular the slice, contains and repeat slots. A step further would be to remove getitem also, and add a tp_getiter slot instead -- returning not itself but a new iterator that iterates through the prescribed sequence. We need a PEP for this. Anyone? Should be short and sweet. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at digicool.com Mon Jun 25 18:11:10 2001 From: guido at digicool.com (Guido van Rossum) Date: Mon, 25 Jun 2001 12:11:10 -0400 Subject: [Python-Dev] xrange vs generators In-Reply-To: Your message of "Mon, 25 Jun 2001 17:42:12 +0200." <20010625174211.U8098@xs4all.nl> References: <15159.16807.480121.637386@beluga.mojam.com> <200106251426.f5PEQ8907629@odiug.digicool.com> <006301c0fd84$5dcefad0$e000a8c0@thomasnotebook> <200106251447.f5PEloH07777@odiug.digicool.com> <20010625174211.U8098@xs4all.nl> Message-ID: <200106251611.f5PGBA608205@odiug.digicool.com> > [ xrange can't be changed into a generator ] > > > This is too bad; I really wish that xrange() could die or be limited > > entirely to for loops. I wonder if we could put warnings on xrange() > > uses beyond the most basic...? > > Why do we want to do this ? xrange() is still exactly what it was: an object > that pretends to be a list of integers. Besides being useful for those who > work a lot with ranges, it's a wondeful example on what you can do with > Python (even if it isn't actually written in Python :-) There is exactly *one* idiomatic use of xrange(): for i in xrange(...): ... All other operations supported by the xrange object are very rarely used, and historically their implementation has had obvious bugs that no-one noticed for years. > I see less reason to deprecate xrange than to deprecate the gopherlib, > wave/aifc/audiodev, mhlib, netrc and/or robotparser modules. Those are useful application-area libraries for some folks. The idiomatic xrange() object is useful too. But the advanced features of xrange() are an example of code bloat. --Guido van Rossum (home page: http://www.python.org/~guido/) From Greg.Wilson at baltimore.com Mon Jun 25 18:25:33 2001 From: Greg.Wilson at baltimore.com (Greg Wilson) Date: Mon, 25 Jun 2001 12:25:33 -0400 Subject: [Python-Dev] RE: Python-Dev digest, Vol 1 #1437 - 13 msgs Message-ID: <930BBCA4CEBBD411BE6500508BB3328F2E27F1@nsamcanms1.ca.baltimore.com> > Guido: > Since you have already obtained the same speedup with your approach, I > think there's great promise. Count on sending in a paper for the next > Python conference! Greg: "Doctor Dobb's Journal" would also be interested in an article. Who knows --- it might even be done before the ones on stackless, garbage collection, Zope acquisition, and generators... 
:-)

Greg

From just at letterror.com  Mon Jun 25 18:47:30 2001
From: just at letterror.com (Just van Rossum)
Date: Mon, 25 Jun 2001 18:47:30 +0200
Subject: [Python-Dev] os.path.normcase() in site.py
In-Reply-To: <15159.21787.913782.751691@cj42289-a.reston1.va.home.com>
Message-ID: <20010625184734-r01010600-dbd1c84a@213.84.27.177>

Guido van Rossum writes:
> I can't think of any function besides the attempt to avoid duplicates.

Fred L. Drake, Jr. wrote:
> There were two reasons for adding this code:
>
> 1. Avoid duplicates (speeds imports if there are duplicates and
>    the modules are found on an entry after the dupes).
>
> 2. Avoid breakage when a script uses os.chdir(). This is
>    probably unusual for large applications, but fairly common for
>    little admin helper scripts.

1) normcase(). Bad.
2) abspath(). Good.

I think #2 is a legitimate problem, but I'm not so sure of #1: is it
really so common for sys.path to contain duplicates, to worry about it at
all?

> > I'll leave it to Fred, Jack or Just to fix this.
>
> I certainly agree that this can be improved; if Jack or Just would
> like to assign it to me on SourceForge, I'd be glad to fix it.

Here's my proposed fix:

Index: site.py
===================================================================
RCS file: /cvsroot/python/python/dist/src/Lib/site.py,v
retrieving revision 1.27
diff -c -3 -r1.27 site.py
*** site.py	2001/06/12 16:48:52	1.27
--- site.py	2001/06/25 16:42:33
***************
*** 67,73 ****
  
  def makepath(*paths):
      dir = os.path.join(*paths)
!     return os.path.normcase(os.path.abspath(dir))
  
  L = sys.modules.values()
  for m in L:
--- 67,73 ----
  
  def makepath(*paths):
      dir = os.path.join(*paths)
!     return os.path.abspath(dir)
  
  L = sys.modules.values()
  for m in L:

Just

From aahz at rahul.net  Mon Jun 25 19:19:48 2001
From: aahz at rahul.net (Aahz Maruch)
Date: Mon, 25 Jun 2001 10:19:48 -0700 (PDT)
Subject: [Python-Dev] 2.1.1 vs. os.normcase()
Message-ID: <20010625171948.D636399C80@waltz.rahul.net>

It's too late for 2.0.1, but should this bugfix go into 2.1.1?

(Just to be clear, this is the problem that Just reported with site.py
calling os.normcase() in makepath().)

((I'm only asking about this bug in specific because we're getting down to
the wire on 2.1.1 IIUC.))

-- 
--- Aahz (@pobox.com)

Hugs and backrubs -- I break Rule 6 <*> http://www.rahul.net/aahz/
Androgynous poly kinky vanilla queer het Pythonista

I don't really mind a person having the last whine, but I do mind someone
else having the last self-righteous whine.
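[For reference, the case-preserving alternative Guido sketches earlier in
the thread (fold case only for comparison, store the original spelling)
might look something like this -- illustrative only, with an invented
helper name, not what site.py actually does:

    import os, sys

    def makepath(*paths):
        # absolute and normalized, but with the original case preserved
        return os.path.abspath(os.path.join(*paths))

    def dedup_syspath():
        # drop duplicate sys.path entries without case-folding what we keep
        seen = {}
        result = []
        for dir in sys.path:
            dir = makepath(dir)
            key = os.path.normcase(dir)  # folded form used only as a key
            if not seen.has_key(key):
                seen[key] = 1
                result.append(dir)       # keep the case-preserved spelling
        sys.path[:] = result

This keeps Fred's reason #1 (no duplicates) and reason #2 (absolute paths)
while addressing Just's complaint, since nothing lowercased is ever
stored.]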
From guido at digicool.com Mon Jun 25 20:06:02 2001 From: guido at digicool.com (Guido van Rossum) Date: Mon, 25 Jun 2001 14:06:02 -0400 Subject: [Python-Dev] 2.1.1 vs. os.normcase() In-Reply-To: Your message of "Mon, 25 Jun 2001 10:19:48 PDT." <20010625171948.D636399C80@waltz.rahul.net> References: <20010625171948.D636399C80@waltz.rahul.net> Message-ID: <200106251806.f5PI62L08770@odiug.digicool.com> > It's too late for 2.0.1, but should this bugfix go into 2.1.1? > > (Just to be clear, this is the problem that Just reported with site.py > calling os.normcase() in makepath().) > > ((I'm only asking about this bug in specific because we're getting down > to the wire on 2.1.1 IIUC.)) Unclear if it's purely a bugfix -- this could be considered a feature, but I don't know. What do others think? --Guido van Rossum (home page: http://www.python.org/~guido/) From tim at digicool.com Mon Jun 25 20:47:06 2001 From: tim at digicool.com (Tim Peters) Date: Mon, 25 Jun 2001 14:47:06 -0400 Subject: [Python-Dev] os.path.normcase() in site.py In-Reply-To: <20010625102809.42357303182@snelboot.oratrix.nl> Message-ID: [Jack Jansen] > ... > With MacPython's way of finding the initial sys.path contents we > don't have the Windows-Python problem that we add the same directory > 5 times (once in uppercase, once in lowercase, once in mixed case, > once in mixed-case with / for \, etc:-), Happily, we don't have that problem on a stock Windows Python anymore: C:\Python21>python Python 2.1 (#15, Apr 16 2001, 18:25:49) [MSC 32 bit (Intel)] on win32 Type "copyright", "credits" or "license" for more information. >>> import sys, pprint >>> pprint.pprint(sys.path) ['', 'c:\\python21', 'c:\\python21\\dlls', 'c:\\python21\\lib', 'c:\\python21\\lib\\plat-win', 'c:\\python21\\lib\\lib-tk'] >>> OTOH, this is still Icky, because those don't match (wrt case) the names in the filesystem (e.g., just look at the initial prompt line: I was in Python21 when I ran this, not python21). > so if this is what it's trying to solve we can take it out easily. It's hard to believe Fred added code to solve a Windows problem ; I don't know what it's trying to do. From m.favas at per.dem.csiro.au Mon Jun 25 21:38:47 2001 From: m.favas at per.dem.csiro.au (Mark Favas) Date: Tue, 26 Jun 2001 03:38:47 +0800 Subject: [Python-Dev] Re: Problems with socketmodule (getnameinfo & getaddrinfo) References: <200106242139.f5OLdjR22560@mira.informatik.hu-berlin.de> <3B3666B9.335DA17E@per.dem.csiro.au> <200106250639.f5P6die01246@mira.informatik.hu-berlin.de> Message-ID: <3B379347.7E8D00EB@per.dem.csiro.au> "Martin v. Loewis" wrote: > > > To get socketmodule.c to compile, I had to make a change to line 2963 > > so that the declaration of inet_pton matched the previous declaration on > > line 220 (changing char *src to const char *src). Still have problems > > though, due to the use of snprintf in getnameinfo.c: > > Ok, they are printing a single number into a 512 byte buffer; that is > safe even with sprintf only, so I have just removed the snprintf call. > Can you please try again? > > Thanks for your reports, > Martin No trouble... The current CVS compiles (with a warning), links, and runs. The warning given is: cc: Warning: /home/gonzo1/mark/groucho1/mark/src/python/CVS/python/dist/src/Modules/getaddrinfo.c, line 407: In this statement, the referenced type of the pointer value "hostname" is const, but the referenced type of the target of this assignment is not.
(notconstqual) if (inet_pton(gai_afdl[i].a_af, hostname, pton)) { ------------------------------------------------^ which can be fixed by declaring the second argument to inet_pton as const char* instead of char* in the two occurrences of inet_pton in socketmodule.c Cheers, Mark -- Mark Favas - m.favas at per.dem.csiro.au CSIRO, Private Bag No 5, Wembley, Western Australia 6913, AUSTRALIA From martin at loewis.home.cs.tu-berlin.de Tue Jun 26 01:08:00 2001 From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Tue, 26 Jun 2001 01:08:00 +0200 Subject: [Python-Dev] IPv6 and Windows In-Reply-To: <200106251341.f5PDfkg07283@odiug.digicool.com> (message from Guido van Rossum on Mon, 25 Jun 2001 09:41:46 -0400) References: <200106241748.f5OHm3001480@mira.informatik.hu-berlin.de> <200106251341.f5PDfkg07283@odiug.digicool.com> Message-ID: <200106252308.f5PN80701342@mira.informatik.hu-berlin.de> > > The problem is that the library patches (httplib, ftplib, etc) do use > > getaddrinfo to find out how to contact a remote system, which is the > > right thing to do IMO. So even if the IPv6 support can be activated > > only if desired, getaddrinfo absolutely has to work. > > Yes, but in an IPv4-only environment it would be super trivial to > implement, right? Right, and getaddrinfo.c/getnameinfo.c attempt such an implementation. They might attempt to get it "more right" than necessary, but still they are "pure C", in the sense that they don't rely on any libraries except for those available in a typical IPv4 sockets implementation. At least that's the theory. It turns out that they've been using inet_pton and snprintf, which is probably because they have been mainly tested on BSD. I'm confident that we can reduce them to a "no funny library calls needed" minimum. If somebody wants to implement them anew from the ground up, only using what the socketmodule already uses, that would be fine as well. An actual review of the code for portability problems would also be helpful. Regards, Martin From greg at cosc.canterbury.ac.nz Tue Jun 26 06:32:05 2001 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Tue, 26 Jun 2001 16:32:05 +1200 (NZST) Subject: [Python-Dev] Python Specializing Compiler In-Reply-To: <200106251451.QAA17756@core.inf.ethz.ch> Message-ID: <200106260432.QAA04989@s454.cosc.canterbury.ac.nz> Samuele Pedroni : > a different proposal for opt. globals access > by Jeremy Hylton. It seems, it would break fewer things ... I really like Jeremy's proposal. I've been having similar thoughts myself for quite a while. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg at cosc.canterbury.ac.nz +--------------------------------------+ From guido at digicool.com Tue Jun 26 16:57:37 2001 From: guido at digicool.com (Guido van Rossum) Date: Tue, 26 Jun 2001 10:57:37 -0400 Subject: [Python-Dev] Python Specializing Compiler In-Reply-To: Your message of "Tue, 26 Jun 2001 16:32:05 +1200." <200106260432.QAA04989@s454.cosc.canterbury.ac.nz> References: <200106260432.QAA04989@s454.cosc.canterbury.ac.nz> Message-ID: <200106261457.f5QEvbZ11007@odiug.digicool.com> > Samuele Pedroni : > > > a different proposal for opt. globals access > > by Jeremy Hylton. It seems, it would break fewer things ... > > I really like Jeremy's proposal. I've been having similar > thoughts myself for quite a while. > > Greg Ewing Ditto.
Isn't this what I've been calling "low-hanging fruit" for ages? Apparently it's low but still out of reach. :-) --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at digicool.com Tue Jun 26 19:59:55 2001 From: guido at digicool.com (Guido van Rossum) Date: Tue, 26 Jun 2001 13:59:55 -0400 Subject: [Python-Dev] PEP 260: simplify xrange() Message-ID: <200106261759.f5QHxtH15045@odiug.digicool.com> Here's another sweet and short PEP. What do folks think? Is xrange()'s complexity really worth having? --Guido van Rossum (home page: http://www.python.org/~guido/) PEP: 260 Title: Simplify xrange() Version: $Revision: 1.1 $ Author: guido at python.org (Guido van Rossum) Status: Draft Type: Standards Track Python-Version: 2.2 Created: 26-Jun-2001 Post-History: 26-Jun-2001 Abstract This PEP proposes to strip the xrange() object from some rarely used behavior like x[i:j] and x*n. Problem The xrange() function has one idiomatic use: for i in xrange(...): ... However, the xrange() object has a bunch of rarely used behaviors that attempt to make it more sequence-like. These are so rarely used that historically they have had serious bugs (e.g. off-by-one errors) that went undetected for several releases. I claim that it's better to drop these unused features. This will simplify the implementation, testing, and documentation, and reduce maintenance and code size. Proposed Solution I propose to strip the xrange() object to the bare minimum. The only retained sequence behaviors are x[i], len(x), and repr(x). In particular, these behaviors will be dropped: x[i:j] (slicing) x*n, n*x (sequence-repeat) cmp(x1, x2) (comparisons) i in x (containment test) x.tolist() method x.start, x.stop, x.step attributes By implementing a custom iterator type, we could speed up the common use, but this is optional (the default sequence iterator does just fine). I expect it will take at most an hour to rip it all out; another hour to reduce the test suite and documentation. Scope This PEP only affects the xrange() built-in function. Risks Somebody's code could be relying on the extended code, and this code would break. However, given that historically bugs in the extended code have gone undetected for so long, it's unlikely that much code is affected. Copyright This document has been placed in the public domain. Local Variables: mode: indented-text indent-tabs-mode: nil End: From fdrake at acm.org Tue Jun 26 22:01:41 2001 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Tue, 26 Jun 2001 16:01:41 -0400 (EDT) Subject: [Python-Dev] collecting timer resolution information... Message-ID: <15160.59941.295880.286167@cj42289-a.reston1.va.home.com> I'd like people to run the attached C program and send the output to me. What this does is run the gettimeofday() and getrusage() functions until the time values change. The intent is to determine the quality of the available timing information. For example, on my Linux-Mandrake 7.2 installation with a stock 2.2.17 kernel, I get this: timeofday: 1 (1 calls), rusage: 10000 (2465 calls) Thanks! -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations
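[Fred's observation.c did not survive in this archive, but the probe he describes is easy to approximate in Python. A sketch only, not Fred's program: it assumes a Unix box with the resource module available, and the interpreter's overhead will inflate the call counts relative to the C version:]

    import time, resource

    def step(clock):
        # Call clock() until the reported value changes; return the
        # step size in microseconds and the number of calls needed.
        t0 = clock()
        calls = 1
        while 1:
            t1 = clock()
            calls = calls + 1
            if t1 != t0:
                return (t1 - t0) * 1e6, calls

    def cputime():
        usage = resource.getrusage(resource.RUSAGE_SELF)
        return usage[0] + usage[1]   # user + system seconds

    print "timeofday: %.0f (%d calls), rusage: %.0f (%d calls)" \
          % (step(time.time) + step(cputime))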
From fdrake at acm.org Tue Jun 26 22:05:48 2001 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Tue, 26 Jun 2001 16:05:48 -0400 (EDT) Subject: [Python-Dev] collecting timer resolution information... In-Reply-To: <15160.59941.295880.286167@cj42289-a.reston1.va.home.com> References: <15160.59941.295880.286167@cj42289-a.reston1.va.home.com> Message-ID: <15160.60188.806308.247566@cj42289-a.reston1.va.home.com> Fred L. Drake, Jr. writes: > I'd like people to run the attached C program and send the output to OK, I've attached it this time. Sorry! -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From gward at python.net Tue Jun 26 22:10:09 2001 From: gward at python.net (Greg Ward) Date: Tue, 26 Jun 2001 16:10:09 -0400 Subject: [Python-Dev] make static In-Reply-To: <200106251340.f5PDeAO07244@odiug.digicool.com>; from guido@digicool.com on Mon, Jun 25, 2001 at 09:40:10AM -0400 References: <200106250943.LAA24576@pandora.informatik.hu-berlin.de> <200106251340.f5PDeAO07244@odiug.digicool.com> Message-ID: <20010626161009.B2820@gerg.ca> On 25 June 2001, Guido van Rossum said: > As long as it works, it works. I don't think there's a reason to > spend more than absolutely minimal time trying to keep it working > though -- we're trying to encourage everybody to migrate towards > distutils. So (without having seen the SF report) I'd say "tough > luck". The catch is that I never got around to implementing statically building a new interpreter via the Distutils, so (for now) Makefile.pre.in is the only way to do this. ;-( (Unless someone added it to the Distutils while I wasn't looking, which wouldn't be hard since I haven't looked in, ummm, six months or so...) Greg -- Greg Ward - just another /P(erl|ython)/ hacker gward at python.net http://starship.python.net/~gward/ "When I hear the word `culture', I reach for my gun." --Goebbels "When I hear the word `Microsoft', *I* reach for *my* gun." --me From arigo at ulb.ac.be Wed Jun 27 04:01:54 2001 From: arigo at ulb.ac.be (Armin Rigo) Date: Tue, 26 Jun 2001 22:01:54 -0400 Subject: [Python-Dev] Python Specializing Compiler Message-ID: <3B393E92.B0719A7A@ulb.ac.be> Hi, I am considering using GNU Lightning to produce code from the Psyco compiler. Has anyone already used it from a Python program ? If so, you might already have done the necessary support module in C, and I might be interested in it ! Otherwise, I'll start from scratch. Of course, comments about whether I should use GNU Lightning at all, or any other code-producing library (or even produce machine code "by hand"), are welcome. Also, I hope to be able to continue with more fundamental work on Psyco very soon. One design decision I have to make now is about the way Psyco reads Python code. Currently, it "reverse-engineers" byte-code. Another solution would be to compile from the source code (possibly with the help of the 'Tools/Compiler/*' modules). The current solution, although not optimal, seems to make integration with the current interpreter easier. Indeed, based on recent discussions, I now believe that a realistic way to use Psyco would be to let the interpreter run normally while doing some kind of profiling, and work on time-critical routines only --- which at this point have already been compiled into byte-code and executed at least a few times. Armin
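[A toy version of the profile-then-compile scheme Armin sketches: count how often each code object is entered via the standard profile hook, and flag the hot ones as candidates. The names and the threshold are invented for the example; a real system would hand the hot byte-code to the specializing compiler instead of printing it:]

    import sys

    counts = {}
    THRESHOLD = 100   # arbitrary cutoff for "time-critical"

    def hotspot_tracker(frame, event, arg):
        if event == 'call':
            code = frame.f_code
            n = counts.get(code, 0) + 1
            counts[code] = n
            if n == THRESHOLD:
                # already byte-compiled and executed many times --
                # Armin's criterion for a routine worth specializing
                print 'hot function:', code.co_name

    sys.setprofile(hotspot_tracker)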
From nas at python.ca Tue Jun 26 23:01:38 2001 From: nas at python.ca (Neil Schemenauer) Date: Tue, 26 Jun 2001 14:01:38 -0700 Subject: [Python-Dev] collecting timer resolution information... In-Reply-To: <15160.59941.295880.286167@cj42289-a.reston1.va.home.com>; from fdrake@acm.org on Tue, Jun 26, 2001 at 04:01:41PM -0400 References: <15160.59941.295880.286167@cj42289-a.reston1.va.home.com> Message-ID: <20010626140138.A2838@glacier.fnational.com> Fred L. Drake, Jr. wrote: > timeofday: 1 (1 calls), rusage: 10000 (2465 calls) My hacked version of Linux 2.4 on an AMD-800 box: timeofday: 1 (2 calls), rusage: 976 (1792 calls) I don't quite understand the output. What does the 976 mean? Neil From fdrake at acm.org Tue Jun 26 23:23:53 2001 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Tue, 26 Jun 2001 17:23:53 -0400 (EDT) Subject: [Python-Dev] collecting timer resolution information... In-Reply-To: <20010626140138.A2838@glacier.fnational.com> References: <15160.59941.295880.286167@cj42289-a.reston1.va.home.com> <20010626140138.A2838@glacier.fnational.com> Message-ID: <15160.64873.213278.925715@cj42289-a.reston1.va.home.com> Neil Schemenauer writes: > My hacked version of Linux 2.4 on an AMD-800 box: > > timeofday: 1 (2 calls), rusage: 976 (1792 calls) > > I don't quite understand the output. What does the 976 mean? The "1" and the "976" are the apparent resolution of the time values reported by those two calls, in microseconds. It looks like the HZ define in that header file you pointed out could be bumped a little higher. ;-) -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From mark.favas at csiro.au Wed Jun 27 01:21:47 2001 From: mark.favas at csiro.au (Mark Favas) Date: Wed, 27 Jun 2001 07:21:47 +0800 Subject: [Python-Dev] latest unicode-related change causes failure in test_unicode & test_unicodedata Message-ID: <3B39190B.E7DA5B5D@csiro.au> CVS of a short while ago, Tru64 Unix: "make test" gives two unicode-related failures: test_unicode test test_unicode crashed -- exceptions.UnicodeError: UTF-8 decoding error: illegal encoding test_unicodedata The actual stdout doesn't match the expected stdout. This much did match (between asterisk lines): ********************************************************************** test_unicodedata Testing Unicode Database... Methods: ********************************************************************** Then ...
We expected (repr): '6c7a7c02657b69d0fdd7a7d174f573194bba2e18' But instead we got: '374108f225e0c1488f8389ce6333902830d299fb' test test_unicodedata failed -- Writing: '374108f225e0c1488f8389ce6333902830d299fb', expected: '6c7a7c02657b69d0fdd7a7d174f573194bba2e18' Running the tests manually, test_unicode fails, test_unicodedata doesn't fail, but doesn't match the expected output for Methods: (test_unicode) Testing Unicode contains method... done. Testing Unicode formatting strings... done. Testing builtin codecs... Traceback (most recent call last): File "Lib/test/test_unicode.py", line 383, in ? verify(u'\ud800\udc02'.encode('utf-8') == \ File "./Lib/test/test_support.py", line 95, in verify raise TestFailed(reason) test_support.TestFailed: test failed (test_unicodedata) python Lib/test/test_unicodedata.py Testing Unicode Database... Methods: 374108f225e0c1488f8389ce6333902830d299fb Functions: 41e1d4792185d6474a43c83ce4f593b1bdb01f8a API: ok -- Mark Favas - m.favas at per.dem.csiro.au CSIRO, Private Bag No 5, Wembley, Western Australia 6913, AUSTRALIA From JamesL at Lugoj.Com Wed Jun 27 02:06:23 2001 From: JamesL at Lugoj.Com (James Logajan) Date: Tue, 26 Jun 2001 17:06:23 -0700 Subject: [Python-Dev] Re: PEP 260: simplify xrange() References: Message-ID: <3B39237F.1A7EF3F2@Lugoj.Com> Guido van Rossum wrote: > Here's another sweet and short PEP. What do folks think? Is > xrange()'s complexity really worth having? Are there still known bugs that will take some effort to repair? Is xrange constantly touched when changes are made elsewhere? If no to both, then I suggest don't fix what ain't broken; life is too short. (Unless it is annoying you to distraction, then do the deed and get it over with.) From tim.one at home.com Wed Jun 27 02:32:26 2001 From: tim.one at home.com (Tim Peters) Date: Tue, 26 Jun 2001 20:32:26 -0400 Subject: [Python-Dev] RE: PEP 260: simplify xrange() In-Reply-To: <3B39237F.1A7EF3F2@Lugoj.Com> Message-ID: [James Logajan] > Are there still known bugs that will take some effort to repair? Is > xrange constantly touched when changes are made elsewhere? If no to > both, then I suggest don't fix what ain't broken; life is too short. > (Unless it is annoying you to distraction, then do the deed and get > it over with.) I think it's more the latter. I partly provoked this by bitterly pointing out that there's more code in the CVS tree devoted to supporting the single xrange() gimmick than Neil Schemenauer added to support the get-out-of-town more powerful new generators. Masses of crufty code nobody benefits from are a burden on the soul. although-it-would-be-impolite-to-ask-we-why-still-ship-a-directory-full-of-crufty-old-irix5-demos-in-the-std-library-ly y'rs - tim From tdelaney at avaya.com Wed Jun 27 02:36:25 2001 From: tdelaney at avaya.com (Delaney, Timothy) Date: Wed, 27 Jun 2001 10:36:25 +1000 Subject: [Python-Dev] RE: PEP 260: simplify xrange() Message-ID: > Here's another sweet and short PEP. What do folks think? Is > xrange()'s complexity really worth having? > > --Guido van Rossum (home page: http://www.python.org/~guido/) > > PEP: 260 > Title: Simplify xrange() > Version: $Revision: 1.1 $ > Author: guido at python.org (Guido van Rossum) > Status: Draft > Type: Standards Track > Python-Version: 2.2 > Created: 26-Jun-2001 > Post-History: 26-Jun-2001 > > Abstract > > This PEP proposes to strip the xrange() object from some rarely > used behavior like x[i:j] and x*n. > > > Problem > > The xrange() function has one idiomatic use: > > for i in xrange(...): ...
If this is to be done, I would also propose that xrange() and range() be changed to allow passing in a straight-out sequence such as in the following code in order to get rid of the need for range(len(seq)): import __builtin__ def range (start, stop=None, step=1, range=range): """""" start2 = start stop2 = stop if stop is None: stop2 = start start2 = 0 try: return range(start2, stop2, step) except TypeError: assert stop is None return range(len(start)) def xrange (start, stop=None, step=1, xrange=xrange): """""" start2 = start stop2 = stop if stop is None: stop2 = start start2 = 0 try: return xrange(start2, stop2, step) except TypeError: assert stop is None return xrange(len(start)) a = [5, 'a', 'Hello, world!'] b = range(a) c = xrange(4, 6) d = xrange(b) e = range(c) print a print b print c print d print e print range(d, 2) Tim Delaney From gward at python.net Wed Jun 27 03:24:32 2001 From: gward at python.net (Greg Ward) Date: Tue, 26 Jun 2001 21:24:32 -0400 Subject: [Python-Dev] RE: PEP 260: simplify xrange() In-Reply-To: ; from tdelaney@avaya.com on Wed, Jun 27, 2001 at 10:36:25AM +1000 References: Message-ID: <20010626212432.A4003@gerg.ca> On 27 June 2001, Delaney, Timothy said: > If this is to be done, I would also propose that xrange() and range() be > changed to allow passing in a straight-out sequence such as in the following > code in order to get rid of the need for range(len(seq)): I'm +1 on the face of it without stopping to consider any implications. ;-) Some bits of syntactic sugar are just too good to pass up. range(len(sequence)) is syntactic cod-liver oil. Greg -- Greg Ward - programmer-at-big gward at python.net http://starship.python.net/~gward/ Blood is thicker than water, and much tastier. From nas at python.ca Wed Jun 27 03:28:29 2001 From: nas at python.ca (Neil Schemenauer) Date: Tue, 26 Jun 2001 18:28:29 -0700 Subject: [Python-Dev] collecting timer resolution information... In-Reply-To: <15160.64873.213278.925715@cj42289-a.reston1.va.home.com>; from fdrake@acm.org on Tue, Jun 26, 2001 at 05:23:53PM -0400 References: <15160.59941.295880.286167@cj42289-a.reston1.va.home.com> <20010626140138.A2838@glacier.fnational.com> <15160.64873.213278.925715@cj42289-a.reston1.va.home.com> Message-ID: <20010626182829.A3344@glacier.fnational.com> Fred L. Drake, Jr. wrote: > The "1" and the "976" are the apparent resolution of the time > values reported by those two calls, in microseconds. It looks like > the HZ define in that header file you pointed out could be bumped a > little higher. ;-) I've got it at 1024. >>> 976. / 10000 * 1024 99.942400000000006 I think yours is at the 100 default. Neil From fdrake at acm.org Wed Jun 27 04:14:00 2001 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Tue, 26 Jun 2001 22:14:00 -0400 (EDT) Subject: [Python-Dev] collecting timer resolution information... In-Reply-To: <20010626182829.A3344@glacier.fnational.com> References: <15160.59941.295880.286167@cj42289-a.reston1.va.home.com> <20010626140138.A2838@glacier.fnational.com> <15160.64873.213278.925715@cj42289-a.reston1.va.home.com> <20010626182829.A3344@glacier.fnational.com> Message-ID: <15161.16744.665259.229385@cj42289-a.reston1.va.home.com> Neil Schemenauer writes: > I've got it at 1024. > > >>> 976. / 10000 * 1024 > 99.942400000000006 > > I think yours is at the 100 default. That's correct. Yours could be bumped a bit (factor of 10?
I'm not really sure where it would cause problems in practice, though I think I understand the general explanations I've seen), and mine could be bumped a good bit. But I intend to stick with a stock kernel since I expect most users will be using a stock kernel, and I don't have a pile of extra machines to play with. ;-( -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From greg at cosc.canterbury.ac.nz Wed Jun 27 04:37:21 2001 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Wed, 27 Jun 2001 14:37:21 +1200 (NZST) Subject: [Python-Dev] collecting timer resolution information... In-Reply-To: <15160.60188.806308.247566@cj42289-a.reston1.va.home.com> Message-ID: <200106270237.OAA05182@s454.cosc.canterbury.ac.nz> Here are the results from a few machines around here: s454% uname -a SunOS s454 5.7 Generic_106541-10 sun4m sparc SUNW,SPARCstation-4 s454% observation timeofday: 2 (1 calls), rusage: 10000 (22 calls) oma% uname -a SunOS oma 5.7 Generic sun4u sparc SUNW,Ultra-4 oma% observation timeofday: 1 (2 calls), rusage: 10000 (115 calls) pc250% uname -a SunOS pc250 5.8 Generic_108529-03 i86pc i386 i86pc pc250% observation timeofday: 1 (1 calls), rusage: 10000 (232 calls) Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg at cosc.canterbury.ac.nz +--------------------------------------+ From JamesL at Lugoj.Com Wed Jun 27 04:42:20 2001 From: JamesL at Lugoj.Com (James Logajan) Date: Tue, 26 Jun 2001 19:42:20 -0700 Subject: [Python-Dev] Re: PEP 260: simplify xrange() References: Message-ID: <3B39480C.F4808C1F@Lugoj.Com> Tim Peters wrote: > [James Logajan] > > Are there still known bugs that will take some effort to repair? Is > > xrange constantly touched when changes are made elsewhere? If no to > > both, then I suggest don't fix what ain't broken; life is too short. > > (Unless it is annoying you to distraction, then do the deed and get > > it over with.) > > I think it's more the latter. I partly provoked this by bitterly pointing > out that there's more code in the CVS tree devoted to supporting the single > xrange() gimmick than Neil Schemenauer added to support the get-out-of-town > more powerful new generators. Masses of crufty code nobody benefits from > are a burden on the soul. Design mistakes one has made do tend to weigh on one's soul (speaking from more than two decades of programming experience) so I understand the primal urge to correct them when one can, and even when one shouldn't. So although I'm quite annoyed by all these new-fangled gimmicks being added to the language (i.e. Python generators being added to solve California's power problems) I have no problem with xrange being fenced in. (I find the very existence of the PEP process somewhat unsettling; there are now thousands of programmers trying to use the language. Why burden them with insuring their programs remain compatible with yet-another-damn-set-of-proposals every year? Or worse: trying to rewrite their code "more elegantly" using all the latest gimmicks. Why in my day, if you wanted to, say, save execution state, you figured out how to do it and didn't go crying to the language designer. Damn these young lazy programmers. Don't know how good they have it. Wouldn't know how to save their execution state if their lives depended on it. Harumph.) Speaking of "generators", I just want to say that I think that "generator" makes for lousy terminology. 
If I understand correctly, "generators" are coroutines that have peer-to-peer synchronized messaging (synchronizing and communicating at the "yield" points). To my mind, "generators" does not evoke that image at all. Assuming I understand it in my early senility.... > although-it-would-be-impolite-to-ask-we-why-still-ship-a-directory- > full-of-crufty-old-irix5-demos-in-the-std-library-ly Perhaps because the Irix community would be quite Irate if they were removed? From tim.one at home.com Wed Jun 27 06:38:15 2001 From: tim.one at home.com (Tim Peters) Date: Wed, 27 Jun 2001 00:38:15 -0400 Subject: [Python-Dev] RE: PEP 260: simplify xrange() In-Reply-To: <3B39480C.F4808C1F@Lugoj.Com> Message-ID: [James Logajan] > Design mistakes one has made do tend to weigh on one's soul (speaking > from more than two decades of programming experience) so I understand > the primal urge to correct them when one can, and even when one > shouldn't. Is this a case when one shouldn't? That is, is it a specific comment on PEP 260, or just a general venting here? > So although I'm quite annoyed by all these new-fangled gimmicks being > added to the language (i.e. Python generators being added to solve > California's power problems) I have no problem with xrange being fenced > in. OK. > (I find the very existence of the PEP process somewhat unsettling; > there are now thousands of programmers trying to use the language. Why > burden them with insuring their programs remain compatible with yet- > another-damn-set-of-proposals every year? You can ask the C, C++, Fortran, Perl, COBOL (etc, etc) folks that too, but I suspect it's a rhetorical question. I wish you could ask the Java committee, but they work in secret . > Or worse: trying to rewrite their code "more elegantly" using all the > latest gimmicks. Use of new features isn't required by Guido, and neither is downloading new releases. If *you* waste your time doing that, we both know it's because you can't resist <0.5 wink>. > ... > Speaking of "generators", I just want to say that I think that > "generator" makes for lousy terminology. A generator, umm, *generates* a sequence of values. It's neither more specific nor more general than that, so we're pretty much limited to vaguely suggestive terms like "generator" and "iterator"; Python already used the latter word for something else. I'd be happy to call them pink flamingos. > If I understand correctly, "generators" are coroutines They're formally semi-coroutines; it's not symmetric. > that have peer-to-peer synchronized messaging (synchronizing and > communicating at the "yield" points). Way too highfalutin' a view. Think of a generator as a resumable function, and you're not missing anything -- not even an implementation subtlety. They *are* resumable functions. A "yield" is just a "return", but with the twist that the function can resume executing after the "yield" again. If you also think of ordinary call/return as a peer-to-peer etc etc, then I suppose you're stuck with that view here too. > To my mind, "generators" does not evoke that image at all. Good, because that image was overblown beyond recognition . >> although-it-would-be-impolite-to-ask-we-why-still-ship-a-directory- >> full-of-crufty-old-irix5-demos-in-the-std-library-ly > Perhaps because the Irix community would be quite Irate if they were > removed? Doubt it: the Irix5 library files haven't really been touched since 1993. For several years we've also shipped an Irix6 library with all the same stuff. 
But I suppose releasing a new OS was a symptom of SGI picking on its users too . From tim.one at home.com Wed Jun 27 07:14:29 2001 From: tim.one at home.com (Tim Peters) Date: Wed, 27 Jun 2001 01:14:29 -0400 Subject: [Python-Dev] Unicode fallout: Windows _winreg project broken Message-ID: The _winreg project no longer links: Creating library ./_winreg_d.lib and object ./_winreg_d.exp _winreg.obj : error LNK2001: unresolved external symbol __imp__PyUnicode_DecodeMBCS The compilation of PyUnicode_DecodeMBCS in unicodeobject.c is in a #if defined(MS_WIN32) && defined(HAVE_USABLE_WCHAR_T) block. But the top of unicodeobject.h now wraps the enabling # if defined(MS_WIN32) && !defined(USE_UCS4_STORAGE) # define HAVE_USABLE_WCHAR_T # define PY_UNICODE_TYPE wchar_t # endif block inside a #ifndef PY_UNICODE_TYPE block, and a change to PC/config.h: #define PY_UNICODE_TYPE unsigned short stops all that. IOW, HAVE_USABLE_WCHAR_T no longer gets #define'd on Windows, and that prevents unicodeobject.c from supplying routines _winreg.c calls. leaving-it-to-an-expert-who-thinks-they-know-what-all-these-symbols-are-supposed-to-really-mean-ly y'rs - tim From greg at cosc.canterbury.ac.nz Wed Jun 27 07:41:46 2001 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Wed, 27 Jun 2001 17:41:46 +1200 Subject: [Python-Dev] Help: Python 2.1: "Corrupt Installation Detected" Message-ID: <3B39721A.DED4E85A@cosc.canterbury.ac.nz> I'm trying to install Python-2.1 on Windows, and I keep getting "Corrupt Installation Detected" when I run the installer. From tim.one at home.com Wed Jun 27 07:53:01 2001 From: tim.one at home.com (Tim Peters) Date: Wed, 27 Jun 2001 01:53:01 -0400 Subject: [Python-Dev] Help: Python 2.1: "Corrupt Installation Detected" In-Reply-To: <3B39721A.DED4E85A@cosc.canterbury.ac.nz> Message-ID: [Greg Ewing] > I'm trying to install Python-2.1 on Windows, > and I keep getting "Corrupt Installation Detected" > when I run the installer. [but no other evidence that > it's actually corrupt] You didn't say which flavor of Windows, but should have . Ditto what it is you're running (the PythonLabs distro? ActiveState's? PythonWare's?). Known causes for this from the PythonLabs installer include (across various flavors of Windows), in decreasing order of likelihood: + Trying to install while logged in to an account with insufficient permissions (try logging in as Administrator, if on a version of Windows where that makes sense). + Trying to install over a network. Copy the installer to a local disk first. + Conflicts with anti-virus software (disable it -- indeed, my Win9x Life got much saner after I wiped Norton AntiVirus from my hard drive). + Conflicts with other running programs (like installer splash screens always say, close all other programs). + Insufficient memory, disk space, or magic low-level Windows resources. + There may or may not be a problem unique to French versions of Windows. Any of those apply? From martin at loewis.home.cs.tu-berlin.de Wed Jun 27 09:12:11 2001 From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Wed, 27 Jun 2001 09:12:11 +0200 Subject: [Python-Dev] Unicode fallout: Windows _winreg project broken Message-ID: <200106270712.f5R7CBh06458@mira.informatik.hu-berlin.de> > IOW, HAVE_USABLE_WCHAR_T no longer gets #define'd on Windows, and > that prevents unicodeobject.c from supplying routines _winreg.c > calls. The best thing, IMO, would be if PC/config.h defines everything available in config.h also.
In this case, the proper defines would be #define Py_USING_UNICODE #define HAVE_USABLE_WCHAR_T #define Py_UNICODE_SIZE 2 #define PY_UNICODE_TYPE wchar_t If that approach is used, the defaulting in Include/unicodeobject.h could go away. Alternatively, define only Py_USING_UNICODE of this in PC/config.h, and change the block in Include/unicodeobject.h to /* Windows has a usable wchar_t type (unless we're using UCS-4) */ # ifdef MS_WIN32 # ifdef USE_UCS4_STORAGE # define Py_UNICODE_SIZE 4 # define PY_UNICODE_TYPE unsigned int # else # define Py_UNICODE_SIZE 2 # define HAVE_USABLE_WCHAR_T # define PY_UNICODE_TYPE wchar_t # endif # endif Regards, Martin From tim.one at home.com Wed Jun 27 09:39:38 2001 From: tim.one at home.com (Tim Peters) Date: Wed, 27 Jun 2001 03:39:38 -0400 Subject: [Python-Dev] New Unicode warnings Message-ID: There are 3 functions now where the prototypes in unicodeobject.h don't match the definitions in unicodeobject.c. Like, in .h, extern DL_IMPORT(Py_UNICODE) _PyUnicode_ToTitlecase( register const Py_UNICODE ch /* Unicode character */ ); but in .c: Py_UNICODE _PyUnicode_ToTitlecase(register Py_UNICODE ch) That is, they disagree about const (a silly language idea if ever there was one ). The others (I haven't check these for the exact reason(s), but assume they're the same deal): _PyUnicode_ToUppercase _PyUnicode_ToLowercase From Armin.Rigo at ima.unil.ch Wed Jun 27 11:01:18 2001 From: Armin.Rigo at ima.unil.ch (RIGO Armin) Date: Wed, 27 Jun 2001 11:01:18 +0200 (CEST) Subject: [Python-Dev] Python Specializing Compiler In-Reply-To: <3B391D88.305CCB4E@ActiveState.com> Message-ID: On Tue, 26 Jun 2001, Paul Prescod wrote: > Armin Rigo wrote: > > I am considering using GNU Lightning to produce code from the Psyco > > compiler. (...) > > Core Python has no GPLed components. I would hate to have you put in a > bunch of work worthy of inclusion in core Python to see it rejected on > those grounds. Good remark. Anyone else has comments about this ? Psyco would probably not be part of the core Python, but only an extension module; but your objection is nevertheless valid. Any alternatives ? I am considering a more theoretical approach, based on Tunes (http://tunes.org) as mentionned in Psyco's readme file, but this would take a lot more time -- althought it might give much more impressive results. Armin. From neal at metaslash.com Wed Jun 27 13:48:00 2001 From: neal at metaslash.com (Neal Norwitz) Date: Wed, 27 Jun 2001 07:48:00 -0400 Subject: [Python-Dev] ANN: PyChecker version 0.6.1 Message-ID: <3B39C7F0.2CA171C5@metaslash.com> A new version of PyChecker is available for your hacking pleasure. PyChecker is a tool for finding common bugs in python source code. It finds problems that are typically caught by a compiler for less dynamic languages, like C and C++. Comments, criticisms, new ideas, and other feedback is welcome. 
Here's the CHANGELOG: * Check format strings: "%s %s %s" % (v1, v2, v3, v4) for arg counts * Warn when format strings do: '%(var) %(var2)' * Fix Local variable (xxx) not used, when have: "%(xxx)s" % locals() * Warn when local variable (xxx) doesn't exist and have: "%(xxx)s" % locals() * Install script in /usr/local/bin to invoke PyChecker * Don't produce unused global warnings when using a module in parameters * Don't produce unused global warnings when using a module in class variables * Add check when using method as an attribute (if self.method and x == y:) * Add check for right # of args to object construction * Add check for right # of args to function calls in other modules * Check for returning a value from __init__ * Fix using from XX import YY ; from XX import ZZ causing re-import warning * Fix UNABLE TO IMPORT errors for files that don't end with a newline * Support for checking consistent return values -- not complete produces too many false positives (off by default, use -r/--returnvalues to enable) PyChecker is available on Source Forge: Web page: http://pychecker.sourceforge.net/ Project page: http://sourceforge.net/projects/pychecker/ Neal -- pychecker at metaslash.com From paulp at ActiveState.com Wed Jun 27 13:53:08 2001 From: paulp at ActiveState.com (Paul Prescod) Date: Wed, 27 Jun 2001 04:53:08 -0700 Subject: [Python-Dev] Python Specializing Compiler References: Message-ID: <3B39C924.E865177D@ActiveState.com> RIGO Armin wrote: > >... > > I am considering a more theoretical approach, based on Tunes > (http://tunes.org) as mentionned in Psyco's readme file, but this would > take a lot more time -- althought it might give much more impressive > results. If you are thinking about incorporating some ideas from Tunes that's one thing. But if you want to use their code I would ask "what code?" I have heard about Tunes for several years now and not seen any visible forward progress. See also: http://tunes.org/Tunes-FAQ-6.html#ss6.2 -- Take a recipe. Leave a recipe. Python Cookbook! http://www.ActiveState.com/pythoncookbook From mark.favas at csiro.au Wed Jun 27 13:48:37 2001 From: mark.favas at csiro.au (Mark Favas) Date: Wed, 27 Jun 2001 19:48:37 +0800 Subject: [Python-Dev] More unicode blues... Message-ID: <3B39C815.E9CDF41B@csiro.au> unicodectype.c now fails to compile, because ch is declared const, and then assigned to. Tim has (apparently) had similar problems, but in his case the compiler just gives a warning, rather than an error.: cc: Error: Objects/unicodectype.c, line 67: In this statement, "ch" has const-qualified type, but occurs in a context that requires a modifiable lvalue. (neednonconst) ch += ctype->title; --------^ cc: Error: Objects/unicodectype.c, line 69: In this statement, "ch" has const-qualified type, but occurs in a context that requires a modifiable lvalue. (neednonconst) ch += ctype->upper; --------^ cc: Error: Objects/unicodectype.c, line 74: In this statement, "ch" has const-qualified type, but occurs in a context that requires a modifiable lvalue. (neednonconst) ch -= 0x10000; --------^ -- Mark Favas - m.favas at per.dem.csiro.au CSIRO, Private Bag No 5, Wembley, Western Australia 6913, AUSTRALIA From mal at lemburg.com Wed Jun 27 14:10:57 2001 From: mal at lemburg.com (M.-A. 
Lemburg) Date: Wed, 27 Jun 2001 14:10:57 +0200 Subject: [Python-Dev] Unicode Maintenance Message-ID: <3B39CD51.406C28F0@lemburg.com> Looking at the recent burst of checkins for the Unicode implementation completely bypassing the standard SF procedure and possible comments I might have on the different approaches, I guess I've been ruled out as maintainer and designer of the Unicode implementation. Well, I guess that's how things go. Was nice working for you guys, but no longer is... I'm tired of having to defend myself against meta-comments about the design, uncontrolled checkins and no true backup about my standing in all this from Guido. Perhaps I am misunderstanding the role of a maintainer and implementation designer, but as it is all respect for the work I've put into all this seems faded. That's the conclusion I draw from recent postings by Martin and Fredrik and their nightly "takeover". Thanks, -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From arigo at ulb.ac.be Wed Jun 27 14:18:43 2001 From: arigo at ulb.ac.be (Armin Rigo) Date: Wed, 27 Jun 2001 14:18:43 +0200 (CEST) Subject: [Python-Dev] Python Specializing Compiler In-Reply-To: <3B39C924.E865177D@ActiveState.com> Message-ID: Hello Paul, On Wed, 27 Jun 2001, Paul Prescod wrote: > If you are thinking about incorporating some ideas from Tunes that's one > thing. But if you want to use their code I would ask "what code?" I have > heard about Tunes for several years now and not seen any visible forward > progress. Yes, I know this. I am myself a (recent) member of the Tunes project, and have made Tunes' goals mine. Armin From guido at digicool.com Wed Jun 27 16:32:23 2001 From: guido at digicool.com (Guido van Rossum) Date: Wed, 27 Jun 2001 10:32:23 -0400 Subject: [Python-Dev] Python Specializing Compiler In-Reply-To: Your message of "Wed, 27 Jun 2001 11:01:18 +0200." References: Message-ID: <200106271432.f5REWOn19377@odiug.digicool.com> > Good remark. Anyone else has comments about this ? Not really, except to emphasize that inclusion of GPL'ed code in core Python is indeed a no-no. --Guido van Rossum (home page: http://www.python.org/~guido/) From fredrik at pythonware.com Wed Jun 27 16:48:02 2001 From: fredrik at pythonware.com (Fredrik Lundh) Date: Wed, 27 Jun 2001 16:48:02 +0200 Subject: [Python-Dev] New Unicode warnings References: Message-ID: <009e01c0ff19$e25b3f70$4ffa42d5@hagrid> tim peters wrote: > There are 3 functions now where the prototypes in unicodeobject.h don't > match the definitions in unicodeobject.c. Like, in .h, > > extern DL_IMPORT(Py_UNICODE) _PyUnicode_ToTitlecase( > register const Py_UNICODE ch /* Unicode character */ > ); > > but in .c: > > Py_UNICODE _PyUnicode_ToTitlecase(register Py_UNICODE ch) what's that "register" doing in a prototype? any reason we cannot just change the signature(s) to Py_UNICODE _PyUnicode_ToTitlecase(Py_UNICODE ch) to make it look more like contemporary C code? 
From fredrik at pythonware.com Wed Jun 27 16:49:31 2001 From: fredrik at pythonware.com (Fredrik Lundh) Date: Wed, 27 Jun 2001 16:49:31 +0200 Subject: [Python-Dev] Unicode fallout: Windows _winreg project broken References: <200106270712.f5R7CBh06458@mira.informatik.hu-berlin.de> Message-ID: <00a101c0ff19$e2a19740$4ffa42d5@hagrid> martin wrote: > > IOW, HAVE_USABLE_WCHAR_T no longer gets #define'd on Windows, and > > that prevents unicodeobject.c from supplying routines _winreg.c > > calls. > > The best thing, IMO, would be if PC/config.h defines everything > available in config.h also. In this case, the proper defines would be > > #define Py_USING_UNICODE > #define HAVE_USABLE_WCHAR_T > #define Py_UNICODE_SIZE 2 > #define PY_UNICODE_TYPE wchar_t > > If that approach is used, the defaulting in Include/unicodeobject.h > could go away. my fault; I missed the HAVE_USABLE_WCHAR_T define when I tried to fix tim's fix. I'll fix it. From guido at digicool.com Wed Jun 27 17:07:47 2001 From: guido at digicool.com (Guido van Rossum) Date: Wed, 27 Jun 2001 11:07:47 -0400 Subject: [Python-Dev] New Unicode warnings In-Reply-To: Your message of "Wed, 27 Jun 2001 16:48:02 +0200." <009e01c0ff19$e25b3f70$4ffa42d5@hagrid> References: <009e01c0ff19$e25b3f70$4ffa42d5@hagrid> Message-ID: <200106271507.f5RF7lq19494@odiug.digicool.com> > tim peters wrote: > > > There are 3 functions now where the prototypes in unicodeobject.h don't > > match the definitions in unicodeobject.c. Like, in .h, > > > > extern DL_IMPORT(Py_UNICODE) _PyUnicode_ToTitlecase( > > register const Py_UNICODE ch /* Unicode character */ > > ); > > > > but in .c: > > > > Py_UNICODE _PyUnicode_ToTitlecase(register Py_UNICODE ch) > > what's that "register" doing in a prototype? Enjoying a day off? > any reason we cannot just change the signature(s) to > > Py_UNICODE _PyUnicode_ToTitlecase(Py_UNICODE ch) > > to make it look more like contemporary C code? > > I cannot see how either register or const are going to make any difference in the prototype given that Py_UNICODE is a scalar type, so please just do it. --Guido van Rossum (home page: http://www.python.org/~guido/) From JamesL at Lugoj.Com Wed Jun 27 17:58:54 2001 From: JamesL at Lugoj.Com (James Logajan) Date: Wed, 27 Jun 2001 08:58:54 -0700 Subject: [Python-Dev] Re: PEP 260: simplify xrange() References: Message-ID: <3B3A02BE.21039365@Lugoj.Com> Tim Peters wrote: > > [James Logajan] > > Design mistakes one has made do tend to weigh on one's soul (speaking > > from more than two decades of programming experience) so I understand > > the primal urge to correct them when one can, and even when one > > shouldn't. > > Is this a case when one shouldn't? That is, is it a specific comment on PEP > 260, or just a general venting here? Just a general bit of silly "" venting. Insert some non-zero fraction in the wink. I tried to insert some obvious absurdities to indicate I was not being very serious. (Yes, I know that one shouldn't try that in mixed company.) From guido at digicool.com Wed Jun 27 18:11:49 2001 From: guido at digicool.com (Guido van Rossum) Date: Wed, 27 Jun 2001 12:11:49 -0400 Subject: [Python-Dev] Unicode Maintenance In-Reply-To: Your message of "Wed, 27 Jun 2001 14:10:57 +0200." 
<3B39CD51.406C28F0@lemburg.com> References: <3B39CD51.406C28F0@lemburg.com> Message-ID: <200106271611.f5RGBn819631@odiug.digicool.com> > Looking at the recent burst of checkins for the Unicode implementation > completely bypassing the standard SF procedure and possible comments > I might have on the different approaches, I guess I've been ruled out > as maintainer and designer of the Unicode implementation. > > Well, I guess that's how things go. Was nice working for you guys, > but no longer is... I'm tired of having to defend myself against > meta-comments about the design, uncontrolled checkins and no true > backup about my standing in all this from Guido. > > Perhaps I am misunderstanding the role of a maintainer and > implementation designer, but as it is all respect for the work I've > put into all this seems faded. That's the conclusion I draw from recent > postings by Martin and Fredrik and their nightly "takeover". > > Thanks, > -- > Marc-Andre Lemburg [For those of us to whom Marc-Andre's complaint comes as a total surprise: there was a thread on i18n-sig about whether we should support Unicode surrogates, followed by a conclusion to skip surrogates and jump directly to optional support for UCS-4, followed by some checkins that enabled a configuration choice between UCS-2 and UCS-4, and code to make it work. As a side effect, surrogate support in the UCS-2 version actually improved slightly.] Now, now, Marc-Andre. The only comments I recall from you on my "surrogates: just say no" post seemed favorable, except that you proposed to go all the way and make UCS-4 mandatory. I explained why I didn't want to go that far, and why I didn't believe your arguments against giving users a choice. I didn't hear back from you then, and I didn't think you could have much of a problem with my position. Our process requires the use of the SF patch manager only for controversial changes. Based on your feedback, I didn't think there was anything controversial about the changes that Fredrik and Martin have made! (If there was, IMO it was temporarily breaking the Windows build and the test suite -- but that's all fixed now.) I don't understand where you get the idea that we lost respect for your work! In fact, the fact that it was so easy to make the changes suggested to me that the original design was well suited to this particular change (as opposed to the surrogate support proposals, which all sounded like they would require a *lot* of changes). I don't think that we have very strict roles in this community anyway. (My role as BDFL excluded -- that's why I get to write this response. :-) I'd say that Fredrik owns SRE, because he has asserted that ownership at various times: he's undone changes by others that broke the 1.5.2 support, for example. But the Unicode support in Python isn't owned by one person: many folks have contributed to that, including Fredrik, who designed and wrote the original Unicode string object implementation. If you have specific comments about the changes made, please be specific. If you feel slighted by meta-comments, please also be specific. I don't think I've said anything derogatory about you or your design. Paul Prescod offered to write a PEP on this issue. My cynical half believes that we'll never hear from him again, but my optimistic half hopes that he'll actually write one, so that we'll be able to discuss the various issues for the users with the users. I encourage you to co-author the PEP, since you have a lot of background knowledge about the issues.
BTW, I think that Misc/unicode.txt should be converted to a PEP, for the historic record. It was very much a PEP before the PEP process was invented. Barry, how much work would this be? No editing needed, just formatting, and assignment of a PEP number (the lower the better). --Guido van Rossum (home page: http://www.python.org/~guido/) From barry at digicool.com Wed Jun 27 18:24:30 2001 From: barry at digicool.com (Barry A. Warsaw) Date: Wed, 27 Jun 2001 12:24:30 -0400 Subject: [I18n-sig] Re: [Python-Dev] Unicode Maintenance References: <3B39CD51.406C28F0@lemburg.com> <200106271611.f5RGBn819631@odiug.digicool.com> Message-ID: <15162.2238.720508.508081@anthem.wooz.org> >>>>> "GvR" == Guido van Rossum writes: GvR> BTW, I think that Misc/unicode.txt should be converted to a GvR> PEP, for the historic record. It was very much a PEP before GvR> the PEP process was invented. Barry, how much work would GvR> this be? No editing needed, just formatting, and assignment GvR> of a PEP number (the lower the better). Not much work at all, so I'll do this (and replace Misc/unicode.txt with a pointer to the PEP). Let's go with PEP 7, but stick it under the "Other Informational PEPs" category. -Barry From guido at digicool.com Wed Jun 27 18:36:05 2001 From: guido at digicool.com (Guido van Rossum) Date: Wed, 27 Jun 2001 12:36:05 -0400 Subject: [I18n-sig] Re: [Python-Dev] Unicode Maintenance In-Reply-To: Your message of "Wed, 27 Jun 2001 12:24:30 EDT." <15162.2238.720508.508081@anthem.wooz.org> References: <3B39CD51.406C28F0@lemburg.com> <200106271611.f5RGBn819631@odiug.digicool.com> <15162.2238.720508.508081@anthem.wooz.org> Message-ID: <200106271636.f5RGa5719660@odiug.digicool.com> > GvR> BTW, I think that Misc/unicode.txt should be converted to a > GvR> PEP, for the historic record. It was very much a PEP before > GvR> the PEP process was invented. Barry, how much work would > GvR> this be? No editing needed, just formatting, and assignment > GvR> of a PEP number (the lower the better). > > Not much work at all, so I'll do this (and replace Misc/unicode.txt > with a pointer to the PEP). Let's go with PEP 7, but stick it under > the "Other Informational PEPs" category. > > -Barry Rather than informational, how about "Standard Track - Accepted (or Final)" ? That really matches the history best. I'd propose PEP number 100 -- the below-100 series is more for meta-PEPs. --Guido van Rossum (home page: http://www.python.org/~guido/) From barry at digicool.com Wed Jun 27 19:05:35 2001 From: barry at digicool.com (Barry A. Warsaw) Date: Wed, 27 Jun 2001 13:05:35 -0400 Subject: [I18n-sig] Re: [Python-Dev] Unicode Maintenance References: <3B39CD51.406C28F0@lemburg.com> <200106271611.f5RGBn819631@odiug.digicool.com> <15162.2238.720508.508081@anthem.wooz.org> <200106271636.f5RGa5719660@odiug.digicool.com> Message-ID: <15162.4703.741647.850696@anthem.wooz.org> >>>>> "GvR" == Guido van Rossum writes: GvR> Rather than informational, how about "Standard Track - GvR> Accepted (or Final)" ? That really matches the history best. GvR> I'd propose PEP number 100 -- the below-100 series is more GvR> for meta-PEPs. Fine with me. -Barry From fdrake at acm.org Wed Jun 27 21:45:05 2001 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Wed, 27 Jun 2001 15:45:05 -0400 (EDT) Subject: [Python-Dev] New profiling interface Message-ID: <15162.14273.490573.156770@cj42289-a.reston1.va.home.com> The new core interface I checked in allows profilers and tracers (debuggers, coverage tools) to be written in C. 
I still need to write documentation for it; that shouldn't be too far off though. If anyone would like to have this available for Python 2.1.x, I have a version that I developed on the release20-maint branch. It can't be added to that branch since it's pretty clearly a new feature, but the patch is available at: http://starship.python.net/crew/fdrake/patches/py21-profiling.patch Enjoy! -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From mark.favas at csiro.au Wed Jun 27 23:45:17 2001 From: mark.favas at csiro.au (Mark Favas) Date: Thu, 28 Jun 2001 05:45:17 +0800 Subject: [Python-Dev] unicode, "const"s and lvalues Message-ID: <3B3A53ED.A8EEE265@csiro.au> Unreasonable as it may seem, my compiler really expects that entities declared as const's not be used in contexts where a modifiable lvalue is required. It gets all huffy, and refuses to continue compiling, even if I speak nicely (in unicode) to it. I'll file a bug report. On the code, not the compiler . cc -c -O -Olimit 1500 -Dss_family=__ss_family -Dss_len=__ss_len -I. -I./Include -DHAVE_CONFIG_H -o Objects/unicodectype.o Objects/unicodectype.c cc: Error: Objects/unicodectype.c, line 67: In this statement, "ch" has const-qualified type, but occurs in a context that requires a modifiable lvalue. (neednonconst) ch += ctype->title; --------^ cc: Error: Objects/unicodectype.c, line 69: In this statement, "ch" has const-qualified type, but occurs in a context that requires a modifiable lvalue. (neednonconst) ch += ctype->upper; --------^ cc: Error: Objects/unicodectype.c, line 74: In this statement, "ch" has const-qualified type, but occurs in a context that requires a modifiable lvalue. (neednonconst) ch -= 0x10000; --------^ cc: Error: Objects/unicodectype.c, line 362: In this statement, "ch" has const-qualified type, but occurs in a context that requires a modifiable lvalue. (neednonconst) ch += ctype->upper; ----^ cc: Error: Objects/unicodectype.c, line 366: In this statement, "ch" has const-qualified type, but occurs in a context that requires a modifiable lvalue. (neednonconst) ch -= 0x10000; --------^ cc: Error: Objects/unicodectype.c, line 378: In this statement, "ch" has const-qualified type, but occurs in a context that requires a modifiable lvalue. (neednonconst) ch += ctype->lower; ----^ cc: Error: Objects/unicodectype.c, line 382: In this statement, "ch" has const-qualified type, but occurs in a context that requires a modifiable lvalue. (neednonconst) ch -= 0x10000; --------^ make: *** [Objects/unicodectype.o] Error 1 -- Mark Favas - m.favas at per.dem.csiro.au CSIRO, Private Bag No 5, Wembley, Western Australia 6913, AUSTRALIA From guido at digicool.com Wed Jun 27 23:57:16 2001 From: guido at digicool.com (Guido van Rossum) Date: Wed, 27 Jun 2001 17:57:16 -0400 Subject: [Python-Dev] unicode, "const"s and lvalues In-Reply-To: Your message of "Thu, 28 Jun 2001 05:45:17 +0800." <3B3A53ED.A8EEE265@csiro.au> References: <3B3A53ED.A8EEE265@csiro.au> Message-ID: <200106272157.f5RLvGo20101@odiug.digicool.com> > Unreasonable as it may seem, my compiler really expects that entities > declared as const's not be used in contexts where a modifiable lvalue is > required. It gets all huffy, and refuses to continue compiling, even if > I speak nicely (in unicode) to it. I'll file a bug report. On the code, > not the compiler . VC++ also warns about this. I think the declaration of the Character Type APIs in unicodeobject.h really shouldn't include either register or const. Then their implementations should also lose the 'const'.
From tim.one at home.com Wed Jun 27 23:58:34 2001
From: tim.one at home.com (Tim Peters)
Date: Wed, 27 Jun 2001 17:58:34 -0400
Subject: [Python-Dev] unicode, "const"s and lvalues
In-Reply-To: <3B3A53ED.A8EEE265@csiro.au>
Message-ID: 

[Mark Favas]
> Unreasonable as it may seem, my compiler really expects that entities
> declared as const's not be used in contexts where a modifiable lvalue is
> required. It gets all huffy, and refuses to continue compiling, even if
> I speak nicely (in unicode) to it. I'll file a bug report.

No real need, this was already brought up about 13 hours ago, although maybe that was only on the i18n-sig. I was left with the vague impression that Fredrik intended to fix it. If it's not fixed by tomorrow, you can make me feel guilty enough to fix it (I first reported it, so I guess it's my problem).

could've-been-yours!-ly y'rs - tim

From fredrik at pythonware.com Thu Jun 28 00:42:14 2001
From: fredrik at pythonware.com (Fredrik Lundh)
Date: Thu, 28 Jun 2001 00:42:14 +0200
Subject: [Python-Dev] unicode, "const"s and lvalues
References: <3B3A53ED.A8EEE265@csiro.au>
Message-ID: <00b701c0ff5a$6ab8f660$4ffa42d5@hagrid>

mark wrote:
> Unreasonable as it may seem, my compiler really expects that entities
> declared as const's not be used in contexts where a modifiable lvalue is
> required.

it's fixed now, I think.

(btw, unreasonable as it may seem, your mail server refuses to accept mail sent to your reply address, even if I speak nicely to it ;-)

Cheers /F

From fdrake at acm.org Thu Jun 28 04:44:54 2001
From: fdrake at acm.org (Fred L. Drake, Jr.)
Date: Wed, 27 Jun 2001 22:44:54 -0400 (EDT)
Subject: [Python-Dev] NIS on Linux, others?
Message-ID: <15162.39462.75553.378146@cj42289-a.reston1.va.home.com>

Is anyone here using NIS (Sun's old "Yellow Pages" service)? There's a bug for this on Linux that's been assigned to me for some time, but I don't have access to a network using NIS. Can anyone either confirm the bug or the fix? Or at least confirm that the suggested fix doesn't break the nis module on some other platform? (Testing this on a Sun SPARC box would be really nice!)

I'd really appreciate some help on this one. The bug report is:

    http://sourceforge.net/tracker/index.php?func=detail&aid=233084&group_id=5470&atid=105470

Thanks!

-Fred

--
Fred L. Drake, Jr.
PythonLabs at Digital Creations

From thomas at xs4all.net Thu Jun 28 10:13:09 2001
From: thomas at xs4all.net (Thomas Wouters)
Date: Thu, 28 Jun 2001 10:13:09 +0200
Subject: [Python-Dev] NIS on Linux, others?
In-Reply-To: <15162.39462.75553.378146@cj42289-a.reston1.va.home.com>
References: <15162.39462.75553.378146@cj42289-a.reston1.va.home.com>
Message-ID: <20010628101309.X8098@xs4all.nl>

On Wed, Jun 27, 2001 at 10:44:54PM -0400, Fred L. Drake, Jr. wrote:

> Is anyone here using NIS (Sun's old "Yellow Pages" service)?
> There's a bug for this on Linux that's been assigned to me for some
> time, but I don't have access to a network using NIS. Can anyone
> either confirm the bug or the fix? Or at least confirm that the
> suggested fix doesn't break the nis module on some other platform?
> (Testing this on a Sun SPARC box would be really nice!)

> I'd really appreciate some help on this one. The bug report is:

If no one else pops up, I'll set up a small NIS network at home to test it when my new computer arrives (a week or two.)
We use NIS a lot at work, but not on Linux machines (the 16-bit uid limitation prevented us from using Linux for user-accessible machines for a long time.)

--
Thomas Wouters 

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!

From mal at lemburg.com Thu Jun 28 11:04:07 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Thu, 28 Jun 2001 11:04:07 +0200
Subject: [Python-Dev] Unicode Maintenance
References: <3B39CD51.406C28F0@lemburg.com> <200106271611.f5RGBn819631@odiug.digicool.com>
Message-ID: <3B3AF307.6496AFB4@lemburg.com>

Guido van Rossum wrote:
>
> > Looking at the recent burst of checkins for the Unicode implementation
> > completely bypassing the standard SF procedure and possible comments
> > I might have on the different approaches, I guess I've been ruled out
> > as maintainer and designer of the Unicode implementation.
> >
> > Well, I guess that's how things go. Was nice working for you guys,
> > but no longer is... I'm tired of having to defend myself against
> > meta-comments about the design, uncontrolled checkins and no true
> > backup about my standing in all this from Guido.
> >
> > Perhaps I am misunderstanding the role of a maintainer and
> > implementation designer, but as it is all respect for the work I've
> > put into all this seems faded. That's the conclusion I draw from recent
> > postings by Martin and Fredrik and their nightly "takeover".
> >
> > Thanks,
> > --
> > Marc-Andre Lemburg
>
> [For those of us to whom Marc-Andre's complaint comes as a total
> surprise: there was a thread on i18n-sig about whether we should
> support Unicode surrogates, followed by a conclusion to skip
> surrogates and jump directly to optional support for UCS-4, followed
> by some checkins that enabled a configuration choice between UCS-2 and
> UCS-4, and code to make it work.  As a side effect, surrogate support
> in the UCS-2 version actually improved slightly.]
>
> Now, now, Marc-Andre.
>
> The only comments I recall from you on my "surrogates: just say no"
> post seemed favorable, except that you proposed to go all the way and
> make UCS-4 mandatory.  I explained why I didn't want to go that far,
> and why I didn't believe your arguments against giving users a choice.
> I didn't hear back from you then, and I didn't think you could have
> much of a problem with my position.
>
> Our process requires the use of the SF patch manager only for
> controversial changes.  Based on your feedback, I didn't think there
> was anything controversial about the changes that Fredrik and Martin
> have made!  (If there was, IMO it was temporarily breaking the Windows
> build and the test suite -- but that's all fixed now.)
>
> I don't understand where you get the idea that we lost respect for
> your work!  In fact, the fact that it was so easy to make the changes
> suggested to me that the original design was well suited to this
> particular change (as opposed to the surrogate support proposals,
> which all sounded like they would require a *lot* of changes).
>
> I don't think that we have very strict roles in this community anyway.
> (My role as BDFL excluded -- that's why I get to write this
> response. :-)  I'd say that Fredrik owns SRE, because he has asserted
> that ownership at various times: he's undone changes by others that
> broke the 1.5.2 support, for example.
>
> But the Unicode support in Python isn't owned by one person: many
> folks have contributed to that, including Fredrik, who designed and
> wrote the original Unicode string object implementation.
>
> If you have specific comments about the changes made, please be
> specific.  If you feel slighted by meta-comments, please also be
> specific.  I don't think I've said anything derogatory about you or
> your design.

You didn't get my point. I feel responsible for the Unicode implementation design and would like to see it become a continued success.

In that sense and taking into account that I am the maintainer of all this stuff, I think it is very reasonable to ask me before making any significant changes to the implementation and also respect any comments I put forward.

Currently, I have to watch the checkins list very closely to find out who changed what in the implementation and then to take actions only after the fact. Since I'm not supporting Unicode as my full-time job this is simply impossible. We have the SF manager and there is really no need to rush anything around here.

If I am offline or too busy with other things for a day or two, then I want to see patches on SF and not find new versions of the implementation already checked in.

This has worked just fine during the last year, so I can only explain the latest actions in this direction with an urge to bypass my comments and any discussion this might cause. Needless to say that quality control is not possible anymore.

Conclusion: I am not going to continue this work if this does not change.

Another problem for me is the continued hostility I feel on i18n against parts of the design and some of my decisions. I am not talking about your feedback and the feedback from many other people on the list which was excellent and to high standards. But reading the postings of the last few months you will find notices of what I am referring to here (no, I don't want to be specific).

If people don't respect my comments or decision, then how can I defend the design and how can I stop endless discussions which simply don't lead anywhere ? So either I am missing something or there is a need for a clear statement from you about my status in all this.

If I don't have the right to comment on proposals and patches, possibly even rejecting them, then I simply don't see any ground for keeping the implementation in a state which I can maintain.

And last but not least: The fun-factor has faded which was the main motor driving me into working on Unicode in the first place. Nothing much you can do about this, though :-/

> Paul Prescod offered to write a PEP on this issue.  My cynical half
> believes that we'll never hear from him again, but my optimistic half
> hopes that he'll actually write one, so that we'll be able to discuss
> the various issues for the users with the users.  I encourage you to
> co-author the PEP, since you have a lot of background knowledge about
> the issues.

I guess your optimistic half won :-) I think Paul already did all the work, so I'll simply comment on what he wrote.

> BTW, I think that Misc/unicode.txt should be converted to a PEP, for
> the historic record.  It was very much a PEP before the PEP process
> was invented.  Barry, how much work would this be?  No editing needed,
> just formatting, and assignment of a PEP number (the lower the better).

Thanks for converting the text to PEP format, Barry.
Thanks for reading this far,
--
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/

From guido at digicool.com Thu Jun 28 14:25:14 2001
From: guido at digicool.com (Guido van Rossum)
Date: Thu, 28 Jun 2001 08:25:14 -0400
Subject: [Python-Dev] Unicode Maintenance
In-Reply-To: Your message of "Thu, 28 Jun 2001 11:04:07 +0200." <3B3AF307.6496AFB4@lemburg.com>
References: <3B39CD51.406C28F0@lemburg.com> <200106271611.f5RGBn819631@odiug.digicool.com> <3B3AF307.6496AFB4@lemburg.com>
Message-ID: <200106281225.f5SCPIr20874@odiug.digicool.com>

Hi Marc-Andre,

I'm dropping the i18n-sig from the distribution list.

I hear you:

> You didn't get my point. I feel responsible for the Unicode
> implementation design and would like to see it become a continued
> success.

I'm sure we all share this goal!

> In that sense and taking into account that I am the
> maintainer of all this stuff, I think it is very reasonable to
> ask me before making any significant changes to the implementation
> and also respect any comments I put forward.

I understand you feel that we've rushed this in without waiting for your comments. Given how close your implementation was, I still feel that the changes weren't that significant, but I understand that you get nervous. If Christian were to check in his speed hack changes to the guts of ceval.c I would be nervous too! (Heck, I got nervous when Eric checked in his library-wide string method changes without asking.) Next time I'll try to be more sensitive to situations that require your review before going forward.

> Currently, I have to watch the checkins list very closely
> to find out who changed what in the implementation and then to
> take actions only after the fact. Since I'm not supporting Unicode
> as my full-time job this is simply impossible. We have the SF manager
> and there is really no need to rush anything around here.

Hm, apart from the fact that you ought to be left in charge, I think that in this case the live checkins were a big win over the usual SF process. At least two people were making changes, sometimes to each other's code, and many others on at least three continents were checking out the changes on many different platforms and immediately reporting problems. We would definitely not have a patch as solid as the code that's now checked in, after two days of using SF! (We could've used a branch, but I've found that getting people to actually check out the branch is not easy.)

So I think that the net result was favorable. Sometimes you just have to let people work in the spur of the moment to get the results of their best thinking, otherwise they lose interest or their train of thought.

> If I am offline or too busy with other things for a day or two,
> then I want to see patches on SF and not find new versions of
> the implementation already checked in.

That's still the general rule, but in our enthusiasm (and mine was definitely part of this!) we didn't want to wait. Also, I have to admit that I mistook your silence for consent -- I didn't think the main proposed changes (making the size of Py_UNICODE a config choice) were controversial at all, so I didn't realize you would have a problem with it.

> This has worked just fine during the last year, so I can only explain
> the latest actions in this direction with an urge to bypass my comments
> and any discussion this might cause.
I think you're projecting your own stuff here. I honestly didn't think there was much disagreement on your part and thought we were doing you a favor by implementing the consensus. IMO, Martin and Fredrik are familiar enough with both the code and the issues to do a good job.

> Needless to say that
> quality control is not possible anymore.

Unclear. Lots of other people looked over the changes in your absence. And CVS makes code review after it's checked in easy enough. (Hey, in many other open source projects that's the normal procedure once the rough characteristics of a feature have been agreed upon: check in first and review later!)

> Conclusion:
> I am not going to continue this work if this does not change.

That would be sad, and I hope you will stay with us. We certainly don't plan to ignore your comments!

> Another problem for me is the continued hostility I feel on i18n
> against parts of the design and some of my decisions. I am
> not talking about your feedback and the feedback from many other
> people on the list which was excellent and to high standards.
> But reading the postings of the last few months you will
> find notices of what I am referring to here (no, I don't want
> to be specific).

I don't know what to say about this, and obviously nobody has the time to go back and read the archives. I'm sure it's not you as a person that was attacked. If the design isn't perfect -- and hey, since Python is the 80 percent language, few things in it are quite perfect! -- then (positive) criticism is an attempt to help, to move it closer to perfection.

If people have at times said "the Unicode support sucks", well, that may hurt. You can't always stay friends with everybody. I get flames occasionally for features in Python that folks don't like. I get used to them, and it doesn't affect my confidence any more. Be the same!

But sometimes, after saying "it sucks", people make specific suggestions for improvements, and it's important to be open for those even from sources that use offending language. (Within reason, of course. I don't ask you to listen to somebody who is persistently hostile to you as a person.)

> If people don't respect my comments or decision, then how can
> I defend the design and how can I stop endless discussions which
> simply don't lead anywhere ? So either I am missing something
> or there is a need for a clear statement from you about
> my status in all this.

Do you really *want* to be the Unicode BDFL? Being something's BDFL is a full-time job, and you've indicated you're too busy. (Or is that temporary?)

I see you as the original coder, which means that you know that section of the code better than anyone, and whenever there's a question that others can't answer about its design, implementation, or restrictions, I refer to you.

But given that you've said you wouldn't be able to work much on it, I welcome contributions by others as long as they seem knowledgeable.

> If I don't have the right to comment on proposals and patches,
> possibly even rejecting them, then I simply don't see any
> ground for keeping the implementation in a state which I can
> maintain.

Nobody said you couldn't comment, and you know that. When it comes to rejecting or accepting, I feel that I am still the final arbiter, even for Unicode, until I get hit by a bus.
Since I don't always understand the implementation or the issues, I'll of course defer to you in cases where I think I can't make the decision, but I do reserve the right to be convinced by others to override your judgement, occasionally, if there's a good reason. And when you're not responsive, I may try to channel you. (I'll try to be more explicit about that.)

> And last but not least: The fun-factor has faded which was
> the main motor driving me into working on Unicode in the first
> place. Nothing much you can do about this, though :-/

Yes, that happens to all of us at times. The fun factor goes up and down, and sometimes we must look for fun elsewhere for a while. Then the fun may come back where it appeared lost. Go on vacation, read a book, tackle a new project in a totally different area! Then come back and see if you can find some fun in the old stuff again.

> > Paul Prescod offered to write a PEP on this issue.  My cynical half
> > believes that we'll never hear from him again, but my optimistic half
> > hopes that he'll actually write one, so that we'll be able to discuss
> > the various issues for the users with the users.  I encourage you to
> > co-author the PEP, since you have a lot of background knowledge about
> > the issues.
>
> I guess your optimistic half won :-) I think Paul already did all the
> work, so I'll simply comment on what he wrote.

Your suggestions were very valuable. My opinion of Paul also went up a notch!

> > BTW, I think that Misc/unicode.txt should be converted to a PEP, for
> > the historic record.  It was very much a PEP before the PEP process
> > was invented.  Barry, how much work would this be?  No editing needed,
> > just formatting, and assignment of a PEP number (the lower the better).
>
> Thanks for converting the text to PEP format, Barry.
>
> Thanks for reading this far,

You're welcome, and likewise. Just one more thing, Marc-Andre. Please know that I respect your work very much even if we don't always agree. We would get by without you, but Python would be hurt if you turned your back on us.

--Guido van Rossum (home page: http://www.python.org/~guido/)

From arigo at ulb.ac.be Thu Jun 28 15:04:06 2001
From: arigo at ulb.ac.be (Armin Rigo)
Date: Thu, 28 Jun 2001 15:04:06 +0200 (CEST)
Subject: [Python-Dev] Python Specializing Compiler
In-Reply-To: <3B393E92.B0719A7A@ulb.ac.be>
Message-ID: 

On Tue, 26 Jun 2001, Armin Rigo wrote:
> I am considering using GNU Lightning to produce code from the Psyco
> compiler.

I just found "vcode" (http://www.pdos.lcs.mit.edu/~engler/pldi96-abstract.html), which seems very interesting for portable JIT code generation. I am considering using it for Psyco.

Does anyone have experience with vcode? Or any other comments?

Armin.

From gball at cfa.harvard.edu Thu Jun 28 17:26:36 2001
From: gball at cfa.harvard.edu (Greg Ball)
Date: Thu, 28 Jun 2001 11:26:36 -0400 (EDT)
Subject: [Python-Dev] NIS on Linux, others?
Message-ID: 

Short version: I can confirm that bug under linux, but the patch breaks nis module on solaris.

Linux machine is:
Linux malhar 2.2.16-3smp #1 SMP Mon Jun 19 17:37:04 EDT 2000 i686 unknown
with python version from recent CVS. I see the reported bug and the suggested patch does fix the problem.

Sparc box looks like this:
SunOS cfa0 5.8 Generic_108528-06 sun4u sparc SUNW,Ultra-Enterprise
using python2.0 source tree. The nis module works out of the box, but applying the suggested patch breaks it: 'nis.error: No such key in map'.
--Greg Ball

From gregor at hoffleit.de Thu Jun 28 21:56:35 2001
From: gregor at hoffleit.de (Gregor Hoffleit)
Date: Thu, 28 Jun 2001 21:56:35 +0200
Subject: [Python-Dev] MAGIC after 2001 ?
Message-ID: <20010628215635.A5621@53b.hoffleit.de>

Correct me, but AFAICS there are only 186 days left until Python's MAGIC scheme overflows:

    /* XXX Perhaps the magic number should be frozen and a version field
       added to the .pyc file header? */
    /* New way to come up with the magic number: (YEAR-1995), MONTH, DAY */
    #define MAGIC (60202 | ((long)'\r'<<16) | ((long)'\n'<<24))

I couldn't find this problem in the SF bug tracking system. Should I submit a new bug entry ?

Gregor

From jack at oratrix.nl Thu Jun 28 23:03:47 2001
From: jack at oratrix.nl (Jack Jansen)
Date: Thu, 28 Jun 2001 23:03:47 +0200
Subject: [Python-Dev] Passing silly values to time.strftime
Message-ID: <20010628210352.33157120260@oratrix.oratrix.nl>

Just noted (that's Just-the-person, not me-just-noting:-) that on the Mac time.strftime() can blow up with an access violation if you pass silly values to it (such as 9 zeroes).

Does anyone know enough of the ANSI standard to tell me how strftime should behave with out-of-range values? I.e. should I report this as a bug to MetroWerks or should we rig up time.strftime() to check that all the values are in range?

--
Jack Jansen             | ++++ stop the execution of Mumia Abu-Jamal ++++
Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++
www.oratrix.nl/~jack    | ++++ see http://www.xs4all.nl/~tank/ ++++

From jack at oratrix.nl Thu Jun 28 23:12:45 2001
From: jack at oratrix.nl (Jack Jansen)
Date: Thu, 28 Jun 2001 23:12:45 +0200
Subject: [Python-Dev] Passing silly values to time.strftime
In-Reply-To: Message by Jack Jansen , Thu, 28 Jun 2001 23:03:47 +0200 , <20010628210352.33157120260@oratrix.oratrix.nl>
Message-ID: <20010628211250.4A6BC120260@oratrix.oratrix.nl>

Recently, Jack Jansen said:
> Just noted (that's Just-the-person, not me-just-noting:-) that on the
> Mac time.strftime() can blow up with an access violation if you pass
> silly values to it (such as 9 zeroes).

Following up to myself, after I just noticed (just-me-noticing, not Just-the-person this time) that all zeros is a legal C value: gettmarg() converts this all-zeroes tuple to

    (0, 0, 0, 0, -1, 100, 0, -1, 0)

Fine with me, apparently Python wants to have human-understandable (1-based) month numbers and yearday numbers, but then I think it really should also check that the values are in-range. What do others think?

--
Jack Jansen             | ++++ stop the execution of Mumia Abu-Jamal ++++
Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++
www.oratrix.nl/~jack    | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm
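[For reference, ANSI C makes no promises about strftime() when struct tm members are outside their normal ranges, so checking before the call is the portable option. A sketch of such a check -- a hypothetical helper, with bounds taken from the conventional struct tm field ranges:]

#include <time.h>

/* Return 1 if every field of *p is within its conventional range,
   0 otherwise.  Call this before handing the struct to strftime(). */
static int
tm_in_range(const struct tm *p)
{
    return p->tm_mon  >= 0 && p->tm_mon  <= 11
        && p->tm_mday >= 1 && p->tm_mday <= 31
        && p->tm_hour >= 0 && p->tm_hour <= 23
        && p->tm_min  >= 0 && p->tm_min  <= 59
        && p->tm_sec  >= 0 && p->tm_sec  <= 61   /* 61 allows leap seconds */
        && p->tm_wday >= 0 && p->tm_wday <= 6
        && p->tm_yday >= 0 && p->tm_yday <= 365;
}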
From Jason.Tishler at dothill.com Thu Jun 28 23:17:15 2001
From: Jason.Tishler at dothill.com (Jason Tishler)
Date: Thu, 28 Jun 2001 17:17:15 -0400
Subject: [Python-Dev] Threaded Cygwin Python Import Problem
Message-ID: <20010628171715.P488@dothill.com>

Thanks to Rob Collins (implementer) and Greg Smith (profiler), Cygwin now provides enough pthreads support so that Cygwin Python builds OOTB *and* functions reasonably well even with threads enabled. Unfortunately, there are still a few issues that need to be resolved.

The one that I would like to address in this posting prevents a threaded Cygwin Python from building the standard extension modules (without some kind of intervention). :,( Specifically, the build would frequently hang during the Distutils part when Cygwin Python is attempting to execvp a gcc process.

See the first attachment, test.py, for a minimal Python script that exhibits the hang. See the second attachment, test.c, for a rewrite of test.py in C. Since test.c did not hang, I was able to conclude that this was not just a straight Cygwin problem.

Further tracing uncovered that the hang occurs in _execvpe() (in os.py), when the child tries to import tempfile. If I apply the third attachment, os.py.patch, then the hang is avoided. Hence, it appears that importing a module (or specifically the tempfile module) in a threaded Cygwin Python child causes a hang.

I saw the following comment in _execvpe():

    # Process handling (fork, wait) under BeOS (up to 5.0)
    # doesn't interoperate reliably with the thread interlocking
    # that happens during an import.  The actual error we need
    # is the same on BeOS for posix.open() et al., ENOENT.

The above makes me think that possibly Cygwin is having a similar problem.

Can anyone offer suggestions on how to further debug this problem?

Thanks,
Jason

--
Jason Tishler
Director, Software Engineering       Phone: 732.264.8770 x235
Dot Hill Systems Corp.               Fax:   732.264.8798
82 Bethany Road, Suite 7             Email: Jason.Tishler at dothill.com
Hazlet, NJ 07730 USA                 WWW:   http://www.dothill.com

-------------- next part --------------
# Minimal reproduction: fork, then exec a trivial command in the child.
import os

cmd = ['ls', '-l']
pid = os.fork()
if pid == 0:
    print 'child execvp-ing'
    os.execvp(cmd[0], cmd)
else:
    (pid, status) = os.waitpid(pid, 0)
    print 'status =', status
    print 'parent done'
-------------- next part --------------
#include <stdio.h>
#include <unistd.h>
#include <sys/wait.h>

char* const cmd[] = {"ls", "-l", 0};

/* Same fork/exec/wait sequence as test.py, but in plain C. */
int main()
{
    int status;
    pid_t pid = fork();
    if (pid == 0) {
        printf("child execvp-ing\n");
        execvp(cmd[0], cmd);
    }
    else {
        waitpid(pid, &status, 0);
        printf("status = %d\n", status);
        printf("parent done\n");
    }
    return 0;
}
-------------- next part --------------
--- os.py.orig	Thu Jun 28 16:14:28 2001
+++ os.py	Thu Jun 28 16:30:12 2001
@@ -329,8 +329,9 @@ def _execvpe(file, args, env=None):
     try: unlink('/_#.# ## #.#')
     except error, _notfound: pass
     else:
-        import tempfile
-        t = tempfile.mktemp()
+        #import tempfile
+        #t = tempfile.mktemp()
+        t = '/mnt/c/TEMP/@279.3'
         # Exec a file that is guaranteed not to exist
         try: execv(t, ('blah',))
         except error, _notfound: pass

From tim at digicool.com Thu Jun 28 23:24:17 2001
From: tim at digicool.com (Tim Peters)
Date: Thu, 28 Jun 2001 17:24:17 -0400
Subject: [Python-Dev] MAGIC after 2001 ?
In-Reply-To: <20010628215635.A5621@53b.hoffleit.de>
Message-ID: 

[Gregor Hoffleit]
> Correct me,

Can't: you're correct.

> but AFAICS there are only 186 days left until Python's MAGIC scheme
> overflows:
>
>     /* XXX Perhaps the magic number should be frozen and a version field
>        added to the .pyc file header? */
>     /* New way to come up with the magic number: (YEAR-1995), MONTH, DAY */
>     #define MAGIC (60202 | ((long)'\r'<<16) | ((long)'\n'<<24))
>
> I couldn't find this problem in the SF bug tracking system. Should I
> submit a new bug entry ?

Somebody should! It's a known problem, but the last crusade to redefine it ended up with 85% of a spec but no worker bees. If that continues, note that it has no effect on whether existing Python releases will continue to run, it just means we can't release new versions -- but now that the licensing issue is settled, I think we'll just close down the project instead.

fun-while-it-lasted-ly y'rs - tim
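[The arithmetic behind the 186 days, reading the comment's scheme as decimal packing of (YEAR-1995, MONTH, DAY) into the low 16 bits of the word:]

#include <stdio.h>

int main(void)
{
    long feb_2001 = (2001 - 1995) * 10000L + 2 * 100 + 2;  /* 60202, today's MAGIC */
    long jan_2002 = (2002 - 1995) * 10000L + 1 * 100 + 1;  /* 70101 */

    /* 70101 > 0xffff (65535), so from 1 Jan 2002 the date part would
       spill into the '\r' byte packed above it. */
    printf("%ld fits in 16 bits? %s\n", feb_2001, feb_2001 <= 0xffffL ? "yes" : "no");
    printf("%ld fits in 16 bits? %s\n", jan_2002, jan_2002 <= 0xffffL ? "yes" : "no");
    return 0;
}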
From paulp at ActiveState.com Fri Jun 29 04:59:45 2001
From: paulp at ActiveState.com (Paul Prescod)
Date: Thu, 28 Jun 2001 19:59:45 -0700
Subject: [Python-Dev] [Fwd: PEP: Support for "wide" Unicode characters]
Message-ID: <3B3BEF21.63411C4C@ActiveState.com>

Slow python-dev day...consider this exciting new proposal to allow dealing with important new characters like the Japanese dentistry symbols and ecological symbols (but not Klingon)

-------- Original Message --------
Subject: PEP: Support for "wide" Unicode characters
Date: Thu, 28 Jun 2001 15:33:00 -0700
From: Paul Prescod
Organization: ActiveState
To: "python-list at python.org"

PEP: 261
Title: Support for "wide" Unicode characters
Version: $Revision: 1.3 $
Author: paulp at activestate.com (Paul Prescod)
Status: Draft
Type: Standards Track
Created: 27-Jun-2001
Python-Version: 2.2
Post-History: 27-Jun-2001, 28-Jun-2001

Abstract

    Python 2.1 unicode characters can have ordinals only up to
    2**16 - 1.  These characters are known as Basic Multilingual Plane
    characters.  There are now characters in Unicode that live on other
    "planes".  The largest addressable character in Unicode has the
    ordinal 17 * 2**16 - 1 (0x10ffff).  For readability, we will call
    this TOPCHAR and call characters in this range "wide characters".

Glossary

    Character
        Used by itself, means the addressable units of a Python
        Unicode string.

    Code point
        If you imagine Unicode as a mapping from integers to
        characters, each integer represents a code point.  Some are
        really used for characters.  Some will someday be used for
        characters.  Some are guaranteed never to be used for
        characters.

    Unicode character
        A code point defined in the Unicode standard whether it is
        already assigned or not.  Identified by an integer.

    Code unit
        An integer representing a character in some encoding.

    Surrogate pair
        Two code units that represent a single Unicode character.

Proposed Solution

    One solution would be to merely increase the maximum ordinal to a
    larger value.  Unfortunately the only straightforward
    implementation of this idea is to increase the character code unit
    to 4 bytes.  This has the effect of doubling the size of most
    Unicode strings.  In order to avoid imposing this cost on every
    user, Python 2.2 will allow 4-byte Unicode characters as a
    build-time option.  Users can choose whether they care about
    wide characters or prefer to preserve memory.

    The 4-byte option is called "wide Py_UNICODE".  The 2-byte option
    is called "narrow Py_UNICODE".

    Most things will behave identically in the wide and narrow worlds.

    * unichr(i) for 0 <= i < 2**16 (0x10000) always returns a
      length-one string.

    * unichr(i) for 2**16 <= i <= TOPCHAR will return a
      length-one string representing the character on wide Python
      builds.  On narrow builds it will raise ValueError.

      ISSUE: Python currently allows \U literals that cannot be
             represented as a single character.  It generates two
             characters known as a "surrogate pair".  Should this be
             disallowed on future narrow Python builds?

      ISSUE: Should Python allow the construction of characters
             that do not correspond to Unicode characters?
             Unassigned Unicode characters should obviously be legal
             (because they could be assigned at any time).  But
             code points above TOPCHAR are guaranteed never to
             be used by Unicode.  Should we allow access to them
             anyhow?

    * ord() is always the inverse of unichr()

    * There is an integer value in the sys module that describes the
      largest ordinal for a Unicode character on the current
      interpreter.  sys.maxunicode is 2**16-1 (0xffff) on narrow builds
      of Python and TOPCHAR on wide builds.

      ISSUE: Should there be distinct constants for accessing
             TOPCHAR and the real upper bound for the domain of
             unichr (if they differ)?  There has also been a
             suggestion of sys.unicodewidth which can take the
             values 'wide' and 'narrow'.

    * codecs will be upgraded to support "wide characters"
      (represented directly in UCS-4, as surrogate pairs in UTF-16 and
      as multi-byte sequences in UTF-8).  On narrow Python builds, the
      codecs will generate surrogate pairs, on wide Python builds they
      will generate a single character.  This is the main part of the
      implementation left to be done.

    * there are no restrictions on constructing strings that use
      code points "reserved for surrogates" improperly.  These are
      called "isolated surrogates".  The codecs should disallow reading
      these but you could construct them using string literals or
      unichr().  unichr() is not restricted to values less than either
      TOPCHAR or sys.maxunicode.

Implementation

    There is a new (experimental) define:

        #define PY_UNICODE_SIZE 2

    There are new configure options:

        --enable-unicode=ucs2 configures a narrow Py_UNICODE, and uses
                              wchar_t if it fits
        --enable-unicode=ucs4 configures a wide Py_UNICODE, and uses
                              wchar_t if it fits
        --enable-unicode     same as "=ucs2"

    The intention is that --disable-unicode, or --enable-unicode=no
    removes the Unicode type altogether; this is not yet implemented.

Notes

    This PEP does NOT imply that people using Unicode need to use a
    4-byte encoding.  It only allows them to do so.  For example,
    ASCII is still a legitimate (7-bit) Unicode-encoding.

Rationale for Surrogate Creation Behaviour

    Python currently supports the construction of a surrogate pair
    for a large unicode literal character escape sequence.  This is
    basically designed as a simple way to construct "wide characters"
    even in a narrow Python build.

    ISSUE: surrogates can be created this way but the user still
           needs to be careful about slicing, indexing, printing
           etc.  Another option is to remove knowledge of
           surrogates from everything other than the codecs.

Rejected Suggestions

    There were two primary solutions that were rejected.  The first was
    more or less the status-quo.  We could officially say that Python
    characters represent UTF-16 code units and require programmers to
    implement wide characters in their application logic.  This is a
    heavy burden because emulating 32-bit characters is likely to be
    very inefficient if it is coded entirely in Python.  Plus these
    abstracted pseudo-strings would not be legal as input to the
    regular expression engine.

    The other class of solution is to use some efficient storage
    internally but present an abstraction of wide characters
    to the programmer.  Any of these would require a much more complex
    implementation than the accepted solution.  For instance consider
    the impact on the regular expression engine.  In theory, we could
    move to this implementation in the future without breaking Python
    code.  A future Python could "emulate" wide Python semantics on
    narrow Python.

Copyright

    This document has been placed in the public domain.

Local Variables:
mode: indented-text
indent-tabs-mode: nil
End:

--
http://mail.python.org/mailman/listinfo/python-list
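[For concreteness, the standard UTF-16 arithmetic that the codecs described in the PEP above would apply on a narrow build to represent a code point above 0xffff as a surrogate pair; the type names here are illustrative stand-ins, not actual CPython declarations:]

typedef unsigned long  Py_UCS4;    /* illustrative stand-ins */
typedef unsigned short Py_UCS2;

/* Split a code point in [0x10000, 0x10FFFF] (TOPCHAR) into its
   UTF-16 high and low surrogate code units. */
static void
make_surrogate_pair(Py_UCS4 cp, Py_UCS2 *hi, Py_UCS2 *lo)
{
    cp -= 0x10000;
    *hi = (Py_UCS2)(0xD800 + (cp >> 10));     /* high (leading) surrogate */
    *lo = (Py_UCS2)(0xDC00 + (cp & 0x3FF));   /* low (trailing) surrogate */
}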
From fdrake at acm.org Fri Jun 29 16:03:28 2001
From: fdrake at acm.org (Fred L. Drake, Jr.)
Date: Fri, 29 Jun 2001 10:03:28 -0400 (EDT)
Subject: [Python-Dev] NIS on Linux, others?
In-Reply-To: References: Message-ID: <15164.35504.255476.169687@cj42289-a.reston1.va.home.com> Greg Ball writes: > Short version: I can confirm that bug under linux, but the patch breaks > nis module on solaris. I'm presuming that these were using the same NIS server? I'm wondering if this may be an endianess-related problem. I don't understand enough about the NIS protocols to know what's going on in that module. -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From mal at egenix.com Fri Jun 29 16:51:04 2001 From: mal at egenix.com (M.-A. Lemburg) Date: Fri, 29 Jun 2001 16:51:04 +0200 Subject: [Python-Dev] Support for "wide" Unicode characters References: <3B3BEF21.63411C4C@ActiveState.com> Message-ID: <3B3C95D8.518E5175@egenix.com> Paul Prescod wrote: > > Slow python-dev day...consider this exiting new proposal to allow deal > with important new characters like the Japanese dentristy symbols and > ecological symbols (but not Klingon) More comments... > -------- Original Message -------- > Subject: PEP: Support for "wide" Unicode characters > Date: Thu, 28 Jun 2001 15:33:00 -0700 > From: Paul Prescod > Organization: ActiveState > To: "python-list at python.org" > > PEP: 261 > Title: Support for "wide" Unicode characters > Version: $Revision: 1.3 $ > Author: paulp at activestate.com (Paul Prescod) > Status: Draft > Type: Standards Track > Created: 27-Jun-2001 > Python-Version: 2.2 > Post-History: 27-Jun-2001, 28-Jun-2001 > > Abstract > > Python 2.1 unicode characters can have ordinals only up to 2**16-1. > These characters are known as Basic Multilinual Plane characters. > There are now characters in Unicode that live on other "planes". > The largest addressable character in Unicode has the ordinal 17 * > 2**16 - 1 (0x10ffff). For readability, we will call this TOPCHAR > and call characters in this range "wide characters". > > Glossary > > Character > > Used by itself, means the addressable units of a Python > Unicode string. > > Code point > > If you imagine Unicode as a mapping from integers to > characters, each integer represents a code point. Some are > really used for characters. Some will someday be used for > characters. Some are guaranteed never to be used for > characters. > > Unicode character > > A code point defined in the Unicode standard whether it is > already assigned or not. Identified by an integer. You're mixing terms here: being a character in Unicode is a property which is defined by the Unicode specs; not all code points are characters ! I'd suggest not to use the term character in this PEP at all; this is also what Mark Davis recommends in his paper on Unicode. That way people reading the PEP won't even start to confuse things since they will most likely have to read this glossary to understand what code point and code units are. Also, a link to the Unicode glossary would be a good thing. > Code unit > > An integer representing a character in some encoding. A code unit is the basic storage unit used by Unicode strings, e.g. u[0], not necessarily a character. > Surrogate pair > > Two code units that represnt a single Unicode character. Please add Unicode string A sequence of code units. and a note that on wide builds: code unit == code point. > Proposed Solution > > One solution would be to merely increase the maximum ordinal to a > larger value. Unfortunately the only straightforward > implementation of this idea is to increase the character code unit > to 4 bytes. This has the effect of doubling the size of most > Unicode strings. 
In order to avoid imposing this cost on every > user, Python 2.2 will allow 4-byte Unicode characters as a > build-time option. Users can choose whether they care about > wide characters or prefer to preserve memory. > > The 4-byte option is called "wide Py_UNICODE". The 2-byte option > is called "narrow Py_UNICODE". > > Most things will behave identically in the wide and narrow worlds. > > * unichr(i) for 0 <= i < 2**16 (0x10000) always returns a > length-one string. > > * unichr(i) for 2**16 <= i <= TOPCHAR will return a > length-one string representing the character on wide Python > builds. On narrow builds it will return ValueError. > > ISSUE: Python currently allows \U literals that cannot be > represented as a single character. It generates two > characters known as a "surrogate pair". Should this be > disallowed on future narrow Python builds? Why not make the codec used by Python to convert Unicode literals to Unicode strings an option just like the default encoding ? That way we could have a version of the unicode-escape codec which supports surrogates and one which doesn't. > ISSUE: Should Python allow the construction of characters > that do not correspond to Unicode characters? > Unassigned Unicode characters should obviously be legal > (because they could be assigned at any time). But > code points above TOPCHAR are guaranteed never to > be used by Unicode. Should we allow access to them > anyhow? I wouldn't count on that last point ;-) Please note that you are mixing terms: you don't construct characters, you construct code points. Whether the concatenation of these code points makes a valid Unicode character string is an issue which applications and codecs have to decide. > * ord() is always the inverse of unichr() > > * There is an integer value in the sys module that describes the > largest ordinal for a Unicode character on the current > interpreter. sys.maxunicode is 2**16-1 (0xffff) on narrow builds > of Python and TOPCHAR on wide builds. > > ISSUE: Should there be distinct constants for accessing > TOPCHAR and the real upper bound for the domain of > unichr (if they differ)? There has also been a > suggestion of sys.unicodewith which can take the > values 'wide' and 'narrow'. > > * codecs will be upgraded to support "wide characters" > (represented directly in UCS-4, as surrogate pairs in UTF-16 and > as multi-byte sequences in UTF-8). On narrow Python builds, the > codecs will generate surrogate pairs, on wide Python builds they > will generate a single character. This is the main part of the > implementation left to be done. > > * there are no restrictions on constructing strings that use > code points "reserved for surrogates" improperly. These are > called "isolated surrogates". The codecs should disallow reading > these but you could construct them using string literals or > unichr(). unichr() is not restricted to values less than either > TOPCHAR nor sys.maxunicode. > > Implementation > > There is a new (experimental) define: > > #define PY_UNICODE_SIZE 2 > > There is a new configure options: > > --enable-unicode=ucs2 configures a narrow Py_UNICODE, and uses > wchar_t if it fits > --enable-unicode=ucs4 configures a wide Py_UNICODE, and uses > whchar_t if it fits > --enable-unicode same as "=ucs2" > > The intention is that --disable-unicode, or --enable-unicode=no > removes the Unicode type altogether; this is not yet implemented. > > Notes > > This PEP does NOT imply that people using Unicode need to use a > 4-byte encoding. It only allows them to do so. 
For example, > ASCII is still a legitimate (7-bit) Unicode-encoding. > > Rationale for Surrogate Creation Behaviour > > Python currently supports the construction of a surrogate pair > for a large unicode literal character escape sequence. This is > basically designed as a simple way to construct "wide characters" > even in a narrow Python build. > > ISSUE: surrogates can be created this way but the user still > needs to be careful about slicing, indexing, printing > etc. Another option is to remove knowledge of > surrogates from everything other than the codecs. +1 on removing knowledge about surrogates from the Unicode implementation core (it's also the easiest: there is none :-) We should provide a new module which provides a few handy utilities though: functions which provide code point-, character-, word- and line- based indexing into Unicode strings. > Rejected Suggestions > > There were two primary solutions that were rejected. The first was > more or less the status-quo. We could officially say that Python > characters represent UTF-16 code units and require programmers to > implement wide characters in their application logic. This is a > heavy burden because emulating 32-bit characters is likely to be > very inefficient if it is coded entirely in Python. Plus these > abstracted pseudo-strings would not be legal as input to the > regular expression engine. > > The other class of solution is to use some efficient storage > internally but present an abstraction of wide characters > to the programmer. Any of these would require a much more complex > implementation than the accepted solution. For instance consider > the impact on the regular expression engine. In theory, we could > move to this implementation in the future without breaking Python > code. A future Python could "emulate" wide Python semantics on > narrow Python. > > Copyright > > This document has been placed in the public domain. -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From jepler at inetnebr.com Fri Jun 29 17:04:18 2001 From: jepler at inetnebr.com (Jeff Epler) Date: Fri, 29 Jun 2001 10:04:18 -0500 Subject: [Python-Dev] NIS on Linux, others? In-Reply-To: <15164.35504.255476.169687@cj42289-a.reston1.va.home.com>; from fdrake@acm.org on Fri, Jun 29, 2001 at 10:03:28AM -0400 References: <15164.35504.255476.169687@cj42289-a.reston1.va.home.com> Message-ID: <20010629100416.A24069@inetnebr.com> On Fri, Jun 29, 2001 at 10:03:28AM -0400, Fred L. Drake, Jr. wrote: > > Greg Ball writes: > > Short version: I can confirm that bug under linux, but the patch breaks > > nis module on solaris. > > I'm presuming that these were using the same NIS server? I'm > wondering if this may be an endianess-related problem. I don't > understand enough about the NIS protocols to know what's going on in > that module. It's my suspicion that it depends how the "aliases" map is built. The patch that "broke" things for the Linux systems includes the comment /* created with 'makedbm -a' */ which makes me suspect that it's dependant on the way the map is constructed. (I couldn't find an online makedbm manpage which documents a -a option) Endian issues should not exist, the protocol below NIS/YP takes care of this. 
Jeff From guido at digicool.com Fri Jun 29 17:24:56 2001 From: guido at digicool.com (Guido van Rossum) Date: Fri, 29 Jun 2001 11:24:56 -0400 Subject: [Python-Dev] Support for "wide" Unicode characters In-Reply-To: Your message of "Fri, 29 Jun 2001 16:51:04 +0200." <3B3C95D8.518E5175@egenix.com> References: <3B3BEF21.63411C4C@ActiveState.com> <3B3C95D8.518E5175@egenix.com> Message-ID: <200106291525.f5TFP0H29410@odiug.digicool.com> > I'd suggest not to use the term character in this PEP at all; > this is also what Mark Davis recommends in his paper on Unicode. I like this idea! I know that I *still* have a hard time not to think "C 'char' datatype, i.e. an 8-bit byte" when I read "character"... > Why not make the codec used by Python to convert Unicode > literals to Unicode strings an option just like the default > encoding ? > > That way we could have a version of the unicode-escape codec > which supports surrogates and one which doesn't. Smart idea, but how practical is this? Can you spec this out a bit more? > +1 on removing knowledge about surrogates from the Unicode > implementation core (it's also the easiest: there is none :-) Except for \U currently -- or is that not part of the implementation core? > We should provide a new module which provides a few handy > utilities though: functions which provide code point-, > character-, word- and line- based indexing into Unicode > strings. But its design is outside the scope of this PEP, I'd say. --Guido van Rossum (home page: http://www.python.org/~guido/) From paulp at ActiveState.com Sat Jun 30 03:16:25 2001 From: paulp at ActiveState.com (Paul Prescod) Date: Fri, 29 Jun 2001 18:16:25 -0700 Subject: [Python-Dev] Support for "wide" Unicode characters References: <3B3BEF21.63411C4C@ActiveState.com> <3B3C95D8.518E5175@egenix.com> Message-ID: <3B3D2869.5C1DDCF1@ActiveState.com> "M.-A. Lemburg" wrote: > >... > > I'd suggest not to use the term character in this PEP at all; > this is also what Mark Davis recommends in his paper on Unicode. That's fine, but Python does have a concept of character and I'm going to use the term character for discussing these. > Also, a link to the Unicode glossary would be a good thing. Funny how these little PEPs grow... >... > Why not make the codec used by Python to convert Unicode > literals to Unicode strings an option just like the default > encoding ? > > That way we could have a version of the unicode-escape codec > which supports surrogates and one which doesn't. Adding more and more knobs to tweak just adds up to Python code being non-portable from one machine to another. > > ISSUE: Should Python allow the construction of characters > > that do not correspond to Unicode characters? > > Unassigned Unicode characters should obviously be legal > > (because they could be assigned at any time). But > > code points above TOPCHAR are guaranteed never to > > be used by Unicode. Should we allow access to them > > anyhow? > > I wouldn't count on that last point ;-) > > Please note that you are mixing terms: you don't construct > characters, you construct code points. Whether the concatenation > of these code points makes a valid Unicode character string > is an issue which applications and codecs have to decide. unichr() does not construct code points. It constructs 1-char Python Unicode strings...also known as Python Unicode characters. > ... Whether the concatenation > of these code points makes a valid Unicode character string > is an issue which applications and codecs have to decide. 
The concatenation of true code points would *always* make a valid Unicode string, right? It's code units that cannot be blindly concatenated. >... > We should provide a new module which provides a few handy > utilities though: functions which provide code point-, > character-, word- and line- based indexing into Unicode > strings. Okay, I'll add: It has been proposed that there should be a module for working with UTF-16 strings in narrow Python builds through some sort of abstraction that handles surrogates for you. If someone wants to implement that, it will be another PEP. -- Take a recipe. Leave a recipe. Python Cookbook! http://www.ActiveState.com/pythoncookbook From mwh at python.net Sat Jun 30 11:32:34 2001 From: mwh at python.net (Michael Hudson) Date: 30 Jun 2001 10:32:34 +0100 Subject: [Python-Dev] Support for "wide" Unicode characters In-Reply-To: Paul Prescod's message of "Fri, 29 Jun 2001 18:16:25 -0700" References: <3B3BEF21.63411C4C@ActiveState.com> <3B3C95D8.518E5175@egenix.com> <3B3D2869.5C1DDCF1@ActiveState.com> Message-ID: Paul Prescod writes: > "M.-A. Lemburg" wrote: > > I'd suggest not to use the term character in this PEP at all; > > this is also what Mark Davis recommends in his paper on Unicode. > > That's fine, but Python does have a concept of character and I'm going > to use the term character for discussing these. As a Unicode Idiot (tm) can I please beg you to reconsider? There are so many possible meanings for "character" that I really think it's best to avoid the word altogether. Call Python characters "length 1 strings" or even "length 1 Python strings". [...] > > Please note that you are mixing terms: you don't construct > > characters, you construct code points. Whether the concatenation > > of these code points makes a valid Unicode character string > > is an issue which applications and codecs have to decide. > > unichr() does not construct code points. It constructs 1-char Python > Unicode strings This is what I think you should be saying. > ...also known as Python Unicode characters. Which I'm suggesting you forget! Cheers, M. -- I'm a keen cyclist and I stop at red lights. Those who don't need hitting with a great big slapping machine. -- Colin Davidson, cam.misc From paulp at ActiveState.com Sat Jun 30 13:28:28 2001 From: paulp at ActiveState.com (Paul Prescod) Date: Sat, 30 Jun 2001 04:28:28 -0700 Subject: [Python-Dev] Support for "wide" Unicode characters References: <3B3BEF21.63411C4C@ActiveState.com> <3B3C95D8.518E5175@egenix.com> <3B3D2869.5C1DDCF1@ActiveState.com> Message-ID: <3B3DB7DC.511A3D8@ActiveState.com> Michael Hudson wrote: > >... > > As a Unicode Idiot (tm) can I please beg you to reconsider? There are > so many possible meanings for "character" that I really think it's > best to avoid the word altogether. Call Python characters "length 1 > strings" or even "length 1 Python strings". Do you really feel that there are many possible meanings for the word "Python Unicode character?" This is a PEP: I have to assume a certain degree of common understanding. -- Take a recipe. Leave a recipe. Python Cookbook! http://www.ActiveState.com/pythoncookbook From mal at egenix.com Sat Jun 30 13:52:38 2001 From: mal at egenix.com (M.-A. Lemburg) Date: Sat, 30 Jun 2001 13:52:38 +0200 Subject: [Python-Dev] Support for "wide" Unicode characters References: <3B3BEF21.63411C4C@ActiveState.com> <3B3C95D8.518E5175@egenix.com> <3B3D2869.5C1DDCF1@ActiveState.com> Message-ID: <3B3DBD86.81F80D06@egenix.com> Paul Prescod wrote: > > "M.-A. 
Lemburg" wrote:
>
> > >...
> >
> > I'd suggest not to use the term character in this PEP at all;
> > this is also what Mark Davis recommends in his paper on Unicode.
>
> That's fine, but Python does have a concept of character and I'm going
> to use the term character for discussing these.

The term "character" in Python should really only be used for the 8-bit strings. In Unicode a "character" can mean any of:

"""
Unfortunately the term character is vastly overloaded. At various times people can use it to mean any of these things:

- An image on paper (glyph)
- What an end-user thinks of as a character (grapheme)
- What a character encoding standard encodes (code point)
- A memory storage unit in a character encoding (code unit)

Because of this, ironically, it is best to avoid the use of the term character entirely when discussing character encodings, and stick to the term code point.
"""

Taken from Mark Davis' paper:

    http://www-106.ibm.com/developerworks/unicode/library/utfencodingforms/

> > Also, a link to the Unicode glossary would be a good thing.
>
> Funny how these little PEPs grow...

Is that a problem ? The Unicode glossary is very useful in providing a common base for understanding the different terms and tries very hard to avoid ambiguity in meaning. This discussion is partly caused by exactly these different understandings of the terms used in the PEP.

I will update the Unicode PEP to the Unicode terminology too.

> >...
> > Why not make the codec used by Python to convert Unicode
> > literals to Unicode strings an option just like the default
> > encoding ?
> >
> > That way we could have a version of the unicode-escape codec
> > which supports surrogates and one which doesn't.
>
> Adding more and more knobs to tweak just adds up to Python code being
> non-portable from one machine to another.

Not necessarily so; I'll write a more precise spec next week. The idea is to put the codec information into the Python source code, so that it is bound to the literals that way with the result of the Python source code being portable across platforms.

Currently this is just an idea and I still have to check how far this can go...

> > > ISSUE: Should Python allow the construction of characters
> > >        that do not correspond to Unicode characters?
> > >        Unassigned Unicode characters should obviously be legal
> > >        (because they could be assigned at any time). But
> > >        code points above TOPCHAR are guaranteed never to
> > >        be used by Unicode. Should we allow access to them
> > >        anyhow?
> >
> > I wouldn't count on that last point ;-)
> >
> > Please note that you are mixing terms: you don't construct
> > characters, you construct code points. Whether the concatenation
> > of these code points makes a valid Unicode character string
> > is an issue which applications and codecs have to decide.
>
> unichr() does not construct code points. It constructs 1-char Python
> Unicode strings...also known as Python Unicode characters.
>
> > ... Whether the concatenation
> > of these code points makes a valid Unicode character string
> > is an issue which applications and codecs have to decide.
>
> The concatenation of true code points would *always* make a valid
> Unicode string, right? It's code units that cannot be blindly
> concatenated.

Both wrong :-)

U+D800 is a valid Unicode code point and can occur as code unit in both narrow and wide builds. Concatenating this with e.g.
U+0020 will still make it a valid Unicode code point sequence (aka Unicode object), but not a valid Unicode character string (since the U+D800 is not a character). The same is true for e.g. U+FFFF.

Note that the Unicode type should happily store these values, while the codecs complain. As a result and like I said above, dealing with these problems is left to the applications which use these Unicode objects.

> >...
> > We should provide a new module which provides a few handy
> > utilities though: functions which provide code point-,
> > character-, word- and line- based indexing into Unicode
> > strings.
>
> Okay, I'll add:
>
>     It has been proposed that there should be a module for working
>     with UTF-16 strings in narrow Python builds through some sort of
>     abstraction that handles surrogates for you. If someone wants
>     to implement that, it will be another PEP.

Uhm, narrow builds don't support UTF-16... it's UCS-2 which is supported (basically: store everything in range(0x10000)); the codecs can map code points to surrogates, but it is solely their responsibility and the responsibility of the application using them to take care of dealing with surrogates.

Also, the module will be useful for both narrow and wide builds, since the notion of an encoded character can involve multiple code points. In that sense Unicode is always a variable length encoding for characters and that's the application field of this module.

Here's the adjusted text:

    It has been proposed that there should be a module for working
    with Unicode objects using character-, word- and line- based
    indexing. The details of the implementation are left to
    another PEP.

--
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/

From bckfnn at worldonline.dk Sat Jun 30 15:07:55 2001
From: bckfnn at worldonline.dk (Finn Bock)
Date: Sat, 30 Jun 2001 13:07:55 GMT
Subject: [Python-Dev] Corrupt Jython CVS (off topic).
Message-ID: <3b3dccf6.26562024@mail.wanadoo.dk>

A week ago I posted this on jython-dev, but no-one was able to give any advise on the best way to fix it. Maybe you can help.

For some time now, our [jython] web CVS have not worked correctly:

    http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/jython/jython/org/python/core/

Finally I managed to track the problem to the Java2Accessibility.py,v file in the CVS repository. The "rlog" command cannot be executed on this file.

From the start of the Java2Accessibility.py,v:

    head	2.4;
    access;
    symbols
    	Release_2_1alpha1:2.4
    	Release_2_0:2.2
    	Release_2_0rc1:2.2
    	Release_2_0beta2:2.2
    	Release_2_0beta1:2.2
    	Release_2_0alpha3:2.2
    	Release_2_0alpha2:2.2
    	Release_2_0alpha1:2.2
    	Release_1_1rc1:2.2
    	Release_1_1beta4:2.2
    	Release_1_1beta3:2.2
    	2.0:1.1.0.2;
    locks; strict;

As an experiment, I tried to remove the strange "2.0:1.1.0.2;" line from the file and then I could run rlog on the file.

Does anyone know if/how we can fix this?

As a last resort I suppose I can attach my hand edited version to a SF support request where I ask them to copy my file to the CVS server. To this day I have never been very successful whenever I have tried to edit files in a CVS repository so I'm reluctant to do this.

regards,
finn

From nhv at cape.com Sat Jun 30 15:16:48 2001
From: nhv at cape.com (Norman Vine)
Date: Sat, 30 Jun 2001 09:16:48 -0400
Subject: [Python-Dev] RE: Threaded Cygwin Python Import Problem
In-Reply-To: <20010628171715.P488@dothill.com>
Message-ID: <015601c10166$eb79bb00$a300a8c0@nhv>

Jason Tishler
>
>Thanks to Rob Collins (implementer) and Greg Smith (profiler), Cygwin now
>provides enough pthreads support so that Cygwin Python builds OOTB *and*
>functions reasonably well even with threads enabled.  Unfortunately,
>there are still a few issues that need to be resolved.
>
>The one that I would like to address in this posting prevents a threaded
>Cygwin Python from building the standard extension modules (without some
>kind of intervention).  :,(  Specifically, the build would frequently
>hang during the Distutils part when Cygwin Python is attempting to execvp
>a gcc process.
>
>See the first attachment, test.py, for a minimal Python script that
>exhibits the hang.
See the second attachment, test.c, for a rewrite >of test.py in C. Since test.c did not hang, I was able to conclude that >this was not just a straight Cygwin problem. > >Further tracing uncovered that the hang occurs in _execvpe() (in os.py), >when the child tries to import tempfile. If I apply the third >attachment, >os.py.patch, then the hang is avoided. Hence, it appears that importing a >module (or specifically the tempfile module) in a threaded Cygwin Python >child cause a hang. > >I saw the following comment in _execvpe(): > > # Process handling (fork, wait) under BeOS (up to 5.0) > # doesn't interoperate reliably with the thread interlocking > # that happens during an import. The actual error we need > # is the same on BeOS for posix.open() et al., ENOENT. > >The above makes me think that possibly Cygwin is having a >similar problem. > >Can anyone offer suggestions on how to further debug this problem? I was experiencing the same problems as Jason with Win2k sp1 and had used the same work-around successfully. < I believe Jason is working with NT 4.0 sp 5 > Curiously after applying the Win2k sp2 I no longer need to do this and the original Python code works fine. Leading me to believe that this may be but a symptom of a another Windows mystery. Regards Norman Vine From aahz at rahul.net Sat Jun 30 16:15:24 2001 From: aahz at rahul.net (Aahz Maruch) Date: Sat, 30 Jun 2001 07:15:24 -0700 (PDT) Subject: [Python-Dev] Support for "wide" Unicode characters In-Reply-To: <3B3DB7DC.511A3D8@ActiveState.com> from "Paul Prescod" at Jun 30, 2001 04:28:28 AM Message-ID: <20010630141524.E029999C80@waltz.rahul.net> Paul Prescod wrote: > Michael Hudson wrote: >> >>... >> >> As a Unicode Idiot (tm) can I please beg you to reconsider? There are >> so many possible meanings for "character" that I really think it's >> best to avoid the word altogether. Call Python characters "length 1 >> strings" or even "length 1 Python strings". > > Do you really feel that there are many possible meanings for the word > "Python Unicode character?" This is a PEP: I have to assume a certain > degree of common understanding. After reading Michael's and MA's arguments, I'm +1 on making the change they're requesting. But what really triggered my posting this was your use of the phrase "common understanding"; IME, Python's "explicit is better than implicit" rule is truly critical in documentation. Particularly if "character" has been deprecated in standard Unicode documentation, I think sticking to a common vocabulary makes more sense. -- --- Aahz (@pobox.com) Hugs and backrubs -- I break Rule 6 <*> http://www.rahul.net/aahz/ Androgynous poly kinky vanilla queer het Pythonista I don't really mind a person having the last whine, but I do mind someone else having the last self-righteous whine. From Jason.Tishler at dothill.com Sat Jun 30 17:20:19 2001 From: Jason.Tishler at dothill.com (Jason Tishler) Date: Sat, 30 Jun 2001 11:20:19 -0400 Subject: [Python-Dev] Re: Threaded Cygwin Python Import Problem In-Reply-To: <015601c10166$eb79bb00$a300a8c0@nhv> Message-ID: <20010630112019.B626@dothill.com> Norman, On Sat, Jun 30, 2001 at 09:16:48AM -0400, Norman Vine wrote: > Jason Tishler > >The one that I would like to address in this posting prevents a threaded > >Cygwin Python from building the standard extension modules (without some > >kind of intervention). :,( Specifically, the build would frequently > >hang during the Distutils part when Cygwin Python is attempting to execvp > >a gcc process. 
> I was experiencing the same problems as Jason with Win2k sp1 and
> had used the same work-around successfully.
> < I believe Jason is working with NT 4.0 sp 5 >
>
> Curiously after applying the Win2k sp2 I no longer need to do this
> and the original Python code works fine.
>
> Leading me to believe that this may be but a symptom of another
> Windows mystery.

After further reflection, I feel that I have found another
race/deadlock issue with Cygwin's pthreads implementation. If I'm
correct, this would explain why you experienced it intermittently
with Windows 2000 SP1 and it is "gone" with SP2. Probably SP2 slows
down your machine so much that the problem is not triggered. :,)

I am going to reconfigure --with-pydebug and set THREADDEBUG.
Hopefully, the hang will still be reproducible under these
conditions. If so, then I will attempt to produce a minimal C test
case for Rob to use to isolate and solve this problem.

Thanks,
Jason

--
Jason Tishler
Director, Software Engineering Phone: 732.264.8770 x235
Dot Hill Systems Corp. Fax: 732.264.8798
82 Bethany Road, Suite 7 Email: Jason.Tishler at dothill.com
Hazlet, NJ 07730 USA WWW: http://www.dothill.com

From guido at digicool.com Sat Jun 30 20:06:35 2001
From: guido at digicool.com (Guido van Rossum)
Date: Sat, 30 Jun 2001 14:06:35 -0400
Subject: [Python-Dev] Corrupt Jython CVS (off topic).
In-Reply-To: Your message of "Sat, 30 Jun 2001 13:07:55 GMT." <3b3dccf6.26562024@mail.wanadoo.dk>
References: <3b3dccf6.26562024@mail.wanadoo.dk>
Message-ID: <200106301806.f5UI6Zq30293@odiug.digicool.com>

> A week ago I posted this on jython-dev, but no-one was able to give
> any advice on the best way to fix it. Maybe you can help.
>
> For some time now, our [jython] web CVS has not worked correctly:
>
> http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/jython/jython/org/python/core/
>
> Finally I managed to track the problem to the Java2Accessibility.py,v
> file in the CVS repository. The "rlog" command cannot be executed on
> this file.
>
> From the start of the Java2Accessibility.py,v:
>
>     head 2.4;
>     access;
>     symbols
>         Release_2_1alpha1:2.4
>         Release_2_0:2.2
>         Release_2_0rc1:2.2
>         Release_2_0beta2:2.2
>         Release_2_0beta1:2.2
>         Release_2_0alpha3:2.2
>         Release_2_0alpha2:2.2
>         Release_2_0alpha1:2.2
>         Release_1_1rc1:2.2
>         Release_1_1beta4:2.2
>         Release_1_1beta3:2.2
>         2.0:1.1.0.2;
>     locks; strict;
>
> As an experiment, I tried to remove the strange "2.0:1.1.0.2;" line
> from the file and then I could run rlog on the file.

Make sure to move the semicolon to the end of the previous line.

> Does anyone know if/how we can fix this?
>
> As a last resort I suppose I can attach my hand edited version to a SF
> support request where I ask them to copy my file to the CVS server. To
> this day I have never been very successful whenever I have tried to edit
> files in a CVS repository so I'm reluctant to do this.
>
> regards,
> finn

Yes, I think a SF request should be the way to go. I don't know how
this could have happened; the "2.0" is illegal as a symbolic tag
name...

--Guido van Rossum (home page: http://www.python.org/~guido/)

From paulp at ActiveState.com Sat Jun 30 21:09:07 2001
From: paulp at ActiveState.com (Paul Prescod)
Date: Sat, 30 Jun 2001 12:09:07 -0700
Subject: [Python-Dev] Support for "wide" Unicode characters
References: <20010630141524.E029999C80@waltz.rahul.net>
Message-ID: <3B3E23D3.69D591DD@ActiveState.com>

Aahz Maruch wrote:
>
> After reading Michael's and MA's arguments, I'm +1 on making the
> change they're requesting.
> But what really triggered my posting this was your use of the phrase
> "common understanding"; IME, Python's "explicit is better than
> implicit" rule is truly critical in documentation.

The spec starts off with an absolutely watertight definition of the
term: "the addressable units of a Python Unicode string." I can't get
more explicit than that. Expanding every usage of the word to "length
1 Python Unicode string" does not make the document more explicit, any
more than this is a "more explicit" equation than Einstein's:

"The Energy is the mass of the object times the speed of light times
two."

> Particularly if "character" has been deprecated in standard Unicode
> documentation, I think sticking to a common vocabulary makes more sense.

"Character" is still a central term in all Unicode documentation. Go
to their web page and look. It's right on the front page. "Unicode
provides a unique number for every character, no matter what the
platform, no matter what the program, no matter what the language."
But I'm not using it in the Unicode sense anyhow, so it doesn't
matter. If ISO deprecates the use of the word integer in some standard
will we stop talking about Python integers as integers?

The addressable unit of a Python string is a character. If it is a
Python Unicode string then it is a Python Unicode character. The term
"Python Unicode character" is not going away:

http://www.python.org/doc/current/tut/node5.html#SECTION005120000000000000000

I will be a lot more concerned about this issue when someone reads the
PEP and is actually confused by something, as opposed to worrying that
somebody might be confused by something. If I start using a bunch of
technical terms and obfuscatory expansions, it will just dissuade
people from reading the PEP.

--
Take a recipe. Leave a recipe.
Python Cookbook! http://www.ActiveState.com/pythoncookbook

From DavidA at ActiveState.com Sat Jun 30 23:28:39 2001
From: DavidA at ActiveState.com (David Ascher)
Date: Sat, 30 Jun 2001 14:28:39 -0700
Subject: [Python-Dev] Support for "wide" Unicode characters
References: <20010630141524.E029999C80@waltz.rahul.net> <3B3E23D3.69D591DD@ActiveState.com>
Message-ID: <3B3E4487.40054EAE@ActiveState.com>

> "The Energy is the mass of the object times the speed of light times
> two."

Actually, it's "squared", not times two. At least in my universe =)

--david-Unicode-idiot-much-to-Paul's-dismay-ascher

From m.favas at per.dem.csiro.au Fri Jun 1 00:41:13 2001
From: m.favas at per.dem.csiro.au (Mark Favas)
Date: Fri, 01 Jun 2001 06:41:13 +0800
Subject: [Python-Dev] One more dict trick
Message-ID: <3B16C889.C01905BD@per.dem.csiro.au>

Tried the patch (thanks, Tim!) - but I guess the things I'm running
aren't too sensitive to dict speed . I see a slight speed-up,
around 1-2%... Nice, elegant patch that should go places! Maybe the
bio-informatics people on c.l.py (Andrew Dalke?) would be interested
in trying it out?

--
Mark Favas - m.favas at per.dem.csiro.au
CSIRO, Private Bag No 5, Wembley, Western Australia 6913, AUSTRALIA

From tim.one at home.com Fri Jun 1 02:24:01 2001
From: tim.one at home.com (Tim Peters)
Date: Thu, 31 May 2001 20:24:01 -0400
Subject: [Python-Dev] One more dict trick
In-Reply-To:
Message-ID:

Another version of the patch attached, a bit faster and with a large
new comment block explaining it. It's looking good! As I hope the new
comments make clear, nothing about this approach is "a mystery" --
there are explainable reasons for each fiddly bit.
This gives me more confidence in it than in the previous approach, and, indeed, it turned out that when I *thought* "hmm! I bet this change would be a little faster!", it actually was . -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: dict.txt URL: From tim.one at home.com Fri Jun 1 03:32:30 2001 From: tim.one at home.com (Tim Peters) Date: Thu, 31 May 2001 21:32:30 -0400 Subject: [Python-Dev] One more dict trick In-Reply-To: <20010531044332.B5026@thyrsus.com> Message-ID: Heh. I was implementing 128-bit floats in software, for Cray, in about 1980. They didn't do it because they *wanted* to make the Cray boxes look like pigs . A 128-bit float type is simply necessary for some scientific work: not all problems are well-conditioned, and the "extra" bits can vanish fast. Went thru the same bit at KSR. Just yesterday Konrad Hinsen was worrying on c.l.py that his scripts that took 2 hours using native floats zoomed to 5 days when he started using GMP's arbitrary-precision float type *just* to get 100 bits of precision. When KSR died, the KSR-3 on the drawing board had 128-bit registers. I was never quite sure why the founders thought that would be a killer selling point, but it wasn't for floats. Down in the trenches we thought it would be mondo cool to have an address space so large that for the rest of our lives we'd never need to bother calling free() again <0.8 wink>. From tim.one at home.com Fri Jun 1 03:46:11 2001 From: tim.one at home.com (Tim Peters) Date: Thu, 31 May 2001 21:46:11 -0400 Subject: [Python-Dev] One more dict trick In-Reply-To: <20010531124533.J690@xs4all.nl> Message-ID: [Thomas Wouters] > Why ? Bumping register size doesn't mean Intel expects to use it all as > address space. They could be used for video-processing, Bingo. Common wisdom holds that vector machines are dead, but the truth is virtually *everyone* runs on a vector box now: Intel just renamed "vector" to "multimedia" (or AMD to "3D Now!"), and adopted a feeble (but ever-growing) subset of traditional vector machines' instruction sets. > or to represent a modest range of rationals , or to help core > 'net routers deal with those nasty IPv6 addresses. KSR's founders had in mind bit-level addressability of networks of machines spanning the globe. Were he to press the point, though, I'd have to agree with Eric that they didn't really *need* 128 bits for that modest goal. > I'm sure cryptomunchers would like bigger registers as well. Agencies we can't talk about would like them as big as they can get them. Each vector register in a Cray box actually consisted of 64 64-bit words, or 4K bits per register. Some "special" models were constructed where the vector FPU was thrown away and additional bit-fiddling units added in its place: they really treated the vector registers as giant bitstrings, and didn't want to burn 64 clock cycles just to do, e.g., "one" conceptual xor. > Oh wait... I get it! You were trying to get yourself in the > historybooks as the guy that said "64 bits ought to be enough for > everyone" :-) That would be foolish indeed! 128, though, now *that's* surely enough for at least a decade . From fdrake at acm.org Fri Jun 1 03:45:45 2001 From: fdrake at acm.org (Fred L. Drake, Jr.) 
Date: Thu, 31 May 2001 21:45:45 -0400 (EDT) Subject: [Python-Dev] One more dict trick In-Reply-To: References: <20010531044332.B5026@thyrsus.com> Message-ID: <15126.62409.909290.736779@cj42289-a.reston1.va.home.com> Tim Peters writes: > When KSR died, the KSR-3 on the drawing board had 128-bit registers. I was > never quite sure why the founders thought that would be a killer selling > point, but it wasn't for floats. Down in the trenches we thought it would > be mondo cool to have an address space so large that for the rest of our > lives we'd never need to bother calling free() again <0.8 wink>. And given what (little) I know about the memory architecture on those things, that actually would have be quite reasonable on that platform! -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From tim.one at home.com Fri Jun 1 04:23:47 2001 From: tim.one at home.com (Tim Peters) Date: Thu, 31 May 2001 22:23:47 -0400 Subject: [Python-Dev] FW: CP4E and Python newbies, it works! Message-ID: Good for the soul! -----Original Message----- From: python-list-admin at python.org [mailto:python-list-admin at python.org]On Behalf Of Ron Stephens [mailto:rdsteph at earthlink.net] Sent: Thursday, May 31, 2001 7:12 PM To: python-list at python.org Subject: CP4E and Python newbies, it works! I am a complete newbie, and with a very low programming IQ. Although I had programmed a little in college thirty years ago, in Basic, PL/1 and a very little assembler, and fooled around in later years on PC's at home with Basic, then tried PERL, then an effort at Java, they were all too much trouble to really use to program, given that it was a *hobby* that was supposed to be fun. After all, I have a demanding day job that has nothing to do with software, that requires extensive travel, and four kids, a wife, two dogs, and a cat. Java et al, by the time I had digested a couple of books and put in a lot of hours, was just no fun at all to program; and I had to look in the book every other line of code just to recall the syntax etc.; I could not keep it in my head. Now, four months into Python, after being attracted by reading a blurb about Guido van Rossum's Computer Programming for Everybody project, I am in awe of his achievement. I am having fun; and if I can do so then almost anyone can. I am really absent minded, lazy, and not good at detail. Yet I have done the following in four months, and I believe Python therefore has the potential to open up programming to a much wider audience for a lot of people, which is nice: 1. I have written a half dozen scripts that are meaningful to me in Python, more than I ever accomplished with any other language. 2. I am able to have fun by sitting down in the evening, or especially on a weekend, and just programming in Python. The syntax and keywords are gratifyingly just in my head, enough anyway that I can just program like I am having a conversation, and check the details later for errors etc. This is the most satisfying thing of all. 3. I find the debugger just works; magically, it helps me turn my scripts into actual working programs, simply by rather mindlessly following the road laid out for me by using the debugger. 4. I have pleasurably read more Python books from front cover to back than I care to admit. I must be enjoying myself ;-))) 5. I am exploring Jython, which is also pleasurable. 
After fooling around with Java a couple of years ago, it is really a kick to see jython generating such detailed Java code for me, just as if I had written it (but it would have taken me untold pain to actually do so in Java). Whether or not I actually end up using the java code so generated, I still am enjoying the sheer experience. 6. I have Zope and other things to look forward to. 7. I am able to enjoy the discussions on this newsgroup, even though they are over my head technically. I find them intriguing. Now, I may never actually accomplish anything truly useful by my programming. But I am happy. I hope that others, younger and brighter than myself, who have an interest in programming, but need the right stimulus to get going, will find Python and produce programs of real value. I think Guido van Rossum and his team should be very proud of what they are enabling. The CP4E idea is alive and well. My hat's off to Guido and the whole community which he has spawned, especially those on this newsgroup. I am humbled and honored to read your erudite technical discussions, as a voyeur of mysteries and wonders I can only dimly see on the horizon, but that nonetheless fill me with mental delight. Ron Stephens -- http://mail.python.org/mailman/listinfo/python-list From esr at thyrsus.com Fri Jun 1 05:51:48 2001 From: esr at thyrsus.com (Eric S. Raymond) Date: Thu, 31 May 2001 23:51:48 -0400 Subject: [Python-Dev] One more dict trick In-Reply-To: ; from tim.one@home.com on Thu, May 31, 2001 at 09:32:30PM -0400 References: <20010531044332.B5026@thyrsus.com> Message-ID: <20010531235148.B14591@thyrsus.com> Tim Peters : > A 128-bit float type is simply necessary for some > scientific work: not all problems are well-conditioned, and the "extra" > bits can vanish fast. Makes me wonder how competent your customers' numerical analysts were. Where the heck did they think they were getting data with that many digits of accuracy? (Note that I didn't say "precision"...) -- Eric S. Raymond Strict gun laws are about as effective as strict drug laws...It pains me to say this, but the NRA seems to be right: The cities and states that have the toughest gun laws have the most murder and mayhem. -- Mike Royko, Chicago Tribune From esr at thyrsus.com Fri Jun 1 05:54:33 2001 From: esr at thyrsus.com (Eric S. Raymond) Date: Thu, 31 May 2001 23:54:33 -0400 Subject: [Python-Dev] One more dict trick In-Reply-To: ; from tim.one@home.com on Thu, May 31, 2001 at 09:46:11PM -0400 References: <20010531124533.J690@xs4all.nl> Message-ID: <20010531235433.C14591@thyrsus.com> Tim Peters : > Agencies we can't talk about would like them as big as they can get them. > Each vector register in a Cray box actually consisted of 64 64-bit words, or > 4K bits per register. Some "special" models were constructed where the > vector FPU was thrown away and additional bit-fiddling units added in its > place: they really treated the vector registers as giant bitstrings, and > didn't want to burn 64 clock cycles just to do, e.g., "one" conceptual xor. You've got a point...but I don't think it's really economical to build that kind of hardware into general-purpose processors. You end up with a camel. You know, a horse designed by committee? -- Eric S. Raymond To make inexpensive guns impossible to get is to say that you're putting a money test on getting a gun. It's racism in its worst form. 
-- Roy Innis, president of the Congress of Racial Equality (CORE), 1988

From tim.one at home.com Fri Jun 1 08:58:08 2001
From: tim.one at home.com (Tim Peters)
Date: Fri, 1 Jun 2001 02:58:08 -0400
Subject: [Python-Dev] One more dict trick
In-Reply-To: <20010531235148.B14591@thyrsus.com>
Message-ID:

[Tim]
> A 128-bit float type is simply necessary for some scientific work: not
> all problems are well-conditioned, and the "extra" bits can vanish fast.

[ESR]
> Makes me wonder how competent your customers' numerical analysts were.
> Where the heck did they think they were getting data with that many
> digits of accuracy? (Note that I didn't say "precision"...)

Not all scientific work consists of predicting the weather with inputs
known to half a digit on a calm day . Knuth gives examples of
ill-conditioned problems where resorting to unbounded rationals is
faster than any known stable f.p. approach (stuck with limited
precision) -- think, e.g., chaotic systems here, which includes parts
of many hydrodynamics problems in real life.

Some scientific work involves modeling ab initio across trillions of
computations (and on a Cray box in particular, where addition didn't
even bother to round, nor multiplication bother to compute the full
product tree, the error bounds per operation were much worse than in
a 754 world). You shouldn't overlook either that algorithms often
needed massive rewriting to exploit vector and parallel
architectures, and in a world where a supremely competent numerical
analyst can take a month to verify the numerical robustness of a new
algorithm covering two pages of Fortran, a million lines of massively
reworked seat-of-the-pants modeling code couldn't be trusted at all
without running it under many conditions in at least two precisions
(it only takes one surprise catastrophic cancellation to destroy
everything).

A major oil company once threatened to sue Cray when their reservoir
model produced wildly different results under a new release of the
compiler. Some exceedingly sharp analysts worked on that one for a
solid week. Turned out the new compiler evaluated a subexpression
A*B*C by doing (B*C) first instead of (A*B), because it was faster in
context (and fine to do so by Fortran's rules). It so happened A was
very large, and B and C both small, and doing B*C first caused the
whole product to underflow to zero where doing A*B first left a
product of roughly C's magnitude.

I can't imagine how they ever would have found this if they weren't
able to recompile the code using twice the precision (which worked
fine thanks to the larger dynamic range), then tracing to see where
the runs diverged. Even then it took a week because this was 100s of
thousands of lines of crufty Fortran that ran for hours on the
world's then-fastest machine before delivering bogus results.

BTW, if you think the bulk of the world's numeric production code has
even been *seen* by a qualified numerical analyst, you should ride on
planes more often .
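The underflow trap in that A*B*C story is easy to reproduce with any
IEEE-754 doubles; a sketch (the magnitudes are invented for
illustration, nothing from the actual reservoir model):

    >>> A, B, C = 1e200, 1e-200, 1e-200
    >>> A * (B * C)   # B*C is 1e-400: below the double range, so 0.0
    0.0
    >>> # (A * B) * C instead computes A*B ~= 1.0 first, so the final
    >>> # product survives at roughly C's magnitude (about 1e-200).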
From tim.one at home.com Fri Jun 1 09:08:28 2001
From: tim.one at home.com (Tim Peters)
Date: Fri, 1 Jun 2001 03:08:28 -0400
Subject: [Python-Dev] One more dict trick
In-Reply-To: <20010531235433.C14591@thyrsus.com>
Message-ID:

[ESR]
> You've got a point...

Well, really, they do -- but they had a much more compelling point
when the Cold War came with an unlimited budget.

> but I don't think it's really economical to build that kind of
> hardware into general-purpose processors.

Economical? The marginal cost of adding even nutso new features in
silicon now for mass-market chips is pretty close to zero. Indeed, if
you're in the speech recog or 3D imaging games (i.e., things that
still tax a PC), Intel comes around *begging* for new ideas to use up
all their chip real estate. The only one I recall them turning down
was a request from Dragon's founder to add an instruction that, given
x and y, returned log(exp(x)+exp(y)). They were skeptical, and turned
out even *we* didn't need it .

> You end up with a camel. You know, a horse designed by committee?

Yup! But that's the camel Intel rides to the bank, so it will
probably grow more humps, on which to hang more bags of gold.

From esr at thyrsus.com Fri Jun 1 09:23:16 2001
From: esr at thyrsus.com (Eric S. Raymond)
Date: Fri, 1 Jun 2001 03:23:16 -0400
Subject: [Python-Dev] One more dict trick
In-Reply-To: ; from tim.one@home.com on Fri, Jun 01, 2001 at 02:58:08AM -0400
References: <20010531235148.B14591@thyrsus.com>
Message-ID: <20010601032316.A15635@thyrsus.com>

Tim Peters :
> Not all scientific work consists of predicting the weather with inputs known
> to half a digit on a calm day . Knuth gives examples of
> ill-conditioned problems where resorting to unbounded rationals is faster
> than any known stable f.p. approach (stuck with limited precision) -- think,
> e.g., chaotic systems here, which includes parts of many hydrodynamics
> problems in real life.

Hmmm...good answer. I still believe it's the case that real-world
measurements max out below 48 bits or so of precision because the
real world is a noisy, fuzzy place. But I can see that most of the
algorithms for partial differential equations would multiply those by
very small or very large quantities repeatedly. The range-doubling
trick for catching divergences is neat, too. So maybe there's a
market for 128-bit floats after all.

I'm still skeptical about how likely those applications are to
influence the architecture of general-purpose processors. I saw a
study once that said heavy-duty scientific floating point only
accounts for about 2% of the computing market -- and I think it's
significant that MMX instructions and so forth entered the Intel line
to support *games*, not Navier-Stokes calculations. That 2% will have
to get a lot bigger before I can see Intel doubling its word size
again. It's not just the processor design; the word size has huge
implications for buses, memory controllers, and the whole system
architecture.

--
Eric S. Raymond

The United States is in no way founded upon the Christian religion
-- George Washington & John Adams, in a diplomatic message to Malta.

From pf at artcom-gmbh.de Fri Jun 1 09:22:50 2001
From: pf at artcom-gmbh.de (Peter Funk)
Date: Fri, 1 Jun 2001 09:22:50 +0200 (MEST)
Subject: [Python-Dev] precision thread (was One more dict trick)
Message-ID:

Eric:
> > You end up with a camel. You know, a horse designed by committee?

Tim:
> Yup! But that's the camel Intel rides to the bank, so it will probably grow
> more humps, on which to hang more bags of gold.

cam*ls? Guido is only one week on vacation and soon heretical words
show up here. ;-)

sorry, couldn't resist, Peter

From thomas at xs4all.net Fri Jun 1 09:28:01 2001
From: thomas at xs4all.net (Thomas Wouters)
Date: Fri, 1 Jun 2001 09:28:01 +0200
Subject: [Python-Dev] Damn...
I think I might have just muffed a checkin
In-Reply-To: <15126.34825.167026.520535@beluga.mojam.com>; from skip@pobox.com on Thu, May 31, 2001 at 01:06:01PM -0500
References: <15126.34825.167026.520535@beluga.mojam.com>
Message-ID: <20010601092800.K690@xs4all.nl>

On Thu, May 31, 2001 at 01:06:01PM -0500, Skip Montanaro wrote:

> I just updated httplib.py to expand the list of names in its __all__ list.
> I was operating on version 1.34. After the checkin I am looking at version
> 1.34.2.1. I see that Lib/CVS/Tag exists in my directory tree and says
> "release21-maint". Did I muff it? If so, how should I do an unmuff
> operation?

You had a sticky tag on the file, probably because you used
'-rrelease21-maint' on a cvs checkout or update. Good thing it was
release21-maint, though, and not some random other revision, or you
would have created another branch :-) You can remove stickyness by
using 'cvs update -A'. I personally just have two trees,
~/python/python-2.2 and ~/python/python-2.1.1, where the last one was
checked out with -rrelease21-maint.

--
Thomas Wouters

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!

From gmcm at hypernet.com Fri Jun 1 13:29:28 2001
From: gmcm at hypernet.com (Gordon McMillan)
Date: Fri, 1 Jun 2001 07:29:28 -0400
Subject: [Python-Dev] One more dict trick
In-Reply-To:
References: <20010531235433.C14591@thyrsus.com>
Message-ID: <3B174458.1998.46DEEE2B@localhost>

[ESR]
> > You end up with a camel. You know, a horse designed by
> > committee?

[Tim]
> Yup! But that's the camel Intel rides to the bank, so it will
> probably grow more humps, on which to hang more bags of gold.

Been a camel a long time, too. x86 assembler is the, er, Perl of
assemblers.

- Gordon

From mwh at python.net Fri Jun 1 13:54:40 2001
From: mwh at python.net (Michael Hudson)
Date: 01 Jun 2001 12:54:40 +0100
Subject: [Python-Dev] another dict crasher
Message-ID:

Adapted from a report on comp.lang.python from Wolfgang Lipp:

    class Child:
        def __init__(self, parent):
            self.__dict__['parent'] = parent
        def __getattr__(self, attr):
            self.parent.a = 1
            self.parent.b = 1
            self.parent.c = 1
            self.parent.d = 1
            self.parent.e = 1
            self.parent.f = 1
            self.parent.g = 1
            self.parent.h = 1
            self.parent.i = 1
            return getattr(self.parent, attr)

    class Parent:
        def __init__(self):
            self.a = Child(self)

    print Parent().__dict__

segfaults both 2.1 and current (well, maybe a day old) CVS. Haven't
tried Tim's latest patch, but I don't believe that will make any
difference.

It's obvious what's happening; the dict's resizing inside the for
loop in dict_repr and the ep pointer is dangling.

By the time we've shaken all of these out of dictobject.c it's going
to be pretty close to free-threading safe, I'd have thought.

reentrancy-sucks-ly y'rs
M.

--
But since I'm not trying to impress anybody in The Software Big Top,
I'd rather walk the wire using a big pole, a safety harness, a net,
and with the wire not more than 3 feet off the ground.
-- Grant Griffin, comp.lang.python

From mwh at python.net Fri Jun 1 14:12:55 2001
From: mwh at python.net (Michael Hudson)
Date: 01 Jun 2001 13:12:55 +0100
Subject: [Python-Dev] another dict crasher
In-Reply-To: Michael Hudson's message of "01 Jun 2001 12:54:40 +0100"
References:
Message-ID:

Michael Hudson writes:

> Adapted from a report on comp.lang.python from Wolfgang Lipp:

[snip]

> segfaults both 2.1 and current (well, maybe a day old) CVS. Haven't
> tried Tim's latest patch, but I don't believe that will make any
> difference.
> It's obvious what's happening; the dict's resizing inside the
> for loop in dict_repr and the ep pointer is dangling.

Actually this crash was dict_print (I always forget about
tp_print...). It's pretty easy to mend:

    *** dictobject.c	Fri Jun  1 13:08:13 2001
    --- dictobject.c-fixed	Fri Jun  1 12:59:07 2001
    ***************
    *** 793,795 ****
      	any = 0;
    ! 	for (i = 0, ep = mp->ma_table; i < mp->ma_size; i++, ep++) {
      		if (ep->me_value != NULL) {
    --- 793,796 ----
      	any = 0;
    ! 	for (i = 0; i < mp->ma_size; i++) {
    ! 		ep = &mp->ma_table[i];
      		if (ep->me_value != NULL) {
    ***************
    *** 833,835 ****
      	any = 0;
    ! 	for (i = 0, ep = mp->ma_table; i < mp->ma_size && v; i++, ep++) {
      		if (ep->me_value != NULL) {
    --- 834,837 ----
      	any = 0;
    ! 	for (i = 0; i < mp->ma_size && v; i++) {
    ! 		ep = &mp->ma_table[i];
      		if (ep->me_value != NULL) {

I'm not sure this stops still more Machiavellian behaviour from
crashing the interpreter, and you can certainly get items being
printed more than once or not at all. I'm not sure this last is a
problem; if the user's being this contrary there's only so much we
can do to help him or her.

Cheers,
M.

--
I also feel it essential to note, [...], that Description Logics,
non-Monotonic Logics, Default Logics and Circumscription Logics can
all collectively go suck a cow. Thank you.
-- http://advogato.org/person/Johnath/diary.html?start=4

From pedroni at inf.ethz.ch Fri Jun 1 14:49:11 2001
From: pedroni at inf.ethz.ch (Samuele Pedroni)
Date: Fri, 1 Jun 2001 14:49:11 +0200 (MET DST)
Subject: [Python-Dev] __xxxattr__ caching semantic
Message-ID: <200106011249.OAA05837@core.inf.ethz.ch>

Hi.

What is the intended semantic wrt __xxxattr__ caching:

    class X:
        pass

    def cga(self, name):
        print name

    def iga(name):
        print name

    x = X()
    x.__dict__['__getattr__'] = iga   # 1.
    x.__getattr__ = iga               # 2.
    X.__dict__['__getattr__'] = cga   # 3.
    X.__getattr__ = cga               # 4.
    x.a

According to the manual
http://www.python.org/doc/current/ref/customization.html
x.a should fail with all the variants; they should have no effect.
In practice 4. works. Is that an implementation/manual mismatch, is
this intended, is there code around using 4.?

I'm asking this because jython has differences/bugs in this respect.
I imagine that 1.-4. should work for all other __magic__ methods
(this should be fixed in jython for some methods), OTOH jython has
such a restriction on __del__ too, and this one cannot be removed
(is not simply a matter of caching/non caching).

regards, Samuele Pedroni.

From Greg.Wilson at baltimore.com Fri Jun 1 14:59:28 2001
From: Greg.Wilson at baltimore.com (Greg Wilson)
Date: Fri, 1 Jun 2001 08:59:28 -0400
Subject: [Python-Dev] re: %b format
Message-ID: <930BBCA4CEBBD411BE6500508BB3328F2E1E47@nsamcanms1.ca.baltimore.com>

My thanks to everyone who commented on the idea of adding a binary
format specifier to Python. I'll volunteer to draft the PEP ---
volunteers for a co-author?

Greg
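For reference, a sketch of the conversion such a %b specifier might
perform, written as a plain Python function (the helper name and the
non-negative-only restriction are invented here for illustration):

    def binstr(n):
        # peel off binary digits, least significant first
        if n < 0:
            raise ValueError, "sketch handles non-negative values only"
        digits = []
        while 1:
            digits.append(chr(ord('0') + (n & 1)))
            n = n >> 1
            if n == 0:
                break
        digits.reverse()
        return ''.join(digits)

    # e.g. binstr(10) == '1010', binstr(0) == '0'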
From tismer at tismer.com Fri Jun 1 15:56:26 2001
From: tismer at tismer.com (Christian Tismer)
Date: Fri, 01 Jun 2001 15:56:26 +0200
Subject: [Python-Dev] One more dict trick
References:
Message-ID: <3B179F0A.CFA3B2C@tismer.com>

Tim Peters wrote:
>
> Another version of the patch attached, a bit faster and with a large new
> comment block explaining it. It's looking good! As I hope the new comments
> make clear, nothing about this approach is "a mystery" -- there are
> explainable reasons for each fiddly bit. This gives me more confidence in
> it than in the previous approach, and, indeed, it turned out that when I
> *thought* "hmm! I bet this change would be a little faster!", it actually
> was .

Thanks a lot for this nice patch. It looks like a real improvement.

Also thanks for mentioning my division idea. Since all bits of the
hash are eventually taken into account, this idea has somehow
survived in an even more efficient solution, good end, file closed.
(and good that I saved the time to check my patch in, lately :-)

cheers - chris

--
Christian Tismer :^)
Mission Impossible 5oftware : Have a break! Take a ride on Python's
Kaunstr. 26 : *Starship* http://starship.python.net/
14163 Berlin : PGP key -> http://wwwkeys.pgp.net/
PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF
where do you want to jump today? http://www.stackless.com/

From pedroni at inf.ethz.ch Fri Jun 1 16:18:20 2001
From: pedroni at inf.ethz.ch (Samuele Pedroni)
Date: Fri, 1 Jun 2001 16:18:20 +0200 (MET DST)
Subject: [Python-Dev] Re: [Jython-dev] Using PyChecker in Jython
Message-ID: <200106011418.QAA13570@core.inf.ethz.ch>

Hi.

[Neal Norwitz]
> Hello!
>
> I have created a program PyChecker to perform Python source code checking.
> (http://pychecker.sourceforge.net).
>
> PyChecker is implemented in C Python and does some "tricky" things.
> It doesn't currently work in Jython due to the module dis (disassemble code)
> not being available in Jython.
>
> Is there any fundamental problem with getting PyChecker to work under Jython?
>
> Here's a high-level overview of what PyChecker does:
>
>     imp.find_module()
>     imp.load_module()
>     for each object in dir(module):
>         # object can be a class, function, imported module, etc.
>         for each instruction in disassembled byte code:
>             # handle each instruction appropriately
>
> This hides a lot of details, but I do lots of things like getting the
> code objects from the classes, methods, and functions, look at the
> arguments in functions, etc.
>
> Is it possible to make it work in Jython? Easy?
>
> Thanks for any guidance,
> Neal

It would be great - really - but about easy? As easy as making
PyChecker work on source code without using dis and without
importing/executing modules and their top defs. I think there will be
no dis support on jython side (we produce java bytecode and getting
"back" to python vm bytecode would be very tricky, not very elegant,
etc.) any time soon .
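For concreteness, a minimal sketch of the CPython-only bytecode walk
that Neal's overview implies (Python 2.x; dis.opname and
dis.HAVE_ARGUMENT are real, the function name is invented, and
EXTENDED_ARG is ignored for brevity):

    import dis

    def walk_ops(code):
        # step through the raw bytecode string of a code object
        bytes = code.co_code
        i, n = 0, len(bytes)
        while i < n:
            op = ord(bytes[i])
            if op >= dis.HAVE_ARGUMENT:
                arg = ord(bytes[i+1]) + 256 * ord(bytes[i+2])
                i = i + 3
            else:
                arg = None
                i = i + 1
            print dis.opname[op], arg

It is exactly this kind of co_code poking that has no Jython
equivalent, since Jython compiles straight to JVM bytecode.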
Seriously, two possible workaround hacks (they are also not very
easy); this is just after small brainstorming and ignoring the
concrete needs and code of PyChecker:

+) the more elegant one, but maybe still too difficult or requiring
too much work: let PyChecker run under CPython even when checking
jython code. Jython code can compile down to py vm bytecode but then
does not run: why? java class imports and the jython specific builtin
modules (not so many). So one needs to implement a sufficient amount
of python code (an import hook, etc) that does the minimal partial
evaluation required and the required amount of loading&introspection
on java, jython specific stuff in order to have the imports work and
PyChecker fed with the things it needs. This means dealing with the
java class format, or a two-pass approach: run the code under jython
in order to gather the information needed to load it successfully
under python. If the top level code contains conditionals that depend
on jython stuff this could be hard, but one can ignore that (at least
for starting). Clearly the main PyChecker loop would require some
adaptation, and maybe include some logic to check some jython
specific stuff (subclassing from java, etc).

*) let an adapted PyChecker run under jython, and obtain the needed
py vm bytecode stream from a source -> py vm bytecode compiler
written in python (such a thing exists - if I remember well).

And similar ideas ...

regards, Samuele Pedroni.

From barry at digicool.com Fri Jun 1 16:43:59 2001
From: barry at digicool.com (Barry A. Warsaw)
Date: Fri, 1 Jun 2001 10:43:59 -0400
Subject: [Python-Dev] Damn... I think I might have just muffed a checkin
References: <15126.34825.167026.520535@beluga.mojam.com> <20010601092800.K690@xs4all.nl>
Message-ID: <15127.43567.202950.192811@anthem.wooz.org>

>>>>> "TW" == Thomas Wouters writes:

  TW> You can remove stickyness by using 'cvs update -A'. I
  TW> personally just have two trees, ~/python/python-2.2 and
  TW> ~/python/python-2.1.1, where the last one was checked out with
  TW> -rrelease21-maint.

Very good advice for anybody playing with branches!

-Barry

From barry at digicool.com Fri Jun 1 17:12:33 2001
From: barry at digicool.com (Barry A. Warsaw)
Date: Fri, 1 Jun 2001 11:12:33 -0400
Subject: [Python-Dev] another dict crasher
References:
Message-ID: <15127.45281.435849.822222@anthem.wooz.org>

>>>>> "MH" == Michael Hudson writes:

  MH> segfaults both 2.1 and current (well, maybe a day old) CVS.
  MH> Haven't tried Tim's latest patch, but I don't believe that
  MH> will make any difference.

That is highly, highly nasty. Sounds to me like there ought to be an
emergency 2.1.1 patch made for this, bumping Thomas's work to 2.1.2
if necessary. And if we can trojan in the NAIPL (New And Improved
Python License), I wouldn't mind. :)

-Barry

From jeremy at digicool.com Fri Jun 1 17:18:05 2001
From: jeremy at digicool.com (Jeremy Hylton)
Date: Fri, 1 Jun 2001 11:18:05 -0400 (EDT)
Subject: [Python-Dev] another dict crasher
In-Reply-To: <15127.45281.435849.822222@anthem.wooz.org>
References: <15127.45281.435849.822222@anthem.wooz.org>
Message-ID: <15127.45613.947590.246269@slothrop.digicool.com>

>>>>> "BAW" == Barry A Warsaw writes:
>>>>> "MH" == Michael Hudson writes:

  MH> segfaults both 2.1 and current (well, maybe a day old) CVS.
  MH> Haven't tried Tim's latest patch, but I don't believe that will
  MH> make any difference.

  BAW> That is highly, highly nasty.
  BAW> Sounds to me like there ought to be an emergency 2.1.1 patch
  BAW> made for this, bumping Thomas's work to 2.1.2 if necessary.
  BAW> And if we can trojan in the NAIPL (New And Improved Python
  BAW> License), I wouldn't mind. :)

We can release a critical patch for this bug, a la the
CriticalPatches page for the Python 2.0 release.

Jeremy

From mwh at python.net Fri Jun 1 18:03:55 2001
From: mwh at python.net (Michael Hudson)
Date: Fri, 1 Jun 2001 17:03:55 +0100 (BST)
Subject: [Python-Dev] another dict crasher
In-Reply-To: <15127.45281.435849.822222@anthem.wooz.org>
Message-ID:

On Fri, 1 Jun 2001, Barry A. Warsaw wrote:
>
> >>>>> "MH" == Michael Hudson writes:
>
> MH> segfaults both 2.1 and current (well, maybe a day old) CVS.
> MH> Haven't tried Tim's latest patch, but I don't believe that
> MH> will make any difference.
>
> That is highly, highly nasty.

Yes.

> Sounds to me like there ought to be an emergency 2.1.1 patch made for
> this, bumping Thomas's work to 2.1.2 if necessary.

Really? Two mild counterpoints:

1) It's *old*; 1.5.2 at least, and that's only because that's the
oldest version I happen to have lying around. It's quite similar to
the test_mutants oddness in some ways.

2) There's at least one other crasher in 2.1; the one in the compiler
where a variable is referenced in a class and in a contained method.
(I've actually run into that one).

But a "fix these crashers" release seems reasonable if there's
someone with the time to put it out (not me!).

> And if we can trojan in the NAIPL (New And Improved Python
> License), I wouldn't mind. :)

Well me neither...

Cheers,
M.

From skip at pobox.com Fri Jun 1 18:26:35 2001
From: skip at pobox.com (Skip Montanaro)
Date: Fri, 1 Jun 2001 11:26:35 -0500
Subject: [Python-Dev] Damn... I think I might have just muffed a checkin
In-Reply-To: <20010601092800.K690@xs4all.nl>
References: <15126.34825.167026.520535@beluga.mojam.com> <20010601092800.K690@xs4all.nl>
Message-ID: <15127.49723.186388.220648@beluga.mojam.com>

  Thomas> I personally just have two trees, ~/python/python-2.2 and
  Thomas> ~/python/python-2.1.1, where the last one was checked out with
  Thomas> -rrelease21-maint.

Thanks, good advice. httplib.py has now been updated on both the head
and release21-maint branches.

Skip

From loewis at informatik.hu-berlin.de Fri Jun 1 19:07:52 2001
From: loewis at informatik.hu-berlin.de (Martin von Loewis)
Date: Fri, 1 Jun 2001 19:07:52 +0200 (MEST)
Subject: [Python-Dev] METH_NOARGS calling convention
Message-ID: <200106011707.TAA21329@pandora.informatik.hu-berlin.de>

The patch

http://sourceforge.net/tracker/?func=detail&atid=305470&aid=427190&group_id=5470

introduces two new calling conventions, METH_O and METH_NOARGS. The
rationale for METH_O has been discussed already; the rationale for
METH_NOARGS is that it allows a convenient simplification (plus a
marginal speed-up) of functions which do either PyArg_NoArgs(args) or
PyArg_ParseTuple(args, ":function_name").

Now, one open issue is whether the METH_NOARGS functions should have
a signature of

    PyObject * (*unaryfunc)(PyObject *);

or of

    PyObject *(*PyCFunction)(PyObject *, PyObject *);

which then would be called with a NULL second argument; the first
argument would be self in either case.

IMO, the advantage of passing the NULL argument is that NOARGS
methods don't need to be cast into PyCFunction in the method table;
the advantage of the first approach is that it is clearer in the
function implementation.

Any opinions which signature to use?

Regards,
Martin
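To make the tradeoff concrete, a sketch of how each candidate
signature reads in an extension module (the module and function names
are invented for illustration; this is not code from the patch):

    #include "Python.h"

    /* Option 1: self-only signature -- clearer in the function body,
       but needs a cast in the method table. */
    static PyObject *
    thing_clear(PyObject *self)
    {
            Py_INCREF(Py_None);
            return Py_None;
    }

    /* Option 2: full PyCFunction signature -- no cast needed, but
       'args' is always NULL here and otherwise unused. */
    static PyObject *
    thing_reset(PyObject *self, PyObject *args)
    {
            Py_INCREF(Py_None);
            return Py_None;
    }

    static PyMethodDef thing_methods[] = {
            {"clear", (PyCFunction)thing_clear, METH_NOARGS},
            {"reset", thing_reset,              METH_NOARGS},
            {NULL, NULL}
    };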
From mal at lemburg.com Fri Jun 1 19:18:21 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Fri, 01 Jun 2001 19:18:21 +0200
Subject: [Python-Dev] METH_NOARGS calling convention
References: <200106011707.TAA21329@pandora.informatik.hu-berlin.de>
Message-ID: <3B17CE5D.9D4CE8D4@lemburg.com>

Martin von Loewis wrote:
>
> The patch
>
> http://sourceforge.net/tracker/?func=detail&atid=305470&aid=427190&group_id=5470
>
> introduces two new calling conventions, METH_O and METH_NOARGS. The
> rationale for METH_O has been discussed already; the rationale for
> METH_NOARGS is that it allows a convenient simplification (plus a
> marginal speed-up) of functions which do either PyArg_NoArgs(args) or
> PyArg_ParseTuple(args, ":function_name").
>
> Now, one open issue is whether the METH_NOARGS functions should have
> a signature of
>
>     PyObject * (*unaryfunc)(PyObject *);
>
> or of
>
>     PyObject *(*PyCFunction)(PyObject *, PyObject *);
>
> which then would be called with a NULL second argument; the first
> argument would be self in either case.
>
> IMO, the advantage of passing the NULL argument is that NOARGS methods
> don't need to be cast into PyCFunction in the method table; the
> advantage of the first approach is that it is clearer in the function
> implementation.
>
> Any opinions which signature to use?

The second... I'm not sure how you will get extension writers who
have to maintain packages for all three Python versions to ever
change their code to use the new style calling scheme: there simply
is no clean way to use the same code base unless you are willing to
add tons of #ifdefs.

--
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting: http://www.egenix.com/
Python Software: http://www.lemburg.com/python/

From fdrake at acm.org Fri Jun 1 19:31:15 2001
From: fdrake at acm.org (Fred L. Drake, Jr.)
Date: Fri, 1 Jun 2001 13:31:15 -0400 (EDT)
Subject: [Python-Dev] METH_NOARGS calling convention
In-Reply-To: <3B17CE5D.9D4CE8D4@lemburg.com>
References: <200106011707.TAA21329@pandora.informatik.hu-berlin.de> <3B17CE5D.9D4CE8D4@lemburg.com>
Message-ID: <15127.53603.87216.103262@cj42289-a.reston1.va.home.com>

M.-A. Lemburg writes:
> > Any opinions which signature to use?
>
> The second...

Seconded. ;-)

> I'm not sure how you will get extension writers who
> have to maintain packages for all three Python versions to
> ever change their code to use the new style calling scheme:
> there simply is no clean way to use the same code base unless
> you are willing to add tons of #ifdefs.

You won't, and that's OK. Even if 3rd-party extensions never use it,
there are plenty of functions/methods in the standard distribution
which can use it, and I imagine those would be converted fairly
quickly.

-Fred

--
Fred L. Drake, Jr.
PythonLabs at Digital Creations

From tismer at tismer.com Fri Jun 1 20:29:11 2001
From: tismer at tismer.com (Christian Tismer)
Date: Fri, 01 Jun 2001 20:29:11 +0200
Subject: [Python-Dev] Marshal bug in 2.1?
Message-ID: <3B17DEF7.3E7C6BC6@tismer.com>

Hi friends,

there is a script which generates encrypted passwords for Starship
users. There is a series of marshal, zlib and base64 calls, which is
reversed by the script.

Is there a known bug in Marshal, or should I start the debugger now?
The passphrase for the attached script is "hey".

cheers - chris

--
Christian Tismer :^)
Mission Impossible 5oftware : Have a break! Take a ride on Python's
Kaunstr. 26 : *Starship* http://starship.python.net/
14163 Berlin : PGP key -> http://wwwkeys.pgp.net/
PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF
where do you want to jump today? http://www.stackless.com/

-------------- next part --------------
import marshal,base64,zlib
exec marshal.loads(zlib.decompress(base64.decodestring("""
eJytVM+PGzUUfs6PzWZYwapAqbbAuiyF6Yqsqt2iomq1HGkvuQQJaS+pM3YzbjP2yHY6CdrVHNr+
Exz5L/gn4MidC2f+Az5Pkq0QlFMnmTf2s+d73/vmPWeEq43b/wxT498mSXSOwbskGZ0zqm+QbNF5
i+o9km16idU21bdIdUh26GmLrCRWf0ayS8+6dN6l+oAU0XcP689JbZHcohfA6VF9mxQj1SbVi57r
2PAFqS7p7bVH9+kFkew1mDvA/JJUCziGEYs3AozS7ch1yIiSg7dwJfjxzCkRVFml4Q7ng8F6zgUv
hfeVdZLzJ84WXJgln+rnyvCgFuEIbzoV5s54/g3PcuFEFpTzvMp1lnPhFM9sUc6DklwboEmF5UIb
7YPO8PJkHvhz5ZbcWDOYaaOE45VYrmI18N/n2sctXlvDMczmPthC/wjEJ9bxUrtFTOBt6OAPoqSH
h4c85MqrdUaeT1SoFDIenJ0OmpyWdu5AxDllwmuB8GLC33gNzm7700EytBWfA3s0esiD5TM7hTAY
+IBIuS6PymXIrTkyKiRYjKL5+MI607nXZsrVAjLPlpHmFck0m+lyYgWIOAXRC2UkNHowuJMII+Mm
M10zv2K8QosojUvy0tmpE0WyomQLFfK4o7BIGgUhxWSmjhJ/F/U3CdVX/BHPRKyE2SwiA0mEVQgI
g49agXtmIVMWbmWMOvi1yZexyfaovhmb7BnRJWsGjC7RXh/TBZqgFdsO3XCJJvuELtqkO3RB0cPq
T5v5VmyTSwDt00WLdI/CduxQNGbc14pNGm2H+Ajgo7SLoEPfhz25e3x8cv/eyX0wYuADRjepAQpE
ga3jIP514H2E4SiNZ8NQj2E1h2nmPposd80TYnrUDi3SaFdD/37c8O9q9bF7T2eimEhxtk8+Hj6N
0XEh7W+wC/m134qT4PANGpdRVYMtm4V5KdGijSM0DqmnygffwfCp1WaFIsq0s+EU/gt4Bfh/ZDdn
wx75JJ6U7EN2je2y91izOh4XQpvxeOj3MStnSqC88f1RsqtSiMXKy9zB/8DvYs/jH/46fWR+q3+v
fv3lz5/+eJUmm5ylzRr6eB5vBif/4LAOaUShxuOrdKJoTlRjbXDWNN6wCFeSvdYmbcR+U65RiW9R
Dh/gufNOP+m3dnq7bIdtI9VrbJ/9DYOcdyU=
""")))

From tismer at tismer.com Fri Jun 1 20:47:02 2001
From: tismer at tismer.com (Christian Tismer)
Date: Fri, 01 Jun 2001 20:47:02 +0200
Subject: [Python-Dev] Marshal bug in 2.1?
References: <3B17DEF7.3E7C6BC6@tismer.com>
Message-ID: <3B17E326.41D82CCE@tismer.com>

Christian Tismer wrote:
>
> Hi friends,
>
> there is a script which generates encrypted passwords for
> Starship users. There is a series of marshal, zlib and base64
> calls, which is reversed by the script.
>
> Is there a known bug in Marshal, or should I start the debugger now?
> The passphrase for the attached script is "hey".

Aehmmm... can it be that code objects are no longer compatible
between Python 2.0 and 2.1?

sigh - ciao - chris

--
Christian Tismer :^)
Mission Impossible 5oftware : Have a break! Take a ride on Python's
Kaunstr. 26 : *Starship* http://starship.python.net/
14163 Berlin : PGP key -> http://wwwkeys.pgp.net/
PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF
where do you want to jump today? http://www.stackless.com/

From mwh at python.net Fri Jun 1 20:52:17 2001
From: mwh at python.net (Michael Hudson)
Date: 01 Jun 2001 19:52:17 +0100
Subject: [Python-Dev] another dict crasher
In-Reply-To: barry@digicool.com's message of "Fri, 1 Jun 2001 11:12:33 -0400"
References: <15127.45281.435849.822222@anthem.wooz.org>
Message-ID:

Warning! VERY SICK CODE INDEED ahead!

barry at digicool.com (Barry A. Warsaw) writes:

> >>>>> "MH" == Michael Hudson writes:
>
> MH> segfaults both 2.1 and current (well, maybe a day old) CVS.
> MH> Haven't tried Tim's latest patch, but I don't believe that
> MH> will make any difference.
>
> That is highly, highly nasty.

Not as nasty as this, though:

    dict = {}

    # let's force dict to malloc its table
    for i in range(1,10):
        dict[i] = i

    class Machiavelli:
        def __repr__(self):
            dict.clear()
            print # doesn't crash without this.
                  # don't know why
            return `"machiavelli"`
        def __hash__(self):
            return 0

    dict[Machiavelli()] = Machiavelli()

    print dict

gives, even with my posted patch to dictobject.c:

    $ ./python crash2.py
    {
    Segmentation fault (core dumped)

Any ideas what the above code should do? (Other than use the secret
PSU website to hire a hitman and shoot whoever wrote the code, I
mean).

Cheers,
M.

--
Well, yes. I don't think I'd put something like "penchant for anal
play" and "able to wield a buttplug" in a CV unless it was relevant
to the gig being applied for...
-- Matt McLeod, alt.sysadmin.recovery

From mal at lemburg.com Fri Jun 1 21:01:38 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Fri, 01 Jun 2001 21:01:38 +0200
Subject: [Python-Dev] Marshal bug in 2.1?
References: <3B17DEF7.3E7C6BC6@tismer.com> <3B17E326.41D82CCE@tismer.com>
Message-ID: <3B17E692.281A329B@lemburg.com>

Christian Tismer wrote:
>
> Christian Tismer wrote:
> >
> > there is a script which generates encrypted passwords for
> > Starship users. There is a series of marshal, zlib and base64
> > calls, which is reversed by the script.
> >
> > Is there a known bug in Marshal, or should I start the debugger now?
> > The passphrase for the attached script is "hey".
>
> Aehmmm... can it be that code objects are no longer compatible
> between Python 2.0 and 2.1?

Yes, not surprisingly though... AFAIK the pyc format changed in every
single version between 1.5.2 and 2.1.

--
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting: http://www.egenix.com/
Python Software: http://www.lemburg.com/python/

From tim.one at home.com Fri Jun 1 22:36:21 2001
From: tim.one at home.com (Tim Peters)
Date: Fri, 1 Jun 2001 16:36:21 -0400
Subject: [Python-Dev] another dict crasher
In-Reply-To:
Message-ID:

I suspect there are many ways to get the dict code to blow up, and
always have been. I picked on dict compare a month or so ago mostly
because nobody cares how fast that runs except in the == and !=
cases. Others are a real bitch; for example, the fundamental lookdict
function caches

    dictentry *ep0 = mp->ma_table;

at the start as if it were invariant -- but very unlikely sequences
of collisions with identical hash codes combined with mutating
comparisons can turn that into a bogus pointer.

List objects used to have similar vulnerabilities during sorting
(where comparison is the *norm*, not a one-in-a-billion freak
occurrence), and no amount of slow-the-code paranoia sufficed to plug
all conceivable holes. In the end we invented an internal "immutable
list type", and replace the list object's type pointer for the
duration of the sort (you can still try to mutate a list during a
sort, but all the mutating list methods are redirected to raise an
exception when you do).

The dict code has even more holes and in more places, but they're
generally much harder to provoke, so they've gone unnoticed for 10
years. All in all, seemed like a good tradeoff to me .

From tim.one at home.com Sat Jun 2 00:08:32 2001
From: tim.one at home.com (Tim Peters)
Date: Fri, 1 Jun 2001 18:08:32 -0400
Subject: [Python-Dev] METH_NOARGS calling convention
In-Reply-To: <200106011707.TAA21329@pandora.informatik.hu-berlin.de>
Message-ID:

Cool!

[Martin von Loewis]
> ...
> Now, one open issue is whether the METH_NOARGS functions should have
> a signature of
>
>     PyObject * (*unaryfunc)(PyObject *);
>
> or of
>
>     PyObject *(*PyCFunction)(PyObject *, PyObject *);
>
> which then would be called with a NULL second argument; the first
> argument would be self in either case.
>
> IMO, the advantage of passing the NULL argument is that NOARGS methods
> don't need to be cast into PyCFunction in the method table; the
> advantage of the first approach is that it is clearer in the function
> implementation.
>
> Any opinions which signature to use?

The one that makes sense : declare functions with the number of
arguments they use. I don't care about needing to cast in the table:
you do that once, but people read the *code* over and over, and an
unused arg will be a mystery (or even a source of compiler warnings)
every time you bump into one.

The only way needing to cast could be "a problem" is if this remains
an undocumented gimmick that developers have to reverse-engineer from
staring at the (distributed all over the place) implementation. I
like what the patch does, but I'd reject it just for continuing to
leave this stuff Utterly Mysterious: please add comments saying what
METH_NOARGS and METH_O *mean*: what's the point, why are these
defined, how and when are you supposed to use them? That's where to
explain the need to cast METH_NOARGS.

From thomas at xs4all.net Sat Jun 2 00:42:35 2001
From: thomas at xs4all.net (Thomas Wouters)
Date: Sat, 2 Jun 2001 00:42:35 +0200
Subject: [Python-Dev] another dict crasher
In-Reply-To: <15127.45281.435849.822222@anthem.wooz.org>; from barry@digicool.com on Fri, Jun 01, 2001 at 11:12:33AM -0400
References: <15127.45281.435849.822222@anthem.wooz.org>
Message-ID: <20010602004235.Q690@xs4all.nl>

On Fri, Jun 01, 2001 at 11:12:33AM -0400, Barry A. Warsaw wrote:
>
> >>>>> "MH" == Michael Hudson writes:
> MH> segfaults both 2.1 and current (well, maybe a day old) CVS.
> MH> Haven't tried Tim's latest patch, but I don't believe that
> MH> will make any difference.

> That is highly, highly nasty. Sounds to me like there ought to be an
> emergency 2.1.1 patch made for this, bumping Thomas's work to 2.1.2 if
> necessary.

Why bump 'my work' ? I'm just reviewing patches checked into the
head. A fix for the above problems would fit in a patch release very
nicely, and a release is a release. Besides, releasing 2.1.1 as 2.1 +
dict fix would be a CVS nightmare. Unless you propose to keep it out
of CVS, Barry ? :)

> And if we can trojan in the NAIPL (New And Improved Python
> License), I wouldn't mind. :)

I'll channel Guido by saying he wouldn't even allow us to ship it
with anything other than the PSF licence :)

Gee-I'd-almost-think-you-had-a-need-for-an-FSF-suffered-licence-ly y'rs

--
Thomas Wouters

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!

From thomas at xs4all.net Sat Jun 2 00:47:16 2001
From: thomas at xs4all.net (Thomas Wouters)
Date: Sat, 2 Jun 2001 00:47:16 +0200
Subject: [Python-Dev] Marshal bug in 2.1?
In-Reply-To: <3B17E692.281A329B@lemburg.com>; from mal@lemburg.com on Fri, Jun 01, 2001 at 09:01:38PM +0200
References: <3B17DEF7.3E7C6BC6@tismer.com> <3B17E326.41D82CCE@tismer.com> <3B17E692.281A329B@lemburg.com>
Message-ID: <20010602004716.R690@xs4all.nl>

On Fri, Jun 01, 2001 at 09:01:38PM +0200, M.-A. Lemburg wrote:

> Yes, not surprisingly though... AFAIK the pyc format changed
> in every single version between 1.5.2 and 2.1.
Worse, it's changed several times between each release :) -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From barry at digicool.com Sat Jun 2 01:12:30 2001 From: barry at digicool.com (Barry A. Warsaw) Date: Fri, 1 Jun 2001 19:12:30 -0400 Subject: [Python-Dev] another dict crasher References: <15127.45281.435849.822222@anthem.wooz.org> <20010602004235.Q690@xs4all.nl> Message-ID: <15128.8542.51241.192412@anthem.wooz.org> >>>>> "TW" == Thomas Wouters writes: >> That is highly, highly nasty. Sounds to me like there ought to >> be an emergency 2.1.1 patch made for this, bumping Thomas's >> work to 2.1.2 if necessary. TW> Why bump 'my work' ? I'm just reviewing patches checked into TW> the head. A fix for the above problems would fit in a patch TW> release very nicely, and a release is a release. Besides, TW> releasing 2.1.1 as 2.1 + dict fix would be a CVS TW> nightmare. Unless you propose to keep it out of CVS, Barry ? TW> :) Oh no! You know me, I like to release those maintenance releases early and often. :) Anyway, that's why /you're/ the 2.1.1 czar. >> And if we can trojan in the NAIPL (New And Improved Python >> License), I wouldn't mind. :) TW> I'll channel Guido by saying he wouldn't even allow us to ship TW> it with anything other than the PSF licence :) :) TW> Gee-I'd-almost-think-you-had-a-need-for-an-FSF-suffered-licence-ly TW> y'rs Where'd you get /that/ idea? :) -Barry From mwh at python.net Sat Jun 2 01:20:26 2001 From: mwh at python.net (Michael Hudson) Date: 02 Jun 2001 00:20:26 +0100 Subject: [Python-Dev] another dict crasher In-Reply-To: "Tim Peters"'s message of "Fri, 1 Jun 2001 16:36:21 -0400" References: Message-ID: "Tim Peters" writes: > The dict code has even more holes and in more places, but they're > generally much harder to provoke, so they've gone unnoticed for 10 > years. All in all, seemed like a good tradeoff to me . Are you suggesting that we should just leave these crashers in? They're not *particularly* hard to provoke if you know the implementation - and I was inspired to look for them by someone's report of actually running into one. Cheers, M. -- Java sucks. [...] Java on TV set top boxes will suck so hard it might well inhale people from off their sofa until their heads get wedged in the card slots. --- Jon Rabone, ucam.chat From tim.one at home.com Sat Jun 2 03:04:36 2001 From: tim.one at home.com (Tim Peters) Date: Fri, 1 Jun 2001 21:04:36 -0400 Subject: [Python-Dev] another dict crasher In-Reply-To: Message-ID: [Michael Hudson] > Are you suggesting that we should just leave these crashers in? > They're not *particularly* hard to provoke if you know the > implementation - and I was inspired to look for them by someone's > report of actually running into one. I certainly don't object to fixing ones that bite innocent users, but there are also costs of several kinds. In this case, I couldn't care less how long printing a dict takes -- go for it. When adversarial abuse starts interfering with the speed of crucial operations, though, I'm simply not a "safety at any cost" person. Guido is much more of one, although the number of holes remaining in Python could plausibly fill Albert Hall . short-of-50-easy-ways-to-crash-win98-just-think-hard-about-each-"+"-in- the-code-base-ly y'rs - tim From gstein at lyra.org Sat Jun 2 07:52:03 2001 From: gstein at lyra.org (Greg Stein) Date: Fri, 1 Jun 2001 22:52:03 -0700 Subject: [Python-Dev] strop vs. 
string
In-Reply-To: ; from tim.one@home.com on Sun, May 27, 2001 at 09:42:30PM -0400
References: <3B10D758.3741AC2F@lemburg.com>
Message-ID: <20010601225203.R23560@lyra.org>

On Sun, May 27, 2001 at 09:42:30PM -0400, Tim Peters wrote:
>...
> [Greg Ewing]
> > I think it would be safe if:
> >
> > 1) it kept a reference to the underlying object, and
>
> That much it already does.
>
> > 2) it re-fetched the pointer and length info each time it was
> > needed, using the underlying object's buffer interface.
>
> If after
>
>     b = buffer(some_object)
>
> b.__getitem__ needed to refetch the info between
>
>     b[i]
> and
>     b[i+1]
>
> I expect it would be so slow even Greg wouldn't want it anymore.

Huh? I don't think it would be all that slow. It is just a function
call. And I don't think that the getitem slot is really used all that
frequently (in a loop) for buffer type objects.

I've been thinking that refetching the ptr/len is the right fix.

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/

From gstein at lyra.org  Sat Jun  2 07:54:23 2001
From: gstein at lyra.org (Greg Stein)
Date: Fri, 1 Jun 2001 22:54:23 -0700
Subject: [Python-Dev] strop vs. string
In-Reply-To: ; from tim.one@home.com on Sat, May 26, 2001 at 02:44:04AM -0400
References: <3B0ED784.FC53D01@lemburg.com>
Message-ID: <20010601225423.S23560@lyra.org>

On Sat, May 26, 2001 at 02:44:04AM -0400, Tim Peters wrote:
> The buffer object has been neglected for years: is that because it's in
> prime shape, or because nobody cares about it enough to maintain it?

"Works for me" :-)

Part of the neglect is also based on Guido's ambivalence. Part is that I
haven't needed more from it. The day that I do, then I'll code it up :-)
But that doesn't help the "generic" case, unfortunately.

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/

From gstein at lyra.org  Sat Jun  2 07:55:33 2001
From: gstein at lyra.org (Greg Stein)
Date: Fri, 1 Jun 2001 22:55:33 -0700
Subject: [Python-Dev] strop vs. string
In-Reply-To: <3B0FD023.C4588919@lemburg.com>; from mal@lemburg.com on Sat, May 26, 2001 at 05:47:47PM +0200
References: <3B0FD023.C4588919@lemburg.com>
Message-ID: <20010601225533.T23560@lyra.org>

On Sat, May 26, 2001 at 05:47:47PM +0200, M.-A. Lemburg wrote:
>...
> Even the idea of replacing the usage of strings as data buffers
> with buffer object didn't get very far; common habits are simply
> hard to break.

That idea was shot down when Guido said that 'c' arrays should be the
"official form of a data buffer."

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/

From tim.one at home.com  Sat Jun  2 08:13:49 2001
From: tim.one at home.com (Tim Peters)
Date: Sat, 2 Jun 2001 02:13:49 -0400
Subject: [Python-Dev] another dict crasher
In-Reply-To: 
Message-ID: 

[Michael Hudson]
> Actually this crash was dict_print (I always forget about tp_print...).

We all should .

> It's pretty easy to mend:
>
> *** dictobject.c	Fri Jun  1 13:08:13 2001
> --- dictobject.c-fixed	Fri Jun  1 12:59:07 2001
> ***************
> *** 793,795 ****
>   	any = 0;
> ! 	for (i = 0, ep = mp->ma_table; i < mp->ma_size; i++, ep++) {
>   		if (ep->me_value != NULL) {
> --- 793,796 ----
>   	any = 0;
> ! 	for (i = 0; i < mp->ma_size; i++) {
> ! 		ep = &mp->ma_table[i];
>   		if (ep->me_value != NULL) {
> ***************
> *** 833,835 ****
>   	any = 0;
> ! 	for (i = 0, ep = mp->ma_table; i < mp->ma_size && v; i++, ep++) {
>   		if (ep->me_value != NULL) {
> --- 834,837 ----
>   	any = 0;
> ! 	for (i = 0; i < mp->ma_size && v; i++) {
> ! 		ep = &mp->ma_table[i];
>   		if (ep->me_value != NULL) {
>
> I'm not sure this stops still more Machiavellian behaviour from
> crashing the interpreter,

Alas, it doesn't.  You can't trust *anything* about a container you're
iterating over across any call that may call back into Python.  In these
cases, the call to PyObject_Repr() can execute any code at all, including
code that mutates the dict you're crawling over.  In particular, calling
PyObject_Repr() to format the key means the

    ep = &mp->ma_table[i]

pointer may be trash by the time PyObject_Repr() is called again to
format the value.

See characterize() for the pain it takes to guard against everything,
including encouraging comments like:

    if (cmp > 0 ||
        i >= a->ma_size ||
        a->ma_table[i].me_value == NULL)
    {
        /* Not the *smallest* a key; or maybe it is
         * but the compare shrunk the dict so we can't
         * find its associated value anymore; or
         * maybe it is but the compare deleted the
         * a[thiskey] entry.
         */
        Py_DECREF(thiskey);
        continue;
    }

It should really add "or maybe it just shuffled the dict around and the
value at ma_table[i] is no longer associated with the key that *used* to
be at ma_table[i], but since there's still *some* non-NULL pointer there
we'll just pretend that didn't happen and press onward".

> and you can certainly get items being printed more than once or not
> at all.  I'm not sure this last is a problem;

Those don't matter:  in a long tradition, we buy "safety" not only at the
cost of bloating the code, but also by making the true behavior in case
of mutation unpredictable & inexplicable.  That's why I *really* liked
the "immutable list" trick in list.sort():  even if we could have made
the code bulletproof without it, we couldn't usefully explain what the
heck it actually did.  It's not Pythonic to blow up, but neither is it
Pythonic to be incomprehensible.  You simply can't win here.

> if the user's being this contrary there's only so much we can
> do to help him or her.

I'd prefer a similar internal immutable-dict trick that raised an
exception if the user was pushing Python into a corner where "blow up or
do something baffling" were its only choices.  That would render the
original example illegal, of course.  But would that be a bad thing?
What *should* it mean when the user invokes an operation on a container
and mutates the container during that operation?  There's almost no
chance that Jython does the same thing as CPython in all these cases, so
it's effectively undefined behavior no matter how you plug the holes
(short of raising an exception).

From tim.one at home.com  Sat Jun  2 08:34:43 2001
From: tim.one at home.com (Tim Peters)
Date: Sat, 2 Jun 2001 02:34:43 -0400
Subject: [Python-Dev] strop vs. string
In-Reply-To: <20010601225203.R23560@lyra.org>
Message-ID: 

[Tim]
> If after
>
>     b = buffer(some_object)
>
> b.__getitem__ needed to refetch the info between
>
>     b[i]
> and
>     b[i+1]
>
> I expect it would be so slow even Greg wouldn't want it anymore.

[Greg]
> Huh? I don't think it would be all that slow. It is just a function
> call. And I don't think that the getitem slot is really used all that
> frequently (in a loop) for buffer type objects.

I expect they index into the buffer memory directly then, right?  Then
for buffers obtained from mutable objects, any such loop is unsafe in the
absence of the GIL, or even in its presence if the loop contains code
that may call back into Python.

> I've been thinking that refetching the ptr/len is the right fix.
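(A minimal Python-level sketch of the hazard under discussion, assuming
the 2.x buffer() builtin and the array module; whether the last line
prints garbage or dumps core depends on what the platform allocator does
with the old block:

    import array

    a = array.array('c', 'x' * 100)
    b = buffer(a)               # the buffer caches a's raw pointer here
    print b[0]                  # 'x' -- fine so far
    a.fromstring('y' * 100000)  # growing the array reallocs its memory
    print b[0]                  # the cached pointer is now stale

Refetching the ptr/len from the underlying object on each access would
close exactly this window.)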
So is calling __getitem__ all the time then, unless you want to dance on the razor's edge. The idea that you can safely "borrow" memory from a mutable object without copying it is brittle. > Part of the neglect is also based on Guido's ambivalence. Part is > that I haven't needed more from it. The day that I do, then I'll > code it up :-) But that doesn't help the "generic" case, > unfortunately. I take that as "yes" to my "nobody cares about it enough to maintain it?". In that light, Guido's ambivalence is indeed surprising . From mwh at python.net Sat Jun 2 09:09:07 2001 From: mwh at python.net (Michael Hudson) Date: 02 Jun 2001 08:09:07 +0100 Subject: [Python-Dev] another dict crasher In-Reply-To: "Tim Peters"'s message of "Sat, 2 Jun 2001 02:13:49 -0400" References: Message-ID: "Tim Peters" writes: > [Michael Hudson] > > Actually this crash was dict_print (I always forget about tp_print...). > > We all should . > > > It's pretty easy to mend: [snip] > > I'm not sure this stops still more Machiavellian behaviour from > > crashing the interpreter, > > Alas, it doesn't. No, that's what my "dict[Machiavelli()] = Machiavelli()" example was demonstrating. If noone beats me to it, I'll post a better fix to sf next week, complete with test-cases and suitably "encouraging" comments. I can't easily see other examples of the problem; there certainly might be things you could do with comparisons that could trigger crashes, but that code's so hairy that it's almost impossible for me to be sure. There are two ways of constructing a software design: one way is to make it so simple that there are obviously no deficiencies and the other way is to make it so complicated that there are no obvious deficiencies. -- C. A. R. Hoare > > and you can certainly get items being printed more than once or not > > at all. I'm not sure this last is a problem; > > Those don't matter: in a long tradition, we buy "safety" not only at the > cost of bloating the code, but also by making the true behavior in case of > mutation unpredictable & inexplicable. This is what I thought. [snip] > > if the user's being this contrary there's only so much we can > > do to help him or her. > > I'd prefer a similar internal immutable-dict trick that raised an exception > if the user was pushing Python into a corner where "blow up or do something > baffling" were its only choices. That would render the original example > illegal, of course. But would that be a bad thing? It's hard to see how. > What *should* it mean when the user invokes an operation on a > container and mutates the container during that operation? I don't think there's a meaning you can attach to this kind of behaviour. The "immutable dict trick" looks better the more I think about it, but I guess that will have to wait until Guido gets back from the sun... Cheers, M. -- incidentally, asking why things are "left out of the language" is a good sign that the asker is fairly clueless. -- Erik Naggum, comp.lang.lisp From gstein at lyra.org Sat Jun 2 09:40:05 2001 From: gstein at lyra.org (Greg Stein) Date: Sat, 2 Jun 2001 00:40:05 -0700 Subject: [Python-Dev] strop vs. 
string In-Reply-To: ; from tim.one@home.com on Sat, Jun 02, 2001 at 02:34:43AM -0400 References: <20010601225203.R23560@lyra.org> Message-ID: <20010602004005.F23560@lyra.org> On Sat, Jun 02, 2001 at 02:34:43AM -0400, Tim Peters wrote: > [Tim] > > If after > > > > b = buffer(some_object) > > > > b.__getitem__ needed to refetch the info between > > > > b[i] > > and > > b[i+1] > > > > I expect it would be so slow even Greg wouldn't want it anymore. > > [Greg] > > Huh? I don't think it would be all that slow. It is just a function > > call. And I don't think that the getitem slot is really used all that > > frequently (in a loop) for buffer type objects. > > I expect they index into the buffer memory directly then, right? Then for > buffers obtained from mutable objects, any such loop is unsafe in the > absence of the GIL, or even in its presence if the loop contains code that > may call back into Python. Most access is: fetch ptr/len, index into the memory. And yes: anything within that loop which could conceivably change the target object (especially a call into Python) could move that ptr. I was saying that, at the Python level, using a loop and doing b[i] into a buffer/string/unicode object would seem to be relatively rare. b[0] and stuff is reasonably common. > > I've been thinking that refetching the ptr/len is the right fix. > > So is calling __getitem__ all the time then, unless you want to dance on the > razor's edge. The idea that you can safely "borrow" memory from a mutable > object without copying it is brittle. Stay in C code and don't call into Python. It is safe then. The buffer API is exactly what you're saying: borrow a memory reference. The concept makes a lot of things possible that weren't before. The buffer object's storing of that reference was a mistake. > > Part of the neglect is also based on Guido's ambivalence. Part is > > that I haven't needed more from it. The day that I do, then I'll > > code it up :-) But that doesn't help the "generic" case, > > unfortunately. > > I take that as "yes" to my "nobody cares about it enough to maintain it?". > In that light, Guido's ambivalence is indeed surprising . Eh? I'll maintain the thing, but you're confusing that with adding more features into it. Different question. Cheers, -g -- Greg Stein, http://www.lyra.org/ From tim.one at home.com Sat Jun 2 10:17:39 2001 From: tim.one at home.com (Tim Peters) Date: Sat, 2 Jun 2001 04:17:39 -0400 Subject: [Python-Dev] another dict crasher In-Reply-To: Message-ID: [Michael Hudson] > ... > If noone beats me to it, I'll post a better fix to sf next week, > complete with test-cases and suitably "encouraging" comments. Ah, no need -- looks like I was doing that while you were writing this. Checked in already. So long as we're happy to settle for senseless results that simply don't blow up, the only other trick you really needed was to save away the value in a local vrbl and incref it across the key->string bit; then you don't have to worry about key->string deleting the value, or about the table entry it lived in going away (because you get the value from the (still-incref'ed) *local* vrbl later, not from the table again). > I can't easily see other examples of the problem; there certainly > might be things you could do with comparisons that could trigger > crashes, but that code's so hairy that it's almost impossible for me > to be sure. 
It's easy to be sure:  any code that tries to remember anything about a
dict (ditto any mutable object) across a "dangerous" call, other than the
mere address of the object, is a place you *can* provoke a core dump.  It
may not be easy to provoke, and a given provoking test case may not fail
across all platforms, or even every time you run it on a single platform,
but it's "an obvious" hole all the same.

From tismer at tismer.com  Sat Jun  2 11:49:35 2001
From: tismer at tismer.com (Christian Tismer)
Date: Sat, 02 Jun 2001 11:49:35 +0200
Subject: [Python-Dev] Marshal bug in 2.1?
References: <3B17DEF7.3E7C6BC6@tismer.com> <3B17E326.41D82CCE@tismer.com> <3B17E692.281A329B@lemburg.com> <20010602004716.R690@xs4all.nl>
Message-ID: <3B18B6AE.88EA6926@tismer.com>

Thomas Wouters wrote:
> 
> On Fri, Jun 01, 2001 at 09:01:38PM +0200, M.-A. Lemburg wrote:
> 
> > Yes, not surprisingly though... AFAIK the pyc format changed
> > in every single version between 1.5.2 and 2.1.
> 
> Worse, it's changed several times between each release :)

But I didn't use .pyc at all, just a marshalled code object.
There are no version headers or such.
The same object worked in fact for Py 1.5.2 and 2.0, but no
longer with 2.1 .
I debugged the unmarshalling and saw what happened:
The new code objects with their new scoping features were
the problem. The new structures were simply added, and there
is no way to skip these for older code objects, since there
isn't any info.
Some option for marshal to unmarshal old-style code objects
would have helped.
But then, I'm not sure if the opcodes are still assigned
the same way in 2.1, or if there was some movement? This would
kill it anyway.

ciao - chris

(now looking for another cheap way to do something invisible in
Python without installing *anything* )

-- 
Christian Tismer             :^)   Mission Impossible 5oftware
Kaunstr. 26                  :     Have a break! Take a ride on Python's
14163 Berlin                 :     *Starship* http://starship.python.net/
PGP key -> http://wwwkeys.pgp.net/
PGP Fingerprint       E182 71C7 1A9D 66E9  9D15 D3CC D4D7 93E2 1FAE F6DF
     where do you want to jump today?   http://www.stackless.com/

From mal at lemburg.com  Sat Jun  2 13:09:13 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Sat, 02 Jun 2001 13:09:13 +0200
Subject: [Python-Dev] Marshal bug in 2.1?
References: <3B17DEF7.3E7C6BC6@tismer.com> <3B17E326.41D82CCE@tismer.com> <3B17E692.281A329B@lemburg.com> <20010602004716.R690@xs4all.nl> <3B18B6AE.88EA6926@tismer.com>
Message-ID: <3B18C958.598A9891@lemburg.com>

Christian Tismer wrote:
> 
> Thomas Wouters wrote:
> > 
> > On Fri, Jun 01, 2001 at 09:01:38PM +0200, M.-A. Lemburg wrote:
> > 
> > > Yes, not surprisingly though... AFAIK the pyc format changed
> > > in every single version between 1.5.2 and 2.1.
> > 
> > Worse, it's changed several times between each release :)
> 
> But I didn't use .pyc at all, just a marshalled code object.

That's the point:  the header in pyc files is meant to signal the
incompatibility of the following code object.  Perhaps we should move
this version information into the marshal format of code objects
themselves...

> There are no version headers or such.
> The same object worked in fact for Py 1.5.2 and 2.0, but no
> longer with 2.1 .
> I debugged the unmarshalling and saw what happened:
> The new code objects with their new scoping features were
> the problem. The new structures were simply added, and there
> is no way to skip these for older code objects, since there
> isn't any info.
> Some option for marshal to unmarshal old-style code objects
> would have helped.
> But then, I'm not sure if the opcodes are still assigned
> the same way in 2.1, or if there was some movement? This would
> kill it anyway.

AFAIK, the assignments did not change, but several opcodes were added
in 2.1, so code compiled in 2.1 will not run in 2.0.

> ciao - chris
> 
> (now looking for another cheap way to do something invisible in
> Python without installing *anything* )

Why don't you use freeze or py2exe or Gordon's installer for these
one-file executables ?

Alternatively, you should check the Python version and make sure that
it matches the one used for compiling the byte code.

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/

From mwh at python.net  Sat Jun  2 13:40:56 2001
From: mwh at python.net (Michael Hudson)
Date: 02 Jun 2001 12:40:56 +0100
Subject: [Python-Dev] another dict crasher
In-Reply-To: "Tim Peters"'s message of "Sat, 2 Jun 2001 04:17:39 -0400"
References: 
Message-ID: 

"Tim Peters" writes:

> > I can't easily see other examples of the problem; there certainly
> > might be things you could do with comparisons that could trigger
> > crashes, but that code's so hairy that it's almost impossible for me
> > to be sure.
> 
> It's easy to be sure:  any code that tries to remember anything about a dict
> (ditto any mutable object) across a "dangerous" call, other than the mere
> address of the object, is a place you *can* provoke a core dump.  It may not
> be easy to provoke, and a given provoking test case may not fail across all
> platforms, or even every time you run it on a single platform, but it's "an
> obvious" hole all the same.

Ah, like this one:

dict = {}

# let's force dict to malloc its table
for i in range(1,10):
    dict[i] = i

class Machiavelli2:
    def __eq__(self, other):
        dict.clear()
        return 1
    def __hash__(self):
        return 0

dict[Machiavelli2()] = Machiavelli2()

print dict[Machiavelli2()]

I'll attach a patch, but it's another branch inside lookdict (though not
lookdict_string which is I guess the really performance sensitive one).

Cheers,
M.
Index: dictobject.c
===================================================================
RCS file: /cvsroot/python/python/dist/src/Objects/dictobject.c,v
retrieving revision 2.100
diff -c -1 -r2.100 dictobject.c
*** dictobject.c	2001/06/02 08:27:39	2.100
--- dictobject.c	2001/06/02 11:36:47
***************
*** 273,274 ****
--- 273,281 ----
  			cmp = PyObject_RichCompareBool(ep->me_key, key, Py_EQ);
+ 			if (ep0 != mp->ma_table) {
+ 				PyErr_SetString(PyExc_RuntimeError,
+ 						"dict resized on comparison");
+ 				ep = mp->ma_table;
+ 				while (ep->me_value) ep++;
+ 				return ep;
+ 			}
  			if (cmp > 0) {
***************
*** 310,311 ****
--- 317,325 ----
  			cmp = PyObject_RichCompareBool(ep->me_key, key, Py_EQ);
+ 			if (ep0 != mp->ma_table) {
+ 				PyErr_SetString(PyExc_RuntimeError,
+ 						"dict resized on comparison");
+ 				ep = mp->ma_table;
+ 				while (ep->me_value) ep++;
+ 				return ep;
+ 			}
  			if (cmp > 0) {

Here's another test case to work out the second of those new if
statements:

dict = {}

# let's force dict to malloc its table
for i in range(1,10):
    dict[i] = i

class Machiavelli3:
    def __init__(self, id):
        self.id = id
    def __eq__(self, other):
        if self.id == other.id:
            dict.clear()
            return 1
        else:
            return 0
    def __repr__(self):
        return "%s(%s)"%(self.__class__.__name__, self.id)
    def __hash__(self):
        return 0

dict[Machiavelli3(1)] = Machiavelli3(0)
dict[Machiavelli3(2)] = Machiavelli3(0)

print dict[Machiavelli3(2)]

-- 
  M-x psych[TAB][RETURN]
    -- try it

From pedroni at inf.ethz.ch  Sat Jun  2 20:58:55 2001
From: pedroni at inf.ethz.ch (Samuele Pedroni)
Date: Sat, 2 Jun 2001 20:58:55 +0200
Subject: [Python-Dev] What should changing/setting __getattr__ (and similars) after classdef time do ?
Message-ID: <004d01c0eb96$24b5f460$8a73fea9@newmexico>

Hi.

Is this a case that only the BDFL could know and pronounce on ...
or I'm missing something ...

Thanks for any feedback, Samuele Pedroni.

----- Original Message -----
From: Samuele Pedroni
To: 
Sent: Friday, June 01, 2001 2:49 PM
Subject: [Python-Dev] __xxxattr__ caching semantic

> Hi.
> 
> What is the intended semantic wrt __xxxattr__ caching:
> 
> class X:
>     pass
> 
> def cga(self,name):
>     print name
> 
> def iga(name):
>     print name
> 
> x=X()
> x.__dict__['__getattr__'] = iga # 1.
> x.__getattr__ = iga # 2.
> X.__dict__['__getattr__'] = cga # 3.
> X.__getattr__ = cga # 4.
> x.a
> 
> for the manual
> 
> http://www.python.org/doc/current/ref/customization.html
> 
> with all the variants x.a should fail; they should have
> no effect.  In practice 4. works.
> 
> Is that an implementation/manual mismatch, is this intended, is there
> code around using 4. ?
> 
> I'm asking this because jython has differences/bugs in this respect?
> 
> I imagine that 1.-4. should work for all other __magic__ methods
> (this should be fixed in jython for some methods),
> OTOH jython has such a restriction on __del__ too, and this one cannot
> be removed (is not simply a matter of caching/non caching).
> 
> regards, Samuele Pedroni.
> 
> 
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at python.org
> http://mail.python.org/mailman/listinfo/python-dev

From tim.one at home.com  Sun Jun  3 00:57:57 2001
From: tim.one at home.com (Tim Peters)
Date: Sat, 2 Jun 2001 18:57:57 -0400
Subject: [Python-Dev] What should changing/setting __getattr__ (and similars) after classdef time do ?
In-Reply-To: <004d01c0eb96$24b5f460$8a73fea9@newmexico>
Message-ID: 

[Samuele Pedroni]
> Is this a case that only the BDFL could know and pronounce on ...
> or I'm missing something ...
The referenced URL

    http://www.python.org/doc/current/ref/customization.html

appears irrelevant to me, so unsure what you're asking about.  Perhaps

    http://www.python.org/doc/current/ref/attribute-access.html

was intended?  If so, the

    these methods are cached in the class object at class
    definition time; therefore, they cannot be changed after
    the class definition is executed.

there doesn't mean exactly what it says:  it's trying to say that the
__XXXattr__ methods *inherited from base classes* (if any) are cached in
the class object at class definition time, so that changing them in the
base classes later has no effect on the derived class.  It should be
clearer.

A direct class setattr can still change them; indirect assignment via
class.__dict__ is ineffective for the __dict__, __bases__, __name__,
__getattr__, __setattr__ and __delattr__ class attributes (yes, you'll
create a dict entry then, but class getattr doesn't look in the dict to
get the value of these specific keys).

Didn't understand the program snippet.

Much of this is due to hoary optimizations and I agree is ill-documented.
I hope Guido's current rework of all this stuff will leave the endcases
more explainable.

> ----- Original Message -----
> From: Samuele Pedroni
> To: 
> Sent: Friday, June 01, 2001 2:49 PM
> Subject: [Python-Dev] __xxxattr__ caching semantic
> 
> Hi.
> 
> What is the intended semantic wrt __xxxattr__ caching:
> 
> class X:
>     pass
> 
> def cga(self,name):
>     print name
> 
> def iga(name):
>     print name
> 
> x=X()
> x.__dict__['__getattr__'] = iga # 1.
> x.__getattr__ = iga # 2.
> X.__dict__['__getattr__'] = cga # 3.
> X.__getattr__ = cga # 4.
> x.a
> 
> for the manual
> 
> http://www.python.org/doc/current/ref/customization.html
> 
> with all the variants x.a should fail; they should have
> no effect.  In practice 4. works.
> 
> Is that an implementation/manual mismatch, is this intended, is there
> code around using 4. ?
> 
> I'm asking this because jython has differences/bugs in this respect?
> 
> I imagine that 1.-4. should work for all other __magic__ methods
> (this should be fixed in jython for some methods),
> OTOH jython has such a restriction on __del__ too, and this one cannot
> be removed (is not simply a matter of caching/non caching).

From pedroni at inf.ethz.ch  Sun Jun  3 01:46:42 2001
From: pedroni at inf.ethz.ch (Samuele Pedroni)
Date: Sun, 3 Jun 2001 01:46:42 +0200
Subject: [Python-Dev] What should changing/setting __getattr__ (and similars) after classdef time do ?
References: 
Message-ID: <001801c0ebbe$47b60a40$8a73fea9@newmexico>

Hi.  Thanks a lot for the answer, and sorry for the ill-formed question.

[Tim Peters]
> [Samuele Pedroni]
> > Is this a case that only the BDFL could know and pronounce on ...
> > or I'm missing something ...
> 
> The referenced URL
> 
>     http://www.python.org/doc/current/ref/customization.html
> 
> appears irrelevant to me, so unsure what you're asking about.  Perhaps
> 
>     http://www.python.org/doc/current/ref/attribute-access.html
> 
> was intended?  If so, the

Yes, pilot error with browser and copy&paste; I intended the latter.

> these methods are cached in the class object at class
> definition time; therefore, they cannot be changed after
> the class definition is executed.
> 
> there doesn't mean exactly what it says:  it's trying to say that the
> __XXXattr__ methods *inherited from base classes* (if any) are cached in the
> class object at class definition time, so that changing them in the base
> classes later has no effect on the derived class.  It should be clearer.
> 
> A direct class setattr can still change them; indirect assignment via
> class.__dict__ is ineffective for the __dict__, __bases__, __name__,
> __getattr__, __setattr__ and __delattr__ class attributes (yes, you'll create
> a dict entry then, but class getattr doesn't look in the dict to get the
> value of these specific keys).

This matches what I understood reading CPython C code (yes I did that
too ), and what the snippets were trying to point out.  And I see the
problem with derived classes too.

> Didn't understand the program snippet.

Sorry, it is not one snippet; the 4 variants should be considered
independently.

> Much of this is due to hoary optimizations and I agree is ill-documented.  I
> hope Guido's current rework of all this stuff will leave the endcases more
> explainable.

That will be a lot of work when porting it to jython .  In any case the
manual is really not clear (euphemism ).

The point is that jython implements the letter of the manual, and even
extends the caching opt to some other __magic__ methods.  I wanted to
know the intended behaviour in order to fix that in jython.

regards Samuele Pedroni.

From tim.one at home.com  Sun Jun  3 01:56:34 2001
From: tim.one at home.com (Tim Peters)
Date: Sat, 2 Jun 2001 19:56:34 -0400
Subject: [Python-Dev] What should changing/setting __getattr__ (and similars) after classdef time do ?
In-Reply-To: <001801c0ebbe$47b60a40$8a73fea9@newmexico>
Message-ID: 

[Samuele Pedroni]
> ...
> The point is that jython implements the letter of the manual, and even
> extends the caching opt to some other __magic__ methods.  I wanted to
> know the intended behaviour in order to fix that in jython.

You got that one right the first time:  this requires BDFL pronouncement!
As semantically significant optimizations (the only reason for caching
__getattr__, e.g.) creep into the code but the docs lag behind, it gets
more and more unclear what's mandatory behavior and what's
implementation-defined.

This came up a couple weeks ago again in the context of what, exactly,
rich comparisons are supposed to do in all cases.  After poking holes in
everything Guido wrote, he turned it around and told me to write up what
I think it should say (which I have yet to do, as it's time-consuming and
it appears some of the current CPython behavior is at least partly
accidental -- but unclear exactly which parts).  So don't be surprised if
the same trick gets played on you ...

From tim.one at home.com  Sun Jun  3 06:04:57 2001
From: tim.one at home.com (Tim Peters)
Date: Sun, 3 Jun 2001 00:04:57 -0400
Subject: [Python-Dev] another dict crasher
In-Reply-To: 
Message-ID: 

[Michael Hudson]
> Ah, like this one:
>
> dict = {}
>
> # let's force dict to malloc its table
> for i in range(1,10):
>     dict[i] = i
>
> class Machiavelli2:
>     def __eq__(self, other):
>         dict.clear()
>         return 1
>     def __hash__(self):
>         return 0
>
> dict[Machiavelli2()] = Machiavelli2()
>
> print dict[Machiavelli2()]

Told you it was easy .

> I'll attach a patch, but it's another branch inside lookdict (though
> not lookdict_string which is I guess the really performance sensitive
> one).

lookdict_string is crucial to Python's own performance.  Dicts indexed by
ints or class instances or ... are vital to other apps.
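(A crude harness for seeing that specialization from the Python level,
assuming the 2.x time module; the bench helper is made up for
illustration, the absolute numbers are meaningless, and the difference
may be modest -- only the ratio is interesting:

    import time

    def bench(keys, n=100000):
        d = {}
        for k in keys:
            d[k] = 0
        t = time.clock()
        for i in xrange(n):
            for k in keys:
                d[k]            # lookup only; result is discarded
        return time.clock() - t

    print "str keys:", bench(["alpha", "beta", "gamma"])
    print "int keys:", bench([1, 2, 3])

All-string-keyed dicts get the specialized lookdict_string probe; a
single non-string key drops the dict back to the general lookdict.)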
> Index: dictobject.c
> ===================================================================
> RCS file: /cvsroot/python/python/dist/src/Objects/dictobject.c,v
> retrieving revision 2.100
> diff -c -1 -r2.100 dictobject.c
> *** dictobject.c	2001/06/02 08:27:39	2.100
> --- dictobject.c	2001/06/02 11:36:47
> ***************
> *** 273,274 ****
> --- 273,281 ----
>   			cmp = PyObject_RichCompareBool(ep->me_key, key, Py_EQ);
> + 			if (ep0 != mp->ma_table) {
> + 				PyErr_SetString(PyExc_RuntimeError,
> + 						"dict resized on comparison");
> + 				ep = mp->ma_table;
> + 				while (ep->me_value) ep++;
> + 				return ep;
> + 			}
>   			if (cmp > 0) {
> ***************
> *** 310,311 ****
> --- 317,325 ----
>   			cmp = PyObject_RichCompareBool(ep->me_key, key, Py_EQ);
> + 			if (ep0 != mp->ma_table) {
> + 				PyErr_SetString(PyExc_RuntimeError,
> + 						"dict resized on comparison");
> + 				ep = mp->ma_table;
> + 				while (ep->me_value) ep++;
> + 				return ep;
> + 			}
>   			if (cmp > 0) {

Then we have other problems.  Note the comment before lookdict:

    Exceptions are never reported by this function,
    and outstanding exceptions are maintained.

The patched code doesn't preserve that.

Looking for "the first" unused or dummy slot isn't good enough either, as
surely the user has the right to expect that after, e.g., d[m] = 1, d[m]
retrieves 1.  That is, picking a reusable slot "at random" doesn't
respect the *semantics* of dict operations ("just because" the dict
resized doesn't mean the key they're looking for went away!).  It would
be better in this case to go back to the top and start over.  However,
then an adversarial user can construct a case that never terminates.
Unclear what to do.

From tim.one at home.com  Sun Jun  3 09:55:43 2001
From: tim.one at home.com (Tim Peters)
Date: Sun, 3 Jun 2001 03:55:43 -0400
Subject: [Python-Dev] strop vs. string
In-Reply-To: <20010602004005.F23560@lyra.org>
Message-ID: 

[Greg Stein]
> ...
> I was saying that, at the Python level, using a loop and doing b[i] into
> a buffer/string/unicode object would seem to be relatively rare. b[0]
> and stuff is reasonably common.

Well, at the Python level buffer objects seem never to be used, probably
because all the people who know about them don't advertise it because
it's an easy way to provoke core dumps now.  I don't have any real
objection to any way anyone wants to fix that, just so long as it gets
fixed.

>> I take that as "yes" to my "nobody cares about it enough to
>> maintain it?".  In that light, Guido's ambivalence is indeed
>> surprising .

> Eh? I'll maintain the thing, but you're confusing that with adding more
> features into it. Different question.

I haven't asked for new features, just that what's already there get
fixed:  Python-level buffer objects are unsafe, the docs remain
incomplete, there's random stuff like file.readinto() that's not
documented at all (could be that's the only one -- it's certainly
"discovered" on c.l.py often enough, though), and there are no buffer
tests in the std test suite.  The work to introduce the type wasn't
completed, nobody works on it, and finishing work 3 years late doesn't
count as "new feature" in my book .

From gstein at lyra.org  Sun Jun  3 11:10:36 2001
From: gstein at lyra.org (Greg Stein)
Date: Sun, 3 Jun 2001 02:10:36 -0700
Subject: [Python-Dev] strop vs.
> > I was saying that, at the Python level, using a loop and doing b[i] into > > a buffer/string/unicode object would seem to be relatively rare. b[0] > > and stuff is reasonably common. > > Well, at the Python level buffer objects seem never to be used, probably I'm talking about string objects and unicode objects, too. The point is that b[i] loops don't have to be all that speedy because it isn't used often. > because all the people who know about them don't advertise it because it's > an easy way to provoke core dumps now. Easy? Depends on what you use them with. >... > >> I take that as "yes" to my "nobody cares about it enough to > >> maintain it?". In that light, Guido's ambivalence is indeed > >> surprising . > > > Eh? I'll maintain the thing, but you're confusing that with adding more > > features into it. Different question. > > I haven't asked for new features, just that what's already there get fixed: > Python-level buffer objects are unsafe, the docs remain incomplete, I'll fix the code. > there's > random stuff like file.readinto() that's not documented at all (could be > that's the only one -- it's certainly "discovered" on c.l.py often enough, > though), Find another goat to screw for that one. I don't know anything about it. Hmm... Using the "annotate" feature of ViewCVS, I see that Guido added it. Go blame him if you want to scream about that function and its lack of doc. > and there are no buffer tests in the std test suite. The work to > introduce the type wasn't completed, nobody works on it, and finishing work > 3 years late doesn't count as "new feature" in my book . Now you're just being bothersome. You want all that stuff, then feel free. I'll volunteer to do the code. You can go beat some heads, or find other volunteers. I'll do the code fixing just to placate you, and to get all this ranting about the buffer object to quiet down, but not because I'm joyful to do it. not-cheers, -g -- Greg Stein, http://www.lyra.org/ From dgoodger at bigfoot.com Sun Jun 3 16:39:42 2001 From: dgoodger at bigfoot.com (David Goodger) Date: Sun, 03 Jun 2001 10:39:42 -0400 Subject: [Python-Dev] new PEP candidates Message-ID: I have just posted three related PEP candidates to the Doc-SIG: - PEP: Docstring Processing System Framework http://mail.python.org/pipermail/doc-sig/2001-June/001855.html - PEP: DPS Generic Implementation Details http://mail.python.org/pipermail/doc-sig/2001-June/001856.html - PEP: Docstring Conventions http://mail.python.org/pipermail/doc-sig/2001-June/001857.html These are all part of the newly created Python Docstring Processing System project, http://docstring.sf.net. Barry: Please assign PEP numbers to these if possible. Once PEP numbers have been assigned, I will post to comp.lang.python. Thanks. A related project is the second draft of reStructuredText, a docstring markup syntax definition. The project is http://structuredtext.sf.net, and I've posted the following to Doc-SIG: - An Introduction to reStructuredText http://mail.python.org/pipermail/doc-sig/2001-June/001858.html - Problems With StructuredText http://mail.python.org/pipermail/doc-sig/2001-June/001859.html - reStructuredText Markup Specification http://mail.python.org/pipermail/doc-sig/2001-June/001860.html - Python Extensions to the reStructuredText Markup Specification http://mail.python.org/pipermail/doc-sig/2001-June/001861.html I am not seeking PEP status for reStructuredText at this time; I think it's one step too far removed from the Python language to warrant a PEP. 
If you think it *should* be a PEP, I will be happy to convert it. -- David Goodger dgoodger at bigfoot.com Open-source projects: - Python Docstring Processing System: http://docstring.sf.net - reStructuredText: http://structuredtext.sf.net - The Go Tools Project: http://gotools.sf.net From mwh at python.net Sun Jun 3 23:47:48 2001 From: mwh at python.net (Michael Hudson) Date: 03 Jun 2001 22:47:48 +0100 Subject: [Python-Dev] another dict crasher In-Reply-To: "Tim Peters"'s message of "Sun, 3 Jun 2001 00:04:57 -0400" References: Message-ID: "Tim Peters" writes: > It would be better in this case to go back to the top and start > over. Yes. What you checked in is obviously better. I'll stick to being the bearer of bad tidings... > However, then an adversarial user can construct a case that never > terminates. I seem to have done this - it was odd, though - it only loops when I bump the dict to fairly enormous preportions for reasons I don't really (want to) understand. > Unclear what to do. Not worrying about it seems entirely reasonable - I now have sitting on my hard drive the wierdest way of spelling "while 1: pass" *I've* ever seen. and-I'll-stop-poking-holes-now-ly y'rs m. -- The rapid establishment of social ties, even of a fleeting nature, advance not only that goal but its standing in the uberconscious mesh of communal psychic, subjective, and algorithmic interbeing. But I fear I'm restating the obvious. -- Will Ware, comp.lang.python From tim.one at home.com Mon Jun 4 01:03:31 2001 From: tim.one at home.com (Tim Peters) Date: Sun, 3 Jun 2001 19:03:31 -0400 Subject: [Python-Dev] another dict crasher In-Reply-To: Message-ID: [Tim] >> It would be better in this case to go back to the top and start >> over. [Michael Hudson] > Yes. What you checked in is obviously better. I'll stick to being > the bearer of bad tidings... Hey, if it's fun, do whatever what you want! If you hadn't provoked me, I would have let it slide. Guido only cares about the end result . >> However, then an adversarial user can construct a case that never >> terminates. > I seem to have done this - it was odd, though - it only loops when I > bump the dict to fairly enormous preportions for reasons I don't > really (want to) understand. Pass it on. I deliberately "started over" via a recursive call instead of a goto so that an offending program would eventually die with a stack fault instead of just running forever. So if you're seeing something run forever, it may be a different problem. >> Unclear what to do. > Not worrying about it seems entirely reasonable I don't think anyone is happy leaving an exploitable hole in Python -- we endure enormous pain to plug those. Except, I guess, for buffer objects . I simply haven't thought of a good and efficient way to plug this one. Implementing an "internal immutable dict" type appeals to me, but it conflicts with that the affected routines believe to the core of their souls that exceptions raised during comparisons are to be ignored -- and raising a "hey, you can't change the dict *now*!" exception doesn't do the user any good if they never see it. Would plug the hole, but an *innocent* user would never know why their program failed to work as (probably) expected. From tim.one at home.com Mon Jun 4 02:38:53 2001 From: tim.one at home.com (Tim Peters) Date: Sun, 3 Jun 2001 20:38:53 -0400 Subject: [Python-Dev] strop vs. 
string In-Reply-To: <20010603021036.U23560@lyra.org> Message-ID: [Tim] >> because all the people who know about them don't advertise it >> because it's an easy way to provoke core dumps now. [Greg Stein] > Easy? Depends on what you use them with. "Easy" and "depends" both, sure. I don't understand the argument: core dumps are always presumed to be errors in the Python implementation, not the users's fault. In this case, they are Python's fault by any accounting. On rare occasions we just give up and say "sorry, but we simply don't know a reasonable way fix it -- but it's still Python's fault" (for example, see the dict thread this weekend). >> I haven't asked for new features, just that what's already there get >> fixed: Python-level buffer objects are unsafe > I'll fix the code. Thank you! >> the docs remain incomplete, there's random stuff like file.readinto() >> that's not documented at all (could be that's the only one -- it's >> certainly "discovered" on c.l.py often enough, though), > Find another goat to screw for that one. I don't know anything about it. > > Hmm... Using the "annotate" feature of ViewCVS, I see that Guido > added it. Go blame him if you want to scream about that function and > its lack of doc. I don't care who added it: I haven't asked anyone specific to do anything. I've been asking whether *anyone* cares enough to address the backlog of buffer maintenance work. I don't even know who dreamed up the buffer object -- although at this point I bet I can guess . >> and there are no buffer tests in the std test suite. The work to >> introduce the type wasn't completed, nobody works on it, and >> finishing work 3 years late doesn't count as "new feature" in my book > Now you're just being bothersome. You bet. It's the same list of things I gave in my first msg; nobody volunteered to do any work then, so I repeated them. > You want all that stuff, then feel free. "All that stuff" is the minimum now required of new features. Buffers got in before Guido got tougher about this stuff, but if they're worth having at all then surely they're worth bringing up to current standards. > I'll volunteer to do the code. You can go beat some heads, or find other > volunteers. Anyone else care to chip in? > I'll do the code fixing just to placate you, and to get all this ranting > about the buffer object to quiet down, but not because I'm joyful > to do it. OK, I feel guitly -- but if that's enough to make you feel joyful again, the psychology here is just sick . From Barrett at stsci.edu Mon Jun 4 15:22:14 2001 From: Barrett at stsci.edu (Paul Barrett) Date: Mon, 04 Jun 2001 09:22:14 -0400 Subject: [Python-Dev] strop vs. string References: <3B1214B3.9A4C295D@lemburg.com> Message-ID: <3B1B8B86.68E99328@STScI.Edu> "M.-A. Lemburg" wrote: > > Tim Peters wrote: > > > > [Tim] > > > About combining strop and buffers and strings, don't forget > > > unicodeobject.c: that's got oodles of basically duplicate code too. > > > /F suggested dealing with the minor differences via maintaining one > > > code file that gets compiled multiple times w/ appropriate #defines. > > > > [MAL] > > > Hmm, that only saves us a few kB in source, but certainly not > > > in the object files. > > > > That's not the point. Manually duplicated code blocks always get out of > > synch, as people fix bugs in, or enhance, one of them but don't even know > > about the others. 
/F brought this up after I pissed away a few hours trying > > to repair one of these in all places, and he noted that strop.replace() and > > string.replace() are woefully inefficient anyway. > > Ok, so what we'd need is a bunch of generic low-level string > operations: one set for 8-bit and one for 16-bit code. > > Looking at unicodeobject.c it seems that the section "Helpers" would > be a good start, plus perhaps a few bits from the method implementations > refactored to form a low-level string template library. > > Perhaps we should move this code into > a file stringhelpers.h which then gets included by stringobject.c > and unicodeobject.c with appropriate #defines set up for > 8-bit strings and for Unicode. > > > > The better idea would be making the types subclass from a generic > > > abstract string object -- I just don't know how this will be > > > possible with Guido's type patches. We'll just have to wait, > > > I guess. From fdrake at acm.org Mon Jun 4 16:07:37 2001 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Mon, 4 Jun 2001 10:07:37 -0400 (EDT) Subject: [Python-Dev] strop vs. string In-Reply-To: <3B1B8B86.68E99328@STScI.Edu> References: <3B1214B3.9A4C295D@lemburg.com> <3B1B8B86.68E99328@STScI.Edu> Message-ID: <15131.38441.301314.46009@cj42289-a.reston1.va.home.com> Paul Barrett writes: > From the discussion so far, it appears that the buffer object is > intended solely to support string-like objects. I've seen no mention > of their use for binary data objects, such as multidimensional arrays > and matrices. Will the buffer object also support these objects? If > no, then I suggest it be renamed to one that is less generic and more > descriptive. In a development version of my bindings to a Type-1 font rasterizer, I exposed a buffer interface to the resulting image data. Unfortunately, that code was lost and I've not had time to work that up again. I *think* that sort of thing was part of the intended application for the buffer interface, but I was not one of the "movers & shakers" for it, so I'm not entirely sure. > On the otherhand, if yes, then I think the buffer C/API needs to be > reimplemented, because the current design/implementation falls far > short of what I would expect for a buffer object. First, it is overly > complex: the support for multiple buffers does not appear necessary. > Second, the dangling pointer issue has not been resolved. I suggest I agree. From the discussions I remember, I don't recall a clear explanation of the need for "segmented" buffers. But that may just be a failing of my recollection. > the addition of lock flag which indicates that the data is currently > inaccessible, ie. that data and/or data pointer is in the process of > being modified. > > I would suggest the following structure to be much more useful for > char and binary data: > > typedef struct { > char* rf_pointer; > int rf_length; > int rf_access; /* read, write, etc. */ > int rf_lock; /* data is in use */ > int rf_flags; /* type of data; char, binary, unicode, etc. */ > } PyBufferProcs; I'm not sure about the "rf_flags" field -- I see two aspects that you seem to be describing, and wouldn't call either use a "flag". There's data type (characters, anonymous binary data, image data, etc.), and element size (1 byte, 2 bytes, variable width). Those values may or may not be associated with the specific buffer or the type implementing the buffer (I'd go with the specific buffer just to allow buffer types that support different flavors). 
> If I find some time, I'll prepare a PEP to air these issues, since > they are very important to those of us working on and with > multidimensional arrays. We find the current buffer API lacking. PEPs are good; I'll look forward to seeing it! -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From skip at pobox.com Mon Jun 4 18:29:53 2001 From: skip at pobox.com (Skip Montanaro) Date: Mon, 4 Jun 2001 11:29:53 -0500 Subject: [Python-Dev] readline not getting built when .so symlink doesn't exist Message-ID: <15131.46977.861815.323386@beluga.mojam.com> I recently upgraded to Mandrake 8.0. I find that the readline module is no longer getting built. When building, it builds rgbimb followed immediately by crypt. Readline, which is tested for in between, is not built. Apparently, it can't find one of the libraries required to build it. On my system, both readline and termcap are in /lib. Neither has a static version available and neither as a plain .so file available. The .so file always has a version number tacked onto the end: % ls -l /lib/libtermcap* /lib/libreadline* lrwxrwxrwx 1 root root 18 May 29 10:53 /lib/libreadline.so.4 -> libreadline.so.4.1 -rwxr-xr-x 1 root root 152440 Mar 25 01:26 /lib/libreadline.so.4.1 lrwxrwxrwx 1 root root 19 May 29 10:53 /lib/libtermcap.so.2 -> libtermcap.so.2.0.8 -rwxr-xr-x 1 root root 11608 Mar 26 10:32 /lib/libtermcap.so.2.0.8 If I create the necessary .so symlinks it builds okay. Perhaps this is a bug in Mandrake 8.0 (it wouldn't be the first one), but if it is valid for shared libraries to be installed with only a version-numbered .so file, then it seems to me that distutils ought to handle that. There are several programs in /usr/bin on my machine that seem to be dynamically linked to libreadline. In addition, /usr/lib/python2.0/lib-dynload/readline.so exists, which suggests that the .so-without version number is valid as far as ld is concerned. Skip From Greg.Wilson at baltimore.com Mon Jun 4 19:33:29 2001 From: Greg.Wilson at baltimore.com (Greg Wilson) Date: Mon, 4 Jun 2001 13:33:29 -0400 Subject: [Python-Dev] struct.getorder() ? Message-ID: <930BBCA4CEBBD411BE6500508BB3328F2E1F1D@nsamcanms1.ca.baltimore.com> The 'struct' module allows packing and unpacking orders to be specified, but doesn't provide a hook to report on the order used by the machine the script is running on. As I'm likely going to be using this module in future runs of my course, I'd like to add 'struct.getorder()', which would return either "<" or ">" (the characters used to signal little-endian and big-endian respectively). Does this duplicate something in some other standard module? Does it seem like a sensible idea? Thanks Greg ----------------------------------------------------------------------------------------------------------------- The information contained in this message is confidential and is intended for the addressee(s) only. If you have received this message in error or there are any problems please notify the originator immediately. The unauthorized use, disclosure, copying or alteration of this message is strictly forbidden. Baltimore Technologies plc will not be liable for direct, special, indirect or consequential damages arising from alteration of the contents of this message by a third party or as a result of any virus being passed on. In addition, certain Marketing collateral may be added from time to time to promote Baltimore Technologies products, services, Global e-Security or appearance at trade shows and conferences. 
This footnote confirms that this email message has been swept by
Baltimore MIMEsweeper for Content Security threats, including computer
viruses.

From fdrake at acm.org  Mon Jun  4 19:42:28 2001
From: fdrake at acm.org (Fred L. Drake, Jr.)
Date: Mon, 4 Jun 2001 13:42:28 -0400 (EDT)
Subject: [Python-Dev] struct.getorder() ?
In-Reply-To: <930BBCA4CEBBD411BE6500508BB3328F2E1F1D@nsamcanms1.ca.baltimore.com>
References: <930BBCA4CEBBD411BE6500508BB3328F2E1F1D@nsamcanms1.ca.baltimore.com>
Message-ID: <15131.51332.73137.795543@cj42289-a.reston1.va.home.com>

Greg Wilson writes:
> The 'struct' module allows packing and unpacking
> orders to be specified, but doesn't provide a hook
> to report on the order used by the machine the

Python 2.0 introduced sys.byteorder; check it out:

    http://www.python.org/doc/current/lib/module-sys.html

-Fred

-- 
Fred L. Drake, Jr.
PythonLabs at Digital Creations

From Greg.Wilson at baltimore.com  Mon Jun  4 19:41:45 2001
From: Greg.Wilson at baltimore.com (Greg Wilson)
Date: Mon, 4 Jun 2001 13:41:45 -0400
Subject: [Python-Dev] struct.getorder() ?
Message-ID: <930BBCA4CEBBD411BE6500508BB3328F2E1F1E@nsamcanms1.ca.baltimore.com>

> Python 2.0 introduced sys.byteorder; check it out:
> http://www.python.org/doc/current/lib/module-sys.html

Woo hoo!  Thanks, Fred --- should've guessed someone would be ahead of
me :-).

Greg

From barry at scottb.demon.co.uk  Mon Jun  4 20:00:05 2001
From: barry at scottb.demon.co.uk (Barry Scott)
Date: Mon, 4 Jun 2001 19:00:05 +0100
Subject: [Python-Dev] %b format? - 8088 like a PDP-11 I think not...
In-Reply-To: <20010530183833.B1654@thyrsus.com>
Message-ID: <000201c0ed20$2f295c30$060210ac@private>

Eric wrote:
> While I'm at it, I should note that the design of the 11 was ancestral
> to both the 8088 and 68000 microprocessors, and thus to essentially
> every new general-purpose computer designed in the last fifteen years.

The key to the PDP-11 and VAX was lots of registers all alike and rich
addressing modes for the instructions.

The 8088 is very far from this design; it owes its design more to the
4004 than to the PDP-11.  However, the 68000 is closer, but not as nice
to program, as there are too many special cases in its instruction set
for my liking.
BArry From mwh at python.net Mon Jun 4 20:05:10 2001 From: mwh at python.net (Michael Hudson) Date: 04 Jun 2001 19:05:10 +0100 Subject: [Python-Dev] readline not getting built when .so symlink doesn't exist In-Reply-To: Skip Montanaro's message of "Mon, 4 Jun 2001 11:29:53 -0500" References: <15131.46977.861815.323386@beluga.mojam.com> Message-ID: Skip Montanaro writes: > I recently upgraded to Mandrake 8.0. I find that the readline > module is no longer getting built. When building, it builds rgbimb > followed immediately by crypt. Readline, which is tested for in > between, is not built. Apparently, it can't find one of the > libraries required to build it. On my system, both readline and > termcap are in /lib. Neither has a static version available and > neither as a plain .so file available. The .so file always has a > version number tacked onto the end: > > % ls -l /lib/libtermcap* /lib/libreadline* > lrwxrwxrwx 1 root root 18 May 29 10:53 /lib/libreadline.so.4 -> libreadline.so.4.1 > -rwxr-xr-x 1 root root 152440 Mar 25 01:26 /lib/libreadline.so.4.1 > lrwxrwxrwx 1 root root 19 May 29 10:53 /lib/libtermcap.so.2 -> libtermcap.so.2.0.8 > -rwxr-xr-x 1 root root 11608 Mar 26 10:32 /lib/libtermcap.so.2.0.8 > > If I create the necessary .so symlinks it builds okay. > > Perhaps this is a bug in Mandrake 8.0 (it wouldn't be the first > one), but if it is valid for shared libraries to be installed with > only a version-numbered .so file, then it seems to me that distutils > ought to handle that. Hmm. Does compiling a proggie $ gcc foo.c -lreadline work? It doesn't here if I move libreadline.so & libreadline.a out of the way. If the C compiler isn't going to find readline, there ain't much point distutils trying to find it... > There are several programs in /usr/bin on my machine that seem to be > dynamically linked to libreadline. Those things will be directly linked to libreadline.so.whatever; I believe the libfoo.so files are only for the (compile time) linker's benefit. > In addition, /usr/lib/python2.0/lib-dynload/readline.so exists, > which suggests that the .so-without version number is valid as far > as ld is concerned. ld != ld.so. Do you need a readline-devel package or something? Cheers, M. -- It's actually a corruption of "starling". They used to be carried. Since they weighed a full pound (hence the name), they had to be carried by two starlings in tandem, with a line between them. -- Alan J Rosenthal explains "Pounds Sterling" on asr From mwh at python.net Mon Jun 4 21:01:10 2001 From: mwh at python.net (Michael Hudson) Date: 04 Jun 2001 20:01:10 +0100 Subject: [Python-Dev] another dict crasher In-Reply-To: "Tim Peters"'s message of "Sun, 3 Jun 2001 19:03:31 -0400" References: Message-ID: "Tim Peters" writes: > >> However, then an adversarial user can construct a case that never > >> terminates. > > > I seem to have done this - it was odd, though - it only loops when I > > bump the dict to fairly enormous preportions for reasons I don't > > really (want to) understand. > > Pass it on. I deliberately "started over" via a recursive call instead of a > goto so that an offending program would eventually die with a stack fault > instead of just running forever. So if you're seeing something run forever, > it may be a different problem. I left it running overnight, and it terminated! (with a KeyError). I can't say I really understand what's going on, but I'm in Exam Hell at the moment (for the last time! Yippee!), so don't have any spare cycles to think about it hard. 
Anyway, this is what I was running: dict = {} # let's force dict to malloc its table for i in range(1,10000): dict[i] = i hashcode = 0 class Machiavelli2: def __eq__(self, other): global hashcode d2 = dict.copy() dict.clear() hashcode += 1 for k,v in d2.items(): dict[k] = v return 1 def __hash__(self): return hashcode dict[Machiavelli2()] = Machiavelli2() print dict[Machiavelli2()] If you thought my last test case was contrived, I look forward to you finding adjectives for this one... Cheers, M. -- (ps: don't feed the lawyers: they just lose their fear of humans) -- Peter Wood, comp.lang.lisp From barry at digicool.com Mon Jun 4 21:42:34 2001 From: barry at digicool.com (Barry A. Warsaw) Date: Mon, 4 Jun 2001 15:42:34 -0400 Subject: [Python-Dev] Status of 2.0.1? Message-ID: <15131.58538.121723.671374@anthem.wooz.org> I've just fixed two buglets in the regression test suite for Python 2.0.1 (release20-maint branch). Now I get the following results from regrtest: 88 tests OK. 20 tests skipped: test_al test_audioop test_cd test_cl test_dbm test_dl test_gl test_imageop test_imgfile test_largefile test_linuxaudiodev test_minidom test_nis test_pyexpat test_rgbimg test_sax test_sunaudiodev test_timing test_winreg test_winsound Has anybody else tested out the 2.0.1 branch on anything? I'm going to run some quick tests with Mailman 2.0.x on Python 2.0.1 over the next hour or so. I'm just wondering what's left to do for this release, and how I can help out. -Barry From esr at thyrsus.com Mon Jun 4 22:11:14 2001 From: esr at thyrsus.com (Eric S. Raymond) Date: Mon, 4 Jun 2001 16:11:14 -0400 Subject: [Python-Dev] %b format? - 8088 like a PDP-11 I think not... In-Reply-To: <000201c0ed20$2f295c30$060210ac@private>; from barry@scottb.demon.co.uk on Mon, Jun 04, 2001 at 07:00:05PM +0100 References: <20010530183833.B1654@thyrsus.com> <000201c0ed20$2f295c30$060210ac@private> Message-ID: <20010604161114.A20979@thyrsus.com> Barry Scott : > Eric wrote: > > While I'm at it, I should note that the design of the 11 was ancestral > > to both the 8088 and 68000 microprocessors, and thus to essentially > > every new general-purpose computer designed in the last fifteen years. > > The key to PDP-11 and VAX was lots of registers all a like and rich > addressing modes for the instructions. > > The 8088 is very far from this design, its owes its design more to > 4004 then the PDP-11. Yes, but the 4004 was designed as a sort of lobotomized imitation of the 65xx, which was descended from the 11. Admiitedly, in the chain of transmission here were two stages of redesign so bad that the connection got really tenuous. -- Eric S. Raymond ...Virtually never are murderers the ordinary, law-abiding people against whom gun bans are aimed. Almost without exception, murderers are extreme aberrants with lifelong histories of crime, substance abuse, psychopathology, mental retardation and/or irrational violence against those around them, as well as other hazardous behavior, e.g., automobile and gun accidents." -- Don B. Kates, writing on statistical patterns in gun crime From skip at pobox.com Mon Jun 4 22:49:07 2001 From: skip at pobox.com (Skip Montanaro) Date: Mon, 4 Jun 2001 15:49:07 -0500 Subject: [Python-Dev] readline not getting built when .so symlink doesn't exist In-Reply-To: References: <15131.46977.861815.323386@beluga.mojam.com> Message-ID: <15131.62531.595208.65994@beluga.mojam.com> [my readline woes snipped] Michael> Hmm. Does compiling a proggie Michael> $ gcc foo.c -lreadline Michael> work? 
It doesn't here if I move libreadline.so & libreadline.a Michael> out of the way. Yup, it does: beluga:tmp% cc -o foo foo.c -lreadline -ltermcap beluga:tmp% ./foo >>sdfsdfsdf sdfsdfsdf (This after deleting both /lib/libreadline.so and /lib/libhistory.so.) In this case, foo.c is #include #include #include main() { printf("%s\n", readline(">>" )); } Michael> Do you need a readline-devel package or something? Got that. I just noticed that "rpm -q --whatprovides /lib/libreadline.so" does list readline-devel as the provider. I just reinstalled it using --force. Now the .so symlinks are there. Go figure... Oh well, probably ought to drop it unless another Mandrake user complains. I'm really amazed at how many packages Mandrake chose *not* to install even though I selected all the groups during install and was installing into fresh / and /usr partitions. I've been dribbling various packages in bit-by-bit as I've discovered omissions. In the past I've also noticed files apparently not installed even though the packages that were supposed to provide them were installed. Skip From guido at digicool.com Mon Jun 4 23:03:35 2001 From: guido at digicool.com (Guido van Rossum) Date: Mon, 04 Jun 2001 17:03:35 -0400 Subject: [Python-Dev] Re: What happened to Idle's extend.py? In-Reply-To: Your message of "Tue, 29 May 2001 02:15:07 EDT." References: Message-ID: <200106042103.RAA04077@cj20424-a.reston1.va.home.com> > > Idle-0.3, shipped with Python 1.5.2 had an extend.py module that was > > used to extend Idle. We've used this extensively, building entire > > "applications" as Idle extensions. > > > > Now that we're moving to Python 2.1, we find the same old directions > > for extending Idle (in extend.txt), but there appears to be no > > extend.py in Idle-0.8. > > > > Does anyone know how we can add extensions to Idle-0.8? It's simpler than before. Extensions are now loaded simply by being named in config.txt (or any of the other custom configuration files). For example, ZoomHeight.py is a very simple extension; it is loaded because of the line [ZoomHeight] somewhere in config.txt. The interface for extensions is the same as before; ZoomHeight.py hasn't changed since 1999. I'll update extend.txt. Can someone forward this to the original asker of the question, or to the list where it was posted? --Guido van Rossum (home page: http://www.python.org/~guido/) From skip at pobox.com Mon Jun 4 23:03:58 2001 From: skip at pobox.com (Skip Montanaro) Date: Mon, 4 Jun 2001 16:03:58 -0500 Subject: [Python-Dev] %b format? - 8088 like a PDP-11 I think not... In-Reply-To: <20010604161114.A20979@thyrsus.com> References: <20010530183833.B1654@thyrsus.com> <000201c0ed20$2f295c30$060210ac@private> <20010604161114.A20979@thyrsus.com> Message-ID: <15131.63422.695297.393477@beluga.mojam.com> Eric> Yes, but the 4004 was designed as a sort of lobotomized imitation Eric> of the 65xx, which was descended from the 11. Really? I was always under the impression the 4004 was considered the first microprocessor. The page below says that and gives a date of 1971 for it. I have no idea if the author is correct, just that what he says agrees with my memory. He does seem to have an impressive collection of old computer iron: http://www.piercefuller.com/collect/i4004/ I haven't found a statement about the origins of the 6502, but this page suggests that commercial computers were being made from 8080's before 6502's: http://www.speer.org/2backup/pcbs_pch.html Ah, wait a minute... 
This page: http://www.geocities.com/SiliconValley/Byte/6508/6502/english/versoes.htm says the 6502 was descended from the 6800. I'm getting less and less convinced that the 4004 somehow descended from the 65xx family. (Maybe we should shift this thread to the always entertaining folks at comp.arch... ;-) Skip From esr at thyrsus.com Mon Jun 4 23:19:08 2001 From: esr at thyrsus.com (Eric S. Raymond) Date: Mon, 4 Jun 2001 17:19:08 -0400 Subject: [Python-Dev] %b format? - 8088 like a PDP-11 I think not... In-Reply-To: <15131.63422.695297.393477@beluga.mojam.com>; from skip@pobox.com on Mon, Jun 04, 2001 at 04:03:58PM -0500 References: <20010530183833.B1654@thyrsus.com> <000201c0ed20$2f295c30$060210ac@private> <20010604161114.A20979@thyrsus.com> <15131.63422.695297.393477@beluga.mojam.com> Message-ID: <20010604171908.A21831@thyrsus.com> Skip Montanaro : > Really? I was always under the impression the 4004 was considered the first > microprocessor. The page below says that and gives a date of 1971 for it. First sentence is widely believed, but there was an earlier micro called the Star-8 designed at Burroughs that has been almost completely forgotten. I only know about it because I worked there in 1980 with one of the people who designed it. I think I had a brain fart and it's the Z80 that was descended from the 6502. I was going by a remark in some old lecture notes. I've got a copy of the definitive reference on history of computer architecture and will check. -- Eric S. Raymond "Extremism in the defense of liberty is no vice; moderation in the pursuit of justice is no virtue." -- Barry Goldwater (actually written by Karl Hess) From mwh at python.net Mon Jun 4 23:55:34 2001 From: mwh at python.net (Michael Hudson) Date: 04 Jun 2001 22:55:34 +0100 Subject: [Python-Dev] readline not getting built when .so symlink doesn't exist In-Reply-To: Skip Montanaro's message of "Mon, 4 Jun 2001 15:49:07 -0500" References: <15131.46977.861815.323386@beluga.mojam.com> <15131.62531.595208.65994@beluga.mojam.com> Message-ID: Skip Montanaro writes: > [my readline woes snipped] > > Michael> Hmm. Does compiling a proggie > > Michael> $ gcc foo.c -lreadline > > Michael> work? It doesn't here if I move libreadline.so & libreadline.a > Michael> out of the way. > > Yup, it does: > > beluga:tmp% cc -o foo foo.c -lreadline -ltermcap > beluga:tmp% ./foo > >>sdfsdfsdf > sdfsdfsdf > > (This after deleting both /lib/libreadline.so and /lib/libhistory.so.) Odd. What does the output of $ gcc -o foo foo.c -lreadline -ltermcap -Wl,--verbose look like? In particular the bit at the end where you get things like: attempt to open /usr/lib/gcc-lib/i386-redhat-linux/2.95.1/libreadline.so failed attempt to open /usr/lib/gcc-lib/i386-redhat-linux/2.95.1/libreadline.a failed attempt to open /usr/i386-redhat-linux/lib/libreadline.so failed attempt to open /usr/i386-redhat-linux/lib/libreadline.a failed attempt to open /usr/bin/../lib/libreadline.so succeeded -lreadline (/usr/bin/../lib/libreadline.so) (this is more for my personal curiosity than any important reason). > Got that. I just noticed that "rpm -q --whatprovides /lib/libreadline.so" > does list readline-devel as the provider. I just reinstalled it using > --force. Now the .so symlinks are there. Go figure... No :-) > Oh well, probably ought to drop it unless another Mandrake user complains. Sounds reasonable. Cheers, M. -- After a heavy night I travelled on, my face toward home - the comma being by no means guaranteed. 
-- paraphrased from cam.misc From tim.one at home.com Mon Jun 4 23:58:48 2001 From: tim.one at home.com (Tim Peters) Date: Mon, 4 Jun 2001 17:58:48 -0400 Subject: [Python-Dev] Re: What happened to Idle's extend.py? In-Reply-To: <200106042103.RAA04077@cj20424-a.reston1.va.home.com> Message-ID: [Guido] > Can someone forward this to the original asker of the question, or to > the list where it was posted? Done. Thanks! From skip at pobox.com Tue Jun 5 03:01:01 2001 From: skip at pobox.com (Skip Montanaro) Date: Mon, 4 Jun 2001 20:01:01 -0500 Subject: [Python-Dev] readline not getting built when .so symlink doesn't exist In-Reply-To: References: <15131.46977.861815.323386@beluga.mojam.com> <15131.62531.595208.65994@beluga.mojam.com> Message-ID: <15132.12109.914981.110774@beluga.mojam.com> >> (This after deleting both /lib/libreadline.so and >> /lib/libhistory.so.) Michael> Odd. What does the output of Michael> $ gcc -o foo foo.c -lreadline -ltermcap -Wl,--verbose Michael> look like? Well, what it looks like is "Skip's a dunce...". Turns out there was a libreadline.so symlink /usr/lib also. It found that. When I deleted that it found /usr/lib/libreadline.a. Getting rid of that caused the link to (finally) fail. With just the version-based .so files cc apparently can't do the trick. Sorry to have wasted the bandwidth. Skip From skip at pobox.com Tue Jun 5 03:16:00 2001 From: skip at pobox.com (Skip Montanaro) Date: Mon, 4 Jun 2001 20:16:00 -0500 Subject: [Python-Dev] %b format? - 8088 like a PDP-11 I think not... In-Reply-To: <20010604171908.A21831@thyrsus.com> References: <20010530183833.B1654@thyrsus.com> <000201c0ed20$2f295c30$060210ac@private> <20010604161114.A20979@thyrsus.com> <15131.63422.695297.393477@beluga.mojam.com> <20010604171908.A21831@thyrsus.com> Message-ID: <15132.13008.429800.585157@beluga.mojam.com> Eric> Skip Montanaro : >> Really? I was always under the impression the 4004 was considered >> the first microprocessor. The page below says that and gives a date >> of 1971 for it. Eric> First sentence is widely believed, but there was an earlier micro Eric> called the Star-8 designed at Burroughs that has been almost Eric> completely forgotten. There was also a GE-8 (I think that was the name) developed at GE's R&D Center in the early 1970's timeframe - long before my time there. It was apparently very competitive with the other microprocessors produced about that time but never saw the light of day. I suspect that was at least due in part to the fact that GE built mainframes back then. Skip From tim.one at home.com Tue Jun 5 06:07:27 2001 From: tim.one at home.com (Tim Peters) Date: Tue, 5 Jun 2001 00:07:27 -0400 Subject: [Python-Dev] another dict crasher In-Reply-To: Message-ID: [Michael Hudson, taking a break from exams] > I left it running overnight, and it terminated! (with a KeyError). I > can't say I really understand what's going on, but I'm in Exam Hell at > the moment (for the last time! Yippee!), so don't have any spare > cycles to think about it hard. Good luck! I really shouldn't tell you this now, but the real reason people dread turning 30, 40, 50, 60-- and so on --is that every 10th birthday starting at 30 they test you *again*! On every course you ever took. It's grueling. The penalty for failure is severe: flunk just one review exam, and they pick a date at random over the following 10 years for you to die. No point fighting it, it's just civilization's nasty little secret. 
This is why life expectancy correlates with education, but it does appear that the human limit for remembering both plane geometry and the names of hundreds of dead psychopaths is about 120 years. In the meantime, I built a test case to tickle stack overflow directly, and it does so quickly:

class Yuck:
    def __init__(self):
        self.i = 0

    def make_dangerous(self):
        self.i = 1

    def __hash__(self):
        # direct to slot 4 in table of size 8; slot 12 when size 16
        return 4 + 8

    def __eq__(self, other):
        if self.i == 0:
            # leave dict alone
            pass
        elif self.i == 1:
            # fiddle to 16 slots
            self.__fill_dict(6)
            self.i = 2
        else:
            # fiddle to 8 slots
            self.__fill_dict(4)
            self.i = 1
        return 1

    def __fill_dict(self, n):
        self.i = 0
        dict.clear()
        for i in range(n):
            dict[i] = i
        dict[self] = "OK!"

y = Yuck()
dict = {y: "OK!"}

z = Yuck()
y.make_dangerous()
print dict[z]

It just arranges to move y to a different slot in a different-sized table each time __eq__ is invoked, alternating between slot 4 in a size-8 table and slot 12 in a size-16 table. However, if I stick "print self.i" at the start of __eq__, it dies with a KeyError instead! That's why I'm mentioning it -- could be the same misdirection you're seeing. I can't account for the KeyError in any rational way: under Windows, it's actually hitting a stack overflow in the bowels of the system malloc() then. Windows "recovers" from that and presses on. Everything that happens after appears to be an accident. win98-as-usual-ly y'rs - tim PS: You'll be tested on this, too. From greg at cosc.canterbury.ac.nz Tue Jun 5 07:00:30 2001 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Tue, 05 Jun 2001 17:00:30 +1200 (NZST) Subject: [Python-Dev] One more dict trick In-Reply-To: <20010601032316.A15635@thyrsus.com> Message-ID: <200106050500.RAA02362@s454.cosc.canterbury.ac.nz> "Eric S. Raymond" : > I think it's significant that MMX > instructions and so forth entered the Intel line to support *games*, > not Navier-Stokes calculations. But when version 1.0 of FlashFlood! comes out, requiring high-quality real-time hydrodynamics simulation, Navier-Stokes calculations will suddenly become very important... Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg at cosc.canterbury.ac.nz +--------------------------------------+ From tim.one at home.com Tue Jun 5 07:18:50 2001 From: tim.one at home.com (Tim Peters) Date: Tue, 5 Jun 2001 01:18:50 -0400 Subject: [Python-Dev] strop vs. string In-Reply-To: <3B1B8B86.68E99328@STScI.Edu> Message-ID: [Paul Barrett] > From the discussion so far, it appears that the buffer object is > intended solely to support string-like objects. Unsure where that impression came from. Since buffers wrap a slice "of memory", they don't make much sense except where raw memory makes sense. That includes the guts of strings, but also (in the core distribution) memory-mapped files (the mmap module) and arrays (the array module), which also support the buffer interface. > I've seen no mention of their use for binary data objects, I mentioned two above. The use of buffers with mutable objects is dangerous, though, because of the dangling-pointer problem, and Python itself never uses buffers except for strings. Even arrays are stretching it; e.g.,

>>> import array
>>> a = array.array('i')
>>> a.append(2)
>>> a.append(3)
>>> a
array('i', [2, 3])
>>> b = buffer(a)
>>> len(b)
8
>>> [b[i] for i in range(len(b))]
['\x02', '\x00', '\x00', '\x00', '\x03', '\x00', '\x00', '\x00']
>>>

While of *some* conceivable use, that's not exactly destined to become wildly popular. > such as multidimensional arrays and matrices. Since core Python has no such things, of course it doesn't use buffers for those either. > Will the buffer object also support these objects? In what sense? If you have an implementation of such things, and believe that getting at raw memory slices is useful, sure -- fill in its tp_as_buffer slot. > ... > On the other hand, if yes, then I think the buffer C/API needs to be > reimplemented, Or do you mean redesigned? > because the current design/implementation falls far short of what I > would expect for a buffer object. First, it is overly complex: the > support for multiple buffers does not appear necessary. AFAICT it's entirely unused; everything in the core that supports the buffer interface returns a segment count of 1, and the buffer object itself appears to raise exceptions whenever it sees a reference to a segment other than "the first". I don't know why it's there. > Second, the dangling pointer issue has not been resolved. I expect Greg will fix that now. > I suggest the addition of a lock flag which indicates that the data is > currently inaccessible, ie. that data and/or data pointer is in the > process of being modified. To sell that (but please save it for the PEP) I expect you have to provide some compelling uses for it. The current uses have no need of it. In the absence of specific good uses, I'm afraid it just sounds like another variant of "I can't prove segments *won't* be useful, so let's toss them in too!". > I would suggest the following structure to be much more useful for > char and binary data:
>
> typedef struct {
>     char* rf_pointer;
>     int rf_length;
>     int rf_access;   /* read, write, etc. */
>     int rf_lock;     /* data is in use */
>     int rf_flags;    /* type of data; char, binary, unicode, etc. */
> } PyBufferProcs;
>
> But I'm guessing my proposal is way off base. Depends on what you want to do. You've only mentioned multidimensional arrays, and the need for umpteen flavors of access control there, beyond the current object's b_readonly flag, is simply unclear. Also unclear why you've dropped the current object's b_base pointer: without it, the buffer has no way to get back to the object from which the memory is borrowed, nor even a guarantee that the object won't die while the buffer is still active. If you do pursue this, please please please boost the rf_length field! An int is too small to hold real-life sizes anymore, and "large files" are becoming common even on 32-bit boxes. Python needs to grow a wholly supported way to pass 8-byte ints around (and it looks like I'll be adding that to the struct module, possibly to the array module and marshal too). > If I find some time, I'll prepare a PEP to air these issues, since > they are very important to those of us working on and with > multidimensional arrays. We find the current buffer API lacking. A PEP is always a good idea.
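To make the dangling-pointer hazard Tim mentions concrete, here is a small illustration (added here as a sketch; it is not code from the thread, and the exact symptoms depend on platform and malloc). A buffer borrows a pointer into a mutable object's memory when it is created, so growing the object can leave the buffer pointing at freed memory:

    import array

    a = array.array('i', [2, 3])
    b = buffer(a)            # b holds a pointer into a's internal memory

    for i in range(10000):
        a.append(i)          # appends may realloc() and move that memory

    # b may still read through the old pointer; what this prints
    # (or whether it crashes) is anybody's guess.
    print [b[i] for i in range(8)]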
From aahz at rahul.net Tue Jun 5 07:41:28 2001 From: aahz at rahul.net (Aahz Maruch) Date: Mon, 4 Jun 2001 22:41:28 -0700 (PDT) Subject: [Python-Dev] strop vs. string In-Reply-To: from "Tim Peters" at Jun 05, 2001 01:18:50 AM Message-ID: <20010605054129.933C199C83@waltz.rahul.net> Tim Peters wrote: > > If you do pursue this, please please please boost the rf_length field! An > int is too small to hold real-life sizes anymore, and "large files" are > becoming common even on 32-bit boxes. Python needs to grow a wholly > supported way to pass 8-byte ints around (and it looks like I'll be adding > that to the struct module, possibly to the array module and marshal too). Hey! Are you discriminating against 128-bit ints? -- --- Aahz (@pobox.com) Hugs and backrubs -- I break Rule 6 <*> http://www.rahul.net/aahz/ Androgynous poly kinky vanilla queer het Pythonista I don't really mind a person having the last whine, but I do mind someone else having the last self-righteous whine. From tim.one at home.com Tue Jun 5 07:53:26 2001 From: tim.one at home.com (Tim Peters) Date: Tue, 5 Jun 2001 01:53:26 -0400 Subject: [Python-Dev] One more dict trick In-Reply-To: <20010601032316.A15635@thyrsus.com> Message-ID: [Eric S. Raymond] > ... > So maybe there's a market for 128-bit floats after all. I think very small. There's a much larger market for 128-bit float *registers*, though -- in the "treat it as 2 64-bit, or 4 32-bit, floats, and operate on them in parallel" sense. That's the baby vector register view, and is already happening. > I'm still skeptical about how likely those applications are to > influence the architecture of general-purpose processors. I saw a > study once that said heavy-duty scientific floating point only > accounts for about 2% of the computing market -- and I think it's > significant that MMX instructions and so forth entered the Intel > line to support *games*, not Navier-Stokes calculations. Heh. I used to wonder about that, but not any more: games may have no more than entertainment (sometimes disguised as education) in mind, but what do the latest & greatest games do? Strive to simulate physical reality (sometimes with altered physical laws), just as closely as possible. Whether it's ray-tracing, effective motion-compression, or N-body simulations, games are easily as demanding as what computational chemists do. A difference is that general-purpose *compilers* aren't being taught how to use these "new" architectural gimmicks. All that new hardware sits unused unless you've got an app dipping into assembler, or into a hand-coded utility library written in assembler. The *general* market for pure floating-point can barely support what's left of the supercomputer industry anymore (btw, Cray never became a billion-dollar company even in its heyday, and what's left of them gets passed around for peanuts now). > That 2% will have to get a lot bigger before I can see Intel doubling > its word size again. It's not just the processor design; the word size > has huge implications for buses, memory controllers, and the whole > system architecture. Intel is just now getting its feet wet with 64-bit boxes. That was old news to me 20 years ago. All I hope to see 20 years from now is that somewhere along the way I got smart enough to drop computers and get a real life. by-then-the-whole-system-will-exist-in-the-superposition-of-a- single-plutonium-atom's-states-anyway-ly y'rs - tim From tim.one at home.com Tue Jun 5 07:55:48 2001 From: tim.one at home.com (Tim Peters) Date: Tue, 5 Jun 2001 01:55:48 -0400 Subject: [Python-Dev] strop vs. string In-Reply-To: <20010605054129.933C199C83@waltz.rahul.net> Message-ID: [Aahz] > Hey! Are you discriminating against 128-bit ints? Nope! I'm Guido's marketing guy: 128-bit ints will be the killer reason you need to upgrade to Python 3000, when the time comes. Python didn't get to where it is by giving away all the good stuff early. From MarkH at ActiveState.com Tue Jun 5 09:10:53 2001 From: MarkH at ActiveState.com (Mark Hammond) Date: Tue, 5 Jun 2001 17:10:53 +1000 Subject: [Python-Dev] strop vs. string In-Reply-To: <3B1B8B86.68E99328@STScI.Edu> Message-ID: > complex: the support for multiple buffers does not appear necessary. I seem to recall Guido telling me once that this was implemented for NumPy, specifically for some of their matrices. Not being a user of that package means that unfortunately I cannot be any more specific... I am confident Guido will recall the specific details... Mark. From mwh at python.net Tue Jun 5 10:39:24 2001 From: mwh at python.net (Michael Hudson) Date: Tue, 5 Jun 2001 09:39:24 +0100 (BST) Subject: [Python-Dev] another dict crasher In-Reply-To: Message-ID: Haven't run your example yet as my machine's not on at the moment. On Tue, 5 Jun 2001, Tim Peters wrote: > However, if I stick "print self.i" at the start of __eq__, it dies > with a KeyError instead! That's why I'm mentioning it -- could be the > same misdirection you're seeing. I can't account for the KeyError in > any rational way: under Windows, it's actually hitting a stack > overflow in the bowels of the system malloc() then. Hmm. It's quite likely that PyMem_Malloc (or whatever) crapping out and returning NULL will get turned into a MemoryError, which will then get turned into a KeyError, isn't it? I could believe that malloc would set up some fancy sigsegv-type handlers for memory management purposes which then get called when it tramples all over the end of the stack. But I'm making this up as I go along... > Windows "recovers" from that and presses on. Everything that happens > after appears to be an accident. > > win98-as-usual-ly y'rs - tim Well, linux seems to be similarly inscrutable here. One problem is that this is a pig to run under the debugger - setting a breakpoint on lookdict isn't a terribly interesting way to spend your time. I suppose you could just set the breakpoint on the recursive call... later. > PS: You'll be tested on this, too. Oh, piss off. Cheers, M. From guido at digicool.com Tue Jun 5 11:07:34 2001 From: guido at digicool.com (Guido van Rossum) Date: Tue, 05 Jun 2001 05:07:34 -0400 Subject: [Python-Dev] Happy event Message-ID: <200106050907.FAA08198@cj20424-a.reston1.va.home.com> I just wanted to send a note about a happy event in the Python family. Jeremy Hylton and his wife became the proud parents of twin girls on Sunday June 3rd. Please join Pythonlabs and Digital Creations in congratulating them, and wishing them much joy and luck. Also, don't expect Jeremy to be too responsive to email for the next 6-8 weeks. :) --Guido van Rossum (home page: http://www.python.org/~guido/) From uche.ogbuji at fourthought.com Tue Jun 5 14:28:45 2001 From: uche.ogbuji at fourthought.com (Uche Ogbuji) Date: Tue, 05 Jun 2001 06:28:45 -0600 Subject: [Python-Dev] One more dict trick In-Reply-To: Message from Greg Ewing of "Tue, 05 Jun 2001 17:00:30 +1200." <200106050500.RAA02362@s454.cosc.canterbury.ac.nz> Message-ID: <200106051228.f55CSjk18336@localhost.local> > "Eric S. Raymond" : > > > I think it's significant that MMX > > instructions and so forth entered the Intel line to support *games*, > > not Navier-Stokes calculations. > > But when version 1.0 of FlashFlood!
comes out, requiring > high-quality real-time hydrodynamics simulation, > Navier-Stokes calculations will suddenly become very > important... Shoot, I thought that was what Microsoft Hailstorm was all about. Path integrals about the atmospheric isobars, and all that... -- Uche Ogbuji Principal Consultant uche.ogbuji at fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA XML strategy, XML tools (http://4Suite.org), knowledge management From uche.ogbuji at fourthought.com Tue Jun 5 14:32:07 2001 From: uche.ogbuji at fourthought.com (Uche Ogbuji) Date: Tue, 05 Jun 2001 06:32:07 -0600 Subject: [Python-Dev] Happy event In-Reply-To: Message from Guido van Rossum of "Tue, 05 Jun 2001 05:07:34 EDT." <200106050907.FAA08198@cj20424-a.reston1.va.home.com> Message-ID: <200106051232.f55CW7618353@localhost.local> > I just wanted to send a note about a happy event in the Python family. > Jeremy Hylton and his wife became the proud parents of twin girls on > Sunday June 3rd. Please join Pythonlabs and Digital Creations in > congratulating them, and wishing them much joy and luck. > > Also, don't expect Jeremy to be too responsive to email for the next > 6-8 weeks. :) *twin* girls? Try 6-8 years. Congrats and felicits of the highest order, of course, Jeremy. -- Uche Ogbuji Principal Consultant uche.ogbuji at fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA XML strategy, XML tools (http://4Suite.org), knowledge management From Barrett at stsci.edu Tue Jun 5 14:53:46 2001 From: Barrett at stsci.edu (Paul Barrett) Date: Tue, 05 Jun 2001 08:53:46 -0400 Subject: [Python-Dev] Happy event References: <200106051232.f55CW7618353@localhost.local> Message-ID: <3B1CD65A.595E8CD@STScI.Edu> Uche Ogbuji wrote: > > > I just wanted to send a note about a happy event in the Python family. > > Jeremy Hylton and his wife became the proud parents of twin girls on > > Sunday June 3rd. Please join Pythonlabs and Digital Creations in > > congratulating them, and wishing them much joy and luck. > > > > Also, don't expect Jeremy to be too responsive to email for the next > > 6-8 weeks. :) > > *twin* girls? Try 6-8 years. > > Congrats and felicits of the highest order, of course, Jeremy. Actually girls are fine until about 13, after that I expect Jeremy won't be too responsive. Something about hormones and such. In any case, all the best, Jeremy! -- Paul Barrett, PhD Space Telescope Science Institute Phone: 410-338-4475 ESS/Science Software Group FAX: 410-338-4767 Baltimore, MD 21218 From aahz at rahul.net Tue Jun 5 16:41:10 2001 From: aahz at rahul.net (Aahz Maruch) Date: Tue, 5 Jun 2001 07:41:10 -0700 (PDT) Subject: [Python-Dev] Happy event In-Reply-To: <3B1CD65A.595E8CD@STScI.Edu> from "Paul Barrett" at Jun 05, 2001 08:53:46 AM Message-ID: <20010605144110.DD90C99C84@waltz.rahul.net> Paul Barrett wrote: > Uche Ogbuji wrote: >> Guido: >>> >>> Also, don't expect Jeremy to be too responsive to email for the next >>> 6-8 weeks. :) >> >> *twin* girls? Try 6-8 years. > > Actually girls are fine until about 13, after that I expect Jeremy > won't be too responsive. Something about hormones and such. Are you trying to imply that there's a difference between girls and boys? 
compressing-a-five-screen-rant-down-to-a-single-sentence-ly y'rs -- --- Aahz (@pobox.com) Hugs and backrubs -- I break Rule 6 <*> http://www.rahul.net/aahz/ Androgynous poly kinky vanilla queer het Pythonista I don't really mind a person having the last whine, but I do mind someone else having the last self-righteous whine. From esr at thyrsus.com Tue Jun 5 16:55:59 2001 From: esr at thyrsus.com (Eric S. Raymond) Date: Tue, 5 Jun 2001 10:55:59 -0400 Subject: [Python-Dev] Happy event In-Reply-To: <20010605144110.DD90C99C84@waltz.rahul.net>; from aahz@rahul.net on Tue, Jun 05, 2001 at 07:41:10AM -0700 References: <3B1CD65A.595E8CD@STScI.Edu> <20010605144110.DD90C99C84@waltz.rahul.net> Message-ID: <20010605105559.A28963@thyrsus.com> Aahz Maruch : > Paul Barrett wrote: > > Uche Ogbuji wrote: > >> Guido: > >>> > >>> Also, don't expect Jeremy to be too responsive to email for the next > >>> 6-8 weeks. :) > >> > >> *twin* girls? Try 6-8 years. > > > > Actually girls are fine until about 13, after that I expect Jeremy > > won't be too responsive. Something about hormones and such. > > Are you trying to imply that there's a difference between girls and > boys? Of course there's a difference. Girls, er, *mature* sooner. Congratulations, Jeremy! -- Eric S. Raymond If I were to select a jack-booted group of fascists who are perhaps as large a danger to American society as I could pick today, I would pick BATF [the Bureau of Alcohol, Tobacco, and Firearms]. -- U.S. Representative John Dingell, 1980 From pedroni at inf.ethz.ch Tue Jun 5 17:05:03 2001 From: pedroni at inf.ethz.ch (Samuele Pedroni) Date: Tue, 5 Jun 2001 17:05:03 +0200 (MET DST) Subject: [Python-Dev] Happy event Message-ID: <200106051505.RAA24810@core.inf.ethz.ch> > Subject: Re: [Python-Dev] Happy event > To: Barrett at stsci.edu (Paul Barrett) > Cc: python-dev at python.org > MIME-Version: 1.0 > Content-Transfer-Encoding: 7bit > From: aahz at rahul.net (Aahz Maruch) > X-BeenThere: python-dev at python.org > X-Mailman-Version: 2.0.5 (101270) > List-Help: > List-Post: > List-Subscribe: , > List-Id: Python core developers > List-Unsubscribe: , > List-Archive: > Date: Tue, 5 Jun 2001 07:41:10 -0700 (PDT) > > Paul Barrett wrote: > > Uche Ogbuji wrote: > >> Guido: > >>> > >>> Also, don't expect Jeremy to be too responsive to email for the next > >>> 6-8 weeks. :) > >> > >> *twin* girls? Try 6-8 years. > > > > Actually girls are fine until about 13, after that I expect Jeremy > > won't be too responsive. Something about hormones and such. > > Are you trying to imply that there's a difference between girls and > boys? > > compressing-a-five-screen-rant-down-to-a-single-sentence-ly y'rs > -- The simple fact that we are still moving from the previous bad habit of considering them different to considering them equal itself produces differences. A neutral view-point would be: the N/S ratio between gender-physiological differences and the overall interpersonal differences is very big, at least when considering the whole personality and not single aspects. There is no established truth; we are just longing for equilibrium: in the actual transition phase boys and girls are under different kinds of cultural tensions related to self-identification, etc. ... this creates differences. regards, Samuele Pedroni. From aahz at rahul.net Tue Jun 5 17:17:38 2001 From: aahz at rahul.net (Aahz Maruch) Date: Tue, 5 Jun 2001 08:17:38 -0700 (PDT) Subject: [Python-Dev] Happy event In-Reply-To: <20010605105559.A28963@thyrsus.com> from "Eric S. Raymond" at Jun 05, 2001 10:55:59 AM Message-ID: <20010605151739.3864199C83@waltz.rahul.net> Eric S. Raymond wrote: > Aahz Maruch : >> >> Are you trying to imply that there's a difference between girls and >> boys? > > Of course there's a difference. Girls, er, *mature* sooner. Not legally. -- --- Aahz (@pobox.com) Hugs and backrubs -- I break Rule 6 <*> http://www.rahul.net/aahz/ Androgynous poly kinky vanilla queer het Pythonista I don't really mind a person having the last whine, but I do mind someone else having the last self-righteous whine. From esr at thyrsus.com Tue Jun 5 17:30:08 2001 From: esr at thyrsus.com (Eric S. Raymond) Date: Tue, 5 Jun 2001 11:30:08 -0400 Subject: [Python-Dev] Happy event In-Reply-To: <20010605151739.3864199C83@waltz.rahul.net>; from aahz@rahul.net on Tue, Jun 05, 2001 at 08:17:38AM -0700 References: <20010605105559.A28963@thyrsus.com> <20010605151739.3864199C83@waltz.rahul.net> Message-ID: <20010605113008.A29236@thyrsus.com> Aahz Maruch : > Eric S. Raymond wrote: > > Aahz Maruch : > >> > >> Are you trying to imply that there's a difference between girls and > >> boys? > > > > Of course there's a difference. Girls, er, *mature* sooner. > > Not legally. My point was that the hormone thing is likely to be an issue sooner with twin girls. Hey, Jeremy...fraternal or identical? -- Eric S. Raymond What is a magician but a practicing theorist? -- Obi-Wan Kenobi, 'Return of the Jedi' From guido at digicool.com Tue Jun 5 19:21:32 2001 From: guido at digicool.com (Guido van Rossum) Date: Tue, 05 Jun 2001 13:21:32 -0400 Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects? Message-ID: <200106051721.f55HLW729400@odiug.digicool.com> While thinking about metatypes, I had an interesting idea. In PEP 252 and 253 (which still need much work, please bear with me!) I describe making classes and types more similar to each other. In particular, you'll be able to subclass built-in object types in much the same way as you can subclass user-defined classes today. One nice property of classes is that a class is a factory function for its instances; in other words, if C is a class, C() returns a C instance. Now, for built-in types, it makes sense to do the same. In my current prototype, after "from types import *", DictType() returns an empty dictionary and ListType() returns an empty list. It would be nice to take this much further: IntType() could return an integer, TupleType() could return a tuple, StringType() could return a string, and so on. These are immutable types, so to make this useful, these constructors need to take an argument to specify a specific value. What should the type of such an argument be? It's not very interesting to require that int(x) takes an integer argument! Most of the popular standard types already have a constructor function that's named after their type: int(), long(), float(), complex(), str(), unicode(), tuple(), list() We could make the constructor take the same argument(s) as the corresponding built-in function. Now invoke the Zen of Python: "There should be one-- and preferably only one --obvious way to do it." So why not make these built-in functions *be* the corresponding types? Then instead of

    >>> int
    <built-in function int>

you would see

    >>> int
    <type 'int'>

but otherwise the behavior would be identical. (Note that I don't require that a factory function returns a *new* object each time.) If we did this for all built-in types, we'd have to add maybe a dozen new built-in names -- I think that's no big deal and actually helps naming types.
The types module, with its awkward names and usage, can be deprecated. There are details to be worked out, e.g.

- Do we really want to have built-in names for code objects, traceback objects, and other figments of Python's internal workings?

- What should the argument to dict() be? A list of (key, value) pairs, a list of alternating keys and values, or something else?

- What else?

Comments? --Guido van Rossum (home page: http://www.python.org/~guido/) From fredrik at pythonware.com Tue Jun 5 19:34:35 2001 From: fredrik at pythonware.com (Fredrik Lundh) Date: Tue, 5 Jun 2001 19:34:35 +0200 Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects? References: <200106051721.f55HLW729400@odiug.digicool.com> Message-ID: <001301c0ede5$cb804a10$e46940d5@hagrid> guido wrote: > Now invoke the Zen of Python: "There should be one-- and preferably > only one --obvious way to do it." So why not make these built-in > functions *be* the corresponding types? +1 from here. > - Do we really want to have built-in names for code objects, traceback > objects, and other figments of Python's internal workings? nope. > - What should the argument to dict() be? A list of (key, value) > pairs, a list of alternating keys and values, or something else? how about supporting the following:

    d == dict(d.items())
    d == dict(d.keys(), d.values())

and also:

    d = dict(k=v, k=v, ...)

Cheers /F From ping at lfw.org Tue Jun 5 19:41:22 2001 From: ping at lfw.org (Ka-Ping Yee) Date: Tue, 5 Jun 2001 12:41:22 -0500 (CDT) Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects? In-Reply-To: <200106051721.f55HLW729400@odiug.digicool.com> Message-ID: On Tue, 5 Jun 2001, Guido van Rossum wrote: > Now invoke the Zen of Python: "There should be one-- and preferably > only one --obvious way to do it." So why not make these built-in > functions *be* the corresponding types? Then instead of > > >>> int > <built-in function int> > > you would see > > >>> int > <type 'int'> I'm all in favour of this. In fact, i had the impression that you were planning to do exactly this all along. I seem to recall some conversation about this a long time ago -- am i dreaming? > If we did this for all built-in types, we'd have to add maybe a dozen > new built-in names -- I think that's no big deal and actually helps > naming types. The types module, with its awkward names and usage, can > be deprecated. I would love this. > - Do we really want to have built-in names for code objects, traceback > objects, and other figments of Python's internal workings? Perhaps we would only provide built-in names for objects that are commonly constructed. For things like code objects that are never user-constructed, their type objects could be set aside in a module. > - What should the argument to dict() be? A list of (key, value) > pairs, a list of alternating keys and values, or something else? A list of (key, value) pairs. It's the only sensible choice, given that dict.items() is the obvious way to get all the information out of a dictionary into a list. -- ?!ng
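A sketch of the items() round trip Ping describes (an added illustration, not code from the thread; dict() itself took no such argument at the time, so a helper stands in for it):

    def dict_from_items(items):
        # build a dictionary from a list of (key, value) pairs
        d = {}
        for k, v in items:
            d[k] = v
        return d

    d = {'a': 97, 'b': 98}
    assert dict_from_items(d.items()) == d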
From aahz at rahul.net Tue Jun 5 19:40:27 2001 From: aahz at rahul.net (Aahz Maruch) Date: Tue, 5 Jun 2001 10:40:27 -0700 (PDT) Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects? In-Reply-To: <200106051721.f55HLW729400@odiug.digicool.com> from "Guido van Rossum" at Jun 05, 2001 01:21:32 PM Message-ID: <20010605174027.17A4199C83@waltz.rahul.net> I'm +1 on the general concept; I think it will make explaining Python easier in the long run. I'm not competent to vote on the details, but I'll complain if something seems too confused to me. Currently in the Decimal class I'm working on, I can take any of the following types in the constructor: Decimal, tuple, string, int, float. I'm wondering whether that approach makes sense, that any "compatible" type should be accepted in an explicit constructor. So for your question about dict(), perhaps any sequence/iterator type that returns 2-element sequences would be accepted. -- --- Aahz (@pobox.com) Hugs and backrubs -- I break Rule 6 <*> http://www.rahul.net/aahz/ Androgynous poly kinky vanilla queer het Pythonista I don't really mind a person having the last whine, but I do mind someone else having the last self-righteous whine. From donb at abinitio.com Tue Jun 5 19:50:34 2001 From: donb at abinitio.com (Donald Beaudry) Date: Tue, 05 Jun 2001 13:50:34 -0400 Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects? References: <200106051721.f55HLW729400@odiug.digicool.com> Message-ID: <200106051750.NAA25458@localhost.localdomain> Guido van Rossum wrote, > Now invoke the Zen of Python: "There should be one-- and preferably > only one --obvious way to do it." So why not make these built-in > functions *be* the corresponding types? I like it! > but otherwise the behavior would be identical. (Note that I don't > require that a factory function returns a *new* object each time.) Of course... singletons (which would also break that requirement) are quite useful. > If we did this for all built-in types, we'd have to add maybe a dozen > new built-in names -- I think that's no big deal and actually helps > naming types. The types module, with its awkward names and usage, can > be deprecated. > > There are details to be worked out, e.g. > > - Do we really want to have built-in names for code objects, traceback > objects, and other figments of Python's internal workings? I don't think so. Having easy access to these things might be good, but since they are implementation specific it might be best to discourage their use by putting them somewhere more implementation specific, like the new module or even sys. > - What should the argument to dict() be? A list of (key, value) > pairs, a list of alternating keys and values, or something else? At a minimum, I'd like to see a list of key/value tuples. I seem to find myself reconstructing dicts from the .items() of other dicts. For 'something else', I'd like to be able to pass keyword arguments to initialize the new dict. Going really crazy, I'd like to be able to pass a dict as an argument to dict()... just another way to spell copy, but combined with keywords, it would be more like copy followed by an update. > - What else? Well, since you are asking ;) I haven't read the PEP, so perhaps I shouldn't be commenting just yet, but... I'd hope that the built-in types are sub-classable from C as well as from Python. This is most interesting for types like instance, class, method, but I can imagine reasons for doing it to tuple, list, dict, and even int. > Comments? Fantastic! -- Donald Beaudry Ab Initio Software Corp. 201 Spring Street donb at init.com Lexington, MA 02421 ...Will hack for sushi... From mal at lemburg.com Tue Jun 5 19:53:18 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Tue, 05 Jun 2001 19:53:18 +0200 Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects? References: <200106051721.f55HLW729400@odiug.digicool.com> Message-ID: <3B1D1C8E.B7770419@lemburg.com> Guido van Rossum wrote: > > While thinking about metatypes, I had an interesting idea. > > In PEP 252 and 253 (which still need much work, please bear with me!) > I describe making classes and types more similar to each other. In > particular, you'll be able to subclass built-in object types in much > the same way as you can subclass user-defined classes today. One nice > property of classes is that a class is a factory function for its > instances; in other words, if C is a class, C() returns a C instance. > > Now, for built-in types, it makes sense to do the same. In my current > prototype, after "from types import *", DictType() returns an empty > dictionary and ListType() returns an empty list. It would be nice > to take this much further: IntType() could return an integer, TupleType() > could return a tuple, StringType() could return a string, and so on. > These are immutable types, so to make this useful, these constructors > need to take an argument to specify a specific value. What should the > type of such an argument be? It's not very interesting to require > that int(x) takes an integer argument! > > Most of the popular standard types already have a constructor function > that's named after their type: > > int(), long(), float(), complex(), str(), unicode(), tuple(), list() > > We could make the constructor take the same argument(s) as the > corresponding built-in function. > > Now invoke the Zen of Python: "There should be one-- and preferably > only one --obvious way to do it." So why not make these built-in > functions *be* the corresponding types? Then instead of
>
> >>> int
> <built-in function int>
>
> you would see
>
> >>> int
> <type 'int'>
>
> but otherwise the behavior would be identical. (Note that I don't > require that a factory function returns a *new* object each time.) -1 While this looks cute, I think it would break a lot of introspection code or other code which special cases Python functions for some reason since type(int) would no longer return types.BuiltinFunctionType. If you don't like the names, why not take the chance and create a new module which then exposes the Python class hierarchy (much like we did with the exceptions.py module before it was integrated as a C module) ?! > If we did this for all built-in types, we'd have to add maybe a dozen > new built-in names -- I think that's no big deal and actually helps > naming types. The types module, with its awkward names and usage, can > be deprecated. > > There are details to be worked out, e.g. > > - Do we really want to have built-in names for code objects, traceback > objects, and other figments of Python's internal workings? Not really. > - What should the argument to dict() be? A list of (key, value) > pairs, a list of alternating keys and values, or something else? As a function, I'd say: take either a sequence of tuples or another dictionary as argument. mxTools already has such a function, BTW. -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/
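A concrete rendering of the introspection breakage M.-A. is pointing at, under Python 2.1 semantics (an added, hypothetical illustration, not code from the thread):

    import types

    # Today this prints 1: int is a built-in function.
    print type(int) is types.BuiltinFunctionType

    # Under the proposal, type(int) would be the type-object type
    # instead, so any special-casing keyed on BuiltinFunctionType
    # would silently stop matching.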
From skip at pobox.com Tue Jun 5 20:12:09 2001 From: skip at pobox.com (Skip Montanaro) Date: Tue, 5 Jun 2001 13:12:09 -0500 Subject: [Python-Dev] Shouldn't I be able to print Unicode objects? Message-ID: <15133.8441.983687.572159@beluga.mojam.com> Just catching up on a little c.l.py and I noticed the effbot's response to the Unicode degree inquiry. I tried to create and print one and got this:

% python
Python 2.1.1a1 (#9, Jun 4 2001, 11:32:33)
[GCC 2.96 20000731 (Linux-Mandrake 8.0 2.96-0.48mdk)] on linux2
Type "copyright", "credits" or "license" for more information.
>>> u"\N{DEGREE SIGN}"
u'\xb0'
>>> print u"\N{DEGREE SIGN}"
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeError: ASCII encoding error: ordinal not in range(128)

Shouldn't I be able to print arbitrary Unicode objects? What am I missing (this time)? Skip From mwh at python.net Tue Jun 5 20:16:52 2001 From: mwh at python.net (Michael Hudson) Date: 05 Jun 2001 19:16:52 +0100 Subject: [Python-Dev] Shouldn't I be able to print Unicode objects? In-Reply-To: Skip Montanaro's message of "Tue, 5 Jun 2001 13:12:09 -0500" References: <15133.8441.983687.572159@beluga.mojam.com> Message-ID: Skip Montanaro writes: > Just catching up on a little c.l.py and I noticed the effbot's response to > the Unicode degree inquiry. I tried to create and print one and got this: > > % python > Python 2.1.1a1 (#9, Jun 4 2001, 11:32:33) > [GCC 2.96 20000731 (Linux-Mandrake 8.0 2.96-0.48mdk)] on linux2 > Type "copyright", "credits" or "license" for more information. > >>> u"\N{DEGREE SIGN}" > u'\xb0' > >>> print u"\N{DEGREE SIGN}" > > Traceback (most recent call last): > File "<stdin>", line 1, in ? > UnicodeError: ASCII encoding error: ordinal not in range(128) > > Shouldn't I be able to print arbitrary Unicode objects? What am I missing > (this time)? The encoding:

>>> print u"\N{DEGREE SIGN}".encode("latin1")
?

Cheers, Skippy's little helper. -- In case you're not a computer person, I should probably point out that "Real Soon Now" is a technical term meaning "sometime before the heat-death of the universe, maybe". -- Scott Fahlman From guido at digicool.com Tue Jun 5 20:26:22 2001 From: guido at digicool.com (Guido van Rossum) Date: Tue, 05 Jun 2001 14:26:22 -0400 Subject: [Python-Dev] SourceForge Python Foundry needs help Message-ID: <200106051826.f55IQMS29540@odiug.digicool.com> The Python Foundry at SF could use a hand. If you're interested in helping out, please write to Chuck Esterbrook, below! --Guido van Rossum (home page: http://www.python.org/~guido/) ------- Forwarded Message Date: Tue, 05 Jun 2001 14:12:07 -0400 From: Chuck Esterbrook To: guido at python.org Subject: SourceForge Python Foundry Hi Guido, I'm one of the admins of the SourceForge Python Foundry. In case you're not familiar with them, foundries are simply SF web portals centered around a particular topic. Admins can customize the HTML text and graphics and SourceForge stats are integrated on the side. I haven't had much time to give the Python Foundry the attention it deserves. I was wondering if you knew of anyone who had the inclination, time and energy to join the Foundry as an admin and expand it. If it becomes strong enough, we could possibly get it featured on the sidebar of the main SF page, which would then bring more attention to Python and its related projects. The foundry is at: http://sourceforge.net/foundry/python-foundry/ - -Chuck ------- End of Forwarded Message From barry at digicool.com Tue Jun 5 20:31:12 2001 From: barry at digicool.com (Barry A. Warsaw) Date: Tue, 5 Jun 2001 14:31:12 -0400 Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects?
References: <200106051721.f55HLW729400@odiug.digicool.com> Message-ID: <15133.9584.871074.255497@anthem.wooz.org> >>>>> "GvR" == Guido van Rossum writes: GvR> Now invoke the Zen of Python: "There should be one-- and GvR> preferably only one --obvious way to do it." So why not make GvR> these built-in functions *be* the corresponding types? Then GvR> instead of >> int GvR> GvR> you would see >> int GvR> +1 GvR> but otherwise the behavior would be identical. (Note that I GvR> don't require that a factory function returns a *new* object GvR> each time.) GvR> If we did this for all built-in types, we'd have to add maybe GvR> a dozen new built-in names -- I think that's no big deal and GvR> actually helps naming types. The types module, with its GvR> awkward names and usage, can be deprecated. I'm a little concerned about this, since the names that would be added are probably in common use as variable and/or argument names. I.e. At one point `list' was a very common identifier in Mailman, and I'm sure `dict' is used quite often still. I guess this would be okay as long as working code doesn't break because of it. OTOH, I've had fewer needs for a dict builtin (though not non-zero), and easily zero needs for traceback objects, code objects, etc. GvR> There are details to be worked out, e.g. GvR> - Do we really want to have built-in names for code objects, GvR> traceback objects, and other figments of Python's internal GvR> workings? I'd say no. However, we could probably C-ify the types module, a la, the exceptions module, and that would be the logical place to put the type factories. GvR> - What should the argument to dict() be? A list of (key, GvR> value) pairs, a list of alternating keys and values, or GvR> something else? You definitely want to at least accept a sequence of key/value 2-tuples, so that d.items() can be retransformed into a dictionary object. -Barry From guido at digicool.com Tue Jun 5 20:38:23 2001 From: guido at digicool.com (Guido van Rossum) Date: Tue, 05 Jun 2001 14:38:23 -0400 Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects? In-Reply-To: Your message of "Tue, 05 Jun 2001 14:31:12 EDT." <15133.9584.871074.255497@anthem.wooz.org> References: <200106051721.f55HLW729400@odiug.digicool.com> <15133.9584.871074.255497@anthem.wooz.org> Message-ID: <200106051838.f55IcNk29624@odiug.digicool.com> > I'm a little concerned about this, since the names that would be added > are probably in common use as variable and/or argument names. I.e. At > one point `list' was a very common identifier in Mailman, and I'm sure > `dict' is used quite often still. I guess this would be okay as long > as working code doesn't break because of it. It would be hard to see how this would break code, since built-ins are searched *after* all variables that the user defines. --Guido van Rossum (home page: http://www.python.org/~guido/) From bckfnn at worldonline.dk Tue Jun 5 20:46:04 2001 From: bckfnn at worldonline.dk (Finn Bock) Date: Tue, 05 Jun 2001 18:46:04 GMT Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects? In-Reply-To: <200106051721.f55HLW729400@odiug.digicool.com> References: <200106051721.f55HLW729400@odiug.digicool.com> Message-ID: <3b1d2894.16564838@smtp.worldonline.dk> [Guido] >Now invoke the Zen of Python: "There should be one-- and preferably >only one --obvious way to do it." So why not make these built-in >functions *be* the corresponding types? 
Then instead of
>
> >>> int
> <built-in function int>
>
>you would see
>
> >>> int
> <type 'int'>
>
>but otherwise the behavior would be identical.  (Note that I don't
>require that a factory function returns a *new* object each time.)

I think that it will be difficult to avoid creating a new object under jython because calling a type already directly calls the type's java constructor.

>If we did this for all built-in types, we'd have to add maybe a dozen
>new built-in names -- I think that's no big deal and actually helps
>naming types.  The types module, with its awkward names and usage, can
>be deprecated.
>
>There are details to be worked out, e.g.
>
>- Do we really want to have built-in names for code objects, traceback
>  objects, and other figments of Python's internal workings?
>
>- What should the argument to dict() be?  A list of (key, value)
>  pairs, a list of alternating keys and values, or something else?

Jython already interprets the arguments to the dict type as alternating key/values:

>>> from types import DictType as dict
>>> dict('a', 97, 'b', 98, 'c', 99)
{'b': 98, 'a': 97, 'c': 99}
>>>

This behaviour isn't documented on the python side so it can be changed. However, it is necessary to maintain this API on the java side and we have currently no way to prevent the type constructors from being visible and callable from python. Whatever is decided, I hope jython can keep the current semantics of its dict type.

regards, finn

From fdrake at acm.org Tue Jun 5 21:11:58 2001
From: fdrake at acm.org (Fred L. Drake, Jr.)
Date: Tue, 5 Jun 2001 15:11:58 -0400 (EDT)
Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects?
In-Reply-To: <3b1d2894.16564838@smtp.worldonline.dk>
References: <200106051721.f55HLW729400@odiug.digicool.com> <3b1d2894.16564838@smtp.worldonline.dk>
Message-ID: <15133.12030.538647.295809@cj42289-a.reston1.va.home.com>

Finn Bock writes:
> >>> from types import DictType as dict
> >>> dict('a', 97, 'b', 98, 'c', 99)
> {'b': 98, 'a': 97, 'c': 99}
> >>>
>
> This behaviour isn't documented on the python side so it can be changed.
> However, it is necessary to maintain this API on the java side and we
> have currently no way to prevent the type constructors from being
> visible and callable from python.

This should not be a problem: If dict() is called with one arg, the new semantics can be used, but with an odd number of args, your existing semantics can be used.

-Fred

-- Fred L. Drake, Jr. PythonLabs at Digital Creations

From skip at pobox.com Tue Jun 5 21:23:54 2001
From: skip at pobox.com (Skip Montanaro)
Date: Tue, 5 Jun 2001 14:23:54 -0500
Subject: [Python-Dev] Shouldn't I be able to print Unicode objects?
In-Reply-To:
References: <15133.8441.983687.572159@beluga.mojam.com>
Message-ID: <15133.12746.666351.127286@beluga.mojam.com>

    Me> [what am I missing?]

    Michael> The encoding:

    >>> print u"\N{DEGREE SIGN}".encode("latin1")
    °

Hmmm... I don't believe I've ever encountered an object in Python before that you couldn't simply print. Are Unicode objects unique in this respect? Seems like a bug (or at least a feature) to me.

Skip

From mwh at python.net Tue Jun 5 21:31:33 2001
From: mwh at python.net (Michael Hudson)
Date: 05 Jun 2001 20:31:33 +0100
Subject: [Python-Dev] Shouldn't I be able to print Unicode objects?
In-Reply-To: Skip Montanaro's message of "Tue, 5 Jun 2001 14:23:54 -0500"
References: <15133.8441.983687.572159@beluga.mojam.com> <15133.12746.666351.127286@beluga.mojam.com>
Message-ID:

Skip Montanaro writes:

>     Me> [what am I missing?]
>
>     Michael> The encoding:
>
>     >>> print u"\N{DEGREE SIGN}".encode("latin1")
>     °
>
> Hmmm... I don't believe I've ever encountered an object in Python before
> that you couldn't simply print.  Are Unicode objects unique in this respect?
> Seems like a bug (or at least a feature) to me.

Well, what would you have

>>> print u"\N{DEGREE SIGN}"

(or equivalently

str(u"\N{DEGREE SIGN}")

since we're eventually going to have to stuff an 8-bit string down stdout) do? I don't think

>>> print u"\N{DEGREE SIGN}"
u'\xb0'

is really an option.

This is old news. It must have been discussed here before 1.6, I'd have thought.

Cheers, M.

-- 58. Fools ignore complexity. Pragmatists suffer it. Some can avoid it. Geniuses remove it. -- Alan Perlis, http://www.cs.yale.edu/homes/perlis-alan/quotes.html

From barry at digicool.com Tue Jun 5 21:46:54 2001
From: barry at digicool.com (Barry A. Warsaw)
Date: Tue, 5 Jun 2001 15:46:54 -0400
Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects?
References: <200106051721.f55HLW729400@odiug.digicool.com> <15133.9584.871074.255497@anthem.wooz.org> <200106051838.f55IcNk29624@odiug.digicool.com>
Message-ID: <15133.14126.221568.235269@anthem.wooz.org>

>>>>> "GvR" == Guido van Rossum writes:

    >> I'm a little concerned about this, since the names that would
    >> be added are probably in common use as variable and/or argument
    >> names.  I.e. At one point `list' was a very common identifier
    >> in Mailman, and I'm sure `dict' is used quite often still.  I
    >> guess this would be okay as long as working code doesn't break
    >> because of it.

    GvR> It would be hard to see how this would break code, since
    GvR> built-ins are searched *after* all variables that the user
    GvR> defines.

Wasn't there talk about issuing warnings for locals shadowing built-ins (or was that globals?). If not, fergitaboutit. If so, that would fall under the category of "breaking".

-Barry

From tim at digicool.com Tue Jun 5 21:56:59 2001
From: tim at digicool.com (Tim Peters)
Date: Tue, 5 Jun 2001 15:56:59 -0400
Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects?
Message-ID:

Just to reduce this to its most trivial point <wink>,

> - What should the argument to dict() be?  A list of (key, value)
>   pairs, a list of alternating keys and values, or something else?

the middle one (perhaps generalized to "iterable object alternately producing keys and values") is most useful in practice. Perl gets a lot of mileage out of that, e.g. think of using re.findall() to build a list of mail-header field, value, field, value, ... thingies to feed to a dict. A list of (key, value) pairs is prettiest, but almost nothing *produces* such a list except for dict.items(); we don't need another way to spell dict.copy().

From guido at digicool.com Tue Jun 5 21:56:05 2001
From: guido at digicool.com (Guido van Rossum)
Date: Tue, 05 Jun 2001 15:56:05 -0400
Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects?
In-Reply-To: Your message of "Tue, 05 Jun 2001 15:46:54 EDT." <15133.14126.221568.235269@anthem.wooz.org>
References: <200106051721.f55HLW729400@odiug.digicool.com> <15133.9584.871074.255497@anthem.wooz.org> <200106051838.f55IcNk29624@odiug.digicool.com> <15133.14126.221568.235269@anthem.wooz.org>
Message-ID: <200106051956.f55Ju5130078@odiug.digicool.com>

> >>>>> "GvR" == Guido van Rossum writes:
>
>     >> I'm a little concerned about this, since the names that would
>     >> be added are probably in common use as variable and/or argument
>     >> names.  I.e.
At one point `list' was a very common identifier
>     >> in Mailman, and I'm sure `dict' is used quite often still.  I
>     >> guess this would be okay as long as working code doesn't break
>     >> because of it.
>
>     GvR> It would be hard to see how this would break code, since
>     GvR> built-ins are searched *after* all variables that the user
>     GvR> defines.
>
> Wasn't there talk about issuing warnings for locals shadowing
> built-ins (or was that globals?).  If not, fergitaboutit.  If so, that
> would fall under the category of "breaking".
>
> -Barry

You may be thinking of this:

>>> def f(int):
...     def g():
...         int
...
<stdin>:1: SyntaxWarning: local name 'int' in 'f' shadows use of 'int' as global in nested scope 'g'
>>>

This warns you when you override a built-in or global *and* you use that same name in a nested function. This code will mean something different in 2.2 anyway (g's reference to int will become a reference to f's int because of nested scopes).

But this does not cause a warning:

>>> def g():
...     int = 12
...
>>>

Nor does this:

>>> int = 12
>>>

So we're safe.

--Guido van Rossum (home page: http://www.python.org/~guido/)

From skip at pobox.com Tue Jun 5 22:01:47 2001
From: skip at pobox.com (Skip Montanaro)
Date: Tue, 5 Jun 2001 15:01:47 -0500
Subject: [Python-Dev] Shouldn't I be able to print Unicode objects?
In-Reply-To:
References: <15133.8441.983687.572159@beluga.mojam.com> <15133.12746.666351.127286@beluga.mojam.com>
Message-ID: <15133.15019.237484.605267@beluga.mojam.com>

    Michael> Well, what would you have

    >>>> print u"\N{DEGREE SIGN}"

    Michael> (or equivalently

    Michael> str(u"\N{DEGREE SIGN}")

    Michael> since we're eventually going to have to stuff an 8-bit string
    Michael> down stdout) do?

How about if print calls the .encode("latin1") method for me when it gets an ASCII encoding error? If "latin1" isn't a reasonable default choice, it could pick an encoding based on the current locale.

    Michael> I don't think

    >>>> print u"\N{DEGREE SIGN}"
    Michael> u'\xb0'

    Michael> is really an option.

I agree. I'd like to see a little circle.

    Michael> This is old news.  It must have been discussed here before 1.6,
    Michael> I'd have thought.

Perhaps, but I suspect many people suffered from glazing over of the eyes reading all the messages exchanged about Unicode arcana. I know I did.

Skip

From barry at digicool.com Tue Jun 5 22:01:29 2001
From: barry at digicool.com (Barry A. Warsaw)
Date: Tue, 5 Jun 2001 16:01:29 -0400
Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects?
References: <200106051721.f55HLW729400@odiug.digicool.com> <15133.9584.871074.255497@anthem.wooz.org> <200106051838.f55IcNk29624@odiug.digicool.com> <15133.14126.221568.235269@anthem.wooz.org> <200106051956.f55Ju5130078@odiug.digicool.com>
Message-ID: <15133.15001.19308.108288@anthem.wooz.org>

>>>>> "GvR" == Guido van Rossum writes:

    GvR> You may be thinking of this:

Yup.

    GvR> So we're safe.

Cool! Count me as a solid +1 then.

-Barry

From aahz at rahul.net Tue Jun 5 22:10:06 2001
From: aahz at rahul.net (Aahz Maruch)
Date: Tue, 5 Jun 2001 13:10:06 -0700 (PDT)
Subject: [Python-Dev] Shouldn't I be able to print Unicode objects?
In-Reply-To: <15133.15019.237484.605267@beluga.mojam.com> from "Skip Montanaro" at Jun 05, 2001 03:01:47 PM
Message-ID: <20010605201006.15CAD99C83@waltz.rahul.net>

Skip Montanaro wrote:
>
> Perhaps, but I suspect many people suffered from glazing over of the eyes
> reading all the messages exchanged about Unicode arcana.  I know I did.

Ditto.
-- --- Aahz (@pobox.com) Hugs and backrubs -- I break Rule 6 <*> http://www.rahul.net/aahz/ Androgynous poly kinky vanilla queer het Pythonista I don't really mind a person having the last whine, but I do mind someone else having the last self-righteous whine.

From mal at lemburg.com Tue Jun 5 22:14:39 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Tue, 05 Jun 2001 22:14:39 +0200
Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects?
References: <200106051721.f55HLW729400@odiug.digicool.com> <3B1D1C8E.B7770419@lemburg.com>
Message-ID: <3B1D3DAF.DAE727AE@lemburg.com>

> > [Guido]
> > Now invoke the Zen of Python: "There should be one-- and preferably
> > only one --obvious way to do it."  So why not make these built-in
> > functions *be* the corresponding types?  Then instead of
> >
> >     >>> int
> >     <built-in function int>
> >
> > you would see
> >
> >     >>> int
> >     <type 'int'>
> >
> > but otherwise the behavior would be identical.  (Note that I don't
> > require that a factory function returns a *new* object each time.)
>
> -1
>
> While this looks cute, I think it would break a lot of introspection
> code or other code which special cases Python functions for
> some reason since type(int) would no longer return
> types.BuiltinFunctionType.
>
> If you don't like the names, why not take the chance and
> create a new module which then exposes the Python class hierarchy
> (much like we did with the exceptions.py module before it was
> integrated as C module) ?!

Looks like I'm alone with my uncertain feeling about this move... oh well.

BTW, we should consider having more than one constructor for an object rather than trying to stuff all possible options and parameters into one overloaded super-constructor. I've done this in many of my mx extensions and have so far had great success with it (better programming error detection, better docs, more intuitive interfaces, etc.). In that sense, more than one way to do something will actually help clarify what the programmer really wanted. Just a thought...

-- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/

From mal at lemburg.com Tue Jun 5 22:16:02 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Tue, 05 Jun 2001 22:16:02 +0200
Subject: [Python-Dev] Shouldn't I be able to print Unicode objects?
References: <15133.8441.983687.572159@beluga.mojam.com> <15133.12746.666351.127286@beluga.mojam.com> <15133.15019.237484.605267@beluga.mojam.com>
Message-ID: <3B1D3E02.3C9AE1F4@lemburg.com>

Skip Montanaro wrote:
>
>     Michael> Well, what would you have
>
>     >>>> print u"\N{DEGREE SIGN}"
>
>     Michael> (or equivalently
>
>     Michael> str(u"\N{DEGREE SIGN}")
>
>     Michael> since we're eventually going to have to stuff an 8-bit string
>     Michael> down stdout) do?
>
> How about if print calls the .encode("latin1") method for me when it gets an
> ASCII encoding error?  If "latin1" isn't a reasonable default choice, it
> could pick an encoding based on the current locale.
Please see Lib/site.py for details on how to enable all these goodies -- it's all there, just disabled and meant for super-users only ;-)

-- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/

From guido at digicool.com Tue Jun 5 22:22:43 2001
From: guido at digicool.com (Guido van Rossum)
Date: Tue, 05 Jun 2001 16:22:43 -0400
Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects?
In-Reply-To: Your message of "Tue, 05 Jun 2001 22:14:39 +0200." <3B1D3DAF.DAE727AE@lemburg.com>
References: <200106051721.f55HLW729400@odiug.digicool.com> <3B1D1C8E.B7770419@lemburg.com> <3B1D3DAF.DAE727AE@lemburg.com>
Message-ID: <200106052022.f55KMhq30227@odiug.digicool.com>

> > -1
> >
> > While this looks cute, I think it would break a lot of introspection
> > code or other code which special cases Python functions for
> > some reason since type(int) would no longer return
> > types.BuiltinFunctionType.
>
> Looks like I'm alone with my uncertain feeling about this move...
> oh well.

Well, I don't see how someone could be doing introspection on int and be confused when it's not a function -- either you (think you) know it's a function, so you use it as a function without introspecting it, and that continues to work; or you're open to all possibilities, and then you'll introspect it, and then you'll discover what it is.

> BTW, we should consider having more than one constructor for an
> object rather than trying to stuff all possible options and parameters
> into one overloaded super-constructor.  I've done this in many of
> my mx extensions and have so far had great success with it (better
> programming error detection, better docs, more intuitive interfaces,
> etc.).  In that sense, more than one way to do something will
> actually help clarify what the programmer really wanted.  Just
> a thought...

Yes, but the other ways are spelled as factory functions. Maybe, *maybe* the other factory functions could be class-methods, but don't hold your hopes high.

--Guido van Rossum (home page: http://www.python.org/~guido/)

From martin at loewis.home.cs.tu-berlin.de Tue Jun 5 22:30:18 2001
From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Tue, 5 Jun 2001 22:30:18 +0200
Subject: [Python-Dev] Shouldn't I be able to print Unicode objects?
Message-ID: <200106052030.f55KUIu02762@mira.informatik.hu-berlin.de>

> How about if print calls the .encode("latin1") method for me when it gets an
> ASCII encoding error?  If "latin1" isn't a reasonable default choice, it
> could pick an encoding based on the current locale.

These are both bad ideas. First, there is no guarantee that your terminal is capable of displaying the circle at all. Maybe the typewriter connected to your computer doesn't even have a degree type. Further, maybe it does support displaying the degree sign, but then it likely fails for

>>> print u"\N{EURO SIGN}"

Or, worse, instead of displaying the EURO SIGN, it may just display the CURRENCY SIGN (since it may choose to use ISO-8859-15, but the terminal assumes ISO-8859-1). So unless you can come up with a really good way to find out what the terminal is capable of displaying (plus finding out how to make it display these things), I think Python is better off raising an exception than producing garbage output.

In addition, what you see is the "default encoding", i.e.
it doesn't just apply to print; it also applies to all places where Unicode objects are converted into byte strings. Assuming any default other than ASCII has been considered a bad idea by the authors of the Unicode support. IMO, the next-most reasonable default would have been UTF-8, *not* Latin-1, since UTF-8 can represent the EURO SIGN and every other character in Unicode. Most likely, your terminal will have difficulties producing a circle symbol when it gets the UTF-8 representation of the DEGREE SIGN, though. So the best thing is still to leave it in the hands of the application author.

As MAL points out, the administrator can give a different default encoding in site.py. Since the default default is ASCII, applications assuming that the default is ASCII won't break on your system. OTOH, applications developed on your system may then break elsewhere, since the default in site.py might be different.

Regards, Martin

From sdm7g at Virginia.EDU Tue Jun 5 22:41:11 2001
From: sdm7g at Virginia.EDU (Steven D. Majewski)
Date: Tue, 5 Jun 2001 16:41:11 -0400 (EDT)
Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects?
In-Reply-To: <200106051721.f55HLW729400@odiug.digicool.com>
Message-ID:

On Tue, 5 Jun 2001, Guido van Rossum wrote:

> Now invoke the Zen of Python: "There should be one-- and preferably
> only one --obvious way to do it."  So why not make these built-in
> functions *be* the corresponding types?  Then instead of

+1

> - Do we really want to have built-in names for code objects, traceback
>   objects, and other figments of Python's internal workings?

I would say to put all of the common constructors in __builtin__, and all of the odd ducks can go into the new module.

> - What should the argument to dict() be?  A list of (key, value)
>   pairs, a list of alternating keys and values, or something else?

A varargs list of (key,value) tuples would probably be most useful. Since most of these functions, before being classed as constructors, were considered coercion functions, I wouldn't be against having it try to do something sensible with a variety of args.

-- sdm

From skip at pobox.com Tue Jun 5 22:47:17 2001
From: skip at pobox.com (Skip Montanaro)
Date: Tue, 5 Jun 2001 15:47:17 -0500
Subject: [Python-Dev] Shouldn't I be able to print Unicode objects?
In-Reply-To: <3B1D3E02.3C9AE1F4@lemburg.com>
References: <15133.8441.983687.572159@beluga.mojam.com> <15133.12746.666351.127286@beluga.mojam.com> <15133.15019.237484.605267@beluga.mojam.com> <3B1D3E02.3C9AE1F4@lemburg.com>
Message-ID: <15133.17749.390756.115544@beluga.mojam.com>

    mal> Please see Lib/site.py for details on how to enable all these
    mal> goodies -- it's all there, just disabled and meant for super-users
    mal> only ;-)

Okay, I found the encoding section. I changed the encoding variable assignment to be

    encoding = "latin1"

and now the degree sign print works. What other side-effects will that have besides on printed representations? It appears I can create (but not see properly?)
variable names containing latin1 characters:

    >>> ümlaut = "ümlaut"
    >>> print locals().keys()
    ['orca', 'dir', '__doc__', 'rlcompleter', 'missionb', 'version', 'dirpat', 'xmlrpclib', 'belugab', '__builtin__', 'beluga', 'readline', '__name__', 'orcab', 'addpath', 'Writer', 'atexit', 'sys', 'dolphinb', 'mission', 'pprint', 'dolphin', '__builtins__', 'mlaut', 'help']

I am having trouble printing some strings containing latin1 characters:

    >>> print ümlaut
    mlaut
    >>> type("ümlaut")
    <type 'string'>
    >>> type(string.letters)
    <type 'string'>
    >>> print "ümlaut"
    mlaut
    >>> print string.letters
    abcdefghijklmnopqrstuvwxyz?????????????????????????????????ABCDEFGHIJKLMNOPQRSTUVWXYZ??????????????????????????????
    >>> print string.letters[55:]
    ????ABCDEFGHIJKLMNOPQRSTUVWXYZ??????????????????????????????

The above was pasted from Python running in a shell session in XEmacs, which is certainly latin1-aware. Why did I have trouble seeing the ü in some situations, but not in others? Are the ramifications of all this encoding stuff documented somewhere?

Skip

From skip at pobox.com Tue Jun 5 22:56:58 2001
From: skip at pobox.com (Skip Montanaro)
Date: Tue, 5 Jun 2001 15:56:58 -0500
Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects?
References: <200106051721.f55HLW729400@odiug.digicool.com>
Message-ID: <15133.18330.910736.249838@beluga.mojam.com>

Is the intent of using int and friends as constructors instead of just coercion functions that I should (eventually) be able to do this:

    class NonNegativeInt(int):
        def __init__(self, val):
            if int(val) < 0:
                raise ValueError, "Value must be >= 0"
            int.__init__(self, val)
            self.a = 47
        ...

?

Skip

From tim at digicool.com Tue Jun 5 23:01:23 2001
From: tim at digicool.com (Tim Peters)
Date: Tue, 5 Jun 2001 17:01:23 -0400
Subject: [Python-Dev] another dict crasher
Message-ID:

[Tim's dict-crasher dies w/ a stack overflow, but with a KeyError when he sticks a print inside __eq__]

OK, I understand this now, at least on Windows. In PyObject_Print(),

    #ifdef USE_STACKCHECK
            if (PyOS_CheckStack()) {
                    PyErr_SetString(PyExc_MemoryError, "stack overflow");
                    return -1;
            }
    #endif

On Windows, PyOS_CheckStack() is

    __try {
            /* _alloca throws a stack overflow exception if there's
               not enough space left on the stack */
            _alloca(PYOS_STACK_MARGIN * sizeof(void*));
            return 0;
    } __except (EXCEPTION_EXECUTE_HANDLER) {
            /* just ignore all errors */
    }
    return 1;

The _alloca dies, so the __except falls thru and PyOS_CheckStack returns 1. PyObject_Print sets the "stack overflow" error and returns -1. This winds its way thru the rich comparison attempt, until lookdict() sees it and says, Hmm. I can't compare this thing without raising an error. So this can't be the key I'm looking for. First I'll clear the error. Hmm. Can't find it anywhere else in the dict either. Hmm. There were no errors pending at the time I got called, so I'll leave things that way and return "not found". At that point about 15,000 levels of recursion unwind, and KeyError gets raised.

I don't believe PyOS_CheckStack() is implemented on Unixoid systems (just Windows and Macs), so some other accident must account for the KeyError on Linux. Remains unclear what to do about it; the idea that all errors raised by dict lookup comparisons are ignorable is sure a tempting target.

From mal at lemburg.com Tue Jun 5 23:00:23 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Tue, 05 Jun 2001 23:00:23 +0200
Subject: [Python-Dev] Shouldn't I be able to print Unicode objects?
References: <15133.8441.983687.572159@beluga.mojam.com> <15133.12746.666351.127286@beluga.mojam.com> <15133.15019.237484.605267@beluga.mojam.com> <3B1D3E02.3C9AE1F4@lemburg.com> <15133.17749.390756.115544@beluga.mojam.com>
Message-ID: <3B1D4866.A40AAB1C@lemburg.com>

Skip Montanaro wrote:
>
>     mal> Please see Lib/site.py for details on how to enable all these
>     mal> goodies -- it's all there, just disabled and meant for super-users
>     mal> only ;-)
>
> Okay, I found the encoding section.  I changed the encoding variable
> assignment to be
>
>     encoding = "latin1"
>
> and now the degree sign print works.  What other side-effects will that have
> besides on printed representations?  It appears I can create (but not see
> properly?) variable names containing latin1 characters:
>
>     >>> ümlaut = "ümlaut"

Huh ? That should not be possible ! Python literals are still ASCII.

    >>> ümlaut = 'ümlaut'
      File "<stdin>", line 1
        ümlaut = 'ümlaut'
        ^
    SyntaxError: invalid syntax

>     >>> print locals().keys()
>     ['orca', 'dir', '__doc__', 'rlcompleter', 'missionb', 'version', 'dirpat', 'xmlrpclib', 'belugab', '__builtin__', 'beluga', 'readline', '__name__', 'orcab', 'addpath', 'Writer', 'atexit', 'sys', 'dolphinb', 'mission', 'pprint', 'dolphin', '__builtins__', 'mlaut', 'help']
>
> I am having trouble printing some strings containing latin1 characters:
>
>     >>> print ümlaut
>     mlaut
>     >>> type("ümlaut")
>     <type 'string'>
>     >>> type(string.letters)
>     <type 'string'>
>     >>> print "ümlaut"
>     mlaut
>     >>> print string.letters
>     abcdefghijklmnopqrstuvwxyz?????????????????????????????????ABCDEFGHIJKLMNOPQRSTUVWXYZ??????????????????????????????
>     >>> print string.letters[55:]
>     ????ABCDEFGHIJKLMNOPQRSTUVWXYZ??????????????????????????????
>
> The above was pasted from Python running in a shell session in XEmacs, which
> is certainly latin1-aware.  Why did I have trouble seeing the ü in some
> situations, but not in others?

No idea what's going on there... the encoding parameter should not have any effect on printing normal 8-bit strings. It only defines the standard encoding used in coercion and auto-conversion from Unicode to 8-bit strings and vice-versa.

> Are the ramifications of all this encoding stuff documented somewhere?

The basic things can be found in Misc/unicode.txt, on the i18n sig page and some resources on the web. I'll give a talk in Bordeaux about Unicode too, which will probably provide some additional help as well.

-- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/

From guido at digicool.com Tue Jun 5 23:14:07 2001
From: guido at digicool.com (Guido van Rossum)
Date: Tue, 05 Jun 2001 17:14:07 -0400
Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects?
In-Reply-To: Your message of "Tue, 05 Jun 2001 16:59:01 EDT."
References:
Message-ID: <200106052114.f55LE7P30481@odiug.digicool.com>

> Is the intent of using int and friends as constructors instead of just
> coercion functions that I should (eventually) be able to do this:
>
>     class NonNegativeInt(int):
>         def __init__(self, val):
>             if int(val) < 0:
>                 raise ValueError, "Value must be >= 0"
>             int.__init__(self, val)
>             self.a = 47
>         ...
>
> ?

Yes, sort-of. The details will be slightly different. I'm not comfortable with letting a user-provided __init__() method change the value of self, so I am brooding on a work-around that separates allocation and one-time initialization from __init__(). Watch PEP 253.
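Very roughly, a sketch of the direction -- the separate allocation hook is spelled __new__ below, but that name and its exact signature are provisional assumptions, not yet anything the PEP promises:

    # Provisional sketch only.  __new__ (an assumed spelling) fixes the
    # int's value before any user-provided __init__() runs, so __init__
    # can no longer change the value of self.
    class NonNegativeInt(int):
        def __new__(cls, val):
            if int(val) < 0:
                raise ValueError, "Value must be >= 0"
            return int.__new__(cls, val)
        def __init__(self, val):
            self.a = 47

The validation moves into the allocator; by the time __init__ sees the object, its value is immutable.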
--Guido van Rossum (home page: http://www.python.org/~guido/)

From tim at digicool.com Tue Jun 5 23:16:03 2001
From: tim at digicool.com (Tim Peters)
Date: Tue, 5 Jun 2001 17:16:03 -0400
Subject: [Python-Dev] Shouldn't I be able to print Unicode objects?
Message-ID:

[MAL, to Skip]
> Huh ? That should not be possible ! Python literals are still
> ASCII.
>
>     >>> ümlaut = 'ümlaut'
>       File "<stdin>", line 1
>         ümlaut = 'ümlaut'
>         ^
>     SyntaxError: invalid syntax

That was Guido's intent, and what the Ref Man says, but the tokenizer uses C's isalpha() so in reality it's locale-dependent. I think at least one German on Python-Dev has already threatened to kill him if he ever fixes this bug <wink>.

From gward at python.net Wed Jun 6 00:29:49 2001
From: gward at python.net (Greg Ward)
Date: Tue, 5 Jun 2001 18:29:49 -0400
Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects?
In-Reply-To: <200106051721.f55HLW729400@odiug.digicool.com>; from guido@digicool.com on Tue, Jun 05, 2001 at 01:21:32PM -0400
References: <200106051721.f55HLW729400@odiug.digicool.com>
Message-ID: <20010605182949.A7545@gerg.ca>

On 05 June 2001, Guido van Rossum said:
> Now invoke the Zen of Python: "There should be one-- and preferably
> only one --obvious way to do it."  So why not make these built-in
> functions *be* the corresponding types?  Then instead of

+1 from me too.

> If we did this for all built-in types, we'd have to add maybe a dozen
> new built-in names -- I think that's no big deal and actually helps
> naming types.  The types module, with its awkward names and usage, can
> be deprecated.

Cool!

> There are details to be worked out, e.g.
>
> - Do we really want to have built-in names for code objects, traceback
>   objects, and other figments of Python's internal workings?

Probably not, as long as they are accessible somewhere. I could live with either a C-ified 'types' module or shoving these into the 'new' module, although I think I prefer the latter slightly.

> - What should the argument to dict() be?  A list of (key, value)
>   pairs, a list of alternating keys and values, or something else?

I love /F's suggestion

    dict(k=v, k=v, ...)

but that's icing on the cake -- cool feature, looks pretty, etc. (And *finally* Python will have all the syntactic sugar that Perl programmers like to have. ;-) I think the real answer should be

    dict(k, v, k, v)

like Jython. If both can be supported, that would be swell.

Greg

-- Greg Ward - Linux geek gward at python.net http://starship.python.net/~gward/ Does your DRESSING ROOM have enough ASPARAGUS?

From barry at digicool.com Wed Jun 6 00:45:00 2001
From: barry at digicool.com (Barry A. Warsaw)
Date: Tue, 5 Jun 2001 18:45:00 -0400
Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects?
References: <200106051721.f55HLW729400@odiug.digicool.com> <20010605182949.A7545@gerg.ca>
Message-ID: <15133.24812.791796.557452@anthem.wooz.org>

>>>>> "GW" == Greg Ward writes:

    GW> I love /F's suggestion

    GW> dict(k=v, k=v, ...)

One problem with this syntax is that the `k's can only be valid Python identifiers, so you'd at least need /some/ other syntax to support construction with arbitrary hashable keys.

-Barry

From fredrik at pythonware.com Wed Jun 6 00:57:43 2001
From: fredrik at pythonware.com (Fredrik Lundh)
Date: Wed, 6 Jun 2001 00:57:43 +0200
Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects?
References: <200106051721.f55HLW729400@odiug.digicool.com> <20010605182949.A7545@gerg.ca>
Message-ID: <011f01c0ee12$eeda9ba0$0900a8c0@spiff>

greg wrote:
> > - What should the argument to dict() be?  A list of (key, value)
> >   pairs, a list of alternating keys and values, or something else?
>
> I love /F's suggestion
>
>     dict(k=v, k=v, ...)
>
> but that's icing on the cake -- cool feature, looks pretty, etc.

note that the python interpreter builds that dictionary for you if you use the METH_KEYWORDS flag...

> I think the real answer should be
>
>     dict(k, v, k, v)
>
> like Jython.

given that Jython already gives a meaning to dict with more than one argument, I suggest:

    dict(d)                  # consistency
    dict(k, v, k, v, ...)    # jython compatibility
    dict(*[k, v, k, v, ...]) # convenience
    dict(k=v, k=v, ...)      # common pydiom

and maybe:

    dict(d.items())          # symmetry

> If both can be supported, that would be swell.

how about:

    if (PyTuple_GET_SIZE(args)) {
        assert PyDict_GET_SIZE(kw) == 0
        if (PyTuple_GET_SIZE(args) == 1) {
            args = PyTuple_GET_ITEM(args, 0);
            if (PyDict_Check(args))
                dict = args.copy()
            else if (PySequence_Check(args))
                dict = {}
                for k, v in args:
                    dict[k] = v
        } else {
            assert (PySequence_Size(args) & 1) == 0 # maybe
            dict = {}
            for i in range(0, len(args), 2):
                dict[args[i]] = args[i+1]
        }
    } else {
        assert PyDict_GET_SIZE(kw) > 0 # probably
        dict = kw
    }

From MarkH at ActiveState.com Wed Jun 6 01:13:27 2001
From: MarkH at ActiveState.com (Mark Hammond)
Date: Wed, 6 Jun 2001 09:13:27 +1000
Subject: [Python-Dev] Happy event
In-Reply-To: <20010605144110.DD90C99C84@waltz.rahul.net>
Message-ID:

[Paul]
> > Actually girls are fine until about 13, after that I expect Jeremy
> > won't be too responsive.  Something about hormones and such.

As a father of a 14 year old girl, I can relate to that!!

[Aahz]
> Are you trying to imply that there's a difference between girls and
> boys?

It would seem a safe assumption that you are not a parent of a teenager. :)

Mark.

From gward at python.net Wed Jun 6 03:03:33 2001
From: gward at python.net (Greg Ward)
Date: Tue, 5 Jun 2001 21:03:33 -0400
Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects?
In-Reply-To: <011f01c0ee12$eeda9ba0$0900a8c0@spiff>; from fredrik@pythonware.com on Wed, Jun 06, 2001 at 12:57:43AM +0200
References: <200106051721.f55HLW729400@odiug.digicool.com> <20010605182949.A7545@gerg.ca> <011f01c0ee12$eeda9ba0$0900a8c0@spiff>
Message-ID: <20010605210333.B7687@gerg.ca>

On 06 June 2001, Fredrik Lundh said:
> given that Jython already gives a meaning to dict with more
> than one argument, I suggest:
>
>     dict(d)                  # consistency
>     dict(k, v, k, v, ...)    # jython compatibility
>     dict(*[k, v, k, v, ...]) # convenience
>     dict(k=v, k=v, ...)      # common pydiom

Yikes. I still think that #2 is the "essential" spelling. I think Tim was speaking of #1 when he said we don't need another way to spell copy() -- I'm inclined to agree. I think the fact that you can say int(3) or str("foo") is not a strong argument in favour of dict({...}), because of mutability, because of the overhead of dicts, because we already have the copy module, maybe other factors as well.

> and maybe:
>
>     dict(d.items())          # symmetry

I think this is massive overloading. Two interfaces to a single function ought to be enough. I for one have long wished for syntactic sugar like Perl's => operator, which lets you do this:

    %band = (geddy => "bass", alex => "guitar", neil => "drums");

...and keyword arg syntax is really the natural thing here.
Being able to say

    band = dict(geddy="bass", alex="guitar", neil="drums")

would be good enough for me. And it's less mysterious than Perl's =>, which is just a magic comma that forces its LHS to be interpreted as a string. Weird.

Greg

-- Greg Ward - Linux geek gward at python.net http://starship.python.net/~gward/ If you and a friend are being chased by a lion, it is not necessary to outrun the lion. It is only necessary to outrun your friend.

From mal at lemburg.com Wed Jun 6 10:03:13 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Wed, 06 Jun 2001 10:03:13 +0200
Subject: [Python-Dev] Shouldn't I be able to print Unicode objects?
References:
Message-ID: <3B1DE3C1.90BA3DD6@lemburg.com>

Tim Peters wrote:
>
> [MAL, to Skip]
> > Huh ? That should not be possible ! Python literals are still
> > ASCII.
> >
> >     >>> ümlaut = 'ümlaut'
> >       File "<stdin>", line 1
> >         ümlaut = 'ümlaut'
> >         ^
> >     SyntaxError: invalid syntax
>
> That was Guido's intent, and what the Ref Man says, but the tokenizer uses
> C's isalpha() so in reality it's locale-dependent.  I think at least one
> German on Python-Dev has already threatened to kill him if he ever fixes
> this bug <wink>.

Wasn't me for sure... even in the Unicode age, I believe that Python source code should maintain readability by not allowing all alpha(numeric) characters for use in identifiers (there are lots of them in Unicode).

Shouldn't we fix the tokenizer to explicitly check for 'a'...'z' and 'A'...'Z' ?! (same for digits) ?!

-- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/

From jack at oratrix.nl Wed Jun 6 13:24:32 2001
From: jack at oratrix.nl (Jack Jansen)
Date: Wed, 06 Jun 2001 13:24:32 +0200
Subject: [Python-Dev] %b format? - 8088 like a PDP-11 I think not...
In-Reply-To: Message by "Eric S. Raymond" , Mon, 4 Jun 2001 17:19:08 -0400 , <20010604171908.A21831@thyrsus.com>
Message-ID: <20010606112432.C4A43303181@snelboot.oratrix.nl>

The early microcomputers (8008, 6800, 6502) are actually a lot more like the PDP-8 than the PDP-11: a single (or possibly double) accumulator register and a few special purpose registers hardwired to various instructions.

The 68000, Z8000 and NS16032 were the first true successors of the PDP-11, sharing (to an extent) the unique characteristics of its design with general purpose registers (with even SP and PC being general purpose registers with only very little magic attached to them) and an orthogonal design. The 68000 still had lots of little quirks in the instruction set, the latter two actually improved on the PDP-11 set (where a couple of instructions like XOR would only work with register-destination because it was added to the design in a stage where there weren't enough bits left in the instruction space, I guess).

And the 8086 was just a souped-up 8080/8008: each register had a different function, no orthogonality, etc. Intel didn't get it "right" until the 386 32-bit instruction set (and even there some of the old baggage can still be seen).

-- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm

From jack at oratrix.nl Wed Jun 6 13:39:56 2001
From: jack at oratrix.nl (Jack Jansen)
Date: Wed, 06 Jun 2001 13:39:56 +0200
Subject: [Python-Dev] quick poll: could int, str, tuple etc.
become type objects?
In-Reply-To: Message by "Fredrik Lundh" , Tue, 5 Jun 2001 19:34:35 +0200 , <001301c0ede5$cb804a10$e46940d5@hagrid>
Message-ID: <20010606113957.4A395303181@snelboot.oratrix.nl>

For the dictionary initializer I would definitely want to be able to give an object that adheres to the dictionary protocol, so that I can do things like

    import anydbm
    f = anydbm.open("foo", "r")
    incore = dict(f)

Hmm, I guess this goes for most types: list() and tuple() should take any iterable object, etc.

The one question is what "dictionary protocol" means. Should it support items()? Is only x.keys()/x[] good enough?

-- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm

From mal at lemburg.com Wed Jun 6 20:36:48 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Wed, 06 Jun 2001 20:36:48 +0200
Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects?
References: <200106051721.f55HLW729400@odiug.digicool.com> <3B1D1C8E.B7770419@lemburg.com> <3B1D3DAF.DAE727AE@lemburg.com> <200106052022.f55KMhq30227@odiug.digicool.com>
Message-ID: <3B1E7840.C93EA788@lemburg.com>

Guido van Rossum wrote:
>
> > > -1
> > >
> > > While this looks cute, I think it would break a lot of introspection
> > > code or other code which special cases Python functions for
> > > some reason since type(int) would no longer return
> > > types.BuiltinFunctionType.
> >
> > Looks like I'm alone with my uncertain feeling about this move...
> > oh well.
>
> Well, I don't see how someone could be doing introspection on int and
> be confused when it's not a function -- either you (think you) know
> it's a function, so you use it as a function without introspecting it,
> and that continues to work; or you're open to all possibilities, and
> then you'll introspect it, and then you'll discover what it is.

Ok, let's put it in another way: The point is that you are changing the type of very basic building parts in Python and that is likely to cause failure in places which will most likely be hard to find and fix. Besides we don't really gain anything from replacing builtin functions with classes (to the contrary: we lose some, since we can no longer use the function call optimizations for builtins and have to go through all the generic call mechanism code instead).

Also, have you considered the effects this has on restricted execution mode ? What will happen if someone replaces the builtins with special versions which hide some security relevant objects, e.g. open() is a prominent candidate for this.

Why not put the type objects into a separate module instead of reusing the builtins ?

> > BTW, we should consider having more than one contructor for an
> > object rather than trying to stuff all possible options and parameters
> > into one overloaded super-constructor.  I've done this in many of
> > my mx extensions and have so far had great success with it (better
> > programming error detection, better docs, more intuitive interfaces,
> > etc.).  In that sense, more than one way to do something will
> > actually help clarify what the programmer really wanted.  Just
> > a thought...
>
> Yes, but the other ways are spelled as factory functions.  Maybe,
> *maybe* the other factory functions could be class-methods, but don't
> hold your hopes high.

No... why make things complicated when simple functions work just fine as factories.
Multiple constructors on a class would make subclassing a pain...

-- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/

From paulp at ActiveState.com Wed Jun 6 21:00:07 2001
From: paulp at ActiveState.com (Paul Prescod)
Date: Wed, 06 Jun 2001 12:00:07 -0700
Subject: [Python-Dev] Shouldn't I be able to print Unicode objects?
References: <15133.8441.983687.572159@beluga.mojam.com> <15133.12746.666351.127286@beluga.mojam.com> <15133.15019.237484.605267@beluga.mojam.com> <3B1D3E02.3C9AE1F4@lemburg.com> <15133.17749.390756.115544@beluga.mojam.com>
Message-ID: <3B1E7DB7.408BC089@ActiveState.com>

Skip Montanaro wrote:
>
>...
>
> Okay, I found the encoding section.  I changed the encoding variable
> assignment to be
>
>     encoding = "latin1"

Danger, Will Robinson! You can now write software that will work great on your version of Python and will crash on everyone else's. You haven't just changed the behavior of "print" but of EVERY attempted automatic coercion from Unicode to an 8-bit string.

-- Take a recipe. Leave a recipe. Python Cookbook! http://www.ActiveState.com/pythoncookbook

From tim.one at home.com Wed Jun 6 21:27:59 2001
From: tim.one at home.com (Tim Peters)
Date: Wed, 6 Jun 2001 15:27:59 -0400
Subject: [Python-Dev] -U option?
Message-ID:

http://sf.net/tracker/?func=detail&atid=105470&aid=430269&group_id=5470
python -U breaks import with 2.1

Anyone understand -U? Like, should it work, why is it there if it doesn't and isn't expected to, and are there docs for it beyond the "python -h" blurb? Last mention of it I found in c.l.py was

"""
Date: Tue, 06 Feb 2001 16:09:46 +0100
From: "M.-A. Lemburg"
Subject: Re: [Python-Dev] Pre-PEP: Python Character Model

...
Well, with -U on, Python will compile "" into u"",
...
last I tried, Python didn't even start up :-(
...
"""

An earlier msg (08 Sep 2000) said:

"""
Note that many things fail when Python is started with -U... that
switch was introduced to be able to get an idea of which parts of
the standard fail to work in a mixed string/Unicode environment.
"""

If this is just an internal development switch, python -h probably shouldn't advertise it.

From barry at digicool.com Wed Jun 6 21:37:26 2001
From: barry at digicool.com (Barry A. Warsaw)
Date: Wed, 6 Jun 2001 15:37:26 -0400
Subject: [Python-Dev] -U option?
References:
Message-ID: <15134.34422.62060.936788@anthem.wooz.org>

>>>>> "TP" == Tim Peters writes:

    TP> Anyone understand -U?  Like, should it work, why is it there
    TP> if it doesn't and isn't expected to, and are there docs for it
    TP> beyond the "python -h" blurb?

Nope, except that /for me/ an installed Python 2.1 seems to start up just fine with -U. My uninstalled (i.e. run from the source tree) 2.2a0 fails when given -U:

@anthem[[~/projects/python:1068]]% ./python
Python 2.2a0 (#4, Jun 6 2001, 13:03:36)
[GCC egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)] on linux2
Type "copyright", "credits" or "license" for more information.
>>>
@anthem[[~/projects/python:1069]]% ./python -U
'import site' failed; use -v for traceback
Python 2.2a0 (#4, Jun 6 2001, 13:03:36)
[GCC egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)] on linux2
Type "copyright", "credits" or "license" for more information.
>>>
@anthem[[~/projects/python:1070]]% ./python -U -v
# ./Lib/site.pyc matches ./Lib/site.py
import site # precompiled from ./Lib/site.pyc
# ./Lib/os.pyc matches ./Lib/os.py
import os # precompiled from ./Lib/os.pyc
import posix # builtin
# ./Lib/posixpath.pyc matches ./Lib/posixpath.py
import posixpath # precompiled from ./Lib/posixpath.pyc
# ./Lib/stat.pyc matches ./Lib/stat.py
import stat # precompiled from ./Lib/stat.pyc
# ./Lib/UserDict.pyc matches ./Lib/UserDict.py
import UserDict # precompiled from ./Lib/UserDict.pyc
'import site' failed; traceback:
Traceback (most recent call last):
  File "./Lib/site.py", line 91, in ?
    from distutils.util import get_platform
ImportError: No module named distutils.util
Python 2.2a0 (#4, Jun 6 2001, 13:03:36)
[GCC egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)] on linux2
Type "copyright", "credits" or "license" for more information.
>>>
# clear __builtin__._
# clear sys.path
# clear sys.argv
# clear sys.ps1
# clear sys.ps2
# clear sys.exitfunc
# clear sys.exc_type
# clear sys.exc_value
# clear sys.exc_traceback
# clear sys.last_type
# clear sys.last_value
# clear sys.last_traceback
# restore sys.stdin
# restore sys.stdout
# restore sys.stderr
# cleanup __main__
# cleanup[1] signal
# cleanup[1] site
# cleanup[1] posix
# cleanup[1] exceptions
# cleanup[2] stat
# cleanup[2] posixpath
# cleanup[2] UserDict
# cleanup[2] os
# cleanup sys
# cleanup __builtin__
# cleanup ints: 1 unfreed int in 1 out of 3 blocks
# cleanup floats

-Barry

From mal at lemburg.com Wed Jun 6 22:27:19 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Wed, 06 Jun 2001 22:27:19 +0200
Subject: [Python-Dev] -U option?
References:
Message-ID: <3B1E9227.7F67971E@lemburg.com>

Tim Peters wrote:
>
> http://sf.net/tracker/?func=detail&atid=105470&aid=430269&group_id=5470
> python -U breaks import with 2.1
>
> Anyone understand -U?  Like, should it work, why is it there if it doesn't
> and isn't expected to, and are there docs for it beyond the "python -h"
> blurb?

The -U option is there to be able to test drive Python into the Unicode age. As you and many others have noted, there's still a long way to go...

> Last mention of it I found in c.l.py was
>
> """
> Date: Tue, 06 Feb 2001 16:09:46 +0100
> From: "M.-A. Lemburg"
> Subject: Re: [Python-Dev] Pre-PEP: Python Character Model
>
> ...
> Well, with -U on, Python will compile "" into u"",
> ...
> last I tried, Python didn't even start up :-(
> ...
> """
>
> An earlier msg (08 Sep 2000) said:
>
> """
> Note that many things fail when Python is started with -U... that
> switch was introduced to be able to get an idea of which parts of
> the standard fail to work in a mixed string/Unicode environment.
> """
>
> If this is just an internal development switch, python -h probably shouldn't
> advertise it.

-- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/

From martin at loewis.home.cs.tu-berlin.de Wed Jun 6 22:34:30 2001
From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Wed, 6 Jun 2001 22:34:30 +0200
Subject: [Python-Dev] -U option?
Message-ID: <200106062034.f56KYUI02246@mira.informatik.hu-berlin.de>

[Tim]
> Anyone understand -U?  Like, should it work, why is it there if it
> doesn't and isn't expected to, and are there docs for it beyond the
> "python -h" blurb?

I'm not surprised it doesn't work, but I think it could be made to work in many cases.
I also think it would be worthwhile making that work; in the process, many places will be taught to accept Unicode strings which currently don't.

[Barry]
> Nope, except that /for me/ an installed Python 2.1 seems to start up
> just fine with -U.  [...]

Sure, but it won't work

martin at mira:~ > python -U                                    [22:29]
Python 2.2a0 (#336, May 29 2001, 09:28:57)
[GCC 2.95.2 19991024 (release)] on linux2
Type "copyright", "credits" or "license" for more information.
>>> import string
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
ImportError: No module named string
>>> import sys
>>> sys.path
['', u'/usr/src/omni/lib/python', u'/usr/src/omni/lib/i586_linux_2.0_glibc2.1', u'/usr/ilu-2.0b1/lib', u'/home/martin', u'/usr/local/lib/python2.2', u'/usr/local/lib/python2.2/plat-linux2', u'/usr/local/lib/python2.2/lib-tk', u'/usr/local/lib/python2.2/lib-dynload', u'/usr/local/lib/python2.2/site-packages', u'/usr/local/lib/site-python']

The main problem (also with the SF bug report) seems to be that Unicode objects in sys.path are not accepted, but I think they should.

Regards, Martin

From tim.one at home.com Wed Jun 6 22:52:02 2001
From: tim.one at home.com (Tim Peters)
Date: Wed, 6 Jun 2001 16:52:02 -0400
Subject: [Python-Dev] -U option?
In-Reply-To: <3B1E9227.7F67971E@lemburg.com>
Message-ID:

[MAL]
> The -U option is there to be able to test drive Python into
> the Unicode age. As you and many others have noted, there's
> still a long way to go...

That's cool. My question is why we're advertising (via -h) an option that end users have no chance of using successfully.

From mal at lemburg.com Wed Jun 6 23:47:25 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Wed, 06 Jun 2001 23:47:25 +0200
Subject: [Python-Dev] -U option?
References:
Message-ID: <3B1EA4ED.38BEB1AA@lemburg.com>

Tim Peters wrote:
>
> [MAL]
> > The -U option is there to be able to test drive Python into
> > the Unicode age. As you and many others have noted, there's
> > still a long way to go...
>
> That's cool.  My question is why we're advertising (via -h) an option that
> end users have no chance of using successfully.

I guess I just added the flag to the -h message without thinking much about it... it was added in some alpha release. Anyway, these bug reports will keep hitting us which is good in the sense that it'll eventually push Python into the Unicode arena. We could use some funding for this, though.

-- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/

From paulp at ActiveState.com Thu Jun 7 01:00:52 2001
From: paulp at ActiveState.com (Paul Prescod)
Date: Wed, 06 Jun 2001 16:00:52 -0700
Subject: [Python-Dev] urllib2
Message-ID: <3B1EB624.563DABE0@ActiveState.com>

Tim asked me to look into test_urllib2 failure. I notice that Guido's name is in the relevant RFC so I guess he's the real expert <0.5 wink>:

http://www.faqs.org/rfcs/rfc1738.html

Anyhow, there are a variety of problems. :(

First, test_urllib2 says:

    file_url = "file://%s" % urllib2.__file__

This is not going to construct a strictly standards conforming URL on Windows but that form is still common enough and obvious enough that maybe we should support it. So that's problem #1, we aren't compatible with mildly broken Windows file URLs.

Problem #2 is that the test program generates mildly broken URLs on Windows.
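For concreteness, here's the sort of helper I have in mind -- a rough sketch only; the function name and the exact quoting rules below are mine, not anything currently in the library:

    # Hypothetical sketch: build an RFC 1738 file URL from a native
    # path.  urllib.pathname2url covers some of this ground, but (see
    # below) its output differs across platforms.
    import os, urllib

    def filename_to_url(path):
        path = os.path.abspath(path)
        if os.sep != '/':
            # normalize the separators, e.g. C:\spam.py -> C:/spam.py
            path = '/'.join(path.split(os.sep))
        if not path.startswith('/'):
            path = '/' + path            # C:/spam.py -> /C:/spam.py
        return 'file://' + urllib.quote(path, '/:')

On Unix this would give something like file:///usr/local/lib/python2.1/urllib2.py, and on Windows something like file:///C:/Python21/Lib/urllib2.py, which is the form Windows itself understands.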
That begs the question of what IS the right way to construct file urls in a cross-platform manner. I would have thought that urllib.pathname2url was the way but I note that it isn't documented. Plus it is poorly named. A function that does this:

    """Convert a DOS path name to a file url.

            C:\foo\bar\spam.foo

                    becomes

            ///C|/foo/bar/spam.foo
    """

is not really constructing a URL! And the semantics of the function on multiple platforms do not seem to me to be identical. On Windows it adds a bunch of leading slashes and mac and Unix seem not to. So you can't safely paste a "file:" or "file://" on the front. I don't know how widely pathname2url has been used even though it is undocumented... should we fix it and document it or write a new function?

-- Take a recipe. Leave a recipe. Python Cookbook! http://www.ActiveState.com/pythoncookbook

From barry at scottb.demon.co.uk Thu Jun 7 01:31:51 2001
From: barry at scottb.demon.co.uk (Barry Scott)
Date: Thu, 7 Jun 2001 00:31:51 +0100
Subject: [Python-Dev] %b format? - 8088 like a PDP-11 I think not...
In-Reply-To: <20010604161114.A20979@thyrsus.com>
Message-ID: <000a01c0eee0$dcfe9250$060210ac@private>

Eric,

As others have pointed out, your time line is wrong...

Barry

p.s. I'm ex-DEC and old enough to have seen the introduction of the 6502 (got mine at university for $25 inc postage to the U.K.), Z80 and VAX (worked on product for V1.0 of VMS). Also for my sins argued with Gordon Bell and Dave Cutler about CPU architecture.

> -----Original Message-----
> From: Eric S. Raymond [mailto:esr at thyrsus.com]
> Sent: 04 June 2001 21:11
> To: Barry Scott
> Cc: python-dev (E-mail)
> Subject: Re: [Python-Dev] %b format? - 8088 like a PDP-11 I think not...
>
>
> Barry Scott :
> > Eric wrote:
> > > While I'm at it, I should note that the design of the 11 was ancestral
> > > to both the 8088 and 68000 microprocessors, and thus to essentially
> > > every new general-purpose computer designed in the last fifteen years.
> >
> > The key to PDP-11 and VAX was lots of registers all alike and rich
> > addressing modes for the instructions.
> >
> > The 8088 is very far from this design, it owes its design more to
> > 4004 than the PDP-11.
>
> Yes, but the 4004 was designed as a sort of lobotomized imitation of the 65xx,
> which was descended from the 11.  Admittedly, in the chain of transmission here
> were two stages of redesign so bad that the connection got really tenuous.
> --
> Eric S. Raymond
>
> ...Virtually never are murderers the ordinary, law-abiding people
> against whom gun bans are aimed.  Almost without exception, murderers
> are extreme aberrants with lifelong histories of crime, substance
> abuse, psychopathology, mental retardation and/or irrational violence
> against those around them, as well as other hazardous behavior, e.g.,
> automobile and gun accidents."
>         -- Don B. Kates, writing on statistical patterns in gun crime
>
>

From barry at scottb.demon.co.uk Thu Jun 7 01:57:11 2001
From: barry at scottb.demon.co.uk (Barry Scott)
Date: Thu, 7 Jun 2001 00:57:11 +0100
Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects?
In-Reply-To: <3B1E7840.C93EA788@lemburg.com>
Message-ID: <000b01c0eee4$66f8a7e0$060210ac@private>

Adding the atomic types of python as classes I'm +1 on. Performance is a problem for the parser to handle.

If you have not already done so I suggest that you look at what Microsoft .NET is doing in this area.
In .NET, for example, int is a class and they have the technology to define the interface to an int and optimize the performance of the non-derived cases.

Barry

From barry at scottb.demon.co.uk Thu Jun 7 02:03:54 2001
From: barry at scottb.demon.co.uk (Barry Scott)
Date: Thu, 7 Jun 2001 01:03:54 +0100
Subject: [Python-Dev] Shouldn't I be able to print Unicode objects?
In-Reply-To: <3B1DE3C1.90BA3DD6@lemburg.com>
Message-ID: <001001c0eee5$571a8090$060210ac@private>

> Shouldn't we fix the tokenizer to explicitly check for 'a'...'z'
> and 'A'...'Z' ?! (same for digits) ?!

If you embrace the world then NO. If America is your world then maybe.

Barry

From paulp at ActiveState.com Thu Jun 7 02:42:03 2001
From: paulp at ActiveState.com (Paul Prescod)
Date: Wed, 06 Jun 2001 17:42:03 -0700
Subject: [Python-Dev] Shouldn't I be able to print Unicode objects?
References: <001001c0eee5$571a8090$060210ac@private>
Message-ID: <3B1ECDDB.F1E8B19D@ActiveState.com>

Barry Scott wrote:
>
> > Shouldn't we fix the tokenizer to explicitly check for 'a'...'z'
> > and 'A'...'Z' ?! (same for digits) ?!
>
> If you embrace the world then NO. If America is your world then maybe.

Actually, if we were really going to embrace the world we'd need to handle more than a few European languages!

-- Take a recipe. Leave a recipe. Python Cookbook! http://www.ActiveState.com/pythoncookbook

From MarkH at ActiveState.com Thu Jun 7 03:09:51 2001
From: MarkH at ActiveState.com (Mark Hammond)
Date: Thu, 7 Jun 2001 11:09:51 +1000
Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects?
In-Reply-To: <000b01c0eee4$66f8a7e0$060210ac@private>
Message-ID:

> If you have not already done so I suggest that you look at
> what Microsoft .NET is doing in this area.  In .NET, for example,
> int is a class and they have the technology to define the
> interface to an int and optimize the performance of the non-derived
> cases.

Actually, that is not completely true. There is a "value type" and a class version. The value type is just the bits. The VM has instructions that work on the value type. As far as I am aware, you can not use a derived class with these instructions. They also have the concept of "sealed" meaning they can not be subclassed. Last time I looked, strings were an example of sealed classes.

Mark.

From greg at cosc.canterbury.ac.nz Thu Jun 7 04:16:00 2001
From: greg at cosc.canterbury.ac.nz (Greg Ewing)
Date: Thu, 07 Jun 2001 14:16:00 +1200 (NZST)
Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects?
In-Reply-To: <20010606113957.4A395303181@snelboot.oratrix.nl>
Message-ID: <200106070216.OAA02594@s454.cosc.canterbury.ac.nz>

Jack Jansen :
> Should it support
> items()? Is only x.keys()/x[] good enough?

Check for items(), and fall back on x.keys()/x[] if necessary.

Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg at cosc.canterbury.ac.nz +--------------------------------------+

From greg at cosc.canterbury.ac.nz Thu Jun 7 04:19:03 2001
From: greg at cosc.canterbury.ac.nz (Greg Ewing)
Date: Thu, 07 Jun 2001 14:19:03 +1200 (NZST)
Subject: [Python-Dev] Shouldn't I be able to print Unicode objects?
In-Reply-To: <3B1ECDDB.F1E8B19D@ActiveState.com>
Message-ID: <200106070219.OAA02597@s454.cosc.canterbury.ac.nz>

> if we were really going to embrace the world we'd need to
> handle more than a few European languages!
-1 on allowing Kanji in Python identifiers. :-( I like to be able to at least imagine some sort of pronunciation for variable names! Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg at cosc.canterbury.ac.nz +--------------------------------------+ From greg at cosc.canterbury.ac.nz Thu Jun 7 04:22:33 2001 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Thu, 07 Jun 2001 14:22:33 +1200 (NZST) Subject: [Python-Dev] %b format? - 8088 like a PDP-11 I think not... Message-ID: <200106070222.OAA02600@s454.cosc.canterbury.ac.nz> Jack Jansen : > with even SP and PC being general purpose registers The PC is not a general-purpose register in the 68000. I've heard that this was because DEC had a patent on the idea. > the latter two actually improved on the PDP-11 The 16032 was certainly extremely orthogonal. I wrote an assembler and a compiler for it once, and it was a joy after coming from the Z80! It wasn't quite perfect, though - its lack of a "top-of-stack-indirect" addressing mode was responsible for the one wart in my otherwise-beautiful code generation strategy. Also, it must have been the most CISCy instruction set the world has ever seen, with the possible exception of the VAX... Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg at cosc.canterbury.ac.nz +--------------------------------------+ From tim.one at home.com Thu Jun 7 06:54:42 2001 From: tim.one at home.com (Tim Peters) Date: Thu, 7 Jun 2001 00:54:42 -0400 Subject: [Python-Dev] urllib2 In-Reply-To: <3B1EB624.563DABE0@ActiveState.com> Message-ID: [Paul Prescod] > Tim asked me to look into test_urllib2 failure. Wow! I'm going to remember that. Have to ask people to do things more often. > notice that Guido's name is in the relevant RFC so I guess he's the > real expert <0.5 wink>: > > http://www.faqs.org/rfcs/rfc1738.html > > Anyhow, there are a variety of problems. :( I'm going to add one more. The spec says this is a file URL:

    fileurl = "file://" [ host | "localhost" ] "/" fpath

But on Windows, urllib2.urlopen() throws up even on URLs like:

    file:///c:/bootlog.txt

and

    file://localhost/c:/bootlog.txt

AFAICT, those conform to the spec (the first with an empty host, the second with the special reserved hostname). Windows has no problem with either of them (heck, in Outlook I can click on them while I'm typing this email -- works fine), but urllib2 mangles them into (repr) '\\c:\\bootlog.txt', which Windows has no idea what to do with. Hard to see why it should, either. > First, test_urllib2 says: > > file_url = "file://%s" % urllib2.__file__ > > This is not going to construct a strictly standards-conforming URL on > Windows but that form is still common enough and obvious enough that > maybe we should support it. Common among what? > So that's problem #1, we aren't compatible with mildly broken Windows > file URLs. I haven't found a sense in which Windows file URLs are broken. test_urllib2 creates bad URLs on Windows, and urllib2 itself transforms legit file URLs into broken ones on Windows, but both of those appear to be our (Python's) fault. Until std stuff works, worrying about extensions to the std seems premature. > Problem #2 is that the test program generates mildly broken URLs > on Windows. Yup.
> That begs the question of what IS the right way to construct file URLs > in a cross-platform manner. The spec seems vaguely clear to me on this point (it's vaguely unclear to me whether a colon is allowed in an fpath -- the text seems to say one thing but the BNF another). > I would have thought that urllib.pathname2url was the way but I note > that it isn't documented. Plus it is poorly named. A function that > does this: > > """Convert a DOS path name to a file url. > > C:\foo\bar\spam.foo > > becomes > > ///C|/foo/bar/spam.foo > """ > > is not really constructing a URL! Or anything else recognizable. > And the semantics of the function on multiple platforms do not seem > to me to be identical. On Windows it adds a bunch of leading slashes > but Mac and Unix seem not to. So you can't safely paste a "file:" or > "file://" on the front. I don't know how widely pathname2url has been > used even though it is undocumented... should we fix it and document > it or write a new function? Maybe it's just time to write urllib3.py <0.8 wink>. no-conclusions-from-me-ly y'rs - tim From tim at digicool.com Thu Jun 7 07:16:37 2001 From: tim at digicool.com (Tim Peters) Date: Thu, 7 Jun 2001 01:16:37 -0400 Subject: [Python-Dev] Shouldn't I be able to print Unicode objects? In-Reply-To: <3B1DE3C1.90BA3DD6@lemburg.com> Message-ID: [M.-A. Lemburg] > Wasn't me for sure... even in the Unicode age, I believe that > Python source code should maintain readability by not allowing > all alpha(numeric) characters for use in identifiers (there are > lots of them in Unicode). > > Shouldn't we fix the tokenizer to explicitly check for 'a'...'z' > and 'A'...'Z' ?! (same for digits) ?! That's certain to break code, and it's certain that some of those whose code gets broken would scream very loudly about it. OTOH, nobody would come to its defense with a hearty "whew! I'm so glad *that* hole finally got plugged!". I'm sure it would cause less trouble to take away <> as an alternative spelling of != (except that Barry is actually close enough to strangle Guido a few days each week). Is it worth the hassle? I don't know, but I'd *guess* Guido would rather endure the complaints for something more substantial (like, say, breaking 10 lines of an expert's obscure code that relies on int() being a builtin instead of a class). From fredrik at pythonware.com Thu Jun 7 07:50:35 2001 From: fredrik at pythonware.com (Fredrik Lundh) Date: Thu, 7 Jun 2001 07:50:35 +0200 Subject: [Python-Dev] Shouldn't I be able to print Unicode objects? References: Message-ID: <004c01c0ef15$c97f51d0$4ffa42d5@hagrid> Tim Peters wrote: > > Shouldn't we fix the tokenizer to explicitly check for 'a'...'z' > > and 'A'...'Z' ?! (same for digits) ?! > > That's certain to break code, and it's certain that some of those whose code > gets broken would scream very loudly about it. I don't get it. If people use non-ASCII characters, they're clearly not using Python. From the language reference: ... Python uses the 7-bit ASCII character set for program text and string literals. ... Identifiers (also referred to as names) are described by the following lexical definitions:

    identifier: (letter|"_") (letter|digit|"_")*
    letter: lowercase | uppercase
    lowercase: "a"..."z"
    uppercase: "A"..."Z"
    digit: "0"..."9"

Identifiers are unlimited in length. Case is significant ... either change the specification, and break every single tool written by anyone who actually bothered to read the specification [1], or add a warning to 2.2.
1) I assume the specification didn't exist when GvR wrote the first CPython implementation ;-) From tim.one at home.com Thu Jun 7 08:15:35 2001 From: tim.one at home.com (Tim Peters) Date: Thu, 7 Jun 2001 02:15:35 -0400 Subject: [Python-Dev] Shouldn't I be able to print Unicode objects? In-Reply-To: <004c01c0ef15$c97f51d0$4ffa42d5@hagrid> Message-ID: [/F] > I don't get it. If people use non-ASCII characters, they're clearly not > using Python. From the language reference: My *first* reply in this thread said the lang ref required this. That doesn't mean people read the ref. IIRC, you were one of the most strident complainers about list.append(1, 2, 3) "breaking", so just rekindle that mindset but intensify it fueled by nationalism <0.5 wink>. > ... > either change the specification, and break every single tool written by > anyone who actually bothered to read the specification [1], or add a > warning to 2.2. This is up to Guido; doesn't affect my code one way or the other (and, yes, e.g., IDLE's parser follows the manual here). > ... > 1) I assume the specification didn't exist when GvR wrote the first > CPython implementation ;-) Thanks to the magic of CVS, you can see that the BNF for identifiers has remained unchanged since it was first checked in (Thu Nov 21 13:53:03 1991, rev 1.1 of ref1.tex). The problem is that locale was a new-fangled idea then, and I believe Guido simply didn't anticipate that isalpha() and isalnum() would vary across non-EBCDIC platforms. From mal at lemburg.com Thu Jun 7 10:29:52 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Thu, 07 Jun 2001 10:29:52 +0200 Subject: [Python-Dev] Shouldn't I be able to print Unicode objects? References: <001001c0eee5$571a8090$060210ac@private> <3B1ECDDB.F1E8B19D@ActiveState.com> Message-ID: <3B1F3B80.DB8F4117@lemburg.com> Paul Prescod wrote: > > Barry Scott wrote: > > > > > Shouldn't we fix the tokenizer to explicitly check for 'a'...'z' > > > and 'A'...'Z' ?! (same for digits) ?! > > > > If you embrace the world then NO. If America is your world then maybe. > > Actually, if we were really going to embrace the world we'd need to > handle more than a few European languages! I was just suggesting that we make the parser actually do what the language spec defines. And yes: I don't like non-ASCII identifiers (even though I live in Europe). This is just bound to cause trouble, e.g. people forgetting accents on characters, editors displaying code using wild approximations of what the code author intended to write, etc. -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From mal at lemburg.com Thu Jun 7 10:42:40 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Thu, 07 Jun 2001 10:42:40 +0200 Subject: [Python-Dev] Shouldn't I be able to print Unicode objects? References: Message-ID: <3B1F3E80.F8CC16D7@lemburg.com> Tim Peters wrote: > > [M.-A. Lemburg] > > Wasn't me for sure... even in the Unicode age, I believe that > > Python source code should maintain readability by not allowing > > all alpha(numeric) characters for use in identifiers (there are > > lots of them in Unicode). > > > > Shouldn't we fix the tokenizer to explicitly check for 'a'...'z' > > and 'A'...'Z' ?! (same for digits) ?! > > That's certain to break code, and it's certain that some of those whose code > gets broken would scream very loudly about it. OTOH, nobody would come to > its defense with a hearty "whew!
I'm so glad *that* hole finally got > plugged!". I'm sure it would cause less trouble to take away <> as an > alternative spelling of != (except that Barry is actually close enough to > strangle Guido a few days each week). Is it worth the hassle? I > don't know, but I'd *guess* Guido would rather endure the complaints for > something more substantial (like, say, breaking 10 lines of an expert's > obscure code that relies on int() being a builtin instead of a class). OK, point taken... still, it's funny sometimes how pydevs are willing to break perfectly valid code in some areas while being reluctant to ask users to clean up invalid code in others. -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From thomas at xs4all.net Thu Jun 7 14:03:20 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Thu, 7 Jun 2001 14:03:20 +0200 Subject: [Python-Dev] Shouldn't I be able to print Unicode objects? In-Reply-To: <3B1F3E80.F8CC16D7@lemburg.com>; from mal@lemburg.com on Thu, Jun 07, 2001 at 10:42:40AM +0200 References: <3B1F3E80.F8CC16D7@lemburg.com> Message-ID: <20010607140320.Z690@xs4all.nl> On Thu, Jun 07, 2001 at 10:42:40AM +0200, M.-A. Lemburg wrote: > still, it's funny sometimes how pydevs are willing to break perfectly > valid code in some areas while being reluctant to ask users to clean up > invalid code in others. Well, I consider myself one of the more backward-oriented people on py-dev (or at least a vocal member of that sub-group ;) and I don't think changing int et al to be types/class-constructors is a problem. People who rely on int being a *function*, rather than being a callable, are either writing a Python-specific script, a quick hack, or really, really know what they are getting into. I'm also not terribly worried about the use of non-ASCII characters in identifiers in Python, though a warning for the next one or two releases would be a good thing -- if anything, it should warn that that trick won't work for people with different locale settings! -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From mwh at python.net Thu Jun 7 14:54:55 2001 From: mwh at python.net (Michael Hudson) Date: Thu, 7 Jun 2001 13:54:55 +0100 (BST) Subject: [Python-Dev] python-dev summary 2001-05-24 - 2001-06-07 Message-ID: This is a summary of traffic on the python-dev mailing list between May 24 and Jun 7 (inclusive) 2001. It is intended to inform the wider Python community of ongoing developments. To comment, just post to python-list at python.org or comp.lang.python in the usual way. Give your posting a meaningful subject line, and if it's about a PEP, include the PEP number (e.g. Subject: PEP 201 - Lockstep iteration). All python-dev members are interested in seeing ideas discussed by the community, so don't hesitate to take a stance on a PEP if you have an opinion. This is the ninth summary written by Michael Hudson.
Summaries are archived at: Posting distribution (with apologies to mbm) Number of articles in summary: 305

    [ASCII bar chart of posts per day; the daily counts were:]
    Thu 24: 18   Fri 25: 14   Sat 26: 11   Sun 27: 14   Mon 28: 20
    Tue 29: 19   Wed 30: 34   Thu 31: 35   Fri 01: 32   Sat 02: 14
    Sun 03:  8   Mon 04: 20   Tue 05: 51   Wed 06: 15

Another busy-ish fortnight. I've been in Exam Hell(tm) and am writing this when hungover, so this summary might be a bit sketchier than normal. Apologies in advance. * strop vs. string * Greg Stein leapt up to defend the slated-to-be-deprecated strop module by pointing out that its functions work on any object that supports the buffer API, whereas the 1.6-era string.py only works with objects that sprout the right methods: The discussion quickly degenerated into the usual griping about the fact that the buffer API is flawed and undocumented and not really well understood by many people. * Special-casing "O" * As a followup to the discussion mentioned in the last summary, Martin von Loewis posted a patch to sf enabling functions written in C that expect zero or one object arguments to dispense with the time-wasting call to PyArg_ParseTuple: The first version of the patch was criticized for being overly general, and for not being general enough. It seems the forces of simplicity have won, but I don't think the patch has been checked in yet. * the late, unlamented, yearly list.append panic * Tim Peters posted that c.l.py has rediscovered the quadratic-time worst-case behavior of list.append(). And then ameliorated the worst-case behaviour. So that one was easy. * making dicts ... * You might think that, as dictionaries are so central to Python, their implementation would be bulletproof and one of the areas of the source least likely to change. This might be true *now*; Tim Peters seems to have spent most of the last fortnight implementing performance improvements one after the other and fixing core-dumping holes in the implementation pointed out by Michael Hudson. The first improvement was "using polynomial division instead of multiplication for generating the probe sequence, as a way to get all the bits of the hash code into play." If you don't understand what that means, ignore it, because Tim came up with a more radical rewrite: which seems to be a win, but sadly removes the shock of finding comments about Galois theory in dictobject.c... Most of the discussion in the thread following Tim's patch was about whether we need 128-bit floats or ints, which is another way of saying everyone liked it :-) This one hasn't been checked in either. * ...
and breaking dicts * Inspired by a post to comp.lang.python by Wolfgang Lipp and driven slightly insane by revision, Michael Hudson posted a short program that used a hole in the dict implementation to trigger a core dump: This got fixed, so he did it again: The cause of both problems was C code assuming things about dictionaries remained the same across calls to code that ended up executing arbitrary Python code, which could mutate the dict exactly as much as it pleased, which in turn caused pointers to dangle. This problem has a history in Python; the .sort() method on lists has to fight the same issues. These holes have been plugged, although it is still possible to crash Python with exceptionally contrived code: There's another approach, which is what the .sort() method uses:

>>> list = range(10)
>>> def c(x,y):
...     del list[:]
...     return cmp(x, y)
...
>>> list.sort(c)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "<stdin>", line 2, in c
TypeError: a list cannot be modified while it is being sorted

The .sort() method magically changes the type of the list being sorted to one that doesn't support mutation while it's sorting the list. This approach would have some merit to use with dictionaries too; for one thing we could lose all the contrived code in dictobject.c protecting against this sort of silliness... * arbitrary radix formatting * Greg Wilson made a plea for the addition of a "%b" formatting operator to display integers in binary, e.g.:

>>> print "%d %x %o %b"%(10,10,10,10)
10 a 12 1010

There was general support for the idea, but Tim Peters and Greg Ewing pointed out that it would be neater to invent a general format code that would enable one to format an integer into an arbitrary base, so that

>>> int("1111", 7)
400

has an inverse at long last. But no-one could think of a spelling that wasn't in general use, and the discussion died :-(. * quick poll * Guido asked if anyone would object violently to the builtin conversion functions becoming type objects on the descr-branch, in analogy to class objects: There was general support and only a few concerns, and the changes have begun to hit descr-branch. I'm sure I'm not the only one who wishes they had the time to understand what is going on in there... Cheers, M. From gmcm at hypernet.com Thu Jun 7 15:06:55 2001 From: gmcm at hypernet.com (Gordon McMillan) Date: Thu, 7 Jun 2001 09:06:55 -0400 Subject: [Python-Dev] urllib2 In-Reply-To: References: <3B1EB624.563DABE0@ActiveState.com> Message-ID: <3B1F442F.26920.1ECC32A9@localhost> [Tim & Paul on file URLs] [Tim] > But on Windows, urllib2.urlopen() throws up even on URLs like: > > file:///c:/bootlog.txt Curiously enough,

    url = "file:///" + urllib.quote_plus(fnm)

seems to work on Windows. It even seems to work on the Mac, if you first turn '/' into '%2f', then undo the double quoting (turn '%252f' back into '%2f' in the ensuing url). It even seems to work on Mac directory names with Unicode characters in them (though I haven't looked too closely, for fear of jinxing it). eye-of-newt-considered-helpful-ly y'rs - Gordon From pedroni at inf.ethz.ch Thu Jun 7 15:56:30 2001 From: pedroni at inf.ethz.ch (Samuele Pedroni) Date: Thu, 7 Jun 2001 15:56:30 +0200 (MET DST) Subject: [Python-Dev] quick poll: could int, str, tuple etc. become type objects? Message-ID: <200106071356.PAA04511@core.inf.ethz.ch> Hi.
[GvR]
> > Is the intent of using int and friends as constructors instead of just
> > coercion functions that I should (eventually) be able to do this:
> >
> >     class NonNegativeInt(int):
> >         def __init__(self, val):
> >             if int(val) < 0:
> >                 raise ValueError, "Value must be >= 0"
> >             int.__init__(self, val)
> >             self.a = 47
> >             ...
> >
> > ?
>
> Yes, sort-of.  The details will be slightly different.  I'm not
> comfortable with letting a user-provided __init__() method change the
> value of self, so I am brooding on a work-around that separates
> allocation and one-time initialization from __init__().  Watch PEP
> 253.

jython already vaguely supports this:

    from types import IntType as Int

    class NonNegInt(Int):
        def __init__(self,val,annot=None):
            if int(val)<0: raise ValueError,"val<0"
            Int.__init__(self,val)
            self._annot = annot
        def neg(self):
            return -self
        def __add__(self,b):
            if type(b) is NonNegInt:
                return NonNegInt(Int.__add__(self,b))
            return Int.__add__(self,b)
        def annot(self):
            return self._annot

Jython 2.0 on java1.3.0 (JIT: null)
Type "copyright", "credits" or "license" for more information.
>>> from NonNegInt import NonNegInt
>>> x=NonNegInt(-2)
Traceback (innermost last):
  File "<console>", line 1, in ?
  File "/home/pedroni/BOX/exp/NonNegInt.py", line 5, in __init__
ValueError: val<0
>>> x=NonNegInt(2)
>>> y=NonNegInt(3,"foo")
>>> y._annot
Traceback (innermost last):
  File "<console>", line 1, in ?
AttributeError: 'int' object has no attribute '_annot'
>>> y.annot()
Traceback (innermost last):
  File "<console>", line 1, in ?
  File "/home/pedroni/BOX/exp/NonNegInt.py", line 15, in annot
AttributeError: 'int' object has no attribute '_annot'
>>> x+y, type(x+y)
(5, )
>>> x.neg()
-2
>>> x+(-2),type(x+(-2))
(0, )
>>>

As one can see, the semantics are not without holes. The support for this is mainly a side-effect of the fact that internally jython objects are instances of java classes and jython allows subclassing java classes. I have no idea whether someone is already using this kind of stuff; I just remember that someone reported a bug concerning subclassing ListType, so ... By the way, int and long being types seems nice and elegant to me. A more general note, FYI: I have read the PEP drafts about descrs and type as classes; I have not played with the descr-branch yet. I think that the descr and metaclasses stuff can help on the jython side to put a lot of things (dealing with java classes, subclassing from them, etc) in a more precise framework, polishing up many design aspects and the code. First, I suppose that backward compatibility on the jython side is not a real problem; these aspects are so under-documented that there are no promises about them. On the other hand, until we start coding things on the jython side (it's complex stuff and jython internals are already complex) it will be really difficult to make constructive comments on possible problems for jython, or to work toward a design that better fits both jython and CPython needs. Given that we are still working on jython 2.1, maybe we will be able to start working on jython 2.2 only late in the 2.2 release cycle, when things are already settled and we can only do our best to re-implement them. regards Samuele Pedroni.
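[For comparison with the Jython session above, this is roughly the shape Guido's allocation/initialization split would give the same idea in CPython. It is only a sketch against the draft PEP 253 design; int.__new__ and the exact spelling here are assumptions, not the final API:]

    class NonNegativeInt(int):
        # Validate in __new__, which performs allocation, so the
        # immutable int value never changes after the object exists
        # (assumed PEP 253 behavior; see the draft PEP for details).
        def __new__(cls, val=0):
            val = int(val)
            if val < 0:
                raise ValueError, "value must be >= 0"
            return int.__new__(cls, val)

Used interactively, such a subclass would behave like an int that refuses to be negative:

    >>> NonNegativeInt(5) + 3
    8
    >>> NonNegativeInt(-2)
    Traceback (most recent call last):
      ...
    ValueError: value must be >= 0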
From Greg.Wilson at baltimore.com Thu Jun 7 18:03:44 2001 From: Greg.Wilson at baltimore.com (Greg Wilson) Date: Thu, 7 Jun 2001 12:03:44 -0400 Subject: [Python-Dev] re: %b format? (no, really) Message-ID: <930BBCA4CEBBD411BE6500508BB3328F2E20EF@nsamcanms1.ca.baltimore.com> Prompted in part by the comment in Michael Hudson's python-dev summary about this discussion having died, I'd like to summarize:

1. Most people who commented felt that a base-2 format would be useful, if only for teaching and debugging. With regard to questions about byte order:

   A. Integer values are printed as base-2 numbers, so byte order is irrelevant.

   B. Floating-point numbers are printed as:

      [sign] [mantissa] [exponent]

      The mantissa and exponent are shown according to rule A.

2. Inventing a format for converting to arbitrary bases is dubious hypergeneralization (to borrow a phrase).

3. Implementation should mirror octal and hexadecimal support, e.g. a 'bin()' function to go with 'oct()' and 'hex()'.

4. The desirability or otherwise of a "%b" format specifier has nothing to do with the relative merits of any early microprocessor :-).

If no-one has strong objections, I'll put together a PEP on this basis. Thanks Greg From greg at cosc.canterbury.ac.nz Fri Jun 8 02:55:05 2001 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Fri, 08 Jun 2001 12:55:05 +1200 (NZST) Subject: [Python-Dev] re: %b format (no, really) In-Reply-To: <930BBCA4CEBBD411BE6500508BB3328F2E20EF@nsamcanms1.ca.baltimore.com> Message-ID: <200106080055.MAA02711@s454.cosc.canterbury.ac.nz> Greg Wilson : [good stuff about binary format support] > If no-one has strong objections, I'll put together a > PEP on this basis. Sounds okay to me. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg at cosc.canterbury.ac.nz +--------------------------------------+ From tim.one at home.com Fri Jun 8 03:39:53 2001 From: tim.one at home.com (Tim Peters) Date: Thu, 7 Jun 2001 21:39:53 -0400 Subject: [Python-Dev] Shouldn't I be able to print Unicode objects? In-Reply-To: <20010607140320.Z690@xs4all.nl> Message-ID: [Thomas Wouters] > ... > I'm also not terribly worried about the use of non-ASCII characters in > identifiers in Python, though a warning for the next one or two releases > would be a good thing -- if anything, it should warn that that trick > won't work for people with different locale settings! Fine by me!
Someone who cares enough to write the warning code and docs should just do so, although it may be wise to secure Guido's blessing first. From skip at pobox.com Fri Jun 8 16:51:27 2001 From: skip at pobox.com (Skip Montanaro) Date: Fri, 8 Jun 2001 09:51:27 -0500 Subject: [Python-Dev] sys.modules["__main__"] in Jython Message-ID: <15136.58991.72069.433197@beluga.mojam.com> Would someone with Jython experience check to see if it interprets sys.modules["__main__"] in the same manner as Python? I'm interested to see if doctest's normal usage can be simplified slightly. The doctest documentation states: In normal use, end each module M with:

    def _test():
        import doctest, M           # replace M with your module's name
        return doctest.testmod(M)   # ditto

    if __name__ == "__main__":
        _test()

I'm wondering if this works for Jython as well as Python:

    def _test():
        import doctest, sys
        return doctest.testmod(sys.modules["__main__"])

    if __name__ == "__main__":
        _test()

If so, then I think doctest.testmod's signature can be changed to

    def testmod(m=None, name=None, globs=None, verbose=None,
                isprivate=None, report=1):

with the following extra code added to the start of the function:

    if m is None:
        import sys
        m = sys.modules["__main__"]

That way the most common doctest usage can be changed to

    def _test():
        import doctest
        return doctest.testmod()

    if __name__ == "__main__":
        _test()

(I ran into a problem with a module that had initialization code that barfed if executed more than once.) Of course, these changes are ultimately Tim's decision. I'm just trying to knock down various potential hurdles. Thx, Skip From guido at digicool.com Fri Jun 8 18:06:19 2001 From: guido at digicool.com (Guido van Rossum) Date: Fri, 08 Jun 2001 12:06:19 -0400 Subject: [Python-Dev] re: %b format (no, really) In-Reply-To: Your message of "Fri, 08 Jun 2001 12:01:37 EDT." References: Message-ID: <200106081606.f58G6Jj11829@odiug.digicool.com>

> Prompted in part by the comment in Michael Hudson's
> python-dev summary about this discussion having died,
> I'd like to summarize:
>
> 1. Most people who commented felt that a base-2 format
>    would be useful, if only for teaching and debugging.
>    With regard to questions about byte order:
>
>    A. Integer values are printed as base-2 numbers, so
>       byte order is irrelevant.
>
>    B. Floating-point numbers are printed as:
>
>       [sign] [mantissa] [exponent]
>
>       The mantissa and exponent are shown according
>       to rule A.

Why bother with floats at all? We can't print floats as hex either. If I were doing any kind of float-representation fiddling, I'd probably want to print it in hex anyway (I can read hex). But as I say, that's not for the general public.

> 2. Inventing a format for converting to arbitrary
>    bases is dubious hypergeneralization (to borrow a
>    phrase).

Agreed.

> 3. Implementation should mirror octal and hexadecimal
>    support, e.g. a 'bin()' function to go with 'oct()'
>    and 'hex()'.
>
> 4. The desirability or otherwise of a "%b" format
>    specifier has nothing to do with the relative
>    merits of any early microprocessor :-).
>
> If no-one has strong objections, I'll put together a
> PEP on this basis.

Go for it. Or just submit a patch to SF -- this seems almost too small for a PEP to me. :-) --Guido van Rossum (home page: http://www.python.org/~guido/) From barry at digicool.com Fri Jun 8 18:10:50 2001 From: barry at digicool.com (Barry A.
Warsaw) Date: Fri, 8 Jun 2001 12:10:50 -0400 Subject: [Python-Dev] re: %b format (no, really) References: <200106081606.f58G6Jj11829@odiug.digicool.com> Message-ID: <15136.63754.927103.77358@anthem.wooz.org> >>>>> "GvR" == Guido van Rossum writes: GvR> Go for it. Or just submit a patch to SF -- this seems almost GvR> too small for a PEP to me. :-) Since we all seem to agree, I'd agree. :) From Greg.Wilson at baltimore.com Fri Jun 8 18:14:14 2001 From: Greg.Wilson at baltimore.com (Greg Wilson) Date: Fri, 8 Jun 2001 12:14:14 -0400 Subject: [Python-Dev] re: %b format (no, really) Message-ID: <930BBCA4CEBBD411BE6500508BB3328F2E2179@nsamcanms1.ca.baltimore.com> > > Greg: > > B. Floating-point numbers are printed as: > > [sign] [mantissa] [exponent] > Guido: > Why bother with floats at all? For teaching purposes, which is what started me on this in the first place --- I would like an easy way to show people the bit patterns corresponding to basic types. > Guido: > Go for it. Or just submit a patch to SF -- this seems almost too > small for a PEP to me. :-) Thanks, Greg From esr at snark.thyrsus.com Fri Jun 8 18:23:34 2001 From: esr at snark.thyrsus.com (Eric S. Raymond) Date: Fri, 8 Jun 2001 12:23:34 -0400 Subject: [Python-Dev] Glowing endorsement of open source and Python Message-ID: <200106081623.f58GNYf22712@snark.thyrsus.com> It doesn't get much better than this: http://it.mycareer.com.au/news/2001/06/05/FFX9ZT7UENC.html -- Eric S. Raymond In the absence of any evidence tending to show that possession or use of a 'shotgun having a barrel of less than eighteen inches in length' at this time has some reasonable relationship to the preservation or efficiency of a well regulated militia, we cannot say that the Second Amendment guarantees the right to keep and bear such an instrument. [...] The Militia comprised all males physically capable of acting in concert for the common defense. -- Majority Supreme Court opinion in "U.S. vs. Miller" (1939) From mal at lemburg.com Fri Jun 8 19:08:53 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Fri, 08 Jun 2001 19:08:53 +0200 Subject: [Python-Dev] Glowing endorsement of open source and Python References: <200106081623.f58GNYf22712@snark.thyrsus.com> Message-ID: <3B2106A5.FD16D95C@lemburg.com> "Eric S. Raymond" wrote: > > It doesn't get much better than this: > > http://it.mycareer.com.au/news/2001/06/05/FFX9ZT7UENC.html I wonder what those MS Office XP ads are doing on that page...
-- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From tim.one at home.com Fri Jun 8 19:21:10 2001 From: tim.one at home.com (Tim Peters) Date: Fri, 8 Jun 2001 13:21:10 -0400 Subject: [Python-Dev] re: %b format (no, really) In-Reply-To: <930BBCA4CEBBD411BE6500508BB3328F2E2179@nsamcanms1.ca.baltimore.com> Message-ID: [Guido] > Why bother with floats at all? [Greg Wilson] > For teaching purposes, which is what started me on this > in the first place --- I would like an easy way to show > people the bit patterns corresponding to basic types. I'm confused by this: while for integers the bits correspond very clearly to what's stored in the machine, if you separate the mantissa and exponent for floats the result won't "look like" the storage at all. Please give an example first, like what do you intend to produce for

    print "%b" % 0.1
    print "%b" % -42e300

? You have to make decisions about whether or not to unbias the exponent for display (if you don't, it's incomprehensible; if you do, it's not really what's stored); whether or not to materialize the implicit most-significant mantissa bit in 754 normalized values (pretty much ditto); and what to do about Infs, NaNs, signed zeroes and denormal numbers. The kicker is that, to be truly useful for teaching floats, you need a way to select among all combinations of "yes" and "no" for each such decision. A single fixed set of answers will confound more than clarify; e.g., it's important to know what the "true exponent" is, but also to know what biased exponents look like inside the box. This is too much for %b -- write a float-format module instead. From Greg.Wilson at baltimore.com Fri Jun 8 19:34:13 2001 From: Greg.Wilson at baltimore.com (Greg Wilson) Date: Fri, 8 Jun 2001 13:34:13 -0400 Subject: [Python-Dev] RE: %b format (no, really) Message-ID: <930BBCA4CEBBD411BE6500508BB3328F2E2188@nsamcanms1.ca.baltimore.com> > [Guido] > > Why bother with floats at all? > > [Greg Wilson] > > For teaching purposes > [Tim Peters] > if you separate the mantissa and exponent > for floats the result won't "look like" the storage at all. > Please give an example first This is part of what was going to go into the PEP, along with what to do about character data (I've had a couple of emails from people who'd like to be able to look at 8-bit and Unicode characters as bit patterns). > This is too much for %b -- write a float-format module instead. How about a quick patch to do "%b" for int and long-int, and a PEP for a generic "format" module --- arbitrary radix, options for IEEE numbers, etc.? Any objections? Greg
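[A sketch of the kind of helper such a generic "format" module might grow. The name to_base and its exact behavior are hypothetical, chosen so that it inverts the existing int(s, base) builtin; there is no bin() builtin at this point:]

    def to_base(n, base):
        """Render the integer n as a digit string in the given base.

        Supports 2 <= base <= 36; digits past 9 use lowercase letters.
        """
        digits = "0123456789abcdefghijklmnopqrstuvwxyz"
        if not 2 <= base <= 36:
            raise ValueError, "base must be in 2..36"
        if n < 0:
            return "-" + to_base(-n, base)
        # Peel off the last digit and recurse on the remaining quotient.
        q, r = divmod(n, base)
        if q:
            return to_base(q, base) + digits[r]
        return digits[r]

    >>> to_base(10, 2)      # what a "%b" format would print for 10
    '1010'
    >>> int(to_base(400, 7), 7)
    400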
From esr at thyrsus.com Fri Jun 8 19:44:40 2001 From: esr at thyrsus.com (Eric S. Raymond) Date: Fri, 8 Jun 2001 13:44:40 -0400 Subject: [Python-Dev] RE: %b format (no, really) In-Reply-To: <930BBCA4CEBBD411BE6500508BB3328F2E2188@nsamcanms1.ca.baltimore.com>; from Greg.Wilson@baltimore.com on Fri, Jun 08, 2001 at 01:34:13PM -0400 References: <930BBCA4CEBBD411BE6500508BB3328F2E2188@nsamcanms1.ca.baltimore.com> Message-ID: <20010608134440.A23160@thyrsus.com> Greg Wilson : > How about a quick patch to do "%b" for int and long-int, and a > PEP for a generic "format" module --- arbitrary radix, options > for IEEE numbers, etc.? Any objections? I like it. -- Eric S. Raymond The people cannot delegate to government the power to do anything which would be unlawful for them to do themselves. -- John Locke, "A Treatise Concerning Civil Government" From tim.one at home.com Fri Jun 8 19:51:50 2001 From: tim.one at home.com (Tim Peters) Date: Fri, 8 Jun 2001 13:51:50 -0400 Subject: [Python-Dev] RE: %b format (no, really) In-Reply-To: <930BBCA4CEBBD411BE6500508BB3328F2E2188@nsamcanms1.ca.baltimore.com> Message-ID: [Greg Wilson] > How about a quick patch to do "%b" for int and long-int, Don't know how quick it will be (it should cover type slots and bin() and __bin__ and 0b1101 notation too, right?), but +1 from me. That much is routinely requested. > and a PEP for a generic "format" module --- arbitrary radix, options > for IEEE numbers, etc.? Any objections? None here. From bckfnn at worldonline.dk Fri Jun 8 21:15:14 2001 From: bckfnn at worldonline.dk (Finn Bock) Date: Fri, 08 Jun 2001 19:15:14 GMT Subject: [Python-Dev] sys.modules["__main__"] in Jython In-Reply-To: <15136.58991.72069.433197@beluga.mojam.com> References: <15136.58991.72069.433197@beluga.mojam.com> Message-ID: <3b212431.21754982@smtp.worldonline.dk> [Skip] >Would someone with Jython experience check to see if it interprets >sys.modules["__main__"] in the same manner as Python? To me it seems like Jython defines sys.modules["__main__"] in the same way as CPython. >I'm wondering if this works for Jython as well as Python: >
>    def _test():
>        import doctest, sys
>        return doctest.testmod(sys.modules["__main__"])
>
>    if __name__ == "__main__":
>        _test()
>
It works for Jython. regards, finn From thomas at xs4all.net Fri Jun 8 23:41:02 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Fri, 8 Jun 2001 23:41:02 +0200 Subject: [Python-Dev] Glowing endorsement of open source and Python In-Reply-To: <200106081623.f58GNYf22712@snark.thyrsus.com>; from esr@snark.thyrsus.com on Fri, Jun 08, 2001 at 12:23:34PM -0400 References: <200106081623.f58GNYf22712@snark.thyrsus.com> Message-ID: <20010608234102.B690@xs4all.nl> On Fri, Jun 08, 2001 at 12:23:34PM -0400, Eric S. Raymond wrote: > It doesn't get much better than this: > http://it.mycareer.com.au/news/2001/06/05/FFX9ZT7UENC.html It's a nice (and very flattering!) piece, but it's a tad buzzword heavy. "[Python] supports XML for e-commerce and mobile applications" ? Well, shit, so *that*'s what XML is for :) -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread!
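[Picking up the sys.modules["__main__"] thread: with the default-argument testmod Skip proposes, which Finn's test above suggests is safe for Jython too, a module's self-test boilerplate would shrink to something like this. A sketch only -- the no-argument testmod() behavior is still just a proposal:]

    def square(x):
        """Return x squared.

        >>> square(3)
        9
        """
        return x * x

    def _test():
        import doctest
        # Proposed: testmod() defaults to sys.modules["__main__"].
        return doctest.testmod()

    if __name__ == "__main__":
        _test()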
From tim.one at home.com Sat Jun 9 00:02:06 2001 From: tim.one at home.com (Tim Peters) Date: Fri, 8 Jun 2001 18:02:06 -0400 Subject: [Python-Dev] sys.modules["__main__"] in Jython In-Reply-To: <3b212431.21754982@smtp.worldonline.dk> Message-ID: [Finn Bock] > To me it seems like Jython defines sys.modules["__main__"] in the same > way as CPython. Thank you, Finn! doctest has always avoided introspection tricks for which Jython doesn't work "exactly the same way" as CPython. However, in the past it achieved this by not paying any attention, then ripping out bad ideas when a Jython user reported failure. But now that it's in the std library, I want to proceed more carefully. Skip's idea is much more attractive now that you've confirmed it will work there too. From tim.one at home.com Sun Jun 10 03:10:53 2001 From: tim.one at home.com (Tim Peters) Date: Sat, 9 Jun 2001 21:10:53 -0400 Subject: [Python-Dev] Struct schizophrenia Message-ID: I'm adding "long long" integral types to struct (in native mode, "long long" or __int64 on platforms that have them; in standard mode, 64 bits). This is proving harder than it should be, because the code that's already there is schizophrenic across boundaries, so is failing as a base to build on (raises more questions than it answers). Like:

>>> x = 256
>>> struct.pack("b", x)    # complains about magnitude in native mode
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
struct.error: byte format requires -128<=number<=127
>>> struct.pack("=b", x)   # but doesn't with native order + std align
'\x00'
>>> struct.pack("<b", x)   # nor with explicit std mode
'\x00'
>>> struct.pack("<B", x)   # or unsigned
'\x00'
>>> struct.pack("<b", 10L**20)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
OverflowError: long int too large to convert
>>>

Much the same is true of other small int sizes: you can't predict what will happen without trying it; and once you get to ints, no range-checking is performed even in native mode. Surely this can't stand, but what do people *want*? My preference is to raise the same "byte format requires -128<=number<=127" exception in all these cases; OTOH, the code structure fights that, working with Python longs is clumsy in C, and there are other "undocumented features" here that may or may not be accidents:

>>> struct.pack("B", 234.3)
'\xea'
>>>

That is, did we *intend* to accept floats packed via integer typecodes? Feature or bug? In the other (unpack) direction, the docs say for 'I' (unsigned int):

    The "I" conversion code will convert to a Python long if the C int
    is the same size as a C long, which is typical on most modern
    systems.  If a C int is smaller than a C long, a Python integer
    will be created instead.

That's in a footnote. In another part, they say:

    For the "I" and "L" format characters, the return value is a
    Python long integer.

The footnote is wrong -- but is the footnote what was intended (somebody went to a fair bit of work to write all the stuff)? From tim.one at home.com Sun Jun 10 06:25:51 2001 From: tim.one at home.com (Tim Peters) Date: Sun, 10 Jun 2001 00:25:51 -0400 Subject: [Python-Dev] FW: printing Python stack info from gdb Message-ID: Fwd'ing this Skip gem from c.l.py, primarily so I can find it again next time I'm thrashing on a Unix box. gdb clearly needs to adopt Python as its extension language.
but-then-what-doesn't-ly y'rs - tim -----Original Message----- From: python-list-admin at python.org [mailto:python-list-admin at python.org]On Behalf Of Skip Montanaro Sent: Saturday, June 09, 2001 12:31 AM To: python-list at python.org Subject: printing Python stack info from gdb From tim.one at home.com Sun Jun 10 21:36:50 2001 From: tim.one at home.com (Tim Peters) Date: Sun, 10 Jun 2001 15:36:50 -0400 Subject: [Python-Dev] FW: list-display semantics? Message-ID: I opened a bug on this: If anyone's keen to play with the grammar, have at it! Everyone at PythonLabs would +1 it. -----Original Message----- From: python-list-admin at python.org [mailto:python-list-admin at python.org]On Behalf Of jainweiwu Sent: Sunday, June 10, 2001 2:30 PM To: python-list at python.org Subject: list-display semantics? Hi all: I tried this one-line command in interactive mode:

    [x for x in [1, 2, 3], y for y in [4, 5, 6]]

and the result surprised me, that is:

    [[1,2,3],[1,2,3],[1,2,3],9,9,9]

Who can explain the behavior? Since I expected the result should be:

    [[1,4],[1,5],[1,6],[2,4],...]

-- Pary All Rough Yet. parywu at seed.net.tw -- http://mail.python.org/mailman/listinfo/python-list From dan at cgsoftware.com Sun Jun 10 22:30:24 2001 From: dan at cgsoftware.com (Daniel Berlin) Date: 10 Jun 2001 16:30:24 -0400 Subject: [Python-Dev] FW: printing Python stack info from gdb In-Reply-To: ("Tim Peters"'s message of "Sun, 10 Jun 2001 00:25:51 -0400") References: Message-ID: <87n17grsbj.fsf@cgsoftware.com> "Tim Peters" writes: > Fwd'ing this Skip gem from c.l.py, primarily so I can find it again next > time I'm thrashing on a Unix box. gdb clearly needs to adopt Python as its > extension language. HP has patches to do this, actually. Works quite nicely. And trust me, I've tried to get them to do it more than once. As I pointed out to Skip, if he can profile gdb and tell me where the slowness is, it's likely I can make it a ton faster. GDB could use major optimizations almost everywhere. And I've done quite a lot of them; they just haven't been reviewed/integrated yet. --Dan C++ support maintainer - GDB DWARF2 reader person - GDB Symbol table patch submitting weirdo - GDB etc

> but-then-what-doesn't-ly y'rs - tim
>
> -----Original Message-----
> From: python-list-admin at python.org
> [mailto:python-list-admin at python.org]On Behalf Of Skip Montanaro
> Sent: Saturday, June 09, 2001 12:31 AM
> To: python-list at python.org
> Subject: printing Python stack info from gdb
>
> From time to time I've wanted to be able to print the Python stack from gdb.
> Today I broke down and spent some time actually implementing something.
>
>     set $__trimpath = 1
>     define ppystack
>       set $__fr = 0
>       select-frame $__fr
>       while !($pc > Py_Main && $pc < Py_GetArgcArgv)
>         if $pc > eval_code2 && $pc < set_exc_info
>           set $__fn = PyString_AsString(co->co_filename)
>           set $__n = PyString_AsString(co->co_name)
>           if $__n[0] == '?'
>             set $__n = ""
>           end
>           if $__trimpath
>             set $__f = strrchr($__fn, '/')
>             if $__f
>               set $__fn = $__f + 1
>             end
>           end
>           printf "%s (%d): %s\n", $__fn, f->f_lineno, $__n
>         end
>         set $__fr = $__fr + 1
>         select-frame $__fr
>       end
>       select-frame 0
>     end
>
> Output looks like this (and dribbles out *quite slowly*):
>
>     Text_Editor.py (147): apply_tag
>     Text_Editor.py (152): apply_tag_by_name
>     Script_GUI.py (302): push_help
>     Script_GUI.py (113): put_help
>     Script_GUI.py (119): focus_enter
>     Signal.py (34): handle_signal
>     Script_GUI.py (324): main
>     Script_GUI.py (338):
>
> If you don't want to trim the paths from the filenames, set $__trimpath to
> 0.
>
> Warning: I've only tried this with a very recent CVS version of Python on a
> PIII-based Linux system with an interpreter compiled using gcc.  I rely on
> the ordering of functions within the while loop to detect when to exit the
> loop and when the frame I'm examining is an eval_code2 frame.  I'm sure
> there are plenty of people out there with more gdb experience than me.  I
> welcome any feedback on ways to improve this little bit of code.
>
> --
> Skip Montanaro (skip at pobox.com)
> (847)971-7098
>
> --
> http://mail.python.org/mailman/listinfo/python-list
>
>
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at python.org
> http://mail.python.org/mailman/listinfo/python-dev

-- "I saw a man with a wooden leg, and a real foot. "-Steven Wright From greg at cosc.canterbury.ac.nz Mon Jun 11 04:44:54 2001 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Mon, 11 Jun 2001 14:44:54 +1200 (NZST) Subject: [Python-Dev] FW: list-display semantics? In-Reply-To: Message-ID: <200106110244.OAA03090@s454.cosc.canterbury.ac.nz> parywu at seed.net.tw: > [x for x in [1, 2, 3], y for y in [4, 5, 6]] > and the result surprised me, that is: > [[1,2,3],[1,2,3],[1,2,3],9,9,9] Did you by any chance execute that in an environment where y was previously bound to 9? It will be parsed as

    [x for x in ([1, 2, 3], y) for y in [4, 5, 6]]

which should give a NameError if y is previously unbound, since it will try to evaluate ([1, 2, 3], y) before y is bound by the inner loop. But executing y = 9 beforehand will give the results you got. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg at cosc.canterbury.ac.nz +--------------------------------------+ From gstein at lyra.org Mon Jun 11 13:31:59 2001 From: gstein at lyra.org (Greg Stein) Date: Mon, 11 Jun 2001 04:31:59 -0700 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Objects object.c,2.124.4.11,2.124.4.12 In-Reply-To: ; from gvanrossum@users.sourceforge.net on Wed, Jun 06, 2001 at 07:34:15AM -0700 References: Message-ID: <20010611043158.E26210@lyra.org> On Wed, Jun 06, 2001 at 07:34:15AM -0700, Guido van Rossum wrote: > Update of /cvsroot/python/python/dist/src/Objects > In directory usw-pr-cvs1:/tmp/cvs-serv17474 > > Modified Files: > Tag: descr-branch > object.c > Log Message: > Add _PyObject_GetDictPtr() -- an internal API to get a pointer to > where __dict__ is stored in an object. The simplest case is to add > tp_dictoffset to the start of the object, but there are complications: > tp_flags may tell us that tp_dictoffset is not defined, or the offset > may be negative: indexing from the end of the object, where > tp_itemsize may have to be taken into account.
Why would you ever have a negative size in there? That seems like an unnecessary "feature". The offsets are easily set up by the compiler as positive values. (not even sure how you'd come up with a proper/valid negative value) Cheers, -g

> Index: object.c
> ===================================================================
> RCS file: /cvsroot/python/python/dist/src/Objects/object.c,v
> retrieving revision 2.124.4.11
> retrieving revision 2.124.4.12
> diff -C2 -r2.124.4.11 -r2.124.4.12
> *** object.c	2001/06/06 14:27:54	2.124.4.11
> --- object.c	2001/06/06 14:34:13	2.124.4.12
> ***************
> *** 1074,1077 ****
> --- 1074,1111 ----
>   }
>
> + /* Helper to get a pointer to an object's __dict__ slot, if any */
> +
> + PyObject **
> + _PyObject_GetDictPtr(PyObject *obj)
> + {
> + #define PTRSIZE (sizeof(PyObject *))
> +
> +     long dictoffset;
> +     PyTypeObject *tp = obj->ob_type;
> +
> +     if (!(tp->tp_flags & Py_TPFLAGS_HAVE_CLASS))
> +         return NULL;
> +     dictoffset = tp->tp_dictoffset;
> +     if (dictoffset == 0)
> +         return NULL;
> +     if (dictoffset < 0) {
> +         dictoffset += tp->tp_basicsize;
> +         assert(dictoffset > 0); /* Sanity check */
> +         if (tp->tp_itemsize > 0) {
> +             int n = ((PyVarObject *)obj)->ob_size;
> +             if (n > 0) {
> +                 dictoffset += tp->tp_itemsize * n;
> +                 /* Round up, if necessary */
> +                 if (tp->tp_itemsize % PTRSIZE != 0) {
> +                     dictoffset += PTRSIZE - 1;
> +                     dictoffset /= PTRSIZE;
> +                     dictoffset *= PTRSIZE;
> +                 }
> +             }
> +         }
> +     }
> +     return (PyObject **) ((char *)obj + dictoffset);
> + }
> +
>   /* Generic GetAttr functions - put these in your tp_[gs]etattro slot */
>
> ***************
> *** 1082,1086 ****
>       PyObject *descr;
>       descrgetfunc f;
> !     int dictoffset;
>
>       if (tp->tp_dict == NULL) {
> --- 1116,1120 ----
>       PyObject *descr;
>       descrgetfunc f;
> !     PyObject **dictptr;
>
>       if (tp->tp_dict == NULL) {
> ***************
> *** 1097,1103 ****
>       }
>
> !     dictoffset = tp->tp_dictoffset;
> !     if (dictoffset != 0) {
> !         PyObject *dict = * (PyObject **) ((char *)obj + dictoffset);
>           if (dict != NULL) {
>               PyObject *res = PyDict_GetItem(dict, name);
> --- 1131,1137 ----
>       }
>
> !     dictptr = _PyObject_GetDictPtr(obj);
> !     if (dictptr != NULL) {
> !         PyObject *dict = *dictptr;
>           if (dict != NULL) {
>               PyObject *res = PyDict_GetItem(dict, name);
> ***************
> *** 1129,1133 ****
>       PyObject *descr;
>       descrsetfunc f;
> !     int dictoffset;
>
>       if (tp->tp_dict == NULL) {
> --- 1163,1167 ----
>       PyObject *descr;
>       descrsetfunc f;
> !     PyObject **dictptr;
>
>       if (tp->tp_dict == NULL) {
> ***************
> *** 1143,1149 ****
>       }
>
> !     dictoffset = tp->tp_dictoffset;
> !     if (dictoffset != 0) {
> !         PyObject **dictptr = (PyObject **) ((char *)obj + dictoffset);
>           PyObject *dict = *dictptr;
>           if (dict == NULL && value != NULL) {
> --- 1177,1182 ----
>       }
>
> !     dictptr = _PyObject_GetDictPtr(obj);
> !     if (dictptr != NULL) {
>           PyObject *dict = *dictptr;
>           if (dict == NULL && value != NULL) {
>
>
> _______________________________________________
> Python-checkins mailing list
> Python-checkins at python.org
> http://mail.python.org/mailman/listinfo/python-checkins

-- Greg Stein, http://www.lyra.org/ From guido at digicool.com Mon Jun 11 14:57:18 2001 From: guido at digicool.com (Guido van Rossum) Date: Mon, 11 Jun 2001 08:57:18 -0400 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Objects object.c,2.124.4.11,2.124.4.12 In-Reply-To: Your message of "Mon, 11 Jun 2001 04:31:59 PDT."
<20010611043158.E26210@lyra.org> References: <20010611043158.E26210@lyra.org> Message-ID: <200106111257.IAA03505@cj20424-a.reston1.va.home.com> > > Modified Files: > > Tag: descr-branch > > object.c > > Log Message: > > Add _PyObject_GetDictPtr() -- an internal API to get a pointer to > > where __dict__ is stored in an object. The simplest case is to add > > tp_dictoffset to the start of the object, but there are complications: > > tp_flags may tell us that tp_dictoffset is not defined, or the offset > > may be negative: indexing from the end of the object, where > > tp_itemsize may have to be taken into account. > > Why would you ever have a negative size in there? That seems like an > unnecessary "feature". The offsets are easily set up by the compiler as > positive values. (not even sure how you'd come up with a proper/valid > negative value) When extending a type like tuple or string, the __dict__ has to be added to the end, after the last item, because we can't change the starting offset of the first item. This is not at a fixed offset from the start of the structure. --Guido van Rossum (home page: http://www.python.org/~guido/) From mal at lemburg.com Mon Jun 11 18:50:11 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Mon, 11 Jun 2001 18:50:11 +0200 Subject: [Python-Dev] Adding .decode() method to Unicode Message-ID: <3B24F6C3.C911C0BF@lemburg.com> I would like to add a .decode() method to Unicode objects and also enable the builtin unicode() to accept Unicode objects as input. The .decode() method will work just like the .encode() method except that it interfaces to the decode API of the codec in question. While this may seem useless for the currently available encodings, it does have some use for codecs which recode Unicode to Unicode, e.g. codecs which do XML escaping or Unicode compression. Any objections? -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From mal at lemburg.com Mon Jun 11 18:57:12 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Mon, 11 Jun 2001 18:57:12 +0200 Subject: [Python-Dev] Adding Asian codecs to the core Message-ID: <3B24F868.A3DFA649@lemburg.com> Tamito KAJIYAMA recently announced that he changed the licenses on his Japanese codecs from GPL to a BSD variant. This is great news since this would allow adding the codecs to the Python core, which would certainly attract more users to Python in Asia. The codecs are available at: http://pseudo.grad.sccs.chukyo-u.ac.jp/~kajiyama/python/ The codecs are 280kB when compressed as a .tar.gz file. Thoughts? -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From aahz at rahul.net Mon Jun 11 19:42:30 2001 From: aahz at rahul.net (Aahz Maruch) Date: Mon, 11 Jun 2001 10:42:30 -0700 (PDT) Subject: [Python-Dev] Adding Asian codecs to the core In-Reply-To: <3B24F868.A3DFA649@lemburg.com> from "M.-A. Lemburg" at Jun 11, 2001 06:57:12 PM Message-ID: <20010611174230.0625E99C8D@waltz.rahul.net> M.-A. Lemburg wrote: > > Tamito KAJIYAMA recently announced that he changed the licenses > on his Japanese codecs from GPL to a BSD variant. This is great > news since this would allow adding the codecs to the Python core > which would certainly attract more users to Python in Asia.
>
> The codecs are 280kB when compressed as .tar.gz file.

+0

I like the idea, but am uncomfortable with that amount of space.

-- 
                      --- Aahz (@pobox.com)

Hugs and backrubs -- I break Rule 6  <*>  http://www.rahul.net/aahz/
Androgynous poly kinky vanilla queer het Pythonista

I don't really mind a person having the last whine, but I do mind
someone else having the last self-righteous whine.

From fdrake at cj42289-a.reston1.va.home.com  Mon Jun 11 21:15:06 2001
From: fdrake at cj42289-a.reston1.va.home.com (Fred Drake)
Date: Mon, 11 Jun 2001 15:15:06 -0400 (EDT)
Subject: [Python-Dev] [development doc updates]
Message-ID: <20010611191506.DC49F28923@cj42289-a.reston1.va.home.com>

The development version of the documentation has been updated:

    http://python.sourceforge.net/devel-docs/

Substantial additional material on floating point arithmetic in the
tutorial, written by Tim Peters to explain why FP can fail to reflect
the decimal world presented to the user.

Lots of additional updates and corrections.

From guido at digicool.com  Mon Jun 11 22:07:40 2001
From: guido at digicool.com (Guido van Rossum)
Date: Mon, 11 Jun 2001 16:07:40 -0400
Subject: [Python-Dev] PEP 259: Omit printing newline after newline
Message-ID: <200106112007.f5BK7eW22506@odiug.digicool.com>

Please comment on the following.  This came up a while ago in
python-dev and I decided to follow through.  I'm making this a PEP
because of the risk of breaking code (which everybody on Python-dev
seemed to think was acceptable).

--Guido van Rossum (home page: http://www.python.org/~guido/)

PEP: 259
Title: Omit printing newline after newline
Version: $Revision: 1.1 $
Author: guido at python.org (Guido van Rossum)
Status: Draft
Type: Standards Track
Python-Version: 2.2
Created: 11-Jun-2001
Post-History: 11-Jun-2001

Abstract

    Currently, the print statement always appends a newline, unless a
    trailing comma is used.  This means that if we want to print data
    that already ends in a newline, we get two newlines, unless
    special precautions are taken.

    I propose to skip printing the newline when it follows a newline
    that came from data.

    In order to avoid having to add yet another magic variable to
    file objects, I propose to give the existing 'softspace' variable
    an extra meaning: a negative value will mean "the last data
    written ended in a newline so no space *or* newline is required."

Problem

    When printing data that resembles the lines read from a file
    using a simple loop, double-spacing occurs unless special care is
    taken:

        >>> for line in open("/etc/passwd").readlines():
        ...     print line
        ...
        root:x:0:0:root:/root:/bin/bash

        bin:x:1:1:bin:/bin:

        daemon:x:2:2:daemon:/sbin:

        (etc.)

        >>>

    While there are easy work-arounds, this is often noticed only
    during testing and requires an extra edit-test roundtrip; the
    fixed code is uglier and harder to maintain.

Proposed Solution

    In the PRINT_ITEM opcode in ceval.c, when a string object is
    printed, a check is already made that looks at the last character
    of that string.  Currently, if that last character is a whitespace
    character other than space, the softspace flag is reset to zero;
    this suppresses the space between two items if the first item is
    a string ending in newline, tab, etc. (but not when it ends in a
    space).  Otherwise the softspace flag is set to one.
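    To make the current test concrete, it can be sketched in pure
    Python (an illustration only -- the real logic lives in C in
    ceval.c, and softspace_after is a made-up name):

        import string

        def softspace_after(item):
            # softspace value left behind by "print item," under the
            # *current* rules described above
            if type(item) is type("") and item:
                last = item[-1]
                if last != ' ' and last in string.whitespace:
                    return 0    # newline, tab, ...: suppress next space
            return 1            # all other cases: a space is owed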
    The proposal changes this test slightly so that softspace is set
    to:

    -1 -- if the last object written is a string ending in a newline

     0 -- if the last object written is a string ending in a
          whitespace character that's neither space nor newline

     1 -- in all other cases (including the case when the last object
          written is an empty string or not a string)

    Then, in the PRINT_NEWLINE opcode, printing of the newline is
    suppressed if the value of softspace is negative; in any case the
    softspace flag is reset to zero.

Scope

    This only affects printing of 8-bit strings.  It doesn't affect
    Unicode, although that could be considered a bug in the Unicode
    implementation.  It doesn't affect other objects whose string
    representation happens to end in a newline character.

Risks

    This change breaks some existing code.  For example:

        print "Subject: PEP 259\n"
        print message_body

    In current Python, this produces a blank line separating the
    subject from the message body; with the proposed change, the body
    begins immediately below the subject.  This is not very robust
    code anyway; it is better written as

        print "Subject: PEP 259"
        print
        print message_body

    In the test suite, only test_StringIO (which explicitly tests for
    this feature) breaks.

Implementation

    A patch relative to current CVS is here:

        http://sourceforge.net/tracker/index.php?func=detail&aid=432183&group_id=5470&atid=305470

Copyright

    This document has been placed in the public domain.

Local Variables:
mode: indented-text
indent-tabs-mode: nil
End:

From BPettersen at NAREX.com  Mon Jun 11 22:20:38 2001
From: BPettersen at NAREX.com (Bjorn Pettersen)
Date: Mon, 11 Jun 2001 14:20:38 -0600
Subject: [Python-Dev] RE: PEP 259: Omit printing newline after newline
Message-ID: <6957F6A694B49A4096F7CFD0D900042F27D452@admin56.narex.com>

> From: Guido van Rossum [mailto:guido at digicool.com]
> 
> Subject: PEP 259: Omit printing newline after newline

This would probably break most of the cgi scripts I did at my last job
without giving any useful error message.  But then again... why should
I care ?

-- bjorn

From skip at pobox.com  Mon Jun 11 22:20:33 2001
From: skip at pobox.com (Skip Montanaro)
Date: Mon, 11 Jun 2001 15:20:33 -0500
Subject: [Python-Dev] Feedback on new floating point info in tutorial
In-Reply-To: <20010611191506.DC49F28923@cj42289-a.reston1.va.home.com>
References: <20010611191506.DC49F28923@cj42289-a.reston1.va.home.com>
Message-ID: <15141.10257.487549.196538@beluga.mojam.com>

    Fred> Substantial additional material on floating point arithmetic in
    Fred> the tutorial, written by Tim Peters to explain why FP can fail to
    Fred> reflect the decimal world presented to the user.

I took a quick look at that appendix.  One thing that confused me a bit was
that if 0.1 is approximated by something ever-so-slightly larger than 0.1,
how is it that if you add ten of them together you wind up with a result
that is ever-so-slightly less than 1.0?  I didn't expect it to be exactly
1.0.
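On a typical IEEE-754 platform the effect is easy to reproduce
interactively (exact results could differ on non-754 hardware):
repeated addition drifts just below 1.0, while a single multiplication
rounds to exactly 1.0:

    >>> s = 0.0
    >>> for i in range(10):
    ...     s = s + 0.1
    ...
    >>> s == 1.0
    0
    >>> s < 1.0
    1
    >>> 0.1 * 10 == 1.0
    1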
Other floating point naifs may be confused in the same way:

    >>> "%.55f" % 0.5
    '0.5000000000000000000000000000000000000000000000000000000'
    >>> "%.55f" % 0.1
    '0.1000000000000000055511151231257827021181583404541015625'
    >>> "%.55f" % (0.5+0.1)
    '0.5999999999999999777955395074968691915273666381835937500'

I guess the explanation is that not only can't most decimals be
represented exactly, but that summing the same approximation multiple
times doesn't always skew the error in the same direction either:

    >>> "%.55f" % (0.1+0.1+0.1+0.1+0.1+0.1+0.1+0.1)
    '0.7999999999999999333866185224906075745820999145507812500'
    >>> "%.55f" % (0.8)
    '0.8000000000000000444089209850062616169452667236328125000'

IEEE-754-is-full-of-traps-for-the-unwary-ly y'rs,

Skip

From mal at lemburg.com  Mon Jun 11 22:55:13 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Mon, 11 Jun 2001 22:55:13 +0200
Subject: [Python-Dev] PEP 259: Omit printing newline after newline
References: <200106112007.f5BK7eW22506@odiug.digicool.com>
Message-ID: <3B253031.AB1954CB@lemburg.com>

Guido van Rossum wrote:
> 
> Please comment on the following.  This came up a while ago in
> python-dev and I decided to follow through.  I'm making this a PEP
> because of the risk of breaking code (which everybody on Python-dev
> seemed to think was acceptable).
> 
> --Guido van Rossum (home page: http://www.python.org/~guido/)
> 
> PEP: 259
> Title: Omit printing newline after newline
> ...
> Scope
> 
>     This only affects printing of 8-bit strings.  It doesn't affect
>     Unicode, although that could be considered a bug in the Unicode
>     implementation.  It doesn't affect other objects whose string
>     representation happens to end in a newline character.

I guess I should fix the Unicode stuff ;-)

> Risks
> 
>     This change breaks some existing code.  For example:
> 
>         print "Subject: PEP 259\n"
>         print message_body
> 
>     In current Python, this produces a blank line separating the
>     subject from the message body; with the proposed change, the body
>     begins immediately below the subject.  This is not very robust
>     code anyway; it is better written as
> 
>         print "Subject: PEP 259"
>         print
>         print message_body
> 
>     In the test suite, only test_StringIO (which explicitly tests for
>     this feature) breaks.

Hmm, I think the above is a very typical idiom for RFC822-style
content and is used in CGI scripts a lot. I'm not sure whether this
change is worth getting the CGI crowd upset...

Wouldn't it make sense to only use this technique in interactive
mode ?

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/

From martin at loewis.home.cs.tu-berlin.de  Tue Jun 12 00:00:54 2001
From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Tue, 12 Jun 2001 00:00:54 +0200
Subject: [Python-Dev] Adding .decode() method to Unicode
Message-ID: <200106112200.f5BM0sO01692@mira.informatik.hu-berlin.de>

> I would like to add a .decode() method to Unicode objects and also
> enable the builtin unicode() to accept Unicode objects as input.

-1. What is this good for?

> While this may seem useless for the currently available encodings,
> it does have some use for codecs which recode Unicode to Unicode,
> e.g. codecs which do XML escaping or Unicode compression.

I still can't see the value. If you think the codec API is good for such
transformations, why not use it? I.e.
enc,dec,_,_ = codecs.lookup("compress-form-foo") s = dec(s) Furthermore, this seems like a form of hypergeneralization. If you have this, why not also add s = s.decode("capitalize") # instead of s.capitalize() i = s.decode("int") # instead of int(s) > Any objections ? Yes, I think this should not be added. Regards, Martin From paulp at ActiveState.com Tue Jun 12 01:38:55 2001 From: paulp at ActiveState.com (Paul Prescod) Date: Mon, 11 Jun 2001 16:38:55 -0700 Subject: [Python-Dev] Adding .decode() method to Unicode References: <200106112200.f5BM0sO01692@mira.informatik.hu-berlin.de> Message-ID: <3B25568F.B766E00D@ActiveState.com> "Martin v. Loewis" wrote: > >... > > I still can see the value. If you think the codec API is good for such > transformation, why not use it? I.e. > > enc,dec,_,_ = codecs.lookup("compress-form-foo") > s = dec(s) IMO, there is a huge usability difference between the above and mystr.decode("base64"). I think that we've done a good job of providing better ways to get at codecs than the codecs.lookup function. I don't see how this is any different. -- Take a recipe. Leave a recipe. Python Cookbook! http://www.ActiveState.com/pythoncookbook From greg at cosc.canterbury.ac.nz Tue Jun 12 01:51:55 2001 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Tue, 12 Jun 2001 11:51:55 +1200 (NZST) Subject: [Python-Dev] Feedback on new floating point info in tutorial In-Reply-To: <15141.10257.487549.196538@beluga.mojam.com> Message-ID: <200106112351.LAA03197@s454.cosc.canterbury.ac.nz> Skip Montanaro : > One thing that confused me a bit was > that if 0.1 is approximated by something ever-so-slightly larger than 0.1, > how is it that if you add ten of them together you wind up with a result > that is ever-so-slightly less than 1.0? I think what's happening is that the exact binary result of adding 0.1_plus_a_little to itself has one more bit than there is room for, so it gets shifted right and one bit falls off the end. The amount you lose when that happens a few times ends up outweighing the extra that you would expect. Whether it's worth trying to explain *that* in the tutorial I don't know! Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg at cosc.canterbury.ac.nz +--------------------------------------+ From greg at cosc.canterbury.ac.nz Tue Jun 12 02:00:33 2001 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Tue, 12 Jun 2001 12:00:33 +1200 (NZST) Subject: [Python-Dev] PEP 259: Omit printing newline after newline In-Reply-To: <200106112007.f5BK7eW22506@odiug.digicool.com> Message-ID: <200106120000.MAA03201@s454.cosc.canterbury.ac.nz> Guido: > I propose to skip printing the newline when it follows a newline > that came from data. -1 There's too much magic in the way print handles spaces and newlines already. Making it even more magical and inconsistent seems like exactly the wrong direction to be going in. If there are to be any changes to the way print works, I would prefer to see one that removes the need for the softspace flag altogether. The behaviour of a given print should not depend on state left behind by some previous one. Neither should it depend on whether the characters being printed come directly from a string or not. 
Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg at cosc.canterbury.ac.nz +--------------------------------------+ From tim.one at home.com Tue Jun 12 04:17:24 2001 From: tim.one at home.com (Tim Peters) Date: Mon, 11 Jun 2001 22:17:24 -0400 Subject: [Python-Dev] Feedback on new floating point info in tutorial In-Reply-To: <15141.10257.487549.196538@beluga.mojam.com> Message-ID: [Skip Montanaro, on the in-progess 2.2 Tutorial appendix] > I took a quick look at that appendix. One thing that confused me > a bit was that if 0.1 is approximated by something ever-so-slightly > larger than 0.1, how is it that if you add ten of them together you > wind up with a result that is ever-so-slightly less than 1.0? Good for you, Skip! In all the years I've been explaining this stuff, I only recall one other picking up on that immediately. I'm not writing a book here, though , and any intro numeric programming text emphasizes that n*x is a better bet than adding x together n times. >>> .1 * 10 1.0 >>> Greg Ewing put you on the right track, if you want to figure it out yourself (as Deep Throat said, "follow the bits, Skip -- follow the bits"). > I didn't expect it to be exactly 1.0. Other floating point naifs > may be confused in the same way: > > >>> "%.55f" % 0.5 > '0.5000000000000000000000000000000000000000000000000000000' > >>> "%.55f" % 0.1 > '0.1000000000000000055511151231257827021181583404541015625' > >>> "%.55f" % (0.5+0.1) > '0.5999999999999999777955395074968691915273666381835937500' Note that this output is platform-dependent. For example, the last on Windows is >>> "%.55f" % (0.5+0.1) '0.5999999999999999800000000000000000000000000000000000000' > ... > IEEE-754-is-full-of-traps-for-the-unwary-ly y'rs, All computer arithmetic is; and among binary fp systems, 754 has got to be the best-behaved there is. Know how many irksome bugs I've fixed in Python mucking with different sizes of integers across platforms, and what C does and doesn't guarantee about them? About 20x more than fp bugs. Of course there's 10000x as much integer code in Python too . god-created-the-integers-from-1-through-3-inclusive-and-that's-it-ly y'rs - tim From barry at digicool.com Tue Jun 12 05:00:52 2001 From: barry at digicool.com (Barry A. Warsaw) Date: Mon, 11 Jun 2001 23:00:52 -0400 Subject: [Python-Dev] PEP 259: Omit printing newline after newline References: <200106112007.f5BK7eW22506@odiug.digicool.com> <200106120000.MAA03201@s454.cosc.canterbury.ac.nz> Message-ID: <15141.34276.191510.708654@anthem.wooz.org> >>>>> "GE" == Greg Ewing writes: GE> There's too much magic in the way print handles spaces and GE> newlines already. Making it even more magical and inconsistent GE> seems like exactly the wrong direction to be going in. I tend to agree. I'm sometimes bitten by the double newlines, but as I think Andrew brought up in c.l.py, I'd rather see a way to tell readlines() to strip the newlines than to add more magic to print. print-has-all-the-magic-it-needs-now-<>-ly y'rs, -Barry From fredrik at pythonware.com Tue Jun 12 08:21:55 2001 From: fredrik at pythonware.com (Fredrik Lundh) Date: Tue, 12 Jun 2001 08:21:55 +0200 Subject: [Python-Dev] PEP 259: Omit printing newline after newline References: <200106112007.f5BK7eW22506@odiug.digicool.com> Message-ID: <004c01c0f307$fb0540c0$4ffa42d5@hagrid> guido wrote: > Please comment on the following. 
This came up a while ago in
> python-dev and I decided to follow through.  I'm making this a PEP
> because of the risk of breaking code (which everybody on Python-dev
> seemed to think was acceptable).

when was this discussed on python-dev?

From mal at lemburg.com  Tue Jun 12 09:09:05 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Tue, 12 Jun 2001 09:09:05 +0200
Subject: [Python-Dev] Adding .decode() method to Unicode
References: <200106112200.f5BM0sO01692@mira.informatik.hu-berlin.de>
Message-ID: <3B25C011.125B6462@lemburg.com>

"Martin v. Loewis" wrote:
> 
> > I would like to add a .decode() method to Unicode objects and also
> > enable the builtin unicode() to accept Unicode objects as input.
> 
> -1. What is this good for?

See below :)

> > While this may seem useless for the currently available encodings,
> > it does have some use for codecs which recode Unicode to Unicode,
> > e.g. codecs which do XML escaping or Unicode compression.
> 
> I still can't see the value. If you think the codec API is good for such
> transformations, why not use it? I.e.
> 
> enc,dec,_,_ = codecs.lookup("compress-form-foo")
> s = dec(s)

Sure, and that's the point. I would like to add the .decode() method
to make this just as simple as encoding Unicode to UTF-8.

Note that strings already have this method:

    str.encode()
    str.decode()
    uni.encode()
    #uni.decode() # still missing

> Furthermore, this seems like a form of hypergeneralization. If you
> have this, why not also add
> 
> s = s.decode("capitalize") # instead of s.capitalize()
> i = s.decode("int") # instead of int(s)

No, that's not the intention.

One very useful application for this method is XML unescaping
which turns numeric XML entities into Unicode chars. Others are
Unicode decompression (using the Unicode compression algorithm) and
certain forms of Unicode normalization.

The key argument for these interfaces is that they provide
an extensible transformation mechanism for string and binary
data.

> > Any objections ?
> 
> Yes, I think this should not be added.

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/

From tim.one at home.com  Tue Jun 12 09:29:02 2001
From: tim.one at home.com (Tim Peters)
Date: Tue, 12 Jun 2001 03:29:02 -0400
Subject: [Python-Dev] PEP 259: Omit printing newline after newline
In-Reply-To: <004c01c0f307$fb0540c0$4ffa42d5@hagrid>
Message-ID: 

[/F]
> when was this discussed on python-dev?

It wasn't -- it actually came up on one of the SourceForge mailing lists
... ah, of course, tried to search but "Geocrawler is down for nightly
database maintenance".  They sure have long nights.  I'm guessing it's
the python-iterators list.  It spun off of a thread where Guido was
wondering whether one of the new ways to spell "iterate over a file"
should return lines without trailing \n, so that e.g.

    for line in sys.stdin:
        print line

wasn't a surprise.  I opined it would be better to make all ways of
iterating a file do the same thing, but change print instead.  We both
agreed that couldn't happen.  But then I couldn't find any code it would
break, only code of the form

    print line,

where the "," was trying to suppress the extra newline, and that would
continue to work the same way even if print were changed.  The notion
that legions of people are using

    print line

as an obscure way to get double-spacing is taking me by surprise.
Nobody on the iterators list had this objection.
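For reference, the idiom in question under the current rules (nothing
below is proposed behaviour; /etc/passwd is just the thread's running
example):

    for line in open("/etc/passwd").readlines():
        print line      # double-spaced: `line` already ends in '\n'

    for line in open("/etc/passwd").readlines():
        print line,     # the comma suppresses print's newline; since the
                        # data ends in '\n', softspace adds no stray space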
win-some-lose-some-lose-some-lose-some-lose-some-ly y'rs - tim From mal at lemburg.com Tue Jun 12 09:35:08 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Tue, 12 Jun 2001 09:35:08 +0200 Subject: [Python-Dev] Adding Asian codecs to the core References: <20010611174230.0625E99C8D@waltz.rahul.net> Message-ID: <3B25C62C.969B40B3@lemburg.com> Aahz Maruch wrote: > > M.-A. Lemburg wrote: > > > > Tamito KAJIYAMA recently announced that he changed the licenses > > on his Japanese codecs from GPL to a BSD variant. This is great > > news since this would allow adding the codecs to the Python core > > which would certainly attract more users to Python in Asia. > > > > The codecs are 280kB when compressed as .tar.gz file. > > +0 > > I like the idea, am uncomfortable with that amount of space. Tamito corrected me about the size (his file includes the .pyc byte code files): the correct size for the sources is 143kB -- almost half of what I initially wrote. If that should still be too much, there are probably some ways to further compress the size of the mapping tables which could be investigated. PS: Tamito is very thrilled about getting his codecs into the core and I am quite certain that he is also prepared to maintain them (I have put him on CC). -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From tim at digicool.com Tue Jun 12 09:37:55 2001 From: tim at digicool.com (Tim Peters) Date: Tue, 12 Jun 2001 03:37:55 -0400 Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/Include longobject.h,2.19,2.20 In-Reply-To: <3B25C116.3E65A32D@lemburg.com> Message-ID: [M.-A. Lemburg] > I have tried to compile longobject.c/h on a HP-UX box and am getting > warnings about MIN/MAX being redefined. Perhaps you should add > an #undef for these before the #define ?! I changed nothing relevant here. Are you certain this is a new problem? The MIN/MAX macros have been in longobject.c for a long time, and I didn't touch them. In any case, I'm not inclined to fiddle things on a box where I can't see a problem so can't know whether I'm fixing it or just creating new problems. If you can figure out why it's happening on that box, and it's a legit problem there, feel free to fix it. From SBrunning at trisystems.co.uk Tue Jun 12 10:25:19 2001 From: SBrunning at trisystems.co.uk (Simon Brunning) Date: Tue, 12 Jun 2001 09:25:19 +0100 Subject: [Python-Dev] RE: PEP 259: Omit printing newline after newline Message-ID: <31575A892FF6D1118F5800600846864D78BD25@intrepid> > From: Guido van Rossum [SMTP:guido at digicool.com] > In order to avoid having to add yet another magic variable to file > objects, I propose to give the existing 'softspace' variable an > extra meaning: a negative value will mean "the last data written > ended in a newline so no space *or* newline is required." Better another magic variable than a magic value for an old one, I think. Cheers, Simon Brunning TriSystems Ltd. sbrunning at trisystems.co.uk ----------------------------------------------------------------------- The information in this email is confidential and may be legally privileged. It is intended solely for the addressee. Access to this email by anyone else is unauthorised. If you are not the intended recipient, any disclosure, copying, distribution, or any action taken or omitted to be taken in reliance on it, is prohibited and may be unlawful. TriSystems Ltd. 
cannot accept liability for statements made which are clearly the senders own. From thomas at xs4all.net Tue Jun 12 10:33:30 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Tue, 12 Jun 2001 10:33:30 +0200 Subject: [Python-Dev] PEP 259: Omit printing newline after newline In-Reply-To: ; from tim.one@home.com on Tue, Jun 12, 2001 at 03:29:02AM -0400 References: <004c01c0f307$fb0540c0$4ffa42d5@hagrid> Message-ID: <20010612103330.D690@xs4all.nl> On Tue, Jun 12, 2001 at 03:29:02AM -0400, Tim Peters wrote: > [/F] > > when was this discussed on python-dev? > It wasn't -- it actually came up on one of the SourceForge mailing lists ... > I'm guessing it's the python-iterators list. I'm guessing the same thing, because I *did* see the proposal somewhere. I recall thinking 'that might work' but not much else, anyway. > The notion that legions of people are using > print line > as an obscure way to get double-spacing is taking me by surprise. Bah, humbug! (And you can quote me on that.) Backward compatibility is not an issue -- that's why we have future-imports and warning mechanisms. Import smart-print from future to get the new behaviour, and warn whenever print *would* *have* printed one newline less otherwise. Regardless, I'm -1 on this change. Not because of backward compatibility problem, but because of what GregE said. Let's not make print even more magically unpredictably confusing than it already is, with comma's that do something magical, softspace to control that magic, and shifting the print operator to the right :-) Why can't we use for line in file: print line, to print all lines in a file ? Softspace doesn't seem to add a space (though I had to write a testcase to make sure ;) and 'explicit is better than implicit'. I'd also prefer special syntax to control the softspace behaviour, like say: print "spam:", "ham" : "and" : "eggs" to print 'spamandeggs' without a space inbetween. Too late for that, I 'spose :) -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From martin at loewis.home.cs.tu-berlin.de Tue Jun 12 11:42:52 2001 From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Tue, 12 Jun 2001 11:42:52 +0200 Subject: [Python-Dev] Adding .decode() method to Unicode In-Reply-To: "mal@lemburg.com"'s message of Tue, 12 Jun 2001 09:09:05 +0200 Message-ID: <200106120942.f5C9gqT02418@mira.informatik.hu-berlin.de> > str.encode() > str.decode() > uni.encode() > #uni.decode() # still missing It's not missing. str.decode and uni.encode go through a single codec; that's easy. str.encode is somewhat more confusing, because it really is unicode(str).encode. Now, you are not proposing that uni.decode is str(uni).decode, are you? If not that, what else would it mean? And if it means something else, it is clearly not symmetric to str.encode, so it is not "missing". > One very useful application for this method is XML unescaping > which turns numeric XML entities into Unicode chars. Ok. Please show me how that would work. More precisely, please write a PEP describing the rationale for this feature, including use case examples and precise semantics of the proposed addition. > The key argument for these interfaces is that they provide > an extensible transformation mechanism for string and binary > data. That is too general for me to understand; I need to see detailed examples that solve real-world problems. Regards, Martin P.S. 
I don't think that unescaping XML characters entities into Unicode characters is a useful application in itself. This is normally done by the XML parser, which not only has to deal with character entities, but also with general entities and a lot of other markup. Very few people write XML parsers, and they are using the string methods and the sre module successfully (if the parser is written in Python - a C parser would do the unescaping before even passing the text to Python). From thomas at xs4all.net Tue Jun 12 12:02:03 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Tue, 12 Jun 2001 12:02:03 +0200 Subject: [Python-Dev] PEP 259: Omit printing newline after newline In-Reply-To: <20010612103330.D690@xs4all.nl>; from thomas@xs4all.net on Tue, Jun 12, 2001 at 10:33:30AM +0200 References: <004c01c0f307$fb0540c0$4ffa42d5@hagrid> <20010612103330.D690@xs4all.nl> Message-ID: <20010612120203.E690@xs4all.nl> On Tue, Jun 12, 2001 at 10:33:30AM +0200, Thomas Wouters wrote: > print "spam:", "ham" : "and" : "eggs" > to print 'spamandeggs' without a space inbetween. Err. I meant "hamandeggs" with no space inbetween. -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From mal at lemburg.com Tue Jun 12 12:13:21 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Tue, 12 Jun 2001 12:13:21 +0200 Subject: [Python-Dev] Adding .decode() method to Unicode References: <200106120942.f5C9gqT02418@mira.informatik.hu-berlin.de> Message-ID: <3B25EB41.807C2C51@lemburg.com> "Martin v. Loewis" wrote: > > > str.encode() > > str.decode() > > uni.encode() > > #uni.decode() # still missing > > It's not missing. str.decode and uni.encode go through a single codec; > that's easy. str.encode is somewhat more confusing, because it really > is unicode(str).encode. Now, you are not proposing that uni.decode is > str(uni).decode, are you? No. uni.decode() will (just like the other methods) directly interface to the codecs decoder -- there is no magic conversion involved. It is meant to be used by Unicode-Unicode codecs > If not that, what else would it mean? And if it means something else, > it is clearly not symmetric to str.encode, so it is not "missing". It is in the sense that strings support this method and Unicode currently doesn't. > > One very useful application for this method is XML unescaping > > which turns numeric XML entities into Unicode chars. > > Ok. Please show me how that would work. More precisely, please write a > PEP describing the rationale for this feature, including use case > examples and precise semantics of the proposed addition. There's no need for a PEP. This addition is much too simple to require a PEP on its own. As for use cases: I have already given a whole bunch of them (Unicode compression, normalization, escaping in various ways). Codecs are in no way constrained to only interface between strings and Unicode. There are many other possibilities for their usage out there. Just look at the latest checkins for a bunch of string-string codecs for examples of codecs which solve common real-life problems and do not interface to Unicode. > > The key argument for these interfaces is that they provide > > an extensible transformation mechanism for string and binary > > data. > > That is too general for me to understand; I need to see detailed > examples that solve real-world problems. > > Regards, > Martin > > P.S. I don't think that unescaping XML characters entities into > Unicode characters is a useful application in itself. 
This is normally > done by the XML parser, which not only has to deal with character > entities, but also with general entities and a lot of other markup. > Very few people write XML parsers, and they are using the string > methods and the sre module successfully (if the parser is written in > Python - a C parser would do the unescaping before even passing the > text to Python). True, but not all XML text out there is meant for XML parsers to read ;-). Preprocessing of e.g. XML text in Python is a rather common thing to do and this is what the direct codec access methods are meant for. -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From fredrik at pythonware.com Tue Jun 12 12:46:36 2001 From: fredrik at pythonware.com (Fredrik Lundh) Date: Tue, 12 Jun 2001 12:46:36 +0200 Subject: [Python-Dev] Adding .decode() method to Unicode References: <200106120942.f5C9gqT02418@mira.informatik.hu-berlin.de> <3B25EB41.807C2C51@lemburg.com> Message-ID: <00aa01c0f32c$f4a4b740$0900a8c0@spiff> mal wrote: > > Ok. Please show me how that would work. More precisely, please write a > > PEP describing the rationale for this feature, including use case > > examples and precise semantics of the proposed addition. > > There's no need for a PEP. This addition is much too simple > to require a PEP on its own. we'd been better off if you'd written a PEP before you started adding decode and encode stuff. what's currently implemented is ugly enough; adding more warts won't make it any prettier. -1 on anything except a PEP that covers *all* aspects of encode/decode (including things that are already implemented) From fredrik at pythonware.com Tue Jun 12 12:47:49 2001 From: fredrik at pythonware.com (Fredrik Lundh) Date: Tue, 12 Jun 2001 12:47:49 +0200 Subject: [Python-Dev] PEP 259: Omit printing newline after newline References: <004c01c0f307$fb0540c0$4ffa42d5@hagrid> <20010612103330.D690@xs4all.nl> <20010612120203.E690@xs4all.nl> Message-ID: <00ba01c0f32d$208d4160$0900a8c0@spiff> Thomas Wouters wrote: > > print "spam:", "ham" : "and" : "eggs" > > > to print 'spamandeggs' without a space inbetween. > > Err. I meant "hamandeggs" with no space inbetween. and "+" (or plain whitespace) instead of ":", right? From fredrik at pythonware.com Tue Jun 12 12:55:27 2001 From: fredrik at pythonware.com (Fredrik Lundh) Date: Tue, 12 Jun 2001 12:55:27 +0200 Subject: [Python-Dev] RE: PEP 259: Omit printing newline after newline References: <31575A892FF6D1118F5800600846864D78BD25@intrepid> Message-ID: <00c301c0f32e$31cd7ed0$0900a8c0@spiff> simon wrote: > > > In order to avoid having to add yet another magic variable to file > > objects, I propose to give the existing 'softspace' variable an > > extra meaning: a negative value will mean "the last data written > > ended in a newline so no space *or* newline is required." > > Better another magic variable than a magic value for an old one, I think. many file-like C types (e.g. cStringIO) already have special code to deal with a softspace integer attribute. From mal at lemburg.com Tue Jun 12 12:57:32 2001 From: mal at lemburg.com (M.-A. 
Lemburg) Date: Tue, 12 Jun 2001 12:57:32 +0200 Subject: [Python-Dev] Adding .decode() method to Unicode References: <200106120942.f5C9gqT02418@mira.informatik.hu-berlin.de> <3B25EB41.807C2C51@lemburg.com> <00aa01c0f32c$f4a4b740$0900a8c0@spiff> Message-ID: <3B25F59C.9AAF604A@lemburg.com> Fredrik Lundh wrote: > > mal wrote: > > > Ok. Please show me how that would work. More precisely, please write a > > > PEP describing the rationale for this feature, including use case > > > examples and precise semantics of the proposed addition. > > > > There's no need for a PEP. This addition is much too simple > > to require a PEP on its own. > > we'd been better off if you'd written a PEP before you started > adding decode and encode stuff. what's currently implemented > is ugly enough; adding more warts won't make it any prettier. Could you please be more specific about what is "ugly" in the current implementation ? The .encode/.decode methods are a direct interface to the codecs encoder and decoder APIs. I can't find anything ugly about this in general except maybe some of the constraints which were originally put into these interface on the grounds of using them for string/Unicode conversions -- I have already removed most of these and would like to clean this up completely before 2.2 gets out. > -1 on anything except a PEP that covers *all* aspects of > encode/decode (including things that are already implemented) Gee, Guido starts breaking code and nobody objects; I try to clean up some left-overs in the Unicode implementation and people start huge discussions about it. Something is backwards here... -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From martin at loewis.home.cs.tu-berlin.de Tue Jun 12 13:00:40 2001 From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Tue, 12 Jun 2001 13:00:40 +0200 Subject: [Python-Dev] Adding .decode() method to Unicode In-Reply-To: <3B25EB41.807C2C51@lemburg.com> (mal@lemburg.com) References: <200106120942.f5C9gqT02418@mira.informatik.hu-berlin.de> <3B25EB41.807C2C51@lemburg.com> Message-ID: <200106121100.f5CB0em03009@mira.informatik.hu-berlin.de> > > > str.encode() > > > str.decode() > > > uni.encode() > > > #uni.decode() # still missing > > > > It's not missing. str.decode and uni.encode go through a single codec; > > that's easy. str.encode is somewhat more confusing, because it really > > is unicode(str).encode. Now, you are not proposing that uni.decode is > > str(uni).decode, are you? > > No. uni.decode() will (just like the other methods) directly > interface to the codecs decoder -- there is no magic conversion > involved. It is meant to be used by Unicode-Unicode codecs When invoking "Hallo".encode("utf-8"), two conversions are executed: first the default decoding into Unicode, then the UTF-8 encoding. Of course, that is not the intended use (but then, is the intended use documented anywhere?): instead, people should write "Hallo".encode("base64") instead. This is an example I can understand, although I'm not sure why it is inherently better to write this instead of writing base64.encodestring("Hallo"). > > If not that, what else would it mean? And if it means something else, > > it is clearly not symmetric to str.encode, so it is not "missing". > > It is in the sense that strings support this method and Unicode > currently doesn't. 
The rationale for string.encode is weak: it argues that string->string conversions are frequent enough to justify this API, even though these conversions have nothing to do with coded character sets. So far, I can see *no* rationale for unicode.decode. > There's no need for a PEP. This addition is much too simple > to require a PEP on its own. PEP 1 says: # We intend PEPs to be the primary mechanisms for proposing new # features, for collecting community input on an issue, and for # documenting the design decisions that have gone into Python. The # PEP author is responsible for building consensus within the # community and documenting dissenting opinions. So we have a proposal for a new feature, and we have dissenting opinions. Who are you to decide that this additions is too simple to require a PEP on its own? > As for use cases: I have already given a whole bunch of them > (Unicode compression, normalization, escaping in various ways). I was asking for specific examples: Names of specific codecs that you want to implement, and application code fragments using these specific codecs. I don't know how to use Unicode compression if I had such this proposed feature, for example. I know what XML escaping is, and I cannot see how this feature would help. > True, but not all XML text out there is meant for XML parsers to > read ;-). Preprocessing of e.g. XML text in Python is a rather common > thing to do and this is what the direct codec access methods are > meant for. Can you give an example of an application which processes XML without a parser, but with converting character entities (preferably open-source, so I can study its code)? I wonder whether they get CDATA sections right... MAL, I really mean that: Please don't make claims that something is common or useful without giving an *exact* example. Regards, Martin P.S. This insistence on adding Unicode and string methods makes it appear as if the author of the codecs module now thinks that the API of it sucks. From thomas at xs4all.net Tue Jun 12 13:16:05 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Tue, 12 Jun 2001 13:16:05 +0200 Subject: [Python-Dev] PEP 259: Omit printing newline after newline In-Reply-To: <00ba01c0f32d$208d4160$0900a8c0@spiff> References: <004c01c0f307$fb0540c0$4ffa42d5@hagrid> <20010612103330.D690@xs4all.nl> <20010612120203.E690@xs4all.nl> <00ba01c0f32d$208d4160$0900a8c0@spiff> Message-ID: <20010612131605.Q22849@xs4all.nl> On Tue, Jun 12, 2001 at 12:47:49PM +0200, Fredrik Lundh wrote: > Thomas Wouters wrote: > > > print "spam:", "ham" : "and" : "eggs" > > > > > to print 'spamandeggs' without a space inbetween. > > Err. I meant "hamandeggs" with no space inbetween. > and "+" (or plain whitespace) instead of ":", right? Not really. That would only work for string-types. Print auto-converts, remember ? At least the ':' is unambiguous. -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From mal at lemburg.com Tue Jun 12 13:42:31 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Tue, 12 Jun 2001 13:42:31 +0200 Subject: [Python-Dev] Adding .decode() method to Unicode References: <200106120942.f5C9gqT02418@mira.informatik.hu-berlin.de> <3B25EB41.807C2C51@lemburg.com> <200106121100.f5CB0em03009@mira.informatik.hu-berlin.de> Message-ID: <3B260027.7DD33246@lemburg.com> "Martin v. Loewis" wrote: > > > > > str.encode() > > > > str.decode() > > > > uni.encode() > > > > #uni.decode() # still missing > > > > > > It's not missing. 
str.decode and uni.encode go through a single codec; > > > that's easy. str.encode is somewhat more confusing, because it really > > > is unicode(str).encode. Now, you are not proposing that uni.decode is > > > str(uni).decode, are you? > > > > No. uni.decode() will (just like the other methods) directly > > interface to the codecs decoder -- there is no magic conversion > > involved. It is meant to be used by Unicode-Unicode codecs > > When invoking "Hallo".encode("utf-8"), two conversions are executed: > first the default decoding into Unicode, then the UTF-8 encoding. Of > course, that is not the intended use (but then, is the intended use > documented anywhere?): instead, people should write > "Hallo".encode("base64") instead. This is an example I can understand, > although I'm not sure why it is inherently better to write this > instead of writing base64.encodestring("Hallo"). Please note that the conversion from string to Unicode is done by the codec, not the .encode() interface. > > > If not that, what else would it mean? And if it means something else, > > > it is clearly not symmetric to str.encode, so it is not "missing". > > > > It is in the sense that strings support this method and Unicode > > currently doesn't. > > The rationale for string.encode is weak: it argues that string->string > conversions are frequent enough to justify this API, even though these > conversions have nothing to do with coded character sets. You still don't get it: codecs can be used for much more than just character set conversion ! > So far, I can see *no* rationale for unicode.decode. > > > There's no need for a PEP. This addition is much too simple > > to require a PEP on its own. > > PEP 1 says: > > # We intend PEPs to be the primary mechanisms for proposing new > # features, for collecting community input on an issue, and for > # documenting the design decisions that have gone into Python. The > # PEP author is responsible for building consensus within the > # community and documenting dissenting opinions. > > So we have a proposal for a new feature, and we have dissenting > opinions. Who are you to decide that this additions is too simple to > require a PEP on its own? So you want a PEP for each and every small addition to in the core ?! (I am not talking about features which might break code !) > > As for use cases: I have already given a whole bunch of them > > (Unicode compression, normalization, escaping in various ways). > > I was asking for specific examples: Names of specific codecs that you > want to implement, and application code fragments using these specific > codecs. I don't know how to use Unicode compression if I had such this > proposed feature, for example. I know what XML escaping is, and I > cannot see how this feature would help. I think I have given enough examples in this thread already. See below for some more. > > True, but not all XML text out there is meant for XML parsers to > > read ;-). Preprocessing of e.g. XML text in Python is a rather common > > thing to do and this is what the direct codec access methods are > > meant for. > > Can you give an example of an application which processes XML without > a parser, but with converting character entities (preferably > open-source, so I can study its code)? I wonder whether they get CDATA > sections right... MAL, I really mean that: Please don't make claims > that something is common or useful without giving an *exact* example. 
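For concreteness, the string-to-string codecs mentioned above can be
exercised directly on 2.2a-era builds; a minimal sketch, assuming the
codec alias 'base64' resolves to the new base64_codec:

    # 8-bit string -> 8-bit string, through the codec machinery:
    data = "Hello"
    b64 = data.encode("base64")         # 'SGVsbG8=\n'
    assert b64.decode("base64") == data

    # the same transformation without the codec registry:
    import base64
    assert base64.decodestring(b64) == data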
Yes, I am using this feature in real code and no, I can't show it to
you because it's closed source.

XML is only one example where this would be useful, HTML is another
text format which would benefit from it, URL encoding is yet another
application. You basically find these applications in all situations
where some form of escaping is needed.

What I am trying to do here is simplify codec access and usage
for the casual user. .encode() and .decode() are very intuitive
ways to deal with data transformation, IMHO.

> Regards,
> Martin
> 
> P.S. This insistence on adding Unicode and string methods makes it
> appear as if the author of the codecs module now thinks that the API
> of it sucks.

No comment.

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/

From barry at digicool.com  Tue Jun 12 16:22:26 2001
From: barry at digicool.com (Barry A. Warsaw)
Date: Tue, 12 Jun 2001 10:22:26 -0400
Subject: [Python-Dev] Adding .decode() method to Unicode
References: <200106120942.f5C9gqT02418@mira.informatik.hu-berlin.de>
	<3B25EB41.807C2C51@lemburg.com>
Message-ID: <15142.9634.842402.241225@anthem.wooz.org>

>>>>> "M" == M  writes:

    M> Codecs are in no way constrained to only interface between
    M> strings and Unicode. There are many other possibilities for
    M> their usage out there. Just look at the latest checkins for a
    M> bunch of string-string codecs for examples of codecs which
    M> solve common real-life problems and do not interface to
    M> Unicode.

Having just followed this thread tangentially, I do have to say it
seems quite cool to be able to do something like the following in
Python 2.2:

    >>> s = msg['from']
    >>> parts = s.split('?')
    >>> if parts[2].lower() == 'q':
    ...     name = parts[3].decode('quopri')
    ... elif parts[2].lower() == 'b':
    ...     name = parts[3].decode('base64')
    ...

-Barry

From fredrik at pythonware.com  Tue Jun 12 16:45:16 2001
From: fredrik at pythonware.com (Fredrik Lundh)
Date: Tue, 12 Jun 2001 16:45:16 +0200
Subject: [Python-Dev] Adding .decode() method to Unicode
References: <200106120942.f5C9gqT02418@mira.informatik.hu-berlin.de> <3B25EB41.807C2C51@lemburg.com> <15142.9634.842402.241225@anthem.wooz.org>
Message-ID: <004b01c0f34e$4d3e96c0$4ffa42d5@hagrid>

barry wrote:

> Having just followed this thread tangentially, I do have to say it
> seems quite cool to be able to do something like the following in
> Python 2.2:
> 
> >>> s = msg['from']
> >>> parts = s.split('?')
> >>> if parts[2].lower() == 'q':
> ...     name = parts[3].decode('quopri')
> ... elif parts[2].lower() == 'b':
> ...     name = parts[3].decode('base64')

uhuh?  and how exactly is this cooler than being able to do
something like the following:

    import quopri, base64

    s = msg['from']
    parts = s.split('?')
    if parts[2].lower() == 'q':
        name = quopri.decodestring(parts[3])
    elif parts[2].lower() == 'b':
        name = base64.decodestring(parts[3])

(going through the codec registry is slower, and imports more
modules, but what's so cool with that?)

From barry at digicool.com  Tue Jun 12 16:50:01 2001
From: barry at digicool.com (Barry A.
Warsaw) Date: Tue, 12 Jun 2001 10:50:01 -0400 Subject: [Python-Dev] Adding .decode() method to Unicode References: <200106120942.f5C9gqT02418@mira.informatik.hu-berlin.de> <3B25EB41.807C2C51@lemburg.com> <15142.9634.842402.241225@anthem.wooz.org> <004b01c0f34e$4d3e96c0$4ffa42d5@hagrid> Message-ID: <15142.11289.16053.424966@anthem.wooz.org> >>>>> "FL" == Fredrik Lundh writes: FL> uhuh? and how exactly is this cooler than being able to do FL> something like the following: | import quopri, base64 | s = msg['from'] | parts = s.split('?') | if parts[2].lower() == 'q': | name = quopri.decodestring(parts[3]) | elif parts[2].lower() == 'b': | name = base64.decodestring(parts[3]) FL> (going through the codec registry is slower, and imports more FL> modules, but what's so cool with that?) -------------------- snip snip -------------------- Python 2.2a0 (#4, Jun 6 2001, 13:03:36) [GCC egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)] on linux2 Type "copyright", "credits" or "license" for more information. >>> import quopri >>> quopri.decodestring Traceback (most recent call last): File "", line 1, in ? AttributeError: 'quopri' module has no attribute 'decodestring' >>> quopri.encodestring Traceback (most recent call last): File "", line 1, in ? AttributeError: 'quopri' module has no attribute 'encodestring' -------------------- snip snip -------------------- Much cooler :) Okay, okay, so we /could/ add encodestring/decodestring to quopri.py, which isn't a bad idea. But it seems to me that the s.encode() s.decode() API is nicely universal for any supported encoding. but-what-do-i-know?-ly y'rs, -Barry From skip at pobox.com Tue Jun 12 17:32:11 2001 From: skip at pobox.com (Skip Montanaro) Date: Tue, 12 Jun 2001 10:32:11 -0500 Subject: [Python-Dev] Re: metaclasses -- aka Don Beaudry hook/hack In-Reply-To: References: Message-ID: <15142.13819.477491.993419@beluga.mojam.com> James> Before I head too deeply into Zope dependencies, I would be James> interested in knowing whether or not "type(MyClass) == James> types.ClassType" and "isinstance(myInstance,MyClass)" work for James> classes derived from ExtensionClass. Straight from the horse's mouth: >>> type(gtk.GtkButton) >>> type(gtk.GtkButton) == types.ClassType 0 >>> isinstance(gtk.GtkButton(), gtk.GtkButton) 1 James> (And if so, why do these work for C extension classes using the James> Don Beaudry hook but not for Python classes using the same hook?) You'll have to ask someone with more subject knowledge. (Don would probably be a good start. ;-) I've cc'd python-dev because the experts in this area are all there. -- Skip Montanaro (skip at pobox.com) (847)971-7098 From skip at pobox.com Tue Jun 12 17:53:24 2001 From: skip at pobox.com (Skip Montanaro) Date: Tue, 12 Jun 2001 10:53:24 -0500 Subject: [Python-Dev] PEP 259: Omit printing newline after newline In-Reply-To: References: <004c01c0f307$fb0540c0$4ffa42d5@hagrid> Message-ID: <15142.15092.57490.275201@beluga.mojam.com> Tim> The notion that legions of people are using Tim> print line Tim> as an obscure way to get double-spacing is taking me by surprise. Tim> Nobody on the iterators list had this objection. I suspect that most CGI scripts that didn't use any abstraction for HTTP responses suffer from this potential problem. I've been using one abstraction or another for quite awhile now, but I still have a few CGI scripts laying around that still use print to emit headers and bodies of HTTP responses. Skip From barry at digicool.com Tue Jun 12 18:06:53 2001 From: barry at digicool.com (Barry A. 
Warsaw) Date: Tue, 12 Jun 2001 12:06:53 -0400 Subject: [Python-Dev] PEP 259: Omit printing newline after newline References: <004c01c0f307$fb0540c0$4ffa42d5@hagrid> <15142.15092.57490.275201@beluga.mojam.com> Message-ID: <15142.15901.223641.151562@anthem.wooz.org> >>>>> "SM" == Skip Montanaro writes: SM> I suspect that most CGI scripts that didn't use any SM> abstraction for HTTP responses suffer from this potential SM> problem. I've been using one abstraction or another for quite SM> awhile now, but I still have a few CGI scripts laying around SM> that still use print to emit headers and bodies of HTTP SM> responses. Same here. From paulp at ActiveState.com Tue Jun 12 19:22:31 2001 From: paulp at ActiveState.com (Paul Prescod) Date: Tue, 12 Jun 2001 10:22:31 -0700 Subject: [Python-Dev] Adding .decode() method to Unicode References: <200106120942.f5C9gqT02418@mira.informatik.hu-berlin.de> <3B25EB41.807C2C51@lemburg.com> <15142.9634.842402.241225@anthem.wooz.org> Message-ID: <3B264FD7.86ACB034@ActiveState.com> "Barry A. Warsaw" wrote: > >... > > Having just followed this thread tangentially, I do have to say it > seems quite cool to be able to do something like the following in > Python 2.2: > > >>> s = msg['from'] > >>> parts = s.split('?') > >>> if parts[2].lower() == 'q': > ... name = parts[3].decode('quopri') > ... elif parts[2].lower() == 'b': > ... name = parts[3].decode('base64') > ... I think that the central point is that if code like the above is useful and supported then it needs to be the same for Unicode strings as for 8-bit strings. If the code above is NOT useful and should NOT be supported then we need to undo it before 2.2 ships. This unicode.decode argument is just a proxy for the real argument about the above. I don't feel strongly one way or another about this (ab?)use of the codecs concept, myself, but I do feel strongly that Unicode strings should behave as much as possible like 8-bit strings. -- Take a recipe. Leave a recipe. Python Cookbook! http://www.ActiveState.com/pythoncookbook From paulp at ActiveState.com Tue Jun 12 19:31:54 2001 From: paulp at ActiveState.com (Paul Prescod) Date: Tue, 12 Jun 2001 10:31:54 -0700 Subject: [Python-Dev] Adding .decode() method to Unicode References: <200106120942.f5C9gqT02418@mira.informatik.hu-berlin.de><3B25EB41.807C2C51@lemburg.com> <15142.9634.842402.241225@anthem.wooz.org> <004b01c0f34e$4d3e96c0$4ffa42d5@hagrid> Message-ID: <3B26520A.C579D00C@ActiveState.com> Fredrik Lundh wrote: > >... > > uhuh? and how exactly is this cooler than being able to do > something like the following: > > import quopri, base64 >... > > (going through the codec registry is slower, and imports more > modules, but what's so cool with that?) One argument in favor is that the base64 and quopri modules are not standardized today. In fact, Python has a huge problem with standardization of access paradigms in the standard library. We get the best standardization (i.e. of the "file interface") when we force module authors to conform to a standard in order to get some "extra feature" of the standard library. A counter argument is that the conflation of the concept of Unicode encoding/decoding and other forms of encoding/decoding could be confusing. MAL would not have to keep pointing out that "codecs are for more than Unicode encoding/decoding" if it was obvious. -- Take a recipe. Leave a recipe. Python Cookbook! http://www.ActiveState.com/pythoncookbook From barry at digicool.com Tue Jun 12 20:24:25 2001 From: barry at digicool.com (Barry A. 
Warsaw) Date: Tue, 12 Jun 2001 14:24:25 -0400 Subject: [Python-Dev] Adding .decode() method to Unicode References: <200106120942.f5C9gqT02418@mira.informatik.hu-berlin.de> <3B25EB41.807C2C51@lemburg.com> <15142.9634.842402.241225@anthem.wooz.org> <3B264FD7.86ACB034@ActiveState.com> Message-ID: <15142.24153.921774.610559@anthem.wooz.org> >>>>> "PP" == Paul Prescod writes: PP> I don't feel strongly one way or another about this (ab?)use PP> of the codecs concept, myself, but I do feel strongly that PP> Unicode strings should behave as much as possible like 8-bit PP> strings. I'd agree with both statements. time-to-add-{encode,decode}string()-to-quopri-ly y'rs, -Barry From martin at loewis.home.cs.tu-berlin.de Tue Jun 12 20:00:19 2001 From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Tue, 12 Jun 2001 20:00:19 +0200 Subject: [Python-Dev] Adding .decode() method to Unicode In-Reply-To: <3B260027.7DD33246@lemburg.com> (mal@lemburg.com) References: <200106120942.f5C9gqT02418@mira.informatik.hu-berlin.de> <3B25EB41.807C2C51@lemburg.com> <200106121100.f5CB0em03009@mira.informatik.hu-berlin.de> <3B260027.7DD33246@lemburg.com> Message-ID: <200106121800.f5CI0Jw00946@mira.informatik.hu-berlin.de> > > So we have a proposal for a new feature, and we have dissenting > > opinions. Who are you to decide that this additions is too simple to > > require a PEP on its own? > > So you want a PEP for each and every small addition to in the > core ?! (I am not talking about features which might break code !) No, additions that find immediate consent and come with complete patches (including documentation and test cases) don't need this overhead. Features that find resistance should go through the full process. > > I was asking for specific examples: Names of specific codecs that you > > want to implement, and application code fragments using these specific > > codecs. I don't know how to use Unicode compression if I had such this > > proposed feature, for example. I know what XML escaping is, and I > > cannot see how this feature would help. > > I think I have given enough examples in this thread already. See > below for some more. I haven't seen a single example involving actual Python code. > > > True, but not all XML text out there is meant for XML parsers to > > > read ;-). Preprocessing of e.g. XML text in Python is a rather common > > > thing to do and this is what the direct codec access methods are > > > meant for. > > > > Can you give an example of an application [...] > > Yes, I am using these feature in real code and no, I can't show it to > you because it's closed source. Not very convincing... If this is "a rather common thing to do", it shouldn't be hard to find examples in other people's code, shouldn't it? > XML is only one example where this would be useful, HTML is another > text format which would benefit from it, URL encoding is yet another > application. You basically find these applications in all situations > where some form of escaping is needed. These are all not specific examples. I'm still looking for a specific application that might use this feature, and specific codec names and implementations. Regards, Martin From martin at loewis.home.cs.tu-berlin.de Tue Jun 12 20:08:31 2001 From: martin at loewis.home.cs.tu-berlin.de (Martin v. 
Loewis) Date: Tue, 12 Jun 2001 20:08:31 +0200 Subject: [Python-Dev] Adding .decode() method to Unicode In-Reply-To: <15142.9634.842402.241225@anthem.wooz.org> (barry@digicool.com) References: <200106120942.f5C9gqT02418@mira.informatik.hu-berlin.de> <3B25EB41.807C2C51@lemburg.com> <15142.9634.842402.241225@anthem.wooz.org> Message-ID: <200106121808.f5CI8VO00950@mira.informatik.hu-berlin.de> > Having just followed this thread tangentially, I do have to say it > seems quite cool to be able to do something like the following in > Python 2.2: > > >>> s = msg['from'] > >>> parts = s.split('?') > >>> if parts[2].lower() == 'q': > ... name = parts[3].decode('quopri') > ... elif parts[2].lower() == 'b': > ... name = parts[3].decode('base64') > ... What is the type of parts[3] here? If it is a plain string, it is already possible: >>> 'SGVsbG8=\n'.decode("base64") 'Hello' I doubt you'd ever have a Unicode string that represents a base64-encoded byte string, and if you had, .decode would probably do the wrong thing: >>> import codecs >>> enc,dec,_,_ = codecs.lookup("base64") >>> dec(u'SGVsbG8=\n') ('Hello', 9) Note that this returns a byte string, not a Unicode string. Regards, Martin From martin at loewis.home.cs.tu-berlin.de Tue Jun 12 20:18:45 2001 From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Tue, 12 Jun 2001 20:18:45 +0200 Subject: [Python-Dev] Adding .decode() method to Unicode In-Reply-To: <3B264FD7.86ACB034@ActiveState.com> (message from Paul Prescod on Tue, 12 Jun 2001 10:22:31 -0700) References: <200106120942.f5C9gqT02418@mira.informatik.hu-berlin.de> <3B25EB41.807C2C51@lemburg.com> <15142.9634.842402.241225@anthem.wooz.org> <3B264FD7.86ACB034@ActiveState.com> Message-ID: <200106121818.f5CIIjj00983@mira.informatik.hu-berlin.de> > > Having just followed this thread tangentially, I do have to say it > > seems quite cool to be able to do something like the following in > > Python 2.2: > > > > >>> s = msg['from'] > > >>> parts = s.split('?') > > >>> if parts[2].lower() == 'q': > > ... name = parts[3].decode('quopri') > > ... elif parts[2].lower() == 'b': > > ... name = parts[3].decode('base64') > > ... > > I think that the central point is that if code like the above is useful > and supported then it needs to be the same for Unicode strings as for > 8-bit strings. Why is that? An encoding, by nature, is something that produces a byte sequence from some input. So you can only decode byte sequences, not character strings. > If the code above is NOT useful and should NOT be supported then we > need to undo it before 2.2 ships. This unicode.decode argument is > just a proxy for the real argument about the above. No, it isn't. The code is useful for byte strings, but not for Unicode strings. > I don't feel strongly one way or another about this (ab?)use of the > codecs concept, myself, but I do feel strongly that Unicode strings > should behave as much as possible like 8-bit strings. Not at all. Byte strings and character strings are as different as are byte strings and lists of DOM child nodes (i.e. the only common thing is that they are sequences). Regards, Martin From barry at digicool.com Tue Jun 12 20:35:10 2001 From: barry at digicool.com (Barry A. 
Warsaw) Date: Tue, 12 Jun 2001 14:35:10 -0400 Subject: [Python-Dev] Adding .decode() method to Unicode References: <200106120942.f5C9gqT02418@mira.informatik.hu-berlin.de> <3B25EB41.807C2C51@lemburg.com> <15142.9634.842402.241225@anthem.wooz.org> <200106121808.f5CI8VO00950@mira.informatik.hu-berlin.de> Message-ID: <15142.24798.941322.762791@anthem.wooz.org> >>>>> "MvL" == Martin v Loewis writes: MvL> What is the type of parts[3] here? If it is a plain string, MvL> it is already possible: >> 'SGVsbG8=\n'.decode("base64") MvL> 'Hello' But only in Python 2.2a0 currently, right? And yes, the type is plain string. MvL> I doubt you'd ever have a Unicode string that represents a MvL> base64-encoded byte string, and if you had, .decode would MvL> probably do the wrong thing: >> import codecs enc,dec,_,_ = codecs.lookup("base64") >> dec(u'SGVsbG8=\n') MvL> ('Hello', 9) MvL> Note that this returns a byte string, not a Unicode string. I trust you on that. ;) I've only played with this tangentially since this thread cropped up. -Barry From paulp at ActiveState.com Tue Jun 12 20:51:25 2001 From: paulp at ActiveState.com (Paul Prescod) Date: Tue, 12 Jun 2001 11:51:25 -0700 Subject: [Python-Dev] Adding .decode() method to Unicode References: <200106120942.f5C9gqT02418@mira.informatik.hu-berlin.de> <3B25EB41.807C2C51@lemburg.com> <15142.9634.842402.241225@anthem.wooz.org> <3B264FD7.86ACB034@ActiveState.com> <200106121818.f5CIIjj00983@mira.informatik.hu-berlin.de> Message-ID: <3B2664AD.B560D685@ActiveState.com> "Martin v. Loewis" wrote: > >... > > Why is that? An encoding, by nature, is something that produces a byte > sequence from some input. So you can only decode byte sequences, not > character strings. According to this logic, it is not logical to "encode" a Unicode string into a base64'd Unicode string or "decode" a Unicode string from a base64'd Unicode string. But I have seen circumstances where one XML document is base64'd into another. In that circumstance, it would be useful to say node.nodeValue.decode("base64"). Let me turn the argument around? What would the *harm* in having 8-bit strings and Unicode strings behave similarly in this manner? >... > Not at all. Byte strings and character strings are as different as are > byte strings and lists of DOM child nodes (i.e. the only common thing > is that they are sequences). 8-bit strings are not purely byte strings. They are also "character strings". That's why they have methods like "capitalize", "isalpha", "lower", "swapcase", "title" and so forth. DOM nodes and byte strings have virtually no methods in common. We could argue angels on the head of a pin until the cows come home but 90% of all Python users think of 8-bit strings as strings of characters. So arguments based on the idea that they are not "really" character strings are wishful thinking. -- Take a recipe. Leave a recipe. Python Cookbook! http://www.ActiveState.com/pythoncookbook From martin at loewis.home.cs.tu-berlin.de Tue Jun 12 22:01:39 2001 From: martin at loewis.home.cs.tu-berlin.de (Martin v. 
Loewis) Date: Tue, 12 Jun 2001 22:01:39 +0200 Subject: [Python-Dev] Adding .decode() method to Unicode In-Reply-To: <15142.24798.941322.762791@anthem.wooz.org> (barry@digicool.com) References: <200106120942.f5C9gqT02418@mira.informatik.hu-berlin.de> <3B25EB41.807C2C51@lemburg.com> <15142.9634.842402.241225@anthem.wooz.org> <200106121808.f5CI8VO00950@mira.informatik.hu-berlin.de> <15142.24798.941322.762791@anthem.wooz.org> Message-ID: <200106122001.f5CK1de01350@mira.informatik.hu-berlin.de> > MvL> What is the type of parts[3] here? If it is a plain string, > MvL> it is already possible: > > >> 'SGVsbG8=\n'.decode("base64") > MvL> 'Hello' > > But only in Python 2.2a0 currently, right? Exactly, since MAL's last patch. If people think that byte strings must behave exactly as Unicode strings, I'd rather prefer to back out this patch instead of adding unicode.decode. Personally, I think the status quo is fine and should not be changed. Regards, Martin From aahz at rahul.net Wed Jun 13 01:48:14 2001 From: aahz at rahul.net (Aahz Maruch) Date: Tue, 12 Jun 2001 16:48:14 -0700 (PDT) Subject: [Python-Dev] Adding Asian codecs to the core In-Reply-To: <3B25C62C.969B40B3@lemburg.com> from "M.-A. Lemburg" at Jun 12, 2001 09:35:08 AM Message-ID: <20010612234815.2C90599C82@waltz.rahul.net> M.-A. Lemburg wrote: > Aahz Maruch wrote: >> M.-A. Lemburg wrote: >>> >>> Tamito KAJIYAMA recently announced that he changed the licenses >>> on his Japanese codecs from GPL to a BSD variant. This is great >>> news since this would allow adding the codecs to the Python core >>> which would certainly attract more users to Python in Asia. >>> >>> The codecs are 280kB when compressed as .tar.gz file. >> >> +0 >> >> I like the idea, am uncomfortable with that amount of space. > > Tamito corrected me about the size (his file includes the .pyc > byte code files): the correct size for the sources is 143kB -- > almost half of what I initially wrote. That makes me +0.5, possibly a bit higher. -- --- Aahz (@pobox.com) Hugs and backrubs -- I break Rule 6 <*> http://www.rahul.net/aahz/ Androgynous poly kinky vanilla queer het Pythonista I don't really mind a person having the last whine, but I do mind someone else having the last self-righteous whine. From greg at cosc.canterbury.ac.nz Wed Jun 13 01:57:35 2001 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Wed, 13 Jun 2001 11:57:35 +1200 (NZST) Subject: [Python-Dev] PEP 259: Omit printing newline after newline In-Reply-To: <20010612103330.D690@xs4all.nl> Message-ID: <200106122357.LAA03316@s454.cosc.canterbury.ac.nz> Thomas Wouters : > I'd also prefer special syntax to control the softspace > behaviour... Too late for that, I 'spose Maybe not. I'd suggest spelling "don't add a newline or a space after this" as: print a, b, c... This could coexist with the current softspace behaviour, and the use of a trailing comma could be deprecated. After a suitable warning period, the softspace flag could then be removed. > print "spam:", "ham" : "and" : "eggs" > to print 'spamandeggs' without a space inbetween. I don't think it's so important to have a special syntax for that, since it can be accomplished in other ways without too much difficulty, e.g. print "%s: %s%s%s" % ("spam", "ham", "and", "eggs")... The main thing I'd like is to get rid of the statefulness of the current behaviour. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. 
| greg at cosc.canterbury.ac.nz +--------------------------------------+ From greg at cosc.canterbury.ac.nz Wed Jun 13 02:02:40 2001 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Wed, 13 Jun 2001 12:02:40 +1200 (NZST) Subject: [Python-Dev] Adding .decode() method to Unicode In-Reply-To: <00aa01c0f32c$f4a4b740$0900a8c0@spiff> Message-ID: <200106130002.MAA03319@s454.cosc.canterbury.ac.nz> > -1 on anything except a PEP that covers *all* aspects of > encode/decode (including things that are already implemented) Particularly, it should clearly explain why we need a completely new and separate namespace mechanism for these codec things, and provide a firm rationale for deciding whether any proposed new form of encoding or decoding should be placed in this namespace or the module namespace. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg at cosc.canterbury.ac.nz +--------------------------------------+ From paulp at ActiveState.com Wed Jun 13 02:32:17 2001 From: paulp at ActiveState.com (Paul Prescod) Date: Tue, 12 Jun 2001 17:32:17 -0700 Subject: [Python-Dev] Adding Asian codecs to the core References: <20010612234815.2C90599C82@waltz.rahul.net> Message-ID: <3B26B491.CA8536BD@ActiveState.com> Aahz Maruch wrote: > >.... > > > > Tamito corrected me about the size (his file includes the .pyc > > byte code files): the correct size for the sources is 143kB -- > > almost half of what I initially wrote. > > That makes me +0.5, possibly a bit higher. We really shouldn't consider the Japanese without Chinese and Korean. And those both seem *larger* than the Japanese. :( What if we add them to CVS and formally maintain them as part of the core but distribute them as a separate download? -- Take a recipe. Leave a recipe. Python Cookbook! http://www.ActiveState.com/pythoncookbook From paulp at ActiveState.com Wed Jun 13 04:25:23 2001 From: paulp at ActiveState.com (Paul Prescod) Date: Tue, 12 Jun 2001 19:25:23 -0700 Subject: [Python-Dev] Pure Python strptime Message-ID: <3B26CF13.2A337AC6@ActiveState.com> Should this strptime implementation be added to the standard library? http://aspn.activestate.com/ASPN/Python/Cookbook/Recipe/56036 -- Take a recipe. Leave a recipe. Python Cookbook! http://www.ActiveState.com/pythoncookbook From paulp at ActiveState.com Wed Jun 13 04:41:53 2001 From: paulp at ActiveState.com (Paul Prescod) Date: Tue, 12 Jun 2001 19:41:53 -0700 Subject: [Python-Dev] Adding .decode() method to Unicode References: <200106130002.MAA03319@s454.cosc.canterbury.ac.nz> Message-ID: <3B26D2F1.8840FB1A@ActiveState.com> Greg Ewing wrote: > > > -1 on anything except a PEP that covers *all* aspects of > > encode/decode (including things that are already implemented) > > Particularly, it should clearly explain why we need a > completely new and separate namespace mechanism for these > codec things, I don't know whether MAL will write the PEP or not but the rationale for a new namespace is trivial. The namespace exists and is maintained by the Internet Assigned Names Association. You can't work with Unicode without working with names from this list: http://www.iana.org/assignments/character-sets MAL is basically exending it to include names from this list: http://www.iana.org/assignments/transfer-encodings and others. 
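(For concreteness, here is the existing registry machinery resolving one
of those IANA names -- a minimal interactive sketch against the Python 2.1
codecs API, nothing new:)

>>> import codecs
>>> enc, dec, reader, writer = codecs.lookup('iso-8859-1')
>>> enc(u'caf\xe9')    # encoder returns (8-bit string, chars consumed)
('caf\xe9', 4)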
> and provide a firm rationale for deciding > whether any proposed new form of encoding or decoding > should be placed in this namespace or the module namespace. *My* answer would be that any function that has strings (8-bit or Unicode) as both domain and range is potentially a codec. -- Take a recipe. Leave a recipe. Python Cookbook! http://www.ActiveState.com/pythoncookbook From greg at cosc.canterbury.ac.nz Wed Jun 13 06:45:36 2001 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Wed, 13 Jun 2001 16:45:36 +1200 (NZST) Subject: [Python-Dev] Adding .decode() method to Unicode In-Reply-To: <3B26D2F1.8840FB1A@ActiveState.com> Message-ID: <200106130445.QAA03370@s454.cosc.canterbury.ac.nz> Paul Prescod : > The namespace exists and is maintained by > the Internet Assigned Names Association. Hmmm... so, is the only reason that we're not using the module namespace the fact that these names can contain non-alphanumeric characters? Or is there more to it than that? Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg at cosc.canterbury.ac.nz +--------------------------------------+ From skip at pobox.com Wed Jun 13 07:09:38 2001 From: skip at pobox.com (Skip Montanaro) Date: Wed, 13 Jun 2001 00:09:38 -0500 Subject: [Python-Dev] Adding Asian codecs to the core In-Reply-To: <3B26B491.CA8536BD@ActiveState.com> References: <20010612234815.2C90599C82@waltz.rahul.net> <3B26B491.CA8536BD@ActiveState.com> Message-ID: <15142.62866.180570.158325@beluga.mojam.com> Paul> What if we add them to CVS and formally maintain them as part of Paul> the core but distribute them as a separate download? That seems to make sense to me. I suspect most Linux distributions (for example) bundle Python into multiple pieces already. My Mandrake system splits the core into (I think) four pieces. It also bundles several other RPMs for PIL, NumPy, Postgres and RPM. Adding another package for a set of codecs doesn't seem like a big deal. Skip From mal at lemburg.com Wed Jun 13 09:02:05 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Wed, 13 Jun 2001 09:02:05 +0200 Subject: [Python-Dev] Adding Asian codecs to the core References: <20010612234815.2C90599C82@waltz.rahul.net> Message-ID: <3B270FED.8E2A4ECB@lemburg.com> Aahz Maruch wrote: > > M.-A. Lemburg wrote: > > Aahz Maruch wrote: > >> M.-A. Lemburg wrote: > >>> > >>> Tamito KAJIYAMA recently announced that he changed the licenses > >>> on his Japanese codecs from GPL to a BSD variant. This is great > >>> news since this would allow adding the codecs to the Python core > >>> which would certainly attract more users to Python in Asia. > >>> > >>> The codecs are 280kB when compressed as .tar.gz file. > >> > >> +0 > >> > >> I like the idea, am uncomfortable with that amount of space. > > > > Tamito corrected me about the size (his file includes the .pyc > > byte code files): the correct size for the sources is 143kB -- > > almost half of what I initially wrote. > > That makes me +0.5, possibly a bit higher. We will be working on reducing the size of the mapping tables. Can't promise anything, but I believe that Tamito can squeeze them into under 100k using some compression technique (which one is yet to be determined ;). 
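(Purely by way of illustration -- one way a mapping table might be
squeezed, sketched here with zlib over a marshalled dict; whether Tamito's
codecs will use anything like this is still an open question:)

    import marshal, zlib

    # Tiny stand-in for a real charset mapping (source code point -> Unicode).
    table = {0x82A0: 0x3042, 0x82A2: 0x3044, 0x82A4: 0x3046}

    packed = zlib.compress(marshal.dumps(table))       # ship this with the codec
    unpacked = marshal.loads(zlib.decompress(packed))  # rebuild at import time
    assert unpacked == table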
-- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From mal at lemburg.com Wed Jun 13 09:05:31 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Wed, 13 Jun 2001 09:05:31 +0200 Subject: [Python-Dev] Adding Asian codecs to the core References: <20010612234815.2C90599C82@waltz.rahul.net> <3B26B491.CA8536BD@ActiveState.com> Message-ID: <3B2710BB.CFD8215@lemburg.com> Paul Prescod wrote: > > Aahz Maruch wrote: > > > >.... > > > > > > Tamito corrected me about the size (his file includes the .pyc > > > byte code files): the correct size for the sources is 143kB -- > > > almost half of what I initially wrote. > > > > That makes me +0.5, possibly a bit higher. > > We really shouldn't consider the Japanese without Chinese and Korean. > And those both seem *larger* than the Japanese. :( Unfortunately, these aren't available under a usable (=non-GPL) license yet. > What if we add them to CVS and formally maintain them as part of the > core but distribute them as a separate download? Good idea. -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From mal at lemburg.com Wed Jun 13 09:17:14 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Wed, 13 Jun 2001 09:17:14 +0200 Subject: [Python-Dev] Adding .decode() method to Unicode References: <200106130002.MAA03319@s454.cosc.canterbury.ac.nz> <3B26D2F1.8840FB1A@ActiveState.com> Message-ID: <3B27137A.E7BFC4EC@lemburg.com> Paul Prescod wrote: > > Greg Ewing wrote: > > > > > -1 on anything except a PEP that covers *all* aspects of > > > encode/decode (including things that are already implemented) > > > > Particularly, it should clearly explain why we need a > > completely new and separate namespace mechanism for these > > codec things, > > I don't know whether MAL will write the PEP or not With the kind of attitude towards the proposed extensions which I am currently getting in this forum, I'd rather spend my time on something more useful. > but the rationale for > a new namespace is trivial. The namespace exists and is maintained by > the Internet Assigned Names Association. You can't work with Unicode > without working with names from this list: > > http://www.iana.org/assignments/character-sets > > MAL is basically exending it to include names from this list: > > http://www.iana.org/assignments/transfer-encodings > > and others. Right. Since these codecs live in the encoding package, I don't think we have a namespace problem here. Codecs which are hooked into the codec registry by the encoding package's search function will have to provide a getregentry() entry point. If this API is not available, the codec won't load. Since the encoding package's search function is using standard Python imports for loading the codecs, we can also benefit from a nice side-effect: codec names can use Python's dotted names (which then map to standard Python packages). This allows codec writers like Tamito to place their codecs into Python package thereby avoiding any conflict with other authors of codecs with similar names. > > and provide a firm rationale for deciding > > whether any proposed new form of encoding or decoding > > should be placed in this namespace or the module namespace. 
> > *My* answer would be that any function that has strings (8-bit or > Unicode) as both domain and range is potentially a codec. Right. (Hey, the first time *we* agree on something ;-) -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From mal at lemburg.com Wed Jun 13 14:53:50 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Wed, 13 Jun 2001 14:53:50 +0200 Subject: [Python-Dev] Weird message to stderr Message-ID: <3B27625E.F18046F7@lemburg.com> Running Python 2.1 using a .pyc file I get these weird messages printed to stderr: run_pyc_file: nested_scopes: 0 These originate in pythonrun.c: static PyObject * run_pyc_file(FILE *fp, char *filename, PyObject *globals, PyObject *locals, PyCompilerFlags *flags) { PyCodeObject *co; PyObject *v; long magic; long PyImport_GetMagicNumber(void); magic = PyMarshal_ReadLongFromFile(fp); if (magic != PyImport_GetMagicNumber()) { PyErr_SetString(PyExc_RuntimeError, "Bad magic number in .pyc file"); return NULL; } (void) PyMarshal_ReadLongFromFile(fp); v = PyMarshal_ReadLastObjectFromFile(fp); fclose(fp); if (v == NULL || !PyCode_Check(v)) { Py_XDECREF(v); PyErr_SetString(PyExc_RuntimeError, "Bad code object in .pyc file"); return NULL; } co = (PyCodeObject *)v; v = PyEval_EvalCode(co, globals, locals); if (v && flags) { if (co->co_flags & CO_NESTED) flags->cf_nested_scopes = 1; fprintf(stderr, "run_pyc_file: nested_scopes: %d\n", flags->cf_nested_scopes); } Py_DECREF(co); return v; } Is this is left over debug printf or should I be warned in some way ? -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From guido at digicool.com Wed Jun 13 16:41:37 2001 From: guido at digicool.com (Guido van Rossum) Date: Wed, 13 Jun 2001 10:41:37 -0400 Subject: [Python-Dev] Re: Adding .decode() method to Unicode In-Reply-To: Your message of "Tue, 12 Jun 2001 22:40:01 EDT." References: Message-ID: <200106131441.KAA16557@cj20424-a.reston1.va.home.com> Wow, this almost looks like a real flamefest. ("Flame" being defined as the presence of metacomments.) (In the following, s is an 8-bit string, u is a Unicode string, and e is an encoding name.) The original design of the encode() methods of string and Unicode objects (in 2.0 and 2.1) is asymmetric, and clearly geared towards Unicode codecs only: to decode an 8-bit string you *have* to use unicode(s, encoding) while to encode a Unicode string into a specific 8-bit encoding you *have* to use u.encode(e). 8-bit strings also have an encode() method: s.encode(e) is the same as unicode(s).encode(e). (This is useful since code that expects Unicode strings should also work when it is passed ASCII-encoded 8-bit strings.) I'd say there's no need for s.decode(e), since this can already be done with unicode(s, e) -- and to me that API looks better since it clearly states that the result is Unicode. We *could* have designed the encoding API similarly: str(u, e) is available, symmetric with unicode(s, e), and a logical extension of str(u) which uses the default encoding. But I accept the argument that u.encode(e) is better because it emphasizes the encoding action, and because it means no API changes to str(). 
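(To make the asymmetry concrete, a minimal interactive sketch of the API
described above; the Latin-1/UTF-8 byte values are only illustrative:)

>>> unicode('caf\xe9', 'latin-1')   # decode: 8-bit string -> Unicode
u'caf\xe9'
>>> u'caf\xe9'.encode('utf-8')      # encode: Unicode -> 8-bit string
'caf\xc3\xa9'
>>> 'abc'.encode('utf-8')           # same as unicode('abc').encode('utf-8')
'abc'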
I guess what I'm saying here is that 'str' does not give enough of a clue that an encoding action is going on, while 'unicode' *does* give a clue that a decoding action is being done: as soon as you read "Unicode" you think "Mmm, encodings..." -- but "str" is pretty neutral, so u.encode(e) is needed to give a clue. Marc-Andre proposes (and has partially checked in) changes that stretch the meaning of the encode() method, and add a decode() method, to be basically interfaces to anything you can do with the codecs module. The return type of encode() and decode() is now determined by the codec (formerly, encode() always returned an 8-bit string). Some new codecs have been added that do things like gzip and base64. Initially, I liked this, and even contributed a codec. But questions keep coming up. What is the problem being solved? True, the codecs module has a clumsy interface if you just want to invoke a codec on some data. But that can easily be remedied by adding convenience functions encode() and decode() to codecs.py -- which would have the added advantage that it would work for other datatypes that support the buffer interface, e.g. codecs.encode(myPILobject, "base64"). True, the "codec" pattern can be used for other encodings than Unicode. But it seems to me that the entire codecs architecture is rather strongly geared towards en/decoding Unicode, and it's not clear how well other codecs fit in this pattern (e.g. I noticed that all the non-Unicode codecs ignore the error handling parameter or assert that it is set to 'strict'). Is it really right that x.encode("gzip") and x.encode("utf-8") look similar, while the former requires an 8-bit string and the latter only makes sense if x is a Unicode string? Another (minor) issue is that Unicode encoding names are an IANA namespace. Is it wise to add our own names to this? I'm not forcing a decision here, but I do ask that we consider these issues before forging ahead with what might be a mistake. A PEP would be most helpful to focus the discussion. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at digicool.com Wed Jun 13 17:19:03 2001 From: guido at digicool.com (Guido van Rossum) Date: Wed, 13 Jun 2001 11:19:03 -0400 Subject: [Python-Dev] Releasing 2.0.1 Message-ID: <200106131519.LAA16978@cj20424-a.reston1.va.home.com> I think it's now or never with the 2.0.1 release. Moshe seems to have disappeared from the face of the earth. His last mail to me (May 23) suggested that it was good to go except for the SRE checkin and the NEWS file. I did the SRE checkin today (making it identical to what's in 2.1, per /F's recommendation) and added a note about that to the NEWS file -- I wouldn't know what else would be needed there. So I think it's good to go now. I can release a 2.0.1c1 this week (indicating a release candidate) and a final 2.0.1 next week. If you know a good reason why I should hold off on releasing this, or if you have a patch that absolutely should make it into 2.0.1, please let me know NOW! This project is way overdue. (Thomas is ready to release 2.1.1 as soon as this goes out, I believe. :-) --Guido van Rossum (home page: http://www.python.org/~guido/) From fredrik at pythonware.com Wed Jun 13 17:29:19 2001 From: fredrik at pythonware.com (Fredrik Lundh) Date: Wed, 13 Jun 2001 17:29:19 +0200 Subject: [Python-Dev] Releasing 2.0.1 References: <200106131519.LAA16978@cj20424-a.reston1.va.home.com> Message-ID: <023f01c0f41d$9dfb87b0$0900a8c0@spiff> guido wrote: > So I think it's good to go now. 
I can release a 2.0.1c1 this week > (indicating a release candidate) and a final 2.0.1 next week. +1 From skip at pobox.com Wed Jun 13 17:49:58 2001 From: skip at pobox.com (Skip Montanaro) Date: Wed, 13 Jun 2001 10:49:58 -0500 Subject: [Python-Dev] on announcing point releases Message-ID: <15143.35750.837420.376281@beluga.mojam.com> (Just thinking out loud) I wonder if it would help gain wider distribution for the point releases if explicit announcements were sent to the various Linux distributors so they could create updated packages (RPMs, debs, whatever) for their users. On a related note, I see one RedHat email address on python-dev (and one Debian address on python-list). Are there other Linux distributions that are heavy Python users (as opposed to simply packaging it up for inclusion)? If so, perhaps they should be invited to join python-dev. Skip From niemeyer at conectiva.com Wed Jun 13 17:54:08 2001 From: niemeyer at conectiva.com (Gustavo Niemeyer) Date: Wed, 13 Jun 2001 12:54:08 -0300 Subject: [Python-Dev] sre improvements Message-ID: <20010613125408.W13940@tux.distro.conectiva> I'm forwarding this to the dev list.. probably somebody here knows about this... -------------- Hi there!! I have looked into sre, and was wondering if somebody is working to implement more features in it. I'd like, for example, to see the (?(1)blah) operator, available in perl, working. Should I care about this? Should I write some code?? Anybody working in sre currently? Thanks! -- Gustavo Niemeyer [ 2AAC 7928 0FBF 0299 5EB5 60E2 2253 B29A 6664 3A0C ] From skip at pobox.com Wed Jun 13 18:03:58 2001 From: skip at pobox.com (Skip Montanaro) Date: Wed, 13 Jun 2001 11:03:58 -0500 Subject: [Python-Dev] sre improvements In-Reply-To: <20010613125408.W13940@tux.distro.conectiva> References: <20010613125408.W13940@tux.distro.conectiva> Message-ID: <15143.36590.447465.657241@beluga.mojam.com> Gustavo> I'd like, for example, to see the (?(1)blah) operator, Gustavo> available in perl, working. Gustavo, For the non-Perl-heads on the list, can you explain what the (?(1)blah) operator does? -- Skip Montanaro (skip at pobox.com) (847)971-7098 From gregor at mediasupervision.de Wed Jun 13 18:13:17 2001 From: gregor at mediasupervision.de (Gregor Hoffleit) Date: Wed, 13 Jun 2001 18:13:17 +0200 Subject: [Python-Dev] on announcing point releases In-Reply-To: <15143.35750.837420.376281@beluga.mojam.com>; from skip@pobox.com on Wed, Jun 13, 2001 at 10:49:58AM -0500 References: <15143.35750.837420.376281@beluga.mojam.com> Message-ID: <20010613181317.B30006@mediasupervision.de> On Wed, Jun 13, 2001 at 10:49:58AM -0500, Skip Montanaro wrote: > I wonder if it would help gain wider distribution for the point releases if > explicit announcements were sent to the various Linux distributors so they > could create updated packages (RPMs, debs, whatever) for their users. > > On a related note, I see one RedHat email address on python-dev (and one > Debian address on python-list). Are there other Linux distributions that > are heavy Python users (as opposed to simply packaging it up for inclusion)? > If so, perhaps they should be invited to join python-dev. Rest assured that Debian is present on python-dev as well, and nervously looking forward to the maintenance releases ;-) I hope 2.1.1 will make it out in time as well for our next release (being aware that 'before the next Debian release happens' is no very tight timeframe ;-). 
Gregor From guido at digicool.com Wed Jun 13 18:16:42 2001 From: guido at digicool.com (Guido van Rossum) Date: Wed, 13 Jun 2001 12:16:42 -0400 Subject: [Python-Dev] Re: PEP 259: Omit printing newline after newline Message-ID: <200106131616.MAA17468@cj20424-a.reston1.va.home.com> OK, OK, PEP 259 is dead. It seemed a nice idea at the time. :-) Alex and others, if you're serious about implementing print as __print__(), why don't you write a PEP? --Guido van Rossum (home page: http://www.python.org/~guido/) From fdrake at acm.org Wed Jun 13 18:21:20 2001 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Wed, 13 Jun 2001 12:21:20 -0400 (EDT) Subject: [Python-Dev] on announcing point releases In-Reply-To: <20010613181317.B30006@mediasupervision.de> References: <15143.35750.837420.376281@beluga.mojam.com> <20010613181317.B30006@mediasupervision.de> Message-ID: <15143.37632.758887.966026@cj42289-a.reston1.va.home.com> Gregor Hoffleit writes: > looking forward to the maintenance releases ;-) I hope 2.1.1 will make it > out in time as well for our next release (being aware that 'before the next Personally, I see no reason for Thomas to wait for the 2.0.1 release if he doesn't want to. -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From fredrik at pythonware.com Wed Jun 13 18:32:13 2001 From: fredrik at pythonware.com (Fredrik Lundh) Date: Wed, 13 Jun 2001 18:32:13 +0200 Subject: [Python-Dev] sre improvements References: <20010613125408.W13940@tux.distro.conectiva> <15143.36590.447465.657241@beluga.mojam.com> Message-ID: <007801c0f426$84d1f220$4ffa42d5@hagrid> skip wrote: > Gustavo> I'd like, for example, to see the (?(1)blah) operator, > Gustavo> available in perl, working. > > For the non-Perl-heads on the list, can you explain what the (?(1)blah) > operator does? conditionals: (?(cond)true) (?(cond)true|false) where cond is a group number (true if defined) or an assertion pattern, and true/false are patterns. (imo, whoever invented that needs help ;-) From akuchlin at mems-exchange.org Wed Jun 13 18:39:58 2001 From: akuchlin at mems-exchange.org (Andrew Kuchling) Date: Wed, 13 Jun 2001 12:39:58 -0400 Subject: [Python-Dev] sre improvements Message-ID: >For the non-Perl-heads on the list, can you explain what the (?(1)blah) >operator does? Conditionals. From http://www.perl.com/pub/doc/manual/html/pod/perlre.html, (...)(?(1)A|B) will match 'A' if group 1 matched, and B if it didn't. I'm not sure how "matched" is defined, as the Perl docs are vague; judging from the example, it means 'matched something of nonzero length'. Perl 5.6 introduced a bunch of new regex features, but I'm not sure how much we actually *care* about them; they're no doubt useful if regexes are the only tool you've got and you try to do full parsers using them, but they're also complicated to explain and will make the compiler messier. For example, lookaheads can also go into the conditional, not just an integer. (?i) now obeys the scoping from parens, and you can turn it off with (?-i). If Gustavo wants to implement these features and /F approves of his patches, then sure, put them in. But if either of those conditions fails, little will be lost. --amk From dmitry.antipov at auriga.ru Wed Jun 13 18:46:09 2001 From: dmitry.antipov at auriga.ru (dmitry.antipov at auriga.ru) Date: Wed, 13 Jun 2001 20:46:09 +0400 Subject: [Python-Dev] Why not Lisp-like list-related functions ? Message-ID: <3B2798D1.16F832A3@auriga.ru> Hello all, I'm new to Python but quite familiar with Lisp. 
So my question is about Python list-related functions. Why append(), extend(), sort(), reverse() etc. doesn't return a reference to it's own (modified) argument ? IMHO (I'm tweaking Python 2.1 to allow first example possible), >>> [3 + x * 2 for x in [5, 8, 9, 3].sort()].extend([6, 3, 8].reverse()) [9, 13, 19, 21, 8, 3, 6] >>> looks much better (and more "functional") than >>> x = [5, 8, 9, 3] >>> x.sort() >>> x = [3 + x * 2 for x in x] >>> y = [6, 3, 8] >>> y.reverse() >>> x.extend(y) >>> x [9, 13, 19, 21, 8, 3, 6] >>> Python designers and fans, please explain it to me :-). Any comments are welcome. Thanks and reply to me directly if possible, Dmitry Antipov From guido at digicool.com Wed Jun 13 19:01:34 2001 From: guido at digicool.com (Guido van Rossum) Date: Wed, 13 Jun 2001 13:01:34 -0400 Subject: [Python-Dev] Weird message to stderr Message-ID: <200106131701.NAA17619@cj20424-a.reston1.va.home.com> > Running Python 2.1 using a .pyc file I get these weird messages > printed to stderr: > > run_pyc_file: nested_scopes: 0 > > These originate in pythonrun.c: > > static PyObject * > run_pyc_file(FILE *fp, char *filename, PyObject *globals, PyObject *locals, > PyCompilerFlags *flags) > { [...] > if (v && flags) { > if (co->co_flags & CO_NESTED) > flags->cf_nested_scopes = 1; > fprintf(stderr, "run_pyc_file: nested_scopes: %d\n", > flags->cf_nested_scopes); > } > Py_DECREF(co); > return v; > } > > Is this is left over debug printf or should I be warned > in some way ? I'll channel Jeremy... Looks like a debug message -- this code isn't tested by the standard test suite. Feel free to get rid of the fprintf() statement (and no, you don't have to write a PEP for this :-). --Guido van Rossum (home page: http://www.python.org/~guido/) From fredrik at pythonware.com Wed Jun 13 19:06:52 2001 From: fredrik at pythonware.com (Fredrik Lundh) Date: Wed, 13 Jun 2001 19:06:52 +0200 Subject: [Python-Dev] Why not Lisp-like list-related functions ? References: <3B2798D1.16F832A3@auriga.ru> Message-ID: <012d01c0f42b$45453b30$4ffa42d5@hagrid> Dmitry wrote: > I'm new to Python but quite familiar with Lisp. So my question is > about Python list-related functions. Why append(), extend(), sort(), > reverse() etc. doesn't return a reference to it's own (modified) > argument ? doesn't Lisp have a FAQ? ;-) http://www.python.org/doc/FAQ.html#6.20 Q. Why doesn't list.sort() return the sorted list? ... basically, operations that modify an object generally don't return the object itself, to avoid mistakes like: for item in list.reverse(): print item # backwards ... for item in list.reverse(): print item # backwards, or? a slightly more pythonic way would be to add sorted, extended, reversed (etc) -- but that leads to method bloat. in addition, based on studying huge amounts of python code, I doubt cascading list operations would save the world that much typing... followups to python-list at python.org From paulp at ActiveState.com Wed Jun 13 19:22:09 2001 From: paulp at ActiveState.com (Paul Prescod) Date: Wed, 13 Jun 2001 10:22:09 -0700 Subject: [Python-Dev] Adding Asian codecs to the core References: <20010612234815.2C90599C82@waltz.rahul.net> <3B26B491.CA8536BD@ActiveState.com> <3B2710BB.CFD8215@lemburg.com> Message-ID: <3B27A141.6C69EC55@ActiveState.com> "M.-A. Lemburg" wrote: > >... > > > > We really shouldn't consider the Japanese without Chinese and Korean. > > And those both seem *larger* than the Japanese. :( > > Unfortunately, these aren't available under a usable (=non-GPL) > license yet. 
Frank Chen has agreed to make them available under a Python-style license. > > What if we add them to CVS and formally maintain them as part of the > > core but distribute them as a separate download? > > Good idea. All in favour? -- Take a recipe. Leave a recipe. Python Cookbook! http://www.ActiveState.com/pythoncookbook From aahz at rahul.net Wed Jun 13 19:32:24 2001 From: aahz at rahul.net (Aahz Maruch) Date: Wed, 13 Jun 2001 10:32:24 -0700 (PDT) Subject: [Python-Dev] Adding Asian codecs to the core In-Reply-To: <3B27A141.6C69EC55@ActiveState.com> from "Paul Prescod" at Jun 13, 2001 10:22:09 AM Message-ID: <20010613173224.0FFB999C87@waltz.rahul.net> >>> What if we add them to CVS and formally maintain them as part of the >>> core but distribute them as a separate download? >> >> Good idea. > > All in favour? +1 -- --- Aahz (@pobox.com) Hugs and backrubs -- I break Rule 6 <*> http://www.rahul.net/aahz/ Androgynous poly kinky vanilla queer het Pythonista I don't really mind a person having the last whine, but I do mind someone else having the last self-righteous whine. From gward at python.net Wed Jun 13 20:53:20 2001 From: gward at python.net (Greg Ward) Date: Wed, 13 Jun 2001 14:53:20 -0400 Subject: [Python-Dev] sre improvements In-Reply-To: <007801c0f426$84d1f220$4ffa42d5@hagrid>; from fredrik@pythonware.com on Wed, Jun 13, 2001 at 06:32:13PM +0200 References: <20010613125408.W13940@tux.distro.conectiva> <15143.36590.447465.657241@beluga.mojam.com> <007801c0f426$84d1f220$4ffa42d5@hagrid> Message-ID: <20010613145320.G5114@gerg.ca> On 13 June 2001, Fredrik Lundh said: > conditionals: > > (?(cond)true) > (?(cond)true|false) > > where cond is a group number (true if defined) or an assertion > pattern, and true/false are patterns. > > (imo, whoever invented that needs help ;-) I think I'd have to agree with /F on this one... somewhere around Perl 5.003 or 5.004, regexes in Perl went from being a powerful and really cool facility to being a massively overgrown language-within-a-language. I *tried* to use some of the fancy new features a few times out of curiosity, but could never get them to work. (At the time, I think I was a pretty sharp Perl programmer, although I've dulled since then.) Greg -- Greg Ward - Unix bigot gward at python.net http://starship.python.net/~gward/ No animals were harmed in transmitting this message. From jepler at inetnebr.com Wed Jun 13 18:09:58 2001 From: jepler at inetnebr.com (Jeff Epler) Date: Wed, 13 Jun 2001 11:09:58 -0500 Subject: [Python-Dev] sre improvements In-Reply-To: <15143.36590.447465.657241@beluga.mojam.com>; from skip@pobox.com on Wed, Jun 13, 2001 at 11:03:58AM -0500 References: <20010613125408.W13940@tux.distro.conectiva> <15143.36590.447465.657241@beluga.mojam.com> Message-ID: <20010613110957.C29405@inetnebr.com> On Wed, Jun 13, 2001 at 11:03:58AM -0500, Skip Montanaro wrote: > > Gustavo> I'd like, for example, to see the (?(1)blah) operator, > Gustavo> available in perl, working. > > Gustavo, > > For the non-Perl-heads on the list, can you explain what the (?(1)blah) > operator does? from perlre(1): (?(condition)yes-pattern) Conditional expression. (condition) should be either an integer in parentheses (which is valid if the corresponding pair of parentheses matched), or lookahead/lookbehind/evaluate zero- width assertion. Say, m{ ( \( )? [^()]+ (?(1) \) ) }x matches a chunk of non-parentheses, possibly included in parentheses themselves. 
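(For reference, a rough Python counterpart of that example -- since sre
has no (?(1)...) conditionals, the two cases are simply spelled out as
alternatives, parenthesized form first:)

    import re

    # "(chunk)" or a bare chunk of non-parentheses; listing the
    # parenthesized alternative first makes it win when both could match.
    chunk = re.compile(r"\([^()]+\)|[^()]+")

    print chunk.findall("(abc) def (gh)")   # ['(abc)', ' def ', '(gh)']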
Jeff From tim.one at home.com Thu Jun 14 08:12:48 2001 From: tim.one at home.com (Tim Peters) Date: Thu, 14 Jun 2001 02:12:48 -0400 Subject: [Python-Dev] Adding .decode() method to Unicode In-Reply-To: <3B2664AD.B560D685@ActiveState.com> Message-ID: [Paul Prescod] > ... > We could argue angels on the head of a pin until the cows come home but > 90% of all Python users think of 8-bit strings as strings of characters. Actually, if you count me, make that 92%. some-things-were-easier-when-python-had-50-users-and-i-was-two- of-them-ly y'rs - tim From paulp at ActiveState.com Thu Jun 14 09:30:19 2001 From: paulp at ActiveState.com (Paul Prescod) Date: Thu, 14 Jun 2001 00:30:19 -0700 Subject: [Python-Dev] sre improvements References: <20010613125408.W13940@tux.distro.conectiva> <15143.36590.447465.657241@beluga.mojam.com> <007801c0f426$84d1f220$4ffa42d5@hagrid> <20010613145320.G5114@gerg.ca> Message-ID: <3B28680B.A46CF171@ActiveState.com> Greg Ward wrote: > >... > > I think I'd have to agree with /F on this one... somewhere around Perl > 5.003 or 5.004, regexes in Perl went from being a powerful and really > cool facility to being a massively overgrown language-within-a-language. > I *tried* to use some of the fancy new features a few times out of > curiosity, but could never get them to work. (At the time, I think I > was a pretty sharp Perl programmer, although I've dulled since then.) I would rather see us try a new approach to regular expressions. I've seen a few proposals for more verbose-but-readable syntaxes. I think one was from Greg Ewing? And maybe one from Ping? For those of us who use regular expressions only once in a while (i.e. the lucky ones), the current syntax is a holy terror. Which characters are magical again? In what contexts? With how many levels of backslashing? Upper case W versus lower case W? Obviously we can never abandon the tried and true Perl5 RE module, but I think we could have another syntax on top. -- Take a recipe. Leave a recipe. Python Cookbook! http://www.ActiveState.com/pythoncookbook From arigo at ulb.ac.be Thu Jun 14 10:58:48 2001 From: arigo at ulb.ac.be (Armin Rigo) Date: Thu, 14 Jun 2001 10:58:48 +0200 (MET DST) Subject: [Python-Dev] Special-casing "O" Message-ID: Hello everybody, For comparison purposes, I implemented the idea of optimizing PyArg_ParseTuple calls by modifying the C code itself. Here is the result: http://homepages.ulb.ac.be/~arigo/pyarg_pp.tgz I did not upload this as a patch at SourceForge for several reasons. The most fundamental is that it raises bootstrapping issues: how can we compile the Python interpreter if we first have to run a Python script on the source files ? Fixing this would make the Makefiles significantly more complex. The other reason is that the METH_O solution is probably still faster, as it often completely avoids to build the 1-tuple of arguments. More serious performance tests might be needed, however. A bientot, Armin. From thomas at xs4all.net Thu Jun 14 13:10:01 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Thu, 14 Jun 2001 13:10:01 +0200 Subject: [Python-Dev] Releasing 2.0.1 In-Reply-To: <200106131519.LAA16978@cj20424-a.reston1.va.home.com> References: <200106131519.LAA16978@cj20424-a.reston1.va.home.com> Message-ID: <20010614131001.B1659@xs4all.nl> On Wed, Jun 13, 2001 at 11:19:03AM -0400, Guido van Rossum wrote: > So I think it's good to go now. I can release a 2.0.1c1 this week > (indicating a release candidate) and a final 2.0.1 next week. +1 here. 
> If you know a good reason why I should hold off on releasing this, or > if you have a patch that absolutely should make it into 2.0.1, please > let me know NOW! This project is way overdue. (Thomas is ready to > release 2.1.1 as soon as this goes out, I believe. :-) Well, not quite, but I can put in a couple of allnighters (I want to do a review of all log-messages since 2.1-final, to see if I missed any checkin messages, and I want to update the NEWS file with a list of bugs fixed) and have it ready in a week or two. I don't think 2.1.1 should be released *that* soon after 2.0.1 anyway. I noticed this in the LICENCE file, by the way: Python 2.1 is a derivative work of Python 1.6.1, as well as of Python 2.0. and 8. By copying, installing or otherwise using Python 2.1, Licensee agrees to be bound by the terms and conditions of this License Agreement. -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From guido at digicool.com Thu Jun 14 13:14:22 2001 From: guido at digicool.com (Guido van Rossum) Date: Thu, 14 Jun 2001 07:14:22 -0400 Subject: [Python-Dev] Re: Why not Lisp-like list-related functions ? Message-ID: <200106141114.HAA25430@cj20424-a.reston1.va.home.com> > Hello all, > > I'm new to Python but quite familiar with Lisp. So my question is > about Python list-related functions. Why append(), extend(), sort(), > reverse() etc. doesn't return a reference to it's own (modified) > argument ? IMHO (I'm tweaking Python 2.1 to allow first example > possible), > > >>> [3 + x * 2 for x in [5, 8, 9, 3].sort()].extend([6, 3, 8].reverse()) > [9, 13, 19, 21, 8, 3, 6] > >>> > > looks much better (and more "functional") than > > >>> x = [5, 8, 9, 3] > >>> x.sort() > >>> x = [3 + x * 2 for x in x] > >>> y = [6, 3, 8] > >>> y.reverse() > >>> x.extend(y) > >>> x > [9, 13, 19, 21, 8, 3, 6] > >>> > > Python designers and fans, please explain it to me :-). > Any comments are welcome. > > Thanks and reply to me directly if possible, > Dmitry Antipov Funny, to me your first form is much harder to read than your second. With the first form, I have to stop and think and look carefully at where the brackets are to see in which order the operations are executed, while in the second form it's obvious, because it's broken down in smaller chunks. So I guess that's the real reason: Python users have a procedural brain, not a functional brain, and we don't like Lispish code. Maybe we also have a smaller brain than the typical Lisper -- I would say, that would make us more normal, and if Python caters to people with a closer-to-average brain size, that would mean more people will be able to program in Python. History will decide... --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at digicool.com Thu Jun 14 13:31:16 2001 From: guido at digicool.com (Guido van Rossum) Date: Thu, 14 Jun 2001 07:31:16 -0400 Subject: [Python-Dev] Adding Asian codecs to the core Message-ID: <200106141131.HAA25522@cj20424-a.reston1.va.home.com> > > > What if we add them to CVS and formally maintain them as part of the > > > core but distribute them as a separate download? > > > > Good idea. > > All in favour? +1, as long as they're not in the CVS subtree that's normally extracted for a regular source distribution. I propose this location in the CVS tree: python/dist/encodings/... (So 'encodings' would be a sibling of 'src', which has been pretty lonely ever since I started using CVS. 
;-) --Guido van Rossum (home page: http://www.python.org/~guido/) From akuchlin at mems-exchange.org Thu Jun 14 17:19:28 2001 From: akuchlin at mems-exchange.org (Andrew Kuchling) Date: Thu, 14 Jun 2001 11:19:28 -0400 Subject: [Python-Dev] Re: Why not Lisp-like list-related functions ? In-Reply-To: <200106141114.HAA25430@cj20424-a.reston1.va.home.com>; from guido@digicool.com on Thu, Jun 14, 2001 at 07:14:22AM -0400 References: <200106141114.HAA25430@cj20424-a.reston1.va.home.com> Message-ID: <20010614111928.A4560@ute.cnri.reston.va.us> On Thu, Jun 14, 2001 at 07:14:22AM -0400, Guido van Rossum wrote: >Maybe we also have a smaller brain than the typical Lisper -- I would >say, that would make us more normal, and if Python caters to people >with a closer-to-average brain size, that would mean more people will >be able to program in Python. History will decide... I thought it already has, pretty much. --amk From tim at digicool.com Thu Jun 14 18:49:07 2001 From: tim at digicool.com (Tim Peters) Date: Thu, 14 Jun 2001 12:49:07 -0400 Subject: [Python-Dev] PEP 255: Simple Generators Message-ID: You can view an HTML version of PEP 255 here: http://python.sourceforge.net/peps/pep-0255.html Discussion should take place primarily on the Python Iterators list: mailto:python-iterators at lists.sourceforge.net If replying directly to this message, please remove (at least) Python-Dev and Python-Announce. PEP: 255 Title: Simple Generators Version: $Revision: 1.3 $ Author: nas at python.ca (Neil Schemenauer), tim.one at home.com (Tim Peters), magnus at hetland.org (Magnus Lie Hetland) Discussion-To: python-iterators at lists.sourceforge.net Status: Draft Type: Standards Track Requires: 234 Created: 18-May-2001 Python-Version: 2.2 Post-History: 14-Jun-2001 Abstract This PEP introduces the concept of generators to Python, as well as a new statement used in conjunction with them, the "yield" statement. Motivation When a producer function has a hard enough job that it requires maintaining state between values produced, most programming languages offer no pleasant and efficient solution beyond adding a callback function to the producer's argument list, to be called with each value produced. For example, tokenize.py in the standard library takes this approach: the caller must pass a "tokeneater" function to tokenize(), called whenever tokenize() finds the next token. This allows tokenize to be coded in a natural way, but programs calling tokenize are typically convoluted by the need to remember between callbacks which token(s) were seen last. The tokeneater function in tabnanny.py is a good example of that, maintaining a state machine in global variables, to remember across callbacks what it has already seen and what it hopes to see next. This was difficult to get working correctly, and is still difficult for people to understand. Unfortunately, that's typical of this approach. An alternative would have been for tokenize to produce an entire parse of the Python program at once, in a large list. Then tokenize clients could be written in a natural way, using local variables and local control flow (such as loops and nested if statements) to keep track of their state. 
But this isn't practical: programs can be very large, so no a priori bound can be placed on the memory needed to materialize the whole parse; and some tokenize clients only want to see whether something specific appears early in the program (e.g., a future statement, or, as is done in IDLE, just the first indented statement), and then parsing the whole program first is a severe waste of time. Another alternative would be to make tokenize an iterator[1], delivering the next token whenever its .next() method is invoked. This is pleasant for the caller in the same way a large list of results would be, but without the memory and "what if I want to get out early?" drawbacks. However, this shifts the burden on tokenize to remember *its* state between .next() invocations, and the reader need only glance at tokenize.tokenize_loop() to realize what a horrid chore that would be. Or picture a recursive algorithm for producing the nodes of a general tree structure: to cast that into an iterator framework requires removing the recursion manually and maintaining the state of the traversal by hand. A fourth option is to run the producer and consumer in separate threads. This allows both to maintain their states in natural ways, and so is pleasant for both. Indeed, Demo/threads/Generator.py in the Python source distribution provides a usable synchronized-communication class for doing that in a general way. This doesn't work on platforms without threads, though, and is very slow on platforms that do (compared to what is achievable without threads). A final option is to use the Stackless[2][3] variant implementation of Python instead, which supports lightweight coroutines. This has much the same programmatic benefits as the thread option, but is much more efficient. However, Stackless is a controversial rethinking of the Python core, and it may not be possible for Jython to implement the same semantics. This PEP isn't the place to debate that, so suffice it to say here that generators provide a useful subset of Stackless functionality in a way that fits easily into the current CPython implementation, and is believed to be relatively straightforward for other Python implementations. That exhausts the current alternatives. Some other high-level languages provide pleasant solutions, notably iterators in Sather[4], which were inspired by iterators in CLU; and generators in Icon[5], a novel language where every expression "is a generator". There are differences among these, but the basic idea is the same: provide a kind of function that can return an intermediate result ("the next value") to its caller, but maintaining the function's local state so that the function can be resumed again right where it left off. A very simple example: def fib(): a, b = 0, 1 while 1: yield b a, b = b, a+b When fib() is first invoked, it sets a to 0 and b to 1, then yields b back to its caller. The caller sees 1. When fib is resumed, from its point of view the yield statement is really the same as, say, a print statement: fib continues after the yield with all local state intact. a and b then become 1 and 1, and fib loops back to the yield, yielding 1 to its invoker. And so on. From fib's point of view it's just delivering a sequence of results, as if via callback. But from its caller's point of view, the fib invocation is an iterable object that can be resumed at will. 
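(A minimal sketch of the caller's side, spelling out the .next()
protocol the PEP builds on:)

    g = fib()
    for i in range(5):
        print g.next(),   # prints: 1 1 2 3 5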
As in the thread approach, this allows both sides to be coded in the most natural ways; but unlike the thread approach, this can be done efficiently and on all platforms. Indeed, resuming a generator should be no more expensive than a function call. The same kind of approach applies to many producer/consumer functions. For example, tokenize.py could yield the next token instead of invoking a callback function with it as argument, and tokenize clients could iterate over the tokens in a natural way: a Python generator is a kind of Python iterator[1], but of an especially powerful kind. Specification A new statement is introduced: yield_stmt: "yield" expression_list "yield" is a new keyword, so a future statement[8] is needed to phase this in. [XXX spell this out] The yield statement may only be used inside functions. A function that contains a yield statement is called a generator function. When a generator function is called, the actual arguments are bound to function-local formal argument names in the usual way, but no code in the body of the function is executed. Instead a generator-iterator object is returned; this conforms to the iterator protocol[6], so in particular can be used in for-loops in a natural way. Note that when the intent is clear from context, the unqualified name "generator" may be used to refer either to a generator-function or a generator- iterator. Each time the .next() method of a generator-iterator is invoked, the code in the body of the generator-function is executed until a yield or return statement (see below) is encountered, or until the end of the body is reached. If a yield statement is encountered, the state of the function is frozen, and the value of expression_list is returned to .next()'s caller. By "frozen" we mean that all local state is retained, including the current bindings of local variables, the instruction pointer, and the internal evaluation stack: enough information is saved so that the next time .next() is invoked, the function can proceed exactly as if the yield statement were just another external call. A generator function can also contain return statements of the form: "return" Note that an expression_list is not allowed on return statements in the body of a generator (although, of course, they may appear in the bodies of non-generator functions nested within the generator). When a return statement is encountered, nothing is returned, but a StopIteration exception is raised, signalling that the iterator is exhausted. The same is true if control flows off the end of the function. Note that return means "I'm done, and have nothing interesting to return", for both generator functions and non-generator functions. Example # A binary tree class. class Tree: def __init__(self, label, left=None, right=None): self.label = label self.left = left self.right = right def __repr__(self, level=0, indent=" "): s = level*indent + `self.label` if self.left: s = s + "\n" + self.left.__repr__(level+1, indent) if self.right: s = s + "\n" + self.right.__repr__(level+1, indent) return s def __iter__(self): return inorder(self) # Create a Tree from a list. def tree(list): n = len(list) if n == 0: return [] i = n / 2 return Tree(list[i], tree(list[:i]), tree(list[i+1:])) # A recursive generator that generates Tree leaves in in-order. def inorder(t): if t: for x in inorder(t.left): yield x yield t.label for x in inorder(t.right): yield x # Show it off: create a tree. t = tree("ABCDEFGHIJKLMNOPQRSTUVWXYZ") # Print the nodes of the tree in in-order. 
    for x in t:
        print x,
    print

    # A non-recursive generator.
    def inorder(node):
        stack = []
        while node:
            while node.left:
                stack.append(node)
                node = node.left
            yield node.label
            while not node.right:
                try:
                    node = stack.pop()
                except IndexError:
                    return
                yield node.label
            node = node.right

    # Exercise the non-recursive generator.
    for x in t:
        print x,
    print

Q & A

Q. Why a new keyword? Why not a builtin function instead?

A. Control flow is much better expressed via keyword in Python, and yield is a control construct. It's also believed that efficient implementation in Jython requires that the compiler be able to determine potential suspension points at compile-time, and a new keyword makes that easy.

Reference Implementation

A preliminary patch against the CVS Python source is available[7].

Footnotes and References

    [1] PEP 234, http://python.sf.net/peps/pep-0234.html
    [2] http://www.stackless.com/
    [3] PEP 219, http://python.sf.net/peps/pep-0219.html
    [4] "Iteration Abstraction in Sather", Murer, Omohundro, Stoutamire
        and Szyperski,
        http://www.icsi.berkeley.edu/~sather/Publications/toplas.html
    [5] http://www.cs.arizona.edu/icon/
    [6] The concept of iterators is described in PEP 234,
        http://python.sf.net/peps/pep-0234.html
    [7] http://python.ca/nas/python/generator.diff
    [8] PEP 236, http://python.sf.net/peps/pep-0236.html

Copyright

This document has been placed in the public domain.


From guido at digicool.com Thu Jun 14 19:30:42 2001
From: guido at digicool.com (Guido van Rossum)
Date: Thu, 14 Jun 2001 13:30:42 -0400
Subject: [Python-Dev] Python 2.0.1c1 - GPL-compatible release candidate
Message-ID: <200106141730.f5EHUgX03621@odiug.digicool.com>

With a sigh of relief I announce Python 2.0.1c1 -- the first Python release in a long time whose license is fully compatible with the GPL:

    http://www.python.org/2.0.1/

I thank Moshe Zadka who did almost all of the work to make this a useful bugfix release, and then went incommunicado for several weeks. (I hope you're OK, Moshe!)

Note that this is a release candidate. We don't expect any problems, but we're being careful nevertheless. We're planning to do the final release of 2.0.1 a week from now; expect it to be identical to the release candidate except for some dotted i's and crossed t's.

Python 2.0 users should be able to replace their 2.0 installation with the 2.0.1 release without any ill effects; apart from the license change, we've only fixed bugs that didn't require us to make feature changes. The SRE package (regular expression matching, used by the "re" module) was brought in line with the version distributed with Python 2.1; this is stable feature-wise but much improved bug-wise.

For the full scoop, see the release notes on SourceForge:

    http://sourceforge.net/project/shownotes.php?release_id=39267

Python 2.1 users can ignore this release, unless they have an urgent need for a GPL-compatible Python version and are willing to downgrade. Rest assured that we're planning a bugfix release there too: I expect that Python 2.1.1 will be released within a month, with the same GPL-compatible license. (Right, Thomas?)

We don't intend to build RPMs for 2.0.1. If someone else is interested in doing so, we can link to them.
--Guido van Rossum (home page: http://www.python.org/~guido/)


From fredrik at pythonware.com Thu Jun 14 13:46:25 2001
From: fredrik at pythonware.com (Fredrik Lundh)
Date: Thu, 14 Jun 2001 13:46:25 +0200
Subject: [Python-Dev] recognizing \u escapes in regular expressions
Message-ID: <02db01c0f4c7$a491c620$0900a8c0@spiff>

during a late hacking pass, I was perplexed to realize that r"[\u0000-\uffff]" didn't match any unicode character, and reported it as bug #420011. but a few minutes later, I realized that SRE doesn't support \u and \U escapes at all -- and that the pattern u"[\u0000-\uffff]" works as expected.

should I close the bug report, or turn it into a feature request?


From fredrik at pythonware.com Thu Jun 14 13:52:26 2001
From: fredrik at pythonware.com (Fredrik Lundh)
Date: Thu, 14 Jun 2001 13:52:26 +0200
Subject: [Python-Dev] Adding Asian codecs to the core
References: <20010612234815.2C90599C82@waltz.rahul.net> <3B26B491.CA8536BD@ActiveState.com> <3B2710BB.CFD8215@lemburg.com> <3B27A141.6C69EC55@ActiveState.com>
Message-ID: <02ef01c0f4c8$7bdc8520$0900a8c0@spiff>

Paul wrote:

> > > What if we add them to CVS and formally maintain them as part of the
> > > core but distribute them as a separate download?
> >
> > Good idea.
>
> All in favour?

+0.5

I still think adding them to the core is okay, but that's me.

Cheers /F


From gward at python.net Thu Jun 14 22:11:49 2001
From: gward at python.net (Greg Ward)
Date: Thu, 14 Jun 2001 16:11:49 -0400
Subject: [Python-Dev] sre improvements
In-Reply-To: <3B28680B.A46CF171@ActiveState.com>; from paulp@ActiveState.com on Thu, Jun 14, 2001 at 12:30:19AM -0700
References: <20010613125408.W13940@tux.distro.conectiva> <15143.36590.447465.657241@beluga.mojam.com> <007801c0f426$84d1f220$4ffa42d5@hagrid> <20010613145320.G5114@gerg.ca> <3B28680B.A46CF171@ActiveState.com>
Message-ID: <20010614161149.C9884@gerg.ca>

On 14 June 2001, Paul Prescod said:

> I would rather see us try a new approach to regular expressions. I've
> seen a few proposals for more verbose-but-readable syntaxes. I think one
> was from Greg Ewing? And maybe one from Ping?

I remember Ping's from a few years back. It was pretty cool, but awfully verbose. I *like* the compactness of the One True Regex Language (ie. the one implemented by Perl 5, PCRE, and SRE).

> For those of us who use regular expressions only once in a while (i.e.
> the lucky ones), the current syntax is a holy terror. Which characters
> are magical again? In what contexts? With how many levels of
> backslashing? Upper case W versus lower case W?

Wow, you should try keeping grep vs. egrep vs. sed vs. awk (which version again?) vs. emacs straight. I generally don't bother: as soon as a problem gets too hairy for grep/sed/awk/etc., I whip out my trusty old friend "perl -e" and all is well again. Unless I'm already coding in Python of course, in which case I whip out my trusty old friend re.compile(), and everything just works. I guess I just have a good memory for line noise.

> Obviously we can never abandon the tried and true Perl5 RE module, but I
> think we could have another syntax on top.

Yeah, I s'pose it could be useful. Yet another great teaching tool, at any rate.

        Greg

--
Greg Ward - Python bigot                                gward at python.net
http://starship.python.net/~gward/
Quick!!  Act as if nothing has happened!
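[Editorial aside for readers of this thread: the stock re module already offers a small step in the "readable" direction via the re.VERBOSE flag, which ignores unescaped whitespace in the pattern and allows # comments. This is not any of the proposals discussed above, just a minimal sketch of what the existing API can do:]

    import re

    # Match a simple floating-point literal, documented part by part.
    float_re = re.compile(r"""
        [-+]?             # optional sign
        \d+               # integer part
        (\.\d*)?          # optional fractional part
        ([eE][-+]?\d+)?   # optional exponent
    """, re.VERBOSE)

    print float_re.match("-12.5e3").group(0)   # prints -12.5e3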
From greg at cosc.canterbury.ac.nz Fri Jun 15 02:56:50 2001
From: greg at cosc.canterbury.ac.nz (Greg Ewing)
Date: Fri, 15 Jun 2001 12:56:50 +1200 (NZST)
Subject: [Python-Dev] sre improvements
In-Reply-To: <20010614161149.C9884@gerg.ca>
Message-ID: <200106150056.MAA03621@s454.cosc.canterbury.ac.nz>

Paul Prescod:

> I think one
> was from Greg Ewing? And maybe one from Ping?

I can't remember what my first proposal (many years ago now) was like, but you might like to look at what I'm using in my Plex module:

    http://www.cosc.canterbury.ac.nz/~greg/python/Plex

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,          | A citizen of NewZealandCorp, a        |
Christchurch, New Zealand          | wholly-owned subsidiary of USA Inc.   |
greg at cosc.canterbury.ac.nz        +--------------------------------------+


From paulp at ActiveState.com Fri Jun 15 03:36:13 2001
From: paulp at ActiveState.com (Paul Prescod)
Date: Thu, 14 Jun 2001 18:36:13 -0700
Subject: [Python-Dev] sre improvements
References: <200106150056.MAA03621@s454.cosc.canterbury.ac.nz>
Message-ID: <3B29668D.ADFB3C22@ActiveState.com>

Greg Ewing wrote:
>
> Paul Prescod:
>
> > I think one
> > was from Greg Ewing? And maybe one from Ping?
>
> I can't remember what my first proposal (many years ago
> now) was like, but you might like to look at what I'm
> using in my Plex module:
>
> http://www.cosc.canterbury.ac.nz/~greg/python/Plex

I would be interested in *both* your regular expression library and your lexer for the Python standard library. But separately. Maybe we need two short PEPs that point to the documentation and suggest how the two packages could be integrated into the standard library. What do you think?

--
Take a recipe. Leave a recipe.
Python Cookbook! http://www.ActiveState.com/pythoncookbook


From greg at cosc.canterbury.ac.nz Fri Jun 15 03:49:04 2001
From: greg at cosc.canterbury.ac.nz (Greg Ewing)
Date: Fri, 15 Jun 2001 13:49:04 +1200 (NZST)
Subject: [Python-Dev] sre improvements
In-Reply-To: <3B29668D.ADFB3C22@ActiveState.com>
Message-ID: <200106150149.NAA03631@s454.cosc.canterbury.ac.nz>

> I would be interested in *both* your regular expression library and your
> lexer for the Python standard library. But separately.

Well, the regular expressions aren't really a separable part of Plex. I mentioned it as a possible source of ideas for anyone working on a new syntax for the regexp stuff.

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,          | A citizen of NewZealandCorp, a        |
Christchurch, New Zealand          | wholly-owned subsidiary of USA Inc.   |
greg at cosc.canterbury.ac.nz        +--------------------------------------+


From mal at lemburg.com Fri Jun 15 09:58:47 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Fri, 15 Jun 2001 09:58:47 +0200
Subject: [Python-Dev] Adding Asian codecs to the core
References: <20010612234815.2C90599C82@waltz.rahul.net> <3B26B491.CA8536BD@ActiveState.com> <3B2710BB.CFD8215@lemburg.com> <3B27A141.6C69EC55@ActiveState.com> <02ef01c0f4c8$7bdc8520$0900a8c0@spiff>
Message-ID: <3B29C037.FB1DB6B8@lemburg.com>

Fredrik Lundh wrote:
>
> Paul wrote:
> > > > What if we add them to CVS and formally maintain them as part of the
> > > > core but distribute them as a separate download?
> > >
> > > Good idea.
> >
> > All in favour?
>
> +0.5
>
> I still think adding them to the core is okay, but that's me.

What would be the threshold for doing so ?
Tamito is actively working on reducing the table sizes of the codecs and after what I have seen you do on these sort of tables I am pretty sure Tamito can turn these tables into shared libs which are smaller than 200k.

--
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting: http://www.egenix.com/
Python Software: http://www.lemburg.com/python/


From MarkH at ActiveState.com Fri Jun 15 10:05:26 2001
From: MarkH at ActiveState.com (Mark Hammond)
Date: Fri, 15 Jun 2001 18:05:26 +1000
Subject: [Python-Dev] Adding Asian codecs to the core
In-Reply-To: <3B29C037.FB1DB6B8@lemburg.com>
Message-ID: 

> > I still think adding them to the core is okay, but that's me.
>
> What would be the threshold for doing so ?
>
> Tamito is actively working on reducing the table sizes of the codecs
> and after what I have seen you do on these sort of tables I am pretty
> sure Tamito can turn these tables into shared libs which are smaller
> than 200k.

But isn't this set only one of the many possible Asian codecs? I would have no objection to one 200k module, but if we really wanted to handle "asian codecs" I believe this is only the start. For this reason, I would give a -0 to adding these to the core, and a +1 to adding them to the directory structure proposed by Guido.

Mark.


From guido at digicool.com Fri Jun 15 18:59:40 2001
From: guido at digicool.com (Guido van Rossum)
Date: Fri, 15 Jun 2001 12:59:40 -0400
Subject: [Python-Dev] recognizing \u escapes in regular expressions
Message-ID: <200106151659.MAA30396@cj20424-a.reston1.va.home.com>

> during a late hacking pass, I was perplexed to realize that
> r"[\u0000-\uffff]" didn't match any unicode character, and reported
> it as bug #420011.
>
> but a few minutes later, I realized that SRE doesn't support \u and
> \U escapes at all -- and that the pattern u"[\u0000-\uffff]" works
> as expected.
>
> should I close the bug report, or turn it into a feature request?

You meant ur"[\u0000-\uffff]", right? (It works the same -- Unicode raw strings still do \u expansion, although the rationale escapes me at the moment -- as does the rationale for why ru"..." is a syntax error...)

Looks like a feature request to me. Since \000 and \x00 work in that context, \u0000 would be expected to work. And suppose someone uses u"[\u0000-\u005d]"...

--Guido van Rossum (home page: http://www.python.org/~guido/)


From guido at digicool.com Fri Jun 15 21:00:26 2001
From: guido at digicool.com (Guido van Rossum)
Date: Fri, 15 Jun 2001 15:00:26 -0400
Subject: [Python-Dev] gen-branch: CVS branch for Neil's generator patch
Message-ID: <200106151900.PAA31935@cj20424-a.reston1.va.home.com>

I've checked in Neil's latest generator patch into a branch of the CVS tree. That makes it (hopefully) easier for folks to play with.

Tim, can you update the PEP to point to this branch? (There's some boilerplate code about branches in PEP 252 or 253 that you could adapt.)

I had to change the code in ceval.c because of recent conflicting changes there. The test suite runs (except test_inspect), but I'd appreciate it if someone (Neil?) could make sure that I didn't overlook anything. (I should probably check the CVS logs. :-)

--Guido van Rossum (home page: http://www.python.org/~guido/)

PS. If you saw a checkin of Grammar/Grammar in the *head* branch, that was a mistake, and I've already corrected it.
From paulp at ActiveState.com Fri Jun 15 21:19:08 2001
From: paulp at ActiveState.com (Paul Prescod)
Date: Fri, 15 Jun 2001 12:19:08 -0700
Subject: [Python-Dev] Adding Asian codecs to the core
References: <20010612234815.2C90599C82@waltz.rahul.net> <3B26B491.CA8536BD@ActiveState.com> <3B2710BB.CFD8215@lemburg.com> <3B27A141.6C69EC55@ActiveState.com> <02ef01c0f4c8$7bdc8520$0900a8c0@spiff> <3B29C037.FB1DB6B8@lemburg.com>
Message-ID: <3B2A5FAC.C5089CC2@ActiveState.com>

"M.-A. Lemburg" wrote:
>
>...
>
> What would be the threshold for doing so ?
>
> Tamito is actively working on reducing the table sizes of the codecs
> and after what I have seen you do on these sort of tables I am pretty
> sure Tamito can turn these tables into shared libs which are smaller
> than 200k.

Don't forget Chinese (Taiwan and mainland) and Korean! I guess I don't see the big deal in making them separate downloads. We can use distutils to make them easy to install .exe's for Reference Python and PPM for ActivePython.

--
Take a recipe. Leave a recipe.
Python Cookbook! http://www.ActiveState.com/pythoncookbook


From mal at lemburg.com Fri Jun 15 22:05:47 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Fri, 15 Jun 2001 22:05:47 +0200
Subject: [Python-Dev] Adding Asian codecs to the core
References: <20010612234815.2C90599C82@waltz.rahul.net> <3B26B491.CA8536BD@ActiveState.com> <3B2710BB.CFD8215@lemburg.com> <3B27A141.6C69EC55@ActiveState.com> <02ef01c0f4c8$7bdc8520$0900a8c0@spiff> <3B29C037.FB1DB6B8@lemburg.com> <3B2A5FAC.C5089CC2@ActiveState.com>
Message-ID: <3B2A6A9B.AC156262@lemburg.com>

Paul Prescod wrote:
>
> Don't forget Chinese (Taiwan and mainland) and Korean!
>
> I guess I don't see the big deal in making them separate downloads. We
> can use distutils to make them easy to install .exe's for Reference
> Python and PPM for ActivePython.

Ok.

BTW, how come www.python.org no longer provides precompiled (contributed) binaries for the various OSes out there ? The FTP server only has these for Python <= 1.5.2.

--
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting: http://www.egenix.com/
Python Software: http://www.lemburg.com/python/


From tim.one at home.com Fri Jun 15 23:39:42 2001
From: tim.one at home.com (Tim Peters)
Date: Fri, 15 Jun 2001 17:39:42 -0400
Subject: [Python-Dev] gen-branch: CVS branch for Neil's generator patch
In-Reply-To: <200106151900.PAA31935@cj20424-a.reston1.va.home.com>
Message-ID: 

[Guido]
> I've checked in Neil's latest generator patch into a branch of the CVS
> tree. That makes it (hopefully) easier for folks to play with.

It will for me, and I thank you.

> Tim, can you update the PEP to point to this branch?

Done.


From martin at loewis.home.cs.tu-berlin.de Sat Jun 16 00:17:49 2001
From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Sat, 16 Jun 2001 00:17:49 +0200
Subject: [Python-Dev] recognizing \u escapes in regular expressions
Message-ID: <200106152217.f5FMHnI01360@mira.informatik.hu-berlin.de>

> should I close the bug report, or turn it into a feature request?

I think the bug report can be closed.
Myself, I found it sufficient that you can write normal \u escapes in strings, in particular as you can also use them in raw strings:

    >>> ur"Ha\u006Clo"
    u'Hallo'

Perhaps not very intuitive, and perhaps even a bug (how do you put a backslash in front of a "u" in a raw unicode string), but useful in this context.

Regards,
Martin


From guido at digicool.com Sat Jun 16 17:46:14 2001
From: guido at digicool.com (Guido van Rossum)
Date: Sat, 16 Jun 2001 11:46:14 -0400
Subject: [Python-Dev] 2.0.1's GPL-compatibility is official!
Message-ID: <200106161546.LAA05521@cj20424-a.reston1.va.home.com>

Richard Stallman, Eben Moglen and the FSF agree: Python 2.0.1 is compatible with the GPL. They've updated the text about the Python license on http://www.gnu.org/philosophy/license-list.html, stating in particular:

    GPL-Compatible, Free Software Licenses

    [...]

    The License of Python 1.6a2 and earlier versions.
        This is a free software license and is compatible with the
        GNU GPL. Please note, however, that newer versions of Python
        are under other licenses (see below).

    The License of Python 2.0.1, 2.1.1, and newer versions.
        This is a free software license and is compatible with the
        GNU GPL. Please note, however, that intermediate versions of
        Python (1.6b1, through 2.0 and 2.1) are under a different
        license (see below).

I would like to emphasize and clarify (again!) that Python is *not* released under the GPL, so if you think the GPL is a bad thing, you don't have to worry about Python being contaminated. The GPL compatibility is important for folks who distribute Python binaries: e.g. the new license makes it okay to release Python binaries linked with GNU readline and other GPL-covered libraries.

We'll release the final release of 2.0.1 within a week; so far we've had only one bug reported in the release candidate. I expect that we won't have to wait long for 2.1.1, which will have the same GPL-compatible license as 2.0.1.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From guido at digicool.com Sat Jun 16 18:10:27 2001
From: guido at digicool.com (Guido van Rossum)
Date: Sat, 16 Jun 2001 12:10:27 -0400
Subject: [Python-Dev] contributed binaries (was: Adding Asian codecs...)
Message-ID: <200106161610.MAA05684@cj20424-a.reston1.va.home.com>

> BTW, how come www.python.org no longer provides precompiled
> (contributed) binaries for the various OSes out there ?
> The FTP server only has these for Python <= 1.5.2.

There are some binaries for newer versions, mostly Linux RPMs, but these are in different places. I agree the FTP download area is a mess. I propose to give up on the FTP area and start over on the new Zope-based web server, if and when it's ready. Not enough people are helping out, so it's going slowly.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From mal at lemburg.com Sat Jun 16 20:59:52 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Sat, 16 Jun 2001 20:59:52 +0200
Subject: [Python-Dev] recognizing \u escapes in regular expressions
References: <200106152217.f5FMHnI01360@mira.informatik.hu-berlin.de>
Message-ID: <3B2BACA7.CDA96737@lemburg.com>

"Martin v. Loewis" wrote:
>
> > should I close the bug report, or turn it into a feature request?
>
> I think the bug report can be closed.
> Myself, I found it sufficient
> that you can write normal \u escapes in strings, in particular as you
> can also use them in raw strings:
>
>     >>> ur"Ha\u006Clo"
>     u'Hallo'
>
> Perhaps not very intuitive, and perhaps even a bug (how do you put a
> backslash in front of a "u" in a raw unicode string), but useful in
> this context.

    >>> print ur"backslash in front of an 'u': \u005cu"
    backslash in front of an 'u': \u

A double backslash is easier to have:

    >>> print ur"double backslash in front of an 'u': \\u"
    double backslash in front of an 'u': \\u

Python uses C's convention for \uXXXX where \u is only interpreted as a Unicode escape if it is used with an odd number of backslashes in front of it.

--
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting: http://www.egenix.com/
Python Software: http://www.lemburg.com/python/


From tim.one at home.com Mon Jun 18 02:57:53 2001
From: tim.one at home.com (Tim Peters)
Date: Sun, 17 Jun 2001 20:57:53 -0400
Subject: [Python-Dev] Re: Why not Lisp-like list-related functions ?
In-Reply-To: <20010614111928.A4560@ute.cnri.reston.va.us>
Message-ID: 

[Guido]
> Maybe we also have a smaller brain than the typical Lisper -- I would
> say, that would make us more normal, and if Python caters to people
> with a closer-to-average brain size, that would mean more people will
> be able to program in Python. History will decide...

[Andrew Kuchling]
> I thought it already has, pretty much.

OK, I've kept quiet for days, but can't bear it any longer: Andrew, are you waiting for someone to *force* you to immortalize this exchange in your Python Quotes collection? If so, the PSU knows where you liv


From mal at lemburg.com Mon Jun 18 12:14:04 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Mon, 18 Jun 2001 12:14:04 +0200
Subject: [Python-Dev] Adding Asian codecs to the core
References: <200106141131.HAA25522@cj20424-a.reston1.va.home.com>
Message-ID: <3B2DD46C.EEC20857@lemburg.com>

Guido van Rossum wrote:
>
> > > > What if we add them to CVS and formally maintain them as part of the
> > > > core but distribute them as a separate download?
> > >
> > > Good idea.
> >
> > All in favour?
>
> +1, as long as they're not in the CVS subtree that's normally
> extracted for a regular source distribution. I propose this location
> in the CVS tree:
>
>     python/dist/encodings/...
>
> (So 'encodings' would be a sibling of 'src', which has been pretty
> lonely ever since I started using CVS. ;-)

Ok. When Tamito has completed his work on the codecs (he is currently reimplementing them in C), I'll check them in under the new directory.

BTW, how should we ship these codecs ?

I'd propose to provide a distutils setup.py file which wraps up all codecs under encodings and can be used to create a standard Python add-on "Python-X.X Encoding Add-on".

The generated files should then ideally be published right next to the Python source/binary links on the python.org web-pages to achieve high visibility.

--
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting: http://www.egenix.com/
Python Software: http://www.lemburg.com/python/
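[Editorial aside: for readers unfamiliar with distutils, the setup.py wrapper proposed above could be as small as the following sketch. The distribution name, version and package layout here are invented for illustration; the real ones would depend on how the codecs land in CVS.]

    # setup.py -- hypothetical packaging for the encodings add-on
    from distutils.core import setup

    setup(name="Python-Encodings",           # illustrative name only
          version="1.0",
          description="Asian codecs add-on for Python",
          packages=["encodings.japanese"],   # assumed package layout
         )

[Running "python setup.py sdist" would then build a source archive, and on Windows "python setup.py bdist_wininst" would produce the kind of easy-to-install .exe Paul mentions.]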
From guido at digicool.com Mon Jun 18 14:25:35 2001
From: guido at digicool.com (Guido van Rossum)
Date: Mon, 18 Jun 2001 08:25:35 -0400
Subject: [Python-Dev] Adding Asian codecs to the core
In-Reply-To: Your message of "Mon, 18 Jun 2001 12:14:04 +0200." <3B2DD46C.EEC20857@lemburg.com>
References: <200106141131.HAA25522@cj20424-a.reston1.va.home.com> <3B2DD46C.EEC20857@lemburg.com>
Message-ID: <200106181225.IAA15518@cj20424-a.reston1.va.home.com>

> Ok. When Tamito has completed his work on the codecs (he is currently
> reimplementing them in C), I'll check them in under the new directory.

Excellent!

> BTW, how should we ship these codecs ?
>
> I'd propose to provide a distutils setup.py file which wraps up
> all codecs under encodings and can be used to create a standard
> Python add-on "Python-X.X Encoding Add-on".

Sounds like a good plan.

> The generated files should then ideally be published right next
> to the Python source/binary links on the python.org web-pages to
> achieve high visibility.

Sure, for some definition of "right next to" :-)

--Guido van Rossum (home page: http://www.python.org/~guido/)


From thomas at xs4all.net Mon Jun 18 16:35:12 2001
From: thomas at xs4all.net (Thomas Wouters)
Date: Mon, 18 Jun 2001 16:35:12 +0200
Subject: [Python-Dev] Moshe
Message-ID: <20010618163512.D8098@xs4all.nl>

Just FYI: Moshe has been sighted, alive and well. He's been caught up in personal matters, apparently. He apologized and said he'd mail python-dev with an update soonish.

Don't-you-wish-you-lurked-on-#python-too-ly y'rs ;)

--
Thomas Wouters

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!


From m.favas at per.dem.csiro.au Mon Jun 18 23:28:23 2001
From: m.favas at per.dem.csiro.au (Mark Favas)
Date: Tue, 19 Jun 2001 05:28:23 +0800
Subject: [Python-Dev] Anyone else seeing test_struct fail?
Message-ID: <3B2E7277.D6109E7E@per.dem.csiro.au>

[Platform: Tru64 Unix, Compaq C compiler]

The current CVS of 2.2a0 fails test_struct for me with:

    test test_struct failed -- pack('>i', -2147483649) did not raise error

more extensively,

    trying std iI on -2147483649 == 0xffffffff7fffffff
    Traceback (most recent call last):
      File "Lib/test/test_struct.py", line 367, in ?
        t.run()
      File "Lib/test/test_struct.py", line 353, in run
        self.test_one(x)
      File "Lib/test/test_struct.py", line 269, in test_one
        any_err(pack, ">" + code, x)
      File "Lib/test/test_struct.py", line 38, in any_err
        raise TestFailed, "%s%s did not raise error" % (
    test_support.TestFailed: pack('>i', -2147483649) did not raise error

A 64-bit platform issue?

Also, the current imap.py causes "make test" (test___all__ and test_sundry) to fail with: "exceptions.TabError: inconsistent use of tabs and spaces in indentation (imaplib.py, line 576)" - untested checkin ?

--
Mark Favas - m.favas at per.dem.csiro.au
CSIRO, Private Bag No 5, Wembley, Western Australia 6913, AUSTRALIA


From tim at digicool.com Tue Jun 19 00:04:06 2001
From: tim at digicool.com (Tim Peters)
Date: Mon, 18 Jun 2001 18:04:06 -0400
Subject: [Python-Dev] Anyone else seeing test_struct fail?
In-Reply-To: <3B2E7277.D6109E7E@per.dem.csiro.au>
Message-ID: 

[Mark Favas]
> [Platform: Tru64 Unix, Compaq C compiler]
> The current CVS of 2.2a0 fails test_struct for me with:
>
> test test_struct failed -- pack('>i', -2147483649) did not raise error
>
> more extensively,
> trying std iI on -2147483649 == 0xffffffff7fffffff
> Traceback (most recent call last):
>   File "Lib/test/test_struct.py", line 367, in ?
>     t.run()
>   File "Lib/test/test_struct.py", line 353, in run
>     self.test_one(x)
>   File "Lib/test/test_struct.py", line 269, in test_one
>     any_err(pack, ">" + code, x)
>   File "Lib/test/test_struct.py", line 38, in any_err
>     raise TestFailed, "%s%s did not raise error" % (
> test_support.TestFailed: pack('>i', -2147483649) did not raise error
>
> A 64-bit platform issue?

In test_struct.py, please change this line (right after "class IntTester"):

    BUGGY_RANGE_CHECK = "bBhHIL"

to

    BUGGY_RANGE_CHECK = "bBhHiIlL"

and try again. I suspect you're bumping into a pre-existing bug that simply wasn't checked before (and, yes, there's A Reason it *may* screw up on a 64-bit box but not a 32-bit one). Note that since in standard mode, "i" is considered to be a 4-byte int regardless of platform, we really *should* bitch about trying to pack -2147483649 under "i" (but we don't -- and in general no codes except the new q/Q reliably bitch about out-of-range errors in the standard modes).

> Also, the current imap.py causes "make test" (test___all__ and
> test_sundry) to fail with: "exceptions.TabError: inconsistent use of
> tabs and spaces in indentation (imaplib.py, line 576)" - untested
> checkin ?

Leaving that to some loser who cares about whitespace .


From m.favas at per.dem.csiro.au Tue Jun 19 00:11:37 2001
From: m.favas at per.dem.csiro.au (Mark Favas)
Date: Tue, 19 Jun 2001 06:11:37 +0800
Subject: [Python-Dev] Anyone else seeing test_struct fail?
References: 
Message-ID: <3B2E7C99.E9BEFC3C@per.dem.csiro.au>

[Tim Peters suggests]
>
> [Mark Favas]
> > [Platform: Tru64 Unix, Compaq C compiler]
> > The current CVS of 2.2a0 fails test_struct for me with:
> >
> > test test_struct failed -- pack('>i', -2147483649) did not raise error
>
> In test_struct.py, please change this line (right after "class IntTester"):
>
>     BUGGY_RANGE_CHECK = "bBhHIL"
>
> to
>
>     BUGGY_RANGE_CHECK = "bBhHiIlL"
>
> and try again.

Yep, passes with this change.

> > Also, the current imap.py causes "make test" (test___all__ and
> > test_sundry) to fail with: "exceptions.TabError: inconsistent use of
> > tabs and spaces in indentation (imaplib.py, line 576)" - untested
> > checkin ?
>
> Leaving that to some loser who cares about whitespace .

Guess we'll have to advertise widely, then .

--
Mark Favas - m.favas at per.dem.csiro.au
CSIRO, Private Bag No 5, Wembley, Western Australia 6913, AUSTRALIA


From barry at digicool.com Tue Jun 19 00:28:21 2001
From: barry at digicool.com (Barry A. Warsaw)
Date: Mon, 18 Jun 2001 18:28:21 -0400
Subject: [Python-Dev] Bogosities in quopri module?
Message-ID: <15150.32901.611349.524220@yyz.digicool.com>

I've been playing a bit with the quopri module (trying to support RFC 2047 in mimelib), and I've run across a few bogosities that I'd like to fix. Fixing some of them could break code, so I wanted to see what people think first.

First, quopri should have encodestring() and decodestring() functions which take a string and return a string. This would make it more consistent API-wise with e.g. base64. One difference is that quopri.encodestring() should probably take a default argument quotetabs (defaulted to 1) for passing to the encode() function. This shouldn't be very controversial.

I think there are two problems with encode(). First, it always tacks on an extra \n character, such that an encode->decode roundtrip is not idempotent. I propose fixing this so that encode() doesn't add the extra newline, but this can break code that expects that newline to be present.
Second, I think that encode()'s quotetabs flag should also apply to spaces. RFC 1521 says that both ASCII tabs and spaces may be encoded, and I don't think it's worthwhile that there be a separate flag to independently choose to encode tabs or spaces.

Lastly, if you buy the extra-newline solution above, then encode() has to be fixed w.r.t. trailing spaces and tabs. Currently, an encode->decode roundtrip for, e.g. "hello " returns "hello =\n", but what it should really return is "hello=20". Likewise "hello\t" should return "hello=09". The patches must take multiline strings into account though, so that it doesn't chomp newlines out of

    """hello
    great
    big
    world
    """

I haven't worked up a patch yet, but when I do I'll upload it to SF to get some feedback. I think there are a few other things in the module that could be cleaned up. I also plan to add a test_quopri.py.

Comments?

-Barry
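[Editorial aside: for concreteness, the string-to-string wrappers proposed above could be layered on the existing file-to-file quopri.encode()/quopri.decode() along these lines. This is a minimal sketch of the idea, not the eventual patch; the function names and the quotetabs default are taken from the message above.]

    import quopri
    from StringIO import StringIO

    def encodestring(s, quotetabs=1):
        # Wrap the file-based API: read from one in-memory file,
        # write the quoted-printable result to another.
        infp, outfp = StringIO(s), StringIO()
        quopri.encode(infp, outfp, quotetabs)
        return outfp.getvalue()

    def decodestring(s):
        infp, outfp = StringIO(s), StringIO()
        quopri.decode(infp, outfp)
        return outfp.getvalue()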
From see at my.signature Tue Jun 19 08:21:14 2001
From: see at my.signature (Greg Ewing)
Date: Tue, 19 Jun 2001 18:21:14 +1200
Subject: [Python-Dev] Re: PEP 255: Simple Generators
References: 
Message-ID: <3B2EEF5A.8FF0DAFB@cosc.canterbury.ac.nz>

Something is bothering me about this. In fact, it's bothering me a LOT. In the following, will f() work as a generator-function:

    def f():
        for i in range(5):
            g(i)

    def g(i):
        for j in range(10):
            yield i,j

If I understand PEP 255 correctly, this will *not* work. But it seems entirely reasonable to me that it *should* work. It *has* to work, otherwise how am I to write generators that are too complicated to fit into a single function? Someone please tell me I'm wrong about this!

--
Greg Ewing, Computer Science Dept, University of Canterbury,
Christchurch, New Zealand
To get my email address, please visit my web page:
http://www.cosc.canterbury.ac.nz/~greg


From jepler at inetnebr.com Tue Jun 19 15:25:23 2001
From: jepler at inetnebr.com (Jeff Epler)
Date: Tue, 19 Jun 2001 08:25:23 -0500
Subject: [Python-Dev] Re: PEP 255: Simple Generators
In-Reply-To: <3B2EEF5A.8FF0DAFB@cosc.canterbury.ac.nz>; from see@my.signature on Tue, Jun 19, 2001 at 06:21:14PM +1200
References: <3B2EEF5A.8FF0DAFB@cosc.canterbury.ac.nz>
Message-ID: <20010619082522.A12200@inetnebr.com>

On Tue, Jun 19, 2001 at 06:21:14PM +1200, Greg Ewing wrote:
> Something is bothering me about this. In fact,
> it's bothering me a LOT. In the following, will
> f() work as a generator-function:
>
>     def f():
>         for i in range(5):
>             g(i)
>
>     def g(i):
>         for j in range(10):
>             yield i,j
>
> If I understand PEP255 correctly, this will *not*
> work. But it seems entirely reasonable to me that
> it *should* work. It *has* to work, otherwise how
> am I to write generators that are too complicated
> to fit into a single function?

The following similar code seems to produce the results you have in mind.

    def f():
        for i in range(5):
            #g(i)
            #yield g(i)
            for x in g(i):
                yield x

    def g(i):
        for j in range(10):
            yield i, j

It would be nice to have a succinct way to say 'for dummy in iterator: yield dummy'. Maybe 'yield from iterator'? Then f would become:

    def f():
        for i in range(5):
            yield from g(i)

Jeff

PS I noticed that the generator branch got merged into the trunk. Cool!


From fdrake at acm.org Tue Jun 19 15:24:46 2001
From: fdrake at acm.org (Fred L. Drake, Jr.)
Date: Tue, 19 Jun 2001 09:24:46 -0400 (EDT)
Subject: [Python-Dev] Python & GCC 3.0
Message-ID: <15151.21150.214238.429130@cj42289-a.reston1.va.home.com>

I built GCC 3.0 last night, and Python built and passed the regression tests. I've not done any further comparisons, but using --with-cxx=... failed; the C++ ABI changed and a new version of the C++ runtime is required before that will work. I didn't want to install that over my working installation, just in case. ;-)

I'll report more as I find out more.

-Fred

--
Fred L. Drake, Jr.
PythonLabs at Digital Creations


From nas at python.ca Tue Jun 19 16:00:39 2001
From: nas at python.ca (Neil Schemenauer)
Date: Tue, 19 Jun 2001 07:00:39 -0700
Subject: [Python-Dev] Re: PEP 255: Simple Generators
In-Reply-To: <3B2EEF5A.8FF0DAFB@cosc.canterbury.ac.nz>; from see@my.signature on Tue, Jun 19, 2001 at 06:21:14PM +1200
References: <3B2EEF5A.8FF0DAFB@cosc.canterbury.ac.nz>
Message-ID: <20010619070039.A13712@glacier.fnational.com>

Greg Ewing wrote:
> Something is bothering me about this. In fact,
> it's bothering me a LOT. In the following, will
> f() work as a generator-function:
>
>     def f():
>         for i in range(5):
>             g(i)
>
>     def g(i):
>         for j in range(10):
>             yield i,j
>
> If I understand PEP255 correctly, this will *not*
> work.

No, it will not work. The title of PEP 255 is "Simple Generators". What you want will require something like stackless in order to get the C stack out of the way. That's a major change to the Python internals. To make your example work you need to do:

    def f():
        for i in range(5):
            for j in g(i):
                yield j

    def g(i):
        for j in range(10):
            yield i,j

Stackless may still be in Python's future but not for 2.2.

Neil


From barry at digicool.com Tue Jun 19 16:19:58 2001
From: barry at digicool.com (Barry A. Warsaw)
Date: Tue, 19 Jun 2001 10:19:58 -0400
Subject: [Python-Dev] Python & GCC 3.0
References: <15151.21150.214238.429130@cj42289-a.reston1.va.home.com>
Message-ID: <15151.24462.400930.295658@anthem.wooz.org>

>>>>> "Fred" == Fred L Drake, Jr writes:

    Fred> I built GCC 3.0 last night, and Python built and passed
    Fred> the regression tests.

Hey, you were actually able to download it!? :) I couldn't get an ftp connection for the longest time and finally gave up.

It'd be interesting to see if there are any performance improvements, esp. on x86 boxen.

-Barry


From fdrake at acm.org Tue Jun 19 17:07:48 2001
From: fdrake at acm.org (Fred L. Drake, Jr.)
Date: Tue, 19 Jun 2001 11:07:48 -0400 (EDT)
Subject: [Python-Dev] Python & GCC 3.0
In-Reply-To: <15151.24462.400930.295658@anthem.wooz.org>
References: <15151.21150.214238.429130@cj42289-a.reston1.va.home.com> <15151.24462.400930.295658@anthem.wooz.org>
Message-ID: <15151.27332.418303.872171@cj42289-a.reston1.va.home.com>

Barry A. Warsaw writes:
> It'd be interesting to see if there are any performance
> improvements, esp. on x86 boxen.

GCC 2.95.3:

    cj42289-a(.../python/linux); time ./python -tt ../Lib/test/pystone.py
    Pystone(1.1) time for 10000 passes = 1.58
    This machine benchmarks at 6329.11 pystones/second
    1.66user 0.01system 0:03.40elapsed 49%CPU (0avgtext+0avgdata 0maxresident)k
    0inputs+0outputs (280major+241minor)pagefaults 0swaps

GCC 3.0:

    cj42289-a(.../python/linux); cd ../linux-gcc-3.0/
    cj42289-a(.../python/linux-gcc-3.0); time ./python -tt ../Lib/test/pystone.py
    Pystone(1.1) time for 10000 passes = 1.65
    This machine benchmarks at 6060.61 pystones/second
    1.77user 0.01system 0:03.52elapsed 50%CPU (0avgtext+0avgdata 0maxresident)k
    0inputs+0outputs (307major+239minor)pagefaults 0swaps

There is a little variation across multiple runs, but it varies less than 5% from the numbers above. Bumping up the LOOPS constant in pystone.py changes the numbers a small bit, but the relationship remains constant.
This is on a Linux-Mandrake 7.2 installation with non-cooker updates installed, and still using the Linux 2.2 kernel:

    cj42289-a(.../python/linux-gcc-3.0); uname -a
    Linux cj42289-a.reston1.va.home.com 2.2.17-21mdk #1 Thu Oct 5 13:16:08 CEST 2000 i686 unknown

-Fred

--
Fred L. Drake, Jr.
PythonLabs at Digital Creations


From dan at cgsoftware.com Tue Jun 19 18:19:14 2001
From: dan at cgsoftware.com (Daniel Berlin)
Date: 19 Jun 2001 12:19:14 -0400
Subject: [Python-Dev] Python & GCC 3.0
In-Reply-To: <15151.27332.418303.872171@cj42289-a.reston1.va.home.com> ("Fred L. Drake, Jr."'s message of "Tue, 19 Jun 2001 11:07:48 -0400 (EDT)")
References: <15151.21150.214238.429130@cj42289-a.reston1.va.home.com> <15151.24462.400930.295658@anthem.wooz.org> <15151.27332.418303.872171@cj42289-a.reston1.va.home.com>
Message-ID: <87vglsbfy5.fsf@cgsoftware.com>

"Fred L. Drake, Jr." writes:
> Barry A. Warsaw writes:
> > It'd be interesting to see if there are any performance
> > improvements, esp. on x86 boxen.

Except, I bet you didn't use one of the "optimize for a given cpu" switches. Try adding -mpentiumpro -march=pentiumpro to your compiler flags. Otherwise, it's scheduling for a 386. And the old x86 backend wasn't all that bad at scheduling for the 386. Hell, I'm not that bad at scheduling for a 386. :)

--Dan

> GCC 2.95.3:
>
> cj42289-a(.../python/linux); time ./python -tt ../Lib/test/pystone.py
> Pystone(1.1) time for 10000 passes = 1.58
> This machine benchmarks at 6329.11 pystones/second
> 1.66user 0.01system 0:03.40elapsed 49%CPU (0avgtext+0avgdata 0maxresident)k
> 0inputs+0outputs (280major+241minor)pagefaults 0swaps
>
> GCC 3.0:
>
> cj42289-a(.../python/linux); cd ../linux-gcc-3.0/
> cj42289-a(.../python/linux-gcc-3.0); time ./python -tt ../Lib/test/pystone.py
> Pystone(1.1) time for 10000 passes = 1.65
> This machine benchmarks at 6060.61 pystones/second
> 1.77user 0.01system 0:03.52elapsed 50%CPU (0avgtext+0avgdata 0maxresident)k
> 0inputs+0outputs (307major+239minor)pagefaults 0swaps
>
> There is a little variation with multiple runs, but it varies less than
> 5% from the numbers above. Bumping up the LOOPS constant in
> pystone.py changes the numbers a small bit, but the relationship
> remains constant.
>
> -Fred

--
"If all the nations in the world are in debt, where did all the money go?" -Steven Wright


From mal at lemburg.com Tue Jun 19 18:55:47 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Tue, 19 Jun 2001 18:55:47 +0200
Subject: [Python-Dev] Python & GCC 3.0
References: <15151.21150.214238.429130@cj42289-a.reston1.va.home.com> <15151.24462.400930.295658@anthem.wooz.org> <15151.27332.418303.872171@cj42289-a.reston1.va.home.com>
Message-ID: <3B2F8413.77F40494@lemburg.com>

"Fred L. Drake, Jr." wrote:
>
> Barry A. Warsaw writes:
> > It'd be interesting to see if there are any performance
> > improvements, esp. on x86 boxen.
> GCC 2.95.3:
>
> cj42289-a(.../python/linux); time ./python -tt ../Lib/test/pystone.py
> Pystone(1.1) time for 10000 passes = 1.58
> This machine benchmarks at 6329.11 pystones/second
> 1.66user 0.01system 0:03.40elapsed 49%CPU (0avgtext+0avgdata 0maxresident)k
> 0inputs+0outputs (280major+241minor)pagefaults 0swaps
>
> GCC 3.0:
>
> cj42289-a(.../python/linux); cd ../linux-gcc-3.0/
> cj42289-a(.../python/linux-gcc-3.0); time ./python -tt ../Lib/test/pystone.py
> Pystone(1.1) time for 10000 passes = 1.65
> This machine benchmarks at 6060.61 pystones/second
> 1.77user 0.01system 0:03.52elapsed 50%CPU (0avgtext+0avgdata 0maxresident)k
> 0inputs+0outputs (307major+239minor)pagefaults 0swaps
>
> There is a little variation with multiple runs, but it varies less than
> 5% from the numbers above. Bumping up the LOOPS constant in
> pystone.py changes the numbers a small bit, but the relationship
> remains constant.
>
> This is on a Linux-Mandrake 7.2 installation with non-cooker updates
> installed, and still using the Linux 2.2 kernel:
>
> cj42289-a(.../python/linux-gcc-3.0); uname -a
> Linux cj42289-a.reston1.va.home.com 2.2.17-21mdk #1 Thu Oct 5 13:16:08 CEST 2000 i686 unknown

Note that if you really want to see a speedup for x86 boxes then you should take a look at PGCC, the Pentium GCC compiler group:

    http://www.goof.com/pcg/

You can then adjust the compiler to various x86 CPUs and take advantage of some special optimizations they have integrated into 2.95.2.1.

--
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting: http://www.egenix.com/
Python Software: http://www.lemburg.com/python/


From skip at pobox.com Tue Jun 19 19:44:47 2001
From: skip at pobox.com (Skip Montanaro)
Date: Tue, 19 Jun 2001 12:44:47 -0500
Subject: [Python-Dev] example of module interface to a varargs function?
Message-ID: <15151.36751.406758.577420@beluga.mojam.com>

I am trying to add a module interface to some of the bits missing from PyGtk2. Some functions I'm interested in have varargs signatures, e.g.:

    void gtk_binding_entry_add_signal (GtkBindingSet  *binding_set,
                                       guint           keyval,
                                       guint           modifiers,
                                       const gchar    *signal_name,
                                       guint           n_args,
                                       ...)


From fdrake at acm.org Tue Jun 19 21:04:18 2001
From: fdrake at acm.org (Fred L. Drake, Jr.)
Date: Tue, 19 Jun 2001 15:04:18 -0400 (EDT)
Subject: [Python-Dev] Python & GCC 3.0
In-Reply-To: <87vglsbfy5.fsf@cgsoftware.com>
References: <15151.21150.214238.429130@cj42289-a.reston1.va.home.com> <15151.24462.400930.295658@anthem.wooz.org> <15151.27332.418303.872171@cj42289-a.reston1.va.home.com> <3B2F8413.77F40494@lemburg.com> <87vglsbfy5.fsf@cgsoftware.com>
Message-ID: <15151.41522.200832.655534@cj42289-a.reston1.va.home.com>

Daniel Berlin writes:
> Except, I bet you didn't use one of the "optimize for a given cpu"
> switches.

No, I hadn't. My main interest was in the GCC team's claim that the generated code was faster. Compiling with "make OPT='-mcpu=i686 -O3'" did not make much difference at all.

M.-A. Lemburg writes:
> Note that if you really want to see a speedup for x86 boxes then
> you should take a look at PGCC, the Pentium GCC compiler group:
>
>     http://www.goof.com/pcg/
>
> You can then adjust the compiler to various x86 CPUs and
> take advantage of some special optimizations they have integrated
> into 2.95.2.1.

If they have any improved optimizations for recent x86 chips, I'd like to see them folded into GCC. I'd hate to see another egcs-style split.
It doesn't look like I can just download a single source package from them and wait 3 hours for it to build, so I won't plan on pursuing this further.

-Fred

--
Fred L. Drake, Jr.
PythonLabs at Digital Creations


From tim at digicool.com Tue Jun 19 21:14:10 2001
From: tim at digicool.com (Tim Peters)
Date: Tue, 19 Jun 2001 15:14:10 -0400
Subject: [Python-Dev] Python & GCC 3.0
In-Reply-To: <15151.27332.418303.872171@cj42289-a.reston1.va.home.com>
Message-ID: 

[Fred L. Drake, Jr.]
> GCC 2.95.3:
> This machine benchmarks at 6329.11 pystones/second
> ...
> GCC 3.0:
> This machine benchmarks at 6060.61 pystones/second
> ...
> This is on a Linux-Mandrake 7.2 installation with non-cooker updates
> installed, and still using the Linux 2.2 kernel:
>
> cj42289-a(.../python/linux-gcc-3.0); uname -a
> Linux cj42289-a.reston1.va.home.com 2.2.17-21mdk #1 Thu Oct 5
> 13:16:08 CEST 2000 i686 unknown

This is a good place to note that the single biggest "easy win" for pystone is to run it with -O (that is, Python's -O). Yields a 10% boost on Fred's box, and about 7% on MSVC6+Win2K.

pystone is more sensitive to -O than most "real Python apps", probably because it's masses of very simple operations on scalar types -- no real classes, no dicts, no lists except to simulate fixed-size C arrays, lots of globals, and so on. The dynamic frequency of SET_LINENO is high, and the avg work per other opcode is low. OTOH, that's typical of *some* Python apps, and typical of *parts* of almost all Python apps. So it would be worth getting rid of SET_LINENO even in non- -O runs.

Note that SET_LINENO isn't needed to get correct line numbers in tracebacks (and hasn't been needed for years), it's "just" there to support tracing now. Vladimir had what looked to be a workable scheme for doing that a different way, and that would be a cool project for someone to revive (IMO -- Guido's may differ, but he's too busy to notice what we're doing ).


From michel at digicool.com Tue Jun 19 21:12:14 2001
From: michel at digicool.com (Michel Pelletier)
Date: Tue, 19 Jun 2001 12:12:14 -0700 (PDT)
Subject: [Python-Dev] Anyone else seeing test_struct fail?
In-Reply-To: <3B2E7277.D6109E7E@per.dem.csiro.au>
Message-ID: 

On Tue, 19 Jun 2001, Mark Favas wrote:

> Also, the current imap.py causes "make test" (test___all__ and
> test_sundry) to fail with: "exceptions.TabError: inconsistent use of
> tabs and spaces in indentation (imaplib.py, line 576)" - untested
> checkin ?

I submitted a patch right on this line the other day that Guido applied, but I tested it and neither test___all__ nor test_sundry fail for me today.

-Michel


From mal at lemburg.com Tue Jun 19 21:28:14 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Tue, 19 Jun 2001 21:28:14 +0200
Subject: [Python-Dev] Python & GCC 3.0
References: <15151.21150.214238.429130@cj42289-a.reston1.va.home.com> <15151.24462.400930.295658@anthem.wooz.org> <15151.27332.418303.872171@cj42289-a.reston1.va.home.com> <3B2F8413.77F40494@lemburg.com> <87vglsbfy5.fsf@cgsoftware.com> <15151.41522.200832.655534@cj42289-a.reston1.va.home.com>
Message-ID: <3B2FA7CE.DD1602F7@lemburg.com>

"Fred L. Drake, Jr." wrote:
>
> Daniel Berlin writes:
> > Except, I bet you didn't use one of the "optimize for a given cpu"
> > switches.
>
> No, I hadn't. My main interest was in the GCC team's claim that the
> generated code was faster. Compiling with "make OPT='-mcpu=i686 -O3'"
> did not make much difference at all.
>
> M.-A.
> Lemburg writes:
> > Note that if you really want to see a speedup for x86 boxes then
> > you should take a look at PGCC, the Pentium GCC compiler group:
> >
> >     http://www.goof.com/pcg/
> >
> > You can then adjust the compiler to various x86 CPUs and
> > take advantage of some special optimizations they have integrated
> > into 2.95.2.1.
>
> If they have any improved optimizations for recent x86 chips, I'd
> like to see them folded into GCC. I'd hate to see another egcs-style
> split.
> It doesn't look like I can just download a single source package
> from them and wait 3 hours for it to build, so I won't plan on
> pursuing this further.

Oh, it's fairly easy to get a pgcc compiler: all you have to do is apply their small set of patches to the gcc source before compiling it. And then you should set your OPT environment variable to e.g.

    OPT="-g -O3 -Wall -Wstrict-prototypes -mcpu=k6"

This will cause the pgcc compiler to use these settings in pretty much all compiles you ever do without having to think about it every time.

--
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting: http://www.egenix.com/
Python Software: http://www.lemburg.com/python/


From tim at digicool.com Tue Jun 19 21:36:41 2001
From: tim at digicool.com (Tim Peters)
Date: Tue, 19 Jun 2001 15:36:41 -0400
Subject: [Python-Dev] Anyone else seeing test_struct fail?
In-Reply-To: 
Message-ID: 

[Michel Pelletier]
> I submitted a patch right on this line the other day that Guido applied,
> but I tested it and neither test___all__ nor test_sundry fail for me
> today.

Not to worry! I fixed all this stuff yesterday. imaplib.py had an ambiguous mix of hard tabs and spaces, which Guido "should have" caught before checking in, and that Python itself complained about when run with -tt (which is how Mark ran the test suite). There's no problem anymore.


From nas at python.ca Tue Jun 19 22:37:18 2001
From: nas at python.ca (Neil Schemenauer)
Date: Tue, 19 Jun 2001 13:37:18 -0700
Subject: [Python-Dev] Python & GCC 3.0
In-Reply-To: <15151.41522.200832.655534@cj42289-a.reston1.va.home.com>; from fdrake@acm.org on Tue, Jun 19, 2001 at 03:04:18PM -0400
References: <15151.21150.214238.429130@cj42289-a.reston1.va.home.com> <15151.24462.400930.295658@anthem.wooz.org> <15151.27332.418303.872171@cj42289-a.reston1.va.home.com> <3B2F8413.77F40494@lemburg.com> <87vglsbfy5.fsf@cgsoftware.com> <15151.41522.200832.655534@cj42289-a.reston1.va.home.com>
Message-ID: <20010619133718.A14814@glacier.fnational.com>

Fred L. Drake, Jr. wrote:
> Compiling with "make OPT='-mcpu=i686 -O3'" did not make much
> difference at all.

Try OPT="-m486 -O2". That gave me the best results last time I played with this stuff.

> If they have any improved optimizations for recent x86 chips, I'd
> like to see them folded into GCC. I'd hate to see another egcs-style
> split.

Some people say you should avoid PGCC since it generates buggy code. I don't know if that's true or not.

Neil


From thomas at xs4all.net Tue Jun 19 23:04:46 2001
From: thomas at xs4all.net (Thomas Wouters)
Date: Tue, 19 Jun 2001 23:04:46 +0200
Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Lib/test test_mailbox.py,1.5,1.6
In-Reply-To: 
Message-ID: <20010619230446.E8098@xs4all.nl>

On Tue, Jun 19, 2001 at 01:20:07PM -0700, Jack Jansen wrote:

> The test used int(time.time()) to get a random number, but this doesn't
> work on the mac (where times are bigger than ints). Changed to
> int(time.time()%1000000).
Doesn't int(time.time()%sys.maxint) make more sense? At least you won't be degrading the sequentiality of this particularly unrandom random number on platforms where ints really are big enough to hold times :)

--
Thomas Wouters

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!


From loewis at informatik.hu-berlin.de Tue Jun 19 23:25:26 2001
From: loewis at informatik.hu-berlin.de (Martin von Loewis)
Date: Tue, 19 Jun 2001 23:25:26 +0200 (MEST)
Subject: [Python-Dev] example of module interface to a varargs function?
Message-ID: <200106192125.XAA27631@pandora.informatik.hu-berlin.de>

> The only place in the standard modules I saw that processed a truly
> arbitrary number of arguments is the struct_pack method of the
> struct module, and it doesn't use PyArg_Parse* to process them. Can
> someone point me to an example of marshalling arbitrary numbers of
> arguments then calling a varargs function?

In a true varargs function, you cannot use PyArg_Parse*. Instead, you have to iterate over the argument tuple with PyTuple_GetItem, fetching one argument after another. Another example of such a function is builtin_max.

> (I'll worry about calling gtk_binding_entry_add_signal after I
> figure out how to marshal the args.)

I'd worry about this first: In C, it is not possible to call a true varargs function in a portable way if the caller doesn't statically (i.e. in source code) know the number of arguments. Only the callee can be variable, not the caller. A slight exception is that you are allowed to pass-through va_list objects from one function to another. However, that requires that the callee expects a va_list argument, i.e. is not a varargs function, plus there is no portable way to create a va_list object from scratch.

If you absolutely need to call such a function, you can use the Cygnus libffi function, which, for a certain number of microprocessors and C ABIs, allows to call arbitrary function pointers. However, I'd rather recommend to look for alternatives to gtk_binding_entry_add_signal. E.g. gtk_binding_entry_add_signall accepts a GSList*, which is a chained list of arguments, instead of being varargs. This you can call in a C module - the other one is out of reach.

Regards,
Martin


From skip at pobox.com Tue Jun 19 23:32:50 2001
From: skip at pobox.com (Skip Montanaro)
Date: Tue, 19 Jun 2001 16:32:50 -0500
Subject: [Python-Dev] Python & GCC 3.0
In-Reply-To: <20010619133718.A14814@glacier.fnational.com>
References: <15151.21150.214238.429130@cj42289-a.reston1.va.home.com> <15151.24462.400930.295658@anthem.wooz.org> <15151.27332.418303.872171@cj42289-a.reston1.va.home.com> <3B2F8413.77F40494@lemburg.com> <87vglsbfy5.fsf@cgsoftware.com> <15151.41522.200832.655534@cj42289-a.reston1.va.home.com> <20010619133718.A14814@glacier.fnational.com>
Message-ID: <15151.50434.297860.277726@beluga.mojam.com>

    Neil> Some people say you should avoid PGCC since it generates buggy
    Neil> code. I don't know if that's true or not.

If nothing else, PGCC almost certainly gets a lot less exercise than the mainstream GCC code. Given the statement in the PGCC FAQ that typical speedups are in the range of 5%:

    http://www.goof.com/pcg/pgcc-faq.html#SEC0119

it doesn't seem like it would be worth the effort to use it in any critical applications. Better to just wait for PGCC optimizations to trickle into GCC itself.
Skip


From jack at oratrix.nl Tue Jun 19 23:56:43 2001
From: jack at oratrix.nl (Jack Jansen)
Date: Tue, 19 Jun 2001 23:56:43 +0200
Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Lib/test test_mailbox.py,1.5,1.6
In-Reply-To: Message by Thomas Wouters, Tue, 19 Jun 2001 23:04:46 +0200, <20010619230446.E8098@xs4all.nl>
Message-ID: <20010619215648.B2A7CE267B@oratrix.oratrix.nl>

Recently, Thomas Wouters said:

> On Tue, Jun 19, 2001 at 01:20:07PM -0700, Jack Jansen wrote:
>
> > The test used int(time.time()) to get a random number, but this doesn't
> > work on the mac (where times are bigger than ints). Changed to
> > int(time.time()%1000000).
>
> Doesn't int(time.time()%sys.maxint) make more sense ? At least you won't be
> degrading the sequentiality of this particularly unrandom random number on
> platforms where ints really are big enough to hold times :)

I think the last sentence should be "... platforms where time before 1970 doesn't exist so they can fit it in a measly 32 bits":-)

But anyway: I haven't a clue whether the sequentiality is important, it doesn't really seem to be from a quick glance. If you want to fix it: allez votre corridor.

--
Jack Jansen             | ++++ stop the execution of Mumia Abu-Jamal ++++
Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++
www.oratrix.nl/~jack    | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm


From skip at pobox.com Wed Jun 20 00:01:13 2001
From: skip at pobox.com (Skip Montanaro)
Date: Tue, 19 Jun 2001 17:01:13 -0500
Subject: [Python-Dev] Re: example of module interface to a varargs function?
In-Reply-To: <200106192125.XAA27631@pandora.informatik.hu-berlin.de>
References: <200106192125.XAA27631@pandora.informatik.hu-berlin.de>
Message-ID: <15151.52137.623119.852524@beluga.mojam.com>

    >> The only place in the standard modules I saw that processed a truly
    >> arbitrary number of arguments is the struct_pack method of the struct
    >> module, and it doesn't use PyArg_Parse* to process them. Can someone
    >> point me to an example of marshalling arbitrary numbers of arguments
    >> then calling a varargs function?

    Martin> In a true varargs function, you cannot use PyArg_Parse*.
    Martin> Instead, you have to iterate over the argument tuple with
    Martin> PyTuple_GetItem, fetching one argument after another.

I think it would be nice if PyArg_ParseTuple and friends took a "*" format character. It would only be useful at the end of a format string, but would allow the generic argument parsing machinery to be used for those arguments that precede it. The argument it writes into would be an int, which would represent the offset of the first argument not processed by PyArg_ParseTuple. Reusing my example:

    void gtk_binding_entry_add_signal (GtkBindingSet  *binding_set,
                                       guint           keyval,
                                       guint           modifiers,
                                       const gchar    *signal_name,
                                       guint           n_args,
                                       ...)

If I had a Python module wrapper function for this it might call PyArg_ParseTuple as

    PyArg_ParseTuple(args, "iis*", &keyval, &modifiers, &signal_name, &offset);

Processing of the rest of the argument list would be the responsibility of the author and start at args[offset].

    >> (I'll worry about calling gtk_binding_entry_add_signal after I figure
    >> out how to marshal the args.)

    Martin> I'd worry about this first: In C, it is not possible to call a
    Martin> true varargs function in a portable way if the caller doesn't
    Martin> statically (i.e. in source code) know the number of
    Martin> arguments. Only the callee can be variable, not the caller.

Understood.
It turns out that the function I used as an example is actually only called in a few distinct ways. I can analyze its var-arguments fairly easily and dispatch to the appropriate call to the underlying function.

    Martin> However, I'd rather recommend looking for alternatives to
    Martin> gtk_binding_entry_add_signal. E.g. gtk_binding_entry_add_signall
    Martin> accepts a GSList*, which is a chained list of arguments, instead
    Martin> of being varargs. This you can call in a C module - the other
    Martin> one is out of reach.

Hmm... thanks, this does look like the correct solution. I failed to notice the distinction between the two functions when I first scanned the source code: the signall (two-els) version is never called outside of gtkbindings.c, the Gtk documentation in this area is, well, rather sparse, to say the least (nine comments over 1200 lines of code, the only two substantial ones of which are boilerplate at the top), and there is no reference manual documentation for any of the interesting functions. By comparison, the Python documentation looks as if Guido has employed a team of full-time tech writers for years. Way to go, Fred! Skip From nas at python.ca Wed Jun 20 00:12:49 2001 From: nas at python.ca (Neil Schemenauer) Date: Tue, 19 Jun 2001 15:12:49 -0700 Subject: [Python-Dev] OS timer and profiling Python code Message-ID: <20010619151249.A15126@glacier.fnational.com> On x86 hardware the Linux timer runs at 100 Hz by default. On modern hardware that is probably much too slow to accurately profile programs using the Python profiler. Changing the value in include/asm-i386/param.h from 100 to 1024 and recompiling the kernel made a huge difference for me. Perhaps we should include a note in the profiler documentation. I'm not sure if this affects gprof as well but I suspect it does. Neil From moshez at zadka.site.co.il Wed Jun 20 07:31:23 2001 From: moshez at zadka.site.co.il (Moshe Zadka) Date: Wed, 20 Jun 2001 08:31:23 +0300 Subject: [Python-Dev] Moshe In-Reply-To: <20010618163512.D8098@xs4all.nl> References: <20010618163512.D8098@xs4all.nl> Message-ID: On Mon, 18 Jun 2001 16:35:12 +0200, Thomas Wouters wrote: > Just FYI: Moshe has been sighted, alive and well. He's been caught up in > personal matters, apparently. He apologized and said he'd mail python-dev > with an update soonish. Yes, indeed, and soonish got sorta delayed too... Anyway, I am alive and well, and the bad guys will have to do better than 300m to get me in an explosion ;-) Anyway, I'm terribly sorry for disappearing - my personal life caught up with me and stuff. I'm now trying to catch up with everything. Thanks to whoever took 2.0.1 from where I left off and kept it going. -- "I'll be ex-DPL soon anyway so I'm |LUKE: Is Perl better than Python? looking for someplace else to grab power."|YODA: No...no... no. Quicker, -- Wichert Akkerman (on debian-private)| easier, more seductive. For public key, finger moshez at debian.org |http://www.{python,debian,gnu}.org From greg at cosc.canterbury.ac.nz Wed Jun 20 07:55:28 2001 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Wed, 20 Jun 2001 17:55:28 +1200 Subject: [Python-Dev] Suggested amendment to PEP 255 References: Message-ID: <3B303AD0.1884E173@cosc.canterbury.ac.nz> Tim Peters wrote: > > Who would this help? Seriously. There's nothing special about a generator > to a caller, except that it returns an object that implements the iterator > interface. What matters to the caller is irrelevant here.
We're talking about what matters to someone writing or reading the implementation. To those people, there is a VERY big difference between a regular function and a generator-function -- about as big as the difference between a class and a function! In fact, a generator-function is in many ways much more like a class than a function. Calling a generator-function doesn't execute any of the code in its body; instead, it creates an instance of the generator, much like calling a class creates an instance of the class. Calling them "generator classes" and "generator instances" would perhaps be more appropriate, and more suggestive of the way they actually behave. The more I think about this, the more I agree with those who say that overloading the function-definition syntax for defining generators is a bad idea. It seems to make about as much sense as saying that there shouldn't be any special syntax for defining a class -- the header of a class definition should look exactly like a function definition, and to tell the difference you have to look for some subtle clue further down. I suggest dropping the "def" altogether and using:

    generator foo(args):
        ...
        yield x
        ...

Right from the word go, this says loudly and clearly that this thing is *not* a function, it's something else. If you haven't come across generators before, you go and look in the manual to find out what it means. There you're told something like

    Executing a generator statement creates a special callable object
    called a generator. Calling a generator creates a generator-instance,
    which is an iterator object...

    [...stuff about the "yield" statement...]

I think this is going to be easier to document and lead to much less confusion than trying to explain the magic going on when you call something that looks for all the world like a function and it doesn't execute any of the code in it. Explicit is better than implicit! -- Greg Ewing, Computer Science Dept, University of Canterbury, Christchurch, New Zealand To get my email address, please visit my web page: http://www.cosc.canterbury.ac.nz/~greg From greg at cosc.canterbury.ac.nz Wed Jun 20 08:17:09 2001 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Wed, 20 Jun 2001 18:17:09 +1200 Subject: [Python-Dev] Re: PEP 255: Simple Generators References: Message-ID: <3B303FE5.735A5FDC@cosc.canterbury.ac.nz> Tim Peters wrote: > > This is like saying that functions returning integers should be declared > "defint" instead, or some such gibberish. Not the same thing. If a function returns an integer, somewhere in it or in something that it calls there is a piece of code that explicitly creates an integer. But under PEP 255, there is *nothing* anywhere in the code that you can point to and say "look, here is where the generator-iterator is created!" Instead, it happens implicitly at some point just after the generator-function is called, but before any of its code is executed. You could say that the same thing is true when you call a class object -- creation of the instance happens implicitly before __init__ is called. But there is no secret made of the fact that classes are not functions, and there is nothing in the syntax to lead you to believe that they behave like functions. In contrast, the proposed generator syntax makes generators look so nearly like functions that their actual behaviour, once you get your head around it, seems quite bizarre. I just think it's going to lead to a lot of confusion and misunderstanding, among newcomers especially. -- Greg Ewing, Computer Science Dept, University of Canterbury, Christchurch, New Zealand To get my email address, please visit my web page: http://www.cosc.canterbury.ac.nz/~greg
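Greg's "much more like a class" point is easy to demonstrate. The following toy snippet (mine, not Greg's, and using the PEP 255 def/yield spelling rather than his proposed syntax) shows that calling a generator-function executes none of its body:

    def g():
        print "body entered"    # NOT printed at call time
        yield 1

    it = g()         # no output: this only creates a generator-iterator
    print it.next()  # only now does the body run:
                     # prints "body entered", then 1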
From greg at cosc.canterbury.ac.nz Wed Jun 20 08:28:13 2001 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Wed, 20 Jun 2001 18:28:13 +1200 Subject: [Python-Dev] Re: PEP 255: Simple Generators References: <3B2EEF5A.8FF0DAFB@cosc.canterbury.ac.nz> Message-ID: <3B30427D.5A90DDE7@cosc.canterbury.ac.nz> Olaf Delgado Friedrichs wrote:
>
> If I understand correctly, this should work:
>
>    def f():
>        for i in range(5):
>            for x in g(i):
>                yield x
>
>    def g(i):
>        for j in range(10):
>            yield i,j

Yes, I realised that shortly afterwards. But I think we're going to get a lot of questions from newcomers who have tried to implicitly nest iterators and are very confused about why it doesn't work and what needs to be done to make it work. An explicit generator definition syntax would help here, I think. First of all, it would be a syntax error to use "yield" outside of a generator definition, so they would be forced to declare the inner one as a generator. Then, if they neglected to make the outer one a generator too, it would look like this:

    def f():
        for i in range(5):
            g(i)

    generator g(i):
        for j in range(10):
            yield i,j

from which it is glaringly obvious that f() is NOT a generator, and therefore can't be used as one. -- Greg Ewing, Computer Science Dept, University of Canterbury, Christchurch, New Zealand To get my email address, please visit my web page: http://www.cosc.canterbury.ac.nz/~greg From loewis at informatik.hu-berlin.de Wed Jun 20 12:27:30 2001 From: loewis at informatik.hu-berlin.de (Martin von Loewis) Date: Wed, 20 Jun 2001 12:27:30 +0200 (MEST) Subject: [Python-Dev] Re: example of module interface to a varargs function? In-Reply-To: <15151.52137.623119.852524@beluga.mojam.com> (message from Skip Montanaro on Tue, 19 Jun 2001 17:01:13 -0500) References: <200106192125.XAA27631@pandora.informatik.hu-berlin.de> <15151.52137.623119.852524@beluga.mojam.com> Message-ID: <200106201027.MAA06782@pandora.informatik.hu-berlin.de> > I think it would be nice if PyArg_ParseTuple and friends took a "*" format > character. It would only be useful at the end of a format string, but would > allow the generic argument parsing machinery to be used for those arguments > that precede it. Now I understand. Yes, that would be useful, but apparently was not required often enough so far to make somebody ask for it. Regards, Martin From aahz at rahul.net Wed Jun 20 15:00:08 2001 From: aahz at rahul.net (Aahz Maruch) Date: Wed, 20 Jun 2001 06:00:08 -0700 (PDT) Subject: [Python-Dev] Suggested amendment to PEP 255 In-Reply-To: <3B303AD0.1884E173@cosc.canterbury.ac.nz> from "Greg Ewing" at Jun 20, 2001 05:55:28 PM Message-ID: <20010620130008.7880D99C88@waltz.rahul.net> Greg Ewing wrote:
>
> I suggest dropping the "def" altogether and using:
>
>    generator foo(args):
>        ...
>        yield x
>        ...

+2 -- --- Aahz (@pobox.com) Hugs and backrubs -- I break Rule 6 <*> http://www.rahul.net/aahz/ Androgynous poly kinky vanilla queer het Pythonista I don't really mind a person having the last whine, but I do mind someone else having the last self-righteous whine.
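Returning to Olaf's nesting example above: a runnable restatement (not from the thread) that makes the delegation rule explicit. The outer function must itself be a generator, and it must re-yield the inner generator's values by hand; generators do not nest implicitly:

    def g(i):
        for j in range(3):
            yield i, j

    def f():
        for i in range(2):
            # explicit re-yield; "yield g(i)" would yield the
            # generator object itself, not its values
            for x in g(i):
                yield x

    print list(f())  # [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2)]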
From nas at python.ca Wed Jun 20 16:28:20 2001 From: nas at python.ca (Neil Schemenauer) Date: Wed, 20 Jun 2001 07:28:20 -0700 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Python ceval.c,2.250,2.251 In-Reply-To: ; from tim_one@users.sourceforge.net on Tue, Jun 19, 2001 at 11:57:34PM -0700 References: Message-ID: <20010620072820.A16584@glacier.fnational.com> Tim Peters wrote: > gen_iternext(): repair subtle refcount problem. > NeilS, please check! This came from staring at your genbug.py, but I'm > not sure it plugs all possible holes. Without this, I caught a > frameobject refcount going negative, and it was also the cause (in debug > build) of _Py_ForgetReference's attempt to forget an object with already- > NULL _ob_prev and _ob_next pointers -- although I'm still not entirely > sure how! Doesn't this cause a memory leak? f_back is INCREFed in PyFrame_New. There are other problems lurking here as well.

    def f():
        try:
            yield 1
        finally:
            print "finally"

    def h():
        g = f()
        g.next()

    while 1:
        h()

The above code leaks memory like mad, with or without your change. Also, the finally clause is never executed although it probably should be. My feeling is that the reference counting of f_back should be done by ceval and not by the frame object. The problem with the finally clause is another ball of wax. I think it's fixable though. I'll look at it closer this evening. Neil From tim.one at home.com Wed Jun 20 16:28:19 2001 From: tim.one at home.com (Tim Peters) Date: Wed, 20 Jun 2001 10:28:19 -0400 Subject: [Python-Dev] Suggested amendment to PEP 255 In-Reply-To: <3B303AD0.1884E173@cosc.canterbury.ac.nz> Message-ID: [Greg Ewing] > ... Why is this on Python-Dev? The PEP announcement specifically asked for discussion to occur on the Iterators list, and specifically asked to keep it *off* of Python-Dev. I've been playing along with people who wanted to discuss it on c.l.py instead, as finite time allows, but no way does the discussion belong here. From arigo at ulb.ac.be Wed Jun 20 16:30:49 2001 From: arigo at ulb.ac.be (Armin Rigo) Date: Wed, 20 Jun 2001 16:30:49 +0200 (MET DST) Subject: [Python-Dev] Suggested amendment to PEP 255 In-Reply-To: <3B303AD0.1884E173@cosc.canterbury.ac.nz> Message-ID: Hi, On Wed, 20 Jun 2001, Greg Ewing wrote: > I suggest dropping the "def" altogether and using:
>
>    generator foo(args):
>        ...
>        yield x
>        ...

Nice idea. We might even think about dropping the 'yield' keyword altogether and using 'return' instead (although I'm not quite sure it is a good idea; I'm just suggesting it with a personal -0.5). A bientot, Armin. From tim.one at home.com Wed Jun 20 16:41:13 2001 From: tim.one at home.com (Tim Peters) Date: Wed, 20 Jun 2001 10:41:13 -0400 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Python ceval.c,2.250,2.251 In-Reply-To: <20010620072820.A16584@glacier.fnational.com> Message-ID: [Neil Schemenauer] > Doesn't this cause a memory leak? f_back is INCREFed in > PyFrame_New. There are other problems lurking here as well. > ... Our msgs crossed in the mail. Unfortunately, I have to get off email now and probably won't get on again before this evening. Tracebacks appear to be a potential problem too ... we'll-reinvent-stackless-before-this-is-over<0.9-wink>-ly y'rs - tim
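To make the try/finally half of Neil's report concrete, here is a small sketch (mine, not Neil's exact test) of a generator abandoned while suspended inside a try block; whether and when the finally clause runs is precisely what is at issue in this thread:

    def f():
        try:
            yield 1
            yield 2
        finally:
            print "finally"

    g = f()
    print g.next()  # prints 1; f is now suspended inside the try block
    del g           # abandoned before exhaustion -- nothing forces the
                    # finally clause to run at this point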
From barry at digicool.com Wed Jun 20 18:35:49 2001 From: barry at digicool.com (Barry A. Warsaw) Date: Wed, 20 Jun 2001 12:35:49 -0400 Subject: [Python-Dev] Suggested amendment to PEP 255 References: <3B303AD0.1884E173@cosc.canterbury.ac.nz> Message-ID: <15152.53477.212348.243592@anthem.wooz.org>

    >>>>> "GE" == Greg Ewing writes:

    GE> What matters to the caller is irrelevant here. We're talking
    GE> about what matters to someone writing or reading the
    GE> implementation. To those people, there is a VERY big
    GE> difference between a regular function and a
    GE> generator-function -- about as big as the difference
    GE> between a class and a function!

    GE> In fact, a generator-function is in many ways much more
    GE> like a class than a function. Calling a generator-function
    GE> doesn't execute any of the code in its body; instead, it
    GE> creates an instance of the generator, much like calling
    GE> a class creates an instance of the class. Calling them
    GE> "generator classes" and "generator instances" would
    GE> perhaps be more appropriate, and more suggestive of the
    GE> way they actually behave.

Thanks Greg, I think you've captured perfectly my discomfort with the proposal. I'm fine with return being "special" inside a generator, along with most of the other details of the pep. But it bugs me that the semantics of calling the thing created by `def' is different depending on some statement embedded deep in the body of the code. Think about it from a teaching perspective: You're taught that def creates a function, perhaps called foo. You know that calling foo starts execution at the first line in the function block. You know you can put a print statement on the first line and it will print something out when the function is called. You know that you can set a debugger break point at foo's first line and when you call the function, the debugger will leave you on that first line of code. But all that changes with a generator! My print statement isn't executed when I call the function... how weird! Hey, the debugger doesn't even break on the line when I call the function. Okay, maybe it's some /other/ foo my program is really calling. So let's hunt around for other possible foo's that my program might be calling. Hmm, no dice there. Now I'm really confused because I haven't gotten to the chapter that says "Now that you know all about functions, forget most of that if you find a yield statement in the body of the function, because it's a special kind of function called a generator. Calling such a special function doesn't execute any code, it just instantiates a built-in object called a generator object. To get any of the generator's code to execute, you have to call the generator object's next() method." Further, I print out the type of the object returned by calling foo and I see it's a <generator object>. Okay, so now let me search foo for a return statement. Because I know about functions, and I know that the returned object isn't None, I know that the function isn't falling off the end. So there must be a return statement that explicitly returns a generator object (whatever that is). Hmm, nope, there's just a bare return sitting there. That's damn confusing. I wonder what those yield statements are doing. Well, I look those up in my book's index and I see that's described in chapter 57, which I haven't gotten to yet. Besides, those yields clearly have integers after them, so that can't be it. So how the heck do I get a generator object by calling this function??? You'll counter that the "search for yield to find out if the function is special" is a simple rule, once learned is easily remembered.
I'll counter that it's harder for me to do an Isearch in XEmacs to find out what kind of thing foo is. :) To me, it's just bad mojo to have the behavior of the thing created by `def' determined by what's embedded in the body of the program. I don't buy the defint argument, because by searching for a return statement in the function, you can find out exactly what is being returned when the function is called. Not so with a generator. My vote is for a "generator" keyword to introduce the code block of a generator. Makes perfect sense to me, and it will be a strong indication to anybody reading my code that something special is going on. And something special /is/ going on! An informal poll of PythonLabs indicates a split on this subject, perhaps setting Jeremy up as a Sandra Day O'Conner swing vote. But who said this was a democracy anyway? :) somewhat-like-my-own-country-of-origin-ly y'rs, -Barry From tim at digicool.com Wed Jun 20 18:42:00 2001 From: tim at digicool.com (Tim Peters) Date: Wed, 20 Jun 2001 12:42:00 -0400 Subject: [Python-Dev] Suggested amendment to PEP 255 In-Reply-To: <15152.53477.212348.243592@anthem.wooz.org> Message-ID: Please keep this off Python-Dev. Paul Prescod has already fwd'ed Greg's msg to the Iterators list, and let's keep it there. From fredrik at pythonware.com Wed Jun 20 18:54:22 2001 From: fredrik at pythonware.com (Fredrik Lundh) Date: Wed, 20 Jun 2001 18:54:22 +0200 Subject: [Python-Dev] Suggested amendment to PEP 255 References: <3B303AD0.1884E173@cosc.canterbury.ac.nz> <15152.53477.212348.243592@anthem.wooz.org> Message-ID: <006d01c0f9a9$a879fcd0$4ffa42d5@hagrid> barry wrote: > My vote is for a "generator" keyword to introduce the code block of a > generator. Makes perfect sense to me, and it will be a strong > indication to anybody reading my code that something special is going > on. And something special /is/ going on! agreed. +1 on generator instead of def. (and +0 on suspend instead of yield, but that's me) Cheers /F From jeremy at alum.mit.edu Wed Jun 20 19:25:05 2001 From: jeremy at alum.mit.edu (Jeremy Hylton) Date: Wed, 20 Jun 2001 13:25:05 -0400 Subject: [Python-Dev] Suggested amendment to PEP 255 In-Reply-To: Message-ID: Why can't we discuss Python development on python-dev? please-take-replies-to-python-dev-meta-ly y'rs, Jeremy -----Original Message----- From: python-dev-admin at python.org [mailto:python-dev-admin at python.org]On Behalf Of Tim Peters Sent: Wednesday, June 20, 2001 12:42 PM To: Barry A. Warsaw Cc: python-dev at python.org Subject: RE: [Python-Dev] Suggested amendment to PEP 255 Please keep this off Python-Dev. Paul Prescod has already fwd'ed Greg's msg to the Iterators list, and let's keep it there. _______________________________________________ Python-Dev mailing list Python-Dev at python.org http://mail.python.org/mailman/listinfo/python-dev From tim at digicool.com Wed Jun 20 20:28:17 2001 From: tim at digicool.com (Tim Peters) Date: Wed, 20 Jun 2001 14:28:17 -0400 Subject: [Python-Dev] Suggested amendment to PEP 255 In-Reply-To: Message-ID: [Jeremy Hylton] > Why can't we discuss Python development on python-dev? You can, but without me in this case. The arguments aren't new (they were discussed on the Iterators list before the PEP was posted), and I don't have time to repeat them on (now three) different forums. 
The PEP announcement clearly said discussion belonged on the Iterators list, specifically asked that it stay off of Python-Dev, and the PEP Discussion-To field (which I assume Barry filled in -- I did not) reads Discussion-To: python-iterators at lists.sourceforge.net If you want a coherent historic record (I do), that's where this belongs. From aahz at rahul.net Wed Jun 20 20:37:49 2001 From: aahz at rahul.net (Aahz Maruch) Date: Wed, 20 Jun 2001 11:37:49 -0700 (PDT) Subject: [Python-Dev] Suggested amendment to PEP 255 In-Reply-To: from "Jeremy Hylton" at Jun 20, 2001 01:25:05 PM Message-ID: <20010620183749.B419E99C82@waltz.rahul.net> Jeremy Hylton wrote: > > Why can't we discuss Python development on python-dev? I'm split on this issue. I understand why Tim wants to have the discussion corralled into a single place; it's also a moderate inconvenience to have to add another mailing list every time a "critical" issue comes up. I think the best compromise is to follow the rules currently in existence for the PEP process, and if one doesn't wish to subscribe to another mailing list, e-mail one's feedback to the PEP author directly and raise bloody hell if the next PEP revision doesn't include a mention of the feedback. -- --- Aahz (@pobox.com) Hugs and backrubs -- I break Rule 6 <*> http://www.rahul.net/aahz/ Androgynous poly kinky vanilla queer het Pythonista I don't really mind a person having the last whine, but I do mind someone else having the last self-righteous whine. From barry at digicool.com Wed Jun 20 21:07:00 2001 From: barry at digicool.com (Barry A. Warsaw) Date: Wed, 20 Jun 2001 15:07:00 -0400 Subject: [Python-Dev] Suggested amendment to PEP 255 References: Message-ID: <15152.62548.504923.152041@anthem.wooz.org> >>>>> "TP" == Tim Peters writes: TP> and the PEP Discussion-To field (which I assume Barry filled TP> in -- I did not) reads Not me. I believe it was in Magnus's original version of the PEP. But I do think that now that the code is in the main CVS trunk, it is appropriate to remove the Discussion-To: header and redirect comments back to python-dev. That may be difficult in practice however. -Barry From jack at oratrix.nl Wed Jun 20 23:52:16 2001 From: jack at oratrix.nl (Jack Jansen) Date: Wed, 20 Jun 2001 23:52:16 +0200 Subject: [Python-Dev] _PyTrace_init declaration Message-ID: <20010620215221.1697FE267B@oratrix.oratrix.nl> I'm getting "no prototype" warnings on _PyTrace_init, and inspection shows that this routine indeed doesn't show up in an include file. As it is used elsewhere (in sysmodule.c) shouldn't it be called PyTrace_init and have it's prototype declared somewhere? -- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | ++++ see http://www.xs4all.nl/~tank/ ++++ From tim.one at home.com Thu Jun 21 00:31:10 2001 From: tim.one at home.com (Tim Peters) Date: Wed, 20 Jun 2001 18:31:10 -0400 Subject: [Python-Dev] _PyTrace_init declaration In-Reply-To: <20010620215221.1697FE267B@oratrix.oratrix.nl> Message-ID: [Jack Jansen] > I'm getting "no prototype" warnings on _PyTrace_init, and inspection > shows that this routine indeed doesn't show up in an include file. As > it is used elsewhere (in sysmodule.c) shouldn't it be called > PyTrace_init and have it's prototype declared somewhere? It should indeed be declared in ceval.h (Fred?), but so long as it's part of the private API it should not lose the leading underscore. 
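A minimal sketch of what Tim is suggesting -- the exact header and signature are assumptions, not the actual patch:

    /* In Include/ceval.h: keep the leading underscore (private API),
       but give the function a prototype both ceval.c and sysmodule.c
       can see.  The argument list shown here is a guess, for
       illustration only. */
    extern DL_IMPORT(int) _PyTrace_init(void);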
From thomas at xs4all.net Thu Jun 21 00:29:51 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Thu, 21 Jun 2001 00:29:51 +0200 Subject: [Python-Dev] _PyTrace_init declaration In-Reply-To: <20010620215221.1697FE267B@oratrix.oratrix.nl> References: <20010620215221.1697FE267B@oratrix.oratrix.nl> Message-ID: <20010621002951.H8098@xs4all.nl> On Wed, Jun 20, 2001 at 11:52:16PM +0200, Jack Jansen wrote: > I'm getting "no prototype" warnings on _PyTrace_init, and inspection > shows that this routine indeed doesn't show up in an include file. As > it is used elsewhere (in sysmodule.c) shouldn't it be called > PyTrace_init and have it's prototype declared somewhere? No, and yes. the _Py* functions are internal, but non-static (used in other files.) They should have a prototype declared somewhere, but they shouldn't be used outside of Python itself. It shouldn't be named 'PyTrace_init' unless it is a supported part of the API. -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From greg at cosc.canterbury.ac.nz Thu Jun 21 01:39:17 2001 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Thu, 21 Jun 2001 11:39:17 +1200 (NZST) Subject: [Python-Dev] Suggested amendment to PEP 255 In-Reply-To: Message-ID: <200106202339.LAA04351@s454.cosc.canterbury.ac.nz> > The PEP announcement specifically asked for > discussion to occur on the Iterators list Sorry, I missed that - I was paying more attention to the PEP itself than what the announcement said. Going now to subscribe to the iterators list forthwith. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg at cosc.canterbury.ac.nz +--------------------------------------+ From jeremy at alum.mit.edu Thu Jun 21 01:47:28 2001 From: jeremy at alum.mit.edu (Jeremy Hylton) Date: Wed, 20 Jun 2001 19:47:28 -0400 Subject: [Python-Dev] Suggested amendment to PEP 255 In-Reply-To: <15152.53477.212348.243592@anthem.wooz.org> Message-ID: > My vote is for a "generator" keyword to introduce the code block of a > generator. Makes perfect sense to me, and it will be a strong > indication to anybody reading my code that something special is going > on. And something special /is/ going on! > > An informal poll of PythonLabs indicates a split on this subject, > perhaps setting Jeremy up as a Sandra Day O'Conner swing vote. But > who said this was a democracy anyway? :) > > somewhat-like-my-own-country-of-origin-ly y'rs, > -Barry That's a nice analogy, Ruth Barry Ginsburg; a Supreme Court, which appoints the president, seems a closer fit to Python's dictatorship than some sort of democratic process. I wasn't present for the oral arguments, but I'm sure we all know how Tim Scalia voted and that Guido van Clarence Thomas agreed without comment. I assume, then, that Anthony Kennedy Jr. joined you, although he's often a swing vote, too. Can't wait to hear the report from Nina "Michael Hudson" Totenberg. I was originally happy with the use of def. It's not much of a stretch since the def statement defines a code block that has formal parameters and creates a new scope. I certainly wouldn't be upset if Python ended up using def to define a generator. I appreciate, though, that the definition of a generator may look an awful lot like a function. I can imagine a user reading a module, missing the yield statement, and trying to use the generator as a function. 
I can't imagine this would happen often. My limited experience with CLU suggests that iterators aren't going to be huge, unwieldy blocks where it's hard to see what the ultimate control flow is. If a confused user treats a generator as a regular function, he or she certainly can't expect it to return anything useful, since all the return statements are bare returns; the expected behavior would be some side-effect on global state, which seems both unlikely and unseemly for an iterator. I'm not sure how hard it will be to explain generators to new users. I expect you would teach functions and iterations via for loop, then explain that there is a special kind of function called a generator that can be used in a for loop. It uses a yield statement instead of a return statement to return values. Not all that hard. If we use a different keyword to introduce them, you'd probably explain them much the same way: A generator is a special kind of function that can be used in a for loop and is defined with generator instead of def. As other people have mentioned, Icon doesn't use special syntax to introduce generators. We might as well look at CLU, too, where a different approach was taken. You can view the CLU Reference Manual at: http://ncstrl.mit.edu/Dienst/UI/2.0/Describe/ncstrl.mit_lcs%2fMIT%2fLCS%2fTR-225 It uses "proc" to introduce a procedure and "iter" to introduce an iterator. See page 72 for the details: http://ncstrl.mit.edu/Dienst/UI/2.0/Page/ncstrl.mit_lcs%2fMIT%2fLCS%2fTR-225/72 It's a toss-up, then, between the historical antecedents Icon and CLU. I'd tend to favor a new keyword for generators, but could be talked out of that position. Jeremy From fdrake at acm.org Thu Jun 21 01:57:57 2001 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Wed, 20 Jun 2001 19:57:57 -0400 (EDT) Subject: [Python-Dev] _PyTrace_init declaration In-Reply-To: <20010620215221.1697FE267B@oratrix.oratrix.nl> References: <20010620215221.1697FE267B@oratrix.oratrix.nl> Message-ID: <15153.14469.903865.533713@cj42289-a.reston1.va.home.com> Jack Jansen writes: > I'm getting "no prototype" warnings on _PyTrace_init, and inspection > shows that this routine indeed doesn't show up in an include file. As > it is used elsewhere (in sysmodule.c) shouldn't it be called > PyTrace_init and have it's prototype declared somewhere? No. I thought I had a prototype for it just above the usage. Anyway, I'm re-working that code this week, so you can assign this to me in the bug tracker. Thanks! -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From guido at digicool.com Thu Jun 21 16:32:40 2001 From: guido at digicool.com (Guido van Rossum) Date: Thu, 21 Jun 2001 10:32:40 -0400 Subject: [Python-Dev] PEP 255 - BDFL Pronouncement: 'def' it stays Message-ID: <200106211432.f5LEWeA03163@odiug.digicool.com> I've thought long and hard and tried to read almost all the mail on this topic, and I cannot get myself to change my mind. No argument on either side is totally convincing, so I have consulted my language designer's intuition. It tells me that the syntax proposed in the PEP is exactly right - not too hot, not too cold. But, like the Oracle at Delphi in Greek mythology, it doesn't tell me why, so I don't have a rebuttal for the arguments against the PEP syntax. The best I can come up with (apart from agreeing with the rebuttals that Tim and others have already made) is "FUD". If this had been part of the language from day one, I very much doubt it would have made Andrew Kuchling's "Python Warts" page.
So I propose that Tim and others defending 'def' save their remaining breath, and I propose that Paul and others in favor of 'gen[erator]' start diverting their energy towards thinking about how to best teach generators the PEP syntax. Tim, please add a BDFL pronouncement to the PEP to end the argument. You can also summarize the arguments on either side, for posterity -- without trying to counter them. I found one useful comment on the PEP that isn't addressed and is orthogonal to the whole discussion: try/finally. When you have a try/finally around a yield statement, it is possible that the finally clause is not executed at all when the iterator is never resumed. I find this disturbing, and am tempted to propose that yield inside try/finally be disallowed (but yield inside try/except is still allowed). Another idea might be to somehow continue the frame with an exception at this point -- but I don't have a clue what exception would be appropriate (StopIteration isn't because it goes in the other direction) and I don't know what to do if the generator catches the exception and tries to yield again (maybe the exception should be raised again?). The continued execution of the frame would be part of the destructor for the generator-iterator object, so, like a __del__ method, any unhandled exceptions wouldn't be able to propagate out of it. PS I lost my personal archive of the last 18 hours of the iter mailing list, and the web archive is down, alas, so I'm writing this from memory. I *did* read most of the messages in my archive before I accidentally deleted it, though. ;-) --Guido van Rossum (home page: http://www.python.org/~guido/) From tdickenson at devmail.geminidataloggers.co.uk Thu Jun 21 17:02:54 2001 From: tdickenson at devmail.geminidataloggers.co.uk (Toby Dickenson) Date: Thu, 21 Jun 2001 16:02:54 +0100 Subject: [Python-Dev] Re: [Python-iterators] PEP 255 - BDFL Pronouncement: 'def' it stays In-Reply-To: <200106211432.f5LEWeA03163@odiug.digicool.com> References: <200106211432.f5LEWeA03163@odiug.digicool.com> Message-ID: On Thu, 21 Jun 2001 10:32:40 -0400, Guido van Rossum wrote: > Another idea might be to somehow continue the frame with an > exception at this point -- but I don't have a clue what exception > would be appropriate (StopIteration isn't because it goes in the other > direction) I'm sure any exception is appropriate there. What about restarting the frame as if the 'yield' had been followed by a 'return'? Toby Dickenson tdickenson at geminidataloggers.com From mwh at python.net Fri Jun 22 01:20:17 2001 From: mwh at python.net (Michael Hudson) Date: Fri, 22 Jun 2001 00:20:17 +0100 (BST) Subject: [Python-Dev] python-dev summary 2001-06-07 - 2001-06-21 Message-ID: This is a summary of traffic on the python-dev mailing list between June 7 and June 21 (inclusive) 2001. It is intended to inform the wider Python community of ongoing developments. To comment, just post to python-list at python.org or comp.lang.python in the usual way. Give your posting a meaningful subject line, and if it's about a PEP, include the PEP number (e.g. Subject: PEP 201 - Lockstep iteration) All python-dev members are interested in seeing ideas discussed by the community, so don't hesitate to take a stance on a PEP if you have an opinion. This is the tenth summary written by Michael Hudson.
Summaries are archived at:

 Posting distribution (with apologies to mbm)

 Number of articles in summary: 192

    |                     [|]
    |                     [|]
 30 |                     [|]
    |                     [|]
    |                     [|]
    |                     [|]
    |                     [|]
    |                     [|] [|]
 20 |                     [|] [|]
    |                     [|] [|]                     [|]
    |                     [|] [|]                     [|] [|]
    | [|]                 [|] [|]                     [|] [|]
    | [|]                 [|] [|]                     [|] [|]
    | [|] [|]         [|] [|] [|]                     [|] [|]
 10 | [|] [|]         [|] [|] [|] [|]                 [|] [|]
    | [|] [|]         [|] [|] [|] [|]                 [|] [|]
    | [|] [|]         [|] [|] [|] [|] [|]             [|] [|]
    | [|] [|]         [|] [|] [|] [|] [|]             [|] [|]
    | [|] [|]         [|] [|] [|] [|] [|] [|]     [|] [|] [|]
    | [|] [|]     [|] [|] [|] [|] [|] [|] [|]     [|] [|] [|]
  0 +-019-014-001-003-014-039-026-013-009-004-001-005-023-021
    Thu 07| Sat 09| Mon 11| Wed 13| Fri 15| Sun 17| Tue 19|
        Fri 08  Sun 10  Tue 12  Thu 14  Sat 16  Mon 18  Wed 20

Quiet fortnight.

    * Adding .decode() method to Unicode *

Marc-Andre Lemburg asked for opinions on adding a .decode method to unicode objects: He certainly got them; the responses ranged from neutral to negative, and there was a surprising amount of hostility in the air. The problem (as ever in these matters) seems to be that Python currently uses the same type for 8-bit strings and gobs of arbitrary data. Guido came to the rescue and calmed everyone down: since when discussion has vanished again.

    * Adding Asian codecs to the core *

Marc-Andre Lemburg announced that Tamito KAJIYAMA has decided to relicense his Japanese codecs with a BSD-style license, enabling them to be included in the core: This is clearly a good thing; the only quibble is that the encodings are by their nature rather large, so they will probably go into a separate directory in CVS (probably python/dist/encodings/) and not go into the source tarball released on python.org.

    * Omit printing newline after newline *

As readers of comp.lang.python will have noticed, Guido posted: and retracted: PEP 259, a proposal for changing the behaviour of the print statement.

    * sre "improvements" *

Gustavo Niemeyer asked if anyone planned to add the "(?(1)blah)" re operators to Python: but Python is not perl and there wasn't much support for making regular expressions more baffling than they already are.

    * Generators *

In a discussion that slobbered across comp.lang.python, python-dev and the python-iterators list at sf (and belongs on the latter!) there was much talk of PEP 255, Simple Generators. Most was positive; the main dissent was from people that thought it was too hard to tell a generator from a regular function (at the source level). However Guido listened to Tim's repeated claims that this is insignificant once you've actually used generators once or twice and Pronounced "'def' it is": and noticed that there are still some issues wrt try/finally blocks. However, clever people seem to be thinking about it, so I'm sure the problem's days are numbered :-) I should also note that the gen-branch has been checked into the trunk of CVS. Woohoo! Cheers, M. From arigo at ulb.ac.be Fri Jun 22 13:00:34 2001 From: arigo at ulb.ac.be (Armin Rigo) Date: Fri, 22 Jun 2001 13:00:34 +0200 (MET DST) Subject: [Python-Dev] Python Specializing Compiler Message-ID: Hello everybody, I implemented a proof-of-concept version of a "Python compiler". It is not really a compiler. I know perfectly well that you cannot compile Python into something more efficient than a bunch of calls to PyObject_xxx.
Still, this very preliminary version runs the following function twice as fast as the python interpreter:

    def f(n):
        result = 0
        i = 0
        while i<n:

From: Jeff Epler Subject: Re: [Python-Dev] Python Specializing Compiler In-Reply-To: ; from arigo@ulb.ac.be on Fri, Jun 22, 2001 at 01:00:34PM +0200 References: Message-ID: <20010622071846.A7014@craie.housenet> On Fri, Jun 22, 2001 at 01:00:34PM +0200, Armin Rigo wrote: > Hello everybody, > > I implemented a proof-of-concept version of a "Python compiler". It is not > really a compiler. I know perfectly well that you cannot compile Python > into something more efficient than a bunch of calls to PyObject_xxx. > Still, this very preliminary version runs the following function twice as > fast as the python interpreter: I've implemented something similar, but didn't get such favorable results yet. I was concentrating more on implementing a type system and code to infer type information, and had spent less time on the code generation. (For instance, my system could determine the result type of subscript-type operations, and infer the types of lists over a loop, as in:

    l1 = [1,3.14159, "tubers"]
    l2 = [0]*3
    for j in range(3):
        l2[j] = l1[j-3]
    # Type of l2 is HeterogeneousListType([IntType, FloatType,
    # StringType])

You could make it run forever on a pathological case like

    l = []
    while 1:
        l = [l]

with the fix being to "give up" after some number of iterations, and declare the unstable object (l) as having type "ObjectType", which is always correct but overbroad.) My code is still available, but my motivation has faded somewhat and I haven't had the time to work on it recently in any case. It uses "GNU Lightning" for JIT code generation, rather than using an external compiler. (If I were to approach the problem again, I might discard the JIT code generator in favor of starting over again with the python2c compiler and adding type information) It can make judgements about sequences of calls, such as

    def f():
        return g()

when g is given the "solid" attribute, and the compilation process begins by hoisting the former global load of g into a constant load, something like

    def make_f():
        local_g = g
        def f():
            return local_g()
        return f
    f = make_f()

What are you using to generate code? How would you compare the sophistication of your type inference system to the one I've outlined above? Jeff From Greg.Wilson at baltimore.com Fri Jun 22 14:34:17 2001 From: Greg.Wilson at baltimore.com (Greg Wilson) Date: Fri, 22 Jun 2001 08:34:17 -0400 Subject: [Python-Dev] ...und zen, ze world! Message-ID: <930BBCA4CEBBD411BE6500508BB3328F2E26E8@nsamcanms1.ca.baltimore.com> From pedroni at inf.ethz.ch Fri Jun 22 14:59:40 2001 From: pedroni at inf.ethz.ch (Samuele Pedroni) Date: Fri, 22 Jun 2001 14:59:40 +0200 (MET DST) Subject: [Python-Dev] Python Specializing Compiler Message-ID: <200106221259.OAA02519@core.inf.ethz.ch> Hi. Just after reading the README, it's very intriguing and interesting (if I remember well, this resembles the customization approach of the Self VM compiler); ideally it could evolve into a loadable extension that then works together with the normal interp (unchanged up to offering some hooks*) in a transparent way for the user, emitting native code for the major platforms or just specialized bytecodes. I will give a serious look at it. regards, Samuele Pedroni. *: some possible useful hooks would be:
- minimal profiling support in order to specialize only things called often
- feedback for dynamic changing of methods, class hierarchy, ... if we want to optimize method lookup (which would make sense)
- a mixed fixed slots/dict layout for instances.
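As a rough illustration of the run-time feedback idea Armin and Samuele are discussing (profile to find hot call sites, then specialize on the types actually observed), here is a toy sketch in pure Python. It is conceptual only; none of these names come from Armin's code, and the "specialization" step is a stub:

    def make_specialized(generic, t):
        # Stand-in for emitting type-specialized code; the real work
        # would generate code free of generic PyObject_xxx calls for
        # arguments of type t.  Returning the generic function keeps
        # this sketch runnable.
        return generic

    class Specializer:
        # Dispatch to a type-specialized variant once a call site has
        # seen the same argument type often enough.
        THRESHOLD = 100

        def __init__(self, generic):
            self.generic = generic
            self.counts = {}        # type -> number of calls seen
            self.special = {}       # type -> specialized variant

        def __call__(self, x):
            t = type(x)
            self.counts[t] = self.counts.get(t, 0) + 1
            f = self.special.get(t)
            if f is None and self.counts[t] >= self.THRESHOLD:
                f = self.special[t] = make_specialized(self.generic, t)
            return (f or self.generic)(x)

    def double(x):
        return x + x

    double = Specializer(double)   # now collects type feedback per call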
From nas at python.ca Fri Jun 22 16:43:17 2001 From: nas at python.ca (Neil Schemenauer) Date: Fri, 22 Jun 2001 07:43:17 -0700 Subject: [Python-Dev] why not "return StopIteration"? Message-ID: <20010622074317.A22058@glacier.fnational.com> Is "raise StopIteration" an abuse of exceptions? Why can we not use "return StopIteration" to signal the end of an iterator? I've done a bit of hacking and the idea seems to work. One possible problem is that the StopIteration object in the builtin module could cause some confusing behavior. For example the code:

    for obj in __builtin__.__dict__.values():
        print obj

would not work as expected. This could be fixed in most cases by changing the tp_iternext protocol. Something like:

    int tp_iternext(PyObject *it, PyObject **item)

where the return value is 1, 0, or -1. IOW, StopIteration would not have to come into the protocol if the object implemented tp_iternext. Neil From guido at digicool.com Fri Jun 22 18:19:34 2001 From: guido at digicool.com (Guido van Rossum) Date: Fri, 22 Jun 2001 12:19:34 -0400 Subject: [Python-Dev] why not "return StopIteration"? Message-ID: <200106221619.f5MGJY306866@odiug.digicool.com> This is treated extensively in the discussion section of the iterators-PEP; quoting:

- It has been questioned whether an exception to signal the end of the iteration isn't too expensive. Several alternatives for the StopIteration exception have been proposed: a special value End to signal the end, a function end() to test whether the iterator is finished, even reusing the IndexError exception.

- A special value has the problem that if a sequence ever contains that special value, a loop over that sequence will end prematurely without any warning. If the experience with null-terminated C strings hasn't taught us the problems this can cause, imagine the trouble a Python introspection tool would have iterating over a list of all built-in names, assuming that the special End value was a built-in name!

- Calling an end() function would require two calls per iteration. Two calls is much more expensive than one call plus a test for an exception. Especially the time-critical for loop can test very cheaply for an exception.

- Reusing IndexError can cause confusion because it can be a genuine error, which would be masked by ending the loop prematurely.

I'm not sure why you are reopening this -- special terminating values are evil IMO. :-) --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at digicool.com Fri Jun 22 18:20:43 2001 From: guido at digicool.com (Guido van Rossum) Date: Fri, 22 Jun 2001 12:20:43 -0400 Subject: [Python-Dev] Python Specializing Compiler Message-ID: <200106221620.f5MGKib06875@odiug.digicool.com> Very cool, Armin! Did you announce this on c.l.py too? I wish I had time to look at this in more detail -- but please do go on developing it, and look at what others have tried... --Guido van Rossum (home page: http://www.python.org/~guido/) From barry at digicool.com Fri Jun 22 18:30:44 2001 From: barry at digicool.com (Barry A. Warsaw) Date: Fri, 22 Jun 2001 12:30:44 -0400 Subject: [Python-Dev] why not "return StopIteration"? References: <200106221619.f5MGJY306866@odiug.digicool.com> Message-ID: <15155.29364.416545.301534@anthem.wooz.org>

    >>>>> "GvR" == Guido van Rossum writes:

    | - Calling an end() function would require two calls per
    | iteration. Two calls is much more expensive than one call
    | plus a test for an exception. Especially the time-critical
    | for loop can test very cheaply for an exception.
Plus, if the exception is both raised and caught in C, it is never instantiated, so exception matching is a pointer compare. I know this isn't the case with user defined iterators (since Python's raise semantics is to instantiate the exception), but it helps. -Barry From guido at digicool.com Fri Jun 22 19:12:20 2001 From: guido at digicool.com (Guido van Rossum) Date: Fri, 22 Jun 2001 13:12:20 -0400 Subject: [Python-Dev] Python 2.0.1 released! Message-ID: <200106221712.f5MHCLF07192@odiug.digicool.com> I'm happy to announce Python 2.0.1 -- the final release of the first Python version in a long time whose license is fully compatible with the GPL: http://www.python.org/2.0.1/ I thank Moshe Zadka who did almost all of the work to make this a useful bugfix release, and then went incommunicado for several weeks. (I hope you're OK, Moshe!) Compared to the release candidate, we've fixed a few typos in the license, tweaked the documentation a bit, and fixed an indentation error in statcache.py; other than that, the release candidate was perfect. :-) Python 2.0 users should be able to replace their 2.0 installation with the 2.0.1 release without any ill effects; apart from the license change, we've only fixed bugs that didn't require us to make feature changes. The SRE package (regular expression matching, used by the "re" module) was brought in line with the version distributed with Python 2.1; this is stable feature-wise but much improved bug-wise. For the full scoop, see the release notes on SourceForge: http://sourceforge.net/project/shownotes.php?release_id=40616 Python 2.1 users can ignore this release, unless they have an urgent need for a GPL-compatible Python version and are willing to downgrade. Rest assured that we're planning a bugfix release there too: I expect that Python 2.1.1 will be released within a month, with the same GPL-compatible license. (Right, Thomas?) We don't intend to build RPMs for 2.0.1. If someone else is interested in doing so, we can link to them. --Guido van Rossum (home page: http://www.python.org/~guido/) From tim.one at home.com Fri Jun 22 19:21:03 2001 From: tim.one at home.com (Tim Peters) Date: Fri, 22 Jun 2001 13:21:03 -0400 Subject: [Python-Dev] why not "return StopIteration"? In-Reply-To: <20010622074317.A22058@glacier.fnational.com> Message-ID: [Neil Schemenauer] > Is "raise StopIteration" an abuse of exceptions? I only care whether it works . It certainly came as a surprise to me, though, that I'm going to need to fiddle PEP 255 to explain that return in a generator isn't really equivalent to raise StopIteration (because a return in the try-part of a try/except should not trigger the except-part if the generator is pumped again). While a minor wart, it's a wart. If this stands, I'm going to look into changing gen_iternext() to determine whether eval_frame() finished by raising StopIteration, and mark the iterator as done if so. That is, force "return" and "raise StopIteration" to act the same inside generators, and to force "raise StopIteration" inside a generator to truly *mean* "I'm done" in all cases. This would also allow to avoid the proposed special-casing of generators at the tail end of eval_frame() (yes, I'm anal <0.9 wink>: since it's a problem unique to generators, this simply should not be eval_frame's problem to solve -- if generators create the problem, generators should pay to solve it). > Why can we not use "return StopIteration" to signal the end of an > iterator? Just explained why not yesterday, and you did two sentences later . 
> ....
> This could be fixed in most cases by changing the tp_iternext
> protocol.  Something like:
>
>     int tp_iternext(PyObject *it, PyObject **item)
>
> where the return value is 1, 0, or -1.

Meaning 13, 42, and 666 respectively? That is, one for "error", one for "OK, and item is the next value", and one for "no error but no next value either -- this iterator terminated normally"? That could work. At one point during the development of the iterator PEP, Guido had some code like that in the internals, on *top* of the exception business. It was clumsy then because redundant. At the level of Python code, how would a user spell "end of iteration"? Would iterators need to return a two-tuple in all non-exception cases then, e.g. a (next_value, i_am_done_flag) pair? Or would Python-level iterators simply be unable to return StopIteration as a normal value?

> IOW, StopIteration would not have to come into the protocol if the
> object implemented tp_iternext.

All iterable objects in 2.2 implement tp_iternext, although sometimes it's a Miranda tp_iternext (i.e., one created for an object that doesn't supply its own), so that shouldn't be a worry. All in all, I'm -0 on changing the exception approach -- it's worked very well so far. From thomas at xs4all.net Fri Jun 22 20:02:59 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Fri, 22 Jun 2001 20:02:59 +0200 Subject: [Python-Dev] why not "return StopIteration"? In-Reply-To: References: Message-ID: <20010622200259.N8098@xs4all.nl> On Fri, Jun 22, 2001 at 01:21:03PM -0400, Tim Peters wrote: > If this stands, I'm going to look into > changing gen_iternext() to determine whether eval_frame() finished by > raising StopIteration, and mark the iterator as done if so. That is, force > "return" and "raise StopIteration" to act the same inside generators, and to > force "raise StopIteration" inside a generator to truly *mean* "I'm done" in > all cases. This would also allow to avoid the proposed special-casing of > generators at the tail end of eval_frame() (yes, I'm anal <0.9 wink>: since > it's a problem unique to generators, this simply should not be eval_frame's > problem to solve -- if generators create the problem, generators should pay > to solve it). I don't get this. Currently, (unless Just checked in his patch) generators work in exactly that way: the compiler compiles 'return' into 'raise StopIteration' if it encounters it inside a generator, and into a regular return otherwise. Why would you ask for the patch Just provided, and then change it back ? -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From tim.one at home.com Fri Jun 22 20:11:13 2001 From: tim.one at home.com (Tim Peters) Date: Fri, 22 Jun 2001 14:11:13 -0400 Subject: [Python-Dev] why not "return StopIteration"? In-Reply-To: <20010622200259.N8098@xs4all.nl> Message-ID: [Thomas Wouters] > I don't get this. Currently, (unless Just checked in his patch) > generators work in exactly that way: the compiler compiles 'return' > into 'raise StopIteration' if it encounters it inside a generator, > and into a regular return otherwise. Yes. The part about analyzing the return value inside gen_iternext() would be the only change from the status quo. > Why would you ask for the patch Just provided, and then change it back ? I wouldn't. I asked *you* for a patch (which I haven't yet applied, but will) in a different area, but Just's patch was his own initiative.
I hesitated on that one for reasons beyond just lack of time to get to it, and I'm still reluctant to accept it. My msg sketched an alternative to that patch. Note that Just has also (very recently) sketched another alternative, but on the Iterators list instead. just-isn't-in-need-of-defense-because-he-isn't-being-abused-ly y'rs - tim From fdrake at beowolf.digicool.com Fri Jun 22 20:31:44 2001 From: fdrake at beowolf.digicool.com (Fred Drake) Date: Fri, 22 Jun 2001 14:31:44 -0400 (EDT) Subject: [Python-Dev] [maintenance doc updates] Message-ID: <20010622183144.C6A5428927@beowolf.digicool.com> The development version of the documentation has been updated: http://python.sourceforge.net/maint-docs/ Lots of smallish updates and corrections, moved the license statements to an appendix. From paulp at ActiveState.com Fri Jun 22 20:37:01 2001 From: paulp at ActiveState.com (Paul Prescod) Date: Fri, 22 Jun 2001 11:37:01 -0700 Subject: [Python-Dev] ...und zen, ze world! References: <930BBCA4CEBBD411BE6500508BB3328F2E26E8@nsamcanms1.ca.baltimore.com> Message-ID: <3B33904D.F821FE36@ActiveState.com> > > Interesting that there's as much Perl as assembly code, > and more Fortran than Python :-). The Fortran is basically one big package: LAPACK. A bunch of the Python is 4Suite. If we got Red Hat to ship Zope (or even Python 2.1!) we'd improve our numbers quite a bit. :) -- Take a recipe. Leave a recipe. Python Cookbook! http://www.ActiveState.com/pythoncookbook From esr at thyrsus.com Fri Jun 22 20:46:11 2001 From: esr at thyrsus.com (Eric S. Raymond) Date: Fri, 22 Jun 2001 14:46:11 -0400 Subject: [Python-Dev] ...und zen, ze world! In-Reply-To: <3B33904D.F821FE36@ActiveState.com>; from paulp@ActiveState.com on Fri, Jun 22, 2001 at 11:37:01AM -0700 References: <930BBCA4CEBBD411BE6500508BB3328F2E26E8@nsamcanms1.ca.baltimore.com> <3B33904D.F821FE36@ActiveState.com> Message-ID: <20010622144611.A15388@thyrsus.com> Paul Prescod : > > Interesting that there's as much Perl as assembly code, > > and more Fortran than Python :-). > > The Fortran is basically one big package: LAPACK. A bunch of the Python > is 4Suite. If we got Red Hat to ship Zope (or even Python 2.1!) we'd > improve our numbers quite a bit. :) I'm working on it. -- Eric S. Raymond The whole of the Bill [of Rights] is a declaration of the right of the people at large or considered as individuals... It establishes some rights of the individual as unalienable and which consequently, no majority has a right to deprive them of. -- Albert Gallatin, Oct 7 1789 From fdrake at beowolf.digicool.com Fri Jun 22 20:53:37 2001 From: fdrake at beowolf.digicool.com (Fred Drake) Date: Fri, 22 Jun 2001 14:53:37 -0400 (EDT) Subject: [Python-Dev] [development doc updates] Message-ID: <20010622185337.BE51228927@beowolf.digicool.com> The development version of the documentation has been updated: http://python.sourceforge.net/devel-docs/ Lots of smallish updates and corrections, moved the license statements to an appendix. This version includes some contributed changes to the documentation for the cmath module. To make the LaTeX to HTML conversion work, I have made the resulting HTML contain entity references for the "plus/minus" and "infinity" symbols (± and ∞, respectively). These may be problematic for some browsers. Please let me know how it looks on your browser by sending an email to python-docs at python.org. Be sure to state your browser name and version, and what operating system you are using. Thanks! 
http://python.sourceforge.net/devel-docs/lib/module-cmath.html From nas at python.ca Fri Jun 22 22:13:14 2001 From: nas at python.ca (Neil Schemenauer) Date: Fri, 22 Jun 2001 13:13:14 -0700 Subject: [Python-Dev] why not "return StopIteration"? In-Reply-To: <200106221619.f5MGJY306866@odiug.digicool.com>; from guido@digicool.com on Fri, Jun 22, 2001 at 12:19:34PM -0400 References: <200106221619.f5MGJY306866@odiug.digicool.com> Message-ID: <20010622131314.A22978@glacier.fnational.com> Guido van Rossum wrote: > This is treated extensively in the discussion section of the > iterators-PEP Ah. I don't remember reading that part or seeing the discussion. Sorry I brought it up. Neil From fdrake at beowolf.digicool.com Fri Jun 22 22:52:48 2001 From: fdrake at beowolf.digicool.com (Fred Drake) Date: Fri, 22 Jun 2001 16:52:48 -0400 (EDT) Subject: [Python-Dev] [development doc updates] Message-ID: <20010622205248.6290128927@beowolf.digicool.com> The development version of the documentation has been updated: http://python.sourceforge.net/devel-docs/ Changed the revised cmath documentation to use "j" as a suffix for complex literals instead of using "i" as a prefix; this is more similar to Python. Changed the font of the suffix to match that used elsewhere in the documentation. This should be a little more readable, but does not change any potential browser compatibility issues, so I still need reports of compatibility or non-compatibility. See my prelimiary report on the topic at: http://mail.python.org/pipermail/doc-sig/2001-June/001940.html From arigo at ulb.ac.be Sat Jun 23 10:13:04 2001 From: arigo at ulb.ac.be (Armin Rigo) Date: Sat, 23 Jun 2001 10:13:04 +0200 (MET DST) Subject: [Python-Dev] Python Specializing Compiler In-Reply-To: <20010622071846.A7014@craie.housenet> Message-ID: Hello Jeff, On Fri, 22 Jun 2001, Jeff Epler wrote: > What are you using to generate code? I am generating pseudo-code, which is interpreted by a C module. (With real assembler code, it would of course be much faster, but it was just simpler for the moment.) > How would you compare the > sophistication of your type inference system to the one I've outlined > above? Yours is much more complete, but runs statically. Mine works at run-time. As explained in detail in the readme file, my plan is not to make a "compiler" in the usual sense. I actually have no type inferences; I just collect at run time what types are used at what places, and generate (and possibly modify) the generated code according to that information. (More about it later.) A bientot, Armin. From tim.one at home.com Sat Jun 23 11:17:54 2001 From: tim.one at home.com (Tim Peters) Date: Sat, 23 Jun 2001 05:17:54 -0400 Subject: [Python-Dev] PEP 255: Simple Generators, Revised Posting In-Reply-To: Message-ID: Major revision: more details about exceptions, return vs StopIteration, and interactions with try/except/finally; more Q&A; and a BDFL Pronouncement. The reference implementation appears solid and works as described here in all respects, so I expect this will be the last major revision (and so also last full posting) of this PEP. The output below is in ndiff format (see Tools/scripts/ndiff.py in your Python distribution). Just the new text can be seen in HTML form here: http://python.sf.net/peps/pep-0255.html "Feature discussions" should take place primarily on the Python Iterators list: mailto:python-iterators at lists.sourceforge.net Implementation discussions may wander in and out of Python-Dev too. 
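As an aside (not part of Tim's posting): the ndiff format used below can also be produced with the standard difflib module, which implements the same algorithm as Tools/scripts/ndiff.py. A small sketch, using the PEP's own Post-History change as the input:

    import difflib
    a = ["Post-History: 14-Jun-2001\n"]
    b = ["Post-History: 14-Jun-2001, 23-Jun-2001\n"]
    for line in difflib.ndiff(a, b):
        print line,   # emits "- ", "+ " and "? " guide lines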
PEP: 255 Title: Simple Generators - Version: $Revision: 1.3 $ ? ^ + Version: $Revision: 1.12 $ ? ^^ Author: nas at python.ca (Neil Schemenauer), tim.one at home.com (Tim Peters), magnus at hetland.org (Magnus Lie Hetland) Discussion-To: python-iterators at lists.sourceforge.net Status: Draft Type: Standards Track Requires: 234 Created: 18-May-2001 Python-Version: 2.2 - Post-History: 14-Jun-2001 + Post-History: 14-Jun-2001, 23-Jun-2001 ? +++++++++++++ Abstract This PEP introduces the concept of generators to Python, as well as a new statement used in conjunction with them, the "yield" statement. Motivation When a producer function has a hard enough job that it requires maintaining state between values produced, most programming languages offer no pleasant and efficient solution beyond adding a callback function to the producer's argument list, to be called with each value produced. For example, tokenize.py in the standard library takes this approach: the caller must pass a "tokeneater" function to tokenize(), called whenever tokenize() finds the next token. This allows tokenize to be coded in a natural way, but programs calling tokenize are typically convoluted by the need to remember between callbacks which token(s) were seen last. The tokeneater function in tabnanny.py is a good example of that, maintaining a state machine in global variables, to remember across callbacks what it has already seen and what it hopes to see next. This was difficult to get working correctly, and is still difficult for people to understand. Unfortunately, that's typical of this approach. An alternative would have been for tokenize to produce an entire parse of the Python program at once, in a large list. Then tokenize clients could be written in a natural way, using local variables and local control flow (such as loops and nested if statements) to keep track of their state. But this isn't practical: programs can be very large, so no a priori bound can be placed on the memory needed to materialize the whole parse; and some tokenize clients only want to see whether something specific appears early in the program (e.g., a future statement, or, as is done in IDLE, just the first indented statement), and then parsing the whole program first is a severe waste of time. Another alternative would be to make tokenize an iterator[1], delivering the next token whenever its .next() method is invoked. This is pleasant for the caller in the same way a large list of results would be, but without the memory and "what if I want to get out early?" drawbacks. However, this shifts the burden on tokenize to remember *its* state between .next() invocations, and the reader need only glance at tokenize.tokenize_loop() to realize what a horrid chore that would be. Or picture a recursive algorithm for producing the nodes of a general tree structure: to cast that into an iterator framework requires removing the recursion manually and maintaining the state of the traversal by hand. A fourth option is to run the producer and consumer in separate threads. This allows both to maintain their states in natural ways, and so is pleasant for both. Indeed, Demo/threads/Generator.py in the Python source distribution provides a usable synchronized-communication class for doing that in a general way. This doesn't work on platforms without threads, though, and is very slow on platforms that do (compared to what is achievable without threads). 
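To make the thread option concrete, here is a rough sketch of that approach, using the standard Queue module for the synchronized hand-off (a sketch only, with details invented for illustration; Demo/threads/Generator.py does this more generally):

        import Queue, threading

        def produce(queue):
            # the producer keeps its state in plain locals...
            a, b = 0, 1
            while 1:
                queue.put(b)     # ...and blocks here until the consumer is ready
                a, b = b, a+b

        queue = Queue.Queue(1)   # hand off one value at a time
        worker = threading.Thread(target=produce, args=(queue,))
        worker.setDaemon(1)      # don't keep the interpreter alive at exit
        worker.start()
        for i in range(6):
            print queue.get(),   # prints: 1 1 2 3 5 8

Both sides read naturally, but every value delivered pays for two thread switches, which is the cost referred to above.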
A final option is to use the Stackless[2][3] variant implementation of Python instead, which supports lightweight coroutines. This has much the same programmatic benefits as the thread option, but is much more efficient. However, Stackless is a controversial rethinking of the Python core, and it may not be possible for Jython to implement the same semantics. This PEP isn't the place to debate that, so suffice it to say here that generators provide a useful subset of Stackless functionality in a way that fits easily into the current CPython implementation, and is believed to be relatively straightforward for other Python implementations. That exhausts the current alternatives. Some other high-level languages provide pleasant solutions, notably iterators in Sather[4], which were inspired by iterators in CLU; and generators in Icon[5], a novel language where every expression "is a generator". There are differences among these, but the basic idea is the same: provide a kind of function that can return an intermediate result ("the next value") to its caller, but maintaining the function's local state so that the function can be resumed again right where it left off. A very simple example: def fib(): a, b = 0, 1 while 1: yield b a, b = b, a+b When fib() is first invoked, it sets a to 0 and b to 1, then yields b back to its caller. The caller sees 1. When fib is resumed, from its point of view the yield statement is really the same as, say, a print statement: fib continues after the yield with all local state intact. a and b then become 1 and 1, and fib loops back to the yield, yielding 1 to its invoker. And so on. From fib's point of view it's just delivering a sequence of results, as if via callback. But from its caller's point of view, the fib invocation is an iterable object that can be resumed at will. As in the thread approach, this allows both sides to be coded in the most natural ways; but unlike the thread approach, this can be done efficiently and on all platforms. Indeed, resuming a generator should be no more expensive than a function call. The same kind of approach applies to many producer/consumer functions. For example, tokenize.py could yield the next token instead of invoking a callback function with it as argument, and tokenize clients could iterate over the tokens in a natural way: a Python generator is a kind of Python iterator[1], but of an especially powerful kind. - Specification + Specification: Yield ? ++++++++ A new statement is introduced: yield_stmt: "yield" expression_list "yield" is a new keyword, so a future statement[8] is needed to phase - this in. [XXX spell this out] + this in. [XXX spell this out -- but new keywords have ripple effects + across tools too, and it's not clear this can be forced into the future + framework at all -- it's not even clear that Python's parser alone can + be taught to swing both ways based on a future stmt] The yield statement may only be used inside functions. A function that - contains a yield statement is called a generator function. + contains a yield statement is called a generator function. A generator ? +++++++++++++ + function is an ordinary function object in all respects, but has the + new CO_GENERATOR flag set in the code object's co_flags member. When a generator function is called, the actual arguments are bound to function-local formal argument names in the usual way, but no code in the body of the function is executed. 
Instead a generator-iterator object is returned; this conforms to the iterator protocol[6], so in particular can be used in for-loops in a natural way. Note that when the intent is clear from context, the unqualified name "generator" may be used to refer either to a generator-function or a generator- iterator. Each time the .next() method of a generator-iterator is invoked, the code in the body of the generator-function is executed until a yield or return statement (see below) is encountered, or until the end of the body is reached. If a yield statement is encountered, the state of the function is frozen, and the value of expression_list is returned to .next()'s caller. By "frozen" we mean that all local state is retained, including the current bindings of local variables, the instruction pointer, and the internal evaluation stack: enough information is saved so that the next time .next() is invoked, the function can proceed exactly as if the yield statement were just another external call. + Restriction: A yield statement is not allowed in the try clause of a + try/finally construct. The difficulty is that there's no guarantee + the generator will ever be resumed, hence no guarantee that the finally + block will ever get executed; that's too much a violation of finally's + purpose to bear. + + + Specification: Return + A generator function can also contain return statements of the form: "return" Note that an expression_list is not allowed on return statements in the body of a generator (although, of course, they may appear in the bodies of non-generator functions nested within the generator). - When a return statement is encountered, nothing is returned, but a + When a return statement is encountered, control proceeds as in any + function return, executing the appropriate finally clauses (if any - StopIteration exception is raised, signalling that the iterator is ? ------------ + exist). Then a StopIteration exception is raised, signalling that the ? ++++++++++++++++ - exhausted. The same is true if control flows off the end of the + iterator is exhausted. A StopIteration exception is also raised if + control flows off the end of the generator without an explicit return. + - function. Note that return means "I'm done, and have nothing ? ----------- + Note that return means "I'm done, and have nothing interesting to ? +++++++++++++++ - interesting to return", for both generator functions and non-generator ? --------------- + return", for both generator functions and non-generator functions. ? +++++++++++ - functions. + + Note that return isn't always equivalent to raising StopIteration: the + difference lies in how enclosing try/except constructs are treated. + For example,
+
+     >>> def f1():
+     ...     try:
+     ...         return
+     ...     except:
+     ...         yield 1
+     >>> print list(f1())
+     []
+
+ because, as in any function, return simply exits, but
+
+     >>> def f2():
+     ...     try:
+     ...         raise StopIteration
+     ...     except:
+     ...         yield 42
+     >>> print list(f2())
+     [42]
+
+ because StopIteration is captured by a bare "except", as is any + exception. + + + Specification: Generators and Exception Propagation + + If an unhandled exception-- including, but not limited to, + StopIteration --is raised by, or passes through, a generator function, + then the exception is passed on to the caller in the usual way, and + subsequent attempts to resume the generator function raise + StopIteration. In other words, an unhandled exception terminates a + generator's useful life.
+ + Example (not idiomatic but to illustrate the point):
+
+     >>> def f():
+     ...     return 1/0
+     >>> def g():
+     ...     yield f()  # the zero division exception propagates
+     ...     yield 42   # and we'll never get here
+     >>> k = g()
+     >>> k.next()
+     Traceback (most recent call last):
+       File "<stdin>", line 1, in ?
+       File "<stdin>", line 2, in g
+       File "<stdin>", line 2, in f
+     ZeroDivisionError: integer division or modulo by zero
+     >>> k.next()  # and the generator cannot be resumed
+     Traceback (most recent call last):
+       File "<stdin>", line 1, in ?
+     StopIteration
+     >>>
+
+ + Specification: Try/Except/Finally + + As noted earlier, yield is not allowed in the try clause of a try/ + finally construct. A consequence is that generators should allocate + critical resources with great care. There is no restriction on yield + otherwise appearing in finally clauses, except clauses, or in the try + clause of a try/except construct:
+
+     >>> def f():
+     ...     try:
+     ...         yield 1
+     ...         try:
+     ...             yield 2
+     ...             1/0
+     ...             yield 3  # never get here
+     ...         except ZeroDivisionError:
+     ...             yield 4
+     ...             yield 5
+     ...             raise
+     ...         except:
+     ...             yield 6
+     ...         yield 7     # the "raise" above stops this
+     ...     except:
+     ...         yield 8
+     ...     yield 9
+     ...     try:
+     ...         x = 12
+     ...     finally:
+     ...         yield 10
+     ...     yield 11
+     >>> print list(f())
+     [1, 2, 4, 5, 8, 9, 10, 11]
+     >>>
Example

    # A binary tree class.
    class Tree:

        def __init__(self, label, left=None, right=None):
            self.label = label
            self.left = left
            self.right = right

        def __repr__(self, level=0, indent="    "):
            s = level*indent + `self.label`
            if self.left:
                s = s + "\n" + self.left.__repr__(level+1, indent)
            if self.right:
                s = s + "\n" + self.right.__repr__(level+1, indent)
            return s

        def __iter__(self):
            return inorder(self)

    # Create a Tree from a list.
    def tree(list):
        n = len(list)
        if n == 0:
            return []
        i = n / 2
        return Tree(list[i], tree(list[:i]), tree(list[i+1:]))

    # A recursive generator that generates Tree leaves in in-order.
    def inorder(t):
        if t:
            for x in inorder(t.left):
                yield x
            yield t.label
            for x in inorder(t.right):
                yield x

    # Show it off: create a tree.
    t = tree("ABCDEFGHIJKLMNOPQRSTUVWXYZ")
    # Print the nodes of the tree in in-order.
    for x in t:
        print x,
    print

    # A non-recursive generator.
    def inorder(node):
        stack = []
        while node:
            while node.left:
                stack.append(node)
                node = node.left
            yield node.label
            while not node.right:
                try:
                    node = stack.pop()
                except IndexError:
                    return
                yield node.label
            node = node.right

    # Exercise the non-recursive generator.
    for x in t:
        print x,
    print

+ Both output blocks display: + + A B C D E F G H I J K L M N O P Q R S T U V W X Y Z + Q & A + Q. Why not a new keyword instead of reusing "def"? + + A. See BDFL Pronouncements section below. + - Q. Why a new keyword? Why not a builtin function instead? + Q. Why a new keyword for "yield"? Why not a builtin function instead? ? ++++++++++++ A. Control flow is much better expressed via keyword in Python, and yield is a control construct. It's also believed that efficient implementation in Jython requires that the compiler be able to determine potential suspension points at compile-time, and a new - keyword makes that easy. + keyword makes that easy. The CPython reference implementation also + exploits it heavily, to detect which functions *are* generator- + functions (although a new keyword in place of "def" would solve that + for CPython -- but people asking the "why a new keyword?" question + don't want any new keyword). + + Q: Then why not some other special syntax without a new keyword?
For + example, one of these instead of "yield 3": + + return 3 and continue + return and continue 3 + return generating 3 + continue return 3 + return >> , 3 + from generator return 3 + return >> 3 + return << 3 + >> 3 + << 3 + + A: Did I miss one ? Out of hundreds of messages, I counted two + suggesting such an alternative, and extracted the above from them. + It would be nice not to need a new keyword, but nicer to make yield + very clear -- I don't want to have to *deduce* that a yield is + occurring from making sense of a previously senseless sequence of + keywords or operators. Still, if this attracts enough interest, + proponents should settle on a single consensus suggestion, and Guido + will Pronounce on it. + + Q. Why allow "return" at all? Why not force termination to be spelled + "raise StopIteration"? + + A. The mechanics of StopIteration are low-level details, much like the + mechanics of IndexError in Python 2.1: the implementation needs to + do *something* well-defined under the covers, and Python exposes + these mechanisms for advanced users. That's not an argument for + forcing everyone to work at that level, though. "return" means "I'm + done" in any kind of function, and that's easy to explain and to use. + Note that "return" isn't always equivalent to "raise StopIteration" + in try/except construct, either (see the "Specification: Return" + section). + + Q. Then why not allow an expression on "return" too? + + A. Perhaps we will someday. In Icon, "return expr" means both "I'm + done", and "but I have one final useful value to return too, and + this is it". At the start, and in the absence of compelling uses + for "return expr", it's simply cleaner to use "yield" exclusively + for delivering values. + + + BDFL Pronouncements + + Issue: Introduce another new keyword (say, "gen" or "generator") in + place of "def", or otherwise alter the syntax, to distinguish + generator-functions from non-generator functions. + + Con: In practice (how you think about them), generators *are* + functions, but with the twist that they're resumable. The mechanics of + how they're set up is a comparatively minor technical issue, and + introducing a new keyword would unhelpfully overemphasize the + mechanics of how generators get started (a vital but tiny part of a + generator's life). + + Pro: In reality (how you think about them), generator-functions are + actually factory functions that produce generator-iterators as if by + magic. In this respect they're radically different from non-generator + functions, acting more like a constructor than a function, so reusing + "def" is at best confusing. A "yield" statement buried in the body is + not enough warning that the semantics are so different. + + BDFL: "def" it stays. No argument on either side is totally + convincing, so I have consulted my language designer's intuition. It + tells me that the syntax proposed in the PEP is exactly right - not too + hot, not too cold. But, like the Oracle at Delphi in Greek mythology, + it doesn't tell me why, so I don't have a rebuttal for the arguments + against the PEP syntax. The best I can come up with (apart from + agreeing with the rebuttals ... already made) is "FUD". If this had + been part of the language from day one, I very much doubt it would have + made Andrew Kuchling's "Python Warts" page. Reference Implementation - A preliminary patch against the CVS Python source is available[7]. 
+ The current implementation, in a preliminary state (no docs and no + focused tests), is part of Python's CVS development tree[9]. + Using this requires that you build Python from source. + + This was derived from an earlier patch by Neil Schemenauer[7]. Footnotes and References [1] PEP 234, http://python.sf.net/peps/pep-0234.html [2] http://www.stackless.com/ [3] PEP 219, http://python.sf.net/peps/pep-0219.html [4] "Iteration Abstraction in Sather" Murer, Omohundro, Stoutamire and Szyperski http://www.icsi.berkeley.edu/~sather/Publications/toplas.html [5] http://www.cs.arizona.edu/icon/ [6] The concept of iterators is described in PEP 234 http://python.sf.net/peps/pep-0234.html [7] http://python.ca/nas/python/generator.diff [8] http://python.sf.net/peps/pep-0236.html + [9] To experiment with this implementation, check out Python from CVS + according to the instructions at + http://sf.net/cvs/?group_id=5470 Copyright This document has been placed in the public domain. Local Variables: mode: indented-text indent-tabs-mode: nil End: From mal at lemburg.com Sat Jun 23 12:54:27 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Sat, 23 Jun 2001 12:54:27 +0200 Subject: [Python-Dev] Python Specializing Compiler References: Message-ID: <3B347563.9BBEF858@lemburg.com> Armin Rigo wrote: > > Hello Jeff, > > On Fri, 22 Jun 2001, Jeff Epler wrote: > > What are you using to generate code? > > I am generating pseudo-code, which is interpreted by a C module. (With > real assembler code, it would of course be much faster, but it was just > simpler for the moment.) > > > How would you compare the > > sophistication of your type inference system to the one I've outlined > > above? > > Yours is much more complete, but runs statically. Mine works at run-time. > As explained in detail in the readme file, my plan is not to make a > "compiler" in the usual sense. I actually have no type inference; I just > collect at run time what types are used at what places, and generate (and > possibly modify) the generated code according to that information. Sounds like you are using (re)compiling on-the-fly -- that would certainly be a very reasonable way to deal with Python's dynamic object world. It would also solve the problems of static compilers with type inference nicely. A very nice idea! -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From skip at pobox.com Sat Jun 23 16:11:03 2001 From: skip at pobox.com (Skip Montanaro) Date: Sat, 23 Jun 2001 09:11:03 -0500 Subject: [Python-Dev] Python Specializing Compiler In-Reply-To: <3B347563.9BBEF858@lemburg.com> References: <3B347563.9BBEF858@lemburg.com> Message-ID: <15156.41847.86431.594106@beluga.mojam.com> mal> Sounds like you are using (re)compiling on-the-fly ... This is what the Self compiler did, though I don't know if its granularity was as fine as I understand psyco's is from reading its README file. It's been a while since I read through that stuff, but I seem to recall it would compile functions to machine code only if they were heavily executed. It also did a lot of type inferencing. Skip From guido at digicool.com Sat Jun 23 17:58:40 2001 From: guido at digicool.com (Guido van Rossum) Date: Sat, 23 Jun 2001 11:58:40 -0400 Subject: [Python-Dev] Python Specializing Compiler In-Reply-To: Your message of "Sat, 23 Jun 2001 10:13:04 +0200."
References: Message-ID: <20010623160024.QWCF14539.femail14.sdc1.sfba.home.com@cj20424-a.reston1.va.home.com> > I am generating pseudo-code, which is interpreted by a C module. (With > real assembler code, it would of course be much faster, but it was just > simpler for the moment.) This has great promise! Once you have an interpreter for some kind of pseudo-code, it's always possible to tweak the interpreter or the pseudo-code to make it faster. And you can make another jump to machine code to make it a lot faster. There was a project (p2c or python2c) that tried to compile an entire Python program to C code that was mostly just calling the Python runtime C API functions. It also obtained about a factor of 2 in speed-up, but its problem was (if I recall) that even a small Python module translated into hundreds of thousands of lines of C -- think what that would do to locality. Since you have already obtained the same speedup with your approach, I think there's great promise. Count on sending in a paper for the next Python conference! > > How would you compare the > > sophistication of your type inference system to the one I've outlined > > above? > > Yours is much more complete, but runs statically. Mine works at run-time. > As explained in detail in the readme file, my plan is not to make a > "compiler" in the usual sense. I actually have no type inference; I just > collect at run time what types are used at what places, and generate (and > possibly modify) the generated code according to that information. Very cool: a Python JIT compiler. > (More about it later.) Can't wait! --Guido van Rossum (home page: http://www.python.org/~guido/)
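For a feel of what Armin is describing -- in toy form only; the names below are invented here, and this is not psyco's actual machinery -- a call site can be dispatched to a version specialized for the argument types actually observed at run time:

    from __future__ import nested_scopes  # needed on Python 2.1

    def make_specializing(func, compile_for):
        # compile_for(func, types) stands in for the run-time code
        # generator; a do-nothing stand-in is: lambda f, types: f
        cache = {}
        def call(*args):
            key = tuple(map(type, args))
            impl = cache.get(key)
            if impl is None:
                # first time this type signature shows up: specialize once
                impl = cache[key] = compile_for(func, key)
            return impl(*args)
        return call

The point of the sketch is only the dispatch-by-observed-types idea: specialization happens lazily, per type signature, and the cache can later be invalidated or regenerated as new types appear.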
From fdrake at beowolf.digicool.com Sun Jun 24 04:41:04 2001 From: fdrake at beowolf.digicool.com (Fred Drake) Date: Sat, 23 Jun 2001 22:41:04 -0400 (EDT) Subject: [Python-Dev] [development doc updates] Message-ID: <20010624024104.A757728927@beowolf.digicool.com> The development version of the documentation has been updated: http://python.sourceforge.net/devel-docs/ A couple of small updates, including spelling the keywords correctly in the language reference. This version brings back the hyperlinked grammar productions I played around with earlier. They still need work, but they are somewhat better than plain text. From m.favas at per.dem.csiro.au Sun Jun 24 06:25:27 2001 From: m.favas at per.dem.csiro.au (Mark Favas) Date: Sun, 24 Jun 2001 12:25:27 +0800 Subject: [Python-Dev] Problems with socketmodule (getnameinfo & getaddrinfo) Message-ID: <3B356BB7.9BE71569@per.dem.csiro.au> Socketmodule at the moment has multiple problems after the changes to handle IPv6: 1: socketmodule.c now #includes getnameinfo.c and getaddrinfo.c. These functions both use offsetof(), which is defined (on my system, at least) in stddef.h. The #include for this file is inside a #if 0 block. 2: #including this file allows the compile to complete without error. However, there is no Makefile dependency on these two files, once socketmodule.o has been built. Changes to either of the get{name,addr}info.c files will not cause socketmodule to be rebuilt. 3: The socket module still does not work, however, since it refers to an unresolved symbol inet_pton

    >>> import socket
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
      File "/home/gonzo1/mark/groucho1/mark/src/python/CVS/python/dist/src/Lib/socket.py", line 41, in ?
        from _socket import *
    ImportError: Unresolved symbol in /home/gonzo1/mark/groucho1/mark/src/python/CVS/python/dist/src/build/lib.osf1-V4.0-alpha-2.2/_socket.so: inet_pton

inet_pton is called in two places in getaddrinfo.c... there's likely to be other platforms besides Tru64 Unix that do not have this function. -- Mark Favas - m.favas at per.dem.csiro.au CSIRO, Private Bag No 5, Wembley, Western Australia 6913, AUSTRALIA From tim.one at home.com Sun Jun 24 06:48:32 2001 From: tim.one at home.com (Tim Peters) Date: Sun, 24 Jun 2001 00:48:32 -0400 Subject: [Python-Dev] Problems with socketmodule (getnameinfo & getaddrinfo) In-Reply-To: <3B356BB7.9BE71569@per.dem.csiro.au> Message-ID: [Mark Favas] > Socketmodule at the moment has multiple problems after the changes to > handle IPv6: > > 1: > socketmodule.c now #includes getnameinfo.c and getaddrinfo.c. These > functions both use offsetof(), which is defined (on my system, at least) > in stddef.h. The #include for this file is inside a #if 0 block. > > 2: > #including this file allows the compile to complete without error. > However, there is no Makefile dependency on these two files, once > socketmodule.o has been built. Changes to either of the > get{name,addr}info.c files will not cause socketmodule to be rebuilt. > > 3: > The socket module still does not work, however, since it refers to an > unresolved symbol inet_pton > >>> import socket > Traceback (most recent call last): > File "<stdin>", line 1, in ? > File > "/home/gonzo1/mark/groucho1/mark/src/python/CVS/python/dist/src/Li > b/socket.py", > line 41, in ? > from _socket import * > ImportError: Unresolved symbol in > /home/gonzo1/mark/groucho1/mark/src/python/CVS/python/dist/src/bui > ld/lib.osf1-V4.0-alpha-2.2/_socket.so: > inet_pton > > inet_pton is called in two places in getaddrinfo.c... there's likely to > be other platforms besides Tru64 Unix that do not have this function.
If it's any consolation, the Windows build is in worse shape:

    socketmodule.c
    Modules\addrinfo.h(123) : error C2632: 'long' followed by 'long' is illegal
    Modules\addrinfo.h(125) : error C2632: 'long' followed by 'long' is illegal
    Modules\addrinfo.h(129) : error C2632: 'long' followed by 'long' is illegal
    Modules\addrinfo.h(129) : error C2632: 'long' followed by 'long' is illegal
    Modules\getaddrinfo.c(109) : warning C4013: 'offsetof' undefined; assuming extern returning int
    Modules\getaddrinfo.c(109) : error C2143: syntax error : missing ')' before 'type'
    Modules\getaddrinfo.c(109) : error C2099: initializer is not a constant
    Modules\getaddrinfo.c(109) : error C2059: syntax error : ')'
    Modules\getaddrinfo.c(111) : error C2059: syntax error : ','
    Modules\getaddrinfo.c(407) : warning C4013: 'inet_pton' undefined; assuming extern returning int
    Modules\getaddrinfo.c(414) : warning C4013: 'IN_MULTICAST' undefined; assuming extern returning int
    Modules\getaddrinfo.c(414) : warning C4013: 'IN_EXPERIMENTAL' undefined; assuming extern returning int
    Modules\getaddrinfo.c(417) : error C2065: 'IN_LOOPBACKNET' : undeclared identifier
    Modules\getaddrinfo.c(417) : warning C4018: '==' : signed/unsigned mismatch
    Modules\getaddrinfo.c(531) : error C2373: 'WSAGetLastError' : redefinition; different type modifiers
    C:\VC98\INCLUDE\winsock.h(787) : see declaration of 'WSAGetLastError'
    Modules\getnameinfo.c(66) : error C2143: syntax error : missing ')' before 'type'
    Modules\getnameinfo.c(66) : error C2099: initializer is not a constant
    Modules\getnameinfo.c(66) : error C2059: syntax error : ')'
    Modules\getnameinfo.c(67) : error C2059: syntax error : ','
    Modules\getnameinfo.c(133) : warning C4013: 'snprintf' undefined; assuming extern returning int
    Modules\getnameinfo.c(153) : warning C4018: '==' : signed/unsigned mismatch
    Modules\getnameinfo.c(167) : warning C4013: 'inet_ntop' undefined; assuming extern returning int
    Modules\getnameinfo.c(168) : warning C4047: '==' : 'int ' differs in levels of indirection from 'void *'
    Modules\getnameinfo.c(200) : warning C4047: '==' : 'int ' differs in levels of indirection from 'void *'

Martin should revert the changes to socketmodule.c until this has a prayer of working. From est at hyperreal.org Sun Jun 24 07:38:06 2001 From: est at hyperreal.org (est at hyperreal.org) Date: Sat, 23 Jun 2001 22:38:06 -0700 (PDT) Subject: [Python-Dev] Python Specializing Compiler In-Reply-To: "from Armin Rigo at Jun 22, 2001 01:00:34 pm" Message-ID: <20010624053806.16277.qmail@hyperreal.org> Am I seeing things or does it actually speed up five to six times on my machine? Very exciting!

    timing specializing_call(, 2000)... result 1952145856 in 4.94 seconds
    timing specializing_call(, 2000)... result 1952145856 in 3.91 seconds
    timing f(2000,)... result 1952145856 in 25.17 seconds

I wonder to what extent this approach can be applied to method calls. My analysis of my performance-bound Python apps convinces me that those are a major bottleneck for me. About a fifth of their time seems to go into creating the bound method object (reducible by caching them on the instance)... another fifth into allocating the memory for the frame object (ameliorated by pymalloc). As for the rest, I really don't know. E
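The bound-method cost est mentions is easy to see in a sketch (the class and loops below are invented purely for illustration): each attribute access obj.visit builds a fresh bound method object, and hoisting it out of the loop is the caching idea.

    class Walker:
        def visit(self, item):
            return item

    def tally_slow(obj, items):
        n = 0
        for item in items:
            n = n + obj.visit(item)   # creates a bound method per iteration
        return n

    def tally_fast(obj, items):
        visit = obj.visit             # create the bound method once
        n = 0
        for item in items:
            n = n + visit(item)
        return n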
From martin at loewis.home.cs.tu-berlin.de Sun Jun 24 10:34:06 2001 From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Sun, 24 Jun 2001 10:34:06 +0200 Subject: [Python-Dev] gethostbyname2 Message-ID: <200106240834.f5O8Y6t01609@mira.informatik.hu-berlin.de> The IPv6 patch proposes to introduce a new socket function, socket.gethostbyname2(name, af). This becomes necessary as a name might have both an IPv4 and an IPv6 address. One alternative for providing such an API is to give socket.gethostbyname an optional second argument (the address family). itojun's rationale for calling it gethostbyname2 is that it matches the C API, as defined in RFC 2133. Which of these alternatives would you prefer? Regards, Martin
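For illustration only, the optional-argument flavor could be emulated in pure Python on top of the getaddrinfo call the patch introduces (the real change would be in C; the wrapper here is just a sketch):

    import socket

    def gethostbyname(name, af=socket.AF_INET):
        # getaddrinfo returns (family, socktype, proto, canonname,
        # sockaddr) tuples; take the first address of the requested family
        res = socket.getaddrinfo(name, None, af, socket.SOCK_STREAM)
        return res[0][4][0]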
From martin at loewis.home.cs.tu-berlin.de Sun Jun 24 10:20:31 2001 From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Sun, 24 Jun 2001 10:20:31 +0200 Subject: [Python-Dev] IPv6 and Windows Message-ID: <200106240820.f5O8KVN01435@mira.informatik.hu-berlin.de> After integrating the first chunk of IPv6 changes, Tim Peters quickly found that they won't compile on Windows - even though this was the least-critical part of the patch. Specifically, this code emulates the getaddrinfo and getnameinfo calls, which will be exposed to Python programs in a later patch. Therefore, it is essential that they are available on every system, either directly or through emulation. For Windows, one option is to use the Microsoft-provided emulation, which is available from http://msdn.microsoft.com/downloads/sdks/platform/tpipv6.asp To use this emulation, only the header files of the package are required; it is not necessary to actually install the IPv6 preview on the system. The MS emulation will try to load a few DLLs which are known to provide getaddrinfo. If none of these DLLs is found, the code in the header file falls back to an emulation. That way, the resulting socket.pyd would use the true API functions on installations that provide them, and the emulation on all other systems. The only requirement for building Python is then that the header file from the technology preview is available on the build machine (tpipv6.h). It may be that the header file is also included in recent SDK releases; I haven't checked. Is such a requirement acceptable for building the socket module on Windows? Regards, Martin From m.favas at per.dem.csiro.au Sun Jun 24 10:58:42 2001 From: m.favas at per.dem.csiro.au (Mark Favas) Date: Sun, 24 Jun 2001 16:58:42 +0800 Subject: [Python-Dev] IPv6 support Message-ID: <3B35ABC2.11F3B261@per.dem.csiro.au> IPv6 support may be nice, and even desirable. However, supporting IPv6 should not come at the cost of causing problems either in compilation or at runtime on those platforms that do not support IPv6 natively. Requiring additional preview code or non-standardly-supplied packages to be installed is fine if people _want_ to take advantage of the new IPv6 functionality, but _not_ fine if this IPv6 functionality is not required. IPv4 support should not require the installation of additional IPv6 packages. Well, that's my 2 cents' worth (even if that's only 1 cent US). -- Mark Favas - m.favas at per.dem.csiro.au CSIRO, Private Bag No 5, Wembley, Western Australia 6913, AUSTRALIA From pf at artcom-gmbh.de Sun Jun 24 11:20:10 2001 From: pf at artcom-gmbh.de (Peter Funk) Date: Sun, 24 Jun 2001 11:20:10 +0200 (MEST) Subject: foobar2(), foobar3(), ... (was Re: [Python-Dev] gethostbyname2) In-Reply-To: <200106240834.f5O8Y6t01609@mira.informatik.hu-berlin.de> from "Martin v. Loewis" at "Jun 24, 2001 10:34:06 am" Message-ID: Martin v. Loewis: > The IPv6 patch proposes to introduce a new socket function, > socket.gethostbyname2(name, af). This becomes necessary as a name > might have both an IPv4 and an IPv6 address. > > One alternative for providing such an API is to give socket.gethostbyname > an optional second argument (the address family). itojun's rationale > for calling it gethostbyname2 is that it matches the C API, as defined > in RFC 2133. > > Which of these alternatives would you prefer? IMO: The possibility to add new keyword arguments with default values is one of the major strengths Python has compared to other programming languages. Especially in the scenario where an existing mature API has to be enhanced later with added features: In such a situation I always prefer APIs with fewer functions (maybe with large lists of optional arguments) compared to APIs containing a bunch of functions or methods called 'popen2()', 'gethostbyname2()' and so on. Regards, Peter -- Peter Funk, Oldenburger Str.86, D-27777 Ganderkesee, Germany, Fax:+49 4222950260 office: +49 421 20419-0 (ArtCom GmbH, Grazer Str.8, D-28359 Bremen, Germany) From tim.one at home.com Sun Jun 24 12:51:40 2001 From: tim.one at home.com (Tim Peters) Date: Sun, 24 Jun 2001 06:51:40 -0400 Subject: [Python-Dev] IPv6 and Windows In-Reply-To: <200106240820.f5O8KVN01435@mira.informatik.hu-berlin.de> Message-ID: [Martin v. Loewis] > After integrating the first chunk of IPv6 changes, Tim Peters quickly > found that they won't compile on Windows - even though this was the > least-critical part of the patch. Mark Favas also reported failure on a Unix box -- we can't leave the CVS tree in an unusable state, and Mark in particular provides uniquely valuable feedback from his collection of Platforms from Mars. I #ifdef'ed out the offending includes on Windows for now, but that doesn't help Mark. > Specifically, this code emulates the getaddrinfo and getnameinfo > calls, which will be exposed to Python programs in a later patch. > Therefore, it is essential that they are available on every system, > either directly or through emulation. > > For Windows, one option is to use the Microsoft-provided emulation, > which is available from > > http://msdn.microsoft.com/downloads/sdks/platform/tpipv6.asp It says it's unsupported preview software for Win2K only. Since even the first *real* release of anything from MS sucks, I wouldn't touch this unless I absolutely had to. But I don't have any cycles for this project anyway, so this: > ... > Is such a requirement acceptable for building the socket module on > Windows? will have to be addressed by someone who does. Is anyone, e.g., at ActiveState keen on this? From mal at lemburg.com Sun Jun 24 13:06:19 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Sun, 24 Jun 2001 13:06:19 +0200 Subject: [Python-Dev] IPv6 and Windows References: <200106240820.f5O8KVN01435@mira.informatik.hu-berlin.de> Message-ID: <3B35C9AB.2D1D2185@lemburg.com> "Martin v. Loewis" wrote: > > After integrating the first chunk of IPv6 changes, Tim Peters quickly > found that they won't compile on Windows - even though this was the > least-critical part of the patch. > > Specifically, this code emulates the getaddrinfo and getnameinfo > calls, which will be exposed to Python programs in a later patch. > Therefore, it is essential that they are available on every system, > either directly or through emulation.
> > For Windows, one option is to use the Microsoft-provided emulation, > which is available from > > http://msdn.microsoft.com/downloads/sdks/platform/tpipv6.asp > > To use this emulation, only the header files of the package are > required; it is not necessary to actually install the IPv6 preview on > the system. The MS emulation will try to load a few DLLs which are > known to provide getaddrinfo. If none of these DLLs is found, the code in the > header file falls back to an emulation. That way, the resulting > socket.pyd would use the true API functions on installations that > provide them, and the emulation on all other systems. > > The only requirement for building Python is then that the header file > from the technology preview is available on the build machine > (tpipv6.h). It may be that the header file is also included in recent > SDK releases; I haven't checked. > > Is such a requirement acceptable for building the socket module on > Windows? Isn't this the MS SDK that has the new "Open Source" license clause in it?! If yes, I very much doubt that this approach would be feasible for Python... http://msdn.microsoft.com/downloads/eula_mit.htm Quote from a recent posting by Steven Majewski on c.l.p.: """ (c) Open Source. Recipients license rights to the Software are conditioned upon Recipient (i) not distributing such Software, in whole or in part, in conjunction with Potentially Viral Software (as defined below); and (ii) not using Potentially Viral Software (e.g. tools) to develop Recipient software which includes the Software, in whole or in part. For purposes of the foregoing, Potentially Viral Software means software which is licensed pursuant to terms that: (x) create, or purport to create, obligations for Microsoft with respect to the Software or (y) grant, or purport to grant, to any third party any rights to or immunities under Microsofts intellectual property or proprietary rights in the Software. By way of example but not limitation of the foregoing, Recipient shall not distribute the Software, in whole or in part, in conjunction with any Publicly Available Software. Publicly Available Software means each of (i) any software that contains, or is derived in any manner (in whole or in part) from, any software that is distributed as free software, open source software (e.g. Linux) or similar licensing or distribution models; and (ii) any software that requires as a condition of use, modification and/or distribution of such software that other software distributed with such software (A) be disclosed or distributed in source code form; (B) be licensed for the purpose of making derivative works; or (C) be redistributable at no charge. Publicly Available Software includes, without limitation, software licensed or distributed under any of the following licenses or distribution models, or licenses or distribution models similar to any of the following: (A) GNUs General Public License (GPL) or Lesser/Library GPL (LGPL), (B) The Artistic License (e.g., PERL), (C) the Mozilla Public License, (D) the Netscape Public License, (E) the Sun Community Source License (SCSL), and (F) the Sun Industry Standards License (SISL).
""" -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From guido at digicool.com Sun Jun 24 15:23:52 2001 From: guido at digicool.com (Guido van Rossum) Date: Sun, 24 Jun 2001 09:23:52 -0400 Subject: [Python-Dev] gethostbyname2 In-Reply-To: Your message of "Sun, 24 Jun 2001 10:34:06 +0200." <200106240834.f5O8Y6t01609@mira.informatik.hu-berlin.de> References: <200106240834.f5O8Y6t01609@mira.informatik.hu-berlin.de> Message-ID: <20010624132540.RTEI4013.femail3.sdc1.sfba.home.com@cj20424-a.reston1.va.home.com> > The IPv6 patch proposes to introduce a new socket function, > socket.gethostbyname2(name, af). This becomes necessary as a name > might have both an IPv4 and an IPv6 address. > > One alternative for providing such API is to get socket.gethostbyname > an optional second argument (the address family). itojun's rationale > for calling it gethostbyname2 is that the C API, as defined in RFC > 2133. > > Which of these alternatives would you prefer? Definitely an optional 2nd arg to gethostbyname() -- in C, you can't do tht, so they *had* to create a new function, but Python is more flexible. --Guido van Rossum (home page: http://www.python.org/~guido/) From DavidA at ActiveState.com Sun Jun 24 17:18:22 2001 From: DavidA at ActiveState.com (David Ascher) Date: Sun, 24 Jun 2001 08:18:22 -0700 Subject: [Python-Dev] IPv6 and Windows References: Message-ID: <3B3604BE.7E2F6C6E@ActiveState.com> Tim Peters wrote: > > Is such a requirement acceptable for building the socket module on > > Windows? > > will have to be addressed by someone who does. Is anyone, e.g., at > ActiveState keen on this? Not as far as I know. I haven't looked at the patches, but couldn't we have the IPv6 code be #ifdef'ed out, so that those who care about IPv6 can periodically test it while the various OS-level libraries are ramped up over the next months/years, but w/o disturbing the 'current' builds? --david From martin at loewis.home.cs.tu-berlin.de Sun Jun 24 19:00:43 2001 From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Sun, 24 Jun 2001 19:00:43 +0200 Subject: [Python-Dev] IPv6 and Windows In-Reply-To: <3B35C9AB.2D1D2185@lemburg.com> (mal@lemburg.com) References: <200106240820.f5O8KVN01435@mira.informatik.hu-berlin.de> <3B35C9AB.2D1D2185@lemburg.com> Message-ID: <200106241700.f5OH0hm01021@mira.informatik.hu-berlin.de> > > Is such a requirement acceptable for building the socket module on > > Windows? > > Isn't this the MS SDK that has the new "Open Source" license > clause in it ?! 
No, this has a different license text, which can be seen on http://msdn.microsoft.com/downloads/sdks/platform/tpipv6/download.asp On redistribution, it says # If you redistribute the SOFTWARE and/or your Source Modifications, # or any portion thereof as provided above, you agree: (i) to # distribute the SOFTWARE only in conjunction with, and as part of, # your Source Modifications which add significant functionality to the # SOFTWARE; (ii) to distribute the SOFTWARE and/or your Source # Modifications solely as part of your research and not in any # commercial product; (iii) the SOFTWARE and/or your Source # Modifications will not be distributed for profit; (iv) to retain all # branding, copyright and trademark notices included with the SOFTWARE # and include a copy of this EULA with any distribution of the # SOFTWARE, or any portion thereof; and (v) to indemnify, hold # harmless, and defend Microsoft from and against any claims or # lawsuits, including attorneys' fees, that arise or result from # the use or distribution of your Source Modifications. I don't know whether this is acceptable or not. Regards, Martin From mal at lemburg.com Sun Jun 24 20:08:13 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Sun, 24 Jun 2001 20:08:13 +0200 Subject: [Python-Dev] IPv6 and Windows References: <200106240820.f5O8KVN01435@mira.informatik.hu-berlin.de> <3B35C9AB.2D1D2185@lemburg.com> <200106241700.f5OH0hm01021@mira.informatik.hu-berlin.de> Message-ID: <3B362C8D.D3AECE3C@lemburg.com> "Martin v. Loewis" wrote: > > > > Is such a requirement acceptable for building the socket module on > > > Windows? > > > > Isn't this the MS SDK that has the new "Open Source" license > > clause in it ?! > > No, this has a different license text, which can be seen on > > http://msdn.microsoft.com/downloads/sdks/platform/tpipv6/download.asp > > On redistribution, it says > > # If you redistribute the SOFTWARE and/or your Source Modifications, > # or any portion thereof as provided above, you agree: (i) to > # distribute the SOFTWARE only in conjunction with, and as part of, > # your Source Modifications which add significant functionality to the > # SOFTWARE; (ii) to distribute the SOFTWARE and/or your Source > # Modifications solely as part of your research and not in any > # commercial product; (iii) the SOFTWARE and/or your Source > # Modifications will not be distributed for profit; (iv) to retain all > # branding, copyright and trademark notices included with the SOFTWARE > # and include a copy of this EULA with any distribution of the > # SOFTWARE, or any portion thereof; and (v) to indemnify, hold > # harmless, and defend Microsoft from and against any claims or > # lawsuits, including attorneys' fees, that arise or result from > # the use or distribution of your Source Modifications. > > I don't know whether this is acceptable or not. Most likely not: there are lots of commercial Python users out there who wouldn't like these clauses at all... we'd also lose the GPL compatibility. -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From martin at loewis.home.cs.tu-berlin.de Sun Jun 24 19:48:03 2001 From: martin at loewis.home.cs.tu-berlin.de (Martin v. 
Loewis) Date: Sun, 24 Jun 2001 19:48:03 +0200 Subject: [Python-Dev] IPv6 and Windows Message-ID: <200106241748.f5OHm3001480@mira.informatik.hu-berlin.de> > I haven't looked at the patches, but couldn't we have the IPv6 code > be #ifdef'ed out, so that those who care about IPv6 can periodically > test it while the various OS-level libraries are ramped up over the > next months/years, but w/o disturbing the 'current' builds? Not if we are going to introduce itojun's patch. In that patch, the IPv6 code *is* actually ifdef'ed out. It is getaddrinfo/getnameinfo that gives problems, which isn't IPv6 specific at all. The problem is that the library patches (httplib, ftplib, etc) do use getaddrinfo to find out how to contact a remote system, which is the right thing to do IMO. So even if the IPv6 support can be activated only if desired, getaddrinfo absolutely has to work. So the only question then is where we get an implementation of these functions if the system doesn't provide one. itojun has suggested the WIDE libraries; since they apparently don't compile on Windows, I've suggested the MS TP emulation. If the latter is not acceptable, we have to fix the WIDE implementation to work on Windows also. As for the problems Mark reported: I think they can get fixed. Regards, Martin From thomas at xs4all.net Sun Jun 24 23:35:37 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Sun, 24 Jun 2001 23:35:37 +0200 Subject: [Python-Dev] IPv6 and Windows In-Reply-To: <200106241748.f5OHm3001480@mira.informatik.hu-berlin.de> References: <200106241748.f5OHm3001480@mira.informatik.hu-berlin.de> Message-ID: <20010624233537.R8098@xs4all.nl> On Sun, Jun 24, 2001 at 07:48:03PM +0200, Martin v. Loewis wrote: > The problem is that the library patches (httplib, ftplib, etc) do use > getaddrinfo to find out how to contact a remote system, which is the > right thing to do IMO. So even if the IPv6 support can be activated > only if desired, getaddrinfo absolutely has to work. Why? Why can't those parts be 'if it exists'-ed out? We do it for SSL support. I'm only comfortable with the IPv6 patch if it's optional, or can at least be disabled. I haven't looked at the patch, but why is getaddrinfo absolutely necessary, if the code works without it now, too? > So the only question then is where we get an implementation of these > functions if the system doesn't provide one. itojun has suggested the > WIDE libraries; since they apparently don't compile on Windows, I've > suggested the MS TP emulation. If the latter is not acceptable, we > have to fix the WIDE implementation to work on Windows also. > As for the problems Mark reported: I think they can get fixed. What about the zillion other 'obscure' ports? OS/2? Palm? MacOS 9 ;) If this patch can't be zero-impact-if-necessary, I'm a firm -1 on it. But I don't think it can't, it just takes more work. -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From martin at loewis.home.cs.tu-berlin.de Sun Jun 24 23:39:45 2001 From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Sun, 24 Jun 2001 23:39:45 +0200 Subject: [Python-Dev] Problems with socketmodule (getnameinfo & getaddrinfo) Message-ID: <200106242139.f5OLdjR22560@mira.informatik.hu-berlin.de> > 1: socketmodule.c now #includes getnameinfo.c and > getaddrinfo.c. These functions both use offsetof(), which is defined > (on my system, at least) in stddef.h. That should be fixed now.
stddef.h is included in socketmodule.c; if it is not available or does not define offsetof, an additional definition is provided. > 2. [...] Changes to either of the get{name,addr}info.c files will > not cause socketmodule to be rebuilt. I don't know how to solve this one. If distutils builds the modules, makefile dependencies won't help. > 3. The socket module still does not work, however, since it refers > to an unresolved symbol inet_pton I took the simplest solution that I could think of, delegating inet_{pton,ntop} to inet_{ntoa,addr} for AF_INET, failing for all other address families (AF_INET6 in particular). I've verified that this code does the same as the builtin functions on my Linux system; please let me know whether it compiles for you. Regards, Martin
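Expressed as a Python sketch for clarity (the actual fallback is C code in socketmodule.c, and the function names here are invented), the delegation works like this:

    import socket

    def inet_pton_fallback(af, text):
        if af == socket.AF_INET:
            return socket.inet_aton(text)    # packed 32-bit address
        raise socket.error, "address family not supported"

    def inet_ntop_fallback(af, packed):
        if af == socket.AF_INET:
            return socket.inet_ntoa(packed)  # dotted-quad string
        raise socket.error, "address family not supported"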
From martin at loewis.home.cs.tu-berlin.de Sun Jun 24 23:56:48 2001 From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Sun, 24 Jun 2001 23:56:48 +0200 Subject: [Python-Dev] IPv6 and Windows In-Reply-To: <20010624233537.R8098@xs4all.nl> (message from Thomas Wouters on Sun, 24 Jun 2001 23:35:37 +0200) References: <200106241748.f5OHm3001480@mira.informatik.hu-berlin.de> <20010624233537.R8098@xs4all.nl> Message-ID: <200106242156.f5OLum222759@mira.informatik.hu-berlin.de> > Why? Why can't those parts be 'if it exists'-ed out? We do it for SSL > support. I'm only comfortable with the IPv6 patch if it's optional, or can > at least be disabled. I haven't looked at the patch, but why is getaddrinfo > absolutely necessary, if the code works without it now, too? getaddrinfo offers protocol-independent address lookup. It is necessary to use that API to support AF_INET and AF_INET6 transparently in application code. itojun proposes to change a number of standard library modules. Please have a look at the actual patch for details; the typical change will look like this (for httplib):

    diff -u -r1.35 httplib.py
    --- Lib/httplib.py	2001/06/01 16:25:38	1.35
    +++ Lib/httplib.py	2001/06/24 04:41:48
    @@ -357,10 +357,22 @@
         def connect(self):
             """Connect to the host and port specified in __init__."""
    -        self.sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    -        if self.debuglevel > 0:
    -            print "connect: (%s, %s)" % (self.host, self.port)
    -        self.sock.connect((self.host, self.port))
    +        for res in socket.getaddrinfo(self.host, self.port, 0, socket.SOCK_STREAM):
    +            af, socktype, proto, canonname, sa = res
    +            try:
    +                self.sock = socket.socket(af, socktype, proto)
    +                if self.debuglevel > 0:
    +                    print "connect: (%s, %s)" % (self.host, self.port)
    +                self.sock.connect(sa)
    +            except socket.error, msg:
    +                if self.debuglevel > 0:
    +                    print 'connect fail:', (self.host, self.port)
    +                self.sock.close()
    +                self.sock = None
    +                continue
    +            break
    +        if not self.sock:
    +            raise socket.error, msg

         def close(self):
             """Close the connection to the HTTP server."""

As you can see, the modified code can simultaneously access both IPv4 and IPv6 hosts, and will pick whatever it can connect to best. Without getaddrinfo, httplib would continue to support IPv4 hosts only. The IPv6 support itself is absolutely optional. If it is not available, getaddrinfo will never return IPv6 addresses, or propose AF_INET6 as the address family. > What about the zillion other 'obscure' ports? OS/2? Palm? MacOS 9 ;) If > this patch can't be zero-impact-if-necessary, I'm a firm -1 on it. But I > don't think it can't, it just takes more work. Depends on what zero-impact-if-necessary means to you. The patch, as it stands, can be fixed to compile on all systems that are currently supported. It cannot be fixed to be taken completely out (unless you literally do that: take it out). I don't plan to fight for it too much. Please have a look at the code itself, and try to cooperate on integrating it. Don't reject it outright without having even looked at it. If I get strong rejections from everybody, I'll just withdraw it and feel sorry for the time I've already spent with it. Regards, Martin From m.favas at per.dem.csiro.au Mon Jun 25 00:16:25 2001 From: m.favas at per.dem.csiro.au (Mark Favas) Date: Mon, 25 Jun 2001 06:16:25 +0800 Subject: [Python-Dev] Re: Problems with socketmodule (getnameinfo & getaddrinfo) References: <200106242139.f5OLdjR22560@mira.informatik.hu-berlin.de> Message-ID: <3B3666B9.335DA17E@per.dem.csiro.au> [Martin v. Loewis] > > > 1: socketmodule.c now #includes getnameinfo.c and > > getaddrinfo.c. These functions both use offsetof(), which is defined > > (on my system, at least) in stddef.h. > > That should be fixed now. stddef.h is included in socketmodule.c; if > it is not available or does not define offsetof, an additional > definition is provided. Yes, this is fine now... > > > 2. [...] Changes to either of the get{name,addr}info.c files will > > not cause socketmodule to be rebuilt. > > I don't know how to solve this one. If distutils builds the modules, > makefile dependencies won't help. > > > 3. The socket module still does not work, however, since it refers > > to an unresolved symbol inet_pton > > I took the simplest solution that I could think of, delegating > inet_{pton,ntop} to inet_{ntoa,addr} for AF_INET, failing for all > other address families (AF_INET6 in particular). I've verified that > this code does the same as the builtin functions on my Linux system; > please let me know whether it compiles for you. To get socketmodule.c to compile, I had to make a change to line 2963 so that the declaration of inet_pton matched the previous declaration on line 220 (changing char *src to const char *src). Still have problems though, due to the use of snprintf in getnameinfo.c:

    Python 2.2a0 (#444, Jun 25 2001, 05:58:17) [C] on osf1V4
    Type "copyright", "credits" or "license" for more information.
    >>> import socket
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
      File "/home/gonzo1/mark/groucho1/mark/src/python/CVS/python/dist/src/Lib/socket.py", line 41, in ?
        from _socket import *
    ImportError: Unresolved symbol in /home/gonzo1/mark/groucho1/mark/src/python/CVS/python/dist/src/build/lib.osf1-V4.0-alpha-2.2/_socket.so: snprintf

Cheers, Mark -- Mark Favas - m.favas at per.dem.csiro.au CSIRO, Private Bag No 5, Wembley, Western Australia 6913, AUSTRALIA From tim.one at home.com Mon Jun 25 07:02:30 2001 From: tim.one at home.com (Tim Peters) Date: Mon, 25 Jun 2001 01:02:30 -0400 Subject: [Python-Dev] IPv6 and Windows In-Reply-To: <3B35C9AB.2D1D2185@lemburg.com> Message-ID: >> http://msdn.microsoft.com/downloads/sdks/platform/tpipv6.asp [MAL] > Isn't this the MS SDK that has the new "Open Source" license > clause in it?! No. That was for the "Mobile Internet Toolkit"; no relation, AFAICT. > If yes, I very much doubt that this approach > would be feasible for Python... > > http://msdn.microsoft.com/downloads/eula_mit.htm From tim.one at home.com Mon Jun 25 07:14:17 2001 From: tim.one at home.com (Tim Peters) Date: Mon, 25 Jun 2001 01:14:17 -0400 Subject: [Python-Dev] IPv6 and Windows In-Reply-To: <200106241748.f5OHm3001480@mira.informatik.hu-berlin.de> Message-ID: [Martin v. Loewis] > ...
> So the only question then is where we get an implementation of these > functions if the system doesn't provide one. itojun has suggested the > WIDE libraries; since they apparently don't compile on Windows, I've > suggested the MS TP emulation. If the latter is not acceptable, we > have to fix the WIDE implementation to work on Windows also. I don't have cycles for this, but will cheerily suggest that the WIDE problems didn't appear especially deep, just "the usual" careless brand of Unix+gcc+glibc specific coding. For example, HAVE_LONG_LONG is #define'd on Windows, but, just as in Python source, you can't *use* "long long" literally, you have to use the LONG_LONG macro instead. Then Windows doesn't have an offsetof() macro, or an snprintf() either. Etc. The code is in trouble exactly where it relies on platform-specific extensions to the std C language and library. Problems with those won't be unique to Windows, either, which is a deeper concern (but already well expressed by others). It would be nice if Python could contribute portability back to WIDE. That requires worker bees, though, and lots of x-platform testing. If it turns out we can't swing that, then support for this is premature, and we should wait, e.g., for WIDE to put more effort into porting their code. From just at letterror.com Mon Jun 25 08:55:17 2001 From: just at letterror.com (Just van Rossum) Date: Mon, 25 Jun 2001 08:55:17 +0200 Subject: [Python-Dev] os.path.normcase() in site.py Message-ID: <20010625085521-r01010600-9a6226c8@213.84.27.177> I noticed that these days __file__ attributes of modules are case normalized (i.e. lowercased on case-insensitive file systems), or at least the directory part. Then I noticed that this is caused by the fact that all sys.path entries are case normalized. It turns out that site.py does this, in a function called makepath(), added by Fred about 8 months ago. I think this is wrong: we should always try to *preserve* case. I see os.path.normcase() as a tool to be able to better compare two paths, but you shouldn't *store* paths this way. I for one am irritated when I see a path that doesn't have the proper case. The intention of makepath() in site.py seems good -- it turns all paths into absolute paths -- but is the normcase really necessary? *** Please CC follow-ups to me, as I'm not on python-dev. Just
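The distinction Just is drawing, as a small sketch (the helper name is invented here): normalize case when comparing paths, but store and display them exactly as given.

    import os.path

    def samepath(a, b):
        # fold case only for the comparison...
        return (os.path.normcase(os.path.abspath(a)) ==
                os.path.normcase(os.path.abspath(b)))

    # ...but when *storing* a path, e.g. on sys.path, keep its case:
    # sys.path.append(os.path.abspath(entry))   # no normcase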
Thanks for your reports, Martin

From thomas at xs4all.net Mon Jun 25 09:20:53 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Mon, 25 Jun 2001 09:20:53 +0200 Subject: [Python-Dev] os.path.normcase() in site.py In-Reply-To: <20010625085521-r01010600-9a6226c8@213.84.27.177> References: <20010625085521-r01010600-9a6226c8@213.84.27.177> Message-ID: <20010625092053.S8098@xs4all.nl> On Mon, Jun 25, 2001 at 08:55:17AM +0200, Just van Rossum wrote: > *** Please CC follow-ups to me, as I'm not on python-dev. Is that by choice ? It seems rather... peculiar, to me, that you have checkin access but aren't on python-dev. You'll miss all those wonderful "Don't touch CVS, I'm building a release" and "Who put CVS in an unstable state?" messages. -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread!

From tim.one at home.com Mon Jun 25 09:51:00 2001 From: tim.one at home.com (Tim Peters) Date: Mon, 25 Jun 2001 03:51:00 -0400 Subject: [Python-Dev] os.path.normcase() in site.py In-Reply-To: <20010625092053.S8098@xs4all.nl> Message-ID: [Just van Rossum] > *** Please CC follow-ups to me, as I'm not on python-dev. [Thomas Wouters] > Is that by choice ? It seems rather... peculiar, to me, that you have > checkin access but aren't on python-dev. Well, I suppose it's supposed to be a secret, but Guido and Just haven't talked in 17 years come Wednesday. IIRC, something about a bottle of wine and a toilet seat, and a small but energetic ferret. Just hacked his way into SourceForge access (those skills just run in the family, I guess), but every time he hacks onto Python-Dev Guido detects it and locks him out again. It's very sad, really -- but also wonderfully Dutch. at-least-that's-the-best-explanation-i-can-think-of-ly y'rs - tim

From thomas at xs4all.net Mon Jun 25 10:35:38 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Mon, 25 Jun 2001 10:35:38 +0200 Subject: [Python-Dev] os.path.normcase() in site.py In-Reply-To: References: Message-ID: <20010625103538.T8098@xs4all.nl> On Mon, Jun 25, 2001 at 03:51:00AM -0400, Tim Peters wrote: [ Tim explains about the century-old, horrid blood feud that cost the lives of many an innocent ferret, not to mention bottles of wine, caused by Just's future attempts to join python-dev -- damn that timemachine ] Okay... how about someone takes Guido out for dinner and feeds him way too many bottles of wine and ferrets to show him such things do not necessarily lead to blood feuds ? Maybe take along some psychotropic drugs and a halfway decent hypnotist for safety's measure. Meanwhile Barry subscribes Just to python-dev and you or someone else with the pickpocket skills to get at the keys for the time machine (come on, fess up, you all practiced) make sure Guido can't get at it, lest he try and make up with Just in the past in his 'suggestable' state... Better change the Mailman admin password too, just to be on the safe side. Or if that has no chance of a prayer in hell of working, I can give Just a secret xs4all.nl address (since he has an XS4ALL account nowadays, that shouldn't be a problem) and we just never tell Guido that py-dev at xs4all.nl is really Just ;) > It's very sad, really -- but also wonderfully Dutch. No, it would only be wonderfully Dutch if either brother was German or Belgian in some way, or of royal blood and married to the wrong type of christian sect (Protestant or Catholic -- I keep forgetting which is which.) -- Thomas Wouters Hi! I'm a .signature virus!
copy me into your .signature file to help me spread!

From tim.one at home.com Mon Jun 25 11:05:23 2001 From: tim.one at home.com (Tim Peters) Date: Mon, 25 Jun 2001 05:05:23 -0400 Subject: [Python-Dev] RE: [Python-iterators] Death by Leakage In-Reply-To: Message-ID: Here's a simpler leaker, amounting to an insanely convoluted way to generate the ints 1, 2, 3, ...:

    DO_NOT_LEAK = 1

    class LazyList:
        def __init__(self, g):
            self.sofar = []
            self.fetch = g.next

        def __getitem__(self, i):
            sofar, fetch = self.sofar, self.fetch
            while i >= len(sofar):
                sofar.append(fetch())
            return sofar[i]

        def clear(self):
            self.__dict__.clear()

    def plus1(g):
        for i in g:
            yield i + 1

    def genm23():
        yield 1
        for i in plus1(m23):
            yield i

    for i in range(10000):
        m23 = LazyList(genm23())
        [m23[i] for i in range(50)]
        if DO_NOT_LEAK:
            m23.clear()

Neil, it would help if genobjects had a memberlist so that the struct members were discoverable from Python code; that would also let me add appropriate methods to Cyclops.py to find cycles automatically. Anyway, m23 is a LazyList instance, where m23.fetch is genm23().next, i.e. m23.fetch is a bound method of the genm23() generator-iterator. So the frame for genm23 is reachable from m23.__dict__. That frame contains an anonymous (it's living in the frame's valuestack) generator-iterator thingie corresponding to the plus1(m23) call. *That* generator's frame in turn has m23 in its locals (m23 was an argument to plus1), and another iterator method referencing m23 in its valuestack (due to the "for i in g"). But m23 is the LazyList instance we started with, so there's a cycle, and clearing m23.__dict__ breaks it. gc doesn't chase generators or frames, so it can't clean this stuff up if we don't clear the dict. So this appears hopeless unless gc adds both generators and frames to its repertoire. OTOH, it's got to be rare -- maybe . Worth it?

From loewis at informatik.hu-berlin.de Mon Jun 25 11:43:33 2001 From: loewis at informatik.hu-berlin.de (Martin von Loewis) Date: Mon, 25 Jun 2001 11:43:33 +0200 (MEST) Subject: [Python-Dev] make static Message-ID: <200106250943.LAA24576@pandora.informatik.hu-berlin.de> There is a bug report on SF that 'make static' fails for a Makefile.pre.in extension, see http://sourceforge.net/tracker/index.php?func=detail&aid=435446&group_id=5470&atid=105470 Is that process still supported? Unless I'm mistaken, this is complicated by the fact that Makefile.pre.in packages use the Makefile.pre.in that comes with the package, not the one that comes with the Python installation. Any insights welcome, Martin

From jack at oratrix.nl Mon Jun 25 12:18:40 2001 From: jack at oratrix.nl (Jack Jansen) Date: Mon, 25 Jun 2001 12:18:40 +0200 Subject: [Python-Dev] Re: Problems with socketmodule (getnameinfo & getaddrinfo) In-Reply-To: Message by Mark Favas , Mon, 25 Jun 2001 06:16:25 +0800 , <3B3666B9.335DA17E@per.dem.csiro.au> Message-ID: <20010625101842.B6BC6303182@snelboot.oratrix.nl> I'm having a lot of problems with the new getaddrinfo stuff: no prototypes used in various routines, missing consts in routine declarations and then passing const strings to it, all routines seem to be globals (and with pretty dangerous names) even though they all look pretty static to me, etc. Could whoever put this in do a round of quality control on it, please?
-- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From jack at oratrix.nl Mon Jun 25 12:28:08 2001 From: jack at oratrix.nl (Jack Jansen) Date: Mon, 25 Jun 2001 12:28:08 +0200 Subject: [Python-Dev] os.path.normcase() in site.py In-Reply-To: Message by Just van Rossum , Mon, 25 Jun 2001 08:55:17 +0200 , <20010625085521-r01010600-9a6226c8@213.84.27.177> Message-ID: <20010625102809.42357303182@snelboot.oratrix.nl> > I noticed that these days __file__ attributes of modules are case normalized > (ie. lowercased on case insensitive file systems), or at least the directory > part. Then I noticed that this is caused by the fact that all sys.path entries > are case normalized. It turns out that site.py does this, in a function called > makepath(), added by Fred about 8 months ago. > > I think this is wrong: we should always try to *preserve* case. There is an added problem with the makepath() stuff that I hadn't reported here yet: it has broken MacPython on some non-western machines. Specifically I've had reports of people running a Japanese MacOS that things will break if they run Python from a pathname that has any non-7-bit-ascii characters in the name. Apparently normcase normalizes more than just ascii upper/lowercase letters. And aside from that I fully agree with Just: seeing a stacktrace with all lowercase filenames is _very_ disconcerting. I would disable the case-normalization for MacPython, except that I don't know whether it actually has a function. With MacPython's way of finding the initial sys.path contents we don't have the Windows-Python problem that we add the same directory 5 times (once in uppercase, once in lowercase, once in mixed case, once in mixed-case with / for \, etc:-), so if this is what it's trying to solve we can take it out easily. -- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From fredrik at pythonware.com Mon Jun 25 14:12:23 2001 From: fredrik at pythonware.com (Fredrik Lundh) Date: Mon, 25 Jun 2001 14:12:23 +0200 Subject: [Python-Dev] IPv6 and Windows References: <200106241748.f5OHm3001480@mira.informatik.hu-berlin.de> <20010624233537.R8098@xs4all.nl> <200106242156.f5OLum222759@mira.informatik.hu-berlin.de> Message-ID: <006101c0fd70$17a6b660$0900a8c0@spiff> martin wrote: > getaddrinfo offers protocol-independent address lookup. It is > necessary to use that API to support AF_INET and AF_INET6 > transparently in application code. itojun proposes to change a number > of standard library modules. 
Please have a look at the actual patch > for details; the typical change will look like this (for httplib)

> diff -u -r1.35 httplib.py
> --- Lib/httplib.py 2001/06/01 16:25:38 1.35
> +++ Lib/httplib.py 2001/06/24 04:41:48
> @@ -357,10 +357,22 @@
>
>      def connect(self):
>          """Connect to the host and port specified in __init__."""
> -        self.sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
> -        if self.debuglevel > 0:
> -            print "connect: (%s, %s)" % (self.host, self.port)
> -        self.sock.connect((self.host, self.port))
> +        for res in socket.getaddrinfo(self.host, self.port, 0, socket.SOCK_STREAM):
> +            af, socktype, proto, canonname, sa = res
> +            try:
> +                self.sock = socket.socket(af, socktype, proto)
> +                if self.debuglevel > 0:
> +                    print "connect: (%s, %s)" % (self.host, self.port)
> +                self.sock.connect(sa)
> +            except socket.error, msg:
> +                if self.debuglevel > 0:
> +                    print 'connect fail:', (self.host, self.port)
> +                self.sock.close()
> +                self.sock = None
> +                continue
> +            break
> +        if not self.sock:
> +            raise socket.error, msg

instead of adding code like that to every single module, maybe we should add a convenience function to the socket module? (and make that function smart enough to work also if getaddrinfo isn't supported by the native platform...)

From guido at digicool.com Mon Jun 25 15:40:10 2001 From: guido at digicool.com (Guido van Rossum) Date: Mon, 25 Jun 2001 09:40:10 -0400 Subject: [Python-Dev] make static In-Reply-To: Your message of "Mon, 25 Jun 2001 11:43:33 +0200." <200106250943.LAA24576@pandora.informatik.hu-berlin.de> References: <200106250943.LAA24576@pandora.informatik.hu-berlin.de> Message-ID: <200106251340.f5PDeAO07244@odiug.digicool.com> > There is a bug report on SF that 'make static' fails for a > Makefile.pre.in extension, see > > http://sourceforge.net/tracker/index.php?func=detail&aid=435446&group_id=5470&atid=105470 > > Is that process still supported? Unless I'm mistaken, this is > complicated by the fact that Makefile.pre.in packages use the > Makefile.pre.in that comes with the package, not the one that comes > with the Python installation. > > Any insights welcome, > > Martin As long as it works, it works. I don't think there's a reason to spend more than absolutely minimal time trying to keep it working though -- we're trying to encourage everybody to migrate towards distutils. So (without having seen the SF report) I'd say "tough luck". --Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at digicool.com Mon Jun 25 15:40:47 2001 From: guido at digicool.com (Guido van Rossum) Date: Mon, 25 Jun 2001 09:40:47 -0400 Subject: [Python-Dev] os.path.normcase() in site.py In-Reply-To: Your message of "Mon, 25 Jun 2001 10:35:38 +0200." <20010625103538.T8098@xs4all.nl> References: <20010625103538.T8098@xs4all.nl> Message-ID: <200106251340.f5PDele07256@odiug.digicool.com> No need to get me drunk. Barry & I decided to change this policy weeks ago, but (in order to avoid a flurry of subscription requests from functional-language proponents) we decided to keep the policy change a secret. :-) Just can subscribe safely now. --Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at digicool.com Mon Jun 25 15:40:06 2001 From: guido at digicool.com (Guido van Rossum) Date: Mon, 25 Jun 2001 09:40:06 -0400 Subject: [Python-Dev] os.path.normcase() in site.py In-Reply-To: Your message of "Mon, 25 Jun 2001 12:28:08 +0200."
<20010625102809.42357303182@snelboot.oratrix.nl> References: <20010625102809.42357303182@snelboot.oratrix.nl> Message-ID: <200106251340.f5PDe6e07238@odiug.digicool.com> > > I noticed that these days __file__ attributes of modules are case > > normalized (ie. lowercased on case insensitive file systems), or > > at least the directory part. Then I noticed that this is caused by > > the fact that all sys.path entries are case normalized. It turns > > out that site.py does this, in a function called makepath(), added > > by Fred about 8 months ago. > > > > I think this is wrong: we should always try to *preserve* case. > > There is an added problem with the makepath() stuff that I hadn't > reported here yet: it has broken MacPython on some non-western > machines. Specifically I've had reports of people running a Japanese > MacOS that things will break if they run Python from a pathname that > has any non-7-bit-ascii characters in the name. Apparently normcase > normalizes more than just ascii upper/lowercase letters. > > And aside from that I fully agree with Just: seeing a stacktrace > with all lowercase filenames is _very_ disconcerting. > > I would disable the case-normalization for MacPython, except that I > don't know whether it actually has a function. With MacPython's way > of finding the initial sys.path contents we don't have the > Windows-Python problem that we add the same directory 5 times (once > in uppercase, once in lowercase, once in mixed case, once in > mixed-case with / for \, etc:-), so if this is what it's trying to > solve we can take it out easily. I can't think of any function besides the attempt to avoid duplicates. I think that even on Windows, retaining case makes sense. I think that there's a way to avoid duplicates without case-folding everything. (E.g. use a case-folding comparison instead.) I wonder if maybe path entries should be normpath'd though? I'll leave it to Fred, Jack or Just to fix this. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at digicool.com Mon Jun 25 15:41:46 2001 From: guido at digicool.com (Guido van Rossum) Date: Mon, 25 Jun 2001 09:41:46 -0400 Subject: [Python-Dev] IPv6 and Windows In-Reply-To: Your message of "Sun, 24 Jun 2001 19:48:03 +0200." <200106241748.f5OHm3001480@mira.informatik.hu-berlin.de> References: <200106241748.f5OHm3001480@mira.informatik.hu-berlin.de> Message-ID: <200106251341.f5PDfkg07283@odiug.digicool.com> > The problem is that the library patches (httplib, ftplib, etc) do use > getaddrinfo to find out how to contact a remote system, which is the > right thing to do IMO. So even if the IPv6 support can be activated > only if desired, getaddrinfo absolutely has to work. Yes, but in an IPv4-only environment it would be super trivial to implement, right? --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at digicool.com Mon Jun 25 15:42:18 2001 From: guido at digicool.com (Guido van Rossum) Date: Mon, 25 Jun 2001 09:42:18 -0400 Subject: [Python-Dev] IPv6 and Windows In-Reply-To: Your message of "Sun, 24 Jun 2001 20:08:13 +0200." 
<3B362C8D.D3AECE3C@lemburg.com> References: <200106240820.f5O8KVN01435@mira.informatik.hu-berlin.de> <3B35C9AB.2D1D2185@lemburg.com> <200106241700.f5OH0hm01021@mira.informatik.hu-berlin.de> <3B362C8D.D3AECE3C@lemburg.com> Message-ID: <200106251342.f5PDgI107298@odiug.digicool.com>

> > # If you redistribute the SOFTWARE and/or your Source Modifications,
> > # or any portion thereof as provided above, you agree: (i) to
> > # distribute the SOFTWARE only in conjunction with, and as part of,
> > # your Source Modifications which add significant functionality to the
> > # SOFTWARE; (ii) to distribute the SOFTWARE and/or your Source
> > # Modifications solely as part of your research and not in any
> > # commercial product; (iii) the SOFTWARE and/or your Source
> > # Modifications will not be distributed for profit; (iv) to retain all
> > # branding, copyright and trademark notices included with the SOFTWARE
> > # and include a copy of this EULA with any distribution of the
> > # SOFTWARE, or any portion thereof; and (v) to indemnify, hold
> > # harmless, and defend Microsoft from and against any claims or
> > # lawsuits, including attorneys' fees, that arise or result from
> > # the use or distribution of your Source Modifications.
> >
> > I don't know whether this is acceptable or not.
>
> Most likely not: there are lots of commercial Python users out there
> who wouldn't like these clauses at all... we'd also lose the GPL
> compatibility.

Don't even *think* about using code with that license. --Guido van Rossum (home page: http://www.python.org/~guido/)
From skip at pobox.com Mon Jun 25 15:50:31 2001 From: skip at pobox.com (Skip Montanaro) Date: Mon, 25 Jun 2001 08:50:31 -0500 Subject: [Python-Dev] xrange vs generators Message-ID: <15159.16807.480121.637386@beluga.mojam.com> With generators in the language, should xrange be deprecated? Skip

From just at letterror.com Mon Jun 25 16:05:43 2001 From: just at letterror.com (Just van Rossum) Date: Mon, 25 Jun 2001 16:05:43 +0200 Subject: [Python-Dev] os.path.normcase() in site.py In-Reply-To: <200106251343.f5PDh4907304@odiug.digicool.com> Message-ID: <20010625160545-r01010600-e232a14e@213.84.27.177> Guido van Rossum wrote: > I can't think of any function besides the attempt to avoid duplicates. > > I think that even on Windows, retaining case makes sense. > > I think that there's a way to avoid duplicates without case-folding > everything. (E.g. use a case-folding comparison instead.) > > I wonder if maybe path entries should be normpath'd though? They are already, they already go through abspath(), which calls normpath(). > I'll leave it to Fred, Jack or Just to fix this. If it were up to me, I'd simply remove the normcase() call from makepath(). Just

From arigo at ulb.ac.be Mon Jun 25 15:08:52 2001 From: arigo at ulb.ac.be (Armin Rigo) Date: Mon, 25 Jun 2001 15:08:52 +0200 Subject: [Python-Dev] Python Specializing Compiler In-Reply-To: <200106221259.OAA02519@core.inf.ethz.ch> Message-ID: <4.3.1.0.20010625134824.00abde60@127.0.0.1> Hello everybody, A note about what I have in mind about Psyco... Type-sets are independent from memory representation. In other words, just because two variables can take the same set of values does not mean the data is necessarily encoded in the same way in memory. In particular, I believe we won't need to change the way the current Python interpreter encodes data. For example, instances currently have a dictionary of attributes and no "fixed slots", but this is not a problem for Psyco, which can encode instances in better ways (e.g.
as a C struct) as long as it is only accessed by Psyco-compiled Python code and no "legacy" code. This approach also allows Psyco to completely remove the overhead of creating bound method objects and frame objects; both are generally temporary, and so during their whole lifetime they can be represented much more efficiently in memory. For frame objects it should be clear (we probably need no frame at all as long as no exception exits the current procedure, and even in this case it could be optimized). For method objects we use "memory sharing", a technique already applied in the current Psyco. More precisely, if some (immutable) data is found at some memory location (or machine register) and Python code says it should be duplicated, we need not duplicate it at all; we can just consider that the copy is at the same location as the original. For method objects it means the following: suppose you have an instance "xyz" and query its "foo()" method. Suppose that you can (at some time) be sure that, because of the class of "xyz", "xyz.foo" will always be the Python function "f". Then the method object's representation can be simplified: all it needs to store in memory is a pointer to "xyz", because "f" is a constant part. Now a single pointer to the "xyz" instance is exactly the same memory format as the original "xyz" variable, so that this particular representation of a bound method object can share the original "xyz" pointer. No actual machine code is produced; Psyco simply notes that both "xyz" and "xyz.foo" are represented at the same location, although "xyz" represents an instance with the given pointer, and "xyz.foo" represents the "f" function with its first argument bound to the given pointer. According to est at hyperreal.org, method and frame objects each represent 20% of the execution time... (Est, on which kind of machine did you get Psyco to run the sample code 5 times faster !? It's only 2 times faster on a modern Pentium...) A bientôt, Armin.

From arigo at ulb.ac.be Mon Jun 25 15:45:20 2001 From: arigo at ulb.ac.be (Armin Rigo) Date: Mon, 25 Jun 2001 15:45:20 +0200 Subject: [Python-Dev] Python Specializing Compiler In-Reply-To: <200106221259.OAA02519@core.inf.ethz.ch> Message-ID: <4.3.1.0.20010625150819.00aa5220@127.0.0.1> Hello, At 14:59 22.06.2001 +0200, Samuele Pedroni wrote: >*: some possible useful hooks would be: >- minimal profiling support in order to specialize only things called often >- feedback for dynamic changing of methods, class hierarchy, ... if we want >to optimize method lookup (which would make sense) >- a mixed fixed slots/dict layout for instances. There is one point that you didn't mention, which I believe is important: how to handle global/builtin variables. First, a few words about the current Python semantics. * I am sorry if what follows has already been discussed; I am raising the question again because it might be important for Psyco. If you feel this should better be a PEP please just tell me so. * Complete lexical scoping was recently added, implemented with "free" and "cell" variables. These are only used for functions defined inside of other functions; top-level functions use the opcode LOAD_GLOBAL for all non-local variables. LOAD_GLOBAL performs one or two dictionary look-ups (two if the variable is built-in). For simple built-ins like "len" this might be expensive (has someone measured such costs ?). I suggest generalizing the compile-time lexical scoping rules.
Let's compile all functions' non-local variables (top-level and others) as "free" variables. This means the corresponding module's global variables must be "cell" variables. This is just what we would get if the module's code was one big function enclosing the definition of all the other functions. Next, the variables not defined in the module (the built-ins) are "free" variables of the module, and the built-in module provides "cell" variables for them. Remember that "free" and "cell" variables are linked together when the function (or module in this case) is defined (for functions, when "def" is executed; for modules, it would be at load-time). Benefit: not a single dictionary look-up any more; uniformity of treatment. Potential code break: global variables shadowing built-ins would behave like local variables shadowing globals, i.e. the mere presence of a global "xyz=..." would forever hide the "xyz" built-in from the module, even before the assignment or after a "del xyz". (cf. UnboundLocalError.) To think about: what the "global" keyword would mean in this context. Implementation problems: if we want to keep the module's dictionary of global variables (and we certainly do) it would require changes to the dictionary implementation (or the creation of a different kind of dictionary). One solution is to automatically dereference cell objects and raise exceptions upon reading empty cells. Another solution is to turn dictionaries into collections of objects that all behave like cell objects (so that if "d" is any dictionary, something like "d.ref(key)" would let us get a cell object which could be read or written later to actually get or set the value associated to "key", and "d[key]" would mean "d.ref(key).cell_ref"). Well, these are just proposals; they might not be a good solution. Why it is related to Psyco: the current treatment of globals/builtins makes it hard for Psyco to statically tell what function we are calling when it sees e.g. "len(a)" in the code. We would at least need some help from the interpreter; at least hooks called when the module's globals() dictionary change. The above proposal might provide a more uniform solution. Thanks for your attention. Armin.

From guido at digicool.com Mon Jun 25 16:26:08 2001 From: guido at digicool.com (Guido van Rossum) Date: Mon, 25 Jun 2001 10:26:08 -0400 Subject: [Python-Dev] xrange vs generators In-Reply-To: Your message of "Mon, 25 Jun 2001 08:50:31 CDT." <15159.16807.480121.637386@beluga.mojam.com> References: <15159.16807.480121.637386@beluga.mojam.com> Message-ID: <200106251426.f5PEQ8907629@odiug.digicool.com> > With generators in the language, should xrange be deprecated? > > Skip No, but maybe xrange() should be changed to return an iterator. E.g. something like this:

    def xrange(start, stop, step):
        while start < stop:
            yield start
            start += step

but with the appropriate defaults, and reversal of the test if step < 0, and an error if step == 0, and type checks enforcing ints (or long ints!), and implemented in C. :-) Although xrange() objects currently support some sequence algebra, that is mostly bogus and I don't think anyone in their right mind uses it.
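Spelled out with the defaults and the reversed test filled in -- still just a sketch of the idea, where the name and every specific choice (defaults, error message) are guesses rather than the eventual implementation -- it might look like:

    def xrange_iter(start, stop=None, step=1):
        # Mirror range()'s argument juggling, so that a one-argument
        # call means xrange_iter(0, n, 1).  Illustrative only.
        if stop is None:
            start, stop = 0, start
        if step == 0:
            raise ValueError("step must not be zero")
        if step > 0:
            while start < stop:
                yield start
                start += step
        else:
            while start > stop:
                yield start
                start += step

With that, list(xrange_iter(10, 0, -3)) would yield 10, 7, 4, 1 -- the same values range(10, 0, -3) produces.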
--Guido van Rossum (home page: http://www.python.org/~guido/)

From thomas.heller at ion-tof.com Mon Jun 25 16:37:31 2001 From: thomas.heller at ion-tof.com (Thomas Heller) Date: Mon, 25 Jun 2001 16:37:31 +0200 Subject: [Python-Dev] xrange vs generators References: <15159.16807.480121.637386@beluga.mojam.com> <200106251426.f5PEQ8907629@odiug.digicool.com> Message-ID: <006301c0fd84$5dcefad0$e000a8c0@thomasnotebook>

> > With generators in the language, should xrange be deprecated?
> >
> > Skip
>
> No, but maybe xrange() should be changed to return an iterator.
> E.g. something like this:
>
>     def xrange(start, stop, step):
>         while start < stop:
>             yield start
>             start += step
>
> but with the appropriate defaults, and reversal of the test if step <
> 0, and an error if step == 0, and type checks enforcing ints (or long
> ints!), and implemented in C. :-)
>
> Although xrange() objects currently support some sequence algebra,
> that is mostly bogus and I don't think anyone in their right mind uses
> it.

I _was_ using xrange as sets representing (potentially large) ranges of ints. Example:

    positive = xrange(1, sys.maxint)

    if num in positive:
        ...

I didn't follow the iterators discussion: would this continue to work? Thomas

From esr at thyrsus.com Mon Jun 25 16:41:34 2001 From: esr at thyrsus.com (Eric S. Raymond) Date: Mon, 25 Jun 2001 10:41:34 -0400 Subject: [Python-Dev] xrange vs generators In-Reply-To: <200106251426.f5PEQ8907629@odiug.digicool.com>; from guido@digicool.com on Mon, Jun 25, 2001 at 10:26:08AM -0400 References: <15159.16807.480121.637386@beluga.mojam.com> <200106251426.f5PEQ8907629@odiug.digicool.com> Message-ID: <20010625104134.B30559@thyrsus.com> Guido van Rossum : > Although xrange() objects currently support some sequence algebra, > that is mostly bogus and I don't think anyone in their right mind uses > it. I agree. As long as we make those cases fail loudly, I see no objection to dropping support for them. -- Eric S. Raymond Americans have the will to resist because you have weapons. If you don't have a gun, freedom of speech has no power. -- Yoshimi Ishikawa, Japanese author, in the LA Times 15 Oct 1992

From barry at digicool.com Mon Jun 25 16:38:20 2001 From: barry at digicool.com (Barry A. Warsaw) Date: Mon, 25 Jun 2001 10:38:20 -0400 Subject: [Python-Dev] os.path.normcase() in site.py References: <20010625103538.T8098@xs4all.nl> Message-ID: <15159.19676.727068.217548@anthem.wooz.org> >>>>> "TW" == Thomas Wouters writes: TW> Okay... how about someone takes Guido out for dinner and feeds TW> him way too many bottles of wine and ferrets to show him such TW> things do not necessarily lead to blood feuds ? Maybe take TW> along some psychotropic drugs and a halfway decent hypnotist TW> for safety's measure. Don't forget the dentist, proctologist, and a trepanist. Actually, if you can find a holeologist it would be much more efficient (my cousin Neil, a.k.a. Dr. Finger, a.k.a. Dr Watumpka would be ideal, but he's studying in Dortmund these days). TW> Meanwhile Barry subscribes Just to python-dev I'd be glad to, and I won't even divulge the fact that python-dev is only ostensibly a closed, insular mailing list these days. TW> and you or someone else with the pickpocket skills to get at TW> the keys for the time machine No pickpocketing skill necessary. Guido leaves the keys in a small safebox magnetically adhered underneath the running boards. Just be sure to ground yourself first (learned the hard way)!
TW> (come on, fess up, you all practiced) make sure Guido can't TW> get at it, lest he try and make up with Just in the past in TW> his 'suggestable' state... Better change the Mailman admin TW> password too, just to be on the safe side. I've tried that many times, but I suspect Guido has a Pybot thermetically linked to the time machine which "instantly" recedes several seconds into the past each time I change it, only to change it back. TW> Or if that has no chance of a prayer in hell of working, I can TW> give Just a secret xs4all.nl address (since he has an XS4ALL TW> account nowadays, that shouldn't be a problem) and we just TW> never tell Guido that py-dev at xs4all.nl is really Just ;) You realize it's way too "late" for that, don't you? The time machine works just as well in the forward direction as in the past direction, and long before he left the comfy environs of Amsterdam to brave it out in the harsh, unforgiving wilderness of Washington, he mapped out every moment of young Wouters' life. Why do you think I've worn aluminum foil underwear for the past 30 years? Trust me, it's not for the feeling of freshness and confidence it provides (okay, only partially). >> It's very sad, really -- but also wonderfully Dutch. TW> No, it would only be wonderfully Dutch if either brother was TW> German or Belgian in some way, or of royal blood and married TW> to the wrong type of christian sect (Protestant or Catholic -- TW> I keep forgetting which is which.) It would also be wonderfully American, but only if Just had trivially wronged Guido years ago by eating one of his nabisco cookies or some such. -Barry

From guido at digicool.com Mon Jun 25 16:47:50 2001 From: guido at digicool.com (Guido van Rossum) Date: Mon, 25 Jun 2001 10:47:50 -0400 Subject: [Python-Dev] xrange vs generators In-Reply-To: Your message of "Mon, 25 Jun 2001 16:37:31 +0200." <006301c0fd84$5dcefad0$e000a8c0@thomasnotebook> References: <15159.16807.480121.637386@beluga.mojam.com> <200106251426.f5PEQ8907629@odiug.digicool.com> <006301c0fd84$5dcefad0$e000a8c0@thomasnotebook> Message-ID: <200106251447.f5PEloH07777@odiug.digicool.com> [me] > > Although xrange() objects currently support some sequence algebra, > > that is mostly bogus and I don't think anyone in their right mind uses > > it. [theller] > I _was_ using xrange as sets representing (potentially large) > ranges of ints. > Example: > > positive = xrange(1, sys.maxint) > > if num in positive: > ... > > I didn't follow the iterators discussion: would this > continue to work? No, it would break. And I see another breakage too:

    r = xrange(10)
    for i in r:
        for j in r:
            print i, j

would not do the right thing if xrange() returned an iterator (because iterators can only be used once). This is too bad; I really wish that xrange() could die or be limited entirely to for loops. I wonder if we could put warnings on xrange() uses beyond the most basic...? --Guido van Rossum (home page: http://www.python.org/~guido/)

From pedroni at inf.ethz.ch Mon Jun 25 16:51:16 2001 From: pedroni at inf.ethz.ch (Samuele Pedroni) Date: Mon, 25 Jun 2001 16:51:16 +0200 (MET DST) Subject: [Python-Dev] Python Specializing Compiler Message-ID: <200106251451.QAA17756@core.inf.ethz.ch> Hi. [Armin Rigo] ... > Why it is related to Psyco: the current treatment of globals/builtins makes > it hard for Psyco to statically tell what function we are calling when it > sees e.g. "len(a)" in the code.
> We would at least need some help from the > interpreter; at least hooks called when the module's globals() dictionary > change. The above proposal might provide a more uniform solution. FYI, a different proposal for opt. globals access by Jeremy Hylton. It seems it would break fewer things ... don't know whether it can be as useful for Psyco: http://mail.python.org/pipermail/python-dev/2001-May/014995.html In any case I think Psyco will need notification support from the interpreter about dynamic changes to things that Psyco honestly assumes to be invariant in order to achieve performance. regards, Samuele Pedroni.

From thomas.heller at ion-tof.com Mon Jun 25 17:05:09 2001 From: thomas.heller at ion-tof.com (Thomas Heller) Date: Mon, 25 Jun 2001 17:05:09 +0200 Subject: [Python-Dev] xrange vs generators References: <15159.16807.480121.637386@beluga.mojam.com> <200106251426.f5PEQ8907629@odiug.digicool.com> <006301c0fd84$5dcefad0$e000a8c0@thomasnotebook> <200106251447.f5PEloH07777@odiug.digicool.com> Message-ID: <00e001c0fd88$3a532140$e000a8c0@thomasnotebook> > [theller] > > I _was_ using xrange as sets representing (potentially large) > > ranges of ints. > > Example: > > > > positive = xrange(1, sys.maxint) > > > > if num in positive: > > ... > > > > I didn't follow the iterators discussion: would this > > continue to work? > > No, it would break. Since there was an off-by-one bug for 'if num in xrange()' in Python 2.0 my code has already been rewritten. Thomas

From pedroni at inf.ethz.ch Mon Jun 25 17:04:45 2001 From: pedroni at inf.ethz.ch (Samuele Pedroni) Date: Mon, 25 Jun 2001 17:04:45 +0200 (MET DST) Subject: [Python-Dev] Python Specializing Compiler Message-ID: <200106251504.RAA18642@core.inf.ethz.ch> Hi. [Armin Rigo] > In particular, I believe we won't need to change the way the current Python > interpreter encodes data. For example, instances currently have a > dictionary of attributes and no "fixed slots", but this is not a problem > for Psyco, which can encode instances in better ways (e.g. as a C struct) > as long as it is only accessed by Psyco-compiled Python code and no > "legacy" code. This makes sense, but I'm asking if it is affordable to have all code executed (if we aim for usage-transparency) through Psyco-compiled code (memory foot-print, compilation vs. execution trade-offs for rarely executed code). Otherwise, in a mixed execution context, we would pay for conversions. I can see how a dynamic compiler can deal with methods, together with the interpreter notifying it when a dynamic change to the hierarchy or method definitions can potentially invalidate compiled code. I see more problems with instance data slots, because there are no strong hints in the code about which are the "official" slots of a class, and undisciplined code can treat instances just as dicts. regards, Samuele Pedroni.

From fdrake at acm.org Mon Jun 25 17:13:31 2001 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Mon, 25 Jun 2001 11:13:31 -0400 (EDT) Subject: [Python-Dev] os.path.normcase() in site.py In-Reply-To: <200106251343.f5PDh4907304@odiug.digicool.com> References: <20010625102809.42357303182@snelboot.oratrix.nl> <200106251343.f5PDh4907304@odiug.digicool.com> Message-ID: <15159.21787.913782.751691@cj42289-a.reston1.va.home.com> Guido van Rossum writes: > I can't think of any function besides the attempt to avoid duplicates. There were two reasons for adding this code:

 1. Avoid duplicates (speeds imports if there are duplicates and
    the modules are found on an entry after the dupes).
 2. Avoid breakage when a script uses os.chdir(). This is
    probably unusual for large applications, but fairly common for
    little admin helper scripts.

> I think that even on Windows, retaining case makes sense. > > I think that there's a way to avoid duplicates without case-folding > everything. (E.g. use a case-folding comparison instead.) > > I wonder if maybe path entries should be normpath'd though? > > I'll leave it to Fred, Jack or Just to fix this. I certainly agree that this can be improved; if Jack or Just would like to assign it to me on SourceForge, I'd be glad to fix it. -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations

From tim at digicool.com Mon Jun 25 17:39:47 2001 From: tim at digicool.com (Tim Peters) Date: Mon, 25 Jun 2001 11:39:47 -0400 Subject: [Python-Dev] xrange vs generators In-Reply-To: <200106251447.f5PEloH07777@odiug.digicool.com> Message-ID: [Thomas Heller] > I _was_ using xrange as sets representing (potentially large) > ranges of ints. > Example: > > positive = xrange(1, sys.maxint) > > if num in positive: > ... > I didn't follow the iterators discussion: would this > continue to work? [Guido] > No, it would break. "x in y" works with any iterable y in 2.2, incl. generators. So e.g.

    >>> def xr(n):
    ...     i = 0
    ...     while i < n:
    ...         yield i
    ...         i += 1
    ...
    >>> 1 in xr(10)
    1
    >>> 9 in xr(10)
    1
    >>> 10 in xr(10)
    0
    >>>

However, there's no __contains__ method here, so in the last case it actually did 10 compares. 0 in xr(sys.maxint) is very quick, but I'm still waiting for -1 in xr(sys.maxint) to complete . > And I see another breakage too: This would also apply to Thomas's example of giving a name to an xrange object, if implemented via generator:

    >>> small = xr(5)
    >>> 2 in small
    1
    >>> 2 in small
    0
    >>>

> ... > This is too bad; I really wish that xrange() could die or be limited > entirely to for loops. I wonder if we could put warnings on xrange() > uses beyond the most basic...? Hmm. I'd rather not endure the resulting complaints without a strong rationale for deprecating it. One that strikes close to my heart: there's more code in 2.2 to support xrange than there is to support generators! But users don't care about that.

From thomas at xs4all.net Mon Jun 25 17:42:12 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Mon, 25 Jun 2001 17:42:12 +0200 Subject: [Python-Dev] xrange vs generators In-Reply-To: <200106251447.f5PEloH07777@odiug.digicool.com> References: <15159.16807.480121.637386@beluga.mojam.com> <200106251426.f5PEQ8907629@odiug.digicool.com> <006301c0fd84$5dcefad0$e000a8c0@thomasnotebook> <200106251447.f5PEloH07777@odiug.digicool.com> Message-ID: <20010625174211.U8098@xs4all.nl> On Mon, Jun 25, 2001 at 10:47:50AM -0400, Guido van Rossum wrote: [ xrange can't be changed into a generator ] > This is too bad; I really wish that xrange() could die or be limited > entirely to for loops. I wonder if we could put warnings on xrange() > uses beyond the most basic...? Why do we want to do this ? xrange() is still exactly what it was: an object that pretends to be a list of integers. Besides being useful for those who work a lot with ranges, it's a wonderful example of what you can do with Python (even if it isn't actually written in Python :-) I see less reason to deprecate xrange than to deprecate the gopherlib, wave/aifc/audiodev, mhlib, netrc and/or robotparser modules. -- Thomas Wouters Hi! I'm a .signature virus!
From guido at digicool.com Mon Jun 25 18:07:44 2001 From: guido at digicool.com (Guido van Rossum) Date: Mon, 25 Jun 2001 12:07:44 -0400 Subject: [Python-Dev] xrange vs generators In-Reply-To: Your message of "Mon, 25 Jun 2001 11:39:47 EDT." References: Message-ID: <200106251607.f5PG7iq08192@odiug.digicool.com> > Hmm. I'd rather not endure the resulting complaints without a > strong rationale for deprecating it. One that strikes close to my > heart: there's more code in 2.2 to support xrange than there is to > support generators! But users don't care about that. But I do, and historically this code has often been bug-ridden without anybody noticing -- so it's not like it's needed much. I would suggest to remove most of the fancy features of xrange(), in particular the slice, contains and repeat slots. A step further would be to remove getitem also, and add a tp_getiter slot instead -- returning not itself but a new iterator that iterates through the prescribed sequence. We need a PEP for this. Anyone? Should be short and sweet. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at digicool.com Mon Jun 25 18:11:10 2001 From: guido at digicool.com (Guido van Rossum) Date: Mon, 25 Jun 2001 12:11:10 -0400 Subject: [Python-Dev] xrange vs generators In-Reply-To: Your message of "Mon, 25 Jun 2001 17:42:12 +0200." <20010625174211.U8098@xs4all.nl> References: <15159.16807.480121.637386@beluga.mojam.com> <200106251426.f5PEQ8907629@odiug.digicool.com> <006301c0fd84$5dcefad0$e000a8c0@thomasnotebook> <200106251447.f5PEloH07777@odiug.digicool.com> <20010625174211.U8098@xs4all.nl> Message-ID: <200106251611.f5PGBA608205@odiug.digicool.com> > [ xrange can't be changed into a generator ] > > > This is too bad; I really wish that xrange() could die or be limited > > entirely to for loops. I wonder if we could put warnings on xrange() > > uses beyond the most basic...? > > Why do we want to do this ? xrange() is still exactly what it was: an object > that pretends to be a list of integers. Besides being useful for those who > work a lot with ranges, it's a wondeful example on what you can do with > Python (even if it isn't actually written in Python :-) There is exactly *one* idiomatic use of xrange(): for i in xrange(...): ... All other operations supported by the xrange object are very rarely used, and historically their implementation has had obvious bugs that no-one noticed for years. > I see less reason to deprecate xrange than to deprecate the gopherlib, > wave/aifc/audiodev, mhlib, netrc and/or robotparser modules. Those are useful application-area libraries for some folks. The idiomatic xrange() object is useful too. But the advanced features of xrange() are an example of code bloat. --Guido van Rossum (home page: http://www.python.org/~guido/) From Greg.Wilson at baltimore.com Mon Jun 25 18:25:33 2001 From: Greg.Wilson at baltimore.com (Greg Wilson) Date: Mon, 25 Jun 2001 12:25:33 -0400 Subject: [Python-Dev] RE: Python-Dev digest, Vol 1 #1437 - 13 msgs Message-ID: <930BBCA4CEBBD411BE6500508BB3328F2E27F1@nsamcanms1.ca.baltimore.com> > Guido: > Since you have already obtained the same speedup with your approach, I > think there's great promise. Count on sending in a paper for the next > Python conference! Greg: "Doctor Dobb's Journal" would also be interested in an article. Who knows --- it might even be done before the ones on stackless, garbage collection, Zope acquisition, and generators... 
:-) Greg

From just at letterror.com Mon Jun 25 18:47:30 2001 From: just at letterror.com (Just van Rossum) Date: Mon, 25 Jun 2001 18:47:30 +0200 Subject: [Python-Dev] os.path.normcase() in site.py In-Reply-To: <15159.21787.913782.751691@cj42289-a.reston1.va.home.com> Message-ID: <20010625184734-r01010600-dbd1c84a@213.84.27.177> Guido van Rossum writes: > I can't think of any function besides the attempt to avoid duplicates. Fred L. Drake, Jr. wrote: > There were two reasons for adding this code: > > 1. Avoid duplicates (speeds imports if there are duplicates and > the modules are found on an entry after the dupes). > > 2. Avoid breakage when a script uses os.chdir(). This is > probably unusual for large applications, but fairly common for > little admin helper scripts. 1) normcase(). Bad. 2) abspath(). Good. I think #2 is a legitimate problem, but I'm not so sure of #1: is it really so common for sys.path to contain duplicates, to worry about it at all? > > I'll leave it to Fred, Jack or Just to fix this. > > I certainly agree that this can be improved; if Jack or Just would > like to assign it to me on SourceForge, I'd be glad to fix it. Here's my proposed fix:

    Index: site.py
    ===================================================================
    RCS file: /cvsroot/python/python/dist/src/Lib/site.py,v
    retrieving revision 1.27
    diff -c -3 -r1.27 site.py
    *** site.py	2001/06/12 16:48:52	1.27
    --- site.py	2001/06/25 16:42:33
    ***************
    *** 67,73 ****

      def makepath(*paths):
          dir = os.path.join(*paths)
    !     return os.path.normcase(os.path.abspath(dir))

      L = sys.modules.values()
      for m in L:
    --- 67,73 ----

      def makepath(*paths):
          dir = os.path.join(*paths)
    !     return os.path.abspath(dir)

      L = sys.modules.values()
      for m in L:

Just

From aahz at rahul.net Mon Jun 25 19:19:48 2001 From: aahz at rahul.net (Aahz Maruch) Date: Mon, 25 Jun 2001 10:19:48 -0700 (PDT) Subject: [Python-Dev] 2.1.1 vs. os.normcase() Message-ID: <20010625171948.D636399C80@waltz.rahul.net> It's too late for 2.0.1, but should this bugfix go into 2.1.1? (Just to be clear, this is the problem that Just reported with site.py calling os.normcase() in makepath().) ((I'm only asking about this bug in specific because we're getting down to the wire on 2.1.1 IIUC.)) -- --- Aahz (@pobox.com) Hugs and backrubs -- I break Rule 6 <*> http://www.rahul.net/aahz/ Androgynous poly kinky vanilla queer het Pythonista I don't really mind a person having the last whine, but I do mind someone else having the last self-righteous whine.
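A sketch of where the thread seems to be heading -- absolutize every entry but keep its case, and use the case-folded form only as a comparison key for duplicate detection. The helper names and the pair-returning makepath() variant below are illustrative assumptions, not the fix that was checked in:

    import os, sys

    def makepath(*paths):
        # Absolutize (fixes the os.chdir() breakage Fred mentions) but
        # keep the case the filesystem reported; fold case only to
        # build a comparison key.
        dir = os.path.abspath(os.path.join(*paths))
        return dir, os.path.normcase(dir)

    def dedup_path(entries):
        # Drop duplicates case-insensitively while preserving the
        # original spelling of each surviving entry.
        seen = []
        result = []
        for entry in entries:
            entry, key = makepath(entry)
            if key not in seen:
                seen.append(key)
                result.append(entry)
        return result

    sys.path[:] = dedup_path(sys.path)

The point of returning the pair is that the folded string is used once for the duplicate test and then thrown away, so sys.path keeps the case the filesystem gave it.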
From guido at digicool.com Mon Jun 25 20:06:02 2001 From: guido at digicool.com (Guido van Rossum) Date: Mon, 25 Jun 2001 14:06:02 -0400 Subject: [Python-Dev] 2.1.1 vs. os.normcase() In-Reply-To: Your message of "Mon, 25 Jun 2001 10:19:48 PDT." <20010625171948.D636399C80@waltz.rahul.net> References: <20010625171948.D636399C80@waltz.rahul.net> Message-ID: <200106251806.f5PI62L08770@odiug.digicool.com> > It's too late for 2.0.1, but should this bugfix go into 2.1.1? > > (Just to be clear, this is the problem that Just reported with site.py > calling os.normcase() in makepath().) > > ((I'm only asking about this bug in specific because we're getting down > to the wire on 2.1.1 IIUC.)) Unclear if it's purely a bugfix -- this could be considered a feature, but I don't know. What do others think? --Guido van Rossum (home page: http://www.python.org/~guido/)

From tim at digicool.com Mon Jun 25 20:47:06 2001 From: tim at digicool.com (Tim Peters) Date: Mon, 25 Jun 2001 14:47:06 -0400 Subject: [Python-Dev] os.path.normcase() in site.py In-Reply-To: <20010625102809.42357303182@snelboot.oratrix.nl> Message-ID: [Jack Jansen] > ... > With MacPython's way of finding the initial sys.path contents we > don't have the Windows-Python problem that we add the same directory > 5 times (once in uppercase, once in lowercase, once in mixed case, > once in mixed-case with / for \, etc:-), Happily, we don't have that problem on a stock Windows Python anymore:

    C:\Python21>python
    Python 2.1 (#15, Apr 16 2001, 18:25:49) [MSC 32 bit (Intel)] on win32
    Type "copyright", "credits" or "license" for more information.
    >>> import sys, pprint
    >>> pprint.pprint(sys.path)
    ['',
     'c:\\python21',
     'c:\\python21\\dlls',
     'c:\\python21\\lib',
     'c:\\python21\\lib\\plat-win',
     'c:\\python21\\lib\\lib-tk']
    >>>

OTOH, this is still Icky, because those don't match (wrt case) the names in the filesystem (e.g., just look at the initial prompt line: I was in Python21 when I ran this, not python21). > so if this is what it's trying to solve we can take it out easily. It's hard to believe Fred added code to solve a Windows problem; I don't know what it's trying to do.

From m.favas at per.dem.csiro.au Mon Jun 25 21:38:47 2001 From: m.favas at per.dem.csiro.au (Mark Favas) Date: Tue, 26 Jun 2001 03:38:47 +0800 Subject: [Python-Dev] Re: Problems with socketmodule (getnameinfo & getaddrinfo) References: <200106242139.f5OLdjR22560@mira.informatik.hu-berlin.de> <3B3666B9.335DA17E@per.dem.csiro.au> <200106250639.f5P6die01246@mira.informatik.hu-berlin.de> Message-ID: <3B379347.7E8D00EB@per.dem.csiro.au> "Martin v. Loewis" wrote: > > > To get socketmodule.c to compile, I had to make a change to line 2963 > > so that the declaration of inet_pton matched the previous declaration on > > line 220 (changing char *src to const char *src). Still have problems > > though, due to the use of snprintf in getnameinfo.c: > > Ok, they are printing a single number into a 512 byte buffer; that is > safe even with sprintf only, so I have just removed the snprintf call. > Can you please try again? > > Thanks for your reports, > Martin No trouble... The current CVS compiles (with a warning), links, and runs. The warning given is: cc: Warning: /home/gonzo1/mark/groucho1/mark/src/python/CVS/python/dist/src/Modules/getaddrinfo.c, line 407: In this statement, the referenced type of the pointer value "hostname" is const, but the referenced type of the target of this assignment is not.
(notconstqual)
        if (inet_pton(gai_afdl[i].a_af, hostname, pton)) {
------------------------------------------------^

which can be fixed by declaring the second argument to inet_pton as const char* instead of char* in the two occurrences of inet_pton in socketmodule.c Cheers, Mark -- Mark Favas - m.favas at per.dem.csiro.au CSIRO, Private Bag No 5, Wembley, Western Australia 6913, AUSTRALIA

From martin at loewis.home.cs.tu-berlin.de Tue Jun 26 01:08:00 2001 From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Tue, 26 Jun 2001 01:08:00 +0200 Subject: [Python-Dev] IPv6 and Windows In-Reply-To: <200106251341.f5PDfkg07283@odiug.digicool.com> (message from Guido van Rossum on Mon, 25 Jun 2001 09:41:46 -0400) References: <200106241748.f5OHm3001480@mira.informatik.hu-berlin.de> <200106251341.f5PDfkg07283@odiug.digicool.com> Message-ID: <200106252308.f5PN80701342@mira.informatik.hu-berlin.de> > > The problem is that the library patches (httplib, ftplib, etc) do use > > getaddrinfo to find out how to contact a remote system, which is the > > right thing to do IMO. So even if the IPv6 support can be activated > > only if desired, getaddrinfo absolutely has to work. > > Yes, but in an IPv4-only environment it would be super trivial to > implement, right? Right, and getaddrinfo.c/getnameinfo.c attempt such an implementation. They might attempt to get it "more right" than necessary, but still they are "pure C", in the sense that they don't rely on any libraries except for those available in a typical IPv4 sockets implementation. At least that's the theory. It turns out that they've been using inet_pton and snprintf, which is probably because they have been mainly tested on BSD. I'm confident that we can reduce them to a "no funny library calls needed" minimum. If somebody wants to implement them anew from the ground up, only using what the socketmodule already uses, that would be fine as well. An actual review of the code for portability problems would also be helpful. Regards, Martin

From greg at cosc.canterbury.ac.nz Tue Jun 26 06:32:05 2001 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Tue, 26 Jun 2001 16:32:05 +1200 (NZST) Subject: [Python-Dev] Python Specializing Compiler In-Reply-To: <200106251451.QAA17756@core.inf.ethz.ch> Message-ID: <200106260432.QAA04989@s454.cosc.canterbury.ac.nz> Samuele Pedroni : > a different proposal for opt. globals access > by Jeremy Hylton. It seems, it would break fewer things ... I really like Jeremy's proposal. I've been having similar thoughts myself for quite a while. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg at cosc.canterbury.ac.nz +--------------------------------------+

From guido at digicool.com Tue Jun 26 16:57:37 2001 From: guido at digicool.com (Guido van Rossum) Date: Tue, 26 Jun 2001 10:57:37 -0400 Subject: [Python-Dev] Python Specializing Compiler In-Reply-To: Your message of "Tue, 26 Jun 2001 16:32:05 +1200." <200106260432.QAA04989@s454.cosc.canterbury.ac.nz> References: <200106260432.QAA04989@s454.cosc.canterbury.ac.nz> Message-ID: <200106261457.f5QEvbZ11007@odiug.digicool.com> > Samuele Pedroni : > > > a different proposal for opt. globals access > > by Jeremy Hylton. It seems, it would break fewer things ... > > I really like Jeremy's proposal. I've been having similar > thoughts myself for quite a while. > > Greg Ewing Ditto.
Isn't this what I've been calling "low-hanging fruit" for ages? Apparently it's low but still out of reach. :-) --Guido van Rossum (home page: http://www.python.org/~guido/)
From guido at digicool.com Tue Jun 26 19:59:55 2001 From: guido at digicool.com (Guido van Rossum) Date: Tue, 26 Jun 2001 13:59:55 -0400 Subject: [Python-Dev] PEP 260: simplify xrange() Message-ID: <200106261759.f5QHxtH15045@odiug.digicool.com> Here's another sweet and short PEP. What do folks think? Is xrange()'s complexity really worth having? --Guido van Rossum (home page: http://www.python.org/~guido/) PEP: 260 Title: Simplify xrange() Version: $Revision: 1.1 $ Author: guido at python.org (Guido van Rossum) Status: Draft Type: Standards Track Python-Version: 2.2 Created: 26-Jun-2001 Post-History: 26-Jun-2001 Abstract This PEP proposes to strip the xrange() object from some rarely used behavior like x[i:j] and x*n. Problem The xrange() function has one idiomatic use: for i in xrange(...): ... However, the xrange() object has a bunch of rarely used behaviors that attempt to make it more sequence-like. These are so rarely used that historically they have had serious bugs (e.g. off-by-one errors) that went undetected for several releases. I claim that it's better to drop these unused features. This will simplify the implementation, testing, and documentation, and reduce maintenance and code size. Proposed Solution I propose to strip the xrange() object to the bare minimum. The only retained sequence behaviors are x[i], len(x), and repr(x). In particular, these behaviors will be dropped: x[i:j] (slicing) x*n, n*x (sequence-repeat) cmp(x1, x2) (comparisons) i in x (containment test) x.tolist() method x.start, x.stop, x.step attributes By implementing a custom iterator type, we could speed up the common use, but this is optional (the default sequence iterator does just fine). I expect it will take at most an hour to rip it all out; another hour to reduce the test suite and documentation. Scope This PEP only affects the xrange() built-in function. Risks Somebody's code could be relying on the extended code, and this code would break. However, given that historically bugs in the extended code have gone undetected for so long, it's unlikely that much code is affected. Copyright This document has been placed in the public domain. Local Variables: mode: indented-text indent-tabs-mode: nil End:
From fdrake at acm.org Tue Jun 26 22:01:41 2001 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Tue, 26 Jun 2001 16:01:41 -0400 (EDT) Subject: [Python-Dev] collecting timer resolution information... Message-ID: <15160.59941.295880.286167@cj42289-a.reston1.va.home.com> I'd like people to run the attached C program and send the output to me. What this does is run the gettimeofday() and getrusage() functions until the time values change. The intent is to determine the quality of the available timing information. For example, on my Linux-Mandrake 7.2 installation with a stock 2.2.17 kernel, I get this: timeofday: 1 (1 calls), rusage: 10000 (2465 calls) Thanks! -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations
From fdrake at acm.org Tue Jun 26 22:05:48 2001 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Tue, 26 Jun 2001 16:05:48 -0400 (EDT) Subject: [Python-Dev] collecting timer resolution information... In-Reply-To: <15160.59941.295880.286167@cj42289-a.reston1.va.home.com> References: <15160.59941.295880.286167@cj42289-a.reston1.va.home.com> Message-ID: <15160.60188.806308.247566@cj42289-a.reston1.va.home.com> Fred L.
Drake, Jr. writes: > I'd like people to run the attached C program and send the output to OK, I've attached it this time. Sorry! -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: observation.c URL:
From gward at python.net Tue Jun 26 22:10:09 2001 From: gward at python.net (Greg Ward) Date: Tue, 26 Jun 2001 16:10:09 -0400 Subject: [Python-Dev] make static In-Reply-To: <200106251340.f5PDeAO07244@odiug.digicool.com>; from guido@digicool.com on Mon, Jun 25, 2001 at 09:40:10AM -0400 References: <200106250943.LAA24576@pandora.informatik.hu-berlin.de> <200106251340.f5PDeAO07244@odiug.digicool.com> Message-ID: <20010626161009.B2820@gerg.ca> On 25 June 2001, Guido van Rossum said: > As long as it works, it works. I don't think there's a reason to > spend more than absolutely minimal time trying to keep it working > though -- we're trying to encourage everybody to migrate towards > distutils. So (without having seen the SF report) I'd say "tough > luck". The catch is that I never got around to implementing statically building a new interpreter via the Distutils, so (for now) Makefile.pre.in is the only way to do this. ;-( (Unless someone added it to the Distutils while I wasn't looking, which wouldn't be hard since I haven't looked in, ummm, six months or so...) Greg -- Greg Ward - just another /P(erl|ython)/ hacker gward at python.net http://starship.python.net/~gward/ "When I hear the word `culture', I reach for my gun." --Goebbels "When I hear the word `Microsoft', *I* reach for *my* gun." --me
From arigo at ulb.ac.be Wed Jun 27 04:01:54 2001 From: arigo at ulb.ac.be (Armin Rigo) Date: Tue, 26 Jun 2001 22:01:54 -0400 Subject: [Python-Dev] Python Specializing Compiler Message-ID: <3B393E92.B0719A7A@ulb.ac.be> Hi, I am considering using GNU Lightning to produce code from the Psyco compiler. Has anyone already used it from a Python program ? If so, you might already have done the necessary support module in C, and I might be interested in it ! Otherwise, I'll start from scratch. Of course, comments about whether I should use GNU Lightning at all, or any other code-producing library (or even produce machine code "by hand"), are welcome. Also, I hope to be able to continue with more fundamental work on Psyco very soon. One design decision I have to make now is about the way Psyco reads Python code. Currently, it "reverse-engineers" byte-code. Another solution would be to compile from the source code (possibly with the help of the 'Tools/Compiler/*' modules). The current solution, although not optimal, seems to make integration with the current interpreter easier. Indeed, based on recent discussions, I now believe that a realistic way to use Psyco would be to let the interpreter run normally while doing some kind of profiling, and work on time-critical routines only --- which at this point have already been compiled into byte-code and executed at least a few times. Armin
From nas at python.ca Tue Jun 26 23:01:38 2001 From: nas at python.ca (Neil Schemenauer) Date: Tue, 26 Jun 2001 14:01:38 -0700 Subject: [Python-Dev] collecting timer resolution information... In-Reply-To: <15160.59941.295880.286167@cj42289-a.reston1.va.home.com>; from fdrake@acm.org on Tue, Jun 26, 2001 at 04:01:41PM -0400 References: <15160.59941.295880.286167@cj42289-a.reston1.va.home.com> Message-ID: <20010626140138.A2838@glacier.fnational.com> Fred L. Drake, Jr. wrote: > timeofday: 1 (1 calls), rusage: 10000 (2465 calls) My hacked version of Linux 2.4 on an AMD-800 box: timeofday: 1 (2 calls), rusage: 976 (1792 calls) I don't quite understand the output. What does the 976 mean? Neil
From fdrake at acm.org Tue Jun 26 23:23:53 2001 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Tue, 26 Jun 2001 17:23:53 -0400 (EDT) Subject: [Python-Dev] collecting timer resolution information... In-Reply-To: <20010626140138.A2838@glacier.fnational.com> References: <15160.59941.295880.286167@cj42289-a.reston1.va.home.com> <20010626140138.A2838@glacier.fnational.com> Message-ID: <15160.64873.213278.925715@cj42289-a.reston1.va.home.com> Neil Schemenauer writes: > My hacked version of Linux 2.4 on an AMD-800 box: > > timeofday: 1 (2 calls), rusage: 976 (1792 calls) > > I don't quite understand the output. What does the 976 mean? The "1" and the "976" are the apparent resolution of the time values reported by those two calls, in microseconds. It looks like the HZ define in that header file you pointed out could be bumped a little higher. ;-) -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations
From mark.favas at csiro.au Wed Jun 27 01:21:47 2001 From: mark.favas at csiro.au (Mark Favas) Date: Wed, 27 Jun 2001 07:21:47 +0800 Subject: [Python-Dev] latest unicode-related change causes failure in test_unicode & test_unicodedata Message-ID: <3B39190B.E7DA5B5D@csiro.au> CVS of a short while ago, Tru64 Unix: "make test" gives two unicode-related failures: test_unicode test test_unicode crashed -- exceptions.UnicodeError: UTF-8 decoding error: illegal encoding test_unicodedata The actual stdout doesn't match the expected stdout. This much did match (between asterisk lines): ********************************************************************** test_unicodedata Testing Unicode Database... Methods: ********************************************************************** Then ...
We expected (repr): '6c7a7c02657b69d0fdd7a7d174f573194bba2e18' But instead we got: '374108f225e0c1488f8389ce6333902830d299fb' test test_unicodedata failed -- Writing: '374108f225e0c1488f8389ce6333902830d299fb', expected: '6c7a7c02657b69d0fdd7a7d174f573194bba2e18' Running the tests manually, test_unicode fails, test_unicodedata doesn't fail, but doesn't match the expected output for Methods: (test_unicode) Testing Unicode contains method... done. Testing Unicode formatting strings... done. Testing builtin codecs... Traceback (most recent call last): File "Lib/test/test_unicode.py", line 383, in ? verify(u'\ud800\udc02'.encode('utf-8') == \ File "./Lib/test/test_support.py", line 95, in verify raise TestFailed(reason) test_support.TestFailed: test failed (test_unicodedata) python Lib/test/test_unicodedata.py Testing Unicode Database... Methods: 374108f225e0c1488f8389ce6333902830d299fb Functions: 41e1d4792185d6474a43c83ce4f593b1bdb01f8a API: ok -- Mark Favas - m.favas at per.dem.csiro.au CSIRO, Private Bag No 5, Wembley, Western Australia 6913, AUSTRALIA From JamesL at Lugoj.Com Wed Jun 27 02:06:23 2001 From: JamesL at Lugoj.Com (James Logajan) Date: Tue, 26 Jun 2001 17:06:23 -0700 Subject: [Python-Dev] Re: PEP 260: simplify xrange() References: Message-ID: <3B39237F.1A7EF3F2@Lugoj.Com> Guido van Rossum wrote: > Here's another sweet and short PEP. What do folks think? Is > xrange()'s complexity really worth having? Are there still known bugs that will take some effort to repair? Is xrange constantly touched when changes are made elsewhere? If no to both, then I suggest don't fix what ain't broken; life is too short. (Unless it is annoying you to distraction, then do the deed and get it over with.) From tim.one at home.com Wed Jun 27 02:32:26 2001 From: tim.one at home.com (Tim Peters) Date: Tue, 26 Jun 2001 20:32:26 -0400 Subject: [Python-Dev] RE: PEP 260: simplify xrange() In-Reply-To: <3B39237F.1A7EF3F2@Lugoj.Com> Message-ID: [James Logajan] > Are there still known bugs that will take some effort to repair? Is > xrange constantly touched when changes are made elsewhere? If no to > both, then I suggest don't fix what ain't broken; life is too short. > (Unless it is annoying you to distraction, then do the deed and get > it over with.) I think it's more the latter. I partly provoked this by bitterly pointing out that there's more code in the CVS tree devoted to supporting the single xrange() gimmick than Neil Schemenauer added to support the get-out-of-town more powerful new generators. Masses of crufty code nobody benefits from are a burden on the soul. although-it-would-be-impolite-to-ask-we-why-still-ship-a-directory- full-of-crufty-old-irix5-demos-in-the-std-library-ly y'rs - tim From tdelaney at avaya.com Wed Jun 27 02:36:25 2001 From: tdelaney at avaya.com (Delaney, Timothy) Date: Wed, 27 Jun 2001 10:36:25 +1000 Subject: [Python-Dev] RE: PEP 260: simplify xrange() Message-ID: > Here's another sweet and short PEP. What do folks think? Is > xrange()'s complexity really worth having? > > --Guido van Rossum (home page: http://www.python.org/~guido/) > > PEP: 260 > Title: Simplify xrange() > Version: $Revision: 1.1 $ > Author: guido at python.org (Guido van Rossum) > Status: Draft > Type: Standards Track > Python-Version: 2.2 > Created: 26-Jun-2001 > Post-History: 26-Jun-2001 > > Abstract > > This PEP proposes to strip the xrange() object from some rarely > used behavior like x[i:j] and x*n. > > > Problem > > The xrange() function has one idiomatic use: > > for i in xrange(...): ... 
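To make the quoted Problem section concrete, here is a sketch of what goes and what stays under the PEP (2.1 behavior; the variable name is illustrative):

    x = xrange(10)
    print x[3], len(x), repr(x)     # retained: indexing, len() and repr()
    print x[2:5], x * 2, 7 in x     # slicing, repeat, containment: to be dropped
    print x.tolist(), x.start      # tolist() and start/stop/step: to be dropped
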
If this is to be done, I would also propose that xrange() and range() be changed to allow passing in a straight-out sequence such as in the following code in order to get rid of the need for range(len(seq)): import __builtin__ def range (start, stop=None, step=1, range=range): """""" start2 = start stop2 = stop if stop is None: stop2 = start start2 = 0 try: return range(start2, stop2, step) except TypeError: assert stop is None return range(len(start)) def xrange (start, stop=None, step=1, xrange=xrange): """""" start2 = start stop2 = stop if stop is None: stop2 = start start2 = 0 try: return xrange(start2, stop2, step) except TypeError: assert stop is None return xrange(len(start)) a = [5, 'a', 'Hello, world!'] b = range(a) c = xrange(4, 6) d = xrange(b) e = range(c) print a print b print c print d print e print range(d, 2) Tim Delaney From gward at python.net Wed Jun 27 03:24:32 2001 From: gward at python.net (Greg Ward) Date: Tue, 26 Jun 2001 21:24:32 -0400 Subject: [Python-Dev] RE: PEP 260: simplify xrange() In-Reply-To: ; from tdelaney@avaya.com on Wed, Jun 27, 2001 at 10:36:25AM +1000 References: Message-ID: <20010626212432.A4003@gerg.ca> On 27 June 2001, Delaney, Timothy said: > If this is to be done, I would also propose that xrange() and range() be > changed to allow passing in a straight-out sequence such as in the following > code in order to get rid of the need for range(len(seq)): I'm +1 on the face of it without stopping to consider any implications. ;-) Some bits of syntactic sugar as just too good to pass up. range(len(sequence)) is syntactic cod-liver oil. Greg -- Greg Ward - programmer-at-big gward at python.net http://starship.python.net/~gward/ Blood is thicker than water, and much tastier. From nas at python.ca Wed Jun 27 03:28:29 2001 From: nas at python.ca (Neil Schemenauer) Date: Tue, 26 Jun 2001 18:28:29 -0700 Subject: [Python-Dev] collecting timer resolution information... In-Reply-To: <15160.64873.213278.925715@cj42289-a.reston1.va.home.com>; from fdrake@acm.org on Tue, Jun 26, 2001 at 05:23:53PM -0400 References: <15160.59941.295880.286167@cj42289-a.reston1.va.home.com> <20010626140138.A2838@glacier.fnational.com> <15160.64873.213278.925715@cj42289-a.reston1.va.home.com> Message-ID: <20010626182829.A3344@glacier.fnational.com> Fred L. Drake, Jr. wrote: > The "1" and the "976" are the appearant resolution of the time > values reported by those two calls, in microseconds. It looks like > the HZ define in that header file you pointed out could be bumped a > little higher. ;-) I've got it at 1024. >>> 976. / 10000 * 1024 99.942400000000006 I think yours is at the 100 default. Neil From fdrake at acm.org Wed Jun 27 04:14:00 2001 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Tue, 26 Jun 2001 22:14:00 -0400 (EDT) Subject: [Python-Dev] collecting timer resolution information... In-Reply-To: <20010626182829.A3344@glacier.fnational.com> References: <15160.59941.295880.286167@cj42289-a.reston1.va.home.com> <20010626140138.A2838@glacier.fnational.com> <15160.64873.213278.925715@cj42289-a.reston1.va.home.com> <20010626182829.A3344@glacier.fnational.com> Message-ID: <15161.16744.665259.229385@cj42289-a.reston1.va.home.com> Neil Schemenauer writes: > I've got it at 1024. > > >>> 976. / 10000 * 1024 > 99.942400000000006 > > I think yours is at the 100 default. That's correct. Yours could be bumped a bit (factor of 10? 
I'm not really sure where it would cause problems in practice, though I think I understand the general explanations I've seen), and mine could be bumped a good bit. But I intend to stick with a stock kernel since I expect most users will be using a stock kernel, and I don't have a pile of extra machines to play with. ;-( -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From greg at cosc.canterbury.ac.nz Wed Jun 27 04:37:21 2001 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Wed, 27 Jun 2001 14:37:21 +1200 (NZST) Subject: [Python-Dev] collecting timer resolution information... In-Reply-To: <15160.60188.806308.247566@cj42289-a.reston1.va.home.com> Message-ID: <200106270237.OAA05182@s454.cosc.canterbury.ac.nz> Here are the results from a few machines around here: s454% uname -a SunOS s454 5.7 Generic_106541-10 sun4m sparc SUNW,SPARCstation-4 s454% observation timeofday: 2 (1 calls), rusage: 10000 (22 calls) oma% uname -a SunOS oma 5.7 Generic sun4u sparc SUNW,Ultra-4 oma% observation timeofday: 1 (2 calls), rusage: 10000 (115 calls) pc250% uname -a SunOS pc250 5.8 Generic_108529-03 i86pc i386 i86pc pc250% observation timeofday: 1 (1 calls), rusage: 10000 (232 calls) Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg at cosc.canterbury.ac.nz +--------------------------------------+ From JamesL at Lugoj.Com Wed Jun 27 04:42:20 2001 From: JamesL at Lugoj.Com (James Logajan) Date: Tue, 26 Jun 2001 19:42:20 -0700 Subject: [Python-Dev] Re: PEP 260: simplify xrange() References: Message-ID: <3B39480C.F4808C1F@Lugoj.Com> Tim Peters wrote: > [James Logajan] > > Are there still known bugs that will take some effort to repair? Is > > xrange constantly touched when changes are made elsewhere? If no to > > both, then I suggest don't fix what ain't broken; life is too short. > > (Unless it is annoying you to distraction, then do the deed and get > > it over with.) > > I think it's more the latter. I partly provoked this by bitterly pointing > out that there's more code in the CVS tree devoted to supporting the single > xrange() gimmick than Neil Schemenauer added to support the get-out-of-town > more powerful new generators. Masses of crufty code nobody benefits from > are a burden on the soul. Design mistakes one has made do tend to weigh on one's soul (speaking from more than two decades of programming experience) so I understand the primal urge to correct them when one can, and even when one shouldn't. So although I'm quite annoyed by all these new-fangled gimmicks being added to the language (i.e. Python generators being added to solve California's power problems) I have no problem with xrange being fenced in. (I find the very existence of the PEP process somewhat unsettling; there are now thousands of programmers trying to use the language. Why burden them with insuring their programs remain compatible with yet-another-damn-set-of-proposals every year? Or worse: trying to rewrite their code "more elegantly" using all the latest gimmicks. Why in my day, if you wanted to, say, save execution state, you figured out how to do it and didn't go crying to the language designer. Damn these young lazy programmers. Don't know how good they have it. Wouldn't know how to save their execution state if their lives depended on it. Harumph.) Speaking of "generators", I just want to say that I think that "generator" makes for lousy terminology. 
If I understand correctly, "generators" are coroutines that have peer-to-peer synchronized messaging (synchronizing and communicating at the "yield" points). To my mind, "generators" does not evoke that image at all. Assuming I understand it in my early senility.... > although-it-would-be-impolite-to-ask-we-why-still-ship-a-directory- > full-of-crufty-old-irix5-demos-in-the-std-library-ly Perhaps because the Irix community would be quite Irate if they were removed? From tim.one at home.com Wed Jun 27 06:38:15 2001 From: tim.one at home.com (Tim Peters) Date: Wed, 27 Jun 2001 00:38:15 -0400 Subject: [Python-Dev] RE: PEP 260: simplify xrange() In-Reply-To: <3B39480C.F4808C1F@Lugoj.Com> Message-ID: [James Logajan] > Design mistakes one has made do tend to weigh on one's soul (speaking > from more than two decades of programming experience) so I understand > the primal urge to correct them when one can, and even when one > shouldn't. Is this a case when one shouldn't? That is, is it a specific comment on PEP 260, or just a general venting here? > So although I'm quite annoyed by all these new-fangled gimmicks being > added to the language (i.e. Python generators being added to solve > California's power problems) I have no problem with xrange being fenced > in. OK. > (I find the very existence of the PEP process somewhat unsettling; > there are now thousands of programmers trying to use the language. Why > burden them with insuring their programs remain compatible with yet- > another-damn-set-of-proposals every year? You can ask the C, C++, Fortran, Perl, COBOL (etc, etc) folks that too, but I suspect it's a rhetorical question. I wish you could ask the Java committee, but they work in secret . > Or worse: trying to rewrite their code "more elegantly" using all the > latest gimmicks. Use of new features isn't required by Guido, and neither is downloading new releases. If *you* waste your time doing that, we both know it's because you can't resist <0.5 wink>. > ... > Speaking of "generators", I just want to say that I think that > "generator" makes for lousy terminology. A generator, umm, *generates* a sequence of values. It's neither more specific nor more general than that, so we're pretty much limited to vaguely suggestive terms like "generator" and "iterator"; Python already used the latter word for something else. I'd be happy to call them pink flamingos. > If I understand correctly, "generators" are coroutines They're formally semi-coroutines; it's not symmetric. > that have peer-to-peer synchronized messaging (synchronizing and > communicating at the "yield" points). Way too highfalutin' a view. Think of a generator as a resumable function, and you're not missing anything -- not even an implementation subtlety. They *are* resumable functions. A "yield" is just a "return", but with the twist that the function can resume executing after the "yield" again. If you also think of ordinary call/return as a peer-to-peer etc etc, then I suppose you're stuck with that view here too. > To my mind, "generators" does not evoke that image at all. Good, because that image was overblown beyond recognition . >> although-it-would-be-impolite-to-ask-we-why-still-ship-a-directory- >> full-of-crufty-old-irix5-demos-in-the-std-library-ly > Perhaps because the Irix community would be quite Irate if they were > removed? Doubt it: the Irix5 library files haven't really been touched since 1993. For several years we've also shipped an Irix6 library with all the same stuff. 
But I suppose releasing a new OS was a symptom of SGI picking on its users too . From tim.one at home.com Wed Jun 27 07:14:29 2001 From: tim.one at home.com (Tim Peters) Date: Wed, 27 Jun 2001 01:14:29 -0400 Subject: [Python-Dev] Unicode fallout: Windows _winreg project broken Message-ID: The _winreg project no longer links: Creating library ./_winreg_d.lib and object ./_winreg_d.exp _winreg.obj : error LNK2001: unresolved external symbol __imp__PyUnicode_DecodeMBCS The compilation of PyUnicode_DecodeMBCS in unicodeobject.c is in a #if defined(MS_WIN32) && defined(HAVE_USABLE_WCHAR_T) block. But the top of unicodeobject.h now wraps the enabling # if defined(MS_WIN32) && !defined(USE_UCS4_STORAGE) # define HAVE_USABLE_WCHAR_T # define PY_UNICODE_TYPE wchar_t # endif block inside a #ifndef PY_UNICODE_TYPE block, and a change to PC/config.h: #define PY_UNICODE_TYPE unsigned short stops all that. IOW, HAVE_USABLE_WCHAR_T no longer gets #define'd on Windows, and that prevents unicodeobject.c from supplying routines _winreg.c calls. leaving-it-to-an-expert-who-thinks-they-know-what-all-these-symbols- are-supposed-to-really-mean-ly y'rs - tim From greg at cosc.canterbury.ac.nz Wed Jun 27 07:41:46 2001 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Wed, 27 Jun 2001 17:41:46 +1200 Subject: [Python-Dev] Help: Python 2.1: "Corrupt Installation Detected" Message-ID: <3B39721A.DED4E85A@cosc.canterbury.ac.nz> I'm trying to install Python-2.1 on Windows, and I keep getting "Corrupt Installation Detected" when I run the installer. From tim.one at home.com Wed Jun 27 07:53:01 2001 From: tim.one at home.com (Tim Peters) Date: Wed, 27 Jun 2001 01:53:01 -0400 Subject: [Python-Dev] Help: Python 2.1: "Corrupt Installation Detected" In-Reply-To: <3B39721A.DED4E85A@cosc.canterbury.ac.nz> Message-ID: [Greg Ewing] > I'm trying to install Python-2.1 on Windows, > and I keep getting "Corrupt Installation Detected" > when I run the installer. [but no other evidence that > it's actually corrupt] You didn't say which flavor of Windows, but should have . Ditto what it is you're running (the PythonLabs distro? ActiveState's? PythonWare's?). Known causes for this from the PythonLabs installer include (across various flavors of Windows), in decreasing order of likelihood: + Trying to install while logged in to an account with insufficient permissions (try logging in as Adminstrator, if on a version of Windows where that makes sense). + Trying to install over a network. Copy the installer to a local disk first. + Conflicts with anti-virus software (disable it -- indeed, my Win9x Life got much saner after I wiped Norton AntiVirus from my hard drive). + Conflicts with other running programs (like installer splash screens always say, close all other programs). + Insufficient memory, disk space, or magic low-level Windows resources. + There may or may not be a problem unique to French versions of Windows. Any of those apply? From martin at loewis.home.cs.tu-berlin.de Wed Jun 27 09:12:11 2001 From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Wed, 27 Jun 2001 09:12:11 +0200 Subject: [Python-Dev] Unicode fallout: Windows _winreg project broken Message-ID: <200106270712.f5R7CBh06458@mira.informatik.hu-berlin.de> > IOW, HAVE_USABLE_WCHAR_T no longer gets #define'd on Windows, and > that prevents unicodeobject.c from supplying routines _winreg.c > calls. The best thing, IMO, would be if PC/config.h defines everything available in config.h also. 
In this case, the proper defines would be #define Py_USING_UNICODE #define HAVE_USABLE_WCHAR_T #define Py_UNICODE_SIZE 2 #define PY_UNICODE_TYPE wchar_t If that approach is used, the defaulting in Include/unicodeobject.h could go away. Alternatively, define only Py_USING_UNICODE of this in PC/config.h, and change the block in Include/unicodeobject.h to /* Windows has a usable wchar_t type (unless we're using UCS-4) */ # ifdef MS_WIN32 # ifdef USE_UCS4_STORAGE # define Py_UNICODE_SIZE 4 # define PY_UNICODE_TYPE unsigned int # else # define Py_UNICODE_SIZE 2 # define HAVE_USABLE_WCHAR_T # define PY_UNICODE_TYPE wchar_t # endif # endif Regards, Martin
From tim.one at home.com Wed Jun 27 09:39:38 2001 From: tim.one at home.com (Tim Peters) Date: Wed, 27 Jun 2001 03:39:38 -0400 Subject: [Python-Dev] New Unicode warnings Message-ID: There are 3 functions now where the prototypes in unicodeobject.h don't match the definitions in unicodeobject.c. Like, in .h, extern DL_IMPORT(Py_UNICODE) _PyUnicode_ToTitlecase( register const Py_UNICODE ch /* Unicode character */ ); but in .c: Py_UNICODE _PyUnicode_ToTitlecase(register Py_UNICODE ch) That is, they disagree about const (a silly language idea if ever there was one ). The others (I haven't checked these for the exact reason(s), but assume they're the same deal): _PyUnicode_ToUppercase _PyUnicode_ToLowercase
From Armin.Rigo at ima.unil.ch Wed Jun 27 11:01:18 2001 From: Armin.Rigo at ima.unil.ch (RIGO Armin) Date: Wed, 27 Jun 2001 11:01:18 +0200 (CEST) Subject: [Python-Dev] Python Specializing Compiler In-Reply-To: <3B391D88.305CCB4E@ActiveState.com> Message-ID: On Tue, 26 Jun 2001, Paul Prescod wrote: > Armin Rigo wrote: > > I am considering using GNU Lightning to produce code from the Psyco > > compiler. (...) > > Core Python has no GPLed components. I would hate to have you put in a > bunch of work worthy of inclusion in core Python to see it rejected on > those grounds. Good remark. Anyone else has comments about this ? Psyco would probably not be part of the core Python, but only an extension module; but your objection is nevertheless valid. Any alternatives ? I am considering a more theoretical approach, based on Tunes (http://tunes.org) as mentioned in Psyco's readme file, but this would take a lot more time -- although it might give much more impressive results. Armin.
From neal at metaslash.com Wed Jun 27 13:48:00 2001 From: neal at metaslash.com (Neal Norwitz) Date: Wed, 27 Jun 2001 07:48:00 -0400 Subject: [Python-Dev] ANN: PyChecker version 0.6.1 Message-ID: <3B39C7F0.2CA171C5@metaslash.com> A new version of PyChecker is available for your hacking pleasure. PyChecker is a tool for finding common bugs in python source code. It finds problems that are typically caught by a compiler for less dynamic languages, like C and C++. Comments, criticisms, new ideas, and other feedback is welcome.
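For instance, the new format-string checks flag code along these lines (a made-up fragment, not from any real module; all names are hypothetical):

    v1, v2 = 'a', 'b'
    print '%s %s %s' % (v1, v2)           # three specifiers, two args: arg-count warning
    user, host = 'guido', 'python.org'
    print '%(user) %(host)s' % locals()   # '%(user)' is missing its conversion type
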
Here's the CHANGELOG: * Check format strings: "%s %s %s" % (v1, v2, v3, v4) for arg counts * Warn when format strings do: '%(var) %(var2)' * Fix Local variable (xxx) not used, when have: "%(xxx)s" % locals() * Warn when local variable (xxx) doesn't exist and have: "%(xxx)s" % locals() * Install script in /usr/local/bin to invoke PyChecker * Don't produce unused global warnings when using a module in parameters * Don't produce unused global warnings when using a module in class variables * Add check when using method as an attribute (if self.method and x == y:) * Add check for right # of args to object construction * Add check for right # of args to function calls in other modules * Check for returning a value from __init__ * Fix using from XX import YY ; from XX import ZZ causing re-import warning * Fix UNABLE TO IMPORT errors for files that don't end with a newline * Support for checking consistent return values -- not complete produces too many false positives (off by default, use -r/--returnvalues to enable) PyChecker is available on Source Forge: Web page: http://pychecker.sourceforge.net/ Project page: http://sourceforge.net/projects/pychecker/ Neal -- pychecker at metaslash.com From paulp at ActiveState.com Wed Jun 27 13:53:08 2001 From: paulp at ActiveState.com (Paul Prescod) Date: Wed, 27 Jun 2001 04:53:08 -0700 Subject: [Python-Dev] Python Specializing Compiler References: Message-ID: <3B39C924.E865177D@ActiveState.com> RIGO Armin wrote: > >... > > I am considering a more theoretical approach, based on Tunes > (http://tunes.org) as mentionned in Psyco's readme file, but this would > take a lot more time -- althought it might give much more impressive > results. If you are thinking about incorporating some ideas from Tunes that's one thing. But if you want to use their code I would ask "what code?" I have heard about Tunes for several years now and not seen any visible forward progress. See also: http://tunes.org/Tunes-FAQ-6.html#ss6.2 -- Take a recipe. Leave a recipe. Python Cookbook! http://www.ActiveState.com/pythoncookbook From mark.favas at csiro.au Wed Jun 27 13:48:37 2001 From: mark.favas at csiro.au (Mark Favas) Date: Wed, 27 Jun 2001 19:48:37 +0800 Subject: [Python-Dev] More unicode blues... Message-ID: <3B39C815.E9CDF41B@csiro.au> unicodectype.c now fails to compile, because ch is declared const, and then assigned to. Tim has (apparently) had similar problems, but in his case the compiler just gives a warning, rather than an error.: cc: Error: Objects/unicodectype.c, line 67: In this statement, "ch" has const-qualified type, but occurs in a context that requires a modifiable lvalue. (neednonconst) ch += ctype->title; --------^ cc: Error: Objects/unicodectype.c, line 69: In this statement, "ch" has const-qualified type, but occurs in a context that requires a modifiable lvalue. (neednonconst) ch += ctype->upper; --------^ cc: Error: Objects/unicodectype.c, line 74: In this statement, "ch" has const-qualified type, but occurs in a context that requires a modifiable lvalue. (neednonconst) ch -= 0x10000; --------^ -- Mark Favas - m.favas at per.dem.csiro.au CSIRO, Private Bag No 5, Wembley, Western Australia 6913, AUSTRALIA From mal at lemburg.com Wed Jun 27 14:10:57 2001 From: mal at lemburg.com (M.-A. 
Lemburg) Date: Wed, 27 Jun 2001 14:10:57 +0200 Subject: [Python-Dev] Unicode Maintenance Message-ID: <3B39CD51.406C28F0@lemburg.com> Looking at the recent burst of checkins for the Unicode implementation completely bypassing the standard SF procedure and possible comments I might have on the different approaches, I guess I've been ruled out as maintainer and designer of the Unicode implementation. Well, I guess that's how things go. Was nice working for you guys, but no longer is... I'm tired of having to defend myself against meta-comments about the design, uncontrolled checkins and no true backup about my standing in all this from Guido. Perhaps I am misunderstanding the role of a maintainer and implementation designer, but as it is all respect for the work I've put into all this seems faded. That's the conclusion I draw from recent postings by Martin and Fredrik and their nightly "takeover". Thanks, -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From arigo at ulb.ac.be Wed Jun 27 14:18:43 2001 From: arigo at ulb.ac.be (Armin Rigo) Date: Wed, 27 Jun 2001 14:18:43 +0200 (CEST) Subject: [Python-Dev] Python Specializing Compiler In-Reply-To: <3B39C924.E865177D@ActiveState.com> Message-ID: Hello Paul, On Wed, 27 Jun 2001, Paul Prescod wrote: > If you are thinking about incorporating some ideas from Tunes that's one > thing. But if you want to use their code I would ask "what code?" I have > heard about Tunes for several years now and not seen any visible forward > progress. Yes, I know this. I am myself a (recent) member of the Tunes project, and have made Tunes' goals mine. Armin From guido at digicool.com Wed Jun 27 16:32:23 2001 From: guido at digicool.com (Guido van Rossum) Date: Wed, 27 Jun 2001 10:32:23 -0400 Subject: [Python-Dev] Python Specializing Compiler In-Reply-To: Your message of "Wed, 27 Jun 2001 11:01:18 +0200." References: Message-ID: <200106271432.f5REWOn19377@odiug.digicool.com> > Good remark. Anyone else has comments about this ? Not really, except to emphasize that inclusion of GPL'ed code in core Python is indeed a no-no. --Guido van Rossum (home page: http://www.python.org/~guido/) From fredrik at pythonware.com Wed Jun 27 16:48:02 2001 From: fredrik at pythonware.com (Fredrik Lundh) Date: Wed, 27 Jun 2001 16:48:02 +0200 Subject: [Python-Dev] New Unicode warnings References: Message-ID: <009e01c0ff19$e25b3f70$4ffa42d5@hagrid> tim peters wrote: > There are 3 functions now where the prototypes in unicodeobject.h don't > match the definitions in unicodeobject.c. Like, in .h, > > extern DL_IMPORT(Py_UNICODE) _PyUnicode_ToTitlecase( > register const Py_UNICODE ch /* Unicode character */ > ); > > but in .c: > > Py_UNICODE _PyUnicode_ToTitlecase(register Py_UNICODE ch) what's that "register" doing in a prototype? any reason we cannot just change the signature(s) to Py_UNICODE _PyUnicode_ToTitlecase(Py_UNICODE ch) to make it look more like contemporary C code? 
From fredrik at pythonware.com Wed Jun 27 16:49:31 2001 From: fredrik at pythonware.com (Fredrik Lundh) Date: Wed, 27 Jun 2001 16:49:31 +0200 Subject: [Python-Dev] Unicode fallout: Windows _winreg project broken References: <200106270712.f5R7CBh06458@mira.informatik.hu-berlin.de> Message-ID: <00a101c0ff19$e2a19740$4ffa42d5@hagrid> martin wrote: > > IOW, HAVE_USABLE_WCHAR_T no longer gets #define'd on Windows, and > > that prevents unicodeobject.c from supplying routines _winreg.c > > calls. > > The best thing, IMO, would be if PC/config.h defines everything > available in config.h also. In this case, the proper defines would be > > #define Py_USING_UNICODE > #define HAVE_USABLE_WCHAR_T > #define Py_UNICODE_SIZE 2 > #define PY_UNICODE_TYPE wchar_t > > If that approach is used, the defaulting in Include/unicodeobject.h > could go away. my fault; I missed the HAVE_USABLE_WCHAR_T define when I tried to fix tim's fix. I'll fix it. From guido at digicool.com Wed Jun 27 17:07:47 2001 From: guido at digicool.com (Guido van Rossum) Date: Wed, 27 Jun 2001 11:07:47 -0400 Subject: [Python-Dev] New Unicode warnings In-Reply-To: Your message of "Wed, 27 Jun 2001 16:48:02 +0200." <009e01c0ff19$e25b3f70$4ffa42d5@hagrid> References: <009e01c0ff19$e25b3f70$4ffa42d5@hagrid> Message-ID: <200106271507.f5RF7lq19494@odiug.digicool.com> > tim peters wrote: > > > There are 3 functions now where the prototypes in unicodeobject.h don't > > match the definitions in unicodeobject.c. Like, in .h, > > > > extern DL_IMPORT(Py_UNICODE) _PyUnicode_ToTitlecase( > > register const Py_UNICODE ch /* Unicode character */ > > ); > > > > but in .c: > > > > Py_UNICODE _PyUnicode_ToTitlecase(register Py_UNICODE ch) > > what's that "register" doing in a prototype? Enjoying a day off? > any reason we cannot just change the signature(s) to > > Py_UNICODE _PyUnicode_ToTitlecase(Py_UNICODE ch) > > to make it look more like contemporary C code? > > I cannot see how either register or const are going to make any difference in the prototype given that Py_UNICODE is a scalar type, so please just do it. --Guido van Rossum (home page: http://www.python.org/~guido/) From JamesL at Lugoj.Com Wed Jun 27 17:58:54 2001 From: JamesL at Lugoj.Com (James Logajan) Date: Wed, 27 Jun 2001 08:58:54 -0700 Subject: [Python-Dev] Re: PEP 260: simplify xrange() References: Message-ID: <3B3A02BE.21039365@Lugoj.Com> Tim Peters wrote: > > [James Logajan] > > Design mistakes one has made do tend to weigh on one's soul (speaking > > from more than two decades of programming experience) so I understand > > the primal urge to correct them when one can, and even when one > > shouldn't. > > Is this a case when one shouldn't? That is, is it a specific comment on PEP > 260, or just a general venting here? Just a general bit of silly "" venting. Insert some non-zero fraction in the wink. I tried to insert some obvious absurdities to indicate I was not being very serious. (Yes, I know that one shouldn't try that in mixed company.) From guido at digicool.com Wed Jun 27 18:11:49 2001 From: guido at digicool.com (Guido van Rossum) Date: Wed, 27 Jun 2001 12:11:49 -0400 Subject: [Python-Dev] Unicode Maintenance In-Reply-To: Your message of "Wed, 27 Jun 2001 14:10:57 +0200." 
<3B39CD51.406C28F0@lemburg.com> References: <3B39CD51.406C28F0@lemburg.com> Message-ID: <200106271611.f5RGBn819631@odiug.digicool.com> > Looking at the recent burst of checkins for the Unicode implementation > completely bypassing the standard SF procedure and possible comments > I might have on the different approaches, I guess I've been ruled out > as maintainer and designer of the Unicode implementation. > > Well, I guess that's how things go. Was nice working for you guys, > but no longer is... I'm tired of having to defend myself against > meta-comments about the design, uncontrolled checkins and no true > backup about my standing in all this from Guido. > > Perhaps I am misunderstanding the role of a maintainer and > implementation designer, but as it is all respect for the work I've > put into all this seems faded. That's the conclusion I draw from recent > postings by Martin and Fredrik and their nightly "takeover". > > Thanks, > -- > Marc-Andre Lemburg [For those of us to whom Marc-Andre's complaint comes as a total surprise: there was a thread on i18n-sig about whether we should support Unicode surrogates, followed by a conclusion to skip surrogates and jump directly to optional support for UCS-4, followed by some checkins that enabled a configuration choice between UCS-2 and UCS-4, and code to make it work. As a side effect, surrogate support in the UCS-2 version actually improved slightly.] Now, now, Marc-Andre. The only comments I recall from you on my "surrogates: just say no" post seemed favorable, except that you proposed to to all the way and make UCS-4 mandatory. I explained why I didn't want to go that far, and why I didn't believe your arguments against giving users a choice. I didn't hear back from you then, and I didn't think you could have much of a problem with my position. Our process requires the use of the SF patch manager only for controversial changes. Based on your feedback, I didn't think there was anything controversial about the changes that Fredrik and Martin have made! (If there was, IMO it was temporarily breaking the Windows build and the test suite -- but that's all fixed now.) I don't understand where you get the idea that we lost respect for your work! In fact, the fact that it was so easy to make the changes suggested to me that the original design was well suited to this particular change (as opposed to the surrugate support proposals, which all sounded like they would require a *lot* of changes). I don't think that we have very strict roles in this community anyway. (My role as BDFL excluded -- that's why I get to write this response. :-) I'd say that Fredrik owns SRE, because he has asserted that ownership at various times: he's undone changes by others that broke the 1.5.2 support, for example. But the Unicode support in Python isn't owned by one person: many folks have contributed to that, including Fredrik, who designed and wrote the original Unicode string object implementation. If you have specific comments about the changes made, please be specific. If you feel slighted by meta-comments, please also be specific. I don't think I've said anything derogatory about you or your design. Paul Prescod offered to write a PEP on this issue. My cynical half believes that we'll never hear from him again, but my optimistic half hopes that he'll actually write one, so that we'll be able to discuss the various issues for the users with the users. I encourage you to co-author the PEP, since you have a lot of background knowledge about the issues. 
BTW, I think that Misc/unicode.txt should be converted to a PEP, for the historic record. It was very much a PEP before the PEP process was invented. Barry, how much work would this be? No editing needed, just formatting, and assignment of a PEP number (the lower the better). --Guido van Rossum (home page: http://www.python.org/~guido/) From barry at digicool.com Wed Jun 27 18:24:30 2001 From: barry at digicool.com (Barry A. Warsaw) Date: Wed, 27 Jun 2001 12:24:30 -0400 Subject: [I18n-sig] Re: [Python-Dev] Unicode Maintenance References: <3B39CD51.406C28F0@lemburg.com> <200106271611.f5RGBn819631@odiug.digicool.com> Message-ID: <15162.2238.720508.508081@anthem.wooz.org> >>>>> "GvR" == Guido van Rossum writes: GvR> BTW, I think that Misc/unicode.txt should be converted to a GvR> PEP, for the historic record. It was very much a PEP before GvR> the PEP process was invented. Barry, how much work would GvR> this be? No editing needed, just formatting, and assignment GvR> of a PEP number (the lower the better). Not much work at all, so I'll do this (and replace Misc/unicode.txt with a pointer to the PEP). Let's go with PEP 7, but stick it under the "Other Informational PEPs" category. -Barry From guido at digicool.com Wed Jun 27 18:36:05 2001 From: guido at digicool.com (Guido van Rossum) Date: Wed, 27 Jun 2001 12:36:05 -0400 Subject: [I18n-sig] Re: [Python-Dev] Unicode Maintenance In-Reply-To: Your message of "Wed, 27 Jun 2001 12:24:30 EDT." <15162.2238.720508.508081@anthem.wooz.org> References: <3B39CD51.406C28F0@lemburg.com> <200106271611.f5RGBn819631@odiug.digicool.com> <15162.2238.720508.508081@anthem.wooz.org> Message-ID: <200106271636.f5RGa5719660@odiug.digicool.com> > GvR> BTW, I think that Misc/unicode.txt should be converted to a > GvR> PEP, for the historic record. It was very much a PEP before > GvR> the PEP process was invented. Barry, how much work would > GvR> this be? No editing needed, just formatting, and assignment > GvR> of a PEP number (the lower the better). > > Not much work at all, so I'll do this (and replace Misc/unicode.txt > with a pointer to the PEP). Let's go with PEP 7, but stick it under > the "Other Informational PEPs" category. > > -Barry Rather than informational, how about "Standard Track - Accepted (or Final)" ? That really matches the history best. I'd propose PEP number 100 -- the below-100 series is more for meta-PEPs. --Guido van Rossum (home page: http://www.python.org/~guido/) From barry at digicool.com Wed Jun 27 19:05:35 2001 From: barry at digicool.com (Barry A. Warsaw) Date: Wed, 27 Jun 2001 13:05:35 -0400 Subject: [I18n-sig] Re: [Python-Dev] Unicode Maintenance References: <3B39CD51.406C28F0@lemburg.com> <200106271611.f5RGBn819631@odiug.digicool.com> <15162.2238.720508.508081@anthem.wooz.org> <200106271636.f5RGa5719660@odiug.digicool.com> Message-ID: <15162.4703.741647.850696@anthem.wooz.org> >>>>> "GvR" == Guido van Rossum writes: GvR> Rather than informational, how about "Standard Track - GvR> Accepted (or Final)" ? That really matches the history best. GvR> I'd propose PEP number 100 -- the below-100 series is more GvR> for meta-PEPs. Fine with me. -Barry From fdrake at acm.org Wed Jun 27 21:45:05 2001 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Wed, 27 Jun 2001 15:45:05 -0400 (EDT) Subject: [Python-Dev] New profiling interface Message-ID: <15162.14273.490573.156770@cj42289-a.reston1.va.home.com> The new core interface I checked in allows profilers and tracers (debuggers, coverage tools) to be written in C. 
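For comparison, the Python-level hook that the C interface presumably mirrors -- a toy profile function, with illustrative names:

    import sys

    def profiler(frame, event, arg):
        # called on 'call', 'return' and 'exception' events, the same
        # kinds of events a C-level profiler or tracer gets to see
        print event, frame.f_code.co_name

    sys.setprofile(profiler)
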
I still need to write documentation for it; that shouldn't be too far off though. If anyone would like to have this available for Python 2.1.x, I have a version that I developed on the release20-maint branch. It can't be added to that branch since it's pretty clearly a new feature, but the patch is available at: http://starship.python.net/crew/fdrake/patches/py21-profiling.patch Enjoy! -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations
From mark.favas at csiro.au Wed Jun 27 23:45:17 2001 From: mark.favas at csiro.au (Mark Favas) Date: Thu, 28 Jun 2001 05:45:17 +0800 Subject: [Python-Dev] unicode, "const"s and lvalues Message-ID: <3B3A53ED.A8EEE265@csiro.au> Unreasonable as it may seem, my compiler really expects that entities declared as const's not be used in contexts where a modifiable lvalue is required. It gets all huffy, and refuses to continue compiling, even if I speak nicely (in unicode) to it. I'll file a bug report. On the code, not the compiler . cc -c -O -Olimit 1500 -Dss_family=__ss_family -Dss_len=__ss_len -I. -I./Include -DHAVE_CONFIG_H -o Objects/unicodectype.o Objects/unicodectype.c cc: Error: Objects/unicodectype.c, line 67: In this statement, "ch" has const-qualified type, but occurs in a context that requires a modifiable lvalue. (neednonconst) ch += ctype->title; --------^ cc: Error: Objects/unicodectype.c, line 69: In this statement, "ch" has const-qualified type, but occurs in a context that requires a modifiable lvalue. (neednonconst) ch += ctype->upper; --------^ cc: Error: Objects/unicodectype.c, line 74: In this statement, "ch" has const-qualified type, but occurs in a context that requires a modifiable lvalue. (neednonconst) ch -= 0x10000; --------^ cc: Error: Objects/unicodectype.c, line 362: In this statement, "ch" has const-qualified type, but occurs in a context that requires a modifiable lvalue. (neednonconst) ch += ctype->upper; ----^ cc: Error: Objects/unicodectype.c, line 366: In this statement, "ch" has const-qualified type, but occurs in a context that requires a modifiable lvalue. (neednonconst) ch -= 0x10000; --------^ cc: Error: Objects/unicodectype.c, line 378: In this statement, "ch" has const-qualified type, but occurs in a context that requires a modifiable lvalue. (neednonconst) ch += ctype->lower; ----^ cc: Error: Objects/unicodectype.c, line 382: In this statement, "ch" has const-qualified type, but occurs in a context that requires a modifiable lvalue. (neednonconst) ch -= 0x10000; --------^ make: *** [Objects/unicodectype.o] Error 1 -- Mark Favas - m.favas at per.dem.csiro.au CSIRO, Private Bag No 5, Wembley, Western Australia 6913, AUSTRALIA
From guido at digicool.com Wed Jun 27 23:57:16 2001 From: guido at digicool.com (Guido van Rossum) Date: Wed, 27 Jun 2001 17:57:16 -0400 Subject: [Python-Dev] unicode, "const"s and lvalues In-Reply-To: Your message of "Thu, 28 Jun 2001 05:45:17 +0800." <3B3A53ED.A8EEE265@csiro.au> References: <3B3A53ED.A8EEE265@csiro.au> Message-ID: <200106272157.f5RLvGo20101@odiug.digicool.com> > Unreasonable as it may seem, my compiler really expects that entities > declared as const's not be used in contexts where a modifiable lvalue is > required. It gets all huffy, and refuses to continue compiling, even if > I speak nicely (in unicode) to it. I'll file a bug report. On the code, > not the compiler . VC++ also warns about this. I think the declaration of the Character Type APIs in unicodeobject.h really shouldn't include either register or const. Then their implementations should also lose the 'const'.
--Guido van Rossum (home page: http://www.python.org/~guido/) From tim.one at home.com Wed Jun 27 23:58:34 2001 From: tim.one at home.com (Tim Peters) Date: Wed, 27 Jun 2001 17:58:34 -0400 Subject: [Python-Dev] unicode, "const"s and lvalues In-Reply-To: <3B3A53ED.A8EEE265@csiro.au> Message-ID: [Mark Favas] > Unreasonable as it may seem, my compiler really expects that entities > declared as const's not be used in contexts where a modifiable lvalue is > required. It gets all huffy, and refuses to continue compiling, even if > I speak nicely (in unicode) to it. I'll file a bug report. No real need, this was already brought up about 13 hours ago, although maybe that was only on the i18n-sig. I was left with the vague impression that Fredrik intended to fix it. If it's not fixed by tomorrow, you can make me feel guilty enough to fix it (I first reported it, so I guess it's my problem ). could've-been-yours!-ly y'rs - tim From fredrik at pythonware.com Thu Jun 28 00:42:14 2001 From: fredrik at pythonware.com (Fredrik Lundh) Date: Thu, 28 Jun 2001 00:42:14 +0200 Subject: [Python-Dev] unicode, "const"s and lvalues References: <3B3A53ED.A8EEE265@csiro.au> Message-ID: <00b701c0ff5a$6ab8f660$4ffa42d5@hagrid> mark wrote: > Unreasonable as it may seem, my compiler really expects that entities > declared as const's not be used in contexts where a modifiable lvalue is > required. it's fixed now, I think. (btw, unreasonable as it may seem, your mail server refuses to accept mail sent to your reply address, even if I speak nicely to it ;-) Cheers /F From fdrake at acm.org Thu Jun 28 04:44:54 2001 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Wed, 27 Jun 2001 22:44:54 -0400 (EDT) Subject: [Python-Dev] NIS on Linux, others? Message-ID: <15162.39462.75553.378146@cj42289-a.reston1.va.home.com> Is anyone here using NIS (Sun's old "Yellow Pages" service)? There's a bug for this on Linux that's been assigned to me for some time, but I don't have access to a network using NIS. Can anyone either confirm the bug or the fix? Or at least confirm that the suggested fix doesn't break the nis module on some other platform? (Testing this on a Sun SPARC box would be really nice!) I'd really appreciate some help on this one. The bug report is: http://sourceforge.net/tracker/index.php?func=detail&aid=233084&group_id=5470&atid=105470 Thanks! -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From thomas at xs4all.net Thu Jun 28 10:13:09 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Thu, 28 Jun 2001 10:13:09 +0200 Subject: [Python-Dev] NIS on Linux, others? In-Reply-To: <15162.39462.75553.378146@cj42289-a.reston1.va.home.com> References: <15162.39462.75553.378146@cj42289-a.reston1.va.home.com> Message-ID: <20010628101309.X8098@xs4all.nl> On Wed, Jun 27, 2001 at 10:44:54PM -0400, Fred L. Drake, Jr. wrote: > Is anyone here using NIS (Sun's old "Yellow Pages" service)? > There's a bug for this on Linux that's been assigned to me for some > time, but I don't have access to a network using NIS. Can anyone > either confirm the bug or the fix? Or at least confirm that the > suggested fix doesn't break the nis module on some other platform? > (Testing this on a Sun SPARC box would be really nice!) > I'd really appreciate some help on this one. The bug report is: If noone else pops up, I'll setup a small NIS network at home to test it when my new computer arrives (a week or two.) 
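In the meantime, a quick smoke test along these lines should exercise the reported codepath (the map and key names are site-specific guesses):

    import nis
    print nis.maps()                    # list the maps served for this domain
    print nis.match('guido', 'passwd')  # look up one key in one map
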
We use NIS a lot at work, but not on Linux machines (the 16-bit uid limitation prevented us from using Linux for user-accessible machines for a long time.) -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From mal at lemburg.com Thu Jun 28 11:04:07 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Thu, 28 Jun 2001 11:04:07 +0200 Subject: [Python-Dev] Unicode Maintenance References: <3B39CD51.406C28F0@lemburg.com> <200106271611.f5RGBn819631@odiug.digicool.com> Message-ID: <3B3AF307.6496AFB4@lemburg.com> Guido van Rossum wrote: > > > Looking at the recent burst of checkins for the Unicode implementation > > completely bypassing the standard SF procedure and possible comments > > I might have on the different approaches, I guess I've been ruled out > > as maintainer and designer of the Unicode implementation. > > > > Well, I guess that's how things go. Was nice working for you guys, > > but no longer is... I'm tired of having to defend myself against > > meta-comments about the design, uncontrolled checkins and no true > > backup about my standing in all this from Guido. > > > > Perhaps I am misunderstanding the role of a maintainer and > > implementation designer, but as it is all respect for the work I've > > put into all this seems faded. That's the conclusion I draw from recent > > postings by Martin and Fredrik and their nightly "takeover". > > > > Thanks, > > -- > > Marc-Andre Lemburg > > [For those of us to whom Marc-Andre's complaint comes as a total > surprise: there was a thread on i18n-sig about whether we should > support Unicode surrogates, followed by a conclusion to skip > surrogates and jump directly to optional support for UCS-4, followed > by some checkins that enabled a configuration choice between UCS-2 and > UCS-4, and code to make it work. As a side effect, surrogate support > in the UCS-2 version actually improved slightly.] > > Now, now, Marc-Andre. > > The only comments I recall from you on my "surrogates: just say no" > post seemed favorable, except that you proposed to to all the way and > make UCS-4 mandatory. I explained why I didn't want to go that far, > and why I didn't believe your arguments against giving users a choice. > I didn't hear back from you then, and I didn't think you could have > much of a problem with my position. > > Our process requires the use of the SF patch manager only for > controversial changes. Based on your feedback, I didn't think there > was anything controversial about the changes that Fredrik and Martin > have made! (If there was, IMO it was temporarily breaking the Windows > build and the test suite -- but that's all fixed now.) > > I don't understand where you get the idea that we lost respect for > your work! In fact, the fact that it was so easy to make the changes > suggested to me that the original design was well suited to this > particular change (as opposed to the surrugate support proposals, > which all sounded like they would require a *lot* of changes). > > I don't think that we have very strict roles in this community anyway. > (My role as BDFL excluded -- that's why I get to write this > response. :-) I'd say that Fredrik owns SRE, because he has asserted > that ownership at various times: he's undone changes by others that > broke the 1.5.2 support, for example. > > But the Unicode support in Python isn't owned by one person: many > folks have contributed to that, including Fredrik, who designed and > wrote the original Unicode string object implementation. 
> > If you have specific comments about the changes made, please be > specific. If you feel slighted by meta-comments, please also be > specific. I don't think I've said anything derogatory about you or > your design. You didn't get my point. I feel responsible for the Unicode implementation design and would like to see it become a continued success. In that sense and taking into account that I am the maintainer of all this stuff, I think it is very reasonable to ask me before making any significant changes to the implementation and also to respect any comments I put forward. Currently, I have to watch the checkins list very closely to find out who changed what in the implementation and then to take actions only after the fact. Since I'm not supporting Unicode as my full-time job this is simply impossible. We have the SF manager and there is really no need to rush anything around here. If I am offline or too busy with other things for a day or two, then I want to see patches on SF and not find new versions of the implementation already checked in. This has worked just fine during the last year, so I can only explain the latest actions in this direction with an urge to bypass my comments and any discussion this might cause. Needless to say, quality control is not possible anymore. Conclusion: I am not going to continue this work if this does not change. Another problem for me is the continued hostility I feel on i18n against parts of the design and some of my decisions. I am not talking about your feedback and the feedback from many other people on the list which was excellent and of a high standard. But reading the postings of the last few months you will find notices of what I am referring to here (no, I don't want to be specific). If people don't respect my comments or decisions, then how can I defend the design and how can I stop endless discussions which simply don't lead anywhere? So either I am missing something or there is a need for a clear statement from you about my status in all this. If I don't have the right to comment on proposals and patches, possibly even rejecting them, then I simply don't see any ground for keeping the implementation in a state which I can maintain. And last but not least: The fun-factor has faded which was the main motor driving me into working on Unicode in the first place. Nothing much you can do about this, though :-/ > Paul Prescod offered to write a PEP on this issue. My cynical half > believes that we'll never hear from him again, but my optimistic half > hopes that he'll actually write one, so that we'll be able to discuss > the various issues for the users with the users. I encourage you to > co-author the PEP, since you have a lot of background knowledge about > the issues. I guess your optimistic half won :-) I think Paul already did all the work, so I'll simply comment on what he wrote. > BTW, I think that Misc/unicode.txt should be converted to a PEP, for > the historic record. It was very much a PEP before the PEP process > was invented. Barry, how much work would this be? No editing needed, > just formatting, and assignment of a PEP number (the lower the better). Thanks for converting the text to PEP format, Barry.
Thanks for reading this far, -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From guido at digicool.com Thu Jun 28 14:25:14 2001 From: guido at digicool.com (Guido van Rossum) Date: Thu, 28 Jun 2001 08:25:14 -0400 Subject: [Python-Dev] Unicode Maintenance In-Reply-To: Your message of "Thu, 28 Jun 2001 11:04:07 +0200." <3B3AF307.6496AFB4@lemburg.com> References: <3B39CD51.406C28F0@lemburg.com> <200106271611.f5RGBn819631@odiug.digicool.com> <3B3AF307.6496AFB4@lemburg.com> Message-ID: <200106281225.f5SCPIr20874@odiug.digicool.com> Hi Marc-Andre, I'm dropping the i18n-sig from the distribution list. I hear you: > You didn't get my point. I feel responsible for the Unicode > implementation design and would like to see it become a continued > success. I'm sure we all share this goal! > In that sense and taking into account that I am the > maintainer of all this stuff, I think it is very reasonable to > ask me before making any significant changes to the implementation > and also to respect any comments I put forward. I understand you feel that we've rushed this in without waiting for your comments. Given how close your implementation was, I still feel that the changes weren't that significant, but I understand that you get nervous. If Christian were to check in his speed hack changes to the guts of ceval.c I would be nervous too! (Heck, I got nervous when Eric checked in his library-wide string method changes without asking.) Next time I'll try to be more sensitive to situations that require your review before going forward. > Currently, I have to watch the checkins list very closely > to find out who changed what in the implementation and then to > take actions only after the fact. Since I'm not supporting Unicode > as my full-time job this is simply impossible. We have the SF manager > and there is really no need to rush anything around here. Hm, apart from the fact that you ought to be left in charge, I think that in this case the live checkins were a big win over the usual SF process. At least two people were making changes, sometimes to each other's code, and many others on at least three continents were checking out the changes on many different platforms and immediately reporting problems. We would definitely not have a patch as solid as the code that's now checked in, after two days of using SF! (We could've used a branch, but I've found that getting people to actually check out the branch is not easy.) So I think that the net result was favorable. Sometimes you just have to let people work in the spur of the moment to get the results of their best thinking, otherwise they lose interest or their train of thought. > If I am offline or too busy with other things for a day or two, > then I want to see patches on SF and not find new versions of > the implementation already checked in. That's still the general rule, but in our enthusiasm (and mine was definitely part of this!) we didn't want to wait. Also, I have to admit that I mistook your silence for consent -- I didn't think the main proposed changes (making the size of Py_UNICODE a config choice) were controversial at all, so I didn't realize you would have a problem with it. > This has worked just fine during the last year, so I can only explain > the latest actions in this direction with an urge to bypass my comments > and any discussion this might cause.
I think you're projecting your own stuff here. I honestly didn't think there was much disagreement on your part and thought we were doing you a favor by implementing the consensus. IMO, Martin and Fredrik are familiar enough with both the code and the issues to do a good job. > Needless to say, > quality control is not possible anymore. Unclear. Lots of other people looked over the changes in your absence. And CVS makes code review after it's checked in easy enough. (Hey, in many other open source projects that's the normal procedure once the rough characteristics of a feature have been agreed upon: check in first and review later!) > Conclusion: > I am not going to continue this work if this does not change. That would be sad, and I hope you will stay with us. We certainly don't plan to ignore your comments! > Another problem for me is the continued hostility I feel on i18n > against parts of the design and some of my decisions. I am > not talking about your feedback and the feedback from many other > people on the list which was excellent and of a high standard. > But reading the postings of the last few months you will > find notices of what I am referring to here (no, I don't want > to be specific). I don't know what to say about this, and obviously nobody has the time to go back and read the archives. I'm sure it's not you as a person that was attacked. If the design isn't perfect -- and hey, since Python is the 80 percent language, few things in it are quite perfect! -- then (positive) criticism is an attempt to help, to move it closer to perfection. If people have at times said "the Unicode support sucks", well, that may hurt. You can't always stay friends with everybody. I get flames occasionally for features in Python that folks don't like. I get used to them, and it doesn't affect my confidence any more. Be the same! But sometimes, after saying "it sucks", people make specific suggestions for improvements, and it's important to be open for those even from sources that use offending language. (Within reason, of course. I don't ask you to listen to somebody who is persistently hostile to you as a person.) > If people don't respect my comments or decisions, then how can > I defend the design and how can I stop endless discussions which > simply don't lead anywhere? So either I am missing something > or there is a need for a clear statement from you about > my status in all this. Do you really *want* to be the Unicode BDFL? Being something's BDFL is a full-time job, and you've indicated you're too busy. (Or is that temporary?) I see you as the original coder, which means that you know that section of the code better than anyone, and whenever there's a question that others can't answer about its design, implementation, or restrictions, I refer to you. But given that you've said you wouldn't be able to work much on it, I welcome contributions by others as long as they seem knowledgeable. > If I don't have the right to comment on proposals and patches, > possibly even rejecting them, then I simply don't see any > ground for keeping the implementation in a state which I can > maintain. Nobody said you couldn't comment, and you know that. When it comes to rejecting or accepting, I feel that I am still the final arbiter, even for Unicode, until I get hit by a bus.
Since I don't always understand the implementation or the issues, I'll of course defer to you in cases where I think I can't make the decision, but I do reserve the right to be convinced by others to override your judgement, occasionally, if there's a good reason. And when you're not responsive, I may try to channel you. (I'll try to be more explicit about that.) > And last but not least: The fun-factor has faded which was > the main motor driving me into working on Unicode in the first > place. Nothing much you can do about this, though :-/ Yes, that happens to all of us at times. The fun factor goes up and down, and sometimes we must look for fun elsewhere for a while. Then the fun may come back where it appeared lost. Go on vacation, read a book, tackle a new project in a totally different area! Then come back and see if you can find some fun in the old stuff again. > > Paul Prescod offered to write a PEP on this issue. My cynical half > > believes that we'll never hear from him again, but my optimistic half > > hopes that he'll actually write one, so that we'll be able to discuss > > the various issues for the users with the users. I encourage you to > > co-author the PEP, since you have a lot of background knowledge about > > the issues. > > I guess your optimistic half won :-) I think Paul already did all the > work, so I'll simply comment on what he wrote. Your suggestions were very valuable. My opinion of Paul also went up a notch! > > BTW, I think that Misc/unicode.txt should be converted to a PEP, for > > the historic record. It was very much a PEP before the PEP process > > was invented. Barry, how much work would this be? No editing needed, > > just formatting, and assignment of a PEP number (the lower the better). > > Thanks for converting the text to PEP format, Barry. > > Thanks for reading this far, You're welcome, and likewise. Just one more thing, Marc-Andre. Please know that I respect your work very much even if we don't always agree. We would get by without you, but Python would be hurt if you turned your back on us. --Guido van Rossum (home page: http://www.python.org/~guido/) From arigo at ulb.ac.be Thu Jun 28 15:04:06 2001 From: arigo at ulb.ac.be (Armin Rigo) Date: Thu, 28 Jun 2001 15:04:06 +0200 (CEST) Subject: [Python-Dev] Python Specializing Compiler In-Reply-To: <3B393E92.B0719A7A@ulb.ac.be> Message-ID: On Tue, 26 Jun 2001, Armin Rigo wrote: > I am considering using GNU Lightning to produce code from the Psyco > compiler. I just found "vcode" (http://www.pdos.lcs.mit.edu/~engler/pldi96-abstract.html), which seems very interesting for portable JIT code generation. I am considering using it for Psyco. Does anyone have experience with vcode? Or any other comments? Armin. From gball at cfa.harvard.edu Thu Jun 28 17:26:36 2001 From: gball at cfa.harvard.edu (Greg Ball) Date: Thu, 28 Jun 2001 11:26:36 -0400 (EDT) Subject: [Python-Dev] NIS on Linux, others? Message-ID: Short version: I can confirm that bug under Linux, but the patch breaks the nis module on Solaris. Linux machine is: Linux malhar 2.2.16-3smp #1 SMP Mon Jun 19 17:37:04 EDT 2000 i686 unknown with a Python version from recent CVS. I see the reported bug and the suggested patch does fix the problem. Sparc box looks like this: SunOS cfa0 5.8 Generic_108528-06 sun4u sparc SUNW,Ultra-Enterprise using the Python 2.0 source tree. The nis module works out of the box, but applying the suggested patch breaks it: 'nis.error: No such key in map'.
--Greg Ball From gregor at hoffleit.de Thu Jun 28 21:56:35 2001 From: gregor at hoffleit.de (Gregor Hoffleit) Date: Thu, 28 Jun 2001 21:56:35 +0200 Subject: [Python-Dev] MAGIC after 2001 ? Message-ID: <20010628215635.A5621@53b.hoffleit.de> Correct me, but AFAICS there are only 186 days left until Python's MAGIC scheme overflows:

/* XXX Perhaps the magic number should be frozen and a version field
   added to the .pyc file header? */
/* New way to come up with the magic number: (YEAR-1995), MONTH, DAY */
#define MAGIC (60202 | ((long)'\r'<<16) | ((long)'\n'<<24))

I couldn't find this problem in the SF bug tracking system. Should I submit a new bug entry? Gregor From jack at oratrix.nl Thu Jun 28 23:03:47 2001 From: jack at oratrix.nl (Jack Jansen) Date: Thu, 28 Jun 2001 23:03:47 +0200 Subject: [Python-Dev] Passing silly values to time.strftime Message-ID: <20010628210352.33157120260@oratrix.oratrix.nl> Just noted (that's Just-the-person, not me-just-noting:-) that on the Mac time.strftime() can blow up with an access violation if you pass silly values to it (such as 9 zeroes). Does anyone know enough of the ANSI standard to tell me how strftime should behave with out-of-range values? I.e. should I report this as a bug to MetroWerks or should we rig up time.strftime() to check that all the values are in range? -- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | ++++ see http://www.xs4all.nl/~tank/ ++++ From jack at oratrix.nl Thu Jun 28 23:12:45 2001 From: jack at oratrix.nl (Jack Jansen) Date: Thu, 28 Jun 2001 23:12:45 +0200 Subject: [Python-Dev] Passing silly values to time.strftime In-Reply-To: Message by Jack Jansen , Thu, 28 Jun 2001 23:03:47 +0200 , <20010628210352.33157120260@oratrix.oratrix.nl> Message-ID: <20010628211250.4A6BC120260@oratrix.oratrix.nl> Recently, Jack Jansen said: > Just noted (that's Just-the-person, not me-just-noting:-) that on the > Mac time.strftime() can blow up with an access violation if you pass > silly values to it (such as 9 zeroes). Following up to myself, after I just noticed (just-me-noticing, not Just-the-person this time) that all zeros is a legal C value: gettmarg() converts this all-zeroes tuple to (0, 0, 0, 0, -1, 100, 0, -1, 0) Fine with me, apparently Python wants to have human-understandable (1-based) month numbers and yearday numbers, but then I think it really should also check that the values are in-range. What do others think? -- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm
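[A rough sketch, in Python, of the kind of range check being discussed here -- illustrative only; the bounds below are assumptions read off the time-tuple layout, not what timemodule.c actually does:]

def check_time_tuple(t):
    # (year, month, day, hour, minute, second, weekday, yearday, isdst)
    bounds = [(1900, 9999), (1, 12), (1, 31), (0, 23), (0, 59),
              (0, 61), (0, 6), (1, 366), (-1, 1)]
    for value, (low, high) in zip(t, bounds):
        if not low <= value <= high:
            raise ValueError, "time tuple field out of range"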
From Jason.Tishler at dothill.com Thu Jun 28 23:17:15 2001 From: Jason.Tishler at dothill.com (Jason Tishler) Date: Thu, 28 Jun 2001 17:17:15 -0400 Subject: [Python-Dev] Threaded Cygwin Python Import Problem Message-ID: <20010628171715.P488@dothill.com> Thanks to Rob Collins (implementer) and Greg Smith (profiler), Cygwin now provides enough pthreads support so that Cygwin Python builds OOTB *and* functions reasonably well even with threads enabled. Unfortunately, there are still a few issues that need to be resolved. The one that I would like to address in this posting prevents a threaded Cygwin Python from building the standard extension modules (without some kind of intervention). :,( Specifically, the build would frequently hang during the Distutils part when Cygwin Python is attempting to execvp a gcc process. See the first attachment, test.py, for a minimal Python script that exhibits the hang. See the second attachment, test.c, for a rewrite of test.py in C. Since test.c did not hang, I was able to conclude that this was not just a straight Cygwin problem. Further tracing uncovered that the hang occurs in _execvpe() (in os.py), when the child tries to import tempfile. If I apply the third attachment, os.py.patch, then the hang is avoided. Hence, it appears that importing a module (or specifically the tempfile module) in a threaded Cygwin Python child causes a hang. I saw the following comment in _execvpe():

    # Process handling (fork, wait) under BeOS (up to 5.0)
    # doesn't interoperate reliably with the thread interlocking
    # that happens during an import. The actual error we need
    # is the same on BeOS for posix.open() et al., ENOENT.

The above makes me think that possibly Cygwin is having a similar problem. Can anyone offer suggestions on how to further debug this problem? Thanks, Jason -- Jason Tishler Director, Software Engineering Phone: 732.264.8770 x235 Dot Hill Systems Corp. Fax: 732.264.8798 82 Bethany Road, Suite 7 Email: Jason.Tishler at dothill.com Hazlet, NJ 07730 USA WWW: http://www.dothill.com
-------------- next part --------------
import os

cmd = ['ls', '-l']
pid = os.fork()
if pid == 0:
    print 'child execvp-ing'
    os.execvp(cmd[0], cmd)
else:
    (pid, status) = os.waitpid(pid, 0)
    print 'status =', status
    print 'parent done'
-------------- next part --------------
#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

char* const cmd[] = {"ls", "-l", 0};

int main()
{
    int status;
    pid_t pid = fork();
    if (pid == 0) {
        printf("child execvp-ing\n");
        execvp(cmd[0], cmd);
    }
    else {
        waitpid(pid, &status, 0);
        printf("status = %d\n", status);
        printf("parent done\n");
    }
    return 0;
}
-------------- next part --------------
--- os.py.orig	Thu Jun 28 16:14:28 2001
+++ os.py	Thu Jun 28 16:30:12 2001
@@ -329,8 +329,9 @@ def _execvpe(file, args, env=None):
         try: unlink('/_#.# ## #.#')
         except error, _notfound: pass
     else:
-        import tempfile
-        t = tempfile.mktemp()
+        #import tempfile
+        #t = tempfile.mktemp()
+        t = '/mnt/c/TEMP/@279.3'
         # Exec a file that is guaranteed not to exist
         try: execv(t, ('blah',))
         except error, _notfound: pass
From tim at digicool.com Thu Jun 28 23:24:17 2001 From: tim at digicool.com (Tim Peters) Date: Thu, 28 Jun 2001 17:24:17 -0400 Subject: [Python-Dev] MAGIC after 2001 ? In-Reply-To: <20010628215635.A5621@53b.hoffleit.de> Message-ID: [Gregor Hoffleit] > Correct me, Can't: you're correct. > but AFAICS there are only 186 days left until Python's MAGIC scheme > overflows:
>
> /* XXX Perhaps the magic number should be frozen and a version field
>    added to the .pyc file header? */
> /* New way to come up with the magic number: (YEAR-1995), MONTH, DAY */
> #define MAGIC (60202 | ((long)'\r'<<16) | ((long)'\n'<<24))
>
> I couldn't find this problem in the SF bug tracking system. Should I > submit a new bug entry? Somebody should! It's a known problem, but the last crusade to redefine it ended up with 85% of a spec but no worker bees. If that continues, note that it has no effect on whether existing Python releases will continue to run, it just means we can't release new versions -- but now that the licensing issue is settled, I think we'll just close down the project instead . fun-while-it-lasted-ly y'rs - tim
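[A quick check of Gregor's arithmetic, sketched in Python -- the packing scheme is the one described in the comment quoted above; everything else is plain arithmetic:]

def magic(year, month, day):
    # (YEAR-1995), MONTH, DAY packed as a 5-digit decimal number that
    # must fit in the low 16 bits of the magic word
    date_part = (year - 1995) * 10000 + month * 100 + day
    return date_part | (ord('\r') << 16) | (ord('\n') << 24)

# 2001-12-31 gives 61231, which still fits in 16 bits (max 65535)...
print magic(2001, 12, 31) & 0xffff
# ...but 2002-01-01 gives 70101 > 65535, clobbering the '\r' byte --
# and 2001-12-31 is exactly 186 days after the date of Gregor's posting.
print (2002 - 1995) * 10000 + 1 * 100 + 1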
From paulp at ActiveState.com Fri Jun 29 04:59:45 2001 From: paulp at ActiveState.com (Paul Prescod) Date: Thu, 28 Jun 2001 19:59:45 -0700 Subject: [Python-Dev] [Fwd: PEP: Support for "wide" Unicode characters] Message-ID: <3B3BEF21.63411C4C@ActiveState.com> Slow python-dev day...consider this exciting new proposal to deal with important new characters like the Japanese dentistry symbols and ecological symbols (but not Klingon)

-------- Original Message --------
Subject: PEP: Support for "wide" Unicode characters
Date: Thu, 28 Jun 2001 15:33:00 -0700
From: Paul Prescod
Organization: ActiveState
To: "python-list at python.org"

PEP: 261
Title: Support for "wide" Unicode characters
Version: $Revision: 1.3 $
Author: paulp at activestate.com (Paul Prescod)
Status: Draft
Type: Standards Track
Created: 27-Jun-2001
Python-Version: 2.2
Post-History: 27-Jun-2001, 28-Jun-2001

Abstract

    Python 2.1 unicode characters can have ordinals only up to
    2**16 - 1. These characters are known as Basic Multilingual Plane
    characters. There are now characters in Unicode that live on other
    "planes". The largest addressable character in Unicode has the
    ordinal 17 * 2**16 - 1 (0x10ffff). For readability, we will call
    this TOPCHAR and call characters in this range "wide characters".

Glossary

    Character
        Used by itself, means the addressable units of a Python
        Unicode string.

    Code point
        If you imagine Unicode as a mapping from integers to
        characters, each integer represents a code point. Some are
        really used for characters. Some will someday be used for
        characters. Some are guaranteed never to be used for
        characters.

    Unicode character
        A code point defined in the Unicode standard whether it is
        already assigned or not. Identified by an integer.

    Code unit
        An integer representing a character in some encoding.

    Surrogate pair
        Two code units that represent a single Unicode character.

Proposed Solution

    One solution would be to merely increase the maximum ordinal to a
    larger value. Unfortunately the only straightforward
    implementation of this idea is to increase the character code
    unit to 4 bytes. This has the effect of doubling the size of most
    Unicode strings. In order to avoid imposing this cost on every
    user, Python 2.2 will allow 4-byte Unicode characters as a
    build-time option. Users can choose whether they care about wide
    characters or prefer to preserve memory.

    The 4-byte option is called "wide Py_UNICODE". The 2-byte option
    is called "narrow Py_UNICODE".

    Most things will behave identically in the wide and narrow worlds.

    * unichr(i) for 0 <= i < 2**16 (0x10000) always returns a
      length-one string.

    * unichr(i) for 2**16 <= i <= TOPCHAR will return a length-one
      string representing the character on wide Python builds. On
      narrow builds it will raise ValueError.

      ISSUE: Python currently allows \U literals that cannot be
             represented as a single character. It generates two
             characters known as a "surrogate pair". Should this be
             disallowed on future narrow Python builds?

      ISSUE: Should Python allow the construction of characters that
             do not correspond to Unicode characters? Unassigned
             Unicode characters should obviously be legal (because
             they could be assigned at any time). But code points
             above TOPCHAR are guaranteed never to be used by
             Unicode. Should we allow access to them anyhow?

    * ord() is always the inverse of unichr()

    * There is an integer value in the sys module that describes the
      largest ordinal for a Unicode character on the current
      interpreter. sys.maxunicode is 2**16-1 (0xffff) on narrow
      builds of Python and TOPCHAR on wide builds.

      ISSUE: Should there be distinct constants for accessing TOPCHAR
             and the real upper bound for the domain of unichr (if
             they differ)? There has also been a suggestion of
             sys.unicodewidth, which can take the values 'wide' and
             'narrow'.

    * codecs will be upgraded to support "wide characters"
      (represented directly in UCS-4, as surrogate pairs in UTF-16
      and as multi-byte sequences in UTF-8). On narrow Python builds,
      the codecs will generate surrogate pairs, on wide Python builds
      they will generate a single character. This is the main part of
      the implementation left to be done.

    * there are no restrictions on constructing strings that use code
      points "reserved for surrogates" improperly. These are called
      "isolated surrogates". The codecs should disallow reading these
      but you could construct them using string literals or unichr().
      unichr() is not restricted to values less than either TOPCHAR
      or sys.maxunicode.

Implementation

    There is a new (experimental) define:

        #define PY_UNICODE_SIZE 2

    There are new configure options:

        --enable-unicode=ucs2 configures a narrow Py_UNICODE, and
                              uses wchar_t if it fits
        --enable-unicode=ucs4 configures a wide Py_UNICODE, and uses
                              wchar_t if it fits
        --enable-unicode     same as "=ucs2"

    The intention is that --disable-unicode, or --enable-unicode=no
    removes the Unicode type altogether; this is not yet implemented.

Notes

    This PEP does NOT imply that people using Unicode need to use a
    4-byte encoding. It only allows them to do so. For example,
    ASCII is still a legitimate (7-bit) Unicode-encoding.

Rationale for Surrogate Creation Behaviour

    Python currently supports the construction of a surrogate pair
    for a large unicode literal character escape sequence. This is
    basically designed as a simple way to construct "wide characters"
    even in a narrow Python build.

    ISSUE: surrogates can be created this way but the user still
           needs to be careful about slicing, indexing, printing
           etc. Another option is to remove knowledge of surrogates
           from everything other than the codecs.

Rejected Suggestions

    There were two primary solutions that were rejected. The first
    was more or less the status-quo. We could officially say that
    Python characters represent UTF-16 code units and require
    programmers to implement wide characters in their application
    logic. This is a heavy burden because emulating 32-bit characters
    is likely to be very inefficient if it is coded entirely in
    Python. Plus these abstracted pseudo-strings would not be legal
    as input to the regular expression engine.

    The other class of solution is to use some efficient storage
    internally but present an abstraction of wide characters to the
    programmer. Any of these would require a much more complex
    implementation than the accepted solution. For instance consider
    the impact on the regular expression engine. In theory, we could
    move to this implementation in the future without breaking Python
    code. A future Python could "emulate" wide Python semantics on
    narrow Python.

Copyright

    This document has been placed in the public domain.

Local Variables:
mode: indented-text
indent-tabs-mode: nil
End:

-- http://mail.python.org/mailman/listinfo/python-list From fdrake at acm.org Fri Jun 29 16:03:28 2001 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Fri, 29 Jun 2001 10:03:28 -0400 (EDT) Subject: [Python-Dev] NIS on Linux, others?
In-Reply-To: References: Message-ID: <15164.35504.255476.169687@cj42289-a.reston1.va.home.com> Greg Ball writes: > Short version: I can confirm that bug under Linux, but the patch breaks > the nis module on Solaris. I'm presuming that these were using the same NIS server? I'm wondering if this may be an endianness-related problem. I don't understand enough about the NIS protocols to know what's going on in that module. -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From mal at egenix.com Fri Jun 29 16:51:04 2001 From: mal at egenix.com (M.-A. Lemburg) Date: Fri, 29 Jun 2001 16:51:04 +0200 Subject: [Python-Dev] Support for "wide" Unicode characters References: <3B3BEF21.63411C4C@ActiveState.com> Message-ID: <3B3C95D8.518E5175@egenix.com> Paul Prescod wrote: > > Slow python-dev day...consider this exciting new proposal to deal > with important new characters like the Japanese dentistry symbols and > ecological symbols (but not Klingon) More comments... > -------- Original Message -------- > Subject: PEP: Support for "wide" Unicode characters > Date: Thu, 28 Jun 2001 15:33:00 -0700 > From: Paul Prescod > Organization: ActiveState > To: "python-list at python.org" > > PEP: 261 > Title: Support for "wide" Unicode characters > Version: $Revision: 1.3 $ > Author: paulp at activestate.com (Paul Prescod) > Status: Draft > Type: Standards Track > Created: 27-Jun-2001 > Python-Version: 2.2 > Post-History: 27-Jun-2001, 28-Jun-2001 > > Abstract > > Python 2.1 unicode characters can have ordinals only up to 2**16 - 1. > These characters are known as Basic Multilingual Plane characters. > There are now characters in Unicode that live on other "planes". > The largest addressable character in Unicode has the ordinal 17 * > 2**16 - 1 (0x10ffff). For readability, we will call this TOPCHAR > and call characters in this range "wide characters". > > Glossary > > Character > > Used by itself, means the addressable units of a Python > Unicode string. > > Code point > > If you imagine Unicode as a mapping from integers to > characters, each integer represents a code point. Some are > really used for characters. Some will someday be used for > characters. Some are guaranteed never to be used for > characters. > > Unicode character > > A code point defined in the Unicode standard whether it is > already assigned or not. Identified by an integer. You're mixing terms here: being a character in Unicode is a property which is defined by the Unicode specs; not all code points are characters! I'd suggest not to use the term character in this PEP at all; this is also what Mark Davis recommends in his paper on Unicode. That way people reading the PEP won't even start to confuse things since they will most likely have to read this glossary to understand what code points and code units are. Also, a link to the Unicode glossary would be a good thing. > Code unit > > An integer representing a character in some encoding. A code unit is the basic storage unit used by Unicode strings, e.g. u[0], not necessarily a character. > Surrogate pair > > Two code units that represent a single Unicode character. Please add

    Unicode string
        A sequence of code units.

and a note that on wide builds: code unit == code point. > Proposed Solution > > One solution would be to merely increase the maximum ordinal to a > larger value. Unfortunately the only straightforward > implementation of this idea is to increase the character code unit > to 4 bytes. This has the effect of doubling the size of most > Unicode strings. In order to avoid imposing this cost on every > user, Python 2.2 will allow 4-byte Unicode characters as a > build-time option. Users can choose whether they care about > wide characters or prefer to preserve memory. > > The 4-byte option is called "wide Py_UNICODE". The 2-byte option > is called "narrow Py_UNICODE". > > Most things will behave identically in the wide and narrow worlds. > > * unichr(i) for 0 <= i < 2**16 (0x10000) always returns a > length-one string. > > * unichr(i) for 2**16 <= i <= TOPCHAR will return a > length-one string representing the character on wide Python > builds. On narrow builds it will raise ValueError. > > ISSUE: Python currently allows \U literals that cannot be > represented as a single character. It generates two > characters known as a "surrogate pair". Should this be > disallowed on future narrow Python builds? Why not make the codec used by Python to convert Unicode literals to Unicode strings an option just like the default encoding? That way we could have a version of the unicode-escape codec which supports surrogates and one which doesn't. > ISSUE: Should Python allow the construction of characters > that do not correspond to Unicode characters? > Unassigned Unicode characters should obviously be legal > (because they could be assigned at any time). But > code points above TOPCHAR are guaranteed never to > be used by Unicode. Should we allow access to them > anyhow? I wouldn't count on that last point ;-) Please note that you are mixing terms: you don't construct characters, you construct code points. Whether the concatenation of these code points makes a valid Unicode character string is an issue which applications and codecs have to decide. > * ord() is always the inverse of unichr() > > * There is an integer value in the sys module that describes the > largest ordinal for a Unicode character on the current > interpreter. sys.maxunicode is 2**16-1 (0xffff) on narrow builds > of Python and TOPCHAR on wide builds. > > ISSUE: Should there be distinct constants for accessing > TOPCHAR and the real upper bound for the domain of > unichr (if they differ)? There has also been a > suggestion of sys.unicodewidth, which can take the > values 'wide' and 'narrow'. > > * codecs will be upgraded to support "wide characters" > (represented directly in UCS-4, as surrogate pairs in UTF-16 and > as multi-byte sequences in UTF-8). On narrow Python builds, the > codecs will generate surrogate pairs, on wide Python builds they > will generate a single character. This is the main part of the > implementation left to be done. > > * there are no restrictions on constructing strings that use > code points "reserved for surrogates" improperly. These are > called "isolated surrogates". The codecs should disallow reading > these but you could construct them using string literals or > unichr(). unichr() is not restricted to values less than either > TOPCHAR or sys.maxunicode. > > Implementation > > There is a new (experimental) define: > > #define PY_UNICODE_SIZE 2 > > There are new configure options: > > --enable-unicode=ucs2 configures a narrow Py_UNICODE, and uses > wchar_t if it fits > --enable-unicode=ucs4 configures a wide Py_UNICODE, and uses > wchar_t if it fits > --enable-unicode same as "=ucs2" > > The intention is that --disable-unicode, or --enable-unicode=no > removes the Unicode type altogether; this is not yet implemented. > > Notes > > This PEP does NOT imply that people using Unicode need to use a > 4-byte encoding. It only allows them to do so. For example, > ASCII is still a legitimate (7-bit) Unicode-encoding. > > Rationale for Surrogate Creation Behaviour > > Python currently supports the construction of a surrogate pair > for a large unicode literal character escape sequence. This is > basically designed as a simple way to construct "wide characters" > even in a narrow Python build. > > ISSUE: surrogates can be created this way but the user still > needs to be careful about slicing, indexing, printing > etc. Another option is to remove knowledge of > surrogates from everything other than the codecs. +1 on removing knowledge about surrogates from the Unicode implementation core (it's also the easiest: there is none :-) We should provide a new module which provides a few handy utilities though: functions which provide code point-, character-, word- and line- based indexing into Unicode strings. > Rejected Suggestions > > There were two primary solutions that were rejected. The first was > more or less the status-quo. We could officially say that Python > characters represent UTF-16 code units and require programmers to > implement wide characters in their application logic. This is a > heavy burden because emulating 32-bit characters is likely to be > very inefficient if it is coded entirely in Python. Plus these > abstracted pseudo-strings would not be legal as input to the > regular expression engine. > > The other class of solution is to use some efficient storage > internally but present an abstraction of wide characters > to the programmer. Any of these would require a much more complex > implementation than the accepted solution. For instance consider > the impact on the regular expression engine. In theory, we could > move to this implementation in the future without breaking Python > code. A future Python could "emulate" wide Python semantics on > narrow Python. > > Copyright > > This document has been placed in the public domain. -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From jepler at inetnebr.com Fri Jun 29 17:04:18 2001 From: jepler at inetnebr.com (Jeff Epler) Date: Fri, 29 Jun 2001 10:04:18 -0500 Subject: [Python-Dev] NIS on Linux, others? In-Reply-To: <15164.35504.255476.169687@cj42289-a.reston1.va.home.com>; from fdrake@acm.org on Fri, Jun 29, 2001 at 10:03:28AM -0400 References: <15164.35504.255476.169687@cj42289-a.reston1.va.home.com> Message-ID: <20010629100416.A24069@inetnebr.com> On Fri, Jun 29, 2001 at 10:03:28AM -0400, Fred L. Drake, Jr. wrote: > > Greg Ball writes: > > Short version: I can confirm that bug under Linux, but the patch breaks > > the nis module on Solaris. > > I'm presuming that these were using the same NIS server? I'm > wondering if this may be an endianness-related problem. I don't > understand enough about the NIS protocols to know what's going on in > that module. It's my suspicion that it depends on how the "aliases" map is built. The patch that "broke" things for the Linux systems includes the comment /* created with 'makedbm -a' */ which makes me suspect that it's dependent on the way the map is constructed. (I couldn't find an online makedbm manpage which documents a -a option) Endian issues should not exist, the protocol below NIS/YP takes care of this.
Jeff From guido at digicool.com Fri Jun 29 17:24:56 2001 From: guido at digicool.com (Guido van Rossum) Date: Fri, 29 Jun 2001 11:24:56 -0400 Subject: [Python-Dev] Support for "wide" Unicode characters In-Reply-To: Your message of "Fri, 29 Jun 2001 16:51:04 +0200." <3B3C95D8.518E5175@egenix.com> References: <3B3BEF21.63411C4C@ActiveState.com> <3B3C95D8.518E5175@egenix.com> Message-ID: <200106291525.f5TFP0H29410@odiug.digicool.com> > I'd suggest not to use the term character in this PEP at all; > this is also what Mark Davis recommends in his paper on Unicode. I like this idea! I know that I *still* have a hard time not to think "C 'char' datatype, i.e. an 8-bit byte" when I read "character"... > Why not make the codec used by Python to convert Unicode > literals to Unicode strings an option just like the default > encoding ? > > That way we could have a version of the unicode-escape codec > which supports surrogates and one which doesn't. Smart idea, but how practical is this? Can you spec this out a bit more? > +1 on removing knowledge about surrogates from the Unicode > implementation core (it's also the easiest: there is none :-) Except for \U currently -- or is that not part of the implementation core? > We should provide a new module which provides a few handy > utilities though: functions which provide code point-, > character-, word- and line- based indexing into Unicode > strings. But its design is outside the scope of this PEP, I'd say. --Guido van Rossum (home page: http://www.python.org/~guido/) From paulp at ActiveState.com Sat Jun 30 03:16:25 2001 From: paulp at ActiveState.com (Paul Prescod) Date: Fri, 29 Jun 2001 18:16:25 -0700 Subject: [Python-Dev] Support for "wide" Unicode characters References: <3B3BEF21.63411C4C@ActiveState.com> <3B3C95D8.518E5175@egenix.com> Message-ID: <3B3D2869.5C1DDCF1@ActiveState.com> "M.-A. Lemburg" wrote: > >... > > I'd suggest not to use the term character in this PEP at all; > this is also what Mark Davis recommends in his paper on Unicode. That's fine, but Python does have a concept of character and I'm going to use the term character for discussing these. > Also, a link to the Unicode glossary would be a good thing. Funny how these little PEPs grow... >... > Why not make the codec used by Python to convert Unicode > literals to Unicode strings an option just like the default > encoding ? > > That way we could have a version of the unicode-escape codec > which supports surrogates and one which doesn't. Adding more and more knobs to tweak just adds up to Python code being non-portable from one machine to another. > > ISSUE: Should Python allow the construction of characters > > that do not correspond to Unicode characters? > > Unassigned Unicode characters should obviously be legal > > (because they could be assigned at any time). But > > code points above TOPCHAR are guaranteed never to > > be used by Unicode. Should we allow access to them > > anyhow? > > I wouldn't count on that last point ;-) > > Please note that you are mixing terms: you don't construct > characters, you construct code points. Whether the concatenation > of these code points makes a valid Unicode character string > is an issue which applications and codecs have to decide. unichr() does not construct code points. It constructs 1-char Python Unicode strings...also known as Python Unicode characters. > ... Whether the concatenation > of these code points makes a valid Unicode character string > is an issue which applications and codecs have to decide. 
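[A side note to make the code point / code unit distinction concrete: a sketch of the standard UTF-16 arithmetic in Python -- illustrative only, not code from either message, and the helper name is made up:]

def utf16_code_units(code_point):
    # Code points up to U+FFFF fit in a single 16-bit code unit.
    if code_point < 0x10000:
        return (code_point,)
    # Anything larger is split across a surrogate pair of code units.
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)   # lead (high) surrogate
    low = 0xDC00 + (offset & 0x3FF)  # trail (low) surrogate
    return (high, low)

# U+10000 becomes (0xD800, 0xDC00) on a narrow build; slicing between
# the two units leaves an isolated surrogate, which is why code units
# cannot be concatenated blindly.
print [hex(u) for u in utf16_code_units(0x10000)]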
The concatenation of true code points would *always* make a valid Unicode string, right? It's code units that cannot be blindly concatenated. >... > We should provide a new module which provides a few handy > utilities though: functions which provide code point-, > character-, word- and line- based indexing into Unicode > strings. Okay, I'll add: It has been proposed that there should be a module for working with UTF-16 strings in narrow Python builds through some sort of abstraction that handles surrogates for you. If someone wants to implement that, it will be another PEP. -- Take a recipe. Leave a recipe. Python Cookbook! http://www.ActiveState.com/pythoncookbook From mwh at python.net Sat Jun 30 11:32:34 2001 From: mwh at python.net (Michael Hudson) Date: 30 Jun 2001 10:32:34 +0100 Subject: [Python-Dev] Support for "wide" Unicode characters In-Reply-To: Paul Prescod's message of "Fri, 29 Jun 2001 18:16:25 -0700" References: <3B3BEF21.63411C4C@ActiveState.com> <3B3C95D8.518E5175@egenix.com> <3B3D2869.5C1DDCF1@ActiveState.com> Message-ID: Paul Prescod writes: > "M.-A. Lemburg" wrote: > > I'd suggest not to use the term character in this PEP at all; > > this is also what Mark Davis recommends in his paper on Unicode. > > That's fine, but Python does have a concept of character and I'm going > to use the term character for discussing these. As a Unicode Idiot (tm) can I please beg you to reconsider? There are so many possible meanings for "character" that I really think it's best to avoid the word altogether. Call Python characters "length 1 strings" or even "length 1 Python strings". [...] > > Please note that you are mixing terms: you don't construct > > characters, you construct code points. Whether the concatenation > > of these code points makes a valid Unicode character string > > is an issue which applications and codecs have to decide. > > unichr() does not construct code points. It constructs 1-char Python > Unicode strings This is what I think you should be saying. > ...also known as Python Unicode characters. Which I'm suggesting you forget! Cheers, M. -- I'm a keen cyclist and I stop at red lights. Those who don't need hitting with a great big slapping machine. -- Colin Davidson, cam.misc From paulp at ActiveState.com Sat Jun 30 13:28:28 2001 From: paulp at ActiveState.com (Paul Prescod) Date: Sat, 30 Jun 2001 04:28:28 -0700 Subject: [Python-Dev] Support for "wide" Unicode characters References: <3B3BEF21.63411C4C@ActiveState.com> <3B3C95D8.518E5175@egenix.com> <3B3D2869.5C1DDCF1@ActiveState.com> Message-ID: <3B3DB7DC.511A3D8@ActiveState.com> Michael Hudson wrote: > >... > > As a Unicode Idiot (tm) can I please beg you to reconsider? There are > so many possible meanings for "character" that I really think it's > best to avoid the word altogether. Call Python characters "length 1 > strings" or even "length 1 Python strings". Do you really feel that there are many possible meanings for the word "Python Unicode character?" This is a PEP: I have to assume a certain degree of common understanding. -- Take a recipe. Leave a recipe. Python Cookbook! http://www.ActiveState.com/pythoncookbook From mal at egenix.com Sat Jun 30 13:52:38 2001 From: mal at egenix.com (M.-A. Lemburg) Date: Sat, 30 Jun 2001 13:52:38 +0200 Subject: [Python-Dev] Support for "wide" Unicode characters References: <3B3BEF21.63411C4C@ActiveState.com> <3B3C95D8.518E5175@egenix.com> <3B3D2869.5C1DDCF1@ActiveState.com> Message-ID: <3B3DBD86.81F80D06@egenix.com> Paul Prescod wrote: > > "M.-A. 
Lemburg" wrote: > > > >... > > > > I'd suggest not to use the term character in this PEP at all; > > this is also what Mark Davis recommends in his paper on Unicode. > > That's fine, but Python does have a concept of character and I'm going > to use the term character for discussing these. The term "character" in Python should really only be used for the 8-bit strings. In Unicode a "character" can mean any of: """ Unfortunately the term character is vastly overloaded. At various times people can use it to mean any of these things: - An image on paper (glyph) - What an end-user thinks of as a character (grapheme) - What a character encoding standard encodes (code point) - A memory storage unit in a character encoding (code unit) Because of this, ironically, it is best to avoid the use of the term character entirely when discussing character encodings, and stick to the term code point. """ Taken from Mark Davis' paper: http://www-106.ibm.com/developerworks/unicode/library/utfencodingforms/ > > Also, a link to the Unicode glossary would be a good thing. > > Funny how these little PEPs grow... Is that a problem ? The Unicode glossary is very useful in providing a common base for understanding the different terms and tries very hard to avoid ambiguity in meaning. This discussion is partly caused by exactly these different understanding of the terms used in the PEP. I will update the Unicode PEP to the Unicode terminology too. > >... > > Why not make the codec used by Python to convert Unicode > > literals to Unicode strings an option just like the default > > encoding ? > > > > That way we could have a version of the unicode-escape codec > > which supports surrogates and one which doesn't. > > Adding more and more knobs to tweak just adds up to Python code being > non-portable from one machine to another. Not necessarily so; I'll write a more precise spec next week. The idea is to put the codec information into the Python source code, so that it is bound to the literals that way with the result of the Python source code being portable across platforms. Currently this is just an idea and still have to check how far this can go... > > > ISSUE: Should Python allow the construction of characters > > > that do not correspond to Unicode characters? > > > Unassigned Unicode characters should obviously be legal > > > (because they could be assigned at any time). But > > > code points above TOPCHAR are guaranteed never to > > > be used by Unicode. Should we allow access to them > > > anyhow? > > > > I wouldn't count on that last point ;-) > > > > Please note that you are mixing terms: you don't construct > > characters, you construct code points. Whether the concatenation > > of these code points makes a valid Unicode character string > > is an issue which applications and codecs have to decide. > > unichr() does not construct code points. It constructs 1-char Python > Unicode strings...also known as Python Unicode characters. > > > ... Whether the concatenation > > of these code points makes a valid Unicode character string > > is an issue which applications and codecs have to decide. > > The concatenation of true code points would *always* make a valid > Unicode string, right? It's code units that cannot be blindly > concatenated. Both wrong :-) U+D800 is a valid Unicode code point and can occur as code unit in both narrow and wide builds. Concatenating this with e.g. 
U+0020 will still make it a valid Unicode code point sequence (aka Unicode object), but not a valid Unicode character string (since the U+D800 is not a character). The same is true for e.g. U+FFFF. Note that the Unicode type should happily store these values, while the codecs complain. As a result and like I said above, dealing with these problems is left to the applications which use these Unicode objects. > >... > > We should provide a new module which provides a few handy > > utilities though: functions which provide code point-, > > character-, word- and line- based indexing into Unicode > > strings. > > Okay, I'll add: > > It has been proposed that there should be a module for working > with UTF-16 strings in narrow Python builds through some sort of > abstraction that handles surrogates for you. If someone wants > to implement that, it will be another PEP. Uhm, narrow builds don't support UTF-16... it's UCS-2 which is supported (basically: store everything in range(0x10000)); the codecs can map code points to surrogates, but it is solely their responsibility and the responsibility of the application using them to take care of dealing with surrogates. Also, the module will be useful for both narrow and wide builds, since the notion of an encoded character can involve multiple code points. In that sense Unicode is always a variable length encoding for characters and that's the application field of this module. Here's the adjusted text: It has been proposed that there should be a module for working with Unicode objects using character-, word- and line- based indexing. The details of the implementation is left to another PEP. -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From bckfnn at worldonline.dk Sat Jun 30 15:07:55 2001 From: bckfnn at worldonline.dk (Finn Bock) Date: Sat, 30 Jun 2001 13:07:55 GMT Subject: [Python-Dev] Corrupt Jython CVS (off topic). Message-ID: <3b3dccf6.26562024@mail.wanadoo.dk> A week ago I posted this on jython-dev, but no-one was able to give any advise on the best way to fix it. Maybe you can help. For some time now, our [jython] web CVS have not worked correctly: http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/jython/jython/org/python/core/ Finally I managed to track the problem to the Java2Accessibility.py,v file in the CVS repository. The "rlog" command cannot be executed on this file. From nhv at cape.com Sat Jun 30 15:16:48 2001 From: nhv at cape.com (Norman Vine) Date: Sat, 30 Jun 2001 09:16:48 -0400 Subject: [Python-Dev] RE: Threaded Cygwin Python Import Problem In-Reply-To: <20010628171715.P488@dothill.com> Message-ID: <015601c10166$eb79bb00$a300a8c0@nhv> Jason Tishler > >Thanks to Rob Collins (implementer) and Greg Smith (profiler), Cygwin now >provides enough pthreads support so that Cygwin Python builds OOTB *and* >functions reasonably well even with threads enabled. Unfortunately, >there are still a few issues that need to be resolved. > >The one that I would like to address in this posting prevents a threaded >Cygwin Python from building the standard extension modules (without some >kind of intervention). :,( Specifically, the build would frequently >hang during the Distutils part when Cygwin Python is attempting to execvp >a gcc process. > >See the first attachment, test.py, for a minimal Python script that >exhibits the hang. 
See the second attachment, test.c, for a rewrite >of test.py in C. Since test.c did not hang, I was able to conclude that >this was not just a straight Cygwin problem. > >Further tracing uncovered that the hang occurs in _execvpe() (in os.py), >when the child tries to import tempfile. If I apply the third >attachment, >os.py.patch, then the hang is avoided. Hence, it appears that importing a >module (or specifically the tempfile module) in a threaded Cygwin Python >child causes a hang. > >I saw the following comment in _execvpe(): > > # Process handling (fork, wait) under BeOS (up to 5.0) > # doesn't interoperate reliably with the thread interlocking > # that happens during an import. The actual error we need > # is the same on BeOS for posix.open() et al., ENOENT. > >The above makes me think that possibly Cygwin is having a >similar problem. > >Can anyone offer suggestions on how to further debug this problem? I was experiencing the same problems as Jason with Win2k sp1 and had used the same work-around successfully. < I believe Jason is working with NT 4.0 sp 5 > Curiously, after applying the Win2k sp2, I no longer need to do this and the original Python code works fine. Leading me to believe that this may be but a symptom of another Windows mystery. Regards Norman Vine From aahz at rahul.net Sat Jun 30 16:15:24 2001 From: aahz at rahul.net (Aahz Maruch) Date: Sat, 30 Jun 2001 07:15:24 -0700 (PDT) Subject: [Python-Dev] Support for "wide" Unicode characters In-Reply-To: <3B3DB7DC.511A3D8@ActiveState.com> from "Paul Prescod" at Jun 30, 2001 04:28:28 AM Message-ID: <20010630141524.E029999C80@waltz.rahul.net> Paul Prescod wrote: > Michael Hudson wrote: >> >>... >> >> As a Unicode Idiot (tm) can I please beg you to reconsider? There are >> so many possible meanings for "character" that I really think it's >> best to avoid the word altogether. Call Python characters "length 1 >> strings" or even "length 1 Python strings". > > Do you really feel that there are many possible meanings for the word > "Python Unicode character?" This is a PEP: I have to assume a certain > degree of common understanding. After reading Michael's and MA's arguments, I'm +1 on making the change they're requesting. But what really triggered my posting this was your use of the phrase "common understanding"; IME, Python's "explicit is better than implicit" rule is truly critical in documentation. Particularly if "character" has been deprecated in standard Unicode documentation, I think sticking to a common vocabulary makes more sense. -- --- Aahz (@pobox.com) Hugs and backrubs -- I break Rule 6 <*> http://www.rahul.net/aahz/ Androgynous poly kinky vanilla queer het Pythonista I don't really mind a person having the last whine, but I do mind someone else having the last self-righteous whine. From Jason.Tishler at dothill.com Sat Jun 30 17:20:19 2001 From: Jason.Tishler at dothill.com (Jason Tishler) Date: Sat, 30 Jun 2001 11:20:19 -0400 Subject: [Python-Dev] Re: Threaded Cygwin Python Import Problem In-Reply-To: <015601c10166$eb79bb00$a300a8c0@nhv> Message-ID: <20010630112019.B626@dothill.com> Norman, On Sat, Jun 30, 2001 at 09:16:48AM -0400, Norman Vine wrote: > Jason Tishler > >The one that I would like to address in this posting prevents a threaded > >Cygwin Python from building the standard extension modules (without some > >kind of intervention). :,( Specifically, the build would frequently > >hang during the Distutils part when Cygwin Python is attempting to execvp > >a gcc process.
> I was experiencing the same problems as Jason with Win2k sp1 and > had used the same work-around successfully. > < I believe Jason is working with NT 4.0 sp 5 > > > Curiously, after applying the Win2k sp2, I no longer need to do this > and the original Python code works fine. > > Leading me to believe that this may be but a symptom of another > Windows mystery. After further reflection, I feel that I have found another race/deadlock issue with Cygwin's pthreads implementation. If I'm correct, this would explain why you experienced it intermittently with Windows 2000 SP1 and why it is "gone" with SP2. Probably SP2 slows down your machine so much that the problem is not triggered. :,) I am going to reconfigure --with-pydebug and set THREADDEBUG. Hopefully, the hang will still be reproducible under these conditions. If so, then I will attempt to produce a minimal C test case for Rob to use to isolate and solve this problem. Thanks, Jason -- Jason Tishler Director, Software Engineering Phone: 732.264.8770 x235 Dot Hill Systems Corp. Fax: 732.264.8798 82 Bethany Road, Suite 7 Email: Jason.Tishler at dothill.com Hazlet, NJ 07730 USA WWW: http://www.dothill.com From guido at digicool.com Sat Jun 30 20:06:35 2001 From: guido at digicool.com (Guido van Rossum) Date: Sat, 30 Jun 2001 14:06:35 -0400 Subject: [Python-Dev] Corrupt Jython CVS (off topic). In-Reply-To: Your message of "Sat, 30 Jun 2001 13:07:55 GMT." <3b3dccf6.26562024@mail.wanadoo.dk> References: <3b3dccf6.26562024@mail.wanadoo.dk> Message-ID: <200106301806.f5UI6Zq30293@odiug.digicool.com> > A week ago I posted this on jython-dev, but no one was able to give any > advice on the best way to fix it. Maybe you can help. > > > For some time now, our [jython] web CVS has not worked correctly: > > http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/jython/jython/org/python/core/ > > Finally I managed to track the problem to the Java2Accessibility.py,v > file in the CVS repository. The "rlog" command cannot be executed on > this file. > > From the start of the Java2Accessibility.py,v:
>
> head 2.4;
> access;
> symbols
>     Release_2_1alpha1:2.4
>     Release_2_0:2.2
>     Release_2_0rc1:2.2
>     Release_2_0beta2:2.2
>     Release_2_0beta1:2.2
>     Release_2_0alpha3:2.2
>     Release_2_0alpha2:2.2
>     Release_2_0alpha1:2.2
>     Release_1_1rc1:2.2
>     Release_1_1beta4:2.2
>     Release_1_1beta3:2.2
>     2.0:1.1.0.2;
> locks; strict;
>
> As an experiment, I tried to remove the strange "2.0:1.1.0.2;" line from > the file and then I could run rlog on the file. Make sure to move the semicolon to the end of the previous line. > Does anyone know if/how we can fix this? > > As a last resort I suppose I can attach my hand-edited version to a SF > support request where I ask them to copy my file to the CVS server. To > this day I have never been very successful whenever I have tried to edit > files in a CVS repository so I'm reluctant to do this. > > regards, > finn Yes, I think a SF request should be the way to go. I don't know how this could have happened; the "2.0" is illegal as a symbolic tag name... --Guido van Rossum (home page: http://www.python.org/~guido/) From paulp at ActiveState.com Sat Jun 30 21:09:07 2001 From: paulp at ActiveState.com (Paul Prescod) Date: Sat, 30 Jun 2001 12:09:07 -0700 Subject: [Python-Dev] Support for "wide" Unicode characters References: <20010630141524.E029999C80@waltz.rahul.net> Message-ID: <3B3E23D3.69D591DD@ActiveState.com> Aahz Maruch wrote: > > > After reading Michael's and MA's arguments, I'm +1 on making the change > they're requesting.
But what really triggered my posting this was your > use of the phrase "common understanding"; IME, Python's "explicit is > better than implicit" rule is truly critical in documentation. The spec starts off with an absolutely watertight definition of the term: "the addressable units of a Python Unicode string." I can't get more explicit than that. Expanding every usage of the word to "length 1 Python Unicode string" does not make the document more explicit any more than this is a "more explicit" equation than Einstein's: "The Energy is the mass of the object times the speed of light times two." > Particularly if "character" has been deprecated in standard Unicode > documentation, I think sticking to a common vocabulary makes more sense. "Character" is still a central term in all Unicode documentation. Go to their web page and look. It's right on the front page. "Unicode provides a unique number for every character, no matter what the platform, no matter what the program, no matter what the language." But I'm not using it in the Unicode sense anyhow, so it doesn't matter. If ISO deprecates the use of the word integer in some standard, will we stop talking about Python integers as integers? The addressable unit of a Python string is a character. If it is a Python Unicode String then it is a Python Unicode character. The term "Python Unicode character" is not going away: http://www.python.org/doc/current/tut/node5.html#SECTION005120000000000000000 I will be a lot more concerned about this issue when someone reads the PEP and is actually confused by something as opposed to worrying that somebody might be confused by something. If I start using a bunch of technical terms and obfuscatory expansions, it will just dissuade people from reading the PEP. -- Take a recipe. Leave a recipe. Python Cookbook! http://www.ActiveState.com/pythoncookbook From DavidA at ActiveState.com Sat Jun 30 23:28:39 2001 From: DavidA at ActiveState.com (David Ascher) Date: Sat, 30 Jun 2001 14:28:39 -0700 Subject: [Python-Dev] Support for "wide" Unicode characters References: <20010630141524.E029999C80@waltz.rahul.net> <3B3E23D3.69D591DD@ActiveState.com> Message-ID: <3B3E4487.40054EAE@ActiveState.com> > "The Energy is the mass of the object times the speed of light times > two." Actually, it's "squared", not times two. At least in my universe =) --david-Unicode-idiot-much-to-Paul's-dismay-ascher